English Pages [441] Year 2013
The Bloomsbury Companion to Lexicography
Bloomsbury Companions
Bloomsbury Companion to Cognitive Linguistics, Edited by John R. Taylor and Jeanette Littlemore
Bloomsbury Companion to Syntax, Edited by Silvia Luraghi and Claudia Parodi
Continuum Companion to Discourse Analysis, Edited by Ken Hyland and Brian Paltridge
Available in Paperback as Bloomsbury Companion to Discourse Analysis
Continuum Companion to Historical Linguistics, Edited by Silvia Luraghi and Vit Bubenik
Available in Paperback as Bloomsbury Companion to Historical Linguistics
Bloomsbury Companion to Phonetics, Edited by Mark J. Jones and Rachael-Anne Knight
Continuum Companion to Phonology, Edited by Nancy C. Kula, Bert Botma and Kuniya Nasukawa
Available in Paperback as Bloomsbury Companion to Phonology
Continuum Companion to the Philosophy of Language, Edited by Manuel García-Carpintero and Max Kölbel
Available in Paperback as Bloomsbury Companion to the Philosophy of Language
Continuum Companion to Second Language Acquisition, Edited by Ernesto Macaro
Available in Paperback as Bloomsbury Companion to Second Language Acquisition
Forthcoming:
Bloomsbury Companion to M. A. K. Halliday, Edited by Jonathan J. Webster
Bloomsbury Companion to Stylistics, Edited by Violeta Sotirova
Bloomsbury Companion to TESOL, Edited by Jun Liu
Bloomsbury Companion to Translation Studies, Edited by John Kearns and Jody Byrne
The Bloomsbury Companion to Lexicography Edited by
Howard Jackson
LONDON • NEW DELHI • NEW YORK • SYDNEY
Bloomsbury Academic An imprint of Bloomsbury Publishing Plc 50 Bedford Square London WC1B 3DP UK
1385 Broadway New York NY 10018 USA
www.bloomsbury.com
First published 2013
© Howard Jackson and Contributors, 2013
All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage or retrieval system, without prior permission in writing from the publishers.
No responsibility for loss caused to any individual or organization acting on or refraining from action as a result of the material in this publication can be accepted by Bloomsbury Academic or the author.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
eISBN: 978-1-4411-1415-0
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.
Typeset by Newgen Imaging Systems Pvt Ltd, Chennai, India
Contents
List of Contributors
1 Introduction (Howard Jackson)
2 A History of Research in Lexicography (Paul Bogaards)
3 Research Methods and Problems
3.1 Researching Lexicographical Practice (Lars Trap-Jensen)
3.2 Methods in Dictionary Criticism (Kaoru Akasu)
3.3 Researching Users and Uses of Dictionaries (Hilary Nesi)
4 Current Research and Issues
4.1 Using Corpora as Data Sources for Dictionaries (Adam Kilgarriff)
4.2 Researching the Use of Electronic Dictionaries (Verónica Pastor and Amparo Alcina)
4.3 Researching Historical Lexicography and Etymology (John Considine)
4.4 Researching Pedagogical Lexicography (Amy Chi)
4.5 Monolingual Learners’ Dictionaries – Where Now? (Shigeru Yamada)
4.6 Issues in Compiling Bilingual Dictionaries (Arleta Adamska-Sałaciak)
4.7 Issues in Compiling Dictionaries for African Languages (Danie J. Prinsloo)
4.8 Issues in Sign Language Lexicography (Inge Zwitserlood, Jette Hedegaard Kristoffersen and Thomas Troelsgård)
4.9 Identifying, Ordering and Defining Senses (Robert Lew)
4.10 A Theory of Lexicography – Is There One? (Tadeusz Piotrowski)
5 New Directions in Lexicography
5.1 e-lexicography: The Continuing Challenge of Applying New Technology to Dictionary-Making (Pedro A. Fuertes-Olivera)
5.2 The Future of Historical Dictionaries, with Special Reference to the Online OED and Thesaurus (Charlotte Brewer)
5.3 The Future of Dictionaries, Dictionaries of the Future (Sandro Nielsen)
6 Resources (Reinhard Hartmann)
7 Glossary of Lexicographic Terms (Barbara Ann Kipfer)
8 Annotated Bibliography (Howard Jackson)
Index
List of Contributors
Arleta Adamska-Sałaciak is Professor and Head of the Department of Lexicography and Lexicology in the Faculty of English, Adam Mickiewicz University in Poznań. She is the author of Meaning and the Bilingual Dictionary (Peter Lang, 2006), as well as a number of works on metalexicography, lexical semantics, philosophy of linguistics, theory of language change and nineteenth-century linguistic thought. Since the mid-1990s, she has co-authored and edited six bilingual dictionaries of English and Polish, published, among others, by HarperCollins and Pearson Education. In terms of coverage, the New Kosciuszko Foundation Dictionary (New York and Kraków, 2003) is the largest English-Polish, Polish-English dictionary to date, while the Longman Słownik Współczesny (Harlow, 2004, 2nd edition, 2011) is a highly successful pedagogical dictionary for Polish learners of English. More information at http://wa.amu.edu.pl/wa/Adamska-Salaciak_Arleta.
Kaoru Akasu is Professor of English Linguistics at Toyo University in Tokyo, Japan. His fields of interest include pedagogical and bilingual lexicography as well as contrastive studies of English and Japanese. He is a co-editor of Lighthouse English-Japanese Dictionary and Luminous English-Japanese Dictionary, both published by Kenkyusha, one of the major dictionary publishers in Japan, and he has also contributed to other dictionary projects. He is also interested in dictionary criticism and has published a number of articles in which he has conducted critical analyses of such dictionaries as OALD6, CALD, NODE, CIDE, and LDOCE2, among others. He co-edited Lexicography: Theoretical and Practical Perspectives, Proceedings of the Seventh ASIALEX Biennial International Conference, 2011, Kyoto, and his most recent dictionary analysis ‘The first dictionary of English collocations in Japan’ is found in Research on Phraseology in Europe and Asia: Focal Issues of Phraseological Studies.
Amparo Alcina is Professor at the Universitat Jaume I of Castellón (Spain), where she teaches Translation Technology and Terminology to translation students. She completed a Master’s Degree in Computational Linguistics at the University of Barcelona in 1993, and defended her PhD thesis on Nominal Phrases and Reference at the University of Valencia in 1999. She is currently the Director of the Master’s Degree in Translation Technology and Localization at the Universitat Jaume I and coordinates the research team TecnoLeTTra (http://tecnolettra.uji.es), which focuses on language, terminology and translation technology. She also leads the research projects ONTODIC I and II, which are working on the creation of onomasiological and combinatory dictionaries based on ontologies, as well as other research and educational projects on digital dictionaries, specialized corpora and translation memories. She has published articles in journals such as Meta, Target, Perspectives: Studies in Translation, Terminology, International Journal of Lexicography and The Interpreter and Translator Trainer.
Paul Bogaards was born in Leiden, The Netherlands, in 1940. He was an experienced teacher and teacher trainer in French, applied linguistics and didactics, and taught French, applied linguistics and lexicology at Leiden University from 1976 to 2002. He compiled several Dutch/French dictionaries, published by Van Dale Lexicografie (Utrecht and Antwerp) and Dictionnaires Le Robert (Paris), wrote books on learner characteristics, language learning, and vocabulary learning in a second language, published numerous papers in various journals and was the editor of the International Journal of Lexicography from 2002 until his death on 3 October 2012. His main research interests concerned the acquisition and testing of lexical knowledge in a second language and the use and usability of dictionaries.
Charlotte Brewer’s history of the twentieth- and twenty-first-century OED, Treasure-House of the Language: The Living OED, was published by Yale University Press in 2007. She has written many articles on the OED and on dictionaries more widely and is currently engaged on a study of the OED’s treatment of great writers, including Shakespeare, Chaucer and Austen. Her other publications include Editing Piers Plowman: The Evolution of the Text (Cambridge University Press, 1996) and the co-edited Traditions and Innovations in the Study of Middle English Literature: The Influence of Derek Brewer (Boydell and Brewer, 2013). She is Professor of English Language and Literature at Oxford University and a fellow of Hertford College, Oxford.
Amy Chi has been teaching English for academic purposes to EFL/ESL students for over 20 years. Since 2008, she has co-taught an elective course entitled ‘Bilingual Lexicography’ to students studying for an MA degree in Computer-aided Translation. The core of her research work is in the area of pedagogical lexicography underpinning teaching and learning English as a second language. The scope of her research includes EFL learners’ dictionaries, dictionary use training (teaching methodology and material writing) and second language vocabulary acquisition. The greater part of her publications has been research work related to Hong Kong students’ use of English dictionaries to assist their language learning. She was an Advisor for the compilation of the Macmillan English Dictionary (1st edition, 2002). She is the founding secretary (1997–9) and Executive Board member (1999–2011) of ASIALEX. Her recent publications include: ‘Dictionary, teaching and student – the English Language Triangle’, Journal of the Institute of Linguists 25, 2005; ‘Relevance of EFL dictionaries in English teaching – ♫ “Will you still need me, when I’m sixty-four?”’, In ASIALEX ’09 Bangkok Proceedings, 2009; ‘When dictionaries support vocabulary learning, where to begin?’, In ASIALEX ’11 Kyoto Proceedings, 2011.
John Considine teaches English at the University of Alberta, Canada, and was formerly an Assistant Editor of the Oxford English Dictionary. He is the author of Dictionaries in Early Modern Europe (Cambridge, 2008) and, with Sylvia Brown, editor of a facsimile, with introduction and index of sources, of the Ladies Dictionary of 1694 (Ashgate, 2010). He has recently finished a book on dictionaries in the academy tradition before 1800.
Pedro A. Fuertes-Olivera is Associate Professor at the University of Valladolid, where he teaches specialized discourse, and Velux Visiting Professor 2011–12 (Aarhus University, Department of Business Communication). He obtained his accreditation for Full Professor in 2011, and has published around 120 academic contributions, and several printed and internet dictionaries. He is co-author of Pedagogical Specialised Lexicography (John Benjamins), and editor of Specialised Dictionaries for Learners (De Gruyter) and e-Lexicography. The Internet, Digital Initiatives and Lexicography (Continuum).
Reinhard R. K. Hartmann grew up in Vienna (Austria), studying economics and translation, and enjoyed an academic career at three English universities, specializing in applied linguistics and producing 19 books, including the Dictionary of Lexicography (co-author Gregory James, 1998/2001), the textbook Teaching and Researching Lexicography (2001) and the three-volume reader Lexicography. Critical Concepts (2003). Since his retirement from the University of Exeter, he has been honorary professor at Birmingham University where the Dictionary Research Centre was moved in 2001. The LEXeter ’83 conference led to EURALEX and other international initiatives, such as the Lexicographica Series Maior of which he was one of the editors from 1984 to 2007. He has compiled a list of Reference Portals for the EURALEX website and is currently working on an International Directory of Lexicography Institutions.
Howard Jackson is Professor Emeritus of English Language and Linguistics in the School of English at Birmingham City University, United Kingdom, where he taught for over 40 years. He taught modules on lexicography and corpus linguistics, as well as on grammar and lexicology. He is currently academic dean of the British SIL training programme, managing an MA in field linguistics. He is the author of Lexicography: An Introduction (Routledge, 2002) and (with Etienne Zé Amvela) of Words, Meaning and Vocabulary (2nd edition, Continuum, 2007). He also authored the ‘Lexicography’ article in the Morphology volume of the HSK (Mouton de Gruyter, 2004).
Adam Kilgarriff is both Director of Lexical Computing Ltd, and a research scientist working at the intersection of lexicography, corpus linguistics and language technology. His company has developed the Sketch Engine (www.sketchengine.co.uk), a leading tool for corpus research used for linguistic research, translation and dictionary-making at Oxford University Press, Cambridge University Press and many other companies and universities. His PhD, on ‘polysemy’, was from the University of Sussex and he has since worked at Longman Dictionaries and the University of Brighton. He is a Visiting Research Fellow at the University of Leeds. He is active in moves to make the web available as a linguists’ corpus and was the founding chair of ACL-SIGWAC (Association for Computational Linguistics Special Interest Group on Web as Corpus). See also www.kilgarriff.co.uk
Barbara Ann Kipfer is a lexicographer and ontologist, currently a semantic curation specialist with Google. She has worked for such companies as Dictionary.com and Thesaurus.com, Answers.com, Ask Jeeves, Bellcore/Telcordia, Federated Media Publishing, General Electric Research, IBM Research, idealab, Knowledge Adventure and Wolfram|Alpha. Barbara holds a PhD and MPhil in Linguistics (University of Exeter), a PhD in Archaeology (Greenwich University), an MA and a PhD in Buddhist Studies (Akamai University) and a BS in Physical Education (Valparaiso University). She is the author of 14,000 Things to Be Happy About and 50 other books including thesauri and dictionaries, trivia and question books, archaeology reference books, and happiness and spiritually themed books. Her websites are www.referencewordsmith.com and www.thingstobehappyabout.com.
Jette Hedegaard Kristoffersen was trained as a sign language interpreter at the Copenhagen Business School and subsequently received a BA in Linguistics from the University of Copenhagen. She has since been teaching linguistics and ethics for sign language interpreter training at the Centre for Sign Language (now a department of University College Capital, Copenhagen – UCC) and been working in the field of lexicography. She has written sign language-related articles for several publications, most recently a chapter on sign language lexicography in Electronic Lexicography (Oxford University Press, 2012). She is the leader of the Danish Sign Language Dictionary at UCC.
Robert Lew is Professor in the Department of Lexicology and Lexicography in the Faculty of English, Adam Mickiewicz University in Poznań, Poland. His current interests centre on dictionary use, and he is involved in a number of research projects including topics such as access-facilitating devices, definition formats, dictionaries for production, space in dictionaries and training in dictionary skills. He is Reviews Editor and Associate Editor for the International Journal of Lexicography (Oxford University Press). In 2011 he edited a special issue of IJL on dictionary use. He has worked as a practical lexicographer for various publishers, including HarperCollins, Pearson-Longman and Cambridge University Press.
Hilary Nesi has been Professor in English Language at Coventry University since 2007. Prior to this she taught for 20 years at Warwick University. She leads the Research Group ‘English Language in the Professions and in Higher Education’ (ELPHE) and supervises PhD students on topics relating to learners’ dictionaries, corpus design and/or the use of English for professional or academic purposes. She has published a number of studies on dictionary design and use, including The Use and Abuse of Learners’ Dictionaries (Max Niemeyer, 2000), and recent contributions to Electronic Lexicography (Granger and Paquot (eds), 2010) and The Oxford Handbook of Lexicography (Durkin (ed.), forthcoming). Hilary was principal investigator for the projects to develop the British Academic Spoken English (BASE) corpus and the British Academic Written English (BAWE) corpus. Her book Genres across the Disciplines: Student Writing in Higher Education, co-authored with Sheena Gardner, was published by Cambridge University Press in 2012.
Sandro Nielsen is affiliated with the Department of Business Communication, School of Business and Social Sciences, Aarhus University, Denmark, where he is Associate Professor. He has an MA in English (LSP for translators and interpreters) from 1987 and was awarded his PhD degree in specialized lexicography in 1992. He has published extensively on theoretical and practical lexicography, and is the author of The Bilingual LSP Dictionary. Principles and Practice for Legal Language (Narr, 1994), co-editor of Lexicography in the 21st Century (Benjamins, 2009) and Lexicography at a Crossroads (Lang, 2009). He is the author and co-author of a printed and an online bilingual law dictionary, three printed and twelve online accounting dictionaries, and a contributor to the Manual of Specialised Lexicography. His main research areas are principles for online LSP dictionaries, user guides in dictionaries, lexicographic information costs and academic dictionary reviewing. Teaching interests focus on lexicography and legal translation for translators and interpreters.
Verónica Pastor is a PhD researcher at Universitat Jaume I of Castellón (Spain), Department of Translation and Communication. She completed a Master’s Degree in Translation Technology and Localization at Universitat Jaume I in 2008. She is currently collaborating in the TecnoLeTTra research team (http://tecnolettra.uji.es), which focuses on language, terminology and translation technology. Her PhD thesis centres on techniques and strategies in resources for onomasiological searches. Her recent publications focus on search techniques in electronic dictionaries.
Tadeusz Piotrowski holds the Chair for Lexicography and Translation Studies at the Wyzsza Szkola Filologiczna, Wrocław, Poland. He has published almost one hundred scholarly papers and reviews in journals and books in Poland, the United Kingdom, the United States, Hungary, the Czech Republic, Russia and Germany, as well as three books on lexicography. He is interested in the bilingual and monolingual lexicography of Slavic languages and English, both in their theoretical and practical aspects; and he has edited over 30 dictionaries published in Poland. For many years he worked as a translator, and he currently teaches theoretical and practical translation. His major current publications are on tertium comparationis in translation and on equivalence in bilingual dictionaries.
Danie Prinsloo is Professor in African languages and former Head of the Department of African languages and Chair of the School of Languages at the University of Pretoria. He is well-known nationally and internationally for his ground-breaking research in the field of African language lexicography. He is the author or co-author of more than 100 scientific articles, books and dictionaries and has presented more than 100 papers on African language lexicography. He started the corpus era in Africa two decades ago and plays a major role in the development of corpus-based lexicography and in particular new designs for paper and electronic dictionaries. He is a five-times awardee of the Exceptional Academic Performer Award of the University of Pretoria and acknowledged as one of this University’s Centenary Leading Minds. In 2010 he received an award from the Pan South African Language Board for his contribution to effective innovation of technology to promote multilingualism.
Lars Trap-Jensen has an educational background in general linguistics, Greenlandic, and social studies (Aarhus University), with an MPhil in linguistics (Cambridge University). He has been a lecturer in Danish language at the universities of Basel and Zürich. Since 1994 he has been working as a practical lexicographer at the Society for Danish Language and Literature, Copenhagen, and since 2003 as the managing editor of The Danish Dictionary and the dictionary site ordnet.dk. Other projects include the digitization of the Ordbog over det danske Sprog (Dictionary of the Danish Language, 28 volumes + 5 supplementary volumes), development of the Danish wordnet, DanNet and the Danish Thesaurus (ongoing). He is engaged in promoting the field of lexicography, with active memberships in the Danish Association for Lexicography (Leda), in the Nordic Association for Lexicography, and in Euralex, the European Association for Lexicography (currently Vice President). With over twenty publications in the last five years, he has published most recently in LexicoNordica (2011, 2012), Dansk Noter (2012), Nordiske Studier i leksikografi 11 (Eaker, Larsson and Mattisson (eds) 2012).
Thomas Troelsgård received an MA in Russian language and computational linguistics from the University of Copenhagen, and has since then worked in the field of lexicography. He participated in the development of the Danish Sign Language Dictionary at the Centre for Sign Language (now a department of UCC, University College Capital, Copenhagen). At present he is working partly at the Centre for Sign Language, and partly at the Society for Danish Language and Literature. Thomas Troelsgård has written sign language-related articles for several publications, most recently a chapter on sign language lexicography in Electronic Lexicography (Oxford University Press, 2012).
Shigeru Yamada is Professor at Waseda University, Japan. He is the vice-president of the Japan Society of English Usage and Style and the JACET Society of English Lexicography. He also serves on the executive board of the Asian Association for Lexicography. He specializes in EFL, bilingual, and electronic lexicography and dictionary criticism. His recent publications include ‘EFL Dictionary Evolution: Innovations and Drawbacks’ (2010, in English Learners’ Dictionaries at the DSNA, K Dictionaries) and ‘An Analysis of the Oxford Advanced Learner’s Dictionary of Current English, Eighth Edition’ (2012, with M. Kozaki et al., LEXICON 42). He has contributed to a number of dictionary projects, such as Kenkyusha’s New English-Japanese Dictionary (6/e, 2003), Luminous Japanese-English Dictionary (2/e, 2005, Kenkyusha) and Dictionnaire Japonais-Français/Français-Japonais (2009, Assimil/K Dictionaries).
Inge Zwitserlood obtained MA and PhD degrees in linguistics from Utrecht University, specializing in sign language linguistics early on, in particular the morphology and morphosyntax of sign languages. She is affiliated with the Centre for Language Studies at Radboud University Nijmegen, where she has, among other things, been involved in the construction of a sign language corpus. Currently, she is working on Sign Language of the Netherlands (NGT) and Turkish Sign Language. Her recent publications include: ‘Classifiers’, in R. Pfau, M. Steinbach and B. Woll (eds) Sign Language: An International Handbook (Mouton de Gruyter, 2012, pp. 158–86); (with P. M. Perniss and A. Ozyurek) ‘An empirical investigation of expression of multiple entities in Turkish Sign Language (TİD): considering the effects of modality’ (Lingua 122, 2012, 1636–67); (with O. Crasborn and J. Ros) ‘The Corpus NGT: an online corpus for professionals and laymen’ (2008; available at www.ru.nl/corpusngt).
1 Introduction
Howard Jackson
Chapter Overview
What is Lexicography?
Aims and Organization of the Companion
This introductory chapter has two purposes: to characterize the field of lexicography; to outline the aims and scope of the book.
1 What is Lexicography?
The term ‘lexicography’ is used in two distinct senses: first, it refers to the compilation of dictionaries; and second, it refers to the study of dictionaries. It is the first sense that is usually represented in dictionary definitions, such as that in Collins English Dictionary (online): ‘the process or profession of writing or compiling dictionaries’. The terms ‘practical lexicography’ or ‘lexicography practice’ are used for the first sense, and ‘lexicography theory’ or ‘dictionary research’ for the second. We will return to the question of theory a little later. For the second sense, the term ‘metalexicography’ has also been coined, and those who engage in the study or research of dictionaries are called metalexicographers. It is to this audience, and especially to those who are becoming metalexicographers, that this volume is primarily addressed.
Dictionary research covers a wide range of activities, and metalexicographers may become experts in one or more aspects, many of which are represented in the contributions to this volume. Some concentrate on the history of dictionary making, others on historical dictionaries. Some investigate the typology of dictionaries, distinguishing monolingual from bilingual, historical from synchronic, general from specialized, alphabetical (semasiological) from thematic or topical (onomasiological). Others investigate the compilation process, including the use of corpora, or the design and structure of dictionaries. Still others engage in dictionary criticism, evaluating the structure and content of dictionaries, both in general terms (the macrostructure) and in terms of the information contained in individual entries (the microstructure). Some concentrate their interest on pedagogical lexicography, the provision of dictionaries for language learners, and how these can contribute to the learning process, or on dictionaries for sign languages, as pedagogical aids to deaf communities. Others research the users and uses of dictionaries more generally, seeking to discover how different groups use dictionaries, and whether the design, structure and content of dictionaries match users’ expectations and reference skills. And still others have turned their attention to e-lexicography, which is almost certainly the future of lexicography – at least one publisher has already ceased print publication and is offering its dictionary only on the internet (Macmillan Dictionary Blog, 5 November 2012).
Where, we may ask, does lexicography sit in the panoply of academic and scholarly disciplines? Is it a branch of linguistics? Or does it belong somewhere else? Or is it an independent discipline? There is no single answer to these questions from those engaged in metalexicography, as you will observe from the contributions to this volume. In 1996, Reinhard Hartmann wrote a chapter (in Hartmann (ed.) 1996) entitled ‘Lexicography as an Applied Linguistic Discipline’. Hartmann’s criteria for an applied linguistic discipline, as expressed in Hartmann (2001: 33), were that it should be ‘linguistic in orientation, interdisciplinary in outlook and problem-solving in spirit’, which he claims (ibid.) applies to pedagogical lexicography and perhaps to computational lexicography, though not to other aspects of (meta)lexicography.
Indeed, the introduction to the Dictionary of Lexicography (Hartmann and James 1998: vi) proclaims: ‘Lexicography, often misconceived as a branch of linguistics, is sui generis, a field whose endeavours are informed by the theories and practices of information science, literature, publishing, philosophy, and historical, comparative and applied linguistics.’ McArthur (1998: 219) places lexicography within a newly minted discipline of ‘reference science’, along with ‘encyclopedics’ and ‘tabulations’, ‘directories’ and ‘catalogues’ (see also McArthur 1986). Addressing the question ‘Who is a lexicographer?’, Bergenholtz and Gouws (2012) are adamant that lexicography is not a subdiscipline of linguistics; it is ‘an independent discipline’, part of information science; there is a range of people who can be called lexicographers, and they do not necessarily have a background in linguistics. Indeed, lexicographers include both ‘those people writing dictionaries but equally those people writing about dictionaries’ (Bergenholtz and Gouws 2012: 76).
The practice of lexicography was for centuries independent of linguistics. It was Philip Gove, the editor-in-chief of Webster’s Third New International Dictionary (1961), who was the first practising lexicographer to acknowledge explicitly the influence of modern linguistics on the compilation of the dictionary of which he was editor. Since then, linguistics has informed lexicographic practice, especially in respect of the genre of learners’ dictionaries, and to an extent that of native-speaker dictionaries as well. Lexicographic practice has, though, always drawn on a range of other disciplines and crafts, relevant to reference works generally. For this reason, many lexicographers consider it to be either a discipline in its own right (sui generis) or a branch of reference science or information science. One of the debates within lexicography addresses the issue of whether there is such a thing as lexicographic theory. Some would respond to this question with a resounding ‘No’ (e.g. Béjoint 2010). Others give a cautious ‘No’, for example Rundell (2012: 83), ‘it is not clear that there is a role for “lexicographic theory” as such’. Hartmann (1996) makes a distinction between ‘practical’ lexicography (dictionary compilation) and ‘theoretical’ lexicography (dictionary study). Not all see lexicographic theory in this way. You will find that some contributors to this volume, notably Piotrowski (in 4.10), respond to the question, understanding it in a more scientific sense, with an unequivocal ‘Yes’ and present arguments why this should be so. We are engaged in a fascinating discipline. Long may the discussions continue!
2 Aims and Organization of the Companion
This Companion to Lexicography is aimed primarily at students of lexicography who are proposing to undertake research in one of the areas covered by ‘lexicography’. While it cannot possibly be comprehensive in its coverage – think of the three-, soon to be four-volume encyclopedia of lexicography (Hausmann et al. 1989–91, Gouws et al. 2013) – the Companion aims to give a broad overview of the discipline, dealing with the main trends and issues in the contemporary study of lexicography; and the contributions have been selected with this purpose in mind. They are all written by experts in their field who are at the cutting edge of lexicographic research.
The Companion contains some 20 contributions in 8 chapters, including this introduction. Chapter 2 is free-standing, as a review of research in lexicography over the last six or so decades. Chapters 3 to 5 each contain a number of contributions and comprise the main body of the volume. Chapter 3 has three contributions under the title ‘Research Methods and Problems’; Chapter 4 contains ten contributions on current research and issues in lexicography; and Chapter 5 looks forward, with three contributions on directions in which the study of lexicography appears to be travelling. The final three chapters contain reference material considered to be useful to a lexicography researcher: Chapter 6 indicates the resources that are available for a researcher to tap into; Chapter 7 contains a glossary of key terms in lexicography; and Chapter 8 comprises an annotated bibliography of recent significant work in lexicography research, as well as pointers to where further bibliographical information may be found.
Rather than have a single list of references at the end of the book, it has been decided to retain the reference list supplied by each author at the end of their contribution. While the disadvantage may be that some general works will be referenced more than once, this will not be the case with the majority of the references, as each chapter concentrates on one specific area of lexicography research and references the works pertinent to that area. The advantage is that the reader will more readily be able to ascertain the references that pertain to each individual contribution and area of research.
To orientate the reader to the scope of the Companion, a summary of each contribution will now be given.
Chapter 2
In Chapter 2, the late Paul Bogaards, editor of the International Journal of Lexicography, reviews the development of research in lexicography, which he dates from the mid-twentieth century in France and then the United States. It was then given a significant boost in the 1980s with the formation of EURALEX. Bogaards shows how the development can be traced through the publication of journals, first the Cahiers de lexicologie, then Dictionaries, followed by IJL and Lexikos, as well as the publication of the three-volume encyclopedia (Hausmann et al. 1989–91). The 1990s saw lexicography established as an academic discipline in the universities of a number of countries. After this review of the development of lexicographic research, Bogaards gives an overview of the dominant research trends in the study of the main areas of lexicography: dictionary history, with its investigation of tradition and innovation; dictionary criticism, encompassing reviews, analyses and most recently forensic dictionary analysis; dictionary typology, which operates with a variety of oppositions, such as monolingual versus bilingual, general versus specific, foreign learner versus native speaker; dictionary structure, usually proceeding from the distinction between macrostructure and microstructure, though, as Bogaards points out, the distinction is by no means always clear-cut; dictionary use, beginning with questionnaires and proceeding to more experimental techniques, and investigating dictionary use for reading tasks, writing tasks and vocabulary learning; and finally dictionary content, in which Bogaards underlines developments in corpus linguistics that have influenced dictionary content, from the KWIC
concordance to Sketch Engine. Bogaards concludes that lexicographic research constitutes a ‘patchwork’, applied to a number of separate domains, each with their own approaches and methodologies.
Chapter 3
In 3.1, Lars Trap-Jensen discusses research into lexicographic practice, beginning with the observation that the electronic revolution has had a profound impact on dictionary making. He follows this through with a look at the various phases of dictionary making. Probably the one least affected by developments in computing is the initial, and most important, planning phase, though he notes that decisions about the database that will underlie the possible range of dictionary types will be crucial; and he observes that the production of data is now clearly separated from its presentation to the user. This leads on to the next phase, that of designing the database (or group of interconnected databases), choosing the format, deciding what should be included in the light of the anticipated user groups, even perhaps including data that will not be used in any dictionary product. Dealing with the phase of describing linguistic data, Trap-Jensen notes that description has replaced a previous prescriptive or normative approach, though some norms may still apply, for example in spelling. In terms of lemma selection, Trap-Jensen reflects on the fact that frequency in a corpus is not a reliable guide to what is useful to a dictionary user, and decisions have to be made about technical terms, dialect, slang, jargon and loanwords. Indeed there may be local language policies that determine how loanwords, for example, are treated. He concludes this section by observing that an elegant definition is still a man-made object. One influence of IT for good is the advent of digital writing systems, which can check that lexicographers have been consistent, both in data entry and in cross-referencing.
The digital revolution also presents a number of challenges: with the availability of different platforms (print, computers, smartphones), how data is presented and accessed requires adaptation; and lexicographers are presented with the challenges of ‘crowdsourcing’ and collaborative lexicography. Looking to the future, Trap-Jensen wonders how current digital reality will change dictionaries; and he anticipates that they are likely to become more ‘embedded’, as currently in e-readers. In 3.2, Kaoru Akasu addresses the issue of dictionary criticism and the methods employed to undertake it. Dictionary criticism, as well as providing a critical evaluation for potential users of a dictionary, has as its primary aim the continual improvement of dictionaries. Akasu notes that there are neither agreed criteria nor a systematic framework for evaluating dictionaries. He reviews the few (some seven) attempts to provide such criteria and observes that each of
these is devoted to a particular type of dictionary: college dictionary, bilingual dictionary, learner’s dictionary. He concludes that there is no ‘common yardstick’ that could be applied across the board to dictionary criticism and reviewing; criteria will vary according to dictionary type and the purpose of the review (e.g. journalistic vs academic). Akasu then goes on to outline a method that he has used, which he calls ‘dictionary analysis’. It is particularly associated with the Iwasaki Linguistic Circle in Tokyo, of which he is a member. More than 40 such dictionary analyses have been published since 1968, predominantly in the ILC’s publication Lexicon. The method involves using a team of reviewers, and Akasu believes that they should all have had experience of practical dictionary compiling. Various aspects of the dictionary, determined by the team leader, are allocated for investigation, which may involve random sampling of entries, comparisons with other dictionaries and a close attention to detail; both quantitative and qualitative analyses are performed. Akasu does not claim that dictionary analysis is the only right way to do dictionary criticism, and he believes that it needs to develop and improve. More recently, user studies have begun to be incorporated, and comparison of digitized dictionaries undertaken, in order to develop the method. In 3.3 Hilary Nesi tackles the subject of research into dictionary use. She notes an upsurge of studies from the 1980s onwards, with much of the most recent research on electronic dictionaries. The aim of such research is to increase the success of dictionary consultation, identify users’ needs and skill deficits and match types of dictionary to user and use. 
Nesi reviews methods that have been used in user studies and notes their advantages and disadvantages: questionnaires (too dependent on users’ recall), interviews and observations (small numbers of subjects and not particularly natural), lab-based methods (small numbers and artificial setting), natural dictionary consultation via portfolios and self-reports (subjects need training in think-aloud techniques), log files (cannot record the success or otherwise of the lookup). She notes that there is little that can be said by way of generalization; most studies focus on particular types of user in a particular context. Indeed, because of their availability to researchers based in universities, most users studied have been university students. Dictionary use is divided into ‘receptive’ and ‘productive’ and has been largely associated with the written medium, until the advent of handheld electronic dictionaries. Research into the types of dictionary preferred has been hampered by users’ ignorance of dictionary types and their suitability for different activities; Nesi notes the increasing use of internet dictionaries, but observes that little research has been undertaken into the use of this type as yet. Indeed, such dictionaries are difficult to investigate, because their content changes and is poorly described. Nesi wonders whether dictionary use research has had much effect on improving dictionary design. It has probably had even less effect on e-dictionaries, though she notes, encouragingly, the many university-based
experiments in specialized e-dictionaries undertaken by informed academics being reported at recent conferences.
Chapter 4
In 4.1, Adam Kilgarriff, drawing on his vast experience in the field, discusses how corpora can be used as data sources in the compiling of dictionaries. He argues that a corpus can support many aspects of dictionary creation, from developing headword lists to identifying salient features of lexical units to providing examples. He then proceeds to examine in detail, using his own experience, how corpora can be used for some of these purposes. Kilgarriff begins with headword selection, pointing out that a lexicographer cannot simply use the most frequent so-many words in a corpus, because every corpus shows ‘noise’ and bias. Additional issues that require creative strategies include: identifying multiword units, lemmatization and identifying neologisms. The second area Kilgarriff examines is collocation and word sketches, using the Sketch Engine software tool, which he was involved in developing. He shows how one-page word sketches, first used for the Macmillan dictionary, are a distinct advance over multiple pages of concordance lines, and they are also able to provide a guide to sense differentiation; the lexicographer’s methodology has consequently changed. Sketch Engine has been further developed to produce thesauruses, as well as synonym and antonym comparisons. Kilgarriff goes on to show how corpora can be used to suggest labels that indicate a word’s restrictive distribution, including grammatical labels (e.g. ‘usually passive’), register labels (‘formal’ and ‘informal’), domain labels, and regional labels. Finally, Kilgarriff demonstrates how the selection of appropriate examples from a corpus can be automated, and how translation equivalents can be suggested from parallel corpora. He concludes with the claim that ‘corpora can make dictionary-making more accurate, efficient, complete and consistent’. In 4.2, Verónica Pastor and Amparo Alcina discuss their research on the search techniques used in electronic dictionaries.
Reviewing previous user studies, they establish that there are two problems with dictionary use: dictionaries are not user-friendly, and users don’t know how to consult a dictionary. They also note that dictionaries don’t always exploit the full potential of the electronic medium, especially those that have simply been transferred from the paper format. They further note from their review that no studies have been undertaken to propose a universal classification of search techniques: this chapter aims to remedy that lack, in order to provide a guide to search types and query options. In various reflections in the literature on electronic dictionaries and their search capabilities, Pastor and Alcina highlight the many suggestions made for enhancing and developing search techniques to satisfy particular user
needs. In their research, Pastor and Alcina examined 32 electronic dictionaries and analysed their search options. They conclude that every search has three elements: the query, the resource and the result. All search types can be classified under these three headings. The ‘query’ is the expression introduced by the user, which may be, for example, an exact word, an approximate expression or a combination. The query may also have filters applied, for example restricting the search to a particular part of speech. Pastor and Alcina show how queries are operationalized in electronic dictionaries they examined. The ‘resource’ is the field or section of a dictionary to be searched, which could be the headword list, the content or a thematic field index. Again, this is exemplified from a number of electronic dictionaries. The ‘result’ is the information obtained from the query; Pastor and Alcina illustrate some of the more innovative results they found, including giving contextual information and displaying relations between words graphically. They conclude by observing that dictionaries use different nomenclature to refer to their search types, with the same term sometimes being used for different features; there is a need for some standardization of search type nomenclature. In 4.3, John Considine considers research on historical lexicography and etymology. He begins with observing two unique features: they are not primarily concerned with the present; and definitions or translation equivalents are not of primary importance. He identifies two senses of historical lexicography: a ‘weak’ sense, equivalent to lexicography with an historical dimension; and a ‘strong’ sense, which refers to dictionaries edited on historical principles and founded on attestations of the actual usage of each (sense of a) word, and presented in chronological order with quotation paragraphs. For this type, the analysis of quotations is the fundamental task of the lexicographer. 
Challenges are varied: for dead languages (Classical Latin, Old English) it may be scanty evidence; but where extensive evidence exists (e.g. Early Modern English, post-classical Latin) it is the selection of appropriate quotations from the mass of data. The most audacious undertakings are those dictionaries, like the NED/OED, which aim to document a language from a given start point up to the present. Problems for the historical lexicographer identified by Considine include: selection – some sources are less available than others (such as informal or non-elite texts); scientific and technical vocabulary, where reference to encyclopedic sources has to be made rather than induction from quotations; pronunciation, which is the most remote from historical principles. Considine sees etymological lexicography as having three conceptual axes: the number of etymologies presented in a given publication; the presentation of the results more or less dogmatically; the search for origins as against the inquiry into development. The English tradition tends to the dogmatic, whereas other languages (e.g. Catalan, French, Italian) have etymological dictionaries that are more analytic, in that they consider and review alternative etymological analyses. Etymology in the OED is of
the inquiry into word histories variety, intertwined with historical lexicography. Considine concludes by doubting that there is much prospect of development of etymological lexicography; but he sees the prospects for the development of historical lexicography as being dramatic and including: the development of historical corpora; the plugging of gaps in regional historical lexicography (e.g. of West African English); the development of author lexicography; and the integration of the online texts of the major historical dictionaries. Many projects have ongoing funding problems, but the genre is alive with possibilities. In 4.4, Amy Chi, in discussing research on pedagogical lexicography, concentrates on the monolingual learner’s dictionary (MLD) tradition in English, claiming that it has represented a major thrust for the development of pedagogical lexicography. After a brief survey of the development of English MLDs, Chi proposes to examine research in four aspects of pedagogical lexicography: design, compilation, use and evaluation. Under the topic of design, Chi looks first at alphabetical ordering and points out how the tendency to reduce nesting and to make some previously nested items (such as compounds) into headwords does not always serve the user well, especially in the tasks of encoding and vocabulary learning. Moreover, some groups of learners may come with expectations from their native-speaker dictionary tradition which cause confusion when they consult a MLD. It is suggested that the future, in the electronic medium, may be towards customization, to suit the lookup habits of the individual user. Chi also broaches the topic of the use of menus and guide-words for highly polysemous items, and notes that these are not without their critics. 
Under the topic of compilation, Chi looks at the issue of the wordlist in a MLD, commenting on an earlier tendency to concentrate on high-frequency words, but noting a shift since 1995 towards including more lower-frequency words in order to satisfy users’ decoding needs. She also addresses the benefits and disbenefits of using a restricted defining vocabulary, as well as the advantages and disadvantages of the full-sentence definition style. Under the topic of use, Chi notes an increase in user studies since the 1980s, but with no conclusive results as yet; there have been some improvements in methodology, moving from questionnaires to other kinds of instrument, but the sample sizes remain generally quite small, so that only limited generalizations can be drawn. Many subjects also seem to display limited reference skill and low language proficiency. Under the topic of evaluation, Chi looks at who is best able to undertake evaluations of MLDs, and concludes that it is not the student users, though there might be some possibility of EFL/ESL teachers performing this role; however, it is usually informed academics/metalexicographers who undertake it. However, the purpose of evaluations is not always clear, nor is there a set of standard criteria. Most useful are comparative evaluations of different dictionaries. As a conclusion, Chi suggests that more attention should be paid to researching language teachers and their influence, through their teaching methods and curriculum,
on students and their use of MLDs. She attributes poor reference skills in part to communicative language teaching, and proposes that there is a need for dictionary use skills to be taught from an early age. New research is needed on how teaching approaches, curricula and teaching methodologies influence users’ abilities to use dictionaries, as well as on how best to teach reference skills. In 4.5, Shigeru Yamada reflects on the development of monolingual learners’ dictionaries, the innovations that they have given rise to and what potential exists in the electronic medium. He divides the evolution of MLDs into five periods: prelude and monopoly (1942–73), when the OALD was developed in Japan by A. S. Hornby and colleagues; rivalry and sophistication (1974–86), when competition from LDOCE developed; competition and versatility (1987–94), with the advent of the corpus-based COBUILD dictionary and its full-sentence definitions, and with substantial revisions to OALD and LDOCE; check and convergence (1995 onwards), with the introduction of CIDE/CALD and MEDAL, when ‘corpus-based’ and ‘user-friendly’ were the watchwords, and in Yamada’s opinion leading to a convergence and loss of individual identity that existed before; going electronic (1990s on), in which he concentrates on evaluating the advent of handheld electronic dictionaries, which he notes as an East Asian phenomenon. He considers that the electronic medium has yet to produce a paradigm shift in the design of MLDs. 
Yamada discusses six innovative features of MLDs: frequency information, used at first covertly in compilation and then overtly in frequency banding of words; grammar, with a development from difficult-to-learn codes to spelling out, though with a focus on what is typical replacing one on what is possible; signposts and guide-words to help users navigate long entries, with research showing that they aid rapid access, but often with a lack of system or consistency; defining vocabulary, initiated by LDOCE and adopted by others, much appreciated by users but not without its problems, including sometimes lengthy and unnatural definitions; defining style, with the spread of full-sentence definitions innovated by COBUILD, which incorporated contextual information, but again not without problems such as overspecification and increased complexity; examples, increasingly drawn from corpora – a matter of principle for COBUILD – but with the result that they are not always ‘didactically oriented’. Yamada sees the electronic medium as presenting many possibilities, especially since space is not a constraint; but dictionary information needs to be ordered hierarchically and able to be rearranged according to lookup need, and presentation needs to respond flexibly to search strategies. If large-scale dictionaries with options and customization are developed for the electronic medium, then, Yamada expects, variety will return to the MLD. In 4.6, Arleta Adamska-Sałaciak discusses issues relating to the compilation of bilingual dictionaries. For some issues the decisions are not that different from those involved in compiling a monolingual dictionary. The target audience
has to be considered and what the dictionary will be used for. For speakers of a lesser used language, neither language of the bilingual dictionary may be familiar to them; it is not clear how such users can be accommodated. Bilingual dictionaries may be either monoscopal (Lx–Ly with a Ly–Lx index) or the more usual biscopal (Lx–Ly and Ly–Lx). Biscopal dictionaries may be either monodirectional (aimed at one language group only) or the more usual bidirectional, with the danger that some information may be superfluous for one or other group. There has been an increase recently in the number of monodirectional dictionaries, particularly for learners. Emphasis in bilingual dictionaries has been on satisfying receptive needs, though more attention is now being paid to productive needs. Addressing compilation, Adamska-Sałaciak reflects on the requirements of the lexicographers, as well as on the sources of data for the dictionary. Ideally, two representative and balanced corpora should be used, and if parallel corpora or a pair of comparable corpora are available, that is a bonus. More often the case is that a corpus exists for the better-described of the two languages, and the other has to rely on the best resources available, perhaps a bilingualization of a monolingual dictionary. The wordlists should be based on corpus frequency, but not only on this, as usefulness to a target audience must also be considered, including the inclusion of older words for reading literature. For individual entries, Adamska-Sałaciak considers that finding the right equivalent is the essential and often difficult task of the bilingual lexicographer, not helped by the typical characteristics of semantic vagueness, polysemy and lack of one-to-one correspondence: the constant struggle with anisomorphism between lexical systems. Where more than one equivalent exists, they need to be discriminated, and bilingual dictionaries are getting better at this.
Similarly, the use of examples is increasing; whether they are corpus-derived or invented depends on the target users and the word being illustrated. Ideally, examples should address the typical production errors of the group of learners. And it is an issue whether examples are used to illustrate the headword (in the L2-L1 section) or the equivalent (in the L1-L2 section); and whether to supply a translation. Adamska-Sałaciak concludes by saying that the future is hard to predict, but that the need for interlingual lexical correspondence will not disappear. In 4.7, Danie J. Prinsloo discusses issues that arise in compiling dictionaries for African languages, with a focus on Bantu languages. African language lexicography is rooted in the work of missionaries, whose approach was Eurocentric, serving the scholarly community rather than African people; it was done by individuals and the quality was not high. Now there is an internal drive for mother-tongue speakers to take responsibility. Some examples of successful dictionaries are given, but they are pockets of excellence, rather than a trend or revolution. African lexicography has been overtaken by the electronic era. In many communities there is a lack of dictionary culture; the pressure is to produce general-purpose dictionaries rather than those tailored to the needs
of specific user groups; there is a problem of affordability, so they tend to be limited in scope. Prinsloo identifies as one of the most significant problems for African language lexicography that of lemmatization. Nouns and verbs have a complex morphology, with noun classes and other grammatical categories marked by multiple affixation. There are two traditions: the word tradition, and the stem tradition, with the latter often seen as superior, but not always suitable. Identifying stems can be a major problem, and users do not have sufficient language knowledge to know where to look. Prinsloo identifies four approaches to lemmatization: traditional – based on intuition; paradigm – include all derivations as lemmas or sub-lemmas; rule-orientated – with sets of derivational rules to guide the user; frequency – imposing a cut-off point for which derivations to include. This also has implications for alphabetical ordering, where some letters will have a large number of entries if full forms are lemmas; if noun plurals are entered as lemmas, these will take up space with cross-references to their singular forms; additionally, it is important to mark tone, with some languages having up to nine tones. Prinsloo argues, as he has done elsewhere, for ‘simultaneous feedback’, that is while the compilation process is ongoing, rather than waiting until the dictionary is published. He concludes by pleading for sophisticated electronic dictionaries for African languages, so that problems like that of lemmatization can be addressed and solved. Currently there are few good electronic dictionaries; Prinsloo exemplifies some of them. African language lexicography needs to break away from the tradition of compiling basic dictionaries in order to meet the challenges of the information age. In 4.8, Inge Zwitserlood, Jette Hedegaard Kristoffersen and Thomas Troelsgård consider issues relating to sign language lexicography. 
More than 130 sign languages have been identified, and, because they are in the visual modality and they have no standardized written form, they represent a number of challenges to the sign language lexicographer; although the use of the electronic medium signals a significant step forward. Sign language dictionaries are needed by educators and others needing to learn a sign language for communicating with deaf people. The earliest ones were little more than glossaries (from word to sign), though, especially in the electronic medium, a lot more information is now included. Zwitserlood et al. outline the various ways in which signs may be represented: drawings or photographs, with dynamic aspects shown by arrows or other symbols; in electronic form, with the use of animated cartoons or video clips, but the signal is brief and may need to be reviewed several times; by means of a notation system, with a set of symbols and rules for their combination. Such systems facilitate ordering and searching, but they are not widely used and take effort to learn. Next the authors discuss lemma selection and observe that until the development of sign language corpora, it was rather unprincipled; ongoing issues include how to treat sign variants, whether to regard as lemmas, among others, classifier predicates
and nouns incorporating numerals. Since there is no standard written form, lemma ordering is variable: alphabetically by gloss, or by features of the main phonological parameters (handshapes, etc). Similarly, performing searches in electronic sign language dictionaries varies – by sign form or sign usage – as does the presentation of search results; both are illustrated from a number of sign language dictionaries. In terms of future developments, Zwitserlood et al. anticipate that obtaining data will become easier as increasing numbers of video clips are posted on the internet; however, there is a need to address morphological and morpho-syntactic issues in dictionaries and to include more information on inflection and word formation. Also, a greater variety of sign language dictionaries is needed, to serve different groups of users, such as sign language users themselves, as well as learners. Bilingual dictionaries between two sign languages would also be a welcome development. In 4.9, Robert Lew addresses the question of the identification, ordering and definition of senses. He begins by making the point that lexicographic sense division does not readily mirror linguistic reality; senses are a way of structuring dictionary entries. He contrasts the ‘atomic’ view, which considers senses to be discrete entities, with the ‘meaning potential’ view, which accommodates vagueness. He asks whether lexicographers can actually identify senses. The vast citation evidence that can now be collected from corpora does not help the task; lexicographers are looking for clustering of uses for sense identification, so senses in dictionaries are artefacts. And he notes the two traditions of ‘lumping’, to minimize the number of ‘senses’, and ‘splitting’, with a large number of finely distinguished senses. 
Senses are only objective with respect to the entry structure of a particular dictionary, which segments meaning in a way that is intended to be useful for a particular target user group. In bilingual dictionaries, on the other hand, the lexicographer may be guided by interlingual equivalence relations, so that some sense distinctions in the source language may be rendered redundant, or extra senses may need to be created in the light of distinctions made in the target language. Turning to the ordering of senses, Lew argues that common sense should prevail over strict adherence to any principle. The major ordering principles used include: chronological/historical – a requirement for historical dictionaries, but also used, inappropriately, in general-purpose dictionaries; markedness – where restricted (marked) senses follow unmarked (more general) senses, though this criterion is not sufficient of itself; frequency – with the development of computer corpora, which is relatively objective but does not always serve the user well; logic – a cover-all term for intuitive attempts at presenting an entry as a coherent text, often with a hierarchy of senses and subsenses. In bilingual dictionaries there is an argument for having as first sense the one with the most common equivalent in the other language, and ordering by this principle; ideally, this would be based on parallel corpora. After a brief excursus into menus and guidewords, as aids for
the user to identify the relevant sense in a long entry, Lew turns his attention to definition, which, he points out, is relevant only for semasiological monolingual dictionaries. The classical definition is the genus + differentia specifica, but, from COBUILD on, there has been a revival of the full-sentence definition, in which the defined item is embedded and which can include typical word combinations; it is not without its problems, not least excessive wordiness. Other definition styles include: single-clause, especially for abstract nouns that lack a genus; and synonyms, which are like the translation equivalents in bilingual dictionaries. A general principle is that the definition should use simpler words than the one being defined, which led, in MLDs, to the development of restricted defining vocabularies; though these are not without their problems. Lew concludes by considering whether an entry should indicate relatedness between senses, or whether each sense should be autonomous in its definition. In 4.10, Tadeusz Piotrowski provides an answer to the question of whether we can talk of a theory in lexicography. He notes that there have been strong statements against theories in lexicography, since lexicography is a craft and produces artefacts. Piotrowski points out that this is similar to translation, and there is no shortage of translation theories. Similarly, lexicographers are constantly making decisions, based on assumptions (theories), even if these are implicit. Piotrowski argues that the problem is with the meaning of ‘theory’, which is rarely defined by those who talk about it. In lexicography there is no theory in the natural science sense, that is as a set of hypotheses; but in the sense of ‘theory’ versus ‘practice’, as a description of the principles on which a lexicographer works, there are a number of lexicographical theories. These range from style guides at the lowest level to general theories of lexicography at the most abstract.
The problem with most general theories, according to Piotrowski, is that they do not go deep enough and are not critical enough: the assumptions on which dictionaries are compiled need to be made explicit. While lexicography as practice is not ‘science’ in the general sense, metalexicography is. What lexicographers do is not that different from what descriptive linguists do: they study data, they make generalizations from the data, and they record these in dictionary entries, which are classifications of the facts of the language. The difference is that lexicographers have their eye on the usefulness of their descriptions for a target user audience. In effect, a finished entry is a ‘hypothesis’. However, there is no ‘explanation’ in lexicographical theories of why certain methods were used rather than others. Piotrowski proposes that a theory of lexicography has three components: the abstract information structure of dictionaries (syntactic); the content that is inserted into this structure (semantics); the target user group (pragmatics). The content is different from linguistics: the data is idealized with the user in mind, and the lexicographer deals with huge amounts of data by contrast with the linguist. Piotrowski concludes by asking whether lexicographical theories have validity. He notes that they are not predictive,
which is not possible in the humanities; instead, generalizations are based on past events, with the assumption that the near future will be similar. However, we live in revolutionary times for lexicography, and past practice is not a guide for the future. So, lexicographical theories do not have validity.
Chapter 5
In 5.1, Pedro Fuertes-Olivera makes some suggestions about how new technology can be used in the design and compilation of electronic dictionaries on the internet. He observes that ‘e-lexicography’ has been interpreted in various ways by different scholars, even to include the uploading of printed dictionaries to the internet. He takes a broader perspective: e-lexicography relates to electronic reference tools, to include the relationship between technology and humans, lexicographic developments and costs. His approach is to consider the human factor, including the analysis of users’ needs, within the function theory of lexicography (see also 5.3). Fuertes-Olivera reviews three types of online dictionary: printed, replicated and functional. Printed online dictionaries are essentially print dictionaries with faster access by means of a search engine, many of which are restricted in their functions; such dictionaries do not challenge established and embedded practices in lexicography. Replicated online dictionaries imitate other electronic products without questioning whether established practices are adequate, for example basing the dictionary on a particular linguistic theory or relying on data mined from a corpus, neither of which may be appropriate for specialized online dictionaries, for instance. Functional online dictionaries, on the other hand, consider the relationship between data, access routes and users’ needs. Lexicography becomes part of information science, and the internet is exploited to enable personalization of information to the individual user.
Fuertes-Olivera outlines the technologies that allow this to happen: searching, navigating, user profiling or modelling – to create different articles for different categories of user; filtering – to allow users to select the amount and type of data retrieved; adaptive hypermedia – to tailor what the user sees to their needs; linked open knowledge – cross-referencing to external sources; recommender systems – predicting users’ preferences; annotation systems – allowing the user to make comments on the dictionary. This is illustrated from the Accounting Dictionaries, a project that Fuertes-Olivera has been involved with. These dictionaries make use of a database which can generate several dictionaries, and they have searches that retrieve only the information that the user needs for a particular lookup. Online dictionaries are always a work in progress, subject to revision in the light of new data and new knowledge. Fuertes-Olivera thinks that future online dictionaries will allow users to re-use the hits that they have achieved.
In 5.2, Charlotte Brewer reflects on the future of historical dictionaries, with a focus on the Oxford English Dictionary (OED). She claims that historical dictionaries are a special case in lexicography; they are historical in the sense that they tell the story of the language (and culture and society); and they are historical in the sense of being historical documents themselves, reflecting the scholarly methods and cultural attitudes of the times of their compilation. Digitization has opened up possibilities for both types of historical study. OED’s future is electronic, and increasingly sophisticated search and display tools have been developed for the OED online, which reaches a far wider audience. Digitization has also meant that the written sources of evidence for the OED can also be searched electronically, so expanding the data available to lexicographers to a seemingly bewildering extent. After these general preliminary remarks, Brewer traces the development of the OED from its mid-nineteenth-century beginnings to the first edition, then the supplements of the 1970s and 1980s, and then the decision to digitize and merge the first edition with the supplements to produce the 1989 second edition, and so to the realization of the unsatisfactory nature of this edition and the decision to undertake a thoroughgoing revision, still ongoing, for a third, online edition. Since the relaunch of the online OED in 2010, Brewer notes that the searchable second edition has been withdrawn, so that new and revised entries are now seamlessly merged with unrevised entries, creating a confusing mixture. The online dictionary enables this kind of revision, but for scholarly purposes it can be frustrating, since an entry might change from one quarter to the next; and the ordinary user cannot discern what they are retrieving, whether a revised or an unrevised (and so out-of-date) entry. 
This will be resolved only when the revision is complete, and then Brewer hopes the searchable second edition will be restored; otherwise the investigation of the cultural history aspects of the dictionary will be lost. Another advantage of the online version is the provision of links to other related material, notably the Historical Thesaurus, which is almost unusable in its print form, but excellent with electronic searching, linked to the OED. Links also exist to other works, such as Oxford Reference Online and the Dictionary of National Biography, as well as to dictionaries of Old and Middle English. Brewer suggests that further links, for example to regional and dialect dictionaries, should be included. The future of historical dictionaries is definitely digital, and working out how to unlock and present scholarly information to a wider audience while retaining academic standards is an exciting challenge for today’s historical lexicography.
In 5.3, Sandro Nielsen looks forward to the future for lexicography. He observes that the lexicographic landscape is changing, with developments both internally and externally. He considers it imprudent to think that dictionaries will disappear in the near future, but what people regard as dictionaries may change. Trends that have been noticed recently include: the shift of lexicography
from linguistics to information science; change in the form and size of printed and electronic dictionaries; and a recognition that general lexicography can benefit from advances in specialized lexicography. Nielsen proposes a redefinition of dictionary. Dictionaries are complex reference tools made up of a number of surface features and three underlying features: lexicographic functions; data to support those functions; and lexicographic structure to link the data. He takes a functional approach, as a response to the needs for information and knowledge in the information society of tomorrow. Nielsen argues that lexicography of the future should adopt a transformative approach: to develop guidelines for designing and making dictionaries that are adapted to specific types of user and use situations. Nielsen makes a distinction between offline (including printed) and online dictionaries; and he notes a trend evident in Scandinavia for publishers of dictionaries of ‘small languages’ to discontinue print dictionaries and offer only electronic versions. He foresees the digitization of communication generally leading to printed dictionaries giving way to online information tools. Since internet search engines provide too much unstructured information, lexicographers can future-proof their dictionaries by providing structured results for targeted searches. Nielsen proposes that dictionaries should have three components: a structured database; access to several dictionaries via an interface; and a search engine to mediate between database and dictionaries. One database, which could be monolingual or bilingual, can serve multiple dictionaries and give the user just the information they need for a particular task. He then illustrates how this is achieved in the online English/Danish accounting dictionaries. 
The electronic medium allows for a dynamic approach to the access and presentation of lexicographic information; for example, several styles of definition can be provided for different types of user (expert vs non-expert, native speaker vs learner), or in two or more languages. Nielsen also foresees the development of voice-enabled access with audio-visual presentation of results, or dictionaries with operative functions, instructing how to perform some task or procedure; and digital media can be personalized. Lexicographers need to consider the information costs, to maximize ease of access and interpretation and clarity of presentation. Concluding, Nielsen predicts that dictionaries of the future will be regarded as digital assistants; and lexicographers need a platform that allows them to respond to the information and knowledge needs of actual and potential users, with a focus on needs-adapted data presentation.
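The idea of one structured database serving several dictionaries can be made concrete with a minimal sketch. It is not the design of the accounting dictionaries themselves: the record fields, the three dictionary names and the function `lookup` are hypothetical, and the sample entry is invented for illustration. The sketch only shows how a thin search layer might select, from a single record, the data needed for a given task and type of user.

```python
# Hypothetical single database: one record per lemma, holding all data types.
DATABASE = {
    "asset": {
        "definition_expert": "a resource controlled by an entity from which "
                             "future economic benefits are expected",
        "definition_lay": "something valuable that a company owns",
        "translation_da": "aktiv",  # illustrative Danish equivalent
        "collocations": ["current asset", "intangible asset"],
    },
}

# Hypothetical dictionaries, each defined only by the data fields it shows.
DICTIONARIES = {
    "reception": ["definition"],
    "translation": ["translation_da"],
    "production": ["definition", "translation_da", "collocations"],
}


def lookup(lemma, dictionary, expert=False):
    """Return only the fields that the chosen dictionary presents.

    The 'definition' field is resolved to an expert or a lay wording,
    depending on the user profile passed in via `expert`.
    """
    record = DATABASE.get(lemma.lower())
    if record is None:
        return None
    result = {}
    for field in DICTIONARIES[dictionary]:
        if field == "definition":
            key = "definition_expert" if expert else "definition_lay"
            result[field] = record[key]
        else:
            result[field] = record[field]
    return result


if __name__ == "__main__":
    print(lookup("asset", "translation"))
    print(lookup("asset", "reception", expert=True))
```

Because each dictionary is just a list of fields, adding a further dictionary on the same data would, under these assumptions, mean adding one entry to `DICTIONARIES` rather than compiling a new work.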
Chapters 6 to 8
In Chapter 6, Reinhard Hartmann presents the fruit of his recent research into the resources available to metalexicographers to aid them in their study and research. He discusses eight types of resource: academies, associations, corpora
and databases, journals, networks, online dictionaries, publishers and university research centres. For each, he gives a table of ten representative examples; and the surrounding text contains indicators of further and wider resources associated with each type. In Chapter 7, Barbara Ann Kipfer presents a glossary of lexicographic terms, which she has developed over many years. Chapter 8 contains an annotated bibliography, organized under topics, of mainly twenty-first-century works on research in lexicography and the study of dictionaries.
References
Béjoint, H. (2010) The Lexicography of English. From Origins to Present. Oxford: Oxford University Press.
Bergenholtz, H. and Gouws, R. H. (2012) Who is a lexicographer? Lexicon 42, 68–78.
Collins English Dictionary (online) at www.collinsdictionary.com (accessed 30 November 2012).
Gouws, R. H., Heid, U., Schweickard, W. and Wiegand, H. E. (eds) (2013) Dictionaries. An International Encyclopedia of Lexicography. Supplementary Volume: Recent Developments with Special Focus on Computational Lexicography. Berlin and New York: Mouton de Gruyter.
Hartmann, R. R. K. (1996) Lexicography as an applied linguistic discipline. In: Hartmann, R. R. K. (ed.), 230–44.
Hartmann, R. R. K. (ed.) (1996) Solving Language Problems. Exeter: University of Exeter Press.
Hartmann, R. R. K. (2001) Teaching and Researching Lexicography. Harlow: Longman/Pearson Education.
Hartmann, R. R. K. and James, G. (1998) Dictionary of Lexicography. London and New York: Routledge.
Hausmann, F. J., Reichmann, O., Wiegand, H. E. and Zgusta, L. (1989–91) Wörterbücher/Dictionaries/Dictionnaires: An International Encyclopedia of Lexicography, Vols 1–3. Berlin: Walter de Gruyter.
Macmillan Dictionary Blog (2012), online at www.macmillandictionaryblog.com/bye-print-dictionary (accessed 7 November 2012).
McArthur, T. (1986) Worlds of Reference: Lexicography, Learning and Language from the Clay Tablet to the Computer. Cambridge: Cambridge University Press.
McArthur, T. (1998) Living Words: Language, Lexicography and the Knowledge Revolution. Exeter: Exeter University Press.
Rundell, M. (2012) It works in practice but will it work in theory? The uneasy relationship between lexicography and matters theoretical (Hornby Lecture). In: R. V. Fjeld and J. M. Torjusen (eds) Proceedings of the 15th EURALEX Congress. Oslo: University of Oslo, 47–92.
Webster’s New International Dictionary, 3rd edition (1961), ed. Philip Gove. Springfield, MA: Merriam-Webster.
2
A History of Research in Lexicography
Paul Bogaards

Chapter Overview
Historical Overview
Research in Lexicography
Conclusion
1 Historical Overview
Although dictionary criticism is almost as old as dictionaries themselves (at least as far as the Western world is concerned, see Hartmann 2008: 136) and even if we can find more theoretical reflections on dictionaries from the sixteenth century on (Hausmann 1989), a more focused scientific study of lexicographical works dates from the middle of the twentieth century only. In 1959 the French journal Cahiers de lexicologie was launched and one year later a centre for the development of a new national dictionary was created in Nancy (France). In November 1960 a first conference on ‘Problems in Lexicography’ was held in Bloomington, Indiana (see Householder and Saporta 1962). Ten years later, again in the United States and in France, the first handbooks on lexicography appeared. Ladislav Zgusta published his Manual of Lexicography in 1971, and in the same year Jean and Claude Dubois’ Introduction à la lexicographie and Josette Rey-Debove’s Étude linguistique et sémiotique des dictionnaires français contemporains came out. A year before, an issue of the journal Langages had also been devoted to lexicography (Rey-Debove 1970). A few years later, in 1975, the Dictionary Society of North America (DSNA) was founded and their journal Dictionaries appeared for the first time in 1979.
A new impulse to the study of dictionaries was given in the 1980s. In 1983 Reinhard Hartmann invited scholars and lexicographers from all over Europe to Exeter (UK) to attend a conference (see Hartmann 1983) which was to be the first in a long series organized by EURALEX, the European association of people working in lexicography and related fields that was created during the conference. In 1984 Sidney Landau published his Dictionaries. The Art and Craft of Lexicography (2nd edition 2001). In 1985 the first issue of Lexicographica. International Annual for Lexicography appeared. In 1988 the first volume of the International Journal of Lexicography was published, to be followed in 1991 by Lexikos, an annual journal that is now the mouthpiece of AFRILEX, the African sister association of EURALEX. The international encyclopedia of lexicography was to appear around this time under the title Wörterbücher/Dictionaries/Dictionnaires (Hausmann et al. 1989–91). From the 1990s on lexicography has been an established academic subject in a number of universities, for example Aarhus (Denmark), Barcelona (Spain), Birmingham (UK), Cergy-Pontoise (France), Göteborg (Sweden), Lille (France) and Poznań (Poland) (also see Hartmann, this volume). In about half a century dictionaries and lexicography have become a mature field of scientific endeavour, having its own traditions and institutions (also see Gouws 2004).
2 Research in Lexicography
As may already be clear from the titles of some of the handbooks quoted above, the exact content of what is covered by the term lexicographic research is not always the same: for one author lexicography is essentially a craft or even an art, whereas others insist on the necessity of a sound scientific basis for dictionaries. Reading through the relevant literature makes it clear also that the points of view taken and the types of dictionaries studied may vary substantially. Zgusta (1971), for instance, starts with the discussion of a number of linguistic (semantic, morphological, combinatorial and stylistic) aspects which cover more than half of his book. Dubois and Dubois (1971), on the other hand, present dictionaries, monolingual as well as bilingual ones, language dictionaries as well as encyclopedic dictionaries, as pedagogic texts at the service of an intended group of users, devoting only limited space to some semantic and stylistic issues. Rey-Debove (1971) defines the dictionary as ‘a work that describes a language through a lexical approach’, paying no attention to the user nor to bilingual or encyclopedic dictionaries. In the following sections I will try to give an overview of the main research trends in a number of fields, following the distinctions proposed by Hartmann (2008: 137). As will become clear, progress has been attained at quite unequal paces, and methodologies have not been worked out in all domains.
2.1 Dictionary History
Historical research on dictionaries is mainly concerned with two subjects: the influence of older dictionaries on newer ones and the new elements that have been introduced by great lexicographers. In other words: tradition and innovation. According to Landau (2001: 43), the ‘history of English lexicography usually consists of a recital of successive and often successful acts of piracy’. Formulated more neutrally as borrowing, the phenomenon of the influence existing dictionaries have had on newly produced ones can be intricate and intriguing, as is for instance demonstrated by Rodríguez-Álvarez and Rodríguez-Gil (2006), who studied two eighteenth-century English dictionaries. In the same vein a number of French seventeenth- and eighteenth-century dictionaries are studied by a research group in Canada (Cormier 2003, Cormier and Fernandez 2004, 2005, Cormier 2010). In all cases, systematic in-depth comparisons are made between the dictionaries involved and in some cases clearly formulated hypotheses are confronted with considerable amounts of precise data. A comparable methodology is sometimes used in order to find out what kind of author may have compiled a given dictionary, for example the anonymous sixteenth-century Vocabulario trilingüe studied by Clayton (2003). In other pieces of historical research the focus is more on what some dictionaries or lexicographers have introduced as innovations. It goes without saying that this type of research is also based on a broad knowledge of the dictionary scene of the time and on comprehensive comparisons of dictionaries. A case in point is a special issue of the International Journal of Lexicography on Dr Samuel Johnson and his Dictionary of the English Language. This dictionary was first published in 1755 and is considered a leap towards modern lexicography (McDermott and Moon 2005).
2.2 Dictionary Criticism
Assessing the quality of a dictionary is not a simple affair. There are so many aspects that can be studied and the evaluations can differ so vastly from one point to another, that it is practically impossible to give a final mark or to prefer one dictionary on all points over another. Dictionary criticism, be it in the form of discussions about the compiling of new dictionaries or in the form of assessments of existing dictionaries, almost never covers the totality of a dictionary. Most dictionary reviews that appear in newspapers are limited to what is called the macrostructure, the list of entries, in which neologisms are welcomed or criticized. In more scientific evaluations a limited number of aspects is normally taken into account, mainly depending on the expertise of the evaluator. Going
through the many reviews of individual dictionaries that have been published in the International Journal of Lexicography does not lead to any uniform approach or methodology, and only vague conclusions could be drawn from studies of dictionary reviews by Ripfel (1989) and Jehle (1990). Nevertheless, some attempts have been made at making reviews more systematic or more comprehensive. Wiegand (1998, 2002) has tried to cover a maximum of viewpoints on two learners’ dictionaries of German by publishing in one volume the reviews of a great number of specialists, who each comment on one particular aspect of the dictionary. Chapters are devoted to grammar, morphology, phonetics, orthography and so on. But in spite of the richness of this approach, it cannot be seen as exhaustive and it would be difficult to draw any general conclusions from the data presented. A comparable path is followed by a group of Japanese researchers who have published a number of reports in Lexicon, a journal that is published by the Iwasaki Linguistic Circle in Tokyo. In recent issues one can find thorough analyses of recently published learners’ dictionaries (Masuda et al. 2008, Kanazashi et al. 2009, Dohi et al. 2010, Kokawa et al. 2010). All analyses are globally structured along the same lines: macrostructure, phonetics, definitions, examples and illustrations are in each case studied in about this order, and a user study is in principle included at the end. However, the depth of the analysis is not always the same and additional subjects may vary considerably. In Kanazashi et al. (2009), for instance, chapters on various types of notes and on etymology are added, whereas in Masuda et al. (2008) a chapter is included about ‘vocabulary builders’. Another attempt at structuring reviews of a given type of dictionary was made by Bogaards (1996).
Following step by step the problems the learner is supposed to face while reading or writing a text in a foreign language, he systematically compares four learners’ dictionaries of English. This methodology was followed in the same way for other learners’ dictionaries (see for instance Bogaards 2010a). Recently, Coleman and Ogilvie (2009) introduced evidence-based statistical, textual and qualitative techniques in what they call ‘forensic dictionary analysis’.
2.3 Dictionary Typology
Often for the layman ‘the’ dictionary is just the one they happen to have on their shelves or in their computer. But for researchers it is crucial to exactly determine the type of dictionary they want to study. This is not always an easy task. Dictionaries can be categorized in many ways. According to Rey (2003: 89) ‘the typology of dictionaries is almost as complex as that of leguminous plants or of arthropods . . . and still awaits its Linnaeus or its Cuvier’.
Typologies are set up from very different points of view such as their comprehensiveness (unabridged vs college dictionaries), the number of languages involved (monolingual vs bilingual or multilingual dictionaries), the type of data treated (linguistic vs encyclopaedic dictionaries), the language of the users (native-speaker vs foreign learner dictionaries) and so on. After having discussed a fair number of attempts at categorizing all existing or possible dictionary types, Béjoint (2010: 45) concludes that ‘it is impossible to classify dictionaries in a way that would be both orderly and realistic’, and then presents a list of seven oppositions:
(1) monolingual and bilingual dictionaries – including dialect and slang dictionaries as monolingual, and bilingualized, semi-bilingual and hybrid dictionaries as bilingual;
(2) general and specialized dictionaries – the latter category including dictionaries giving information on pronunciation, etymology, synonyms, collocations and so on;
(3) encyclopaedic and linguistic dictionaries – the first ones including proper names of any kind;
(4) foreign learners’ and native speakers’ dictionaries;
(5) dictionaries for adults and dictionaries for children – or for any other age group;
(6) alphabetized and non-alphabetized (i.e. ideological, thematic or onomasiological) dictionaries;
(7) electronic and paper dictionaries.
Although this list of parameters will indeed clearly categorize the majority of dictionaries, it does not take into account differences between academic and more practical dictionaries, or between descriptive and prescriptive dictionaries. Likewise it would be easy to find omissions when using the admittedly restricted list of oppositions presented by Atkins and Rundell (2008: 24ff.) or the ‘general dictionary typology’ given by Svensén (2009: 22ff.).
2.4 Dictionary Structure
Dictionaries are structured on a number of levels. Rey-Debove (1971) first introduced the terms ‘macrostructure’ and ‘microstructure’. The macrostructure of a dictionary is the list of entries or its nomenclature, that is a collection of entities that are selected in order to give an adequate view of the domain that is meant to be described. In a language dictionary, the macrostructure is supposed to reflect a rational choice of the lexical stock that is available in that language. As language dictionaries are never complete (only dictionaries of an author’s
works or of a ‘dead’ language can be), all lexicographers have to make choices about which items to include and which not. Selections are made in view of the intended use and users of the dictionary. Frequency normally plays a central role in the selection of the headwords to include, but this criterion is not always decisive. Children’s dictionaries may include words that are not very frequent in the language as a whole or even in children’s language, but the fact that many children tend to stumble over these items may lead to their inclusion. A dictionary of synonyms will leave out a number of the most frequent words for the sole reason that they don’t have any synonyms. On the other hand, even if frequency counts do not yield a given element of a series (names of the days, military ranks, colour names, etc.), the lexicographer has to make sure that a whole range is represented in a systematic way. The microstructure of a dictionary is the nature of the information given about the headwords and the way this information is presented. Lexicographers have many decisions to make about what information will be presented in what form and in what order. Pronunciation data, etymology, word class, definition, examples, collocations, homonyms and many other aspects of words can be part of the microstructure, which will present all elements selected in the same order and according to the same standards in each entry. Although the distinction between macro- and microstructure is quite straightforward, in some cases elements can be part of either one. This applies to compounds and multiword expressions as well as to derived forms. Compounds like double time or old girl used to be part of the microstructure not so long ago but are more and more treated as headwords in more recently produced dictionaries (cf. Atkins and Rundell 2008: 181). 
Phrasal verbs like get around or pull off tend to occupy a place that is somewhere in between the two structure types: they are often presented under a given headword, but have their own microstructure, which is in principle structured in the same way as the one used for independent headwords. Multiword expressions such as fixed phrases (night after night, finer feelings), similes (drunk as a skunk, deaf as a post), proverbs (a rolling stone gathers no moss) and the like are most of the time treated in the microstructure although their fixed status (when they have one, which is not always the case) could justify their presence in the list of headwords. Adverbs like photographically and nouns like Islamism, that is, rather infrequent words that are derived from more frequent elements, are sometimes treated in the microstructure of the simplex headwords they are derived from, although they are lexical items with the same linguistic status as these headwords. In principle, headwords are given in their canonical form: infinitive for verbs, singular for nouns, etc. Irregular inflected forms may be listed in the microstructure, but may have their place in the macrostructure as well, especially when the dictionary is meant for users having another native language (the receptive side of a bilingual dictionary or a learners’ dictionary).
More recently, two other terms have been introduced to describe the structural composition of dictionaries. The mediostructure or cross-reference structure of a dictionary is the network of internal references that makes information available to the users that is present at other places in the dictionary (synonyms, references to tables or pictorial illustrations, etc.). The megastructure is the relationship and order between the components of a dictionary: the front matter, including for example a foreword, an introduction and a list of abbreviations and symbols; the lemma list; and the back matter, which may contain special lists of items or grammatical information and so on (for more details on dictionary structures see Svensén 2009). As will be clear, dictionary structure is most of all a subject of interest for practising lexicographers, who have to devise the content and the presentation of their dictionary. Little scientific research has been done so far in this field. The research that has been done pertains to the next section, where the effects of the choices made by lexicographers are confronted with the use that is made by groups of users.
2.5 Dictionary Use
The first point the participants of the first conference on lexicography unanimously agreed on was that ‘dictionaries should be designed with a special set of users in mind and for their specific needs’ (Householder 1962: 279). It took about 20 years, however, before research into dictionary use began to blossom. At first, from 1980 onwards, questionnaires were used to ask users how often they used a dictionary, what they looked up, and if they were satisfied with the information found. The results yielded by these studies (see Bogaards 1988 for an overview) were not very informative and did not lead to any precise conclusions about the strengths or weaknesses of specific (types of) dictionaries. Since the 1990s, research on dictionary use has been more and more experimental in nature and the methodologies used have evolved, seeking to confirm or reject strictly formulated hypotheses and using ever more sophisticated statistics. The main topics that have been studied concern the comprehension of words or texts, the production of words or texts and vocabulary acquisition through dictionary use. In Bensoussan et al. (1984), more than 800 students had to read texts in English as a second language and to answer multiple choice questions about the content of these texts; they were allowed to use a monolingual or a bilingual dictionary. The researchers did not find significant differences in the results obtained by students who did use a dictionary, bilingual or monolingual, or who did not use a dictionary (cf. Nesi and Meara 1991 for similar results). In fact, the approach turned out to be too global and not all variables involved were under control.
For instance, it is not clear to what extent the subjects who had dictionaries at their disposal did indeed use them, nor what the relationship was between difficult vocabulary and the way text comprehension was measured. The same topic was studied in much more precise terms and in tightly controlled conditions by Tono (2001) and by Lew (2004). In the latter study, for instance, the words to be looked up were pseudo-English words so as to ensure that there was no possible prior knowledge, and the dictionary entries were immediately available next to the text. In this structured situation, the effect of dictionary use was statistically highly significant. Ard (1982) was the first study aimed at measuring the effect of the use of bilingual dictionaries in writing in a foreign language. Due to an informal methodology based on retrospection and on oral protocols during the writing task, the results are rather vague and inconclusive. In a well-designed study, Laufer (1993) had students translate a number of isolated low frequency words and then use these words in a sentence. For each word the students were provided with either a definition or an example, and later on both a definition and an example. Considering only the productive part here, it turned out that definitions and examples are equally effective, but that the combination of both types of information leads to the best results. Again, only in this more structured situation could clear conclusions be drawn. A similar study has been done by Laufer and Hadar (1997). Research into the relationship between dictionary use and vocabulary learning is the most recent development in this field. Laufer and Hill (2000) studied the lookup strategies of learners using an electronic dictionary and their influence on immediate word retention. 
Following up on a series of studies executed by Laufer and her collaborators, Chen (2011) compared the relevance of various types of dictionaries for vocabulary learning, testing word retention at the end of the experimental session and two weeks later. Dziemianko (2010 and 2011b) studied the effectiveness of paper and electronic dictionaries in this context and found various differences in favour of electronic or paper dictionaries for immediate reception and production as well as for retention (also see Bogaards 2010b).
Other user studies have focused on more specific aspects of dictionaries. Bogaards and Van der Kloot (2001, 2002) studied the effectiveness of grammatical information, the difference between the two studies being that in the first one real dictionary entries were used, whereas in the second, more conclusive, study manipulated entries were presented to the subjects. Recently Dziemianko (2011a) did research in the same spirit, taking into account the proficiency level of the subjects, the part of speech of the words and the particular form of grammatical codes used.
Research on the importance of various types of aids in the access structures of dictionaries, such as signposts, menus or guide words, was done by Bogaards (1998b) and was recently taken up by Nesi and Tan (2011) as well as by Tono (2011), who applied a technique (‘eye-tracking’) that is totally new in lexicographic research. The quality of various types of definition was studied by Lew and Dziemianko (2006). For an extensive overview of dictionary user studies see Welker (2010); for methodological considerations see Lew (2011).
2.6 Dictionary Content
Dictionary content depends on the type of dictionary. A dictionary of music is supposed to give an authoritative description of the world of musical notation, instruments, composers and so on. In the same way, a language dictionary, which is the prototypical representative of this type of reference book, should present a complete and reliable picture of the lexical riches of the language described.
Whereas in this field traditionally data were manually collected from printed sources such as novels and newspapers, nowadays computers and the internet can provide huge masses of data of various types. Although spoken data still cause serious problems, they have also become available in ever larger quantities.
Corpus linguistics was introduced in the 1980s and the COBUILD dictionary was the first one to profit entirely from this new approach. The choices made and the techniques used were justified in a book, Looking Up (Sinclair 1987), which had a big influence. Corpora such as the British National Corpus (BNC) or the Bank of English are now far bigger than they were in those early days (600 million tokens or more as against 20 million then). But, more importantly, methods to extract useful lexical data have evolved in important ways. In the pioneering stage of corpus linguistics use was essentially made of the KWIC (Keyword in Context) procedure, a device that aligns a given word within all the contexts it has in the corpus. As the output of this type of analysis was not always manageable, especially not for more frequent words with thousands of concordance lines, ways were sought to automatically produce more concise images of lexical items in a language.
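The KWIC procedure mentioned above can be illustrated with a minimal sketch. It is not the software used in any of the projects cited; the function name `kwic`, the `width` parameter and the sample text are assumptions made purely for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Invented sample text, standing in for a corpus.
sample = (
    "The dictionary lists every sense. A good dictionary also gives examples. "
    "Learners consult the dictionary when reading."
)
concordance = kwic(sample, "dictionary", width=20)
for line in concordance:
    print(line)
```

With a frequent word and a real corpus the list grows to thousands of lines, which is the manageability problem that motivated the more concise summaries discussed next.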
Using tagged input and a set of linguistic relations that can hold between words, Kilgarriff and Tugwell (2002) created the Sketch Engine, a tool that presents in one page the most salient characteristics of a given item, such as a verb’s most frequent subjects and objects, the most frequent adjectives that accompany a noun and so on (cf. Kilgarriff et al. 2004). The Sketch Engine is now widely used, not only for dictionary making but for the development of theoretical linguistics as well (see Pustejovsky and Rumshisky 2008). In his theory of norms and exploitations, Hanks (2004) combines prototype theory with the outcomes of corpus linguistics and shows how literal meanings of words can be distinguished from conventional metaphorical or idiomatic uses as well as from incidental metaphorical uses. Sue Atkins (2010), who was one of the important promoters of corpus lexicography, describes how, ideally, lexicographic data described in FrameNet (cf. Fillmore et al. 2003) could be combined with data available in other databases in order to reach ever more complete descriptions of the lexicon of a language.
3 Conclusion
As has become clear, I hope, research in lexicography does not constitute one consistent body. It is more of a patchwork composed of quite separate domains, each with their own topics to be studied and their own approaches or methodologies. One could think that this is due to the absence of a comprehensive theory, but up to now it is unclear how such a theory should, or even could, be formulated (cf. Bogaards 2010c).
References
Ard, J. (1982) The use of bilingual dictionaries by ESL students while writing. ITL Review of Applied Linguistics 58, 1–27.
Atkins, B. T. S. (2010) The Dante Database: its contribution to English lexical research, and in particular to complementing the FrameNet Data. In: G.-M. de Schryver (ed.) A Way with Words: Recent Advances in Lexical Theory and Analysis. A Festschrift for Patrick Hanks. Kampala: Menha Publishers, 267–97.
Atkins, B. T. S. and Rundell, M. (2008) The Oxford Guide to Practical Lexicography. Oxford: Oxford University Press.
Béjoint, H. (2010) The Lexicography of English. From Origins to Present. Oxford: Oxford University Press.
Bensoussan, M., Sim, D. and Weiss, R. (1984) The effect of dictionary usage on EFL test performance compared with student and teacher attitudes and expectations. Reading in a Foreign Language 2, 262–76.
Bogaards, P. (1988) A propos de l’usage du dictionnaire de langue étrangère. Cahiers de Lexicologie 52, 131–52.
— (1996) Dictionaries for learners of English. International Journal of Lexicography 9/4, 277–320.
— (1998a) Des dictionnaires au service de l’apprentissage du français langue étrangère. Cahiers de Lexicologie 72, 127–67.
— (1998b) Scanning long entries in learner’s dictionaries. In: T. Fontenelle et al. (eds) Actes EURALEX ’98 Proceedings. Liège: Université de Liège, 555–63.
— (2010a) The evolution of learners’ dictionaries and Merriam-Webster’s Advanced Learner’s English Dictionary. In: I. J. Kernerman and P. Bogaards (eds) English Learners’ Dictionaries at the DSNA 2009. Tel Aviv: KDictionaries, 11–27.
— (2010b) Dictionaries and second language acquisition. In: A. Dykstra and T. Schoonheim (eds) Proceedings of the XIV Euralex International Congress. Ljouwert: Fryske Akademy, 99–123.
— (2010c) Lexicography: science without theory? In: G.-M. de Schryver (ed.) A Way with Words: Recent Advances in Lexical Theory and Analysis. A Festschrift for Patrick Hanks. Kampala: Menha Publishers, 313–22.
Bogaards, P. and van der Kloot, W. (2001) The use of grammatical information in learner’s dictionaries. International Journal of Lexicography 14/2, 97–121.
— (2002) Verb constructions in learners’ dictionaries. In: A. Braasch and C. Povslen (eds) Proceedings of the Tenth Euralex International Congress 2002, Copenhagen, Vol. II, 747–57.
Chen, Y. (2011) Studies on bilingualized dictionaries: the user perspective. International Journal of Lexicography 24/2, 161–97.
Clayton, M. L. (2003) Evidence for a native-speaking Nahuatl author in the Ayer Vocabulario Trilingüe. International Journal of Lexicography 16/2, 99–119.
Coleman, J. and Ogilvie, S. (2009) Forensic dictionary analysis: principles and practice. International Journal of Lexicography 22/1, 1–22.
Cormier, M. (2003) From the Dictionnaire de l’Académie Françoise, dédié au Roy (1694) to the Royal Dictionary (1699) of Abel Boyer: tracing inspiration. International Journal of Lexicography 16/1, 19–41.
Cormier, M. (ed.) (2010) Perspectives on Seventeenth- and Eighteenth-century European Lexicography. Special issue of International Journal of Lexicography 23/2, 133–222.
Cormier, M. and Fernandez, H. (2004) Influence in lexicography: a case study. Abel Boyer’s Royal Dictionary (1699) and Captain John Stevens’ Dictionary English and Spanish (1705). International Journal of Lexicography 17/3, 291–308.
— (2005) From the Great French Dictionary (1688) of Guy Miège to the Royal Dictionary (1699) of Abel Boyer: tracing inspiration. International Journal of Lexicography 18/4, 479–507.
Dohi, K., Osada, Y., Shimizu, A., Asada, Y., Takahashi, R. and Kanazashi, T. (2010) An analysis of Longman Dictionary of Contemporary English, 5th edition. Lexicon 40, 85–187.
Dubois, J. and Dubois, C. (1971) Introduction à la lexicographie: le dictionnaire. Paris: Larousse.
Dziemianko, A. (2010) Paper or electronic? The role of dictionary form in language reception, production and the retention of meaning and collocations. International Journal of Lexicography 23, 257–73.
— (2011a) User-friendliness of noun and verb coding systems in pedagogical dictionaries of English: a case of Polish learners. International Journal of Lexicography 24/1, 50–78.
— (2011b) Does dictionary form really matter? In: K. Akasu and S. Uchida (eds) Asialex 2011 Proceedings. Lexicography: Theoretical and Practical Perspectives. Kyoto: Asian Association for Lexicography, 92–101.
Fillmore, C., Johnson, C. R. and Petruck, M. R. L. (2003) Background to FrameNet. International Journal of Lexicography 16/3, 235–50.
Gouws, R. H. (2004) Meilensteine auf dem historischen Weg der Metalexikographie. Lexicographica 20, 155–75.
Hanks, P. (2004) The syntagmatics of metaphor and idiom. International Journal of Lexicography 17/3, 245–74.
Hartmann, R. R. K. (ed.) (1983) Lexicography: Principles and Practice. London: Academic Press.
Hartmann, R. R. K. (2008) Twenty-five years of dictionary research: taking stock of conferences and other lexicographic events since LEXeter ’83. In: E. Bernal and J. DeCesaris (eds) Proceedings of the XIII EURALEX International Congress (Barcelona, 15–19 July 2008). Barcelona: Universitat Pompeu Fabra, 131–48.
Hausmann, F. J. (1989) Kleine Weltgeschichte der Metalexikographie. In: H. E. Wiegand (ed.) Wörterbücher in der Diskussion. Vorträge aus dem Heidelberger Lexikographischen Kolloquium (Lexicographica. Series Maior 27). Tübingen: Max Niemeyer, 75–109.
Hausmann, F. J., Reichmann, O., Wiegand, H. E. and Zgusta, L. (eds) (1989–91) Wörterbücher. Dictionaries. Dictionnaires. An International Encyclopedia of Lexicography. 3 Vols. Berlin: Walter de Gruyter.
Householder, F. W. (1962) Summary report. In: F. W. Householder and S. Saporta (eds), 279–82.
Householder, F. W. and Saporta, S. (eds) (1962) Problems in Lexicography. Bloomington: Indiana University.
Jehle, G. (1990) Das englische und französische Lernerwörterbuch in der Rezension. Theorie und Praxis der Wörterbuchkritik. Tübingen: Max Niemeyer.
Kanazashi, T., Otani, T., Nonomiya, A. and Ryu, M. (2009) An analysis of the Longman Advanced American Dictionary, New Edition: a pedagogical viewpoint. Lexicon 39, 18–99.
Kilgarriff, A. and Tugwell, D. (2002) Sketching words. In: M.-H. Corréard (ed.) Lexicography and Natural Language Processing. A Festschrift in Honour of B.T.S. Atkins. Euralex.
Kilgarriff, A., Rychlý, P., Smrž, P. and Tugwell, D. (2004) The Sketch Engine. In: G. Williams and S. Vessier (eds) Proceedings of the Eleventh EURALEX International Congress. Lorient: Faculté des Lettres et des Sciences Humaines, Université de Bretagne Sud, 105–16. [See also www.sketchengine.co.uk]
Kokawa, T., Aoki, R., Sugimoto, J., Uchida, S. and Ryu, M. (2010) An analysis of the Merriam-Webster’s Advanced Learner’s English Dictionary. Lexicon 40, 27–84.
Landau, S. (1984) Dictionaries. The Art and Craft of Lexicography. New York: Scribner, 2nd edition 2001, Cambridge: Cambridge University Press.
Laufer, B. (1993) The effect of dictionary definitions and examples on the use and comprehension of new L2 words. Cahiers de Lexicologie 63, 131–42.
Laufer, B. and Hadar, L. (1997) Assessing the effectiveness of monolingual, bilingual, and ‘bilingualized’ dictionaries in the comprehension and production of new words. The Modern Language Journal 81, 189–96.
Laufer, B. and Hill, M. (2000) What lexical information do L2 learners select in a CALL dictionary and how does it affect word retention? Language Learning & Technology 3, 58–76.
Lew, R. (2004) Which Dictionary for Whom? Receptive Use of Bilingual, Monolingual and Semi-bilingual Dictionaries by Polish Learners of English. Poznań: Motivex.
— (2011) User studies: opportunities and limitations. In: K. Akasu and S. Uchida (eds) Asialex2011 Proceedings. Lexicography: Theoretical and Practical Perspectives. Kyoto: Asian Association for Lexicography, 7–16.
Lew, R. and Dziemianko, A. (2006) A new type of folk-inspired definition in English monolingual learners’ dictionaries and its usefulness for conveying syntactic information. International Journal of Lexicography 19/3, 225–42.
Masuda, H., Uchida, S., Hirayama, M., Kawamura, A., Takahashi, R. and Ishii, Y. (2008) An analysis of Collins COBUILD Advanced Dictionary of American English. Lexicon 38, 46–155.
McDermott, A. and Moon, R. (eds) (2005) Johnson in Context. Special issue of International Journal of Lexicography 18/2, 153–266.
Nesi, H. and Meara, P. (1991) How using dictionaries affects performance in multiple-choice EFL tests. Reading in a Foreign Language 8, 631–43.
Nesi, H. and Tan, K. H. (2011) The effect of menus and signposting on the speed and accuracy of sense selection. International Journal of Lexicography 24/1, 79–96.
Pustejovsky, J. and Rumshisky, A. (2008) Between chaos and structure: interpreting lexical data through a theoretical lens. International Journal of Lexicography 21/3, 337–55.
Rey, A. (2003) La renaissance du dictionnaire de langue française au milieu du XXe siècle: une révolution tranquille. In: M. Cormier, A. Francoeur and J.-C. Boulanger (eds) Les dictionnaires Le Robert. Genèse et évolution. Montréal: Les Presses de l’Université de Montréal, 88–99.
Rey-Debove, J. (ed.) (1970) La lexicographie (Langages 19). Paris: Didier.
Rey-Debove, J. (1971) Étude linguistique et sémiotique des dictionnaires français contemporains. The Hague: Mouton.
Ripfel, M. (1989) Wörterbuchkritik: eine empirische Analyse von Wörterbuchrezensionen. Tübingen: Max Niemeyer.
Rodríguez-Álvarez, A. and Rodríguez-Gil, M. E. (2006) John Entick’s and Ann Fisher’s dictionaries: an eighteenth-century case of (cons)piracy? International Journal of Lexicography 19/3, 287–319.
Sinclair, J. (ed.) (1987) Looking Up. An Account of the COBUILD Project in Lexical Computing. London and Glasgow: Collins ELT.
Svensén, B. (2009) A Handbook of Lexicography. The Theory and Practice of Dictionary-Making. Cambridge: Cambridge University Press.
Tono, Y. (2001) Research on Dictionary Use in the Context of Foreign Language Learning: Focus on Reading Comprehension (Lexicographica. Series Maior 106). Tübingen: Max Niemeyer.
— (2011) Application of eye-tracking in EFL learners’ dictionary look-up process research. International Journal of Lexicography 24/1, 124–53.
Welker, H. A. (2010) Dictionary Use. A General Survey of Empirical Studies. Available at www.let.unb.br/hawelker/images/stories/professores/documentos/dictionary_use_research.pdf
Wiegand, H. E. (ed.) (1998) Perspektiven der pädagogischen Lexikographie des Deutschen. Untersuchungen anhand von Langenscheidts Grosswörterbuch Deutsch als Fremdsprache. Tübingen: Max Niemeyer.
— (ed.) (2002) Perspektiven der pädagogischen Lexikographie des Deutschen II. Untersuchungen anhand des de Gruyter Wörterbuchs Deutsch als Fremdsprache. Tübingen: Max Niemeyer.
Zgusta, L. (1971) Manual of Lexicography. The Hague/Paris: Mouton.
3 Research Methods and Problems
3.1 Researching Lexicographical Practice
Lars Trap-Jensen
Chapter Overview
Dictionary Conceptualization
Designing the Database
Describing the Linguistic Data
Dictionary Writing Systems
Data Access and Presentation
Finding the Dictionary – the Future
The computerization of work routines that the world has witnessed over the last couple of decades has changed the lives of many people, but the effect it has had on dictionaries and on the lexicographer’s daily life is all-embracing and difficult to overstate. In this chapter, we look at the various stages involved in dictionary-making and some of the decisions that the lexicographer is faced with in the process. It should be noted that even if some of the issues are general and shared by different types of dictionaries, others pertain to just one particular type. Monolingual dictionaries are obviously different products compared to bilingual ones, and making a dictionary is different from making an encyclopedia, terminology or even a telephone directory, even if they all must be considered lexicographical products. In the following, the focus of attention is, unless stated otherwise, on monolingual dictionaries for native speakers.
1 Dictionary Conceptualization
While many things have changed dramatically in lexicography, the planning phase is arguably one of the areas that has been least affected, and yet it is perhaps the most important one. It is during this phase that crucial decisions about the database structure and its inventory must be made, based on an analysis of the intended users and their needs. These decisions condition how the data in a later phase can be presented to the end-user and how it can be re-used in other applications.
One thing that has changed is that lexicographers today are less inclined to have one specific product in mind when they build their dictionary database. Over the last decades publishers have spent much effort in unifying their dictionary resources and standardizing the information contained in each element in the database. Instead of offering a range of independent dictionaries, each with their own specific list of entry words, inflectional information, synonyms, style labels, etc., most publishing houses now have one central database from which individual dictionaries can be produced by extracting the desired combinations of information types needed for a particular lexicographical product.
From the publisher’s point of view, this solution gives them two important advantages. First, it makes maintenance easier, as updates that are made in one element in the database immediately feed through to all the dictionaries in which that element is used. Second, it enables them to refine the range of dictionaries offered, as they can extract different combinations of information types to suit the needs of a specific user group. For the user, it means that they are more likely to recognize a distinct flavour of a particular publishing house’s products, and perhaps a sense of familiarity if they buy more than one product.
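The idea of one central database feeding several products can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the field names, the entry ID `e0001` and the two product definitions are hypothetical and do not describe any publisher's actual system.

```python
# Hypothetical central record; field names and content are illustrative only.
CENTRAL_DB = {
    "e0001": {
        "headword": "lexicon",
        "pos": "noun",
        "inflection": ["lexicons", "lexica"],
        "definition": "the vocabulary of a language",
        "synonyms": ["vocabulary", "word stock"],
        "style_label": "formal",
    },
}

# Each product is defined only by the information types it extracts.
PRODUCTS = {
    "pocket": ["headword", "pos", "definition"],
    "thesaurus": ["headword", "synonyms"],
}

def build_product(db, fields):
    """Extract the chosen information types for every entry in the central database."""
    return {
        entry_id: {name: record[name] for name in fields if name in record}
        for entry_id, record in db.items()
    }

pocket = build_product(CENTRAL_DB, PRODUCTS["pocket"])
thesaurus = build_product(CENTRAL_DB, PRODUCTS["thesaurus"])

# An update made once in the central record reaches every product rebuilt from it.
CENTRAL_DB["e0001"]["definition"] = "the complete set of words of a language"
pocket_updated = build_product(CENTRAL_DB, PRODUCTS["pocket"])
```

Both advantages named above are visible here: the definition is corrected in one place only, and a new product amounts to a new list of information types.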
From a lexicographical point of view, what has happened is that the production of lexicographical data has become more clearly separated from the presentation of the data to the user. Today, many dictionaries are available both as classical paper products (although sales are rapidly declining) and in electronic form. Digitally, they may appear as CD-ROMs, as online versions and as apps for smart phones and tablet computers – and even integrated with other products and applications. The latter includes lexicographical data that is utilized as a resource but is mostly either invisible to the user, as the data used by spell-checkers in word processing programs, or only becomes visible when activated, such as the dictionary definitions found in e-readers that show as pop-ups when users click on a word. More will be said on this in the last section of the chapter.
2 Designing the Database
Lexicography involves a lot of decision-making: How many words should be in the dictionary and by what criteria? What types of information are relevant for the intended target group? Does the intended target group coincide with the actual user group, and if not, does it matter? What is the best way to explain a particular word meaning to the reader? The answers to these questions and a good many more are not necessarily easy to provide beforehand, but they are important for the way the database should be built. A database designed to meet future requirements for other dictionaries or publication channels should anticipate as many aspects as possible in the early stage of the process.
To take an example: in a dictionary that is going to appear as a concise paper dictionary, it may be appropriate to use abbreviations, whereas the online version will have the full forms. However, not all abbreviations have a one-to-one expansion: adj. refers sometimes to ‘adjective’ and sometimes to ‘adjectives’, bot. can unfold as ‘botany’ or as ‘botanical’. It is likely that a simple list of abbreviations and their expansions will not do. Instead, all the different possibilities must be taken into account and a special field or an attribute should be available in the database to show how a given word form is presented as a full and abbreviated form respectively.
Another example is morphological information. In a bilingual L1-L2 dictionary, morphological information about headwords is not necessary, as it can be assumed that the users know how the words are inflected in their native language. If the same list of headwords is, at some later point in time, used in a different dictionary, such as a learners’ dictionary, no such assumption can be made and the morphological information will have to be produced if it is not in the database from the outset.
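The abbreviation problem can be made concrete with a small sketch in which each label occurrence stores both of its surface forms, so that no global expansion table is needed. The class name `Label`, the media names `print` and `online`, and the three sample labels are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Label:
    """One label occurrence, stored with both of its surface forms."""
    abbreviated: str
    full: str

    def render(self, medium):
        # Hypothetical media names: "print" gets the short form, anything else the full form.
        return self.abbreviated if medium == "print" else self.full

# The same abbreviation maps to different expansions, which a flat lookup table cannot express.
entry_labels = {
    "pos": Label("adj.", "adjective"),
    "collocates": Label("adj.", "adjectives"),
    "domain": Label("bot.", "botany"),
}

print_view = {name: label.render("print") for name, label in entry_labels.items()}
online_view = {name: label.render("online") for name, label in entry_labels.items()}
```

Storing the pair at the point of use costs some redundancy, but it leaves the choice of form entirely to the presentation step.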
For definitions, it is not recommended to use the same wording in a technical dictionary as in an encyclopedia, not to speak of children’s dictionaries. For that reason, the database may well include several versions of the same definition to be used for different user groups. Even within the same dictionary, two versions could be offered: a short definition for quick reference and a more elaborate one for users who prefer an encyclopedic explanation with attention to detail.
More could be added to the list of examples: information about pronunciation in either phonetic notation or as sound clips, images and video clips, syntactic and encyclopedic information, quotations and other language examples are all information types that are important to store in the database. They may not all be relevant for publication in one and the same dictionary, but it is advisable to store the information in a central base from where it can be easily retrieved.
In some cases, it may even be practical to include elements in the database that are not ever going to be shown to the end-user but which can be useful for other purposes. In a dictionary project that began in the early 1990s, The Danish Dictionary, it was decided to include information about the nearest superordinate word (the genus proximum) and about subject domain if at all possible. This information was systematically entered by the editors throughout the compiling period but was not used at all in the printed dictionary. It was, however, very useful when, later, the dictionary data was used to build a Danish wordnet (on the model of Princeton WordNet) and to compile a Danish thesaurus (Pedersen et al. 2009, Lorentzen and Trap-Jensen 2011).
Technically, there is a wide range of software solutions available. Some lexicographers and publishers prefer relational databases, others XML bases, and both types exist as proprietary commercial products and as open source products. No more will be said about soft- and hardware, but it should be stressed that the notion ‘a central database’ is used here as a broad cover term. An actual implementation often involves several databases. The main point is that the overall architecture should be such that the bases are designed to function as a conceptual unit, linked to each other via unique ID numbers.
Apart from defining what elements to use in the dictionary, it is important in the planning phase to prepare a manual or style guide that tells the lexicographers about the inventory of elements and how they should be used. A style guide is especially vital for larger projects with a staff of considerable size, and for long-term projects that have to account for some degree of staff turnover. It is an obvious boon for training new editors and helps to secure a uniform final appearance.
Style guides are project-internal tools and as such they vary greatly from project to project, ranging from rough principles (be brief and to the point; don’t use brackets and exceptions if you can avoid them) to very specific instructions for certain elements (use a maximum of four synonyms; only describe syntactic patterns with ten corpus examples or more; in metatext, use only words from the defining vocabulary). A style guide that carefully records all the principles and conventions defined in the planning phase, supplied with the revisions and adjustments made during the compiling process will ultimately capture what in the end gives the dictionary its own characteristic style and personality.
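Specific instructions of the kind quoted above lend themselves to automatic checking. The following sketch encodes two of them, the synonym maximum and the ten-example threshold for syntactic patterns; the entry layout, the function name `check_entry` and the corpus counts in the draft are invented for illustration.

```python
MAX_SYNONYMS = 4           # "use a maximum of four synonyms"
MIN_PATTERN_EXAMPLES = 10  # "only describe syntactic patterns with ten corpus examples or more"

def check_entry(entry):
    """Return a list of style-guide violations for one draft entry (hypothetical layout)."""
    problems = []
    synonyms = entry.get("synonyms", [])
    if len(synonyms) > MAX_SYNONYMS:
        problems.append(f"{len(synonyms)} synonyms listed; maximum is {MAX_SYNONYMS}")
    for pattern, hits in entry.get("patterns", {}).items():
        if hits < MIN_PATTERN_EXAMPLES:
            problems.append(f"pattern '{pattern}' rests on only {hits} corpus examples")
    return problems

# Invented draft entry; the corpus counts are made-up illustration values.
draft = {
    "headword": "rely",
    "synonyms": ["depend", "count", "bank", "lean", "trust"],
    "patterns": {"rely on NP": 412, "rely upon NP": 7},
}
report = check_entry(draft)
for problem in report:
    print(problem)
```

Rough principles such as 'be brief and to the point' resist this treatment and remain a matter for editorial judgement.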
3 Describing the Linguistic Data
After the initial planning phase, where all the general decisions are made, it is time to consider the object of description, the linguistic data. This is an area that has undergone a dramatic development over the last decades, both in the methods used and in the resources available. The achievements within the field of corpus linguistics have produced a range of tools that lexicographers use to establish a sound empirical basis for their linguistic description. Corpus linguistic methods are employed at almost every stage of the dictionary entry: lemma selection, lexical variants, inflection, collocations, valency patterns, set phrases, compounding and derivation. This interesting topic is explored in further detail in Chapter 4.1. Here, we will start by taking a closer look at the empiricist position and ask whether it is as justified as many lexicographers are inclined to think.
3.1 Prescription or Description?
Historically, the view that dictionaries should reflect the language of all its speakers cannot be taken for granted. In the nineteenth century and earlier it was widely held that, because of the important educational role of dictionaries, they should be normative in the true sense of the word: serving as an exemplary model for their users. Consequently, headwords and examples were excerpted from texts written by respected, canonical authors of their time. A well-known case in point is the Dictionnaire de l’Académie française, which set an example for a number of national dictionaries in the eighteenth and nineteenth centuries. To illustrate, one of the pioneers behind the Dictionary of the Royal Academy in Denmark (Langebek 1740) claimed that there was no room in the dictionary for:
All coarse, rude and lecherous words and phrases which contradict decency . . . for they need not be known to those who do not appreciate it, and those who do will surely get to know them anyhow.
And a hundred years later, the editor of the most popular Danish dictionary of the time wrote in his preface (Molbech 1859: viii):
Even the most frequent use of a newly formed word, especially in colloquial language, renders it no authority or proof of usability in pure speech and good style, nor of its admittance into a dictionary as long as it offends the cultivated ear and the delicate language instinct.
There are notable exceptions to the normative tradition, but even so it is not until well into the twentieth century that it became generally accepted for dictionaries to reflect the language community taken as a whole. No doubt, the greater availability of texts beyond the professional works of authors and journalists played a role in paving the way for the descriptive view dominant in the latter half of the twentieth century.
Most lexicographers today accept the descriptive role of dictionaries and prefer to see their own role as objective observers of linguistic facts, but there are areas of lexicographical practice that fall outside the scope of description. Whenever the normative role of language is involved, an element of authority and language policy is present. The orthographic forms of the headwords in a dictionary are in many countries regulated not directly by the practice of the language users, but at most indirectly via an official body that has been given the formal authority to decide how words are to be spelled. Other countries, such as the United Kingdom, have no such body but a de facto norm is set by one or two dictionaries which are widely recognized and followed by the educational system and by central authorities.
3.2 Lemma Selection
Another area where normative aspects are involved is lemma selection, clearly illustrated by the above quotations. A strictly descriptive approach would involve ranking all the words of a well-balanced corpus and mechanically selecting the most frequent ones until a given cut-off point, determined externally by the size and resources of the dictionary project. In itself, it is no trivial matter to decide what constitutes a word or, more precisely, a lexical unit, but we cannot go into the details here. However, very few dictionaries build their headword list in this mechanical way, as the frequency principle would inevitably produce a number of undesired headwords.
Most obvious examples are proper names, which occur frequently in corpus texts but are for the most part uninteresting for a general language dictionary. Admittedly, there are exceptions, such as proper names with a metonymic function (The White House, Mecca), names that are part of multiword expressions (Adam’s apple, Rome wasn’t built in a day) or culture-specific names that require explanation (American idol, the London Eye).
Apart from proper names, compounds and combining forms (long-tailed, long-haired, long-eared) are examples of words that are often frequent in texts but are not always obvious lemma candidates. They are often semantically transparent and thus predictable from their components. It should be noted, though, that the process of compounding and derivation is language specific but in languages where the process is productive (which is in general the case for Germanic languages, although less pronounced for English) the result may be a large number of often trivial compounds. In many instances, therefore, the user is better off being able to look up less frequent simplex words which cannot be decoded immediately.
Conversely, the descriptivist model would most likely lead to accidental lexical gaps in dictionary coverage.
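The mechanical frequency principle, and the kind of manual filtering it usually needs, can be sketched as follows. The token list is invented, the function name `select_headwords` is hypothetical, and the set of proper names is assumed to come from some prior tagging step that is not shown.

```python
from collections import Counter

def select_headwords(tokens, cutoff, exclude=frozenset()):
    """Rank word forms by corpus frequency and keep the `cutoff` most frequent ones."""
    counts = Counter(token for token in tokens if token not in exclude)
    return [word for word, _ in counts.most_common(cutoff)]

# Invented, already lower-cased corpus tokens.
tokens = ["house", "london", "house", "tree", "london",
          "london", "house", "house", "tree", "run"]

# Assumed to be supplied by an earlier tagging step.
PROPER_NAMES = {"london"}

mechanical = select_headwords(tokens, cutoff=2)
filtered = select_headwords(tokens, cutoff=2, exclude=PROPER_NAMES)
print(mechanical)
print(filtered)
```

Here the purely mechanical list admits a place name at the expense of a common noun, while the filtered list does not; real exceptions such as metonymic names would still have to be reinstated by hand.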
Parts of a language’s vocabulary are made up of closed sets of lexical items, and most people would find it odd if they could only look up some of the months of the year or all the days of the week except Tuesday. For systematic reasons, the solution would be to include all the members of the set no matter if, by chance, one or two were not sufficiently represented in the corpus to warrant their inclusion. Even if absence from the corpus is non-accidental, inclusion may be worthwhile after all. A case in point is the chemical elements, some of which are undoubtedly better known and used than others.
Another problem with lemma selection is the difficulty involved in defining what lexical units belong to a particular language. We have seen that the descriptive approach attempts to reflect the language of the whole language community. But how exactly is a language community delineated? There is no doubt a common core of words that are known to all speakers of English. As one moves away from the common core, however, the vocabulary of individual speakers becomes gradually less concordant. Due to differences such as age, education and housing history, the linguistic experience of a middle-aged engineer from Manchester is different from that of a university student in Cardiff, which is again quite different from that of a fisherman from Aberdeen. The engineer knows many technical terms from his field of speciality, the fisherman is familiar with the words associated with fishing gear and navigation at sea, and the student probably knows many slang words and informal expressions the others don’t, apart from the special vocabulary associated with her subject of study. Due to differences in personal life and linguistic experience it is unlikely that any two speakers of a language have exactly the same stock of words at their disposal.
How should the dictionary deal with this? Should it include all the technical terms from subject fields, and all slang, jargon and informal expressions? Ideally, perhaps yes, especially in an electronic dictionary where physical space is irrelevant. In practice, lexicographers are forced to decide on priorities, in which case it is important to realize who is the intended target group of the dictionary.
For a learner’s dictionary, the users can be expected to look up slang and informal expressions more often than special terms from the fishing trade, and they are also more likely to come across the special words used in linguistics and language pedagogy than words belonging to engineering. When it comes to regional language, the practice of most dictionaries is to leave out genuine dialect words that are rare outside the geographical area where the dialect is spoken. Instead, these are included in special dictionaries devoted to that particular dialect.
Somewhat more controversial are words from other languages that appear in the corpus texts. In the English-speaking world this is perhaps not as controversial an issue as it can be in other countries and languages around the world where the dominant influence of English as a global language is felt. Because of the status associated with the language, English words and expressions appear quite frequently in otherwise ‘pure’ (Spanish, Czech, Swedish, etc.) contexts. The lexicographer must determine whether to treat these items as loanwords that need explanation like any other word turning up in a corpus with sufficient frequency, or as instances of code switching that can safely be neglected. There is no simple answer to this problem, and the lexicographer must in each case carefully analyse whether the item shows signs of integration into the surrounding language, for example in the way the word is pronounced, inflected or used syntactically. The more established the word is in the new language, the more reasonable it is to include it in the dictionary. One should, however, be aware that practice varies significantly, as every country and language has its own cultural and political contexts and traditions. This can be a highly sensitive matter, especially in areas where a minority language has been historically dominated by a larger, perhaps colonial language.
3.3 Language Policy
Dictionaries and language policy can play important and active roles in contributing to the cultural identity and self-understanding of a young nation. Think of the status of Russian in the Baltic states, or of the role of dictionaries for minority languages such as Frisian, Basque, Irish, Sami, etc., where the attitude towards loanwords from the dominant language easily comes to carry political overtones. What in one context is viewed as linguistic purism may in another be interpreted positively as a sign of pride in the local language. In some countries much effort is spent in coining new words and expressions in the local language in order to avoid the influence from English or another dominant language. Such an undertaking is, of course, politically rather than linguistically motivated. From the language’s point of view, it doesn’t matter if the Icelandic word for a female flight attendant is the English loanword stewardess or flugfreyja (literally ‘flight-Freya’, after the goddess of love in Nordic mythology) or if the English word computer is used instead of the Icelandic coinage tölva (a contraction of tala ‘number’ and Völva, a soothsayer mentioned in the younger Edda). What is important, however, is that the language policy is actively supported by the population, whatever direction it takes. Otherwise it may lead to the absurd situation where the dictionary lists one set of words but you hear a totally different set when you visit the local pub.
Even if a solution is found for the descriptive problems discussed here, and even if the achievements of corpus linguistics have indisputably made life easier for the lexicographer in many ways, it should not be forgotten that a substantial amount of data found in dictionaries still cannot be verified empirically. Whether scent and perfume are synonyms and what their most appropriate equivalents are in French, or whether scumbag should be labelled ‘informal’, ‘derogatory’ or ‘slang’ are not questions that can be answered by checking against empirical evidence in a corpus. They are the result of the lexicographer’s evaluation based on his or her professional skill and linguistic perception. Writing a precise, informative and elegant definition is still an area where the human is superior to the computer.
Finally, the role of corpora and the limits to their use have been questioned in recent years. For a long time, corpus frequency has been unrivalled as the dominant criterion for lemma selection. But one could also ask: can it be taken for granted that the most frequent words are also the words that users want to look up? Traditionally, this question was difficult if not impossible to examine empirically. With the arrival of e-dictionaries, log-file analysis can provide valuable data. Those studies that have been carried out (Bergenholtz and Johnsen 2005; de Schryver et al. 2006) suggest that there is in fact little correlation between corpus frequency and look-up frequency. On the other hand, there is still a long way to go before we can predict which words will be looked up in a dictionary and which ones will not. It is simply an area where we have too little knowledge at present. Undoubtedly, it is a field that will attract more attention, not least because corpus-driven dictionaries are being put under pressure from user-driven tendencies. Future dictionaries may well use no-match lists from the log-files rather than corpus frequency as the main criterion for lemma selection.
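To make the idea of log-file analysis more concrete, the following minimal sketch compares corpus frequency with look-up frequency by means of a rank correlation and extracts a no-match list. It is an illustration only and does not reproduce the method of the studies cited above; the function names, the choice of Spearman's rank correlation and the toy data are all assumptions made for the example.

```python
from collections import Counter


def ranks(values):
    """Return average ranks (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def analyse_log(log_queries, headwords, corpus_freq):
    """Return (rank correlation, no-match list) for a hypothetical look-up log."""
    lookups = Counter(q.lower() for q in log_queries)
    # Queries that found no headword, most frequent first.
    no_match = Counter({q: n for q, n in lookups.items() if q not in headwords})
    shared = sorted(w for w in headwords if w in corpus_freq)
    rho = spearman([corpus_freq[w] for w in shared],
                   [lookups.get(w, 0) for w in shared])
    return rho, no_match.most_common()


# Toy data, invented for illustration only.
HEADWORDS = {"the", "of", "serendipity", "scumbag"}
CORPUS_FREQ = {"the": 60000, "of": 30000, "serendipity": 3, "scumbag": 5}
LOG = ["serendipity", "Serendipity", "scumbag", "selfie", "selfie", "the"]

rho, no_match = analyse_log(LOG, HEADWORDS, CORPUS_FREQ)
print(rho, no_match)
```

In this toy log the rare words are looked up more often than the frequent ones, so the correlation comes out negative, and the unmatched query becomes a candidate for a new entry. A real analysis would of course need a much larger log and some normalization of misspellings and inflected forms.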
4 Dictionary Writing Systems
Another area where computers have made life easier for lexicographers is the software they use for entering the lexicographic data into the database. Dedicated dictionary writing systems (DWS) help build the data structure and secure data consistency. They are designed to implement some of the decisions that would formerly be part of the style guide. By creating a Document Type Definition (DTD) or, more recently, an XML schema for the document in which the dictionary is being edited, the lexicographers can specify everything related to the document structure: what elements can be used, in what order they are allowed to occur, which elements may be used recursively, what content is possible (characters, images, sound or video clips), and what attributes an element can have. If an editor makes a mistake in attempting to store the article document, he or she is notified immediately and presented with the possible causes of schema violation. Cross-references are another traditional source of errors in dictionaries that can be handled by a DWS that binds and automatically tracks the source and targets of a reference. Again, if an editor deletes or changes either of the two, he or she will be notified and can take appropriate action.
Most DWSs offer various other features, such as: advanced search and statistics, preview settings, export or publishing modules, integration or interoperability with other bases and multi-user set-up. If the DWS has a login function, it can be used as a tool for the project management to keep track of article production and workflow in the various editorial phases. DWSs may be developed and tailored to meet the exact needs of a specific project, but there are also several off-the-shelf products available on the market that are sufficiently flexible to meet most customization needs.
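The two kinds of check described in this section, structural validation and cross-reference tracking, can be sketched as follows. This is a simplified illustration, not the implementation of any actual DWS: the element names, the content model in SCHEMA and the sample entries are hypothetical, and a production system would normally rely on a full DTD or XML Schema validator rather than the hand-written check shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: element -> ordered (child, min, max); None = unbounded.
SCHEMA = {
    "entry": [("headword", 1, 1), ("pos", 0, 1), ("sense", 1, None)],
    "sense": [("definition", 1, 1), ("example", 0, None), ("xref", 0, None)],
}


def validate(element):
    """Return a list of messages describing violations of SCHEMA, recursively."""
    errors = []
    model = SCHEMA.get(element.tag)
    if model is not None:
        children = [child.tag for child in element]
        pos = 0
        for name, lo, hi in model:
            count = 0
            # Consume consecutive children of the expected type, up to the maximum.
            while (pos < len(children) and children[pos] == name
                   and (hi is None or count < hi)):
                pos += 1
                count += 1
            if count < lo:
                errors.append(
                    f"<{element.tag}>: expected at least {lo} <{name}>, found {count}")
        if pos < len(children):
            errors.append(
                f"<{element.tag}>: unexpected <{children[pos]}> at position {pos + 1}")
    for child in element:
        errors.extend(validate(child))
    return errors


def dangling_xrefs(dictionary):
    """Return (source entry id, target) pairs whose target entry does not exist."""
    ids = {entry.get("id") for entry in dictionary.iter("entry")}
    return [(entry.get("id"), xref.get("target"))
            for entry in dictionary.iter("entry")
            for xref in entry.iter("xref")
            if xref.get("target") not in ids]


# Invented sample data: the second cross-reference points at a missing entry.
SAMPLE = """
<dictionary>
  <entry id="scent"><headword>scent</headword><pos>noun</pos>
    <sense><definition>a pleasant smell</definition><xref target="perfume"/></sense>
  </entry>
  <entry id="perfume"><headword>perfume</headword>
    <sense><definition>a fragrant liquid</definition><xref target="fragrance"/></sense>
  </entry>
</dictionary>
"""

root = ET.fromstring(SAMPLE.strip())
print(validate(root))
print(dangling_xrefs(root))
```

Running the checks whenever an editor saves an entry would give the immediate notification described above: a structurally incomplete entry yields a message naming the missing element, and a cross-reference to a deleted or renamed entry is reported together with its source.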
5 Data Access and Presentation
As mentioned earlier, there is a growing demand for dictionaries to be available in various channels and on several platforms. This implies that their contents must be presented to the user in different ways, as the possibilities in a printed dictionary are very different from those of an internet browser – the use of hyperlinks and audio/video clips are obvious examples. Likewise, the limited size of the screen constrains what can be displayed on smartphones and other mobile devices in comparison with a 24-inch desktop monitor. If the structure of the dictionary has been devised with sufficient care, it is possible to take the differences into account in the publishing phase. The function and aesthetics of layout and typography in general belong to a long and well-established tradition with obvious consequences for lexicography. However, readers are encouraged to explore for themselves the wealth of literature on the subject, as no more will be said about it here. Instead, we will look at a few selected themes and tendencies that have been the object of discussion in e-lexicography recently.
5.1 Flexible Data Presentation
The use of hyperlinks in a browser leads to different ways of navigating as compared with the two dimensions of a sheet of paper. On the computer screen, you can read from the top left to the lower right corner as on a book page but, in addition, you can also navigate ‘downwards’ by clicking on links that will expand an element on the page or take you to a different page. This has been exploited in e-dictionaries in various ways:
(1) by having different functionalities on different tabs which the user can shift between
(2) by letting the user choose between different contents according to a specified profile
(3) by letting the user expand or unfold certain information types by clicking a button or symbol.
Many lexicographers have seen this as the fulfilment of their dreams and have welcomed the digital possibilities with enthusiasm. Through hyperlinks they can now present to the user all and only the relevant information needed in a specific look-up situation. Consequently, a number of e-dictionaries have appeared that make use of the customization possibilities, ranging from fairly simple options (show more, show less) to highly elaborate user profiles (Trap-Jensen 2010; Verlinde 2010), whereas others have marketed the customized content as different dictionaries altogether (Bergenholtz 2011). In a way, it is the lexicographer’s dream, but it has turned out to have one serious disadvantage: so far, user studies have not been able to confirm that users take advantage of the possibilities offered to them. On the contrary, evidence suggests that they are not very good at analysing their own needs and the look-up situation they are in (Trap-Jensen 2010; Lorentzen and Theilgaard 2012). Lexicographers will have to respond to this challenge and find new ways of accommodating the users. One possible reaction could involve changing the focus from customization towards the use of adaptive technologies: instead of leaving it to the users to select the appropriate combination of data for a task, the dictionary could do so, adapting in line with the user’s previous search behaviour. This is in keeping with the service provided by Amazon and other companies that offer new items to their customers based on what they have bought earlier (cf. Rundell 2012: 23).
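The shift from customization to adaptation can be illustrated with a deliberately simple sketch. It assumes that the dictionary records which information types a user unfolds at each look-up and then shows by default those types that were unfolded in at least a given share of earlier look-ups. The class name, the list of information types and the two thresholds are hypothetical choices made for the example, not features of any existing e-dictionary.

```python
from collections import Counter

# Hypothetical information types an entry page can show or hide.
INFO_TYPES = ("definition", "examples", "collocations", "etymology", "pronunciation")
ALWAYS_SHOWN = frozenset({"definition"})


class AdaptiveProfile:
    """Derive a default entry view from a user's previous look-up behaviour."""

    def __init__(self, min_lookups=5, threshold=0.5):
        self.min_lookups = min_lookups  # look-ups needed before adapting at all
        self.threshold = threshold      # share of look-ups in which a type was unfolded
        self.lookups = 0
        self.expanded = Counter()

    def record_lookup(self, expanded_types):
        """Register one look-up and the information types the user unfolded."""
        self.lookups += 1
        self.expanded.update(set(expanded_types) & set(INFO_TYPES))

    def default_view(self):
        """Return the set of information types to show without a click."""
        if self.lookups < self.min_lookups:
            return set(ALWAYS_SHOWN)
        frequent = {t for t in INFO_TYPES
                    if self.expanded[t] / self.lookups >= self.threshold}
        return set(ALWAYS_SHOWN) | frequent


# Invented usage: a user who mostly unfolds examples.
profile = AdaptiveProfile()
for clicks in (["examples"], ["examples"], ["examples", "etymology"],
               [], ["examples"], []):
    profile.record_lookup(clicks)
print(sorted(profile.default_view()))
```

The point of the design is that the user never has to describe his or her own needs: the default view follows from observed behaviour, and the minimum number of look-ups keeps the dictionary from adapting on too little evidence.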
5.2 Crowdsourcing and Collaborative Lexicography
While everyone knows Wikipedia, few successful attempts have so far been made at creating dictionaries with content that is entirely user-generated (though see Wiktionary, www.wiktionary.org). This could change, of course, but it seems to be in keeping with a preference among crowdsourcing contributors for niche areas where they are experts. Thus user-driven dictionaries are more likely to be successful if they are directed towards a limited area (such as slang, neologisms, dialects, special subject fields) rather than towards general language vocabulary. Lexicographers should take advantage of this and welcome contributions from users. Most obvious are suggestions for new entries, where users can submit anything from a headword to a full entry proposal with sense divisions, definitions, collocations and authentic examples. User-involvement and interactivity in general are characteristic trends in internet behaviour, which can be incorporated in dictionaries in various forms: on social media, as blogs or forums, RSS feeds, questions and answers, comments and feedback on individual entries, etc. Also entertainment, gamification and dynamic content are features that cannot be dismissed as a mere whim of fashion, especially in learners’ dictionaries and other dictionaries aimed at the younger generation of digital natives.
6 Finding the Dictionary – the Future
The digital reality that we live in today is going to change the form and status of the dictionary, no doubt about it. The question is: how will it change? If we think of the analogue products of the not-so-distant past, the dictionary was a very concrete and tangible object: a physical book on a shelf. Faced with a linguistic problem, the user would have to make a deliberate choice and reach out for the dictionary that he or she thought would help solve the problem. This is not so in the digital era. Faced with a similar problem today, nine out of ten people do not turn to their favourite e-dictionary. Instead, they simply ask Google and they don’t care if the answer comes from a dictionary, a forum discussion or a newspaper article. One response from lexicographers to this challenge is search engine optimization: make sure your dictionary appears as early as possible on the Google result page. Another reaction is resource integration: provide the answer to the user where the problem occurs. Instead of turning to a completely different site, whether an e-dictionary or Google, the user looks the word up in the embedded dictionary via a shortcut (e.g. a double click) without leaving the site. This is already common in many e-readers but could be developed further, for instance as part of individual applications and sites or even as part of the computer’s operating system. Much more will be said about future dictionaries in Chapter 5.
The challenge for lexicography in digital times is that dictionaries will definitely change their appearance and most likely will lose status and run the risk of drowning in the profusion of other resources with which they compete for user attention. Whether this is viewed as a good or bad thing is more than anything a matter of individual inclination. For the pessimist, it may be a comfort that nothing suggests that the need for lexicographical data is diminishing.
References
Bergenholtz, H. (2011) Access to and presentation of needs-adapted data in monofunctional internet dictionaries. In: P. A. Fuertes-Olivera and H. Bergenholtz (eds) e-Lexicography: The Internet, Digital Initiatives and Lexicography. London and New York: Continuum, 30–53.
Bergenholtz, H. and Johnsen, M. (2005) Log files as a tool for improving internet dictionaries. Hermes 34, 117–41.
De Schryver, G.-M., Joffe, D., Joffe, P. and Hillewaert, S. (2006) Do dictionary users really look up frequent words? – on the overestimation of the value of corpus-based lexicography. Lexikos 16, 67–83.
Langebek, J. (1740) Plan for the Organisation of the Dictionary of the Royal Academy. Here quoted from L. Jacobsen and H. Juul-Jensen (1918) Preface to the Dictionary of the Danish Language, Vol. I, Copenhagen: Gyldendal Publishers.
Lorentzen, H. and Theilgaard, L. (2012) Online dictionaries – how do users find them and what do they do once they have? In: R. V. Fjeld and J. M. Torjusen (eds) Proceedings of the 15th EURALEX International Congress. Oslo: Department of Linguistics and Scandinavian Studies, University of Oslo, 654–60.
Lorentzen, H. and Trap-Jensen, L. (2011) There and back again – from dictionary to wordnet to thesaurus and vice versa: how to use and reuse dictionary data in a conceptual dictionary. In: I. Kosem and K. Kosem (eds) Electronic Lexicography in the 21st Century. New Applications for New Users (Proceedings of eLex 2011, Bled, 10–12 November 2011). Ljubljana: Trojina, Institute for Applied Slovene Studies, 175–9.
Molbech, C. (1859) Molbechs ordbog 1–2. Copenhagen: Gyldendalske Boghandlings Forlag.
Pedersen, B. S. et al. (2009) DanNet: the challenge of compiling a wordnet for Danish by reusing a monolingual dictionary. Language Resources and Evaluation, 43/3, 269–99.
Rundell, M. (2012) It works in practice but will it work in theory? The uneasy relationship between lexicography and matters theoretical. In: R. V. Fjeld and J. M. Torjusen (eds) Proceedings of the 15th EURALEX International Congress. Oslo: Department of Linguistics and Scandinavian Studies, University of Oslo, 47–92.
Trap-Jensen, L. (2010) One, two, many: customization and user profiles in internet dictionaries. In: A. Dykstra and T. Schoonheim (eds) Proceedings of the XIV Euralex International Congress, Leeuwarden. Fryske Akademy, 1133–43.
Verlinde, S. (2010) The base lexicale du français: a multi-purpose lexicographic tool. In: S. Granger and M. Pacquot (eds) eLexicography in the 21st Century: New Challenges, New Applications (Proceedings of eLEX 2009). Louvain-la-Neuve: Presses Universitaires de Louvain (Cahiers du CENTAL series), 325–34.
3.2
Methods in Dictionary Criticism
Kaoru Akasu
Chapter Overview
Introduction
Dictionary Criticism as it Stands
Suggestions for Improvement
Some Thoughts on the Proposals for Improvement
Dictionary Analysis
New Directions in Dictionary Analysis
Conclusion
1 Introduction
Dictionary criticism, or dictionary evaluation, is an area of lexicography that, through evaluations and appraisals, aims to contribute towards improving the quality of a dictionary or dictionaries or, for that matter, to help advance lexicography per se. Hartmann and James (1998: 85) noted, in the entry for lexicography, that ‘[t]here are as yet no internationally agreed standards of what constitutes a good dictionary’. Potentially, dictionary criticism does, however, have a crucial and pivotal role to play in transforming the current situation. In what follows, I will first touch upon some observations concerning dictionary criticism in order to find out where it stands. I will then take a brief look at the kinds of attempt made thus far to improve the situation as regards dictionary criticism, and I will have some comments to make about them. Finally, I will introduce the reader to what we call dictionary analysis, and elaborate on and present it as a reasonably practical and realistic, if not the best or ideal, method for conducting dictionary criticism.
2 Dictionary Criticism as it Stands
Hartmann (1996: 241) notes that ‘[e]valuating and assessing lexicographic products is a time-honoured activity’ and Dohi (1993: 22) also states that ‘the criticism of dictionaries of English is considered to have a history of well more than 200 years’. It is worth pointing out, however, that in the monumental work on the topic (Hausmann et al. 1989–91), a tome with more than 3,200 pages in 3 volumes, containing some 330 articles on a vast variety of topics of lexicography, there is only one article, Osselton’s (1989), that carries the phrase ‘dictionary criticism’ in its title. Osselton (1989: 229) observes, harshly but to the point:
[T]he criticism . . . reveals a surprising lack of interest in general principles, with incidental sniping taking place of any real exploration of the intentions with which the works being criticized had been set up. Omissions are lamented and superfluities condemned, but the whole basis for determining the nomenclature remains largely undiscussed. The near-total absence of concern for the semantic principles of definition is specially striking, and the topic of lemmatization is seldom raised. User-convenience is hardly an issue.
Although these comments of Osselton’s were made about some of the major academic, historical dictionaries such as OED, views of a similar tenor are echoed by many scholars. Hartmann (1996: 241), for instance, writes that dictionary criticism ‘has been beset by personal prejudice rather than noted for the application of objective criteria’. Dohi (1992: 6) commented earlier that ‘[d]ictionaries in the past do not seem to have attracted the critical attention they deserve, or the criticism has been made not from a substantial but from a superficial, nitpicking point of view’. Hartmann and James (1998: 53) note, in the entry for evaluation, that ‘[a] systematic framework for formulating criteria with respect to COVERAGE, FORMAT, SCOPE, SIZE, TITLE etc. has yet to be developed’. All of these observations point to the fact that dictionary criticism leaves a great deal to be desired and that there is an acute need to ‘establish a sound and rigorous basis on which to conduct the criticism, together with a set of applicable criteria’, as Jackson (2002: 173) notes. The need for more research into dictionary criticism itself and for more objective evaluation criteria is obvious.
3 Suggestions for Improvement
It is worth noting that a wide variety of attempts have been made to introduce or set up applicable criteria that may be used in the actual implementation of dictionary criticism. I will touch upon some of the attempts, but I hasten to add that I do not mean to give a definitive, final answer to the question of exactly what the specific evaluation criteria should be, for reasons that will be given later.
In his review of five American college dictionaries, McMillan (1949: 214) wrote that ‘[t]hese dictionaries¹ can be compared by evaluating (1) the quantity of information, (2) the quality of the information, and (3) the effectiveness of presentation’ and went on to say that ‘[t]he quality of the definition in a dictionary can be judged by various criteria: accuracy, completeness, clearness, simplicity, and modernity’ (McMillan 1949: 218).
Steiner (1984) provides ‘a checklist for reviewers of bilingual dictionaries’, comprising three major categories, each with a number of subcategories, which goes as follows (Steiner 1984: 168–78):
(I) The degree of inclusiveness:
A. The degree of inclusiveness of lexical elements
B. The degree of inclusiveness of nonlexical elements
(II) Problems involving either content or organization, or both:
A. Bias and prejudice shown in the lexical or the nonlexical elements
B. Glossing the entry word only by substitutable translation equivalents
C. The degree to which the user is afforded meaning discrimination and the method by which it is provided
D. Is the dictionary monodirectional or bidirectional?
E. The establishment of standards for equivalents and the search for new equivalents
F. The faithful reversibility of the two sides of the dictionary
G. The amount of information given by the orthography adopted in the dictionary
(III) Problems involving organization:
A. The consistency of the alphabetization
B. The feminine form of an adjective as a noun
C. The number of alphabetical lists
D. Under which entry word is an idiomatic expression to be entered?
E. The order in which the part-of-speech function is treated
F. The order of meanings
G. Words of different origin and/or meaning of the same spelling
H. The degree to which the typography implements the goal of lexicographer
I. Convenient arrangement of the book.
Some subcategories break down into further subparts.
Kister’s (1992) contribution, which may be said to be a little different in nature in that it is a buyer’s guide, dealing with an overwhelming number of dictionaries, gives ‘[t]wenty points to consider when choosing a dictionary’ (Kister 1992: 64–73):
1. Does the dictionary provide the level of vocabulary coverage you need?
2. Are the dictionary’s contents clear and readable?
3. Is the dictionary produced by reputable people?
4. Is the dictionary reasonably current?
5. Are the dictionary’s definitions thorough, accurate, precise, and objective?
6. Does the dictionary include etymologies and, if so, are they relatively easy to understand?
7. Does the dictionary include illustrative quotations or examples and, if so, are they effective?
8. Does the dictionary include pictorial illustrations and, if so, are they effective?
9. Does the dictionary include synonyms and antonyms and, if so, how extensive are they?
10. Does the dictionary include variant spellings and pronunciations?
11. Is the dictionary’s pronunciation system reasonably precise and not overly complicated?
12. Does the dictionary furnish adequate usage notes and labels?
13. Does the dictionary emphasize American or British English?
14. Does the dictionary offer any special or unique lexical features?
15. Does the dictionary include any useful nonlexical (or encyclopedic) material?
16. Are the dictionary’s page layout and typography appealing to the eye?
17. Is the dictionary physically well made?
18. Is the dictionary fairly priced?
19. What do knowledgeable critics say about the dictionary?
20. How well does the dictionary measure up to its major competitors?
All of these points are interesting and legitimate questions for prospective buyers. However, in academic or scholarly reviews of dictionaries, we do not take up some questions, such as 17 and 18 above, as they are aimed at a different audience. This is also part of the reason why I am of the opinion that there should be a distinction drawn between specialist reviews and journalistic ones.²
Nakamoto’s (1994) paper, which, in my view, is among the most in-depth and thoroughgoing pieces of work dealing with evaluation criteria, attempts to provide a comprehensive checklist for reviewing monolingual English dictionaries for foreign learners.
Nakamoto (1994: 16) states that the checklist ‘consists of four parts: (a) checkpoints about the macro-structure and micro-structure of the dictionary reviewed (to be abbreviated to [DA] and [DI] hereafter), (b) those about the review ([R]), (c) those about the critic(s) ([C]), and (d) those about the influence ([I])’. All checkpoints are ‘written in question form so that the critic can draft a review systematically by answering these questions one by one’ (Nakamoto 1994: 2). I refrain from listing all the specific checkpoints that he has produced because there are so many of them: [DA] has 27 items in it, [DI] 26, [R] 21, [C] 13, and [I] just 2. To give a few examples for each category, [DA] includes questions such as ‘Is there any front-matter article?’, ‘Who are the intended users?’, and ‘What is the general entry structure?’, and [DI] questions like ‘How are compounds shown?’, ‘How are the senses presented?’, and ‘Is any pragmatic information provided?’ In [R] are included such questions as ‘Who are the intended readers of the review?’, ‘Which features of the dictionary are reviewed?’, and ‘Is it a descriptive or evaluative review?’. [C] contains questions such as ‘Who is the reviewer?’, ‘What does the critic review the dictionary for?’, and ‘What experience does the reviewer have of dictionary reviewing?’ Lastly, the checkpoint questions in [I] are ‘Has the review changed the reviewed dictionary in its subsequent printing(s) and/or edition(s)?’ and ‘Has the review influenced dictionary making?’ (Nakamoto 1994: 16–44).
Bogaards (1996) made a critical appraisal of the so-called big four dictionaries that came out in the year 1995, that is OALD5, LDOCE3, COBUILD2 and CIDE. In his review, Bogaards developed a set of criteria for evaluating EFL dictionaries. What should be noted is that his set of criteria is divided into two main parts: that of RECEPTION and that of PRODUCTION. I will have something to say about this later on.
Chan and Taylor’s (2001) survey is very interesting in that it is a review of various aspects of a number of (selected) dictionary reviews. Accordingly, it may be said that their work is not directly involved in setting up evaluation criteria, but it does have an indirect bearing on it. Their findings suggest that ‘most dictionary reviews are factual and descriptive rather than evaluative, and only in some cases is the evaluation based on a principled study of any kind’ (Chan and Taylor 2001: 163). They also suggest that ‘they [such reviews] should be evaluative and that at least part of the evaluation should be based on a study of the use of the dictionary by target users’ (ibid.).
As the title of his paper indicates, Swanepoel (2008) is still more interesting because he has made a first attempt to develop ‘a general framework for the description and evaluation of dictionary evaluation criteria, using parameters from research on dictionary criticism and the usability of websites’ (Swanepoel 2008: 207). He concludes by providing, at the end of the article, a framework consisting of the following four parameters: ‘Information covered by the evaluation criteria’; ‘Presentation format of the evaluation criteria’; ‘Validity of the evaluation criteria’; and ‘Application of the evaluation criteria’. Each parameter has a few subcategories. He goes on to say that ‘to be usable, the evaluation criteria themselves will have to meet the following evaluation criteria: be explicitly formulated, valid/motivated, generally acceptable, and the evaluative concepts on which they are based will have to be clearly defined and operationalized’ (ibid.: 228).
There are many more articles and books dealing with evaluation criteria that I have failed to touch upon here. This does not imply in any way that these other works are not worthy of note or worth considering, but let us put an end to this survey and move on to the next section.
4 Some Thoughts on the Proposals for Improvement
Let us pause for a moment to consider some of the issues involved in or implied by studies of evaluation criteria such as those I mentioned above.
First, let us think about the possibility of establishing comprehensive criteria that could be applied across the board, that is, to all dictionary types or genres. From an idealistic point of view, it would be convenient to have such a ‘common yardstick’. Let us name the comprehensive tool, applicable to all dictionaries, ‘Common Yardstick’ with capital letters. My question is, however, whether we would ever want to compare, for instance, a scholarly historical dictionary such as OED with a learner’s dictionary like OALD or LDOCE. I suppose not. As suggested by the fact that most investigations of evaluation criteria mentioned above have focused on one type of dictionary or another, it would be sensible and practical to set up evaluation criteria by defining or demarcating, early on, the dictionary type or genre of the dictionaries to be reviewed. Consequently, a ‘common yardstick’ rather than the Common Yardstick will be more easily accessible and available, and it will actually suffice insofar as dictionaries under examination are of a particular type.
Take, for example, the number of words entered in dictionaries as one aspect of dictionary features to be compared. Were it the sole purpose of the dictionary to provide headwords, say, for the game of Scrabble, then the yardstick that we would be looking for would be quite simple: the more the better. That is not the case, however. Dictionaries have multiple purposes and functions that are reflected in their structure. Therefore, dictionary criticism will have to be conducted in such a way that reviewers take this multilayered complexity into account. Put another way, dictionaries cannot be reviewed in such a simplistic manner. This is another reason for not seeking the comprehensive Common Yardstick.
As for the distinction between receptive and productive functions as illustrated by Bogaards (1996), it is, indeed, a useful as well as meaningful dichotomy in many ways.
Consider native-speaker dictionaries like COD, however. It is questionable whether the targeted users, that is, native speakers of English, wish to have such grammatical information as ‘countable’ or ‘uncountable’ for nouns, ‘attributive’ or ‘predicative’ for adjectives, or ‘complementation patterns’ for verbs. So, the kind of information needed for dictionaries of one particular genre obviously differs significantly from that required of dictionaries of another genre. We should remember that Bogaards’ detailed review focused on the big four, monolingual learners’ dictionaries of English rather than native-speaker dictionaries. Accordingly, the adoption of the dichotomy is adequate, effective, and valid as far as learners’ dictionaries are concerned, whereas it may not be so for other types of dictionary such as native-speaker ones. This would constitute yet another reason for not seeking the Common Yardstick.
If we do, however, press forward with a quest for the Common Yardstick, we will find ourselves compelled to raise the degree or level of abstraction of evaluation criteria so that the Yardstick would encompass all relevant properties of different criteria. We would eventually end up with something like accuracy, adequacy, consistency, correctness and the like. Yet, the difficulty in their actual applicability or practicality lies precisely in the fact that these concepts are highly abstract. It would seem that a further attempt to provide a clear and definite answer to the question of what constitutes the Common Yardstick lies outside the scope of this chapter.
5 Dictionary Analysis
In this section, I would like to introduce the reader to what we call ‘dictionary analysis’. By ‘we’ I refer to those active members of the Iwasaki Linguistic Circle (abbreviated to ILC hereafter) who, like myself, have been involved in the making of dictionary analyses.3 Dictionary analysis is put forward here as one of the suitable or promising candidates for a reasonably practical method or measure to carry out dictionary criticism. My belief is that this particular type of analysis deserves more attention and recognition among people concerned with dictionary criticism in particular and, for that matter, lexicography in general, both theoretical and practical. As I mentioned in Akasu (2007), the first dictionary analysis of its kind was published back in 1968. There have been more than 40 articles of dictionary analysis published in Lexicon, the mouthpiece of the ILC, and other journals.4 I will consider, as an example, the following article in Lexicon that came out in 2000: ‘An Analysis of The New Oxford Dictionary of English’ (abbreviated to NODE, hereafter). In what follows, special attention will be given to the methodological aspects of the analysis. First, I was asked to be the head of the reviewing team. Thus, this was going to be a ‘team review’, in accordance with Chapman (1977), with each reviewer being a specialist in some area.5 My next task was to decide which dimensions of NODE should be examined. This would determine exactly how many other analysts were needed. The following six dimensions were chosen: headwords, pronunciation, definition, illustrative examples, grammatical information and etymology. Dimensions like these would, of course, vary in number and type according to the kind of dictionary to be analysed. I will take up, by way of illustration, Section Four of the article titled ‘Sense description’, the section that I was in charge of.
For reasons of brevity, I cannot give a detailed account of all sections in Akasu et al. (2000), but, basically, the same policy and principles of analysis run through all the other sections as well, so one may safely say that an account of the features as seen in one section will be sufficient for clarification. The section in question is composed of ten subsections, beginning with ‘Introductory remarks’, part of which goes as follows: ‘[T]he sense description of NODE will be examined from a number of aspects. First, the core sense and
subsense structure will be considered. Then, specific entries will be looked into according to their types and, in so doing, reference will be made, where appropriate, to the division, arrangement, and presentation of the senses of words entered. Lastly, usage labels will be discussed’ (Akasu et al. 2000: 71). The remaining subsections included are: ‘Core senses and subsenses’, ‘Common words’, ‘Ergative verbs’, ‘Phrasal verbs’, ‘Encyclopedic and specialist entries’, ‘Function words’, ‘Derivatives’, ‘Coverage’, and ‘Labels’. As is clear from the list above, different types of entries are the subject of study in the analysis. It should be noted, in this connection, that this is a comparative review, in that corresponding entries of such dictionaries as CED4 and CD were referred to in investigating the relevant entries of the dictionary under review.6 Other dictionaries including COD9, COBUILD2 and LDOCE3 were also consulted where appropriate. Coverage refers to the areas of meaning of words covered in the dictionary. In examining coverage, the following eight pages of NODE were selected: 100–1, 600–1, 1100–1 and 1600–1. It should be noted that this is a case of random sampling to make an objective survey. There were 249 headwords in all, and then the corresponding entries of CED4 were closely compared. It was found that CED4 covers a slightly wider range of meaning. As for labels used, a survey on the same pages mentioned above was carried out. Obviously, these are pieces of quantitative research. In contrast, I looked into a number of particular entry words qualitatively to examine the core sense and subsense structure, common words and so on. Thus, this whole analysis turns out to be a combination of qualitative and quantitative studies. 
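The page selection described above can be sketched in a few lines. This is only an illustration under a stated assumption: the regular spacing of the page numbers suggests four two-page spreads taken at a fixed interval of 500 pages, but the article does not say how the pages were drawn. The function name and its parameters are hypothetical.

```python
from typing import List, Tuple


def page_spreads(first_page: int, interval: int, count: int) -> List[Tuple[int, int]]:
    """Return `count` two-page spreads, `interval` pages apart.

    Hypothetical helper: the parameters are inferred from the page numbers
    quoted in the text, not taken from the article itself.
    """
    return [
        (first_page + i * interval, first_page + i * interval + 1)
        for i in range(count)
    ]


spreads = page_spreads(first_page=100, interval=500, count=4)
pages = [page for spread in spreads for page in spread]

print(spreads)           # [(100, 101), (600, 601), (1100, 1101), (1600, 1601)]
print(len(pages))        # 8
print(249 / len(pages))  # 31.125 headwords per sampled page, on average
```

A fixed-interval selection of this kind spreads the sample evenly across the alphabet, which is one plausible reason for choosing pages in this way.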
Incidentally, Nakao (1972: 53), one of the original members of the ILC, noted that ‘it is important to point out problems of a dictionary under review by surveying or analysing it from a holistic point of view, in addition to criticism of specific entry problems’ (my translation), which is a telltale sign that he knew early on that a well-balanced review was necessary because only that could give rise to improvement in dictionaries. One more point to be noted about our dictionary analysis is that all members of the reviewing team are practical as well as theoretical lexicographers. By practical lexicographers I mean that they are experienced lexicographers in that they have had, in one way or another, a substantial as well as active role to play in the compilation of different dictionaries. My conviction is that a reviewer of dictionaries should have had at least some experience of writing a dictionary before doing a review. The knowledge should certainly give the reviewer some idea of what is involved in actual dictionary making in real terms, and this in turn would help him or her to make a realistic and reasonable appraisal or to make an informed, sensible judgement, without going too far, about the work under review. Consider the following quotation from Rundell (1998: 316): ‘[I]mprovement can be said to take place when: the description of a language that a dictionary provides corresponds more closely to reliable empirical
evidence regarding the way in which that language is used; the presentation of this description corresponds more closely to what we know about the reference needs or reference skills of the target user’. While this statement is sensible and sound, I suspect that Rundell’s perspicacity as demonstrated in the statement above comes, partly at least, from his own experience as a practical lexicographer, which does seem to make a difference.
6 New Directions in Dictionary Analysis
The importance or significance of user research has already been recognized among lexicographers, and our dictionary analysis has also started to incorporate it.7 To be specific, Dohi et al. (2002) were the first to include user research as part of the dictionary analysis. The research consisted of three forms of enquiry: a questionnaire, a written test and interviews. Ishii (2010), another member of the ILC, has offered a promising new approach. He has built a database or special corpus of the full texts of CALD3, COBUILD6, LDOCE5, MEDAL2 and OALD7, and made an attempt to compare the readability of definitions and illustrative examples employed in each of the dictionaries above. In so doing, he used the Flesch Reading Ease Score and the Flesch-Kincaid Grade Level as measures for calculating the specific level of readability. What should be noted in this context is the fact that this is a quantitative study that might replace the random sampling method mentioned earlier and that could allow us to conduct a full and thoroughgoing investigation, resulting in still more accurate, objective and reliable data.
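For readers unfamiliar with the two readability measures named above, the following sketch shows how they are computed from word, sentence and syllable counts. The two formulas are the standard published ones. The syllable counter and the regex-based tokenization are crude assumptions made for illustration, and nothing here is meant to reproduce the procedure Ishii actually used. The sample definition is invented.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def rough_syllables(word: str) -> int:
    """Crude heuristic (an assumption, not part of either formula):
    count runs of vowel letters, with a minimum of one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_counts(text: str):
    """Return (words, sentences, syllables) using simple regex splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return len(words), len(sentences), sum(rough_syllables(w) for w in words)


# Invented sample definition, used only to exercise the functions.
definition = "A book that lists words and explains what they mean."
w, s, syl = text_counts(definition)
print(w, s, syl)
print(round(flesch_reading_ease(w, s, syl), 1))
print(round(flesch_kincaid_grade(w, s, syl), 1))
```

Applied to every definition and example in a full-text corpus, scores of this kind can be averaged per dictionary, which is what makes a whole-dictionary comparison feasible without sampling.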
7 Conclusion
We saw in this chapter that the business of dictionary criticism is still under development and that the setting up of applicable evaluation criteria is the key issue in its refinement and improvement. I argued that the establishment of those criteria may be encouraged or facilitated by delimiting or delineating the relevant dictionary type or types. For reviewing dictionaries comparatively, I presented dictionary analysis as an effective as well as a workable method, subsuming all suggestions put forward by Chapman (1977). I do not claim that this kind of analysis is the best or ideal method, but it has certainly provided us with findings of special and professional interest and revealing insight. I may have sounded pessimistic about the setting up of the Common Yardstick because I gave up on it. I just chose to be realistic and practical. I hasten to add that we are not content, nor should we be, with the way dictionary analysis stands now. It is about time dictionary analysis itself was reviewed in
a new light and, in this connection, the two new approaches introduced in the preceding section are welcome additions in that direction. One final note of caution. The business of dictionary criticism might lead dictionary reviewers into a risky position where their work could induce some form of uniformity or at least a propensity for it in the design features of dictionaries. In fact, back in 1995, the celebrated ‘big four’ dictionaries did have their own distinct, characteristic features, but it would seem that the current ‘big five’, with MEDAL added in 2002, have now lost some of their distinctive identity and have begun to show an increasing resemblance to one another in at least some of their features, though the widespread use of corpora may have been a contributing factor. Similar concerns are voiced by some lexicographers such as Yamada (2010). That we do not want to see happen.
Notes
1. ‘These dictionaries’ refers to The American College Dictionary (1948), New College Standard Dictionary (1947), Macmillan’s Modern Dictionary, rev. edition (1947), Webster’s Collegiate Dictionary, 5th edition (1941) and The Winston Dictionary, College edition (1946).
2. See Svensén (2009) for the distinction between specialist reviews and journalistic reviews. It is to be noted, also, that Swanepoel (2008: 214) mentions that ‘Ripfel (1989) distinguishes between evaluation criteria used in journalistic reviews and those used in expert reviews’. I hasten to add that I am well aware that such features as the physical quality of dictionaries and their pricing do matter a great deal when it comes to deciding which dictionary to buy. I stress the fact, however, that these features are not peculiar to dictionaries but apply to other kinds of publication as well.
3. The Iwasaki Linguistic Circle, or ‘Iwasaki Kenkyukai’ in Japanese, is a group of linguists and lexicographers, theoretical and practical, based in Tokyo, Japan. The ILC has been publishing a journal called Lexicon once a year for the past 40 years now, which is characterized by the fact that each issue contains one or two articles of dictionary analysis. For a fuller account of it, see Akasu (2007). An electronic version of the article is available at the following site: http://kdictionaries.com/kdn/kdn15/kdn1504-akasu.html.
4. ‘Other journals’ include International Journal of Lexicography, in which there are two reviews conducted and contributed by some ILC members. One, which appeared in 1992, was of the 8th edition of The Concise Oxford Dictionary and the other, published in 1994, was of the 2nd edition of Longman Dictionary of the English Language. Although these two reviews are considerably shorter, due to space limitations of IJL reviews, than the kind of dictionary analysis found in Lexicon, they do retain the characteristic features of ILC’s dictionary analysis. It might be a good starting point for interested readers to take a look at these, that is Higashi et al. (1992) and Masuda et al. (1994), in order to get some idea of what dictionary analysis is like.
5. It should be underlined that the very first dictionary analysis mentioned earlier, which was of (the 1st edition of) The Penguin English Dictionary, had been performed, with subsequent analyses following one after another, long before Chapman (1977: 158) made this suggestion ‘toward a still better method’.
6. Comparative reviews may be divided into two major types: diachronic or longitudinal and synchronic or, if you will, ‘latitudinal’ ones. The former includes comparisons of the type where the current edition of a dictionary is compared with its previous edition or editions, for example a comparison of COD12 with COD11 or of LDOCE5 with LDOCE4 and/or still earlier editions. Our analysis of NODE belongs to the latter type. There are a number of other subtypes in this category. To give just a few examples, a dictionary of one variety of a language may be compared with its counterpart of another, for example a comparison of NODE with NOAD or of OALD with OAAD. A comparison can also be made between a ‘senior’ dictionary and its ‘junior’ version, for example a comparison between COD and POD or CALD and CLD.
7. The following is an oft-quoted passage from Johnson’s celebrated Plan: ‘The value of a work must be estimated by its use: It is not enough that a dictionary delights the critic, unless at the same time it instructs the learner’ (Johnson 1747: 5). Truer words were never spoken.
References
Dictionaries
CALD: Cambridge Advanced Learner’s Dictionary, 1st edition 2003, Cambridge: Cambridge University Press.
CALD3: Cambridge Advanced Learner’s Dictionary, 3rd edition 2008, Cambridge: Cambridge University Press.
CD: The Chambers Dictionary, 1998, Edinburgh: Chambers Harrap.
CED4: Collins English Dictionary, 4th edition 1998, Glasgow: HarperCollins.
CLD: Cambridge Learner’s Dictionary, 1st edition 2001, Cambridge: Cambridge University Press.
COBUILD2: Collins COBUILD English Dictionary, New edition 1995, London: HarperCollins.
COBUILD6: Collins COBUILD Advanced Dictionary of English, 6th edition 2009, Glasgow: HarperCollins.
COD: The Concise Oxford Dictionary of Current English, 1st edition 1911, Oxford: Clarendon Press.
COD9: The Concise Oxford Dictionary of Current English, 9th edition 1995, Oxford: Oxford University Press.
COD11: Concise Oxford English Dictionary, 11th edition 2004, Oxford: Oxford University Press.
COD12: Concise Oxford English Dictionary, 12th edition 2011, Oxford: Oxford University Press.
LDOCE: Longman Dictionary of Contemporary English, 1st edition 1978, Harlow: Longman.
LDOCE3: Longman Dictionary of Contemporary English, 3rd edition 1995, Harlow: Longman.
LDOCE4: Longman Dictionary of Contemporary English, 4th edition 2003, Harlow: Pearson Education Limited.
LDOCE5: Longman Dictionary of Contemporary English, 5th edition 2009, Harlow: Pearson Education Limited.
MEDAL: Macmillan English Dictionary for Advanced Learners, 1st edition 2002, Oxford: Macmillan Education.
MEDAL2: Macmillan English Dictionary for Advanced Learners, 2nd edition 2007, Oxford: Macmillan Education.
NOAD: The New Oxford American Dictionary, 1st edition 2001, New York: Oxford University Press.
NODE: The New Oxford Dictionary of English, 1st edition 1998, Oxford: Clarendon Press.
OAAD: Oxford Advanced American Dictionary for Learners of English, 2011, Oxford: Oxford University Press.
OALD: (Advanced) Learner’s Dictionary of Current English, 1st edition 1948, London: Oxford University Press.
OALD7: Oxford Advanced Learner’s Dictionary of Current English, 7th edition 2005, Oxford: Oxford University Press.
OED: The Oxford English Dictionary, 1st edition 1884–1933, Oxford: Clarendon Press.
POD: The Pocket Oxford Dictionary of Current English, 1st edition 1924, Oxford: Clarendon Press.
Other Works
Akasu, K. (2003) Dictionary analyses in Lexicon revisited, a paper presented at the Third Asialex Biennial International Conference, Meikai University, Urayasu, Chiba, Japan, 27 August 2003.
Akasu, K. (2005) A glimpse into dictionary analyses, a paper presented at the meeting of the English Language Postgraduate Seminar, University of Birmingham, UK, 18 November 2005.
Akasu, K. (2007) The Iwasaki Linguistic Circle and dictionary analysis, Kernerman Dictionary News 15, 5–11.
Akasu, K. (2008) An analysis of A Valency Dictionary of English: A Corpus-based Analysis of the Complementation Patterns of English Verbs, Nouns and Adjectives. Lexicon 38, 12–28.
Akasu, K. (2012) The first dictionary of English collocations in Japan. In: J. Szerszunowicz (ed.) Research on Phraseology in Europe and Asia: Focal Issues of Phraseological Studies, Vol. 1. Bialystok: University of Bialystok Publishing House, 45–56.
Akasu, K. and Uchida, S. (eds) (2011) Lexicography: Theoretical and Practical Perspectives, Proceedings of the Seventh ASIALEX Biennial International Conference. Kyoto: The Asian Association for Lexicography.
Akasu, K., Koshiishi, T., Makino, T., Kawamura, A. and Asada, Y. (2005) An analysis of Cambridge Advanced Learner’s Dictionary. Lexicon 35, 127–84.
Akasu, K., Saito, H., Kawamura, A., Kokawa, T. and Hotta, R. (2001) An analysis of the Oxford Advanced Learner’s Dictionary of Current English, 6th edition. Lexicon 31, 1–51.
Akasu, K., Koshiishi, T., Matsumoto, R., Makino, T., Asada, Y. and Nakao, K. (1996) An analysis of Cambridge International Dictionary of English. Lexicon 26, 3–76.
Akasu, K., Nakamoto, K., Saito, H., Asada, Y., Urata, K. and Omiya, K. (2000) An analysis of The New Oxford Dictionary of English. Lexicon 30, 53–117.
Bogaards, P. (1996) Dictionaries for learners of English. International Journal of Lexicography 9/4, 277–320.
Chan, A. and Loong, Y. (1999) Establishing criteria for evaluating a learner’s dictionary. In: R. Berry, B. Asker, K. Hyland and M. Lam (eds) Language Analysis, Description and Pedagogy. Hong Kong: Hong Kong University of Science and Technology, 298–307.
Chan, A. Y. W. and Taylor, A. (2001) Evaluating learner dictionaries: What the reviewers say. International Journal of Lexicography 14/3, 163–80.
Chapman, R. L. (1977) Dictionary reviews and reviewing: 1900–1975. In: J. C. Raymond and I. W. Russell (eds) James B. McMillan: Essays in Linguistics by his Friends and Colleagues. Alabama: University of Alabama Press, 143–61.
Dohi, K. (1992) A note on dictionary criticism. LEXeter Newsletter 10, 6–7. Exeter: Dictionary Research Centre.
Dohi, K. (1993) The need for dictionary criticism. Toyoko English Studies 2, 21–32.
Dohi, K., Shimizu, A., Osada, T., Komuro, K., Kanazashi, T., Isozaki, S. and Urata, K. (2002) An analysis of Longman Advanced American Dictionary. Lexicon 32, 1–96.
Fontenelle, T. (2008) Practical Lexicography: A Reader. Oxford: Oxford University Press.
Hartmann, R. R. K. (1996) Lexicography as an applied linguistic discipline. In: R. R. K. Hartmann (ed.) Solving Language Problems: From General to Applied Linguistics. Exeter: University of Exeter Press, 230–44.
Hartmann, R. R. K. (1998) Contemporary lexicography, with particular attention to the user’s perspective. Lexicon 28, 141–7.
Hartmann, R. R. K. (2001) Teaching and Researching Lexicography. Harlow: Pearson Education.
Hartmann, R. R. K. and James, G. (1998) Dictionary of Lexicography. London: Routledge.
Hausmann, F. J., Reichmann, O., Wiegand, H. E. and Zgusta, L. (eds) (1989–91) Wörterbücher, Dictionaries, Dictionnaires: Ein Internationales Handbuch zur Lexikographie, Vols 1–3. Berlin: Walter de Gruyter.
Higashi, N., Dohi, K. and Akasu, K. (1988) BBI eigo rengo katsuyo jiten no bunseki [An analysis of The BBI Combinatory Dictionary of English]. Lexicon 17, 43–124.
Higashi, N., Takebayashi, S., Nakao, K., Sakurai, M., Yamamoto, F., Masuda, H. and Yawata, S. (1992) A review of The Concise Oxford Dictionary of Current English. International Journal of Lexicography 5/2, 129–60.
Ishii, Y. (2010) Shuyo EFL jisho ni okeru yorei no goi-reberu hikaku [Comparison of vocabulary levels of illustrative examples in major EFL dictionaries], a paper presented at JACET workshop held at Toyo University, 27 March 2010.
Ishii, Y. (2011) Comparing the vocabulary sets used in the ‘big five’ English monolingual dictionaries for advanced EFL learners. In: K. Akasu and S. Uchida (eds), 180–9.
Ito, F., Nakao, K., Takebayashi, S. and Watanabe, M. (1968) The Penguin English Dictionary no bunseki [An Analysis of The Penguin English Dictionary]. Denki Tsushin Daigaku Gakuho [Reports of the University of Electro-Communications] 24, 179–211.
Jackson, H. (2000) Dictionary criticism, a paper presented at InterLex 14, University of Exeter, 10 April 2000.
Jackson, H. (2002) Lexicography: An Introduction. London: Routledge.
Johnson, S. (1747) The Plan of a Dictionary of the English Language. London: J. and P. Knapton (reprinted in Fontenelle 2008).
Kincaid, J. P., Fishburne Jr., R. P., Rogers, R. L. and Chissom, B. S. (1975) Derivation of new readability formulas (Automated readability index, Fog count and Flesch reading ease formula) for navy enlisted personnel. In: Research Branch Report. Millington, TN: Naval Technical Training, U. S. Naval Air Station, Memphis, TN, 8–75.
Kister, K. F. (1992) Kister’s Best Dictionaries for Adults and Young People: A Comparative Guide. Phoenix, AZ: Oryx Press.
Masuda, H., Takebayashi, S., Akasu, K., Yamamoto, F. and Nakao, K. (1994) A review of Longman Dictionary of the English Language. International Journal of Lexicography 7/1, 31–46.
McMillan, J. B. (1949) Five college dictionaries. College English 4, 214–21.
Nakamoto, K. (1994) Establishing criteria for dictionary criticism: a checklist for reviewers of monolingual English learners’ dictionaries. An unpublished MA dissertation submitted to the University of Exeter.
Nakamoto, K. (1995a) A checklist for reviewers of EFL dictionaries: checkpoints about the review. Lexicon 25, 1–13.
Nakamoto, K. (1995b) A checklist for reviewers of EFL dictionaries: checkpoints about the dictionary under review. In: S. Takahashi, K. Asao, and R. Matsumoto (eds) In Honor of Nobuyuki Higashi: Papers Contributed on the Occasion of his Sixtieth Birthday September 4, 1995. Tokyo: Kenkyusha, 16–35.
Nakamoto, K. (1998) An analysis of ILC’s ‘dictionary analysis’. Lexicon 28, 28–38.
Nakao, K. (1972) Jisho no chosa/bunseki: sono igi to mondaiten [Survey and analysis of dictionaries: its significance and problems]. Lexicon 1, 49–57.
Osselton, N. E. (1989) The history of academic dictionary criticism with reference to major dictionaries. In: F. J. Hausmann et al. (eds), Vol. 1, 225–30.
Ripfel, M. (1989) Wörterbuchkritik. Eine empirische Analyse von Wörterbuchrezensionen (Lexicographica. Series Maior 29). Tübingen: Max Niemeyer.
Rundell, M. (1998) Recent trends in English pedagogical lexicography. International Journal of Lexicography 11/4, 315–42.
Steiner, R. J. (1984) Guidelines for reviewers of bilingual dictionaries. Dictionaries 6, 166–81.
Steiner, R. J. (1993) Reviews of dictionaries in learned journals in the United States. Lexicographica 9, 158–73.
Svensén, B. (2009) A Handbook of Lexicography: The Theory and Practice of Dictionary-Making. Cambridge: Cambridge University Press.
Swanepoel, P. (2008) Towards a framework for the description and evaluation of dictionary evaluation criteria. Lexikos 18, 207–31.
Yamada, S. (2010) EFL dictionary evolution: innovations and drawbacks. In: I. J. Kernerman and P. Bogaards (eds) English Learners’ Dictionaries at the DSNA 2009. Tel Aviv: KDictionaries, 147–68.
3.3
Researching Users and Uses of Dictionaries
Hilary Nesi
Chapter Overview
Introduction
Methods
Who Are the Users of Dictionaries?
What Kinds of Activity Prompt Dictionary Use?
What Kinds of Dictionary do Users Prefer to Use?
What Kinds of Information Do Dictionary Users Look for, and What Consultation Strategies Do They Employ?
Conclusion
1 Introduction
Dictionary use was not a popular research topic until fairly recently. Welker (2010) summarizes 320 empirical dictionary use studies, but only 6 of these were conducted before 1980. In the 1980s there was an upsurge of interest, however, and an increasing number of studies have taken place in each decade since then, in an ever-wider range of dictionary-using contexts. Much of the latest research focuses on electronic dictionaries produced locally or regionally for specific user groups, as evidenced for example in the proceedings of the two eLex conferences (Granger and Paquot 2010, Kosem and Kosem 2011) and recent AFRILEX, ASIALEX and EURALEX conferences. The aim of all studies of dictionary use is to discover ways to increase the success of dictionary consultation. This involves the identification of users’ needs and skills deficits, and the making of appropriate matches between types of dictionary, types of dictionary user and types of dictionary use. The following research questions are pertinent:
• Who are the users of dictionaries?
• What kinds of activity prompt dictionary use?
• What kinds of dictionary do users prefer to use?
• What kinds of information do dictionary users look for?
• What consultation strategies do dictionary users employ, and how successful are these strategies?
This chapter looks at some of the various ways in which these questions have been addressed.
2 Methods
Welker (2010) identifies six main methods of investigating dictionary use:
(1) Questionnaire surveys
(2) Interviews
(3) Observation
(4) Protocols
(5) Tests and experiments
(6) Log files
All these methods are intended to shed light on the way dictionary users consult dictionaries for their own purposes, under non-experimental conditions. Gathering such information is not easy, however, because the data-gathering instruments often rely heavily on users’ ability to explain their consultation behaviour, and because they also tend to steer users towards uncharacteristic patterns of use. The questionnaire survey is the most common approach, and can be an effective means of gathering data from large numbers of respondents regarding their long-term dictionary-using habits and attitudes (Lew 2002). Questionnaires have come in for criticism, however, for example by Hatherall (1984), Wiegand (1998), Nesi (2000a) and Tarp (2009), because they sometimes place unreasonable demands on users’ powers of recall, and because users and questionnaire designers may not share the same concepts and terminology. Interviews and observations are used less frequently in dictionary research, and generally with only a few participants, because of the cost in terms of time and expertise. Neubach and Cohen (1988) interviewed only six dictionary users, for example, and East (2008) observed groups of six and five users.
Interviews and observations can, however, be more successful than questionnaire surveys as a means of probing dictionary-using behaviour. Interview participants can ask each other for clarification if unexpected aspects of dictionary use come to light, and observations can reveal behaviour without the need for users to describe it at all. The data are therefore less likely to be coloured by misunderstandings and misconceptions, although they do not always reveal natural look-up behaviour because the interviewer or observer may unintentionally influence the outcome, especially if participants believe that researchers approve of certain strategies, and disapprove of others. Recently, new laboratory-based methods have enabled researchers to observe in detail the way users interact with dictionary information on the computer screen. Heid and Zimmerman (2012), for example, adopted usability testing methods from the field of information science to record keystroke patterns for search routes through different types of dictionary interface. Tono (2011), Kaneta (2011) and Simonsen (2011) employed eye-tracking techniques previously used in the fields of cognitive science, psycholinguistics and human-computer interaction to discover what areas of the dictionary entry users view, in what order and for how long. Inevitably research of this kind is conducted in an artificial setting with relatively small numbers of participants, but the techniques reveal aspects of dictionary use that probably cannot be discovered by a human interviewer or observer. Completely natural look-up behaviour is difficult to record because it is a private activity that occurs spontaneously rather than to order. A researcher might spend a very long time observing a potential user as they read or write for their own purposes, before catching the moment when dictionary consultation occurs. 
Tests or tasks which prompt dictionary use are useful as a means of generating a lot of structured data in a short amount of time, particularly for comparative purposes, for example to identify the conditions under which users look up the most words, take the least time, achieve the highest comprehension scores or retain the most vocabulary. It is not always possible to extrapolate information about natural dictionary consultation from these findings, however – for example, the dictionaries available at the site of the experiment may not be the same as the ones users normally consult, and the task may not bear much resemblance to the users’ normal reading, writing or translating activities. Protocols, or self-reports, can shed light on users’ understanding and decision-making, either during spontaneous dictionary use, or while completing a task set by the researcher. Oral protocols are recordings of participants’ thoughts, spoken aloud throughout the consultation process. User behaviour is thus open to examination without the distortion of faulty recall or reinterpretation, but usually relates to only a small number of participants because of the special skills needed to think aloud, and the amount of time required to
gather and analyse spoken data. Nesi and Boonmoh (2009), for example, chose 17 participants from a cohort of 580 students to train in think-aloud techniques. They collected data from just eight of these, the ones who proved most capable of verbalizing their dictionary use. Written protocols can be either freely written or structured using a format prepared by the researcher, perhaps with multiple choice options. Typically they record a reason for each dictionary search, the information searched for, the dictionary used and an evaluation of the success of the consultation. The method is suitable for use with multiple participants: Müllich (1990) collected 108 written protocols from language learners, for example, and Harvey and Yuill (1997) collected 211. Retrospective protocols are problematic because users quickly forget the details of the consultation, but the process of completing a protocol while using a dictionary is quite disruptive. Atkins and Varantola (1997) asked participants to work in pairs to reduce disruption, one member using a dictionary, and the other recording the process. With all forms of protocol it is likely that some behaviours will go unrecorded or misrecorded, however, because consultation processes cannot always easily be described. Log files observe users’ interactions with their computers in an unobtrusive way. They can be used to record experimental data (e.g. Lew and Doroszewska 2009), but are also a good way of capturing information about the searches users make online, when they are engaged in their normal activities, perhaps over an extended period of time. Like other forms of observation, however, log files alone cannot provide much insight into the context or purpose of dictionary consultation, unless the dictionary is linked to an online text or task. 
Moreover, although a log file may be able to indicate whether consultation led to the information the user was searching for, when analysed in isolation it cannot record whether the user considered the consultation successful. Dictionary users, uses and contexts of use can all vary enormously, making it unsafe to generalize from the findings of individual studies. In some other fields of research large-scale controlled trials can test how effectively a given treatment works, but the effectiveness of a dictionary cannot usually be investigated by this means because it is difficult to enlist the aid of a representative sample of all potential users (Welker 2010: 13). Studies therefore tend to focus on the behaviour of smaller and more specific groups, representing dictionary users of one particular type, in one particular context. To facilitate the comparison of findings from different studies researchers sometimes try to adopt similar methods; Welker (2010: 13) cites a number of studies utilizing similar questionnaire formats, for example, and Dziemianko (in press) traces a sequence of dictionary use replication studies. Researchers also seem increasingly likely to adopt a mixed method approach. This helps to compensate for the inevitable limitations of each individual method, and increases the reliability of the findings.
Multiple studies of different groups, using similar or complementary methods, may gradually enable us to build up a complete picture of ‘how dictionaries are used … who the users are, where, when and why they use dictionaries, and with which result’ (Tarp 2009: 279).
3 Who Are the Users of Dictionaries?
Varantola (2002: 33) divides dictionary users into three broad categories: language learners, non-professional users and professional users, these last being those who ‘normally use a dictionary to perform a task that they get paid for’. Other user variables that are likely to affect behaviour are:
• age
• mother tongue
• second or foreign language
• language proficiency level
• educational level
• level of skill in dictionary use
• role (as a teacher, learner, translator, traveller, player of word games, etc.)
• location (geographically, and within the home, place of work or educational institution).
The geographical location of online dictionary users is ascertainable from log files. Otherwise questionnaires are typically considered the best way to obtain factual information relating to these variables. Questionnaire items to establish user profiles were used in studies by Atkins and Varantola (1998), for example, and Hartmann (1999), and are often included in larger surveys of user wants and needs involving multiple data collection methods. Thus Law and Li (2011) triangulated findings from a questionnaire survey with findings from interviews, and Ptaszynski and Sobkowiak (2011) combined a questionnaire survey with protocols produced during translation and writing tasks, and post-task interviews. Research participants are usually selected on the grounds that they are available, willing to take part, and reasonably representative of the types of user the researchers are most concerned with. This means that they are often university students, as researchers are usually based in universities. It also means that people in locations where little research takes place tend to be under-represented in studies of dictionary use. Lew (2011) points out that there is a deficit of information about many contexts of use, for example by tourists, or families doing crossword puzzles at home.
4 What Kinds of Activity Prompt Dictionary Use?
Most dictionary consultations are undertaken when the user is engaged in another activity, in order to ‘solve a context-dependent problem’ (Varantola 2002). Dictionary use is typically classified as ‘receptive’ (i.e. to help with text decoding tasks) or ‘productive’ (i.e. to help with text encoding tasks), although dictionaries can also be treated as resources for learning new vocabulary or finding out about a language. The following table summarizes the broad range of activities generally associated with dictionary use.

                  Receptive                     Productive
Written medium    Reading                       Writing
                  Translating from L2 to L1     Translating from L1 to L2
Spoken medium     Listening                     Speaking
                  Interpreting from L2 to L1    Interpreting from L1 to L2
Gathering language information

Activities associated with dictionary use.
Traditionally the large monolingual dictionaries have focused on the receptive needs of native speakers, while learners’ dictionaries, bilingualized dictionaries and L1-L2 bilingual dictionaries also support language production by providing translations and/or more grammar, phraseology, usage and pronunciation information. Most, but not all, surveys have found that dictionaries are more often used receptively, while reading (Marello 1987, Hartmann 1999, Stark 1999) or translating from L2 to L1 (Tomaszczyk 1979). Battenburg (1991) found that lower-level students used their dictionaries more while reading, and advanced-level students used their dictionaries more while writing. Tomaszczyk’s survey respondents reported using dictionaries for speaking and listening activities, but he concluded that they might have been referring to the preparation of oral reports. Dictionary use prior to the advent of mobile e-dictionaries was generally associated with activities in the written medium.
‘Reading’, ‘writing’, ‘speaking’ and ‘listening’ are very broad activity types. Some questionnaires make finer distinctions; Ripfel (1990) lists reading newspapers or magazines, listening to the radio or watching television, explaining word meanings to children, doing homework, and writing letters, for example; and Hartmann (1999) includes playing word games, writing assignments and reading for study and pleasure. Presumably users’ needs change according to the type of receptive or productive activity they are engaged in: translators need to understand every word in the text, while learners reading for pleasure may only read for gist, ignoring many of the words they do not know. Activity type can also affect users’ choice of dictionary format. Nesi (2010) records a complex picture of e-dictionary preferences, with students using computer-based dictionaries when reading and writing at the computer, and portable electronic dictionaries when reading and writing with paper-based materials. Her participants also preferred mobile e-dictionaries for speaking and listening activities, because of their accessibility and audio pronunciation features.
The secondary ‘knowledge-oriented’ use of dictionaries (Bergenholtz and Tarp 2003) has most often been studied in connection with vocabulary learning and retention. This is usually measured by testing participants after they have completed a task under various conditions, for example with or without a bilingual and/or monolingual dictionary, in print and/or in electronic form (e.g. Luppescu and Day 1993, Wingate 2002, Dziemianko 2010). Some items in questionnaire surveys explore the secondary use of dictionaries in more natural surroundings (e.g. Marello 1987, Chi 1998, Hass 2005). Ronald (2002) examined vocabulary acquisition as a result of using a dictionary while reading. Nesi (2010) investigated the way users created and annotated their own wordlists, using the latest e-dictionary resources. It seems that the e-dictionary format encourages browsing for general interest, especially when words within one entry are hyperlinked to other entries (Nesi 2000b).
5 What Kinds of Dictionary Do Users Prefer to Use?
The typology of reference works is complex, but the basic choices facing users are between monolingual and bilingual, hard-copy and electronic. Studies suggest that although users can distinguish these broad categories, many fail to make finer distinctions in terms of types of reference work, the different user groups they are intended for, and the relative merit of comparable titles. Participants in surveys and interviews are often unable to give precise information about the publishers and titles of their dictionaries (Nesi and Haill 2002, Law and Li 2011), and generally the investigation of dictionary preferences is hampered by users’ ignorance of the details concerning the dictionaries they own and use, and of the different types of dictionaries that exist. In many educational contexts dictionary skills are not systematically taught (Atkins and Varantola 1997, Bae 2011). When they are taught they rarely include the skills of selection and criticism.
One simple but seemingly underused way of establishing users’ preferences is to ask them to evaluate various kinds of dictionary material. MacFarquhar and Richards (1983) used this method to compare users’ impressions of different defining styles, and Kanazashi (2011) reports studies comparing users’ responses to formatting and entry features. User evaluations do not prove that one lexicographical approach is superior to another, but they are a useful supplement to the comments of dictionary reviewers, who often do not belong to the user group the publisher is targeting.
Publishers can obtain information about the popularity of their dictionaries through log files and sales figures, but such commercially sensitive information is rarely made available to external researchers, and most published log-file studies of natural dictionary use, such as those of Bergenholtz and Johnsen (2007) and Hult (2007), relate to the use of individual experimental reference works. User preferences are more often investigated by means of questionnaire surveys: results generally indicate that language learners prefer bilingual or bilingualized dictionaries (see, for example, Tomaszczyk 1979, Baxter 1980, Atkins and Varantola 1997, Lew 2004), although monolinguals tend to be used progressively more at more advanced levels of study. Monolingual dictionaries are also often regarded as superior in quality, especially by teachers (Boonmoh and Nesi 2008). Chan (2011) found that users mistakenly believed bilingualized dictionaries to contain less usage information. Some surveys have investigated dictionary purchasing choices: the decision sometimes rests with teachers (Béjoint 1981, Hartmann 1999, Boonmoh and Nesi 2008) and may not reflect users’ real preferences.
The market for print dictionaries is declining rapidly, as increasing numbers of users access information via the internet, usually for free. E-dictionary packages and portals often contain a wide range of dictionaries of varying quality, some of which contain unattested headwords and idioms, presumably included to increase the extent of coverage and impress unsophisticated users (Nesi 2012). However, despite the growing number of studies of e-dictionary ownership and user preferences, little research has been undertaken to evaluate the content of popular e-dictionary sites, and little information is available to help users choose and use e-dictionaries appropriately.
6 What Kinds of Information Do Dictionary Users Look for, and What Consultation Strategies Do They Employ?
Unsurprisingly, surveys of native-speaker users (e.g. Quirk 1975, Jackson 1988, Hartmann 1999, Chatzidimou 2007) and language learners (e.g. Tomaszczyk 1979, Béjoint 1981, Battenburg 1991, Bishop 1998) indicate greatest interest in information that can be applied immediately to a receptive or productive task, such as meaning and spelling, rather than knowledge-oriented information such as the etymology of the look-up word. Because dictionary use generally occurs while users are busy doing something else, they generally want to find information quickly, with as little disruption as possible to the task they are undertaking. Several studies have investigated users’ misinterpretations of dictionary information, through failing to read the entire entry (Miller and Gildea 1985, Nesi and Meara 1994), failing to understand grammatical information (Chan 2012) or consulting the wrong entry or subentry (Nesi and Haill 2002). The first definition in polysemous entries is the one that immediately catches the user’s eye, and it also usually represents the most familiar meaning, so alternative definitions lower down the entry are often ignored (Tono 1984, Bogaards 1998). Some studies have explored the role of ‘signposts’ within long entries as a means of helping users find the right, contextually appropriate, definition (Tono 1992, 1997, Bogaards 1998, Lew and Pajkowska 2007, Lew 2010, Nesi and Tan 2011).
One of the reasons why e-dictionaries are becoming so popular is that they provide faster access than print dictionaries. Initially researchers worried that the speed of consultation might affect the quality of the experience (Taylor and Chan 1994, Stirling 2003, Zhang 2004), but experiments with comparable print and e-dictionaries have either recorded no significant difference in task performance (Nesi 2000b, Koyama and Takeuchi 2003, 2007) or significantly better performance by e-dictionary users (Shizuka 2003, Dziemianko 2010). Many popular online bilingual dictionaries translate in a fairly primitive way, however, without information or labels to indicate register differences or restrictions on use (Nesi 2012). Thus they might encourage a tendency, noted by Hatherall (1984), to look for one-word equivalents of search terms, translating word-for-word rather than considering the context.
Researching e-dictionary use is particularly problematic because the content of e-dictionary packages and portals is changeable and poorly described. Commercial products can also be prohibitively expensive; unlike print publishers, e-dictionary developers are not in the habit of offering review copies or discounts on class sets. Shizuka (2003), Koyama and Takeuchi (2007) and Diehr (undated) acquired pocket electronic dictionaries from Casio for use in experiments in Japan and Germany, but other researchers such as Chen (2010) complain that they lacked the resources they needed to fully investigate e-dictionary use.
7 Conclusion
Research into dictionary use is ultimately intended to help users consult dictionaries more successfully. Progress in this respect has been patchy, however. In some cases research has informed teacher training, and the teaching of dictionary skills (see, for example, Bae 2011), but it does not seem to have greatly affected the choices made by commercial dictionary publishers. ‘Few modifications to the learners’ dictionary design are supported by published results of experimental research on how learners really use dictionaries’, as Lew and Dziemianko (2006: 277) point out.
The decline of the print dictionary has led to a reduction in the size of lexicographical teams in mainstream publishing houses, greater reliance on automatic dictionary compilation procedures and the rise of online dictionary sites created and managed by computer experts rather than lexicographers. This might suggest that there will be fewer opportunities for user research to influence design in years to come. Fortunately, however, the technology is also enabling many university-based research groups to experiment with new presentation techniques and dictionary content. The proceedings of the two recent eLex conferences (Granger and Paquot 2010, Kosem and Kosem 2011) are full of descriptions of small and specialized e-dictionaries designed to meet the needs of particular groups of users. Development teams for dictionaries of this sort are familiar with the research methods described in this chapter, and have the means to conduct their own user research, building on prior research findings. Their dictionaries will not be ‘block-busters’ like the famous print dictionaries of the past, but they do offer the hope of further, fruitful, user-research-informed design.
References
Akasu, K. and Uchida, S. (eds) (2011) Lexicography: Theoretical and Practical Perspectives. Papers Submitted to the Seventh ASIALEX Biennial International Conference, Kyoto, August 22–24 2011. Kyoto, Japan: The Asian Association for Lexicography.
Atkins, B. T. S. and Varantola, K. (1997) Monitoring dictionary use. International Journal of Lexicography 10/1, 1–45.
— (1998) Language learners using dictionaries: The final report on the EURLEX/AILA research project on dictionary use. In: B. T. S. Atkins (ed.) Using Dictionaries: Studies of Dictionary Use by Language Learners and Translators. Tübingen: Max Niemeyer, 21–81.
Bae, S. (2011) Teacher-training in dictionary use: voices from Korean teachers of English. In: K. Akasu and S. Uchida (eds), 46–55.
Battenburg, J. D. (1991) English Monolingual Learners’ Dictionaries: A User-Oriented Study. Tübingen: Max Niemeyer.
Baxter, J. (1980) The dictionary and vocabulary behavior: a single word or a handful? TESOL Quarterly 14, 325–36.
Béjoint, H. (1981) The foreign student’s use of monolingual English dictionaries: a study of language needs and reference skills. Applied Linguistics 2/3, 207–22.
Bensoussan, M., Sim, D. and Weiss, R. (1984) The effect of dictionary usage on EFL test performance compared with student and teacher attitudes and expectations. Reading in a Foreign Language 2/2, 262–76.
Bergenholtz, H. and Johnsen, M. (2007) Log files can and should be prepared for a functionalistic approach. Lexikos 17, 1–20.
Bergenholtz, H. and Tarp, S. (2003) Two opposing theories: on H.E. Wiegand’s recent discovery of lexicographic functions. Hermes 31, 171–96.
Bishop, G. (1998) Research into the use being made of bilingual dictionaries by language learners. Language Learning Journal 18, 3–8.
Bogaards, P. (1998) Scanning long entries in learner’s dictionaries. In: T. Fontenelle, P. Hiligsmann, A. Moulin and S. Theissen (eds) EURALEX ’98 Actes/Proceedings. Liege: Université Départements d’Anglais et de Néerlandais, 555–63.
Boonmoh, A. and Nesi, H. (2008) A survey of dictionary use by Thai university staff and students, with special reference to pocket electronic dictionaries. Horizontes de Lingüística Aplicada 6/2, 79–90.
Chan, A. (2011) Bilingualised or monolingual dictionaries? Preferences and practices of advanced ESL learners in Hong Kong. Language, Culture and Curriculum 24/1, 1–21.
— (2012) Cantonese ESL learners’ use of grammatical information in a monolingual dictionary for determining the correct use of a target word. International Journal of Lexicography 25/1, 68–94.
Chatzidimou, K. (2007) Dictionary use in Greek education: an attempt to track the field through three empirical studies. Horizontes de Lingüística Aplicada 6/2, 91–104.
Chen, Y. (2010) Dictionary use and EFL learning. A contrastive study of pocket electronic dictionaries and paper dictionaries. International Journal of Lexicography 23/3, 275–306.
Chi, M. L. A. (1998) Teaching dictionary skills in the classroom. In: T. Fontenelle, P. Hiligsmann, A. Moulin and S. Theissen (eds) EURALEX ’98 Actes/Proceedings. Liege: Université Départements d’Anglais et de Néerlandais, 565–77.
Diehr, B. (undated) MOBIDIC hilft beim Englisch lernen. Available online at www.presse.uni-wuppertal.de/archiv_ab2008/archiv_medieninformationen/2010/1105_mobidic.html
Dziemianko, A. (2010) Paper or electronic? The role of dictionary form in language reception, production and the retention of meaning and collocations. International Journal of Lexicography 23/3, 257–73.
— (in press) Why one and two do not make three: dictionary form revisited. Lexikos.
East, M. (2008) Dictionary Use in Foreign Language Writing Exams: Impact and Implications. Amsterdam: John Benjamins.
Granger, S. and Paquot, M. (eds) (2010) eLexicography in the 21st Century: New Challenges, New Applications. Proceedings of eLex 2009, Louvain-la-Neuve, 12–14 October 2009. Louvain: Presses Universitaires de Louvain.
Hartmann, R. R. K. (1999) Case study: the Exeter University survey of dictionary use. In: R. R. K. Hartmann (ed.) Thematic Network Project in the Area of Languages. Sub-project 9: Dictionaries. Dictionaries in Language Learning. Berlin: Freie Universität, 36–52.
Harvey, K. and Yuill, D. (1997) A study of the use of a monolingual pedagogical dictionary by learners of English engaged in writing. Applied Linguistics 18/3, 253–78.
Hass, U. (2005) Nutzungsbedingungen in der Hypertextlexikografie. Über eine empirische Untersuchung. In: D. Steffens (ed.) Wortschatzeinheiten: Aspekte ihrer (Be)schreibung. Dieter Herberg zum 65. Geburtstag. Mannheim: Institut für Deutsche Sprache, 29–42.
Hatherall, G. (1984) Studying dictionary use: some findings and proposals. In: R. R. K. Hartmann (ed.) LEX’eter ’83 Proceedings: Papers from the International Conference on Lexicography at Exeter, 9–12 September 1983 (Lexicographica. Series Maior 1). Tübingen: Max Niemeyer, 183–9.
Heid, U. and Zimmerman, J. T. (2012) Usability testing as a tool for e-dictionary design: collocations as a case in point. In: R. Vatvedt Fjeld and J. M. Torjusen (eds) Proceedings of the 15th EURALEX International Congress. Oslo: University of Oslo, 661–71.
Hult, A.-K. (2007) A study in dictionary use on the internet. Nordiska studier i lexikografi. Rapport från 9. Konference om leksikografi i Norden, Akureyri 22.–26. maj 2007.
Jackson, H. (1988) Words and Their Meaning. London: Longman.
Kanazashi, T. (2011) Three areas of dictionary research where user studies are of particular importance. In: K. Akasu and S. Uchida (eds), 209–18.
Kaneta, T. (2011) Folded or unfolded: eye-tracking analysis of L2 learners’ reference behaviour with different types of dictionary interfaces. In: K. Akasu and S. Uchida (eds), 219–24.
Kosem, I. and Kosem, K. (eds) (2011) Electronic Lexicography in the 21st Century: New Applications for New Users. Proceedings of eLex 2011, Bled, 10–12 November 2011. Trojina: Institute for Applied Slovene Studies.
Koyama, T. and Takeuchi, O. (2003) Printed dictionaries versus electronic dictionaries: a pilot study on how Japanese EFL learners differ in using dictionaries. Language Education and Technology 40, 61–79.
— (2007) Does look-up frequency help reading comprehension of EFL learners? Two empirical studies of electronic dictionaries. Calico Journal 25/1, 110–25.
Law, W. and Li, K. (2011) Mobile phone dictionary: friend or foe? A user attitude survey of Hong Kong translation students. In: K. Akasu and S. Uchida (eds), 303–12.
Lew, R. (2002) Questionnaires in dictionary use research: a re-examination. In: A. Braasch and C. Povlsen (eds) Proceedings of the Tenth EURALEX International Congress, Vol. 1. Copenhagen: Center for Sprogteknologi, Copenhagen University, 267–71.
— (2004) Which Dictionary for Whom? Receptive Use of Bilingual, Monolingual and Semi-bilingual Dictionaries by Polish Learners of English. Poznań: Motivex. Available at www.staff.amu.edu.pl/~rlew/pub/Lew_publ.htm
— (2010) Users take shortcuts: navigating dictionary entries. In: A. Dykstra and T. Schoonheim (eds) Proceedings of the 14th EURALEX International Congress. Leeuwarden/Ljouwert, The Netherlands: Fryske Akademy, 1121–32.
— (2011) User studies: opportunities and limitations. In: K. Akasu and S. Uchida (eds), 7–16.
Lew, R. and Doroszewska, J. (2009) Electronic dictionary entries with animated pictures: lookup preferences and word retention. International Journal of Lexicography 22/3, 239–57.
Lew, R. and Dziemianko, A. (2006) Non-standard dictionary definitions: what they cannot tell native speakers of Polish. Cadernos de Traduçao 18, 275–94.
Lew, R. and Pajkowska, J. (2007) The effect of signposts on access speed and lookup task success in long and short entries. Horizontes de Lingüística Aplicada 6/2, 235–52.
Luppescu, S. and Day, R. (1993) Reading, dictionaries and vocabulary learning. Language Learning 43/2, 263–87.
Marello, C. (1987) Examples in contemporary Italian bilingual dictionaries. In: A. P. Cowie (ed.) The Dictionary and the Language Learner: Papers from the EURALEX Seminar at the University of Leeds, 1–3 April 1985 (Lexicographica. Series Maior 17). Tübingen: Max Niemeyer, 224–37.
MacFarquhar, P. and Richards, J. (1983) On dictionaries and definitions. RELC Journal 14/1, 111–24.
Miller, G. and Gildea, P. (1985) How to misread a dictionary. AILA Bulletin, 13–26.
Müllich, H. (1990) ‘Die Definition ist blöd!’ Herübersetzen mit dem einsprachigen Wörterbuch. Das französische und englische Lernerwörterbuch in der Hand der deutschen Schüler. Tübingen: Max Niemeyer.
Nesi, H. (2000a) The Use and Abuse of Learners’ Dictionaries. Tübingen: Max Niemeyer.
— (2000b) On screen or in print? Students’ use of a learner’s dictionary on cd-rom and in book form. In: P. Howarth and R. Herington (eds) EAP Learning Technologies. Leeds: Leeds University Press, 106–14.
— (2010). In: G. Blue (ed.) Developing Academic Literacy. Oxford: Peter Lang, 213–26.
— (2012) Alternative e-dictionaries: uncovering dark practices. In: S. Granger (ed.) Electronic Lexicography. Oxford: Oxford University Press, 357–72.
Nesi, H. and Boonmoh, A. (2009) A close look at the use of pocket electronic dictionaries for receptive and productive purposes. In: T. Fitzpatrick and A. Barfield (eds) Lexical Processing in Second Language Learners. Clevedon, UK: Multilingual Matters, 67–81.
Nesi, H. and Haill, R. (2002) A study of dictionary use by international students at a British university. International Journal of Lexicography 15/4, 277–306.
Nesi, H. and Meara, P. (1994) Patterns of misrepresentation in the productive use of EFL dictionary definitions. System 22/1, 1–15.
Nesi, H. and Tan, K. H. (2011) The effect of menus and signposting on the speed and accuracy of sense selection. International Journal of Lexicography 24/1, 79–96.
Neubach, A. and Cohen, A. (1988) Processing strategies and problems encountered in the use of dictionaries. Dictionaries 10, 1–19.
Ptaszynski, M. O. and Sobkowiak, M. (2011) Is it all just text production? Examining dictionary use in L1-L2 translation and in free composition in L2. In: K. Akasu and S. Uchida (eds), 426–35.
Quirk, R. (1975) The social impact of dictionaries in the UK. In: R. McDavid and A. Ducket (eds) Lexicography in English. New York: Annals of the New York Academy of Sciences, 211, 76–88.
Ripfel, M. (1990) Wörterbuchbenutzung bei Muttersprachlern. Untersuchungsbericht über eine Befragung erwachsener muttersprachlicher Sprecher zur Wörterbuchbenutzung. Lexicographica 6, 237–51.
Ronald, J. (2002) L2 lexical growth through extensive reading and dictionary use: a case study. In: A. Braasch and C. Povlsen (eds) Proceedings of the Tenth EURALEX International Congress, Copenhagen, Denmark, August 12–17 2002, Vol. 2. Copenhagen: Center for Sprogteknologi, Copenhagen University, 765–71.
Scholfield, P. (1982) Using the English dictionary for comprehension. TESOL Quarterly 16, 185–94.
Shizuka, T. (2003) Efficiency of information retrieval from the electronic and the printed versions of a bilingual dictionary. Language Education & Technology 40, 15–33.
Simonsen, H. K. (2011) User consultation behaviour in internet dictionaries: an eyetracking study. Hermes 46, 75–101.
Stark, M. (1999) Encyclopedic Learners’ Dictionaries: A Study of their Design Features from the User Perspective. Tübingen: Max Niemeyer.
Stirling, J. (2005) The portable electronic dictionary – faithful friend or faceless foe? Modern English Teacher 14/3, 64–72.
Tarp, S. (2009) Reflections on lexicographical user research. Lexikos 19, 275–96.
Taylor, A. and Chan, A. (1994) Pocket electronic dictionaries and their use. In: W. Martin, W. Meijs, M. Moerland, E. ten Pas, P. van Sterkenburg and P. Vossen (eds) Proceedings of the 6th Euralex International Congress. Amsterdam: Euralex, 598–605.
Tomaszczyk, J. (1979) Dictionaries: users and uses. Glottodidactica 12, 103–19.
Tono, Y. (1984) On the Dictionary User’s Reference Skills. B.Ed. Dissertation, Tokyo: Gakugei University.
— (1992) The effect of menus on EFL learners’ look-up processes. Lexikos 2, 230–53.
— (1997) Guide word or signpost? An experimental study on the effect of meaning access indexes in EFL learners’ dictionaries. English Studies 28, 55–77.
— (2011) Application of eye-tracking in EFL learners’ dictionary look-up process research. International Journal of Lexicography 24/1, 1–30.
Varantola, K. (2002) Use and usability of dictionaries: common sense and context sensibility? In: M.-H. Corréard (ed.) Lexicography and Natural Language Processing: A Festschrift in Honour of B.T.S. Atkins. Euralex, 30–44.
Welker, H. A. (2010) Dictionary Use: A General Survey of Empirical Studies. Brasilia: Author’s Edition. Available at www.let.unb.br/hawelker/dictionary_use_research.pdf
Wiegand, H. E. (1998) Wörterbuchforschung. Untersuchungen zur Wörterbuchbenutzung, zur Theorie, Geschichte, Kritik und Automatisierung von Wörterbüchern. Berlin: de Gruyter.
Wingate, U. (2002) The Effectiveness of Different Learner Dictionaries. An Investigation into the Use of Dictionaries for Reading Comprehension by Intermediate Learners of German. Tübingen: Max Niemeyer.
Zhang, P. (2004) Is the electronic dictionary your faithful friend? CELEA Journal 27/2, 23–8.
4 Current Research and Issues
4.1 Using Corpora as Data Sources for Dictionaries
Adam Kilgarriff
Chapter Overview
Introduction
Headword Lists
Collocation and Word Sketches
Labels
Examples
Translations
Summary
1 Introduction
There are three ways to write a dictionary:
• Copy
• Introspect
• Look at data
The first has its place. Making checks against other dictionaries is a not-to-be-overlooked step in the lexicographic process. But if the dictionary is to be an original work, it never has more than a secondary role.
Introspection is central. The lexicographer always needs to ask themself, ‘what do I know about this word?’ ‘how do I interpret this evidence?’ ‘does that make sense?’ But by itself, intuition does a limited and partial job. Asked ‘what is there to say about the verb remember?’ we might come up with some facts about meaning, grammar and collocation – but there are many more we will miss, and some of our ideas may be wrong.
That leaves data, and for lexicography, the relevant kind of data is a large collection of text: a corpus. A corpus is just that: a collection of data – text and speech – when viewed from the perspective of language research. A corpus supports many aspects of dictionary creation:
• headword list development
• for writing individual entries:
  • discovering the word senses and other lexical units (fixed phrases, compounds, etc.)
  • identifying the salient features of each of these lexical units
    • their syntactic behaviour
    • the collocations they participate in
    • any preferences they have for particular text-types or domains
  • providing examples
  • providing translations.
The chapter describes how a corpus can support each of these parts. First, two apologies. First, I am a native speaker of English who has worked mostly on monolingual English dictionaries. Examples, and the experiences which inform the chapter, will largely be from English. Second, to illustrate and demonstrate how corpora support lexicography, one cannot go far without referring to the piece of software that is the intermediary between corpus and lexicographer: the corpus query system. But this chapter is not a review of corpus query systems, so we simply use one – the one developed by the author and colleagues, the Sketch Engine (Kilgarriff et al. 2004) – to illustrate the various ways in which the corpus can help the lexicographer. For a review of corpus query systems see Kilgarriff and Kosem (2012). With ever growing quantities of text available online, faster computers, and progress in corpus and linguistic software, the field is changing all the time. By the time any practice is standard and widely accepted, it will be well behind the latest developments, so if this chapter were to talk only of standard and widely accepted practices it would run the risk of looking dated by the time it was published. Instead, I mainly describe recent and current work on projects I am involved in, arrogantly assuming that this coincides substantially with the leading edge of the use of corpora in lexicography.
2 Headword Lists
Building a headword list is the most obvious way to use a corpus for making a dictionary. Ceteris paribus, if a dictionary is to have N words in it, they should be the N words from the top of the frequency list.
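The "top N of the frequency list" idea can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `headword_candidates`, the crude regular-expression tokenizer and the two-sentence toy corpus are all hypothetical, and the list ranks raw word forms, not lemmas.

```python
from collections import Counter
import re

def headword_candidates(texts, n):
    """Return the n most frequent word forms across the given texts."""
    counts = Counter()
    for text in texts:
        # Crude tokenizer (an assumption): lower-cased runs of letters,
        # optionally with one internal apostrophe.
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]

# Toy corpus, invented for the example.
toy_corpus = [
    "The cat sat on the mat.",
    "The dog sat on the cat.",
]
print(headword_candidates(toy_corpus, 3))
```

Even on real data a list produced this way is only a first pass, since it inherits whatever noise and bias the corpus contains.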
2.1 In Search of the Ideal Corpus
It is never as simple as that, mainly because the corpus is never good enough. It will contain noise and biases. The noise is always evident within the first few thousand words of all the corpus frequency lists that I have ever looked at. In the British National Corpus1 (BNC), for example, a large amount of data from a journal on gastro-uterine diseases presents noise in the form of words like mucosa – a term much-discussed in these specific documents, but otherwise rare and not known to most speakers of English.2 Bias in the spoken BNC is illustrated by the very high frequencies for words like plan, elect, councillor, statutory and occupational: the corpus contains a quantity of material from local government meetings, so the vocabulary of this area is well represented. Thus keyword lists of the BNC in contrast to other large, general corpora show these words as particularly BNC-flavoured.
If we turn to UKWaC (the UK ‘Web as Corpus’, Baroni et al. 2009), a web-sourced corpus of around 1.6 billion words, we find other forms of noise and bias. The corpus contains a certain amount of web spam. We discovered that people advertising poker are skilled at producing vast quantities of ‘word salad’ which, at the time, escaped our automatic routines for filtering out bad material. Internet-related bias also shows up in the high frequencies for words like browser and configure. While noise is simply wrong, and its impact is progressively reduced as our technology for filtering it out improves, biases are more subtle in that they force questions about the sort of language to be covered in the dictionary, and in what proportions.3
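A keyword list that contrasts one corpus with another can be sketched as follows. This is a minimal illustration, assuming a smoothed ratio of per-million frequencies as the keyness score; the text does not say which statistic lies behind the keyword lists it mentions, and every count below is invented for the example and is not a BNC or UKWaC figure.

```python
def per_million(count, corpus_size):
    """Normalize a raw count to a frequency per million tokens."""
    return count * 1_000_000 / corpus_size

def keyness(count_focus, size_focus, count_ref, size_ref, smoothing=1.0):
    """Smoothed ratio of normalized frequencies (assumed scoring scheme).

    The smoothing constant avoids division by zero for words that are
    absent from the reference corpus.
    """
    focus = per_million(count_focus, size_focus) + smoothing
    ref = per_million(count_ref, size_ref) + smoothing
    return focus / ref

def keywords(focus_counts, size_focus, ref_counts, size_ref, smoothing=1.0):
    """Rank the focus-corpus words from most to least distinctive."""
    scores = {
        word: keyness(count, size_focus, ref_counts.get(word, 0), size_ref, smoothing)
        for word, count in focus_counts.items()
    }
    return sorted(scores, key=lambda w: (-scores[w], w))

# Invented counts for two hypothetical one-million-word corpora.
focus_counts = {"mucosa": 50, "the": 60_000, "councillor": 40}
ref_counts = {"mucosa": 1, "the": 61_000, "councillor": 5}
print(keywords(focus_counts, 1_000_000, ref_counts, 1_000_000))
```

Words that score high on such a list are candidates for closer inspection: some will be noise to filter out, others will point to a bias that calls for an editorial decision.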
2.2 Multiwords
Dictionaries have a range of entries for multiword items, typically including, for English, noun compounds (credit crunch, disc jockey), phrasal and prepositional verbs (take after, set out) and compound prepositions and conjunctions (according to, in order to). While corpus methods can straightforwardly find high-frequency single-word items and thereby provide a fair-quality first pass at a headword list for simple words, they cannot do the same for multiword items. Lists of high-frequency word-pairs in any English corpus are dominated by items which do not merit dictionary entries: the string ‘of the’ usually tops the list of word-pairs, or bigrams. The Sketch Engine has several strategies here: one is to view multiword headwords as collocations (see discussion below) and to find multiword headwords when working through the alphabet looking at each headword in turn. Another is to use lists of translations. This was explored in the Kelly project (Kilgarriff et al. 2012). The project worked on nine languages. First, we prepared and cleaned up a corpus headword list of around 6,000 words for each language. Then, all the words on those lists were translated (by a professional translation agency) into each of the eight other languages, giving us a database with 72 directed language pairs.[4] We reasoned that where one language uses a multiword expression for a unitary concept (say, English look for) it was likely that other languages had a single word for the concept (e.g. French chercher, Italian cercare) and that when the Italian-to-English and French-to-English translators encountered cercare and chercher, they were likely to translate them as look for. So, although look for did not appear in the English source list, it appeared multiple times in the database as a translation. The strategy produced a modest number of multiword expressions.
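The translation-based strategy can be sketched in a few lines. The sketch assumes the translation database is available as simple (source language, source word, target word) rows for one target language; that layout, the function name and the threshold are illustrative assumptions, not the Kelly project's actual data format.

```python
from collections import Counter

def multiword_candidates(translations, source_headwords, min_count=2):
    """Find multiword translations that recur but are missing from the source list.

    translations: iterable of (source_language, source_word, target_word) tuples
                  for ONE target language (here English); a hypothetical layout.
    source_headwords: the corpus-derived headword list for the target language.
    """
    counts = Counter(
        target for _, _, target in translations
        if " " in target and target not in source_headwords
    )
    return [mw for mw, c in counts.most_common() if c >= min_count]

rows = [
    ("fr", "chercher", "look for"),
    ("it", "cercare", "look for"),
    ("fr", "maison", "house"),
    ("it", "secondo", "according to"),
]
found = multiword_candidates(rows, source_headwords={"house", "look"})
```

Requiring a multiword item to be offered by translators from more than one source language is one plausible way of keeping one-off paraphrases out of the candidate list.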
2.3 Lemmatization
The words we find in texts are inflected forms; the words we put in a headword list are lemmas. So, to use a corpus list as a dictionary headword list, we need to map inflected forms to lemmas: we need to lemmatize. English is not a difficult language to lemmatize as no lemma has more than eight inflectional variants (be, am, is, are, was, were, been, being), most nouns have just two (apple, apples) and most verbs just four (invade, invades, invading, invaded). Most other languages present a substantially greater challenge. Yet even for English, automatic lemmatization procedures are not without their problems. Consider the data in Table 4.1.1. To choose the correct rule we need an analysis of the orthography corresponding to phonological constraints on vowel type and consonant type, for both British and American English.[5] Even with state-of-the-art lemmatization for English, an automatically extracted lemma list will contain some errors. These and other issues in relating corpus lists to dictionary headword lists are described in detail in Kilgarriff (1997).
Table 4.1.1 Complexity in verb lemmatization rules for English

Lemma | -ed, -s forms | Rule | -ing form | Rule
Fix | fixed, fixes | delete -ed, -es | fixing | delete -ing
Care | cared, cares | delete -d, -s | caring | delete -ing, add -e
Hope | hoped, hopes | delete -d, -s | hoping | delete -ing, add -e
Hop | hopped | delete -ed, undouble consonant | hopping | delete -ing, undouble consonant
Hop | hops | delete -s | |
Fuse | fused | delete -d | fusing | delete -ing, add -e
Fuss | fussed | delete -ed | fussing | delete -ing
Bus (AmE) | bussed, busses?? | delete -ed/-s, undouble consonant | bussing | delete -ing, undouble consonant
Bus (BrE) | bused, bused | delete -ed | busing | delete -ing
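The difficulty of choosing between the -ing rules in Table 4.1.1 can be made concrete with a small sketch. It applies all three rules blindly and keeps whichever results are found in a list of known lemmas; the function names and the toy lexicon are assumptions for illustration, and no real lemmatizer is claimed to work this way.

```python
def ing_candidates(form):
    """Candidate lemmas for an -ing form, using the three -ing rules in the table."""
    if not form.endswith("ing"):
        return []
    stem = form[:-3]
    candidates = [stem, stem + "e"]        # delete -ing; delete -ing, add -e
    if len(stem) >= 2 and stem[-1] == stem[-2]:
        candidates.append(stem[:-1])       # delete -ing, undouble the final letter
    return candidates

def lemmatize_ing(form, known_lemmas):
    """Keep the candidates attested in a lexicon; more than one may survive."""
    return [c for c in ing_candidates(form) if c in known_lemmas]

# Toy lexicon (hypothetical); note that it contains both 'car' and 'care'.
lexicon = {"fix", "care", "hope", "hop", "fuse", "fuss", "bus", "car"}

hopping = lemmatize_ing("hopping", lexicon)
caring = lemmatize_ing("caring", lexicon)
fusing = lemmatize_ing("fusing", lexicon)
```

With this lexicon, hopping and fusing each resolve to a single lemma, but caring yields both car and care, and hoping would yield both hop and hope. A lexicon lookup alone therefore does not select the rule; some analysis of the orthography is still needed.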
2.4 User Profiles
Building a headword list for a new dictionary (or revising one for an existing title) has never been an exact science, and little has been written about it. Headword lists are typically extended in the course of a project and are only complete at the end. A good starting point is to have a clear idea of who will use your dictionary, and for what purpose: a ‘user profile’. A user profile ‘seeks to characterise the typical user of the dictionary, and the uses to which the dictionary is likely to be put’ (Atkins and Rundell 2008: 28). This is a manual task, but it provides filters with which to sift computer-generated wordlists.
2.5 New Words
As everyone involved in commercial lexicography knows, neologisms punch far above their weight. They might not be very important for an objective description of the language but they are loved by marketing teams and reviewers. New words and phrases often mark the only obvious change in a new edition of a dictionary, and dominate the press releases. Mapping language change has long been a central concern of corpus linguists and a long-standing vision is the ‘monitor corpus’, the moving corpus that lets the researcher explore language change objectively (Clear 1988, Janicivic and Walker 1997). The core method is to compare an older ‘reference’ corpus with an up-to-the-minute one to find words which are not already in the dictionary, and which are in the recent corpus but not in the older one. O’Donovan and O’Neill (2008) describe how this has been done at Chambers Harrap Publishers,
and Fairon et al. (2008) describe a generic system in which users can specify the sources they wish to use and the terms they wish to trace. The nature of the task is that the automatic process creates a list of candidates, and a lexicographer then goes through them to sort the wheat from the chaff. There is always far more chaff than wheat. The computational challenge is to cut out as much chaff as possible without losing the wheat – that is, the new words which the lexicography team have not yet logged but which should be included in the dictionary. For many aspects of corpus processing, we can use statistics to distinguish signal from noise, on the basis that the phenomena we are interested in are common ones and occur repeatedly. But new words are usually rare, and by definition are not already known. Thus lemmatization is particularly challenging since the lemmatizer cannot make use of a list of known words. So, for example, in one list we found the ‘word’ authore, an incorrect but understandable lemmatization of authored, past participle of the unknown verb author. For new-word finding we will want to include items in a candidate list even though they occur just once or twice. Statistical filtering can therefore be used only minimally. We are exploring methods which require that a word that occurred a maximum of once or twice in the old material occurs in at least three or four documents in the new material, to make its way onto the candidate list. We use some statistical modulation to capture new words which are taking off in the new period, as well as the items that simply have occurred where they never did before. Many items that occur in the new words list are simply typing errors. This is another reason why it is desirable to set a threshold higher than one in the new corpus. For English, we have found that almost all hyphenated words are chaff, and often relate to compounds which are already treated in the dictionary as ‘solid’ or as multiword items. 
English hyphenation rules are not fixed: most word pairs that we find hyphenated (sand-box) can also be found written as one word (sandbox), and as two (sand box). With this in mind, to minimize chaff, we take all hyphenated forms and two- and three-word items in the dictionary and ‘squeeze’ them so that the one-word version is included in the list of already-known items, and we subsequently ignore all the hyphenated forms in the corpus list. Prefixes and suffixes present a further set of items. Derivational affixes include both the more syntactic (-ly, -ness) and the more semantic (-ish, geo-, eco-).[6] Most are chaff: we do not want plumply or ecobuddy or gangsterish in the dictionary, because, even though they all have Google counts in the thousands, they are not lexicalized and there is nothing to say about them beyond what there is to say about the lemma, the affix and the affixation rule. The ratio of wheat to chaff is low, but among the nonce productions there are some which
are becoming established and should be considered for the dictionary. So we prefer to leave the nonce formations in place for the lexicographer to run their eye over. For the longer term, the biggest challenge is acquiring corpora for the two time periods which are sufficiently large and sufficiently well-matched. If the new corpus is not big enough, the new words will simply be missed, while if the reference corpus is not big enough, the lists will be full of false positives. If the corpora are not well-matched but, for example, the new corpus contains a document on vulcanology and the reference corpus does not, the list will contain words which are specialist vocabulary rather than new, like resistivity and tephrochronology. While vast quantities of data are available on the web, most of it does not come with reliable information on when the document was originally written. While we can say with confidence that a corpus collected from the web in 2009 represents, overall, a more recent phase of the language than one collected in 2008, when we move to words with small numbers of occurrences, we cannot trust that words from the 2009 corpus are from more recently written documents than ones from the 2008 corpus. Two text-types where date-of-writing is usually available are newspapers and blogs. Both of these have the added advantage that they tend to be about current topics and are relatively likely to use new vocabulary. My current strategy for new-word-detection involves large-scale gathering of newspaper and blog feeds every day.
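The candidate-finding step for new words can be sketched as follows, combining the two thresholds and the ‘squeezing’ of hyphenated and multiword dictionary items described in this section. The function names, default thresholds and toy data are illustrative assumptions; the statistical modulation mentioned above is not attempted here.

```python
from collections import Counter

def squeeze(item):
    """Collapse hyphenated or spaced forms to one solid string: 'sand-box' -> 'sandbox'."""
    return item.replace("-", "").replace(" ", "").lower()

def new_word_candidates(old_docs, new_docs, dictionary, max_old=2, min_new_docs=3):
    """Words rare in the reference corpus, spread over several new documents,
    and not already known to the dictionary in any squeezed form."""
    known = {squeeze(entry) for entry in dictionary}
    old_freq = Counter(w for doc in old_docs for w in doc)
    new_doc_freq = Counter(w for doc in new_docs for w in set(doc))
    return sorted(
        w for w, n_docs in new_doc_freq.items()
        if n_docs >= min_new_docs
        and old_freq[w] <= max_old
        and "-" not in w                  # hyphenated corpus forms are ignored
        and squeeze(w) not in known
    )

old_docs = [["the", "sand", "box"], ["the", "cat"]]
new_docs = [
    ["the", "selfie", "sandbox"],
    ["selfie", "sand-box", "the"],
    ["a", "selfie", "sandbox"],
    ["sandbox", "teh"],
]
dictionary = {"the", "a", "cat", "sand box"}
candidates = new_word_candidates(old_docs, new_docs, dictionary)
```

In the toy data sandbox is filtered out because the dictionary already has sand box, and the typing error teh fails the document threshold, leaving selfie as the only candidate for the lexicographer to inspect.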
3 Collocation and Word Sketches
The arrival of large corpora provided the empirical underpinning for a view of language associated with Firth and Sinclair, in which the patterning of words in text was central: collocation came to the fore. Since the beginning of corpus lexicography, the primary means of analysis has been the reading of concordances. From the earliest days of the COBUILD project, lexicographers scanned concordance lines – often in their thousands – to find all the collocations and all the patterns of meaning and use. The more lines were scanned, the more patterns and collocations were found (though with diminishing returns). This was good and objective, but also difficult and time-consuming. Dictionary publishers were always looking to save time, and hence cut costs. Early efforts to offer computational support were based on finding frequently co-occurring words in a window surrounding the headword (Church and Hanks 1990). While these approaches had generated plenty of interest among university researchers, they were not taken up as routine processes by lexicographers: the ratio of noise to signal was high, the first impression of a
collocation list was of a basket of earth with occasional glints of possible gems needing further exploration, and it took too long to use them for every word. The ‘word sketch’ is a response to this problem. A word sketch is a one-page, corpus-based summary of a word’s grammatical and collocational behaviour, as illustrated in Figure 4.1.1. It uses a parser to identify all verb-object pairs, subject-verb pairs, modifier-modifiee pairs and so on, and then applies statistical filtering to give a fairly clean list, as proposed by Tapanainen and Järvinen (1998; for the statistics, see Rychly 2008). Word sketches need very large, part-of-speech-tagged corpora: in the late 1990s such a corpus had recently become available for general English in the form of the British National Corpus, and the first edition of word sketches was prepared to support a new, ‘from scratch’ dictionary for advanced learners of English, the Macmillan English Dictionary for Advanced Learners (MEDAL, Rundell 2001).
Figure 4.1.1 Word sketch for baby (from enTenTen12, a very large 2012 web corpus)
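The two steps behind a word sketch, grouping parsed pairs by grammatical relation and then filtering statistically, can be sketched as below. The association score used is logDice, which to my understanding is the measure presented in Rychly (2008); the input format (ready-made triples from a parser), the relation names, the minimum frequency and the toy data are all assumptions for illustration, not the Sketch Engine's implementation.

```python
import math
from collections import Counter

def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

def word_sketch(triples, headword, min_freq=2):
    """Group a headword's collocates by grammatical relation, best-scoring first.

    triples: list of (word, relation, collocate) tuples, assumed to come from a parser.
    """
    pair_freq = Counter(triples)
    head_freq = Counter((w, r) for w, r, _ in triples)
    coll_freq = Counter((r, c) for _, r, c in triples)
    sketch = {}
    for (w, r, c), f in pair_freq.items():
        if w == headword and f >= min_freq:
            score = log_dice(f, head_freq[(w, r)], coll_freq[(r, c)])
            sketch.setdefault(r, []).append((c, round(score, 2)))
    for r in sketch:
        sketch[r].sort(key=lambda item: -item[1])
    return sketch

# Toy parsed data (hypothetical relation names).
triples = (
    [("baby", "modifier", "newborn")] * 3
    + [("baby", "modifier", "little")] * 2
    + [("girl", "modifier", "little")] * 6
    + [("baby", "object_of", "feed")] * 2
    + [("baby", "modifier", "big")]
)
sketch = word_sketch(triples, "baby")
```

Here newborn outranks little as a modifier of baby because little combines freely with other nouns, and the single occurrence of big falls below the frequency threshold.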
As the lexicographers became familiar with the software, it became apparent that word sketches did the job they were designed to do. Each headword’s collocations could be listed exhaustively, to a far greater degree than was possible before. That was the immediate goal. But analysis of a word’s sketch also tended to show, through its collocations, a wide range of the patterns of meaning and usage that it entered into. In most cases, each of a word’s different meanings is associated with particular collocations, so the collocates listed in the word sketches provided valuable prompts in the key task of identifying and accounting for all the word’s meanings in the entry. The word sketches functioned not only as a tool for finding collocations, but also as a useful guide to the distinct senses of a word – the analytical core of the lexicographer’s job (Kilgarriff and Rundell 2002). It became clear that the word sketches were more like a contents page than a basket of earth. They provided a neat summary of most of what the lexicographer was likely to find by the traditional means of scanning concordances. There was not too much noise. Using them saved time. It was more efficient to start from the word sketch than from the concordance. Thus the unexpected consequence was that the lexicographer’s methodology changed, from one where the technology merely supported the corpus-analysis process, to one where it pro-actively identified what was likely to be interesting and directed the lexicographer’s attention to it. And whereas, for a human, the bigger the corpus, the greater the problem of how to manage the data, for the computer, the bigger the corpus, the better the analyses: the more data there is, the better the prospects for finding all salient patterns and for distinguishing signal from noise. 
Though originally seen as a useful supplementary tool, the sketches provided a compact and revealing snapshot of a word’s behaviour and uses and became the preferred starting point in the process of analysing complex headwords. Since the first word sketches were used in the late 1990s, the Sketch Engine, the corpus query tool within which they are presented, has not stood still. Word sketches have been developed for a dozen languages (the list is steadily growing) and have been complemented by an automatic thesaurus (which identifies the words which are most similar, in terms of shared collocations, to a target word, see Figure 4.1.2) and a range of other tools including ‘sketch diff’, for comparing and contrasting a word with synonyms or antonyms (see Figure 4.1.3). There are also options such as clustering a word’s collocates or its thesaurus entries. The largest corpus for which word sketches have been created so far contains 70 billion words (Pomikalek et al. 2012). In a quantitative evaluation, two thirds of the collocates in word sketches for four languages were found to be ‘publishable quality’: a lexicographer would want to include them in a published collocations dictionary for the language (Kilgarriff et al. 2010).
Figure 4.1.2 Thesaurus entry for gargantuan
Figure 4.1.3 Sketch diff comparing strong and powerful
4 Labels
Dictionaries use a range of labels (such as usu pl., informal, Biology, AmE) to mark words according to their grammatical, register, domain and regional characteristics, whenever these deviate significantly from the (unmarked) norm. All of these are facts about a word’s distribution, and all can, in principle, be gathered automatically from a corpus. In each of these four cases, computationalists are currently able to propose some labels to the lexicographer, though there remains much work to be done. In each case the methodology is to:
• specify a set of hypotheses
  ◦ there will usually be one hypothesis per label, so grammatical hypotheses for the category ‘verb’ may include:
    ▪ is it often/usually/always passive
    ▪ is it often/usually/always progressive
    ▪ is it often/usually/always in the imperative
• for each word
  ◦ test all relevant hypotheses
  ◦ for all hypotheses that are confirmed, alert the lexicographer
    ▪ (in the Sketch Engine, by adding the information to the word sketch).
Where no hypotheses are confirmed – when, in other words, there is nothing interesting to say, which will be the usual case – no alerts are given.
4.1 Grammatical Labels: usually plural, often passive, etc.
To determine whether a noun should be marked as ‘usually plural’, we simply count the number of times the lemma occurs in the plural, and the number of times it occurs overall, and divide the first number by the second to find the proportion. Similarly, to discover how often a verb is passivized, we can count how often it is a past participle preceded by a form of the verb be (with possible intervening adverbs) and determine what fraction of the verb’s overall frequency the passive forms represent. Given a lemmatized, part-of-speech-tagged corpus, this is straightforward. A large number of grammatical hypotheses can be handled in this way. The next question is: when is the information interesting enough to merit a label in a dictionary? Should we, for example, label all verbs which are over 50 per cent passive as often passive? To assess this question, we want to know what the implications would be: we do not want to bombard the dictionary user with too many labels (or the lexicographer with too many candidate-labels). What percentage of English verbs occur in the passive over half of the time? Is it 20 per cent, or 50 per cent or 80 per cent? This question is also not in principle hard to answer: for each verb, we work out its percentage passive, and sort according to the percentage. We can then give a figure which is, for lexicographic purposes, probably more informative than ‘the percentage passive’: the percentile. The percentile indicates whether a verb is in the top 1 per cent, or 2 per cent, or 5 per cent, or 10 per cent of verbs from the point of view of how passive they are. We can prepare lists as in Table 4.1.2. This uses the methodology for finding the ‘most passive’ verbs (with frequency over 500) in the BNC. It shows that the most passive verb is station: people and things are often stationed in places, but there are far fewer cases where someone actively stations things. For station, 72.2 per cent of its 557 occurrences are in the passive, and this puts it in the 0.2 per cent ‘most passive’ verbs of English. At the other end of the table, levy is in the passive just over half the time, which puts it in the 1.9 per cent most passive verbs. The approach is similar to the collostructional analysis of Gries and Stefanowitsch (2004).

Table 4.1.2 The ‘most passive’ verbs in the BNC, for which a ‘usually passive’ label might be proposed

Percentile | Ratio (% passive) | Lemma | Frequency
0.2 | 72.2 | Station | 557
0.2 | 71.8 | Base | 19201
0.3 | 71.1 | Destine | 771
0.3 | 68.7 | Doom | 520
0.4 | 66.3 | Poise | 640
0.4 | 65.0 | Situate | 2025
0.5 | 64.7 | Schedule | 1602
0.5 | 64.1 | Associate | 8094
0.6 | 63.2 | Embed | 688
0.7 | 62.0 | Entitle | 2669
0.8 | 59.8 | Couple | 1421
0.9 | 58.1 | Jail | 960
1.1 | 57.8 | Deem | 1626
1.1 | 55.5 | Confine | 2663
1.2 | 55.4 | Arm | 1195
1.2 | 54.9 | Design | 11662
1.3 | 53.9 | Convict | 1298
1.5 | 53.1 | Clothe | 749
1.5 | 52.8 | Dedicate | 1291
1.5 | 52.4 | Compose | 2391
1.6 | 51.5 | Flank | 551
1.7 | 50.8 | Gear | 733
1.9 | 50.1 | Levy | 603

As can be seen from this sample, the information is lexicographically valid: all the verbs in the table would benefit from an often passive or usually passive label. A table like this can be used by editorial policymakers to determine a cut-off which is appropriate for a given project. For instance, what proportion of verbs should attract an often passive label? Perhaps the decision will be that users benefit most if the label is not overused, so just 4 per cent of verbs would be thus labelled. The full version of the table in Table 4.1.2 tells us what these verbs are. And now that we know precisely the hypothesis to use (‘is the verb in the top 4% most-passive verbs?’) and where the hypothesis is true, the label can be added into the word sketch. In this way, the element of chance – will the lexicographer notice whether a particular verb is typically passivized? – is eliminated, and the automation contributes to a consistent account of word behaviour.
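The ranking behind a table of this kind can be sketched as follows. The sketch assumes that the passive and total counts per verb have already been extracted from a tagged corpus; the function names are hypothetical, and the counts in the toy data are invented for illustration (those for station and levy are chosen only so as to be consistent with the percentages and frequencies quoted above).

```python
def passive_percentiles(counts, min_freq=500):
    """Rank verbs by share of passive uses.

    counts: dict lemma -> (passive_occurrences, total_occurrences).
    Returns a list of (lemma, percent_passive, percentile), most passive first;
    percentile is the verb's rank as a percentage of all verbs considered.
    """
    ratios = [
        (lemma, 100.0 * passive / total)
        for lemma, (passive, total) in counts.items()
        if total > min_freq                 # 'frequency over 500'
    ]
    ratios.sort(key=lambda item: item[1], reverse=True)
    n = len(ratios)
    return [
        (lemma, round(pct, 1), round(100.0 * rank / n, 1))
        for rank, (lemma, pct) in enumerate(ratios, start=1)
    ]

def label_often_passive(ranked, cutoff_percentile=4.0):
    """Lemmas in the top cutoff_percentile per cent most passive verbs."""
    return [lemma for lemma, _, pctile in ranked if pctile <= cutoff_percentile]

# Toy counts (illustrative only); 'rare' is dropped by the frequency filter.
toy_counts = {
    "station": (402, 557),
    "levy": (302, 603),
    "give": (900, 30000),
    "walk": (20, 5000),
    "rare": (9, 10),
}
ranked = passive_percentiles(toy_counts)
labelled = label_often_passive(ranked, cutoff_percentile=50.0)
```

With only four verbs the percentiles are coarse (25, 50, 75, 100), so the demonstration uses a cut-off of 50; on a full verb list the editorial cut-off, such as the 4 per cent discussed above, would be passed in instead.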
4.2 Register Labels: Formal, Informal, etc.
Any corpus is a collection of texts. Register is in the first instance a classification that applies to texts rather than words. A word is informal (or formal) if it shows a clear tendency to occur in informal (or formal) texts. To label words according to register, we need a corpus in which the constituent texts are themselves labelled for register in the document header. Note that at this stage, we are not considering aspects of register other than formality. One way to come by such a corpus is to gather texts from sources known to be formal or informal. In a corpus such as the BNC, each document is supplied with various text-type classifications, so we can, for example, infer from the fact that a document is everyday conversation, that it is informal, or from the fact that it is an academic journal article, that it is formal. The approach has potential, but also drawbacks. In particular, it is not possible to apply it to any corpus which does not come with text-type information. Web corpora do not. An alternative is to build a classifier which infers formality level on the basis of the vocabulary and other features of the text. There are classifiers available for this task: see for example Heylighen and Dewaele (1999), and Santini et al. (2009). Following this route, we have recently labelled all documents in a 12 billion word web corpus according to formality, so we are now in a position to order words from most to least formal. The next tasks will be to assess the accuracy of the classification, and to consider – just as was done for passives – the percentage of the lexicon we want to label for register.
The reasoning may seem circular: we use formal (or informal) vocabulary to find formal (or informal) vocabulary. But it is a spiral rather than a circle: each cycle has more information at its disposal than the previous one. We use our knowledge of the words that are formal or informal to identify documents that are formal or informal. That then gives us a richer dataset for identifying further words, phrases and constructions which tend to be formal or informal, and allows us to quantify the tendencies.
4.3 Domain Labels: Geol., Astron., etc.
The issues are, in principle, the same as for register. The practical difference is that there are far more domains (and domain labels): even MEDAL, a general-purpose learner’s dictionary, has 18 of these; larger dictionaries typically have over 100. Collecting large corpora for each of these domains is a significant challenge. It is tempting to gather a large quantity of, for example, geological texts from a particular source, perhaps an online geology journal. But rather than being a ‘general geology’ corpus, that subcorpus will be an ‘academic-geology corpus’, and the words which are particularly common in the subcorpus will include vocabulary typical of academic discourse in general, and vocabulary associated with the preferences and specialisms of that particular journal, as well as of the domain of geology. Ideally, each subcorpus will have the same proportions of different text-types as the whole corpus. None of this is technically or practically impossible, but the larger the number of subcorpora, the harder it is to achieve. Once we have the corpora and counts for each word in each subcorpus, we need to use statistical measures for deciding which words are most distinctive of the subcorpus: which words are its ‘keywords’, the words for which there is the strongest case for labelling. The maths we use is based on a simple ratio between relative frequencies, as implemented in the Sketch Engine and presented in Kilgarriff (2009).
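A ratio between relative frequencies can be computed as in the sketch below. Frequencies are normalized per million words; the constant added to both sides is an assumption made here so that words absent from the reference corpus do not cause division by zero, and the function names and toy counts are likewise illustrative.

```python
def keyness(focus_count, focus_size, ref_count, ref_size, smoothing=1.0):
    """Ratio of per-million frequencies, with a constant added to both sides."""
    focus_fpm = 1_000_000 * focus_count / focus_size
    ref_fpm = 1_000_000 * ref_count / ref_size
    return (focus_fpm + smoothing) / (ref_fpm + smoothing)

def keywords(focus_counts, focus_size, ref_counts, ref_size, top=10, smoothing=1.0):
    """Words of the focus (domain) subcorpus ordered by keyness, highest first."""
    scored = {
        w: keyness(c, focus_size, ref_counts.get(w, 0), ref_size, smoothing)
        for w, c in focus_counts.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:top]

# Toy counts: a 1-million-word geology subcorpus against a 100-million-word reference.
geology = {"tephra": 50, "the": 60000, "rock": 400}
reference = {"the": 6_000_000, "rock": 3000, "tephra": 10}
top_words = keywords(geology, 1_000_000, reference, 100_000_000)
```

A word equally frequent in both corpora, such as the, scores 1; words with the highest scores are the candidates for a domain label.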
4.4 Region Labels: AmE, AustrE, etc.
The issues concerning region labels are the same as for domains but in some ways a little simpler. The taxonomy of regions, at least from the point of view of labelling items used in different parts of the English-speaking world, is relatively limited, and a good deal less open-ended than the taxonomy of domains. In MEDAL, for example, it comprises just 12 varieties or dialects: American, Australian, British, Canadian, Caribbean, Irish, New Zealand and South/East/West African English.
5 Examples
Most dictionaries include example sentences. They are especially important in pedagogical dictionaries, where a carefully selected set of examples can clarify meaning, illustrate a word’s contextual and combinatorial behaviour and serve as models for language production. The benefits for users are clear, and the shift from paper to electronic media means that we can now offer users far more examples. But this comes at a cost. Finding good examples in a mass of corpus data is labour-intensive. For all sorts of reasons, a majority of corpus sentences will not be suitable as they stand, so the lexicographer must either search out the best ones or modify corpus sentences which are promising but in some way flawed.
5.1 GDEX
In 2007, the requirement arose – in a project for Macmillan – for the addition of new examples for around 8,000 collocations. The options were to ask lexicographers to select and edit these in the ‘traditional’ way, or to see whether the example-finding process could be automated. Budgetary considerations favoured the latter approach, and subsequent discussions led to the GDEX (‘good dictionary examples’) algorithm, which is described in Kilgarriff et al. (2008). The method is to score sentences, and only display the highest-scoring ones. A wide range of heuristics are used for scoring, including sentence length, the presence (or absence) of rare words or proper names and the number of pronouns in the sentence. The system worked successfully on its first outing – not in the sense that every example it identified was immediately usable, but in the sense that it streamlined the lexicographer’s task. GDEX continues to be refined, as more selection criteria are added and the weightings of the different filters adjusted, for English and for other languages. The lexicographer can scan a short list until they find a suitable example for whatever feature is being illustrated, and GDEX means they are likely to find what they are looking for in the top 5 examples, rather than, on average, within the top 20 to 30.
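The score-and-rank idea can be illustrated with a toy scorer that uses the four heuristics named above. Every weight, the length limits, the pronoun list and the crude capital-letter test for proper names are invented for this sketch; it does not reproduce the actual GDEX criteria or weightings.

```python
import re

PRONOUNS = {"he", "she", "it", "they", "this", "that", "him", "her", "them"}

def gdex_score(sentence, common_words, min_len=6, max_len=20):
    """Score a sentence between 0 and 1; higher means a more promising example.

    All penalties below are hypothetical values chosen for illustration.
    """
    tokens = re.findall(r"[A-Za-z']+", sentence)
    if not tokens:
        return 0.0
    score = 1.0
    if not (min_len <= len(tokens) <= max_len):
        score -= 0.4                                   # too short or too long
    rare = [t for t in tokens if t.lower() not in common_words]
    score -= 0.1 * len(rare)                           # rare or unknown words
    proper = [t for t in tokens[1:] if t[0].isupper()]
    score -= 0.1 * len(proper)                         # proper names (crudely detected)
    pronouns = [t for t in tokens if t.lower() in PRONOUNS]
    score -= 0.05 * len(pronouns)                      # context-dependent pronouns
    return max(score, 0.0)

def best_examples(sentences, common_words, k=5):
    """Return the k highest-scoring sentences."""
    return sorted(sentences, key=lambda s: gdex_score(s, common_words), reverse=True)[:k]

common = {"the", "baby", "slept", "quietly", "in", "small", "room", "it", "did",
          "met", "them", "at", "yesterday", "after", "ended"}
s1 = "The baby slept quietly in the small room"
s2 = "It did"
s3 = "Dr Zbigniew met them at Heathrow yesterday after the symposium ended"
top = best_examples([s2, s3, s1], common, k=1)
```

The lexicographer would then read only the top few sentences returned, which is where the saving in time comes from.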
6 Translations
The corpora that help most for finding translations are parallel corpora: corpora comprising pairs of texts that are translations of each other. Parallel corpora are the fuel that Google Translate feeds on, and ‘statistical machine translation’, of which Google Translate is the highest-profile example, is a great success story of language technology and the use of corpora.
Parallel corpora are of most use if they are aligned: that is, for each sentence, or word, in the one text, the computer knows what the corresponding item is in the other. Where the text is a straightforward literal translation, sentence alignment can now be performed with high accuracy. Of course, some sentences are not one-to-one, and some sections may exist in only one language. Working solutions have been found for identifying and handling these cases, which are all on a cline of how closely the translation follows the original. Throughout parallel corpus work, text-pairs which are literal translations are easiest to work with, whereas free translations of novels offer much less. Word alignment is intrinsically a trickier concept than sentence alignment. First, very often, an individual word is not translated by a single word. Second, items often do not stay in the same order. One can expect the sentences and their translations to be in the same order as each other, but one cannot expect the words and their translations to be in the same order in source and target text. There are two ways for lexicographers to use parallel corpora: parallel concordances, and summaries. The first is simpler and is based only on sentence alignments. The lexicographer searches for a word or phrase on one side of the corpus, and sees pairs of sentences which are translations of each other. The website www.linguee.com offers exactly this, for the big European languages, and since its arrival in 2009, it has rapidly become a translator’s favourite. Its display for the English search term baby and the language pair English-German is shown in Figure 4.1.4.
Figure 4.1.4 Screenshot from Linguee.com for English search term baby, language pair English-German
As with examples in general, people (translators, lexicographers and other users) find these example pairs very useful and easy to use. They will often remind a lexicographer of ways of translating a word or phrase that should be included in the dictionary entry, and will supply example sentence pairs to be included (usually after some editing). The ‘summaries’ approach for using parallel corpora only applies when the corpora are large, and for words where there are many sentence pairs. Then, it will not be possible for the lexicographer to read all the sentence pairs, and it should be possible for the computer to summarize what it finds in them. This is a bilingual version of the reasoning that led to word sketches. In a process closely related to methods for word alignment, the computer can find the
Figure 4.1.5 Bilingual word sketch, based on a parallel corpus, for red for the language pair English-French
other-language words that occur with particularly high frequency in the node word’s aligned sentences. The process can also be applied to find candidate collocations as translations of the node word’s collocations. A first version of a bilingual word sketch based on a parallel corpus, for red for the language pair English-French, is shown in Figure 4.1.5. A limiting factor for parallel-corpus work is the availability of a parallel corpus, for the language pair in question. The early work in the field was based on the Canadian parliamentary proceedings, ‘Canadian Hansard’, which were available in English and French and were fairly literal professional translations of each other. Other sources frequently used, for the languages of the EU, are the European parliamentary proceedings and other documents from the EU (as used for the screenshot). Other text-types where parallel data is often available include software documentation, documentation for vehicles and machinery, and film transcripts. A large and well-maintained collection of parallel data is available at the OPUS website.[7] For any particular language pair, some text-types will be available, others will not.
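The ‘summaries’ idea, finding the other-language words most strongly associated with a node word’s aligned sentences, can be sketched with a simple Dice coefficient over sentence pairs. The choice of Dice, the function name, the thresholds and the toy English-French data are assumptions for illustration; production systems rely on proper word-alignment methods.

```python
from collections import Counter

def translation_candidates(aligned_pairs, node, top=3, min_pairs=2):
    """Rank target-language words by Dice association with the node word.

    aligned_pairs: list of (source_tokens, target_tokens) for aligned sentences.
    """
    with_node = Counter()   # target word -> pairs containing both node and word
    overall = Counter()     # target word -> pairs containing the word at all
    node_pairs = 0          # pairs whose source side contains the node word
    for source, target in aligned_pairs:
        target_types = set(target)
        overall.update(target_types)
        if node in source:
            node_pairs += 1
            with_node.update(target_types)
    scores = {
        t: 2 * f / (node_pairs + overall[t])
        for t, f in with_node.items() if f >= min_pairs
    }
    return sorted(scores, key=scores.get, reverse=True)[:top]

pairs = [
    (["the", "red", "car"], ["la", "voiture", "rouge"]),
    (["a", "red", "wine"], ["un", "vin", "rouge"]),
    (["the", "car", "is", "new"], ["la", "voiture", "est", "neuve"]),
    (["the", "red", "flag"], ["le", "drapeau", "rouge"]),
    (["the", "wine", "is", "good"], ["le", "vin", "est", "bon"]),
]
candidates_for_red = translation_candidates(pairs, "red")
```

The minimum-pairs threshold plays the same role as in monolingual collocation finding: a word seen only once alongside the node word is not yet evidence of a translation.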
7 Summary
Corpora can make dictionary-making more accurate, efficient, complete and consistent. They can deliver a candidate headword list, and, where the corpora are developed with care, with neologism-finding in mind, can identify candidate neologisms. There are many ways in which they can support entry-writing. They can provide a wide range of clues to the lexicographer for analysing the word’s range of meaning into distinct senses. In combination with a suitable corpus query system they can find the idioms, phrases and collocations for a word. They can identify if a word has noteworthy behaviour in relation to grammar, domain, region and register. They can do the preparatory work for finding good example sentences, and translations. Corpora have been used in these ways in a range of dictionary projects, and the chapter has described how this has worked, with reference to a particular corpus query tool, the Sketch Engine. Over the last two decades, the lexicographer’s role has increasingly become one of checking, confirming or editing the corpus tool’s work, where earlier it would have been ‘writing from scratch’. In the early twenty-first century, with the advent of the web and many and varied online resources, much is changing in the world of dictionary-making, and many things are uncertain. Quite what the role of the lexicographer will be in ten years’ time is far from clear, but I am confident that the role of the corpus will grow, with the line between dictionary and corpus blurring, and the lexicographer operating at that interface.
Notes
1. The website for the BNC is http://natcorp.ox.ac.uk
2. In the BNC mucosa is marginally more frequent than spontaneous and enjoyment, but appears in far fewer corpus documents.
3. As is now generally recognized, the notion of ‘representativeness’ is problematical with regard to general-purpose corpora like BNC and UKWaC, and there is no ‘scientific’ way of achieving it: see for example Atkins and Rundell (2008: 66).
4. The database can be explored online at http://kelly.sketchengine.co.uk
5. The issue came to our attention when an early version of the BNC frequency list gave undue prominence to verbal car.
6. Here we exclude inflectional morphemes, addressed under lemmatization above: in English a distinction between inflectional and derivational morphology is easily made for most cases.
7. http://opus.lingfil.uu.se/
References
Atkins, S. and Rundell, M. (2008) The Oxford Guide to Practical Lexicography. Oxford: Oxford University Press.
Baroni, M., Bernardini, S., Ferraresi, A. and Zanchetta, E. (2009) The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation Journal 43/3, 209–26.
Church, K. and Hanks, P. (1990) Word association norms, mutual information and lexicography. Computational Linguistics 16, 22–9.
Clear, J. (1988) The Monitor Corpus. In: M. Snell-Hornby (ed.) ZüriLEX ’86 Proceedings. Tübingen: Francke Verlag, 383–9.
Fairon, C., Macé, K. and Naets, H. (2008) GlossaNet2: a linguistic search engine for RSS-based corpora. In: S. Evert, A. Kilgarriff and S. Sharoff (eds) Proceedings, Web as Corpus Workshop (WAC4), Marrakech, 34–9.
Gries, S. Th. and Stefanowitsch, A. (2004) Extending collostructional analysis: a corpus-based perspective on ‘alternations’. International Journal of Corpus Linguistics 9/1, 97–129.
Heylighen, F. and Dewaele, J.-M. (1999) Formality of Language: Definition, Measurement and Behavioural Determinants, Internal Report. Free University Brussels. [available at http://pespmc1.vub.ac.be/Papers/Formality.pdf]
Janicivic, T. and Walker, D. (1997) NeoloSearch: automatic detection of neologisms in French internet documents. In: Proceedings of ACH/ALLC’97, Queen’s University, Ontario, Canada, 93–4.
Kilgarriff, A. (1997) Putting frequencies in the dictionary. International Journal of Lexicography 10/2, 135–55.
Kilgarriff, A. (2009) Simple maths for keywords. In: M. Mahlberg, V. González-Díaz and C. Smith (eds) Proceedings, Corpus Linguistics. Liverpool. [available at http://ucrel.lancs.ac.uk/publications/cl2009/]
Kilgarriff, A. and Kosem, I. (2012) Corpus tools for lexicographers. In: S. Granger and M. Paquot (eds) Electronic Lexicography. Oxford: Oxford University Press.
Kilgarriff, A. and Rundell, M. (2002) Lexical profiling software and its lexicographic applications: a case study. In: A. Braasch and C. Povlsen (eds) Proceedings of the Tenth Euralex Congress. Copenhagen: University of Copenhagen, 807–18.
Kilgarriff, A., Rychlý, P., Smrz, P. and Tugwell, D. (2004) The Sketch Engine. In: G. Williams and S. Vessier (eds) Proceedings of the Eleventh Euralex Congress. Lorient, France: UBS, 105–16.
Kilgarriff, A., Husák, M., McAdam, K., Rundell, M. and Rychlý, P. (2008) GDEX: automatically finding good dictionary examples in a corpus. In: E. Bernal and J. DeCesaris (eds) Proceedings of the XIII Euralex Congress. Barcelona: Universitat Pompeu Fabra, 425–31.
Kilgarriff, A., Kovář, V., Krek, S., Srdanović, I. and Tiberius, C. (2010) A quantitative evaluation of word sketches. In: A. Dykstra and T. Schoonheim (eds) Proceedings of 14th EURALEX International Congress. Leeuwarden, Fryske Academy, 372–9.
Kilgarriff, A., Charalabopoulou, F., Gavrilidou, M., Bondi Johannessen, J., Khalil, S., Johansson Kokkinakis, S., Lew, R., Sharoff, S., Vadlapudi, R. and Volodina, E. (2012) Corpus-based vocabulary lists for language learners for nine languages. Language Resources and Evaluation Journal 46 (to appear).
O’Donovan, R. and O’Neill, M. (2008) A systematic approach to the selection of neologisms for inclusion in a large monolingual dictionary. In: E. Bernal and J. DeCesaris (eds) Proceedings of the XIII Euralex Congress. Barcelona: Universitat Pompeu Fabra, 571–9.
Pomikálek, J., Jakubíček, M. and Rychlý, P. (2012) Building a 70 billion word corpus of English from ClueWeb. Proceedings LREC, Istanbul. [available at: www.lrec-conf.org/proceedings/lrec2012/index.html]
Rundell, M. (ed.) (2001) Macmillan English Dictionary for Advanced Learners. Oxford: Macmillan Education.
Rychlý, P. (2008) A lexicographer-friendly association score. In: P. Sojka and A. Horák (eds) Proceedings of 2nd Workshop on Recent Advances in Slavonic Natural Language Processing. Brno: Masaryk University.
Santini, M., Rehm, G., Sharoff, S. and Mehler, A. (eds) (2009) Introduction, Journal for Language Technology and Computational Linguistics (Special Issue on Automatic Genre Identification: Issues and Prospects) 24/1, 129–45.
Tapanainen, P. and Järvinen, T. (1998) Dependency concordances. International Journal of Lexicography 11/3, 187–203.
4.2
Researching the Use of Electronic Dictionaries Verónica Pastor and Amparo Alcina
Chapter Overview
Introduction
Electronic Dictionaries and Search Possibilities
Analysis of Search Techniques in Electronic Dictionaries
Classification of Search Techniques in Electronic Dictionaries
Summary Table and Conclusion
Acknowledgements
1 Introduction
The use of electronic dictionaries has many advantages over the traditional paper dictionary. However, access to the lexicon and terminology of a dictionary presents certain difficulties, partly due to the lack of user knowledge about how a dictionary can be queried to access this kind of information, and partly due to the diversity of ways a dictionary can be consulted (in different areas of the dictionary, with different operators, in widely varied interfaces), which vary from one dictionary to another. Many studies can be found in the literature on dictionary use by translators (Meyer 1988, Roberts 1990, Atkins et al. 1998, Mackintosh 1998, Corpas et al. 2001, Sánchez 2004a), and also by native speakers of a language, and language learners (Tomaszczyk 1979, Béjoint 1981, Bogaards 1988, Tono 1989,
Hulstjin et al. 1998, Hartmann 1999, McCreary et al. 1999). Most of these and other authors conclude that users come across difficulties in their dictionary consultations, generally due to two main causes: the dictionary does not facilitate access to information (dictionaries present deficiencies and are not user-friendly), and users are unaware of how to use dictionaries to access the information they contain (Béjoint 1989: 280, Cowie 1999: 188, Hartmann 1999: 40, Nesi et al. 2002: 285, Fernández-Pampillón et al. 2003: 150, Sánchez 2004a: 482). In addition, dictionaries do not always take advantage of the full potential offered by electronic formats. Electronic dictionaries are frequently mere copies of paper dictionaries; they are texts where the same information appears in different font styles. However, these dictionaries do not offer the search possibilities of an advanced database system. Despite these reflections in the literature, we have found no studies that establish a ‘universal’ classification or arrangement of the search techniques that can be used in a dictionary, in other words, one that is valid for training in electronic dictionary use in general, and that can be adapted to any specific dictionary. By this, we refer to a classification that can specify and highlight all the query options a dictionary offers. This classification may serve as a guide to design search types when creating a new electronic dictionary, either from scratch or based on a paper dictionary. The present research, within the framework of the ONTODIC Project, investigates how to incorporate new search techniques, such as onomasiological searches, in electronic dictionaries (Alcina 2009). This has led us to analyze the search techniques that are currently available in electronic dictionaries (Pastor et al. 2010, this chapter1), as well as in electronic corpora (Pastor et al. 2009), and the internet (Pastor et al. 2011). 
In the present chapter, we first analyse the reflections offered by some authors on the features of electronic dictionaries, mainly in contrast to paper dictionaries. This is followed by an analysis of 32 advanced electronic dictionaries, from which we compiled all the search possibilities offered by electronic dictionaries, including the most common and best known, as well as the most innovative. Our empirical study is summarized in Annex 1. Following this analysis we describe our classification of search techniques, based on the authors’ reflections and the dictionary analysis. In this classification (Figure 4.2.1) we focus on three main elements that cover all the search possibilities. We term these three elements: the query, the resource and the result. The combination of these three elements and their subtypes allows us to structure all the search techniques that can be used in dictionaries. Finally, we conclude by highlighting the practical uses and contributions our classification provides.
Figure 4.2.1 Classification of electronic dictionaries by Lehr (1996: 315), translation in de Schryver (2003: 148)
2 Electronic Dictionaries and Search Possibilities
The development of new technologies and the internet has progressively modified the concept of the dictionary. Many paper dictionaries have been converted to electronic formats, such as CD-ROM, while others are available online. Electronic dictionaries can be classified into various types according to different criteria. Some authors have attempted to devise typologies of electronic dictionaries (Ide 1993, Sharpe 1995, Nesi 1998, 1999, 2000b). One example of an electronic dictionary typology is that by Lehr (1996: 315), which focuses on technical and meta(lexicographic) evaluation. Based on technical evaluation, this author distinguishes between online and offline dictionaries. Offline dictionaries comprise pocket electronic dictionaries (PEDs) and PC dictionaries. PC dictionaries include dictionaries in CD-ROM, floppy disk and other formats. Based on meta(lexicographic) evaluation, this typology distinguishes between electronic dictionaries based on their paper versions, and newly developed electronic dictionaries, as well as electronic dictionaries with both print and innovative appearances. In this classification, the author distinguishes between newly developed electronic dictionaries (new development) and electronic versions of paper dictionaries (based on paper) (Gross 1997, Jacquet-Pfau 2002: 90). Nesi (2000a: 140) states that fully electronic dictionaries are more effective than electronic dictionaries
adapted from paper versions: ‘electronic dictionaries would be most effective if they were designed from scratch with computer capabilities and computer search mechanisms in mind’. Electronic dictionaries can be easily updated (Kay 1984: 461, Carr 1997: 214, Harley 2000) and allow a quicker, more precise and exhaustive search, in which a variety of search criteria can be combined (Jacquet-Pfau 2002: 99, de Schryver 2003: 157). According to Sharpe (1995: 48), electronic adaptations of paper dictionaries incorporate the same types of search as paper dictionaries, and their search possibilities are therefore less flexible. Lew (2011: 238, 242) points out that some online dictionaries are more uncomfortable to use than paper dictionaries because they do not exploit the potential of the electronic format. Forget (1999) affirms that electronic dictionaries differ from paper dictionaries in factors such as: use, presentation, search capabilities, technical aspects and nature of contents (multimedia elements). With regard to the first of these factors, for example, this author points to the use of hypertext in electronic dictionaries, and a higher flexibility in searches as compared to paper dictionaries. Forget also states that every electronic dictionary has a different interface, and as a result the user must devote more time to learning how to use each one of the electronic dictionaries available. The presentation of electronic dictionaries often involves the use of different colours to highlight information, as well as more intuitive interfaces. Electronic dictionaries are reported to incorporate different search types, and although electronic searches are always quicker than manual searches, much time can also be wasted on searching in an electronic dictionary without obtaining a satisfactory result. 
Finally, technical aspects mentioned include the interactivity of electronic dictionaries, which allow the user to add to a dictionary entry a comment that may be useful in future searches, such as a usage note, a context, a translation used for a specific client, etc. In addition, users can copy terms directly from the dictionary without having to type them; electronic dictionaries also take up less space than paper dictionaries and do not deteriorate with time. Of all the differences between electronic and paper dictionaries already mentioned in this section, the main difference stated by many authors, and that we want to emphasize here, lies in the way the user accesses the information in a dictionary (Nesi 1998, de Schryver 2003: 147). Search possibilities in paper dictionaries are limited: the search is restricted to the arrangement of contents by the lexicographer, which is always alphabetical (Sallas 2001) and, therefore, is limited to the search for an exact word (Santana et al. 1996: 70). In contrast, searches in electronic dictionaries are quicker and more flexible because they incorporate more advanced search techniques (Sánchez 2004b: 181, Kaalep et al. 2008). Some authors have reflected on the types of search that have already been implemented in electronic dictionaries, or that would be desirable to
incorporate, such as de Schryver’s (2003) paper ‘Lexicographer’s Dreams in the Electronic-Dictionary Age’. In this publication the author calls for dictionary designs that would allow more complex searches, and reviews all the authors that have dealt with the advances or desirables in the development of electronic dictionaries to date. Knowles (1990: 1656) suggests the use of hyperlinks; Abate (1985) mentions the use of images and graphics, as well as the use of natural language in searches; Poirier (1989) refers to the search in the whole text of the dictionary with Boolean operators; Sobkowiak (1999) mentions access to corpora concordances as a dictionary option; Nesi (2000a) suggests simultaneous searches in different dictionaries; Corris et al. (2000) refer to functions of the dictionary that suggest similar spellings to the search words; Geeraerts (2000) considers search functions by inverse indexes, anagrams, and phonetic similarity. Dodd (1989: 89) also refers to the search capabilities of electronic dictionaries and anticipates the possibility of searching for a word from its phonetics, spelling similarities, etymology, thematic area, semantic relations with other words (synonymy, antonymy, hyponymy), words in the definition, part of speech, etc. Many electronic dictionaries allow searches in the entire dictionary content, or in some of its sections, and not only in the entries (Roberts et al. 2001: 712), for example, in dictionary definitions or contexts. In addition, hyperlinks in electronic dictionaries link words that are related to other words (Nesi 1999: 61; Gómez González-Jover 2005: 161, Church 2008). Hamon et al. (2001: 187) identify different types of hyperlinks: in the entry index, in the keywords and in relations of synonymy and hyponymy between entries and subentries. The inclusion of images in dictionaries is also a very useful complement to linguistic information (Faber et al. 2007, Montero et al. 2008: 151, Lew 2011: 245). 
Some entries provide audio files, although dictionaries do not include video files (Lew 2011: 246). Another option of electronic dictionaries is the possibility to access the most recent entries the user consulted, similar to the ‘search history’ in web search engines (Forget 1999, Rizo et al. 2000: 369). Fernández-Pampillón et al. (2003) distinguish the following types of searches: search in an entry, search in a list of entries (alphabetical or inverse), assisted search, multiple search, search with related words (use of the dictionary as a thesaurus), search in anagrams and search using abbreviations and marks. Sánchez (2004b: 193–4) includes a range of searches: a search in the entries, similar to an alphabetical search in paper dictionaries; an assisted search in which the dictionary suggests words when the user has misspelled the search word; an advanced search, which allows the user to search in the dictionary content (definitions, examples, etc.); a search with wildcards, patterns defined by the user, and filters, which allows the user to combine search words with operators, and to search for orthographical variants with wildcards; a search for related words, which accesses words that are semantically related to other
words; a search with anagrams, which retrieves words that result from changing the order of the letters the user introduces in the dictionary; a search with abbreviations and marks, which retrieves all the forms corresponding to abbreviations and marks used in the dictionary entries; and finally a refined search, which retrieves words by introducing their pronunciation or etymology. Moreover, it is worth noting that a term’s meaning is determined not only by the concept it refers to, but also by the context in which it is used (Kussmaul 1995: 89, Corpas et al. 2001: 253, Robinson 2003: 113); hence, the importance given by translators and other language professionals to the search for terms within their context of use. Currently, dictionaries, even electronic ones, do not have a wide enough contextual field to solve all user needs. Bowker (1998: 648) states that dictionaries lack contexts compared to the valuable information offered by electronic corpora. Bowker performed a study with translation students in which half the students translated a text using dictionaries only, and the other half used electronic corpora and WordSmith tools. She found that the translations produced by the students with access to corpora were of a higher quality than those from the students that had only used dictionaries. These context-related deficiencies of dictionaries are one of the reasons some authors (Montero et al. 2008: 162) give to explain why translators have gradually moved from searching in dictionaries to other resources, such as the internet and corpora, which offer users wider contexts and facilitate access to these contexts by means of various search techniques that dictionaries have not yet incorporated. Because contexts are so relevant to translators and other language professionals, some studies call for the incorporation of corpora within the search systems of electronic dictionaries. 
In this way, corpora are not simply a source for creating dictionaries, nor a dictionary substitute (Colominas 2004: 362), but rather a complement to or accessory of the dictionary. In this vein, works like that of Castagnoli (2008) are of particular interest; this author created a database in which the contextual field is eliminated from the entries, and terms in the entries are linked to a reference corpus where the user can consult the concordances of a term. In addition, some corpora can be queried like a dictionary (Lew 2011: 246–7). In this section we have outlined some of the reflections many authors have made on electronic dictionaries, with particular attention to how they differ from paper dictionaries. Specifically, the literature highlights aspects such as interactivity, variety and flexibility in the searches. However, we have found no exhaustive studies on dictionary search possibilities, or on how these search possibilities could be organized. We therefore detected the need to develop a classification of search techniques in electronic dictionaries, through the analysis of a set of electronic dictionaries and the search possibilities they offer.
3 Analysis of Search Techniques in Electronic Dictionaries
In an earlier study we analysed 15 electronic dictionaries and we proposed a classification of the search techniques used in electronic dictionaries (Pastor et al. 2010). Subsequently, we extended the analysis to 32 electronic dictionaries, and our classification proved useful. Only two new search results were added: collocations and related words, and the graphical representation of relations. The criterion used to select the dictionaries was that they should incorporate innovative search techniques as well as the traditional alphabetical search. In addition, we analysed both online and offline dictionaries in order to examine the advantages and disadvantages of both formats. Online dictionaries are more accessible than the CD-ROM format, and most online dictionaries are free. Moreover, online dictionaries can be consulted on any computer (with internet access), and the dictionary does not have to be installed on the computer in order to use it. However, offline dictionaries generally provide more search techniques than online dictionaries, and are more stable and durable. The virtual nature of online dictionaries means that their URL location may change at any time, or they may even disappear altogether. We analysed dictionaries available on CD-ROM in Spanish, such as the Diccionario de uso del español (DUE) by María Moliner and the Diccionario de la lengua española (DRAE) by the Real Academia Española, in English, such as the Oxford English Dictionary (OED) and the Collins English Dictionary (CED), and in French, such as Le Grand Robert de la Langue Française.
We then described a number of online dictionaries and databases, such as (in alphabetical order) the Base Lexicale du Français (BLF),2 Cercaterm, Collins BETA, the Diccionari de la llengua catalana, the Dicouèbe3 dictionary and other similar dictionaries (DiCoInfo,4 DiCoEnviro, DAD), the Diccionario de Colocaciones del Español (DiCE),5 Dirae, EcoLexicon,6 EOHS Term,7 FrameNet,8 Le grand dictionnaire terminologique, Genoma,9 Goodrae, IATE, Just The Word, Lexical FreeNet,10 Macmillan English Dictionary, the Merriam-Webster monolingual English dictionary, thesaurus, and visual dictionary, OncoTerm,11 the OneLook Reverse Dictionary, TERMIUM Plus, Trésor de la Langue Française informatisé (TLFi), Ultralingua, UNTERM, WordNet,12 the WordReference dictionary and the Wordsmyth dictionary. The search options of these dictionaries are summarized in the table (Table 4.2.1) in Annex 1, where they are compared according to different criteria (search for a word in the alphabetical entry list, search for one or more words in the definitions or other fields of an entry, use of operators, search for a semantic relation by navigation, search for a semantic relation by direct search, access to complementary forums, search by thematic area, access to external links, search for images, introduction of an exact word, introduction of a partial word, use of wildcards, introduction of a word to search for phonetically or orthographically related words, introduction of an inflected form, search for anagrams,
specification of a part of speech, and introduction of a question in natural language). The horizontal axis of the table shows the search options, and the dictionaries analysed appear on the vertical axis. Comments are also added on some search options in each dictionary. This is intended to give an overall picture of the diverse query options that can be found in electronic dictionaries before we go on to the systematization of search techniques in the following section.
4 Classification of Search Techniques in Electronic Dictionaries
Our review of the search techniques reported in the literature and our analysis of the search techniques in electronic dictionaries reveal a wide variety of search possibilities. Not all the dictionaries incorporate the same search options. In addition, the same types of searches have different names, depending on the dictionary. We therefore consider it necessary to systematize all the search techniques that have been incorporated in electronic dictionaries to date. The purpose of this chapter is to present a proposal for the classification of search techniques in electronic dictionaries. Our search technique classification is divided into three elements that we have found to be present in every search: the query, the resource and the result. By differentiating the query, the resource and the result, we are able to reflect clearly and coherently all the search possibilities offered by each dictionary. Search techniques are options that a user can apply to a resource to obtain a result. The user wants to obtain particular information (the meaning of a word, a usage context, etc.), and to obtain this information the user queries an electronic resource by introducing an expression, which we call the query. The electronic resource queried by the user can be a dictionary section, for example a dictionary field where the user can find information. The specific element of the electronic resource queried by the user is called the resource. Finally, by introducing a query in a resource, the user obtains a result, the third element of our classification. In summary, the query is the word or phrase introduced by the user in the interface of a resource; the resource is the dictionary, or the part of it, in which the word or phrase is searched for; and the result is the element obtained when a query is searched for in a resource.
In a dictionary, if we introduce the exact word house as a query to search in the list of dictionary entries (resource), the result we obtain is the dictionary entry for the word house, with information about this word. In contrast, if the query we introduce is a combination of words, for example yellow fruit, and we search in the definition field of a dictionary (resource), the result we obtain is a list of words whose definition field contains the words introduced in the query (see Figure 4.2.2).
Figure 4.2.2 Representation of two search techniques in a dictionary
Below, we explain in more detail the search techniques included in our classification, and provide examples of how these search techniques are applied in the electronic dictionaries we have analysed.
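The two searches just described (house in the entry list, yellow fruit in the definition field) can be made concrete with a small sketch of the query, resource and result triple. Everything in it is an assumption for illustration: the four entries and their definitions are invented, the resource names `entry_list` and `definition` are hypothetical labels, and for brevity the result is returned as a list of headwords, not as full entries.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    headword: str
    definition: str


# Toy dictionary; headwords and definitions are invented for this example.
ENTRIES = [
    Entry("house", "a building for people to live in"),
    Entry("banana", "a long curved yellow fruit"),
    Entry("lemon", "a sour yellow citrus fruit"),
    Entry("apple", "a round fruit with red or green skin"),
]


def search(query: str, resource: str) -> list[str]:
    """Apply a query to a resource and return the result as a list of headwords."""
    words = query.lower().split()
    if resource == "entry_list":
        # The query is matched against the headword itself.
        return [e.headword for e in ENTRIES if e.headword == query.lower()]
    if resource == "definition":
        # Every word of the query must occur in the definition field.
        return [
            e.headword
            for e in ENTRIES
            if all(w in e.definition.split() for w in words)
        ]
    raise ValueError(f"unknown resource: {resource}")


print(search("house", "entry_list"))         # ['house']
print(search("yellow fruit", "definition"))  # ['banana', 'lemon']
```

Keeping the resource as an explicit argument mirrors the classification: the same query can yield a different result depending on which part of the dictionary it is applied to.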
4.1 The First Element: The Query
The query is the expression introduced by the user when searching in a dictionary. It is normally an exact word. In some cases, the user may introduce a partial word, that is, an incomplete word. In other dictionaries we can introduce an approximate word, an anagram, or a sequence of characters that may or may not form a word. Some dictionaries allow the user to introduce a combination of two or more words. Together with the expression, the user can also introduce other information, in the form of filters, in order to restrict or specify the result they want to obtain. A filter limits the expression to a particular criterion, such as a part of speech or a thematic field. For instance, a filter for the word play could be the part of speech ‘noun’. The types of queries we have discerned in electronic dictionaries fall into a range of search technique types and subtypes: (1) an exact word, (2) a partial word, (3) an approximate expression (inflectional form, or spelling similarity), (4) an anagram, and (5) a combination of two or more words.
4.1.1 Search by Exact Word
The search by an exact word consists of introducing a complete word in the same form as it is included in the dictionary. This search can be used to obtain the dictionary entry containing information about the introduced word (a definition, an example, grammatical information, etc.). This option is offered by all
dictionaries. For example, the user can search in the dictionary for the word house in the list of entries of the dictionary and find its definition, etymology, etc. The search of an exact word may be accent and case sensitive or not.
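A minimal sketch of how an exact-word search might be made accent and case sensitive, or not, is given below. It is an assumption about one possible implementation, not a description of any of the dictionaries analysed; the function names and the three-word headword list are hypothetical.

```python
import unicodedata


def normalize(word: str, case_sensitive: bool, accent_sensitive: bool) -> str:
    """Reduce a word to the form used for comparison."""
    if not accent_sensitive:
        # Decompose accented characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", word)
        word = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if not case_sensitive:
        word = word.casefold()
    return word


def exact_search(query, headwords, case_sensitive=False, accent_sensitive=False):
    """Return the headwords equal to the query under the chosen sensitivity."""
    key = normalize(query, case_sensitive, accent_sensitive)
    return [
        h for h in headwords
        if normalize(h, case_sensitive, accent_sensitive) == key
    ]


headwords = ["casería", "House", "house"]  # invented sample list
print(exact_search("caseria", headwords))                         # ['casería']
print(exact_search("caseria", headwords, accent_sensitive=True))  # []
print(exact_search("house", headwords, case_sensitive=True))      # ['house']
```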
4.1.2 Search by Partial Word
A partial word is an incomplete word. The omitted part of the word can be the start, the middle or the end of the word. This omitted part of the word is replaced by a wildcard. The most frequent wildcards are the asterisk ‘*’, and the question mark ‘?’. The question mark normally replaces only one character. For example, analy?ed retrieves analyzed and analysed. The asterisk normally replaces one or more characters. For example, house* retrieves housemaid, housewife, housebreaking, household, housekeeper, etc. Of the dictionaries analysed, these two wildcards can be used in the CED, DRAE, DUE, OED, and the OneLook Reverse Dictionary. In some dictionaries other wildcards can be used. For example, the Ultralingua dictionary uses the asterisk (to replace zero or more characters), the question mark, and also the plus sign ‘+’, which replaces one or more characters. The Wordsmyth dictionary uses the asterisk or the percentage sign ‘%’ to substitute any sequence of characters, and the dot ‘.’ or the underscore or understrike ‘_’ to replace one character. Finally, the Lexical FreeNet dictionary has an option called ‘substring’ that retrieves words that start, end or contain the characters introduced, for example, the sequence reach retrieves the words reach, reachable, preach, unreachable, overreached, etc. In addition, almost all the electronic dictionaries that we have analysed include an auto-complete feature, which means that when the user starts introducing a word in the dictionary, a drop-down list appears suggesting entries included in the dictionary that start with the letters that the user is introducing. For example, in the Macmillan English Dictionary, if we start by introducing ima, the dictionary suggests words such as image, imagery, imaginable, imaginary, imagination, etc.
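The wildcard behaviour described above can be sketched by translating the pattern into a regular expression. The sketch assumes the most common convention reported here (the question mark for exactly one character, the asterisk for one or more characters); dictionaries such as Ultralingua or Wordsmyth define their wildcards differently, and the function names and headword list are invented for the example.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one character
        elif ch == "*":
            parts.append(".+")   # one or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_search(pattern: str, headwords: list[str]) -> list[str]:
    """Return the headwords that match the whole wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return [h for h in headwords if regex.fullmatch(h)]


headwords = ["analysed", "analyst", "analyzed", "house", "household", "housewife"]
print(wildcard_search("analy?ed", headwords))  # ['analysed', 'analyzed']
print(wildcard_search("house*", headwords))    # ['household', 'housewife']
```

Under the "one or more" reading the bare word house is not returned for house*; switching `.+` to `.*` would give the "zero or more" behaviour reported for Ultralingua.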
4.1.3 Search by Approximate Expression
An approximate expression is a word or sequence of characters that is similar to an exact word included in the dictionary. The approximate expression can be an inflected form of a word, or a word or sequence of characters that is pronounced or spelled similarly to another word. This search technique can be useful to obtain a list of words included in the dictionary that are similar to the word or sequence introduced by the user. We explain this search technique in more detail below.
4.1.3.1 Search by inflected form
An inflected form is a word with inflectional morphemes, for example a plural noun or a conjugated verb. When the user queries an inflected form in a dictionary, the dictionary retrieves the base form of that word, which corresponds to the headword of a dictionary entry. For example, in the Ultralingua dictionary, if we introduce a conjugated verb, the dictionary retrieves the infinitive, or if we introduce a plural noun or adjective, it retrieves the singular noun or adjective. Figure 4.2.3 illustrates the search for the conjugated verb played in the Ultralingua dictionary; the dictionary retrieves the entries that include the base form of this verb, the infinitive play.
Figure 4.2.3 Search by inflected form in the Ultralingua dictionary
4.1.3.2 Search by a similarly pronounced or spelled word
This search technique consists of introducing a sequence of characters that may or may not constitute a word and that is spelled or pronounced similarly to a word that is included in the dictionary. This feature can be found in the CED, DRAE, DUE, Lexical FreeNet, OED, TLFi, Wordsmyth, etc. For example, Lexical FreeNet has two options to look for approximate expressions. One option, called ‘spell check’, retrieves words with a similar spelling to the introduced word. For example, the introduction of the word broad retrieves broad, abroad, byroad, road, etc. Other options called ‘sounds-like’ and ‘rhyme’ retrieve words that are phonetically similar to the word introduced. For example, if we introduce the word congratulations, some of the results are: nations, stations, creations, crustaceans, conjugations, etc.
In addition to these specific search options provided by some dictionaries to search for a similarly pronounced or spelled word, almost all the electronic dictionaries that we have analysed include a feature suggesting similarly spelled words that are included in the dictionary when the search word is not found. For instance, in the Macmillan English Dictionary, if we introduce wron, the dictionary does not display any result, but it suggests other similar words from the dictionary, such as wrong, wren, pron, iron, won, wrote, etc.
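The two approximate searches described in 4.1.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the headword list and the table of inflected forms are hypothetical toy data, and the standard-library function `difflib.get_close_matches` stands in for whatever similarity measure a given dictionary actually uses, which the dictionaries analysed do not document.

```python
from difflib import get_close_matches

# Hypothetical toy data; a real dictionary would use a morphological analyser.
HEADWORDS = ["iron", "play", "won", "wren", "wrong", "wrote"]
INFLECTED_TO_BASE = {"played": "play", "plays": "play", "playing": "play"}


def lookup(query, max_suggestions=5):
    """Return the matching headword, or spelling suggestions if none is found."""
    query = query.lower()
    if query in HEADWORDS:
        return {"headword": query, "suggestions": []}
    if query in INFLECTED_TO_BASE:
        # Search by inflected form: map the query to its base form.
        return {"headword": INFLECTED_TO_BASE[query], "suggestions": []}
    # Search by similar spelling: offer close headwords instead of an empty result.
    suggestions = get_close_matches(query, HEADWORDS, n=max_suggestions, cutoff=0.6)
    return {"headword": None, "suggestions": suggestions}
```

With this toy data, `lookup("played")` resolves to the headword play, while `lookup("wron")` finds no entry and returns a list of similarly spelled headwords that includes wrong.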
4.1.4 Search by Anagram and Crossword Search
An anagram is a sequence of characters that may or may not constitute a word, whose transposition results in one or more complete words included in the dictionary. An anagram search can be useful to obtain a list of dictionary words that contain all the letters of the anagram, in the same or a different order. Some dictionaries can also randomly add or discard a number of letters specified by the user (crossword search). These search techniques are useful for finding a word if we know some or all of the letters it contains. Anagram searches can be found in the CED, DRAE, DUE and Wordsmyth. If we introduce the letters bowle in the CED and select the ‘anagrams’ option, a list of words made up of those letters is retrieved: bowel, below and elbow. In addition, DRAE can perform crossword searches. For example, if we introduce the letters casa, and select the option of generating words by adding up to three more letters, we retrieve a long list of words including: asca ‘ascus’, casa ‘house’, actas ‘records’, casal ‘country house’, caseta ‘hut, stand or kennel’, casuca ‘shanty or hovel’, casería ‘country house’, caserna ‘barracks’, etc.
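As a rough illustration of the anagram and crossword searches, the following Python sketch compares letter counts. The word list is a hypothetical sample built from the examples above, and the crossword function only covers the case of adding letters, not discarding them; it is not a description of how the CED or the DRAE implement these options.

```python
from collections import Counter

# Hypothetical toy word list standing in for the dictionary's headwords.
WORDS = ["below", "bowel", "elbow", "bowl", "blow",
         "casa", "asca", "actas", "casal", "caseta"]


def anagrams(letters, words=WORDS):
    """Words that use exactly the given letters, in any order."""
    target = Counter(letters.lower())
    return sorted(w for w in words if Counter(w) == target)


def crossword(letters, max_extra, words=WORDS):
    """Words that contain all the given letters plus at most `max_extra` more."""
    target = Counter(letters.lower())
    results = []
    for w in words:
        missing = target - Counter(w)      # query letters the word lacks
        extra = len(w) - len(letters)      # letters added beyond the query
        if not missing and 0 <= extra <= max_extra:
            results.append(w)
    return sorted(results)
```

Here `anagrams("bowle")` yields below, bowel and elbow, and `crossword("casa", 3)` yields the words of the sample list that contain c, a, s, a and up to three further letters.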
4.1.5 Search by a Combination of Two or More Words
Search by a combination of two or more words consists of introducing two or more words in the dictionary at the same time. Our analysis revealed five ways of combining words in a query: (1) presence of all the words introduced, (2) presence of any of the words introduced, (3) presence of one word and absence of another, (4) presence of the exact sequence of words, (5) introduction of words in the form of a question in natural language. Dictionaries normally combine words using operators. In the dictionaries analysed, the search for two or more words can be made in content fields, for example, to obtain a list of dictionary words whose definition contains the words included in the query, or that are related to the query words.
4.1.5.1 Presence of all words
In this search technique all the words included in the query must be present in the resource. Below, we present some examples.
In the first example we combine the words agreement and lawsuit without operators in the OneLook Reverse Dictionary. The result we retrieve is a list of words, including: settle, accord, contract, champerty, etc. All the words retrieved by the dictionary are related to the query words. Figure 4.2.4 shows part of the entry for the word settle in the dictionary, where settle is given as a verb used in lawsuit contexts to designate the agreement reached by the parties.
Figure 4.2.4 Presence of the words agreement and lawsuit in the OneLook Reverse Dictionary
The operators used to indicate that all the words must be present can vary from one dictionary to another. The DUE uses the operator ‘&’ to combine two words, one at each side of the operator (word1 & word2). In this case, the dictionary retrieves the entries that contain both query words in their content fields. The CED uses the operator ‘AND’, and Wordsmyth requires the option ‘all of the words’.
4.1.5.2 Presence of any of the words
The operators used to indicate that any of the query words can be present also vary from one dictionary to another. In the DUE, the operator ‘|’ is used in combination with two words, one at each side of the operator (word1 | word2).
In this case, the dictionary retrieves entries that include only the first or the second word introduced with the operator, or both words at the same time. In the CED, the operator ‘OR’ is used. In the Wordsmyth dictionary, the ‘word(s)’ option retrieves entries that include any of the words the user introduces. Figure 4.2.5 shows an example of this search technique in the Wordsmyth dictionary (reverse search). We introduce a three-word query, transport, carry and arrows, to search for in the dictionary definitions (‘word(s)’ and ‘definition’ options). The dictionary retrieves a list of over 100 words whose definitions include any of the words introduced. In this list we find words such as quiver, which is defined as ‘a case designed to hold and transport arrows, often strapped to the back or waist’. This definition contains two of the three query words. The list also includes other words whose definitions contain only one of the query words, for example, achieve. One of the definitions of achieve is ‘to reach or carry through successfully; accomplish’. The criterion used by the dictionary to list the words is alphabetical. The dictionary does not prioritize entries with a higher number of query words in their definition.
Figure 4.2.5 Presence of any of the words in the Wordsmyth dictionary
4.1.5.3 Presence of one word and absence of another word
In this search technique some of the words must be present in the content field of the dictionary and others must not be present. Of the dictionaries analysed, CED, DUE, OED and TERMIUM Plus allow this search technique. DUE uses the operator ‘!’, which is combined with two words, one on each side of the operator (word1 ! word2). This option retrieves entries whose definitions include the first word, with the condition that the second word does not appear. The CED and OED use the operator ‘NOT’, and TERMIUM Plus ‘AND NOT’. Figure 4.2.6 presents an example of this search in the CED. We combine the words feline NOT domestic, to search in the definition fields. We retrieve entries whose definition contains the word feline but not domestic. The result is a list of words referring to felines: bobcat, caracal, cheetah, feline, jaguar, jaguarondi, leopard, lion, lynx, etc. The definitions of these words include the word feline, but not domestic. As we can see in the definition of leopard, the word feline is included, but domestic is not. In the search results the word cat does not appear because it is defined in the dictionary as domestic feline. It is worth noting that searches ‘by a combination of two or more words’, for example in the dictionary definitions, can be problematic when they require the user to guess words that might appear in the definition.
Although the search feline NOT domestic in the CED yields a good list of wild felines, cat NOT house will not.
4.1.5.4 Presence of an exact sequence of words
Among the dictionaries analysed, the DUE, Dirae, Goodrae and TERMIUM Plus allow users to search for an exact sequence of words in the dictionary’s content field. In some dictionaries the sequence of words must be introduced in quotation marks. Figure 4.2.7 presents an example of a search for an exact sequence of words in TERMIUM Plus. When we search for words whose definition includes the exact sequence fabricar moneda ‘to make a coin’, the result is the word acuñar ‘to coin’. As we can see in Figure 4.2.7, the exact sequence fabricar moneda appears in the definition of acuñar.
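The four operator-based combinations (all words, any word, one word without another, exact sequence) can be expressed as one small filter over definition texts. The Python sketch below is an assumption-laden illustration: the mini-dictionary is hypothetical (the definitions of quiver and achieve are those quoted from Wordsmyth above, the others are invented), the function and parameter names are ours, and the exact-sequence test is a naive substring match.

```python
import re

# Hypothetical mini-dictionary: headword -> definition text.
DEFINITIONS = {
    "achieve": "to reach or carry through successfully; accomplish",
    "cat": "a small domestic feline",
    "leopard": "a large spotted feline of Africa and Asia",
    "quiver": "a case designed to hold and transport arrows, "
              "often strapped to the back or waist",
}


def tokens(text):
    """Lower-cased set of the words in a definition."""
    return set(re.findall(r"\w+", text.lower()))


def search_definitions(all_of=(), any_of=(), none_of=(), phrase=None,
                       definitions=DEFINITIONS):
    """Return the headwords whose definition satisfies every condition given."""
    hits = []
    for headword, definition in definitions.items():
        words = tokens(definition)
        if not all(w in words for w in all_of):            # AND, '&'
            continue
        if any_of and not any(w in words for w in any_of):  # OR, '|'
            continue
        if any(w in words for w in none_of):                # NOT, '!'
            continue
        # Exact sequence: naive substring test, so it may match inside longer words.
        if phrase is not None and phrase.lower() not in definition.lower():
            continue
        hits.append(headword)
    return sorted(hits)  # alphabetical, with no ranking by number of matches
```

For instance, `search_definitions(all_of=["feline"], none_of=["domestic"])` returns leopard but not cat, mirroring the feline NOT domestic example, while `search_definitions(any_of=["transport", "carry", "arrows"])` returns both achieve and quiver in alphabetical order.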
Figure 4.2.6 Presence of one word and absence of another word using operators in the CED¹³
4.1.5.5 Queries in natural language
The last search subtype using combinations of two or more words is the introduction of queries in natural language. This search can be used to obtain a list of words that might answer the question introduced in the dictionary. Of the dictionaries analysed, only the OneLook Reverse Dictionary allows this search technique. There is no restriction on the queries that can be introduced and so, for example, the user may use wh-questions (e.g. starting with what is or who is).
Figure 4.2.7 Presence of an exact sequence of words in TERMIUM Plus¹⁴
The query a big building retrieves a list of words including hall, block, barn, tower, castle, pile, termites, court, basilica, mausoleum, etc. The word castle is defined as ‘a large building’. In the same dictionary, if we introduce the query a small building, the resulting list includes chapel, turret, shed, summerhouse, lodge, shop, dentil, coach house, portakabin, cottage, etc. The word lodge is defined as ‘a small (rustic) house’ (Figures 4.2.8a and 4.2.8b). Searches using questions in natural language are the same as other searches with combinations of two or more words, because what the dictionary does is to extract the question keywords to look for the related words, which it then retrieves. In the case of the question What is a big building? the dictionary looks for words that are related to big and building. The dictionary also detects some more complex elements in a question, such as negations placed immediately before a word. For instance, if we introduce the query country which has sea in the OneLook Reverse Dictionary, the first result is seaside (a place by the sea). In contrast, if we introduce country which has no sea, the first result is landlocked (almost or entirely surrounded by land). However, if the negation does not appear immediately before the keyword, the system does not understand what the user wants to ask. For example, if we introduce the query country that does not have sea, the retrieved words are short or solid, which are not related to the query. Thus, this system clearly does not understand complex questions.
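The keyword-extraction behaviour described above can be imitated with a very simple parser. The following Python sketch is purely illustrative: the stop-word list and the rule that a negator only affects the word immediately after it are our assumptions, chosen to reproduce the limitation observed, and they do not describe the actual implementation of the OneLook Reverse Dictionary.

```python
import re

# Hypothetical stop-word and negator lists; a real system would use larger ones.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "who", "which",
             "that", "has", "have", "does", "do"}
NEGATORS = {"no", "not"}


def parse_question(question):
    """Split a question into wanted keywords and negated keywords.

    A negator only affects the word that immediately follows it.
    """
    wanted, negated = [], []
    negate_next = False
    for word in re.findall(r"[a-z]+", question.lower()):
        if word in NEGATORS:
            negate_next = True
            continue
        if word in STOPWORDS:
            negate_next = False  # the negation is lost if a stop word intervenes
            continue
        (negated if negate_next else wanted).append(word)
        negate_next = False
    return wanted, negated
```

Under these assumptions, country which has no sea is parsed as the keyword country with sea negated, whereas in country that does not have sea the negation is separated from sea by have and is lost, so the query is treated like an ordinary two-keyword search.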
Figure 4.2.8a and b Queries in natural language in the OneLook Reverse Dictionary
4.1.6 Filters
Filters are used to add a search restriction to the query and may take the form of, for example, a part of speech restriction, a thematic field restriction or a language restriction (in bilingual or multilingual dictionaries). Filters can be combined with an exact word, a partial word or an approximate word. Some dictionaries allow the user to refine the search by searching inside the search result from a previous search (refine search). Below we provide two examples of filters.
4.1.6.1 Part of speech filter
An example of a part of speech filter is to look for a word only in its noun form. The Lexical FreeNet dictionary allows the user to restrict queries to nouns, adjectives, verbs and adverbs. As we can see in Figure 4.2.9, if we restrict the query play to the part of speech ‘noun’, the dictionary retrieves the word play when it is a noun and it does not include the verb play in the results.
Figure 4.2.9 Use of part of speech filters in the Lexical FreeNet dictionary
4.1.6.2 Thematic area filter
An example of a thematic area filter is found in the Cercaterm dictionary (see Figures 4.2.10a and 4.2.10b). We introduce the word puente and we restrict the search with the thematic area filter Electrónica ‘electronics’. The result is a list of words or expressions in Catalan in which puente is a type of electronic connection (pont ‘jumper’, establir un pont ‘to jumper’, pont de Schering ‘Schering bridge’, etc.). If we introduce the same query puente and restrict the search with the thematic area filter Transportes ‘transport’, the result is a list of words in which puente is a type of building, a bridge (pont ‘bridge’, aproximació a un pas a nivell o a un pont mòbil ‘warning sign for grade crossing or a bascule or traveling bridge’, etc.).
Figure 4.2.10a and b Use of thematic area filters in Cercaterm
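A filter can be thought of as an extra condition applied on top of the query. The Python sketch below shows this for part of speech and thematic area; the entry structure, field names and sample entries are hypothetical and are not taken from Lexical FreeNet or Cercaterm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    headword: str
    pos: str      # part of speech
    domain: str   # thematic area
    gloss: str


# Hypothetical sample entries, invented for illustration.
ENTRIES = [
    Entry("play", "noun", "theatre", "a dramatic work for the stage"),
    Entry("play", "verb", "general", "to take part in a game"),
    Entry("bridge", "noun", "electronics", "a jumper connection"),
    Entry("bridge", "noun", "transport", "a structure carrying a road over an obstacle"),
]


def filtered_search(headword, pos=None, domain=None, entries=ENTRIES):
    """Exact-headword search, optionally narrowed by part of speech and/or area."""
    return [
        e for e in entries
        if e.headword == headword
        and (pos is None or e.pos == pos)
        and (domain is None or e.domain == domain)
    ]
```

Calling `filtered_search("play", pos="noun")` keeps only the noun entry, and `filtered_search("bridge", domain="transport")` keeps only the transport sense, in the same way as the two examples above.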
4.2 The Second Element: The Resource
In an electronic dictionary the information is structured in sections or fields. Each of these sections can be queried. This feature of electronic dictionaries differs from paper dictionaries, in which information is alphabetically ordered and the only option available is to search alphabetically for a word in the list. Electronic dictionaries allow other types of search techniques: in the dictionary entries, in the content fields (definitions, examples, relations, forums or corpora), in a thematic index, and in external links (to a search engine or other dictionaries). Broadly speaking, our classification divides the resource or specific sections that contain searchable information into four types: (1) entry field, (2) content field, (3) thematic field index, and (4) external links access field.
4.2.1 Search in the Entry Field
We understand entry field to mean all the headwords used to head and list the dictionary entries. The use of any query (an exact word, a partial word, etc.) in this field will allow the user to access all the entries whose headword coincides with the query introduced. A search in the entry field can be used to obtain information about a word that is found in such entries, for example, a definition, grammatical information, a usage example, etc. A single entry can also contain several sub-entries, where each sub-entry has its own entry field. For example, an entry may contain the main entry (that may contain spelling variants, derived words and phrases plus inflections of all types of headwords), as well as one or more sub-entries containing idioms and phrasal verbs. A word can be looked up in alphabetical order in the entries. For example, in the CED, the ‘browse’ search option orders the list of dictionary entries starting with the word or sequence of characters introduced by the user. If we introduce the sequence hous, the list starts with the word house, which is the first word in the dictionary that starts with hous (see Figure 4.2.11). Some dictionaries allow the list of entries to be ordered inversely, starting with the last letter introduced. For example, as we can see in Figure 4.2.12, if we introduce the sequence of characters cción with the option ‘diccionario inverso’ in the DRAE, the list of entries begins with the word acción ‘action’, the first word in alphabetical order ending in cción, and is followed by others such as redacción ‘writing’, reacción ‘reaction’, facción ‘faction’, etc.
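The ‘browse’ and inverse-dictionary listings can be sketched with ordinary sorting. In the Python example below the headword lists are hypothetical samples, and ordering an inverse list by the reversed spelling of each word is our assumption about how such a list is built; it happens to reproduce the order acción, redacción, reacción, facción given above.

```python
from bisect import bisect_left

# Hypothetical toy headword lists.
ENGLISH = sorted(["hound", "hour", "house", "household", "housing", "hover"])
SPANISH = ["acción", "facción", "nación", "reacción", "redacción"]


def browse(prefix, headwords=ENGLISH, size=3):
    """Alphabetical list starting at the first headword not before `prefix`."""
    start = bisect_left(headwords, prefix)  # headwords must already be sorted
    return headwords[start:start + size]


def inverse_browse(suffix, headwords=SPANISH):
    """Headwords ending in `suffix`, ordered by their reversed spelling."""
    matches = [w for w in headwords if w.endswith(suffix)]
    return sorted(matches, key=lambda w: w[::-1])
```

With these samples, `browse("hous")` starts its list at house, and `inverse_browse("cción")` starts at acción and leaves out nación, which does not end in cción.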
Figure 4.2.11 Search in the alphabetical list of entries in CED
Figure 4.2.12 Search in the inverse alphabetical list of entries in the DRAE
4.2.2 Search in the Content Field
Content fields include information in text format in each entry. The information can vary: a definition, examples, lexical or semantic relations, comments from a forum and corpus concordances. The user can search with queries in these content fields to find entries whose content coincides with the query introduced. Below we provide examples of searches in the content fields mentioned.
4.2.2.1 Search in the definition fields
In this example, we look for the words fruit and yellow in the definition fields of the Wordsmyth dictionary. The dictionary retrieves a list of words: agrimony, apple, apricot, cherry, chinaberry, citron, Golden Delicious, grapefruit, Japanese quince, jujube and lemon. Figure 4.2.13 shows the definitions to the right of these words in which the words fruit and yellow appear. The CED, Diccionari de la llengua catalana, Dirae, DRAE, DUE, Goodrae, Grand dictionnaire terminologique, OED, OneLook Reverse Dictionary, TERMIUM Plus and WordReference also provide this search technique option.
Figure 4.2.13 Search in the definition fields of the Wordsmyth
4.2.2.2 Search in the relations fields
Some dictionaries incorporate information on paradigmatic relations (e.g. synonymy, antonymy, hyponymy) or syntagmatic relationships (e.g. collocations) among their words or terms. These relation fields include words or expressions with a lexical or semantic relation to a dictionary entry. In many cases, this information can be accessed by navigating the dictionary’s hyperlinks or by a direct search using keywords. In the search by navigation, the user accesses synonyms of a word that are linked from within an entry, and which can lead to other synonyms. In the direct search with keywords, the dictionary searches for the query word or words in the synonymy fields, and retrieves a list of entries that include that synonym in their synonymy fields. Below we include examples of both search techniques.
Examples of a search in the relations field by navigation come from the WordNet dictionary, Merriam-Webster monolingual English dictionary, Genoma and OncoTerm. The WordNet dictionary entries include semantically related words, such as synonyms, hypernyms, hyponyms, etc. As we can see in Figure 4.2.14, the entry for the word transport accesses the hyponyms air transport, navigation, hauling or trucking, etc.
Figure 4.2.14 Search in the semantic relations field by navigation in the WordNet dictionary
An example of direct search with keywords in the semantic relations field can be found in the Wordsmyth dictionary. Figure 4.2.15 shows a search for entries that include the word agreement among their synonyms. The dictionary retrieves a list of 27 synonyms for agreement, including accession, accord, alliance, etc.
Figure 4.2.15 Direct search in the semantic relations in the Wordsmyth dictionary
Other dictionaries also allow searches in the semantic relations fields. The WordReference and Ultralingua dictionaries contain a synonym search option. The OneLook Reverse Dictionary generates lists of words related to the query. Lexical FreeNet allows two types of searches in the semantic relations field. First, words related to one term; for example, if we introduce the word falcon, the dictionary option ‘show related’ retrieves the related words pigeons, nest, hawk, hunt, American kestrel, caracara, falco columbarius, falco peregrinus. Every word in the list is related to the term falcon (synonymy relation, generalization, specialization, etc.). Second, this dictionary can retrieve words that are related at the same time to two query words (‘connection’ option). As we can see in Figure 4.2.16, a word that is related to both agreement and lawsuit is settlement.
Figure 4.2.16 Direct search in semantic relations in the Lexical FreeNet dictionary
Some dictionaries allow searches for collocations (Dicouèbe, DiCoInfo, DiCoEnviro, DAD, DiCE). In the Computer Science dictionary, DiCoInfo, we introduce the verb envoyer in French and search in the ‘lien lexical’ (lexical relation) field. The result is a list of collocations of the verb envoyer. Figure 4.2.17 shows that the results include the collocation envoyer un courriel or the collocation envoyer un spam à. The user can access the entries through the hyperlink of the collocate.
4.2.2.3 Search in a complementary forum
Some dictionaries incorporate forums in their content fields. In these forums the user can ask questions related to the entry or consult the answers to previous questions asked by other users. These forums can be useful when the information included in the entry does not satisfy the user’s queries. For example, in the WordReference dictionary, the entries for agreement and contract include a comment from the forum contract for agreement in which the difference between a contract and an agreement is explained (see Figure 4.2.18).
Figure 4.2.17 Search in the lexical relations field in the DiCoInfo dictionary
Figure 4.2.18 Search in a complementary forum in the WordReference dictionary
Forums can be useful for both dictionary users and creators, since the latter can use the users’ queries and answers to improve the dictionary. This way the forum works as a control and evaluation mechanism and allows creators to adapt the dictionary to the users’ needs. However, this search technique has to be used with caution: answers in a forum are not always controlled, and their reliability should be verified.
4.2.2.4 Search in a complementary corpus
Some dictionaries incorporate links to a corpus in the entry content, in which the user can access concordances for each dictionary entry. An example of this search technique can be found in the EOHS Term database. Figure 4.2.19 shows this database entry for the term employee. Above the entry is a tab marked ‘Concordances’, which provides a list of 333 concordances of the term employee. The user can read each concordance, and also access the complete text in which the concordance appears, by clicking on the icon to the right of each concordance. Genoma and Just The Word also retrieve concordances from a corpus.
Figure 4.2.19 Search in a complementary corpus in the EOHS Term dictionary
4.2.3 Search in the Thematic Field Index
A thematic field index is a list of hierarchically ordered areas, in which the user can navigate and select the item they want to consult. This field is frequently used to show a map of thematic areas. The dictionary entries, which can be words, but may also be images, are classified in these thematic areas. There are two types of search in the thematic field index. In the search by navigation the user scrolls down the hierarchical structure of thematic areas. In the direct search the user introduces a keyword in the dictionary that corresponds to a thematic area. Examples of both types of search are given below. In the first example (Figure 4.2.20) we search by navigation in the DRAE thematic field index. We scroll down the profesiones y disciplinas ‘professions and disciplines’ hierarchical structure, then enter ciencia y técnica ‘science and technology’, matemáticas ‘mathematics’ and finally álgebra ‘algebra’. The result is a list of dictionary entries classified within the selected thematic area that includes the terms binomio ‘binomial’, cociente ‘quotient’, coeficiente ‘coefficient’, combinación ‘permutation’, etc. In the following example we perform a direct search with keywords in the thematic field index of the Merriam-Webster visual dictionary. We introduce the keyword flower. The dictionary suggests the thematic areas flower and flowering. If we select the area flower we access entries with images related to this thematic area, such as pleasure garden, examples of flowers, structure of a flower, structure of a plant.
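The two ways of searching a thematic field index can be illustrated with a nested data structure. In the Python sketch below, the fragment of the index is a hypothetical reconstruction based on the DRAE path described above, and the function names are ours.

```python
# Hypothetical fragment of a thematic index: nested dicts are thematic areas,
# lists are the dictionary entries classified in the innermost area.
INDEX = {
    "profesiones y disciplinas": {
        "ciencia y técnica": {
            "matemáticas": {
                "álgebra": ["binomio", "cociente", "coeficiente", "combinación"],
            },
        },
    },
}


def navigate(index, path):
    """Search by navigation: follow a path of area names down the hierarchy."""
    node = index
    for area in path:
        node = node[area]
    return node


def find_areas(index, keyword, trail=()):
    """Direct search: return the paths of all areas whose name contains `keyword`."""
    found = []
    for name, child in index.items():
        here = trail + (name,)
        if keyword.lower() in name.lower():
            found.append(here)
        if isinstance(child, dict):
            found.extend(find_areas(child, keyword, here))
    return found
```

Navigation requires the user to know the whole path down to álgebra, whereas the direct search `find_areas(INDEX, "álgebra")` locates the area from a single keyword and returns the path to it.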
Figure 4.2.20 Search in the DRAE thematic field index
4.2.4 Search in External Links Access Field
This type of field offers links to resources external to the dictionary. Some dictionaries are linked to web search engines and other dictionaries. For example, in the WordReference dictionary, the ‘in context’ option searches in Google for the query introduced. In addition, the ‘images’ option will search for the query in Google images. Our example of a search in links to external dictionaries comes from the OneLook Reverse Dictionary. Figure 4.2.21 shows that the entry for the word hall has links to other general online dictionaries, such as the American Heritage Dictionary of the English Language or the Cambridge International Dictionary of English, as well as other specialized online dictionaries in the fields of arts, economics, computer science, medicine, etc.
4.3 The Third Element: The Result
The result of a search is the information the user obtains after querying a dictionary. The result of a dictionary search is usually the entry with information about a word (meaning, grammatical information, pronunciation, etymology, use in context, equivalences, collocations and related words, etc.). In other cases, the result is a word or a list of words that corresponds to an entry in the dictionary. Finally, the result may also be an image or list of images, and the results may include audio files. The retrieval of these results depends on the type of dictionary and the options incorporated in it. Below we describe what we consider to be the most innovative results as implemented in electronic dictionaries.
4.3.1 Context(s)
We start by explaining the result where the user obtains information about the use in context of a word. Some dictionaries include contextual information about words in their entry content fields, for example FrameNet. Figure 4.2.22 shows the entry for the verb play in the frame ‘Performers_and_roles’. All the frame elements that have been annotated in the contexts of play are included in a table. Each frame element has a different colour, which allows the user to identify every frame element in a context. The entry also includes information about the syntactic patterns of frame elements. The user can also obtain contextual information about the dictionary entries by accessing the concordances of a complementary corpus, for example in the EOHS Term database.
4.3.2 Collocations and Related Words
Figure 4.2.21 Search in the external links access field of the OneLook Reverse Dictionary
Figure 4.2.22 Result of information about use in context in FrameNet
The result of a dictionary search may be an entry with information about paradigmatically related words (e.g. synonyms, antonyms, etc.) and/or syntagmatically related words (collocations). In the French version of the DiCoInfo, when a lexical unit is the base for collocations expressing different meanings, these collocations are grouped according to their meaning. For example, the lexical unit fichier ‘file’ is the base for two collocations which express the idea of ‘to delete’ a file: supprimer or effacer un fichier; these two collocations are therefore grouped together. In the DiCE the ayuda a la redacción ‘help with text production’ search option allows the user to retrieve the collocates that express the idea of ‘true love’: amor acendrado, verdadero, único.
4.3.3 Graphical representation of relations between words
In some dictionaries paradigmatic and syntagmatic relations are presented in graphs or tables. In the Visual DiCoInfo, the query cliquer retrieves the graphical representation of synonyms (double cliquer and double-cliquer), quasi-antonyms (relâcher), derivations (cliquable) and actants (fichier, utilisateur). In EcoLexicon results are presented in a graph, for example drought ‘attribute of’ dryness, ‘type of’ hydrological drought, ‘affects’ temperature, precipitation, area of land, ephemeral lake, flow and equivalents in several languages (see Figure 4.2.23).
Figure 4.2.23 Graphical representation of relations in EcoLexicon
4.3.4 Word or List of Words
Another result obtained from the dictionary is a word or list of words. Some search techniques, such as the search in content fields or the search in a thematic field index, can generate a list of words. For example, if we search the query vehicle in the definition field of the Wordsmyth dictionary, it retrieves a list of words that contain the query in the definition fields of their entries, such as aircraft, airflow, ambulance, aquaplane, ATV, automobile, etc. In addition, if we search for the thematic area Lógica ‘Logic’ in the DRAE thematic field index (profesiones y disciplinas ‘Professions and disciplines’ → Filosofía ‘Philosophy’ → Lógica ‘Logic’), the result we obtain is a list of words that are classified within this field, such as a contrariis ‘e contrario’, ad hóminem ‘ad hominem’, antecedente ‘antecedent’, a pari ‘a pari’, apodíctico ‘apodictic’, argumento ‘argument’, etc. It is worth mentioning that in most of the dictionaries that we have analysed, lists of words are displayed in alphabetical order. Yet in Dirae, for example, results may be displayed in alphabetical order, in order of relevance according to the user’s search, in order of frequency in a corpus, or according to word length.
It would be a valuable aid to the user if all the dictionaries displayed search results in order of relevance according to the user’s search. For example, if the user searches for agency, the resulting list of words should not be displayed in the order in which they appear in the dictionary, but in order of relevance, so that agency appears in the search result list before adoption agency.
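One simple way of ordering results by relevance is sketched below in Python. The three-tier heuristic (exact match, then headwords beginning with the query, then all others) is only one possible assumption about what ‘relevance’ could mean here; it is not the ranking used by Dirae or by any other dictionary analysed, and the sample headwords other than agency and adoption agency are invented.

```python
def rank_by_relevance(query, headwords):
    """Order headwords: exact match first, then prefix matches, then the rest.

    Ties within each tier are broken alphabetically.
    """
    q = query.lower()

    def key(headword):
        h = headword.lower()
        if h == q:
            tier = 0
        elif h.startswith(q):
            tier = 1
        else:
            tier = 2
        return (tier, h)

    return sorted(headwords, key=key)
```

For the query agency, a purely alphabetical list would begin with adoption agency, whereas this ordering places agency first.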
4.3.5 Image or List of Images
Some visual dictionaries retrieve images classified in a thematic field index. For example, in the Merriam-Webster visual dictionary, if we navigate in the thematic field (‘themes’) in Plants & gardening → Plants → Flower → Structure of a flower, the dictionary retrieves the image of a flower with the names of all its parts. Illustrations are present in some entries in EcoLexicon, and in Collins BETA photos are embedded from Flickr.
4.3.6 Audio files
It is becoming common among electronic dictionaries to include audio files in the entries of some terms. In most dictionaries these audio files provide the pronunciation of words. Other types of audio file are also available in some electronic dictionaries. Some audio files help the user to understand the meaning of a word, in the same way as a definition or an image can do. For example, in the Macmillan English Dictionary some entries include audio files that clarify the meaning of a word with a sound. In the entry of the noun applause, we can hear the sound of people applauding, thus helping the user to understand the meaning of that word.
5 Summary Table and Conclusion
In the analysis presented here, we have shown how each electronic dictionary uses a different nomenclature to refer to the types of search it offers. Moreover, in some cases the same name is used in different dictionaries to refer to different searches. This inevitably leads to problems when comparing dictionaries or teaching students how to use them. Nor does this situation encourage communication between lexicographers, terminographers and computer technicians who collaborate to develop a new dictionary, or help faculty to teach future lexicographers, terminographers, philologists or translators a standard classification that is valid for all the search techniques. We think that a homogeneous classification and nomenclature such as the one we have presented here would be useful to evaluate electronic dictionaries, assess their functions, teach how to use them, or design new dictionaries.
From the review of the literature and the analysis of a set of electronic dictionaries, we have synthesized the search techniques in electronic dictionaries into three elements: the query, the resource and the result. These three elements embody all the search possibilities we have observed in electronic dictionaries. In addition, this structure is flexible, in that new elements can be incorporated if needed. In a first analysis of 15 electronic dictionaries, we proposed a classification of search techniques in electronic dictionaries; we then increased the number of dictionaries analysed to 32 and our classification still proved useful. Moreover, we have noticed that dictionaries are adding innovative features that we have included in our classification, for example, new results such as collocations and related words, and the graphical representation of relations. With this classification (Figure 4.2.24), we aim to contribute to solving the problems related to nomenclatures, types and subtypes of searches that we have found in each particular dictionary.
Acknowledgements
We would like to thank the authors and editors of the following dictionaries for granting us permission to reproduce screenshots of them: Cercaterm, Collins English Dictionary (CED), Diccionario de la lengua española (DRAE), DiCoInfo, EcoLexicon, EOHSTerm, FrameNet, Lexical FreeNet, OneLook Reverse Dictionary, TERMIUM Plus, Ultralingua, WordNet, WordReference, Wordsmyth. This research is part of the ONTODIC Project: Methodology and technologies for the elaboration of onomasiological dictionaries based on ontologies. Terminological resources for e-translation, TSI2006–01911, and the ONTODIC II Project: Methodology and techniques for the elaboration of collocations dictionaries based on ontologies. Terminological resources for e-translation, TIN2009–07690, both funded by the Spanish Government.
Figure 4.2.24 Summary of our classification of search techniques for electronic dictionaries
Researching the Use of Electronic Dictionaries
Notes
1. This chapter is a revised and expanded version of our analysis which appeared in an article published in Vol. 23 (3) of the International Journal of Lexicography in 2010, reproduced here with kind permission of Oxford University Press.
2. See (Verlinde et al. 2010).
3. For more information on the Explanatory Combinatorial Lexicology (ECL), see (Mel’čuk et al. 1995).
4. See L’Homme (2008), Jousse et al. (2011).
5. See Alonso et al. (2010), Vincze et al. (2011).
6. For more information on Frame-Based Terminology and EcoLexicon, see Faber et al. (2005, 2006), Tercedor et al. (2008), Faber et al. (2009), Prieto-Velasco et al. (2009), Faber (2010).
7. See Castagnoli (2008).
8. For more information on semantic frames and FrameNet, see Fillmore (1985), Fillmore et al. (1998, 2002, 2003), Ruppenhofer et al. (2006).
9. See Cabré (2006).
10. This dictionary and all its functions are described in greater detail in Beeferman (1998).
11. See López-Rodríguez et al. (2006).
12. For more information on WordNet, see Fellbaum (1998), Miller (1998a, 1998b).
13. Reproduced from Collins Electronic English Dictionary & Thesaurus with the permission of HarperCollins Publishers Ltd. © HarperCollins Publishers 1992.
14. Source: Translation Bureau, Government of Canada’s Terminology and Linguistic Databank TERMIUM Plus® (www.btb.termiumplus.gc.ca), Government of Canada, 2012. Reproduced with the permission of the Minister of Public Works and Government Services Canada, 2012.
Annex 1
Table 4.2.1 Summary of the electronic dictionary analysis
Search for a word in the alphabetical entry list
Search for one or more words in the definitions or other fields of an entry
Use of operators
Access to complementary forums
Search for a semantic relation by navigation
Search for a semantic relation by direct search
Search by thematic area
BLF
YES Search for definitions containing this word?, The full article in the BLF?, Search for word combination
Cercaterm
YES YES Advanced search. “” Field: denominación, definición and nota
YES
YES Restrict subject area
YES YES Advanced search. i, o in àrea temàtica Field: definició, option exemple
YES
YES Advanced search. Field: àrea temàtica
Diccionari de la llengua catalana
YES
Dicouèbe
YES Synonyms and antonyms
YES All fields
YES All the information about this word on the web?
YES
DiCoInfo
YES
YES Option: lien lexical
YES
YES Option: lien lexical
DiCE
YES
YES Option: lemas, funciones léxicas and valores
YES
YES Option: consultas avanzadas
EOHS-Term
YES
YES Option: advanced search
YES
YES Field related terms
Access to external links
YES It has no links but includes context fields with contexts extracted from a corpus
YES Option: full list by domain
YES Links to concordances of a corpus
Search for images
Introduction of an exact word
Introduction of a partial word
Use of wildcards
YES
YES With wildcards
YES %
YES
YES With wildcards, and advanced search
YES *, ?
YES
YES Començada per, acabada en, en qualsevol posició, no començada per, no acabada en, que no contingui
YES
YES With wildcards
YES
YES Options: Terme commençant par and Terme contenant
Introduction of an inflected form
Search for anagrams
Introduction of a word to search for phonetically or orthographically related words
Specification of a part of speech
Introduction of a question in natural language
YES word/form
YES aproximat option in advanced search
YES in advanced search
YES
YES %
YES carac. grammaticales [lexie:cgs]
YES
YES
YES Introducing a sequence of characters without wildcards
Table 4.2.1 Continued
Search for a word in the alphabetical entry list
Search for one or more words in the definitions or other fields of an entry
EcoLexicon
Use of operators
Access to complementary forums
Search for a semantic relation by navigation
Search for a semantic relation by direct search
Search by thematic area
Access to external links
YES
YES Domains
YES Links to websites with images. It contains txt files with contexts extracted from a corpus
YES Search by frame
YES It has no links but includes contexts extracted from a corpus
FrameNet
YES
YES
Genoma
YES Search in a complementary corpus
YES
Grand dictionnaire terminologique
YES dans la définition
YES Search in Bwananet
IATE
YES
Just The Word
YES Search in the BNC corpus
Lexical FreeNet
YES
YES Option: alternatives YES
YES Options: show related and show reachable
Macmillan Dictionary
YES
YES
YES Thesaurus
Merriam-Webster
YES
YES
YES Thesaurus
OncoTerm
YES
YES Conceptual relations
YES
YES Navigating or direct search with keywords
Search for images
Introduction of an exact word
Introduction of a partial word
Use of wildcards
YES
YES
YES *, ?
YES
YES
YES
Introduction of an inflected form
Search for anagrams
Introduction of a word to search for phonetically or orthographically related words
Specification of a part of speech
Introduction of a question in natural language
YES
YES Option: que empiece por, que contenga, que termine por YES With wildcards
YES *, ?
YES YES
YES In the visual dictionary with a thematic index or direct search with keywords
YES
YES
YES Option: substring
YES Options: rhyme coercion and spell check
YES
YES Función autocompletar
YES
YES
YES Options: common words, nouns, verbs, adjectives, adverbs.
YES
YES Some entries include images
Table 4.2.1 Continued
Search for a word in the alphabetical entry list
Search for one or more words in the definitions or other fields of an entry
OneLook
Use of operators
Access to complementary forums
Search for a semantic relation by navigation
Search for a semantic relation by direct search
YES
YES
Termium Plus
YES
YES
YES YES AND, OR, AND NOT
TLFi
YES
YES
YES
Contenu
Ultralingua
YES
UNTERM
YES main field, acronym
WordReference YES
YES Options: In context, images (in Google), synonyms
Wordsmyth
YES
YES
Option: advanced search (old interface), Reverse Search (new interface)
Options: word(s), word(s) +forms, all word(s), text string
YES
YES &, |, !, ( ), “ ”.
DUE
YES
YES Direct search with keywords
Access to external links
YES Links to other dictionaries YES Restrict subject area
YES
YES
Recherche complexe – liens
Recherche assistée – discipline
YES English synonym dictionary
YES and, &
WordNet
YES
Search by thematic area
YES
YES
YES
YES Spanish synonym dictionary
YES
YES
Options: synonyms, all word relations, Similar Word, Related Word YES
YES
YES Link to Google
Search for images
Introduction of an exact word
Introduction of a partial word
Use of wildcards
Introduction of an inflected form
Search for anagrams
Introduction of a word to search for phonetically or orthographically related words
YES
YES With wildcards
YES *, ?
YES
YES
YES
YES *, ?
YES
YES
YES
contenant un mot donné
Sons saisis, correcteur d’erreurs automatique, correcteur d’erreurs forcé
YES
YES Option: word hunt
YES *, ?, +
YES
YES
YES With wildcards
YES Option: Find similar strings (fuzzy search)
YES Option: Find similar strings (fuzzy search)
YES *
Specification of a part of speech
Introduction of a question in natural language
YES
YES
YES, listes de mots
YES Recherche assistée – code grammatical
YES
YES YES In Google
YES
YES Autocomplete feature
YES
YES
YES
YES
YES
With wildcards
*, %, ., _
Options: spelledlike and pronunciations
YES With wildcards
YES *, ?
YES Option: Búsqueda en las entradas
Some entries include images
YES
YES
YES
YES YES, Anagram and Crossword Solver Option: advanced search (old interface), Reverse Search (new interface) YES
Table 4.2.1 Continued
Access to complementary forums
Search for a semantic relation by navigation
Search for a semantic relation by direct search
Search for a word in the alphabetical entry list
Search for one or more words in the definitions or other fields of an entry
Use of operators
YES
YES
YES YES Y, O, NO in búsqueda múltiple
Dirae
YES In DRAE definitions
YES “ ”, -
Goodrae
YES In DRAE definitions
YES “ ”, *
YES
DRAE
YES Introduction of a ~ at the end of the query
OED
YES
YES
YES AND, OR, NOT
YES
CED
YES
YES
YES AND, OR, NOT, ( )
YES
Robert
YES
YES Recherche option and then texte intégral (click on définitions box)
YES ?, *, &, #
YES
YES Recherche option and then texte intégral (click on synonymes, renvois et contraires box)
Search by thematic area
Access to external links
Search for images
Introduction of an exact word
Introduction of a partial word
YES
Use of wildcards
Introduction of an inflected form
Search for anagrams
Introduction of a word to search for phonetically or orthographically related words
YES
YES Introduction of the beginning of a word and índice de todas las palabras option, or introduction of the end of a word and búsqueda inversa option
YES *, ?, +, [. . .], [!. . .], @, #
YES
YES
Specification of a part of speech
Introduction of a question in natural language
YES Option: árbol de categoría gramatical
YES stemming
YES
YES With wildcards
YES *
YES stemming
YES
YES With wildcards
YES *, ?
YES
YES
YES With wildcards
YES *, ?
YES
YES
YES With wildcards
YES *, ?, &, #
YES Correction phonétique option and phonetic search
YES
YES
YES formes list
YES
YES
References
Electronic Dictionaries
Base lexicale du français – BLF, Katholieke Universiteit Leuven [Retrieved: 16–07–2012].
Cercaterm, Centre de Terminologia TERMCAT [Retrieved: 16–07–2012].
Collins BETA [Retrieved: 16–07–2012].
Collins English Dictionary, Electronic Edition. Version 1.5, HarperCollins Publishers (CD-ROM).
Diccionario de colocaciones del español (DiCE), Grupo DiCE, Universidade da Coruña [Retrieved: 16–07–2012].
Diccionario de la lengua española, Electronic Edition. Version 21.1.0., Real Academia Española. Madrid, Espasa Calpe (CD-ROM).
Diccionario de la lengua española, 22nd edition, Real Academia Española [Retrieved: 16–07–2012].
Diccionari de la llengua catalana, 2nd edition, Institut d’Estudis Catalans [Retrieved: 16–07–2012].
Diccionario de uso del español, Moliner, M., Madrid, Gredos (CD-ROM).
DiCoEnviro (Le dictionnaire fondamental de l’environnement), Observatoire de Linguistique Sens-Texte (OLST), Université de Montréal [Retrieved: 16–07–2012].
DiCoInfo (Dictionnaire fondamental de l’informatique et de l’Internet), Observatoire de Linguistique Sens-Texte, Université de Montréal [Retrieved: 16–07–2012].
DiCoInfo Visuel, Observatoire de Linguistique Sens-Texte (OLST), Université de Montréal [Retrieved: 16–07–2012].
Dicouèbe: Dictionnaire en ligne de combinatoire du français, Observatoire de Linguistique Sens-Texte (OLST), Université de Montréal [Retrieved: 16–07–2012].
Dictionnaire analytique de la distribution, Dancette, J. [Retrieved: 16–07–2012].
Dictionnaire d’Apprentissage du Français des Affaires (DAFA), GRELEP [Retrieved: 16–07–2012].
Dirae, Rodríguez Alberich, G. and Real Academia Española [Retrieved: 16–07–2012].
EcoLexicon, LexiCon Research Group, Universidad de Granada [Retrieved: 16–07–2012].
EOHS Term, Advanced School of Modern Languages for Interpreters and Translators (SSLMIT), University of Bologna [Retrieved: 16–07–2012].
FrameNet, International Computer Science Institute [Retrieved: 16–07–2012].
Genoma, IULATERM, Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra [Retrieved: 16–07–2012].
Goodrae, Abad, S. [Retrieved: 16–07–2012].
Grand dictionnaire terminologique, Office québécois de la langue française, Gouvernement du Québec [Retrieved: 16–07–2012].
IATE (InterActive Terminology for Europe), European Communities [Retrieved: 16–07–2012].
Just The Word, Sharp Laboratories of Europe [Retrieved: 16–07–2012].
Le Grand Robert de la langue française, Electronic Edition. Version 2.0 (CD-ROM).
Le Trésor de la Langue Française informatisé, Atilf [Retrieved: 16–07–2012].
Lexical FreeNet, Version 2.0, Datamuse [Retrieved: 16–07–2012].
Macmillan English Dictionary, Macmillan Publishers Limited [Retrieved: 16–07–2012].
Merriam-Webster Online, Merriam-Webster, Incorporated [Retrieved: 16–07–2012].
OncoTerm: Sistema Bilingüe de Información y Recursos Oncológicos, Grupo de Investigación OncoTerm, Universidad de Granada [Retrieved: 16–07–2012].
OneLook Reverse Dictionary [Retrieved: 16–07–2012].
Oxford English Dictionary, Electronic Edition. 2nd edition. Version 1.00, Oxford University Press (CD-ROM).
TERMIUM Plus, Government of Canada [Retrieved: 16–07–2012].
Ultralingua [Retrieved: 16–07–2012].
WordNet 3.0, Princeton University [Retrieved: 16–07–2012].
WordReference Online Language Dictionaries [Retrieved: 16–07–2012].
Wordsmyth [Retrieved: 16–07–2012].
Other Literature
Abate, F. R. (1985) Dictionaries past and future: issues and prospects. Dictionaries 7, 270–83.
Alcina, A. (2009) Metodología y tecnologías para la elaboración de diccionarios terminológicos onomasiológicos. In: A. Alcina, E. Valero and E. Rambla (eds) Terminología y sociedad del conocimiento. Bern: Peter Lang, 33–58.
Alonso, M., Nishikawa, A. and Vincze, O. (2010) DiCE in the web: an online Spanish collocation dictionary. In: S. Granger and M. Paquot (eds) eLexicography in the 21st Century: New Challenges, New Applications. Proceedings of eLex 2009 (Cahiers du CENTAL 7). Louvain-la-Neuve: Presses Universitaires de Louvain, 364–74.
Atkins, B. T. S. (ed.) (1998) Using Dictionaries. Studies of Dictionary Use by Language Learners and Translators (Lexicographica. Series Maior 88). Tübingen: Max Niemeyer.
Atkins, B. T. S. and Varantola, K. (1998) Monitoring dictionary use. In: B. T. S. Atkins (ed.), 83–122.
Beeferman, D. (1998) Lexical discovery with an enriched semantic network. In: S. Harabagiu (ed.) Proceedings of the Workshop on Applications of WordNet in Natural Language Processing Systems. ACL/COLING, 135–41.
Béjoint, H. (1981) The foreign student’s use of monolingual English dictionaries: a study of language needs and reference skills. Applied Linguistics 2/3, 207–22.
— (1989) The teaching of dictionary use: present state and future tasks. In: F. J. Hausmann, O. Reichmann, H. E. Wiegand and L. Zgusta (eds), Vol. 1, 208–15.
Bogaards, P. (1988) À propos de l’usage du dictionnaire de langue étrangère. Cahiers de Lexicologie 52/1, 131–52.
Bowker, L. (1998) Using specialized monolingual native-language corpora as a translation resource: a pilot study. Meta 43/4, 631–51.
Cabré, M. T. (2006) From terminological data banks to knowledge databases: the text as the starting point. In: L. Bowker (ed.) Lexicography, Terminology, and Translation. Text-based Studies in Honour of Ingrid Meyer. Ottawa: University of Ottawa Press, 93–106.
Carr, M. (1997) Internet dictionaries and lexicography. International Journal of Lexicography 10/3, 209–30.
Castagnoli, S. (2008) Corpus et bases de données terminologiques: l’interpretation au service des usagers. In: F. Maniez, P. Dury, N. Arlin and C. Rougemont (eds) Corpus et dictionnaires de langues de spécialité. Bresson: Presses Universitaires de Grenoble, 213–30.
Church, K. W. (2008) Approximate lexicography and web search. International Journal of Lexicography 21/3, 325–36.
Colominas, C. (2004) Los corpora como herramientas de traducción. In: E. Ortega (ed.) Panorama actual de la investigación en Traducción e Interpretación. Granada: Atrio, 362–72.
Corpas, G., Leiva, J. and Varela, M. J. (2001) El papel del diccionario en la formación de traductores e intérpretes: análisis de necesidades y encuestas de uso. In: M. Ayala (ed.) Diccionarios y enseñanza. Alcalá de Henares: Servicio de Publicaciones de la Universidad de Alcalá, 239–73.
Corris, M., Manning, C. D., Poetsch, S. and Simpson, J. (2000) Bilingual dictionaries for Australian languages: user studies on the place of paper and electronic dictionaries. In: U. Heid, S. Evert, E. Lehmann and C. Rohrer (eds), 169–81.
Cowie, A. P. (1999) English Dictionaries for Foreign Learners – A History. Oxford: Clarendon Press.
de Schryver, G. M. (2003) Lexicographers’ dreams in the electronic-dictionary age. International Journal of Lexicography 16/2, 143–99.
Dodd, W. S. (1989) Lexicomputing and the dictionary of the future. In: G. James (ed.) Lexicographers and Their Works (Exeter Linguistic Studies 14). Exeter: Exeter University Press, 83–93.
Faber, P. (2010) Terminología, traducción especializada y adquisición de conocimiento. In: E. Alarcón (ed.) La traducción en contextos especializados. Propuestas didácticas. Granada: Atrio, 87–96.
Faber, P., León-Araúz, P. and Prieto-Velasco, J. A. (2009) Semantic relations, dynamicity, and terminological knowledge bases. Current Issues in Language Studies 1/1, 1–23.
Faber, P., Márquez-Linares, C. and Vega-Expósito, M. (2005) Framing terminology: a process-oriented approach. Meta 50/4 [available at: http://id.erudit.org/iderudit/019916ar].
Faber, P., León-Araúz, P., Prieto-Velasco, J. A. and Reimerink, A. (2007) Linking images and words: the description of specialized concepts. International Journal of Lexicography 20/1, 39–65.
Faber, P., Montero, S., Castro-Prieto, M. R., Senso-Ruiz, J., Prieto-Velasco, J. A., León-Araúz, P., Márquez-Linares, C. and Vega-Expósito, M. (2006) Process-oriented terminology management in the domain of coastal engineering. Terminology 12/2, 189–213.
Fellbaum, C. (1998) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Fernández-Pampillón, A. and Matesanz, M. (2003) Los diccionarios electrónicos: hacia un nuevo concepto de diccionario. In: C. López and A. Séré (eds) Nuevos géneros discursivos: los textos electrónicos. Madrid: Biblioteca nueva, 137–58.
Fillmore, C. J. (1985) Frames and the semantics of understanding. Quaderni di Semantica 6, 222–53.
Fillmore, C. J. and Atkins, B. T. S. (1998) FrameNet and lexicographic relevance. In: A. Rubio, N. Gallardo, R. Castro and A. Tejada (eds) Proceedings of the ELRA Conference on Linguistic Resources. Granada, 28–30 May 1998, 417–23.
Fillmore, C. J., Baker, C. F. and Sato, H. (2002) The FrameNet database and software tools. In: M. G. Rodríguez and C. P. S. Araujo (eds) Proceedings of the Third International Conference on Language Resources and Evaluation (LREC). Las Palmas de Gran Canaria, 29–31 May 2002, 1157–60.
Fillmore, C. J., Johnson, C. R. and Petruck, M. R. L. (2003) Background to FrameNet. International Journal of Lexicography 16/3, Special Issue on Frame Semantics, 235–50.
Forget, N. (1999) Les dictionnaires électroniques dans l’optique de la traduction. MA Dissertation, University of Ottawa.
Geeraerts, D. (2000) Adding electronic value. The electronic version of the Grote Van Dale. In: U. Heid, S. Evert, E. Lehmann and C. Rohrer (eds), 75–84.
Gómez González-Jover, A. (2005) Terminografía, lenguajes profesionales y mediación interlingüística. Aplicación metodológica al léxico especializado del sector industrial del calzado y de las industrias afines. PhD Thesis, Universidad de Alicante.
Gross, G. (1997) La grammaire, les dictionnaires et l’informatique. In: J. Pruvost (ed.) Les dictionnaires de la langue française et l’informatique. Actes du colloque la Journée des dictionnaires. Cergy-Pontoise: Centre de Recherche Texte-Histoire, 55–64.
Hamon, T. and Nazarenko, A. (2001) Detection of synonymy links between terms. In: D. Bourigault, C. Jacquemin and M.-C. L’Homme (eds) Recent Advances in Computational Terminology. Amsterdam/Philadelphia: John Benjamins, 185–208.
Harley, A. (2000) Cambridge dictionaries online. In: U. Heid, S. Evert, E. Lehmann and C. Rohrer (eds), 85–8.
Hartmann, R. R. K. (1999) Thematic report 2. Case study: The Exeter University survey of dictionary use. In: R. R. K. Hartmann (ed.) Dictionaries in Language Learning. Recommendations, National Reports and Thematic Reports from the TNP Sub-Project 9: Dictionaries. Berlin: Freie Universität Berlin, 36–52.
Hausmann, F. J., Reichmann, O., Wiegand, H. E. and Zgusta, L. (eds) (1989–91) Wörterbücher/Dictionaries/Dictionnaires, Ein internationales Handbuch zur Lexikographie/An International Encyclopedia of Lexicography/Encyclopédie internationale de lexicographie, Vols 1–3. Berlin: Walter de Gruyter.
Heid, U., Evert, S., Lehmann, E. and Rohrer, C. (eds) (2000) Proceedings of the Ninth Euralex International Congress, EURALEX 2000, Stuttgart, Germany, 8–12 August 2000. Universität Stuttgart: Institut für Maschinelle Sprachverarbeitung.
Ide, K. (1993) A catalogue of electronic dictionaries. Language 22/5, 42–9.
Jacquet-Pfau, C. (2002) Les dictionnaires du français sur cédérom. International Journal of Lexicography 15/1, 89–104.
Jousse, A.-L., L’Homme, M.-C., Leroyer, P. and Robichaud, B. (2011) Presenting collocates in a dictionary of computing and the internet according to user needs. In: I. Boguslavsky and L. Wanner (eds) Proceedings of the 5th International Conference on Meaning-Text Theory. Barcelona, 8–9 September 2011, 134–44.
Kaalep, H.-J. and Mikk, J. (2008) Creating specialised dictionaries for foreign language learners: a case study. International Journal of Lexicography 21/4, 369–94.
Kay, M. (1984) The dictionary server. In: Proceedings of the Tenth International Conference on Computational Linguistics (COLING-84). Stanford, 2–6 July 1984, 461.
Knowles, F. E. (1990) The computer in lexicography. In: F. J. Hausmann, O. Reichmann, H. E. Wiegand and L. Zgusta (eds), Vol. 2, 1645–72.
Kussmaul, P. (1995) Training the Translator. Amsterdam/Philadelphia: John Benjamins.
L’Homme, M.-C. (2008) Le DiCoInfo. Méthodologie pour une nouvelle génération de dictionnaires spécialisés. Traduire 217, 78–103.
Lehr, A. (1996) Electronic dictionaries. Lexicographica 12, 310–7.
Lew, R. (2011) Online dictionaries of English. In: P. A. Fuertes-Olivera and H. Bergenholtz (eds) e-Lexicography. The Internet, Digital Initiatives and Lexicography. London: Continuum, 230–50.
López-Rodríguez, C. I., Faber, P. and Tercedor, M. (2006) Terminología basada en el conocimiento para la traducción y la divulgación médicas: el caso de Oncoterm. Panacea VII (24), 228–40.
Mackintosh, K. (1998) An empirical study of dictionary use in L2-L1 translation. In: B. T. S. Atkins (ed.), 123–49.
McCreary, D. R. and Dolezal, F. (1999) A study of dictionary use by ESL students in an American university. International Journal of Lexicography 12/2, 107–46.
Mel’čuk, I. A., Clas, A. and Polguère, A. (1995) Introduction à la lexicologie explicative et combinatoire. Brussels: Duculot.
Meyer, I. (1988) The general bilingual dictionary as a working tool in thème. Meta 33/3, 368–76.
Miller, G. A. (1998a) Foreword by George A. Miller. In: C. Fellbaum (ed.) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, xv–xxii.
— (1998b) Nouns in WordNet. In: C. Fellbaum (ed.) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, 23–46.
Montero, S. and Faber, P. (2008) Terminología para traductores e intérpretes. Granada: Tragacanto.
Nesi, H. (1998) Dictionaries on Computer: How Different Markets Have Created Different Products. University of Warwick.
— (1999) A user’s guide to electronic dictionaries for language learners. International Journal of Lexicography 12/1, 55–66.
— (2000a) The Use and Abuse of EFL Dictionaries. How Learners of English as a Foreign Language Read and Interpret Dictionary Entries (Lexicographica. Series Maior 98). Tübingen: Max Niemeyer.
— (2000b) Electronic dictionaries in second language vocabulary comprehension and acquisition. In: U. Heid, S. Evert, E. Lehmann and C. Rohrer (eds), 839–47.
Nesi, H. and Haill, R. (2002) A study of dictionary use by international students at a British university. International Journal of Lexicography 15/4, 277–305.
Pastor, V. and Alcina, A. (2009) Search techniques in corpora for the training of translators. In: I. Ilisei, V. Pekar and S. Bernardini (eds) International Workshop on Natural Language Processing Methods and Corpora in Translation, Lexicography and Language Learning. Borovets, Bulgaria, 17 September 2009, 13–20.
— (2010) Search techniques in electronic dictionaries: a classification for translators. International Journal of Lexicography 23/3, 307–54.
— (2011) Acceso a la información terminológica en Internet: técnicas para traductores. In: S. Maruenda-Bataller and B. Clavel-Arroitia (eds) Multiple Voices in Academic and Professional Discourse. Current Issues in Specialised Language Research, Teaching and New Technologies. Newcastle: Cambridge Scholars Publishing, 243–56.
Poirier, C. (1989) Les différents supports du dictionnaire: livre, microfiche, dictionnaire électronique. In: F. J. Hausmann, O. Reichmann, H. E. Wiegand and L. Zgusta (eds), Vol. 1, 322–7.
Prieto-Velasco, J. A. and López-Rodríguez, C. I. (2009) Managing graphic information in terminological knowledge bases. Terminology 15/2, 179–213.
Rizo, A. and Valera, S. (2000) Lexicografía bilingüe: el español y la lengua inglesa. In: I. Ahumada (ed.) Cinco siglos de lexicografía del español. IV Seminario de Lexicografía Hispánica. Jaén, 17–19 November 1999. Publicaciones de la Universidad de Jaén, 341–80.
Roberts, R. P. (1990) Translation and the bilingual dictionary. Meta 35/1, 74–81.
Roberts, R. P. and Langlois, L. (2001) L’apport de l’informatique à la recherche lexicographique. Meta 46/4, 711–20.
Robinson, D. (2003) An Introduction to the Theory and Practice of Translation. London: Routledge.
Ruppenhofer, J., Ellsworth, M., Petruck, M. R. L., Christopher, R. J. and Scheffczyk, J. (2006) FrameNet II: Extended Theory and Practice; available online at .
Sallas, M. (2001) La recerca d’informació i de documentació en terminologia. In: M. T. Cabré, L. Codina and R. Estopà (eds) Terminologia i Documentació. Barcelona: IULA, 107–20.
Sánchez, M. d. M. (2004a) Estudio experimental sobre el uso del diccionario como herramienta para el traductor: hacia una descripción de necesidades. In: E. Ortega (ed.) Panorama actual de la investigación en Traducción e Interpretación. Granada: Atrio, 477–86.
— (2004b) El uso de los diccionarios electrónicos y otros recursos de Internet como herramienta para la formación del traductor inglés-español. PhD Thesis, Universitat Jaume I.
Santana, O., Hernández, Z., Pérez, J., Rodríguez, G. and Carreras, F. (1996) Diccionarios en soportes informáticos. Cuadernos Cervantes 11, 68–77.
Sharpe, P. (1995) Electronic dictionaries with particular reference to the design of an electronic bilingual dictionary for English-speaking learners of Japanese. International Journal of Lexicography 8/1, 39–54.
Sobkowiak, W. (1999) Pronunciation in EFL Machine-Readable Dictionaries. Poznań: Motivex.
Tercedor, M. and López-Rodríguez, C. I. (2008) Integrating corpus data in dynamic knowledge bases. Terminology 14/2, 159–82.
Tomaszczyk, J. (1979) Dictionaries: users and uses. Glottodidactica 12, 103–19.
Tono, Y. (1989) Can a dictionary help one read better? In: G. James (ed.) Lexicographers and Their Works (Exeter Linguistic Studies 14). Exeter: University of Exeter Press, 192–200.
Varantola, K. (1998) Translators and their use of dictionaries. In: B. T. S. Atkins (ed.), 179–92.
Verlinde, S., Leroyer, P. and Binon, J. (2010) Search and you will find. From stand-alone lexicographic tools to user driven task and problem-oriented multifunctional leximats. International Journal of Lexicography 23/1, 1–17.
Vincze, O., Mosqueira, E. and Alonso, M. (2011) An online collocation dictionary of Spanish. In: I. Boguslavsky and L. Wanner (eds) Proceedings of the 5th International Conference on Meaning-Text Theory. Barcelona, 8–9 September 2011, 275–86.
4.3 Researching Historical Lexicography and Etymology
John Considine
Chapter Overview
Historical Lexicography
Etymological Lexicography
Conclusion
Historical and etymological lexicography are abnormal in two respects. First, they are never primarily concerned with the present. An entry in a historical or etymological dictionary may begin or end with information about the present, but it can never leave the past out of account, and it may be exclusively concerned with the past. Second, although entries in historical and etymological dictionaries usually include definitions or translation equivalents, these are never of primary importance, and they are occasionally dispensed with altogether (cf. Silva 2000: 90). The research undertaken by historical lexicographers and etymologists is therefore markedly different from other kinds of lexicographical research, and some etymological work is not even oriented towards publication in dictionary form.
1 Historical Lexicography
1.1 The Weak and Strong Senses of ‘Historical’
The phrase historical lexicography can be used in a weak sense or a strong one. In its weak sense, it can refer to any lexicography with a diachronic dimension.
All monolingual Shakespeare dictionaries and glossaries are historical in this sense: they provide lexical items from a subset of early modern English with equivalents or explanations in a more recent variety of English. A dictionary which gives a date for the first attestation of a word, or for the first attestation of each sense of a word, and which registers obsolete words and senses, is likewise historical in this weak sense: Cannon (1996), to which we shall return at Section 2.1 below, is an example, as is the Concise Scots Dictionary of 1985. A characteristic entry in the latter reads (slightly abridged) ‘outlie &c [ˈutˊlaI] n 1 an outlying piece of ground la20-, NE. 2 money put out on loan or on mortgage 19-e20’. Here, the first sense of the word is identified as in recorded use in north-eastern Scotland from the late twentieth century onwards, and the second as in use in the nineteenth and early twentieth centuries. No examples are given, and the definitions are the most important element in the entry. A more richly historical dictionary is the excellent Dictionnaire historique de la langue française edited by Alain Rey. Its entry for cab begins CAB noun masc. is borrowed (1848) from the English cab (1827) . . . Cab originally refers to a carriage with two wheels drawn by one horse, then in 1834 this type of carriage with the driver at the rear . . . the [French] word is also used, as in England, for four-wheeled carriages with the driver at the front. The cab was elegant and fashionable in the second half of the 19th century . . . The form hansom cab is used with an extra quality of snobisme (Proust). Today the word is confined to historical usage.1 This entry combines lexical information about how the French word cab has been used with encyclopedic information about the kind of carriage it designates. No source is quoted, although the use of hansom cab in Proust is commented on; in fact, it is remembered as one of Mme. 
Swann’s short-lived fashionable gestures – ‘beaucoup d’années auparavant elle avait eu son “hansom cab”’ – and knowing this would have made it possible to place the snobisme to which the entry refers more precisely. The most extensive dictionary of French, the sixteen-volume Trésor de la langue française, is at one level very shallowly historical, covering the period 1789–1960, but its entry structure includes a section in which the first known attestations of the major senses of a given word are briefly identified and dated.

A dictionary which is, in the strong sense, edited on historical principles does more than simply provide historical information. It is founded on attestations of the actual usage of each word; as we shall see, these are usually from printed sources. The attestations are presented in every entry. An entry for a hapax legomenon, a word attested once and once only in all the relevant texts which are available to the lexicographer, quotes the text of that single attestation, giving the author, title, and often the date of the source, and as much
further information (for instance page, chapter or line numbers) as will allow the reader to locate the quotation. An entry for a word which is attested more than once includes a quotation paragraph which begins with the earliest available attestation and then presents more, chronologically ordered, in a series ending with the latest attestation known to the lexicographer, or with one whose date is reasonably close to the upper chronological limit of the dictionary. In the case of a polysemous word, each sense (and, if appropriate, each subsense) is provided with such a quotation paragraph. The senses are arranged in the entry in an order which reflects their chronological development; there are several possible ways of doing this (see Silva 2000: 90–3). Although each quotation paragraph is likely to be accompanied by a definition, the quotations are not ancillary to the definition – if anything, the reverse is true. As a historical lexicographer has recently put it,

It should go without saying that meaning ordinarily ranges continuously. Thus, division into defined senses is intended to set out the range of uses – illustrated with the clearest examples to be found – and not to imply their discreteness or that a use in context cannot reflect more than one of the senses as defined. (Ashdowne 2010: 209; cf. Silva 2000: 89–90)

As well as the quotation paragraphs and definitions which are at the heart of the entry, a dictionary on historical principles is likely to offer at least some etymological information, and may also offer further information, for instance about morphology, pronunciation, variations in spelling, sociolinguistic factors and more. The source of the historical information in the entry for outlie in the Concise Scots Dictionary is the Scottish National Dictionary, a dictionary of post-1700 Scots on historical principles. This begins (slightly abridged)

OUTLIE, n., v. Also outly(e), ootlie, -lye. [‘utlAe]
1. An outlying piece of ground (Abd., Kcd. 1964).
ne.Sc. 1958 People’s Jnl. (6 Sept.): I was plooin’ an oot-lye, a gey bit frae the farm.
‡2. Money lying out at loan or on mortgage (Sc. 1808 Jam.).
Per. 1881 R. Ford Hum. Readings 62: Wi’ mair interest for the outlie o’t.
Abd. 1929 J. Alexander Mains & Hilly 56: Interest for the ootlie o’ their siller.
A distinctive feature of this dictionary is its interest in the regional distribution of words: its title-page identifies it as compiled ‘partly on regional lines and partly on historical principles’. This accounts for the abbreviations: Ab[er]d[een], K[in]c[ar]d[ine] and so on. Distinctive too is the heavy dependence on an earlier dictionary, Jamieson’s of 1808 and 1825.2 But the vital role which
accurately referenced quotation evidence plays in the Scottish National Dictionary is clear, and characteristic of all lexicography on historical principles. ‘The quotations’, in the opinion of James Murray (paraphrased in Murray 1977: 274), were ‘the essence’ of the New English Dictionary on Historical Principles which he edited, now known as the Oxford English Dictionary (OED) and in many ways the pre-eminent historical dictionary in the world.
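The entry structure described above can be sketched as a small data model. This is only an illustration, not the schema of any real dictionary: the class and method names (`Quotation`, `Sense`, `Entry`, `senses_by_first_attestation`) are hypothetical, and ordering senses by their earliest attestation is just one of the several possible orderings the text mentions. The data are the three quotations from the outlie entry quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class Quotation:
    year: int
    source: str
    text: str


@dataclass
class Sense:
    definition: str
    quotations: list = field(default_factory=list)

    def quotation_paragraph(self):
        # Earliest attestation first, latest attestation last.
        return sorted(self.quotations, key=lambda q: q.year)

    def first_attested(self):
        # Assumes the sense has at least one quotation.
        return min(q.year for q in self.quotations)


@dataclass
class Entry:
    headword: str
    senses: list = field(default_factory=list)

    def senses_by_first_attestation(self):
        # Assumption: one possible chronological ordering among several.
        return sorted(self.senses, key=lambda s: s.first_attested())

    def is_hapax(self):
        # A hapax legomenon has exactly one attestation in total.
        return sum(len(s.quotations) for s in self.senses) == 1


outlie = Entry("outlie", [
    Sense("an outlying piece of ground", [
        Quotation(1958, "People's Jnl. (6 Sept.)",
                  "I was plooin' an oot-lye, a gey bit frae the farm."),
    ]),
    Sense("money lying out at loan or on mortgage", [
        Quotation(1929, "J. Alexander Mains & Hilly 56",
                  "Interest for the ootlie o' their siller."),
        Quotation(1881, "R. Ford Hum. Readings 62",
                  "Wi' mair interest for the outlie o't."),
    ]),
])

for sense in outlie.senses_by_first_attestation():
    print(sense.first_attested(), sense.definition)
    for q in sense.quotation_paragraph():
        print("   ", q.year, q.source)
```

Under this ordering the 'money' sense (first quoted from 1881) would precede the 'ground' sense (first quoted from 1958), which is not the order printed in the Scottish National Dictionary entry; the difference is a reminder that chronological arrangement is an editorial decision rather than a mechanical sort.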
1.2 Lexicography on Historical Principles: Different Kinds of Evidence

The analysis of quotations is, then, the fundamental task of the historical lexicographer, with the collection of quotations as a laborious prerequisite. These tasks have both evolved over the years, not least as a result of changing technologies, and a way to start looking at them is to turn to the very first dictionary which was edited on explicitly historical principles, Franz Passow’s revision of Johann Gottlob Schneider’s Handwörterbuch der griechischen Sprache, published in 1819.3 This dictionary dealt with the language of a limited, stable body of texts. Most (though not all) of the extant texts in ancient Greek were available to Passow in printed editions, and at least relative dates could be assigned to them. Much pioneering work had been done on the excerption of these texts to illustrate the range of senses of individual words (cf. Zgusta 2006: 30–1). Passow’s task was to re-examine the attestations of each word, to place them in an order which would show the chronological development of the word – he referred metaphorically to its ‘life-story’ – and to supplement them from, in the first instance, the oldest texts available to him, namely the poems of Homer and Hesiod, which form a canon of fewer than 250,000 tokens (Thesaurus Linguae Graecae, ‘canon search’ results). These poems were familiar to him, and there was an ample verbal index to them (Seber 1780). Had Passow lived long enough to work his way through all the ancient Greek available to him, he would have found himself dealing with a much less wieldy corpus. In fact, the discoveries of the last two centuries make a corpus of about 105 million words of classical and Byzantine Greek available to students of the language today.
Although this corpus is fairly stable inasmuch as nobody is likely to discover another 10 million words of ancient Greek, and although it is machine-readable in the form of the Thesaurus Linguae Graecae, it is large enough to be very difficult to handle lexicographically. So, for example, the stem lexikograph- occurs three times in the corpus, the earliest attestation being in Nymphis, a historian no later than the third century BC (Pantelia 2000: 7). Only the latest of these three attestations is noticed in the best modern dictionary of ancient Greek, that of Liddell and Scott (a descendant of Passow’s work: see Zgusta 2006: 35–8). But it is one thing to note a single omission in a
dictionary of a language as well-attested as ancient Greek, and another to fix all of the omissions.4

Other dead languages offer different challenges to the historical lexicographer. Some are known from scantier evidence than that for ancient Greek. The corpus of classical Latin to 200 AD is about seven and a half million words, and future additions to it (for instance from the discovery of documents such as those excavated at Vindolanda) are likely to be minor.5 It was exhaustively concorded on paper slips for the Thesaurus linguae latinae, a dictionary of Latin on historical principles: there is a slip for every occurrence of the conjunction et ‘and’ in every Latin text from before 200 AD (Schröder 2012: 295). The corpus of Old English is about 3 million words, and future additions to it are likely to be tiny. The Dictionary of Old English project currently under way at Toronto is therefore based, like the Thesaurus linguae latinae, on an exhaustive concordancing of the whole corpus of the language. The Oudnederlands Woordenboek is based on a very much smaller corpus, the 68,000 words (including many toponyms) now regarded as Old Dutch (Louwen 2008: 220). The sparser the evidence for a dead language is, the harder it is to reach firm conclusions about its surviving vocabulary, and the historical lexicography of such languages may have a provisional quality, as did Anna Morpurgo’s Mycenaeae Graecitatis Lexicon of 1963, a pioneering dictionary of Mycenaean Greek which was being retrieved from inscriptions in Linear B (for it, see Jones 1966). It may even have a self-consciously controversial quality: Erica Reiner, the chief editor of the Chicago Assyrian Dictionary, once said of it that ‘This is not a bland dictionary. . . . We stick out our necks, and then somebody comes along ten years later and corrects the guess’ (Shenker 1978: 3).
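What exhaustive concordancing of a closed corpus amounts to can be shown with a toy example. The sketch below is an assumption-laden illustration and not the procedure of the Thesaurus linguae latinae or the Dictionary of Old English: the function name `build_concordance`, the two-word context window and the crude tokenization are all hypothetical choices, and the two short Latin passages merely stand in for a corpus. It creates one 'slip' for every occurrence of every word form, including each occurrence of et.

```python
import re
from collections import defaultdict


def build_concordance(texts, window=2):
    """Map each word form to one 'slip' per occurrence in the corpus."""
    slips = defaultdict(list)
    for ref, text in texts.items():
        # Crude tokenization: lower-cased runs of word characters.
        tokens = re.findall(r"\w+", text.lower())
        for i, token in enumerate(tokens):
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            # Each slip records the source, the position and some context.
            slips[token].append((ref, i, left, right))
    return slips


# Illustrative sample only: two short, well-known passages.
texts = {
    "Catullus 85": "Odi et amo. Quare id faciam, fortasse requiris. "
                   "Nescio, sed fieri sentio et excrucior.",
    "Virgil, Aeneid 1.1": "Arma virumque cano, Troiae qui primus ab oris",
}

slips = build_concordance(texts)
for ref, pos, left, right in slips["et"]:
    print(f"{ref} [{pos}]: {left} [et] {right}")
```

Because every token yields exactly one slip, the number of slips equals the size of the corpus in words, which is why this approach is practicable for a corpus of a few million words and not for one that is large and ill-defined.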
When a historical dictionary is compiled on the basis of a large and ill-defined body of evidence, the selection of quotation material becomes a pressing concern. Such a body of evidence may be from an obsolete language variety. For instance, no new texts are being written in early modern English, but although there is a more or less complete inventory of surviving early modern English printed books, their texts are not all available in machine-readable format, and there is a very large body of manuscript material, much of it unedited. Work is in progress on a corpus of approximately a thousand million words of Latin (see Bamman and Smith 2012), of which 99 per cent must be post-classical, and this represents a fraction of the extant texts. It is to the point that attempts to compile a dictionary of early modern English have been unsuccessful (see Adams 2010), and no attempt has ever been made to compile a universal dictionary of post-classical Latin. The first dictionary to be compiled on consciously historical principles after that of Passow, the Deutsches Wörterbuch of Jacob and Wilhelm Grimm (first fascicle 1852), was really of pre-contemporary German, roughly from Luther to Goethe, so it was a dictionary based on materials selected from a closed but indeterminately large body of evidence.
Two of the nineteenth-century dictionary projects which followed the inception of the Deutsches Wörterbuch were even more audacious. Both the New English Dictionary / OED (first fascicle 1884) and, at least from 1892, the Dutch Woordenboek der Nederlandsche Taal (first fascicle 1863, but only on fully historical principles from 1892: see Eickmans 2012: 272) aimed to document a living language variety from a rationally chosen starting point to the present day. Not only did each project require editorial engagement with materials selected from a body of evidence which was growing daily – hence the justly famous image of F. J. Furnivall cutting useful quotations from his newspaper every morning to add to the New English Dictionary files (Murray 1977: 179–80, and cf. 183) – but each documented a language which had developed a wide and well-recorded variety of specialized terminologies, not least scientific and technological. The questions of selection and of the treatment of terminology affect nearly all dictionaries on historical principles in which living languages are treated, and they are both problematic.
1.3 Some Problems for Historical Lexicographers

The problem of selection is that some kinds of source are more readily available than others. Most obviously, verbatim recordings of informal speech are hard to come by before the twentieth century (texts presented as such may often be affected by scribal intervention), and informal and non-élite written usage are likewise increasingly hard to come by as the lexicographer searches backwards in time. The most readily available sources tend to be those which have canonical literary status. So, for instance, when the early editors of the OED made Walter Scott’s Lady of the Lake of 1810 their first quotation for the attributive use of the noun soldier, even though a poem published by Anne Bannerman in the Edinburgh Magazine of 1800 would have antedated the quotation (and may have inspired Scott: see Brewer 2009: 217–18), it was doubtless because The Lady of the Lake was very widely available to, and read by, the readers who provided them with much of their quotation evidence, while old volumes of the Edinburgh Magazine had a more restricted currency.6 Other historical dictionaries show a similarly heavy use of canonical literary sources for the same reason. The historical lexicography of regional varieties of English has necessarily drawn on the non-élite sources in which such varieties are most strongly marked, including popular print and records of oral usage. The latter were neatly integrated with historical printed texts in the Dictionary of Jamaican English in 1967, and have been used in other dictionaries on historical principles such as the Dictionary of Newfoundland English of 1982. Harry Orsman’s Dictionary of New Zealand English integrated its compiler’s recollections of oral usage in past decades with his printed sources. The third edition of the OED
does not draw directly on oral sources; nor did previous editions, although Murray allowed himself to supply lacunae in the quotation evidence for current usage with quotations which he made up himself and labelled ‘Mod[ern]’ (see Murray 1977: 200–1). And whereas the OED now draws material from a very wide selection of print sources, especially those whose vocabulary is, in effect, indexed by being made available in online databases and archives, its editors are cautious of using data from online-only sources, just as they have always been cautious of using data from unpublished manuscripts. Where an online-only source provides a valuable quotation, for instance an antedating of all printed sources, it is printed out so that a hard copy can be archived. Scientific and technological vocabulary present one superficial problem for historical lexicography, and one deep one. The superficial problem is that the process of extracting lexical items from specialized scientific texts and writing definitions of them which are both exact and accessible to as wide a readership as possible calls for lexicographers with a special range of competence. But they can be found: the OED appointed its first science editor in 1968 (Brewer 2007: 200–3 and 295 n 84). The deeper problem is that, whereas in principle historical dictionaries proceed by induction from quotations to definition, this is impracticable in the case of terminology: the changing senses of geek can be induced from quotation evidence, but writing a definition of germanium calls for an understanding of the structure and uses of germanium which is hardly to be extracted from quotation evidence, unless the quotations are themselves entries in a scientific dictionary or encyclopedia. No historical dictionary registers the whole of a highly developed scientific terminology, or even as high a proportion of one as the fullest synchronic dictionaries, but they must all engage to some extent with this problem. 
It is, in fact, the problem of encyclopedic reference: as soon as the lexicographer is called on to explain the name of a plant, or of a disease of sheep, or of a kind of fishing-boat, she can hardly avoid resorting to encyclopedic sources of information rather than pure induction from the quotations. Dictionaries which register the history of a specialized vocabulary such as that of medicine (see e.g. Norri 2010, esp. 73) are, it might be argued, hybrids of the historical encyclopedia and the historical dictionary. But no lexicography on historical principles is ever absolutely pure. Other elements of the historical dictionary entry may be derived from the quotations. This is the case with information about grammatical function, collocates, geographical distribution and register. Dictionaries of language varieties in which spelling is unstable may present a list of all the spelling variants of a given word, drawing on quotations which are not printed in the entry as well as on all those which are. Compiling these lists is dull work, but they have always been useful for verifying whether a given form can represent a given word (a reader who had encountered the form mamont in a text and suspected that it was a variant of mammoth would indeed find it in the forms list s.v. mammoth in
the printed OED), and they are of redoubled use when the full text of a dictionary is electronically searchable (a reader who encounters the form occyccion in a text can search the online text of OED for it and will find it in the variant forms list s.v. oxycroceum). Historical dictionaries may comment ad hoc on pronunciation as far as it can be ascertained from metalinguistic comments in quotation evidence or from metrical verse. They have not traditionally provided reconstructed pronunciations, although the third edition of OED records the pronunciation of a given word which is indicated in previous editions if this differs from the pronunciations now regarded as normal. As for current pronunciations, OED takes these from the synchronic Oxford Dictionary of Pronunciation.7 No other element of the dictionary is more remote from historical principles. The hybrid quality of the Trésor de la langue française is to its advantage in this respect, for it can indicate pronunciation in the fundamentally synchronic portion of an entry, while omitting pronunciation information from the historical portion. Etymologies are usually presented in historical dictionaries, though a historical dictionary of a regional variety of a given language may refer whenever possible to the etymologies in a more general historical dictionary: so, for instance, the Dictionary of Newfoundland English notes tersely that sadogue ‘bread, cake’ is from Irish sodóg ‘cake’, but gives no etymological information at all s.v. salmon. The Middle English Dictionary gives minimal etymologies: that for pouche, for instance, reads ‘OF; cp. CF poche, pouche; AF puche’. In some revised OED entries, the etymology is an elaborate essay rich in information about etyma and cognates (that for pouch runs to 228 words, but those for the noun man and the verb may to more than a thousand apiece); this was already true of some first-edition entries, as it had been of some entries in the Grimms’ Deutsches Wörterbuch. 
Passow’s original formulation of the historical principles of lexicography was in part a rejection of a view of language change in which etymological speculation played a central part (see Zgusta 2006: 27). It is therefore in keeping with his principles that the third edition of the OED has rejected the first edition’s use of asterisked reconstructed forms in etymologies: for instance, the noun mind is now derived ultimately from ‘the Indo-European base of a preterite-present verb for “to think, remember, intend”’ rather than from ‘the root *men-, man-, mun- . . . to think, remember, intend’. This brings us back to the fundamental point that lexicography on historical principles is by definition based on historical evidence.
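The difference that electronic searchability makes to variant forms lists, noted earlier in this section, can be sketched as an inverted index from spelling to headword. The sketch is hypothetical throughout: `forms_lists`, `build_forms_index` and `headwords_for` are illustrative names, not features of the OED’s own search software, and the data include only the two variant forms mentioned in the text, whereas real forms lists are far longer.

```python
from collections import defaultdict

# Only the two variants cited in the text are included here.
forms_lists = {
    "mammoth": ["mammoth", "mamont"],
    "oxycroceum": ["oxycroceum", "occyccion"],
}


def build_forms_index(forms_lists):
    """Invert headword -> forms into form -> set of headwords."""
    index = defaultdict(set)
    for headword, forms in forms_lists.items():
        for form in forms:
            index[form.lower()].add(headword)
    return index


def headwords_for(form, index):
    # Return every headword whose forms list contains the spelling.
    return sorted(index.get(form.lower(), ()))


index = build_forms_index(forms_lists)
print(headwords_for("mamont", index))
print(headwords_for("occyccion", index))
print(headwords_for("unattested", index))
```

In a printed dictionary the reader must already suspect that mamont belongs under mammoth in order to check the forms list there; with an index of this kind the lookup runs from the spelling to the headword, which is what allows a form as opaque as occyccion to be traced.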
2 Etymological Lexicography

For our purposes, all etymological research can be placed at some point on each of three conceptual axes.8 The first two have to do with the form in which
the research is disseminated. First, any number of etymologies may be presented in a given publication. A study of the etymology of a solitary word is a matter of lexicology rather than lexicography, but as we shall see, the dividing line between a collection of etymologies and an etymological dictionary is not always clear. Second, the results of etymological research may be presented more or less dogmatically: an entry in an etymological dictionary can make a single unsourced statement as though reporting unquestioned fact, or it can review the previous studies of an etymological problem in more or less depth, expressing a more or less marked preference for the conclusions of one of those studies, or for new conclusions. Turning from form to aims, we may conceive a third axis, extending from the etymological search for origins to etymological inquiry into development.
2.1 The Number of Etymologies Presented

How many headwords should an etymological dictionary present? One possible answer would be ‘as many as a general-purpose synchronic dictionary’, but in the cases of English and its relatives, there is good reason for a high proportion of the words registered in a general-purpose dictionary to be excluded from an etymological one. Words like fireside and unhappy are, after all, such transparent formations from familiar English elements that there would seem to be little or no point to including them in an etymological dictionary, and the purchaser of such a dictionary in printed form might well feel that they increased the bulk and cost of the book pointlessly. Of course, a form which appears to be as etymologically banal as unhappy may in fact speak to a form in another language: to what extent has unreal been influenced by Latin irrealis, at least in writings on grammar and philosophy? Should the use of uncanny as a conventional translation equivalent for German unheimlich in and following the writings of Freud be a matter for etymological comment? These are examples of a kind of question to which we shall return in Section 2.3 below. Most etymological dictionaries exclude transparent compounded and affixed forms, but include a wide range of loanwords and all the common native words of the language. It would be possible for them also to exclude loanwords of an obvious kind: since general-purpose dictionaries already offer terse etymologies of the sort which identifies macaroni as Italian in origin or hexagon as Greek, there would be a case for restricting the coverage of an etymological dictionary to words about which there is more to be said.
But in practice, all but the most sophisticated dictionary users would be puzzled and dissatisfied by an etymological dictionary from which many common words were excluded – moreover (and here we look forward to Section 2.3 again), a word whose ultimate origin is clear, like hexagon, may conceivably have entered English by any one of several
routes, and a word whose proximate origin is clear, like macaroni, may present interesting problems in its remoter ancestry. Etymological dictionaries which are more selective than the norm may confine themselves to words which share one language of origin: for instance, Cannon (1996) is a dictionary of Japanese loanwords in English, and Corriente (2008) addresses the more important topic of Arabic loanwords in Spanish and the other Ibero-Romance languages (other examples are in Malkiel 1993: 90). Corriente’s work, like Manfred Görlach’s Dictionary of European Anglicisms, is an example of the polyglot etymological dictionary, a genre which surveys the outcomes in several languages of words from a single language. Whereas Corriente’s project could be undertaken by a single learned scholar, since expertise in one Ibero-Romance language leads naturally to access to the others, Görlach’s, which brings together data from Germanic, Romance and Slavic languages, and from Finnish, Hungarian, Albanian and Greek, had to be the product of teamwork. This, by the way, raises the question of the breadth of linguistic competence necessary for a good etymologist of English. The great etymologist Anatoly Liberman writes (2010: xviii) that

The works gathered in this bibliography are in English, German, Dutch, Frisian, five Scandinavian languages (Swedish, Norwegian, Danish, Icelandic, and Faroese), French, Italian, Spanish, Rumanian, eight Slavic languages (Russian, Polish, Ukrainian, Czech, Bulgarian, Slovenian, and Serbian / Croatian), Latvian, Lithuanian, Finnish, Hungarian, Japanese, and Latin. For reading works in the Germanic, Romance, and Slavic languages I did not need help (in the Germanic group, the only exception is Yiddish), but this is where my expertise comes to an end. If my mastery of Finnish, Hungarian, Irish, Welsh, and Japanese were at a respectable level, I am sure that I would have discovered many contributions of which I remained unaware.
A series of etymological studies of selected words may be presented as a volume of essays, like David L. Gold’s Studies in Etymology and Etiology, or even as part of a monograph, as in the last chapter of W. B. Lockwood’s Informal Introduction to English Etymology, but if they share a well-defined microstructure, they may also be presented as a dictionary, as has been done by Liberman in the pilot volume of his Analytic Dictionary of English Etymology.
2.2 Dogmatic and Analytic Etymological Dictionaries

A feature of Liberman’s dictionary project which sets it apart from all other etymological dictionaries of English is the explicit and sustained critical attention to previous etymological work which is flagged by the word analytic in its title.
Of course, scholarly etymological dictionaries of English before his had been grounded in their makers’ study of primary and secondary evidence. They sought out occurrences of forms in English and other languages which might shed light on the etymology of a given English word – early attestations, for instance, of English mac(c)aroni, Italian mac(c)aroni, maccherone and so on – and critical discussions of the word histories implied by these attestations, such as the exchange on macaroon and macaroni published in Notes and Queries in 1871–2 (for which see Liberman 2010: 681). In the case of W. W. Skeat, the maker of the first modern etymological dictionary of English (see Liberman 1998: 42 and Malkiel 1993: 31–2), this search process was originally limited by an explicit personal resolution to 3 hours of study per entry (Skeat 1879–82 / 1910: xiv). In that of later works such as the Oxford Dictionary of English Etymology and indeed the fourth edition of Skeat, difficult cases doubtless had more hours of work allotted to them. But in both cases, and in those of other scholarly etymological dictionaries of English published in the twentieth century, identifications of the predecessors who had given arguments were generally suppressed. Arguments which were rejected would be noted very briefly or passed over in silence, and those which were accepted were presented without comment. A result of this policy was that English had, and has, no multi-volume etymological dictionaries such as were produced in the twentieth century for Spanish and Catalan by Juan Corominas (Joan Coromines) and for Italian by Manlio Cortelazzo and Paolo Zolli, and, at the extreme of scholarly elaboration, for French, in the form of the wonderful 25-volume Französisches etymologisches Wörterbuch founded by Walther von Wartburg.9 The references to secondary sources in etymological dictionaries which do present them vary in exhaustiveness. 
Those of the German etymological dictionary of Kluge, now in its twenty-fourth edition (1883 / 2002), are presented in abbreviated form at the end of the entry.10 Those in scholarly etymological dictionaries like Winfred Lehmann’s of the extinct Gothic language, are presented more fully, as part of the text of the entry.11 Those in Liberman’s Analytic Dictionary are gathered from a bibliographical survey which approaches exhaustiveness and presented discursively: that for adze (the first one presented in Liberman 2009) runs to five columns. This is a procedure as far removed as can be imagined from the dogmatic English-language tradition. The price which is still being paid for the excellence of Liberman’s work, quite apart from the untold thousands of hours which have been spent on the gathering and analysis of material, is slow publication. At the time of writing, the University of Minnesota Press has published two volumes of the dictionary: one which presents fifty-five specimen entries, and one, of nearly a thousand pages, which presents the bibliography (Liberman 2009 and 2010). Both have been supported by private benefactors (Liberman 2010: xxii–xxiii).
2.3 Narratives of Origin or of Development

Etymological inquiry in the ancient world was typically concerned with the origins of words, and this is still true of some recent etymological lexicography. So, the unsatisfactory etymological dictionary of Eric Partridge, better known for his slang lexicography, was actually called Origins (Liberman 1998: 50). At a higher level of sophistication, the brief etymologies in the widely circulated American Heritage Dictionary of the English Language refer where appropriate to an appendix of Indo-European roots compiled by the eminent philologist Calvert Watkins, with reference to the Indo-European etymological dictionary of Julius Pokorny, and (in the fourth edition of 2000 and the fifth of 2011) to a matching appendix of Semitic roots. By contrast, from the early stages of its planning onwards, the OED has always treated etymological statements about the history of a series of forms outside English as integrated with the story of the development of senses within English. This principle is very clearly expressed in the Oxford Guide to Etymology by Philip Durkin, the Principal Etymologist of the OED, which states at its outset that etymology ‘is the investigation of word histories’, or ‘the whole endeavour of attempting to provide a coherent account of a word’s history (or pre-history)’ (Durkin 2009: 1, 2). The parentheses around ‘or pre-history’ are eloquent: Durkin discusses reconstructed forms at a number of points (e.g. 14–19), and indeed quotes Watkins at length on the subject of reconstruction (253), but in his account, the study of sense-development within English is treated as continuous with the study of how given words first entered the language.
This makes it possible to attend effectively to cases where the stories of borrowing and development are intertwined, for instance where there is continuing influence of an etymon on a borrowed form (or reciprocal influence between the two), or where a single form may be traced to multiple borrowings from different languages (see Durkin 2009: 155–78). The principle that etymological lexicography can be so closely connected with historical lexicography that the two must really be practised together, and the fact that there is a major historical dictionary of English in which etymology and historical lexicography are indeed integrated, explain the present three-way partition of the etymological lexicography of English, into specialized single-volume etymological dictionaries with terse origin-driven entries for a wide range of words, large historical dictionaries in which etymology and sense-development are integrated for a much wider range of words, and dictionaries or lexicological collections in which fewer headwords than would be acceptable in a dictionary of the first class are treated in more detail than would be practicable in a dictionary of the second.
3 Conclusion

Liberman has more than once struck a melancholy note as he has commented on the story of the etymological lexicography of English (see e.g. Liberman 1998: 93–4). It is not easy to see how future general dictionaries of English etymology, successors to the work of Skeat, will be dramatically better than or indeed different from their predecessors. Liberman’s own analytical project is a different matter, though its scale makes it unlikely that it will have many imitators. The most exciting developments in the etymological lexicography of English are those associated with the OED project (see Durkin 1999, and the publications listed at Durkin 2009: 303–4). There are, however, prospects for the dramatic development of the historical lexicography of English, and although these are treated much more fully in Chapter 5 of the present volume, a very brief sketch belongs here as well. At least three possible lines of development can be discerned. First, the development of historical corpora cannot fail to have an effect on the development of historical lexicography, most obviously on the provision of frequency information (this is present in the Trésor de la langue française). Gender and class might become more central to a corpus-based historical lexicography of English than they have been to any previous work in the field. Second, there are wide and interesting gaps in the regional historical lexicography of English – for instance, a dictionary of West African English on historical principles would be most welcome, and perfectly feasible. The same may be said of author lexicography. There are many Shakespeare dictionaries in English, but the best of them, the Shakespeare Glossary of C. T. Onions (1911 / 1986), is a little book, incomparably more modest than the Goethe Wörterbuch (1978–) which perhaps represents the state of the art in historical author-lexicography.
Third, the integration of the online texts of major historical dictionaries is now being explored: The Middle English Dictionary and the Historical Thesaurus of the Oxford English Dictionary are now linked to the OED, and the Woordenboek der Nederlandsche Taal and the Oudnederlands Woordenboek are linked to three other historical dictionaries of Dutch and Frisian in the Geïntegreerde Taal-Bank of the Institute for Dutch Lexicography at Leiden. Linking is not really the same as integration, but it is a first step towards it. There are uncertainties about the future of historical lexicography, not least among them the question of the role of print. The uncertainty of funding, even after the publication of a dictionary has begun, is notorious: hence the suspension of publication of Jonathan Lighter’s splendid historical dictionary of American slang (see Winchester 2012: 26), and the heroic ongoing efforts of the editors of the Dictionary of Old English to obtain funding for each new volume. But the genre itself is alive with possibilities.
Notes
1. Dictionnaire historique de la langue française s.v. cab, ‘CAB n. m. est emprunté (1848) à l’anglais cab (1827) . . . Cab désigne à l’origine une voiture à deux roues tirée par un cheval, puis en 1834 ce type de voiture avec cocher à l’arrière . . . il s’est aussi employé, comme en Angleterre, pour des voitures à quatre roues avec le cocher à l’avant. Le cab était élégant et à la mode dans la seconde moitié du xixe s. . . . La forme hansom cab s’est employée avec un renforcement de snobisme (Proust). C’est aujourd’hui un mot d’historien.’
2. Jamieson’s work was a dictionary on historical principles avant la lettre, which presented well-referenced and chronologically ordered quotations from historical texts in the Scots language after the definitions.
3. For more on Passow and on the history of historical dictionaries in general, see Considine forthcoming.
4. The desirability of improving Liddell and Scott has been remarked upon on many occasions; for an interesting recent discussion, see Lee 2010. For a comparison between Liddell and Scott and the ongoing Diccionario Griego–Español, see Facal 1981.
5. Word count from Bamman and Smith 2012: 2 n 2; the figure of 10 million words given beside this includes post-200 material.
6. See also Brewer 2010 and 2012, and cf. Considine 2009.
7. See .
8. For a much more elaborate typology than this one, see Malkiel 1976.
9. For Corominas see Malkiel 1993: 140–2; for Cortelazzo and Zolli and their predecessors, ibid. 106–7; for the Französisches etymologisches Wörterbuch, ibid. 80–2, supplemented by the materials available online at (click on ‘Les grands projets’ and then ‘FEW’).
10. No etymological dictionary of English has gone through as many editions as Kluge – but some of those which make up the count of 24 are simply new printings rather than revised editions.
11. See Lehmann 1986: vi, for his treatment of references to secondary sources and that of Sigmund Feist, whose dictionary of 1939 he translated and revised to produce his own.
References
Dictionaries
American Heritage Dictionary of the English Language (1969 / 2011) 5th edition. Boston: Houghton Mifflin.
Cannon, G., with N. Warren (1996) The Japanese Contributions to the English Language: An Historical Dictionary. Wiesbaden: Harrassowitz.
Concise Scots Dictionary (1985) Editor-in-chief Mairi Robinson. Edinburgh: Polygon.
Corriente, F. (2008) Dictionary of Arabic and Allied Loanwords: Spanish, Portuguese, Catalan, Galician and Kindred Dialects. Leiden: Brill.
Dictionary of Newfoundland English (1982 / 1990) Ed. G. M. Story, William J. Kirwin and J. D. A. Widdowson. Toronto: University of Toronto Press.
Dictionary of Old English (1986–) Ed. Antonette diPaolo Healey (A–G), Angus Cameron (D), Ashley Crandall Amos (B–D). 8 fascicles (A–G) published to date, on microfiche and on CD-ROM. Toronto: Pontifical Institute of Medieval Studies.
Dictionnaire historique de la langue française (2010), New (4th) edition. Ed. Alain Rey. Paris: Dictionnaires Le Robert.
Goethe Wörterbuch (1978–) 4 Vols, plus 8 fascicles of Vol. 5 (A–lebensfeindlich) published to date. Stuttgart: Kohlhammer. Online at .
Görlach, M. (ed.) (2005) A Dictionary of European Anglicisms. Oxford: Oxford University Press.
Jamieson, J. (1808) An Etymological Dictionary of the Scottish Language, Illustrating the Words in their Different Significations by Examples from Ancient and Modern Writers. 2 Vols. Edinburgh: printed at the University Press for W. Creech [etc.].
— (1825) Supplement to the Etymological Dictionary of the Scottish Language. 2 Vols. Edinburgh: printed at the University Press for W. & C. Tait [etc.].
Kluge, F. (1883 / 2002) Etymologisches Wörterbuch der deutschen Sprache. 24th edition. Berlin: Walter de Gruyter.
Lehmann, W. (1986) A Gothic Etymological Dictionary: Based on the Third Edition of Vergleichendes Wörterbuch der gotischen Sprache by Sigmund Feist. Leiden: Brill.
Liberman, A., with the assistance of J. Lawrence Mitchell (2008) An Analytic Dictionary of English Etymology: An Introduction. Minneapolis and London: University of Minnesota Press.
Middle English Dictionary (1952–2001) Ed. Hans Kurath (A–F), Sherman M. Kuhn with John Reidy (G–P), and Robert E. Lewis (Q–Z). 13 Vols. Ann Arbor: University of Michigan Press [some volumes jointly published with Oxford University Press].
Onions, C. T. (1911 / 1986) A Shakespeare Glossary, 3rd edition. Ed. Robert D. Eagleson. Oxford: Clarendon Press.
Oudnederlands Woordenboek (2009–) Editor-in-chief Tanneke Schoonheim. Leiden: INL. Online at .
Oxford English Dictionary (1884–1933) Ed. James Murray, Henry Bradley, William Craigie and C. T. Onions. 12 Vols. plus supplement. [Known until 1933 as A New English Dictionary on Historical Principles] Oxford: Clarendon Press.
— (1989) 2nd edition. Prepared by J. A. Simpson and E. S. C. Weiner, on the basis of the 1st edition and of a 4-Vol. Supplement (1972–86) by R. W. Burchfield, 20 Vols. Oxford: Clarendon Press.
— (2000–) 3rd edition. Ed. J. A. Simpson. Oxford: Oxford University Press. Online at .
Schneider, J. Gottlob (1819) Johann Gottlob Schneider’s Handwörterbuch der griechischen Sprache: Nach der dritten Ausgabe des grössern Griechischdeutschen Wörterbuchs mit besondrer Berücksichtigung des Homerischen und Hesiodischen Sprachgebrauchs. Ed. Franz Passow, 2 Vols. Leipzig: Friedrich Christian Wilhelm Vogel.
Scottish National Dictionary (1931–76 and 2005) Ed. William Grant (Vols 1–3) and David Murison (Vols 3–10), with a supplement ed. Iseabail Macleod, 10 Vols plus supplement. Edinburgh: Scottish National Dictionary Association.
Skeat, W. W. (1879–82 / 1910) An Etymological Dictionary of the English Language, 4th edition. Oxford: Clarendon Press.
Trésor de la langue française: Dictionnaire de la langue du XIXe et XXe siècle (1789–1960) (1971–94) Ed. Paul Imbs and Bernard Quemada, 16 Vols. Paris: Éditions du Centre National de la Recherche Scientifique (Vols 1–10); Gallimard (Vols 11–16).
Other References
Adams, M. (2010) Legacies of the Early Modern English Dictionary. In: John Considine (ed.) Adventuring in Dictionaries: New Studies in the History of Lexicography. Newcastle, UK: Cambridge Scholars Publishing, 290–308.
Ashdowne, R. (2010) ‘Ut Latine minus vulgariter magis loquamur’: The making of the Dictionary of Medieval Latin from British sources. In: C. Stray (ed.), 195–222.
Bamman, D. and Smith, D. (2012) Extracting two thousand years of Latin from a million book library. ACM Journal on Computing and Cultural Heritage 5/1, article 2 (separately paginated).
Brewer, C. (2007) Treasure-House of the Language: The Living OED. New Haven and London: Yale University Press.
— (2009) The Oxford English Dictionary’s treatment of female-authored sources of the eighteenth century. In: Ingrid Tieken-Boon van Ostade and Wim van der Wurff (eds) Current Issues in Late Modern English. Bern: Peter Lang, 209–38.
— (2010) The use of literary quotations in the Oxford English Dictionary. Review of English Studies 61 (248), 93–125.
— (2012) ‘Happy copiousness’? OED’s recording of female authors of the eighteenth century. Review of English Studies 63 (258), 86–117.
Considine, J. (2009) Literary classics in OED quotation evidence. Review of English Studies 60 (246), 620–38.
— (forthcoming) Historical dictionaries: history and development; current issues. In: Philip Durkin (ed.) The Oxford Handbook to Lexicography. Oxford: Oxford University Press.
Durkin, P. (1999) Root and branch: revising the etymological component of the OED. Transactions of the Philological Society 97, 1–49.
— (2009) The Oxford Guide to Etymology. Oxford: Oxford University Press.
Eickmans, H. (2012) Woordenboek der Nederlandsche Taal (WNT). In U. Haß (ed.), 271–91.
Facal, Javier L. (1981) The New Greek–Spanish Dictionary. Classical Journal 76/4, 357–63.
Geïntegreerde Taal-Bank (2007–) Leiden: Instituut voor Nederlandse Lexikologie. Online at .
Gold, David L. (2009) Studies in Etymology and Etiology (ed. F. Rodríguez González and A. Lillo Buades). San Vincente: Publicaciones de la Universidad de Alicante.
Haß, U. (ed.) (2012) Grosse Lexica und Wörterbücher Europas. Berlin: De Gruyter.
Jones, D. M. (1966) Review of Mycenaeae Graecitatis Lexicon by Anna Morpurgo. Classical Review new ser. 16/3, 374–5.
Lee, John A. L. (2010) Releasing Liddell-Scott-Jones from its past. In C. Stray (ed.), 119–38.
Liberman, A. (1998) An annotated survey of English etymological dictionaries and glossaries. Dictionaries 19, 21–96.
Liberman, A., with the assistance of Ari Hoptman and Nathan E. Carlson (2010) A Bibliography of English Etymology. Minneapolis and London: University of Minnesota Press.
Lockwood, W. B. (1995) An Informal Introduction to English Etymology. Montreux: Minerva Press.
Louwen, K. (2008) A glimpse behind the scenes of the Oudnederlands Woordenboek (Old Dutch Dictionary). In: Marijke Mooijaart and Marijke van der Wal (eds) Yesterday’s Words: Contemporary, Current and Future Lexicography. Newcastle: Cambridge Scholars Publishing, 218–29.
Malkiel, Y. (1976) Etymological Dictionaries: A Tentative Typology. Chicago and London: University of Chicago Press.
— (1993) Etymology. Cambridge: Cambridge University Press.
Murray, K. M. Elisabeth (1977) Caught in the Web of Words: James Murray and the Oxford English Dictionary. New Haven and London: Yale University Press.
Norri, J. (2010) Dictionary of Medical Vocabulary in English, 1375–1550. In: John Considine (ed.) Current Projects in Historical Lexicography. Newcastle: Cambridge Scholars Publishing, 61–82.
Pantelia, M. (2000) ‘Νούς into Chaos’: the creation of the thesaurus of the Greek language. International Journal of Lexicography 13/1, 1–11.
Schröder, B.-J. (2012) Thesaurus linguae latinae. In: U. Haß (ed.), 293–300.
Seber, Wolfgang (1780) Index vocabulorum in Homeri Iliade atque Odyssea. New edition. Oxford: ex typographeo Clarendoniano.
Shenker, I. (1978) Akkadians had a word for it. New York Times Book Review, 21 May, 3, 38.
Silva, P. (2000) Time and meaning: sense and definition in the OED. In: Lynda Mugglestone (ed.) Lexicography and the OED: Pioneers in the Untrodden Forest. Oxford: Oxford University Press, 77–95.
Stray, C. (ed.) (2010) Classical Dictionaries Past, Present and Future. London: Duckworth.
Thesaurus Linguae Graecae (2009) Director Maria Pantelia. Irvine, CA: University of California, Irvine.
Winchester, S. (2012) The mongrel speech of the streets [review of Green’s Dictionary of Slang, by Jonathon Green.] New York Review of Books, 8 March, 24–6.
Zgusta, L. (2006) Lexicography Then and Now: Selected Essays, ed. Fredric S. F. Dolezal and Thomas B. I. Creamer. Tübingen: Max Niemeyer.
4.4
Researching Pedagogical Lexicography Amy Chi
Chapter Overview
Introduction
Monolingual Learner’s Dictionary (MLD) for Advanced Learners of English
1 Introduction
A dictionary, as a product of the art and craft of lexicography, has always been closely associated with the notion of pedagogy. A subject-field dictionary, for instance, such as a dictionary of computer engineering or social science, provides word information of a specific discipline to students to support their study, or to serve professionals in the field. In general, the aims of dictionary compilation are threefold: first, to serve as a record of a particular language, as does the monumental Oxford English Dictionary, which charts the development of the English lexicon; second, and most commonly, to solve linguistic problems related to word knowledge and usage that users may encounter in their daily lives (typically, such a dictionary is synchronic in nature, like a general-purpose dictionary or a learner’s dictionary of a foreign language); and third, to be sold as a commercial product to generate profits. It is thus logical to presume that pedagogical lexicography, which Hartmann and James (1998: 107) define as ‘a complex of activities concerned with the design, compilation, use and evaluation of pedagogical dictionaries’, entails a symbiotic relationship between the knowledge provider (lexicographer) and
the receiver (user). In other words, at the preparatory stage of a dictionary compilation, lexicographers will have a target user group, or groups, which the reference book will serve. The word information to be provided will require specific design and compilation to meet the needs of the target group(s). To evaluate the accomplishment of a dictionary, we should examine users’ comments on the usefulness of the dictionary (use and evaluation). There will also be academics giving dictionary critiques. With this understanding of pedagogical lexicography as the backdrop, the following presents an overview of the development of English pedagogical lexicography. Over the past six decades, the most exciting development in pedagogical lexicography has been the compilation of English dictionaries for non-native English-speaking (L2) learners of English. Indeed, the monolingual learner’s dictionary (MLD) for advanced L2 learners of English defines the status of such a dictionary type in English lexicography and has been the major thrust for research in pedagogical lexicography. Because of the sheer size of its potential user group (between 500 million and 1 billion, according to Béjoint 2010) and the ever-growing demand for learning English as a foreign language, the MLD has been attracting much attention from academics. It all began with A. S. Hornby’s The Idiomatic and Syntactic English Dictionary (ISE, later published by Oxford University Press and renamed the Advanced Learner’s Dictionary; this was the first edition of the dictionary currently known as the Oxford Advanced Learner’s Dictionary, OALD). The dictionary was targeted specifically at helping Japanese advanced learners of English with their encoding or writing tasks. There were also salient features to meet intended users’ decoding needs, such as looking up word meanings.
The dictionary was unrivalled (for a period, when three editions were published) until 1978 when Longman Dictionary of Contemporary English (LDOCE) appeared on the market. Collins COBUILD English Dictionary (CCED) joined in the competition in 1987 and initiated a new era of corpus-based lexicography (see 4.1 ‘Using corpora as data sources for dictionaries’). Cambridge International Dictionary of English (CIDE) (1995), Macmillan English Dictionary (MED) (2002) and Merriam-Webster’s Advanced Learner’s English Dictionary (MWALED) (2008) gradually contributed their publications, each with its own innovations and unique features, totalling six MLDs targeting advanced L2 learners of English aged 16 to 24 (Rundell 2010). This chapter will review and discuss some current research studies in pedagogical lexicography in the context of the teaching of English as a second or foreign language (ESL/EFL). We shall focus on four aspects of the MLD: design, compilation, use and evaluation. Some issues related to these four areas overlap while some are interrelated; hence, the sections should not be read as mutually
exclusive. Most of the discussion will centre on the advanced learners’ dictionaries in the printed form. The concluding section will indicate some possible future developments of this dictionary type.
2 Monolingual Learner’s Dictionary (MLD) for Advanced Learners of English
2.1 Background
McArthur (1989) postulates that the conditions prevailing in English lexicography at the beginning of the twentieth century provided an ideal breeding habitat for the MLD. These included an established practice, within English lexicography, of using simple words to explain harder words, and the growth of systems of education throughout the British Empire, the latter implying that the English language would be part of that growth in dependent countries. The development in foreign language teaching and learning and linguistics research studies at the time further impacted the conceptualization of MLDs. For example, Daniel Jones’s English Pronouncing Dictionary of 1917 provides the model accent to be offered to L2 learners of English; such a pronunciation system, or a version of it, has been adopted by all MLDs to date to provide users with English word pronunciation. Also, the Vocabulary Control Movement in the late 1920s and early to mid-1930s affected the selection of the dictionary wordlist, choice of phraseological information and the convention of a restricted defining vocabulary (Cowie 1999). In addition, Palmer’s and Hornby’s interest in pedagogical grammar resulted in the distinct verb-pattern scheme of OALD1. However, Rundell (1988) argues that this dictionary type has inherited many conventions from native-speaker monolingual dictionaries (NSD) that were irrelevant to L2 learners of English. In the first section, ‘Researching Design’, we examine the macrostructure of MLDs, focusing on two areas: alphabetical ordering and polysemous word entry. The second section, ‘Researching Compilation’, continues the macrostructure theme with an overview of the MLD wordlist. Two important conventions in presenting word meaning, defining vocabulary and definition writing style, are reviewed in detail.
In the third section, ‘Researching Use’, we trace the development of user-related research studies since the 1980s, discussing both their strengths and their weaknesses. The fourth section, ‘Researching Evaluation’, includes a review and discussion of who should, and how to, evaluate MLDs. In the concluding section, ‘The Way Forward’, an area which has great potential for pedagogical dictionary research will be proposed.
2.2 Researching Design
2.2.1 Alphabetical Ordering
The MLD adopts the NSD’s mainstream tradition of an alphabetical macrostructure in presenting wordlists. However, since ‘no form of alphabetization can successfully deal with all types of idioms without listing each in several places, and no dictionary can afford the luxury of such repetition’ (Landau 1989: 82), many lexicographical decisions (e.g. adjustment and interpretation of the alphabetical macrostructure) will have been implemented before the final presentation. Compilers have mostly taken into consideration, sometimes based on linguistic theories, the need for space and the requirement to offer quick and easy access to words. Their decisions may affect the choice of headword (using the canonical or base form of a word rather than the form in which it most frequently appears), treatment of homographs (e.g. bank ‘the place for money’ or bank ‘place along the side of a river’), presentation of phrasal verbs, derivatives and multiword units (e.g. idioms and compounds). The final access structure of the dictionary may not be transparent or comprehensible to inexperienced L2 users of the dictionary. Indeed, the access macrostructure of the dictionary often becomes a source of obstacles for users in their word search. One issue is whether or not the dictionary should adopt nesting or strict alphabetical ordering in presenting headwords. The nesting approach (placing several related words and phrases under the same headword in a non-alphabetical order), adopted by the OALD until its 6th edition (2000), was perceived to help maintain and develop a sense of semantic and morphological relationships within the lexicon, which was important for users learning the target language, in particular in expanding their vocabulary (Cowie 1999). This structuring principle in presenting words was later replaced by the strict letter-by-letter method, probably to facilitate computing production, based mainly on word class. 
Such a change gave decoding a higher priority than encoding (Cowie 1999, van der Meer and Sansome 2001) since the arrangement was believed to allow users easier and quicker access to information, which was considered to be important, based upon results from user-related studies (Bogaards 1996, Herbst 1996). Following this convention, compounds become separate headwords while suffixed derivatives of words remain mostly as run-ons (sub-entries of a word or phrase they are related to). All current MLDs follow this arrangement in general; nevertheless, the issue has not yet been settled. Bogaards (1996) questions the use of an alphabetical ordering system in presenting headwords and explicates how such an arrangement can create obstacles for L2 users in completing written translation tasks. Since semantically related words are scattered in the dictionary following their orthographic
form, he explains that the ‘findability’ of word meanings following this macrostructure is low. Another concern is that since MLDs aim at appealing to a global clientele, the choice of adopting an alphabetical macrostructure might be problematic to users of different linguistic and cultural backgrounds. First, such an ordering system may obstruct or retard the word-search of users whose mother tongue does not share the Latin alphabet (e.g. Reif 1987, Battenburg 1991). Second, even if users’ languages share the same Latin root, as in the case of Spanish learners of English, they may have a different interpretation of the alphabetical order. Scholfield (1999) explains how Spanish users may be confused when looking up words like alley, spelt with ll, which Spanish traditionally would treat as one letter, but will be two in an English dictionary. Third, a strict letter-by-letter arrangement may obstruct some users from transferring their first language dictionary reference experience and skill to the use of the MLD. Chi’s (2003) study revealed how first language reference experience led some Hong Kong Chinese students to word-search failure. For example, when these students transferred their dictionary reference skills from the use of Chinese dictionaries, which mostly follow a word nesting convention, to their use of MLDs, they were often disappointed by not finding the compound word subsumed under the related headword. Furthermore, whether decoding is really preferred over encoding by L2 users when they consult a dictionary is still debated. Rundell (1988) maintains that the belief of MLD compilers in treating meaning to be the most important information that L2 learners of English need from the dictionary, similar to NSD users, is one of the major faults in the design of this dictionary type. 
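Scholfield's alley example can be made concrete with a small sketch. It assumes a simplified traditional Spanish alphabet in which the digraphs ch and ll count as single sorting units; the word sample, the function names and the `UNITS` table are illustrative choices made here, not taken from any of the dictionaries discussed.

```python
# Sorting units of a simplified traditional Spanish alphabet, in which the
# digraphs 'ch' and 'll' are treated as single letters (an assumption made
# for illustration only).
UNITS = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
         "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
RANK = {unit: position for position, unit in enumerate(UNITS)}


def letter_by_letter_key(word):
    """Strict letter-by-letter key: every character is its own sorting unit."""
    return list(word.lower())


def traditional_spanish_key(word):
    """Key in which 'ch' and 'll' are consumed as one unit each."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ("ch", "ll"):
            key.append(RANK[pair])
            i += 2
        else:
            # Characters outside the table sort after all known units.
            key.append(RANK.get(word[i], len(UNITS)))
            i += 1
    return key


WORDS = ["alto", "alley", "alum", "allow", "almond"]

english_order = sorted(WORDS, key=letter_by_letter_key)
spanish_order = sorted(WORDS, key=traditional_spanish_key)

print(english_order)  # ['alley', 'allow', 'almond', 'alto', 'alum']
print(spanish_order)  # ['almond', 'alto', 'alum', 'alley', 'allow']
```

Under the second key, a user expects alley after every word beginning al- plus a single letter, which is several entries away from where a letter-by-letter English dictionary places it.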
Scholfield (1999) maintains that dictionary consultation, for both receptive and productive purposes, should be an integral part of the learners’ vocabulary acquisition process, both in terms of learning new words and in strategy development. A nesting convention facilitates vocabulary development that treats words in families, and van der Meer (2002: 516) postulates that a nesting entry ‘would in many cases also make sense definitions easier, since repetitions of identical semantic information may be avoided’. Indeed, fueled by an abundance of evidence resulting from computing technology and multi-million-word textual corpora, there has been a revival of interest in linguistic disciplines like second language vocabulary acquisition and phraseology. Results from such research studies have influenced the presentation and vocabulary information in both brand-new MLDs and new editions of existing ones in the past two decades. Most current MLDs take pride in their new features: the red-coloured words (‘[for users] to use them confidently and correctly’ MED1); a wealth of collocation information to support productive tasks and frequency-based vocabulary information to help users communicate effectively (Cambridge Advanced Learner’s Dictionary 2005); and the inclusion of Coxhead’s (2000) Academic Word List (OALD8). With the encoding function of
the dictionary seemingly shifting to take a more central position in pedagogical dictionaries, it is possible that the current policy of aligning headwords void of semantic and morphological information will be revisited. On the other hand, at the technological frontier, de Schryver (2003: 175) proclaims that ‘users don’t need to know the sequence of the alphabet’s letters anymore’ in the electronic dictionary interface. He explains that the latest techniques such as voice recognition, ‘focus-in typing’ (while typing, a list of words appears on the screen suggesting or approximating the word in search) and spelling corrections would eliminate the problems of misspelling and prompt users to the right word. The devices solve some apparent problems but help is rendered only to users accessing the MLD electronically. Boonmoh (2010) asserts that the exciting picture that Nesi (2000b) described in respect of dictionaries on CD-ROM has not yet been realized in many pocket electronic dictionaries (PEDs) or in dictionaries on the internet. In Gouws’ (2009: 4) view, ‘the immediate future of lexicographic tools still sees printed dictionaries as an important and persisting role player’. He continues:
This demands that metalexicographical research, besides its focus on electronic dictionaries, should still also be directed at printed dictionaries – including issues regarding the macrostructure of printed dictionaries.
New dimensions of research in alphabetical macrostructure, supplemented by recent linguistic findings in English as a second/foreign language acquisition, and supported by computing technology, may yield a new presentation of the MLD that includes an ‘integrated semantic awareness’ (van der Meer 2002). Rundell (2010) suggests that the one-size-fits-all MLD model is out-dated and the future of the dictionary centres on the concept of ‘customization and personalization’.
It is possible to offer L2 users from a specific linguistic background a modified or domesticated alphabetical macrostructure. Using the same wordlist, the MLD could be presented non-alphabetically, applying alternative approaches such as by themes (e.g. Longman Lexicon of Contemporary English 1981) or arranging words with synonymous meanings (e.g. Wordnet).
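The idea that one wordlist can support several macrostructures may be easier to see in a small sketch. The six-item wordlist, its base-word links and its theme labels below are invented for illustration, and the three functions are hypothetical helpers rather than a description of how any published MLD is produced. For simplicity, the letter-by-letter view treats every item as a separate headword, although current MLDs mostly keep suffixed derivatives as run-ons.

```python
from collections import defaultdict

# Hypothetical mini wordlist: (headword, base word it is related to, theme label).
WORDLIST = [
    ("bank", "bank", "money"),
    ("banker", "bank", "money"),
    ("bank account", "bank", "money"),
    ("bar", "bar", "food and drink"),
    ("banknote", "bank", "money"),
    ("barman", "bar", "food and drink"),
]


def letter_by_letter(wordlist):
    """Strict letter-by-letter order: spaces are ignored when sorting."""
    headwords = [headword for headword, _, _ in wordlist]
    return sorted(headwords, key=lambda hw: hw.replace(" ", "").lower())


def nested(wordlist):
    """Nesting: related words are grouped under their base headword."""
    groups = defaultdict(list)
    for headword, base, _ in wordlist:
        members = groups[base]  # creates the group even if it stays empty
        if headword != base:
            members.append(headword)
    return {base: sorted(groups[base]) for base in sorted(groups)}


def thematic(wordlist):
    """Thematic arrangement: headwords are grouped by topic, not by spelling."""
    themes = defaultdict(list)
    for headword, _, theme in wordlist:
        themes[theme].append(headword)
    return {theme: sorted(themes[theme]) for theme in sorted(themes)}


print(letter_by_letter(WORDLIST))
print(nested(WORDLIST))
print(thematic(WORDLIST))
```

The same data yield a flat list in which bar separates banknote from barman, a nested view in which banker and banknote sit under bank, and a thematic view that ignores spelling altogether. This is the kind of alternative presentation that customization for a specific user group could select.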
2.2.2 Polysemous Words
Many dictionary-use research studies have revealed that typically users would choose the first sense of a polysemous word entry to answer their need (e.g. Nesi 1987, Bogaards 1998) either because of convenience, or ignorance, or both. This decision is problematic in many respects, especially when the lexicographical decision in prioritizing senses of such types of entry depends mostly on high-frequency use as indicated in native-speaker corpora. van der Meer (2002: 514) cautions, ‘the only thing a frequency-based order does is increase look-up speed for isolated senses. No more.’
Offering ‘Signposts’, ‘Guidewords’, ‘Menus’ or ‘Shortcuts’ at the beginning of, or internally within, a long entry seems to be the convention adopted by most current MLDs to assist users to locate the sense relevant to their need, and to accelerate the look-up process. Data from research studies (e.g. Bogaards 1998, Nesi and Kim 2011, Tono 2011) testing the effectiveness of such a convention have been positive in general. Variables that might affect the accuracy and speed of the search seem to be related to the L2 user’s English proficiency; the length of, and the way meanings are presented in, the entries users looked up; and the synonymous words or phrases used by the dictionary to indicate ‘direction’ for the search. Opponents of this convention have cast doubts on its effectiveness in guiding users accurately to find the right sense since the ‘directive words’ ‘tend to rely on high-frequency superordinate terms, and these are sometimes too ambiguous or vague to facilitate effective searching’ (Rundell 1998: 327). Yamada (2009) contends that the convention is inconsistently applied in dictionaries, and hence may confuse users. He further adds that the signposts are mainly semantic and are not useful to form-based word consultations, while some signposts are semantically redundant, repeating and/or summarizing words found in the sense definition. Online MLDs which offer the ‘signpost’ convention such as the Cambridge Dictionary Online and the Longman Dictionary of Contemporary English, have inherited the problem.
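The difference between taking the first sense of a frequency-ordered entry and following a signpost can be sketched with a toy model. The entry for bank, its signpost labels and its frequency figures below are invented for illustration; the matching of context words to signposts is a deliberately naive stand-in for what a reader does when scanning an entry menu.

```python
from dataclasses import dataclass


@dataclass
class Sense:
    signpost: str    # short guide word shown at the start of the sense
    definition: str
    frequency: int   # hypothetical corpus count, for illustration only


# Hypothetical entry; the figures are not drawn from any corpus.
BANK = [
    Sense("RIVER", "the land along the side of a river", 120),
    Sense("MONEY", "a place where money is kept", 900),
]


def frequency_order(senses):
    """Order senses as a frequency-based entry would, most frequent first."""
    return sorted(senses, key=lambda sense: sense.frequency, reverse=True)


def first_sense(senses):
    """The look-up strategy reported in user studies: take the first sense."""
    return frequency_order(senses)[0]


def sense_by_signpost(senses, context_words):
    """Follow a signpost that matches the context; otherwise take the first sense."""
    wanted = {word.upper() for word in context_words}
    for sense in frequency_order(senses):
        if sense.signpost in wanted:
            return sense
    return first_sense(senses)


print(first_sense(BANK).definition)
print(sense_by_signpost(BANK, ["river", "boat"]).definition)
```

A reader who stops at the first sense always lands on the money sense, whatever the text being read. The signpost helps only when its wording happens to match something the reader recognizes, which is where the objections about vague or redundant guide words apply.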
2.3 Researching Compilation
2.3.1 Wordlist
A dictionary for L2 learners of English was first conceived as a synchronic dictionary with a wordlist of frequent words that could fulfil, to a lesser or greater extent, both the decoding and encoding needs of L2 users. Such a notion of including only ‘a selected subset of the lexicon’, as Rundell (1998: 316) commented, ‘marks a major departure from the native-speaker (NSD) tradition’. This concept of identifying a limited set of English vocabulary that suffices for L2 users to function in general second language contexts was the centre of research studies during the Vocabulary Control Movement from the mid-1920s to the late 1930s. Major publications that have influenced the compilation of MLDs include Michael West’s first use of 1,490 words for the vocabulary definitions of the New Method English Dictionary (1935) and his publication of A General Service List of English Words in 1953. The latter was referenced in the compilation of LDOCE1 for its 2,000 word Defining Vocabulary (Cowie 1999), and this defining convention has developed into a norm in the compilation of all major MLDs. Hornby’s wordlist was mostly based on frequency of use, with the belief that dictionary consultations from target users would be for words they encounter
frequently; thus, the wordlist contained more structured and ‘heavy duty’ words than rare, technical and scientific words (Cowie 1999, Béjoint 2010). While MLDs published before 1995 more or less attempted to strike a balance between serving the encoding and decoding functions, there appeared a major shift to serve the latter function in the new or newer editions of MLDs. One of the obvious reasons was that user studies undertaken in the 1970s and 1980s revealed that users’ concern was mostly with meaning (see section ‘Researching Use’) in their consultations. Moreover, researchers like Kokawa and Yamada (1998: 354) had elucidated that for a MLD to be complete it has to teach ‘learners linguistic (including pragmatic) knowledge as well as cultural or encyclopaedic aspects of the language, which relate to the “content” of communication’. High-frequency and function words were discovered to be less demanded, a contradiction to the early belief that these words would be needed because of their high frequency of encounters and for use; instead, users are requesting more infrequent words and semantic knowledge about words (Cowie 1999).
2.3.2 Defining Vocabulary
With regard to microstructure, the types of information that go into individual entries have remained essentially the same, though perhaps in different formats and with different emphases, in all major MLDs since the initial compilation of this dictionary type. As an advocate of using a selective core vocabulary so as not to overtax L2 learners of English, Hornby used simple words and phrases to write definitions. LDOCE1 (1978) went a step further and debuted a Defining Vocabulary of 2,000 words. This new feature initially received mostly positive feedback in user studies (e.g. MacFarquhar and Richards 1983, Herbst 1986). However, some researchers contested LDOCE’s claim of using only 2,000 words in writing the dictionary definitions, while others doubted the clarity and accuracy of definitions written following such a convention (e.g. Stein 1979, Fox 1989, Bogaards 1996, Rundell 1998). While definitions written with a controlled vocabulary are intended to be simple and comprehensible, Cowie (1999: 111) asserts:
It is also important that they should be accurate, concise and written in natural English, and LDOCE provides some evidence that, in striving to be simple and comprehensible, compilers can sometimes lose sight of one or more of the other criteria.
Béjoint (2010) suggests that such a restrictive list of vocabulary may hamper lexicographers in creating a ‘chain of definitions’. He explains, for example, that by following the genus word mammal used in defining cat, dictionary users may expand their vocabulary knowledge; if, instead, animal is used as the genus, as the restricted vocabulary list requires, ‘the chain is short-circuited’ (ibid.: 172).
Yamada (2010) agrees that natural and idiomatic English should be used in writing definitions. The abolition of a controlled vocabulary is possible, he affirms, given the growing ease of referencing facilitated by the electronic interface of dictionaries. Should users face any difficult words in the definition (which a dictionary using a controlled vocabulary claims to avoid), they can look up the words easily in several clicks, and most users would not find that troublesome. Yet, it is noteworthy that all MLDs have seemingly endorsed the benefits of employing a controlled vocabulary in writing definitions by adopting the convention. OALD’s early resistance ended when OALD5 (1995) announced that 3,500 words were used (3,000 in later editions) in writing its definitions. Collins COBUILD Advanced Dictionary of English (2009) uses 3,000 words in its Defining Vocabulary.
2.3.3 Definition Writing Style
This area has attracted vigorous debate, ignited by COBUILD’s introduction of full-sentence definitions in its first edition (1987). The traditional defining style uses short phrases and synonymous words, whereas COBUILD’s style seems to imitate the way a teacher explains meaning to students in a full sentence. Hanks (1987) and Fox (1989) maintained that the new defining style helps, for example, to eliminate distracting conventions such as parentheses, archaic words and excessive formulaic expressions. Proponents of the new style, such as Herbst (1996: 326), praised the definitions mainly because they offer good ‘semantic and collocational ranges of the valency complements of a verb’ and ‘avoid the technical character and syntactic clumsiness’. Critics (e.g. Bogaards 1996, Cowie 1999, Rundell 2006, Béjoint 2010), on the other hand, have raised doubts about this style of definition, citing its imprecision, over-specification, wordiness, redundancy and heavy use of space. They are also concerned that the relatively long definitions could distract users in the look-up process, and that, in eliminating old and unnecessary conventions, the full-sentence style has created new obstacles for users. However, studies (e.g. Tickoo 1989, Cumming et al. 1994, Dziemianko 2006) have reported that users prefer full-sentence to traditional definitions, although some studies that compared students’ performance on given linguistic tasks while using dictionaries with COBUILD’s and traditional defining styles reported inconclusive results (e.g. Nesi and Meara 1994, Nesi 1998). Nonetheless, the full-sentence definition style has not yet evolved into the standard convention, though it has been adopted selectively in all MLDs. In all fairness, COBUILD’s defining style broke new ground and drew MLDs further away from the NSD tradition.
Yamada (2010) postulates that, in an electronic interface, some of the full-sentence definitions’ weaknesses can be overcome, for example by offering users a button for selecting definition information based on their look-up needs.
2.4 Researching Use
2.4.1 Methodology
There has been significant growth in user research in pedagogical lexicography since the 1980s. Such studies, especially those related to L2 learners of English, have been well documented and discussed in various books and articles (e.g. Cowie 1999, Nesi 2000a, Tono 2001, Jackson 2002, Béjoint 2010). Commentary generally concludes that, although much more work has been done than before the 1980s, few conclusive results have been reached. For example, when commenting on the empirical studies on dictionary use in the past 20 years or so, Lew (2011: 1) could only suggest that ‘there is no denying that the methodological standards are improving at a steady rate’. Indeed, questionnaire-based study data have been constantly challenged (e.g. Hatherall 1984, Battenburg 1991, Cowie 1999, Nesi 2000a, Tono 2001, Jackson 2002), mostly on their reliability and accuracy. In Humblé’s (2001) view, the assumptions underlying most of these studies, namely that subjects have lexicographic awareness and knowledge of linguistic concepts and are honest in their reports, could be flawed. The questionnaire methodology was generally employed by researchers (e.g. Tomaszczyk 1979, Béjoint 1981, Kipfer 1987, Taylor and Chan 1994) from the late 1970s to the early 1990s. These studies were intended to collect L2 users’ opinions and discover their habits and/or skills in dictionary use (some studies included the use of bilingual and/or bilingualized dictionaries and electronic dictionaries). Data from such studies have confirmed long-standing conjectures (e.g. meaning is the major information category sought) and revealed new findings (e.g. MLDs enjoy high prestige but are depressingly underused). Later studies addressed the weaknesses of questionnaire-based research design by complementing it with other methods like written and/or oral protocols (e.g. Nuccorini 1992, Thumb 2004, Chan 2012), interviews and discussions (e.g. Cubillo 2002, Chi 2010) and/or linguistic tasks or tests (e.g.
Cumming et al. 1994, Harvey and Yuill 1997, Atkins and Varantola 1998, Chi 2003). Research designs incorporating tests or tasks in particular have been widely used in discovering users’ reference skills and in examining correlations between the use of the dictionary and users’ performance in completing linguistic tasks, mostly in reading comprehension and vocabulary acquisition. However, many share the same view as Nesi (2000a: 54) that:
The findings of many of the other studies [on learners of English as a foreign language] are ultimately inconclusive, either because they report on the beliefs and perceptions of dictionary users, rather than on the observed consequences of dictionary use, or because different studies of similar phenomena have resulted in contradictory findings.
Most critics conclude with recommendations for fine-tuning existing methodology or experimenting with new methodology in data collection to ensure reliability and validity in areas such as test administration (e.g. choice of dictionaries and sampling of subjects), test design (e.g. contrived vs natural word search environment) and data analysis and representation (e.g. qualitative vs quantitative). However, it is only fair to point out that the problems identified regarding reliability and validity are common to empirical research across all academic disciplines. The efforts and findings of all the research work should be given credit.
2.4.2 Subjects
Another major concern in these studies is the subjects they employed, including the size and nature of the subject group. Many critics share Hartmann’s (2001: 94) view that ‘The number and scale of user studies is still too small, . . . The target populations observed are still extremely limited’. For example, two of Tono’s (2001: 43, 51) tables, Table 3.3 (Research on ESL/EFL learners’ reference needs) and Table 3.7 (Studies on users’ reference skills), indicate that the largest sample in either research area was that of Atkins and Varantola (1998), totalling 1,140 subjects. Some studies had a very small subject group: Wiegand’s (1985) had just one subject and Ard’s (1982) two. More recent studies of similar focus published in the International Journal of Lexicography also rely on small subject groups, as shown in Table 4.4.1. Besides having small sample sizes, these studies have also appeared only sporadically. There is no denying that these studies have produced insightful data and the qualitative design of some might have justified a small sample size. However, when the reported data of these studies are to be referenced for decisions by a dictionary compiler, which may claim to meet the needs of millions, neither the number of such studies nor their sample sizes is substantial. In Hartmann’s (2001: 94) words, ‘the results of various studies are of limited generalisability’.

Table 4.4.1 Samples of recent research studies of dictionary users published in the International Journal of Lexicography

Study                          Subject number and education background
Chon (2008)                    10 students studying at a Korean university
Lew and Doroszewska (2009)     56 students studying at pre-university level in Poland
Chen (2010)                    85 students studying at a Chinese university
Dziemianko (2010)              64 students studying at a Polish university
Nesi and Kim (2011)            124 students studying at a Malaysian university
Another question reviewers have raised is the significance of findings from user-related studies, since many of these were collected from the self-reports and observations of the behaviours and attitudes of groups of L2 learners of English who are inexperienced and relatively young. Reviewers doubt the worth of data obtained from these subjects, who have only ‘rudimentary reference skills’ (Cowie 1999) and are, in Béjoint’s (2010: 257) view:
Impatient . . . [and] anything sophisticated, or abstract, or too long, or expressed in codes will be neglected, because the amount of time and energy necessary to find and understand the information is too much compared with the benefit derived from the consultation.
Many of them would ‘ditch a tool which requires too much clicking work’ (Lew, Online Dictionaries of English) when they navigate through the menu on the electronic interface for the full treatment of a word under search. Aside from having relatively low-level reference skills and showing limited patience and inquisitiveness in looking up information in a dictionary, many of these students have low English language proficiency or superficial knowledge of vocabulary acquisition (e.g. Nesi and Meara 1994, Tono 2001, Chi 2010). Other studies have found that they use their dictionaries infrequently (e.g. Béjoint 2010). If one accepts that data obtained in the past two decades provide us with insights into the use, reference needs, attitudes and behaviours of L2 users of MLDs, one should also take into consideration the small window in which these research studies have operated and, hence, the limited representativeness of such findings.
Cowie’s (1999: 187) comment in the following should be kept as a reminder of the urgency of finding new directions in user-related research: There seems little point in trying to assess the ability of students to retrieve information of whose existence they are hardly aware or to judge their performance of activities which they have seldom tackled.
2.5 Researching Evaluation
2.5.1 The Reviewer
How should we evaluate, and who should be the judge of, a pedagogical dictionary? Naturally, one would think the user should have the last word on how well a dictionary has served them for the information sought. However, the MLD user is typically an L2 learner of English studying the language through a structured curriculum, and he or she often ‘approaches the dictionary within the constraints of his or her own needs and skills, but without necessarily receiving appropriate guidance and/or instruction’ (Hartmann 2001: 25). In other words,
the choice of their dictionary may have been made under the constraint of the requirement of school, course or teachers. Indeed, most research studies report that students’ choice of dictionaries for use is mostly determined by their teachers’ recommendation (e.g. Béjoint 1981, Nuccorini 1992, Chi 2003). Relatively young learners may have limited skills in, and knowledge of, the dictionary when asked to comment on or use it to perform a linguistic task. Béjoint (2010: 230) cautions that: A dictionary that sells well is not necessarily a good dictionary, but it is certainly a dictionary that corresponds to a social need. But the commercial success of dictionaries is not an unambiguous indicator of what the users need. A more promising group of MLD evaluators would be teachers of the MLD target users, the EFL/ESL teachers. They are professionally trained, equipped with pedagogical theories and methodologies in foreign language acquisition, teaching and attending to the linguistic needs of MLD target users first-hand in the classrooms. The teacher is, very likely, the driving force in consulting a dictionary in a structured English learning environment. Moreover, it may be reasonable to assume that teachers are conversant with dictionary use and can give a lucid account of the practicality of the dictionary in their students’ learning process. Surprisingly, however, research concerning English language teachers and dictionary use and evaluation is not common. In many user-related studies, English language teachers were subsumed as subjects alongside language learners. For example, in Nuccorini’s (1992) study, a group of five Italian teachers of English was used to compare with eleven EAP students on their dictionary choice and use. Other studies surveying teachers’ views and/or use of dictionaries include those of Herbst and Stein (1987), Tickoo (1989), Koren (1997), Chi (2003, 2011) and Boonmoh (2010). 
In Jackson’s (2002: 175) view,
In general, reviewers – of books, plays, films, music – are chosen because they are considered knowledgeable or expert in the subject matter or the techniques of whatever it is they are reviewing. We should expect the reviewers of dictionaries to be knowledgeable in lexicography.
If we follow this criterion, and given that the target users of MLDs are mainly L2 learners of English, academic lexicographers or metalexicographers with a speciality in English language teaching and/or learning should be the ideal reviewers. Indeed, most of the user-related research studies involved the researcher (commonly a metalexicographer or linguist with an English language teaching position at a university) examining his or her own students’
use of dictionaries, either using a questionnaire, or a linguistic performance test, or both. Some metalexicographers conducted comparative reviews examining features of a selected few MLDs based on their own personal judgement or set of criteria (e.g. Dalgish 1995, Allen 1996, Bogaards 1996, Herbst 1996, Scholfield 1999). These were reviews mainly of the editions of the major four MLDs that were published in 1995. Chan and Taylor (2003: 259) found such a review approach more helpful than one which compares a particular dictionary with its earlier edition, since a comparative review seems ‘to lead to a more thorough analysis of the selected features of the dictionaries’ and thus, is ‘best calculated to provide users (and those who advise users) with a sound basis for making the right choice of dictionary’ (ibid.: 261). However, they found the 36 reviews they examined ‘primarily factual and descriptive rather than evaluative’ and suggested that they ‘might be better called “book notices”’ (ibid.: 267). Béjoint (2010: 228) further adds that, ‘when the reviewers are academics, the reviews are usually better informed but they are more malicious, and on the whole not much more helpful’.
2.5.2 Methodology
Chan and Taylor (2003) reported that the reviews they examined rarely stated the reviewer’s purpose clearly. Moreover, they found that most of the reviews were directed not at end users but at the users’ teachers. They concluded that the evaluation process of those reviews in general was unclear and that the comments were mostly based on the reviewer’s intuition. Jackson (2002) suggests that reviewers should begin their examination of a dictionary by familiarizing themselves with it through reading different sections such as the front and back matter, the user’s guide, the preface and the staff and consultant lists. Methods could include random sampling of dictionary entries, conducted by one reviewer or a team of reviewers. He (ibid.: 176) asserts that:
Team reviews allow a more thorough treatment of each aspect of a dictionary’s lexical description, both by enabling more extensive sampling to be undertaken and by tapping into a reviewer’s specialist interest.
He proposes two sources of information, the ‘internal’ and ‘external’, to reference for setting criteria for evaluation. While the ‘internal’ source refers to information obtained from the dictionary provided by the compiler, the ‘external’ source is from metalexicography including linguistic theories regarding the lexicon, dictionary design and production. However, Nielsen (2009: 28) considers adopting a linguistic approach for evaluation inadequate since a dictionary is not ‘just a container of the lexicon of a language’. He proposes
a lexicographic approach to dictionary evaluation since it focuses on the significant features of both printed and electronic dictionaries. Such an approach encompasses lexicographic function, data and structure. Reviews adopting such an approach will offer readers information on making various decisions in areas such as the usefulness of the dictionary and the practicality and theoretical development of lexicography. A third approach that he raises is the factual approach, which has a focus on ‘an analysis, description and evaluation of the factual (semantic and encyclopaedic) data and topics contained and treated in the dictionary related to the lexicographic functions’ (ibid.: 29). Hartmann (2001) also advocates the need to establish international standards, including features like coverage, format, scope, size, title and authority for dictionary criticism. In the past, most reviews involved comparing several MLDs, a specific MLD with its earlier edition, MLDs with bilingual and bilingualized dictionaries. With the availability of the MLD on CD-ROM, handheld or pocket-size electronic dictionaries, and free internet access, reviews or comparative studies on printed MLDs and their electronic versions are growing: some examples are Nesi (2003), de Schryver (2003), Chen (2010) and Lew (Online Dictionaries of English).
2.6 The Way Forward
This chapter has chosen for review and discussion only a few issues in pedagogical lexicography that are of profound significance to the field. Readers may refer to other important issues, such as how computing technology has impacted the compilation of pedagogical dictionaries and the user-related research on electronic dictionaries (including PEDs and MLDs in internet mode), in other chapters under ‘Current Research and Issues’ of this book. In the following, we shall examine why a major aspect of future research in pedagogical lexicography lies with the first word of the subject matter: ‘pedagogy’.
2.6.1 Pedagogical Lexicography and Language Teaching
Hartmann and James (1998: 107) define pedagogical dictionaries as dictionaries ‘specifically designed for the practical didactic needs of teachers and learners of a language’. Most of the discussion and research studies in the area to date have been on the latter. There is a wealth of research opportunities in studying teachers, and analysis of the data would bring fresh insight into the fine-tuning of existing MLDs. We still have little knowledge of EFL teachers’ dictionary-use experience and training, even though their teaching philosophy and classroom teaching approach may have impacted on their students’
choices, attitudes and performance in dictionary use. In the following, some useful areas of future research are outlined. Dictionary users do not approach an MLD with absolutely no reference skills or linguistic knowledge. L2 learners of English would have, for example, approached the dictionary with years of English language knowledge, learning experience and study skills, mostly obtained through a structured English language syllabus taught by teachers in a classroom setting. The linguistic scaffolding that students acquired in their formative years would have facilitated their use of the dictionary to tackle the linguistic tasks they face. When user-related studies uncover that most users exploit only a narrow range of dictionary items in their consultations, focusing predominantly on meanings in their dictionary search, how should researchers interpret the findings? Currently, most studies would conclude that MLDs should be made more transparent and user-friendly in meeting users’ decoding needs. Future research studies that include the teacher in the examination may ask additionally, for example, ‘Has the English language curriculum and/or teaching that users received in the past promoted this narrow usage?’ Other areas of research with a teacher focus could investigate why users are unaware of the range of vocabulary information their dictionaries provide, as in Fan’s (2000) study, where university students who had learned English in a school system for 13 years before entering university were found not to recognize the vast resources that dictionaries contain even though they reported a high rate of ownership; or examine why users are unaware of their own reference needs, as Frankenberg-Garcia’s (2011) study revealed, instead fixating only on L1-L2 equivalents and word spelling in their dictionary consultations.
An investigation of such failings may begin with a plausible hypothesis that the widely adopted communicative approach in EFL teaching in the past three decades has shifted the foreign language teaching objective from writing to speaking and from semantic and syntactic to communicative competence. In such a teaching environment, dictionaries will not have a strong presence in core teaching. Herbst and Stein (1987: 121) criticize the communicative teaching approach, suggesting that it ‘not only discourages dictionary training but actually runs counter to it’ and that ‘semantic precision, situation appropriateness and grammatical correctness have all too often and too readily been set aside and even discredited’. If syntactic characteristics and terminologies are not explained or referenced in class, it is likely that few students would consult the dictionary for grammatical information simply out of ignorance of its presence and use. Taking into consideration the incomplete linguistic scaffolding that students are presumed to possess when they consult a dictionary may suggest a different conclusion from that which most current research studies have reached. Findings on students’ linguistic capacity and learning strategies with
reference to their past English language study would help lexicographers to gauge the levels and kinds of dictionary information that a MLD should provide in order to be user-friendly and transparent for its users. Hartmann (2001: 120) postulates that dictionary research as an interdisciplinary subject ‘is the direction in which research in metalexicography or dictionary research should proceed’ and pedagogical lexicography in fact emerged from joining the sister discipline Language teaching with Lexicography (ibid.). In the case of MLD compilation, the siblings do not seem to be working alongside each other and/or getting equal attention from researchers; instead, many dictionary use decisions have been made by the latter sister following findings obtained directly from the sisters’ shared clients, the L2 learners of English. One possible reason could be inferred from Rundell’s (2007) remark on Dziemianko’s (2006) appeal for dictionary use teaching. He wrote: The iPod comes with almost no instructions – you just have to figure it out, and most people under 30 have no problem with this. So it is incumbent on designers of dictionaries to create systems that users don’t have to learn and that don’t require elaborate explanatory material. Such an attitude is confirmed by Yamada’s (2010: 165) supposition that the teachers have been neglected because dictionary compilers are ‘impatient to see the teaching of dictionary use coming in a visible way’ and ‘have gone to great lengths to make their dictionaries accessible to users, in a sense bypassing the teachers’. Rundell’s assumption is flawed, for while it is true that the new generation of MLD users are technology savvy, it does not follow that their reference skills and linguistic knowledge are necessarily better than user-related research subjects examined in the past. 
The cognitive processes and skills required of users, and their expectations of a dictionary consultation for linguistic needs, are not comparable to those involved when an iPod is used in searching for entertainment or news online. Indeed, extra help is needed to direct users to use dictionaries offered in the virtual medium, as Lew (Online Dictionaries of English) concludes in his review of online dictionaries: ‘we have seen that a great variety of dictionaries exist, but without proper guidance users run the risk of getting lost in the riches’. Research findings on how foreign language teaching approaches, curricula and methodologies have impacted on students’ dictionary use and choice would deepen our understanding of users’ needs and, in turn, of the appropriate help that could be offered.
2.6.2 Dictionary Use Training and the Language Teacher
In view of users’ general low-level and infrequent usage of the MLD, another solution that has been widely suggested by researchers is to provide training to
users (e.g. Atkins and Varantola 1998, Cowie 1999, Chi 2003, Chon 2008, Chen 2010, Yamada 2010, Lew 2011). Béjoint (1989: 212) contends that, The teaching of dictionary use is important not because it aims at improving the way dictionaries are used, but also because it might turn out in the long run to be instrumental in the general progress of lexicography. He produces a checklist of dictionary skills that an ideal user should possess; some skills are essential to all types of dictionary but some require the users to have knowledge of dictionary typology. Nesi (2003) offers a list of dictionary skills that could be taught at university level. Both lists provide valuable suggestions but are rather abstract and may need to be translated into practical teaching methodologies and classroom exercises for practical use. The items also demand that teachers possess lexicographical knowledge which they may not have, because lexicography as a subject is not included in most EFL teaching training courses (e.g. Gates 1997, Chi 2003, 2011). Many researchers suggest integrating dictionary skill teaching into an existing English syllabus, making the skill relevant to students’ immediate study needs because, as revealed in Cubillo’s (2002: 219) study, ‘reference skill acquisition was reinforced by significant purpose of use’, and in Frankenberg-Garcia’s (2011: 121) that ‘it makes more sense to help learners with dictionaries whenever the need for them arises’. Most agree that the training should start early since dictionary use is a skill that requires teaching and practice, and good attitudes should be instilled at a young age. While metalexicographers and lexicographers are absorbed in proving the need for dictionary training and laying out principles on what to teach, the how of teaching and who to teach remain unexplored. 
First, little research has been conducted exploring the methodology and syllabus for dictionary use teaching and their effectiveness for learners’ performance. Béjoint (2010: 260) considers improving the dictionary skills of users difficult since ‘it requires the cooperation of teachers, teaching systems, and governments in many cases, provided the users themselves are ready to be educated’. Second, can we assume English teachers are willing and/or capable partners in training students to use dictionaries? Researchers have expressed concerns about teachers not being willing to let students use dictionaries in the classroom or to complete linguistic tasks like reading comprehension and vocabulary learning (Tono 2001). Hartmann (2001) suggests that there exists a love–hate relationship between teachers and the dictionary: some teachers may be reluctant to train students because they may feel that once students become proficient in using dictionaries to assist learning, the students will no longer need to depend on their teachers. As regards teachers’ ability to teach the subject, Boonmoh’s (2010) survey reveals that many university lecturers of English are unaware of the development and functions of electronic
dictionaries and are unwilling to train their students. Even if teachers are willing and capable of providing dictionary training in the classroom, the subject lacks user-oriented teaching material (Nesi 2000a). In short, researching dictionary use teaching is an unexplored area. Issues which require investigation and testing include syllabuses for teaching users at various proficiency levels, teaching methodologies, materials and assessment. To prepare such a teaching package, there is a need to identify and benchmark the threshold level for reference skills and English proficiency, if such a level exists at all. A teaching goal of dictionary use training could be helping users to reach a level where they could manoeuvre reasonably well (obtaining more successful look-ups) in a printed or electronic dictionary for the completion of various linguistic tasks they are engaging in. Atkins and Rundell (2008) explain that the birth of a brand new dictionary starts and ends with the Marketing Department of the publisher concerned; and English teachers often learn about a particular dictionary only via the ESL/ EFL marketing personnel. Joint research studies, publications and participation in conferences from researchers of both the language teaching and lexicography disciplines would help pedagogical lexicography’s future development. It is still very true as Hulstijn and Atkins (1998: 17) suggest, Juggling these two scenarios [the educational thinking of giving all information or withholding some until people are ready to understand] is the unenviable task of the lexicographer, understanding what’s going on in their dictionaries and teaching dictionary users to understand this, and adapt to it, calls to the language teacher. Collaborative efforts in dictionary use research are, we believe, the way forward.
References
Allen, R. (1996) The big four. English Today 46, 12/2, 41–7. Ard, J. (1982) The use of bilingual dictionaries by ESL students while writing. Review of Applied Linguistics 58, 1–27. Atkins, B. T. S. (ed.) (1998) Using Dictionaries: Studies of Dictionary Use by Language Learners and Translators. Tübingen: Max Niemeyer. Atkins, B. T. S. and Rundell, M. (2008) The Oxford Guide to Practical Lexicography. New York: Oxford University Press. Atkins, B. T. S. and Varantola, K. (1998) Monitoring dictionary use. In: B. T. S. Atkins (ed.), 21–81. Battenburg, J. D. (1991) English Monolingual Learner’s Dictionaries: A User-oriented Study. Tübingen: Max Niemeyer. Béjoint, H. (1981) The foreign student’s use of monolingual English dictionaries: a study of language needs and reference skills. Applied Linguistics 2/3, 207–22. — (1989) The teaching of dictionary use: present state and future tasks. In: F. J. Hausmann, O. Reichmann, H. E. Wiegand and L. Zgusta (eds) Wörterbücher/Dictionaries/Dictionnaires:
An International Encyclopedia of Lexicography, Vol. 1. Berlin/New York: Walter de Gruyter, 208–15. — (2010) The Lexicography of English. Oxford/New York: Oxford University Press. Bogaards, P. (1996) Dictionaries for learners of English. International Journal of Lexicography 9/4, 277–320. — (1998) Scanning long entries in learner’s dictionaries. In: T. Fontenelle, P. Hiligsmann, A. Michiels, A. Moulin and S. Theissen (eds) Actes EURALEX ’98 Proceedings. Papers Submitted to the Eighth EURALEX International Congress on Lexicography in Liège, Belgium, Vol. II. Liège, Belgium: University of Liège, English and Dutch Department, 555–63. Boonmoh, A. (2010) Teachers’ use and knowledge of electronic dictionaries. ABAC Journal 30/3, 56–74. Chan, A. (2012) The use of a monolingual dictionary for meaning determination by advanced ESL learners in Hong Kong. Applied Linguistics 33/2, 115–40. Chan, A. Y. W. and Taylor, A. J. (2003) Evaluating learner dictionaries: what the reviews say. In: R. R. K. Hartmann (ed.), 254–73. Also in International Journal of Lexicography (2001) 14/3, 163–80. Chen, Y. (2010) Dictionary use and EFL learning. A contrastive study of pocket electronic dictionaries and paper dictionaries. International Journal of Lexicography 23/3, 275–306. Chi, M. L. A. (2003) An Empirical Study of the Efficacy of Integrating the Teaching of Dictionary Use into a Tertiary English Curriculum in Hong Kong Vol IV: Research Reports (ed. G. James). Hong Kong: Language Centre, Hong Kong University of Science and Technology. — (2010) Applying formal vocabulary to academic writing: is the task achievable? Reflections on English Language Teaching 9/2, 171–90. — (2011) When dictionaries support vocabulary learning, where to begin? In: K. Akasu and S. Uchida (eds) ASIALEX Proceedings: Lexicography, Theoretical and Practical Perspectives.
Papers Submitted to the Seventh ASIALEX Biennial International Conference, Kyoto, 22–24 August 2011, 76–85. Chon, Y. V. (2008) The electronic dictionary for writing: a solution or a problem? International Journal of Lexicography 22/1, 23–54. Cowie, A. P. (1999) English Dictionaries for Foreign Learners: A History. New York: Oxford University Press. Cowie, A. P. (ed.) (1987) The Dictionary and the Language Learner: Papers from the EURALEX Seminar at the University of Leeds, 1–3 April 1985. Tübingen: Franke Verlag. Coxhead, A. (2000) A new academic word list. TESOL Quarterly 34/2, 213–38. Cubillo, M. C. C. (2002) Dictionary use and dictionary needs of ESP students: an experimental approach. International Journal of Lexicography 15/3, 206–28. Cumming, G., Cropp, S. and Sussex, R. (1994) On-line lexical resources for language learners: assessment of some approaches to word definition. System 22/3, 369–77. Dalgish, G. (1995) Learners’ dictionaries: keeping the learner in mind? In: B. B. Kachru and H. Kahane (eds) Cultures, Ideologies, and the Dictionary: Studies in Honor of Ladislav Zgusta. Tübingen: Max Niemeyer, 329–38. De Schryver, G. M. (2003) Lexicographers’ dream in the electronic-dictionary age. International Journal of Lexicography 16/2, 143–99. Dziemianko, A. (2006) User-friendliness of Verb Syntax in Pedagogical Dictionaries of English (Lexicographica. Series Maior 130). Tübingen: Max Niemeyer. — (2010) Paper or electronic? The role of dictionary form in language reception, production and the retention of meaning and collocations. International Journal of Lexicography 23/3, 257–73. Fan, M. (2000) The dictionary look-up behaviour of Hong Kong students: a large-scale survey. Education Journal 28/1, 123–38. Fox, G. (1989) A vocabulary for writing dictionaries. In: M. L. Tickoo (ed.), 153–71.
Frankenberg-Garcia, A. (2011) Beyond L1-L2 equivalents: where do users of English as a foreign language turn for help? International Journal of Lexicography 24/1, 97–123. Gates, J. E. (1997) A survey of the teaching of lexicography 1979–1995. Dictionaries 18, 66–93. Gouws, R. H. (2009) Sinuous lemma files in printed dictionaries: access and lexicographic functions. In: S. Nielsen and S. Tarp (eds), 3–21. Hanks, P. (1987) Definitions and explanations. In: J. Sinclair (ed.) Looking Up. London and Glasgow: Collins, 116–36. Hartmann, R. R. K. (2001) Teaching and Researching Lexicography. Harlow: Pearson Education Limited. Hartmann, R. R. K. (ed.) (2003) Lexicography: Critical Concepts. London: Routledge. Hartmann, R. R. K. and James, G. (1998) Dictionary of Lexicography. London: Routledge. Harvey, K. and Yuill, D. (1997) A study of the use of a monolingual pedagogical dictionary by learners of English engaged in writing. Applied Linguistics 18/3, 253–78. Hatherall, G. (1984) Studying dictionary use: some findings and proposals. In: R. R. K. Hartmann (ed.) LEXeter ’83 Proceedings: Papers from the International Conference on Lexicography at Exeter, 9–12 September 1983. Tübingen: Max Niemeyer, 183–9. Herbst, T. (1986) Defining with a controlled defining vocabulary in foreign learners’ dictionaries. Lexicographica 2, 101–19. — (1996) On the way to the perfect learners’ dictionary: a first comparison of OALD5, LDOCE3, COBUILD2 and CIDE. International Journal of Lexicography 9/4, 321–57. Herbst, T. and Stein, G. (1987) Dictionary-using skills: a plea for a new orientation in language teaching. In: A. P. Cowie (ed.), 115–27. Hulstijn, J. and Atkins, B. T. S. (1998) Empirical research on dictionary use in foreign-language learning: survey and discussion. In: B. T. S. Atkins (ed.), 7–19. Humblé, P. (2001) Dictionaries and Language Learners. Frankfurt am Main: Haag und Herchen. Jackson, H. (2002) Lexicography: An Introduction.
London: Routledge. Kernerman, I. J. and Bogaards, P. (eds) (2010) English Learners’ Dictionaries at the DSNA 2009. Tel Aviv: K Dictionaries. Kipfer, B. A. (1987) Dictionaries and the intermediate student: communicative needs and the development of user reference skills. In: A. P. Cowie (ed.), 44–54. Kokawa, T. and Yamada, S. (1998) Review of Della Summers (ed.). Longman Dictionary of English Language and Culture. Harlow: Longman. 1992; Jonathan Crowther (ed.) Oxford Advanced Learner’s Dictionary of Current English, Encyclopedic Edition. Oxford: Oxford University Press, 1992. International Journal of Lexicography 11/4, 343–57. Koren, S. (1997) Quality versus convenience: comparison of modern dictionaries from the researcher’s, teacher’s and learner’s points of view. TESL-EJ 2/3. http://tesl-ej.org/ej07/a2.html (accessed 30 July 2012). Landau, S. I. (1989) Dictionaries: The Art and Craft of Lexicography. Cambridge: Cambridge University Press. Lew, R. (2011) Studies in dictionary use: recent developments. International Journal of Lexicography 24/1, 1–4. — (preprint version) Online dictionaries of English. Published in: P. A. Fuertes-Olivera and H. Bergenholtz (eds) (2011) E-Lexicography: The Internet, Digital Initiatives and Lexicography. London/New York: Continuum, 230–50. http://hdl.handle.net/10593/742 (accessed 30 July 2012). Lew, R. and Doroszewska, J. (2009) Electronic dictionary entries with animated pictures: lookup preferences and word retention. International Journal of Lexicography 22/3, 239–57.
McArthur, T. (1989) The background and nature of ELT learners’ dictionaries. In: M. L. Tickoo (ed.), 52–64. MacFarquhar, P. D. and Richards, J. C. (1983) On dictionaries and definitions. RELC Journal 14/1, 111–24. Nesi, H. (1987) Do dictionaries help students write? In: T. Bloor and J. Norrish (eds) Written Language: British Studies in Applied Linguistics 2. London: Centre for Information on Language Teaching and Research, 85–97. — (1998) Defining a shoehorn: the success of learners’ dictionary entries for concrete nouns. In: B. T. S. Atkins (ed.), 159–78. — (2000a) The Use and Abuse of EFL Dictionaries (Lexicographica. Series Maior 98). Tübingen: Max Niemeyer. — (2000b) Electronic dictionaries in second language vocabulary comprehension and acquisition: the state of the art. In: U. Heid, S. Evert, E. Lehmann and C. Rohrer (eds) Proceedings of the Ninth EURALEX International Congress, EURALEX 2000 Stuttgart, Vol. II, 839–47. — (2003) The specification of dictionary reference skills in higher education. In: R. R. K. Hartmann (ed.), 370–93. Also in: R. R. K. Hartmann (ed.) (1999) Dictionaries in Language Learning. Berlin: Free University/FLC/TNP, 53–67. Nesi, H. and Kim, H. T. (2011) The effect of menus and signposting on the speed and accuracy of sense selection. International Journal of Lexicography 24/1, 79–96. Nesi, H. and Meara, P. (1994) Patterns of misinterpretation in the productive use of EFL dictionary definitions. System 22/1, 1–5. Nielsen, S. (2009) Reviewing printed and electronic dictionaries: a theoretical and practical framework. In: S. Nielsen and S. Tarp (eds), 23–41. Nielsen, S. and Tarp, S. (eds) (2009) Lexicography in the 21st Century. Amsterdam: John Benjamins. Nuccorini, S. (1992) Monitoring dictionary use. In: H. Tommola, K. Varantola, T. Salmi-Tolonen and J. Schopp (eds) EURALEX ’92 Proceedings I-II. Papers Submitted to the 5th EURALEX International Congress on Lexicography in Tampere.
Finland: University of Tampere, 89–102. Reif, J. A. (1987) The development of a dictionary concept: an English learner’s dictionary and an exotic alphabet. In: A. P. Cowie (ed.), 146–58. Rundell, M. (1988) Changing the rules: why the monolingual learner’s dictionary should move away from the native-speaker tradition. In: M. Snell-Hornby (ed.) ZuriLEX ’86 Proceedings. Papers Read at the Euralex International Congress 1986. Tübingen: Franke Verlag, 127–37. — (1998) Recent trends in English pedagogical lexicography. International Journal of Lexicography 2/4, 315–42. — (2006) More than one way to skin a cat: why full-sentence definitions have not been universally adopted. In: E. Corino, C. Marello and C. Onesti (eds) Proceedings of the XII EURALEX International Congress at Università di Torino, 323–38. — (2007) Review of: Dziemianko, A. (2006) User-friendliness of Verb Syntax in Pedagogical Dictionaries of English, Tübingen: Max Niemeyer. Kernerman Dictionary News 15. http://kdictionaries.com/kdn/kdn15/kdn1507-rundell.html (accessed 30 July 2012). — (2010) What future for the learner’s dictionary? In: I. J. Kernerman and P. Bogaards (eds), 169–75. Scholfield, P. (1999) Dictionary use in reception. International Journal of Lexicography 12/1, 13–34. Stein, G. (1979) The best of British and American lexicography. Dictionaries 1, 1–23. Taylor, A. and Chan, A. (1994) Pocket electronic dictionaries and their use. In: W. Martin, W. Meijs, M. Moerland, E. ten Pas, P. van Sterkenburg and P. Vossen (eds) EURALEX 1994 Proceedings. Amsterdam: Vrije Universiteit, 598–605.
Thumb, J. (2004) Dictionary Look-up Strategies and the Bilingualised Learner’s Dictionary. Tübingen: Max Niemeyer. Tickoo, M. L. (1989) Which dictionaries and why? Exploring some options. In: M. L. Tickoo (ed.), 184–203. Tickoo, M. L. (ed.) (1989) Learners’ Dictionaries: State of the Art. Singapore: SEAMEO Regional Language Centre. Tomaszczyk, J. (1979) Dictionaries: users and uses. Glottodidactica 12, 103–19. Tono, Y. (2001) Research on Dictionary Use in the Context of Foreign Language Learning (Lexicographica. Series Maior 106). Tübingen: Max Niemeyer. — (2011) Application of eye-tracking in EFL learners’ dictionary look-up process research. International Journal of Lexicography 24/1, 124–53. van der Meer, G. (2002) Dictionary entry and access. Trying to see trees and woods. EURALEX 2002 Proceedings (digital version). www.euralex.org/elx_proceedings/Euralex2002/055_2002_V2_Geart%20van%20Der%20Meer_Dictionary%20Entry%20and%20Access%20Trying%20to%20see%20Trees%20and%20Woods.pdf (accessed 30 July 2012). van der Meer, G. and Sansome, R. (2001) OALD6 in a linguistic and a language teaching perspective. International Journal of Lexicography 14/4, 283–306. Wiegand, H. E. (1985) Fragen zur Grammatik in Wörterbuchbenutzungs-protokollen. Ein Beitrag zur empirischen Erforschung der Benutzung einsprachiger Wörterbücher. In: H. Bergenholtz and J. Mugdan (eds) Lexikographie und Grammatik. Tübingen: Max Niemeyer, 20–98. Yamada, S. (2009) EFL dictionaries on the web: students’ appraisal and issues in the Cambridge, Longman, and Oxford dictionaries. In: B. Y. V. Ooi, A. Pakir, I. S. Talib and P. K. W. Tan (eds) Perspectives in Lexicography: Asia and Beyond. Tel Aviv: K Dictionaries, 87–104. — (2010) EFL dictionary evolution: innovations and drawbacks. In: I. J. Kernerman and P. Bogaards (eds), 147–68.
4.5
Monolingual Learners’ Dictionaries – Where Now?¹
Shigeru Yamada
Chapter Overview
Introduction
Development of EFL Dictionaries
Important Innovations and Features
Electronic Possibilities
Conclusion
1 Introduction
The EFL dictionary can be said to be a delicate answer to the negotiation of various factors, demands and considerations by and of the ‘protagonists’ (compilers, users, teachers and researchers) (Hartmann 2001: 24–7). Since the first major result appeared as ISED² in 1942, the negotiation has continued without interruption and at varying speeds, bearing more fruit. There have been shifts in emphasis, momentum and fashion. Major technological advances have emerged, affecting and changing the methods of lexicography and the shape of dictionaries. This chapter deals with EFL dictionaries in the narrow sense – monolingual English dictionaries for foreign students of English – the most advanced and influential genre not only of learners’ dictionaries but also of all dictionary types. In an attempt to describe the present state of the EFL dictionary, we will look first at the developmental stages that those dictionaries went through and then at six of their important innovations and features. As pointers to their future development, the implications of electronic media for the presentation of a dictionary will also be touched upon.
2 Development of EFL Dictionaries
The important dictionaries from major publishers, almost all for advanced learners, are laid out in Table 4.5.1 below. We will divide the evolution of EFL dictionaries into five stages, each characterized in its section title. We will look at each stage with reference to the associated dictionaries and their innovations and features.
Table 4.5.1 Development of mainstream EFL dictionaries
Kaitakusha/OUP: ISED (1942), ALD2 (1963), OALD3 (1974), OALD4 (1989), OALD5 (1995), OALD6 (2000), OALD7 (2005), OALD8 (2010)
Longman: LDOCE1 (1978), LDOCE2 (1987), LDOCE3 (1995), LDOCE4 (2003), LDOCE5 (2008)
Chambers/CUP: CULD (1980), CIDE (1995), CALD2 (2005), CALD3 (2008)
Collins: COBUILD1 (1987), COBUILD2 (1995), COBUILD3 (2001), COBUILD4 (2003), COBUILD5 (2006), COBUILD6 (2009)
Macmillan: MED1 (2002), MED2 (2006)
Merriam-Webster: MWALED (2008)

2.1 Prelude and the First Period (1942–73): Beginning and Monopoly
The prototypical EFL dictionary materialized in ISED. The dictionary is thought to embody the best practice developed by English-speaking scholars engaged in English language teaching in Japan in the 1920s–1940s, who drew ideas from Europe and India as well as from their dealings with Japanese students. In Bengal, India, Michael West pursued a reading-centred approach, creating graded readers whose vocabulary items were carefully controlled. In 1935, he and James Endicott produced a companion dictionary to his
textbooks – NMED. It characteristically employs a limited defining vocabulary of 1,490 words with which to define 18,000 words and 6,000 idioms (Preface, iii). In 1934, the Carnegie Corporation Conference was held on the initiative of West. It eventually led through the Interim Report on Vocabulary Selection (1936) to the publication of his influential General Service List of English Words (1953). In 1921, Harold E. Palmer was invited by the Ministry of Education to reform Japan’s ELT and stayed in the country until 1936. GEW was published in 1938. Although alphabetical, this precursor to EFL dictionaries is geared to the user’s productive needs, especially in composition (Introduction; iii, v–vi), rather than their receptive needs (cf. Cowie 1999: 36). The dictionary with some 1,000 headwords meticulously provided verb patterns with abundant examples without definitions being regularly given to each sense (Introduction, ix). GEW’s method of indicating verb patterns for the dictionary had a far-reaching influence on EFL dictionaries to come (Cowie 1999: 37). The dictionary ‘left its mark on Hornby’s Idiomatic and Syntactic English Dictionary . . . and helped its strong “productive” character’ (ibid. 3). A. S. Hornby came to Japan in 1924. In 1936 Palmer left Japan in the middle of the dictionary project, after compiling the headword list (Imura 1997: 209). The unfinished groundwork towards the dictionary (revision and enlargement of the selection of collocations, finalization of 25 verb patterns) was taken over, refined, and brought to a workable level by Hornby, together with E. V. Gatenby and H. Wakefield (cf. Naganuma 1978: 11–12). The manuscript was completed as a result of five years’ concerted endeavour (Imura 1997: 236). ISED is a small book by today’s standards of EFL dictionaries in terms of both volume and content (1,519 pages with about 30,000 entries). 
Hornby seems to have consciously kept the vocabulary size as it was, since he intended his dictionary for pre-university EFL students both for decoding and encoding (and for continued use at the university level and above as far as production is concerned) (Introduction, iv). Pronunciation is indicated by means of the IPA. McArthur (1992: 593–4) summarizes the characteristics of ‘Hornby’s dictionary’ as follows:
(1) Headwords chosen because experienced teachers believed they were the most useful to foreign learners.
(2) The omission of archaic usage, historical and literary references, and etymology.
(3) The provision of the pronunciation of each headword in IPA transcription derived from Jones’s EPD.
(4) Meanings given in simple language, avoiding the often convoluted and Latinate constructions used in many mother-tongue dictionaries.
(5) Meanings explained by definitions and specimen phrases and sentences to show the headword in use.
(6) Grammatical information on every headword provided, including codes referring to the syntactic patterns of all listed verbs.
(7) Illustrations providing further information and serving to break up the text.
(8) Language-related appendices at the back of the book.
All of these (a few in modified form) underlie the basic principles of contemporary EFL dictionaries. ISED can be considered the best mix put together by English-speaking scholars engaged with ELT in Japan: IPA, research into verb patterns, vocabulary limitation, and collocation, and the editors’ expertise in teaching EFL students. Hornby and his colleagues should be highly praised for making the next-to-impossible possible: reconciling the two opposing purposes, and combining ‘simplicity’ (in definitions, specimens of usage, and language notes) with ‘complexity’ (in the phonetic transcriptions and grammatical codes) (McArthur 1992: 594), in 1,519 pages, in the foreign language, and in a way accessible to the target users.
2.2 The Second Period (1974–86): Rivalry and Sophistication
During this period, much effort was invested in incorporating additional information, including the results of linguistic research. This was realized by means of grammar codes and ‘dictionary-ese’ to make maximum use of the limited dictionary space. With Longman entering into rivalry with Oxford, the EFL dictionary market became competitive. LDOCE1 brought in a number of key features and improvements. The most influential is its defining vocabulary, a selection of approximately 2,000 words based on A General Service List of English Words (1953). LDOCE1 attempted to provide a detailed account of English usage. The Survey of English Usage (SEU) was frequently referred to and many example phrases and sentences were drawn from the collection (LDOCE1, Preface, xxvi). Grammatical information was primarily based on A Grammar of Contemporary English (1972) (McArthur 1992: 594), which is also based on the SEU. Detailed grammar codes were given and extended to nouns and adjectives. A conscious effort was made to describe the pronunciation and vocabulary of American English. Stress shift was indicated, and 488 usage notes were incorporated. OALD3 had broken verb patterns down into 51, compared to ISED’s 33. The coding system of LDOCE1 was more ‘user-friendly’ than that of OALD3 because at least part of the former was mnemonic (Cowie 1990: 688). However, it was pointed out that ‘there is a real danger of opening the gap which is known to exist between the sophistication of some features of dictionary design and the user’s often rudimentary reference skills’ (Cowie 1981: 206). CULD, intended for the intermediate student, took full account of this: it was conservative, plain and simple, giving the minimum of essential information (Takebayashi et al. 1982: 108–9) without using any complicated codes or language.
2.3 The Third Period (1987–94): Competition and Versatility
The year 1987 saw the publication of COBUILD1 and the revision of LDOCE. The former is ‘revolutionary’ in the history of lexicography. Editor-in-chief John Sinclair states in the Introduction (xv) that the dictionary differs from others in the kind of information it offers, and in the quality and presentation of that information. Breaking away from the prevalent method of depending on the introspection of lexicographers and on earlier dictionaries, the COBUILD team based their dictionary on a 7.3 million-word corpus. This is by far the single most important innovation (cf. Rundell 2006b: 741). ‘Usage cannot be invented, it can only be recorded’ (Introduction, xv). With the purpose proclaimed on the dust jacket of ‘HELPING LEARNERS WITH REAL ENGLISH’, absolute editorial faith was placed in corpus evidence, and the frequency information dictated the selection of headword items, the discrimination and arrangement of senses, and the identification of grammatical and lexical patterns. Example sentences were basically drawn from the corpus rather than being invented for the purpose of the dictionary. Definitions were written throughout in ordinary prose – the full sentence definition (FSD) – without recourse to special dictionary conventions or a restricted defining vocabulary. Grammar codes were aligned in the Extra Column to the right of the main entry, and the grammatical terms were explained in boxes placed at their alphabetical positions. Information on meaning relations (synonymy, antonymy and hyponymy) was also presented in the Extra Column. Pronunciation was represented in a unique way with superscript numbers indicating the range of variations of vowels and some consonants (Kojima et al. 1989: 152). COBUILD1 will be remembered as an important milestone in the history of lexicography. The epoch-making dictionary is not free from criticism (see Section 3.6 Examples, for instance).
However, it is significant that the dictionary introduced a new look at the language and fresh perspectives on lexicography with the powerful corpus tool.
The revision of LDOCE was based on the reactions of users to the first edition, academic reviews, and the publisher’s international user research. As a result, the usefulness of the defining vocabulary was confirmed. The complicated grammar codes were replaced with more transparent ones (see Table 4.5.2). The Longman Citation Corpus was consulted and many examples were based on the corpus. Special attention was given to collocations, which were indicated in bold in the examples. In addition to 471 Usage Notes, 20 Language Notes were newly incorporated to give guidance on pragmatics in particular.
Two years later, in 1989, OALD was revised, using the conventional method. The layout was greatly improved. The indication of verb patterns was dramatically changed. The patterns were reduced to 32 from 51 in OALD3. The complicated alpha-numeric codes were replaced with mnemonic ones (see Table 4.5.2) together with the indication of complementation. The definitions were made easier (Takahashi et al. 1992: 196) but remained difficult, since no defining vocabulary was used. The definitions offered abundant information on selectional restrictions and the examples showed many collocations. Two hundred ‘Notes on Usage’ were newly incorporated. The location of stress was indicated on all compounds and basically all idioms (ibid. 78–80).
The EFL dictionary scene in this period was vibrant, with three distinct dictionaries competing head-to-head: the conservative OALD4 (introspective with phrase definition and invented examples) and the revolutionary COBUILD1 (subjective with FSDs and corpus-based examples) at the extremes and the moderate LDOCE2 (linguistic databases referred to and defining vocabulary employed) in the middle. The rich variety offered users choice and the opportunity to learn from comparison (Yamada 2010: 150).

Table 4.5.2 Indication of ‘want+object+to-infinitive’ in major EFL dictionaries*

OALD
ISED (1942): vt. & i. ❷ (P . . . 3)
ALD2 (1963): v.t. & i. 2. (VP . . . 3)
3/e (1974): vt, vi 2 [VP . . . 17]
4/e (1989): v 1 [ . . . Tnt no passive . . . ]
5/e (1995): v 1 . . . [V.n to inf] She wants me to go with her.
6/e (2000): verb (not usually used in the progressive tenses) . . . 1 . . . [VNtoinf] Do you want me to help?
7/e (2005): verb (not usually used in the progressive tenses) . . . 1 . . . [VN to inf] Do you want me to help?
8/e (2010): verb [T] (not usually used in the progressive tenses) . . . 1 . . . ~ sb/sth to do sth Do you want me to help?

LDOCE
1/e (1978): v [Wv6] 1 [ . . . V3 . . . ]
2/e (1987): v [not usu. in progressive forms] 1 [T] . . . [+obj+to-v] He wants you to wait here.*
3/e (1995): v [not usually in progressive] 1 . . . [T] . . . want sb to do sth I don’t want Linda to hear about this.
4/e (2003): v [not usually in progressive] 2 . . . [T] . . . want sb to do sth I want you to find out what they’re planning.
5/e (2008): v [not usually in progressive] 2 . . . [T] . . . want sb to do sth I want you to find out what they’re planning.

CALD
CULD (1980): (not usu used with is, was etc and -ing (defs 1, 2): not used with is, was etc and -ing (defs 3, 4)) 1 vt
CIDE (1995): want (obj) . . . v . . . Do you want me to take you to the station? [T + obj + to infinitive]
2/e (2005): verb [T] 1 . . . [+ obj + to infinitive] Do you want me to take you to the station?
3/e (2008): verb [T] 1 . . . [+ OBJ + to INFINITIVE] Do you want me to take you to the station?

COBUILD
1/e (1987): 2 . . . V + O + to-INF: NO IMPER . . .
2/e (1995): 1 . . . VB: no cont, no passive . . . V n to-inf . . .
3/e (2001): 1 . . . VB: no cont, no passive . . . V n to-inf . . .
4/e (2003): 1 . . . VERB: no cont, no passive . . . V n to-inf . . .
5/e (2006): 1 . . . VERB: no cont, no passive . . . V n to-inf . . .
6/e (2009)**: 1 VERB [no cont, no passive] . . . [v n to-inf] They began to want their father to be the same as other daddies.

MED
1/e (2002): . . . verb [T] . . . 1 . . . want sb/sth to do sth Her parents didn’t want her to marry him.
2/e (2006)

MWALED
1/e (2008): verb 3 not used in progressive tenses [+ obj]

Note: *When a grammar code is attached to an example, the example is provided. **COBUILD6 abolishes the extra column, placing a grammar code in front of the corresponding example.
2.4 The Fourth Period (1995 onwards): Check and Convergence
The year 1995 witnessed the revision or publication of the ‘Big Four’ EFL dictionaries: CIDE, COBUILD2, OALD5 and LDOCE3 (in order of publication). Macroscopically, it is in this period that the genre of EFL dictionaries turned in the direction of ‘convergence’ (Rundell 2006b) in the names of corpus-basis and user-friendliness. All four dictionaries claim a corpus basis. In competition with the COBUILD corpus, a balanced corpus, the British National Corpus, was developed, on which OALD5 and LDOCE3 were based. It was a natural course that frequency information took precedence and that corpus-based examples became dominant (see Section 3.6 Examples for detail). Informed partly by the results of a growing number of user studies, user-friendliness was pursued in the direction of ‘easier access and more lucid presentation’ (McArthur 1992: 594). Navigational aids (‘guide words’ [CIDE³] and ‘signposts’ [LDOCE3]) were newly incorporated (see Section 3.3 Signposts and Menus for detail). To increase lucidity, the idea of a defining vocabulary (initiated by LDOCE1) and the FSD (by COBUILD1) spread to other dictionaries. These features are not free from problems in themselves, as detailed below (3.4 Defining Vocabulary, and 3.5 Defining Style). It appears that user-friendliness has gone too far, considering the intended audience – advanced students and teachers. Codes and abbreviations came to be spelled out (e.g. see Table 4.5.2). Especially in the print medium, it is problematic that lucidity was realized in a space-wasting manner, involving repetition and redundancy. The above may be too simplistic an observation, but it cannot be denied that EFL dictionaries have come closer to each other than ever before in information content and structure. Apart from the obvious fact that the corpus is an influential, huge game changer, dictionaries tend to jump on the bandwagon of rivals’ successful features, which has unfortunately deprived the dictionary scene of its pre-1995 variety and individuality. This is regrettable in an age when all EFL dictionaries are available online free of charge to be consulted for comparison. Undoubtedly, there have been incremental improvements and revisions (which are of crucial importance), but, overall, mould-breaking innovations are long overdue. With shortening revision intervals, the differences between two consecutive editions have been diminishing. Is this proof of EFL dictionaries reaching full maturity, in the print medium at least?
2.5 The Fifth Period (1990s–): Going Electronic
The ongoing fifth period begins in the early 1990s (overlapping with the last part of the third period and the fourth period), when dictionaries were made widely available in electronic format. The electronic medium opened up exciting new possibilities in dictionary presentation and consultation. There are three types of electronic dictionary: hardware, software and web-based (Koike et al. 2003: 662–3). This section focuses on the first type, handheld electronic dictionaries, which are practically an East Asian phenomenon (for the other two, see Chapter 4.2 Researching the use of electronic dictionaries). In Japan, the first handheld machine to include an entire print dictionary was SII’s TR700 (1992), based on LDOCE2 (Sekiyama 2007: 241). The current model includes over 100 references⁴ and other books,⁵ and costs about 450 USD. The handheld format has brought such advantages as ease, speed, flexibility and exhaustiveness of reference, portability, versatility, consultation self-sufficiency and multimedia information. However, there are shortcomings such as the small screen, the difficulty of using several specific functions, and the lack of customization (in content and functions). According to my classroom survey, the use of the handheld electronic dictionary exceeded that of the print one in 2004–5; the usage rate of the former topped 80 per cent in 2006. However, several functions are severely underutilized (Yamada 2009). The electronic medium has not yet caused a paradigm shift. Electronic dictionaries are still largely based on print dictionaries with a little additional information and electronically enhanced search methods. This has to change. Within the space constraints, print EFL dictionaries rest on a delicate balance carefully worked out to satisfy conflicting requirements and demands (e.g. ‘“dual-track” approach’ by MED1 [Introduction, v]). The electronic vehicle has virtually lifted the space restriction. Now corpus data is here. However, vast amounts of information have to be properly managed and presented in an accessible way. What makes sense in the print dictionary does not necessarily do so in the electronic one. Consequently, the shift in media brings about change to dictionary education, criticism, business model, etc.
3 Important Innovations and Features
This section deals with six important information categories and features of EFL dictionaries: frequency information, grammar, signposts and menus, defining vocabulary, defining style, and examples. (See also Chapter 4.1 and Chapter 4.9.)
3.1 Frequency Information
After the corpus-based COBUILD1, frequency information has been extensively used throughout the compilation stages. Through the analysis of corpus material, frequent enough vocabulary items, meanings and patterns are identified, entered and arranged in frequency order.6 However, in the finished product, a considerable amount of the information is inevitably lost in the conventional linear style of presentation. There is no knowing how frequent the first sense of a word is in comparison with the second, for example. Conscious efforts have been made to overcome this problem. COBUILD2 introduced the five-level Frequency Bands to indicate the relative importance of words. The most important 14,700 or so words are given black diamonds in the Extra Column, according to their frequency in the Bank of English. Now based on the Longman Corpus Network, LDOCE3 presented frequency information in two ways. First, the dictionary distinguishes between written and spoken English and indicates the most frequent 3,000 headwords in three levels in each of the categories. For instance, reasonable is indicated with ‘S1’ and ‘W2’ in the margin (the former on top of the latter), meaning the word is among the most frequently used 1,000 words of spoken English and among the second most frequently used 1,000 words of written English. Second, over 150 eye-catching bar graphs are introduced to represent relative frequencies of synonyms, grammatical and collocational patterns, and the distribution of words between spoken and written media and between British and American English. OALD7 introduced the Oxford 3000™, which has a double function: as a defining vocabulary and as a starting point for vocabulary expansion (p. R99). The 3,000 words were selected on three grounds: frequency from the analysis
of corpora, range of use in different text types, and familiarity to most users of English; the British National Corpus, the Oxford Corpus Collection, and over 70 experts in teaching and language study were consulted (ibid.). The words of the Oxford 3000 are shown in larger type and with a key symbol. It is noteworthy that, in addition to headwords, the key senses are given a small key symbol in OALD8. However, it is regrettable that the criteria for identifying such senses are stated nowhere in the dictionary (Yamada et al. 2012: 23), let alone relative frequencies among the key senses.
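The margin markers described for LDOCE3 can be read as a compact encoding of a medium plus a frequency band. The following is a minimal sketch of that reading, not the dictionary's own procedure: the function name decode_marker is hypothetical, and the rank arithmetic for level 3 is an assumption extrapolated from the statement that ‘S1’ covers the top 1,000 spoken words and ‘W2’ the second 1,000 written words.

```python
import re


def decode_marker(marker):
    """Decode a marker such as 'S1' or 'W2' into (medium, first_rank, last_rank).

    Assumption: each of the three levels covers a band of 1,000 headwords,
    so level n spans ranks (n - 1) * 1000 + 1 to n * 1000.
    """
    match = re.fullmatch(r"([SW])([123])", marker)
    if match is None:
        raise ValueError(f"not a recognised frequency marker: {marker!r}")
    medium = "spoken" if match.group(1) == "S" else "written"
    level = int(match.group(2))
    return medium, (level - 1) * 1000 + 1, level * 1000


# The chapter's example: 'reasonable' carries both S1 and W2.
for marker in ("S1", "W2"):
    medium, first, last = decode_marker(marker)
    print(f"{marker}: ranks {first}-{last} in {medium} English")
```

The sketch also makes the limitation noted above concrete: a band says only that a headword falls somewhere within a range of 1,000 ranks, and nothing about the relative frequency of its individual senses.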
3.2 Grammar Presentation of grammatical information has undergone a drastic change in the course of the development of EFL dictionaries – from opaque to transparent: codes through abbreviations to spell-outs. Table 4.5.2 above chronologically shows how major EFL dictionaries indicate the pattern of ‘want+object+to-infinitive’. The roots of grammar codes can be traced back to Palmer’s GEW,7 a precursor to EFL dictionaries. The scheme of this productively orientated dictionary ‘was later to be applied, with minor or major variations, in the first four editions of ALD and in various rival compilations’ (Cowie 1999: 37). As can be seen in Table 4.5.2, however, the ‘descriptively powerful’ model was essentially ‘difficult to learn’ (Rundell 2006b: 740). Improving upon OALD’s, LDOCE1 adopted a more systematic8 and thus ‘user-friendly’ coding system. With this system not winning it the approval of users (p. F),9 LDOCE2 turned sharply to transparency by abandoning the codes for grammatical patterns. Rundell (2006b: 741) summarizes the recent development as follows: ‘More recently, the emphasis has shifted toward a simpler, surface-grammar model which – while sacrificing some of the delicacy of earlier systems – assumes very little grammatical knowledge on the part of users’. This shift, taking place markedly in the mid-1990s, manifested itself strikingly in LDOCE3 (see Table 4.5.2). The second edition was the first among EFL dictionaries to place a grammatical pattern in front of the example for instant recognition.10 To ensure understanding, the third edition went for partial spell-out, which is closely associated with the productively geared LLA1’s ‘propositional forms’ (p. F10). Rundell (2006b: 741) also offers the following observation: ‘The economy of the older systems allows them to encode every possible pattern for a given meaning, regardless of its frequency . . . The current approach . . . 
emphasizes what is typical over what is possible’ (emphasis original).11 This trend is related to the availability of corpus data to identify what is typical and, inevitably, to space limitations. However, if the EFL dictionary is truly to help users in their productive activities as well, the dictionary should present much more than what is typical.
3.3 Signposts12 and Menus
As an EFL dictionary consists of ‘the sheer mass of condensed target language text in monolingual entries’, Scholfield (1996) points out as a long-standing, major challenge that the user has to meet the task of ‘wading through this picking out the numbered definitions and checking each one to find the right one’. Publishers responded to this problem with signposts and menus from 1995 onwards. The relevant dictionaries and editions are as follows (Table 4.5.3). With emphasis on ‘Fast access’ (Introduction, xi), LDOCE3 offers menus and signposts. The guide to the dictionary explains how an entry is organized and how the menu and signposts help reference:
In some of the longer entries, meanings that are closely related to each other are grouped together in ‘paragraphs’, or sections in the entry. A menu at the beginning of the entry tells you the paragraph headings, so that you can easily find the section that contains the sense that you want. All these senses begin on new lines, and they have signposts where these are helpful. (LDOCE3, xvii)
The menu for shoot looks as follows:
① GUN/WEAPONS ② SPORT ③ SPEAK/TALK/ASK ④ QUICK/SUDDEN ⑤ OTHER MEANINGS
Each item on the menu is repeated at the beginning of an entry. The first two subsume the following signposts and a phrase13:
① GUN/WEAPONS
1 ▶KILL/INJURE◀
2 ▶FIRE A GUN◀
3 ▶BIRDS/ANIMALS◀
② SPORT
4 [No signpost]
5 shoot pool/billiards etc

Table 4.5.3 EFL dictionaries adopting signposts and menus
Guide words     Short cuts      Signposts & Menus   Menus
CIDE (1995)     OALD6 (2000)    LDOCE3 (1995)       MED1 (2002)
CALD2 (2005)    OALD7 (2005)    LDOCE4 (2003)       MED2 (2006)
CALD3 (2008)    OALD8 (2010)    LDOCE5 (2008)
The signpost ‘may be a synonym, a short definition, or the typical subject or object of a verb’ (LDOCE3, ‘Guide to the Dictionary’, xvii). In view of users having difficulty navigating long entries, CIDE drastically changed the macrostructure – building each entry around one core meaning instead of cluttering entries with numbered senses. The guide word, provided after the headword (e.g. bear ANIMAL and bear CARRY ), helps users to distinguish between senses of the same word. MED1 (2002) only makes use of menus. Entries with five or more senses are provided with a menu at the top (‘Using your Dictionary’, xi). The one for shoot looks like this: 1 fire gun 2 in sports 3 move suddenly & quickly
4 take photographs etc. 5 put drug in body + PHRASES
Influenced by English-Japanese dictionaries, editor-in-chief Michael Rundell (personal communication) opted for menus for the following reasons: with the information all at the top of the entry, it is easier to see the full picture; since the layout of the menus usually allows lexicographers a little more space than is available for signposts, the clues for users are a little more likely to be helpful. The signpost is a welcome feature (Ichikawa et al. 2005: 28–9) and its effect is empirically supported. Those dictionaries with signposts (LDOCE3 and CIDE) are conducive to better and quicker reference than those without (COBUILD2 and OALD5) (Bogaards 1998: 560). However, the signpost is not free from problems. Yamada (2010: 154–6) identifies three. First, while offering practical help, signposts lack system and consistency. Akasu et al. (1996: 38) question the obscure selection process of CIDE’s guide words and the occasional mismatches in part of speech between the heading and the guide word. Urata et al. (1999: 78–9) identify six categories for LDOCE3’s signposts: synonyms; short definitions; hypernyms; typical subjects; typical objects; context, purpose etc. The second problem is related to this miscellany, which may make it difficult for users to establish a systematic search rhythm (Yamada 2010: 155). This is aggravated by the mixing of signposts with phrases by the dictionary (see Note 13). The last problem is that some signposts are redundant. Urata et al. (1999: 78) observe that they just repeat part of the definition or summarize the definition, with reference to LDOCE3 (e.g. stir 3 ▶MOVE SLIGHTLY◀ . . . b) to move slightly). This cannot be said to be an efficient use of space.
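The MED1 rule that entries with five or more senses receive a menu can be illustrated in a few lines. This is only a sketch under stated assumptions: the function build_menu and its threshold parameter are hypothetical names, the sense labels are those quoted above for shoot, and the ‘+ PHRASES’ item that closes the printed menu is left out.

```python
def build_menu(sense_labels, threshold=5):
    """Return numbered menu lines, or an empty list for entries that are too short.

    Assumption: a menu is produced only when the entry has at least
    `threshold` senses, following the rule reported for MED1.
    """
    if len(sense_labels) < threshold:
        return []
    return [f"{number} {label}" for number, label in enumerate(sense_labels, start=1)]


# Sense labels quoted in the chapter from MED1's menu for 'shoot'.
shoot_labels = [
    "fire gun",
    "in sports",
    "move suddenly & quickly",
    "take photographs etc.",
    "put drug in body",
]

for line in build_menu(shoot_labels):
    print(line)
```

Because the menu is generated from one list of short labels, it avoids the miscellany criticized in signposts only if the labels themselves are written to a single pattern; the code cannot supply that consistency.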
3.4 Defining Vocabulary Dictionary definition has to observe this basic rule: a concept whose content has a certain complexity should be described in a dictionary by means of other less complex concepts (Svensén 1993: 135). In addition, an EFL dictionary is faced with the demanding task of explaining the meaning of the user’s L2 word in the L2 in an accessible way for foreign students. Finding the answer in a defining vocabulary, LDOCE1 (1978) developed the Longman Defining Vocabulary. It is based on West’s General Service List of English Words and other lists and sources (p. ix). The vocabulary is listed in the back matter: ‘List of words used in the dictionary’ (1283–8). Those words outside the defining vocabulary were given in capitals in definitions, so that the user can check them.14 Summers (LDOCE2, F8) reports, on the basis of their international user research: the use of the 2,000 word Longman Defining Vocabulary is the single most helpful feature. Jackson (2002: 130) counts this feature as ‘the most significant’ among a number of the improvements and innovations introduced by the dictionary. The idea of a defining vocabulary was copied by other dictionaries (CIDE [1995], OALD6 [2000], MED1 [2002], and their ensuing editions) and their own versions were used. It is true that a defining vocabulary significantly contributes to lowering the user’s psychological barrier to confronting all L2 dictionary texts and practically eliminates many inconveniences of having to go for a second semantic search resulting from the first. In fact, Herbst (1986) reports that students rated LDOCE1 as the most comprehensible. Certainly, the definitions of LDOCE2 often come across as more approachable than those of OALD4, which involves difficult vocabulary items without being restricted by a defining vocabulary (compare the entries for dare, for example). 
Svensén (1993: 137) points out that a defining vocabulary allows a systematic description of meaning, benefiting both users and lexicographers. The use of a defining vocabulary has the advantage that one can verify that a concept with a complex content is in fact being defined by means of less complex concepts. One can also define related concepts more consistently: the user can be sure that the concept x is always represented in definitions by the word y, and conversely that the concept x is meant whenever the word y is used. Also, Quirk reports that a restriction in definition language actually breathed new life into semantic analysis: the strict use of the defining vocabulary has in many cases resulted in a fresh and revealing semantic analysis (LDOCE1, Preface, vii). On the other hand, several shortcomings are suggested for both dictionary users and makers. Kawamura (2009: 87–9) identifies six difficulties:
(1) Inclusion of lexical items beyond the expected proficiency of EFL dictionary users
(2) Lengthy definitions
(3) Unnatural definitions15
(4) Senses to be used are not controlled16
(5) Actual size greater than advertised17
(6) Actual use of defining vocabulary is unclear18
Importantly, a problem of accuracy is raised (Fox 1989: 155, Allen 1996: 47, etc.). As an example, Svensén (1993: 137) cites the definition of cataract from LDOCE2: ‘a diseased growth on the eye causing a gradual loss of sight’. Since LDOCE1, efforts have been made not to detract from lucidity in the execution of the defining vocabulary: a rigorous set of principles was established to ensure that only the most ‘central’ meanings of these 2,000 words, and only easily understood derivatives, were used (p. ix). On the other hand, Rundell (1998: 319) describes the nature of the defining vocabulary and those who work within it: ‘Inevitably, the high-frequency words that make up any DV list are often highly polysemous, and lexicographers have not always resisted the temptation to use such words in non-central or (worse) idiomatic meanings.’ Fox sees a defining vocabulary as putting ‘“arbitrary” constraints on lexicographers’ freedom to define’ (Fox [1989: 155] quoted in Rundell [1998: 319]). Béjoint (1994: 69) argues that ‘the use of a restricted vocabulary blocks the chain of definition’ and that ‘[t]his is clearly a case of conflict between the dictionary as a quick reference tool and as an instrument for self-teaching.’
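The practice of printing words outside the defining vocabulary in capitals can be made concrete with a short check. What follows is a rough sketch only: the function mark_outside_vocabulary is a hypothetical name, the word list is a toy stand-in and not the Longman Defining Vocabulary, and the sample definition is invented for the illustration. The check matches word forms exactly, so it ignores the ‘easily understood derivatives’ that the real scheme admits, and it cannot tell whether a listed word is used in a central sense.

```python
import re


def mark_outside_vocabulary(definition, defining_vocabulary):
    """Capitalise every word of `definition` that is not in `defining_vocabulary`.

    Assumption: membership is tested on the lower-cased surface form only;
    inflections, derivatives and word senses are not considered.
    """
    def mark(match):
        word = match.group(0)
        return word if word.lower() in defining_vocabulary else word.upper()

    return re.sub(r"[A-Za-z]+(?:-[A-Za-z]+)*", mark, definition)


# Toy vocabulary and invented definition, for illustration only.
toy_vocabulary = {"a", "small", "boat", "moved", "by"}
print(mark_outside_vocabulary("a small boat moved by oars", toy_vocabulary))
```

A surface check of this kind would pass a definition that uses a listed word in a non-central or idiomatic meaning, which is precisely the weakness Rundell describes above.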
3.5 Defining Style As in the native speaker’s dictionary, in the EFL dictionary the definition used to be made using a phrase and in a form substitutable for the definiendum. The selectional restrictions and possible objects, which are outside the semantic scope of the definiendum, are marked off by parentheses. When the usage is provided (rather than a definition), it is also given in parentheses, as often the case with functional words: 4 (that is) part in relation to (a whole or all) a (after expressions of quantity): 2 pounds of sugar|2 miles of bad road|much of the night . . . (s.v. ‘of’ in LDOCE1) 17 (after nouns related to verbs): a lover of music (=someone who loves . . .). (ibid.) It cannot be denied, however, that an effort to pack much information into a limited space involved special grammar and dictionary-ese and produced some difficult-to-understand definitions.
In response, the FSD, initiated by COBUILD1, has spread among EFL dictionaries.19 Sounding as if the teacher is talking to students in the classroom setting, the definition comes across as approachable. Basically, no prior knowledge or special training is required to comprehend FSDs. Another plus is that the FSD is informative, essentially providing contextual information together with semantic information. The FSD partly takes on the role assumed by the example. Also the FSD can incorporate extra information with a degree of flexibility not allowed for by the traditional phrase definition. This innovative defining method meets with several criticisms. A typical FSD comprises two clauses: the subordinate clause, often beginning with if, and the main clause. The if-clause tries to describe the context in which the definiendum occurs and the main clause deals with the meaning.20 The FSD works very well for syntactically simple items with a distinct meaning (e.g. ‘familiar to’ and ‘familiar with’ at Senses 1 and 2 of familiar). However, this approach involves several problems. Rundell (2006a: 330–1) speaks of ‘overspecification’ as an intrinsic weakness: ‘the requirement of specifying lexical and syntactic environments often leads to defining statements which appear to exclude a wide range of completely regular behaviours’ (ibid. 331). He observes that COBUILD3’s definition of cheat incorporates only the pattern of ‘cheat someone out of something’ but excludes ‘cheat someone of something’ and ‘cheat someone’, which are equally frequent. Rundell also discusses the problem of ‘increased complexity’ of the supposedly accessible defining style. He warns of the pitfall of ‘go[ing] from the frying pan of unpacking a dense, formulaic definition to the fire of processing something two or three times longer’, citing the definition of retreat 5.4 in COBUILD1 (2006: 328). 
The natural prose form, which may be good for understanding, can have detrimental effects on actual consultation – quick reference and information retrieval. While the traditional phrase definition concentrated on meaning, giving it in the substitutable form for the definiendum, the FSD attempts to incorporate not only semantic but also contextual and other information without discrimination. Compare the following pairs of definitions of erudite and gastric:
erudite . . . adj fml (of a person or book) full of learning; SCHOLARLY. (LDOCE2)
If you describe someone as erudite, you mean that they have or show great academic knowledge. You can also use erudite to describe something such as a book or a style of writing; a formal word. (COBUILD2, emphasis added)
gastric . . . adj [attrib] (medical) of the stomach. (OALD4)
You use gastric to describe processes, pain, or illness that occur in someone’s stomach; a medical term. (COBUILD2, emphasis added)
While LDOCE2’s definition of erudite marks off the selectional restriction with the use of parentheses from the rest of the text that deals with the meaning of the headword, that of COBUILD2 does not.21 Furthermore, in this dictionary’s definition of gastric the semantic information is deferred until the end. There is a danger that those users seeking semantic information only are distracted and put off retrieving the relevant information. Another problem with the prose style is that the FSD blurs the boundary between metalanguage and language, inflicting the distinction onto the users. For example, look at the entry of truncated in COBUILD2. For non-native users, there is no knowing that ‘a truncated version of . . .’ can be used in their actual production of English until they get to the first example that includes the phrase:22 A truncated version of something is one that has been shortened. The review body has produced a truncated version of its annual report.
3.6 Examples
The value of examples in EFL dictionaries is enormous. They help users in their reception and production of English texts and also in the consultation of the dictionary. Roughly, there are two types of examples: made-up and corpus-derived. The traditional made-up examples are invented by lexicographers to suit particular dictionary purposes. The examples are conveniently tailored to be succinct, multipurpose, contrastive, and self-contained. Fox, then at Cobuild, doubts the native speaker’s ability to produce natural examples. Citing ‘. . . saluted his friend with a wave of his hand’ as a slightly contrived example to illustrate salute in the sense of ‘greet someone’, she argues that ‘we cannot trust native speakers to invent sentences except in a proper communicative context’ (Fox 1987: 143–4). Cobuild takes a distinctive approach, with almost all examples directly taken from the corpus.23 Sinclair asserts that ‘usage cannot be invented, it can only be recorded’ (COBUILD1, xv). He dismisses invented examples as follows: ‘invented examples are really part of the explanations . . . They give no reliable guide to composition in English and would be very misleading if applied to that task. They do not say “This is how the word is used”’ (ibid.). Fox challenges the self-containedness and excessive informativeness of an invented example. The Cobuild editor takes a global view of examples:
The necessity of examples to fit into coherent text is important because language is not a series of isolated sentences, and students should not be encouraged to think that it is. We should be much more aware than we have been in the past of the pitfalls of giving these fully-formed isolated sentences as examples. (Fox 1987: 141)
She warns of a possible danger of providing students with model sentences that do not fit naturally into a flow of actual discourse by offering invented examples, grammatically well formed and with far too much information (ibid. 141–2). However, COBUILD’s authentic, lexically challenging, open-ended examples in turn come under criticism. Hausmann and Gorbahn (1989: 45) criticize the excessively corpus-dependent examples as ‘not didactically oriented’ on the following grounds:
(a) strange and demanding, read out of context
(b) distracting (complex)
(c) idiosyncratic
(d) distracting (abstract and lengthy)24
(e) circulatory
(f) not informative
(g) dangerous (leading to the production of unnatural English)
Pointing out that it is not a matter of ‘a simple choice between the authentic and the invented’, Rundell (1998: 334–5) goes on to argue in favour of corpora as primary sources of examples: ‘Most lexicographers would probably now agree that, where the corpus provides natural and typical examples that clearly illustrate the points that need to be made, there is no conceivable reason for not using them.’ In fact, since COBUILD1, corpus-basis has become the accepted norm among EFL dictionaries, with varying degrees of editing.
4 Electronic Possibilities
This medium can solve many of the problems raised so far and others experienced by print dictionaries and can provide new dimensions for dictionary presentation and consultation. The electronic medium virtually removes space constraints. Much more information can be accommodated, for example, glosses for currently undifferentiated synonyms and set phrases, examples for unillustrated entries, etc. However, it has to be remembered that the dictionary remains a tool for quick reference. Information should be hierarchically arranged or options should be incorporated with a quick shift from one mode to another at the user’s will, for example, from the sense arrangement based on sense development to that
based on frequency, from the substitutable phrase definition to the FSD, from the definition within a defining vocabulary to one without,25 etc. Flexible presentation can be built in to help consultation. It will be possible, for example, to type commands to facilitate quick reference, so that the relevant parts of the definition will be highlighted according to the look-up strategies adopted (e.g. selectional restrictions, collocations, etc.). If the miscellany of signposts prevents the user from getting into a consistent look-up rhythm, the signposts can be eliminated at the press of a button or be changed to the user’s L1.26 For the benefits of both users and lexicographers, Yamada (2011) suggests an optional entry layout for the electronic dictionary: (1) breaking a definition into meaningful units; (2) diagrammatically showing how they relate to each other; and (3) putting examples to the corresponding semantic units. For example, LDOCE5’s entry of mitigate can be dismembered and represented as below: to make a situation or the effects of something less unpleasant, harmful, or serious . . . : Measures need to be taken to mitigate the environmental effects of burning more coal. ↓ to make a situation less unpleasant less harmful less serious to make the effects of something less unpleasant less harmful less serious Measures need to be taken to mitigate the environmental effects of burning more coal. Though the layout may need refinement, the user can readily understand the structure of the definition and which part of the definition is illustrated by the example (and the exact part of the definition to be applied to the understanding of the meaning of mitigate in the example). Also, the layout helps the lexicographer to double check the wording of a definition and the suitability of examples. In the electronic format, it is ideal for each semantic component to be illustrated.
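The first step of the layout proposed above, breaking a definition into meaningful units, amounts to enumerating every combination of the alternatives that the definition offers. The sketch below shows one way to do this for the mitigate example; it is an illustration under assumptions and not Yamada's (2011) procedure. The slot structure has been segmented by hand from the LDOCE5 wording quoted above, and the names expand_definition and mitigate_slots are hypothetical.

```python
from itertools import product


def expand_definition(slots):
    """Return every semantic unit obtained by picking one alternative per slot."""
    return [" ".join(choice) for choice in product(*slots)]


# Hand-segmented slots for the definition of 'mitigate' quoted in the chapter.
mitigate_slots = [
    ["to make"],
    ["a situation", "the effects of something"],
    ["less unpleasant", "less harmful", "less serious"],
]

units = expand_definition(mitigate_slots)
for unit in units:
    print(unit)
```

The six units produced correspond to the branches in the diagram above. Attaching the example about burning more coal to a specific unit is a matter for the lexicographer's judgement, which the code does not attempt.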
5 Conclusion
Generally, EFL dictionaries can be said to have done a good job of reconciling irreconcilables by ‘striking a balance between several conflicting requirements’
and under ‘commercial pressures’ (Cowie 1990: 676). The dictionaries have had two important wings that can dramatically accelerate their advance: corpora and the electronic medium. Whereas corpora have been put to good use, the new medium has not been fully exploited yet. It has the potential for flexibly and harmoniously solving many difficulties and incompatibilities faced by the print dictionary, for the first time in history. Much more information should be added, but has to be edited and presented systematically up to the standards applied to the print dictionary. The building of a large-scale electronic dictionary with options and customization built in will require substantial resources. But corpus data and technology are already in place. With the electronic potential being fulfilled, variety should return to the EFL dictionary scene.
Notes 1. The author would like to express his gratitude to Professor Kazuo Dohi for his valuable comments in the preparation of the manuscript. 2. For the abbreviations of dictionaries, please refer to ‘Dictionaries Cited and Their Abbreviations’ in the Reference. 3. The unique one-entry-per-one-core-meaning structure of the dictionary also necessitated the Phrase Index. 4. The following EFL dictionaries are often included in the models for college students and above: LDOCE5, LLA2, OALD8, Oxford Collocations Dictionary for Students of English (2/e, 2009), and Oxford Learner’s Wordfinder Dictionary (1997), etc. 5. SII’s SR-E10000 (2005) comes with Wordbank (5 million-word database, based on the Bank of English) and SR-G10001 (2009) with Oxford Sentence Dictionary, consisting of 1 million examples from the Oxford English Corpus (2 billion words) (Sekiyama 2010). 6. This benefits the user’s receptive needs – entries and senses are organized in order of the likelihood with which they encounter them in reading. However, Yamada (2010: 156–7) points out the inconsistency brought about by frequency-based sense arrangement – it may or may not correspond to the sense development of a word. Worse, there are cases where only the frequent figurative sense is entered without the not-frequent-enough original sense being included, when the latter will help a user understand and memorize the former (compare OALD4 and LDOCE3 for the treatment of linchpin). Ultimately, it is necessary to indicate the reason for non-entry: low frequency or non-existence. 7. This dictionary indicates the pattern of ‘VERB × DIRECT OBJECT × “TO” × INFINITIVE’ as ‘V.P. 17.’ 8. Cowie (1990: 688) praises the coding system as ‘impressively systematic’ because the system consistently assigns ‘3’ to an infinitive construction, for example. 9. 
Béjoint (1981: 16, 19) concludes that grammar codes are underutilized, discovering that a disappointing 55 per cent of the university students under survey did not use the codes at all. 10. Lighthouse English-Japanese Dictionary (1/e, 1984) had already given a sentence pattern to the example. The pattern followed the example.
11. The comments on OALD4’s grammar codes by Takahashi et al. (1992: 198) are important to note: ‘The new codes, which are far easier to remember, not only show how each of the patterns is composed, but also distinguish between sentences which superficially look the same.’ 12. Longman uses ‘signposts’, Cambridge ‘guide words’, and Oxford ‘short cuts’. In this chapter, ‘signpost’ is used as an umbrella term. 13. Herbst (1996: 350) criticizes this arrangement as ‘inconsistent (and aesthetically disturbing)’ because the signposts (bold in capitals, sandwiched with black triangles) are mixed with the phrases (heading in bold) at the beginning of each sub-entry. In response, the subsequent editions introduced colour printing: signposts highlighted in blue (4/e, 2003) and in white against the blue background (5/e, 2008). 14. In LDOCE1, all examples are written in its defining vocabulary (p. ix). 15. Hanks (1987: 119) states COBUILD1’s stance: ‘No attempt was made to set up a “restricted defining vocabulary” of a fixed number of words. Such vocabularies are a potential source of distortion, especially if they are not accompanied by equally strict controls on the meanings of each word used and the syntactic structures in which they are used.’ 16. Since OALD7, the words in the Oxford 3,000 have been controlled in terms of their senses. A word used in a less frequent sense is capitalized and its sense is identified (Komuro et al. 2006: 82–3, Yamada et al. 2012: 21). 17. See Higashi et al. (1979); Jansen et al. (1987). 18. The widespread rule of printing the words outside the defining vocabulary in small capitals is not strictly observed. For example, LDOCE3 makes proper nouns exceptions (p. B12) and OALD5 colour terms (p. 1417). The former also does not use small capitals for the words whose entry and definition are very close by (p. B12). 19. Only the COBUILD series of dictionaries employs the FSD universally. 20. 
This should be taught first. If students try to understand the if-definition by translating it into their L1, the subordinate clause will fail them. 21. While Hanks (1987: 116) criticizes the use of parentheses to indicate selectional restrictions in the conventional definition, the user survey conducted by Wehmeier (2000) reveals that users actually are not bothered by parentheses. 22. The repetition of the phrase is also a problem. 23. ‘Very minor changes’ have been made to citations ‘in order to remove unnecessary distracting information’; ‘Only on very rare occasions have we composed an example because there is no suitable one in the corpus’ (COBUILD1, xv). 24. Hausmann and Gorbahn (ibid.) cite the following example to support this point: To have access to the truth and so to pass beyond the region of mere opinion is to take great risks (s.v. ‘region 3’). 25. It is high time the defining vocabulary was abolished and that users were exposed to natural English, reading dictionary definitions (Yamada 2010: 162). The electronic dictionary is equipped with a function of double-clicking an unknown word to check its meaning. Japan’s handheld electronic dictionary offers a similar instant check with an included English-Japanese dictionary. (As for its advantages in the print age) not only does the defining vocabulary severely reduce the lexicographer’s defining power but can also give the user inappropriate input. Svensén (1993: 137–8) points out that the use of a defining vocabulary can produce the kind of definition that few teachers want their students to emulate (e.g. malnutrition and manic depressive in LDOCE2). 26. The L1 signposts are called onto the screen by choice. Since they are all semantic and consistent, they will not get in the way of establishing a search rhythm. Bilingualization of varying kinds will be a key in the development of EFL dictionaries towards customization.
References Dictionaries Cited and Their Abbreviations The Advanced Learner’s Dictionary of Current English, 2nd edition (ALD2) (1963) Edited by Hornby, A. S., Gatenby, E. V. and Wakefield, H. London: Oxford University Press. Cambridge International Dictionary of English (CIDE) (1995) Edited by Procter, Paul. Cambridge: Cambridge University Press. Cambridge Advanced Learner’s Dictionary, 2nd edition (CALD2) (2005) Edited by Walter, Elizabeth. Cambridge: Cambridge University Press. Cambridge Advanced Learner’s Dictionary, 3rd edition (CALD3) (2008) Edited by Walter, Elizabeth. Cambridge: Cambridge University Press. Chamber’s Universal Learners’ Dictionary, International Students’ Edition (CULD) (1980) Edited by Kirkpatrick, M. E. Edinburgh: Chambers. Collins COBUILD Advanced Learner’s English Dictionary, 4th edition (COBUILD4) (2003) Edited by Sinclair, John. Glasgow: HarperCollins. Collins COBUILD Advanced Learner’s English Dictionary, 5th edition (COBUILD5) (2006) Edited by Sinclair, John. Glasgow: HarperCollins. Collins COBUILD Advanced Dictionary of English, 6th edition (COBUILD6) (2009) Edited by Sinclair, John. Glasgow: HarperCollins/Boston: Heinle Cengage Learning. Collins COBUILD English Dictionary, 2nd edition (COBUILD2) (1995) Edited by Sinclair, John. London: HarperCollins. Collins COBUILD English Dictionary for Advanced Learners, 3rd edition (COBUILD3) (2000) Edited by Sinclair, John. London: HarperCollins. Collins COBUILD English Language Dictionary, 1st edition (COBUILD1) (1987) Edited by Sinclair, John. London and Glasgow: Collins. A Grammar of English Words (GEW) (1938) Edited by Palmer, Harold, E. London and Harlow: Longmans Green. Idiomatic and Syntactic English Dictionary (ISED) (1942) Edited by Hornby, A. S., Gatenby, E. V. and Wakefield, H. Tokyo: Kaitakusha. Lighthouse English-Japanese Dictionary, 1st edition (1984) Edited by Takebayashi, Shigeru and Kojima, Yoshiro. Tokyo: Kenkyusha. 
Longman Dictionary of Contemporary English, 1st edition (LDOCE1) (1978) Edited by Procter, Paul. Harlow: Longman.
Longman Dictionary of Contemporary English, 2nd edition (LDOCE2) (1987) Edited by Summers, Della. Harlow: Longman.
Longman Dictionary of Contemporary English, 3rd edition (LDOCE3) (1995) Edited by Summers, Della. Harlow: Longman.
Longman Dictionary of Contemporary English, 4th edition (LDOCE4) (2003) Edited by Summers, Della. Harlow: Pearson Education.
Longman Dictionary of Contemporary English, 5th edition (LDOCE5) (2009) Edited by Mayor, Michael. Harlow: Pearson Education.
Longman Language Activator, 1st edition (LLA1) (1993) Edited by Summers, Della. Harlow: Pearson Education.
Macmillan English Dictionary, 1st edition (MED1) (2002) Edited by Rundell, Michael. Oxford: Macmillan.
Macmillan English Dictionary, 2nd edition (MED2) (2006) Edited by Rundell, Michael. Oxford: Macmillan.
Merriam-Webster’s Advanced Learner’s English Dictionary (MWALED) (2008) Edited by Perrault, Stephen J. Springfield: Merriam-Webster.
The New Method English Dictionary (NMED) (1935) Edited by West, Michael P. and Endicott, James G. London: Longmans Green.
Oxford Advanced Learner’s Dictionary of Current English, 3rd edition (OALD3) (1974) Edited by Hornby, A. S. Oxford: Oxford University Press.
Oxford Advanced Learner’s Dictionary of Current English, 4th edition (OALD4) (1989) Edited by Cowie, A. P. Oxford: Oxford University Press.
Oxford Advanced Learner’s Dictionary of Current English, 5th edition (OALD5) (1995) Edited by Crowther, Jonathan. Oxford: Oxford University Press.
Oxford Advanced Learner’s Dictionary of Current English, 6th edition (OALD6) (2000) Edited by Wehmeier, Sally. Oxford: Oxford University Press.
Oxford Advanced Learner’s Dictionary of Current English, 7th edition (OALD7) (2005) Edited by Wehmeier, Sally. Oxford: Oxford University Press.
Oxford Advanced Learner’s Dictionary of Current English, 8th edition (OALD8) (2010) Edited by Turnbull, Joanna. Oxford: Oxford University Press.
Other References
Akasu, K. et al. (1996) An Analysis of Cambridge International Dictionary of English. Lexicon 26. Tokyo: Iwasaki Linguistic Circle, 3–76.
—(2001) An Analysis of the Oxford Advanced Learner’s Dictionary of Current English, 6th edition. Lexicon 31. Tokyo: Iwasaki Linguistic Circle, 1–51.
Allen, R. (1996) The year of the dictionaries. English Today 46. Vol. 12/2, 41–7.
Béjoint, H. (1981) The foreign student’s use of monolingual English dictionaries: a study of language needs and reference skills. Applied Linguistics 2, 207–22.
—(1994) Tradition and Innovation in Modern English Dictionaries. Oxford: Clarendon Press.
Bogaards, P. (1996) Dictionaries for learners of English. International Journal of Lexicography 9/4, 277–320.
—(1998) Scanning long entries in learner’s dictionaries. In: T. Fontenelle et al. (eds) Actes EURALEX ’98 Proceedings, Vol. II. Liège: English and Dutch Department, University of Liège, 555–63.
Cowie, A. P. (1981) Lexicography and its pedagogic applications: an introduction. Applied Linguistics 2, 203–6.
—(1990) Language as words: lexicography. In: N. E. Collinge (ed.) An Encyclopedia of Language. London and New York: Routledge, chapter 19, 671–700.
—(1999) English Dictionaries for Foreign Learners: A History. Oxford: Oxford University Press.
Fox, G. (1987) The case for examples. In: J. McH. Sinclair (ed.), chapter 7, 137–49.
—(1989) A vocabulary for writing dictionaries. In: M. L. Tickoo (ed.) Learners’ Dictionaries: State of the Art. Singapore: SEAMEO Regional Language Centre, 153–71.
Hanks, P. (1987) Definitions and explanations. In: J. McH. Sinclair (ed.), chapter 6, 116–36.
Hartmann, R. R. K. (2001) Teaching and Researching Lexicography. Harlow: Pearson Education.
Hausmann, F. J. and Gorbahn, A. (1989) COBUILD and LDOCE II: a comparative review. International Journal of Lexicography 2/1, 44–56.
Herbst, T. (1986) Defining with a controlled defining vocabulary in foreign learners’ dictionaries. Lexicographica 2, 101–19.
—(1996) On the way to the perfect learners’ dictionary: a first comparison of OALD5, LDOCE3, COBUILD2 and CIDE. International Journal of Lexicography 9/4, 321–57.
Higashi, N. et al. (1979) An analysis of Longman Dictionary of Contemporary English (Part 1). Lexicon 8. Iwasaki Linguistic Circle, 45–101.
Ichikawa, Y. et al. (2005) An analysis of Longman Dictionary of Contemporary English, Fourth Edition. Lexicon 35. Tokyo: Iwasaki Linguistic Circle, 1–126.
Motomichi, I. (1997) Palmer to Nihon no Eigokyoiku [Harold E. Palmer and Teaching English in Japan]. Tokyo: Taishukan.
Jackson, H. (2002) Lexicography: An Introduction. London: Routledge.
Jansen, A. et al. (1987) Controlling LDOCE’s controlled vocabulary. In: A. P. Cowie (ed.) The Dictionary and the Language Learner (Lexicographica. Series Maior 17). Tübingen: Max Niemeyer, 78–94.
Kawamura, A. (2009) Teigigoi Saiko [The defining vocabulary revisited]. Shakai Inobeshon Kenkyu 4/1. Faculty of Social Innovation, Seijo University, 87–98.
Koike, I. et al. (eds) (2003) Oyo Gengogaku Jiten [Kenkyusha Dictionary of Applied Linguistics]. Tokyo: Kenkyusha.
Kojima, Y. et al. (1989) An analysis of Collins COBUILD English Language Dictionary. Lexicon 18. Tokyo: Iwasaki Linguistic Circle, 39–158.
Komuro, Y. et al. (2006) An analysis of the Oxford Advanced Learner’s Dictionary of Current English, 7th edition, with Special Reference to the CD-ROM. Lexicon 36. Tokyo: Iwasaki Linguistic Circle, 55–146.
McArthur, T. (ed.) (1992) The Oxford Companion to the English Language. Oxford: Oxford University Press.
Naganuma, K. (1978) The history of the Advanced Learner’s Dictionary: A. S. Hornby, ISED, and Kaitakusha, Tokyo. In: Peter Strevens (ed.) In Honour of A. S. Hornby. Oxford: Oxford University Press, 11–13.
Rundell, M. (1998) Recent trends in pedagogical lexicography. International Journal of Lexicography 11/4, 315–42.
—(2006a) More than one way to skin a cat: why full sentence definitions have not been universally adopted. In: Elisa Corino et al. (eds) Proceedings of the XII Euralex International Congress, 2006. Torino: Edizioni dell’Orso, 323–38.
—(2006b) Learners’ dictionaries. In: Keith Brown (ed.) Elsevier Encyclopedia of Language and Linguistics, 2nd edition, Vol. 6, 739–43.
Scholfield, P. (1996) Why shouldn’t monolingual dictionaries be as easy to use as bilingual ones? Longman Language Review 2.
Sekiyama, K. (2007) Jisho kara Hajimeru Eigo Gakushu [Learning English with Dictionaries]. Tokyo: Shogakkan.
—(2010) Keitai Kopasu to shite no Denshi Jisho: Rebun Kensaku kara OSD made [Hand-held Electronic Dictionaries as Portable Corpora: From Example Sentence Search to OSD (Oxford Sentence Dictionary)]. The 11th JACET English Lexicography Society Workshop, Tokyo University, Tokyo, 27 March 2010.
Sinclair, J. McH. (1987) Looking Up. London: Collins ELT.
Svensén, B. (1993) Practical Lexicography. Oxford: Oxford University Press.
Takahashi, K. et al. (1992) An analysis of Oxford Advanced Learner’s Dictionary of Current English, 4th edition. Lexicon 22. Tokyo: Iwasaki Linguistic Circle, 59–200.
Takebayashi, S. et al. (1982) An analysis of Chambers Universal Learners’ Dictionary. Lexicon 11. Tokyo: Iwasaki Linguistic Circle, 30–116.
Urata, K. et al. (1999) An analysis of Longman Dictionary of Contemporary English, Third Edition. Lexicon 29. Tokyo: Iwasaki Linguistic Circle, 66–95.
Wehmeier, S. (2000) Oxford Advanced Learner’s Dictionary, Sixth Edition – continuity and change. JACET [Japan Association College English Teachers] Kyoto Seminar, Kyoto International Conference Hall, October 28–9.
Yamada, S. (2007) Problems of guide words, signposts, and short cuts in EFL dictionaries. The 5th Biennial Conference of the Asian Association for Lexicography, Chennai, India, 6–8 December.
—(2009) Dictionary use by Japanese college students of English between 1997 and 2009. The 6th Biennial Conference of the Asian Association for Lexicography, Imperial Queen’s Park Hotel, Bangkok, 20–22 August.
—(2010) EFL dictionary evolution: innovations and drawbacks. In: Ilan J. Kernerman and Paul Bogaards (eds) English Learners’ Dictionaries at the DSNA. Tel Aviv: K Dictionaries, 147–68.
—(2011) Layout matters. Paper presented at the Dictionary Society of North America XVIII Biennial Meeting, McGill University, Montreal, 8–11 June.
Yamada, S. et al. (2012) An analysis of the Oxford Advanced Learner’s Dictionary of Current English, 8th edition. Lexicon 42. Tokyo: Iwasaki Linguistic Circle, 1–67.
4.6
Issues in Compiling Bilingual Dictionaries Arleta Adamska-Sałaciak
Chapter Overview
Preliminary Considerations
Compilation
Megastructure
Microstructure
Conclusion
1 Preliminary Considerations
Many decisions taken during the compilation of bilingual dictionaries are essentially identical to those encountered in the preparation of monolingual ones. Consequently, whether the dictionary is to be organized alphabetically or thematically, published in print or electronically, available online for a fee or free of charge, are all questions that will only be mentioned in passing, the main focus of this chapter being on topics specific to the bilingual dictionary genre.
The issues that arise in the process of making a bilingual dictionary follow, in the main, from what kind of dictionary it is to be. The coverage (i.e. the number and type of items to be included) and the resulting physical size of the publication need to be decided upon, as does the depth of individual entries, that is, the amount of information offered about a single headword. At the bottom of it all lies the most basic consideration: the target audience, a factor which has a direct bearing on such aspects as the dictionary’s scope and directionality.
1.1 Target Audience, Scope and Directionality
Because one of the languages dealt with in a bilingual dictionary is always known to its users, the dictionary’s audience can be immensely varied, ranging from complete beginners to very advanced students of the other language. What any particular user wants from their dictionary may, of course, vary: some people will treat it as a language-learning aid, while others – travellers, tourists – may have no interest at all in mastering the foreign language. For the latter, the dictionary can be anything from an essential tool, ensuring survival in a foreign environment, to an optional extra used – in addition to, or instead of, a phrase book – as a gesture of politeness towards the inhabitants of the place they are visiting. It is customary to view bilingual dictionary users in terms of the two parties they naturally fall into: one native-speaker group for each of the two languages of a particular dictionary. This simple dichotomy glosses over those instances when neither of the dictionary’s object languages is a given user’s native tongue (L1). For speakers of lesser-used languages, there is often no bilingual dictionary available which would cover the foreign language they are interested in combined with their native language. They then have to settle for second best: an L3-L2 dictionary, where L3 is the foreign language they want and L2 is another foreign language, one they have mastered sufficiently well to be able to use it as an intermediary. Sadly, the needs of such users are not taken into consideration in the preparation of bilingual dictionaries, nor is it clear how they could be. Another special case is that of dictionaries of languages which no longer have any native speakers. A bilingual dictionary of Sanskrit, for instance, will typically feature only one part, with Sanskrit as the source language (SL) and a modern language, such as English, as the target language (TL). 
Such dictionaries, used mainly by scholars, frequently rely on descriptive explanations of meaning rather than TL equivalents. This is because many headwords are so deeply embedded in the ancient SL culture that translating them by means of single TL words is not possible. Whether dealing with a dead or a living language, dictionaries with only one part (Lx-Ly) are called monoscopal (Hausmann and Werner 1991). Some monoscopal dictionaries are additionally equipped with an Ly-Lx index, meant as a partial substitute for a proper Ly-Lx part in that it allows Ly-speaking users access into the Lx-Ly section via their native language. Most currently produced bilingual dictionaries are biscopal (Lx-Ly and Ly-Lx), that is, consist of two fully fledged word lists, each item in each list being accompanied by one or more equivalents in the other language. The exact contents of a biscopal dictionary for a given pair of languages may vary considerably depending on the dictionary’s directionality. A
so-called bidirectional dictionary (Hausmann and Werner 1991) is designed with native speakers of both Lx and Ly in mind, while a monodirectional one is meant for an audience consisting exclusively of speakers of either Lx or Ly. The majority of contemporary bilingual dictionaries either are or claim to be bidirectional. The motivation is largely commercial: it is cheaper to produce one dictionary which will be sold in both markets than invest in two different ones. Considered from the perspective of any single user, this is far from ideal. The necessity of addressing two language communities at once results, especially in print dictionaries, in a situation when some of the information supplied is superfluous from the point of view of one of the user groups. Thus, speakers of German do not need to be told what the gender of each German noun is or that a given German expression is very formal. The space occupied by such redundant information could be put to better use if the needs of only one language community – in this instance, that of German speakers – were being catered for. When compiling a monodirectional dictionary, by contrast, lexicographers do not have to constantly switch perspective from one user group to the other: they can maintain a consistent focus on that of the dictionary’s languages which is its intended users’ L2.1 This seems to be reason enough for demanding that, whenever possible, bilingual dictionaries should be monodirectional. Matters are additionally complicated by the fact that, more often than not, one of the two languages of a bilingual dictionary is less widely spoken and less often learnt than the other. As a result, what publishers advertise as a dictionary equally useful to both language communities may, upon closer inspection, prove to be heavily biased towards the needs of the community speaking the less popular language. 
Such a dictionary is only superficially bidirectional, for example, by virtue of giving grammatical information about headwords in both the Lx-Ly and Ly-Lx section and/or using both Lx and Ly as the metalanguage. Where semantic description is concerned, it cannot help but privilege one of the user groups (see e.g. Atkins 1985: 15). Most bilingual dictionaries with English which are produced in non-anglophone countries fit the above description: in spite of the promises on their back covers, they are directed primarily at speakers of the local language, who are likely to predominate among the users. The good news is that the number of monodirectional dictionaries published worldwide has been growing steadily in recent years. This is especially visible in the area of pedagogical lexicography, that is, among dictionaries geared specifically to the needs of less-than-advanced foreign language learners.2 Even so, it must be remembered that the coverage of such dictionaries is, as a rule, less comprehensive than that of large, academic reference works, which are still predominantly bidirectional.
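The typology introduced in this section can be restated compactly. The following Python sketch is purely illustrative and is not part of Hausmann and Werner's (1991) account: the class name, its fields and the language codes are assumptions made for the example. It treats scope as a property of the number of word lists and directionality as a property of the number of native-speaker groups addressed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BilingualDictionary:
    """Hypothetical record of a dictionary's parts and intended audience."""
    parts: frozenset      # directed language pairs, e.g. {("de", "en"), ("en", "de")}
    audiences: frozenset  # native languages of the intended users, e.g. {"de"}

    def scope(self) -> str:
        # One word list (Lx-Ly) is monoscopal; two (Lx-Ly and Ly-Lx) are biscopal.
        return "monoscopal" if len(self.parts) == 1 else "biscopal"

    def directionality(self) -> str:
        # One native-speaker group is monodirectional; two are bidirectional.
        return "monodirectional" if len(self.audiences) == 1 else "bidirectional"


# A one-part Sanskrit-English dictionary aimed at English-speaking scholars.
sanskrit = BilingualDictionary(frozenset({("sa", "en")}), frozenset({"en"}))

# A two-part German-English dictionary marketed to both language communities.
german_english = BilingualDictionary(
    frozenset({("de", "en"), ("en", "de")}), frozenset({"de", "en"})
)

print(sanskrit.scope(), sanskrit.directionality())
print(german_english.scope(), german_english.directionality())
```

The sketch deliberately records only the declared audience; as noted above, a dictionary advertised as bidirectional may in practice privilege one user group, which no such label can capture.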
1.2 Function, Organization, Medium of Presentation
A dictionary’s target audience, scope and directionality are intimately connected with its functions. Two main functions of bilingual dictionaries have been recognized: the receptive (passive, decoding) one and the productive (active, encoding) one. Traditionally, the emphasis has been on reception. This is hardly surprising, since, for any language pair we care to examine, dictionaries going from L2 to L1 will turn out to be used more readily (and, consequently, to have had a longer history) than those going in the opposite direction. The reasons seem obvious. When a person wants to understand something written or spoken in a foreign language, and there is no one to help them with the task, they have little choice but to turn to a bilingual dictionary. By contrast, when someone wants to express themselves in a foreign language, they may choose to rely on their existing verbal repertoire, thus forgoing dictionary consultation. Besides, the benefits of consulting a dictionary for decoding tend to be rather more immediate and more obvious than its effects on the user’s production. Despite this traditional dominance of reception, much more attention is being paid to the productive function in twenty-first-century bilingual lexicography than used to be the case a mere few decades ago. This is a welcome development, since it is precisely in this area that (biscopal) bilingual dictionaries hold an undisputable advantage over monolingual ones: thanks to being equipped with an L1-L2 part, they enable users to access L2 lexical resources through what those users already know, i.e. through the lexical items of their L1. Even the most production-oriented monolingual learner’s dictionary cannot compete with that.3 The organization of bilingual dictionaries is predominantly semasiological, that is, based on the headword’s (written) form. 
Accordingly, for languages with alphabetical writing systems, the arrangement of entries follows the order of the alphabet. Each entry is organized around a particular lexical item (the headword), providing the meanings (senses) which that lexical item can express. This is the kind of reference work one normally has in mind when talking about bilingual dictionaries. In the much rarer onomasiological (thematic, ideological) bilingual dictionaries, the word list is arranged according to topics, and each entry is built around a concept, listing the lexical items through which that concept can be expressed. Onomasiological dictionaries are usually monoscopal and cover only a limited subset of the SL lexicon, such as the vocabulary specific to a particular domain of knowledge. They are also primarily concerned with the headwords’ semantics, to the exclusion of aspects such as grammar or pronunciation. There is a certain parallel between an onomasiological dictionary, whether mono or bilingual, and a semasiological L1-L2 bilingual dictionary: both start from the familiar (respectively, a general concept and a word in the user’s native language)
and proceed towards the less familiar (a specific word the user wants, but either cannot recall or does not know), thus serving primarily an encoding function. Depending on the medium of presentation, bilingual dictionaries are divided into paper (print) and electronic ones. As the properties of electronic dictionaries are discussed in detail in Chapters 5.1 and 5.3, the present contribution deals mainly with medium-independent features. Otherwise, in the absence of indications to the contrary, when talking about a bilingual dictionary we mean a printed book.
2 Compilation
2.1 Human Resources
Whether a dictionary succeeds in meeting the expectations of its target users depends, to a large extent, on the skill of the lexicographer(s) who have compiled it. Most quality bilingual dictionaries are prepared by teams of people rather than single-handedly. When perfect bilinguals are not available (which is in most cases), the minimum requirement is that the prospective lexicographer should be a native speaker of one of the dictionary’s languages and have a near-native command of the other. Some project managers insist that the lexicographers and editors working on a dictionary’s Lx-Ly section should be native speakers of Ly (and vice versa), the assumption being that TL equivalents are easier to come by when one is translating into, rather than from, one’s native language. As might be expected, it is not always possible to fulfil this last requirement, either. One variable that is impossible to control is a candidate’s predispositions for the job. As with translating skills, a person’s talent for lexicography often has little to do with whether they hold a degree in linguistics or another language-related discipline (although being a trained linguist obviously cannot hurt). The most desirable qualification, that is, a degree in lexicography, is not a realistic expectation, since few academic institutions offer courses in lexicography (see Chapter 6), and those that do rarely teach actual dictionary-making. This being so, the best indicator of a candidate’s suitability for the job is how well they can execute the sample lexicographic tasks they have been assigned in the course of their preparatory training (which is a necessary stage preceding any respectable dictionary project).
2.2 Data Sources and Methods
Whatever the exact procedure followed in the compilation of a bilingual dictionary, it is certain that existing dictionaries (bilingual Lx-Ly and Ly-Lx ones,
as well as monolingual ones of both Lx and Ly) will be consulted in the process. For many cheap, low-quality dictionaries, prior publications are the only source of material complementing the authors’ introspection. Many specialists believe that a modern dictionary worthy of the name cannot be prepared without a suitably large, electronically stored language corpus. Corpora provide evidence for SL meanings and for the frequency of occurrence of lexical items. They also help identify common syntactic patterns and recurring phraseological combinations, so that lexicographers can tap them for illustrations of various aspects of language use, which are then presented in the dictionary in the form of example sentences. In sum, it is on the basis of corpus data that representative, up-to-date word lists can be drawn up and the use of headwords illustrated, thereby ensuring that nothing of importance has been overlooked. All of the above is fairly uncontroversial. Exactly what kind of corpus is required for making a bilingual dictionary is slightly less so. Ideally, two representative, well-balanced corpora should be available, one for each of the source languages of a biscopal dictionary. Access to a parallel corpus (i.e. a corpus of SL texts and their translations) and/or to comparable monolingual corpora (i.e. corpora containing texts of the same type, coming from the same period, and dealing with the same kind of topic) would be an additional bonus. There seems to be universal agreement that such closely matched corpora could be of great help in tracking down equivalent candidates for particularly obstinate headwords. Disappointingly, no bilingual dictionaries are known to have been compiled exclusively on the basis of either parallel or comparable corpora.4 Apart from the obvious obstacle (i.e. 
unavailability of the right kind of corpus), the main problem seems to be the enormous amount of time required to analyse the wealth of corpus data in a satisfactory way (Atkins 2008 [2002]: 258f). A bilingual corpus offers too many equivalence candidates, each of which might seem important to the lexicographer at some point, with the likely result that a dictionary compiled in this way would contain ‘too much detail for most users’ and might end up being ‘too big to appear in print’ (Atkins and Rundell 2008: 478). In principle, of course, these difficulties could be resolved if enough time – and, consequently, money – were to be devoted to editing the preliminary version of the dictionary, allowing the lexicographers to scrutinize equivalence candidates carefully and, ultimately, trim all oversize entries. Meanwhile, as a sort of half measure, corpus data are sometimes incorporated into the electronic versions of bilingual dictionaries (e.g. the CD-ROMs or DVDs accompanying the printed books) in an attempt to offer additional assistance to more advanced users, especially translators and professional linguists. Depending on corpus availability, but also on the time, money, and human resources allocated to a particular project, the actual compilation of a bilingual dictionary may follow different paths. As already indicated, ideally, the
lexicographers should have at their disposal two linguistically pre-analysed corpora (databases), one for each of the source languages of the dictionary. As yet, this is hardly ever the case. Instead, for dictionaries whose SL is a well-described language Lx, a frequently taken route is to start from a ‘universal’ Lx database built from the resources of an Lx corpus. The database is universal in the sense that it can serve as a blueprint for a bilingual dictionary from Lx into any other language. It is made up of a list of Lx headwords, each complete with the relevant grammatical information, preliminary sense divisions and definitions of the distinguished senses; example sentences are sometimes provided as well. Such a pre-constructed framework is passed on to a team of TL lexicographer-translators, whose main task is to fill the empty slots, in other words, to supply TL equivalents for all the senses identified for the SL headwords. Subject to the chief editor’s approval, TL lexicographers may be allowed to modify the initial framework to varying extents in order to make it fit the target language better; this is achieved mainly by splitting and lumping senses (see Chapter 4.9). What happens with the Ly-Lx part of the dictionary depends on the resources available for Ly. The procedure may either be fully analogous (i.e. Ly-corpus-based) or, in the absence of an Ly corpus, it may involve working with several monolingual dictionaries of Ly and/or using the results of an automatic reversal of the Lx-Ly part (provided the two parts are not compiled simultaneously). There are also cases when a bilingual dictionary is created through a bilingualization of an existing monolingual dictionary. Although seemingly a simpler task, it may actually require more skill and effort to turn a fully fledged monolingual dictionary into a bilingual one than to fashion a bilingual dictionary from a semi-finished product, that is, a database of the kind discussed above. 
Again, depending on the project, different degrees of intervention in the original macro- and microstructure may be allowed.5 For the sake of completeness, it ought to be mentioned that somewhere between a monolingual dictionary and a bilingual dictionary created on its basis lies an intermediate genre: a bilingualized (semi-bilingual) dictionary, that is, one that offers TL equivalents while retaining the SL definitions from the monolingual dictionary on which it has been founded. Such dictionaries are always monodirectional and monoscopal (L2-L1), with only an L1-L2 index in place of a regular L1-L2 section.
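As a rough illustration of the workflow described in Section 2.2, the Python sketch below models a pre-constructed Lx framework with empty equivalent slots, a step in which translators fill those slots, and a naive automatic reversal that turns each TL equivalent into a headword of the other part. The data layout, the function names and the English-German sample are assumptions introduced here, not a description of any publisher's actual database; a real reversal would need extensive editing before it could serve as an Ly-Lx word list.

```python
from collections import defaultdict

# Hypothetical framework: each Lx headword has a list of senses, and each
# sense has an SL definition plus an initially empty slot for TL equivalents.
framework = {
    "bank": [
        {"definition": "financial institution", "equivalents": []},
        {"definition": "side of a river", "equivalents": []},
    ],
}


def fill_slots(framework, translations):
    """Fill empty slots from a {(headword, sense_index): [equivalents]} mapping."""
    for (headword, index), equivalents in translations.items():
        framework[headword][index]["equivalents"] = list(equivalents)
    return framework


def reverse(framework):
    """Naive reversal: every TL equivalent becomes a headword pointing back
    to the SL headwords it was offered for."""
    reversed_part = defaultdict(list)
    for headword, senses in framework.items():
        for sense in senses:
            for equivalent in sense["equivalents"]:
                if headword not in reversed_part[equivalent]:
                    reversed_part[equivalent].append(headword)
    return dict(reversed_part)


# The translators' contribution, keyed by headword and sense number.
filled = fill_slots(framework, {("bank", 0): ["Bank"], ("bank", 1): ["Ufer"]})
reversed_part = reverse(filled)
print(reversed_part)
```

Note that the reversal loses the sense distinctions of the original framework: both German headwords simply point back to bank, which is one reason why an automatically reversed list can only be a starting point for the second part of the dictionary.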
3 Megastructure
The overall structure of a dictionary (its megastructure) comprises the central word list (macrostructure) and the outer texts (outside matter).
3.1 Wordlists
In a biscopal dictionary, of course, we have not one word list but two. The main criterion for inclusion in either of them is a given item’s frequency of occurrence, but other factors play a role as well. In the language-learning context, that is, in pedagogical bilingual dictionaries, potential usefulness for learners – which does not always coincide with corpus frequency – must be taken into account. Thus, vocabulary items connected with the language classroom, complying with examination requirements, and correlated with the interests of the target age group (e.g. schoolchildren or young adults) have a fair chance of being included. Another reason why the word lists cannot simply be determined by corpus frequency is that this would automatically disqualify older words, which are either extremely rare in, or altogether absent from, corpora of contemporary language. Although most bilingual dictionaries are synchronic contemporary dictionaries (Svensén 2009: 23), a comprehensive dictionary of the general language must also include some obsolescent or even completely obsolete words, in order to meet the expectations of those users whose proficiency allows them to read older literature in the foreign language. Since such users will need information about those words for decoding purposes only, it follows that, in a monodirectional dictionary, older vocabulary items will feature mainly, if not exclusively, as headwords in the L2-L1 part. By the same logic, in a bidirectional dictionary, where each of the parts acts as an L2-L1 resource for one of the user groups, both word lists should include important old words. What about the other end of the time spectrum, that is, neologisms? On the whole, lexical items which have not yet become institutionalized do not feature in dictionaries of any sort (except, naturally, dictionaries of neologisms). 
Indeed, lack of a dictionary record is one of the main criteria for deciding that a particular item is still at the neologism stage. It is all the more remarkable that, occasionally, there may be more justification for admitting a neologism into a bilingual Lx-Ly dictionary than into a monolingual dictionary of Ly. What we have in mind are cases when no Ly equivalent can be found for a particular Lx headword and when, additionally, there is reason to believe that the Lx word will eventually be borrowed into Ly (the telltale signs being its attestation in informal Ly speech and/or in Ly content on the internet). Under such circumstances, a bilingual lexicographer may decide to sanction the incipient borrowing by listing it among the proposed Ly equivalents of the Lx headword. The only chance for a neologism to feature in a bilingual dictionary is thus as a tentative TL equivalent in the Lx-Ly part rather than as a headword in either Lx-Ly or Ly-Lx. Finally, not only single words, but increasingly also multiword units can be headwords in bilingual dictionaries. This is a result of the growing realization
that units of meaning are not always coextensive with orthographic words, ample evidence for which comes, among others, from the study of language corpora. From the users’ point of view, one of the benefits of multiword units being elevated to the status of independent headwords is that they are easier to locate than when nested inside entries.
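The selection criteria discussed in Section 3.1 might be sketched as follows. The function, its parameters and the toy corpus are hypothetical; the point is only that a frequency threshold can be combined with an editorial list of items (classroom vocabulary, older words) that are admitted regardless of how often they occur in the corpus.

```python
from collections import Counter


def select_headwords(tokens, min_count, always_include=()):
    """Keep items at or above a frequency threshold, plus editorial additions."""
    counts = Counter(tokens)
    frequent = {item for item, n in counts.items() if n >= min_count}
    # Items judged useful for learners, or needed for reading older texts,
    # are admitted even when they fall below the threshold.
    return sorted(frequent | set(always_include))


# Toy corpus: 'homework' and 'thou' are too rare to pass the threshold alone.
corpus = ["house", "house", "house", "tree", "tree", "homework", "thou"]
wordlist = select_headwords(corpus, min_count=2, always_include={"homework", "thou"})
print(wordlist)
```

In a monodirectional dictionary, following the reasoning above, an item such as thou would be passed to this function only when the L2-L1 word list is being drawn up.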
3.2 Outer Texts
Outer texts are additions situated outside the A-Z core of the dictionary, either on the peripheries of the dictionary proper (front and back matter) or as inserts – plates of drawings, diagrams, etc. – interspersed among the entries (middle matter). Until quite recently, neither dictionary makers nor theoretical lexicographers used to pay much attention to those optional sections, arguing that very few users ever consulted them.6 While doubtless true, it is hard to say whether the latter is solely the cause of the lexicographers’ neglect or also one of its effects. What is certain is that outer texts often give the impression of an afterthought, carrying material which is relatively easy to get hold of, very likely added at the last minute and without much consideration to what the dictionary audience might really need. Fortunately, things are beginning to look up. In good dictionaries, the A-Z text is now routinely preceded by a ‘How to use the dictionary’ section. Also present is a list of abbreviations, including grammatical codes, phonetic symbols, usage labels and the like. A grammar section – with a list of noun declensions and/or verb conjugations, irregular verbs, and similar – is a frequent feature. Occasionally, one can find some or all of the following: a selection of false friends, a writing guide, a bank of common phrases used in everyday conversation and a list of popular texting and emailing abbreviations (and their TL equivalents). In learners’ dictionaries, there is often a special section devoted to aspects of the L2 culture. Lists of geographical and personal (given) names, once quite common, are less frequent these days, partly because important place names tend to feature as headwords in the A-Z part. In print dictionaries, the choice of elements as well as the amount of information to be included in the outside matter is heavily circumscribed by space considerations. 
In the case of discursive sections, such as those on culture or essay writing, the language of presentation has to be decided upon, as lexicographers can rarely afford the luxury of two language versions. Ideally, of course, any text in L2 should be accompanied by its L1 translation, thus allowing users to choose the version they feel more comfortable with – an easy thing to do in an electronic dictionary. In sum, if a dictionary’s outer texts are to be of real use to the language learner, a lot of thought has to go into their preparation. It may be a good idea to enlist the help of other specialists – language teachers, grammarians, phoneticians – or even delegate the task entirely to them. All of the lexicographers’ time and effort can then be spent on their main job: identifying TL equivalents and presenting them in the most effective and efficient way.
4 Microstructure
The microstructure, that is, the structure of a single entry, can be quite complex in a modern dictionary, with many constituent elements following one another in a set order, each conveying different kinds of information. Here, only two types of entry constituent will be dealt with: equivalents and examples of usage.
4.1 Equivalents
The one indispensable element of the microstructure of a bilingual dictionary is the headword with its translations (known in lexicography as TL equivalents). Both the identification of suitable TL equivalents and their clear presentation are crucial to the dictionary’s success.
4.1.1 Equivalent Provision
The principal idea of a bilingual dictionary is deceptively simple: to provide equivalents for all senses of all headwords, such that each equivalent is identical in meaning to the sense it has been matched with.7 Unfortunately, the execution of this idea is, in most cases, extremely difficult and, at times, utterly impossible. Three properties of natural languages – two intra- and one interlingual – are responsible for that: vagueness of meaning, polysemy, and lack of one-to-one correspondence between different lexical systems (i.e. so-called anisomorphism8). Both vagueness and polysemy carry substantial benefits for language users, making it possible to express an infinitude of meanings with the help of limited lexical resources. The same cannot be said about their implications for lexicography. Semantic vagueness (indeterminacy) significantly complicates the process of identifying the meanings of (decontextualized) lexical items within each language. Until such identification has been accomplished, one cannot even begin to try and match meanings interlingually for the purposes of a bilingual dictionary. Besides, it is not always clear – not only to lexicographers, but also to lexical semanticists – whether a particular lexical item is best viewed as vague or as polysemous, that is, whether it should be thought of as having one general,
underspecified meaning or several more or less distinct, independent meanings.9 Opting for the latter, that is, for polysemy, brings with it the need for careful sense division of SL headwords (see Chapter 4.9) as well as for meticulous discrimination and disambiguation of TL equivalents (see sections 4.1.2 and 4.1.3 of the present chapter). There is also a connection between polysemy and anisomorphism: the rarity of interlingually symmetrical polysemy (i.e. of cases where two polysemous words in two different languages have the same number of exactly the same senses) can be viewed as one of the facets of anisomorphism. To illustrate this issue in a condensed form, let us consider The Sense of an Ending, the title of a novel by Julian Barnes. Individually, both sense and ending are polysemous; combined as above, they not only do not disambiguate each other, but intensify the ambiguity of the resulting phrase. At least three readings of the phrase seem possible here: ‘the feeling that an end is approaching’; ‘the meaning of a book’s ending’; ‘the meaning of a life’s ending’.10 Indeed, it is possible to read the title at a more general level, with all three interpretations being ‘turned on’ at once (the overall effect thus amounting to vagueness rather than polysemy). When it comes to translation, there is, naturally, little chance of preserving the multilayered ambiguity – or vagueness, as the case may be – unless the items proposed as TL equivalents of the two English nouns exhibit polysemy parallel (at least in the relevant senses) to that of, respectively, sense and ending.11 In general, anisomorphism – a consequence of the fact that different languages structure reality differently – means that perfect interlingual equivalence is an exception rather than the rule. 
Consequently, if bilingual dictionaries are to be viable at all, the sameness-of-meaning requirement must be relaxed, in recognition of the graded, rather than absolute, nature of interlingual correspondences. A great deal of ingenuity and effort on the part of the lexicographer may be required before even a limited degree of equivalence between the headword and its dictionary TL counterpart can be reached.12 One of the means of achieving correspondence is to extend the scope of the unit to be translated, in accordance with the principle that, when there is no equivalence at word level, it can still be reached at the level of the entire message. Sometimes, it may be necessary to embed the untranslatable SL lexical item in a sentence (and then translate the lot); at other times, adding some minimal context (e.g. a modifier) and translating the resulting phrase will do the trick. It would, however, be wrong to conclude that it is always easier to provide equivalents for SL stretches larger than a single word. In particular, conventional multiword lexical units, especially figurative ones, bring challenges of their own. On top of the difficulties which follow from lack of isomorphism, the extra complication here is that the meaning of a figurative expression is often to some extent motivated (transparent). The motivating link between the
actual (i.e. figurative) meaning of an expression (such as an idiom or a proverb) and its lexical structure resides in the so-called image component,13 that is, the mental picture the expression evokes in the language user. Strictly speaking, a perfect TL equivalent of a conventional figurative expression should therefore correspond to it on both planes: that of the actual meaning and that of the rich image. What usually happens is that correspondence is present only on one of the planes, if that. If two expressions are identical exclusively on the level of the image, then, of course, they do not qualify as equivalent at all; rather, they are phraseological false friends. There is no agreement as to how to treat two expressions which do mean the same, but are based on markedly different images. Most authors (including practising lexicographers) will accept them as fully equivalent, but some (e.g. Dobrovol’skij and Piirainen 2005) will insist, in accordance with the reasoning presented above, that their equivalence is only partial. Consider the pair of proverbs which Duval (2008: 280) cites as an example of full equivalence: English once bitten twice shy and French chat échaudé craint l’eau froide. Despite their identical figurative meaning, the two are not always substitutable in translation: one cannot be used in place of the other if, for whatever reason, the literal meaning – either that of having been bitten or that of having been scalded – gets activated, for example:
Once bitten (by Godzilla), twice shy.
Unfortunately, the nation is not “once bitten, twice shy.” More like multiple bites . . .14
Of course, bilingual lexicographers could count themselves lucky if all their problems with matching phraseological units were of this kind, the sole question being whether the equivalence works in all contexts or only in some.
Most of the time they deal with correspondences which are much more tenuous than in the case just quoted, and sometimes the target language has no fixed expression at all which would make a passable equivalent candidate. In such cases, it is better to give a discursive explanation of the SL meaning in the target language than to propose as an equivalent a TL expression which exhibits significant differences (whether semantic, syntactic or pragmatic) when compared with the headword, thereby potentially misleading the user.
4.1.2 Equivalent Discrimination
Early bilingual dictionaries did not discriminate between equivalents at all, listing them one after another, separated only by commas. Later, in slightly more ambitious publications, a distinction was introduced between a comma, separating supposedly interchangeable equivalents, and a semicolon, separating equivalents which were not fully interchangeable. There are still quite a few dictionaries which continue this tradition. That it is not a tradition worth
continuing should be obvious from the paucity of cases when a SL headword (or a sense thereof) can be supplied with even a single perfect TL equivalent. To assume that there are more perfect equivalents per headword (sense), and that they are all fully interchangeable, flies in the face of what we know about interlingual anisomorphism, on the one hand, and about the rarity of absolute synonyms within one language, on the other. In a bilingual L2-L1 dictionary meant to serve reception only, such indiscriminate accumulation of equivalents, while hardly ideal, might perhaps be defended. When the dictionary’s target language is the users’ L1, we can, arguably, expect them to be able to pick the appropriate equivalent from among those provided. However, in a dictionary which also aims to take care of the users’ productive needs, lack of equivalent discrimination is bound to have disastrous consequences. Left to their own devices, some users will inevitably make some wrong choices, producing utterances which are, at best, stylistically inappropriate and, at worst, downright incomprehensible or ridiculous. Or simply not what they wanted to say. Guiding the user to the equivalent they need involves making a number of decisions: what kind of information should be given to identify the equivalent; in which of the two languages it should be phrased; what non-verbal signs (numbers, punctuation marks, icons) should be employed for the purpose. The information made use of in equivalent discrimination can be semantic (e.g. a synonym, hyperonym or collocate), syntactic (e.g. an indication of part of speech, transitivity, or type of subject taken by the verb), or pragmatic (e.g. style, domain, regional or temporal label). The choice of typography is largely a question of aesthetics; what matters is that the signs should be distinct enough and not too numerous or difficult to interpret. 
The metalanguage in a monodirectional dictionary will, naturally, be the native language of the users. Unfortunately, the policy cannot be consistently applied in bidirectional dictionaries, and this necessarily puts one of the user groups at a disadvantage. The next best thing would be to give all information in both languages – not a viable proposition in a printed book, whose entries would swell in size and become impossibly hard to navigate, but easily doable in electronic reference works, in which the choice of the metalanguage can be left in the hands of the user. The entry quoted below demonstrates how a bidirectional dictionary with English and German (OGD) attempts to solve the problem by alternating the metalanguage (in this instance, preceding the German equivalent by an English synonym of the headword in the appropriate sense or subsense and following it with two or three German collocates):
handsome
(1) (good-looking) gut aussehend [Mann, Frau]; schön, edel [Tier, Möbel, Vase usw.];
(2) (generous) großzügig [Geschenk, Belohnung, Mitgift]; nobel [Behandlung, Verhalten, Empfang];
(3) (considerable) stattlich, ansehnlich [Vermögen, Summe, Preis]; stolz [Preis, Summe]15
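The layout of the OGD entry can be made concrete with a small data model. The following is only an illustrative sketch in Python, not a description of how OGD or any real dictionary database is stored: the class names (Equivalent, Sense, Entry) and the helper equivalents_for are hypothetical, and the open-ended 'usw.' of the printed entry is left out of the collocate list. It shows how sense indicators in one language and collocates in the other can jointly guide a user from a TL noun to a suitable equivalent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Equivalent:
    forms: List[str]       # TL equivalents listed together, separated by commas in print
    collocates: List[str]  # TL collocates shown in square brackets


@dataclass
class Sense:
    indicator: str         # SL synonym identifying the sense, e.g. "generous"
    equivalents: List[Equivalent] = field(default_factory=list)


@dataclass
class Entry:
    headword: str
    senses: List[Sense] = field(default_factory=list)


# The entry quoted from OGD, transcribed by hand for illustration.
HANDSOME = Entry("handsome", [
    Sense("good-looking", [
        Equivalent(["gut aussehend"], ["Mann", "Frau"]),
        Equivalent(["schön", "edel"], ["Tier", "Möbel", "Vase"]),
    ]),
    Sense("generous", [
        Equivalent(["großzügig"], ["Geschenk", "Belohnung", "Mitgift"]),
        Equivalent(["nobel"], ["Behandlung", "Verhalten", "Empfang"]),
    ]),
    Sense("considerable", [
        Equivalent(["stattlich", "ansehnlich"], ["Vermögen", "Summe", "Preis"]),
        Equivalent(["stolz"], ["Preis", "Summe"]),
    ]),
])


def equivalents_for(entry: Entry, collocate: str) -> List[Tuple[str, str]]:
    """Return (sense indicator, TL form) pairs whose collocate list contains `collocate`."""
    hits = []
    for sense in entry.senses:
        for eq in sense.equivalents:
            if collocate in eq.collocates:
                hits.extend((sense.indicator, form) for form in eq.forms)
    return hits


if __name__ == "__main__":
    # A user who wants to say "a handsome sum" looks up the collocate "Summe".
    for indicator, form in equivalents_for(HANDSOME, "Summe"):
        print(f"{HANDSOME.headword} ({indicator}) -> {form} [Summe]")
```

A lookup by collocate returns nothing for nouns the entry does not list, which mirrors the limitation of the printed entry: two or three collocates per equivalent can only point the user in the right direction, not cover every context.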
4.1.3 Equivalent Disambiguation
Whenever the lexical item offered as a TL lexicographic equivalent is ambiguous (whether due to polysemy, homonymy or vagueness), information should be provided enabling the user to home in on the sense in which the equivalent is to be interpreted. This is especially important in L2-L1 dictionaries; in L1-L2 ones, the problem should not arise, thanks to the user’s familiarity with the meaning of the (native) headword. Often, the same methods which serve the purposes of sense and/or equivalent discrimination simultaneously resolve equivalent ambiguity. Thus, grammar codes, usage labels and guide phrases containing different kinds of clues (synonyms, collocates, etc.) can all be effective disambiguating devices. In addition, if two or more near-synonymous equivalents are given side by side, they often automatically disambiguate each other, their semantic ‘common part’ indicating the relevant sense. If all else fails, the lexicographer can furnish the equivalent with a gloss, as in this entry from a Russian-English dictionary (CORD): интерпретáтор interpreter (expounder), where the gloss deactivates the ‘oral translator’ reading of the equivalent.
4.2 Examples
Unlike equivalents, without which a bilingual dictionary is simply impossible, examples of usage are not an obligatory element of the microstructure. Nonetheless, they can be found more and more often these days, especially in bilingual dictionaries geared specifically to the needs of language learners. Once the decision has been made to include this feature, a number of questions have to be answered, such as whether the examples are better invented by the lexicographer or extracted from a corpus (and, if corpus-based, to what extent they can be modified) and whether or not they should be translated. In dictionaries for beginners and/or young schoolchildren, where the examples need to be simple and short, and where, consequently, the authenticity of the illustrative material is less important than its attunement to the users’ proficiency level, examples entirely made up by the lexicographer may be the best solution. On the other hand, simplicity must give way to authenticity in the exemplification of rare words (or senses), which need to be shown in the context in which they are normally found rather than in artificially simplified sentences. In sum,
neither corpus-based nor made-up examples are inherently better – it all depends on the level of the user and on the kind of word (or sense) being illustrated. Since bilingual dictionaries serve users at all levels of proficiency, the examples they offer cannot always be quoted exactly in the form in which they appear in the corpus, but need to be carefully edited before being put in the dictionary. Editorial intervention involves cutting sentences down to manageable size as well as eliminating excessively difficult lexical items, obscure culture-specific references, and other potentially distracting detail. Something more than simple editing may be required when examples are meant to assist in encoding, as they normally are in bilingual lexicography (given that the decoding function is taken care of by the equivalents). As demonstrated in a recent experimental study (Frankenberg-Garcia 2012), production-oriented examples are much more effective if they have been hand-picked in such a way as to address common production errors, thus providing repeated exposure to structures that are problematic for learners with a particular mother tongue background.16 While not a realistic requirement for each single entry, this policy should at least be followed in entries containing words or structures notoriously difficult for the target users. Opinions differ as to what it is that example sentences ought to illustrate in a bilingual dictionary. In trying to answer this question, it helps to think about what bilingual dictionaries are typically used for. It seems that, whatever their motivation, people do not normally consult a bilingual dictionary in order to find out how their native language works. Consequently, in a monodirectional dictionary, it makes sense for examples always to illustrate the users’ L2. 
This boils down to illustrating the headword in the L2-L1 section (which is what dictionaries have traditionally been doing) and the equivalent in the L1-L2 section (which is an innovation). The innovative approach has been taken by LSW, a dictionary for Polish learners of English, whose Polish-English part contains examples illustrating the equivalents, as in the following:
płynąć
(1) (ryba, człowiek) swim:
• fish swimming up the stream
(2) (rzeka, woda) flow:
• The River Elbe flows through the Czech Republic.
• A steady stream of cars flowed past her window.
(3) (woda, łzy) run:
• Tears ran down her face.
(4) (statek, statkiem) sail:
• We sailed along the coast of Alaska.
(5) (czas) go by:
• Time goes by so quickly these days.
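The policy described here, that examples in a monodirectional dictionary should always illustrate the users' L2, can be stated as a simple rule. The sketch below is an assumption-laden illustration in Python rather than anything taken from LSW: the names Sense, PLYNAC, example_target and examples_for are hypothetical, and language codes such as "pl" and "en" are used only as convenient labels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sense:
    indicator: str       # L1 hint given in parentheses
    equivalent: str      # L2 equivalent
    examples: List[str]  # L2 sentences illustrating the equivalent


# The LSW entry quoted above, transcribed by hand for illustration.
PLYNAC = [
    Sense("ryba, człowiek", "swim", ["fish swimming up the stream"]),
    Sense("rzeka, woda", "flow", [
        "The River Elbe flows through the Czech Republic.",
        "A steady stream of cars flowed past her window.",
    ]),
    Sense("woda, łzy", "run", ["Tears ran down her face."]),
    Sense("statek, statkiem", "sail", ["We sailed along the coast of Alaska."]),
    Sense("czas", "go by", ["Time goes by so quickly these days."]),
]


def example_target(source_lang: str, target_lang: str, user_l1: str) -> str:
    """Say which part of an entry the examples should illustrate.

    In a monodirectional dictionary the examples illustrate the users' L2:
    the equivalent in the L1-L2 section, the headword in the L2-L1 section.
    """
    if source_lang == user_l1:
        return "equivalent"  # L1-L2 section: the L2 material is the equivalent
    if target_lang == user_l1:
        return "headword"    # L2-L1 section: the L2 material is the headword
    raise ValueError("neither language of the section is the users' L1")


def examples_for(senses: List[Sense], equivalent: str) -> List[str]:
    """Collect the example sentences attached to one L2 equivalent."""
    return [ex for sense in senses if sense.equivalent == equivalent
            for ex in sense.examples]


if __name__ == "__main__":
    # Polish-English section of a dictionary for Polish learners of English.
    print(example_target("pl", "en", "pl"))
    for sentence in examples_for(PLYNAC, "flow"):
        print("•", sentence)
```

Attaching the examples to each equivalent, rather than to the headword as a whole, is what allows the Polish-English section to show the learner how swim, flow, run, sail and go by actually behave in English sentences.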
Regrettably, most bilingual dictionaries which feature examples still employ them exclusively to illustrate source language material (i.e. the headword and the combinations it enters into), in both parts of the dictionary and irrespective of its directionality. In bidirectional dictionaries, there is, of course, no other option: examples must illustrate the headword, because the dictionary’s source language is always an L2 for one of the user groups. Still, bilingual learners’ dictionaries being, by definition, monodirectional, there is hope that the situation there might change in the future. The controversy over what examples should illustrate is intimately connected with another question, namely, whether they ought to be translated.17 Most examples support information given earlier in the entry, but some expand on or qualify it, usually by introducing important exceptions (e.g. by showing that, in certain circumstances, a different translational equivalent is needed or that the headword is regularly omitted in translation). Examples which merely support the equivalent(s) can – perhaps even should – be left untranslated, whereas those which illustrate exceptions must be supplied with translations, either of the whole or of the part for which the so-far-unmentioned equivalent is needed. In electronic dictionaries, the question loses much of its urgency, as it is possible to make translation an optional feature, to be switched on and off when needed. Even there, however, the problem does not disappear completely. After all, it does matter – especially for advanced users, who should be given maximally authentic models – whether a particular sentence has originally been uttered in a given language or is merely a translation (and thus potentially exhibits features of translationese).
5 Conclusion
While the present chapter has touched upon some important issues involved in the compilation of bilingual dictionaries, it has not attempted to make any systematic predictions about the future. This caution seems justifiable, since even the world’s most renowned lexicographers, when asked what future dictionaries will look like – or, indeed, whether there will still be dictionaries a few years from now – tend to avoid direct answers.18 While it is hard at present to imagine a world without dictionaries, especially bilingual ones, no one really knows how the world of reference science is going to be transformed by the technological advances we are witnessing. What does seem certain is that people will always need information about interlingual lexical correspondences and that, no matter in what form and through what medium that information is conveyed to them, it will capitalize upon the work which bilingual lexicographers have done over the past centuries.
Notes
1. Or, in general, on the language less known to them (see the remarks about L3-L2 dictionaries above).
2. The dictionaries in the Oxford Beginners series are a good example.
3. See Adamska-Sałaciak (2010b) for a review of the inherent deficiencies of monolingual learners’ dictionaries and the resulting need for bilingual ones.
4. As of December 2011 (on the authority of Gilles-Maurice de Schryver, guest lecturer at the Faculty of English, Adam Mickiewicz University in Poznań). Basing a dictionary on a parallel corpus seems a much more viable proposition where specialist publications are concerned (see e.g. the work in progress on a monoscopal English-Polish dictionary of phrasal verbs reported on in Perdek [2011]).
5. Adamska-Sałaciak (2006a: chapter one) illustrates the different ways in which bilingual dictionary compilation can proceed, using English-Polish/Polish-English dictionary projects from the 1990s and early 2000s as examples.
6. For some background on lexicographic outer texts, see the first two chapters of Bielińska (2010).
7. The intricate relations between lexicographic equivalence and semantic identity are discussed in Adamska-Sałaciak (2013).
8. The term anisomorphism was introduced by Zgusta (1971: 294ff).
9. The different diagnostic tests for distinguishing vagueness from polysemy proposed by philosophers and semanticists are not wholly reliable, often yielding inconclusive or conflicting results (for an overview, see Geeraerts [1993]).
10. Either the life of the aging narrator-protagonist or that of his best friend, whose reasons for killing himself are revealed towards the end of the book, completely undermining the reader’s earlier perception of the characters and events described.
11. Since Polish does not afford such a possibility, in Barnes (2012) the translator chose the first – arguably, most salient – reading, thus eliminating any uncertainty as to the interpretation and destroying the vague aura of mystery carried by the title.
12. A typology of equivalence types, conceived of as areas on a cline, has been developed in Adamska-Sałaciak (2006a, 2010a, 2011). A discussion illustrated with examples taken from dictionaries for different language pairs is forthcoming in Adamska-Sałaciak (forthcoming).
13. The discussion in this paragraph relies on the terminology of the Conventional Figurative Language Theory developed in Dobrovol’skij and Piirainen (2005).
14. The examples have been obtained from the enTenTen Corpus with the help of the Sketch Engine Corpus Query System; original spelling has been retained.
15. In this, as in all other examples of dictionary entries quoted in this chapter, only those elements of the entry are given which illustrate the point under discussion.
16. Although Frankenberg-Garcia’s study deals with definitions and examples, imitating the process of monolingual dictionary consultation by (Portuguese) learners of English, it seems that this particular conclusion reached by the author can be extended to bilingual dictionaries.
17. The problem is considered in detail in Adamska-Sałaciak (2006b).
18. It has become a custom at recent lexicographic conferences (e.g. EURALEX 2010, e-LEX 2011) to organize panel discussions on the topic.
References
Dictionaries
Concise Oxford Russian Dictionary (CORD) (1998) Revised Edition, based on the Oxford Russian Dictionary. Oxford: Oxford University Press.
Longman Słownik Współczesny Angielsko-Polski, Polsko-Angielski (LSW) (2011) 2nd edition. Harlow: Pearson Education Limited.
Oxford German Dictionary (OGD) (2008) 3rd edition. Oxford: Oxford University Press.
Other References
Adamska-Sałaciak, A. (2006a) Meaning and the Bilingual Dictionary: The Case of English and Polish. Frankfurt am Main: Peter Lang.
Adamska-Sałaciak, A. (2006b) Translation of dictionary examples – notoriously unreliable? In: Elisa Corino, Carla Marello and Cristina Onesti (eds) Atti del XII Congresso Internazionale di Lessicografia, Torino, 6–9 settembre 2006, Vol. 1. Alessandria: Edizioni dell’Orso, 493–501.
Adamska-Sałaciak, A. (2010a) Examining equivalence. International Journal of Lexicography 23/4, 387–409.
Adamska-Sałaciak, A. (2010b) Why we need bilingual learners’ dictionaries. In: Ilan Kernerman and Paul Bogaards (eds) English Learners’ Dictionaries at the DSNA 2009. Tel Aviv: K Dictionaries, 121–37.
Adamska-Sałaciak, A. (2011) Between designer drugs and afterburners: a lexicographic-semantic study of equivalence. Lexikos 21, 1–22.
Adamska-Sałaciak, A. (2013) Equivalence, sameness of meaning, and synonymy in a bilingual dictionary. International Journal of Lexicography. Special Issue ed. Frederic Dolezal.
Adamska-Sałaciak, A. (forthcoming) Explaining meaning in a bilingual dictionary. In: Philip Durkin (ed.) The Oxford Handbook of Lexicography. Oxford: Oxford University Press.
Atkins, B. T. S. (1985) Monolingual and bilingual learners’ dictionaries: a comparison. In: Robert Ilson (ed.) Dictionaries, Lexicography and Language Learning. Oxford: Pergamon Press, 15–24.
Atkins, B. T. S. (2002) Then and now: competence and performance in 35 years of lexicography. In: Anna Braasch and Claus Povlsen (eds) Proceedings of the Tenth EURALEX International Congress. Vols. 1–2. Copenhagen: Center for Sprogteknologi, 1–28. (Reprinted in Thierry Fontenelle [ed.] [2008] Practical Lexicography: A Reader. Oxford: Oxford University Press, 247–72.)
Atkins, B. T. S. and Rundell, M. (2008) The Oxford Guide to Practical Lexicography. Oxford: Oxford University Press.
Barnes, J. (2011) The Sense of an Ending. London: Jonathan Cape.
Barnes, J. (2012) Poczucie kresu. Warszawa: Świat Książki. (Polish translation of Barnes [2011] by Jan Kabat.)
Bielińska, M. (2010) Lexikographische Metatexte. Eine Untersuchung nichtintegrierter Außentexte in einsprachigen Wörterbüchern des Deutschen als Fremdsprache. Frankfurt am Main: Peter Lang.
Dobrovol’skij, D. and Piirainen, E. (2005) Figurative Language: Cross-Cultural and Cross-Linguistic Perspectives. Amsterdam and Boston: Elsevier.
Duval, A. (1991) Equivalence in bilingual dictionaries. In: Thierry Fontenelle (ed.) (2008) Practical Lexicography: A Reader. Oxford: Oxford University Press, 273–82. (Originally published as Duval, A. [1991] L’équivalence dans le dictionnaire bilingue. In: Franz-Josef Hausmann et al. [eds], Vol. 3, 2817–24.)
Frankenberg-Garcia, A. (2012) Learners’ use of corpus examples. International Journal of Lexicography 25/3, 273–96.
Geeraerts, D. (1993) Vagueness’s puzzles, polysemy’s vagaries. Cognitive Linguistics 4, 223–72. (Reprinted in Geeraerts, D. [2006] Words and Other Wonders. Berlin: Mouton de Gruyter, 99–148.)
Hausmann, F.-J., Reichmann, O., Wiegand, H. E. and Zgusta, L. (eds) (1989–91) Wörterbücher/Dictionaries/Dictionnaires. Ein internationales Handbuch zur Lexikographie. An international encyclopedia of lexicography. Encyclopédie internationale de lexicographie. Berlin and New York: Mouton de Gruyter.
Hausmann, F.-J. and Werner, R. O. (1991) Spezifische Bauteile und Strukturen zweisprachiger Wörterbücher: Eine Übersicht. In: Franz-Josef Hausmann et al. (eds), Vol. 3, 2729–69.
Perdek, M. (2011) English Phrasal Verbs in Translation: a Lexicographic and Corpus Study of Equivalence. Unpublished PhD dissertation, Adam Mickiewicz University.
Svensén, B. (2009) A Handbook of Lexicography. Cambridge: Cambridge University Press.
Zgusta, L. (1971) Manual of Lexicography. Prague: Academia/The Hague: Mouton.
4.7
Issues in Compiling Dictionaries for African Languages Danie J. Prinsloo
Chapter Overview
Introduction
The Role of the Missionaries and the Euro-Centric Approach
Dictionaries and a Dictionary Culture
Dictionaries for Specific Target Users
Affordability
Conjunctivism versus Disjunctivism
Complexity of Nouns and Verbs
Lexicographic Traditions and Lemmatization Problems
Lemmatization Approaches
Issues Regarding Alphabetical Ordering
Tonal Indication
Simultaneous Feedback
Rulers and Block Systems
African Languages and the Electronic Era
Conclusion
1 Introduction
It is not possible to give a comprehensive overview or a detailed discussion of all issues relevant to African language lexicography within the space limitation of a single chapter. However, what will be attempted is a description of African language lexicography in respect of the many challenges facing the lexicographer in terms of socio-economics, dictionary culture, lemmatization problems, lexicographic traditions, etc. Also attempted is the positioning of African language lexicography in terms of lexicographic development in relation to major languages of the world. The chapter is written mainly from a Southern African perspective focusing on the Bantu languages familiar to the author, but arguments are to a large extent applicable to other African languages. Generally speaking, the availability of dictionaries for African languages varies from none for a specific language to one or more, and even several, which are unfortunately often outdated or out of print. Dictionaries are compiled by individuals, mostly on the initiative of publishing houses, or are government funded or institutionally supported.
2 The Role of the Missionaries and the Euro-Centric Approach
African language lexicography is traditionally rooted in the work of missionaries. On the one hand, the missionaries did valuable work in compiling pioneering dictionaries; on the other hand, their first priority was not to promote the interests of the speakers of these languages but rather to create instruments enabling Europeans to fulfil their goals in Africa. Awak (1990: 17) states it as follows:
The history of lexicography in Africa began as a result of European activities: exploration, evangelization and colonization. The early lexicons, whether compiled by explorers, missionaries, or colonial administrators, were ‘Euro-centred’, produced in Europe for Europeans rather than for African users . . . Even with the emergence of modern linguistics, lexicographic works have been primarily intended for scholarly interest and not for the needs of ordinary Africans . . . African lexicons first appeared in the form of wordlists (from a European language, e.g. English, French or German into the local language), usually in the 500–1,000 word range appended to grammatical sketches. These wordlists were generally superficial and lacking in basic morphological detail and information on suprasegmental features such as tones and vowel length. Then came the somewhat ‘larger vocabularies’ (to use Newman’s term). The vocabularies were generally oriented from an African language in the direction of a European language and often equipped with a reverse index. More recently, the vocabularies expanded into ‘small dictionaries’ which provided essential morphological information. Today, ‘full dictionaries’ exist, although they may still be restricted in size by comparison with those of major world languages such as English and French.
Ideally one could say that lexicography of the African languages is in a developmental phase. More than two decades ago Gouws (1990) made the harsh judgement that African languages generally lack lexicographic quality: Lexicographical activities on the various indigenous African languages [. . . have] resulted in a wide range of dictionaries. Unfortunately, the majority of these dictionaries are the products of limited efforts not reflecting a high standard of lexicographic achievement. (Gouws 1990: 55) Mbogho (1985: 152) highlights a few problematic aspects: Bilingual lexicography seems to be developing without any basic theory . . . the inefficiency of African language bilingual lexicography . . . the basic methodological principles for compiling bilingual dictionaries were largely unknown or overlooked. The root of the problem regarding the lack of quality lies in the lack of proper lexicographic planning and that many of the African languages are not (fully) standardized. Gouws (1992: 21) emphasizes the importance of such planning but the question could be asked to what extent this aim has been achieved. Almost two decades later Gouws (2007: 314) says that a shift has taken place from externally motivated compilation of dictionaries, for example by missionaries, to an internal drive by mother-tongue speakers of the languages to take responsibility for the compilation of dictionaries. There are indeed examples of mother-tongue speakers of African languages biting the bullet by compiling good dictionaries for these languages, often under difficult circumstances. 
Successful examples known to the author include the efforts of a group of dedicated lexicographers in Gabon, who leave no stone unturned to compile dictionaries of a high lexicographic standard; the establishment and funding of 11 National Lexicography Units by the South African government; the work done by the Institute for Kiswahili Research in Dar-es-Salaam; and the work of a number of individual entrepreneurs. A few brief examples are the following. In terms of its self-description the monolingual Lusoga dictionary compiled by Minah Nabirye in Uganda ‘is the first of a new generation of dictionaries for the African languages: compiled by a mother-tongue speaker for mother-tongue speakers, and this using a modern compilation approach which results in a user-friendly, intuitive reference work’ (www.menhapublishers.com/). This dictionary is also available on CD-ROM and can be downloaded from the internet; see the discussion below. A second example is the Pukuntšutlhaloši ya Sesotho sa Leboa (PTLH) from the Sesotho sa Leboa National Lexicography Unit in South Africa, which is the first
comprehensive monolingual dictionary for Sepedi. It is also available online at http://africanlanguages.com/psl/. A third example is the lexicographic work done for Swahili at the Institute for Kiswahili Research in Tanzania, http://portal.unesco.org/culture/en/ev.php-URL_ID=4046&URL_DO=DO_TOPIC&URL_SECTION=201.html. See the discussion below on the online Swahili–English Dictionary. Finally, isiZulu.net, an excellent online isiZulu dictionary, has the potential to solve problems related to stem and word identification. See the discussions below.

These dictionaries, however, are representative of what could be called pockets of excellence rather than a noticeable trend or dictionary revolution in Africa. Critics could still argue that Gouws’ (1990) evaluation is to a large extent still applicable after two decades. One has to keep in mind that lexicographic practice is not static – in the past two decades alone, major trends and changes occurred, such as the dawning of the corpus era, the advent of electronic dictionaries and the revolution of the information age. These new developments brought even greater challenges to African language lexicography. One could say that African language lexicography finds itself in a position where the corpus era as well as the electronic era dawned upon lexicography before a high lexicographic standard for paper dictionaries could be attained. African language lexicography will probably also soon have to face the growing challenges and opportunities of advanced computer technologies referred to by Rundell (2012), such as step-by-step user guidance as advocated by Prinsloo et al. (2011) and adaptive electronic dictionaries allowing for customization to the needs of the user (De Schryver 2003).
3 Dictionaries and a Dictionary Culture
According to Gouws and Prinsloo (2012: 25), the speech communities of many of the languages of Africa have had a number of lost generations with regard to knowledgeable dictionary use. This is due to the lack of a prevailing dictionary culture within these speech communities. Even the best dictionaries will not fulfil their purpose in a society which lacks a dictionary culture and where individual users cannot use or afford a dictionary. Gouws and Prinsloo (2005: 42) schematically illustrate the need, in the words of Atkins (1998), ‘to improve the dictionary and to improve the dictionary user’. Atkins (1998: 3), after having studied the South African situation, remarks as follows:

The speakers of African languages have not in their formative years had access to dictionaries of the richness and complexity of those currently available for European languages. They have not had the chance to internalize the structure and objectives of a good dictionary, monolingual, bilingual or trilingual.
[Figure: two parallel scales. DICTIONARIES: bad/useless dictionary or no dictionaries available; dictionary of relatively low lexicographic achievement; dictionary of relatively high lexicographic achievement; perfect dictionary. USERS: pre-dictionary culture environment; relatively poor dictionary-using skills; relatively good dictionary-using skills; ideal user.]
Figure 4.7.1 Towards the perfect dictionary and the ideal user Source: Gouws and Prinsloo (2005: 42)
Atkins and Varantola (1998: 83) bluntly state: ‘There are two direct routes to more effective dictionary use: the first is to radically improve the dictionary: the second is to radically improve the users.’
4 Dictionaries for Specific Target Users
It is echoed over and over in the literature that dictionaries should be compiled for specific target users, whose needs should be studied and fulfilled. However, in most cases in African language lexicography, dictionary compilers do not have the luxury of compiling such dedicated dictionaries; it always seems to be a matter of having to compile a dictionary that should be ‘everything to everyone’.
5 Affordability
It is unfortunate that the majority of the target users of African language dictionaries cannot afford a dictionary. Publishers are hesitant to market dictionaries priced in excess of 10–20 pounds simply because they will not sell. This puts the lexicographer in the unfortunate position of having to choose between a relatively small number of lemmas treated in some detail and a larger number of lemmas with minimal treatment, in order to fit into the number of pages prescribed by the publisher.
6 Conjunctivism versus Disjunctivism
Some African languages, such as Sepedi, have a disjunctive way of writing, in contrast to languages such as isiZulu, which have a conjunctive orthography. Compare the following example, where isiZulu uses one orthographic word to express the equivalent of four orthographic words in Sepedi.

(1) IsiZulu:  Bayazithengisa ‘They are selling it’
              ba-     -ya-      -zi-    thengisa
              they    [pres.]   it      sell

    Sepedi:   Ba a di rekiša ‘They are selling it’
              ba      a         di      rekiša
              they    [pres.]   it      sell
Conjunctively written languages pose greater challenges to the lexicographer than disjunctively written ones in respect of lemmatization; these challenges are discussed in more detail below in terms of stem and word identification difficulties.
7 Complexity of Nouns and Verbs
In all Bantu languages nouns are subdivided into different classes. Compare, for example, a section of the noun class system of Ciluba in Table 4.7.1. In the case of verbs, the lexicographer has to consider the different moods as well as numerous derivations of a specific verb. Van Wyk (1995: 87) calculates that a single verb in isiZulu, for example, can have up to 18×19×6×2 = 4,104 derivations. Verbs combine with a number of affixes; compare the prefixes in (2) and the suffixes in (3) for isiZulu and Sepedi respectively.

Table 4.7.1 Simplified extract of the noun class system of Ciluba

Class   Prefix   Example     Translation
1       mu-      mulume      man
2       ba-      balume      men
1a      Ø        maamu       mother
2a      baa+     baamaamu    mothers
3       mu-      muci        tree
4       mi-      mici        trees
5       di-      diboko      arm
6       ma-      maboko      arms
7       ci-      cintu       thing
8       bi-      bintu       things

Source: Prinsloo and De Schryver (1999: 260)
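The product that Van Wyk (1995: 87) gives for isiZulu verbs can be checked directly. The short sketch below verifies only the arithmetic; the text does not say what each of the four factors counts, so the tuple is deliberately left unlabelled and the names FACTORS and upper_bound are illustrative.

```python
import math

# Factors exactly as quoted in the text; what each one counts is not stated there.
FACTORS = (18, 19, 6, 2)

# Upper bound on the number of derivations of a single verb.
upper_bound = math.prod(FACTORS)
print(upper_bound)  # 4104
```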
Table 4.7.2 Extract from the modal system in Sepedi

1. INDICATIVE
   pres   pos   monna o reka puku              the man buys a book
          neg   monna ga a reke puku           the man does not buy a book
   past   pos   monna o rekile puku            the man bought a book
          neg   monna ga se a reka puku        the man did not buy a book
2. SITUATIVE
   pres   pos   ge monna a reka puku           if the man buys a book
          neg   ge monna a sa reke puku        if the man does not buy a book
   past   pos   ge monna a rekile puku         if the man bought a book
          neg   ge monna a sa reka puku        if the man did not buy a book
3. RELATIVE
   pres   pos   monna yo a rekago puku         the man who is buying a book
          neg   monna yo a sa rekego puku      the man who is not buying a book
   past   pos   monna yo a rekilego puku       the man who bought a book
          neg   monna yo a sa rekago puku      the man who did not buy a book

(2) NEG     SUBJ    NEG    MOD    OBJ     STEM
    (k)a    ngi     nga    ya     ngi     bona
            u       nge    zo     ku
            si             yo     si
            ni             sa     ni
            u/a            ka     m(u)
            ba             nga    ba
(3) ROOTa, ROOTile, ROOTwa, ROOTilwe, ROOTagala, ROOTagetše, ROOTagalwa, ROOTagetšwe, ROOTagatša, ROOTagaditše, ROOTagatšwa, ROOTagaditšwe, ROOTagana, ROOTagane, ROOTaganwa, ROOTaganwe, ROOTagantšha, ROOTagantšhitše, ROOTagantšhwa, ROOTagantšhitšwe, ROOTaganya, ROOTaganye, ROOTaganywa, ROOTaganywe, ROOTaka, ROOTakile, ROOTakwa, ROOTakilwe, ROOTala, ROOTetše, ROOTalwa, ROOTetšwe, ROOTalatša, ROOTaladitše, ROOTalatšwa, ROOTaladitšwe, ROOTalatšetša, ROOTalatšeditše, ROOTalatšetšwa, ROOTalatšeditšwe, ROOTana, ROOTane, ROOTanwa, ROOTanwe, ROOTantšha, ROOTantšhitše, ROOTantšhwa, ROOTantšhitšwe, ROOTantšhetša,

For the single verb thenga ‘buy’, for example, a set of 5,000 derivations can be found in the Pretoria Zulu Corpus (PZC). Consider those that occur 10 times or more in the PZC in example (4):

(4) uthengizwe (163), ukuthenga (158), uthenge (99), yokuthenga (86), wathenga (82), athenge (81), ngithenge (52), bathenge (48), umthengi (46), abathengi (41), ethenga (41), bathenga (36), ukuthengisa (35), thenga (34), thengizwe (33), kathengizwe (32), ethengisa (31), uthenga (29), sithenge (29), athengise (28), uyothenga (26), bethenga (23), ukuyothenga (22), kothenga (21), ngokuthengisa (21), uthengise (20), bayothenga (19), wathengisa (19), akuthengele (18), bathengisa (18), ukuzithengela (18), abazothenga (17), kuthengwa (17), okuthenga (17), uthengisa (17), uzothenga (17), wayithenga (17), zithengwe (16), umthengeli (16), nokuthenga (16), ayothenga (16), athengwa (15), ngiyothenga (15), abathenga (15), umthengisi (14), abathengisi (14), bezothenga (14), ngathenga (14), zithengwa (13), uyithenge (13), ngokuthenga (13), ngayithenga (13), amthengele (13), ithengwe (13), nokuthengisa (12), ethengwa (11), ngizothenga (11), ethenge (11), beyothenga (11), alithenge (11), abathengisa (11), eyothenga (11), nothengizwe (10), ukuthengisela (10), ngithenga (10), lithengwe (10), ezithengwa (10), ekuthengiseni (10), ekuthengeni (10), azithenge (10), ungithengele (10)

The lexicographer has to find ways to lemmatize these complex paradigms.
8 Lexicographic Traditions and Lemmatization Problems
It is common knowledge that in order to find the meaning of a word, the user has to look up the lemma. Finding the lemma is relatively unproblematic for languages such as English, German or French if the user knows the alphabet and has a reasonable amount of dictionary skill. The user should know, for example, that given the paradigm walk, walks, walked and walking the lemma is most likely to be walk. Lemmatization concerns both the user and the lexicographer because the lexicographer has to decide on a particular lemmatization strategy that the user will understand. In contrast to, for example, English, German or French, lemmatization in African languages has proved to be a major stumbling block, as will be illustrated below.
Two main lexicographic traditions are found in African language dictionaries: a word tradition of lemmatization, developed for the disjunctively written languages, and a stem tradition, developed for the conjunctively written ones. In a simplified way it could be stated that in the word tradition lexicographers have to decide whether to lemmatize singular and plural forms of nouns, or only singulars. In the stem tradition, lexicographers have to decide whether to lemmatize noun stems with or without their prefixes.

(5) Word lemmatization (Sepedi)
    motho   human being
    batho   human beings

(6) Stem lemmatization without prefixes (isiZulu)
    -ntu (umu-, aba)   human being, human beings

    Stem lemmatization with prefixes (Siswati)
Figure 4.7.2 -khosi in Concise SiSwati Dictionary (CSD) (column 1) with alternative suggested alignment (column 2)

Identifying the stem has been the major problem for lexicographers and users in the compilation and use of dictionaries for these languages (cf. Gouws and Prinsloo 2005b). In conjunctively written languages such as the Nguni languages isiZulu, isiNdebele, Siswati and isiXhosa, most word forms contain verbal or nominal stems with prefixes and suffixes, written as one orthographic word. Compare, for example, the paradigm for the verbal suffixes and the extract of occurrences of thenga in example (4) above. This consequently results in very long words; the average word length of isiZulu words in the PZC is 6.93 letters, and substantial linguistic knowledge of the language is required to identify the stem for lookup. Consider lemma identification for a number of randomly selected orthographic words in Table 4.7.3.
Table 4.7.3 Orthographic word versus lemma in isiZulu

Orthographic word   Lemma
kwezinsukwana       -sukwana
nasemfuleni         -fula
ngamazwe            -zwe
ngokuphindwa        -phinda
owayelokhu          (-)lokhu
phindela            -phinda
ukhathazekile       -khathazeka
ukhulumela          -khulumela
ukwenzenjani        -enza
wokuhlabelela       -hlabelela
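As a rough illustration of why stem identification needs linguistic knowledge, the sketch below checks, for each pair in Table 4.7.3, whether the bare stem appears unchanged somewhere inside the orthographic word. The helper stem_is_visible is a hypothetical name, and plain substring matching is an assumption made for illustration only; it is not a lemmatization method proposed in the chapter.

```python
# Word/lemma pairs copied from Table 4.7.3.
PAIRS = [
    ("kwezinsukwana", "-sukwana"),
    ("nasemfuleni", "-fula"),
    ("ngamazwe", "-zwe"),
    ("ngokuphindwa", "-phinda"),
    ("owayelokhu", "(-)lokhu"),
    ("phindela", "-phinda"),
    ("ukhathazekile", "-khathazeka"),
    ("ukhulumela", "-khulumela"),
    ("ukwenzenjani", "-enza"),
    ("wokuhlabelela", "-hlabelela"),
]


def stem_is_visible(word: str, lemma: str) -> bool:
    """Hypothetical check: does the bare stem occur unchanged inside the word?"""
    stem = lemma.strip("-()")  # drop the hyphen and brackets used in the lemma column
    return stem in word


visible = [word for word, lemma in PAIRS if stem_is_visible(word, lemma)]
hidden = [word for word, lemma in PAIRS if not stem_is_visible(word, lemma)]

print(len(visible), "stems can be read off the word form")
print(len(hidden), "stems are masked by suffixes or sound changes:", hidden)
```

For the ten words in the table, naive matching recovers the stem in only half of the cases; in forms such as nasemfuleni or ukhathazekile the final vowel of the stem has been replaced, so the user must undo that change before looking the word up.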
In the compilation of dictionaries where the stem forms of words are lemmatized, instances exist where it is extremely problematic to identify the stem. Van Wyk (1995) states the worst case in relation to nouns of class 9 such as impala ‘impala’, impilo ‘health, life’, intombi ‘young girl’ where, as a result of certain changes, neither the lexicographer nor the user can identify the stem for lemmatization and lookup purposes. Mtuze (1992), in reference to nouns of Class 9 and Class 10 in isiXhosa, bluntly states that one never knows how these nominals were lemmatized.

A perception that stem lemmatization is somehow superior to word lemmatization, for instance as the more scientific option, gained such momentum that a number of stem dictionaries were compiled for disjunctively written languages as well. Van Wyk (1995) strongly condemns this perception and is supported by Prinsloo and De Schryver (1999) and Gouws and Prinsloo (2005a), who point out that the stem approach is not only user-unfriendly but also unnecessarily introduces difficulties regarding stem identification in disjunctively written languages. So, for example, it is unnecessary to expect knowledge of the nominal class system from the user in order to look motho up under T instead of M. Bennett (1986), as quoted by De Schryver (2010: 163), rightly points to the complexity of nouns and verbs in African languages and asserts that stem identification can be problematic.

There has been debate as to the proper arrangement of the African language lexicon, and the question is far from settled. The inflection of nominals and verbals by means of prefixes, and the complex and productive derivational system, both characteristic of African languages, pose difficulties [ . . . ] If items are alphabetized by prefix [ . . . ] a verb will be listed far from its
nominal derivations, however transparent these may be. [ . . . ] A competing school arranges the lexicon by stem or root; this usefully groups related items, and saves on cross-referencing. Unfortunately, in such a system the user must be able to identify the stem, which given the sometimes complex morphophonemics of African languages may not be easy. (Bennett 1986: 3–4)

Consider the practical example of the user’s dilemma in Table 4.7.4, where he/she first has to disentangle the word in order to find the stem for lookup and then has to add up the semantic connotations that each of the affixes conveys, in order to reconstruct the meaning of the word. For many years lexicographers believed that stem lemmatization was the best option for conjunctively written languages, but this tradition was broken with the publication of the Oxford Bilingual School Dictionary: Zulu and English (OZSD), where word lemmatization was followed for a conjunctively written language. Although OZSD eased the burden of stem identification, the adding or stripping of affixes in order to find the word form for lookup is as challenging to the user as stem identification is (cf. Prinsloo [2011] for a detailed discussion). This unfortunately means that the problem of stem identification is simply replaced by the challenge of identifying word forms. Word versus stem lemmatization is discussed in more detail in Prinsloo (1994), Prinsloo and Gouws (1996), Prinsloo and De Schryver (1999) and Prinsloo (2009). The complex interplay between lexicographic traditions and lemmatization approaches is briefly discussed in the following paragraphs.
Table 4.7.4 Finding the meaning of dikokobotšišano

Step   Form or meaning                                                                 Analysis
1      dikokobotšišano                                                                 plural deverbative consisting of root + causative + reciprocal + ending
2      kokobotšišano                                                                   singular deverbative consisting of root + causative + reciprocal + ending
3      kokobotšišana                                                                   verb root + causative + reciprocal + ending
4      kokobotšiša                                                                     verb root + causative + ending
5      kokobotša                                                                       verb (stem)
6      pull off leaves (from mealie-cob)                                               meaning of the verb
7      cause to pull off leaves (from mealie-cob)                                      add causative sense of ‘let/force’
8      cause each other to pull off leaves (from mealie-cob)                           add reciprocal sense of ‘each other’
9      the process of causing each other to pull off leaves (from mealie-cob)          nominalise: ‘the process of . . .’ (singular)
10     the processes of causing each other to pull off leaves (from mealie-cob)        change ‘the process of . . .’ to the plural
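The ten steps of Table 4.7.4 can be traced in a small sketch: first peel the affix layers off the word until the stem is reached, then rebuild the gloss by adding each layer's sense in reverse order. Everything below is hard-coded for this one word; the names LAYERS, STEM_GLOSS and unpack are hypothetical, and the string slicing is an assumption that happens to work for dikokobotšišano, not a general morphological analyser for Sepedi.

```python
import unicodedata

# The only stem this toy "dictionary" knows (step 6 of Table 4.7.4).
STEM_GLOSS = {"kokobotša": "pull off leaves (from mealie-cob)"}

# One entry per affix layer, outermost first:
# (label, how to undo the layer on the form, how to add its sense to the gloss).
LAYERS = [
    ("plural prefix di-",
     lambda w: w[2:],
     lambda g: g.replace("the process", "the processes", 1)),
    ("deverbative ending -o",
     lambda w: w[:-1] + "a",
     lambda g: "the process of " + g.replace("cause", "causing", 1)),
    ("reciprocal -an-",
     lambda w: w[:-3] + "a",
     lambda g: g.replace("cause to", "cause each other to", 1)),
    ("causative -iš-",
     lambda w: w[:-3] + "a",
     lambda g: "cause to " + g),
]


def unpack(word):
    """Return the forms of steps 1-5 and the glosses of steps 6-10."""
    forms = [unicodedata.normalize("NFC", word)]
    for _label, undo, _add_sense in LAYERS:
        forms.append(undo(forms[-1]))
    gloss = STEM_GLOSS[forms[-1]]          # look the stem up
    glosses = [gloss]
    for _label, _undo, add_sense in reversed(LAYERS):
        gloss = add_sense(gloss)           # re-apply senses, innermost layer first
        glosses.append(gloss)
    return forms, glosses


forms, glosses = unpack("dikokobotšišano")
for step, line in enumerate(forms + glosses, start=1):
    print(step, line)
```

The sketch makes the user's burden explicit: four form operations and four meaning operations surround a single dictionary lookup.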
9 Lemmatization Approaches
Lexicographic traditions and lemmatization approaches go to the heart of African language lexicography. Hartmann (1983: 51) says that ‘one of the basic problems of lexicography is to decide what to put in the dictionary and what to exclude’. For lemmatization this means that the lexicographer should have a selection strategy for inclusion or omission of lemmas.
9.1 The Traditional Approach
What can be considered the traditional approach is a situation where a dictionary compiler adds words to the dictionary on intuition, i.e. without any selection strategy such as the consideration of word frequencies in an electronic corpus. On the one hand, many words likely to be looked for by the target users are not included simply because they did not cross the compiler’s path, and on the other hand, many words unlikely to be looked for are included and merely take up precious dictionary space. This is particularly evident in restricted or even pocket-size dictionaries, where the compiler was allowed only a few thousand entries. De Schryver and Prinsloo (2000a) compared the lemma lists of a number of dictionaries compiled on intuition and highlighted obvious omissions of lemmas from these dictionaries, which occurred simply because the words did not come to mind at the time.
9.2 The Paradigm Approach
The paradigm approach could be described as an urge to physically include all derivations either as lemmas or as sub-lemmas. The lexicographer meticulously lemmatizes all sub-lemmatic forms with their derived forms. So, for example, the lexicographer would lemmatize the verb stem as a point of departure and each derivation as a sub-lemma. Consider example (7) and Figure 4.7.3, where Groot Noord-Sotho-woordeboek, Noord-Sotho–Afrikaans/Engels (GNSW) follows this strategy by giving the perfect, the passive, and the perfect and passive combined, for the root and for each of its derivations:

(7) root + perfect, root + passive, root + perfect and passive
    root + agal + perfect, root + agal + passive, root + agal + perfect and passive
    root + agal + el + perfect, root + agal + el + passive, root + agal + el + perfect and passive
    root + agal + iš + perfect, root + agal + iš + passive, root + agal + iš + perfect and passive
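The pattern in example (7) is a cross-product of extension chains and endings, which is why the number of sub-lemmas grows so quickly under the paradigm approach. A minimal sketch follows, assuming only the four extension chains and three endings shown in (7); the names EXTENSION_CHAINS, ENDINGS and paradigm_slots are illustrative and do not come from GNSW.

```python
from itertools import product

# Extension chains and endings exactly as listed in example (7).
EXTENSION_CHAINS = [(), ("agal",), ("agal", "el"), ("agal", "iš")]
ENDINGS = ["perfect", "passive", "perfect and passive"]


def paradigm_slots():
    """Enumerate every sub-lemma slot the paradigm approach would have to fill."""
    return [
        " + ".join(("root",) + chain + (ending,))
        for chain, ending in product(EXTENSION_CHAINS, ENDINGS)
    ]


slots = paradigm_slots()
for slot in slots:
    print(slot)
print(len(slots), "slots for a single root")
```

Each additional extension chain adds three more slots per root, each of which needs its own semantic treatment.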
Figure 4.7.3 A section of the article for dira in GNSW

This in principle is reassuring to the user, who at least finds the word in the dictionary, but the sheer magnitude of the task becomes a problem for the lexicographer and can easily result in negligence in terms of sufficient semantic information. So, for example, the compilers ‘forgot’ to give the meanings of phefa and all of its derivations in Figure 4.7.4.
Figure 4.7.4 The article for phefa in GNSW

9.3 The Rule-Orientated Approach
Rule-orientated dictionaries succeed in combating redundancy by not lemmatizing plural forms of nouns or verbal derivations. In an attempt to help the user they provide sets of derivation rules which, if applied correctly, should guide the user to the singular forms of nouns or the stem forms of verbs in order to look them up. Compare examples (8) and (9) as extracts of rules guiding the user from the derived verb or the plural noun respectively, given in Pukuntšu woordeboek, Noord-Sotho – Afrikaans, Afrikaans – Noord-Sotho (PUKU 2).

(8) -êtše:   -êla,     ex.   rapetše       under   rapêla
             -ala,     ex.   robetše       under   rôbala
    -itšê:   -ša,      ex.   bešitše       under   beša
             -tšha,    ex.   bontšhitše    under   bôntšha
             -sa,      ex.   lesitše       under   lesa
             -tswa,    ex.   hlatswitše    under   hlatswa

(9) mabj-,    bj-     ex.   mabjang     under   bjang
    mabo-,    bo-     ex.   mabothata   under   bothata
    me-,      mo-     ex.   mello       under   mollo
    meb-,     mm-     ex.   mebutla     under   mmutla
    mef-,     mph-    ex.   mefoma      under   mphoma
    mengw-,   ngw-    ex.   mengwaga    under   ngwaga
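Rules of the kind shown in (9) are in effect prefix rewrite rules, and the burden they place on the user can be sketched as follows. The rule pairs are copied from example (9); the function name singular_candidates and the choice to try the longest matching prefix first are assumptions made for illustration, not part of PUKU 2.

```python
# Plural prefix -> singular prefix, copied from example (9).
PLURAL_RULES = [
    ("mabj", "bj"),
    ("mabo", "bo"),
    ("me", "mo"),
    ("meb", "mm"),
    ("mef", "mph"),
    ("mengw", "ngw"),
]


def singular_candidates(plural: str) -> list[str]:
    """Apply every matching rule, longest plural prefix first (an assumed heuristic)."""
    matches = [(p, s) for p, s in PLURAL_RULES if plural.startswith(p)]
    matches.sort(key=lambda rule: len(rule[0]), reverse=True)
    return [singular + plural[len(prefix):] for prefix, singular in matches]


for plural in ["mabjang", "mabothata", "mello", "mebutla", "mefoma", "mengwaga"]:
    print(plural, "->", singular_candidates(plural))
```

Note that a form such as mebutla matches two rules (meb- and me-), so the user, like the code, has to try more than one candidate before finding the lemma mmutla.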
9.4 The Frequency Approach
The frequency approach simply means that a cut-off point is determined for selecting words from a frequency list culled from an electronic corpus. So, for example, from the paradigm of dira in the Pretoria Sepedi Corpus (PSC) the forms dira, diragala, diragalela, diragatša, diragetše and dirang could be selected, taking a certain cut-off point, as has been done in Popular Northern Sotho Dictionary, Northern Sotho – English, English – Northern Sotho (POP) and Oxford Bilingual School Dictionary: Northern Sotho and English / Pukuntšu ya Polelopedi ya Sekolo: Sesotho sa Leboa le Seisimane (ONSD).

(10) dira (8,535), diradira (2), diraga (1), diragaditšwe (1), diragala (74), diragalago (30), diragalang (3), diragale (7), diragalegago (1), diragalela (22), diragalelago (13), diragalele (2), diragaletše (3), diragaletšego (11), diragaletšwego (1), diragalo (1), diragatsa (2), diragatša (60), diragatšago (6), diragatše (1), diragatšego (1), diragatšwa (41), diragatšwago (16), diragatšwe (3), diragatšwego (3), diragela (4), diragelago (1), diragelelago (1), dirageletšego (2), diragelwa (2), diragelwe (1), diragetše (40), diragetšego (60), diragetšeng (2), diragetšwago (1), diragetšwe (2), dirago (1,463), dirala (1), diranago (3), diranang (3), dirane (3), diranego (3), diraneng (1), dirang (496)
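A cut-off over the counts in example (10) can be sketched in a few lines. The frequencies are copied from (10); the cut-off value of 20 is a hypothetical choice made here for illustration, since the text does not state which cut-off POP or ONSD used, and select_lemmas is an illustrative name.

```python
# Corpus frequencies for the paradigm of dira, copied from example (10).
DIRA_COUNTS = {
    "dira": 8535, "diradira": 2, "diraga": 1, "diragaditšwe": 1,
    "diragala": 74, "diragalago": 30, "diragalang": 3, "diragale": 7,
    "diragalegago": 1, "diragalela": 22, "diragalelago": 13, "diragalele": 2,
    "diragaletše": 3, "diragaletšego": 11, "diragaletšwego": 1, "diragalo": 1,
    "diragatsa": 2, "diragatša": 60, "diragatšago": 6, "diragatše": 1,
    "diragatšego": 1, "diragatšwa": 41, "diragatšwago": 16, "diragatšwe": 3,
    "diragatšwego": 3, "diragela": 4, "diragelago": 1, "diragelelago": 1,
    "dirageletšego": 2, "diragelwa": 2, "diragelwe": 1, "diragetše": 40,
    "diragetšego": 60, "diragetšeng": 2, "diragetšwago": 1, "diragetšwe": 2,
    "dirago": 1463, "dirala": 1, "diranago": 3, "diranang": 3,
    "dirane": 3, "diranego": 3, "diraneng": 1, "dirang": 496,
}

# The six forms the text names as candidates for selection.
NAMED_IN_TEXT = {"dira", "diragala", "diragalela", "diragatša", "diragetše", "dirang"}


def select_lemmas(counts, cutoff):
    """Return the forms whose frequency reaches the cut-off, most frequent first."""
    chosen = [form for form, n in counts.items() if n >= cutoff]
    return sorted(chosen, key=lambda form: -counts[form])


selected = select_lemmas(DIRA_COUNTS, cutoff=20)  # 20 is an assumed cut-off
print(selected)
print(sorted(set(selected) - NAMED_IN_TEXT))  # forms above the cut-off but not named in the text
```

With this assumed cut-off, all six forms named in the text are selected, together with four further forms (diragalago, diragatšwa, diragetšego and dirago). A raw threshold is therefore only a first filter; the published selections presumably also reflect editorial criteria that the frequency list alone does not capture.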
10 Issues Regarding Alphabetical Ordering
One of the reasons why advocates of the stem tradition are against the lemmatization of the full forms of nouns is that certain alphabetical stretches into which nominal prefixes fall will be very large. In the OZSD for example, i- takes
up 62 pages, representing 23.5 per cent, that is, almost a quarter, of the dictionary, u- 40 pages (15.2%) and a- 14 pages (5.3%). However, users are unlikely to find this disturbing. In the Collins COBUILD English Dictionary (COBUILD), the alphabetical stretch CON- is almost 30 pages long and to the best of our knowledge no complaints have been voiced in this regard.

A second argument against the lemmatization of the full forms of nouns pertains to the lemmatization of plural forms of nouns. It is argued that this takes up too much space in the dictionary and results in overuse of the mediostructure (cross-referencing system), because all such lemmas function as cross-references to the singular forms.

African language lexicographers annoy their users when they deviate from an ordinary alphabetical sorting order for some ‘grammatical’ reason. GNSW deviates from an ordinary alphabetical sorting of the lemmas by utilising a phonemic one, namely: A, B, BJ, D, E, F, FS, FŠ, G, H, HL, I, J, K, KG, KH, L, M, N, NG, NX, NY, O, P, PH, etc., because this is in the compilers’ opinion ‘more scientific’. Even though this deviation can be motivated in terms of sound grammatical considerations, it is user-unfriendly. In the category B, for example, lemmas like bolela and bua are sorted before bjale. The same holds true for diacritic signs: the entire paradigm of ‘s’ is presented before ‘š’, which means that swina comes before ša in GNSW. For Sepedi, for example, the alphabetical ordering of uppercase and lowercase S/s versus Š/š in combination with ê/e and ô/o should be clearly determined. The recommended ordering for this paradigm is se → Se → sê → Sê → še → Še → šê → Šê. In principle the golden rule should be to keep the access alphabet as close to the ordinary alphabet as possible.
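One way to realise the recommended ordering in an electronic dictionary is a multi-level sort key: compare the bare letters first, and use the caron, the circumflex and capitalisation only as successive tie-breakers. The sketch below is one possible reading of the golden rule, under that assumption; the function name access_key is hypothetical, and the levels are chosen so that the eight forms in the text come out in the recommended order.

```python
import unicodedata

CARON = "\u030c"       # combining caron, as in š
CIRCUMFLEX = "\u0302"  # combining circumflex, as in ê and ô


def access_key(word):
    """Sort key: bare letters first, then caron, then circumflex, then capitals."""
    base, carons, circumflexes, capitals = [], [], [], []
    for ch in unicodedata.normalize("NFC", word):
        parts = unicodedata.normalize("NFD", ch)  # letter followed by its diacritics
        base.append(parts[0].lower())
        carons.append(CARON in parts)
        circumflexes.append(CIRCUMFLEX in parts)
        capitals.append(parts[0].isupper())
    return ("".join(base), carons, circumflexes, capitals)


shuffled = ["Šê", "sê", "Še", "se", "šê", "Se", "še", "Sê"]
print(sorted(shuffled, key=access_key))
print(sorted(["swina", "ša"], key=access_key))
print(sorted(["bolela", "bua", "bjale"], key=access_key))
```

Under this key ša is filed before swina and bjale before bolela and bua, that is, where a user who knows only the ordinary alphabet would look, in contrast to the phonemic ordering described for GNSW.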
11 Tonal Indication
For African languages guidance in respect of tone is crucial. Louwrens (1994) in his Dictionary of Northern Sotho Grammatical Terms (DGT) describes tone as follows:

Tone is one of the distinctive features of the African language family . . . and in these languages differences in tone between words which have exactly the same shape, result in a difference in meaning. Two basic tones (also called tonemes) are usually distinguished, namely a high tone and a low tone, although more detailed distinctions are often drawn between, for example, rising and falling tones, mid, mid-high and mid-low tones, etc.
Consider the following example adapted from De Schryver and Kabuta (1998) (RCC), as quoted by Prinsloo and De Schryver (2000a: 301). Item 175 in RCC’s lemmatized frequency list is cilamba. Without tonal indication, this form could mean any of the ten possibilities listed in (11).

(11) cilamba   [7/8]                                        ‘bridge’
     cilàmbà   [7/8]                                        ‘piece of cloth’
     cìlamba   [7/8]                                        ‘creeper’
     cilâmba   [NP7 + -lamba]                               ‘crept’
     cilàmba   [NP7 + -làmba]                               ‘cooked’
     cìlamba   [SC7 + -lamba]                               ‘(it) creeping’
     cìlàmba   [SC7 + -làmba]                               ‘(it) cooking’
     cìlambà   [POT of -lamba with object from class 7]     ‘let it (her/him) creep’
     cìlambà   [POT of -làmba with object from class 7]     ‘let it (her/him) cook’
     cìlàmbà   [IMP of -làmba with object from class 7]     ‘cook it’
     with: NP# = nominal prefix of class # | SC# = subject concord of class # | POT = potential mood | IMP = imperative mood

Exact tone distinction in African languages is a complex issue, as the following extract from Zulu-English Dictionary (ZED), page xi, clearly illustrates.

The Zulu speaker employs a nine-tone system; that is to say, his range of tones in speech covers nine different pitches. These nine tone points cannot be indicated in musical notation, for they depend upon relative and not absolute height. The intervals between the notes are the important things. The whole range is generally slightly above an octave, with a man much lower in the scale than with a woman. No satisfactory method of recording the tones of Zulu words has yet been devised, . . . 1 to 9 have been used to indicate the tone heights of the various syllables.
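Returning to example (11), the effect of leaving tone unmarked can be shown mechanically: once the tone diacritics are removed, all ten readings fall together under one lookup form. The sketch below copies the ten forms as printed in (11); strip_tone_marks is a hypothetical helper, and removing every combining mark is a simplification that would also delete non-tonal diacritics in other orthographies.

```python
import unicodedata

# The ten readings of example (11), with tone marked by diacritics as printed.
READINGS = [
    "cilamba", "cilàmbà", "cìlamba", "cilâmba", "cilàmba",
    "cìlamba", "cìlàmba", "cìlambà", "cìlambà", "cìlàmbà",
]


def strip_tone_marks(word: str) -> str:
    """Remove all combining marks (Unicode category Mn) after decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


toneless = {strip_tone_marks(word) for word in READINGS}
print(toneless)
```

An electronic dictionary that indexes only the toneless form therefore has to present all ten senses for a single query and rely on the user, or on context, to choose between them.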
Figure 4.7.5 Tonal indication in ZED (ZED: xi)
First, tone can be indicated by means of capital letters for high tone versus lower case letters for low tone. Secondly, tone can be represented diacritically, as in (11). Thirdly, an analogue representation of tones can be used, e.g. ulubola [_-_-]; –lwamutwe [-_-] (examples from iciBemba cited by Mann (1990: 47)). Snyman et al. (1990) distinguish between high [‘], low [unmarked] and falling [-] tone in the Dikišinare ya Setswana (DS).
12 Simultaneous Feedback
The concept of Simultaneous Feedback, originated by De Schryver, is aimed at determining users’ needs more accurately. The need for simultaneous feedback arises in reaction to the traditional practice of submitting users of a dictionary to a series of tests, to monitor their success in retrieving information from the dictionary, only after the dictionary has been completed and published. Valuable feedback is often received in this way but unfortunately comes too late and can only be considered in the next or revised edition of the dictionary. Simultaneous feedback comprises feedback from the users while the compilation of the dictionary is still in progress. Detailed discussions can be found in De Schryver and Prinsloo (2000b and 2000c).
13 Rulers and Block Systems
A general problem in lexicography, and thus also in the compilation of dictionaries for African languages, is the balance between alphabetical categories in the dictionary. It is often the case that the lexicographer starts off enthusiastically and spends too much time on the alphabetical stretch A before moving on to B. Prinsloo and De Schryver designed so-called Lexicographic Rulers for a number of African languages to serve as instruments indicating the natural balance that exists in each language between alphabetical stretches – given the fact that not all alphabetical categories in a given language contain an equal number of words. So, for example, the alphabetical stretch S in English is fairly large (211 pages in Macmillan English Dictionary for Advanced Learners [MED]) but V is relatively small (18 pages). In Sepedi, the category M is fairly big (17% of the dictionary) while the category C is almost empty. The calculation principle for lexicographic rulers is simple. Alphabetical lists are culled from the corpus and a percentage breakdown of each of the stretches is calculated. Prinsloo and De Schryver went further and presented the rulers in terms of a percentage breakdown, that is, in terms of 100 equal blocks across the alphabet. Consider, for example, the Ruler and Block System for Setswana in Figures 4.7.6 and 4.7.7 in Prinsloo (2004: 165).
Figure 4.7.6 A ruler for Setswana
Figure 4.7.7 A block system for Setswana

This percentage block can assist the lexicographer in terms of pacing the dictionary compilation in respect of time, and balance in terms of lemma and page allocation. So, for example, the alphabetical substretch ‘50 MALE’ indicates the halfway mark in terms of the number of lemmas, number of pages as well as the allocated time for the compilation or revision of the dictionary.
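The calculation behind a ruler and a block system can be sketched as follows. The ten-item list is a toy sample made up of Sepedi words that occur elsewhere in this chapter, not a corpus-derived list, so the percentages are purely illustrative; ruler and block_marks are hypothetical names, and labelling each block by the first four letters of the lemma that opens it is an assumption modelled on the ‘50 MALE’ label.

```python
from collections import Counter

# Toy lemma list (words taken from examples in this chapter, NOT a real corpus list).
LEMMAS = ["motho", "batho", "monna", "puku", "reka",
          "dira", "mollo", "mmutla", "bjang", "bothata"]


def ruler(lemmas):
    """Percentage of lemmas falling under each initial letter."""
    counts = Counter(lemma[0].upper() for lemma in lemmas)
    total = sum(counts.values())
    return {letter: 100 * n / total for letter, n in sorted(counts.items())}


def block_marks(lemmas, blocks=100):
    """Label each of `blocks` equal stretches by the first four letters of its first lemma."""
    ordered = sorted(lemmas)
    return [ordered[(i * len(ordered)) // blocks][:4].upper() for i in range(blocks)]


print(ruler(LEMMAS))
halves = block_marks(LEMMAS, blocks=2)
print("halfway mark:", halves[1])
```

With a real frequency list and the default of 100 blocks, the label at index 50 would play the role of the ‘50 MALE’ mark described above.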
14 African Languages and the Electronic Era
Two decades ago the electronic era was met with great enthusiasm and expectations. Electronic dictionaries were expected to supersede paper dictionaries
in imaginative ways in a short period of time. So, for example, Meijs (1990) predicted the end of the paper dictionary by 2000. The availability of almost unlimited space and the processing speed of the computer, coupled with sophisticated computer technology set new horizons for lexicography. However more than two decades later the paper dictionary is still standing strong, although the momentum of the electronic dictionary seems to be picking up, fuelled by the explosion of information on the internet. The question is where African language lexicography stands in terms of paper versus electronic dictionaries and the information age. On the one hand it could be argued that given the less favourable socio-economic circumstances on the continent of Africa, there would be a stronger demand for paper dictionaries than for electronic dictionaries. Lack of dictionary skills and limited availability of computers strengthen such arguments. However, the popularity of cell phones and the ability to access the internet via cell phones could lead to increased use of electronic dictionaries. If lexicographers express dissatisfaction about the status and progress in terms of electronic dictionaries for major languages of the world, so much more do African language lexicographers. Sophisticated electronic dictionaries are needed for African languages in order to solve, among others, lemmatization problems, which cannot be resolved in paper dictionaries. With a few exceptions, currently available African language electronic dictionaries (a) are limited in size with only a small number of lemmas treated, (b) offer a limited number of data types and (c) often consist of little more than word lists with one or two translation equivalents. Consider the following examples in (12) and Figures 4.7.8 and 4.7.9. 
(12) Extract from cBold’s electronic dictionary for Tshivenda

lware              HH     n 11   razor
lwatsi             LH     n 11   blade of grass
lwayo              LL     n 11   human footprint
lwefha             LL     n 11   stench
lwendo             LL     n 11   journey
lwenzhe            LL     n 11   small torch, usu. of reed
lwetªo-lwetªo      LLLL   n 11   general contribution by everyone
maakhala           LLL    n 6    Orion’s belt constellation
maala              LL     n 6    ostrich feather headdress worn by warriors
maalo              LL     n 6    chief’s sleeping mat
maamba-ngavhuya    LL     n 1    slow, deliberate speaker

Source: www.cbold.ish-lyon.cnrs.fr/
Figure 4.7.8 Translation equivalents given for food in Lingala Source: www.African language-languages.com/en/dico.html
Figure 4.7.9 The treatment of medicine in Dicts.info Source: www.dicts.info/dictionary.php?l1=English&l2=Sesotho&word=medicine&Search=Search
Negligence is particularly visible in cases where African language dictionaries are included in sets of online dictionaries but do not contain any data. Dicts.info offers an English to Sesotho dictionary and lists no fewer than 55 empty bilingual dictionaries with Sesotho as source language (cf. Figure 4.7.10). Technically, these dictionaries are fully functional but empty shells. In contrast, the Online Lusoga Dictionary, isiZulu.net and the Swahili – English dictionary stand out as examples of good dictionaries. The Online Lusoga Dictionary gives elaborate treatment in terms of comment on form (pronunciation, grammatical information, etc.) and on semantics (sense distinction, paraphrase of meaning, etc.). isiZulu.net goes a long way in assisting the user with clear semantic guidance and morphological decomposition of complex isiZulu words. Finally, consider the Swahili – English dictionary as an example of a good African language electronic dictionary with a number of useful integrated features.
Figure 4.7.10 Extract from the list of bilingual dictionaries involving Sesotho in Dicts.info
Figure 4.7.11 Article for funda in the Online Lusoga Dictionary
According to its self-description, the dictionary contains more than 16,000 entries and phrases and more than 36,000 translation equivalents, with over 20,000 words and phrases in the English index. It can be used for lookup in either Swahili or English. It has a number of advanced features, such as a morphological-decomposition-based ‘Smart search’ for Swahili compound words. Furthermore, integration with Microsoft Word enables pop-ups when the cursor rests on a word in the text, as for muungano in Figure 4.7.14.
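The idea of a decomposition-based search can be illustrated with a minimal Python sketch. It is not the implementation used by the Swahili – English dictionary, whose internals are not described here. The lexicon, the affix lists and the function name smart_search are hypothetical, and the sketch covers only the simple pattern of subject prefix + tense marker + verb stem.

```python
# Hypothetical sketch of a decomposition-based lookup.
# The tiny lexicon and the affix lists are illustrative assumptions only.
LEXICON = {
    "soma": "read",
    "penda": "love, like",
    "habari": "news",
}

SUBJECT_PREFIXES = ("ni", "u", "a", "tu", "m", "wa")  # a subset of subject concords
TENSE_MARKERS = ("na", "li", "ta", "me")              # present, past, future, perfect


def smart_search(word):
    """Return (segments, gloss) for a word, or None if no analysis is found."""
    word = word.lower()
    # 1. Direct hit: the word itself is listed as a lemma.
    if word in LEXICON:
        return [word], LEXICON[word]
    # 2. Otherwise try to strip a subject prefix and a tense marker.
    for subject in SUBJECT_PREFIXES:
        if not word.startswith(subject):
            continue
        rest = word[len(subject):]
        for tense in TENSE_MARKERS:
            stem = rest[len(tense):]
            if rest.startswith(tense) and stem in LEXICON:
                return [subject, tense, stem], LEXICON[stem]
    return None
```

Under these assumptions, a user who types the inflected form ninasoma is led to the lemma soma, which would not be found by a plain string match on the lemma list. A realistic analyser would also need to handle object markers, negation, verbal extensions and noun class prefixes.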
Figure 4.7.12 Article for funda in isiZulu.net
Figure 4.7.13 The article of habari in Swahili – English Dictionary | Kamusi ya Kiswahili – Kiingereza Source: http://tshwanedje.com/dictionary/swahili/
Figure 4.7.14 Pop-up box for muungano through mouse-over in a text Source: http://tshwanedje.com/dictionary/swahili/
15 Conclusion
From the previous discussion it can be stated that African language lexicography faces many challenges and that it has a long way to go to be on a par with, for example, European or American lexicography in terms of the quality of dictionaries and the sophistication of their users. African language lexicography has to break free from a tradition of compiling basic dictionaries and follow the example of the excellent dictionaries referred to in this chapter as pockets of excellence. African language lexicographers not only have to reach a high lexicographic standard in the compilation of paper dictionaries; they also have to meet the challenges and opportunities of the corpus and the information era. African language lexicography has to solve its own unique problems in decisive and imaginative ways. Future challenges, but at the same time also major opportunities, can be found in the objectives of the Scientific e-Lexicography for Africa (SeLA) project with Southern Africa as a laboratory for electronic dictionaries, work on user guidance through decision trees (Prinsloo et al. 2011), adaptive lexicography (De Schryver 2003), new initiatives in linking dictionaries and corpora (Heid et al. 2012), etc.
References

Dictionaries
De Schryver, G.-M. and Kabuta, N. S. (1998) Beknopt Woordenboek Cilubà-Nederlands & Kalombodi-mfùndilu kàà Cilubà (RCC). Ghent: Recall.
De Schryver, G.-M. et al. (eds) (2008) Oxford Bilingual School Dictionary: Northern Sotho and English / Pukuntšu ya Polelopedi ya Sekolo: Sesotho sa Leboa le Seisimane (ONSD). Cape Town: Oxford University Press.
Doke, C. M. and Vilakazi, B. W. (1948) Zulu–English Dictionary, 1st edition (ZED). Johannesburg: Witwatersrand University Press.
Kriel, T. J. and Van Wyk, E. B. (1989) Pukuntšu Woordeboek, Noord-Sotho–Afrikaans, Afrikaans–Noord-Sotho (PUKU 2). Pretoria: J. L. van Schaik.
Kriel, T. J., Prinsloo, D. J. and Sathekge, B. P. (1997) Popular Northern Sotho Dictionary, Northern Sotho – English, English – Northern Sotho (POP). Cape Town: Pharos.
Louwrens, L. J. (1994) Dictionary of Northern Sotho Grammatical Terms (DGT). Pretoria: Via Afrika.
Macmillan English Dictionary for Advanced Learners (MED). CD-ROM. Macmillan Publishers Limited (2007). Text: A&C Black Publishers Ltd (2007).
Mojela, M. V. (ed.) (2007) Pukuntšutlhaloši ya Sesotho sa Leboa (PTLH). Pietermaritzburg: Nutrend.
Rycroft, D. K. (1982) Concise SiSwati Dictionary, Second impression (CSD). Pretoria: J. L. van Schaik.
Sinclair, J. (ed.) (1995) Collins COBUILD English Dictionary (COBUILD). London: HarperCollins.
Snyman, J. W., Shole, J. S. and Le Roux, J. C. (1990) Dikišinare ya Setswana – English – Afrikaans Dictionary/Woordeboek (DS). Pretoria: Via Afrika.
Ziervogel, D. and Mokgokong, P. C. M. (1975) Groot Noord-Sotho-woordeboek, Noord-Sotho–Afrikaans/Engels (GNSW). Pretoria: J. L. van Schaik.
Websites
cBold. Available at: www.cbold.ish-lyon.cnrs.fr/.
Dictionary of Congo-Brazzaville National Languages. Available at: www.African languagelanguages.com/en/dico.html
Dicts.info. Available at: www.dicts.info/
Institute for Kiswahili Research. Available at: http://portal.unesco.org/culture/en/ev.php-URL_ID=4046&URL_DO=DO_TOPIC&URL_SECTION=201.html or: www.iks.udsm.ac.tz/
isiZulu.net. Available at: http://isizulu.net.
Online Lusoga Dictionary. Nabirye, Minah et al. (ed.) (2012) e-Eiwanika ly’Olusoga. Kampala: Menha Publishers & Cape Town: TshwaneDJe HLT.
Pukuntšutlhaloši ya Sesotho sa Leboa ka Inthanete. Available at: http://africanlanguages.com/psl/
Swahili – English Dictionary | Kamusi ya Kiswahili – Kiingereza. Available at: http://tshwanedje.com/dictionary/swahili/
Other Sources
Atkins, B. T. S. (1998) Some discussion points arising from Afrilex-Salex ’98. Unpublished course evaluation document. Pretoria: University of Pretoria.
Atkins, B. T. S. and Varantola, K. (1998) Monitoring dictionary use. In: Atkins, B. T. S. (ed.) Using Dictionaries. Studies of Dictionary Use by Language Learners and Translators (Lexicographica. Series Maior 88). Tübingen: Max Niemeyer, 83–122.
Awak, M. K. (1990) Historical background, with special reference to western Africa. In: Hartmann, R. R. K. (ed.), 8–18.
Bennett, P. R. (1986) Grammar in the lexicon, two Bantu cases. Journal of African Languages and Linguistics 8/1, 1–30.
De Schryver, G.-M. (2003) Lexicographers’ dreams in the electronic-dictionary age. International Journal of Lexicography 16/2, 143–99.
De Schryver, G.-M. (2010) Revolutionizing African language lexicography – a Zulu case study. Lexikos 20 (AFRILEX-reeks/series 20), 161–201.
De Schryver, G.-M. and Kabuta, N. S. (1998) Beknopt woordenboek Cilubà – Nederlands & Kalombodi-mfùndilu kàà Cilubà. Recall Linguistics Series 12. Ghent: Recall.
De Schryver, G.-M. and Prinsloo, D. J. (2000a) Electronic corpora as a basis for the compilation of African-language dictionaries, Part 1: The macrostructure. South African Journal of African Languages 20/4, 291–309.
De Schryver, G.-M. and Prinsloo, D. J. (2000b) The concept of ‘Simultaneous Feedback’: towards a new methodology for compiling dictionaries. Lexikos 10, 1–31.
De Schryver, G.-M. and Prinsloo, D. J. (2000c) Dictionary-making process with ‘Simultaneous Feedback’ from the target users to the compilers. In: U. Heid, S. Evert, E. Lehmann and C. Rohrer (eds) Proceedings of the Ninth EURALEX International Congress, EURALEX 2000, Stuttgart, Germany, August 8th-12th, 2000. Stuttgart: Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart, 197–209.
Gouws, R. H. (1990) Information categories in dictionaries, with special reference to Southern Africa. In: R. R. K. Hartmann (ed.), 52–65.
Gouws, R. H. (2007) On the development of bilingual dictionaries in South Africa: aspects of dictionary culture and government policy. International Journal of Lexicography 20/3, 313–27.
Gouws, R. H. and Prinsloo, D. J. (2005a) Principles and Practice of South African Lexicography. Stellenbosch: African Sun Media.
Gouws, R. H. and Prinsloo, D. J. (2005b) Left-expanded article structures in Bantu with special reference to isiZulu and Sepedi. International Journal of Lexicography 18/1, 25–46.
Gouws, R. H. and Prinsloo, D. J. (2012) Establishing a dictionary culture inside the classroom for better dictionary use outside the classroom. Conference Booklet of the 17th International Conference of Afrilex. 2–5 July 2012. Pretoria: University of Pretoria, 27–8.
Hartmann, R. R. K. (1983) Lexicography: Principles and Practice. London: Academic Press.
Hartmann, R. R. K. (ed.) (1990) Lexicography in Africa, Exeter Linguistic Studies, Volume 15. Exeter: University of Exeter Press.
Heid, U., Prinsloo, D. J. and Bothma, T. J. D. (2012) Dictionary and Corpus Data in a Common Portal: State of the Art and Requirements for the Future. (Lexicographica. Series Maior). Berlin: De Gruyter.
Louwrens, L. J. (1994) Dictionary of Northern Sotho Grammatical Terms. Cape Town: Pharos Dictionaries.
Mann, M. (1990) The impact of computer technology with special reference to eastern Africa. In: R. R. K. Hartmann (ed.), 44–51.
Mbogho, K. (1985) Observations on bilingual lexicography involving African languages and Indo-European languages. Babel 31/3, 152–62.
Meijs, W. (1990) Morphology and word-formation in a machine-readable dictionary: problems and possibilities. Folia Linguistica 24, 45–71.
Mtuze, P. T. (1992) A critical survey of Xhosa lexicography 1772–1989. Lexikos 2, 165–77.
Prinsloo, D. J. (1994) Lemmatization of verbs in Northern Sotho. SA Journal of African Languages 14/2, 93–102.
Prinsloo, D. J. (2004) Revising Matumo’s Setswana – English – Setswana Dictionary. Lexikos 14, 158–72.
Prinsloo, D. J. (2009) Current lexicography practice in Bantu with specific reference to the Oxford Northern Sotho School Dictionary. International Journal of Lexicography 22/2, 151–78.
Prinsloo, D. J. (2011) A critical analysis of the lemmatisation of nouns and verbs in isiZulu. Lexikos 21, 169–93.
Prinsloo, D. J. and Gouws, R. H. (1996) Formulating a new dictionary convention for the lemmatization of verbs in Northern Sotho. South African Journal of African Languages 16/3, 100–7.
Prinsloo, D. J. and De Schryver, G.-M. (1999) The lemmatization of nouns in African languages with special reference to Sepedi and Cilubà. South African Journal of African Languages 19/4, 258–75.
Prinsloo, D. J., Heid, U., Bothma, T. J. D. and Faaß, G. (2011) Interactive, dynamic electronic dictionaries for text production. In: I. Kosem and K. Kosem (eds) Electronic Lexicography in the 21st Century: New Applications for New Users. Proceedings of eLex 2011. Ljubljana: Trojína, 215–20.
PSC: Pretoria Sepedi Corpus. University of Pretoria.
PZC: Pretoria Zulu Corpus. University of Pretoria.
Rundell, M. (2012) It works in practice but will it work in theory? The uneasy relationship between lexicography and matters theoretical. In: R. V. Fjeld and J. M. Torjusen (eds) Proceedings of the 15th Euralex International Congress, 7–11 August 2012, Oslo, 47–92.
Scientific e-Lexicography for Africa (SeLA) project. Available at: https://mailman.uni-hildesheim.de/mailman/listinfo/sela.
Van Wyk, E. B. (1995) Linguistic assumptions and lexicographical traditions in the African languages. Lexikos 5, 82–96.
4.8
Issues in Sign Language Lexicography Inge Zwitserlood, Jette Hedegaard Kristoffersen and Thomas Troelsgård
Chapter Overview
Introduction
Types of Sign Language Dictionaries
The Issue of Sign Rendering
Lemmatization
Lemma Information
Ordering and Searching
Future Developments
1 Introduction
Sign languages make use of the visual/manual modality and are articulated through combinations of handshapes, movements and placement in space or on the signer’s body. The phonemes1 in sign languages are articulated simultaneously for the most part, in contrast to the (mostly) sequential pronunciation of spoken language phonemes. Furthermore, sign languages include non-manual features such as facial expressions, mouth movements, eye-gaze, eye-blinks and movements of the upper body. Sign languages are full independent languages, each with their own lexicon, phonology, morphology and syntax, and are the native or first language of deaf people all over the world. To date, more than 130 sign languages have been identified (see Lewis 2009), although not all have
been studied to the same extent. As is the case for all languages, sign languages develop and change within the user community. Reflecting the visual modality of sign language has always been a challenge to the sign language lexicographer, although the use of electronic media facilitated by technological development has been a tremendous step forward. Many challenges still remain in the field, for instance the fact that no sign language, as far as we know, has a written representation commonly used by native signers for everyday reading and writing (see e.g. Brien and Turner 1994), as further exemplified in Section 3. Other problems which leave sign languages in a special position in comparison to many spoken (and written) languages are the lack of standardization, which for example complicates the identification of basic sign forms (see Section 4.2), and the lack of language resources (see Section 7). Both these problems result partly from the absence of a written standard and partly from the fact that sign language linguistics, including the building of corpora, is a relatively young discipline. In this chapter we will address some of the opportunities and difficulties that sign language lexicography faces if it is to reflect these languages comprehensively, that is, to be true to linguistic principles and to provide dictionary users with the best possibility of obtaining knowledge of the language in question. After a short overview of the history of sign language dictionaries in Section 2, the problem of how to represent a visual language in printed books will be discussed in Section 3, along with the possibilities provided to sign language lexicography by recently developed multimedia platforms. The lack of a written standard not only causes problems for sign rendering; it also challenges lemmatization.
Some of the main challenges are addressed in Section 4, followed by discussion of the issues in sign language lexicography concerning the task of deciding what should be handled by a grammar and what should be given separate entries in a dictionary. Section 5 gives an overview of the information types included in recent sign language dictionaries, and Section 6 illustrates some of the solutions to problems of ordering and searching lemmas in dictionaries of sign languages. In the last section we outline some of the possibilities for future development within sign language lexicography.
2 Types of Sign Language Dictionaries
Dictionaries come in several types, depending on the needs of the users they serve. Traditionally, users of sign language dictionaries are those who need signs or words to facilitate their signed communication with deaf people (e.g. educators), many of whom are L2 learners of a sign language. As a result, the earliest sign language dictionaries were printed bilingual dictionaries, that is, from a spoken to a signed language, such as the French–French Sign
Language dictionary from the late eighteenth century (Bonnal-Vergés 2008). They usually contained a (small) set of words and signs, and were basically glossaries, providing a single sign equivalent for each word and without information on meaning, use or grammatical characteristics. Through time, sign language dictionaries have incorporated more information with respect to the signs. As mentioned in Section 1, technical developments have recently made the creation of electronic dictionaries possible. In addition to showing signs and example sentences as video clips, such dictionaries facilitate (among other functions) bidirectional sign/word searching and clickable cross-referencing. Although there are specialized sign language dictionaries for particular topics, such as legal dictionaries and dictionaries on religion or health care, there is none of the variety of dictionaries that we find for spoken languages (in particular European languages), such as specialized dictionaries of etymology, synonymy or rhyme words, nor do monolingual sign language dictionaries exist.2 As sign languages are used in increasingly diverse contexts, a greater variety of sign language dictionaries will need to be created. In particular, native and highly fluent signers, who currently hardly use sign language dictionaries, are expected to request sophisticated features that can be facilitated by rapidly changing technical developments.
3 The Issue of Sign Rendering
Lemmas, translations and information about lemmas are generally represented in writing in spoken language dictionaries. The orthography of most spoken languages (Chinese being an exception) is alphabetic or syllabic, based on the phonology and/or phonetics of the language. Sign languages also have phonology (see e.g. Sandler 1989, Brentari 1998, Van der Kooij 2002), but, as mentioned in Section 1, they have no accepted form of orthography, although several attempts to capture formal features of signs into notation systems have been made (as will be seen below). As a result, compilers of sign language dictionaries have no standard way to represent signs. Different dictionaries make use of a range of different representation types. Drawings or photographs of a person articulating a sign are the most frequently used means of sign representation. This type of representation is found in most printed dictionaries. The holistic images are quite transparent, even for the naive dictionary user. As they are static representations, the dynamics of the sign (e.g. a movement of the hand or a change in the shape of the hand) need to be additionally represented by arrows and other symbols, picture sequences or multiple overlay images. An illustration of an entry with photographs of the sign can be found in Figure 4.8.1.
Figure 4.8.1 Representation of a sign in a (printed) Finnish Sign Language – Finnish dictionary (Malm 1998). Entry for the sign meaning ‘expensive’, ‘precious’, ‘valuable’. The initial and final hand configurations are shown, and the movement is indicated by an arrow. Copyright Malm (1998). Reprinted with permission of the editor.
The time-consuming construction of images and the amount of space each image takes explain why only sign translations of words or sign lemmas are provided, without further information or examples. Other forms of sign representation, found in electronic dictionaries on CD and DVD, and on the internet, include animated cartoons, avatars and, most frequently, video clips. The latter have the advantage of rendering sign dynamics, including non-manual information (e.g. facial expression). Example phrases and sentences for a sign can also be easily included. At the same time, this approach has disadvantages: the signal is brief (a dictionary user may need to view it several times); and it is not possible to view the individual signs in a sequence. For examples we refer the reader to the dictionaries of Australian Sign Language (Auslan) (www.auslan.org.au/dictionary/, Johnston 2009), Danish Sign Language (DTS) (www.tegnsprog.dk, Center for Tegnsprog 2008–12), Flemish Sign Language (VGT) (http://gebaren.ugent.be/, Van Herreweeghe et al. 2004), and German Sign Language (DGS) (www.sign-lang.uni-hamburg.de/alex/, Konrad et al. 2007b). The disadvantages of these representation types are overcome in a third type of representation: a notation system. Such systems consist of sets of symbols and rules for combination of these symbols which are used to describe the formal characteristics of signs, such as the shape of the hand(s), the place of articulation, and the movement(s) of the hand(s). Although this type of representation
is static, it captures the signs’ dynamics, takes little space, and facilitates ordering and searching. Nevertheless, notation systems are not very frequently used in dictionaries. Since such systems are not used by the average sign language user in day-to-day communication, many dictionary compilers do not consider it worthwhile to ask dictionary users to make the effort of acquainting themselves with such a system. Still, sign descriptions using such systems are sometimes found in addition to images, animations and videos, providing systematic information about the form of signs, and are sometimes used for ordering and searching of sign entries. The best-known systems are the phonologically based Stokoe notation system (Stokoe et al. 1965), the phonetic system HamNoSys (Prillwitz et al. 1989) and the SignWriting notation system (Sutton 2011), which was originally developed as an orthographic system but is not generally accepted or used as such. Other systems are used as well, for example the Swedish notation system (Bergmann and Björkstrand 1993) used in the Swedish Sign Language dictionary (Institutionen för Lingvistik 2009–11). Examples of combined sign representation through video clips and a notation system are shown below in Figures 4.8.2 and 4.8.3 (other details in the entries are left out due to space limitations).
Figure 4.8.2 Entry for the sign for ‘worker’ in the online bilingual Dutch-Flemish Sign Language/Flemish Sign Language-Dutch dictionary. Sign representation through video clip and SignWriting (Van Herreweeghe et al. 2004).
Figure 4.8.3 Entry for the sign for ‘to cut’ in the Online Health Care dictionary of German Sign Language. Sign representation through video clip and HamNoSys (Konrad et al. 2007b).
4 Lemmatization
4.1 Lemma Selection
Searchable sign language corpora have not been available until recent years (also see Section 7). For this reason the selection of lemmas for sign language dictionaries has typically been based on lists of words from spoken/written languages, for instance the basic vocabulary of a language (as in a children’s or learner’s dictionary), or a word list relating to a specific topic, such as food signs, colour signs and sports signs. This approach only works optimally if adequate sign equivalents exist (and are known to the lexicographer). Other sources have been lemma lists taken from existing sign language dictionaries or manual selection performed by sifting through video recordings. The drawback of these approaches, compared to a corpus-based approach, is that many signs will never be selected as lemmas, for instance signs that cannot be translated adequately into a single word or compound and which are therefore less likely to appear in sign lists based on word lists. With the use of sign language corpora as a tool for lemma selection, it will become easier to ensure that the selected vocabulary reflects the actual language use, for instance by including corpus frequency as a selection criterion.
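Corpus frequency as a selection criterion can be sketched in a few lines of Python. The sketch assumes that the corpus has already been annotated with sign glosses (ID-glosses), one per token. The gloss labels, the threshold and the function name select_lemmas are hypothetical and serve only as illustration.

```python
from collections import Counter


def select_lemmas(corpus_glosses, min_frequency=2):
    """Return (gloss, frequency) pairs that occur at least min_frequency times.

    corpus_glosses is a flat list of gloss labels, one per sign token.
    The result is sorted by descending frequency, then alphabetically.
    """
    counts = Counter(corpus_glosses)
    candidates = [(gloss, n) for gloss, n in counts.items() if n >= min_frequency]
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


# Hypothetical token list standing in for an annotated corpus.
tokens = ["HOUSE", "EAT", "HOUSE", "CAT", "EAT", "HOUSE"]
lemma_candidates = select_lemmas(tokens, min_frequency=2)
```

Because the count is made over sign tokens rather than over a word list of a spoken language, signs without a single-word translation equivalent can surface as lemma candidates as well. In practice frequency would be one criterion among several.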
4.2 Lemmatization Principles
Just as lemmatization principles can be problematic for spoken languages, they can be so when dealing with sign languages. A particular challenge is a
certain degree of non-fixedness of basic sign forms that results from the lack of a written standard (cf. Section 4.4.1). In the absence of an orthography, the lemmatization principles of a sign language dictionary must involve other areas such as phonology, semantics and etymology in order to set up rules for distinguishing between one type (and its tokens) and another. As regards phonology, many sign language dictionaries include handshape, orientation, place of articulation and movement as lemmatization parameters, while others also include non-manual features such as mouth movement.3 Semantics is often included, but the definition of the minimal semantic distance between two lemmas varies considerably across different dictionaries. Etymology is for many sign languages an unexplored area. Even large sign language dictionaries typically have considerably fewer entries than large dictionaries of written languages, regardless of how counts of entries are made. There are a number of reasons for this: signs in general tend to be more polysemous than words (at least in most European spoken languages); and the usage of signs is harder to account for because of the lack of a standardized written representation and language resources. In other words, many signs may exist, and even occur quite frequently, but not appear in any dictionary or other description of the language. Additionally, and also because of the lack of a standard written form, signs are not ‘preserved’ in texts year after year in a fixed form, as words are in books and on web pages; they appear only in video clips that are much fewer in number than written texts, and they are not searchable, apart from those that occur in recently developed sign language corpora.
4.3 Articulation Form Variants
As a part of the lemmatization process it is important to make a clear distinction between form variants and synonyms of a sign. For instance, the DTS sign for ‘hair’ can have at least two forms, as shown in Figure 4.8.4: the thumb and the index finger touch each other in both cases, but the other three fingers can be either bent or extended. Both forms are used and accepted in the language community and both forms seem to be evenly distributed across the community, sometimes even within idiolects. The lexicographer could treat the two forms as variants of one sign, or as synonyms. Native signers consider both forms in Figure 4.8.4 as variants of one sign. This in turn points to the need to decide the extent to which differences in the forms of signs require treatment as separate lemmas. Form variation is quite common in sign languages, as mentioned earlier, partly due to the lack of a written standard. One solution could be to allow for variation in one and only one of the major parameters: handshape, place of articulation,
Figure 4.8.4 Two variants of the sign for ‘hair’ in DTS Source: Illustrations from the DTS Dictionary (Center for Tegnsprog 2008–12).
movement or orientation (see Troelsgård and Kristoffersen [2008] for a discussion of this solution) within the same lemma. Another issue is very important once the distinction between synonyms and variants has been made: should form variants be shown side by side in the dictionary, that is, should the dictionary be descriptive or normative? If the goal for the dictionary is normative, an additional problem will be how to decide which variant should be preferred. The lack of large corpora makes these decisions very difficult.
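The one-parameter rule for form variants can be made explicit in a short Python sketch. It assumes that the two forms compared already share a meaning and that each form is coded for the four major parameters. The class SignForm, the function names and the parameter values are hypothetical labels, not the coding used in any actual dictionary.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SignForm:
    handshape: str
    location: str      # place of articulation
    movement: str
    orientation: str


def differing_parameters(a, b):
    """List the names of the major parameters in which two forms differ."""
    return [f.name for f in fields(SignForm) if getattr(a, f.name) != getattr(b, f.name)]


def same_lemma(a, b):
    """Treat two forms with the same meaning as variants of one lemma
    if they differ in at most one major parameter."""
    return len(differing_parameters(a, b)) <= 1


# Hypothetical coding of the two forms of the sign for 'hair':
hair_bent = SignForm("pinch, other fingers bent", "head", "short pull", "palm in")
hair_extended = SignForm("pinch, other fingers extended", "head", "short pull", "palm in")
```

Here the two forms differ in handshape only, so the rule places them in one entry. Two forms that differed in both handshape and location would be treated as separate lemmas, that is, as synonyms.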
4.4 Lemmatization Issues
Traditionally, sign lemmas have been defined as manual forms that have a meaning. In addition, they must also be pronounceable when occurring in isolation. That is, they should comprise at least a handshape, an orientation, and a place of articulation (e.g. in front of the signer’s body). Elements like bound roots, affixes and modifications do not meet these criteria: they do not appear in isolation (similarly to bound roots and affixes in spoken/written languages), and their meanings are often difficult to describe through spoken/written word equivalents. Additionally, the modifications are typically expressed solely by place of articulation, movement, or non-manual features, and are therefore impossible to render in the same way as regular headword signs. Therefore, consideration must be given to whether these elements should be lemmatized in dictionaries or should appear as inflectional/derivational information in the entries or in a grammar. In the following section we focus on different types of such elements, discussing whether they are
suitable as dictionary lemmas, and how they could be described in a dictionary (see also Johnston and Schembri 1999).
4.4.1 Classifier Predicates
A type of sign that occurs in almost all sign languages studied to date is the classifier predicate (sometimes referred to as a polymorphemic verb). These signs express the existence and configuration of entities in space and the movement of entities through space (see among others Supalla 1982, Tang 2003, Cuxac and Sallandre 2007, Zwitserlood 2012). They are productively created by combinations of meaningful units that are expressed by handshapes, places of articulation, and movements. Although there is an ongoing debate on the structure and nature of these signs,4 it is generally assumed that the placement or movement of the hand in a classifier predicate expresses location or motion morphemes, respectively, and that the handshape expresses the morpheme for an entity that is somewhere in space or that is in motion. An example from the Sign Language of the Netherlands (NGT) is shown in Figure 4.8.5. Classifier predicates are created in the course of signing and can have an infinite number of forms and meanings. They are highly productive and have an interpretation that is predictable from the meaning of the components. As far as we know, no dictionaries have lemmatized classifier predicates. Some dictionaries (e.g. Brien 1992, Tang 2007) describe classifier constructions in a separate grammar section, and list the classifier morpheme inventory using pictures or photos of the corresponding handshapes. Another approach, used in the DTS Dictionary (Center for Tegnsprog 2008–12), has been to treat the classifier handshapes as the roots of the classifier predicates and to provide separate entries for these elements, similar to affix entries (Kristoffersen and Troelsgård 2012). In contrast, representing classifier movements and place of articulation in isolation is simply not possible by means of pictures, photographs or videos of a signing person.
4.4.2 Numeral Incorporation
Another type of morphologically complex sign found in many sign languages is the ‘numeral incorporated’ sign. One type of numeral incorporation involves the parameterized, simultaneous expression of a numeral and a noun (e.g. Liddell 1996, Mathur and Rathmann 2010). Examples of numeral incorporated signs from DTS are the third and fourth signs in Figure 4.8.6, where the sign for ‘month’ is combined with the numerals ‘one’ and ‘three’, respectively. This process is generally limited to nouns that express enumerable quantities (e.g. time and currency units), and is not always fully predictable: for instance, within a single sign language, nouns are idiosyncratic with respect
Figure 4.8.5 Classifier predicate in NGT. Literally: ‘upright animate entity moves upwards near cylindrical entity’. In context: ‘cat goes up inside drainpipe’ (Crasborn et al. 2008, Creative Commons license BY-NC-SA).
Figure 4.8.6 Examples of numeral incorporation in DTS Source: All pictures are from the DTS Dictionary (Center for Tegnsprog 2008–12).
268
Issues in Sign Language Lexicography
to the numerals that can be incorporated. This process can be described in the grammar section of a sign language dictionary (including the restrictions), and the entries of nouns that allow combination with a numeral can provide this information along with other possible derivations of the noun. A similar process is found, though, where the numeral incorporated forms do not have a pronounceable basic noun root. These forms always express a quantified noun in context, and no citation form of a non-quantified noun is available. Yet, in these cases, a root (i.e. a non-incorporated form) must still be assumed, even though it is not pronounceable in isolation. Such root forms are considered to be bound morphemes and may be expressed by a movement and a place of articulation only. Like the movement in classifier predicates, these roots are difficult to represent in a sign language dictionary in which signs are represented with pictures, photographs or videos.5 The complex forms, clearly derived from the combination of a noun and a numeral, do not need to be lemmatized in dictionaries that provide the user with morphological information, but the forms that do not have a non-quantified noun root cannot be dealt with in that way. However, (in contrast to classifier predicates) the inventory of numeral incorporations of each root is finite, and some dictionaries do lemmatize every incidence of numeral incorporation, for instance the DTS Dictionary (Center for Tegnsprog 2008–12).
5 Lemma Information
As mentioned in Section 2, many sign language dictionaries have a rather simple structure, often simply consisting of a spoken/written language headword and a sign language equivalent rendered as a video clip, a photograph or a drawing. Similarly, dictionaries with signs as headwords often have one or more written translation equivalents as the only information about the lemma. Some sign language dictionaries, however, do include other types of information about sign lemmas. These can include:
• A formal notation of the sign form (cf. Section 3).
• A list of the prominent basic phonological features of the sign, such as handshape, place of articulation, and movement.
• Textual description of the sign pronunciation.
• Information about mouth movements if these typically accompany the sign.
• Information about form variants, for instance rendered as additional video clips.
• Part(s) of speech.
• Morphological information (inflection and derivation).
• Additional written translation equivalents.
• Description of the use of signs where the meaning or function cannot be rendered satisfactorily through a written language equivalent.
• Definitions of sign meanings.
• Cross-references to sign synonyms, antonyms, etc.
• Information about usage restrictions, like region or age-specific use of the sign.
• Example sentences.
Most commonly, only some of the above mentioned information types are included, and different sign language dictionaries choose different approaches for presenting the information. For instance, a sign language example sentence can be rendered through a video clip, as a series of photos or drawings, or in a formal notation, and it can be accompanied by a sign-by-sign transcription and/ or a translation into a written language. Because of the need for graphic or multimedia representation of signs, information that comprises sign representations (e.g. example sentences) consumes more screen space than a textual rendering of similar information in a dictionary of a spoken/written language, as mentioned in Section 3. Because of this, sign language dictionary entries tend to become visually heavy, and in electronic sign language dictionaries some information types are often relegated to a tab or a sub-page in order to save screen space. In Section 3 we have shown an example of a sign entry in a printed sign language dictionary; Figures 4.8.7 and 4.8.8 are examples of entries in online dictionaries. The entry in Figure 4.8.7 (New Zealand Sign Language [NZSL], McKee 2011) includes a line drawing, a video of the sign, additional information on the pronunciation of the sign, grammatical information and a usage example. Moreover, the example is also rendered as a literal textual translation, each word of which is linked to its entry. The sign entry from the DTS Dictionary (Center for Tegnsprog 2008–12) in Figure 4.8.8 describes a sign having three meanings: ‘minute’/‘second’; ‘time’ (n)/‘moment’, and ‘while’ (conj.), each of which can be viewed in an example sentence with a translation of these sentences. The example sentences are, moreover, transcribed sign by sign (as in the NZSL dictionary), with links to the entries of the signs. 
Among the information given is the possibility to see a form variant (with the second ‘play’ button above the video window), a cross-reference to a sign synonym for the meaning ‘minute’ (the ‘=’ in meaning 1), and acceptable mouthings that can accompany the sign6 (denoted by icons after the relevant Danish equivalents).
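The information types listed in this section could be gathered into a single entry record. The following is only a minimal sketch in Python, not the data model of any dictionary discussed here: every class and field name (`SignEntry`, `id_gloss`, `variant_clips` and so on), the gloss MINUTE and the file names are hypothetical. The three senses simply echo the meanings described for the entry in Figure 4.8.8.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sense:
    """One meaning of a sign with its written-language equivalents."""
    equivalents: List[str]
    example_clips: List[str] = field(default_factory=list)


@dataclass
class SignEntry:
    """Hypothetical record for one sign lemma; all field names are assumptions."""
    id_gloss: str
    video_clip: str
    handshape: str = ""
    place_of_articulation: str = ""
    movement: str = ""
    notation: str = ""                      # e.g. a formal notation string
    mouthings: List[str] = field(default_factory=list)
    variant_clips: List[str] = field(default_factory=list)
    parts_of_speech: List[str] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)
    usage_restrictions: List[str] = field(default_factory=list)

    def all_equivalents(self) -> List[str]:
        """Flatten the equivalents of all senses, keeping sense order."""
        return [eq for sense in self.senses for eq in sense.equivalents]


# Invented sample entry with three senses.
entry = SignEntry(
    id_gloss="MINUTE",
    video_clip="minute.mp4",
    variant_clips=["minute_variant.mp4"],
    senses=[
        Sense(["minute", "second"]),
        Sense(["time", "moment"]),
        Sense(["while"]),
    ],
)
print(entry.all_equivalents())
```

Keeping equivalents inside senses, rather than in one flat list, is one way to let an interface show each meaning with its own example sentences; other designs are equally possible.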
Figure 4.8.7 Example of a sign entry in the NZSL dictionary (McKee 2011)
Figure 4.8.8 Example of a sign entry in the DTS Dictionary (Center for Tegnsprog 2008–12).
6 Ordering and Searching
6.1 Lemma Ordering in Printed Dictionaries
Printed dictionaries of spoken/written languages typically order lemmas alphabetically, sometimes topically, with the exception of special purpose dictionaries like crossword dictionaries and rhyming dictionaries. For sign language dictionaries the ordering question is made more difficult because no sign language has a standard written form. Several different approaches to ordering have been chosen in printed sign language dictionaries: some dictionaries order their entries alphabetically according to sign glosses7 or word equivalents; others use features of all or some of the main phonological parameters of handshape, orientation, place of articulation and movement, prioritized in a fixed order, sometimes corresponding to a formal notation system like HamNoSys. For example, the signs in the Auslan dictionary (Johnston 1989) are ordered first by handshape, secondly by handedness8 and thirdly by place of articulation (starting with the head, then moving downwards). Stokoe et al. (1965), in contrast, order signs first by place of articulation (starting with the space in front of the signer, then moving downwards from the head), secondly by handshape and thirdly by handedness. Just as there are varying numbers of words starting with a given letter in spoken language dictionaries, the sections of a phonologically ordered sign language dictionary vary in size, as some handshapes and places of articulation occur more frequently than others. Sign language dictionaries that use a phonologically based ordering of lemmas typically also include an alphabetic index of word equivalents. Similarly, some dictionaries with an alphabetic ordering based on words include an index ordered, for instance, according to handshape (e.g. Konrad et al. 2007a).
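The two orderings described above differ only in how the same parameters are prioritized, which can be sketched as a multi-key sort. This is an illustrative Python sketch, not a reconstruction of either dictionary: the rank lists, the feature values and the placeholder glosses A to D are invented, and the relative order of the three handedness categories is an assumption.

```python
from typing import List, NamedTuple


class Sign(NamedTuple):
    gloss: str
    handshape: str
    handedness: str
    place: str


# Hypothetical rank lists; a real dictionary defines its own inventories.
HANDSHAPES = ["fist", "flat", "index"]
HANDEDNESS = ["one-handed", "double-handed", "two-handed"]
PLACES_HEAD_FIRST = ["head", "chest", "neutral space"]
PLACES_NEUTRAL_FIRST = ["neutral space", "head", "chest"]


def handshape_first(sign: Sign):
    """Key: handshape, then handedness, then place (head downwards)."""
    return (HANDSHAPES.index(sign.handshape),
            HANDEDNESS.index(sign.handedness),
            PLACES_HEAD_FIRST.index(sign.place))


def place_first(sign: Sign):
    """Key: place (neutral space first), then handshape, then handedness."""
    return (PLACES_NEUTRAL_FIRST.index(sign.place),
            HANDSHAPES.index(sign.handshape),
            HANDEDNESS.index(sign.handedness))


signs: List[Sign] = [
    Sign("A", "index", "one-handed", "head"),
    Sign("B", "fist", "two-handed", "chest"),
    Sign("C", "fist", "one-handed", "neutral space"),
    Sign("D", "flat", "double-handed", "head"),
]

by_handshape = [s.gloss for s in sorted(signs, key=handshape_first)]
by_place = [s.gloss for s in sorted(signs, key=place_first)]
print(by_handshape)  # ['C', 'B', 'D', 'A']
print(by_place)      # ['C', 'D', 'A', 'B']
```

The same four signs end up in different sequences, which is why a user needs to know the priority order of a given dictionary before browsing it.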
6.2 Search Facilities in Electronic Dictionaries
Electronic sign language dictionaries typically offer text search on written language equivalents as a minimum. Some dictionaries’ search facilities include topical search. In addition, most newer dictionaries facilitate searches using one or more parameters related to sign form or sign usage, for instance:
• handshape
• place of articulation
• orientation
• movement
• handedness
• mouth movement
• region- or age-specific use
For form-related searches, the criteria are generally chosen from pop-up windows or sub-pages, or in some cases via a formal notation system like HamNoSys. Such searches are most likely used by users with a fair or good knowledge of the sign language, while text and topic searches are more likely to be preferred by users with less knowledge of the language. Figures 4.8.9–4.8.13 show examples of the selection possibilities found with five different categories of search criteria in electronic sign language dictionaries: handshape, place of articulation (NZSL: McKee 2011), movement (Finnish Sign Language [FSL]: Kuurojen Liitto 2003–5), orientation (DGS, Konrad et al. 2007b) and mouth movement (FSL: Kuurojen Liitto 2003–5).
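A form-related search of this kind amounts to filtering entries on whichever parameters the user has selected, while a text search matches written equivalents. The following Python sketch shows the idea under stated assumptions: the dictionary keys (`handshape`, `place`, `movement`, `equivalents`), the placeholder glosses and all feature values are invented and do not describe real signs or any existing dictionary's interface.

```python
from typing import Dict, List

Entry = Dict[str, object]

# Invented sample data.
entries: List[Entry] = [
    {"gloss": "SIGN-A", "handshape": "flat", "place": "neutral space",
     "movement": "straight", "equivalents": ["house", "home"]},
    {"gloss": "SIGN-B", "handshape": "index", "place": "head",
     "movement": "none", "equivalents": ["think"]},
    {"gloss": "SIGN-C", "handshape": "flat", "place": "head",
     "movement": "straight", "equivalents": ["know"]},
]


def form_search(data: List[Entry], **criteria: str) -> List[Entry]:
    """Keep entries that match every selected form parameter."""
    return [e for e in data
            if all(e.get(name) == value for name, value in criteria.items())]


def text_search(data: List[Entry], word: str) -> List[Entry]:
    """Keep entries that list the word among their written equivalents."""
    word = word.lower()
    return [e for e in data if word in e["equivalents"]]


flat = [e["gloss"] for e in form_search(entries, handshape="flat")]
flat_head = [e["gloss"] for e in form_search(entries, handshape="flat", place="head")]
home = [e["gloss"] for e in text_search(entries, "home")]
print(flat, flat_head, home)
```

Each added criterion narrows the result list, so a user who knows only the handshape gets more candidates to browse than one who can also specify the place of articulation.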
Figure 4.8.9 Handshape selection window in the NZSL dictionary (McKee 2011). The user can choose from 30 handshape groups, comprising in total 63 handshapes.
Figure 4.8.10 Selection window for place of articulation in the NZSL dictionary (McKee 2011).
Figure 4.8.11 Movement selection window in the Finnish Sign Language dictionary (Kuurojen Liitto 2003–5). The options are (from left to right): straight or curved movement, circular movement, twisting or bending wrist, opening or closing hand, finger wiggling and no movement.
Figure 4.8.12 Selection window for finger orientation in the Online Health Care dictionary of German Sign Language (Konrad et al. 2007b).
Figure 4.8.13 Mouth movement selection window in the Finnish Sign Language dictionary (Kuurojen Liitto 2003–5). The user can choose from 57 options, 17 shown as photos, as above, and 40 rendered as text in a drop-down list.
6.3 Presentation of Search Results in Electronic Dictionaries
The amounts and types of information in the search result lists vary in electronic sign language dictionaries. Most commonly, a text representation in the form of a gloss or one or more written language equivalents is given. Other types of information occurring in search results include:
• a photograph or drawing
• a formal notation of the sign
• a textual description of the sign pronunciation
• the mouth movement(s) (if present in the sign)
• an ID number
• the part(s) of speech
• the topic(s)
Similarly to the ordering of signs in printed sign language dictionaries (cf. Section 6.1), search results in electronic dictionaries are typically ordered alphabetically after a translation equivalent or gloss, or according to sign form. Some dictionaries, however, offer a choice between different sort orders. Figures 4.8.14–4.8.16 show examples of search result lists from three different electronic sign language dictionaries.
Figure 4.8.14 Search result list from the Finnish Sign Language dictionary (Kuurojen Liitto 2003–5). Each entry found is represented by a photo, an ID number, and the first line of Finnish equivalents (for each sense). The selected search criteria are located to the right of the result list (not shown in this screenshot).
Figure 4.8.15 Search result list from the NZSL dictionary (McKee 2011). Each entry found is represented by a drawing, a main gloss, secondary equivalents (if present), and the parts of speech of the sign. The selected search criteria are shown above the result list.
Figure 4.8.16 Search result list from the DTS Dictionary (Center for Tegnsprog 2008–12). Each entry found is represented by a photo, an ID gloss, 0–3 relevance markers,9 and the first Danish equivalent (of each sense). The selected search criteria are shown to the left of the result list.
7 Future Developments
In the above sections we have described some of the main issues and problems that sign language dictionary construction faces. As well as the general problems of creating good dictionaries, sign language dictionary compilers have to work in the context of a general lack of (accessible) resources from which they can derive information about the meaning(s), use, and frequency, as well as grammatical information of the signs in the language. Furthermore, because of the absence of an accepted orthography, a representation of signs needs to be chosen. Each of the possible choices results in specific problems as described above.
As for data sources, rapid technological development has facilitated the composition of large corpora (e.g. the Auslan, NGT, British Sign Language [BSL] and DGS corpora: Crasborn et al. 2008, Johnston 2009, Rathmann 2011, Schembri et al. 2011), although the necessary annotation and tagging of data in these corpora remains a time- and labour-consuming enterprise. Existing corpora typically consist of elicited language samples; however, because it is now possible for anyone to upload videos to the internet, it will become increasingly easy to acquire sign language data covering a wide range of genres: news, lectures, talks, stories, poetry and jokes, and in this way broaden the coverage provided by corpora.
The morphology of most sign languages studied to date is generally quite intricate (see e.g. Sandler and Lillo-Martin 2006). This presents certain challenges for the lexicographer in deciding whether or not to list particular morphemes, and if they are to be included, the best way to do so, as indicated in Section 4. Until now, few of the digital sign language dictionaries address morphological and morpho-syntactic issues (e.g. verb agreement, aspect marking, plural formation and non-manual features), whereas some printed sign language dictionaries have separate grammar sections. Considering the superiority of the electronic medium with respect to the presentation of sign language examples (i.e. as video clips), the addition of more information on inflection and word formation in sign language dictionary entries, as is often found in dictionaries of spoken/written languages, would be a welcome future development. Similarly, one could imagine the addition of grammar sections, for example, with lists of verbs that can be modified for a particular type of agreement, or lists of nouns with a particular plural modification, just as some English dictionaries list verbs with their irregular tenses, or some Turkish dictionaries list verbs with the cases they assign to their arguments.
As stated in Section 2, various types of sign language dictionary exist, almost all of them bilingual, and most of the printed (and many of the digital) ones are unidirectional from spoken to signed language. Expansion of approaches to meeting user needs is likely to occur with further development, in order to serve sign language users and learners, whose requirements may include extensive information about pronunciation, meaning, use and grammatical characteristics of the lemmas. We can also envisage bilingual dictionaries for two sign languages. There is no reason why there should be fewer dictionary types than are found for spoken languages. More comprehensive dictionaries will be facilitated not only by technical development, but also by increasing linguistic understanding of sign languages. Technical development will in particular facilitate creation of user interfaces for a single large dictionary database, where the user can indicate the type(s) of information required, such as signs/words within a particular domain, synonyms and grammatical information. Furthermore, sign language dictionaries for use on portable devices, such as mobile phones and tablets, already exist in the form of websites and apps, and technical development is driving rapid change. Finally, the possibilities for creating connections to other digital language resources, such as corpora, grammars, encyclopaedias and dictionaries of other languages will increase.
Notes
1. This term, originally used for the sound system of (spoken) language, has been extended to the basic formal systems (visual or auditory) of languages.
2. In a typical monolingual dictionary, the same language is used as object language and as metalanguage. The use of sign languages as metalanguages, however, is problematic (see Kristoffersen and Troelsgård 2012).
3. For a discussion of the role of mouth movements in the lemmatizing process, see Kristoffersen and Niemelä (2008).
4. See for instance Engberg-Pedersen (1993), Slobin et al. (2003), Schembri (2003) and Liddell (2003).
5. Some researchers (e.g. Liddell 1996) suggest that such forms are affixes rather than roots.
6. See Kristoffersen and Niemelä (2008).
7. In sign language linguistics, a sign is often represented by a gloss, that is, a spoken/written language word that reflects the meaning (for polysemous signs that is one of the meanings) of the sign, and is used as a mnemonic for the sign. Sign language glosses are typically written in upper case.
8. Johnston (1989) distinguishes between signs that are made with one hand, double-handed signs (both hands have the same handshape), and two-handed signs (the hands have different handshapes).
9. For handshape and place of articulation searches, the matches are weighted according to their appearance in the sign, so that the first occurring handshape or place of articulation scores higher than the following. Text search matches are weighted in the following order: ID-glosses, Danish equivalents, glosses in usage examples, words in the translations of usage examples. Based on the calculated relevance scores each match receives 0–3 ‘relevance stars’.
References
Bergman, B. and Björkstrand, T. (1993) Kompendium i teckentranskription. Stockholm: Institutionen för Lingvistik, Stockholms Universitet.
Bonnal-Vergés, F. (ed.) (2008) Abbé Jean Ferrand, Dictionnaire à l’usage des sourds et des muets (original circa 1784). Limoges: Lambert-Lucas.
Brentari, D. (1998) A Prosodic Model of Sign Language Phonology. Cambridge, MA: MIT Press.
Brien, D. (ed.) (1992) Dictionary of British Sign Language/English. London: Faber and Faber.
Brien, D. and Turner, G. (1994) Lemmas, dilemmas, and lexicographical anisomorphism: presenting meanings in the first BSL – English dictionary. In: I. Ahlgren, B. Bergman and M. Brennan (eds) Perspectives on Sign Language Usage: Papers from the Fifth International Symposium on Sign Language Research Vol. 2. Durham: The International Sign Linguistics Association (ISLA), 391–407.
Center for Tegnsprog (2008) Ordbog over Dansk Tegnsprog. Available at: www.tegnsprog.dk. (Accessed 8 June 2012).
Crasborn, O., Zwitserlood, I. and Ros, J. (2008) Corpus NGT. 72 hours of monologues and dialogues in Sign Language of the Netherlands, most of which have an open access Creative Commons license (BY-NC-SA). Available at: www.ru.nl/corpusngten/. (Accessed 10 June 2012).
Cuxac, C. and Sallandre, M.-A. (2007) Iconicity and arbitrariness in French Sign Language – highly iconic structures, degenerated iconicity and diagrammatic iconicity. In: E. Pizzuto, P. Pietrandrea and R. Simone (eds) Verbal and Signed Languages. Comparing Structures, Constructs and Methodologies. Berlin: Mouton de Gruyter, 13–33.
Emmorey, K. (ed.) (2003) Perspectives on Classifiers in Sign Languages. Mahwah, NJ: Lawrence Erlbaum Associates.
Engberg-Pedersen, E. (1993) Space in Danish Sign Language. The Semantics and Morphosyntax of the Use of Space in a Visual Language. Hamburg: Signum.
Johnston, T. (1989) Auslan dictionary: A Dictionary of the Sign Language of the Australian Deaf Community. Petersham: Deafness Resources.
—(2009) The Auslan Signbank. Available at: www.auslan.org.au/dictionary/. (Accessed 13 August 2012).
—(2012) Auslan Corpus. Endangered Languages Archive, SOAS, University of London. Available at: http://elar.soas.ac.uk/deposit/johnston2012auslan. (Accessed 13 August 2012).
Johnston, T. and Schembri, A. (1999) On defining lexeme in a signed language. Sign Language & Linguistics 2/2, 115–85.
Konrad, R., Langer, G., König, S., Schwarz, A., Hanke, T. and Prillwitz, S. (2007a) Fachgebärdenlexikon Gesundheit und Pflege. Hamburg: Signum.
—(2007b) Fachgebärdenlexikon Gesundheit und Pflege. Institut für Deutsche Gebärdensprache. Available at: http://www.sign-lang.uni-hamburg.de/glex/intro/inhalt.htm. (Accessed 4 June 2012).
Kristoffersen, J. H. and Niemelä, J. B. (2008) How to describe mouth patterns in the Danish Sign Language Dictionary. In: R. Müller de Quadros (ed.), 230–8.
Kristoffersen, J. H. and Troelsgård, T. (2012) Electronic sign language dictionaries. In: S. Granger and M. Paquot (eds) Electronic Lexicography. Oxford: Oxford University Press, 290–312.
Kuurojen Liitto (2003–5) Suvi Suomalaisen viittomakielen verkosanakirja. Available at: www.viittomat.net. (Accessed 8 June 2011).
Lewis, M. (ed.) (2009) Ethnologue: Languages of the World, Sixteenth edition. Dallas, TX: SIL International. Online version available at: www.ethnologue.com/. (Accessed 10 August 2012).
Liddell, S. K. (1996) Numeral incorporating roots and non-incorporating prefixes in American Sign Language. Sign Language Studies 92, 201–26.
—(2003) Sources of meaning in ASL classifier predicates. In: K. Emmorey (ed.), 199–220.
Malm, A. (ed.) (1998) Suomalaisen Viittomakielen Perussanakirja. Helsinki: Kuurojen Liittory Libris Oy.
Mathur, G. and Rathmann, C. (2010) Two types of nonconcatenative morphology in sign languages. In: G. Mathur and D. J. Napoli (eds) Deaf Around the World: The Impact of Language. Oxford: Oxford University Press, 54–82.
McKee, D. (managing ed.) (2011) The Online Dictionary of New Zealand Sign Language. Available at: http://nzsl.vuw.ac.nz. (Accessed 12 June 2011).
Müller de Quadros, R. (ed.) (2008) Sign Languages: spinning and unraveling the past, present and future. TISLR9, forty five papers and three posters from the 9th Theoretical Issues in Sign Language Research Conference. Florianopolis, Brazil. December 2006. Petropolis: Editora Arara Azul.
Prillwitz, S., Leven, R., Zienert, H., Hanke, T. and Henning, J. (1989) Hamburg Notation System for Sign Languages – An Introductory Guide. International Studies on Sign Language and the Communication of the Deaf, Volume 5. Hamburg: Institute of German Sign Language and Communication of the Deaf, University of Hamburg.
Rathmann, C. (project leader) (2011) DGS-Korpus. Available at: www.sign-lang.uni-hamburg.de/dgs-korpus (Accessed 8 June 2011).
Sandler, W. (1989) Phonological Representation of the Sign: Linearity and Nonlinearity in American Sign Language. Dordrecht: Foris.
Sandler, W. and Lillo-Martin, D. (2006) Sign Language and Linguistic Universals. Cambridge: Cambridge University Press.
Schembri, A. (2003) Rethinking ‘classifiers’ in signed languages. In: K. Emmorey (ed.), 3–34.
Schembri, A., Fenlon, J., Rentelis, R. and Cormier, K. (2011) British Sign Language Corpus Project: A Corpus of Digital Video Data of British Sign Language 2008–2011 (1st edn). London: University College London. Available at: www.bslcorpusproject.org. (Accessed 8 June 2012).
Slobin, D. I., Hoiting, N., Kuntze, M., Lindert, R., Weinberg, A., Pyers, J., Anthony, M., Biederman, Y. and Thumann, H. (2003) A cognitive/functional perspective on the acquisition of ‘classifiers’. In: K. Emmorey (ed.), 271–98.
Stokoe, W. C., Casterline, D. C. and Croneberg, C. G. (1965) A Dictionary of American Sign Language on Linguistic Principles. Washington, DC: Gallaudet College Press.
Supalla, T. (1982) Structure and acquisition of verbs of motion and location in American Sign Language. PhD Thesis. San Diego: UCSD.
Sutton, V. (2011) Sutton’s SignWriting Site. Available at: www.signwriting.org. (Accessed 8 August 2012).
Tang, G. (2003) Verbs of motion and location in Hong Kong Sign Language: conflation and lexicalization. In: K. Emmorey (ed.), 143–65.
Tang, G. (ed.) (2007) Hong Kong Sign Language. A Trilingual Dictionary with Linguistic Descriptions. Hong Kong: The Chinese University Press.
Troelsgård, T. and Kristoffersen, J. H. (2008) An electronic dictionary of Danish Sign Language. In: R. Müller de Quadros (ed.), 652–62.
Van der Kooij, E. (2002) Phonological Categories in Sign Language of the Netherlands. The Role of Phonetic Implementation and Iconicity. PhD Thesis, Leiden University. Utrecht: LOT Publishers.
Van Herreweeghe, M., Slembrouck, S. and Vermeerbergen, M. (2004) Digitaal Vlaamse Gebarentaal-Nederlands/Nederlands-Vlaamse Gebarentaal Woordenboek (Digital Flemish Sign Language-Dutch/Dutch-Flemish Sign Language Dictionary). Available at: http://gebaren.ugent.be/. (Accessed 4 June 2012).
Zwitserlood, I. (2012) Classifiers. In: R. Pfau, M. Steinbach and B. Woll (eds) Sign language: An International Handbook. Berlin: Mouton de Gruyter.
4.9 Identifying, Ordering and Defining Senses
Robert Lew
Chapter Overview
Sense(s) in Language versus Senses in the Dictionary
Specifying Senses in Monolingual Dictionaries
Senses in Bilingual Dictionaries: Meaning Structure versus Equivalence Structure
Ordering Senses
Helping Dictionary Users Identify the Relevant Sense
Defining Senses
1 Sense(s) in Language versus Senses in the Dictionary
Linguists and philosophers of language have often talked of sense as a mass noun, typically in opposition to reference, where sense would refer to conceptual meaning, contrasted with a piece of the world that a linguistic expression refers to. In a dictionary, however, senses are something distinctly different. They are basic units of entry organization: the most distinct component parts of the dictionary article. Piotrowski (1994: 21) defines a sense in lexicography as ‘one of the main divisions of the entry, usually marked typographically by consecutive letters or numbers’. Indeed, senses are often explicitly numbered in sequence, less commonly prefixed by letters, or punctuated in a typographically more subtle manner, such as by semicolons. Occasionally, special symbols are used to effect a visual separation of senses, such as a diamond ◆, centred dot •, triangle ▶, or square ■.
Dictionary senses may be run on (= continued on the same line), but they may also be given each on its own line. A one-sense-per-line presentation is generally believed to be easier to navigate, but it comes at the cost of using up more space. For this reason, this option is particularly common in on-screen presentation of electronic dictionaries and whenever user-friendliness takes precedence over space considerations, such as in dictionaries directed at language learners or children.
To make a general point, entry organization in a dictionary serves the purpose of enabling users to locate, and then make good sense of the lexicographic data included in the entry. In most dictionary projects, the aim is to create efficient and effective tools, assisting the user in whatever lexicographically relevant queries, problems and doubts they may have, and good entry organization improves the efficiency of the dictionary as a tool.
Dictionary users (including many linguists!) tend to conflate these two rather distinct meanings of sense, assuming without much reflection that when they look up a word in a dictionary, the senses present in the entry mirror what goes on in the language. In most cases, the correspondence is far from perfect, though generally speaking it tends to be closer in monolingual than in bilingual dictionaries. Also, such an approximation is less of a distortion in academic (or ‘scholarly’) dictionaries, whose general aim may be to present a reasonably faithful portrait of a language. However, the fact that such dictionaries often include a diachronic dimension reinforces the point that lexicographic sense division cannot be expected so readily to mirror linguistic reality, however the latter is to be understood. Further evidence of the relative autonomy of the lexicographic sense from the linguistic notion by the same name comes from the practice of the elevation of multiword expressions to sense status in some dictionaries.
Multiwords are not infrequently presented on a par with the more ‘traditional’ dictionary senses. For example, Longman Dictionary of Contemporary English (free online version) enters eight senses of train as a noun. Interspersed between the more conventional senses are four multiword items: senses four and five respectively are the multiwords bring something in its train and set something in train. Clearly, these two expressions instantiate quite similar semantic values of the lemma train, and yet they are listed as separate senses. Conversely, there are also many dictionaries which lump all multiwords under a single dictionary sense. This broad variation in lexicographic practice strengthens the point that the lexicographic sense may bear, at best, a tenuous relationship to linguistic notions. It is clear that discrete senses exist in dictionaries, but do they exist in language as well?
1.1 Are Senses Discrete Entities?
As explained above, the question of atomicity of senses can apply to both the lexical units of a language and the structural elements of a dictionary.
Linguists do not all agree on the issue of atomicity of senses. Some of those that do see meanings as atomic like to embark on the ambitious quest for the boundary between polysemy and vagueness. The opposite view is well represented by, for example, Patrick Hanks (2000: 211), who maintains that words only carry meaning potentials which are rather vague, and do not take on their full shape outside of their context (which includes, but is not exhausted by, co-text). There is nothing wrong with such vagueness, and it may actually foster language creativity, allowing speakers to express new ideas with existing words. Patrick Hanks and John Sinclair have also argued against a strict separation of form and meaning, showing from corpus evidence that the two tend to go hand in hand: like meanings tend to be expressed through like structures. But, again, ‘tend to’ is the operative word, as language is nowhere near as ordered as many linguists would like it to be. Another pertinent observation that lexicographers and linguists owe to Sinclair is dispelling the myth of orthographic words as principal carriers (or containers) of meaning: units of meaning should not be seen as being coextensive with orthographic spaces (the idiom principle). Paradoxically, the very fact that so many linguists seem to feel comfortable with the idea of atomic senses in language may well be a reflection of linguists’ practical, pre-theoretical experience with dictionaries (Nowakowski 1990: 10, Burkhanov 1997: 70). It is not at all unlikely that repeated exposure to structured dictionary entries by linguists-to-be in the role of ordinary dictionary users may have shaped their future thinking on how language itself might be structured. In a similar vein, Hanks (2000: 205) notes that ‘[t]he numbered lists of definitions found in dictionaries have helped to create a false picture of what really happens when language is used’. 
In this context, it is appropriate to reflect on what the ‘identification of senses’ in the chapter title might really refer to. Who does the identifying and what is the thing that is being identified? One answer that can be given with some confidence is that dictionary users identify senses in the dictionary which they happen to be consulting: they look for the structural segments of entries which best fit the problem at hand which has prompted them to consult a dictionary in the first place. These senses have been put in the dictionary by the lexicographer. But has the lexicographer actually ever identified these exact senses in the language? This is a tough question and the answer can at best be a qualified yes, with the degree to which it may be true depending on the type of the vocabulary item. Some words appear to have meanings which are relatively fixed and do not yield that much to contextual coercion (such words are sometimes termed autosemantic). But there are other words which of themselves tend to be rather vague and pick up a significant portion of their meaning from the context (relatively synsemantic words). Very common words can be semantically impoverished, such as, in English, have in have a go.
Today, much lexicographic work is done by examining massive corpus evidence, but, as any novice lexicographer is soon bound to discover, it is notoriously difficult to compartmentalize corpus citations into discrete senses. Having access to greater volumes of data usually makes the problem even harder: for commoner words, lexicographers have to wrestle with hundreds of citations and try to group them into manageable clusters of meaning. As a result, as pointed out by van der Meer (2004: 807), ‘one of the hardest problems torturing practising lexicographers has always been the question of how to describe the meaning of so-called polysemous words.’ Atkins and Rundell (2008: 264) concur when they state that ‘there is little agreement about what word senses are (or even whether they exist). Lexicographers are therefore in the position of having to describe something whose nature is not at all clear.’ Consequently, Kilgarriff (1997) in a paper with a telling title (‘I don’t believe in word senses’) rejects the word sense – being an ill-defined entity – as the basic unit. Instead, dictionary word senses are the result of clustering attested uses appearing as concordance lines. So, although there may be no discrete senses in language, they do exist as artefacts in a dictionary.
2 Specifying Senses in Monolingual Dictionaries

The modern lexicographer is often confronted with hundreds of citations and faces the intimidating task of having to arrange them neatly into portions appetizing enough to be appreciated by future dictionary users. Working with large corpora is a humbling experience for linguists, and the job of arranging a multitude of corpus citations into neat, discrete senses is far less obvious than many would believe. In fact, two opposing strategies have been identified at this stage of dictionary compilation, known as lumping and splitting. The first strategy aims to minimize the number of senses so that they each cover as much semantic ground as possible. In contrast, those who follow the second strategy (‘the splitters’) will tend to generate a rather larger number of finely distinguished senses. As Hanks (2000: 208) observes, exposure to ever-growing corpora naturally entices lexicographers into adding yet further definitions to the dictionary. This happens in part because it does seem easier than reflecting on whether the definitions already in place can be modified to accommodate the newly encountered usage, but also because having a lot of ‘meanings’ is often seen as a desirable feature from a marketing point of view, so as to boost the number of ‘references’ that can later be bandied about in promotional materials. But even in corpus citation lines, meanings do not lie there exposed and ready to be picked up or ‘discovered’. Rather, corpus lines provide evidence of ‘traces of meaning events’ (Hanks 2000: 211).
That senses in dictionaries do not have as much grounding in linguistic reality as is often naively held, can be readily ascertained by examining closely analogous entries in different dictionaries. To work through a concrete example, let us take the noun mind. In the online version of the Longman Dictionary of Contemporary English, this lemma receives three senses, if we ignore the metonymically derived sense ‘intelligent person’ and all the numerous multiword expressions: (1) Your thoughts or your ability to think, feel and imagine things. (2) Used to talk about the way that someone thinks and the type of thoughts they have. (3) Your intelligence and ability to think, rather than your emotions. In contrast, a close competitor, Oxford Advanced Learner’s Dictionary, also available online, gives four rather different senses: (1) The part of a person that makes them able to be aware of things, to think and to feel. (2) Your ability to think and reason; your intelligence; the particular way that somebody thinks. (3) Your thoughts, interest, etc. (4) Your ability to remember things. At the same time, the DANTE lexical database gives no fewer than eight senses covering roughly the same semantic space. Surely, the best professional lexicographers cannot be describing the same reality? The undeniable observation that the more voluminous (‘comprehensive’) a dictionary, the greater the number of senses it will tend to have for a typical common word (and not just because larger dictionaries address areas of meanings excluded from smaller ones!), testifies to the fact that senses in the dictionary are only objective with respect to the entry structure of this dictionary. They should not be seen as an objective representation of language in any dimension. 
At the very most, they are attempts at such a representation, but filtered through the practical realities of the particular lexicographic project, dictated by the foreseen target users and uses, and constrained by the available financial, human and technical resources. Rundell (1999: 40) makes the point clearly when he observes:

(as lexicographers have always known), the notion that a given word has five or ten or twenty ‘senses’ is simply a useful working convention without any objective truth-value ( . . . ) What dictionary-makers attempt to do is to segment this continuum of meaning in ways that will provide maximum benefit to the target user.
It is not irrelevant to observe at this point that dictionary senses are not necessarily always designed to represent separate ‘meanings’ of the strict semantic kind. Instead, separate sense status may be accorded to distinct uses of the word. For example, verb entries may be structured by the syntactic patterns of use in which they are observed.
3 Senses in Bilingual Dictionaries: Meaning Structure versus Equivalence Structure

In bilingual dictionaries, the issue of sense division is more complex, as it involves, not one, but two lexical systems. In organizing the entry into senses, lexicographers may thus be guided by interlingual equivalence relations. This provides an extra criterion, and a relatively objective one at that, especially if, in the near future, suitable parallel corpora become more widely available as a source of evidence on textual equivalence between lexical items in two (or more) languages (an idea which goes back to Hartmann 1985). The issue was taken up by (among others) Manley et al. (1988), who use the term meaning structure to refer to a type of sense organization which relies on the source language solely, and equivalence structure to one based on the equivalence relations with the target language. They assert that ‘meaning structure is a relic from the monolingual dictionary and . . . the more we can approach equivalence structure the closer we will get to the ideal form of the bilingual dictionary’ (1988: 296). Most authors writing on the issue concur that senses need to reflect such equivalence relations, even if the description of the source language gets ‘subtly distorted’ (Atkins 1996: 523) in the process.

There are actually two opposing aspects of equivalence structure: (1) sense distinctions in the source language may be redundant and undergo elimination; and (2) it may be advisable to introduce extra distinctions so as to provide a tighter match between the lexical items in the two languages. To illustrate the first scenario, quite a few senses of English high which tend to be distinguished in monolingual dictionaries translate into German as hoch. All these senses of English could then be conflated in an English-to-German dictionary, thus making the entry presentation more economical and, arguably, easier to navigate and use.
But there are doubts, such as what to do when a given sense in L1 has another important translation in L2. Decisions like these are usually best made on a per-case basis, depending on the particular constellation of equivalents and also on what functions the dictionary is envisaged to perform. Conversely, what appears to be a single sense in a source language (SL) item may require splitting according to substantive distinctions in the target language. For example, the English noun drift in the sense ‘deviation from course’ has different equivalents in Russian depending on whether it refers to an aircraft (дрейф) or a vessel (снос). Therefore, the option of separating the two meanings or uses out as either senses or subsenses might at least be considered, if not always acted upon. Of course, one could argue in such cases that we are dealing with the same ‘sense’, merely providing a choice of equivalents that are restricted in their use. But this just begs the question of what a ‘sense’ is; if we see it, as I believe we should in this context, as a lexicographic construct rather than a linguistic one, then it is certainly something that can be split.

Even when dictionary editors aim in principle for equivalence structure, practical considerations may prevail and skew the structure in the direction of that found in a monolingual dictionary. This can happen because a monolingual dictionary of a language is not infrequently a starting point in the compilation of a bilingual dictionary with this language as the dictionary’s SL. Alternatively, lexicographers may start with a universal framework of that language created to be used as a skeleton in bilingual projects. It is only natural that this SL-based structure will tend to impress itself on the final product, even if this is not the intention of the lexicographer. Meaning structure is overtly aimed for in dictionaries following what Jarošová (2000: 18) calls the explanatory principle. This echoes Lev Shcherba’s idea of the explanatory dictionary originally expressed in the 1940s (Shcherba 1995 is the English-language version). Meaning structure is also sanctioned in most semi-bilingualized dictionaries, where lexicographers are often discouraged, if not downright prohibited, from manipulating the sense divisions inherited from the monolingual model dictionary. At times, this frustrates the bilingual lexicographer.
To use an example from my own experience when working on a Polish adaptation of a major monolingual learner’s dictionary, I had to contend with the basic sense of the English verb pour being defined as ‘to make a liquid flow from or into a container’. This sense was supposed to subsume a similar action on powdery substances such as sugar. The problem is that Polish requires completely different verbs in the two cases, but as splitting senses was not an option, I had to settle for an awkward side-by-side presentation of two totally unrelated (from the point of view of the Polish user) equivalents.

All in all, except in artificial cases such as the last one described, it should by now be apparent that the sense structure of most existing bilingual dictionaries is usually a compromise between the analysis of the SL and the constellation of the target language (TL) equivalents of the source item. It can be argued that a bilingual dictionary with a dominant text production function might benefit from a sense structure closer to that of a monolingual dictionary of the source language. Here, the typical user of such an entry has limited knowledge of the target language and may not recognize at least some of the equivalents given. If so, they need guidance in the SL (either their native language or at least one they speak better than the TL), and such guidance more naturally mimics the distinctions typical of a monolingual dictionary. Still, if several senses share the same equivalent, there is no compelling reason not to combine them, thus saving a considerable amount of space and improving the visibility of the remaining senses with perhaps more unusual equivalents.
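The conflation of senses that share an equivalent can be pictured as a simple grouping step. The following is only a minimal sketch under stated assumptions: the function name `conflate_by_equivalent` and the sense glosses are invented for illustration, and only the equivalent hoch comes from the discussion above.

```python
def conflate_by_equivalent(senses):
    """Group source-language senses by their target-language equivalent.

    `senses` is a list of (sense_gloss, equivalent) pairs in entry order.
    Returns a list of (equivalent, [sense_glosses]) pairs; equivalents keep
    the order in which they were first seen (dicts preserve insertion order).
    """
    grouped = {}
    for gloss, equivalent in senses:
        grouped.setdefault(equivalent, []).append(gloss)
    return list(grouped.items())


# Hypothetical glosses for English "high"; the glosses are NOT taken from
# any dictionary, they merely stand in for monolingual sense divisions.
high = [
    ("of great vertical extent", "hoch"),
    ("above the normal level", "hoch"),
    ("near the top of a scale", "hoch"),
]
merged = conflate_by_equivalent(high)
print(len(high), "senses ->", len(merged), "sense section(s)")
```

A real project would still need the per-case editorial judgement described above, for instance when one of the conflated senses also has a second important translation.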
4 Ordering Senses

4.1 Ordering Senses in Monolingual Dictionaries

The major approaches to sense ordering should be seen as guidelines rather than hard-and-fast rules, as excessively orthodox adherence to any one such principle is likely to lead to undesirable outcomes for some entries. A notorious example is the entry summit in the first edition of COBUILD (Sinclair and Hanks 1987), where sense ordering according to corpus frequency compelled the lexicographers to list the ‘political meeting’ sense first, before the ‘top of the mountain’ sense. This example underscores the fact that, above all, common sense should prevail over any strict application of principles. As lexicographers discover over and over again in the course of their work, the lexicon of a natural language is not regular enough for an across-the-board treatment to work seamlessly for all items. Rather, we should always remain open to individual solutions, and not hesitate to depart from the general principle whenever the peculiarity of a lexical item justifies this. Having said that, consistency is in general seen as a virtue in dictionaries, so guiding principles are needed. The most popular principles of relevance in guiding sense ordering are: chronology, frequency, markedness and logic.
4.1.1 Chronology
In chronological ordering, also known as historical, senses are arranged from the earliest attested to the most recent. As one would expect, the principle is most relevant for historical and diachronic dictionaries. However, there also exist general dictionaries using this arrangement. For example, the American dictionary publisher Merriam-Webster, Incorporated has insisted on the application of the historical principle in its range of general dictionaries, including the popular Merriam-Webster’s Collegiate Dictionary. This dictionary was found inferior for US college students compared with other dictionaries aimed at college students or advanced learners of English (McCreary and Amacker 2006, McCreary 2008, 2010). In large measure, the disappointing performance of the Merriam-Webster dictionary was ascribed to its policy (Mish 2004: 20a) of placing first the historically oldest senses, even those which are no longer current. In view of the evidence that dictionary users all too often do not read dictionary entries beyond the first sense (Tono 1984, Lew 2004), placing a non-contemporary meaning in this privileged position is counterproductive for most typical uses of the dictionary. McCreary (2008) suggests that this policy should be reversed by placing archaic senses towards the end of the entry.
4.1.2 Markedness
Relegating archaic senses to the final sections of an entry may be taken as indicative of another principle: that of placing marked senses after unmarked ones. This criterion (hailed as ‘distribution’ by Fuertes-Olivera and Arribas-Baño 2008: 38) says that senses which are not in general use, are restricted geographically, pragmatically, or socially, should follow those not so restricted. Sound as it is, it is obvious that the policy is insufficient in itself, as most senses with serious claims for entry-initial placement will not be restricted in any way. It should be clear, then, that this principle will not be of much help in those decisions that determine the most salient form of the entry: those regarding the most salient meanings. It will, however, assist in deciding what to do with those senses which exhibit restriction in use.
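As a rough illustration, the markedness principle amounts to a stable partition of the entry: restricted senses move to the end, while the relative order inside each group is left exactly as it was. The sketch below assumes a hypothetical `Sense` record with a `labels` field holding usage restrictions; neither name comes from any dictionary-writing system.

```python
from dataclasses import dataclass


@dataclass
class Sense:
    gloss: str
    labels: tuple = ()  # usage restrictions, e.g. ("archaic",) or ("infml",)


def unmarked_first(senses):
    """Place unrestricted senses before restricted ones.

    Python's sort is stable and False sorts before True, so senses keep
    their original relative order within the unmarked and marked groups.
    """
    return sorted(senses, key=lambda s: bool(s.labels))


# Placeholder glosses, invented for illustration.
entry = [
    Sense("sense a", ("archaic",)),
    Sense("sense b"),
    Sense("sense c", ("infml",)),
    Sense("sense d"),
]
print([s.gloss for s in unmarked_first(entry)])
```

Because the function never reorders the unmarked senses among themselves, it also shows why the principle cannot settle which sense should open the entry: some other criterion has to supply that order first.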
4.1.3 Frequency
The idea behind frequency ordering is to present the sense in which the lemma is most frequently used as the first one, and then order the remaining senses in decreasing frequency. The criterion has been in use for some time, though in pre-corpus times frequency was evaluated subjectively by intuition, and an early publication on the topic, Kipfer (1984), writes of ‘usage ordering’. But frequency-based ordering really came into its own with the introduction of electronic corpora. Even though corpus tools are still not quite capable of automatically counting the occurrences of words in specific senses, modern corpus query applications go a long way towards facilitating such estimates. There is no question that ordering by frequency is convenient for the lexicographer, providing a relatively objective ground for ordering decisions (issues of corpus balancing aside), but is it also in the best interest of the dictionary user? All too often authors claim that listing the most frequent senses at the top gives the user the best chance of finding what they want in the shortest time possible. The fact is, though, that such claims remain largely unproven.

It was English monolingual dictionaries for advanced learners that embraced frequency ordering most enthusiastically. However, if we picture a scenario of advanced learners of English looking up the meaning of a common word (common words tend to have many senses, other things being equal), it is quite unlikely that they will be looking for the most frequent sense, as this sense will normally be quite familiar to advanced language learners. Indeed, I have heard comments from advanced learners of English that they start examining long entries from the bottom up, as they have discovered through extensive dictionary use that the senses they seek are often found towards the end of the entry. Perhaps it is the use of a similar strategy that might account for the special salience of final senses noted by Nesi and Tan (2011).
Placing the most frequent sense first is rather more defensible if a dictionary is going to be used for text production (such as essay writing). Whereas looking up a frequent sense of a common word is not likely when the dictionary is being used for comprehension, users engaged in text production may wish to seek guidance or reassurance on the grammatical or collocational behaviour of a well-known sense. This invites the conclusion that the optimal sense ordering hangs on what the dictionary is actually used for (or is designed to be used for). In dynamic dictionaries of the future sense ordering might conceivably be adjusted depending on the circumstances of use (an idea developed in Lew 2009).
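Mechanically, frequency ordering is a descending sort over per-sense citation counts. The sketch below assumes that the counts come from manual sense-tagging of concordance lines, since corpus tools cannot yet count senses automatically; the function name and the figures are invented, and only the resulting order mirrors the COBUILD summit case mentioned earlier.

```python
def order_by_frequency(sense_counts):
    """Return sense labels sorted by descending number of corpus citations.

    `sense_counts` maps a sense label to the number of citations that a
    lexicographer has assigned to that sense (manual tagging is assumed).
    Ties keep the order in which the senses were supplied.
    """
    return sorted(sense_counts, key=sense_counts.get, reverse=True)


# Invented counts for illustration only; they are not corpus data.
summit = {"top of a mountain": 120, "political meeting": 480}
print(order_by_frequency(summit))
```

The sort itself is trivial; what remains open, as argued above, is whether its output serves a user who is decoding rather than producing text.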
4.1.4 Logic
Logical ordering is sometimes invoked by dictionary editors in the front matter. The notion was subjected to close scrutiny by Hiorth (1954), and then Kipfer (1984), who found it to be merely a label with little content. Another term encountered in the front matter of dictionaries is psychologically-meaningful ordering (Kipfer 1984: 103), but it has never been made clear how these two types would actually differ. All in all, it seems that these different labels represent intuitive attempts at respecting the dictionary entry as a coherent text (cf. Frawley 1989), rather than seeing it as a loose amalgamation of independent senses. In order to present a more holistic picture of meaning, lexicographers should strive to present senses as related, to the extent that this is practical, typically by introducing an important core sense of some generality and then demonstrating how other peripheral senses relate to this pivotal sense. These senses may be derived from the core sense by meaning extension, specialization or generalization, including the figurative processes of metaphor and metonymy (van der Meer and Sansome 2001, Atkins and Rundell 2008, Wojciechowska 2012). We return to this issue below. Unlike in applying the previous principles, this approach to sense ordering implies grouping senses at different levels of organization, so the structure of the entry need not be flat. Instead, subsenses should be allowed to be nested under the main sense. A well-known exemplar of such an approach is the New Oxford Dictionary of English (Hanks and Pearsall 1998), where a systematic attempt has been made to cluster related subsenses under a smaller number of ‘prototypical’ senses. Its subsequent editions largely continue this tradition under the slightly changed title Oxford Dictionary of English. The number of hierarchical levels can be larger than two, and the hierarchy can get quite elaborate. 
As Fraser (2008: 72) notes, large scholarly dictionaries may feature as many as four levels of sense organization, with a possible arrangement including the following: overarching Divisions, labelled with capital letters (A, B, C); semantic Branches, with Roman numerals (I, II, III); Sections, with Arabic numerals (1, 2, 3); and Subsections, with lower-case letters (a, b, c). A prominent exemplar of a dictionary with this style of sense organization (maximal, though not obligatory for every entry) is the Oxford English Dictionary.
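The four-level scheme can be expressed as one labelling rule per level. The sketch below is an assumption-laden illustration: the function names are hypothetical, and joining the level labels with dots is purely a presentational choice made here, not a convention reported for any dictionary.

```python
import string


def to_roman(n):
    """Convert a positive integer (1..3999) to an upper-case Roman numeral."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
             (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
             (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, numeral in pairs:
        while n >= value:
            out.append(numeral)
            n -= value
    return "".join(out)


# One labelling function per level: Division, Branch, Section, Subsection.
LEVEL_LABELS = [
    lambda i: string.ascii_uppercase[i - 1],  # A, B, C (up to 26 divisions)
    to_roman,                                 # I, II, III
    str,                                      # 1, 2, 3
    lambda i: string.ascii_lowercase[i - 1],  # a, b, c (up to 26 subsections)
]


def sense_label(path):
    """Label a sense from its 1-based position at each level.

    `path` holds at most four integers, e.g. (1, 2, 3, 1). The dotted
    separator is an assumption made for readability.
    """
    return ".".join(LEVEL_LABELS[depth](i) for depth, i in enumerate(path))


print(sense_label((1, 2, 3, 1)))
```

Shorter paths simply yield shorter labels, which matches the observation that the full hierarchy is maximal rather than obligatory for every entry.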
4.2 Ordering Senses in Bilingual Dictionaries

As we have already seen, entry structure in bilingual dictionaries may be carried over from a monolingual dictionary which may have been used as a starting point in the compilation of bilingual dictionaries. This routinely happens in the (often superficial) adaptations of monolingual dictionaries referred to as semi-bilingual or bilingualized dictionaries (Hartmann 1994). More interesting are those works in which senses in a bilingual entry have been organized around their equivalents. In such cases, there is an argument to be made for placing at the top those senses which include the most common textual equivalent in the TL, and giving further senses in descending order of frequency of equivalents translating this headword (not the same, of course, as the absolute frequency of candidate equivalents). Another way to think of this measure is as the conditional probability of a candidate equivalent appearing in a TL text, given the presence of the source lemma in an SL text. The rationale for this ordering principle would be that a user seeking a TL equivalent is first presented with equivalents which translate the headword in the largest proportion of cases.

Until recently, such ordering would mostly be based on the intuition of the lexicographer. Currently, corpora are increasingly being used in the compilation of bilingual dictionaries, but they tend to be separate corpora for the two languages. As such, they can provide information on the frequent patterns of use of words, but offer no direct clues on the correspondences between the two lexical systems. However, advances in parallel corpora may soon allow meaningful assistance in the identification of the most common textual equivalents between languages. Even if the most frequent equivalents in texts are not in each and every case the best candidates for inclusion in all types of bilingual dictionary, they are by and large the most serious candidates to consider.
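The conditional-probability measure described in this section can be estimated by simple counting, provided word-level alignments are already available. The sketch below assumes a hypothetical input of (source lemma, target equivalent) pairs extracted from a parallel corpus; the function name and the placeholder data are invented for illustration.

```python
from collections import Counter


def equivalent_probabilities(aligned_pairs, lemma):
    """Estimate P(equivalent | lemma) from aligned pairs.

    `aligned_pairs` is an iterable of (source_lemma, target_equivalent)
    tuples, assumed to come from a word-aligned parallel corpus.
    Returns (equivalent, probability) pairs, most probable first.
    """
    counts = Counter(tl for sl, tl in aligned_pairs if sl == lemma)
    total = sum(counts.values())
    if total == 0:
        return []  # the lemma never occurs in the aligned data
    return [(tl, n / total) for tl, n in counts.most_common()]


# Placeholder tokens rather than real words, to avoid implying real data.
pairs = [("w", "x"), ("w", "y"), ("w", "x"), ("v", "y"), ("w", "x")]
print(equivalent_probabilities(pairs, "w"))
```

In the placeholder data, "y" is as frequent overall as it is rare for the lemma "w", which illustrates why the conditional measure differs from the absolute frequency of a candidate equivalent.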
5 Helping Dictionary Users Identify the Relevant Sense

Polysemous entries present a special challenge to dictionary users, as they need to locate the relevant section of the entry. Research on dictionary use (Tono 1984, Bogaards 1998, Lew 2004) shows that users tend to look at the top part of the entry, and may not scan the whole entry unless there are obvious signals in the entry that the top sense is not what they should be looking at. There is also some evidence (Nesi and Tan 2011) that more sophisticated users tend to look at the final sense in the entry, but again, the material in the middle sections of the entry is not so easily accessible.

To assist dictionary users in navigating long entries, two broad types of navigational aids have occasionally been used: (1) entry menus, and (2) sense guidewords (also known as signposts, shortcuts or mini-definitions). In both these types of navigational aid, the idea is to provide the user with rough-and-ready clues to the range of meaning or use covered within a specific sense section of the entry, and so direct them to the most relevant sense. The difference between the two types lies in their spatial organization, as illustrated in Table 4.9.1 with a modified partial entry from the seventh edition of the Oxford Advanced Learner’s Dictionary, as used in Lew’s (2010) study. Entry menus gather all the clues in a solid block at the top of the entry (left-hand column in Table 4.9.1). In contrast, guidewords are distributed throughout the entry, with indicators introducing each sense (right-hand column in Table 4.9.1).

Table 4.9.1 Part of entry for advance from OALD7 and Lew (2010)

Left-hand column (entry menu):

ADVANCE
1. FORWARD MOVEMENT 2. DEVELOPMENT 3. MONEY 4. SEXUAL 5. PRICE INCREASE 6. MOVE FORWARD 7. DEVELOP 8. HELP TO SUCCEED 9. MONEY 10. SUGGEST 11. MAKE EARLIER 12. MOVE FORWARD 13. INCREASE
■ noun
1 [C] the forward movement of a group of people, especially armed forces: We feared that an advance on the capital would soon follow.
2 [C, U] advance (in sth) progress or a development in a particular activity or area of understanding: recent advances in medical science · We live in an age of rapid technological advance.

Right-hand column (guidewords):

ADVANCE
■ noun
FORWARD MOVEMENT 1 [C] the forward movement of a group of people, especially armed forces: We feared that an advance on the capital would soon follow.
DEVELOPMENT 2 [C, U] advance (in sth) progress or a development in a particular activity or area of understanding: recent advances in medical science · We live in an age of rapid technological advance.
MONEY 3 [C, usually sing.] money paid for work before it has been done or money paid earlier than expected: They offered an advance of £5 000 after the signing of the contract. · She asked for an advance on her salary.

The efficacy of such entry navigational aids has been established mainly in the context of monolingual dictionaries for language learners (Tono 1984, 1992, 1997, 2001, Bogaards 1998, Lew and Pajkowska 2007). It would stand to reason that in bilingual dictionaries the need for such access-facilitating devices is diminished, as one of the languages of the dictionary would usually be the native language of the user, allowing for more efficient scanning of the entries than if the entries are all in a foreign language, as would be the case in a monolingual dictionary for language learners. However, recent research reveals that electronic bilingual dictionaries do benefit from clickable entry menus as long as the target sense is additionally highlighted (Lew and Tokarek 2010).

Direct comparisons between the two systems (Lew 2010, Nesi and Tan 2011) indicate that the distributed system works better. The advantage of guidewords over menus may be explained by the physical proximity between guidewords and full definitions, which allows the two entry elements to work in synergy. Also, since entry menus are found at the top of the entry, there is a real risk of dictionary users getting lost on the way from the menu to the sense down the entry, even if they have identified the relevant sense correctly in the menu itself, particularly if the entry is long and runs on to another page (on paper) or screen. This is much less of a risk when the clue is adjacent to its sense section of the entry, as it is in the guideword system.
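Since the two navigational aids carry the same cues and differ only in placement, both can be generated from one underlying data structure. The following sketch makes that point with two hypothetical rendering functions; the function names and the plain-text layout are assumptions, and the definitions are abridged from Table 4.9.1.

```python
# (cue, definition) pairs; definitions abridged from Table 4.9.1.
SENSES = [
    ("FORWARD MOVEMENT", "the forward movement of a group of people"),
    ("DEVELOPMENT", "progress or a development in a particular activity"),
    ("MONEY", "money paid for work before it has been done"),
]


def render_menu(headword, senses):
    """Entry menu: all cues gathered in one block above the definitions."""
    menu = " ".join(f"{i}. {cue}" for i, (cue, _) in enumerate(senses, 1))
    body = [f"{i} {definition}" for i, (_, definition) in enumerate(senses, 1)]
    return "\n".join([headword.upper(), menu, *body])


def render_guidewords(headword, senses):
    """Guidewords: each cue placed immediately before its own definition."""
    body = [f"{cue} {i} {definition}"
            for i, (cue, definition) in enumerate(senses, 1)]
    return "\n".join([headword.upper(), *body])


print(render_menu("advance", SENSES))
print()
print(render_guidewords("advance", SENSES))
```

In the menu layout the distance between a cue and its definition grows with the length of the entry, whereas in the guideword layout it stays constant, which is one way to picture the proximity argument made above.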
6 Defining Senses

Defining senses, or meanings, is most relevant to semasiological monolingual dictionaries. Most types of onomasiological dictionary, such as thesauri or synonym dictionaries, tend not to have definitions, except perhaps for bringing out the differences between alternative lexical choices. Prototypical bilingual dictionaries do not normally employ definitions, working instead with equivalents in another language as the primary instrument for explaining meaning. Nevertheless, bilingual dictionaries sometimes do resort to definition in cases where an equivalent happens not to be available, or an equivalent would not be clear on its own. In such and similar cases, a definition (in this use often called a gloss) may be added for clarification.
6.1 The Form of Definition

For centuries, monolingual lexicography has been dominated by the Aristotelian model of defining. This format, also known as the classical definition, attempts to describe the defined item (definiendum) by supplying at least two pieces of information. First, it identifies the general category of things to which the defined item belongs. Second, it specifies the features by which the thing defined distinguishes itself from other members of this broader category. The technical terms for the two elements of the classical definition are genus and differentia specifica (or, in the plural, differentiae specificae), respectively (though they need not necessarily come in this particular order). For example, if a heater is defined as ‘a machine for making air or water hotter’ (LDOCE online), then what the definition is telling us is that a heater is a type of machine (genus) with a particular function of making air or water hotter (differentia specifica).
This defining strategy thus involves two complementary moves: a generalization followed by specialization. Even though the classical definition has ruled for centuries, it has not ruled supreme. Studies by historical lexicographers (e.g. Osselton 2007, Stein 2011) have identified instances of other defining strategies, some of which have recently enjoyed a comeback. Foremost among these has been the so-called full-sentence definition (FSD), brought to the contemporary limelight by the COBUILD range of dictionaries starting in 1987 (see Sinclair 1987). The case for FSD is made by Hanks (for more detail, see Hanks 1987). There are several variants of the full-sentence definition, but the most important characteristic is that the defined item is embedded in the definition itself, as in this definition from COBUILD online: ‘A heater is a piece of equipment or a machine which is used to raise the temperature of something, especially of the air inside a room or a car’. Such full-sentence definitions are claimed to be more similar to regular discourse, and remind the reader of an explanation a teacher or parent might offer. However, studies into patterns of spontaneous defining (Fabiszewski-Jaworski 2011) do not confirm this claim: while the full-sentence format is used at times, the classical definition remains by far the most popular. Another feature of the FSD is that the inclusion of the definiendum in the definition creates an opportunity for highlighting typical word combinations with the item being defined. This is particularly common with verbs and adjectives, as in this COBUILD definition of instil: ‘If you instil an idea or feeling in someone, especially over a period of time, you make them think it or feel it’. Other dictionaries have not adopted the full-sentence definition to the extent that the COBUILD range has. 
Among the problems of this defining format are: excessive wordiness, complexity, and appeal to conventions that remain largely obscure to the average user (Rundell 2006). Still, the FSD is used in moderation in most current monolingual English dictionaries for learners, and the format has inspired a lot of lexicographic research, some of which continues to this day (Barnbrook 2012). A related defining format is the single-clause definition, used most readily to define abstract nouns, especially ones which lack a useful genus term. Instead of defining destruction as ‘the act or process of destroying something or of being destroyed’ (LDOCE online), the single-clause alternative would just say ‘when something is being destroyed’, avoiding the clumsy and over-general act or process. It may well be that such general words do not contribute that much to the explanation of the exact meaning of the definiendum, but at least they do indicate that a noun is being defined: something that the single-clause definition does a poor job at (Dziemianko and Lew 2006, Lew and Dziemianko 2006a, 2006b, 2012).
A frequent defining strategy in concise dictionaries is to give a synonym or several synonyms. Interestingly, such a defining strategy bears affinity to the methods of bilingual lexicography: a synonym can be thought of as a special type of (near-)equivalent. While a bilingual dictionary provides equivalents in another language, synonym definitions may represent a different regional variety (e.g. kirk ScotE church) or register (puke infml vomit). Whenever a lemma represents a non-neutral item, as in the last case, and is rendered with a synonym in general use, the use of a synonym as a definition is generally accepted. Otherwise, it is frowned upon as a lexicographer’s easy way out. A number of other defining formats are occasionally used, such as the morphological definition (a formulation unwrapping a derivative word, e.g. swiftly in a swift fashion), extensional definition (enumerating typical exemplars, e.g. legume a seed such as a pea or bean), or ostensive definition (pointing to the definiendum, e.g. black the colour of this print). The above classification of definition formats has mostly dealt with the syntactic devices by means of which definitional sequences are put together. But the ultimate building blocks of definitions are words, and there is a general, and not altogether unreasonable, expectation that those words be simpler than the word being defined. Of course, this is hardly possible in defining the most common vocabulary (whose presence in monolingual dictionaries is somewhat tokenistic). The requirement of defining in simple words found a systematic and formal implementation in the so-called vocabulary control movement of the mid-twentieth century (Cowie 1999), out of which grew the defining vocabularies of the major monolingual English learners’ dictionaries. 
These vocabulary lists typically consist of between 2,000 and 3,500 words in their most common senses, and it is with the use of this restricted set that the definitions of up to 100,000 senses recorded in such dictionaries are written. It is often argued that the use of restricted vocabulary generally makes definitions easier to understand. While this is probably so, it is also true that the formulations become less precise, more wordy and roundabout, if not downright strained. The artificiality extends to unnatural collocational patterns, as the natural collocates may not be in the defining vocabulary set. Problems such as these throw into question the rigid restrictions imposed by defining vocabulary lists. As an alternative, Hanks (2009: 307) proposes that while definitions should be ‘as simple as possible’, they should at the same time be ‘as complex as necessary’. This appears to be a reasonable position, given the numerous problems associated with the use of a restricted defining vocabulary. Rather than trying to dumb down definitions for language learners, publishers should offer bilingual learners’ dictionaries (Adamska-Sałaciak 2010).
6.2 Relations between Definitions of Different Senses
Lexicographers defining polysemous entries need to grapple with the issue of relatedness between different senses. Foregrounding the links between different shades of meaning may help repair some of the damage done by artificially chopping semantic space into separate dictionary senses. In line with this consideration, there are those who stress that the dictionary entry is just one type of text (e.g. Frawley 1989), with its own cohesive links. Arguably, readers going through the entry can benefit from the definitions of subsequent senses building on the preceding ones, at the same time avoiding repetition. As a result, however, some definitions may become impossible to interpret without the contextual support of the earlier ones (‘an instance of this’ is a classic formulation in lexicographese, its popularity probably due more to space-saving considerations than to anything else). The assumption underlying such defining practice is that dictionary users behave as entry readers. This assumption can be problematic, as dictionary users do not have to, and often do not care to, go through the complete entry if they are looking for a quick solution. In such a scenario, it may be more advantageous if the definition of each sense is relatively autonomous, so that its comprehension does not send the dictionary user on a quest for clues all over the entry. One way to approach the issue of how closely the senses should be interrelated is through the primary function of the dictionary. Using a dictionary for comprehension favours quick consultation, and for such uses, relatively autonomous senses might work best. In contrast, if an entry is used for browsing or vocabulary learning, the user is likely to spend more time examining larger portions of the entry, and for such uses a more holistic approach to defining may be more suitable.
References
Adamska-Sałaciak, A. (2010) Why we need bilingual learners’ dictionaries. In: I. J. Kernerman and P. Bogaards (eds) English Learners’ Dictionaries at the DSNA 2009. Tel Aviv: K Dictionaries, 121–37.
Atkins, B. T. S. (1996) Bilingual dictionaries – past, present and future. In: M. Gellerstam, J. Jarborg, S.-G. Malmgren, K. Noren, L. Rogström and C. R. Papmehl (eds) EURALEX ’96 Proceedings. Göteborg: Department of Swedish, Göteborg University, 515–46.
Atkins, B. T. S. and Rundell, M. (2008) The Oxford Guide to Practical Lexicography. Oxford: Oxford University Press.
Barnbrook, G. (2012) A sense of belonging: possessives in dictionary definitions. International Journal of Lexicography (published online 6 July 2012).
Bogaards, P. (1998) Scanning long entries in learner’s dictionaries. In: T. Fontenelle, P. Hiligsmann, A. Michiels, A. Moulin and S. Theissen (eds) EURALEX ’98 Actes/Proceedings. Liège: Université Départements d’Anglais et de Néerlandais, 555–63.
Burkhanov, I. (1997) On the correlation between lexicology, linguistic semantics and lexicography. Zeszyty Naukowe Wyższej Szkoły Pedagogicznej w Rzeszowie. Seria Filologiczna. Językoznawstwo 4/26, 55–73.
Corino, E., Marello, C. and Onesti, C. (eds) (2006) Atti del XII Congresso di Lessicografia, Torino, 6–9 settembre 2006. Alessandria: Edizioni dell’Orso.
Cowie, A. P. (1999) English Dictionaries for Foreign Learners: A History. Oxford: Clarendon Press.
Dziemianko, A. and Lew, R. (2006) When you are explaining the meaning of a word: the effect of abstract noun definition format on syntactic class identification. In: E. Corino et al. (eds), 857–63.
Fabiszewski-Jaworski, M. (2011) Spontaneous defining by native speakers of English. In: K. Akasu and S. Uchida (eds) ASIALEX2011 Proceedings Lexicography: Theoretical and Practical Perspectives. Kyoto: Asian Association for Lexicography, 102–9.
Fraser, B. L. (2008) Beyond definition: organising semantic information in bilingual dictionaries. International Journal of Lexicography 21/1, 69–93.
Frawley, W. (1989) The dictionary as text. International Journal of Lexicography 2/3, 231–48.
Fuertes-Olivera, P. A. and Arribas-Baño, A. (2008) Pedagogical Specialised Lexicography: The Representation of Meaning in English and Spanish Business Dictionaries (Terminology and Lexicography Research and Practice 11). Amsterdam: John Benjamins.
Hanks, P. (1987) Definitions and explanations. In: J. Sinclair (ed.), 116–36.
—(2000) Do word meanings exist? Computers and the Humanities 34/1–2, 205–15.
—(2009) Review of Stephen J. Perrault (ed.) (2008) Merriam-Webster’s Advanced Learner’s English Dictionary. International Journal of Lexicography 22/3, 301–15.
Hanks, P. and Pearsall, J. (eds) (1998) New Oxford Dictionary of English. Oxford: Oxford University Press.
Hartmann, R. R. K. (1985) Contrastive text analysis and the search for equivalence in the bilingual dictionary. In: K. Hyldgaard-Jensen and A. Zettersten (eds) Symposium on Lexicography II. Proceedings of the Second International Symposium on Lexicography, May 1984, at the University of Copenhagen (Lexicographica. Series Maior 5). Tübingen: Max Niemeyer, 121–32.
—(1994) Bilingualised versions of learners’ dictionaries. Fremdsprachen Lehren und Lernen 23, 206–20.
Hiorth, F. (1954) Arrangement of meanings in lexicography: purpose, disposition and general remarks. Lingua 4, 413–24.
Jarošová, A. (2000) Problems of semantic subdivisions in bilingual dictionary entries. International Journal of Lexicography 13/1, 12–28.
Kilgarriff, A. (1997) I don’t believe in word senses. Computers and the Humanities 31/2, 91–113.
Kipfer, B. A. (1984) Workbook on Lexicography: A Course for Dictionary Users. Exeter: University of Exeter.
Lew, R. (2004) Which Dictionary for Whom? Receptive Use of Bilingual, Monolingual and Semi-bilingual Dictionaries by Polish Learners of English. Poznań: Motivex.
—(2009) Towards variable function-dependent sense ordering in future dictionaries. In: H. Bergenholtz, S. Nielsen and S. Tarp (eds) Lexicography at a Crossroads: Dictionaries and Encyclopedias Today, Lexicographical Tools Tomorrow (Linguistic insights – studies in language and communication, Vol. 90). Bern: Peter Lang, 237–64.
—(2010) Users take shortcuts: navigating dictionary entries. In: A. Dykstra and T. Schoonheim (eds) Proceedings of the XIV EURALEX International Congress. Ljouwert: Afûk, 1121–32.
Lew, R. and Dziemianko, A. (2006a) A new type of folk-inspired definition in English monolingual learners’ dictionaries and its usefulness for conveying syntactic information. International Journal of Lexicography 19/3, 225–42.
—(2006b) Non-standard dictionary definitions: what they cannot tell native speakers of Polish. Cadernos de Tradução 18, 275–94.
—(2012) Single-clause when-definitions: take three. In: Proceedings of 15th EURALEX International Congress, Oslo, 7–11 August, 2012. Oslo: Oslo University, 997–1002.
Lew, R. and Pajkowska, J. (2007) The effect of signposts on access speed and lookup task success in long and short entries. Horizontes de Lingüística Aplicada 6/2, 235–52.
Lew, R. and Tokarek, P. (2010) Entry menus in bilingual electronic dictionaries. In: S. Granger and M. Paquot (eds) eLexicography in the 21st Century: New Challenges, New Applications. Louvain-la-Neuve: Cahiers du CENTAL, 193–202.
Manley, J., Jacobsen, J. R. and Pedersen, V. H. (1988) Telling lies efficiently: terminology and the microstructure in the bilingual dictionary. In: K. Hyldgaard-Jensen and A. Zettersten (eds) Symposium on Lexicography III. Tübingen: Max Niemeyer, 281–301.
McCreary, D. R. (2008) Looking up ‘hard words’ for a production test: a comparative study of the NOAD, MEDAL, AHD, and MW Collegiate Dictionaries. In: E. Bernal and J. DeCesaris (eds) Proceedings of the XIII EURALEX International Congress. Barcelona: Universitat Pompeu Fabra, 1287–93.
—(2010) Three collegiate dictionaries: a comparison of reading comprehension test scores for university students using MWCD11, AHD4, and NOAD2. In: I. Kernerman and P. Bogaards (eds) English Learners’ Dictionaries at the DSNA 2009. Tel Aviv: K Dictionaries, 55–74.
McCreary, D. R. and Amacker, E. (2006) Experimental research on college students’ usage of two dictionaries: a comparison of the Merriam-Webster Collegiate Dictionary and the Macmillan English Dictionary for Advanced Learners. In: E. Corino et al. (eds), 871–85.
Mish, F. C. (ed.) (2004) Merriam-Webster’s Collegiate Dictionary, 11th edn. Springfield, MA: Merriam-Webster Incorporated.
Nesi, H. and Tan, K. H. (2011) The effect of menus and signposting on the speed and accuracy of sense selection. International Journal of Lexicography 24/1, 79–96.
Nowakowski, M. (1990) Metaphysics of the dictionary versus the lexicon. In: J. Tomaszczyk and B. Lewandowska-Tomaszczyk (eds) Meaning and Lexicography. Amsterdam: John Benjamins, 5–19.
Osselton, N. E. (2007) Innovation and continuity in English learners’ dictionaries: the single-clause when-definition. International Journal of Lexicography 20/4, 393–9.
Piotrowski, T. (1994) Problems in Bilingual Lexicography. Wrocław: Wydawnictwo Uniwersytetu Wrocławskiego.
Rundell, M. (1999) Dictionary use in production. International Journal of Lexicography 12/1, 35–53.
—(2006) More than one way to skin a cat: why full-sentence definitions have not been universally adopted. In: E. Corino et al. (eds), 323–37.
Shcherba, L. V. (1995) Towards a general theory of lexicography. International Journal of Lexicography 8/4, 314–50.
Sinclair, J. (ed.) (1987) Looking Up: An Account of the COBUILD Project in Lexical Computing. London, Glasgow: Collins.
Sinclair, J. and Hanks, P. (1987) Collins COBUILD English Language Dictionary (COBUILD1). London, Glasgow: Collins.
Stein, G. (2011) The linking of lemma to gloss in Elyot’s Dictionary (1538). In: O. Timofeeva and T. Säily (eds) Words in Dictionaries and History. Essays in Honour of R. W. McConchie. Amsterdam: John Benjamins, 55–79.
Tono, Y. (1984) On the Dictionary User’s Reference Skills. BEd. Thesis, Tokyo Gakugei University.
—(1992) The effect of menus on EFL learners’ look-up processes. Lexikos 2, 230–53.
—(1997) Guide Word or Signpost? An experimental study on the effect of meaning access indexes in EFL learners’ dictionaries. English Studies 28, 55–77.
—(2001) Research on Dictionary Use in the Context of Foreign Language Learning: Focus on Reading Comprehension (Lexicographica. Series Maior 106). Tübingen: Max Niemeyer.
van der Meer, G. (2004) On defining: polysemy, core meanings, and ‘great simplicity’. In: G. Williams and S. Vessier (eds) Proceedings of the Eleventh EURALEX International Congress, EURALEX 2004, Lorient, France, July 6–10, 2004, Vol. 2. Lorient: Université de Bretagne Sud, 807–15.
van der Meer, G. and Sansome, R. (2001) OALD6 in a linguistic and a language teaching perspective. International Journal of Lexicography 14/4, 283–306.
Wojciechowska, S. (2012) Conceptual Metonymy and Lexicographic Representation. Frankfurt: Peter Lang.
4.10 A Theory of Lexicography – Is There One?
Tadeusz Piotrowski
Chapter Overview
Introduction
What Is a Theory?
Theories of Lexicography
Lexicographic Theory and Theories in Science
Components in a Theory of Lexicography
Validity of Theories
Conclusion
1 Introduction
This chapter will above all clarify methodological and notional issues encountered in discussions of theories of lexicography: first, it will answer the question whether there are theories of lexicography; second, it will discuss what their components are; and, third, it will touch on their validity. As it will be about theories, the chapter can be called a meta-metalexicographical one. In what follows I will treat such expressions as a theory of lexicography, theoretical lexicography, or metalexicography as synonymous. As usual, there are idiosyncratic uses of such terms. Hüllen (1999), for example, makes a contrast between lexicography, which for him denotes a theory, and dictionary-making, which is the name for practice. For Wiegand (1984) metalexicography is a general term, and encompasses a theory of lexicography.
There is an enormous body of literature on theory in general, its components and status in research, above all in the sciences; quite a lot of that is strictly philosophical, and belongs to the philosophy of science (cf. e.g. Psillos and Curd 2008). This chapter also applies notions from the philosophy of science to lexicographic theories. I will be as non-technical as possible, which also means that the analysis will be fairly superficial, on the level of Ziman (1984), which is a highly readable survey of the mainstream philosophy and sociology of science. Unless noted otherwise, the type of dictionary that I will implicitly refer to will be the general (commercial) monolingual dictionary, and the lexicographic activities that aim at producing commercial monolingual and bilingual dictionaries will be called practical lexicography. Research dictionaries, products of academic lexicography, though in general similar to other lexicographic reference works, have a number of other qualities which will be outside the scope of this chapter.
2 What Is a Theory?
In the literature on lexicography in the English language one can find quite strong statements against theories. The quotation following is fairly typical: ‘Lexicography is above all a craft, the craft of preparing dictionaries . . . A science has a theory, a craft does not . . . how can there be a theory of the production of artefacts? there is no theory of lexicography’ (the arrangement of quotes does not follow their order in the book). These authoritative statements were written by Henri Béjoint in his recent book (2010: 381). Unfortunately, such broad generalizations can be falsified fairly easily. First, let us sort out terminological issues. ‘An artifact may be defined as an object that has been intentionally made or produced for a certain purpose’ (Hilpinen 2011); a dictionary certainly is an artefact, as is a violin, or a bridge. In contrast to what Béjoint writes, theories of artefacts do exist. Hilpinen (2011) refers to some of them, for example to Margolis and Laurence (2007). There are also books which refer to theories of artefact production, for example to bridge design (Zhao and Tonias 2012). Though based on findings of sciences, bridge design is not a science; it is a craft. And yet it has a theory and a philosophy. One can wonder further why crafts that produce strictly linguistic objects, such as dictionaries, and other texts, are supposed to lack theories. There is a very well-known ancient craft that does produce linguistic objects, a craft that very likely provided a stimulus towards production of the first dictionaries that we know from Sumer. The craft is called translation. Translation is as venerable as lexicography; like dictionary-making, it is also both a craft and an art. Yet there is no dearth of theories of translation (cf. e.g. Gentzler 2001, Pym 2010).
There are numerous analogies between translation and lexicography, not only when equivalence in bilingual dictionaries is involved, and I will refer to those analogies in what follows. What Pym has to say about theories in translation can be very easily applied to lexicography. Translators, like lexicographers, are said to need no theories in their work. But Pym (2010: 2) says:
Translators are theorizing all the time . . . whenever they decide to opt for one rendition and not others, they bring into play a series of ideas about what translation is and how it should be carried out. They are theorizing . . . A theory sets the scene where the generation and selection process takes place. Translators are thus constantly theorizing as part of the regular practice of translating.
While working on a dictionary, do not lexicographers have to decide between several possible solutions? In their decision making they usually rely on previous models of lexicographic practice, that is, on previous theories of what lexicography is, even though the theories are quite often implicit. It is fair to conclude then that, like translators, lexicographers are theorizing all the time; we can add that they often follow implicit theories from the past. The main trouble, it seems, in deciding whether some ideas or generalizations form a theory or not is with the meaning of the word theory. It is rather puzzling that those who write about theoretical lexicography most often do not define what a theory is. The reluctance of researchers to define what they mean by theoretical lexicography was noted by Wiegand as early as 1989. Béjoint certainly does not define it, though it is rather clear that he refers to scientific theories, that is, those formulated within the natural sciences. Few people would agree that lexicography is a branch of natural science; therefore perhaps the concepts of scientific theories should not be applied to lexicography.
For a description of the meaning of the word theory it is only natural to turn to a dictionary.
theory 1 a system of rules, procedures, and assumptions used to produce a result . . . 5 a set of hypotheses related by logical or mathematical arguments to explain and predict a wide variety of connected phenomena in general terms ⇒ the theory of relativity (www.collinsdictionary.com/dictionary/english/theory)
Sense 1 is even clearer in the American Collins Dictionary, while sense 5 is less clear there, and I will not quote it:
theory 5 that branch of an art or science consisting in a knowledge of its principles and methods rather than in its practice; pure, as opposed to applied, science, etc. (www.collinsdictionary.com/dictionary/american/theory)
It follows then that there are two distinct senses of the word theory: 1. ‘a study of principles and methods (of an art or science)’; 2. ‘a cohesive set of hypotheses’. The first sense is usually contrasted with the word practice, and it is this sense that seems to be used most often (though, to repeat, undefined) with reference to lexicography, as we can see in the typical title of a book on lexicography: A Handbook of Lexicography. The Theory and Practice of Dictionary-Making (Svensén 2009). Svensén does not say what he means by the word theory, either. In another recent book Atkins and Rundell (2008: 9) say ‘is this absence of theory such a bad thing? It may make more sense to think in terms of the principles that guide lexicographers in their work.’ In the two sentences they first use the word theory in sense 2 (though, as usual, they do not define it), then they use the term principles, which is synonymous, as we have seen, with theory in sense 1. Atkins and Rundell describe principles of lexicographers’ work in their book, so, logically, they wrote a theory of lexicography. What Atkins (1992/93: 7) wrote in an earlier paper about the aims of theoretical lexicography is very similar to Pym’s description of the theorizing of a translator:
Every editor of every new dictionary must make decisions on how to manage every one of . . . aspects of lexicography, and more. Theoretical lexicography must provide a theoretically sound, yet practical, basis for such decision-making. To do so requires awareness of these points in the dictionary design process where there is editorial choice and those where there is none.
In this paper she did define theoretical lexicography: it is ‘a body of theory related to lexicography’ (Atkins 1992/93: 4), which cannot be said to be very precise. However, on one understanding of the word theory Sue Atkins produces a theory of lexicography. We may therefore conclude that whenever we have a description of the principles of lexicographers’ work that is systematic and coherent, we have a theory of lexicography. From this point of view lexicography, like translation, has a number of theories.
3 Theories of Lexicography
In terms of their generality, theories of lexicography can be arranged in a hierarchy. Style Guides are theoretical statements that describe how to write a specific dictionary. One level up are those theories that discuss lexicographic principles in general terms from a practical point of view, that is, concentrating on those that the authors think are most important in practice, for example because the dictionaries they describe are frequent enough; this is the case with Atkins and Rundell’s book on lexicography. However, a lexicographer who wants to write a dictionary of synonyms, or a terminological dictionary, would not find their book very helpful. So, on the highest level, there should also be a theory general enough to cover all dictionaries and all aspects of lexicography. That would be the subject matter of a general theory of lexicography. Tarp (2008: 9) calls, somewhat tautologically, general theories ‘general (sic!) summarising statements about lexicography’, while specific theories are for him statements about sub-areas of lexicography. He also presents (2008: 11) a diagram that shows the relationship between practice and theory, which can be interpreted using the hierarchy that I have described. The monumental publication Wörterbücher. Dictionaries. Dictionnaires. An International Encyclopedia of Lexicography (Hausmann et al. 1989–91) does include theoretical articles that describe most aspects of lexicography. Unfortunately, they are formulated from various points of view and do not present a unified description, even though in strictly theoretical sections it is Herbert Ernst Wiegand’s articles that predominate. The main ideas from the Encyclopedia are presented far more coherently in shorter theoretical sections added to the multilingual dictionary of lexicographic terms (Wiegand et al. 2010). Wiegand’s work is the most ambitious and comprehensive general theory of lexicography that I know (cf. a list of his publications at www.gs.uni-heidelberg.de/personen/wiegand.html). Unfortunately, two qualities make it rather hermetic. First, it is mostly written in German (though a selection of Wiegand’s papers was published in English in Wiegand [1999]); second, even for a reader fluent in German his style proves to be an insurmountable problem (cf. Welker 2009). There is one important problem with most general theories that I know. Briefly, the theories do not go deep enough. They are systematic accounts of lexicography but they are not critical enough. Wierzbicka says that ‘even the best lexicographers, when pressed, can never explain what they are doing, or why’ (Wierzbicka 1985: 5). This statement can be interpreted as saying that lexicographers work according to hidden assumptions, and a theory should uncover them (cf. also various views on this issue, e.g. Richard Hudson’s, in Béjoint [2010: 346–7]). One notable exception could be the theory behind the COBUILD project, which challenged many cherished tenets of lexicography (Sinclair 1987), in particular those of pedagogical lexicography. If lexicography is to change, then those assumptions have to be described, because for centuries dictionaries have been compiled on their basis, and the environment in which dictionaries are used, their form and their function are changing rapidly now. We also have different views on the nature of languages than before. That means that the early assumptions on which dictionaries were created are no longer valid. Dictionaries are cultural products, and ‘theory is a critique of common sense, of concepts taken as natural’, it is ‘the demonstration that what has been thought or declared natural is in fact a historical, cultural product’ (Culler 2000: 14). These assumptions are precisely historical, cultural products, and it is the function of a theory to identify them. When Werner Hüllen says that behind early dictionaries one can see ‘such essential assumptions as “words are self-contained semantic entities of a language” or “words as names identify objects in the world”’ (Hüllen 1999: 4), then it is clear not only that numerous dictionaries are based on them (e.g. the Merriam-Webster dictionaries), but that we can see them deep down in the theoretical statements about lexicography, as, for example, in the Tarp book from 2008 (cf. Piotrowski 2009). Patrick Hanks (2000: 13) is certainly right when he objects to the view that words are entities, objects, saying that ‘treating meanings as events rather than objects yields a more satisfactory explanation of the dynamic nature of language than treating them as objects’. He tacitly identifies one of the most important hidden assumptions in lexicography, and, indeed, in Western culture in general, in which words are identified with objects called concepts, notions, etc. The question is obviously how to describe events in a dictionary. It is likely that the new electronic medium will make it possible, as the lexicography of the future will be computational lexicography.
In another paper Hanks suggests that ‘a major task for computational lexicography will be to identify meaning components . . . they may at heart be quite simple structures: much simpler, in fact, than anything found in a standard dictionary. But different’ (Hanks, 2000/2008: 134).
4 Lexicographic Theory and Theories in Science
The authors I have cited, like Béjoint, seem to think that if lexicographical theory is not like theories in the sciences then there is no theory at all. Perhaps it was Wiegand who first stated that very authoritatively, and he has been followed since:
Lexicography was never a science, it is not a science, and it will probably not become a science. Scientific activities as a whole are aimed at producing theories, and precisely this is not true of lexicographical activities. We must bear in mind that writing on lexicography is part of meta-lexicography and that the theory of lexicography is not part of lexicography. (Wiegand 1984: 13)
It has to be remembered that for Wiegand science very likely means research activities in general, as the German word Wissenschaft ‘science’ is not associated primarily with the natural sciences, as it is in English. Wiegand very carefully distinguishes between lexicography, the practice itself, and metalexicography, research into the practice. Lexicography produces dictionaries, not theories, while metalexicography does not produce dictionaries but general statements about them. Accordingly, metalexicography can be a science, while lexicography is not. This is a crucial distinction, which is often obscured. This preoccupation with the scientific nature of lexicography and metalexicography echoes the endless discussions of whether linguistics is a science (cf. Clark 2006), or, even more generally, whether the humanities can be like the sciences, with the natural sciences and their methods, above all physics, being models, paragons, for any scholarly activity. However, it is clear that what lexicographers do does not differ, in essence, very much from what descriptive linguists do (cf. also Hanks 2000). Lexicographers study – observe – linguistic data, most often with preconceived conceptions, intuitions, about lexical units, which we might call naïve hypotheses. While working on a dictionary they form generalizations on the basis of the data, which quite often clash with these hypotheses. These generalizations are recorded in dictionary entries, and are, above all, classifications of facts of language. One well-known classification is that into word classes, such as nouns, verbs, etc. But so-called descriptions of meaning also result from lexicographers’ classifications of their material.
Lexicographers group together contexts (quotations) with the item they want to describe; in each group the word makes the same, or a similar, contribution to the meaning of the context. This is what Patrick Hanks (2000: 12) calls ‘unique contributions of words to the meaning of sentences in which they occur’. This contribution is then given a label, which we call a definition, and recorded in a dictionary entry. If descriptive linguistics is scientific, then obviously lexicography is also scientific. What is different is that linguists are there to discover the truth, to use a pompous phrase, to discover something about reality. They do not care about uses of that truth, that is, whether the users of the description will actually understand it or need it for practical purposes. Lexicographers are there to discover the truth AND to present their findings in such a way that they will be practical for a specific type of user. Thus, the difference between lexicography and linguistics primarily lies in the use of a certain method of description and presentation of data in dictionaries, which, in turn, is usually chosen because of user needs.
I used the expression hypotheses above. For practising lexicographers they are their own intuitions, convictions about a word, products of their life with their language. Working on a dictionary, they quite often find that their intuitions are too individual, having been formed in the course of their own personal experience, and that their convictions do not agree with what they find in the empirical data, the texts. Therefore they change the first draft of the explanation of meaning, to allow for what can be generalized from the data. The finished entry is also a hypothesis, because it is impossible for the lexicographer to take account of all texts. Another lexicographer, studying a different set of contexts, may come to a different conclusion. Hypotheses are also an important initial stage for any scholarly research which aims at producing a theory. Scientists work almost exactly like lexicographers; the difference is in the methodological rigour which they use to test the hypothesis and to present the results, showing how well the tested hypothesis fits the empirical data, that is, whether it is confirmed or refuted. We do not find this rigour in a dictionary; we tend to believe the authority and integrity of the lexicographer. If we turn to theories of lexicography, we also find numerous hypotheses in them, even if this word is not used. Any sentence that says ‘the dictionaries usually contain’, ‘in the majority of the dictionaries this can be found’, ‘most dictionary entries include’ etc. is in fact a hypothesis, which, when rephrased more precisely, could easily be tested empirically; that is, such statements are in essence falsifiable. It is unfair, thus, to say that a theory of lexicography does not put forward falsifiable hypotheses (cf. Wiegand 1989: 261). It does, though from a scientist’s point of view they are not formulated precisely, nor tested properly on empirical data.
What I usually find in theories of lexicography is discussion of various theoretical principles on the basis of one or two model dictionaries. This is not good empirical evidence on which to form general conclusions. One sub-field of theoretical lexicography, research into user behaviour, which started off as informal surveys (e.g. Quirk 1974), now uses rigorous empirical methods, though Tarp (2009) makes valid critical remarks on the choice of samples, that is, the groups of users studied. In the light of the above discussion it is no wonder that a historian of dictionaries and lexicography, Werner Hüllen, has no doubts about the status of theory in lexicography, when he says that ‘during the nineteenth and twentieth centuries, [it] has developed into the analytical, fully grown, hypothesis-driven science we have today’ (Hüllen 1999: 4). Wiegand (1989, cf. also Tarp 2008: 7–11), on the other hand, remains sceptical about it, and would be most happy if the word theory were not used at all. There are theories of lexicography that describe dictionaries, and they can even be said to be scientific. Do those theories contain explanations, that is, do they answer the question why certain methods have been used? Explanation
is usually contrasted with ‘mere’ description of empirical facts in the sciences. This contrast will be better seen from an example:
All of the accounts of scientific explanation described below would agree that an account of the appearance of a particular species of bird of the sort found in a bird guidebook is, however accurate, not an explanation of anything of interest to biologists (e.g. the development, characteristic features, or behavior of that species). Instead, such an account is ‘merely descriptive’. (Woodward 2011)
Dictionaries are more like bird guidebooks, and so are their theories. Theories of lexicography, and, indeed, histories of dictionaries as well, usually do not explain lexicographic products; they are systematic accounts of dictionaries. In contrast, theoreticians of translation are not content with description of similarities and differences between the original and the translation, though in many theories of translation we do find catalogues of such discrepancies. Theoreticians of translation would like to explain, however, why a translator used a certain strategy. Those lexicographic theories that are intended to be practical have a strong prescriptive bent, that is, the metalexicographer suggests that a practitioner should use this or that method or structure. This prescriptivism is a very important point of difference between theories in the sciences and in lexicography. The former are not prescriptive. Occasionally these prescriptive remarks about dictionaries are linked to some purported needs of the user, which are rarely given any empirical grounding, simply following some assumptions derived from logical criteria, such as the level of proficiency. Or they are supported by some reference to linguistic theories, though in an extremely heterogeneous way, and one can suppose that it is the other way round, that it is traditional methods of lexicography that are given some support from linguistics in these books.
For example, most theoretical descriptions of equivalents in a bilingual dictionary use the early structural approach, which distinguishes full equivalents, partial equivalents and zero equivalents (cf. e.g. Svensén 2009: 253–75). Strange as it might seem, there is strong opposition to the use of the notion of equivalence in contemporary translation studies, because the very concept is seen to be confusing rather than enlightening. To defend the traditional approach in lexicography, Adamska-Sałaciak (2010: 403) says that in bilingual dictionaries ‘what matters is, on the one hand, the expectation of equivalence on the part of the dictionary user and, on the other, the lexicographer’s intention to meet this expectation’. Thus, a lexicographer provides the naïve users with what they falsely believe are the building blocks of translation, that is, words with their equivalents in the other language. Hartmann’s comment is
even more telling; he says that the bilingual dictionary is ‘a repository of the collective equations established by generations of “translating lexicographers”’ (Hartmann 2007 [2005]: 18). Should the user, however, be very much interested in the history of ‘collective equations’? And should a theoretician defend the status quo of lexicography, or should he or she rather suggest ways of overcoming folk beliefs, such as those described by Hüllen, entombed in a dictionary?
5 Components in a Theory of Lexicography
Before starting discussion of the components of a theory, there is an important distinction that has to be made. Namely, should a theory of lexicography discuss only those specific methods whose results can be called structures of dictionaries, and the structures themselves, that is, lexicographic form, or should it also discuss what those structures contain, that is, lexicographic content? The problem is that dictionary structures are extremely abstract, or, to use another word, empty, to be filled in by relevant data, and the same structures are used again and again in a dictionary, either chained (recursively) or embedded one within the other. This difference between form and content is common knowledge for all those who encode a dictionary for computer processing. Normally in a printed dictionary both lexicographic form and linguistic content are merged together; in contrast, in a computer-encoded dictionary the two are best kept apart, and the computer merges them for the user on display or when printing. Here is a typical example of an entry, without content, with explanations. The lines _______ indicate empty slots.
_______ => headword
_______ => hyphenation
_______ => pronunciation
_______ => part-of-speech label
_______ => definition
The structure was taken from Guidelines P5 of the Text Encoding Initiative (TEI), in which Chapter 9 is a description of the structure of dictionaries for the purpose of computer encoding (www.tei-c.org/release/doc/tei-p5-doc/en/html/DI.html).
This abstract form could correspond to a number of different entries, for example to
coracle cor·a·cle \`koṙ-ə-kəl, `kär-\ noun : a small boat used in Britain from ancient times and made of a frame (as of wicker) covered usually with hide or tarpaulin
or to
brunch \`brənch\ noun : a meal usually taken late in the morning that combines a late breakfast and an early lunch
(adapted from http://www.merriam-webster.com/dictionary/)
In the latter example the slot hyphenation was left empty; therefore it is not shown in the final stage. While the TEI chapter is fairly long and complex, Ide et al. (2000: 113) say that:
descriptions of dictionary structure have been informed by the format of printed dictionaries, . . . the constraints imposed by these formats interfere with the development of a model that fully captures the underlying structure of lexical information. As a result, although schemas such as those provided in the TEI Guidelines exist, they do not provide a satisfactorily comprehensive and unique description of dictionary structure and content.
This shows how important it is to describe properly the abstract structure, because of its complexity. It is important for any theory of lexicography, and in fact it is perfectly possible to discuss a dictionary structure not using specific items, but discussing which classes of items (such as phrasal verbs) go with which structural elements (phrasal verbs as sub-entries? full entries?). This is in fact what Wiegand and Hausmann say in their description of the structure of monolingual dictionaries, in which they call the structures information code:
The elements of the abstract linear micro-structure are information types; they may also be considered as information classes. Realizations of a type (or elements of a class) are called hereafter information. It is only after applying the specific information code of a dictionary to this information that the information data become specific textual segments of a dictionary article, i.e. lexicographic items. (Hausmann and Wiegand 1989: 341)
It is the use of abstract information structures into which meaningful elements can be inserted that distinguishes lexicography from other types of information description, and this makes dictionaries like databases. It is the difficulty of
fitting descriptions of facts into those structures that makes lexicography so difficult, so unnatural (cf. Bolinger 1985). It has to be stressed that this feature can be found not only in dictionaries that provide information on a language, but also in dictionaries that provide information on objects; the latter type of reference work is often called the encyclopedia, the former the dictionary, and there is a continuum from purely linguistic dictionaries to encyclopedic dictionaries or encyclopedias. It is obvious that the lexicographic form in both dictionaries and encyclopedias is the same, that is, they both share the same conventions, though linguistic dictionaries are typically far more complex structurally, using recursion and embedding of lexicographic structures. Both encyclopedias and dictionaries are reference works of the same type (McArthur 1998/2003, Tarp 2011: 56–7), and any fully fledged theory of lexicography should take this into account. Lexicographic form firmly belongs to lexicography; does the content also belong to it? In linguistic dictionaries this content is a description of facts of a language, therefore quite often it is thought that lexicography is a branch of linguistics. In encyclopedias that content, however, is a description of facts from various fields of knowledge, and it is not suggested that an encyclopedia of mathematics makes lexicography a branch of mathematics. As the two aspects, form and content, have different functions, it would be best if in a theory the names of the elements in each were independent of both typographical (or any other) and linguistic conventions. The elements of dictionary structures should have their own names, and the items to be inserted into the structures should have their own, to avoid confusion between the two. Therefore there are two aspects that a theory of linguistic lexicography is to take account of: lexicographic form and linguistic content.
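The separation of lexicographic form from linguistic content described in this section can be sketched in a few lines of code. This is an illustrative sketch only, not the TEI encoding itself: the tuple ENTRY_FORM, the function render and the dictionary layout are assumptions made for the example, with slot names borrowed from the empty-slot entry shown earlier and content adapted from the two Merriam-Webster entries quoted there (pronunciation delimiters omitted).

```python
# Lexicographic form: an ordered sequence of named, empty slots.
# Slot names follow the chapter's example; this representation is an assumption.
ENTRY_FORM = ("headword", "hyphenation", "pronunciation", "part_of_speech", "definition")


def render(content, form=ENTRY_FORM):
    """Merge content into the form in slot order; empty slots are not shown."""
    unknown = set(content) - set(form)
    if unknown:
        raise ValueError(f"content names slots the form does not have: {sorted(unknown)}")
    return " ".join(content[slot] for slot in form if content.get(slot))


# Linguistic content, stored apart from the form.
CORACLE = {
    "headword": "coracle",
    "hyphenation": "cor·a·cle",
    "pronunciation": "koṙ-ə-kəl",
    "part_of_speech": "noun",
    "definition": (": a small boat used in Britain from ancient times and made of "
                   "a frame (as of wicker) covered usually with hide or tarpaulin"),
}

# The hyphenation slot is simply left unfilled here.
BRUNCH = {
    "headword": "brunch",
    "pronunciation": "brənch",
    "part_of_speech": "noun",
    "definition": (": a meal usually taken late in the morning that combines "
                   "a late breakfast and an early lunch"),
}

if __name__ == "__main__":
    for content in (CORACLE, BRUNCH):
        print(render(content))
```

One form serves both entries, and the names of the slots are fixed once, independently of how any particular item is worded or printed, which is the independence of form and content argued for above.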
However, it was noted earlier that linguistic content is selected and prepared for insertion into lexicographic form from the point of view of the user. The user can be a human being, but also a computer. And this is the third dimension of a theory of lexicography. For most practical lexicographers a dictionary is a sort of message to the user, and, as in any communication act, the form and content of the message are adjusted to the interlocutor’s needs. These three aspects, or components, of the theory of lexicography can be called, using Morris’s terms somewhat metaphorically (Morris 1938, cf. Lyons 1977: 114–20), syntactic, when the focus is on form, semantic, with the focus on content, and pragmatic, the focus being the user. In many theories of lexicography the three aspects were indeed distinguished, though perhaps not as explicitly. Geeraerts (1989), for example, distinguishes two aspects, structural and pragmatic, collapsing syntactic with semantic aspects under one label ‘structural’. Wiegand (1984) does have the three aspects in his general theory of lexicography, or, more precisely, in metalexicography, though not specifically identified.
Recently it has been suggested that the pragmatic aspect of lexicography should be given absolute priority when compiling a new dictionary, within so-called function theory (Tarp 2008), which is a model of anticipated needs of users. While there is much that is quite interesting in the theory, it suffers from logical contradictions and quite important methodological drawbacks (cf. Piotrowski 2009). Further, while it is true that for practical lexicography the user perspective acts as a filter in presentation of lexicographic data, it does not have to be as important for collection and storage of the data. Logically, after creating a huge comprehensive lexicographic database it would be possible, using different interfaces that would filter and structure the data, to present the content in ways suited to the user. Moreover, Tarp’s user models are rigid, and based on his ideas of what the users need – there is no empirical data to support his theses. A better idea perhaps would be to produce a dictionary (or a dictionary interface) that would adjust itself to the user’s needs (cf. de Schryver 2010) as they arise, an idea that has been implemented in Intelligent Tutoring Systems for some time. Tarp in a later paper does indeed describe a far more flexible approach (Tarp 2011: 67–9) to electronic dictionaries (e-dictionaries), saying that lexicographical e-tools should be viewed ‘as one multifunctional dictionary with individualized search options within the framework of its defined functions’ (2011: 69), that is, basically along the lines of my proposal. The pragmatic and the syntactic aspects are peculiar to practical linguistic lexicography. What about the semantic one? Is the description of a language in dictionaries dependent on descriptions of languages in linguistics? Some people believe so, saying that lexicography is a branch of applied linguistics, like lexicology (whose status in English linguistics is unclear). 
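The suggestion made above, that a single comprehensive lexicographic database could be presented through different interfaces that filter and structure the data for the user, can be sketched as follows. Everything in the sketch is hypothetical: the Item record, the two interface names and their lists of information types are assumptions chosen for illustration, and the example sentence is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One piece of stored lexicographic data, independent of any presentation."""
    headword: str
    info_type: str
    text: str


# The comprehensive database: collected and stored without regard to the user.
DATABASE = [
    Item("brunch", "part_of_speech", "noun"),
    Item("brunch", "definition", "a meal usually taken late in the morning"),
    Item("brunch", "etymology", "blend of breakfast and lunch"),
    Item("brunch", "example", "We met for brunch on Sunday."),  # invented example
]

# Hypothetical interfaces: each one names the information types it shows, in order.
INTERFACES = {
    "learner": ("definition", "example"),
    "historical": ("part_of_speech", "definition", "etymology"),
}


def present(headword, interface, database=DATABASE):
    """Filter and order the stored items for one headword according to an interface."""
    wanted = INTERFACES[interface]
    found = {item.info_type: item.text for item in database if item.headword == headword}
    return [(info_type, found[info_type]) for info_type in wanted if info_type in found]


if __name__ == "__main__":
    for name in INTERFACES:
        print(name, present("brunch", name))
```

Because the filter is applied only at presentation time, an interface could in principle be changed, or chosen anew for each consultation, without touching the stored data.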
A good recent discussion can be found in Tarp (2010), who himself thinks that lexicography, in particular its theory, is an independent field of study. Atkins and Rundell (2008) say that lexicographers should use linguistic studies in their descriptions of a language as inspiration, which means that the lexicographic description is different from the linguistic one. The chief difference between lexicography and linguistics was imaginatively described by Dwight Bolinger, who says:
One potential actor is still waiting in the wings: the lexicographer. He is an unpretentious fellow, and perhaps that is why he has been there so long. If only he were not such a dilettante – insisting on looking at every wayward sense and anecdote, at frequencies and usages, at the whole of meaning instead of some theoretically important part. (Bolinger 1975: 224)
Bolinger hints at the fact that linguists use idealization, which is ‘a deliberate simplification of something complicated with the objective of making it more tractable’ (Frigg and Hartmann 2012). For example, they take account of that
part of the lexicon only which falls under the scope of their theory. If we look, for example, at the work of Wierzbicka (e.g. Wierzbicka 1985), who writes a lot about lexicography, it is clear that she describes just a handful of words, and no lexicographer would do that. Wierzbicka’s descriptions are very detailed, while dictionaries include shallow descriptions of a large number of lexical items. In general, lexicology might be said to do intensive description of a small number of items, and lexicography does extensive description of a large number of items (cf. Hanks 2006). Obviously, a lexicographer also idealizes the data, because it is not possible to describe all possible contextual uses of a word in a dictionary. However, lexicographic idealization is carried out from the point of view of a specific type of user. The benefits of the independent lexicographic approach to the public and to linguists are obvious. It is not difficult today to collect huge masses of raw data, or even to process it using shallow methods, such as those described by Grefenstette (1998), who calls them approximate linguistics. For example, Sketch Engine is a tool to process text on a shallow level (Kilgarriff et al. 2004). While those shallow methods do produce valuable results, they are still just data to be interpreted. A dictionary has one advantage over such results: it contains huge numbers of items with interpretations of all the items it includes, by highly competent individuals, on the basis of the current norms of the society they live in. Whatever the quality of the interpretation, its one advantage is that it is there, in the dictionary. Therefore one should be cautious about assertions that lexicography, or its theory, should conform to the latest linguistic fashions. This is a point that Geeraerts makes implicitly in his 2007 paper.
While Geeraerts himself thinks that practical lexicography is an applied branch of lexicology, that is, that lexicology formulates theoretical principles, and lexicography uses the principles in practice (Geeraerts, 2007: 1172), he says that ‘a number of existing definitional and descriptive practices in the dictionary that are somewhat suspect from an older theoretical point of view receive a natural interpretation and legitimacy in the theoretical framework offered by Cognitive Linguistics’ (2007: 1160–1). Thus, if lexicographers did not use some definitional practices because they had been condemned by one group of linguists, they might come under criticism from another group of linguists who think these practices are legitimate.
6 Validity of Theories
As we have seen, there are various theories of lexicography and, naturally, some of them are in competition. In his theoretical book Tarp (2008) criticizes Wiegand’s theory, because it is too linguistically oriented and does not take into account lexical functions as much as it should. How to judge who is right,
Wiegand or Tarp? This is the question about the validity of theories, and about predictions that they might make. ‘The most convincing demonstration of the validity of a scientific hypothesis is the successful prediction of a previously unobserved or unrecorded observation’ (Ziman 1984: 43). Are theories of lexicography predictive in this sense? A short answer is that they are not and that they will not be. This results, again, from the differences between the sciences and the humanities. The natural sciences form general statements about the world; these general statements, sometimes called scientific laws, are always true under specified conditions (or they are specific probabilities for the occurrence of a certain event). There are no exceptions to them; an exception simply means that the law is false. A scientific law will be true in the future, too, and this makes it possible for scientists to make predictions about future events. The humanities, in contrast, cannot make such predictions. What they do make is some generalizations on the basis of the description of past events; in a way the humanities are always historical sciences. These generalizations are true only about the past. We do not know what the future will be; we can only hope that the nearest future will be similar to the present, and we extrapolate our expectations about the present, derived from past experiences, to the future. That is why practical manuals of lexicography can be used with some success in ‘normal’ times, because we think that tomorrow will be essentially like today. While in ‘normal’ times this extrapolation of principles derived from the past into the future works reasonably well, because social conditions change gradually, in unusual times (perhaps it would be apt to use here Kuhn’s phrase ‘the revolutionary change of paradigm’ [Kuhn 1970]) this does not work at all, because tomorrow can be quite different from today.
In lexicography we certainly have a revolutionary situation: dictionaries change, they become abstract objects in virtual space, and are no longer concrete tomes on the bookshelf. It is highly likely that the dictionary of the future will not be perceived as an object at all; it will work like a background process. We already know this sort of dictionary: it is, for example, the spelling checker working with a word processor. The traditional models of dictionary distribution and use no longer function as they did. Users’ expectations also change rapidly. This also means that any principles derived from past behaviour of dictionary users, that is, from surveys of users’ needs and expectations, are doomed from the beginning. The knowledge of the past does not give us a clue as to how the users might use a dictionary in a completely different environment, that is, in the unknown future. So what does the validity of a lexicographic theory depend on? As usual in the humanities, it depends on the authority of the theoretician. We tend to believe that because Sue Atkins and Michael Rundell produced successful dictionaries in the past, we can be sure that their guide will help us make a good
dictionary in the future. This belief is fallible, however, as we have noticed. As long as theories are based on the authority of their makers they are not science, they are simply sets of beliefs. A good theory, especially if it makes some suggestions about the future, should also describe how to verify its claims.
7 Conclusion
In this chapter I have argued that there are indeed theories of lexicography, from prescriptive descriptions of lexicographic methods, to general theories that cover all aspects of lexicography. The most developed general theory is that by Herbert Ernst Wiegand. A theory of lexicography should address three aspects, which make lexicography unique among other reference sciences: syntactic (lexicographic structures), semantic (content of structures), and pragmatic (user needs). There are unfortunately very few theories that discuss the most basic assumptions on which dictionaries are founded. Validity of theoretical models is primarily based on their efficiency in the past, therefore they cannot be used in times when the cultural and technological environment rapidly changes.
References
Adamska-Sałaciak, A. (2010) Examining equivalence. International Journal of Lexicography 23/4, 387–409.
Atkins, B. T. S. (1992/93) Theoretical lexicography and its relation to dictionary-making. Dictionaries 14, 4–43.
Atkins, B. T. S. and Rundell, M. (2008) The Oxford Guide to Practical Lexicography. Oxford: Oxford University Press.
Béjoint, H. (2010) The Lexicography of English. Oxford: Oxford University Press.
Bolinger, D. (1975) Aspects of Language. New York: Harcourt, Brace, Jovanovich Inc.
Bolinger, D. (1985) Defining the indefinable. In: R. Ilson (ed.) Dictionaries, Lexicography and Language Learning. Oxford: Pergamon Press, 69–73. (Reprinted in T. Fontenelle [ed.] 2008).
Clark, B. (2006) Linguistics as a science. In: K. Brown (ed.) Encyclopedia of Language and Linguistics. Oxford: Elsevier Science.
Culler, J. (2000) Literary Theory: A Very Short Introduction. Oxford: Oxford University Press.
De Schryver, G.-M. (2010) State-of-the-art software to support intelligent lexicography. In: R. Zhu (ed.) Chinese Lexicographic Research 2, 584–99. Sl: Chinese Social Sciences. Available at: www.hcxf.cn/read.asp?id=570. (Accessed 12 July 2012).
Fontenelle, T. (ed.) (2008) Practical Lexicography. A Reader. Oxford: Oxford University Press.
Frigg, R. and Hartmann, S. (2012) Models in science. In: E. N. Zalta (ed.) The Stanford Encyclopedia of Philosophy (Fall 2012 edn). Available at: http://plato.stanford.edu/archives/fall2012/entries/models-science/. (Accessed July 2012).
Geeraerts, D. (1989) Principles of monolingual lexicography. In: F. J. Hausmann et al. (eds), 287–96.
Geeraerts, D. (2007) Lexicography. In: D. Geeraerts and H. Cuyckens (eds) The Oxford Handbook of Cognitive Linguistics. Oxford: Oxford University Press, Chapter 44.
Gentzler, E. (2001) Contemporary Translation Theories, revised 2nd edition. Clevedon: Multilingual Matters.
Grefenstette, G. (1998) The future of linguistics and lexicographers: will there be lexicographers in the year 3000? In: T. Fontenelle, P. Hiligsmann, A. Michiels, A. Moulin and S. Theissen (eds) Proceedings of the Eighth EURALEX Congress. Liège: University of Liège, 25–41. (Reprinted in: T. Fontenelle [ed.] 2008).
Hanks, P. (2000) Contributions of lexicography and corpus linguistics to a theory of language performance. In: U. Heid et al. (eds), 3–13.
Hanks, P. (2000/2008) Do word meanings exist? Computers and the Humanities 34, 205–15. (Reprinted in: T. Fontenelle [ed.] 2008).
Hanks, P. (2006) Lexicography. In: K. Brown (ed.) Encyclopedia of Language and Linguistics. Oxford: Elsevier Science.
Hartmann, R. R. K. (2005) Interlingual references: on the mutual relations between lexicography and translation. The Hong Kong Linguist 25, 43–52. Reprinted in R. R. K. Hartmann (2007) Interlingual Lexicography Selected Essays on Translation Equivalence, Contrastive Linguistics and the Bilingual Dictionary. Tübingen: Max Niemeyer, 208–17.
Hausmann, F. J. and Wiegand, H. E. (1989) Component parts and structures of general monolingual dictionaries: a survey. In: F. J. Hausmann et al. (eds), 328–60.
Hausmann, F. J., Reichmann, O., Wiegand, H. E. and Zgusta, L. (1989–91) Wörterbücher/Dictionaries/Dictionnaires: An International Encyclopedia of Lexicography, Vols 1–3. Berlin: Walter de Gruyter.
Heid, U. et al. (eds) (2000) Proceedings of the Ninth EURALEX International Congress, EURALEX 2000. Stuttgart: Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.
Hilpinen, R. (2011) Artifact. In: E. N. Zalta (ed.) The Stanford Encyclopedia of Philosophy (Winter 2011 edn). Available at: http://plato.stanford.edu/archives/win2011/entries/artifact/. (Accessed 15 June 2012).
Hüllen, W. (1999) English Dictionaries 800–1700. The Topical Tradition. Oxford: Clarendon Press.
Ide, N., Kilgarriff, A. and Romary, L. (2000) A formal model of dictionary structure and content. In: U. Heid et al. (eds), 113–26. Available at: www.kilgarriff.co.uk/Publications/2000-IdeKilgRomary-Euralex.pdf.
Kilgarriff, A. et al. (2004) The Sketch Engine. In: G. Williams and S. Vessler (eds) Euralex 2004 Proceedings. Lorient: Université de Bretagne-Sud, 105–16. Reprinted in: T. Fontenelle (ed.) 2008.
Kuhn, T. S. (1970) The Structure of Scientific Revolutions, 2nd edition. Chicago, IL: University of Chicago Press.
Lyons, J. (1977) Semantics, Vol. 1. Cambridge: Cambridge University Press.
Margolis, E. and Laurence, S. (eds) (2007) Creations of the Mind: Theories of Artifacts and Their Representation. Oxford: Clarendon Press.
McArthur, T. (1998) What then is reference science? In: T. McArthur Living Words, Language, Lexicography and the Knowledge Revolution. Exeter: University of Exeter Press, 215–22. Reprinted in: R. R. K. Hartmann (ed.) (2003) Lexicography, Critical Concepts. Vol. III. Lexicography: Lexicography, Metalexicography and Reference Science. London: Routledge, 422–8.
Morris, Ch. (1938) Foundations of the theory of signs. In: O. Neurath, R. Carnap and C. Morris (eds) International Encyclopaedia of Unified Science I. Chicago, IL: University of Chicago Press, 77–138.
Piotrowski, T. (2009) Review of Tarp 2008. International Journal of Lexicography 22/4, 480–6.
Psillos, S. and Curd, M. (eds) (2008) The Routledge Companion to Philosophy of Science. London: Routledge.
Pym, A. (2010) Exploring Translation Theories. London and New York: Routledge.
Quirk, R. (1974) The image of the dictionary. In: R. Quirk. The Linguist and the English Language. London: Edward Arnold, 148–63.
Sinclair, J. (ed.) (1987) Looking Up: An Account of the Cobuild Project in Lexical Computing. London and Glasgow: HarperCollins.
Svensén, B. (2009) A Handbook of Lexicography. The Theory and Practice of Dictionary-Making. Cambridge: Cambridge University Press.
Tarp, S. (2008) Lexicography in the Borderland between Knowledge and Non-Knowledge. General Lexicographical Theory with Particular Focus on Learner’s Lexicography. Tübingen: Max Niemeyer.
Tarp, S. (2009) Reflections on lexicographical user research. Lexikos 19. Available at: http://lexikos.journals.ac.za/pub/article/view/440/157. (Accessed 10 July 2012).
Tarp, S. (2010) Reflections on the academic status of lexicography. Lexikos 20. Available at: http://lexikos.journals.ac.za/pub/article/view/152/94. (Accessed 13 July 2012).
Tarp, S. (2011) Lexicographical and other e-tools for consultation purposes: towards the individualization of needs satisfaction. In: P. A. Fuertes-Olivera and H. Bergenholtz (eds) e-Lexicography. The Internet, Digital Initiatives and Lexicography. London: Continuum, 54–70.
Welker, H. A. (2009) An Overview of Wiegand’s Metalexicographic Works. Available at: www.let.unb.br/hawelker/images/stories/professores/documentos/Wiegand.pdf. (Accessed 20 July 2012).
Wiegand, H. E. (1984) On the structure and contents of a general theory of lexicography. In: R. R. K. Hartmann (ed.) LEXeter’83 Proceedings. Papers from the International Conference on Lexicography at Exeter, 9–12 September 1983. Tübingen: Max Niemeyer, 13–30.
Wiegand, H. E. (1989) Der gegenwärtige Status der Lexikographie und ihr Verhältnis zu anderen Disziplinen. In: F. J. Hausmann et al. (eds), 246–80.
Wiegand, H. E. (1999) Semantics and Lexicography. Selected Studies (1976–1996). Edited by Antje Immken and Werner Wolski. Tübingen: Max Niemeyer.
Wiegand, H. E. et al. (eds) (2010) Wörterbuch zur Lexikographie und Wörterbuchforschung / Dictionary of Lexicography and Dictionary Research. Band 1 / Volume 1: Systematische Einführung / Systematic Introduction + A–C. Berlin/New York: Walter de Gruyter.
Wierzbicka, A. (1985) Lexicography and Conceptual Analysis. Ann Arbor, MI: Karoma Publishers.
Woodward, J. (2011) Scientific explanation. In: E. N. Zalta (ed.) The Stanford Encyclopedia of Philosophy (Winter 2011 edn). Available at: http://plato.stanford.edu/archives/win2011/entries/scientific-explanation/. (Accessed 6 July 2012).
Zhao, J. and Tonias, D. (2012) Bridge Engineering. Rehabilitation, and Maintenance of Modern Highway Bridges, 3rd edition. New York: McGraw-Hill Professional.
Ziman, J. (1984) An Introduction to Science Studies. The Philosophical and Social Aspects of Science and Technology. Cambridge: Cambridge University Press.
5 New Directions in Lexicography
5.1 e-Lexicography: The Continuing Challenge of Applying New Technology to Dictionary-Making
Pedro A. Fuertes-Olivera
Chapter Overview
Introduction: Background and Definitions
Printed Online Dictionaries
Replicated Online Dictionaries
Functional Online Dictionaries
The Accounting Dictionaries
Conclusion
Acknowledgements
1 Introduction: Background and Definitions
The present chapter reflects on the future of dictionaries and analyses some of the challenges lexicography faces in connection with the expansion of the internet. This has resulted in the frequent use of the term e-lexicography with various meanings. For instance, discussions concerned with the term printed online dictionaries (see Section 2 below) indicate that e-lexicography is even used when discussing printed dictionaries that have simply been uploaded onto the internet.
I define the term e-lexicography as a scientific discipline within Information Science that is mainly concerned with the development, planning, compilation and publication of electronic reference tools. It deals with aspects that are common to all reference tools, for example, users’ needs in the framework of a lexicographic theory. And it elaborates transformative concepts, that is, concepts that are based on performing theoretical analysis of potential user situations, the respective user conditions and the user needs with a view to developing new concepts for compiling dictionaries for specific use situations (see Fuertes-Olivera and Bergenholtz 2011 for a review; also Tarp 2007). The writing of this chapter needs several adjustments in order to comply with the above definition of the term e-lexicography. One of the adjustments consists in using the term online dictionary to refer to any type of information tool (dictionaries, knowledge databases, lexicons, glossaries, etc.) that contains (or aims to contain) collections of structured data and is accessed through the internet, although some (or perhaps many) of them are not really internet dictionaries according to the above-mentioned definition of e-lexicography. Online dictionaries have attracted the attention of many researchers in the last three years. For instance, a thematic session in Lexicographica (Haß and Schmitz 2010), and several books (Bergenholtz et al. 2009, Granger and Paquot 2010, 2012, Fuertes-Olivera and Bergenholtz 2011, Kosem and Kosem 2011, Gouws et al. 2013) mostly deal with online dictionaries, although following different traditions and objectives. To these, we can add several articles published in leading journals and collections which make crystal clear that many scholars see the future of lexicography in connection with the electronic medium, mostly the internet.
It goes without saying that this vast amount of research has led to various visions, some of which will be discussed in this chapter. The second adjustment consists in broadening the discussion of the challenges the internet poses for dictionary-making. This discussion must go beyond a kind of microlexicographic analysis, that is, an analysis restricted to devices and technological innovations. Instead, it must face the challenges in a more global and abstract way, as has happened in other spheres of human life. For instance, discussions of e-commerce were initially concerned with the technical aspects of making payments secure. They are now centred on the changes this new concept has brought to human behaviour: we can now acquire goods and services from faraway locations, which has modified the concept of the customer and his or her relationship with retailers. Both adjustments indicate that the internet is a technology that represents more than a new lexicographic medium within the field of e-lexicography.
To the best of my knowledge, most discussions of e-lexicography have been limited to issues connected with a range of natural language processing
approaches and tools that are available to augment accessibility and usefulness. My view is that these approaches are short-sighted, as they pay no attention to the relationship among the technology (the internet), humans (the dictionary users), lexicographic developments, and costs (every activity has benefits and costs). The analysis of this interrelationship is the main aim of this chapter, which is structured into four more sections and a conclusion. Section 2 focuses on institutional challenges, which are connected with the decisions taken by traditionally oriented social and/or political bodies and which shape a type of online dictionary that is still being compiled in some places, here identified as printed online dictionaries. Section 3 deals with research challenges, especially with some often-quoted assumptions that are accepted without questioning their merits, even when the lexicographic methods and procedures underlying them deliver poor results, if any. Such assumptions result in replicated online dictionaries, some of which are still only prototypes after several years of work. Section 4 analyses the social and intellectual challenges of making functional online dictionaries. These are connected to my definition of e-lexicography and will therefore be illustrated with an example of what Tarp (2011: 60–1) identifies as Model T Ford dictionaries (Section 5). To sum up, this chapter broadens the concept of the challenges the internet poses for lexicography by incorporating the whole human factor into the discussion (see Chapter 5.3 for a narrower view).
The human factor is here illustrated in terms of the distinction between contemplative and transformative lexicography. Contemplative lexicography is the practice of analysing existing dictionaries and questioning users about their use of them; transformative lexicography consists in performing theoretical analysis of potential user situations, the respective user conditions and the user needs with a view to developing new concepts for compiling dictionaries for specific use situations (Tarp 2002, Bergenholtz et al. 2011). Within this debate, this chapter adopts a transformative stance and makes a fundamental claim: the development of online dictionaries has lent support to function-based lexicography, which is centred on the dictionaries and the users with the aim of generating true online dictionaries based on the functions they must fulfil (see Chapter 5.3). This stance can be put into practice by taking costs on board and making sound use of available devices, but this must always be preceded by the lexicographer’s analysis, which aims to characterize and typologize users’ needs in order to establish a basis upon which the corresponding lexicographic solution(s) can be found and developed. This idea accords with the main tenets of the function theory of lexicography, which is the theoretical basis upon which this chapter studies the continuing challenge of applying new technology to dictionary-making (Bergenholtz and Tarp 2002, 2003, 2004, Tarp 2008, also Tono 2010 for a review of the function theory).
2 Printed Online Dictionaries
There have been several attempts at proposing electronic dictionary typologies (De Schryver 2003, Fuertes-Olivera 2009, Lew 2011, Tarp 2011). These have identified the online dictionary as a type of electronic information tool that contains a collection of electronically structured data, can be accessed with internet tools with or without paying a subscription fee, is enhanced with a wide range of functionalities, is used in various environments, and is linked to both external and internal information sources. De Schryver (2003), for example, includes Cerquiglini’s computer-assisted paper dictionary (cited in Pruvost 2000: 188) as a type of online dictionary. This type corresponds basically to my description of printed online dictionaries.
The printed online dictionary described in this section coincides with Tarp’s (2011: 58–60) copycats, that is, mere copies of printed dictionaries, and faster horses, that is, traditionally conceived printed dictionaries slightly modified in order to equip them with quicker access by means of search engines and links. The Diccionario de la lengua española (Drael), which is compiled by the Spanish Royal Academy (http://buscon.rae.es/draeI/), is a typical example of such a dictionary type. For instance, although its access route allows approximate searching, it is not very efficient. I searched for lee (a present-tense form of the verb leer) and the system only retrieved le (personal pronoun); I searched for cocha and found only two Spanish meanings (one for pig and one for lagoon), but no orthographically related words, for example, coche (English car). Furthermore, the query system does not allow searching by partial words, anagrams or combinations. It is equipped only with a search engine that looks in the entry field (Pastor and Alcina 2010), which means that searching in the content fields (e.g. definitions), external sources and thematic indexes is not contemplated in this dictionary.
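The search modes whose absence is noted above (approximate spelling, partial words, anagrams, and searching in content fields rather than only in the entry field) can be illustrated with a minimal sketch. It is not a description of how any existing dictionary is implemented: the word list, the glosses, the function names and the similarity cutoff are all assumptions made for the illustration, and the approximate match simply relies on the string similarity measure of Python's standard difflib module.

```python
import difflib

# Illustrative data only: a handful of Spanish headwords with short English glosses.
GLOSSES = {
    "cocha": "lagoon; pig",
    "coche": "car",
    "cochera": "garage",
    "le": "personal pronoun",
    "leer": "to read",
    "roca": "rock",
    "arco": "arch; bow",
    "caro": "expensive",
}
HEADWORDS = list(GLOSSES)


def approximate_search(query, limit=5, cutoff=0.6):
    """Return headwords whose spelling is close to the query, best match first."""
    return difflib.get_close_matches(query.lower(), HEADWORDS, n=limit, cutoff=cutoff)


def partial_search(fragment):
    """Return headwords that contain the fragment anywhere in the word."""
    return [w for w in HEADWORDS if fragment.lower() in w]


def anagram_search(query):
    """Return other headwords written with exactly the same letters."""
    key = sorted(query.lower())
    return [w for w in HEADWORDS if sorted(w) == key and w != query.lower()]


def content_search(term):
    """Search the content field (here, the gloss) rather than the entry field."""
    return [w for w, gloss in GLOSSES.items() if term.lower() in gloss.lower()]


print(approximate_search("cocha"))  # includes coche as well as cocha
print(approximate_search("lee"))    # leer ranks above le
print(partial_search("coch"))
print(anagram_search("roca"))
print(content_search("car"))
```

In this sketch a query for cocha also surfaces the orthographically related coche, and a query for the inflected form lee ranks leer first. A production system would more plausibly combine such matching with lemmatization and a proper index, which the sketch does not attempt.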
The Drael poses a lexicographic challenge that I identify as an institutional challenge: the challenge of discarding lexicographic concepts created and used by well-known and respected social institutions. These concepts were created for making printed dictionaries and are not adequate for making online dictionaries. The challenge is institutional because it must confront the social prestige of well-known and traditional organizations such as the Spanish Royal Academy. If met adequately, it might put an end to the compilation of lexicographic imitations that deliver very poor results. Meeting this challenge is especially necessary in Spanish-speaking lexicographic circles, some of which are still compiling printed online dictionaries. These cost the Spanish taxpayer money and result in deficient dictionaries. The current compilation of the Diccionario de aprendizaje del español como lengua extranjera (Daele) illustrates this challenge. This dictionary is presented as an ‘ongoing electronic dictionary’ that ‘aims to develop a prototype for an online
learner’s dictionary for Spanish’ (Mahecha Mahecha and DeCesaris 2011: 180). The Daele (www.iula.upf.edu/rec/daele/) started in 2006 as a project funded by the Spanish R&D funding agency, and it also receives funds from a private foundation and the Catalan R&D funding agency. In March 2012, it offered structured lexicographic data on 307 verbs. A visit to its homepage and a review of some of the papers published so far that explain its lexicographic concepts (e.g. Battaner 2010, Mahecha Mahecha and DeCesaris 2011, Renau and Battaner 2011) reveal that the project is misguided, not only because it has delivered poor results after six years of public and private funding, but also because it follows lexicographic concepts associated with printed dictionaries, which in the Spanish-speaking world means concepts similar to those associated with the Drael. For instance, the Daele has a macrostructure but no hyperlinks, audible pronunciation, maximizing or minimizing searches, or fuzzy spelling systems. The query system does not contemplate searching by partial words, approximate expressions, anagrams, etc., and its homepage is cramped. Renau and Battaner (2011: 224) indicate that the Daele has ‘customising options’, which they identify as a very primitive system of selecting the amount of lexicographic data displayed. This connects my discussion with the second challenge, illustrated by the production of replicated online dictionaries.
3 Replicated Online Dictionaries
Replicated online dictionaries constitute a very heterogeneous category made up of dictionary projects that reproduce (or aim to reproduce) lexicographic practices and methods taken from other dictionaries without questioning their adequacy for the specific project at hand. For instance, Renau and Battaner’s comment on customizing options, cited above, does not consider that there are more types of user than the advanced learners initially described by Hornby 64 years ago (Tarp 2008), that the way of presenting the data on the homepage can be more important than the data themselves (Almind 2005), and that there should be a way of eliminating the kind of Google-suffocation effect that has been described in different quarters and which refers to the presentation of many more data than the user can assimilate. (I will follow up this idea in Section 5.)
The existence of replicated online dictionary projects is an example of a research challenge, which is the type of challenge associated with questioning academic assumptions, even those that are taken for granted. This means that whenever a new online dictionary project is initiated, its compilers must start by analysing whether the methods, practices and concepts used in other online dictionaries will also serve them. For the sake of simplicity and for reasons of space, I will show the working of this challenge in the context of making specialized
online dictionaries, that is, dictionaries covering areas outside general cultural knowledge and general language (Fuertes-Olivera 2010, Tarp 2012a, Nielsen forthcoming) that are currently presented as prototypes, that is, dictionary projects that are still on the drawing board. The number of online specialized dictionary prototypes described in the literature is vast and covers a large number of specialized fields. They range from prototypes for assisting in the production of scientific articles (Alonso et al. 2011) to prototypes for assisting in both communicative and cognitive use situations in a specific domain (Fernández and Faber 2011). The former are usually identified as dictionary prototypes, whereas the latter tend to adopt fancier names such as data banks or terminological databases. As they have not yet resulted in running online dictionaries, we cannot offer a detailed analysis of their merits. However, a critical review of the publications that explain the lexicographic concepts underlying these projects indicates that they assume lexicographic methods and concepts that do not agree with the nature of the data to be included, which leads their compilers to adopt decisions that are unlikely to result in real specialized online dictionaries. These prototypes make three assumptions: that terms can be treated as if they were words, so that specialized online dictionaries can be compiled under the tenets of a particular linguistic theory, for example, Cognitive Linguistics; that the terms needed for making dictionaries can be spotted in specialized corpora, usually by performing keyness analyses; and that their meanings, uses and usages can be identified through a detailed analysis of the concordances retrieved with software such as WordSmith Tools (Scott 2008). Using a particular linguistic theory for making dictionaries is inadequate per se, and will be discussed in the next section.
Furthermore, relying on corpus data, especially on automatic terminological extraction, for selecting terms (at least many of them) as well as their synonyms and antonyms, writing definitions of terms, and preparing usage notes, especially in culture-bound subject fields, is almost impossible at the current state of knowledge and innovation (see Fuertes-Olivera 2012 for a review of the role of corpora in specialized lexicography). For instance, lower-case a is used in Spanish accounting texts to indicate that the accompanying account is booked (i.e. recognized) on the credit side in a system of double-entry bookkeeping. This meaning could not be discovered by performing a concordance analysis of a within a reasonable time span. Similarly, the lemmas included in example (1), which are a selection of Spanish International Accounting Standard (IAS)/International Financial Reporting Standard (IFRS) terms, cannot be extracted from a corpus as this concept is defined in Corpus Linguistics. Nor could a linguistically conceived terminological corpus have been used for selecting the 3,000-odd accounting terms of four or more orthographic words included in Spanish accounting terminology, or for explaining
the meaning differences observed in more than 2,000 accounting terms with contiguous meanings, some of which are very specialized, whereas others are popularized with or without idiosyncratic culture-bound meanings (example 2).
plan de opciones sobre acciones para empleados
transacción con pagos basados en acciones liquidada en efectivo
formato de la cuenta de pérdidas y ganancias
costes de transacción atribuibles a un activo o pasivo financiero
patrimonio neto atribuible a los tenedores de instrumentos de patrimonio neto de la dominante
Example 1. 7-plus orthographic words lemmatized in the Accounting Dictionaries
reembolso (four meanings):
(1) A reembolso is the regular repayment of instalments and interest on a loan (popularized culture-independent meaning; English amortisation).
(2) A reembolso is the redemption of a debt or obligation (specialized meaning; English repayment).
(3) A reembolso is payment received or to be received as compensation for expenditure (popularized culture-bound meaning; English reimbursement).
Example 2. Meanings of reembolso in the Accounting Dictionaries
To sum up, instead of replicating, lexicographers must focus on understanding the true nature of lexicography and prepare their online dictionary projects accordingly, as shown below.
4 Functional Online Dictionaries
Functional online dictionaries are the product of applying a lexicographic theory to dictionary-making. Bergenholtz (2011), Fuertes-Olivera and Nielsen (2012), Gouws (2011), Leroyer (2011) and Tarp (2011, 2012b), among others, claim that a lexicographical theory for making dictionaries is more than an ontological requirement in the era of the internet: it is a necessity for understanding the continuing challenge of applying new technology to this process. This necessity is explained as an intellectual and social challenge, as it is mostly concerned with locating the options the internet offers within a theoretical framework that accepts the existence of an object of study (an information tool), which is rooted in the form of concepts, categories, theories and assumptions, has a (proto-)history, contains independent methodological contributions and offers directions for practical actions (Tarp 2012b). At an abstract level, this theoretical framework defends the position that the very essence of lexicography is its capacity to provide quick and easy access to dictionary data from which the information needed by different types of users in different types of social situations can be retrieved.
During the last decade, the number of publications based on the above-mentioned relationship among data, access routes and users’ needs has grown considerably, as shown in the Introduction. For some of these publications, for example, Fuertes-Olivera and Bergenholtz (2011), this relationship must be understood as a change of paradigm in lexicography (Leroyer 2011), which is based on two related assumptions, that is, on two related intellectual and social challenges in the context of this chapter. The first assumption is that lexicography is an independent science within which one or more theories are possible. For example, the function theory of lexicography advocates that the core of lexicography is the design of utility tools that can be accessed and consulted easily with a view to meeting punctual information needs occurring for specific types of users in specific types of extra-lexicographical situations. The second assumption is that lexicography is not isolated from the rest of the world, that is, lexicography has relationships with many disciplines, these relationships being determined by the degree of knowledge needed for making a specific dictionary. The application of both ideas to dictionary-making in the era of the internet has led proponents of the function theory to integrate lexicography into the realm of Information Science.
This change sees lexicography as a consultation discipline whose object of analysis is a lexicographical tool, for example, a true online information tool, which uses the technical possibilities offered by the internet, draws on concepts taken from a transformative view of lexicography, and aims to offer dynamic articles with dynamic data that correspond to the specific types of information needs which specific types of users performing specific types of lexicographically relevant activities might have in any consultation situation (see Chapter 5.3). From this position, which is already possible in some online dictionaries, the internet might finally allow us to design information tools that allow users to recreate and re-represent their own dictionary data. Bothma (2011: 71), for example, discusses some information technologies that make visible the relationship among the technology already available, data presentation, expected costs and the satisfaction of users’ needs. He approaches this option critically, however, when he argues that the use of existing (and, for the same reason, future) ‘technologies should not be simply because the technologies exist; they should only be adopted if they bring a higher level of efficiency to the dictionary and enhance the user experience with the
e-dictionary, that is, it allows the user to satisfy his/her information needs more effectively and more efficiently’. Bothma’s review of existing information technologies indicates that some of them are already incorporated in online dictionaries, namely those identified as ‘Model T Fords’ (Tarp 2011: 60–1), that is, online dictionaries that have gone beyond traditional boundaries, are making use of existing technologies in order to provide quicker data access, and are adapting the dictionary articles to the various functions displayed by the dictionary. His analysis shows that information technologies allow for the personalization of the information presented to the user by means of filtering and adaptive technologies based on the user’s profile. Among the information technologies he cites as being used or considered for use in online dictionaries, this chapter highlights the following, which are either already in use, as seen in Section 5, or might be used in the near future, depending on how fast the intellectual and social challenges described in this chapter are met:
• Searching: Searching is the exploration of a defined information space with a defined objective and search strategy; for instance, the function-based/situation search functionalities in the Accounting Dictionaries.
• Navigating: Navigating is ‘the exploration of a defined or undefined information space without using a defined strategy’ (Bothma 2011: 81). Navigation is a very common way of moving between discrete online information entities and many dictionaries allow this functionality, which is usually presented as a kind of table of contents at the beginning of a dictionary article, as in Wikipedia, or as source, that is, as a link to an external text, as in the Accounting Dictionaries.
• User Profiling/Modelling: User profiling or modelling is the information technology that retrieves customized data based on a user’s profile, which is constructed from specific data the user supplies to the system, from the system’s tracking of user behaviour, or from a combination of both. Bothma indicates that, to the best of his knowledge, no existing online dictionary makes use of this information technology. As the technology is now commonly used (for example, by Facebook, Twitter and LinkedIn), I envisage its use in specialized online dictionaries in which compilers can easily create, say, three different articles on the same term for experts, semi-experts and interested laypersons, who are the standard users of any specialised information tool (Bergenholtz and Kaufmann 1997). Spohr (2011: 104) shows how this technology can work in a dictionary project he discusses, which is presented as a pluri-monofunctional model ‘in that it not only serves a single lexicographic function, but further allows for the dynamic extraction of different monofunctional dictionaries, that is, dictionaries serving exactly one lexicographic function’.
• Filtering: Filtering is the information technology that uses filters to allow users to select the amount and type of dictionary data retrieved. Filtering can be user-controlled or system-controlled, depending on whether the filter is based on the choices indicated by the user or on his or her previous searches. This information technology is currently being used in several dictionary projects, for example, the function-based/use situation-based filter used in the Accounting Dictionaries. Spohr (2011: 114) also offers an indication of how this technology can work in his dictionary project, which is described as modular, that is, it has a multilayer architecture, ‘with the lexicographic data model at the top, followed by the lexicographic data and the access and presentation model at the bottom’. The lexicographic data model contains classes and descriptive instances as well as the properties and relations used to connect them. The lexicographic data layer contains the actual lexical items, which are described by means of properties and relations. Finally, the ‘access and presentation layer defines which of the entities (both classes, properties and instances) are relevant to which users in which situations’.
• Adaptive hypermedia: Adaptive hypermedia, which is subdivided into adaptive presentation and adaptive navigation support (Bothma 2011: 88), is the information technology that tailors what the user sees to his or her interests, goals, abilities, etc. It is therefore concerned with the system’s ability to guide users towards their specific search objectives and allows the presentation of data tailored to a user’s needs. Adaptive hypermedia is implemented ‘by means of marking up data in the document’ (Bothma 2011: 91), a possibility that seems to be in its infancy in some specialized dictionaries.
For instance, in the New Palgrave Dictionary of Economics Online (www.dictionaryofeconomics.com/dictionary), users have an advanced search system that allows them to search in the full text of the dictionary article, the bibliography attached to the different dictionary articles, the article titles, the names of contributors, the abstract of the article and the list of keywords. These options are possible because the system has been marked up in advance for this purpose. Bothma (2011: 91) envisages the complex markup of data on the Web as the ideal of the Semantic Web, one of whose objectives may be extremely useful for future online dictionaries: it should ‘allow a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing’ (www.w3.org/2001/sw). Kwary (2012: 35–9) illustrates the use of adaptive hypermedia in a project for an English dictionary of finance for Indonesian students. Kwary’s proposal contemplates an online dictionary that has eight features divided into three lines. The first line has three features: a search text-box, an Insert System button
and a Search button. The second line contains four features in the form of tick-boxes, ‘English definition, Definisi B. Indonesia (‘Indonesian definition’), Terjemahan (‘equivalent’ in Indonesian), and CFA Sample Question’. Finally, the last line is the search result box. Kwary claims that the adaptive search system directs the search to the preferred result without users having to click a particular tick-box when searching for the meaning of a term. He adds that the adaptive system embedded in the dictionary suggests entries and works as an incremental search integrated into the search text-box.
• Linked open knowledge: Linked open knowledge is an information technology and technique that implies that the knowledge is in the public domain, as defended by the Open Data Movement. It is currently being used in several dictionary projects, typically for cross-referencing users to Wikipedia pages, selected texts and corpus concordances, as shown in the Interactive Language Toolbox (Verlinde 2011), the Accounting Dictionaries (Fuertes-Olivera et al. 2012, Nielsen et al. 2012) and the Musikordbogen (Bergenholtz 2010).
• Recommender Systems: A recommender system is an information technology and technique that seeks to predict the user’s preference when he or she is searching. Bothma sees this technology as useful for selecting synonyms and antonyms and claims that a kind of recommender system is being used in the Ordbogen over faste vendinger (Bergenholtz 2010), and that such a system is well suited to a production dictionary, for instance for indicating that the specific word the user is looking at is not adequate in a given context of use and for suggesting another one to use instead.
• Annotation systems: An annotation system is an information technology and technique for creating information, such as the Wiki environment in which Wikipedia and Wiktionary are created.
In addition, some online dictionaries make use of several systems for sharing/creating information, typically by making queries and posting answers, giving feedback through emails, etc. Bothma (2011) claims that the currency and completeness of an e-dictionary could be enhanced by means of public annotations.
To sum up, functional online dictionaries aim to create customizable environments in which users will be able to set up their profiles (and adapt them accordingly). The system will adapt the user’s profile to the changing environment and the user’s specific needs, the data in the database will need to be marked up and linked to external sources, and the system will be able to make recommendations and allow the user to make annotations (Bothma 2011). Beyond these, for the proponents of the function theory of lexicography it is also an intellectual and social challenge to add reflections on
(possible) customizable environments, as illustrated below in the context of the Accounting Dictionaries.
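Two of the technologies reviewed in this section, user profiling and filtering, can be made concrete with a minimal sketch. It assumes, purely for illustration, that the compilers have stored three variants of a definition (for experts, semi-experts and interested laypersons), that the user chooses which data types to see (user-controlled filtering), and that the system adds a data type to the default view once the user has opened it manually a couple of times (system-controlled filtering). The class, field names, threshold and placeholder texts are hypothetical and are not taken from any of the dictionary projects cited.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    level: str = "layperson"  # "expert", "semi-expert" or "layperson"
    wanted: set = field(default_factory=lambda: {"definition"})
    opened: dict = field(default_factory=dict)  # data type -> times opened manually

    def observe(self, data_type, threshold=2):
        """System-controlled filter: show a data type by default once it has
        been opened manually `threshold` times."""
        self.opened[data_type] = self.opened.get(data_type, 0) + 1
        if self.opened[data_type] >= threshold:
            self.wanted.add(data_type)


# Placeholder record: the angle-bracket strings stand in for real lexicographic data.
DATABASE = {
    "reembolso": {
        "definition": {
            "expert": "<definition written for experts>",
            "semi-expert": "<definition written for semi-experts>",
            "layperson": "<definition written for interested laypersons>",
        },
        "equivalent": "<English equivalent>",
        "collocations": ["<collocation 1>", "<collocation 2>"],
    }
}


def build_article(term, profile, database=DATABASE):
    """Assemble the article shown to this user: only the wanted data types,
    and the variant that matches the user's level where variants exist."""
    record = database[term]
    article = {}
    for data_type in sorted(profile.wanted):
        if data_type not in record:
            continue
        value = record[data_type]
        if isinstance(value, dict):  # one variant per user level
            value = value[profile.level]
        article[data_type] = value
    return article


layperson = UserProfile()
expert = UserProfile(level="expert", wanted={"definition", "equivalent"})
print(build_article("reembolso", layperson))
print(build_article("reembolso", expert))
```

The same stored record yields different articles for different users, which is the sense in which the article, rather than the data, becomes dynamic.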
5 The Accounting Dictionaries
Since the project’s inception in the mid-2000s, the construction of the Accounting Dictionaries (Fuertes-Olivera et al. 2012, Nielsen et al. 2012) has been explained in around 25 papers, which have focused on three main issues: (1) anchoring the dictionary project under the tenets of the function theory of lexicography; (2) explaining some lexicographic decisions, for example, the process of lemma and/or equivalent selection; and (3) illustrating the transition from lexicography per se to lexicography integrated within the tenets of Information Science (see Fuertes-Olivera 2011, Fuertes-Olivera and Tarp 2011, Nielsen and Almind 2011, Bergenholtz 2012, Fuertes-Olivera and Nielsen 2012, Tarp 2012a, for a review of the project). The integration of lexicography into Information Science is an example of an intellectual and social challenge, especially the challenge of creating information e-tools for targeted information in order to avoid the risk of information overload, sometimes called the Google-suffocation effect. This is one of the main intellectual and social challenges of e-lexicography, as it has to face different approaches and offer arguments in support of its working. As indicated in the Introduction, our vision of e-lexicography is based on a broader view in which humans play a key role as both actors and consumers of online dictionaries. This means that the Accounting Dictionaries are the product of adopting three main assumptions.
The first assumption is that a dictionary makes use of a database but is not itself a database, and therefore the same database can generate several dictionaries. Nielsen and Almind (2011: 142) and Nielsen (this volume) explain that databases and dictionaries are two different things. They add that ‘databases are vessels that contain data and nothing else. They have no functionality per se.
Data are stored in discrete fields defined to contain specific types of data, for instance, numbers, dates, alphanumerical strings and so on.’ For lexicographical purposes, the most important things in a database are to avoid redundancy, that is, to reduce the number of duplicates to a minimum, and to create interdependency of data. For instance, grammar cannot exist independently of a term. The Accounting Dictionaries adhere to this principle and are exponents of a triadic construction:
First, there is a database containing specially selected data that have been structured in a way that facilitates search and retrieval. Secondly, users will see one or more dictionaries, for example, the English Accounting Dictionary and
the English-Danish Accounting Dictionary, which are websites that, strictly speaking, do not contain the lexicographic data, as these are contained in the database and not in the user interface. Thirdly, in order to provide access to the lexicographical data, a search engine is introduced as a mediator between the dictionary (user interface) and the database. The search engine allows users to search for data in the database and presents the results of searches to users according to their request. In this case, there are several dictionaries and one database, and the relationship between database and dictionary is, thus, a one-to-many relationship. (Nielsen and Almind 2011: 147)
The second assumption is related to the above quotation. It can be formulated as a less-is-more assumption, which means that the search must retrieve only the data needed in the use situation which prompted the consultation process (see Chapter 5.3). The less-is-more concept aims to eliminate the suffocation effect that Google and similar search engines provoke. It argues that in the era of the internet we must find a way of directing users to what they really need, thus discarding the retrieval of a vast amount of data that will consume most of the user’s time and energy and increase the possibility of a failed search. Advocating a less-is-more assumption in lexicographical work not only connects lexicography with other disciplines, for example, computer science, in which scholars are trying to implement the Web 3.0 or semantic web, but also accords with the true nature of lexicography, that is, the satisfaction of users’ needs in specific use situations in the minimum possible amount of time. The practical consequence of such an assumption is that it gives a convincing answer to one of the recurrent lexicographic discussions about the possibilities opened up by the new technologies.
This assumption accepts that databases are for storing as much data as possible, whereas a dictionary conceived in this way allows the presentation on the screen of as little data as possible (Bergenholtz 2011, Fuertes-Olivera and Nielsen 2012, Tarp 2012a). In the Accounting Dictionaries, this is formulated as one database, 27 different dictionaries. For example, the English-Spanish Accounting Dictionary for reception has a search functionality that retrieves only definitions and equivalents of the search term, whereas the search functionality in the English-Spanish Accounting Dictionary for translation also retrieves grammar data, and translated collocations and examples (see Chapter 5.3 for an illustration of this process). The third assumption is that an online dictionary is always in progress, not only because it can be corrected and updated on a regular basis but also because it allows the inclusion of new dictionary data that are based on new lexicographic thinking, that is, the application of a transformative approach to lexicography. For instance, the Spanish series of the Accounting Dictionaries (Fuertes-Olivera et al. 2012) is an L1-L2-L1-L2 dictionary, that is, English lemma, Spanish equivalent, English definition of the lemma, Spanish definition of the
English lemma. To the best of our knowledge, this is the first dictionary in the world with this pedagogically oriented lexicographic structure, which is the result of finding that many Spanish users did not understand the English definitions very well. The working of the three assumptions leads me to claim that the future of online dictionaries will be connected with the possibilities users may have of reusing the hits they retrieve. In Figure 5.1.1, this possibility is represented by dotted lines with the aim of indicating that its practical application will basically be decided by the particular user, provided he or she has the technical options for doing so. For instance, in the Accounting Dictionaries users have the functionality ‘source’, which links them to basic accounting texts that the Accounting Dictionaries team selected due to their relevance, that is, the condition of being directly connected with the subject field (accounting), the dictionary function(s) (primarily translation), the use situation in which the dictionaries are intended to be used (also for gaining some accounting knowledge), and the levels of competence of the intended users (translators, students and interested laypeople) (Fuertes-Olivera and Nielsen 2011). The ‘source’ frequently links users to the EU homepage where the IASs and the IFRSs are posted, as well as modifications and discussions of them. Supposing a user is translating between EU languages, he or she can use this link to access key data that he or she can reuse, for example, by following the inspiration he or she might find when consulting these official texts during the translation process, and possibly, either by modifying the data types selected by the dictionary team or by storing his or her translation(s) in an added storage system, for example, a translation memory, that the user has at his or her own disposal.
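The ‘one database, several dictionaries’ arrangement described in this section can be illustrated with a minimal sketch. It is not the actual implementation of the Accounting Dictionaries: the record layout, the field names, the DICTIONARIES table and the search function are all hypothetical, and the sample entry holds placeholder text, not real dictionary data. The sketch only shows the principle that the database stores every data type, while each dictionary is a function-specific view that retrieves as little as the use situation requires.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One record in the shared database (field names are assumptions)."""
    lemma: str
    definition: str
    equivalents: list
    grammar: str = ""
    collocations: list = field(default_factory=list)
    examples: list = field(default_factory=list)


# A single database holds every data type for each term.
# The content below is placeholder text, not real dictionary data.
DATABASE = {
    "asset": Entry(
        lemma="asset",
        definition="<definition of the lemma>",
        equivalents=["<equivalent>"],
        grammar="<grammar data>",
        collocations=["<translated collocation>"],
        examples=["<translated example>"],
    ),
}

# Each dictionary is only a view: the fields it shows for its function.
DICTIONARIES = {
    "reception": ("definition", "equivalents"),
    "translation": ("definition", "equivalents", "grammar",
                    "collocations", "examples"),
}


def search(term, dictionary):
    """Mediate between a dictionary (view) and the database.

    Returns only the fields that the chosen dictionary displays,
    or None when the term is not in the database.
    """
    entry = DATABASE.get(term.strip().lower())
    if entry is None:
        return None
    return {name: getattr(entry, name) for name in DICTIONARIES[dictionary]}
```

Under these assumptions, a lookup in the reception view returns only the definition and the equivalents, whereas the same term in the translation view also brings back grammar, collocations and examples. Adding a further dictionary means adding one line to the table of views, not duplicating data.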
Figure 5.1.1 Future online dictionaries
6 Conclusion In conclusion, this chapter has dealt with the future of online dictionaries. It has defended the position that this future is broader than the simple analysis of the tools and devices with which we can equip our information tools. In particular, it has focused on three challenges, which are presented as institutional, research oriented, and intellectual and social challenges. These challenges must be highlighted in a critical context. The internet has allowed the compilation of new types of information tools, for example, the so-called terminological knowledge bases. These are proliferating around the world, especially because they obtain public money easily, although most of them do not deliver much. For instance, around 90 per cent of the terminological dictionary projects funded by the Spanish R+D funding agency are still prototypes after several years of continuous and generous funding. It must be stressed that the time is ripe for incorporating costs and benefits into the process of dictionary-making, that is, for evaluating the challenges of applying new technology to dictionary-making in a human and sound way.
7 Acknowledgements Thanks are due to Ministerio de Economía y Competitividad for financial support (Grant FFI2011–22885), also to the University of Aarhus for financing my stay at Aarhus University as Velux Visiting Professor 2011–12. Thanks are especially due to my colleagues at the Centre for Lexicography (Henning Bergenholtz, Sven Tarp, Sandro Nielsen, Patrick Leroyer and Birger Andersen) for their comments on a previous draft. And finally my special thanks to Professor Howard Jackson for inviting me to take part in this project. I myself am responsible for any remaining mistakes.
References Almind, R. (2005) Designing internet dictionaries. In: I. Barz, H. Bergenholtz and J. Korhonen (eds) Schreiben, Verstehen, Übersetzen, Lernen. Zu ein-und zweisprachigen Wörterbüchern mit Deutsch. Frankfurt am Main: Peter Lang, 103–19. Alonso, A., Millon, C. and Williams, G. (2011) Collocational networks and their application to an e-advanced learner’s dictionary of verbs in science (DicSci). In: Iztok Kosem and Karmen Kosem (eds), 12–22. Available at: www.trojina.si/elex2011. (Accessed 10 March 2012). Battaner, P. (2010) El uso de etiquetas semánticas en los artículos lexicográficos de verbos en el DAELE. Quaderns de filologia. Estudis linguistics 15, 139–58. Bergenholtz, H. (2010) Ordbogen over faste vendinger (Database and design: Richard Almind). Odense: Ordbogen.com (www.ordbogen.com) (Accessed 10 March 2012).
—(2012) Concepts for monofunctional accounting dictionaries. Terminology 18, 243–63. Bergenholtz, H., Bothma, T. and Gouws, R. (2011) A model for integrated dictionaries of fixed expressions. In: Iztok Kosem and Karmen Kosem (eds), 34–42. Available at: www.trojina.si/elex2011. (Accessed 10 March 2012). Bergenholtz, H. and Kaufmann, U. (1997) Terminography and lexicography. A critical survey of dictionaries from a single specialized field. Hermes 18, 91–125. Bergenholtz, H., Nielsen, S. and Tarp, S. (eds) (2009) Lexicography at a Crossroads: Dictionaries and Encyclopedias Today, Lexicographical Tools Tomorrow. Bern: Peter Lang. Bergenholtz, H. and Tarp, S. (2002) Die moderne lexikographische Funktionslehre. Diskussionsbeitrag zu neuen und alten Paradigmen, die Wörterbücher als Gebrauchsgegenstände verstehen. Lexicographica 18, 253–63. —(2003) Two opposing theories: on H. E. Wiegand’s recent discovery of lexicographic functions. Hermes 31, 171–96. —(2004) The concept of dictionary usage. Nordic Journal of English Studies 3, 23–36. Bergenholtz, I., in cooperation with Richard Almind and Henning Bergenholtz (2010) Musikordbogen. Odense: Ordbogen (www.ordbogen.com). (Accessed 10 March 2012). Bothma, Theo J. (2011) Filtering and adapting data and information in an online environment in response to user needs. In: Pedro A. Fuertes-Olivera and Henning Bergenholtz (eds), 71–102. De Schryver, G.-M. (2003) Lexicographers’ dreams in the electronic-dictionary age. International Journal of Lexicography 16/2, 143–99. Diccionario de aprendizaje del español como lengua extranjera (Daele) (2012). Available at: www.iula.upf.edu/rec/daele/. (Accessed 15 March 2012). Diccionario de la Lengua Española. Vigésima segunda edición (Drael). Real Academia Española. Available at: http://buscon.rae.es/draeI/. (Accessed 15 March 2012). Fernández, T. and Faber, P.
(2011) The representation of multidimensionality in a bilingualized English-Spanish thesaurus for learners in Architecture and Building Construction. International Journal of Lexicography 24/2, 198–225. Fuertes-Olivera, P. A. (2009) The function theory of lexicography and electronic dictionaries: Wiktionary as a prototype of a collective multiple-language Internet dictionary. In: Henning Bergenholtz, Sandro Nielsen and Sven Tarp (eds), 99–134. —(2012) Lexicography and the internet as a (re-)source. Lexicographica 28/1, 49–70. Fuertes-Olivera, P. A. (ed.) (2010) Specialised Dictionaries for Learners. Berlin and New York: De Gruyter. Fuertes-Olivera, P. A. and Bergenholtz, H. (eds) (2011) e-Lexicography: The Internet, Digital Initiatives and Lexicography. London and New York: Continuum. Fuertes-Olivera, P. A., Bergenholtz, H., Nielsen, S., Gordo Gómez, P., Mourier, L., Niño Amo, M., Ríos Rodicio, Ángel de los, Sastre Ruano, Á., Tarp, S. and Velasco Sacristán, M. (2012) Accounting Dictionaries (A series of 10 interconnected Spanish, Spanish-English, and English-Spanish Dictionaries). Database and Design: Richard Almind and Jesper Skovgård Nielsen. Odense: Lemma.com. Fuertes-Olivera, P. A. and Nielsen, S. (2011) The dynamics of terms in accounting: what the construction of the Accounting Dictionaries reveals about metaphorical terms in culture-bound subject fields. Terminology 17/1, 157–80. —(2012) Online dictionaries for assisting translators of LSP texts: the Accounting Dictionaries. International Journal of Lexicography 25/2, 191–215. Fuertes-Olivera, P. A. and Tarp, S. (2011) Lexicography for the third millennium: cognitive-oriented specialised dictionaries for learners. Ibérica 21/1, 141–62. Gouws, R. H. (2011) Learning, unlearning and innovation in the planning of electronic dictionaries. In: Pedro A. Fuertes-Olivera and Henning Bergenholtz (eds), 17–29. Gouws, R. H., Heid, U., Schweickard, W. and Wiegand, H. E. (eds) (2013) Dictionaries. 
An International Encyclopedia of Lexicography. Supplementary Volume: Recent Developments
with Special Focus on Computational Lexicography. Berlin and New York: Mouton de Gruyter. Granger, S. and Paquot, M. (eds) (2010) eLexicography in the 21st Century: New Challenges, New Applications. Proceedings of ELEX 2009. Cahiers du Cental 7. Louvain-la-Neuve: Presses universitaires de Louvain. —(2012) Electronic Lexicography. Oxford: Oxford University Press. Haß, U. and Schmitz, U. (2010) Lexikographie im Internet 2010 – Einleitung. Lexicographica 26, 1–18. Interactive Language Toolbox. Available at: http://ilt.kuleuven.be/acnederlands/leeshulp.php (Accessed 15 March 2012). Kosem, I. and Kosem, K. (eds) (2011) Electronic Lexicography in the 21st Century: New Applications for New Users. Proceedings of eLex 2011, Bled, 10–12 November 2011. Available at: www.trojina.si/elex2011. (Accessed 10 March 2012). Kwary, D. A. (2012) Adaptive hypermedia and user-oriented data for online dictionaries: a case study on an English Dictionary of Finance for Indonesian students. International Journal of Lexicography 25/1, 30–49. Leroyer, P. (2011) Change of paradigm: from linguistics to information science and from dictionaries to lexicographic information tools. In: Pedro A. Fuertes-Olivera and Henning Bergenholtz (eds), 121–40. Lew, R. (2011) Online dictionaries of English. In: Pedro A. Fuertes-Olivera and Henning Bergenholtz (eds), 230–50. Mahecha Mahecha, V. and DeCesaris, J. (2011) Representing nouns in the Diccionario de aprendizaje del español como lengua extranjera (DAELE). In: Iztok Kosem and Karmen Kosem (eds), 180–6. Available at: www.trojina.si/elex2011. (Accessed 10 March 2012). New Palgrave Dictionary of Economics Online. Available at: www.dictionaryofeconomics.com/dictionary. (Accessed 15 March 2012). Nielsen, S. (forthcoming) LSP lexicography and typology of specialized dictionaries. In: G. Budin, C. Laurén and J. Humbley (eds) Language for Special Purposes. An International Handbook. Berlin and New York: De Gruyter. Nielsen, S. and Almind, R. 
(2011) From data to dictionary. In: Pedro A. Fuertes-Olivera and Henning Bergenholtz (eds), 141–67. Nielsen, S., Mourier, L. and Bergenholtz, H. (2012) Accounting Dictionaries (A series of 13 interconnected Danish, Danish-English, English-Danish and Danish Dictionaries). Database and Design: Richard Almind and Jesper Skovgård Nielsen. Odense: Lemma.com. Oxford English Dictionary. Available at: www.oed.com/. (Accessed 15 March 2012). Pastor, V. and Alcina, A. (2010) Search techniques in electronic dictionaries: a classification for translators. International Journal of Lexicography 23/3, 307–54. Pruvost, J. (2000) Colloquium report: Des dictionnaires papier aux dictionnaires électroniques. VIIe Journée des Dictionnaires (22 mars 2000). International Journal of Lexicography 13/3, 187–93. Renau, I. and Battaner, P. (2011) The Spanish Learner’s Dictionary DAELE on the panorama of the Spanish E-lexicography. In: Iztok Kosem and Karmen Kosem (eds), 221–6. Available at: www.trojina.si/elex2011. (Accessed 10 March 2012). Scott, M. (2008) WordSmith Tools, Version 5. Liverpool: Lexical Analysis Software. Spohr, D. (2011) A multi-layer architecture for ‘pluri-monofunctional’ dictionaries. In: Pedro A. Fuertes-Olivera and Henning Bergenholtz (eds), 103–20. Tarp, S. (2002) Translation dictionaries and bilingual dictionaries. Two different concepts. Journal of Translation Studies 7, 59–84. —(2007) Lexicography in the Information Age. Lexikos 17, 170–9.
—(2008) Lexicography in the Borderland between Knowledge and Non-Knowledge. General Lexicographical Theory with a particular focus on Learner’s Lexicography. Tübingen: Max Niemeyer. —(2011) Lexicographical and other e-tools for consultation purposes: towards the individualization of needs satisfaction. In: Pedro A. Fuertes-Olivera and Henning Bergenholtz (eds), 55–70. —(2012a) Specialized lexicography: 20 years in slow motion. Ibérica 24, 117–28. —(2012b) Do we need a new theory of lexicography? Lexikos 22, 321–32. Tono, Y. (2010) A critical review of the theory of lexicographical functions. Lexicon 40, 1–26. Ulsamer, S. (2011) Automatically extracted word formation products in an online dictionary. In: Iztok Kosem and Karmen Kosem (eds), 302–10. Available at: www.trojina.si/elex2011. (Accessed 10 March 2012). Verlinde, S. (2011) Modelling interactive reading, translation and writing assistants. In: Pedro A. Fuertes-Olivera and Henning Bergenholtz (eds), 275–86. Wikipedia. Available at: http://en.wikipedia.org/wiki/Main_Page. (Accessed 15 March 2012).
5.2
The Future of Historical Dictionaries, with Special Reference to the Online OED and Thesaurus Charlotte Brewer
Chapter Overview
Introduction
The OED Up to 2012
The OED: Present and Future Characteristics
Links to Other Dictionaries
1 Introduction Historical dictionaries are a special case in lexicography and dictionary making, requiring different areas of research and raising different types of questions about the art and craft of lexicography. A historical dictionary is usually historical in two distinct ways: first, since it records and comments on how words have been used during the course of a language’s past, it tells a story – perhaps many stories – about the development not only of language, but also of society and culture over the period in which that language has been used. Secondly, such dictionaries are themselves historical documents, especially since (as is always the case) they take decades to research and write. Any historical dictionary will reflect changes in scholarly methods over the period in which it was assembled, and
inevitably, as it falls out of date – for language, and language scholarship, will continue to move forward as lexicographers run behind trying to catch up – it will need supplementing or revising. These further accretions or transformations will again mirror changing values in scholarship and culture. Perhaps, paradoxically, it is digitization – that is, one of the most distinctively modern developments in technological and intellectual culture in recent times – which has opened up both types of history to us more now than at any stage in the past. The digitization of historical dictionaries has enabled ordinary users as well as language scholars and academics (for whom, in general, these dictionaries were originally written) to scrutinize the evidence accumulated by these dictionaries over periods of the past, and to subject this to systematic analysis, thus revealing how the dictionary was put together and what values are encoded in its scholarly and methodological assumptions. Many nations have produced historical dictionaries, and dictionaries are often thought of as expressing, or indeed embodying, patriotic values. The first major European dictionaries were unavoidably imbued with such significance, for example Italy’s Vocabolario degli Accademici della Crusca (1611) and France’s Le Dictionnaire de l’Académie françoise dedié au Roy (1694), both of which were the product of academies determined to regulate and purify the language of the nation. Likewise, today’s historical dictionaries have often been funded by institutions intrinsic to the country whose culture they commit to paper: the Deutsches Wörterbuch (1852–1960), edited by the brothers Jacob and Wilhelm Grimm began as a private enterprise but had its funding taken over by various state organizations from around 1868 onwards, while the Trésor de la Langue Française (1971–94) and the Woordenboek der Nederlandsche Taal (1863–1998) were both state-funded from the outset. 
The Oxford English Dictionary (OED), on which this chapter concentrates, is an exception to that rule in that it has always been published by an independent publishing house, Oxford University Press (OUP). It is in many other ways a striking example of the characteristics and features of historical dictionaries. Originally published 1884–1928, but compiled over a much longer period (from around 1860 onwards), the OED was subject to all sorts of changes in editorial and lexicographical policy, and in its definitions of words relating to every sort of topic – science, art, sexuality, gender, race, politics and so on – it inevitably reflected the temper of its time. Its earliest published sections were out of date before the final ones were published, and its identity, function and purpose have been significantly complicated by a series of supplements and additions, and most recently by the monumental project of revision that has been published online (and only online) since 2000 (www.oed.com). It is uncontroversially recognized as the greatest dictionary of English ever published, or (to quote its website’s strapline) ‘The definitive record of the English language’, but it can certainly be criticized for imperfections and inconsistencies. And from early
on the OED was seen as occupying a unique role in the cultural as well as the intellectual or scholarly life of the nation, so that during the First World War its publishers claimed it as ‘An Imperial Asset’.1 Its readers thought the same, one reviewer suggesting that it ‘should be the most coveted possession of all public libraries in the United Kingdom, in the Colonies, and at least the headquarters of every district in India … It is not so much a Dictionary as a History of English speech and thought from its infancy to the present day’.2 Describing the English language as an instrument of empire looks questionable to us today, as do those of the original definitions that we are now able to acknowledge as politically, socially, racially or ethically biased (but which were not so recognized at the time of publication): for example, the explanation published in 1924 that the term ‘white man’ meant ‘a man of honourable character such as one associates with a European (as distinguished from a negro)’, or the 1909 definition of Sapphism, referring to female homosexuality, as ‘unnatural sexual relations between women’. One of the significant challenges that has faced the publishers and lexicographers working on editions and revisions to the OED since the first edition was published has been to identify such now-controversial and/or objectionable features in the original dictionary and to update them, while at the same time faithfully preserving, in some way or other, the historical evidence about attitudes and beliefs of the past which the original wording bore testimony to. This chapter’s title specifies the ‘future’ of historical dictionaries. OED’s future has recently taken an interesting turn, since the publishers have wholeheartedly embraced internet publishing, an innovative step in 2000, when its website was first launched, and still innovative today, as the dictionary develops increasingly sophisticated electronic search and display tools. 
Now that it is online, available free to UK public library users and by subscription to institutions and private households worldwide, the OED is reaching a far wider and less scholarly audience than ever before. This presents its publishers, and indeed the lexicographers themselves, with another set of challenges, some of which, as we shall see, are in conflict (or potentially in conflict) with the academic values according to which the dictionary was first conceived and compiled. One of OED’s unique features is that it traces a word’s history from its first use through to its last, and finds its evidence in texts written at every stage in the language’s history from 1150 onwards. This offers a further challenge still, since the technological revolution over the last couple of decades has enabled digitization not just of the OED itself but of many millions of words written and printed both past and present, expanding the dictionary’s potential sources in ways which must sometimes seem unmanageable to the lexicographers. This chapter explains the consequences of OED’s history for its future (Section 2), examines how its current revision and its innovative website are revealing the dictionary’s past and present stock of information to its
expanding public (Section 3), and briefly reviews the Historical Thesaurus of the OED (Section 4), a work that began life in Glasgow University in the 1960s but is intimately connected with the OED itself. Along the way it discusses the importance for historical lexicography (both present and future) of digitization, of interlinking dictionaries, of finance and of preserving the values of scholarship.
2 The OED Up to 2012 To understand the OED today, and think coherently about its likely form and function in the future, we need to know quite a bit about its past. The narrative of its origin and compilation has been engagingly told in a couple of bestsellers as well as more academic books.3 Briefly, the dictionary was embarked on in the early 1860s by the London Philological Society, a group of scholars and gentlemen who were knowledgeable about language and about lexicographical history and were keen to emulate the great European dictionaries of the day. Enlisting the help of hundreds of volunteers (an early example of crowd sourcing), the lexicographers set out to record every word in the language, reading through thousands of printed sources of every description, in every period in the language, in order to discover how words were used in real context. Poring over these quotations, around five million altogether, enabled the editors of the dictionary (which in 1879 was taken over from the Philological Society by OUP) to determine definitions for words and to trace how meanings and senses had developed over time. As they themselves rightly claimed, the OED’s grounding in this mass of historical evidence created a revolution in the art of lexicography (Murray 1933, preface). As we have already seen, a historical dictionary falls swiftly out of date. On the one hand, contemporary language moves on and new words and usages come into existence, while on the other, historical scholarship uncovers new texts from the past and new information about the history and development of the language. Based though it was on the best Victorian and Edwardian scholarship of the time, the OED could not avoid obsolescence as the years passed following its completion. By the 1950s its publishers were faced with a dilemma. Should they keep on printing something that was so out of date? 
Or should they revise the whole thing again from scratch – which would be extraordinarily expensive, while at the same time consuming another few decades? Not unnaturally, they compromised. They reprinted the original dictionary on the one hand, but commissioned a supplement on the other. This supplement only recorded new words entering the language since the first edition, or new senses of existing words – like the use of Trident to refer to the nuclear weapon system, aerobics, Angle-poise, antihistamine and so on – and was eventually published in four volumes between 1972 and 1986 (Burchfield 1972–86).4
In this way, OUP made up lost ground with contemporary, twentieth-century vocabulary, but kept the original Victorian and Edwardian dictionary as it was. Then in 1989, in a remarkably bold and prescient move, the Press digitized the whole work, merging the original first edition – still unchanged – with the four-volume supplement, and reprinting and rereleasing the resulting combination as a second edition. It was this development that introduced the first seriously complicating element into the OED, adumbrating problems to come in the immediate and more distant future. The second edition was received with enormous praise and appreciation by the general public, but severely criticized by lexicographical specialists for its confusing mixture of (almost entirely) old wine in new bottles. While everything about its appearance – a brand new work, apparently, in 20 handsome volumes (soon to be available in the then-novel format of CD-ROM) – suggested that it was at the cutting edge of scholarship as well as technology, its content was very largely the same as in the original Victorian and Edwardian version. The editors had identified and corrected a handful of the problematic definitions mentioned above, relating to race or sexuality, but had left many more untouched (the entry for Sapphism, e.g. dropped the word ‘unnatural’ in the original definition, but continued to refer to female homosexuality as a ‘vice’ in the etymology, while ‘unnatural’ remained a descriptive term in a number of words referring to male homosexuality, e.g. sodomy, buggery and the like). And the second edition retained hundreds of definitions that were outdated in other ways too; for example dialect was explained as a ‘subordinate’ form of a language ‘arising from local peculiarities of vocabulary’, that is, ‘a provincial method of speech’, while slang was said to be ‘the special vocabulary used by any set of persons of a low or disreputable character; language of a low and vulgar type’. 
These definitions both appeared to belong to the same date as the second edition, namely 1989, but could never have been written at that time: they express value judgements on language which no respectable lexicographer would have put his or her name to since the 1950s or earlier. In fact, of course, they had both been taken over unchanged from the first edition. The same applied to many of the more recondite components of the unrevised entries, for example pronunciation and etymology: the pronunciation recorded that of late Victorian England, while many etymologies had long been superseded by more recent scholarship. In more subtle and pervasive ways, too, the second edition belied its late twentieth-century date. As revealed by electronic searches of the dictionary (which OUP itself, in digitizing the dictionary for the second edition, had made possible), the OED had throughout its compilation been subject to more than one type of cultural bias. The individual writers most favoured for quotation purposes, for example – and here we should bear in mind that quotations were the principal evidence on which this dictionary was based, and from which
it derived its unique authority – were those of the Victorian literary canon: Shakespeare, Walter Scott, Chaucer, Milton, Pope, Tennyson, Dickens and the Bible. Female authors received short shrift, female poets even shorter shrift, and some periods (for example the late sixteenth century) were much more intensely represented in the dictionary than others (the eighteenth century). This is quite unsurprising, of course: these were the sources that both the lexicographers and their volunteers believed were the most important (or in the case of female writers and the eighteenth century, the least important) for literary and therefore linguistic evidence on the use of language. However, from the 1930s or so onwards, linguists had begun to question the assumption that it is great literature we should turn to in seeking to record (as the OED sets out to do) the history and development of the language, while literary critics had begun to question the assumptions implicit in the formation and promulgation of a literary canon in the first place. In much of the quotation evidence, therefore, as well as the definitions it contained, the 1989 second edition bore the stamp of an era long disappeared.5 At the same time, nevertheless, the ‘new’ dictionary looked bang up to date – not just because of the smart new printing, but also because of the patently recent vocabulary leaping off the page to greet the eye of anyone leafing through it. Thousands of new words, only just recorded in the supplement, were clearly of the present, whether relating to recent cultural phenomena (the twist, flower power, yuppie), to science, technology and history (moon landing, wysiwyg, cultural revolution) or to transformed sexual mores, which meant that vocabulary previously considered unprintable could be recorded for the first time (cunnilingus, fellatio). 
The result was a sort of mongrel edition, happily consulted by many as the last word in lexicography – and indeed, enormously useful and erudite in many respects – but treated by scholars, if they were aware of the problems, with considerable caution.6 The OED lexicographers and publishers themselves knew better than anyone that the second edition was unsatisfactory, however warmly received by non-specialists. They swiftly came up with a solution, although it has been an expensive one so far and commits them to major expense in the future. In the 1990s the Press began the long overdue task of revising the OED in its entirety, thus pulling the still largely Victorian and Edwardian second edition into the twenty-first century. A team of around 60 lexicographers began the process of overhauling, re-researching and rewriting each and every component of each and every entry: spelling, pronunciation, etymology and semantic analysis. At the same time, they embarked on a major new reading programme to provide the quotation evidence which would drive this reconfiguration forward. The revision is ongoing: by December 2012, the lexicographers had revised around 37 per cent of the original dictionary, and completion is still some decades off. And while they are engaged on revising the historical portion of the dictionary – all
the entries up to and including those of Burchfield’s supplement – they are also recording the vocabulary of today, both new senses of existing words and new words altogether; recent additions to the OED lexicon include the verbs carpetbomb and dogsbody, cybercast as both noun and verb, earthscape, subcommunity, super-injunction, and many more; see http://oed.com/public/wordslist0612.7 So Oxford’s lexicographers, engaged on what is indisputably a project of innovation – its establishment of new types of search tools and hyperlinks (as the next two sections explain) looking some way to the future – are in many respects retracing the steps taken by their Victorian predecessors. The major difference is that all the scholarship and research methods being applied to the Victorian original are twenty-first-century ones – electronic searching, electronic storage, electronic editing and electronic production.
3 The OED: Present and Future Characteristics
The current and future job of the lexicographers is thus twofold: revision and (where necessary, if a word is still in use) updating of the old entries, and creation of new ones. But the past is constantly present in OED lexicography, in more ways than one. Most visibly – although many users do not realize this – it is present in the two-thirds of the entries that are as yet unrevised, many of which have been virtually unchanged since their initial publication between 1884 and 1928 in the first edition of OED. This is because OUP has decided that the best way to present the dictionary online is to merge the new and revised entries seamlessly with the unrevised ones. Why? Probably because online publication reaches a much more varied audience than was ever exposed to the print version, whose 13 volumes, forbiddingly heavy and expensive, were usually confined to the reference sections of university libraries and to the homes of relatively wealthy academics and intellectuals. The current OED is regularly browsed by journalists, teachers, lawyers, editors, technologists and word enthusiasts, not only in the United Kingdom but throughout the global community, notably the United States and Japan, where interest in the OED is high. This wide audience brings with it commercial returns, but also the responsibility to meet popular requirements as well as scholarly ones. In turn, as the Press sees it, this requires on-screen presentation that is simple and easy to use. The problem is that the history of the dictionary is not simple at all, and the second edition, into which the ongoing third edition (or OED3) is being gradually merged, was itself a hybrid of new with old.
OUP has nevertheless decided that attempts to represent the history of the dictionary more transparently, distinguishing between the different stages of composition and publication, and making it clear that many existing entries, dating back 70 years and more, contain information long since superseded by subsequent
scholarship would be impossibly confusing for its non-specialist users and for anyone who is not a lexicographical historian. Opinions will vary on whether this is a good decision or a bad one. It could certainly be argued that one of the chief attractions of OED for non-specialist users is its impeccable scholarly credentials, and that it is precisely this characteristic of OED that the current website, with its unannounced merging of new with long-outdated entries, obscures or at worst misrepresents.
Four times a year (March, June, September and December), the website is updated: a set of newly revised or newly created entries is uploaded and the corresponding set of unrevised entries silently removed. At the same time, the lexicographers can make changes or corrections to entries or parts of entries throughout the whole dictionary – updating bibliographical details in the quotations, for example, or regularizing editorial labels should they rethink their rules on describing terms as ‘obsolete’, ‘slang’, ‘colloq.’ (i.e. ‘colloquial’) – while improving the user-friendliness of the site more generally by adding short videos or brief explanatory features on subjects such as surnames or place names in English. This mode of revision, composition, accretion and regular transformation is something that was impossible in print format, and it has many advantages for consumer as well as lexicographer. Most obviously, dictionary users no longer need to wait years for a supplement, and when it arrives squeeze it onto their shelves next to existing volumes: instead, each fresh batch of spanking-new scholarship materializes onto everyone’s screen every quarter, with every entry in correct alphabetical order. The revised OED may never be published in print again, and electronic publishing is the likely future for many types of dictionary now.
Where historical dictionaries are concerned, there is an added advantage for both publisher and reader: neither will have to bear the especially heavy costs associated with producing such works, due not only to their vast size but also to the many different fonts and complicated page layouts required by the variety of information displayed in every entry. But there are disadvantages to internet publication, too, particularly for those seeking a truly definitive definition. The entry you consult in January may be different by March (entirely revised, or with new quotations, date changes or other alterations). Scholars find this evanescence upsetting and infuriating; even the casual reader may find it disconcerting. One category of vocabulary that has proved particularly vulnerable to unsignalled change is sexual words whose definitions – dating back to the first edition – have more recently been perceived to be offensive, for example those which (as we have seen) termed homosexual acts ‘unnatural’. In an editorial sweep of otherwise unrevised entries after December 2010, the term ‘unnatural’ was silently deleted and a more explicit reference to ‘anal sex’ substituted. At a pinch, this could be described as
censorship of the historical record. The wording of definitions is itself a cultural act, of interest therefore to every type of cultural commentator and historian, whether academic or not. In an historical dictionary, it is important that acts of this type are identified and dated – so that we don’t mistakenly assume that the references to ‘anal sex’ date from 1989 (as indicated by the website) when in fact they were introduced post-2010, or conversely that describing homosexual acts as unnatural in 1989 was unacceptable when demonstrably that is untrue: all the references to ‘unnatural’ sex, although originally published in the first edition, were reproduced unchanged in the 1989 second edition and not thought worthy of comment at the time. By contrast, the printed book is (more-or-less) permanent and unchanging. If a new edition supersedes the old, the old does not disappear. But there is now no question that users of online historical dictionaries who intend to cite any part of an entry as scholarly underpinning for their published work, whether literary, linguistic, historical or otherwise, must take a screen shot of the information they are viewing in order to secure a permanent record. This unpredictable changefulness in authorities previously static, now an intrinsic feature of electronic publishing of reference works, marks a sea change in academic methodologies of all kinds. Its implications will take some years to be fully absorbed. Other electronic transformations of OED are still in the process of developing. Since its inception in 2000, the online website has provided tools for searching and display that have been improved and refined in successive makeovers. They offer the ability to analyse and exploit the vast stores of information previously trapped inaccessibly between printed pages, revealing answers to questions of interest to professional linguists and word enthusiasts alike. 
For example, which writers introduced the most words into the language? Do these words have particular grammatical or etymological features? Which periods of the language have been most lexically productive? Front-page buttons now allow one to search by language of origin, by subject, by ‘usage’ – that is, for words labelled ‘allusive’, ‘archaic’, ‘colloquial and slang’, ‘derogatory’, and so on. The extent of these categories is frustratingly limited on the one hand – you can’t search for Scottish words, for example, without drilling further into the site – and strangely broad on the other; if you click on the ‘colloquial and slang’ hyperlink you get thousands of results in a format which cannot easily be analysed (the web designers have not thought about how users may wish to process the data resulting from searches). Particularly notable is the new feature (inspired, so the present author has been told, by the ‘Examining the OED’ website, i.e. Brewer 2005-) listing the top thousand most-cited sources in the dictionary on a web page which can be further searched. All these devices open up the history and development of the language in ways that could not
have been dreamed of by the original lexicographers, and they give a good indication of the directions in which other historical dictionaries, not just the OED, may travel. But they also point to the possibility of error and pitfall. In the OED’s case, all electronic searches are applied to a mixed database, made up of the (so far) unrevised entries of OED2 blended indistinguishably with the (so far) revised entries constituting the ongoing OED3, in a proportion that changes every quarter when a new update to the website is uploaded (as we have seen, unrevised OED2 is merged seamlessly with OED3 in the version of the dictionary one accesses online). The revised portion of OED, however, is transforming the historical record of the vocabulary, not least by antedating more than one in three of the words previously attributed to Shakespeare (Brewer 2012). It is also quoting far more intensively from sources its predecessors neglected: eighteenth-century texts, writing by women, and non-literary texts of all kinds (e.g. wills, inventories, many more newspapers and journals, diaries, legal and local government records, and a host of other heterogeneous texts, many available for the first time in digitally searchable resources). OED3 is thus greatly different from OED2 – not surprisingly, as otherwise there would have been no point in embarking on the revision. So users can easily be led up the garden path if they do not take the results of current searches of OED Online with a large pinch of salt. The moral is clear: caveat emptor. However sophisticated the electronic medium in which they appear, dictionaries reflect the evidence and research put into them in the first place – and in the case of the OED, a historical dictionary whose compilation has extended over many years, electronic searches will turn up results reflecting many different stages of composition. 
The only way to solve this problem is to allow users to search the old material separately from the new (as well as the other way round), a resource recently withdrawn: most unfortunately, the December 2010 relaunch deleted the electronically searchable version of OED2 from the website, leaving us just with the hybrid version of the dictionary. Looking on the bright side, this problem of inconsistent data will gradually reduce as the revision of OED proceeds, and will have entirely disappeared in 20 years or so (!) when the revision is complete. By then, we can hope that searchable OED2 will have returned to the website, as an important record of historical lexicography bearing witness to a whole era of society and culture as well as to literary and linguistic scholarship on language itself. For as OED3’s editor John Simpson states in his online preface, in a remark that can be applied to other historical dictionaries too, ‘Far more than a convenient place to look up words and their origins, the Oxford English Dictionary is an irreplaceable part of English culture. It not only provides an important record of the evolution of our language, but also documents the continuing development of our society’ (www.oed.com/public/oedhistory#future, accessed June 2012).
4 Links to Other Dictionaries
Another forward-looking characteristic of the new OED Online, one with implications for lexicography more widely, is its hyperlinks to related material in other dictionaries – some published by OUP itself, and others further afield. The most important of these is OED’s younger ‘sister’ dictionary, The Historical Thesaurus of the Oxford English Dictionary (HTOED, Kay et al. 2009). The origins of this latter work can be traced back to a winter’s day in 1964, when the project’s senior editor, M. L. Samuels, announced at a departmental meeting at Glasgow University his scheme of ‘turning the dictionary [OED] inside out’, in order to loose its contents from their alphabetical moorings and reorganize them by semantic categories (Kay and Wotherspoon 2002). The problem with conventional (print) dictionaries, as has often been observed, is that alphabetical order has nothing, or little, to do with the meanings of words or the connections between them. The advantage of thesauruses, by contrast, is that words are arranged according to their meanings. But the idea behind HTOED was more ambitious still, since by recasting and reformulating OED’s evidence it was able to show how related categories of vocabulary, along with the objects or concepts denoted by that vocabulary, had developed through time. This reconfiguration of OED not only enables new types of historical lexical research, therefore, but also opens fresh paths of historical and cultural investigation in areas beyond the purely linguistic.
For example, as the editors explain, ‘anyone wanting to know the range of words available to Shakespeare for a particular meaning can consult the appropriate time span in the relevant section or sections’; while the larger groups of related words can themselves be thought of (and used) as ‘conceptual maps’ – charting the development of (say) vocabulary relating to the mind, or to concepts such as strength or virtue or sin or wealth (evidently a seemingly endless list), to cultural artefacts of all imaginable kinds (cookery, gardening, art, music, business, merchandise, clothing, etc.), to social entities and relationships (monarchy, the family, the state, etc.); see further Kay et al. (2009: preface). All this is on display in historically organized categories of data which is itself coterminous with the entire range of recorded vocabulary in English – or to be more precise, with the vocabulary recorded in the OED: the main drawback of the project is that the editors based their study on the second edition of OED (waiting for the revision of this edition would have meant that they would never have started in the first place, though they were able to draw on the 1972–86 supplement as they went along), so that the thesaurus replicates the imperfections of that edition as well as the strengths of the original OED itself.8 The print version of HTOED was virtually unusable. While volume 1 – the lists of words by semantic category, packed tightly if legibly onto extremely thin pages – was richly fascinating to contemplate, active research on it required
using volume 2, that is, protracted effort of a most tedious kind, trailing back and forth between the volumes, looking up successive individual items, and tracing them according to the rebarbative labelling system. All is now transformed in the OED Online version, which integrates the thesaurus into its design at every stage in an entry, allowing readers to link to related words and see them displayed either in their dictionary layout or in their embedded niches within the thesaurus itself. Here there seems to be no disadvantage to the electronic medium, as several clicks achieve far better and easier results than we could previously attain by tossing volumes around and leafing through pages. Other new hyperlinks on OED Online take us (from the front page) to OUP’s internet dictionary resources elsewhere: ‘Oxford Dictionaries Online’ (a concise dictionary of contemporary English which often offers superior definitions for unrevised words in the OED itself – dialect and slang, whose dated treatment was quoted above, being good examples here), ‘Oxford Language Dictionaries Online’ (a series of foreign language dictionaries published by OUP), ‘Oxford Reference Online’ (a portal to a range of reference works, including collections of quotations, maps, timelines and an encyclopaedia, along with specialized dictionaries or ‘companions’ on art and architecture, biological sciences, classics, computing and so on), and finally the Oxford Dictionary of National Biography (ODNB). This latter resource is also integrated into the OED itself, so that when looking up a word and browsing its quotation evidence, one can link through to the ODNB entry for an author (should it exist) – a wonderful bonus which can enhance understanding of the historical evidence and open up further fields of inquiry (though it should be noted that the links are presently not free from glitches, a problem afflicting many of the other new features on the relaunched website). 
Even more helpful for historical linguists are the links within entries to two other historical dictionaries of English: the Middle English Dictionary (MED) and the Dictionary of Old English. These dictionaries provide a much wider range of synchronic quotation evidence for the periods they cover, enabling specialists to get a better sense of connotation than is possible from the more selective OED. Moreover, although MED was composed (between the mid 1930s and 2002) with the example of OED before it, and with the aid of the OED’s own dictionary quotations for this period, it often chose to analyse the semantic structure of words differently. Comparing the two dictionaries side by side on the screen (or printing out the entries for closer scrutiny) illuminates difficult lexicographical decisions about how to interpret difficult words and concepts with particular cultural resonance (e.g. ‘honour’, ‘truth’ and the like); evaluations of this sort are much easier with the new hyperlinks. This feature is likely to develop as OUP continues to rethink its plans for OED: candidates for linkage in the future might include the Dictionary of the Scots Language (DSL, at www.dsl.ac.uk/), which combines the Dictionary of the
Older Scottish Tongue (Craigie 1937–2002) with The Scottish National Dictionary (Grant 1931) and is under further development, as well as dictionaries of US English such as the Dictionary of American Regional English (DARE, Cassidy and Hall 1985–2012) – not to mention earlier historical dictionaries such as Jamieson’s Etymological Dictionary of the Scottish Language (1808) or Wright’s English Dialect Dictionary (1898). All these dictionaries contain rich resources for historical study of English; all except DARE have been digitized – and DARE itself is due online in 2013. None of the respective websites has so far approached the degree of sophistication or functionality of the OED, but then none has had access to funds of the order that have been poured into the OED. These funds have ensured not only a Rolls Royce website but also, much more importantly, a team of lexicographers with expertise and experience stretching back decades. The future of historical dictionaries, whether completed a century or more ago (like Jamieson and Wright), or ongoing (like DSL, DARE and OED itself), is undeniably digital. The democratization of knowledge that has accompanied the growth of the internet means that lexicographers and their publishers now have much better access to an enthusiastic public, available to be tapped for funds as well as for volunteer support. Working out how to finance historical dictionaries, how to link them together, and how to present the scholarly information locked away in them attractively and accessibly – all this while preserving academic standards intact – is one of the most exciting issues in lexicography today.
Notes
1. Oxford University Press (1916: 16).
2. Quoted in Oxford University Press’s in-house journal The Periodical, 15 February 1928: 25.
3. Murray (1977) remains the authoritative work here, though see Winchester (1998 and 2003). The essays in Mugglestone (2000) offer a good range of analytic and descriptive accounts of the first edition, while Brewer (2007) provides a history of OED over the twentieth and twenty-first centuries.
4. In fact this was the second supplement; the 1933 reissue of OED (Murray 1933) had included a one-volume supplement with entries for many words or usages that entered the language since publication of the dictionary’s first instalment in 1884.
5. See further Brewer (2005-), Brewer (2010a).
6. See further Stanley (1990), Algeo (1990), Brewer (1993) and (for an overview) Brewer (2007: 213–22).
7. Three interim volumes of OED additions to the vocabulary (i.e. between the second edition of 1989 and 2000, when the OED went online) appeared in print in the 1990s (Simpson and Weiner 1993, 2 vols, and Proffitt and Simpson 1997).
8. See further Brewer (2010b), on which this account draws.
References
Algeo, J. (1990) The emperor’s new clothes: the second edition of the Society’s dictionary. Transactions of the Philological Society 88, 131–50.
Brewer, C. (1993) The second edition of the OED. Review of English Studies 44, 313–42.
Brewer, C. (2005-) Examining the OED. Available at: http://oed.hertford.ox.ac.uk/.
Brewer, C. (2007) Treasure-House of the Language: The Living OED. New Haven and London: Yale University Press.
Brewer, C. (2010a) The use of literary quotations in the Oxford English Dictionary. Review of English Studies 61, 93–125.
Brewer, C. (2010b) Review of: Christian Kay, Jane Roberts, Michael Samuels and Irené Wotherspoon (eds) Historical Thesaurus of the Oxford English Dictionary. Review of English Studies 61, 801–5.
Brewer, C. (2012) Shakespeare, word-coining, and the OED. Shakespeare Survey 65, 345–57.
Burchfield, R. W. (1972–86) A Supplement to the Oxford English Dictionary, 4 vols. Oxford: Clarendon Press.
Cassidy, F. G. and Hall, J. H. (1985–2012) Dictionary of American Regional English, 5 vols. Cambridge, MA: Harvard University Press.
Craigie, W. A., Aitken, A. J., Stevenson, J. A. C. and Dareau, M. G. (1937–2002) A Dictionary of the Older Scottish Tongue, 4 vols. Chicago: University of Chicago Press, Aberdeen: Aberdeen University Press, and London: Oxford University Press.
Grant, W. (1931) The Scottish National Dictionary, 10 vols. Edinburgh: Scottish National Dictionary Association.
Kay, Christian and Wotherspoon, Irené (2002) Turning the dictionary inside out: some issues in the compilation of a Historical Thesaurus. In: Javier E. Diaz Vera (ed.) A Changing World of Words: Studies in English Historical Lexicography, Lexicology and Semantics. Amsterdam: Rodopi, 109–35.
Kay, Christian, Roberts, Jane, Samuels, Michael and Wotherspoon, Irené (eds) (2009) Historical Thesaurus of the Oxford English Dictionary: with additional material from A Thesaurus of Old English, 2 vols. Oxford: Oxford University Press.
Mugglestone, Lynda (ed.) (2000) Lexicography and the OED: Pioneers in the Untrodden Forest. Oxford: Oxford University Press.
Murray, J. A. H. et al. (1933) The Oxford English Dictionary: Being a Corrected Re-issue with an Introduction, Supplement, and Bibliography of A New English Dictionary on Historical Principles, Founded Mainly on the Materials Collected by the Philological Society. Oxford: Clarendon Press.
Murray, K. M. E. (1977) Caught in the Web of Words: James A. H. Murray and the Oxford English Dictionary. New Haven and London: Yale University Press.
Oxford University Press (1916) The Oxford Dictionary: A Brief Account. Oxford: Oxford University Press.
Proffitt, Michael and Simpson, John (1997) Oxford English Dictionary Additions Series. Oxford: Clarendon Press.
Simpson, J. A. and Weiner, E. S. C. (1993) Oxford English Dictionary Additions Series, 2 vols. Oxford: Clarendon Press.
Stanley, E. G. (1990) The Oxford English Dictionary and Supplement: the integrated edition of 1989. Review of English Studies 41, 76–88.
Winchester, Simon (1998) The Surgeon of Crowthorne. London: Penguin.
Winchester, Simon (2003) The Meaning of Everything: The Story of the Oxford English Dictionary. Oxford: Oxford University Press.
Wright, J. (1898) The English Dialect Dictionary. London: Henry Frowde.
5.3
The Future of Dictionaries, Dictionaries of the Future
Sandro Nielsen
Chapter Overview
Introduction
Redefining Dictionaries
Dictionaries of the Future
Concluding Remarks
1 Introduction
The lexicographic landscape is gradually changing due to internal and external developments. Lexicographic research activities, findings and output in the form of principles, theories and dictionaries represent internal developments, whereas changes in social behaviour, information needs and technology are typical examples of external factors affecting research in and the making of dictionaries. In this light, it is relevant to ask the question: Will we have dictionaries in the future? Possible answers depend on, among other things, what we mean by the term ‘dictionary’ and the time frame involved. Experience shows that new artefacts eventually replace existing ones in most, if not all, cases, though it would be imprudent to think that dictionaries will disappear within the near future. As long as people find that some of their needs for information can be met by consulting dictionaries, their existence will be ensured for some time.
What objects people will regard as dictionaries may change, however, owing to a range of factors, including the types of need identified, the media available, and the types of help provided. The future of dictionaries has been discussed sporadically in the literature during the past two decades. Andersen and Nielsen (2009) and Samaniego Fernández and Pérez Cabello de Alba (2011) are some of the most recent contributions, and they mention a number of trends that may shape the future. First of all, they point to the ontological shift from linguistics to lexicography as a separate discipline or as part of information science, that is, a theoretical development. The second trend is the change in the form and size of printed and electronic dictionaries, that is, a practical development. Finally, the authors find that more and more lexicographers recognize that general lexicography can benefit from the theoretical and practical advances in specialized lexicography. These trends are discussed below.
2 Redefining Dictionaries
One of the first things to consider in a forward-looking study is the nature of the social reality examined. The ontological position of lexicographers is, therefore, imperative because it affects the way in which lexicographers perceive their objects of research and work. Dictionaries are often described as reference books containing words and their spelling, pronunciation and meaning, or as reference works containing words and their translations in another language, see, for example, Crystal (2010: 112) and Sterkenburg (2003: 396). However, developments in both practical and theoretical lexicography have highlighted some of the limits of such ontological positions: they are specific examples of types of dictionary; they are not generally applicable; and they are often mutually exclusive. In other words, a dictionary is not just a dictionary. In an attempt to describe the dictionaries of the future, a wider and more complex ontological position is necessary. Nielsen (2009: 215) proposes that dictionaries are reference tools made up of several surface features, that is, features that are visible to users in print or on screens, such as user guides, word lists, appendices, search sites and result sites. In addition, dictionaries have at least three significant underlying features, the overriding one being that dictionaries are designed to have one or more lexicographic functions. A lexicographic function is the type of help a dictionary can give to a specific user type in a specific type of non-lexicographic situation in which someone may consult dictionaries to find help; for instance communicative functions such as providing help to translate, understand or write texts, and cognitive functions such as providing help to acquire knowledge independently of communicative activities (e.g. Bergenholtz and Tarp 2010: 30).
Secondly, dictionaries contain data that have been selected to support the relevant function(s), and thirdly, lexicographic structures combine and link the data so that they support and fulfil the dictionary function(s). These features should not be seen in isolation, as a dictionary is made up of the totality of surface and underlying features and their interrelationships. The proposed ontological position is generally applicable in lexicographic contexts. It applies, for instance, to printed and electronic dictionaries, to monolingual and bilingual dictionaries, to general and specialized dictionaries, to learner’s and expert’s dictionaries, to language and encyclopedic dictionaries, and it underlines the fact that dictionaries are complex lexicographic information tools. While traditional definitions of dictionaries are rooted in linguistics, the function-oriented definition primarily focuses on information needs of users and how lexicographers can respond to such needs (see also Chapter 5.1 in this book). This shift is accentuated by Hartmann (2012: 101): ‘Lexicography is not just part of applied linguistic lexicology and technical terminology, but a potentially independent field capable of expansion, with reference works (or “information tools”) other than dictionaries – like encyclopedias, atlases, catalogues, manuals and directories – being just as important.’ With the functional approach, theoretical and practical lexicography are not subject to linguistic constraints and this approach provides a platform from which lexicographers can respond satisfactorily to the needs for information in the knowledge and information society of tomorrow.
3 Dictionaries of the Future
It is notoriously difficult to forecast the future. Even so, forward-looking statements may express opportunities based on possible courses of action related to present-day knowledge; according to Mićić (2010: 1), ‘Most of the trends, technologies and issues that will determine our future in the next ten to twenty years are already visible now. The future is already here; it just hasn’t arrived everywhere to the same extent.’ The above discussion indicates that one challenge facing the lexicographic community is the provision of tools that can satisfy needs for information. At a theoretical level, the shift of focus towards lexicography and information science requires a theoretical basis that applies to all dictionaries and not separate theories for individual types of dictionaries, that is, a transformative approach instead of a contemplative approach (e.g. Gouws 2011: 17–20, Tarp 2009: 24). A transformative theory is prospective and allows lexicographers to develop guidelines for designing and making dictionaries that are adapted to specific types of users and to specific types of user situations. Even though Bergenholtz (2011: 30) laments the limited effect theory has had on lexicographic practice, either because publishers do not (sufficiently)
take theoretical advances into consideration or because a long time lag is involved, dictionaries of the future will likely benefit from something like function-oriented principles and theories in order to satisfy information needs. But what will these dictionaries look like?
3.1 Can Dictionaries Be Made Future-Proof?
Dictionaries will be available in many shapes and sizes and two general types can be identified: printed dictionaries and electronic dictionaries. Even though this distinction seems clear, there is not always a clear dividing line between the two. As pointed out by Andersen and Nielsen (2009: 360–1), it has been claimed that printed dictionaries are an endangered species that will be replaced by electronic ones. Nonetheless, printed dictionaries are likely to be with us for some time yet. Printed dictionaries invite users to adopt a slow and reflective consultation procedure in connection with certain functions, for example, providing help with language learning. Furthermore, publishers offer dictionaries printed on demand, which may keep costs down; and printed dictionaries are not dependent on power sources in order to work. Other factors that may extend the life of printed dictionaries are that many come with CD-ROMs containing the text of the printed dictionary as well as additional material, that some printed dictionaries are transferred to electronic media with few or no changes (e.g. Tono 2009), and that some dictionaries come with interactive e-texts available from publishers via the internet. Such e-texts may include text corpora, audio pronunciation guides, still pictures, video footage, online writing guides and the possibility of making personalized notes (e.g. Oxford Advanced Learner’s Dictionary and Dictionary of Law). Electronic dictionaries also come in a great many varieties. They can be dictionaries that were originally produced as printed books and transferred to physical electronic media (e.g. CD-ROMs and pocket electronic dictionaries), they can be dictionaries originally produced in electronic form for one medium and transferred to another (e.g.
from dictionaries available only on CD-ROMs to being accessible online), and they can be genuine online dictionaries accessible from PCs, flat screens, tablets and smartphones.
One major difference between printed and (some) electronic dictionaries is their location in relation to users. Access to printed dictionaries and electronic dictionaries stored on physical media (printed books, CD-ROMs, DVDs, desktop applications) depends on the actual location of the dictionaries relative to users in most situations, whereas for online dictionaries the question is not where they are located but how they can be accessed, wherever users are and whenever they want access. In light of the above, lexicographers may abandon the dichotomy of printed versus electronic dictionaries and instead distinguish between offline and online dictionaries.
The Future of Dictionaries, Dictionaries of the Future
Whether printed, electronic, physical or online, many of these dictionary types are likely to be with us for some time to come. Developments in some regions, for example Scandinavia, may indicate what will happen to printed dictionaries. The Scandinavian languages belong to the so-called small languages and publishers have gradually discontinued printing individual dictionaries and made them available only in electronic form. The large bilingual dictionaries to and from Scandinavian languages are now mainly available in electronic form due to falling sales of printed copies, and the large (multi-volume) monolingual dictionaries, whether they are called lexicon, encyclopedia etc., are experiencing a similar development. If this trend extends to the general market for dictionaries, those printed dictionaries that are most likely to survive will be medium-sized, monolingual dictionaries covering one of the United Nations’ world languages, and medium-sized, bilingual dictionaries between these world languages and important pairings of world languages and non-world languages. The surviving dictionaries will be supplemented by online access to the corresponding digitized dictionaries and additional e-texts and digital recordings that support the lexicographic functions of individual dictionaries as described above and with extensions, for example, by providing help to learners who translate texts by showing different translation strategies for particular genres (Nielsen 2010).
Online dictionaries may be made available by different types of commercial suppliers. Most publishing houses act as specialist shops offering access to their own dictionaries, usually on a subscription basis, so that (potential) dictionary users can take out subscriptions for one or more dictionaries for specified periods. In the communication and information society another type of actor enters the stage, namely what may be called lexicographic supermarkets.
These are internet businesses, not publishers, that offer access to dictionaries they have not produced themselves but have licensed or bought from one or more publishers and one or more authors. These lexicographic supermarkets provide a wide range of web-based services catering to people’s needs for information and knowledge.
3.2 Digitization of Information Tools
An external factor that has great influence on lexicography is the digitization of information activities. The general trend is that printed media lose market share to electronic media as carriers of data, and since the 1980s this trend has affected theoretical and practical lexicography. The digitization of communication in general will, in the long run, result in printed dictionaries giving way to online information tools. In principle, everybody can upload information tools, including dictionaries, to the internet and many contribute to the contents of
various types of wikis, and since it is impossible to predict what everybody will do in the future, a number of selected topics with general application to both theoretical and practical lexicography will be addressed. In today’s knowledge and information society we are constantly exposed to a plethora of data coming to us from many different sources, and this data, or information, blitz will continue in the future. As dictionaries contain data, they should be able to give users something better and more helpful than other information tools, otherwise it will be difficult to justify their existence. Online dictionaries compete with internet search engines as providers of data that can be turned into information when processed by readers, but the main problem of internet search engines is that they tend to provide too many results from searches in a vast sea of unstructured data, and the results are often irrelevant for the particular information needs of searchers. One way in which lexicographers can future-proof dictionaries is to develop theories or principles that allow them to design and produce information tools that give users the opportunity to access structured data with targeted searches and have the search results presented in structured ways that tell users exactly what they need to know. The detachment from linguistics causes lexicographers to change their perception of dictionaries. Talking and thinking about online dictionaries, people often imagine a database that is accessed by users from an interface whose only function is to give direct access to the database. The relationship between database and dictionary is thus a one-to-one relationship and the database is identical with the dictionary. Andersen and Almind (2011), Bergenholtz (2011), Bergenholtz et al. (2011), Bothma (2011), Nielsen and Almind (2011) and Spohr (2011), among others, discuss different technical setups describing online dictionaries as constructions with three main components. 
First, there is a database containing specially selected data that have been structured in a way that facilitates search and retrieval. Secondly, users may have access to not just one but several dictionaries via their interface. Thirdly, users access the lexicographic data through a search engine introduced as a mediator between the dictionary and the database, allowing users to search for data in the database, from where the search engine retrieves and presents the relevant data according to user requests. In this case, the relationship between database and dictionary is a one-to-many relationship and the dictionary is not identical with the database.
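The three-component setup can be illustrated with a minimal sketch. The record layout, the field names and the two dictionary configurations are hypothetical and chosen only for illustration; they are not taken from any of the systems cited above. The point is simply that two dictionaries share one database and that the search engine decides which data each dictionary presents.

```python
from dataclasses import dataclass

# Component 1: the database, a collection of structured records.
# The record and its field names are invented sample data.
DATABASE = [
    {
        "headword": "invoice",
        "inflections": ["invoice", "invoices"],
        "definition": "An invoice is a document that lists goods or "
                      "services supplied and the amount to be paid.",
        "collocations": ["issue an invoice", "pay an invoice"],
        "examples": ["The supplier issued an invoice for the goods."],
    },
]


@dataclass(frozen=True)
class Dictionary:
    """Component 2: a dictionary is a named view onto the shared database."""
    name: str
    display_fields: tuple


def search(database, dictionary, query):
    """Component 3: the search engine mediating between dictionary and database."""
    results = []
    for record in database:
        if record["headword"] == query:
            # Present only the fields this dictionary is meant to show.
            results.append({f: record[f] for f in dictionary.display_fields})
    return results


# Two dictionaries, one database: a one-to-many relationship.
reception = Dictionary("help to understand a term", ("headword", "definition"))
production = Dictionary(
    "help to write a text",
    ("headword", "definition", "inflections", "collocations", "examples"),
)

print(search(DATABASE, reception, "invoice"))
print(search(DATABASE, production, "invoice"))
```

Adding a further dictionary in this sketch means adding one more `Dictionary` object, not a second database.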
3.3 One Database May Serve Many Dictionaries
The tools of information and communications technology allow lexicographers to include considerably more data in databases than in printed dictionaries, but lexicographers should carefully consider how many of these data users will be presented with. The 18 contributions in Haß and Schmitz (2010) on internet
lexicography are generally based on the premise that databases should contain as comprehensive a coverage of linguistic concepts as possible and present all to users, that is, the more data addressed to a headword the better for documentation purposes and hence users. This reflects the idea that the dictionary is equal to the database and that dictionaries have the sole function of documenting linguistic concepts. Most users, however, do not have documentation problems (this type of problem seems to be the concern of philologists and terminologists) but problems in extra-lexicographic contexts which amount to genuine needs for specific information about something in order to complete a specific task (see the discussion in Bergenholtz 2012). Dictionaries should offer satisfactory help in such situations, not with an overload of data but with carefully selected and presented data that satisfy information needs; this is in line with the general principles of communicating in a modern society as expressed by Sternberg (1988: 58): ‘The best way to inform your reader is to tell them what they are likely to want to know – no more and no less.’ One way in which to develop such dictionaries is to treat the database as a comprehensive collection of structured data in which dictionaries search for the data that tell users exactly what they want to know. The needs for specific types of data to satisfy specific types of need require a proper lexicographic response. Publishers and lexicographers who treat the database as equal to the dictionary will have to develop a new database for every dictionary, but this is both time-consuming and expensive. If they take the alternative approach described above, lexicographers and publishers only have to develop one database in order to publish several dictionaries and the database can be monolingual, bilingual and multilingual depending on the design and purpose of the entire dictionary concept. 
Monolingual databases can only form the basis of monolingual dictionaries, whereas bilingual databases can form the basis of several monolingual dictionaries in either language and several bilingual dictionaries between the two languages. For example, a database that contains data in two languages (L1 and L2) can be the source of the following set of dictionaries with the identified functions:
• Dictionary providing help to understand a word or term in L1.
• Dictionary providing help to understand a word or term in L2.
• Dictionary providing help to produce a text where the expression is known in L1.
• Dictionary providing help to produce a text where the expression is known in L2.
• Dictionary providing help to find a word or term where the meaning is known in L1.
• Dictionary providing help to find a word or term where the meaning is known in L2.
• Dictionary providing help to translate a word or term from L1 into L2.
• Dictionary providing help to translate a word or term from L2 into L1.
• Dictionary providing help to translate a collocation or phrase from L1 into L2.
• Dictionary providing help to translate a collocation or phrase from L2 into L1.
• Dictionary providing help to acquire knowledge about a word, term or concept in L1.
• Dictionary providing help to acquire knowledge about a word, term or concept in L2.
As this list indicates, each dictionary has its own function (i.e. they are monofunctional dictionaries) and search options are tailor-made for the function in question. The two dictionaries providing help to acquire knowledge may be described as polyfunctional in that they can each be used to acquire general knowledge about a topic (e.g. the inflectional paradigm of irregular verbs) or specific knowledge about a topic (e.g. the height of the Eiffel Tower in Paris). Only these cognitive dictionaries are likely to present users with all the data addressed to a word, term or concept in the database, while the other ten dictionaries present different sets of a carefully selected number of data types from the database. Online dictionaries based on a distinction between database and dictionary have been designed and described in the literature, see, for example, Bergenholtz (2010, 2012), Bergenholtz and Bothma (2011), Bergenholtz and Gouws (2010), Fuertes-Olivera and Nielsen (2012) and Tarp (2011). Nielsen and Almind (2011) describe a set of accounting dictionaries linked to an English and Danish database and show how the database may be connected to the surface feature component called result site (Fuertes-Olivera et al. 2012). Readers who want to know the meaning of a particular English term found in a text (e.g. notifications) may consult the English dictionary providing help to understand a word or term. The search engine makes a targeted search for the term in the database in those data fields containing headwords and their inflections, allowing users to search for the base form of a word or term as well as its inflected forms as encountered in texts. The search engine retrieves the data addressed to the search string as shown in Figure 5.3.1. The dictionary presents data that are intended to help users understand words and terms found in texts: the meaning of the term searched for – ‘no more and no less’. 
Definitions are written with the factual and linguistic competences of the intended user group in mind and written as full sentences using natural language to make it easy for users to turn the data into useful information by a mental process.
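A targeted search in the headword and inflection fields, as described for the dictionary providing help to understand a term, might be sketched as follows. The record is invented, the definition and collocation values are placeholders and not quotations from the Accounting Dictionaries, and the function names are hypothetical.

```python
# Invented sample record. The definition and collocation values are
# placeholders, not text taken from any real dictionary.
RECORDS = [
    {
        "headword": "notification",
        "inflections": ["notification", "notifications"],
        "definition": "<definition text stored in the database>",
        "collocations": ["<collocation 1>", "<collocation 2>"],
    },
]


def build_form_index(records):
    """Map every headword and inflected form (lower-cased) to its record."""
    index = {}
    for record in records:
        # A set avoids indexing the base form twice when it is also
        # listed among the inflections.
        forms = {record["headword"], *record["inflections"]}
        for form in forms:
            index.setdefault(form.lower(), []).append(record)
    return index


def understand(index, search_string):
    """Return only headword and definition for the form searched for."""
    hits = index.get(search_string.strip().lower(), [])
    return [(record["headword"], record["definition"]) for record in hits]


INDEX = build_form_index(RECORDS)

# A reader who meets the plural form in a text still reaches the entry,
# and sees the definition only, not the collocations.
print(understand(INDEX, "notifications"))
```

The collocations remain in the database but are not presented, which mirrors the ‘no more and no less’ principle for this function.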
Figure 5.3.1 Search result helping users to understand the word or term ‘notification’
Authors may need help with writing texts that contain the term notification and consult the dictionary providing help to produce texts where the expression is known. The search engine searches the database for the following three types of data: inflection, collocation and example. Figure 5.3.2 contains the data retrieved and their arrangement as presented to users. The data types in Figure 5.3.2 all provide help to write texts in English. The definition allows users to ascertain whether the word has the correct meaning in the writing context and the grammar data, collocations and examples support actual text production. Again users are presented with a limited number of data types selected from all the data types contained in the database and addressed to the headword notification.
Figure 5.3.2 Search result helping users to write texts with the word or term ‘notification’
Translators who need help with translating the English term notification may consult the dictionary providing help with translating a word or term from English into Danish. The search engine searches the database for the following type of data: inflection. The dictionary presents a range of data types including: headword, definition, collocations, examples, equivalent and inflection. Figure 5.3.3 shows the result of the search. In addition to presenting the meaning of the term, the data contain the Danish equivalent ‘anmeldelse’, its inflectional paradigm, two synonyms of the equivalent, English collocations and example sentences with translations into Danish. The dictionary recommends one equivalent and informs users that there are two synonymous expressions in Danish, so that users are given one suggested solution to the translation of the English word rather than three equivalents from which they must pick one; this keeps the lexicographic information costs at a minimum (see Section 3.4 below).
Figure 5.3.3 Search result helping users to translate the word or term ‘notification’
Authors may want to express a specific meaning but not know the exact word to use. By consulting the dictionary providing help to find a word or term where the meaning is known, users can search for a word expressing the meaning ‘change the original amount’ and the search engine searches the database in the following data types: definition, usage note, synonym and antonym. The dictionary gives users the search result shown in Figure 5.3.4. On the basis of the data presented in Figure 5.3.4, users can establish whether the word has the correct meaning through the definition and find help with text production in the form of inflection, collocations and examples.
Figure 5.3.4 Data providing help to find a word or term where the meaning is known
Figures 5.3.1–5.3.4 show dictionaries providing help in communicative situations, but help in cognitive situations may also be available. Students may want to acquire general or specific knowledge about the term reinsurer and consult the dictionary providing help to acquire knowledge about a word, term or concept. The search engine makes a targeted search in the database in two types of data, namely inflection and definition, and retrieves the relevant data types as presented in Figure 5.3.5. The definition in Figure 5.3.5 explains the meaning of the term, which is complemented by the context example; and the synonym and antonym
help place the term ‘reinsurer’ in a terminological hierarchy. By clicking the cross-reference (‘See also’), users are taken to another article with relevant additional data and the item indicating the source of the definition is a link transferring students to the website where the International Financial Reporting Standard (IFRS) is found. There students can find additional information and gain more knowledge.
Figure 5.3.5 Help in cognitive situations providing all data addressed to the search word
An alternative description of the above scenario is possible. It may be argued that the bilingual database serves four polyfunctional dictionaries each with several search options so that users who consult one of the monolingual dictionaries have the choice of searching for the following kinds of assistance:
• Help to understand a word, term or concept.
• Help to write a text where the expression is known.
• Help to find a word or term where the meaning is known.
• Help to acquire knowledge about a word, term or concept.
Users who consult one of the bilingual dictionaries may search for the following kinds of assistance:
• Help to understand a word, term or concept.
• Help to translate a word or term.
• Help to translate a collocation or phrase.
Search options can thus be tailor-made for specific lexicographic functions and polyfunctional online dictionaries with the above search options will work in a simple way. When they consult the dictionaries, users will type their search strings into search boxes and select the type of help they want, whereupon search engines will search the database and retrieve the relevant data. These data will then be presented to users on the dictionary websites in a predetermined order, similar to the examples shown in Figures 5.3.1–5.3.5 above. This is also in line with the communicative principle of telling users only what they need to know and the principle of allowing users to access lexicographic data in different but targeted ways.
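The consultation procedure just described (type a search string, select the type of help, receive selected data in a predetermined order) can be sketched as a small configuration table. Everything in the sketch is an assumption made for illustration: the sample record is invented, the field names and the three help types are hypothetical, and the simple exact and substring matching stands in for whatever search technique a real system would use.

```python
# Invented sample record; field names are hypothetical.
DATABASE = [
    {
        "headword": "adjust",
        "inflections": ["adjust", "adjusts", "adjusted", "adjusting"],
        "definition": "To adjust a figure is to change the original amount.",
        "synonyms": ["amend"],
        "collocations": ["adjust an amount"],
        "examples": ["The company adjusted the amount after the audit."],
    },
]

# For each type of help: the fields searched, whether the match must be
# exact, and the fields presented, in a predetermined order.
HELP_TYPES = {
    "understand": {
        "search_in": ("headword", "inflections"),
        "exact": True,
        "present": ("headword", "definition"),
    },
    "write": {
        "search_in": ("headword", "inflections"),
        "exact": True,
        "present": ("headword", "definition", "inflections",
                    "collocations", "examples"),
    },
    "find_word": {
        "search_in": ("definition", "synonyms"),
        "exact": False,
        "present": ("headword", "definition", "inflections",
                    "collocations", "examples"),
    },
}


def field_matches(value, query, exact):
    """Check one field (a string or a list of strings) against the query."""
    values = value if isinstance(value, list) else [value]
    query = query.strip().lower()
    if exact:
        return any(v.lower() == query for v in values)
    return any(query in v.lower() for v in values)


def consult(database, help_type, query):
    """Search the fields configured for help_type and return the selected data."""
    config = HELP_TYPES[help_type]
    results = []
    for record in database:
        if any(field_matches(record[f], query, config["exact"])
               for f in config["search_in"]):
            results.append([(f, record[f]) for f in config["present"]])
    return results


# A user who knows the meaning but not the word selects "find_word".
for field, value in consult(DATABASE, "find_word", "change the original amount")[0]:
    print(f"{field}: {value}")
```

Keeping the search fields and the presented fields in one table per help type makes it explicit that what is searched and what is shown are two separate design decisions.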
3.4 Access to and Presentation of Lexicographic Data
Online dictionaries offer various access routes to their data. Some have alphabetical lists of headwords that users can click and then be transferred to the relevant articles, and others are mere digitized texts in which users have to scroll up and down in search of information. In these cases users will be presented with the entire data stock in the articles no matter why they consulted the dictionaries. Modern information technology offers more advanced search options that can be directly linked to user needs in those cases where lexicographers adopt a setup that distinguishes between database and dictionary. By focussing on the needs of users in various types of user situation, lexicographers can ensure that the data retrieved satisfy a specific type of need and are presented in such a way that users can easily turn the data into useful information.
Databases and online dictionaries allow lexicographers to take a dynamic approach to accessing and presenting definitions. Bergenholtz and Kaufmann (1997) and Nielsen (2011) explain that this may result in the presentation of more than one definition for each meaning of a word or concept in the sense that definitions are written in different genres. First, databases and dictionaries can have one type of definition for each function. A good definition supporting text production in a foreign language is likely to differ from a good definition for text production in the same language by native speakers; and a definition that can best support text comprehension in the user’s native language is likely to differ from a good definition supporting the acquisition of knowledge. Secondly, an online dictionary designed to help different types of user may contain definitions that reflect the cultural, factual and linguistic competences of, for instance, beginners, intermediate and advanced learners, or laypersons, semi-experts and experts.
The definition that can best help laypersons understand a concept is different from the definition that experts and semi-experts need in order to successfully understand the same concept in their field of specialization.
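Storing several definitions for the same headword and showing only the one suited to the user can be sketched in a few lines. The wording of the two definitions is invented for this illustration, and the data layout and function name are hypothetical; a real database would hold definitions written by lexicographers for each user type.

```python
from typing import Optional

USER_TYPES = ("layperson", "semi-expert", "expert")

# Invented definitions, keyed first by headword and then by user type.
DEFINITIONS = {
    "reinsurer": {
        "layperson": "A reinsurer is an insurance company that insures "
                     "other insurance companies.",
        "expert": "A reinsurer is the party that assumes risk ceded by "
                  "a cedant under a reinsurance contract.",
    },
}


def select_definition(headword: str, user_type: str) -> Optional[str]:
    """Return the single definition written for this user type, if any."""
    if user_type not in USER_TYPES:
        raise ValueError(f"unknown user type: {user_type!r}")
    return DEFINITIONS.get(headword, {}).get(user_type)


# Only one definition is presented, never all of them at once.
print(select_definition("reinsurer", "layperson"))
print(select_definition("reinsurer", "expert"))
```

In this sketch a missing definition simply yields `None`; whether a dictionary should fall back to a definition written for another user type is a design decision the sketch leaves open.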
One route of access is for users to type the search word into a search box and indicate whether they are laypersons, semi-experts or experts (alternatively: beginners, intermediate or advanced learners). The access option for laypersons will search the database for the graphemic search string in the data fields containing headwords and those containing definitions intended for laypersons and the output device presents the data retrieved. Similarly, the search engine will search the headword fields for the search string and the definition fields for the appropriate definition marked for semi-experts or experts and the output device will show the result of either search. In short, online dictionaries may contain several definitions of the same headword written in different genres and instead of showing all three definitions every time users look for a particular headword, these dictionaries will show only the definition users need. Databases and online dictionaries may also present definitions in two or more languages. This is, for instance, relevant in connection with dictionaries treating English vocabulary for non-native speakers, and Kwary (2012: 36) shows how it is possible to help users with definitions in English as a default option and definitions in Indonesian as an option for users whose English-language competence makes them unable to properly understand the English definitions. The database will contain definitions of the same word or concept in two or more languages, and the dictionary will show the one specifically written for a particular function or type of user as requested by those who consult it. Online dictionaries may allow users to perform non-manual searches. In the future, people may prefer voice-activated access so that they can say a word and this will trigger the search and retrieval functions similar to the way that voice-activated navigation systems work. 
In these cases users will be given an audio-visual presentation of the data found, for example, a definition, a phrase or an inflectional paradigm will be shown and read aloud. Voice-activated access and audio-visual presentation of search results will benefit dyslexic and visually impaired persons in particular.
Online dictionaries can provide help in situations other than communicative and cognitive. Tarp (2008: 127) explains how dictionaries may function as ‘how-to’s’, that is, books giving instructions on how to do specific things, for example, how to operate machines, and such dictionaries have operative functions. For instance, a group of student nurses may have been asked to take blood from patients but have not received any instructions on how to carry out the task. In such situations they may consult online dictionaries and find the necessary instructions either as written text, as series of illustrations or photos, as audio guides, or as video footage with voice-over. Furthermore, it is possible that online dictionaries may present data in three-dimensional form, including holograms. Such presentation requires users to have the appropriate equipment and may be particularly helpful for dictionaries whose functions are operative; for
example by showing the inside of an internal combustion engine and how the individual parts function. A characteristic feature of digital media is that they can be personalized. Information and communications technology may allow users to personalize their dictionaries through the various search options available as described above, and by allowing users to upload data to their dictionaries. Users may then have personalized dictionaries in which they can search in the database as well as the personally added data. Nevertheless, user uploaded data should be completely separate from the data in the database (which is only accessible to editors) but it should be possible to connect the uploaded data with specific headwords, definitions, collocations, illustrations, pictures, video footage, etc. Irrespective of technical setup, users value easy and quick access to and clear presentation of data in dictionaries. Lexicographers should therefore consider the ease with which users will be able to acquire the necessary information from the data and the way in which these data are accessed. Ease of access and appropriate presentation of data come under the heading lexicographic information costs, which are defined as the effort that users believe or feel is associated with consulting a dictionary or any part of it. Search-related information costs are the effort related to the look-up activities users have to perform when consulting a dictionary in order to get access to the data they are searching for. The more activities users have to perform to find the help they need, the higher will be the costs. Comprehension-related information costs are the effort related to the ability of users to understand and interpret the data presented in a dictionary, and this effort is related to the cultural, factual and linguistic competences of users and the way in which the data are presented. 
An appropriate dictionary design and data presentation structure may keep lexicographic information costs at a low level, whereas an inappropriate design and use of structures may lead to high information costs. For example, the access options available in online dictionaries affect users’ perception of the costs associated with finding help to solve problems. The actual wording and presentation of data, such as a high degree of textual condensation, may result in high information costs, while clear and consistent access routes may reduce lexicographic information costs (Nielsen 2008: 173–4). If they are presented with too many data, users incur high information costs, not only in finding (or not finding) the data that answer their questions, but also in having to read all the data to make sure that they have not missed anything. This means that online dictionaries should present the data in such a way that users feel they get answers to their questions with ease and have gained useful information by consulting the dictionaries. Unfortunately, lexicographic information costs cannot be eliminated, but prudent and proper attention to this aspect may result in a reasonable cost level that does not seriously affect the use of dictionaries.
4 Concluding Remarks
Dictionaries do have a future and dictionaries of the future will, to an increasing extent, be regarded as ‘digital assistants’. This development may be explained in terms of economic sectors in modern society: dictionaries are in a transitional phase from the manufacturing sector into the service sector in an attempt to keep up with the general move into a knowledge and information society. Dictionaries of tomorrow will be information tools which, through their surface and underlying features, provide help to satisfy specific types of lexicographically relevant need of specific types of potential user in specific types of extra-lexicographic situation. These tools will be designed and developed in line with advances in society, for example communications technology and the general use of digital media, and lexicographers need a platform that allows them to respond satisfactorily to the needs for information and knowledge of actual and potential users. The lexicographic platform will be supported by two pillars: a theoretical and a practical one. The theoretical pillar focuses on needs-adapted data presentation, using principles for making dictionaries that provide users with limited amounts of structured data from which useful information can be retrieved. The practical pillar will be a technical one, using available technical features exactly because they can provide help that satisfies user needs and not simply because they are available.
The above discussion is based on the possible courses of action related to present-day knowledge found in general as well as specialized lexicography and shows how databases can serve as bases for several dictionaries; how users can search for help in communicative, cognitive and operative situations; how dictionaries can provide data that specifically cater to different types of user; how users can personalize dictionaries; and how lexicographers can offer users different ways of access to the lexicographic data (see also Chapter 5.1 in this book). Online dictionaries thus allow surface features to interact dynamically with underlying features, and vice versa. This does not mean that the setup of online dictionaries will change overnight, but the theoretical and practical issues discussed in this chapter provide some pointers to the future of dictionaries and dictionaries of the future.
References
Dictionaries
Nielsen, S., Mourier, L. and Bergenholtz, H. (2012) Accounting Dictionaries. (A series of 13 interconnected Danish, English, Danish-English, English-Danish dictionaries). Database and design: Richard Almind and Jesper Skovgård Nielsen. Odense: Ordbogen.com. Available at: www.ordbogen.com (Accessed 30 March 2012).
Richards, P. H. and Curzon, L. B. (2011) Dictionary of Law, 8th edition. Harlow: Longman.
Turnbull, J. (ed.) (2010) Oxford Advanced Learner’s Dictionary, 8th revised edition. Oxford: Oxford University Press.
Other References
Andersen, B. and Almind, R. (2011) The technical realization of three monofunctional phrasal verb dictionaries. In: P. A. Fuertes-Olivera and H. Bergenholtz (eds), 208–29.
Andersen, B. and Nielsen, S. (2009) Ten key issues in lexicography for the future. In: H. Bergenholtz, S. Nielsen and S. Tarp (eds), 355–63.
Bergenholtz, H. (2010) Needs-adapted data access and data presentation. Doctorado Honoris Causa del Excmo. Sr. D. Henning Bergenholtz. Valladolid: Universidad de Valladolid, 41–57.
Bergenholtz, H. (2011) Access to and presentation of needs-adapted data in monofunctional internet dictionaries. In: P. A. Fuertes-Olivera and H. Bergenholtz (eds), 30–53.
Bergenholtz, H. (2012) Concepts for monofunctional accounting dictionaries. Terminology 18/2, 243–63.
Bergenholtz, H. and Bothma, T. (2011) Needs-adapted data presentation in e-information tools. Lexikos 21, 53–77.
Bergenholtz, H., Bothma, T. and Gouws, R. (2011) A model for integrated dictionaries of fixed expressions. In: I. Kosem and K. Kosem (eds) Electronic Lexicography in the 21st Century. New Applications for New Users: Proceedings of eLex 2011, Bled, 10–12 November 2011. Trojina: Institute for Applied Slovene Studies, 34–42.
Bergenholtz, H. and Gouws, R. (2010) A new perspective on the access process. Hermes 44, 103–27.
Bergenholtz, H. and Kaufmann, U. (1997) Terminography and lexicography. A critical survey of dictionaries from a single specialized field. Hermes 18, 91–125.
Bergenholtz, H., Nielsen, S. and Tarp, S. (eds) (2009) Lexicography at a Crossroads. Dictionaries and Encyclopedias Today, Lexicographical Tools Tomorrow. Bern: Peter Lang.
Bergenholtz, H. and Tarp, S. (2010) LSP lexicography or terminography? The lexicographer’s point of view. In: P. A. Fuertes-Olivera (ed.), 27–38.
Bothma, T. (2011) Filtering and adapting data and information in an online environment in response to user needs. In: P. A. Fuertes-Olivera and H. Bergenholtz (eds), 71–102.
Crystal, D. (2010) The Cambridge Encyclopedia of Language. Cambridge: Cambridge University Press.
Fuertes-Olivera, P. A. (ed.) (2010) Specialised Dictionaries for Learners. Berlin and New York: Walter de Gruyter.
Fuertes-Olivera, P. A. and Bergenholtz, H. (eds) (2011) e-Lexicography. The Internet, Digital Initiatives and Lexicography. London and New York: Continuum.
Fuertes-Olivera, P. A., Bergenholtz, H., Nielsen, S. and Amo, M. N. (2012) Classification in lexicography: the concept of collocation in the Accounting Dictionaries. Lexicographica 28, 293–308.
Fuertes-Olivera, P. A. and Nielsen, S. (2012) Online dictionaries for assisting translators of LSP texts: the accounting dictionaries. International Journal of Lexicography 25/2, 191–215.
Gouws, R. H. (2011) Learning, unlearning and innovation in the planning of electronic dictionaries. In: P. A. Fuertes-Olivera and H. Bergenholtz (eds), 17–29.
Hartmann, R. R. K. (2012) [Review of] Pedro A. Fuertes-Olivera and Henning Bergenholtz (eds). e-Lexicography. The Internet, Digital Initiatives and Lexicography. International Journal of Lexicography 25/1, 99–103.
Haß, U. and Schmitz, U. (eds) (2010) Thematic section. Lexicographica 26.
Kwary, D. (2012) Adaptive hypermedia and user-oriented data for online dictionaries: a case study on an English dictionary of finance for Indonesian students. International Journal of Lexicography 25/1, 30–49.
Mićić, P. (2010) The Five Futures Glasses. How to See and Understand More of the Future with the Eltville Model. Basingstoke: Palgrave Macmillan.
Nielsen, S. (2008) The effect of lexicographical information costs on dictionary making and use. Lexikos 18, 170–89.
Nielsen, S. (2009) The evaluation of the outside matter in dictionary reviews. Lexikos 19, 207–24.
Nielsen, S. (2010) Specialised translation dictionaries for learners. In: P. A. Fuertes-Olivera (ed.), 69–82.
Nielsen, S. (2011) Function- and user-related definitions in online dictionaries. In: F. I. Kartashkova (ed.) Ivanovskaya leksikografischeskaya shkola: traditsii i innovatsii [Ivanovo School of Lexicography: Traditions and Innovations]: A Festschrift in Honour of Professor Olga Karpova. Ivanovo: Ivanovo State University, 197–219.
Nielsen, S. and Almind, R. (2011) From data to dictionary. In: P. A. Fuertes-Olivera and H. Bergenholtz (eds), 141–67.
Samaniego Fernández, E. and Pérez Cabello de Alba, B. (2011) Conclusions: ten key issues in e-lexicography for the future. In: P. A. Fuertes-Olivera and H. Bergenholtz (eds), 305–11.
Spohr, D. (2011) A multi-layer architecture for ‘pluri-monofunctional’ dictionaries. In: P. A. Fuertes-Olivera and H. Bergenholtz (eds), 103–20.
Sterkenburg, P. van (ed.) (2003) A Practical Guide to Lexicography. Amsterdam and Philadelphia: John Benjamins.
Sternberg, R. J. (1988) The Psychologist’s Companion: A Guide to Scientific Writing for Students and Researchers. Cambridge: Cambridge University Press and British Psychological Society.
Tarp, S. (2008) The third leg of two-legged lexicography. Hermes 40, 117–31.
Tarp, S. (2009) Beyond lexicography: new visions and challenges in the information age. In: H. Bergenholtz, S. Nielsen and S. Tarp (eds), 17–32.
Tarp, S. (2011) Lexicographical and other e-tools for consultation purposes: towards the individualization of needs satisfaction. In: P. A. Fuertes-Olivera and H. Bergenholtz (eds), 54–70.
Tono, Y. (2009) Pocket electronic dictionaries in Japan: user perspectives. In: H. Bergenholtz, S. Nielsen and S. Tarp (eds), 33–67.
372
6 Resources
Reinhard Hartmann

Chapter Overview
Introduction
Academies
Associations
Corpora/Databases
Journals
Networks
Online Dictionaries
Publishers
University Research Centres
Conclusion
1 Introduction
Is lexicography a practical field (linked to publishing) or a scientific discipline (part of ‘dictionary research’ and ‘reference science’), with complex interconnections to linguistic studies, information technology and other subjects? The answers to such questions will determine its academic status, but one important condition for its future success is the availability of reliable resources. For the purposes of this chapter, resources are defined as ‘compendia of valuable assets for work and study’ – not so much for the lexicographic compilation process, but for research on such topics as dictionary history, dictionary typology, dictionary criticism, dictionary structure, dictionary use and dictionary IT. Having played a part in developing dictionary research over the last three decades and contributed to surveys of the dictionary scene more recently (Hartmann 2010a), I can document some of the major facilities that benefit
scholars specializing in lexicography.1 There are several problems in this area, however. Information tends to be scattered and out of date, so it is difficult to select genuine ‘knowledge’ from the few available sources. The task is complicated by tensions between such extremes as private versus public initiatives, individual versus collective ventures, amateur versus professional activities, subject-based versus interdisciplinary specializations, commercial versus academic bodies, language-based versus encyclopedic reference tools, paper/print versus electronic media, local versus regional efforts and national versus international organizations.
The information presented here is limited to eight types of resource and has been grouped into the following sections: Academies – for establishing details about the national status of languages, Associations – for finding and sharing forums with like-minded specialists, Corpora/Databases – for testing the empirical evidence on linguistic usage, Journals – for consulting professional organs to update knowledge, Networks – for checking what links are available between groups of experts, Online Dictionaries – for exemplifying trends in the digital media, Publishers – for finding and distributing information, and University Research Centres – for advancing academic progress through investigation and training. (A further set of resources – bibliographical references – can be consulted in Chapter 8.)
My choice of such resources is not arbitrary (as may be suggested by their presentation in alphabetic order): an effort has been made to specify some of the thematic and chronological links between them and to explain why they are important for dictionary research. In each case, a table is presented with a list of 10 representative examples. The coverage of the information is unavoidably selective (not only because of the restriction to 5,000 words), and there are many overlaps between the sections, but I trust that the data offered in each section will prove useful to readers.
2 Academies
How do we obtain and document knowledge about national languages? The answer to this question still depends on which kinds of institutions are available, one of which can be a national academy of science. Many such state-funded research institutions are engaged in dictionary and encyclopedia projects and often may include specialized libraries, archives and institutes for the study of language history, dialectology, lexicology, terminology and
onomastics (Hartmann 2012). This aspect of research into dictionary history can be quite revealing, as it overlaps with efforts to develop ‘standards’ for these national languages, although some aspects of the academy approach have been criticized as ‘regulatory’ and ‘puristic’ (Thomas 1991). At the same time, many innovations are introduced, for example, by applying linguistic and computational frameworks such as corpora to the compilation and publication of reference tools. Many academies of this type are exemplified in Wikipedia, and some are described in detail with reference to their websites.
A select list of 8 of them from Europe and 2 from other continents is presented in Table 6.1 in order to illustrate their widely differing structures and some of their achievements in terms of the language institutes they contain, the encyclopedia and dictionary projects they engage in and/or the journals they promote, most with relevance to progress in lexicography (two of them, in Hungary and the Netherlands, have even hosted international congresses of EURALEX). All are cited together with their websites, including those for special language institutes, for example, the Accademia della Crusca in Florence, the Akademie der Wissenschaften in Göttingen and the Académie Française in Paris. Some countries have several competing academies (such as India, where there are three national academies of science).
There can be various networks between academies, for example, through national bodies such as the American National Academies at Washington (www.nationalacademies.org/), the Union of 8 German Academies at Mainz (www.akademienunion.de/), the Institut de France with 5 national academies (www.institut-de-france.fr/) and the Council of Finnish Academies (www.academies.fi/). There are also overarching international bodies such as the Association of Spanish-language based Academies (ASALE http://asale.org/), the All-European Academies (ALLEA www.allea.org/), the InterAcademy Panel (IAP www.interacademies.net/Academies//ByCountry.aspx) and the International Union of Academies (UAI www.uai-iua.org/), some of whose websites give more details on their ongoing reference schemes. Sometimes there are other links, for example, via associations like European Federation of National Institutions for Language (EFNIL), International Certificate Conference (ICC) and Mercator for the study of ‘lesser-used’ or endangered minority languages such as Welsh and Scottish Gaelic in the United Kingdom, Frisian in the Netherlands or Basque in Spain, and occasionally tensions can develop between such academy-like bodies and various research centres at universities in the respective countries, for example, when they offer taught courses (see Sections 6 and 9).
There are no directories as such devoted to academies, but most of them are cited in general guides such as The Europa World of Learning, and some of
them are listed in web portals like the Canadian ‘Scholarly Societies’ Project (www.scholarly-societies.org/) or the German-based University Directory (www.university-directory.eu/Academies-directory.html). Some of the international associations (such as ALLEA mentioned above) offer lists, portals and portraits of their member academies.

Table 6.1 Academies
| Acronym for the Academy (Location) | Full name of the Academy (Founding date) | Achievements, e.g. Language institutes, Dictionary projects, Journals | Website |
| --- | --- | --- | --- |
| AdC (Florence) | Accademia della Crusca (1583) | Italian Lexicography Centre, Studi di Lessicografia Italiana | www.accademiadellacrusca.it/; www.accademiadellacrusca.it/Centro_lessicografia.shtml |
| AdW (Göttingen) | Akademie der Wissenschaften (1751) | German Dictionary by the Grimm Brothers, Goethe Dictionary | http://adw-goe.de/; www.uni-goettingen.de/de/118878.html |
| AF (Paris) | Académie Française (1635) | Dictionary of French | www.academie-francaise.fr/dictionnaire/index.html |
| AAH (Canberra) | Australian Academy of the Humanities (1969) | Language Studies, Language Atlases, Humanities Australia | www.humanities.org.au/ |
| CASS (Beijing) | Chinese Academy of Social Sciences (1977) | Institute of Linguistics, Chinese Dictionary, Zhongguo Yuwen | www.cssn.cn/english.html; www.cssn.cn/news/437181.htm |
| FA (Leeuwarden) | Fryske Akademy (1938) | Frisian Dictionary, Place Names, Databases, Trefwoord | www.fryske-akademy.nl/; www.fryske-akademy.nl/en/undersyk/taalkunde/ |
| MTA (Budapest) | Magyar Tudományos Akadémia (1825) | Linguistic Research Institute, Hungarian Dictionary, Hungarian Corpus | www.mta.hu/; www.nytud.hu/depts/index.html |
| PAN (Warsaw) | Polska Akademia Nauk (1952) | Institute of the Polish Language, Polish Dictionary, Język Polski | http://www.pan.pl/; www.ijp-pan.krakow.pl/ |
| RAE (Madrid) | Real Academia Española (1713) | Spanish Dictionary, Spanish Corpus, Escuela de Lexicografía Hispánica | www.rae.es/; www.rae.es/rae/gestores/gespub000038.nsf/voTodosporId/A70A70FA0B3E49A1C12572D4003E1394?OpenDocument&i=1/ |
| RAN (Moscow) | Rossijskaja Akademija Nauk (1724) | Russian Language Institute, Russian Dictionaries, Russian National Corpus, Voprosy Jazykoznanija | www.ras.ru/; www.ruslang.ru/ |
3 Associations
How do we make progress by getting together in groups? It was the development of learned or scholarly societies that eventually led to the creation of academies discussed in Section 2. National and international associations of a slightly different kind form important links between individuals working in the professional context of lexicography (Hartmann 2013). They not only provide opportunities for researchers to publish their own results (in all branches of dictionary research), but also help them to appreciate alternative approaches from neighbouring fields, which can encourage a wider view of scientific progress.

Table 6.2 Associations
| Acronym for the Association | Full name of the Association | Achievements, e.g. Conferences, Journals | Website |
| --- | --- | --- | --- |
| AELex | Asociación Española de Estudios Lexicográficos | 5 (from 2004), Revista de Lexicografía | www.iula.upf.edu/aelex/ |
| AFRILEX | African Association for Lexicography | 17 (from 1996), Lexikos | www.afrilex.africanlanguages.com/ |
| ASIALEX | Asian Association for Lexicography | 8 (from 1999) | http://asialex.org/ |
| ATL | Association for Terminology and Lexicography | 6 (from 2004) | www.batl.org.uk/ |
| AUSTRALEX | Australasian Association for Lexicography | 16 (from 1990) | www.australex.org/ |
| DSNA | Dictionary Society of North America | 19 (from 1977), Dictionaries | www.dictionarysociety.com/ |
| EURALEX | European Association for Lexicography | 15 (from 1983), International Journal of Lexicography | www.euralex.org/ |
| LEDA | Foreningen af Leksikografer i Danmark | Meetings (from 1989), LEDA-Nyt | http://leksikografer.dk/ |
| NFL | Nordisk Forening for Leksikografi | 12 (from 1991), LexicoNordica | http://www.nordisksprogkoordination.org/nfl/ |
| ZCX (LSC/CLA) | Zhongguo Cishu Xuehui | 10 (from 1993), Cishu Yanjiu | www.guoxue.com/ |

Most notable for lexicography have been the 10 societies listed in Table 6.2, together with their major achievements and websites, such as the Dictionary Society of North America (DSNA) in North America and EURALEX in Europe. Of direct relevance are the continental AFRILEX, ASIALEX and AUSTRALEX, the regional Nordisk Forening for Leksikografi (NFL) for Northern Europe
and the national societies, AELex for Spain, Association for Terminology and Lexicography (ATL) for the United Kingdom, Foreningen af Leksikografer i Danmark (LEDA) for Denmark and Zhongguo Cishu Xuehui (ZCX) for China (plus a few in other countries such as Bulgaria, India, Japan and Korea). Some of these have impressive records in terms of the number of conferences held and the proceedings and journals published.
To be successful, such associations need to have pioneering members (often affiliated with university research centres → Section 9), innovative special-interest groups, informative websites and publications (→ Sections 5 and 8). Of particular benefit to their participants are regular meetings (such as the 15 biennial congresses held by EURALEX between 1983 and 2012, all with published proceedings) or successful conference series such as the six meetings devoted to historical lexicography and lexicology (www.le.ac.uk/ee/jmc21/ishll.html), the eight of the European Language Resources Association (www.lrec-conf.org/), the ones of the French and international Journées des Dictionnaires (http://www3.u-cergy.fr/metadif/jdd.html) or those on corpus linguistics and other networks (see Sections 4 and 6).
Of interdisciplinary interest may be national and international associations representing such fields as applied, computational and corpus linguistics, terminology, dialectology, onomastics, translation, languages for specific purposes and indexing/archiving. For some of these, networks exist which act as overarching ‘federations’, ‘unions’ or ‘councils’ (sometimes providing directories of their national members), such as the Association for Computational Linguistics (www.aclweb.org/), the Association Internationale de Linguistique Appliquée (www.aila.info/), the European Association for Terminology (www.eaft-aet.net/), the Fédération Internationale des Traducteurs (www.fit-ift.org/), the International Council of Onomastic Sciences (www.icosweb.net/) and the International Society for Dialectology and Geolinguistics (www.sidg.org/). Examples of national and global interdisciplinary bodies are the American Council of Learned Societies (www.acls.org/) and the International Council for Science (www.icsu.org/).
There are very few directories and web portals that list lexicography-based associations. Some more associations, particularly on linguistics and the study of English and other languages and language families, are cited in Hartmann (2010b); for another list of associations of relevance to lexicography → www.pangaealex.org/; for a list of authors → www.writers.org.uk/society/; for a select list of conferences → http://linguistlist.org/callconf/index.cfm; for a general guide to academic bodies around the world → The Europa International Foundation Directory.
4 Corpora/Databases
Where and how do we obtain empirical evidence on linguistic usage? Dictionary IT has been a field of rapid growth as one of the important branches of meta-lexicography. Corpora and databases overlap in the sense that most collections of spoken or written language material are electronically processed and the relevant information is extracted from them by specially designed technologies, some of which have made significant contributions to recent dictionary projects, supported by new associations and conference series such as the International Computer Archive of Modern and Medieval English at Bergen (http://icame.uib.no/), the European Language Resources Association (www.elra.info/), the International Corpus Linguistics Conferences (http://cl2011.org.uk/archives.html), the Asociación Española de Lingüística de Corpus (www.cilc2012.es/) and the Asia-Pacific Corpus Linguistics Conferences (http://corpling.com/conf/).
The corpora and databases listed in Table 6.3 have been selected to represent their major types for major languages. They range from large to small and can cover a wide range of text genres (one or more languages, spoken or written language, general or LSP) and can concentrate on grammatical and semantic features (such as parts of speech or sense groups). Occasionally tensions arise between optimistic expectations and critical doubts, between spoken and written language data, between general language corpora and those of mixed or specialized text genres, between lexicography-oriented corpora and LSP/terminology databases, between monolingual and bilingual/translation corpora, and even between lexical databases and online dictionaries, as demonstrated by Fuertes-Olivera and Bergenholtz (2011).
More references to corpora and databases can be found on websites of academies (→ Section 2), associations (→ Section 3), networks (→ Section 6), online dictionaries (→ Section 7), publishers (→ Section 8) and university research centres (→ Section 9). It is impossible to list all important corpus and database projects in view of their variable coverage and ever-changing nature, but a range of them can be located in the Encyclopedia of Applied Linguistics (2012) and at the websites of some of the schemes cited in Table 6.3 or through interfaces, directories and gateways like the following, some with brief descriptions of their contents. CNRTL (Paris) www.cnrtl.fr/corpus/, CoRD (Helsinki) www.helsinki.fi/varieng/CoRD/corpora/index.html, CorpusEye (Odense) http://beta.visl.sdu.dk/visl/corpus.html, ETB (Riga) www.eurotermbank.com/collection_list.aspx, FrameNet (Berkeley CA) http://framenet.icsi.berkeley.edu/fndrupal/framenet_data,
LDC (Philadelphia) www.ldc.upenn.edu/Catalog/, LL (Ypsilanti MI) http://linguistlist.org/sp/GetWRListings.cfm?WRAbbrev=Texts#wr173, NaCTeM (Manchester) www.nactem.ac.uk/resources.php, Opus (Uppsala) http://opus.lingfil.uu.se/, OTA (Oxford) http://ota.ahds.ac.uk/catalogue/index.html, SketchEngine (Brighton) www.sketchengine.co.uk/, Valency Patternbank (Erlangen) www.patternbank.uni-erlangen.de/, WordNet Database (Princeton) http://wordnet.princeton.edu/ and Wortschatz (Leipzig) http://wortschatz.uni-leipzig.de/.

Table 6.3 Corpora/Databases
| Acronym for the Corpus/Database | Full name of the Corpus/Database (Location) | Special features, e.g. Language, Text genre (size) | Website |
| --- | --- | --- | --- |
| BLF | Base Lexicale du Français (KU Leuven) | French, newspaper texts, 50 m. words | http://ilt.kuleuven.be/blf/ |
| BNC | British National Corpus (U Oxford) | English, mixed genres, 100 m. words | www.natcorp.ox.ac.uk/ |
| BoE | Bank of English COBUILD Corpus (Birmingham U/Collins) | English, mixed genres, 450 m. words | www.titania.bham.ac.uk/ |
| ČNK | Český Národní Korpus (Charles U Prague) | Czech, mixed genres, 3 b. words, including InterCorp parallel corpus | www.korpus.cz/english/index.php |
| COCA | Corpus of Contemporary American English (Brigham Young U) | English, mixed genres, 425 m. words | http://corpus.byu.edu/coca/ |
| DANTE | Database of Analysed Texts of English (Lex.MC/SkE Brighton) | English, mixed genres, 1.7 b. words | www.webdante.com/ |
| EUROPARL | European Parliament Proceedings (Strasbourg) | Parallel corpus of political debates, 30+ m. words each for 11+ languages | www.statmt.org/europarl/ |
| EUSKARA | UZEI Terminology and Lexicography Centre (Donostia) | Basque, mixed genres, 4.6 m. words | www.euskaracorpusa.net/ |
| IATE | Inter-Active Terminology Database for Europe (Luxembourg) | Multilingual translation, 8.4 m. terms | http://iate.europa.eu/ |
| ICE | International Corpus of English (UC London) | Comparable English corpora from different parts of the world (each 1 m. words) | www.ucl.ac.uk/english-usage/projects/ice-gb/index.htm |
5 Journals
How do we keep track of new developments through the medium of periodicals? Many of the academies and associations cited in Sections 2 and 3 above sponsor journals, which I have surveyed (Hartmann 2009). All branches of dictionary research can benefit from such serial publications, for example, dictionary history (from IJL), dictionary typology (from Lexicographica), dictionary criticism (from Reference Reviews), dictionary structure (from Dictionaries) and dictionary IT (from Language Resources and Evaluation Journal). The list in Table 6.4 contains 10 journals of special relevance to lexicography, and an increasing number of them are becoming available in online editions.
There are quite a few journals available for neighbouring disciplines such as applied linguistics, computational and corpus linguistics, languages for specific purposes, semiotics, dialectology, onomastics, terminology, library science, indexing and translation. Some of these can be pursued via the websites of some of the bodies mentioned in other sections (e.g. 2 on academies, 3 on associations, 4 on corpora, and 8 on publishers). More can be found in directories such as the EBSCO Journals (www.ebscohost.com/title-lists/), the JSTOR Scholarly Journal Archive (www.jstor.org/), the MLA Bibliography (www.mla.org/bib_periodicals/) or the journal LLBA (www.csa.com/factsheets/llba-set-c.php).

Table 6.4 Journals
| Title | Since | Website |
| --- | --- | --- |
| Cahiers de Lexicologie | 1959 | http://atilf.atilf.fr/jykervei/cahlex.htm |
| Cishu Yanjiu [Lexicographical Studies] | 1979 | www.cishu.com.cn/ |
| Dictionaries. Journal of the DSNA | 1979 | http://muse.jhu.edu/journals/dictionaries/ |
| International Journal of Lexicography | 1988 | www.ijl.oxfordjournals.org/ |
| Language Resources and Evaluation Journal [formerly Computers and the Humanities] | (1966) 2005 | www.elra.info/Language-Resources-andEvaluation.html |
| Lexicographica. International Annual for Lexicography | 1985 | www.degruyter.com/view/serial/35484 |
| LexicoNordica | 1994 | http://nordisksprogkoordination.org/nfl/publikationer/lexiconordica |
| Lexikos | 1991 | www.wat.co.za/EngelseWebwerf/Publications/lexikosEng.htm |
| Lexique. Revue française de lexicologie et de linguistique | 1982 | www.septentrion.com/en/revues/lexique/ |
| Reference Reviews | 1987 | www.emeraldinsight.com/journals.htm?issn=0950-4125 |
6 Networks
How do we support each other’s work? We have already noticed the development of ‘networks’ in the above sections on academies, associations, corpora and journals. This is not a straightforward term, as it has computational and social connotations, both of which are important for all branches of dictionary research (cf. Hartmann 2011a, 2011b). Networks can cover a wide range of contacts within and between all types of institutions (from workshops and companies to schools, colleges, universities and other public bodies) within and between all types of special disciplines (from arts and humanities to linguistics and computing) at all hierarchical levels (from local and regional to national, international and global), and may be given many different titles, from ‘committee’ and ‘institute’ to ‘group’, ‘forum’ and ‘union’.
Table 6.5 concentrates on networks related to lexicography and terminology for English, German and a number of other European languages. For English, numerous informal networks exist in countries around the world, such as the Iwasaki Linguistic Circle in Japan whose journal Lexicon sponsors critical reviews of English-language–based dictionaries. For Germanic languages → the FGLS based at the University of Bristol (www.bris.ac.uk/german/fgls/), for Nordic languages → the Expert Group Nordic Language Council ENS at Copenhagen (www.norden.org/en/), for Romance languages from French to Valencian → the Organisation Internationale de la Francophonie (www.francophonie.org/), the Union Latine (www.unilat.org/), the Study Network on Minority Romance Languages (www.romaniaminor.net/) and the Inter-University Institute for the Valencian Community (www.iulma.es/). For minority languages in Europe (such as Celtic) → the Irish Foras na Gaeilge (www.forasnagaeilge.ie/), the Forum for Research on Languages of Scotland and Ulster (www.abdn.ac.uk/frlsu/), the International Celtic Congress (www.ccheilteach.ie/) and the European Research Centre on Multilingualism and Language Learning (www.mercator-research.org/).
There are no directories of networks as such, but some are cited on the websites of bodies such as FTT at Las Palmas (www.webs.ulpgc.es/terminol/), the Google Scholars specializing in lexicography (http://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:lexicography), ISO/TC37 (www.iso.org/iso/technical_committee_contact.html?commid=48116) and affiliated organizations (www.iso.org/iso/about/organizations_in_liaison.htm), LexicographyList (http://tech.groups.yahoo.com/group/lexicographylist/) and LinkedIn (www.linkedin.com/myGroups?trk=hb_side_mygrps).
Table 6.5 Networks
| Acronym for the Network | Full name of the Network (Location) | Achievements, e.g. Conferences, Journals | Website |
| --- | --- | --- | --- |
| EFNIL | European Federation of National Institutions for Language (The Hague) | Annual Conferences, Projects | www.efnil.org/ |
| ELC | European Language Council (Berlin) | Thematic Network Projects, European Language Portfolio | www.celelc.org/ |
| IL | Wissenschaftliches Netzwerk Internetlexikographie (Mannheim) | (Workshops of) Academic Network of Internet Lexicography | http://multimedia.ids-mannheim.de/mediawiki/web/index.php/WebHome |
| INFO-TERM | International Information Centre for Terminology (Vienna) | Meetings, Newsletter | www.infoterm.info/ |
| IRMM | Institute for Reference Materials and Measurements (EC Joint Research Centre, Geel) | Workshops, Conferences (Catalogue of Reference Materials) | http://irmm.jrc.ec.europa.eu/ |
| LaBLex | Laboratoire de Lexicographie Bilingue (Bari) | Journées, Bilingual Dictionary Projects | www.lablex.uniba.it/ |
| LTT | Réseau de Lexicologie, Terminologie, Traduction (Bruxelles) | Meetings, Lists of Publications, Lettre d’information LTT | www.ltt.auf.org/index.php |
| NordTerm | Network of Nordic Terminology Organisations (Helsinki et al.) | Assemblies, NordTermNet, Term banks | www.nordterm.net/info/main-en.html |
| RaDT | Rat für deutschsprachige Terminologie (Bern) | German-Language Terminology Links | www.radt.org/ |
| WBN | Wörterbuch-Netz (Trier) | Electronic Documentation of Historical and Dialect Dictionaries | http://woerterbuchnetz.de/ |
7 Online Dictionaries
What is the impact of the digital media? Reference works come in all sizes, formats and functions, which is one reason why we have already met some aspects of e-lexicography (see Section 4). Dictionary IT is thus an important field which should help us distinguish the various trends in which online schemes are developing, from adaptations and aggregations of existing dictionaries to completely new interfaces and sometimes collaborative projects involving different publishers (see Section 8) and university research centres (see Section 9).
Table 6.6 Online dictionaries
| Acronym for the Dictionary | Full title of the Dictionary | Special features (Contents) | Website |
| --- | --- | --- | --- |
| ANW | Algemeen Nederlands Woordenboek | Dutch words, meanings et al. | http://anw.inl.nl/ |
| COBUILD | Collins Birmingham University International Language Database, English Language Dictionary | English corpus-based information, translation equivalents | www.mycobuild.com/; http://dictionary.reverso.net/ |
| ELDIT | Elektronisches Lernerwörterbuch Deutsch-Italienisch | Bilingual learner’s dictionary | http://dev.eurac.edu:8081/MakeEldit1/ |
| LDoCE | Longman Dictionary of Contemporary English Online | Word finder | www.ldoceonline.com/ |
| Macmillan | Macmillan Dictionary | British and American corpus-based | www.macmillandictionary.com/ |
| ORDNET | Den Danske Ordbog, Ordbog over det Danske Sprog | Danish general and historical dictionaries | www.ordnet.dk/ |
| Oxford | Oxford Dictionaries Online | World English dictionary and thesaurus | http://oxforddictionaries.com/ |
| PBWB | Pons Bildwörterbuch | Pictorial German dictionary | www.bildwoerterbuch.com/en/home/ |
| PDEV | Pattern Dictionary of English Verbs | Verb phrase collocations | http://deb.fi.muni.cz/pdev/ |
| TLF | Le Trésor de la Langue Française informatisé | French historical dictionary | http://atilf.atilf.fr/ |
Table 6.6 lists 10 electronically produced ‘open’ dictionaries for English and a range of European languages. The emphasis is on features which make them special, such as the range of languages and subjects covered, their databases and their various presentation modes. In addition to these, there are the so-called aggregators (or internet sites which combine the reference material from several different sources), such as DictionaryBoss http://dictionaryboss.com/, DictionaryCom http://dictionary.reference.com/, The Free Dictionary www.thefreedictionary.com/, ThesaurusCom http://thesaurus.com/?regHome=true and VisualThesaurus www.visualthesaurus.com/.
There are also several internet ‘portals’ offering directories of dictionaries and other reference materials, such as CL Research www.clres.com/, ElexicoCom www.elexico.com/, GlossaristCom www.glossarist.com/, LinguistList http://linguistlist.org/sp/GetWRListings.cfm?WRAbbrev=Dict, Online Bibliography of Electronic Dictionaries www.owid.de/bibl/obelex/, RefdeskCom www.refdesk.com/ and Yahoo Directory http://dir.yahoo.com/reference/dictionaries/.
8 Publishers
How do we present our information through commercial channels? Publishing bodies are needed for the transmission of research results and the acknowledgement of authorship. There have been occasional tensions between (open) research output and (commercial) media, and the recent economic recession has led to a reduction of dictionary projects and their staff and a greater willingness of publishers to collaborate and even to join forces. Table 6.7 lists a selection of bodies known for the publication of dictionaries and other reference materials as well as journals and book series of relevance to lexicography, concentrating on the English-speaking world and Europe.
Among the networks linking publishers of dictionaries and other reference materials are associations such as the Publishers Association (London) www.publishers.org.uk/, the Association of American Publishers (Washington, DC) www.publishers.org/, the Federation of European Publishers (Brussels) www.fep-fee.eu/, the European Association of Search and Database Publishing (Brussels) www.eadp.org/, the International Publishers Association (Geneva) www.internationalpublishers.org/ and Publishers Global www.publishersglobal.com/. More lists of publishers can be found in the websites of the associations mentioned above and in directories such as the Directory of Publishing and The Writer’s Handbook.
Table 6.7 Publishers
| Name of the Publisher (Location) | Special Features, e.g. Journals, Book Series, Dictionaries | Website |
| --- | --- | --- |
| J. Benjamins (Amsterdam) | Babel, International Journal of Corpus Linguistics, Terminology, Language International World Directory (BS), Studies in Corpus Linguistics (BS), Studies in the History of the Language Sciences (BS), Terminology and Lexicography Research and Practice (BS) | www.benjamins.com/ |
| Bibliographisches Institut (Mannheim) | (Meyers) Encyclopedias, (Duden) Dictionaries, Atlases | www.bi-media.de/; www.duden.de/ |
| De Gruyter (Berlin) | Corpus Linguistics and Linguistic Theory, Dialectologia et Geolinguistica, Folia Linguistica, Lebende Sprachen, Lexicographica [International Annual], Semiotica, WSK Online – Handbooks of Linguistics and Communication Science (BS), Lexicographica Series Maior (BS) | www.degruyter.com/ |
| Elsevier (Amsterdam – London – New York) | English for Specific Purposes, Journal of English for Academic Purposes, Language and Communication, Language Sciences, Lingua, System, (Medical) Dictionaries, Technical Handbooks | www.elsevier.com/ |
| HarperCollins (New York – London) | ELT News, ELT Reader (BS), (Collins) Dictionaries, Wordbank | www.harpercollins.com/; www.harpercollins.co.uk/ |
| Larousse (Paris) | Langages, Langue Française, Encyclopedias, (Larousse & Didier) Dictionaries | www.editions-larousse.fr/ |
| Macmillan (London – New York) | Journal of Information Technology, Latino Studies, (Macmillan, Palgrave & Encarta) Dictionaries | http://international.macmillan.com/; http://us.macmillan.com/ |
| Oxford U.P. (Oxford) | Applied Linguistics, English Language Teaching Journal, International Journal of Lexicography, Journal of Semantics, Dictionaries | www.oup.co.uk/ |
| Random House (New York – London) | Living Language (BS), (Webster) Dictionaries | www.randomhouse.com/; www.randomhouse.co.uk/ |
| Wiley (Malden MA – Chichester) | The Modern Language Journal, International Journal of Applied Linguistics, Encyclopedias (BS), Companions (BS) | www.wiley.com/ |
9 University Research Centres
How do we advance knowledge through research and training? We have already seen in earlier sections how several kinds of dictionary work can be pursued in connection with academies, corpus projects, interdisciplinary networks, online dictionary projects or publishers. The recent development of lexicographic units at universities, covering one or more branches of dictionary research, is worthy of recognition, but their numbers are still very small (6 out of 117 universities in the United Kingdom, 7 out of 83 in Germany, 6 out of 79 in France and 5 out of 62 in Spain). One representative selection of such centres has been published in a list of portals on the EURALEX website (Hartmann 2010b). All of these centres face potential tensions between humanities and science, theory and practice, education and commerce, or public and private initiatives. Limited funding has occasionally led to a decline in numbers of staff, projects and postgraduate courses, and sometimes to the complete closure of such units. All of this can affect even the 10 pioneering units exemplified in Table 6.8, which concentrates on the situation in Europe, with special reference to such factors as dictionary projects (which often involve links with academies, archives, libraries and publishers), MA and PhD programmes, publications, conferences and the availability of corpus and other computer technologies. Occasionally, these centres can interlink with other disciplines, departments or institutions, for example, when the Applied Linguistics Institute at Pompeu Fabra University in Barcelona provides networks for terminology and other fields, or when the Interdisciplinary Centre for Research on Lexicography, Valency and Collocation (ICRLVC) at Erlangen acts as a forum for lexicography and collocational research in different parts of the university and MA courses elsewhere, or when the Laboratoire Lexiques, Dictionnaires et Informatique (LDI) allows professional connections to be maintained between Cergy and other universities in the Paris region, or when joint online Language for Special Purposes (LSP) dictionary projects bring together staff at the Aarhus Business School in Denmark and the Spanish University of Valladolid.
There is a range of university research units which have become well known for their special initiatives, such as lexicography summer schools (at Ivanovo in Russia), the study of minority languages (at Cambridge in England or at Ghent in Belgium), the promotion of neighbouring disciplines such as Natural Language Processing (NLP) and corpus linguistics (at Stuttgart in Germany or at Lorient-Bretagne-Sud in France), terminology (at Lyon 2-Lumière in France or at Pecs-Karoli-Gaspar in Hungary), and translation (at Tampere in Finland or at Bologna in Italy). Some are managing to collaborate with research units in independent institutes or in academies (e.g. the Magnusson Language Institute at Reykjavik in Iceland). There are no general directories available for academic lexicography centres, but some specific websites can provide selected information, for example, on the International Association of Universities at Paris (www.unesco.org/iau), on lists of universities and academies (www.university-directory.eu/),
Table 6.8 University Research Centres
Acronym for Centre (Location) | Full Name of the Centre (University) | Achievements e.g. Special projects, Publications, Courses, Meetings | Website
CENTLEX (Aarhus) | Center for Lexikografi, Aarhus Universitet | Danish LSP online dictionaries, Hermes, MA and PhD, Conferences | http://bcom.au.dk/research/academicareas/centreforlexicography/
DRC (Birmingham) | Dictionary Research Centre, University of Birmingham | Dictionary history, Corpus-based lexicography, PhD, Conferences | www.birmingham.ac.uk/schools/edacs/departments/english/research/projects/drc.aspx
FGL (Oslo) | Forskargruppe for Leksikografi, Universitetet i Oslo | Norwegian dictionaries, Dialect and text corpora, MA and PhD, Conferences of NFL 1991 and 2013, Congress of EURALEX 2012 | www.hf.uio.no/iln/forskning/grupper/leksikografi/index.html
GdL (La Coruña) | Grupo de Lexicografía, Universidade da Coruña | Spanish dictionary, Revista de Lexicografía, MA and PhD, Thematic bibliography, International conference 2004 | www.udc.es/grupos/lexicografia/
ICLVCR (Erlangen) | Interdisciplinary Centre for Lexicography, Valency and Collocation Research, Friedrich-Alexander-Universität | Valency and collocation research, MA (EMLex) and PhD, Conferences and workshops | www.lexi.uni-erlangen.de/en/
INFOLEX (Barcelona) | Lexicography Research Group, Universitat Pompeu Fabra | Spanish Learner’s Dictionary, Thematic Networks for Terminology and Lexicography, Corpora, Termbank, MA and PhD, Congress of EURALEX 2008, Corpus Seminar 2010 | www.iula.upf.edu/infolex/ipresca.htm
INL (Leiden) | Instituut voor Nederlandse Lexicologie, Rijks-Universiteit Leiden | Historical and contemporary dictionaries, Corpus development | www.inl.nl/
LDI (Paris) | Laboratoire Lexiques, Dictionnaires et Informatique, Université Paris 13 Nord | French lexicography, MA and PhD, Journées | http://www-ldi.univ-paris13.fr/
LI (Göteborg) | Lexikaliska Institutet, Göteborgs Universitet | Swedish and multilingual dictionaries, MA and PhD, Conference of NFL 1999 | www.svenska.gu.se/forskning/li/ www.islex.se/
ZLL (Poznań) | Zakład Leksykografii i Leksykologii, Uniwersytet im. Adama Mickiewicza | Bilingual dictionaries, MA and PhD | http://ifa.amu.edu.pl/fa/Department_of_Lexicology_and_Lexicography
on the academic ranking of world universities (www.arwu.org/), on research funding in the United Kingdom (www.rcuk.ac.uk/), on doctoral dissertations (LinguistList http://linguistlist.org/search/search-all-res1.cfm), or on staff based at British universities (www.academia.edu/). For general guides to academic bodies around the world → The Europa International Foundation Directory and The Europa World of Learning.
10 Conclusion
This chapter has examined eight assets of potential value to dictionary research:
• Academies,
• Associations,
• Corpora and databases,
• Journals,
• Networks,
• Online dictionaries,
• Publishers and
• University research centres.
Not all topics could be fully covered, but an effort has been made to show correlations and overlaps between the main resources, for example, lexicographic work that is being carried out at academies, publishers and universities and existing networks that have been developed between them. A number of issues may have been simplified or overlooked, for example, when selected examples were presented in the tables, and it was not possible to describe the current situation in all countries, languages and disciplinary specialisations (e.g. NLP, terminology, translation and onomastics), but the emphasis on websites has hopefully improved empirical evidence and increased our knowledge of the current facts.
Note 1. I wish to acknowledge the help I have received from the editor and a number of co-authors (especially Paul Bogaards and Robert Lew) and other scholars elsewhere, such as Félix Córdoba Rodríguez, Gilles-Maurice de Schryver, Dmitrij Dobrovolskij, Anna Hannesdóttir, Thomas Herbst, Iztok Kosem, Sabina Pavlova, Jean Pruvost, Serge Verlinde and Geoffrey Williams.
References
Dictionaries, Directories and Other Reference Works (for online dictionaries → Section 7)
Directory of Publishing 2012 (U.K. and Republic of Ireland), 37th edition (2011) London: Continuum & Publishers Association.
The Encyclopedia of Applied Linguistics (2012) 10 volumes/online. In: Carol A. Chapelle (ed.). Malden, MA and Chichester, UK: Wiley & Blackwell.
EURALEX Bibliography. Available at: http://euralex.pbworks.com/w/page/7230036/FrontPage
The Europa International Foundation Directory, 21st edition (2012) London: Routledge and Taylor and Francis Group.
The Europa World of Learning, 62nd edition (2011). London and New York: Routledge and Taylor and Francis Group.
The Writer’s Handbook 2011 (2010) Barry Turner (ed.). London: Palgrave-Macmillan.
Other References
Fuertes-Olivera, P. and Bergenholtz, H. (eds) (2011) e-Lexicography. The Internet, Digital Initiatives and Lexicography. London: Continuum.
Hartmann, R. (2009) Keeping in touch: a survey of lexicography periodicals. Lexikos 19, 404–22.
Hartmann, R. (2010a) Has lexicography arrived as an academic discipline? Reviewing progress in dictionary research during the last three decades. In: H. Lönnroth and K. Nikula (eds) Nordiska Studier i Lexikografi 10. Rapport från Konferensen om Lexikografi i Norden, Tammerfors 2009. Tammerfors: University of Tampere and Oslo: Språkrådet, 11–35.
Hartmann, R. (2010b – revised 2011) Reference portals to internet sources relevant to lexicography and terminology, EURALEX Website. Available at: http://euralex.pbworks.com/f/Reference+Portals+aug+2010.pdf.
Hartmann, R. (2011a) International and interdisciplinary networking for the benefit of reference science. In: F. I. Kartashkova (ed.) Ivanovskaya Leksikograficheskaya Shkola: Traditsij i Innovatsij/Ivanovo School of Lexicography: Traditions and Innovations. A Festschrift in Honour of Professor Olga Karpova. Ivanovo: Ivanovo State University, 158–79.
Hartmann, R. (2011b) Linking up. The role of networking in disciplinary contacts within and around lexicography, with special reference to four European countries. Dictionaries 32, 33–65.
Hartmann, R. (2012) The contribution of European academies to dictionary-making, lexicography and reference science. In: K. P. Márkus, T. Pintér and D. Pődör (eds) Szavak Pásztora. Írások Magay Tamás tiszteletére (Pastor of Words. A Festschrift in Honour of Tamás Magay). Szeged: Grimm Publishing House, 310–33.
Hartmann, R. (2013) Lexicographic associations. Article 39. In: R. Gouws et al. (eds) Dictionaries. An International Encyclopedia of Lexicography. Supplementary Volume 5.4 Recent Developments with Special Focus on Computational Lexicography. Berlin: De Gruyter Mouton, 613–19.
Thomas, G. (1991) Linguistic Purism. London/New York: Longman.
7
Glossary of Lexicographic Terms Barbara Ann Kipfer
Abridged dictionary: a condensed or derived work from an unabridged or much larger dictionary and usually including the most essential vocabulary while excluding the rare and archaic or omitting information such as etymologies or examples; also called abridgment or abridgement (e.g. Shorter Oxford English Dictionary).
Access structure: the design of a reference work with parts that allow users to search for particular types of information such as alphabetical order of the headword list or an index for a conceptual thesaurus.
Analogical dictionary: a work containing information such as collocations, synonyms, confusable words, etc. (e.g. Macmillan Collocations Dictionary).
Analytic (or analytical) definition: the classical formula to explain the meaning of a word or phrase using the genus (generic term) and differentia (distinguishing feature or features) formula; also called logical definition (e.g. triangle – a plane figure (genus) that has three straight bounding sides (differentia)).
Antedating: citation selection from the earliest possible sources; evidence of the occurrence of a word or phrase given in an historical dictionary.
Antonym: a word or phrase with the opposite meaning of another, for example, alive/dead, clean/dirty.
Article: 1 a paragraph describing a headword in a dictionary; also called entry; 2 a piece describing a vocabulary feature, issue, or process of dictionary-making that is included in the front matter of a dictionary.
Back-formation: a word-formation process in which an element, usually an affix, is removed from a word to create another, for example, burgle from burglar.
Base word: the form of a word that heads a dictionary entry, including the simplest (canonical) terms; the most grammatically simple form of a word; also called basic form, canonical form, entry form, head form (e.g. arm).
Bilingual dictionary: a reference work providing equivalent words/phrases in two languages; a dictionary with translation equivalents of two languages (e.g. Collins Robert French-English English-French Dictionary).
Bilingualized dictionary: a monolingual dictionary that has been translated into another language, usually a learner’s dictionary (e.g. Password series by Kernerman).
Bogey: a word entered in a dictionary through some error, as misunderstanding or misreading a manuscript, or by design as a test for plagiarism; also called ghost word, for example, dord.
Borrowing: a word-formation process in which a word or phrase is transferred from one language to another; a word or phrase from one language taken into another language and naturalized; also called loanword (e.g. goulash, into English from Hungarian ‘gulyás-hús’); see also calque.
Calque (loan translation): an expression (compound, derivative, or phrase) introduced into one language by translating its constituent parts from another language; for example, superman (English) from German Übermensch.
Canonical form: the most grammatically simple form of a word; also called base word, basic form, entry form, head form (e.g. arm).
Catchword: a lexical unit which is included in the wordlist defined in a dictionary; a guideword.
Citation: a word or phrase with enough context to understand the meaning, taken as a unit that is recorded or excerpted from written and spoken sources. Citations are a source of lexicographical data that are collected, sorted, and analyzed for writing definitions and are used as verbal illustrations or examples in dictionary entries.
Classificatory label: a taxonomic name or class of a sense or entry, for example, the order and genus for an animal; also called taxonomic name, scientific classification (e.g. Felis catus).
Closed corpus: a collection of text that is limited by the number of sources available, as for a dead language or dialect; a collection of text claiming to contain all or nearly all data from a particular field, for example, the Old English Corpus.
Cognate: a pair or group of words that have the same root or languages that are genetically related; a member of a pair or group of words and phrases that have an intralingual or interlingual genetic relationship (English apple and German Apfel are cognate words; English and Flemish are cognate languages).
Collect: to excerpt citations for the wordlist of a dictionary; to gather source material for compiling a reference work.
College dictionary (collegiate dictionary, desk dictionary): an intermediate-size, single-volume dictionary intended for use by students or at an office desk and containing information similar to an abridged general dictionary, for example, Merriam-Webster’s Collegiate Dictionary.
Collocation: a combination of words (adjective-noun, verb-preposition) that have a certain mutual expectancy, that have a great likelihood of co-occurring, for example, false expectation, hot coffee, nice surprise. Collocations vary in the degree to which one lexical unit expects another to occur with it. Collocations are more fixed than free combinations and less fixed than idioms.
Compound word: two or more lexical units (simple words) that form a new lexical unit (a new word with a single meaning), for example, dry + clean = dryclean, time + keeper = timekeeper.
Concordance: a systematic list of every occurrence of every lexical unit in a specific text or texts, arranged with the lexemes in the centre with preceding and following context; also called keyword-in-context concordance. A concordance provides details about grammar, usage, compounding, lemmatization, collocation, and context.
Connotation: associations and characteristics connected with a lexical unit or sense beyond the linguistic explanation or denotation, for example, that caviar is a symbol of luxury.
Context: a phrase, sentence, or paragraph surrounding a lexical unit that depicts its meaning or sense; also called lexicographic context, minimal context, situational context, context of use. Taken from either written or spoken sources, context shows the characteristic features of a lexical unit and the setting or circumstances with which a word or phrase is associated.
Corpus (plural, corpora): the written and spoken sources used as the basis for a reference project, a systematic collection of texts which documents a language (e.g. Corpus of Contemporary American English).
Cross-reference (or cross reference): the listing of another lexical unit for an entry or a sense that is considered either synonymous with or related to that entry or sense; a word or symbol in a reference work that indicates related information.
Definiendum: a lexical unit (word or phrase) that is defined in a reference work.
Definiens: the explanation of the meaning of a lexical unit (word or phrase); also called definition.
Defining vocabulary (limited defining vocabulary): a restricted set of words used to define all other terms in a dictionary; the controlled use of vocabulary in definitions, restricting the descriptions to using the most frequent words of a vocabulary to describe other words. This is a practice of learner’s dictionaries, for example, Longman Dictionary of Contemporary English.
Definition: the explanation of the meaning of a lexical unit (word or phrase); also called definiens, gloss. The definition offers semantic information and is the prominent feature of a dictionary.
Denotation: an aspect of meaning that relates a word or phrase to the thing it expresses, i.e. what a sense of a lexical unit actually refers to; also called referential meaning, cognitive meaning, reference. This is the usual topic of the definition while the more subjective or emotive aspects (connotation) are not described.
Denotatum: an actual existing object referred to by a lexical unit (word or phrase); the meaning distilled from a referent by perception. It is contrasted with designatum.
Density: the degree of vocabulary coverage in a reference work.
Derivative: a word that is created by the addition of an affix to a base or stem, for example, lexical, lexically, lexicalize, lexicalization. Derivatives are not always given headword status but may be run-on entries or sub-entries under the headword from which they were derived.
Descriptive: an approach to describing the meanings and uses of the language based on observed facts rather than on attitudes as to how it should be used (prescriptive).
Designatum: an object that is referred to by a lexical unit (word or phrase), whether it actually exists or not; the aspect of meaning identified for expression by a word or phrase. It is contrasted with denotatum.
Diacritic (or diacritic mark, diacritical mark): a sign placed above or below a character or letter to indicate that it has a different sound or phonetic value or that a syllable has a certain type of stress or tone (e.g. piñata, jalapeño).
Dialect label: a label indicating a specific geographic area where a sense or entry is used, for example, New England, British (also called regional label), or a particular group of speakers, for example, kids’ slang.
Dictionary: a reference work that describes chosen lexical units (words, phrases) of a language or subject; also called lexicon (2).
Differentia (plural differentiae): the specifying term or terms in an analytic, classical or logical definition, the term which qualifies or characterizes the genus, one or more of the characteristic features which distinguish the word explained from the generic term (genus) of which it is considered a specific instance; also called differentia specifica (e.g. autumn/fall: the third season of the year, when crops and fruits are gathered and leaves fall).
Diminutive: a word formed from another by the addition of a suffix expressing smallness in size, for example, a booklet is a small book, an eaglet is a small eagle.
Direct entry: the listing of a multiword expression under its first constituent, for example, autumnal equinox listed in letter A.
Direct sense: the primary or significant sense of a lexical unit.
Electronic dictionary: a reference work that is compiled with the use of computers and is presented in computerized form, including online dictionaries, dictionaries on media such as DVD, spelling checkers and thesauri in word processors, and dictionary databases; any machine-readable version of a dictionary.
E-lexicography (electronic lexicography, computational lexicography): the processes involved in the compilation, design and implementation of electronic dictionaries and other word-based reference works.
Encyclopedic definition: an explanation of a word or phrase that is comprehensive and includes information that is not strictly linguistic, a definition that reflects encyclopedic knowledge or facts; for example, the definition of elephant in Collins English Dictionary: Either of the two proboscidean mammals of the family Elephantidae. The African elephant (Loxodonta africana) is the larger species, with large flapping ears and a less humped back than the Indian elephant (Elephas maximus), of S and SE Asia.
Entry: a paragraph describing a lexical unit (word or phrase), the basic unit in a reference work. An entry may describe the base word and all of its parts of speech (as sub-entries), or the parts of speech may be depicted as separate entries.
Entry block: a paragraph describing a lexical unit or part of speech of a lexical unit.
Entry form: the most grammatically simple form of a word; also called basic form, canonical form, head form, base word.
Entry line: the parts of an entry which precede the definition(s), usually the headword, pronunciation and syllabication, part of speech, and label(s); the initial line of a reference-work entry indicated by indentation or typography (e.g. boldface).
Entry word: a lexical unit (word or phrase) which is included in the wordlist defined in a dictionary. Canonical forms are almost always included. Also called entry, entry head, headword, keyword, lemma, main entry, word entry.
Equivalent: 1 a lexical unit (word or phrase) used as a synonymous definition for another entry word, also called defining equivalent, synonymous equivalent (e.g. gimcrack – a showy object of little use or value: GEWGAW); 2 the translation (in the target language) of a lexical unit (in the source language) of a bilingual dictionary, also called translation equivalent (e.g. dog = le/un chien).
Etymological dictionary: a reference work describing the histories and origins of the entry words, tracing back to the earliest form (etymon) and meaning of words and phrases.
Etymology: the history and origin of a lexical unit (word or phrase). In print dictionaries, the etymologies are usually given in abbreviated form, for example, for cabbage from Concise Oxford Dictionary: ME: from OFr. dial. caboche ‘head’, var. of OFr. caboce.
Etymon: the form from which a word is derived, for example, the etymon of glossary is Latin glossa, from Greek glossa ‘tongue, language’.
Example: a phrase or sentence excerpted from a written or spoken source or written by a lexicographer that contains a form of the entry word and which indicates additional information about its grammatical or semantic characteristics, such as its context and usage; also called citation, illustrative citation, illustrative example, illustrative phrase, illustrative quotation, illustrative sentence, quotation, verbal illustration.
Excerpt: to extract a lexical unit and its context from a written or spoken source, recording it for analysis or other uses; to select suitable material from a data set for compiling a reference work.
Fascicle: any of the sections of a book being brought out in instalments prior to its publication in completed form; one of several instalments of a published reference work, once a practice of large historical dictionaries.
Field label: a subject label in a reference work used to indicate the discipline, domain or field of a sense or entry, for example, biology, linguistics.
Fixed expression: two or more words that are always found together in a specific arrangement, such as collocations (e.g. nice surprise), compounds (e.g. school dictionary), and idioms (e.g. red herring); a phrase whose constituent elements cannot be moved or substituted without changing the meaning or literal interpretation.
Formulaic definition: a style of describing meaning written in a specified, traditional formula, also called truncated definition. The formulas differ according to parts of speech and are used to provide consistency in treatment.
Frequency: the number of occurrences of a word/phrase in written or spoken contexts, as in corpora, kept as a statistic and used in wordlist compilation and usage label assignment for the vocabulary considered for inclusion in reference works.
Front matter: the introductory material of a reference book prior to the wordlist, usually including a preface, explanatory notes, articles, pronunciation guide, abbreviations, and any other information on how to use the book; also called fore-matter.
Genus: the word or phrase that classifies a lexical unit, the part of a definition that is the superordinate word (hypernym) to which the word being defined is subordinate (hyponym); also called IS-A, is-a.
Ghost word: a word entered in a dictionary through some error, as misunderstanding or misreading a manuscript, or by design as a test for plagiarism; also called bogey.
Gloss: a brief explanation of the meaning of a lexical unit (word or phrase) (e.g. band ‘strip of material’, as opposed to a definition of band ‘strip of material used as a distinguishing mark on clothes’).
Glossary: a simple or short list of defined words, a wordlist defined for a limited or specialized subject or a wordlist defined concisely for a limited subject.
Grammatical code: one of a system of abbreviated terms and symbols used to designate detailed syntactic information, for example, U = uncountable noun.
Graphic illustration: a picture, line drawing, table, list, or map included to aid description of a lexical unit or units.
Guide word (or guideword): a word or part of a word printed at the top or bottom of a reference book page to indicate what entries are included on that page. Also, in learners’ dictionaries, the list of words at the beginning of a long entry to help the user find the sense required.
Hapax legomenon: any word or phrase that appears only once in a manuscript, document or particular area of literature.
Hard word: a term that is unfamiliar to the average reader, usually a foreign, scientific, technical, or formal term. Hard words are looked up in dictionaries and motivated the creation of dictionaries, for example, Robert Cawdrey’s Table Alphabeticall of 1604.
Headword: the form of a lexical unit (word or phrase) chosen for inclusion in the wordlist defined in a dictionary, especially canonical forms; also called entry, entry head, keyword, lemma, main entry, word entry. Practice varies as to how headwords are marked typographically and how variant forms are shown.
Historical dictionary: a type of reference work that attempts to describe all the forms and meanings of its entry words from their inception; a description of a vocabulary’s history from beginning to present, documenting the changes in form and meaning of words and phrases.
Homograph: a lexical unit which is spelled the same as another but has a different pronunciation and meaning, for example, minute ‘division of time’ and minute ‘tiny’.
Homonym: a lexical unit which is spelled and pronounced the same as another but has a different meaning and etymology, for example, bear ‘animal’ and bear ‘carry’; the two types are homographs and homophones.
Homophone: a lexical unit which is pronounced the same as another but has a different spelling and meaning, for example, fair and fare.
Hypernym (or hyperonym): the generic term of a set which has members (words and phrases) that are more specific (hyponym), for example, step is a hypernym of footstep.
Hyponym: the specific term of a word or phrase, which is a member of a larger, more generic term or set, for example, footstep is a hyponym of step.
Hyponymy: the hierarchical relationship between the meanings of lexical units, in which the meaning of one lexical unit is a specific type of another lexical unit. The sense of the hyponym (specific term) can be said to be included in that of the hypernym (generic term), for example, flower is a type of plant, tiger is a type of cat.
Ideological dictionary: a reference work arranged so that the user moves from meaning to word; also called analogical dictionary, onomasiological dictionary. Idiom: a fixed expression with a unitary meaning that is not always transparent from the combination of the meanings of its constituent words, for example, kick the bucket, let the cat out of the bag. Index entry: an entry whose headword is a variant and cross-reference of a fully defined headword in a reference book. Inflection: a change in the basic form of a word that shows a grammatical function such as case, gender, number, tense, person, mood or voice (e.g. inflects, inflected, inflecting); a form, suffix or element involved in such a change. Informant: a person who answers a questionnaire or otherwise supplies examples of usage or other data for a dictionary or linguistic project. Inkhorn term: a hard word usually coined from foreign roots such as Latin or Greek, and thought to be unnecessary or overly pretentious, for example, animadversion for ‘criticism’. Intentional definition: a type of definition using a formula which specifies the attributes or characteristics of a concept in relation to its hypernym (generic term), for example, pine is ‘kind of evergreen tree’. International Phonetic Alphabet: a pronunciation transcription system based on the Latin alphabet that uses numerous symbols (diacritics) to represent speech sounds – now the usual way of representing pronunciation in dictionaries. Keyword: a lexical unit (word or phrase) which is included in the wordlist defined in a dictionary, especially canonical forms; a defined word or phrase. Also called entry, entry head, headword, lemma, main entry, word entry. Label: a descriptor or abbreviation indicating a restricted usage of a lexical unit, either by classification (domain, field, subject), part of speech, language variety, register, status, style, or usage, for example, American English, archaic, derogatory, informal, slang. 
Learners’ dictionary (or learner’s dictionary): a reference book intended for foreign (non-native) learners of a language, for example, Collins COBUILD Advanced Learner’s English Dictionary. Lemma: a lexical unit which is included in the wordlist defined in a dictionary, especially canonical forms. Also called entry, entry head, headword, keyword, main entry, word entry. Lexeme: a word or phrase regarded as a single, definable item in the vocabulary of a language; also called lexical unit, lexical item. Lexemes are usually thought of as a combination of a graphic/phonic form with a meaning/semantic value in a particular grammatical context.
Lexical: of or pertaining to the description of the meanings of the units of a language; of or relating to the meaning of words as distinguished from their grammar and construction; of or pertaining to a lexicon or lexicography. Lexical unit: a word or phrase regarded as a single, definable item in the vocabulary of a language; also called lexeme, lexical item. Lexicographer: a person who writes, edits, or compiles dictionaries. Lexicographic archive: a collection of lexical information from various sources that is available for lexicographers and researchers; a collection of dictionaries in a reference department of a library or as a special collection. Lexicographic definition: an explanation of meaning that is considered the equivalent of the lexical unit and may be substituted for the lexical unit in a context (e.g. The book was long = The set of written, printed or blank sheets bound together into a volume was long). Lexicographic (or lexicographical): of or pertaining to lexicography, the defining of words or to dictionary making. Lexicography: the practices and principles of dictionary-making, the editing or compiling of a dictionary; the professional activity and academic field concerned with dictionaries and other reference works, the latter also called metalexicography. Lexicology: a branch of linguistics that is concerned with the study of the basic units of vocabulary (lexical units), their formation, meaning and structure. Lexicon: (1) the entire set of lexical units of a language; the totality of a language’s vocabulary, (2) a reference work listing and explaining the words of a language, language variety, specialized work, etc., sometimes synonymous for dictionary. Loanword (or loan-word): a word or phrase that has been borrowed into a language and has not been fully assimilated into the vocabulary; also called borrowing. 
Logical definition: the classical formula to explain the meaning of a word or phrase using the genus (generic term) and differentia (distinguishing feature or features) formula; also called analytic definition (e.g. triangle – a plane figure [genus] that has three straight bounding sides [differentia]). Macrostructure: the overall organizational scheme of a reference work, often starting with an alphabetical wordlist. Main entry: a lexical unit (word or phrase) which is included in the wordlist defined in a dictionary. Canonical forms are almost always included. Also called entry, entry head, entry word, headword, keyword, lemma, word entry; compare sub-entry, run-on.
Meaning: the relationship between a word or phrase and object(s) or idea(s) which it designates, that is, what a lexical unit denotes or conveys; a description (in a dictionary) of a concept referred to or implied by a word or phrase; also called sense. Meaning discrimination: the division of distinct meanings (senses) of a word or phrase within a dictionary entry; also called sense discrimination. Metalanguage: any language used to describe language, the language that describes the meanings or senses of lexical units in a dictionary. Metalexicography: the study of lexicography and the processes of dictionary-making (metalexicographer). Microstructure: the internal organizational scheme or design of a reference unit within a reference work, providing detailed information about the word or phrase. The microstructure is usually explained in a reference work’s front matter or guide. Monolingual dictionary: a reference work that describes only one language using the same language, for example, general dictionaries, learner’s dictionaries. Monosemous: of or having only one sense or meaning. Monosemy: of a word or phrase, the state of having a single meaning; compare polysemy. Morphology: the form and structure of words in a language, especially their change, combination, derivation, and inflection; a branch of grammar concerned with the formation and structure of words and phrases. Multiword lexical unit (or multi-word lexical unit): a lexical unit consisting of two or more words which function as a unit (lexeme), both syntactically and semantically; also called multiword combination, multiword expression (e.g. express train, out of date). Neologism: a new word or a new meaning for an established word; also, the practice of coming up with or coining new words. Nesting: the practice of clustering related words/phrases within an entry in a reference work, for example, the entry for casual would nest the entries casually, casualness at the end of the entry. Nonce word: a word or phrase coined for a particular occasion; also called hapax legomenon, nonce form.
Normative dictionary: a reference work which is based on normative attitudes as to how a language should be used, a dictionary written prescriptively rather than descriptively based on facts observed about its usage. Object language: the human language from which the entry words of a dictionary are taken. Online dictionary: any dictionary that is available on a computer network, such as the Internet/Web, and capable of being searched for word data. Onomasiological dictionary: a reference work arranged by the meaning or concept leading to the lexical unit (word or phrase), a dictionary presenting language as expressions of semantically linked concepts (ideas, meanings);
also called semantic dictionary (e.g. reverse dictionary, word-finding dictionary, Roget-style thesaurus). Onomastic dictionary: a reference work describing personal or other names such as place names, pseudonyms, surnames, etc. Open corpus: an open-ended corpus which is compiled from an unlimited number of written and spoken sources and accounts for change in the language. Orthographic word: a lexical unit (word or phrase) distinguished from others by its spelling. Ostensive definition: a definition which includes a representation of the object or idea being described, as ‘yellow: a colour whose hue resembles that of ripe lemons or sunflowers’; also called synthetic definition. In a dictionary this can be supplemented by pictorial illustration or otherwise pointing directly at an object. Overall-descriptive dictionary: a reference book intended to describe the standard and non-standard uses of its vocabulary. Part of speech: the syntactic classification or grammatical role of a sense or entry, for example, noun, verb, prefix; also called word class. Phonetic: of or pertaining to speech sounds. Phonetics: a branch of linguistics concerned with the production and nature of speech sounds, especially in articulatory-biological or acoustic-physical terms. Phonological word: a lexical unit (word or phrase) distinguished from others by its pronunciation. Phonology: a branch of linguistics concerned with the study of speech as a system of sound patterns, especially relationships between syllables and words; the history and theory of sound changes. Phrase: two or more words functioning as a syntactical and semantic unit with a single grammatical function. Phraseological information: data or a reference work describing fixed expressions, phrases or sentences; data about phrases in syntactic context. Phraseology: the study of phrases, such as fixed expressions, idioms and multiword expressions. 
Plan: the editorial policies, practices and objectives developed by those involved in a lexicographic project. Polysemous: of or having more than one sense or meaning. Polysemy: of a word or phrase, the state of having more than one sense or meaning. Most lexical units are polysemous and a general dictionary functions to distinguish those senses; compare monosemy. Pragmatic information: word data describing social and cultural rules of speaking such as gesture, intonation and tone, pitch, and other conventions of communication.
Prescriptive: pertaining to a strict and authoritarian approach to describing the meanings and uses of lexical units based on normative attitudes as to how a language should be used as opposed to the descriptive which is based on facts observed about a language’s use. Pronunciation: the form, production, and representation of speech as studied in phonetics and phonology. Pronunciation is codified in dictionaries, mainly by phonetic transcription (using special symbols as from the International Phonetic Alphabet) or respelling (by conventional letters or characters). Pronunciation key: a table that translates the symbols used to represent speech sounds and gives representative words containing those speech sounds. Quotation: a phrase or sentence excerpted from a written or spoken source that contains a form of the entry word and which indicates additional information about its grammatical or semantic characteristics; in a dictionary entry or citation file, a citation; also called example, illustrative citation, illustrative example, illustrative phrase, illustrative quotation, illustrative sentence, verbal illustration. Range of application: the semantic and syntactic restrictions of a lexical unit’s use. Range of information: the extent of information offered about the entry words in a reference book; also called range. Reference skills: the ability to find needed or sought-after information within a dictionary or other reference work; an understanding of the structure and features of a reference work for useful consultation. Register: a variety of language associated with a particular social context; the style of language, grammar, and words used for particular situations, for example, informal language used at a party, legal language, jargon of a technology field. Register label: a label used to mark a feature of usage for a word or phrase, such as formality, style, or variation in place and time. 
Respelled pronunciation (or respelling): a pronunciation transcription system using conventional letters or characters and a minimum number of diacritical symbols. Reverse dictionary: a reference work which alphabetically lists clue words or phrases that offer users a way of moving from a concept, idea, or meaning to the target word – an inversion of the traditional order; also called onomasiological dictionary, word-finding dictionary. Root: in etymology, the part of a word which is common to a word family and may have cognates in related languages, for example, Primitive Indo-European *mater – as the root of ‘mother’ and cognates. Root word (or root morpheme): the base of a word; the form of a word after all affixes are removed, for example, vigour as the root of reinvigorating.
Rule-based definition: the use of a rule to define a word, for example, ‘whom: used instead of ‘who’ as the object of a verb or preposition’ or ‘objective case of who’. Run-in: a derivative, idiom, etc. placed within an entry. Run-on: a derivative, idiom, etc. placed at the end of an entry; a word or phrase not given separate headword status but added as a sub-entry under a word or phrase to which it is related. This is a typical treatment of derivatives that do not require separate definition; compare main entry. Semantic: of or concerning meaning or the distinction of meanings of a word or phrase. Semantic dictionary: a reference work arranged so that the user moves from meaning to word; also called onomasiological dictionary. Semantic field (lexical field): a set of words grouped by meaning which form a conceptual network, for example, colour terms; an area of human experience or perception that is described by a set of interrelated words/phrases. Semasiology: the explanation of the meaning of given words or phrases; semantics. Traditional dictionaries supply semasiological information while thesauruses and reverse dictionaries offer the opposite, onomasiological data. Sense: a meaning conveyed by a lexical unit (word or phrase), one of several meanings that can be established for a word or phrase and described by a definition in a dictionary; also called meaning. Sense discrimination: the division of meanings within a dictionary entry, the treatment of polysemy (multiple meanings) through rationalization, discrimination and display in dictionary entries; also called meaning discrimination. Sense ordering: the principles employed in a reference work for arranging the different senses (meanings) of the entries. Senses may be ordered historically according to the semantic changes the word/phrase has undergone, or by frequency of use, or logically in relation to a ‘core’ meaning from which other senses have developed. 
Sense relation: any semantic link between two or more words, such as by homonymy, hyponymy, polysemy, synonymy, complementarity, and antonymy. Two types are distinguished: inclusion (hyponymy, synonymy) and exclusion (antonymy). Slang: any informal word or phrase that is not considered appropriate in certain circumstances, such as formal occasions. Slang used by a particular group of people is sometimes called cant, jargon or argot. Slip: a piece of paper, card or database entry where a record is made of a lexical unit’s context and any other linguistic information excerpted from a written or spoken source, a record of information from a reading program for a dictionary project.
Source language: the language of the entry words, especially in a bilingual dictionary; the language of a text that is translated into another (target) language. Specialized dictionary: any reference work restricted to a subset of language or for a specific target audience, for example, law dictionary, dictionary of early American English. Specialized sense: a connotative, figurative or idiomatic meaning of a lexical unit (word or phrase) that exists only in a special context; the narrowing of a sense within a particular context, for example, virus in computing. Sprachgefühl: an intuition for language meaning and usage, a sensitivity to language, especially for what is grammatically or idiomatically acceptable in a given language. Standard-descriptive dictionary: a reference book intended to describe the standard uses of its vocabulary. Status label: a label used to mark the acceptability or currency of a word or phrase in a dictionary, for example, obsolete, rare. Style label (or stylistic label): a label used to mark the style level of a word or phrase in a dictionary, especially the formality and social acceptability of a sense’s or entry’s use, for example, colloquial, formal, informal, nonstandard, slang, vulgar; sometimes called status label. Subentry (or sub-entry): a listed or defined derivative of a lexical unit, one of the numbered senses of a headword within a dictionary entry; a derivative, idiom, etc. listed or defined within or following a dictionary entry; compare main entry. Subject label: a label used to mark the field, domain, or subject of a sense or entry, for example, chemistry, computer science, psychology. Subsense (or sub-sense): one of the distinct meanings of a polysemous word, often marked by a number; a meaning that follows or is attached to a main sense of a word or phrase and which gives a more specific meaning or use. 
Substitution principle: a method of defining in which the definition text is substitutable for the lexical unit (word or phrase) in context; the principle that a word or phrase in a text can be replaced by its dictionary definition for certain categories of words. Syllabification (or syllabication): a system of depicting word division for writing, using a symbol, as a centred dot or dividing line, to show the acceptable division points; the division of words into phonic syllables and their written representation by graphic syllables for purposes of hyphenation. Synonym: a lexical unit (word or phrase) whose meaning is similar to that of another lexical unit or units. Synonymy varies in degree and nature and there are no true synonyms as no two words have exactly the same sense in terms of denotation, connotation, formality or currency.
Synonymy: a paragraph, usually following an entry that lists and discriminates lexical units similar to the entry word in meaning and usage, a display of synonym relations in the form of a short essay; also called synonym paragraph, synonym study. Syntax: a branch of grammar concerned with the part of speech or word class of lexical units (words and phrases) and their compatibility within sentences and texts; the grammatical information about a lexical unit. Synthetic definition: a definition which includes a representation of the object or idea being described, as ‘yellow: a colour whose hue resembles that of ripe lemons or sunflowers’; also called ostensive definition. In a dictionary this can be supplemented by pictorial illustration or otherwise pointing directly at an object. Target language: the language of the translations or equivalents in a bilingual dictionary; the language into which a source language is to be translated. Terminological dictionary: a type of reference work providing information about a special vocabulary or the vocabulary of a specialist field, for example, Dictionary of Lexicography. Thematic dictionary: a type of reference work that is organized by topics or concepts, for example, a thesaurus. Thesaurus: (1) a wordlist with synonyms, either arranged by concept/idea or alphabetically; a type of reference work presenting synonym networks between words within concepts, (2) an onomasiological or thematic reference work. Trade name (brand name): a proprietary name or symbol given to a business, company, product or service, which may or may not be registered as a trademark. Trade names that are officially registered and legally protected are trademarks. Some achieve generic status and become part of everyday language, for example, Kleenex. 
Translation equivalent: the translation (in the target language) of a lexical unit (in the source language) of a bilingual dictionary, a word or phrase in one language which corresponds in meaning to a word or phrase in another language; also called equivalent. Typifying definition: a definition that focuses on what is typical about the lexical unit being defined, for example, abaya = a long, loose-fitting overgarment, typically made of wool, traditionally worn in some Arab countries. Typography: the art and craft of composing type and fonts; the arrangement and appearance of words and letters in a printed document. Unabridged: complete, comprehensive in coverage; inclusive of standard and non-standard meanings and uses. In a dictionary family, this is the largest in size. Usage: the manner in which lexical units are used syntactically and semantically according to time, space or circumstance in a language or by a part of
society. Usage information is collected and presented in a dictionary often by descriptive labels or usage notes and/or by giving example sentences. Usage label: the marking of a word, phrase or sense for a syntactic or semantic restriction; the marking of a word or phrase as typical or appropriate in a particular context or language variety. Usage note: a note or paragraph offering explanation or guidance on syntactic or semantic restrictions on a lexical unit (word or phrase). Variant: a form of a word or phrase that is different from the standard (most commonly / frequently used) form – in spelling, pronunciation or grammar. Verbal illustration: a phrase or sentence excerpted from a written or spoken source or written by a lexicographer that contains a form of the entry word and which indicates additional information about its grammatical or semantic characteristics, such as its context and usage; also called citation, example, illustrative citation, illustrative example, illustrative phrase, illustrative quotation, illustrative sentence. Vocabulary: the total list of lexical units (words, phrases) chosen for entry in a dictionary; also called wordlist; the sum total of the words used in a language or by a speaker of a language. Word entry: a lexical unit (word or phrase) which is included in the wordlist defined in a dictionary. Canonical forms are almost always included. Also called entry, entry head, entry word, headword, keyword, lemma, main entry. Wordlist: the total list of lexical units (words, phrases) chosen for entry in a dictionary; also called vocabulary.
8 Annotated Bibliography
Howard Jackson
The aim of this bibliography is to point the researcher in lexicography in the direction of the most useful sources and to further bibliographical material. It cannot possibly list all the publications on (meta-)lexicography that have appeared over the last half-century or so. For older works, bibliographies listed under 8.1 below may be consulted. The focus in this bibliography is on more recent and up-to-date work, and specifically on dictionary research rather than dictionaries per se. The bibliography is not a simple alphabetical list but is organized under a number of topic headings, which are intended to act as a guide to the reader. In topic 2 and from topic 5 onwards, works are listed in reverse chronological order (latest first).
Topic Headings
1 Bibliographies (of dictionary research)
2 Encyclopedias, Compendia and Dictionaries of Lexicography
3 Book Series
4 Journals
5 Manuals
6 Textbooks
7 Historical Lexicography
8 Bilingual Lexicography
9 For Learners
10 Lexicography of Individual Languages
11 Electronic Lexicography
12 Dictionary Use
13 Other Works
1 Bibliographies (of dictionary research)
Euralex Bibliography of Lexicography, edited by Anne Dykstra, online at: http://euralex.pbworks.com/w/page/7230036/FrontPage. [still under construction, with participation from lexicographers invited, it includes: a thematic list, an alphabetical list and R. R. K. Hartmann’s bibliography (updated to July 2007)]
Bibliografía temática de la lexicografía (2003), compiled by Félix Córdoba Rodríguez, online at: www.udc.es/grupos/lexicografia/bibliografia.htm. [the alphabetical listing is complete and contains over 10,000 items; but the thematic listing has not been finished]
Boccuzzi, C., Centrella, M., lo Nostro, M. and Zotti, V. (2007) Bibliographie thématique et chronologique de métalexicographie 1950–2006. Fasano: Schena Editore. [with a concentration on French metalexicography, this volume presents its bibliography by topic and by chronology]
Dolezal, F. T. and McCreary, D. R. (1999) Pedagogical Lexicography Today. A Critical Bibliography on Learners’ Dictionaries with Special Emphasis on Language Learners and Dictionary Users (Lexicographica. Series Maior 96). Tübingen: Max Niemeyer. [an annotated bibliography of over 500 articles in the field of pedagogical lexicography, with a topic index and commentary addressing the issues and debates within the field]
Wiegand, H. E. (2006–7) Internationale Bibliographie zur germanistischen Lexikographie und Wörterbuchforschung, 3 Vols. Berlin: Walter de Gruyter. [International Bibliography of German Lexicography and Dictionary Research; Vol. 1 A-H, Vol. 2 I-R, Vol. 3 S-Z]
2 Encyclopedias, Compendia and Dictionaries of Lexicography
Wiegand, H. E., Beißwenger, M., Gouws, R. F., Kammerer, M., Storrer, A. and Wolski, W. (eds) (2010) Wörterbuch zur Lexikographie und Wörterbuchforschung / Dictionary of Lexicography and Dictionary Research. Berlin: Walter de Gruyter. [first volume (A-C) of a proposed 4-volume dictionary containing the specialist terminology of dictionary research in about 5,600 headwords, 7,200 reference headwords and 50,000 headword equivalents in 9 languages, with an introduction to the subject in both English and German]
Fontenelle, T. (ed.) (2008) Practical Lexicography: A Reader. Oxford: Oxford University Press. [a collection of significant articles on issues of dictionary compiling, under the following headings: I Metalexicography, macrostructure, microstructure, and the contribution of linguistic theory; II Corpus design; III Lexicographical evidence; IV Word senses and polysemy; V Collocations, idioms and dictionaries; VI Definitions; VII Examples; VIII Grammar and usage in dictionaries; IX Bilingual lexicography; X Tools for lexicographers; XI Semantic networks and wordnets; XII Dictionary use]
Hartmann, R. R. K. (ed.) (2003) Lexicography: Critical Concepts, 3 Vols. London and New York: Routledge. [Vol. 1: Dictionaries, Compilers, Critics and Users; Vol. 2: Reference Works across Time, Space and Languages; Vol. 3: Lexicography, Metalexicography and Reference Science; 70 previously published articles, mostly from the twentieth century, collected together to relate to the themes represented in the titles of the volumes]
Burkhanov, I. Y. (1998) Lexicography: A Dictionary of Basic Terminology. Rzeszów: Wydawn. Wyższej Szkoły Pedagogicznej w Rzeszowie.
Hartmann, R. R. K. and James, G. (1998) Dictionary of Lexicography. London and New York: Routledge. [a comprehensive listing of lexicographical terms, together with an extensive bibliography; second, revised paperback edition published in 2001]
Martínez de Sousa, José (1995) Diccionario de lexicografía práctica (Dictionary of Practical Lexicography). Barcelona: Biblograf.
Hausmann, F. J., Reichmann, O., Wiegand, H. E. and Zgusta, L. (1989–91) Wörterbücher/Dictionaries/Dictionnaires: An International Encyclopedia of Lexicography, Vols 1–3. Berlin: Walter de Gruyter. [No 5 in the series Handbücher zur Sprach- und Kommunikationswissenschaft; articles organized in 38 sections covering the whole range of lexicography, representing the state of the art at the end of the 1980s; a supplementary Vol. 4, Dictionaries. An International Encyclopedia of Lexicography. Recent Developments with Special Focus on Computational Lexicography, edited by Rufus H. Gouws et al., is in preparation and will contain 136 articles in 22 chapters. Vols 1–3 available online at: www.degruyter.com/view/serial/16647]
3 Book Series
Études de lexicologie, lexicographie et dictionnairique, series edited by Bernard Quémada and Jean Pruvost, published by Honoré Champion, Paris. [The series is devoted to lexicology, lexicography and dictionary science; the issues raised by computerization in lexicography and dictionary science, the distinctions between monolingual and bilingual dictionaries, the study of words based on the latest research, are some of the subjects treated in this series.]
Lexicographica: Series Maior. Supplementbände zum Internationalen Jahrbuch für Lexikographie. Max Niemeyer Verlag, Tübingen (until Vol. 134 in 2008); Walter de Gruyter Verlag, Berlin / New York (from 2009). [supplementary volumes to Lexicographica, the international annual of lexicography; since 1984 over 140 volumes have been published in this series, which constitutes an international library for the field of lexicography and dictionary research; the published volumes represent the whole range of current perspectives, from dictionary history and dictionary typology to dictionary criticism, and from dictionary use and dictionary structure to computational lexicography. Details of volumes published can be found at: http://pub.ids-mannheim.de/extern/lex/]
Terminology and Lexicography Research and Practice, John Benjamins Publishing Co, Amsterdam. [aims to provide in-depth studies and background information pertaining to lexicography and terminology; general works include philosophical, historical, theoretical, computational and cognitive approaches; other works focus on structures for purpose- and domain-specific compilation (LSP), dictionary design, and training; the series (14 volumes published between 1999 and 2011) includes monographs, state-of-the-art volumes and course books in the English language]
4 Journals (see also Chapter 6)
Cahiers de lexicologie, Institut de Linguistique Française, since 1959, two parts per year. [http://atilf.atilf.fr/jykervei/cahlex.htm]
Dictionaries, Dictionary Society of North America, since 1979, single annual volume. [www.dictionarysociety.com/]
International Journal of Lexicography, Oxford University Press, for European Association for Lexicography, since 1988, four parts per year. [http://ijl.oxfordjournals.org/]
Lexicographica: Internationales Jahrbuch der Lexikographie, Walter de Gruyter (up to 2008, Max Niemeyer), since 1985, single annual volume. [www.degruyter.com/view/j/lexi]
LexicoNordica, Nordisk Forening for Leksikografi, since 1994, single annual volume. [http://nordisksprogkoordination.org/nfl/publikationer/lexiconordica]
Lexicon, Kenkyusha, for Iwasaki Linguistic Circle, Tokyo, since 1972, at least one volume per year. [see: http://kdictionaries.com/kdn/kdn15/kdn1504-akasu.html]
Lexikos, Bureau of the WAT & African Association of Lexicography, since 1991, single annual volume (since 2011 available freely online at: http://lexikos.journals.ac.za/pub).
Revista de Lexicografía, Universidade de Coruña (Grupo de Lexicografía), since 1994/5, single annual volume. [www.udc.es/grupos/lexicografia/revista.htm]
Studi di lessicografia italiana, Accademia della Crusca, since 1979, single annual volume (not all years). [www.accademiadellacrusca.it/riviste/riviste.php?ctg_id=74&vedI_elenco=1]
Trefwoord, Fryske Akademy, since 1999, single annual volume (available online at: www.fryske-akademy.nl/?L=1&id=51).
5 Manuals
Atkins, B. T. S. and Rundell, M. (2008) The Oxford Guide to Practical Lexicography. Oxford: Oxford University Press. [after an introductory chapter, there are three parts: 1 pre-lexicographic planning; 2 analysing the data; 3 compiling the entry]
Fontenelle, T. (ed.) (2008) Practical Lexicography: A Reader. Oxford: Oxford University Press. [a collection of significant articles on issues of dictionary compiling, under the following headings: I Metalexicography, macrostructure, microstructure and the contribution of linguistic theory; II Corpus design; III Lexicographical evidence; IV Word senses and polysemy; V Collocations, idioms and dictionaries; VI Definitions; VII Examples; VIII Grammar and usage in dictionaries; IX Bilingual lexicography; X Tools for lexicographers; XI Semantic networks and wordnets; XII Dictionary use]
de Villers, M.-E. (2006) Profession lexicographe. Montréal: Les Presses de l’Université de Montréal. [an introduction to lexicography and the profession of the practical lexicographer]
Svensén, B. (2004) Handbok i lexikografi. Ordböcker och ordboksarbete i teori och praktik. Stockholm: Norstedts Akademiska Förlag. [translated into English and published in 2009 as A Handbook of Lexicography. The Theory and Practice of Dictionary-Making, Cambridge University Press; intended as a general orientation to the principles and methods of lexicography]
van Sterkenburg, P. (ed.) (2003) A Practical Guide to Lexicography. Amsterdam and Philadelphia: John Benjamins. [a collection of 29 contributions, intended as a coursebook aimed at ‘professional lexicographers and students of language’, organised in two parts: 1 The forms, contents and uses of dictionaries; 2 Linguistic corpora (databases) and the compilation of dictionaries]
Bergenholtz, H. and Tarp, S. (eds) (1995) Manual of Specialized Lexicography. The Preparation of Specialized Dictionaries. Amsterdam and Philadelphia: John Benjamins. [aims to provide an improved foundation for practical LSP lexicography and a manual for would-be LSP dictionary-makers; chapters cover the range of information that such dictionaries may contain]
Svensén, B. (1993) Practical Lexicography: Principles and Methods of Dictionary-Making. Oxford: Oxford University Press. [systematically and comprehensively covers monolingual and bilingual lexicography; updated by Svensén (2004)]
Boguraev, B. (ed.) (1991) Building a Lexicon, special issue of the International Journal of Lexicography, 4/3. [three articles on the contributions of, respectively, lexicography, linguistics and computers to building a lexicon]
Zgusta, L. (1971) Manual of Lexicography (Janua Linguarum. Series Maior 39). The Hague: Mouton. [the original manual, now out of print]
6 Textbooks
Herbst, Th. and Klotz, M. (2003) Lexikografie. Eine Einführung. Paderborn: Schöningh (UTB). [deals, among other things, with the form of definitions, the selection of examples, the handling of collocations and idioms, and syntactic information in monolingual and bilingual dictionaries]
Jackson, H. (2002) Lexicography. An Introduction. London and New York: Routledge/Taylor & Francis. [overview of the history, types and content of dictionaries, with a concentration on English monolingual dictionaries, including those for learners]
Hartmann, R. R. K. (2001) Teaching and Researching Lexicography (Applied Linguistics in Action Series). Harlow: Longman/Pearson Education. [deals with the relationship between lexicographic theory and practice in four sections: I Lexicography in Practice and Theory, II Perspectives on dictionary research, III Issues, methods and case studies, IV Resources]
Landau, S. I. (2001) Dictionaries: The Art and Craft of Lexicography. Cambridge: Cambridge University Press. [2nd edition of a classic introductory account of the lexicography of English by a working lexicographer, with particular attention to the influence of the corpus revolution, and a useful chapter on legal and ethical issues]
Béjoint, H. (2000) Modern Lexicography: An Introduction. Oxford: Oxford University Press. [originally published as Tradition and Innovation in Modern English Dictionaries in 1994, a useful introduction to the study of lexicography]
Kipfer, B. A. (1984) Workbook on Lexicography (Exeter Linguistic Studies 8). Exeter: University of Exeter. [a guide to the study of lexicography, explaining the processes that lexicographers go through, with an especially useful section on types of defining style]
7 Historical Lexicography
Adams, M. (ed.) (2010) ‘Cunning Passages, Contrived Corridors’: Unexpected Essays in the History of Lexicography. Monza: Polimetrica. [a collection of essays covering the history of lexicography, historical lexicography and historical lexicology]
Cowie, A. P. (ed.) (2009) The Oxford History of English Lexicography, 2 Vols. Oxford: Oxford University Press. [Vol. 1 General-Purpose Dictionaries; Vol. 2 Specialized Dictionaries; covers from Middle Ages to present, chronologically in Vol. 1 and thematically in Vol. 2]
San Vincente, F. (ed.) (2008–10) Textos fundamentales de la lexicografía italoespañola (1917–2007), 3 Vols. Monza: Polimetrica. [a joint history of Italian and Spanish lexicography, examining dictionaries published in the two countries in the nineteenth and twentieth centuries; a fourth volume will cover 1570 to 1805]
Considine, J. and Iamartino, G. (eds) (2008) Words and Dictionaries from the British Isles in Historical Perspective. Newcastle upon Tyne: Cambridge Scholars Publishing. [eleven papers from the second International Conference on Historical Lexicography and Lexicology, with an emphasis on lexicography and dictionaries of English]
Yong, H. and Peng, J. (2008) Chinese Lexicography: A History from 1046 BC to AD 1911. Oxford: Oxford University Press. [covers 3 millennia and 600 titles, including general-purpose dictionaries, dialect dictionaries, LSP dictionaries and encyclopedias; with a primary focus on monolingual lexicography]
Coleman, J. (2004–8) A History of Slang and Cant Dictionaries, 3 Vols. Oxford: Oxford University Press. [Vol. 1 covers the period 1567–1784, Vol. 2 1785–1858, Vol. 3 1859–1936; a projected fourth volume will cover the period up to 1984]
Coleman, J. and McDermott, A. (eds) (2004) Historical Dictionaries and Historical Dictionary Research. Papers from the International Conference on Historical Lexicography and Lexicology, Leicester 2002 (Lexicographica. Series Maior 123). Tübingen: Max Niemeyer. [papers from the first conference on historical lexicography and lexicology, in two parts; Part 1 has 12 articles on dictionary history, Part 2 has 6 articles on historical dictionaries]
Hayakawa, I. (2001) Methods of Plagiarism. A History of English-Japanese Lexicography. Tokyo: Jiyūsha.
Hüllen, W. (1999) English Dictionaries 800–1700. The Topical Tradition. Oxford: Clarendon Press. [taking in some 400 titles, a discussion of the onomasiological (topical) strand of dictionary-making, especially in English, from its beginnings in the ninth century to the end of the seventeenth, from Aelfric to Wilkins]
Van Hoof, H. (1994) Petite histoire des dictionnaires (Bibliothèque des Cahiers de l’Institut de Linguistique de Louvain 77). Louvain-la-Neuve: Peeters. [a review of monolingual and bilingual dictionaries from antiquity to the present]
Boisson, C., Kirtchuk, P. and Béjoint, H. (1991) Aux origines de la lexicographie: les premiers dictionnaires monolingues et bilingues. International Journal of Lexicography 4/4, 261–315. [traces the origins of dictionaries back to ancient civilizations across the world]
James, G. (ed.) (1989) Lexicographers and their Works. Exeter: Exeter University. [19 articles on a range of lexicographical topics, including 9 on the history of lexicography]
8 Bilingual Lexicography
Stark, M. (2011) Bilingual Thematic Dictionaries (Lexicographica. Series Maior 140). Berlin: Walter de Gruyter. [identifies the characteristic features of bilingual thematic dictionaries, evaluates their usefulness and proposes improvements]
Hartmann, R. R. K. (2007) Interlingual Lexicography. Selected Essays on Translation Equivalence, Contrastive Linguistics and the Bilingual Dictionary (Lexicographica. Series Maior 133). Tübingen: Max Niemeyer. [a collection of 24 essays by Hartmann on the topic of translation equivalence and its treatment in the bilingual dictionary, especially from the perspective of the user]
Yong, H. and Peng, J. (eds) (2007) Bilingual Lexicography from a Communicative Perspective. Amsterdam and Philadelphia: John Benjamins. [presentation of the ‘communicative theory of lexicography’, pioneered by Yong and Peng, and of the empirical investigation that underpins it]
Adamska-Sałaciak, A. (2006) Meaning and the Bilingual Dictionary. The Case of English and Polish. Frankfurt: Peter Lang. [four chapters exploring the field of bilingual lexicography, entitled ‘Bilingual lexicography’, ‘Capturing meaning’, ‘Hunting for equivalents’ and ‘Giving examples’]
Chan, Sin-Wai (ed.) (2004) Translation and Bilingual Dictionaries. Papers from the Hong Kong 2002 Conference (Lexicographica. Series Maior 119). Tübingen: Max Niemeyer. [after an introductory chapter by the editor on dictionaries and translators, the volume has two parts; Part 1 has eight articles on translation and bilingual dictionaries, Part 2 has eight articles on bilingual dictionaries and intercultural communication]
Ferrario, E. and Pulchini, V. (eds) (2002) La Lessicografia Bilingue tra presente e avvenire. Vercelli: Ed. Mercurio. [papers from a conference on bilingual lexicography, concentrating on Italian, held in Vercelli, Italy in May 2000]
Szende, T. (ed.) (2000a) Dictionnaires bilingues. Méthodes et contenus. Paris: Honoré Champion. [a collection of papers on bilingual lexicography given at the first ‘Journée sur la Lexicographie bilingue’, in 1998]
Szende, T. (ed.) (2000b) Approches contrastives en lexicographie bilingue. Paris: Honoré Champion. [papers from the second ‘Journée sur la Lexicographie bilingue’, in 1999]
Béjoint, H. and Thoiron, P. (eds) (1996) Les dictionnaires bilingues (Champs linguistiques). Louvain-la-Neuve: Aupelf-Uref-Duculot. [12 contributions on bilingual dictionaries]
Farina, D. M. T. (ed.) (1996) The Translational Equivalent in Bilingual Lexicography (thematic issue of Lexicographica. International Annual Vol. 12). Tübingen: Max Niemeyer.
Bartholomew, D. A. and Schoenhals, L. C. (1983) Bilingual Dictionaries for Indigenous Languages. México: Summer Institute of Linguistics. [a practical manual for the fieldworker on the preparation of bilingual dictionaries, oriented towards the languages of Central America but generalizable to the indigenous languages of other areas]
9 For Learners
Bielińska, M. (2010) Lexikographische Metatexte. Eine Untersuchung nichtintegrierter Außentexte in einsprachigen Wörterbüchern des Deutschen als Fremdsprache. Frankfurt: Peter Lang. [a discussion of the ‘outer texts’ (front and back matter) in German monolingual learners’ dictionaries]
Fuertes-Olivera, P. A. (ed.) (2010) Specialised Dictionaries for Learners (Lexicographica. Series Maior 136). Berlin/New York: De Gruyter. [a contribution to pedagogical specialized lexicography, it argues for the need for better specialized dictionaries for learners based on a sound theoretical framework]
Kernerman, I. J. and Bogaards, P. (eds) (2010) English Learners’ Dictionaries at the DSNA 2009. Tel Aviv: K Dictionaries Ltd. [eleven papers given at the special seminar at the Dictionary Society of North America conference of 2009 by contributors from a range of countries, reflecting on the learners’ dictionary tradition in English]
Tarp, S. (2008) Lexicography in the Borderland between Knowledge and Non-knowledge. General Lexicographical Theory with Particular Focus on Learner’s Lexicography (Lexicographica. Series Maior 134). Tübingen: Max Niemeyer. [proposes ‘function theory’, which is then applied to and illustrated from learners’ dictionaries]
Welker, H. A. (2008) Panorama geral da lexicografia pedagógica. Brasilia: Thesaurus. [aims to provide a general overview of the field of pedagogical lexicography for researchers, students and language teachers; includes summaries of dictionary reviews]
Fuertes-Olivera, P. A. and Arribas-Baño, A. (2008) Pedagogical Specialized Lexicography. Amsterdam and Philadelphia: John Benjamins. [deals with specialized dictionaries used to meet pedagogical needs in the teaching of business English and LSP translation to Spanish learners]
Leaney, C. (2007) Dictionary Activities. Cambridge: Cambridge University Press. [aimed at language teachers for developing students’ dictionary skills; explains what the features of a dictionary are and how to navigate a dictionary, through to more complex topics such as collocations, idioms and word building; also looks at the use of electronic dictionaries and specialized dictionaries]
Huszár, A. (2006) Eine vergleichende Untersuchung von Lernerwörterbüchern des Deutschen und Englischen. Hamburg: Verlag Dr. Kovač. [Vol. 3 in the series Angewandte Linguistik aus interdisziplinärer Sicht (ed. K.-D. Baumann); a comparison of learners’ dictionaries in German and English, with a particular focus on the treatment of ‘Funktionsverbgefüge’ and compound verbs]
Stein, G. (2002) Better Words. Evaluating EFL Dictionaries. Exeter: Exeter University Press. [a collection of papers authored by Stein, some previously unpublished, on EFL lexicography, focusing on the suitability and effectiveness of monolingual and bilingual EFL dictionaries in teaching and learning]
Humblé, P. (2001) Dictionaries and Language Learners. Frankfurt: Haag & Herchen. [elaborates a model for a foreign language learner’s dictionary, taking account of didactic and pedagogical implications; full text available at www.pget.ufsc.br/publicacoes/professores/PhilippeHumble/Philippe_R._M._Humble_-_A_New_Model_For_A_Foreign_Language_Learner_s_Dictionary.pdf]
Heuberger, R. (2000) Monolingual Dictionaries for Foreign Learners of English: A Constructive Evaluation of the State-of-the-Art Reference Works in Book Form and on CD-ROM (Austrian Studies in English 87). Vienna: Braumüller. [a critique of late-1990s English dictionaries for advanced learners, both in print and electronic form]
Cowie, A. P. (1999) English Dictionaries for Foreign Learners: A History. Oxford: Clarendon Press. [a history of the development of the genre, together with discussions on phraseology, the role of the computer and user-related research]
Herbst, T. and Popp, K. (eds) (1999) The Perfect Learners’ Dictionary(?) (Lexicographica. Series Maior 95). Tübingen: Max Niemeyer. [papers from a symposium held in Erlangen in 1997; Part 1 contains 15 papers on the 1995 generation of English learners’ dictionaries; Part 2 has 4 papers on other types of learner dictionary; Part 3 has 4 papers on dictionaries and corpora]
Stark, M. (1999) Encyclopedic Learners’ Dictionaries. A Study of their Design Features from the User Perspective (Lexicographica. Series Maior 92). Tübingen: Max Niemeyer. [an examination of this hybrid dictionary that seeks to identify what encyclopedic learners’ dictionaries are as a type, to investigate their usefulness to learners and to suggest how their design might be improved in order to serve users’ needs better]
Zöfgen, E. (1994) Lernerwörterbücher in Theorie und Praxis (Lexicographica. Series Maior 59). Tübingen: Max Niemeyer. [a discussion of the theory and practice of pedagogical lexicography, with special reference to French learners’ dictionaries]
10 Lexicography of Individual Languages
Béjoint, H. (2010) The Lexicography of English. From Origins to Present. Oxford: Oxford University Press. [covering both dictionary history and current issues, this work surveys the range of lexicography in English, in both Britain and the United States, with useful contrastive perspectives with French lexicography, and speculation on the future of the dictionary]
Żmigrodzki, P. (2009) Wprowadzenie do leksykografii polskiej, 3rd edition. Katowice: Wydawnictwo Uniwersytetu Śląskiego. [introduction to Polish lexicography]
Correia, M. (2008) Os Dicionários Portugueses. Lisboa: Caminho. [dictionaries of Portuguese]
Ishikawa, S., Minamide, K., Murata, M. and Tono, Y. (eds) (2006) English Lexicography in Japan. The Jacet Society of English Lexicography and Taishukan Publishing Company. [a collection of articles by Japanese scholars, showing the range of interest in Japan in the lexicography of English, under the following headings: 1 Dictionary and words; 2 Dictionary – analysis and comparisons; 3 Dictionary and pragmatics; 4 Dictionary and gender; 5 Dictionary and education]
Pruvost, J. (2006) Les dictionnaires français, outils d’une langue et d’une culture. Paris: Éditions Ophrys. [a review of dictionaries of French from the perspectives of lexicology and lexicography]
Ruhstaller, S. and Prado Aragonés, J. (eds) (2001) Tendencias en la investigación lexicográfica del español. El diccionario como objeto de estudio linguístico y didáctico. Huelva: Universidad de Huelva. [trends in the lexicographic investigation of Spanish]
James, G. (2000) Colporul: A History of Tamil Dictionaries. Chennai: Cre-A. [a history of Tamil lexicography from the earliest times, set in the context of reference science and sociolinguistics, with an extensive bibliography and discussion of unpublished manuscripts]
McArthur, T. and Kernerman, I. (eds) (1998) Lexicography in Asia. Tel Aviv: Password Publishers Ltd. [around a dozen papers, largely from a 1997 conference on ‘Dictionaries in Asia’, surveying the field and discussing issues relevant to the Asian continent]
Dodd, W. S. (ed.) (1995) A Survey of Spanish Lexicography/Panorama de la Lexicografía Española, Special Issue of International Journal of Lexicography 8/3. [four articles in Spanish on the history of and current issues in Spanish lexicography]
11 Electronic Lexicography
Granger, S. and Paquot, M. (eds) (2012) Electronic Lexicography. Oxford: Oxford University Press. [after an introduction by Sylviane Granger, Part I contains six articles under the title ‘Lexicography at a Watershed’, Part II ‘Innovative Dictionary Projects’ contains seven articles, and Part III ‘Electronic Dictionaries and their Users’ contains six articles]
Fuertes-Olivera, P. A. and Bergenholtz, H. (eds) (2011) e-Lexicography. The Internet, Digital Initiatives and Lexicography. London: Continuum. [15 papers covering current issues in e-lexicography; Part 1 on ‘function theory’, and Part 2 on specific topics arising from electronic dictionary projects]
Kosem, I. and Kosem, K. (eds) (2011) Electronic Lexicography in the 21st Century. New Applications for New Users. Ljubljana: Trojina, Institute for Applied Slovene Studies. [proceedings of eLex 2011 held in Bled, Slovenia, 10–12 November 2011]
Granger, S. and Paquot, M. (eds) (2010) eLexicography in the 21st Century: New Challenges, New Applications. Presses Universitaires de Louvain. [proceedings of the eLex 2009 conference at Louvain-la-Neuve]
Nesi, H. (2009) Dictionaries in electronic form. In: A. P. Cowie (ed.) The Oxford History of English Lexicography, Vol. II. Oxford: The Clarendon Press, 458–78.
Nielsen, S. (2009) Reviewing printed and electronic dictionaries: a theoretical and practical framework. In: S. Nielsen and S. Tarp (eds) Lexicography in the 21st Century. In Honour of Henning Bergenholtz. Amsterdam and Philadelphia: John Benjamins, 23–41.
Almind, R. (2005) Designing internet dictionaries. Hermes 34, 37–54. [one of a number of articles in this issue of the journal that deal with e-lexicography; it argues that print and electronic dictionaries require very different design solutions; available to download from: http://download2.hermes.asb.dk/archive/2005/Hermes34.html]
Haß, U. (ed.) (2005) Grundfragen der elektronischen Lexicographie. Berlin: Walter de Gruyter. [describes the ‘elexico’ on-line dictionary project of German at the Institut für deutsche Sprache, Mannheim – www.elexico.de]
Zock, M. and Carroll, J. (eds) (2003) Les dictionnaires électroniques: pour les personnes, les machines ou pour les deux? [Issue 44/2 of Revue TAL, published by Association pour le Traitement Automatique des Langues]
Corréard, M.-H. (ed.) (2002) Lexicography and Natural Language Processing. A Festschrift in Honour of B.T.S. Atkins. Grenoble: Euralex. [full text available electronically on the Euralex website at: www.euralex.org/elx_proceedings/Lexicography%20and%20Natural%20Language%20Processing/]
12 Dictionary Use
Lew, R. (ed.) (2011) Studies in Dictionary Use: Recent Developments, special issue of International Journal of Lexicography 24/1. Oxford: Oxford University Press. [seven articles, including an introductory one by Lew, on aspects of dictionary use research]
Magay, T. (ed.) (2006) Szótárak és használóik. [Dictionaries and their Users] Budapest: Akadémiai Kiadó.
Welker, H. A. (2006) O Uso de dicionários: Panorama geral das pesquisas empiricas. Brasilia: Thesaurus. [an overview of empirical research into dictionary use, summarizing some 220 studies]
Lew, R. (2004) Which Dictionary for Whom? Receptive Use of Bilingual, Monolingual and Semi-bilingual Dictionaries by Polish Learners of English. Poznań: Motivex. [full text available online at www.staff.amu.edu.pl/~rlew/pub/Lew_2004_book.pdf]
Tono, Y. (2001) Research on Dictionary Use in the Context of Foreign Language Learning. Focus on Reading Comprehension (Lexicographica. Series Maior 106). Tübingen: Max Niemeyer. [aims to show how research into dictionary use can contribute to the improvement of dictionary design and the clarification of issues in language learning; it summarizes previous dictionary use research and reports on studies carried out by the author]
Nesi, H. (2000) The Use and Abuse of EFL Dictionaries. How Learners of English as a Foreign Language Read and Interpret Dictionary Entries (Lexicographica. Series Maior 98). Tübingen: Max Niemeyer. [discusses experimental design problems, especially the unreliability of questionnaires; it proposes the need for detailed accounts of individual dictionary consultations and reports on a number of experiments using computers to gather information on large numbers of individual consultations]
Atkins, B. T. S. (ed.) (1998) Using Dictionaries. Studies of Dictionary Use by Language Learners and Translators (Lexicographica. Series Maior 88). Tübingen: Max Niemeyer. [reports on eight studies researching dictionary use by a range of groups, from school students to professors, and with the help of different types of dictionary, both monolingual and bilingual]
Battenburg, J. D. (1991) English Monolingual Learners’ Dictionaries. A User-oriented Study (Lexicographica. Series Maior 39). Tübingen: Max Niemeyer. [a description of eleven research projects, all in higher education institutions, that aimed to understand dictionary use better]
13 Other Works
Bański, P. and Wójtowicz, B. (eds) (2011) Issues in Modern Lexicography. München: Lincom Europa.
Zgusta, L. (2006) Lexicography Then and Now (Selected Essays edited by F. F. M. Dolezal and T. B. I. Creamer) (Lexicographica. Series Maior 129). Tübingen: Max Niemeyer. [updated and edited essays by Zgusta on a range of lexicographical topics, which shed light on and question current theory and practice]
Apresjan, J. (2000) Systematic Lexicography. Oxford: Oxford University Press. [a translation into English, by Kevin Windle, of this seminal Russian work on lexicography]
Dalby, A. (1998) A Guide to World Language Dictionaries. London: Library Association Publishing. [bibliography of the most important dictionaries of 285 languages and language groups globally, encompassing around 1,640 dictionaries]
Kachru, B. and Kahane, H. (eds) (1995) Cultures, Ideologies and the Dictionary, Studies in Honor of Ladislav Zgusta. Tübingen: Max Niemeyer. [especially good on the history of lexicography and on the cultural and ideological influences on dictionaries. Part I Contextualizing Culture; Part II Lexicography in Historical Context; Part III Ideology, Norms, and Language Use; Part IV Pluricentricity and Ethnocentrism; Part V Dictionaries across Languages and Cultures; Part VI Language Dynamics vs Prescriptivism; Part VII Language Learner as the Consumer; Part VIII Structuring Semantics; Part IX Ethical Issues and Lexicologists’ Biases; Part X Terminology across Cultures]
Shcherba, L. (1995) Towards a general theory of lexicography. International Journal of Lexicography 8/4, 314–50. [an English translation of Shcherba’s article, originally published in Russian in 1940]
Burchfield, R. W. (ed.) (1987) Studies in Lexicography. Oxford: Clarendon Press. [a collection of essays dealing with historical, period and modern regional dictionaries such as DARE (Dictionary of American Regional English) and the Australian National Dictionary]
Index academy 39, 326, 342, 374–6 adaptive technologies 45, 235, 254, 331, 332–3 Afrilex 20, 377 alphabetical ordering 167–70, 246–7 Asialex 377 audio file 101, 127, 131, 358 bidirectional 50, 215, 220, 225, 228, 261 bilingual dictionary 23, 26, 37, 50, 67, 69, 93, 213–31, 234, 260, 280, 289–94, 296, 311, 359, 412 bilingualized dictionary 23, 67, 69, 219, 290, 294 collocation(s) 24, 38, 78, 83–6, 94, 121–2, 127–9, 169, 191, 362, 363, 387 corpus (linguistics) 27, 38, 42, 77–96, 102, 151–2, 166, 169, 192, 195, 207, 218, 235, 260, 264, 278–9, 287, 294, 328, 379, 387 coverage 40, 50, 55, 156, 213 crowdsourcing 45 definition 24, 27, 37, 42, 50, 101, 111, 119, 148, 154, 173, 190, 201, 202–4, 206, 270, 296–9, 309, 362, 363, 367–8 dictionary analysis 22, 54, 56 dictionary criticism/evaluation 2, 19, 21, 48, 49, 179, 373 dictionary database 36–8, 80, 219, 280, 315, 332, 333, 334–5, 360–7, 379–80 dictionary design 25, 37–8, 70, 168–73, 306, 361, 369 dictionary research 1, 20, 181, 373, 389, 407 dictionary skills 56, 62, 68, 70, 169, 174, 176, 180, 181, 182–3, 236, 251 Dictionary Society of North America 19, 377 dictionary structure 23–5, 313, 373 dictionary use(r) 25–7, 62–74, 97, 170, 174–6, 181–3, 235, 236, 294, 373, 416 dictionary writing systems 43–4 digitization 342, 343, 353, 359–60 domain (label) 38, 90, 225
electronic (e-)dictionary 44–5, 62, 68, 70, 98, 99, 117, 170, 196, 206, 221, 251, 326, 415 e-lexicography 44, 323–40, 383, 415 embedded dictionary 46 encyclopedic dictionary 20, 154, 314, 357 etymology 22, 69, 101, 155–9, 190, 261, 265, 345 Euralex 20, 377, 408 examples 24, 26, 37, 51, 91, 195, 204–5, 226–8, 280, 363 figurative expression 223–4, 293 frequency 24, 40, 43, 79, 160, 170, 171, 192, 195, 197–8, 218, 220, 246, 264, 291, 292 full-sentence definition (FSD) 173, 192, 297, 362 function theory 315, 325, 330 grammar code 191, 192, 198, 226 guide words 26, 195, 199, 200 handbook (of lexicography) see manual handheld dictionary 196 headword 24, 40, 79–83, 107, 168, 190, 216, 222, 228, 266, 269, 368 historical lexicography 148–64, 291, 297, 341–54, 378, 411 history of lexicography 19, 21, 233, 260, 344, 373, 375 information science 324, 330, 334, 356 Iwasaki Linguistic Circle 22, 54, 382 journals 381, 409–10 labels 51, 54, 70, 84–90, 225, 226 learner’s dictionary 23, 166, 167–81, 413 lemma 25, 38, 40–2, 43, 80, 239, 243, 264, 269–72 lemma selection 38, 40–2, 264 lemmatization 49, 80–1, 237, 239–46, 247, 260, 264–9
lexicography 1, 20, 37, 46, 78, 167, 249, 254, 303, 307–8, 314, 317, 323, 330, 341, 353, 356, 373 limited defining vocabulary 167, 171, 172–3, 190, 191, 195, 201–2, 298
macrostructure 21, 23, 167, 168, 170, 219 manual (of lexicography) 19, 317, 410 meaning(s) 55, 69, 85, 102, 131, 150, 167, 169, 180, 190, 199, 203, 216, 222, 239, 242, 266, 286, 289, 293, 296, 308, 344, 351, 365 mediostructure 25, 247 megastructure 25, 219–22 metalexicography 1, 303, 309 microstructure 24, 172, 222–8 monitor corpus 81 monolingual dictionary 23, 35, 51, 67, 69, 167, 188, 216, 219, 261, 287, 291, 296, 304, 313, 359 morphology 37, 150, 168, 233, 252, 259, 267, 270, 280 multiword expression 24, 79–80, 220–1, 223, 285
navigation 103, 119, 126, 195, 295, 331, 332, 368 neologism 21, 45, 81–3, 220 nesting 168, 169 normative tradition 39, 40, 266 online dictionary 71, 103, 252, 324, 326, 327, 335, 358, 362, 367, 383 onomasiological dictionary 23, 132, 216, 296 outer text 219, 221–2, 413 parallel corpora 91–4, 218, 289, 294 pedagogical lexicography 91, 165–87, 215, 307, 408 polysemy 70, 150, 170–1, 202, 222–3, 265, 286, 294, 299 practical lexicography 1, 304, 316, 357 printed dictionary 38, 44, 170, 261, 272–3, 312, 323, 326–7, 358, 359 pronunciation 24, 37, 51, 102, 131, 150, 155, 167, 190, 259, 269, 277, 345, 358
quotation(s) 37, 51, 149, 151, 154, 309, 344, 346, 352 reference corpus 81, 83, 102 reference science 2, 228, 373 reference skills see dictionary skills reference tool 202, 324, 356, 374 regional (label) 41, 90, 150, 153, 160, 225, 270 register 70, 89–90, 154, 298 run-on 168, 285
scientific vocabulary 153, 154, 172 search techniques 91, 98, 103–5 sense ordering 291–4 senses 54–5, 85, 149, 150, 159, 170, 192, 216, 219, 223, 284–301 signposts 26, 70, 171, 195, 199–200, 206, 295 Sketch Engine 27, 78, 80, 85, 316 specialized dictionary 23, 71, 159, 261, 328, 331, 352, 356, 413 style guide 38, 43, 307 theory (of lexicography) 1, 3, 28, 303–20, 329, 357 thesaurus 85, 351–2 translation 80, 91–4, 102, 221, 228, 262, 277, 304–5, 311, 336, 359 translation equivalent 148, 251, 260 typology (of dictionaries) 1, 22, 68, 99, 373 usage note 51, 100, 191, 328, 365 user profile 45, 66, 81, 331 user studies 26–7, 45, 172, 175, 195 video clip 37, 43, 261–3, 269, 280, 348, 358, 368 word sketch 83–6, 89, 93