The Bloomsbury Handbook of Lexicography
ALSO AVAILABLE FROM BLOOMSBURY
An Introduction to English Lexicology, by Howard Jackson and Etienne Zé Amvela
On Invisible Language in Modern English, by Evelyn Gandón-Chapela
The Bloomsbury Handbook of Discourse Analysis, edited by Ken Hyland, Brian Paltridge and Lillian Wong
Vague Language, Elasticity Theory and the Use of ‘Some’, by Grace Qiao Zhang and Nhu Nguyet Le
The Bloomsbury Handbook of Lexicography Second Edition
Edited by Howard Jackson
BLOOMSBURY ACADEMIC Bloomsbury Publishing Plc 50 Bedford Square, London, WC1B 3DP, UK 1385 Broadway, New York, NY10018, USA 29 Earlsfort Terrace, Dublin 2, Ireland BLOOMSBURY, BLOOMSBURY ACADEMIC and the Diana logo are trademarks of Bloomsbury Publishing Plc First published in Great Britain 2022 Copyright © Howard Jackson and Contributors, 2022 Howard Jackson has asserted his right under the Copyright, Designs and Patents Act, 1988, to be identified as Editor of this work. Cover image © Catherine McQueen/Getty Images All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage or retrieval system, without prior permission in writing from the publishers. Bloomsbury Publishing Plc does not have any control over, or responsibility for, any third-party websites referred to or in this book. All internet addresses given in this book were correct at the time of going to press. The author and publisher regret any inconvenience caused if addresses have changed or sites have ceased to exist, but can accept no responsibility for any such changes. A catalogue record for this book is available from the British Library. A catalog record for this book is available from the Library of Congress. ISBN: HB: 978-1-3501-8170-0 ePDF: 978-1-3501-8171-7 eBook: 978-1-3501-8172-4 Series: Bloomsbury Handbooks Typeset by Integra Software Services Pvt. Ltd. To find out more about our authors and books visit www.bloomsbury.com and sign up for our newsletters.
Contents

List of Figures
List of Tables
List of Contributors

1 Introduction Howard Jackson
2 A history of research in lexicography Paul Bogaards

PART I RESEARCH METHODS AND PROBLEMS
3 Researching lexicographical practice Lars Trap-Jensen
4 Methods in dictionary criticism Kaoru Akasu
5 Researching users and uses of dictionaries Hilary Nesi
6 Methods in (meta)lexicography Howard Jackson

PART II CURRENT RESEARCH AND ISSUES
7 Using corpora as data sources for dictionaries Adam Kilgarriff
8 Researching the use of electronic dictionaries Verónica Pastor and Amparo Alcina
9 Researching historical lexicography and etymology John Considine
10 Researching pedagogical lexicography Amy Chi
11 Monolingual learners’ dictionaries – past and future Shigeru Yamada
12 Issues in compiling bilingual dictionaries Arleta Adamska-Sałaciak
13 Aspects of African language lexicography D. J. Prinsloo
14 Issues in sign language lexicography Inge Zwitserlood, Jette Hedegaard Kristoffersen and Thomas Troelsgård
15 Identifying, ordering and defining senses Robert Lew
16 A theory of lexicography – is there one? Tadeusz Piotrowski
17 Compiling dictionaries for minority and endangered languages Verna Stutzman and Kevin Warfel
18 Aspects of multi-word expressions in Asian lexicography Vincent B. Y. Ooi, Ai Inoue, Kilim Nam and Cuilian Zhao
19 Issues in onomasiological lexicography Gerardo Sierra
20 Issues in collaborative and crowdsourced lexicography Franck Sajous and Amélie Josselin-Leray

PART III NEW DIRECTIONS IN LEXICOGRAPHY
21 Theoretical, technological and financial challenges: Some reflections for making online dictionaries Pedro A. Fuertes-Olivera
22 The future of historical dictionaries, with special reference to the online OED and thesaurus Charlotte Brewer
23 The future of dictionaries, dictionaries of the future Sandro Nielsen
24 The design of internet dictionaries Annette Klosa-Kückelhaus and Frank Michaelis
25 Resources Reinhard Hartmann

Glossary of lexicographic terms Barbara Ann Kipfer
Annotated bibliography Howard Jackson
Names Index
General Index
List of Figures

7.1 Word sketch for baby (from enTenTen12, a very large 2012 web corpus)
7.2 Thesaurus entry for gargantuan
7.3 Sketch diff comparing strong and powerful
7.4 Screenshot from Linguee.com for English search term baby, language pair English-German
7.5 Bilingual word sketch, based on a parallel corpus, for red for the language pair English-French
8.1 Classification of electronic dictionaries by Lehr (1996: 315), translation in de Schryver (2003: 148)
8.2 Representation of two search techniques in a dictionary
8.3 Search by inflected form in the Ultralingua dictionary
8.4 Presence of the words agreement and lawsuit in the OneLook Reverse Dictionary
8.5 Presence of any of the words in the Wordsmyth dictionary
8.6 Presence of one word and absence of another word using operators in the CED
8.7 Presence of an exact sequence of words in TERMIUM Plus
8.8 Queries in natural language in the OneLook Reverse Dictionary
8.9 Use of part of speech filters in Cercaterm
8.10 Use of thematic area filters in Cercaterm
8.11 Search in the alphabetical list of entries in CED
8.12 Search in the inverse alphabetical list of entries in the DRAE
8.13 Search in the definition fields of the Wordsmyth
8.14 Search in the semantic relations field by navigation in the WordNet dictionary
8.15 Direct search in the semantic relations in the Wordsmyth dictionary
8.16 Search in the lexical relations field in the DiCoInfo dictionary
8.17 Search in a complementary forum in the WordReference dictionary
8.18 Search in a complementary corpus in Just The Word dictionary
8.19 Search in the DRAE thematic field index
8.20 Search in the external links access field of the OneLook Reverse Dictionary
8.21 Result of information about use in context in FrameNet
8.22 Graphical representation of relations in EcoLexicon
11.1 Entry on Wordfinder
13.1 Extract of isiZulu noun prefixes and concords from isiZulu.net (https://isizulu.net/)
13.2 Formation of the isiZulu possessive for the food of a person in Zulu e-Dict test version (Bosch and Faaß 2014: 743)
13.3 A section of the decision tree for Sepedi copulatives (Prinsloo and Bothma 2020: 90)
13.4 Results for intensify in Google
13.5 umhlinzantulo in EID
13.6 ngamasonto in isiZulu.net
13.7 Guidance on the range of application of umzala in OZSD
13.8 Syntactic information regarding the use of upša in ONSD
13.9 Data box contrasting trom, trommel and drom in RD
13.10 Sepedi Ruler (Prinsloo and De Schryver 2007: 191)
13.11 Theoretical framework of Simultaneous Feedback (De Schryver and Prinsloo 2000: 198)
14.1 Representation of a sign in a (printed) Finnish Sign Language – Finnish dictionary (Malm 1998). Entry for the sign meaning ‘expensive’, ‘precious’, ‘valuable’. The initial and final hand configurations are shown, and the movement is indicated by an arrow. (Copyright Malm 1998. Reprinted with permission of the editor.)
14.2 Entry for the sign for ‘worker’ in the online bilingual Dutch-Flemish Sign Language/Flemish Sign Language-Dutch dictionary. Sign representation through video clip and SignWriting (Van Herreweeghe et al. 2004)
14.3 Entry for the sign for ‘to cut’ in the online health care dictionary of German Sign Language. Sign representation through video clip and HamNoSys (Konrad et al. 2007b)
14.4 Two variants of the sign for ‘hair’ in DTS. Illustrations from the DTS dictionary (Center for Tegnsprog 2008–12)
14.5 Classifier predicate in NGT. Literally: ‘upright animate entity moves upwards near cylindrical entity’. In context: ‘cat goes up inside drainpipe’ (Crasborn et al. 2008, Creative Commons license BY-NC-SA)
14.6 Examples of numeral incorporation in DTS. All pictures are from the DTS dictionary (Center for Tegnsprog 2008–12)
14.7 Example of a sign entry in Global Signbank (NGT); description of all form units with their meaning (in this sign)
14.8 Example of a sign entry in the NZSL dictionary (McKee 2011)
14.9 Example of a sign entry in the DTS Dictionary (Center for Tegnsprog 2008–12)
14.10 Handshape selection window in the NZSL dictionary (McKee 2011). The user can choose from 30 handshape groups, comprising in total 63 handshapes
14.11 Selection window for place of articulation in the NZSL dictionary (McKee 2011)
14.12 Movement selection window in the Finnish Sign Language dictionary (Kuurojen Liitto 2003–5). The options are (from left to right): straight or curved movement, circular movement, twisting or bending wrist, opening or closing hand, finger wiggling, and no movement
14.13 Selection window for finger orientation in the Online Health Care dictionary of German Sign Language (Konrad et al. 2007b)
14.14 Mouth movement selection window in the Finnish Sign Language dictionary (Kuurojen Liitto 2003–5). The user can choose from 57 options, 17 shown as photos, as above and 40 rendered as text in a drop-down list
14.15 Search result list from the Finnish Sign Language dictionary (Kuurojen Liitto 2003–5). Each entry found is represented by a photo, an ID-number and the first line of Finnish equivalents (for each sense). The selected search criteria are located to the right of the result list (not shown in this screenshot)
14.16 Search result list from the NZSL dictionary (McKee 2011). Each entry found is represented by a drawing, a main gloss, secondary equivalents (if present) and the parts of speech of the sign. The selected search criteria are shown above the result list
14.17 Search result list from the DTS Dictionary (Center for Tegnsprog 2008–12). Each entry found is represented by a photo, an ID-gloss, 0–3 relevance markers and the first Danish equivalent (of each sense). The selected search criteria are shown to the left of the result list
17.1 Timeline of significant dictionary publications
17.2 How much of the Lexicon does a person need to know? (Echerd 2019: 11)
17.3 Semantic Domain = cluster of words (Moe 2007: 2)
17.4 Sample Semantic Domain Questionnaire
17.5 Traditional oral literature (Van den Berg 2012: 7)
17.6 Muna data (Van den Berg 2012: 8)
17.7 English reversal index (Captain and Captain 2019, as cited in Stutzman, Warfel and Bryson 2020c: 27)
17.8 Entry with audio and colour-coded language data (Lopez and Broadwell 2013)
17.9 Dictionary entry (Niggli 2016b, as cited in Stutzman, Warfel and Bryson 2020b: 4)
17.10 Structure of an entry (Stutzman, Warfel and Bryson 2020b: 7)
17.11 Subsenses of senses (Job 2020)
17.12 Subentries vs. senses
17.13 Cross references (Niggli 2016a, as cited in Stutzman, Warfel and Bryson 2020b: 27)
18.1 Development of the bilingual Lexicon
18.2 Representation of acculturated chunks
19.1 Dual approach between semasiology and onomasiology
19.2 Uses of the dictionaries
19.3 Example of a graph
21.1 Green colour for the equivalent in the DWS of the Diccionarios Valladolid-UVa
21.2 Orange colour for the equivalent in the DWS of the Diccionarios Valladolid-UVa
23.1 Search result helping users to understand the word or term ‘notification’
23.2 Search result helping users to write texts with the word or term ‘notification’
23.3 Search result helping users to translate the word or term ‘notification’
23.4 Data providing help to find a word or term where the meaning is known
23.5 Help in cognitive situations providing all data addressed to the search word
24.1a Heat-map of participants in an eye-tracking study scanning the OWID website as a whole (Müller-Spitzer, Koplenig and Michaelis 2014: 724)
24.1b Heat-map of participants in an eye-tracking study scanning the OWID website for all dictionaries included (Müller-Spitzer, Koplenig and Michaelis 2014: 724)
24.2 Entry ‘administrator’ in Dictionary of South African English
24.3 Examples for design primitives in Dictionary of South African English
List of Tables

5.1 Activities associated with dictionary use
7.1 Complexity in verb lemmatization rules for English
7.2 The ‘most passive’ verbs in the BNC, for which a ‘usually passive’ label might be proposed
8.1 Summary of our classification of search techniques for electronic dictionaries
8.2 Summary of the electronic dictionary analysis
10.1 Samples of recent research studies of dictionary users published in the International Journal of Lexicography
11.1 Development of Mainstream EFL Dictionaries
11.2 Indication of ‘want+object+to-infinitive’ in Major EFL Dictionaries*
11.3 EFL Dictionaries Adopting Signposts and Menus
13.1 A sentence in the indicative mood in Sepedi
13.2 Negation strategies for a verbal mood
13.3 Extract from the possessive construction for Sepedi noun classes
18.1 Linguistic and situational contexts for 麻烦 má fan (sense 1)
18.2 Acculturated example translation
18.3 The translation of English familiar idioms into Japanese by L2 learners
18.4 The translation of Japanese familiar idioms into English by L2 learners
19.1 Description of euthanasia by users
19.2 Comparison between onomasiological dictionaries
19.3 Comparison of electronic onomasiological dictionaries
25.2 Associations
25.3 Corpora/Databases
25.4 Journals
25.5 Networks
25.6 Online dictionaries
25.7 Publishers
List of Contributors
Arleta Adamska-Sałaciak is Professor and Head of the Lexicography and Lexicology Research Unit in the Faculty of English, Adam Mickiewicz University in Poznań. She is the author of Meaning and the Bilingual Dictionary (2006), as well as a number of works on metalexicography, lexical semantics, philosophy of linguistics, theory of language change and nineteenth-century linguistic thought. She has co-authored and edited several bilingual dictionaries with English and Polish, published, among others, by HarperCollins and Pearson-Longman. In 2016–18, she was Yunshan Chair Professor at the Center for Lexicographical Studies at Guangdong University of Foreign Studies in Guangzhou. Since 2018, she has been a member of the Quality Assurance Committee of the European Master in Lexicography (EMLex) Programme.

Kaoru Akasu is Professor of English linguistics at Toyo University in Tokyo, Japan. He was President of the JACET Society of English Lexicography from 2007 to 2013. His fields of interest include pedagogical and bilingual lexicography as well as contrastive studies of English and Japanese. He is editor-in-chief of Compass Rose English-Japanese Dictionary and a co-editor of Lighthouse English-Japanese Dictionary and Luminous English-Japanese Dictionary, all published by Kenkyusha, one of the major dictionary publishers in Japan, and he has also contributed to other dictionary projects. He is also interested in dictionary criticism and has published a number of articles in which he has conducted critical analyses of such dictionaries as OALD6, CALD, NODE, CIDE and LDOCE2, among others. He co-edited Lexicography: Theoretical and Practical Perspectives, Proceedings of the Seventh ASIALEX Biennial International Conference, 2011, Kyoto, and his recent dictionary analyses ‘The first dictionary of English collocations in Japan’ and ‘On the Three Editions of the Kenkyusha’s Dictionary of English Collocations: A Comparative Analysis’ are found in Research on Phraseology in Europe and Asia: Focal Issues of Phraseological Studies and Linguo-Cultural Research on Phraseology, respectively.

Amparo Alcina is Professor at the Universitat Jaume I of Castellón (Spain), where she teaches Translation Technology and Terminology in a Translation Degree. She completed a master’s degree in Computational Linguistics at the University of Barcelona in 1993, and defended her PhD thesis on Nominal Phrases and Reference at the University of Valencia in 1999. She was the Director of the Master’s Degree in Translation Technology and Localization at the Universitat Jaume I and coordinates the research team TecnoLeTTra (http://tecnolettra.uji.es), which focuses on language, terminology and translation technology. She leads the ONTODIC research projects, which work on the creation of onomasiological and combinatory dictionaries based on ontologies, as well as other research and educational projects on digital dictionaries, specialized corpora and translation memories. Currently, she researches the digitalization of lexical and terminological resources using descriptive logics and ontologies. She has published articles in journals such as Meta, Target, Perspectives: Studies in Translation, Terminology, International Journal of Lexicography and The Interpreter and Translator Trainer.

Paul Bogaards was born in Leiden, the Netherlands, in 1940. He was an experienced teacher and teacher trainer in French, applied linguistics and didactics, and taught French, applied linguistics and lexicology at Leiden University from 1976 to 2002. He compiled several Dutch/French dictionaries, published by Van Dale Lexicografie (Utrecht and Antwerp) and Dictionnaires Le Robert (Paris), wrote books on learner characteristics, language learning, and vocabulary learning in a second language, published numerous papers in various journals and was the editor of the International Journal of Lexicography from 2002 until his death on 3 October 2012. His main research interests concerned the acquisition and testing of lexical knowledge in a second language and the use and usability of dictionaries.

Charlotte Brewer has published widely on the OED and on dictionaries more generally, including Treasure-House of the Language: The Living OED (2007), a history of the OED in the twentieth and twenty-first centuries (see further https://oed.hertford.ox.ac.uk). She is currently co-editing an edition of the papers of the first chief editor of the OED, J. A. H. Murray, held by the Bodleian Library in Oxford. Other publications include Editing Piers Plowman: The Evolution of the Text (1996) and the co-edited Traditions and Innovations in the Study of Middle English Literature: The Influence of Derek Brewer (2013). She is Professor of English Language and Literature at Oxford University and a fellow of Hertford College, Oxford.

Amy Chi teaches English for academic purposes to EFL/ESL students at the Center for Language Education of the Hong Kong University of Science and Technology. The core of her research work is in the area of pedagogical lexicography underpinning teaching and learning English as a second language. The scope of her research includes EFL learners’ dictionaries, dictionary use training (teaching methodology and material writing) and second language vocabulary acquisition. The greater part of her publications has been research work related to Hong Kong students’ use of English dictionaries to assist their language learning. She was an Advisor for the compilation of the Macmillan English Dictionary (First edition, 2002). She is the founding Secretary (1997–9) and Executive Board member (1999–2011) of ASIALEX. Her recent publications include ‘Reconstructing the Lexicographical Triangle through Teaching Dictionary Literacy to Teachers of English’ (2020) and ‘A Review of Longman Dictionary of Contemporary English (6th edition)’ (2016), both published in Lexicography, Journal of ASIALEX. She gave a keynote speech entitled ‘Dictionary Use Training in the EFL Classroom: Are English Teachers Prepared for the Task?’ at the ASIALEX Conference 2019, hosted by Istanbul University.

John Considine is Professor of English at the University of Alberta, Canada, and was formerly an assistant editor of the Oxford English Dictionary. He is editor of The Cambridge World History of Lexicography (2019) and has written and edited a number of other books about dictionaries. The first volume of his history of English dictionaries in the sixteenth, seventeenth and eighteenth centuries is forthcoming from Oxford University Press.

Pedro A. Fuertes-Olivera is Full Professor at the University of Valladolid, Extraordinary Professor (Department of Afrikaans and Dutch, University of Stellenbosch), and Tutor at the Spanish Open University. He has also taught at Aarhus University (Visiting Scholar 2007 and Velux Visiting Scholar 2011–12) and Guangdong University of Foreign Studies (October 2008 and 2017). In the last five years, he has also been invited to lecture at conferences and workshops held in different countries around the world. His interest lies in lexicography, translation and language teaching. He has published around 160 academic papers, books and dictionaries, most of which have appeared in well-known academic journals and international collections. He is a member of the Advisory Board of several national and international journals. He is currently the main researcher of several funded research projects and the Director of the International Centre for Lexicography, a research group devoted to dictionary making and theory building. He is also working with Ordbogen.com, the ‘supergazelle’ Danish company that has earned several consecutive Gazelle awards for Danish high-growth companies. Some of his recent books are The Routledge Handbook of Lexicography (2018), Theory and Practice of Specialised Online Dictionaries: Lexicography versus Terminography (2014), e-Lexicography: The Internet, Digital Initiatives and Lexicography (Continuum, now Bloomsbury, 2011), Specialised Dictionaries for Learners (2010) and Pedagogical Specialised Lexicography. The Representation of Meaning in Business and Spanish Business Dictionaries (2008).

Reinhard R. K. Hartmann grew up in Vienna (Austria), studying economics and translation, and enjoyed an academic career at three English universities, specializing in applied linguistics and producing nineteen books, including the Dictionary of Lexicography (co-author Gregory James, 1998/2001), the textbook Teaching and Researching Lexicography (2001) and the three-volume reader Lexicography. Critical Concepts (2003). After his retirement from the University of Exeter, he was honorary professor at Birmingham University, where the Dictionary Research Centre was moved in 2001. The LEXeter 1983 conference led to EURALEX and other international initiatives, such as the Lexicographica Series Maior, of which he was one of the editors from 1984 to 2007. He has compiled a list of Reference Portals for the EURALEX website and has been working on an International Directory of Lexicography Institutions.

Ai Inoue is Professor at Toyo University in the Department of Economics. Her research interests focus on English phraseology from lexicographical and educational standpoints. She has published a large number of papers on English phraseology in journals and delivered oral presentations at international conferences. Her most recent book is Working toward Systematization of English Phraseology from the Three Perspectives of Morphology, Semantics and Acoustic Phonetics (2019) (original in Japanese).

Howard Jackson is Professor Emeritus of English Language and Linguistics, originally in the School of English at Birmingham City University, UK, where he taught for over forty years. He taught modules on lexicography and corpus linguistics, as well as on grammar and lexicology. He is the author of Lexicography: An Introduction (2002) and (with Etienne Zé Amvela) of Introduction to English Lexicology (Third edition, Bloomsbury 2021). He has contributed chapters to the Routledge Handbook of Lexicography (ed. P.A. Fuertes-Olivera 2018) and The Cambridge Companion to English Dictionaries (ed. S. Ogilvie 2020).

Amélie Josselin-Leray is Associate Professor in Linguistics and Translation Studies in the Translation Department (D-TIM) of the University of Toulouse Jean Jaurès, France, which she ran for five years. Prior to this, she was Associate Professor for ten years in the English Studies Department. She is now in charge of the Master’s Programme in Translation Studies, which is part of the EMT network. After working as a lexicographer and reviser for the Canadian Bilingual (French/English) Dictionary Project at the University of Ottawa for two years in the late 1990s, she completed a PhD in Translation and Multilingual Lexicology & Terminology at the University Lyon 2, France in 2005. Within the research lab CLLE (CNRS & University of Toulouse 2), her research focuses mainly on lexicography, terminology and corpus linguistics. Her research interests have recently shifted to the use of dictionaries, term banks and translation technologies in the translation process and to the study of crowdsourced and collaborative dictionaries. She has recently published in Neologica, Études de linguistique appliquée and Lexis.

Adam Kilgarriff was both Director of Lexical Computing Ltd. and a research scientist working at the intersection of lexicography, corpus linguistics and language technology. His company developed the Sketch Engine (https://www.sketchengine.eu), a leading tool for corpus research used for linguistic research, translation and dictionary-making at Oxford University Press, Cambridge University Press and many other companies and universities. His PhD, on ‘polysemy’, was from the University of Sussex and he subsequently worked at Longman Dictionaries and the University of Brighton. He was a Visiting Research Fellow at the University of Leeds. He was active in moves to make the web available as a linguists’ corpus and was the founding chair of ACL-SIGWAC (Association for Computational Linguistics Special Interest Group on Web as Corpus). Adam died on 16 May 2015. See also http://www.kilgarriff.co.uk.

Barbara Ann Kipfer is a lexicographer, currently for Zeta Global. She has worked for such companies as Dictionary.com and Thesaurus.com, Answers.com, Ask Jeeves, Bellcore/Telcordia, General Electric Research, IBM Research, idealab, Knowledge Adventure and Wolfram|Alpha. Barbara holds a PhD and MPhil in Linguistics (University of Exeter), a PhD in Archaeology (Greenwich University), an MA and a PhD in Buddhist Studies (Akamai University), and a BS in Physical Education (Valparaiso University). She is the author of 14,000 Things to Be Happy About and eighty other books including thesauri and dictionaries, trivia and question books, archaeology reference books, and happiness and spiritually themed books. Her websites are www.referencewordsmith.com and www.thingstobehappyabout.com.

Annette Klosa-Kückelhaus holds an MA and PhD in German linguistics from the universities of Munich and Bamberg. She has been a lexicographer for Duden and has (co-)authored extensively on lexicography. Currently she heads the Lexicography and Language Documentation area and is chief editor of an online dictionary of neologisms at the Leibniz-Institute for the German Language (IDS) at Mannheim.
xviii
List of Contributors
Jette Hedegaard Kristoffersen was trained as a sign language interpreter at the Copenhagen Business School and subsequently received a BA in Linguistics from the University of Copenhagen. She has since worked in the field of lexicography and taught linguistics and ethics for sign language interpreter training at the Centre for Sign Language (now a department of University College Copenhagen). She was the leader of the Danish Sign Language Dictionary and the Danish Sign Language Corpus 1999–2020 at the Centre for Sign Language. She is a member of the Danish Sign Language Board. She has written sign language–related articles for several publications, among these a chapter on sign language lexicography in Electronic Lexicography (2012) and chapters in SignGram Blueprint – A Guide to Sign Language Grammar Writing (E-book, 2017). Robert Lew is Professor in the Faculty of English at Adam Mickiewicz University in Poznań, Poland. His interests centre on dictionary use, and he has been involved in a number of research projects, including topics such as access-facilitating devices, definition formats, dictionaries for production, writing assistants, digital dictionary interfaces and training in dictionary skills. He is the Editor of the International Journal of Lexicography (Oxford University Press). He has also worked as a practical lexicographer for various publishers, including Harper-Collins, PearsonLongman and Cambridge University Press. Frank Michaelis holds an MA in Philosophy and German linguistics from the University of Göttingen. He has been lexicographer for the ‘Deutsches Wörterbuch von Jacob und Wilhelm Grimm, Neubearbeitung’ and later specialized in internet lexicography. Currently, he is Researcher in the Department for Lexical Studies at the Leibniz-Institute for the German Language (IDS) at Mannheim and responsible for the development of the online dictionary portal OWID. 
Kilim Nam is a professor at Kyungpook National University (Daegu, South Korea) in the Department of Korean Language and Literature. She holds a PhD in Korean linguistics (‘On the Copula ida Structures in Contemporary Korean’, 2004) from Yonsei University (Seoul). Her research focuses on corpus linguistics and language performance and she has published on neologism and lexicography, including The Korean Neologism Investigation Project: Current Status and Key Issues (2020). Hilary Nesi is Professor of English Language at Coventry University, UK. Her research activities mostly concern the design and use of reference tools, corpus analysis and the use of English for academic purposes. She was principal investigator for the projects to create the BASE corpus of British Academic Spoken English and the BAWE corpus of British Academic Written English, and she is Editor in Chief of the Journal of English for Academic Purposes and the Elsevier Encyclopedia of Language and Linguistics (forthcoming), She is also lead educator for the FutureLearn MOOC ‘Understanding English Dictionaries’. Sandro Nielsen is affiliated with the Department of English, School of Communication and Culture at the Faculty of Arts, Aarhus University, Denmark, where he is Associate Professor. He has an MA in English (LSP for translators and interpreters) from 1987 and was awarded his PhD degree in specialized lexicography in 1992. He has published extensively on theoretical and xix
List of Contributors
practical lexicography and is the author of The Bilingual LSP Dictionary: Principles and Practice for Legal Language (1994), co-editor of Lexicography in the 21st Century (2009), Lexicography at a Crossroads (2009). He is the author and co-author of a printed and an online bilingual law dictionary, three printed and twelve online accounting dictionaries, and a contributor to the Manual of Specialised Lexicography (1995). His main research areas are principles for online LSP dictionaries, user guides in dictionaries, lexicographic information costs and academic dictionary reviewing. Teaching interests focus on lexicography and legal translation for translators and interpreters. Vincent B. Y. Ooi is Associate Professor of English Language and Linguistics at the National University of Singapore. His teaching and research interests include lexicology and lexicography, corpus linguistics, computer-mediated communication and varieties of English. Some recent publications include articles on ‘Using the Web for Lexicographic Purposes’ (Routledge Handbook of Lexicography) and ‘Lexicography and World Englishes’ (in World Englishes: Rethinking Paradigms, Routledge). Verónica Pastor is Associate Professor at Valencian International University (VIU). She was a PhD researcher at Universitat Jaume I of Castellón (Spain), Department of Translation and Communication, where she was collaborating in the TecnoLeTTra research team (http:// tecnolettra.uji.es), which focuses on language, terminology and translation technology. She completed a master’s degree in Translation Technology and Localization at Universitat Jaume I in 2008, and she obtained her PhD degree in translation, terminology and the knowledge society from Universitat Jaume I of Castellón in 2013. She received an extraordinary doctorate award in 2015 for her PhD thesis on techniques and strategies in electronic resources for onomasiological searches. 
Her recent publications focus on search techniques in electronic dictionaries.
Tadeusz Piotrowski is Professor of Linguistics at Wrocław University in Poland. He has published more than one hundred scholarly papers and reviews in journals and books in Poland, the United Kingdom, the United States, Hungary, the Czech Republic, Russia and Germany, as well as three books on lexicography. He is interested in the bilingual and monolingual lexicography of Slavic languages and English, in both their theoretical and practical aspects, as well as in computer lexicography; and he has edited over thirty dictionaries published in Poland and Germany. His major current publications are on semantics in a computer relational lexico-semantic dictionary, the Polish Wordnet (plWordnet).
Danie Prinsloo is Professor in African languages and former head of the department of African languages and chair of the School of Languages at the University of Pretoria. He is well known nationally and internationally for his groundbreaking research in the field of African language lexicography. He is the author or co-author of more than 100 scientific articles, books and dictionaries and has presented more than 100 papers on African language lexicography. He started the corpus era in Africa two decades ago and plays a major role in the development of corpus-based lexicography and in particular new designs for paper and electronic dictionaries. He is a five-time awardee of the Exceptional Academic Performer Award of the University of Pretoria and is acknowledged as one of the University’s Centenary Leading Minds. In 2010 he
received an award from the Pan South African Language Board for his contribution to effective innovation of technology to promote multilingualism.
Franck Sajous is a Research Engineer at CLLE, a research lab in linguistics and psychology at the CNRS & University of Toulouse Jean Jaurès, France. He has an educational background in computer science and joined his current lab in 2003, where he has been involved in corpus linguistics and NLP research. His research interests initially lay in the study of free online dictionaries as potential resources for NLP. He has built several electronic lexicons and machine-readable dictionaries based on the English, French and Italian editions of Wiktionary. He then conducted quantitative and qualitative studies aiming to describe amateur dictionaries in terms of lexical coverage, treatment of neology and specialized domains, neutrality vs. point of view, etc. Recently, he has compared the dictionaries written by the crowds to those written by professional lexicographers in order to assess their complementarity rather than seeing them as competitors. He has recently published in Neologica, Études de linguistique appliquée and Lexis.
Gerardo Sierra Martinez is Researcher and Head of the Language Engineering Group at Universidad Nacional Autónoma de México (UNAM). His work focuses on research and development in corpus linguistics and computational lexicography. Regarding the former, he has published the book Introduction to corpus linguistics, which constitutes a reference in the linguistic and language technology community. He has put more corpora on the Internet than any other researcher in Mexico, with his own technology, including the GECO corpus manager. Among them are the Corpus of Sexualities in Mexico, the RST Spanish Treebank and the Parallel Corpus of Mexican Languages. In computational lexicography, his work on onomasiological dictionaries, terminological extraction systems and definitional contexts is recognized worldwide.
His websites include www.corpus.unam.mx and www.iling.unam.mx.
Verna Stutzman obtained an MA in Linguistics from the University of Manitoba in 1978 and an MA in Counselling from Dallas Baptist University in 2005. She did field work in Papua New Guinea from 1984 to 1995 under the auspices of SIL Papua New Guinea as a linguist/translator in the Tauade language project (https://www.webonary.org/tauade/), and then with the Lou language community (https://www.webonary.org/loudictionary/). From 1995 to 2005, she was part of the LinguaLinks development team for SIL International and is now Coordinator for SIL International Dictionary and Lexicography Services. Verna is based in Greenville, Texas. In 2018, she co-authored ‘Single-event Rapid Word Collection Workshops: Efficient, Effective, Empowering’ with Brenda Boerger. As of January 2020, Verna has coordinated the publication of more than 250 dictionaries on Webonary.org from more than 40 countries and published 12 Dictionary Apps in the Google Play store under the Webonary-SIL label. Her team has facilitated 13 Rapid Word Collection workshops in 10 countries and converted around 20 lexical databases in a variety of formats into Fieldworks Language Explorer software. Currently, she is leading the SIL International Dictionary and Lexicography team in authoring an online Dictionary-Making and Lexicography course, https://sites.google.com/sil.org/dls-course/.
Lars Trap-Jensen has an educational background in general linguistics, Greenlandic and social studies (Aarhus University), with an MPhil in linguistics (Cambridge University). He has been a
lecturer in Danish language at the universities of Basel and Zürich. Since 1994 he has been working as a practical lexicographer at the Society for Danish Language and Literature, Copenhagen, and since 2003 as the managing editor of The Danish Dictionary and the dictionary site ordnet.dk. Other projects include the digitization of the Ordbog over det danske Sprog (Dictionary of the Danish Language, 28 volumes + 5 supplementary volumes), development of the Danish wordnet, DanNet, and the Danish Thesaurus. He is engaged in promoting the field of lexicography, with active memberships in the Danish Association for Lexicography, in the Nordic Association for Lexicography, in the European Association for Lexicography, EURALEX (former President), and in Globalex. With over fifteen publications in the last five years, he has published most recently in Oslo Studies in Language (2020), Dictionaries. Journal of the Dictionary Society of North America Vol. 41, Issue 1 (2020), and Slovenščina 2.0, Vol. 7, Issue 1 (2019).
Thomas Troelsgård received an MA in Russian language and computational linguistics from the University of Copenhagen and has since then worked in the field of lexicography. He participated in the development of the Danish Sign Language Dictionary at the Centre for Sign Language (now a department of University College Copenhagen). At present he is working partly at the Centre for Sign Language, and partly at the Society for Danish Language and Literature. Thomas Troelsgård has written sign language–related articles for several publications, among these a chapter on sign language lexicography in Electronic Lexicography (Oxford University Press, 2012).
Kevin Warfel obtained a BA in mathematics from Goshen College in 1984 and an MA in linguistics from the University of Texas at Arlington in 1987.
From 1990 through 2009, Kevin worked with SIL in Burkina Faso, West Africa, serving first in administration, then as a field linguist with the Puguli (pug) language community, and finally as a language technology consultant. While in Burkina Faso, Kevin presented a paper on the Puguli numbering system at a local colloquium and produced a draft of a phonological description of Puguli (in French only). Since July 2009, he has been based in Waxhaw, North Carolina. Kevin worked with the SIL software development team as a software analyst for a couple of years. Since October 2012, as the Associate Dictionary & Lexicography Services Coordinator, he has overseen Rapid Word Collection Workshops. He is currently assisting Verna Stutzman with the writing of an online lexicography course and plans to author a chapter on Rapid Word Collection in a book due to be published in 2022.
Shigeru Yamada is Professor at Waseda University, Japan. He was a Co-Editor-in-Chief of Lexicography: Journal of ASIALEX (2016–20). He specializes in EFL and bilingual lexicography. His recent publications include Guide to the Practical Usage of English Monolingual Learners’ Dictionaries: Effective Ways of Teaching Dictionary Use in the English Class (2014, OUP), a contribution to the Oxford Companion to the English Language (2/e, 2018), and ‘Development of Specialized Hand-held Electronic Dictionaries with Special Reference to Those for Medical Professionals and Students’ (2019, 3L: Language, Linguistics, Literature®). He has contributed to a number of dictionary projects, such as Dictionnaire Japonais-Français/Français-Japonais (2009, Assimil/K Dictionaries) and Daily Concise Japanese-English Dictionary (8/e, 2016, Sanseido).
Cuilian Zhao is Professor and Vice Director of the Lexicography Research Center at Sichuan International Studies University, Chongqing. Her research interests focus on bilingual lexicography, psycholinguistics and stylistics. Her most recent publications are ‘Representation and Motivations of the Semantic and Contextual Information of Polysemous Entries in the Chinese-English Dictionary (Unabridged)’, Lexicographical Studies 2017(4):1-9; ‘On Lexical Equivalence in the Bilingual Dictionary – Philosophical and Mental-Representation Reflections’, Proceedings of the 13th International Conference of the Asian Association for Lexicography (ASIALEX 2019).
Inge Zwitserlood obtained MA and PhD degrees in linguistics from Utrecht University, specializing in sign language linguistics early on, in particular the morphology and morphosyntax of sign languages. She is affiliated with the Centre for Language Studies at Radboud University Nijmegen, where she has, among other things, been involved in the construction of a sign language corpus. Currently, she is working on Sign Language of the Netherlands (NGT) and Turkish Sign Language. Her recent publications include: ‘Classifiers’, in R. Pfau, M. Steinbach and B. Woll (eds) Sign Language: An International Handbook (2012, pp. 158–86); (with P. M. Perniss and A. Ozyurek) ‘An Empirical Investigation of Expression of Multiple Entities in Turkish Sign Language (TİD): Considering the Effects of Modality’ (Lingua 122, 2012, 1636–67); (with O. Crasborn and J. Ros) ‘The Corpus NGT: An Online Corpus for Professionals and Laymen’ (2008; available at http://www.ru.nl/corpusngt).
1
Introduction Howard Jackson
This introductory chapter has two purposes: to characterize the field of lexicography; and to outline the aims and scope of the book.
1 What is lexicography? The term ‘lexicography’ is used in two distinct senses: first, it refers to the compilation of dictionaries; and second, it refers to the study of dictionaries. It is the first sense that is usually represented in dictionary definitions, such as that in Collins English Dictionary (online): ‘the process or profession of writing or compiling dictionaries’. The terms ‘practical lexicography’ or ‘lexicography practice’ are used for the first sense, and ‘lexicography theory’ or ‘dictionary research’ for the second. We will return to the question of theory a little later. For the second sense, the term ‘metalexicography’ has also been coined, and those who engage in the study or research of dictionaries are called metalexicographers. It is to this audience, and especially to those who are becoming metalexicographers, that this volume is primarily addressed. Dictionary research covers a wide range of activities, and metalexicographers may become experts in one or more aspects, many of which are represented in the contributions to this volume. Some concentrate on the history of dictionary making, others on historical dictionaries. Some investigate the typology of dictionaries, distinguishing monolingual from bilingual, historical from synchronic, general from specialized, alphabetical (semasiological) from thematic or topical (onomasiological). Others investigate the compilation process, including the use of corpora, or the design and structure of dictionaries. Still others engage in dictionary criticism, evaluating the structure and content of dictionaries, both in general terms (the macrostructure) and in terms of the information contained in individual entries (the microstructure). Some concentrate their interest on pedagogical lexicography, the provision of dictionaries for language learners, and how these can contribute to the learning process, or on dictionaries for sign languages, as pedagogical aids to deaf communities. 
Others research the users and uses of dictionaries more generally, seeking to discover how different groups use dictionaries, and whether the design, structure and content of dictionaries match users’ expectations and reference skills. Still others have turned their attention to e-lexicography, which is almost certainly the future of lexicography – at least one publisher has already ceased print publication and is offering its dictionary only on the internet (Macmillan Dictionary Blog, 5 November 2012). Where, we may ask, does lexicography sit in the panoply of academic and scholarly disciplines? Is it a branch of linguistics? Or does it belong somewhere else? Or is it an independent
discipline? There is no single answer to these questions from those engaged in metalexicography, as you will observe from the contributions to this volume. In 1996, Reinhard Hartmann wrote a chapter (in Hartmann (ed.) 1996) entitled ‘Lexicography as an Applied Linguistic Discipline’. Hartmann’s criteria for an applied linguistic discipline, as expressed in Hartmann (2001: 33), were that it should be ‘linguistic in orientation, interdisciplinary in outlook and problem-solving in spirit’, a description that he claims (Hartmann 2001) applies to pedagogical lexicography and perhaps to computational lexicography, though not to other aspects of (meta)lexicography. Indeed, the introduction to the Dictionary of Lexicography (Hartmann and James 1998: vi) proclaims:
Lexicography, often misconceived as a branch of linguistics, is sui generis, a field whose endeavours are informed by the theories and practices of information science, literature, publishing, philosophy, and historical, comparative and applied linguistics.
McArthur (1998: 219) places lexicography within a newly minted discipline of ‘reference science’, along with ‘encyclopedics’ and ‘tabulations’, ‘directories’ and ‘catalogues’ (see also McArthur 1986). Addressing the question ‘Who is a lexicographer?’, Bergenholtz and Gouws (2012) are adamant that lexicography is not a subdiscipline of linguistics; it is ‘an independent discipline’, part of information science; there is a range of people who can be called lexicographers, and they do not necessarily have a background in linguistics. Indeed, lexicographers include both ‘those people writing dictionaries but equally those people writing about dictionaries’ (Bergenholtz and Gouws 2012: 76). The practice of lexicography was for centuries independent of linguistics.
It was Philip Gove, the editor-in-chief of Webster’s Third New International Dictionary (1961), who was the first practising lexicographer to acknowledge explicitly the influence of modern linguistics on the compilation of the dictionary of which he was editor (Adams 2020: 163). Since then, linguistics has informed lexicographic practice, especially in respect of the genre of learners’ dictionaries, and to an extent that of native-speaker dictionaries as well. Lexicographic practice has, though, always drawn on a range of other disciplines and crafts, relevant to reference works generally. For this reason, many lexicographers consider it to be either a discipline in its own right (sui generis) or a branch of reference science or information science. One of the debates within lexicography addresses the issue of whether there is such a thing as lexicographic theory. Some would respond to this question with a resounding ‘No’ (e.g. Béjoint 2010). Others give a cautious ‘No’, e.g. Rundell (2012: 83): ‘it is not clear that there is a role for “lexicographic theory” as such.’ Hartmann (1996) makes a distinction between ‘practical’ lexicography (dictionary compilation) and ‘theoretical’ lexicography (dictionary study). Not all see lexicographic theory in this way. You will find that some contributors to this volume, notably Piotrowski (in 4.10), respond to the question, understanding it in a more scientific sense, with an unequivocal ‘Yes’ and present arguments why this should be so. We are engaged in a fascinating discipline. Long may the discussions continue!
2 Aims and organization of the Handbook This Handbook of Lexicography is aimed primarily at students of lexicography who are proposing to undertake research in one of the areas covered by ‘lexicography’. While it cannot possibly
be comprehensive in its coverage – think of the four-volume encyclopaedia of lexicography (Hausmann et al. 1989–91, Gouws et al. 2014) – the Handbook aims to give a broad overview of the discipline, dealing with the main trends and issues in the contemporary study of lexicography; and the contributions have been selected with this purpose in mind. They are all written by experts in their field who are at the cutting edge of lexicographic research. Originally published as The Bloomsbury Companion to Lexicography (Jackson (ed.) 2013), the Handbook represents an updated and expanded edition of the Companion. Most of the original chapters have been updated by their original authors; Chapters 2 and 7, however, remain as in the Companion, as both authors (Paul Bogaards and Adam Kilgarriff) have passed away in the intervening years. There are six additional chapters, and a names index has been added. The Handbook contains some twenty-five contributions in three parts, plus this Introduction and Chapter 2, which is a free-standing review of research in lexicography over the last six or so decades. Part I (Chapters 3–6) contains four contributions on research methods and problems; Part II (Chapters 7–20) contains fourteen contributions on current research and issues in lexicography; and Part III (Chapters 21–25) looks forward with four contributions on directions in which the study of lexicography appears to be travelling, as well as a compendium of resources considered to be useful to a lexicography researcher. The volume concludes with a glossary of key terms in lexicography and an annotated bibliography of recent significant work in lexicography research, as well as pointers to where further bibliographical information may be found. Rather than have a single list of references at the end of the book, it has been decided to retain the reference list supplied by each author at the end of their contribution.
While the disadvantage may be that some general works will be referenced more than once, this will not be the case with the majority of the references, as each chapter concentrates on one specific area of lexicography research and references the works pertinent to that area. The advantage is that the reader will more readily be able to ascertain the references that pertain to each individual contribution and area of research.
References
Adams, M. (2020), ‘The Making of American English Dictionaries’ in S. Ogilvie (ed.), The Cambridge Companion to English Dictionaries, Cambridge: Cambridge University Press, 157–69.
Béjoint, H. (2010), The Lexicography of English. From Origins to Present, Oxford: Oxford University Press.
Bergenholtz, H. and R.H. Gouws (2012), ‘Who is a lexicographer?’, Lexicon 42, 68–78.
Collins English Dictionary (online) at www.collinsdictionary.com [accessed 21 September 2020].
Gouws, R.H., U. Heid, W. Schweickard and H.E. Wiegand (eds) (2014), Dictionaries. An International Encyclopedia of Lexicography. Supplementary Volume: Recent Developments with Special Focus on Computational Lexicography, Berlin and New York: Mouton de Gruyter.
Hartmann, R.R.K. (2001), Teaching and Researching Lexicography, Harlow: Pearson Education.
Hartmann, R.R.K. (ed.) (1996), Solving Language Problems, Exeter: University of Exeter Press.
Hartmann, R.R.K. and G. James (1998), Dictionary of Lexicography, London and New York: Routledge.
Hausmann, F-J., O. Reichmann, H.E. Wiegand and L. Zgusta (1989–91), Wörterbücher/Dictionaries/Dictionnaires: An International Encyclopedia of Lexicography, vols 1–3, Berlin: Walter de Gruyter.
Jackson, H. (ed.) (2013), The Bloomsbury Companion to Lexicography, London and New York: Bloomsbury Academic.
Macmillan Dictionary Blog (2012), http://www.macmillandictionaryblog.com/bye-print-dictionary [accessed 7 November 2012].
McArthur, T. (1986), Worlds of Reference: Lexicography, Learning and Language from the Clay Tablet to the Computer, Cambridge: Cambridge University Press.
McArthur, T. (1998), Living Words: Language, Lexicography and the Knowledge Revolution, Exeter: Exeter University Press.
Rundell, M. (2012), ‘It works in practice but will it work in theory? The uneasy relationship between lexicography and matters theoretical (Hornby Lecture)’ in R.V. Fjeld and J.M. Torjusen (eds), Proceedings of the 15th EURALEX Congress, Oslo: University of Oslo, 47–92.
Webster’s New International Dictionary, Third edition (1961), ed. Philip Gove, Springfield, MA: Merriam-Webster.
2
A history of research in lexicography Paul Bogaards
1 Historical overview Although dictionary criticism is almost as old as dictionaries themselves (at least as far as the Western world is concerned, see Hartmann 2008: 136) and even if we can find more theoretical reflections on dictionaries from the sixteenth century on (Hausmann 1989), a more focused scientific study of lexicographical works dates from the middle of the twentieth century only. In 1959 the French journal Cahiers de lexicologie was launched and one year later a centre for the development of a new national dictionary was created in Nancy (France). In November 1960 a first conference on ‘Problems in Lexicography’ was held in Bloomington, Indiana (see Householder and Saporta 1962). Ten years later, again in the United States and in France, the first handbooks on lexicography appeared. Ladislav Zgusta published his Manual of Lexicography in 1971, and in the same year Jean and Claude Dubois’ Introduction à la lexicographie and Josette Rey-Debove’s Étude linguistique et sémiotique des dictionnaires français contemporains came out. A year before, an issue of the journal Langages had also been devoted to lexicography (Rey-Debove 1970). A few years later, in 1975, the Dictionary Society of North America (DSNA) was founded, and their journal Dictionaries appeared for the first time in 1979. A new impulse to the study of dictionaries was given in the 1980s. In 1983 Reinhard Hartmann invited scholars and lexicographers from all Europe to Exeter (UK) to attend a conference (see Hartmann 1983) which was to be the first in a long series organized by EURALEX, the European association of people working in lexicography and related fields that was created during the conference. In 1984 Sidney Landau published his Dictionaries. The Art and Craft of Lexicography (second edition 2001). In 1985 the first issue of Lexicographica. International Annual for Lexicography appeared. 
In 1988 the first volume of the International Journal of Lexicography was published, to be followed in 1991 by Lexikos, an annual journal that is now the mouthpiece of AFRILEX, the African sister association of EURALEX. The international encyclopaedia of lexicography was to appear around this time under the title Wörterbücher/ Dictionaries/Dictionnaires (Hausmann et al. 1989–91). From the 1990s on lexicography has been an established academic subject in a number of universities, e.g. Aarhus (Denmark), Barcelona (Spain), Birmingham (UK), Cergy-Pontoise (France), Göteborg (Sweden), Lille (France) and Poznań (Poland) (see also Hartmann, Chapter 25). In about half a century dictionaries and lexicography have become a mature field of scientific endeavour, having its own traditions and institutions (see also Gouws 2004).
2 Research in lexicography As may already be clear from the titles of some of the handbooks quoted above, the exact content of what is covered by the term ‘lexicographic research’ is not always the same: for one author lexicography is essentially a craft or even an art, whereas others insist on the necessity of a sound scientific basis for dictionaries. Reading through the relevant literature makes it clear also that the points of view taken and the types of dictionaries studied may vary substantially. Zgusta (1971), for instance, starts with the discussion of a number of linguistic (semantic, morphological, combinatorial and stylistic) aspects which cover more than half of his book. Dubois and Dubois (1971), on the other hand, present dictionaries, monolingual as well as bilingual ones, language dictionaries as well as encyclopaedic dictionaries, as pedagogic texts at the service of an intended group of users, devoting only limited space to some semantic and stylistic issues. Rey-Debove (1971) defines the dictionary as ‘a work that describes a language through a lexical approach’, paying no attention to the user nor to bilingual or encyclopaedic dictionaries. In the following sections I will try to give an overview of the main research trends in a number of fields, following the distinctions proposed by Hartmann (2008: 137). As will become clear, progress has been attained at quite unequal paces, and methodologies have not been worked out in all domains.
2.1 Dictionary history Historical research on dictionaries is mainly concerned with two subjects: the influence of older dictionaries on newer ones and the new elements that have been introduced by great lexicographers. In other words: tradition and innovation. According to Landau (2001: 43), the ‘history of English lexicography usually consists of a recital of successive and often successful acts of piracy.’ Formulated more neutrally as borrowing, the phenomenon of the influence existing dictionaries have had on newly produced ones can be intricate and intriguing, as is, for instance, demonstrated by Rodríguez-Álvarez and Rodríguez-Gil (2006), who studied two eighteenth-century English dictionaries. In the same vein a number of French seventeenth- and eighteenth-century dictionaries are studied by a research group in Canada (Cormier 2003, Cormier and Fernandez 2004, 2005, Cormier 2010). In all cases, systematic in-depth comparisons are made between the dictionaries involved, and in some cases clearly formulated hypotheses are confronted with considerable amounts of precise data. A comparable methodology is sometimes used in order to find out what kind of author may have compiled a given dictionary, e.g. the anonymous sixteenth-century Vocabulario trilingüe studied by Clayton (2003). In other pieces of historical research the focus is more on what some dictionaries or lexicographers have introduced as innovations. It goes without saying that this type of research is also based on a broad knowledge of the dictionary scene of the time and on comprehensive comparisons of dictionaries. A case in point is a special issue of the International Journal of Lexicography on Dr Samuel Johnson and his Dictionary of the English Language. This dictionary was first published in 1755 and is considered as a leap towards modern lexicography (McDermott and Moon 2005).
2.2 Dictionary criticism Assessing the quality of a dictionary is not a simple affair (see Akasu, Chapter 4). There are so many aspects that can be studied and the evaluations can differ so vastly from one point to another, that it is practically impossible to give a final mark or to prefer one dictionary on all points over another. Dictionary criticism, be it in the form of discussions about the compiling of new dictionaries or in the form of assessments of existing dictionaries, almost never covers the totality of a dictionary. Most dictionary reviews that appear in newspapers are limited to what is called the macrostructure, the list of entries, in which neologisms are welcomed or criticized. In more scientific evaluations a limited number of aspects is normally taken into account, mainly depending on the expertise of the evaluator. Going through the many reviews of individual dictionaries that have been published in the International Journal of Lexicography does not lead to any uniform approach or methodology, and only vague conclusions could be drawn from studies of dictionary reviews by Ripfel (1989) and Jehle (1990). Nevertheless, some attempts have been made at making reviews more systematic or more comprehensive. Wiegand (1998, 2002) has tried to cover a maximum of viewpoints on two learners’ dictionaries of German by publishing in one volume the reviews of a great number of specialists, who comment each on one particular aspect of the dictionary. Chapters are devoted to grammar, morphology, phonetics, orthography and so on. But in spite of the richness of this approach, it cannot be seen as exhaustive, and it would be difficult to draw any general conclusions from the data presented. A comparable path is followed by a group of Japanese researchers who have published a number of reports in Lexicon, a journal that is published by the Iwasaki Linguistic Circle in Tokyo. 
In recent issues one can find thorough analyses of recently published learners’ dictionaries (Masuda et al. 2008, Kanazashi et al. 2009, Kokawa et al. 2010, Dohi et al. 2010). All analyses are globally structured along the same lines: macrostructure, phonetics, definitions, examples and illustrations are in each case studied in about this order, and a user study is in principle included at the end. However, the depth of the analysis is not always the same and additional subjects may vary considerably. In Kanazashi et al. (2009), for instance, chapters on various types of notes and on etymology are added, whereas in Masuda et al. (2008) a chapter is included about ‘vocabulary builders’. Another attempt at structuring reviews of a given type of dictionaries is done by Bogaards (1996). Following step by step the problems the learner is supposed to face while reading or writing a text in a foreign language, he systematically compares four learners’ dictionaries of English. This methodology was followed in the same way for other learners’ dictionaries (see for instance, Bogaards 2010a). Recently, Coleman and Ogilvie (2009) introduced evidence-based statistical, textual and qualitative techniques in what they call ‘forensic dictionary analysis’.
2.3 Dictionary typology Often for the layman ‘the’ dictionary is just the one they happen to have on their shelves or in their computer. But for researchers it is crucial to exactly determine the type of dictionary they want to study. This is not always an easy task. Dictionaries can be categorized in many
ways. According to Rey (2003: 89), ‘the typology of dictionaries is almost as complex as that of leguminous plants or of arthropods … and still awaits its Linnaeus or its Cuvier.’ Typologies are set up from very different points of view such as their comprehensiveness (unabridged versus college dictionaries), the number of languages involved (monolingual versus bilingual or multilingual dictionaries), the type of data treated (linguistic versus encyclopaedic dictionaries) and the language of the users (native-speaker versus foreign learner dictionaries). After having discussed a fair number of attempts at categorizing all existing or possible dictionary types, Béjoint (2010: 45) concludes that ‘it is impossible to classify dictionaries in a way that would be both orderly and realistic’, and then presents a list of seven oppositions:
1. monolingual and bilingual dictionaries – including dialect and slang dictionaries as monolingual, and bilingualized, semi-bilingual and hybrid dictionaries as bilingual;
2. general and specialized dictionaries – the latter category including dictionaries giving information on pronunciation, etymology, synonyms, collocations and so on;
3. encyclopaedic and linguistic dictionaries – the first ones including proper names of any kind;
4. foreign learners’ and native speakers’ dictionaries;
5. dictionaries for adults and dictionaries for children – or for any other age group;
6. alphabetized and non-alphabetized (i.e. ideological, thematic or onomasiological) dictionaries;
7. electronic and paper dictionaries.
Although this list of parameters will indeed clearly categorize the majority of dictionaries, it does not take into account differences between academic and more practical dictionaries, or between descriptive and prescriptive dictionaries.
Likewise, it would be easy to find omissions when using the admittedly restricted list of oppositions presented by Atkins and Rundell (2008: 24ff) or the ‘general dictionary typology’ given by Svensén (2009: 22ff).
2.4 Dictionary structure
Dictionaries are structured on a number of levels. Rey-Debove (1971) first introduced the terms ‘macrostructure’ and ‘microstructure’. The macrostructure of a dictionary is the list of entries or its nomenclature, i.e. a collection of entities that are selected in order to give an adequate view of the domain that is meant to be described. In a language dictionary, the macrostructure is supposed to reflect a rational choice of the lexical stock that is available in that language. As language dictionaries are never complete (only dictionaries of an author’s works or of a ‘dead’ language can be), all lexicographers have to make choices about which items to include and which not. Selections are made in view of the intended use and users of the dictionary. Frequency normally plays a central role in the selection of the headwords to include, but this criterion is not always decisive. Children’s dictionaries may include words that are not very frequent in the language as a whole or even in children’s language, but the fact that many children tend to stumble over these items may lead to their inclusion. A dictionary of synonyms will leave out a number of the
most frequent words for the sole reason that they don’t have any synonyms. On the other hand, even if frequency counts do not yield a given element of a series (names of the days, military ranks, colour names, etc.), the lexicographer has to make sure that a whole range is represented in a systematic way.
The microstructure of a dictionary is the nature of the information given about the headwords and the way this information is presented. Lexicographers have many decisions to make about what information will be presented in what form and in what order. Pronunciation data, etymology, word class, definition, examples, collocations, homonyms and many other aspects of words can be part of the microstructure, which will present all elements selected in the same order and according to the same standards in each entry. Although the distinction between macro- and microstructure is quite straightforward, in some cases elements can be part of either one. This applies to compounds and multi-word expressions as well as to derived forms. Compounds like double time or old girl used to be part of the microstructure not so long ago but are more and more treated as headwords in more recently produced dictionaries (cf. Atkins and Rundell 2008: 181). Phrasal verbs like get around or pull off tend to occupy a place that is somewhere in between the two structure types: they are often presented under a given headword, but have their own microstructure, which is in principle structured in the same way as the one used for independent headwords. Multi-word expressions such as fixed phrases (night after night, finer feelings), similes (drunk as a skunk, deaf as a post) and proverbs (a rolling stone gathers no moss) are most of the time treated in the microstructure, although their fixed status (when they have one, which is not always the case) could justify their presence in the list of headwords. Adverbs like photographically, nouns like Islamism, i.e.
rather infrequent words that are derived from more frequent elements, are sometimes treated in the microstructure of the simplex headwords they are derived from, although they are lexical items with the same linguistic status as these headwords. In principle, headwords are given in their canonical form: infinitive for verbs, singular for nouns, etc. Irregular inflected forms may be listed in the microstructure but may have their place in the macrostructure as well, especially when the dictionary is meant for users having another native language (the receptive side of a bilingual dictionary or a learners’ dictionary). More recently, two other terms have been introduced to describe the structural composition of dictionaries. The mediostructure or cross-reference structure of a dictionary is the network of internal references that makes information available to the users that is present at other places in the dictionary (synonyms, references to tables, or pictorial illustrations, etc.). The megastructure is the relationship and order between the components of a dictionary: the front matter including, for example, a foreword, an introduction and a list of abbreviations and symbols, the lemma list; and the back matter, which may contain special lists of items or grammatical information and so on (for more details on dictionary structures see Svensén 2009). As will be clear, dictionary structure is most of all a subject of interest for practising lexicographers, who have to devise the content and the presentation of their dictionary. Little scientific research has been done so far in this field. The research that has been done pertains to the next section, where the effects of the choices made by lexicographers are confronted with the use that is made by groups of users.
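The selection principle described earlier in this section, with frequency as the main criterion and closed series represented as a whole, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_headwords`, the frequency counts and the threshold are invented for the example and are not taken from any dictionary project.

```python
def select_headwords(freq, threshold, series):
    """Pick headwords by corpus frequency, then complete partly covered series.

    freq      -- dict mapping word -> corpus frequency (hypothetical counts)
    threshold -- minimum frequency for inclusion (an assumed editorial setting)
    series    -- list of closed sets such as names of the days or colour names
    """
    selected = {word for word, count in freq.items() if count >= threshold}
    for members in series:
        # If frequency alone admitted part of a series, include the whole range.
        if selected & set(members):
            selected |= set(members)
    return sorted(selected)


# Invented toy data: "tuesday" and "mauve" are too rare on their own,
# but they belong to series that are already partly selected.
freq = {"monday": 50, "tuesday": 3, "red": 40, "mauve": 1, "obscure": 1}
series = [["monday", "tuesday"], ["red", "mauve"]]
headwords = select_headwords(freq, threshold=10, series=series)
print(headwords)
```

In practice the decision is editorial rather than mechanical; the sketch only shows why a pure frequency cut-off is not sufficient.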
2.5 Dictionary use
The first point the participants of the first conference on lexicography unanimously agreed on was that ‘dictionaries should be designed with a special set of users in mind and for their specific needs’ (Householder 1962: 279). It took about twenty years, however, before research into dictionary use began to blossom (see Nesi, Chapter 5). At first, from 1980 onwards, questionnaires were used to ask users how often they used a dictionary, what they looked up and whether they were satisfied with the information found. The results yielded by these studies (see Bogaards 1988 for an overview) were not very informative and did not lead to any precise conclusions about the strengths or weaknesses of specific (types of) dictionaries. Since the 1990s, research on dictionary use has been more and more experimental in nature and the methodologies used have evolved, seeking to confirm or reject strictly formulated hypotheses and using ever more sophisticated statistics. The main topics that have been studied concern the comprehension of words or texts, the production of words or texts, and vocabulary acquisition through dictionary use.
In Bensoussan et al. (1984), more than 800 students had to read texts in English as a second language and to answer multiple-choice questions about the content of these texts; they were allowed to use a monolingual or a bilingual dictionary. The researchers did not find significant differences in the results obtained by students who did use a dictionary, bilingual or monolingual, or who did not use a dictionary (cf. Nesi and Meara 1991 for similar results). In fact, the approach turned out to be too global and not all variables involved were under control. For instance, it is not clear to what extent the subjects who had dictionaries at their disposal did indeed use them, nor what the relationship was between difficult vocabulary and the way text comprehension was measured.
The same topic was studied in much more precise terms and in tightly controlled conditions by Tono (2001) and by Lew (2004). In the latter study, for instance, the words to be looked up were pseudo-English words so as to ensure that there was no possible prior knowledge, and the dictionary entries were immediately available next to the text. In this structured situation, the effect of dictionary use was statistically highly significant. Ard (1982) was the first study aimed at measuring the effect of the use of bilingual dictionaries in writing in a foreign language. Due to an informal methodology based on retrospection and on oral protocols during the writing task, the results are rather vague and inconclusive. In a well-designed study, Laufer (1993) had students translate a number of isolated low-frequency words and then use these words in a sentence. For each word the students were provided with either a definition or an example, and later on both a definition and an example. Considering only the productive part here, it turned out that definitions and examples are equally effective, but that the combination of both types of information leads to the best results. Again, only in this more structured situation could clear conclusions be drawn. A similar study has been done by Laufer and Hadar (1997). Research into the relationship between dictionary use and vocabulary learning is the most recent development in this field. Laufer and Hill (2000) studied the lookup strategies of learners using an electronic dictionary and their influence on immediate word retention. Following up on a series of studies executed by Laufer and her collaborators, Chen (2011) compared the relevance of various types of dictionaries for vocabulary learning, testing word retention at the end of the experimental session and two weeks later. Dziemianko (2010 and 2011b) studied the effectiveness
of paper and electronic dictionaries in this context but found various differences in favour of electronic or paper dictionaries for immediate reception and production as well as for retention (see also Bogaards 2010b). Other user studies have focused on more specific aspects of dictionaries. Bogaards and Van der Kloot (2001, 2002) studied the effectiveness of grammatical information, the difference between the two studies being that in the first one real dictionary entries were used, whereas in the second, more conclusive, study manipulated entries were presented to the subjects. Recently Dziemianko (2011a) did research in the same spirit, taking into account the proficiency level of the subjects, the part of speech of the words and the particular form of grammatical codes used. Research on the importance of various types of aids in the access structures of dictionaries, such as signposts, menus or guide words, was done by Bogaards (1998) and was recently taken up by Nesi and Tan (2011) as well as by Tono (2011), who applied a technique (‘eye-tracking’) that is totally new in lexicographic research. The quality of various types of definition was studied by Lew and Dziemianko (2006). For an extensive overview of dictionary user studies see Welker (2010); for methodological considerations see Lew (2011).
2.6 Dictionary content
Dictionary content depends on the type of dictionary. A dictionary of music is supposed to give an authoritative description of the world of musical notation, instruments, composers and so on. In the same way, a language dictionary, which is the prototypical representative of this type of reference book, should present a complete and reliable picture of the lexical riches of the language described. Whereas in this field traditionally data were manually collected from printed sources such as novels and newspapers, nowadays computers and the internet can provide huge masses of data of various types. Although spoken data still cause serious problems, they have also become available in ever larger quantities. Corpus linguistics was introduced in the 1980s and the COBUILD dictionary was the first one to profit entirely from this new approach. The choices made and the techniques used were justified in a book, Looking Up (Sinclair 1987), which had a big influence. Corpora such as the British National Corpus (BNC) or the Bank of English are now far bigger than they were in those early days (600 million tokens or more as against 20 million then). But, more importantly, methods to extract useful lexical data have evolved in important ways. In the pioneering stage of corpus linguistics use was essentially made of the KWIC (Keyword in Context) procedure, a device that aligns a given word within all the contexts it has in the corpus. As the output of this type of analysis was not always manageable, especially not for more frequent words with thousands of concordance lines, ways were sought to automatically produce more concise images of lexical items in a language.
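To make the KWIC procedure concrete, the following minimal Python sketch aligns every occurrence of a keyword in a central column with a fixed window of context on either side. It is a toy illustration, not the software used in any of the projects mentioned: the function name `kwic`, the bracket notation and the sample text are assumptions made for the example.

```python
import re


def kwic(text, keyword, width=30):
    """Return concordance lines with the keyword aligned in one column."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that all keywords line up.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Invented sample text standing in for a corpus.
sample = ("The dictionary lists every word. A good dictionary also gives "
          "examples. No dictionary is ever complete.")
concordance = kwic(sample, "dictionary", width=20)
for line in concordance:
    print(line)
```

With a real corpus a frequent word would yield thousands of such lines, which is exactly the manageability problem the paragraph describes.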
Using tagged input and a set of linguistic relations that can hold between words, Kilgarriff and Tugwell (2002) created the Sketch Engine, a tool that presents in one page the most salient characteristics of a given item, such as a verb’s most frequent subjects and objects, the most frequent adjectives that accompany a noun and so on (cf. Kilgarriff et al. 2004, Kilgarriff, Chapter 7). The Sketch Engine is now widely used, not only for dictionary making but for the development of theoretical linguistics as well (see Pustejovsky and Rumshisky 2008). In his theory of norms and
exploitations, Hanks (2004) combines prototype theory with the outcomes of corpus linguistics and shows how literal meanings of words can be distinguished from conventional metaphorical or idiomatic uses as well as from incidental metaphorical uses. Sue Atkins (2010), who was one of the important promoters of corpus lexicography, describes how, ideally, lexicographic data described in FrameNet (cf. Fillmore et al. 2003) could be combined with data available in other databases in order to reach ever more complete descriptions of the lexicon of a language.
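The idea behind the word sketches mentioned in this section, a one-page summary of the most salient companions of a word, can be hinted at with a very small Python sketch that counts the adjectives directly preceding a noun in part-of-speech-tagged input. This is a deliberately crude approximation under stated assumptions: the tag names `ADJ` and `NOUN`, the function `adjective_sketch` and the sample data are invented here, and the sketch relies on raw adjacency counts, whereas the tool described above works with a set of linguistic relations between words.

```python
from collections import Counter


def adjective_sketch(tagged_tokens, noun, top=3):
    """Count adjectives that immediately precede the given noun."""
    counts = Counter()
    for (word1, tag1), (word2, tag2) in zip(tagged_tokens, tagged_tokens[1:]):
        if tag1 == "ADJ" and tag2 == "NOUN" and word2.lower() == noun:
            counts[word1.lower()] += 1
    return counts.most_common(top)


# Invented, hand-tagged toy input.
tagged = [
    ("strong", "ADJ"), ("tea", "NOUN"), ("and", "CONJ"),
    ("strong", "ADJ"), ("tea", "NOUN"), ("or", "CONJ"),
    ("weak", "ADJ"), ("tea", "NOUN"),
    ("hot", "ADJ"), ("coffee", "NOUN"),
]
sketch = adjective_sketch(tagged, "tea")
print(sketch)
```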
3 Conclusion
As has become clear, I hope, research in lexicography does not constitute one consistent body. It is more of a patchwork composed of quite separate domains, each with their own topics to be studied and their own approaches or methodologies. One could think that this is due to the absence of a comprehensive theory, but up to now it is unclear how such a theory should, or even could, be formulated (cf. Bogaards 2010c; Piotrowski, Chapter 16).
References
Ard, J. (1982), ‘The use of bilingual dictionaries by ESL students while writing’, ITL Review of Applied Linguistics 58, 1–27.
Atkins, B.T.S. (2010), ‘The Dante Database: Its contribution to English lexical research, and in particular to complementing the FrameNet Data’ in G.-M. De Schryver (ed.), A Way with Words: Recent Advances in Lexical Theory and Analysis. A Festschrift for Patrick Hanks, Kampala: Menha Publishers, 267–97.
Atkins, B.T.S. and M. Rundell (2008), The Oxford Guide to Practical Lexicography, Oxford: Oxford University Press.
Béjoint, H. (2010), The Lexicography of English. From Origins to Present, Oxford: Oxford University Press.
Bensoussan, M., D. Sim and R. Weiss (1984), ‘The effect of dictionary usage on EFL test performance compared with student and teacher attitudes and expectations’, Reading in a Foreign Language 2, 262–76.
Bogaards, P. (1988), ‘A propos de l’usage du dictionnaire de langue étrangère’, Cahiers de Lexicologie 52, 131–52.
Bogaards, P. (1996), ‘Dictionaries for learners of English’, International Journal of Lexicography 9 (4), 277–320.
Bogaards, P. (1998), ‘Scanning long entries in learners’ dictionaries’ in T. Fontenelle et al. (eds), Actes EURALEX ’98 Proceedings, Liège: Université de Liège, 555–63.
Bogaards, P. (2010a), ‘The evolution of learners’ dictionaries Merriam-Webster’s Advanced Learner’s English Dictionary’ in I.J. Kernerman and P. Bogaards (eds), English Learners’ Dictionaries at the DSNA 2009, Tel Aviv: KDictionaries, 11–27.
Bogaards, P. (2010b), ‘Dictionaries and second language acquisition’ in A. Dykstra and T. Schoonheim (eds), Proceedings of the XIV Euralex International Congress, Ljouwert: Fryske Akademy, 99–123.
Bogaards, P. (2010c), ‘Lexicography: Science without theory?’ in G.-M. De Schryver (ed.), A Way with Words: Recent Advances in Lexical Theory and Analysis. A Festschrift for Patrick Hanks, Kampala: Menha Publishers, 313–22.
Bogaards, P. and W. Van Der Kloot (2001), ‘The use of grammatical information in learner’s dictionaries’, International Journal of Lexicography 14 (2), 97–121.
Bogaards, P. and W. Van Der Kloot (2002), ‘Verb constructions in learners’ dictionaries’ in A. Braasch and C. Povslen (eds), Proceedings of the Tenth Euralex International Congress 2002, Copenhagen, Vol. 2, 747–57.
Chen, Y. (2011), ‘Studies on bilingualized dictionaries: The user perspective’, International Journal of Lexicography 24 (2), 161–97.
Clayton, M. L. (2003), ‘Evidence for a native-speaking natural author in the Ayer Vocabulario Trilingüe’, International Journal of Lexicography 16 (2), 99–119.
Coleman, J. and S. Ogilvie (2009), ‘Forensic dictionary analysis: Principles and practice’, International Journal of Lexicography 22 (1), 1–22.
Cormier, M. (2003), ‘From the Dictionnaire de l’Académie Françoise, dédié au Roy (1694) to the Royal Dictionary (1699) of Abel Boyer: tracing inspiration’, International Journal of Lexicography 16 (1), 19–41.
Cormier, M. (ed.) (2010), ‘Perspectives on Seventeenth- and Eighteenth-century European Lexicography’, Special issue of International Journal of Lexicography 23 (2), 133–222.
Cormier, M. and H. Fernandez (2004), ‘Influence in lexicography: A case study. Abel Boyer’s Royal Dictionary (1699) and Captain John Stevens’ Dictionary English and Spanish (1705)’, International Journal of Lexicography 17 (3), 291–308.
Cormier, M. and H. Fernandez (2005), ‘From the Great French Dictionary (1688) of Guy Miège to the Royal Dictionary (1699) of Abel Boyer: tracing inspiration’, International Journal of Lexicography 18 (4), 479–507.
Dohi, K., Y. Osada, A. Shimizu, Y. Asada, R. Takahashi and T. Kanazashi (2010), ‘An analysis of Longman Dictionary of Contemporary English, Fifth Edition’, Lexicon 40, 85–187.
Dubois, J. and C. Dubois (1971), Introduction à la lexicographie: le dictionnaire, Paris: Larousse.
Dziemianko, A. (2010), ‘Paper or electronic? The role of dictionary form in language reception, production and the retention of meaning and collocations’, International Journal of Lexicography 23 (3), 257–73.
Dziemianko, A. (2011a), ‘User-friendliness of noun and verb coding systems in pedagogical dictionaries of English: A case of Polish learners’, International Journal of Lexicography 24 (1), 50–78.
Dziemianko, A. (2011b), ‘Does dictionary form really matter’ in K. Akasu and S. Uchida (eds), Asialex 2011 Proceedings. Lexicography: Theoretical and Practical Perspectives, Kyoto: Asian Association for Lexicography, 92–101.
Fillmore, C., C.R. Johnson and M.R.I. Petruck (2003), ‘Background to FrameNet’, International Journal of Lexicography 16 (3), 235–50.
Gouws, R.H. (2004), ‘Meilensteine auf dem historischen Weg der Metalexikographie’, Lexicographica 20, 155–75.
Hanks, P. (2004), ‘The syntagmatics of metaphor and idiom’, International Journal of Lexicography 17 (3), 245–74.
Hartmann, R.R.K. (ed.) (1983), Lexicography: Principles and Practice, London: Academic Press.
Hartmann, R.R.K. (2008), ‘Twenty-five years of dictionary research: Taking stock of conferences and other lexicographic events since LEXeter ’83’ in E. Bernal and J. DeCesaris (eds), Proceedings of the XIII EURALEX International Congress (Barcelona, 15 – 19 July 2008), Barcelona: Universitat Pompeu Fabra, 131–48.
Hausmann, F-J. (1989), ‘Kleine Weltgeschichte der Metalexikographie’ in H.E. Wiegand (ed.), Wörterbücher in der Diskussion: Vorträge aus dem Heidelberger Lexikographischen Kolloquium (Lexicographica Series Maior 27), Tübingen: Max Niemeyer, 75–109.
Hausmann, F-J., O. Reichmann, H.E. Wiegand and L. Zgusta (eds) (1989–91), Wörterbücher. Dictionaries. Dictionnaires. An International Encyclopedia of Lexicography, 3 vols, Berlin: Walter de Gruyter.
Householder, F.W. (1962), ‘Summary report’ in F.W. Householder and S. Saporta (eds), 279–82.
Householder, F.W. and S. Saporta (eds) (1962), Problems in Lexicography, Bloomington: Indiana University.
Jehle, G. (1990), Das englische und französische Lernerwörterbuch in der Rezension. Theorie und Praxis der Wörterbuchkritik, Tübingen: Max Niemeyer.
Kanazashi, T., T. Otani, A. Nonomiya and M. Ryu (2009), ‘An analysis of Longman Advanced American Dictionary, new edition: A pedagogical viewpoint’, Lexicon 39, 18–99.
Kilgarriff, A. and D. Tugwell (2002), ‘Sketching words’ in M.-H. Corréard (ed.), Lexicography and Natural Language Processing. A Festschrift in Honour of B.T.S. Atkins, Euralex.
Kilgarriff, A., P. Rychlý, P. Smrž and D. Tugwell (2004), ‘The Sketch Engine’ in G. Williams and S. Vessier (eds), Proceedings of the Eleventh EURALEX International Congress, Lorient: Faculté des Lettres et des Sciences Humaines, Université de Bretagne Sud, 105–16. http://www.sketchengine.eu.
Kokawa, T., R. Aoki, J. Sugimoto, S. Uchida and M. Ryu (2010), ‘An analysis of the Merriam-Webster’s Advanced Learner’s English Dictionary’, Lexicon 40, 27–84.
Landau, S. (1984), Dictionaries. The Art and Craft of Lexicography, New York: Scribner. Second edition 2001, Cambridge: Cambridge University Press.
Laufer, B. (1993), ‘The effect of dictionary definitions and examples on the use and comprehension of new L2 words’, Cahiers de Lexicologie 63, 131–42.
Laufer, B. and L. Hadar (1997), ‘Assessing the effectiveness of monolingual, bilingual, and “bilingualized” dictionaries in the comprehension and production of new words’, The Modern Language Journal 81, 189–96.
Laufer, B. and M. Hill (2000), ‘What lexical information do L2 learners select in a CALL dictionary and how does it affect word retention?’, Language Learning & Technology 3, 58–76.
Lew, R. (2004), Which Dictionary for Whom? Receptive Use of Bilingual, Monolingual and Semibilingual Dictionaries by Polish Learners of English, Poznań: Motivex.
Lew, R. (2011), ‘User studies: Opportunities and limitations’ in K. Akasu and S. Uchida (eds), Asialex 2011 Proceedings. Lexicography: Theoretical and Practical Perspectives, Kyoto: Asian Association for Lexicography, 7–16.
Lew, R. and A. Dziemianko (2006), ‘A new type of folk-inspired definition in English monolingual learners’ dictionaries and its usefulness for conveying syntactic information’, International Journal of Lexicography 19 (3), 225–42.
Masuda, H., S. Uchida, M. Hirayama, A. Kawamura, R. Takahashi and Y. Ishii (2008), ‘An analysis of Collins COBUILD Advanced Dictionary of American English’, Lexicon 38, 46–155.
McDermott, A. and R. Moon (eds) (2005), Johnson in Context, Special issue of International Journal of Lexicography 18 (2), 153–266.
Nesi, H. and P. Meara (1991), ‘How using dictionaries affects performance in multiple-choice EFL tests’, Reading in a Foreign Language 8, 631–43.
Nesi, H. and K.H. Tan (2011), ‘The effect of menus and signposting on the speed and accuracy of sense selection’, International Journal of Lexicography 24 (1), 79–96.
Pustejovsky, J. and A. Rumshisky (2008), ‘Between chaos and structure: interpreting lexical data through a theoretical lens’, International Journal of Lexicography 21 (3), 337–55.
Rey, A. (2003), ‘La renaissance du dictionnaire de langue française au milieu du XXe siècle: une révolution tranquille’ in M. Cormier, A. Francoeur and J.-C. Boulanger (eds), Les dictionnaires Le Robert. Genèse et évolution, Montréal: Les Presses de l’Université de Montréal, 88–99.
Rey-Debove, J. (ed.) (1970), La lexicographie (Langages 19), Paris: Didier.
Rey-Debove, J. (1971), Étude linguistique et sémiotique des dictionnaires français contemporains, The Hague: Mouton.
Ripfel, M. (1989), Wörterbuchkritik: eine empirische Analyse von Wörterbuchrezensionen, Tübingen: Max Niemeyer.
Rodríguez-Álvarez, A. and M.E. Rodríguez-Gil (2006), ‘John Entick’s and Ann Fisher’s dictionaries: An eighteenth-century case of (cons)piracy?’, International Journal of Lexicography 19 (3), 287–319.
Sinclair, J. (ed.) (1987), Looking Up. An Account of the COBUILD Project in Lexical Computing, London and Glasgow: Collins ELT.
Svensén, B. (2009), A Handbook of Lexicography. The Theory and Practice of Dictionary-Making, Cambridge: Cambridge University Press.
Tono, Y. (2001), Research on Dictionary Use in the Context of Foreign Language Learning: Focus on Reading Comprehension (Lexicographica. Series Maior 106), Tübingen: Max Niemeyer.
Tono, Y. (2011), ‘Application of eye-tracking in EFL learners’ dictionary look-up process research’, International Journal of Lexicography 24 (1), 124–53.
Welker, H.A. (2010), Dictionary Use. A General Survey of Empirical Studies, Brasilia: Author’s Edition.
Wiegand, H.E. (ed.) (1998), Perspektiven der pädagogischen Lexikographie des Deutschen. Untersuchungen anhand von Langenscheidts Grosswörterbuch Deutsch als Fremdsprache, Tübingen: Max Niemeyer.
Wiegand, H.E. (ed.) (2002), Perspektiven der pädagogischen Lexikographie des Deutschen II. Untersuchungen anhand des de Gruyter Wörterbuchs Deutsch als Fremdsprache, Tübingen: Max Niemeyer.
Zgusta, L. (1971), Manual of Lexicography, The Hague and Paris: Mouton.
PART I Research methods and problems
3
Researching lexicographical practice Lars Trap-Jensen
The computerization of work routines that the world has witnessed over the last few decades has changed the lives of many people, but the effect it has had on dictionaries and on the lexicographer’s daily life is all-embracing and difficult to overstate. In this chapter, we look at the various stages involved in dictionary-making and some of the decisions that the lexicographer is faced with in the process (see also Adamska-Sałaciak, Chapter 12 and Stutzman and Warfel, Chapter 17). It should be noted that even if some of the issues are general and shared by different types of dictionaries, others pertain to just one particular type. Monolingual dictionaries are obviously different products compared to bilingual ones, and making a dictionary is different from making an encyclopaedia, terminology or even a telephone directory, even if they all must be considered lexicographical products. In the following, the focus of attention is, unless stated otherwise, on monolingual dictionaries for native speakers.
1 Dictionary conceptualization
While many things have changed dramatically in lexicography, the planning phase is arguably one of the areas that has been least affected, and yet it is perhaps the most important one. It is during this phase that crucial decisions about the database structure and its inventory must be made, based on an analysis of the intended users and their needs. These decisions condition how the data in a later phase can be presented to the end-user and how it can be reused in other applications. One thing that has changed is that lexicographers today are less inclined to have one specific product in mind when they build their dictionary database. Over the last decades publishers have spent much effort in unifying their dictionary resources and standardizing the information contained in each element in the database. Instead of offering a range of independent dictionaries, each with their own specific list of entry words, inflectional information, synonyms, style labels, etc., most publishing houses now have one central database from which individual dictionaries can be produced by extracting the desired combinations of information types needed for a particular lexicographical product. From the publisher’s point of view, this solution gives them two important advantages. First, it makes maintenance easier, as updates that are made in one element in the database immediately feed through to all the dictionaries in which that element is used. Second, it enables them to refine the range of dictionaries offered, as they can extract
different combinations of information types to suit the needs of a specific user group. For the user, it means that they are more likely to recognize a distinct flavour of a particular publishing house’s products, and perhaps a sense of familiarity if they buy more than one product. From a lexicographical point of view, what has happened is that the production of lexicographical data has become more clearly separated from the presentation of the data to the user. Today, dictionaries may be available both as classical paper products (although sales are rapidly declining) and in different electronic forms such as app, online version or an API. The lexicographical data may be offered as a resource that is not immediately visible to the user, as in spell-checkers in word processing programs, or only when activated, such as the plug-in function found in e-readers that pops up when users click on a word. More will be said about this in the last section of the chapter.
2 Designing the database
Lexicography involves a lot of decision-making: How many words should be in the dictionary and by what criteria? What types of information are relevant for the intended target group? Does the intended target group coincide with the actual user group, and, if not, does it matter? What is the best way to explain a particular word meaning to the reader? The answers to these questions and a good many more are not necessarily easy to provide beforehand, but they are important for the way the database should be built. A database designed to meet future requirements for other dictionaries or publication channels should anticipate as many aspects as possible in the early stage of the process. To take an example: in a dictionary that is going to appear as a concise paper dictionary, it may be appropriate to use abbreviations, whereas the online version will have the full forms. However, not all abbreviations have a one-to-one expansion: adj. refers sometimes to ‘adjective’ and sometimes to ‘adjectives’, bot. can unfold as ‘botany’ or as ‘botanical’. It is likely that a simple list of abbreviations and their expansions will not do. Instead, all the different possibilities must be taken into account and a special field or an attribute made available in the database to show how a given word form should be presented as a full and abbreviated form respectively. Another example is morphological information. In a bilingual L1–L2 dictionary, morphological information about headwords is not necessary, as it can be assumed that the users know how the words are inflected in their native language. But if the same list of headwords is, at some later point in time, used in a different dictionary, such as a learners’ dictionary, no such assumption can be made and the morphological information will have to be produced if it is not in the database from the outset.
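The abbreviation example can be illustrated with a short Python sketch. Rather than expanding abbreviations from a global lookup list, each occurrence stores both its full and its abbreviated form, so that adj. can be ‘adjective’ in one place and ‘adjectives’ in another. The class `Label`, the function `render`, the edition names and the sample notes are all hypothetical and serve only to show the idea of a dedicated field or attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """One occurrence of a label, stored with both presentation forms."""
    full: str  # form shown in the online edition
    abbr: str  # form shown in the concise paper edition


def render(parts, edition):
    """Join plain strings and labels, choosing the form for the edition."""
    out = []
    for part in parts:
        if isinstance(part, Label):
            out.append(part.abbr if edition == "print" else part.full)
        else:
            out.append(part)
    return " ".join(out)


# Invented metatext: the same abbreviation expands differently per occurrence.
note_singular = ["used as an", Label("adjective", "adj.")]
note_plural = ["used with", Label("adjectives", "adj."), "of colour"]

print(render(note_plural, "print"))
print(render(note_plural, "online"))
```

The design choice is that the expansion is decided once, by the editor, at the point where the label is entered, instead of being guessed at publication time.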
For definitions, it is not recommended to use the same wording in a technical dictionary as in an encyclopaedia, not to speak of children’s dictionaries. For that reason, the database may well include several versions of the same definition to be used for different user groups. Even within the same dictionary, two versions could be offered: a short definition for quick reference and a more elaborate one for users who prefer an encyclopaedic explanation with attention to detail. More could be added to the list of examples: information about pronunciation in either phonetic notation or as sound clips, images and video clips, syntactic and encyclopaedic information, quotations and other language examples are all information types that are important to store in the database. They may not all be relevant for publication in one and the same dictionary, but it is advisable to store the information in a central base from where it can be easily retrieved.
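As a rough illustration of a central base from which different products are extracted, the Python sketch below stores one entry with several definition versions and derives product-specific views by selecting information types. Everything in it is assumed for the example: the entry content, the field names, the product names and the function `extract` are invented and do not describe any publisher’s actual database.

```python
# One hypothetical entry in the central database (all content invented).
ENTRY = {
    "id": "e-0001",
    "headword": "fern",
    "pos": "noun",
    "inflection": ["fern", "ferns"],
    "definitions": {
        "short": "a green plant with feathery leaves and no flowers",
        "full": ("a flowerless plant that reproduces by spores and "
                 "typically has divided, feathery fronds"),
        "children": "a plant with soft green leaves shaped like feathers",
    },
}

# Each product names the information types and definition version it needs.
PRODUCTS = {
    "concise": {"fields": ["headword", "pos"], "definition": "short"},
    "learners": {"fields": ["headword", "pos", "inflection"],
                 "definition": "full"},
    "children": {"fields": ["headword"], "definition": "children"},
}


def extract(entry, product):
    """Build the view of an entry that a given product publishes."""
    spec = PRODUCTS[product]
    view = {field: entry[field] for field in spec["fields"]}
    view["definition"] = entry["definitions"][spec["definition"]]
    return view


for name in PRODUCTS:
    print(name, extract(ENTRY, name))
```

Because every product reads from the same entry, a correction made once in `ENTRY` would show up in all three views.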
In some cases, it may even be practical to include elements in the database that are not ever going to be shown to the end-user but which are useful for other purposes. In a dictionary project that began in the early 1990s, The Danish Dictionary, it was decided to include information about the nearest superordinate word (the genus proximum) and about subject domain if at all possible. This information was systematically entered by the editors throughout the compiling period but was not used at all in the printed dictionary. It was, however, very useful when, later, the dictionary data was used to build a Danish wordnet (on the model of Princeton WordNet), to compile a Danish thesaurus (Pedersen et al. 2009, Lorentzen and Trap-Jensen 2011, Nimb et al. 2014) and the Danish FrameNet Lexicon (Nimb 2017). Technically, there is a wide range of software solutions available. Some lexicographers and publishers prefer relational databases, others XML bases, and both types exist as proprietary commercial products and as open-source products. No more will be said about soft- and hardware, but it should be stressed that the notion ‘a central database’ is used here as a broad cover term. An actual implementation often involves several databases. The main point is that the overall architecture should be such that the bases are designed to function as a conceptual unit, linked to each other via unique ID numbers. Apart from defining what elements to use in the dictionary, it is important in the planning phase to prepare a manual or style guide that tells the lexicographers about the inventory of elements and how they should be used. A style guide is especially vital for larger projects with a staff of considerable size, and for long-term projects that have to allow for some degree of staff turnover. It is an obvious boon for training new editors and helps to secure a uniform final appearance. 
Style guides are project-internal tools and as such they vary greatly from project to project, ranging from rough principles (be brief and to the point; don’t use brackets and exceptions if you can avoid them) to very specific instructions for certain elements (use a maximum of four synonyms; only describe syntactic patterns with 10 corpus examples or more; in metatext, use only words from the defining vocabulary). A style guide that carefully records all the principles and conventions defined in the planning phase, supplemented by the revisions and adjustments made during the compiling process, will ultimately capture what gives the dictionary its own characteristic style and personality.
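The more specific style-guide instructions mentioned above lend themselves to automatic checking. The following is a minimal sketch, assuming that an entry is available as a simple dictionary with the hypothetical keys `synonyms`, `syntactic_patterns` and `metatext`, and that the defining vocabulary is a plain set of words; a real project would define its own structures and word list.

```python
import re

MAX_SYNONYMS = 4           # "use a maximum of four synonyms"
MIN_CORPUS_EXAMPLES = 10   # "only describe syntactic patterns with 10 corpus examples or more"

# A tiny, invented defining vocabulary for illustration only.
DEFINING_VOCABULARY = {"a", "an", "the", "of", "used", "in", "about", "people", "informal", "language"}


def check_entry(entry):
    """Return a list of style-guide violations for one entry (empty if none)."""
    problems = []
    synonyms = entry.get("synonyms", [])
    if len(synonyms) > MAX_SYNONYMS:
        problems.append(f"too many synonyms: {len(synonyms)} > {MAX_SYNONYMS}")
    for pattern, n_examples in entry.get("syntactic_patterns", {}).items():
        if n_examples < MIN_CORPUS_EXAMPLES:
            problems.append(f"pattern '{pattern}' has only {n_examples} corpus examples")
    words = re.findall(r"[a-z]+", entry.get("metatext", "").lower())
    unknown = sorted(set(words) - DEFINING_VOCABULARY)
    if unknown:
        problems.append("metatext words outside defining vocabulary: " + ", ".join(unknown))
    return problems


# Invented sample entry that breaks all three rules.
sample_entry = {
    "headword": "scumbag",
    "synonyms": ["creep", "lowlife", "rat", "sleazebag", "swine"],
    "syntactic_patterns": {"ADJ + scumbag": 25, "scumbag + of + NP": 3},
    "metatext": "used in informal language about despicable people",
}

for problem in check_entry(sample_entry):
    print(problem)
```

Checks of this kind do not replace the style guide itself; they merely catch the mechanical violations so that editors can concentrate on the principles that cannot be formalized.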
3 Describing the linguistic data
After the initial planning phase, where all the general decisions are made, it is time to consider the object of description, the linguistic data. This is an area that has undergone a dramatic development over the last decades, both in the methods used and in the available tools. The achievements within the field of corpus linguistics have produced a range of tools that lexicographers use to establish a sound empirical basis for their linguistic description. Corpus linguistic methods are employed at almost every stage of the dictionary entry: lemma selection, lexical variants, inflection, collocations, valency patterns, set phrases, compounding and derivation. This interesting topic is explored in further detail in Chapter 7. Here, we will start by taking a closer look at the empiricist position and ask whether it is as justified as many lexicographers are inclined to think.
3.1 Prescription or description?
Historically, the view that dictionaries should reflect the language of all its speakers cannot be taken for granted. In the nineteenth century and earlier it was widely held that, because of the important educational role of dictionaries, they should be normative in the true sense of the word: serving as an exemplary model for their users. Consequently, headwords and examples were excerpted from texts written by respected, canonical authors of their time. A well-known case in point is the Dictionnaire de l’Académie française (1694), which set an example for a number of national dictionaries in the eighteenth and nineteenth centuries. To illustrate, one of the pioneers behind the Dictionary of the Royal Academy in Denmark (Langebek 1740) claimed that there was no room in the dictionary for:
All coarse, rude and lecherous words and phrases which contradict decency … for they need not be known to those who do not appreciate it, and those who do will surely get to know them anyhow.
Around a hundred years later, the editor of the most popular Danish dictionary of the time wrote in his preface (Molbech 1859: viii):
Even the most frequent use of a newly formed word, especially in colloquial language, renders it no authority or proof of usability in pure speech and good style, nor of its admittance into a dictionary as long as it offends the cultivated ear and the delicate language instinct.
There are notable exceptions to the normative tradition, but even so it was not until well into the twentieth century that it became generally accepted for dictionaries to reflect the language community taken as a whole. No doubt, the greater availability of texts beyond the professional works of authors and journalists played a role in paving the way for the descriptive view dominant in the latter half of the twentieth century.
Most lexicographers today accept the descriptive role of dictionaries and prefer to see themselves as objective observers of linguistic facts, but there are areas of lexicographical practice that fall outside the scope of description. Whenever the normative role of language is involved, an element of authority and language policy is present. The orthographic forms of the headwords in a dictionary are in many countries regulated not directly by the practice of the language users but through an official body that has been given the formal authority to determine the spelling of words. Other countries, such as the UK, have no such body but a de facto norm which is set by one or two dictionaries that are widely recognized and followed by the educational system and by central authorities.
3.2 Lemma selection
Another area where normative aspects are involved is lemma selection, clearly illustrated by the above quotations. A strictly descriptive approach would involve ranking all the words of a well-balanced corpus and mechanically selecting the most frequent ones until a given cut-off point, determined externally by the size and resources of the dictionary project. In itself, it is no trivial matter to decide what constitutes a word and its lemma form, but we cannot go into the details here. However, few dictionaries build their headword list in this mechanical way, as the frequency principle would inevitably produce a number of undesired headwords. The most obvious examples are proper names, which occur frequently in corpus texts but are for the most part uninteresting for a general language dictionary. Admittedly, there are exceptions, such as proper names with a metonymic function (The White House, Mecca), names that are part of multi-word expressions (Adam’s apple, Rome wasn’t built in a day) or culture-specific names that require explanation (American Idol, the London Eye), but most proper names have the sole function of identifying a unique person, place or other entity.
Apart from proper names, compounds and combining forms (long-tailed, long-haired, long-eared) are examples of words that are often frequent in texts but are not always obvious lemma candidates. They are often semantically transparent and thus predictable from their components. It should be noted, though, that the processes of compounding and derivation are language-specific, but in languages where they are productive (which is in general the case for Germanic languages, although less pronounced for English) the result may be a large number of often trivial compounds. In many instances, therefore, the user is better off being able to look up less frequent simplex words which cannot be decoded immediately.
Conversely, the descriptivist model would most likely lead to accidental lexical gaps in dictionary coverage. Parts of a language’s vocabulary are made up of closed sets of lexical items, and most people would find it odd if they could only look up some of the months of the year or all the days of the week except Tuesday.
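The mechanical frequency principle and the accidental gaps it produces can be made concrete with a small sketch. The lemma counts, the cut-off point and the list of proper names below are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter


def select_lemmas(lemma_counts, cutoff, proper_names=frozenset()):
    """Rank lemmas by corpus frequency and keep the top `cutoff`, skipping proper names."""
    ranked = [lemma for lemma, _ in Counter(lemma_counts).most_common()
              if lemma not in proper_names]
    return ranked[:cutoff]


def closed_set_gaps(selected, closed_sets):
    """Report members of partially covered closed sets that the cut-off left out."""
    chosen = set(selected)
    gaps = {}
    for name, members in closed_sets.items():
        missing = [m for m in members if m not in chosen]
        if missing and len(missing) < len(members):
            gaps[name] = missing
    return gaps


# Invented toy frequencies; a real project would use lemmatized corpus counts.
counts = {
    "London": 900, "house": 500, "Sunday": 150, "Saturday": 140, "Friday": 130,
    "Monday": 120, "Wednesday": 110, "Thursday": 100, "long-haired": 60, "Tuesday": 40,
}
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

headwords = select_lemmas(counts, cutoff=8, proper_names={"London"})
print(headwords)
print(closed_set_gaps(headwords, {"days of the week": weekdays}))
```

With these toy figures the transparent compound long-haired makes the cut while Tuesday does not, which is exactly the kind of outcome that a purely frequency-driven headword list risks.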
For systematic reasons, the solution would be to include all the members of the set no matter if, by chance, one or two were not sufficiently represented in the corpus to warrant their inclusion. Even if absence from the corpus is non-accidental, inclusion may be worthwhile after all. A case in point is the chemical elements, some of which are undoubtedly better known and used than others.
Another problem with lemma selection is the difficulty involved in defining which lexical units belong to a particular language. We have seen that the descriptive approach attempts to reflect the language of the whole language community. But how exactly is a language community delineated? There is no doubt a common core of words that are known to all speakers of English. As one moves away from the common core, however, the vocabulary of individual speakers becomes gradually less concordant. Due to differences such as age, education and housing history, the linguistic experience of a middle-aged engineer from Manchester is different from that of a university student in Cardiff, which is again quite different from that of a fisherman from Aberdeen. The engineer knows many technical terms from his or her field of speciality, the fisherman is familiar with the words associated with fishing gear and navigation at sea, and the student probably knows many slang words and informal expressions the others don’t, apart from the special vocabulary associated with her subject of study. Due to differences in personal life and linguistic experience it is unlikely that any two speakers of a language have exactly the same stock of words at their disposal. How should the dictionary deal with this? Should it include all the technical terms from subject fields, and all slang, jargon and informal expressions? Ideally, perhaps yes, especially in an e-dictionary where physical space is irrelevant.
In practice, lexicographers are forced to decide on priorities, in which case it is important to realize who is the intended target group of the dictionary. For a learner’s dictionary, the users can be expected to look up slang and informal expressions more often than special terms from the fishing trade, and they are also more likely to come across the special words used in linguistics and language pedagogy than words belonging to engineering.
When it comes to regional language, the practice of most dictionaries is to leave out genuine dialect words that are rare outside the geographical area where the dialect is spoken. Instead, these are included in special dictionaries devoted to that particular dialect.
Somewhat more controversial are words from other languages that appear in the corpus texts. In the English-speaking world this is perhaps not as controversial an issue as it can be in other countries and languages around the world where the dominant influence of English as a global language is felt. Because of the status associated with the language, English words and expressions appear quite frequently in otherwise ‘pure’ (Spanish, Czech, Swedish, etc.) contexts. The lexicographer must determine whether to treat these items as loanwords that need explanation like any other word turning up in a corpus with a sufficient frequency, or whether they should be interpreted as instances of code switching that can safely be neglected. There is no simple answer to this problem, and the lexicographer must in each case carefully analyse whether the item shows signs of integration into the surrounding language, for example, in the way the word is pronounced, inflected or used syntactically. The more established the word is in the new language, the more reasonable it is to include it in the dictionary. Another point is that words may not be borrowed in all their senses. For example, the word pride is used in several languages in the sense ‘gay parade’, but the English word is much more semantically diverse, so it is not very helpful for the user to be referred to an English dictionary for an explanation. One should, however, be aware that practice varies significantly, as every country and language has its own cultural and political contexts and traditions.
The attitude towards loanwords can be a highly sensitive matter, especially in areas where a minority language has been historically dominated by a larger, perhaps colonial, language.
3.3 Language policy
Dictionaries and language policy can play important and active roles in contributing to the cultural identity and self-understanding of a young nation. Think of the status of Russian in the Baltic states, or of the role of dictionaries for minority languages such as Frisian, Basque, Irish and Sami, where the attitude towards loanwords from the dominant language easily comes to carry political overtones. What in one context is viewed as linguistic puritanism may in another be interpreted positively as a sign of pride in the local language. In some countries much effort is spent in coining new words and expressions in the local language in order to avoid the influence from English or other dominant languages. Such an undertaking is, of course, politically rather than linguistically motivated. From the language’s point of view, it doesn’t matter if the Icelandic word for a female flight attendant is stewardess with an English loanword or flugfreyja (literally ‘flight-Freya’, after the goddess of love in Nordic mythology) or if the English word computer is used instead of the Icelandic coinage tölva (a contraction of tala ‘number’ and Völva, a soothsayer mentioned in the younger Edda). What is important, however, is that the language policy is actively supported by the language community, whatever direction it takes. Otherwise, it may lead to the absurd situation where the dictionary lists one set of words but you hear a totally different set when you visit the local pub.
Even if a solution is found for the descriptive problems discussed here, and even if the achievements of corpus linguistics have indisputably made life easier for the lexicographer in many ways, it should not be forgotten that a substantial amount of data found in dictionaries still cannot be verified empirically. Whether scent and perfume are synonyms and what their most appropriate equivalents are in French, or whether scumbag should be labelled ‘informal’, ‘derogatory’ or ‘slang’ are not questions that can be answered by checking against empirical evidence in a corpus. They are the result of the lexicographer’s evaluation based on his or her professional skill and linguistic perception. Likewise, the art of writing a precise, informative and elegant definition is something where a human is still superior to the computer.
Finally, the role of and limits to the use of corpora have been questioned in recent years. For a long time, corpus frequency has been unrivalled as the dominant criterion for lemma selection. But one could also ask: can it be taken for granted that the most frequent words are also the words that users want to look up? Traditionally, this question was difficult, if not impossible, to examine empirically. With the arrival of e-dictionaries, log-file analysis can provide valuable data. Those studies that have been carried out (Bergenholtz and Johnsen 2005, de Schryver et al. 2006, Trap-Jensen et al. 2014) suggest that there is no simple correlation between corpus frequency and look-up frequency. On the other hand, there is still a long way to go before we can predict which words will be looked up in a dictionary and which ones will not. It is simply an area where we still have too little knowledge. Undoubtedly, it is a field that will attract more attention, not least because corpus-driven dictionaries are being put under pressure from user-driven tendencies.
Future dictionaries may well use no-match lists from the log-files in addition to corpus frequency as a criterion for lemma selection.
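How a no-match list might be derived from a log-file and combined with corpus frequency can be sketched as follows. The log format (a plain list of query strings), the threshold and the ranking rule are assumptions chosen for illustration; the studies cited above do not prescribe any particular procedure.

```python
from collections import Counter


def no_match_list(log_queries, headwords, min_lookups=2):
    """Count queries that matched no headword and keep those seen at least `min_lookups` times."""
    misses = Counter()
    for query in log_queries:
        normalized = query.strip().lower()
        if normalized and normalized not in headwords:
            misses[normalized] += 1
    return [(q, n) for q, n in misses.most_common() if n >= min_lookups]


def rank_candidates(no_match, corpus_freq):
    """Order lemma candidates by failed look-ups first, then by corpus frequency."""
    return sorted(no_match, key=lambda item: (-item[1], -corpus_freq.get(item[0], 0)))


# Invented toy data for illustration only.
queries = ["selfie", "Selfie ", "hygge", "selfie", "xqz", "hygge", "house"]
existing_headwords = {"house"}
corpus_frequency = {"selfie": 12, "hygge": 40}

candidates = no_match_list(queries, existing_headwords)
print(rank_candidates(candidates, corpus_frequency))
```

In this toy example selfie is ranked above hygge because it was looked up in vain more often, even though its corpus frequency is lower, which illustrates how look-up data may reorder a purely corpus-driven candidate list.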
3.4 Description or inclusion?
In recent years, we have also witnessed a growing awareness of issues related to identity politics and a corresponding pressure on dictionaries to include groups seen as underprivileged judged on various parameters such as gender, ethnicity, age, religion and physical or mental disability. Language easily becomes a battleground when public debate unfolds and the air becomes thick with accusations of racism, sexism, ageism, etc. The Muhammad cartoons controversies in Denmark and Paris, #metoo and Black Lives Matter are recent examples that most readers are familiar with. Under these circumstances, it is not uncommon for dictionaries to be contacted by dissatisfied users who accuse existing descriptions of being racist, sexist, ageist or otherwise offensive or exclusive towards certain groups (cf. Moon 2014).
The semantic description and definitions of a dictionary should always aim to be up to date, and if the general understanding and attitudes in society have changed, it is only to be welcomed if definitions reflect such a development. A case in point is same-sex marriage. Many societies have legalized and accepted same-sex marriage, so it should be unproblematic to update the dictionary descriptions accordingly. Rather than defining a (married) couple as ‘a man and a woman who are married’, it is straightforward to revise to ‘two persons who are married’ – and likewise in entries describing the persons involved: e.g. bride, groom, brother-in-law, daughter-in-law.
It becomes somewhat more complicated when users want particular words or senses added to or removed from the dictionary. How do you react if LGBT+ users insist on the inclusion of words such as cisnormativity, demisexual or top surgery with reference to greater diversity and inclusion? Or alternatively: should racist or offensive words be removed from the dictionary and replaced by neutral terms that treat the offended party equally? There is no obvious reaction to such requests, and each dictionary must carefully consider how it best balances descriptive precision against inclusion and respect for minorities.
The reason I mention this problem is that it also challenges the status of corpora as the empirical basis for dictionaries. Modern corpora have continued to grow by a factor of 10 every ten to fifteen years (cf. Trap-Jensen 2018: 28), and a standard corpus today contains, as a rough estimate, more than one billion tokens. However, most of them are not particularly well-balanced, but are dominated by journalistic texts (because they are easiest to access and often available in a uniform format) or by texts harvested from the web (which means they are readily available but sometimes difficult to handle technically or due to copyright issues). The lack of balance has the consequence that the corpus is not, after all, the mirror of general language it was ideally meant to be. To return to the example just mentioned, if corpus evidence suggests that the LGBT+ words are of low frequency, the editors may be inclined to refuse the requests. But strictly speaking, they cannot know if weak corpus representation is due to the fact that the words are rare in the language, or if it is because the texts dealing with the subject in question happen to be absent from the corpus. This problem is something which needs to be addressed by lexicographers and corpus linguists.
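One modest way of approaching the balance problem is to inspect the composition of the corpus and to compare a word’s relative frequency across its subcorpora, rather than relying on a single overall figure. The sketch below assumes that each subcorpus is available as a list of tokens; the subcorpus names and the tiny token lists are invented for illustration.

```python
def composition(subcorpora):
    """Share of the whole corpus (0 to 1) contributed by each subcorpus."""
    total = sum(len(tokens) for tokens in subcorpora.values())
    return {name: len(tokens) / total for name, tokens in subcorpora.items()}


def per_million(word, subcorpora):
    """Relative frequency of `word` per million tokens, computed per subcorpus."""
    result = {}
    for name, tokens in subcorpora.items():
        result[name] = 1_000_000 * tokens.count(word) / len(tokens) if tokens else 0.0
    return result


# Invented toy corpus: news dominates, web text is under-represented.
toy_corpus = {
    "news": ["the"] * 9 + ["minister"],
    "web": ["the"] * 4 + ["demisexual"],
}

print(composition(toy_corpus))
print(per_million("demisexual", toy_corpus))
```

A word that is absent from the dominant subcorpus but comparatively frequent in a small one may be under-represented in the corpus as a whole rather than genuinely rare in the language; the figures cannot settle the question, but they show where further texts would be needed.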
4 Dictionary writing system
Another area where computers have made life easier for lexicographers is the software they use for entering the lexicographic data into the database. Dedicated dictionary writing systems (DWS) help build the data structure and secure data consistency. They are designed to implement some of the decisions that would formerly be part of the style guide. By creating a Document Type Definition (DTD) or an XML schema for the document in which the dictionary is being edited, the lexicographers can specify everything related to the document structure: what elements can be used, in what order they are allowed to occur, which elements may be used recursively, what content is possible (characters, images, sound or video clips), and what attributes an element can have. If an editor makes a mistake in attempting to store the article document, he or she is notified immediately and presented with the possible causes of schema violation.
Cross-references are another traditional source of errors in dictionaries that is handled by a DWS as it can bind and automatically track the source and targets of a reference. If an editor deletes or changes either of the two, he or she will be notified and can take appropriate action. Most DWSs offer various other features such as advanced search and statistics, preview settings, export or publishing modules, integration or interoperability with other bases and multi-user set-up. If the DWS has a login function, it can also be used as a tool for the project management to keep track of article production and workflow in the various editorial phases.
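The two kinds of check described above, structural validation and cross-reference tracking, can be imitated in a few lines. This is only a sketch: it does not use a real DTD or XML schema but a hand-written content model that plays the same role, and the element names (`entry`, `headword`, `pos`, `sense`, `definition`, `example`, `xref`) and attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: element -> ordered (child, min, max); None = unbounded.
CONTENT_MODEL = {
    "entry": [("headword", 1, 1), ("pos", 1, 1), ("sense", 1, None)],
    "sense": [("definition", 1, 1), ("example", 0, None), ("xref", 0, None)],
}
REQUIRED_ATTRS = {"entry": ["id"], "xref": ["target"]}


def validate(elem, errors=None):
    """Collect violations of the content model for `elem` and its descendants."""
    errors = [] if errors is None else errors
    for attr in REQUIRED_ATTRS.get(elem.tag, []):
        if attr not in elem.attrib:
            errors.append(f"<{elem.tag}> lacks required attribute '{attr}'")
    model = CONTENT_MODEL.get(elem.tag)
    if model is not None:
        children = [child.tag for child in elem]
        pos = 0
        for tag, lo, hi in model:
            n = 0
            while pos < len(children) and children[pos] == tag and (hi is None or n < hi):
                pos += 1
                n += 1
            if n < lo:
                errors.append(f"<{elem.tag}> needs at least {lo} <{tag}> at this position")
        if pos < len(children):
            errors.append(f"<{elem.tag}> has unexpected or misplaced <{children[pos]}>")
    for child in elem:
        validate(child, errors)
    return errors


def dangling_xrefs(entries):
    """Return (source id, target id) pairs whose target entry does not exist."""
    ids = {entry.get("id") for entry in entries}
    return [(entry.get("id"), xref.get("target"))
            for entry in entries for xref in entry.iter("xref")
            if xref.get("target") not in ids]


good = ET.fromstring(
    '<entry id="e1"><headword>dog</headword><pos>noun</pos>'
    '<sense><definition>an animal</definition><xref target="e9"/></sense></entry>'
)
bad = ET.fromstring(
    '<entry id="e2"><pos>noun</pos><headword>cat</headword>'
    '<sense><definition>an animal</definition></sense></entry>'
)

print(validate(good))
print(validate(bad))
print(dangling_xrefs([good, bad]))
```

The first entry is structurally valid but refers to an entry that does not exist; the second has its elements in the wrong order. An actual DWS would typically rely on a proper schema validator and would report such problems at the moment the editor tries to save the entry.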
DWSs may be developed and tailored to meet the exact needs of a specific project, but there are also several off-the-shelf products available on the market that are sufficiently flexible to meet most customization needs.
5 Data access and presentation
As mentioned earlier, there is a growing demand for dictionaries to be available in various channels and on several platforms. This implies that their contents must be presented to the user in different ways, as the possibilities in a printed dictionary are very different from those of an Internet browser – the use of hyperlinks and audio/video clips are obvious examples. Likewise, the limited size of the screen constrains what can be displayed on a smartphone or another mobile device in comparison with a 24-inch desktop monitor. If the structure of the dictionary has been devised with sufficient care, it is possible to take the differences into account in the publishing phase.
The function and aesthetics of layout and typography in general belong to a long and well-established tradition with obvious consequences for lexicography. However, the readers are encouraged to explore for themselves the wealth of literature on the subject as no more will be said about it here. Instead, we will look at a few selected themes and tendencies that have been the object of discussion in e-lexicography recently.
5.1 Flexible data presentation
The use of hyperlinks in a browser leads to different ways of navigating as compared with the two dimensions of a sheet of paper. On the computer screen, you can read from the top left to the lower-right corner as on a book page but, in addition, you can also navigate ‘downwards’ by clicking on links that will expand an element on the page or take you to a different page. This has been exploited in e-dictionaries in various ways:
1. by having different functionalities on different tabs which the user can shift between
2. by letting the user choose between different contents according to a specified profile
3. by letting the user expand or unfold certain information types by clicking a button or symbol.
Many lexicographers have seen this as the fulfilment of their dreams and have welcomed the digital possibilities with enthusiasm. Through hyperlinks they can present to the user all and only the relevant information needed in a specific look-up situation. Consequently, a number of e-dictionaries have appeared that make use of the customization possibilities, ranging from fairly simple options (show more, show less) to highly elaborate user profiles (Verlinde 2010, Trap-Jensen 2010) whereas others have marketed the customized content as different dictionaries altogether (Bergenholtz 2011). In a way, it is the lexicographer’s dream, but it has turned out to have one serious disadvantage: so far, user studies have not been able to confirm that users take advantage of the possibilities offered to them. Evidence suggests that they are not very good at analysing their own needs and the look-up situation they are in (Trap-Jensen 2010, Lorentzen and Theilgaard 2012). Lexicographers will have to respond to this challenge and find new ways of accommodating the users.
An obvious reaction would be to change the focus from customization towards the use of adaptive technologies: rather than leaving it to the user to select the appropriate combination of data for a task, the dictionary should display the content that is most relevant when compared with the user’s previous search behaviour. This is in keeping with the service provided by Amazon, YouTube and e-commerce companies that offer new items to their customers based on their previous purchases or behaviour (cf. Rundell 2012: 23; see also Chapters 21–24 in this book).
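The shift from customization to adaptive presentation can be illustrated with a deliberately simple sketch. It assumes that the dictionary records which information types a user has expanded in the past and reorders the entry display accordingly; the information types, the click history and the rule that the definition always comes first are all assumptions made for the example, not a description of any existing system.

```python
from collections import Counter

# Default (editorial) order of information types in an entry; names are hypothetical.
INFO_TYPES = ["definition", "pronunciation", "inflection", "collocations", "examples", "etymology"]


def adaptive_order(click_history, info_types=INFO_TYPES, always_first=("definition",)):
    """Order information types by how often the user expanded them in the past.

    Types in `always_first` keep their place at the top; ties among the rest
    keep the editorial order because Python's sort is stable.
    """
    counts = Counter(c for c in click_history if c in info_types)
    rest = [t for t in info_types if t not in always_first]
    rest.sort(key=lambda t: -counts[t])
    return list(always_first) + rest


# Invented history: this user mostly opens collocations.
history = ["collocations", "examples", "collocations", "etymology"]
print(adaptive_order(history))
print(adaptive_order([]))  # no history yet: the editorial order is kept
```

The user is never asked to choose a profile; the display simply follows observed behaviour. Real adaptive systems would need far richer signals and would have to address privacy concerns, which are outside the scope of this sketch.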
5.2 Crowdsourcing and collaborative lexicography
While most people know Wikipedia, few successful attempts have so far been made at creating dictionaries with content that is entirely user-generated (see Sajous and Josselin-Leray, Chapter 20). A possible exception is Wikipedia’s sister project Wiktionary (www.wiktionary.org), at least if you bear in mind that size seems to be a condition of success: the most active Wiktionaries exist for languages with large numbers of speakers, such as English, French and Russian.1 Whether this is a lasting trend remains to be seen, but there seems to be a preference among crowdsourcing contributors for niche areas where they are the experts. Thus user-driven dictionaries are more likely to be successful if they are directed towards a limited area (such as slang, neologisms, dialects, special subject fields) rather than towards general language vocabulary. Urban Dictionary (www.urbandictionary.com) is perhaps the best example of this. What it shows is that there is plenty of enthusiasm and commitment, and lexicographers can take advantage of this, also in their own resources. Most obvious are suggestions for new entries, where users can submit anything from a headword to a full entry proposal with sense divisions, definitions, collocations and authentic examples.
User involvement and interactivity in general are characteristic trends in Internet behaviour which can be incorporated in dictionaries in various forms: on social media, as blogs or forums, RSS feeds, user panels, questions and answers, comments and feedback on individual entries, etc. Also, entertainment, gamification and dynamic content are features that cannot be dismissed as a mere whim of fashion, especially in learners’ dictionaries and other dictionaries aimed at the younger generation of digital natives.
6 Finding the dictionary – the future
There is little doubt that the digital reality will change the form and status of dictionaries. The question is: how will they change? If we think of the analogue products of the not-so-distant past, a dictionary was a very concrete and tangible object: a physical book on a shelf. Faced with a linguistic problem, the user would have to make a deliberate choice and reach out for the dictionary they thought would help solve the problem. This is not so in the digital era. Faced with a similar problem today, nine out of ten persons would not turn to their favourite e-dictionary. Instead, they would do as always when faced with any kind of problem: they ask Google. They do not really bother whether the answer comes from a dictionary, from a forum discussion or a newspaper article.
One response to this challenge, seen from the lexicographer’s point of view, is search engine optimization: make sure your dictionary appears as early as possible on the Google result page. Another reaction is resource integration: provide the answer to the user where the problem occurs. Instead of turning to a completely different site, whether Google or an e-dictionary, the user could look up directly in an embedded dictionary via a keyboard shortcut or mouse action (e.g. double click) without leaving the site. This feature is already implemented in many e-readers, but it needs to be developed further, for instance, as part of individual applications and sites or even as part of the computer’s operating system. The ultimate goal is to provide only the definition or translation that is the right one in a given context, but with the current state of language technology it may take a while before this is reality. More will be said about future dictionaries in Part III.
The challenge for lexicography today is that dictionaries are likely to change the way they look and will probably lose some of their former status, with a risk of drowning in the profusion of other resources with which they compete for user attention (see also Nielsen, Chapter 23). Whether such a development is good or bad is more than anything a matter of personal inclination. For the pessimist, it may be a comfort that nothing suggests that the need for lexicographical data is diminishing.
Note
1 Success measured solely as number of entries is, of course, a very rough and dubious criterion, and I have said nothing about the lexicographic quality of the products (for a critical account of Wiktionary, see Lew 2014). Another reservation concerns apparent counterexamples like Malagasy: it is sometimes mentioned that the number of Wiktionary entries for some languages is boosted through automatic bot translations from other languages.
References
Bergenholtz, H. (2011), ‘Access to and presentation of needs-adapted data in monofunctional internet dictionaries’ in P.A. Fuertes-Olivera and H. Bergenholtz (eds), e-Lexicography: The Internet, Digital Initiatives and Lexicography, London and New York: Continuum, 30–53.
Bergenholtz, H. and M. Johnsen (2005), ‘Log files as a tool for improving internet dictionaries’, Hermes 34, 117–41.
De Schryver, G.-M., D. Joffe, P. Joffe and S. Hillewaert (2006), ‘Do dictionary users really look up frequent words? – on the overestimation of the value of corpus-based lexicography’, Lexikos 16, 67–83.
Langebek, J. (1740), Plan for the Organisation of the Dictionary of the Royal Academy. Here quoted from L. Jacobsen and H. Juul-Jensen (1918), Preface to the Dictionary of the Danish Language, Vol. 1, Copenhagen: Gyldendal Publishers.
Lew, R. (2014), ‘User-generated Content (UGC) in English Online Dictionaries’ in OPAL: Online publizierte Arbeiten zur Linguistik 2014 (4), Mannheim: Institut für Deutsche Sprache, 8–26.
Lorentzen, H. and L. Theilgaard (2012), ‘Online dictionaries – how do users find them and what do they do once they have?’ in R.V. Fjeld and J.M. Torjusen (eds), Proceedings of the 15th EURALEX International Congress, Oslo: Department of Linguistics and Scandinavian Studies, University of Oslo, 654–60.
Lorentzen, H. and L. Trap-Jensen (2011), ‘There and back again – from dictionary to wordnet to thesaurus and vice versa: how to use and reuse dictionary data in a conceptual dictionary’ in I. Kosem and K. Kosem (eds), Electronic Lexicography in the 21st Century. New Applications for New Users (Proceedings of eLex 2011, Bled, 10–12 November 2011), Ljubljana: Trojina, Institute for Applied Slovene Studies, 175–9.
Molbech, C. (1859), Molbechs ordbog 1–2, Copenhagen: Gyldendalske Boghandlings Forlag.
Moon, R. (2014), ‘Meanings, Ideologies, and Learners’ Dictionaries’ in A. Abel, C. Vettori and N. Ralli (eds), Proceedings of the 16th EURALEX International Congress, Bolzano/Bozen: EURAC Research, 85–105.
Nimb, S. (2017), ‘The Danish FrameNet Lexicon: Method and lexical coverage’, in Proceedings of the International FrameNet Workshop at LREC 2018, Miyazaki, Japan. http://lrec-conf.org/workshops/lrec2018/W5/pdf/3_W5.pdf
Nimb, S., L. Trap-Jensen and H. Lorentzen (2014), ‘The Danish Thesaurus: Problems and perspectives’ in A. Abel, C. Vettori and N. Ralli (eds), Proceedings of the 16th EURALEX International Congress, Bolzano/Bozen: EURAC Research, 191–9.
Pedersen, B.S. et al. (2009), ‘DanNet: The challenge of compiling a wordnet for Danish by reusing a monolingual dictionary’, Language Resources and Evaluation 43 (3), 269–99.
Rundell, M. (2012), ‘It works in practice but will it work in theory? The uneasy relationship between lexicography and matters theoretical’ in R.V. Fjeld and J.M. Torjusen (eds), Proceedings of the 15th EURALEX International Congress, Oslo: Department of Linguistics and Scandinavian Studies, University of Oslo, 47–92.
Trap-Jensen, L. (2010), ‘One, two, many: Customization and user profiles in internet dictionaries’ in A. Dykstra and T. Schoonheim (eds), Proceedings of the 14th Euralex International Congress, Leeuwarden: Fryske Akademy, 1133–43.
Trap-Jensen, L. (2018), ‘Lexicography between NLP and linguistics: Aspects of theory and practice’ in J. Čibej, V. Gorjanc, I. Kosem and S. Krek (eds), Proceedings of the 18th EURALEX International Congress, Ljubljana University Press, Faculty of Arts, 25–37.
Trap-Jensen, L., H. Lorentzen and N.H. Sørensen (2014), ‘An odd couple – corpus frequency and look-up frequency: what relationship?’ in I. Kosem and M. Rundell (eds), Slovenščina 2.0 2 (2), Trojina, Institute for Applied Slovene, Slovenia, 94–113.
Verlinde, S. (2010), ‘The Base lexicale du français: A multi-purpose lexicographic tool’ in S. Granger and M. Paquot (eds), eLexicography in the 21st Century: New Challenges, New Applications (Proceedings of eLEX 2009), Louvain-la-Neuve: Presses Universitaires de Louvain (Cahiers du CENTAL series), 325–34.
4
Methods in dictionary criticism
Kaoru Akasu
1 Introduction
Dictionary criticism, or dictionary evaluation, is an area of lexicography that, through evaluations and appraisals, aims to improve the quality of individual dictionaries and, for that matter, to advance lexicography itself. Hartmann and James (1998: 85) noted, in the entry for lexicography, that ‘[t]here are as yet no internationally agreed standards of what constitutes a good dictionary.’ Dictionary criticism nevertheless has a potentially pivotal role to play in changing this situation. In what follows, I will first touch upon some observations concerning dictionary criticism in order to find out where it stands. I will then take a brief look at the kinds of attempt made thus far to improve dictionary criticism and comment on them. Finally, I will introduce the reader to what we call dictionary analysis and present it as a reasonably practical and realistic, if not the best or ideal, method for conducting dictionary criticism.
2 Dictionary criticism as it stands
Hartmann (1996: 241) notes that ‘[e]valuating and assessing lexicographic products is a time-honoured activity’, and Dohi (1993: 22) also states that ‘the criticism of dictionaries of English is considered to have a history of well over 200 years.’ It is worth pointing out, however, that in the monumental work on the topic (Hausmann et al. 1989–91), a tome of more than 3,200 pages in three volumes, containing some 330 articles on a vast variety of topics in lexicography, there is only one article, that of Osselton (1989), that carries the phrase ‘dictionary criticism’ in its title (but cf. Swanepoel 2013). Osselton (1989: 229) observes, harshly but to the point:
[T]he criticism … reveals a surprising lack of interest in general principles, with incidental sniping taking place of any real exploration of the intentions with which the works being criticized had been set up. Omissions are lamented and superfluities condemned, but the whole basis for determining the nomenclature remains largely undiscussed. The near-total absence of concern for the semantic principles of definition is specially striking, and the topic of lemmatization is seldom raised. User-convenience is hardly an issue.
Although these comments of Osselton’s were made about some of the major academic, historical dictionaries such as OED, views of a similar tenor are echoed by many scholars. Hartmann (1996: 241), for instance, writes that dictionary criticism ‘has been beset by personal prejudice rather than noted for the application of objective criteria’. Dohi (1992: 6) commented earlier that ‘[d]ictionaries in the past do not seem to have attracted the critical attention they deserve, or the criticism has been made not from a substantial but from a superficial, nitpicking point of view’. Hartmann and James (1998: 53) note, in the entry for evaluation, that ‘[a] systematic framework for formulating criteria with respect to COVERAGE, FORMAT, SCOPE, SIZE, TITLE, etc. has yet to be developed.’ All of these observations point to the fact that dictionary criticism leaves a great deal to be desired and that there is an acute need to ‘establish a sound and rigorous basis on which to conduct the criticism, together with a set of applicable criteria’, as Jackson (2002: 173) notes. The need for more research into dictionary criticism itself and for more objective evaluation criteria is obvious (Jackson 2000).
3 Suggestions for improvement
It is worth noting that a wide variety of attempts have been made to introduce or set up applicable criteria that may be used in the actual implementation of dictionary criticism. I will touch upon some of the attempts, but I hasten to add that I do not mean to give a definitive, final answer to the question of exactly what the specific evaluation criteria should be, for reasons that will be given later. In his review of five American college dictionaries, McMillan (1949: 214) wrote that ‘[t]hese dictionaries1 can be compared by evaluating (1) the quantity of information, (2) the quality of the information and (3) the effectiveness of presentation’ and went on to say that ‘[t]he quality of the definition in a dictionary can be judged by various criteria: accuracy, completeness, clearness, simplicity, and modernity’ (McMillan 1949: 218). Steiner (1984) provides ‘a checklist for reviewers of bilingual dictionaries’, comprising three major categories, each with a few subcategories, which goes as follows (Steiner 1984: 168–78): (I) The degree of inclusiveness: A. The degree of inclusiveness of lexical elements, B. The degree of inclusiveness of nonlexical elements; (II) Problems involving either content or organization, or both: A. Bias and prejudice shown in the lexical or the nonlexical elements, B. Glossing the entry word only by substitutable translation equivalents, C. The degree to which the user is afforded meaning discrimination and the method by which it is provided, D. Is the dictionary monodirectional or bidirectional?, E. The establishment of standards for equivalents and the search for new equivalents, F. The faithful reversibility of the two sides of the dictionary, G. The amount of information given by the orthography adopted in the dictionary; (III) Problems involving organization: A. The consistency of the alphabetization, B. The feminine form of an adjective as a noun, C. The number of alphabetical lists, D. Under which entry word is an idiomatic expression to be entered?, E. The order in which the part-of-speech function is treated, F. The order of meanings, G. Words of different origin and/or
meaning of the same spelling, H. The degree to which the typography implements the goal of lexicographer, I. Convenient arrangement of the book. Some subcategories break down into further subparts. Kister’s (1992) contribution, which may be said to be a little different in nature in that it is a buyer’s guide, dealing with an overwhelming number of dictionaries, gives ‘[t]wenty points to consider when choosing a dictionary’ (Kister 1992: 64–73): 1. Does the dictionary provide the level of vocabulary coverage you need? 2. Are the dictionary’s contents clear and readable? 3. Is the dictionary produced by reputable people? 4. Is the dictionary reasonably current? 5. Are the dictionary’s definitions thorough, accurate, precise, and objective? 6. Does the dictionary include etymologies and, if so, are they relatively easy to understand? 7. Does the dictionary include illustrative quotations or examples and, if so, are they effective? 8. Does the dictionary include pictorial illustrations and, if so, are they effective? 9. Does the dictionary include synonyms and antonyms and, if so, how extensive are they? 10. Does the dictionary include variant spellings and pronunciations? 11. Is the dictionary’s pronunciation system reasonably precise and not overly complicated? 12. Does the dictionary furnish adequate usage notes and labels? 13. Does the dictionary emphasize American or British English? 14. Does the dictionary offer any special or unique lexical features? 15. Does the dictionary include any useful nonlexical (or encyclopedic) material? 16. Are the dictionary’s page layout and typography appealing to the eye? 17. Is the dictionary physically well made? 18. Is the dictionary fairly priced? 19. What do knowledgeable critics say about the dictionary? 20. How well does the dictionary measure up to its major competitors?’ All of these points are interesting and legitimate questions for prospective buyers. 
However, in academic or scholarly reviews of dictionaries, we do not take up some questions, such as 17 and 18 above, as they are aimed at different audiences. This is also part of the reason why I am of the opinion that there should be a distinction drawn between specialist reviews and journalistic ones.2 Nakamoto’s (1994) paper, which, in my view, is among the most in-depth and thoroughgoing pieces of work dealing with evaluation criteria, attempts to provide a comprehensive checklist for reviewing monolingual English dictionaries for foreign learners (cf. Nakamoto 1995a, 1995b). Nakamoto (1994: 16) states that the checklist ‘consists of four parts: (a) checkpoints about the macro-structure and micro-structure of the dictionary reviewed (to be abbreviated to [DA] and [DI] hereafter), (b) those about the review ([R]), (c) those about the critic(s) ([C]), and (d) those about the influence ([I])’. All checkpoints are ‘written in question form so that the critic can draft a review systematically by answering these questions one by one’ (Nakamoto 1994: 2). I refrain from listing all the specific checkpoints that he has produced because there are so many of them: [DA] has twenty-seven items in it, [DI] twenty-six, [R] twenty-one, [C] thirteen and [I] just two. To give a few examples for each category, [DA] includes questions such as ‘Is there any frontmatter article?’, ‘Who are the intended users?’, and ‘What is the general entry structure?’, and [DI] questions like ‘How are compounds shown?’, ‘How are the senses presented?’, and ‘Is any
pragmatic information provided?’ In [R] are included such questions as ‘Who are the intended readers of the review?’, ‘Which features of the dictionary are reviewed?’, and ‘Is it a descriptive or evaluative review?’ [C] contains questions such as ‘Who is the reviewer?’, ‘What does the critic review the dictionary for?’ and ‘What experience does the reviewer have of dictionary reviewing?’ Last, the checkpoint questions in [I] are ‘Has the review changed the reviewed dictionary in its subsequent printing(s) and/or edition(s)?’ and ‘Has the review influenced dictionary making?’ (Nakamoto 1994: 16–44). Bogaards (1996) made a critical appraisal of the so-called big four dictionaries that came out in the year 1995, i.e. OALD5, LDOCE3, COBUILD2 and CIDE. In his review, Bogaards developed a set of criteria for evaluating EFL dictionaries. What should be noted is that his set of criteria is divided into two main parts: that of RECEPTION and that of PRODUCTION. I will have something to say about this later on. Chan and Taylor’s (2001) survey is very interesting in that it is a review of various aspects of a number of (selected) dictionary reviews. Accordingly, it may be said that their work is not directly involved in setting up evaluation criteria, but it does have an indirect bearing on it. Their findings suggest that ‘most dictionary reviews are factual and descriptive rather than evaluative, and only in some cases is the evaluation based on a principled study of any kind’ (Chan and Taylor 2001: 163). They also suggest that ‘they [such reviews] should be evaluative and that at least part of the evaluation should be based on a study of the use of the dictionary by target users’ (Chan and Taylor 2001: 163). See also Chan and Loong (1999). 
As the title of his paper indicates, Swanepoel (2008) is still more interesting because he has made a first attempt to develop ‘a general framework for the description and evaluation of dictionary evaluation criteria, using parameters from research on dictionary criticism and the usability of websites’ (Swanepoel 2008: 207). At the end of the article, he concludes by providing a framework consisting of the following four parameters: ‘Information covered by the evaluation criteria’; ‘Presentation format of the evaluation criteria’; ‘Validity of the evaluation criteria’ and ‘Application of the evaluation criteria’. Each parameter has a few subcategories. He goes on to say that ‘to be usable, the evaluation criteria themselves will have to meet the following evaluation criteria: be explicitly formulated, valid/motivated, generally acceptable, and the evaluative concepts on which they are based will have to be clearly defined and operationalized’ (Swanepoel 2008: 228). There are many more articles and books dealing with evaluation criteria that I have not space to touch upon here, e.g. Steiner (1993), Nielsen (2009); but let us put an end to this survey and move on to the next section, which does not imply in any way that these other works are not worthy of note or worth considering.
4 Some thoughts on the proposals for improvement
Let us pause for a moment to consider some of the issues involved in or implied by studies of evaluation criteria such as those I mentioned above. First, let us think about the possibility of establishing comprehensive criteria that could be applied across the board, that is, to all dictionary types or genres. From an idealistic point of
view, it would be practical to have such a ‘common yardstick’. Let us name the comprehensive tool, applicable to all dictionaries, ‘Common Yardstick’ with capital letters. My question is, however, whether we would ever want to compare, for instance, a scholarly historical dictionary such as OED with a learner’s dictionary like OALD or LDOCE. I suppose not. As suggested by the fact that most investigations of evaluation criteria mentioned above have focused on one type of dictionary or another, it would be sensible and practical to set up evaluation criteria by defining or demarcating, early on, the dictionary type or genre of the dictionaries to be reviewed. Consequently, a ‘common yardstick’ rather than the Common Yardstick will be more easily accessible and available, and it will actually suffice insofar as the dictionaries under examination are of a particular type. Take, for example, the number of words entered in dictionaries as one aspect of dictionary features to be compared. Should it be the sole purpose of the dictionary to provide headwords, say, for the game of Scrabble, then the yardstick that we would be looking for will be quite simple: the more the better. That is not the case, however. Dictionaries have multiple purposes and functions that are reflected in their structure. Therefore, dictionary criticism will have to be conducted in such a way that reviewers take this multilayered complexity into account. Put another way, dictionaries may not be reviewed in such a simplistic manner. This is another reason for not seeking the comprehensive Common Yardstick. As for the distinction between receptive and productive functions as illustrated by Bogaards (1996), it is, indeed, a useful as well as meaningful dichotomy in many ways. Consider native speaker dictionaries like COD, however. It is questionable whether the targeted users, i.e. 
native speakers of English, wish to have such grammatical information as ‘countable’ or ‘uncountable’ for nouns, ‘attributive’ or ‘predicative’ for adjectives, or ‘complementation patterns’ for verbs. So, the kind of information needed for dictionaries of one particular genre obviously differs significantly from that required of dictionaries of another genre. We should remember that Bogaards’ detailed review focused on the big four monolingual learners’ dictionaries of English rather than native speaker dictionaries. Accordingly, the adoption of the dichotomy is adequate, effective and valid as far as learners’ dictionaries are concerned, whereas it may not be so for other types of dictionary such as native speaker ones. This would constitute yet another reason for not seeking the Common Yardstick. If we do, however, press forward with a quest for the Common Yardstick, we will find ourselves compelled to raise the degree or level of abstraction of evaluation criteria so that the Yardstick would encompass all relevant properties of different criteria. We would eventually end up with something like accuracy, adequacy, consistency, correctness and the like. Yet, the difficulty in their actual applicability or practicality lies precisely in the fact that these concepts are highly abstract. It would seem that a further attempt to provide a clear and definite answer to the question of what constitutes the Common Yardstick lies outside the scope of this chapter.
5 Dictionary analysis
In this section, I would like to introduce the reader to what we call ‘dictionary analysis’ (Nakamoto 1998, Akasu 2005). By ‘we’ I refer to those active members of the Iwasaki Linguistic
Circle (abbreviated to ILC hereafter) who, like myself, have been involved in the writing of dictionary analyses3 (Higashi et al. 1988, Akasu 2008, 2012, Akasu et al. 1996, 2001, 2005). Dictionary analysis is put forward here as one of the suitable or promising candidates for a reasonably practical method or measure to carry out dictionary criticism. My belief is that this particular type of analysis deserves more attention and recognition among people concerned with dictionary criticism in particular and, for that matter, lexicography in general, both theoretical and practical. As I mentioned in Akasu (2007), the first dictionary analysis of its kind was published back in 1968. There have been more than fifty articles of dictionary analysis published in Lexicon, the mouthpiece of the ILC, and other journals.4 I will consider, as an example, the following article in Lexicon that came out in 2000: ‘An Analysis of The New Oxford Dictionary of English’ (abbreviated to NODE, hereafter). In what follows, special attention will be given to the methodological aspects of the analysis. First, I was asked to be the head of the reviewing team. Thus, this was going to be a ‘team review’, in accordance with Chapman (1977), with each reviewer being a specialist in some area.5 My next task was to decide on what sort of dimensions NODE should be looked into. This would determine exactly how many other analysts were needed. The following six dimensions were chosen: headwords, pronunciation, definition, illustrative examples, grammatical information and etymology. The dimensions like these would, of course, vary in number and type according to the kind of dictionary to be analysed. I will take up, by way of illustration, Section Four of the article, titled ‘Sense description’, the section that I was in charge of. For reasons of brevity, I cannot give a detailed account of all sections in Akasu et al. 
(2000), but, basically, the same policy and principles of analysis run through all the other sections as well, so one may safely say that an account of the features as seen in one section will be sufficient for clarification. The section in question is composed of ten subsections, beginning with ‘Introductory remarks’, part of which goes as follows:
[T]he sense description of NODE will be examined from a number of aspects. First, the core sense and subsense structure will be considered. Then, specific entries will be examined according to their types and, in so doing, reference will be made, where appropriate, to the division, arrangement, and presentation of the senses of words entered. Lastly, usage labels will be discussed. (Akasu et al. 2000: 71)
The remaining subsections included are: ‘Core senses and subsenses’, ‘Common words’, ‘Ergative verbs’, ‘Phrasal verbs’, ‘Encyclopedic and specialist entries’, ‘Function words’, ‘Derivatives’, ‘Coverage’ and ‘Labels’. As is clear from this list, different types of entry are the subject of study in the analysis. It should be noted, in this connection, that this is a comparative review, in that corresponding entries of such dictionaries as CED4 and CD were referred to in investigating the relevant entries of the dictionary under review.6 Other dictionaries, including COD9, COBUILD2 and LDOCE3, were also consulted where appropriate. Coverage refers to the areas of meaning of words covered in the dictionary. In examining coverage, the following eight pages of NODE were selected: 100-1, 600-1, 1100-1 and 1600-1.
It should be noted that this is a case of random sampling to make an objective survey. There were 249 headwords in all, and then the corresponding entries of CED4 were closely compared. It was found that CED4 covers a slightly wider range of meaning. As for labels used, a survey on the same pages mentioned above was carried out. Obviously, these are pieces of quantitative research. In contrast, I looked into a number of particular entry words qualitatively to examine the core sense and subsense structure, common words and so on. Thus, this whole analysis turned out to be a combination of qualitative and quantitative study. Incidentally, Nakao (1972: 53), one of the original members of the ILC, noted that ‘it is important to point out problems of the dictionary under review by surveying or analysing it from a holistic point of view, in addition to criticism of specific entry problems’ (my translation), which is a telltale sign that he knew early on that a well-balanced review was necessary because only that could give rise to improvement in dictionaries. One more point to be noted about our dictionary analysis is that all members of the reviewing team are practical as well as theoretical lexicographers. By practical lexicographers I mean that they are experienced lexicographers in that they have had, in one way or another, a substantial as well as active role to play in the compilation of different dictionaries. My conviction is that a reviewer of dictionaries should have had at least some experience of writing a dictionary before doing a review. The knowledge should certainly give the reviewer some idea of what is involved in actual dictionary making in real terms, and this in turn would help him or her to make a realistic and reasonable appraisal or to make an informed, sensible judgement, without exaggeration, about the work under review. 
Consider the following quotation from Rundell (1998: 316): ‘[I]mprovement can be said to take place when: the description of a language that a dictionary provides corresponds more closely to reliable empirical evidence regarding the way in which that language is used; the presentation of this description corresponds more closely to what we know about the reference needs or reference skills of the target user.’ While this statement is sensible and sound, I suspect that Rundell’s perspicacity as demonstrated in the statement above comes, partly at least, from his own experience as a practical lexicographer, which does seem to make a difference.
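The page selection used for the coverage survey earlier in this section can be made explicit and reproducible. The following is a minimal Python sketch, not the procedure used in Akasu et al. (2000): the function names, the `last_page` value and the seed are hypothetical, introduced only for illustration. It generates the evenly spaced two-page spreads listed for NODE and, for comparison, a seeded random draw of the same size.

```python
import random


def evenly_spaced_spreads(first_page, step, count):
    """Return `count` two-page spreads (p, p + 1), one every `step` pages."""
    return [(first_page + i * step, first_page + i * step + 1)
            for i in range(count)]


def random_spreads(last_page, count, seed):
    """Draw `count` distinct spreads at random.

    Spreads are assumed to begin on even-numbered pages; a fixed seed
    makes the draw repeatable by another reviewer.
    """
    rng = random.Random(seed)
    starts = rng.sample(range(2, last_page, 2), count)
    return sorted((p, p + 1) for p in starts)


# The spreads named in the text: 100-1, 600-1, 1100-1 and 1600-1.
print(evenly_spaced_spreads(100, 500, 4))
# [(100, 101), (600, 601), (1100, 1101), (1600, 1601)]

# A random alternative; 2000 is a placeholder page count, not NODE's actual one.
print(random_spreads(last_page=2000, count=4, seed=1))
```

Recording the seed together with the sampled pages would let a second reviewer repeat the survey on exactly the same entries.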
6 New directions in dictionary analysis
The importance of user research has already been recognized among lexicographers, and our dictionary analysis has also started to incorporate it.7 To be specific, Dohi et al. (2002) were the first to include user research as part of the dictionary analysis. The research consisted of three forms of enquiry: a questionnaire, a written test and interviews. Ishii (2010, 2011), another member of the ILC, has offered a promising new approach. He has built a database or special corpus of the full texts of CALD3, COBUILD6, LDOCE5, MEDAL2 and OALD7, and made an attempt to compare the readability of definitions and illustrative examples employed in each of the dictionaries above. In so doing, he used the Flesch Reading Ease Score and the Flesch-Kincaid Grade Level as measures for calculating the specific level of readability (Kincaid et al. 1975). What should be noted in this context is the fact that this is a
quantitative study that might replace the random sampling method mentioned earlier and that could allow us to conduct a full and thoroughgoing investigation, resulting in still more accurate, objective and reliable data.
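To make the two readability measures concrete, here is a minimal Python sketch of how they might be computed for a single definition. The constants are those of the standard published formulas; everything else is an assumption made for illustration. In particular, the syllable counter is a crude vowel-group heuristic, the function names are hypothetical, and nothing here is meant to reproduce the tooling actually used by Ishii (2010, 2011), which the source does not describe.

```python
import re


def count_syllables(word):
    """Rough estimate: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text):
    """Return (sentences, words, syllables) for a piece of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(text):
    """Higher scores indicate easier text."""
    s, w, syl = text_stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)


def flesch_kincaid_grade(text):
    """Approximate US school grade needed to read the text."""
    s, w, syl = text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59


# Two invented definitions, used only to show the direction of the scores.
simple = "A dog is an animal that people keep as a pet."
dense = ("A domesticated carnivorous mammal characteristically "
         "exhibiting considerable morphological variability.")
print(flesch_reading_ease(simple) > flesch_reading_ease(dense))    # True
print(flesch_kincaid_grade(simple) < flesch_kincaid_grade(dense))  # True
```

Both measures depend only on average sentence length and average syllables per word, so they can be run over every definition in a full-text corpus without sampling, which is what makes this kind of study attractive for comparing dictionaries.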
7 Conclusion
We saw in this chapter that the business of dictionary criticism is still under development and that the setting up of applicable evaluation criteria is the key issue in its refinement and improvement. I argued that the establishment of those criteria may be encouraged or facilitated by delimiting or delineating the relevant dictionary type or types. For reviewing dictionaries comparatively, I presented dictionary analysis as an effective as well as a workable method, subsuming all suggestions put forward by Chapman (1977). I do not claim that this kind of analysis is the best or ideal method, but it has certainly provided us with findings of special and professional interest and revealing insights. I may have sounded pessimistic about the setting up of the Common Yardstick, because I gave up on it. I just chose to be realistic and practical. I hasten to add that we are not content, nor should we be, with the way dictionary analysis stands now. It is about time dictionary analysis itself was reviewed in a new light and, in this connection, the two new approaches introduced in the preceding section are welcome additions in that direction. One final note of caution. The business of dictionary criticism might lead dictionary reviewers into a risky position where their work could induce some form of uniformity or at least a propensity for it in the design features of dictionaries. In fact, back in 1995, the celebrated ‘big four’ dictionaries did have their own distinct, characteristic features, but it would seem that the current ‘big five’, with MEDAL added in 2002, have now lost some of their distinctive identity and have begun to show an increasing resemblance to one another in at least some of their features, though the widespread use of corpora may have been a contributing factor. Similar concerns are voiced by some lexicographers such as Yamada (2010). That, we do not want to see happen.
Postscript
It is worthy of note that electronic dictionaries are steadily gaining ground, and one may get the impression that print dictionaries are almost overshadowed by them. I hasten to add that electronic dictionaries are not uniform in type and format; they come in more than one design. It is true, indeed, that there are certain features or functions that are specific to electronic dictionaries. One obvious example is multimodality: users of electronic dictionaries can, in most cases, listen to the pronunciation of headwords or illustrative examples, as the case may be, which is not possible with print dictionaries. To give a second example, you can jump to another item or other items in the dictionary, or to some other graphic material and the like, without any difficulty at all. It follows from this that there are some additional areas or aspects of electronic dictionaries that have been introduced by relatively recently developed technology, and, admittedly, these features should be taken into
account in evaluating the dictionaries. I would like to underline, however, that the points made in this chapter are no less applicable to and still hold true for these developments in electronic dictionaries.
Notes
1 ‘These dictionaries’ refers to The American College Dictionary (1948), New College Standard Dictionary (1947), Macmillan’s Modern Dictionary, Rev. edition (1947), Webster’s Collegiate Dictionary, 5th edition (1941) and The Winston Dictionary, College edition (1946).
2 See Svensén (2009) for the distinction between specialist reviews and journalistic reviews. It is to be noted, also, that Swanepoel (2008: 214) mentions that ‘Ripfel (1989) distinguishes between evaluation criteria used in journalistic reviews and those used in expert reviews.’ I hasten to add that I am well aware that such features as the physical quality of dictionaries and their pricing do matter a great deal when it comes to deciding which dictionary to buy. I stress the fact, however, that these features are not peculiar to dictionaries but apply to other kinds of publications as well.
3 The Iwasaki Linguistic Circle, or ‘Iwasaki Kenkyukai’ in Japanese, is a group of linguists and lexicographers, theoretical and practical, based in Tokyo, Japan. The ILC has been publishing a journal called Lexicon once a year for nearly fifty years now. The journal is characterized by the fact that each issue contains one or two articles of dictionary analysis. For a fuller account of this publication, see Akasu (2003, 2007). An electronic version of Akasu (2007) is available at the following site: http://kdictionaries.com/kdn/kdn15/kdn1504-akasu.html. In this connection, I might add one more point here. All the articles in Lexicon began to be published in English in 1995 (i.e. Lexicon no. 25), and all of them in and after that year may be accessed online now at the Globalex website: https://globalex.link/publications/Lexicon.
4 ‘Other journals’ include International Journal of Lexicography, in which there are two reviews conducted and contributed by some ILC members. One, which appeared in 1992, was of the eighth edition of The Concise Oxford Dictionary; and the other, published in 1994, was of the second edition of Longman Dictionary of the English Language. Although these two reviews are considerably shorter, due to space limitations of IJL reviews, than the kind of dictionary analysis found in Lexicon, they do retain the characteristic features of ILC’s dictionary analysis. It might be a good starting point for interested readers to take a look at these, i.e. Higashi et al. (1992) and Masuda et al. (1994), in order to get some idea of what dictionary analysis is like.
5 It should be underlined that the very first dictionary analysis mentioned earlier, which was of (the first edition of) The Penguin English Dictionary, had been performed, with subsequent analyses following one after another, long before Chapman (1977: 158) made this suggestion ‘toward a still better method’.
6 Comparative reviews may be divided into two major types: diachronic or longitudinal and synchronic or, if you will, ‘latitudinal’ ones. The former includes comparisons of the type where the current edition of a dictionary is compared with its previous edition or editions, e.g. a comparison of COD12 with COD11 or of LDOCE5 with LDOCE4 and/or still earlier editions. Our analysis of NODE belongs to the latter type. There are a number of other subtypes in this category. To give just a few examples, a dictionary of one variety of a language may be compared with its counterpart of another, e.g. a comparison of NODE with NOAD or of OALD with OAAD. A comparison can also be made between a ‘senior’ dictionary and its ‘junior’ version, e.g. a comparison between COD and POD or CALD and CLD.
7 The following is an oft-quoted passage from Johnson’s celebrated Plan: ‘The value of a work must be estimated by its use: It is not enough that a dictionary delights the critic, unless at the same time it instructs the learner’ (Johnson 1947: 5). Truer words were never ‘written’.
References
Dictionaries
CALD: Cambridge Advanced Learner’s Dictionary, First edition 2003, Cambridge: Cambridge University Press.
CALD3: Cambridge Advanced Learner’s Dictionary, Third edition 2008, Cambridge: Cambridge University Press.
CD: The Chambers Dictionary, 1998, Edinburgh: Chambers Harrap.
CED4: Collins English Dictionary, Fourth edition 1998, Glasgow: HarperCollins.
CLD: Cambridge Learner’s Dictionary, First edition 2001, Cambridge: Cambridge University Press.
COBUILD2: Collins COBUILD English Dictionary, New edition 1995, London: HarperCollins.
COBUILD6: Collins COBUILD Advanced Dictionary of English, Sixth edition 2009, Glasgow: HarperCollins.
COD: The Concise Oxford Dictionary of Current English, First edition 1911, Oxford: Clarendon Press.
COD9: The Concise Oxford Dictionary of Current English, Ninth edition 1995, Oxford: Oxford University Press.
COD11: Concise Oxford English Dictionary, Eleventh edition 2004, Oxford: Oxford University Press.
COD12: Concise Oxford English Dictionary, Twelfth edition 2011, Oxford: Oxford University Press.
LDOCE: Longman Dictionary of Contemporary English, First edition 1978, Harlow: Longman.
LDOCE3: Longman Dictionary of Contemporary English, Third edition 1995, Harlow: Longman.
LDOCE4: Longman Dictionary of Contemporary English, Fourth edition 2003, Harlow: Pearson Education Limited.
LDOCE5: Longman Dictionary of Contemporary English, Fifth edition 2009, Harlow: Pearson Education Limited.
MEDAL: Macmillan English Dictionary for Advanced Learners, First edition 2002, Oxford: Macmillan Education.
MEDAL2: Macmillan English Dictionary for Advanced Learners, Second edition 2007, Oxford: Macmillan Education.
NOAD: The New Oxford American Dictionary, First edition 2001, New York: Oxford University Press.
NODE: The New Oxford Dictionary of English, First edition 1998, Oxford: Clarendon Press.
OAAD: Oxford Advanced American Dictionary for Learners of English, 2011, Oxford: Oxford University Press.
OALD: (Advanced) Learner’s Dictionary of Current English, First edition 1948, London: Oxford University Press.
OALD7: Oxford Advanced Learner’s Dictionary of Current English, Seventh edition 2005, Oxford: Oxford University Press.
OED: The Oxford English Dictionary, First edition 1884–1933, Oxford: Clarendon Press.
POD: The Pocket Oxford Dictionary of Current English, First edition 1924, Oxford: Clarendon Press.
Other references Akasu, K. (2003), ‘Dictionary analyses in Lexicon revisited’, a paper presented at the Third Asialex Biennial International Conference, Meikai University, Urayasu, Chiba, Japan, 27 August 2003. Akasu, K. (2005), ‘A glimpse into dictionary analyses’, a paper presented at the meeting of the English Language Postgraduate Seminar, University of Birmingham, UK, 18 November 2005. Akasu, K. (2007), ‘The Iwasaki Linguistic Circle and dictionary analysis’, Kernerman Dictionary News 15, 5–11. 40
Methods in Dictionary Criticism
Akasu, K. (2008), ‘An analysis of A Valency Dictionary of English: A Corpus-based Analysis of the Complementation Patterns of English Verbs, Nouns and Adjectives’, Lexicon 38, 12–28. Akasu, K. (2012), ‘The first dictionary of English collocations in Japan’ in J. Szerszunowicz (ed.), Research on Phraseology in Europe and Asia: Focal Issues of Phraseological Studies, Vol. 1, Bialystok: University of Bialystok Publishing House, 45–56. Akasu, K. and S. Uchida (eds) (2011), Lexicography: Theoretical and Practical Perspectives, Proceedings of the Seventh ASIALEX Biennial International Conference, Kyoto: The Asian Association for Lexicography. Akasu, K., T. Koshiishi, T. Makino, A. Kawamura and Y. Asada (2005), ‘An analysis of Cambridge Advanced Learner’s Dictionary’, Lexicon 35, 127–84. Akasu, K., T. Koshiishi, R. Matsumoto, T. Makino, Y. Asada and K. Nakao (1996), ‘An analysis of Cambridge International Dictionary of English’, Lexicon 26, 3–76. Akasu, K., K. Nakamoto, H. Saito, Y. Asada, K. Urata and K. Omiya (2000), ‘An analysis of The New Oxford Dictionary of English’, Lexicon 30, 53–117. Akasu, K., H. Saito, A. Kawamura, T. Kokawa and R. Hotta (2001), ‘An analysis of the Oxford Advanced Learner’s Dictionary of Current English, Sixth Edition’, Lexicon 31, 1–51. Bogaards, P. (1996), ‘Dictionaries for learners of English’, International Journal of Lexicography 9 (4), 277–320. Chan, A. and Y. Loong (1999), ‘Establishing criteria for evaluating a learner’s dictionary’ in R. Berry, B. Asker, K. Hyland and M. Lam (eds), Language Analysis, Description and Pedagogy, Hong Kong: Hong Kong University of Science and Technology, 298–307. Chan, A.Y.W. and A. Taylor (2001), ‘Evaluating learner dictionaries: What the reviewers say’, International Journal of Lexicography 14 (3), 163–80. Chapman, R. L. (1977), ‘Dictionary reviews and reviewing: 1900–1975’ in J. C. Raymond and I. W. Russel (eds), James B. McMillan: Essays in Linguistics by His Friends and Colleagues. 
Alabama: University of Alabama Press, 143–61. Dohi, K. (1992), ‘A note on dictionary criticism’, LEXeter Newletter 10, 6–7, Exeter: Dictionary Research Centre. Dohi, K. (1993), ‘The need for dictionary criticism’, Toyoko English Studies 2, 21–32. Dohi, K., A. Shimizu, T. Osada, K. Komuro, T. Kanazashi, S. Isozaki and K. Urata (2002), ‘An analysis of Longman Advanced American Dictionary’, Lexicon 32, 1–96. Fontenelle, T. (2008), Practical Lexicography: A Reader, Oxford: Oxford University Press. Hartmann, R.R.K. (1996), ‘Lexicography as an applied linguistic discipline’ in R.R.K. Hartmann (ed.), Solving Language Problems: From General to Applied Linguistics, Exeter: University of Exeter Press, 230–44. Hartmann, R.R.K. and G. James (1998), Dictionary of Lexicography, London: Routledge. Hausmann, F-J., O. Reichmann, H.E. Wiegand and L. Zgusta (eds) (1989-91), Wörterbücher, Dictionaries, Dictionnaires: Ein Internationales Handbuch zur Lexikographie, vols. 1–3, Berlin: Walter de Gruyter. Higashi, N., K. Dohi and K. Akasu (1988), ‘BBI eigo rengo katsuyo jiten no bunseki [An analysis of The BBI Combinatory Dictionary of English]’, Lexicon 17, 43–124. Higashi, N., S. Takebayashi, K. Nakao, M. Sakurai, F. Yamamoto, H. Masuda and S. Yawata (1992), ‘A review of The Concise Oxford Dictionary of Current English’, International Journal of Lexicography 5 (2), 129–60. Ishii, Y. (2010), ‘Shuyo EFL jisho ni okeru yorei no goi-reberu hikaku [Comparison of vocabulary levels of illustrative examples in major EFL dictionaries]’, a paper presented at JACET workshop held at Toyo University, 27 March 2010. Ishii, Y. (2011), ‘Comparing the vocabulary sets used in the “big five” English monolingual dictionaries for advanced EFL learners’ in K. Akasu and S. Uchida (eds), 180–9. Jackson, H. (2000), ‘Dictionary criticism’, a paper presented at InterLex 14, University of Exeter, 10 April 2000. Jackson, H. (2002), Lexicography: An Introduction, London: Routledge.
41
The Bloomsbury Handbook of Lexicography
Johnson, S. (1947), The Plan of a Dictionary of the English Language, London: J. and P. Knapton (reprinted in Fontenelle 2008). Kincaid, J.P., R.P. Fishburne Jr., R.L. Rogers and B.S. Chissom (1975), ‘Derivation of new readability formulas (Automated readability index, Fog count and Flesch reading ease formula) for navy enlisted personnel’ in Research Branch Report, Millington, TN: Naval Technical Training, U. S. Naval Air Station, Memphis, TN, 8–75. Kister, K.F. (1992), Kister’s Best Dictionaries for Adults and Young People: A Comparative Guide, Phoenix, AZ: Oryx Press. Masuda, H., S. Takebayashi, K. Akasu, F. Yamamoto and K. Nakao (1994), ‘A review of Longman Dictionary of the English language’, International Journal of Lexicography 7 (1), 31–46. McMillan, J.B. (1949), ‘Five college dictionaries’, College English 4, 214–21. Nakamoto, K. (1994), ‘Establishing criteria for dictionary criticism: A checklist for reviewers of monolingual English learners’ dictionaries’, Unpublished MA dissertation submitted to the University of Exeter. Nakamoto, K. (1995a), ‘A checklist for reviewers of EFL dictionaries: Checkpoints about the review’, Lexicon 25, 1–13. Nakamoto, K. (1995b), ‘A checklist for reviewers of EFL dictionaries: Checkpoints about the dictionary under review’ in S. Takahashi, K. Asao and R. Matsumoto (eds), In Honor of Nobuyuki Higashi: Papers Contributed on the Occasion of his Sixtieth Birthday September 4,1995, Tokyo: Kenkyusha, 16–35. Nakamoto, K. (1998), ‘An analysis of ILC’s ‘dictionary analysis”, Lexicon 28, 28–38. Nakao, K. (1972), ‘Jisho no chosa/bunseki: sono igi to mondaiten [Survey and analysis of dictionaries: their significance and problems]’, Lexicon 1, 49–57. Nielsen, S. (2009), ‘Reviewing printed and electronic dictionaries: A theoretical and practical framework’ in S. Nielsen and S. Tarp (eds), Lexicography in the 21st Century: In Honour of Henning Bergenholtz, Amsterdam: John Benjamins Publishing Company, 23–41. Osselton, N.E. 
(1989), ‘The history of academic dictionary criticism with reference to major dictionaries’ in F. J. Hausmann et al. (eds), Vol. 1, 225–30. Ripfel, M. (1989), Wörterbuchkritik. Eine empirische Analyse von Wörterbuchrezensionen (Lexicographica. Series Maior 29), Tübingen: Max Niemeyer. Rundell, M. (1998), ‘Recent trends in English pedagogical lexicography’, International Journal of Lexicography 11 (4), 315–42. Steiner, R. J. (1984), ‘Guidelines for reviewers of bilingual dictionaries’, Dictionaries 6, 166–81. Steiner, R. J. (1993), ‘Reviews of dictionaries in learned journals in the United States’, Lexicographica 9, 158–73. Svensén, B. (2009), A Handbook of Lexicography: The Theory and Practice of Dictionary-Making, Cambridge: Cambridge University Press. Swanepoel, P. (2008), ‘Towards a framework for the description and evaluation of dictionary evaluation criteria’, Lexikos 18, 207–31. Swanepoel, P. (2013), ‘Evaluation of dictionaries’ in R.H. Gouws, U. Heid, W. Schweickard and H.E. Wiegand (eds), Dictionaries: An International Encyclopedia of Lexicography. A Supplementary Volume: Recent Developments with Focus on Electronic and Computational Lexicography, Berlin and Boston: Walter de Gruyter, 587–96. Yamada, S. (2010), ‘EFL dictionary evolution: Innovations and drawbacks’ in I.J. Kernerman and P. Bogaards (eds), English Learners’ Dictionaries at the DSNA 2009, Tel Aviv: K Dictionaries, 147–68.
42
5 Researching users and uses of dictionaries
Hilary Nesi
1 Introduction

Dictionary use was not a popular research topic until the final years of the twentieth century. Welker (2010) summarizes 320 empirical dictionary use studies, but only six of these were conducted before 1980. In the 1980s there was an upsurge of interest, and an increasing number of studies have taken place in each decade since then, although according to Kosem et al. (2019a) very few have come from outside the UK, Denmark and Germany. Much of the latest dictionary use research focuses on electronic dictionaries produced locally or regionally for specific user groups (see Pastor and Alcina, Chapter 8), as evidenced, for example, at recent AFRILEX, ASIALEX and EURALEX conferences, and in the proceedings of the six biennial eLex conferences (Granger and Paquot (eds) 2010; Kosem and Kosem (eds) 2011; Kosem et al. (eds) 2013, 2015, 2017, 2019b).

The aim of all studies of dictionary use is to discover ways to increase the success of dictionary consultation. This involves the identification of users’ needs and skills deficits, and the making of appropriate matches between types of dictionary, types of dictionary user and types of dictionary use. The following research questions are pertinent:

● Who are the users of dictionaries?
● What kinds of activity prompt dictionary use?
● What kinds of dictionary do users prefer to use?
● What kinds of information do dictionary users look for?
● What consultation strategies do dictionary users employ, and how successful are these strategies?
This chapter looks at some of the various ways in which these questions have been addressed.
2 Methods

Welker (2010) identifies six main methods of investigating dictionary use:

1. Questionnaire surveys
2. Interviews
3. Observation
4. Protocols
5. Tests and experiments
6. Log files

All these methods are intended to shed light on the way dictionary users consult dictionaries for their own purposes, under non-experimental conditions. Gathering such information is not easy, however, because the data-gathering instruments often rely heavily on users’ ability to explain their consultation behaviour, and because they also tend to steer users towards uncharacteristic patterns of use.

There has been some criticism of questionnaire surveys as a method of investigating dictionary use. Hatherall (1984), Wiegand (1998), Nesi (2000a) and Tarp (2009), for example, have argued that they can place unreasonable demands on users’ powers of recall, and that there is a danger that users and questionnaire designers may not share the same concepts and terminology. Nevertheless, if they are well designed and carefully implemented, they can be an effective means of discovering long-term dictionary-using habits and attitudes in large populations (see, for example, the impressively large survey conducted by Kosem et al. (2019a), completed by 9,562 respondents from nearly 30 European countries). Surveys distributed online have the additional advantage of enabling actual or simulated dictionary information to be embedded within questions, as in the surveys described in Müller-Spitzer (ed.) (2014). Dictionary features illustrated in this way enable respondents to gain a clearer understanding of the researchers’ investigative intent, as pointed out by Lew (2015).

Interviews and observations are used less frequently in dictionary research, and generally with only a few participants, because of the cost in terms of time and expertise. Neubach and Cohen (1988) interviewed only six dictionary users, for example, and East (2008) observed groups of six and five users. Interviews and observations can, however, be more successful than questionnaire surveys as a means of probing dictionary-using behaviour.
Interview participants can ask each other for clarification if unexpected aspects of dictionary use come to light, and observations can reveal behaviour without the need for users to describe it at all. The data are therefore less likely to be coloured by misunderstandings and misconceptions, although they do not always reveal natural look-up behaviour because the interviewer or observer may unintentionally influence the outcome, especially if participants believe that researchers approve of certain strategies and disapprove of others. Interviews are sometimes combined with other data-gathering methods, to triangulate findings and increase validity. Ptaszynski and Sobkowiak (2011) combined interviews with a survey and protocols, for example, and Gromann and Schnitzer (2016) used interviews alongside a survey and production tasks.

Laboratory-based methods now enable researchers to observe in detail the way users interact with dictionary information. Heid and Zimmerman (2012) adopted usability testing methods from the field of information science to record keystroke patterns for search routes through different types of dictionary interface. A larger number of studies (e.g. Tono 2011, Kaneta 2011, Simonsen 2011, Kemmer 2014, Müller-Spitzer, Michaelis and Koplenig 2014, Lew et al. 2013, 2018) have employed eye-tracking, a method borrowed from the fields of cognitive science, psycholinguistics and human-computer interaction, to discover what areas of the dictionary entry users view, in what order, and for how long. Müller-Spitzer, Michaelis and Koplenig (2014) provide a good summary of the advantages and disadvantages of this technique: the eye-tracked gaze is a relatively good indication of perception, but tracking cannot confirm that what is gazed
upon is actually perceived. Research of this kind must be conducted in an artificial setting, usually with a relatively small number of participants, but it can reveal aspects of dictionary use that it is impossible for a human interviewer or observer to discover. Completely natural look-up behaviour is difficult to record because it is a private activity that occurs spontaneously rather than to order. A researcher might spend a very long time observing a potential user as they read or write for their own purposes, before catching the moment when dictionary consultation occurs. Tests or tasks which prompt dictionary use are useful as a means of generating a lot of structured data in a short amount of time, particularly for comparative purposes, for example, to identify the conditions under which users look up the most words, take the least time, achieve the highest comprehension scores or retain the most vocabulary. It is not always possible to extrapolate information about natural dictionary consultation from these findings, however – for example, the dictionaries available at the site of the experiment may not be the same as the ones users normally consult, and the task may not bear much resemblance to the users’ normal reading, writing or translating activities. Protocols, or self-reports, can shed light on users’ understanding and decision making, either during spontaneous dictionary use, or whilst completing a task set by the researcher. Oral protocols or ‘think aloud’ reports, used, for example, by Thumb (2004), Nesi and Boonmoh (2009) and Alhaisoni (2020), are recordings of participants’ thoughts, spoken aloud throughout the consultation process. User behaviour is thus open to examination without the distortion of faulty recall or re-interpretation, but usually relates to only a small number of participants because of the special skills needed to think aloud, and the amount of time required to gather and analyse spoken data. 
Nesi and Boonmoh (2009), for example, chose 17 participants from a cohort of 580 students to train in think-aloud techniques. They collected data from just eight of these, the ones who proved most capable of verbalizing their dictionary use.

Written protocols can be either freely written or structured using a format prepared by the researcher, perhaps with multiple choice options. Typically, they record a reason for each dictionary search, the information searched for, the dictionary used and an evaluation of the success of the consultation. The method is suitable for use with multiple participants: Müllich (1990) collected 108 written protocols from language learners, for example, and Harvey and Yuill (1997) collected 211. Protocols can be produced retrospectively or during a dictionary-using task. Both types have their attendant problems: there is a danger that users will forget the details of their consultations after they have completed the task, but producing a protocol while completing a task is quite disruptive and can lead to loss of focus. Atkins and Varantola (1997) asked participants to work in pairs to reduce disruption, one member using a dictionary, and the other recording the process. With all forms of protocol it is likely that some behaviours will go unrecorded or misrecorded, however, because consultation processes cannot always easily be described.

In recent years log-file analysis has become an increasingly popular method of studying dictionary use (Töpel 2014). It has been used, for example, by Lew (2011), Müller-Spitzer et al. (2015), Kozioł-Chrzanowska (2017) and De Schryver et al. (2006, 2019). It is a good way of capturing information about the searches users make online, when they are engaged in their normal activities, perhaps over an extended period of time.
Log files have the benefit of being unobtrusive, but like other forms of observation they cannot on their own provide much insight into the context or purpose of dictionary consultation. Moreover, although they may be able to indicate whether consultations lead to the information users are searching for, when analysed in
isolation they cannot reveal whether users consider their consultations to have been successful. Most countries have data protection laws which prevent researchers from enriching log files with personal user information, unless the users first give explicit permission for their look-ups to be logged. This procedure would probably reduce the benefit of the method, however, because alerting users to the logging process would affect their natural consultation behaviour (Müller-Spitzer et al. 2015: 2). For this reason, log-file studies generally do not attempt any kind of user categorization, settling instead for large-scale analysis of the kinds of words that users search for. De Schryver et al. (2006) and Verlinde and Binon (2010) did not find evidence for a relationship between search frequency and corpus frequency, but methods for analysing log files have since improved, and Müller-Spitzer et al. (2015) and De Schryver et al. (2019) were able to show that users are more likely to search for higher-frequency items. This is a matter of importance for teachers and learners, because, as Lew (2015: 243) points out, it justifies the use of wordlists created from corpus-derived frequency lists.

Dictionary users, uses and contexts of use can all vary enormously, making it unsafe to generalize from the findings of individual studies. In some other fields of research large-scale controlled trials can test how effectively a given treatment works, but the effectiveness of a dictionary cannot usually be investigated by this means because it is difficult to enlist the aid of a representative sample of all potential users (Welker 2010: 13). Studies therefore tend to focus on the behaviour of smaller and more specific groups, representing dictionary users of one particular type, in one particular context.
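The comparison of search frequency with corpus frequency mentioned above in connection with log files can be illustrated with a minimal sketch. It assumes a log reduced to one looked-up headword per entry and a table of corpus frequencies; both data sets, and all function names, are invented for illustration and are not taken from any of the studies cited. The sketch uses Spearman's rank correlation, which is only one of several measures a researcher might choose.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical log: one looked-up headword per entry.
log_entries = ["the", "house", "the", "serendipity", "house",
               "the", "run", "run", "the"]
# Hypothetical corpus frequencies (occurrences per million words).
corpus_freq = {"the": 60000, "run": 400, "house": 500, "serendipity": 1}

lookups = Counter(log_entries)
words = sorted(corpus_freq)
rho = spearman([lookups[w] for w in words], [corpus_freq[w] for w in words])
print(f"Spearman's rho between look-ups and corpus frequency: {rho:.2f}")
```

A value of rho close to 1 in such a comparison would indicate that more frequent words tend to be looked up more often. A real log would also need the cleaning steps (misspellings, inflected forms, automated traffic) that this sketch leaves out.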
To facilitate the comparison of findings from different studies researchers sometimes try to adopt similar methods; Welker (2010: 13) cites a number of studies utilizing similar questionnaire formats, for example, and Dziemianko (2012) traces a sequence of dictionary use replication studies. In her edited collection Müller-Spitzer (2014) assembles a number of interrelated studies linked to the German Internet Lexicography network. The studies are predominantly survey-based but combine some other approaches such as eye-tracking (Kemmer 2014) and user evaluations (Koplenig and Müller-Spitzer 2014b). This mixed-method approach helps to compensate for the inevitable limitations of individual methods, and is advocated by Töpel (2014, same volume). Zhang et al. (2020) seems to be the only meta-analysis to have been conducted in the field of lexicography so far. It compares the findings from 44 studies with a total of 3,475 participants and is able to resolve some arguments relating to print versus online dictionaries, and monolingual versus bilingual dictionaries, where individual studies provide conflicting evidence. Further meta-analysis in areas other than vocabulary acquisition would clearly benefit the field, although, of course, such studies are possible only in areas where a substantial body of research has already been built up. Multiple studies of different groups, using similar or complementary methods, may gradually enable us to build up a complete picture of ‘how dictionaries are used … who the users are, where, when and why they use dictionaries, and with which result’ (Tarp 2009: 279).
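To show in general terms how a meta-analysis can reconcile conflicting findings from individual studies, the following minimal sketch pools effect sizes with a fixed-effect, inverse-variance model. The model choice is an assumption made for illustration, the effect sizes and variances are invented, and nothing here is presented as the procedure followed by Zhang et al. (2020).

```python
import math


def pooled_effect(effects, variances):
    """Fixed-effect pooling: weight each study by the inverse of its variance."""
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    mean = sum(w * e for w, e in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return mean, standard_error


# Hypothetical standardized effect sizes from three small studies,
# with their (equally hypothetical) sampling variances.
effects = [0.8, 0.2, 0.5]
variances = [0.04, 0.01, 0.02]

mean, se = pooled_effect(effects, variances)
# Approximate 95% confidence interval under a normal approximation.
low, high = mean - 1.96 * se, mean + 1.96 * se
print(f"pooled effect = {mean:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

More precise studies (those with smaller variances) pull the pooled estimate towards their own result, which is why a single small study with a striking finding carries limited weight. Where studies differ substantially in design or population, a random-effects model would usually be considered instead.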
3 Who are the users of dictionaries?

Varantola (2002: 33) divides dictionary users into three broad categories: language learners, non-professional users and professional users, these last being those who ‘normally use a dictionary to perform a task that they get paid for’. Other user variables that are likely to affect behaviour are:

● age
● mother tongue
● second or foreign language
● language proficiency level
● educational level
● level of skill in dictionary use
● role (as a teacher, learner, translator, traveller, player of word games, etc.)
● location (geographically, and within the home, place of work or educational institution).
The geographical location of online dictionary users is ascertainable from log files, at least to some extent, but questionnaires are typically considered the best way to obtain factual information relating to some or all of Varantola’s variables. Items to establish user profiles were included in surveys by Atkins and Varantola (1998), Hartmann (1999) and Kosem et al. (2019a), for example, and are often found in larger surveys of user wants and needs involving multiple data collection methods. Research participants are usually selected on the grounds that they are available, willing to take part, and reasonably representative of the types of user the researchers are most concerned with. This means that they are often university students, as researchers are usually based in universities. It also means that people in locations where little research takes place tend to be under-represented in studies of dictionary use. Lew (2011, 2015) points out that there is a deficit of information about many contexts of use, for example by tourists, or people doing crossword puzzles at home, and argues that lexicographical research prioritizes educated professionals over ordinary, less skilled dictionary users.
4 What kinds of activity prompt dictionary use?

Most dictionary consultations are undertaken when the user is engaged in another activity, in order to ‘solve a context-dependent problem’ (Varantola 2002). Dictionary use is typically classified as ‘receptive’ (i.e. to help with text decoding tasks) or ‘productive’ (i.e. to help with text encoding tasks), although dictionaries can also be treated as resources for learning new vocabulary or finding out about a language. Table 5.1 summarizes the broad range of activities generally associated with dictionary use.

Table 5.1 Activities associated with dictionary use.

| | Receptive | Productive |
|---|---|---|
| Written medium | Reading | Writing |
| | Translating from L2 to L1 | Translating from L1 to L2 |
| Spoken medium | Listening | Speaking |
| | Interpreting from L2 to L1 | Interpreting from L1 to L2 |
| Gathering language information | | |

Traditionally the large monolingual dictionaries have focussed on the receptive needs of native speakers, while learners’ dictionaries, bilingualized dictionaries and L1–L2 bilingual dictionaries also support language production by providing translations and/or more grammar, phraseology, usage and pronunciation information. Most, but not all, surveys have found that dictionaries are more often used receptively, whilst reading (Marello 1987, Hartmann 1999, Stark 1999) or translating from L2 to L1 (Tomaszczyk 1979). Battenburg (1991) found that lower-level students used their dictionaries more whilst reading, and advanced level students used their dictionaries more whilst writing. Tomaszczyk’s survey respondents reported using dictionaries for speaking and listening activities, but he concluded that they might have been referring to the preparation of oral reports. Dictionary use prior to the advent of smartphones was generally associated with activities in the written medium, and as yet the spontaneous use of mobile e-dictionaries has not been researched to any great extent.

‘Reading’, ‘writing’, ‘speaking’ and ‘listening’ are very broad activity types. Some questionnaires make finer distinctions; Ripfel (1990) lists reading newspapers or magazines, listening to the radio or watching television, explaining word meanings to children, doing homework and writing letters, for example; and Hartmann (1999) includes playing word games, writing assignments and reading for study and pleasure. Müller-Spitzer (ed.) (2014) notes that some of her survey respondents mentioned using dictionaries for word games, and to solve arguments about words. Presumably, users’ needs change according to the type of receptive or productive activity they are engaged in: translators need to understand every word in the text, while learners reading for pleasure may only read for gist, ignoring many of the words they do not know.

Activity type can also affect users’ choice of dictionary format. Nesi (2010) records a complex picture of e-dictionary preferences, with students using computer-based dictionaries when reading and writing at the computer, and portable electronic dictionaries when reading and writing with paper-based materials. Her participants also preferred mobile e-dictionaries for speaking and listening activities, because of their accessibility and audio pronunciation features.
The secondary ‘knowledge-oriented’ use of dictionaries (Bergenholtz and Tarp 2003) has most often been studied in connection with vocabulary learning and retention, and the effect of dictionary use has usually been measured by testing participants after they have completed a task under various conditions, for example, with or without a bilingual and/or monolingual dictionary, in print and/or in electronic form (e.g. Dziemianko 2010, 2012, 2014, 2017, Chen 2017). This is the approach that Zhang et al. (2020) discuss extensively in their meta-analysis of the effects of dictionary use on second language vocabulary acquisition, reaching the conclusion that dictionary use, and particularly monolingual dictionary use, is a very effective vocabulary learning strategy, regardless of whether the dictionary is electronic or in print. However, the secondary use of dictionaries in more natural surroundings has also been explored in questionnaire surveys (e.g. Marello 1987, Chi 1998, Hass 2005), and in combination with extensive reading (Ronald 2002). Nesi (2010) investigated the way users created and annotated their own wordlists using current e-dictionary resources. It seems that the e-dictionary format encouraged browsing for general interest, especially when words within one entry were hyperlinked to other entries (Nesi 2000b).
5 What kinds of dictionary do users prefer to use?

The typology of reference works is complex, but the basic choices facing users are between monolingual and bilingual, hard-copy and electronic. Studies suggest that although users can distinguish these broad categories, many fail to make finer distinctions in terms of types of reference work, the different user groups they are intended for, and the relative merit of comparable titles. Participants in surveys and interviews are often unable to give precise information about the publishers and titles of their dictionaries (Nesi and Haill 2002, Law and Li 2011), and generally the investigation of dictionary preferences is hampered by users’ ignorance of the details concerning the dictionaries they own and use, and of the different types of dictionaries that exist. In many educational contexts dictionary skills are not systematically taught (Atkins and Varantola 1997, Bae 2011). When they are taught, they rarely include the skills of selection and criticism.

One simple but seemingly underused way of establishing users’ preferences is to ask them to evaluate various kinds of dictionary material. MacFarquhar and Richards (1983) used this method to compare users’ impressions of different defining styles, and Kanazashi (2011) and Koplenig and Müller-Spitzer (2014b) report studies comparing users’ responses to dictionary formatting and layout features. Lew (2015: 240) advises against evaluating entry displays on the basis of users’ initial reactions, as they may need time to adapt to innovative features. However, although user evaluations do not constitute sufficient proof that one lexicographical approach is superior to another, they are a useful supplement to the comments of dictionary reviewers, who often do not belong to the user group the publisher is targeting.
Publishers can obtain information about the popularity of their dictionaries through log files and sales figures, but such commercially sensitive information is rarely made available to external researchers, and many published log-file studies relate to experimental reference works or noncommercial dictionaries (e.g. De Schryver et al. 2006, Bergenholtz and Johnsen 2007, Hult 2007, Müller-Spitzer et al. 2015). User preferences investigated by more direct means generally indicate that language learners prefer bilingual or bilingualized dictionaries (see, e.g. Tomaszczyk 1979, Baxter 1980, Atkins and Varantola 1997, Lew 2004, Ryu 2006), although monolinguals tend to be used progressively more at more advanced levels of study, and consultations may involve both bilingual and monolingual dictionaries, so that the search term can first be identified, and then checked (Müller-Spitzer (ed.) 2014). The use of additional lexicographical and extralexicographical resources can also be a strategy for resolving consultation problems; some learners consult multiple dictionaries (Chon 2009); translation students refer to corpora, search engines and term banks (Frankenberg-Garcia 2005); and the business student participants in Gromann and Schnitzer’s study (2016) used search engines to replace the main functions of monolingual dictionaries, and to reinforce bilingual dictionary information. Lew and Adamska-Sałaciak (2015) argue in favour of bilingual dictionary use, but although dictionary definitions written in another language can be hard for learners to understand, monolingual dictionaries are often regarded as superior in quality, both by teachers (Boonmoh and Nesi 2008) and by students (Gromann and Schnitzer 2016). Lew (2004) tested the effect of specially constructed monolingual, bilingual, and bilingualized dictionary entries on the reading test performance of over 700 Polish learners of English and found that the bilingual format
provided the best support. However, Zhang et al.’s large-scale meta-analysis (2020) indicates that, overall, the monolingual format is best for vocabulary learning. Some surveys have investigated dictionary purchasing choices, but because this decision has sometimes rested with teachers (Béjoint 1981, Hartmann 1999, Boonmoh and Nesi 2008, Gromann and Schnitzer 2016) it may not reflect users’ real preferences. Authorities now have less influence over users’ dictionary choices, as e-dictionaries and smartphone apps allow for more secretive consultation and can often be accessed for free. Gromann and Schnitzer (2016) found that 76 per cent of all the consultations they observed involved electronic rather than print dictionaries. Pons was the online bilingual dictionary that lecturers recommended, but although Spanish learners tended to follow this recommendation, French learners tended to consult leo-org, a dictionary which lecturers in French business communication classes explicitly advised against. Koplenig and Müller-Spitzer (2014a) report that only 15.9 per cent of their 684 respondents indicated that they would be willing to pay for dictionary content. These were language professionals, and Lew (2015) believes that this proportion would have been even lower amongst the general dictionary-using population. E-dictionary packages and portals often contain a wide range of dictionaries of varying quality, some of which contain unattested headwords and idioms, presumably included to increase the extent of coverage and impress unsophisticated users (Nesi 2012). However, despite the growing number of studies of e-dictionary use, little research has been undertaken to evaluate the content of popular e-dictionary portals and apps, and as yet little information is available to help users choose the e-dictionary to consult.
6 What kinds of information do dictionary users look for, and what consultation strategies do they employ? Unsurprisingly, surveys of native speaker users (e.g. Quirk 1975, Jackson 1988, Hartmann 1999, Chatzidimou 2007) and language learners (e.g. Tomaszczyk 1979, Béjoint 1981, Battenburg 1991, Bishop 1998) indicate the greatest interest in information that can be applied immediately to a receptive or productive task, such as meaning and spelling, rather than knowledge-oriented information such as the etymology of the look-up word. Because dictionary use usually occurs whilst users are busy doing something else, they generally want to find information quickly, with as little disruption as possible to the task they are undertaking. Several studies have investigated users’ misinterpretations of dictionary information, through failing to read the entire entry (Miller and Gildea 1985, Nesi and Meara 1994), failing to understand grammatical information (Chan 2012), or consulting the wrong entry or subentry (Nesi and Haill 2002). The first definition in polysemous entries is the one that immediately catches the user’s eye, and it also usually represents the most familiar meaning, so alternative definitions lower down the entry are often ignored (Tono 1984, Bogaards 1998). Some studies have explored the role of ‘signposts’ within long entries as a means of helping users find the right, contextually appropriate, definition (Tono 1992, 1997, Bogaards 1998, Lew and Pajkowska 2007, Lew 2010, Nesi and Tan 2011, Müller-Spitzer, Michaelis and Koplenig 2014, Dziemianko 2016), and there has also been interest in other formatting aspects, for example, the use of colours
to increase search speed and effectiveness (Dziemianko 2015), the effects of alternative ‘folded’ layouts in online dictionaries (Kaneta 2011) and the possible advantages of other types of layered interfaces, including ‘tabbed’ and ‘panelled’ views (Koplenig and Müller-Spitzer 2014b). One of the reasons why e-dictionaries are so popular is that they provide faster access than print dictionaries. As Zhang et al. (2020: 23) point out, it is now widely acknowledged that they are more convenient for users and enable them to look up more words within a shorter space of time. Some researchers (e.g. Taylor and Chan 1994, Zhang 2004, Stirling 2005, Nesi 2000c) have worried that convenience of consultation might affect the quality of the experience, a concern supported by studies of online interaction generally, for example, Delgado et al.’s meta-analysis involving 171,055 participants (2018), which concludes that paper-based reading has advantages over reading on screen. However, experiments with comparable print and e-dictionaries have either recorded no significant difference in task performance (e.g. Nesi 2000b, Koyama and Takeuchi 2003, 2007, Chen 2010, Dziemianko 2012) or significantly better performance by e-dictionary users, at least with some dictionaries (Shizuka 2003, Dziemianko 2010, 2017). In Zhang et al.’s meta-analysis (2020: 23) no clear pattern emerges regarding the effect of dictionary form on L2 vocabulary acquisition. It should be noted that many popular online bilingual dictionaries translate in a fairly primitive way, without information or labels to indicate register differences or restrictions on use (Nesi 2012). Thus, they might encourage a tendency, noted by Hatherall (1984), to look for one-word equivalents of search terms, translating word-for-word rather than considering the context. Researching e-dictionary use can be problematic. 
The content of e-dictionary packages, portals and apps is changeable and often poorly described, and users are often secretive about their e-dictionary-using habits. Even a relatively homogeneous user group may select widely different products as their dictionaries of choice. Moreover, e-dictionary developers are not in the habit of offering review copies or discounts on class sets. Shizuka (2003), Koyama and Takeuchi (2007) and Diehr (2010) acquired pocket electronic dictionaries from Casio for use in experiments in Japan and Germany, but other researchers such as Chen (2010) complain that they lacked the e-dictionary resources they needed for their research.
7 Conclusion Research into dictionary use is ultimately intended to help users consult dictionaries more successfully. Progress in this respect has been patchy, however. In some cases, research has informed teacher training, and the teaching of dictionary skills (see, e.g. Bae 2011), but it does not seem to have greatly affected the choices made by commercial dictionary publishers. ‘Few modifications to the learners’ dictionary design are supported by published results of experimental research on how learners really use dictionaries,’ as Lew and Dziemianko (2006: 277) point out. The decline of the print dictionary has led to a reduction in the size of lexicographical teams in mainstream publishing houses, greater reliance on automatic dictionary compilation procedures, and the rise of online dictionary sites created and managed either collaboratively by volunteers, or commercially by technical companies. This might suggest that there will be fewer opportunities
for user research to influence design in years to come. Fortunately, however, the technology is also enabling many university-based research groups to experiment with new presentation techniques and dictionary content. The proceedings of recent eLex conferences are full of descriptions of small and specialized e-dictionaries designed to meet the needs of particular groups of users. Development teams for dictionaries of this sort are familiar with the research methods described in this chapter and have the means to conduct their own user research, building on prior research findings. Their dictionaries will not be ‘block-busters’ like the famous print dictionaries of the past, but they do offer the hope of further, fruitful, user-research-informed design.
References Alhaisoni, E. (2020), ‘Dictionary look-up strategies used by Saudi EFL students: A think-aloud study’, International Journal of English Linguistics 10 (3), 159–76. Akasu, K. and S. Uchida (eds) (2011), Lexicography: Theoretical and Practical Perspectives. Papers Submitted to the Seventh ASIALEX Biennial International Conference, Kyoto, August 22–24, 2011, Kyoto, Japan: The Asian Association for Lexicography. Atkins, B.T.S. and K. Varantola (1997), ‘Monitoring dictionary use’, International Journal of Lexicography 10 (1), 1–45. Atkins, B.T.S. and K. Varantola (1998), ‘Language learners using dictionaries: The final report on the EURLEX/AILA research project on dictionary use’ in B.T. Atkins (ed.), Using Dictionaries: Studies of Dictionary Use by Language Learners and Translators, Tübingen: Max Niemeyer, 21–81. Bae, S. (2011), ‘Teacher-training in dictionary use: Voices from Korean teachers of English’ in K. Akasu and S. Uchida (eds), 46–55. Battenburg, J.D. (1991), English Monolingual Learners’ Dictionaries: A User-Oriented Study, Tubingen: Max Niemeyer. Baxter, J. (1980), ‘The dictionary and vocabulary behavior: A single word or a handful?’, TESOL Quarterly 14, 325–36. Béjoint, H. (1981), ‘The foreign student’s use of monolingual English dictionaries: A study of language needs and reference skills’, Applied Linguistics 2 (3), 207–22. Bergenholtz, H. and M. Johnsen (2007), ‘Log files can and should be prepared for a functionalistic approach’, Lexikos 17, 1–20. Bergenholtz, H. and S. Tarp (2003), ‘Two opposing theories: On H.E. Wiegand’s recent discovery of lexicographic functions’, Hermes 31, 171–96. Bishop, G. (1998), ‘Research into the use being made of bilingual dictionaries by language learners’, Language Learning Journal 18, 3–8. Bogaards, P. (1998), ‘Scanning long entries in learner’s dictionaries’ in T. Fontenelle et al. (eds), EURALEX ‘98 Actes/Proceedings, Liège: Université Départements d’Anglais et de Néderlandais, 555–63. Boonmoh, A. and H. 
Nesi (2008), ‘A survey of dictionary use by Thai university staff and students, with special reference to pocket electronic dictionaries’, Horizontes de Lingüística Aplicada 6 (2), 79–90. Chan, A. (2011), ‘Bilingualised or monolingual dictionaries? Preferences and practices of advanced ESL learners in Hong Kong’, Language, Culture and Curriculum 24 (1), 1–21. Chan, A. (2012), ‘Cantonese ESL learners’ use of grammatical information in a monolingual dictionary for determining the correct use of a target word’, International Journal of Lexicography 25 (1), 68–94. Chatzidimou, K. (2007), ‘Dictionary use in Greek education: An attempt to track the field through three empirical studies’, Horizontes de Lingüística Aplicada 6 (2), 91–104. Chen, Y. (2010), ‘Dictionary use and EFL learning. A contrastive study of pocket electronic dictionaries and paper dictionaries’, International Journal of Lexicography 23 (3), 275–306.
Chen, Y. (2017), ‘Dictionary use for collocation production and retention: A CALL-based study’, International Journal of Lexicography 30 (2), 225–51. Chi, M.L.A. (1998), ‘Teaching dictionary skills in the classroom’ in T. Fontenelle et al. (eds) EURALEX ‘98 Actes/Proceedings, Liège: Université Départements d’Anglais et de Néderlandais, 565–77. Chon, Y.V. (2009), ‘The electronic dictionary for writing: A solution or a problem?’, International Journal of Lexicography 22 (1), 23–54. Delgado, P., C. Vargas, R. Ackerman and L. Salmerón (2018), ‘Don’t throw away your printed books: A meta-analysis on the effects of reading media on reading comprehension’, Educational Research Review 25, 23–38. De Schryver, G.-M., D. Joffe, P. Joffe and S. Hillewaert (2006), ‘Do dictionary users really look up frequent words? – On the overestimation of the value of corpus-based lexicography’, Lexikos 16, 67–83. De Schryver, G.-M., S. Wolfer and R. Lew (2019), ‘The relationship between dictionary look-up frequency and corpus frequency revisited: A log-file analysis of a decade of user interaction with a Swahili-English dictionary’, Gema Online Journal of Language Studies 19 (4), 1–27. Diehr, B. (2010), MOBIDIC hilft beim Englisch lernen. Available online http://www.presse-archiv.uniwuppertal.de/2010//1105_mobidic.html. Dziemianko, A. (2010), ‘Paper or electronic? The role of dictionary form in language reception, production and the retention of meaning and collocations’, International Journal of Lexicography 23 (3), 257–73. Dziemianko, A. (2012), ‘Why one and two do not make three: Dictionary form revisited’, Lexikos 22, 195–216. Dziemianko, A. (2014), ‘On the presentation and placement of collocations in monolingual English learners’ dictionaries: Insights into encoding and retention’, International Journal of Lexicography 27 (3), 259–79. Dziemianko, A. (2015), ‘Colours in online dictionaries: A case of functional labels’, International Journal of Lexicography 28 (1), 27–61. Dziemianko, A. 
(2016), ‘An insight into the visual presentation of signposts in English learners’ dictionaries online’, International Journal of Lexicography 29 (4), 490–524. Dziemianko, A. (2017), ‘Dictionary form in decoding, encoding and retention: Further insights’, ReCALL 29 (3), 1–22. East, M. (2008), Dictionary Use in Foreign Language Writing Exams: Impact and Implications, Amsterdam: John Benjamins. Frankenberg-Garcia, A. (2005), ‘A peek into what today’s language learners as researchers actually do’, International Journal of Lexicography 18 (3), 335–55. Granger, S. and M. Paquot (eds) (2010), eLexicography in the 21st Century: New Challenges, New Applications. Proceedings of eLex 2009, Louvain-la-Neuve, 12–14 October 2009, Louvain: Presses Universitaires de Louvain. Gromann, D. and H. Schnitzer (2016), ‘Where do business students turn for help? An empirical study on dictionary use in foreign-language learning’, International Journal of Lexicography 29 (1), 55–99. Hartmann, R.R.K. (1999), ‘Case study: the Exeter University survey of dictionary use’ in R.R.K. Hartmann (ed.), Thematic Network Project in the Area of Languages. Sub-project 9: Dictionaries. Dictionaries in Language Learning, Berlin: Freie Universität, 36–52. Hatherall, G. (1984), ‘Studying dictionary use: Some findings and proposals’ in R.R.K. Hartmann (ed.), LEX’eter ‘83 Proceedings: Papers from the International Conference on Lexicography at Exeter, 9–12 September 1983 (Lexicographica. Series Maior 1), Tübingen: Max Niemeyer, 183–9. Harvey, K. and D. Yuill (1997), ‘A study of the use of a monolingual pedagogical dictionary by learners of English engaged in writing’, Applied Linguistics 18 (3), 253–78. Hass, U. (2005), ‘Nutzungsbedingungen in der Hypertextlexikografie. Über eine empirische Untersuchung’ in D. Steffens (ed.), Wortschatzeinheiten: Aspekte ihrer (Be)schreibung. Dieter Herberg zum 65. Geburtstag, Mannheim: Institut für Deutsche Sprache, 29–42. Heid, U. and J.T. 
Zimmerman (2012), ‘Usability testing as a tool for e-dictionary design: Collocations as a case in point’ in R. Vatvedt Fjeld and J.M. Torjusen (eds), Proceedings of the 15th EURALEX International Congress, Oslo: University of Oslo, 661–71.
Hult, A-K. (2007), ‘A study in dictionary use on the internet’, Nordiska studier i lexikografi. Rapport från 9. Konference om leksikografi i Norden, Akureyri 22.–26. maj 2007. Jackson, H. (1988), Words and Their Meaning, London: Longman. Kanazashi, T. (2011), ‘Three areas of dictionary research where user studies are of particular importance’ in K. Akasu and S. Uchida (eds), 209–18. Kaneta, T. (2011), ‘Folded or unfolded: Eye-tracking analysis of L2 learners’ reference behaviour with different types of dictionary interfaces’ in K. Akasu and S. Uchida (eds), 219–24. Kemmer, K. (2014), ‘Rezeption der Illustration, jedoch Vernachlässigung der Paraphrase? Ergebnisse einer Benutzerbefragung und Blickbewegungsstudie’ in C. Müller-Spitzer (ed.), 251–78. Koplenig, A. and C. Müller-Spitzer (2014a), ‘General issues of online dictionary use’ in C. Müller-Spitzer (ed.), 127–41. Koplenig, A. and C. Müller-Spitzer (2014b), ‘Questions of design’ in C. Müller-Spitzer (ed.), 189–204. Kosem, I. and K. Kosem (eds) (2011), Electronic Lexicography in the 21st Century New Applications for New Users. Proceedings of eLex 2011, Bled, 10–12 November 2011, Trojina: Institute for Applied Slovene Studies. Kosem, I., J. Kallas, P. Gantar, S. Krek, M. Langemets and M. Tuulik (eds) (2013), Electronic Lexicography in the 21st Century: Thinking Outside the Paper. Proceedings of the eLex 2013 Conference, 17–19 October 2013, Tallinn, Estonia, Ljubljana and Tallinn: Trojina, Institute for Applied Slovene Studies and Eesti Keele Instituut. Kosem, I., M. Jakubíček, J. Kallas and S. Krek (eds) (2015), Electronic Lexicography in the 21st Century: Linking Lexical Data in the Digital Age. Proceedings of the eLex 2015 Conference, 11–13 August 2015, Herstmonceux Castle, United Kingdom, Ljubljana and Brighton: Trojina, Institute for Applied Slovene Studies and Lexical Computing Ltd. Kosem, I., C. Tiberius, M. Jakubíček, J. Kallas, S. Krek and V. Baisa (eds) (2017), Electronic Lexicography in the 21st Century. 
Proceedings of eLex 2017 Conference, Brno, Czech Republic: Lexical Computing CZ s.r.o. Kosem, I., R. Lew, C. Müller-Spitzer, M. Ribeiro Silveira, S. Wolfer, A. Dorn, A. Gurrutxaga, K. Ceberio, E. Etxeberria, M-A. Lefer, D. Geeraerts, K. Štrkalj Despot, T. Stojanov, N. Ljubešić, M. Škrabal, B. Štěpánková, V. Vodrážková, H. Lorentzen, L. Trap-Jensen, J. Kallas, M. Tuulik, K. Koppel, M. Langemets, T. Heinonen, I. Thomas, T. Margilitadze, S. Markantonatou, V. Giouli, C. Mulhall, I. Kernerman, Y. Ben-Moshe, T. Sadan, A. Abel, M. Nied Curcio, L. Tanturovska, B. Nikovska, C. Tiberius, O. Grønvik, M. Hovdenak, S. Berg-Olsen, K.E. Karlsen, C-E. Smith Ore, M. Biesaga, T. Zingano Kuhn, J. Silvestre, E.I. Tamba, G. Haja, M-R. Clim, M-I. Patrascu, T. Tasovac, S. Petrović, S. Arhar Holdt, C. Valcarcel Riveiro, M.J. Domínguez Vázquez, E. Volodina, I. Pilán, E. Sköldberg, L. Holmer and H. Nesi (2019a), ‘The image of the monolingual dictionary across Europe. Results of the European survey of dictionary use and culture’, International Journal of Lexicography 32 (1), 92–114. Kosem, I., T. Zingano Kuhn, M. Correia, J.P. Ferreria, M. Jansen, I. Pereira, J. Kallas, M. Jakubíček, S. Krek and C. Tiberius (eds) (2019b), Electronic Lexicography in the 21st Century. Proceedings of the eLex 2019 Conference. 1–3 October 2019, Sintra, Portugal. Brno, Czech Republic: Lexical Computing CZ, s.r.o. Kozioł-Chrzanowska, E. (2017), ‘What do users of general electronic monolingual dictionaries search for? The most popular entries in the Polish Academy of Sciences Great Dictionary of Polish’ in I. Kosem et al. (eds), 202–20. Koyama, T. and O. Takeuchi (2003), ‘Printed dictionaries versus electronic dictionaries: A pilot study on how Japanese EFL learners differ in using dictionaries’, Language Education and Technology 40, 61–79. Koyama, T. and O. Takeuchi (2007), ‘Does look-up frequency help reading comprehension of EFL learners? 
Two empirical studies of electronic dictionaries’, Calico Journal 25 (1), 110–25. Law, W. and K. Li (2011), ‘Mobile phone dictionary: Friend or foe? A user attitude survey of Hong Kong translation students’ in K. Akasu and S. Uchida (eds), 303–12. Lew, R. (2002), ‘Questionnaires in dictionary use research: A re-examination’ in A. Braasch and C. Povlsen (eds), Proceedings of the Tenth EURALEX International Congress, Vol. 1, Copenhagen: Center for Sprogteknologi, Copenhagen University, 267–71.
Lew, R. (2004), Which Dictionary for Whom? Receptive Use of Bilingual, Monolingual and Semibilingual Dictionaries by Polish Learners of English, Poznań: Motivex. Lew, R. (2010), ‘Users take shortcuts: Navigating dictionary entries’ in A. Dykstra and T. Schoonheim (eds), Proceedings of the 14th EURALEX International Congress, Leeuwarden/Ljouwert, The Netherlands: Fryske Akademy, 1121–32. Lew, R. (2011), ‘User studies: Opportunities and limitations’ in K. Akasu and S. Uchida (eds), 7–16. Lew, R. (2015), ‘Research into the use of online dictionaries’, International Journal of Lexicography 28 (2), 232–53. Lew, R. and A. Adamska-Sałaciak (2015), ‘A case for bilingual learners’ dictionaries’, ELT Journal 69 (1), 47–57. Lew, R. and J. Doroszewska (2009), ‘Electronic dictionary entries with animated pictures: Lookup preferences and word retention’, International Journal of Lexicography 22 (3), 239–57. Lew, R. and A. Dziemianko (2006), ‘Non-standard dictionary definitions: What they cannot tell native speakers of Polish’, Cadernos de Tradução 18, 275–94. Lew, R., M. Grzelak and M. Leszkowicz (2013), ‘How dictionary users choose senses in bilingual dictionary entries: An eye-tracking study’, Lexikos 23, 228–54. Lew, R., R. Kaźmierczak, E. Tomczak and M. Leszkowicz (2018), ‘Competition of definition and pictorial illustration for dictionary users’ attention: An eye-tracking study’, International Journal of Lexicography 31 (1), 53–77. Lew, R. and J. Pajkowska (2007), ‘The effect of signposts on access speed and lookup task success in long and short entries’, Horizontes de Lingüística Aplicada 6 (2), 235–52. Marello, C. (1987), ‘Examples in contemporary Italian bilingual dictionaries’ in A. P. Cowie (ed.), The Dictionary and the Language Learner: Papers from the EURALEX Seminar at the University of Leeds, 1–3 April 1985 (Lexicographica. Series Maior 17), Tübingen: Max Niemeyer, 224–37. MacFarquhar, P. and J. 
Richards (1983), ‘On dictionaries and definitions’, RELC Journal 14 (1), 111–24. Miller, G. and P. Gildea (1985), ‘How to misread a dictionary’, AILA Bulletin, 13–26. Müller-Spitzer, C. (ed.) (2014), Using Online Dictionaries (Lexicographica Series Maior 145), Berlin: Walter de Gruyter. Müller-Spitzer, C., F. Michaelis and A. Koplenig (2014), ‘Evaluation of a new web design for the dictionary portal OWID’ in C. Müller-Spitzer (ed.), 207–28. Müller-Spitzer, C., S. Wolfer and A. Koplenig (2015), ‘Observing online dictionary users: Studies using Wiktionary log files’, International Journal of Lexicography 28 (1), 1–26. Müllich, H. (1990), ‘Die Definition ist blöd!’ Herübersetzen mit dem einsprachigen Wörterbuch. Das französische und englische Lernerwörterbuch in der Hand der deutschen Schüler, Tübingen: Max Niemeyer. Nesi, H. (2000a), The Use and Abuse of Learners’ Dictionaries, Tübingen: Max Niemeyer. Nesi, H. (2000b), ‘On screen or in print? Students’ use of a learner’s dictionary on CD-ROM and in book form’ in P. Howarth and R. Herington (eds), EAP Learning Technologies, Leeds: Leeds University Press, 106–14. Nesi, H. (2000c), ‘Electronic dictionaries in second language vocabulary comprehension and acquisition: the state of the art’ in U. Heid, S. Evert, E. Lehmann and C. Rohrer (eds), Proceedings of the Ninth EURALEX International Congress, EURALEX 2000, Stuttgart, Germany, August 8th–12th, 2000, Stuttgart: Institut für Maschinelle Sprachverarbeitung. Nesi, H. (2010), ‘The virtual vocabulary notebook: The electronic dictionary as vocabulary learning tool’ in G. Blue (ed.), Developing Academic Literacy, Pieterlen, Switzerland: Peter Lang, 213–26. Nesi, H. (2012), ‘Alternative e-dictionaries: Uncovering dark practices’ in S. Granger (ed.), Electronic Lexicography, Oxford: Oxford University Press, 357–72. Nesi, H. and A. Boonmoh (2009), ‘A close look at the use of pocket electronic dictionaries for receptive and productive purposes’ in T. Fizpatrick and A. 
Barfield (eds), Lexical Processing in Second Language Learners, Clevedon, UK: Multilingual Matters, 67–81. Nesi, H. and R. Haill (2002), ‘A study of dictionary use by international students at a British university’, International Journal of Lexicography 15 (4), 277–306.
Nesi, H. and P. Meara (1994), ‘Patterns of misrepresentation in the productive use of EFL dictionary definitions’, System 22 (1), 1–15. Nesi, H. and K.H. Tan (2011), ‘The effect of menus and signposting on the speed and accuracy of sense selection’, International Journal of Lexicography 24 (1), 79–96. Neubach, A. and A. Cohen (1988), ‘Processing strategies and problems encountered in the use of dictionaries’, Dictionaries 10, 1–19. Ptaszynski, M.O. and M. Sobkowiak (2011), ‘Is it all just text production? Examining dictionary use in L1-L2 translation and in free composition in L2’ in K. Akasu and S. Uchida (eds), 426–35. Quirk, R. (1975), ‘The social impact of dictionaries in the UK’ in R. McDavid and A. Ducket (eds), Lexicography in English, New York: Annals of the New York Academy of Sciences, 211, 76–88. Ronald, J. (2002), ‘L2 lexical growth through extensive reading and dictionary use: A case study’ in A. Braasch and C. Povlsen (eds), Proceedings of the Tenth EURALEX International Congress, Copenhagen, Denmark, August 12–17 2002, Vol. 2, Copenhagen: Center for Sprogteknologi, Copenhagen University, 765–71. Ripfel, M. (1990), ‘Wörterbuchbenutzung bei Muttersprachlern. Untersuchungsbericht über eine Befragung erwachsener muttersprachlicher Sprecher zur Wörterbuchbenutzung’, Lexicographica 6, 237–51. Ryu, J. (2006), ‘Dictionary use by Korean EFL college students’, Language and Information Society 7, 83–114. Shizuka, T. (2003), ‘Efficiency of information retrieval from the electronic and the printed versions of a bilingual dictionary’, Language Education and Technology 40, 15–33. Simonsen, H.K. (2011), ‘User consultation behaviour in internet dictionaries: An eyetracking study’, Hermes 46, 75–101. Stark, M. (1999), Encyclopedic Learners’ Dictionaries: A Study of their Design Features from the User Perspective, Tübingen: Max Niemeyer. Stirling, J. (2005), ‘The portable electronic dictionary – faithful friend or faceless foe?’, Modern English Teacher 14 (3), 64–72. 
Tarp, S. (2009), ‘Reflections on lexicographical user research’, Lexikos 19, 275–96. Taylor, A. and A. Chan (1994), ‘Pocket electronic dictionaries and their use’ in W. Martin, W. Meijs, M. Moerland, E. Ten Pas, P. Van Sterkenburg and P. Vossen (eds), Proceedings of the 6th Euralex International Congress, Amsterdam: Euralex, 598–605. Thumb, J. (2004), Dictionary Look-up Strategies and the Bilingualised Learner’s Dictionary: A Think-aloud Study (Lexicographica Series Maior 117), Berlin: Walter de Gruyter. Tomaszczyk, J. (1979), ‘Dictionaries: Users and uses’, Glottodidactica 12, 103–19. Tono, Y. (1984), On the Dictionary User’s Reference Skills, B.Ed. Dissertation, Tokyo: Gakugei University. Tono, Y. (1992), ‘The effect of menus on EFL learners’ look-up processes’, Lexikos 2, 230–53. Tono, Y. (1997), ‘Guide Word or Signpost? An experimental study on the effect of meaning access indexes in EFL learners’ dictionaries’, English Studies 28, 55–77. Tono, Y. (2011), ‘Application of eye-tracking in EFL learners’ dictionary look-up process research’, International Journal of Lexicography 24 (1), 1–30. Töpel, A. (2014), ‘Review of research into the use of electronic dictionaries’ in C. Müller-Spitzer (ed.), 13–54. Varantola, K. (2002), ‘Use and usability of dictionaries: Common sense and context sensibility?’ in M-H. Correard (ed.), Lexicography and Natural Language Processing: A Festschrift in Honour of B.T.S. Atkins, Euralex, 30–44. Welker, H.A. (2010), Dictionary Use: A General Survey of Empirical Studies, Brasilia: Author’s Edition. Wiegand, H.E. (1998), Wörterbuchforschung. Untersuchungen zur Wörterbuchbenutzung, zur Theorie, Geschichte, Kritik und Automatisierung von Wörterbüchern, Berlin: de Gruyter. Zhang, P. (2004), ‘Is the electronic dictionary your faithful friend?’, CELEA Journal 27 (2), 23–8. Zhang, S., H. Xu and X. 
Zhang (2020), ‘The effects of dictionary use on second language vocabulary acquisition: A meta-analysis’, International Journal of Lexicography, ecaa010. https://doi.org/10.1093/ijl/ecaa010.
6
Methods in (meta)lexicography Howard Jackson
A number of chapters in this handbook discuss methodologies used in various aspects of lexicography, not least the preceding three chapters of this section. The aim of this chapter is to present an overview of the variety of methods that are used in the pursuit of lexicography. The first section outlines methods used in practical lexicography, i.e. the compiling of dictionaries. The second section looks at the influence of linguistics on practical lexicography. The third section turns to methods in academic lexicography, i.e. metalexicography. The final section reviews the methods used in the investigation of dictionary uses and dictionary users.
1 Practical lexicography (see also Trap-Jensen, Chapter 3) There has been a succession of manuals or practical guides published on how to compile a dictionary, from Zgusta (1971) through Landau (2001), van Sterkenburg (2003), Atkins and Rundell (2008), Fontenelle (2008) to Svensén (2009), which is a translation of a book originally published in Swedish in 2004. While the stages through which a dictionary project passes have remained relatively stable over the years, what takes place at each stage has changed considerably, not least since the advent of computer technology and the availability of large text corpora (Rundell and Stock 1992, Heuberger 2016). Wiegand (1998), quoted in Schierholz (2015), identifies six ‘phases’ involved in the compiling of a dictionary:
a) preparation
b) acquisition of materials and data
c) treatment of materials and data
d) evaluation
e) preparation of the print process
f) further development
In one of the dictionary projects described by Bergenholtz (2018) he identifies six ‘steps’ needed in dictionary compiling:
1. dictionary functions and dictionary conception
2. business plan
3. timetable with milestones
4. making the database
5. dictionary making
6. layout and printing
All practical lexicography methods identify a ‘planning’ or ‘preparation’ stage in dictionary compilation, at which several crucial decisions have to be made. Initially the need at this stage is to ‘identify the market’ (Landau 2001: 345), that is, to specify which group of users the dictionary will be aimed at – children, college students, adults, learners, or some particular subject specialists – and what the size of the dictionary will be – compact, concise, desk-size. This might be called a ‘needs analysis’ (Schierholz 2015: 328). These decisions will clearly have an impact on the budget for the dictionary, as well as the size and composition of the staff team needed to compile it, together with their deployment. Still at the planning stage, which encompasses 1–3 of Bergenholtz’s ‘steps’, a schedule or timetable will be established, which will set out the timescale for the completion of the dictionary compilation, together with expected intermediate goals to be completed within a specified time frame. The planning stage also includes procedures to establish the wordlist that the dictionary will contain, which will depend on the target user group as well as the size of the dictionary. The source(s) of the data to be used for writing the dictionary entries will need to be decided, whether this will be a corpus, a citation file, previous dictionaries or some combination of these. Then a detailed ‘style manual’ (Landau 2001) or ‘style guide’ (Atkins 2002) will need to be written, which will specify the microstructure of the dictionary, what each entry needs to contain and how it will be laid out. It will be the daily guide for the lexicographers in their writing of the dictionary entries. 
In these days of computerized lexicography, the planning phase will also need to decide which ‘customizable software’ will be used, both for extracting data from the corpus (Corpus Query System) and for compiling the entries themselves (Dictionary Writing System) (Grundy and Rawlinson 2016). The Dictionary Writing System will enable the lexicographers, who may well be working remotely, to all contribute to a common dictionary file, so that a dictionary entry may be worked on by a number of different specialists, e.g. for pronunciation, grammar, defining, examples. Wiegand’s next two phases, ‘acquisition and treatment of data and materials’, are often combined in computerized lexicography that uses text corpora. The corpora may have been compiled in-house by a publisher, such as Oxford University Press’s Oxford Curated Corpora (https://languages.oup.com/products/corpora/) or Harper-Collins’s Collins Corpus (https://www.collinsdictionary.com/cobuild/); or the corpora may be publicly available, such as those on the English Corpora website (https://www.english-corpora.org/corpora.asp). To extract the data from the corpus in a form useful for lexicographers writing dictionary entries an appropriate extraction tool or corpus query system is necessary. In the early days of corpus lexicography, when the corpora were relatively small, a concordancing program was used, which produces KWIC (key word in context) concordance lists for each word or lemma being worked on. With the growth in the size of corpora into the millions, even billions of words, such tools produce far too much information for lexicographers to process, especially for more frequently occurring items. More sophisticated corpus query systems have been developed, the most well-known of which is Sketch Engine (www.sketchengine.eu): see Kilgarriff, Chapter 7. Rather than producing a list of all the contexts in which a particular word occurs in a corpus, Sketch Engine creates a
‘word sketch’, which details the typical patterns, both grammatical and lexical (collocations) that the word enters; thus, it presents the lexicographer with immediately useful information for writing a dictionary entry. The headword (lemma) list will have been decided in the planning phase, but at some point the editor will need to ensure that the list represents a balance among the letters of the alphabet; there are, for example, many more words in English beginning with the letter ‘s’ than with the letter ‘j’. A number of guides for achieving balance have been used, including E. L. Thorndike’s ‘block system’, described in Landau (2001: 360–2), in which the letter ‘s’ is allocated thirteen blocks to the one for the letter ‘j’. The compiling and editing of the entries in a dictionary will usually be undertaken using an online ‘dictionary writing system’ (Abel 2012), accessible by all the lexicographers working on the dictionary, as well as by the editor overseeing and checking their work. Some dictionary publishers use off-the-shelf dictionary writing software, such as TLex (https://tshwanedje.com/tshwanelex/) or DPS (dps.cw.idm.fr), used by most British dictionary publishers (Rundell et al. 2020: 27), while others may use a fully customized in-house system. It may depend on whether the publisher employs IT staff who are able to devise an appropriate dictionary writing system and whether the dictionary project is so specialized that an off-the-shelf product would not be suitable. This stage in the dictionary compilation process incorporates Wiegand’s fourth, ‘evaluation’, phase, which refers to the processing of the data acquired from the corpus or other source. The fifth of Wiegand’s phases involves the preparation of the dictionary for publication, either in printed form or electronically, or both.
The publication format will have been decided at the planning stage, and it will have influenced the structure and content of the dictionary as it was being compiled. Both formats could be used for publication from the same database, with the information and its display tailored to the particular format (see Nielsen, Chapter 23). At this stage, the methods of book publishing are to the fore, involving proof-reading, typesetting, printing, etc.; or, in the case of electronic publication, web page design, incorporation of hyperlinks, dropdown menus and the like. Wiegand’s final phase, ‘further development’, refers to what happens post-publication: how the dictionary is reviewed, revised and kept up-to-date, what mechanisms are in place to track neologisms, whether some kind of crowd-sourcing is put in place, as with Macmillan’s ‘Open Dictionary’ (https://www.macmillandictionary.com/open-dictionary/index.html). Two dictionaries could be taken as case studies of compilation: the historical Oxford English Dictionary and the learner’s Collins COBUILD Dictionary. The compilation of the first is described in a number of publications, including Winchester (2003), Gilliver (2016) and Charlotte Brewer’s website (https://oed.hertford.ox.ac.uk/). The story of the compilation of COBUILD, the first corpus-driven dictionary, is told in Sinclair (1987).
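The KWIC (key word in context) concordance mentioned earlier in this section can be illustrated with a minimal sketch. It is not the method of any particular concordancer or corpus query system: the function name `kwic`, the character-based context window and the toy text are all assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per occurrence of `keyword`, with `width`
    characters of context on either side (hypothetical helper)."""
    lines = []
    # Whole-word, case-insensitive match of the keyword.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines

# Toy 'corpus' invented for this example.
sample = ("The lexicographer checks each word in context. "
          "A rare word may occur once; a common word occurs everywhere.")
concordance = kwic(sample, "word", width=20)
for line in concordance:
    print(line)
```

Even this toy version suggests why raw concordances stop being practical for very large corpora: every occurrence yields a line, so a frequent word would produce far more lines than a lexicographer could read.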
2 Lexicography and linguistics
The compiling of dictionaries has a long history that predates the development of modern linguistics by many centuries. Nevertheless, many, though not all, of the decisions that a lexicographer takes are of a linguistic nature (Atkins 1993). This section considers the influence
that linguistics has exerted on lexicography and the kinds of contribution that linguistic methods can make to lexicography. That is not to say that lexicography is a branch of linguistics, or indeed applied linguistics: although Hartmann (2001) appears in a series titled ‘Applied Linguistics in Action’, Hartmann himself links lexicography to reference science (p. 5), which is where it properly belongs. The first English dictionary to explicitly acknowledge the influence of modern (structuralist) linguistics was Webster’s Third (Gove 1961). The influence was limited, mainly to the descriptivist stance that the dictionary adopted, much to the dismay of its critics (Sledd and Ebbit 1962), and to the representation of pronunciation. Structuralist linguistics had little to say about meaning (semantics) or words. As linguists in the later part of the twentieth century became more interested in the matters of relevance to lexicography, their influence increased; and it became usual, at least in Britain, for a team of lexicographers to include one or more linguists, e.g. John Sinclair and others on the COBUILD team, or Patrick Hanks on the team for The New Oxford Dictionary of English (Pearsall 1998). However, Béjoint (2010: 272) notes that ‘many lexicographers are still unconvinced that linguists can be of any help’, and that for a variety of reasons, not least that ‘the number of words studied by linguists is negligible compared to the immensity of the lexicon, and some are not even those that are problematic in dictionary making’. Nevertheless, Béjoint (2010) is convinced that linguistics can contribute meaningfully to several areas of lexicography. He opines (2010: 275): All the branches of linguistics have something to contribute: etymology, phonetics, morphology, syntax, semantics, dialectology, sociolinguistics, pragmatics, corpus linguistics, etc. What the linguists can provide is accuracy in individual entries and consistency across different entries. 
Phonetics clearly has a contribution to make to the representation of pronunciation in dictionaries, and the near-universal use of the International Phonetic Alphabet has been evidence of that. Corpus linguistics, with its ability to determine the relative frequency of words, is useful, among other things, in establishing the wordlist and deciding the order of senses within an entry, as well as providing the frequency information that is given in many learner’s dictionaries. The research from sociolinguistics and pragmatics is needed for accurately labelling the register and social status of lexical items. Perhaps the most obvious area of linguistics with insights relevant for dictionary-making is lexical semantics. There are two lexicographic issues in particular that lexical semantics can provide help with, both of which are key to an effective dictionary entry: deciding how many senses a lexical item has; and defining or describing the meanings of the senses identified. Compare any two dictionaries of the same type and size and you will discover that for many polysemous lexemes there is no agreement on how many senses are recognized, nor the order in which they are presented (see Jackson 2002: 88–93). The influence of context on the meaning of a lexical item, as recognized by lexical semantics more recently, is reflected in the word sketches produced by Sketch Engine (see above and Kilgarriff, Chapter 7). One theory from cognitive psychology that has been influential in lexical semantics is prototype theory (Taylor 2003), which argues that some members of a category are more central or prototypical than other more peripheral and potentially fuzzy members. This theory influenced the writing of entries for polysemous lexemes in the New Oxford Dictionary of English (Pearsall
1998), in which central ‘core’ senses are distinguished from ‘subsenses’, which are extensions or specialized meanings of the core sense. Patrick Hanks, who developed prototype theory within lexical semantics in his theory of ‘norms and exploitations’ (Hanks 2013), was involved in NODE. The Macmillan Dictionary (Rundell 2002) likewise recognizes main senses and subsenses of words, the latter being ‘closely related’ to the main sense under which they are listed. One of the main tasks of a dictionary is to define (the meaning of the senses of) words, and the question is whether linguists have anything to contribute to this task. Componential analysis helped in identifying the ‘genus’ and ‘differentiae’ of some words to aid in the classical definition style. Wierzbicka’s (1996) theory of ‘natural semantic metalanguage’ provides an exhaustive analysis of word meanings, but it is probably too detailed and complex for immediate usefulness to lexicographers engaged in writing definitions. Prototype theory influenced some of the definitions in NODE. In the end, we have to agree with Béjoint (2010: 336) that ‘defining words in a way that is satisfactory both for the linguists and for the dictionary users may well be impossible.’ Dictionary users require explanations, rather than definitions in the logician’s sense. While some linguistic theories may provide some methods for determining the meaning of words, it is the skill and experience of the lexicographer that ultimately counts in designing an explanation that will be understandable and useful to a dictionary user. Where corpus linguistic methods in particular have contributed significantly to the writing of dictionary entries is in the identification and description of multi-word expressions, from fixed idioms to looser collocations. Sinclair (e.g. 1991) formulated what he called the ‘idiom principle’, which asserted that much of language is prefabricated. 
Following on from this, many researchers have looked at idioms and collocations, as well as at how surrounding words influence what a word ‘means’ in a particular context. Many of these insights are now incorporated in dictionaries, especially in those aimed at learners. The methods of various branches of linguistics thus have an important contribution to make to practical lexicography, and Rundell (2012: 71) believes that ‘lexicography has benefited enormously from its engagement with theoretical linguistics’. It is not the whole story, though, for ‘lexicographers and linguists have different agendas’ (Rundell 2012: 71). Moreover, methods from other areas are also employed in dictionary compilation: from business in the planning and management of the project, from book production and web design in the publication of the dictionary, and from reference science in determining the overall conception and execution of the dictionary, which will take its place among other reference resources available to the public.
3 Academic lexicography
In this section, we are concerned with what is sometimes called metalexicography or dictionary research (see also Bogaards, Chapter 2). According to Schierholz (2015: 337), metalexicography is ‘the theory of practical lexicography’, and it has ‘the task of investigating all methods of practical lexicography and their theoretical reflection’. Hartmann (2001: 121) notes that ‘dictionary research is not characterised by a single, crucial method, but by a multiplicity of investigative styles.’ Some dictionary research focuses on dictionary structure, other research
focuses on dictionary content; some research may take a broad view of the lexicographic output of a particular country or publisher, other research may narrow the focus to a particular dictionary or dictionary type; some dictionary research may take a historical approach, whereas other research may take a contrastive approach, comparing successive editions of the same dictionary or two or more dictionaries aimed at the same user group. Many of these kinds of investigation go under the name of ‘dictionary criticism’ (see Akasu, Chapter 4). Dictionary research is manifold and diverse, and the methods used are equally various. Probably the most elaborated of the methods proposed is Wiegand’s (2010) ‘systematic dictionary research’, summarized in Schierholz (2015), which is based on Mann and Schierholz (2014). The method concentrates on ‘the investigation of dictionary structures and the clear presentation of results’ (Wiegand 2010: 249); the focus is on ‘dictionary form’ and ‘methods of presentation’ of the results of the investigation (Schierholz 2015: 337), rather than on dictionary content. Wiegand proposes a number of text segmentation methods to investigate dictionary structures, the foremost of which is the ‘functional-positional segmentation’ (Schierholz 2015: 337). The method identifies each element of a dictionary article in terms of its position within the article and its function, and whether it is part of a hierarchy of items. Schierholz (2015: 338) summarizes: ‘With this procedure one can find out which parts of a dictionary article belong together and which tasks are given to these elements by the lexicographer.’ Wiegand (2010) also proposes methods for displaying the investigations into the structure of dictionary entries, including tree diagrams and the like. Wiegand’s detailed methods of investigation result in elaborate diagrams to display the structure of dictionary articles. 
It is arguable that Wiegand’s approach is too complex and fine-meshed to be of much practical use; the investigator would need to expend considerable effort and time to master Wiegand’s methods, and then it would be questionable whether the results would justify the effort expended. Not only that but the focus would be on the detail rather than on the larger picture of how the dictionary functions and how its content satisfies the intended user group and the dictionary functions they need. An altogether more comprehensible approach is that proposed by Coleman and Ogilvie (2009) in their article on ‘forensic dictionary analysis’. The aim of this methodology is to ‘reconstruct lexicographic policies and practices … that sometimes differ from the accounts given by the lexicographers themselves’; it is a methodology that combines ‘statistical, textual, contextual, and qualitative analyses’ (p. 1). Dictionary research lends itself to quantitative analyses, as the analyses of dictionaries in the journal Lexicon, published by the Iwasaki Linguistic Circle in Japan, illustrate so well (e.g. Kokawa et al. 2018). But, warn Coleman and Ogilvie (2009: 2–3):
Numerical results can be unduly convincing: statistical analysis is only useful if it is rigorous, and the rigour and success of quantitative analysis is dependent on a well thought-out strategy with regard to the parameters of the study and the sampling techniques adopted.
The parameters of a study are determined by identifying the countable items in the dictionary or dictionaries under investigation, items such as headword, pronunciation, register and semantic field. Sampling must be from the whole alphabetical range, not from one or two letters of the alphabet, and the size of the sample must fit the purpose of the research and the item being studied. Bukowska (2010) emphasizes the necessity of choosing an appropriate sampling
technique in order to achieve a genuinely random sample; she complains that ‘most of the samples in current metalexicographic research are judgmental one-stretch samples based on what metalexicographers intuitively consider reliable and representative’ (p. 1259). Bukowska (2010) discusses and tests various sampling techniques, from simple random sampling, to systematic sampling (taking every so many pages), to stratified sampling (dividing the items into subgroups, e.g. by word class); and she concludes that stratified sampling works best with larger dictionaries but not with smaller ones, where random sampling is more appropriate. It is clear that using quantitative methods in dictionary research requires careful attention to how the sample is constructed on which the study will be based. Statistical analysis is probably not sufficient on its own; it needs to be supplemented by qualitative and contextual analysis (Coleman and Ogilvie 2009). A dictionary is a text like any other, though of a unique genre; and it can be investigated using the methods and techniques of text linguistics or stylistics, to uncover its purpose, structure and content. Account should be taken of the dictionary’s front matter – foreword, preface, guide to using the dictionary, etc. – as well as to any other contextual information that might be available, in the form, for example, of publisher’s blurb or lexicographers’ accounts of the compilation process (e.g. Sinclair 1987). In the light of this contextual information, the text of the main dictionary can be examined, both in its macrostructure and in its microstructure. Sampling techniques mentioned earlier will identify a (random) selection of entries to examine in detail, using the methods of textual analysis. The outcomes of such investigations may be published as dictionary reviews, which are a particular ‘metalexicographic genre’ (Bergenholtz and Gouws 2016). 
Dictionary research encompasses more than just the forensic examination of individual dictionaries and dictionary entries. Swanepoel (2008: 221–2) suggests that it also includes research on the history of dictionaries, research on dictionary typologies, research on dictionary use and users, research on dictionary structure (cf. Rundell 2012: 49). We may also add research that compares and contrasts successive editions of a dictionary or two or more different dictionaries (Hartmann 2001). Different methods will be to the fore in each of these areas of dictionary research: methods of historical research, reference science, social science, contrastive analysis and so on.
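The three sampling techniques that Bukowska (2010) compares can be sketched as follows. This is only an illustration under stated assumptions: the function names, the toy list of entries and the choice of word class as the stratifying variable are invented here, and systematic sampling is simplified to ‘every n-th entry’ rather than ‘every so many pages’.

```python
import random
from collections import Counter, defaultdict

def simple_random_sample(entries, k, seed=0):
    """Draw k entries at random from the whole list."""
    return random.Random(seed).sample(entries, k)

def systematic_sample(entries, step, start=0):
    """Take every `step`-th entry, beginning at index `start`."""
    return entries[start::step]

def stratified_sample(entries, key, fraction, seed=0):
    """Group entries by `key`, then sample the same fraction of each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for entry in entries:
        strata[key(entry)].append(entry)
    sample = []
    for name in sorted(strata):
        group = strata[name]
        k = max(1, round(len(group) * fraction))  # at least one per stratum
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical entries: 60 (headword, word class) pairs in alphabetical order.
entries = [
    (f"w{i:03d}", "noun" if i % 2 == 0 else "verb" if i % 3 == 0 else "adjective")
    for i in range(60)
]

random_pick = simple_random_sample(entries, 6)
systematic_pick = systematic_sample(entries, step=10)
stratified_pick = stratified_sample(entries, key=lambda e: e[1], fraction=0.1)
class_counts = Counter(word_class for _, word_class in stratified_pick)
print(class_counts)
```

In this toy setting each technique yields six entries, but only the stratified sample is guaranteed to mirror the proportions of word classes in the full list; the systematic sample, by contrast, spreads evenly across the alphabetical range.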
4 Investigating dictionary use(r)s (see also Nesi, Chapter 5)
Investigation into the users and uses of dictionaries began properly in the 1970s (Béjoint 2010: 239), though Lew (2011b) notes a study from 1915; however, ‘the majority of the empirical user studies available today have been done in the last two decades or so’ (Lew 2011b: 1, cf. Welker 2010). User studies have employed a variety of methods, many from the social sciences, with some new methods becoming available as a result of advances in computer technology. The earliest user studies were conducted using paper-based questionnaires; the newest technique involves computer-based eye-tracking technology. User studies are interested in finding out how users relate to dictionaries and what look-up processes they engage in when using a dictionary. Béjoint (2010) suggests that there are three
types of user study: reference needs, asking what users look up; how dictionaries are used, i.e. the look-up process; and how dictionaries help, i.e. do they satisfy the look-up query? Additionally, a study may gauge attitudes to innovations, such as in electronic dictionaries (e.g. Müller-Spitzer et al. 2012). Lew (2011a: 7) lists the following techniques that are used to investigate users’ lookup processes: observation, self-accounts, think-aloud protocols, video-taping, screen recorders, server logging, eye tracking. The list is not intended to be exhaustive; it omits questionnaires, for example, as well as lab-based experiments and tests (cf. Tarp 2009: 283). While ‘methodological standards are improving at a steady rate’ (Lew 2011b: 1), Béjoint (2010: 257) believes that ‘studies of dictionary use … need to continue refining their methodologies.’ Every method has advantages and disadvantages; and the size and composition of the sample can also be an issue. A majority of studies uses subjects that are readily available: the researcher’s own or a colleague’s students; with the result that the sample is often restricted to university-educated subjects, even to language students or professionals. Online questionnaires are often able to cast the net somewhat wider, but the sample may then be self-selecting rather than representative. Questionnaires have the advantage that the sample size can be relatively large, and the results can be easily computed. Their usefulness depends on the nature of the questions asked. If they are used to ascertain how subjects use their dictionary, then the disadvantage is that the subjects are usually being asked to reflect on their own practice; there is no guarantee that a subject’s self-reporting accurately represents what they actually do. 
If, on the other hand, the questions are directed at subjects’ attitudes to possible innovations in dictionary structure and content, as in Müller-Spitzer et al.’s (2011) study, then a questionnaire will be an appropriate instrument. Similarly, interviews or focus groups may be ‘particularly useful for probing the field’ (Lew 2011a: 13). Online questionnaires have the advantage that they can be administered with an online survey tool and the results can be readily analysed statistically (Müller-Spitzer et al. 2018). Observation is beneficial in that the researcher can see what the subject is doing when performing a look-up, rather than relying on a subject’s self-reporting. However, this method is time-consuming, as only one subject can be observed at a time; and the presence of the observer may influence the behaviour of the subject (Labov’s observer’s paradox), although this could be mitigated by using video recording, which also has the advantage of producing a record that can be referred to later and multiple times. Self-accounts and think-aloud protocols ask subjects to reflect on how they use a dictionary; in the case of think-aloud protocols the reflection occurs in the course of undertaking the look-up procedure rather than in retrospect. Such protocols can be recorded and even followed up in interview. The disadvantages of self-reporting apply here, too, though with think-aloud protocols, which require subjects to talk through their procedure while undertaking it, there is less danger of retrospective misreporting or embellishing of the account. In order to keep a measure of control over their observations, some researchers set up tests and experiments (for the difference between these, see Welker 2010: 21–3). These often involve a control and experimental group of subjects, as well as specified lexicographical tasks that the subjects must perform, usually in a specified timeframe within a controlled setting (classroom or laboratory). 
The advantage of tests and experiments is that variables, such as the dictionary used and the purpose of the look-up, as well as the profile of the subjects, are controlled, thus adding
to the validity of the results. The disadvantage is that the look-up does not take place in a natural setting to fulfil a real need for lexicographic information, thus reducing the generalizability of the results. Perhaps the most unobtrusive method for observing users’ practices in dictionary look-ups in a natural setting is the use of log files, which are records that are routinely gathered of a computer user’s keystrokes. This method applies, of course, only to the use of online dictionaries; but they are becoming more and more the favoured dictionary reference source. Müller-Spitzer et al. (2018: 730) are of the opinion that ‘quantitative evaluations of log files can give profound insights into general patterns of look-up behaviour.’ But Lew (2011a) sees some disadvantages: the context of the dictionary look-up is not known, and equally nothing is known about the user; so we do not know whether the user has selected the appropriate dictionary for the task in hand (cf. Tarp 2009: 279). The most recent and innovative method of dictionary use research is eye-tracking, pioneered by Yukio Tono (2011), and termed ‘a highly promising technique’ by Lew (2011b: 3). Again, this technique is applicable only to electronic dictionaries, either online or on CD-ROM. Eye-tracking technology records what a user is looking at and for how long; it distinguishes between ‘fixations’ (prolonged gaze) and ‘saccades’ (movements of the eye between fixations) (Müller-Spitzer et al. 2018: 722). A large amount of data is collected in this way that gives an accurate account of the user’s behaviour. However, the quantity of data makes it ‘virtually impossible to analyse the fixation patterns of all participants and extract “typical” patterns in a qualitative manner’ (Müller-Spitzer et al. 2018: 722). And Lew (2011a: 13) thinks that ‘it would be naive to believe that the eye-tracker will instantly answer all questions as to how users interact with dictionaries’.
All methods of dictionary user research have upsides and downsides; and those who have reviewed the methods have often noted their positive and negative points (e.g. Tarp 2009, Welker 2010, Lew 2011a, Müller-Spitzer et al. 2018). The variety of research methods means, though, that an investigator can choose the method that suits the purpose of their study; and the accumulation of results from the range of studies that have been undertaken in recent decades has begun to provide some useful insights into user behaviour and to paint a more rounded picture of how dictionaries are used and for what purposes. Dictionary user studies continue to present a fruitful area of research.
5 Conclusion
Our survey has, arguably, demonstrated that Hartmann (2001) was correct in his assessment of the multiplicity of methods and investigative styles that characterize dictionary compilation and research. The methods are drawn from a variety of disciplines, including linguistics, the social sciences, reference science and business, as well as those exclusive to lexicography itself. Methods vary according to whether we are engaged in practical lexicography or metalexicography (dictionary research); and within metalexicography a range of methods is used depending on the area of dictionary research being investigated and the purpose of the study. It is incumbent on a researcher to select the appropriate method for the investigation that they wish to pursue.
References
Abel, A. (2012), ‘Dictionary writing systems and beyond’ in S. Granger and M. Paquot (eds), Electronic Lexicography, Oxford: Oxford University Press, 83–106.
Atkins, B.T.S. (1993), ‘Theoretical lexicography and its relation to dictionary-making’, Dictionaries 14, 4–43.
Atkins, B.T.S. (2002), ‘Then and now: Competence and performance in 35 years of lexicography’, Euralex Proceedings, 1–28.
Atkins, B.T.S. and M. Rundell (2008), The Oxford Guide to Practical Lexicography, Oxford: Oxford University Press.
Béjoint, H. (2010), The Lexicography of English, Oxford: Oxford University Press.
Bergenholtz, H. (2018), ‘Dictionary management’ in P.A. Fuertes-Olivera (ed.), 34–42.
Bergenholtz, H. and R.H. Gouws (2016), ‘On the metalexicographic genre of dictionary reviews with specific reference to LexicoNordica and Lexikos’, Lexikos 26, 60–81.
Bukowska, A.A. (2010), ‘Sampling techniques in metalexicographic research’, Euralex Proceedings, 1258–69.
Coleman, J. and S. Ogilvie (2009), ‘Forensic dictionary analysis: Principles and practice’, International Journal of Lexicography 22 (1), 1–22.
Fontenelle, T. (ed.) (2008), Practical Lexicography: A Reader, Oxford: Oxford University Press.
Fuertes-Olivera, P.A. (ed.) (2018), The Routledge Handbook of Lexicography, Abingdon and New York: Routledge.
Gilliver, P. (2016), The Making of the Oxford English Dictionary, Oxford: Oxford University Press.
Gove, P.B. (ed.) (1961), Webster’s Third New International Dictionary, G & C Merriam.
Grundy, V. and D. Rawlinson (2016), ‘The practicalities of dictionary production’ in P. Durkin (ed.), The Oxford Handbook of Lexicography, 561–78, Oxford: Oxford University Press.
Hanks, P. (2013), Lexical Analysis: Norms and Exploitations, Cambridge, MA: The MIT Press.
Hartmann, R.R.K. (2001), Teaching and Researching Lexicography, Harlow: Longman.
Heuberger, R. (2016), ‘Corpora as game changers: The growing impact of corpus tools for dictionary makers and users’, English Today 32 (2): 24–30.
Jackson, H. (2002), Lexicography: An Introduction, London and New York: Routledge.
Kokawa, T., Y. Asada, J. Sugimoto, T. Osada and K. Ikeda (2018), ‘An analysis of the Merriam-Webster’s Advanced Learner’s English Dictionary, Second Edition’, Lexicon 48: 25–75.
Landau, S.I. (2001), Dictionaries: The Art and Craft of Lexicography, Cambridge: Cambridge University Press.
Lew, R. (2011a), ‘User studies: Opportunities and limitations’ in K. Akasu and S. Uchida (eds), Asialex 2011 Proceedings, Kyoto-Asian Association for Lexicography, 7–16.
Lew, R. (2011b), ‘Studies in dictionary use: Recent developments’, International Journal of Lexicography 24 (1), 1–4.
Mann, M. and S.J. Schierholz (2014), ‘Methoden in der Lexikographie und Wörterbuchforschung. Ein Überblick mit einer Auswahlbibliographie’, Lexicographica 30, 3–57.
Müller-Spitzer, C., A. Koplenig and A. Töpel (2011), ‘What makes a good online dictionary? Empirical insights from an interdisciplinary research project’ in I. Kosem and K. Kosem (eds), Proceedings of eLEX2011, 203–8.
Müller-Spitzer, C., A. Koplenig and A. Töpel (2012), ‘Online dictionary use: Key findings from an empirical research project’, in S. Granger and M. Paquot (eds), Electronic Lexicography, Oxford: Oxford University Press, 425–57.
Müller-Spitzer, C., A. Koplenig and S. Wolfer (2018), ‘Dictionary usage research in the Internet era’, in P.A. Fuertes-Olivera (ed.), 715–34.
Pearsall, J. (ed.) (1998), The New Oxford Dictionary of English, Oxford: Clarendon Press.
Rundell, M. (ed.) (2002), Macmillan English Dictionary for Advanced Learners, Oxford: Macmillan Education.
Rundell, M. (2012), ‘“It works in practice but will it work in theory?” The uneasy relationship between lexicography and matters theoretical’, Euralex Proceedings, 47–92.
Rundell, M. and P. Stock (1992), ‘The corpus revolution’, English Today 8 (2), 9–14 and 8 (3), 21–32.
Rundell, M., M. Jakubíček and V. Kovář (2020), ‘Technology and English dictionaries’ in S. Ogilvie (ed.), The Cambridge Companion to English Dictionaries, Cambridge: Cambridge University Press.
Schierholz, S.J. (2015), ‘Methods in lexicography and dictionary research’, Lexikos 25, 323–52.
Sinclair, J.M. (ed.) (1987), Looking Up. An Account of the COBUILD Project in Lexical Computing, London and Glasgow: Collins ELT.
Sinclair, J.M. (1991), Corpus, Concordance, Collocation, Oxford: Oxford University Press.
Sledd, J. and W.R. Ebbit (eds) (1962), Dictionaries and THAT Dictionary, Chicago: Scott Foresman.
Svensén, B. (2009), A Handbook of Lexicography, Cambridge: Cambridge University Press.
Swanepoel, P. (2008), ‘Towards a framework for the description and evaluation of dictionary evaluation criteria’, Lexikos 18: 207–31.
Tarp, S. (2009), ‘Reflections on lexicographical user research’, Lexikos 19, 275–96.
Taylor, J.R. (2003), Linguistic Categorization, Third edition, Oxford: Oxford University Press.
Tono, Y. (2011), ‘Application of eye-tracking in EFL learners’ dictionary look-up process research’, International Journal of Lexicography 24 (1): 124–53.
Van Sterkenburg, P. (2003), A Practical Guide to Lexicography, Amsterdam: John Benjamins.
Welker, H.A. (2010), Dictionary Use. A General Survey of Empirical Studies, Brasilia: Author’s Edition.
Wiegand, H.E. (1998), Wörterbuchforschung. Untersuchungen zur Wörterbuchbenutzung, zur Theorie, Geschichte, Kritik und Automatisierung der Lexikographie. Volume 1, Berlin and New York: Walter de Gruyter.
Wiegand, H.E. (2010), ‘Zur Methodologie der systematischen Wörterbuchforschung’, Lexicographica 26, 249–330.
Wierzbicka, A. (1996), Semantics: Primes and Universals, Oxford: Oxford University Press.
Winchester, S. (2003), The Meaning of Everything. The Story of the Oxford English Dictionary, Oxford: Oxford University Press.
Zgusta, L. (1971), Manual of Lexicography, The Hague: De Gruyter Mouton.
PART II Current research and issues
7 Using corpora as data sources for dictionaries
Adam Kilgarriff
1 Introduction
There are three ways to write a dictionary:
● Copy
● Introspect
● Look at data.
The first has its place. Making checks against other dictionaries is a not-to-be-overlooked step in the lexicographic process. But if the dictionary is to be an original work, it never has more than a secondary role. Introspection is central. The lexicographer always needs to ask themself, ‘what do I know about this word?’ ‘how do I interpret this evidence?’ ‘does that make sense?’ But by itself, intuition does a limited and partial job. Asked ‘what is there to say about the verb remember?’ we might come up with some facts about meaning, grammar and collocation – but there are many more we will miss, and some of our ideas may be wrong. That leaves data, and for lexicography, the relevant kind of data is a large collection of text: a corpus. A corpus is just that: a collection of data – text and speech – when viewed from the perspective of language research. A corpus supports many aspects of dictionary creation:
● headword list development
● for writing individual entries:
  ○ discovering the word senses and other lexical units (fixed phrases, compounds, etc.)
  ○ identifying the salient features of each of these lexical units
  ○ their syntactic behaviour
  ○ the collocations they participate in
  ○ any preferences they have for particular text-types or domains
● providing examples
● providing translations.
The chapter describes how a corpus can support each of these parts.
The Bloomsbury Handbook of Lexicography
First, two apologies. First, I am a native speaker of English who has worked mostly on monolingual English dictionaries. Examples, and the experiences which inform the chapter, will largely be from English. Second, to illustrate and demonstrate how corpora support lexicography, one cannot go far without referring to the piece of software that is the intermediary between corpus and lexicographer: the corpus query system. But this article is not a review of corpus query systems, so we simply use one – the one developed by the author and colleagues, the Sketch Engine (Kilgarriff et al. 2004) – to illustrate the various ways in which the corpus can help the lexicographer. For a review of corpus query systems, see Kilgarriff and Kosem (2012). With ever-growing quantities of text available online, faster computers and progress in corpus and linguistic software, the field is changing all the time. By the time any practice is standard and widely accepted, it will be well behind the latest developments, so if this article were to talk only of standard and widely accepted practices it would run the risk of looking dated by the time it was published. Instead, I mainly describe recent and current work on projects I am involved in, arrogantly assuming that this coincides substantially with the leading edge of the use of corpora in lexicography.
2 Headword lists Building a headword list is the most obvious way to use a corpus for making a dictionary. Ceteris paribus, if a dictionary is to have N words in it, they should be the N words from the top of the frequency list.
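As a rough illustration of this starting point, the sketch below takes an already lemmatized token stream and returns the N most frequent lemmas. The function name and the toy token list are assumptions made for the example; they do not come from any tool described in this chapter.

```python
from collections import Counter

def headword_candidates(lemmas, n):
    """Return the n most frequent lemmas, most frequent first."""
    counts = Counter(lemmas)
    return [lemma for lemma, _ in counts.most_common(n)]

# Toy 'corpus': in practice this would be millions of lemmatized tokens.
corpus = "the cat sat on the mat and the dog sat too".split()
print(headword_candidates(corpus, 2))
```

On real data the top of such a list is only a first pass, for the reasons discussed in the rest of this section.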
2.1 In search of the ideal corpus It is never as simple as that, mainly because the corpus is never good enough. It will contain noise and biases. The noise is always evident within the first few thousand words of all the corpus frequency lists that I have ever looked at. In the British National Corpus1 (BNC), for example, a large amount of data from a journal on gastro-uterine diseases presents noise in the form of words like mucosa – a term much-discussed in these specific documents, but otherwise rare and not known to most speakers of English.2 Bias in the spoken BNC is illustrated by the very high frequencies for words like plan, elect, councillor, statutory and occupational: the corpus contains a quantity of material from local government meetings, so the vocabulary of this area is well represented. Thus, keyword lists of the BNC in contrast to other large, general corpora show these words as particularly BNC-flavoured. If we turn to UKWaC (the UK ‘Web as Corpus’, Baroni et al. 2009), a web-sourced corpus of around 1.6 billion words, we find other forms of noise and bias. The corpus contains a certain amount of web spam. We discovered that people advertising poker are skilled at producing vast quantities of ‘word salad’ which, at the time, escaped our automatic routines for filtering out bad material. Internet-related bias also shows up in the high frequencies for words like browser and configure. While noise is simply wrong, and its impact is progressively reduced as our technology
for filtering it out improves, biases are more subtle in that they force questions about the sort of language to be covered in the dictionary, and in what proportions.3
2.2 Multi-words Dictionaries have a range of entries for multi-word items, typically including, for English, noun compounds (credit crunch, disc jockey), phrasal and prepositional verbs (take after, set out) and compound prepositions and conjunctions (according to, in order to). While corpus methods can straightforwardly find high-frequency single-word items and thereby provide a fair-quality first pass at a headword list for simple words, they cannot do the same for multi-word items. Lists of high-frequency word pairs in any English corpus are dominated by items which do not merit dictionary entries: the string of the usually tops the list of word pairs, or bigrams. The Sketch Engine has several strategies here: one is to view multi-word headwords as collocations (see discussion below) and to find multi-word headwords when working through the alphabet looking at each headword in turn. Another is to use lists of translations. This was explored in the Kelly project (Kilgarriff et al. 2014). The project worked on nine languages. First, we prepared and cleaned up a corpus headword list of around 6,000 words for each language. Then, all the words on those lists were translated (by a professional translation agency) into each of the eight other languages, giving us a database with seventy-two directed language pairs.4 We reasoned that where one language uses a multi-word expression for a unitary concept (say, English look for) it was likely that other languages had a single word for the concept (e.g. French chercher, Italian cercare) and that when the Italian-to-English and French-to-English translators encountered cercare and chercher, they were likely to translate it as look for. So, although look for did not appear in the English source list, it appeared multiple times in the database as a translation. The strategy produced a modest number of multi-word expressions.
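The translation-based strategy can be sketched as a simple count over the translation database. The record layout (source language, source word, target language, translation) and all names below are assumptions for illustration only, not the Kelly project's actual data format.

```python
from collections import Counter

def multiword_candidates(translations, source_list, target_lang):
    """Count multi-word translations into target_lang that are
    missing from that language's own source headword list."""
    counts = Counter(
        target
        for (_src_lang, _src_word, tgt_lang, target) in translations
        if tgt_lang == target_lang
        and " " in target
        and target not in source_list
    )
    return counts.most_common()

# Toy records: (source language, source word, target language, translation)
translations = [
    ("fr", "chercher", "en", "look for"),
    ("it", "cercare", "en", "look for"),
    ("fr", "chat", "en", "cat"),
    ("it", "secondo", "en", "according to"),
]
english_list = {"cat", "look", "for"}
print(multiword_candidates(translations, english_list, "en"))
```

Items that several translators independently produce, such as look for here, are the stronger candidates for multi-word headwords.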
2.3 Lemmatization The words we find in texts are inflected forms; the words we put in a headword list are lemmas. So, to use a corpus list as a dictionary headword list, we need to map inflected forms to lemmas: we need to lemmatize. English is not a difficult language to lemmatize as no lemma has more than eight inflectional variants (be, am, is, are, was, were, been, being), most nouns have just two (apple, apples) and most verbs just four (invade, invades, invading, invaded). Most other languages present a substantially greater challenge. Yet even for English, automatic lemmatization procedures are not without their problems. Consider the data in Table 7.1. To choose the correct rule we need an analysis of the orthography corresponding to phonological constraints on vowel type and consonant type, for both British and American English.5 Even with state-of-the-art lemmatization for English, an automatically extracted lemma list will contain some errors.
Table 7.1 Complexity in verb lemmatization rules for English.
Lemma      | -ed, -s forms     | Rule                               | -ing form | Rule
Fix        | fixed, fixes      | delete -ed, -es                    | fixing    | delete -ing
Care       | cared, cares      | delete -d, -s                      | caring    | delete -ing, add -e
Hope       | hoped, hopes      | delete -d, -s                      | hoping    | delete -ing, add -e
Hop        | hopped            | delete -ed, undouble consonant     | hopping   | delete -ing, undouble consonant
           | hops              | delete -s                          |           |
Fuse       | fused             | delete -d                          | fusing    | delete -ing, add -e
Fuss       | fussed            | delete -ed                         | fussing   | delete -ing
Bus (AmE)  | bussed, busses??  | delete -ed/-s, undouble consonant  | bussing   | delete -ing, undouble consonant
Bus (BrE)  | bused, buses      | delete -ed, -es                    | busing    | delete -ing
These and other issues in relating corpus lists to dictionary headword lists are described in detail in Kilgarriff (1997).
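The rule ambiguity in Table 7.1 can be made concrete with a deliberately naive sketch. It over-generates candidate lemmas by trying each rule from the table and then filters them against a list of known lemmas. The function names and the tiny lemma list are assumptions for illustration; a production lemmatizer would be far more elaborate.

```python
def candidate_lemmas(form):
    """Return possible lemmas for an English -ed or -ing verb form."""
    candidates = []
    for suffix in ("ing", "ed"):
        if form.endswith(suffix) and len(form) > len(suffix) + 1:
            stem = form[: -len(suffix)]
            candidates.append(stem)            # fixing -> fix, fussed -> fuss
            candidates.append(stem + "e")      # caring -> care, hoped -> hope
            if len(stem) >= 2 and stem[-1] == stem[-2]:
                candidates.append(stem[:-1])   # hopping -> hop
    return candidates

def lemmatize(form, known_lemmas):
    """Keep only the candidates that appear in a list of known lemmas."""
    return [c for c in candidate_lemmas(form) if c in known_lemmas]

known = {"fix", "care", "hope", "hop", "fuse", "fuss", "bus"}
for form in ("hopping", "hoping", "fused", "fussed", "bussing"):
    print(form, lemmatize(form, known))
```

Note that hoping comes back with both hop and hope: the known-lemma filter alone cannot choose between them, which is why the orthographic analysis mentioned above is needed.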
2.4 User profiles Building a headword list for a new dictionary (or revising one for an existing title) has never been an exact science, and little has been written about it. Headword lists are typically extended in the course of a project and are only complete at the end. A good starting point is to have a clear idea of who will use your dictionary, and for what purpose: a ‘user profile’. A user profile ‘seeks to characterise the typical user of the dictionary, and the uses to which the dictionary is likely to be put’ (Atkins and Rundell 2008: 28). This is a manual task, but it provides filters with which to sift computer-generated wordlists.
2.5 New words As everyone involved in commercial lexicography knows, neologisms punch far above their weight. They might not be very important for an objective description of the language, but they are loved by marketing teams and reviewers. New words and phrases often mark the only obvious change in a new edition of a dictionary and dominate the press releases. Mapping language change has long been a central concern of corpus linguists and a longstanding vision is the ‘monitor corpus’, the moving corpus that lets the researcher explore language change objectively (Clear 1988, Janicivic and Walker 1997). The core method is to compare an older ‘reference’ corpus with an up-to-the-minute one to find words which are not already in the dictionary, and which are in the recent corpus but not in the older one. O’Donovan and O’Neill (2008) describe how this has been done at Chambers Harrap Publishers, and Fairon
et al. (2008) describe a generic system in which users can specify the sources they wish to use and the terms they wish to trace. The nature of the task is that the automatic process creates a list of candidates, and a lexicographer then goes through them to sort the wheat from the chaff. There is always far more chaff than wheat. The computational challenge is to cut out as much chaff as possible without losing the wheat – that is, the new words which the lexicography team have not yet logged but which should be included in the dictionary. For many aspects of corpus processing, we can use statistics to distinguish signal from noise, on the basis that the phenomena we are interested in are common ones and occur repeatedly. But new words are usually rare, and by definition are not already known. Thus, lemmatization is particularly challenging since the lemmatizer cannot make use of a list of known words. So, for example, in one list we found the ‘word’ authore, an incorrect but understandable lemmatization of authored, past participle of the unknown verb author. For new-word finding we will want to include items in a candidate list, even though they occur just once or twice. Statistical filtering can therefore be used only minimally. We are exploring methods which require that a word that occurred a maximum of once or twice in the old material occurs in at least three or four documents in the new material, to make its way onto the candidate list. We use some statistical modulation to capture new words which are taking off in the new period, as well as the items that simply have occurred where they never did before. Many items that occur in the new words list are simply typing errors. This is another reason why it is desirable to set a threshold higher than one in the new corpus. For English, we have found that almost all hyphenated words are chaff, and often relate to compounds which are already treated in the dictionary as ‘solid’ or as multi-word items. 
English hyphenation rules are not fixed: most word pairs that we find hyphenated (sand-box) can also be found written as one word (sandbox), and as two (sand box). With this in mind, to minimize chaff, we take all hyphenated forms and two- and three-word items in the dictionary and ‘squeeze’ them so that the one-word version is included in the list of already-known items, and we subsequently ignore all the hyphenated forms in the corpus list. Prefixes and suffixes present a further set of items. Derivational affixes include both the more syntactic (-ly, -ness) and the more semantic (-ish, geo-, eco-).6 Most are chaff: we do not want plumply or ecobuddy or gangsterish in the dictionary, because, even though they all have Google counts in the thousands, they are not lexicalized and there is nothing to say about them beyond what there is to say about the lemma, the affix and the affixation rule. The ratio of wheat to chaff is low, but amongst the nonce productions there are some which are becoming established and should be considered for the dictionary. So, we prefer to leave the nonce formations in place for the lexicographer to run their eye over. For the longer term, the biggest challenge is acquiring corpora for the two time periods which are sufficiently large and sufficiently well matched. If the new corpus is not big enough, the new words will simply be missed, while if the reference corpus is not big enough, the lists will be full of false positives. If the corpora are not well-matched but, for example, the new corpus contains a document on vulcanology and the reference corpus does not, the list will contain words which are specialist vocabulary rather than new, like resistivity and tephrochronology. While vast quantities of data are available on the web, most of it does not come with reliable information on when the document was originally written. While we can say with confidence that
a corpus collected from the web in 2009 represents, overall, a more recent phase of the language than one collected in 2008, when we move to words with small numbers of occurrences, we cannot trust that words from the 2009 corpus are from more recently written documents than ones from the 2008 corpus. Two text types where date-of-writing is usually available are newspapers and blogs. Both of these have the added advantage that they tend to be about current topics and are relatively likely to use new vocabulary. My current strategy for new-word-detection involves large-scale gathering of newspaper and blog feeds every day.
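The candidate-list filtering described in this section can be sketched as follows. The thresholds follow the figures given above (at most twice in the old material, at least three documents in the new); the function names, data layout and toy counts are assumptions for illustration, and the statistical modulation for words that are 'taking off' is not shown.

```python
import re

def squeeze(term):
    """Collapse hyphens and spaces: 'sand-box' and 'sand box' become 'sandbox'."""
    return re.sub(r"[-\s]+", "", term.lower())

def new_word_candidates(old_counts, new_doc_counts, dictionary,
                        max_old=2, min_new_docs=3):
    """old_counts: word -> occurrences in the reference corpus.
    new_doc_counts: word -> number of documents in the new corpus."""
    known = {squeeze(entry) for entry in dictionary}
    candidates = []
    for word, n_docs in new_doc_counts.items():
        if "-" in word:                # hyphenated corpus forms are ignored
            continue
        if squeeze(word) in known:     # already treated in the dictionary
            continue
        if old_counts.get(word, 0) <= max_old and n_docs >= min_new_docs:
            candidates.append(word)
    return sorted(candidates)

old_counts = {"sandbox": 1, "browser": 50}
new_doc_counts = {"sandbox": 7, "selfie": 5, "sand-box": 4,
                  "browser": 90, "teh": 1, "ecobuddy": 3}
dictionary = ["sand box", "browser"]
print(new_word_candidates(old_counts, new_doc_counts, dictionary))
```

The typing error teh is dropped by the document threshold, while a nonce formation such as ecobuddy survives for the lexicographer to judge, in line with the policy described above.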
3 Collocation and word sketches The arrival of large corpora provided the empirical underpinning for a view of language associated with Firth and Sinclair, in which the patterning of words in text was central: collocation came to the fore. Since the beginning of corpus lexicography, the primary means of analysis has been the reading of concordances. Since the earliest days of the COBUILD project, the lexicographers scanned concordance lines – often in their thousands – to find all the collocations and all the patterns of meaning and use. The more lines were scanned, the more patterns and collocations were found (though with diminishing returns). This was good and objective, but also difficult and time-consuming. Dictionary publishers were always looking to save time, and hence cut costs. Early efforts to offer computational support were based on finding frequently co-occurring words in a window surrounding the headword (Church and Hanks 1990). While these approaches had generated plenty of interest among university researchers, they were not taken up as routine processes by lexicographers: the ratio of noise to signal was high, the first impression of a collocation list was of a basket of earth with occasional glints of possible gems needing further exploration, and it took too long to use them for every word. The ‘word sketch’ is a response to this problem. A word sketch is a one-page, corpus-based summary of a word’s grammatical and collocational behaviour, as illustrated in Figure 7.1. It uses a parser to identify all verb-object pairs, subject-verb pairs, modifier-modifiee pairs and so on, and then applies statistical filtering to give a fairly clean list, as proposed by Tapanainen and Järvinen (1998, and for the statistics, Rychly 2008). 
Word sketches need very large, part-of-speech-tagged corpora: in the late 1990s such a corpus had recently become available for general English in the form of the British National Corpus, and the first edition of word sketches was prepared to support a new, ‘from scratch’ dictionary for advanced learners of English, the Macmillan English Dictionary for Advanced Learners (MEDAL, Rundell 2001). As the lexicographers became familiar with the software, it became apparent that word sketches did the job they were designed to do. Each headword’s collocations could be listed exhaustively, to a far greater degree than was possible before. That was the immediate goal. But analysis of a word’s sketch also tended to show, through its collocations, a wide range of the patterns of meaning and usage that it entered into. In most cases, each of a word’s different meanings is associated with particular collocations, so the collocates listed in the word sketches provided valuable prompts in the key task of identifying and accounting for all the word’s meanings in the entry. The word sketches functioned not only as a tool for finding collocations but also as a useful guide to the distinct senses of a word – the analytical core of the lexicographer’s job (Kilgarriff and Rundell 2002). It became clear that the word sketches were more like a contents page than a basket of earth. They provided a neat summary of most of what the lexicographer was likely to find by the traditional means of scanning concordances. There was not too much noise. Using them saved time. It was more efficient to start from the word sketch than from the concordance. Thus the unexpected consequence was that the lexicographer’s methodology changed, from one where the technology merely supported the corpus-analysis process, to one where it pro-actively identified what was likely to be interesting and directed the lexicographer’s attention to it. And whereas, for a human, the bigger the corpus, the greater the problem of how to manage the data, for the computer, the bigger the corpus, the better the analyses: the more data there is, the better the prospects for finding all salient patterns and for distinguishing signal from noise. Though originally seen as a useful supplementary tool, the sketches provided a compact and revealing snapshot of a word’s behaviour and uses and became the preferred starting point in the process of analysing complex headwords.
Since the first word sketches were used in the late 1990s, the Sketch Engine, the corpus query tool within which they are presented, has not stood still. Word sketches have been developed for a dozen languages (the list is steadily growing) and have been complemented by an automatic thesaurus (which identifies the words which are most similar, in terms of shared collocations, to a target word, see Figure 7.2) and a range of other tools including ‘sketch diff’, for comparing and contrasting a word with synonyms or antonyms (see Figure 7.3). There are also options such as clustering a word’s collocates or its thesaurus entries. The largest corpus for which word sketches have been created so far contains seventy billion words (Pomikalek et al. 2012). In a quantitative evaluation, two thirds of the collocates in word sketches for four languages were found to be ‘publishable quality’: a lexicographer would want to include them in a published collocations dictionary for the language (Kilgarriff et al. 2010).
Figure 7.1 Word sketch for baby (from enTenTen12, a very large 2012 web corpus).
Figure 7.2 Thesaurus entry for gargantuan.
Figure 7.3 Sketch diff comparing strong and powerful.
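The chapter points to Rychly (2008) for the statistics behind word sketches without giving a formula. Assuming the score intended is the logDice measure from that paper, 14 + log2(2·f_xy / (f_x + f_y)), the statistical filtering step for one grammatical relation might be sketched as below. The function names and all counts are invented for illustration.

```python
import math

def log_dice(f_xy, f_x, f_y):
    """logDice association score for a headword x and a collocate y.

    f_xy: co-occurrences of x and y in a given grammatical relation
    f_x, f_y: frequencies of the headword and of the collocate
    """
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

def rank_collocates(headword_freq, collocates):
    """collocates maps collocate -> (joint frequency, collocate frequency)."""
    scored = {c: log_dice(f_xy, headword_freq, f_y)
              for c, (f_xy, f_y) in collocates.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Invented counts for modifiers of the noun 'tea'
modifiers = {"strong": (120, 4000), "the": (900, 600000), "herbal": (80, 300)}
print(rank_collocates(5000, modifiers))
```

A very frequent word such as the co-occurs with the headword most often in raw terms, yet it ranks last: the score rewards collocates that are frequent with this headword relative to their overall frequency.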
4 Labels Dictionaries use a range of labels (such as usu pl., informal, Biology, AmE) to mark words according to their grammatical, register, domain, and regional characteristics, whenever these deviate significantly from the (unmarked) norm. All of these are facts about a word’s distribution, and all can, in principle, be gathered automatically from a corpus. In each of these four cases, computationalists are currently able to propose some labels to the lexicographer, though there remains much work to be done. In each case the methodology is to:
● specify a set of hypotheses
  ○ there will usually be one hypothesis per label, so grammatical hypotheses for the category ‘verb’ may include:
    ■ is it often/usually/always passive
    ■ is it often/usually/always progressive
    ■ is it often/usually/always in the imperative
● for each word
  ○ test all relevant hypotheses
  ○ for all hypotheses that are confirmed, alert the lexicographer
    ■ (in the Sketch Engine, by adding the information to the word sketch).
Where no hypotheses are confirmed – when, in other words, there is nothing interesting to say, which will be the usual case – no alerts are given.
4.1 Grammatical labels: usually plural, often passive, etc. To determine whether a noun should be marked as ‘usually plural’, we simply count the number of times the lemma occurs in the plural, and the number of times it occurs overall, and divide the first number by the second to find the proportion. Similarly, to discover how often a verb is passivized, we can count how often it is a past participle preceded by a form of the verb be (with possible intervening adverbs) and determine what fraction of the verb’s overall frequency the passive forms represent. Given a lemmatized, part-of-speech-tagged corpus, this is straightforward. A large number of grammatical hypotheses can be handled in this way. The next question is: when is the information interesting enough to merit a label in a dictionary? Should we, for example, label all verbs which are over 50 per cent passive as often passive? To assess this question, we want to know what the implications would be: we do not want to bombard the dictionary user with too many labels (or the lexicographer with too many candidate labels). What percentage of English verbs occurs in the passive over half of the time? Is it 20 per cent, or 50 per cent, or 80 per cent? This question is also not in principle hard to answer: for each verb, we work out its percentage passive, and sort according to the percentage. We can then give a figure which is, for lexicographic purposes, probably more informative than ‘the percentage passive’: the percentile. The percentile indicates whether a verb is in the top 1 per cent, or 2 per cent, or 5 per cent, or 10 per cent of verbs from the point of view of how passive they are. We can prepare lists as in Table 7.2. This uses the methodology for finding the ‘most passive’ verbs (with frequency over 500) in the BNC. It shows that the most passive verb is station: people and things are often stationed in places, but there are far fewer cases where someone actively stations things. 
For station, 72.2 per cent of its 557 occurrences are in the passive, and this puts it in the 0.2 per cent ‘most passive’ verbs of English. At the other end of the table, levy is in the passive just over half the time, which puts it in the 1.9 per cent most passive verbs. The approach is similar to the collostructional analysis of Gries and Stefanowitsch (2004). As can be seen from this sample, the information is lexicographically valid: all the verbs in the table would benefit from an often passive or usually passive label. A table like this can be used by editorial policymakers to determine a cut-off which is appropriate for a given project. For instance, what proportion of verbs should attract an often passive label? Perhaps the decision will be that users benefit most if the label is not overused, so just 4 per cent of verbs would be thus labelled. The full version of the table in Table 7.2 tells us what these verbs are. And now that we know precisely the hypothesis to use (‘is the verb in the top 4 per cent most-passive verbs?’) and where the hypothesis is true, the label can be added into the word sketch. In this way, the element of chance – will the lexicographer notice whether
a particular verb is typically passivized? – is eliminated, and the automation contributes to a consistent account of word behaviour.
Table 7.2 The ‘most passive’ verbs in the BNC, for which a ‘usually passive’ label might be proposed.
Percentile | Ratio | Lemma     | Frequency
0.2        | 72.2  | station   | 557
0.2        | 71.8  | base      | 19201
0.3        | 71.1  | destine   | 771
0.3        | 68.7  | doom      | 520
0.4        | 66.3  | poise     | 640
0.4        | 65.0  | situate   | 2025
0.5        | 64.7  | schedule  | 1602
0.5        | 64.1  | associate | 8094
0.6        | 63.2  | embed     | 688
0.7        | 62.0  | entitle   | 2669
0.8        | 59.8  | couple    | 1421
0.9        | 58.1  | jail      | 960
1.1        | 57.8  | deem      | 1626
1.1        | 55.5  | confine   | 2663
1.2        | 55.4  | arm       | 1195
1.2        | 54.9  | design    | 11662
1.3        | 53.9  | convict   | 1298
1.5        | 53.1  | clothe    | 749
1.5        | 52.8  | dedicate  | 1291
1.5        | 52.4  | compose   | 2391
1.6        | 51.5  | flank     | 551
1.7        | 50.8  | gear      | 733
1.9        | 50.1  | levy      | 603
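The computation behind a list like Table 7.2 can be sketched in a few lines. The passive counts below are invented so that the ratios for station and levy match the table; the function names, the data layout and the exact percentile definition are assumptions for illustration only.

```python
def passive_percentiles(counts, min_freq=500):
    """counts maps lemma -> (passive occurrences, all occurrences).

    Returns (lemma, percentage passive, percentile) rows, most passive
    first, for verbs with overall frequency above min_freq.
    """
    ratios = {
        lemma: 100 * passive / total
        for lemma, (passive, total) in counts.items()
        if total > min_freq
    }
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    n = len(ranked)
    # Percentile here = share of verbs at least as passive as this one.
    return [(lemma, round(ratios[lemma], 1), round(100 * (i + 1) / n, 1))
            for i, lemma in enumerate(ranked)]

def label_candidates(rows, cutoff=4.0):
    """Verbs in the top `cutoff` per cent most passive verbs."""
    return [lemma for lemma, _, percentile in rows if percentile <= cutoff]

# Toy counts; 'zap' is excluded because its frequency is too low.
counts = {"station": (402, 557), "levy": (302, 603),
          "eat": (300, 6000), "go": (10, 20000), "zap": (90, 100)}
rows = passive_percentiles(counts)
print(rows[:2])
print(label_candidates(rows, cutoff=25.0))
```

With only four verbs the percentiles are coarse; over the full verb inventory of a corpus the same ranking yields the fine-grained figures shown in the table, and the cut-off becomes an editorial decision.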
4.2 Register Labels: formal, informal, etc. Any corpus is a collection of texts. Register is in the first instance a classification that applies to texts rather than words. A word is informal (or formal) if it shows a clear tendency to occur in informal (or formal) texts. To label words according to register, we need a corpus in which the constituent texts are themselves labelled for register in the document header. Note that at this stage, we are not considering aspects of register other than formality. One way to come by such a corpus is to gather texts from sources known to be formal or informal. In a corpus such as the BNC, each document is supplied with various text type classifications,
so we can, for example, infer from the fact that a document is everyday conversation, that it is informal, or from the fact that it is an academic journal article, that it is formal. The approach has potential, but also drawbacks. In particular, it is not possible to apply it to any corpus which does not come with text-type information. Web corpora do not. An alternative is to build a classifier which infers formality level on the basis of the vocabulary and other features of the text. There are classifiers available for this task: see for example, Heylighen and Dewaele (1999), and Santini et al. (2009). Following this route, we have recently labelled all documents in a twelve billion word web corpus according to formality, so we are now in a position to order words from most to least formal. The next tasks will be to assess the accuracy of the classification, and to consider – just as was done for passives – the percentage of the lexicon we want to label for register. The reasoning may seem circular: we use formal (or informal) vocabulary to find formal (or informal) vocabulary. But it is a spiral rather than a circle: each cycle has more information at its disposal than the previous one. We use our knowledge of the words that are formal or informal to identify documents that are formal or informal. That then gives us a richer dataset for identifying further words, phrases and constructions which tend to be formal or informal and allows us to quantify the tendencies.
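One turn of the spiral can be sketched as follows: seed words classify documents, and the classified documents then propose further words. This is a toy stand-in for a real formality classifier; the scoring rule, the thresholds and all names are assumptions made for the example.

```python
from collections import Counter

def formality_of_docs(docs, formal, informal):
    """Score each document as (formal hits - informal hits) / length."""
    scores = []
    for tokens in docs:
        f = sum(t in formal for t in tokens)
        i = sum(t in informal for t in tokens)
        scores.append((f - i) / len(tokens))
    return scores

def extend_lexicons(docs, formal, informal, min_docs=2):
    """One cycle: label documents with the current word lists, then add
    words found in at least min_docs documents of one kind and none of the other."""
    scores = formality_of_docs(docs, formal, informal)
    in_formal, in_informal = Counter(), Counter()
    for tokens, score in zip(docs, scores):
        target = in_formal if score > 0 else in_informal if score < 0 else None
        if target is not None:
            target.update(set(tokens))
    new_formal = {w for w, n in in_formal.items()
                  if n >= min_docs and in_informal[w] == 0}
    new_informal = {w for w, n in in_informal.items()
                    if n >= min_docs and in_formal[w] == 0}
    return formal | new_formal, informal | new_informal

docs = [
    "we hereby notify you pursuant to the act".split(),
    "payment is due pursuant to clause nine hereby amended".split(),
    "gonna go to the shop yeah".split(),
    "yeah we are gonna head to town".split(),
]
formal, informal = extend_lexicons(docs, {"hereby"}, {"gonna"})
print(sorted(formal), sorted(informal))
```

Starting from one seed word on each side, the cycle adds pursuant and yeah, while to, which occurs in both kinds of document, is left unlabelled. Running the function again with the enlarged lists gives the next turn of the spiral.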
4.3 Domain Labels: Geol., Astron., etc. The issues are, in principle, the same as for register. The practical difference is that there are far more domains (and domain labels): even MEDAL, a general-purpose learner’s dictionary, has eighteen of these; larger dictionaries typically have over one hundred. Collecting large corpora for each of these domains is a significant challenge. It is tempting to gather a large quantity of, for example, geological texts from a particular source, perhaps an online geology journal. But rather than being a ‘general geology’ corpus, that subcorpus will be an ‘academic-geology corpus’, and the words which are particularly common in the subcorpus will include vocabulary typical of academic discourse in general, and vocabulary associated with the preferences and specialisms of that particular journal, as well as of the domain of geology. Ideally, each subcorpus will have the same proportions of different text-types as the whole corpus. None of this is technically or practically impossible, but the larger the number of subcorpora, the harder it is to achieve. Once we have the corpora and counts for each word in each subcorpus, we need to use statistical measures for deciding which words are most distinctive of the subcorpus: which words are its ‘keywords’, the words for which there is the strongest case for labelling. The maths we use is based on a simple ratio between relative frequencies, as implemented in the Sketch Engine and presented in Kilgarriff (2009).
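The text above says only that the measure is a simple ratio between relative frequencies. The sketch below assumes frequencies normalized per million words and adds a small constant to both sides so that words absent from the reference corpus do not cause division by zero. The constant, its default value and the function name are assumptions for this illustration.

```python
def keyness(count_focus, size_focus, count_ref, size_ref, smoothing=1.0):
    """Ratio of smoothed per-million frequencies in a focus subcorpus
    and in a reference corpus. Values well above 1 suggest a keyword."""
    per_million_focus = 1_000_000 * count_focus / size_focus
    per_million_ref = 1_000_000 * count_ref / size_ref
    return (per_million_focus + smoothing) / (per_million_ref + smoothing)

# Invented counts: a term seen 50 times in a 1-million-word geology
# subcorpus and 4 times in a 2-million-word general corpus.
print(keyness(50, 1_000_000, 4, 2_000_000))
```

Raising the smoothing constant shifts the resulting keyword list towards commoner words; lowering it favours rarer ones, so the value is itself an editorial choice.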
4.4 Region Labels: AmE, AustrE, etc. The issues concerning region labels are the same as for domains but in some ways a little simpler. The taxonomy of regions, at least from the point of view of labelling items used in different parts of the English-speaking world, is relatively limited, and a good deal less open-ended than the
taxonomy of domains. In MEDAL, for example, it comprises just twelve varieties or dialects, including American, Australian, British, Canadian, Caribbean, Irish, New Zealand and South/East/West African English.
5 Examples Most dictionaries include example sentences. They are especially important in pedagogical dictionaries, where a carefully selected set of examples can clarify meaning, illustrate a word’s contextual and combinatorial behaviour, and serve as models for language production. The benefits for users are clear, and the shift from paper to electronic media means that we can now offer users far more examples. But this comes at a cost. Finding good examples in a mass of corpus data is labour-intensive. For all sorts of reasons, a majority of corpus sentences will not be suitable as they stand, so the lexicographer must either search out the best ones or modify corpus sentences which are promising but in some way flawed.
5.1 GDEX In 2007, the requirement arose – in a project for Macmillan – for the addition of new examples for around 8,000 collocations. The options were to ask lexicographers to select and edit these in the ‘traditional’ way, or to see whether the example-finding process could be automated. Budgetary considerations favoured the latter approach, and subsequent discussions led to the GDEX (‘good dictionary examples’) algorithm, which is described in Kilgarriff et al. (2008). The method is to score sentences, and only display the highest-scoring ones. A wide range of heuristics are used for scoring, including sentence length, the presence (or absence) of rare words or proper names, and the number of pronouns in the sentence. The system worked successfully on its first outing – not in the sense that every example it identified was immediately usable, but in the sense that it streamlined the lexicographer’s task. GDEX continues to be refined, as more selection criteria are added and the weightings of the different filters adjusted, for English and for other languages. The lexicographer can scan a short list until they find a suitable example for whatever feature is being illustrated, and GDEX means they are likely to find what they are looking for in the top five examples, rather than, on average, within the top twenty to thirty.
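The score-and-rank idea can be sketched with a handful of the heuristics named above: sentence length, rare words, proper names and pronouns. The weights, length limits, word lists and function names are all invented for illustration and are not the GDEX settings.

```python
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
            "this", "that", "him", "her", "them"}

def gdex_score(sentence, common_words, min_len=6, max_len=20):
    """Start from 1.0 and subtract a penalty for each undesirable feature."""
    tokens = sentence.rstrip(".!?").split()
    score = 1.0
    if not min_len <= len(tokens) <= max_len:
        score -= 0.4                                   # too short or too long
    rare = sum(t.lower() not in common_words for t in tokens)
    score -= 0.1 * rare                                # rare words
    proper = sum(t[0].isupper() for t in tokens[1:])
    score -= 0.1 * proper                              # likely proper names
    pronouns = sum(t.lower() in PRONOUNS for t in tokens)
    score -= 0.05 * pronouns                           # context-dependent words
    return max(score, 0.0)

def best_examples(sentences, common_words, k=5):
    """Return the k highest-scoring sentences, best first."""
    return sorted(sentences, key=lambda s: gdex_score(s, common_words),
                  reverse=True)[:k]

common = {"the", "baby", "was", "sleeping", "in", "her", "cot",
          "when", "we", "arrived", "it", "did", "that"}
sentences = [
    "Baby Zoltan regurgitated mucosa in Ouagadougou.",
    "It did that.",
    "The baby was sleeping in her cot when we arrived.",
]
print(best_examples(sentences, common, k=1))
```

The lexicographer then sees only the top of the ranked list, which is where the time saving described above comes from.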
6 Translations The corpora that help most for finding translations are parallel corpora: corpora comprising pairs of texts that are translations of each other (see also Adamska-Sałaciak, Chapter 12). Parallel corpora are the fuel that Google Translate feeds on, and ‘statistical machine translation’, of which Google Translate is the highest-profile example, is a great success story of language technology and the use of corpora.
Parallel corpora are of most use if they are aligned: that is, for each sentence, or word, in the one text, the computer knows what the corresponding item is in the other. Where the text is a straightforward literal translation, sentence alignment can now be performed with high accuracy. Of course, some sentences are not one-to-one, and some sections may exist in only one language. Working solutions have been found for identifying and handling these cases, which are all on a cline of how closely the translation follows the original. Throughout parallel corpus work, text pairs which are literal translations are easiest to work with, whereas free translations of novels offer much less. Word alignment is intrinsically a trickier concept than sentence alignment. First, very often, an individual word is not translated by a single word. Second, items often do not stay in the same order. One can expect the sentences and their translations to be in the same order as each other, but one cannot expect the words and their translations to be in the same order in source and target text. There are two ways for lexicographers to use parallel corpora: parallel concordances and summaries. The first is simpler and is based only on sentence alignments. The lexicographer searches for a word or phrase on one side of the corpus and sees pairs of sentences which are translations of each other. The website http://www.linguee.com offers exactly this, for the big European languages, and since its arrival in 2009, it has rapidly become a translator’s favourite. Its display for the English search term baby and the language pair English-German is shown in Figure 7.4. As with examples in general, people (translators, lexicographers and other users) find these example pairs very useful and easy to use. 
They will often remind a lexicographer of ways of translating a word or phrase that should be included in the dictionary entry and will supply example sentence pairs to be included (usually after some editing).
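The parallel-concordance idea can be illustrated with a minimal Python sketch. It assumes the corpus has already been sentence-aligned and is held as a list of (source, target) pairs; the sample sentences and the function name `parallel_concordance` are hypothetical and are not taken from Linguee or any other tool.

```python
import re

# Hypothetical toy data: English-German sentence pairs, assumed to be
# sentence-aligned already (the alignment step itself is not shown).
ALIGNED_PAIRS = [
    ("The baby is sleeping.", "Das Baby schläft."),
    ("She bought a new car.", "Sie hat ein neues Auto gekauft."),
    ("They are expecting a baby in May.", "Sie erwarten im Mai ein Kind."),
]


def parallel_concordance(term, pairs):
    """Return every aligned pair whose source side contains `term` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [(src, tgt) for src, tgt in pairs if pattern.search(src)]


hits = parallel_concordance("baby", ALIGNED_PAIRS)
for src, tgt in hits:
    print(f"{src}  |  {tgt}")
```

In this toy data the two hits show baby rendered once as Baby and once as Kind, which is the kind of reminder of alternative translations described above.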
Figure 7.4 Screenshot from Linguee.com for English search term baby, language pair English-German.
Using Corpora as Data Sources for Dictionaries
The ‘summaries’ approach for using parallel corpora only applies when the corpora are large, and for words where there are many sentence pairs. Then, it will not be possible for the lexicographer to read all the sentence pairs, and it should be possible for the computer to summarize what it finds in them. This is a bilingual version of the reasoning that led to word sketches. In a process closely related to methods for word alignment, the computer can find the other-language words that occur with particularly high frequency in the node word’s aligned sentences. The process can also be applied to find candidate collocations as translations of the node word’s collocations. A first version of a bilingual word sketch based on a parallel corpus, for red for the language pair English-French, is shown in Figure 7.5.
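As a rough illustration of the summarizing step, the sketch below counts which target-language words occur in the sentences aligned with a node word and ranks them by a Dice coefficient. The choice of Dice is an assumption made for the example, since the text does not name the statistic; the toy sentence pairs and the function `translation_candidates` are likewise hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy English-French sentence pairs, assumed sentence-aligned.
PAIRS = [
    ("a red car", "une voiture rouge"),
    ("the red wine", "le vin rouge"),
    ("a red flag", "un drapeau rouge"),
    ("the white wine", "le vin blanc"),
    ("a new car", "une voiture neuve"),
]


def tokens(sentence):
    """Lower-cased set of word tokens (each word counted once per sentence)."""
    return set(re.findall(r"\w+", sentence.lower()))


def translation_candidates(node, pairs, top_n=3):
    """Rank target-side words by Dice association with `node` (an assumed measure)."""
    node = node.lower()
    node_count = 0             # sentence pairs whose source side contains the node
    target_counts = Counter()  # sentence pairs containing each target word
    joint = Counter()          # pairs containing both the node and the target word
    for src, tgt in pairs:
        tgt_tokens = tokens(tgt)
        target_counts.update(tgt_tokens)
        if node in tokens(src):
            node_count += 1
            joint.update(tgt_tokens)
    scores = {w: 2 * joint[w] / (node_count + target_counts[w]) for w in joint}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


top = translation_candidates("red", PAIRS)
```

With this toy data the highest-scoring candidate for red is rouge, because it appears in every sentence aligned with red and nowhere else. A production system would work on lemmatized text and far larger counts.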
Figure 7.5 Bilingual word sketch, based on a parallel corpus, for red for the language pair English-French.
A limiting factor for parallel-corpus work is the availability of a parallel corpus, for the language pair in question. The early work in the field was based on the Canadian parliamentary proceedings, ‘Canadian Hansard’, which were available in English and French and were fairly literal professional translations of each other. Other sources frequently used, for the languages of the EU, are the European parliamentary proceedings and other documents from the EU (as used for the screenshot). Other text types where parallel data is often available include software documentation, documentation for vehicles and machinery, and film transcripts. A large and well-maintained collection of parallel data is available at the OPUS website.7 For any particular language pair, some text types will be available, others will not.
7 Summary
Corpora can make dictionary-making more accurate, efficient, complete and consistent. They can deliver a candidate headword list, and, where the corpora are developed with care, with neologism-finding in mind, can identify candidate neologisms. There are many ways in which they can support entry-writing. They can provide a wide range of clues to the lexicographer for analysing the word’s range of meaning into distinct senses. In combination with a suitable corpus query system they can find the idioms, phrases and collocations for a word. They can identify if a word has noteworthy behaviour in relation to grammar, domain, region and register. They can do the preparatory work for finding good example sentences, and translations. Corpora have been used in these ways in a range of dictionary projects, and the chapter has described how this has worked, with reference to a particular corpus query tool, the Sketch Engine. Over the last two decades, the lexicographer’s role has been more and more often checking and confirming or editing the corpus tool’s work, where earlier it would have been ‘writing from scratch’. In the early twenty-first century, with the advent of the web and many and varied online resources, much is changing in the world of dictionary-making, and many things are uncertain. Quite what the role of the lexicographer will be in ten years’ time is far from clear, but I am confident that the role of the corpus will grow, with the line between dictionary and corpus blurring, and the lexicographer operating at that interface.
Notes
1 The website for the BNC is http://natcorp.ox.ac.uk.
2 In the BNC mucosa is marginally more frequent than spontaneous and enjoyment but appears in far fewer corpus documents.
3 As is now generally recognized, the notion of ‘representativeness’ is problematical with regard to general-purpose corpora like BNC and UKWaC, and there is no ‘scientific’ way of achieving it: see e.g. Atkins and Rundell (2008: 66).
4 The database can be explored online at http://kelly.sketchengine.co.uk.
5 The issue came to our attention when an early version of the BNC frequency list gave undue prominence to verbal car.
6 Here we exclude inflectional morphemes, addressed under lemmatization above: in English a distinction between inflectional and derivational morphology is easily made for most cases.
7 http://opus.nlpl.eu/.
References
Atkins, S. and M. Rundell (2008), The Oxford Guide to Practical Lexicography, Oxford: Oxford University Press.
Baroni, M., S. Bernardini, A. Ferraresi and E. Zanchetta (2009), ‘The WaCky wide web: A collection of very large linguistically processed web-crawled corpora’, Language Resources and Evaluation Journal 43 (3), 209–26.
Church, K. and P. Hanks (1990), ‘Word association norms, mutual information and lexicography’, Computational Linguistics 16, 22–9.
Clear, J. (1988), ‘The Monitor Corpus’ in M. Snell-Hornby (ed.), ZüriLEX ‘86 Proceedings, Tübingen: Francke Verlag, 383–9.
Fairon, C., K. Macé and H. Naets (2008), ‘GlossaNet2: A linguistic search engine for RSS-based corpora’, in S. Evert, A. Kilgarriff and S. Sharoff (eds), Proceedings, Web As Corpus Workshop (WAC4), Marrakech, 34–9.
Gries, S. Th. and A. Stefanowitsch (2004), ‘Extending collostructional analysis: A corpus-based perspective on “alternations”’, International Journal of Corpus Linguistics 9 (1), 97–129.
Heylighen, F. and J-M. Dewaele (1999), Formality of Language: Definition, Measurement and Behavioural Determinants, Internal Report, Free University Brussels. Available at http://pespmc1.vub.ac.be/Papers/Formality.pdf
Janicivic, T. and D. Walker (1997), ‘NeoloSearch: Automatic detection of neologisms in French Internet documents’ in Proceedings of ACH/ALLC’97, Queen’s University, Ontario, Canada, 93–4.
Kilgarriff, A. (1997), ‘Putting frequencies in the dictionary’, International Journal of Lexicography 10 (2), 135–55.
Kilgarriff, A. (2009), ‘Simple maths for keywords’ in M. Mahlberg, V. González-Díaz and C. Smith (eds), Proceedings, Corpus Linguistics. Liverpool. Available at http://ucrel.lancs.ac.uk/publications/cl2009/
Kilgarriff, A. and I. Kosem (2012), ‘Corpus tools for lexicographers’ in S. Granger and M. Paquot (eds), Electronic Lexicography, Oxford: Oxford University Press.
Kilgarriff, A. and M. Rundell (2002), ‘Lexical profiling software and its lexicographic applications: a case study’ in A. Braasch and C. Povlsen (eds), Proceedings of the Tenth Euralex Congress, Copenhagen: University of Copenhagen, 807–18.
Kilgarriff, A., M. Husák, K. McAdam, M. Rundell and P. Rychlý (2008), ‘GDEX: automatically finding good dictionary examples in a corpus’ in E. Bernal and J. DeCesaris (eds), Proceedings of the XIII Euralex Congress, Barcelona: Universitat Pompeu Fabra, 425–31.
Kilgarriff, A., V. Kovář, S. Krek, I. Srdanović and C. Tiberius (2010), ‘A quantitative evaluation of word sketches’ in A. Dykstra and T. Schoonheim (eds), Proceedings of 14th EURALEX International Congress, Leeuwarden: Fryske Academy, 372–9.
Kilgarriff, A., P. Rychlý, P. Smrz and D. Tugwell (2004), ‘The Sketch Engine’ in G. Williams and S. Vessier (eds), Proceedings of the Eleventh Euralex Congress, Lorient, France: UBS, 105–16.
Kilgarriff, A., F. Charalabopoulou, M. Gavrilidou, J. Bondi Johannessen, S. Khalil, S. Johansson Kokkinakis, R. Lew, S. Sharoff, R. Vadlapudi and E. Volodina (2014), ‘Corpus-based vocabulary lists for language learners for nine languages’, Language Resources and Evaluation Journal 48 (1), 121–63.
O’Donovan, R. and M. O’Neill (2008), ‘A systematic approach to the selection of neologisms for inclusion in a large monolingual dictionary’ in E. Bernal and J. DeCesaris (eds), Proceedings of the XIII Euralex Congress, Barcelona: Universitat Pompeu Fabra, 571–9.
Pomikálek, J., M. Jakubíček and P. Rychlý (2012), ‘Building a 70 billion word corpus of English from ClueWeb’, Proceedings LREC, Istanbul. Available at http://www.lrec-conf.org/proceedings/lrec2012/index.html
Rundell, M. (ed.) (2001), Macmillan English Dictionary for Advanced Learners, Oxford: Macmillan Education.
Rychlý, P. (2008), ‘A lexicographer-friendly association score’ in P. Sojka and A. Horák (eds), Proceedings of 2nd Workshop on Recent Advances in Slavonic Natural Language Processing, Brno: Masaryk University.
Santini, M., G. Rehm, S. Sharoff and A. Mehler (eds) (2009), ‘Introduction’, Journal for Language Technology and Computational Linguistics (Special Issue on Automatic Genre Identification: Issues and Prospects) 24 (1), 129–45.
Tapanainen, P. and T. Järvinen (1998), ‘Dependency concordances’, International Journal of Lexicography 11 (3), 187–203.
8
Researching the use of electronic dictionaries
Verónica Pastor and Amparo Alcina
1 Introduction
The use of electronic dictionaries has many advantages over the traditional paper dictionary. However, access to the lexicon and terminology of a dictionary presents certain difficulties, partly due to the lack of user knowledge about how a dictionary can be queried to access this kind of information, and partly due to the diversity of ways a dictionary can be consulted (in different areas of the dictionary, with different operators, in widely varied interfaces), which vary from one dictionary to another (see also Klosa-Kückelhaus and Michaelis, Chapter 24). Many studies can be found in the literature on dictionary use by translators (Meyer 1988, Roberts 1990, Atkins 1998, Mackintosh 1998, Corpas et al. 2001, Sánchez 2004a), and also by native speakers of a language, and language learners (Tomaszczyk 1979, Béjoint 1981, Bogaards 1988, Tono 1989, Hulstijn and Atkins 1998, Hartmann 1999, McCreary and Dolezal 1999). Most of these and other authors conclude that users come across difficulties in their dictionary consultations, generally due to two main causes: the dictionary does not facilitate access to information (dictionaries present deficiencies and are not user-friendly), and users are unaware of how to use dictionaries to access the information they contain (Béjoint 1989: 280, Cowie 1999: 188, Hartmann 1999: 40, Nesi and Haill 2002: 285, Fernández-Pampillón and Matesanz 2003: 150, Sánchez 2004a: 482). In addition, dictionaries do not always take advantage of the full potential offered by electronic formats. Electronic dictionaries are frequently mere copies of paper dictionaries; they are texts where the same information appears in different font styles. However, these dictionaries do not offer the search possibilities of an advanced database system. 
Despite these reflections in the literature, we have found no studies that establish a ‘universal’ classification or arrangement of the search techniques that can be used in a dictionary, in other words, one that is valid for training in electronic dictionary use in general, and that can be adapted to any specific dictionary. By this, we refer to a classification that can specify and highlight all the query options a dictionary offers. This classification may serve as a guide to design search types when creating a new electronic dictionary, either from scratch or based on a paper dictionary. The present research, within the framework of the ONTODIC Project, investigates how to incorporate new search techniques, such as onomasiological searches, in electronic dictionaries (Alcina 2009). This has led us to analyse the search techniques that are currently available in
electronic dictionaries (Pastor and Alcina 2010, Pastor 2013, this chapter1), as well as in electronic corpora (Pastor and Alcina 2009) and the Internet (Pastor and Alcina 2011). In the present chapter, we first analyse the reflections offered by some authors on the features of electronic dictionaries, mainly in contrast to paper dictionaries. This is followed by an analysis of thirty-two advanced electronic dictionaries, from which we compiled all the search possibilities offered by electronic dictionaries, including the most common and best known, as well as the most innovative. Our empirical study is summarized in Annex 1. Following this analysis we describe our classification of search techniques, based on the authors’ reflections and the dictionary analysis. In this classification we focus on three main elements that cover all the search possibilities. We term these three elements: the query, the resource and the result. The combination of these three elements and their subtypes allows us to structure all the search techniques that can be used in dictionaries. Finally, we conclude by highlighting the practical uses and contributions our classification provides.
2 Electronic dictionaries and search possibilities
The development of new technologies and the Internet have progressively modified the concept of the dictionary. Many paper dictionaries have been converted to electronic formats, such as CD-ROM, while others are available online. Electronic dictionaries can be classified into various types according to different criteria. Some authors have attempted to devise typologies of electronic dictionaries (Ide 1993, Sharpe 1995, Nesi 1998, 1999, 2000b). One example of an electronic dictionary typology is that by Lehr (1996: 315), which focuses on technical and meta(lexicographic) evaluation. Based on technical evaluation, this author distinguishes between online or offline dictionaries. Offline dictionaries comprise pocket electronic dictionaries (PEDs) and PC dictionaries. PC dictionaries include dictionaries in CD-ROM, floppy disk and other formats. Based on meta(lexicographic) evaluation, this typology distinguishes between electronic dictionaries based on their paper versions, and newly developed electronic dictionaries, as well as electronic dictionaries with both print and innovative appearances. In this classification (see Figure 8.1), the author distinguishes between newly developed electronic dictionaries (new development) and electronic versions of paper dictionaries (based on paper) (Gross 1997, Jacquet-Pfau 2002: 90). Nesi (2000a: 140) states that fully electronic dictionaries are more effective than electronic dictionaries adapted from paper versions: ‘electronic dictionaries would be most effective if they were designed from scratch with computer capabilities and computer search mechanisms in mind.’ Electronic dictionaries can be easily updated (Kay 1984: 461, Carr 1997: 214, Harley 2000) and allow a quicker, more precise and exhaustive search, in which a variety of search criteria can be combined (Jacquet-Pfau 2002: 99, de Schryver 2003: 157). 
According to Sharpe (1995: 48), electronic adaptations of paper dictionaries incorporate the same types of search as paper dictionaries, and their search possibilities are therefore less flexible. Lew (2011: 238, 242) points out that some online dictionaries are more uncomfortable to use than paper dictionaries because they do not exploit the potential of the electronic format.
Figure 8.1 Classification of electronic dictionaries by Lehr (1996: 315), translation in de Schryver (2003: 148).
Forget (1999) affirms that electronic dictionaries differ from paper dictionaries in factors such as: use, presentation, search capabilities, technical aspects and nature of contents (multimedia elements). With regard to the first of these factors, for example, this author points to the use of hypertext in electronic dictionaries, and a higher flexibility in searches as compared to paper dictionaries. Forget also states that every electronic dictionary has a different interface, and as a result the user must devote more time to learning how to use each one of the electronic dictionaries available. The presentation of electronic dictionaries often involves the use of different colours to highlight information, as well as more intuitive interfaces. Electronic dictionaries are reported to incorporate different search types, and although electronic searches are always quicker than manual searches, much time can also be wasted on searching in an electronic dictionary without obtaining a satisfactory result. Finally, technical aspects mentioned include the interactivity of electronic dictionaries, which allow the user to add to a dictionary entry a comment that may be useful in future searches, such as a usage note, a context, a translation used for a specific client, etc. In addition, users can copy terms directly from the dictionary without having to type them, electronic dictionaries take up less space than paper dictionaries, and do not deteriorate with time. Of all the differences between electronic and paper dictionaries already mentioned in this section, the main difference stated by many authors, and that we want to emphasize here, lies in the way the user accesses the information in a dictionary (Nesi 1998, de Schryver 2003: 147).
Search possibilities in paper dictionaries are limited: the search is restricted to the arrangement of contents by the lexicographer, which is always alphabetical (Sallas 2001) and, therefore, is limited to the search for an exact word (Santana et al. 1996: 70). In contrast, searches in electronic dictionaries are quicker and more flexible because they incorporate more advanced search techniques (Sánchez 2004b: 181, Kaalep and Mikk 2008). Some authors have reflected on the types of search that have already been implemented in electronic dictionaries, or that would be desirable to incorporate, such as de Schryver’s (2003) paper ‘Lexicographer’s Dreams in the Electronic-Dictionary Age’. In this publication the author calls for dictionary designs that would allow more complex searches, and reviews all the authors that have dealt with the advances or desirables in the development of electronic dictionaries to date. Knowles (1990: 1656) suggests the use of hyperlinks; Abate (1985) or Lew et al. (2018) mention the use of images and graphics, as well as the use of natural language in searches; Poirier (1989) refers to the search in the whole text of the dictionary with Boolean operators; Sobkowiak (1999) mentions access to corpora concordances as a dictionary option; Nesi (2000a) suggests simultaneous searches in different dictionaries; Corris et al. (2000) refer to functions of the dictionary that suggest similar spellings to the search words; Geeraerts (2000) considers search functions by inverse indexes, anagrams, and phonetic similarity. Dodd (1989: 89) also refers to the search capabilities of electronic dictionaries and anticipates the possibility of searching for a word from its phonetics, spelling similarities, etymology, thematic area, semantic relations with other words (synonymy, antonymy, hyponymy), words in the definition, part of speech, etc. 
Many electronic dictionaries allow searches in the entire dictionary content, or in some of its sections, and not only in the entries (Roberts and Langlois 2001: 712), for example, in dictionary definitions or contexts. In addition, hyperlinks in electronic dictionaries link words that are related to other words (Nesi 1999: 61; Gómez González-Jover 2005: 161, Church 2008). Hamon and Nazarenko (2001: 187) identify different types of hyperlinks: in the entry index, in the keywords, and in relations of synonymy and hyponymy between entries and subentries. The inclusion of images in dictionaries is also a very useful complement to linguistic information (Faber et al. 2007, Montero and Faber 2008: 151, Lew 2011: 245). Some entries provide audio files, although dictionaries do not include video files (Lew 2011: 246). Another option of electronic dictionaries is the possibility to access the most recent entries the user consulted, similar to the ‘search history’ in web search engines (Forget 1999, Rizo and Valera 2000: 369). Fernández-Pampillón and Matesanz (2003) distinguish the following types of searches: search in an entry, search in a list of entries (alphabetical or inverse), assisted search, multiple search, search with related words (use of the dictionary as a thesaurus), search in anagrams, and search using abbreviations and marks. 
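Two of the search types mentioned in this literature, the anagram search and the inverse (reverse-alphabetical) entry list, are simple to sketch in Python. The headword list and the function names below are hypothetical; the sketch only shows one plausible way such searches could be implemented and does not describe how any dictionary cited here actually works.

```python
from collections import defaultdict

# Hypothetical headword list standing in for a dictionary's entry index.
ENTRIES = ["listen", "silent", "enlist", "house", "mouse", "tinsel", "blouse"]


def build_anagram_index(entries):
    """Group headwords by their sorted letters, so anagrams share one key."""
    index = defaultdict(list)
    for word in entries:
        index["".join(sorted(word.lower()))].append(word)
    return index


def anagrams(letters, index):
    """Return the headwords that use exactly the letters supplied by the user."""
    return sorted(index.get("".join(sorted(letters.lower())), []))


def inverse_list(entries):
    """Order headwords by their spelling read right to left (an inverse index)."""
    return sorted(entries, key=lambda word: word[::-1])


ANAGRAM_INDEX = build_anagram_index(ENTRIES)
found = anagrams("inlets", ANAGRAM_INDEX)
by_ending = inverse_list(ENTRIES)
```

Here the letters inlets retrieve enlist, listen, silent and tinsel, and the inverse list places house, blouse and mouse next to one another because they share an ending.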
Sánchez (2004b: 193–4) includes a range of searches: a search in the entries, similar to an alphabetical search in paper dictionaries; an assisted search in which the dictionary suggests words when the user has misspelled the search word; an advanced search, which allows the user to search in the dictionary content (definitions, examples, etc.); a search with wildcards, patterns defined by the user, and filters, which allows the user to combine search words with operators, and to search for orthographical variants with wildcards; a search for related words, which accesses words that are semantically related to other words; a search with anagrams, which retrieves words that result from changing the order of the letters the user introduces in the
dictionary; a search with abbreviations and marks, which retrieves all the forms corresponding to abbreviations and marks used in the dictionary entries; and finally a refined search, which retrieves words by introducing their pronunciation or etymology. Moreover, it is worth noting that a term’s meaning is determined not only by the concept it refers to, but also by the context in which it is used (Kussmaul 1995: 89, Corpas and Valera 2001: 253, Robinson 2003: 113); hence, the importance given by translators and other language professionals to the search for terms within their context of use. Currently, dictionaries, even electronic ones, do not have a wide enough contextual field to solve all user needs. Bowker (1998: 648) states that dictionaries lack contexts compared to the valuable information offered by electronic corpora. Bowker performed a study with translation students in which half the students translated a text using dictionaries only, and the other half used electronic corpora and WordSmith tools. She found that the translations produced by the students with access to corpora were of a higher quality than those from the students that had only used dictionaries. These context-related deficiencies of dictionaries are one of the reasons some authors (Montero and Faber 2008: 162) give to explain why translators have gradually moved from searching in dictionaries to other resources, such as the Internet and corpora, which offer users wider contexts and facilitate access to these contexts by means of various search techniques that dictionaries have not yet incorporated. Because contexts are so relevant to translators and other language professionals, some studies call for the incorporation of corpora within the search systems of electronic dictionaries. In this way, corpora are not simply a source for creating dictionaries, nor a dictionary substitute (Colominas 2004: 362), but rather a complement to or accessory of the dictionary. 
In this vein, works like that of Castagnoli (2008) are of particular interest; this author created a database in which the contextual field is eliminated from the entries, and terms in the entries are linked to a reference corpus where the user can consult the concordances of a term. In addition, some corpora can be queried like a dictionary (Lew 2011: 246–7). In this section we have outlined some of the reflections many authors have made on electronic dictionaries, with particular attention to how they differ from paper dictionaries. Specifically, the literature highlights aspects such as interactivity, variety and flexibility in the searches. However, we have found no exhaustive studies on dictionary search possibilities, or on how these search possibilities could be organized. We therefore detected the need to develop a classification of search techniques in electronic dictionaries, through the analysis of a set of electronic dictionaries and the search possibilities they offer.
3 Analysis of search techniques in electronic dictionaries
In an earlier study we analysed fifteen electronic dictionaries, and we proposed a classification of the search techniques used in electronic dictionaries (Pastor and Alcina 2010). Subsequently, we extended the analysis to thirty-two electronic dictionaries, and our classification proved useful. Only two new search results were added: collocations and related words, and the graphical representation of relations.
The criterion used to select the dictionaries was that they should incorporate innovative search techniques as well as the traditional alphabetical search. In addition, we analysed both online and offline dictionaries in order to examine the advantages and disadvantages of both formats. Online dictionaries are more accessible than the CD-ROM format, and most online dictionaries are free. Moreover, online dictionaries can be consulted on any computer (with Internet access), and the dictionary does not have to be installed in the computer in order to use it. However, offline dictionaries generally provide more search techniques than online dictionaries and are more stable and durable compared to online dictionaries. The virtual nature of online dictionaries means that their URL location may change at any time, or they may even disappear altogether. We analysed dictionaries available on CD-ROM in Spanish, such as the Diccionario de uso del español (DUE) by María Moliner and the Diccionario de la lengua española (DRAE) by the Real Academia Española, in English, such as the Oxford English Dictionary (OED) and the Collins English Dictionary (CED), and in French, such as Le Grand Robert de la Langue Française. We then described a number of online dictionaries and databases, such as (in alphabetical order) the Base Lexicale du Français, Cercaterm; Collins Dictionary; the Diccionari de la llengua catalana; the Dicouèbe2 dictionary and other similar dictionaries (DiCoInfo,3 DiCoEnviro, DAD); the Diccionario de Colocaciones del Español (DiCE);4 Dirae; EcoLexicon;5 FrameNet;6 Le grand dictionnaire terminologique; IATE; Just The Word; Macmillan English Dictionary; the Merriam-Webster monolingual English dictionary, thesaurus, and visual dictionary; OncoTerm;7 the OneLook Reverse Dictionary; TERMIUM Plus; Trésor de la Langue Française informatisé (TLFi); Ultralingua; UNTERM; WordNet;8 the WordReference dictionary and the Wordsmyth dictionary. 
The search options of these dictionaries are summarized in the table in Annex 1, where they are compared according to different criteria (search for a word in the alphabetical entry list, search for one or more words in the definitions or other fields of an entry, use of operators, search for a semantic relation by navigation, search for a semantic relation by direct search, access to complementary forums, search by thematic area, access to external links, search for images, introduction of an exact word, introduction of a partial word, use of wildcards, introduction of a word to search for phonetically or orthographically related words, introduction of an inflected form, search for anagrams, specification of a part of speech, and introduction of a question in natural language). The horizontal axis of the table shows the search options, and the dictionaries analysed appear on the vertical axis. Comments are also added on some search options in each dictionary. This is intended to give an overall picture of the diverse query options that can be found in electronic dictionaries before we go on to the systematization of search techniques in the following section.
4 Classification of search techniques in electronic dictionaries
Our review of the search techniques reported in the literature and our analysis of the search techniques in electronic dictionaries reveal a wide variety of search possibilities. Not all the
dictionaries incorporate the same search options. In addition, the same types of searches have different names, depending on the dictionary. We therefore consider it necessary to systematize all the search techniques that have been incorporated in electronic dictionaries to date. The purpose of this chapter is to present a proposal for the classification of search techniques in electronic dictionaries. Our search technique classification is divided into three elements that we have found to be present in every search: the query, the resource and the result. By differentiating the query, the resource and the result, we are able to reflect clearly and coherently all the search possibilities offered by each dictionary. Search techniques are options that a user can apply to a resource to obtain a result. The user wants to obtain particular information (the meaning of a word, a usage context, etc.), and to obtain this information the user queries an electronic resource by introducing an expression, which we call the query. The electronic resource queried by the user can be a dictionary section, for example, a dictionary field where the user can find information. The specific element of the electronic resource queried by the user is called the resource. Finally, by introducing a query in a resource, the user obtains a result, the third element of our classification. Therefore, we distinguish three elements in a search technique: a query, a resource and a result. The query is the word or phrase introduced by the user in the interface of a resource. The resource is the resource or part of the resource in which the word or phrase is searched. The result is the element obtained when a query is searched in a resource. In a dictionary, if we introduce the exact word house as a query to search in the list of dictionary entries (resource), the result we obtain is the dictionary entry for the word house with information about this word. 
In contrast, if the query we introduce is a combination of words, for example yellow fruit and we search in the definition field of a dictionary (resource), the result we obtain is a list of words whose definition field contains the words introduced in the query (see Figure 8.2). Below, we explain in more detail the search techniques included in our classification and provide examples of how these search techniques are applied in the electronic dictionaries we have analysed.
Figure 8.2 Representation of two search techniques in a dictionary.
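The two searches represented in Figure 8.2 can be mirrored in a small Python sketch that keeps the query, the resource and the result apart. The miniature dictionary, its definitions and the `search` function are invented for illustration and are not drawn from any of the dictionaries analysed.

```python
# Hypothetical miniature dictionary: headword -> fields of the entry.
DICTIONARY = {
    "house": {"definition": "a building for people to live in"},
    "banana": {"definition": "a long curved yellow fruit"},
    "lemon": {"definition": "a sour yellow citrus fruit"},
    "cherry": {"definition": "a small round red fruit"},
}


def search(query, resource):
    """Apply a query to a resource and return the corresponding kind of result.

    resource="entries":     exact word looked up in the entry list -> the entry
    resource="definitions": combination of words looked up in the definition
                            field -> list of headwords whose definition
                            contains every query word
    """
    if resource == "entries":
        return DICTIONARY.get(query)
    if resource == "definitions":
        wanted = query.lower().split()
        return sorted(
            headword
            for headword, entry in DICTIONARY.items()
            if all(word in entry["definition"].split() for word in wanted)
        )
    raise ValueError(f"unknown resource: {resource}")


entry_result = search("house", "entries")
list_result = search("yellow fruit", "definitions")
```

The same function returns a single entry in the first case and a list of headwords in the second, which reflects the point that the result depends on the combination of query and resource.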
4.1 The first element: The query
The query is the expression introduced by the user when searching in a dictionary. It is normally an exact word. In some cases, the user may introduce a partial word, that is, a word from which one part has been omitted. In other dictionaries we can introduce an approximate word, an anagram, or a sequence of characters that may or may not form a word. Some dictionaries allow the user to introduce a combination of two or more words. Together with the expression, the user can also introduce other information, in the form of filters, in order to restrict or specify the result they want to obtain. A filter limits the expression to a particular criterion, such as a part of speech or a thematic field. For instance, a filter for the word play could be the part of speech ‘noun’. The types of queries we have discerned in electronic dictionaries fall into a range of search technique types and subtypes: (1) an exact word, (2) a partial word, (3) an approximate expression (inflectional form or spelling similarity), (4) an anagram and (5) a combination of two or more words.
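The role of a filter can be sketched as an optional restriction applied on top of the expression. In the Python example below, the entries, the field names `pos` and `domain`, and the `lookup` function are all hypothetical; they only illustrate how the filter ‘noun’ narrows a query for play.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    headword: str
    pos: str          # part of speech
    domain: str       # thematic field
    definition: str


# Hypothetical entries, invented for illustration.
ENTRIES = [
    Entry("play", "noun", "theatre", "a piece of writing performed by actors"),
    Entry("play", "verb", "general", "to take part in a game or sport"),
    Entry("house", "noun", "general", "a building for people to live in"),
]


def lookup(expression: str,
           pos: Optional[str] = None,
           domain: Optional[str] = None) -> List[Entry]:
    """Return entries for `expression`, restricted by any filters supplied."""
    return [
        entry for entry in ENTRIES
        if entry.headword == expression
        and (pos is None or entry.pos == pos)
        and (domain is None or entry.domain == domain)
    ]


unfiltered = lookup("play")
nouns_only = lookup("play", pos="noun")
```

Without a filter the query retrieves both entries for play; with the part-of-speech filter only the noun entry remains.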
4.1.1 Search by exact word
The search by an exact word consists of introducing a complete word in the same form in which it is included in the dictionary. This search can be used to obtain the dictionary entry containing information about the introduced word (a definition, an example, grammatical information, etc.). This option is offered by all dictionaries. For example, the user can search in the dictionary for the word house in the list of entries of the dictionary and find its definition, etymology, etc. An exact-word search may or may not be sensitive to accents and case.
4.1.2 Search by partial word A partial word is an incomplete word. The omitted part of the word can be the start, the middle or the end of the word. This omitted part of the word is replaced by a wildcard. The most frequent wildcards are the asterisk ‘*’ and the question mark ‘?’. The question mark normally replaces only one character. For example, analy?ed retrieves analyzed and analysed. The asterisk normally replaces one or more characters. For example, house* retrieves housemaid, housewife, housebreaking, household, housekeeper, etc. Of the dictionaries analysed, these two wildcards can be used in the CED, DRAE, DUE, OED and the OneLook Reverse Dictionary. In some dictionaries other wildcards can be used. For example, the Ultralingua dictionary uses the asterisk (to replace zero or more characters), the question mark and also the plus sign ‘+’, which replaces one or more characters. The Wordsmyth dictionary uses the asterisk or the percentage sign ‘%’ to substitute any sequence of characters, and the dot ‘.’ or the underscore or understrike ‘_’ to replace one character. In addition, almost all the electronic dictionaries that we have analysed include an autocomplete feature, which means that when the user starts introducing a word in the dictionary, a drop-down list appears suggesting entries included in the dictionary that start with the letters that the user is introducing. For example, in the Macmillan English Dictionary, if we start by introducing ima, the dictionary suggests words such as image, imagery, imaginable, imaginary, imagination, etc.
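The wildcard conventions described in this section can be illustrated with a short Python sketch. It is not the implementation of any of the dictionaries analysed: the function names and the small headword list are hypothetical, and the sketch assumes the most common convention reported in the text (‘?’ stands for exactly one character, ‘*’ for one or more). A dictionary in which ‘*’ may also stand for zero characters would translate it to `.*` instead of `.+`.

```python
import re

# Hypothetical headword list, for illustration only.
HEADWORDS = ["analysed", "analyzed", "house", "household",
             "housekeeper", "housemaid", "mouse"]

def wildcard_to_regex(pattern):
    """Translate a wildcard query into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")       # assumption: exactly one character
        elif ch == "*":
            parts.append(".+")      # assumption: one or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def wildcard_search(pattern, headwords):
    """Return the headwords that match the whole wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return [w for w in headwords if regex.fullmatch(w)]

print(wildcard_search("analy?ed", HEADWORDS))
print(wildcard_search("house*", HEADWORDS))
```

Under the "one or more" reading, house* does not return house itself; this is exactly the point on which the dictionaries differ.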
4.1.3 Search by approximate expression An approximate expression is a word or sequence of characters that is similar to an exact word included in the dictionary. The approximate expression can be an inflected form of a word, or a word or sequence of characters that is pronounced or spelled similarly to another word. This search technique can be useful to obtain a list of words included in the dictionary that are similar to the word or sequence introduced by the user. We explain this search technique in more detail below.
4.1.3.1 Search by inflected form An inflected form is a word with inflectional morphemes, for example, a plural noun or a conjugated verb. When the user queries an inflected form in a dictionary, the dictionary retrieves the base form of that word, which corresponds to the headword of a dictionary entry. For example, in the Ultralingua dictionary, if we introduce a conjugated verb, the dictionary retrieves the infinitive, or if we introduce a plural noun or adjective, it retrieves a singular noun or an adjective. Figure 8.3 illustrates the search for the conjugated verb played in the Ultralingua dictionary; the dictionary retrieves the entries that include the base form of this verb, the infinitive play.
Figure 8.3 Search by inflected form in the Ultralingua dictionary.
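A search by inflected form can be thought of as a two-step lookup: the query is first tried as a headword and, failing that, mapped to its base form. The following Python sketch shows the idea with a hand-made table; the table, the entry list and the function name are hypothetical, and a real dictionary would rely on a full morphological analyser rather than a fixed mapping.

```python
# Hypothetical data: a few headwords and a table of inflected forms.
HEADWORDS = {"play", "house", "mouse"}
BASE_FORMS = {
    "played": "play", "plays": "play", "playing": "play",
    "houses": "house", "mice": "mouse",
}

def find_headword(query):
    """Return the headword for an exact or inflected form, or None."""
    form = query.lower()
    if form in HEADWORDS:
        return form
    return BASE_FORMS.get(form)

print(find_headword("played"))
```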
4.1.3.2 Search by a similarly pronounced or spelled word This search technique consists in introducing a sequence of characters that may or may not constitute a word and that are spelled or pronounced similarly to a word that is included in the dictionary. This feature can be found in the CED, DRAE, DUE, OED, TLFi, Wordsmyth, etc. In addition to these specific search options provided by some dictionaries to search for a similarly pronounced or spelled word, almost all the electronic dictionaries that we have analysed include a feature suggesting similarly spelled words that are included in the dictionary when the search word is not found. For instance, in the Macmillan English Dictionary, if we introduce wron, the dictionary does not display any result, but it suggests other similar words from the dictionary, such as wrong, wren, pron, iron, won and wrote.
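The suggestion of similarly spelled words can be approximated with the standard Python module `difflib`. This is only a sketch under stated assumptions: the headword list is hypothetical, and the similarity measure used by `difflib.get_close_matches` is not necessarily the one used by the Macmillan English Dictionary or any other dictionary mentioned here, so the suggestions and their order may differ.

```python
import difflib

# Hypothetical headword list, for illustration only.
HEADWORDS = ["wrong", "wren", "iron", "won", "wrote", "house"]

def suggest(query, headwords, limit=5):
    """Suggest similarly spelled headwords when the query is not found."""
    if query in headwords:
        return []                      # exact hit: no suggestions needed
    return difflib.get_close_matches(query, headwords, n=limit, cutoff=0.6)

print(suggest("wron", HEADWORDS))
```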
4.1.4 Search by anagram and crossword search An anagram is a sequence of characters that may or may not constitute a word, whose transposition results in one or more complete words included in the dictionary. An anagram search can be useful to obtain a list of dictionary words that contain all the letters of the anagram, in the same or a different order. Some dictionaries can also randomly add or discard a number of letters specified by the user (crossword search). These search techniques are useful for finding a word if we know some or all of the letters it contains. Anagram searches can be found in the CED, DRAE, DUE and Wordsmyth. If we introduce the letters bowle in the CED and select the ‘anagrams’ option, a list of words made up of those letters is retrieved: bowel, below and elbow. In addition, DRAE can perform crossword searches. For example, if we introduce the letters casa, and select the option of generating words by adding up to three more letters, we retrieve a long list of words including: asca ‘ascus’, casa ‘house’, actas ‘records’, casal ‘country house’, caseta ‘hut, stand or kennel’, casuca ‘shanty or hovel’, casería ‘country house’, caserna ‘barracks’, etc.
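Both techniques reduce to comparing letter inventories, as the following Python sketch suggests. The function names and word lists are hypothetical, and the crossword search only covers the case of adding letters (not discarding them); it is an illustration of the principle, not a description of how the CED or the DRAE implement these options.

```python
from collections import Counter

def anagrams(letters, headwords):
    """Headwords made up of exactly the given letters, in any order."""
    key = sorted(letters.lower())
    return [w for w in headwords if sorted(w.lower()) == key]

def crossword(letters, headwords, max_extra):
    """Headwords containing all the given letters plus up to max_extra more."""
    needed = Counter(letters.lower())
    hits = []
    for word in headwords:
        available = Counter(word.lower())
        extra = len(word) - len(letters)
        # 'needed - available' is empty when every query letter is present.
        if not (needed - available) and 0 <= extra <= max_extra:
            hits.append(word)
    return hits

print(anagrams("bowle", ["below", "bowel", "elbow", "bowl", "blow"]))
print(crossword("casa", ["asca", "casa", "actas", "casal", "caseta",
                         "casuca", "caserna", "mesa", "cascabeles"], 3))
```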
4.1.5 Search by a combination of two or more words Search by a combination of two or more words consists of introducing two or more words in the dictionary at the same time. Our analysis revealed five ways of combining words in a query: (1) presence of all the words introduced, (2) presence of any of the words introduced, (3) presence of one word and absence of another, (4) presence of the exact sequence of words and (5) introduction of words in the form of a question in natural language. Dictionaries normally combine words using operators. In the dictionaries analysed, the search for two or more words can be made in content fields, for example, to obtain a list of dictionary words whose definition contains the words included in the query, or that are related to the query words.
4.1.5.1 Presence of all words In this search technique all the words included in the query must be present in the resource. Below, we present some examples.
Figure 8.4 Presence of the words agreement and lawsuit in the OneLook Reverse Dictionary.
In the first example we combine the words agreement and lawsuit without operators in the OneLook Reverse Dictionary. The result we retrieve is a list of words, including: settle, complaint, demand, arrangement, etc. All the words retrieved by the dictionary are related to the query words. Figure 8.4 shows part of the entry for the word settle in the dictionary, where settle is given as a verb used in lawsuit contexts to designate the agreement reached by the parties. The operators used to indicate that all the words must be present can vary from one dictionary to another. The DUE uses the operator ‘&’ to combine two words, one at each side of the operator (word1 & word2). In this case, the dictionary retrieves the entries that contain both query words in their content fields. The CED uses the operator ‘AND’, and Wordsmyth requires the option ‘all of the words’.
4.1.5.2 Presence of any of the words The operators used to indicate that any of the query words can be present also vary from one dictionary to another. In the DUE, the operator ‘|’ is used in combination with two words, one at each side of the operator (word1 | word2). In this case, the dictionary retrieves entries that include only the first or the second word introduced with the operator, or both words at the same time. In the CED, the operator ‘OR’ is used. In the Wordsmyth dictionary, the ‘word(s)’ option retrieves entries that include any of the words the user introduces. Figure 8.5 shows an example of this search technique in the Wordsmyth dictionary (reverse search). We introduce a three-word query, transport, carry and arrows, to search for in the dictionary definitions (‘word(s)’ and ‘definition’ options). The dictionary retrieves a list of over 100 words whose definitions include any of the words introduced. In this list we find words such as quiver, which is defined as ‘a case designed to hold and transport arrows, often strapped to the back or waist’. This definition contains two of the three query words. The list also includes other words whose definitions contain only one of the query words, for example, achieve. One of the definitions of achieve is ‘to reach or carry through successfully; accomplish’. The criterion used by the dictionary to list the words is alphabetical. The dictionary does not prioritize entries with a higher number of query words in their definition.
Figure 8.5 Presence of any of the words in the Wordsmyth dictionary.
4.1.5.3 Presence of one word and absence of another word In this search technique some of the words must be present in the content field of the dictionary and others must not be present. Of the dictionaries analysed, CED, DUE, OED and TERMIUM Plus allow this search technique. DUE uses the operator ‘!’, which is combined with two words, one on each side of the operator (word1 ! word2). This option retrieves entries whose definitions include the first word, with the condition that the second word does not appear. The CED and OED use the operator ‘NOT’, and TERMIUM Plus ‘AND NOT’. Figure 8.6 presents an example of this search in the CED. We combine the words feline NOT domestic, to search in the definition fields. We retrieve entries whose definition contains the word feline but not domestic. The result is a list of words referring to felines: bobcat, caracal, cheetah, feline, jaguar, jaguarondi, leopard, lion, lynx, etc. The definitions of these words include the word feline, but not domestic. As we can see in the definition of leopard, the word feline is included, but domestic is not. In the search results the word cat does not appear because it is defined in the dictionary as ‘domestic feline’. It is worth noting that searches ‘by a combination of two or more words’, for example, in the dictionary definitions, can be problematic when they require the user to guess words that might appear in the definition. Although the search for feline NOT domestic in the CED yields a good list of wild felines, cat NOT house will not.
Figure 8.6 Presence of one word and absence of another word using operators in the CED.9
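The three operator-based combinations (all of the words, any of the words, one word but not another) correspond to simple set operations on the words of each definition. The Python sketch below illustrates this; the miniature definitions and function names are hypothetical and are not taken from the CED or any other dictionary, and operator parsing (‘AND’, ‘&’, ‘|’, ‘!’, etc.) is deliberately left out.

```python
# Hypothetical definitions, for illustration only.
DEFINITIONS = {
    "cat": "domestic feline kept as a pet",
    "dog": "domestic canine",
    "leopard": "large spotted feline of Africa and Asia",
    "lynx": "wild feline with tufted ears",
}

def tokens(text):
    """Lower-cased set of the words in a definition."""
    return set(text.lower().split())

def search_all(terms, definitions):
    """AND: every query word occurs in the definition."""
    return sorted(h for h, d in definitions.items()
                  if set(terms) <= tokens(d))

def search_any(terms, definitions):
    """OR: at least one query word occurs in the definition."""
    return sorted(h for h, d in definitions.items()
                  if set(terms) & tokens(d))

def search_not(present, absent, definitions):
    """NOT: 'present' occurs in the definition and 'absent' does not."""
    return sorted(h for h, d in definitions.items()
                  if present in tokens(d) and absent not in tokens(d))

print(search_not("feline", "domestic", DEFINITIONS))
```

As in the examples discussed above, the results are listed alphabetically; nothing in this sketch ranks entries by the number of query words they contain.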
4.1.5.4 Presence of an exact sequence of words Among the dictionaries analysed, the DUE, Dirae and TERMIUM Plus allow users to search for an exact sequence of words in the dictionary’s content field. In some dictionaries the sequence of words must be introduced in quotation marks. Figure 8.7 presents an example of a search for an exact sequence of words in TERMIUM Plus. When we search for words whose definition includes the exact sequence fabricar moneda ‘to make a coin’, the result is the word acuñar ‘to coin’. As we can see in Figure 8.7, the exact sequence fabricar moneda appears in the definition of acuñar.
4.1.5.5 Queries in natural language The last search subtype using combinations of two or more words is the introduction of queries in natural language. This search can be used to obtain a list of words that might answer the question introduced in the dictionary. Of the dictionaries analysed, only the OneLook Reverse Dictionary allows this search technique. There is no restriction on the queries that can be introduced and so, for example, the user may use wh-questions (e.g. starting with what is or who is).
Figure 8.7 Presence of an exact sequence of words in TERMIUM Plus.10
The query a big building retrieves a list of words including bunker, cathedral, castle, mansion, coliseum, etc. The word castle is defined as ‘a large building’. In the same dictionary, if we introduce the query a small building, the resulting list includes chapel, turret, shed, summerhouse, lodge, shop, dentil, coach house, portakabin, cottage, etc. The word lodge is defined as ‘a small building for housing coaches and carriages and other vehicles’ (see Figure 8.8). Searches using questions in natural language are the same as other searches with combinations of two or more words, because what the dictionary does is to extract the question keywords to look for the related words, which it then retrieves. In the case of the question What is a big building? the dictionary looks for words that are related to big and building. The dictionary also detects some more complex elements in a question, such as negations placed immediately before a word. For instance, if we introduce the query country which has sea in the OneLook Reverse Dictionary, the first result is seaside (a place by the sea). In contrast, if we introduce country which has no sea, the first result is landlocked (almost or entirely surrounded by land). However, if the negation does not appear immediately before the keyword, the system does not understand what the user wants to ask. For example, if we introduce the query country that does not have sea, the retrieved words are short or solid, which are not related to the query. Thus, this system clearly does not understand complex questions.
Figure 8.8 Queries in natural language in the OneLook Reverse Dictionary.
4.1.6 Filters Filters are used to add a search restriction to the query and may take the form of, for example, a part of speech restriction, a thematic field restriction or a language restriction (in bilingual or multilingual dictionaries). Filters can be combined with an exact word, a partial word or an approximate word. Some dictionaries allow the user to refine the search by searching inside the search result from a previous search (refine search). Below we provide two examples of filters.
4.1.6.1 Part of speech filter An example of a part of speech filter is to look for a word only in its noun form. The Cercaterm allows the user to restrict queries to nouns, adjectives, verbs, adverbs, etc., as we can see in Figure 8.9.
Figure 8.9 Use of part of speech filters in Cercaterm.
4.1.6.2 Thematic area filter An example of a thematic area filter is found in the Cercaterm dictionary (see Figure 8.10). We introduce the word puente and we restrict the search with the thematic area filter Electrónica ‘electronics’. The result is a list of words or expressions in Catalan in which puente is a type of electronic connection (pont ‘jumper’, establir un pont ‘to jumper’, pont de Schering ‘Schering bridge’, etc.). If we introduce the same query puente and restrict the search with the thematic area filter Transportes ‘transport’, the result is a list of words in which puente is a type of building, a bridge (pont ‘bridge’, aproximació a un pas a nivell o a un pont mòbil ‘warning sign for grade crossing or a bascule or travelling bridge’, etc.).
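A filter can be modelled as an optional condition that is applied on top of the query expression. The Python sketch below illustrates this with a part of speech filter and a thematic area filter; the `Entry` record, its field names and the sample data are hypothetical simplifications, not the data model of Cercaterm.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    headword: str
    pos: str          # part of speech
    domain: str       # thematic area
    equivalent: str   # equivalent in the target language

# Hypothetical entries, for illustration only.
ENTRIES = [
    Entry("puente", "noun", "electronics", "pont (jumper)"),
    Entry("puente", "noun", "transport", "pont (bridge)"),
    Entry("play", "noun", "general", "(noun entry)"),
    Entry("play", "verb", "general", "(verb entry)"),
]

def search(query, entries, pos=None, domain=None):
    """Exact-word search, optionally restricted by filters."""
    return [e for e in entries
            if e.headword == query
            and (pos is None or e.pos == pos)
            and (domain is None or e.domain == domain)]

for entry in search("puente", ENTRIES, domain="electronics"):
    print(entry.equivalent)
```

Leaving a filter unset (`None`) returns the unrestricted result, so the same function covers both the filtered and the unfiltered search.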
Figure 8.10 Use of thematic area filters in Cercaterm.
4.2 The second element: The resource In an electronic dictionary the information is structured in sections or fields. Each of these sections can be queried. This feature of electronic dictionaries differs from paper dictionaries, in which information is alphabetically ordered and the only option available is to search alphabetically for a word in the list. Electronic dictionaries allow other types of search techniques: in the dictionary entries, in the content fields (definitions, examples, relations, forums or corpora), in a thematic index and in external links (to a search engine or other dictionaries). Broadly speaking, our classification divides the resource or specific sections that contain searchable information into four types: (1) entry field, (2) content field, (3) thematic field index and (4) external links access field.
4.2.1 Search in the entry field We understand entry field to mean all the headwords used to head and list the dictionary entries. The use of any query (an exact word, a partial word, etc.) in this field will allow the user to access all the entries whose headword coincides with the query introduced. A search in the entry field can be used to obtain information about a word that is found in such entries, for example, a definition, grammatical information, a usage example. A single entry can also contain several sub-entries, where each sub-entry has its own entry field. For example, an entry may contain the main entry (that may contain spelling variants, derived words and phrases plus inflections of all types of headwords), as well as one or more sub-entries containing idioms and phrasal verbs.
A word can be looked up in alphabetical order in the entries. For example, in the CED, the ‘browse’ search option orders the list of dictionary entries starting with the word or sequence of characters introduced by the user. If we introduce the sequence hous, the list starts with the word house, which is the first word in the dictionary that starts with hous (see Figure 8.11). Some dictionaries allow the list of entries to be ordered inversely, starting with the last letter introduced. For example, as we can see in Figure 8.12, if we introduce the sequence of characters cción with the option ‘diccionario inverso’ in the DRAE, the list of entries begins with the word acción ‘action’, the first word in alphabetical order ending in cción, and is followed by others such as redacción ‘writing’, reacción ‘reaction’, facción ‘faction’, etc.
Figure 8.11 Search in the alphabetical list of entries in CED.
Figure 8.12 Search in the inverse alphabetical list of entries in the DRAE.
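Both kinds of browsing can be sketched in Python. The alphabetical browse locates the first headword at or after the sequence introduced; the inverse browse sorts words by their reversed spelling, which is why acción precedes redacción, reacción and facción. The function names and word lists are hypothetical, and the sketch sorts by plain character codes rather than by the alphabetical conventions of a particular language.

```python
from bisect import bisect_left

def browse(sequence, headwords, size=5):
    """Alphabetical list starting at the first headword >= sequence."""
    ordered = sorted(headwords)
    start = bisect_left(ordered, sequence)
    return ordered[start:start + size]

def inverse_browse(ending, headwords):
    """Headwords with the given ending, ordered from the last letter."""
    matches = [w for w in headwords if w.endswith(ending)]
    return sorted(matches, key=lambda w: w[::-1])

print(browse("hous", ["hover", "hour", "household", "house", "housing"], 2))
print(inverse_browse("ction", ["faction", "reaction", "action",
                               "redaction", "nation"]))
```

The English words in the second call mirror the Spanish example: they come out in the order action, redaction, reaction, faction.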
4.2.2 Search in the content field Content fields include information in text format in each entry. The information can vary: a definition, examples, lexical or semantic relations, comments from a forum, and corpus concordances. The user can search with queries in these content fields to find entries whose content coincides with the query introduced. Below we provide examples of searches in the content fields mentioned.
4.2.2.1 Search in the definition fields In this example, we look for the words fruit and yellow in the definition fields of the Wordsmyth dictionary. The dictionary retrieves the following list of words: agrimony, apple, apricot, cherry, chinaberry, citron, Golden Delicious, grapefruit, Japanese quince, jujube and lemon. Figure 8.13 shows, to the right of these words, the definitions in which the words fruit and yellow appear. The CED, Diccionari de la llengua catalana, Dirae, DRAE, DUE, Grand dictionnaire terminologique, OED, OneLook Reverse Dictionary, TERMIUM Plus and WordReference also provide this search technique option.
Figure 8.13 Search in the definition fields of the Wordsmyth dictionary.
4.2.2.2 Search in the relations fields Some dictionaries incorporate information on paradigmatic relations (e.g. synonymy, antonymy, hyponymy) or syntagmatic relationships (e.g. collocations) among their words or terms. These relation fields include words or expressions with a lexical or semantic relation to a dictionary entry. In many cases, this information can be accessed by navigating the dictionary’s hyperlinks or by a direct search using keywords. In the search by navigation, the user accesses synonyms of a word that are linked from within an entry, and which can lead to other synonyms. In the direct search with keywords, the dictionary searches for the query word or words in the synonymy fields and retrieves a list of entries that include that synonym in their synonymy fields. Below we include examples of both search techniques. Examples of a search in the relations field by navigation come from the WordNet dictionary, Merriam-Webster monolingual English dictionary and OncoTerm. The WordNet dictionary entries include semantically related words, such as synonyms, hypernyms and hyponyms. As we can see in Figure 8.14, the entry for the word transport accesses the hyponyms air transport, navigation, hauling or trucking, etc. An example of direct search with keywords in the semantic relations field can be found in the Wordsmyth dictionary. Figure 8.15 shows a search for entries that include the word agreement among their synonyms. The dictionary retrieves a list of twenty-seven synonyms for agreement, including accession, accord, alliance, etc. Other dictionaries also allow searches in the semantic relations fields. The WordReference and Ultralingua dictionaries contain a synonym search option. The OneLook Reverse Dictionary generates lists of words related to the query. Some dictionaries allow searches for collocations (Dicouèbe, DiCoInfo, DiCoEnviro, DAD, DiCE). In the Computer Science dictionary, DiCoInfo, we introduce the verb envoyer in French and search in the ‘lien lexical’ (lexical relation) field. The result is a list of collocations of the verb envoyer. Figure 8.16 shows that the results include the collocation envoyer un courriel. The user can access the entries through the hyperlink of the collocate.
Figure 8.14 Search in the semantic relations field by navigation in the WordNet dictionary.
4.2.2.3 Search in a complementary forum Some dictionaries incorporate forums in their content fields. In these forums the user can ask questions related to the entry or consult the answers to previous questions asked by other users. These forums can be useful when the information included in the entry does not satisfy the user’s queries. For example, in the WordReference dictionary, the entries for agreement and contract include a comment from the forum contract for agreement in which the difference between a contract and an agreement is explained (see Figure 8.17).
Figure 8.15 Direct search in the semantic relations in the Wordsmyth dictionary.
Figure 8.16 Search in the lexical relations field in the DiCoInfo dictionary.
Figure 8.17 Search in a complementary forum in the WordReference dictionary.
Forums can be useful for both dictionary users and creators, since the latter can use the users’ queries and answers to improve the dictionary. This way the forum works as a control and evaluation mechanism and allows creators to adapt the dictionary to the users’ needs. However, this search technique has to be used with caution: answers in a forum are not always controlled, and their reliability should be verified.
4.2.2.4 Search in a complementary corpus Some dictionaries incorporate links to a corpus in the entry content, in which the user can access concordances for each dictionary entry. An example of this search technique can be found in Just The Word. Figure 8.18 shows this database entry for the term employee. It provides combinations of the word in a corpus. The user can read each concordance and also access the complete text in which the concordance appears.
Figure 8.18 Search in a complementary corpus in the Just The Word dictionary.
4.2.3 Search in the thematic field index A thematic field index is a list of hierarchically ordered areas, in which the user can navigate and select the item they want to consult. This field is frequently used to show a map of thematic areas. The dictionary entries, which can be words, but may also be images, are classified in these thematic areas. There are two types of search in the thematic field index. In the search by navigation the user scrolls down the hierarchical structure of thematic areas. In the direct search the user introduces a keyword in the dictionary that corresponds to a thematic area. Examples of both types of search are given below. In the first example (see Figure 8.19) we search by navigation in the DRAE thematic field index. We scroll down the profesiones y disciplinas ‘professions and disciplines’ hierarchical structure, then enter ciencia y técnica ‘science and technology’, matemáticas ‘mathematics’ and finally álgebra ‘algebra’. The result is a list of dictionary entries classified within the selected thematic area that includes the terms binomio ‘binomial’, cociente ‘quotient’, coeficiente ‘coefficient’, combinación ‘permutation’, etc. In the following example we perform a direct search with keywords in the thematic field index of the Merriam-Webster Visual Dictionary. We introduce the keyword flower. The dictionary suggests the thematic areas flower and flowering. If we select the area flower, we access entries with images related to this thematic area, such as pleasure garden, examples of flowers, structure of a flower and structure of a plant.
Figure 8.19 Search in the DRAE thematic field index.
4.2.4 Search in external links access field This type of field offers links to resources external to the dictionary. Some dictionaries are linked to web search engines and other dictionaries. For example, in the WordReference dictionary, the ‘in context’ option searches in Google for the query introduced. In addition, the ‘images’ option will search for the query in Google images. Our example of a search in links to external dictionaries comes from the OneLook Reverse Dictionary. Figure 8.20 shows that the entry for the word hall has links to other general online dictionaries, such as the Merriam Webster or the Oxford Dictionaries, as well as other specialized online dictionaries in the fields of arts, economics, computer science, medicine, etc.
4.3 The third element: The result The result of a search is the information the user obtains after querying a dictionary. The result of a dictionary search is usually the entry with information about a word (meaning, grammatical information, pronunciation, etymology, use in context, equivalences, collocations and related words, etc.). In other cases, the result is a word or a list of words that corresponds to an entry in the dictionary. Finally, the result may also be an image or list of images, and the results may include audio files. The retrieval of these results depends on the type of dictionary and the options incorporated in it. Below we describe what we consider to be the most innovative results as implemented in electronic dictionaries.
Figure 8.20 Search in the external links access field of the OneLook Reverse Dictionary.
4.3.1 Context(s) We start by explaining the result where the user obtains information about the use in context of a word. Some dictionaries include contextual information about words in their entry content fields, for example, FrameNet. Figure 8.21 shows the entry for the verb play in the frame ‘Performers_and_roles’. Each frame element has a different colour, which allows the user to identify every frame element in a context. The entry also includes information about the syntactic patterns of frame elements. The user can also obtain contextual information about the dictionary entries by accessing the concordances of a complementary corpus, for example, in the Just The Word.
Figure 8.21 Result of information about use in context in FrameNet.
4.3.2 Collocations and related words The result of a dictionary search may be an entry with information about paradigmatically related words (e.g. synonyms, antonyms, etc.) and/or syntagmatically related words (collocations). In the French version of the DiCoInfo, when a lexical unit is the base for collocations expressing different meanings, these collocations are grouped according to their meaning. For example, the lexical unit fichier ‘file’ is the base for two collocations which express the idea of ‘to delete’ a file, supprimer un fichier and effacer un fichier, so these two collocations appear together. In the DiCE the ayuda a la redacción ‘help with text production’ search option allows the user to retrieve the collocates that express the idea of ‘true love’: amor acendrado, verdadero, único.
4.3.3 Graphical representation of relations between words In some dictionaries paradigmatic and syntagmatic relations are presented in graphs or tables. In EcoLexicon results are presented in a graph, for example, drought ‘attribute of’ dryness, ‘type of’ hydrological drought, ‘affects’ temperature, precipitation, area of land, ephemeral lake, flow and equivalents in several languages (see Figure 8.22).
Figure 8.22 Graphical representation of relations in EcoLexicon.
4.3.4 Word or list of words Another result obtained from the dictionary is a word or list of words. Some search techniques, such as the search in content fields or the search in a thematic field index, can generate a list of words. For example, if we search the query vehicle in the definition field of the Wordsmyth dictionary, it retrieves a list of words that contain the query in the definition fields of their entries, such as aircraft, airflow, ambulance, aquaplane, ATV, automobile, etc. In addition, if we search for the thematic area Lógica ‘Logic’ in the DRAE thematic field index (profesiones y disciplinas ‘Professions and disciplines’ → Filosofía ‘Philosophy’ → Lógica ‘Logic’), the result we obtain is a list of words that are classified within this field, such as a contrariis ‘e contrario’, ad hóminem ‘ad hominem’, antecedente ‘antecedent’, a pari ‘a pari’, apodíctico ‘apodictic’, argumento ‘argument’, etc. It is worth mentioning that in most of the dictionaries that we have analysed, lists of words are displayed in alphabetical order. Yet in Dirae, for example, results may be displayed in alphabetical order, in order of relevance according to the user’s search, in order of frequency in a corpus, or according to word length. It would be a valuable aid to the user if all the dictionaries displayed search results in order of relevance according to the user’s search. For example, if the user searches for agency, the resulting list of words should not be displayed in the order in which they appear in the dictionary, but in order of relevance, so that agency appears in the search result list before adoption agency.
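One simple way of ordering results by relevance rather than alphabetically is sketched below in Python. The ranking criteria (exact match first, then headwords beginning with the query, then headwords containing it as a separate word) are an assumption made for illustration; they are not the criteria used by Dirae or by any other dictionary analysed, and the function names are hypothetical.

```python
def relevance_key(query, headword):
    """Sort key: lower rank means more relevant; ties are alphabetical."""
    q, h = query.lower(), headword.lower()
    if h == q:
        rank = 0                    # exact match
    elif h.startswith(q):
        rank = 1                    # query at the start of the headword
    elif q in h.split():
        rank = 2                    # query as a separate word elsewhere
    else:
        rank = 3                    # anything else
    return (rank, h)

def order_by_relevance(query, results):
    return sorted(results, key=lambda h: relevance_key(query, h))

results = ["adoption agency", "agency", "agency shop", "travel agency"]
print(order_by_relevance("agency", results))
```

With this ordering agency is listed before adoption agency, whereas a purely alphabetical list places adoption agency first.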
4.3.5 Image or list of images Some visual dictionaries retrieve images classified in a thematic field index. For example, in the Merriam-Webster Visual Dictionary, if we navigate in the thematic field (‘themes’) in Plants & gardening → Plants → Flower → Structure of a flower, the dictionary retrieves the image of a flower with the names of all its parts. Illustrations are present in some entries in EcoLexicon, and in Collins Dictionary photos are embedded from Flickr.
4.3.6 Audio files It is becoming common among electronic dictionaries to include audio files in the entries of some terms. In most dictionaries these audio files provide the pronunciation of words. Other types of audio file are also available in some electronic dictionaries. Some audio files help the user to understand the meaning of a word, in the same way as a definition or an image can do. For example, in the Macmillan English Dictionary some entries include audio files that clarify the meaning of a word with a sound. In the entry of the noun applause, we can hear the sound of people applauding, thus helping the user to understand the meaning of that word.
5 Summary table and conclusion In the analysis presented here, we have shown how each electronic dictionary uses a different nomenclature to refer to the types of search it offers. Moreover, in some cases the same name is used in different dictionaries to refer to different searches. This inevitably leads to problems when comparing dictionaries or teaching students how to use them. Neither does this situation encourage communication between lexicographers, terminographers and computer technicians when collaborating to develop a new dictionary, nor assist faculty in teaching future lexicographers, terminographers, philologists or translators to use a standard classification that is valid for all the search techniques. We think that a homogeneous classification and nomenclature such as the one we have presented here would be useful to evaluate electronic dictionaries, assess their functions, teach how to use them or design new dictionaries.
Table 8.1 Summary of our classification of search techniques for electronic dictionaries.
Researching the Use of Electronic Dictionaries
From the review of the literature and the analysis of a set of electronic dictionaries, we have synthesized the search techniques in electronic dictionaries into three elements: the query, the resource, and the result. These three elements embody all the search possibilities we have observed in electronic dictionaries. In addition, this structure is flexible, in that new elements can be incorporated if needed. In a first analysis of fifteen electronic dictionaries, we proposed a classification of search techniques in electronic dictionaries; we then increased the number of dictionaries analysed to thirty-two and our classification still proved useful. Moreover, we have noticed that dictionaries are adding innovative features that we have included in our classification, for example, new results such as collocations and related words, and the graphical representation of relations. With this classification, we aim to contribute to solving the problems related to nomenclatures, types and subtypes of searches that we have found in each particular dictionary.
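The three-element structure described above (query, resource, result) can be sketched as a small data model. This is only an illustrative sketch, not the authors' implementation: the class and function names are hypothetical, the subtypes listed are a partial selection taken from the section headings and the Annex 1 column names, and the assignment of the two option names in the usage note to the same technique is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Query(Enum):
    """What the user types (partial, illustrative list)."""
    EXACT_WORD = "exact word"
    PARTIAL_WORD = "partial word"
    INFLECTED_FORM = "inflected form"
    ANAGRAM = "anagram"


class Resource(Enum):
    """Where the dictionary looks (partial, illustrative list)."""
    ENTRY_LIST = "alphabetical entry list"
    ENTRY_FIELDS = "definitions or other fields"
    THEMATIC_INDEX = "thematic field index"
    EXTERNAL_LINKS = "external links"


class Result(Enum):
    """What the dictionary returns (partial, illustrative list)."""
    ENTRY = "entry"
    IMAGE = "image or list of images"
    AUDIO = "audio file"
    COLLOCATIONS = "collocations"
    RELATED_WORDS = "related words"


@dataclass(frozen=True)
class SearchTechnique:
    """One search technique described by the three elements."""
    query: Query
    resource: Resource
    result: Result

    def label(self) -> str:
        return f"{self.query.value} -> {self.resource.value} -> {self.result.value}"


def normalise(local_names: dict[str, SearchTechnique]) -> dict[SearchTechnique, list[str]]:
    """Group dictionary-specific search names under the shared classification."""
    grouped: dict[SearchTechnique, list[str]] = {}
    for name, technique in local_names.items():
        grouped.setdefault(technique, []).append(name)
    return grouped
```

For instance, if two dictionaries label a search on the beginning of a word ‘Terme commençant par’ and ‘Autocomplete feature’ respectively, mapping both names to `SearchTechnique(Query.PARTIAL_WORD, Resource.ENTRY_LIST, Result.ENTRY)` and passing the mapping to `normalise` places them under a single heading, which is the kind of comparison a homogeneous nomenclature is meant to support.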
Acknowledgements
We would like to thank the authors and editors of the following dictionaries for granting us permission to reproduce screenshots of them: Cercaterm, Collins English Dictionary (CED), Diccionario de la lengua española (DRAE), DiCoInfo, EcoLexicon, FrameNet, OneLook Reverse Dictionary, TERMIUM Plus, Ultralingua, WordNet, WordReference, Wordsmyth. This research is part of the ONTODIC Project: Methodology and technologies for the elaboration of onomasiological dictionaries based on ontologies. Terminological resources for e-translation, TSI2006-01911, and the ONTODIC II Project: Methodology and techniques for the elaboration of collocations dictionaries based on ontologies. Terminological resources for e-translation, TIN2009-07690, both funded by the Spanish Government.
Notes
1 This chapter is a revised and expanded version of our analysis which appeared in an article published in Volume 23 (3) of the International Journal of Lexicography in 2010, reproduced here with kind permission of Oxford University Press.
2 For more information on the Explanatory Combinatorial Lexicology (ECL), see Mel’čuk et al. (1995).
3 See L’Homme (2008), Jousse et al. (2011).
4 See Alonso et al. (2010), Vincze et al. (2011).
5 For more information on Frame-Based Terminology and EcoLexicon, see Faber et al. (2005, 2006), Tercedor and López-Rodríguez (2008), Faber et al. (2009), Prieto-Velasco and López-Rodríguez (2009), Faber (2010).
6 For more information on semantic frames and FrameNet, see Fillmore (1985), Fillmore et al. (1998, 2002, 2003), Ruppenhofer et al. (2006).
7 See López-Rodríguez et al. (2006).
8 For more information on WordNet, see Fellbaum (1998), Miller (1998a, 1998b).
9 Reproduced from Collins Electronic English Dictionary & Thesaurus with the permission of HarperCollins Publishers Ltd. © HarperCollins Publishers 1992.
10 Source: Translation Bureau, Government of Canada’s Terminology and Linguistic Databank TERMIUM Plus® (www.btb.termiumplus.gc.ca), Government of Canada, 2012. Reproduced with the permission of the Minister of Public Works and Government Services Canada, 2012.
References
Electronic dictionaries
Cercaterm, Centre de Terminologia TERMCAT. http://www.termcat.cat/ [accessed 7 August 2020].
Collins Dictionary, http://www.collinsdictionary.com/ [accessed 7 August 2020].
Collins English Dictionary, Electronic edition. Version 1.5, HarperCollins Publishers (CD-ROM).
Diccionario de colocaciones del español (DiCE), Grupo DiCE, Universidade da Coruña. http://www.dicesp.com/ [accessed 7 August 2020].
Diccionari de la llengua catalana, Second edition, Institut d’Estudis Catalans. http://dlc.iec.cat/ [accessed 7 August 2020].
Diccionario de la lengua española, Electronic edition. Version 21.1.0., Real Academia Española. Madrid, Espasa Calpe (CD-ROM).
Diccionario de la lengua española, Real Academia Española. http://rae.es [accessed 7 August 2020].
Diccionario de uso del español, Moliner, M., Madrid, Gredos (CD-ROM).
DiCoEnviro (Le dictionnaire fondamental de l’environnement), Observatoire de Linguistique Sens-Texte (OLST), Université de Montréal. http://olst.ling.umontreal.ca/cgi-bin/dicoenviro/search-enviro.cgi?ui=es [accessed 7 August 2020].
DiCoInfo (Dictionnaire fondamental de l’informatique et de l’Internet), Observatoire de Linguistique Sens-Texte, Université de Montréal. http://olst.ling.umontreal.ca/dicoinfo/ [accessed 7 August 2020].
Dicouèbe: Dictionnaire en ligne de combinatoire du français, Observatoire de Linguistique Sens-Texte (OLST), Université de Montréal. http://olst.ling.umontreal.ca/dicouebe/ [accessed 7 August 2020].
Dictionnaire analytique de la distribution, J. Dancette. http://olst.ling.umontreal.ca/dad/ [accessed 7 August 2020].
Dictionnaire d’Apprentissage du Français des Affaires (DAFA), GRELEP. http://www.projetdafa.net/ [accessed 16 July 2020].
Dirae, G. Rodríguez Alberich and Real Academia Española. http://dirae.es/ [accessed 7 August 2020].
EcoLexicon, LexiCon Research Group, Universidad de Granada. http://ecoLexicon.ugr.es/en/ [accessed 7 August 2020].
FrameNet, International Computer Science Institute. http://framenet.icsi.berkeley.edu/ [accessed 7 August 2020].
Grand dictionnaire terminologique, Office québécois de la langue française, Gouvernement du Québec. http://www.oqlf.gouv.qc.ca/ressources/gdt.html [accessed 7 August 2020].
IATE (InterActive Terminology for Europe), European Communities. http://iate.europa.eu [accessed 16 July 2020].
Just the Word, Sharp Laboratories of Europe. http://www.just-the-word.com/ [accessed 7 August 2020].
Le Grand Robert de la langue française, Electronic edition. Version 2.0 (CD-ROM).
Le Trésor de la Langue Française informatisé, Atilf. http://atilf.atilf.fr/tlf.htm [accessed 7 August 2020].
Macmillan English Dictionary, Macmillan Publishers Limited. http://www.macmillandictionary.com/ [accessed 7 August 2020].
Merriam-Webster Online, Merriam-Webster, Incorporated. http://www.merriam-webster.com/ [accessed 7 August 2020].
OncoTerm: Sistema Bilingüe de Información y Recursos Oncológicos, Grupo de Investigación OncoTerm, Universidad de Granada. http://www.ugr.es/~oncoterm/ [accessed 7 August 2020].
OneLook Reverse Dictionary, http://www.onelook.com/reverse-dictionary.shtml [accessed 16 July 2020].
Oxford English Dictionary, Electronic edition. 2nd Edition. Version 1.00, Oxford University Press (CD-ROM).
TERMIUM Plus, Government of Canada. http://www.btb.termiumplus.gc.ca [accessed 7 August 2020].
Ultralingua, http://ultralingua.com/onlinedictionary/index.html [accessed 7 August 2020].
WordNet 3.0, Princeton University. http://wordnet.princeton.edu [accessed 7 August 2020].
WordReference Online Language Dictionaries, http://wordreference.com/ [accessed 7 August 2020].
Wordsmyth, http://www.wordsmyth.net/ [accessed 7 August 2020].
Other references
Abate, F.R. (1985), ‘Dictionaries past and future: Issues and prospects’, Dictionaries 7, 270–83.
Alcina, A. (2009), ‘Metodología y tecnologías para la elaboración de diccionarios terminológicos onomasiológicos’ in A. Alcina, E. Valero and E. Rambla (eds), Terminología y sociedad del conocimiento, Bern: Peter Lang, 33–58.
Alonso, M., A. Nishikawa and O. Vincze (2010), ‘DiCE in the web: An online Spanish collocation dictionary’ in S. Granger and M. Paquot (eds), eLexicography in the 21st Century: New Challenges, New Applications. Proceedings of eLex 2009 (Cahiers du CENTAL 7), Louvain-la-Neuve: Presses Universitaires de Louvain, 364–74.
Atkins, B. T. S. (ed.) (1998), Using Dictionaries. Studies of Dictionary Use by Language Learners and Translators (Lexicographica. Series Maior 88), Tübingen: Max Niemeyer.
Atkins, B. T. S. and K. Varantola (1998), ‘Monitoring dictionary use’ in B. T. S. Atkins (ed.), 83–122.
Béjoint, H. (1981), ‘The foreign student’s use of monolingual English dictionaries: A study of language needs and reference skills’, Applied Linguistics 2 (3), 207–22.
Béjoint, H. (1989), ‘The teaching of dictionary use: Present state and future tasks’ in F. J. Hausmann et al. (eds), Vol. 1, 208–15.
Bogaards, P. (1988), ‘À propos de l’usage du dictionnaire de langue étrangère’, Cahiers de Lexicologie 52 (1), 131–52.
Bowker, L. (1998), ‘Using specialized monolingual native-language corpora as a translation resource: A pilot study’, Meta 43 (4), 631–51.
Carr, M. (1997), ‘Internet dictionaries and lexicography’, International Journal of Lexicography 10 (3), 209–30.
Castagnoli, S. (2008), ‘Corpus et bases de données terminologiques: l’interpretation au service des usagers’ in F. Maniez, P. Dury, N. Arlin and C. Rougemont (eds), Corpus et dictionnaires de langues de spécialité, Bresson: Presses Universitaires de Grenoble, 213–30.
Colominas, C. (2004), ‘Los corpora como herramientas de traducción’ in E. Ortega (ed.), Panorama actual de la investigación en Traducción e Interpretación, Granada: Atrio, 362–72.
Corpas, G., J. Leiva and M.J. Varela (2001), ‘El papel del diccionario en la formación de traductores e intérpretes: análisis de necesidades y encuestas de uso’ in M. Ayala (ed.), Diccionarios y enseñanza, Alcalá de Henares: Servicio de Publicaciones de la Universidad de Alcalá, 239–73.
Corris, M., C.D. Manning, S. Poetsch and J. Simpson (2000), ‘Bilingual dictionaries for Australian languages: user studies on the place of paper and electronic dictionaries’ in U. Heid et al. (eds), 169–81.
Cowie, A.P. (1999), English Dictionaries for Foreign Learners – A History, Oxford: Clarendon Press.
Church, K.W. (2008), ‘Approximate lexicography and web search’, International Journal of Lexicography 21 (3), 325–36.
de Schryver, G-M. (2003), ‘Lexicographers’ dreams in the electronic-dictionary age’, International Journal of Lexicography 16 (2), 143–99.
Dodd, W.S. (1989), ‘Lexicomputing and the dictionary of the future’ in G. James (ed.), Lexicographers and Their Works (Exeter Linguistic Studies 14), Exeter: Exeter University Press, 83–93.
Faber, P. (2010), ‘Terminología, traducción especializada y adquisición de conocimiento’ in E. Alarcón (ed.), La traducción en contextos especializados. Propuestas didácticas, Granada: Atrio, 87–96.
Faber, P., P. León-Araúz, J.A. Prieto-Velasco and A. Reimerink (2007), ‘Linking images and words: The description of specialized concepts’, International Journal of Lexicography 20 (1), 39–65.
Faber, P., P. León-Araúz and J.A. Prieto-Velasco (2009), ‘Semantic relations, dynamicity, and terminological knowledge bases’, Current Issues in Language Studies 1 (1), 1–23.
Faber, P., C. Márquez-Linares and M. Vega-Expósito (2005), ‘Framing terminology: A process-oriented approach’, Meta 50 (4). Available at http://id.erudit.org/iderudit/019916ar.
Faber, P., S. Montero, M.R. Castro-Prieto, J. Senso-Ruiz, J.A. Prieto-Velasco, P. León-Araúz, C. Márquez-Linares and M. Vega-Expósito (2006), ‘Process-oriented terminology management in the domain of coastal engineering’, Terminology 12 (2), 189–213.
Fellbaum, C. (1998), WordNet: An Electronic Lexical Database, Cambridge, MA: MIT Press.
Fernández-Pampillón, A. and M. Matesanz (2003), ‘Los diccionarios electrónicos: hacia un nuevo concepto de diccionario’ in C. López and A. Séré (eds), Nuevos géneros discursivos: los textos electrónicos, Madrid: Biblioteca nueva, 137–58.
Fillmore, C.J. (1985), ‘Frames and the semantics of understanding’, Quaderni di Semantica 6, 222–53.
Fillmore, C.J. and B.T.S. Atkins (1998), ‘FrameNet and lexicographic relevance’ in A. Rubio, N. Gallardo, R. Castro and A. Tejada (eds), Proceedings of the ELRA Conference on Linguistic Resources. Granada, 28–30 May 1998, 417–23.
Fillmore, C.J., C.F. Baker and H. Sato (2002), ‘The FrameNet database and software tools’ in M. G. Rodríguez and C. P. S. Araujo (eds), Proceedings of the Third International Conference on Language Resources and Evaluation (LREC), Las Palmas de Gran Canaria, 29–31 May 2002, 1157–60.
Fillmore, C.J., C.R. Johnson and M.R.L. Petruck (2003), ‘Background to FrameNet’, International Journal of Lexicography 16 (3), Special Issue on Frame Semantics, 235–50.
Forget, N. (1999), Les dictionnaires électroniques dans l’optique de la traduction, MA Dissertation, University of Ottawa.
Geeraerts, D. (2000), ‘Adding electronic value. The electronic version of the Grote Van Dale’ in U. Heid et al. (eds), 75–84.
Gómez González-Jover, A. (2005), Terminografía, lenguajes profesionales y mediación interlingüística. Aplicación metodológica al léxico especializado del sector industrial del calzado y de las industrias afines, PhD Thesis, Universidad de Alicante.
Gross, G. (1997), ‘La grammaire, les dictionnaires et l’informatique’ in J. Pruvost (ed.), Les dictionnaires de la langue française et l’informatique. Actes du colloque la Journée des dictionnaires, Cergy-Pontoise: Centre de Recherche Texte-Histoire, 55–64.
Hamon, T. and A. Nazarenko (2001), ‘Detection of synonymy links between terms’ in D. Bourigault, C. Jacquemin and M.-C. L’Homme (eds), Recent Advances in Computational Terminology, Amsterdam and Philadelphia: John Benjamins, 185–208.
Harley, A. (2000), ‘Cambridge dictionaries online’ in U. Heid et al. (eds), 85–8.
Hartmann, R.R.K. (1999), ‘Thematic report 2. Case study: The Exeter University survey of dictionary use’ in R.R.K. Hartmann (ed.), Dictionaries in Language Learning. Recommendations, National Reports and Thematic Reports from the TNP Sub-Project 9: Dictionaries, Berlin: Freie Universität Berlin, 36–52.
Hausmann, F-J., O. Reichmann, H.E. Wiegand and L. Zgusta (eds) (1989–91), Wörterbücher/Dictionaries/Dictionnaires, Ein internationales Handbuch zur Lexikographie/An International Encyclopedia of Lexicography/Encyclopédie internationale de lexicographie, vols 1–3, Berlin: Walter de Gruyter.
Heid, U., S. Evert, E. Lehmann and C. Rohrer (eds) (2000), Proceedings of the Ninth Euralex International Congress, EURALEX 2000, Stuttgart, Germany, 8–12 August 2000, Universität Stuttgart: Institut für Maschinelle Sprachverarbeitung.
Hulstijn, J.H. and B.T.S. Atkins (1998), ‘Empirical research on dictionary use in foreign-language learning: survey and discussion’ in B. T. S. Atkins (ed.), 7–19.
Ide, K. (1993), ‘A catalogue of electronic dictionaries’, Language 22 (5), 42–9.
Jacquet-Pfau, C. (2002), ‘Les dictionnaires du français sur cédérom’, International Journal of Lexicography 15 (1), 89–104.
Jousse, A.-L., M-C. L’Homme, P. Leroyer and B. Robichaud (2011), ‘Presenting collocates in a dictionary of computing and the Internet according to user needs’ in I. Boguslavsky and L. Wanner (eds), Proceedings of the 5th International Conference on Meaning-Text Theory. Barcelona, 8–9 September 2011, 134–44.
Kaalep, H.-J. and J. Mikk (2008), ‘Creating specialised dictionaries for foreign language learners: A case study’, International Journal of Lexicography 21 (4), 369–94.
Kay, M. (1984), ‘The dictionary server’ in Proceedings of the Tenth International Conference on Computational Linguistics (COLING-84). Stanford, 2–6 July 1984, 461.
Knowles, F.E. (1990), ‘The computer in lexicography’ in F-J. Hausmann et al. (eds), Vol. 2, 1645–72.
Kussmaul, P. (1995), Training the Translator, Amsterdam and Philadelphia: John Benjamins.
L’Homme, M.-C. (2008), ‘Le DiCoInfo. Méthodologie pour une nouvelle génération de dictionnaires spécialisés’, Traduire 217, 78–103.
Lehr, A. (1996), ‘Electronic dictionaries’, Lexicographica 12, 310–17.
Lew, R. (2011), ‘Online dictionaries of English’ in P.A. Fuertes-Olivera and H. Bergenholtz (eds), e-Lexicography. The Internet, Digital Initiatives and Lexicography, London: Continuum, 230–50.
Lew, R., R. Kaźmierczak, E. Tomczak and M. Leszkowicz (2018), ‘Competition of definition and pictorial illustration for dictionary users’ attention: An eye-tracking study’, International Journal of Lexicography 31 (1), 53–77.
López-Rodríguez, C.I., P. Faber and M. Tercedor (2006), ‘Terminología basada en el conocimiento para la traducción y la divulgación médicas: el caso de Oncoterm’, Panacea VII (24), 228–40.
Mackintosh, K. (1998), ‘An empirical study of dictionary use in L2-L1 translation’ in B.T.S. Atkins (ed.), 123–49.
McCreary, D.R. and F. Dolezal (1999), ‘A study of dictionary use by ESL students in an American university’, International Journal of Lexicography 12 (2), 107–46.
Mel’čuk, I.A., A. Clas and A. Polguère (1995), Introduction à la lexicologie explicative et combinatoire, Brussels: Duculot.
Meyer, I. (1988), ‘The general bilingual dictionary as a working tool in thème’, Meta 33 (3), 368–76.
Miller, G.A. (1998a), ‘Foreword by George A. Miller’ in C. Fellbaum (ed.), WordNet: An Electronic Lexical Database, Cambridge, MA: MIT Press, xv–xxii.
Miller, G.A. (1998b), ‘Nouns in WordNet’ in C. Fellbaum (ed.), WordNet: An Electronic Lexical Database, Cambridge, MA: MIT Press, 23–46.
Montero, S. and P. Faber (2008), Terminología para traductores e intérpretes, Granada: Tragacanto.
Nesi, H. (1998), Dictionaries on Computer: How Different Markets Have Created Different Products, University of Warwick.
Nesi, H. (1999), ‘A user’s guide to electronic dictionaries for language learners’, International Journal of Lexicography 12 (1), 55–66.
Nesi, H. (2000a), The Use and Abuse of EFL Dictionaries. How Learners of English as a Foreign Language Read and Interpret Dictionary Entries (Lexicographica. Series Maior 98), Tübingen: Max Niemeyer.
Nesi, H. (2000b), ‘Electronic dictionaries in second language vocabulary comprehension and acquisition’ in U. Heid et al. (eds), 839–47.
Nesi, H. and R. Haill (2002), ‘A study of dictionary use by international students at a British university’, International Journal of Lexicography 15 (4), 277–305.
Pastor, V. (2013), Estrategias de búsqueda onomasiológica en la actividad de traducción. Una ayuda al diseño de diccionarios terminológicos (PhD Thesis), Universitat Jaume I, Castelló. Available at http://hdl.handle.net/10803/387420.
Pastor, V. and A. Alcina (2009), ‘Search techniques in corpora for the training of translators’ in I. Ilisei, V. Pekar and S. Bernardini (eds), International Workshop on Natural Language Processing Methods and Corpora in Translation, Lexicography and Language Learning. Borovets, Bulgaria, 17 September 2009, 13–20.
Pastor, V. and A. Alcina (2010), ‘Search techniques in electronic dictionaries: A classification for translators’, International Journal of Lexicography 23 (3), 307–54.
Pastor, V. and A. Alcina (2011), ‘Acceso a la información terminológica en Internet: técnicas para traductores’ in S. Maruenda-Bataller and B. Clavel-Arroitia (eds), Multiple Voices in Academic and Professional Discourse. Current Issues in Specialised Language Research, Teaching and New Technologies, Newcastle: Cambridge Scholars Publishing, 243–56.
Poirier, C. (1989), ‘Les différents supports du dictionnaire: livre, microfiche, dictionnaire électronique’ in F-J. Hausmann et al. (eds), Vol. 1, 322–27.
Prieto-Velasco, J.A. and C.I. López-Rodríguez (2009), ‘Managing graphic information in terminological knowledge bases’, Terminology 15 (2), 179–213.
Rizo, A. and S. Valera (2000), ‘Lexicografía bilingüe: el español y la lengua inglesa’ in I. Ahumada (ed.), Cinco siglos de lexicografía del español. IV Seminario de Lexicografía Hispánica. Jaén, 17–19 November 1999, Publicaciones de la Universidad de Jaén, 341–80.
Roberts, R. P. (1990), ‘Translation and the bilingual dictionary’, Meta 35 (1), 74–81.
Roberts, R. P. and L. Langlois (2001), ‘L’apport de l’informatique à la recherche lexicographique’, Meta 46 (4), 711–20.
Robinson, D. (2003), An Introduction to the Theory and Practice of Translation, London: Routledge.
Ruppenhofer, J., M. Ellsworth, M. R. L. Petruck, R. J. Christopher and J. Scheffczyk (2006), FrameNet II: Extended Theory and Practice. Available online http://framenet.icsi.berkeley.edu.
Sallas, M. (2001), ‘La recerca d’informació i de documentació en terminologia’ in M. T. Cabré, L. Codina and R. Estopà (eds), Terminologia i Documentació, Barcelona: IULA, 107–20.
Sánchez, M.d.M. (2004a), ‘Estudio experimental sobre el uso del diccionario como herramienta para el traductor: hacia una descripción de necesidades’ in E. Ortega (ed.), Panorama actual de la investigación en Traducción e Interpretación, Granada: Atrio, 477–86.
Sánchez, M.d.M. (2004b), El uso de los diccionarios electrónicos y otros recursos de Internet como herramienta para la formación del traductor inglés-español, PhD Thesis, Universitat Jaume I.
Santana, O., Z. Hernández, J. Pérez, G. Rodríguez and F. Carreras (1996), ‘Diccionarios en soportes informáticos’, Cuadernos Cervantes 11, 68–77.
Sharpe, P. (1995), ‘Electronic dictionaries with particular reference to the design of an electronic bilingual dictionary for English-speaking learners of Japanese’, International Journal of Lexicography 8 (1), 39–54.
Sobkowiak, W. (1999), Pronunciation in EFL Machine-Readable Dictionaries, Poznań: Motivex.
Tercedor, M. and C.I. López-Rodríguez (2008), ‘Integrating corpus data in dynamic knowledge bases’, Terminology 14 (2), 159–82.
Tomaszczyk, J. (1979), ‘Dictionaries: Users and uses’, Glottodidactica 12, 103–19.
Tono, Y. (1989), ‘Can a dictionary help one read better?’ in G. James (ed.), Lexicographers and their Works (Exeter Linguistic Studies 14), Exeter: University of Exeter Press, 192–200.
Vincze, O., E. Mosqueira and M. Alonso (2011), ‘An online collocation dictionary of Spanish’ in I. Boguslavsky and L. Wanner (eds), Proceedings of the 5th International Conference on Meaning-Text Theory. Barcelona, 8–9 September 2011, 275–86.
Annex 1
Table 8.2 Summary of the electronic dictionary analysis.
[Table 8.2 sets the dictionaries analysed (rows) against the search techniques (columns). A cell reads YES where the dictionary offers the technique, accompanied where relevant by the option, field or operators involved.]
Search techniques (columns): Search for a word in the alphabetical entry list; Search for one or more words in the definitions or other fields of an entry; Use of operators; Search for a semantic relation by navigation; Search for a semantic relation by direct search; Access to complementary forums; Search by thematic area; Access to external links; Search for images; Introduction of an exact word; Introduction of a partial word; Use of wildcards; Introduction of a word to search for phonetically or orthographically related words; Introduction of an inflected form; Search for anagrams; Specification of a part of speech; Introduction of a question in natural language.
Dictionaries (rows): DRAE; Dirae; DUE; Robert; Diccionari de la llengua catalana; CED; Cercaterm; OED; EcoLexicon; DiCE; DiCoInfo; Dicouèbe; Merriam-Webster; Macmillan Dictionary; Just The Word; IATE; Grand dictionnaire terminologique; FrameNet; WordNet; UNTERM; Ultralingua; TLFi; Termium Plus; OneLook; OncoTerm; Wordsmyth; WordReference.
9
Researching historical lexicography and etymology
John Considine
Introduction
Historical and etymological lexicography are abnormal in two respects. First, they are never primarily concerned with the present. An entry in a historical or etymological dictionary may begin or end with information about the present, but it can never leave the past out of account, and it may be exclusively concerned with the past. Second, although entries in historical and etymological dictionaries usually include definitions or translation equivalents, these are never of primary importance, and they are occasionally dispensed with altogether (cf. Silva 2000: 90). The research undertaken by historical lexicographers and etymologists is therefore markedly different from other kinds of lexicographical research, and some etymological work is not even oriented towards publication in dictionary form.
1 Historical lexicography
1.1 The weak and strong senses of ‘historical’
The phrase historical lexicography can be used in a weak sense or a strong one. In its weak sense, it can refer to any lexicography with a diachronic dimension. All monolingual Shakespeare dictionaries and glossaries are historical in this sense: they provide lexical items from a subset of early modern English with equivalents or explanations in a more recent variety of English. A dictionary which gives a date for the first attestation of a word, or for the first attestation of each sense of a word, and which registers obsolete words and senses, is likewise historical in this weak sense: Cannon (1996), to which we shall return at Section 2.1, is an example, as is the Concise Scots Dictionary of 1985. A characteristic entry in the latter reads (slightly abridged) ‘outlie &c n 1 an outlying piece of ground la20-, NE. 2 money put out on loan or on mortgage 19-e20’. Here, the first sense of the word is identified as in recorded use in north-eastern Scotland from the late twentieth century onwards, and the second as in use in the nineteenth and early twentieth centuries. No examples are given, and the definitions are the most important element in the entry.
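The compact date codes in the entry quoted above (‘la20-’, ‘19-e20’) can be expanded mechanically. The following is a minimal sketch covering only what that one entry shows: a century number, the prefixes ‘e’ (early) and ‘la’ (late), a hyphen between two dates, and a trailing hyphen for ‘onwards’. The names `Period` and `parse_period` are hypothetical, and any other abbreviations the dictionary may use are deliberately not modelled.

```python
import re
from typing import NamedTuple, Optional


class Period(NamedTuple):
    """Span of recorded use; end is None when the use continues ('onwards')."""
    start: str
    end: Optional[str]


# Only the two prefixes visible in the quoted entry are handled (assumption).
_PARTS = {"e": "early", "la": "late"}
_CODE = re.compile(r"^(e|la)?(\d{1,2})$")


def _ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 20 -> '20th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def _expand(code: str) -> str:
    """Expand one date code such as 'la20' or '19'."""
    match = _CODE.match(code)
    if match is None:
        raise ValueError(f"unrecognised date code: {code!r}")
    part, century = match.groups()
    label = f"{_ordinal(int(century))} century"
    return f"{_PARTS[part]} {label}" if part else label


def parse_period(code: str) -> Period:
    """Parse a code such as 'la20-' (open-ended) or '19-e20' (closed range)."""
    start, hyphen, end = code.partition("-")
    if not hyphen:
        # A single date with no hyphen: start and end coincide.
        return Period(_expand(start), _expand(start))
    return Period(_expand(start), _expand(end) if end else None)
```

With these assumptions, `parse_period("la20-")` gives a period starting in the late 20th century with no end, and `parse_period("19-e20")` gives a period from the 19th century to the early 20th century, matching the reading of the entry given in the text.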
A more richly historical dictionary is the excellent Dictionnaire historique de la langue française edited by Alain Rey. Its entry for cab begins:

CAB noun masc. is borrowed (1848) from the English cab (1827) … Cab originally refers to a carriage with two wheels drawn by one horse, then in 1834 this type of carriage with the driver at the rear … the [French] word is also used, as in England, for four-wheeled carriages with the driver at the front. The cab was elegant and fashionable in the second half of the 19th century … The form hansom cab is used with an extra quality of snobisme (Proust). Today the word is confined to historical usage.1

This entry combines lexical information about how the French word cab has been used with encyclopaedic information about the kind of carriage it designates. No source is quoted, although the use of hansom cab in Proust is commented on; in fact, it is remembered as one of Mme. Swann’s short-lived fashionable gestures – ‘beaucoup d’années auparavant elle avait eu son “hansom cab” ’ – and knowing this would have made it possible to place the snobisme to which the entry refers more precisely. The most extensive dictionary of French, the Trésor de la langue française, is at one level very shallowly historical, covering the period 1789–1960, but its entry structure includes a section in which the first known attestations of the major senses of a given word are briefly identified and dated.

A dictionary which is, in the strong sense, edited on historical principles does more than simply provide historical information. It is founded on attestations of the actual usage of each word; as we shall see, these are usually from printed sources. The attestations are presented in every entry.
An entry for a hapax legomenon, a word attested once and once only in all the relevant texts which are available to the lexicographer, quotes the text of that single attestation, giving the author, title, and often the date of the source, and as much further information (for instance page, chapter or line numbers) as will allow the reader to locate the quotation. An entry for a word which is attested more than once includes a quotation paragraph which begins with the earliest available attestation and then presents more, chronologically ordered, in a series ending with the latest attestation known to the lexicographer, or with one whose date is reasonably close to the upper chronological limit of the dictionary. In the case of a polysemous word, each sense (and, if appropriate, each subsense) is provided with such a quotation paragraph. The senses are arranged in the entry in an order which reflects their chronological development; there are several possible ways of doing this (see Silva 2000: 90–3). Although each quotation paragraph is likely to be accompanied by a definition, the quotations are not ancillary to the definition – if anything, the reverse is true. As a historical lexicographer has put it:

It should go without saying that meaning ordinarily ranges continuously. Thus, division into defined senses is intended to set out the range of uses – illustrated with the clearest examples to be found – and not to imply their discreteness or that a use in context cannot reflect more than one of the senses as defined. (Ashdowne 2010: 209; cf. Silva 2000: 89–90)

As well as the quotation paragraphs and definitions which are at the heart of the entry, a dictionary on historical principles is likely to offer at least some etymological information, and may also offer further information, for instance, about morphology, pronunciation, variations in spelling,
sociolinguistic factors and more. The source of the historical information in the entry for outlie in the Concise Scots Dictionary is the Scottish National Dictionary, a dictionary of post-1700 Scots on historical principles. This begins (slightly abridged) OUTLIE, n., v. Also outly(e), ootlie, -lye. 1. An outlying piece of ground (Abd., Kcd. 1964). ne.Sc. 1958 People’s Jnl. (6 Sept.): I was plooin’ an oot-lye, a gey bit frae the farm. ‡2. Money lying out at loan or on mortgage (Sc. 1808 Jam.). Per. 1881 R. Ford Hum. Readings 62: Wi’ mair interest for the outlie o’t. Abd. 1929 J. Alexander Mains & Hilly 56: Interest for the ootlie o’ their siller. A distinctive feature of this dictionary is its interest in the regional distribution of words: its title-page identifies it as compiled ‘partly on regional lines and partly on historical principles’. This accounts for the abbreviations: Ab[er]d[een], K[in]c[ar]d[ine] and so on. Distinctive too is the heavy dependence on an earlier dictionary, Jamieson’s of 1808 and 1825.2 But the vital role which accurately referenced quotation evidence plays in the Scottish National Dictionary is clear, and characteristic of all lexicography on historical principles. ‘The quotations’, in the opinion of James Murray (paraphrased in Murray 1977: 274), were ‘the essence’ of the New English Dictionary on Historical Principles which he edited, now known as the Oxford English Dictionary (OED) and in many ways the pre-eminent historical dictionary in the world.
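The structure of a quotation paragraph described above can be modelled quite simply. The sketch below, in Python, is a minimal illustration rather than an account of how any dictionary's editorial system works: the Attestation class and the two function names are assumptions made for the example, and the data are the two quotations given for sense 2 of outlie.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    """One dated, referenced quotation (a hypothetical record type)."""
    year: int
    region: str
    source: str
    text: str


def quotation_paragraph(attestations):
    """Order attestations from the earliest to the latest."""
    return sorted(attestations, key=lambda a: a.year)


def is_hapax(attestations):
    """A hapax legomenon has exactly one attestation."""
    return len(attestations) == 1


# Deliberately entered out of order, as slips might arrive from readers.
outlie_sense_2 = [
    Attestation(1929, "Abd.", "J. Alexander Mains & Hilly 56",
                "Interest for the ootlie o' their siller."),
    Attestation(1881, "Per.", "R. Ford Hum. Readings 62",
                "Wi' mair interest for the outlie o't."),
]

paragraph = quotation_paragraph(outlie_sense_2)
for a in paragraph:
    print(f"{a.region} {a.year} {a.source}: {a.text}")
```

For a polysemous word, one such ordered list would be kept for each sense, and the senses themselves might then be ordered by the date of their earliest attestation, which is only one of the several possible arrangements mentioned above.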
1.2 Lexicography on historical principles: Different kinds of evidence

The analysis of quotations is, then, the fundamental task of the historical lexicographer, with the collection of quotations as a laborious prerequisite. These tasks have both evolved over the years, not least as a result of changing technologies, and a way to start looking at them is to turn to the very first dictionary which was edited on explicitly historical principles, Franz Passow’s revision of Johann Gottlob Schneider’s Handwörterbuch der griechischen Sprache, published in 1819.3 This dictionary dealt with the language of a limited body of texts. Most (though not all) of the extant texts in ancient Greek were available to Passow in printed editions, and at least relative dates could be assigned to them. Much pioneering work had been done on the excerption of these texts to illustrate the range of senses of individual words (cf. Zgusta 2006: 30–1). Passow’s task was to re-examine the attestations of each word, to place them in an order which would show the chronological development of the word – he referred metaphorically to its ‘life-story’ – and to supplement them from, in the first instance, the oldest texts available to him, namely the poems of Homer and Hesiod, which form a canon of fewer than 250,000 tokens (Thesaurus Linguae Graecae, ‘canon search’ results). These poems were familiar to him, and there was an ample verbal index to them (Seber 1780).

Had Passow lived long enough to work his way through all the ancient Greek available to him, he would have found himself dealing with a much less wieldy corpus. In fact, the discoveries of the last two centuries make a corpus of about 105 million words of classical and Byzantine Greek available to students of the language today. Although this corpus is fairly stable inasmuch as nobody is likely to discover another ten million words
of ancient Greek, and although it is machine-readable in the form of the Thesaurus Linguae Graecae, it is large enough to be very difficult to handle lexicographically. So, for example, the stem lexikograph- occurs three times in the corpus, the earliest attestation being in Nymphis, a historian no later than the third century BC (Pantelia 2000: 7). Only the latest of these three attestations is noticed in the best modern dictionary of ancient Greek, that of Liddell and Scott (for the past, present state and future possibilities of which, see Stray, Clarke and Katz 2019). But it is one thing to note a single omission in a dictionary of a language as well attested as ancient Greek, and another to fix all of the omissions. Other dead languages offer different challenges to the historical lexicographer. Some are known from scantier evidence than that for ancient Greek. The corpus of classical Latin to 200 AD is about seven and a half million words, and future additions to it (for instance, from the discovery of documents such as those excavated at Vindolanda) are likely to be minor.4 It was exhaustively concorded on paper slips for the Thesaurus linguae latinae, a dictionary of Latin on historical principles: there is a slip for every occurrence of the conjunction et ‘and’ in every Latin text from before 200 AD (Schröder 2012: 295). The corpus of Old English is about three million words, and future additions to it are likely to be tiny. The Dictionary of Old English project currently under way at Toronto is therefore based, like the Thesaurus linguae latinae, on an exhaustive concordancing of the whole corpus of the language. The Oudnederlands Woordenboek is based on a very much smaller corpus, the 68,000 words (including many toponyms) now regarded as Old Dutch (Louwen 2008: 220). 
The sparser the evidence for a dead language is, the harder it is to reach firm conclusions about its surviving vocabulary, and the historical lexicography of such languages may have a provisional quality, as did Anna Morpurgo’s Mycenaeae Graecitatis Lexicon of 1963, a pioneering dictionary of Mycenaean Greek which was being retrieved from inscriptions in Linear B (for it, see Jones 1966). It may even have a self-consciously controversial quality: Erica Reiner, the chief editor of the Chicago Assyrian Dictionary, once said of it that ‘This is not a bland dictionary. … We stick out our necks, and then somebody comes along ten years later and corrects the guess’ (Shenker 1978: 3). When a historical dictionary is compiled on the basis of a large and ill-defined body of evidence, the selection of quotation material becomes a pressing concern. Such a body of evidence may be from an obsolete language variety. For instance, no new texts are being written in early modern English, but although there is a more or less complete inventory of surviving early modern English printed books, their texts are not all available in machine-readable format, and there is a very large body of manuscript material, much of it unedited. Work is in progress on a corpus of approximately a thousand million words of Latin (see Bamman and Smith 2012), of which 99 per cent must be post-classical, and this represents a fraction of the extant texts. It is to the point that attempts to compile a dictionary of early modern English have been unsuccessful (see Adams 2010), and no attempt has ever been made to compile a universal dictionary of postclassical Latin. 
The first dictionary to be compiled on consciously historical principles after that of Passow, the Deutsches Wörterbuch of Jacob and Wilhelm Grimm (first fascicle 1852), was really of pre-contemporary German, roughly from Luther to Goethe, so it was a dictionary based on materials selected from a closed but indeterminately large body of evidence. Two of the nineteenth-century dictionary projects which followed the inception of the Deutsches Wörterbuch were even more audacious. Both the New English Dictionary/OED (first fascicle 1884) and, at least from 1892, the Dutch Woordenboek der Nederlandsche Taal (first
fascicle 1863, but only on fully historical principles from 1892: see Eickmans 2012: 272) aimed to document a living language variety from a rationally chosen starting point to the present day. Not only did each project require editorial engagement with materials selected from a body of evidence which was growing daily – hence the justly famous image of F. J. Furnivall cutting useful quotations from his newspaper every morning to add to the New English Dictionary files (Murray 1977: 179–80, and cf. 183) – but each documented a language which had developed a wide and well-recorded variety of specialized terminologies, not least scientific and technological. The questions of selection and of the treatment of terminology affect nearly all dictionaries on historical principles in which living languages are treated, and they are both problematic.
1.3 Some problems for historical lexicographers

The problem of selection is that some kinds of source are more readily available than others. Most obviously, verbatim recordings of informal speech are hard to come by before the twentieth century (texts presented as such may often be affected by scribal intervention), and informal and non-élite written usage are likewise increasingly hard to come by as the lexicographer searches backwards in time. The most readily available sources tend to be those which have canonical literary status. So, for instance, when the early editors of the OED made Walter Scott’s Lady of the Lake of 1810 their first quotation for the attributive use of the noun soldier, even though a poem published by Anne Bannerman in the Edinburgh Magazine of 1800 would have antedated the quotation (and may have inspired Scott: see Brewer 2009: 217–18), it was doubtless because The Lady of the Lake was very widely available to, and read by, the readers who provided them with much of their quotation evidence, while old volumes of the Edinburgh Magazine had a more restricted currency.5 Other historical dictionaries show a similarly heavy use of canonical literary sources for the same reason.

The historical lexicography of regional varieties of English has necessarily drawn on the non-élite sources in which such varieties are most strongly marked, including popular print and records of oral usage. The latter were neatly integrated with historical printed texts in the Dictionary of Jamaican English in 1967 and have been used in other dictionaries on historical principles such as the Dictionary of Newfoundland English of 1982. Harry Orsman’s Dictionary of New Zealand English integrated its compiler’s recollections of oral usage in past decades with his printed sources.
The third edition of the OED does not draw directly on oral sources; nor did previous editions, although Murray allowed himself to supply lacunae in the quotation evidence for current usage with quotations which he made up himself and labelled ‘Mod[ern]’ (see Murray 1977: 200–1). Whereas the OED now draws material from a very wide selection of print sources, especially those whose vocabulary is, in effect, indexed by being made available in online databases and archives, its editors are cautious of using data from online-only sources, just as they have always been cautious of using data from unpublished manuscripts. Where an online-only source provides a valuable quotation, for instance an antedating of all printed sources, as at Latinx or retweeting, it is printed out so that a hard copy can be archived.

Scientific and technological vocabulary presents one superficial problem for historical lexicography, and one deep one. The superficial problem is that the process of extracting lexical items from specialized scientific texts and writing definitions of them which are both
exact and accessible to as wide a readership as possible calls for lexicographers with a special range of competence. But they can be found: the OED appointed its first science editor in 1968 (Brewer 2007: 200–3 and 295 n 84). The deeper problem is that, whereas in principle historical dictionaries proceed by induction from quotations to definition, this is impracticable in the case of terminology: the changing senses of geek can be induced from quotation evidence, but writing a definition of germanium calls for an understanding of the structure and uses of germanium, which is hardly to be extracted from quotation evidence, unless the quotations are themselves entries in a scientific dictionary or encyclopaedia. No historical dictionary registers the whole of a highly developed scientific terminology, or even as high a proportion of one as the fullest synchronic dictionaries, but they must all engage to some extent with this problem. It is, in fact, the problem of encyclopaedic reference: as soon as the lexicographer is called on to explain the name of a plant, or of a disease of sheep, or of a kind of fishing-boat, she can hardly avoid resorting to encyclopaedic sources of information rather than pure induction from the quotations. Dictionaries which register the history of a specialized vocabulary such as that of medicine (see e.g. Norri 2010, esp. 73) are, it might be argued, hybrids of the historical encyclopaedia and the historical dictionary. But no lexicography on historical principles is ever absolutely pure. Other elements of the historical dictionary entry may be derived from the quotations. This is the case with information about grammatical function, collocates, geographical distribution and register. Dictionaries of language varieties in which spelling is unstable may present a list of all the spelling variants of a given word, drawing on quotations which are not printed in the entry as well as on all those which are. 
Compiling these lists is dull work, but they have always been useful for verifying whether a given form can represent a given word (a reader who had encountered the form mamont in a text and suspected that it was a variant of mammoth would indeed find it in the forms list s.v. mammoth in the printed OED), and they are of redoubled use when the full text of a dictionary is electronically searchable (a reader who encounters the form occyccion in a text can search the online text of OED for it and will find it in the variant forms list s.v. oxycroceum). Historical dictionaries may comment ad hoc on pronunciation as far as it can be ascertained from metalinguistic comments in quotation evidence or from metrical verse. They have not traditionally provided reconstructed pronunciations, although the third edition of OED records the pronunciation of a given word which is indicated in previous editions if this differs from the pronunciations now regarded as normal. As for current pronunciations, OED takes these from the synchronic Oxford Dictionary of Pronunciation.6 No other element of the dictionary is more remote from historical principles. The hybrid quality of the Trésor de la langue française is to its advantage in this respect, for it can indicate pronunciation in the fundamentally synchronic portion of an entry, while omitting pronunciation information from the historical portion. Etymologies are usually presented in historical dictionaries, though a historical dictionary of a regional variety of a given language may refer whenever possible to the etymologies in a more general historical dictionary, or silently disclaim responsibility for the etymologies of well-documented words: so, for instance, the Dictionary of Newfoundland English notes tersely that sadogue ‘bread, cake’ is from Irish sodóg ‘cake’, but gives no etymological information at all s.v. salmon. The Middle English Dictionary gives minimal etymologies: that for pouche, for instance, reads ‘OF; cp. 
CF poche, pouche; AF puche’. In some revised OED entries, the etymology is an elaborate essay rich in information about etyma and cognates (that for pouch runs to 228 words, but those for the noun man and the verb may to more than a thousand apiece);
this was already true of some first-edition entries, as it had been of some entries in the Grimms’s Deutsches Wörterbuch. Passow’s original formulation of the historical principles of lexicography was in part a rejection of a view of language change in which etymological speculation played a central part (see Zgusta 2006: 27). It is therefore in keeping with his principles that the third edition of the OED has rejected the first edition’s use of asterisked reconstructed forms in etymologies: for instance, the noun mind is now derived ultimately from ‘the Indo-European base of a preterite-present verb for “to think, remember, intend” ’ rather than from ‘the root *men-, man-, mun- … to think, remember, intend’. This brings us back to the fundamental point that lexicography on historical principles is by definition based on historical evidence.
2 Etymological lexicography

For our purposes, all etymological research can be placed at some point on each of three conceptual axes.7 The first two have to do with the form in which the research is disseminated. First, any number of etymologies may be presented in a given publication. A study of the etymology of a solitary word is a matter of lexicology rather than lexicography, but as we shall see, the dividing line between a collection of etymologies and an etymological dictionary is not always clear. Second, the results of etymological research may be presented more or less dogmatically: an entry in an etymological dictionary can make a single unsourced statement as though reporting unquestioned fact, or it can review the previous studies of an etymological problem in more or less depth, expressing a more or less marked preference for the conclusions of one of those studies, or for new conclusions. Turning from form to aims, we may conceive a third axis, extending from the etymological search for origins to etymological inquiry into development.
2.1 The number of etymologies presented

How many headwords should an etymological dictionary present? One possible answer would be ‘as many as a general-purpose synchronic dictionary’, but in the cases of English and its relatives, there is good reason for a high proportion of the words registered in a general-purpose dictionary to be excluded from an etymological one. Words like fireside and unhappy are, after all, such transparent formations from familiar English elements that there would seem to be little or no point to including them in an etymological dictionary, and the purchaser of such a dictionary in printed form might well feel that they increased the bulk and cost of the book pointlessly. Of course, a form which appears to be as etymologically banal as unhappy may in fact speak to a form in another language: to what extent has unreal been influenced by Latin irrealis, at least in writings on grammar and philosophy? Should the use of uncanny as a conventional translation equivalent for German unheimlich in and following the writings of Freud be a matter for etymological comment? These are examples of a kind of question to which we shall return in Section 2.3.

Most etymological dictionaries exclude transparent compounded and affixed forms but include a wide range of loanwords and all the common native words of the language. It would be possible for them also to exclude loanwords of an obvious kind: since general-purpose dictionaries already
offer terse etymologies of the sort which identifies macaroni as Italian in origin or hexagon as Greek, there would be a case for restricting the coverage of an etymological dictionary to words about which there is more to be said. But in practice, all but the most sophisticated dictionary users would be puzzled and dissatisfied by an etymological dictionary from which many common words were excluded – moreover (and here we look forward to Section 2.3 again), a word whose ultimate origin is clear, like hexagon, may conceivably have entered English by any one of several routes, and a word whose proximate origin is clear, like macaroni, may present interesting problems in its remoter ancestry. Etymological dictionaries which are more selective than the norm may confine themselves to words which share one language of origin: for instance, Cannon (1996) is a dictionary of Japanese loanwords in English, and Corriente (2008) addresses the more important topic of Arabic loanwords in Spanish and the other Ibero-Romance languages (other examples are in Malkiel 1993: 90). Corriente’s work, like Manfred Görlach’s Dictionary of European Anglicisms, is an example of the polyglot etymological dictionary, a genre which surveys the outcomes in several languages of words from a single language. Whereas Corriente’s project could be undertaken by a single learned scholar, since expertise in one Ibero-Romance language leads naturally to access to the others, Görlach’s, which brings together data from Germanic, Romance, and Slavic languages, and from Finnish, Hungarian, Albanian and Greek, had to be the product of teamwork. This, by the way, raises the question of the breadth of linguistic competence necessary for a good etymologist of English. 
The great etymologist Anatoly Liberman writes (2010: xviii) that The works gathered in this bibliography are in English, German, Dutch, Frisian, five Scandinavian languages (Swedish, Norwegian, Danish, Icelandic, and Faroese), French, Italian, Spanish, Rumanian, eight Slavic languages (Russian, Polish, Ukrainian, Czech, Bulgarian, Slovenian, and Serbian/Croatian), Latvian, Lithuanian, Finnish, Hungarian, Japanese, and Latin. For reading works in the Germanic, Romance, and Slavic languages I did not need help (in the Germanic group, the only exception is Yiddish), but this is where my expertise comes to an end. If my mastery of Finnish, Hungarian, Irish, Welsh, and Japanese were at a respectable level, I am sure that I would have discovered many contributions of which I remained unaware. A series of etymological studies of selected words may be presented as a volume of essays, like David L. Gold’s Studies in Etymology and Etiology, or even as part of a monograph, as in the last chapter of W. B. Lockwood’s Informal Introduction to English Etymology, but if they share a well-defined microstructure, they may also be presented as a dictionary, as has been done by Liberman in the pilot volume of his Analytic Dictionary of English Etymology.
2.2 Dogmatic and analytic etymological dictionaries

A feature of Liberman’s dictionary project which sets it apart from all other etymological dictionaries of English is the explicit and sustained critical attention to previous etymological work which is flagged by the word analytic in its title.
Of course, scholarly etymological dictionaries of English before his had been grounded in their makers’ study of primary and secondary evidence. They sought out occurrences of forms in English and other languages which might shed light on the etymology of a given English word – early attestations, for instance, of English mac(c)aroni, Italian mac(c)aroni and maccherone – and critical discussions of the word histories implied by these attestations, such as the exchange on macaroon and macaroni published in Notes and Queries in 1871–2 (for which see Liberman 2010: 681). In the case of W. W. Skeat, the maker of the first modern etymological dictionary of English (see Liberman 1998: 42 and Malkiel 1993: 31–2), this search process was originally limited, by an explicit personal resolution, to three hours of study per entry (Skeat 1879–82/1910: xiv). In that of later works such as the Oxford Dictionary of English Etymology and indeed the fourth edition of Skeat, difficult cases doubtless had more hours of work allotted to them. But in both cases, and in those of other scholarly etymological dictionaries of English published in the twentieth century, identifications of the predecessors who had made given arguments were generally suppressed. Arguments which were rejected would be noted very briefly or passed over in silence, and those which were accepted were presented without comment. A result of this policy was that English had, and has, no multi-volume etymological dictionaries such as were produced in the twentieth century for Spanish and Catalan by Juan Corominas (Joan Coromines) and for Italian by Manlio Cortelazzo and Paolo Zolli, and, at the extreme of scholarly elaboration, for French, in the form of the wonderful 25-volume Französisches etymologisches Wörterbuch founded by Walther von Wartburg.8 The references to secondary sources in etymological dictionaries which do present them vary in exhaustiveness. 
Those of the German etymological dictionary of Kluge, now in its twenty-fourth edition (1883/2002), are presented in abbreviated form at the end of the entry.9 Those in scholarly etymological dictionaries like Winfred Lehmann’s of the extinct Gothic language are presented more fully, as part of the text of the entry.10 Those in Liberman’s Analytic Dictionary are gathered from a bibliographical survey which approaches exhaustiveness, and are presented discursively: that for adze (the first one presented in Liberman 2008) runs to five columns. This is a procedure as far removed as can be imagined from the dogmatic English-language tradition. The price which is still being paid for the excellence of Liberman’s work, quite apart from the untold thousands of hours which have been spent on the gathering and analysis of material, is slow or even stalled publication. The University of Minnesota Press published two volumes of the dictionary: one which presents fifty-five specimen entries, and one, of nearly a thousand pages, which presents the bibliography (Liberman 2009 and 2010). Both were supported by private benefactors (Liberman 2010: xxii–xxiii). By 2020, no third volume had appeared.
2.3 Narratives of origin or of development

Etymological enquiry in the ancient world was typically concerned with the origins of words, and this is still true of some recent etymological lexicography. So, the unsatisfactory etymological dictionary of Eric Partridge, better known for his slang lexicography, was actually called Origins (Liberman 1998: 50). At a higher level of sophistication, the brief etymologies in the widely circulated American Heritage Dictionary of the English Language refer where appropriate to an appendix of Indo-European roots compiled by the eminent philologist Calvert Watkins, with
reference to the Indo-European etymological dictionary of Julius Pokorny, and (in the fourth edition of 2000 and the fifth of 2011) to a matching appendix of Semitic roots. By contrast, from the early stages of its planning onwards, the OED has always treated etymological statements about the history of a series of forms outside English as integrated with the story of the development of senses within English. This principle is very clearly expressed in the Oxford Guide to Etymology by Philip Durkin, the Deputy Chief Editor of the OED, which states at its outset that etymology ‘is the investigation of word histories’, or ‘the whole endeavour of attempting to provide a coherent account of a word’s history (or pre-history)’ (Durkin 2009: 1, 2). The parentheses around ‘or pre-history’ are eloquent: Durkin discusses reconstructed forms at a number of points (e.g. 14–19), and indeed quotes Watkins at length on the subject of reconstruction (253), but in his account, the study of sense-development within English is treated as continuous with the study of how given words first entered the language. This makes it possible to attend effectively to cases where the stories of borrowing and development are intertwined, for instance where there is continuing influence of an etymon on a borrowed form (or reciprocal influence between the two), or where a single form may be traced to multiple borrowings from different languages (see Durkin 2009: 155–78). 
The principle that etymological lexicography can be so closely connected with historical lexicography that the two must really be practised together, and the fact that there is a major historical dictionary of English in which etymology and historical lexicography are indeed integrated, explain the present three-way partition of the etymological lexicography of English, into specialized single-volume etymological dictionaries with terse origin-driven entries for a wide range of words, large historical dictionaries in which etymology and sense-development are integrated for a much wider range of words, and dictionaries or lexicological collections in which fewer headwords than would be acceptable in a dictionary of the first class are treated in more detail than would be practicable in a dictionary of the second.
3 Conclusion
Liberman has more than once struck a melancholy note as he has commented on the story of the etymological lexicography of English (see e.g. Liberman 1998: 93–4). It is not easy to see how future general dictionaries of English etymology, successors to the work of Skeat, will be dramatically better than or indeed different from their predecessors. Liberman’s own analytical project is a different matter, though its scale makes it unlikely that it will have many imitators. The most exciting developments in the etymological lexicography of English are those associated with the OED project (see Durkin 1999, and the publications listed at Durkin 2009: 303–4). There are, however, prospects for the dramatic development of the historical lexicography of English, and although these are treated much more fully in Brewer, Chapter 22, a very brief sketch belongs here as well. At least three possible lines of development can be discerned. First, the development of historical corpora cannot fail to have an effect on the development of historical lexicography, most obviously on the provision of frequency information (this was always present in the Trésor de la langue française, and frequency in current use is now
indicated in the OED). Gender and class might become more central to a corpus-based historical lexicography of English than they have been to any previous work in the field. Second, there are wide and interesting gaps in the regional historical lexicography of English – for instance, a dictionary of West African English on historical principles would be most welcome, and perfectly feasible. The same may be said of author lexicography. There are many Shakespeare dictionaries in English, but the best of them, the Shakespeare Glossary of C. T. Onions (1911/1986), is a little book, incomparably more modest than the Goethe Wörterbuch (1978–) which perhaps represents the state of the art in historical author-lexicography. Third, the integration of the online texts of major historical dictionaries is now being explored: The Middle English Dictionary and the Historical Thesaurus of the Oxford English Dictionary are now linked to the OED, and the Woordenboek der Nederlandsche Taal and the Oudnederlands Woordenboek are linked to three other historical dictionaries of Dutch and Frisian in the Geïntegreerde Taal-Bank of the Institute for Dutch Lexicography at Leiden. Linking is not really the same as integration, but it is a first step towards it. There are uncertainties about the future of historical lexicography, not least among them the question of the role of print. The uncertainty of funding, even after the publication of a dictionary has begun, is notorious: hence the suspension of publication of Jonathan Lighter’s splendid historical dictionary of American slang (see Winchester 2012: 26), and the heroic ongoing efforts of the editors of the Dictionary of Old English to obtain funding for each new volume. But the genre itself is alive with possibilities.
Notes
1 Dictionnaire historique de la langue française s.v. cab, CAB n. m. est emprunté (1848) à l’anglais cab (1827) … Cab désigne à l’origine une voiture à deux roues tirée par un cheval, puis en 1834 ce type de voiture avec cocher à l’arrière … il s’est aussi employé, comme en Angleterre, pour des voitures à quatre roues avec le cocher à l’avant. Le cab était élégant et à la mode dans la seconde moitié du xixe s. … La forme hansom cab s’est employée avec un renforcement de snobisme (Proust). C’est aujourd’hui un mot d’historien.
2 Jamieson’s work was a dictionary on historical principles avant la lettre, which presented well-referenced and chronologically ordered quotations from historical texts in the Scots language after the definitions.
3 For more on Passow and on the history of historical dictionaries in general, see Considine (2014 and 2015).
4 Word count from Bamman and Smith (2012: 2 n 2); the figure of 10 million words given beside this includes post-200 material.
5 See also Brewer (2010 and 2012), and cf. Considine (2009).
6 See https://public.oed.com/history/rewriting-the-oed/editing-of-entries/.
7 For a much more elaborate typology than this one, see Malkiel (1976).
8 For Corominas see Malkiel (1993: 140–2); for Cortelazzo and Zolli and their predecessors, Malkiel 1993: 106–7; for the Französisches etymologisches Wörterbuch, ibid. 80–2.
9 No etymological dictionary of English has gone through as many editions as Kluge – but some of those which make up the count of 24 are simply new printings rather than revised editions.
10 See Lehmann (1986: vi), for his treatment of references to secondary sources and that of Sigmund Feist, whose dictionary of 1939 he translated and revised to produce his own.
References
Dictionaries
American Heritage Dictionary of the English Language (1969/2011), Fifth edition, Boston: Houghton Mifflin.
Cannon, G. with N. Warren (1996), The Japanese Contributions to the English Language: An Historical Dictionary, Wiesbaden: Harrassowitz.
Concise Scots Dictionary (1985), Editor-in-chief Mairi Robinson, Edinburgh: Polygon.
Corriente, F. (2008), Dictionary of Arabic and Allied Loanwords: Spanish, Portuguese, Catalan, Galician and Kindred Dialects, Leiden: Brill.
Dictionary of Newfoundland English (1982/1990), Ed. G.M. Story, W.J. Kirwin, and J.D.A. Widdowson, Toronto: University of Toronto Press.
Dictionary of Old English (1986–), Ed. Antonette di Paolo Healey (A–G), Angus Cameron (D), Ashley Crandall Amos (B–D) et al., 8 fascicles (A–G) published to date, on microfiche and on CD-ROM, Toronto: Pontifical Institute of Medieval Studies.
Dictionnaire historique de la langue française (2010), New (Fourth) edition, Ed. Alain Rey, Paris: Dictionnaires Le Robert.
Goethe Wörterbuch (1978–) 4 vols, plus 8 fascicles of Vol. 5 (A–lebensfeindlich) published to date, Stuttgart: Kohlhammer. Online at http://germazope.uni-trier.de/Projects/GWB/.
Görlach, M. (ed.) (2005), A Dictionary of European Anglicisms, Oxford: Oxford University Press.
Jamieson, J. (1808), An Etymological Dictionary of the Scottish Language, Illustrating the Words in Their Different Significations by Examples from Ancient and Modern Writers, 2 vols, Edinburgh: printed at the University Press for W. Creech [et al.].
Jamieson, J. (1825), Supplement to the Etymological Dictionary of the Scottish Language, 2 vols, Edinburgh: printed at the University Press for W. & C. Tait [etc.].
Kluge, F. (1883/2002), Etymologisches Wörterbuch der deutschen Sprache, Twenty-fourth edition, Berlin: Walter de Gruyter.
Lehmann, W. (1986), A Gothic Etymological Dictionary: Based on the Third Edition of Vergleichendes Wörterbuch der gotischen Sprache by Sigmund Feist, Leiden: Brill.
Liberman, A. with the assistance of J.L. Mitchell (2008), An Analytic Dictionary of English Etymology: An Introduction, Minneapolis and London: University of Minnesota Press.
Middle English Dictionary (1952–2001), Ed. Hans Kurath (A–F), Sherman M. Kuhn with John Reidy (G–P), and Robert E. Lewis (Q–Z) et al., 13 vols, Ann Arbor: University of Michigan Press [some volumes jointly published with Oxford University Press].
Onions, C.T. (1911/1986), A Shakespeare Glossary, Third edition, Ed. Robert D. Eagleson, Oxford: Clarendon Press.
Oudnederlands Woordenboek (2009–), Editor-in-chief Tanneke Schoonheim, Leiden: INL. Online at gtb.inl.nl.
Oxford English Dictionary (1884–1933), Ed. James Murray, Henry Bradley, William Craigie, and C.T. Onions, 12 vols plus supplement. [Known until 1933 as A New English Dictionary on Historical Principles], Oxford: Clarendon Press.
Oxford English Dictionary (1989), Second edition, Prepared by J.A. Simpson and E.S.C. Weiner on the basis of the First edition and of a 4-vol. Supplement (1972–86) by R.W. Burchfield, 20 vols, Oxford: Clarendon Press.
Oxford English Dictionary (2000–), Third edition, Ed. J.A. Simpson, Oxford: Oxford University Press. Online at www.oed.com.
Schneider, J.G. (1819), Johann Gottlob Schneider’s Handwörterbuch der griechischen Sprache: Nach der dritten Ausgabe des grössern Griechischdeutschen Wörterbuchs mit besondrer Berücksichtigung des Homerischen und Hesiodischen Sprachgebrauchs, Ed. Franz Passow, 2 vols, Leipzig: Friedrich Christian Wilhelm Vogel.
Skeat, W.W. (1879–82/1910), An Etymological Dictionary of the English Language, Fourth edition, Oxford: Clarendon Press.
Scottish National Dictionary (1931–76 and 2005), Ed. William Grant (vols. 1–3) and David Murison (vols. 3–10), with a supplement ed. Iseabail Macleod, 10 vols. plus supplement, Edinburgh: Scottish National Dictionary Association.
Trésor de la langue française: Dictionnaire de la langue du XIXe et XXe siècle (1789–1960) (1971–94), Ed. Paul Imbs and Bernard Quemada, 16 vols, Paris: Éditions du Centre National de la Recherche Scientifique (vols. 1–10); Gallimard (vols. 11–16).
Other references
Adams, M. (2010), ‘Legacies of the Early Modern English Dictionary’ in J. Considine (ed.), Adventuring in Dictionaries: New Studies in the History of Lexicography, Newcastle, UK: Cambridge Scholars Publishing, 290–308.
Ashdowne, R. (2010), ‘Ut Latine minus vulgariter magis loquamur: The making of the Dictionary of Medieval Latin from British sources’ in C. Stray (ed.), 195–222.
Bamman, D. and D. Smith (2012), ‘Extracting two thousand years of Latin from a million book library’, ACM Journal on Computing and Cultural Heritage 5 (1), article 2 (separately paginated).
Brewer, C. (2007), Treasure-House of the Language: The Living OED, New Haven and London: Yale University Press.
Brewer, C. (2009), ‘The Oxford English Dictionary’s treatment of female-authored sources of the eighteenth century’ in I.T.-B. van Ostade and W. van der Wurff (eds), Current Issues in Late Modern English, Bern: Peter Lang, 209–38.
Brewer, C. (2010), ‘The use of literary quotations in the Oxford English Dictionary’, Review of English Studies 61 (248), 93–125.
Brewer, C. (2012), ‘“Happy copiousness”? OED’s recording of female authors of the eighteenth century’, Review of English Studies 63 (258), 86–117.
Considine, J. (2009), ‘Literary classics in OED quotation evidence’, Review of English Studies 60 (246), 620–38.
Considine, J. (2014), ‘John Jamieson, Franz Passow, and the double invention of lexicography on historical principles’, Journal of the History of Ideas 75, 261–81.
Considine, J. (2015), ‘Historical dictionaries: History and development; current issues’ in P. Durkin (ed.), The Oxford Handbook of Lexicography, Oxford: Oxford University Press, 163–75.
Durkin, P. (1999), ‘Root and branch: Revising the etymological component of the OED’, Transactions of the Philological Society 97, 1–49.
Durkin, P. (2009), The Oxford Guide to Etymology, Oxford: Oxford University Press.
Eickmans, H. (2012), ‘Woordenboek der Nederlandsche Taal (WNT)’ in U. Haß (ed.), 271–91.
Geïntegreerde Taal-Bank (2007–), Leiden: Instituut voor Nederlandse Lexikologie. Online at gtb.inl.nl.
Gold, D.L. (2009), Studies in Etymology and Etiology (ed. F. Rodríguez González and A. Lillo Buades), San Vicente: Publicaciones de la Universidad de Alicante.
Haß, U. (ed.) (2012), Grosse Lexica und Wörterbücher Europas, Berlin: De Gruyter.
Jones, D.M. (1966), ‘Review of Mycenaeae Graecitatis Lexicon by Anna Morpurgo’, Classical Review new ser. 16 (3), 374–5.
Liberman, A. (1998), ‘An annotated survey of English etymological dictionaries and glossaries’, Dictionaries 19, 21–96.
Liberman, A. with the assistance of A. Hoptman and N.E. Carlson (2010), A Bibliography of English Etymology, Minneapolis and London: University of Minnesota Press.
Lockwood, W.B. (1995), An Informal Introduction to English Etymology, Montreux: Minerva Press.
Louwen, K. (2008), ‘A glimpse behind the scenes of the Oudnederlands Woordenboek (Old Dutch Dictionary)’ in M. Mooijaart and M. van der Wal (eds), Yesterday’s Words: Contemporary, Current and Future Lexicography, Newcastle: Cambridge Scholars Publishing, 218–29.
Malkiel, Y. (1976), Etymological Dictionaries: A Tentative Typology, Chicago and London: University of Chicago Press.
Malkiel, Y. (1993), Etymology, Cambridge: Cambridge University Press.
Murray, K.M.E. (1977), Caught in the Web of Words: James Murray and the Oxford English Dictionary, New Haven and London: Yale University Press.
Norri, J. (2010), ‘Dictionary of Medical Vocabulary in English, 1375–1550’ in J. Considine (ed.), Current Projects in Historical Lexicography, Newcastle: Cambridge Scholars Publishing, 61–82.
Pantelia, M. (2000), ‘‛Νούς into Chaos’: The creation of the thesaurus of the Greek language’, International Journal of Lexicography 13 (1), 1–11.
Schröder, B.-J. (2012), ‘Thesaurus linguae latinae’ in U. Haß (ed.), 293–300.
Seber, W. (1780), Index vocabulorum in Homeri Iliade atque Odyssea, New edition, Oxford: ex typographeo Clarendoniano.
Shenker, I. (1978), ‘Akkadians had a word for it’, New York Times Book Review, 21 May, 3, 38.
Silva, P. (2000), ‘Time and meaning: Sense and definition in the OED’ in L. Mugglestone (ed.), Lexicography and the OED: Pioneers in the Untrodden Forest, Oxford: Oxford University Press, 77–95.
Stray, C. (ed.) (2010), Classical Dictionaries Past, Present and Future, London: Duckworth.
Stray, C., M. Clarke and J.T. Katz (eds) (2019), Liddell and Scott: The History, Methodology, and Languages of the World’s Leading Lexicon of Ancient Greek, Oxford: Oxford University Press.
Thesaurus Linguae Graecae (2009), Director Maria Pantelia, Irvine, CA: University of California, Irvine. http://www.tlg.uci.edu/.
Winchester, S. (2012), ‘The mongrel speech of the streets [review of Green’s Dictionary of Slang by Jonathon Green]’, New York Review of Books, 8 March, 24–26.
Zgusta, L. (2006), Lexicography Then and Now: Selected Essays, ed. F.S.F. Dolezal and T.B.I. Creamer, Tübingen: Max Niemeyer.
10
Researching pedagogical lexicography
Amy Chi
1 Introduction
A dictionary, as a product of the art and craft of lexicography, has always been closely associated with the notion of pedagogy. A subject-field dictionary, for instance, such as a dictionary of computer engineering or social science, provides word information of a specific discipline to students to support their study, or to serve professionals in the field. In general, the aims of dictionary compilation are threefold: first, as a record of a particular language, such as the monumental Oxford English Dictionary, which charts the development of the English lexicon; second, and the most common purpose, to solve linguistic problems related to word knowledge and usage that users may encounter in their daily lives (typically, such a dictionary is synchronic in nature, like a general-purpose dictionary or a learner’s dictionary of a foreign language); and third, to be sold as a commercial product to generate profits. It is thus logical to presume that pedagogical lexicography, which Hartmann and James (1998: 107) define as ‘a complex of activities concerned with the design, compilation, use and evaluation of pedagogical dictionaries’, entails a symbiotic relationship between the knowledge provider (lexicographer) and the receiver (user). In other words, at the preparatory stage of a dictionary compilation, lexicographers will have a target user group, or groups, which the reference book will serve. The word information to be provided will require specific design and compilation to meet the needs of the target group(s). To evaluate the accomplishment of a dictionary, we should examine users’ comments on the usefulness of the dictionary (use and evaluation). There will also be academics giving dictionary critiques (see Akasu, Chapter 4). With this understanding of pedagogical lexicography as the backdrop, the following presents an overview of the development of English pedagogical lexicography.
The most exciting development in pedagogical lexicography since the 1940s has been the compilation of English dictionaries for non-native English speaking (L2) learners of English. Indeed, the monolingual learner’s dictionary (MLD) for advanced L2 learners of English defines the status of such a dictionary type in English lexicography and has been the major thrust for research in pedagogical lexicography (see also Yamada, Chapter 11). Due to the gargantuan size of its potential user group (with the total number amounting to between 500 million and one billion according to Béjoint 2010), and the ever-growing demand for learning English as a foreign language (the British Council estimated in 2013 that 1.75 billion people worldwide could speak English at a useful level), the MLD has been attracting much attention from academics. It all began with A. S. Hornby’s The Idiomatic and Syntactic English Dictionary (ISE, later published by
the Oxford University Press and renamed as the Advanced Learner’s Dictionary: this was the first edition of the dictionary currently known as the Oxford Advanced Learner’s Dictionary, OALD). The dictionary was targeted specifically at helping Japanese advanced learners of English with their encoding or writing tasks. There were also salient features to meet intended users’ decoding needs, such as looking up word meanings. The dictionary was unrivalled (for a period, when three editions were published) until 1978 when Longman Dictionary of Contemporary English (LDOCE) appeared on the market. Collins COBUILD English Dictionary (CCED) joined in the competition in 1987 and initiated a new era of corpus-based lexicography (see ‘Using corpora as data sources for dictionaries’, Chapter 7). Cambridge International Dictionary of English (CIDE) (1995), Macmillan English Dictionary (MED) (2002) and Merriam-Webster’s Advanced Learner’s English Dictionary (MWALED) (2008) gradually contributed their publications, each with its own innovations and unique features, totalling six MLDs targeting advanced L2 learners of English aged 16 to 24 (Rundell 2010). This chapter will review and discuss some current research studies in pedagogical lexicography in the context of the teaching of English as a second or foreign language (ESL/EFL). We shall focus on four aspects of the MLD: design, compilation, use and evaluation. Some issues related to these four areas overlap while some are interrelated; hence, the sections should not be read as mutually exclusive. Most of the discussion will centre on the advanced learners’ dictionaries in the printed form. The concluding section will indicate some possible future developments of this dictionary type.
2 Monolingual Learner’s Dictionary (MLD) for advanced learners of English
2.1 Background
McArthur (1989) postulates that the conditions prevailing in English lexicography at the beginning of the twentieth century provided an ideal breeding habitat for the MLD. These included, within the discipline of English lexicography, a practice of using simple words to explain harder words, and the growth of systems of education throughout the British Empire, with the latter point implying that the English language would be part of that growth in dependent countries. Developments in foreign language teaching and learning and in linguistics research at the time further shaped the conceptualization of MLDs. For example, Daniel Jones’s English Pronouncing Dictionary of 1917 provides the model accent to be offered to L2 learners of English; such a pronunciation system, or a version of it, has been adopted by all MLDs to date to provide users with English word pronunciation. Also, the Vocabulary Control Movement in the late 1920s and early to mid-1930s affected the selection of the dictionary wordlist, choice of phraseological information and the convention of a restricted defining vocabulary (Cowie 1999). In addition, Palmer’s and Hornby’s interest in pedagogical grammar resulted in the distinct verb-pattern scheme of OALD1.
However, Rundell (1988) argues that this dictionary type has inherited many conventions from native-speaker monolingual dictionaries (NSD) that were irrelevant to L2 learners of English. In the first section, ‘Researching Design’, we examine the macrostructure of MLDs, focusing on two areas: alphabetical ordering and polysemous word entry. The second section, ‘Researching Compilation’, continues the macrostructure theme with an overview of the MLD wordlist. Two important conventions in presenting word meaning, defining vocabulary and definition writing style, are reviewed in detail. In the third section, ‘Researching Use’, we trace the development of user-related research studies since the 1980s, discussing both their strengths and their weaknesses. The fourth section, ‘Researching Evaluation’, includes a review and discussion of who should evaluate MLDs, and how. In the concluding section, ‘The Way Forward’, an area which has great potential for pedagogical dictionary research will be proposed.
2.2 Researching design
2.2.1 Alphabetical ordering
The MLD adopts the NSD’s mainstream tradition of an alphabetical macrostructure in presenting wordlists. However, since ‘no form of alphabetization can successfully deal with all types of idioms without listing each in several places, and no dictionary can afford the luxury of such repetition’ (Landau 1989: 82), many lexicographical decisions (e.g. adjustment and interpretation of the alphabetical macrostructure) will have been implemented before the final presentation. Compilers have mostly taken into consideration, sometimes based on linguistic theories, the need for space and the requirement to offer quick and easy access to words. Their decisions may affect the choice of headword (using the canonical or base form of a word rather than the form in which it most frequently appears), treatment of homographs (e.g. bank ‘the place for money’ or bank ‘place along the side of a river’), presentation of phrasal verbs, derivatives and multi-word units (e.g. idioms and compounds). The final access structure of the dictionary may not be transparent or comprehensible to inexperienced L2 users of the dictionary. Indeed, the access macrostructure of the dictionary often becomes a source of obstacles for users in their word search. One issue is whether or not the dictionary should adopt nesting or strict alphabetical ordering in presenting headwords. The nesting approach (placing several related words and phrases under the same headword in a non-alphabetical order), adopted by the OALD until its Sixth edition (2000), was perceived to help maintain and develop a sense of semantic and morphological relationships within the lexicon, which was important for users learning the target language, in particular in expanding their vocabulary (Cowie 1999). This structuring principle in presenting words was later replaced by the strict letter-by-letter method, probably to facilitate computing production, based mainly on word class.
Such a change gave decoding a higher priority than encoding (Cowie 1999, van der Meer and Sansome 2001) since the arrangement was believed to allow users easier and quicker access to information, which was considered to be important, based upon results from user-related studies (Bogaards 1996, Herbst 1996). Following this convention, compounds become separate headwords while suffixed derivatives of words remain mostly as run-ons (sub-entries of a word or phrase they are related to). All current MLDs follow this arrangement in general; nevertheless, the issue has not yet reached a perfect closure.
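The difference between the two arrangements can be made concrete with a small sketch. The Python below is illustrative only: the six-word list and its base-word assignments are invented for the example and do not reproduce the entries of any actual dictionary. The hypothetical function `strict_order` gives every item headword status in a single letter-by-letter sequence (ignoring spaces and hyphens), while `nested_order` groups related items under a base headword.

```python
import re

# Invented sample data: (headword, base word it could be nested under).
ENTRIES = [
    ("carpet", "carpet"),
    ("car park", "car"),
    ("card", "card"),
    ("car", "car"),
    ("careful", "care"),
    ("care", "care"),
]


def letter_key(headword):
    """Letter-by-letter sort key: lower-case, spaces and hyphens ignored."""
    return re.sub(r"[^a-z]", "", headword.lower())


def strict_order(entries):
    """Treat every item as a separate headword in one alphabetical run."""
    return sorted((headword for headword, _ in entries), key=letter_key)


def nested_order(entries):
    """Order base words alphabetically and group related items under them."""
    nests = {}
    for headword, base in entries:
        nests.setdefault(base, [])
        if headword != base:
            nests[base].append(headword)
    return [
        (base, sorted(nests[base], key=letter_key))
        for base in sorted(nests, key=letter_key)
    ]


print(strict_order(ENTRIES))
print(nested_order(ENTRIES))
```

With this sample, strict ordering places card, care and careful between car and car park, whereas nesting keeps car park under car. A user who expects nests may not anticipate that kind of separation.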
Bogaards (1996) questions the use of an alphabetical ordering system in presenting headwords and explicates how such an arrangement can create obstacles for L2 users in completing written translation tasks. Since semantically related words are scattered in the dictionary following their orthographic form, he explains that the ‘findability’ of word meanings following this macrostructure is low. Another concern is that since MLDs aim at appealing to a global clientele, the choice of adopting an alphabetical macrostructure might be problematic to users of different linguistic and cultural backgrounds. First, such an ordering system may obstruct or retard the word-search of users whose mother tongue does not share the Latin alphabet (e.g. Reif 1987, Battenburg 1991). Second, even if users’ languages share the same Latin root, as in the case of Spanish learners of English, they may have a different interpretation of the alphabetical order. Scholfield (1999) explains how Spanish users may be confused when looking up words like alley, spelt with ll, which Spanish traditionally would treat as one letter, but will be two in an English dictionary. Third, a strict letter-by-letter arrangement may obstruct some users from transferring their first language dictionary reference experience and skill to the use of the MLD. Chi’s (2003) study revealed how first language reference experience led some Hong Kong Chinese students to word-search failure. For example, when these students transferred their dictionary reference skills from the use of Chinese dictionaries, which mostly follow a word nesting convention, to their use of MLDs, they were often disappointed by not finding the compound word subsumed under the related headword. Furthermore, whether decoding is really preferred over encoding by L2 users when they consult a dictionary is still debated. 
Rundell (1988) maintains that the belief of MLD compilers in treating meaning to be the most important information that L2 learners of English need from the dictionary, similar to NSD users, is one of the major faults in the design of this dictionary type. Scholfield (1999) upholds that dictionary consultation, for both receptive and productive purposes, should be an integral part of the learners’ vocabulary acquisition process, both in terms of learning new words and in strategy development. A nesting convention facilitates vocabulary development that treats words in families, and van der Meer (2002: 516) postulates that a nesting entry ‘would in many cases also make sense definitions easier, since repetitions of identical semantic information may be avoided’. Indeed, fuelled by an abundance of evidence resulting from computing technology and multimillion-word textual corpora, there has been a revival of interest in linguistic disciplines like second language vocabulary acquisition and phraseology. Results from such research studies have impacted on the presentation and vocabulary information in brand new, and new, editions of existing MLDs in the past two decades. Most current MLDs take pride in their new features: the red-coloured words (‘[for users] to use them confidently and correctly’ MED1); a wealth of collocation information to support productive tasks and frequency-based vocabulary information to help users communicate effectively (Cambridge Advanced Learner’s Dictionary 2005); and the inclusion of Coxhead’s (2000) Academic Word List (OALD8). With the encoding function of the dictionary seemingly shifting to take a more central position in pedagogical dictionaries, it is possible that the current policy of aligning headwords void of semantic and morphological information will be revisited. 
On the other hand, at the technological frontier, de Schryver (2003: 175) proclaims that ‘users don’t need to know the sequence of the alphabet’s letters anymore’ in the electronic dictionary interface. He explains that the latest techniques such as voice recognition, ‘focus-in typing’
(while typing, a list of words appears on the screen suggesting or approximating the word sought) and spelling corrections would eliminate the problems of misspelling and prompt users to the right word. The devices solve some apparent problems but help is rendered only to users accessing the MLD electronically. Boonmoh (2010) asserts that the exciting picture that Nesi (2000b) described in respect of dictionaries on CD-ROM has not yet been realized in many pocket electronic dictionaries (PEDs) or in dictionaries on the Internet. In Gouws’ (2009: 4) view, ‘the immediate future of lexicographic tools still sees printed dictionaries as an important and persisting role player.’ He continues:
This demands that metalexicographical research, besides its focus on electronic dictionaries, should still also be directed at printed dictionaries – including issues regarding the macrostructure of printed dictionaries.
New dimensions of research in alphabetical macrostructure, supplemented by recent linguistic findings in English as a second/foreign language acquisition, and supported by computing technology, may yield a new presentation of the MLD that includes an ‘integrated semantic awareness’ (van der Meer 2002). Rundell (2010) suggests that the one-size-fits-all MLD model is outdated and the future of the dictionary centres on the concept of ‘customization and personalization’. It is possible to offer L2 users from a specific linguistic background a modified or domesticated alphabetical macrostructure. Using the same wordlist, the MLD could be presented non-alphabetically, applying alternative approaches such as by themes (e.g. Longman Lexicon of Contemporary English 1981) or arranging words with synonymous meanings (e.g. WordNet).
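The ‘focus-in typing’ that de Schryver describes amounts to narrowing a list of candidate headwords as each letter is typed. A minimal Python sketch follows, assuming a tiny invented headword list and a hypothetical function name, `suggestions`; actual dictionary software will differ, and the spelling correction that usually accompanies this feature is not shown.

```python
from bisect import bisect_left

# Invented sample headwords; the software keeps them sorted internally.
HEADWORDS = sorted(["dictate", "diction", "dictionary", "dictum", "did", "die"])


def suggestions(typed, headwords=HEADWORDS, limit=5):
    """Return up to `limit` headwords beginning with the letters typed so far."""
    prefix = typed.lower()
    # Binary search finds the first headword that is not smaller than the prefix.
    start = bisect_left(headwords, prefix)
    matches = []
    for word in headwords[start:]:
        if not word.startswith(prefix):
            break  # sorted input: once one word fails, all later ones fail too
        matches.append(word)
        if len(matches) == limit:
            break
    return matches


print(suggestions("dict"))
print(suggestions("dicti"))
```

The alphabetical sort is used only inside the program to make the search fast. The user sees a shrinking list of candidates and never has to locate a position in the alphabet, which is the point de Schryver makes.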
2.2.2 Polysemous words
Many dictionary-use research studies have revealed that typically users would choose the first sense of a polysemous word entry to answer their need (e.g. Nesi 1987, Bogaards 1998) because of either convenience or ignorance, or both. This decision is problematic in many respects, especially when the lexicographical decision in prioritizing senses of such types of entry depends mostly on high-frequency use as indicated in native-speaker corpora. Van der Meer (2002: 514) cautions, ‘the only thing a frequency-based order does is increase look-up speed for isolated senses. No more.’ Offering ‘Sign-posts’, ‘Guidewords’, ‘Menus’ or ‘Shortcuts’ at the beginning of, or internally within, a long entry seems to be the convention adopted by most current MLDs to assist users to locate the sense relevant to their need, and to accelerate the look-up process. Data from research studies (e.g. Bogaards 1998, Tono 2011, Nesi and Kim 2011) testing the effectiveness of such a convention have been positive in general. Variables that might affect the accuracy and speed of the search seem to be related to the L2 user’s English proficiency; the length of, and the way meanings are presented in, the entries users looked up; and the synonymous words or phrases used by the dictionary to indicate ‘direction’ for the search. Opponents of this convention have cast doubts on its effectiveness in guiding users accurately to find the right sense since the ‘directive words’ ‘tend to rely on high-frequency superordinate terms, and these are sometimes too ambiguous or vague to facilitate effective searching’ (Rundell 1998: 327). Yamada (2009) contends that the convention is inconsistently applied in dictionaries, and hence may confuse users.
He further adds that the sign-posts are mainly semantic and are not useful to form-based word consultations, while some sign-posts are semantically redundant, repeating and/or summarizing words found in the sense definition. Online MLDs which offer the ‘sign-post’ convention, such as the Cambridge Dictionary Online and the Longman Dictionary of Contemporary English, have inherited the problem.
2.3 Researching compilation
2.3.1 Wordlist
A dictionary for L2 learners of English was first conceived as a synchronic dictionary with a wordlist of frequent words that could fulfil, to a lesser or greater extent, both the decoding and encoding needs of L2 users. Such a notion of including only ‘a selected subset of the lexicon’, as Rundell (1998: 316) commented, ‘marks a major departure from the native-speaker (NSD) tradition’. This concept of identifying a limited set of English vocabulary that suffices for L2 users to function in general second-language contexts was the centre of research studies during the Vocabulary Control Movement from the mid-1920s to the late 1930s. Major publications that have influenced the compilation of MLDs include Michael West’s first use of 1,490 words for the vocabulary definitions of the New Method English Dictionary (1935) and his publication of A General Service List of English Words in 1953. The latter was referenced in the compilation of LDOCE1 for its 2,000-word Defining Vocabulary (Cowie 1999), and this defining convention has developed into a norm in the compilation of all major MLDs. Hornby’s wordlist was mostly based on frequency of use, with the belief that dictionary consultations from target users would be for words they encounter frequently; thus, the wordlist contained more structured and ‘heavy duty’ words than rare, technical and scientific words (Cowie 1999, Béjoint 2010). While MLDs published before 1995 more or less attempted to strike a balance between serving the encoding and decoding functions, there appeared a major shift to serve the latter function in the new or newer editions of MLDs. One of the obvious reasons was that user studies undertaken in the 1970s and 1980s revealed that users’ concern was mostly with meaning (see section ‘Researching Use’) in their consultations.
Moreover, researchers like Kokawa and Yamada (1998: 354) had elucidated that for an MLD to be complete it has to teach ‘learners linguistic (including pragmatic) knowledge as well as cultural or encyclopaedic aspects of the language, which relate to the “content” of communication’. High-frequency and function words were discovered to be less demanded, a contradiction to the early belief that these words would be needed because of their high frequency of encounters and for use; instead, users are requesting more infrequent words and semantic knowledge about words (Cowie 1999).
2.3.2 Defining vocabulary
With regard to microstructure, the types of information that go into individual entries have stayed intact, though perhaps in different formats and with different emphases, in all major MLDs since the initial compilation of this dictionary type. As an advocate of using a selective core vocabulary so as not to overtax L2 learners of English, Hornby used simple words and phrases to write
definitions. LDOCE1 (1978) went a step further and debuted a Defining Vocabulary of 2,000 words. This new feature initially obtained mostly positive feedback from users’ studies (e.g. MacFarquhar and Richards 1983, Herbst 1986). However, some researchers contested LDOCE’s claim of using only 2,000 words in writing the dictionary definitions while some doubted the clarity and accuracy of definitions written following such a convention (e.g. Stein 1979, Fox 1989, Bogaards 1996, Rundell 1998). While definitions written following the convention of a controlled vocabulary are intended to encompass simplicity and comprehensibility, Cowie (1999: 111) asserts: It is also important that they should be accurate, concise and written in natural English, and LDOCE provides some evidence that, in striving to be simple and comprehensible, compilers can sometimes lose sight of one or more of the other criteria. Béjoint (2010) suggests that such a restrictive list of vocabulary may hamper lexicographers in creating a ‘chain of definitions’. He explains, for example, that by following the genus word mammal used in defining cat, dictionary users may expand their vocabulary knowledge; instead, if animal is used as the genus, as restricted by the vocabulary list, ‘the chain is short-circuited’ (Béjoint 2010: 172). Yamada (2010) corroborates the fact that natural and idiomatic English should be used in writing definitions. The abolition of a controlled vocabulary is possible, he affirms, with the gradual ease of referencing facilitated by the electronic interface of dictionaries. Should users face any difficult words in the definition (which a dictionary using a controlled vocabulary claims it may avoid), they can look up the words easily in several clicks and most users would not find that troublesome. Yet, it is noteworthy that all MLDs seemingly have endorsed the benefits of employing a controlled vocabulary in writing definitions by adopting the convention. 
The early resistance by OALD was removed when OALD5 (1995) announced that 3,500 words were used (3,000 in later editions) in writing their definitions. Collins COBUILD Advanced Dictionary of English (2009) uses 3,000 words in its Defining Vocabulary.
2.3.3 Definition writing style
This area has been attracting vigorous debate with the discussion ignited by COBUILD’s introduction of full-sentence definitions in its first edition (1987). The traditional defining style uses short phrases and synonymous words but COBUILD’s style seems to imitate the way a teacher explains meaning to students in a full sentence. Hanks (1987) and Fox (1989) maintained that the new defining style helps, for example, to eliminate distracting defining conventions such as parentheses, archaic words and excessive formulaic expressions. Proponents of the new style, such as Herbst (1996: 326), praised the definitions mainly because they offer good ‘semantic and collocational ranges of the valency complements of a verb’ and ‘avoid the technical character and syntactic clumsiness’. Critics (e.g. Bogaards 1996, Rundell 2006, Cowie 1999, Béjoint 2010), on the other hand, have faulted this style of definition as imprecise, over-specified, wordy, prone to redundancy and space-consuming. They are also concerned that the relatively long definition style could distract users in the look-up process; and
in eliminating old and unnecessary conventions, the full-sentence style has created new obstacles for users. However, studies (e.g. Tickoo 1989, Cumming et al. 1994, Dziemianko 2006) have reported that users prefer full-sentence to traditional definitions, although some which compared students’ performance on given linguistic tasks while using dictionaries with COBUILD’s and traditional defining styles reported inconclusively (e.g. Nesi and Meara 1994, Nesi 1998). Nonetheless, it is a fact that the full-sentence definition style has not yet evolved into the standard convention, though it has been adopted selectively in all MLDs. In all fairness, COBUILD’s defining style broke new ground and drew MLDs further away from the NSD tradition. Yamada (2010) postulates that, in an electronic interface, some of the full-sentence definitions’ weaknesses can be overcome, for example, by offering users a button for selecting definition information based on their look-up needs.
2.4 Researching use
2.4.1 Methodology
There has been significant growth in related user research in pedagogical lexicography since the 1980s (see Nesi, Chapter 5). Such studies, especially those related to L2 learners of English, have been well documented and discussed in various books and articles (e.g. Cowie 1999, Nesi 2000a, Tono 2001, Jackson 2002, Béjoint 2010). Commentary generally points to the conclusion that although much more work has been done than before the 1980s, not many conclusive results have been reached. For example, when commenting on the empirical studies on dictionary use in the past twenty years or so, Lew (2011a: 1) could only suggest that ‘there is no denying that the methodological standards are improving at a steady rate.’ Indeed, questionnaire-based study data have been constantly challenged (e.g. Hatherall 1984, Battenburg 1991, Cowie 1999, Nesi 2000a, Tono 2001, Jackson 2002) mostly on their reliability and accuracy. In Humblé’s (2001) view, the assumptions underlying most of these studies, namely that subjects have lexicographic awareness and knowledge of linguistic concepts and are honest in their reports, could be flawed. The questionnaire methodology was generally employed by researchers (e.g. Tomaszczyk 1979, Béjoint 1981, Kipfer 1987, Taylor and Chan 1994) from the late 1970s to the early 1990s. These studies intended to collect L2 users’ opinions and discover their habits and/or skills in dictionary use (some studies included the use of bilingual and/or bilingualized dictionaries and electronic dictionaries). Data from such studies have confirmed long-standing conjecture (e.g. meaning is the major information category sought) and revealed new data (e.g. MLDs enjoy high prestige but are depressingly underused). Later studies overcame the weaknesses of questionnaire-based research design, complementing it with other methods like written and/or oral protocols (e.g. 
Nuccorini 1992, Thumb 2004, Chan 2012), interviews and discussions (e.g. Cubillo 2002, Chi 2010) and/or linguistic tasks or tests (e.g. Cumming et al. 1994, Harvey and Yuill 1997, Atkins and Varantola 1998, Chi 2003). Research designs incorporating tests or tasks in particular have been widely used in discovering users’ reference skills and in examining correlations between the use of the dictionary and users’
performance in completing linguistic tasks, mostly in reading-comprehension and vocabulary acquisition. However, many share the same view as Nesi (2000a: 54) that: The findings of many of the other studies [on learners of English as a foreign language] are ultimately inconclusive, either because they report on the beliefs and perceptions of dictionary users, rather than on the observed consequences of dictionary use, or because different studies of similar phenomena have resulted in contradictory findings. Most critics draw their conclusion with recommendations for fine-tuning or experimenting with new methodology in data collection to ensure reliability and validity in areas such as test administration (e.g. choice of dictionaries and sampling of subjects), test design (e.g. contrived vs. natural word search environment) and data analysis and representation (e.g. qualitative vs. quantitative). However, it is only fair to point out that the problem identified regarding reliability and validity is universal in empirical research studies across all academic disciplines. The efforts and findings of all the research work should be given credit.
2.4.2 Subjects
Another major concern in these studies is the subjects they employed, including the subject size and nature. Many critics share Hartmann’s (2001: 94) view that ‘The number and scale of user studies is still too small, … The target populations observed are still extremely limited.’ For example, data shown on Tono’s (2001: 43, 51) two tables – Table 3.3 Research on ESL/EFL learners’ reference needs and Table 3.7 Studies on users’ reference skills – indicated that the largest sample in either research area, learners’ reference needs or users’ reference skills, was that of Atkins and Varantola (1998), with 1,140 subjects. Some studies had a very small subject group; for example, Wiegand’s (1985) had just one subject and Ard’s (1982) had two subjects. More current studies of similar focus published in the International Journal of Lexicography also share a small subject group design, as shown in Table 10.1. Besides having small samples, such studies appear only sporadically. There is no denying that these studies have produced insightful data and the qualitative design of some might have justified a small sample size. However, when the reported data of these studies are to be referenced for decisions by a dictionary compiler, which may claim to
Table 10.1 Samples of recent research studies of dictionary users published in the International Journal of Lexicography.
Study                          Subject number and education background
Chon (2008)                    10 students studying at a Korean university
Lew and Doroszewska (2009)     56 students studying at pre-university level in Poland
Chen (2010)                    85 students studying at a Chinese university
Dziemianko (2010)              64 students studying at a Polish university
Nesi and Kim (2011)            124 students studying at a Malaysian university
meet the needs of millions, the number of such studies and their subject sample sizes are hardly substantial. In Hartmann’s (2001: 94) words, ‘the results of various studies are of limited generalisability’. Another question reviewers raised is the significance of findings from user-related studies, since many of these were collected from the self-reports and observations of behaviours and attitudes of groups of L2 learners of English who are inexperienced and relatively young. Reviewers doubt the worth of data obtained from these subjects who have only ‘rudimentary reference skills’ (Cowie 1999) and are, in Béjoint’s (2010: 257) view: Impatient … [and] anything sophisticated, or abstract, or too long, or expressed in codes will be neglected, because the amount of time and energy necessary to find and understand the information is too much compared with the benefit derived from the consultation. Many of them would ‘ditch a tool which requires too much clicking work’ (Lew, Online Dictionaries of English) when they navigate through the menu on the electronic interface for the full treatment of a word under search. Aside from having relatively low-level reference skills and showing limited patience and inquisitiveness in looking up information from a dictionary, many of these students have low English language proficiency or superficial knowledge of vocabulary acquisition (e.g. Nesi and Meara 1994, Tono 2001, Chi 2010). Other studies have found that they use their dictionaries infrequently (e.g. Béjoint 2010). If one accepts that data obtained in the past two decades provide us with insights into the use, reference needs, attitudes and behaviours of L2 users of MLDs, one should also take into consideration the small window in which these research studies have operated; and hence, the limitations of the representativeness of such findings. 
Cowie’s (1999: 187) comment in the following should be kept as a reminder of the urgency of finding new directions in user-related research: There seems little point in trying to assess the ability of students to retrieve information of whose existence they are hardly aware or to judge their performance of activities which they have seldom tackled.
2.5 Researching evaluation
2.5.1 The reviewer
How should we evaluate, and who should be the judge of, a pedagogical dictionary? Naturally, one would think the user should have the last word on how well a dictionary has served them for the information sought. However, the MLD user is typically an L2 learner of English studying the language through a structured curriculum, and he or she often ‘approaches the dictionary within the constraints of his or her own needs and skills, but without necessarily receiving appropriate guidance and/or instruction’ (Hartmann 2001: 25). In other words, the choice of their dictionary may have been made under the constraint of the requirement of school, course or teachers. Indeed, most research studies report that students’ choice of dictionaries for use is mostly determined by their teachers’ recommendation (e.g. Béjoint 1981, Nuccorini 1992, Chi 2003). Relatively young learners may have limited skills in, and knowledge of, the dictionary
when asked to comment on or use it to perform a linguistic task. Béjoint (2010: 230) cautions that: A dictionary that sells well is not necessarily a good dictionary, but it is certainly a dictionary that corresponds to a social need. But the commercial success of dictionaries is not an unambiguous indicator of what the users need. A more promising group of MLD evaluators would be teachers of the MLD target users, the EFL/ESL teachers. They are professionally trained, equipped with pedagogical theories and methodologies in foreign language acquisition, teaching and attending to the linguistic needs of MLD target users first-hand in the classrooms. The teacher is, very likely, the driving force in consulting a dictionary in a structured English learning environment. Moreover, it may be reasonable to assume that teachers are conversant with dictionary use and can give a lucid account of the practicality of the dictionary in their students’ learning process. Surprisingly, however, research concerning English language teachers and dictionary use and evaluation is not common. In many user-related studies, English language teachers were subsumed as subjects alongside language learners. For example, in Nuccorini’s (1992) study, a group of five Italian teachers of English was used to compare with eleven EAP students on their dictionary choice and use. Other studies surveying teachers’ views and/or use of dictionaries include those of Herbst and Stein (1987), Tickoo (1989), Koren (1997), Chi (2003, 2011) and Boonmoh (2010). In Jackson’s (2002: 175) view, In general, reviewers – of books, plays, films, music – are chosen because they are considered knowledgeable or expert in the subject matter or the techniques of whatever it is they are reviewing. We should expect the reviewers of dictionaries to be knowledgeable in lexicography. 
If we follow this criterion, and given that the target users of MLDs are mainly L2 learners of English, academic lexicographers or metalexicographers with a speciality in English language teaching and/or learning should be the ideal reviewers. Indeed, most of the user-related research studies involved the researcher (commonly a metalexicographer or linguist with an English language teaching position at a university) examining his or her own students’ use of dictionaries, either using a questionnaire, or a linguistic performance test, or both. Some metalexicographers conducted comparative reviews examining features of a selected few MLDs based on their own personal judgement or set of criteria (e.g. Dalgish 1995, Allen 1996, Bogaards 1996, Herbst 1996, Scholfield 1999). These were reviews mainly of the editions of the major four MLDs that were published in 1995. Chan and Taylor (2003: 259) found such a review approach more helpful than one which compares a particular dictionary with its earlier edition, since a comparative review seems ‘to lead to a more thorough analysis of the selected features of the dictionaries’ and, thus, is ‘best calculated to provide users (and those who advise users) with a sound basis for making the right choice of dictionary’ (2003: 261). However, they found the 36 reviews they examined ‘primarily factual and descriptive rather than evaluative’ and suggested that they ‘might be better called “book notices” ’ (2003: 267). Béjoint (2010: 228) further adds that, ‘when the reviewers are academics, the reviews are usually better informed but they are more malicious, and on the whole not much more helpful’.
2.5.2 Methodology
Chan and Taylor (2003) reported that rarely did the reviews they examined provide any clear reviewer’s purpose. Moreover, they found that most of the reviews were directed not at end users but at the users’ teachers. They concluded that the evaluation process of those reviews in general was unclear and that the comments made were mostly based on the reviewer’s intuition. Jackson (2002) suggests that reviewers should begin their examination of a dictionary by familiarizing themselves through reading different sections of the dictionary such as the front and back matter, the user’s guide, the preface and the staff and consultant lists. Methods adopted could be random sampling of dictionary entries, conducted by one or a team of reviewers. He (Jackson 2002: 176) asserts that: Team reviews allow a more thorough treatment of each aspect of a dictionary’s lexical description, both by enabling more extensive sampling to be undertaken and by tapping into a reviewer’s specialist interest. He proposes two sources of information, the ‘internal’ and ‘external’, to reference for setting criteria for evaluation. While the ‘internal’ source refers to information obtained from the dictionary provided by the compiler, the ‘external’ source is from metalexicography including linguistic theories regarding the lexicon, dictionary design and production. However, Nielsen (2009: 28) considers adopting a linguistic approach for evaluation inadequate since a dictionary is not ‘just a container of the lexicon of a language’. He proposes a lexicographic approach to dictionary evaluation since it focuses on the significant features of both printed and electronic dictionaries. Such an approach encompasses lexicographic function, data and structure. Reviews adopting such an approach will offer readers information on making various decisions in areas such as the usefulness of the dictionary and the practicality and theoretical development of lexicography. 
A third approach that he raises is the factual approach, which has a focus on ‘an analysis, description and evaluation of the factual (semantic and encyclopaedic) data and topics contained and treated in the dictionary related to the lexicographic functions’ (Nielsen 2009: 29). Hartmann (2001) also advocates the need to establish international standards, including features like coverage, format, scope, size, title and authority for dictionary criticism. In the past, most reviews involved comparing several MLDs, a specific MLD with its earlier edition, MLDs with bilingual and bilingualized dictionaries. With the availability of the MLD on CD-ROM, handheld or pocket-size electronic dictionaries, and free internet access, reviews or comparative studies on printed MLDs and their electronic versions are growing: some examples are Nesi (2003), de Schryver (2003), Chen (2010) and Lew (Online Dictionaries of English).
2.6 The way forward
This chapter has chosen only a few issues in the field of pedagogical lexicography which are of profound significance to the field for review and discussion. For other important issues, such as how computing technology has impacted the compilation of pedagogical dictionaries and the user-related research on electronic dictionaries (including PEDs and MLDs in internet mode), readers may refer to other chapters under ‘Current research and issues’ of this book. In the
following, we shall examine why a major aspect in researching pedagogical lexicography in future lies with the first word of the subject matter: ‘pedagogy’.
2.6.1 Pedagogical lexicography and language teaching
Hartmann and James (1998: 107) define pedagogical dictionaries as dictionaries ‘specifically designed for the practical didactic needs of teachers and learners of a language’. Most of the discussion and research studies in the area to date have been on the latter. There is a wealth of research opportunities in exploring the teaching professional, and analysis of the data would bring fresh insight into the fine-tuning of existing MLDs. We still have little knowledge of EFL teachers’ dictionary-use experience and training, given that their teaching philosophy and classroom teaching approach may have impacted on their students’ choices, attitudes and performance in dictionary use. In the following, an explanation is presented for some useful areas of future research. Dictionary users do not approach an MLD with absolutely no reference skills or linguistic knowledge. L2 learners of English would have, for example, approached the dictionary with years of English language knowledge, learning experience and study skills, mostly obtained through a structured English language syllabus instructed by teachers in a classroom setting. The linguistic scaffolding that students acquired in their formative years would have facilitated their use of the dictionary to tackle linguistic tasks they face. When user-related studies uncover that most users exploit only a narrow range of dictionary items in their consultations, focusing predominantly on meanings in their dictionary search, how should researchers interpret the findings? Currently, most studies would conclude that MLDs should be made more transparent and user-friendly in meeting users’ decoding needs. 
Future research studies including the teacher in the examination may ponder additionally, for example, ‘Has the English language curriculum and/or teaching that users received in the past promoted this narrow usage?’ Other areas of research with a teacher focus could investigate why users are unaware of a range of vocabulary information their dictionaries provide, as in Fan’s (2000) study, where university students who had learned English in a school system for thirteen years before entering university were found not to recognize the vast resources that dictionaries contain, even though they reported high dictionary ownership; or examine why users are unaware of their own reference needs, as Frankenberg-Garcia’s (2011) study revealed, fixating instead only on L1–L2 equivalents and word spelling in their dictionary consultations. An investigation of such failings may begin with a plausible hypothesis that the widely adopted communicative approach in EFL teaching in the past several decades has shifted the foreign language teaching objective from writing to speaking and from semantic and syntactic to communicative competence. In such a teaching environment, dictionaries will not have a strong presence in core teaching. Herbst and Stein (1987: 121) criticize the communicative teaching approach, suggesting that it ‘not only discourages dictionary training but actually runs counter to it’ and that ‘semantic precision, situation appropriateness and grammatical correctness have all too often and too readily been set aside and even discredited’. Swan (2010) warns that since the communicative approach focuses on teaching EFL learners how to use the language for tasks which mirror everyday life, the teaching may neglect language aspects such as lexis, grammar and phonology, which learners would need to master to tackle linguistically demanding tasks like discussion. If syntactic characteristics and terminologies are not explained or referenced in class,
it is likely that few students would consult the dictionary for grammatical information simply out of ignorance of its presence and use. Taking into consideration the incomplete linguistic scaffolding that students are presumed to possess when they consult a dictionary may suggest a different conclusion from that which most current research studies have reached. Findings on students’ linguistic capacity and learning strategies with reference to their past English language study would help lexicographers to gauge the levels and kinds of dictionary information that an MLD should provide in order to be user-friendly and transparent for its users. Hartmann (2001: 120) postulates that dictionary research as an interdisciplinary subject ‘is the direction in which research in metalexicography or dictionary research should proceed’ and pedagogical lexicography in fact emerged from joining the sister discipline language teaching with lexicography. In the case of MLD compilation, the siblings do not seem to be working alongside each other and/or getting equal attention from researchers; instead, many dictionary use decisions have been made by the latter sister following findings obtained directly from the sisters’ shared clients, the L2 learners of English. One possible reason could be inferred from Rundell’s (2007) remark on Dziemianko’s (2006) appeal for dictionary use teaching. He wrote: The iPod comes with almost no instructions – you just have to figure it out, and most people under 30 have no problem with this. So it is incumbent on designers of dictionaries to create systems that users don’t have to learn and that don’t require elaborate explanatory material. 
Such an attitude is confirmed by Yamada’s (2010: 165) supposition that the teachers have been neglected because dictionary compilers are ‘impatient to see the teaching of dictionary use coming in a visible way’ and ‘have gone to great lengths to make their dictionaries accessible to users, in a sense bypassing the teachers’. Rundell’s assumption is flawed, for while it is true that the new generation of MLD users are technology savvy, it does not follow that their reference skills and linguistic knowledge are necessarily better than those of the research subjects examined in the past. The cognitive process and skill required of users, and their expectation of a dictionary consultation for linguistic needs, are not comparable to those involved when an iPod is used in searching for entertainment or news online. Indeed, extra help is needed to direct users to use dictionaries offered in the virtual medium, as Lew (Online Dictionaries of English) concludes in his review of online dictionaries: ‘we have seen that a great variety of dictionaries exist, but without proper guidance users run the risk of getting lost in the riches’. Research findings on how foreign language teaching approaches, curricula and methodologies have impacted on students’ dictionary use and choice would deepen our understanding of users’ needs and, in turn, of the appropriate help that could be offered.
2.6.2 Dictionary use training and the language teacher
In view of users’ generally low-level and infrequent use of the MLD, another solution that has been widely suggested by researchers is to provide training to users (e.g. Atkins and Varantola 1998, Cowie 1999, Chi 2003, Chon 2008, Chen 2010, Yamada 2010, Lew 2011b). Béjoint (1989: 212) contends that,
The teaching of dictionary use is important not because it aims at improving the way dictionaries are used, but also because it might turn out in the long run to be instrumental in the general progress of lexicography. Béjoint produces a checklist of dictionary skills that an ideal user should possess; some skills are essential to all types of dictionary but some require the users to have knowledge of dictionary typology. Nesi (2003) offers a list of dictionary skills that could be taught at university level. Both lists provide valuable suggestions but are rather abstract and may need to be translated into practical teaching methodologies and classroom exercises for practical use. The items also demand that teachers possess lexicographical knowledge which they may not have, because lexicography as a subject is not included in most EFL teaching training courses (e.g. Gates 1997, Chi 2003, 2011). Many researchers suggest integrating dictionary skill teaching into an existing English syllabus, making the skill relevant to students’ immediate study needs because, as revealed in Cubillo’s (2002: 219) study, ‘reference skill acquisition was reinforced by significant purpose of use’, and in Frankenberg-Garcia’s (2011: 121) that ‘it makes more sense to help learners with dictionaries whenever the need for them arises’. Most agree that the training should start early since dictionary use is a skill that requires teaching and practice, and good attitudes should be instilled at a young age. While metalexicographers and lexicographers are absorbed in proving the need for dictionary training and laying out principles on what to teach, the how of teaching and who to teach remain unexplored. First, little research has been conducted exploring the methodology and syllabus for dictionary use teaching and their effectiveness for learners’ performance. 
Béjoint (2010: 260) considers improving the dictionary skills of users difficult since ‘it requires the cooperation of teachers, teaching systems, and governments in many cases, provided the users themselves are ready to be educated’. Second, can we assume English teachers are willing and/or capable partners in training students to use dictionaries? Researchers have expressed concerns with teachers not being willing to let students use dictionaries in the classroom or to complete linguistic tasks like reading comprehension and vocabulary learning (Tono 2001). Hartmann (2001) suggests that there exists a love-hate relationship between teachers and the dictionary: some teachers may be reluctant to train students because they may feel that once students become proficient in using dictionaries to assist learning, the students will no longer need to depend on their teachers. As regards teachers’ ability to teach the subject, Boonmoh’s (2010) survey reveals that many university lecturers of English are unaware of the development and functions of electronic dictionaries and are unwilling to train their students. Even if teachers are willing and capable of providing dictionary training in the classroom, the subject lacks user-oriented teaching material (Nesi 2000a). Bae’s (2015) study is one of the few that aimed at teaching dictionary use to EFL teachers by raising their awareness and enhancing their knowledge of using dictionaries to teach. Bae invited a group of Korean EFL teachers from public schools to attend a four-week in-service teacher training course in dictionary use. Based on Nesi’s (2003) six categories of dictionary skills, Bae identified a list of skills relevant for her trainee-teachers as the core of her training course. 
Findings from the three surveys Bae conducted before, during and after the teaching respectively indicated that the teachers taking part were more interested in knowledge ‘about, rather than of, using dictionaries’ and that ‘explicit teacher training can provide an opportunity where teachers are introduced to the riches of English dictionaries and explore their diverse functions and facilities for learning’ (2015: 63). Chi (2020) proposes that professional lexicographical associations or societies organize short workshops to help EFL teachers attain ‘dictionary literacy for dictionary use teaching’. Such dictionary literacy implies that, in addition to acquiring knowledge of, and skills in, using dictionaries themselves, EFL teachers need dictionary use teaching methodology, plans and materials to help them integrate dictionary use teaching for their designated students and within the constraints of their teaching circumstances. In short, the teaching of dictionary use remains an unexplored area of research. Issues which require investigation and testing include syllabuses for teaching users at various proficiency levels, teaching methodologies, materials and assessment. To prepare such a teaching package, there is a need to identify and benchmark the threshold level for reference skills and English proficiency, if such a level exists at all. A teaching goal of dictionary use training could be to help users reach a level where they can manoeuvre reasonably well (obtaining more successful lookups) in a printed or electronic dictionary to complete the various linguistic tasks in which they are engaged. Atkins and Rundell (2008) explain that the birth of a brand new dictionary starts and ends with the Marketing Department of the publisher concerned; and English teachers often learn about a particular dictionary only via the ESL/EFL marketing personnel. Joint research studies, publications and participation in conferences by researchers from both the language teaching and lexicography disciplines would help pedagogical lexicography’s future development.
It is still very true, as Hulstijn and Atkins (1998: 17) suggest:

Juggling these two scenarios [the educational thinking of giving all information or withholding some until people are ready to understand] is the unenviable task of the lexicographer, understanding what’s going on in their dictionaries and teaching dictionary users to understand this, and adapt to it, calls to the language teacher. Collaborative efforts in dictionary use research are, we believe, the way forward.
References

Allen, R. (1996), ‘The big four’, English Today 46, 12 (2), 41–7. Ard, J. (1982), ‘The use of bilingual dictionaries by ESL students while writing’, Review of Applied Linguistics 58, 1–27. Atkins, B.T. and M. Rundell (2008), The Oxford Guide to Practical Lexicography, New York: Oxford University Press. Atkins, B.T.S. (ed.) (1998), Using Dictionaries: Studies of Dictionary Use by Language Learners and Translators, Tübingen: Max Niemeyer. Atkins, B.T.S. and K. Varantola (1998), ‘Monitoring dictionary use’ in B.T.S. Atkins (ed.), 21–81. Bae, S. (2015), ‘A course in dictionary use for Korean EFL teachers’, Lexicography 2, 45–69. Battenburg, J.D. (1991), English Monolingual Learner’s Dictionaries: A User-oriented Study, Tübingen: Max Niemeyer. Béjoint, H. (1981), ‘The foreign student’s use of monolingual English dictionaries: A study of language needs and reference skills’, Applied Linguistics 2 (3), 207–22.
Béjoint, H. (1989), ‘The teaching of dictionary use: Present state and future tasks’ in F-J. Hausmann, O. Reichmann, H.E. Wiegand and L. Zgusta (eds), Wörterbücher/Dictionaries/Dictionnaires: An International Encyclopedia of Lexicography, Vol. 1, Berlin and New York: Walter de Gruyter, 208–15. Béjoint, H. (2010), The Lexicography of English, Oxford and New York: Oxford University Press. Bogaards, P. (1996), ‘Dictionaries for learners of English’, International Journal of Lexicography 9 (4), 277–320. Bogaards, P. (1998), ‘Scanning long entries in learner’s dictionaries’ in T. Fontenelle, P. Hiligsmann, A. Michiels, A. Moulin and S. Theissen (eds), Actes EURALEXE ’98 Proceedings. Papers Submitted to the Eighth EURALEX International Congress on Lexicography in Liège, Belgium, Vol. 2, Liège, Belgium: University of Liège, English and Dutch Department, 555–63. Boonmoh, A. (2010), ‘Teachers’ use and knowledge of electronic dictionaries’, ABAC Journal 30 (3), 56–74. British Council (2013), ‘The English effect’. https://www.britishcouncil.org/research-policy-insight/ policy-reports/the-english-effect [accessed 24 August 2020]. Chan, A. (2012), ‘The use of a monolingual dictionary for meaning determination by advanced ESL learners in Hong Kong’, Applied Linguistics 33 (2), 115–40. Chan, A.Y.W. and A.J. Taylor (2003), ‘Evaluating learner dictionaries: what the reviews say’ in R.R.K. Hartmann (ed.), 254–73. Also in International Journal of Lexicography (2001) 14 (3), 163–80. Chen, Y. (2010), ‘Dictionary use and EFL learning. A contrastive study of pocket electronic dictionaries and paper dictionaries’, International Journal of Lexicography 23 (3), 275–306. Chi, M.L.A (2003), An Empirical Study of the Efficacy of Integrating the Teaching of Dictionary Use into a Tertiary English Curriculum in Hong Kong Vol IV: Research Reports, Ed. G. James, Hong Kong: Language Centre, Hong Kong University of Science and Technology. 
Chi, M.L.A (2010), ‘Applying formal vocabulary to academic writing: Is the task achievable?’ Reflections on English Language Teaching 9 (2), 171–90. Chi, M.L.A (2011), ‘When dictionaries support vocabulary learning, where to begin?’ in K. Akasu and S. Uchida (eds), ASIALEX Proceedings: Lexicography, Theoretical and Practical Perspectives. Papers Submitted to the Seventh ASIALEX Biennial International Conference, Kyoto, 22–24 August 2011, 76–85. Chi, M.L.A (2020), ‘Reconstructing the lexicographical triangle through teaching dictionary literacy to teachers of English’, Lexicography 7, 79–95. Chon, Y.V. (2008), ‘The electronic dictionary for writing: A solution or a problem?’ International Journal of Lexicography 22 (1), 23–54. Cowie, A.P. (1999), English Dictionaries for Foreign Learners: A History, New York: Oxford University Press. Cowie, A.P. (ed.) (1987), The Dictionary and the Language Learner: Papers from the EURALEX Seminar at the University of Leeds, 1–3 April 1985, Tübingen: Franke Verlag. Coxhead, A. (2000), ‘A new academic word list’, TESOL Quarterly 34 (2), 213–38. Cubillo, M.C.C. (2002), ‘Dictionary use and dictionary needs of ESP students: An experimental approach’, International Journal of Lexicography 15 (3), 206–28. Cumming, G., S. Cropp, and R. Sussex (1994), ‘On-line lexical resources for language learners: Assessment of some approaches to word definition’, System 22 (3), 369–77. Dalgish, G. (1995), ‘Learners’ dictionaries: Keeping the learner in mind’ in B.B. Kachru and H. Kahane (eds), Cultures, Ideologies, and the Dictionary: Studies in Honor of Ladislav Zgusta, Tübingen: Max Niemeyer, 329–38. De Schryver, G.M. (2003), ‘Lexicographers’ dream in the electronic-dictionary age’, International Journal of Lexicography 16 (2), 143–99. Dziemianko, A. (2006), User-friendliness of Verb Syntax in Pedagogical Dictionaries of English (Lexicographica. Series Maior 130), Tübingen: Max Niemeyer. Dziemianko, A. (2010), ‘Paper or electronic? 
The role of dictionary form in language reception, production and the retention of meaning and collocations’, International Journal of Lexicography 23 (3), 257–73.
Fan, M. (2000), ‘The dictionary look-up behaviour of Hong Kong students: A large-scale survey’, Education Journal 28 (1), 123–38. Fox, G. (1989), ‘A vocabulary for writing dictionaries’ in M.L. Tickoo (ed.), 153–71. Frankenberg-Garcia, A. (2011), ‘Beyond L1-L2 equivalents: Where do users of English as a foreign language turn for help?’, International Journal of Lexicography 24 (1), 97–123. Gates, J.E. (1997), ‘A survey of the teaching of lexicography 1979–1995’, Dictionaries 18, 66–93. Gouws, R.H. (2009), ‘Sinuous lemma files in printed dictionaries: Access and lexicographic functions’ in S. Nielsen and S. Tarp (eds), 3–21. Hanks, P. (1987), ‘Definitions and explanations’ in J. Sinclair (ed.), Looking Up, London and Glasgow: Collins, 116–36. Hartmann, R.R.K. (2001), Teaching and Researching Lexicography, Harlow: Pearson Education Limited. Hartmann, R.R.K. (ed.) (2003), Lexicography Critical Concepts, London: Routledge. Hartmann, R.R.K. and G. James (1998), Dictionary of Lexicography, London: Routledge. Harvey, K. and D. Yuill (1997), ‘A study of the use of a monolingual pedagogical dictionary by learners of English engaged in writing’, Applied Linguistics 18 (3), 253–78. Hatherhall, G. (1984), ‘Studying dictionary user: Some findings and proposals’ in R.R.K. Hartmann (ed.), LEXeter ’83 Proceedings: Papers from the International Conference on Lexicography at Exeter, 9-12 September 1983, Tübingen: Max Niemeyer, 183–9. Herbst, T. (1986), ‘Defining with a controlled defining vocabulary in foreign learners’ dictionaries’, Lexicographica 2, 101–19. Herbst, T. (1996), ‘On the way to the perfect learners’ dictionary: A first comparison of OALD5, LDOCE3, COBUILD2 and CIDE’, International Journal of Lexicography 9 (4), 321–57. Herbst, T. and G. Stein (1987), ‘Dictionary-using skills: A plea for a new orientation in language teaching’ in A.P. Cowie (ed.), 115–27. Hulstijn, J. and B.T.S. 
Atkins (1998), ‘Empirical research on dictionary use in foreign-language learning: Survey and discussion’ in B.T.S. Atkins (ed.), 7–19. Humblé, P. (2001), Dictionaries and Language Learners, Frankfurt am Main: Haag und Herchen. Jackson, H. (2002), Lexicography: An Introduction, London: Routledge. Kernerman, I.J. and P. Bogaards (eds) (2010), English Learners’ Dictionaries at the DSNA 2009, Tel Aviv: K Dictionaries. Kipfer, B.A. (1987), ‘Dictionaries and the intermediate student: Communicative needs and the development of user reference skills’ in A.P. Cowie (ed.), 44–54. Kokawa, T. and S. Yamada (1998), ‘Review of Della Summers (ed.), Longman Dictionary of English Language and Culture, Harlow: Longman, 1992; Jonathan Crowther (ed.), Oxford Advanced Learner’s Dictionary of Current English, Encyclopedic Edition, Oxford: OUP, 1992’, International Journal of Lexicography 11 (4), 343–57. Koren, S. (1997), ‘Quality versus convenience: Comparison of modern dictionaries from the researcher’s, teacher’s and learner’s points of view’, TESL-EJ 2 (3). http://tesl-ej.org/ej07/a2.html [accessed 30 July 2012]. Landau, S.I. (1989), Dictionaries: The Art and Craft of Lexicography, Cambridge: Cambridge University Press. Lew, R. (2011a), ‘Online dictionaries of English’, published in P.A. Fuertes-Olivera and H. Bergenholtz (eds) (2011), E-Lexicography: The Internet, Digital Initiatives and Lexicography, London/New York: Continuum, 230–50. http://hdl.handle.net/10593/742 [accessed 30 July 2012]. Lew, R. (2011b), ‘Studies in dictionary use: Recent developments’, International Journal of Lexicography 24 (1), 1–4. Lew, R. and J. Doroszewska (2009), ‘Electronic dictionary entries with animated pictures: Lookup preferences and word retention’, International Journal of Lexicography 22 (3), 239–57. McArthur, T. (1989), ‘The background and nature of ELT learners’ dictionaries’ in M.L. Tickoo (ed.), 52–64. MacFarquhar, P.D. and J.C. 
Richards (1983), ‘On dictionaries and definitions’, RELC Journal 14 (1), 111–24.
Nesi, H. (1987), ‘Do dictionaries help students write?’ in T. Bloor and J. Norrish (eds), Written language: British Studies in Applied Linguistics 2, London: Centre for Information on Language Teaching and Research, 85–97. Nesi, H. (1998), ‘Defining a shoehorn: The success of learners’ dictionary entries for concrete nouns’ in B.T.S. Atkins (ed.), 159–78. Nesi, H. (2000a), The Use and Abuse of EFL Dictionaries (Lexicographica. Series Maior 98), Tübingen: Max Niemeyer. Nesi, H. (2000b), ‘Electronic dictionaries in second language vocabulary comprehension and acquisition: The state of the art’ in U. Heid, S. Evert, E. Lehmann and C. Rohrer (eds), Proceedings of the Ninth EURALEX International Congress, EURALEX 2000 Stuttgart, Vol. 2, 839–47. Nesi, H. (2003), ‘The specification of dictionary reference skills in higher education’ in R.R.K. Hartmann (ed.), 370–93. Also in R.R.K. Hartmann (ed.) (1999), Dictionaries in Language Learning, Berlin: Free University/FLC/TNP, 53–67. Nesi, H. and H.T. Kim (2011), ‘The effect of menus and signposting on the speed and accuracy of sense selection’, International Journal of Lexicography 24 (1), 79–96. Nesi, H. and P. Meara (1994), ‘Patterns of misinterpretation in the productive use of EFL dictionary definitions’, System 22 (1), 1–5. Nielsen, S. (2009), ‘Reviewing printed and electronic dictionaries: A theoretical and practical framework’ in S. Nielsen and S. Tarp (eds), 23–41. Nielsen, S. and S. Tarp (eds) (2009), Lexicography in the 21st Century, Amsterdam: John Benjamins. Nuccorini, S. (1992), ‘Monitoring dictionary use’ in H. Tommola, K. Varantola, T. Salmi-Tolonen and J. Schopp (eds), EURALEX ’92 Proceedings I–II. Papers submitted to the 5th EURALEX International Congress on Lexicography in Tampere, Finland: University of Tampere, 89–102. Reif, J. A. (1987), ‘The development of a dictionary concept: An English learner’s dictionary and an exotic alphabet’ in A.P. Cowie (ed.), 146–58. Rundell, M. 
(1988), ‘Changing the rules: Why the monolingual learner’s dictionary should move away from the native-speaker tradition’ in M. Snell-Hornby (ed.), ZüriLEX’86 Proceedings. Papers Read at the Euralex International Congress 1986, Tübingen: Franke Verlag, 127–37. Rundell, M. (1998), ‘Recent trends in English pedagogical lexicography’, International Journal of Lexicography 2 (4), 315–42. Rundell, M. (2006), ‘More than one way to skin a cat: Why full-sentence definitions have not been universally adopted’ in E. Corino, C. Marello and C. Onesti (eds), Proceedings of the XII EURALEX International Congress at Università di Torino, 323–38. Rundell, M. (2007), ‘Review of: Dziemianko, A. (2006), User-friendliness of verb syntax in pedagogical dictionaries of English, Tübingen: Max Niemeyer’, Kernerman Dictionary News 15. http:// kdictionaries.com/kdn/kdn15/kdn1507-rundell.html [accessed 30 July 2012]. Rundell, M. (2010), ‘What future for the learner’s dictionary?’ in I.J. Kernerman and P. Bogaards (eds), 169–75. Scholfield, P. (1999), ‘Dictionary use in reception’, International Journal of Lexicography 12 (1), 13–34. Stein, G. (1979), ‘The best of British and American lexicography’, Dictionaries 1, 1–23. Swan, M. (2010), Oxford Advanced Learner’s Dictionary: Foreword, Oxford: Oxford University Press. Taylor, A. and A. Chan (1994) ‘Pocket electronic dictionaries and their use’ in W. Martin, W. Meijs, M. Moerland, E. Ten Pas, P. Van Sterkenburg and P. Vossen (eds), EURALEX 1994 Proceedings, Amsterdam: Vrije Universiteit, 598–605. Thumb, J. (2004), Dictionary Look-up Strategies and the Bilingualised Learner’s Dictionary, Tübingen: Max Niemeyer. Tickoo, M.L. (1989), ‘Which dictionaries and why? Exploring some options’ in M.L. Tickoo (ed.), 184–203. Tickoo, M.L. (ed.) (1989), Learners’ Dictionaries: State of the Art, Singapore: SEAMEO Regional Language Centre. Tomaszczyk, J. (1979), ‘Dictionaries: Users and uses’, Glottodidactica 12, 103–19.
Tono, Y. (2001), Research on Dictionary Use in the Context of Foreign Language Learning (Lexicographica. Series Maior 106), Tübingen: Max Niemeyer. Tono, Y. (2011), ‘Application of eye-tracking in EFL learners’ dictionary look-up process research’, International Journal of Lexicography 24 (1), 124–53. Van Der Meer, G. (2002), ‘Dictionary entry and access. Trying to see trees and woods’, EURALEX 2002 Proceedings (digital version). http://www.euralex.org/elx_proceedings/Euralex2002/055_2002_V2_ Geart%20van%20Der%20Meer_Dictionary%20Entry%20and%20Access%20Trying%20to%20 see%20Trees%20and%20Woods.pdf [accessed 30 July 2012]. Van Der Meer, G. and R. Sansome (2001), ‘OALD6 in a linguistic and a language teaching perspective’, International Journal of Lexicography 14 (4), 283–306. Wiegand, H.E. (1985), ‘Fragen zur Grammatik in Wörterbuchbenutzungsprotokollen. Ein Beitrag zur empirischen Erforschung der Benutzung einsprachiger Wörterbücher’ in H. Bergenholtz and J. Mugdan (eds), Lexikographie und Grammatik, Tübingen: Max Niemeyer, 20–98. Yamada, S. (2009), ‘EFL dictionaries on the web: Students’ appraisal and issues in the Cambridge, Longman, and Oxford dictionaries’ in B.Y.V. Ooi, A. Pakir, I.S. Talib and P.K.W. Tan (eds), Perspectives in Lexicography: Asia and Beyond, Tel Aviv: K Dictionaries, 87–104. Yamada, S. (2010), ‘EFL dictionary evolution: Innovations and drawbacks’ in I.J. Kernerman and P. Bogaards (eds), 147–68.
11
Monolingual learners’ dictionaries – past and future¹

Shigeru Yamada
1 Introduction

The EFL dictionary can be said to be a delicate answer to the negotiation of various factors, demands and considerations by and of the ‘protagonists’ (compilers, users, teachers and researchers) (Hartmann 2001: 24–7). Since the negotiation first bore fruit in ISED² in 1942, it has continued incessantly and at various speeds, producing more fruit. There have been shifts in emphasis, momentum and fashion. Major technological advances have emerged, which have affected and changed the methods of lexicography and the shape of dictionaries. This chapter deals with EFL dictionaries in the narrow sense – monolingual English dictionaries for foreign students of English – the most advanced and influential genre not only of learners’ dictionaries but also of all dictionary types. In an attempt to provide an overview of the EFL dictionary, we will look first at the developmental stages that those dictionaries went through and then at seven of their important innovations and features. The implications of the digital medium for the presentation and consultation of a dictionary will also be touched upon (see also Chi, Chapter 10).
2 Development of EFL dictionaries

The important dictionaries from major publishers, almost all for advanced learners, are laid out in Table 11.1. We will divide the evolution of EFL dictionaries into five stages and characterize each as indicated in its section title. We will look at each stage with reference to the associated dictionaries and their innovations and features.
Table 11.1 Development of Mainstream EFL Dictionaries.

Kaitakusha/OUP: ISED (1942), ALD2 (1963), OALD3 (1974), OALD4 (1989), OALD5 (1995), OALD6 (2000), OALD7 (2005), OALD8 (2010), OALD9 (2015), OALD10 (2020)
Longman: LDOCE1 (1978), LDOCE2 (1987), LDOCE3 (1995), LDOCE4 (2003), LDOCE5 (2008), LDOCE6 (2014)
Chambers/CUP: CULD (1980), CIDE (1995), CALD2 (2005), CALD3 (2008), CALD4 (2013)
Collins: COBUILD1 (1987), COBUILD2 (1995), COBUILD3 (2001), COBUILD4 (2003), COBUILD5 (2006), COBUILD6 (2009), COBUILD7 (2012), COBUILD8 (2014), COBUILD9 (2018)
Macmillan: MED1 (2002), MED2 (2006)
Merriam-Webster: MWALED1 (2008), MWALED2 (2016)

2.1 Prelude and the first period (1942–73): Beginning and monopoly

The prototypical EFL dictionary materialized in ISED. The dictionary is thought to embody the best practice of English-speaking scholars engaged in English language teaching in Japan in the 1920s–40s, who drew ideas from Europe and India as well as from their dealings with Japanese students.

In Bengal, India, Michael West pursued a reading-centred approach, creating graded readers whose vocabulary items were carefully controlled. In 1935, he and James Endicott produced a companion dictionary to his textbooks – NMED. It characteristically employs a limited defining vocabulary of 1,490 words with which to define 18,000 words and 6,000 idioms (Preface, iii). In 1934, the Carnegie Corporation Conference was held on the initiative of West. It eventually led, through the Interim Report on Vocabulary Selection (1936), to the publication of his influential General Service List of English Words (1953).

In 1921, Harold E. Palmer was invited by the Ministry of Education to reform Japan’s ELT and stayed in the country until 1936. GEW was published in 1938. Although alphabetical, this precursor to EFL dictionaries is geared to the user’s productive needs, especially in composition (Introduction; iii, v–vi), rather than their receptive needs (cf. Cowie 1999: 36). With some 1,000 headwords, the dictionary meticulously provided verb patterns with abundant examples, although definitions were not regularly given for each sense (Introduction, ix). GEW’s method of indicating verb patterns had a far-reaching influence on EFL dictionaries to come (Cowie 1999: 37). The dictionary ‘left its mark on Hornby’s Idiomatic and Syntactic English Dictionary … and helped its strong “productive” character’ (ibid. 3).

A. S. Hornby came to Japan in 1924. In 1936 Palmer left Japan in the middle of the dictionary project, after compiling the headword list (Imura 1997: 209). The unfinished groundwork towards the dictionary (revision and enlargement of the selection of collocations, finalization of 25 verb patterns) was taken over, refined and brought to a workable level by Hornby, together with E. V. Gatenby and H. Wakefield (cf. Naganuma 1978: 11–12). The manuscript was completed as a result of five years’ concerted endeavour (Imura 1997: 236). ISED is a small book by today’s standards of EFL dictionaries in terms of both volume and content (1,519 pages with about 30,000 entries). Hornby seems to have consciously kept the vocabulary size as it was, since he intended his dictionary for pre-university EFL students both for decoding and encoding (and continuing at the university level and above as far as production is concerned) (Introduction, iv). Pronunciation is indicated by means of the IPA. McArthur (1992: 593–4) summarizes the characteristics of ‘Hornby’s dictionary’ as follows:

1. Headwords chosen because experienced teachers believed they were the most useful to foreign learners.
2. The omission of archaic usage, historical and literary references, and etymology.
3. The provision of the pronunciation of each headword in IPA transcription derived from Jones’s EPD.
4. Meanings given in simple language, avoiding the often convoluted and Latinate constructions used in many mother-tongue dictionaries.
5. Meanings explained by definitions and specimen phrases and sentences to show the headword in use.
6. Grammatical information on every headword provided, including codes referring to the syntactic patterns of all listed verbs.
7. Illustrations providing further information and serving to break up the text.
8. Language-related appendices at the back of the book.

All of these, a few in modified form, underlie the basic principles of contemporary EFL dictionaries. ISED can be considered to be the best mix put together by English-speaking scholars engaged with ELT in Japan: IPA, research into verb patterns, vocabulary limitation, and collocation, and the editors’ expertise in teaching EFL students. Hornby and his colleagues should be highly praised for making the next-to-impossible possible: reconciling two opposing purposes, and combining ‘simplicity’ (in definitions, specimens of usage and language notes) with ‘complexity’ (in the phonetic transcriptions and grammatical codes) (McArthur 1992: 594) in 1,519 pages written in the foreign language, in a way accessible to the target users.
2.2 The second period (1974–86): Rivalry and sophistication

During this period, much effort was invested in incorporating additional information, including the results of linguistic research. This was realized by means of grammar codes and ‘dictionaryese’ to make maximum use of the limited dictionary space. With Longman entering into rivalry with Oxford, the EFL dictionary market became competitive.
LDOCE1 brought in a number of key features and improvements. The most influential is its defining vocabulary. It is a selection of approximately 2,000 words, based on A General Service List of English Words (1953). LDOCE1 attempted to provide a detailed account of English usage. The Survey of English Usage (SEU) was frequently referred to and many example phrases and sentences were drawn from the collection (LDOCE1, Preface, xxvi). Grammatical information was primarily based on A Grammar of Contemporary English (1972) (McArthur 1992: 594), which is also based on the SEU. Detailed grammar codes were given and extended to nouns and adjectives. A conscious effort was made to describe the pronunciation and vocabulary of American English. Stress shift was indicated. Four hundred eighty-eight usage notes were incorporated. OALD3 had broken verb patterns down into 51, compared to ISED’s 33. The coding system of LDOCE1 was more ‘user-friendly’ than that of OALD3 because at least part of the former was mnemonic (Cowie 1990: 688). However, it was pointed out that ‘there is a real danger of opening the gap which is known to exist between the sophistication of some features of dictionary design and the user’s often rudimentary reference skills’ (Cowie 1981: 206). The intermediate student’s CULD took full account of this: conservative, plain and simple, giving minimum essential information (Takebayashi et al. 1982: 108–9) without using any complicated codes or language.
2.3 The third period (1987–94): Competition and versatility

The year 1987 saw the publication of COBUILD1 and the revision of LDOCE. The former is ‘revolutionary’ in the history of lexicography. The Editor-in-Chief, John Sinclair, states in the Introduction (xv) that the dictionary differs from others in the kind of information it offers, its quality, and its presentation. Breaking away from the prevalent method of depending on the introspection of lexicographers and on earlier dictionaries, the COBUILD team based their dictionary on a 7.3-million-word corpus. This is by far the single most important innovation (cf. Rundell 2006b: 741). ‘Usage cannot be invented, it can only be recorded’ (Introduction, xv). With the purpose proclaimed on the dust jacket of ‘HELPING LEARNERS WITH REAL ENGLISH’, absolute editorial faith was placed in corpus evidence, and frequency information dictated the selection of headwords, the discrimination and arrangement of senses, and the identification of grammatical and lexical patterns. Example sentences were basically drawn from the corpus rather than being invented for the purpose of the dictionary. Definitions were uniformly written in ordinary prose – the full sentence definition (FSD) – without recourse to special dictionary conventions or a restricted defining vocabulary. Grammar codes were aligned in the Extra Column to the right of the main entry, and the grammatical terms were explained in boxes placed at their alphabetical positions. Information on meaning relations (synonymy [indicated by ‘=’], antonymy [≠], and hyponymy [⇑]) was also presented in the Extra Column. Pronunciation was represented in a unique way, with superscript numbers indicating the range of variation of vowels and some consonants (Kojima et al. 1989: 152). COBUILD1 will be remembered as an important milestone in the history of lexicography. The epoch-making dictionary is not free from criticism (see Section 3.6 Examples, for instance).³ However, it is significant that the dictionary introduced a new look at the language and fresh perspectives on lexicography with the powerful corpus tool.
The revision of LDOCE was undertaken considering the reactions of users to the first edition, academic reviews, and the publisher’s international user research. As a result, the usefulness of the defining vocabulary was confirmed. The complicated grammar codes were replaced with more transparent ones (see Table 11.2). The Longman Citation Corpus was consulted and many examples were based on the corpus. Special attention was given to collocations, which were indicated in bold in the examples. In addition to 471 Usage Notes, 20 Language Notes were newly incorporated to give guidance on pragmatics in particular. Two years later, in 1989, OALD was revised, using the conventional method. The layout was greatly improved. The indication of verb patterns was dramatically changed. The patterns were reduced from 51 in OALD3 to 32. The complicated alpha-numeric codes were replaced with mnemonic ones (see Table 11.2) together with the indication of complementation. The definitions were made easier (Takahashi et al. 1992: 196) but remained difficult, as no defining vocabulary was used. The definitions offered abundant information on selectional restrictions and the examples showed many collocations. Two hundred ‘Notes on Usage’ were newly incorporated. The location of stress was indicated on all compounds and basically all idioms (ibid. 78–80). The EFL dictionary scene in this period was vibrant, with three distinct dictionaries competing head-to-head: the conservative OALD4 (introspective, with phrase definitions and invented examples) and the revolutionary COBUILD1 (subjective, with FSDs and corpus-based examples) at the extremes and the moderate LDOCE2 (linguistic databases referred to and a defining vocabulary employed) in the middle. The rich variety offered users choice and the opportunity to learn from comparison (Yamada 2010: 150).
2.4 The fourth period (1995 onwards): Check and convergence

The year 1995 witnessed the revision or publication of the ‘Big Four’ EFL dictionaries: CIDE, COBUILD2, OALD5 and LDOCE3 (in order of publication). Macroscopically, it is in this period that the genre of EFL dictionaries turned in the direction of ‘convergence’ (Rundell 2006b) in the name of a corpus basis and user-friendliness. All four dictionaries claim a corpus basis. In competition with the COBUILD corpus, a balanced corpus, the British National Corpus, was developed, on which OALD5 and LDOCE3 were based. It was a natural course that frequency information took precedence and that corpus-based examples became dominant (see Section 3.6 Examples, for detail). Informed partly by the results of a growing number of user studies, user-friendliness was pursued in the direction of ‘easier access and more lucid presentation’ (McArthur 1992: 594). Navigational aids (‘guide words’ (CIDE⁴) and ‘signposts’ (LDOCE3)) were newly incorporated (see Section 3.3 Signposts and menus, for detail). To increase lucidity, the idea of a defining vocabulary (initiated by LDOCE1) and the FSD (by COBUILD1) spread to other dictionaries. These features are not free from problems in themselves, as detailed below (3.4, Defining vocabulary and 3.5, Defining style). It appears that user-friendliness has gone too far, considering the intended audience – advanced students and teachers. Codes and abbreviations came to be spelled out (e.g. see Table 11.2). Especially in the print medium, it is problematic that lucidity was realized in a space-wasting manner, involving repetition and redundancy.
The above may be too simplistic an observation, but it cannot be denied that EFL dictionaries have come closer to each other than ever before in information content and structure. Apart from the obvious fact that the corpus is an influential, huge game changer, dictionaries tend to jump on the bandwagon of rivals’ successful features, which has unfortunately deprived the dictionary scene of its pre-1995 variety and individuality. Undoubtedly, there have been incremental improvements and revisions (which are of crucial importance), but, overall, mould-breaking innovations are long overdue. With shortening revision intervals, the differences between two consecutive editions have been diminishing.⁵
2.5 The fifth period (1990s–): Going digital

Following the corpus revolution (since 1981), Rundell (2012b) speaks of a second revolution in dictionaries: the print-to-digital transition. It started slowly from the early 1990s with dictionaries on CD-ROMs and handheld devices (‘changes are mostly cosmetic’ [ibid.]) and has rapidly accelerated since about 2008 with the Web assuming the central role and the rise of mobile devices.6 Rundell (2012a) considers that the second revolution’s ‘consequences could be even more far-reaching’. While the corpus revolution has profound but mostly ‘internal’ effects, affecting lexicographers’ working methods but without changing the format of the dictionary itself, the second revolution has ‘external’ effects, changing the way information is published and accessed (Rundell 2012b). On 5 November 2012, Macmillan announced that they would no longer publish print dictionaries from 2013, shifting their focus to digital resources.7 The news spread around the world, meeting with mixed responses. Considering practical utility, Rundell, editor-in-chief of MED, strongly argues for the digital medium as the dictionary’s best platform:

Not only do we now have the tools to broaden coverage and keep dictionaries up to date, but the multimedia, hyperlinking, and interactive features of the Web open up opportunities for improving our descriptions of language and extending the scope of what dictionaries do. (Rundell 2013: 3)

Macmillan Dictionary Online provides not only an English dictionary and thesaurus8 but also such resources as blogs, ‘BuzzWord’ columns, and the crowd-sourced ‘Open Dictionary’. In fact, a majority of learner’s dictionary sites offer additional resources and functions to the user’s reference advantage, including other dictionaries (native speakers’, bilingual and specialized9), thesauri, grammar, collocation, translation, blogs, quizzes, videos and Word of the Day.
As evidence of continuous updating, OALD Online lists newly added entries and senses under ‘NEW entries added’ on its front page. The Cambridge Dictionary site shows candidates to be added to the dictionary under ‘New words’ and takes a vote among users whether to include the item or not. Rundell (2012a) sees the potential and value of crowd-sourcing in dictionary making in getting hold of neologisms, regionalisms and technical terms. The Merriam-Webster Learner’s Dictionary site provides opportunities for user participation with ‘Ask the Editor’ and ‘Comments & Questions’.
Migrating to digital media, EFL dictionaries differentiate information and functions. For example, OALD10 (2020) provides the following only in its premium online version and app:10, 11

Web and app: ‘More like this’ columns, Word of the Day
Web only: Oxford iSpeaker, Oxford iWriter, Spell Checker, My Word Lists, Oxford Phrase List, Resources, Text Checker
App only: Favourites, Quiz
(Yamada 2020: 4)
3 Important innovations and features

This section deals with seven important information categories and features of EFL dictionaries: frequency information, grammar, signposts and menus, defining vocabulary, defining style, examples and efforts to go beyond the alphabetical list. (See also Kilgarriff, Chapter 7 and Lew, Chapter 15.)
3.1 Frequency information

After the corpus-based COBUILD1, frequency information has been extensively used throughout the compilation stages. Through the analysis of corpus material, sufficiently frequent vocabulary items, meanings and patterns are identified, entered and arranged in frequency order.12 However, in the finished product, a considerable amount of the information is inevitably lost in the conventional linear style of presentation. There is no knowing how frequent the first sense of a word is in comparison with the second, for example. Conscious efforts have been made to overcome this problem. COBUILD2 introduced the five-level Frequency Bands to indicate the relative importance of words. The most important 14,700 or so words are given black diamonds in the Extra Column, according to their frequency in the Bank of English.13 Now based on the Longman Corpus Network, LDOCE3 presented frequency information in two ways. First, the dictionary distinguishes between written and spoken English and indicates the most frequent 3,000 headwords in three levels in each of the categories. For instance, reasonable is indicated with ‘S1’ and ‘W2’ in the margin (the former on top of the latter), meaning the word is among the most frequently used 1,000 words of spoken English and among the second most frequently used 1,000 words of written English. Second, over 150 eye-catching bar graphs are introduced to represent relative frequencies of synonyms, grammatical and collocational patterns, and the distribution of words between spoken and written media and between British and American English. OALD7 introduced the Oxford 3000™, which has a double function: as a defining vocabulary and as a starting point for vocabulary expansion (R99).
The 3,000 words were selected on three grounds: frequency from the analysis of corpora, range of use in different text types and familiarity to most users of English; the British National Corpus, the Oxford Corpus Collection, and over 70 experts in teaching and language study were consulted (ibid.). The words of the Oxford 3000 are shown in larger type and with a key symbol. It is noteworthy that, in addition to headwords, the key senses are given a small key symbol in OALD8.14
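The spoken and written frequency markers of LDOCE3 described earlier in this section follow from a word's rank in each frequency list. The Python sketch below is purely illustrative: the function names and the rank figures are hypothetical, and it assumes only what is stated above, namely three bands of 1,000 words each for spoken (S) and written (W) English.

```python
from typing import List, Optional


def band(rank: Optional[int]) -> Optional[int]:
    """Map a 1-based frequency rank to band 1, 2 or 3; None if outside the top 3,000."""
    if rank is None or not 1 <= rank <= 3000:
        return None
    return (rank - 1) // 1000 + 1


def frequency_markers(spoken_rank: Optional[int], written_rank: Optional[int]) -> List[str]:
    """Build margin markers such as ['S1', 'W2'], spoken first."""
    markers = []
    for prefix, rank in (("S", spoken_rank), ("W", written_rank)):
        b = band(rank)
        if b is not None:
            markers.append(f"{prefix}{b}")
    return markers


# Hypothetical ranks, chosen so that the result matches the markers cited for 'reasonable'.
print(frequency_markers(spoken_rank=800, written_rank=1500))  # ['S1', 'W2']
```

A word outside the top 3,000 in either list simply receives no marker for that medium, which is one plausible reading of the scheme rather than a documented rule.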
For OALD10, the Oxford 3000 was selected for frequency (based on the Oxford English Corpus of two billion words) and for relevance (on ‘a specially created corpus of Secondary and Adult English courses’ published by OUP) (x). The Oxford 3000 is for learners up to B2/upper-intermediate level on the Common European Framework of Reference (CEFR) (ibid.). The Oxford 5000™, an additional 2,000 words at B2–C1 level, was introduced as a target for more advanced learners to expand their vocabulary (ibid.). A key plus symbol is provided for relevant headwords and senses. In OALD10, the headwords in the Oxford 3000 and Oxford 5000 are indicated with the CEFR grades (the former with A1–B2 and the latter with B2 or C1) (x). CALD4 (2013) was the first to provide the CEFR labels for important words, meanings and phrases (Introduction, ix).15

Nation (2001) suggests that after learning the basic 2,000–3,000 words, learners should turn their attention to more specialized vocabulary across several fields, for example, academic English, which is high frequency vocabulary worth studying for students of English for academic purposes. There are dictionaries geared towards these needs: e.g. MSD (2004), LSDAE1 (2006), CACD (2009) and OLDAE (2014). Along with CACD, other EFL dictionaries depend on the Academic Word List by Coxhead, labelling the words included in the list. This method was initiated by LED (2006) (Dohi et al. 2010: 98) and was followed by LDOCE5 (2009), OALD8 (2010), etc.16

Using the Oxford Corpus of Academic English (71 million words) and the British Academic Spoken English corpus (1.2 million words), OUP developed their own Oxford Phrasal Academic Lexicon™ (OPAL), ‘a collection of four different word lists that together provide an essential guide to the most important words to know in the field of English for Academic Purposes’:17

Written words: 12 sublists (1,200 words)
Spoken words: 6 sublists (600 words)
Written phrases: 15 functional areas (370 phrases)
Spoken phrases: 16 functional areas (250 phrases)

The OPAL is included in the online version of OALD10, from which the items can be consulted in OLDAE. In the print OALD10, the OPAL words are labelled in the following way: those in the list of written words as ‘W’, those in the list of spoken words as ‘S’ and those in both as ‘O’.18
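The print labelling of OPAL words amounts to a simple membership test on the written and spoken word lists. As a minimal sketch in Python, assuming two hypothetical sets that stand in for those lists (the sample words are invented for illustration and are not taken from OPAL), the label could be derived as follows:

```python
from typing import Optional

# Hypothetical stand-ins for the OPAL written and spoken word lists;
# the members are illustrative only, not actual OPAL data.
OPAL_WRITTEN = {"analysis", "data", "significant"}
OPAL_SPOKEN = {"basically", "data", "okay"}


def opal_label(word: str) -> Optional[str]:
    """Return 'W', 'S' or 'O' following the print OALD10 convention, else None."""
    in_written = word in OPAL_WRITTEN
    in_spoken = word in OPAL_SPOKEN
    if in_written and in_spoken:
        return "O"  # in both lists
    if in_written:
        return "W"  # written list only
    if in_spoken:
        return "S"  # spoken list only
    return None     # not an OPAL word


for w in ("data", "analysis", "okay", "cat"):
    print(w, opal_label(w))
```

The sketch covers single words only; the two OPAL phrase lists would need matching on multi-word units, which is left out here.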
3.2 Grammar

Presentation of grammatical information has undergone a drastic change in the course of the development of EFL dictionaries – from opaque to transparent: codes through abbreviations to spell-outs. Table 11.2 chronologically sets out how major EFL dictionaries indicate the pattern of ‘want+object+to-infinitive’. The roots of grammar codes can be traced back to Palmer’s GEW,19 a precursor to EFL dictionaries. The scheme of this productively orientated dictionary ‘was later to be applied, with minor or major variations, in the first four editions of ALD and in various rival compilations’ (Cowie 1999: 37). As can be seen in Table 11.2, however, the ‘descriptively powerful’ model was essentially ‘difficult to learn’ (Rundell 2006b: 740). Improving upon OALD’s, LDOCE1 adopted a more systematic20 and thus ‘user-friendly’ coding system. As this system did not win popularity among users (‘General Introduction’, F9),21 LDOCE2 turned sharply to transparency by abandoning the codes for grammatical patterns.22
Table 11.2 Indication of ‘want+object+to-infinitive’ in Major EFL Dictionaries.*

OALD
  ISED (1942): vt. & i. ❷ (P … 3)
  ALD2 (1963): v.t. & i. 2. (VP … 3)
  3/e (1974): vt, vi 2 [VP … 17]
  4/e (1989): v 1 [ … Tnt no passive …]
  5/e (1995): v 1 … [V.n to inf] She wants me to go with her.
  6/e (2000): verb (not usually used in the progressive tenses) … 1 … [VNtoinf] Do you want me to help?
  7/e (2005): verb (not usually used in the progressive tenses) … 1 … [VN to inf] Do you want me to help?
  8/e (2010): verb [T] (not usually used in the progressive tenses) … 1 … ~ sb/sth to do sth Do you want me to help?
  9/e (2015): verb*** [T] (not usually used in the progressive tenses) … 1 … ~ sb/sth to do sth Do you want me to help?
  10/e (2020): verb (not usually used in the progressive tenses) … 1 … ~ sb/sth to do sth Do you want me to help?

LDOCE
  1/e (1978): v [Wv6] 1 [ … V3 … ]
  2/e (1987): v [not usu. in progressive forms] 1 [T] … [+obj+to-v] He wants you to wait here**.
  3/e (1995): v [not usually in progressive] 1 … [T] … want sb to do sth I don’t want Linda to hear about this.
  4/e (2003): v [not usually in progressive] 2 … [T] … want sb to do sth I want you to find out what they’re planning.
  5/e (2008): v [not usually in progressive] 2 … [T] … want sb to do sth I want you to find out what they’re planning.
  6/e (2014): 2 … [T] … want sb to do sth I want you to find out what they’re planning.

COBUILD
  1/e (1987): 2 … V + O + to-INF: NO IMPER …
  2/e (1995): 1 … VB: no cont, no passive … V n to-inf …
  3/e (2001): 1 … VB: no cont, no passive … V n to-inf …
  4/e (2003): 1 … VERB: no cont, no passive … V n to-inf …
  5/e (2006): 1 … VERB: no cont, no passive … V n to-inf …
  6/e (2009)**: 1 VERB [no cont, no passive] … [v n to-inf] They began to want their father to be the same as other daddies.
  7/e (2012): 1 VERB [no cont, no passive] … [v n to-inf] They began to want their father to be the same as other daddies.
  8/e (2014): 1 VERB [no cont, no passive] … [v n to-inf] They began to want their father to be the same as other daddies.
  9/e (2018): 1 VERB [no cont, no passive] … [v n to-inf] They began to want their father to be the same as other daddies.

CALD
  CULD (1980): (not usu used with is, was etc and -ing (defs 1, 2): not used with is, was etc and -ing (defs 3, 4)) 1 vt
  CIDE (1995): want (obj) … v … Do you want me to take you to the station? [T + obj + to infinitive]
  2/e (2005): verb [T] 1 … [+ obj + to infinitive] Do you want me to take you to the station?
  3/e (2008): verb [T] 1 … [+ OBJ + to INFINITIVE] Do you want me to take you to the station?
  4/e (2013): ▶verb [T] … 1 … [+obj+ to infinitive] Do you want me to take you to the station?

MED
  1/e (2002): … verb [T] … 1 … want sb/sth to do sth Her parents didn’t want her to marry him.
  2/e (2006): 1 … want sb/sth to do sth Her parents didn’t want her to marry him.

MWALED
  1/e (2008): verb 3 not used in progressive tenses [+ obj]
  2/e (2016)

* When a grammar code is attached to an example, the example is provided.
** COBUILD6 abolishes the extra column, placing a grammar code in front of the corresponding example.
*** ‘A verb that is always transitive in all its meanings is just marked verb, and no other verb code is given’ (OALD9, R4).
Rundell (2006b: 741) summarizes the development as follows: ‘More recently, the emphasis has shifted toward a simpler, surface-grammar model which – while sacrificing some of the delicacy of earlier systems – assumes very little grammatical knowledge on the part of user.’ This shift, which took place markedly in the mid-1990s, manifested itself strikingly in LDOCE3 (see Table 11.2). To ensure understanding, this edition went for partial spell-out, which is closely associated with the ‘propositional forms’ (F10) of the productively geared LLA1. Rundell (2006b: 741) also offers the following observation: ‘The economy of the older systems allows them to encode every possible pattern for a given meaning, regardless of its frequency. … The current approach … emphasizes what is typical over what is possible …’ (emphasis original).23 This trend is related to the availability of corpus data for identifying what is typical and, inevitably, to space limitations. However, if the EFL dictionary is truly to help users in their productive activities as well, the dictionary should present much more than what is typical.
3.3 Signposts24 and menus

As an EFL dictionary consists of ‘the sheer mass of condensed target language text in monolingual entries’, Scholfield (1996) points out as a long-standing, major challenge that the user has to meet the task of ‘wading through this picking out the numbered definitions and checking each one to find the right one’. Publishers responded to this problem with signposts and menus from 1995 onwards. The relevant dictionaries and editions are set out in Table 11.3.

Table 11.3 EFL Dictionaries Adopting Signposts and Menus.

Guide words: CIDE (1995), CALD2 (2005), CALD3 (2008), CALD4 (2013)
Short cuts: OALD6 (2000), OALD7 (2005), OALD8 (2010), OALD9 (2015), OALD10 (2020)
Signposts & Menus: LDOCE3 (1995), LDOCE4 (2003)*, LDOCE5 (2008)*, LDOCE6 (2014)*
Menus: COBUILD3 (2001), MED1 (2002), COBUILD4 (2003), COBUILD5 (2006), MED2 (2006), COBUILD6 (2009), COBUILD7 (2012), COBUILD8 (2014), COBUILD9 (2018)

*Signposts only.

With emphasis on ‘Fast access’ (Introduction, xi), LDOCE3 offers menus and signposts. The guide to the dictionary explains how an entry is organized and how the menus and signposts help reference:

In some of the longer entries, meanings that are closely related to each other are grouped together in ‘paragraphs’, or sections in the entry. A menu at the beginning of the entry tells you the paragraph headings, so that you can easily find the section that contains the sense that you want. All these senses begin on new lines, and they have signposts where these are helpful. (LDOCE3, xvii)
The menu for shoot looks as follows:

① GUN/WEAPONS
② SPORT
③ SPEAK/TALK/ASK
④ QUICK/SUDDEN
⑤ OTHER MEANINGS

Each item in the menu is repeated at the beginning of an entry. The first two subsume the following signposts and a phrase:25

① GUN/WEAPONS
1 ▶KILL/INJURE◀
2 ▶FIRE A GUN◀
3 ▶BIRDS/ANIMALS◀
② SPORT
4 [No signpost]
5 shoot pool/billiards, etc.

The signpost ‘may be a synonym, a short definition, or the typical subject or object of a verb’ (LDOCE3, ‘Guide to the Dictionary’, xvii). In view of users having difficulty navigating long entries, CIDE drastically changed the macrostructure – building each entry around one core meaning instead of cluttering entries with numbered senses. The guide word, provided after the headword (e.g. bear ANIMAL and bear CARRY), helps users to distinguish between senses of the same word. COBUILD dictionaries have stuck to the one word, one entry policy. To help the user’s reference, the second edition introduced ‘superheadwords’ and the third edition menus. The menus are characteristically grammar-based (Masuda et al. 2003: 29) and ‘may not have been changed since COB[UILD]3’ (Kokawa et al. 2020: 60). MED1 (2002) only makes use of menus. Entries with five or more senses are provided with a menu at the top (‘Using your Dictionary’, xi). The one for shoot looks like this:

1 fire gun
2 in sports
3 move suddenly & quickly
4 take photographs, etc.
5 put drug in body
+ PHRASES

Influenced by English–Japanese dictionaries, editor-in-chief Michael Rundell (personal communication) opted for menus for the following reasons: with the information all at the top of the entry, it is easier to see the full picture; since the layout of the menus usually allows lexicographers a little more space than is available for signposts, the clues for users are a little more likely to be helpful. The signpost is a welcome feature (Ichikawa et al. 2005: 28–9) and its effect is empirically supported. Those dictionaries with signposts (LDOCE3 and CIDE) are conducive to better and quicker reference than those without (COBUILD2 and OALD5) (Bogaards 1998: 560). However, the signpost is not free from problems. Yamada (2010: 154–6) identifies three. First, while offering practical help, signposts lack system and consistency. Akasu et al. (1996: 38) question the obscure selection process of CIDE’s guide words and the occasional mismatches between the heading and the guide word in parts-of-speech.
Urata et al. (1999: 78–9) identify six categories for LDOCE3’s signposts: synonyms; short definitions; hypernyms; typical subjects; typical objects; context, purpose, etc. The second problem is related to this miscellany, which may make it difficult for users to establish a systematic search rhythm (Yamada 2010: 155). This is aggravated by the mixing of signposts with phrases by the dictionary (see note 25). The last problem is that some signposts are redundant. Urata et al. (1999: 78) observe that they just repeat part of the definition or summarize the definition, with reference to LDOCE3 (e.g. stir 3 ▶MOVE SLIGHTLY◀ … b) to move slightly). This cannot be considered an efficient use of space.
3.4 Defining vocabulary

Dictionary definition has to observe this basic rule: a concept whose content has a certain complexity should be described in a dictionary by means of other less complex concepts (Svensén 1993: 135). In addition, an EFL dictionary is faced with the demanding task of explaining the meaning of the user’s L2 word in the L2 in an accessible way for foreign students. Finding the answer in a defining vocabulary, LDOCE1 (1978) developed the Longman Defining Vocabulary. It is based on West’s General Service List of English Words and other lists and sources (ix). The vocabulary is listed in the back matter: ‘List of words used in the dictionary’ (1283–8). Those words outside the defining vocabulary were given in capitals in definitions, so that the user can check them.26 Summers (LDOCE2, F8) reports, on the basis of their international user research, that the use of the 2,000-word Longman Defining Vocabulary is the single most helpful feature. Jackson (2002: 130) counts this feature as ‘the most significant’ among a number of the improvements and innovations introduced by the dictionary. The idea of a defining vocabulary was copied by other dictionaries (CIDE [1995], OALD6 [2000], MED1 [2002], COBUILD6 [2009] and their ensuing editions), each using its own version. It is true that a defining vocabulary significantly contributes to lowering the user’s psychological barrier to confronting all L2 dictionary texts and practically eliminates many inconveniences of having to go for a second semantic search resulting from the first. In fact, Herbst (1986) reports that students rated LDOCE1 as the most comprehensible. Certainly, the definitions of LDOCE2 often come across as more approachable than those of OALD4, which involves difficult vocabulary items without being restricted by a defining vocabulary (compare the entries for dare, for example).
Svensén (1993: 137) points out that a defining vocabulary allows a systematic description of meaning, benefitting both users and lexicographers. The use of a defining vocabulary has the advantage that one can verify that a concept with a complex content is in fact being defined by means of less complex concepts. One can also define related concepts more consistently: the user can be sure that the concept x is always represented in definitions by the word y, and conversely that the concept x is meant whenever the word y is used. Also, Quirk reports that a restriction in definition language actually breathed new life into semantic analysis: the strict use of the defining vocabulary has in many cases resulted in a fresh and revealing semantic analysis (LDOCE1, ‘Preface’, vii).
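The practice of printing words from outside the defining vocabulary in capitals can be pictured as a simple check of each definition word against the list. The Python sketch below is an illustration under stated assumptions, not a description of how any publisher implemented it: the vocabulary set and the sample definition are invented, and no lemmatization is attempted, so inflected or derived forms would need to be listed or normalized separately.

```python
import re

# Hypothetical, tiny stand-in for a defining vocabulary (real lists run to about 2,000 words).
DEFINING_VOCABULARY = {"a", "small", "boat", "with"}


def mark_outside_words(definition: str, vocabulary: set) -> str:
    """Return the definition with every word not in the vocabulary set in capitals."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        # Compare case-insensitively; leave in-vocabulary words untouched.
        return word if word.lower() in vocabulary else word.upper()

    return re.sub(r"[A-Za-z]+", replace, definition)


# Invented definition text, used only to show the marking.
print(mark_outside_words("a small boat with a sail", DEFINING_VOCABULARY))
# a small boat with a SAIL
```

The same check, run over a whole dictionary text, would also give a rough measure of how far definitions actually stay within the advertised list.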
On the other hand, several shortcomings are suggested for both dictionary users and makers. Kawamura (2009: 87–9) identifies six difficulties:

1. Inclusion of lexical items beyond the expected proficiency of EFL dictionary users
2. Lengthy definitions
3. Unnatural definitions27
4. Senses to be used are not controlled28
5. Actual size greater than advertised29
6. Actual use of defining vocabulary is unclear30
Importantly, a problem of accuracy is raised (Fox 1989: 155, Allen 1996: 47, etc.). As an example, Svensén (1993: 137) cites the definition of cataract from LDOCE2: ‘a diseased growth on the eye causing a gradual loss of sight’. Since LDOCE1, efforts have been made not to detract from lucidity in the execution of the defining vocabulary: a rigorous set of principles was established to ensure that only the most ‘central’ meanings of these 2000 words, and only easily understood derivatives, were used (ix). On the other hand, Rundell (1998: 319) describes the nature of the defining vocabulary and those who work within it: ‘Inevitably, the high-frequency words that make up any DV list are often highly polysemous, and lexicographers have not always resisted the temptation to use such words in non-central or (worse) idiomatic meanings.’ Fox sees a defining vocabulary as putting ‘ “arbitrary” constraints on lexicographers’ freedom to define’ (Fox 1989: 155). Béjoint (1994: 69) argues that ‘the use of a restricted vocabulary blocks the chain of definition’ and that ‘[t]his is clearly a case of conflict between the dictionary as a quick reference tool and as an instrument for self-teaching’.
3.5 Defining style

As in the native speaker’s dictionary, in the EFL dictionary the definition used to be made using a phrase and in a form substitutable for the definiendum. The selectional restrictions and possible objects, which are outside the semantic scope of the definiendum, are marked off by parentheses. When the usage is provided (rather than a definition), it is also given in parentheses, as is often the case with function words:
4 (that is) part in relation to (a whole or all) a (after expressions of quantity): 2 pounds of sugar | 2 miles of bad road | much of the night … (s.v. ‘of’ in LDOCE1)
17 (after nouns related to verbs): a lover of music (=someone who loves … ) (ibid.)
It cannot be denied, however, that an effort to pack much information into a limited space involved special grammar and dictionary-ese and produced some difficult-to-understand definitions. In response, the full-sentence definition (FSD), initiated by COBUILD1, has spread among EFL dictionaries.31 Sounding as if the teacher is talking to students in the classroom setting, the definition comes across as approachable. Basically, no prior knowledge or special training is required to comprehend FSDs. Another plus is that the FSD is informative, essentially providing contextual information together with semantic information. The FSD partly takes on the role
assumed by the example. Also the FSD can incorporate extra information with a degree of flexibility not allowed for by the traditional phrase definition.

This innovative defining method meets with several criticisms. A typical FSD comprises two clauses: the subordinate clause, often beginning with if, and the main clause. The if-clause tries to describe the context in which the definiendum occurs and the main clause deals with the meaning.32 The FSD works very well for syntactically simple items with a distinct meaning (e.g. ‘familiar to’ and ‘familiar with’ at Senses 1 and 2 of familiar). However, this approach involves several problems. Rundell (2006a: 330–1) speaks of ‘overspecification’ as an intrinsic weakness: ‘the requirement of specifying lexical and syntactic environments often leads to defining statements which appear to exclude a wide range of completely regular behaviours’ (ibid. 331). He observes that COBUILD3’s definition of cheat incorporates only the pattern of ‘cheat someone out of something’ but excludes ‘cheat someone of something’ and ‘cheat someone’, which are equally frequent. Rundell also discusses the problem of ‘increased complexity’ of the supposedly accessible defining style. He warns of the pitfall of ‘go[ing] from the frying pan of unpacking a dense, formulaic definition to the fire of processing something two or three times longer’, citing the definition of retreat 5.4 in COBUILD1 (2006a: 328). The natural prose form, which may be good for understanding, can have detrimental effects on actual consultation – quick reference and information retrieval. While the traditional phrase definition concentrated on meaning, giving it in the substitutable form for the definiendum, the FSD attempts to incorporate not only semantic but also contextual and other information without discrimination.
Compare the following pairs of definitions of erudite and gastric:

erudite … adj fml (of a person or book) full of learning; SCHOLARLY (LDOCE2)

If you describe someone as erudite, you mean that they have or show great academic knowledge. You can also use erudite to describe something such as a book or a style of writing; a formal word (COBUILD2, emphasis added)

gastric … adj [attrib] (medical) of the stomach (OALD4)

You use gastric to describe processes, pain, or illness that occur in someone’s stomach; a medical term. (COBUILD2, emphasis added)

While LDOCE2’s definition of erudite marks off the selectional restriction with the use of parentheses from the rest of the text that deals with the meaning of the headword, that of COBUILD2 does not.33 Furthermore, in this dictionary’s definition of gastric the semantic information is deferred until the end. There is a danger that those users seeking semantic information only are distracted and put off retrieving the relevant information. Another problem with the prose style is that the FSD blurs the boundary between metalanguage and language, leaving it to the users to draw the distinction. For example, look at the entry
of truncated in COBUILD2. For non-native users, there is no knowing that ‘a truncated version of … ’ can be used in their actual production of English until they get to the first example that includes the phrase:34

A truncated version of something is one that has been shortened.
The review body has produced a truncated version of its annual report. …
3.6 Examples

The value of examples in EFL dictionaries is enormous. They help users in their reception and production of English texts and also in the consultation of the dictionary. Roughly, there are two types of example: made-up and corpus-derived. The traditional made-up examples are invented by lexicographers to suit particular dictionary purposes. The examples are conveniently tailored to be succinct, multipurpose, contrastive and self-contained. Fox, then at COBUILD, doubts the native speaker’s ability to produce natural examples. Citing ‘ … saluted his friend with a wave of his hand’ as a slightly contrived example to illustrate salute in the sense of ‘greet someone’, she argues that ‘we cannot trust native speakers to invent sentences except in a proper communicative context’ (Fox 1987: 143–4). COBUILD takes a distinctive approach, with almost all examples directly taken from the corpus.35 Sinclair asserts that ‘usage cannot be invented, it can only be recorded’ (COBUILD1, xv). He dismisses invented examples as follows: ‘invented examples are really part of the explanations. … They give no reliable guide to composition in English and would be very misleading if applied to that task. They do not say “This is how the word is used” ’ (ibid.). Fox challenges the self-containedness and excessive informativeness of an invented example. The COBUILD editor takes a global view of examples:

The necessity of examples to fit into coherent text is important because language is not a series of isolated sentences, and students should not be encouraged to think that it is. We should be much more aware than we have been in the past of the pitfalls of giving these fully-formed isolated sentences as examples. (Fox 1987: 141)

Fox warns of a possible danger of providing students with model sentences that do not fit naturally into a flow of actual discourse by offering invented examples, grammatically well formed and with far too much information (ibid. 141–2).
However, COBUILD’s authentic, lexically challenging, open-ended examples in turn come under criticism. Hausmann and Gorbahn (1989: 45) criticize the excessively corpus-dependent examples as ‘not didactically oriented’ on the following grounds:

a) strange and demanding, read out of context
b) distracting (complex)
c) idiosyncratic
d) distracting (abstract and lengthy)36
e) circulatory
f) not informative
g) dangerous (leading to the production of unnatural English)
Pointing out that it is not a matter of ‘a simple choice between the authentic and the invented’, Rundell (1998: 334–5) goes on to argue in favour of corpora as primary sources of examples: ‘Most lexicographers would probably now agree that, where the corpus provides natural and typical examples that clearly illustrate the points that need to be made, there is no conceivable reason for not using them.’ In fact, since COBUILD1, corpus-basis has become the accepted norm among EFL dictionaries, with varying degrees of editing.
3.7 Efforts to go beyond the alphabetical list

General EFL dictionaries are in the alphabetical format. Since the dictionaries are intended not only for reception but also for production from the outset (see Section 2.1), efforts are made to teach the user related lexical items beyond the alphabetical list by means of such devices as cross-references, indications of related lexical items, glosses, usage notes, pictorial illustrations, and graphics. In line with ever more importance attached to production and vocabulary building, the recent decades have seen the following innovations among others. RHWDAE provided ‘Related Words’ notes. They succinctly deal with word families, with explanations of parts-of-speech and examples. LDOCE4 incorporated ‘Word Choice’ and ‘Word Focus’ boxes to deal with related words.37 The former offer help with usage of closely related words with occasional warning notes (based on the Longman Learner’s Corpus of over ten million words) (xv). The latter is intended for building vocabulary and for reminding the user of a word that he/she may have forgotten (xvii). For example, the one at airport provides a story of ‘what you do at the airport’, familiarizing the user with these important words highlighted in the story: terminal, check-in desk, passport control, security, departure lounge, departure gate, runway, baggage reclaim, customs, immigration, arrivals. Encyclopaedically oriented, COBUILD7’s ‘Word Webs’ articles also present related vocabulary in a context: ‘Word Webs’ present topic-related vocabulary through encyclopaedia-like readings combined with stunning art [pictures and graphics], creating opportunities for deeper understanding of the language and concepts (viii).38 An additional advantage of these articles by LDOCE4 and COBUILD7 is that they can indicate important collocations of the highlighted words and other items (‘arrive at the airport, go into the terminal building, check in for your flight at the check-in desk’, etc. [WORD FOCUS: AIRPORT, LDOCE4]). LDOCE5 introduced ‘Register’ notes on the following grounds: ‘Being aware of the different register of closely related words and phrases is a common problem for learners of English’ (ix). Dohi et al. (2010: 23–125) point out that there are 397 such notes and that this kind of information had already been provided in LLA1 and LEAs, comparing the notes at alone in LEA2 (‘Formal or informal’ note) and LDOCE5, both dealing with on your own, by yourself and alone.39 OALD9’s new features include ‘Wordfinder’ and ‘More like this’ notes to deal with related expressions. The former offers the lists of words associated with a concept and the latter those of lexical items behaving in a similar way (e.g. attributive-only adjectives). There are 165 Wordfinder notes, each including about ten words (Dohi et al. 2017: 27). For example, the one at apply (verb, sense 1) lists the following words: appoint, candidate, CV, experience,
interview, job description, qualification, reference and shortlist. It can be noted that the notes are introduced in the same spirit as LDOCE4’s ‘Word Focus’ boxes: ‘Wordfinder notes help you to find words that you don’t know or have forgotten. They suggest entries that you can look up to find vocabulary related to the headword’ (OALD9, viii). In OALD9, the 36 ‘More like this’ notes were presented in the back matter (R14–16), to which cross-references were provided from the entry as below:
Figure 11.1 Entry on Wordfinder.
In OALD10, the ‘More like this’ notes are found only in the online and app versions, presented in a fold/collapse manner, with each item hyperlinked to its entry. It can be assumed that the Wordfinder and ‘More like this’ notes were devised to suit the electronic platform.
4 Implications of the digital medium
The digital medium can solve many problems associated with print dictionaries and can provide new dimensions for dictionary presentation and consultation (see Nielsen, Chapter 23). The medium virtually removes space constraints. More information can be accommodated. However, it should be remembered that the dictionary remains a tool for quick reference. Information should be hierarchically arranged or options should be incorporated, enabling the user to exercise a quick shift from one mode to another, e.g. from the definition within a defining vocabulary to one without.40 The e-dictionary alleviates the reference burden in various ways and can even guide the reference in the right direction in a way that the print dictionary never can. If an EFL student consults a dictionary for the meaning of ‘going forward’ below, they will no doubt enter the base form ‘go forward’: ‘So if we’re going to be serious about race going forward, we need to uphold laws against discrimination …’ (Obama’s Farewell Address, emphasis added). However, if they do so with Cambridge Dictionaries Online, it will tell them that they should have typed in ‘going forward’ as it is by showing the phrase on top of the list of ten candidates, ‘Search suggestions for go forward’41 (Yamada 2019). As an emerging trend, Rundell (2012b) refers to the ‘disappearing dictionary’ with a double meaning. In the first sense of physical disappearance, he mentions that dictionaries are embedded on other sites and in other devices, citing the following four examples: Kindle;42 news websites; widgets, double-click tools (Macmillan and other dictionary publishers); and British Council’s TeachingEnglish site (which included ‘Cambridge Dictionary’). In the second sense, Rundell
means that dictionaries are in danger of disappearing in the face of these three kinds of alternative resources: User forums (Word Reference, TeachingEnglish, etc.), Translation tools (e.g. Google Translate, www.linguee.com, Eijiro Pro43) and ‘Text remediation’ tools/text analysers (paste in your text, system corrects errors, offers suggestions). With reference to the last, Rundell says that many are under development but warns that this is not trivial (ibid.).
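The search-suggestion behaviour described above for ‘go forward’ can be illustrated with a minimal sketch. It assumes a hypothetical headword list and a hypothetical inflection table, and it is not a description of how Cambridge Dictionaries Online actually ranks its suggestions; it only shows one way in which a lookup tool could place an inflected multiword headword above the entries for the individual words.

```python
# Hypothetical headword list; a real dictionary index would be far larger.
HEADWORDS = ["go", "forward", "go for", "going forward", "look forward to"]

# Hypothetical inflection table: base form -> forms a headword might use.
INFLECTIONS = {"go": ["go", "goes", "going", "went", "gone"]}


def suggest(query, headwords=HEADWORDS, limit=10):
    """Return up to `limit` headwords, whole-phrase matches first."""
    words = query.lower().split()
    if not words:
        return []
    first, rest = words[0], words[1:]
    # Variants of the query obtained by inflecting its first word,
    # e.g. "go forward" -> "going forward", "went forward", ...
    variants = {" ".join([form] + rest) for form in INFLECTIONS.get(first, [first])}

    ranked = []
    for headword in headwords:
        if headword in variants:
            ranked.append((0, headword))  # whole phrase, possibly inflected
        elif set(headword.split()) & set(words):
            ranked.append((1, headword))  # shares at least one word with the query
    # list.sort() is stable, so ties keep their order in the headword list.
    ranked.sort(key=lambda pair: pair[0])
    return [headword for _, headword in ranked[:limit]]


print(suggest("go forward"))
```

With the toy data above, the phrase entry is ranked ahead of the single-word entries, which is the kind of guidance that a print dictionary cannot give.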
5 Conclusion
Generally, EFL dictionaries can be said to have done a good job of reconciling irreconcilables by ‘striking a balance between several conflicting requirements’ and under ‘commercial pressures’ (Cowie 1990: 676). This is the evaluation of the print dictionaries of the first fifty years of their history. Their development saw a historical turning point in the mid-1990s. The dictionaries came to resemble each other more closely than ever before, incorporating the features for frequency information and user-friendliness. There were incremental improvements but without any major breakthroughs. New and extra information and features are provided only in the online and app dictionaries. In this sense, it is legitimate to say that the conventional print EFL dictionary has reached full maturity, more or less. The ‘dictionary’ may be at another important turning point. The digital media liberated the dictionary from constraints imposed by the print media: space, time, function, use and the maker-user relation. Digital and print dictionaries should be differentiated. The former can be by-products of the latter. Whatever its shape, the digital ‘dictionary’ should develop, seeking to exploit its full potential. Innovation could come out of nowhere. The print dictionary should re-examine its raison d’être. It is time to go back to basics: the dictionary should re-arrange its structure and rather cluttered contents. It should be more focused, placing more emphasis on learning and vocabulary building. There should also be innovative and efficient mechanisms for lexicography to continue, including theory, education and finance.
Notes
1 The author would like to express his gratitude to Professor Kazuo Dohi for his valuable comments in the preparation of the manuscript. 2 For the abbreviations of dictionaries, please refer to ‘Dictionaries cited and their abbreviations’ under ‘References’. 3 Simplicities (e.g. layout with Extra Columns, FSDs) and complexities (e.g. frequency-based, POS-mixed entry structure; authentic examples; sophisticated pronunciation indication) co-existed in COBUILD1. 4 The unique one-entry-per-one-core-meaning structure of the dictionary also necessitated the Phrase Index. 5 In this connection, compare the grammar indications and examples of the consecutive editions of each dictionary in Table 11.2 and refer to the observation of COBUILD’s menus by Kokawa et al. (2020) (3.3 Signposts and menus). 6 This table shows the last editions of EFL dictionaries coming with CD/DVD-ROMs and those with PIN codes.
OALD: OALD7 (2005): w/ CD-ROM; OALD8 (2010): w/ CD-ROM w/ Oxford iWriter; OALD9 (2015): last version w/ DVD-ROM & access code to premium site; OALD10 (2020): w/ access code to premium site & app
LDOCE: LDOCE5 (2008): last version w/ DVD-ROM; LDOCE6 (2014)
CALD: CALD2 (2005): ‘CD-ROM Dictionary and thesaurus in one’; CALD3 (2008): w/ CD-ROM; CALD4 (2013): w/ CD-ROM
COBUILD: COBUILD5 (2006): last version w/ CD-ROM; COBUILD6 (2009): w/ access code to online dictionary; COBUILD7 (2012): w/ access code to mobile dictionary; COBUILD8 (2014); COBUILD9 (2018)
MED: MED2 (2006): last version w/ CD-ROM
MWALED: MWALED1 (2008); MWALED2 (2016)
7 Other EFL dictionary publishers did not follow suit. 8 The online version of CALD also offers a thesaurus. At collinsdictionary.com, which includes COBUILD, the information in Collins English Thesaurus can be accessed. 9 If you look up ‘dislocation’ in Cambridge Dictionaries Online, the relevant information from Cambridge Business English Dictionary is shown below the entries from Cambridge Advanced Learner’s Dictionary & Thesaurus: ‘But the next wave of economic dislocations won’t come from overseas’ (Obama’s Farewell Address, emphasis added) (Yamada 2019). 10 You can purchase access to the premium online and the app. The print version comes with the access code which entitles the purchaser to use both products for four years. 11 The app allows full-text search and voice input. 12 This benefits the user’s receptive needs – entries and senses are organized in order of the likelihood with which the users encounter them in reading. However, Yamada (2010: 156–7) points out the inconsistency brought about by frequency-based sense arrangement – it may or may not correspond to the sense development of a word. Worse, there are cases where only the frequent figurative sense is entered without the not-frequent-enough original sense being included, when the latter will help a user understand and memorize the former (compare OALD4 and LDOCE3 for the treatment of linchpin). Although there are cases where the etymological information compensates, clues should ideally be provided for the user to know the reason for non-entry: low frequency or non-existence. 13 In English-Japanese dictionaries, the indication of relative importance of headwords goes back to Standard EJD (1929). The most frequent 10,000 words were indicated with numbers 1 to 10 (the most
frequent 1,000 words as 1) on the basis of Thorndike’s (1921) and supplementarily Horn’s wordlists (1926) (Dohi 1999: 54–5). 14 It is regrettable that the criteria for identifying such senses are stated nowhere in the dictionary (Yamada et al. 2012: 23), let alone relative frequencies among the key senses. 15 Based on corpus data and recommendation of teachers and academic advisors, CALD2 (2005) used a three-way labelling to indicate the relative importance of words, meanings, and phrases: E (Essential, 4,900 meanings), I (Improver, 3,300) and A (Advanced, 3,700) (Introduction, vii). 16 COBUILD6 includes the Academic Word List in the back matter but ‘strangely enough, fails to acknowledge Coxhead’s AWL and shows no label in the related entries’ (Dohi et al. 2010: 174). 17 https://www.oxfordlearnersdictionaries.com/about/wordlists/opal [accessed 31 August 2020]. 18 COBUILD8 uniquely treats overly used words. Its ‘Visual thesaurus’ in the back matter (1883–1911) ‘focuses on the 50 most over-used words in English and gives you alternatives to help you develop fluency and creativity in your use of English’ (1883). An example is provided for each synonym. 19 This dictionary indicates the pattern of ‘VERB × DIRECT OBJECT × “TO” × INFINITIVE’ as ‘V.P. 17.’ 20 Cowie (1990: 688) praises the coding system as ‘impressively systematic’ because the system consistently assigns ‘3’ to an infinitive construction, for example. 21 Béjoint (1981: 16, 19) points out that grammar codes are underutilized, discovering that a disappointing 55 per cent of the university students under survey did not use the codes at all. 22 LDOCE2 was the first among EFL dictionaries to place a grammatical pattern in front of the example for instant recognition. Lighthouse English-Japanese Dictionary (1/e, 1984) had already given a sentence pattern to the example, the pattern following the example. 23 The comments on OALD4’s grammar codes by Takahashi et al. 
(1992: 198) are noteworthy: ‘The new codes, which are far easier to remember, not only show how each of the patterns is composed, but also distinguish between sentences which superficially look the same.’ 24 Longman uses ‘signposts’, Cambridge ‘guide words’ and Oxford ‘short cuts’. In this chapter, ‘signpost’ is used as an umbrella term. 25 Herbst (1996: 350) criticizes this arrangement as ‘inconsistent (and aesthetically disturbing)’ because the signposts (bold in capitals, sandwiched with black triangles) are mixed with the phrases (heading in bold) at the beginning of each sub-entry. In response, the subsequent editions introduced colour printing: signposts highlighted in blue (4/e, 2003) and in white against the blue background (5/e, 2009). 26 In LDOCE1, all examples are written in its defining vocabulary (ix). 27 Hanks (1987: 119) states COBUILD1’s stance: ‘No attempt was made to set up a “restricted defining vocabulary” of a fixed number of words. Such vocabularies are a potential source of distortion, especially if they are not accompanied by equally strict controls on the meanings of each word used and the syntactic structures in which they are used.’ 28 Since OALD7, the words in the Oxford 3000 have been controlled in terms of their senses. A word used in a less frequent sense is capitalized and its sense is identified (Komuro et al. 2006: 82–3, Yamada et al. 2012: 21). 29 Electronically scrutinizing all definition data sets, Ishii (2011: 182) reveals that the publicized sizes of DVs are (much) greater than the actual ones:
COBUILD6: publicized size 2,500; actual size about 3,200
LDOCE5: publicized size ‘around 2,000’ + 30 affixes; actual size NA (about 2,100 + 30 affixes)
MED2: publicized size ‘under 2,500’; actual size about 2,500
OALD8: publicized size 3,000 + Language Study Terms; actual size about 3,700
*Adapted from Ishii (2011: 182).
He calculates the actual size of CIDE’s DV at more than 3,700, ascribing the huge discrepancy from the proclaimed 2,000 to the fact that derivatives of basic words are not counted in (Ishii 2011: 82).
30 The widespread rule of printing the words outside the defining vocabulary in small capitals is not strictly observed. For example, LDOCE3 makes proper nouns exceptions (B12) and OALD5 colour terms (1417). The former also does not use small capitals for the words whose entry and definition are very close by (B12). Ishii (2009: 122) points out that OALD7 uses phrasal verbs outside the Oxford 3000 in its definitions without rendering them in small capitals. 31 Only the COBUILD series of dictionaries employs the FSD universally. 32 This should be taught first. If students try to understand the if-definition on the basis of its translation into their L1, the subordinate clause will fail them. 33 While Hanks (1987: 116) criticizes the use of parentheses to indicate selectional restrictions in the conventional definition, the user survey conducted by Wehmeier (2000) reveals that users actually are not bothered by parentheses. 34 The repetition of the phrase is also a problem. 35 ‘Very minor changes’ have been made to citations ‘in order to remove unnecessary distracting information’; ‘Only on very rare occasions have we composed an example because there is no suitable one in the corpus’ (COBUILD1, xv). 36 Hausmann and Gorbahn (ibid.) cite the following example to support this point: To have access to the truth and so to pass beyond the region of mere opinion is to take great risks (s.v. ‘region’, sense 3). 37 These notes were integrated into GRAMMAR and THESAURUS in LDOCE5 (Dohi et al. 2010: 137). 38 ‘Word Webs’ were replaced with ‘Vocabulary in Contexts’ in COBUILD10 (Kokawa et al. 2020: 73). 39 Dohi et al. (2010: 177) note that this type of information had already been shown in CULD (1980), taking the ‘formal’ label for example: Some words which are not particularly formal but which have a less formal, more commonly used equivalent have been labelled (less formal than), e.g. acquire is labelled (more formal than get) (CULD, xii). 
40 It is high time the defining vocabulary was abolished and that users were exposed to natural English, reading dictionary definitions (Yamada 2010: 162). The electronic dictionary is equipped with a function of double-clicking an unknown word to check its meaning. Japan’s hand-held electronic dictionary offers a similar instant check with an included English-Japanese dictionary. (As for its advantages in the print age) not only does the defining vocabulary severely reduce the lexicographer’s defining power but it can also give the user inappropriate input. Svensén (1993: 137–8) points out that the use of a defining vocabulary can produce the kind of definition that few teachers want their students to emulate (e.g. malnutrition and manic depressive in LDOCE2). 41 MED Online provided the following crowdsourced information: forward ADVERB MAINLY SPOKEN • From our crowdsourced Open Dictionary going forward used to say ‘in the future’ or ‘from now on’ in a way that is often redundant because you are talking about the future anyway We’ve benefited from having him and we’d like to do that going forward. Your organization needs to define a strategy for using the web going forward. Submitted from United Kingdom on 18/11/2016 42 We are pleased to announce that the full dictionary text is available for Kindle and online, so that COBUILD is always available to you, wherever you are. Set it as your default Kindle dictionary, or just go to www.collinsdictionary.com/cobuild (COBUILD8, Introduction, xi). 43 Eijiro is a huge bilingual database developed by translators, which can be accessed either from English or from Japanese. 
Eijiro on the WEB Pro is the fee-charging version, including 2.2 million English headwords with their Japanese translations, 1.4 million English examples with their Japanese translations, and 3.87 million Japanese headwords with their English translations (as of 8 July 2020), https://eowp.blogspot.com/search/label/%E3%83%87%E3%83%BC%E3%82%BF%E6%9B%B4%E6%96%B0 [accessed 31 August 2020].
References
Dictionaries cited and their abbreviations
The Advanced Learner’s Dictionary of Current English, Second edition (ALD2) (1963), Eds. A.S. Hornby, E.V. Gatenby and H. Wakefield, London: Oxford University Press. Cambridge Academic Content Dictionary (CACD) (2009), Ed. P. Heacock, Cambridge: Cambridge University Press. Cambridge Advanced Learner’s Dictionary, Second edition (CALD2) (2005), Ed. E. Walter, Cambridge: Cambridge University Press. Cambridge Advanced Learner’s Dictionary, Third edition (CALD3) (2008), Ed. E. Walter, Cambridge: Cambridge University Press. Cambridge International Dictionary of English (CIDE) (1995), Ed. P. Procter, Cambridge: Cambridge University Press. Chambers Universal Learners’ Dictionary, International Students’ edition (CULD) (1980), Ed. M.E. Kirkpatrick, Edinburgh: Chambers. Collins COBUILD Advanced Learner’s English Dictionary, Fourth edition (COBUILD4) (2003), Ed. J. Sinclair, Glasgow: HarperCollins. Collins COBUILD Advanced Learner’s English Dictionary, Fifth edition (COBUILD5) (2006), Ed. J. Sinclair, Glasgow: HarperCollins. Collins COBUILD Advanced Dictionary of English, Sixth edition (COBUILD6) (2009), Ed. J. Sinclair, Glasgow: HarperCollins/Boston: Heinle Cengage Learning. Collins COBUILD Advanced Dictionary of English, Seventh edition (COBUILD7) (2012), Ed. J. Sinclair, Boston: National Geographic Learning/Heinle Cengage Learning. Collins COBUILD Advanced Learner’s Dictionary of English, Eighth edition (COBUILD8) (2014), Ed. J. Sinclair, Glasgow: HarperCollins. Collins COBUILD Advanced Learner’s Dictionary of English, Ninth edition (COBUILD9) (2018), Ed. J. Sinclair, Glasgow: HarperCollins. Collins COBUILD English Dictionary, Second edition (COBUILD2) (1995), Ed. J. Sinclair, London: HarperCollins. Collins COBUILD English Dictionary for Advanced Learners, Third edition (COBUILD3) (2000), Ed. J. Sinclair, London: HarperCollins. Collins COBUILD English Language Dictionary, First edition (COBUILD1) (1987), Ed. J. 
Sinclair, London and Glasgow: Collins. Everyman’s English Pronouncing Dictionary (EPD) (1917), D. Jones, London: Dent. A Grammar of English Words (GEW) (1938), Ed. H.E. Palmer, London and Harlow: Longmans Green. Idiomatic and Syntactic English Dictionary (ISED) (1942), Eds. A.S. Hornby, E.V. Gatenby and H. Wakefield, Tokyo: Kaitakusha. Lighthouse English-Japanese Dictionary, First edition (1984), Eds. S. Takebayashi and Y. Kojima, Tokyo: Kenkyusha. Longman Dictionary of Contemporary English, First edition (LDOCE1) (1978), Ed. P. Procter, Harlow: Longman. Longman Dictionary of Contemporary English, Second edition (LDOCE2) (1987), Ed. D. Summers, Harlow: Longman. Longman Dictionary of Contemporary English, Third edition (LDOCE3) (1995), Ed. D. Summers, Harlow: Longman. Longman Dictionary of Contemporary English, Fourth edition (LDOCE4) (2003), Ed. D. Summers, Harlow: Pearson Education. Longman Dictionary of Contemporary English, Fifth edition (LDOCE5) (2009), Ed. M. Mayor, Harlow: Pearson Education.
Longman Essential Activator, First edition (LEA1) (1997), Ed. D. Summers, Harlow: Addison Wesley Longman. Longman Essential Activator, Second edition (LEA2) (2006), Ed. D. Summers, Harlow: Pearson Education. Longman Exams Dictionary (LED) (2006), Ed. D. Summers, Harlow: Pearson Education. Longman Language Activator, First edition (LLA1) (1993), Ed. D. Summers, Harlow: Pearson Education. Longman Study Dictionary of American English (LSDAE) (2006), Ed. D. Summers, Harlow: Pearson Education. Macmillan English Dictionary, First edition (MED1) (2002), Ed. M. Rundell, Oxford: Macmillan. Macmillan English Dictionary, Second edition (MED2) (2006), Ed. M. Rundell, Oxford: Macmillan. Macmillan School Dictionary (MSD) (2004), Ed. M. Rundell, Oxford: Macmillan. Merriam-Webster’s Advanced Learner’s English Dictionary, First edition (MWALED1) (2008), Ed. S.J. Perrault, Springfield: Merriam-Webster. Merriam-Webster’s Advanced Learner’s English Dictionary, Second edition (MWALED2) (2016), Ed. S.J. Perrault, Springfield: Merriam-Webster. The New Method English Dictionary (NMED) (1935), Eds M.P. West and J.G. Endicott, London: Longmans Green. Oxford Advanced Learner’s Dictionary of Current English, Third edition (OALD3) (1974), Ed. A.S. Hornby, Oxford: Oxford University Press. Oxford Advanced Learner’s Dictionary of Current English, Fourth edition (OALD4) (1989), Ed. A.P. Cowie, Oxford: Oxford University Press. Oxford Advanced Learner’s Dictionary of Current English, Fifth edition (OALD5) (1995), Ed. J. Crowther, Oxford: Oxford University Press. Oxford Advanced Learner’s Dictionary of Current English, Sixth edition (OALD6) (2000), Ed. S. Wehmeier, Oxford: Oxford University Press. Oxford Advanced Learner’s Dictionary of Current English, Seventh edition (OALD7) (2005), Ed. S. Wehmeier, Oxford: Oxford University Press. Oxford Advanced Learner’s Dictionary of Current English, Eighth edition (OALD8) (2010), Ed. J. Turnbull, Oxford: Oxford University Press. 
Oxford Advanced Learner’s Dictionary of Current English, Ninth edition (OALD9) (2015), Eds M. Deuter, J. Bradbery and J. Turnbull, Oxford: Oxford University Press. Oxford Advanced Learner’s Dictionary of Current English, Tenth edition (OALD10) (2020), Eds D. Lea and J. Bradbery, Oxford: Oxford University Press. Oxford Learner’s Dictionary of Academic English (OLDAE) (2014), Ed. D. Lea, Oxford: Oxford University Press. Random House Webster’s Dictionary of American English (RHWDAE) (1997), Ed. G.M. Dalgish, New York: Random House. Sutandado Ei-wa Jiten [Standard English-Japanese Dictionary] (Standard EJD) (1929), Ed. T. Takehara, Tokyo: Taishukan.
Other references
Akasu, K. et al. (1996), ‘An analysis of Cambridge International Dictionary of English’, Lexicon 26, Tokyo: Iwasaki Linguistic Circle, 3–76. Allen, R. (1996), ‘The year of the dictionaries’, English Today 46, Vol. 12 (2), 41–7. Béjoint, H. (1981), ‘The foreign student’s use of monolingual English dictionaries: A study of language needs and reference skills’, Applied Linguistics 2, 207–22. Béjoint, H. (1994), Tradition and Innovation in Modern English Dictionaries, Oxford: Clarendon Press. Bogaards, P. (1998), ‘Scanning long entries in learner’s dictionaries’ in T. Fontenelle et al. (eds), Actes EURALEX ’98 Proceedings, Vol. 2, Liège: English and Dutch Department, University of Liège, 555–63.
Cambridge Advanced Learner’s Dictionary, Fourth edition (CALD4) (2013), Ed. C. McIntosh. Cambridge: Cambridge University Press. Cowie, A.P. (1981), ‘Lexicography and its pedagogic applications: An introduction’, Applied Linguistics 2, 203–6. Cowie, A.P. (1990), ‘Language as words: Lexicography’ in N.E. Collinge (ed.), An Encyclopedia of Language, London and New York: Routledge, Ch. 19, 671–700. Cowie, A.P. (1999), English Dictionaries for Foreign Learners: A History, Oxford: Oxford University Press. Coxhead, A. (2000), ‘A new academic word list’, TESOL Quarterly, 34 (2), 213–38. Dohi, K. (1999), ‘Thorndike to Amerika no Gakushu Jiten’ [Thorndike and Learner’s Dictionaries in the USA], Toyoko English Studies No. 8, Tokyo: Toyoko Gakuen Women’s College, 17–77. Dohi, K. et al. (2010), ‘An Analysis of Longman Dictionary of Contemporary English, Fifth Edition’, Lexicon 40, Tokyo: Iwasaki Linguistic Circle, 85–187. Fox, G. (1987), ‘The case for examples’ in J.McH. Sinclair (ed.), Ch. 7, 137–49. Fox, G. (1989), ‘A vocabulary for writing dictionaries’ in M.L. Tickoo (ed.), Learners’ Dictionaries: State of the Art, Singapore: SEAMEO Regional Language Centre, 153–71. Hanks, P. (1987), ‘Definitions and explanations’ in J.McH. Sinclair (ed.), Ch. 6, 116–36. Hartmann, R.R.K. (2001), Teaching and Researching Lexicography, Harlow: Pearson Education. Hausmann, F-J. and A. Gorbahn (1989), ‘COBUILD and LDOCE II: A comparative review’, International Journal of Lexicography 2 (1), 44–56. Herbst, T. (1986), ‘Defining with a controlled defining vocabulary in foreign learners’ dictionaries’, Lexicographica 2, 101–19. Herbst, T. (1996), ‘On the way to the perfect learners’ dictionary: A first comparison of OALD5, LDOCE3, COBUILD2 and CIDE’, International Journal of Lexicography 9 (4), 321–57. Horn, E. (1926), A Basic Writing Vocabulary: 10,000 Words Most Commonly Used in Writing, University of Iowa. Ichikawa, Y. et al. 
(2005), ‘An analysis of Longman Dictionary of Contemporary English, Fourth Edition’, Lexicon 35, Tokyo: Iwasaki Linguistic Circle, 1–126. Imura, M. (1997), Palmer to Nihon no Eigokyoiku [Harold E. Palmer and Teaching English in Japan], Tokyo: Taishukan. Ishii, Y. (2009), ‘Making a list of essential phrasal verbs based on large corpora and phrasal verb dictionaries’ in Y. Kawaguchi et al. (eds), Corpus Analysis and Variation in Linguistics, Amsterdam/ Philadelphia: John Benjamins, 121–40. Ishii, Y. (2011), ‘Comparing the vocabulary sets used in the “Big Five” English monolingual dictionaries for advanced EFL learners’ in K. Akasu and U. Satoru (eds), ASIALEX 2011 Proceedings. Lexicography: Theoretical and Practical Perspectives. Papers Submitted to the Seventh ASIALEX Biennial International Conference, Kyoto Terrsa, Kyoto, Japan. 22–24 August 2011, 180–9. Jackson, H. (2002), Lexicography: An Introduction, London: Routledge. Kawamura, A. (2009), ‘Teigigoi Saiko’ [The Defining Vocabulary Revisited], Shakai Inobeshon Kenkyu 4 (1), Faculty of Social Innovation, Seijo University, 87–98. Kojima, Y. et al. (1989), ‘An analysis of Collins COBUILD English Language Dictionary’, Lexicon 18, Tokyo: Iwasaki Linguistic Circle, 39–158. Kokawa, T. et al (2020), ‘An analysis of Collins COBUILD Advanced Learner’s Dictionary of English, Ninth Edition’, Lexicon 50, Tokyo: Iwasaki Linguistic Circle, 42–108. Komuro, Y. et al. (2006), ‘An analysis of the Oxford Advanced Learner’s Dictionary of Current English, Seventh Edition, with Special Reference to the CD-ROM’, Lexicon 36, Tokyo: Iwasaki Linguistic Circle, 55–146. McArthur, T. (ed.) (1992), The Oxford Companion to the English Language, Oxford: Oxford University Press. Masuda, H. et al. (2003), ‘An analysis of Collins COBUILD English Dictionary for Advanced Learners, Third Edition’, Lexicon 33, Tokyo: Iwasaki Linguistic Circle, 1–173.
Naganuma, K. (1978), ‘The history of the Advanced Learner’s Dictionary: A. S. Hornby, ISED, and Kaitakusha, Tokyo’ in P. Strevens (ed.), In Honour of A. S. Hornby, Oxford: Oxford University Press, 11–13. Nation, I.S.P. (2001), Learning Vocabulary in Another Language, Cambridge: Cambridge University Press. Rundell, M. (1998), ‘Recent trends in pedagogical lexicography’, International Journal of Lexicography 11 (4), 315–42. Rundell, M. (2006a), ‘More than one way to skin a cat: Why full sentence definitions have not been universally adopted’ in E. Corino et al. (eds), Proceedings of the XII Euralex International Congress, 2006, Torino: Edizioni dell’Orso, 323–38. Rundell, M. (2006b), ‘Learners’ dictionaries’ in K. Brown (ed.), Elsevier Encyclopedia of Language and Linguistics, Second edition, Volume 6, 739–43. Rundell, M. (2012a), ‘Stop the presses – the end of the printed dictionary’, Macmillan dictionaryblog. http://www.macmillandictionaryblog.com/bye-print-dictionary [accessed 31 August 2020]. Rundell, M. (2012b), ‘What will happen to dictionaries in the online world?’, The 19th Study Meeting of the Japan Society of English Usage and Style, 8 December 2012, Nihon University, Tokyo. Rundell, M. (2013), ‘What will happen to dictionaries in the online world?’, The Bulletin of the Japan Society of English Usage and Style 62 (3). Scholfield, P. (1996), ‘Why shouldn’t monolingual dictionaries be as easy to use as bilingual ones?’, Longman Language Review, Issue Number Two. Sinclair, J. McH. (ed.) (1987), Looking Up, London: Collins ELT. Svensén, B. (1993), Practical Lexicography, Oxford: Oxford University Press. Takahashi, K. et al. (1992), ‘An analysis of Oxford Advanced Learner’s Dictionary of Current English, Fourth Edition’, Lexicon 22, Tokyo: Iwasaki Linguistic Circle, 59–200. Takebayashi, S. et al. (1982), ‘An analysis of Chambers Universal Learners’ Dictionary’, Lexicon 11, Tokyo: Iwasaki Linguistic Circle, 30–116. Thorndike, E. 
(1921), The Teacher’s Word Book, Teachers College, Columbia University. Urata, K. et al. (1999), ‘An analysis of Longman Dictionary of Contemporary English, Third Edition’, Lexicon 29, Tokyo: Iwasaki Linguistic Circle, 66–95. Wehmeier, S. (2000), ‘Oxford Advanced Learner’s Dictionary, Sixth Edition – continuity and change’, JACET [Japan Association of College English Teachers] Kyoto Seminar, Kyoto International Conference Hall, 28–29 October 2000. Yamada, S. (2010), ‘EFL dictionary evolution: Innovations and drawbacks’ in I.J. Kernerman and P. Bogaards (eds), English Learners’ Dictionaries at the DSNA, Tel Aviv: K Dictionaries, 147–68. Yamada, S. et al. (2012), ‘An analysis of the Oxford Advanced Learner’s Dictionary of Current English, Eighth Edition’, Lexicon 42, Tokyo: Iwasaki Linguistic Circle, 1–67. Yamada, S. (2019), ‘Utility of EFL dictionaries for reading a political text: The case of Obama’s farewell address’, Joint Conference of the Dictionary Society of North America and Studies in the History of the English Language, 8 May 2019, Indiana University, Bloomington. Yamada, S. (2020), OALD10 Katsuyo Gaido [Usage Guide of OALD10], Tokyo: Obunsha. https://www.obunsha.co.jp/service/oald10/pdf/OALD10_dic.pdf
12
Issues in compiling bilingual dictionaries
Arleta Adamska-Sałaciak
1 Preliminary considerations
Many decisions taken during the compilation of bilingual dictionaries are essentially identical to those encountered in the preparation of monolingual ones (see Trap-Jensen, Chapter 3). Consequently, whether the dictionary is to be organized alphabetically or thematically, published in print or digitally, available online for a fee or free of charge are all questions that will only be mentioned in passing, the main focus of this chapter being on topics specific to the bilingual dictionary genre. The issues that arise in the process of making a bilingual dictionary follow, in the main, from what kind of dictionary it is to be. The coverage (i.e. the number and type of items to be included) and the resulting physical size of the publication need to be decided upon, as does the depth of individual entries, that is, the amount of information offered about a single headword. At the bottom of it all lies the most basic consideration: the target audience, a factor which has a direct bearing on such aspects as the dictionary’s scope and directionality.
1.1 Target audience, scope and directionality
Thanks to the fact that one of the languages dealt with in a bilingual dictionary is always known to its users, the dictionary’s audience can be immensely diverse, ranging from complete beginners to very advanced students of the other language. What any particular user wants from their dictionary may, of course, vary: some people will treat it as a language learning aid, while others – travellers, tourists – may have no interest at all in mastering the foreign language. For the latter, the dictionary can be anything from an essential tool, ensuring survival in a foreign environment, to an optional extra used – in addition to, or instead of, a phrase book – as a gesture of politeness towards the inhabitants of the place they are visiting. It is customary to view bilingual dictionary users in terms of the two parties they naturally fall into: one native speaker group for each of the two languages of a particular dictionary. This simple dichotomy glosses over those instances when neither of the dictionary’s object languages is a given user’s native tongue (L1). For speakers of lesser-used languages, there is often no bilingual dictionary available which would cover the foreign language they are interested in combined with their native language. They then have to settle for second best: an L3–L2 dictionary, where L3 is the foreign language they want and L2 is another foreign language, one they have mastered
sufficiently well to be able to use it as an intermediary. The needs of such users are not taken into consideration in the preparation of bilingual dictionaries, nor is it clear how they could be.
Another special case is that of dictionaries of languages which no longer have any native speakers. A bilingual dictionary of Sanskrit, for instance, will typically feature only one part, with Sanskrit as the source language (SL) and a modern language, such as English, as the target language (TL). Such dictionaries, used mainly by scholars, frequently rely on descriptive explanations of meaning rather than TL equivalents. This is because many headwords are so deeply embedded in the ancient SL culture that translating them by means of single TL words is not possible.
Whether dealing with a dead or a living language, dictionaries with only one part (Lx–Ly) are called monoscopal (Hausmann and Werner 1991). Some monoscopal dictionaries are additionally equipped with an Ly–Lx index, meant as a partial substitute for a proper Ly–Lx part in that it allows Ly-speaking users access into the Lx–Ly section via their native language.
Most currently produced bilingual dictionaries are biscopal (Lx–Ly and Ly–Lx), that is, consist of two fully fledged word lists, each item in each list being accompanied by one or more equivalents in the other language. The exact contents of a biscopal dictionary for a given pair of languages may vary considerably depending on the dictionary’s directionality. A so-called bidirectional dictionary (Hausmann and Werner 1991) is designed with native speakers of both Lx and Ly in mind, while a monodirectional one is meant for an audience consisting exclusively of speakers of either Lx or Ly. The majority of contemporary bilingual dictionaries either are or claim to be bidirectional. The motivation is largely commercial: it is cheaper to produce one dictionary which will be sold in both markets than invest in two different ones.
Considered from the perspective of any single user, this is far from ideal. The necessity of addressing two language communities at once results, especially in print dictionaries, in a situation where some of the information is superfluous from the point of view of one of the user groups. Thus, speakers of Polish do not need to be told what the gender of each Polish noun is or that a given Polish expression is very formal. The space occupied by such redundant information could be put to better use if the needs of only one language community – in this instance, that of Polish speakers – were being catered for. When compiling a monodirectional dictionary, by contrast, lexicographers do not have to constantly switch perspective from one user group to the other: they can maintain a consistent focus on that of the dictionary’s languages which is its intended users’ L2.1 This seems to be reason enough for demanding that, whenever possible, bilingual dictionaries should be monodirectional. Matters are additionally complicated by the fact that, more often than not, one of the two languages of a bilingual dictionary is less widely spoken and less often learnt than the other. As a result, what publishers advertise as a dictionary equally useful to both language communities may, upon closer inspection, prove to be heavily biased towards the needs of the community speaking the less popular language. Such a dictionary is only superficially bidirectional, e.g. by virtue of giving grammatical information about headwords in both the Lx–Ly and Ly–Lx section and/or using both Lx and Ly as the metalanguage. Where semantic description is concerned, it cannot but privilege one of the user groups (see, e.g. Atkins 1985: 15). 
Most bilingual dictionaries with English which are produced in non-Anglophone countries fit the above description: in spite of the promises on their back covers, they are directed, first and foremost, at speakers of the local language, who are likely to constitute the majority of the users.
Issues in Compiling Bilingual Dictionaries
The good news is that the number of monodirectional dictionaries published worldwide has been growing steadily. This is especially visible in the area of pedagogical lexicography, i.e. among dictionaries geared specifically to the needs of less-than-advanced foreign language learners.2 Even so, it must be remembered that the coverage of such dictionaries is, as a rule, less comprehensive than that of large, academic reference works, which are still predominantly bidirectional.
1.2 Function, organization, medium of presentation
A dictionary’s target audience, scope and directionality are intimately connected with its functions. Two main functions of bilingual dictionaries have been recognized: receptive (passive, decoding) and productive (active, encoding). Traditionally, the emphasis has been on reception. This is hardly surprising since, for any language pair we care to examine, dictionaries going from L2 to L1 will turn out to be used more readily – and, consequently, to have had a longer history – than those going in the opposite direction. The reasons seem obvious. When a person wants to understand something written or spoken in a foreign language, and there is no-one to help them with the task, they have little choice but to turn to a bilingual dictionary. By contrast, when someone wants to express themselves in a foreign language, they may choose to rely on their existing verbal repertoire, thus forgoing dictionary consultation. Besides, the benefits of consulting a dictionary for decoding tend to be rather more immediate and more obvious than its effects on the user’s production.
Despite this traditional dominance of reception, much more attention is being paid to the productive function of today’s bilingual lexicography than used to be the case a mere few decades ago. This is a welcome development, since it is precisely in this area that (biscopal) bilingual dictionaries hold an undisputable advantage over monolingual ones: thanks to being equipped with an L1–L2 part, they enable users to access L2 lexical resources through what those users already know, i.e. lexical items of their L1. Even the most production-oriented monolingual learner’s dictionary cannot compete with that.3
The organization of bilingual dictionaries is, for the most part, semasiological, i.e. based on the headword’s (written) form. Accordingly, for languages with alphabetical writing systems, the arrangement of entries follows the order of the alphabet.
Each entry is organized around a particular lexical item (the headword), providing the meanings (senses) which that item can express. This is the kind of reference work one normally has in mind when talking about (printed) bilingual dictionaries. In the much rarer onomasiological (thematic, ideological) bilingual dictionaries (see Sierra, Chapter 19), the word list is arranged according to topics, and each entry is built around a concept, listing the lexical items through which that concept can be expressed. Onomasiological dictionaries are usually monoscopal and cover only a limited subset of the SL lexicon, such as the vocabulary specific to a particular domain of knowledge. They are also primarily concerned with the headword’s semantics, to the exclusion of aspects such as grammar or pronunciation. There is a certain parallel between an onomasiological dictionary, whether mono- or bilingual, and a semasiological L1–L2 bilingual dictionary: both start from the familiar (respectively, a general concept and a word in the user’s native language) and proceed towards the less familiar
(a specific word the user wants, but either cannot recall or does not know), thus serving primarily an encoding function. Depending on the medium of presentation, bilingual dictionaries are divided into paper (print) and digital ones. As properties of the latter are discussed in detail elsewhere in this volume (e.g. Pastor and Alcina, Chapter 8; Klosa-Kückelhaus and Michaelis, Chapter 24), the present contribution deals mainly with medium-independent features. Otherwise, in the absence of indications to the contrary, when talking about a bilingual dictionary we mean a printed book.
2 Compilation
2.1 Human resources
Whether a dictionary succeeds in meeting the expectations of its target users depends, to a large extent, on the skill of the lexicographer(s) who have compiled it. Most quality bilingual dictionaries are prepared by teams of people rather than single-handedly. When perfect bilinguals are not available (which is in most cases), the minimum requirement is that the prospective lexicographer should be a native speaker of one of the dictionary’s languages and have a near-native command of the other. Some project managers insist that the lexicographers and editors working on a dictionary’s Lx–Ly section should be native speakers of Ly (and vice versa), the assumption being that TL equivalents are easier to come by when one is translating into, rather than from, one’s native language. As might be expected, it is not always possible to fulfil this last requirement, either.
One variable impossible to control is a candidate’s predispositions for the job. As with translating skills, a person’s talent for lexicography often has little to do with whether they hold a degree in linguistics or another language-related discipline (although being a trained linguist obviously cannot hurt). The most desirable qualification, i.e. an internationally recognized degree in lexicography, while not easy to obtain, has been made possible since 2009 thanks to the European Master in Lexicography (EMLex) Program.4 Otherwise the best indicator of a candidate’s suitability for actual dictionary making is how well they can execute the sample lexicographic tasks assigned to them in the course of their preparatory training – a necessary stage preceding any respectable dictionary project.
2.2 Data sources and methods
Whatever the exact procedure followed in the compilation of a bilingual dictionary, it is certain that existing dictionaries (bilingual Lx–Ly and Ly–Lx ones, as well as monolingual ones of both Lx and Ly) will be consulted in the process. For many cheap, low-quality dictionaries, these are the only data complementing the authors’ introspection.
Many specialists believe that a modern dictionary worthy of the name cannot be prepared without a suitably large, electronically stored language corpus (see Kilgarriff, Chapter 7). Corpora provide evidence for SL meanings and for the frequency of occurrence of lexical items. They also help identify common syntactic patterns and recurring phraseological combinations,
so that lexicographers can tap them for illustrations of various aspects of language use, which are then presented in the dictionary in the form of example sentences. In sum, it is on the basis of corpus data that representative, up-to-date word lists can be drawn up and the use of headwords illustrated, thereby ensuring that nothing of importance has been overlooked. All of the above is fairly uncontroversial. Exactly what kind of corpus is required for making a bilingual dictionary is slightly less so. Ideally, two representative, well-balanced corpora should be available, one for each of the source languages of a biscopal dictionary. Access to a parallel corpus (i.e. a corpus of SL texts and their translations) and/or to comparable monolingual corpora (i.e. corpora containing texts of the same type, coming from the same period and dealing with the same kind of topic) would be an additional bonus. There seems to be universal agreement that such closely matched corpora could be of great help in tracking down equivalent candidates for particularly obstinate headwords. As yet, no bilingual dictionaries are known to have been compiled exclusively on the basis of either parallel or comparable corpora. Apart from the obvious obstacle, i.e. unavailability of the right kind of corpus, the main problem seems to be the enormous amount of time required to analyse the wealth of corpus data in a satisfactory way. According to Atkins and Rundell (2008: 478), a bilingual corpus offers too many equivalence candidates, each of which might seem important to the lexicographer, with the likely result that a dictionary compiled in this way would contain ‘too much detail for most users’ and might end up being ‘too big to appear in print’. Not everyone shares those reservations. 
Studies such as those reported by Perdek (2012) or Granger (2018) demonstrate that experienced lexicographers are apparently undeterred by the embarrassment of riches, rejecting the vast majority of equivalence candidates on sight and swiftly homing in on the most promising ones. One must also mention the invaluable help offered by the Sketch Engine tool and its GDEX component (https://www.sketchengine.eu/guide/gdex/), which have gone a long way towards freeing the search for good equivalents and example sentences from much manual drudgery (see Kilgarriff, Chapter 7). Depending on corpus availability, but also on the time, money and human resources allocated to a particular project, the compilation of a bilingual dictionary may follow different paths. As already indicated, ideally, the lexicographers should have at their disposal two linguistically preanalysed corpora (databases), one for each of the source languages of the dictionary. For many language pairs, this is still not the case. Instead, for dictionaries whose SL is a well-described language Lx, a frequently taken route is to start from a ‘universal’ Lx database built from the resources of an Lx corpus. The database is universal in the sense that it can serve as a blueprint for a bilingual dictionary from Lx into any other language. It is made up of a list of Lx headwords, each complete with the relevant grammatical information, preliminary sense divisions and definitions of the identified senses; example sentences are sometimes provided as well. Such a pre-constructed framework is passed on to a team of TL lexicographer-translators, whose main task is to fill the empty slots, in other words, to supply TL equivalents for all the senses of the SL headwords. Subject to the chief editor’s approval, TL lexicographers may be allowed to modify the initial framework to some extent in order to make it fit the target language better; this is achieved mainly by splitting and lumping senses (see Chapter 15). 
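The ‘universal’ framework described above can be pictured as a simple data structure. The following is a minimal sketch in Python, not a description of any actual publisher’s database: the class and field names (`FrameworkEntry`, `Sense`, `equivalents`) and the helper functions are assumptions made for illustration, and the three sense labels are borrowed from the entry for handsome quoted in Section 4.1.2.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sense:
    definition: str  # preliminary SL sense description
    examples: List[str] = field(default_factory=list)     # optional SL examples
    equivalents: List[str] = field(default_factory=list)  # empty slot for the TL team


@dataclass
class FrameworkEntry:
    headword: str
    grammar: str  # e.g. part of speech
    senses: List[Sense]


def unfilled_slots(entries: List[FrameworkEntry]) -> List[Tuple[str, int]]:
    """List (headword, sense number) pairs that still lack a TL equivalent."""
    return [
        (entry.headword, number)
        for entry in entries
        for number, sense in enumerate(entry.senses, start=1)
        if not sense.equivalents
    ]


def lump(entry: FrameworkEntry, keep: int, absorb: int) -> None:
    """Merge sense `absorb` into sense `keep` (both 1-based), as a TL editor might."""
    kept, gone = entry.senses[keep - 1], entry.senses[absorb - 1]
    kept.definition = f"{kept.definition}; {gone.definition}"
    kept.examples.extend(gone.examples)
    kept.equivalents.extend(gone.equivalents)
    del entry.senses[absorb - 1]


# Hypothetical framework with one headword and three preliminary senses.
framework = [
    FrameworkEntry("handsome", "adjective",
                   [Sense("good-looking"), Sense("generous"), Sense("considerable")]),
]
framework[0].senses[0].equivalents.append("gut aussehend")  # one slot filled
print(unfilled_slots(framework))
```

Keeping the equivalents as a separate, initially empty field is what would allow the same framework to be handed to teams working on different target languages; `lump` stands in for the kind of adjustment a TL lexicographer may be permitted to make.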
What happens with the Ly–Lx part of the dictionary depends on the resources available for Ly. The procedure may either be fully analogous (i.e. Ly-corpus-based) or, in the absence of an Ly corpus, it may involve working with several monolingual dictionaries of Ly and/or using the
results of an automatic reversal of the Lx–Ly part (provided the two parts are not being compiled simultaneously).
There are also cases when a bilingual dictionary is created through a bilingualization of an existing monolingual dictionary. Although seemingly a simpler task, it may actually require more skill and effort to turn a fully fledged monolingual dictionary into a bilingual one than to fashion a bilingual dictionary from a semi-finished product, i.e. a database of the kind discussed above. Again, depending on the project, different degrees of intervention in the original macro- and microstructure may be allowed.5
For the sake of completeness, it ought to be mentioned that somewhere between a monolingual dictionary and a bilingual dictionary created on its basis lies an intermediate genre: a bilingualized (semi-bilingual) dictionary, i.e. one that offers TL equivalents while retaining the SL definitions from the monolingual dictionary on which it has been founded. Such dictionaries are always monodirectional and monoscopal (L2–L1), with only an L1–L2 index in place of a regular L1–L2 section.6
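Automatic reversal, mentioned above as one possible source for the Ly–Lx part, amounts to inverting the Lx–Ly word list so that each equivalent becomes a candidate headword. A minimal sketch, assuming the Lx–Ly part is held as a plain mapping from headwords to lists of equivalents; the function name `reverse_word_list` and the toy data are illustrative only (the German items come from the entry quoted in Section 4.1.2, and treating generous as a headword of its own is an assumption):

```python
from collections import defaultdict
from typing import Dict, List


def reverse_word_list(lx_ly: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert an Lx-Ly mapping: every Ly equivalent becomes a candidate headword."""
    ly_lx = defaultdict(list)
    for headword, equivalents in lx_ly.items():
        for equivalent in equivalents:
            if headword not in ly_lx[equivalent]:  # avoid duplicate back-links
                ly_lx[equivalent].append(headword)
    # Sort candidate headwords and their Lx counterparts for stable output.
    return {ly: sorted(lxs) for ly, lxs in sorted(ly_lx.items())}


# Toy Lx-Ly data (English-German), for illustration only.
toy_lx_ly = {
    "handsome": ["gut aussehend", "großzügig", "stattlich"],
    "generous": ["großzügig"],
}
reversed_part = reverse_word_list(toy_lx_ly)
for candidate, counterparts in reversed_part.items():
    print(candidate, "->", ", ".join(counterparts))
```

What such a procedure yields is a list of candidates for editorial review rather than a finished Ly–Lx section: nothing in the inversion itself decides which candidates deserve headword status or how their senses should be divided.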
3 Megastructure
The overall structure of a dictionary (its megastructure) comprises the central word list (macrostructure) and the outer texts (outside matter).
3.1 Word lists
In a biscopal dictionary, naturally, we have not one word list but two. The main criterion for inclusion in either of them is a given item’s frequency of occurrence, but other factors play a role as well. In the language-learning context, that is, in pedagogical bilingual dictionaries, potential usefulness for learners – which does not always coincide with corpus frequency – must be taken into account. Thus, vocabulary items connected with the language classroom, complying with examination requirements and correlated with the interests of the target age group (e.g. schoolchildren or young adults) have a fair chance of being included.
Another reason why the word lists cannot simply be determined by corpus frequency is that this would automatically disqualify older words, which are either extremely rare in, or altogether absent from, corpora of contemporary language. Although most bilingual dictionaries are synchronic contemporary dictionaries (Svensén 2009: 23), a comprehensive dictionary of the general language must also include some obsolescent or even obsolete words, in order to meet the expectations of those users whose proficiency allows them to read older literature in the foreign language. Since such users will need information about those words for decoding purposes only, it follows that, in a monodirectional dictionary, older vocabulary items will feature mainly as headwords in the L2–L1 part. By the same logic, in a bidirectional dictionary, where each of the parts acts as an L2–L1 resource for one of the user groups, both word lists should include important old words.
What about the other end of the time spectrum, that is, neologisms? On the whole, lexical items which have not yet become institutionalized do not feature in dictionaries of any
sort (except, naturally, dictionaries of neologisms). Indeed, lack of a dictionary record has traditionally been one of the main criteria for deciding that a particular item is still at the neologism stage. It is all the more remarkable that, occasionally, there may be more justification for admitting a neologism into a bilingual Lx–Ly dictionary than into a monolingual dictionary of Ly (Adamska-Sałaciak 2016b). Those are cases when no Ly equivalent can be found for a particular Lx headword and when, additionally, there is reason to believe that the Lx word will eventually be borrowed into Ly (the tell-tale signs being its attestation in informal Ly speech and/or in Ly content on the internet). Under such circumstances, a bilingual lexicographer may decide to sanction the incipient borrowing by listing it among the proposed Ly equivalents of the Lx headword. The only chance for a neologism to feature in a bilingual dictionary is thus as a tentative TL equivalent in the Lx–Ly part rather than as a headword in either Lx–Ly or Ly–Lx. Finally, not only single words but increasingly also multiword units can be headwords in bilingual dictionaries. This is a result of the growing realization that units of meaning are not always co-extensive with orthographic words, ample evidence for which comes, among others, from the study of language corpora. From the users’ point of view, one of the benefits of multiword units being elevated to the status of independent headwords is that they are easier to locate than when nested inside entries.
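The selection criteria discussed in this section can be summarized as a frequency threshold combined with editorial overrides. The sketch below is only an illustration of that idea: the function name, the threshold and all corpus counts are invented, and `newcoinage` is a placeholder standing for an item that has not yet become institutionalized.

```python
from typing import Dict, Iterable, List


def select_headwords(corpus_freq: Dict[str, int], min_freq: int,
                     always_include: Iterable[str] = (),
                     exclude: Iterable[str] = ()) -> List[str]:
    """Frequency-based word list with editorial overrides."""
    chosen = {word for word, count in corpus_freq.items() if count >= min_freq}
    chosen |= set(always_include)  # useful for learners, or older vocabulary
    chosen -= set(exclude)         # e.g. neologisms not yet institutionalized
    return sorted(chosen)


# Invented counts, for illustration only.
toy_freq = {"house": 950, "handsome": 120, "homework": 40, "thou": 3, "newcoinage": 150}
word_list = select_headwords(
    toy_freq,
    min_freq=100,
    always_include=["homework", "thou"],  # classroom item and older word
    exclude=["newcoinage"],
)
print(word_list)
```

In this toy run the classroom item and the older word enter the list despite low counts, while the frequent but uninstitutionalized item is kept out as a headword; whether it might still appear as a tentative TL equivalent is a separate editorial decision.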
3.2 Outer texts
Outer texts are additions situated outside the A–Z core of the dictionary, either on the peripheries of the dictionary proper (front and back matter) or as inserts – plates of drawings, diagrams, etc. – interspersed among the entries (middle matter). Until quite recently, neither dictionary makers nor theoretical lexicographers used to pay much attention to those optional sections, arguing that very few users ever consulted them. While doubtless true, it is hard to say whether the latter is solely the cause of the lexicographers’ neglect or also one of its effects. What is certain is that outer texts often give the impression of an afterthought, carrying material which is relatively easy to get hold of, very likely added at the last minute and without much consideration to what the dictionary audience might really need.
Fortunately, things are beginning to look up. In good dictionaries, the A–Z text is now routinely preceded by a ‘How to use the dictionary’ section. Also present is a list of abbreviations, including grammatical codes, phonetic symbols, usage labels and the like. A grammar section – with a list of noun declensions and/or verb conjugations, irregular verbs and similar – is a frequent feature. Occasionally, one can find some or all of the following: a selection of false friends, a writing guide, a bank of common phrases used in everyday conversation, a list of popular texting and e-mailing abbreviations (and their TL equivalents). In bilingual learners’ dictionaries, there is often a special section devoted to aspects of the L2 culture. Lists of geographical and personal (given) names, once quite common, are less frequent these days, partly because important place names tend to feature as headwords in the A–Z part.
In print dictionaries, the choice of elements as well as the amount of information to be included in the outside matter is heavily circumscribed by space considerations.
In the case of discursive sections, such as those on culture or essay writing, the language of presentation has to be decided
upon, as lexicographers can rarely afford the luxury of two language versions. Ideally, of course, any text in L2 should be accompanied by its L1 translation, thus allowing users to choose the version they feel more comfortable with – an easy thing to do in a digital dictionary. In sum, if a dictionary’s outer texts are to be of real use to the language learner, a lot of thought has to go into their preparation. It may be a good idea to enlist the help of other specialists – language teachers, grammarians, phoneticians – or even delegate the task entirely to them. All of the lexicographers’ time and effort can then be spent on their main job: identifying TL equivalents and presenting them in the most effective and efficient way.
4 Microstructure
The microstructure, that is, the structure of a single entry, can be quite complex in a modern dictionary, with many constituent elements following one another in a set order, each conveying different kinds of information. Here, only two types of entry constituent will be dealt with: equivalents and examples of usage.
4.1 Equivalents
The one indispensable element of the microstructure of a bilingual dictionary is the headword with its translations (known in lexicography as TL equivalents). Both the identification of suitable TL equivalents and their clear presentation are crucial to the dictionary’s success.
4.1.1 Equivalent provision
The principal idea of a bilingual dictionary is deceptively simple: to provide equivalents for all senses of all headwords, such that each equivalent is identical in meaning to the sense it has been matched with.7 Unfortunately, the execution of this idea is in most cases extremely difficult, and at times utterly impossible. Three properties of natural languages – two intra- and one interlingual – are responsible for that: vagueness of meaning, polysemy and lack of one-to-one correspondence between different lexical systems (i.e. so-called anisomorphism8).
Both vagueness and polysemy carry substantial benefits for language users, making it possible to express an infinitude of meanings with the help of limited lexical resources. The same cannot be said about their implications for lexicography. Semantic vagueness (indeterminacy) significantly complicates the process of identifying the meanings of (decontextualized) lexical items within each language. Until such identification has been accomplished, one cannot even begin to try and match meanings interlingually for the purposes of a bilingual dictionary. Besides, it is not always clear – not only to lexicographers, but also to lexical semanticists – whether a particular lexical item is best viewed as vague or as polysemous, that is, whether it should be thought of as having one general, underspecified meaning or several more or less distinct, independent meanings.9 Opting for the latter, i.e. for polysemy, brings with it the need for careful sense division of SL headwords as well as for meticulous discrimination and disambiguation of TL equivalents (see Sections 4.1.2 and 4.1.3).
There is also a connection between polysemy and anisomorphism: the rarity of interlingually symmetrical polysemy (i.e. of cases where two polysemous words in two different languages have the same number of exactly the same senses) can be viewed as one of the facets of anisomorphism. To illustrate this issue in a condensed form, let us consider The Sense of an Ending, the title of a novel by Julian Barnes. Individually, both sense and ending are polysemous; combined as above, they not only do not disambiguate each other, but intensify the ambiguity of the resulting phrase. At least three readings of the phrase seem possible: ‘the feeling that an end is approaching’; ‘the meaning of a book’s ending’ and ‘the meaning of a life’s ending’.10 Indeed, it is possible to read the title at a more general level, with all three interpretations being ‘turned on’ at once (the overall effect thus amounting to vagueness rather than polysemy). When it comes to translation, there is, naturally, little chance of preserving the multi-layered ambiguity – or vagueness, as the case may be – unless the items proposed as TL equivalents of the two English nouns exhibit polysemy parallel (at least in the relevant senses) to that of, respectively, sense and ending.11 In general, anisomorphism – a consequence of the fact that different languages structure reality differently – means that perfect interlingual equivalence is an exception rather than the rule. Consequently, if bilingual dictionaries are to be viable at all, the sameness-of-meaning requirement must be relaxed, in recognition of the graded, rather than absolute, nature of interlingual correspondences. 
A great deal of ingenuity and effort on the part of the lexicographer may be required before even a limited degree of equivalence between the headword and its dictionary TL counterpart can be reached.12 One of the means of achieving correspondence is to extend the scope of the unit to be translated, in accordance with the principle that, when there is no equivalence at word level, it can still be reached at the level of the entire message. Sometimes, it may be necessary to embed the untranslatable SL lexical item in a sentence (and then translate the whole); at other times, adding some minimal context (e.g. a modifier) and translating the resulting phrase will do the trick. It would, however, be wrong to conclude that it is always easier to provide equivalents for SL stretches larger than a single word. In particular, conventional multiword units, especially figurative ones, bring challenges of their own. On top of the difficulties which follow from lack of isomorphism, the extra complication here is that the meaning of a figurative expression is often to some extent motivated (transparent). The motivating link between the actual (that is, figurative) meaning of an expression (such as an idiom or a proverb) and its lexical structure resides in the so-called image component,13 i.e. the mental picture the expression evokes. Strictly speaking, a perfect TL equivalent of a conventional figurative expression should therefore correspond to it on both planes: that of the actual meaning and that of the rich image. What usually happens is that correspondence is present only on one of the planes, if that. If two expressions are identical exclusively on the level of the image, then, of course, they do not qualify as equivalent at all; rather, they are phraseological false friends. There is no agreement as to how to treat two expressions which do mean the same, but are based on markedly different images. 
Most authors (including practising lexicographers) will accept them as fully equivalent, but some (e.g. Dobrovol’skij and Piirainen 2005) will insist, in accordance with the reasoning presented above, that their equivalence is only partial. Consider the pair of proverbs which Duval (1991/2008: 280) cites as an example of full equivalence: English once bitten twice shy and French chat échaudé craint l’eau froide. Despite
their identical figurative meaning, the two are not always substitutable in translation: one cannot be used in place of the other if, for whatever reason, the literal meaning gets activated, as in these corpus examples:
I dared not enter the yard. I was afraid that another ferocious dog might lurk inside. Once bitten, twice shy.
We feel that there is a need for a good, reliable repellent in the current market and that our new product offers the necessary protection for travellers to any destination. They say ‘once bitten, twice shy’ but be one step ahead and be prepared before you travel by picking up your bottle today.
And then with an almost lazy flick of his wand, he transfigured the sword into a cobra. Harry gave a panicked cry, before throwing the cobra as far as he could, just avoiding the serpent’s strike. ‘Once bitten, twice shy, eh Harry?’14
Of course, bilingual lexicographers could count themselves lucky if all their problems with matching phraseological units were of this kind, the sole question being whether the equivalence works in all contexts or only in some. Most of the time they deal with correspondences which are much more tenuous than in the case just quoted, and sometimes the target language has no fixed expression at all which would make a passable equivalent candidate. In such cases, it is better to give a discursive explanation of the SL meaning in the target language than to propose as an equivalent a TL expression which exhibits significant differences (whether semantic, syntactic, or pragmatic) when compared with the headword, thereby potentially misleading the user.
4.1.2 Equivalent discrimination
Early bilingual dictionaries did not discriminate between equivalents at all, listing them one after another, separated only by commas. Later, in slightly more ambitious publications, a distinction was introduced between a comma, separating supposedly interchangeable equivalents, and a semicolon, separating equivalents which were not fully interchangeable. There are still quite a few dictionaries which continue this tradition. That it is not a tradition worth continuing should be obvious from the paucity of cases when an SL headword (or a sense thereof) can be supplied with even a single perfect TL equivalent. To assume that there are more perfect equivalents per headword (sense), and that they are all fully interchangeable, flies in the face of what we know about interlingual anisomorphism, on the one hand, and about the rarity of absolute synonyms within one language, on the other.
In a bilingual L2–L1 dictionary meant to serve reception only, such indiscriminate accumulation of equivalents, while hardly ideal, might perhaps be defended. When the dictionary’s target language is the users’ L1, we can, arguably, expect them to be able to pick the appropriate equivalent from among those provided. However, in a dictionary which also aims to take care of the users’ productive needs, lack of equivalent discrimination is bound to have disastrous consequences. Left to their own devices, some users will inevitably make some wrong choices, producing utterances which are, at best, stylistically inappropriate and, at worst, downright incomprehensible or ridiculous; or simply not what they wanted to say.
Guiding the user to the equivalent they need involves making a number of decisions: what kind of information should be given to identify the equivalent; in which of the two languages it should
be phrased; what non-verbal signs (numbers, punctuation marks, icons) should be employed for the purpose. The information made use of in equivalent discrimination can be semantic (e.g. a synonym, hyperonym or collocate), syntactic (e.g. an indication of part of speech, transitivity or type of subject taken by the verb) or pragmatic (e.g. style, domain, regional or temporal label). The choice of typography is largely a question of aesthetics; what matters is that the signs should be distinct enough and not too numerous or difficult to interpret.
The metalanguage in a monodirectional dictionary will, naturally, be the native language of the users. Unfortunately, the policy cannot be consistently applied in bidirectional dictionaries, and this necessarily puts one of the user groups at a disadvantage. The next best thing would be to give all information in both languages – not a viable proposition in a printed book, whose entries would swell in size and become impossibly hard to navigate, but easily doable in digital reference works, in which the choice of the metalanguage can be left to the user. The entry quoted below demonstrates how a bidirectional dictionary with English and German (OGD) attempts to solve the problem by alternating the metalanguage (in this instance, preceding the German equivalent by an English synonym of the headword in the appropriate sense or subsense and following it with two or three German collocates):
handsome
1. (good-looking) gut aussehend [Mann, Frau]; schön, edel [Tier, Möbel, Vase usw.]
2. (generous) großzügig [Geschenk, Belohnung, Mitgift]; nobel [Behandlung, Verhalten, Empfang];
3. (considerable) stattlich, ansehnlich [Vermögen, Summe, Preis]; stolz [Preis, Summe]15
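The layout of the entry quoted above, and the idea that a digital dictionary can leave the choice of metalanguage to the user, can be sketched as a small data model. This is only an illustration under stated assumptions: the class and function names are hypothetical, the equivalents and collocates are copied from the quoted entry, and the German sense indicators are illustrative additions that do not come from the OGD.

```python
from dataclasses import dataclass


@dataclass
class Equivalent:
    forms: list        # TL equivalents that share one set of collocates
    collocates: list   # TL collocates that discriminate these equivalents


@dataclass
class Sense:
    indicator: dict    # sense indicator, keyed by metalanguage code
    equivalents: list  # Equivalent objects, in entry order


def render_sense(number, sense, metalanguage):
    """Format one sense in the layout of the quoted entry."""
    guide = sense.indicator[metalanguage]
    parts = [
        f"{', '.join(eq.forms)} [{', '.join(eq.collocates)}]"
        for eq in sense.equivalents
    ]
    return f"{number}. ({guide}) " + "; ".join(parts)


# English indicators, equivalents and collocates follow the quoted entry.
# The "de" indicators are assumptions made for this sketch only.
HANDSOME = [
    Sense({"en": "good-looking", "de": "attraktiv"},
          [Equivalent(["gut aussehend"], ["Mann", "Frau"]),
           Equivalent(["schön", "edel"], ["Tier", "Möbel", "Vase usw."])]),
    Sense({"en": "generous", "de": "freigebig"},
          [Equivalent(["großzügig"], ["Geschenk", "Belohnung", "Mitgift"]),
           Equivalent(["nobel"], ["Behandlung", "Verhalten", "Empfang"])]),
    Sense({"en": "considerable", "de": "beträchtlich"},
          [Equivalent(["stattlich", "ansehnlich"], ["Vermögen", "Summe", "Preis"]),
           Equivalent(["stolz"], ["Preis", "Summe"])]),
]

# The user, not the lexicographer, picks the metalanguage at display time.
for number, sense in enumerate(HANDSOME, start=1):
    print(render_sense(number, sense, "en"))
```

Storing the indicator per language, rather than as a single string, is what makes the metalanguage switchable without duplicating the equivalents themselves.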
4.1.3 Equivalent disambiguation
Whenever the lexical item offered as a TL lexicographic equivalent is ambiguous (whether due to polysemy, homonymy or vagueness), information should be provided enabling the user to home in on the sense in which the equivalent is to be interpreted. This is especially important in L2–L1 dictionaries; in L1–L2 ones, the problem should not arise, thanks to the user’s familiarity with the meaning of the (native) headword. Often, the same methods which serve the purposes of sense and/or equivalent discrimination simultaneously resolve equivalent ambiguity. Thus, grammar codes, usage labels and guide phrases containing different kinds of clues (synonyms, collocates, etc.) can all be effective disambiguating devices. In addition, if two or more near-synonymous equivalents are given side by side, they often automatically disambiguate each other, their semantic ‘common part’ indicating the relevant sense. If all else fails, the lexicographer can furnish the equivalent with a gloss, as in this entry from a Russian–English dictionary (CORD): интерпретáтор interpreter (expounder), where the gloss deactivates the ‘oral translator’ reading of the equivalent.
4.2 Examples
Unlike equivalents, without which a bilingual dictionary is simply impossible, examples of usage are not an obligatory element of the microstructure. Nonetheless, they can be found more and more often these days, especially in bilingual dictionaries geared specifically to the needs of language learners. Once the decision has been made to include this feature, a number of questions have to be answered, such as whether the examples are better invented by the lexicographer or extracted from a corpus (and, if corpus-based, to what extent they can be modified) and whether or not they should be translated.
In dictionaries for beginners and/or young schoolchildren, where the examples need to be simple and short, and where, consequently, the authenticity of the illustrative material is less important than its attunement to the users’ proficiency level, examples entirely made up by the lexicographer may be the best solution. On the other hand, simplicity must give way to authenticity in the exemplification of rare words (or senses), which need to be shown in the context in which they are normally found rather than in artificially simplified sentences. In sum, neither corpus-based nor made-up examples are inherently better – it all depends on the level of the user and on the kind of word (or sense) being illustrated.
Since bilingual dictionaries serve users at all levels of proficiency, the examples they offer cannot always be quoted exactly in the form in which they appear in the corpus, but need to be carefully edited before being put in the dictionary. Editorial intervention involves cutting sentences down to manageable size as well as eliminating excessively difficult lexical items, obscure culture-specific references and other potentially distracting detail.
Something more than simple editing may be required when examples are meant to assist in encoding, as they normally are in bilingual lexicography (given that the decoding function is taken care of by the equivalents). As demonstrated in an experimental study conducted by Frankenberg-Garcia (2012), production-oriented examples are more effective when they have been hand-picked to address common production errors, thus providing repeated exposure to structures that are problematic for learners with a particular mother tongue background.16 While not a realistic requirement for each single entry, this policy should be followed at least in entries containing words or structures notoriously difficult for the target users.
Opinions differ as to what it is that example sentences ought to illustrate in a bilingual dictionary. In trying to answer this question, it helps to think about what bilingual dictionaries are typically used for. First, whatever their motivation, people do not normally consult a bilingual dictionary in order to find out how their native language works. Consequently, in a monodirectional dictionary, it makes sense for examples always to illustrate the users’ L2. This boils down to illustrating the headword in the L2–L1 section (which is what dictionaries have traditionally been doing) and the equivalent in the L1–L2 section (which is an innovation). The innovative approach has been taken by LSW, a dictionary for Polish learners of English, whose Polish–English part contains examples illustrating the equivalents, as in the following:
płynąć
1 (ryba, człowiek) swim:
· fish swimming up the stream
2 (rzeka, woda) flow:
· The River Elbe flows through the Czech Republic.
· A steady stream of cars flowed past her window.
3 (woda, łzy) run:
· Tears ran down her face.
4 (statek, statkiem) sail:
· We sailed along the coast of Alaska.
5 (czas) go by:
· Time goes by so quickly these days.
Regrettably, most bilingual dictionaries which feature examples still employ them exclusively to illustrate source language material (i.e. the headword and the combinations it enters into) in both parts of the dictionary and irrespective of its directionality. In bidirectional dictionaries, there is, of course, no other option: examples must illustrate the headword, because the dictionary’s source language is always an L2 for one of the user groups. Still, bilingual learners’ dictionaries being, by definition, monodirectional, there is hope that the situation there might change in the future.17
The controversy over what examples should illustrate is intimately connected with another question, namely, whether they ought to be translated.18 Most examples support information given earlier in the entry, but some expand on or qualify it, usually by introducing important exceptions (e.g. by showing that, in certain circumstances, a different translational equivalent is needed or that the SL item is regularly omitted when translating into the target language). Examples which merely support the equivalent(s) can be left untranslated, whereas those which illustrate exceptions must be supplied with translations, either of the whole or of the part for which the so-far-unmentioned equivalent is needed. In digital dictionaries, the question loses much of its urgency, as it is possible to make translation an optional feature, to be switched on and off when needed. Even there, however, the problem does not disappear completely. After all, it does matter – especially for advanced users, who should be given maximally authentic models – whether a particular sentence has originally been uttered in a given language or is merely a translation (and thus potentially exhibits features of translationese).
5 Conclusion
This chapter has touched upon some important issues involved in the compilation of bilingual dictionaries, but it has not attempted to make any systematic predictions about the future. Such caution seems justifiable, since even the world’s most renowned lexicographers, when asked what future dictionaries will look like – or, indeed, whether there will still be dictionaries a few years from now – tend to avoid direct answers. While it is hard at present to imagine a world without dictionaries, especially bilingual ones, no-one knows for sure how the world of reference science is going to be transformed by the technological advances we are witnessing. What does seem certain is that people will always need information about interlingual lexical correspondences and that, no matter in what form and through what medium that information is conveyed, it will capitalize upon the work which bilingual lexicographers have been doing for centuries.
Notes
1 Or, in general, on the language less known to them (see the remarks about L3–L2 dictionaries above).
2 The dictionaries in the Oxford Beginners series are a good example.
3 See Adamska-Sałaciak (2010b) for a review of the inherent deficiencies of monolingual learners’ dictionaries and the resulting need for bilingual ones.
4 Since 2016, the EMLex can be completed with an Erasmus Mundus Joint Master Degree.
5 Adamska-Sałaciak (2006a: Chapter One) illustrates the different ways in which bilingual dictionary compilation can proceed, using English–Polish/Polish–English dictionary projects from the 1990s and early 2000s as examples.
6 A more nuanced typology has been proposed by James (1994), who makes a distinction between bilingualized and semi-bilingual dictionaries. Most authors, however, use the terms interchangeably.
7 The intricate relations between lexicographic equivalence and semantic identity are discussed in Adamska-Sałaciak (2013).
8 The term anisomorphism was introduced by Zgusta (1971: 294ff.).
9 The different diagnostic tests for distinguishing vagueness from polysemy proposed by philosophers and semanticists are not wholly reliable, often yielding inconclusive or conflicting results; for an overview, see Geeraerts (1993).
10 Either the life of the aging narrator-protagonist or that of his friend, whose reasons for committing suicide are only revealed towards the end of the book, undermining the reader’s earlier perception of the novel’s characters and events.
11 Since Polish does not afford such a possibility, in Barnes (2012) the translator chose the first – arguably, most salient – reading, thereby eliminating any uncertainty and destroying the vague aura of mystery carried by the title.
12 A typology of equivalence types, conceived of as areas on a cline, has been developed in Adamska-Sałaciak (2006a, 2010a and 2011). A discussion illustrated with examples taken from dictionaries for different language pairs can be found in Adamska-Sałaciak (2016a).
13 The terminology used here is that of the Conventional Figurative Language Theory as developed in Dobrovol’skij and Piirainen (2005).
14 The examples have been obtained from the English Web 2015 (enTenTen15) Corpus with the help of the Sketch Engine.
15 In this, as in all other examples of dictionary entries quoted in this chapter, only those elements of the entry are given which illustrate the point under discussion.
16 Although Frankenberg-Garcia’s study deals with definitions and examples, imitating the process of monolingual dictionary consultation by (Portuguese) learners of English, it seems that this particular conclusion reached by the author can be extended to bilingual dictionaries.
17 Bilingual learners’ dictionaries, mentioned here in passing a few times, are a genre deserving of separate treatment. For a detailed discussion, see Adamska-Sałaciak (2020) and Adamska-Sałaciak and Kernerman (2016), as well as the entire special issue of the International Journal of Lexicography to which the latter paper is an introduction.
18 The problem is considered in detail in Adamska-Sałaciak (2006b).
References
Dictionaries
Concise Oxford Russian Dictionary (1998), Revised edition, based on the Oxford Russian Dictionary, Oxford: Oxford University Press. [CORD]
Longman Słownik Współczesny Angielsko-Polski, Polsko-Angielski (2011), Second edition, Harlow: Pearson Education Limited. [LSW]
Oxford German Dictionary (2008), Third edition, Oxford: Oxford University Press. [OGD]
Other references
Adamska-Sałaciak, A. (2006a), Meaning and the Bilingual Dictionary: The Case of English and Polish, Frankfurt am Main: Peter Lang.
Adamska-Sałaciak, A. (2006b), ‘Translation of dictionary examples – notoriously unreliable?’ in E. Corino, C. Marello and C. Onesti (eds), Atti del XII Congresso Internazionale di Lessicografia, Torino, 6-9 settembre 2006, Vol. 1, Alessandria: Edizioni dell’Orso, 493–501.
Adamska-Sałaciak, A. (2010a), ‘Examining equivalence’, International Journal of Lexicography 23 (4), 387–409.
Adamska-Sałaciak, A. (2010b), ‘Why we need bilingual learners’ dictionaries’ in I. Kernerman and P. Bogaards (eds), English Learners’ Dictionaries at the DSNA 2009, Tel Aviv: K Dictionaries, 121–37.
Adamska-Sałaciak, A. (2011), ‘Between designer drugs and afterburners: A lexicographic-semantic study of equivalence’, Lexikos 21, 1–22.
Adamska-Sałaciak, A. (2013), ‘Equivalence, sameness of meaning, and synonymy in a bilingual dictionary’, International Journal of Lexicography 26 (3), 329–45.
Adamska-Sałaciak, A. (2016a), ‘Explaining meaning in bilingual dictionaries’ in P. Durkin (ed.), The Oxford Handbook of Lexicography, Oxford: Oxford University Press, 144–60.
Adamska-Sałaciak, A. (2016b), ‘On bullying, mobbing (and harassment) in English and Polish: Foreign-language-based lexical innovation in a bilingual dictionary’ in T. Margalitadze and G. Meladze (eds), Proceedings of the XVII EURALEX International Congress: Lexicography and Linguistic Diversity, Tbilisi: Ivane Javakhishvili Tbilisi State University, 758–6.
Adamska-Sałaciak, A. (2020), ‘Bilingual learners’ dictionaries in the lexicographic landscape’ in M. Szczyrbak and A. Tereszkiewicz (eds), Languages in Contact and Contrast. A Festschrift for Professor Elżbieta Mańczak-Wohlfeld on the Occasion of Her 70th Birthday, Kraków: Jagiellonian University Press, 37–47.
Adamska-Sałaciak, A. and I. Kernerman (2016), ‘Introduction: Towards better dictionaries for learners’, International Journal of Lexicography 29 (3), 271–8.
Atkins, B. T. S. (1985), ‘Monolingual and bilingual learners’ dictionaries: A comparison’ in R. Ilson (ed.), Dictionaries, Lexicography and Language Learning, Oxford: Pergamon Press, 15–24.
Atkins, B. T. S. and M. Rundell (2008), The Oxford Guide to Practical Lexicography, Oxford: Oxford University Press.
Barnes, J. (2011), The Sense of an Ending, London: Jonathan Cape.
Barnes, J. (2012), Poczucie kresu, Warszawa: Świat Książki. (Polish translation of Barnes (2011) by Jan Kabat).
Dobrovol’skij, D. and E. Piirainen (2005), Figurative Language: Cross-cultural and Cross-linguistic Perspectives, Amsterdam and Boston: Elsevier.
Duval, A. (1991), ‘Equivalence in bilingual dictionaries’ in T. Fontenelle (ed.) (2008), Practical Lexicography: A Reader, Oxford: Oxford University Press, 273–82. (Originally published as Duval, A. (1991), ‘L’équivalence dans le dictionnaire bilingue’, in F-J. Hausmann et al. (eds), Vol. 3, 2817–24).
Frankenberg-Garcia, A. (2012), ‘Learners’ use of corpus examples’, International Journal of Lexicography 25 (3), 273–96.
Geeraerts, D. (1993), ‘Vagueness’s puzzles, polysemy’s vagaries’, in Cognitive Linguistics 4, 223–72 (Reprinted in Geeraerts, D. (2006), Words and Other Wonders, Berlin: Mouton de Gruyter, 99–148).
Granger, S. (2018), ‘Has lexicography reaped the full benefit of the (learner) corpus revolution?’ in J. Čibej, V. Gorjanc, I. Kosem and S. Krek (eds), Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana: Ljubljana University Press, Faculty of Arts, 17–24.
Hausmann, F-J. and R.O. Werner (1991), ‘Spezifische Bauteile und Strukturen zweisprachigen Wörterbücher: Eine Übersicht’ in F-J. Hausmann et al. (eds), Vol. 3, 2729–69.
Hausmann, F-J., O. Reichmann, H.E. Wiegand and L. Zgusta (eds) (1989–91), Wörterbücher/Dictionaries/Dictionnaires. Ein internationales Handbuch zur Lexikographie. An international encyclopedia of lexicography. Encyclopédie internationale de lexicographie, Berlin and New York: Mouton de Gruyter.
James, G. (1994), ‘Towards a typology of bilingualised dictionaries’ in G. James (ed.), Meeting Points in Language Studies. Working Papers, Hong Kong: Language Centre, Hong Kong University of Science and Technology, 184–202.
Perdek, M. (2012), ‘Lexicographic potential of corpus equivalents: The case of English phrasal verbs and their Polish equivalents’ in R. Vatvedt Fjeld and J.M. Torjusen (eds), Proceedings of the XV EURALEX International Congress, Oslo: Department of Linguistics and Scandinavian Studies, University of Oslo, 376–88.
Svensén, B. (2009), A Handbook of Lexicography, Cambridge: Cambridge University Press.
Zgusta, L. (1971), Manual of Lexicography, Prague: Academia; The Hague: Mouton.
13
Aspects of African language lexicography
D. J. Prinsloo
1 Introduction
This chapter gives a short discussion of what are regarded as the most relevant aspects of African language lexicography. Although the aspects covered represent a rather comprehensive overview, not all issues are addressed. The focus is on the many challenges that lexicographers face in compiling user-friendly dictionaries for African languages under constraints such as affordability, lemmatization problems, complicated grammatical structures and lexicographic traditions. All issues are discussed from the user perspective, i.e. the compilation of user-friendly dictionaries according to two basic guidelines that have stood the test of time: that users should be able to find the information that they are looking for (Haas 1962: 48), and Laufer’s (1992: 71) list of requirements for enabling the user ‘to know’ a word. Examples are given from African languages spoken in South Africa but are representative of other African languages as well.
Some clarification regarding the terms ‘African languages’ versus ‘Bantu languages’ is required. The term ‘Bantu’ became stigmatized during the Apartheid era in South Africa. Therefore, the term ‘African’ is preferred in South Africa even in reference to what are internationally referred to as ‘Bantu languages’. The discussion in this chapter is focused on the Bantu language family, and most of the issues described cannot necessarily be generalized to other languages on the continent of Africa. However, in order to respect the view of those opposed to the term ‘Bantu’, the term ‘African languages’ will be used.
Generally speaking, dictionaries for African languages are few in number and not aimed at specific target users, and many are out of print or outdated. Dictionaries are compiled by individuals, mostly through initiatives from publishing houses, or are government funded or institutionally supported. It can be stated that most lexicographic problems that African language lexicographers experience are rooted in one aspect, namely the complex grammar of these languages. It is therefore not uncommon, nor to be regarded as repetition, if a discussion of the grammatical complexity of a specific issue precedes arguments for best lexicographic practice for that issue. It is also important to note that issues that are non-complex in English are often very complicated in African languages.
2 Salient features and grammatical complexity
The salient features and the grammatical complexity of African languages underpin every single aspect of African language lexicography. Prinsloo (2020) attempts to describe these complex grammatical systems in detail and shows how specific words such as who, is/am/are, every, of, etc., which are relatively non-problematic for the English lexicographer, present a major challenge for African language lexicography. Consider Figure 13.1 for an extract of the noun prefixes, pronouns and concords of classes 1–4 from the seventeen noun classes in isiZulu, neatly summarized in the online isiZulu dictionary isiZulu.net, and Table 13.1 for an extract of the positive and negative present, future and past tenses of the indicative, one of the eight verbal moods in Sepedi.
Figure 13.1 Extract of isiZulu noun prefixes and concords from isiZulu.net (https://isizulu.net/).
Table 13.1 A sentence in the indicative mood in Sepedi.
Indicative
Tense | Positive | Negative
Pres | Kgoši e a bolela ‘The king speaks’ | Kgoši ga e bolele ‘The king does not speak’
Fut | Kgoši e tlo bolela ‘The king will speak’ | Kgoši e ka se bolele ‘The king will not speak’
Past | Kgoši e boletše ‘The king spoke’ | Kgoši ga se ya bolela ‘The king did not speak’
3 Amalgamated lemma lists
An amalgamated dictionary for Dutch and Afrikaans with a single amalgamated lemma list for these two languages (Martin and Gouws 2000) inspired research into amalgamation of the lemma lists for closely related African languages. The Nguni languages (isiZulu, isiXhosa, isiNdebele and Siswati) and the Sotho languages (Sepedi, Setswana and Sesotho) are good examples of language groups whose members are so closely related that compiling a single lemma list becomes a viable, space-saving alternative to separate lemma lists for each language. Prinsloo (2014: 275) indicates a substantial overlap for the Sotho languages based on a comparison of the 10,000 most frequently used words in each of these languages. This comparison revealed that Sepedi, Setswana and Sesotho have 19.4 per cent of these words in common, Setswana and Sesotho share 34.4 per cent, Sepedi and Setswana 32.7 per cent, and Sepedi and Sesotho 26.9 per cent. According to his calculations, a single amalgamated lemma list for these three languages could result in a 30 per cent reduction compared to three separate lemma lists, one for each language.
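The kind of comparison described above can be sketched with set operations. This is a minimal illustration, not a reconstruction of Prinsloo's calculation: the function name is hypothetical, the three toy lists use placeholder tokens rather than real lemmas, and two lemmas are assumed to be 'the same' only when their spelling is identical, which a real amalgamation project would have to refine.

```python
from itertools import combinations


def amalgamation_report(lemma_lists):
    """Compare separate lemma lists with one amalgamated list.

    lemma_lists maps a language label to a set of lemmas.
    """
    separate_total = sum(len(lemmas) for lemmas in lemma_lists.values())
    amalgamated = set().union(*lemma_lists.values())
    shared_by_all = set.intersection(*lemma_lists.values())
    pairwise = {
        (a, b): len(lemma_lists[a] & lemma_lists[b])
        for a, b in combinations(sorted(lemma_lists), 2)
    }
    return {
        "separate_total": separate_total,       # lemmas if each list is printed separately
        "amalgamated_size": len(amalgamated),   # lemmas in the single merged list
        "shared_by_all": len(shared_by_all),    # lemmas found in every list
        "pairwise_shared": pairwise,            # lemmas shared by each pair of lists
        "reduction": 1 - len(amalgamated) / separate_total,
    }


# Placeholder data: "w1" ... "w8" stand in for lemmas; A, B and C for languages.
TOY_LISTS = {
    "A": {"w1", "w2", "w3", "w4", "w5"},
    "B": {"w1", "w2", "w3", "w6", "w7"},
    "C": {"w1", "w2", "w4", "w6", "w8"},
}

report = amalgamation_report(TOY_LISTS)
print(report["amalgamated_size"], "lemmas instead of", report["separate_total"])
print(f"reduction: {report['reduction']:.1%}")
```

The reduction reported by such a script depends entirely on the lists supplied and on the identity criterion used, so the toy figures say nothing about the Sotho languages themselves.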
4 Support systems for dictionaries and linking with other sources for additional information
For African language dictionaries, links to support systems for guidance on complicated grammatical structures are very useful, especially to learners of the language. Typical examples of such support tools are a translation tool for isiZulu possessives, Zulu e-Dict test version (Bosch and Faaß 2014), a Sepedi sentence builder, Sepedi Helper (http://sepedihelper.co.za), decision trees for colour distinction in Sepedi (Taljard and Prinsloo 2013), copulatives in Sepedi (Prinsloo and Bothma 2020), isiZulu and Sepedi kinship terminology (Prinsloo and Bosch 2012) and Sepedi text verification (see below). User studies on the Sepedi Helper (Prinsloo and Taljard 2019) and the Copulative Decision Tree (Prinsloo forthcoming a) indicate that users find these tools very useful in a dictionary use situation for text production and text verification. Consider Zulu e-Dict test version and the Copulative Decision Tree in Figures 13.2 and 13.3. The possessive construction builder Zulu e-Dict test version is a first version of an integrated e-dictionary translating possessive constructions from English to Zulu (https://iwist-cl.hosting.uni-hildesheim.de/projects/sela/tools.html).
Figure 13.2 Formation of the isiZulu possessive for the food of a person in Zulu e-Dict test version (Bosch and Faaß 2014: 743).
Figure 13.3 A section of the decision tree for Sepedi copulatives (Prinsloo and Bothma 2020: 90).
5 Dictionaries as preferred point of departure
To date, many questions have been asked about the future of dictionaries, e.g. what dictionaries of the future will look like or even whether there will be dictionaries in future at all (see Nielsen, Chapter 23). Some regard the information era and especially sophisticated search engines as a threat to dictionaries in the future.
Dictionaries are believed to be under threat, with the shift to digital format and the ongoing undermining of expertise by crowdsourced sites contributing to a perception that dictionaries are no longer needed. http://slll.cass.anu.edu.au/centres/andc/australex2019 [Consulted 1 February 2019]
Zimmer (2017) says that digital dictionaries are under threat, and some of that threat comes from none other than Google. He goes as far as to compare dictionaries to a lexicographic Titanic heading straight for the Iceberg Google. Lexicographers should not expect that this presumed threat will ‘go away’, believing that ‘there will always be dictionaries’. Dictionaries of the future should be of such high lexicographic quality, and so informative and user-friendly, that users will still regard dictionaries as the
preferred point of departure in finding information about a word. Such dictionaries should be enriched by support systems and linked to additional information, easy to consult, appealing to the eye, etc. For African languages, it is expected that paper dictionaries will be used for many more years to come but that they will gradually be paralleled with an increasing number of electronic dictionaries.
6 Lexicographers as data providers
It is possible that in future lexicographers will not regard search engines as a threat that needs to be opposed but will rather play an increasing role as data providers to search engines, especially Google. This simply means that users would look up words in Google, rather than turning to a paper or even an online dictionary, and would find there quality lexicographic information in typical dictionary style, linked to additional information. Consider Figure 13.4 for a look-up of intensify in Google.
Figure 13.4 Results for intensify in Google.
In addition to ‘standard dictionary treatment’ such as pronunciation, part of speech, sense distinction, paraphrases of meaning and example sentences, the trajectory of the word’s frequency of use over two centuries is given. For the African languages such additional information could, for example, be links to support systems such as the isiZulu possessive construction tool, Sepedi Helper, Copulative Decision Tree or simply basic key word in context (concordance) lines generated for the lemma from corpora.
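Basic key word in context lines of the kind mentioned above are simple to generate. The following is a minimal sketch under stated assumptions: the function names are hypothetical, tokens are split on whitespace only, and the six Sepedi sentences of Table 13.1 stand in for a corpus, which is far too small for real lexicographic use.

```python
def kwic(sentences, keyword, width=3):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token.lower() == keyword.lower():
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits


def format_kwic(hits, left_width=12):
    """Right-align the left context so the keyword falls in one column."""
    return [f"{left:>{left_width}}  {kw}  {right}" for left, kw, right in hits]


# The example sentences of Table 13.1, used here as a stand-in corpus.
SENTENCES = [
    "Kgoši e a bolela",
    "Kgoši ga e bolele",
    "Kgoši e tlo bolela",
    "Kgoši e ka se bolele",
    "Kgoši e boletše",
    "Kgoši ga se ya bolela",
]

# Concordance lines for the form "e".
for line in format_kwic(kwic(SENTENCES, "e")):
    print(line)
```

Even on this tiny sample the aligned lines show the forms that follow e in each tense, which is the sort of pattern a learner would otherwise have to reconstruct from a grammar table.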
7 Dictionary compilation software
Apart from publishers’ own in-house dictionary compilation programs, a variety of dictionary compilation software is commercially available. For the compilation of African language dictionaries TLex stands out as an example of an excellent program tailored to the lexicographic treatment of the complex grammatical systems of African languages. Quoted from the self-description of TLex:
TLex contains many specialized features … These include an integrated Corpus Query System, real-time preview, full customisability, advanced styles system, ‘smart cross-references’ with tracking and auto-updating, automated lemma reversal, automated numbering and sorting, export to MS Word and typesetting systems … multi-user support for managing teams … Publish to hardcopy, Web, or CD-ROM/software. (https://tshwanedje.com/tshwanelex/)
8 Lexicographic theory versus lexicographic practice
Some tension exists between lexicographic theory and lexicographic practice. Lexicographic practice preceded lexicographic theory, and in a sense lexicographic theory has an uphill struggle to ‘catch up’ with firmly established lexicographic practices and to proceed beyond that in order to play a leading role in international lexicographic practice. Rundell (2012) refers to the existence of an uneasy relationship between practical and theoretical lexicography and states that lexicography works in practice but asks whether it also works in theory. Bergenholtz and Gouws (2012) quite objectively summarize the situation as follows:
Atkins and Rundell (2008: 4) saying, with regard to a theory of lexicography, that they ‘do not believe that such a thing exists’, and Bejoint (2010: 381) saying: ‘I simply do not believe that there exists a theory of lexicography, and I very much doubt that there can be one’, to lexicographers who firmly believe in a lexicographic theory, cf. Wiegand (1998), Bergenholtz and Tarp (2003), Gouws (2011), Tarp (2012). (Bergenholtz and Gouws 2012: 36)
Prinsloo (2019) describes it as a challenge posed to theoretical lexicography to lead the way in the sense that lexicographers can compile better dictionaries by following theoretical guidance (see also Piotrowski, Chapter 16).
9 Zero equivalence
The task of the compiler of a bilingual dictionary is to find translation equivalents for source language items in the target language. So, for example, it is reasonable for the compiler of an English–isiZulu dictionary to expect that translation equivalents for table, chair, dictionary, car, etc. will be available, i.e. itafula, isitulo, isichazamazwi, imoto respectively. Dagut (1981), however, says that it is not possible to find translation equivalents for all source language words in the target language. Gouws and Prinsloo (2005: 158) say that the reason for such incompatibility is that the lexicon of a language ‘does not necessarily develop parallel to the lexicon of any other language’. So, if the source language has words for specific concepts, it cannot be assumed that the target language will necessarily have translation equivalents for these concepts. If a translation equivalent does not exist in the target language, i.e. a situation of zero equivalence, the lexicographer has to find surrogate equivalents. In most cases the compiler would resort to giving a paraphrase of the meaning of the source language word. Consider Figure 13.5 for the isiZulu word umhlinzantulo in the English–IsiZulu, IsiZulu–English Dictionary (EID). English does not have a word for ‘a poor person who is unable to contribute to the chief’s support’ and therefore a short description is required. Adamska-Sałaciak (2006: 117) is of the opinion that zero equivalence is of rare occurrence. For the language pair English–isiZulu, however, Prinsloo and Zondi (forthcoming) indicate that instances of zero equivalence occur quite frequently in both directions, i.e. English words not having translation equivalents in isiZulu and isiZulu words for which no English translations exist.
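The distinction between a translation equivalent and a surrogate equivalent can be made explicit in the data structure behind an entry. The sketch below is only an illustration: the class and function names are hypothetical, the five items are those mentioned in this section (four English lemmas and one isiZulu lemma, mixed in one list purely for brevity), and the paraphrase is the wording used in this chapter rather than a quotation from EID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Treatment:
    lemma: str
    equivalent: Optional[str] = None  # TL translation equivalent, if one exists
    paraphrase: Optional[str] = None  # surrogate explanation under zero equivalence

    @property
    def zero_equivalence(self):
        """True when no TL translation equivalent is recorded."""
        return self.equivalent is None

    def render(self):
        if self.zero_equivalence:
            return f"{self.lemma}: [no equivalent] {self.paraphrase}"
        return f"{self.lemma}: {self.equivalent}"


def zero_equivalence_share(treatments):
    """Share of lemmas in a list that need a surrogate equivalent."""
    return sum(t.zero_equivalence for t in treatments) / len(treatments)


SAMPLE = [
    Treatment("table", equivalent="itafula"),
    Treatment("chair", equivalent="isitulo"),
    Treatment("dictionary", equivalent="isichazamazwi"),
    Treatment("car", equivalent="imoto"),
    Treatment(
        "umhlinzantulo",
        paraphrase="a poor person who is unable to contribute to the chief's support",
    ),
]

for treatment in SAMPLE:
    print(treatment.render())
```

Flagging zero equivalence in the data, instead of silently placing a paraphrase in the equivalent slot, would allow a dictionary project to count such cases per direction; the share computed for this five-item sample is a property of the sample only.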
10 Balance versus imbalance in the different sections of bilingual dictionaries
In most bi-directional bilingual dictionaries the A to B section and the reverse section B to A are of the same size, i.e. containing an equal number of pages and/or lemmas. Examples of
Figure 13.5 umhlinzantulo in EID.
imbalance between the two sections have, however, been observed, such as the Setswana–English–Setswana Dictionary (SESD), where the ratio between the Setswana–English and the English–Setswana sections is 70:30; the Setswana–English section is thus more than twice the size of the English–Setswana section. No reason is given for this imbalance. Imbalance between the A and B sections can, however, be well motivated. Prinsloo and Heid (2011) describe such a situation for the language pair Setswana (language A) and English (language B). The dictionary is designed for a narrowly defined target user group, mother-tongue speakers of Setswana, who require strong guidance in text production in language B, in the section of the dictionary where B is the target language, and in text reception from language B, in the section of the dictionary where B is the source language. The envisaged dictionary will have a limited macrostructure but a fairly extended microstructure in the A → B component. The reverse side B → A will have a relatively larger macrostructure but a fairly limited microstructure. This means catering for Setswana-speaking target users who wish to empower themselves in English. As mother-tongue speakers, they know the Setswana word but want to find the English equivalent and strong text production guidance from the dictionary. As for the reverse side, they need to find a substantial offering of English lemmas but, as mother-tongue speakers of Setswana, they do not require text production guidance in Setswana.
11 Electronic dictionaries
Prinsloo, Prinsloo and Prinsloo (2018) did a detailed study of electronic dictionaries for African languages. They are of the opinion that in most cases such dictionaries are useful but do not exhibit the use of what they call ‘true electronic features’. For many languages of the world some electronic dictionaries are merely paper dictionaries put on computer. In some cases such dictionaries are even scanned pages of a paper dictionary, or paper dictionaries uploaded and made available online. Some dictionaries are built from scratch but do not reach much beyond translated word lists or skeleton articles. There are, however, electronic dictionaries for African languages which are of high lexicographic quality, such as the online isiZulu dictionary isiZulu.net (https://isizulu.net/) in Figure 13.6. In Figure 13.6 the dictionary decomposes the word, so that the user does not need to identify the noun stem before being able to look it up, and gives an appropriate morphological analysis of the parts as well as the plural form. See Prinsloo (2011) for a detailed discussion of the lemmatization of nouns and verbs in isiZulu.
Figure 13.6 ngamasonto in isiZulu.net.
12 Data boxes
A number of dictionaries use data boxes to guide users in respect of salient information which often falls outside the scope of the default treatment of a lemma. Data boxes are typically used to contrast different words, to indicate the range of application of a word, to indicate antiquation, grammatical aspects, syntactic restrictions, avoidance of words, etc. Gouws and Prinsloo (2010: 501) say that data boxes can really enhance the data transfer in dictionaries. Consider Figures 13.7, 13.8 and 13.9 respectively. Data boxes represent an underutilized treatment strategy, with great potential for African language dictionaries.
Figure 13.7 Guidance on the range of application of umzala in OZSD.
Figure 13.8 Syntactic information regarding the use of upša in ONSD.
Figure 13.9 Data box contrasting trom, trommel and drom in RD.
13 Bridging African languages
For most African language dictionaries, English is the hub, i.e. dictionaries bridge English with an African language in a mono- or bi-directional way, e.g. English–Swahili/Swahili–English. However, to the knowledge of the author, no dictionaries bridging African languages with each other exist. There are 10 million speakers of isiZulu and Sepedi, yet not a single Sepedi–isiZulu dictionary is available. It is possible that publishers deem the compilation of such dictionaries not commercially viable.
14 Text verification
Text verification is a new initiative in the development of verification software for text production in an African language in a dictionary use situation; cf. Prinsloo (2020) for a brief introduction. A user produces phrases and sentences in the African language but is unsure whether the structures are grammatically correct and idiomatic. The phrases and sentences are then verified against a corpus of the language for confirmation of correctness. The results lie on different levels. The best outcome is a perfect match with a high frequency count between the produced text and the corpus, i.e. multiple occurrences of an identical construction have been found in the corpus. Say, for example, the user produced the Sepedi sentence monna yo a sepelago (monna [N01] yo [CD01] a [SC01] sepelago [V]) ‘the man who is walking’. The part-of-speech sequence for the verbal relative is noun + demonstrative + subject concord + verb stem, i.e. [N]:[CD]:[SC]:[V]. If exact matches are not found, a variety of near matches could nevertheless be useful for text verification. Near matches could be instances where the part-of-speech sequence is the same but the nouns and verbs differ, e.g. dinku tše di nwago ‘the sheep that are drinking’.
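The two levels of result described above, exact match and near match, can be illustrated with a minimal sketch. It is not the verification software itself: the three-sentence corpus, the coarse tags (N, CD, SC, V, without noun class numbers) and the function name verify are assumptions made for illustration, and the sketch only compares whole phrases.

```python
from collections import Counter

# Toy tagged corpus (assumption): each entry is a list of (token, tag) pairs.
# Both phrases are taken from the text; the coarse tags for the second one
# are assumed to follow the same [N]:[CD]:[SC]:[V] sequence.
CORPUS = [
    [("monna", "N"), ("yo", "CD"), ("a", "SC"), ("sepelago", "V")],
    [("monna", "N"), ("yo", "CD"), ("a", "SC"), ("sepelago", "V")],
    [("dinku", "N"), ("tše", "CD"), ("di", "SC"), ("nwago", "V")],
]


def verify(phrase, corpus=CORPUS):
    """Return ('exact' | 'near' | 'none', frequency) for a tagged phrase."""
    tokens = tuple(token for token, _ in phrase)
    tags = tuple(tag for _, tag in phrase)
    # Frequency of identical word sequences in the corpus.
    exact = Counter(tuple(token for token, _ in s) for s in corpus)
    # Frequency of identical part-of-speech sequences in the corpus.
    pattern = Counter(tuple(tag for _, tag in s) for s in corpus)
    if exact[tokens]:
        return "exact", exact[tokens]
    if pattern[tags]:
        return "near", pattern[tags]
    return "none", 0


query = [("monna", "N"), ("yo", "CD"), ("a", "SC"), ("sepelago", "V")]
print(verify(query))  # ('exact', 2)
```

A phrase with the same tag sequence but different nouns and verbs would be reported as a near match, together with the number of corpus phrases sharing that sequence.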
15 Negation
Negation is a typical example of a phenomenon that is not complicated in English and does not require special attention from the lexicographer of an English dictionary. This is not the case in African languages. Bosch and Faaß (2018) did a detailed study of negation in isiZulu and conclude that a substantial number of negative constructions are used. Prinsloo (forthcoming b) studied the lemmatization of negative morphemes in twelve Sepedi dictionaries and indicated to what extent treatment was inadequate in most of the dictionaries for the negation morphemes ga, sa, se, ga se and ka se. Consider Table 13.2, an extract for one of the moods. Poulos and Louwrens (1994) give a detailed discussion of the complex negation strategies in Sepedi.
Table 13.2 Negation strategies for a verbal mood.
Mood: Indicative | Negation strategy
Present | ga + subject concord + verb stem ending -e
Future | subject concord + ka se + verb stem ending -e
Past | 1: ga se + alternative concord + verb stem; 2: ga se + subject concord + verb stem ending -e; 3: ga + subject concord + a + verb stem; 4: ga + alternative concord + verb stem
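For an electronic dictionary, the strategies in Table 13.2 could be stored as data rather than as running text. The sketch below is one possible encoding, not the treatment proposed in any of the dictionaries discussed; the slot abbreviations (SC, AC, STEM) and the helper names are assumptions.

```python
# Table 13.2 as a lookup: tense -> list of negation templates (indicative mood).
# Upper-case items are slots to be filled (SC = subject concord,
# AC = alternative concord, STEM = verb stem); lower-case items are fixed forms.
INDICATIVE_NEGATION = {
    "present": [["ga", "SC", "STEM-e"]],
    "future": [["SC", "ka se", "STEM-e"]],
    "past": [
        ["ga se", "AC", "STEM"],
        ["ga se", "SC", "STEM-e"],
        ["ga", "SC", "a", "STEM"],
        ["ga", "AC", "STEM"],
    ],
}


def render(tense, table=INDICATIVE_NEGATION):
    """Return each strategy for a tense as a readable 'x + y + z' string."""
    return [" + ".join(template) for template in table[tense]]


def fixed_elements(table=INDICATIVE_NEGATION):
    """Collect the fixed (lower-case) forms that occur in the templates.

    Whether each form is a negative morpheme proper is a linguistic
    question that this sketch does not decide.
    """
    found = set()
    for templates in table.values():
        for template in templates:
            found.update(slot for slot in template if slot.islower())
    return sorted(found)


print(render("future"))   # ['SC + ka se + STEM-e']
print(fixed_elements())   # ['a', 'ga', 'ga se', 'ka se']
```

Listing the fixed forms in this way gives the compiler a simple check on which items from the table still lack an entry of their own.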
16 Lemmatization of items bigger and smaller than words
It is not uncommon for the lemma lists of dictionaries to reflect a word bias, i.e. not to take into consideration items bigger than words, e.g. multiword lemmas, and items smaller than words, such as stems and morphemes. Such a word bias is also typical in dictionaries for African languages. For African languages in which productive word formation by a variety of affixes with word stems occurs, lemmatization and treatment of stems as well as affixes is essential. In African languages all concords and verbal suffixes should be lemmatized and treated. For example, pay attention and its Sepedi translation ela hloko are often not lemmatized as multiword lemmas, nor are verbal affixes such as the reflexive i- or applicative -el in Sepedi.
17 Children’s dictionaries
Taljard and Prinsloo (2019: 199) say that much more attention should be given to African language dictionaries for children. They emphasize that ‘children’s dictionaries are instrumental in establishing a dictionary culture and are the gateway to sustained and informed dictionary use.’ They also caution against any form of stereotyping:
lexicographers who plan and eventually compile dictionaries with children as target users must be extremely sensitive to enforcing, sometimes inadvertently, any kind of stereotype, be it gender, racial or cultural stereotyping. (Taljard and Prinsloo 2019: 220)
18 Training in dictionary use
There are not many initiatives for teaching and training in dictionary use for African languages. Those known to the author include initiatives sponsored by the African Association for Lexicography (AFRILEX) to guide secondary school learners who are mother-tongue speakers of Sepedi and isiZulu in the use of dictionaries. It is hoped that similar initiatives, as well as the distribution of African language dictionaries to learners who are mother-tongue speakers of African languages, will gain momentum and be prioritized in future.
19 Grammatical divergence
Due to the division of nouns into different noun classes in African languages, different words/concords are used for concepts expressed by single words in English. So, for example, there are no fewer than forty-five words to indicate the relative distance of a person, animal or object in relation to the speaker for the English words this/these, that and that yonder. Likewise, fifteen different Sepedi words exist for of, depending on the noun class to which the noun acting as the ‘possession’ in a possessive construction belongs, as indicated in Table 13.3. Prinsloo and Gouws (2006) refer to this as ‘grammatical divergence’, which is a problem for the lexicographer, since doing justice to all of the translation equivalents for this/these, that and of in an English–Sepedi dictionary can be redundant, lead to lengthy dictionary articles and be uneconomical in terms of the utilization of dictionary space in paper dictionaries.
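In an electronic dictionary the same divergence could be handled by a lookup keyed on noun class instead of a long list of equivalents in the article for of. The following is a minimal sketch that uses only the four forms shown in the Table 13.3 extract; the function name and the error handling are assumptions.

```python
# Possessive concords copied from the extract in Table 13.3 (four of the
# fifteen forms mentioned in the text); the remaining classes are left out
# rather than guessed.
POSSESSIVE_CONCORD = {3: "wa", 6: "a", 10: "tša", 14: "bja"}


def possessive_phrase(possession, noun_class, possessor):
    """Build 'possession + of + possessor' for a noun class listed in the table."""
    concord = POSSESSIVE_CONCORD.get(noun_class)
    if concord is None:
        raise KeyError(f"noun class {noun_class} is not in the Table 13.3 extract")
    return f"{possession} {concord} {possessor}"


print(possessive_phrase("Dikgomo", 10, "Tate"))  # Dikgomo tša Tate
```

The user then sees only the equivalent that fits the noun at hand, which avoids the space problem described for paper dictionaries.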
20 A brief description of a selection of other relevant issues with reference to key sources
20.1 A Euro-centric versus an Afro-centric approach to dictionary compilation
Initially, dictionaries for African languages were compiled by missionaries in particular. This is referred to as a ‘Euro-centric’ approach to the compilation of African language dictionaries. An Afro-centric approach is currently strongly advocated, i.e. for mother-tongue speakers of African languages to compile dictionaries in Africa for Africans (Gouws 2007).
20.2 Affordability
Affordability is a major inhibiting factor in dictionary compilation and use in Africa. Prinsloo (2009) refers to a problematic interaction of the number of lemmas and length of the articles versus limitations on price.
Table 13.3 Extract from the possessive construction for Sepedi noun classes.
Noun class | Possession | of | Possessor
Class 3 | Mmotoro ‘car’ | wa ‘of’ | Tate ‘Father’
Class 6 | Matšoba ‘flowers’ | a ‘of’ | Tate ‘Father’
Class 10 | Dikgomo ‘cattle’ | tša ‘of’ | Tate ‘Father’
Class 14 | Bogobe ‘porridge’ | bja ‘of’ | Tate ‘Father’
20.3 Conjunctivism versus disjunctivism, lexicographic traditions and lemmatization problems
African languages such as those of the Sotho language group employ a disjunctive way of writing, in which words and concords are written separately, whilst the Nguni language group follows a conjunctive orthography, in which words and concords are clustered together. So, for example, the phrase ‘I do not have money’, although containing the same grammatical and translation equivalents, is expressed by four words in Sepedi but as a single word in isiZulu, i.e. ga ke na tšhelete versus anginamali respectively (ga/a = negative morpheme, ke/ngi = subject concord first person singular, na = copulative verb stem and tšhelete/imali = noun class 9). This led to two lexicographic traditions, i.e. word lemmatization and stem lemmatization, and to inappropriate application of the traditions, i.e. stem lemmatization for disjunctively written languages (Van Wyk 1995 and Prinsloo 2011). The lemmatization approaches distinguished are the traditional approach (lexicographers enter words in the dictionary ‘as they cross their way’), the rule-orientated approach (e.g. entering only basic stem forms and providing rules for affixal derivation), the paradigm approach (an attempt to include all derivations in a single entry) and an approach based on frequency of use (Prinsloo 2009).
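The orthographic difference in the example above can be made concrete with a small sketch. It simply joins the morphemes glossed in the text with or without spaces; the segmentation of anginamali into a + ngi + na + mali is an assumption based on those glosses, and treating every morpheme as a separately written word holds for this Sepedi example only.

```python
# Morphemes in the order glossed in the text: negative morpheme, subject
# concord (first person singular), copulative verb stem, noun.
SEPEDI = ["ga", "ke", "na", "tšhelete"]
ISIZULU = ["a", "ngi", "na", "mali"]  # assumption: imali surfaces as -mali here


def write(morphemes, conjunctive):
    """Join morphemes without spaces (conjunctive) or with spaces (disjunctive)."""
    return ("" if conjunctive else " ").join(morphemes)


sepedi = write(SEPEDI, conjunctive=False)
isizulu = write(ISIZULU, conjunctive=True)
print(sepedi, "|", isizulu)                        # ga ke na tšhelete | anginamali
print(len(sepedi.split()), len(isizulu.split()))   # 4 1
```

A user of the disjunctively written form can look up each of the four orthographic words directly, whereas the conjunctively written form has to be segmented before any of its parts can be found, which is what motivates stem lemmatization in the conjunctive tradition.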
20.4 Issues regarding alphabetical ordering
Users find a number of African language dictionaries user-unfriendly because they follow a phonetic ordering instead of a standard sorting order. Consider the boldfaced stretches in the alphabetical order in Groot Noord-Sotho-woordeboek (GNSW), which deviates from a standard alphabetical ordering, i.e. A, B, BJ, D, E, F, FS, FŠ, G, H, HL, I, J, K, KG, KH, L, M, N, NG, NX, NY, O, P, PH, PS, PSH, PŠH, R, S, Š, T, TH, TL, TLH, TS, TSH, TŠH, U, W, Y, Z (Gouws and Prinsloo 2005).
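The practical effect of such an ordering can be shown with a short sketch, assuming that each unit in the GNSW sequence above (including BJ, HL, TŠH, etc.) counts as a single sorting letter and that a word is split greedily into the longest matching units. The function name gnsw_key is hypothetical, and the test strings hla and hu are made-up letter strings, not dictionary entries.

```python
# Letter sequence of GNSW as listed in the text (lower-cased).
GNSW = ("a b bj d e f fs fš g h hl i j k kg kh l m n ng nx ny o p ph ps psh "
        "pšh r s š t th tl tlh ts tsh tšh u w y z").split()
RANK = {unit: position for position, unit in enumerate(GNSW)}
LONGEST = max(len(unit) for unit in GNSW)


def gnsw_key(word):
    """Split a word into the longest matching units and return their ranks."""
    word, key, i = word.lower(), [], 0
    while i < len(word):
        for size in range(LONGEST, 0, -1):
            unit = word[i:i + size]
            if unit in RANK:
                key.append(RANK[unit])
                i += len(unit)
                break
        else:
            # Character outside the listed sequence: sort it after everything.
            key.append(len(GNSW))
            i += 1
    return key


strings = ["hla", "hu"]
print(sorted(strings))                 # ['hla', 'hu']  plain alphabetical order
print(sorted(strings, key=gnsw_key))   # ['hu', 'hla']  HL follows all of H
```

Under the assumed ordering, everything beginning with HL is placed after the whole of the H stretch, which is where a user expecting a standard alphabetical sequence would not look.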
20.5 Rulers and block systems
Prinsloo and De Schryver identified the need to compile a measurement instrument for balancing alphabetical stretches so that some stretches are not over-treated whilst others are under-treated. Their so-called lexicographic rulers were designed from the calculation of the size of each alphabetical stretch, e.g. from alphabetical word lists culled from corpora (Prinsloo and De Schryver 2007). Consider the ruler for Sepedi in Figure 13.10.
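The basic calculation behind such a ruler can be sketched as follows. This is a simplified illustration and not the multidimensional ruler of Prinsloo and De Schryver (2007): it assumes that a stretch is identified by the first character of each word, and the eight-word list and both function names are assumptions.

```python
from collections import Counter


def ruler(words):
    """Percentage of the word list taken up by each initial letter."""
    counts = Counter(word[0].upper() for word in words if word)
    total = sum(counts.values())
    return {letter: 100 * n / total for letter, n in sorted(counts.items())}


def pages_per_stretch(words, total_pages):
    """Scale the ruler to a page budget (hypothetical planning helper)."""
    return {letter: total_pages * share / 100
            for letter, share in ruler(words).items()}


# Toy word list built from Sepedi words that occur in this chapter.
WORDS = ["monna", "mmotoro", "matšoba", "dinku", "dikgomo",
         "tate", "tšhelete", "bogobe"]
print(ruler(WORDS))  # {'B': 12.5, 'D': 25.0, 'M': 37.5, 'T': 25.0}
```

Comparing the share of a draft dictionary devoted to each stretch with the share predicted by the ruler shows which stretches are over-treated or under-treated.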
20.6 The user perspective and simultaneous feedback
It has been repeated in the literature that modern dictionaries have to be user-friendly: they are judged as good or bad dictionaries not on linguistic achievement but on the user’s success in finding the information they are looking for (Haas 1962) and on the extent to which the dictionary assists them to ‘know’ a word (Laufer 1992). It could be argued that consulting the user and determining their needs is not a new concept. However, in many cases user input was obtained after completion of the dictionary, with the intention of implementing such feedback in the next edition of the dictionary. De Schryver, however, contributed by formalizing this as a process of simultaneous feedback from the target user, i.e. feedback obtained while the compilation process is still in progress. Consider Figure 13.11 for a schematic illustration of simultaneous feedback.
Figure 13.10 Sepedi Ruler (Prinsloo and De Schryver 2007: 191).
Figure 13.11 Theoretical framework of Simultaneous Feedback (De Schryver and Prinsloo 2000: 198).
21 Conclusion
There are many challenges facing the African language lexicographer such as financial constraints, complicated grammatical structures, lemmatization problems and the urge to change from a Euro-centric to an Afro-centric approach. African language lexicography also does not exist in a vacuum; it is affected by the same trends and changes relevant to other languages of the world. More electronic dictionaries of high lexicographic achievement should be compiled for African languages. Lexicography across the world should also adapt to the information era and all the possibilities offered by electronic support tools and linking of internet data with electronic dictionaries.
Acknowledgement
This research is supported by the South African Centre for Digital Language Resources (SADiLaR). Findings and conclusions are those of the author.
References
Dictionaries
(EID) Doke, C.M., D.M. Malcolm, J.M.A. Sikakana and B.W. Vilakazi (2016), English–IsiZulu, IsiZulu–English Dictionary, Johannesburg: Wits University Press and South African Heritage Publishers.
(GNSW) Ziervogel, D. and P.C.M. Mokgokong (1975), Groot Noord-Sotho-woordeboek, Noord-Sotho–Afrikaans/Engels, Pretoria: J.L. van Schaik.
(ONSD) De Schryver, G.-M. et al. (eds) (2008), Oxford Bilingual School Dictionary: Northern Sotho and English/Pukuntšu ya Polelopedi ya Sekolo: Sesotho sa Leboa le Seisimane, Cape Town: Oxford University Press.
(OZSD) De Schryver, G.-M. (ed.) (2010), Oxford Bilingual School Dictionary: Zulu and English, First edition, Cape Town: Oxford University Press Southern Africa.
(SESD) Matumo, Z.I. (1993), Setswana–English–Setswana Dictionary, Gaborone: Macmillan.
Websites
isiZulu.net. http://isizulu.net [accessed 20 August 2020].
Sepedi Helper. http://sepedihelper.co.za [accessed 20 August 2020].
TLex. http://tshwanedje.com/dictionary/swahili/ [accessed 20 August 2020].
Other references
Adamska-Sałaciak, A. (2006), Meaning and the Bilingual Dictionary. The case of Polish and English, Frankfurt am Main: Peter Lang.
Atkins, B.T.S. and M. Rundell (2008), The Oxford Guide to Practical Lexicography, Oxford and New York: Oxford University Press.
Béjoint, H. (2010), The Lexicography of English, Oxford: Oxford University Press.
Bergenholtz, H. and R.H. Gouws (2012), ‘What is lexicography?’ Lexikos 22, 31–42.
Bergenholtz, H. and S. Tarp (2003), ‘Two opposing theories: On H.E. Wiegand’s recent discovery of lexicographic functions’, Hermes, Journal of Linguistics 31, 171–96.
Bosch, S. and G. Faaß (2014), ‘Towards an Integrated E-Dictionary Application – The Case of an English to Zulu Dictionary of Possessives’, Proceedings of the 16th Euralex International Congress: The User in Focus 15–19 July 2014, Bolzano, Italy, 739–47.
Bosch, S. and G. Faaß (2018), ‘Options for a Lexicographic Treatment of Negation in Zulu’ in I. Kernerman and S. Krek (eds), Proceedings of the LREC 2018 Workshop of GLOBALEX 2018 Lexicography & WordNets, held on 8 May 2018, Miyazaki, Japan.
Dagut, M. (1981), ‘Semantic voids as a problem in the translation process’, Poetics Today 2 (4), 61–71.
De Schryver, G.-M. and D.J. Prinsloo (2000), ‘Dictionary-making process with “Simultaneous Feedback” from the target users to the compilers’ in U. Heid, S. Evert, E. Lehmann and C. Rohrer (eds), Proceedings of the Ninth EURALEX International Congress, EURALEX 2000, Stuttgart, Germany, 8–12 August 2000, Stuttgart: Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart, 197–209.
Gouws, R.H. (2007), ‘On the development of bilingual dictionaries in South Africa: Aspects of dictionary culture and government policy’, International Journal of Lexicography 20 (3), 313–27.
Gouws, R.H. (2011), ‘Learning, unlearning and innovation in the planning of electronic dictionaries’ in P.A. Fuertes-Olivera and H. Bergenholtz (eds), e-Lexicography: The Internet, Digital Initiatives and Lexicography, London and New York: Continuum, 17–29.
Gouws, R.H. and D.J. Prinsloo (2005), Principles and Practice of South African Lexicography, Stellenbosch: SUN PReSS.
Gouws, R.H. and D.J. Prinsloo (2010), ‘Thinking out of the box – perspectives on the use of lexicographic text boxes’, Proceedings XIV Euralex International Congress. Leeuwarden, The Netherlands. 6–10 July 2010, 501–11.
Haas, M.R. (1962), ‘What belongs in the bilingual dictionary?’ in F.W. Householder and S. Saporta (eds), Problems in Lexicography, Bloomington: Indiana University, 45–50.
Laufer, B. (1992), ‘Corpus-based versus lexicographer examples in comprehension and production of new words’ in H. Tommola et al. (eds), Euralex ‘92 Proceedings, Tampere: University of Tampere, 71–6.
Martin, W. and R.H. Gouws (2000), ‘A New Dictionary Model for Closely Related Languages: The Dutch–Afrikaans Dictionary Project as a Case in Point’ in U. Heid, S. Evert, E. Lehmann and C. Rohrer (eds), Proceedings of the Ninth EURALEX International Congress, 8–12 August 2000, Stuttgart: Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart, 783–92.
Poulos, G. and L.J. Louwrens (1994), A Linguistic Analysis of Northern Sotho, Pretoria: Via Afrika.
Prinsloo, D.J. (2009), ‘Current lexicography practice in Bantu with specific reference to the Oxford Northern Sotho School dictionary’, International Journal of Lexicography 22 (2), 151–78.
Prinsloo, D.J. (2011), ‘A critical analysis of the lemmatisation of nouns and verbs in isiZulu’, Lexikos 21, 169–93.
Prinsloo, D.J. (2014), ‘Lexicographic treatment of kinship terms in an English/Sepedi–Setswana–Sesotho dictionary with an amalgamated lemmalist’, Lexikos 24, 272–90.
Prinsloo, D.J. (2019), ‘Lexicography: A perspective on the past, present and future of lexicography with specific reference to Africa’, Keynote presentation, Conference Proceedings of AsiaLex 2019, 13th International Conference of the Asian Association for Lexicography 19–21 June 2019, Istanbul University Congress and Culture Center, Istanbul, Turkey, 148–60.
Prinsloo, D.J. (2020), ‘Detection and lexicographic treatment of salient features in e-dictionaries for African languages’, International Journal of Lexicography 33 (3), 269–87. https://doi.org/10.1093/ijl/ecz031.
Prinsloo, D.J. (forthcoming a), ‘User studies on the Copulative Decision Tree’.
Prinsloo, D.J. (forthcoming b), ‘Negation strategies in Sepedi’.
Prinsloo, D.J. and S.E. Bosch (2012), ‘Kinship terminology in English–Zulu/Northern Sotho dictionaries – a challenge for the Bantu lexicographer’ in R.V. Fjeld and J.M. Torjusen (eds), Proceedings of the 15th Euralex International Congress, 7–11 August 2012, Oslo, 296–303.
Prinsloo, D.J. and T.J. Bothma (2020), ‘A copulative decision tree as a writing tool for Sepedi’, South African Journal of African languages 40 (1), 85–97.
Prinsloo, D.J. and G.-M. De Schryver (2007), ‘Crafting a multidimensional ruler for the compilation of Sesotho sa Leboa dictionaries’ in J. Mojalefa (ed.), Rabadia Ratšhatšha: In-depth Literature, Linguistics, Translation and Lexicography Studies in African Languages. Festschrift in Honour of P.S. Groenewald, Pretoria: J.L. van Schaik.
Prinsloo, D.J. and R.H. Gouws (2006), ‘Lexicographic presentation of grammatical divergence in Sesotho sa Leboa’, South African Journal of African Languages 16 (4), 184–97.
Prinsloo, D.J. and U. Heid (2011), ‘A bilingual dictionary for a specific user group: Supporting Setswana speakers in the production and reception of English’, South African Journal of African Languages 31 (1), 66–86.
Prinsloo, D.J., J.V. Prinsloo and D. Prinsloo (2018), ‘African Lexicography in the Era of the Internet’ in P.A. Fuertes Olivera (ed.), The Routledge Handbook of Lexicography, London: Routledge, 487–502.
Prinsloo, D.J. and E. Taljard (2019), ‘User studies on the Sepedi Helper writing assistant’, Language Matters 50 (2), 73–99.
Prinsloo, D.J. and N.B. Zondi (forthcoming), ‘Zero equivalence in isiZulu dictionaries’.
Rundell, M. (2012), ‘It works in practice but will it work in theory? The uneasy relationship between lexicography and matters theoretical’ in R.V. Fjeld and J.M. Torjusen (eds), Proceedings of the 15th Euralex International Congress, 7–11 August 2012, Oslo, 47–92.
Taljard, E. and D.J. Prinsloo (2013), ‘Lexicographic treatment of so-called cattle colour terms in Northern Sotho’, AFRILEX 2013, Nelson Mandela Metropolitan University, Port Elizabeth, 2–4 July 2013.
Taljard, E. and D.J. Prinsloo (2019), ‘African Language Dictionaries for Children – A Neglected Genre’, Lexikos 29, 199–223.
Tarp, S. (2012), ‘Do we need a (new) theory of lexicography?’, Lexikos 22, 321–32.
Van Wyk, E.B. (1995), ‘Linguistic assumptions and lexicographical traditions in the African languages’, Lexikos 5, 82–96.
Wiegand, H.E. (1998), Wörterbuchforschung, Berlin: Walter de Gruyter.
Zimmer, B. (2017), ‘Defining the digital dictionary: How to build more useful online lexical references’, Keynote, Fifth biennial conference on electronic lexicography, eLex 19–21 September 2017, Leiden, Netherlands.
14
Issues in sign language lexicography
Inge Zwitserlood, Jette Hedegaard Kristoffersen and Thomas Troelsgård
1 Introduction
Sign languages make use of the visual/manual modality and are articulated through combinations of handshapes, movements and placement in space or on the signer’s body. The phonemes1 in sign languages are articulated simultaneously for the most part, in contrast to the (mostly) sequential pronunciation of spoken language phonemes. Furthermore, sign languages include non-manual features such as facial expressions, mouth movements, eye-gaze, eye-blinks and movements of the upper body. Sign languages are fully independent languages, each with their own lexicon, phonology, morphology and syntax, and are the native or first language of deaf people all over the world. To date, more than 130 sign languages have been identified (see Lewis 2009), although not all have been studied to the same extent. As is the case for all languages, sign languages develop and change within the user community.
Reflecting the visual modality of sign language has always been a challenge to the sign language lexicographer, although the use of electronic media facilitated by technological development has been a tremendous step forward. Many challenges still remain in the field, for instance, the fact that no sign language, as far as we know, has a written representation commonly used by native signers for everyday reading and writing (see, e.g. Brien and Turner 1994), as further exemplified in Section 3. Other problems which leave sign languages in a special position in comparison to many spoken (and written) languages are the lack of standardization, which, for example, complicates the identification of basic sign forms (see Section 4.2), and the lack of language resources (see Section 7). Both these problems result partly from the absence of a written standard and partly from the fact that sign language linguistics, including the building of corpora, is a relatively young discipline.
In this chapter we will address some of the opportunities and difficulties that sign language lexicography faces if it is to comprehensively reflect these languages, to be true to linguistic principles and to provide dictionary users with the best possibility of obtaining knowledge of the language in question. After a short overview of the history of sign language dictionaries in Section 2, the problem of how to represent a visual language in printed books will be discussed in Section 3, along with the possibilities provided to sign language lexicography by recently developed multimedia platforms. The lack of a written standard not only causes problems for sign rendering; it also challenges lemmatization. Some of the main challenges are addressed in Section 4, followed by discussion of the issues in sign language lexicography concerning the task of deciding what should be handled by a grammar and what should be given separate entries in a dictionary. Section 5 gives an overview of the information types included in recent sign language dictionaries, and Section 6 illustrates some of the solutions to problems of ordering and searching lemmas in dictionaries of sign languages. In the last section we outline some of the possibilities for future development within sign language lexicography.
2 Types of sign language dictionaries
Dictionaries come in several types, depending on the needs of the users they serve. Traditionally, users of sign language dictionaries are those who need signs or words to facilitate their signed communication with deaf people (e.g. educators), many of whom are L2 learners of a sign language. As a result, the earliest sign language dictionaries were printed bilingual dictionaries, i.e. from a spoken to a signed language, such as the French–French Sign Language dictionary from the late eighteenth century (Bonnal-Vergés 2008). They usually contained a (small) set of words and signs, and were basically glossaries, providing a single sign equivalent for each word and without information on meaning, use or grammatical characteristics. Through time, sign language dictionaries have incorporated more information with respect to the signs. As mentioned in Section 1, technical developments have recently made possible the creation of electronic dictionaries, which, in addition to showing signs and example sentences as video clips, also facilitate (among other functions) bidirectional sign/word searching and clickable cross-referencing. Although there are specialized sign language dictionaries for particular topics, such as legal dictionaries and dictionaries on religion or health care, there is none of the variety of different dictionaries, such as specialized dictionaries of etymology, synonymy or rhyme words, that we find for spoken languages (in particular European languages), nor do monolingual sign language dictionaries exist.2 As sign languages are used in increasingly diverse contexts, a greater variety of sign language dictionaries will need to be created. In particular, native and highly fluent signers, who currently hardly use sign language dictionaries, are expected to request sophisticated features that can be facilitated by rapidly changing technical developments.
3 The issue of sign rendering
Lemmas, translations and information about lemmas are generally represented in writing in spoken language dictionaries. The orthography of most spoken languages (Chinese being an exception) is alphabetic or syllabic, based on the phonology and/or phonetics of the language. Sign languages also have phonology (see, e.g. Sandler 1989, Brentari 1998, Van der Kooij 2002), but, as mentioned in Section 1, they have no accepted form of orthography, although several attempts to capture formal features of signs into notation systems have been made (as will be seen below). As a result, compilers of sign language dictionaries have no standard way to represent signs. Different dictionaries make use of a range of different representation types.
Drawings or photographs of a person articulating a sign are the most frequently used means of sign representation. This type of representation is found in most printed dictionaries. The holistic images are quite transparent, even for the naive dictionary user. As they are static representations, the dynamics of the sign (e.g. a movement of the hand or a change in the shape of the hand) need to be additionally represented by arrows and other symbols, picture sequences, or multiple overlay images. An illustration of an entry with photographs of the sign can be found in Figure 14.1. The time-consuming construction of images and the amount of space each image takes explain why only sign translations of words or sign lemmas are provided, but no further information or examples.
Other forms of sign representation, found in electronic dictionaries on CD and DVD, and on the internet, include animated cartoons, avatars and, most frequently, video clips. The latter have the advantage of rendering sign dynamics, including non-manual information (e.g. facial expression). Example phrases and sentences for a sign can also be easily included. At the same time, this approach has disadvantages: the signal is brief (a dictionary user may need to view it several times); and it is not possible to view the individual signs in a sequence. For examples we refer the reader to the dictionaries of Australian Sign Language (Auslan) (http://www.auslan.org.au/dictionary/, Johnston 2009, 2012), Danish Sign Language (DTS) (http://www.tegnsprog.dk, Center for Tegnsprog 2008–12), Flemish Sign Language (VGT) (http://gebaren.ugent.be/, Van Herreweeghe et al. 2004) and German Sign Language (DGS) (http://www.sign-lang.uni-hamburg.de/alex/, Konrad et al. 2007b).
Figure 14.1 Representation of a sign in a (printed) Finnish Sign Language – Finnish dictionary (Malm 1998). Entry for the sign meaning ‘expensive’, ‘precious’, ‘valuable’. The initial and final hand configurations are shown, and the movement is indicated by an arrow. (Copyright Malm 1998. Reprinted with permission of the editor.)
The disadvantages of these representation types are overcome in a third type of representation: a notation system. Such systems consist of sets of symbols and rules for combination of these symbols which are used to describe the formal characteristics of signs, such as the shape of the hand(s), the place of articulation and the movement(s) of the hand(s). Although this type of representation is static, it captures the signs’ dynamics, takes little space and facilitates ordering and searching. Nevertheless, notation systems are not very frequently used in dictionaries. Since such systems are not used by the average sign language user in day-to-day communication, many dictionary compilers do not consider worthwhile the effort that dictionary users would need to make to acquaint themselves with such a system. Still, sign descriptions using such systems are sometimes found in addition to images, animations and videos, providing systematic information about the form of signs, and are sometimes used for ordering and searching of sign entries. The best-known systems are the phonologically based Stokoe notation system (Stokoe et al. 1965), the phonetic system HamNoSys (Prillwitz et al. 1989) and the SignWriting notation system (Sutton 2011), which was originally developed as an orthographic system but is not generally accepted or used as such. Other systems are used as well, for example, the Swedish notation system (Bergmann and Björkstrand 1993) used in the Swedish Sign Language dictionary (Institutionen för Lingvistik 2009–11). Examples of combined sign representation through video clips and a notation system are shown in Figures 14.2 and 14.3 (other details in the entries are left out due to space limitations).
Figure 14.2 Entry for the sign for ‘worker’ in the online bilingual Dutch-Flemish Sign Language/Flemish Sign Language-Dutch dictionary. Sign representation through video clip and SignWriting (Van Herreweeghe et al. 2004).
Figure 14.3 Entry for the sign for ‘to cut’ in the online health care dictionary of German Sign Language. Sign representation through video clip and HamNoSys (Konrad et al. 2007b).
4 Lemmatization
4.1 Lemma selection
Searchable sign language corpora have not been available until recent years (also see Section 7). For this reason, the selection of lemmas for sign language dictionaries has typically been based on lists of words from spoken/written languages, for instance, the basic vocabulary of a language (as in a children’s or learner’s dictionary), or a word list relating to a specific topic, such as food signs, colour signs and sports signs. This approach only works optimally if adequate sign equivalents exist (and are known to the lexicographer). Other sources have been lemma lists taken from existing sign language dictionaries or manual selection performed by sifting through video recordings. The drawback of these approaches, compared to a corpus-based approach, is that many signs will never be selected as lemmas, for instance, signs that cannot be translated adequately into a single word or compound and which are therefore less likely to appear in sign lists based on word lists. With the use of sign language corpora as a tool for lemma selection, it will become easier to ensure that the selected vocabulary reflects the actual language use, for instance, by including corpus frequency as a selection criterion.
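As an illustration of corpus frequency as a selection criterion, the following minimal sketch counts sign tokens in an annotated corpus and keeps the types that reach a threshold. The glosses, the threshold and the function name `select_lemmas` are invented for the example and do not describe the procedure of any particular dictionary project.

```python
from collections import Counter

def select_lemmas(tokens, min_count=2):
    """Return (gloss, frequency) pairs for sign types seen at least min_count times."""
    counts = Counter(tokens)
    frequent = [(gloss, n) for gloss, n in counts.items() if n >= min_count]
    # Most frequent first; ties are broken alphabetically by gloss.
    return sorted(frequent, key=lambda item: (-item[1], item[0]))

# Hypothetical stream of ID-glosses taken from corpus annotations.
corpus_tokens = ["HOUSE", "CAT", "HOUSE", "PLUG", "CAT", "HOUSE", "HAIR"]
candidates = select_lemmas(corpus_tokens, min_count=2)
```

In practice frequency would be only one criterion among several, since signs that are rare in a small corpus may still deserve an entry.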
4.2 Lemmatization principles
Just as lemmatization principles can be problematic for spoken languages, they can be problematic for sign languages. A particular challenge is a certain degree of non-fixedness of basic sign forms that results from the lack of a written standard (cf. Section 4.4.1). In the absence of an orthography, the lemmatization principles of a sign language dictionary must involve other areas such as phonology, semantics and etymology in order to set up rules for distinguishing between one type (and its tokens) and another. In relation to the phonology-related area, many sign language dictionaries include handshape, orientation, place of articulation and movement as lemmatization parameters, while others also include non-manual features such as mouth movement.3 Semantics is often included, but the definition of the minimal semantic distance between two lemmas varies considerably across different dictionaries. Etymology is for many sign languages an unexplored area.
Even large sign language dictionaries typically have considerably fewer entries than large dictionaries of written languages, regardless of how counts of entries are made. There are a number of reasons for this: signs in general tend to be more polysemous than words (at least in most European spoken languages); and the usage of signs is harder to account for because of the lack of a standardized written representation and of language resources. In other words, many signs may exist, and even occur quite frequently, but not appear in any dictionary or other description of the language. Additionally, and also because of the lack of a standard written form, signs are not ‘preserved’ in texts year after year in a fixed form, as words are in books and on webpages; they appear only in video clips that are much fewer in number than written texts, and they are not searchable – apart from those that occur in recently developed sign language corpora.
4.3 Articulation form variants
As a part of the lemmatization process it is important to make a clear distinction between form variants and synonyms of a sign. For instance, the Danish Sign Language (DTS) sign for ‘hair’ can have at least two forms, as shown in Figure 14.4: the thumb and the index finger touch each other in both cases, but the other three fingers can be either bent or extended. Both forms are used and accepted in the language community and both forms seem to be evenly distributed across the community, sometimes even within idiolects. The lexicographer could treat the two forms as variants of one sign, or as synonyms. Native signers consider both forms in Figure 14.4 as variants of one sign.
Figure 14.4 Two variants of the sign for ‘hair’ in DTS. Illustrations from the DTS dictionary (Center for Tegnsprog 2008–12).
This in turn points to the need to decide the extent to which differences in the forms of signs require treatment as separate lemmas. Form variation is quite common in sign languages, as mentioned earlier, partly due to the lack of a written standard. One solution could be to allow, within the same lemma, for variation in one and only one of the major parameters: handshape, place of articulation, movement or orientation (see Troelsgård and Kristoffersen (2008) for a discussion of this solution). Another issue becomes important once the distinction between synonyms and variants has been made: should form variants be shown side by side in the dictionary, i.e. should the dictionary be descriptive or normative? If the goal for the dictionary is normative, an additional problem will be how to decide which variant should be preferred. The lack of large corpora makes these decisions very difficult.
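The one-parameter solution can be made concrete with a small sketch. It compares two sign forms on the four major parameters and treats them as variants of one lemma when they differ in at most one of them. The class `SignForm`, the function names and all parameter values are placeholders invented for the example, and the sketch deliberately ignores meaning, which a real lemmatization rule would also have to take into account.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class SignForm:
    handshape: str
    place: str
    movement: str
    orientation: str

def differing_parameters(a, b):
    """Names of the major parameters on which two forms differ."""
    return [f.name for f in fields(SignForm)
            if getattr(a, f.name) != getattr(b, f.name)]

def same_lemma(a, b):
    """True if the forms are identical or differ in exactly one parameter."""
    return len(differing_parameters(a, b)) <= 1

# Placeholder values: two forms that differ only in handshape, and a third
# form that differs from the first in handshape and place of articulation.
hair_bent = SignForm("pinch-fingers-bent", "head", "short-pull", "palm-in")
hair_extended = SignForm("pinch-fingers-extended", "head", "short-pull", "palm-in")
other = SignForm("flat", "chest", "short-pull", "palm-in")
```

Under these assumptions `same_lemma(hair_bent, hair_extended)` holds, whereas `same_lemma(hair_bent, other)` does not.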
4.4 Lemmatization issues
Traditionally, sign lemmas have been defined as manual forms that have a meaning. In addition, they must also be pronounceable when occurring in isolation. That is, they should comprise at least a handshape, an orientation and a place of articulation (e.g. in front of the signer’s body). Elements like bound roots, affixes and modifications do not meet these criteria: they do not appear in isolation (similarly to bound roots and affixes in spoken/written languages), and their meanings are often difficult to describe through spoken/written word equivalents. Additionally, the modifications are typically expressed solely by place of articulation, movement or non-manual features, and are therefore impossible to render in the same way as regular headword signs. Therefore, consideration must be given to whether these elements should be lemmatized in dictionaries or should appear as inflectional/derivational information in the entries or in a grammar. In the following sections we focus on different types of such elements, discussing whether they are suitable as dictionary lemmas, and how they could be described in a dictionary (see also Johnston and Schembri 1999).
4.4.1 Classifier predicates
A type of sign that occurs in almost all sign languages studied to date is the classifier predicate (sometimes referred to as a polymorphemic verb). These signs express the existence and configuration of entities in space and the movement of entities through space (see among others Supalla 1982, Tang 2003, Cuxac and Sallandre 2007, Zwitserlood 2012). They are productively created by combinations of meaningful units that are expressed by handshapes, places of articulation, and movements. Although there is an ongoing debate on the structure and nature of these signs,4 it is generally assumed that the placement or movement of the hand in a classifier predicate expresses location or motion morphemes, respectively, and that the handshape expresses the morpheme for an entity that is somewhere in space or that is in motion. An example from the Sign Language of the Netherlands (NGT) is given in Figure 14.5. Classifier predicates are created in the course of signing and can have an infinite number of forms and meanings. They are highly productive and have an interpretation that is predictable from the meaning of the components. As far as we know, no dictionaries have lemmatized classifier predicates. Some dictionaries (e.g. Brien 1992, Tang 2007) describe classifier constructions in a separate grammar section and list the classifier morpheme inventory using pictures or photos of the corresponding handshapes. Another approach, used in the DTS Dictionary (Center for Tegnsprog 2008–12), has been to treat the classifier handshapes as the roots of the classifier predicates and to provide separate entries for these elements, similar to affix entries (Kristoffersen and Troelsgård 2012). In contrast, representing classifier movements and place of articulation in isolation is simply not possible by means of pictures, photographs or videos of a signing person.
Figure 14.5 Classifier predicate in NGT. Literally: ‘upright animate entity moves upwards near cylindrical entity’. In context: ‘cat goes up inside drainpipe’ (Crasborn et al. 2008, Creative Commons license BY-NC-SA).
4.4.2 Numeral incorporation
Another type of morphologically complex sign found in many sign languages is the ‘numeral incorporated’ sign. One type of numeral incorporation involves the parameterized, simultaneous expression of a numeral and a noun (e.g. Liddell 1996, Mathur and Rathmann 2010). Examples of numeral incorporated signs from Danish Sign Language (DTS) are the third and fourth signs in Figure 14.6, where the sign for ‘month’ is combined with the numerals ‘one’ and ‘three’, respectively. This process is generally limited to nouns that express enumerable quantities (e.g. time and currency units), and is not always fully predictable: for instance, within a single sign language, nouns are idiosyncratic with respect to the numerals that can be incorporated. This process can be described in the grammar section of a sign language dictionary (including the restrictions), and the entries of nouns that allow combination with a numeral can provide this information along with other possible derivations of the noun. A similar process is found, though, where the numeral incorporated forms do not have a pronounceable basic noun root. These forms always express a quantified noun in context, and no citation form of a non-quantified noun is available. Yet, in these cases, a root (i.e. a non-incorporated form) must still be assumed, even though it is not pronounceable in isolation. Such root forms are considered to be bound morphemes and may be expressed by a movement and a place of articulation only. Like the movement in classifier predicates, these roots are difficult to represent in a sign language dictionary in which signs are represented with pictures, photographs or videos.5 The complex forms, clearly derived from the combination of a noun and a numeral, do not need to be lemmatized in dictionaries that provide the user with morphological information, but the forms that do not have a non-quantified noun root cannot be dealt with in that way. However (in contrast to classifier predicates), the inventory of numeral incorporations of each root is finite, and some dictionaries do lemmatize every instance of numeral incorporation, for instance, the DTS Dictionary (Center for Tegnsprog 2008–12).
Figure 14.6 Examples of numeral incorporation in DTS. All pictures are from the DTS dictionary (Center for Tegnsprog 2008–12).
4.4.3 Morphological complexity below the level of the lexical sign
Recent studies argue that many lexical signs that are generally considered as monomorphemic are, in fact, built up from smaller meaningful units (Van der Kooij and Zwitserlood to appear, in progress). In this, these lexical signs resemble classifier predicates, which, as described above in Section 4.4.1, also consist of smaller meaning units, but in contrast to these predicates, lexical signs have a non-compositional meaning. Van der Kooij and Zwitserlood’s analysis, which starts with form features rather than parameters, is that meaning is found even in the smallest form units of lexical signs, such as the number of selected fingers and orientation. Moreover, combinations of such units of form and meaning, even if in themselves not pronounceable, can be meaningful as well. This is illustrated in Figure 14.7. Here, the necessary features are listed, together with the meaning they have in this sign, and the way they combine into a sub-sign lexeme, translated as ‘pronged object’, and a pronounceable lexeme, translated as ‘plug’. Since such a systematic pairing of form and meaning is present in both existing and newly formed signs, sign language dictionaries should in principle also contain lists of Form-Meaning Units (FMUs). While there may be overlap in these units due to the iconic relation between linguistic form and referent or event in the real world, we expect that interesting language-specific FMUs and sub-sign lexemes will also be discovered.
Figure 14.7 Example of a sign entry in Global Signbank (NGT); description of all form units with their meaning (in this sign).
5 Lemma information
As mentioned in Section 2, many sign language dictionaries have a rather simple structure, often simply consisting of a spoken/written language headword and a sign language equivalent rendered as a video clip, a photograph or a drawing. Similarly, dictionaries with signs as headwords often have one or more written translation equivalents as the only information about the lemma. Some sign language dictionaries, however, do include other types of information about sign lemmas. These can include:
● a formal notation of the sign form (cf. Section 3)
● a list of the prominent basic phonological features of the sign, such as handshape, place of articulation, and movement
● textual description of the sign pronunciation
● information about mouth movements if these typically accompany the sign
● information about form variants, for instance, rendered as additional video clips
● part(s) of speech
● morphological information (inflection and derivation)
● additional written translation equivalents
● description of the use of signs where the meaning or function cannot be rendered satisfactorily through a written language equivalent
● definitions of sign meanings
● cross-references to sign synonyms, antonyms, etc.
● information about usage restrictions, like region or age-specific use of the sign
● example sentences
Most commonly, only some of the above-mentioned information types are included, and different sign language dictionaries choose different approaches for presenting the information. For instance, a sign language example sentence can be rendered through a video clip, as a series of photos or drawings, or in a formal notation, and it can be accompanied by a sign-by-sign transcription and/or a translation into a written language. Because of the need for graphic or multimedia representation of signs, information that comprises sign representations (e.g. example sentences) consumes more screen space than a textual rendering of similar information in a dictionary of a spoken/written language, as mentioned in Section 3. Because of this, sign language dictionary entries tend to become visually heavy, and in electronic sign language dictionaries some information types are often relegated to a tab or a sub-page in order to save screen space. In Section 3 we have shown an example of a sign entry in a printed sign language dictionary; Figures 14.8 and 14.9 are examples of entries in online dictionaries. The entry in Figure 14.8 (New Zealand Sign Language [NZSL], McKee 2011) includes a line drawing, a video of the sign, additional information on the pronunciation of the sign, grammatical information and a usage example. Moreover, the example is also rendered as a literal textual translation, each word of which is linked to its entry. The sign entry from the DTS Dictionary (Center for Tegnsprog 2008–12) in Figure 14.9 describes a sign having three meanings: ‘minute’/‘second’; ‘time’ (n)/‘moment’ and ‘while’ (conj.), each of which can be viewed in an example sentence with a translation of these sentences. The example sentences are, moreover, transcribed sign by sign (as in the NZSL dictionary), with links to the entries of the signs. 
Among the information given is the possibility to see a form variant (with the second ‘play’ button above the video window), a cross-reference to a sign synonym for the meaning ‘minute’ (the ‘=’ in meaning 1) and acceptable mouthings that can accompany the sign6 (denoted by icons after the relevant Danish equivalents).
Figure 14.8 Example of a sign entry in the NZSL dictionary (McKee 2011).
Figure 14.9 Example of a sign entry in the DTS Dictionary (Center for Tegnsprog 2008–12).
6 Ordering and searching
6.1 Lemma ordering in printed dictionaries
Printed dictionaries of spoken/written languages typically order lemmas alphabetically, sometimes topically, with the exception of special purpose dictionaries like crossword dictionaries and rhyming dictionaries. For sign language dictionaries the ordering question is made more difficult because no sign language has a standard written form. Several different approaches to ordering have been chosen in printed sign language dictionaries: some dictionaries order their entries alphabetically according to sign glosses7 or word equivalents; others use features of all or some of the main phonological parameters of handshape, orientation, place of articulation and movement, prioritized in a fixed order, sometimes corresponding to a formal notation system like HamNoSys. For example, the signs in the AUSLAN dictionary (Johnston 1989) are ordered first by handshape, second by handedness8 and third by place of articulation (starting with the head, then moving downwards). Stokoe et al. (1965), in contrast, order signs first by place of articulation (starting with the space in front of the signer, then moving downwards from the head), second by handshape and third by handedness. Just as there are varying numbers of words starting with a given letter in spoken language dictionaries, the sections of a phonologically ordered sign language dictionary vary in size, as some handshapes and places of articulation occur more frequently than others. Sign language dictionaries that use a phonologically based ordering of lemmas typically also include an alphabetic index of word equivalents. Similarly, some dictionaries with an alphabetic ordering based on words include an index ordered, for instance, according to handshape (e.g. Konrad et al. 2007a).
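A phonologically based ordering of this kind amounts to sorting entries on a tuple of parameter ranks, with the parameters taken in a fixed priority. The sketch below shows the mechanism only: the inventories in `ORDERS`, the sample entries and the helper `make_sort_key` are invented for the example and do not reproduce the actual inventories or sequences of Johnston (1989) or Stokoe et al. (1965).

```python
# Hypothetical parameter inventories, each listed in dictionary order.
ORDERS = {
    "handshape": ["fist", "flat", "index"],
    "handedness": ["one-handed", "double-handed", "two-handed"],
    "place": ["head", "chest", "waist"],  # from the head downwards
}

def make_sort_key(priority):
    """Build a sort key that ranks an entry on each parameter, in priority order."""
    def key(entry):
        return tuple(ORDERS[param].index(entry[param]) for param in priority)
    return key

entries = [
    {"gloss": "A", "handshape": "flat", "handedness": "one-handed", "place": "chest"},
    {"gloss": "B", "handshape": "fist", "handedness": "two-handed", "place": "head"},
    {"gloss": "C", "handshape": "fist", "handedness": "one-handed", "place": "waist"},
    {"gloss": "D", "handshape": "fist", "handedness": "one-handed", "place": "head"},
]

# Handshape first, then handedness, then place of articulation.
handshape_first = [e["gloss"] for e in
                   sorted(entries, key=make_sort_key(["handshape", "handedness", "place"]))]

# Place of articulation first, then handshape, then handedness.
place_first = [e["gloss"] for e in
               sorted(entries, key=make_sort_key(["place", "handshape", "handedness"]))]
```

Changing the priority list is enough to move from one ordering scheme to another, which is also why the same data can feed both a main sequence and a secondary index.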
6.2 Search facilities in electronic dictionaries
Electronic sign language dictionaries typically offer text search on written language equivalents as a minimum. Some dictionaries’ search facilities include topical search. In addition, most newer dictionaries facilitate searches using one or more parameters related to sign form or sign usage, for instance:
● handshape
● place of articulation
● orientation
● movement
● handedness
● mouth movement
● region- or age-specific use
For form-related searches, the criteria are generally chosen from pop-up windows or sub-pages, or in some cases via a formal notation system like HamNoSys. Such searches are most likely used by users with a fair or good knowledge of the sign language, while text and topic searches are more likely to be preferred by users with less knowledge of the language. Figures 14.10–14.14 show examples of the selection possibilities found with five different categories of search criteria in electronic sign language dictionaries: handshape, place of articulation (NZSL: McKee 2011), movement (FSL: Kuurojen Liitto 2003–5), orientation (DGS, Konrad et al. 2007b) and mouth movement (FSL: Kuurojen Liitto 2003–5).
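In its simplest form, a form-related search keeps the entries that match every criterion the user has selected. The following minimal sketch assumes that each entry stores one value per parameter; the field names, the values and the function `search` are invented for the example and do not describe the implementation of any of the dictionaries mentioned.

```python
def search(entries, **criteria):
    """Return the entries whose fields match all of the selected criteria."""
    return [entry for entry in entries
            if all(entry.get(field) == value for field, value in criteria.items())]

# Hypothetical entries with form-related and usage-related fields.
entries = [
    {"gloss": "A", "handshape": "flat", "place": "chest", "movement": "circular", "region": "north"},
    {"gloss": "B", "handshape": "flat", "place": "head", "movement": "straight", "region": "north"},
    {"gloss": "C", "handshape": "fist", "place": "chest", "movement": "circular", "region": "south"},
]

flat_signs = [e["gloss"] for e in search(entries, handshape="flat")]
flat_at_chest = [e["gloss"] for e in search(entries, handshape="flat", place="chest")]
```

Real signs often involve more than one handshape or place of articulation, so an actual dictionary would need to match against lists of values rather than single ones.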
Figure 14.10 Handshape selection window in the NZSL dictionary (McKee 2011). The user can choose from 30 handshape groups, comprising in total 63 handshapes.
Figure 14.11 Selection window for place of articulation in the NZSL dictionary (McKee 2011).
Figure 14.12 Movement selection window in the Finnish Sign Language dictionary (Kuurojen Liitto 2003–5). The options are (from left to right): straight or curved movement, circular movement, twisting or bending wrist, opening or closing hand, finger wiggling, and no movement.
Figure 14.13 Selection window for finger orientation in the Online Health Care dictionary of German Sign Language (Konrad et al. 2007b).
Figure 14.14 Mouth movement selection window in the Finnish Sign Language dictionary (Kuurojen Liitto 2003–5). The user can choose from 57 options: 17 shown as photos, as above, and 40 rendered as text in a drop-down list.
6.3 Presentation of search results in electronic dictionaries
The amounts and types of information in the search result lists vary in electronic sign language dictionaries. Most commonly, a text representation in the form of a gloss or one or more written language equivalents is given. Other types of information occurring in search results include:
● a photograph or drawing
● a formal notation of the sign
● a textual description of the sign pronunciation
● the mouth movement(s) (if present in the sign)
● an ID number
● the part(s) of speech
● the topic(s)
Similarly to the ordering of signs in printed sign language dictionaries (cf. Section 6.1), search results in electronic dictionaries are typically ordered alphabetically by translation equivalent or gloss, or according to sign form. Some dictionaries, however, offer a choice between different sort orders. Figures 14.15–14.17 show examples of search result lists from three different electronic sign language dictionaries.
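Relevance is another possible sort order. Note 9 describes how text search matches in the DTS Dictionary are weighted by the field in which they occur (ID-glosses first, then equivalents, then glosses in usage examples, then words in the translations of usage examples) and are shown with 0–3 relevance stars. The sketch below illustrates that idea only: the one-to-one mapping from field to number of stars, the field names and the sample entries are assumptions made for the example and are not taken from the dictionary’s actual scoring.

```python
# Assumed mapping from matching field to stars, in the priority order of note 9.
FIELD_STARS = [
    ("id_gloss", 3),
    ("equivalents", 2),
    ("example_glosses", 1),
    ("example_translations", 0),
]

def relevance(entry, query):
    """Stars for the highest-priority field containing the query, or None if no match."""
    q = query.lower()
    for field, stars in FIELD_STARS:
        if any(q == word.lower() for word in entry.get(field, [])):
            return stars
    return None

def ranked_results(entries, query):
    """Matching entries as (stars, ID-gloss) pairs, best first, then alphabetical."""
    hits = []
    for entry in entries:
        stars = relevance(entry, query)
        if stars is not None:
            hits.append((stars, entry["id_gloss"][0]))
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))

# Hypothetical entries; English words stand in for the written equivalents.
entries = [
    {"id_gloss": ["WHILE"], "equivalents": ["while", "time"]},
    {"id_gloss": ["TIME"], "equivalents": ["moment"]},
    {"id_gloss": ["CLOCK"], "equivalents": ["clock"], "example_glosses": ["TIME"]},
    {"id_gloss": ["CAT"], "equivalents": ["cat"]},
]

results = ranked_results(entries, "time")
```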
Figure 14.15 Search result list from the Finnish Sign Language dictionary (Kuurojen Liitto 2003–5). Each entry found is represented by a photo, an ID-number and the first line of Finnish equivalents (for each sense). The selected search criteria are located to the right of the result list (not shown in this screenshot).
Figure 14.16 Search result list from the NZSL dictionary (McKee 2011). Each entry found is represented by a drawing, a main gloss, secondary equivalents (if present) and the parts of speech of the sign. The selected search criteria are shown above the result list.
Figure 14.17 Search result list from the DTS Dictionary (Center for Tegnsprog 2008–12). Each entry found is represented by a photo, an ID-gloss, 0–3 relevance markers9 and the first Danish equivalent (of each sense). The selected search criteria are shown to the left of the result list.
7 Future developments
In the above sections we have described some of the main issues and problems that sign language dictionary construction faces. As well as the general problems of creating good dictionaries, sign language dictionary compilers have to work in the context of a general lack of (accessible) resources from which they can derive information about the meaning(s), use and frequency, as well as grammatical information of the signs in the language. Furthermore, because of the absence of an accepted sign orthography, a representation of signs needs to be chosen. Each of the possible choices results in specific problems as described above. As for data sources, rapid technological development has facilitated the composition of large corpora in the past decades (e.g. the Auslan, NGT, BSL and DGS corpora: Crasborn et al. 2008, Johnston 2009, Schembri et al. 2011, Rathmann 2011), although the necessary annotation and tagging of data in these corpora remains a time- and labour-consuming enterprise. Existing corpora typically consist of elicited language samples; however, because it is now possible for anyone to upload videos to the internet, it will become increasingly easy to acquire sign language data covering a wide range of genres: news, lectures, talks, stories, poetry and jokes, and in this way broaden the coverage provided by corpora.
The morphology of most sign languages studied to date is generally quite intricate (see e.g. Sandler and Lillo-Martin 2006, Van der Kooij and Zwitserlood to appear, in progress). This presents certain challenges for the lexicographer in deciding whether or not to list particular morphemes, and if they are to be included, the best way to do so, as indicated in Section 4. Until now, few of the digital sign language dictionaries address morphological and morpho-syntactic issues (e.g. verb agreement, aspect marking, plural formation and non-manual features), while some printed sign language dictionaries have separate grammar sections. Considering the superiority of the electronic medium with respect to the presentation of sign language examples (i.e. as video clips), the addition of more information on inflection and word formation in sign language dictionary entries, as is often found in dictionaries of spoken/written languages, would be a welcome future development. Similarly, one could imagine the addition of grammar sections, e.g. with lists of verbs that can be modified for a particular type of agreement, or lists of nouns with a particular plural modification, just as some English dictionaries list verbs with their irregular tenses, or some Turkish dictionaries list verbs with the cases they assign to their arguments.
As stated in Section 2, at present various types of sign language dictionary exist, almost all of them bilingual. Most of the printed (and many of the digital) ones are basically unidirectional from spoken language to signed language. Expansion of approaches to meeting user needs is likely to occur with further development, in order to serve sign language users and learners, whose requirements may include extensive information about pronunciation, meaning, use and grammatical characteristics of the lemmas. We can also envisage bilingual dictionaries for two sign languages.
There is no reason why there should be fewer types of dictionary for sign languages than for spoken languages. More comprehensive dictionaries will be facilitated not only by technical development but also by increasing linguistic understanding of sign languages. Technical developments in particular facilitate creation of user interfaces for a single large dictionary database, where the user can indicate the type(s) of information required, such as signs/words within a particular domain, synonyms and grammatical information. Furthermore, sign language dictionaries for use on portable devices, such as mobile phones and tablets, already exist in the form of websites and apps, and further developments drive rapid change. Finally, the possibilities for creating connections to other digital language resources, such as corpora, grammars, encyclopaedias and dictionaries of other languages, will increase. Much effort is currently being put into the development of lexical databases, in particular Global Signbank (Crasborn et al. 2016, 2017, Cassidy et al. 2018), which is used for some twelve different sign languages,10 and iLex (Hanke and Storz 2008, Langer et al. 2018). Yet, so far, there are no dictionary interfaces for general or specific target user groups.
Notes
1 This term, originally used for the sound system of (spoken) language, has been extended to the basic formal systems (visual or auditory) of languages.
2 In a typical monolingual dictionary, the same language is used as object language and as metalanguage. The use of sign languages as metalanguages, however, is problematic (see Kristoffersen and Troelsgård 2012).
3 For a discussion of the role of mouth movements in the lemmatizing process, see Kristoffersen and Niemelä (2008).
4 See, for instance, Engberg-Pedersen (1993), Slobin et al. (2003), Schembri (2003) and Liddell (2003).
5 Some researchers (e.g. Liddell 1996) suggest that such forms are affixes rather than roots.
6 See Kristoffersen and Niemelä (2008).
7 In sign language linguistics, a sign is often represented by a gloss, i.e. a spoken/written language word that reflects the meaning (for polysemous signs, one of the meanings) of the sign, and is used as a mnemonic for the sign. Sign language glosses are typically written in upper case.
8 Johnston (1989) distinguishes between signs that are made with one hand, double-handed signs (both hands have the same handshape) and two-handed signs (the hands have different handshapes).
9 For handshape and place of articulation searches, the matches are weighted according to their appearance in the sign, so that the first occurring handshape or place of articulation scores higher than the following. Text search matches are weighted in the following order: ID-glosses, Danish equivalents, glosses in usage examples, words in the translations of usage examples. Based on the calculated relevance scores each match receives 0–3 ‘relevance stars’.
10 See https://signbank.science.ru.nl/datasets/available for all datasets.
References Bergman, B. and T. Björkstrand (1993), Kompendium i teckentranskription, Stockholm: Institutionen för lingvistik, Stockholms universitet. Bonnal-Vergés, F. (ed.) (2008), Abbé Jean Ferrand, Dictionnaire à l’usage des sourds et des muets (original circa 1784), Limoges: Lambert-Lucas. Brentari, D. (1998), A Prosodic Model of Sign Language Phonology, Cambridge, MA: MIT Press. Brien, D. (ed.) (1992), Dictionary of British Sign Language/English, London: Faber and Faber. Brien, D. and G. Turner (1994), ‘Lemmas, dilemmas, and lexicographical anisomorphism: Presenting meanings in the first BSL – English dictionary’ in I. Ahlgren, B. Bergman and M. Brennan (eds), Perspectives on Sign Language Usage: Papers from the Fifth International Symposium on Sign Language Research, Vol. 2, Durham: The International Sign Linguistics Association (ISLA), 391–407. Cassidy, S., O. Crasborn, H. Nieminen, W. Stoop, M.Hulsbosch, S. Even, E. Komen and T. Johnston (2018), ’Signbank: Software to support web based dictionaries of sign language’, Proceedings of LREC 2018, 2359–64. Center for Tegnsprog (2008), Ordbog over Dansk Tegnsprog. Available at http://www.tegnsprog.dk [accessed 8 June 2012]. Crasborn, O., I. Zwitserlood and J. Ros (2008), Corpus NGT. 72 hours of monologues and dialogues in Sign Language of the Netherlands, most of which have an open access Creative Commons license (BY-NC-SA), available at: www.ru.nl/corpusngten/ [accessed 10 June 2012]. Crasborn, O., R. Bank, I. Zwitserlood, E. Van Der Kooij, A. Schüller, E. Ormel, E. Nauta, M. Van Zuilen, F. Van Winsum and J. Ros (2016), ‘Linking lexical and corpus data for sign languages: NGT Signbank and the Corpus NGT’ in E. Efthimiou et al. (eds), Proceedings of the 7th Workshop on the Representation and Processing of Sign Languages: Corpus Mining, Portorož, Slovenia: ELRA, 41–6. Crasborn, O., I. Zwitserlood, E. Van Der Kooij and R. 
Bank (2017), ’A corpus-based lexical database for Sign Language of the Netherlands’, Extended abstract, COST ENeL WG3 meeting: Between Corpora and Dictionaries/Crowdsourcing and Gamification, Budapest, Hungary, 24–25 February 2017. Cuxac, C. and M-A. Sallandre (2007), ‘Iconicity and arbitrariness in French Sign Language – highly iconic structures, degenerated iconicity and diagrammatic iconicity’ in E. Pizzuto, P. Pietrandrea and R. Simone (eds), Verbal and Signed Languages. Comparing Structures, Constructs and Methodologies, Berlin: Mouton de Gruyter, 13–33.
247
The Bloomsbury Handbook of Lexicography
15
Identifying, ordering and defining senses
Robert Lew
1 Sense(s) in language versus senses in the dictionary

Linguists and philosophers of language have often talked of sense as a mass noun, typically in opposition to reference, where sense would refer to conceptual meaning, contrasted with a piece of the world that a linguistic expression refers to. In a dictionary, however, senses are something distinctly different. They are basic units of entry organization: the most distinct component parts of the dictionary article. Piotrowski (1994: 21) defines a sense in lexicography as ‘one of the main divisions of the entry, usually marked typographically by consecutive letters or numbers’. Indeed, senses are often explicitly numbered in sequence, less commonly prefixed by letters, or punctuated in a typographically more subtle manner, such as by semicolons. Occasionally, special symbols are used to effect a visual separation of senses, such as a diamond ♦, centred dot •, triangle ► or square ■. Dictionary senses may be run on (= continued on the same line), but they may also be given each on its own line. A one-sense-per-line presentation is generally believed to be easier to navigate, but it comes at the cost of using up more space. For this reason, this option is particularly common in on-screen presentation of digital dictionaries and whenever user friendliness takes precedence over space considerations, such as in dictionaries directed at language learners or children.

To make a general point, entry organization in a dictionary serves the purpose of enabling users to locate – and then make good sense of – the lexicographic data included in the entry. In most dictionary projects, the aim is to create efficient and effective tools, assisting the user in whatever lexicographically relevant queries, problems and doubts they may have, and good entry organization improves the efficiency of the dictionary as a tool.

Dictionary users (including many linguists!)
tend to conflate these two rather distinct meanings of sense, assuming without much reflection that when they look up a word in a dictionary, the senses present in the entry mirror what goes on in the language. In most cases, however, the correspondence is far from perfect, though generally speaking it tends to be closer in monolingual than in bilingual dictionaries. Also, such an approximation is less of a distortion in academic (or ‘scholarly’) dictionaries, whose general aim may be to present a reasonably faithful portrait of a language. Still, the fact that such dictionaries often include a diachronic dimension reinforces the point that lexicographic sense division cannot be expected to mirror linguistic reality, however the latter is to be understood. Further evidence of the relative autonomy of the lexicographic sense from the linguistic notion by the same name comes from the practice of the elevation of multi-word expressions to sense status in some dictionaries. Multi-words are not infrequently presented on a par with the more
‘traditional’ dictionary senses. For example, Longman Dictionary of Contemporary English Online enters eight senses of train as a noun. Interspersed between the more conventional senses are four multi-word items: senses four and five respectively are the multi-words bring something in its train and set something in train. Clearly, these two expressions instantiate quite similar semantic values of the lemma train, and yet they are listed as separate senses. Conversely, there are also many dictionaries which lump all multi-words under a single dictionary sense. This broad variation in lexicographic practice strengthens the point that the lexicographic sense may bear, at best, a tenuous relationship to linguistic notions. It is clear that discrete senses exist in dictionaries, but do they exist in language as well?
1.1 Are senses discrete entities?

As explained above, the question of atomicity of senses can apply to both the lexical units of a language and the structural elements of a dictionary. Linguists do not all agree on the issue of atomicity of senses. Some of those that do see meanings as atomic like to embark on the ambitious quest for the boundary between polysemy and vagueness. The opposite view is well represented by Patrick Hanks (2000: 211), who maintains that words only carry meaning potentials which are rather vague, and do not take on their full shape outside of their context (which includes, but is not exhausted by, co-text). There is nothing wrong with such vagueness, and it may actually foster language creativity, allowing speakers to express new ideas with existing words.

Patrick Hanks and John Sinclair have also argued against a strict separation of form and meaning, showing from corpus evidence that the two tend to go hand in hand: like meanings tend to be expressed through like structures. Again, this is merely a statistical tendency, as language is nowhere near as ordered as many linguists would like it to be. Another pertinent observation that lexicographers and linguists owe to Sinclair is dispelling the myth of orthographic words as principal carriers (or containers) of meaning: units of meaning should not be seen as being coextensive with orthographic words (the idiom principle).

Paradoxically, the very fact that so many linguists seem to feel comfortable with the idea of atomic senses in language may well be a reflection of linguists’ practical, pre-theoretical experience with dictionaries (Nowakowski 1990: 10, Burkhanov 1997: 70). It is not at all unlikely that repeated exposure to structured dictionary entries by linguists-to-be in the role of ordinary dictionary users may have shaped their future thinking on how language itself might be structured.
In a similar vein, Hanks (2000: 205) notes that ‘[t]he numbered lists of definitions found in dictionaries have helped to create a false picture of what really happens when language is used’. In this context, it is appropriate to reflect on what the ‘identification of senses’ in the chapter title might really refer to. Who does the identifying and what is the thing that is being identified? One answer that can be given with some confidence is that dictionary users identify senses in the dictionary which they happen to be consulting: they look for the structural segments of entries that best fit the problem at hand which has prompted them to consult a dictionary in the first place. These senses have been put in the dictionary by the lexicographer. But has the lexicographer actually ever identified these exact senses in the language? This is a tough question and the answer can at best be a qualified yes, with the degree to which it may be true depending on the type of vocabulary item. Some words appear to have meanings which are relatively fixed and
do not yield that much to contextual coercion (such words are sometimes termed autosemantic). But there are other words, which of themselves tend to be rather vague and pick up a significant portion of their meaning from the context (relatively synsemantic words). Very common words can be semantically impoverished, such as, in English, have in have a go. Today, much lexicographic work is done by examining massive corpus evidence, but, as any novice lexicographer is soon bound to discover, it is notoriously difficult to compartmentalize corpus citations into discrete senses. Having access to greater volumes of data usually makes the problem even harder: for reasonably frequent words, lexicographers have to wrestle with hundreds of citations and try to group them into manageable clusters of meaning. As a result, as pointed out by Van der Meer (2004: 807), ‘one of the hardest problems torturing practising lexicographers has always been the question of how to describe the meaning of so-called polysemous words’. Atkins and Rundell (2008: 264) concur when they state that ‘there is little agreement about what word senses are (or even whether they exist). Lexicographers are therefore in the position of having to describe something whose nature is not at all clear’. Consequently, Kilgarriff (1997), in a paper with a telling title (‘I don’t believe in word senses’), rejects the word sense – being an ill-defined entity – as the basic unit. Instead, dictionary word senses are the result of clustering attested uses appearing as concordance lines. So, although there may be no discrete senses in language, they do exist as artefacts in a dictionary.
2 Specifying senses in monolingual dictionaries

The modern lexicographer is often confronted with hundreds of citations and faces the intimidating task of having to arrange them neatly into portions appetizing enough to be appreciated by future dictionary users. Working with large corpora is a humbling experience for linguists, and the job of arranging a multitude of corpus citations into neat, discrete senses is far less obvious than many would believe. In fact, two opposing strategies have been identified at this stage of dictionary compilation, known as lumping and splitting. The first strategy aims to minimize the number of senses so that they each cover as much semantic ground as possible. In contrast, those who follow the second strategy (‘the splitters’) will tend to generate a rather larger number of finely distinguished senses. As Hanks (2000: 208) observes, exposure to ever-growing corpora naturally entices lexicographers into adding yet further definitions to the dictionary. This happens in part because it does seem easier than reflecting on whether the definitions already in place can be modified to accommodate the newly encountered usage, but also because having a lot of ‘meanings’ is often seen as a desirable feature from a marketing point of view, so as to boost the number of ‘references’ that can later be bandied about in promotional materials. But even in corpus citation lines, meanings do not lie there exposed and ready to be picked up or ‘discovered’. Rather, corpus lines provide evidence of ‘traces of meaning events’ (Hanks 2000: 211).

That senses in dictionaries do not have as much grounding in linguistic reality as is often naively held can be readily ascertained by examining closely analogous entries in different dictionaries. To work through a concrete example, let us take the noun mind. In the online version
of the Longman Dictionary of Contemporary English, this lemma receives three senses, if we ignore the metonymically derived sense ‘intelligent person’ and all the numerous multi-word expressions:

1. your thoughts or your ability to think, feel and imagine things
2. used to talk about the way that someone thinks and the type of thoughts they have
3. your intelligence and ability to think, rather than your emotions

In contrast, a close competitor, Oxford Advanced Learner’s Dictionary, also available online, gives four rather different senses (again ignoring the metonymic ‘intelligent person’):

1. the part of a person that makes them able to be aware of things, to think and to feel
2. your ability to think and reason; your intelligence; the particular way that somebody thinks
3. your thoughts, interest, etc.
4. your ability to remember things

At the same time, the DANTE lexical database gives no fewer than eight senses covering roughly the same semantic space. Surely, the best professional lexicographers cannot be describing the same reality? The undeniable observation that the more voluminous (‘comprehensive’) the dictionary is, the greater the number of senses it will tend to have for a typical common word (and not just because larger dictionaries address areas of meanings excluded from smaller ones!), testifies to the fact that senses in the dictionary are only objective with respect to the entry structure of this dictionary. They should not be seen as an objective representation of language in any dimension. At the very most, they are attempts at such a representation, but filtered through the practical realities of the particular lexicographic project, dictated by the foreseen target users and uses, and constrained by the available financial, human and technical resources.
Rundell (1999: 40) makes the point clearly when he observes:

    (as lexicographers have always known), the notion that a given word has five or ten or twenty ‘senses’ is simply a useful working convention without any objective truth-value (…) What dictionary-makers attempt to do is to segment this continuum of meaning in ways that will provide maximum benefit to the target user.

It is not irrelevant to observe at this point that dictionary senses are not necessarily always designed to represent separate ‘meanings’ of the strict semantic kind. Instead, separate sense status may be accorded to distinct uses of the word. For example, verb entries may be structured by the syntactic patterns of use in which they are observed.
3 Senses in bilingual dictionaries: meaning structure versus equivalence structure

In bilingual dictionaries, the issue of sense division is more complex, as it involves not one but two lexical systems. In organizing the entry into senses, lexicographers may thus be guided by interlingual equivalence relations. This provides an extra criterion, and a relatively objective one
at that, especially if, in the near future, suitable parallel corpora become more widely available as a source of evidence on textual equivalence between lexical items in two (or more) languages (an idea which goes back to Hartmann 1985). The issue was taken up by (among others) Manley, Jacobsen and Petersen (1988), who use the term meaning structure to refer to a type of sense organization which relies on the source language solely, and equivalence structure to one based on the equivalence relations with the target language. They assert that ‘meaning structure is a relic from the monolingual dictionary and … the more we can approach equivalence structure the closer we will get to the ideal form of the bilingual dictionary’ (1988: 296). Most authors writing on the issue concur that senses need to reflect such equivalence relations, even if the description of the source language gets ‘subtly distorted’ (Atkins 1996: 523) in the process. There are actually two opposing aspects of equivalence structure: (1) sense distinctions in the source language may be redundant and undergo elimination; and (2) it may be advisable to introduce extra distinctions so as to provide a tighter match between the lexical items in the two languages. To illustrate the first scenario, quite a few senses of English high, which tend to be distinguished in monolingual dictionaries, translate into German as hoch. All these senses of English could then be conflated in an English-to-German dictionary, thus making the entry presentation more economical and, arguably, easier to navigate and use. But there are doubts, such as what to do when a given sense in L1 has another important translation in L2. Decisions like these are usually best made on a per-case basis, depending on the particular constellation of equivalents and also on what functions the dictionary is envisaged to perform. 
Conversely, what appears to be a single sense in a source-language item may require splitting according to substantive distinctions in the target language. For example, the English noun drift in the sense ‘deviation from course’ has different equivalents in Russian depending on whether it refers to an aircraft (дрейф) or a vessel (снос). Therefore, the option of separating the two meanings or uses out as either senses or subsenses might at least be considered, if not always acted upon. Of course, one could argue in such cases that we are dealing with the same ‘sense’, merely providing a choice of equivalents that are restricted in their use. But this just begs the question of what a ‘sense’ is; if we see it, as I believe we should in this context, as a lexicographic construct rather than a linguistic one, then it is certainly something that can be split. Even when dictionary editors aim in principle for equivalence structure, practical considerations may prevail and skew the structure in the direction of that found in a monolingual dictionary. This can happen because a monolingual dictionary of a language is not infrequently a starting point in the compilation of a bilingual dictionary with this language as the dictionary’s source language (SL). Alternatively, lexicographers may start with a universal framework of that language created to be used as a skeleton in bilingual projects. It is only natural that this SL-based structure will tend to impress itself on the final product, even if this is not the intention of the lexicographer. Meaning structure is overtly aimed for in dictionaries following what Jarošová (2000: 18) calls the explanatory principle. This echoes Lev Shcherba’s idea of the explanatory dictionary originally expressed in the 1940s (Shcherba 1995 is the English-language version). 
Meaning structure is also sanctioned in most semi-bilingualized dictionaries, where lexicographers are often discouraged, if not downright prohibited, from manipulating the sense divisions inherited from the monolingual model dictionary. At times, this frustrates the bilingual lexicographer. To
use an example from my own experience: when working on a Polish adaptation of a major monolingual learner’s dictionary, I had to contend with the basic sense of the English verb pour being defined as ‘to make a liquid flow from or into a container’. This sense was supposed to subsume a similar action on powdery substances such as sugar. The problem is that Polish requires completely different verbs in the two cases, but as splitting senses was not an option, I had to settle for an awkward side-by-side presentation of two totally unrelated (from the point of view of the Polish user) equivalents. All in all, except in artificial cases such as the last one described, it should by now be apparent that the sense structure of most existing bilingual dictionaries is usually a compromise between the analysis of the SL and the constellation of the TL equivalents of the source item. It can be argued that a bilingual dictionary with a dominant text production function might benefit from a sense structure closer to that of a monolingual dictionary of the source language. Here, the typical user of such an entry has limited knowledge of the target language and may not recognize at least some of the equivalents given. If so, they need guidance in the SL (either their native language or at least one they speak better than the TL), and such guidance more naturally mimics the distinctions typical of a monolingual dictionary. Still, if several senses share the same equivalent, there is no compelling reason not to combine them, thus saving a considerable amount of space and improving the visibility of the remaining senses with perhaps more unusual equivalents (an effect demonstrated empirically in Lew et al. 2013).
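The lumping of senses that share an equivalent can be pictured as a simple grouping operation. The following is a minimal sketch in Python, not a description of any actual dictionary-writing system: the function name merge_by_equivalent, the sense placeholders and the string "<other equivalent>" are all invented for illustration, and only the equivalent hoch for English high is taken from the discussion above.

```python
def merge_by_equivalent(senses):
    """Lump source-language senses that share a target-language equivalent.

    senses: list of (gloss, equivalent) pairs in their original entry order.
    Returns a list of (equivalent, [glosses]) pairs, ordered by the first
    appearance of each equivalent in the entry.
    """
    merged = {}  # dicts preserve insertion order (Python 3.7+)
    for gloss, equivalent in senses:
        merged.setdefault(equivalent, []).append(gloss)
    return list(merged.items())


# Hypothetical monolingual sense list for English "high". The glosses are
# placeholders; "<other equivalent>" stands for any equivalent other than hoch.
high = [
    ("sense A", "hoch"),
    ("sense B", "hoch"),
    ("sense C", "<other equivalent>"),
    ("sense D", "hoch"),
]

bilingual_entry = merge_by_equivalent(high)
for equivalent, glosses in bilingual_entry:
    print(equivalent, "<-", ", ".join(glosses))
```

In this toy entry four source-language senses collapse into two bilingual senses. A real project would still need an editorial decision wherever a sense has more than one important equivalent, which this sketch deliberately does not model.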
4 Ordering senses

4.1 Ordering senses in monolingual dictionaries

The major approaches to sense ordering should be seen as guidelines rather than hard-and-fast rules, as excessively orthodox adherence to any one such principle is likely to lead to undesirable outcomes for some entries. A notorious example is the entry summit in the first edition of COBUILD (Sinclair and Hanks 1987), where sense ordering according to corpus frequency compelled the lexicographers to list the ‘political meeting’ sense first, before the ‘top of the mountain’ sense. This example underscores the fact that, above all, common sense should prevail over any strict application of principles. As lexicographers discover over and over again in the course of their work, the lexicon of a natural language is not regular enough for an across-the-board treatment to work seamlessly for all items. Rather, we should always remain open to individual solutions, and not hesitate to depart from the general principle whenever the peculiarity of a lexical item justifies this. Having said that, consistency is in general seen as a virtue in dictionaries, so guiding principles are needed. The most popular principles of relevance in guiding sense ordering are: chronology, frequency, markedness and logic.
4.1.1 Chronology

In chronological ordering, also known as historical, senses are arranged from the earliest attested to the most recent. As one would expect, the principle is most relevant for historical and diachronic dictionaries. However, there also exist general dictionaries using this arrangement.
For example, the American dictionary publisher Merriam-Webster, Incorporated has insisted on the application of the historical principle in its range of general dictionaries, including the popular Merriam-Webster’s Collegiate Dictionary. This dictionary was found to be inferior for US college students when compared with other dictionaries aimed at college students or advanced learners of English (McCreary and Amacker 2006, McCreary 2008, 2010). In large measure, the disappointing performance of the Merriam-Webster dictionary was ascribed to its policy (Mish 2004: 20a) of placing the historically oldest senses first, even when they are no longer current. In view of the evidence that dictionary users all too often do not read dictionary entries beyond the first sense (Tono 1984, Lew 2004), placing a non-contemporary meaning in this privileged position is counterproductive for most typical uses of the dictionary. McCreary (2008) suggests that this policy should be reversed by placing archaic senses towards the end of the entry.
4.1.2 Markedness

Relegating archaic senses to the final sections of an entry may be taken as indicative of another principle: that of placing marked senses after unmarked ones. This criterion (hailed as ‘distribution’ by Fuertes-Olivera and Arribas-Baño 2008: 38) says that senses which are not in general use, being restricted geographically, pragmatically or socially, should follow those not so restricted. Sound as it is, the policy is obviously insufficient in itself, as most senses with serious claims for entry-initial placement will not be restricted in any way. It should be clear, then, that this principle will not be of much help in those decisions that determine the most salient form of the entry: those regarding the most salient meanings. It will, however, assist in deciding what to do with those senses which exhibit restriction in use.
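The markedness principle amounts to a partial ordering: it moves restricted senses behind unrestricted ones but says nothing about the order within either group. A minimal Python sketch makes this limitation visible. The Sense class, the label names and the toy entry are assumptions made for the example, and the stable sort simply preserves whatever order the lexicographer has already chosen within each group.

```python
from dataclasses import dataclass


@dataclass
class Sense:
    gloss: str
    labels: tuple = ()  # usage labels such as ("archaic",) or ("regional",)


def unmarked_first(senses):
    """Place senses without usage labels before labelled (marked) ones.

    sorted() is stable, so the existing relative order is kept inside the
    unmarked group and inside the marked group.
    """
    return sorted(senses, key=lambda sense: bool(sense.labels))


# Hypothetical entry; glosses and labels are invented for illustration.
entry = [
    Sense("current general sense"),
    Sense("obsolete sense", ("archaic",)),
    Sense("another general sense"),
    Sense("regionally restricted sense", ("regional",)),
]

ordered = [sense.gloss for sense in unmarked_first(entry)]
print(ordered)
```

The two unlabelled senses surface first, but their mutual order is untouched, which is exactly why a further principle is needed for the most salient positions in the entry.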
4.1.3 Frequency

The idea behind frequency ordering is to present the sense in which the lemma is most frequently used as the first one, and then order the remaining senses in decreasing frequency. The criterion has been in use for some time, though in pre-corpus times frequency was evaluated subjectively by intuition, and an early publication on the topic, Kipfer (1984), writes of ‘usage ordering’. Frequency-based ordering really came into its own with the introduction of digital text corpora. Even though corpus tools are still not quite capable of automatically counting the occurrences of words in specific senses, modern corpus query applications go a long way towards facilitating such estimates.

There is no question that ordering by frequency is convenient for the lexicographer, providing a relatively objective ground for ordering decisions (issues of corpus balancing aside), but is it also in the best interest of the dictionary user? All too often authors claim that listing the most frequent senses at the top gives the user the best chance of finding what they want in the shortest time possible. The fact is, though, that such claims remain largely unproven.

It was English monolingual dictionaries for advanced learners that embraced frequency ordering most enthusiastically. However, if we picture a scenario of advanced learners of English looking up the meaning of a common word (common words tend to have many senses, other things being equal), it is quite unlikely that they will be looking for the most frequent sense, as this sense will normally be quite familiar to advanced language learners. Indeed, I have heard comments from advanced learners of English that they start examining long entries from the bottom up, as
they have discovered through extensive dictionary use that the senses they seek are often found towards the end of the entry. Perhaps it is the use of a similar strategy that might account for the special salience of final senses noted by Nesi and Tan (2011; see also Dziemianko 2017). Placing the most frequent sense first is rather more defensible if a dictionary is going to be used for text production (such as essay writing). Whereas looking up a frequent sense of a common word is not likely when the dictionary is being used for comprehension, users engaged in text production may wish to seek guidance or reassurance on the grammatical or collocational behaviour of a well-known sense. This invites the conclusion that the optimal sense ordering hangs on what the dictionary is actually used for (or is designed to be used for). In digital dictionaries, sense ordering might conceivably be adjusted depending on the circumstances of use (an idea developed in Lew 2009).
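For readers who wish to see the mechanics of frequency ordering, the following Python sketch may help. It assumes that a sample of concordance lines has already been assigned to senses by hand, since, as noted above, corpus tools cannot yet do this reliably. The function name order_by_frequency and the sample of sense tags are invented for the example and are not real corpus counts.

```python
from collections import Counter


def order_by_frequency(sense_tags):
    """Return sense labels ordered by decreasing frequency.

    sense_tags: one sense label per manually tagged concordance line.
    Ties keep the order in which the senses were first encountered.
    """
    counts = Counter(sense_tags)
    return [sense for sense, _ in counts.most_common()]


# Invented sample of tagged concordance lines for "summit".
tagged_sample = [
    "political meeting",
    "top of a mountain",
    "political meeting",
    "political meeting",
    "top of a mountain",
]

print(order_by_frequency(tagged_sample))
```

Applied mechanically, such a procedure reproduces the kind of outcome reported for summit in the first edition of COBUILD, which is one reason why the result is better treated as input to an editorial decision than as the decision itself.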
4.1.4 Logic

Logical ordering is sometimes invoked by dictionary editors in the front matter. The notion was subjected to close scrutiny by Hiorth (1954), and then Kipfer (1984), who found it to be merely a label with little content. Another term encountered in the front matter of dictionaries is psychologically-meaningful ordering (Kipfer 1984: 103), but it has never been made clear how these two types would actually differ. All in all, it seems that these different labels represent intuitive attempts at respecting the dictionary entry as a coherent text (cf. Frawley 1989), rather than seeing it as a loose amalgamation of independent senses.

In order to give a more holistic picture of meaning, lexicographers should strive to present senses as related, to the extent that this is practical, typically by introducing an important core sense of some generality and then demonstrating how other peripheral senses relate to this pivotal sense. These senses may be derived from the core sense by meaning extension, specialization or generalization, including the figurative processes of metaphor and metonymy (Van der Meer and Sansome 2001, Atkins and Rundell 2008, Wojciechowska 2012). We return to this issue below.

Unlike in applying the previous principles, this approach to sense ordering implies grouping senses at different levels of organization, so the structure of the entry need not be flat. Instead, subsenses should be allowed to be nested under the main sense. A well-known exemplar of such an approach is the New Oxford Dictionary of English (Hanks and Pearsall 1998), where a systematic attempt has been made to cluster related subsenses under a smaller number of ‘prototypical’ senses. Its subsequent editions largely continue this tradition under the slightly changed title Oxford Dictionary of English, now also a basis for English dictionaries encountered in numerous digital products.
The number of hierarchical levels can be larger than two, and the hierarchy can get quite elaborate. As Fraser (2008: 72) notes, large scholarly dictionaries may feature as many as four levels of sense organization, with a possible arrangement including the following: overarching Divisions, labelled with capital letters (A, B, C); semantic Branches, with Roman numerals (I, II, III); Sections, with Arabic numerals (1, 2, 3); and Subsections, with lower-case letters (a, b, c). A prominent exemplar of a dictionary with this style of sense organization (maximal, though not obligatory for every entry) is the Oxford English Dictionary.
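The four-level labelling scheme described above can be made concrete with a small sketch. It assumes 1-based positions at each level and joins the labels with full stops; the function names and the dotted format are illustrative choices only, not a convention taken from Fraser (2008) or from any dictionary.

```python
def to_roman(n):
    """Convert a small positive integer (1-39) to an upper-case Roman numeral."""
    numerals = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    parts = []
    for value, symbol in numerals:
        while n >= value:
            parts.append(symbol)
            n -= value
    return "".join(parts)


def sense_label(division, branch, section, subsection):
    """Build a label such as 'A.III.2.a' from 1-based positions.

    Division -> capital letter, Branch -> Roman numeral,
    Section -> Arabic numeral, Subsection -> lower-case letter.
    The dotted separator is an assumption made for this sketch.
    """
    return ".".join([
        chr(ord("A") + division - 1),
        to_roman(branch),
        str(section),
        chr(ord("a") + subsection - 1),
    ])


# First Division, third Branch, second Section, first Subsection
label = sense_label(1, 3, 2, 1)
```

Since the full hierarchy is not obligatory for every entry, a practical version would probably need to allow levels to be omitted; the sketch deliberately ignores that.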
4.2 Ordering senses in bilingual dictionaries As we have already seen, entry structure in bilingual dictionaries may be carried over from a monolingual dictionary which may have been used as a starting point in the compilation of the bilingual one. This routinely happens in the (often superficial) adaptations of monolingual dictionaries referred to as semi-bilingual or bilingualized dictionaries (Hartmann 1994). More interesting are those works in which senses in a bilingual entry have been organized around their equivalents. In such cases, there is an argument to be made for placing at the top those senses which include the most common textual equivalent in the TL, and giving further senses in descending order of frequency of equivalents translating this headword (not the same, of course, as the absolute frequency of candidate equivalents). Another way to think of this measure is as the conditional probability of a candidate equivalent appearing in a target-language text, given the presence of the source lemma in a source-language text. The rationale for this ordering principle would be that a user seeking a TL equivalent is first presented with equivalents which translate the headword in the largest proportion of cases. Until recently, such ordering would mostly be based on the intuition of the lexicographer. Currently, corpora are increasingly being used in the compilation of bilingual dictionaries, but they still tend to be separate corpora for the two languages. As such, they can provide information on the frequent patterns of use of words, but offer no direct clues on the correspondences between the two lexical systems. However, advances in parallel corpora now allow meaningful assistance in the identification of the most common textual equivalents between languages. Even if the most frequent equivalents in texts are not in each and every case the best candidates for inclusion in all types of bilingual dictionary, they are by and large the most serious candidates to consider.
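The conditional-probability measure mentioned above can be sketched in a few lines. The sketch assumes that a parallel corpus has already been reduced to pairs of aligned lemmas (source lemma, target-language equivalent). The function name, the data format and the toy English-Polish pairs are all invented for illustration and do not come from any real corpus or tool.

```python
from collections import Counter


def rank_equivalents(aligned_pairs, headword):
    """Return (equivalent, P(equivalent | headword)) pairs, most probable first.

    aligned_pairs: iterable of (source_lemma, target_equivalent) tuples,
    assumed to come from a word-aligned parallel corpus.
    """
    counts = Counter(tl for sl, tl in aligned_pairs if sl == headword)
    total = sum(counts.values())
    if total == 0:
        return []
    # Sort by descending count; break ties alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(tl, n / total) for tl, n in ranked]


# Invented toy data, for illustration only
pairs = [
    ("advance", "postęp"), ("advance", "postęp"), ("advance", "postęp"),
    ("advance", "zaliczka"), ("advance", "zaliczka"),
    ("advance", "natarcie"),
    ("heater", "grzejnik"),
]
ranking = rank_equivalents(pairs, "advance")
```

The ranking depends only on how often each equivalent translates this particular headword, not on how frequent the equivalent is in target-language texts overall, which is the distinction drawn in the text.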
5 Helping dictionary users identify the relevant sense Polysemous entries present a special challenge to dictionary users, as they need to locate the relevant section of the entry. Research on dictionary use (Tono 1984, Bogaards 1998, Lew 2004) shows that users tend to look at the top part of the entry, and may not scan the whole entry unless there are obvious signals in the entry that the top sense is not what they should be looking at. There is also some evidence (Nesi and Tan 2011) that more sophisticated users tend to look at the final sense in the entry, but again, the material in the middle sections of the entry is not so easily accessible. To assist dictionary users in navigating long entries, two broad types of navigational aids have occasionally been used: (1) entry menus and (2) sense guidewords (also known as signposts, shortcuts or mini-definitions). In both these types of navigational aid, the idea is to provide the user with rough-and-ready clues to the range of meaning or use covered within a specific sense section of the entry, and so direct them to the most relevant sense. The difference between the two types lies in their spatial organization, as illustrated below with a modified partial entry from the seventh edition of the Oxford Advanced Learner’s Dictionary, as used in Lew’s (2010) study.
ADVANCE
1. FORWARD MOVEMENT 2. DEVELOPMENT 3. MONEY 4. SEXUAL 5. PRICE INCREASE 6. MOVE FORWARD 7. DEVELOP 8. HELP TO SUCCEED 9. MONEY 10. SUGGEST 11. MAKE EARLIER 12. MOVE FORWARD 13. INCREASE
■ noun
1 [C] the forward movement of a group of people, especially armed forces: We feared that an advance on the capital would soon follow.
2 [C, U] advance (in sth) progress or a development in a particular activity or area of understanding: recent advances in medical science · We live in an age of rapid technological advance.
…

ADVANCE
■ noun
FORWARD MOVEMENT 1 [C] the forward movement of a group of people, especially armed forces: We feared that an advance on the capital would soon follow.
DEVELOPMENT 2 [C, U] advance (in sth) progress or a development in a particular activity or area of understanding: recent advances in medical science · We live in an age of rapid technological advance.
MONEY 3 [C, usually sing.] money paid for work before it has been done or money paid earlier than expected: They offered an advance of £5 000 after the signing of the contract. · She asked for an advance on her salary.
…
Entry menus gather all the clues in a solid block at the top of the entry (left-hand column above). In contrast, guidewords are distributed throughout the entry, with indicators introducing each sense (right-hand column above). The efficacy of such entry navigational aids has been established mainly in the context of monolingual dictionaries for language learners (Tono 1984, 1992, 1997, 2001; Bogaards 1998; Lew and Pajkowska 2007). It would stand to reason that in bilingual dictionaries the need for such access-facilitating devices is diminished, as one of the languages of the dictionary would usually be the native language of the user, allowing for more efficient scanning of the entries than if the entries are all in a foreign language, as would be the case in a monolingual dictionary for language learners. However, there is some evidence that digital bilingual dictionaries do benefit from clickable entry menus as long as the target sense is additionally highlighted (Lew and Tokarek 2010). Direct comparisons between the two systems (Lew 2010, Nesi and Tan 2011) suggest that the distributed system works better. The advantage of guidewords over menus may be explained by the physical proximity between guidewords and full definitions, which allows the two entry elements to work in synergy. Also, since entry menus are found at the top of the entry, there is a real risk of dictionary users getting lost on the way from the menu to the sense further down the entry, even if they have identified the relevant sense correctly in the menu itself, particularly if the entry is long and runs on to another page or screen. This is much less of a risk when the clue is adjacent to its sense section of the entry, as it is in the guideword system.
6 Defining senses Defining senses, or meanings, is most relevant to semasiological monolingual dictionaries. Most types of onomasiological dictionary, such as thesauri or synonym dictionaries, tend not to have definitions, except perhaps for bringing out the differences between alternative lexical choices. Prototypical bilingual dictionaries do not normally employ definitions, working instead with equivalents in another language as the primary instrument for explaining meaning. Nevertheless, bilingual dictionaries sometimes do resort to definition in cases where an equivalent happens not to be available, or an equivalent would not be clear on its own. In such and similar cases, a definition (in this use often called a gloss) may be added for clarification.
6.1 The form of definition For centuries, monolingual lexicography has been dominated by the Aristotelian model of defining. This format, also known as the classical definition, attempts to describe the defined item (definiendum) by supplying at least two pieces of information. First, it identifies the general category of things to which the defined item belongs. Second, it specifies the features by which the thing defined distinguishes itself from other members of this broader category. The technical terms for the two elements of the classical definition are genus and differentia specifica (or, in the plural, differentiae specificae), respectively (though they need not necessarily come in this particular order). For example, if a heater is defined as ‘a machine for making air or water hotter’ (LDOCE online), then what the definition is telling us is that a heater is a type of machine (genus) with a particular function of making air or water hotter (differentia specifica). This defining strategy thus involves two complementary moves: a generalization followed by specialization. Even though the classical definition has ruled for centuries, it has not ruled supreme. Studies by historical lexicographers (e.g. Osselton 2007, Stein 2011) have identified instances of other defining strategies, some of which have recently enjoyed a comeback. Foremost amongst these has been the so-called full-sentence definition (FSD), brought to the contemporary limelight by the COBUILD range of dictionaries starting in 1987 (see Sinclair 1987). The case for FSD is made by Hanks (for more detail, see Hanks 1987). 
There are several variants of the full-sentence definition, but the most important characteristic is that the defined item is embedded in the definition itself, as in this definition from COBUILD online: ‘A heater is a piece of equipment or a machine which is used to raise the temperature of something, especially of the air inside a room or a car.’ Such full-sentence definitions are claimed to be more similar to regular discourse, and remind the reader of an explanation a teacher or parent might offer. However, studies into patterns of spontaneous defining (Fabiszewski-Jaworski 2011) do not confirm this claim: while the full-sentence format is used at times, the classical definition remains by far the most popular. Another feature of the FSD is that the inclusion of the definiendum in the definition creates an opportunity for highlighting typical word combinations with the item being defined. This is particularly common with verbs and adjectives, as in this COBUILD definition of instil: ‘If you instil an idea or feeling in someone, especially over a period of time, you make them think it or feel it.’
Other dictionaries have not adopted the full-sentence definition to the extent that the COBUILD range has. Amongst the problems of this defining format are excessive wordiness, complexity, and appeal to conventions that remain largely obscure to the average user (Rundell 2006). Still, the FSD is used in moderation in most current monolingual English dictionaries for learners, and the format has inspired a lot of lexicographic research (e.g. Barnbrook 2013). A related defining format is the single-clause definition (originally termed the single-clause when definition, although it may occasionally employ question words other than when; Lew and Dziemianko 2006a), used most readily to define abstract nouns, especially ones which lack a useful genus term. Instead of defining destruction as ‘the act or process of destroying something or of being destroyed’ (LDOCE online), the single-clause alternative would just say ‘when something is being destroyed’, avoiding the clumsy and over-general ‘act or process’. It may well be that such general words do not contribute that much to the explanation of the exact meaning of the definiendum, but at least they do indicate that a noun is being defined: something that the single-clause definition does rather poorly (Dziemianko and Lew 2006, Lew and Dziemianko 2006a, 2006b, 2012). A frequent defining strategy in concise dictionaries is to give a synonym or several synonyms. Interestingly, such a defining strategy bears affinity to the methods of bilingual lexicography: a synonym can be thought of as a special type of (near-)equivalent. While a bilingual dictionary provides equivalents in another language, synonym definitions may represent a different regional variety (e.g. kirk: ScotE ‘church’) or register (puke: infml ‘vomit’). Whenever a lemma represents a non-neutral item, as in the last case, and is rendered with a synonym in general use, the use of a synonym as a definition is generally accepted. 
Otherwise, it is frowned upon as a lexicographer’s easy way out. A number of other defining formats are occasionally used, such as the morphological definition (a formulation unwrapping a derivative word, e.g. swiftly in a swift fashion), extensional definition (enumerating typical exemplars, e.g. legume a seed such as a pea or bean) or ostensive definition (pointing to the definiendum, e.g. black the colour of this print). The above classification of definition formats has mostly dealt with the syntactic devices by means of which definitional sequences are put together. But the ultimate building blocks of definitions are words, and there is a general, and not altogether unreasonable, expectation that those words be simpler than the word being defined. Of course, this is hardly possible in defining the most common vocabulary (whose presence in monolingual dictionaries is somewhat tokenistic). The requirement of defining in simple words found a systematic and formal implementation in the so-called vocabulary control movement of the mid-twentieth century (Cowie 1999), out of which grew the defining vocabularies of the major monolingual English learners’ dictionaries. These vocabulary lists typically consist of between 2,000 and 3,500 words in their most common senses, and it is with the use of this restricted set that the definitions of up to 100,000 senses recorded in such dictionaries are written. It is often argued that the use of restricted vocabulary generally makes definitions easier to understand. While this is probably so, it is also true that the formulations become less precise, more wordy and roundabout, if not downright strained. The artificiality extends to unnatural collocational patterns, as the natural collocates may not be in the defining vocabulary set. Problems such as these throw into question the rigid restrictions imposed by defining vocabulary lists. 
As an alternative, Hanks (2009: 307) proposes that while definitions should be ‘as simple as possible’, they should at the same time be ‘as complex as necessary’. This appears to be a reasonable position, given the numerous problems associated with the use of restricted defining vocabulary. Rather than trying to dumb down definitions for language learners, publishers should offer bilingual learners’ dictionaries (Adamska-Sałaciak 2010, Lew and Adamska-Sałaciak 2015).
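To make the idea of a restricted defining vocabulary more tangible, the following sketch checks a definition against a word list and reports the words that fall outside it. The function name and the tiny word list are hypothetical; real defining vocabularies run to a few thousand items, and a realistic check would also have to handle inflected forms and word senses, which this sketch does not attempt.

```python
import re


def words_outside_vocabulary(definition, defining_vocabulary):
    """List the words of a definition that are not in the defining vocabulary.

    Matching is on lower-cased word forms only (no lemmatization, no sense
    distinctions); words are reported once, in order of first appearance.
    """
    allowed = {word.lower() for word in defining_vocabulary}
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", definition.lower())
    outside = []
    for token in tokens:
        if token not in allowed and token not in outside:
            outside.append(token)
    return outside


# Hypothetical toy vocabulary, far smaller than any real defining vocabulary
toy_vocabulary = {"a", "machine", "for", "making", "air", "or", "water", "hotter"}

# The LDOCE definition of 'heater' quoted earlier in the chapter
ldoce_definition = "a machine for making air or water hotter"
flagged = words_outside_vocabulary(ldoce_definition, toy_vocabulary)
```

A check of this kind can only show which word forms are outside the list; it says nothing about whether the resulting definition is precise or natural, which is the concern raised in the text.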
6.2 Relations between definitions of different senses Lexicographers defining polysemous entries need to grapple with the issue of relatedness between different senses. Foregrounding the links between different shades of meaning may help repair some of the damage done by artificially chopping semantic space into separate dictionary senses. In line with this consideration, there are those who stress that the dictionary entry is just one type of text (e.g. Frawley 1989), with its own cohesive links. Arguably, readers going through the entry can benefit from the definitions of subsequent senses building on the preceding ones, at the same time avoiding repetition. As a result, however, some definitions may become impossible to interpret without the contextual support of the earlier ones (‘an instance of this’ is a classic formulation in lexicographese, its popularity probably due more to space-saving considerations than to anything else). The assumption underlying such defining practice is that dictionary users behave as entry readers. This assumption can be problematic, as dictionary users do not have to, and often do not care to, go through the complete entry, if they are looking for a quick solution. On such a scenario, it may be more advantageous if a definition of each sense is relatively autonomous, so that its comprehension does not send the dictionary user on a quest for clues all over the entry. One way to approach the issue of how closely the senses should be interrelated is through the primary function of the dictionary. Using a dictionary for comprehension favours quick consultation, and for such uses, relatively autonomous senses might work best. In contrast, if an entry is used for browsing or vocabulary learning, the user is likely to spend more time examining larger portions of the entry, and for such uses a more holistic approach to defining may be more suitable.
References Adamska-Sałaciak, A. (2010), ‘Why we need bilingual learners’ dictionaries’ in I.J. Kernerman and P. Bogaards (eds), English Learners’ Dictionaries at the DSNA 2009, Tel Aviv: K Dictionaries, 121–37. Atkins, B.T.S. (1996), ‘Bilingual dictionaries – past, present and future’ in M. Gellerstam, J. Jarborg, S.G. Malmgren, K. Noren, L. Rogström and C.R. Papmehl (eds), EURALEX ‘96 Proceedings, Göteborg: Department of Swedish, Göteborg University, 515–46. Atkins, B.T.S. and M. Rundell (2008), The Oxford Guide to Practical Lexicography, Oxford: Oxford University Press. Barnbrook, G. (2013), ‘A sense of belonging: Possessives in dictionary definitions’, International Journal of Lexicography 26 (1), 2–22. Bogaards, P. (1998), ‘Scanning long entries in learner’s dictionaries’ in T. Fontenelle, P. Hiligsmann, A. Michiels, A. Moulin and S. Theissen (eds), EURALEX ‘98 Actes/Proceedings, Liege: Université Départements d’Anglais et de Néerlandais, 555–63.
Burkhanov, I. (1997), ‘On the correlation between lexicology, linguistic semantics and lexicography’, Zeszyty Naukowe Wyższej Szkoły Pedagogicznej w Rzeszowie. Seria Filologiczna. Językoznawstwo 4 (26), 55–73. Corino, E., C. Marello and C. Onesti (eds) (2006), Atti del XII Congresso di Lessicografia, Torino, 6-9 settembre 2006, Allessandria: Edizioni dell’Orso. Cowie, A.P. (1999), English Dictionaries for Foreign Learners: A History, Oxford: Clarendon Press. Dziemianko, A. (2017), ‘Dictionary entries and bathtubs: Does it make sense?’ International Journal of Lexicography 30 (3), 263–84. Dziemianko, A. and R. Lew (2006), ‘When you are explaining the meaning of a word: The effect of abstract noun definition format on syntactic class identification’ in E. Corino et al. (eds), 857–63. Fabiszewski-Jaworski, M. (2011), ‘Spontaneous defining by native speakers of English’ in K. Akasu and S. Uchida (eds), ASIALEX2011 Proceedings Lexicography: Theoretical and Practical Perspectives, Kyoto: Asian Association for Lexicography, 102–9. Fraser, B.L. (2008), ‘Beyond definition: Organising semantic information in bilingual dictionaries’, International Journal of Lexicography 21 (1), 69–93. Frawley, W. (1989), ‘The dictionary as text’, International Journal of Lexicography 2 (3), 231–48. Fuertes-Olivera, P.A. and A. Arribas-Baño (2008), Pedagogical Specialised Lexicography: The Representation of Meaning in English and Spanish Business Dictionaries (Terminology and Lexicography Research and Practice 11), Amsterdam: John Benjamins. Hanks, P. (1987), ‘Definitions and explanations’ in J. Sinclair (ed.), 116–36. Hanks, P. (2000), ‘Do word meanings exist?’, Computers and the Humanities 34 (1–2), 205–15. Hanks, P. (2009), ‘Review of Stephen J. Perrault (ed.) (2008), Merriam-Webster’s Advanced Learner’s English Dictionary’, International Journal of Lexicography 22 (3), 301–15. Hanks, P. and J. Pearsall (eds) (1998), New Oxford Dictionary of English, Oxford: Oxford University Press. 
Hartmann, R.R.K. (1985), ‘Contrastive text analysis and the search for equivalence in the bilingual dictionary’ in K. Hyldegaard-Jensen and A. Zettersten (eds), Symposium on Lexicography II. Proceedings of the Second International Symposium on Lexicography, May 1984, at the University of Copenhagen (Lexicographica. Series Maior 5), Tübingen: Max Niemeyer, 121–32. Hartmann, R.R.K. (1994), ‘Bilingualised versions of learners’ dictionaries’, Fremdsprachen Lehren und Lernen 23, 206–20. Hiorth, F. (1954), ‘Arrangement of meanings in lexicography: Purpose, disposition and general remarks’, Lingua 4, 413–24. Jarošová, A. (2000), ‘Problems of semantic subdivisions in bilingual dictionary entries’, International Journal of Lexicography 13 (1), 12–28. Kilgarriff, A. (1997), ‘I don’t believe in word senses’, Computers and the Humanities 31 (2), 91–113. Kipfer, B.A. (1984), Workbook on Lexicography: A Course for Dictionary Users, Exeter: University of Exeter. Lew, R. (2004), Which Dictionary for whom? Receptive Use of Bilingual, Monolingual and Semibilingual Dictionaries by Polish Learners of English, Poznań: Motivex. Lew, R. (2009), ‘Towards variable function-dependent sense ordering in future dictionaries’ in H. Bergenholtz, S. Nielsen and S. Tarp (eds), Lexicography at a crossroads: Dictionaries and Encyclopedias Today, Lexicographical Tools Tomorrow (Linguistic insights – studies in language and communication, Vol. 90), Bern: Peter Lang, 237–64. Lew, R. (2010), ‘Users take shortcuts: Navigating dictionary entries’ in A. Dykstra and T. Schoonheim (eds), Proceedings of the XIV Euralex International Congress, Ljouwert: Afûk, 1121–32. Lew, R. and A. Adamska-Sałaciak (2015), ‘A case for bilingual learners’ dictionaries’, ELT Journal 69 (1), 47–57. Lew, R. and A. Dziemianko (2006a), ‘A new type of folk-inspired definition in English monolingual learners’ dictionaries and its usefulness for conveying syntactic information’, International Journal of Lexicography 19 (3), 225–42.
Lew, R. and A. Dziemianko (2006b), ‘Non-standard dictionary definitions: What they cannot tell native speakers of Polish’, Cadernos de Traduçao 18, 275–94. Lew, R. and A. Dziemianko (2012), ‘Single-clause when-definitions: Take three’ in Proceedings of 15th EURALEX International Congress, Oslo, 7–11 August, 2012, Oslo: Oslo University, 997–1002. Lew, R., M. Grzelak and M. Leszkowicz (2013), ‘How dictionary users choose senses in bilingual dictionary entries: An eye-tracking study’, Lexikos 23, 228–54. Lew, R. and J. Pajkowska (2007), ‘The effect of signposts on access speed and lookup task success in long and short entries’, Horizontes de Lingüística Aplicada 6 (2), 235–52. Lew, R. and P. Tokarek (2010), ‘Entry menus in bilingual electronic dictionaries’ in S. Granger and M. Paquot (eds), eLexicography in the 21st Century: New Challenges, New Applications, Louvain-laNeuve: Cahiers du CENTAL, 193–202. Manley, J., J.R. Jacobsen and V.H. Pedersen (1988), ‘Telling lies efficiently: Terminology and the microstructure in the bilingual dictionary’ in K. Hyldgaard-Jensen and A. Zettersten (eds), Symposium on Lexicography III, Tübingen: Max Niemeyer, 281–301. McCreary, D.R. (2008), ‘Looking up “hard words” for a production test: A comparative study of the NOAD, MEDAL, AHD, and MW Collegiate Dictionaries’ in E. Bernal and J. DeCesaris (eds), Proceedings of the XIII EURALEX International Congress, Barcelona: Universitat Pompeu Fabra, 1287–93. McCreary, D.R. (2010), ‘Three collegiate dictionaries: A comparison of reading comprehension test scores for university students using MWCD11, AHD4, and NOAD2’ in I. Kernerman and P. Bogaards (eds), English Learners’ Dictionaries at the DSNA 2009, Tel Aviv: K Dictionaries, 55–74. McCreary, D.R. and E. Amacker (2006), ‘Experimental research on college students’ usage of two dictionaries: A comparison of the Merriam-Webster Collegiate Dictionary and the Macmillan English Dictionary for Advanced Learners’ in E. Corino et al. (eds), 871–85. 
Mish, F.C. (ed.) (2004), Merriam-Webster’s Collegiate Dictionary, Eleventh edition, Springfield, Massachusetts: Merriam-Webster Incorporated. Nesi, H. and K.H. Tan (2011), ‘The effect of menus and signposting on the speed and accuracy of sense selection’, International Journal of Lexicography 24 (1), 79–96. Nowakowski, M. (1990), ‘Metaphysics of the dictionary versus the lexicon’ in J. Tomaszczyk and B. Lewandowska-Tomaszczyk (eds), Meaning and Lexicography, Amsterdam: John Benjamins, 5–19. Osselton, N.E. (2007), ‘Innovation and continuity in English learners’ dictionaries: The single-clause when-definition’, International Journal of Lexicography 20 (4), 393–9. Piotrowski, T. (1994), Problems in Bilingual Lexicography, Wrocław: Wydawnictwo Uniwersytetu Wrocławskiego. Rundell, M. (1999), ‘Dictionary use in production’, International Journal of Lexicography 12 (1), 35–53. Rundell, M. (2006), ‘More than one way to skin a cat: Why full-sentence definitions have not been universally adopted’ in E. Corino et al. (eds), 323–37. Shcherba, L.V. (1995), ‘Towards a general theory of lexicography’, International Journal of Lexicography 8 (4), 314–50. Sinclair, J. (ed.) (1987), Looking up: An Account of the COBUILD Project in Lexical Computing, London and Glasgow: Collins. Sinclair, J. and P. Hanks (1987), Collins COBUILD English Language Dictionary (COBUILD1), London and Glasgow: Collins. Stein, G. (2011), ‘The linking of lemma to gloss in Elyot’s Dictionary (1538)’ in O. Timofeeva and T. Säily (eds), Words in Dictionaries and History. Essays in Honour of R.W. McConchie, Amsterdam: John Benjamins, 55–79. Tono, Y. (1984), On the Dictionary User’s Reference Skills, B.Ed. Thesis, Tokyo Gakugei University. Tono, Y. (1992), ‘The effect of menus on EFL learners’ look-up processes’, Lexikos 2, 230–53. Tono, Y. (1997), ‘Guide Word or Signpost? An experimental study on the effect of meaning access indexes in EFL learners’ dictionaries’, English Studies 28, 55–77. Tono, Y. 
(2001), Research on Dictionary Use in the Context of Foreign Language Learning: Focus on Reading Comprehension (Lexicographica. Series Maior 106), Tübingen: Max Niemeyer.
Van der Meer, G. (2004), ‘On defining: Polysemy, core meanings, and “great simplicity”’ in G. Williams and S. Vessier (eds), Proceedings of the Eleventh EURALEX International Congress, EURALEX 2004, Lorient, France, July 6–10, 2004, Vol. 2, Lorient: Université de Bretagne Sud, 807–15. Van der Meer, G. and R. Sansome (2001), ‘OALD6 in a linguistic and a language teaching perspective’, International Journal of Lexicography 14 (4), 283–306. Wojciechowska, S. (2012), Conceptual Metonymy and Lexicographic Representation, Frankfurt: Peter Lang.
16
A theory of lexicography – is there one? Tadeusz Piotrowski
1 Introduction This chapter will above all clarify methodological and notional issues encountered in discussions of theories of lexicography – first, it will answer the question whether there are theories of lexicography, second, it will discuss what their components are and, third, it will touch on their validity. In what follows I will treat such expressions as ‘theory of lexicography’, ‘theoretical lexicography’ or ‘metalexicography’ as synonymous. As usual, there are idiosyncratic uses of such terms. Hüllen (1999), for example, makes a contrast between lexicography, which for him denotes a theory, and dictionary-making, which is the name for practice. For Wiegand (1984, 2010) metalexicography is a general term and encompasses a theory of lexicography. There is an enormous body of literature on theory in general, its components and status in research, above all in the sciences; quite a lot of that is strictly philosophical and belongs to the philosophy of science (cf. e.g. Psillos and Curd (eds) 2008). This chapter also applies notions from the philosophy of science to lexicographic theories. I will be as non-technical as possible, which also means that the analysis will be fairly superficial. Unless noted otherwise, the type of dictionary that I will implicitly refer to will be the general commercial monolingual dictionary, or a dictionary for general users in terms of Béjoint (2015), which is, or rather was, the prototypical dictionary in the English-speaking countries, and the lexicographic activities that aim at producing commercial monolingual and bilingual dictionaries will be called practical lexicography or, simply, practice. While it is still convenient to discuss lexicography using a well-known object, a dictionary, we have to note that a dictionary as a distinct object – printed or electronic – is disappearing (cf. the conclusions in Kosem et al. 2019). 
However, lexicographic data – data with information on linguistic items – is used in many applications: basically, any Internet browser that suggests standard linguistic forms of inputted items has this function. Research dictionaries, products of academic lexicography, though in many aspects similar to commercial lexicographic reference works, have a number of qualities which will be outside the scope of this chapter. Above all, they do not have to be immediately understandable to their users, who are expected to learn the theory behind these descriptions (cf. Geeraerts 2015), and they are not expected to earn money.
What is a theory? In the literature on lexicography in the English language one can find quite strong statements against theories. The following quotation is fairly typical: ‘Lexicography is above all a craft, the craft of preparing dictionaries … A science has a theory, a craft does not … how can there be a theory of the production of artefacts? there is no theory of lexicography’ (the arrangement of quotes does not follow their order in the book). These authoritative statements were written by Henri Béjoint in his recent book (2010: 381) and are still repeated (cf. Chishman and de Schryver 2019). Unfortunately, such broad generalizations can be falsified fairly easily. First, let us sort out terminological issues. ‘An artefact may be defined as an object that has been intentionally made or produced for a certain purpose’ (Hilpinen 2011); a dictionary certainly is an artefact, as is a violin, or a bridge. However, in contrast to what Béjoint writes, theories of artefacts do exist. Hilpinen (2011) refers to some of them, for example, to Margolis and Laurence (2007). There are also books which refer to theories of artefact production, for example, to bridge design (Zhao and Tonias 2012). Though based on findings of sciences, bridge design is not a science, it is a craft. And yet it has a theory and a philosophy. One can wonder further why crafts that produce strictly linguistic objects, such as dictionaries, and other texts, are supposed to lack theories. There is an ancient craft that does produce linguistic objects, a craft that very likely provided a stimulus towards production of some of the first dictionaries that we know from Sumer. The craft is called translation. Translation is as venerable as lexicography, like dictionary-making it is also both a craft and an art. Yet there is no dearth of theories of translation (cf. e.g. Gentzler 2001, Pym 2010). 
There are numerous analogies between translation and lexicography, not only when equivalence in bilingual dictionaries is involved, and I will refer to those analogies in what follows. What Pym has to say about theories in translation can be very easily applied to lexicography. Translators, like lexicographers, are said to need no theories in their work. But Pym (2010: 2) says: Translators are theorizing all the time … whenever they decide to opt for one rendition and not others, they bring into play a series of ideas about what translation is and how it should be carried out. They are theorizing … A theory sets the scene where the generation and selection process takes place. Translators are thus constantly theorizing as part of the regular practice of translating. While working on a dictionary, lexicographers do decide between several possible solutions. In their decision-making they usually rely on previous models of lexicographic practice, i.e. on previous theories of what lexicography is, even though the theories are quite often implicit. It is fair to conclude then that, like translators, lexicographers are theorizing all the time, we can add also that they often follow implicit theories from the past. The main trouble, it seems, in deciding whether some ideas or generalizations form a theory or not is with the meaning of the word ‘theory’. It is rather puzzling that most often those who write about theoretical lexicography do not define what a theory is. The reluctance of researchers to define what they mean by theoretical lexicography was noted by Wiegand as early as 1989. Béjoint certainly does not define it, though it is rather clear that he refers to scientific theories, i.e. those formulated within natural sciences. Few people would agree that lexicography is a 268
branch of natural science; therefore, perhaps the concepts of scientific theories should not be applied to lexicography. Since the nineteenth century the humanities have been said to follow different methodologies: the humanities are about interpretation or understanding of the world (Mantzavinos 2020). In contemporary lexicography or corpus linguistics computers handle linguistic forms far more efficiently than human beings do; however, it is human lexicographers who can interpret the forms or understand their significance.

For a description of the meaning of the word ‘theory’ it is only natural to turn to a dictionary. A more extended discussion of the concept of theory in lexicography can be found in Adamska-Sałaciak (2019) and Tarp (2012).

theory 1 a system of rules, procedures, and assumptions used to produce a result … 5 a set of hypotheses related by logical or mathematical arguments to explain and predict a wide variety of connected phenomena in general terms ⇒ the theory of relativity (http://www.collinsdictionary.com/dictionary/english/theory)

Sense 5 is even clearer in the American Collins Dictionary (actually Webster’s New College), while sense 1 is less clear there, and I will not quote it:

theory 5 that branch of an art or science consisting in a knowledge of its principles and methods rather than in its practice; pure, as opposed to applied, science, etc. (https://www.collinsdictionary.com/dictionary/english/theory)

It follows then that there are two distinct senses of the word ‘theory’: (1) ‘a study of principles and methods (of an art or science)’; (2) ‘a cohesive set of hypotheses’. The first sense is usually contrasted with the word ‘practice’, and it is this sense that seems to be used most often (though, to repeat, undefined) with reference to lexicography, as we can see in the typical title of a book on lexicography: A Handbook of Lexicography. The Theory and Practice of Dictionary-Making (Svensén 2009).
Svensén does not say what he means by the word ‘theory’, either. However, sometimes the word ‘theory’ can be used intentionally in an ambiguous way, as a metaphor, for example, by Rundell (2012); I take it that for him theory is any recommendation for lexicographers that does not result from a study of empirical data. In their influential book Atkins and Rundell (2008: 9) say ‘is this absence of theory such a bad thing? It may make more sense to think in terms of the principles that guide lexicographers in their work.’ In the two sentences they first use the word ‘theory’ in sense 2 – though, as usual, they do not define it, then they use the term ‘principles’ that is synonymous, as we have seen, with ‘theory’ in sense 1. Atkins and Rundell describe principles of lexicographers’ work in their book, so, logically, they wrote a theory of lexicography. What Atkins (1992/93: 7) wrote in an earlier paper about the aims of theoretical lexicography is very similar to Pym’s description of the theorizing of a translator: Every editor of every new dictionary must make decisions on how to manage every one of … aspects of lexicography, and more. Theoretical lexicography must
provide a theoretically sound, yet practical, basis for such decision-making. To do so requires awareness of these points in the dictionary design process where there is editorial choice and those where there is none. In this paper she did define theoretical lexicography; it is ‘a body of theory related to lexicography’ (Atkins 1992/93: 4), which cannot be said to be very precise, being tautological: a theory is a theory. However, on one understanding of the word ‘theory’ Sue Atkins produces a theory of lexicography. We may therefore conclude that whenever we have a description of the principles of lexicographers’ work – hopefully systematic and coherent – we have a theory of lexicography. From this point of view lexicography, like translation, has a number of theories, though these theories are unlike ordered sets of hypotheses about empirical data that make predictions about the future that can be found in the natural sciences. I will return to this point later.
3 Theories of lexicography

In general, to discuss lexicography one needs relevant tools – terms, concepts, methods. When these tools are brought together in statements that make a coherent text, we are dealing with a theory. In terms of their scope, theories of lexicography can be arranged in a hierarchy, and the word ‘theory’ can be treated as a homograph/homophone. First, style guides are theoretical statements that describe how to write a specific dictionary (theory1); second, one level up there would be those theories (theory2) that discuss lexicographic principles in a number of dictionaries (a lexicographic genre) from a practical point of view, i.e. concentrating on those that the authors think are most important in practice, for example, because the dictionaries they describe are frequent enough. This is the case with Atkins and Rundell’s book on lexicography, where they discussed monolingual and bilingual commercial dictionaries. However, a lexicographer who wants to write a dictionary of synonyms, or a terminological dictionary, would not find their book very helpful. Theory2 is a set of statements which refer to more than one dictionary. Theory1 is strictly prescriptive; it tells lexicographers what they are to do when producing a dictionary. And it has to be. Theory2 is not as prescriptive as theory1: it shows what choices a lexicographer can encounter when writing a dictionary. Finally, theory3 would be a theory general enough to cover all dictionaries and all aspects of lexicography, or at least many of them. It would be above all descriptive; it would be there to interpret the basic principles of lexicography. Tarp (2008: 9) calls, somewhat tautologically, general theories ‘general (sic!) summarizing statements about lexicography’, while specific theories are for him statements about sub-areas of lexicography. He also presents (p.
11) a diagram that shows the relationship between practice and theory, which can be interpreted using the hierarchy in the monumental publication Wörterbücher. Dictionaries. Dictionnaires. An International Encyclopedia of Lexicography. Hausmann et al. (1989–91) does include theoretical articles that describe most aspects of lexicography. Unfortunately, they are formulated from various points of view and do not present a unified description, even though in strictly theoretical sections it is Herbert Ernst Wiegand’s articles that predominate. The main ideas from the Encyclopedia are presented far more coherently in shorter theoretical sections added to the multilingual dictionary of lexicographic terms (Wiegand et al. 2010).
Wiegand’s work is the most ambitious and comprehensive general theory3 of lexicography that I know (cf. a list of his publications http://www.gs.uni-heidelberg.de/personen/wiegand.html). One has to note, however, that his theory is concerned with printed dictionaries; he acknowledges this explicitly in the titles of his more recent publications (Wiegand 2013). A more recent attempt is the theory of lexicographic functions by Tarp (2008), which I am going to discuss later on. There is one important problem with most general theories that I know. Briefly, the theories do not go deep enough. They are often systematic accounts of lexicography, but they are not critical enough. Wiegand essentially engages in never-ending reclassifications of the objects that his theory distinguishes; Tarp is not concerned with empirical facts: his ‘data is, quite literally, invented’ (de Schryver 2012: 495). Wierzbicka (1985: 5) says that ‘even the best lexicographers, when pressed, can never explain what they are doing, or why’. This statement can be interpreted as saying that lexicographers work according to hidden assumptions, and a theory should uncover them (cf. also various views on this issue, for example, Richard Hudson’s, in Béjoint (2010: 346–7)). One notable exception could be the theory behind the COBUILD project, which challenged many cherished tenets of lexicography (Sinclair 1987), in particular those of pedagogical lexicography. One should also mention Patrick Hanks’s work (2013). If lexicography is to change, then those assumptions have to be described, because for centuries dictionaries have been compiled on their basis, and the environment in which dictionaries are used, their form and function are changing rapidly now. We also have different views on the nature of languages than before, and we think we have more adequate theories about the lexicon now, the chief being that an isolated word is not the principal unit of the lexicon (cf. Hanks 2013).
That means that the early assumptions on which dictionaries were created are no longer valid. But this, for example, is not reflected in Tarp’s theoretical statements; he is concerned with uses of lexicographic products and data, no matter what their quality is. Dictionaries are cultural products, and ‘theory is a critique of common sense, of concepts taken as natural’; it is ‘the demonstration that what has been thought or declared natural is in fact a historical, cultural product’ (Culler 2000: 14). The assumptions that most dictionaries until recently were based on are precisely historical, cultural products, and it is the function of a theory to identify them. When Werner Hüllen says that behind early dictionaries one can see ‘such essential assumptions as “words are self-contained semantic entities of a language” or “words as names identify objects in the world”’ (Hüllen 1999: 4), then it is clear not only that numerous dictionaries are based on them (e.g. the Merriam-Webster dictionaries), but that we can see them deep down in the theoretical statements about lexicography, as, for example, in the Tarp book from 2008 (cf. Piotrowski 2009). Patrick Hanks (2000: 13) is certainly right when he objects to the view that words are entities, objects, saying that ‘Treating meanings as events rather than objects yields a more satisfactory explanation of the dynamic nature of language than treating them as objects’. He tacitly identifies one of the most important hidden assumptions in lexicography, and, indeed, in Western culture in general, in which words are identified with objects called concepts, notions, etc. The question is obviously how to describe events in a dictionary. One might think that the electronic medium will make it possible, as the lexicography of the future will be computational lexicography.
In another paper Hanks suggests that ‘a major task for computational lexicography will be to identify meaning components … they may at heart be quite simple structures: much simpler, in
fact, than anything found in a standard dictionary. But different’ (Hanks 2000/2008: 134). As far as I know, this has not been accomplished.
4 Lexicographic theory and theories in science

The authors I have cited, like Béjoint, seem to think that if lexicographical theory is not like the theories in the sciences, then there is no theory at all for lexicography. Perhaps it was Wiegand who first very authoritatively stated that, and he has been followed since:

Lexicography was never a science, it is not a science, and it will probably not become a science. Scientific activities as a whole are aimed at producing theories, and precisely this is not true of lexicographical activities. We must bear in mind that writing on lexicography is part of meta-lexicography and that the theory of lexicography is not part of lexicography. (Wiegand 1984: 13)

It has to be remembered that for Wiegand ‘science’ very likely means research activities in general, as the German word Wissenschaft ‘science’, unlike its English counterpart, is not associated primarily with the natural sciences (cf. also Adamska-Sałaciak 2019). Wiegand very carefully distinguishes between lexicography, the practice itself, and meta-lexicography, research into the practice, and, he suggests, lexicography produces dictionaries, not theories, while meta-lexicography does not produce dictionaries but general statements about them. Accordingly, for him meta-lexicography can be a science, while lexicography is not. This is a crucial distinction, which is often obscured. This preoccupation with the scientific nature of lexicography and meta-lexicography echoes the endless discussions of whether linguistics is a science (cf. Clark 2006), or, even more generally, whether the humanities can be like the sciences, with the natural sciences and their methods, above all physics, being models, paragons, for any scholarly activity (cf. the brief historical discussion in Mantzavinos (2020) about whether there should be one method for the sciences and the humanities or not).
However, it is clear that what lexicographers do does not differ, in essence, very much from what descriptive linguists do (cf. also Hanks 2000), or, in general, what scientists do (cf. Mantzavinos 2014, 2020). Lexicographers study – observe – linguistic data, most often with preconceptions, with intuitions about lexical units, which we might call naïve hypotheses. While working on a dictionary they form generalizations on the basis of the data, which quite often clash with these hypotheses. These generalizations are recorded in dictionary entries, and are, above all, classifications of facts of language. One well-known classification is that into word classes, such as nouns and verbs. But so-called descriptions of meaning also result from lexicographers’ classifications of their material. Lexicographers group together contexts (quotations) with the item they want to describe; in each group the word makes the same, or a similar, contribution to the meaning of the context. This is what Patrick Hanks (2000: 12) calls ‘unique contributions of words to the meaning of sentences in which they occur’. This contribution is then given a label, which we call a definition, and recorded in a dictionary entry. If descriptive linguistics is scientific, then obviously lexicography is also scientific in the same way. What is different is that linguists are there to discover the truth, to use a pompous phrase, to
discover something about reality. They do not care about uses of that truth, i.e. whether the users of the description will actually understand it or need it for practical purposes. Lexicographers are there to discover the truth AND to present their findings in such a way that they will be practical for a specific type of user. Thus, the difference between lexicography and linguistics primarily lies in the use of a certain method of description and presentation of data in dictionaries, which, in turn, is usually chosen because of user needs. I used the expression ‘hypotheses’ above. For practising lexicographers they are their own intuitions, convictions about a word, products of their life with their language. However, they are also often reflexes of their schooling (these should be described by theory3). Working on a dictionary, they quite often find that their intuitions are too individual – they have been formed in the course of their own personal experience, and that their convictions do not agree with what they find in the empirical data, the texts. Therefore, they change the first draft of the explanation of meaning, to allow for what can be generalized from the data. The finished entry is also a hypothesis, because it is impossible for the lexicographer to take account of all texts. Another lexicographer, studying a different set of contexts, may come to a different conclusion. Hypotheses are also an important initial stage for any scholarly research which aims at producing a theory. In fact, scientists work like lexicographers: ‘Text interpretation can be conceptualized and practiced like every other scientific activity’ (Mantzavinos 2014: 57); the difference is in the methodological rigour which scientists use to test the hypothesis and to present the results, showing how well the tested hypothesis fits the empirical data, i.e. whether it is confirmed or refuted. 
We do not find this rigour in a dictionary; we tend to trust the authority and integrity of the lexicographer. If we turn to theories of lexicography (theory2) we also find numerous hypotheses in them, even if this word is not used. Any sentence that formulates a general statement about the probability of occurrence of a feature, such as ‘in the majority of the dictionaries this can be found’, ‘most dictionary entries include …’, etc., is in fact a hypothesis, which, when rephrased more precisely, could easily be tested empirically; that is, such statements are in essence falsifiable. For example, instead of saying ‘most dictionaries’ we can say ‘8 dictionaries out of 10 exhibit a certain feature’. This is just as scientific as saying that ‘We detected 26 avian species in the agricultural conservation buffers at the study site during the 2007-2009 breeding seasons’ (Adams, Burger and Riffell 2019). I will return to this comparison with ornithology later on. It is unfair, thus, to say that a theory of lexicography does not put forward falsifiable hypotheses (cf. Wiegand 1989: 261). It does, though from a scientist’s point of view they are not formulated precisely, nor tested properly on empirical data. What I usually find in theories of lexicography is discussion of various theoretical principles on the basis of one or two model dictionaries. This is not good empirical evidence on which to form general conclusions. One sub-field of theoretical lexicography, research into user behaviour, which started off as informal surveys (cf. Adamska-Sałaciak 2019), now uses rigorous empirical methods, though Tarp (2009) makes valid critical remarks on the choice of samples, i.e. groups of users, studied.
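The rephrasing of ‘most dictionaries’ as ‘8 dictionaries out of 10’ can be made concrete in a few lines of code. What follows is only a minimal sketch, not a procedure taken from the literature cited here: it assumes a hypothetical sample of ten dictionaries, each coded by hand for the presence of a feature, and it reads ‘most’ as ‘more than half’. The function names and the sample itself are illustrative assumptions.

```python
from math import comb

def proportion(observations):
    """Share of sampled dictionaries that exhibit the feature."""
    return sum(observations) / len(observations)

def binomial_tail(successes, n, p=0.5):
    """Exact probability of at least `successes` hits in n trials
    when each hit has probability p (one-sided binomial tail)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(successes, n + 1))

# Hypothetical coding of ten dictionaries (True = feature present);
# invented for illustration, not data from any study.
sample = [True] * 8 + [False] * 2

share = proportion(sample)
p_value = binomial_tail(sum(sample), len(sample))
print(f"{sum(sample)} of {len(sample)} dictionaries ({share:.0%}); "
      f"chance of a result this strong if only half had the feature: {p_value:.3f}")
```

With this invented sample the tail probability is 56/1024, roughly 0.055, so even ‘8 out of 10’ is weak evidence when only ten dictionaries are examined. This fits the point made above about generalizing from one or two model dictionaries.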
In the light of the above discussion it is no wonder that a historian of dictionaries and lexicography, Werner Hüllen, has no doubts about the status of theory in lexicography, when he says that ‘during the nineteenth and twentieth centuries, [it] has developed into the analytical, fully grown, hypothesis-driven science we have today’ (Hüllen 1999: 4). Wiegand (1989, cf. also Tarp 2008: 7–11), on the other hand, remains sceptical about it, and would be most happy
if the word ‘theory’ were not used at all. Obviously, we can use any other label, for example, ‘metalexicography’. There are theories of lexicography, they can be even said to be scientific, that describe dictionaries. Do those theories contain explanations, i.e. do they answer the question why certain methods have been used? Explanation is usually contrasted with ‘mere’ description of empirical facts in the sciences. This contrast will be better seen from an example: all of the accounts of scientific explanation described below would agree that an account of the appearance of a particular species of bird of the sort found in a bird guidebook is, however accurate, not an explanation of anything of interest to biologists (e.g. the development, characteristic features, or behavior of that species). Instead, such an account is ‘merely descriptive’. (Woodward 2011) Dictionaries are more like bird guidebooks, and so are their theories. Theories of lexicography (theory1 and theory2), and, indeed, histories of dictionaries as well, usually do not explain lexicographic products, they are systematic accounts of dictionaries. In contrast, theoreticians of translation are not content with descriptions of similarities and differences between the original and the translation, though in many theories of translation we do find catalogues of such discrepancies. Theoreticians of translation would like to explain, however, why a translator used a certain strategy; they want to interpret translators’ work. At least some of them do. Those lexicographic theories, theory1 and theory2, that are intended to be practical have a strong prescriptive bent, i.e. the metalexicographer suggests that a practitioner should use this or that method or structure. This prescriptivism is a very important point of difference between theories in the sciences and in lexicography. The former are not prescriptive. 
Occasionally these prescriptive remarks about dictionaries are linked to some purported needs of the user, which are rarely given any empirical grounding, simply following some assumptions derived from logical criteria, such as the level of proficiency. Or they are supported by some reference to linguistic theories, though in an extremely heterogeneous way, and one can suppose that it is the other way round, that it is traditional methods of lexicography that are given some support from linguistics in these books. For example, most theoretical descriptions of equivalents in a bilingual dictionary use the early structural approach from contrastive linguistics, which distinguishes full equivalents, partial equivalents and zero equivalents (cf. e.g. Svensén 2009: 253–75), based on their semantic overlap. Let’s leave aside the disturbing fact that usually for structural linguists the meaning of items in one language cannot be easily compared to the meaning of equivalent items in another language, if at all, because meaning is the function of an item in the language. Strange as it might seem, there is strong opposition to the use of the notion of equivalence in contemporary translation studies, because the very concept is seen to be confusing rather than enlightening (cf. Pym 2010). To defend the traditional approach in lexicography, Adamska-Sałaciak (2010: 403) says that in bilingual dictionaries ‘what matters is, on the one hand, the expectation of equivalence on the part of the dictionary user and, on the other, the lexicographer’s intention to meet this expectation’ (see also Adamska-Sałaciak, Chapter 12). Thus, a lexicographer provides the naïve users with what they falsely believe are the building blocks of translation, i.e. words with their equivalents in the other language. Hartmann’s comment is even more telling; he says
that the bilingual dictionary is ‘a repository of the collective equations established by generations of “translating lexicographers”’ (Hartmann 2005: 18). Should the user, however, be very much interested in the history of ‘collective equations’? And should a theoretician defend the status quo of lexicography, or should he or she rather suggest ways of overcoming folk beliefs, such as those described by Hüllen, entombed in a dictionary? Judging by what we are witnessing now, computer technology can help overcome the folk beliefs about equivalence. The very popular machine translation services (Google, Bing, Yandex, DeepL, etc.) can be used to translate the typical texts that a person who would use a bilingual dictionary usually has problems with; the number of instances of their use is simply staggering. What is usually translated is chunks of text, rather than isolated words. Even for isolated words (e.g. in a menu in a restaurant) such services as Google ‘know’ the context, because they know where the person is (cf. Simonsen 2014). Are not bilingual dictionaries doomed?
5 Components in a theory of lexicography

Before starting discussion of components of a theory there is an important distinction that has to be made. Namely, should a theory of lexicography discuss only those specific methods whose results can be called structures of dictionaries, and the structures themselves, i.e. lexicographic form, or should it also discuss what those structures contain, i.e. lexicographic content? Dictionary structures are extremely abstract, or, to use another word, empty, to be filled in by relevant data, and the same structures are used again and again in a dictionary, chained (recursively), or one within the other, embedded. This difference between form and content is common knowledge for all those who encode a dictionary for computer processing, to be used by human users or by computers. Normally in a printed dictionary both lexicographic form and linguistic content are merged together; in contrast, in a computer-encoded dictionary the two are kept apart, and the computer merges them for the user on display or when printing. Here is a typical example of an entry, without content, with explanations. The lines _______ indicate empty slots.

start of entry
_______ headword
_______ hyphenation
_______ pronunciation
_______ part-of-speech label
_______ definition
The structure was taken from Guidelines P5 of the Text Encoding Initiative (TEI), in which Chapter 9 is a description of the structure of dictionaries for the purpose of computer encoding (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DI.html, and cf. Bański, Bowers and
Erjavec 2017). This encoding, however, was designed for printed dictionaries and was to ensure that data could be exchanged between various digitized dictionaries. This abstract form could correspond to a number of different entries, for example to

coracle cor·a·cle \`kȯr-ə-kəl, `kär-\ noun : a small boat used in Britain from ancient times and made of a frame (as of wicker) covered usually with hide or tarpaulin

or to

brunch \`brənch\ noun : a meal usually taken late in the morning that combines a late breakfast and an early lunch (adapted from merriam-webster.com/dictionary)

In the latter example the slot ‘hyphenation’ was left empty, therefore it is not shown in the final stage. In fact, it is perfectly possible to discuss a dictionary structure not using specific items, but discussing which classes of items go with which structural elements, and this is what Wiegand was engaged in. It is the use of abstract information structures into which meaningful elements can be inserted that distinguishes lexicography from other types of information description, and which makes dictionaries like databases. It is also the difficulty of fitting descriptions of linguistic facts into those structures that makes lexicography so difficult, so unnatural (cf. Bolinger 1985). It has to be stressed that this feature can be found not only in dictionaries that provide information on a language, but also in dictionaries that provide information on objects; the latter type of reference work is often called the encyclopaedia, the former the dictionary, and there is a continuum from purely linguistic dictionaries to encyclopaedic dictionaries or encyclopaedias. It is obvious that the lexicographic form in both dictionaries and encyclopaedias is the same, i.e. they both share the same conventions, though linguistic dictionaries are typically far more complex structurally, using recursion and embedding of lexicographic structures.
Both encyclopaedias and dictionaries are reference works of the same type (McArthur 1998/2003, Tarp 2011: 56–7), and any fully fledged theory of lexicography (theory3) should take this into account. The TEI guidelines are designed for computers, and today lexicographic form is ultimately meant to be used by computers. It is, thus, technology that drives what lexicographic form is and will be – but we know this very well from the history of dictionary-making. However, lexicographic data is less and less often displayed in traditional dictionary-like presentations and is instead added to other services.

Does the content also belong to lexicography? In linguistic dictionaries this content is a description of facts of a language, therefore quite often it is thought that lexicography is a branch of linguistics. The content, however, can be a description of facts from various fields of knowledge, and it is not suggested that an encyclopaedia of mathematics makes lexicography a branch of mathematics. As the two aspects, form and content, have different functions, it would be best if in a theory the names of the elements in each were independent of typographical (or any other) and linguistic conventions. The elements of dictionary structures should have their own names, and the items to be inserted into the structures should have their own, to avoid confusion between the two. This is, in fact, what the TEI guidelines are about.
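The separation of lexicographic form from content can be illustrated with a small sketch. It is not TEI and does not reproduce the element names of the Guidelines. The slot names follow the empty entry shown earlier; the names `Slot`, `ENTRY_FORM` and `render`, and the typographical conventions (backslashes around the pronunciation, a colon before the definition), are assumptions made purely for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slot:
    """One element of lexicographic form: a named, initially empty position."""
    name: str
    prefix: str = ""   # typographical convention, kept apart from content
    suffix: str = ""

# Form: the same abstract structure is reused for every entry.
ENTRY_FORM = (
    Slot("headword"),
    Slot("hyphenation"),
    Slot("pronunciation", prefix="\\", suffix="\\"),
    Slot("part_of_speech"),
    Slot("definition", prefix=": "),
)

def render(content: dict) -> str:
    """Merge content into the form; slots left empty are simply not shown."""
    parts = []
    for slot in ENTRY_FORM:
        value = content.get(slot.name)
        if value:
            parts.append(f"{slot.prefix}{value}{slot.suffix}")
    return " ".join(parts)

# Content: linguistic data stored separately from the structure that displays it.
brunch = {
    "headword": "brunch",
    "pronunciation": "`brənch",
    "part_of_speech": "noun",
    "definition": "a meal usually taken late in the morning that combines "
                  "a late breakfast and an early lunch",
}

print(render(brunch))
```

Because `brunch` supplies nothing for the hyphenation slot, that slot disappears from the rendered entry, as in the printed example. The sketch is deliberately flat: chaining and embedding of structures would require slots that themselves contain further forms.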
Therefore, there are two aspects that a theory of linguistic lexicography is to take account of, lexicographic form and linguistic content. However, as noted earlier, linguistic content is selected and prepared for insertion into lexicographic form from the point of view of the user. The user can be a human being, but also the computer. And this is the third dimension of a theory of lexicography. For most practical lexicographers a dictionary is a sort of message to the user, and, as in any communication act, the form and content of the message are adjusted to the interlocutor’s needs. These three aspects, or components, of the theory of lexicography can be called, using Morris’s terms somewhat metaphorically (Morris 1938, cf. Lyons 1977: 114–20), syntactic, when the focus is on form, semantic, with the focus on content, and pragmatic, the focus being the user. In many theoretical accounts of lexicography the three aspects were indeed distinguished, though perhaps not as explicitly (cf. Wiegand 1984, Geeraerts 1989, Bothma, Gouws and Prinsloo 2016). Recently it has been suggested that the pragmatic aspect of lexicography should be given absolute priority when compiling a new dictionary, within so-called function theory (Tarp 2008), which is a model of anticipated needs of users. Tarp’s theory runs in a nutshell as follows: a theorist describes the users’ needs without any empirical research into them, by intellectual speculation (empirical studies will only hamper the theorist), and then he/she designs lexicographic products that satisfy those hypothetical needs. However, it is clear that the theorist uses his/her own experience to describe the needs, because some empirical foundation has to be used. Are we certain that the experience of the theorist is that of other users of dictionaries? And can we really predict users’ needs?
A company like Apple solves the problem by creating the needs of its customers (their desire, in marketing lingo) and then producing objects to satisfy them; a good example is the computer tablet. It can do so because of the sheer power and inventiveness of its marketing division. Indeed, ‘Apple is one of the greatest marketers of all time’ (e.g. Moorman 2018), and Steve Jobs was a marketing person above all. The needs do not last long; we are just now witnessing the decline of the tablet, after about ten years of its life. Do dictionary makers have the same power? I do not think so. It is obvious that, for the time being, it is American companies like Google, Apple, Microsoft, Amazon and Facebook that define the needs of the users, offering their products in exchange for personal data (cf. Simonsen 2017, Kosem et al. 2019). We are talking about the mass public; there will usually be a place on the market for a couple of niche products. At present the lexicographic form (the syntactic aspect) is defined by IT specialists and lexicographers, uses of content (the pragmatic aspect) by IT companies and perhaps lexicographers, so it is content itself (the semantic aspect), i.e. lexicographic data, that is left to the lexicographers. At this point one can ask whether the description of a language in dictionaries is dependent on descriptions of languages in linguistics. Some people believe so, saying that lexicography is a branch of applied linguistics, like lexicology (whose status in English linguistics is unclear). A good recent discussion can be found in Tarp (2010), who himself thinks that lexicography, in particular its theory, is an independent field of study. Atkins and Rundell (2008) say that lexicographers should use linguistic studies in their descriptions of a language as inspiration, which means that the lexicographic description is different from the linguistic one.
The chief difference between lexicography and linguistics was imaginatively described by Dwight Bolinger, who says:
One potential actor is still waiting in the wings: the lexicographer. He is an unpretentious fellow, and perhaps that is why he has been there so long. If only he were not such a dilettante – insisting on looking at every wayward sense and anecdote, at frequencies and usages, at the whole of meaning instead of some theoretically important part. (Bolinger 1975: 224)
Bolinger hints at the fact that linguists use idealization, which is ‘a deliberate simplification of something complicated with the objective of making it more tractable’ (Frigg and Hartmann 2012). For example, they take account only of that part of the lexicon which falls under the scope of their theory. If we look, for example, at the work of Wierzbicka (e.g. Wierzbicka 1985), who writes a lot about lexicography, it is clear that she describes just a handful of words, and no lexicographer would do that. Wierzbicka’s descriptions are very detailed, while dictionaries include shallow descriptions of a large number of lexical items. In general, lexicology might be said to do intensive description of a small number of items, and lexicography extensive description of a large number of items (cf. Hanks 2006). Obviously, a lexicographer also idealizes the data, because it is not possible to describe all possible contextual uses of a word in a dictionary. However, lexicographic idealization is carried out from the point of view of a specific type of user. The benefits of the independent lexicographic approach to the public and to linguistics are obvious. It is not difficult today to collect huge masses of raw data, or even to process them using shallow methods, such as those described by Grefenstette (1998), who calls them approximate linguistics. Sketch Engine is a tool to process text on a shallow level (Kilgarriff et al. 2004 and Chapter 7); another is iWeb (Davies 2018–).
While those shallow methods do produce extremely valuable results, they are still just data to be interpreted, and the interpreter has to have the adequate skills, those of a lexicographer, translator or linguist. A dictionary has one advantage over such compilations of semi-processed data – it contains huge numbers of items with interpretations of all the items it includes, by highly competent individuals, on the basis of the current norms of the society they live in. Whatever the quality of the interpretation, its one advantage is that it is there, in the dictionary. Davies’s iWeb is an invaluable resource for professionals; however, when interpretation of the data it produces is required, the user is directed to the Merriam-Webster Dictionary website (Merriam-Webster.com Dictionary). When we think about billions of users, we think, for example, about Microsoft, Facebook or Google. While Google are very competent at processing linguistic forms, when they need an interpretation of them they also access a traditional dictionary; for English it is The Oxford Dictionary of English (Soanes 2010); the same dictionary is used when semantic interpretation of particular English items is shown in the application translate.google.com. Therefore, one should be cautious about assertions that lexicography, or its theory, should conform to the latest linguistic fashions. This is a point that Geeraerts makes implicitly in his 2007 paper. While Geeraerts himself thinks that practical lexicography is an applied branch of lexicology, i.e. that lexicology formulates theoretical principles, and lexicography uses the principles in practice (Geeraerts 2007: 1172), he says that ‘a number of existing definitional and descriptive practices in the dictionary that are somewhat suspect from an older theoretical point of view receive a natural interpretation and legitimacy in the theoretical framework offered by Cognitive Linguistics’ (pp. 1160–1). Thus, if lexicographers did not use some definitional practices because they had been condemned by one group of linguists, they might come under criticism from another group of linguists who think these practices are legitimate.
6 Validity of theories
As we have seen, there are various theories of lexicography and, naturally, some of them are in competition. In his theoretical book, Tarp (2008) criticizes Wiegand’s theory, because it is too linguistically oriented and does not take into account lexical functions as much as Tarp thinks it should. Tarp, on the other hand, engages in purely intellectual speculations. How are we to judge who is right, Wiegand or Tarp, or perhaps de Schryver, who criticizes them both? This is the question about the validity of theories, and about predictions that they might make. ‘The most convincing demonstration of the validity of a scientific hypothesis is the successful prediction of a previously unobserved or unrecorded observation’ (Ziman 1984: 43). Are theories of lexicography predictive in this sense? A short answer is that they are not and that they will not be. This results, again, from the differences between the sciences and the humanities; the term theory is understood in different ways in the sciences and the humanities. To avoid this confusion perhaps it is really better to use a more specialist term like ‘metalexicography’. The natural sciences form general statements about the world; these general statements, sometimes called scientific laws, are always true under specified conditions (or they are specific probabilities for the occurrence of a certain event; cf. also Adamska-Sałaciak 2019). There are no exceptions to them; an exception simply means that the law is false. A scientific law will be true in future, too, and this makes it possible for scientists to make predictions about future events. The humanities, in contrast, cannot make such predictions. Interestingly, while Mantzavinos (2014) argues that in the humanities scholars use the same hypothetical-deductive methods as the scientists do, he has nothing to say about predictions.
In the humanities we can make some generalizations on the basis of the description of past events – in a way the humanities are always historical sciences. These generalizations are true only about the past and, perhaps, the immediate present. We do not know what the future will be; we can only hope that the nearest future will be similar to the present, and we extrapolate our expectations about the present, derived from past experiences, to the future. That is why dictionaries have to be rewritten, and that is why practical manuals of lexicography can be used with some success in ‘normal’ times, because we think that tomorrow will be essentially like today. While in ‘normal’ times this extrapolation of principles derived from the past into the future works reasonably well, because social conditions change gradually, in unusual times (perhaps it would be apt to use here Kuhn’s phrase ‘the revolutionary change of paradigm’ (Kuhn 1970)) this does not work at all, because tomorrow can be quite different from today. In lexicography we certainly have the revolutionary situation, often called disruptive; dictionaries rapidly change, they become abstract objects in virtual space, and are no longer concrete tomes on the bookshelf. It is highly likely that the dictionary of the future will not be perceived as an object at all; it will work like a background process, unnoticed by human users. We already know dictionaries of this sort, for example, the spelling checker working with a word editor or predictive text input methods used when writing a text message on the phone. The services do not use typical dictionary displays but they offer dictionary functions. The traditional models of dictionary distribution and use no longer function as they did. Also, users’ expectations change rapidly. This also means that any principles derived from past behaviour of dictionary users, i.e. from surveys of users’ needs and expectations, are doomed from the beginning. The knowledge of the past does not give us a clue as to how the users might use a dictionary in a completely different environment, i.e. in the unknown future.
So what does the validity of a lexicographic theory depend on? As usual in the humanities, it primarily depends on the authority of the theoretician. We tend to believe that because Sue Atkins and Michael Rundell produced successful dictionaries in the past, we can be sure that their guide will help us make a good dictionary in the future. This belief is fallible, however, as we have noticed. As long as theories are based on the authority of their makers they are simply sets of beliefs. A good theory, especially if it makes some suggestions about the future, should also describe how to verify its claims.
7 Conclusion
In this chapter I have argued that there are indeed theories of lexicography, from prescriptive descriptions of how to make one dictionary, to general theories that cover all aspects of lexicography. The most developed general theory of printed dictionaries is that by Herbert Ernst Wiegand. A theory of lexicography should address three aspects, which make lexicography unique among other reference sciences: syntactic (lexicographic structures), semantic (content of structures) and pragmatic (user needs). However, lexicographic structures are now designed primarily by computer scientists, and user needs and expectations are created by global computer companies, so it is the adequate description of lexical items that is left to the lexicographer. There are unfortunately very few theories that discuss the most basic assumptions on which dictionaries are founded. Validity of theoretical models in the humanities is primarily based on their efficiency in the past; therefore, they cannot be used in times when the cultural and technological environment rapidly changes.
References
Adams, H. L., W. Burger, Jr. and S. Riffell (2019), ‘Influence of disturbance on Avian communities in Agricultural Conservation Buffers in Mississippi, USA’, The Open Journal of Ornithology. Available at https://openornithologyjournal.com/VOLUME/12/PAGE/16/FULLTEXT/ [accessed 20 March 2020].
Adamska-Sałaciak, A. (2010), ‘Examining equivalence’, International Journal of Lexicography 23 (4), 387–409.
Adamska-Sałaciak, A. (2019), ‘Lexicography and theory: Clearing the ground’, International Journal of Lexicography 32 (1), 1–19.
Atkins, B.T.S. (1992/93), ‘Theoretical lexicography and its relation to dictionary-making’, Dictionaries 14, 4–43.
Atkins, B.T.S. and M. Rundell (2008), The Oxford Guide to Practical Lexicography, Oxford: Oxford University Press.
Bański, P., J. Bowers and T. Erjavec (2017), ‘TEI-Lex0 guidelines for the encoding of dictionary information on written and spoken forms’ in I. Kosem et al. (eds), Electronic Lexicography in the 21st Century: Proceedings of eLex 2017 Conference, 485–94.
Béjoint, H. (2010), The Lexicography of English, Oxford: Oxford University Press.
Béjoint, H. (2015), ‘Dictionaries for general users: History and development; current issues’ in P. Durkin (ed.), The Oxford Handbook of Lexicography, Oxford: Oxford University Press.
Bolinger, D. (1975), Aspects of Language, New York: Harcourt, Brace, Jovanovich Inc.
Bolinger, D. (1985), ‘Defining the indefinable’ in R. Ilson (ed.), Dictionaries, Lexicography and Language Learning, Oxford: Pergamon Press, 69–73. (Reprinted in T. Fontenelle (ed.) 2008.)
Bothma, T. J. D., R. H. Gouws and D. J. Prinsloo (2016), ‘The role of e-lexicography in the confirmation of lexicography as an independent and multidisciplinary field’ in T. Margalitadze and G. Meladze (eds), Proceedings of the XVII EURALEX International Congress, Tbilisi: Lexicographic Centre, Ivane Javakhishvili Tbilisi State University, 109–16.
Chishman, R. and G-M. De Schryver (2019), ‘An overview of Digital Lexicography and directions for its future: An interview with Gilles-Maurice de Schryver’, Calidoscópio 17 (3), 659–3. Available at https://euralex.org/2019/12/10/interview-with-gilles-maurice-de-schryver-in-calidoscopio/ [accessed 20 March 2020].
Clark, B. (2006), ‘Linguistics as a science’ in K. Brown (ed.), Encyclopedia of Language and Linguistics, Oxford: Elsevier Science.
Culler, J. (2000), Literary Theory: A Very Short Introduction, Oxford: Oxford University Press.
Davies, M. (2018–), The 14 Billion Word iWeb Corpus. Available at https://www.english-corpora.org/iWeb/ [accessed 3 April 2020].
De Schryver, G-M. (2012), ‘Lexicography in the Crystal Ball: Facts, trends and outlook’ in R. Vatvedt Fjeld and J.M. Torjusen (eds), Proceedings of the 15th EURALEX International Congress. 7–11 August 2012, Oslo: Department of Linguistics and Scandinavian Studies, University of Oslo, 93–163.
Fontenelle, T. (ed.) (2008), Practical Lexicography. A Reader, Oxford: Oxford University Press.
Frigg, R. and S. Hartmann (2012), ‘Models in science’ in E.N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Fall 2012 edition). Available at http://plato.stanford.edu/archives/fall2012/entries/modelsscience/ [accessed July 2012].
Geeraerts, D. (1989), ‘Principles of monolingual lexicography’ in F-J. Hausmann et al. (eds), 287–96.
Geeraerts, D. (2007), ‘Lexicography’ in D. Geeraerts and H. Cuyckens (eds), The Oxford Handbook of Cognitive Linguistics, Oxford: Oxford University Press, Ch. 44.
Geeraerts, D. (2015), ‘Lexicography and Theories of Lexical Semantics’ in P. Durkin (ed.), The Oxford Handbook of Lexicography, Oxford: Oxford University Press, 425–38.
Gentzler, E. (2001), Contemporary Translation Theories, Revised Second edition, Clevedon: Multilingual Matters.
Grefenstette, G. (1998), ‘The future of linguistics and lexicographers: Will there be lexicographers in the year 3000?’ in T. Fontenelle, P. Hiligsmann, A. Michiels, A. Moulin and S. Theissen (eds), Proceedings of the Eighth EURALEX Congress, Liège: University of Liège, 25–41. (Reprinted in T. Fontenelle (ed.) 2008.)
Hanks, P. (2000), ‘Contributions of lexicography and corpus linguistics to a theory of language performance’ in U. Heid et al. (eds), 3–13.
Hanks, P. (2000/2008), ‘Do word meanings exist?’ Computers and the Humanities 34, 205–15. (Reprinted in T. Fontenelle (ed.) 2008.)
Hanks, P. (2006), ‘Lexicography’ in K. Brown (ed.), Encyclopedia of Language and Linguistics, Oxford: Elsevier Science.
Hanks, P. (2013), Lexical Analysis: Norms and Exploitations, Cambridge, MA: MIT Press.
Hartmann, R.R.K. (2005), ‘Interlingual references: On the mutual relations between lexicography and translation’, The Hong Kong Linguist 25, 43–52. (Reprinted in R.R.K. Hartmann (2007), Interlingual Lexicography Selected Essays on Translation Equivalence, Contrastive Linguistics and the Bilingual Dictionary, Tübingen: Max Niemeyer, 208–17.)
Hausmann, F-J. and H.E. Wiegand (1989), ‘Component parts and structures of general monolingual dictionaries: A survey’ in F-J. Hausmann et al. (eds), 328–60.
Hausmann, F-J., O. Reichmann, H.E. Wiegand and L. Zgusta (1989–91), Wörterbücher/Dictionaries/Dictionnaires: An International Encyclopedia of Lexicography, vols. 1–3, Berlin: Walter de Gruyter.
Heid, U., S. Evert, E. Lehmann and C. Rohrer (eds) (2000), Proceedings of the Ninth EURALEX International Congress, EURALEX 2000, Stuttgart: Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.
Hilpinen, R. (2011), ‘Artifact’ in E.N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Winter 2011 edition). Available at http://plato.stanford.edu/archives/win2011/entries/artifact.
Hüllen, W. (1999), English Dictionaries 800–1700. The Topical Tradition, Oxford: Clarendon Press.
Kilgarriff, A., P. Rychly, P. Smrž and D. Tugwell (2004), ‘The Sketch Engine’ in G. Williams and S. Vessler (eds), Euralex 2004 Proceedings, Lorient: Université de Bretagne-Sud, 105–16. (Reprinted in T. Fontenelle (ed.) 2008.)
Kosem, I., R. Lew, C. Müller-Spitzer, M.R. Silveira and S. Wolfer (2019), ‘The Image of the Monolingual Dictionary across Europe. Results of the European Survey of Dictionary use and Culture’, International Journal of Lexicography 32 (1), 92–114.
Kuhn, T.S. (1970), The Structure of Scientific Revolutions, Second edition, Chicago: University of Chicago Press.
Lyons, J. (1977), Semantics, Vol. 1, Cambridge: Cambridge University Press.
Mantzavinos, C. (2014), ‘Text Interpretation as a Scientific Activity’, Journal for General Philosophy of Science 45, 45–58.
Mantzavinos, C. (2020), ‘Hermeneutics’ in E. N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Spring 2020 edition). Available at https://plato.stanford.edu/entries/hermeneutics/ [accessed 20 March 2020].
Margolis, E. and S. Laurence (eds) (2007), Creations of the Mind: Theories of Artifacts and their Representation, Oxford: Clarendon Press.
McArthur, T. (1998), ‘What then is reference science?’ in T. McArthur, Living Words, Language, Lexicography and the Knowledge Revolution, Exeter: University of Exeter Press, 215–22. (Reprinted in R. R. K. Hartmann (ed.) (2003), Lexicography, Critical Concepts. Vol. III. Lexicography: Lexicography, Metalexicography and Reference Science, London: Routledge, 422–8.)
Merriam-Webster.com Dictionary, Merriam-Webster. Available at https://www.merriam-webster.com/dictionary/collegiate [accessed 9 April 2020].
Moorman, C. (2018), ‘Why Apple Is Still A Great Marketer and What You Can Learn’, Forbes. Available at https://www.forbes.com/sites/christinemoorman/2018/01/12/why-apple-is-still-a-great-marketerand-what-you-can-learn/#40e5b54f15bd [accessed 18 March 2020].
Morris, C. (1938), ‘Foundations of the theory of signs’ in O. Neurath, R. Carnap and C. Morris (eds), International Encyclopaedia of Unified Science I, Chicago: University of Chicago Press, 77–138.
Piotrowski, T. (2009), ‘Review of Tarp 2008’, International Journal of Lexicography 22 (4), 480–6.
Psillos, S. and M. Curd (eds) (2008), The Routledge Companion to Philosophy of Science, Abingdon and New York: Routledge.
Pym, A. (2010), Exploring Translation Theories, London and New York: Routledge.
Rundell, M. (2012), ‘It Works in Practice but Will It Work in Theory? The Uneasy Relationship between Lexicography and Matters Theoretical’ in R. V. Fjeld and J. M. Torjusen (eds), Proceedings of the 15th EURALEX International Congress, 7–11 August 2012, Oslo: Department of Linguistics and Scandinavian Studies, University of Oslo, 47–92.
Simonsen, H.-K. (2014), ‘Mobile Lexicography: A survey of the mobile user situation’ in A. Abel, C. Vettori and N. Ralli (eds), Proceedings of the XVIth EURALEX International Congress: The User in Focus, Bolzano/Bozen: Institute for Specialised Communication and Multilingualism, 15–19.
Simonsen, H.-K. (2017), ‘Lexicography: What is the Business Model?’ in I. Kosem, C. Tiberius, M. Jakubíček, J. Kallas, S. Krek and V. Baisa (eds), Electronic Lexicography in the 21st Century: Proceedings of eLex 2017 conference, Brno: Lexical Computing CZ s.r.o., 395–415.
Sinclair, J. (ed.) (1987), Looking Up: an Account of the Cobuild Project in Lexical Computing, London and Glasgow: HarperCollins.
Sketch Engine. Available at https://spp.sketchengine.eu [accessed 3 April 2020].
Soanes, C. (2010), Oxford Dictionary of English, Third edition, Oxford: Oxford University Press.
Svensén, B. (2009), A Handbook of Lexicography. The Theory and Practice of Dictionary-Making, Cambridge: Cambridge University Press.
Tarp, S. (2008), Lexicography in the Borderland between Knowledge and Non-Knowledge. General Lexicographical Theory with Particular Focus on Learner’s Lexicography, Tübingen: Max Niemeyer.
Tarp, S. (2009), ‘Reflections on lexicographical user research’, Lexikos 19, 275–96. Available at http://lexikos.journals.ac.za/pub/article/view/440/157 [accessed 10 July 2012].
Tarp, S. (2010), ‘Reflections on the academic status of lexicography’, Lexikos 20, 450–65. Available at http://lexikos.journals.ac.za/pub/article/view/152/94 [accessed 13 July 2012].
Tarp, S. (2011), ‘Lexicographical and other e-tools for consultation purposes: Towards the individualization of needs satisfaction’ in P. A. Fuertes-Olivera and H. Bergenholtz (eds), e-Lexicography. The Internet, Digital Initiatives and Lexicography, London: Continuum, 54–70.
Tarp, S. (2012), ‘Do We Need a (New) Theory of Lexicography?’, Lexikos 22, 321–32. Available at https://lexikos.journals.ac.za/pub/article/view/1010 [accessed 12 March 2020].
Wiegand, H.E. (1984), ‘On the structure and contents of a general theory of lexicography’ in R.R.K. Hartmann (ed.), LEXeter’83 Proceedings. Papers from the International Conference on Lexicography at Exeter, 9-12 September 1983, Tübingen: Max Niemeyer, 13–30.
Wiegand, H.E. (1989), ‘Der gegenwärtige Status der Lexikographie und ihr Verhältnis zu anderen Disziplinen’ in F-J. Hausmann et al. (eds), 246–80.
Wiegand, H.E. (with M. Beißwenger etc.) (2010), ‘Systematic Introduction’ in Wörterbuch zur Lexikographie und Wörterbuchforschung. Dictionary of Lexicography and Dictionary Research, Berlin and New York: W. De Gruyter, 129–225. Available at http://www.herbert-ernst-wiegand.de/index_publikationen_01.htm [accessed 10 April 2020].
Wiegand, H.E., M. Beißwenger, R.H. Gouws, M. Kammerer, A. Storrer and W. Wolski (eds) (2010), Wörterbuch zur Lexikographie und Wörterbuchforschung/Dictionary of Lexicography and Dictionary Research. Band 1/Volume 1 Systematische Einführung/Systematic Introduction + A–C, Berlin and New York: Walter de Gruyter.
Wiegand, H.E., I. Feinauer and R.H. Gouws (2013), ‘Types of dictionary articles in printed dictionaries’ in R.H. Gouws, U. Heid, W. Schweickard and H. E. Wiegand (eds), Dictionaries. An International Encyclopedia of Lexicography. Supplementary Volume: Recent Developments with Focus on Electronic and Computational Lexicography (Handbücher zur Sprach- und Kommunikationswissenschaft 5.4), Berlin and Boston: W. de Gruyter, 314–66. Available at http://www.herbert-ernst-wiegand.de/dokumente/475.pdf [accessed 19 March 2020].
Wierzbicka, A. (1985), Lexicography and Conceptual Analysis, Ann Arbor: Karoma Publishers.
Woodward, J. (2011), ‘Scientific explanation’ in E.N. Zalta (ed.), The Stanford Encyclopedia of Philosophy (Winter 2011 Edition). Available at http://plato.stanford.edu/archives/win2011/entries/scientific-explanation/ [accessed 6 July 2012].
Zhao, J. and D. Tonias (2012), Bridge Engineering. Rehabilitation, and Maintenance of Modern Highway Bridges, Third edition, New York: McGraw-Hill Professional.
Ziman, J. (1984), An Introduction to Science Studies. The Philosophical and Social Aspects of Science and Technology, Cambridge: Cambridge University Press.
17
Compiling dictionaries for minority and endangered languages
Verna Stutzman and Kevin Warfel
1 Introduction
The word dictionary originated in 1220, when John of Garland, from England, published Dictionarius (Hüllen 1999: 85) to help students with Latin diction. Today, dictionary is defined as ‘a book or electronic resource that lists the words of a language (typically in alphabetical order) and gives their meaning, or gives the equivalent words in a different language, often also providing information about pronunciation, origin, and usage’ (Dictionary 2020). The oldest known dictionaries are bilingual Sumerian-Akkadian wordlists found on the Akkadian Empire cuneiform tablets, discovered in Ebla (modern Syria) and dated c. 2300 BCE (Misachi 2017). Figure 17.1 depicts some of the significant dictionary publications from then until today. Compiling and publishing dictionaries in the Western world accelerated for majority languages after A Dictionary of the English Language, compiled by British lexicographer Samuel Johnson, was published in 1755. It included example sentences for the first time and is among the most influential dictionaries in the history of the English language. One of the major players in minority-language dictionary development in modern times is SIL International (SIL), a faith-based non-profit organization.
Founded in 1934, SIL has grown from a small summer linguistics training program with two students to a staff of over 5,000 people from 89 countries of origin. SIL is currently involved in over 1,660 active language projects, representing 1.07 billion people in 162 countries. (About SIL 2020)
Activities included in these language projects are the development of a writing system,1 linguistic analysis and bilingual/multilingual dictionary compilation. SIL has supported communities in the production of Scripture portions, linguistic books and articles, reading materials and dictionaries. Some dictionaries are minimal wordlists, and others are detailed publications of lexical data.
Although SIL personnel, working alongside local language speakers, were involved in compiling and publishing minority-language dictionaries from the outset, in 2012 SIL formed the Dictionary and Lexicography Services unit in order to more effectively support dictionary compilation and publication in minority languages. Webonary2 was launched as the online platform for publishing dictionaries, making the lexical data SIL had collected over the years readily available to the world. To date, there are more than 240 minority-language dictionaries in various stages of completion published on the site.
Figure 17.1 Timeline of significant dictionary publications.
2 Why dictionaries are needed for minority languages
There are several different audiences that benefit from a dictionary in a minority language: the language community itself, government officials of the country where the language is spoken, neighbouring and interested communities who have business and personal dealings with the language community, the world at large and future generations of the language community. A dictionary can be tailored to any of these audiences, but the primary benefit of a minority-language dictionary is its ability to help ‘bridge the gap between the minority community and national language speakers’ (Grimes 1986: 422). To be maximally useful, the minority-language dictionary must be bilingual (or multilingual), including the vernacular and one or more languages of wider communication (LWC). An LWC can be a regional language that is shared by several language communities or the official language of the country.
2.1 The language community
Having a dictionary in their mother tongue has been demonstrated to boost a language community’s self-esteem, help them appreciate their language, and increase their willingness to learn to read and write it. One example of this is the experience of the Huave people in Oaxaca, Mexico, who were told, ‘ “You don’t have a real language. You only speak in grunts and groans. You need to learn Spanish,” and the Huaves were often ashamed of their own language’ (Cahill, Stairs and Stairs 2003: 77). When the Huave New Testament was published in 1972, sales were relatively slow. It wasn’t until their dictionary/grammar was published in 1981 that the people really became excited about their language and began conducting Huave language classes. The dictionary that ‘made the Huaves proud of their language and themselves is being used in the local bilingual schools’ (Cahill, Stairs and Stairs 2003: 79).
Kindberg (2002) reported Feenstra’s description of language work among the Dogrib people in the Northwest Territories of Canada. Until 1980, education took place in English and, although the people used the vernacular in their homes, they did not use it for reading and writing. Schoolteachers began to ask Feenstra for a Dogrib dictionary, but all he had was a wordlist. When he realized the urgency of the desire for a dictionary, he and his Dogrib co-workers began to prepare the dictionary for publication. ‘The results were dramatic. Immediately teachers used the dictionary in the schools and it quickly became obvious that the publication run had been far too small’ (Kindberg 2002: 10). ‘The timely publication of the Dogrib wordlist “dictionary” initiated wide-spread literacy, promoted general good will for the translation project, and opened a “market” for Bible stories and translated Scriptures’ (Kindberg 2002: 11).
Governments tend to require that a minority language has a dictionary available before approving that language as a language of instruction in the education system. For example, the Republic of the Philippines Department of Education lists an ‘Officially Documented Vocabulary’ as one of the ‘Four Minima for MTB-MLE3 implementation’ (Policy Guidelines: 137). In Ghana, one language community held a Rapid Word Collection4 workshop in 2012 because of their need for a dictionary in order to use their language for formal instruction in school. Peter Adaawen, Project Coordinator for the Buli Literacy Project, stated, ‘With a dictionary, it will give easy access for us to actually teach our children in our own language’ (Clark 2012).
The existence of a bilingual dictionary promotes literacy among speakers of the minority language, which serves as a bridge, making the leap to literacy in the LWC smaller and therefore more feasible. Mother-tongue authors are empowered to record their oral traditions in written form for current and future generations, translate vital materials disseminated primarily in the LWC, and author new content, thus boosting the size of the written corpus of the language. The dictionary provides tools for writing – a guidebook for orthography, a spelling checker and a synonym look-up or thesaurus.
2.2 Outsiders to the language community
Various branches of the government of the language community’s host country have a vested interest in the publication of a bilingual dictionary for minority languages, especially for translation purposes. For example, in India, the Commission for Scientific and Technical Terminology was established in 1960 to oversee the development and publication of ‘scientific and technical terms in … all Indian languages’ (Language Education 2016). People in neighbouring language communities may have a business or personal interest in learning the language. Any literate individual interested in learning another language finds a bilingual dictionary to be a great resource in the quest to communicate in that language. The linguistic community also has a keen interest in having access to bilingual minority-language dictionaries for linguistic analysis, both synchronic and diachronic.5
2.3 Future generations – revitalizing languages
When languages are in danger of dying out or the last known speaker of a language has passed on, if the language has been documented, it may be possible to revitalize the language. The Wampanoag people of Massachusetts began reviving their language in 1993 (Wôpanâak 2020). The materials available to their revitalization project included a translation of the Bible, various tracts and court documents, and a dictionary (Cotton, Pickering and Davis [1707] 1829) that had been published by Cambridge. The project was successful and ‘there are now 15 adult speakers of the Wampanoag language and around 75 children at various levels of fluency’ (Silversmith 2016). According to Echerd (2019: 5), the minimum documentation that is needed to preserve and revitalize a language is the lexicon, the grammar and audio recordings of the sounds of the language. In the Quinault language community of Washington State (USA), the last speaker died in 1996 (Quinault 2015), and there are no audio recordings (Stephen Echerd, personal communication, 27 May 2020). They are attempting to resurrect their language using a wordlist of 900 words and the translated Gospel of Mark. Wikipedia lists at least 22 language-revitalization projects that have been successful to varying degrees (List of Revived Languages 2020).
3 Compiling minority-language dictionaries
Compiling a minority-language dictionary is a huge task – a never-ending and somewhat thankless task. Samuel Johnson (1755: i) wrote: ‘It is the fate of those who toil at the lower employments of life … to be exposed to censure, without hope of praise … Among these unhappy mortals is the writer of dictionaries … Every other author may aspire to praise; the lexicographer can only hope to escape reproach.’
While Johnson characterized compiling dictionaries as a thankless task, Roberts, Hedinger and Gravina (2014: 14) compare the task of finding words and compiling them into a dictionary to a ‘treasure hunt and an adventure of scientific research’. Compiling a dictionary involves four stages: collecting the words to be included, managing the list of words as it grows in length, discovering and capturing information about each word and bringing sufficient order to the data to publish it in a comprehensible format.
3.1 Collecting words
There are two steps in collecting words for the dictionary: identifying the words and capturing them in written form or as a digital recording. Since a native speaker may know between 10,000 and 30,000 words (see Figure 17.2), there is potential for a large dictionary. There are multiple strategies that lexicographers have used to collect words for a dictionary: holding a Rapid Word Collection workshop, employing the flypaper method, translating a wordlist, analysing text corpora, creating a computer-generated list and using picture books (Stutzman, Warfel and Bryson forthcoming).
Figure 17.2 How much of the Lexicon does a person need to know? (Echerd 2019: 11).
The Rapid Word Collection (RWC) methodology has its origins in the work of Ronald Moe, who developed the Dictionary Development Process (DDP) (Moe 2007). RWC revolutionized the task of collecting words by using a systematic method based on semantic domains to capture words in a community-organized workshop. The words in our mental vocabulary are not sorted
alphabetically, but are organized around semantic domains, which are ‘key concepts, much like stars cluster in galaxies, or planets revolve around a star’ (Moe 2007: 2). Moe developed a set of nearly 1,800 semantic domains,6 which are used in the word-collection process. They are organized hierarchically under nine major headings – Universe and creation; Person; Language and Thought; Social behaviour; Daily life; Work and occupation; Physical actions; States; and Grammar (see Figure 17.3). Each semantic domain (see Figure 17.4) includes:
● number for sorting purposes
● domain label (consisting of a word or short phrase that captures the basic idea of the domain)
● short description of the domain
● series of elicitation questions designed to help people think of the words that belong to the domain
● short list of example words in the LWC under each question that belong to the domain.
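The five components of a semantic domain listed above map naturally onto a small record type. The following Python sketch is illustrative only: the class and field names, the dotted form of the domain number and the sample question are assumptions made for this example, not a description of Moe’s published list or of any SIL software.

```python
from dataclasses import dataclass, field


@dataclass
class ElicitationQuestion:
    """One elicitation question plus example words in the LWC."""
    text: str
    example_words: list[str] = field(default_factory=list)


@dataclass
class SemanticDomain:
    number: str       # assumed dotted form such as "1.1.1", used for sorting
    label: str        # word or short phrase naming the domain
    description: str  # short description of the domain
    questions: list[ElicitationQuestion] = field(default_factory=list)

    def sort_key(self) -> tuple[int, ...]:
        """Numeric key, so that 1.10 sorts after 1.9 rather than before it."""
        return tuple(int(part) for part in self.number.split("."))


# Hypothetical record; the number, description and question are placeholders.
sun = SemanticDomain(
    number="1.1.1",
    label="Sun",
    description="Words related to the sun.",
    questions=[
        ElicitationQuestion(
            text="What words refer to the light of the sun?",
            example_words=["sunbeam", "shine", "sunrise"],
        )
    ],
)
```

Sorting on the numeric tuple rather than on the raw string is a design choice of this sketch; it keeps the hierarchy in its intended order once any level has more than nine subdomains.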
Figure 17.3 Semantic Domain = cluster of words (Moe 2007: 2).
Figure 17.4 Sample Semantic Domain Questionnaire.7
‘Different languages vary in the semantic domains they identify, in how finely they carve up these domains, and in how they make distinctions between different members of a domain’ (Peoples and Bailey [1988] 2018: 61). Given the uniqueness of each culture, anthropologists have traditionally believed that a unique set of domains must be discovered for each language – a task so daunting that few have undertaken it. While it is true that each language has some unique domains, humans from different cultures are more alike than different and there is a large degree of universality in the semantic domains we use. Moe’s semantic domains and elicitation questions do not purport to be universal, but they have proven to work in a wide variety of languages to stimulate the neural networks and trigger an outpouring of related words. The human brain has the ability to rapidly recall the words that belong to a semantic domain, jumping from word to word along the pathways of lexical relations. In a workshop setting with groups of four to six mother-tongue speakers working together, each team leader takes the questionnaire for a particular semantic domain, asking the group members to call out words related to that semantic domain. For example, if the semantic domain is sun, they can quickly call out a number of words such as moon, light, sunbeam, shine, sunrise, noon, sunset, sunstroke. ‘Experience has shown that the synergy of working in a team – as opposed to the isolation of an
individual working alone – results in a far greater number of words being collected and is much more enjoyable and encouraging’ (Warfel and Stutzman 2020: 40). Not only is the number of words collected in a two-week period greater than with any other method, but ‘no other method both collects and classifies words at the same time’ (Moe 2007: 9). Boerger and Stutzman (2018: 178) studied twelve RWC workshops and found that the ‘average number of raw words collected was 13,762 words’. RWC workshops consistently achieve a total of 12,000 or more entries during a brief two-week period if the best-practice formula is followed. When compiled in lexical format, this results in the equivalent of 8,000–9,000 unique entries. A less methodical way of collecting words is the flypaper method – as the researcher mingles with language speakers in the community and hears unfamiliar words, s/he enters them into her/his notebook. Pwaka used this method for collecting words in the Lou language in Solong8 village. He collected around 3,000 words over many years on scraps of paper (Pwaka et al. 2013: Introduction). In today’s world, the words could be entered into an app on a cellphone, like the Iwaidja began doing in 2012 (Birch 2012). Translating a wordlist is a relatively quick way to collect several hundred words. The Comparative African Word List, a list of 1,700 words developed for comparative linguistics across African languages, has been used for this purpose (Roberts and Snider 2006). Apart from providing a good starting point for compiling a dictionary, ‘word lists have served well for grouping languages according to their linguistic affiliation and even for comparative reconstruction of lexical items of the parent language’ (Bartholomew and Schoenhals [1983] 2019: 11). A comprehensive language development project profits from the collection of texts, a corpus. 
These texts can be a part of traditional oral literature or modern texts, including letters, songs, sermons and Scripture translations, as listed in Figure 17.5. As these texts are collected, they provide a rich source of vocabulary items to include in the dictionary. For example, the compilers of the Oxford English Dictionary use the Oxford English Corpus, a text corpus of twenty-first-century English, the largest corpus of its kind, containing nearly 2.1 billion words. It includes language from the United Kingdom, the United States, Ireland, Australia, New Zealand, the Caribbean, Canada, India, Singapore and South Africa. The text is mainly collected from web pages; some printed texts, such as academic journals, have been collected to supplement particular subject areas (Oxford English Corpus 2020).
Figure 17.5 Traditional oral literature (Van den Berg 2012: 7).
Obviously, using the text corpus method – the technical process of gleaning words from thousands of vernacular texts – is a desirable way to populate a dictionary. Unfortunately, large corpora are in short supply for languages with an oral-only tradition.
A ‘blank dictionary is a computer-generated list of all possible words of two syllables in the language. This list is then checked by native speakers as to whether or not these words actually occur’ (Van den Berg 2012: 8). This approach works well if the language has a simple syllable structure with only V and CV9 syllables, fewer than seven vowels and no more than 25 consonants. If a language has a complex syllable structure or a large number of phonemes, the possibilities are too numerous for this method to be practical. Both the Muna in Sulawesi, Indonesia, and the Vitu in West New Britain, Papua New Guinea, used this approach with success. Vitu has five vowels and fourteen consonants with only V and CV syllables. Muna also has only V and CV syllables, five vowels, and twenty-three consonants. The blank Muna dictionary generated 94 pages of 180 possible words on each page, totalling about 17,000 words. Figure 17.6 shows a sample of some words starting with the letter D from Muna that were generated using this procedure. The underlined words are those which actually occur in the language. Words which are not underlined do not exist in the language.
Picture books with scientific names are especially helpful for eliciting terms for flora (e.g. trees, flowers, grasses) and fauna (e.g. birds, mammals, insects, fish, other sea creatures).
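The blank-dictionary procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the tool used for Muna or Vitu: the function names are invented for the example, and the inventory is a placeholder of the same size as the one reported for Vitu (five vowels, fourteen consonants), not its actual letters. The page size of 180 follows the figure given for the Muna list.

```python
from itertools import product


def syllables(vowels, consonants):
    """All V and CV syllables that the inventory allows."""
    return list(vowels) + [c + v for c in consonants for v in vowels]


def blank_dictionary(vowels, consonants, n_syllables=2):
    """Every possible word made of n_syllables V/CV syllables."""
    sylls = syllables(vowels, consonants)
    return ["".join(combo) for combo in product(sylls, repeat=n_syllables)]


# Placeholder inventory (assumption): 5 vowels and 14 consonants.
VOWELS = "aeiou"
CONSONANTS = "bdghklmnprstvz"

candidates = blank_dictionary(VOWELS, CONSONANTS)
print(len(candidates))  # 5625 = (5 + 14 * 5) ** 2

# Split the list into pages of 180 candidates for checking by native speakers.
PAGE_SIZE = 180
pages = [candidates[i:i + PAGE_SIZE] for i in range(0, len(candidates), PAGE_SIZE)]
```

The count grows with the square of the number of syllables, which is why the method becomes impractical once the syllable structure or the phoneme inventory is larger.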
Figure 17.6 Muna data (Van den Berg 2012: 8).
3.2 Entering words into database software
After identifying words for the dictionary, it is essential to enter them into a lexical database in order to manage the growing list of words and to process them for publishing (Stutzman, Warfel and Bryson 2020a: 2). Accordingly, one must choose appropriate database software and install it. SIL has a history of creating software to help with linguistic analysis and dictionary compilation, starting with Shoebox in 1987, followed by LinguaLinks in 1996, and FieldWorks Language Explorer (FLEx),10 sometimes referred to as Fieldworks. Of the dictionary-creation software produced by SIL,11 only FLEx is currently (as of 2020) being developed and maintained. FLEx supports tasks ranging from the initial entry of collected data through to the preparation of data for publication, including dictionary development, interlinearization of texts, morphological analysis and the production of dictionary publications. Some key criteria for choosing lexicography software for a dictionary-making project are:
● Unicode-compliant, handles multiple writing systems
● structured data format, integrated with other software
● ongoing development, technical support, large community of users
● allows collaboration by multiple users on one project
The software should be Unicode12 compliant since ‘Unicode is an industry-wide character set encoding standard designed to support the worldwide interchange, processing, and display of the written texts of the diverse languages and technical disciplines of the modern world’ (Language Technology Standards 2020). Similarly, Trippel et al. (2007: 3206) note that a ‘problem that appears artificial only at first glance is the use of multiple writing systems in the same lexicon article. Lexicon articles are the kind of text where such a change can happen frequently, for example in multilingual environments or with languages with more than one writing system’. FLEx is Unicode-aware and handles multiple writing systems easily both within fields and across fields. Good lexical software uses structured data like Lexicon Interchange FormaT (LIFT), which is an XML13 format for lexical information (dictionaries) allowing transfer of data between programs such as WeSay, FLEx and Lexique Pro.14 FLEx is also integrated with Phonology Assistant15 and Paratext16 and provides for uploading to Webonary.org for easy online publishing. There are also companion tools17 which are designed to accomplish various tasks in conjunction with FLEx. Well-designed databases have fields for specific kinds of information, rather than allowing the user to type the lexical data in a freeform manner, without any explicit indication as to the kind of data. For example, FLEx is ‘built on a complex data model which includes hundreds of possible fields’ (Baines 2018: 953). Occasionally, word-processing software has been used to publish a dictionary, but because such software provides no structure for lexical data, this approach nearly always results in numerous inconsistencies in the presentation of the data. FLEx continues to be developed, has a large community of support and is expected to be maintained far into the future. Technical support is readily available and affordable. 
A large community of users exists, making the experience of learning new software easier. For example, there are user forums for FLEx18 and for Webonary.19 FLEx also allows multiple users to collaborate on the same database, even if the users are continents apart. It achieves this by using the Send/Receive20 feature and the LanguageDepot server.21 There are currently several tools available to FLEx users for entering words into the FLEx database: the Collect Words tool, the Combine22 and LanguageForge.23 The Collect Words tool is part of the user interface in FLEx, while the Combine and LanguageForge are web-based applications that transfer data to/from FLEx. The Combine and LanguageForge allow collaboration in real time. In summary, FLEx meets all the requirements for good lexical database software.
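The value of a structured data format, one of the criteria discussed in this section, is easiest to see when a program reads the data back. The sketch below parses a small XML fragment modelled on LIFT using Python’s standard library. The fragment is deliberately simplified, and its element names, version attribute and the language code xx are stated here as assumptions that should be checked against the LIFT specification and a real FLEx export before being relied on.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment in the style of a LIFT file.
LIFT_SAMPLE = """<lift version="0.13">
  <entry id="example-entry">
    <lexical-unit>
      <form lang="xx"><text>headword</text></form>
    </lexical-unit>
    <sense id="example-sense">
      <grammatical-info value="Noun"/>
      <gloss lang="en"><text>gloss</text></gloss>
    </sense>
  </entry>
</lift>"""


def read_entries(xml_text):
    """Return a list of {headword, senses} dictionaries from the fragment."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("entry"):
        headword = entry.findtext("lexical-unit/form/text")
        senses = []
        for sense in entry.findall("sense"):
            info = sense.find("grammatical-info")
            senses.append({
                "category": info.get("value") if info is not None else None,
                "gloss": sense.findtext("gloss/text"),
            })
        entries.append({"headword": headword, "senses": senses})
    return entries


print(read_entries(LIFT_SAMPLE))
```

Because every piece of information sits in a named field, the same file can be read by different programs without guessing where the headword ends and the gloss begins, which is not possible with freeform word-processor text.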
3.3 Developing entries
After the words have been collected and entered into database software, the next step is to develop the entries. The elements of an entry and how they are related to each other within the entry will be covered in Section 5 ‘Microstructure of a dictionary article’.
4 Macrostructure of a dictionary publication
The organization of a dictionary includes its elements, their order in the publication and the relationships between them. ‘This organization covers four main types of structure: megastructure, macrostructure, mesostructure, and microstructure’ (Rodríguez [2016] 2019: 19). This chapter deals with the concepts of macrostructure, microstructure and mesostructure, in that order. Macrostructure refers to how the dictionary publication is structured with the front matter, the body and the back matter (Stutzman, Warfel and Bryson 2020c). Publication used to mean only print publishing, but currently the emphasis is on electronic publishing. With the ability to publish electronically on the Internet and via mobile apps, new feature possibilities arise for the user experience (see Part III in this volume). These are explored in Section 4.4 ‘Digital dictionary features’.
4.1 Body of dictionary
The body of the dictionary contains the list of lexical entries, or articles, with detailed information in each entry. Individual entries will have a headword, a lexical or grammatical category (often called a part of speech), a definition (either in the same language or in an LWC), an example sentence (with translation), an optional picture, an optional audio recording and various other information about the headword. According to Omakaeva (2019: 2526), an ‘entry head (the lemma) is part of the macrostructure as well as of the microstructure and therefore assumes a pivotal role’. The kinds of words included in the list of lexical entries are basic uninflected words, such as affixes, clitics, roots and stems. Compounds, derivations or phrases may be included in the list, or they may be presented solely as part of a related entry. Inflected words are typically not included.
The ordering of the list of entries is dependent on the writing system, of which there are three types – alphabets, syllabaries or logographies – each with its own distinctive sorting method. In supporting minority-language dictionary compilation, SIL focuses primarily on languages that use an alphabetic writing system,24 so the entries are sorted alphabetically. There are several decisions that affect how the strings of characters are ordered:
● How are characters that are not in the majority-language alphabet integrated into the vernacular alphabetical order?
● How are letters with diacritics ordered?
● How are digraphs and multigraphs ordered?
Most languages that use an alphabetic writing system are written left-to-right, where the order is based on the leftmost character in the word, but in languages which use a right-to-left writing system, such as Arabic and Hebrew, the ordering is based on the rightmost character in the word. In both cases this is the first character in reading order.
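The ordering decisions raised in this section (extra characters, letters with diacritics, digraphs) can all be expressed as an ordered list of graphemes from which a sort key is derived. The following Python sketch shows one way to do this. The alphabet is hypothetical, chosen only to include a digraph and two non-Latin letters, and the sketch assumes that accented letters are stored as single precomposed characters.

```python
def make_sort_key(alphabet):
    """Build a sort-key function from an ordered list of graphemes."""
    rank = {grapheme: i for i, grapheme in enumerate(alphabet)}
    # Try longer graphemes first so that "ng" is not read as "n" + "g".
    by_length = sorted(alphabet, key=len, reverse=True)

    def key(word):
        w = word.lower()
        ranks, i = [], 0
        while i < len(w):
            for grapheme in by_length:
                if w.startswith(grapheme, i):
                    ranks.append(rank[grapheme])
                    i += len(grapheme)
                    break
            else:
                # Characters outside the alphabet sort after all known letters.
                ranks.append(len(alphabet) + ord(w[i]))
                i += 1
        return ranks

    return key


# Hypothetical vernacular alphabet: ɛ follows e; ng and ŋ follow n.
ALPHABET = ["a", "b", "d", "e", "ɛ", "g", "i", "k", "m", "n", "ng", "ŋ", "o", "u"]
words = ["nu", "nga", "na", "ɛda", "eka", "ŋa"]

print(sorted(words, key=make_sort_key(ALPHABET)))
# ['eka', 'ɛda', 'na', 'nu', 'nga', 'ŋa']
```

A plain code-point sort of the same list would place ɛda last and nga before nu, which is the kind of mismatch with the community’s own alphabetical order that these decisions are meant to prevent.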
4.2 Front matter
In a print publication, the introductory material, or front matter, is presented before the first entry and contains obligatory information like copyright and publication details: who owns the data,
how it may be licensed, name of publisher and date of publication, including an example of how to cite the dictionary in other academic publications. All authors, compilers and contributors to the dictionary project are acknowledged. Because the ‘new literate needs guidance in the mechanics of using such a volume’ (Bartholomew and Schoenhals [1983] 2019: 205), a table of contents is indispensable for a print dictionary. In an online dictionary, the menu system or site map performs the function of a table of contents. An explanation of the layout of an entry helps new users of the dictionary understand what the pieces of information signify and how to find the information they are seeking. If abbreviations are used in the body of the entry, then an abbreviations guide is provided. An introduction to the language – its classification, where it is spoken, by whom, by how many, etc. – is desirable as the ‘sophisticated scholar needs orientation to the language area and the distinctive features of the vernacular’ (Bartholomew and Schoenhals [1983] 2019: 205). An overview of the alphabet is presented – a list of the letters in the order used in the dictionary. In lieu of a comprehensive phonology description, many dictionary publications provide a pronunciation guide. This is displayed using International Phonetic Alphabet25 characters or audio recordings, either in a generic chart included in the front matter or as an individual pronunciation for the headword of each entry. Students of the language will be interested in the history of how the current version of the dictionary came about. For example, is this the first attempt at a dictionary or a revision of a previous dictionary? Is this dictionary a result of a word-collection workshop or years of language development work?
4.3 Back matter
Back matter is presented after the last entry and traditionally has been considered optional. A reversal index, or finder list, may be included for each LWC. This is a list of the translation equivalents, sorted alphabetically. For example, in the English reversal index of the Wayuu dictionary pictured in Figure 17.7, the English translation equivalents are along the left margin and the Wayuu words are beside them. A semantic domain index (or thesaurus) is a feature provided in a Webonary dictionary which has been produced using the Semantic Domains list in FLEx. Charts or tables of inflectional paradigms included in the appendices, along with either an overview or detailed description of the phonological and grammatical systems, help users in learning about the language. In an online dictionary, this may be a link to a downloadable PDF file. A language map helps readers visualize where the language is spoken and how it is related to surrounding language groups. A list of published materials or links about the language assists anyone researching the language.
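A reversal index of the kind described in this section can be derived mechanically from the main entries once each headword has its translation equivalents recorded. The Python sketch below shows the idea under simple assumptions: entries are given as pairs of a headword and its list of LWC glosses, the function name is invented for the example, and the headwords are placeholders rather than real vernacular data.

```python
from collections import defaultdict


def build_reversal_index(entries):
    """Map each LWC gloss to the headwords that carry it, sorted by gloss."""
    index = defaultdict(list)
    for headword, glosses in entries:
        for gloss in glosses:
            index[gloss].append(headword)
    return sorted((gloss, sorted(headwords)) for gloss, headwords in index.items())


# Placeholder entries: (vernacular headword, English translation equivalents).
entries = [
    ("hw_a", ["house", "home"]),
    ("hw_b", ["dog"]),
    ("hw_c", ["house"]),
]

for gloss, headwords in build_reversal_index(entries):
    print(gloss, ", ".join(headwords))
```

The sketch sorts glosses by plain string comparison, which is adequate for lower-case English; a real LWC index would need the collation rules of that language.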
4.4 Digital dictionary features
Digital dictionary publications, which include interactive websites, mobile apps, downloadable PDF files and CDs/DVDs, allow features not possible in print publications. Although digital publications no longer present content in terms of front and back matter, this content should not be omitted from digital dictionaries, for reasons stated above.
Figure 17.7 English reversal index (Captain and Captain 2019, as cited in Stutzman, Warfel and Bryson 2020c: 27).
In a digital publication, a search feature allows searching for complete or partial words in either the vernacular or LWC. Often the search results can be filtered by field or grammatical category (see also Pastor and Alcina, Chapter 8). In an online dictionary, it is possible to provide links to other Internet sites that focus on the language. For example, the Lou dictionary Webonary site (Pwaka et al. 2013) lists the following links to sites about the Lou language:
● Ethnologue link: https://www.ethnologue.com/language/loj
● World Atlas of Language Structures (WALS) link: https://wals.info/languoid/lect/wals_code_lou
● Open Language Archives Community (OLAC) resources link: http://www.language-archives.org/language/loj
Figure 17.8 Entry with audio and colour-coded language data (Lopez and Broadwell 2013).
Links to both audio and video files are possible in digital dictionaries. Figure 17.8 displays an entry from the Copala Triqui Webonary dictionary (Lopez and Broadwell 2013) with a play button. Clicking on the play button allows the user to listen to a recording in the vernacular. Using different colours to differentiate the various languages in the dictionary is a viable option in a digital publication. Whereas this is expensive to do in print, colour is free in the digital world. When publishing digitally, various browse views of the dictionary entries are possible. For example, Webonary presents a browse view that includes all the details in each entry. A dictionary app created with the Dictionary App Builder26 (DAB), designed for the small screen of a cell phone, displays a minimal browse view, and the user has to click on a specific headword to see the full entry. Word search puzzles, crossword puzzles, matching games and other activities can be created using the words in a bilingual dictionary. An online dictionary site is a logical place to make these available to users of the language.
5 Microstructure of a dictionary article (entry)
Kunze and Lemnitzer (2004: 1) state, ‘the term microstructure denotes the structure, i.e. the information items and their relations, of a single dictionary entry.’ The number of elements in an entry can range from just a few to hundreds (Baines 2018: 953), and the discussion of microstructure refers to how these elements relate to each other within the dictionary article. An entry consists of a word with its inflections, its pronunciation, its meanings, its history and its relationships with other words. See Figure 17.9 for a typical dictionary entry. Because there are so many different kinds of information, it is imperative to develop a consistent approach for presenting them in a dictionary article, so that the various types of information are presented in the same order in each entry.
When deciding how to organize the elements of a dictionary entry, it is important to first identify the intended audience (Newell 1995: 4). The information that is key to the primary audience is placed in the beginning of the dictionary article. For example, an academic audience may want cross references at the beginning of the entry, while the language community, who are more likely to know the relationships, may want the cross references placed toward the end of the entry, possibly even in a separate paragraph.
Figure 17.9 Dictionary entry (Niggli 2016b, as cited in Stutzman, Warfel and Bryson 2020b: 4).
Merriam-Webster defines sense as ‘one of a set of meanings a word or phrase may bear especially as segregated in a dictionary entry’ (Sense 2020). Because a word may have more than one sense (e.g. the English word run has multiple meanings: an animal runs, a river runs, a stocking runs), the information in a dictionary article is organized hierarchically. The highest level of the hierarchy is the dictionary entry, which contains both entry-level and sense-level details. Entry-level information is divided into two parts: what precedes the senses and what follows the senses (see Figure 17.10).
Figure 17.10 Structure of an entry (Stutzman, Warfel and Bryson 2020b: 7).
5.1 Form versus meaning
According to Tang (2018: 1), ‘form in linguistics and language refers to the symbols used to represent meaning.’ A principle for organizing the information about a word is to separate information about its form (spelling, pronunciation, etc.) from information about its meaning (definition, examples, etc.). Details about the form are organized at the entry level and may include:
● spelling and spelling variants
● pronunciation and pronunciation variants
● etymology (the word’s history, where it came from)
● dialect variants
● derived words or phrases
Information about a word’s meaning is organized at the sense level in the hierarchy and may include:
● its meaning(s)
● its grammatical category (part of speech)
● examples of how it can be used in a sentence
● other words that are related to its meaning in some way (synonyms, antonyms, etc.)
● notes about how the word is used (formal speech, taboo, etc.)
● cultural notes about the meaning
● scientific name (e.g. for flora and fauna)
When trying to decide whether a specific fact about a lexeme should be included at the entry level or the sense level, there are several questions to consider:
● Is this information about the form of the word or about its meaning?
  ○ If it is about the form, it belongs at the entry level.
  ○ If it is about the meaning, it belongs at the sense level.
● If the word has more than one meaning, is this information the same for all meanings or does it vary depending on the meaning?
  ○ If the information is the same for all senses, it belongs at the entry level.
  ○ If the information changes based on the meaning, it belongs at the sense level.
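The split between entry-level (form) and sense-level (meaning) information can be made concrete with a two-level record. The Python sketch below is an illustration only: the class and field names are assumptions and do not reproduce the FLEx data model, and the display order used by the render function (form details first, then numbered senses, then etymology) is just one of the arrangements the chapter allows. The example uses the English word run with the three uses mentioned earlier; the glosses are written for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """Sense-level information: facts about one meaning."""
    gloss: str
    grammatical_category: str = ""
    examples: list[str] = field(default_factory=list)


@dataclass
class Entry:
    """Entry-level information: facts about the form, shared by all senses."""
    headword: str
    pronunciation: str = ""
    etymology: str = ""
    variants: list[str] = field(default_factory=list)
    senses: list[Sense] = field(default_factory=list)


def render(entry: Entry) -> str:
    """Lay out an entry: form details, numbered senses, then etymology."""
    parts = [entry.headword]
    if entry.pronunciation:
        parts.append(f"/{entry.pronunciation}/")
    for n, sense in enumerate(entry.senses, start=1):
        fields = [f"{n})", sense.grammatical_category, sense.gloss]
        parts.append(" ".join(f for f in fields if f))
    if entry.etymology:
        parts.append(f"[{entry.etymology}]")
    return " ".join(parts)


run = Entry(
    headword="run",
    pronunciation="rʌn",
    senses=[
        Sense("move quickly on foot", "v"),
        Sense("flow (of a river)", "v"),
        Sense("unravel (of a stocking)", "v"),
    ],
)
print(render(run))
# run /rʌn/ 1) v move quickly on foot 2) v flow (of a river) 3) v unravel (of a stocking)
```

Keeping the grammatical category on each sense, as the chapter does, allows a headword whose senses belong to different categories to be described without duplicating the entry.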
5.2 Entry-level elements
Entry-level elements include headword, pronunciation, irregularly inflected forms, essential linguistic information, etymology and entry-level notes. Each entry has a headword that serves as (a) the title of the article in the published form and (b) the key element of the entry in the database. The headword is often referred to as the lexeme form or lexical unit when discussing the database, or as the citation form when discussing the presentation of the entry. Headwords take the form of:
● words: stems (bound or free), roots (bound or free), morphemes, inflected forms (for languages with bound roots)
● affixes: prefixes, suffixes, infixes, circumfixes, etc.
● function words: clitics, particles
● loan words
● proper nouns
● irregularly inflected forms, spelling and dialectal variants
● derived forms and compounds
● phrases and idioms
In the case of inflectable forms, it may not be obvious which inflected form to choose as the citation form. Newell (1995: 235) proposes several guidelines for choosing the best citation form: one that represents the basic meaning of the lexical unit, one that is easy to understand in isolation, one that occurs most frequently, one that is the least inflected and one that occurs in natural text.
A pronunciation is indicated by a phonetic representation of how the word is pronounced. It may also include an audio recording (for digital publications), a representation of the tone pattern or an indication of where the pronunciation is used (especially if more than one pronunciation is given). Normally the pronunciation is displayed immediately following the headword.
Irregularly inflected forms usually occur at the beginning of the entry. For example, in Bantu languages (De Schryver 2010: 172), nouns are usually listed in the singular. However, the plural form is normally included soon after the headword as well, because it is difficult or impossible to predict the plural form based solely on the form of the singular. Observations regarding grammatical irregularities, which Bartholomew and Schoenhals ([1983] 2019: 20) termed essential linguistic information, are usually included at the end of the entry.
Etymology tells the history of a word. It includes:
● the form in the original language
● the name of the original language
● the literal meaning in the original language.
For an academic audience, etymological information is presented early in the entry, whereas in a dictionary intended primarily for the language community, it is included towards the end of the entry.
5.3 Sense-level elements
Sense-level elements include ‘definitions or explanations of literal and figurative, denotative and connotative meanings’ (Singh 2010: 6), grammatical categories, and example sentences. Grammatical category is a syntactic category for elements that are part of the lexicon of a language. Other synonymous terms include part of speech, word class, grammatical class and lexical category. A group of words belong to the same grammatical category if they can occupy the same syntactic position in a sentence and take the same set of inflectional affixes. For example, in English, nouns are nouns because they can function as the subject or object of a sentence and can be inflected for number. Any word that has these qualities is therefore a member of the grammatical category noun. A good definition in a bilingual dictionary begins with a one- or two-word equivalent in the LWC, often called a gloss. This orients the dictionary user to the core concept that the
word represents. Modifying words or phrases are added to make the meaning more precise. The precision should be restrictive enough to exclude words with related meanings, but not so restrictive that only a part of the meaning is included. For example, a word that means rope made from buffalo hide cannot be adequately defined as rope or rope made from animal hide; they are too general. Nor can it be defined as rope made from the skin from a female buffalo, as that is too narrow (Van den Berg 2012: 12). An effective example in a bilingual dictionary includes a sentence in the vernacular language and its translation in the LWC. A good test of the quality of an example sentence is ‘whether many or very few words can replace the illustrated word’ (Van den Berg 2012: 21). In a good example sentence, very few words can replace the illustrated word. If there is more than one LWC, the translation in each of the LWCs is provided. If appropriate, a reference for the example may be provided, indicating the source (individual or text) of the example sentence. In addition to the grammatical category, definition and example sentences, other kinds of information provided at the sense level may include lexical relations, usage in the social context, images and various types of notes.
5.4 Other hierarchical elements
Other hierarchical elements include subsenses and subentries. Subentries are used to display lexical entries (or complex forms) that are derived from the headword of the main entry, while senses and subsenses are used to display different meanings of the same word. An example of subsenses in an English dictionary article is shown in Figure 17.11, where senses a, b, c and d are subsenses of sense 1. A subentry (normally an entry-level element) is a ‘unit in the lexical database representing a lexeme that is made up of more than one morpheme and is lexically related to one or more major entries’ (Subentry 2003). ‘Subentries offer an opportunity to display the grammatical and lexical structure of the vernacular in a slightly expanded fashion’ (Bartholomew and Schoenhals [1983] 2019: 171). In Figure 17.12, the senses appear first and are numbered; they are followed by subentries of the headword kasé. For example, kasé kanmawad is the head of a subentry derived from kasé. A subentry may be presented in its own paragraph and indented to make it clear that it is subsidiary to the main entry, or it may be displayed inline following the last sense, as illustrated in Figure 17.12.
Figure 17.11 Subsenses of senses (Job 2020).
Figure 17.12 Subentries vs. senses (Frank 2020, as cited in Stutzman, Warfel and Bryson 2020b: 30).
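The hierarchy described in this section (numbered senses, lettered subsenses, and subentries displayed inline after the last sense) can be sketched as a small data structure. This is only an illustrative sketch, not the data model of FLEx or any other tool: the class names, the `render` function and the bracketed display of subentries are assumptions, and the definition strings are placeholders rather than real glosses for kasé.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sense:
    """One meaning of a headword; may carry finer-grained subsenses."""
    definition: str
    subsenses: List["Sense"] = field(default_factory=list)


@dataclass
class Entry:
    """A main entry or a subentry (a subentry is itself entry-like)."""
    headword: str
    senses: List[Sense] = field(default_factory=list)
    subentries: List["Entry"] = field(default_factory=list)


def render(entry: Entry) -> str:
    """Render senses first (numbered), then subentries inline in brackets."""
    parts = [entry.headword]
    for number, sense in enumerate(entry.senses, start=1):
        parts.append(f"{number} {sense.definition}")
        for offset, subsense in enumerate(sense.subsenses):
            letter = chr(ord("a") + offset)  # a, b, c, ...
            parts.append(f"{letter} {subsense.definition}")
    for subentry in entry.subentries:
        parts.append(f"[{render(subentry)}]")
    return " ".join(parts)


# Placeholder content: the definitions are NOT real glosses.
entry = Entry(
    headword="kasé",
    senses=[
        Sense("<definition 1>",
              subsenses=[Sense("<subsense a>"), Sense("<subsense b>")]),
        Sense("<definition 2>"),
    ],
    subentries=[Entry("kasé kanmawad", senses=[Sense("<definition>")])],
)

print(render(entry))
```

Modelling a subentry as an `Entry` in its own right reflects the point made above that a subentry is normally an entry-level element, so it can carry its own senses.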
6 Mesostructure – cross-referencing of entries A dictionary entry may reference other entries that are related to it in some way. Trippel (2006: 41) defines this ‘interrelation between the lexicon entries’ as the mesostructure of the dictionary. In digital dictionaries the referencing may include a link that allows jumping to the referenced entry. The term cross reference is a general term that refers to any relationship (meaning-based or form-based) between two entries, such as:
● lexical relations (e.g. synonym, antonym, part-whole, scale)
● complex lexical forms (e.g. derived forms, compound words, idiomatic expressions)
● variants (e.g. a different spelling or pronunciation of the same word, or the same word in another dialect)
● generic cross references that do not fit into the other categories
A cross reference may originate at either the entry level or the sense level, and it may refer to either an entry or a particular sense of an entry. In Figure 17.13, the entry manje cross-references the entry papaye.
Figure 17.13 Cross references (Niggli 2016a, as cited in Stutzman, Warfel and Bryson 2020b: 27).
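The cross-reference model described above (four broad types, an origin at entry or sense level, and a target that is an entry or one of its senses) can be sketched as follows. This is a hypothetical illustration, not the schema of any particular lexical database: all names are assumptions, and the type assigned to the manje and papaye link is a guess, since the figure is not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class RefType(Enum):
    LEXICAL_RELATION = "lexical relation"
    COMPLEX_FORM = "complex form"
    VARIANT = "variant"
    GENERIC = "generic"


@dataclass(frozen=True)
class CrossReference:
    ref_type: RefType
    source_entry: str
    target_entry: str
    source_sense: Optional[int] = None  # None: originates at entry level
    target_sense: Optional[int] = None  # None: points at the whole entry


def outgoing(refs: List[CrossReference], headword: str) -> List[CrossReference]:
    """All cross references that originate in the given entry."""
    return [ref for ref in refs if ref.source_entry == headword]


def describe(ref: CrossReference) -> str:
    """Human-readable form, mentioning sense numbers only when present."""
    source = ref.source_entry
    if ref.source_sense is not None:
        source += f" (sense {ref.source_sense})"
    target = ref.target_entry
    if ref.target_sense is not None:
        target += f" (sense {ref.target_sense})"
    return f"{source} -> {target} [{ref.ref_type.value}]"


refs = [
    # Relation type assumed for illustration only.
    CrossReference(RefType.GENERIC, "manje", "papaye"),
    # Meaning-based relation, so it is anchored at the sense level.
    CrossReference(RefType.LEXICAL_RELATION, "good", "bad",
                   source_sense=1, target_sense=1),
    # Form-based relation, anchored at the entry level.
    CrossReference(RefType.VARIANT, "color", "colour"),
]

for ref in refs:
    print(describe(ref))
```

Keeping the sense numbers optional lets one record type cover both entry-level and sense-level references.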
A lexical relation is a ‘culturally recognized pattern of association that exists between lexical units in a language’ (Lexical Relation 2003). There are two kinds of lexical relationships: paradigmatic and syntagmatic (ELT Concourse nd.). These terms ‘are introduced by Saussure (1974) to distinguish two kinds of signifiers: one concerns positioning (syntagmatic) and the other concerns substitution (paradigmatic)’ (Chiu and Lu 2015). Some examples of paradigmatic lexical relations include:
● synonym (e.g. look, see, view)
● antonym (e.g. good - bad)
● part-whole relationship (e.g. a door is a part of a house)
● generic-specific relationships (e.g. a cat is a kind of animal)
Syntagmatic lexical relations are structured in sets of pairs. The two members of each pair (A and B) have compatible semantic components, are in a fixed syntactic and semantic relationship to each other and are typically associated with each other. For example, in English pilot-fly, pastor-preach and janitor-clean are pairs of words that express the relationship of agent-action. Teacher–student and doctor–patient are examples of a benefactor–beneficiary relationship. All these examples are meaning-based, so they are presented at the sense level of an entry. Complex lexical items are ‘strings of language in which more than one meaning-carrying element can be recognized yet which are very likely candidates to be stored as units in people’s linguistic repertoires’ (Mos 2010: 24). They are lexical units which are constructed from other lexical items. Examples include:
● compounds (e.g. blue + berry = blueberry)
● contractions (e.g. it + is = it’s)
● derivations (e.g. important + -ly = importantly)
● idioms (e.g. kick the bucket = die)
● sayings (e.g. a stitch in time saves nine)
They can be presented as subentries of the lexical items they are built from, as full main entries, or both. Variant forms – different ways of saying the same word, based on the speaker’s age, where they live, etc. – are included in the dictionary entry, with an indication as to when or where each is used. These lexical variants ‘differ from synonyms in that synonyms are different forms for the same concept, while lexical variants are different word forms for the same expression’ (NISO 2010: 45). Variant types include:
● dialectal variant (e.g. trunk – boot)
● spelling variant (e.g. color – colour)
● phonologically conditioned variant (e.g. a – an)
● irregularly inflected variant (e.g. take – took)
● free variant (e.g. it is – it’s)
Normally variants are displayed near the beginning of the entry, after the headword or the pronunciation.
7 Types of publication Once words have been collected and entered in a software database, there are many options for publication, depending on what level and/or type of detail is desired for a particular audience: wordlist, vocabulary, glossary, lexicon, dictionary, thesaurus, picture dictionary, historical dictionary or encyclopaedic dictionary. For example, in a FLEx database, it is possible to tag entries to be included in different publications, such as the main dictionary publication, a learner’s dictionary or a specialty dictionary of some kind, e.g. a glossary of flora and fauna. When a particular publication is exported from the database for publishing, only those entries tagged for inclusion in that specific publication are exported (see also Nielsen, Chapter 23). Until relatively recently, the only publishing media were tangible: clay tablets, papyrus, paper, etc. However, with the advent of the digital age new opportunities have arisen. Talking dictionaries are now a possibility, allowing for the incorporation of sound bites in a publication. Anderson and Harrison (2006) spearheaded the Talking Dictionaries website in 2006 as ‘exceptional digital tools to help preserve and learn words and phrases in endangered languages’. The Ethnos Project presented its first link to Talking Dictionaries in 2012, ‘Introduction: The Enduring Voices Project’s Talking Dictionaries’ (Oppenneer 2012). In comparison to print publication, digital publication is much easier and thus can be more frequent, with revised editions being possible at minimal cost and effort. ‘Publish early and often’ is a reality if one uses FLEx to publish on Webonary.org, quickly sharing lexical information around the world. As cell phones are now commonplace in even the most rural areas, publishing dictionaries in the form of downloadable apps is very popular, making them accessible to a wide audience, including many who previously couldn’t afford to buy publications in printed form.
Lexical data from FLEx can be imported into SIL’s Dictionary App Builder software, creating bilingual and multilingual dictionary apps for minority languages.
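The tagging mechanism described in this section, where each entry is tagged for one or more publications and an export includes only the entries tagged for the publication in question, amounts to a simple filter. The sketch below illustrates the idea only; it does not use or reproduce FLEx’s actual interface, and the class, function, publication names and sample headwords are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Entry:
    headword: str
    publications: FrozenSet[str]  # publications this entry is tagged for


def export(entries: List[Entry], publication: str) -> List[str]:
    """Return the headwords tagged for one publication.

    Sorting here is plain code-point order; a real dictionary would
    need the collation rules of the language concerned.
    """
    return sorted(e.headword for e in entries if publication in e.publications)


# Hypothetical database contents and publication names.
database = [
    Entry("cat", frozenset({"main", "learner", "flora-fauna"})),
    Entry("blueberry", frozenset({"main", "flora-fauna"})),
    Entry("door", frozenset({"main", "learner"})),
    Entry("importantly", frozenset({"main"})),
]

print(export(database, "main"))
print(export(database, "learner"))
print(export(database, "flora-fauna"))
```

Because the tags live on the entries, a single database can feed a main dictionary, a learner’s dictionary and a specialist glossary without duplicating any data.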
8 Conclusion In this chapter, we discussed the importance of compiling and publishing bilingual and multilingual dictionaries in minority languages. We described the process for compiling a dictionary in terms of word collection, entry development and platforms for publishing. We highlighted the most efficient method for collecting words, i.e. SIL’s Rapid Word Collection methodology. We demonstrated how SIL’s lexical database software, FieldWorks Language Explorer, is used effectively to accomplish this goal. We presented an overview of the macrostructure of a dictionary publication and the microstructure of a dictionary entry and showed how entries can be related to each other (mesostructure). Our hope is that all minority language communities will soon have access to bilingual or multilingual dictionaries, in order to take pride in their heritage and preserve their language.
Notes
1 A writing system is based on a script and a set of rules on how to use it.
2 https://www.webonary.org/.
3 MTB-MLE = Mother Tongue-Based Multilingual Education.
4 Rapid Word Collection methodology is described in the section Collecting Words.
5 As DLS Coordinator, Verna Stutzman receives a lot of requests from the linguistic community for access to lexical data that SIL members have collected over the years.
6 http://semdom.org/.
7 RWC workshops use semantic domain questionnaires like this: http://semdom.org/v4/1.6.1.
8 Located on Lou Island, Manus province, Papua New Guinea.
9 V syllable contains one vowel; CV syllable contains a consonant, followed by a vowel.
10 https://software.sil.org/fieldworks/.
11 Shoebox, Field Linguist’s Toolbox, LinguaLinks, WeSay, and FLEx.
12 https://home.unicode.org/ or http://unicode.org/main.html.
13 XML stands for Extensible Markup Language. It is a text-based markup language derived from Standard Generalized Markup Language (SGML), https://www.tutorialspoint.com/xml/xml_overview.htm.
14 http://www.lexiquepro.com/.
15 https://software.sil.org/phonologyassistant/.
16 https://paratext.org/.
17 http://software.sil.org/fieldworks/support/companion-tools/.
18 https://groups.google.com/g/flex-list.
19 https://groups.google.com/g/webonary-list.
20 Send/Receive transmits data and merges changes done by other collaborators.
21 http://software.sil.org/fieldworks/support/using-sendreceive/.
22 https://thecombine.languagetechnology.org/.
23 https://languageforge.org/.
24 Alphabet & Abjad = 5.9B; Syllabary & Abugida = 1.65B; Logographic 1.3B, https://www.worldatlas.com/articles/the-world-s-most-popular-writing-scripts.html.
25 https://www.internationalphoneticalphabet.org/.
26 https://software.sil.org/dictionaryappbuilder/.
References
About SIL (2020), sil.org https://www.sil.org/about [accessed 10 June 2020].
Anderson, G.D.S. and K.D. Harrison (2006), ‘Talking Dictionaries’. https://talkingdictionaries.app/ [accessed 21 July 2020].
Baines, D. (2018), ‘An overview of FieldWorks and related programs for collaborative lexicography and publishing online or as a mobile app’ in J. Čibej, V. Gorjanc, I. Kosem and S. Krek (eds), Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana, Slovenia, 17–21 July 2018, Ljubljana: Ljubljana University Press, 953–8. Available at https://euralex.org/publications/an-overview-of-fieldworks-and-related-programs-for-collaborative-lexicography-andpublishing-online-or-as-a-mobile-app/ [accessed 20 July 2020]. Modified version also available online: https://www.sil.org/resources/archives/76085 [accessed 20 July 2020].
Bartholomew, D.A. and L.C. Schoenhals (1983/2019), Bilingual Dictionaries for Indigenous Languages, Second edition, Ed. Thomas L. Willett, Instituto Lingüístico de Verano, A.C. https://www.sil.org/system/files/reapdata/13/15/67/131567599402703622459842646184418180277/BDIL_2nd_ed_elec.pdf [accessed 1 June 2020].
Birch, B. (2012), ‘New phone app launched for the Ma! Iwaidja Dictionary’, SIL.org. https://www.sil.org/about/news/new-phone-app-launched-ma-iwaidja-dictionary [accessed 15 June 2020].
Boerger, B. and V. Stutzman (2018), ‘Single-event Rapid Word Collection workshops: Efficient, effective, empowering’, Language Documentation & Conservation 12, 147–93. http://hdl.handle.net/10125/24766 [accessed 1 June 2020].
Cahill, M., G. Stairs and E. Stairs (2003), ‘A Dictionary and Scripture Use in Huave’, Word & Deed 2 (2), 77–9. Sil.org. https://www.reap.insitehome.org/handle/9284745/48425 [accessed 6 July 2020].
Captain, D. and L. Captain (eds) (2019), Wayuu - English - Spanish Dictionary, Webonary.org. SIL International. https://www.webonary.org/wayuu/browse/browse-english/ [accessed 21 July 2020].
Chiu, W. and K. Lu (2015), ‘Paradigmatic relations and syntagmatic relations: How are they related?’ Proceedings of the Association for Information Science and Technology 52 (1), 1–4. https://asistdl.onlinelibrary.wiley.com/doi/full/10.1002/pra2.2015.1450520100122 [accessed 17 August 2020].
Clark, P. (2012), ‘Rapid Word Collection: The Buli Experience’, Arca Associates: Sandema, Ghana. Vimeo.com. https://player.vimeo.com/video/44131617 [accessed 9 June 2020].
Cotton, J., J. Pickering and J. Davis (1707/1829), Vocabulary of the Massachusetts (or Natick) Indian Language, Cambridge, MA: s.n. Hathitrust.org. https://catalog.hathitrust.org/Record/100280131 [accessed 23 June 2020].
De Schryver, G.-M. (2010), ‘Revolutionizing Bantu Lexicography — A Zulu Case Study’, Lexikos 20 (AFRILEX-reeks/series), 161–201. https://pdfs.semanticscholar.org/dbd2/0123806a2adea7eba8e14b1576b7810aff36.pdf [accessed 17 August 2020].
Dictionary (2020), Lexico.com. https://www.lexico.com/en/definition/dictionary [accessed 26 May 2020].
Echerd, S. (2019), ‘Preserve most of the core words in a language: Using an extensive “List of Semantic Domains,” a Lexical Database, and the “Rapid Word Collection” method’, SIL International, Unpublished manuscript.
ELT Concourse Teacher Training (nd), Lexical Relationships. https://www.eltconcourse.com/training/inservice/lexicogrammar/lexis_relationships.html [accessed 17 August 2020].
Frank, D.B. (ed.) (2020), Kwéyòl Dictionary, Dallas: SIL International. Webonary.org. https://www.webonary.org/kweyol/g887d37a8-e3e4-462d-ac93-e2d1b865915a/ [accessed 25 May 2020].
Grimes, J.E. (1986), ‘Book Review: Bilingual Dictionaries for Indigenous Languages’, International Journal of American Linguistics 52 (4), 422–5. www.jstor.org/stable/1265542 [accessed 18 May 2020].
Hüllen, W. (1999), English Dictionaries, 800–1700: The Topical Tradition, New York: Oxford University Press.
Job (2020) in Merriam-Webster Dictionary, Merriam-Webster.com. https://www.merriam-webster.com/dictionary/job [accessed 28 July 2020].
Johnson, S. (1755), A Dictionary of the English Language, London: Consortium.
Kindberg, E. (2002), ‘Dictionary Turns the Tide’, Word & Deed 1 (3): 9–11. Sil.org. http://www.reap.insitehome.org/handle/9284745/48340 [accessed 8 June 2020].
Kunze, C. and L. Lemnitzer (2004), ‘Computational Lexicography’, Seminar für Sprachwissenschaft, Universität Tübingen. http://milca.sfs.uni-tuebingen.de/B2/Textbook/, Last updated 25 July 2004. http://milca.sfs.uni-tuebingen.de/B2/Textbook/DictStruct/MiLCA_COLEX_DictStruct-03.xhtml [accessed 10 June 2020].
Language Education (2016), Department of Higher Education, Ministry of Human Resource Department, Government of India. Available at https://mhrd.gov.in/language-education-4 [accessed 21 July 2020].
Language Technology Standards (2020), SIL.org. https://www.sil.org/language-technology/standards [accessed 21 July 2020].
Lexical Relation (2003) in SIL Glossary of Linguistics Terms, LinguaLinks Library, version 5.0, Dallas, TX: SIL International. https://glossary.sil.org/term/lexical-relation [accessed 28 July 2020].
List of Revived Languages (2020), Wikipedia.org. https://en.wikipedia.org/wiki/List_of_revived_languages [accessed 28 May 2020].
Lopez, R.V. and G.A. Broadwell (2013), Copala Triqui – Spanish – English Dictionary, New York: University at Albany, State University of New York (Albany Triqui Working Group). Available at https://www.webonary.org/copalatriqui/browse/browse-vernacular-english/ [accessed 21 July 2020].
Misachi, J. (2017), ‘The World’s Oldest Dictionaries’, Worldatlas.com. https://www.worldatlas.com/articles/the-world-s-oldest-dictionaries.html [accessed 9 June 2020].
Moe, R. (2007), ‘Dictionary Development Program’, SIL Forum for Language Fieldwork. https://www.sil.org/system/files/reapdata/74/26/20/74262098499698801429138168340553232050/SILForum2007_003.pdf [accessed 9 June 2020].
Mos, M.B.J. (2010), Complex Lexical Items, Utrecht: LOT. https://www.lotpublications.nl/Documents/246_fulltext.pdf [accessed 21 July 2020].
Newell, L.E. (1995), Handbook on Lexicography for Philippine and Other Languages, Manila: Linguistic Society of the Philippines.
Niggli, U. (ed.) (2016a), Dioula – French – English Dictionary, Dallas: SIL International. https://www.webonary.org/dioula-bf/ [accessed 24 July 2020].
Niggli, U. (ed.) (2016b), Kusaal–French Dictionary, Dallas: SIL International. https://www.webonary.org/kusaal-bf/ [accessed 24 July 2020].
NISO (2010), Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, Baltimore: National Information Standards Organization. Available at https://groups.niso.org/apps/group_public/download.php/12591/z39-19-2005r2010.pdf [accessed 21 July 2020].
Omakaeva, E. (2019), ‘Jangar Heroic Epic in Mirror of Language, Culture, Information Technology: Thematic Dictionary’, SCTCMG 2019 – Social and Cultural Transformations in the Context of Modern Globalism, 2523–8. https://www.researchgate.net/publication/338215030_Jangar_Heroic_Epic_In_Mirror_Of_Language_Culture_Information_Technology_Thematic_Dictionary [accessed 15 June 2020].
Oppenneer, M. (2012), Talking Dictionaries, Ethnos Project. https://www.ethnosproject.org/talkingdictionaries/ [accessed 21 July 2020].
The Oxford English Corpus (2020), Last modified on 28 April 2020. https://www.sketchengine.eu/oxfordenglish-corpus/ [accessed 22 June 2020].
Peoples, J. and G. Bailey (1988/2018), Humanity: An Introduction to Cultural Anthropology, Eleventh edition, Boston, MA: Cengage Learning.
Policy Guidelines on the K to 12 Basic Education Program (2019), Republic of the Philippines, Department of Education. https://www.deped.gov.ph/wp-content/uploads/2019/08/DO_s2019_021.pdf, Appendix 2 [accessed 21 July 2020].
Pwaka, S., S. Keliwin, R. Stutzman and V. Stutzman (2013), Lou-English Dictionary, SIL International. https://www.webonary.org/loudictionary/ [accessed 21 July 2020].
Quinault Indian Language (1998–2015). http://www.native-languages.org/quinault.htm#:~:text=Quinault%20Indian%20Language,keep%20their%20ancestral%20language%20alive [accessed 14 July 2020].
Roberts, J.S. and K.L. Snider (2006), ‘SIL Comparative African Wordlist (SILCAWL)’. https://www.sil.org/resources/publications/entry/7882 [accessed 9 June 2020].
Roberts, J.R., R. Hedinger and R. Gravina (2014), Dictionary Making, European Training Programme. https://drive.google.com/file/d/1QicPK0Q-5ycq05_noRV88HTeSEZWwRxy/view [accessed 9 June 2020].
Rodríguez, M. Del Rosario Caballero (2016/2019), Lexicographic Tools. A Course Book (2ª edición corregida y aumentada), Ediciones de la Universidad de Castilla La Mancha.
Saussure, F. de (1916/1974), Course in General Linguistics, London: Fontana.
Semantic Domains (2014), SIL.org. http://semdom.org/ [accessed 9 June 2020].
Sense (2020) in Merriam-Webster Dictionary. https://www.merriam-webster.com/dictionary/sense [accessed 28 July 2020].
Silversmith, S. (2016), ‘This lost native language of Massachusetts is waking up again’, The World. Available at https://www.pri.org/stories/2016-12-29/lost-native-language-massachusetts-waking-again [accessed 17 August 2020].
Singh, P. (2010), ‘Dictionary and its structure’, ANUŚĪLANA: Research Journal of Indian Cultural, Social, and Philosophical Stream XXIV (3). Available at https://www.academia.edu/2420534/Dictionary_and_Its_Structure [accessed 6 July 2020].
Stutzman, V., K. Warfel and B. Bryson (2020a), ‘Dictionary-Making & Lexicography Tools’ in V. Stutzman, K. Warfel and B. Bryson, Dictionary-Making and Lexicography Course, Dallas, TX: SIL Dictionary and Lexicography Services. Available at https://sites.google.com/sil.org/dls-course/lessons/a-theoretical-foundations/dictionary-making-and-lexicography-tools [accessed 20 July 2020].
Stutzman, V., K. Warfel and B. Bryson (2020b), ‘Overview of an entry’ in V. Stutzman, K. Warfel and B. Bryson (eds), Dictionary-Making and Lexicography Course, Dallas, TX: SIL Dictionary and Lexicography Services. Available at https://sites.google.com/sil.org/dls-course/lessons/c-introductionto-entries/overview-of-an-entry [accessed 20 July 2020].
Stutzman, V., K. Warfel and B. Bryson (2020c), ‘Structure of a dictionary publication’ in V. Stutzman, K. Warfel and B. Bryson, Dictionary-Making and Lexicography Course, Dallas, TX: SIL Dictionary and Lexicography Services. Available at https://sites.google.com/sil.org/dls-course/lessons/atheoretical-foundations/structure-of-a-dictionary-publication [accessed 20 July 2020].
Stutzman, V., K. Warfel and B. Bryson (forthcoming), ‘Word-Collection Strategies’ in V. Stutzman, K. Warfel and B. Bryson, Dictionary-Making and Lexicography Course, Dallas, TX: SIL Dictionary and Lexicography Services.
Subentry in a Lexical Database (2003) in SIL Glossary of Linguistics Terms, LinguaLinks Library, version 5.0. Dallas, TX: SIL International. https://glossary.sil.org/term/subentry-lexical-database [accessed 28 July 2020].
Tang, W.M. (2018), Form and Meaning in Linguistics. https://wmtang.org/2018/01/25/form-and-meaningin-linguistics/#:~:text=Form%20in%20linguistics%20and%20language,different%20meanings%20in%20different%20contexts [accessed 14 July 2020].
Trippel, T. (2006), ‘The lexicon Graph Model: A generic model for multimodal lexicon development’, Fakultät für Linguistik und Literaturwissenschaft, Universität Bielefeld. https://d-nb.info/999767062/34 [accessed 18 June 2020].
Trippel, T., M. Maxwell, G. Corbett, C. Prince, C. Manning, S. Grimes and S. Moran (2007), ‘Lexicon Schemas and Related Data Models: When Standards Meet Users’. http://www.lrec-conf.org/proceedings/lrec2008/pdf/812_paper.pdf [accessed 18 June 2020].
Types of animals (2014), Semdom.org. http://semdom.org/v4/1.6.1 [accessed 26 May 2020].
Van den Berg, R. (2012), Making dictionaries: A course for national translators, Ukarumpa: SIL Papua New Guinea.
Warfel, K. and V. Stutzman (2020), Rapid Word Collection Training Manual, Revised edition, Dallas, TX: SIL International Dictionary and Lexicography Services. Available at https://drive.google.com/file/d/1BA7h39hXUKi1ZtWTBQCG2B_rjyIoUjeG/ [accessed 21 July 2020].
Webonary – Dictionaries and Grammars of the World (2013–20), https://www.webonary.org/ [accessed 26 May 2020].
Wôpanâak Language Reclamation Project (2020), http://www.wlrp.org/project-history.html [accessed 22 June 2020].
18
Aspects of multi-word expressions in Asian lexicography Vincent B. Y. Ooi, Ai Inoue, Kilim Nam and Cuilian Zhao
1 Introduction Multi-word expressions (MWEs) are phraseological units that go beyond the single word and have some degree of idiomatic meaning. The importance of MWEs is well known: they represent an ‘extensive part of the mental lexicon of native speakers’ and have various practical applications in lexicography and elsewhere (see Gantar et al. 2018). While the treatment of MWEs in the dictionary naturally would depend on the language concerned, there have been attempts to propose a more universal classification for MWEs in dictionaries that range from an ‘idiom’ to an ‘idiomatic multiword combination with an old inflexion’ (see Bergenholtz and Gouws 2013: 13–17).1 However, Atkins and Rundell (2008: 167) do note that MWE boundaries are also ‘so fluid’: ‘it has proved impossible to establish watertight criteria for lexicographers to apply in dealing with multiword items.’ Notwithstanding this observation, Atkins and Rundell (2008: 359) also argue that MWEs in English consist mainly of ‘idioms, collocations, phrasal verbs, compounds and support verb constructions’. For the different languages in this chapter, MWEs can be taken to involve some formulaic, metaphorical or semi-prefabricated expressions that contain two or more words (extending to the sentence or longer) whose meaning is native to the language concerned. The following sections highlight some central issues for the treatment of MWEs in Chinese, Japanese, Korean and Malay lexicography. While it is clear that the term ‘Asian lexicography’ extends to these four Asian languages that we have first-hand familiarity with respectively, it also hopefully draws the reader’s attention to the vast range of Asian lexicography that remains to be documented and studied more extensively.
2 MWEs in Chinese: Introduction Multi-word or multi-character expressions in the Chinese-English Dictionary (Unabridged) (henceforth the CEDU) fall into several categories, ranging from shorter expressions such as 词 (usually two-character word, e.g. 文化 ‘culture’) and 成语 (four-character idiom, e.g. 文山会海 ‘mountains of documents and oceans of meetings’), to longer expressions such as 俗语 (common saying, e.g. 吃一堑, 长一智 ‘a fall in the pit, a gain in the wit’) and 歇后语 (two-part allegorical saying, e.g. 牛鞅子架在马身上——乱了套 (‘an ox yoke around the neck of a horse – is a misplaced halter) everything is complete chaos [pun on 乱套]’). As with phraseological units in English and many other languages, MWEs in Chinese usually go beyond the literal meanings of their component characters and can carry highly abstract messages. In this part of the chapter, we will focus on the principles and treatment of the MWEs in the ongoing project CEDU from the perspectives of users’ cognitive needs, mental lexicon representation, the evolution of senses and culture.
Ai Inoue’s research for this chapter has been made possible by the Grant-in-Aid for Scientific Research (C) (Grant number 20K00674), and she would like to thank the Japan Society for the Promotion of Science.
2.1 Guiding principle In the preface to the CEDU (Vol. 1), the late editor-in-chief Lu Gusun laid down the guiding principle of the dictionary: ‘descriptivism with a grain of salt’, which in essence indicates the limitations of the dictionary. In fact, descriptivism does not mean ‘anything goes’, and the lexicographer is not one of those ‘whateverists’ who recruit whatever they may come across in a given language. On the contrary, it is imperative that the inclusion of dictionary materials undergoes a selective process pertaining to, at least, the orientation of the lexicographers’ and users’ values. When it comes to the treatment of MWE entries in the dictionary, the guiding principle applies in such a way that senses and illustrative examples, the two most important components of the entry, are represented to facilitate users’ understanding (decoding) and use (encoding).
2.2 Theoretical framework The rationale behind the treatment of MWEs in the CEDU is based on two theoretical constructs: the bilingual mental lexicon and the depth of processing framework. A word in the mental lexicon consists of at least two levels: the lexical level (spelling, pronunciation, etc.) and the conceptual level. According to de Groot et al. (1995), the L2 learner’s mental representation of a lexical item develops from lexical connection to conceptual mediation as their L2 proficiency grows (see Figure 18.1). The levels or depth of processing framework (Craik and Lockhart 1972) holds that the nature and duration of the memory trace are determined by the level or depth at which the input is processed. Inputs that receive only superficial analysis are assumed to be more poorly retained than inputs subjected to deeper semantic analyses. Greater depth implies a greater degree of semantic or cognitive analysis. Recognizing a lexical item may trigger ‘elaboration coding’ such as associations, images or stories from one’s experience. The two constructs, tapping into the storage, processing and retention of information of the language user, are the basis for the principles of MWE treatment in the CEDU. The levels of bilingual mental lexicon representation provide a theoretical basis for a systematic representation of MWE information in the dictionary, and the depth of processing framework gives us fascinating insights into the inclusion of linguistic and non-linguistic information, and even perhaps the appropriate proportion of each part, in the dictionary entry. In what follows, we will look into the treatment of MWEs in the CEDU, with a focus on how a network is subtly woven inside and beyond the senses.
Figure 18.1 Development of the bilingual Lexicon.
2.3 Networking the senses of MWEs in the CEDU As with contemporary practice, the most frequently used (or consulted) sense in the CEDU is usually given the first sense status in the entry, but it is not necessarily the first to appear in the evolutionary chain of the senses. In a language with a long history of development, the original meaning of an expression might have long faded out for lack of use, so that the dictionary might even fail to record it. Yet, to facilitate understanding on the part of the user, especially the L2 learner who has little knowledge of the language, it is necessary to build a network pertaining to the evolution or relations among the senses.
2.3.1 Inclusion of existing or early sense(s) to account for novel usage With the development of modern means of communication, new words or meanings catch on and swiftly go viral. After a period of time in use, they become foregrounded in the user’s mental lexicon and stabilize as a concrete sense. In addition, it is possible that they will continue to be in use and are therefore safe enough to be included in the dictionary, as is the case with the second sense of 狗血 and the third sense of 流量 illustrated below. Yet the literal sense of 狗血 is given first, which, as a superstitious practice, prepares the user for the dramatic novel usage. In the case of 流量, the existing senses (1 and 2) reveal the Chinese way of referring to data – a metaphorical use mapping the amount of data to water or traffic flow.
狗血 gǒu xuè 1 dog’s blood [to be sprinkled in a haunted place to exorcize demons] 2 overly and incredibly theatrical; utter bosh: 电影的~剧情 the movie’s trite and overly contrived plot | ~剧 an OMG play …
流量 liú liàng 1 rate of flow; discharge; flux … 2 volume of flow; traffic … 3 [Comp] (amount of) data: ~使用量/需求/转存 data usage/demand/rollover | ~ 套餐 a data bundle/plan …
Senses from ancient texts, which are not in current use but are still found in historical records or classical literature or are embodied as sense-components in multi-character units, are included in the CEDU. For example, the 24th and 25th senses of the entry 明 (illustrated below) are rarely used independently in modern Chinese, but are found in MWEs such as 明水 (clean water [used in sacrificial services]; sacrificial water), 明器 (funerary or burial object), 神明 (gods), and other MWEs still in use. For the sake of tracing the ‘origin and evolution’ or the reason for word meanings, it is necessary to retain such ‘root meanings’ in the dictionary entry.
明 míng … ►24 clean (sacrificial offering) see 明水 /- shuǐ/, 明器 /- qì/ 25 god: ~祀 sacrificial offering for gods see also 神明 /shén -/ …
Without a proper knowledge of the original meanings of the components in MWEs, amateur dictionary users might go astray or even be misinformed when trying to figure out the meanings of individual characters in the MWEs.
2.3.2 Tracing etymology A multi-word expression might have its origin in historical events, literature, dialects, social events, scientific and technological developments, language contact, etc. In the CEDU, the origin of an MWE entry is invoked when a literal explication of the constituent characters seems to be disconnected from its (extended or figurative) sense(s). For instance, the four-character idiom 隼集陈庭 means ‘to have encyclopaedic knowledge; to be erudite with a retentive memory’. Its literal or word-for-word interpretation is ‘a falcon (隼) perches (集) in Chen’s (陈) courtyard (庭)’. In between the literal and the figurative there is a gap that has to be bridged, for otherwise even the uninformed Chinese mind might find it hard to establish a logical connection. Such a gap can be filled by invoking or restoring the etymology. The idiom alludes to a story about Confucius recorded in 《国语》 Narratives of the States. When Confucius stayed at Marquis Chen’s, a falcon wounded by an arrow perched and died in Chen’s courtyard. When Chen’s men approached Confucius to enquire about the arrow, he recognized it and told them about its owner. In the CEDU, the etymology is condensed into a line in the parentheses to serve the purpose:
隼集陈庭 sǔn jí chén tíng (to recognize the arrow when a falcon perching in the court of marquis Chen dies of it) to have encyclopaedic knowledge; to be erudite with a retentive memory
When it comes to such MWEs as loans or dialects, etymological information is given in square brackets.
便当 biàn dāng lunch box [from the Japanese word bento (弁当)]
门槛精 mén kǎn jīng (Shanghai) clever, shrewd [said to be a transliteration of ‘monkey king’]
2.4 Acculturation of illustrative examples
As Wittgenstein (1953: 31) maintains, ‘meaning is use’. The meaning of a word lies in its use. In addition to theorizing meanings of words or expressions, linguists such as Firth (1957) and Grice (1975) are much concerned with the meanings of sentences or utterances. The idea is that the sentence or utterance has meaning not only at the semantic level, but also at the pragmatic level, the latter of which is mainly realized in its environment of use, that is, its context. If we regard the dictionary definition as a linguistic context, then the example serves as a situational context for the meaning of the word. There is a co-extension relationship between the situational context and the linguistic context: the latter is generated from the former, and again acts on the former. For instance, in the entry 麻烦, the situational contexts of the examples illustrate the linguistic contexts of the definitions and extend the domain of application for the linguistic context. Typical expressions such as 麻烦了, 不怕麻烦 supplement the definitions with ‘what is left unsaid’, thereby creating a more complete situational context than otherwise (see Table 18.1).

The translation of examples constitutes an important part of bilingual lexicography. Yet it is not easy for the compiler to overcome barriers between languages of divergent cultures. The late Professor Lu Gusun once likened the process of finding a corresponding translation to ‘bridging’ and ‘arriving’: [translation is] starting from one language and ‘arriving’ on the opposite bank of another language … It is simply impossible to locate a corresponding point by crossing the bridge itself, let alone ‘arriving’ (Lu 2012). Inspired by Professor Lu’s view, compilers of the CEDU have adopted a ‘beyond equivalence’ strategy when translating, maximizing the example’s cultural fields of application in the target language (Lu et al. 2015: 5). For instance, the saying 攀得高, 跌得重 is mapped on to the classic finance saying ‘the bigger they are, the harder they fall’, both sharing a conceptual (metaphorical) representation (see Figure 18.2):
Table 18.1 Linguistic and situational contexts for 麻烦 má fan (sense 1).
Definition (linguistic context): troublesome; bothersome
Example (situational context): ~了, 我把钥匙锁在屋里了 what a nuisance; I’ve locked my key in the room | 手续很~ it is very aggravating to go through the formalities | 他们不怕~, 服务得很周到 they spared no pains to give good service
Figure 18.2 Representation of acculturated chunks.
Table 18.2 Acculturated example translation.
Definition (linguistic context): 玫瑰 méi guī 1 [Bot] rugosa or hedgerow rose; Rosa rugosa
Acculturation (situational context): 予人~, 手留余香 fragrance stays in the hand that gives the rose

Definition (linguistic context): 认责 rèn zé to claim or accept responsibility
Acculturation (situational context): 事成居功者众, 事败~者寡 success has many fathers, but failure is an orphan

Definition (linguistic context): 饮食 yǐn shí 1. to eat and drink 2. food and drink; diet
Acculturation (situational context): 1. ~无度 binge eating 2. 健康/均衡的~ a healthy/balanced diet | 时尚~ a fad diet
Table 18.2 contains a selection of acculturated translations for examples in the CEDU. These ready-to-use expressions with their respective authentic ‘acculturations’ provide a closer ‘anchorage’ for the translations in the target situational context. Hopefully, they will be helpful for the dictionary user’s understanding and acquisition.
3 MWEs in Japanese: Introduction
This section aims to show that Japanese learners of English produce discrepancies between input and output when they handle English and Japanese multi-word expressions (MWEs). The discrepancies seem to arise because the learners’ input and output are unconsciously influenced by MWE usage in their first language and by the associations it carries. English idioms are hard for Japanese learners of English to understand and use, so English–Japanese dictionaries for learners (EJDLs) have an important role to play in helping these learners understand and use English idioms accurately. Thanks to advances in large-scale computer corpora, recently published EJDLs tend to describe the actual grammatical and lexical behaviour of MWEs accurately. However, EJDLs fail to describe correctly the linguistic and cultural features that each idiom potentially has. Descriptions of MWEs in EJDLs also need improvement so that they help prevent a ‘lost in translation’ situation when Japanese learners of English use MWEs under the unconscious influence of Japanese idiom usage and Japanese culture. The reverse also holds: the original meanings of Japanese idioms are lost in translation when they are rendered into English, because English culture and English ways of thinking unconsciously influence the translation.
3.1 Idioms – lost in translation and lack of Englishness
English is particularly rich in idioms derived from the domain of sailing, which is hardly surprising in light of England’s long history as a seafaring nation. Japanese idioms, on the other hand, are not concentrated in any such domain and tend to be used across various domains. They also often contain a body part or an animal in order to describe a reaction to a situation adequately and briefly. For example, Japanese people have a negative image of rabbits because rabbits misbehave in famous folk stories, whereas rabbits carry no such image in English. Turtles, by contrast, leave a favourable impression on Japanese people because they are said to bring good fortune.

First, Japanese postgraduate students who were learners of English and had studied English phraseology for a year were asked to translate the idioms shown in (1) (italicized by the author) into Japanese, without being given any explanation of the idioms’ meanings. Almost none of the L2 learners could translate the idioms correctly. The Japanese translation of each sentence in (1) is shown in (2) below.

(1)
a. The actor shook his head at the offer to play a leading role in a play.
b. The new personnel has such a loose tongue that we cannot tell him our crucial matters.
c. All employees’ jaws drop to the floor because of his outrageous words and actions.
d. It is said that this charm is a rabbit’s foot.
e. Government employees should conduct themselves by the book.

The meanings, origins, syntactic features, etc. of the idioms used in each sentence were then explained in detail to the L2 learners as follows. Shake one’s head in (1a) is generally translated as 首を横に振る (kubi wo yoko ni furu; literally ‘shake crossly one’s neck’, where kubi means ‘neck’), whereas English uses head rather than neck. The idiom have a loose tongue in (1b) is equivalent to 口が軽い in Japanese (kuchi means ‘mouth’); as in (1a), English uses a different body part. The idiom jaws drop to the floor in (1c) was used by Justin Pierre James Trudeau, the Canadian prime minister, to express astonishment at what then-US president Donald Trump did when he talked with other countries’ prime ministers and presidents, as in His teams’ jaws dropped to the floor. The idiom is used to say that someone is very surprised and shocked, and it is not included in major EJDLs, which merely state that jaw is used to show surprise or disappointment.
In the case of (1d), a rabbit’s foot is not included in EJDLs, although in English a rabbit’s foot has long been said to bring good luck. By the book in (1e) is explained in EJDLs as correctly following rules or systems for doing something in a strict way, and it can be replaced by according to the book.

A couple of weeks later, and without any advance notice, the L2 learners were again asked to translate the idioms in (1) into Japanese and to explain them. The ‘unconscious influence’ of the native language was then analysed from linguistic and cultural standpoints. Table 18.3 shows the expressions produced by the postgraduate students.

Table 18.3 The translation of familiar English idioms into Japanese by L2 learners.
(1a) Equivalent Japanese idiom: 首を横に振る; translated by L2 learners as: 頭を振る (頭 (atama) = head) (i.e. shake one’s head)
(1b) Equivalent Japanese idiom: 口が軽い; translated by L2 learners as: no answer
(1c) Equivalent Japanese idiom: あんぐり口を開ける; translated by L2 learners as: あごが外れる (i.e. dislocate one’s jaws)
(1d) Equivalent Japanese idiom: 幸運をもたらす; translated by L2 learners as: no answer
(1e) Equivalent Japanese idiom: 規則に従って; translated by L2 learners as: 本に書いてあるように (i.e. as the book writes)

Although the idioms had been explained a couple of weeks earlier, the results show that the L2 learners could not translate the English idioms into Japanese correctly. In (1a), the L2 learners translated the English idiom literally, so they wrote the body part used in English (head) rather than the one used in the equivalent Japanese idiom (neck). In (1b), the L2 learners did not recall any Japanese idiom equivalent to have a loose tongue, so they could not translate it. In (1c), the L2 learners wrongly translated jaws drop to the floor as あごが外れる (lit. dislocate one’s jaws), which made no sense in the context. One L2 learner translated the idiom in (1c) as 笑いすぎてあごが外れる (lit. dislocate one’s jaws because of too much laughing). あごが外れる has two meanings in Japanese: one is literal, and the other is idiomatic, expressing that something is so much fun that one almost dislocates one’s jaw from laughing too much. The L2 learners did not translate (1c) correctly because Japanese has no semantically equivalent idiom built on jaws; the corresponding Japanese expression uses mouth instead (e.g. あんぐり口を開ける, lit. a mouth wide open with surprise, astonishment, etc.), so it is hard for Japanese speakers to deduce the meaning of jaws drop to the floor from jaws. In (1d), the L2 learners could not understand or translate the idiom because, as explained earlier, rabbits have long been considered evil on account of Japanese folk stories. This led to the lack of an answer for a rabbit’s foot. As for (1e), the L2 learners took book in its literal meaning, so they could not translate (1e) correctly. Consequently, Japanese ways of thinking and Japanese culture subconsciously influence the translation of English idioms into Japanese.

A few weeks after activity (1), and again without any advance notice, the same L2 learners were given the converse activity: they were asked to translate the sentences shown in (2) into English. The sentences in (2) are Japanese translations of those in (1), and the italicized Japanese phrases in (2) correspond to the English idioms italicized in (1).

(2)
a. その俳優は、舞台での主役の申し出に対して首を横に振った。
b. 今度の新入社員は、口が軽いので重要なことは話せない。
c. 彼の非常識な言動により、職場の全員があんぐり口を開けた。
d. このお守りは、幸運をもたらすと言われている。
e. 国家公務員は、規則に従って行動すべきだ。

The responses of the L2 learners to (2) are summarized in Table 18.4.

Table 18.4 The translation of familiar Japanese idioms into English by L2 learners.
(2a) Equivalent English idiom: shake one’s head; written by L2 learners: nod/shake/wave his/her neck (=首)
(2b) Equivalent English idiom: have a loose tongue; written by L2 learners: cannot keep a secret, be not discreet
(2c) Equivalent English idiom: jaws drop to the floor; written by L2 learners: open a mouth (=口) with surprise
(2d) Equivalent English idiom: a rabbit’s foot; written by L2 learners: bring good luck
(2e) Equivalent English idiom: by the book; written by L2 learners: according to the rule/the book, by the book

Table 18.4 shows three tendencies in the L2 learners’ translation of Japanese idioms into English: (i) Japanese idioms are translated literally with words completely different from those used in the English idiom, as in (2d); (ii) Japanese idioms are translated literally, as in (2a, c); and (iii) Japanese idioms are translated into semantically similar English expressions or idioms, such as cannot keep a secret, be not discreet and according to the rule, as in (2b, e). The three tendencies are due to the influence of Japanese idioms, of Japanese ways of thinking, and of Japanese culture. In (2d) in particular, the learners did not know or could not recall that rabbits are regarded as bringing good luck in English, so they used completely different words in place of a rabbit’s foot. In (2a, c), as the words marked (=首) and (=口) in Table 18.4 show, the L2 learners kept the body part of the Japanese idiom instead of using the body part of the corresponding English idiom. This reflects the influence of the Japanese idioms: the L2 learners do not know that the body parts used in Japanese idioms differ from those used in English idioms. In (2a), 首を横に振る (kubi wo yoko ni furu; yoko means crossly; lit. shake crossly one’s neck, i.e. shake one’s head) is translated literally with neck; the L2 learners did not switch to the different body part, head, when they translated it into English. Similarly, the well-known Japanese idiom 首を縦に振る (tate means longwise; i.e. nod one’s head) is translated into shake longwise one’s head. Hence, in the Japanese idioms (首を横に振る, 首を縦に振る) it is important to express the direction in which the subject moves his/her head, whereas in the English idioms it is important to express how the subject moves it (i.e. shake, nod).

In (2b, e), it was fairly difficult for the L2 learners to translate the Japanese idioms into English because they did not know the English equivalents, so they changed the Japanese idioms into either semantically similar expressions or other idioms. In other words, they translated the Japanese idioms freely. Depending on the circumstances, code-switching from Japanese to English does not always work properly when Japanese idioms are translated into English, owing to the influence of Japanese idioms, ways of thinking and cultural backgrounds. Hence, the idioms translated by the L2 learners lack Englishness. In addition, EJDLs fail to describe correctly the correspondences between Japanese idioms and English ones; this leads to a lack of Englishness and results in a ‘lost in translation’ situation.
4 MWEs and Korean lexicography: Introduction
In the Korean lexicographic tradition, MWEs, such as idiomatic expressions, collocations or free combinations of high frequency, have been mostly described in subentries or as pseudo-headwords, rather than as independent headwords. A major reason for this is that Korean language dictionaries conventionally follow what Svensén calls the ‘graphical principle’ (1993: 208), that is, the ‘one-word-one-headword principle’. As a result, traditional lexicographic studies before the advent of corpus linguistics in Korea had more often than not overlooked the importance of MWEs in contrast to single word units. Once the ‘corpus revolution’ (Sinclair 1991, Rundell and Stock 1992, Sinclair et al. 2004) reached the lexicographic domain in Korea in the 1990s, corpus-based Korean lexicography took on the task of determining how to extract MWEs and how to represent them in the dictionary macro- and microstructure. The corpus-based Yonsei Korean Dictionary, published in 1998, brought the case of collocations into focus by presenting them as independent items within the microstructure, under the label ‘collocation’. Subsequent Korean learner’s dictionaries and corpus-based dictionaries have been presenting collocations as separate items ever since, and this focus culminated in the publication of the Korean Collocation Dictionary for learners in 2007.

Recently, MWEs have been gaining in attention and significance in the field of modern Korean lexicography. Indeed, since MWEs are considered to be useful chunks of the mental lexicon, they can serve as an infrastructural resource (just as the lexis of single word units) in language learning or information retrieval. The following sections describe the current state of play in the treatment of MWEs in Korean lexicography and discuss the two central issues of their extraction and description.
4.1 Issues in the extraction of MWEs
Despite the growing interest in MWEs, their representation in Korean dictionaries has many shortcomings, which can be correlated to limitations in the methodology of MWE extraction and selection and in the scope of the corpora used to perform these tasks. Current Korean dictionaries are conventionally compiled based not only on existing dictionaries and corpora but also to some extent on lexicographers’ intuition and subjective judgement. Under these circumstances, the MWEs extracted and selected for description are only as exhaustive as the lexicographer’s capacity allows. Thus, it clearly appears that the MWE types (collocations, idioms, etc.) and registers (written language vs spoken language, Web language, etc.) have not been sufficiently taken into consideration in Korean dictionaries.

Furthermore, Korean lexicography has so far adopted a phraseology-based approach, as opposed to the frequency-based approach, when it comes to the extraction and selection of MWEs. As explained in Nesselhauf (2005), the phraseological approach to collocations (which is also valid for other types of MWEs) is based on semantic compositionality, degree of fixedness and syntactic relationship. Syntactic combinations are usually described in terms of part of speech, that is, of independent words, such as ‘adjective + noun’ and ‘adverb + verb’. While the phraseology-based approach is the typical approach in lexicography (Nesselhauf 2005: 12), the underlying syntactic combination principle is precisely the core issue in the case of the Korean language due to its agglutinative and morphemic nature. Indeed, Korean MWEs can be composed of functional words that are not necessarily ‘word units’ but rather ‘morphemes’.
Studies in Korean corpus linguistics that have thoroughly exploited and applied the frequency-based approach have shown, for example, that many important morpheme-based (rather than word-based) formulaic expressions (see note 2) have not been included in dictionaries despite their high frequency (e.g. Nam et al. 2016). Table 18.5 compares high-frequency formulaic expressions with high-frequency single word units within the same frequency range in the Korean National Corpus (KNC). Most of the formulaic expressions presented in Table 18.5 are not represented in Korean dictionaries. However, they are all not only high-frequency expressions but first and foremost useful expressions. The formulaic expression ey ttalu-myen (according to), for example, appears in the same frequency range as basic words such as pilok (even if) and olh- (right), which suggests its importance and common use in the Korean language. The non-inclusion of such formulaic expressions in dictionaries shows that Korean lexicography still fails to reflect research based on frequency and on corpus-driven approaches as regards the extraction of MWEs.
Table 18.5 High-frequency formulaic expressions and single word units within the same frequency range in the KNC.
Frequency range in KNC: 1001~1500
Formulaic expressions (frequency): 에 따르-면 ey ttalu-myen ‘according to’ (1,330); -면 안 되- -myen an toy- ‘should not’ (1,078)
Single word units (frequency): 비록/ADVERB pilok ‘even if’ (1,471); 옳다/ADJECTIVE olhta ‘is right’ (1,349); 봄/NNG pom ‘spring’ (1,317); 매일/ADVERB mayil ‘every day’ (1,096)

Frequency range in KNC: 501~1000
Formulaic expressions (frequency): 에 의하-면 ey uyha-myen ‘according to’ (920); -다 보-면 -ta po-myen ‘if’ (841); -지 않-으면 안 되- -ci anh-umyen an toy- ‘have to’ (760); -면 좋-겠- -myen coh-keyss- ‘I wish’ (668); 예-를 들-면 yey-lul tul-myen ‘for example’ (549)
Single word units (frequency): 아마도/ADVERB amato ‘most likely’ (953); 오히려/ADVERB ohilye ‘rather’ (875); 왜냐하면/ADVERB_conjunctive waynyahamyen ‘the reason is (that)’ (790); 거짓말/NNG kecismal ‘lie’ (685); 다만/ADVERB_conjunctive taman ‘though’ (531)
In addition, another corollary concern is the type of corpora used for MWE extraction, as their scope may prove limited and/or inadequate for the task. Even today Korean dictionaries mostly utilize ‘traditional’ materials, that is, corpora that mainly consist of written language and literary sources. While some headwords do reflect the Korean spoken language, MWEs that are common in speech, including -myen an toy (should not); -ta po-myen (if); -myen coh-keyss- (I wish), as indicated in the above table, are still under-represented.
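The frequency-based extraction discussed in this section can be illustrated with a minimal Python sketch. It assumes that the corpus has already been segmented into morphemes; the segmentation step, the function names and the toy token sequences below are illustrative assumptions and are not taken from the KNC or from Nam et al. (2016). The sketch simply counts contiguous morpheme sequences and keeps those that reach a frequency threshold.

```python
from collections import Counter
from typing import Dict, List, Tuple

Ngram = Tuple[str, ...]


def count_morpheme_ngrams(
    sentences: List[List[str]], n_min: int = 2, n_max: int = 4
) -> Dict[Ngram, int]:
    """Count contiguous morpheme sequences of length n_min..n_max."""
    counts: Counter = Counter()
    for morphemes in sentences:
        for n in range(n_min, n_max + 1):
            for i in range(len(morphemes) - n + 1):
                counts[tuple(morphemes[i:i + n])] += 1
    return counts


def frequent_bundles(
    counts: Dict[Ngram, int], min_freq: int
) -> List[Tuple[Ngram, int]]:
    """Keep sequences at or above min_freq, most frequent first."""
    kept = [(gram, freq) for gram, freq in counts.items() if freq >= min_freq]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Invented toy data: romanized, pre-segmented morpheme sequences.
# These are NOT real corpus lines; they only exercise the counting logic.
toy_corpus = [
    ["yey", "lul", "tul", "myen", "pom"],
    ["yey", "lul", "tul", "myen", "mayil"],
    ["pilok", "yey", "lul", "tul", "myen"],
    ["pom", "mayil"],
]

counts = count_morpheme_ngrams(toy_corpus)
bundles = frequent_bundles(counts, min_freq=3)

for gram, freq in bundles:
    print("-".join(gram), freq)
```

On real data the threshold would presumably be set relative to corpus size, and shorter sequences nested inside longer ones (for example yey lul inside yey lul tul myen) would normally be filtered out before the candidates are handed to a lexicographer for selection.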
4.2 Issues in the description of MWEs
Speaking of bilingual dictionaries, Granger (2018) points out that very few MWEs are given the status of independent headwords; most are treated as subentries at best or appear only in examples at worst. This is quite comparable to the treatment of MWEs in Korean monolingual dictionaries, which tend to describe MWEs rather passively, that is, not as independent headwords but as part of sense definitions or within examples of usage. This poses two kinds of problems.

First, for learners of Korean, it can be quite challenging not only to search for the relevant sense definition (or example) within the entry, but also to work out under which headword the MWE is presented, if at all. MWEs often combine common words that have a wide range of senses. For instance, the formulaic expression -ta po-myen (if) is one of the (rare) MWEs to be described as a sense definition, under the headword po-ta (to see). However, in the Standard Korean Language Dictionary (SKLD) provided by the Korean platform Naver, po-ta is divided into three sections. The first corresponds to po-ta as a verb and is divided into twenty-eight senses, and the other two sections correspond to the use of po-ta as an auxiliary verb in various constructions and are each divided into four senses, with the construction -ta po-myen figuring in Section II, sense 4.

Second, when the MWE is presented only as an example, separate explanations are hardly ever provided. A case in point is the MWE yey-lul tul-myen (for example) in SKLD. Under the headword yey (example), three examples are provided, among them yey-lul tul-e (for example), which is an inflectional variant of the formulaic expression presented in Table 18.5. The two verb endings -e and -myen may have different values, but as MWEs yey-lul tul-e and yey-lul tul-myen have the same meaning and usage.

As a result of the lack of usage information, including colligation and semantic prosody, the description of MWEs in current Korean dictionaries fails to grasp and render the pragmatics of such MWEs.
For example, the noun cam (sleep) can be combined with the verb ilwuta (achieve); however, the collocation is only realized in negative constructions, as in cam-ul ilwu-ci mos-ha-ta (cannot fall asleep). Pragmatic information is key in the description of MWEs since most MWEs are far more frequent in spoken language than in writing and are used in specific genres or registers; in other words, they are highly context-dependent.

The issues shown in the examples above lead us to suggest that redefining the status of MWEs as independent headwords could be a solution to overcome not only the lack of systematic description but also the poor representation of MWEs in Korean dictionaries. As Bogaards (2013) remarks, there is no reason to treat MWEs differently from single word units considering their degree of fixedness. Furthermore, this would better reflect the way users now consult dictionaries in contemporary media (i.e. online), where they are more inclined to enter meaningful chunks in search bars than single words.
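The proposal to give MWEs the status of independent headwords can be made concrete with a small Python sketch of a lookup index. This is only one possible design under stated assumptions: the class name MweEntry, its fields and the helper functions are hypothetical, and the three entries merely restate examples discussed above. Each MWE is stored once as a headword, its inflectional variants point to the same entry, and usage restrictions travel with the entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MweEntry:
    """Hypothetical record for an MWE treated as an independent headword."""
    headword: str
    gloss: str
    variants: List[str] = field(default_factory=list)
    usage_note: str = ""


def normalize(query: str) -> str:
    """Lower-case the query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def build_index(entries: List[MweEntry]) -> Dict[str, MweEntry]:
    """Map the headword and every variant form to the same entry."""
    index: Dict[str, MweEntry] = {}
    for entry in entries:
        for form in [entry.headword, *entry.variants]:
            index[normalize(form)] = entry
    return index


def lookup(index: Dict[str, MweEntry], query: str) -> Optional[MweEntry]:
    """Return the entry for a chunk typed into a search bar, if any."""
    return index.get(normalize(query))


# Entries restating examples from the text (romanized forms as given there).
entries = [
    MweEntry("yey-lul tul-myen", "for example", variants=["yey-lul tul-e"]),
    MweEntry("-ta po-myen", "if"),
    MweEntry(
        "cam-ul ilwu-ci mos-ha-ta",
        "cannot fall asleep",
        usage_note="cam + ilwuta is only realized in negative constructions",
    ),
]

index = build_index(entries)
hit = lookup(index, "yey-lul  tul-e")
print(hit.headword if hit else "not found")
```

With such an index a user who types the variant yey-lul tul-e reaches the same entry as one who types yey-lul tul-myen, instead of having to guess that the expression is buried in an example under yey.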
5 MWEs in Malay: Introduction
The most common MWEs in Malay take the following forms: (1) numeral-based idioms (e.g. dua kali empat, ‘two times four’, meaning ‘two parties behaving in the same way’; see Sew 2015: 18), (2) reduplicated phrases (e.g. hati-hati, literally ‘heart-heart’, meaning ‘cautious/meticulous’), (3) poetic structures, including pantun (four-line verse with an ‘abab’ rhyme scheme) and sajak (similar to ‘free verse’), that each produce a distinctive figurative meaning, (4) metaphors, (5) simpulan bahasa (literally ‘knotted language’, or language ‘tied together’) and (6) peribahasa (similes/proverbs). Of these structures, arguably the most popular are simpulan bahasa and peribahasa. For Charteris-Black (2002: 112), simpulan bahasa are mainly ‘two-word Malay figurative units.’ Agreeing, Tajuddin (2002: 161) characterizes three-word expressions as similes or proverbs (and extending Tajuddin’s characterization, even figurative expressions at the sentence level can count as similes/proverbs). Further, Tajuddin (2002: 161–5) argues that there are ten types of simpulan bahasa that range from parts of the body to common things and concepts used in daily life. She observes that they represent ‘a linguistic manifestation of Malay culture in its many facets – the life and daily activity of the Malays, their beliefs and physical environment’. Agreeing that Malay figurative language is ‘one of the main means through which characteristic attitudes and beliefs are transmitted between social groupings’, Charteris-Black (2002: 111–12) adds that such expressions, used in the right context, imbue the user with ‘a hallmark of intelligence, quick-wittedness and education’. The use of such language is also seen as evidencing ‘creativity’, ‘aesthetical functions’ and symbolism ‘in the way of thinking among the Malay people’ (Jaafar 2005: 40).

The codification of MWEs as part of the Malay language is the mainstay of the Institute of Language and Literature or, in Malay, Dewan Bahasa dan Pustaka (DBP), which has the larger goal of promoting literacy and reading in Malaysia. To this end, the DBP has created a number of print and online resources, including the authoritative Malay print dictionary, Kamus Dewan (fourth edition), the more recent Kamus Dewan Perdana (which was not available for our study at the time of writing), and its Korpus DBP, a corpus available online that allows standard corpus-linguistic queries and comprises newspaper texts, books, magazines, literature texts, working papers, etc.
Tajuddin (2002: 157) mentions the somewhat newer and urban MWE mulut laser (a hybrid phrase of Malay+English to mean ‘laser mouth’, or someone who tends to shoot his/her mouth off). Extending Tajuddin, we also study its fully Malay equivalent mulut celupar (in consultation with some native Malay speakers), alongside the somewhat associated term mulut tajam (‘sharp mouth’, or in English, ‘sharp tongue’). These terms were then searched using the print Kamus Dewan, its online mirror equivalent and the Korpus DBP. The results are surprising: while mulut tajam is represented in the Kamus Dewan, the same dictionary does not seem to have any attestations for mulut laser or mulut celupar. However, the Korpus DBP has 10 instances for the term mulut tajam, 58 instances for mulut celupar and 20 instances for mulut laser. Hence, there should be greater synergy between the print and online versions (which, hopefully, the Kamus Dewan Perdana reflects). In addition, the concordance for mulut celupar suggests repeated verb collocates such as melampar (‘slap’), menjadi (‘become’), menggodam (‘hack/chop off’), terkeluar daripada (‘accidentally coming out of’) and tidak suka (‘don’t like’) that imbue the node term with negative connotations. Such collocational information should be represented in Malay reference and learner’s dictionaries.

Turning to another well-known expression, ada udang di sebalik batu (literally ‘there is a prawn behind the rock’, to mean ‘Still waters run deep’), the Kamus Dewan lists the less used variant ada udang di balik batu instead. However, the Korpus DBP lists 100 occurrences for ada udang di sebalik batu and only 15 occurrences for ada udang di balik batu. Perhaps both variants could be listed with their associated frequency information in a future dictionary edition.
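The two kinds of corpus evidence used above, variant frequencies and collocates of a node expression, can be sketched in a few lines of Python. The sketch does not use the Korpus DBP query interface, which is not described here; it works on a handful of invented toy lines, and the function names and window size are assumptions made for illustration only.

```python
from collections import Counter
from typing import List


def find_phrase(tokens: List[str], phrase: List[str]) -> List[int]:
    """Return the start positions of phrase within one tokenized line."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def phrase_frequency(lines: List[List[str]], phrase: List[str]) -> int:
    """Count all occurrences of phrase across the lines."""
    return sum(len(find_phrase(line, phrase)) for line in lines)


def collocates(lines: List[List[str]], phrase: List[str], window: int = 2) -> Counter:
    """Count tokens within `window` positions to the left or right of the node phrase."""
    counts: Counter = Counter()
    n = len(phrase)
    for line in lines:
        for start in find_phrase(line, phrase):
            left = line[max(0, start - window):start]
            right = line[start + n:start + n + window]
            counts.update(left + right)
    return counts


# Invented toy lines, NOT corpus data: they only exercise the functions.
toy_lines = [
    text.split()
    for text in [
        "ada udang di sebalik batu",
        "ada udang di sebalik batu",
        "ada udang di balik batu",
        "dia tidak suka mulut celupar",
        "dia menjadi mulut celupar",
    ]
]

sebalik = phrase_frequency(toy_lines, "ada udang di sebalik batu".split())
balik = phrase_frequency(toy_lines, "ada udang di balik batu".split())
celupar_collocates = collocates(toy_lines, "mulut celupar".split(), window=2)

print(sebalik, balik)
print(celupar_collocates.most_common())
```

Counts of this kind are what would allow a dictionary entry to list both variants with frequency information and to record recurrent collocates of a node term.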
A longer idiomatic expression such as takkan dua kali orang tua kehilangan tongkat (literally ‘there won’t be two times that an old person will lose his/her walking stick’, meaning ‘Once bitten, twice shy’) is not found in the print Kamus Dewan entry for tongkat, but its online version has a section devoted to peribahasa (proverbs) in which there is a discussion of this expression; similarly, the corpus captures 3 instances of this arguably less well-known expression. Also, DBP maintains a Twitter presence and the expression is discussed in a 2018 tweet. Since lexicography is nowadays construed as ‘the science concerned with the theory and practice of dictionaries, that is, dictionaries, encyclopaedias, lexica, glossaries, vocabularies, terminological knowledge bases, and other information tools covering areas of knowledge and its corresponding language’ (Fuertes-Olivera 2018: 1), the future for the storage of Malay MWEs and other lexical information will be largely electronic (with print versions eventually phased out). Only the standalone electronic device (e.g. current smartphones) and Web access platforms have enough storage space to store the lexicon, construed as the central repository of language. Thus, besides other online resources such as kamuslengkap.com, there are currently a few apps in the Apple Store that are focused on simpulan bahasa and peribahasa – nascent though they may be currently in their coverage of Malay MWEs. It is worth noting that cultural knowledge of Malay should also be captured in the dictionary. In this connection, Charteris-Black’s (2002: 112) observations on the spatial relations in Malay are worth noting. For instance, while cold in an expression such as cold hearted has a negative connotation, the Malay expression hati sejuk (‘heart/liver cold’) – or more correctly, sejuk hati – has a ‘positive connotative meaning’ (a feeling of relief about something). 
Similarly, the Malay idiom tangan dingin (‘hand cold’) is equivalent to the English or American expressions green fingers/green thumb; thus, ‘cold’ in Malay generally connotes a positive meaning. However, some generalizations may not always work. Adding to Charteris-Black’s observations, we may surmise that ‘front’ (hadapan/depan) in Malay connotes positivity and ‘back’ (belakang) connotes negativity, as evidenced by an expression such as ada aku dipandang hadap, tiada aku dipandang belakang (‘I look forward/front, I do not look backward’, meaning ‘Love is forward-facing; from a distance, it is soon forgotten’). However, an expression such as di belakang ia menendang kita, bila di depan ia mengeting kita, jika di tengah, ia berpusing ligat pula (roughly, ‘from the back it kicks us; when in front, it cuts our Achilles tendon; in the middle, it spins too’, meaning ‘Guilt feelings invoked when facing a loved one’) shows that spatial relations by themselves need not always be binary; collocational and contextual information would be equally important for connoting ‘positivity’ or ‘negativity’. It would be useful to have all these types of information in the e-/online dictionary of the future.
6 Conclusion
In this chapter, we have examined aspects of MWEs for lexicography in the Asian languages that we set out to investigate. For Chinese, we focused on the representation of semantic and contextual information of MWEs in The Chinese-English Dictionary that included networking literal, conceptual and cultural aspects. For Japanese, we observed that phrases in both Japanese-language (JL) and English–Japanese dictionaries (EJ) could be more clearly defined and classified in view of the discrepancies in input-output English and Japanese multi-word expressions (MWEs) by Japanese English learners. For Korean, we complemented phraseology and frequency-based approaches for the extraction and description of Korean MWEs. For Malay, we examined the treatment of several MWEs in online and print resources produced by the authoritative Dewan Bahasa dan Pustaka (DBP) and suggested codification improvements that come about when MWEs in print dictionaries are based more on corpus/empirical evidence and aligned with their corresponding electronic/online format in the Internet era.
Notes
1 The suggestion by Bergenholtz and Gouws (2013) for the occurrence and spread of different types of multi-word combinations in dictionaries is largely based on the examination of Danish dictionaries.
2 ‘Formulaic expression’ and ‘multi-word expression’ are used here interchangeably as synonyms. Nonetheless, the term ‘multi-word expression’ can be problematic in Korean precisely because of the definition of ‘word’ as a unit in, and the typological features of, the Korean language.
References
Dictionaries
Baharom, H.N. (Editor-in-chief) (2005), Kamus Dewan, Fourth Edition, Kuala Lumpur: Dewan Bahasa dan Pustaka. Available at https://prpm.dbp.gov.my/ [accessed 15 October 2020].
Korean Collocation Dictionary for Learners (2007), Seoul: Communication Books.
Lu, G.S., C.L. Zhao, J.B. Wan, Y. Shen et al. (eds) (2015), The Chinese-English Dictionary (Unabridged) (Vol. 1), Shanghai: Fudan University Press.
Standard Korean Language Dictionary (1999). Available at https://ko.dict.naver.com/#/main [accessed 15 October 2020].
Yonsei Korean Dictionary (1998), Yonsei Institute of Language and Information Studies, Seoul: Doosan Dong-A.
Other references
Atkins, B.T. and M. Rundell (2008), The Oxford Guide to Practical Lexicography, Oxford: Oxford University Press.
Austin, J.L. (1975), How to Do Things with Words, Oxford: Oxford University Press.
Bergenholtz, H. and R. Gouws (2013), ‘A lexicographical perspective on the classification of multi-word combinations’, International Journal of Lexicography 27 (1), 1–24.
Bogaards, P. (2013), ‘A history of research in lexicography’, in H. Jackson (ed.), The Bloomsbury Companion to Lexicography, London: Bloomsbury, 19–31.
Charteris-Black, J. (2002), ‘Second language figurative proficiency: A comparative study of Malay and English’, Applied Linguistics 23 (1), 104–33.
Craik, F. and R.S. Lockhart (1972), ‘Levels of processing: A framework for memory research’, Journal of Verbal Learning and Verbal Behavior 11, 671–84.
De Groot, A.M.B. and H. Comijs (1995), ‘Translation recognition and translation production: Comparing a new and an old tool in the study of bilingualism’, Language Learning 45, 467–509.
Firth, J.R. (1957), Papers in Linguistics, London: Oxford University Press.
Fuertes-Olivera, P. (2018), ‘Introduction: Lexicography in the internet era’, in P. Fuertes-Olivera (ed.), The Routledge Handbook of Lexicography, London: Routledge, 1–15.
Gantar, P., L. Colman, C.P. Escartin and H.M. Alonso (2018), ‘Multiword expressions: Between lexicography and NLP’, International Journal of Lexicography 32 (2), 138–62.
Granger, S. (2018), ‘Has lexicography reaped the full benefit of the (learner) corpus revolution?’, Proceedings from the XVIII EURALEX International Congress – Lexicography in Global Contexts (17–21 July 2018), Ljubljana: Euralex, 17–24.
Grice, H.P. (1975), ‘Logic and conversation’, in P. Cole and J. Morgan (eds), Studies in Syntax and Semantics III: Speech Acts, New York: Academic Press, 183–98.
Jaafar, S. (2005), ‘Fungsi Leksikal “Bunga” dalam Simpulan Bahasa dan Peribahasa Bahasa Melayu’, Jurnal Bahasa DBP 5 (4), 40–67.
Lu, G.S. (2012), ‘Flying over and arriving’, in G.Q. He (ed.), Translation Teaching and Research Vol. 2, Shanghai: Fudan University Press, 1–1.
Nam, K., H.J. Song and J. Choi (2016), ‘A Morpheme-based analysis of lexical bundles in Korean: An interface between corpus-driven approach and lexicography’, Lexicography – Journal of Asialex 3 (1), 39–62.
Nesselhauf, N. (2005), Collocations in a Learner Corpus, Amsterdam: Benjamins.
Rundell, M. and P. Stock (1992), ‘The corpus revolution’, English Today 8 (4), 45–51.
Sew, J.W. (2015), ‘Aspects of cultural intelligence in idiomatic Asian scripts’, Word 61 (1), 12–24.
Sinclair, J. (1991), Corpus, Concordance, Collocation, Oxford: Oxford University Press.
Sinclair, J., S. Jones and R. Daley (2004), English Collocation Studies: The OSTI Report, London: Continuum.
Svensén, B. (1993), Practical Lexicography, Oxford: Oxford University Press.
Tajuddin, N. (2002), ‘Malay idiomatic expressions: Their structure and categorization’, Jurnal Bahasa Moden, 155–66.
Wittgenstein, L. (1953), Philosophical Investigations, Oxford: Basil Blackwell.
19 Issues in onomasiological lexicography
Gerardo Sierra
1 The concept of onomasiology
The foundations of modern linguistics lie in the theories taught by de Saussure, which were later compiled by his students. Among other contributions, Saussure pictured the linguistic sign as the combination of a concept (signifié) and an acoustic image (signifiant). A word would be only a sequence of sounds if it did not have a meaning (if there were no signifié). It becomes a word only when the sound is associated with a representation (the concept) that evokes the acoustic image. This results in the dual-nature structure of the linguistic sign.
In the field of semantics, Ullman (1967) defines the concept of meaning based on three basic components: name, sense and thing. Name refers to the sounds that make up the word. Sense is the information the name transmits to the listener. Thing is the characteristic or non-linguistic event that is spoken about. He identifies meaning as the relation between the name and the sense. This relation is not biunivocal, as Ullman points out, since various names can be associated with a single sense, while various senses can be connected to a single name. The first case is synonymy, while the second is polysemy and homonymy. Likewise, he mentions that the relation between name and sense is ‘reciprocal and reversible’; that is, it is possible to move back and forth between them.
Thus, the communicative function of language between a listener and a speaker implies a duality, signification and designation (see Figure 19.1). Signification, the semasiological approach, moves from the name to the concept. The listener receives acoustic images and, from the field of significations in their mind, determines the meaning. Designation, the onomasiological approach (also called denomination), starts from the concept and moves towards the acoustic image. A speaker chooses designations from their vocabulary to convey mental objects.
Ullman defines onomasiology as a branch of semantics but does not explicitly contrast it with semasiology, whereas Baldinger (1980) presents the two as opposite approaches. He uses Ullman’s semantic triangle (thing, sense, name), together with de Saussure’s linguistic sign, to introduce a structural relation between a thing or reality, a concept or sense, and a name or acoustic image. In this relation, the name and the thing are not directly connected. The linguistic sign is represented by the bipolarity between name and sense.
I want to thank PASPA-DGAPA-UNAM for the support received, as well as Jorge Magaña for his comments and Roberto Santamaría for the English translation.
Figure 19.1 Dual approach between semasiology and onomasiology.
Baldinger uses these two structural relations, semasiology and onomasiology, to identify two types of dictionary that take into account the user’s needs. The semasiological dictionary helps to decode and corresponds to the point of view of the person who interprets the speaker; it uses the form of the expression to look for the signification. It is alphabetically arranged, and it associates meanings with expressions or words, so that the arrangement within the entries moves from a word to the meaning. An onomasiological dictionary, on the other hand, helps to encode from the point of view of the speaker: it starts from the mental object and looks for the designations. It connects names with concepts, and the arrangement moves from the meaning or concept to the name or word. It is arranged by concepts and categorizes the field of meanings.
The distinction between semasiology and onomasiology allows the consideration of a new perspective in lexicography. In the semasiological approach, the perspective is from the dictionary to the user. Lexicographers produce dictionaries, and the definitions provide sufficient elements to know the meaning of a word. Alphabetical dictionaries begin with the name to look for the sense or senses associated with it, while conceptual dictionaries begin with the sense and identify the name or names related to it. The onomasiological approach, on the other hand, goes from the user to the dictionary. The user must provide the concept, while the dictionary interprets that concept to find the most suitable word. The user can formulate the concept in various ways and can use a range of words that are similar in a given context. Depending on the social, cultural and geographical context, the description of the concept may have multiple inherent properties.
Therefore, the design of an onomasiological dictionary must first foresee a multiplicity of properties for each concept and, second, the diversity of words that can be used to name them. Then, the task consists in the accurate interpretation of the description of the concept and the provision of the likely word or words the user is looking for.
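The two directions of access described in this section can be illustrated with a small sketch. It is only a toy under stated assumptions: the three-word lexicon, the sense labels and the function names are invented for the illustration and do not come from any dictionary discussed here. The semasiological lookup goes from a name to its senses, and the onomasiological lookup inverts the same data to go from a sense to its names.

```python
from collections import defaultdict

# Toy lexicon (illustrative only): each name maps to the senses it can express.
SENSES = {
    "demise": ["death"],
    "decease": ["death"],
    "bank": ["financial institution", "river edge"],
}

def semasiological_lookup(word):
    """Name -> senses: the direction a reader or listener needs (decoding)."""
    return SENSES.get(word, [])

def build_concept_index(senses):
    """Invert the lexicon so that each sense lists the names that express it."""
    index = defaultdict(list)
    for word, concepts in senses.items():
        for concept in concepts:
            index[concept].append(word)
    return {concept: sorted(words) for concept, words in index.items()}

CONCEPT_INDEX = build_concept_index(SENSES)

def onomasiological_lookup(concept):
    """Sense -> names: the direction a writer or speaker needs (encoding)."""
    return CONCEPT_INDEX.get(concept, [])

print(semasiological_lookup("bank"))    # one name, several senses (polysemy)
print(onomasiological_lookup("death"))  # one sense, several names (synonymy)
```

The sketch assumes that the user already states the concept with exactly the label stored in the index, which is the assumption the rest of the chapter calls into question.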
2 Conceptualization
The concept is a mental representation of an object that originates in the minds of individuals through an abstraction process. We call this process conceptualization: it groups the data of common properties, according to external factors, so that the concept can later be internalized. The starting point for describing and defining concepts and identifying their interrelations is the identification of their properties. In this regard, we must consider a dichotomy. On the one hand, it is recognized that some properties are necessary and sufficient to distinguish a concept from any other, and these properties reflect the essential characteristics of the concept. On the other hand, other properties are not essential: they are observable only in an individual thing, so they are accidental, can change with the passage of time and are not even really necessary in a scientific sense (Petöfi 1982: 105). In fact, Aitchison (1994) stated that experts and laymen differ in how they distinguish between essential and non-essential characteristics. Although experts could specify the true nature of things, sometimes they provide information that is irrelevant for the mental lexicon. Ordinary people, for their part, do not agree with each other and sometimes even change their minds. This is because social, cultural and geographical factors influence how people acquire knowledge and, therefore, perceive the world. As Table 19.1 shows, the essential characteristics are not necessarily present in the mental lexicon of a person; each person describes different properties. The descriptions were written by undergraduate students from Manchester, England, to refer to the concept of euthanasia.
A simple exercise like this, which shows the conceptualization people engage in to get from a meaning to a word, sets aside the traditional supposition that a semasiological approach should be used to provide a formal definition describing the meaning of a word. On the contrary, in the onomasiological approach, the user can formulate a concept in different ways and use a variety of words to find a particular word. Even a description of non-essential characteristics (given together) provides enough information to identify the term.
Table 19.1 Description of euthanasia by users.
It’s when old or disabled people are killed legally and happens in the Netherlands
Illegal act of helping somebody die who is terminally ill, who is in lot of pain, who needs help – cannot die alone
Act or practice of causing death painlessly
Ending life before natural end, to relieve from suffering due to illness
The decision taken by someone as to whether they die or not
Allowing people to die if they wish to, giving them the drugs or means to terminate their lives
Idea of turning off life-support machine; coma
Killing someone when they are terminally ill with their permission
The right to decide to end one’s own life
Right to die under special conditions
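How a free description might still point to the intended term can be sketched by counting the content words that a description shares with candidate definitions. This is a deliberately naive illustration: the three definitions, the stopword list and the function names are assumptions made for the example, and only the two test descriptions are taken from Table 19.1.

```python
import re

# Hypothetical one-line definitions, written only for this illustration.
DEFINITIONS = {
    "euthanasia": "the act of painlessly ending the life of a person "
                  "who is terminally ill to relieve suffering",
    "homicide": "the killing of one person by another",
    "suicide": "the act of intentionally ending one's own life",
}

# A small, ad hoc stopword list (an assumption, not a standard resource).
STOPWORDS = {
    "a", "an", "the", "of", "to", "is", "are", "who", "by", "in", "with",
    "when", "their", "they", "someone", "one", "another",
}

def content_words(text):
    """Lower-case the text and keep the words that are not stopwords."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def rank_terms(description):
    """Order candidate terms by the number of content words shared with the description."""
    query = content_words(description)
    scores = {
        term: len(query & content_words(definition))
        for term, definition in DEFINITIONS.items()
    }
    # Highest overlap first; ties are broken alphabetically.
    return sorted(scores, key=lambda term: (-scores[term], term))

print(rank_terms("Killing someone when they are terminally ill with their permission"))
print(rank_terms("Ending life before natural end, to relieve from suffering due to illness"))
```

Exact word overlap is the weakest possible matching strategy; it is used here only to show that descriptions built from different properties can converge on the same term.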
3 The use of onomasiological dictionaries
Although the first dictionary designers and compilers were aware that people needed dictionaries, they did not know the different groups of users or what they really wanted. In effect, lexicographers considered themselves the users and compiled dictionaries accordingly. This has now changed, and publishers try to offer works that meet the different needs of people, since a dictionary that satisfies a particular type of user is not necessarily the most convenient for others. A proper analysis of users, their needs and their characteristics makes it possible to differentiate dictionaries and improve their design. In this sense, Cowie (1983) proposes two criteria concerning users: one about their needs, that is to say, what kind of information they are looking for and for what purpose; and another about their skills or competences in the use of dictionaries to obtain better results.
The studies to identify the needs of users are based on direct observation and on questionnaires given to different groups of users, with different objectives and orientations. The first one was made by Barnhart (1962) for roughly 56,000 students. The classification of uses varies considerably from one study to another, so that it is difficult to arrive at conclusions. For example, while Kharma (1985) takes into consideration eight types of use, Quirk (1973) considers only four, of which two match Kharma’s: meaning and spelling. In Figure 19.2, different uses can be observed, presented in descending order.1 Included are Tomaszczyk (1979), Béjoint (1981), Hartmann (1983) and Kipfer (1987). According to the graph, most users look for meaning and spelling, closely followed by grammar and synonyms.
Figure 19.2 Uses of the dictionaries.
Nevertheless, it should be noted that these studies aim to discover the uses made of dictionaries, whether monolingual or bilingual; they do not show precisely what the needs of users are. People acquire what exists, but this does not always meet their needs. The lexicographers who carried out these surveys were surprised that a large number of users look for synonyms in dictionaries. In reality, there is a wide variety of needs that are not met by traditional dictionaries.
In order to identify this range of possibilities, it is useful to take into account that there are four main linguistic activities: reading, writing, listening and speaking. The requirements of users differ according to these activities. Reading and listening are receptive, comprehension processes in which passive decoding is performed. Writing and speaking, on the other hand, are production processes that require active encoding. Generally speaking, receptive language users (readers or listeners) need a reference book to understand unknown words they have read or heard, while active language users (writers and speakers) require linguistic indications to convey concepts.
In this sense, and according to Baldinger, one can distinguish two major classes of dictionary: comprehension and production. These terms bear a close relationship to the terms semasiological (comprehension) and onomasiological (production). Comprehension dictionaries go from form to content, from signifier to signified, from word to definition; they are the ones users consult to look up the meaning of a word, phrase or other lexical unit they have heard or read but are unfamiliar with. Production dictionaries, on the other hand, go from content to form, from signified to signifier, and they are used to facilitate the creation of texts, whether oral or written.
Under the premise that dictionaries are more frequently needed and used to decode than to encode, lexicography has been mainly oriented towards producing dictionaries that meet the demand of passive decoders for information of a linguistic, semantic, encyclopaedic and pragmatic nature. The reason, as stated by Shcherba (1995) and Svensén (1993), is that the first lexicographers doubted that native speakers needed to find words to express their thoughts. It was even considered that dictionaries for looking up words served, in some cases, as an aid in doing crosswords. Only very recently has it been accepted that the knowledge of a native language is sometimes uncertain.
Even though it has not been reported as such in the studies, Hartmann (1983) points to the usefulness of reference works that help to produce language: to write and to speak. For example, he mentions that more than half of users frequently feel frustrated while consulting dictionaries. Although his study does not provide additional relevant information about users who look up words, he observed that at least three out of four users need a dictionary to write.
There are several types of reference work that claim to meet the needs of writers moving from a meaning or concept to a corresponding word. Their usefulness is extensive and has been recognized by several authors. Among the different uses, the following are worth mentioning:
● Finding the right synonym. It might happen that, during the production of a text, an author knows a word and wants to find the most suitable synonym among those that present a slight semantic difference and similar properties. Rather than understanding an unknown word, the objective is to produce a word whose meaning is already known, discovering the associative links in the lexicon. The author knows what they want to say; they know that the right word exists but are unable to find it.
● Enriching the lexicon. Sometimes, an author wants to clarify and shape an idea they wish to convey, but the words at hand do not express what they want. To do this, they look up the word they are uncertain about among other possible words or new ones that help them enrich the lexicon of the topic they are writing on.
● Solving the tip-of-the-tongue (TOT) problem (Brown and McNeill 1966). A common, unfortunate and extremely frustrating situation is not being able to find a particular word. The users want the specific word they have in mind, not a set of related words or possible synonyms, but they cannot remember it. They know the word exists, but they cannot recall it; they think of an idea that might enable them to find the word that escapes them.
● Knowing the word for a given concept. When a foreign language is used or when somebody tries to use a specialized term, the speaker often does not know whether a word exists for the idea they want to convey. This problem usually occurs during translation and interpretation, where the need emerges to find more suitable words in the target language to translate an obscure expression in the source language.
● Learning a language. During the language learning process, speakers frequently know a word in their source language but not in the target one. More than a bilingual dictionary, what they are looking for is a work that allows them to express the ideas they have in mind to get to the word they are looking for.
4 Printed onomasiological dictionaries
To develop reference works with a focus on concepts, and to meet the needs of writers, there have been some attempts to go beyond traditional lexicography. Several terms are used to name this type of reference work: ideological dictionary (Zgusta 1971; Shcherba 1995), semantic dictionary (Malkiel 1975), conceptual dictionary (Rey 1977), speaker-oriented lexicon (Mallinson 1979), thematic wordbook (McArthur 1986), nomenclator (Riggs 1989) and word finder. All these works have in common an onomasiological approach. The name ‘onomasiological dictionary’ covers every dictionary that is used to obtain a word from an idea. Their special feature is that words are not regarded as isolated units but are generally arranged or grouped by shared semantic features under keywords.
The most representative types of dictionary that claim to meet the requirements of writers who need to move from a meaning or concept to a corresponding word are described below. The description recognizes four types of wordbook according to the type of information contained, the structure and the type of search performed: (1) thesauri, representative of the majority of conceptual reference books with a systematic chart of topics; (2) reverse dictionaries, with specific characteristics that allow the user to search directly based on a keyword instead of an index or concept tree; (3) synonymy and antonymy dictionaries, sometimes mistaken for thesauri but very different from them, given that they are structured differently and work with words rather than concepts; and, finally, (4) pictorial dictionaries that present concepts as pictures instead of words. Even though these dictionaries are also available in digital format, their structure is the same, and so they do not differ in their descriptions or their search methods.
4.1 Thesauri
Several authors consider thesauri to be the oldest onomasiological dictionaries, as the configuration of the entries goes from signified to signifier. They feature a systematic classification in which the lexicon is arranged hierarchically by topics, according to the viewpoint of the author. Thus, a user can browse the topics to find the words that are closely related to a concept. Even though there are old records of works of this nature, such as Julius Pollux’s Onomasticon, in modern lexicography the Thesaurus of English Words and Phrases by Roget (1852), also available online, is recognized as one of the first (Hüllen 2004, 2009). Its thematic classification starts with 6 classes (abstract relations, space, matter, intellectual faculties, voluntary powers, sentient and moral powers), which are broken down into 39 sections and 990 heads. The same structure was followed for Spanish with the Diccionario de ideas afines y elementos de tecnología, for French with the Dictionnaire idéologique and for German with the Deutscher Sprachschatz. With a different structure, the Diccionario ideológico de la lengua española by Casares stands out for Spanish. Another dictionary worth noting is Wüster’s multilingual work, The Machine Tool, which also opens the path to terminology, lexicography’s sister.
The regular steps to arrive at a target word from a concept are, first, obtaining an approximation to the concept, that is, looking for words that characterize it, and, second, choosing a keyword to begin the search, that is, selecting the small number of words that seem most relevant. However, users sometimes have difficulties with some of the steps, as well as in identifying the exact search words that match the keywords of the thesaurus. Likewise, they can find the classification scheme very difficult to navigate, which is why these dictionaries have an alphabetical index. In fact, most authors agree that the index seems to be the best entry point to find the most relevant entry.
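The relation between a hierarchical classification and its alphabetical index can be sketched as follows. The fragment is invented for the illustration: only the class name ‘matter’ appears in the text, while the section, the heads, the words and the function names are assumptions. The index is derived mechanically from the hierarchy, so a user who cannot navigate the scheme can still start from a word.

```python
# A tiny, invented fragment of a thematic hierarchy: class > section > head > words.
# Only the class name "matter" comes from the text; everything else is illustrative.
THESAURUS = {
    "matter": {
        "organic matter": {
            "death": ["decease", "demise", "euthanasia"],
            "life": ["existence", "vitality"],
        },
    },
}

def build_index(thesaurus):
    """Map every word to the (class, section, head) paths under which it is listed."""
    index = {}
    for cls, sections in thesaurus.items():
        for section, heads in sections.items():
            for head, words in heads.items():
                for word in words:
                    index.setdefault(word, []).append((cls, section, head))
    # Sorting the keys gives the alphabetical order of a printed index.
    return dict(sorted(index.items()))

INDEX = build_index(THESAURUS)

def related_words(word):
    """Use the index as entry point, then return the other words under the same heads."""
    related = []
    for cls, section, head in INDEX.get(word, []):
        related.extend(w for w in THESAURUS[cls][section][head] if w != word)
    return related

print(INDEX["demise"])
print(related_words("demise"))
```

A printed thesaurus requires the user to perform both steps by hand: find the word in the index, then turn to the head it points to.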
4.2 Reverse dictionaries
The name ‘reverse’ might be confusing, since it is also used for dictionaries in which words are arranged alphabetically from the rightmost letter. In our context, the name refers to the search process from concept to words, instead of the sequence of traditional dictionaries, from words to concept. Among the onomasiological dictionaries that use this term are Bernstein’s Reverse Dictionary, Reader’s Digest Reverse Dictionary and The Oxford Reverse Dictionary. To find a target word in any of these dictionaries, users first think about a concept and a keyword that refers to this concept. Then they can go directly to the main body of the dictionary, which is the ‘reverse dictionary’ itself. Given that the macrostructure is alphabetical, the user can move directly from the keyword to the entry with the target word, without the need for any index. For instance, in the most recent dictionary in this category, The Writer’s Digest Flip Dictionary, there are several options to search for the word euthanasia.
death: bane, biolysis, curtains, decay, decease, demise, departure, doom, end, euthanasia, exit, expiration, extinction, fatality, grim reaper, loss, murder, passage, passing, sleep
death caused to relieve suffering: euthanasia
death, painless and peaceful, during terminal illness: euthanasia
killing of mercy: euthanasia
mercy killing: euthanasia
TYPES OF MURDER mercy killing: euthanasia
However, two difficulties in the use of these dictionaries are worth mentioning: the user may not think of a suitable keyword, or the keyword may not lead them to the target word. This is understandable on looking at Bernstein’s dictionary, for example, since it has 13,390 entries, which can be accessed through roughly 8,000 keywords; that is, there are almost two entries for each clue word. This number ends up being insufficient because there are many ways of thinking about a concept. Because of this, the Digest itself suggests trying different keywords in the hope that one of them produces a result.
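The keyword access just described, and the advice to try several keywords, can be sketched with the entries quoted above. The sketch rests on assumptions: the long ‘death’ entry is shortened, headings are matched by their first words (as a reader scanning an alphabetical macrostructure would do), and the function names are hypothetical.

```python
# Headings and target words quoted in the text from the Flip Dictionary;
# the long "death" entry is shortened for the illustration.
ENTRIES = {
    "death": ["bane", "decease", "demise", "euthanasia"],
    "death caused to relieve suffering": ["euthanasia"],
    "killing of mercy": ["euthanasia"],
    "mercy killing": ["euthanasia"],
}

def lookup(keyword):
    """Return the headings that begin with the keyword, in alphabetical order."""
    keyword = keyword.lower().strip()
    return {
        heading: ENTRIES[heading]
        for heading in sorted(ENTRIES)
        if heading.startswith(keyword)
    }

def try_keywords(keywords):
    """Try each keyword in turn and stop at the first one that yields an entry."""
    for keyword in keywords:
        hits = lookup(keyword)
        if hits:
            return keyword, hits
    return None, {}

print(lookup("mercy"))
print(lookup("assisted"))  # a plausible keyword that matches no heading
print(try_keywords(["assisted", "painless", "death"]))
```

The second call shows the difficulty named in the text: a reasonable keyword that no entry happens to begin with returns nothing, and the user has to guess again.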
4.3 Synonym and antonym dictionaries
Synonym dictionaries are regarded as a type of onomasiological dictionary by almost all lexicographers. Their objective is to enable users to increase their vocabulary and discover a word’s associative relations, and so to find an alternative for the word being used. In this sense, they enable users to find the word that they are looking for but that escapes them. For this, users must think of keywords with a similar meaning to that of the target word, instead of associated words that lead to the concept. Most dictionaries contain lists of related words. In general, the entries are arranged alphabetically, but the internal list of synonyms, quasi-synonyms or related words can be listed alphabetically or otherwise. In some cases, they present antonyms, although there are also dictionaries that specialize in them. In the same manner as thesauri, most synonym dictionaries list the elements without any explanation of the words (without providing definitions). A few dictionaries explain the meaning of sets of apparent synonyms and describe the differences in use of each word. Among them, it is worth noting Webster’s New Dictionary of Synonyms and the Diccionario de Sinónimos by Roque García, for English and Spanish, respectively.
4.4 Pictorial dictionaries
Pictorial dictionaries have specific features that make them superior to other books of words. As in conceptual dictionaries, the world is arranged by concepts, but each concept can be represented through drawings that illustrate the parts or classes related to the concept; a word indicates the name of the part or class. Definitions are unnecessary because there is a direct relation between name and object. The pictures show the vocabulary of a whole topic, which is grouped in a classification scheme. Usually, there is an alphabetical index that allows the user to go from a word to the object and to identify related words. A restriction is that they are only suitable for physical objects and their parts or classes that can be visually represented. Most pictorial dictionaries only include nouns but, in some cases, verbs that represent actions can also be shown, as well as some adjectives. Apparently, the topics and the information given by each image depend on the imagination of the illustrator. For example, it is possible to find forty types of hats and caps. In this way, the dictionary becomes an encyclopaedia. The most popular dictionaries of this type are the Oxford-Duden dictionaries. There are several bilingual and multilingual editions based on the Bildwörterbuch, first published in Germany in 1937. The Bildwörterbuch Online has a semantic structure that comprises 17 topics and contains more than 6,000 images covering around 20,000 words.
Although pictorial dictionaries have an onomasiological approach, in that they enable the user to obtain the target word by searching for the image of a concept, views differ on the purpose of these works. While Shcherba (1995: 337) considers that ‘it is often invaluable in searches for a desired foreign-language word’, Hill (1985) states that they are useful for teachers and writers, rather than for learning a second language.
5 Electronic onomasiological dictionaries
Printed dictionaries have several limitations that, thanks to computational lexicography, are largely overcome by electronic dictionaries, whether on CD-ROM, online or in an app. These limitations affect both the lexicographer or dictionary publisher and the user. Among the limitations for the lexicographer are cost and updating. An electronic dictionary is much easier to update than a printed book. On the one hand, computational technology is advancing so quickly that, in order to be competitive, there must be regular innovation in the software behind electronic dictionaries. On the other hand, in terms of content, it is easier to include neologisms or remove archaisms, to name a few cases. Updating a dictionary on CD-ROM requires new versions to be scheduled. In the case of online dictionaries, there is no need for a waiting period, other than informing subscribers that an update will take place over a short period of time, generally a few hours.
From the point of view of users, electronic dictionaries let them look up information through a variety of potential routes. For this to happen in printed dictionaries, an unimaginable volume would be required to contain all the necessary information. Thanks to the contributions of computational linguistics, users can perform searches in natural language; that is, they can express their own concept the way they would express it to somebody else. To work, electronic dictionaries require a knowledge base and a search engine. The results can then be presented in an organized way, in an alphabetical list of words or in lists of words grouped by concept, as in a thesaurus.
The following description of onomasiological dictionaries focuses on works that have been explicitly aimed towards concepts. Online thesauri are not included, given that they present the same limitations as the printed ones. The presentation takes into account the knowledge base, which can be built on dictionaries or graphs. The most recent contributions, which use machine learning, are also mentioned.
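The two components named above, a knowledge base and a search engine, together with the two ways of presenting results, can be sketched in a few lines. This is a minimal illustration and not a description of any existing system: the four entries, their concept groups and the function names are invented, and the search engine is assumed to do nothing more than require every query word to occur in a definition.

```python
import re
from collections import defaultdict

# Toy knowledge base (invented for illustration): headword -> (concept group, definition).
KNOWLEDGE_BASE = {
    "barometer": ("measuring instruments",
                  "an instrument for measuring atmospheric pressure"),
    "thermometer": ("measuring instruments",
                    "an instrument for measuring temperature"),
    "scalpel": ("surgical instruments",
                "a small sharp knife used as an instrument in surgery"),
    "altimeter": ("measuring instruments",
                  "an instrument for measuring altitude"),
}

def tokens(text):
    """Split a text into a set of lower-case words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def search(query):
    """Return, alphabetically, the headwords whose definition contains every query word."""
    wanted = tokens(query)
    return sorted(
        headword
        for headword, (_, definition) in KNOWLEDGE_BASE.items()
        if wanted <= tokens(definition)
    )

def grouped(headwords):
    """Present a result list grouped by concept, as in a thesaurus."""
    groups = defaultdict(list)
    for headword in headwords:
        groups[KNOWLEDGE_BASE[headword][0]].append(headword)
    return dict(groups)

print(search("instrument measuring"))   # alphabetical presentation
print(grouped(search("instrument")))    # presentation grouped by concept
print(search("device measuring"))       # a different wording finds nothing
```

The last call returns an empty list because no definition in the toy knowledge base uses the word ‘device’; exact matching of this kind depends entirely on the user choosing the lexicographer’s wording.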
5.1 Traditional dictionaries for onomasiological searching
Besides the traditional search that enables a user of printed dictionaries to find the definition of a given word (entry), electronic dictionaries allow for other types of search that cannot be done in printed ones. Since semasiological dictionaries contain the necessary information, they can be used to look for a word whose definition contains one or several specific words. For example, one could look for all the entries whose definitions contain the words ‘apparatus’ and ‘measure’ and obtain as a result: absorptiometer, actinometer, areometer, etc. Only in electronic dictionaries is it possible to perform this type of search with one or more words connected by Boolean operators. The success or failure of the onomasiological search therefore depends on the keywords the user enters to represent the concept of the term they are looking for. Unfortunately, the words entered by the user do not always match the words used in the definition of the entry they are looking for. For example, Sierra and McNaught (2003) compare the use of two dictionaries for an onomasiological search, the Collins English Dictionary (CED) and the Oxford English Dictionary (OED). They mention that not all definitions of measuring instruments contain the words ‘instrument’ and ‘measuring’, given that some contain ‘apparatus’ or ‘device’ and ‘ascertaining’, ‘determining’, ‘estimating’, ‘testing’ or ‘indicating’ in their place. Below, one can compare the two definitions of alkalimeter.
An apparatus for determining the concentration of alkalis in solution (CED) An instrument for ascertaining the amount of alkali in a solution (OED) To solve the problem of the different terminology used by the user and by the lexicographer, it is proposed that the automated dictionaries, for this kind of search, automatically expand the original formulation of the user with clusters of related words, whether they are synonyms or other lexical relations (hyponyms-hyperonyms, meronyms-holonyms, among others). Thus, not only stated keywords would be looked for in the definitions, but also the ones belonging to the clusters. These clusters make up the lexical knowledge base (LKB), which, in other words, contains a wide range of semantically equivalent terms that allow users to enter a query using a wide variety of alternatives to a particular term. Different LKBs use available synonym dictionaries and thesauri. One of the first onomasiological dictionaries was the one created for French by Dutoit and Nugues (2002). Their LKB uses a database of hierarchically arranged words according to hyponymic and hyperonymic relations. Another attempt at an onomasiological dictionary is the one developed for Turkish by El-Kahlout and Oflazer (2004), who widen the search with the synonymic relations obtained on Wordnet. In the case of the onomasiological dictionary for Spanish by Sierra and McNaught (2003), a tool for generating these clusters through the alignment of definitions was developed (Sierra and McNaught 2000). These definitions are obtained not only from dictionaries but also automatically extracted from specialized texts through a definitional context extractor (Alarcón et al. 2009). Another alternative for widening the search with semantic clusters is to search in different dictionaries at the same time, check the wide variety of dictionary definitions, and take the one most suitable for the search query. Because of this, the LKB will be made up of a wide range of 334
Issues in Onomasiological Lexicography
definitions from a variety of sources, both normative and non-normative. As an example, there are two known onomasiological dictionaries available online: the OneLook Reverse Dictionary and the Reverse Dictionary. The first of them is regarded as a good example of onomasiological lexicography, and it is usually used as a baseline for the development of other dictionaries.
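The Boolean search over definitions and its expansion with clusters of related words can be illustrated with a minimal sketch. It is not the implementation of any of the systems cited: the two alkalimeter definitions are the ones quoted above, while the barometer entry, the two clusters and the function names are assumptions made only for this illustration.

```python
import re

# The two alkalimeter definitions are quoted in the text; the barometer
# entry is a hypothetical addition used only to show a direct match.
DEFINITIONS = {
    "alkalimeter (CED)": "An apparatus for determining the concentration of alkalis in solution",
    "alkalimeter (OED)": "An instrument for ascertaining the amount of alkali in a solution",
    "barometer (toy)": "An instrument for measuring atmospheric pressure",
}

# Hypothetical clusters standing in for a lexical knowledge base (LKB).
CLUSTERS = [
    {"instrument", "apparatus", "device"},
    {"measuring", "determining", "ascertaining", "estimating", "testing", "indicating"},
]


def tokens(text):
    """Lower-case word set of a definition."""
    return set(re.findall(r"[a-z]+", text.lower()))


def expand(keyword):
    """Return the cluster containing the keyword, or the keyword alone."""
    for cluster in CLUSTERS:
        if keyword in cluster:
            return cluster
    return {keyword}


def search(keywords, use_clusters=True):
    """Boolean AND search: each keyword (or a member of its cluster) must occur."""
    hits = []
    for entry, definition in DEFINITIONS.items():
        words = tokens(definition)
        if all(words & (expand(k) if use_clusters else {k}) for k in keywords):
            hits.append(entry)
    return hits


print(search(["instrument", "measuring"], use_clusters=False))
print(search(["instrument", "measuring"]))
```

Without expansion, the query ‘instrument’ AND ‘measuring’ misses both alkalimeter definitions; with the clusters, both are retrieved, which is the effect the LKB is meant to produce.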
5.2 Graph-based dictionaries

In recent times, there have been attempts to create onomasiological dictionaries based on knowledge bases that use definitions to obtain semantic relations, usually represented as graphs. Alcina (2009), for example, proposes ONTODIC, an ontology-based onomasiological dictionary for Spanish that specializes in the area of industrial ceramics. Its construction involves the automatic extraction of concepts, the formalization of characteristics and the identification of conceptual relations. For their part, Thorat and Choudhari (2016) propose a dictionary based on a graph derived from the semantic information extracted from definitions. The algorithm directly measures the similarity between the input query and any word in a graph of words. Although its performance is comparable with that of the OneLook Reverse Dictionary, it only works with a small lexicon of about 3,000 words.

Other graphs are constructed from Word Association Norms (WAN), collections that present stimulus words and the set of their associated responses. They are compiled by presenting a stimulus word to participants and asking them to produce the first word that comes to mind. WAN are a special type of semantic network, and they are available in many languages. Roget’s Thesaurus could be considered an example of WAN. More recent ones are the Edinburgh Associative Thesaurus (Kiss et al. 1973) and the collection of the University of South Florida (Nelson et al. 2004). For Mexican Spanish, the only available resource is the Corpus de Normas de Asociación de Palabras para el Español de México (Arias-Trejo et al. 2015).
It has been demonstrated that WAN are useful for developing onomasiological dictionaries, given that they represent the connections between words and the way concepts are linked in the human mind (see Figure 19.3). This is even more so since the advent of the Internet and language technologies, which have allowed the development of online resources fed by the enormous corpus that the World Wide Web provides. Reyes-Magaña et al. (2019) introduced a lexical search model based on a graph built from a word association corpus. The model uses WAN in Spanish as the basis of a lexical search system that works from clues or definitions to the concept; that is, the system works as an onomasiological dictionary. Given a definition, the proposed dictionary looks in the graph for the word that best corresponds to it. The results have proved very encouraging, whatever the language used.

Figure 19.3 Example of a graph.

WordNet is an extremely useful and widely used resource for different applications within computational linguistics. It can be considered a knowledge graph par excellence, based on WAN studies. Although WordNet has not been used as an onomasiological dictionary, its organization of lexical information in terms of the meaning of words, instead of word forms, makes it similar to a thesaurus such as Roget’s. While in Roget the lexical relations are left implicit and the material is explicit, in WordNet one must go through lexical relations such as ‘hyponymy’ and ‘brother term’ to find the material. That is, in Roget one arrives at the material directly, without needing previous knowledge of the taxonomies, while in WordNet one must think about the possible taxonomic tags.
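The idea of searching a word association graph from clues towards a target word can be sketched as follows. This is only an illustration and not the model of Reyes-Magaña et al. (2019): the association norms are invented toy data, and the scoring rule (summing the association weights between each clue and each candidate) is an assumption chosen for brevity.

```python
from collections import defaultdict

# Hypothetical toy norms: (stimulus, response, share of participants).
TOY_NORMS = [
    ("water", "liquid", 0.30),
    ("water", "drink", 0.25),
    ("water", "river", 0.15),
    ("milk", "liquid", 0.20),
    ("milk", "drink", 0.30),
    ("milk", "cow", 0.35),
    ("river", "water", 0.40),
    ("river", "flow", 0.20),
]


def build_graph(norms):
    """Undirected weighted graph: graph[a][b] is the association strength."""
    graph = defaultdict(lambda: defaultdict(float))
    for stimulus, response, weight in norms:
        graph[stimulus][response] += weight
        graph[response][stimulus] += weight
    return graph


def search(graph, clues, top=3):
    """Rank the neighbours of the clue words by summed association weight."""
    scores = defaultdict(float)
    for clue in clues:
        for neighbour, weight in graph.get(clue, {}).items():
            if neighbour not in clues:
                scores[neighbour] += weight
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top]]


graph = build_graph(TOY_NORMS)
print(search(graph, ["liquid", "river", "drink"]))
```

In this toy graph the clues ‘liquid’, ‘river’ and ‘drink’ all point most strongly to ‘water’, so it is ranked first; a real system would use a full norm collection and a more refined measure.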
5.3 Tendencies

The development of onomasiological dictionaries has attracted the interest of areas such as artificial intelligence, in particular those focused on machine learning, with techniques such as recurrent neural networks (RNN) and word embedding models. These models are highly accurate at modelling sequences of words, such as the definitions or descriptions given by users. In this way, they make it possible to find the word whose definition is most similar to the query.

Hill et al. (2016) propose the creation of an onomasiological dictionary based on an RNN that is trained with definitions obtained from dictionaries or extracted from encyclopaedias. The words of these definitions make up the vocabulary of the neural network, which uses an architecture known as long short-term memory (LSTM). The query is thus embedded in the semantic space of word embeddings, and the system returns the words whose embeddings are closest to it.

Zhang et al. (2020) present a multichannel onomasiological dictionary model that incorporates four predictors of the characteristics of the target word, based on the input query. The predictors identify the part-of-speech category, the morphemes, the semantic hierarchy of the word and the sememes. In this way, the model is intended to emulate the alternative ways in which a human being can think. The predictors are combined with an RNN architecture known as bidirectional LSTM (BiLSTM). The resulting dictionary for Chinese and English, WantWords, is available on the web (Qi et al. 2020).

Tomassetti (2019) provides the elements needed to create an onomasiological dictionary based on RNN. He uses Word2Vec, a word embedding model widely used nowadays by the Natural Language Processing community. This model relies on the distributional semantics formulated by Harris (1954), according to which words that appear in the same contexts tend to convey similar meanings. In an article on the DZone website, he provides developers with the code and algorithms for training and running the dictionary, and even for creating a web interface.
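The principle shared by these systems, embedding the query in the same space as the words and returning the nearest words, can be shown with a deliberately simplified sketch. It replaces the LSTM encoders of the cited models with a plain average of word vectors, and the three-dimensional vectors are invented for the example; real embeddings are learned from large corpora and have hundreds of dimensions.

```python
import math

# Hypothetical toy embeddings (not taken from any trained model).
TOY_VECTORS = {
    "water":     [0.9, 0.1, 0.0],
    "liquid":    [0.8, 0.2, 0.1],
    "drink":     [0.7, 0.3, 0.0],
    "barometer": [0.0, 0.9, 0.3],
    "pressure":  [0.1, 0.8, 0.4],
    "measure":   [0.0, 0.7, 0.6],
}


def embed(query):
    """Average the vectors of the known query words (None if none is known)."""
    known = [TOY_VECTORS[w] for w in query.lower().split() if w in TOY_VECTORS]
    if not known:
        return None
    return [sum(dim) / len(known) for dim in zip(*known)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def reverse_lookup(query, top=3):
    """Return the words closest to the embedded query, excluding query words."""
    q = embed(query)
    if q is None:
        return []
    query_words = set(query.lower().split())
    scored = [(cosine(q, vec), word) for word, vec in TOY_VECTORS.items()
              if word not in query_words]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:top]]


print(reverse_lookup("measure pressure", top=1))
print(reverse_lookup("drink liquid", top=1))
```

Averaging ignores word order, which is precisely what the recurrent architectures mentioned above are designed to capture; the sketch only shows the nearest-neighbour step.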
6 Drawbacks of onomasiological dictionaries

From the typology described, a variety of dictionaries with an onomasiological character emerge; Hartmann (2006) has counted more than 600 such works in some 35 languages. Despite this variety, the dictionaries designed for users to search for a word that expresses their idea present several limitations. The existing printed dictionaries only provide a list of related terms, arranged alphabetically (e.g. a synonym dictionary) or by concepts or topics (e.g. thesauri). Pictorial dictionaries cannot help with concepts that cannot be drawn. Reverse dictionaries offer only one or two ways to arrive at the target word, and the user cannot always think exactly in those terms.

Among the main difficulties for lexicographers is the lack of a universal method of arrangement that could serve as an alternative to the alphabetical order of entries. While alphabetical sorting is reliable and objective, moving from the meaning to the word implies a macrostructure that depends on the lexicographer’s subjectivity.

As for users, they must be capable of translating their idea into words and then selecting the words that, in their opinion, are most relevant to the idea they need to express but cannot remember (Zock, Ferret and Schwab 2010). However, users might not have a clear image of the concept: they might have only a blurry image of the meaning of the word they are looking for, they might have a description of its characteristics, or they might remember some words connected to the one they are looking for. In less fortunate cases, they can only remember some letters, for example, the first or last letter of the word. Unfortunately, users cannot always express their concepts in a clear and unambiguous way, so they only occasionally obtain the expected results. There is also the problem that a person can express the same concept in many ways.
This is because we all think differently, and there can be as many conceptualizations of the meaning of a word as there are people who create them. These differences in perception make it clear that the definitions formulated by users might not match the formal definitions found in conventional dictionaries. A useful onomasiological dictionary must take into account not only the point of view of lexicographers but also the many ways in which users see the world.

A difficulty for users emerges with the way they express the concept to begin the search. The vocabulary of ordinary people is too wide, and their experience of the use of words too diverse, to expect everybody to use the same keywords to refer to a particular concept. People express the same description (e.g. measuring instruments) with different words (instrument, device, apparatus; determining, measuring, recording, etc.). Once users have these words, they must verify whether they match the ones found in dictionaries. To show the difference between different printed onomasiological dictionaries, the search for five target words from different keywords is given in Table 19.2. In the case of WordNet, the adjacent related terms (hyponyms, hyperonyms, meronyms, holonyms, etc.) were searched. As can be seen, the dictionaries do not agree on the words that might be used to arrive at the target word.

In the last two decades, several online dictionaries have been designed that allow searches in natural language. Users enter their own definition in natural language, and the engine searches for the words that match the definition. There has been little commercial success with dictionaries such as the OneLook Reverse Dictionary, the most popular, but its architecture usually reflects the compiler’s knowledge and is not made explicit. Some scientific research has also been carried out on the construction of reverse dictionaries. These works adopt approaches based on the match between the input query and the definitions stored in the knowledge base. Nevertheless, these methods do not solve the main difficulty: the queries written by users might differ widely from the definitions of the target words.
Electronic dictionaries seem to break the barriers of controlled vocabulary, allowing users to enter the description of the concept with the words that come to mind. Nonetheless, this does not always yield good results either. These dictionaries can be compared by entering a description of the concept and evaluating whether the target word is among the first k results: whether it appears in first place, among the first three, or among the first five. This evaluation measure is called p@k (Manning et al. 2009), and it is commonly used to compare the effectiveness of a model with that of other information retrieval systems. As an example, Table 19.3 presents a comparison of four dictionaries: OneLook Reverse Dictionary, Reverse Dictionary, WantWords and the Graph-Based Onomasiological Dictionary (GBOD) by Reyes-Magaña et al. (2019).
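The evaluation just described can be sketched in a few lines. The ranks are those reported in Table 19.3 for the target word water. The function name hit_rate_at_k, the reading of the measure as the share of descriptions whose target word appears among the first k results, and the treatment of NR (presumably ‘no result’) as a miss are assumptions made for this illustration.

```python
# Position of the target word "water" for the six descriptions of Table 19.3.
# None stands for the table's NR entry (assumed to mean no result).
RANKS = {
    "OneLook":   [49, 1, 1, 17, 1, 1],
    "Reverse":   [19, 6, 3, 7, 15, 2],
    "WantWords": [28, 4, 9, 10, None, 1],
    "GBOD":      [1, 1, 10, 1, 1, 1],
}


def hit_rate_at_k(ranks, k):
    """Share of queries whose target word appears among the first k results."""
    hits = sum(1 for rank in ranks if rank is not None and rank <= k)
    return hits / len(ranks)


def report(ks=(1, 3, 5)):
    """Hit rate of every dictionary at each cut-off in ks."""
    return {name: [hit_rate_at_k(ranks, k) for k in ks]
            for name, ranks in RANKS.items()}


for name, rates in report().items():
    print(name, [round(rate, 2) for rate in rates])
```

With only six descriptions the figures are indicative at most, but they make the comparison between systems explicit for each cut-off k.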
Table 19.2 Comparison between onomasiological dictionaries.
Dictionaries compared (columns): Roget, Bernstein, Digest, Flip, WordNet, Chambers.
Target word | Clue words
Euthanasia | death, killing, mercy, suicide
Monopoly | control, exclusive
Aberration | behaviour, derange, deviation, insanity, lapse, mental
Hilarity | fun, gaiety, laughter, merriment, noisy
Barometer | air, measure, pressure
The descriptions were composed by the same undergraduate students as for Table 19.1. The numbers indicate the position at which the word water appears in the results. It can be observed that variation in the input description can lead to different results.

Table 19.3 Comparison of electronic onomasiological dictionaries.
Description | OneLook | Reverse | WantWords | GBOD
It’s a clear liquid that you get from a tap | 49 | 19 | 28 | 1
The colourless transparent liquid occurring in rivers | 1 | 6 | 4 | 1
A clear, neutral liquid that surrounds us everywhere | 1 | 3 | 9 | 10
Liquid, clear, drinkable – constituents are hydrogen and oxygen | 17 | 7 | 10 | 1
Fluid, clear, tasteless, colourless | 1 | 15 | NR | 1
Wash with it; drink it; used for dilution; H2O; found in springs, rivers, lakes, seas, oceans | 1 | 2 | 1 | 1

Conclusions

The analysis of the literature on lexicography shows that the design of a dictionary must also take into account personal factors (needs and skills), besides the crucial fact that people have difficulties expressing what they are searching for. The distinction linguistics makes between ‘active coding’ and ‘passive decoding’ allows us, knowing the different types of dictionaries, to focus on the onomasiological type. We have analysed the available dictionaries aimed at helping writers and speakers to express an idea when they forget a word, as well as the few attempts to create new types of onomasiological dictionaries. Online dictionaries tend towards systems aimed at the user.

The ideal onomasiological search must allow writers to enter the concept to be searched for through the ideas they have, using any word in any order. The system must be built so that it accepts a wide range of words, which it then analyses to show the user the word closest to the concept they had in mind when the search began. This does not mean that a complete and efficient onomasiological dictionary cannot be designed. In our context, efficiency means that a dictionary must meet the requirements of a particular type of user, in a given terminological domain and with a specific background.

Therefore, for an onomasiological dictionary to be really useful, it must allow users to express their ideas in their own terms, using the words they know and constructing sentences (queries) as they usually would when asking others for help in remembering a forgotten word. In other words, the dictionary must accept a wide variety of words to recover the missing word, even if the user does not use the same terms as the lexicographer’s definitions. As Zock and Schwab (2008) point out, an onomasiological dictionary must take into account the different properties of the concepts it claims to express, as well as the diversity of forms that allow reference to these concepts. Consequently, the basis of such a dictionary is the lexical knowledge base, which will contain all the knowledge necessary for an onomasiological search.
Note 1 This chapter summarizes the work of many years, which was initiated in the doctoral thesis (Sierra 1999) and which continues to this day.
References Dictionaries Barcia, R. (1980), Diccionario de Sinónimos, México: Oasis. Benot, E. (dir.) (1899), Diccionario de ideas afines y elementos de tecnología, Madrid: Pedro Núñez. Bernstein’s Reverse Dictionary (1975), London: Routledge and Kegan Paul. Bildwörterbuch Online, http://www.bildwoerterbuch.com/. Casares, J. (1942), Diccionario ideológico de la lengua española, Barcelona: Gustavo Gili. Collins English Dictionary (1994), Glasgow: HarperCollinsPublishers. Edmonds, D. (2000), The Oxford Reverse Dictionary, Oxford University Press. Kipfer, B.A. (2000), The writer’s digest flip dictionary, Ohio: Writer’s Digest. OneLook Reverse Dictionary, http://www.onelook.com. Oxford English Dictionary, http://www.oed.com/. Reader’s Digest Reverse Dictionary (1989), London: The Reader’s Digest Association Limited.
Reverse Dictionary, https://reversedictionary.org/. Robertson, T. (1859), Dictionnaire Idéologique, Paris: A. Derache. Roget, P.M. (1852), Thesaurus of English Words and Phrases. Roget’s Thesaurus http://www.gutenberg.org/files/10681/10681-h/10681-h.htm. Sanders, D. (1878), Deutscher Sprachschatz, Hamburg: Sanders. WantWords, https://wantwords.thunlp.org/. Webster, N. (1973), Webster’s New Dictionary of Synonyms, Springfield: Merriam. WordNet, https://wordnet.princeton.edu/. Wüster, E. (1968), The Machine Tool, Oxford: Technical Press.
Other references Aitchison, J. (1994), Words in the Mind: An Introduction to the Mental Lexicon, Oxford: Blackwell Publishers. Alarcón, R., G. Sierra and C. Bach (2009), ‘Description and evaluation of a definition extraction system for Spanish language’ in G. Sierra, M. Pozzi and J.M. Torres-Moreno (eds), Proceedings of the 1st Workshop on Definition Extraction, Borovets, 7–13. Alcina, A. (2009), ‘Metodología y tecnologías para la elaboración de diccionarios terminológicos onomasiológicos’ in A. Alcina, E. Valero and E. Rambla (eds), Terminología y Sociedad del Conocimiento, Bern: Peter Lang, 33–58. Arias-Trejo, N., J.B. Barrón-Martínez, R.H. López and F.A. Robles (2015), Corpus de normas de asociación de palabras para el español de México, México: Universidad Nacional Autónoma de México. Baldinger, K. (1980), Semantic Theory: Towards a Modern Semantics, Oxford: Basil Blackwell. Barnhart, C.L. (1962), ‘Problems in editing commercial monolingual dictionaries’ in F.W. Householder and S. Saporta (eds), Problems in Lexicography, Bloomington: Indiana University, 161–81. Béjoint, H. (1981), ‘The foreign student’s use of monolingual English dictionaries: A study of language needs and reference skills’, Applied Linguistics 2 (3), 207–22. Brown, R. and D. McNeill (1966), ‘The “tip of the tongue” phenomenon’, Journal of Verbal Learning and Verbal Behavior 5 (4), 325–37. Cowie, A.P. (1983), ‘The pedagogical/learner’s dictionary: I. English dictionaries for the foreign learner’ in R.R.K. Hartmann (ed.), Lexicography: Principles and Practice, London: Academic Press, 135–44. Dutoit, D. and P. Nugues (2002), ‘A lexical database and an algorithm to find words from definitions’ in F. Van Harmelen (ed.), Proceedings of the 15th European Conference on Artificial Intelligence, Amsterdam: IOS Press, 450–4. El-Kahlout, I. D. and K. Oflazer (2004), ‘Use of Wordnet for retrieving words from their meanings’ in P. Sojka, K. Pala, P. Smrž and C.
Fellbaum (eds), Proceedings of the global Wordnet conference (GWC2004), Brno: Masaryk University, 118–23. Harris, Z.S. (1954), ‘Distributional structure’, Word 10, 146–62. Hartmann, R.R.K. (1983), ‘The bilingual learner’s dictionary and its uses’, Multilingua 2 (4), 195–201. Hartmann, R.R.K. (2006), ‘Onomasiological dictionaries in 20th-century Europe’, Lexicographica 21, 6–19. Hill, C.P. (1985), ‘Alternatives to dictionaries’ in R. Ilson (ed.), Dictionaries, lexicography and language learning, Oxford: Pergamon Press, 115–21. Hill, F., K. Cho, A. Korhonen and Y. Bengio (2016), ‘Learning to understand phrases by embedding the dictionary’, Transactions of the Association for Computational Linguistics 4, 17–30. Hüllen, W. (2004), A History of Roget’s Thesaurus: Origins, Development, and Design, Oxford: Oxford University Press. Hüllen, W. (2009), Networks and Knowledge in Roget’s Thesaurus, Oxford: Oxford University Press. Kharma, N.N. (1985), ‘Wanted: A brand-new type of learner’s dictionary’, Multilingua 4 (2), 85–90.
Kipfer, B.A. (1987), ‘Dictionaries and the intermediate student: communicative needs and the development of user reference skills’, Lexicographica 17, 44–54. Kiss, G., C. Armstrong, R. Milroy and J. Piper (1973), ‘An associative thesaurus of English and its computer analysis’ in A. J. Aitken, R. W. Bailey and N. Hamilton-Smith (eds), The Computer and Literary Studies, Edinburgh: Edinburgh University Press, 153–65. Malkiel, Y. (1962), ‘A typological classification of dictionaries on the basis of distinctive features’ in F.W. Householder and S. Saporta (eds), Problems in Lexicography, Bloomington: Indiana University, 3–24. Mallinson, G. (1979), ‘The dictionary and the lexicon: A happy medium?’, ITL - International Journal of Applied Linguistics 45–6, 10–18. Manning, C D., P. Raghavan and H. Schütze (2009), Introduction to Information Retrieval, Cambridge: Cambridge University Press. McArthur, T. (1986), ‘Thematic lexicography’ in R.R.K. Hartmann (ed.), The History of Lexicography, Amsterdam: John Benjamins, 157–66. Nelson, D.L., C.L. McEvoy and T.A. Schreiber (2004), ‘The University of South Florida Word association rhyme and word fragment norms’, Behavior Research Methods, Instruments & Computers 36, 402–7. Petöfi, J.S. (1982), ‘Exploration in semantics: analysis and representation of concept systems’ in F.W. Riggs (ed.), The Cocta Conference, Frankfurt: Indeks Verlag. Qi, F., L. Zhang, Y. Yang, Z. Liu and M. Sun (2020), ‘WantWords: An open-source online reverse dictionary system’ in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics. Quirk, R. (1973), ‘The social impact of dictionaries in the U.K.’ in R.I. McDavid and A.R. Duckert (eds), Lexicography in English, New York: New York Academy of Sciences, 76–88. Rey, A. (1977), Le lexique images et modèles: du dictionnaire à la lexicologie, Paris: Librairie Armand Colin. Riggs, F.W. 
(1989), ‘Terminology and lexicography: Their complementarity’, International Journal of Lexicography 2 (2), 89–110. Reyes-Magaña, J., G. Bel-Enguix, G. Sierra and H. Gómez-Adorno (2019), ‘Designing an electronic reverse dictionary based on two word association norms of English language’ in Proceedings of Electronic Lexicography in the 21st Century Conference, 865–80. Shcherba, L.V. (1995), ‘Towards a general theory of lexicography’, International Journal of Lexicography 8 (4), 314–50. Sierra, G. (1999), Design of a Concept-Oriented Tool for Terminology, Manchester: University of Manchester Institute of Science and Technology. Sierra, G. and J. McNaught (2000), ‘Extracting semantic clusters from MRDs for an onomasiological search dictionary’, International Journal of Lexicography 13 (4), 264–86. Sierra, G. and J. McNaught (2003), ‘Natural language system for terminological information retrieval’ in International Conference on Intelligent Text Processing and Computational Linguistics, 541–52. Svensén, B. (1993), Practical Lexicography: Principles and Methods of Dictionary-Making, Oxford: Oxford University Press. Thorat, S. and V. Choudhari (2016), ‘Implementing a reverse dictionary, based on word definitions, using a node-graph architecture’, arXiv preprint arXiv:1606.00025. Tomaszczyk, J. (1979), ‘Dictionaries: Users and uses’, Glottodidactica 12, 103–19. Tomassetti, G. (2019), ‘Creating a reverse dictionary’, DZone. https://dzone.com/articles/reversedictionary-neural-network. Ullman, S. (1967), Semantics: An Introduction to the Science of Meaning, Oxford: Basil Blackwell. Zgusta, L. (1971), Manual of Lexicography, Prague: Academia. Zhang, L., F. Qi, Z. Liu, Y. Wang, Q. Liu and M. Sun (2020), ‘Multi-channel reverse dictionary model’ in Proceedings of the AAAI Conference on Artificial Intelligence, 312–19. Zock, M., O. Ferret and D. 
Schwab (2010), ‘Deliberate word access: An intuition, a roadmap and some preliminary empirical results’, International Journal of Speech Technology 13 (4), 107–17. Zock M. and D. Schwab (2008), ‘Lexical access based on underspecified input’ in Coling 2008: Proceedings of the workshop on Cognitive Aspects of the Lexicon (COGALEX 2008), Manchester, 9–17.
20 Issues in collaborative and crowdsourced lexicography
Franck Sajous and Amélie Josselin-Leray
1 Introduction

Within a few decades, lexicography has undergone a number of changes, be it from a theoretical, technological or economic point of view. The major ones can be listed as follows: the descriptive revolution (Trap-Jensen 2018), the computerization of print dictionaries (Nagao et al. 1980, Berg et al. 1988), the contribution of corpus linguistics (Rundell and Stock 1992) and NLP tool-assisted data analysis (Rundell and Kilgarriff 2011), the release of various forms of e-dictionaries and their online publication (Nesi 2008) and, finally, for some, the end of print versions (Rundell 2014). While some of these changes result from internal shifts triggered by the private and the academic sectors, others have been caused by external factors. Among the latter, one can find the rise of several types of free online dictionaries such as ‘potpourri of dictionaries’ (dictionary aggregators) and ‘DIY dictionaries’ (e.g. Wiktionary or Urban Dictionary). With such dictionaries being free, commercial dictionaries have had to adapt their business model (Kilgarriff 2005).

The emergence of these new resources also raises questions about new ways of compiling dictionaries. ‘DIY dictionaries’, which are described either as ‘collaborative’ or ‘crowdsourced’, are evidence of the interest of the crowds in lexical descriptions; they also show that internet users can contribute in various ways to self-organized amateur lexicography projects. Another simultaneous innovation is that other disciplines, such as NLP, have started to resort to micro-tasking for annotation projects. Micro-tasking, also referred to as microworking, is an implementation of crowdsourcing that consists in breaking down a complex task into simpler tasks that can be performed by various workers online. Professional lexicographers who consider resorting to volunteers for dictionary compiling may draw inspiration from such approaches.
However, one may wonder how the crowdsourced data production or annotation experiments that took place in NLP, for which little or no prior knowledge is required, can apply to the context of lexicography. Several closely related questions arise: can the crowds be guided within an institutional framework? Which type of lexicographic project can they be involved in? Which tasks could/should they perform, and when they do, within which participatory schemes? When it comes to tasks requiring greater linguistic competence – or, at least, sensitivity – to what extent can the analysis of ‘DIY dictionaries’ give an accurate and complete picture of what amateurs are able to produce?
Establishing how relevant it is to resort to the crowds and which implementation is more suitable depends on the very nature of the lexicographic project (which type of dictionary?), on its degree of completion (is it a new dictionary being compiled or an existing dictionary being updated?) and on the resources (corpora and tools) that are available for the language under study. This chapter is based on the analysis of several projects which rely on various schemes involving the crowds, either on an experimental or on a large scale. It aims at describing the ins and outs of such collaboration- or crowdsourcing-based lexicographic projects, focusing in particular on their potential, their challenges and their limitations. Section 2 tries to identify what the notions of crowdsourcing and collaboration encompass. Section 3 distinguishes the (supervised) processes that aim at dictionary writing with the crowds from the (autonomous) processes that rely on their writing by the crowds. After considering the reasons for using the crowds in the lexicographic process, where many tasks can be automated (Section 4), in Sections 5 to 7 we study three types of project that can benefit from this external help: traditional institutional projects in which volunteers are entrusted with annotation tasks that take place during the data analysis stage, projects based on field linguistics, i.e. the collection of linguistic data where amateurs are considered as informants, and open dictionary projects where users are asked to suggest additions and modifications. Finally, Section 8 addresses the ethical problems that can arise from the different implementations of the work done by the crowds.
2 What is ‘crowdsourcing’ and what is ‘collaboration’? The adjectives ‘collaborative’ and ‘crowdsourced’ are commonly used – sometimes interchangeably – to refer to projects fed by the crowds, but a distinction needs to be made between the two. A term that was made popular by Howe (2006), crowdsourcing originally referred to the outsourcing by companies of tasks to be performed by the crowds, i.e. communities of internet users. It has now become an umbrella term which encompasses several categories of methods that are used in various fields. It actually takes Estellés-Arolas et al. (2015) a 120-word-long description to provide an integrated definition of crowdsourcing, which comes after no less than forty – often divergent – definitions of the concept, all taken from the literature. The list of key ingredients underlying the concept have been summed up by Brabham (2013: 3) as follows: ‘an organization that has a task it needs performed, a community (crowd) that is willing to perform the task voluntarily, an online environment that allows the work to take place and the community to interact with the organization, and mutual benefit for the organization and the community’. The author (ibid.: XV) also attempts at giving a more concise definition: ‘[the] deliberate blend of bottom-up, open, creative process with top-down organizational goals’. Among the various crowdsourcing approaches, micro-tasking is a form of distributed work which consists in breaking down a problem that needs to be solved (or data that need annotating) into a large number of simple tasks which will then be assigned to several micro-workers; those micro-workers will receive a minimum amount of money for performing the tasks which will then be aggregated to produce the final result. This approach is based on the search for redundancy and consensus. The same task is assigned to several micro-workers, and the result is considered reliable only if the contributions converge. 
For some lexicographic projects, the integration of micro-tasking
– which is already common practice in NLP – into the overall dictionary-making workflow has started being seriously considered (Čibej et al. 2015).
The collaboration approach relies on interaction between several people (e.g. between contributors only or between contributors and the organizing body) who intend to achieve the same goal (even though different individual objectives might also be involved). The presence of interaction is what differentiates collaboration from micro-tasking the most. Two more differences, which are related to each other, can be mentioned: what motivates the internet user to perform a given task and how familiar he/she is with the aim of the task (i.e. which overall project does the task fit into?). As far as micro-tasking is concerned, micro-workers seldom know what the answers they provide will be used for and their motivation is mostly a financial one. Conversely, in collaborative projects – whether they are dictionaries which are compiled outside an institutional framework, like Wiktionary, open dictionaries or instances of ‘field lexicography’ (see Section 6) – in most cases, contributors are aware of the overall intended purpose, and their only or main motivation might consist in achieving this very objective.
The notions of collaboration and distributed work are by no means new; nor are they mutually exclusive. An early implementation of crowdsourcing is the reading programme of the Oxford English Dictionary (OED),1 which was launched in 1857 to collect a corpus of quotations. It recruited voluntary and paid readers who would copy contexts of occurrences for a number of words and would then send the slips of paper by post. In a more recent context, the compiling of a dictionary by lexicographers who work from a remote location and who have been assigned a number of entries to write could be considered a form of distributed work which is akin to crowdsourcing.
But at the same time, in the cases when the definitions written by a given lexicographer are systematically reviewed by another, the process could be understood as collaboration. What makes the recent approaches based on the contribution of amateur crowds innovative is the number, diversity and range of skills of the individuals that are involved, as well as the ways the various participatory schemes are implemented. Which types of approach and which types of contributor are most suitable for a given task deserves further investigation, together with an analysis of the contributors’ motivation. To what extent can the study of dictionaries compiled by the crowds provide some possible answers?
3 Dictionaries written by the crowds vs dictionaries written with the crowds
This section tries to determine whether dictionaries written by the crowds can shed light on the best way to write dictionaries with the crowds within an institutional framework. Even though there is an obvious link between dictionaries fed by the crowds and the process of collaborative or crowdsourced writing (since the former are the result of the latter), dictionaries and process are of interest to two different fields: metalexicography (which focuses on the end result, i.e. the dictionary) and lexicography (which focuses on the process itself). At this point, it seems necessary to make it clear that this chapter focuses mostly on the dictionary-making process (i.e. lexicography). Metalexicographic studies are nevertheless worth mentioning since their analysis and description of dictionaries provide valuable insights into the lexicographic
processes. Interpreting and generalizing the findings should be done with caution, though, since dictionaries written by the crowds are a complex object of study, as will be shown below. Some features of dictionaries such as Wiktionary and Urban Dictionary have been identified through quantitative and qualitative analyses. For example, Meyer and Gurevych (2012) quantitatively analysed several editions of Wiktionary and compared them to other resources. They chose to use resources available in electronic format (e.g. WordNet) in order to automate their comparisons. By characterizing the size of the various headword lists, or the correlation between the number of senses per lexical unit in the various resources, what they actually assess for Wiktionary is its capacity to act as a lexical resource for NLP, and not its value as a dictionary for humans – which it is in the first place. Qualitative studies were conducted by Hanks (2012) and Rundell (2017), who commented on the definitions of the English Wiktionary by analysing a limited number of examples. Even if we may think they went through a larger number of definitions than what appears in the papers, we may wonder what the size of a representative sample could be given the size of the headword list of this dictionary. Observing an ‘old-fashioned approach to describing word senses’ (in particular, a large number of derivative definitions), they explain that the definitions under scrutiny are taken from dictionaries which are old enough to be copyright-free – which was also pointed out by Sajous and Hathout (2015) regarding the French Wiktionary. In the same way, Sajous et al. (2019) showed that the alternating presence/absence of point of view in the English and the French Wiktionary is mostly due to the import of entries from existing dictionaries. In other words, Wiktionary entries might not necessarily reflect the lexicographic skills of amateurs, but more specifically the features of dictionaries from the past.
Some areas of the lexicon, however, prove particularly useful for analysing the specific contributions of the crowds: (i) neologisms found in the general language and (ii) recent specialized terms, whose treatment cannot be ascribed to older sources. According to Sajous et al. (2020), the French Wiktionary can claim a better coverage of the lexicon of computer science than a commercial, general-purpose dictionary and the definitions of terms pertaining to that field are more accurate. Another study by Sajous et al. (2018) has shown how swiftly amateurs are likely to detect formal and semantic neology in Wiktionary but also in Urban Dictionary. The latter, which was originally designed as a slang dictionary and which is known to have become a virtual playground and an escape valve for some – which it actually is – sometimes turns out to be the only lexicographic resource available that includes the type of knowledge which is required to fully understand the meaning of some lexical units from a given field or subculture. There is no denying that the dictionary’s policy, which encourages contributors to express their points of view, combined with a form of editing control which does exist but can be deemed inefficient, paves the way for a large number of inside jokes and instances of hate speech. However, it also generates a large number of metalinguistic remarks targeting occurrences of misuse of the lexicon. Relevant analyses of some polysemous lexical units, which also include a diachronic description, can also be found. In a nutshell, as stated by Damaso (2005: 59), Urban Dictionary is both ‘a toy and a weapon’, but also ‘a tool’. Obviously, not all relevant pieces of information found in Urban Dictionary have to be recorded in an institutional dictionary, but they do show that some contributors have real analytical skills, and also bring extra information – in their own way – to more conventional lexicographic descriptions.
Even if routine tasks involved in professional lexicography can be fruitfully performed via micro-tasking, confining the crowds to those ‘menial tasks’ might not be the only option. The clear-sightedness and linguistic intuition of contributors can also be put to good use, especially in open dictionaries or in field linguistics projects, as shown below.
4 Do lexicographers need the crowds (when they already have corpora and tools)?
Rundell and Kilgarriff (2011) and Kilgarriff (cf. Chapter 7) give an overview of the tools available for corpus lexicography, of the tasks they can perform automatically and of those that can be partly automated as a support for lexicographers: the compiling, cleaning and annotating (lemmatization and POS-tagging) of corpora; the building of headword lists (word frequency counting, detection of formal neologisms); collocation calculation; lexical profiling; the visualizing and sorting of occurrences (concordancers, choice of good examples); vocabulary tagging (assigning of grammatical tags based on syntactic annotation, of field tags based on the corpus metadata), etc. Since the lexicographer seems to be relieved of the most tedious tasks thanks to automation and is ‘only’ left with the actual writing of the dictionary entries, one may wonder how relevant resorting to the crowds can be. There are in fact four main arguments in favour of involving the crowds.
First, corpus lexicography relies on NLP tools which are based on machine-learning systems that use datasets – which often happen to be crowdsourced – either in the training phase or in the evaluation phase.
Second, no matter how much these tools can be improved, they will never be flawless. There is noise in the input data and noise in the output data. Paradoxically enough, tools have allowed lexicographers to save some time, but, simultaneously, their ever-improving processing capacities have also exponentially increased the amount of data to be analysed: it is necessary for the results that are automatically obtained to be manually validated or invalidated. Such a lengthy and tedious process sometimes requires minimal language skills and can be accomplished, under certain conditions, by the crowds.
Third, some tasks still cannot be automatically undertaken, as underlined by Rundell and Kilgarriff (2011): ‘Automated lexicography is still some way off. In particular, we have not yet reached the point where definition writing and (hardest of all) word sense disambiguation (WSD) are carried out by machines.’ Despite the studies that have been carried out since then, their remark still stands today. One may wonder if the crowds, rather than editing the output of the tools, could not simply replace them. Fourth, the automation of tasks by tools is only possible when a given language has digital corpora and tools specifically designed to process them. When there are none, corpus lexicography has to be replaced with another type of project which relies on field linguistics, for which one can appeal to crowds of informants in pioneering ways. Finally, once the dictionary compiling process is over, lexicographers can call upon the crowds for user feedback and updating advice. The three following sections describe the various stages in which the crowds can be involved, depending on the type of project and its degree of completion.
5 Integrating the crowds into the professional lexicographic process
5.1 Crowds + NLP
Within the context of a monolingual Slovenian dictionary project, Kosem et al. (2013) integrate a crowdsourcing task aimed at identifying false positives among automatically extracted collocations, or bad examples, i.e. examples where the collocations do not appear in the expected syntactic structure. The examples have been randomly drawn from a gold standard that has been designed specifically for the task and are presented to participants who need to assess their reliability. According to the authors, the task, which was still at an experimental stage at the time, produced highly reliable results (no figures are provided). Following on, Čibej et al. (2015) consider integrating crowdsourcing into the overall workflow of lexicographic projects. They draw up a list of tasks in which the crowds could be involved and list recommendations for the development and implementation of the corresponding microtasks. All the tasks that may be crowdsourced deal with the data analysis phase, while the editorial work remains in the lexicographer’s hands. Kosem et al. (2018) take up the task of identifying false collocations, which was described above. The new experiment involves four participants, who annotate 6,590 collocations for 88 sample headwords through microtasks presented via an in-house interface. The results, which, in this study, were quantified, show an encouragingly high inter-annotator agreement, but this measurement alone is no guarantee of the quality of the results, as will be shown in Section 8.2. One issue raised by the 2013 and 2018 experiments is what a scaled-up version would be like in terms of participants. In the NLP field, the experiments carried out through microwork platforms exclusively deal with the English language. We are not aware of any large-scale language annotation experiments carried out through microwork for any other language.
In the case of Kosem et al.’s (2018) experiment, the authors write that the annotation tasks they propose are ‘not very demanding, even for non-linguists’, but their annotators are students in linguistics. Kosem et al. (2013) use non-lexicographers ‘with good knowledge of a language’. All these experiments can be relevant as proof of concept but raise questions about the possibility of broader recruitment. Could this type of task also be performed by naive people? If so, is a more massive recruitment of speakers of Slovene (and, more generally, of other languages) conceivable? If not, do the authors have a sufficiently large pool of student linguists?
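For readers unfamiliar with inter-annotator agreement, the following sketch computes Fleiss' kappa, one common chance-corrected agreement measure for a fixed number of annotators. Choosing Fleiss' kappa is an assumption made here for illustration: the chapter does not state which measure Kosem et al. (2018) report, and the counts below are invented.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of category counts.

    counts[i][j] is the number of annotators who assigned item i to
    category j; every item must be rated by the same number of annotators.
    The value is undefined when all ratings fall into a single category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Observed agreement: share of agreeing annotator pairs, per item.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Agreement expected by chance from the overall category proportions.
    p_cats = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_exp = sum(p * p for p in p_cats)
    return (p_bar - p_exp) / (1 - p_exp)

# Hypothetical data: four annotators, two categories
# (column 0 = valid collocation, column 1 = false positive).
table = [[4, 0], [4, 0], [0, 4], [3, 1], [0, 4], [4, 0]]
kappa = fleiss_kappa(table)
print(round(kappa, 3))
```

A high value only indicates that the annotators agree with one another more often than chance would predict; annotators who share the same bias can agree on a wrong answer, which is why agreement alone does not guarantee quality.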
5.2 Crowds vs NLP
Even today, many data analysis tasks remain difficult to automate using NLP tools. Two of those tasks – definition writing and WSD – are already mentioned by Rundell and Kilgarriff (2011). Two additional ones that seem even harder to undertake are (i) Word Sense Induction (WSI), a preliminary phase which consists in identifying the different meanings of a lexical unit and (ii) the detection of semantic neology. This section tries to establish whether, for performing such complex tasks, mobilizing the crowds could be an alternative to designing new algorithms. Some unsupervised clustering algorithms (in particular topic-modelling algorithms) tackle the
task of WSI by grouping the contexts in which lexical units appear in a given corpus, but they raise some problems. On the one hand, many require one to determine a priori the number of clusters associated with each lexical unit (each cluster ultimately corresponds to a given sense). Some papers, such as Lau et al. (2012), propose solutions whose algorithm tries to find out what an appropriate level of granularity would be. On the other hand, it is very difficult to anticipate what the optimal parameterization for this type of algorithm could be, especially since the evaluation procedures are complex, as shown by the SemEval-2010 campaign (Manandhar et al. 2010). Although it is more common to replace humans by machines, using amateur crowds where algorithms perform poorly can also be considered. Microwork is suitable for simple tasks. For complex tasks, procedures that automatically break them down into simpler subtasks may be developed. Rumshisky (2011) proposes such a strategy based on micro-tasking ‘intended to imitate the work done by a lexicographer in corpus-based dictionary construction’ for WSI and WSD. With this goal in mind, she designs an iterative process that groups together occurrences deemed to have a similar meaning. The process consists in presenting micro-workers, for a given word and a target occurrence, with all the other occurrences one after the other. The micro-worker must determine whether the meaning of the word in context is similar to that of the target occurrence. The occurrences selected by majority vote form a cluster with the target occurrence. Not only does the proposed strategy generate a sense inventory and a sense-annotated corpus but it also provides metrics based on the inter-rater agreement/disagreement that estimate the coherence of each cluster, the typicality of an occurrence for a given cluster and the proximity between two clusters.
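The iterative grouping just described can be sketched roughly as follows. This is a simplified illustration and not Rumshisky's actual implementation: the names `cluster_occurrences` and `simulated_votes` are hypothetical, the micro-workers are simulated from invented gold labels, and the sketch assumes that occurrences already placed in a cluster are not shown again in later rounds.

```python
def cluster_occurrences(occurrences, votes_same_sense):
    """Group occurrences into sense clusters by repeated majority vote.

    votes_same_sense(target, other) returns one boolean per micro-worker:
    True if the worker judges the two uses to have a similar meaning.
    """
    remaining = list(occurrences)
    clusters = []
    while remaining:
        target = remaining.pop(0)        # next target occurrence
        cluster, leftover = [target], []
        for occ in remaining:
            votes = votes_same_sense(target, occ)
            if sum(votes) * 2 > len(votes):   # strict majority says "same"
                cluster.append(occ)
            else:
                leftover.append(occ)
        clusters.append(cluster)
        remaining = leftover             # unclustered items go to next round
    return clusters

# Invented gold senses, used only to simulate worker answers.
GOLD = {
    "a river bank": "shore",
    "the bank raised rates": "finance",
    "bank of the stream": "shore",
    "she robbed a bank": "finance",
}

def simulated_votes(target, other):
    same = GOLD[target] == GOLD[other]
    return [same, same, not same]  # two reliable workers, one dissenter

clusters = cluster_occurrences(list(GOLD), simulated_votes)
print(clusters)
```

The share of dissenting votes recorded for each pair could feed agreement-based metrics of the kind mentioned above, such as an estimate of cluster coherence, but that step is not shown here.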
As mentioned earlier, another task which is considered difficult to automate is the detection of semantic neology. Lau et al. (2012) suggest adapting a WSI algorithm to discover new word meanings, which they apply to the ukWaC corpus (focus corpus) and the BNC (reference corpus). Cook et al. (2013) apply this method to newswire articles taken from the Gigaword corpus and ask an experienced lexicographer to analyse the results. Even if false positives are proposed (and if it can be assumed that proven cases of semantic neology are overlooked), the evaluation shows the relevance of integrating such a system into the lexicographer’s toolbox. As far as distributional semantics is concerned, prediction models based on neural embeddings, which have recently been used for the detection of semantic neology, raise the same issues as count models based on explicit distributional vector spaces, such as those implemented by Gulordava and Baroni (2011): ‘they do not account for polysemy, and appear best-suited to identifying changes in predominant sense’ (Lau et al. 2012). Word embeddings are commonly used to detect semantic shifts between two synchronic corpora which differ in nature (e.g. different genre/domain). For instance, Fišer and Ljubešić (2018) attempt to differentiate standard and non-standard Slovenian by comparing embeddings learnt from the contemporary Gigafida and Twitter corpora. The embeddings used to detect semantic neology – diachronic embeddings – are built in the same way as those intended to detect semantic shifts between synchronic corpora. For example, Hamilton et al. (2016) use GoogleBooks N-Grams over the 1800–1999 period, which they divide up into ten-year time periods.
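The general idea of comparing a word's distribution across time slices can be illustrated with a deliberately small sketch. It uses raw co-occurrence counts rather than neural embeddings, so it is a stand-in for the count models mentioned above and not a reproduction of any cited system; the function names and the four dated sentences are invented for the example.

```python
import math
from collections import Counter, defaultdict

def context_vectors(dated_sentences, target, slice_years=10):
    """Build one bag-of-context-words vector per time slice for `target`."""
    vectors = defaultdict(Counter)
    for year, sentence in dated_sentences:
        tokens = sentence.lower().split()
        if target in tokens:
            start = year - year % slice_years   # e.g. 1995 -> 1990
            vectors[start].update(t for t in tokens if t != target)
    return dict(vectors)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Invented toy corpus of (year, sentence) pairs.
corpus = [
    (1995, "birds tweet in the trees"),
    (1998, "sparrows tweet at dawn"),
    (2011, "politicians tweet their opinions online"),
    (2014, "users tweet links online"),
]
vecs = context_vectors(corpus, "tweet")
similarity = cosine(vecs[1990], vecs[2010])
print(similarity)
```

A low similarity between slices flags a candidate for inspection. Because each slice yields a single vector per word, the sketch shares the limitation quoted above: it cannot separate the senses of a polysemous word and mainly reflects changes in the predominant sense.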
Regardless of the very specific nature of the Twitter and GoogleBooks corpora, detecting semantic neology requires the fulfilment of two opposite needs: (i) reaching a critical volume of data that can be exploited by neural models while (ii) limiting the texts under study to time periods that are sufficiently short (e.g. one or two years) to detect semantic shifts that are recent enough for lexicographic use. As far as GoogleBooks N-Grams are concerned, it should be noted
that they are not released on a regular basis – the last version to be released before 2020 was the 2012 version. As with WSD and WSI, semantic neology detection is an area where the crowds may very well compete with algorithms: in the same way as Lau et al. (2012) adapt a WSI method for the detection of new meanings of lexical units, Rumshisky’s (2011) iterative method, which uses crowdsourcing to infer a sense inventory and a disambiguated corpus, could very well be adapted to the task. An estimate of the time and cost involved needs to be made, but compiling a corpus in keeping with this approach seems more feasible than compiling one for the construction of diachronic embeddings. To sum up, several interesting approaches have been proposed to undertake a number of difficult tasks: WSI, WSD and semantic neology detection. However, their implementation, whether based on automation or crowdsourcing, still leaves room for improvement and the viability of a large-scale integration into a lexicographic project remains questionable. In the meantime, turning to dictionaries entirely written by the crowds might offer new prospects: in 2012, Lau et al. gave two examples of new meanings ‘not included in many dictionaries’: ‘send a message on Twitter’ for the verb tweet and ‘style’ for the noun swag. The new meaning of tweet (added to the OED in June 2013) was recorded in Wiktionary on 22 February 2009 (as a reminder, Twitter was launched in 2006). Swag (n. 2) was a new entry added to the OED in January 2018, but it first appeared in the Macmillan Dictionary in August 2012 thanks to its open crowdsourced dictionary, and in Wiktionary on 8 October 2011. Fully collaborative dictionaries and crowdsourced ones tend to include formal neologisms quickly and extensively, but also to record semantic neology (Sajous et al. 2018).
Whether they are automatic or crowdsourced, the methods for detecting neologisms could therefore be complemented by careful scrutiny of dictionaries such as Wiktionary and, to some extent, Urban Dictionary. A hybrid solution may be considered in the future: either by using crowdsourcing before using methods such as those developed by Lau et al. (2012) and Cook et al. (2013) (which brings us back to the above-mentioned ‘crowds + NLP’ configuration), or by automatically cross-checking data taken from dictionaries written by the crowds with those obtained by automatic corpus processing.
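The second option, cross-checking automatically detected candidates against dictionaries written by the crowds, might look like the sketch below. The recording dates for tweet and swag are those reported in this chapter, reduced to (year, month); the candidate list, including the unattested item, and the function names are hypothetical.

```python
def cross_check(corpus_candidates, crowd_first_recorded):
    """Split candidates by whether a crowd dictionary already records them."""
    attested = {
        w: crowd_first_recorded[w]
        for w in corpus_candidates
        if w in crowd_first_recorded
    }
    unattested = sorted(
        w for w in corpus_candidates if w not in crowd_first_recorded
    )
    return attested, unattested

def months_between(earlier, later):
    """Number of months from one (year, month) pair to another."""
    return (later[0] - earlier[0]) * 12 + (later[1] - earlier[1])

# First recording of the new senses, as (year, month), from the chapter.
WIKTIONARY = {"tweet": (2009, 2), "swag": (2011, 10)}
OED = {"tweet": (2013, 6), "swag": (2018, 1)}

# Hypothetical output of an automatic detection system.
candidates = ["tweet", "swag", "cloud"]

attested, unattested = cross_check(candidates, WIKTIONARY)
lags = {w: months_between(WIKTIONARY[w], OED[w]) for w in attested}
print(attested, unattested, lags)
```

A real cross-check would have to match senses rather than word forms, since both examples are new meanings of existing words; the sketch sidesteps that difficulty by keying the data on the word alone.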
6 Lexicography and field linguistics 2.0
The involvement of the crowds in the compiling or updating of dictionaries as described in the previous section only holds in projects for which digital corpora and tools are available. For the lexical description of languages that have neither (e.g. Swahili or Zapotec languages), data collection must be carried out beforehand, or simultaneously if the dictionary is being published (online) while it is being created (see also Stutzman and Warfel, Chapter 17). The data collection phase is a field linguistics task that is traditionally performed by linguists/lexicographers ‘in person’, together with the informants, but that can also benefit from the use of online tools, as illustrated below. In the Kamusi project, whose objective is the production of ‘quality lexicographical data for many languages that otherwise would not or could not exist’, a set of tools that allow the breaking of lexicographical data collection into targeted microtasks were used, as described by Benjamin (2015). The microtasks make it possible to collect translations of a set of words in the
target language, to suggest synonyms, to provide inflectional information, examples of usage, and even definitions. In the case of definitions, a term in the target language is provided with the definition of its English translation equivalent, which has been extracted from Princeton WordNet. Contributors must write a definition in their own language (which may be a translation of the WordNet definition or not). Through a game based on a point-earning system, the next contributors are encouraged to give an improved definition or to vote for an alternative definition proposed by another participant. These microtasks, which are sometimes gamified on Facebook or smartphone apps, are presented in the public interface, which has been constantly upgraded since the outset of the project. In another paper, Benjamin (2016) looks back at the initial phase of the project, which started two decades earlier: in December 1994, ‘the same week as the release of Netscape 1.0’, thirty Swahili speakers who were connected to the Internet were asked to translate English word lists into their language, with the results to be compiled into a static file shared on a Gopher server. There was an intermediary stage between the original phase and the current technological platform: an interface consisting of a form with fields for the words, their part of speech, their definition in Swahili etc. For any word, contributors could edit any field and the dictionary editor, who was notified automatically, could accept or reject the contribution, or modify an entry in turn. This was an early instance of a system implementing the principles of a wiki, which was structured as a database with centralized editorial control. Looking at how the project began reveals that a distributed linguistic work scheme was already in place, using whatever means were at hand.
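The intermediary stage, in which contributors edit fields and a central editor accepts, rejects or modifies each contribution, can be pictured as a small data model. This is only a schematic reading of the description above: the class names, field names and `review` function are hypothetical and do not reflect the actual Kamusi software.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    headword: str
    fields: dict = field(default_factory=dict)    # e.g. part of speech, gloss
    history: list = field(default_factory=list)   # log of editorial decisions

@dataclass
class Proposal:
    contributor: str
    field_name: str
    new_value: str

def review(entry, proposal, accept, editor_value=None):
    """Record the editor's decision; apply the change only if accepted.

    editor_value lets the editor substitute a modified version of the
    contribution, mirroring the 'modify an entry in turn' option.
    """
    if accept:
        value = editor_value if editor_value is not None else proposal.new_value
        entry.fields[proposal.field_name] = value
    decision = "accepted" if accept else "rejected"
    entry.history.append((proposal.contributor, proposal.field_name, decision))
    return entry

# Hypothetical usage with the Swahili word for 'book'.
entry = Entry("kitabu", {"pos": "noun"})
review(entry, Proposal("contributor_1", "gloss", "book"), accept=True)
review(entry, Proposal("contributor_2", "pos", "verb"), accept=False)
print(entry.fields, entry.history)
```

Nothing reaches the published entry without passing through `review`, which is what distinguishes this centralized scheme from an open wiki where edits go live immediately.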
Whether this can be considered as early crowdsourcing or simply as a set of tools facilitating remote communication between linguists/lexicographers and informants is hard to tell. In his 2015 article, Benjamin did describe his system as crowdsourcing, but in 2016, he wrote that ‘the project has always been conceived as collaborative but controlled’ (our emphasis). This is yet another example of an alternative – or hesitant – use of the concepts of collaboration and crowdsourcing. More recently, Harrison et al. (2019) describe a project for the compiling of talking dictionaries of Zapotec languages that relies on a high level of collaboration between linguists, undergraduate students, technical experts and many Zapotec speakers, who actively participate in the design of the dictionary. The collaboration takes place both on site (in person), and remotely, via an online multimedia platform which was developed as part of the project and designed both for browsing the dictionary and for feeding it. For example, the pronunciation of words can be recorded during field surveys or lexicography workshops related to the project. The recordings can also be done remotely and uploaded onto the platform by Internet users. The headword list is established using predefined word lists, legacy sources, and existing teaching material, among other things. Additional words can be collected during thematic conversations or through photo elicitation techniques. The photos, which are extracted from crowdsourced and free naturalist sites, are also used to illustrate entries. This project is interesting for three reasons. First of all, the speakers volunteering to participate in the dictionary have the same ideological motivating force as the initiators of the project, which they themselves describe as linguistic activism. 
The authors insist that the methodology – and not only the final product – is central to this project, and that ‘the collaborative practices as well as the resulting resources can be interventions in contexts where discrimination and detrimental linguistic ideologies conspire to silence languages’. The issues at stake are, on the one hand, to gain
recognition for a language and a culture and, on the other hand, to participate in a revitalization of that language. Everything is done, through collaboration, to strengthen local communities during the recording sessions: e.g. intergenerational sharing of linguistic knowledge, or special interest in diatopic variation from one village to another. There is also online bonding – the platform is linked to social media (e.g. Twitter and Facebook, where community members communicate in their own language) – which allows members of the diaspora to reconnect with members of indigenous communities. Second, the notion of prosumer, which is put forward more often in a theoretical than in an actual way, is embodied in that project in a very concrete manner by the collaborating speakers: the participants actively contribute to a dictionary that they can use, that reflects their culture and that belongs to them. It has been planned from the very beginning of the project that any kind of output will be placed under a free license and any contributing author is systematically credited. Third, just like the Kamusi project described by Benjamin, this project is based on a mix of traditional approaches to field linguistics and participatory knowledge production via crowdsourcing or collaboration. This hybrid approach shows that volunteers, depending on their motivation, can collaborate with professionals and not only work for them. It also shows that collaboration and crowdsourcing tasks – which can be used jointly – can be specifically tailored to meet the needs of a project and fit into the project’s workflow. Finally, it demonstrates that the participatory process can take place outside of wikis and that crowdsourcing can take place outside of the leading platforms.
7 From user feedback to open dictionaries
User feedback did exist before the Internet era in the form of occasional postal mailings sent by users, most of the time to question the presence or absence of a word in the headword list. It is also at the heart of the notion of ‘simultaneous feedback’ developed by De Schryver and Prinsloo (2000), who believe it should take place throughout the whole dictionary writing process. Since dictionaries started going online, their users have often been invited to submit comments, in the same way as some online newspapers offer their readers the opportunity to write comments at the bottom of the articles. According to Rundell (2017), this type of feature does not aim to collect users’ linguistic knowledge, but to increase user engagement: the more time a user spends on a website, the more income it generates. Some other dictionaries encourage users to contribute in a more precise manner, for example, by submitting suggestions of words to be added to the headword list. The Macmillan Open Dictionary takes it one step further by asking contributors to submit new words or meanings and to write the corresponding definitions. Once they have been validated by the Macmillan lexicographers (provided they do not contain offensive content and there is evidence showing their use), the contributions get published, without the definitions having to be rewritten in accordance with the dictionary’s defining style. Originally designed as a separate lexicon, the crowdsourced open dictionary is now part and parcel of the Macmillan English Dictionary. Entries submitted by contributors are clearly indicated as originating from the open dictionary (the contributor’s pseudonym, location and date of submission are mentioned) but it is worth mentioning that they are accessible via the same search bar as entries from the ‘regular’ dictionary. In recently added entries (June 2020), we can find common vocabulary
(e.g. dogsitting and misbelief, which were first recorded in the OED in November 2010 and June 2002), specialized terms (e.g. symbiont, ‘one of the two organisms involved in symbiosis’), formal neologisms (e.g. maskne ‘skin irritation and spots caused by wearing a face mask’, which appeared in Urban Dictionary in April 2020 and in Wiktionary in July 2020) or semantic neologisms related to current events (e.g. air bridge ‘a travel arrangement between two countries in which the global outbreak of a disease is under control’). This confirms the ability of the crowds to detect formal and semantic neology, but also to write definitions that are considered, if not perfect, at least acceptable.
8 Ethics
The integration of crowdsourcing and collaborative participation into the lexicographic process is not systematic yet, but some significant milestones have already been set through a variety of projects. Nonetheless, several methodology-related questions remain unanswered. For example: how can the crowds be encouraged to participate in tasks that do not sound very attractive to start with? How should the data collected be assessed? In the case of micro-tasking, all these problems are closely linked and also raise the question of ethics: since there is paid labour involved and since the rationale behind micro-tasking is originally to cut costs, it may be tempting for some to resort to predatory practices (referred to as ‘click servitude’, ‘crowdsploitation’ or ‘digital slavery’) – where should the limit be set? The lack of a national – let alone international – legal framework for online work makes it all the more necessary to reflect upon ethical issues (both from a legal and a moral perspective), even if this goes beyond a purely scientific approach. As pointed out by Bederson and Quinn (2011), it is the responsibility of the designer of the microtasks to establish good practices before any irreversible social damage is done due to wrong technological choices.
8.1 Motivation and remuneration
There are many different reasons why amateurs contribute to a project. Whether these reasons are on the ideological or the utilitarian side, all contributors pursue either a common interest or an individual goal – to name but a few: pursuing a hobby, finding intellectual satisfaction, achieving fame, asserting one’s identity, reinforcing a sense of belonging to a community, producing open-source commons or acquiring new skills. In the case of annotations carried out in the form of paid microwork, the main motivation remains money (although there might be secondary motivations). Talking about a WSI task, Rumshisky et al. (2012) state – rather bluntly – that, while restricting participation to US micro-workers (i.e. banning Indian contributors) may enhance the quality of annotations, it also requires a pay increase, without which Internet users will show little or no interest in the proposed microtasks. Is it legal and desirable to discriminate against potential participants on the basis of their origin (or geolocation) without even assessing their competence? For a given task, what amount can be considered fair remuneration, based on duration and the skills required? Can the ‘right’ compensation be universal or should it be indexed to the cost of living? Under which conditions should remuneration be denied to a micro-worker?
The Bloomsbury Handbook of Lexicography
Entertainment might yet be another motivation, especially by means of Games With A Purpose (GWAP). This consists in designing a system for collecting or annotating data in the form of an online game. Phrase Detective (Chamberlain et al. 2009), for example, is a game designed to anaphorically annotate a corpus. However, since hardly any Internet user found anaphora resolution particularly entertaining, a system of rewards in the form of vouchers sent to the highest-scoring players had to be added to the initial version (Poesio et al. 2015). According to Jurgens and Navigli (2014), most GWAPs consist of a text-based interface that makes the game look too similar to a traditional annotation task. As a consequence, the authors suggest developing video games with a graphic design close to the one gamers are familiar with. They get better results with Puzzle Racer than with a microwork platform, and at a lower cost (75 per cent less). Those results, however, must be put into perspective for two reasons. First, since participation is ensured by the recruitment of students paid by vouchers, the attractiveness of the game cannot be genuinely assessed. Second, the financial cost of developing the game is nil: it also has to do with student involvement since it was developed as part of a Java course. In comparison, the budget allocated to the salaries of the developers of Phrase Detective amounted to £60,000, with vouchers representing an additional budget of £18,000 (Poesio et al. 2015). Getting computer science students to program a GWAP is not so much of an issue. Nor is getting linguistics students to participate in an annotation project highly problematic; it is in fact quite the opposite – it can be very instructive. But how much free work can one reasonably demand from students?
8.2 Quality control: Data evaluation vs. workers’ evaluation
There are several ways to evaluate the data obtained through crowdsourcing, including microworking. This section focuses on the two main methods used for the evaluation of linguistic annotations:² (i) comparison to a gold standard and (ii) measurement of inter-annotator agreement. The development of a gold standard, which is used in particular for the evaluation of machine learning systems, requires manual work carried out by experts. As a consequence, it can be used only on a small scale (because of the cost of human experts), which raises the question of the representativeness of the sample thus annotated. Moreover, the dataset produced can hardly be used for any other task or any other type of data than the ones initially targeted (Kilgarriff 1997). This leaves inter-annotator agreement, which measures the degree of consensus among raters for a given annotation. There are several measures, including Cohen’s kappa, which evaluates the agreement between two annotators, and Fleiss’s kappa, which is used for a greater number of annotators.³ These measures are particularly well suited to the evaluation of annotation by crowdsourcing, which relies on annotation redundancy and consensus building. However, the fact that such measures should only be used as a negative indication is often overlooked: when a set of annotations (which is evaluated as a whole) shows low agreement, the cause is either unreliable annotations or a poorly defined annotation task. But the reverse is not necessarily true: high agreement only signals homogeneous annotations, not necessarily quality annotations. The agreement can also be calculated locally, for each annotated unit.
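The agreement measures mentioned above can be illustrated with a minimal sketch. It follows the standard textbook definitions of Cohen’s and Fleiss’s kappa; the function names, the sense labels and the toy data are assumptions made for illustration only and do not come from any of the projects cited in this chapter.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one single label throughout
    return (observed - expected) / (1.0 - expected)


def local_agreement(labels):
    """Agreement on one annotated unit: share of agreeing annotator pairs."""
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two annotations per unit")
    counts = Counter(labels)
    return (sum(c * c for c in counts.values()) - n) / (n * (n - 1))


def fleiss_kappa(items):
    """Fleiss's kappa; `items` holds one list of labels per annotated unit,
    with the same number of annotations for every unit."""
    n_raters = len(items[0])
    if any(len(labels) != n_raters for labels in items):
        raise ValueError("every unit needs the same number of annotations")
    p_bar = sum(local_agreement(labels) for labels in items) / len(items)
    totals = Counter(label for labels in items for label in labels)
    total = len(items) * n_raters
    p_e = sum((t / total) ** 2 for t in totals.values())
    if p_e == 1.0:
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical toy data: two annotators, four occurrences, two senses.
kappa = cohen_kappa(["s1", "s1", "s2", "s2"], ["s1", "s1", "s2", "s1"])  # 0.5
```

The `local_agreement` helper corresponds to the per-unit calculation mentioned at the end of the paragraph. In line with the caveat above, a low value points to a problem, whereas a high value only shows that the annotations are homogeneous.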
Rather than trying to achieve high agreement at all costs (sometimes by distorting the annotation task), Aroyo and Welty (2013) consider that ‘annotator disagreement is not noise, but signal; it is not a problem to be overcome, rather it is a source of information’. In the case of WSD, for example, this signal can be used automatically or manually to modify the sense
inventory. Chklovski and Mihalcea (2003) rely on web-annotators’ disagreement to detect sense inventories that might be too fine-grained and to automatically derive coarser-grained inventories from them. For the very same task, Čibej et al. (2015) suggest having a rough draft of sense division drawn up by a lexicographer before requesting the annotators to match up the occurrences with the different senses inventoried. Disagreements may ‘alert the lexicographer to an overly coarse sense division or even to an overlooked (sub)sense’. Evaluating a set of annotations is a complex task that raises methodological questions. The questions raised by the individual assessment of online micro-workers could also be considered as methodological considerations as long as what is involved is the discarding of their annotations (i.e. the fact of not using them) when, for some reason, they are deemed unreliable. When it comes to refusing to pay a micro-worker on such grounds, the question becomes an ethical issue. Problematic workers may fall into two categories: those who are under-qualified for a given task, and deliberate scammers. The first category seems simple to handle and is based on transparent requestor/worker communication. A dataset is used to test the worker at the very beginning, before the actual annotation process, and to discard him/her if he/she does not pass the test. Dealing with malicious workers, i.e. those who try to get paid as quickly as possible by providing the first answer that comes to mind, is more delicate. They may in fact provide correct answers for the initial test and then proceed to cheat. It is possible to introduce occasional questions from a gold standard throughout the annotation process, or to measure the intra-annotator agreement by presenting the same worker with the same item to be annotated several times, at various intervals, in order to test his/her annotation consistency. 
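The two consistency checks just described, hidden gold-standard questions and repeated items, can be sketched as follows. This is only an illustration under stated assumptions: the data layout (a list of `(item_id, label)` pairs per worker), the function names and the toy identifiers are hypothetical, and no threshold for accepting or discarding a worker is implied.

```python
from collections import defaultdict


def gold_accuracy(answers, gold):
    """Share of a worker's answers on gold items that match the expert label.

    answers: list of (item_id, label) pairs in the order they were given.
    gold: dict mapping the item_ids of hidden gold questions to their label.
    Returns None when the worker has not seen any gold item yet.
    """
    checked = [(item, label) for item, label in answers if item in gold]
    if not checked:
        return None
    return sum(label == gold[item] for item, label in checked) / len(checked)


def self_consistency(answers):
    """Intra-annotator agreement: share of repeated items that received
    the same label every time they were shown to the worker.

    Returns None when no item was presented more than once.
    """
    by_item = defaultdict(list)
    for item, label in answers:
        by_item[item].append(label)
    repeated = [labels for labels in by_item.values() if len(labels) > 1]
    if not repeated:
        return None
    return sum(len(set(labels)) == 1 for labels in repeated) / len(repeated)


# Hypothetical session: g1 and g2 are gold questions, u1 and u2 are shown twice.
session = [("u1", "sense1"), ("g1", "sense2"), ("u1", "sense1"),
           ("g2", "sense1"), ("u2", "sense2"), ("u2", "sense1")]
gold_items = {"g1": "sense2", "g2": "sense2"}
```

Both functions return descriptive scores only. Whether a given score justifies discarding annotations, let alone withholding payment, is the ethical question raised in this section and is deliberately left outside the code.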
More often than not, the agreement score is measured to detect workers who systematically deviate from the others. Dismissing a worker who deviates too often, i.e. basing one’s decision on the ‘wisdom of crowds’ concept, however tempting, is quite unfair: the majority may be wrong while the individual may be right. Even though the often-cited studies by Snow et al. (2008) show that an NLP system trained on the annotations of several naive annotators obtains better results in several semantic tasks than a system trained on those of a single expert, the findings of Murray and Green (2004), who show that inter-rater agreement is correlated with a homogeneous – and not a high – level of competence among annotators, should not be overlooked. Adding the annotations produced by an expert (which are supposed to be quality ones) to those produced by a group of naive people causes the agreement to drop (although one can imagine that this does increase the overall quality of the annotations). So, if, for some reason, it suddenly occurred to a professional lexicographer to participate in a WSD task via a micro-working platform, he/she would potentially be detected as a spam worker and would thus be denied payment. Obviously, such a worst-case scenario is not meant to question the need to detect fraudulent behaviour, nor the need for procedures designed to reach that goal. Rather, it aims to make the case, first and foremost, for the necessary human supervision of decision algorithms.
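The worst-case scenario above can be reproduced with a small sketch. It assumes a simple majority-vote baseline and a hypothetical deviation threshold of 0.5; the worker identifiers, labels and function names are invented for illustration. The point of the design is that a high deviation rate only places a worker in a queue for human review and never triggers an automatic refusal of payment.

```python
from collections import Counter, defaultdict


def deviation_rates(annotations):
    """Share of items on which each worker departs from the majority label.

    annotations: dict mapping item_id to a dict {worker_id: label}.
    Items without a clear majority (a tie for first place) are skipped.
    """
    differing = defaultdict(int)
    seen = defaultdict(int)
    for labels in annotations.values():
        ranked = Counter(labels.values()).most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            continue  # tie: no majority to deviate from
        majority = ranked[0][0]
        for worker, label in labels.items():
            seen[worker] += 1
            differing[worker] += label != majority
    return {worker: differing[worker] / seen[worker] for worker in seen}


def review_queue(annotations, threshold=0.5):
    """Workers whose deviation rate exceeds the (assumed) threshold.

    The result is a list for a human supervisor to examine, not a blacklist.
    """
    rates = deviation_rates(annotations)
    return sorted(w for w, rate in rates.items() if rate > threshold)


# Hypothetical data: three naive workers and one lexicographer ("expert")
# who spots a finer sense distinction on two of the three occurrences.
toy = {
    "occ1": {"w1": "coarse", "w2": "coarse", "w3": "coarse", "expert": "fine"},
    "occ2": {"w1": "coarse", "w2": "coarse", "w3": "coarse", "expert": "fine"},
    "occ3": {"w1": "coarse", "w2": "coarse", "w3": "coarse", "expert": "coarse"},
}
flagged = review_queue(toy)
```

With this toy data the only worker flagged is the lexicographer, whose deviation from the majority may well reflect better judgement rather than fraud.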
9 Conclusion
In Chapter 7, Adam Kilgarriff writes: ‘Quite what the role of lexicographer will be, in ten years’ time, is far from clear, but I am confident that the role of the corpus will grow, with the
line between dictionary and corpus blurring, and the lexicographer operating at the interface.’ The year 2003 was the year Wiktionary was launched (three years after Urban Dictionary) and coincides with the emergence of crowdsourcing. What has happened since then in lexicography regarding the corpus/dictionary continuum and the changing role of the lexicographer has proved him right. Another significant change in the lexicographic process is definitely crowdsourcing and collaborative publishing, which Rundell (2017) sees as an opportunity rather than a threat for professional lexicography. He clearly states that it would be ‘foolish to ignore [their] potential’: with the right guidance, amateurs can make significant contributions to the design of dictionaries. In his vision of lexicography (which is compatible with Kilgarriff’s), the dictionary-making process can be thought of as a division of labour between three participants: lexicographers, machines and volunteer amateurs. Since they each have different assets, the challenge is to find the most efficient configuration for each task to be performed. Some of the configurations that have already been tried out have been listed in this chapter. In the context of a corpus lexicography project, micro-working may be used either together with NLP tools, or instead of them. In the context of what could be named ‘field lexicography’, collaborative and/or crowdsourced platforms allow online contributors, considered as informants, to participate in lexical acquisition tasks, or to perform more complex tasks such as the writing of definitions. The latter approach makes it possible to compile dictionaries that would not otherwise have been created, for languages with few or no corpora and tools.
Finally, open dictionaries provide a wide range of additional knowledge overlooked by traditional dictionaries (linguistic knowledge such as regional variations or encyclopaedic knowledge related to specialized fields or subcultures), which allows them to increase their coverage and their receptiveness to lexical innovations. In addition to the necessary optimization of the distribution of the tasks among computer systems, naive people and lexicographers, as described by Rundell, the desired efficiency probably also depends on the mutual satisfaction of volunteer workers, publishing houses and dictionary users. Whether crowdsourcing and collaborative knowledge production are about to become the next ‘revolution’ in lexicography is hard to tell at this point. In the near future, publishers may see it as yet another way to cut production costs. May such renewed processes also leave room for further innovation by lexicographers and allow users to gain access to ever-improving dictionaries.
Notes
1 This type of parallel is refuted by Brabham (2013: 9–10) on the grounds that ‘crowdsourcing is not old […] it is a new phenomenon that relies on the technology of the Internet.’ His only argument to justify his viewpoint is the fact that the Internet ‘make[s] crowdsourcing qualitatively different from the open problem-solving and collaborative production processes of yesteryear’.
2 The task-based evaluation of the impact of data on the performance of a system is beyond the scope of this chapter, as it only indirectly – and not intrinsically – evaluates the quality of input data.
3 The theoretical underpinnings and methodological issues raised by these measures are beyond the scope of this chapter. See, for example, Artstein and Poesio (2008) for more details.
References
Aroyo, L. and C. Welty (2013), ‘Harnessing disagreement in crowdsourcing a relation extraction gold standard’, Technical Report RC25371 (WAT1304-058), IBM Research 4.
Artstein, R. and M. Poesio (2008), ‘Inter-coder agreement for computational linguistics’, Computational Linguistics, 34 (4), 555–96.
Bederson, B.B. and A.J. Quinn (2011), ‘Web workers unite! Addressing challenges of online laborers’, Proceedings of the International Conference on Human Factors in Computing Systems, Vancouver, 97–106.
Benjamin, M. (2015), ‘Crowdsourcing microdata for cost-effective and reliable lexicography’, Proceedings of the 9th ASIALEX Conference, Hong Kong.
Benjamin, M. (2016), ‘Lexicography without lexicographers: Crowdsourcing and the compilation of a multilingual dictionary’. Available at https://www.elexicography.eu/wp-content/uploads/2016/03/Benjamin_lexicography-without-lexicographers.pdf.
Berg, D., G. Gönnet and F. Tompa (1988), ‘The New Oxford English Dictionary Project at the University of Waterloo’. Technical Report OED-88-01, Centre for the New Oxford English Dictionary, University of Waterloo.
Brabham, D.C. (2013), Crowdsourcing, Cambridge, MA: MIT Press.
Chamberlain, J., U. Kruschwitz and M. Poesio (2009), ‘Constructing an anaphorically annotated corpus with non-experts: Assessing the quality of collaborative annotations’, Proceedings of the ACL-IJCNLP Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources, Singapore, 57–62.
Chklovski, T. and R. Mihalcea (2003), ‘Exploiting agreement and disagreement of human annotators for Word Sense Disambiguation’, Proceedings of RANLP 2003, Borovets.
Čibej, J., D. Fišer and I. Kosem (2015), ‘The role of crowdsourcing in lexicography’, Proceedings of eLex 2015, Herstmonceux, 70–83.
Cook, P., J.H. Lau, M. Rundell, D. McCarthy and T. Baldwin (2013), ‘A lexicographic appraisal of an automatic approach for detecting new word-senses’, Proceedings of eLex 2013, Tallinn, 49–65.
Damaso, J. (2005), The new Populist Dictionary: A Computer-mediated, Ethnographic Case Study of an Online, Collaboratively-authored English Slang Dictionary, MA dissertation, Queen Mary, University of London.
De Schryver, G.-M. and D.J. Prinsloo (2000), ‘Dictionary-making process with ‘simultaneous feedback’ from the target users to the compilers’, Proceedings EURALEX 2000, Stuttgart, 197–209.
Estellés-Arolas, E., R. Navarro-Giner and F. González-Ladrón-de Guevara (2015), ‘Crowdsourcing fundamentals: Definition and typology’ in F.J. Garrigos-Simon, I. Gil-Pechuán and S. Estelles-Miguel (eds), Advances in Crowdsourcing, New York: Springer International Publishing, 33–48.
Fišer, D. and N. Ljubešić (2018), ‘Distributional modelling for semantic shift detection’, International Journal of Lexicography, 32 (2), 163–83.
Gulordava, K. and M. Baroni (2011), ‘A distributional similarity approach to the detection of semantic change in the Google books ngram corpus’, Proceedings of the GEMS 2011 Workshop, Edinburgh, 67–71.
Hamilton, W.L., J. Leskovec and D. Jurafsky (2016), ‘Diachronic word embeddings reveal statistical laws of semantic change’, Proceedings of the 54th Annual Meeting of the ACL, Berlin, 1489–501.
Hanks, P. (2012), ‘Corpus evidence and electronic lexicography’ in S. Granger and M. Paquot (eds), Electronic Lexicography, Oxford: Oxford University Press, 57–82.
Harrison, K.D., B.D. Lillehaugen, J. Fahringer and F.H. Lopez (2019), ‘Zapotec language activism and talking dictionaries’, Proceedings of eLex 2019, Sintra, 31–50.
Howe, J. (2006), ‘The rise of crowdsourcing’, Wired, 14.06.
Jurgens, D. and R. Navigli (2014), ‘It’s all fun and games until someone annotates: Video games with a purpose for linguistic annotation’, Transactions of the ACL 2, 449–64.
Kilgarriff, A. (1997), ‘I don’t believe in word senses’, Computers and the Humanities, 31 (2), 91–113.
Kilgarriff, A. (2005), ‘If dictionaries are free, who will buy them?’, Kernerman Dictionary News, 13, 17–19.
Kosem, I., P. Gantar and S. Krek (2013), ‘Automation of lexicographic work: An opportunity for both lexicographers and crowdsourcing’, Proceedings of eLex 2013, Tallinn, 32–48.
Kosem, I., S. Krek, P. Gantar, Š.A. Holdt, J. Čibej and C. Laskowski (2018), ‘Collocations Dictionary of Modern Slovene’, Proceedings of the 18th EURALEX Congress, Ljubljana, 989–97.
Lau, J.H., P. Cook, D. McCarthy, D. Newman and T. Baldwin (2012), ‘Word sense induction for novel sense detection’, Proceedings of the 13th EACL Conference, Avignon, 591–601.
Manandhar, S., I.P. Klapaftis, D. Dligach and S.S. Pradhan (2010), ‘SemEval-2010 Task 14: Word sense induction & disambiguation’, Proceedings of the 5th Workshop on Semantic Evaluation, Los Angeles, 63–8.
Meyer, C.M. and I. Gurevych (2012), ‘Wiktionary: A new rival for expert-built lexicons? Exploring the possibilities of collaborative lexicography’ in S. Granger and M. Paquot (eds), Electronic Lexicography, Oxford: Oxford University Press, 259–91.
Murray, G.C. and R. Green (2004), ‘Lexical knowledge and human disagreement on a WSD task’, Computer Speech & Language, 18 (3), 209–22.
Nagao, M., J. Tsujii, Y. Ueda and M. Takiyama (1980), ‘An attempt to computerize dictionary data bases’, Proceedings of COLING 1980, Tokyo, 534–42.
Nesi, H. (2008), ‘Dictionaries in electronic form’ in A.P. Cowie (ed.), The Oxford History of English Lexicography, Oxford: Oxford University Press, 458–78.
Poesio, M., J. Chamberlain, U. Kruschwitz, L. Robaldo and D. Luca (2015), ‘Phrase detectives: Utilizing collective intelligence for internet-scale language resource creation’, Proceedings of IJCAI 2015, Buenos Aires, 4202–6.
Rumshisky, A. (2011), ‘Crowdsourcing word sense definition’, Proceedings of the Fifth Linguistic Annotation Workshop, Portland, 74–81.
Rumshisky, A., N. Botchan, S. Kushkuley and J. Pustejovsky (2012), ‘Word sense inventories by nonexperts’, Proceedings of the 8th LREC Conference, Istanbul, 4055–9.
Rundell, M. (2014), ‘Macmillan English Dictionary: The end of print?’, Slovenščina 2.0, 2 (2), 1–14.
Rundell, M. (2017), ‘Dictionaries and crowdsourcing, wikis, and user-generated content’ in P. Hanks and G.-M. de Schryver (eds), International Handbook of Modern Lexis and Lexicography, Berlin, Heidelberg: Springer.
Rundell, M. and A. Kilgarriff (2011), ‘Automating the creation of dictionaries: Where will it all end?’ in F. Meunier, S. De Cock, G. Gilquin and M. Paquot (eds), A Taste for Corpora. In honour of Sylviane Granger, Amsterdam: John Benjamins, 257–82.
Rundell, M. and P. Stock (1992), ‘The corpus revolution’, English Today, 30, 9–14.
Sajous, F. and N. Hathout (2015), ‘GLAWI, a free XML-encoded Machine-Readable Dictionary built from the French Wiktionary’, Proceedings of eLex 2015, Herstmonceux, 405–6.
Sajous, F., N. Hathout and A. Josselin-Leray (2019), ‘Du vin et devin dans le Wiktionnaire: neutralité de point de vue ou neutralité et point de vue ?’, Études de linguistique appliquée, 194 (2), 147–64.
Sajous, F., A. Josselin-Leray and N. Hathout (2018), ‘The complementarity of crowdsourced dictionaries and professional dictionaries viewed through the filter of neology’, Lexis 12.
Sajous, F., A. Josselin-Leray and N. Hathout (2020), ‘Les domaines de spécialité dans les dictionnaires généraux: le lexique de l’informatique analysé par les foules et par les professionnels … de la lexicographie’, Neologica 14, 83–107.
Snow, R., B. O’Connor, D. Jurafsky and A.Y. Ng (2008), ‘Cheap and fast – but is it good?: Evaluating non-expert annotations for natural language tasks’, Proceedings of the 8th EMNLP Conference, Morristown, 254–63.
Trap-Jensen, L. (2018), ‘Lexicography between NLP and linguistics: Aspects of theory and practice’, Proceedings of EURALEX 2018, Ljubljana, 25–37.
PART III New directions in lexicography
21
Theoretical, technological and financial challenges: Some reflections for making online dictionaries
Pedro A. Fuertes-Olivera
1 Introduction: Background and definitions
The present chapter reflects on the future of dictionaries and analyses some of the challenges lexicography faces in connection with the expansion of the internet. This has resulted in the frequent use of the term e-lexicography with various meanings. For instance, discussions concerned with the term printed online dictionaries indicate that e-lexicography is even used when discussing printed dictionaries that have simply been uploaded onto the internet. I define the term e-lexicography as the branch of lexicography concerned with the theory and practice of online dictionaries, i.e. dictionaries, encyclopaedias, lexica, glossaries, vocabularies, terminological knowledge bases and other information tools covering areas of knowledge and their corresponding language that can be accessed through the internet. Such a definition indicates that the opposition between online dictionaries and paper dictionaries should not be regarded as absolute, since the transition from paper to online is a process controlled by theoretical, technological and financial considerations. These imply the existence of challenges that will be the object of study of this chapter. The writing of this chapter requires several adjustments in order to comply with the above definition of the term e-lexicography. The first adjustment consists in using the term online dictionary to refer to any type of information tool, be it a dictionary, knowledge database, lexicon, glossary, etc., that contains (or aims to contain) collections of structured data and is accessed through the internet. The second adjustment consists in broadening discussions on the challenges the internet poses for dictionary making. These must go beyond a kind of microlexicographic analysis, i.e.
an analysis that is restricted to traditional lexicographic topics such as users’ needs, language issues, etymological considerations, the structure of the article and so on. Instead, they must face the challenges in a more global and abstract way, thus imitating what has happened in other spheres of human life. For instance, discussions on e-commerce were initially concerned with technical aspects for making payment secure. They are now centred on the changes this new concept has imposed on human behaviour, as we can now acquire goods and services from faraway locations, which has modified the concept of customer and his or her relationship with retailers.
Both adjustments indicate that the internet is a technology that represents more than a new lexicographic medium within the field of e-lexicography. To the best of my knowledge, most discussions on e-lexicography have usually been limited to issues connected with a range of natural language processing approaches and tools that are available to augment accessibility and usefulness. My view is that these approaches are short-sighted as they do not pay attention to the relationship between the technology (the internet), humans (the dictionary users and the dictionary makers), lexicographic developments and economic costs (any human activity has benefits and costs). The analysis of this interrelationship is the main aim of this chapter, structured into three more sections and a conclusion. Section 2 focuses on theoretical challenges, which are connected with the theory or theories used for conceptualizing and making online dictionaries. Section 3 deals with technology, especially with platforms and ‘gadgets’ that are needed for making online dictionaries. Section 4 analyses resources, especially financial ones, and offers a way forward for the future of lexicography. For the sake of simplicity, the implication of these challenges will be illustrated with reference to a lexicographic project under construction (Fuertes-Olivera et al., in press). This project, which is provisionally named Diccionarios Valladolid-UVa, may lead to different lexicographic products and services, thus opening a road which might be trodden in future projects (Fuertes-Olivera and Tarp, in press). To sum up, this chapter broadens the concept of lexicographic challenges the internet poses for lexicography by incorporating the whole human factor into the discussion. The human factor is here illustrated in terms of the distinction between contemplative and transformative lexicography, i.e. 
contemplative lexicography being the practice of analysing existing dictionaries and questioning users about their use of them, and transformative lexicography being analysis concerned with market exploitation, since we must accept that lexicographers need to seek out business models that allow them to make dictionaries profitable. Within this debate, this chapter adopts a transformative stance and makes a fundamental claim: the development of online dictionaries is dependent on finding business models with which lexicographers can obtain funds for creating, making and updating online information tools (Simonsen 2017; Fuertes-Olivera 2019; Fuertes-Olivera and Tarp in press). The above stance can be put into practice by taking costs on board and making sound use of available devices, but this must always be preceded by the lexicographer’s analysis that aims to characterize and typologize dictionaries in order to establish a basis upon which the corresponding lexicographic solution(s) can be found and developed. This idea accords with the main tenets of the function theory of lexicography, which is the theoretical base upon which this chapter studies the continuous challenge of applying new technology to dictionary making (Fuertes-Olivera and Tarp 2014, in press; Fuertes-Olivera 2016, 2018).
2 Theoretical challenges
Fuertes-Olivera (2016) claims that there are three main approaches currently being used for designing and compiling newly conceived specialized online dictionaries. One of these, which basically operates under the tenets of Natural Language Processing and Artificial Intelligence,
Reflections for Making Online Dictionaries
aims to construct mined dictionaries, i.e. ones that are constructed automatically by mining the web. Ye et al. (2012), for instance, give an account of their cross-language association dictionary (CLAD), which is mined from Wikipedia. They maintain that this ‘is different from the traditional bilingual dictionary in its ability to expand the word associations from the semantic perspective. In the mined CLAD, the words associated to a given word are not limited to direct translation in other languages, but also include related words in other languages’ (Ye et al. 2012: 2474–5). The main challenge facing proponents of this approach is solving the problem of linguistic anisomorphism from an operational point of view. Meaning, synonymy, phraseology, etc. are slippery concepts and, to the best of my knowledge, human intervention and detailed study are needed for working with them in lexicographic projects. A solution to this challenge may be achieved by dividing the project into two stages. In the first stage, the web can be mined to extract a prototype of a mined dictionary. In the second stage, human lexicographers must work with the mined dictionary and solve the problems they encounter, especially those connected with linguistic anisomorphism.
The second approach aims at designing and constructing online dictionaries under the tenets of linguistic theories and knowledge engineering, a buzzword that seems to refer to the set of activities mainly concerned with building, maintaining and developing knowledge-based management systems, usually through (semi-)automated methods (see Clark et al. 2012: 555–8 for a review). This line of work is attracting a lot of interest and is being publicly financed, e.g. the European Union is financing ELEXIS, a project that aims to develop an infrastructure with five main objectives:
1. foster cooperation and knowledge exchange between different research communities in lexicography in order to bridge the gap between lesser-resourced languages and those with advanced e-lexicographic experience;
2. establish common standards and solutions for the development of lexicographic resources;
3. develop strategies, tools and standards for extracting, structuring and linking of lexicographic resources;
4. enable access to standards, methods, lexicographic data and tools for scientific communities, industries and other stakeholders;
5. promote an open access culture in lexicography, in line with the European Commission Recommendation on access to and preservation of scientific information.
(European Lexicographic Infrastructure: https://cordis.europa.eu/project/id/731015)
The main challenge of this approach is operational, i.e. lexicographers using this approach must convert their ideas into real open access dictionaries, especially because they are using public funds. The existence of replicated online dictionaries illustrates the concept of ‘operational challenge’ I am using in this chapter. Replicated online dictionaries constitute a very heterogeneous category made up of dictionary projects that reproduce (or aim to reproduce) lexicographic practices and methods taken from other dictionaries without questioning their adequacy for their specific lexicographic project. This means that whenever a new online dictionary project is initiated, its compilers must start by analysing whether methods, practices and concepts used in other online dictionaries will
also serve them. For the sake of simplicity and for space reasons, I will show the working of this challenge in the context of making specialized online dictionaries, i.e. dictionaries covering areas outside general cultural knowledge and general language (Fuertes-Olivera and Tarp 2014) that are currently presented as prototypes, i.e. dictionary projects that are still on the drawing board. The number of online specialized dictionary prototypes described in the literature is vast and covers a large number of specialized fields. They go from prototypes for assisting in the production of scientific articles (Alonso et al. 2011) to prototypes for assisting in both communicative and cognitive use situations in a specific domain (Fernández and Faber 2011). The former are usually identified as dictionary prototypes, whereas the latter tend to adopt fancier names such as data banks or terminological databases. As they have not resulted in operational online dictionaries yet, we cannot offer a detailed analysis of their merits. However, a critical review of the publications that explain the lexicographic concepts underlying these project types indicates that they are assuming lexicographic methods and concepts that do not agree with the nature of the data to be included, which makes their compilers adopt decisions that are not bound to result in making real specialized online dictionaries. These prototypes assume that they can treat terms as if they were words, and thus claim that the compilation of specialized online dictionaries can be accomplished under the tenets of a particular linguistic theory, e.g. cognitive linguistics, that the terms they need for making dictionaries can be spotted in specialized corpora, usually by performing keyness analyses, and that their meanings, uses and usages can be identified through a detailed analysis of the concordances retrieved with software such as WordSmith Tools (Scott 2020). 
Furthermore, relying on corpus data, especially on automatic terminological extraction, for selecting terms (at least many of them) as well as their synonyms and antonyms, writing definitions of terms, and preparing usage notes, especially in culture-bound subject fields, is almost impossible at the current state of knowledge and innovation (see Fuertes-Olivera 2012 for a review of the role of corpora in specialized lexicography). For instance, lower-case ‘a’ is used in Spanish accounting texts to indicate that the accompanying account is booked (i.e. recognized) on the credit side in a system of double-entry bookkeeping. This meaning could not be discovered by performing a concordance analysis of ‘a’ within a reasonable time span. Similarly, the lemmas included as example (1), which are a selection of Spanish International Accounting Standard (IAS)/International Financial Reporting Standard (IFRS) terms, cannot be extracted from a corpus as this concept is defined in Corpus Linguistics. Nor could a linguistically conceived terminological corpus have been used for selecting the 3,000-odd accounting terms of four or more orthographic words included in Spanish accounting terminology, or for explaining the meaning differences observed in more than 2,000 accounting terms with contiguous meanings, some of which are very specialized, whereas others are popularized with or without idiosyncratic culture-bound meanings (example 2).
Example 1. 7-plus-orthographic words lemmatized in the Accounting Dictionaries (Fuertes-Olivera et al. 2012)
plan de opciones sobre acciones para empleados
transacción con pagos basados en acciones liquidada en efectivo
formato de la cuenta de pérdidas y ganancias
costes de transacción atribuibles a un activo o pasivo financiero
patrimonio neto atribuible a los tenedores de instrumentos de patrimonio neto de la dominante
Example 2. Meanings of reembolso in the Accounting Dictionaries (Fuertes-Olivera et al. 2012)
reembolso (three meanings):
1. A reembolso is the regular repayment of instalments and interest on a loan (popularized culture-independent meaning; English amortization).
2. A reembolso is the redemption of a debt or obligation (specialized meaning; English repayment).
3. A reembolso is payment received or to be received as compensation for expenditure (popularized culture-bound meaning; English reimbursement).
The third approach claims that the design and construction of online dictionaries is a cooperative task in which lexicographers, experts, e.g. IT experts, and (private) companies participate. This approach espouses the view that the design and construction of online dictionaries is a lexicographic task, which both influences and is influenced by innovative and profit-oriented activities. In other words, the design and making of online dictionaries needs to reconcile the nature of lexicography as a reference science with the costs and benefits associated with innovation (i.e. technology, Section 3) and business activities (Section 4). Under this approach, online dictionaries are tools that are the product of applying a lexicographic theory to dictionary-making. Fuertes-Olivera and Tarp (2014), for instance, claim that a lexicographic theory for making dictionaries is more than an ontological requirement in the era of the internet: it is a necessity for understanding the continuing challenge of applying new technology to this process. This necessity is explained as an intellectual and social challenge as it is mostly concerned with locating the options the internet offers within a theoretical framework that accepts the existence of an object of study (an information tool), which is rooted in the form of concepts, categories, theories and assumptions, has a (proto-) history, contains independent methodological contributions and offers directions for practical actions (Tarp 2012; see also Piotrowski, Chapter 16).
At an abstract level, the theoretical framework envisaged defends the position that the very essence of lexicography is its capacity to provide quick and easy access to dictionary data from which information needed by different types of users in different types of social situations can be retrieved. During the last decade, the number of publications based on the above-mentioned relationship among data, access routes and users’ needs has grown considerably. For some of these publications, e.g. Fuertes-Olivera and Bergenholtz (2011), this relationship must be understood as a change of paradigm in lexicography (Leroyer 2011), which is based on two related assumptions, i.e. on two related intellectual and social challenges in the context of this chapter.
The first assumption is that lexicography is an independent science within which one or more theories are possible. For example, the function theory of lexicography advocates that the core of lexicography is the design of utility tools that can be accessed and consulted easily with a view to meeting immediate information needs occurring for specific types of users in specific types of extra-lexicographic situations (Fuertes-Olivera and Tarp 2014). The second assumption is that lexicography is not isolated from the rest of the world, i.e. lexicography has relationships with many disciplines, this relationship being determined by the degree of knowledge needed for making a specific dictionary. The translation of both ideas to dictionary making in the era of the internet needs an understanding of challenges associated with technology (Section 3) and economic know-how (Section 4).
3 Technology
Wikipedia defines technology as ‘the sum of techniques, skills, methods, and processes used in the production of goods and services or in the accomplishment of objectives, such as scientific investigation’. Following this broad definition, this section analyses the technological challenges connected with the design and making of online dictionaries which aim to offer dynamic articles with dynamic data, i.e. articles and data that can be adapted to the specific types of information needs which specific types of users performing specific types of lexicographically relevant activities might have in any consultation situation. The first challenge deals with the selection of the Dictionary Writing System (DWS), i.e. the software for writing and storing lexicographic data with the basic aim of allowing lexicographers and publishers to produce and manage reference tools (Abel 2012). Kilgarriff (2006: 7) claims that it basically consists of an editor, a database, a Web interface, various management tools and a kind of dictionary grammar which specifies the structure of the dictionary. Although there are off-the-shelf DWSs (i.e. commercially available systems), this chapter defends the design and creation of an in-house system, i.e. a system specifically designed and created for a particular dictionary project. Fuertes-Olivera (2019), for example, describes the main characteristics and challenges associated with the design of the DWS which has been created for constructing the Diccionarios Valladolid-UVa (Fuertes-Olivera et al., in press), a dictionary project based theoretically on the function theory of lexicography (see Section 2, above) and financially on market-oriented policies (Section 4).
For reasons of space and readership, I will only refer to some of the lexicographic implications underlying this in-house DWS, leaving aside more technical aspects such as the mark-up language used and so on (see Abel 2012 for a review of the main characteristics of dictionary writing systems). The first implication is that this DWS consists of an editing tool, a database and a set of administrative tools. The editing tool is prepared for creating and editing dictionary entries and contains up to sixty-one different slots for describing the meaning and use of each lemma. All the slots are at the same time independent and integrated. As will be shown in Section 4, this is a defining factor of the Diccionarios Valladolid-UVa project. It allows lexicographers to use the data stored in each slot in different ways, it being possible to select them for making
different types of online dictionaries or different information tools, i.e. for making use of dynamic dictionary articles (see Section 4, below). Some of the slots display drop-down menus of predefined categories, e.g. ‘formal’, ‘informal’, ‘neutral’, whereas others allow lexicographers to enter their text in a creative way, i.e. without using predefined categories. For instance, the drop-down menu ‘inflections’ includes a list of articles and gender and number inflections that describe the gender and number of the Spanish and English lemmas, the languages described in the Diccionarios Valladolid-UVa. Drop-down menus are very useful because they speed up data entry and eliminate the possibility of making spelling mistakes. Hence, a technological challenge connects the dictionary project with the design of the DWS: it must be decided which drop-down menus can be created, how they can be implemented, and what options exist for adding more menus, e.g. for including ‘diamesic variation’, i.e. variation in language across medium of communication, ‘diastratic variation’, i.e. variation in language across social groups, and so on. The ‘creative slots’ are divided into two main categories. Slots in the first category have no space constraints, which means that lexicographers can write in them as much as they wish; the slots for ‘definitions’ and ‘examples’ are of this kind. Slots in the second category have space constraints, i.e. the length of the text written in them is limited, as in the slot for ‘grammar notes’. Space is very valuable, both economically and operationally, and it is therefore a challenge to decide which of the slots should be subjected to space constraints and which should not. Although the editing tool is divided into several parts, one for each component of the different languages studied and stored, it is an integrated tool. This means that it allows lexicographers to move data among the different parts easily.
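The slot design described above can be illustrated with a minimal sketch. It assumes a dictionary entry held as a set of named slots, one menu slot restricted to predefined categories and one free-text slot with a length limit. The class name, the slot names and the 200-character limit are hypothetical and are not taken from the DWS of the Diccionarios Valladolid-UVa.

```python
from dataclasses import dataclass, field

# Predefined categories named in the text; the real menus are not documented here.
REGISTER_OPTIONS = ("formal", "informal", "neutral")
# Assumed limit, for illustration only.
GRAMMAR_NOTE_MAX_CHARS = 200


@dataclass
class Entry:
    """A hypothetical entry whose data live in independent, named slots."""
    lemma: str
    language: str
    slots: dict = field(default_factory=dict)

    def set_register(self, value: str) -> None:
        # Menu slot: only predefined categories are accepted,
        # which rules out spelling mistakes at data entry.
        if value not in REGISTER_OPTIONS:
            raise ValueError(f"unknown register: {value!r}")
        self.slots["register"] = value

    def set_definition(self, text: str) -> None:
        # Free-text slot without a space constraint.
        self.slots["definition"] = text

    def set_grammar_note(self, text: str) -> None:
        # Free-text slot with a space constraint.
        if len(text) > GRAMMAR_NOTE_MAX_CHARS:
            raise ValueError("grammar note exceeds the space limit")
        self.slots["grammar_note"] = text

    def select(self, *names: str) -> dict:
        # Because slots are independent, any subset can be drawn
        # for a particular dictionary or information tool.
        return {name: self.slots[name] for name in names if name in self.slots}


entry = Entry("titular", "es")
entry.set_register("neutral")
entry.set_definition("Heading of a news item.")  # invented sample text
print(entry.select("definition"))
```

Keeping each slot separately addressable is what would let the same stored data feed several different dictionaries, which is the point the text makes about dynamic articles.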
Designing an integrated editing tool is therefore another challenge, as such a tool facilitates work, saves time and money and makes the lexicographic work more uniform. For instance, the data describing Spanish lemmas can be used for creating a monolingual dictionary or a bilingual one. Moreover, an integrated editing tool allows the editor of the project to decide which of the monolingual lexicographic data can be moved to the bilingual dictionary and how this can be done while guaranteeing that the data are adequate for the new dictionary. The option used in the DWS of the Diccionarios Valladolid-UVa is to restrict this possibility to the editor of the project (the only person who can move data) and to equip this functionality with a function button which changes colour depending on the ‘status’ of the word. For instance, the Spanish equivalents of the English lemmas headline and elect are titular and elegido. Figures 21.1 and 21.2 show titular in green and elegido in orange. This means that the Spanish equivalent titular was moved to the Spanish part as a lemma whereas elegido was not (elegido is usually recognized as the past participle of the verb elegir, which is the lemma in the Spanish part of the DWS). In other words, the ‘green button’ indicates that the equivalent is also a lemma in the integrated DWS whereas the ‘orange button’ shows that this is only an equivalent in the DWS.
Figure 21.1 Green colour for the equivalent in the DWS of the Diccionarios Valladolid-UVa.
Figure 21.2 Orange colour for the equivalent in the DWS of the Diccionarios Valladolid-UVa.
Such a system is an easy solution to an important challenge, which is to decide which ‘equivalent’ can also be a ‘lemma’ in the dictionary project and which specific data types are associated with the movement. For instance, moving the equivalent titular to lemma status does not imply the movement of the definition, as we have observed that definitions cannot be automatically copied; instead they must be studied separately and adapted to the specificities of the language, e.g. to the existence of language anisomorphism. The database is a storage system, which is used for storing the text entered and edited in the editing tool. In the DWS of the Diccionarios Valladolid-UVa, the database allows lexicographers to run different searches, most of them aiming at answering research questions, as we are interested not only in designing and compiling commercial dictionaries but also in explaining the decisions taken. The query language used by this DWS can be used for filtering the text with the aim of, say, finding relevant lexicographic questions. For instance, Fuertes-Olivera (2019) shows that lexicographers can search for all the Spanish lemmas ending in ‘-se’, which are typically reflexive verbs. In an English–Spanish/Spanish–English dictionary, these verbs merit special attention as their potential equivalents may demand very different solutions. For instance, Spanish expressions such as se estropeó el coche (Eng: the car broke down), se venden pisos (Eng: flats are sold) and se me estropearon los zapatos con la lluvia (Eng: the rain ruined my shoes) are all structurally similar but their English translations are very different. The car broke down is an intransitive sentence, the rain ruined my shoes is a transitive sentence and flats are sold is a passive sentence.
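The two checks just described, finding lemmas that end in ‘-se’ and deciding whether an equivalent is also a lemma, can be sketched as follows. The chapter does not specify the query language of the DWS, so this sketch uses an in-memory SQLite database as a stand-in; the table names, column names and sample rows are assumptions made for illustration.

```python
import sqlite3

# Stand-in database: the schema and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lemma (form TEXT, language TEXT)")
conn.execute("CREATE TABLE equivalent (source TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO lemma VALUES (?, ?)",
    [
        ("estropearse", "es"), ("venderse", "es"), ("elegir", "es"),
        ("titular", "es"), ("headline", "en"), ("elect", "en"),
    ],
)
conn.executemany(
    "INSERT INTO equivalent VALUES (?, ?)",
    [("headline", "titular"), ("elect", "elegido")],
)


def lemmas_ending_in(suffix: str, language: str) -> list:
    """Return the lemmas of one language that end in the given suffix."""
    rows = conn.execute(
        "SELECT form FROM lemma WHERE language = ? AND form LIKE ? ORDER BY form",
        (language, "%" + suffix),
    )
    return [form for (form,) in rows]


def equivalent_status(target: str) -> str:
    """Return 'green' if the Spanish equivalent is also a lemma, else 'orange'."""
    row = conn.execute(
        "SELECT 1 FROM lemma WHERE form = ? AND language = 'es'", (target,)
    ).fetchone()
    return "green" if row else "orange"


print(lemmas_ending_in("se", "es"))
print(equivalent_status("titular"), equivalent_status("elegido"))
```

A list produced this way could serve as the starting point for the kind of ‘lexicographic guide’ the text mentions, since every reflexive verb would be reviewed against the same checklist.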
To sum up, using a query language to search for specific situations facilitates the lexicographic work, as we can easily prepare a kind of ‘lexicographic guide’ for achieving systematicity and saving time. Although the database contains as much lexicographic data as possible, the DWS is equipped with technologies for retrieving as little as necessary, in line with the philosophy that less is more. This philosophy can be put into practice thanks to two main decisions, one technical and one lexicographical. The technical decision is common practice in today’s lexicographic work. This database is based ‘on a server-client architecture: lexicographers work on computers that are connected to a server where all changes are stored centrally’ (Abel 2012: 93). This allows lexicographers to work via the Internet from different locations. The lexicographic decision is our commitment to writing dynamic online dictionaries, i.e. users have at their disposal predefined options, selected by the editor of the project and offered to users thanks to a combination of technological and lexicographical know-how. This know-how is manifested in decisions concerned with the search systems and the interface of the dictionaries. In sum, one key technological challenge is to work with a DWS that is well-suited for a particular dictionary project, e.g. for storing dynamic lexicographic data.
Dynamic lexicographic data, i.e. data that can be accessed in different forms, usages, situations and so on, is associated with technologies that allow users to recreate and re-represent their own dictionary data. Bothma (2011: 71), for example, discusses some information technologies that make visible the relationship between the technology already available and data presentation, expected costs and the satisfaction of users’ needs. He, however, approaches this option critically when he assumes that the use of existing (and for the same reason future) ‘technologies should not be simply because the technologies exist; they should only be adopted if they bring a higher level of efficiency to the dictionary and enhance the user experience with the e-dictionary, that is, it allows the user to satisfy his/her information needs more effectively and more efficiently’. Bothma’s review of current information technologies indicates that some of them are already incorporated in online dictionaries, the ones identified as ‘Model T Fords’ (Tarp 2011: 60–1), i.e. online dictionaries that have gone beyond traditional boundaries and are making use of existing technologies in order to provide quicker data access, and are adapting the dictionary articles to the various functions displayed by the dictionary. His analysis shows that information technologies allow for the personalization of information presented to the user by means of filtering and adaptive technologies based on the user’s profile. Among the information technologies he cites that are being used or considered for use in online dictionaries, this chapter highlights the following, which are either already in use or might be used in the near future, depending on how quickly the intellectual and social challenges described in this chapter are addressed:
● Searching: Searching is the exploration of a defined information space with a defined objective and search strategy; for instance, the function-based/situation search functionalities in the Accounting Dictionaries (Fuertes-Olivera and Niño Amo 2018).
● Navigating: Navigating is ‘the exploration of a defined or undefined information space without using a defined strategy’ (Bothma 2011: 81). Navigation is a very common way of moving between discrete online information entities and many dictionaries allow this functionality, which is usually presented as a kind of table of contents at the beginning of a dictionary article, as in Wikipedia.
● User profiling/modelling: User profiling or modelling is the information technology that retrieves customized data based on a user’s profile that has been constructed through the user supplying the system with specific data, by the system tracking user behaviour, or a combination of both, as observed in smart phones.
● Filtering: Filtering is the information technology that uses filters for allowing users to select the amount and type of dictionary data retrieved. Filtering can be user-controlled or system-controlled, depending on whether the filter is based on the choices indicated by the user or on his or her previous searches. This information technology is currently being used in several dictionary projects, for example, the function-based/use situation-based filter in the Accounting Dictionaries (Fuertes-Olivera and Niño Amo 2018). Spohr (2011: 114) also offers an indication of how this technology can work in his dictionary project, which is described as modular, i.e. it has a multilayer architecture, ‘with the lexicographic data model at the top, followed by the lexicographic data and the access and presentation model at the bottom’. The lexicographic data model contains classes and descriptive instances and the properties and relations used to connect them. The lexicographic data layer contains actual lexical items that are described by means of properties and relations. Finally, the ‘access and presentation layer defines which of the entities (both classes, properties and instances) are relevant to which users in which situations’.
● Adaptive hypermedia: Adaptive hypermedia, which is subdivided into adaptive presentation and adaptive navigation support (Bothma 2011: 88), is the information technology that tailors what the user sees to his or her interests, goals, abilities, etc. Adaptive hypermedia is therefore concerned with the system’s ability to guide users into their specific search objectives. This technology allows the presentation of data tailored to a user’s needs. The use of adaptive hypermedia is done ‘by means of marking up data in the document’ (Bothma 2011: 91), a possibility that seems to be in its inception in some specialized dictionaries. For instance, in the New Palgrave Dictionary of Economics Online, users have an advanced search system that allows them to search in the full text of the dictionary article, the bibliography attached to the different dictionary articles, the article titles, the names of contributors, the abstract of the article and the list of keywords. These options are possible because the system has been previously marked on purpose. Bothma (2011: 91) envisages the complex mark-up of data on the Web as the ideal of the Semantic Web, one of whose objectives may be extremely useful for future online dictionaries: it should ‘allow a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing’ (www.w3.org/2001/sw). Kwary (2012: 35–9) illustrates the use of adaptive hypermedia in an English dictionary project of Finance for Indonesian students. Kwary’s proposal contemplates an online dictionary that has eight features divided into three lines. The first line has three features: a search text-box, an Insert System button and a Search button. The second line contains four features in the forms of tick-boxes, ‘English definition, Definisi B. Indonesia (“Indonesian definition”), Terjemahan (“equivalent” in Indonesian), and CFA Sample Question’. Finally, the last line is the search result box. Kwary claims that the adaptive search system directs the search action to the preferable result without the users having to click a particular tick-box when searching for the meaning of a term. He also adds that the adaptive system embedded in the dictionary suggests entries and works as an incremental search which is integrated into the search text-box.
● Linked open knowledge: Linked open knowledge is an information technology and technique that implies that the knowledge is in the public domain, as defended by the Open Data Movement. It is currently being used in several dictionary projects, typically for cross-referencing users to Wikipedia pages, selected texts and corpus concordances, as shown in the Interactive Language Toolbox developed at KU Leuven (see Verlinde 2011), and the Accounting Dictionaries (Fuertes-Olivera and Niño Amo 2018, Fuertes-Olivera et al. 2012, Nielsen et al. 2012).
● Recommender systems: A recommender system is an information technology and technique that seeks to predict the user’s preference when he or she is searching. Bothma (2011) sees this technology as useful for selecting synonyms and antonyms and claims that a kind-of recommender system is being used in the Ordbogen over faste vendinger (Bergenholtz 2010), and that such a system is very adequate when using a production dictionary, for instance for indicating that the specific word the user is looking at is not adequate in a context of usage and that he or she should use another one, which is also indicated.
● Annotation systems: An annotation system is an information technology and technique for creating information, as the Wiki environment in which Wikipedia is created. In addition, some online dictionaries make use of several systems for sharing/creating information, typically by making queries and posting answers, giving feedback through emails, etc. Bothma (2011) claims that by means of public annotations, the currency and completeness of an e-dictionary could be enhanced.
To sum up, technological challenges must be overcome in order to create customizable environments in which the user will be able to set up his or her profile (and adapt it accordingly). The system will adapt the user’s profile to the changing environment and the user’s specific needs, the data in the database will need to be marked up and linked to external sources, and the system will be able to make recommendations and allow the user to make annotations (Bothma 2011).
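The filtering idea in the list above, where an access and presentation layer decides which data are relevant to which users in which situations, can be sketched in a few lines. This is a minimal illustration and not Spohr’s or Bothma’s implementation; the situation names, slot names and function name are hypothetical.

```python
# Hypothetical presentation rules: which slots each use situation shows by default.
SITUATION_SLOTS = {
    "reception": ["definition"],
    "production": ["definition", "grammar_note", "examples"],
    "translation": ["equivalents", "examples"],
}


def filter_article(entry: dict, situation: str, extra: tuple = ()) -> dict:
    """Return only the slots relevant to a situation, plus any the user adds.

    The situation defaults stand for system-controlled filtering;
    the `extra` argument stands for user-controlled filtering.
    """
    wanted = list(SITUATION_SLOTS.get(situation, [])) + list(extra)
    return {slot: entry[slot] for slot in wanted if slot in entry}


# Invented sample data for one lemma.
article = {
    "definition": "Payment received as compensation for expenditure.",
    "grammar_note": "Masculine noun.",
    "examples": ["El reembolso se abonará en treinta días."],
    "equivalents": ["reimbursement"],
}

print(filter_article(article, "reception"))
print(filter_article(article, "reception", extra=("equivalents",)))
```

The full article stays in the database while each consultation receives a reduced view, which matches the ‘as much as possible, as little as necessary’ principle stated earlier in the section.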
4 Financial challenges
A review of the lexicographic literature reveals that traditional publishers are not investing in the design and making of new online dictionaries. Miller (2018: 361), for example, claims that there ‘has been a tendency for publishers merely to post electronic versions of paper dictionaries online, without considering the advantages and disadvantages of the new medium’ (Fuertes-Olivera and Bergenholtz 2011: 1). Such a claim can be easily demonstrated by analysing the characteristics of, say, learners’ dictionaries of English. Although these may be the most profitable lexicographic products ever made, they are still far from having incorporated the defining characteristics of the internet, e.g. the possibility of accessing dynamic dictionary articles. This may be due to a lack of financial resources; hence my claim that finding profitable models is the most demanding challenge lexicography faces today, as it is easy to conclude that without new dictionaries lexicography will disappear theoretically and practically. Simonsen (2017), Fuertes-Olivera (2019) and Fuertes-Olivera and Tarp (in press) claim that this challenge can be overcome by moving upstream in the value chain to guarantee the future of lexicography, and Fuertes-Olivera and Tarp (in press) add that moving upstream in the lexicographic value chain means above all the integration of dictionaries into high-tech tools, applications, assistants, platforms and services in the broad sense of the word (my emphasis). This implies the adoption of three assumptions. Firstly, lexicographers must accept that their product par excellence is not dictionaries, but lexicographic data that can either be presented to the users in the form of dictionaries or be integrated into various types of tools, platforms and services. Secondly, lexicographic data is, by its very nature, a commodity, i.e. an economic good or service that has full or substantial fungibility.
This means that lexicographic data must not be prepared for a particular dictionary project, but for any envisaged usage. Hence, lexicographers face the challenge of creating lexicographic data in a very open and broad way, e.g. on the assumption that the more lexicographic data is stored in the Dictionary Writing System, and the more detailed its analysis, the better. Finally, the two previous assumptions can be put into practice by working with a technology company which has adequate technological means, financial resources and business models. Since 2014 the lexicographic project Diccionarios Valladolid-UVa (Fuertes-Olivera et al., in press) has been an attempt to put the above assumptions into practice. The project, which may cost up to one million euros and is mostly financed by the Danish company Ordbogen A/S, aims at preparing lexicographic data for describing the meaning and use of around 50,000 frequent English and Spanish words. Each of them can be described with up to 61 specific lexicographic data types, e.g. its definition, grammar, grammar notes and cultural notes. Each data item is stored in a specific slot of the DWS (see Section 3, above) and can be used for different purposes. For instance, since April 2019 the data has been used in Write Assistant, a product designed to assist users with written text production in their mother tongue (e.g. Spanish) or a foreign language (e.g. English). Fuertes-Olivera and Tarp (in press) claim that Write Assistant is a good example of both the new horizons opened by recent technological breakthroughs and the challenges posed to contemporary lexicography, especially that of finding new business models. Write Assistant allows users, e.g. Spanish native speakers, to write Spanish words in an English text. The system will automatically offer the user different English equivalents for the Spanish words. Much of the data offered for selecting the right equivalent comes from the lexicographic data stored in the DWS of the Diccionarios Valladolid-UVa. This means that the lexicographic data is being used in a novel way. We hope this will generate economic returns for the company, part of which can be reinvested in financing the project, which can take further forms: for example, it can be published (and we hope it will be) as various novel online dictionaries, some monolingual, some bilingual, some general, some specialized and so on.
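The reuse of lexicographic data described above can be pictured with a toy sketch: Spanish words embedded in an English text are looked up and their English equivalents are offered. This is not how Write Assistant is implemented, which the chapter does not describe; the function name and the small lookup table are assumptions, with sample equivalents taken from the examples earlier in this chapter.

```python
import re

# Hypothetical extract of Spanish-to-English equivalents, as it might be
# drawn from the equivalent slots of a dictionary database.
EQUIVALENTS = {
    "titular": ["headline"],
    "reembolso": ["amortization", "repayment", "reimbursement"],
}


def suggest_equivalents(text: str) -> dict:
    """Map each known Spanish word found in the text to its English equivalents."""
    suggestions = {}
    # Split the text into runs of letters (accented letters included).
    for token in re.findall(r"[^\W\d_]+", text.lower()):
        if token in EQUIVALENTS:
            suggestions[token] = EQUIVALENTS[token]
    return suggestions


print(suggest_equivalents("The bank confirmed the reembolso yesterday."))
```

Choosing among several candidates, as with the three meanings of reembolso, is where further slot data such as definitions and usage notes would be needed.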
Conclusion
In conclusion, this chapter has dealt with the future of online dictionaries. It has defended the position that this future is broader than the simple microlexicographic analyses concerned with lexicographically oriented linguistic issues. Instead, it has focused on challenges that stem from a lexicographic conception dependent on available technology and financial resources. These challenges are highlighted, presented and discussed in the framework of a dictionary project that offers guidance for the future, which will be totally dependent on having profitable business models for lexicographic activities.
References
Abel, A. (2012), ‘Dictionary writing systems and beyond’ in S. Granger and M. Paquot (eds), Electronic Lexicography, 83–106, Oxford: Oxford University Press.
Alonso, A., C. Millon and G. Williams (2011), ‘Collocational networks and their application to an e-advanced learner’s dictionary of verbs in science (DicSci)’ in I. Kosem and K. Kosem (eds), 12–22. http://www.trojina.si/elex2011 [accessed 17 July 2020].
Bergenholtz, H. (2010), Ordbogen over faste vendinger (Database and design: Richard Almind), Odense: Ordbogen.com (www.ordbogen.com) [accessed 10 July 2020].
Bothma, T. J. (2011), ‘Filtering and adapting data and information in an online environment in response to user needs’ in P.A. Fuertes-Olivera and H. Bergenholtz (eds), 71–102.
Clark, M., Y. Kim, U. Kruschwitz, D. Song, D. Albakour, S. Dignum, U. Cerviño Baresi, M. Fasli and A. De Roeck (2012), ‘Automatically structuring domain knowledge from text: an overview of current research’, Information Processing and Management 48, 552–68.
European Lexicographic Infrastructure (ELEXIS). https://cordis.europa.eu/project/id/731015 [accessed 17 July 2020].
Fernández, T. and P. Faber (2011), ‘The representation of multidimensionality in a bilingualized English-Spanish thesaurus for learners in Architecture and Building Construction’, International Journal of Lexicography 24 (2), 198–225.
Fuertes-Olivera, P.A. (2012), ‘Lexicography and the internet as a (re-)source’, Lexicographica 28, 49–70.
Fuertes-Olivera, P.A. (2016), ‘A Cambrian explosion in lexicography: some reflections for designing and constructing specialised online dictionaries’, International Journal of Lexicography 29 (2), 226–47.
Fuertes-Olivera, P.A. (ed.) (2018), The Routledge Handbook of Lexicography, London: Routledge.
Fuertes-Olivera, P.A. (2019), ‘Designing and making commercially driven integrated dictionary portals: The Diccionarios Valladolid-UVa’, Lexicography 5, 1–21.
Fuertes-Olivera, P.A. and H. Bergenholtz (eds) (2011), e-Lexicography: The Internet, Digital Initiatives and Lexicography, London and New York: Continuum.
Fuertes-Olivera, P.A. and M. Niño Amo (2018), ‘The accounting dictionaries’ in P.A. Fuertes-Olivera (ed.), 455–72.
Fuertes-Olivera, P.A. and S. Tarp (2014), Theory and Practice of Specialised Online Dictionaries. Lexicography versus Terminography, Berlin and Boston: De Gruyter.
Fuertes-Olivera, P.A. and S. Tarp (in press), ‘A window to the future: Proposal for a lexicographically-assisted writing assistant’, Lexicographica.
Fuertes-Olivera, P.A., H. Bergenholtz, S. Nielsen, P. Gordo Gómez, L. Mourier, M. Niño Amo, Á. de los Ríos Rodicio, Á. Sastre Ruano, S. Tarp, and M. Velasco Sacristán (2012), Accounting Dictionaries (A series of 10 interconnected Spanish, Spanish-English, and English-Spanish Dictionaries), Database and Design: R. Almind and J. Skovgård Nielsen, Odense: Lemma.com.
Fuertes-Olivera, P.A., H. Bergenholtz, P. Gordo Gómez, Á. de los Ríos Rodicio, Á. Sastre Ruano and S. Tarp (in press), Diccionarios Valladolid-UVa.
Interactive Language Toolbox. https://ilt.kuleuven.be/inlato/ [accessed 27 July 2020].
Kilgarriff, A. (2006), ‘Word from the chair’ in G-M. De Schryver (ed.), DWS. Proceedings of the Fourth International Workshop on Dictionary Writing Systems, 7, Pretoria: SF Press. https://tshwanedje.com/publications/dws2006.pdf [accessed 27 July 2020].
Kwary, D.A. (2012), ‘Adaptive hypermedia and user-oriented data for online dictionaries: a case study on an English Dictionary of Finance for Indonesian students’, International Journal of Lexicography 25 (1), 30–49.
Leroyer, P. (2011), ‘Change of paradigm: from linguistics to information science and from dictionaries to lexicographic information tools’ in P.A. Fuertes-Olivera and H. Bergenholtz (eds), 121–40.
Miller, J. (2018), ‘Learners’ dictionaries of English’ in P.A. Fuertes-Olivera (ed.), 353–66.
New Palgrave Dictionary of Economics Online, http://www.dictionaryofeconomics.com/dictionary [accessed 17 July 2020].
Nielsen, S., L. Mourier and H. Bergenholtz (2012), Accounting Dictionaries (A series of 13 interconnected Danish, Danish-English, English-Danish and Danish Dictionaries), Database and Design: R. Almind and J. Skovgård Nielsen, Odense: Lemma.com.
Scott, M. (2020), WordSmith Tools, Version 8, Stroud: Lexical Analysis Software.
Simonsen, H.K. (2017), ‘Lexicography: What is the business model?’ in I. Kosem, C. Tiberius, M. Jakubíček, J. Kallas, S. Krek, and V. Baisa (eds), Electronic lexicography in the 21st century. Proceedings of the eLex 2017 conference, Brno: Lexical Computing CZ, 395–415. https://elex.link/elex2017/proceedings-download/ [accessed 17 July 2020].
Spohr, D. (2011), ‘A multi-layer architecture for “pluri-monofunctional” dictionaries’ in P.A. Fuertes-Olivera and H. Bergenholtz (eds), 103–20.
Tarp, S. (2011), ‘Lexicographical and other e-tools for consultation purposes: towards the individualization of needs satisfaction’ in P.A. Fuertes-Olivera and H. Bergenholtz (eds), 55–70.
Tarp, S. (2012), ‘Do we need a new theory of lexicography?’, Lexikos 22, 321–32.
Verlinde, S. (2011), ‘Modelling interactive, reading, translation and writing assistants’ in P.A. Fuertes-Olivera and H. Bergenholtz (eds), 275–86.
Wikipedia. http://en.wikipedia.org/wiki/Main_Page [accessed 10 March 2012].
Write Assistant. https://www.writeassistant.com/es/ [accessed 27 July 2020].
Ye, Z., J. Xiiangji, B. He and H. Lin (2012), ‘Mining a multilingual association dictionary from Wikipedia for cross-language information retrieval’, Journal of the American Society for Information Science and Technology 63 (12), 2474–2487.
374
22
The future of historical dictionaries, with special reference to the online OED and thesaurus
Charlotte Brewer
1 Introduction
Historical dictionaries are a special case in lexicography and dictionary-making, requiring different areas of research and raising different types of questions about the art and craft of lexicography (see also Considine, Chapter 9). A historical dictionary is usually historical in two distinct ways: first, since it selects, records and comments on how words have been used during the course of a language’s past, it tells a story – perhaps many stories – about the development not only of language but also of society and culture over the period in which that language has been used. Second, such dictionaries are themselves historical documents, especially since they take decades to research and write. Any historical dictionary will exhibit the scholarly values and methods of the period over which it was assembled. Inevitably, as it falls out of date – for language, and language scholarship, will continue to move forward as lexicographers run behind trying to catch up – it will need supplementing or revising. These further accretions or transformations will again mirror values in culture and scholarship that change over time. Paradoxically, it is digitization – that is, one of the most distinctively modern developments in technological and intellectual culture in recent times – which has opened up both types of history to us more now than at any stage in the past. The digitization of historical dictionaries has enabled ordinary users as well as language scholars and academics (for whom, in general, these dictionaries were originally written) to scrutinize the evidence accumulated by these dictionaries over periods of the past, and to subject this to systematic analysis, whether linguistic or sociolinguistic. Moreover, digital investigation of historical dictionaries can tell us how a dictionary has been put together (e.g. its choice of quotation sources), and reveal the values encoded in its scholarly and methodological assumptions.
Many nations have produced historical dictionaries, and dictionaries are often thought of as expressing, or indeed embodying, patriotic values. The first major European dictionaries were unavoidably imbued with such significance, for example, Italy’s Vocabolario degli Accademici della Crusca (1612) and France’s Le Dictionnaire de l’Académie françoise dedié au Roy (1694), both of which were the product of academies determined to regulate and purify the language of the nation. Likewise, today’s historical dictionaries have often been funded by institutions intrinsic to the country whose predominating culture they record: the Deutsches Wörterbuch
(1852–1960), edited by the brothers Jacob and Wilhelm Grimm, began as a private enterprise but had its funding taken over by various state organizations from around 1868 onwards, while the Trésor de la Langue Française (1971–94) and the Woordenboek der Nederlandsche Taal (1863–1998) were both state-funded from the outset. In consequence, past and present editions of many European national historical dictionaries are freely available online. The Oxford English Dictionary (OED), on which this chapter concentrates, is an exception in that it has always been published by an independent publishing house, Oxford University Press (OUP). None of its past editions – whether the first, second, or its two twentieth-century supplements – is available in searchable online editions and in its current form it can only be consulted via a subscription site. In many other ways, however, it is a striking example of the characteristics and features of historical dictionaries as well as an illuminating guide to the future of this genre. Originally published 1884–1928, but compiled over a much longer period (from around 1860 onwards), the first edition of the OED was subject to all sorts of changes in editorial and lexicographical policy. In its definitions of words relating to every sort of topic – science and technologies, the arts, politics, religion, public and domestic life, law, commerce, pastimes – it reflected the temper of its time. Its earliest published sections were out of date before the final ones were published, and its identity, function and purpose have since been significantly complicated by a series of supplements and additions, most recently by the monumental project of revision that has been published online (and only online) since 2000 (www.oed.com). 
It is widely recognized as the greatest dictionary of English ever published, or (to quote its website’s strapline) ‘The definitive record of the English language’, though it can certainly be criticized for imperfections and inconsistencies. From early on the OED was seen as occupying a unique role in the political as well as the scholarly or linguistic life of the nation, so that during the First World War its publishers claimed it as ‘An Imperial Asset’. Its readers thought the same, one reviewer suggesting that it ‘should be the most coveted possession of all public libraries in the United Kingdom, in the Colonies, and at least the headquarters of every district in India … It is not so much a Dictionary as a History of English speech and thought from its infancy to the present day’.1 This recognition of the role of the English language – and hence of the OED – in the British imperial project is familiar to us today. But the value attached to the role has undergone transformation: we now recognize its intrinsic racism. Many entries in the first-edition OED reflect then-predominant assumptions relating not only to race and the Anglo-centricity of the English language but to sex and sexuality, gender, class, religion, etc., which have since been recognized as unacceptable: for example, the explanation published in 1924 that the term ‘white man’ meant ‘a man of honourable character such as one associates with a European (as distinguished from a negro)’, or the 1909 definition of Sapphism, referring to female homosexuality, as ‘unnatural sexual relations between women’. 
One of the significant challenges that has faced the publishers and lexicographers working on editions and revisions to the OED since the first edition was published has been to identify such offensive editorial features in the dictionary and to re-write the text concerned, while preserving the historical evidence about attitudes and beliefs of the past to which the original formulations bore testimony. Another challenge has been to reduce the Anglo-centric focus of the original OED and extend the range of quotation sources and language covered, both historical and contemporary, whether World Englishes or other regional varieties.
This chapter’s title specifies the ‘future’ of historical dictionaries. Over the last two decades, the OED has taken an interesting turn, since the publishers have wholeheartedly embraced internet publishing. This was an innovative step in 2000, when its website was first launched, and still innovative today, as the dictionary develops increasingly sophisticated electronic search and display tools to justify its subscription fee and increase its attractiveness not only to a scholarly but also a more general audience. This presents its publishers, and indeed the lexicographers themselves, with a further set of challenges, some of which, as we shall see, are in conflict with the academic values according to which the dictionary was first conceived and compiled. In common with other historical dictionaries first embarked on in the nineteenth century, the OED was rooted in its quotations from examples of real use of language, sourced in printed versions of texts dating from the Old English period to the present day (i.e. the late nineteenth and early twentieth centuries). Quotations are thus at the heart of the historical dictionary. They constitute the primary evidential basis both for the definitions themselves and for tracing the history of a word’s senses and uses from first to last recorded occurrence. For the OED, whose chronological range covers recent as well as historical vocabulary, revisiting and revising the choice of quotation sources offers more challenges still, since the technological revolution over the last three decades has enabled digitization not just of the dictionary itself but of many millions of words written and printed, both past and present, expanding the dictionary’s potential sources in ways which must sometimes seem unmanageable to the lexicographers. 
While the number of quotations in the dictionary has risen from around two million in the first edition to nearly four million online, the Victorian literary canon continues to dominate the collection overall, so that in this crucial respect the OED still resembles a Victorian entity onto which a very large twenty-first-century component has been bolted. This chapter explains the consequences of OED’s history for its future (section 2), examines how its current revision and website are presenting the dictionary’s past and present stock of information (section 3), and considers its links, actual or potential, with other dictionaries, principally the Historical Thesaurus of the OED, a work that began life at Glasgow University in the 1960s but is intimately connected with the OED itself (section 4). Along the way it reviews the huge advantages, actual and potential, of digitizing historical dictionaries alongside the accompanying dangers of misrepresentation and error.
2 The OED up to 2020
To understand the OED today, and think coherently about its likely form and function in the future, we need to know quite a bit about its past. The narrative of its origin and compilation has been engagingly told in a couple of best-sellers as well as more academic books.2 Briefly, the dictionary was embarked on in the early 1860s by the London Philological Society, a group of (male) scholars and other men of letters who were knowledgeable about language and about lexicographical history, could see that English dictionaries to date had fallen short of comprehensive scholarly coverage of many important features of language, and were determined to produce a definitive work treating the history of English vocabulary in its entirety. Enlisting the help of hundreds of volunteers (an early example of crowdsourcing), the lexicographers set
out to record every word in the language, reading through thousands of printed sources of every description, in every period in the language, in order to discover how words were used in real contexts. Poring over the resulting quotation extracts, around five million altogether, enabled the editors of the dictionary (which in 1878–9 was taken over from the Philological Society by OUP) to determine definitions for words and to trace how meanings and senses had developed over time. As they themselves rightly claimed, the OED’s grounding in this mass of historical evidence created a revolution in the art of lexicography (Murray 1933, Preface). As we have already seen, a historical dictionary falls swiftly out of date. On the one hand, contemporary language moves on and new words and usages come into existence, while on the other, historical scholarship uncovers new texts from the past and new information about the history and development of the language. Based though it was on the best Victorian and Edwardian scholarship of the time, the OED could not avoid obsolescence as the years passed following its completion. By the 1950s its publishers were faced with a dilemma. Should they keep on printing something that was so out of date? Or should they revise the whole thing again from scratch – which would not only be prohibitively expensive but also consume another few decades? Not unnaturally, they compromised. They reprinted the original dictionary on the one hand, but commissioned a supplement on the other. 
This supplement focused on new words entering the language since the first edition, or new senses of existing words – like the use of Trident to refer to the nuclear weapon system, aerobics, Angle-poise, antihistamine and so on – and was eventually published in four volumes between 1972 and 1986 (Burchfield 1972–86).3 In this way, OUP made up lost ground with contemporary, twentieth-century vocabulary, but kept the original Victorian and Edwardian dictionary as it was. Then in 1989, in a bold and prescient move, OUP digitized the whole work, merging the original first edition – still unchanged – with the four-volume supplement, adding a further (unidentified and unidentifiable) 5,000 new words and senses, and reprinting and re-releasing the resulting combination as a second edition. It was this development that introduced the first seriously complicating element into the OED, adumbrating problems to come in the immediate and more distant future. The second edition was warmly received by the general public, but severely criticized by lexicographical specialists for its confusing mixture of (almost entirely) old wine in new bottles.4 While everything about its appearance – a brand new work, apparently, in twenty handsome volumes (soon to be available in the then-novel format of CD-ROM) – suggested that it was at the cutting-edge of scholarship as well as technology, its content was very largely the same as in the original Victorian and Edwardian version. The editors had identified and corrected a handful of the problematic definitions mentioned above, relating to race or sexuality, but had left many more untouched (the entry for Sapphism, for example, dropped the word ‘unnatural’ in the original definition, but continued to refer to female homosexuality as a ‘vice’ in the etymology, while ‘unnatural’ remained a descriptive term in a number of words referring to male homosexuality, e.g. sodomy, buggery and the like). 
The second edition retained hundreds of definitions that were outdated in other ways too; for example, dialect was explained as a ‘subordinate’ form of a language ‘arising from local peculiarities of vocabulary’, i.e. ‘a provincial method of speech’, while slang was said to be ‘the special vocabulary used by any set of persons of a low or disreputable character; language of a low and vulgar type’. These definitions appeared to belong to the same date as the second edition, namely 1989, but could never have been written at that time: they express
value judgements on language which no respectable lexicographer would then have put his or her name to. In fact, of course, both definitions had been taken over unchanged from the first edition. The same applied to many of the more recondite components of the unrevised entries, for example, pronunciation and etymology: the pronunciation recorded that of late Victorian England, while many etymologies had long been superseded by more recent scholarship. In more subtle and pervasive ways, too, the second edition belied its late-twentieth-century date. As revealed by electronic searches of the dictionary (which OUP itself, in digitizing the dictionary for the second edition, had made possible), the OED had throughout its compilation been subject to more than one type of cultural bias. The individual writers most favoured for quotation purposes, for example – and here we should remember that quotations were the principal evidence on which this dictionary was based, and from which it derived its unique authority – were those of the Victorian literary canon: Shakespeare, Walter Scott, Chaucer, Milton, Pope, Tennyson, Dickens and the Bible. Female authors received short shrift, female poets even shorter shrift, and some periods (e.g. the late sixteenth century) were much more intensely represented in the dictionary than others (the eighteenth century). This is not surprising: these were the sources that both the lexicographers and their volunteers believed were the most important (or in the case of female writers and the eighteenth century, the least important) for literary and therefore linguistic evidence on the use of language. 
However, from the 1930s or so onwards, linguists had begun to question the assumption that it is great literature we should turn to in seeking to record (as the OED sets out to do) the history and development of the language, while literary critics had begun to question the assumptions implicit in the formation and promulgation of a literary canon in the first place. In much of the quotation evidence, therefore, as well as the definitions it contained, the 1989 second edition bore the stamp of an era long disappeared.5 At the same time, nevertheless, the ‘new’ dictionary looked entirely modern – not just because of the smart new printing, but also because of the patently recent vocabulary leaping off the page. Thousands of new words, only just recorded in the supplement, were clearly of the present, whether relating to recent cultural phenomena (the twist, flower power, yuppie), to science, technology and history (moon landing, wysiwyg, cultural revolution) or to transformed sexual mores, which meant that vocabulary previously considered unprintable could be recorded for the first time (cunnilingus, fellatio). The result was a sort of mongrel edition, happily consulted by many as the last word in lexicography – and indeed, enormously useful and erudite in many respects – but treated by scholars, if they were aware of the problems, with considerable caution. The OED lexicographers and publishers themselves knew better than anyone that the second edition was unsatisfactory, however warmly received by non-specialists. They soon came up with a solution, although it has been an expensive one so far and commits them to major expense in the future. In the 1990s OUP began the long overdue task of revising the OED in its entirety, pulling the still largely Victorian and Edwardian second edition into the twenty-first century. 
A team of around sixty lexicographers began the process of overhauling, re-researching and rewriting each and every component of each and every entry: spelling, pronunciation, etymology and semantic analysis. At the same time, they embarked on a major new reading programme to provide the quotation evidence which would drive this reconfiguration forward. The revision is ongoing and its progress is stately; a website statement dated 2020 recorded it as 40 per cent complete (https://public.oed.com/history/dictionary-milestones/).
While engaged on revising the historical portion of the dictionary – all the entries printed in or before OED2 (1989) – the lexicographers are also documenting the vocabulary of today. Recent additions to the OED lexicon include Brexit, ze (the gender-neutral third-person pronoun), many words and senses related to the onset of Covid-19 – corona n.3, contact tracer, frontliner – and Nigerian English borrowings and coinages (buka, Okada).6 Although this gradually emerging third edition of the OED, OED(3), is indisputably a new project, with search tools, hyperlinks and other analytic resources exploiting newly available technology (as the next two sections describe), its editors are therefore in many respects retracing the steps taken by their Victorian predecessors. The major difference is that many of the research methods being applied to the Victorian original are twenty-first-century ones – electronic searching of databases, electronic storage, electronic editing and electronic production.
3 The OED: Current and developing characteristics
The current and future job of the lexicographers is thus twofold: revision and updating of the old entries, and creation of new ones. But the past is constantly present in OED lexicography, in more ways than one. Most importantly, it is present in the 50 per cent of entries that are as yet unrevised, many of which have been virtually unchanged since their initial publication between 1884 and 1928 in the first edition. This is because OUP has decided that the best way to present the dictionary online is to merge the new and revised entries seamlessly with the unrevised ones. Why? Because online publication reaches a more varied audience than was ever exposed to the print version, whose thirteen volumes, forbiddingly heavy and expensive, were usually confined to the reference sections of university libraries and to the homes of relatively wealthy academics and intellectuals. By contrast, the current OED is regularly browsed by those with access to institutional subscriptions – journalists, teachers, lawyers, editors, technologists and word enthusiasts, not only in the United Kingdom but throughout the global community, notably the United States and Japan, where interest in the OED is high. This wide audience brings commercial returns, but also, as OUP sees it, the responsibility to meet popular requirements as well as scholarly ones. In turn, this requires on-screen presentation that is simple and easy to use. The problem is that the history of the dictionary is not simple at all, and the second edition, into which the ongoing third edition is being gradually merged, was itself a hybrid of new with old.
OUP has nevertheless decided that attempts to represent the history of the dictionary more transparently, distinguishing between the different stages of composition and publication, and making it clear that the content of many existing entries (dating back seventy years and more) contains information long since superseded by subsequent scholarship, would be impossibly confusing for its non-specialist users and for anyone who is not a lexicographical historian.7 Opinions will vary on whether this is a good or bad decision. Many users, however, including academic users, are simply unaware of the mix of new lexicography with old and are easily led astray by their assumption that all entries have been recently updated. Moreover, it could certainly be argued that one of the chief attractions of OED for non-specialist users is its impeccable scholarly credentials, and that it is precisely this characteristic of OED that the
current website, with its unannounced merging of new with long-outdated entries, obscures or at worst misrepresents. The website is updated every quarter, when a set of newly revised or newly created entries is uploaded and the corresponding set of unrevised entries silently removed. At the same time, the lexicographers can make changes or corrections to entries or parts of entries throughout the whole dictionary – updating bibliographical details in the quotations, or regularizing editorial labels should they re-think their rules on describing terms as ‘obsolete’, ‘slang’, ‘colloq.’ (i.e. ‘colloquial’) – while improving the user-friendliness of the site more generally by adding brief explanatory features on subjects such as surnames or place names in English. This mode of revision, composition, accretion and regular transformation is something that was impossible in print format, and it has many advantages for consumer as well as lexicographer. Most obviously, dictionary users no longer need to wait years for a supplement. Instead, a fresh batch of new scholarship materializes onto the screen every three months, with every entry in correct alphabetical order. The revised OED may never be published in print again, and electronic publishing is the likely future for many types of dictionary now. Where historical dictionaries are concerned, there is an added advantage for both publisher and reader: neither will have to bear the costs associated with producing such works in hard copy. But there are disadvantages to web publication, too, particularly for those seeking a truly definitive definition. The entry you consult in January may be different by March (entirely revised, or with new quotations, date changes or other alterations), and many of these changes are not identified (or identifiable in systematic ways). Academics find this evanescence disturbing, as it is no longer possible to cite a stable authority; even the casual reader may find it disconcerting.
In January 2020 the OED signalled that such unmarked changes were a new feature of the dictionary, but in fact they have been regularly introduced since 2010.8 Notable examples include the entry for marriage. First revised in 2000 and still (as of July 2020) marked with that date, the entry was rewritten in or after 2013 in response to legislation in England and Wales permitting same-sex marriage (in the wake of such legislation in other territories around the world). References to the gendered terms husband and wife were dropped and a new definition substituted: ‘The legally or formally recognized union of two people as partners in a personal relationship’. However admirable the intention to keep the dictionary up to date, silent changes such as these obscure the lexicographic record. Lexicographical judgements on the wordings of definitions, including when and whether to update them, communicate views on the relative importance of changes in both language and culture; such judgements influence as well as reflect different bodies of opinion in the language community more widely. They are of interest therefore to every type of cultural commentator and historian, whether academic or not. OED’s decision to re-word the definition of marriage was significant, as is the fact that this decision was taken not in 2000, as indicated by the entry’s current date-stamp, but over ten years later. Equally significant is the dictionary’s continued retention (as of July 2020) of the gendered terms husband and wife in entries for marriage and marital, suggesting that the intention to record changes in usage related to gender and sexuality is not being followed through in necessary detail. Especially in an historical dictionary, it is important that editorial decisions of this (or any) kind are identified and dated. This unidentifiable and unverifiable changefulness marks a sea change in post-print OED lexicography.
(For other examples of silent changes obscuring the lexicographical record, this time to entries for sexual vocabulary, see Brewer 2013.)
Other electronic transformations of OED continue to be introduced. Since its inception in 2000, the website has provided tools for searching and display that have been improved and refined in successive makeovers, notably a re-launched platform in 2010. They offer the ability to exploit the vast stores of information previously trapped on printed pages, thus inaccessible to large-scale research and analysis, on features ranging from spelling variants, etymologies and morphology, through definitions and quotations, to editorial labels and notes. Searches can now reveal the answers to questions of interest to professional linguists and word enthusiasts alike. For example, which writers does the dictionary record as having introduced the most words into the language? Do these words have particular grammatical, morphological or etymological features? Which periods of the language have been (again, according to the OED) most lexically productive? Front-page buttons allow one to search by language of origin, by subject, by geographical region (unevenly represented as yet), by ‘usage’ – e.g. for words labelled ‘allusive’, ‘archaic’, ‘colloquial and slang’, ‘derogatory’ – and so on. Particularly notable is the feature dating from 2010 which lists the top thousand most cited quotation sources in the dictionary, inviting further research – though disturbingly, this shows that the Victorian literary canon (Shakespeare, Walter Scott, Chaucer, Milton, Dryden) continues to dominate the OED after twenty years of revision; the list features just 28 women authors out of the total of 1,000 (as of June 2020). It surely cannot be the case that these authors have played so influential a role in the history and development of English literature, let alone English language.
More recent additions include information on the frequency of each headword (though confusingly for a historical dictionary, this is based on post-1970 usage data taken from Google Books Ngrams data) as well as links to discussions of words and relevant topics on the website’s accompanying blog (many excellent, e.g. on gender-neutral pronouns by Dennis Baron, some less so; the publisher disclaims responsibility for the blog’s content). Quotations now also link, where appropriate, to entries on their authors in the Oxford Dictionary of National Biography and to longer extracts from the text in question in Oxford Scholarly Editions Online (both subscription resources). Over 2019–20, the dictionary has been developing an API, or ‘application programming interface’ (defined in an entry of 2017 as ‘a set of routines, protocols, and tools designed to allow the development of applications that can utilize or operate in conjunction with a given item of software, set of data, website, etc.’). This promises the potential for still more intensive and focussed search of the OED’s contents. Such resources open up the history and development of the language in ways that could not have been dreamed of by the original lexicographers, and they give a good indication of the directions in which other historical dictionaries, not just the OED, may travel. Equally, however, these new digital resources lead to the possibility of error and pitfall. OED Online search facilities and results are often flawed. Searches for quotations from The Times using the website’s own recommended pathways will turn up results from Musical Times, N.Y. Times, Financial Times; similar problems characterize other website queries.
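The kind of false match just described is what one would expect if a source query were compared against part of a cited title rather than the whole of it. The following is only an illustrative sketch in Python, not a description of how OED Online is actually implemented: the record layout, the function names, the placeholder headwords and the assumption that The Times is cited simply as ‘Times’ are all hypothetical.

```python
# Hypothetical quotation records: (source title as cited, headword).
# The titles are those named in the text; the headwords are invented
# placeholders, and citing The Times as "Times" is an assumption.
QUOTATIONS = [
    ("Times", "headword-a"),
    ("Musical Times", "headword-b"),
    ("N.Y. Times", "headword-c"),
    ("Financial Times", "headword-d"),
    ("Times", "headword-e"),
]


def normalise(title: str) -> str:
    """Lower-case a title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def substring_search(records, query):
    """Return every record whose source title merely contains the query."""
    q = normalise(query)
    return [r for r in records if q in normalise(r[0])]


def exact_title_search(records, query):
    """Return only the records whose whole source title equals the query."""
    q = normalise(query)
    return [r for r in records if normalise(r[0]) == q]


loose = substring_search(QUOTATIONS, "Times")
strict = exact_title_search(QUOTATIONS, "Times")
print(len(loose), "records from substring matching")
print(len(strict), "records from exact-title matching")
```

On this toy data the substring comparison returns all five records, while the whole-title comparison returns only the two cited from ‘Times’. A real source index would also need to reconcile variant citation forms of the same title, which this sketch does not attempt.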
Ten years after the 2010 makeover, search results are still delivered in a format which requires users to click through to each individual entry in order to see the text of the quotations or other material searched for – a laborious process which negates much of the advantage of electronic searching in the first place. But the most concerning issue with digitization where the OED is concerned is that it is impossible to search the revised portions of the dictionary (OED3) separately from the unrevised (OED2). Yet there is no doubt that the differences between these two versions are huge, reflecting
changes in language and scholarship over many decades (as discussed in Brewer (2005–20); see https://oed.hertford.ox.ac.uk/oed-editions/oed3/). The revision is transforming the historical record of the vocabulary found in OED1/OED2 – pushing dates of first attestation earlier (sometimes by centuries) as well as last attestations later, re-analysing the development of the different senses of words, rewriting editorial annotations and applying new editorial labels, rewriting etymological and other ancillary material in entries. OED3 is also quoting more intensively from sources its predecessors neglected, including eighteenth-century texts, writing by women (to however limited an extent), and non-literary texts of all kinds – wills, inventories, many more newspapers and journals, diaries, legal and local government records, and a host of other heterogeneous texts, many available for the first time in digitally searchable resources. However, the publisher’s decision to merge OED2 and OED3 together effectively obscures the nature of OED3’s changes and improvements. Website searches return an undifferentiated mix of old and new lexicographical scholarship, one which cannot be used for research purposes unless each entry is checked one-by-one in the dictionary to ascertain whether it has been revised or not. The moral is clear. However sophisticated the electronic medium in which they appear, dictionaries reflect the evidence and research put into them in the first place – and in the case of the mixed OED website, electronic searches turn up results reflecting many different stages of composition. The only way to solve this problem is to allow users to search the old material separately from the new (as well as the other way round), a resource withdrawn in 2010 when the electronically searchable version of OED2 was deleted from the website, leaving readers with the hybrid version of the dictionary only. 
Looking on the bright side, this problem of inconsistent data will gradually reduce as the revision of OED proceeds, and will have entirely disappeared in twenty years or so (!) when the revision is complete. By then, we can hope that searchable OED2 will have returned to the website, as an important record of historical lexicography bearing witness to a whole era of society and culture as well as to literary and linguistic scholarship on language itself. For as OED3’s editor John Simpson states in his online preface, in a remark that can be applied to other historical dictionaries too, ‘Far more than a convenient place to look up words and their origins, the Oxford English Dictionary is an irreplaceable part of English culture. It not only provides an important record of the evolution of our language, but also documents the continuing development of our society’ (http://www.oed.com/public/oedhistory#future, accessed June 2020).
4 Other dictionaries; future developments
Another feature of OED Online, one with implications for lexicography more widely, is its hyperlinks to related material in other dictionaries and reference works, some published by OUP itself. The most innovative of these is OED’s younger ‘sister’ dictionary, The Historical Thesaurus of the Oxford English Dictionary (HTOED, Kay et al. 2009). The origins of HTOED date back to 1964, when its initiator M. L. Samuels first discussed with Glasgow University colleagues the benefits to be gained from ‘turning the dictionary [OED] inside out’, i.e. loosing its contents from their alphabetical moorings and re-organizing them by semantic categories
(Kay and Wotherspoon 2002). The problem with conventional (print) dictionaries, as has often been observed, is that alphabetical order has very little to do with the meanings of words or the connections between them. The advantage of thesauruses, by contrast, is that words are arranged according to their meanings (see Sierra, Chapter 19). The idea behind HTOED was more ambitious still, since by recasting and reformulating OED’s evidence it was able to show how related categories of vocabulary, along with the objects or concepts denoted by that vocabulary, had developed through time. This reconfiguration of OED not only enables new types of historical lexical research but also opens fresh paths of historical and cultural investigation in areas beyond the purely linguistic. As the editors explain, the larger groups of related words can themselves be thought of (and used) as ‘conceptual maps’ – charting the development of (say) vocabulary relating to the mind or to concepts such as strength or virtue or sin or wealth, to cultural artefacts of all imaginable kinds or to social entities and relationships; see further Kay et al. (2009: Preface).
All this is on display in historically organized categories of data which is itself coterminous with the entire range of recorded vocabulary in English – or to be more precise, with the vocabulary recorded in the OED: the main drawback of the project in its printed form was that the editors based their study on the second edition of OED, and therefore replicated the imperfections of that edition as well as the strengths of the original OED itself.9 This limitation disappears online – at least, it disappears to the extent that the HTOED on the OED Online website is updated in line with new OED3 entries, though this seems to work better with older words than with new: coronavirus, first recorded 1968, is correlated with vocabulary related to ‘virus’, while phablet (a smartphone near the size of a tablet), first recorded in 2010, has no thesaurus entry (unlike smartphone itself). Meanwhile a reassembled editorial team are working towards an independent second online edition, produced and hosted at https://ht.ac.uk (Alexander n.d.), a resource supported (like the original HTOED) by the University of Glasgow, at which new lexical research can be read.
Links to other resources on OED Online have come and gone on the website since first being introduced in 2010. Two that have stayed put are of particular value to historical linguists, giving access to the two other major period dictionaries of English, the Middle English Dictionary (MED; supported by the University of Michigan, initially completed in 2001 and now undergoing further revision; see Kurath et al. (1952–2001)) and the Dictionary of Old English (DOE, letters A–I; supported by the University of Toronto; see Cameron et al. (n.d.)).
These dictionaries provide a much wider range of synchronic quotation evidence for the periods they cover than does the OED itself, enabling specialists to get a better sense of connotation than is possible from the more selective parent dictionary – both were in origin spin-offs from the first edition of the OED.10 Moreover, although MED was originally composed (from the mid-1930s on) with the example of the OED before it, it often chooses to analyse the semantic structure of words differently. Comparing the two dictionaries side by side on the screen illuminates difficult lexicographical decisions about how to interpret difficult words and concepts with particular cultural resonance (e.g. ‘honour’, ‘truth’ and the like); evaluations of this sort are much easier with the new hyperlinks. Linking to other dictionaries is likely to develop as OUP re-thinks its plans for OED. Future candidates might include the Dictionary of the Scots Language (http://www.dsl.ac.uk/), which combines the Dictionary of the Older Scottish Tongue (Craigie, Aitken et al. 1937–2002) with The Scottish National Dictionary (Grant 1931) and is under further development, as well as historical
dictionaries of non-UK English such as the Dictionary of South African English (https://dsae.co.za; Niekerk et al. n.d.), the Dictionary of the English/Creole of Trinidad & Tobago (Winer 2009) and OUP’s own Australian National Dictionary (second edition; Moore et al. 2016). The Dictionary of American Regional English (or DARE; Cassidy and Hall 1985–2012, Hall n.d.) and Dictionary of Canadianisms on Historical Principles (or DCHP2; Dollinger and Fee 2017) have both made particularly interesting use of new (or newly manipulated) digital resources, notably the interactive maps showing regional distribution of terms (DARE) and information on relative regional frequency of terms (DCHP2). Older historical dictionaries offer further possibilities of comparison and contextualization, for example, Jamieson’s Etymological Dictionary of the Scottish Language (1808; Rennie n.d.) or Wright’s English Dialect Dictionary (1898; Markus 2019). For specialist areas of vocabulary, Green’s Dictionary of Slang (Green 2020), an ongoing project focusing on the period c. 1500 onwards, is the outstanding candidate. All these dictionaries contain rich resources for the historical study of English and most (other than Moore et al. 2016 and Winer 2009) are freely available online. In all cases, it will be important to make absolutely clear to the OED reader the differing characteristics and methodology of the resources respectively linked to. The future of historical dictionaries, whether completed a century or more ago or ongoing, is unquestionably digital. It is notable that both MED and DOE include searchable corpuses of their source quotation texts on their websites. Like HTOED, therefore, these reconceived word-resources point the way to a semantics which unsettles the theory of meaning implied by a conventional dictionary, whose alphabetical organization (as already observed) suggests that words are discrete entities with entirely independent meanings.
In practice, the meaning of a word depends on its relationships with other words, whether the semantic groupings revealed by a thesaurus, or the syntactical and collocational relationships revealed by corpuses. In this way, the methodological developments enabled by digitization enable shifts in lexicographical theory as well as method. In turn, it is more important than ever before that a historical dictionary – and OED Online in particular – should be as transparent as possible about its choice and range of quotation sources, i.e. the corpus, or primary data, that a dictionary of this type draws on. Working out how best to source, present, develop and exploit a dictionary’s contents, while preserving academic standards intact, will be among the most important issues for historical lexicography of the future.
Notes
1 Oxford University Press (1916): 16; The Periodical, 15 February 1928: 25.
2 Murray (1977) remains the authoritative account for non-specialists, though see Winchester (1998 and 2003). Gilliver (2016), written by an OED lexicographer, offers an exhaustive historical narrative of the entire work. A good range of analytic and descriptive accounts of the first edition can be found in Mugglestone (2000), important features of which are further investigated in Mugglestone (2005). Brewer (2007) provides a history of OED over the twentieth and twenty-first centuries.
3 In fact this was the second supplement; the 1933 re-issue of OED (Murray 1933) had included a one-volume supplement with entries for many words or usages that had entered the language since publication of the dictionary’s first instalment in 1884.
4 See Stanley (1990), Algeo (1990), Brewer (1993) and (for an overview) Brewer (2007): 213–22.
5 See further Brewer (2005–20), Brewer (2010a).
6 See lists at https://public.oed.com/updates/; https://public.oed.com/blog/nigerian-english-releasenotes/.
7 As explained by Oxford University Press to the author in 2013.
8 Grathwohl et al. (2020); see further https://oed.hertford.ox.ac.uk/oed-editions/oed-online/re-launched/continuous-change/.
9 See further Brewer (2010b), on which this account draws.
10 Craigie (1919); Brewer (2007): 75–6.
References
Alexander, M. (Director) (n.d.), Historical Thesaurus of English. https://ht.ac.uk.
Algeo, J. (1990), ‘The emperor’s new clothes: The second edition of the Society’s dictionary’, Transactions of the Philological Society 88, 131–50.
Brewer, C. (1993), ‘The Second Edition of the OED’, Review of English Studies 44, 313–42.
Brewer, C. (2005–20), ‘Examining the OED’. http://oed.hertford.ox.ac.uk/.
Brewer, C. (2007), Treasure-House of the Language: The Living OED, New Haven and London: Yale University Press.
Brewer, C. (2010a), ‘The use of literary quotations in the Oxford English Dictionary’, Review of English Studies 61, 93–125.
Brewer, C. (2010b), ‘Review of: Christian Kay, Jane Roberts, Michael Samuels and Irené Wotherspoon (eds), Historical Thesaurus of the Oxford English Dictionary’, Review of English Studies 61, 801–5.
Brewer, C. (2013), ‘OED Online Re-launched: Distinguishing old scholarship from new’, Dictionaries: Journal of the Dictionary Society of North America 34, 101–26.
Burchfield, R.W. (1972–86), A Supplement to the Oxford English Dictionary, 4 vols, Oxford: Clarendon Press.
Cameron, A. et al. (n.d.), Dictionary of Old English. https://www.doe.utoronto.ca/pages/index.html.
Cassidy, F.G. and J.H. Hall (1985–2012), Dictionary of American Regional English, 5 vols, Cambridge, MA: Harvard University Press.
Craigie, W.A. (1919), ‘New dictionary schemes presented to the Philological Society, 4th April 1919’, Transactions of the Philological Society, 6–11.
Craigie, W.A., A.J. Aitken et al. (1937–2002), A Dictionary of the Older Scottish Tongue, 4 vols, Chicago: University of Chicago Press, Aberdeen: Aberdeen University Press; London: Oxford University Press.
Dollinger, S. and M. Fee (2017), The Dictionary of Canadianisms on Historical Principles, Second edition, Vancouver, BC: University of British Columbia. www.dchp.ca/dchp2.
Gilliver, P. (2016), The making of the Oxford English Dictionary, Oxford: Oxford University Press.
Grant, W. (1931), The Scottish National Dictionary, 10 vols, Edinburgh: Scottish National Dictionary Association.
Grathwohl, C., M. Proffitt, P. Durkin and K. Martin (2020), ‘Renewing our commitment to the OED’, 16 January. https://public.oed.com/blog/renewing-our-commitment-to-the-oed/.
Green, J. (2020), Green’s Dictionary of Slang. https://greensdictofslang.com.
Hall, J.H. et al. (n.d.), Dictionary of American Regional English. https://www.daredictionary.com.
Kay, C. and I. Wotherspoon (2002), ‘Turning the dictionary inside out: Some issues in the compilation of a Historical Thesaurus’ in J.E. Diaz Vera (ed.), A Changing World of Words, Amsterdam: Rodopi, 109–35.
Kay, C., J. Roberts, M. Samuels and I. Wotherspoon (eds) (2009), Historical Thesaurus of the Oxford English Dictionary, 2 vols, Oxford: Oxford University Press.
Kurath, H., A.K. Sherman, J. Reidy and R. Lewis (eds) (1952–2001), Middle English Dictionary, Ann Arbor: University of Michigan Press. http://ets.umdl.umich.edu/m/med/.
Markus, M. (2019), English Dialect Dictionary Online. http://eddonline-proj.uibk.ac.at/edd/index.jsp.
Moore, B., A. Laugesen et al. (eds) (2016), The Australian National Dictionary, Second edition, Oxford University Press, Australia.
Mugglestone, L. (ed.) (2000), Lexicography and the OED: Pioneers in the Untrodden Forest, Oxford: Oxford University Press.
Mugglestone, L. (2005), Lost for Words: The Hidden History of the Oxford English Dictionary, New Haven and London: Yale University Press.
Murray, J.A.H. et al. (1933), The Oxford English Dictionary: Being a Corrected Re-issue with an Introduction, Supplement, and Bibliography of A New English Dictionary on Historical Principles, Founded Mainly on the Materials Collected by the Philological Society, Oxford: Clarendon Press.
Murray, K.M.E. (1977), Caught in the Web of Words: James A.H. Murray and the Oxford English Dictionary, New Haven and London: Yale University Press.
Niekerk, T. van et al. (n.d.), Dictionary of South African English. https://dsae.co.za.
Oxford University Press (1916), The Oxford Dictionary: A Brief Account, Oxford: Oxford University Press.
Rennie, S. (n.d.), ‘Jamieson’s Dictionary of Scots’. https://jamiesondictionary.com/.
Stanley, E.G. (1990), ‘The Oxford English Dictionary and Supplement: The integrated edition of 1989’, Review of English Studies 41, 76–88.
Winchester, S. (1998), The Surgeon of Crowthorne, London: Penguin.
Winchester, S. (2003), The Meaning of Everything: The Story of the Oxford English Dictionary, Oxford: Oxford University Press.
Winer, L. (2009), Dictionary of the English/Creole of Trinidad & Tobago: on historical principles, Montreal: McGill-Queen’s University Press.
Wright, J. (1898), The English Dialect Dictionary, London: Henry Frowde.
23
The future of dictionaries, dictionaries of the future
Sandro Nielsen
1 Introduction
The lexicographic landscape is gradually changing due to internal and external developments. Lexicographic research activities, findings and output in the form of principles, theories and dictionaries represent internal developments, whereas changes in social behaviour, information needs and technology are typical examples of external factors affecting research in and the making of dictionaries. In this light, it is relevant to ask the question: Will we have dictionaries in the future? Possible answers depend on, among other things, what we mean by the term ‘dictionary’ and the time frame involved. Experience shows that new artefacts eventually replace existing ones in most, if not all, cases, though it would be imprudent to think that dictionaries will disappear within the near future. As long as people find that some of their needs for information can be met by consulting dictionaries, their existence will be ensured for some time. What objects people will regard as dictionaries may change, however, owing to a range of factors, including the types of need identified, the media available and the types of help provided. The future of dictionaries has been discussed sporadically in the literature during the past two decades. Andersen and Nielsen (2009), Samaniego Fernández and Pérez Cabello de Alba (2011) and Kallas et al. (2019) are some of the most recent contributions, and they mention a number of trends that may shape the future. First of all, they point to the ontological shift from linguistics to lexicography as a separate discipline or as part of information science, i.e. a theoretical development. The second trend is the change in the form and size of printed and electronic dictionaries, i.e. a practical development. Finally, the authors find that more and more lexicographers recognize that general lexicography can benefit from the theoretical and practical advances in specialized lexicography. These trends are discussed below.
2 Redefining dictionaries
One of the first things to consider in a forward-looking study is the nature of the social reality examined. The ontological position of lexicographers is, therefore, imperative because it affects the way in which lexicographers perceive their objects of research and work (see also Piotrowski, Chapter 16). Dictionaries are often described as reference books containing words
and their spelling, pronunciation and meaning, or as reference works containing words and their translations in another language, see e.g. Crystal (2010: 112) and Sterkenburg (2003: 396). However, developments in both practical and theoretical lexicography have highlighted some of the limits of such ontological positions: they are specific examples of types of dictionary; they are not generally applicable; and they are often mutually exclusive. In other words, a dictionary is not just a dictionary. In an attempt to describe the dictionaries of the future, a wider and more complex ontological position is necessary. Nielsen (2009: 215, 2018: 72–3) proposes that dictionaries are reference tools made up of several surface features, i.e. features that are visible to users in print or on screens, such as user guides, wordlists, appendices, search sites and result sites. In addition, dictionaries have at least three significant underlying features, the most important of which is that dictionaries are designed to have one or more lexicographic functions. A lexicographic function is the type of help a dictionary can give to a specific user type in a specific type of non-lexicographic situation in which someone may consult dictionaries to find help; for instance, communicative functions such as providing help to translate, understand or write texts, and cognitive functions such as providing help to acquire knowledge independently of communicative activities (e.g. Bergenholtz and Tarp 2010: 30). Second, dictionaries contain data that have been selected to support the relevant function(s), and third, lexicographic structures combine and link the data so that they support and fulfil the dictionary function(s). These features should not be seen in isolation, as a dictionary is made up of the totality of surface and underlying features, and their interrelationships. The proposed ontological position is generally applicable in lexicographic contexts.
It applies, for instance, to printed and electronic dictionaries, to monolingual and bilingual dictionaries, to general and specialized dictionaries, to learner’s and expert’s dictionaries, to language and encyclopaedic dictionaries, and it underlines the fact that dictionaries are complex lexicographic information tools. While traditional definitions of dictionaries are rooted in linguistics, the function-oriented definition primarily focuses on information needs of users and how lexicographers can respond to such needs (see also Fuertes-Olivera (21) in this volume). This shift is accentuated by Hartmann (2012: 101): ‘Lexicography is not just part of applied linguistic lexicology and technical terminology, but a potentially independent field capable of expansion, with reference works (or “information tools”) other than dictionaries – like encyclopaedias, atlases, catalogues, manuals and directories – being just as important.’ With the functional approach, theoretical and practical lexicography are not subject to linguistic constraints and this approach provides a platform from which lexicographers can respond satisfactorily to the needs for information in the knowledge and information society of tomorrow.
3 Dictionaries of the future
It is notoriously difficult to forecast the future. Even so, forward-looking statements may express opportunities based on possible courses of action related to present-day knowledge; according to Mićić (2010: 1), ‘Most of the trends, technologies and issues that will determine our future in the next ten to twenty years are already visible now. The future is already here;
it just hasn’t arrived everywhere to the same extent.’ The above discussion indicates that one challenge facing the lexicographic community is the provision of tools that can satisfy needs for information. At a theoretical level, the shift of focus towards lexicography and information science requires a theoretical basis that applies to all dictionaries and not separate theories for individual types of dictionaries, i.e. a transformative approach instead of a contemplative approach (e.g. Gouws 2011: 17–20, Tarp 2009: 24). A transformative theory is prospective and allows lexicographers to develop guidelines for designing and making dictionaries that are adapted to specific types of users and to specific types of user situations. Even though Bergenholtz (2011: 30) laments the limited effect theory has had on lexicographic practice, either because publishers do not (sufficiently) take theoretical advances into consideration or because a long time-lag is involved, dictionaries of the future will likely benefit from something like function-oriented principles and theories in order to satisfy information needs. But what will these dictionaries look like?
3.1 Can dictionaries be made future-proof?
Dictionaries will be available in many shapes and sizes and two general types can be identified: printed dictionaries and electronic dictionaries. Even though this distinction seems clear, there is not always a clear dividing line between the two. As pointed out by Andersen and Nielsen (2009: 360–1), it has been claimed that printed dictionaries are an endangered species that will be replaced by electronic ones. Nonetheless, printed dictionaries are likely to be with us for some time yet. Printed dictionaries invite users to adopt a slow and reflective consultation procedure in connection with certain functions, for example providing help with language learning. Furthermore, publishers offer dictionaries printed on demand, which may keep costs down; and printed dictionaries are not dependent on power sources in order to work. Other factors that may extend the life of printed dictionaries are that some printed dictionaries are transferred to electronic media with few or no changes (e.g. Tono 2009), and that some dictionaries come with interactive e-texts available from publishers via the internet. Such e-texts may include text corpora, audio pronunciation guides, still pictures, video footage, online writing guides and the possibility of making personalized notes (e.g. Oxford Advanced Learner’s Dictionary and Dictionary of Law). Electronic dictionaries also come in a great many varieties. They can be dictionaries that were originally produced as printed books and transferred to physical electronic media (e.g. CD-ROMs and pocket electronic dictionaries), they can be dictionaries originally produced in electronic form for one medium and transferred to another (e.g. from dictionaries available only on CD-ROMs to being accessible online), and they can be born-digital online dictionaries accessible from PCs, flat screens, tablets and smart phones.
One major difference between printed and (some) electronic dictionaries is their location in relation to users. Access to printed dictionaries and electronic dictionaries stored on physical media (printed books, CD-ROMs, DVDs, desktop applications) depends on the actual location of the dictionaries relative to users in most situations, whereas it is not a question of where online dictionaries are located but how they can be accessed no matter where users are and when users want access. In light of the above, lexicographers may abandon the dichotomy printed versus electronic dictionaries and instead distinguish between
offline and online dictionaries. Whether printed, electronic, physical or online, many of these dictionary types are likely to be with us for some time to come. Developments in some regions, e.g. Scandinavia, may indicate what will happen to printed dictionaries. The Scandinavian languages belong to the so-called small languages and publishers have gradually discontinued printing individual dictionaries and made them available only in electronic form. The large bilingual dictionaries to and from Scandinavian languages are now mainly available in electronic form due to falling sales of printed copies, and the large (multivolume) monolingual dictionaries, whether they are called lexicon, encyclopaedia, etc., have experienced a similar development for some years now. If this trend is applied to the general market for dictionaries, those printed dictionaries that are most likely to survive will be medium-sized, monolingual dictionaries covering one of the United Nations’ world languages, and medium-sized, bilingual dictionaries between these world languages and important pairings of world languages and non-world languages. The surviving dictionaries will be supplemented by online access to the corresponding digitized dictionaries and additional e-texts and digital recordings that support the lexicographic functions of individual dictionaries as described above and with extensions, for example, by providing help to learners who translate texts by showing different translation strategies for particular genres (Nielsen 2010). Online dictionaries may be made available by different types of commercial suppliers. Most publishing houses act as specialist shops offering access to their own dictionaries, usually on a subscription basis, so that (potential) dictionary users can take out subscriptions for one or more dictionaries for specified periods.
In the communication and information society another type of actor enters the stage, namely what may be called lexicographic supermarkets. These are internet businesses, not publishers, who offer access to dictionaries they have not produced themselves but licensed or bought from one or more publishers and one or more authors. These lexicographic supermarkets provide a wide range of web-based services catering to people’s needs for information and knowledge.
3.2 Digitization of information tools
An external factor that has great influence on lexicography is the digitization of information activities. The general trend is that printed media lose market share to electronic media as carriers of data, and since the 1980s this trend has affected theoretical and practical lexicography. The digitization of communication in general will, in the long run, result in printed dictionaries giving way to online information tools. In principle, everybody can upload information tools, including dictionaries, to the internet and many contribute through crowdsourcing to the contents of various types of wikis, and since it is impossible to predict what everybody will do in the future, a number of selected topics with general application to both theoretical and practical lexicography will be addressed. In today’s knowledge and information society we are constantly exposed to a plethora of data coming to us from many different sources, and this data, or information, blitz will continue in the future. As dictionaries contain data, they should be able to give users something better and more helpful than other information tools; otherwise, it will be difficult to justify their existence. Online dictionaries compete with internet search engines as providers of data that can be turned
into information when processed by readers, but the main problem of internet search engines is that they tend to provide too many results from searches in a vast sea of unstructured data, and the results are often irrelevant for the particular information needs of searchers. One way in which lexicographers can future-proof dictionaries is to develop theories or principles that allow them to design and produce information tools that give users the opportunity to access structured data with targeted searches and have the search results presented in structured ways that tell users exactly what they need to know. The detachment from linguistics causes lexicographers to change their perception of dictionaries. Talking and thinking about online dictionaries, people often imagine a database that is accessed by users from an interface whose only function is to give direct access to the database. In this case, the relationship between database and dictionary is a one-to-one relationship and the database is identical with the dictionary. Andersen and Almind (2011), Bergenholtz (2011), Bergenholtz et al. (2011), Bothma (2011), Nielsen and Almind (2011) and Spohr (2011), among others, discuss different technical setups describing online dictionaries as constructions with three main components. First, there is a database containing specially selected data that have been structured in a way that facilitates search and retrieval. Second, users may have access to not just one but several dictionaries via their interface. Third, users access the lexicographic data through a search engine introduced as a mediator between the dictionary and the database, allowing users to search for data in the database, from where the search engine retrieves and presents the relevant data according to user requests. In this case, the relationship between database and dictionary is a one-to-many relationship and the dictionary is not identical with the database.
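The three-component setup described above can be illustrated with a minimal Python sketch. It is not drawn from any of the cited systems: the record layout, the sample entry, the field names and the names `search` and `Dictionary` are all assumptions made for illustration. It shows one database serving two dictionaries, each of which asks the search engine for a different selection of data.

```python
from dataclasses import dataclass

# Component 1: a database of structured records.
# The field names and the sample record are illustrative assumptions.
DATABASE = [
    {
        "headword": "asset",
        "definition": "a resource controlled by an entity",
        "collocations": ["current asset", "dispose of an asset"],
        "background": "Longer encyclopaedic note on the concept.",
    },
]


def search(database, query, fields):
    """Component 3: the search engine mediating between dictionary and database.

    Finds records whose headword equals the query and returns only the
    requested fields, so each dictionary sees its own selection of data.
    """
    return [
        {name: record[name] for name in fields}
        for record in database
        if record["headword"] == query
    ]


@dataclass(frozen=True)
class Dictionary:
    """Component 2: one of several dictionaries sharing the same database."""

    name: str
    fields: tuple

    def lookup(self, query):
        return search(DATABASE, query, self.fields)


# Two hypothetical dictionaries over one database (a one-to-many relationship).
reception = Dictionary("understand a term", ("headword", "definition"))
production = Dictionary("write a text", ("headword", "collocations"))

print(reception.lookup("asset"))
print(production.lookup("asset"))
```

Because the selection of fields lives in the dictionary rather than in the database, adding a further dictionary in this sketch means adding one more `Dictionary` object, not a new database.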
3.3 One database may serve many dictionaries
The tools of information and communications technology allow lexicographers to include considerably more data in databases than in printed dictionaries, but lexicographers should carefully consider how many of these data users will be presented with (see also Stutzman and Warfel, Chapter 17). The 18 contributions in Haß and Schmitz (2010) on internet lexicography are generally based on the premise that databases should contain as comprehensive a coverage of linguistic concepts as possible and present all to users, i.e. the more data addressed to a headword the better for documentation purposes and hence users. This reflects the idea that the dictionary is equal to the database and that dictionaries have the sole function of documenting linguistic concepts. Most users, however, do not have documentation problems (this type of problem seems to be the concern of philologists and terminologists) but problems in extra-lexicographic contexts which amount to genuine needs for specific information about something in order to complete a specific task (see the discussion in Bergenholtz 2012). Dictionaries should offer satisfactory help in such situations, not with an overload of data but with carefully selected and presented data that satisfy information needs; this is in line with the general principles of communicating in a modern society as expressed by Sternberg (1988: 58): ‘The best way to inform your reader is to tell them what they are likely to want to know – no more and no less.’ One way in which to develop such dictionaries is to treat the database as a comprehensive collection of structured data in which dictionaries search for the data that tell users exactly what they want to know.
The need for specific types of data to satisfy specific types of need requires a proper lexicographic response. Publishers and lexicographers who treat the database as equal to the dictionary will have to develop a new database for every dictionary, but this is both time-consuming and expensive. If they take the alternative approach described above, lexicographers and publishers only have to develop one database in order to publish several dictionaries and the database can be monolingual, bilingual or multilingual depending on the design and purpose of the entire dictionary concept. Monolingual databases can only form the basis of monolingual dictionaries, whereas bilingual databases can form the basis of several monolingual dictionaries in either language and several bilingual dictionaries between the two languages. For example, a database that contains data in two languages (L1 and L2) can be the source of the following set of dictionaries with the identified functions:
● Dictionary providing help to understand a word or term in L1
● Dictionary providing help to understand a word or term in L2
● Dictionary providing help to produce a text where the expression is known in L1
● Dictionary providing help to produce a text where the expression is known in L2
● Dictionary providing help to find a word or term where the meaning is known in L1
● Dictionary providing help to find a word or term where the meaning is known in L2
● Dictionary providing help to translate a word or term from L1 into L2
● Dictionary providing help to translate a word or term from L2 into L1
● Dictionary providing help to translate a collocation or phrase from L1 into L2
● Dictionary providing help to translate a collocation or phrase from L2 into L1
● Dictionary providing help to acquire knowledge about a word, term or concept in L1
● Dictionary providing help to acquire knowledge about a word, term or concept in L2
As this list indicates, each dictionary has its own function (i.e. they are monofunctional dictionaries) and search options are tailor-made for the function in question. The two dictionaries providing help to acquire knowledge may be described as polyfunctional in that they can each be used to acquire general knowledge about a topic (e.g. the inflectional paradigm of irregular verbs) or specific knowledge about a topic (e.g. the height of the Eiffel Tower in Paris). Only these two cognitive dictionaries are likely to present users with all the data addressed to a word, term or concept in the database, while the other ten dictionaries present different sets of a carefully selected number of data types from the database. Online dictionaries based on a distinction between database and dictionary have been designed and described in the literature, see e.g. Bergenholtz (2010, 2012), Bergenholtz and Bothma (2011), Bergenholtz and Gouws (2010), Fuertes-Olivera and Nielsen (2012), and Tarp (2011). Nielsen and Almind (2011) describe a set of accounting dictionaries linked to an English and Danish database and show how the database may be connected to the surface feature component called result site (Nielsen et al. 2020). Readers who want to know the meaning of a particular English term found in a text (e.g. notifications) may consult the English dictionary providing help to understand a word or term. The search engine makes a targeted search for the term in the database in those data fields containing headwords and their inflections, allowing users to search for the base form of a word or term as well as its inflected forms as encountered in texts. The search engine retrieves the data addressed to the search string as shown in Figure 23.1.
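The targeted search in headword and inflection fields can be sketched as follows. This is only an illustration under stated assumptions: the entry structure, the function names and the listed inflected forms are hypothetical and do not reproduce the actual accounting database. The sketch shows how indexing inflected forms alongside base forms lets users type a word exactly as they met it in a text.

```python
# Hypothetical entries: each headword is stored together with its inflected forms.
ENTRIES = [
    {"headword": "notification", "inflections": ["notification", "notifications"]},
    {"headword": "reinsurer", "inflections": ["reinsurer", "reinsurers"]},
]


def build_index(entries):
    """Map every base form and inflected form (lower-cased) to its headword(s)."""
    index = {}
    for entry in entries:
        forms = {entry["headword"], *entry["inflections"]}
        for form in forms:
            index.setdefault(form.lower(), []).append(entry["headword"])
    return index


def find_headwords(search_string, index):
    """Return the headwords reachable from the search string, or an empty list."""
    return index.get(search_string.strip().lower(), [])


INDEX = build_index(ENTRIES)
```

With this index, a search for the plural form found in a text leads to the same article as a search for the base form.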
Figure 23.1 Search result helping users to understand the word or term ‘notification’.
The dictionary presents data that are intended to help users understand words and terms found in texts: the meaning of the term searched for – ‘no more and no less’. Definitions are written with the factual and linguistic competences of the intended user group in mind and written as full sentences using natural language to make it easy for users to turn the data into useful information by a mental process. Authors may need help with writing texts that contain the term notification and consult the dictionary providing help to produce texts where the expression is known. The search engine searches the database for the following three types of data: inflection, collocation and example. Figure 23.2 contains the data retrieved and their arrangement as presented to users. The data types in Figure 23.2 all provide help to write texts in English. The definition allows users to ascertain whether the word has the correct meaning in the writing context and the grammar data, collocations and examples support actual text production. Again, users are presented with a limited number of data types selected from all the data types contained in the database and addressed to the headword notification. Translators who need help with translating the English term notification may consult the dictionary providing help with translating a word or term from English into Danish. The search engine searches the database for the following type of data: inflection. The dictionary presents a range of data types, including: headword, definition, collocations, examples, equivalent and inflection. Figure 23.3 shows the result of the search. In addition to presenting the meaning of the term, the data contain the Danish equivalent ‘anmeldelse’, its inflectional paradigm, two synonyms to the equivalent, English collocations and
Figure 23.2 Search result helping users to write texts with the word or term ‘notification’.
Figure 23.3 Search result helping users to translate the word or term ‘notification’.
example sentences with translations into Danish. The dictionary recommends one equivalent and informs users that there are two synonymous expressions in Danish, so that users are given one suggested solution to the translation of the English word, instead of presenting three equivalents and leaving it to users to pick one; this keeps the lexicographic information costs at a minimum (see Section 3.4). Authors may want to express a specific meaning but not know the exact word to use. By consulting the dictionary providing help to find a word or term where the meaning is known, users can search for a word expressing the meaning ‘change the original amount’ and the search engine searches the database in the following data types: definition, usage note, synonym and antonym. The dictionary gives users the search result shown in Figure 23.4. On the basis of the data presented in Figure 23.4, users can establish whether the word has the correct meaning through the definition and find help with text production in the form of inflection, collocations and examples. Figures 23.1–23.4 show dictionaries providing help in communicative situations, but help in cognitive situations may also be available. Students may want to acquire general or specific knowledge about the term reinsurer and consult the dictionary providing help to acquire knowledge about a word, term or concept. The search engine makes a targeted search in the database in two types of data, namely inflection and definition, and retrieves the relevant data types as presented in Figure 23.5. The definition in Figure 23.5 explains the meaning of the term, which is complemented by the context example; and the synonym and antonym help place the term ‘reinsurer’ in a terminological hierarchy. By clicking the cross-reference (‘See also’), users are taken to another
Figure 23.4 Data providing help to find a word or term where the meaning is known.
Figure 23.5 Help in cognitive situations providing all data addressed to the search word.
article with relevant additional data and the item indicating the source of the definition is a link transferring students to the website where the International Financial Reporting Standard (IFRS) is found. There students can find additional information and gain more knowledge. An alternative description of the above scenario is possible. It may be argued that the bilingual database serves four polyfunctional dictionaries each with several search options so that users who consult one of the monolingual dictionaries have the choice of searching for the following kinds of assistance:
● Help to understand a word, term or concept
● Help to write a text where the expression is known
● Help to find a word or term where the meaning is known
● Help to acquire knowledge about a word, term or concept
Users who consult one of the bilingual dictionaries may search for the following kinds of assistance:
● Help to understand a word, term or concept
● Help to translate a word or term
● Help to translate a collocation or phrase
Search options can thus be tailor-made for specific lexicographic functions and polyfunctional online dictionaries with the above search options will work in a simple way. When they consult the dictionaries, users will type their search strings into search boxes and select the type of help they want, whereupon search engines will search the database and retrieve the relevant data.
These data will then be presented to users on the dictionary websites in a predetermined order, similar to the examples shown in Figures 23.1–23.5. This is also in line with the communicative principle of telling users only what they need to know and the principle of allowing users to access lexicographic data in different but targeted ways.
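The consultation process just described (a search string plus a selected type of help) might be modelled as a table of function profiles. The following is a sketch under assumptions, not the implementation behind Figures 23.1–23.5: the profile names, field names and sample record are hypothetical, the fields searched follow the descriptions given for each figure, and the fields presented are a reading of those descriptions that fills in guesses where the text is silent. Matching is a simple case-insensitive substring test.

```python
# Each profile says where to search and what to present (None = all data).
PROFILES = {
    "understand": {
        "search_in": ("headword", "inflection"),
        "present": ("headword", "definition"),
    },
    "write": {
        "search_in": ("inflection", "collocation", "example"),
        "present": ("headword", "definition", "inflection", "collocation", "example"),
    },
    "translate": {
        "search_in": ("inflection",),
        "present": ("headword", "definition", "collocation", "example",
                    "equivalent", "inflection"),
    },
    "find_word": {
        "search_in": ("definition", "usage_note", "synonym", "antonym"),
        "present": ("headword", "definition", "inflection", "collocation", "example"),
    },
    "knowledge": {
        "search_in": ("inflection", "definition"),
        "present": None,
    },
}


def _texts(value):
    """Normalize a field value (missing, string or list of strings) to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def consult(database, search_string, help_type):
    """Search only the profile's fields and return only the profile's data types."""
    profile = PROFILES[help_type]
    needle = search_string.strip().lower()
    results = []
    for record in database:
        matched = any(
            needle in text.lower()
            for field in profile["search_in"]
            for text in _texts(record.get(field))
        )
        if not matched:
            continue
        fields = profile["present"] or tuple(record)
        results.append({f: record[f] for f in fields if f in record})
    return results


# Invented record; only the equivalent 'anmeldelse' is taken from the text.
SAMPLE = [{
    "headword": "notification",
    "inflection": ["notification", "notifications"],
    "definition": "(placeholder definition)",
    "collocation": ["(placeholder collocation)"],
    "example": ["(placeholder example)"],
    "equivalent": "anmeldelse",
}]
```

For instance, `consult(SAMPLE, "notifications", "understand")` returns only the headword and the definition, whereas the `"knowledge"` profile returns the whole record.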
3.4 Access to and presentation of lexicographic data
Online dictionaries offer various access routes to their data. Some have alphabetical lists of headwords that users can click to be transferred to the relevant articles, and others are mere digitized texts in which users have to scroll up and down in search of information. In these cases users will be presented with the entire data stock in the articles no matter why they consulted the dictionaries. Modern information technology offers more advanced search options that can be directly linked to user needs in those cases where lexicographers adopt a setup that distinguishes between database and dictionary. By focusing on the needs of users in various types of user situation, lexicographers can ensure that the data retrieved satisfy a specific type of need and are presented in such a way that users can easily turn the data into useful information. Databases and online dictionaries allow lexicographers to take a dynamic approach to accessing and presenting definitions. Bergenholtz and Kaufmann (1997) and Nielsen (2011) explain that this may result in the presentation of more than one definition for each meaning of a word or concept in the sense that definitions are written in different genres. First, databases and dictionaries can have one type of definition for each function. A good definition supporting text production in a foreign language is likely to differ from a good definition for text production in the same language by native speakers; and a definition that can best support text comprehension in the user’s native language is likely to differ from a good definition supporting the acquisition of knowledge. Second, an online dictionary designed to help different types of user may contain definitions that reflect the cultural, factual and linguistic competences of, for instance, beginners, intermediate and advanced learners, or laypersons, semi-experts and experts.
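The idea of storing several definitions for the same headword, each written for a different user group, and showing only the one that fits can be sketched as follows. The data layout, the level labels as dictionary keys and the function name are assumptions for illustration, and the definitions are placeholders rather than real dictionary text.

```python
LEVELS = ("layperson", "semi-expert", "expert")

# Hypothetical database fragment: one definition per user group.
DATABASE = {
    "reinsurer": {
        "definitions": {
            "layperson": "(placeholder definition for laypersons)",
            "semi-expert": "(placeholder definition for semi-experts)",
            "expert": "(placeholder definition for experts)",
        }
    }
}


def define(headword, level, database):
    """Return only the definition marked for the user's self-declared level."""
    if level not in LEVELS:
        raise ValueError(f"unknown user level: {level!r}")
    entry = database.get(headword)
    if entry is None:
        return None
    return entry["definitions"].get(level)
```

A call such as `define("reinsurer", "layperson", DATABASE)` then yields one definition instead of all three.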
The definition that can best help laypersons understand a concept is different from the definition that experts and semi-experts need in order to successfully understand the same concept in their field of specialization. One route of access is for users to type the search word into a search box and indicate whether they are laypersons, semi-experts or experts (alternatively: beginners, intermediate or advanced learners). The access option for laypersons will search the database for the graphemic search string in the data fields containing headwords and those containing definitions intended for laypersons and the output device presents the data retrieved. Similarly, the search engine will search the headword fields for the search string and the definition fields for the appropriate definition marked for semi-experts or experts and the output device will show the result of either search. In short, online dictionaries may contain several definitions of the same headword written in different genres and instead of showing all three definitions every time users look for a particular headword, these dictionaries will show only the definition users need. Databases and online dictionaries may also present definitions in two or more languages. This is, for instance, relevant in connection with dictionaries treating English vocabulary for nonnative speakers, and Kwary (2012: 36) shows how it is possible to help users with definitions in
English as a default option and definitions in Indonesian as an option for users whose English-language competence makes them unable to properly understand the English definitions. The database will contain definitions of the same word or concept in two or more languages, and the dictionary will show the one specifically written for a particular function or type of user as requested by those who consult it. Online dictionaries may allow users to perform non-manual searches. In the future, people may prefer voice-activated access so that they can say a word and this will trigger the search and retrieval functions similar to the way that voice-activated navigation systems work. In these cases users will be given an audio-visual presentation of the data found, e.g. a definition, a phrase or an inflectional paradigm will be shown and read aloud. Voice-activated access and audio-visual presentation of search results will benefit dyslexic and visually impaired persons in particular. Online dictionaries can provide help in situations other than communicative and cognitive. Tarp (2008: 127) explains how dictionaries may function as ‘how-to’s’, i.e. books giving instructions on how to do specific things, e.g. how to operate machines, and such dictionaries have operative functions. For instance, a group of student nurses may have been asked to take blood from patients but have not received any instructions on how to carry out the task. In such situations they may consult online dictionaries and find the necessary instructions either as written text, as series of illustrations or photos, as audio guides, or as video footage with voiceover. Furthermore, it is possible that online dictionaries may present data in three-dimensional form, including holograms.
This requires users to have the appropriate equipment, which may be particularly helpful for dictionaries whose functions are operative; for example, by showing the inside of an internal-combustion engine and how the individual parts function. A characteristic feature of digital media is that they can be personalized. Information and communications technology may allow users to personalize their dictionaries through the various search options available as described above, and by allowing users to upload data to their dictionaries. Users may then have personalized dictionaries in which they can search in the database as well as the personally added data. Nevertheless, user-uploaded data should be completely separate from the data in the database (which is accessible only to editors), but it should be possible to connect the uploaded data with specific headwords, definitions, collocations, illustrations, pictures, video footage, etc. Irrespective of technical setup, users value easy and quick access to and clear presentation of data in dictionaries. Lexicographers should therefore consider the ease with which users will be able to acquire the necessary information from the data and the way in which these data are accessed. Ease of access and appropriate presentation of data come under the heading lexicographic information costs, which are defined as the effort that users believe or feel is associated with consulting a dictionary or any part of it. Search-related information costs are the effort related to the look-up activities users have to perform when consulting a dictionary in order to get access to the data they are searching for. The more activities users have to perform to find the help they need, the higher will be the costs. 
Comprehension-related information costs are the effort related to the ability of users to understand and interpret the data presented in a dictionary, and this effort is related to the cultural, factual and linguistic competences of users and the way in which the data are presented. An appropriate dictionary design and data presentation structure may keep lexicographic information costs at a low level, whereas an inappropriate
design and use of structures may lead to high information costs. For example, the access options available in online dictionaries affect users’ perception of the costs associated with finding help to solve problems, and small screens on e.g. smartphones may only show partial texts. The actual wording and presentation of data, such as a high degree of textual condensation, may result in high information costs, while clear and consistent access routes may reduce lexicographic information costs (Nielsen 2008: 173–4). If they are presented with too many data, users incur high information costs, not only finding (or not finding) the data that answer their questions, but also having to read all the data to make sure that they have not missed anything. This means that online dictionaries should present the data in such a way that users feel they get answers to their questions with ease and have gained useful information by consulting the dictionaries. Unfortunately, lexicographic information costs cannot be eliminated, but prudent and proper attention to this aspect may result in a reasonable cost level that does not seriously affect the use of dictionaries.
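Returning to the personalization discussed earlier in this section, the requirement that user-uploaded data stay separate from the editorial database while remaining linkable to specific headwords can be sketched as follows. The class and method names and the sample data are hypothetical; the read-only view stands in for a database that only editors can change.

```python
from types import MappingProxyType

# Read-only view of the editorial data (placeholder content).
EDITORIAL = MappingProxyType({
    "notification": {"definition": "(placeholder editorial definition)"},
})


class PersonalDictionary:
    """Keeps user notes in a separate store, keyed by editorial headword."""

    def __init__(self, editorial):
        self._editorial = editorial
        self._user_notes = {}

    def add_note(self, headword, note):
        """Attach a user note to an existing headword; never touch editorial data."""
        if headword not in self._editorial:
            raise KeyError(f"no such headword: {headword!r}")
        self._user_notes.setdefault(headword, []).append(note)

    def look_up(self, headword):
        """Return editorial data and user notes side by side, clearly separated."""
        entry = self._editorial.get(headword)
        if entry is None:
            return None
        return {
            "editorial": dict(entry),
            "user_notes": list(self._user_notes.get(headword, [])),
        }
```

Keeping the two stores apart means a search can cover both, while the editorial record itself is never altered by users.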
4 Concluding remarks
Dictionaries do have a future and dictionaries of the future will, to an increasing extent, be regarded as ‘digital assistants’. This development may be explained in terms of economic sectors in modern society: dictionaries are in a transitional phase from the manufacturing sector into the service sector in an attempt to keep up with the general move into a knowledge and information society. Dictionaries of tomorrow will be information tools which, through their surface and underlying features, provide help to satisfy specific types of lexicographically relevant need of specific types of potential user in specific types of extra-lexicographic situation. These tools will be designed and developed in line with advances in society, for example, communications technology and the general use of digital media, and lexicographers need a platform that allows them to respond satisfactorily to the needs for information and knowledge of actual and potential users. The lexicographic platform will be supported by two pillars: a theoretical and a practical one. The theoretical pillar focuses on needs-adapted data presentation, using principles for making dictionaries that provide users with limited amounts of structured data from which useful information can be retrieved. The practical pillar will be a technical one, using available technical features exactly because they can provide help that satisfies user needs and not simply because they are available.
The above discussion is based on the possible courses of action related to present-day knowledge found in general as well as specialized lexicography and shows how databases can serve as bases for several dictionaries; how users can search for help in communicative, cognitive and operative situations; how dictionaries can provide data that specifically cater to different types of user; how users can personalize dictionaries; and how lexicographers can offer users different ways of access to the lexicographic data (see also Fuertes-Olivera, Chapter 21). Online dictionaries thus allow surface features to interact dynamically with underlying features, and vice versa. This does not mean that the setup of online dictionaries will change overnight, but the theoretical and practical issues discussed in this chapter provide some pointers to the future of dictionaries and dictionaries of the future.
References
Dictionaries
Deuter, M., J. Bradbery and J. Turnbull (managing eds) (2015), Oxford Advanced Learner’s Dictionary, Ninth Revised edition, Oxford: Oxford University Press.
Nielsen, S., L. Mourier and H. Bergenholtz (2020), Accounting Dictionaries (A series of 13 interconnected Danish, English, Danish-English, English-Danish dictionaries), Database and design: Richard Almind and Jesper Skovgård Nielsen, Odense: Ordbogen.com. http://www.ordbogen.com [accessed 12 August 2020].
Richards, P.H. and L.B. Curzon (2011), Dictionary of Law, Eighth edition, Harlow: Longman.
Other references
Andersen, B. and R. Almind (2011), ‘The technical realization of three monofunctional phrasal verb dictionaries’ in P.A. Fuertes-Olivera and H. Bergenholtz (eds), 208–29.
Andersen, B. and S. Nielsen (2009), ‘Ten key issues in lexicography for the future’ in H. Bergenholtz, S. Nielsen and S. Tarp (eds), 355–63.
Bergenholtz, H. (2010), ‘Needs-adapted data access and data presentation’ in Doctorado Honoris Causa del Excmo. Sr. D. Henning Bergenholtz, Valladolid: Universidad de Valladolid, 41–57.
Bergenholtz, H. (2011), ‘Access to and presentation of needs-adapted data in monofunctional internet dictionaries’ in P.A. Fuertes-Olivera and H. Bergenholtz (eds), 30–53.
Bergenholtz, H. (2012), ‘Concepts for monofunctional accounting dictionaries’, Terminology 18 (2), 243–63.
Bergenholtz, H. and T. Bothma (2011), ‘Needs-adapted data presentation in e-information tools’, Lexikos 21, 53–77.
Bergenholtz, H., T. Bothma and R. Gouws (2011), ‘A model for integrated dictionaries of fixed expressions’ in I. Kosem and K. Kosem (eds), Electronic Lexicography in the 21st Century. New Applications for New Users: Proceedings of eLex 2011, Bled, 10–12 November 2011, Trojina: Institute for Applied Slovene Studies, 34–42.
Bergenholtz, H. and R. Gouws (2010), ‘A new perspective on the access process’, Hermes 44, 103–27.
Bergenholtz, H. and U. Kaufmann (1997), ‘Terminography and lexicography. A critical survey of dictionaries from a single specialized field’, Hermes 18, 91–125.
Bergenholtz, H., S. Nielsen and S. Tarp (eds) (2009), Lexicography at a Crossroads. Dictionaries and Encyclopedias Today, Lexicographical Tools Tomorrow, Bern: Peter Lang.
Bergenholtz, H. and S. Tarp (2010), ‘SP lexicography or terminography? The lexicographer’s point of view’ in P.A. Fuertes-Olivera (ed.), 27–38.
Bothma, T. (2011), ‘Filtering and adapting data and information in an online environment in response to user needs’ in P.A. Fuertes-Olivera and H. Bergenholtz (eds), 71–102.
Crystal, D. (2010), The Cambridge Encyclopedia of Language, Third edition, Cambridge: Cambridge University Press.
Fuertes-Olivera, P.A. (ed.) (2010), Specialised Dictionaries for Learners, Berlin and New York: Walter de Gruyter.
Fuertes-Olivera, P.A. and H. Bergenholtz (eds) (2011), e-Lexicography. The Internet, Digital Initiatives and Lexicography, London and New York: Continuum.
Fuertes-Olivera, P.A. and S. Nielsen (2012), ‘Online dictionaries for assisting translators of LSP texts: The accounting dictionaries’, International Journal of Lexicography 25 (2), 191–215.
Gouws, R.H. (2011), ‘Learning, unlearning and innovation in the planning of electronic dictionaries’ in P.A. Fuertes-Olivera and H. Bergenholtz (eds), 17–29.
Hartmann, R.R.K. (2012), ‘[Review of] Pedro A. Fuertes-Olivera and Henning Bergenholtz (eds), e-Lexicography. The Internet, Digital Initiatives and Lexicography’, International Journal of Lexicography 25 (1), 99–103.
Haß, U. and U. Schmitz (eds) (2010), ‘Thematic section’, Lexicographica 26.
Humbley, J., G. Budin and C. Laurén (eds) (2018), Languages for Special Purposes. An International Handbook, Berlin and Boston: Walter de Gruyter.
Kallas, J., S. Koeva, M. Langemets, C. Tiberius and I. Kosem (2019), ‘Lexicographic practices in Europe: Results of the ELEXIS survey on user needs’ in I. Kosem et al. (eds), 519–36.
Kosem, I., T. Zingano Kuhn, M. Correia, J.P. Ferreira, M. Jansen, I. Pereira, J. Kallas, M. Jakubíček, S. Krek, and C. Tiberius (eds) (2019), Electronic lexicography in the 21st Century: Smart Lexicography. Proceedings of the eLex 2019 Conference, Brno: Lexical Computing CZ. Available at https://elex.link/elex2019/ [accessed 12 August 2020].
Kwary, D. (2012), ‘Adaptive hypermedia and user-oriented data for online dictionaries: A case study on an English dictionary of finance for Indonesian students’, International Journal of Lexicography 25 (1), 30–49.
Mićić, P. (2010), The Five Futures Glasses. How to See and Understand More of the Future with the Eltville Model, Basingstoke: Palgrave Macmillan.
Nielsen, S. (2008), ‘The effect of lexicographical information costs on dictionary making and use’, Lexikos 18, 170–89.
Nielsen, S. (2009), ‘The evaluation of the outside matter in dictionary reviews’, Lexikos 19, 207–24.
Nielsen, S. (2010), ‘Specialised translation dictionaries for learners’ in P.A. Fuertes-Olivera (ed.), 69–82.
Nielsen, S. (2011), ‘Function- and user-related definitions in online dictionaries’ in F.I. Kartashkova (ed.), Ivanovskaya leksikografischeskaya shkola: traditsii i innovatsii [Ivanovo School of Lexicography: Traditions and Innovations]: A Festschrift in Honour of Professor Olga Karpova, Ivanovo: Ivanovo State University, 197–219.
Nielsen, S. (2018), ‘LSP lexicography and typology of specialized dictionaries’ in J. Humbley, G. Budin and C. Laurén (eds), 78–90.
Nielsen, S. and R. Almind (2011), ‘From data to dictionary’ in P.A. Fuertes-Olivera and H. Bergenholtz (eds), 141–67.
Samaniego Fernández, E. and B. Pérez Cabello De Alba (2011), ‘Conclusions: Ten key issues in e-lexicography for the future’ in P.A. Fuertes-Olivera and H. Bergenholtz (eds), 305–11.
Spohr, D. (2011), ‘A multi-layer architecture for “pluri-monofunctional” dictionaries’ in P.A. Fuertes-Olivera and H. Bergenholtz (eds), 103–20.
Sterkenburg, P. van (ed.) (2003), A Practical Guide to Lexicography, Amsterdam and Philadelphia: John Benjamins.
Sternberg, R.J. (1988), The Psychologist’s Companion: A Guide to Scientific Writing for Students and Researchers, Cambridge: Cambridge University Press and British Psychological Society.
Tarp, S. (2008), ‘The third leg of two-legged lexicography’, Hermes 40, 117–31.
Tarp, S. (2009), ‘Beyond lexicography: New visions and challenges in the information age’ in H. Bergenholtz, S. Nielsen and S. Tarp (eds), 17–32.
Tarp, S. (2011), ‘Lexicographical and other e-tools for consultation purposes: Towards the individualization of needs satisfaction’ in P.A. Fuertes-Olivera and H. Bergenholtz (eds), 54–70.
Tono, Y. (2009), ‘Pocket electronic dictionaries in Japan: User perspectives’ in H. Bergenholtz, S. Nielsen and S. Tarp (eds), 33–67.
24
The design of internet dictionaries
Annette Klosa-Kückelhaus and Frank Michaelis
1 Introduction
This contribution will provide an overview of the essential role played by design in both the form of dictionaries and their usability, as well as examining the different traditions which exist in the design of (print and electronic) dictionaries (Section 2.1). The development of dictionary design depends on the intended context in which the dictionary will be used, its potential users and its data modelling (Section 2.2). Usage studies (Section 3) can help to look further into user needs concerning dictionary design. Design practice depends not only on a number of elements which are not unique to dictionaries but also on many dictionary-specific factors: for example, whether the dictionary is a retrospective digitalization project or whether the design takes as its starting point the content of the dictionary or its intended users (Section 4). Search functionality is what provides access to an internet dictionary, so its design also has to be planned carefully (Section 5). Finally, the role of established design frameworks in the design process will be considered, including how templates are employed and how the lexicographical process should be informed by the interconnected development of content and design (Section 6). In this way, design is much more than mere aesthetic eye candy, added on top of the core conception of the dictionary. The design of practical everyday objects, including tools like dictionaries, involves a wide range of aims and requirements. As such, functional, economic and aesthetic factors all need to be taken into account, and even psychological aspects, such as the emotions that users associate with the object. In this context, design means developing the best possible solution for a product so that these potentially competing requirements are assembled into an effective whole.
2 General thoughts on the design of (internet) dictionaries
2.1 Similarities and differences between print and online design
Print dictionaries are created according to the principles of graphic design, where typography plays the most important role. There are familiar design elements for print dictionaries which have, in some cases, been handed down over many hundreds of years: the alphabetical order of the headwords, which appear in bold at the beginning of the entry; the layout of the headwords in
columns (usually two per page) or the indication of the range of entries on a page through column headings at the top of that page. Here, it has always been a problem that the print space has to be used as economically as possible, leading to high text-density and making reading more difficult. For this reason, design decisions for print dictionaries mostly seek to achieve a balance between the need to present the text in a readable manner and the need to optimize use of the limited print space. In the online medium, different conditions tend to apply, so that different design decisions can be reached. Website content is described in a hierarchically structured fashion using HTML (Hypertext Markup Language), which a browser converts into the desired form of presentation. In the early years of the worldwide web, HTML left it to individual browsers to determine how particular elements, such as text, were represented, so that the design of websites left much to be desired. Nowadays, the combination of HTML and cascading style sheets (CSS) gives web developers greater control over the design of their websites, as CSS defines their graphic style, colour, and animations, as well as the layout for printing or for small screens. Recent enhancements, such as web-fonts and rules for complex grid layouts, mean that HTML and CSS come close to print in the possible forms of representation that they offer. However, this wide range of possibilities also brings with it the need to use them correctly. As such, the demands placed on web designers’ skills and the resources that need to be invested in the design of dictionaries have also increased. In the digital medium, the new aspect of user interface design, or application design, adds a further element to the design of the text itself, including the wide range of interactions that users have with a dictionary website. 
For example, internet dictionaries contain links (the defining characteristic of hypertext), and they have a number of standardized interactive elements, such as buttons, text fields or menus (all of which are already included in HTML to create simple forms of input). Finally, JavaScript makes it possible to make dynamic changes to the content of websites and to develop particular components (also known as widgets), such as tabs and menus, which are not included in the HTML standard. This facilitates complex interactions between users and internet dictionaries. If these are implemented correctly, then users do not have to learn specially how to look things up or how to navigate in an internet dictionary. Rather, the dictionary ‘functions’ in the same way as other websites and applications. At its best, therefore, the design of digital dictionaries draws on both traditional graphic design and user-interface design from the field of software development. Depending on how interactive the design for an internet dictionary needs to be, a greater or lesser number of application design elements can be incorporated. Here, the dictionary text and its word entries undoubtedly remain at the centre of the overall design, but complex, dictionary-specific components can be designed in the application around that text, such as headword lists, indexes, extended search functions or data visualizations.
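The hierarchical description of content in HTML, with presentation left to a separate style sheet, can be illustrated by a small rendering function. This is a sketch under assumptions rather than a recommended markup scheme: the element choices, the class names (`entry`, `headword`, `definition`, `examples`) and the entry structure are hypothetical. The output carries only structure, so that a CSS style sheet could target the class names to control typography, colour and layout.

```python
from html import escape


def render_entry(entry):
    """Render a dictionary entry as hierarchically structured HTML.

    No visual styling is emitted; class names are hooks for a style sheet.
    """
    parts = [
        '<article class="entry">',
        f'  <h2 class="headword">{escape(entry["headword"])}</h2>',
        f'  <p class="definition">{escape(entry["definition"])}</p>',
    ]
    if entry.get("examples"):
        parts.append('  <ul class="examples">')
        for example in entry["examples"]:
            parts.append(f"    <li>{escape(example)}</li>")
        parts.append("  </ul>")
    parts.append("</article>")
    return "\n".join(parts)
```

Escaping the text content keeps characters such as `<` or `&` in definitions and examples from being interpreted as markup.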
2.2 Design dependencies In all design decisions, it is possible to distinguish three sets of dependencies for web design: first, on the context in which an internet dictionary is used; second, on the form of modelling chosen for the dictionary data and, third, on the dictionary’s users.
The Design of Internet Dictionaries
2.2.1 Context of use In the case of a stand-alone dictionary, design decisions are, of course, more independent than if it is part of a dictionary portal or if it is embedded in another application, such as a text editor or a language-learning platform. In these cases, the design standards from the environment in which the dictionary is embedded must first be implemented. In the most extreme cases, the dictionary in its own right may disappear almost entirely from the user interface and is visible only, for example, in a text editor through the wavy underlining of an incorrectly spelled word and the suggestion provided for how the word should be spelled. A dictionary intended to be used on a mobile device is subject to different constraints than a dictionary for a desktop browser. This includes not only the space available on the screen where the dictionary content is to be displayed, but also a variety of control elements. While the mouse can be used in a desktop browser, navigation on mobile devices works by touching the surface of the screen with a stylus or with your fingers. For example, controls on mobile pages have to be designed to be large enough to allow them to be operated reliably and, of course, some functions, such as ‘mouse-over’ effects, are absent altogether from mobile sites. While mouse clicks are undoubtedly the primary form of interaction for desktop browsers, mobile platforms offer a wider range of navigation gestures, such as swiping or pinch and zoom. Location and light conditions also have a role to play. For example, an internet dictionary which is to be used primarily outside, on a smartphone, in glaring sunlight, has to use contrast differently to one which is used mostly in indoor spaces. For this reason, many websites now have two design variants – a light mode and a dark mode – and allow the user to adjust them accordingly.
2.2.2 Data modelling Naturally, the structure of the dictionary content itself has a decisive effect on design. A fundamental distinction exists between textual data and structured data. Textual data consists of continuous discursive, narrative, or argumentative text in natural language (in contrast to formal language). In addition, this form of data may contain an internal informational structure, distinguished in semantic terms (e.g. headings, quotations, references). By contrast, data structures or records consist of pairs of so-called keys and values: in the context of dictionaries, for example, these records consist of the ‘lemma’ and ‘word class’ keys or of the lemma values hand, run, diligent or you and the word-class values noun, verb, adjective, pronoun and so on. These pairs of keys and values can be assembled into groups or objects, combined into more complex structures in trees and lists, and stored in databases. In XML (Extensible Markup Language), which is the most common metalanguage in lexicography, structured textual data is also referred to as ‘mixed content’. Keys correspond to the names of elements or attributes and values to the specific values of the elements or attributes. In most cases, dictionaries are characterized by hybrid forms of textual and structured data: i.e. data structures containing additional information (e.g. metadata) may be embedded in the text. These embedded data structures might also break down information that is formulated in a dictionary text into a formal representation, or model, which can be understood by a computer. Conversely, data structures may be supplemented by textual data, for example, in the form of detailed commentary fields.
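The distinction between key-value records and mixed content can be made concrete with a minimal Python sketch. The element and attribute names (`entry`, `def`, `xr`, `wordClass`) and the sample definition are assumptions chosen for illustration; they are not taken from any particular dictionary schema.

```python
import xml.etree.ElementTree as ET

# Structured data: a record of key/value pairs (keys and values are illustrative).
record = {"lemma": "run", "word_class": "verb"}

# Hybrid data: an XML entry whose <def> element holds mixed content,
# i.e. running text with an embedded, machine-readable element.
ENTRY_XML = (
    '<entry lemma="run" wordClass="verb">'
    '<def>To move quickly on foot; see also <xr target="sprint">sprint</xr>.</def>'
    "</entry>"
)


def entry_to_record(xml_text: str) -> dict:
    """Map element/attribute names to keys and their content to values."""
    entry = ET.fromstring(xml_text)
    definition = entry.find("def")
    return {
        "lemma": entry.get("lemma"),
        "word_class": entry.get("wordClass"),
        # itertext() flattens the mixed content back into continuous text.
        "definition": "".join(definition.itertext()),
        # The embedded elements stay available as structured values.
        "cross_references": [xr.get("target") for xr in definition.iter("xr")],
    }


parsed = entry_to_record(ENTRY_XML)
print(parsed)
```

In this sketch the same entry can feed both a typographic rendering (the flattened definition) and a structured view (the list of cross-references), which is the hybrid situation described above.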
The Bloomsbury Handbook of Lexicography
As far as textual data is concerned, the emphasis in design rests primarily on typography and legibility. For data structures, the design often reflects the tree structure of the data in list-form or in a hierarchically organized form, reminiscent of a table of contents. However, data structures can also be presented in a form similar to continuous text, for example, when entries from a list are arranged one after another in the same line, separated by commas. In any case, planning the graphic display of the dictionary data itself in the design of internet dictionaries involves combining both principles (continuous text and structured text) in a manner appropriate to the data, the context of use, and the user.
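As a small illustration of these two presentation principles, the following Python sketch renders the same list of values once in list form and once as continuous text. The function names and the sample values are hypothetical.

```python
def render_as_list(label: str, items: list[str]) -> str:
    """Structured presentation: one item per line under a label."""
    lines = [f"{label}:"]
    lines.extend(f"  - {item}" for item in items)
    return "\n".join(lines)


def render_inline(label: str, items: list[str]) -> str:
    """Text-like presentation: items on one line, separated by commas."""
    return f"{label}: {', '.join(items)}"


synonyms = ["sprint", "dash", "race"]  # illustrative values only
print(render_as_list("Synonyms", synonyms))
print(render_inline("Synonyms", synonyms))
```

Which of the two renderings is appropriate would depend on the data, the context of use and the user, as the paragraph above notes; the data itself is identical in both cases.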
2.2.3 The user Focusing on the user in the design process – in other words, user-centric or human-centric design – has its origins in industrial product design and follows the well-known principle that ‘form follows function’. Applied to the use of a dictionary, this means that the elements in the dictionary and its content are organized and designed in such a way that the user is able to successfully look up what they need to, while expending the minimum possible time and cognitive effort. If the user is to be the starting point for design decisions, then naturally a number of questions have to be answered. For example: Who is a (typical) user? Which problems do they typically want to solve? What is the (typical) search behaviour adopted to answer the problem? We can begin to answer these kinds of questions through so-called ‘user stories’. These are case study scenarios involving fictional users (who are conceived in as concrete and realistic a way as possible), which give designers a framework for the development process. User testing and dictionary usage studies (cf. Section 3) can then be employed to establish how effective these scenarios and planning strategies prove themselves to be in reality. By contrast, many internet dictionaries nowadays continue to adopt a content-centric approach to design; that is, they list their information in a more or less condensed fashion, organizing it according to their internal structure (which is primarily motivated by lexicological or lexicographical principles). As such, it is left to the unspecified user to extract from the internet dictionary the information relevant to them in a particular situation. This is particularly the case for general monolingual or multilingual dictionaries which are not integrated into other applications. However, if a dictionary is embedded in an application and a specific context of use, then, as one would expect, the user and their aims should exert a strong influence on the design. 
Unfortunately, embedded dictionaries and those intended for specific purposes have tended to play a lesser role in academic lexicography up to now.
3 Usage studies on design Although there is now a relatively long tradition of research into the use of print and internet dictionaries (see Nesi, Chapter 5), there are not many usage studies which deal specifically with questions of design. Hitherto, research in metalexicography has also not tended to concentrate on design issues for internet dictionaries. Publications by Almind (2005), Debus-Gregor and Heid (2013), Oppentocht and Schutz (2003), Spohr (2008) and Swanepoel (2001) have focused on the connection between the modelling of data and its online presentation. Studies by Corréard
(2002), Hollós (2018), Lew (in press) and Schmitz (2016) looked, above all, at the arrangement of the lexicographical information on the screen. Other researchers, notably Dziemianko (2014, 2015 and 2016), have examined the positioning of particular kinds of information or the use of colour. Finally, Storjohann (2018), Torner and Arias-Badia (2019) and Michaelis, Müller-Spitzer and Wolfer (2019), among others, have concerned themselves with possible new forms of data presentation. Usage studies on internet dictionaries in the form of eye-tracking experiments have been undertaken notably by Lew (2010), Lew et al. (2013), Lew and Tokarek (2010), Nesi and Tan (2011) and Tono (2000 and 2011), while Müller-Spitzer, Michaelis and Koplenig (2014) used this method to test a new design for a dictionary portal. Heid and Zimmermann (2012) propose usability testing as a method to develop the design of internet dictionaries, and Koplenig and Müller-Spitzer (2014) outline the results from a usage study on a variety of possible forms of data presentation.
Figure 24.1a Heat-map of participants in an eye-tracking study scanning the OWID website as a whole (Müller-Spitzer, Koplenig and Michaelis 2014: 724).
Eye-tracking studies, in particular, make it possible to assess in detail whether the arrangement of information on the screen, the typographical design, and the use of colour, etc. are understood in the given situation by the study participants in the way that was planned and whether they are used by them to orient the way they look at the screen. The heat-map in Figure 24.1a illustrates that participants spent time looking at all the parts of the website when they were given the task of familiarizing themselves with the OWID dictionary portal; Figure 24.1b demonstrates that, when asked to find out which dictionaries are combined on the portal, participants’ attention lingered above all on the list of these resources. Thus, the design developed for the dictionary portal is successful in helping the user to orient themselves between the different kinds of information.
Figure 24.1b Heat-map of participants in an eye-tracking study scanning the OWID website for all dictionaries included (Müller-Spitzer, Koplenig and Michaelis 2014: 724).
4 Design practice for internet dictionaries 4.1 Design fundamentals If we view internet dictionaries as a subset of websites more generally, then the design options and rules that have been developed in this field will also apply to them. For designers of internet dictionaries, this has the crucial advantage that they can draw on a breadth of existing design practice and experience. As explained in Section 2.1, web design is influenced by print and graphic design and their traditions that reach back centuries. This should not surprise us: for all that our technology and media might have changed, humans’ cognitive capacities in the way they interact with text and image cannot have changed in any fundamental way in what is, in evolutionary terms, a relatively short period of time. Something which was easy or difficult to read 200 years ago will continue to be so today. It is beyond the scope of this article to provide a comprehensive overview of the wide variety of design traditions and schools. However, we would like to present a selection of basic principles as they apply to internet dictionaries, before addressing more dictionary-specific issues. It is not possible to answer, in general terms, the question of what design should actually accomplish. Some text or page design is intended to put the user in a particular mood and make them associate the content with a particular experience, usually an emotional one. This is the domain of UX design (user experience design), and, although this aim seems to be of greater importance for marketing and product pages, it also plays a role in internet dictionaries. For reference works, for example, an appearance which communicates ‘reliability’ and ‘credibility’ might be appropriate, comparable with news broadcasting. A dictionary that addresses a very specialist group of users – for example, sportspeople or computer enthusiasts – might prefer to adopt a ‘modern’ or ‘fresh’ look. 
However, conveying information quickly and simply should be a common goal of most dictionaries, so that design principles such as visual hierarchy, consistency and legibility play a significant role in most dictionary design decisions. Here, legibility means the extent to which a text can be read easily and without tiring the eyes. Decisive in this context are design techniques such as line length, line spacing, font size, choice of font and the contrast between the colour of the font and the background. Consistency (and repetition) refers to the uniform design of recurring elements, and, thereby, the reduction of cognitive effort on the part of the user, who does not have to learn the position and use of control elements of the interface again and again. The rule that ‘less is more’ also has a place here, since any newly created and different element must be repeatedly learnt, and understood afresh, by the user. The principle of visual hierarchy means that every element on the page possesses a specific level of importance. If all the elements on a page have the same importance, then the user will not know where to look first. The visual hierarchy of the page should establish a structure to deliberately direct the user’s attention towards particular focal points. The use of colour and scale are relevant design techniques in this context, as well as animations, which are particularly effective at securing and holding the user’s attention. Figure 24.2 demonstrates how design techniques such as white space and proximity, colour, contrast, scale, alignment, shapes and typography can be used in a dictionary text in different ways, and in combination with one another, in order to support the principles outlined above.
Figure 24.2 Entry ‘administrator’ in Dictionary of South African English.
On websites, traditional design elements are supplemented by elements which originate in the field of native interface and input mask design and which facilitate the user’s interaction with the computer. In user interface design, components (also known as widgets) are the basic building blocks which are used to assemble more complex structures, such as the individual views of an application or the application as a whole. Components themselves are, in turn, made up of smaller components, or design primitives (lines, shapes, text) (see Figure 24.3).
Figure 24.3 Examples for design primitives in Dictionary of South African English.
In addition, it is possible to distinguish these components according to their function. Hence, there are components:
● For grouping and organizing content, e.g. cards, lists, text sections, accordions
● For navigating within content, e.g. tabs, navigation drawer, navigation bars (top, side, bottom)
● For performing tasks or giving commands, e.g. buttons, menus
● For user input or selections, e.g. text input fields, select boxes, check boxes
● For messages or responses from the application, e.g. popups, progress bars, dialogue boxes, status bars.
A particular challenge for user interface design is that these components also have to be (repeatedly) recognized as such by the user. Hence, these components tend to exist in a similar form in all operating systems (Windows, Linux, Android, iOS). However, they intentionally diverge from one another in their specific design, in order to create an individual look and feel unique to the particular product. Websites, including internet dictionaries, make use of the same techniques and are able to design their own look and feel. If the design of the user interface diverges too far from the conventions of the operating system which is most familiar to the user, then there is a real danger that users will no longer recognize the components as interface components and will not know how to operate them. Moreover, the implementation of the user interface design and interactive components is more demanding than that of static content. Components often possess several states, which have to be distinguished visually from one another. A button, for example, can be ‘normal’, ‘pressed’, ‘focused’, ‘active’ or ‘disabled’. The implementation of design techniques must be
well organized, in order to ensure that these states can be distinguished from one another and that there is consistency in their presentation. Users also require direct visual feedback to show whether their action has been successful or unsuccessful. For example, a button that does not change its state when the user clicks on it leaves the user unaware whether or not the computer has recognized the click and whether it will perform the required action. In this, modern user interface design (as of the year 2020) seeks to be as unobtrusive as possible. Instead of using text to provide lengthy status messages, an action button will change colour; for example, if the action has been successful, it will change to green and its label to a tick; if not, then it will turn red and the label will become a cross. Implementing these kinds of animated micro-interactions assumes at least basic knowledge about animations on the part of the designer. Another complex area is accessibility, that is, design which ensures access without any barriers. The technical possibilities for accessible design have improved over the years on the part of the browser, but (as of 2020) designers often still lack knowledge and experience in implementing these recommendations and guidelines. Standardization organizations, such as W3C, provide assistance in this area and are driving development forward, for example, with the Web Content Accessibility Guidelines (WCAG) 2.0. Nowadays, development tools in browsers indicate to designers whether, for instance, the contrast they have chosen between the foreground and background meets these guidelines. HTML itself includes additional mark-ups which make it easier for text-to-speech programs to read an HTML page. However, planning for all these technologies brings with it a discernible increase in design effort, and it is essential that these be taken into account in the design of internet dictionaries.
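The contrast check mentioned above can be reproduced outside the browser. The following Python sketch follows the relative-luminance and contrast-ratio definitions of WCAG 2.0 and uses the 4.5:1 minimum that the guidelines set at level AA for normal-sized text. The function names are our own, and the sketch assumes six-digit hexadecimal colour values.

```python
def _linear(channel: int) -> float:
    """Linearize one 8-bit sRGB channel (WCAG 2.0 relative-luminance formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of a colour given as '#RRGGBB'."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colours, from 1 (none) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa_normal_text(foreground: str, background: str) -> bool:
    """True if the pair reaches the 4.5:1 ratio required at level AA for normal text."""
    return contrast_ratio(foreground, background) >= 4.5


print(meets_aa_normal_text("#000000", "#FFFFFF"))  # black on white
print(meets_aa_normal_text("#CCCCCC", "#FFFFFF"))  # light grey on white
```

A check of this kind could be run over the colour pairs of both a light and a dark design variant before either is released.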
4.2 Specific aspects of internet dictionary design 4.2.1 Retrospective digital dictionaries There are considerable overlaps with textual scholarship in the presentation of retrospective digital dictionaries, that is, print dictionaries, usually older ones, which are subsequently digitalized. One principal characteristic of these projects is often to achieve as exact a reproduction as possible of the original text. Hence, the pagination of the print version is frequently retained, to ensure that the online version can still be cited. Editorial interventions have to be marked and created in such a way that they are recognizable and so on. In terms of design, a particularly interesting issue is how a relationship is constructed between the ‘modern’ dictionary application and the ‘old’ dictionary pages and content which are contained within it. This discontinuity is particularly striking in cases of image digitalization, where the user is presented with scanned images of the original dictionary. But that discontinuity can also be intentional, an indicator to remind users that they are reading a historical source, rather than a contemporary reference work. By contrast, the opportunity presents itself in digital transcriptions of older print dictionaries to re-evaluate the original print design, for example, improving its clarity by introducing a clearer visual hierarchy or replacing an old-fashioned typeface, such as Fraktur, with a modern font in order to ensure legibility for twenty-first-century readers. If users are presented with a historical dictionary in the form of a contemporary design, then there is, of course, an increased risk that
the user will confuse it with a contemporary dictionary. Unfortunately, there are limited design options available to counteract this misunderstanding.
4.2.2 Content-centric presentation On a very abstract level (and from a design perspective), many dictionary entries can be described as a structure in which the lexicographical information about a headword is organized into thematically related groups; then, alongside that information, these groups may contain further subordinate groups (e.g. primary meaning and secondary meaning). In content-centric design, the dictionary interface then reflects, in a more or less 1:1 manner, this tree-like structure, nested in as many levels as necessary. This hierarchical structure is intended to enable the user to grasp quickly the structural organization of the entry, so that they can direct their attention to the relevant block. Of course, it is a prerequisite for this that the user has a prior expectation about what type of information they can find in which group and how this information can help them in answering their problem. Whether these expectations of users on the part of lexicographers are realistic is the object of enquiry in user research and dictionary design (cf. Section 3). Figure 24.3 shows the design techniques employed to translate this hierarchical lexicographical structure into a visual hierarchy.
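The 1:1 mapping from a nested entry structure to a visual hierarchy can be sketched in a few lines of Python. The `Group` class, the labels and the placeholder contents are hypothetical; indentation stands in here for whatever visual means (scale, colour, white space) a real interface would use.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """A thematically related group of information within an entry (hypothetical model)."""
    label: str
    content: str = ""
    children: list["Group"] = field(default_factory=list)


def render(group: Group, depth: int = 0) -> list[str]:
    """Mirror the nesting of the data 1:1 as indentation, one line per group."""
    text = f"{group.label}: {group.content}" if group.content else group.label
    lines = ["  " * depth + text]
    for child in group.children:
        lines.extend(render(child, depth + 1))
    return lines


entry = Group(
    "administrator",
    children=[
        Group("Sense 1", "primary meaning", [Group("Sense 1a", "secondary meaning")]),
        Group("Etymology", "origin of the word"),
    ],
)
print("\n".join(render(entry)))
```

Because the rendering simply walks the tree, every additional level in the data becomes an additional level on the screen, which is what distinguishes this approach from the task-oriented design discussed next.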
4.2.3 User-/human-centric design In user-centric design, it is no longer the lexicographical structure which stands at the centre, but rather design is oriented towards the actual task which the user is undertaking or the problem which they are seeking to solve. The dictionary Paronyme – Dynamisch im Kontrast, for example, is a dictionary which is intended to help the user with uncertainty about the meaning and usage of German paronyms. In many of the views in this dictionary, the design attempts to assist in the task of ‘comparing and contrasting’. Users see all of the partial meanings at a glance in a sortable overview and are able to choose up to three of them, receiving the corresponding detailed views, presented alongside one another in an overlay. This is intended to ensure that the user can access for themselves similarities and differences in the words, down to the level of individual examples of usage. If the users’ tasks and questions are placed at the centre of the design, then the question arises as to why those tasks and questions should not be resolved at the point where they arise. The integration of dictionaries in text editing programs, for text production, for example, or in digital editions of texts, to ensure their comprehension, would be a logical next step. Here, dictionaries no longer appear as independent entities, but rather, as far as possible, fit seamlessly into the user’s working environment, in order to support them in their actual work, composing or interpreting texts. This is already standard today for very simple lexicographical questions, such as spelling or hyphenation. In these kinds of applications, the challenge for design lies more in the area of functional integration than in visual design.
4.2.4 Other features of online dictionaries In addition to entries for individual words, internet dictionaries can provide a range of further texts, illustrations or applications which, above all, make it easier to access the information relating to the words contained in the dictionary. For one thing, there are overviews of word
entries which satisfy particular criteria: for example, in a dictionary of neologisms, a list of words which emerged in a particular time period; in a dictionary of loan words, lists of words borrowed from a particular language; or in a general dictionary, a list of all the words derived from proper nouns and so on. The word entries included in the lists are created as hyperlinks, so that these kinds of list not only have an informational value referring to the content of the dictionary, but also provide possible points of access to that content. Visualizations, such as word clouds, can also be used as navigation tools, inviting users to explore the content of the dictionary. This is all the more the case if these are interactive visualizations. For example, if a corresponding data model allows it, chains of loanwords from one language into a series of other languages can be represented as an interactive graph in which the user can navigate. Nevertheless, such complex representations are more appropriate for illustrative purposes and to encourage exploration of dictionary content; they are not suitable for quickly looking something up. Finally, it is possible to integrate static illustrations, videos or audio data, alongside text and visualizations. Dictionary design has to plan for these kinds of elements: for example, decisions need to be taken as to whether photographs, film or audio clips should only be opened or started by clicking on them, whether they should be integrated into the dictionary interface or open in a new window or whether they should simply be signposted with hyperlinks, as appropriate. In conceptual terms, it is important, in each case, to ensure a close interconnection between the word entry and these kinds of feature.
5 The design of search functions Users of internet dictionaries are familiar with three different search options, which they recognize from other websites: a simple search for a search term; a search for characteristics or attributes; and a full-text search (see also Pastor and Alcina, Chapter 8). Each of these possibilities not only offers advantages and disadvantages for the user but also poses challenges for the design of an internet dictionary. The simplest way to search an internet dictionary is to enter a search term into a search field (the position of which on the page should satisfy the familiar requirements for websites more generally). However, in both online and print dictionaries, there is the problem of how to search for a word whose spelling you do not know. In internet dictionaries, this problem is addressed through fuzzy search options which tolerate errors. In cases where the user searches not for the lemma but for an inflected form or a historical variant, (automatic) re-direction in internet dictionaries replaces the customary cross-references from print dictionaries. For this, the orthographic or morphological variants of a lemma are stored in a search index, where additional information can be provided, for example, information on their relationship with the lemma. Even while it is being typed into the search field, the search term can be completed automatically and incrementally, which is known as ‘type-ahead search’. Suggestions for words can also be shown (‘Did you mean … ’), from which the user can select the relevant entry. If the entry for a single word is found, then this is usually shown directly on the screen. If a search generates multiple search results, then the situation is different, and these are displayed on a separate page of search results.
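The three mechanisms just described (re-direction via a search index of variants, type-ahead completion and error-tolerant suggestions) can be sketched together in Python. The headword list, the variant index and the function names are illustrative assumptions; the fuzzy matching relies on `difflib.get_close_matches` from the standard library, which is only one of several possible techniques.

```python
import difflib

# Illustrative headword list and search index (variant -> lemma, relationship).
LEMMAS = ["hand", "run", "diligent", "you", "runner"]
VARIANT_INDEX = {
    "ran": ("run", "past tense"),
    "running": ("run", "present participle"),
    "hands": ("hand", "plural"),
}


def type_ahead(prefix: str, limit: int = 5) -> list[str]:
    """Incremental completion: headwords that start with what has been typed so far."""
    p = prefix.lower()
    return sorted(lemma for lemma in LEMMAS if lemma.startswith(p))[:limit]


def look_up(term: str) -> dict:
    """Resolve a search term to an entry, a re-direction or a list of suggestions."""
    t = term.lower()
    if t in LEMMAS:
        return {"status": "found", "lemma": t}
    if t in VARIANT_INDEX:
        lemma, relation = VARIANT_INDEX[t]
        return {"status": "redirected", "lemma": lemma, "relation": relation}
    # Fuzzy search: tolerate spelling errors and offer 'Did you mean ...' candidates.
    suggestions = difflib.get_close_matches(t, LEMMAS, n=3, cutoff=0.6)
    return {"status": "did_you_mean", "suggestions": suggestions}


print(type_ahead("ru"))
print(look_up("ran"))
print(look_up("diligant"))
```

Storing the relationship alongside each variant, as in `VARIANT_INDEX`, would allow the interface to explain a re-direction to the user instead of performing it silently.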
The value of a search for properties is, above all, to limit the massive number of hits for the user, something which is particularly common on websites for online retailers. Hence, shoppers in an online shop can search only for blue sweaters, made of cotton, with long sleeves and a V-neck, in a price range between $30 and $50. This is not easy to translate to dictionaries, since it does not usually help, when searching for a particular word, to limit that search according to word class, number of syllables, inflectability and so on. However, these kinds of searches by property do exist in internet dictionaries. They allow using the dictionary like a database: searching, for instance, in the context of lexicological research, for examples of verbs borrowed in the eighteenth century from French into Italian; searching for examples of word entries in which a quotation from Jane Austen provides the first attested usage in English; or searching for German neologisms from the 1990s which do not originate in English. In design, these kinds of search frequently draw on menus and dropdown lists, among other techniques. The results of the search are displayed on separate pages on which the results can often be further sorted or filtered, before the user finally, out of the whole mass of results, follows the hyperlinks to individual word entries or exports or prints the search results as a whole. In a full-text search, a search term is generally searched for in the visible dictionary text, that is, in all word entries and, where applicable, also in the surrounding text, irrespective of whether the dictionary consists of textual data or structured data (cf. Section 2.2.2). To limit the number of hits, many internet dictionaries offer this search at specific levels of the text, for example, only in the definitions, the citations or the examples of usage. In terms of design, search results are displayed according to well-known models from other applications (e.g. 
Google), whereby a small snippet of the text is shown with the highlighted result. In cases with very high numbers of hits, the search results are distributed across several pages, so-called ‘paging’. A hyperlink leads from each snippet to the original dictionary text.
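A full-text search restricted to particular levels of the text, with highlighted snippets and paging, might be sketched as follows. The miniature entry list, the field names and the square-bracket highlighting are hypothetical choices for the purpose of illustration; a production dictionary would normally delegate this to a search engine.

```python
import re

# Hypothetical miniature dictionary; each entry has separately searchable text levels.
ENTRIES = [
    {"lemma": "hand", "definition": "The end part of the arm.",
     "example": "She raised her hand."},
    {"lemma": "run", "definition": "To move quickly on foot.",
     "example": "They run every morning."},
    {"lemma": "diligent", "definition": "Showing care in one's work.",
     "example": "A diligent hand copied the text."},
]


def full_text_search(term: str, levels=("definition", "example"), context: int = 15) -> list[dict]:
    """Search the chosen text levels and return one highlighted snippet per hit."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for entry in ENTRIES:
        for level in levels:
            text = entry[level]
            for match in pattern.finditer(text):
                start = max(0, match.start() - context)
                end = min(len(text), match.end() + context)
                snippet = (
                    text[start:match.start()]
                    + "[" + match.group(0) + "]"  # brackets stand in for visual highlighting
                    + text[match.end():end]
                )
                # The lemma would become the hyperlink target of the snippet.
                hits.append({"lemma": entry["lemma"], "level": level, "snippet": snippet})
    return hits


def page(hits: list[dict], number: int, size: int = 2) -> list[dict]:
    """Paging: return the hits shown on results page `number` (counted from 1)."""
    start = (number - 1) * size
    return hits[start:start + size]


for hit in page(full_text_search("hand"), number=1):
    print(hit["lemma"], "|", hit["snippet"])
```

Passing `levels=("definition",)` limits the search to definitions only, which corresponds to the level-specific search that many internet dictionaries offer to reduce the number of hits.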
6 The design process At the end of this presentation of the design of internet dictionaries, it is worth including some reflections on the process of design development. Where possible and appropriate, these should draw on well-known design frameworks and should, at least, give consideration to the use of templates. Finally, in the planning of a dictionary project, the design process should be integrated at an early stage into the lexicographical process, in order to facilitate the development of a form of presentation which is attractive, intuitive to use, and appropriate to the subject area of the dictionary and its intended function.
6.1 Established design frameworks There is something to be said for engaging with the design guidelines and frameworks developed by the major producers of operating systems (Google/Android, Microsoft and Apple). As has already been mentioned in Section 4.1, it is the ‘native look and feel’ of the surrounding operating system to which users are most accustomed. They already have certain expectations about how
the elements on their screen should behave, and applications that do not hold to those conventions irritate, or even annoy, users. On top of that come the not inconsiderable effort and complexity involved in the development of a new design system. Adopting existing designs allows developers to focus on the components specific to the application. In addition to technical documentation and tutorials on web development, firms such as Google and Microsoft provide detailed documentation and, above all, explanations of their design guidelines, for example, Google’s Material Design. The design systems, or guidelines, describe what has evolved over the years into ‘good practice’. They contain collections of standard components, colour schemes and also standard navigation and interaction models (for their platform). Pairs of ‘dos’ and ‘don’ts’ illustrations help designers to avoid making simple errors which can irritate users. However, this consolidation of design conventions through market success does not always lead to the best possible design. A prominent example of this is our standard keyboard layout, which still follows that of typewriters and which is far from optimal in ergonomic terms. For this reason, user research (cf. Section 3) and creative experiments are important in order to question and challenge existing conventions. Alongside the more gradual general development of design, there are also design fashions and trends. Perhaps best known is Web 2.0 with its glossy image buttons (early 2000s). Nowadays (as of 2020), so-called flat design tends to dominate. However, these are more stylistic elements than design elements in the strictest sense. Nonetheless, as is the case in fashion, what was once the latest style quickly appears old-fashioned in the present, if not outright ridiculous. Since internet dictionaries are mostly long-term undertakings, elements which are characteristic of a particular fashion should be used with caution.
As they stand out, they can quickly make what is actually a well-designed and well-functioning site appear old-fashioned.
6.2 Templates There are numerous resources on the internet which offer website templates, frequently as open-source material, free for anyone to use. These can be implementations by the manufacturer or by third parties of existing design frameworks, such as Google’s Material Design, or implementations of their own designs. Many prominent websites make their own framework available, such as Twitter and its Bootstrap system. If a template is used in a great number of other projects, as Bootstrap has been, then the design acquires a certain prominence and familiarity. This degree of familiarity is an advantage in terms of usability. However, it becomes more difficult to distinguish one project from another visually. A further definite advantage of using existing templates is the ability to draw on the work of professional designers and developers. However, because designers are oriented towards what is common in the market, these templates tend to be conceived more for blogs, portfolios and commercial or marketing sites. The particular requirements of internet dictionaries play no role at all. Depending on the framework, extending and adapting an existing template to one’s special needs can sometimes be expensive and can, in some circumstances, require just as much prior knowledge as implementing one’s own design.
The Design of Internet Dictionaries
6.3 Processes Under the waterfall model, producing an internet dictionary would involve the following lexicographical processes in sequence: first the planning and conception of the dictionary, then the preparation and provision of the dictionary sources for the creation of the word entries. Next, the web application would be implemented, followed by the proofreading and testing of the interface. Finally, the internet dictionary would be released or would go on sale. However, this linear process can be problematic: problems that were not identified during the planning phase, or were dealt with inadequately, can be resolved later only at considerable cost in time and money. Moreover, feedback from users that is gathered only after release or delivery cannot be taken into consideration during the development of the dictionary. For the design of internet dictionaries in particular, linear planning and realization mean that certain dependencies between content and presentation are identified only when it is too late, and can then be reworked only at great cost, if at all. Suppose, for example, that an internet dictionary project plans to offer brief lexicographical commentaries in a small pop-up window. These commentaries cannot be generated as abbreviated versions of longer sections of text from the main entry or from abbreviations included in the entry. Instead, the data model has to provide for this information type from the very beginning. For these reasons, an iterative design process should be chosen for internet dictionaries, in which developmental phases focusing on specific areas can be run repeatedly. In this kind of process, prototypes can be developed at an early stage, or only specific elements of the subsequent application tested, so that feedback from users can also be taken into account in the planning stage. In this way, the conception of content and design should be interconnected from the outset, so that, at its best, the team working on an internet dictionary project brings together not only lexicographical expertise, but also expertise in IT and web design.
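The pop-up example can be made concrete with a minimal sketch of such a data model. Everything here is an assumption for illustration: the class name Entry, the field names and the function popup_text are hypothetical and do not describe any existing dictionary system. The point is only that the short commentary is a field of its own, written by the lexicographers, rather than something derived from the long text at display time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical entry record; all field names are illustrative only."""
    headword: str
    commentary: str                         # full lexicographical commentary
    short_commentary: Optional[str] = None  # written separately for the pop-up


def popup_text(entry: Entry) -> str:
    """Return the pop-up text, using only the dedicated short field."""
    if entry.short_commentary is None:
        # Deliberately no fallback: cutting down the long commentary would
        # not produce a usable summary, so the missing data is reported.
        raise ValueError(f"no short commentary for {entry.headword!r}")
    return entry.short_commentary
```

If the short field is added only after the entries have been written, every existing entry has to be revisited, which is the late and costly rework described above.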
7 Conclusion Whether in the print or electronic medium, dictionaries comprise not only content, but also the form in which this content is presented to users. In particular, for internet dictionaries, it is worth planning this presentation carefully, adopting in the process the best of both worlds, print lexicography and web design, in order to facilitate a successful user experience. To this end, specific technical and design expertise is required, in order to take a wide variety of decisions in the design process in consultation with the lexicographers responsible for the content. That this is being successfully accomplished more and more frequently demonstrates how far the design of internet dictionaries has developed over the last twenty-five years or so. Further development in this field is to be eagerly awaited.
References
Almind, R. (2005), ‘Designing internet dictionaries’, Hermes. Journal of Linguistics 18 (34), 37–54.
Bootstrap Version 4.5 (2020), ‘Introduction’ https://getbootstrap.com/docs/4.5/getting-started/introduction/ [accessed 27 November 2020].
Corréard, M-H. (2002), ‘Are space-saving strategies relevant in electronic dictionaries?’ in A. Braasch and C. Povlsen (eds), Proceedings of the 10th EURALEX International Congress, Copenhagen, Denmark, 13–17 August 2002, Kopenhagen: Center for Sprogteknologi, 463–70.
Debus-Gregor, E. and U. Heid (2013), ‘Design criteria and “added value” of electronic dictionaries for human users’ in R. Gouws, U. Heid, W. Schweickard and H. E. Wiegand (eds), Wörterbücher. Dictionaries. Dictionnaires: Ein internationales Handbuch zur Lexikographie. An International Encyclopedia of Lexicography. Encyclopédie internationale de lexicographie, Berlin and New York: de Gruyter, 1001–13.
Dictionary of South African English, s.v. ‘administrator, n.’ https://dsae.co.za/entry/administrator/e00073 [accessed 27 November 2020].
Dziemianko, A. (2014), ‘On the presentation and placement of collocations in monolingual English learners’ dictionaries: Insights into encoding and retention’, International Journal of Lexicography 27 (3), 259–79.
Dziemianko, A. (2015), ‘Colours in online dictionaries: A case of functional labels’, International Journal of Lexicography 28 (1), 27–61.
Dziemianko, A. (2016), ‘An insight into the visual presentation of signposts in English learners’ dictionaries online’, International Journal of Lexicography 29 (4), 490–524.
Heid, U. and J.T. Zimmermann (2012), ‘Usability testing as a tool for e-dictionary design: Collocations as a case in point’ in R.V. Fjeld and J.M. Torjusen (eds), Proceedings of the 15th EURALEX International Congress 2012, Oslo, Norway, 7–11 August 2012, Oslo: Universitetet i Oslo, Institutt for lingvistiske og nordiske studier, 661–71.
Hollós, Z. (2018), ‘Datendistribution relativ zum Webdesign. Der erste Prototyp des E-KOLLEX’ in V. Jesenšek and M. Enčeva (eds), Wörterbuchstrukturen zwischen Theorie und Praxis, Berlin and Boston: de Gruyter, 151–71.
Koplenig, A. and C. Müller-Spitzer (2014), ‘Questions of design’ in C. Müller-Spitzer (ed.), Using Online Dictionaries, Berlin and New York: de Gruyter, 189–204.
Lew, R. (2010), ‘Users take shortcuts: Navigating dictionary entries’ in A. Dykstra and T. Schoonheim (eds), Proceedings of the XIV Euralex International Congress, Ljouwert: Afuk, 1121–32.
Lew, R. (in press), ‘Space restrictions in paper and electronic dictionaries and their implications for the design of production dictionaries’ in P. Bański and B. Wójtowicz (eds), Issues in Modern Lexicography, München: Lincom Europa.
Lew, R. and P. Tokarek (2010), ‘Entry menus in bilingual electronic dictionaries’ in S. Granger and M. Paquot (eds), eLexicography in the 21st Century: New Challenges, New Applications, Louvain-la-Neuve: Cahiers du Central, 145–6.
Lew, R., M. Grzelak and M. Leszkowicsz (2013), ‘How dictionary users choose senses in bilingual dictionary entries: An eye-tracking study’, Lexikos. Journal of the African Association for Lexicography 23, 228–54.
Material Design. ‘Introduction’, Ed. Google. https://material.io/design/introduction [accessed 27 November 2020].
Michaelis, F., C. Müller-Spitzer and S. Wolfer (2019), ‘The Sintra variations – thinking outside the box in designing online dictionaries’ in I. Kosem and T. Zingano Kuhn (eds), Electronic lexicography in the 21st century (eLex 2019): Smart Lexicography. Book of Abstracts. Sintra, Portugal, 1–3 October 2019, Brno: Lexical Computing CZ s.r.o., 43–4.
Müller-Spitzer, C., F. Michaelis and A. Koplenig (2014), ‘Evaluation of a new web design for the dictionary portal OWID. An attempt at using eye-tracking technology’ in C. Müller-Spitzer (ed.), Using Online Dictionaries, Berlin and New York: de Gruyter, 207–28.
Nesi, H. and K. Hua Tan (2011), ‘The effect of menus and signposting on the speed and accuracy of sense selection’, International Journal of Lexicography 24 (1), 79–96.
Oppentocht, L. and R. Schutz (2003), ‘Developments in electronic dictionary design’ in P. Van Sterkenburg (ed.), A Practical Guide to Lexicography, Amsterdam and Philadelphia: John Benjamins Publishing Company, 215–27.
OWID – Online-Wortschatz-informationssystem Deutsch (2008ff.), Ed. Leibniz-Institut für Deutsche Sprache Mannheim. https://www.owid.de [accessed 27 November 2020].
Paronyme – Dynamisch im Kontrast (2018ff.), Ed. Leibniz-Institut für Deutsche Sprache Mannheim. https://www.owid.de/parowb [accessed 27 November 2020].
Schmitz, U. (2016), ‘Wörterbücher als Sehflächen’ in S. Schierholz, R., Z. Hollós and W. Wolski (eds), Wörterbuchforschung und Lexikographie, Berlin and Boston: de Gruyter, 207–25.
Spohr, D. (2008), ‘Requirements for the design of electronic dictionaries and a proposal for their formalisation’ in E. Bernal and J. DeCesaris (eds), Proceedings of the 13th EURALEX International Congress, Barcelona, Spain, 15–19 July 2008, Barcelona: Universitat Pompeu Fabra, Institut Universitari de Lingüística Aplicada, 617–29.
Storjohann, P. (2018), ‘Commonly confused words in contrastive and dynamic dictionary entries’ in J. Čibej, V. Gorjanc, I. Kosem and S. Krek (eds), Proceedings of the 18th EURALEX International Congress: Lexicography in Global Contexts. Ljubljana, Slovenia 17–21 July 2018, Ljubljana: Ljubljana University Press, Faculty of Arts, 187–97.
Swanepoel, P. (2001), ‘Dictionary quality and dictionary design: A methodology for improving the functional quality of dictionaries’, Lexikos. Journal of the African Association for Lexicography 11, 160–90.
Tono, Y. (2000), ‘On the effects of different types of electronic dictionary interfaces on L2 learners’ reference behaviour in productive/receptive tasks’ in U. Heid, S. Evert, E. Lehmann and C. Rohrer (eds), Proceedings of the Ninth EURALEX International Congress. Stuttgart, Germany, August 8th–12th, Stuttgart: Universität Stuttgart, Institut für Maschinelle Sprachverarbeitung, 855–61.
Tono, Y. (2011), ‘Application of eye-tracking in EFL learners’ dictionary look-up process research’, International Journal of Lexicography 24 (1), 124–53.
Torner, S. and B. Arias-Badia (2019), ‘Visual networks as a means of representing collocational information in electronic dictionaries’, International Journal of Lexicography 32 (3), 270–95.
Web Content Accessibility Guidelines (WCAG) 2.0, ed. B. Caldwell, M. Cooper, L. Guarino Reid and G. Vanderheiden, Last modified 11 December 2008. https://www.w3.org/TR/WCAG20/
25 Resources
Reinhard Hartmann
1 Introduction Is lexicography a practical field (linked to publishing) or a scientific discipline (part of ‘dictionary research’ and ‘reference science’), with complex interconnections to linguistic studies, information technology and other subjects? The answers to such questions will determine its academic status, but one important condition for its future success is the availability of reliable resources. For the purposes of this chapter, resources are defined as ‘compendia of valuable assets for work and study’ – not so much for the lexicographic compilation process, but for research on such topics as dictionary history, dictionary typology, dictionary criticism, dictionary structure, dictionary use and dictionary IT. Having played a part in developing dictionary research over the last three decades and contributing to surveys of the dictionary scene (Hartmann 2010a), I can document some of the major facilities that benefit scholars specializing in lexicography.1 There are several problems, however, in this area. Information often tends to be scattered and out of date, so it is difficult to select genuine ‘knowledge’ from the few available sources, complicated by tensions between such extremes as private versus public initiatives, individual versus collective ventures, amateur versus professional activities, subject-based versus interdisciplinary specializations, commercial versus academic bodies, language-based versus encyclopaedic reference tools, paper/print versus electronic media, local versus regional efforts and national versus international organizations. 
The information presented here is limited to eight types of resource and has been grouped into the following sections: Academies – for establishing details about the national status of languages, Associations – for finding and sharing forums with likeminded specialists, Corpora/Databases – for testing the empirical evidence on linguistic usage, Journals – for consulting professional organs to update knowledge, Networks – for checking what links are available between groups of experts, Online Dictionaries – for exemplifying trends in the digital media, Publishers – for finding and distributing information, and University Research Centres – for advancing academic progress through investigation and training. (A further set of resources – bibliographical references – can be consulted in the bibliography.)
My choice of such resources is not arbitrary (as may be suggested by their presentation in alphabetic order), but an effort has been made to specify some of the thematic and chronological links between them and to specify reasons why they are important for dictionary research. In each case, a table is presented with a list of around ten representative examples. The coverage of the information is unavoidably selective (not only because of the restriction to 5,000 words), and there are many overlaps between them, but I trust that the data offered in each section will prove useful to readers.
2 Academies How do we obtain and document knowledge about national languages? The answer to this question still depends on which kinds of institutions are available, one of which can be a national academy of science. Many such state-funded research institutions are engaged in dictionary and encyclopaedia projects and often may include specialized libraries, archives and institutes for the study of language history, dialectology, lexicology, terminology and onomastics (Hartmann 2012). This aspect of research into dictionary history can be quite revealing, as it overlaps with efforts to develop ‘standards’ for these national languages, although some aspects of the academy approach have been criticized as ‘regulatory’ and ‘puristic’ (Thomas 1991). At the same time, many innovations are introduced, e.g. by applying linguistic and computational frameworks such as corpora to the compilation and publication of reference tools. Many academies of this type are exemplified in Wikipedia, and some are described in detail with reference to their websites. A select list of eight of them from Europe and two from other continents is presented in Table 25.1 in order to illustrate their widely differing structures and some of their achievements in terms of the language institutes they contain, the encyclopaedia and dictionary projects they engage in and/or the journals they promote, most with relevance to progress in lexicography (two of them, in Hungary and the Netherlands, have even hosted international congresses of EURALEX). All are cited together with their websites, including those for special language institutes, e.g. the Accademia della Crusca in Florence, the Akademie der Wissenschaften in Göttingen and the Académie Française in Paris. Some countries have several competing academies (such as India, where there are three national academies of science). There can be various networks between academies, e.g. 
through national bodies such as the American National Academies at Washington (www.nationalacademies.org/), the Union of eight German Academies at Mainz (www.akademienunion.de/), the Institut de France with five national academies (www.institut-de-france.fr/) and the Council of Finnish Academies (www.academies.fi/). There are also overarching international bodies such as the Association of Spanish-language based Academies (ASALE http://asale.org/), the All-European Academies (ALLEA www.allea.org/), the InterAcademy Panel (IAP www.interacademies.net/Academies//ByCountry.aspx) and the International Union of Academies (UAI www.uai-iua.org/), some of whose websites give more details on their ongoing reference schemes. Sometimes there are other links, e.g. via associations like EFNIL, ICC and Mercator for the study of ‘lesser-used’ or endangered minority languages such as Welsh and Scottish Gaelic in the UK, Frisian in the Netherlands or Basque in Spain, and occasionally tensions can develop between such academy-like bodies and various research centres at universities in the respective countries, e.g. when they offer taught courses (→ Sections 6 and 9). There are no directories as such devoted to academies, but most of them are cited in general guides such as The Europa World of Learning, and some of them are listed in web portals like the Canadian ‘Scholarly Societies’ Project (www.scholarly-societies.org/) or the German-based University Directory (www.university-directory.eu/Academies-directory.html). Some of the international associations (such as ALLEA mentioned above) offer lists, portals and portraits of their member academies.
Table 25.1 Academies.
Acronym for the academy (location) | Full name of the academy (founding date) | Achievements e.g. language institutes, dictionary projects, journals | Website
AdC (Florence) | Accademia della Crusca (1583) | Italian Lexicography Centre, Studi di Lessicografia Italiana | www.accademiadellacrusca.it/
AdW (Göttingen) | Akademie der Wissenschaften (1751) | German Dictionary by the Grimm Brothers, Goethe Dictionary | http://adw-goe.de/ https://adw-goe.de/forschung/forschungsprojekteakademienprogramm/goethewoerterbuch/
AF (Paris) | Académie Française (1635) | Dictionary of French | http://www.academie-francaise.fr/le-dictionnaire/commissiondu-dictionnaire
AAH (Canberra) | Australian Academy of the Humanities (1969) | Language Studies, Language Atlases, Humanities Australia | www.humanities.org.au/
CASS (Beijing) | Chinese Academy of Social Sciences (1977) | Institute of Linguistics, Chinese Dictionary, Zhongguo Yuwen | www.english.cssn.cn/?COLLCC=1796941704&
FA (Leeuwarden) | Fryske Akademy (1938) | Frisian Dictionary, Place Names, Databases, Trefwoord | www.fryske-akademy.nl/
MTA (Budapest) | Magyar Tudományos Akadémia (1825) | Linguistic Research Institute, Hungarian Dictionary, Hungarian Corpus | https://mta.hu/english/ www.nytud.hu/depts/index.html
PAN (Warsaw) | Polska Akademia Nauk (1952) | Institute of the Polish Language, Polish Dictionary, Język Polski | https://pan.pl/ https://ijp.pan.pl/en/
RAE (Madrid) | Real Academia Española (1713) | Spanish Dictionary, Spanish Corpus, Escuela de Lexicografía Hispánica | www.rae.es/
RAN (Moscow) | Rossijskaja Akademija Nauk (1724) | Russian Language Institute, Russian Dictionaries, Russian National Corpus, Voprosy Jazykoznanija | www.ras.ru/ www.ruslang.ru/
3 Associations How do we make progress by getting together in groups? It was the development of learned or scholarly societies that eventually led to the creation of the academies discussed in Section 2. National and international associations of a slightly different kind form important links between individuals working in the professional context of lexicography (Hartmann 2013). They often not only provide opportunities for researchers to publish the results of their own investigations (in all branches of dictionary research), but also help them to appreciate alternative approaches from neighbouring fields, which can encourage a wider view of scientific progress. Most notable for lexicography have been the ten societies listed in Table 25.2, together with their major achievements and websites, such as the DSNA in North America and EURALEX in Europe. Of direct relevance are the continental AFRILEX, ASIALEX and AUSTRALEX, the regional NFL for Northern Europe and the national societies AELex for Spain, ATL for the UK, LEDA for Denmark and ZCX for China (plus a few in other countries such as Bulgaria, India, Japan and Korea). Some of these have impressive records in terms of the number of conferences held and the proceedings and journals published. To be successful, such associations need to have pioneering members (often affiliated with university research centres → Section 9), innovative special-interest groups, informative websites and publications (→ Sections 5 and 8). Of particular benefit to their participants are regular meetings (such as the biennial congresses held by EURALEX since 1983, all with published proceedings) or successful conference series such as the six meetings devoted to historical lexicography and lexicology, the eight of the European Language Resources Association (www.lrec-conf.org/), the ones of the French and international Journées des Dictionnaires (https://www.u-cergy.fr/lt2d/fr/manifestations-scientifiques/journees-des-dictionnaires.html) or those on corpus linguistics and other networks (→ Sections 4 and 6). Of interdisciplinary interest may be national and international associations representing such fields as applied, computational and corpus linguistics, terminology, dialectology, onomastics, translation, languages for specific purposes and indexing/archiving. For some of these, networks exist which act as overarching ‘federations’, ‘unions’ or ‘councils’ (sometimes providing directories of their national members), such as the Association for Computational Linguistics (www.aclweb.org/), the Association Internationale de Linguistique Appliquée (www.aila.info/), the European Association for Terminology (www.eaft-aet.net/), the Fédération Internationale des Traducteurs (www.fit-ift.org/), the International Council of Onomastic Sciences (https://icosweb.net/) and the International Society for Dialectology and Geolinguistics (http://geo-linguistics.org/). For examples of national and global interdisciplinary bodies → the American Council of Learned Societies (www.acls.org/) and the International Council for Science (https://council.science/). There are very few directories and web portals that list lexicography-based associations. Some more associations, particularly on linguistics and the study of English and other languages and language families, are cited in Hartmann (2010b); for another list of associations of relevance to lexicography → https://globalex.link/associations/; for a select list of conferences → http://linguistlist.org/callconf/index.cfm; for a general guide to academic bodies around the world → The Europa International Foundation Directory.
Table 25.2 Associations.
Acronym for the association | Full name of the association | Achievements e.g. conferences, journals | Website
AELex | Asociación Española de Estudios Lexicográficos | 8 (2004–18), Revista de Lexicografía | www.aelex.net/
AFRILEX | African Association for Lexicography | 24 (1995–2019), Lexikos | www.afrilex.co.za/
ASIALEX | Asian Association for Lexicography | 14 (1997–2019), Lexicography, Journal of Asialex | http://asialex.org/
AUSTRALEX | Australasian Association for Lexicography | 19 (1990–2019) | www.adelaide.edu.au/australex/
DSNA | Dictionary Society of North America | 22 (1975–2019), Dictionaries | www.dictionarysociety.com/
EURALEX | European Association for Lexicography | 18 (1983–2018), International Journal of Lexicography | www.euralex.org/
LEDA | Foreningen af Leksikografer i Danmark | Meetings (from 1989), LEDA-Nyt | http://leksikografer.dk/
NFL | Nordisk Forening for Leksikografi | 15 (1991–2019), Lexiconordica | https://nordiskleksikografi.com/
ZCX (LSC/CLA) | Zhongguo Cishu Xuehui | 11 (from 1993), Cishu Yanjiu | www.guoxue.com/
4 Corpora/databases Where and how do we obtain empirical evidence on linguistic usage? Dictionary IT has been a field of rapid growth as one of the important branches of metalexicography. Corpora and databases overlap in the sense that most collections of spoken or written language material are electronically processed and the relevant information is extracted from them by specially designed technologies, some of which have made significant contributions to recent dictionary projects, supported by new associations and conference series such as the International Computer Archive of Modern and Medieval English at Bergen (http://icame.uib.no/), the European Language Resources Association (www.elra.info/), the International Corpus Linguistics Conferences (https://www.conferenceindex.org/conferences/corpus-linguistics), the Asociación Española de Lingüística de Corpus (http://www.aelinco.es/en) and the Asia-Pacific Corpus Linguistics Conferences (http://corpling.com/conf/). The corpora and databases listed in Table 25.3 have been selected to represent their major types for major languages. They range from large to small and can cover a wide range of text genres (one or more languages, spoken or written language, general or LSP) and can concentrate on grammatical and semantic features (such as parts of speech or sense groups). Occasionally tensions arise between optimistic expectations and critical doubts, between spoken and written language data, between general language corpora and those of mixed or specialized text genres, between lexicography-oriented corpora and LSP/terminology databases, between monolingual and bilingual/translation corpora, and even between lexical databases and online dictionaries, as demonstrated by Fuertes Olivera and Bergenholtz (2011). More references to corpora and databases can be found on websites of academies (→ Section 2), associations (→ Section 3), networks (→ Section 6), online dictionaries (→ Section 7), publishers (→ Section 8) and university research centres (→ Section 9). 
It is impossible to list all important corpus and database projects in view of their variable coverage and ever-changing nature, but a range of them can be located in the Encyclopedia of Applied Linguistics (2012) and at the websites of some of the schemes cited in Table 25.3 or through interfaces, directories and gateways like the following, some with brief descriptions of their contents.
CNRTL (Paris) www.cnrtl.fr/corpus/
CoRD (Helsinki) www.helsinki.fi/varieng/CoRD/corpora/index.html
CorpusEye (Odense) https://visl.sdu.dk/visl/corpus.html
ETB (Riga) https://www.eurotermbank.com/
FrameNet (Berkeley CA) https://framenet.icsi.berkeley.edu/fndrupal/about
LDC (Philadelphia) https://catalog.ldc.upenn.edu/
LL (Ypsilanti MI) http://linguistlist.org/sp/GetWRListings.cfm?WRAbbrev=Texts#wr173
NaCTeM (Manchester) www.nactem.ac.uk/resources.php
Opus (Uppsala) opus.nlpl.eu
OTA (Oxford) https://ota.bodleian.ox.ac.uk/repository/xmlui/
SketchEngine (Brighton) www.sketchengine.eu/
Valency Patternbank (Erlangen) www.patternbank.uni-erlangen.de/
WordNet Database (Princeton) http://wordnet.princeton.edu/
Wortschatz (Leipzig) http://wortschatz.uni-leipzig.de/
Table 25.3 Corpora/Databases.
Acronym for the corpus/database | Full name of the corpus/database (Location) | Special features e.g. language, text genre (size) | Website
BLF | Base Lexicale du Français (KU Leuven) | French, newspaper texts, 50 m. words | http://ilt.kuleuven.be/blf/
BNC | British National Corpus (U Oxford) | English, mixed genres, 100 m. words | www.natcorp.ox.ac.uk/
BoE | Bank of English COBUILD Corpus (Birmingham U/Collins), now The Collins Corpus | English, mixed genres, 4.5 billion words | https://collins.co.uk/pages/elt-cobuildreference-the-collinscorpus
ČNK | Český Národní Korpus (Charles U Prague) | Czech, mixed genres, 3 billion words, including InterCorp parallel corpus | www.korpus.cz/english/index.php
COCA | Corpus of Contemporary American English | English, mixed genres, 1 billion words | https://www.englishcorpora.org/coca/
EUROPARL | European Parliament Proceedings (Strasbourg) | Parallel corpus of political debates, up to 55 m. words each for 21 languages | www.statmt.org/europarl/
EUSKARA | UZEI Terminology and Lexicography Centre (Donostia) | Basque, mixed genres, 4.6 m. words | www.euskaracorpusa.net/
IATE | Inter-Active Terminology Database for Europe (Luxembourg) | Multilingual translation, 8 m. terms | http://iate.europa.eu/
ICE | International Corpus of English (UC London) | Comparable English corpora from different parts of the world (each 1 m. words) | www.ucl.ac.uk/englishusage/projects/ice-gb/index.htm
5 Journals How do we keep track of new developments through the medium of periodicals? Many of the academies and associations cited in Sections 2 and 3 sponsor journals, which I have surveyed (Hartmann 2009). All branches of dictionary research can benefit from such serial publications, e.g. dictionary history (from IJL), dictionary typology (from Lexicographica), dictionary criticism (from Reference Reviews), dictionary structure (from Dictionaries) and dictionary IT (from Language Resources and Evaluation Journal). The list in Table 25.4 contains ten journals of special relevance to lexicography, and an increasing number of them are becoming available in online editions. There are quite a few journals available for neighbouring disciplines such as applied linguistics, computational and corpus linguistics, languages for specific purposes, semiotics, dialectology, onomastics, terminology, library science, indexing and translation. Some of these can be pursued via the websites of some of the bodies mentioned in other sections (e.g. Section 2 on academies, Section 3 on associations, Section 4 on corpora and Section 8 on publishers). More can be found in directories such as the EBSCO Journals (www.ebscohost.com/title-lists/), the JSTOR Scholarly Journal Archive (www.jstor.org/), the MLA Bibliography (https://www.mla.org/Publications/MLA-InternationalBibliography/About-the-MLA-International-Bibliography/MLA-Directory-of-Periodicals) or the journal LLBA (https://www.proquest.com/).
Table 25.4 Journals.
Title | Since | Website
Cahiers de Lexicologie | 1959 | http://atilf.atilf.fr/jykervei/cahlex.htm
Cishu Yanjiu [Lexicographical Studies] | 1979 | www.cishu.com.cn/
Dictionaries. Journal of the DSNA | 1979 | https://muse.jhu.edu/journal/540
International Journal of Lexicography | 1988 | https://academic.oup.com/ijl
Language Resources and Evaluation Journal [formerly Computers and the Humanities] | (1966) 2005 | https://www.springer.com/journal/10579
Lexicographica. International Annual for Lexicography | 1985 | https://www.degruyter.com/view/journals/lexi/lexi-overview.xml?lang=en
Lexiconordica | 1994 | http://nordisksprogkoordination.org/nfl/publikationer/Lexiconordica
Lexikos | 1991 | https://lexikos.journals.ac.za/pub/
Lexique. Revue française de lexicologie et de linguistique | 1982 | www.septentrion.com/en/revues/lexique/
Reference Reviews | 1987 | https://www.emerald.com/insight/publication/issn/09504125
6 Networks How do we support each other’s work? We have already noticed the development of ‘networks’ in the above sections on academies, associations, corpora and journals. This is not a straightforward term, as it has computational and social connotations, both of which are important for all branches of dictionary research (cf. Hartmann 2011a, 2011b). Networks can cover a wide range of contacts within and between all types of institutions (from workshops and companies to schools, colleges, universities and other public bodies) within and between all types of special disciplines (from arts and humanities to linguistics and computing) at all hierarchical levels (from local and regional to national, international and global), and may be given many different titles, from ‘committee’ and ’institute’ to ‘group’, ‘forum’ and ‘union’. Table 25.5 concentrates on networks related to lexicography and terminology for English, German and a number of other European languages. For English, numerous informal networks exist in countries around the world, such as the Iwasaki Linguistic Circle in Japan whose journal Lexicon sponsors critical reviews of English-language based dictionaries. For Germanic languages → the FGLS based at the University of Bristol (www.bris.ac.uk/german/fgls/), for Nordic languages → the Expert Group Nordic Language Council ENS at Copenhagen (www.norden.org/ en/), for Romance languages from French to Valencian → the Organisation Internationale de la Francophonie (www.francophonie.org/), the Union Latine (www.unilat.org/), the Study Network on Minority Romance Languages (www.romaniaminor.net/) and the Inter-University Institute for the Valencian Community (www.iulma.es/). 
For minority languages in Europe (such as Celtic) → the Irish Foras na Gaeilge (www.forasnagaeilge.ie/), the Forum for Research on Languages of Scotland and Ulster (https://frlsu.org/), the International Celtic Congress (www.ccheilteach.ie/) and the European Research Centre on Multilingualism and Language Learning (www.mercatorresearch.org/). There are no directories of networks as such, but some are cited on the websites of bodies such as FTT at Las Palmas (http://terminol.ulpgc.es/), the Google Scholars specializing in lexicography (http://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:lexicography) and ISO/TC37 (https://www.iso.org/committee/48116.html).
Table 25.5 Networks.
Acronym for the network | Full name of the network (location) | Achievements e.g. conferences, journals | Website
EFNIL | European Federation of National Institutions for Language (The Hague) | Annual conferences, projects | www.efnil.org/
ELC | European Language Council (Berlin) | Thematic Network Projects, European Language Portfolio | www.celelc.org/
IL | Wissenschaftliches Netzwerk Internetlexikographie (Mannheim) | (Workshops of) Academic Network of Internet Lexicography | http://multimedia.ids-mannheim.de/mediawiki/web/index.php/WebHome
INFOTERM | International Information Centre for Terminology (Vienna) | Meetings, Newsletter | www.infoterm.info/
IRMM | Institute for Reference Materials and Measurements (EC Joint Research Centre, Geel) | Workshops, Conferences (Catalogue of Reference Materials) | https://ec.europa.eu/jrc/en/reference-materials
LTT | Réseau de Lexicologie, Terminologie, Traduction (Bruxelles) | Meetings, Lists of Publications, Lettre d’information LTT | https://www.reseau-ltt.net/
NordTerm | Network of Nordic Terminology Organisations (Helsinki et al.) | Assemblies, NordTerm-Net, Term banks | www.nordterm.net/index.html
RaDT | Rat für deutschsprachige Terminologie (Bern) | German-Language Terminology Links | www.radt.org/
WBN | Wörterbuch-Netz (U Trier) | Electronic Documentation of Historical and Dialect Dictionaries | http://woerterbuchnetz.de/
7 Online dictionaries
What is the impact of the digital media?
Reference works come in all sizes, formats and functions, which is one reason why we have already met some aspects of e-lexicography (→ Section 4). Dictionary IT is thus an important field which should help us distinguish the various trends in which online schemes are developing, from adaptations and aggregations of existing dictionaries to completely new interfaces and sometimes collaborative projects involving different publishers (→ Section 8) and university research centres (→ Section 9). Table 25.6 lists ten electronically produced ‘open’ dictionaries for English and a range of European languages. The emphasis is on features which make them special, such as the range of languages and subjects covered, their databases and their various presentation modes. In addition to these, there are the so-called aggregators (or internet sites which combine the reference material from several different sources), such as:
DictionaryBoss http://dictionaryboss.com/
DictionaryCom https://www.dictionary.com/
The Free Dictionary www.thefreedictionary.com/
ThesaurusCom https://www.thesaurus.com/
VisualThesaurus www.visualthesaurus.com/
Table 25.6 Online dictionaries.
Acronym for the dictionary | Full title of the dictionary | Special features (contents) | Website
ANW | Algemeen Nederlands Woordenboek | Dutch words, meanings et al. | http://anw.inl.nl/
COBUILD | Collins Birmingham University International Language Database English Language Dictionary | English corpus-based information, translation equivalents | https://www.collinsdictionary.com/, http://dictionary.reverso.net/
LDoCE | Longman Dictionary of Contemporary English Online | Word finder | www.ldoceonline.com/
Macmillan | Macmillan Dictionary | British and American corpus-based | www.macmillandictionary.com/
ORDNET | Den Danske Ordbog, Ordbog over det Danske Sprog | Danish general and historical dictionaries | www.ordnet.dk/
Oxford | Oxford Dictionaries Online | World English dictionary & thesaurus | https://www.lexico.com/
OWID | Onlinewörterbücher des Instituts für Deutsche Sprache | e-lexiko, neologism dictionary, etc. | https://www.owid.de/
PBWB | Pons Bildwörterbuch | Pictorial German dictionary | www.bildwoerterbuch.com/en/home/
PDEV | Pattern Dictionary of English Verbs | Verb phrase collocations | http://deb.fi.muni.cz/pdev/
TLF | Le Trésor de la Langue Française informatisé | French historical dictionary | http://atilf.atilf.fr/
There are also several internet ‘portals’ offering directories of dictionaries and other reference materials, such as:
CL Research www.clres.com/
ElexicoCom www.elexico.com/
LinguistList http://linguistlist.org/sp/GetWRListings.cfm?WRAbbrev=Dict
RefdeskCom www.refdesk.com/
Yahoo Directory https://www.yahoo.com/reference/dictionaries
8 Publishers
How do we present our information through commercial channels?
Publishing bodies are needed for the transmission of research results and the acknowledgement of authorship. There have been occasional tensions between (open) research output and (commercial) media, and the 2008 economic recession led to a reduction of dictionary projects and their staff and a greater willingness of publishers to collaborate and even to join forces. Table 25.7 lists a selection of bodies known for the publication of dictionaries and other reference materials as well as journals and book series of relevance to lexicography, concentrating on the English-speaking world and Europe. Among the networks linking publishers of dictionaries and other reference materials are associations such as:
the Publishers Association (London) www.publishers.org.uk/
the Association of American Publishers (Washington DC) https://publishers.org/
the Federation of European Publishers (Brussels) https://fep-fee.eu/
the European Association of Search and Database Publishing (Brussels) www.eadp.org/
the International Publishers Association (Geneva) www.internationalpublishers.org/
and Publishers Global www.publishersglobal.com/.
More lists of publishers can be found on the websites of the associations mentioned above and in directories such as the Directory of Publishing and The Writers’ Handbook.
Table 25.7 Publishers.
Name of the publisher (location) | Special features e.g. journals, book series (BS), dictionaries | Website
J. Benjamins (Amsterdam) | Babel, International Journal of Corpus Linguistics, Terminology, Language International World Directory (BS), Studies in Corpus Linguistics (BS), Studies in the History of the Language Sciences (BS), Terminology and Lexicography Research and Practice (BS) | www.benjamins.com/
Bibliographisches Institut (Mannheim) | (Meyers) Encyclopaedias, (Duden) Dictionaries, Atlases | www.duden.de/
De Gruyter (Berlin) | Corpus Linguistics and Linguistic Theory, Dialectologia et Geolinguistica, Folia Linguistica, Lebende Sprachen, Lexicographica [International Annual], Semiotica, WSK Online – Handbooks of Linguistics and Communication Science (BS), Lexicographica Series Maior (BS) | www.degruyter.com/
Elsevier (Amsterdam – London – New York) | English for Specific Purposes, Journal of English for Academic Purposes, Language and Communication, Language Sciences, Lingua, System, (Medical) Dictionaries, Technical Handbooks | www.elsevier.com/
HarperCollins (New York – London) | ELT News, ELT Reader (BS), (Collins) Dictionaries, Wordbank | www.harpercollins.com/, www.harpercollins.co.uk
Larousse (Paris) | Langages, Langue Française, Encyclopaedias, (Larousse & Didier) Dictionaries | www.editions-larousse.fr/
Macmillan (London – New York) | Journal of Information Technology, Latino Studies, (Macmillan, Palgrave & Encarta) Dictionaries | https://macmillan.com, http://us.macmillan.com/
Oxford U.P. (Oxford) | Applied Linguistics, English Language Teaching Journal, International Journal of Lexicography, Journal of Semantics, Dictionaries | www.oup.co.uk/
Random House (New York – London) | Living Language (BS), (Webster) Dictionaries | https://www.penguinrandomhouse.com/, www.penguin.co.uk
Wiley (Malden, MA – Chichester) | The Modern Language Journal, International Journal of Applied Linguistics, Encyclopaedias (BS), Companions (BS) | www.wiley.com/
9 University research centres
How do we advance knowledge through research and training?
We have already seen in earlier sections how several kinds of dictionary work can be pursued in connection with academies, corpus projects, interdisciplinary networks, online dictionary projects or publishers. The recent development of lexicographic units at universities, covering one or more branches of dictionary research, is worthy of recognition, but their numbers are still very small (6 out of 117 universities in the UK, 7 out of 83 in Germany, 6 out of 79 in France and 5 out of 62 in Spain). One representative selection of such centres has been published in a list of portals on the EURALEX website (Hartmann 2010b). All of these centres face potential tensions between humanities and science, theory and practice, education and commerce, or public and private initiatives. Limited funding has occasionally led to a decline in numbers of staff, projects and postgraduate courses, and sometimes to the complete closure of such units. All of this can affect even the ten pioneering units exemplified in Table 25.8, which concentrates on the situation in Europe, with special reference to such factors as dictionary projects (which often involve links with academies, archives, libraries and publishers), MA and PhD programmes, publications, conferences and the availability of corpus and other computer technologies.
Table 25.8 University research centres.
Acronym for centre (Location) | Full name of the centre (university) | Achievements e.g. special projects, publications, courses, meetings | Website
CENTLEX (Aarhus) | Center for Lexikografi, Aarhus Universitet | Danish LSP online dictionaries, Hermes, MA & PhD, Conferences | https://cc.au.dk/en/centreforlexicography/centreprofile/
FGL (Oslo) | Forskargruppe for Leksikografi, Universitetet i Oslo | Norwegian dictionaries, Dialect and text corpora, MA & PhD, Conferences of NFL 1991 & 2013, Congress of EURALEX 2012 | www.hf.uio.no/iln/forskning/grupper/leksikografi/index.html
GdL (La Coruña) | Grupo de Lexicografía, Universidade da Coruña | Spanish dictionary, Revista de Lexicografía, MA & PhD, Thematic bibliography, International conference 2004 | www.udc.es/grupos/lexicografia/
ICLVCR (Erlangen) | Interdisciplinary Centre for Lexicography, Valency and Collocation Research, Friedrich-Alexander-Universität | Valency and collocation research, MA (EMLex) & PhD, Conferences and workshops | www.lexi.uni-erlangen.de/en/
INFOLEX (Barcelona) | Lexicography Research Group, Universitat Pompeu Fabra | Spanish Learner’s Dictionary, Thematic Networks for Terminology & Lexicography, Corpora, Termbank, MA & PhD, Congress of EURALEX 2008, Corpus Seminar 2010 | https://www.upf.edu/web/universitat/-/grup-de-recerca-en-informacio-lexicografica-infolex-
INL (Leiden) | Instituut voor Nederlandse Lexicologie, Rijksuniversiteit Leiden | Historical and contemporary dictionaries, Corpus development | https://ivdnt.org/
LDI (Paris) | Laboratoire Lexiques, Dictionnaires et Informatique, Université Paris 13 Nord | French lexicography, MA & PhD, Journées Tout | https://www.univ-paris13.fr/ldi-2/
LI (Göteborg) | Lexikaliska Institutet, Göteborgs Universitet | Swedish and multilingual dictionaries, MA & PhD, Conference of NFL 1999 | https://svenska.gu.se/forskning/forskningsprofiler/lexikologi-och-lexikografi/lexikaliska-institutet, https://islex.arnastofnun.is/se/
RIILP | Research Institute of Information and Language Processing, University of Wolverhampton | Pattern Dictionary of English Verbs, MA & PhD | https://www.wlv.ac.uk/RIILP
ZLL (Poznań) | Zakład Leksykografii i Leksykologii, Uniwersytet im. Adama Mickiewicza | Bilingual dictionaries, MA & PhD | http://wa.amu.edu.pl/wa/Department_of_Lexicology_and_Lexicography
Occasionally, these centres can interlink with other disciplines, departments or institutions, e.g. when the Applied Linguistics Institute at Pompeu Fabra University in Barcelona provides networks for terminology and other fields, or when the ICLVCR at Erlangen acts as a forum for lexicography and collocational research in different parts of the university and MA courses elsewhere, or when the LDI allows professional connections to be maintained between Cergy and other universities in the Paris region, or when joint online LSP dictionary projects bring together staff at the Aarhus Business School in Denmark and the Spanish University of Valladolid. There is a range of university research units which have become well known for their special initiatives, such as lexicography summer schools (at Ivanovo in Russia), the study of minority languages (at Cambridge in England or at Ghent in Belgium), the promotion of neighbouring disciplines such as NLP and corpus linguistics (at Stuttgart in Germany or at Lorient-Bretagne-Sud in France), terminology (at Lyon 2-Lumière in France or at Pecs-Karoli-Gaspar in Hungary) and translation (at Tampere in Finland or at Bologna in Italy). Some are managing to collaborate with research units in independent institutes or in academies (e.g. the Magnusson Language Institute at Reykjavik in Iceland). There are no general directories available for academic lexicography centres, but some specific websites can provide selected information, e.g. on the International Association of Universities at Paris (https://iau-aiu.net/), on lists of universities and academies (www.university-directory.eu/), on the academic ranking of world universities (http://www.shanghairanking.com/), on research funding in the UK (https://www.ukri.org/), on linguistics programmes (LinguistList https://linguistlist.org/teach/programs/), or on staff based at British universities (www.academia.edu/). For general guides to academic bodies around the world → The Europa International Foundation Directory and The Europa World of Learning.
10 Conclusion
This chapter has examined eight assets of potential value to dictionary research:
● Academies,
● Associations,
● Corpora and databases,
● Journals,
● Networks,
● Online dictionaries,
● Publishers and
● University research centres.
Not all topics could be fully covered, but an effort has been made to show correlations and overlaps between the main resources, e.g. lexicographic work that is being carried out at academies, publishers and universities and existing networks that have been developed between them. A number of issues may have been simplified or overlooked, e.g. when selected examples were presented in the tables, and it was not possible to describe the current situation in all countries, languages and disciplinary specializations (e.g. NLP, terminology, translation and onomastics), but the emphasis on websites has hopefully improved empirical evidence and increased our knowledge of the current facts.
Note 1 I wish to acknowledge the help I have received from the editor and a number of co-authors (especially Paul Bogaards and Robert Lew) and other scholars elsewhere, such as Félix Córdoba Rodríguez, Gilles-Maurice de Schryver, Dmitrij Dobrovolskij, Anna Hannesdóttir, Thomas Herbst, Iztok Kosem, Sabina Pavlova, Jean Pruvost, Serge Verlinde and Geoffrey Williams.
References
Dictionaries, directories and other reference works (for online dictionaries → Section 7)
Directory of Publishing 2012 (U.K. and Republic of Ireland), Thirty-seventh edition (2011), London: Continuum and Publishers Association.
The Encyclopedia of Applied Linguistics (2012), Ed. Carol A. Chapelle (10 volumes/online). Malden MA and Chichester U.K.: Wiley & Blackwell.
EURALEX Bibliography. http://euralex.pbworks.com/w/page/7230036/FrontPage.
The Europa International Foundation Directory, Twenty-first edition (2012), London: Routledge and Taylor and Francis Group.
The Europa World of Learning, Seventieth edition (2020), London and New York: Routledge and Europa Publications.
The Writers’ Handbook 2020 (2019), Ed. J. Paul Dyson. London: J.P. and A. Dyson.
Other references
Fuertes-Olivera, P.A. and H. Bergenholtz (eds) (2011), e-Lexicography. The Internet, Digital Initiatives and Lexicography, London: Continuum.
Hartmann, R.R.K. (2009), ‘Keeping in touch: A survey of lexicography periodicals’, Lexikos 19, 404–22.
Hartmann, Reinhard (2010a), ‘Has lexicography arrived as an academic discipline? Reviewing progress in dictionary research during the last three decades’ in H. Lönnroth and K. Nikula (eds), Nordiska Studier i Lexikografi 10. Rapport från Konferensen om Lexikografi i Norden, Tammerfors 2009, Tammerfors University of Tampere and Oslo: Språkrådet, 11–35.
Hartmann, Reinhard (2010b – revised 2011), ‘Reference portals to internet sources relevant to lexicography and terminology’, EURALEX Website. http://euralex.pbworks.com/f/Reference+Portals+aug+2010.pdf.
Hartmann, Reinhard (2011a), ‘International and interdisciplinary networking for the benefit of reference science’ in F.I. Kartashkova (ed.), Ivanovskaya Leksikograficheskaya Shkola: Traditsij i Innovatsij/ Ivanovo School of Lexicography: Traditions and Innovations. A Festschrift in Honour of Professor Olga Karpova, Ivanovo: Ivanovo State University, 158–79.
Hartmann, Reinhard (2011b), ‘Linking up. The role of networking in disciplinary contacts within and around lexicography, with special reference to four European countries’, Dictionaries 32, 33–65.
Hartmann, Reinhard (2012), ‘The contribution of European academies to dictionary-making, lexicography and reference science’ in K.P. Márkus, T. Pintér and D. Pődör (eds), Lexicography and Lexicology. A Festschrift in Honour of Tamás Magay, Szeged: Grimm Publishing House, 310–33.
Hartmann, Reinhard (2013), ‘Lexicographic associations’, Article 39 in R. Gouws et al. (eds), Dictionaries. An International Encyclopedia of Lexicography. Supplementary Volume 5.4 Recent Developments with Special Focus on Computational Lexicography, Berlin: De Gruyter Mouton, 613–19.
Thomas, G. (1991), Linguistic Purism, London and New York: Longman.
Glossary of lexicographic terms Barbara Ann Kipfer
Abbreviation: a shortened or contracted form of a lexical unit, used to represent the whole, utilizing omission of letters, and sometimes substitution of letters, or duplication of initial letters to signify plurality.
Abridged dictionary: a condensed or derived work from an unabridged or much larger dictionary and usually including the most essential vocabulary while excluding the rare and archaic or omitting information such as etymologies or examples; also called abridgment or abridgement (example: Shorter Oxford English Dictionary).
Access structure: the design of a reference work with parts that allow users to search for particular types of information such as alphabetical order of the headword list or an index for a conceptual thesaurus.
Allusive use: a word sense characterized by or containing allusion, making an implied or indirect reference.
Alphabetic order: an arrangement in the order of the language’s alphabet; an indexing method in which names, terms or words are arranged in the same sequence as the letters of the alphabet (A–Z).
Analogical dictionary: a work containing information such as collocations, synonyms, confusable words, etc. (example: Macmillan Collocations Dictionary).
Analytic (or analytical) definition: the classical formula to explain the meaning of a word or phrase using the genus (generic term) and differentia (distinguishing feature or features) formula; also called logical definition (example: triangle – a plane figure (genus) that has three straight bounding sides (differentia)).
Annotation: a note added as a comment, explanation or instruction.
Antedating: citation selection from the earliest possible sources; evidence of the occurrence of a word or phrase given in an historical dictionary.
Antonym: a word or phrase with the opposite meaning of another, e.g. alive/dead, clean/dirty.
Applied linguistics: the branch of linguistics concerned with practical applications of language studies; the study of language for practical uses, such as teaching or speech therapy.
Article: 1 a paragraph describing a headword in a dictionary; also called entry, 2 a piece describing a vocabulary feature, issue or process of dictionary-making that is included in the front matter of a dictionary.
Attributive use: a lexical unit, usually an adjective but sometimes a noun, that comes before the noun it describes; an adjective that describes an attribute of the noun phrase of which it is a part.
Audio data, audio corpus: any language data that is recorded from speech, either by writing it down or with a device; a speech or spoken database of audio files and/or text transcriptions.
Authority: a person, organization or profession considered an expert in a particular subject.
Back-formation: a word-formation process in which an element, usually an affix, is removed from a word to create another, e.g. burgle from burglar.
Back matter, end matter: any material following the main text of a book or reference work, such as an appendix or index.
Balanced corpus, sample corpus: a corpus that tries to represent a particular type of language over a specific span of time, seeking to be representative of that time frame.
Base word: the form of a word that heads a dictionary entry, including the simplest (canonical) terms; the most grammatically simple form of a word; also called basic form, canonical form, entry form, head form (example: arm).
Bidirectionality: a foreign language dictionary that goes two ways, functioning in two directions, e.g. English to French and French to English.
Bilingual dictionary: a reference work providing equivalent words/phrases in two languages; a dictionary with translation equivalents of two languages (e.g. Collins Robert French-English English-French Dictionary).
Bilingualized dictionary: a monolingual dictionary that has been translated into another language, usually a learner’s dictionary (e.g. Password series by Kernerman).
Bogey: a word entered in a dictionary through some error, as misunderstanding or misreading a manuscript, or by design as a test for plagiarism; also called ghost word, e.g. dord.
Borrowing: a word-formation process in which a word or phrase is transferred from one language to another; a word or phrase from one language taken into another language and naturalized; also called loanword (example: goulash, into English from Hungarian ‘gulyás-hús’); see also calque.
Bottom-up lexicography: a method of attempting to find out what learners need and want to know before compiling or adding to a reference work; user involvement in lexicography.
Brown Corpus: (full name Brown University Standard Corpus of Present-Day American English) a corpus compiled in the 1960s by Henry Kučera and W. Nelson Francis at Brown University (Providence RI) from a wide-ranging text collection. It has 500 samples of English language text and around one million words from works published in the United States in 1961. It was the first modern, machine-readable general corpus of American English.
Calque (loan translation): an expression (compound, derivative or phrase) introduced into one language by translating its constituent parts from another language; e.g. superman (English) from German Übermensch.
Canonical form: the most grammatically simple form of a word; also called base word, basic form, entry form, head form (example: arm).
Catchword: a lexical unit which is included in the wordlist defined in a dictionary; a guideword.
Children’s dictionary: a dictionary created for children between (usually) ages 4–16, or some subset of that, including words that children need and want to learn and with specially written definitions suited to the reading level.
Citation: a word or phrase with enough context to understand the meaning, taken as a unit that is recorded or excerpted from written and spoken sources. Citations are a source of lexicographical data that are collected, sorted and analysed for writing definitions and are used as verbal illustrations or examples in dictionary entries.
Classificatory label: a taxonomic name or class of a sense or entry, e.g. the order and genus for an animal; also called taxonomic name, scientific classification (example: Felis catus).
Closed corpus: a collection of text that is limited by the number of sources available, as for a dead language or dialect; a collection of text claiming to contain all or nearly all data from a particular field, e.g. the Old English Corpus.
Cognate: a pair or group of words that have the same root or languages that are genetically related; a member of a pair or group of words and phrases that have an intralingual or interlingual genetic relationship (English apple and German Apfel are cognate words; English and Flemish are cognate languages).
Cognitive equivalence: the method of ensuring the source language and target language items have equal value in their respective language systems.
Coinage: a lexical unit that has been invented or made up; a lexical unit that is used for the first time.
Collect: to excerpt citations for the wordlist of a dictionary; to gather source material for compiling a reference work.
College dictionary (collegiate dictionary, desk dictionary): an intermediate-size, single-volume dictionary intended for use by students or at an office desk and containing information similar to an abridged general dictionary, e.g. Merriam-Webster’s Collegiate Dictionary.
Collocation: a combination of words (adjective-noun, verb-preposition) that have a certain mutual expectancy, that have a great likelihood of co-occurring, e.g. false expectation, hot coffee, nice surprise. Collocations vary in the degree to which one lexical unit expects another to occur with it. Collocations are more fixed than free combinations and less fixed than idioms.
Colloquial use: a word sense that is common and acceptable in casual and informal conversation and contexts, the most common functional style of speech.
Compound word: two or more lexical units (simple words) that form a new lexical unit (a new word with a single meaning), e.g. dry + clean = dryclean, time + keeper = timekeeper.
Computational lexicography: the use of computers, the Internet, software, programming and scripting, electronic corpora and other technological means to collect language information for lexicons and lexical projects.
Concise dictionary: a smaller version of a larger dictionary wherein the coverage and scope are reduced. The vocabulary is often reduced and some features eliminated or abbreviated.
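The Collocation entry above speaks of words that have ‘a great likelihood of co-occurring’. As an illustration only, the following minimal sketch shows one way such likelihood might be estimated from a corpus. It assumes a tiny invented sample text and uses pointwise mutual information (PMI), a common association measure that the glossary itself does not name; the function name pmi_bigrams and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Pairs seen fewer than min_count times are skipped, because PMI is
    unreliable for very rare pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently of each other.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Invented sample text (an assumption, not taken from any real corpus).
sample = ("she drank hot coffee . he wanted hot coffee . "
          "the coffee was nice . it was a nice surprise . "
          "what a nice surprise").split()

scores = pmi_bigrams(sample)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

A positive score suggests that a pair occurs together more often than chance would predict. Real collocation work would use a far larger corpus and would usually compare several association measures.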
Concordance: a systematic list of every occurrence of every lexical unit in a specific text or texts, arranged with the lexemes in the centre with preceding and following context; also called keyword-in-context (KWIC) concordance. A concordance provides details about grammar, usage, compounding, lemmatization, collocation and context.
Confusable terms: any lexical units that are often confused because they sound or look similar at first glance, but which have differences in meaning and use.
Connotation: associations and characteristics connected with a lexical unit or sense beyond the linguistic explanation or denotation, e.g. that caviar is a symbol of luxury.
Context: a phrase, sentence, or paragraph surrounding a lexical unit that depicts its meaning or sense; also called lexicographic context, minimal context, situational context, context of use. Taken from either written or spoken sources, context shows the characteristic features of a lexical unit and the setting or circumstances with which a word or phrase is associated.
Core sense: the sense of a lexical unit that leads to other, related subsenses and which precedes the subsenses in a dictionary entry. There may be more than one core sense for a lexical unit and each is followed by its related subsense or subsenses.
Core vocabulary: the vocabulary of lexical units that make up 70–90 per cent of the most frequently used words of a language; a list of words determined to be the most useful for communication within a language.
Corpus (plural, corpora): the written and spoken sources used as the basis for a reference project, a systematic collection of texts which documents a language (example: Corpus of Contemporary American English).
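The Concordance entry above describes the keyword-in-context (KWIC) layout, with the lexeme in the centre and context on either side. The following minimal sketch shows how such a display might be produced for a single keyword. It is an illustration under stated assumptions: the function name kwic, the width parameter and the sample text are hypothetical, and a full concordance would repeat the process for every lexical unit and would normally work on lemmatized text.

```python
import re


def kwic(text, keyword, width=30):
    """Return keyword-in-context lines with the keyword in a fixed column."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context and left-align the right context,
        # so that every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Invented sample text (an assumption, not taken from any real corpus).
sample_text = ("A dictionary describes the words of a language. "
               "Every dictionary entry begins with a headword. "
               "Users consult a Dictionary to decode meaning.")

concordance = kwic(sample_text, "dictionary", width=20)
for line in concordance:
    print(line)
```

Aligning the keyword in one column is what lets a lexicographer scan the left and right context for recurring patterns such as collocations.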
Corpus linguistics, corpus-based approach: the study of language based on a set of samples of text with words in natural context, seen as a reliable resource for language analysis; an approach to dictionary-making based on computational tools and techniques that study large samples of real-world text.
Coverage: the amount of material included in a reference work, especially as compared with other works.
Cross-reference (or cross reference): the listing of another lexical unit for an entry or a sense that is considered either synonymous with or related to that entry or sense; a word or symbol in a reference work that indicates related information.
Crowd-sourced lexicography, collaborative lexicography, user participation lexicography: a range of methods employing user-generated content for gathering linguistic data and adding to or creating lexicons or lexical databases.
Dating: the use of textual evidence to date the earliest occurrence of a word or phrase.
Decoding: the process of understanding the meaning of a lexical unit; the passive linguistic activity where learners look up words they do not know or understand so they may then understand the word’s meaning in context.
Definiendum: a lexical unit (word or phrase) that is defined in a reference work.
Definiens: the explanation of the meaning of a lexical unit (word or phrase); also called definition.
Defining vocabulary (limited defining vocabulary): a restricted set of words used to define all other terms in a dictionary; the controlled use of vocabulary in definitions, restricting the descriptions to using the most frequent words of a vocabulary to describe other words. This is a practice of learner’s dictionaries, e.g. Longman Dictionary of Contemporary English.
Definition: the explanation of the meaning of a lexical unit (word or phrase); also called definiens, gloss. The definition offers semantic information and is the prominent feature of a dictionary.
Denotation: an aspect of meaning that relates a word or phrase to the thing it expresses, i.e. what a sense of a lexical unit actually refers to; also called referential meaning, cognitive meaning, reference. This is the usual topic of the definition while the more subjective or emotive aspects (connotation) are not described.
Denotatum: an actual existing object referred to by a lexical unit (word or phrase); the meaning distilled from a referent by perception. It is contrasted with designatum.
Density: the degree of vocabulary coverage in a reference work.
Derivative: a word that is created by the addition of an affix to a base or stem, e.g. lexical, lexically, lexicalize, lexicalization. Derivatives are not always given headword status but may be run-on entries or sub-entries under the headword from which they were derived.
Descriptive: an approach to describing the meanings and uses of the language based on observed facts rather than on attitudes as to how it should be used (prescriptive) (noun, descriptivism).
Designatum: an object that is referred to by a lexical unit (word or phrase), whether it actually exists or not; the aspect of meaning identified for expression by a word or phrase. It is contrasted with denotatum.
Diacritic (or diacritic mark, diacritical mark): a sign placed above or below a character or letter to indicate that it has a different sound or phonetic value or that a syllable has a certain type of stress or tone (example: piñata, jalapeño).
Dialect label: a label indicating a specific geographic area where a sense or entry is used, e.g. New England, British (also called regional label), or a particular group of speakers, e.g. kids’ slang.
Dictionary: a reference work that describes chosen lexical units (words, phrases) of a language or subject; also called lexicon (2).
Glossary of Lexicographic Terms
Dictionary archaeology, lexicographic archaeology: any methods used to discover the links between different dictionaries by studying their histories and contents. Dictionary portal: a website or web page providing access or links to dictionary, lexicon, glossary or thesaurus websites. Differentia (plural differentiae): the specifying term or terms in an analytic, classical or logical definition, the term which qualifies or characterizes the genus, one or more of the characteristic features which distinguish the word explained from the generic term (genus) of which it is considered a specific instance; also called differentia specifica (example: autumn/fall: the third season of the year, when crops and fruits are gathered and leaves fall). Digital dictionary: see Electronic dictionary. Diminutive: a word formed from another by the addition of a suffix expressing smallness in size, e.g. a booklet is a small book, an eaglet is a small eagle. Direct entry: the listing of a multi-word expression under its first constituent, e.g. autumnal equinox listed in letter A. Direct sense: the primary or significant sense of a lexical unit. Disambiguation, word sense disambiguation: the recognition and separation of lexical units that are similar in form or meaning; also, this process in computational linguistics, performed by electronic means, especially for identifying which sense of a word is used in a sentence. Domain, subject, topic: an information category for a headword or definition that denotes its use exclusively or mainly in a discipline or genre. E-dictionary: see Electronic dictionary. EFL: English as a foreign language; the presentation in a dictionary or language reference book of English used by learners for whom it is not their native language. Electronic data source: data that is collected and held in a computerized system, such as a database or website. 
Electronic dictionary: a reference work that is compiled with the use of computers and is presented in computerized form, including online dictionaries, dictionaries on media such as DVD, spelling checkers and thesauri in word processors, and dictionary databases; any machine-readable version of a dictionary. E-lexicography (electronic lexicography, computational lexicography): the processes involved in the compilation, design and implementation of electronic dictionaries and other word-based reference works. Encoding: the process of productively using language in speaking, translating or writing; the active linguistic activity of using words and phrases to express meaning. Also, the process of converting or translating words and phrases from one language to another. Encyclopaedic definition: an explanation of a word or phrase that is comprehensive and includes information that is not strictly linguistic, a definition that reflects encyclopaedic knowledge or facts; e.g. the definition of elephant in Collins English Dictionary: Either of the two proboscidean mammals of the family Elephantidae. The African elephant (Loxodonta africana) is the larger species, with large flapping ears and a less humped back than the Indian elephant (Elephas maximus), of S and SE Asia. Entry: a paragraph describing a lexical unit (word or phrase), the basic unit in a reference work. An entry may describe the base word and all of its parts of speech (as sub-entries), or the parts of speech may be depicted as separate entries. Entry block: a paragraph describing a lexical unit or part of speech of a lexical unit.
Entry form: the most grammatically simple form of a word; also called basic form, canonical form, head form, base word. Entry line: the parts of an entry which precede the definition(s), usually the headword, pronunciation and syllabication, part of speech, and label(s); the initial line of a reference-work entry indicated by indentation or typography (e.g. boldface). Entry word: a lexical unit (word or phrase) which is included in the wordlist defined in a dictionary. Canonical forms are almost always included. Also called entry, entry head, headword, keyword, lemma, main entry, word entry. Equivalent: 1 a lexical unit (word or phrase) used as a synonymous definition for another entry word, also called defining equivalent, synonymous equivalent (example: gimcrack – a showy object of little use or value: GEWGAW) 2 the translation (in the target language) of a lexical unit (in the source language) of a bilingual dictionary, also called translation equivalent (example: dog = le/un chien). ESL: English as a second language; the presentation in a dictionary or language reference book of English used by learners for whom it is not their native language, but a second language they are learning. Etymological dictionary: a reference work describing the histories and origins of the entry words, tracing back to the earliest form (etymon) and meaning of words and phrases. Etymological fallacy: a linguistic misconception that a present-day meaning of a lexical unit should necessarily be similar to its historical meaning even though the word or phrase has changed meaning over time. Etymology: the history and origin of a lexical unit (word or phrase). In print dictionaries, the etymologies are usually given in abbreviated form, e.g. for cabbage from Concise Oxford Dictionary: ME: from OFr. dial. caboche ‘head’, var. of OFr. caboce. Etymon: the form from which a word is derived, e.g. the etymon of glossary is Latin glossa, from Greek glossa ‘tongue, language’. 
Example/example sentence: a phrase or sentence excerpted from a written or spoken source or written by a lexicographer that contains a form of the entry word and which indicates additional information about its grammatical or semantic characteristics, such as its context and usage; also called citation, illustrative citation, illustrative example, illustrative phrase, illustrative quotation, illustrative sentence, quotation, verbal illustration. Excerpt: to extract a lexical unit and its context from a written or spoken source, recording it for analysis or other uses; to select suitable material from a data set for compiling a reference work. Exclusion: the deletion of words and phrases from a reference work due to obsolescence, lack of use in currently published literature, or the publisher not wanting politically incorrect or offensive words included. Fascicle: any of the sections of a book being brought out in instalments prior to its publication in completed form; one of several instalments of a published reference work, once a practice of large historical dictionaries. Field label: a subject label in a reference work used to indicate the discipline, domain or field of a sense or entry, e.g. biology, linguistics. Figurative use: a non-literal sense of a lexical unit, the result of an extension of the basic meaning, and often marked with a label. Fixed expression: two or more words that are always found together in a specific arrangement, such as collocations (e.g. nice surprise), compounds (e.g. school dictionary) and idioms (e.g. red herring); a phrase whose constituent elements cannot be moved or substituted without changing the meaning or literal interpretation.
Formulaic definition: a style of describing meaning written in a specified, traditional formula, also called truncated definition. The formulas differ according to parts of speech and are used to provide consistency in treatment. Frequency: the number of occurrences of a word/phrase in written or spoken contexts, as in corpora, kept as a statistic and used in wordlist compilation and usage label assignment for the vocabulary considered for inclusion in reference works. Front matter: the introductory material of a reference book prior to the wordlist, usually including a preface, explanatory notes, articles, pronunciation guide, abbreviations and any other information on how to use the book; also called fore-matter. Function word: an article, conjunction, determiner, or preposition that plays a grammatical role in a sentence, as opposed to an adjective, noun or verb that expresses meaning. General dictionary: a reference work intended to give comprehensive coverage of the language or vocabulary. Genus: the word or phrase that classifies a lexical unit, the part of a definition that is the superordinate word (hypernym) to which the word being defined is subordinate (hyponym); also called IS-A, is-a. Ghost word: a word entered in a dictionary through some error, as misunderstanding or misreading a manuscript, or by design as a test for plagiarism; also called bogey. Gloss: a brief explanation of the meaning of a lexical unit (word or phrase) (example: band ‘strip of material’ as opposed to a definition of band ‘strip of material used as a distinguishing mark on clothes’). Glossary: a simple or short list of defined words, a wordlist defined for a limited or specialized subject or a wordlist defined concisely for a limited subject. Grammatical code: one of a system of abbreviated terms and symbols used to designate detailed syntactic information, e.g. U = uncountable noun. 
Grammatical information: grammar material offered in a dictionary, such as the word class (part of speech), inflections, verb relations (transitive, intransitive), noun relations (count, mass, noncount), collocations, clause and word formation. Graphic illustration: a picture, line drawing, table, list, or map included to aid description of a lexical unit or units. Guide word (or guideword): a word or part of a word printed at the top or bottom of a reference book page to indicate what entries are included on that page. Also, in learners’ dictionaries, the list of words at the beginning of a long entry to aid the user to find the sense required. Hapax legomenon: any word or phrase that appears only once in a manuscript, document, or particular area of literature. Hard word: a term that is unfamiliar to the average reader, usually a foreign, scientific, technical, or formal term. Hard words are looked up in dictionaries and motivated the creation of dictionaries, e.g. Robert Cawdrey’s Table Alphabeticall of 1604. Headword: the form of a lexical unit (word or phrase) chosen for inclusion in the wordlist defined in a dictionary, especially canonical forms; also called entry, entry head, keyword, lemma, main entry, word entry. Practice varies as to how headwords are marked typographically and how variant forms are shown. Heteronym: a lexical unit that is spelled the same as another (homograph) but differs in meaning and pronunciation.
Historical dictionary: a type of reference work that attempts to describe all the forms and meanings of its entry words from their inception; a description of a vocabulary’s history from beginning to present, documenting the changes in form and meaning of words and phrases. Homograph: a lexical unit which is spelled the same as another but has a different pronunciation and meaning, e.g. minute ‘division of time’ and minute ‘tiny’. Homonym: a lexical unit which is spelled and pronounced the same as another but has a different meaning and etymology, and the two types are homographs and homophones, e.g. bear ‘animal’ and bear ‘carry’. Homophone: a lexical unit which is pronounced the same as another but has a different spelling and meaning, e.g. fair and fare. Hypernym (or hyperonym): the generic term of a set which has members (words and phrases) that are more specific (hyponym), e.g. step is a hypernym of footstep. Hyponym: the specific term of a word or phrase, which is a member of a larger, more generic term or set, e.g. footstep is a hyponym of step. Hyponymy: the hierarchical relationship between the meanings of lexical units, in which the meaning of one lexical unit is a specific type of another lexical unit. The sense of the hyponym (specific term) can be said to be included in that of the hypernym (generic term), e.g. flower is a type of plant, tiger is a type of cat. Ideological dictionary: a reference work arranged so that the user moves from meaning to word; also called analogical dictionary, onomasiological dictionary. Idiom: a fixed expression with a unitary meaning that is not always transparent from the combination of the meanings of its constituent words, e.g. kick the bucket, let the cat out of the bag. Illustration: a drawing, diagram or image which is offered to clarify the definition of a lexical unit or group of words or phrases. Index entry: an entry whose headword is a variant and cross-reference of a fully defined headword in a reference book. 
Inclusion: the addition to a reference work of words and phrases that are either new or have come into more frequent usage. Indo-European language: any of the family of languages that were spoken in Europe and parts of Asia and places colonized by Europeans over the past 3,000 years. The twelve branches are Indic, Iranian, Anatolian, Armenian, Hellenic, Albanian, Italic, Celtic, Tocharian, Germanic, Baltic and Slavic. Inflection: a change in the basic form of a word that shows a grammatical function such as case, gender, number, tense, person, mood, or voice (example: inflects, inflected, inflecting); a form, suffix or element involved in such a change. Informant: a person who answers a questionnaire or otherwise supplies examples of usage or other data for a dictionary or linguistic project. Inkhorn term: a hard word usually coined from foreign roots such as Latin or Greek, and thought to be unnecessary or overly pretentious, e.g. animadversion for ‘criticism’. Intensional definition: a type of definition using a formula which specifies the attributes or characteristics of a concept in relation to its hypernym (generic term), e.g. pine is ‘kind of evergreen tree’. Intensive form: a form of a lexical unit denoting stronger, more forceful, or more concentrated action relative to the root.
International Phonetic Alphabet (IPA): a pronunciation transcription system based on the Latin alphabet that uses numerous symbols (diacritics) to represent speech sounds – now the usual way of representing pronunciation in dictionaries. Internet dictionary: see Online dictionary. Jargon: words or phrases that are used by a particular group or profession and are somewhat difficult for outsiders to understand. Keyword: a lexical unit (word or phrase) which is included in the wordlist defined in a dictionary, especially canonical forms; a defined word or phrase. Also called entry, entry head, headword, lemma, main entry, word entry. Label: a descriptor or abbreviation indicating a restricted usage of a lexical unit, by classification (domain, field, subject), part of speech, language variety, register, status, style or usage, e.g. American English, archaic, derogatory, informal, slang. Learners’ dictionary (or learner’s dictionary): a reference book intended for foreign (non-native) learners of a language, e.g. Collins COBUILD Advanced Learner’s English Dictionary. Lemma: a lexical unit which is included in the wordlist defined in a dictionary, especially canonical forms. Also called entry, entry head, headword, keyword, main entry, word entry. Lemmatization: the sorting or ordering of lexical units (in a corpus) by grouping a lemma with all of its inflected or variant forms. Lexeme: a word or phrase regarded as a single, definable item in the vocabulary of a language; also called lexical unit, lexical item. Lexemes are usually thought of as a combination of a graphic/phonic form with a meaning/semantic value in a particular grammatical context. Lexical: of or pertaining to the description of the meanings of the units of a language; of or relating to the meaning of words as distinguished from their grammar and construction; of or pertaining to a lexicon or lexicography. 
Lexical database, lexical knowledge base: a database of information about words; a computer-based lexical resource that is accessible through software. Lexical field, semantic field: a set of lexical units grouped by meaning and that refer to a specific subject; a set of lexical units with related meanings and which form a conceptual network within a domain. Lexical unit: a word or phrase regarded as a single, definable item in the vocabulary of a language; also called lexeme, lexical item. Lexicographer: a person who writes, edits or compiles dictionaries. Lexicographic archive: a collection of lexical information from various sources that is available for lexicographers and researchers; a collection of dictionaries in a reference department of a library or as a special collection. Lexicographic definition: an explanation of meaning that is considered the equivalent of the lexical unit and may be substituted for the lexical unit in a context (example: The book was long. = The set of written, printed, or blank sheets bound together into a volume was long). Lexicographic (or lexicographical): of or pertaining to lexicography, the defining of words or to dictionary-making. Lexicographic order, lexicographical order: the way words are alphabetically ordered based on the alphabetical order of their component letters; also known as alphabetical order, dictionary order, lexical order.
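As a rough illustration of lexicographic order as it might be applied to a wordlist, the Python sketch below compares a plain character-by-character sort with a letter-by-letter alphabetization that ignores case, spaces and hyphens. The sample headwords and the helper name lexicographic_key are assumptions made for this example only; letter-by-letter filing is just one convention, and real dictionaries may instead file word by word or follow language-specific collation rules.

```python
def lexicographic_key(headword: str) -> tuple:
    """Hypothetical sort key: compare letters only, ignoring case, spaces and hyphens."""
    letters = "".join(ch for ch in headword.casefold() if ch.isalnum())
    # The original spelling is kept as a tie-breaker so the ordering is deterministic.
    return (letters, headword)


# Illustrative sample wordlist (not taken from any particular dictionary).
headwords = ["lexicon", "Lexical", "lexicography", "lexeme", "lexical unit"]

# Plain sorting compares raw character codes, so the capitalized form files first.
plain = sorted(headwords)

# Letter-by-letter alphabetization treats "Lexical" and "lexical unit" as neighbours.
ordered = sorted(headwords, key=lexicographic_key)

print(plain)
print(ordered)
```

With this key, a capitalized headword files next to its lower-case neighbours instead of ahead of every lower-case form, which is closer to what a reader expects from a printed wordlist.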
Lexicography: the practices and principles of dictionary-making, the editing or compiling of a dictionary; the professional activity and academic field concerned with dictionaries and other reference works, the latter also called metalexicography. Lexicology: a branch of linguistics that is concerned with the study of the basic units of vocabulary (lexical units), their formation, meaning, and structure. Lexicon: 1 the entire set of lexical units of a language; the totality of a language’s vocabulary; 2 a reference work listing and explaining the words of a language, language variety, specialized work, etc., sometimes synonymous with dictionary. Linguistics: the study of language and how it works. Listeme: an item that is part of a list, usually memorized as part of that list. Loan translation, calque: an expression adopted by one language from another in the same or nearly the same form. Loanword (or loan-word): a word or phrase that has been borrowed into a language and has not been fully assimilated into the vocabulary; also called borrowing. Logical definition: the classical formula to explain the meaning of a word or phrase using the genus (generic term) and differentia (distinguishing feature or features) formula; also called analytic definition (example: triangle – a plane figure (genus) that has three straight bounding sides (differentia)). Lookup: the action or process of looking something up in a dictionary or dictionary database. Machine-readable dictionary: a dictionary that is stored as computer (machine) data; a lexical database or electronic dictionary. Machine translation: the production of text in natural language from that in another natural language by means of computer programming and software; automatic translation of one language into another. Macrostructure: the overall organizational scheme of a reference work, often starting with an alphabetical wordlist. 
Main entry: a lexical unit (word or phrase) which is included in the wordlist defined in a dictionary. Canonical forms are almost always included. Also called entry, entry head, entry word, headword, keyword, lemma, word entry; compare sub-entry, run-on. Meaning: the relationship between a word or phrase and object(s) or idea(s) which it designates, i.e. what a lexical unit denotes or conveys; a description (in a dictionary) of a concept referred to or implied by a word or phrase; also called sense. Meaning discrimination: the division of distinct meanings (senses) of a word or phrase within a dictionary entry; also called sense discrimination. Meaning elicitation: any process, now often computational, used to determine the meaning of a lexical unit, piece of writing, taxonomy, or other knowledge unit or structure. Megastructure: the scope and totality of the parts of a reference work. Meronymy: a hierarchical sense relation linking the subordinate parts to the superordinate whole, such as finger is a part of hand; the semantic relation connecting a part to the whole. Metadata: data providing information about other data, of which there are distinct types: administrative metadata, descriptive metadata, reference metadata, statistical metadata and structural metadata. Metalanguage: any language used to describe language, the language that describes the meanings or senses of lexical units in a dictionary.
Metalexicography: the study of lexicography and the processes of dictionary-making (also, metalexicographer). Metonymy: the use of a lexical unit when referring to something using the name of something else to which it is closely related, e.g. the White House when referencing the president of the United States. Microstructure: the internal organizational scheme or design of a reference unit within a reference work, providing detailed information about the word or phrase. The microstructure is usually explained in a reference work’s front matter or guide. Monodirectionality: the property of a foreign language dictionary that goes only one way, e.g. English-to-French. Monolingual dictionary: a reference work that describes only one language using the same language, e.g. general dictionaries, learner’s dictionaries. Monosemous: of or having only one sense or meaning. Monosemy: of a word or phrase, the state of having a single meaning; compare polysemy. Morphology: the form and structure of words in a language, especially their change, combination, derivation, and inflection; a branch of grammar concerned with the formation and structure of words and phrases. Multilingual dictionary: a reference work in which the vocabularies of several languages are related to each other using translation equivalents. Multi-word lexical unit (or multi-word unit, multi-word expression): a lexical unit consisting of two or more words which function as a unit (lexeme), both syntactically and semantically; also called multi-word combination (example: express train, out of date). Natural language processing: the application of computer techniques to analyse and generate natural language text, including disambiguation, lemmatization, machine translation, parsing and speech synthesis. Neologism: a new word or a new meaning for an established word; also, the practice of coming up with or coining new words. 
Nesting: the practice of clustering related words/phrases within an entry in a reference work, e.g. the entry for casual would nest the entries casually, casualness at the end of the entry. Nonce word: a word or phrase coined for a particular occasion; also called hapax legomenon, nonce form. Normative dictionary: a reference work which is based on normative attitudes as to how a language should be used, a dictionary written prescriptively rather than descriptively based on facts observed about its usage. Object language: the human language from which the entry words of a dictionary are taken. Obsolete term: in a language, a lexical unit that is no longer used in speech and writing and is considered out of date; such terms survive in earlier writings or speech recordings. Online dictionary: any dictionary that is available on a computer network, such as the Internet/Web, and capable of being searched for word data. Onomasiological dictionary: a reference work arranged by the meaning or concept leading to the lexical unit (word or phrase), a dictionary presenting language as expressions of semantically linked concepts (ideas, meanings); also called semantic dictionary (examples: reverse dictionary, word-finding dictionary, Roget-style thesaurus). Onomastic dictionary: a reference work describing personal or other names such as place names, pseudonyms and surnames.
Ontology: the hierarchical structuring of knowledge about things by subcategorizing them according to their essential (or at least relevant and/or cognitive) qualities; a set of concepts – such as things, events and relations – that are specified in some way (such as specific natural language) in order to create an agreed-upon vocabulary for exchanging information. Open corpus: an open-ended corpus which is compiled from an unlimited number of written and spoken sources and accounts for change in the language. Orthographic word: a lexical unit (word or phrase) distinguished from others by its spelling. Orthography: the system of spelling that a language uses; the principles underlying spelling within a language; the study of spelling. Ostensive definition: a definition which includes a representation of the object or idea being described, as ‘yellow: a color whose hue resembles that of ripe lemons or sunflowers’; also called synthetic definition. In a dictionary this can be supplemented by pictorial illustration or otherwise pointing directly at an object. Overall-descriptive dictionary: a reference book intended to describe the standard and non-standard uses of its vocabulary. Part of speech: the syntactic classification or grammatical role of a sense or entry, e.g. noun, verb, prefix; also called word class. Pedagogical lexicography: any reference work specifically designed for teachers and learners of a foreign or non-native language. Phonetic: of or pertaining to speech sounds. Phonetics: a branch of linguistics concerned with the production and nature of speech sounds, especially in articulatory-biological or acoustic-physical terms. Phonological word: a lexical unit (word or phrase) distinguished from others by its pronunciation. Phonology: a branch of linguistics concerned with the study of speech as a system of sound patterns, especially relationships between syllables and words; the history and theory of sound changes. 
Phrasal verb: a verb combined with a preposition or adverb (or both) that functions as a verb with a specific meaning that differs from the combined meanings of the individual lexical units. Phrase: two or more words functioning as a syntactical and semantic unit with a single grammatical function. Phraseological information: data or a reference work describing fixed expressions, phrases or sentences; data about phrases in syntactic context. Phraseology: the study of phrases, such as fixed expressions, idioms and multi-word expressions. Picture dictionary, visual dictionary: a reference work which focuses on illustrations (images, line drawings) to convey the meaning of terms and, often, the parts comprising the terms. Plagiarism: the illegal copying of definitions from one dictionary into another. Plan: the editorial policies, practices and objectives developed by those involved in a lexicographic project. Pocket dictionary, mini dictionary: a very small dictionary that would fit into a pocket. Political correctness: the avoidance of including language expressions that are or can be perceived as discriminatory or offensive. Polysemous: of or having more than one sense or meaning.
Plural: the form of a lexical unit used for referring to more than one person or thing. Polysemy: of a word or phrase, the state of having more than one sense or meaning. Most lexical units are polysemous and a general dictionary functions to distinguish those senses; compare monosemy. Possessive form: the form of a lexical unit expressing ownership or direct possession. Pragmatic information/pragmatics: word data describing social and cultural rules of speaking such as gesture, intonation and tone, pitch, and other conventions of communication. Prescriptive: pertaining to a strict and authoritarian approach to describing the meanings and uses of lexical units based on normative attitudes as to how a language should be used as opposed to the descriptive which is based on facts observed about a language’s use (noun, prescriptivism). Pronouncing dictionary: a type of reference work which gives information on the pronunciation of lexical units. Pronunciation: the form, production and representation of speech as studied in phonetics and phonology. Pronunciation is codified in dictionaries, mainly by phonetic transcription (using special symbols as from the International Phonetic Alphabet) or respelling (by conventional letters or characters). Pronunciation key: a table that translates the symbols used to represent speech sounds and gives representative words containing those speech sounds. Proper name, proper noun: a noun that denotes a particular entity, organization, person, place and which is usually capitalized. Quotation: a phrase or sentence excerpted from a written or spoken source that contains a form of the entry word and which indicates additional information about its grammatical or semantic characteristics; in a dictionary entry or citation file, a citation; also called example, illustrative citation, illustrative example, illustrative phrase, illustrative quotation, illustrative sentence, verbal illustration. 
Range of application: the semantic and syntactic restrictions of a lexical unit’s use. Range of information: the extent of information offered about the entry words in a reference book; also called range. Reading program: a systematic excerption process for collecting new terms and the context(s) in which they are used. Received Pronunciation, RP: a way of pronouncing British English that is often used as a standard in the teaching of English as a foreign language. Redundancy: information that is expressed more than once or the use of multiple words to express a single idea. Reference skills: the ability to find needed or sought-after information within a dictionary or other reference work; an understanding of the structure and features of a reference work for useful consultation. Reference work: a print or electronic work compiled to convey information. Regional use: a sense of a lexical unit that is particular to a region or locality. Register: a variety of language associated with a particular social context; the style of language, grammar, and words used for particular situations, e.g. informal language used at a party, legal language, jargon of a technology field. Register label: a label used to mark a feature of usage for a word or phrase, such as formality, style, or variation in place and time.
Respelled pronunciation (or respelling): a pronunciation transcription system using conventional letters or characters and a minimum number of diacritical symbols. Retronym: a new word or phrase created from an existing word which distinguishes it from an existing term that was once used alone, e.g. acoustic guitar/electric guitar, cloth diaper/disposable diaper. Reverse dictionary: a reference work which alphabetically lists clue words or phrases that offer users a way of moving from a concept, idea, or meaning to the target word – an inversion of the traditional order; also called onomasiological dictionary, word-finding dictionary. Root: in etymology, the part of a word which is common to a word family and may have cognates in related languages, e.g. Primitive Indo-European *mater- as the root of ‘mother’ and cognates. Root word (or root morpheme): the base of a word; the form of a word after all affixes are removed, e.g. vigour as the root of reinvigorating. Rule-based definition: the use of a rule to define a word, e.g. ‘whom: used instead of “who” as the object of a verb or preposition’ or ‘objective case of who’. Run-in: a derivative, idiom, etc. placed within an entry. Run-on: a derivative, idiom, etc. placed at the end of an entry; a word or phrase not given separate headword status but added as a sub-entry under a word or phrase to which it is related. This is a typical treatment of derivatives that do not require separate definition; compare main entry. Scientific terminology, scientific vocabulary: the words and phrases used by scientists in the context of their professional activities. Semantic: of or concerning meaning or the distinction of meanings of a word or phrase. 
Semantic change: the gain or loss of meanings in words and phrases in the evolution of word usage, created by a number of factors and processes such as narrowing of a sense, broadening of a sense, shift to positive or negative connotation; also called semantic development, semantic drift, semantic progression, semantic shift. Semantic dictionary: a reference work arranged so that the user moves from meaning to word; also called onomasiological dictionary. Semantic field (lexical field): a set of words grouped by meaning which form a conceptual network, e.g. colour terms; an area of human experience or perception that is described by a set of interrelated words/ phrases. Semantic web: a proposed extension of the World Wide Web whose pages have subject matter encoded in them without reliance on keyword in context (KWIC), the goal of which is to make all Internet data machine-readable. Semasiology: the explanation of the meaning of given words or phrases; semantics. Traditional dictionaries supply semasiological information while thesauruses and reverse dictionaries offer the opposite, onomasiological data. Semiotics: the study of the way in which people communicate through signs and symbols. Sense: a meaning conveyed by a lexical unit (word or phrase), one of several meanings that can be established for a word or phrase and described by a definition in a dictionary; also called meaning. Sense discrimination: the division of meanings within a dictionary entry, the treatment of polysemy (multiple meanings) through rationalization, discrimination and display in dictionary entries; also called meaning discrimination.
Sense ordering: the principles employed in a reference work for arranging the different senses (meanings) of the entries. Senses may be ordered historically according to the semantic changes the word/phrase has undergone, or by frequency of use, or logically in relation to a ‘core’ meaning from which other senses have developed. Sense relation: any semantic link between two or more words, such as by homonymy, hyponymy, polysemy, synonymy, complementarity, and antonymy. Two types are distinguished: inclusion (hyponymy, synonymy) and exclusion (antonymy). Slang: any informal word or phrase that is not considered appropriate in certain circumstances, such as formal occasions. Slang used by a particular group of people is sometimes called cant, jargon or argot (also, slang dictionary). Slip: a piece of paper, card, or database entry where a record is made of a lexical unit’s context and any other linguistic information excerpted from a written or spoken source, a record of information from a reading program for a dictionary project. Source language: the language of the entry words, especially in a bilingual dictionary; the language of a text that is translated into another (target) language. Specialized dictionary: any reference work restricted to a subset of language or for a specific target audience, e.g. law dictionary, dictionary of early American English. Specialized sense: a connotative, figurative, or idiomatic meaning of a lexical unit (word or phrase) that exists only in a special context; the narrowing of a sense within a particular context, e.g. virus in computing. Spelling: the correct order of the letters in a word; a language’s set of conventions that regulates the way of using graphemes to represent the language in writing. Sprachgefühl: an intuition for language meaning and usage, a sensitivity to language, especially for what is grammatically or idiomatically acceptable in a given language. 
Standard-descriptive dictionary: a reference book intended to describe the standard uses of its vocabulary. Status label: a label used to mark the acceptability or currency of a word or phrase in a dictionary, e.g. obsolete, rare. Style label (or stylistic label): a label used to mark the style level of a word or phrase in a dictionary, esp. the formality and social acceptability of a sense’s or entry’s use, e.g. colloquial, formal, informal, nonstandard, slang, vulgar; sometimes called status label. Subentry (or sub-entry): a listed or defined derivative of a lexical unit, one of the numbered senses of a headword within a dictionary entry; a derivative, idiom, etc. listed or defined within or following a dictionary entry; compare main entry. Subject label: a label used to mark the field, domain, or subject of a sense or entry, e.g. chemistry, computer science, psychology. Subsense (or sub-sense): one of the distinct meanings of a polysemous word, often marked by a number; a meaning that follows or is attached to a main sense of a word or phrase and which gives a more specific meaning or use. Substitution principle/substitutability: a method of defining in which the definition text is substitutable for the lexical unit (word or phrase) in context; the principle that a word or phrase in a text can be replaced by its dictionary definition for certain categories of words.
Syllabification (or syllabication): a system of depicting word division for writing, using a symbol, as a centred dot or dividing line, to show the acceptable division points; the division of words into phonic syllables and their written representation by graphic syllables for purposes of hyphenation. Symbol: a mark, letter, number, picture or shape used to represent something. Synonym: a lexical unit (word or phrase) whose meaning is similar to that of another lexical unit or units. Synonymy varies in degree and nature and there are no true synonyms as no two words have exactly the same sense in terms of denotation, connotation, formality or currency. Synonym dictionary: a dictionary that contains information on words or phrases grouped by semantic similarity, offering lexical choices for expressing a specific meaning. Synonymy: a paragraph, usually following an entry that lists and discriminates lexical units similar to the entry word in meaning and usage, a display of synonym relations in the form of a short essay; also called synonym paragraph, synonym study. Syntax: a branch of grammar concerned with the part of speech or word class of lexical units (words and phrases) and their compatibility within sentences and texts; the grammatical information about a lexical unit. Synthetic definition: a definition which includes a representation of the object or idea being described, as ‘yellow: a color whose hue resembles that of ripe lemons or sunflowers’; also called ostensive definition. In a dictionary this can be supplemented by pictorial illustration or otherwise pointing directly at an object. Taboo word, bad word: a lexical unit which is considered unacceptable, offensive, or politically incorrect. Target language: the language of the translations or equivalents in a bilingual dictionary; the language into which a source language is to be translated. 
Taxonomy: a classification system created to order or connect relationships between the entities; a classification of terms and concepts to show their relations. Technical vocabulary: the words and phrases used by technologists in the context of their professional activities. Terminological dictionary: a type of reference work providing information about a special vocabulary or the vocabulary of a specialist field, e.g. Dictionary of Lexicography. Thematic dictionary: a type of reference work that is organized by topics or concepts, e.g. a thesaurus. Theoretical lexicography, metalexicography: the scholarly discipline of analysing and describing the semantic, syntagmatic and paradigmatic relationships within the lexicon (vocabulary) of a language, developing theories of dictionary components and structures linking the data in dictionaries, the needs for information by users in specific types of situations, and how users may best access the data incorporated in printed and electronic dictionaries; also, the academic study of lexicography, especially its historical principles and research methods. Thesaurus: 1 a wordlist with synonyms, either arranged by concept/idea or alphabetically; a type of reference work presenting synonym networks between words within concepts, 2 an onomasiological or thematic reference work. Trade name (brand name): a proprietary name or symbol given to a business, company, product or service, which may or may not be registered as a trademark. Trade names that are officially registered and legally protected are trademarks. Some achieve generic status and become part of everyday language, e.g. Kleenex. Translation equivalent: the translation (in the target language) of a lexical unit (in the source language) of a bilingual dictionary, a word or phrase in one language which corresponds in meaning to a word or phrase in another language; also called equivalent.
Troponym: a lexical unit that describes a manner of doing something, especially a verb that indicates more precisely the manner of doing something, e.g. stroll is a troponym of walk. Typifying definition: a definition that focuses on what is typical about the lexical unit being defined, e.g. abaya = a long, loose-fitting overgarment, typically made of wool, traditionally worn in some Arab countries. Typography: the art and craft of composing type and fonts; the arrangement and appearance of words and letters in a printed document. Typology: the classification of dictionaries and other reference works by their types, including by coverage, format, functionality, information included, language(s), medium, size, users. Unabridged: complete, comprehensive in coverage; inclusive of standard and nonstandard meanings and uses. In a dictionary family, this is the largest in size. Usage: the manner in which lexical units are used syntactically and semantically according to time, space, or circumstance in a language or by a part of society. Usage information is collected and presented in a dictionary often by descriptive labels or usage notes and/or by giving example sentences. Usage label: the marking of a word, phrase or sense for a syntactic or semantic restriction; the marking of a word or phrase as typical or appropriate in a particular context or language variety. Usage note: a note or paragraph offering explanation or guidance on syntactic or semantic restrictions on a lexical unit (word or phrase). User information needs: the identification of the types of information most needed by dictionary users, including accessibility, design, format, level of authority, coverage and scope. Variant: a form of a word or phrase that is different from the standard (most commonly/frequently used) form – in spelling, pronunciation or grammar. 
Verbal illustration: a phrase or sentence excerpted from a written or spoken source or written by a lexicographer that contains a form of the entry word and which indicates additional information about its grammatical or semantic characteristics, such as its context and usage; also called citation, example, illustrative citation, illustrative example, illustrative phrase, illustrative quotation, illustrative sentence. Vocabulary: the total list of lexical units (words, phrases) chosen for entry in a dictionary; also called wordlist; the sum total of the words used in a language or by a speaker of a language. Word entry: a lexical unit (word or phrase) which is included in the wordlist defined in a dictionary. Canonical forms are almost always included. Also called entry, entry head, entry word, headword, keyword, lemma, main entry. Wordlist: the total list of lexical units (words, phrases) chosen for entry in a dictionary; also called vocabulary.
Annotated bibliography
Howard Jackson
The aim of this bibliography is to point the researcher in lexicography in the direction of the most useful sources and to further bibliographical material. It cannot possibly list all the publications on (meta-)lexicography that have appeared over the last half-century or so. For older works, bibliographies listed under 8.1 below may be consulted. The focus in this bibliography is on more recent and up-to-date work, and specifically on dictionary research rather than dictionaries per se. The bibliography is not a simple alphabetical list but is organized under a number of topic headings, which are intended to act as a guide to the reader. In topic 2 and from topic 5 onwards, works are listed in reverse chronological order (latest first).
Topic headings:
1. Bibliographies (of dictionary research)
2. Encyclopaedias, compendia and dictionaries of lexicography
3. Book series
4. Journals
5. Manuals
6. Textbooks
7. Historical lexicography
8. Bilingual lexicography
9. For learners
10. Lexicography of individual languages
11. Electronic lexicography
12. Dictionary use
13. Other works
1 Bibliographies (of dictionary research) Euralex Bibliography of Lexicography, edited by Anne Dykstra, online at: http://euralex.pbworks.com/w/page/7230036/FrontPage. [still under construction, though not updated since 2012; participation from lexicographers invited; it includes: a thematic list, an alphabetical list, and R.R.K. Hartmann’s bibliography (updated to July 2007)] Bibliografía temática de la lexicografía (2003), compiled by Félix Córdoba Rodríguez at the Universidade da Coruña, online at http://www.udc.es/grupos/lexicografia/bibliografia.htm. [the alphabetical listing is complete and contains over 10,000 items; but the thematic listing has not been finished]
Boccuzzi, C., M. Centrella, M. Lo Nostro and V. Zotti (2007), Bibliographie thématique et chronologique de métalexicographie 1950–2006, Fasano: Schena Editore. [with a concentration on French metalexicography, this volume presents its bibliography by topic and by chronology] Dolezal, F.T. and D.R. McCreary (1999), Pedagogical Lexicography Today. A Critical Bibliography on Learners’ Dictionaries with Special Emphasis on Language Learners and Dictionary Users (Lexicographica. Series Maior 96), Tübingen: Max Niemeyer. [an annotated bibliography of over 500 articles in the field of pedagogical lexicography, with a topic index and commentary addressing the issues and debates within the field] Wiegand, H.E. (2006–7), Internationale Bibliographie zur germanistischen Lexikographie und Wörterbuchforschung, 3 vols, Berlin: Walter de Gruyter. [International Bibliography of German Lexicography and Dictionary Research; Vol 1, A-H; Vol 2, I-R; Vol 3, S-Z]
2 Encyclopaedias, compendia and dictionaries of lexicography Ogilvie, S. (ed.) (2020), The Cambridge Companion to English Lexicography, Cambridge: Cambridge University Press. [aims to contextualize an array of English dictionaries and pose theoretical and methodological questions relating to their role as tools of standardization, prestige, power, education, literacy, and national identity; comprising 27 chapters in three parts: I Issues in English lexicography; II English dictionaries throughout the centuries; III Dictionaries of English and related varieties] Fuertes-Olivera, P.A. (ed.) (2018), The Routledge Handbook of Lexicography, Abingdon and New York: Routledge. [47 chapters in six parts: I Foundations of lexicography; II The interdisciplinary nature of lexicography; III Types of dictionary; IV Innovative dictionaries; V World languages, lexicography and the Internet; VI Looking to the future: lexicography in the Internet era] Hanks, P. and G-M. de Schryver (eds) (2017), International Handbook of Modern Lexis and Lexicography, Berlin and Heidelberg: Springer Verlag. [a ‘live’ reference work, currently with 28 chapters, available at https://link.springer.com/referencework/10.1007/978-3-642-45369-4#about; it ‘deals with every aspect of lexicography in all major languages, together with area studies of lexicography in indigenous languages, as well as rare and endangered languages’] Durkin, P. (ed.) (2016), The Oxford Handbook of Lexicography, Oxford: Oxford University Press. [36 chapters in four parts, together with an introduction by the editor and an appendix of ‘A Chronology of Major Events in the History of Lexicography’: Part I The synchronic dictionary; Part II Historical dictionaries; Part III Specialist dictionaries; Part IV Specific topics] Gouws, R.H., U. Heid, W. Schweickard and H.E. Wiegand (eds) (2014), Dictionaries. An International Encyclopedia of Lexicography. 
Supplementary Volume: Recent Developments with Focus on Computational Lexicography, Berlin: De Gruyter Mouton. [supplement to Hausmann et al. (1989–91) – see below; special attention has been given to the following topics: the status and function of lexicographic reference works, the history of lexicography, the theory of lexicography, lexicographic processes, lexicographic training and lexicographic institutions, new metalexicographic methods, electronic and, especially, computer-assisted lexicography] Wiegand, H.E., M. Beißwenger, R.H. Gouws, M. Kammerer, A. Storrer and W. Wolski (eds) (2010), Wörterbuch zur Lexikographie und Wörterbuchforschung/Dictionary of Lexicography and Dictionary Research, Berlin: Walter de Gruyter. [first volume (A–C) of a proposed four-volume dictionary containing the specialist terminology of dictionary research in about 5,600 headwords, 7,200 reference headwords and 50,000 headword equivalents in nine languages, with an introduction to the subject in both English and German]
Fontenelle, T. (ed.) (2008), Practical Lexicography: A Reader, Oxford: Oxford University Press. [a collection of significant articles on issues of dictionary compiling, under the following headings: I Metalexicography, macrostructure, microstructure, and the contribution of linguistic theory; II Corpus design; III Lexicographical evidence; IV Word senses and polysemy; V Collocations, idioms, and dictionaries; VI Definitions; VII Examples; VIII Grammar and usage in dictionaries; IX Bilingual lexicography; X Tools for lexicographers; XI Semantic networks and wordnets; XII Dictionary use] Hartmann, R.R.K. (ed.) (2003), Lexicography: Critical Concepts, 3 vols, London and New York: Routledge. [Vol. 1: Dictionaries, Compilers, Critics and Users; Vol. 2: Reference Works across Time, Space and Languages; Vol. 3: Lexicography, Metalexicography and Reference Science; 70 previously published articles, mostly from the twentieth century, collected together to relate to the themes represented in the titles of the volumes]. Burkhanov, I.Y. (1998), Lexicography: A Dictionary of Basic Terminology, Rzeszów: Rzeszów Wydawn. Wyższej Szkoły Pedagogicznej w Rzeszowie. Hartmann, R.R.K. and G. James (1998), Dictionary of Lexicography, London and New York: Routledge. [a comprehensive listing of lexicographical terms, together with an extensive bibliography; second, revised paperback edition published in 2001] Martínez De Sousa, J. (1995), Diccionario de lexicografía práctica (Dictionary of Practical Lexicography), Barcelona: Biblograf. Hausmann, F-J., O. Reichmann, H.E. Wiegand and L. Zgusta (1989–91), Wörterbücher/Dictionaries/ Dictionnaires: An International Encyclopedia of Lexicography, vols 1–3, Berlin: Walter de Gruyter. [No 5 in the series Handbücher zur Sprach- und Kommunikationswissenschaft; articles organized in 38 sections covering the whole range of lexicography, representing the state of the art at the end of the 1980s; vols 1–3. 
Available at http://www.degruyter.com/view/serial/16647]
3 Book series Études de lexicologie, lexicographie et dictionnairique, series edited by Bernard Quémada and Jean Pruvost, published by Honoré Champion, Paris. [The series is devoted to lexicology, lexicography and dictionary science; the issues raised by computerization in lexicography and dictionary science, the distinctions between monolingual and bilingual dictionaries, the study of words based on the latest research, are some of the subjects treated in this series.] Lexicographica: Series Maior. Supplementbände zum Internationalen Jahrbuch für Lexikographie, Max Niemeyer Verlag, Tübingen (until Vol. 134 in 2008); Walter de Gruyter Verlag, Berlin/New York (from 2009). [supplementary volumes to Lexicographica, the international annual of lexicography; since 1984 over 150 volumes have been published in this series, which constitutes an international library for the field of lexicography and dictionary research; the published volumes represent the whole range of current perspectives, from dictionary history and dictionary typology to dictionary criticism, and from dictionary use and dictionary structure to computational lexicography. Details of volumes published can be found at: http://pub.ids-mannheim.de/extern/lex/] Terminology and Lexicography Research and Practice, John Benjamins Publishing Co, Amsterdam [aims to provide in-depth studies and background information pertaining to lexicography and terminology; general works include philosophical, historical, theoretical, computational and cognitive approaches; other works focus on structures for purpose- and domain-specific compilation (LSP), dictionary design, and training; the series (20 volumes published between 1999 and 2020, with list available at https://benjamins.com/catalog/tlrp) includes monographs, state-of-the-art volumes and course books in the English language]
4 Journals (see also Chapter 25) Cahiers de lexicologie, Institut de Linguistique Française, since 1959, two parts per year. http://atilf.atilf.fr/jykervei/cahlex.htm Dictionaries, Dictionary Society of North America, since 1979, single annual volume. https://dictionarysociety.com/ International Journal of Lexicography, Oxford University Press, for European Association for Lexicography, since 1988, four parts per year. https://academic.oup.com/ijl Lexicographica: Internationales Jahrbuch der Lexikographie, Walter de Gruyter (up to 2008, Max Niemeyer), since 1985, single annual volume. https://www.degruyter.com/view/journals/lexi/lexioverview.xml?lang=en Lexiconordica, Nordisk Forening for Leksikografi, since 1994, single annual volume. http://nordisksprogkoordination.org/nfl/publikationer/Lexiconordica Lexicon, Kenkyusha, for Iwasaki Linguistic Circle, Tokyo, since 1972, at least one volume per year. See https://globalex.link/publications/Lexicon/ Lexikos, Bureau of the WAT and African Association of Lexicography, since 1991, single annual volume (since 2011 available freely online at https://lexikos.journals.ac.za/pub). Revista de Lexicografía, Universidade da Coruña (Grupo de Lexicografía), since 1994/95, single annual volume. https://www.udc.es/grupos/lexicografia/revista.htm Studi di lessicografia italiana, Accademia della Crusca, since 1979, single annual volume (not all years). https://accademiadellacrusca.it/it/contenuti/studi-di-lessicografia-italiana/1225 Trefwoord, Fryske Akademy, since 1999, single annual volume; combined in 2016 with De Woordenaar and published by the Instituut voor de Nederlandse Taal; available online at: https://ivdnt.org/onderzoek-a-onderwijs/publicaties/trefwoord.
5 Manuals Atkins, B.T.S. and M. Rundell (2008), The Oxford Guide to Practical Lexicography, Oxford: Oxford University Press. [after an introductory chapter, there are three parts: (1) pre-lexicographic planning; (2) analysing the data; (3) compiling the entry] Fontenelle, T. (ed.) (2008), Practical Lexicography: A Reader, Oxford: Oxford University Press. [a collection of significant articles on issues of dictionary compiling, under the following headings: I Metalexicography, macrostructure, microstructure, and the contribution of linguistic theory; II Corpus design; III Lexicographical evidence; IV Word senses and polysemy; V Collocations, idioms, and dictionaries; VI Definitions; VII Examples; VIII Grammar and usage in dictionaries; IX Bilingual lexicography; X Tools for lexicographers; XI Semantic networks and wordnets; XII Dictionary use] De Villers, M-E. (2006), Profession lexicographe, Montréal: Les Presses de l’Université de Montréal. [an introduction to lexicography and the profession of the practical lexicographer] Svensén, B. (2004), Handbok i lexicografi. Ordböcker och ordboksarbete i teori och praktik, Stockholm: Norstedts Akademiska Förlag. [translated into English and published in 2009 as A Handbook of Lexicography. The Theory and Practice of Dictionary-Making, Cambridge University Press; intended as a general orientation to the principles and methods of lexicography] Van Sterkenburg, P. (ed.) (2003), A Practical Guide to Lexicography, Amsterdam and Philadelphia: John Benjamins. [a collection of 29 contributions, intended as a coursebook aimed at ‘professional lexicographers and students of language’, organized in two parts: 1 The forms, contents and uses of dictionaries; 2 Linguistic corpora (databases) and the compilation of dictionaries] Bergenholtz, H. and S. Tarp (eds) (1995), Manual of Specialized Lexicography. The Preparation of Specialized Dictionaries, Amsterdam and Philadelphia: John Benjamins. [aims to provide an improved
foundation for practical LSP lexicography and a manual for would-be LSP dictionary makers; chapters cover the range of information that such dictionaries may contain] Svensén, B. (1993), Practical Lexicography: Principles and Methods of Dictionary-Making, Oxford: Oxford University Press. [systematically and comprehensively covers monolingual and bilingual lexicography; updated by Svensén (2004)] Boguraev, B. (ed.) (1991), Building a Lexicon, special issue of the International Journal of Lexicography, 4 (3). [three articles on the contributions of, respectively, lexicography, linguistics and computers to building a lexicon] Zgusta, L. (1971), Manual of Lexicography (Janua Linguarum, Series Maior 39), The Hague: Mouton. [the original manual, now out of print]
6 Textbooks Vrbinc, M. (2020), The ABC of Lexicography, Saarbrücken (Germany): Lambert Academic Publishing. [a step-by-step introduction to the art and craft of dictionary-making from the point of view of a lexicographer and dictionary user: what types of dictionaries there are, who uses dictionaries and what users look up, what a dictionary consists of, what the elements of the word-list are, which elements a dictionary entry has, what types of specialized dictionaries there are, what terminological dictionaries are and how corpora are used in lexicography] Klotz, M. and Th. Herbst (2016), English Dictionaries. A Linguistic Introduction, Berlin: Erich Schmidt Verlag. [discusses dictionary structure, types of information in monolingual and bilingual dictionaries, electronic dictionaries, lexicographic data; and concludes with a survey of English dictionaries] Herbst, Th. and M. Klotz (2003), Lexikografie. Eine Einführung, Paderborn: Schöningh (UTB). [deals, among other things, with the form of definitions, the selection of examples, the handling of collocations and idioms, and syntactic information in monolingual and bilingual dictionaries] Jackson, H. (2002), Lexicography. An Introduction, London and New York: Routledge/Taylor & Francis. [overview of the history, types and content of dictionaries, with a concentration on English monolingual dictionaries, including those for learners] Hartmann, R.R.K. (2001), Teaching and Researching Lexicography (Applied Linguistics in Action Series), Harlow: Longman/Pearson Education. [deals with the relationship between lexicographic theory and practice in four sections: I Lexicography in Practice and Theory, II Perspectives on dictionary research, III Issues, methods and case studies, IV Resources] Landau, S.I. (2001), Dictionaries: The Art and Craft of Lexicography, Cambridge: Cambridge University Press. 
[second edition of a classic introductory account of the lexicography of English by a working lexicographer, with particular attention to the influence of the corpus revolution, and a useful chapter on legal and ethical issues] Béjoint, H. (2000), Modern Lexicography: An Introduction, Oxford: Oxford University Press. [originally published as Tradition and Innovation in Modern English Dictionaries in 1994, a useful introduction to the study of lexicography] Kipfer, B.A. (1984), Workbook on Lexicography (Exeter Linguistic Studies 8), Exeter: University of Exeter. [a guide to the study of lexicography, explaining the processes that lexicographers go through, with an especially useful section on types of defining style]
7 Historical lexicography Mitchell, L.C. (2020), A Cultural History of English Lexicography, 1600–1800: The Authoritative Word, Abingdon and New York: Routledge. [this volume analyses the complex and changing relationship in
the early modern period between authority and lexicographers, identifies ways in which lexicographers constructed their authority, examines the link between the conservative and the subversive in dictionaries, and charts the shift of linguistic authority from grammarians to lexicographers] Considine, J. (ed.) (2019), The Cambridge World History of Lexicography, Cambridge: Cambridge University Press. [a survey of dictionary making from the ancient civilizations of Mesopotamia, Egypt, China, India, and the Greco-Roman world, to the contemporary speech communities of every inhabited continent; 32 chapters arranged in four parts: I The Ancient World; II The Pre-Modern World; III The Modern World: Continuing Traditions; IV The Modern World: Missionary and Subsequent Traditions] Considine, J. (2017), Small Dictionaries and Curiosities: Lexicography and Fieldwork in Post-Medieval Europe, Oxford: Oxford University Press. [investigates the first European wordlists of minority and unofficial languages and dialects, from the end of the Middle Ages to the early nineteenth century, collected by people who were curious about the unrecorded or little-known languages they heard around them; they document more than 40 language varieties, from a Basque-Icelandic pidgin of the North Atlantic to the Kalmyk language of the lower Volga] Adams, M. (ed.) (2010), ‘Cunning Passages, Contrived Corridors’: Unexpected Essays in the History of Lexicography, Monza: Polimetrica. [a collection of essays covering the history of lexicography, historical lexicography, and historical lexicology] Cowie, A.P. (ed.) (2009), The Oxford History of English Lexicography, 2 vols, Oxford: Oxford University Press. [Vol. 1 General-Purpose Dictionaries; Vol. 2 Specialized Dictionaries; covers from Middle Ages to present, chronologically in Vol. 1 and thematically in Vol. 2]. San Vicente, F. (ed.) (2008–10), Textos fundamentales de la lexicografía italoespañola (1917–2007), 3 vols, Monza: Polimetrica. 
[a joint history of Italian and Spanish lexicography, examining dictionaries published in the two countries in the nineteenth and twentieth centuries; a fourth volume is intended to cover 1570 to 1805] Considine, J. (2008), Dictionaries in Early Modern Europe: Lexicography and the Making of Heritage, Cambridge: Cambridge University Press. [draws on published and archival material to survey a wide range of dictionaries of western European languages (including English, German, Latin and Greek) published between the early sixteenth and mid-seventeenth centuries] Considine, J. and G. Iamartino (eds) (2008), Words and Dictionaries from the British Isles in Historical Perspective, Newcastle upon Tyne: Cambridge Scholars Publishing. [11 papers from the second International Conference on Historical Lexicography and Lexicology, with an emphasis on lexicography and dictionaries of English] Yong, H. and J. Peng (2008), Chinese Lexicography: A History from 1046 BC to AD 1911, Oxford: Oxford University Press. [covers three millennia and 600 titles, including general-purpose dictionaries, dialect dictionaries, LSP dictionaries and encyclopaedias; with a primary focus on monolingual lexicography] Coleman, J. (2004–10), A History of Slang and Cant Dictionaries, 4 vols, Oxford: Oxford University Press. [Vol. 1 covers the period 1567–1784, Vol. 2, 1785–1858, Vol. 3, 1859–1936; Vol. 4, 1937–84]. Coleman, J. and A. McDermott (eds) (2004), Historical Dictionaries and Historical Dictionary Research. Papers from the International Conference on Historical Lexicography and Lexicology, Leicester 2002 (Lexicographica. Series Maior 123). Tübingen: Max Niemeyer. [papers from the first conference on historical lexicography and lexicology, in two parts; Part 1 has 12 articles on dictionary history, Part 2 has 6 articles on historical dictionaries] Hayakawa, I. (2001), Methods of Plagiarism. A History of English-Japanese Lexicography, Tokyo: Jiyūsha. 
[traces the development of the methods of compiling English–Japanese dictionaries, especially in the 1860s and 1870s; special attention is paid to early English–Japanese dictionaries such as the Eiwa-Taiyaku-Shuchin-Jisho (1862) and the Fuon-Sozu-Eiwa-Jii (1873), which enjoyed great popularity] Hüllen, W. (1999), English Dictionaries 800–1700. The Topical Tradition, Oxford: Clarendon Press. [taking in some 400 titles, a discussion of the onomasiological (topical) strand of dictionary making, especially in English, from its beginnings in the ninth century to the end of the seventeenth, from Aelfric to Wilkins]
Annotated Bibliography
Van Hoof, H. (1994), Petite histoire des dictionnaires (Bibliothèque des Cahiers de l’Institut de Linguistique de Louvain 77), Louvain-la-Neuve: Peeters. [a review of monolingual and bilingual dictionaries from antiquity to the present] Boisson, C., P. Kirtchuk and H. Béjoint (1991), ‘Aux origines de la lexicographie: les premiers dictionnaires monolingues et bilingues’, International Journal of Lexicography 4 (4), 261–315.
[traces the origins of dictionaries back to ancient civilizations across the world]
James, G. (ed.) (1989), Lexicographers and their works, Exeter: Exeter University
[19 articles on a range of lexicographical topics, including nine on the history of lexicography]
8 Bilingual lexicography
Dominguez Vázquez, M.J., M. Mirazo Balsa and V. Riveiro (eds) (2020), Studies on Multilingual Lexicography (Lexicographica. Series Maior 157), Berlin: de Gruyter. [three chapters on ‘multilingual lexicography in a new society’ and eight chapters on ‘multilingual electronic dictionaries’] Tallarico, G. (2016), La dimension interculturelle du dictionnaire bilingue, Paris: Honoré Champion. [examines cultural aspects of four French–Italian and Italian–French dictionaries, especially analysing lexical gaps, examples of use, cultural notes and false borrowings] Stark, M. (2011), Bilingual Thematic Dictionaries (Lexicographica. Series Maior 140), Berlin: Walter de Gruyter. [identifies the characteristic features of bilingual thematic dictionaries, evaluates their usefulness, and proposes improvements] Hartmann, R.R.K. (2007), Interlingual Lexicography. Selected Essays on Translation Equivalence, Contrastive Linguistics and the Bilingual Dictionary (Lexicographica. Series Maior 133), Tübingen: Max Niemeyer. [a collection of 24 essays by Hartmann on the topic of translation equivalence and its treatment in the bilingual dictionary, especially from the perspective of the user] Yong, H. and J. Peng (eds) (2007), Bilingual Lexicography from a Communicative Perspective, Amsterdam and Philadelphia: John Benjamins. [presentation of the ‘communicative theory of lexicography’, pioneered by Yong and Peng, and of the empirical investigation that underpins it] Adamska-Sałaciak, A. (2006), Meaning and the Bilingual Dictionary. The Case of English and Polish, Frankfurt: Peter Lang. [four chapters exploring the field of bilingual lexicography, entitled ‘Bilingual lexicography’, ‘Capturing meaning’, ‘Hunting for equivalents’, and ‘Giving examples’] Chan, Sin-Wai (ed.) (2004), Translation and Bilingual Dictionaries. Papers from the Hong Kong 2002 Conference (Lexicographica. Series Maior 119), Tübingen: Max Niemeyer.
[after an introductory chapter by the editor on dictionaries and translators, the volume has two parts; Part 1 has 8 articles on translation and bilingual dictionaries, Part 2 has 8 articles on bilingual dictionaries and intercultural communication] Ferrario, E. and V. Pulcini (eds) (2002), La Lessicografia Bilingue tra presente e avvenire, Vercelli: Ed. Mercurio. [papers from a conference on bilingual lexicography, concentrating on Italian, held in Vercelli, Italy in May 2000] Szende, T. (ed.) (2000a), Dictionnaires bilingues. Méthodes et contenus, Paris: Honoré Champion. [a collection of papers on bilingual lexicography given at the first ‘Journée sur la Lexicographie bilingue’, in 1998] Szende, T. (ed.) (2000b), Approches contrastives en lexicographie bilingue, Paris: Honoré Champion. [papers from the second ‘Journée sur la Lexicographie bilingue’, in 1999] Béjoint, H. and P. Thoiron (eds) (1996), Les dictionnaires bilingues (Champs linguistiques), Louvain-la-Neuve: Aupelf-Uref-Duculot. [12 contributions on bilingual dictionaries] Farina, D.M.T. (ed.) (1996), The Translational Equivalent in Bilingual Lexicography (thematic issue of Lexicographica, International Annual Vol. 12), Tübingen: Max Niemeyer.
Bartholomew, D.A. and L.C. Schoenhals (1983), Bilingual Dictionaries for Indigenous Languages, México: Summer Institute of Linguistics. [a practical manual for the fieldworker on the preparation of bilingual dictionaries, oriented towards the languages of Central America but generalizable to the indigenous languages of other areas]
9 For learners
Heuberger, R. (2018), ‘Dictionaries to Assist Teaching and Learning’, in P.A. Fuertes-Olivera (ed.), The Routledge Handbook of Lexicography, Abingdon and New York: Routledge, 300–16. [focuses on examining the features and core issues peculiar to monolingual learners’ dictionaries, including in their electronic format] Bielińska, M. (2010), Lexikographische Metatexte. Eine Untersuchung nichtintegrierter Außentexte in einsprachigen Wörterbüchern des Deutschen als Fremdsprache, Frankfurt: Peter Lang. [a discussion of the ‘outer texts’ (front and back matter) in German monolingual learners’ dictionaries] Fuertes-Olivera, P.A. (ed.) (2010), Specialised Dictionaries for Learners (Lexicographica. Series Maior 136), Berlin/New York: De Gruyter. [a contribution to pedagogical specialized lexicography, it argues for the need for better specialized dictionaries for learners based on a sound theoretical framework] Kernerman, I.J. and P. Bogaards (eds) (2010), English Learners’ Dictionaries at the DSNA 2009, Tel Aviv: K Dictionaries Ltd. [eleven papers given at the special seminar at the Dictionary Society of North America conference of 2009 by contributors from a range of countries, reflecting on the learners’ dictionary tradition in English] Tarp, S. (2008), Lexicography in the Borderland between Knowledge and Non-knowledge. General Lexicographical Theory with Particular Focus on Learner’s Lexicography (Lexicographica. Series Maior 134), Tübingen: Max Niemeyer. [proposes ‘function theory’, which is then applied to and illustrated from learners’ dictionaries] Welker, H.A. (2008), Panorama geral da lexicografia pedagógica, Brasilia: Thesaurus. [aims to provide a general overview of the field of pedagogical lexicography for researchers, students and language teachers; includes summaries of dictionary reviews] Fuertes-Olivera, P.A. and A. Arribas-Baño (2008), Pedagogical Specialized Lexicography, Amsterdam and Philadelphia: John Benjamins.
[deals with specialized dictionaries used to meet pedagogical needs in the teaching of business English and LSP translation to Spanish learners] Leaney, C. (2007), Dictionary Activities, Cambridge: Cambridge University Press. [aimed at language teachers for developing students’ dictionary skills; explains what the features of a dictionary are and how to navigate a dictionary, through to more complex topics such as collocations, idioms and word building; also looks at the use of electronic dictionaries and specialized dictionaries] Huszár, Andrea (2006), Eine vergleichende Untersuchung von Lernerwörterbüchern des Deutschen und Englischen, Hamburg: Verlag Dr. Kovač. [Vol. 3 in the series Angewandte Linguistik aus interdisziplinärer Sicht (ed. K-D. Baumann); a comparison of learners’ dictionaries in German and English, with a particular focus on the treatment of ‘Funktionsverbgefüge’ and compound verbs] Stein, G. (2002), Better Words. Evaluating EFL Dictionaries, Exeter: Exeter University Press. [a collection of papers authored by Stein, some previously unpublished, on EFL lexicography, focusing on the suitability and effectiveness of monolingual and bilingual EFL dictionaries in teaching and learning] Humblé, P. (2001), Dictionaries and Language Learners, Frankfurt: Haag and Herchen. [elaborates a model for a foreign language learner’s dictionary, taking account of didactic and pedagogical implications] Heuberger, R. (2000), Monolingual Dictionaries for Foreign Learners of English: A Constructive Evaluation of the State-of-the-Art Reference Works in Book Form and on CD-ROM (Austrian Studies
in English 87), Vienna: Braumüller. [a critique of late 1990s English dictionaries for advanced learners, in both print and electronic form] Cowie, A.P. (1999), English Dictionaries for Foreign Learners: A History, Oxford: Clarendon Press. [a history of the development of the genre, together with discussions on phraseology, the role of the computer, and user-related research] Herbst, T. and K. Popp (eds) (1999), The Perfect Learners’ Dictionary(?) (Lexicographica. Series Maior 95), Tübingen: Max Niemeyer. [papers from a symposium held in Erlangen in 1997; Part 1 contains 15 papers on the 1995 generation of English learners’ dictionaries; Part 2 has 4 papers on other types of learner dictionary; Part 3 has 4 papers on dictionaries and corpora] Stark, M. (1999), Encyclopedic Learners’ Dictionaries. A Study of their Design Features from the User Perspective (Lexicographica. Series Maior 92), Tübingen: Max Niemeyer. [an examination of this hybrid dictionary seeks to identify what encyclopaedic learners’ dictionaries are as a type, to investigate their usefulness to learners, and to suggest how their design might be improved in order to serve users’ needs better] Zöfgen, E. (1994), Lernerwörterbücher in Theorie und Praxis (Lexicographica. Series Maior 59), Tübingen: Max Niemeyer. [a discussion of the theory and practice of pedagogical lexicography, with special reference to French learners’ dictionaries]
10 Lexicography of individual languages
Burada, M. and R. Sinu (eds) (2020), A Local Perspective on Lexicography. Dictionary Research, Practice and Use in Romania, Newcastle-upon-Tyne: Cambridge Scholars Publishing. [addresses a range of culture-specific topics related to both dictionary-as-process and dictionary-as-product, reflecting the interests and concerns of all the stakeholders on the lexicographic continuum: theorists, practitioners, and dictionary users; topics covered encompass several macro- and micro-structural aspects relating to paper and online, monolingual and bilingual, general and specialized reference works] Van Der Kuip, F. and W. Visser (eds) (2018), Lexicography of Smaller Languages: between foreignism and purism, Special Issue of International Journal of Lexicography 31 (2), Oxford: Oxford University Press. [contains contributions on Basque, Estonian, Flemish Sign Language, Frisian, Irish and Welsh] Giannakis, G.K., C. Charalambakis and F. Montanari (eds) (2018), Studies in Greek Lexicography, Berlin: De Gruyter. [nineteen studies by specialists in the field of Greek lexicography; some papers deal with historical aspects of Greek lexicography covering all phases of the language, i.e. ancient, medieval and modern, as well as the interrelations of Greek to neighbouring languages; other papers address more formal issues, such as morphological, semantic and syntactic problems that are relevant to the study of Greek lexicography, as well as the study of individual words] Russell, L.R. (2018), Women and Dictionary-Making: Gender, Genre and English Language Lexicography, Cambridge: Cambridge University Press. [tracing the craft of dictionary making from the fifteenth century to the present day, this book explores the vital but little-known significance of women and gender in the creation of English language dictionaries] Miyoshi, K. (2017), The First Century of English Monolingual Lexicography, Newcastle-upon-Tyne: Cambridge Scholars Publishing.
[deals with monolingual English dictionaries from 1604 to 1702, as an alternative to the classic account in Starnes and Noyes (1946)] Wei Xiangqing et al. (2014), Lexicography in China (1978–2008), Beijing: The Commercial Press. [a comprehensive survey of dictionaries of all types in the period in China, including print dictionaries, digital dictionaries, dictionaries online and dictionaries of minority languages] Béjoint, H. (2010), The Lexicography of English. From Origins to Present, Oxford: Oxford University Press. [covering both dictionary history and current issues, this work surveys the range of lexicography
in English, in both Britain and the United States, with useful contrastive perspectives with French lexicography, and speculation on the future of the dictionary] Żmigrodzki, P. (2009), Wprowadzenie do leksykografii polskiej, Third edition, Katowice: Wydawnictwo Uniwersytetu Śląskiego. [introduction to Polish lexicography] Correia, M. (2008), Os Dicionários Portugueses, Lisboa: Caminho. [dictionaries of Portuguese] Ishikawa, S., K. Minamide, M. Murata and Y. Tono (eds) (2006), English Lexicography in Japan, Tokyo (Japan): The Jacet Society of English Lexicography and Taishukan Publishing Company. [a collection of articles by Japanese scholars, showing the range of interest in Japan in the lexicography of English, under the following headings: 1 Dictionary and words; 2 Dictionary – analysis and comparisons; 3 Dictionary and pragmatics; 4 Dictionary and gender; 5 Dictionary and education] Pruvost, J. (2006), Les dictionnaires français, outils d’une langue et d’une culture, Paris: Éditions Ophrys. [a review of dictionaries of French from the perspectives of lexicology and lexicography] Ruhstaller, S. and J. Prado Aragonés (eds) (2001), Tendencias en la investigación lexicográfica del español. El diccionario como objeto de estudio lingüístico y didáctico, Huelva: Universidad de Huelva. [trends in the lexicographic investigation of Spanish] James, G. (2000), Colporul: A History of Tamil Dictionaries, Chennai: Cre-A. [a history of Tamil lexicography from the earliest times, set in the context of reference science and sociolinguistics, with an extensive bibliography and discussion of unpublished manuscripts] McArthur, T. and I. Kernerman (eds) (1998), Lexicography in Asia, Tel Aviv: Password Publishers Ltd. [around a dozen papers, largely from a 1997 conference on ‘Dictionaries in Asia’, surveying the field and discussing issues relevant to the Asian continent] Dodd, W.S. (ed.)
(1995), A Survey of Spanish Lexicography/Panorama de la Lexicografía Española, Special Issue of International Journal of Lexicography 8 (3). [4 articles in Spanish on the history of and current issues in Spanish lexicography]
11 Electronic lexicography
Kosem, I., T. Zingano Kuhn, M. Correia, J.P. Ferreria, M. Jansen, I. Pereira, J. Kallas, M. Jakubíček, S. Krek and C. Tiberius (eds) (2019), Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference, 1–3 October 2019, Sintra, Portugal, Brno: Lexical Computing CZ, s.r.o. [the theme of the conference was ‘smart lexicography’, including dictionaries on smartphones, as well as ‘smart’ re-use of dictionary information; papers available for download at https://elex.link/elex2019/proceedings-download/] Kosem, I., C. Tiberius, M. Jakubíček, J. Kallas, S. Krek and V. Baisa (eds) (2017), Electronic Lexicography in the 21st Century. Proceedings of the eLex 2017 conference, 19–21 September 2017, Leiden, the Netherlands, Brno: Lexical Computing CZ, s.r.o. [the theme of the conference was ‘lexicography from scratch’, to investigate state-of-the-art technologies and methods for automating the creation of dictionaries; papers available for download at https://elex.link/elex2017/proceedings-download/] Kosem, I., M. Jakubíček, J. Kallas and S. Krek (eds) (2015), Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference, 11–13 August 2015, Herstmonceux Castle, United Kingdom. Ljubljana and Brighton: Trojina, Institute for Applied Slovene Studies/Lexical Computing Ltd. [the theme of the conference was ‘linking lexical data in the digital age’; it aimed to reflect on strategies for structured dynamic presentation of linked lexical data in modern lexicographic resources; papers are available for download at https://elex.link/elex2015/conference-proceedings/] Mann, M. (ed.) (2014), Digitale Lexikographie. Ein- und mehrsprachige elektronische Wörterbücher mit Deutsch: aktuelle Entwicklungen und Analysen, Hildesheim, Zürich and New York: Georg Olms Verlag.
[10 contributions on digital lexicography, arranged in three parts: (1) Design and preparation of dictionaries; (2) Compilation and further development of digital dictionaries; (3) Studies on existing dictionaries]
Granger, S. and M. Paquot (eds) (2012), Electronic Lexicography, Oxford: Oxford University Press. [after an introduction by Sylviane Granger, Part I contains 6 articles under the title ‘Lexicography at a Watershed’, Part II ‘Innovative Dictionary Projects’ contains 7 articles, and Part III ‘Electronic Dictionaries and their Users’ contains 6 articles] Fuertes-Olivera, P.A. and H. Bergenholtz (eds) (2011), e-Lexicography. The Internet, Digital Initiatives and Lexicography, London: Continuum. [15 papers covering current issues in e-lexicography; Part 1 on ‘function theory’, and Part 2 on specific topics arising from electronic dictionary projects] Kosem, I. and K. Kosem (eds) (2011), Electronic Lexicography in the 21st Century. New Applications for New Users, Ljubljana: Trojina, Institute for Applied Slovene Studies.
[Proceedings of eLex 2011 held in Bled, Slovenia, 10–12 November 2011]
Nesi, H. (2009), ‘Dictionaries in electronic form’, in A.P. Cowie (ed.), The Oxford History of English Lexicography, Vol. 2, Oxford: The Clarendon Press, 458–78. Nielsen, S. (2009), ‘Reviewing printed and electronic dictionaries: A theoretical and practical framework’, in S. Nielsen and S. Tarp (eds), Lexicography in the 21st Century. In honour of Henning Bergenholtz, Amsterdam and Philadelphia: John Benjamins, 23–41. Almind, R. (2005), ‘Designing Internet Dictionaries’, Hermes 34, 37–54. [one of a number of articles in this issue of the journal that deal with e-lexicography; it argues that print and electronic dictionaries require very different design solutions; available to download from: https://tidsskrift.dk/her/issue/view/2845] Haß, U. (ed.) (2005), Grundfragen der elektronischen Lexikographie, Berlin: Walter de Gruyter. [describes the ‘elexico’ on-line dictionary project of German at the Institut für deutsche Sprache, Mannheim – www.elexico.de] Zock, M. and J. Carroll (eds) (2003), Les dictionnaires électroniques: pour les personnes, les machines ou pour les deux?, Paris (France): Association pour le Traitement Automatique des Langues. [Issue 44 (2) of Revue TAL, published by Association pour le Traitement Automatique des Langues] Corréard, M-H. (ed.) (2002), Lexicography and Natural Language Processing. A Festschrift in Honour of B.T.S. Atkins, Grenoble: Euralex. [full text available electronically on the Euralex website at: http://www.euralex.org/elx_proceedings/Lexicography%20and%20Natural%20Language%20Processing/]
12 Dictionary use
Müller-Spitzer, C., A. Koplenig and S. Wolfer (2018), ‘Dictionary usage research in the Internet era’, in P.A. Fuertes-Olivera (ed.), The Routledge Handbook of Lexicography, Abingdon and New York: Routledge, 715–34. [after setting dictionary usage research in its historical context, the article concentrates on use of the methods of online questionnaires, eye-tracking and analysis of log files] Müller-Spitzer, C. (ed.) (2014), Using Online Dictionaries (Lexicographica. Series Maior 145), Berlin: De Gruyter. [11 chapters arranged in four parts: I Basics; II General Studies on Online Dictionaries; III Specialized Studies on Online Dictionaries; IV Studies on Monolingual (German) Online Dictionaries, esp. eLexiko] Lew, R. (ed.) (2011), Studies in Dictionary Use: Recent Developments, special issue of International Journal of Lexicography 24 (1), Oxford: Oxford University Press. [seven articles, including an introductory one by Lew, on aspects of dictionary use research] Welker, H.A. (2010), Dictionary Use, A General Survey of Empirical Studies, Brasilia: Author’s Edition. [an overview of empirical resew), Szótárak és használóik [Dictionaries and their Users], Budapest: Akadémiai Kiadó. Lew, R. (2004), Which dictionary for whom? Receptive use of bilingual, monolingual and semi-bilingual dictionaries by Polish learners of English, Poznań: Motivex. [full text available online at https://repozytorium.amu.edu.pl/handle/10593/655]
Tono, Y. (2001), Research on Dictionary Use in the Context of Foreign Language Learning. Focus on Reading Comprehension (Lexicographica. Series Maior 106), Tübingen: Max Niemeyer. [aims to show how research into dictionary use can contribute to the improvement of dictionary design and the clarification of issues in language learning; it summarizes previous dictionary use research and reports on studies carried out by the author] Nesi, H. (2000), The Use and Abuse of EFL Dictionaries. How learners of English as a foreign language read and interpret dictionary entries (Lexicographica. Series Maior 98), Tübingen: Max Niemeyer. [discusses experimental design problems, especially the unreliability of questionnaires; it proposes the need for detailed accounts of individual dictionary consultations and reports on a number of experiments using computers to gather information on large numbers of individual consultations] Atkins, B.T.S. (ed.) (1998), Using Dictionaries. Studies of Dictionary Use by Language Learners and Translators (Lexicographica. Series Maior 88), Tübingen: Max Niemeyer. [reports on eight studies researching dictionary use by a range of groups, from school students to professors, and with the help of different types of dictionary, both monolingual and bilingual] Battenburg, J.D. (1991), English Monolingual Learners’ Dictionaries. A User-oriented Study (Lexicographica. Series Maior 39), Tübingen: Max Niemeyer. [a description of 11 research projects, all in higher education institutions, that aimed to understand dictionary use better]
13 Other works
Bielińska, M. and S.J. Schierholz (eds) (2017), Wörterbuchkritik – Dictionary Criticism (Lexicographica. Series Maior 152), Berlin: De Gruyter. [contains 17 contributions in German and English, examining the role of dictionary criticism in dictionary research, function theory, user studies, and in cultural history] Bański, P. and B. Wójtowicz (eds) (2011), Issues in Modern Lexicography, München: Lincom Europa. Ooi, V.B., A. Pakir, I.S. Talib and P.K.W. Tan (eds) (2009), Perspectives in Lexicography: Asia and beyond, Tel Aviv: K Dictionaries. [selected revised papers from the fourth ASIALEX conference plus some new ones; three sections – I Asian perspectives (5 papers), II Pedagogical perspectives (10 papers), III General perspectives (5 papers)] Zgusta, L. (2006), Lexicography Then and Now (Selected Essays edited by F.F.M. Dolezal and T.B.I. Creamer) (Lexicographica. Series Maior 129), Tübingen: Max Niemeyer. [updated and edited essays by Zgusta on a range of lexicographical topics, which shed light on and question current theory and practice] Apresjan, J. (2000), Systematic Lexicography, Oxford: Oxford University Press. [a translation into English, by Kevin Windle, of this seminal Russian work on lexicography] Dalby, A. (1998), A Guide to World Language Dictionaries, London: Library Association Publishing. [bibliography of the most important dictionaries of 285 languages and language groups globally, encompassing around 1,640 dictionaries] Kachru, B. and H. Kahane (eds) (1995), Cultures, Ideologies and the Dictionary, Studies in Honor of Ladislav Zgusta, Tübingen: Max Niemeyer. [especially good on the history of lexicography and on the cultural and ideological influences on dictionaries.
Part I Contextualizing Culture; Part II Lexicography in Historical Context; Part III Ideology, Norms, and Language Use; Part IV Pluricentricity and Ethnocentrism; Part V Dictionaries across Languages and Cultures; Part VI Language Dynamics vs Prescriptivism; Part VII Language Learner as the Consumer; Part VIII Structuring Semantics; Part IX Ethical Issues and Lexicologists’ Biases; Part X Terminology across Cultures] Shcherba, L. (1995), ‘Towards a general theory of lexicography’, International Journal of Lexicography 8 (4), 314–50. [an English translation of Shcherba’s article, originally published in Russian in 1940] Burchfield, R.W. (ed.) (1987), Studies in Lexicography, Oxford: Clarendon Press. [a collection of essays dealing with historical, period and modern regional dictionaries such as DARE (Dictionary of American Regional English) and the Australian National Dictionary]
Names Index
Names have been included in the index only where their work has been extensively or significantly cited.
Abel, A. 366
Adamska-Sałaciak, A. 193, 274
Akasu, K. 31, 36, 177
Alcina, A. 89, 335
Atkins, B.T.S. 12, 160, 197, 269–70, 309
Baldinger, K. 325–6, 329
Béjoint, H. 8, 60–1, 64, 151, 159, 268
Bergenholtz, H. 2, 57–8, 214, 391
Bogaards, P. 5, 34, 35, 148, 151, 320
Bolinger, D. 278
Bothma, T.J. 369–71
Bowker, L. 93
Brewer, C. 375
Bukowska, A.A. 62–3
Chi, A. 145
Coleman, J. 62
Considine, J. 131
Corriente, F. 138
Cowie, A.P. 151, 328
De Schryver, G-M. 148–9, 222, 224, 352
Dodd, W.S. 92
Dohi, K. 32, 37, 182
Dubois, J. & C. 6
Durkin, P. 140
Dziemianko, A. 11, 262, 409
Forget, N. 91
Fox, G. 151, 179, 181
Fuertes-Olivera, P.A. 361
Geeraerts, D. 278
Görlach, M. 138
Gouws, R. 2, 149, 214, 215
Hanks, P. 12, 61, 151, 252–3, 262, 271
Hartmann, R.R.K. 2, 5, 32, 60, 61, 158, 159, 329, 390, 423
Hausmann, F-J. 181, 270
Hornby, A.S. 167
Hüllen, W. 267, 271, 273
Inoue, A. 309
Jackson, H. 1, 32, 57, 156, 458
Johnson, S. 6, 285, 288
Josselin-Leray, A. 343
Kawamura, A. 179
Kay, C. 384
Kilgarriff, A. 11, 71, 253, 347, 355
Kipfer, B.A. 257, 258, 441
Kister, K.F. 33
Klosa-Kückelhaus, A. 405
Kosem, I. 44, 348
Kristoffersen, J.H. 227
Kwary, D.A. 370, 399
Landau, S. 6, 58
Lau, J.H. 349, 350
Lehr, A. 90
Lew, R. 10, 46, 49, 65, 90, 158, 251
Liberman, A. 138, 139
Lu, G.S. 310, 313
McArthur, T. 2, 146, 167
McMillan, J.B. 32
Michaelis, F. 405
Moe, R. 288, 289
Müller-Spitzer, C. 44, 46, 65
Murray, J.A.H. 133, 135
Nakamoto, K. 33
Nam, K. 309
Nielsen, S. 156, 389, 390
Nesi, H. 43, 48, 153, 159
Ogilvie, S. 7, 62
Ooi, V.B.Y. 309
Osselton, N.E. 31
Palmer, H.E. 166
Passow, F. 133
Pastor, V. 89
Piotrowski, T. 2, 251, 267
Prinsloo, D.J. 209, 210, 215, 216, 222, 352
Rey, A. 8, 132
Rey-Debove, J. 6, 8
Roget, P.M. 331, 335
Rumshisky, A. 349, 353
Rundell, M. 37, 61, 147, 148, 149, 158, 170, 176, 179–80, 182, 183–4, 197, 214, 254, 269, 309, 352, 356
Sánchez, M.d.M. 92
Sajous, F. 343, 346
Saussure, F. de 303, 325
Schierholz, S.J. 61
Scholfield, P. 148, 176
Sierra, G. 325, 334
Sinclair, J.McH. 60, 61, 168, 181, 252
Steiner, R.J. 32
Stutzman, V. 285
Svensén, B. 178, 179, 269, 317
Swanepoel, P. 34, 63
Tarp, S. 270, 273, 277, 279, 400
Trap-Jensen, L. 19
Troelsgård, T. 227
Ullman, S. 325
Varantola, K. 46
Warfel, K. 285
Welker, H.A. 43, 46
West, M. 166
Wiegand, H.E. 7, 57, 62, 267, 271, 272, 276
Wierzbicka, A. 271, 278
Yamada, S. 149, 151, 165
Zgusta, L. 5, 6
Zhang, S. 46, 48, 51, 337
Zhao, C. 309
Zimmer, B. 212
Zwitserlood, I. 227
General Index
adaptive technology 28
Afrilex 5
anisomorphism 201
bilingual dictionary 32, 49, 193ff, 215–16, 254–6, 259
collaborative lexicography 28, 343ff
collocation 76, 115, 318, 348
common core 23
corpus lexicography 11, 21, 26, 58, 71ff, 196–8, 253, 291–2, 317, 347
crowdsourcing 28, 59, 343ff
customization 27, 149
data box 216–18
database 19, 20, 292–3
definition 61, 132, 179–81, 261–3, 300
dictionary analysis 35–7, 62
dictionary criticism 5, 7, 31ff, 62, 154–6
dictionary design 147
Dictionary Society of North America 5
dictionary typology 7–8, 49, 90, 228, 304
dictionary use 10, 43ff, 63–5, 89ff, 152, 328–30
dictionary users 46–7, 153–4, 157, 193–4, 222, 277, 326, 337, 352
dictionary writing system 26, 58, 59, 214
e-dictionary 28, 38, 48, 50, 51, 89ff, 170, 183, 216, 229, 239–44, 295–7, 333, 338
encyclopaedia 276
equivalents 200, 202–3, 215, 255–6, 259, 274, 300
etymology 136, 137ff, 312
Euralex 5
examples 83, 181–2, 204–5, 301, 312–14, 320
eye-tracking 46, 65
frequency 25, 171, 198, 257–8
full-sentence definition 151–2, 179–80, 261
general-purpose dictionary 137, 145
grammar 172ff, 209–10, 221, 246
graph-based dictionary 335–6
guide words 149, 176–8, 259–60
hapax legomenon 132
headword See lemma
historical lexicography 131ff
historical research 6
idioms 314–17
inclusion policy 25–6
Iwasaki Linguistic Circle 7, 35, 39 n.3, 62
labels 79
    domain 82
    grammatical 80–1
    register 81–2
layout 27
learners’ dictionary 33, 145, 146ff, 165ff
lemma 22–3, 59, 73, 231, 233, 236, 299
lexical knowledge base 334–5
limited defining vocabulary 150, 151, 168, 178–9, 262
linguistics 2, 59–60, 272–3, 277–8, 350
loanword 24
log-file analysis 45–6, 65
macrostructure 8, 147, 294
mediostructure 9, 302–3
megastructure 198
menus 149, 176–8, 259–60
mesostructure See mediostructure
metalexicography 1, 61–2, 267, 272, 279
methodology 57ff
microstructure 9, 200, 297ff
minority languages 285ff
multi-word expressions 73, 199, 220, 252–3, 303, 309ff
neologism 74, 198–9, 346, 350
nesting 147
normative tradition 22
onomasiological dictionary 195, 325ff, 330
open dictionary 352
pedagogical lexicography 145ff, 271
phraseological unit 202
pictorial dictionary 332–3
polysemy 200, 259
practical lexicography 1, 19ff, 37, 57–9, 214, 267
pronunciation 136, 146, 300
protocols 45, 62
questionnaire 44, 62, 152
reference science 2
reference skills 158–60
reverse dictionary 331–2, 338
search types 92, 94–105, 239–42, 295
semantic domains 289
semasiological dictionary 195, 326
sense discrimination 60, 251ff, 349
sense order 256, 298, 311
sign language lexicography 227ff
Sketch Engine 11, 60, 78, 197, 278
style guide 21, 270
subsenses 301
theory (of lexicography) 1, 214, 267ff
thesaurus 331
typography 27, 251
user needs 43, 58
vocabulary learning 10, 48, 148
Webonary 285–6
word list 198–9, 211, 288–9, 294
word sketch 76–9
    bilingual 85