English Pages [349] Year 2011
Preface
On 17 June 2010, the University of Valladolid conferred the degree of Doctor of Lexicography, Honoris Causa, on Henning Bergenholtz. The governing bodies of the university made the decision on two grounds, both of which were stated in the Laudatio read at the graduation ceremony. First, Dr. Bergenholtz has been a professional lexicographer for more than 30 years. During these years, he and his colleagues from the Centre for Lexicography, Aarhus School of Business, University of Aarhus, have developed and set in motion a theory that approaches dictionary-making and reviewing in a way that is very different from both the linguistic colonialism espoused by linguists and the Wiegand model, which is based on a contemplative approach to lexicography. The new model, called the function theory of lexicography, has allowed Dr. Bergenholtz and his colleagues to reshape the field of lexicography completely. Secondly, Dr. Bergenholtz has collaborated and is collaborating with the University of Valladolid in a very productive and scholarly way. For example, he helped us to organize an international symposium on e-lexicography, which was held at the University of Valladolid (14–16 June 2010) and which can be considered a landmark in the field, as participants discussed innovative and workable proposals. Some of the proposals presented have been included in this book, which has been prepared and edited according to accepted academic standards: all the contributions were subject to a peer-review process and discussed by well-known international scholars in the field.

The above claims allow me to thank the following:
• Dr. Bergenholtz for his help in preparing the academic programme.
• The participants in the symposium for bringing to the discussion new and provocative ideas. My special thanks go to the authors who have contributed to this book and to several colleagues who also participated and collaborated to create a friendly and relaxed atmosphere: María José Crespo; Klara Ceberio; Sahat Ugartetxea; Mercedes Jaime; Rocío Jiménez Briones; Ricardo Mairal; Ángel de los Ríos; Pablo Gordo; Jacek Lesinski; Francisco Ruiz de Mendoza; Sol Sta. María; Bernadette Borosi; and Ángeles Sastre.
• The reviewers of the different articles for their insights and thoughtful comments.
• The funding authorities, which provided us with the funds for carrying out the event:
  ◦ Ministerio de Ciencia e Innovación (Grants FFI2008–01703/FILO and FFI2009–07109-E/FILO)
  ◦ Junta de Castilla y León (Grant VA039A09, and BOCYL, de 12 de abril de 2010)
  ◦ Universidad de Valladolid (Ayudas del Vicerrectorado de Investigación)

Pedro A. Fuertes-Olivera
Valladolid, 29 October 2010
Notes on Contributors
Richard Almind is a researcher at the Centre for Lexicography, Aarhus School of Business, University of Aarhus, Denmark. His main task is connected with designing lexicographical databases and preparing software for producing both printed and internet dictionaries. He has published extensively on dictionary layouts, lexicographical databases and several aspects connected with the distinction between information tools and information databases. He is currently working on the design of the database for the Spanish Accounting Dictionary and other internet dictionaries that are being planned and compiled at the Centre for Lexicography. He can be contacted at [email protected].

Birger Andersen is Associate Professor at the Centre for Lexicography, Aarhus School of Business, University of Aarhus, Denmark. His research interests cover areas such as grammatical information in dictionaries, particularly dictionary grammars, the functional rationale for various types of data in dictionaries, and specialized bilingual lexicography. His current research centres on devising and creating a database for the generation of three electronic English–Danish phrasal-verb dictionaries for reception, translation and production, respectively. He can be contacted at [email protected].

Henning Bergenholtz is Professor of Lexicography at the Aarhus School of Business, University of Aarhus, Denmark, where he is also Director of the Centre for Lexicography. Since 2005, he has been Extraordinary Professor at the University of Stellenbosch (South Africa). In 2010, the University of Valladolid conferred the degree of Doctor of Lexicography, Honoris Causa, upon him. Dr. Bergenholtz has published on grammar, lexicography and language policy, and has been the editor of more than 30 dictionaries (for example, the Dictionary of Fixed Expressions). He has published extensively within the field of lexicography, including books (for example, the Manual of Specialised Lexicography) and more than 200 papers. He can be contacted at [email protected].

Inger Bergenholtz has been a piano teacher at Kolding Music School in Denmark since 1988. She graduated in musicology at the University of Copenhagen and afterwards took a degree as a pianist at the Hochschule für Musik in Berlin. After teaching the piano at music schools in Germany, she
went to Denmark in 1987 and worked as a piano teacher and as a chamber musician. She published a print dictionary of music, Politikens Musikordbog, in 1996 and she is the author of papers in lexicographical journals, for example, LexicoNordica and Lexikos. In 2006, she began editing an internet dictionary of music, first www.musikordbogen.dk, now Musikordbogen on www.ordbogen.com. She can be contacted at [email protected].

Theo Bothma is Professor and head of the Department of Information Science at the University of Pretoria, Pretoria, South Africa, and chairperson of the School of Information Technology (2008–2012). His teaching and research focus on information organization and retrieval (including information literacy), web development and electronic publishing, as well as on curriculum development. He has published widely and presented numerous papers at local and international conferences. He was the editor of the IFLA FAIFE World Report 2007 and IFLA World Report 2010 (an interactive web-based publication). He is a member of the editorial boards of Libri, SA Journal of Libraries and Information Science, International Journal for Information Ethics, Online Information Review, New Review of Information Networking and Education for Information. He is a member of LIASA, ACM and CSSA, as well as the IFLA Standing Committee for Knowledge Management, and an expert resource person for IFLA FAIFE. He can be contacted at [email protected].

Pascual Cantos is a Full Professor at the Department of English at the University of Murcia (Spain). He received his BA (1988) and PhD (1993) degrees in English Language and Literature from the University of Murcia (Spain), his MA in Computational Linguistics (1995) from Essex University (UK) and a Postgraduate Diploma in Multivariate Statistics (2000) from the UNED (Spain). His main research interests are in Corpus Linguistics, Quantitative Linguistics, Computational Lexicography and Computer Assisted Language Learning. He is a member of the LACELL (Lingüística Aplicada a la Computación, Enseñanza de Lenguas y Lexicografía) Research Group at the University of Murcia.

Pedro A. Fuertes-Olivera is Associate Professor at the University of Valladolid, where he was Vicerrector de Economía (2003–2006). He obtained his accreditation for Full Professor in 2011. He teaches specialized discourse, especially ESP. His research interests lie in specialized lexicography, translation and language teaching. He has recently published several books and papers on lexicography, and is currently working on the planning and compilation of several dictionaries within the project known as the Accounting Dictionaries. He is a member of the editorial board of Hermes, Journal of Language and Communication Studies, Revista Española de Lingüística Aplicada, and Odisea. He can be contacted at [email protected].
Rufus H. Gouws is Professor in Afrikaans Linguistics and Chair of the Department of Afrikaans and Dutch at the University of Stellenbosch in South Africa, where he also coordinates the Postgraduate Programme for Lexicography. His research focuses primarily on theoretical lexicography and he complements his theoretical work with contributions to lexicographic practice as editor and co-editor of various Afrikaans dictionaries. He is a member of the editorial board of the Fachwörterbuch zur Lexikographie und Wörterbuchforschung/Dictionary of Lexicography and Dictionary Research and of the journal Lexikos, as well as co-editor of the journal Lexicographica and the book series Lexicographica Series Maior. He has published extensively in the field of lexicography and is a regular participant in international conferences and symposia. He is a former president of AFRILEX, the African Association for Lexicography. He can be contacted at [email protected].

Ulrich Heid is a computational linguist and professor of language technology and computational linguistics at the University of Hildesheim, Germany. He studied Romance linguistics and history and earned a PhD in computational linguistics at Stuttgart (1995), where he also obtained his habilitation in 2001. Heid worked at the University of Stuttgart from 1986 and has been at the University of Hildesheim since 2008; he has been involved in projects centred on dictionary structures (one in the framework of the Stellenbosch Institute for Advanced Study), on corpus lexicography and on corpus linguistics. Since 2010, he has been involved in a basic research project on the treatment of ambiguities in corpus linguistic data extraction, a European project on terminology extraction and a national project on infrastructures for language resources.

Patrick Leroyer, PhD, is Associate Professor of French and Danish Specialized Translation and Business Communication at the Aarhus School of Business, University of Aarhus, Denmark, and is attached to the Centre for Lexicography. He is also Editor-in-Chief of Hermes, Journal of Language and Communication Studies. His current research is on theoretical lexicography in a broad, functional perspective, and includes the development of specific theories for data access, selection and presentation in connection with communicative, cognitive, operative and interpretive functions in paper-based and e-lexicographic information tools. He has published within the fields of Specialized Translation and Business Communication, Terminography and Functional Lexicography. He can be contacted at [email protected].

Robert Lew is Professor at the Department of Lexicology and Lexicography, Adam Mickiewicz University, Poznań, Poland. His current interests centre around dictionary use, and he is involved in a number of research projects, including topics such as access-facilitating devices, definition formats,
dictionaries for production, space in dictionaries and training in dictionary skills. He has worked as a practical lexicographer for various publishers, including Harper-Collins, Pearson-Longman and Cambridge University Press. He is Reviews Editor for the International Journal of Lexicography (Oxford University Press) and advisor to Macmillan Dictionaries. He can be contacted at rlew@amu.edu.pl.

Sandro Nielsen is affiliated with the Centre for Lexicography – Research into Needs-Adapted Information and Data Access, Aarhus School of Business, Aarhus University, Denmark, where he is Associate Professor. He obtained an MA in English (LSP for translators and interpreters) in 1987 and was awarded his PhD degree in specialized lexicography in 1992. He is the author and co-author of numerous publications on theoretical and practical lexicography, including The Bilingual LSP Dictionary: Principles and Practice for Legal Language (1994), a printed and an online bilingual law dictionary, and three printed and five online accounting dictionaries; he is also a major contributor to the Manual of Specialised Lexicography. His main research areas are principles for online LSP dictionaries, user guides in dictionaries, lexicographic information costs and academic dictionary reviewing. His teaching interests focus on lexicography and legal translation for translators and interpreters. He can be contacted at [email protected].

Marta Niño-Amo is Associate Professor at the University of Valladolid (Spain), where she teaches accounting. She has been working within the project of the Accounting Dictionaries and is currently the editor of Memento Contable, a respected Spanish publication featuring comments on accounting standards and legislation, with the aim of offering up-to-date information on accounting matters. She can be contacted at [email protected].

Beatriz Pérez Cabello de Alba graduated in English studies from the University of Granada, Spain, and holds a PhD in Linguistics from the University of Córdoba. She has been a visiting scholar at the University of Amsterdam (1993) and the University of Verona (1994) and has lectured at Chulalongkorn University in Bangkok (1997–1998) and at the London School of Economics and Kingston University (1998–1999). In 2000 she worked for Bitext, one of the pioneering language solutions companies in Spain. In 2001, she joined the National University of Distance Learning in Spain, where she has been working since. She has collaborated in several research projects and carried out research on lexicology, lexicography and language technologies. She can be contacted at [email protected].

Eva Samaniego Fernández is Associate Professor at UNED (Spanish Distance University), where she primarily teaches English for Specific Purposes. She is also a sworn translator and holds an MA in specialized translation. Her
research interests are translation, ESP, specialized lexicography and metaphor. She has published within the fields of metaphor translation, legal discourse and translation, translation-applied text analysis, and translation and genre, among others. She also teaches English language courses for European Union judges and prosecutors within the framework of the Spanish Council of the Judiciary and Eurojust, and is an appointed linguist of the European Judicial Training Network within the project “Language Training on the Vocabulary of Judicial Cooperation in Criminal Matters”. She can be contacted at [email protected].

Aquilino Sánchez studied at the University of Barcelona, Spain, and later at the University of Munich (Germanistik), Germany, and at Georgetown University, Washington, D.C. (Applied Linguistics). He began his university career at the University of Barcelona and three years later joined the Modern Languages Department at the Autonomous University of Barcelona, Bellaterra. He was also Director of the Official School of Languages in Barcelona for five years. In 1980, he obtained a Chair in English Philology at the University of Málaga and in 1981 he moved to the University of Murcia, where he has taught since. His research interests and teaching centre on Lexicology and Lexicography, Second Language Teaching Methodology and Corpus Linguistics. He has edited the Gran Diccionario de uso del español actual (2001, SGEL), based on the corpus Cumbre, and also a bilingual Spanish–English dictionary (1993, SGEL). He has lectured in various universities in Spain and abroad and has published many books and articles on these subjects. He can be contacted at [email protected].

Dennis Spohr studied computational linguistics at Universität Stuttgart and Dublin City University, with a focus on computational lexicography and semantic web technologies. In 2010, he completed his doctoral dissertation on multifunctional lexicon models under the supervision of Prof. Dr. Ulrich Heid at Universität Stuttgart. Shortly afterward, he joined the Semantic Computing group at the Center of Excellence Cognitive Interaction Technology (CITEC) in Bielefeld, where he is currently working on multilingual ontology localization and cross-language knowledge access and presentation using semantic web technologies. Prior to his appointment in Bielefeld, he had been involved in projects on lexical semantics and computational lexicography in Saarbrücken and Stuttgart.

Sven Tarp is Professor of Lexicography at the Aarhus School of Business, and affiliated with the Centre for Lexicography. Since 2008, he has been Extraordinary Professor at the University of Stellenbosch (South Africa). He is author and co-author of several publications on lexicography, including Lexicography in the Borderland between Knowledge and Non-Knowledge: General
Lexicographical Theory with Particular Focus on Learner’s Lexicography. He can be contacted at [email protected].

Serge Verlinde is Professor of French for Specific Purposes and Director of the Leuven Language Institute at the University of Leuven, Belgium. His main research interests are corpus linguistics, pedagogical lexicography and Computer Assisted Language Learning (CALL). He is co-author of the Dictionnaire d’apprentissage du français des affaires. Recently, he developed web applications for teaching and learning French vocabulary: the Base lexicale du français and Alfalex. He is currently working on online reading, translating and writing aids.
Introduction: The Construction of Internet Dictionaries
Pedro A. Fuertes-Olivera and Henning Bergenholtz
Many e-dictionaries, perhaps a large majority, were not e-dictionaries in their own right, but mere printed dictionaries (p-dictionaries) made available on an electronic platform. Even now, we can observe that only a few existing e-dictionaries really use the technical possibilities of the electronic medium in the conception and preparation of dictionaries, and in the access to and presentation of data in them. The practical explanation for this is simply that most lexicographers do not see the consequences of the difference between a lexicographical database and a dictionary and, therefore, continue the tradition of planning and compiling polyfunctional e-dictionaries, which are directly taken from or made similar to p-dictionaries. The theoretical explanation for the above situation is that some lexicographers, for example H. E. Wiegand (1998), assume the necessity of new theories to construct e-dictionaries, since they consider that former theories of lexicography are only usable for p-dictionaries. Our view is different, not only because that assumption would force us to accept a two-string theory of lexicography – one for p-lexicography, and another for e-lexicography – but also because we are convinced that what we need is the same theory (or theories), adapted to the different possibilities for data access and presentation offered by the two media. This is the main topic of this book. All the articles in this book, therefore, explore the state of the art in e-lexicography by studying how new lexicographical concepts, and their application in specific internet dictionaries, are expected to shape lexicographical innovations in the near future. Electronic dictionaries are typical products of what we call the knowledge and information society, which demands a different approach to electronic lexicography from the one which started at the beginning of the electronic-dictionary age.
For instance, only internet dictionaries, and not other types of electronic dictionaries, assure quick and easy access to extralexicographical data. Consequently, old typologies of electronic dictionaries must be replaced by more informed ones, which can take into consideration the difference between an information database and an information tool, the
access process used and their integration within pedagogical environments; they should also explore the introduction of Boolean searches, and allow for maximizing and minimizing searches, to name just a few of the lexicographic characteristics that are being dealt with in this book, some of which are summarized in Gouws’s chapter, and reproduced below:
• The use of data banks from which different types of dictionaries, and even different dictionaries of the same type, can be extracted.
• The mistake of including much more data than needed, a possibility that is of the utmost interest for e-lexicography, considering that e-lexicography is not hampered by space restrictions.
• The broadening of lexicographical theory to the development, planning, compilation and publication of other reference sources, which are also focused on the users of these sources, the data presented in them, the structures to accommodate the data and, ever so important, access to the data in order to achieve an optimal retrieval of information.
• A paradigm shift, also applicable to printed dictionaries, which takes into consideration the fact that dictionary users are also internet users who are used to downloading and uploading all types of data. Uploading offers lexicographers the opportunity to enhance a spirit of lexicographic democratization.

These are central issues in the articles included in this volume, which is divided into two parts: one more focused on general theoretical questions and their translation into specific dictionary projects, the other more concerned with presenting specific dictionary projects which have recently come off the lexicographical drawing board, with the aim of illustrating what a user-driven lexicography is about. This distinction is rather artificial and we therefore ask our readers to view the book as a unified theoretical and practical attempt to discuss where we are and where we can be in the near future.

Part 1

Chapters 1–6 are especially concerned with the tenets of function theory, the theoretical construction initially developed by Bergenholtz and Tarp (2002, 2003, 2004, 2005a; see Tarp 2008a, and Tono 2010 for a review), which has also opened new ways in the construction of internet dictionaries, with the aim of providing exactly the data users need in a quick and easy way.
The main idea behind it addresses the question of how lexicographic data can be modelled in such a way that a lexicographic tool is capable of satisfying the different types of users in different types of situations. As several authors point out in this book, lexicographers cannot aim to satisfy the needs of each individual user. Therefore, ‘what is needed is not only a lexicographical tool that is capable of dealing with types of users and situations, but one that provides the necessary mechanisms for individualization of dictionary content – in terms
of customizing the views that an individual user is given on the lexicographic data’ (Spohr, this volume). One of the solutions proposed in this book is to move beyond the term ‘dictionary’ and introduce the term ‘information tool’ as a kind of umbrella term with which researchers can design any tool, no matter what we call it, aiming to satisfy the needs users might have in the four use situations described so far: communicative, cognitive, operative and interpretive. Furthermore, researchers within this theoretical framework are also claiming the necessity for a shift in paradigm, which will place lexicography in the realm of information science, and will focus on describing the defining elements of internet dictionaries in terms of their accessibility, their formal properties and their categorization as information tools. With this categorization, the distinction between information tool and information database, which is central in many of the dictionary projects described here, is more easily understood. In addition, each chapter in the book analyses more specific proposals and considerations in depth. They are framed within the general frameworks we have referred to in the previous paragraphs and, hence, they add to the unity of purpose of this book. Below, we summarize some of the main ideas of each chapter. By so doing, we are confident that readers of this book can get the gist of the main issues concerning e-lexicography that are hotly debated in international lexicographical circles.

In Chapter 1, ‘Learning, Unlearning and Innovation in the Planning of Electronic Dictionaries’, Gouws states that the future of e-lexicography ‘should not be isolated from either the past or the present’ and, therefore, argues that e-lexicography should take what we have learnt from the past and move to the future guided by innovation and intelligent boldness.
For example, the distinction between a contemplative and a transformative approach, which has already been applied when working with printed lexicography, is also useful for e-lexicography. The above reflection translates into a number of issues that we have to learn and unlearn in connection with the dominant views in the three stages into which we can group the history of theoretical lexicography: an initial stage dominated by the language contents of the dictionary, which has resulted in a kind of linguistic colonialism that is still much appreciated and defended in some areas; a second stage that is mostly concerned with dictionary structures, thus following Wiegand’s main theses on the conceptualization of lexicography as an independent discipline and of the dictionary as a text in itself; and a third stage, which follows from Bergenholtz and Tarp’s functional approach, which is centred on lexicographic workings and their interest in putting the dictionary user and the situation of use at the centre of the discussion. Gouws claims that the advent of e-lexicography has not only made the fallacy of linguistic colonialism and the inappropriateness of Wiegand’s stance evident, but has also emphasized a number of issues that we have to unlearn
with a view to understanding and explaining lexicography as an information science, and to presenting a general theory that is independent of the medium in which the dictionary is written.

Bergenholtz follows suit in Chapter 2, ‘Access to and Presentation of Needs-Adapted Data in Monofunctional Internet Dictionaries’, in which he claims that theoretical lexicography has not had a significant effect on practical lexicography so far, and adds that both theoretical and practical lexicography have suffered from several important errors and misunderstandings, among which he highlights the linguistic colonialism of lexicography and the lack of real attempts to develop new presentation forms and access options. In particular, he criticizes the conclusions drawn from dictionary surveys, which he regards as unscientific and, therefore, inadequate to advance our understanding of theoretical lexicography, and mentions that log files show that all the discussion on access structure is of little importance, since each respondent in each individual case chooses an individualized search path. As we cannot describe an access structure for each individual, he proposes looking at the selected individual search path and seeing which dictionaries – with which macro and micro structures, different search-relevant graphic markers in printed dictionaries and research sequences in certain fields in the database of an electronic dictionary – are associated with particularly fast or particularly slow access. Bergenholtz’s proposal is investigated in two detailed case studies in which several questions are formulated in order to assess whether the users found help in an array of printed and internet dictionaries, and the search time spent consulting the dictionaries.
With these two case studies, Bergenholtz exemplifies two important criteria when evaluating the use and quality of a dictionary: whether the user could find the item that contains the answer to the question that prompted the search, and how long the search took. For him, the best dictionary is probably the one rendering a usable result in a short period of time, for example the four dictionaries of fixed expressions that he describes, which are extracted from the same database, each with the aim of being helpful in searches adapted to the function demanded by the user and to the use situation in which the user performs his or her search. Among them he mentions several search options that mark a novelty in electronic lexicography, and signal the way ahead for internet dictionaries, which will be connected to the retrieval of the data the user needs in a specific use situation and no more. For example, users have the possibility of finding an expression with a meaning similar to the one just found, and have the option of finding an expression with a particular meaning in use situations in which they might not know the fixed expression or cannot exactly remember the idiom or saying as it is actually known. Bergenholtz also discusses the default searches in the database and details the data presentations in agreement with the tenets of function theory.

Some of the theoretical issues raised by Bergenholtz are also present in Chapters 3, 4, 5 and 6, written by Tarp, Bothma, Spohr and Leroyer,
respectively. Like Gouws, Tarp defends in ‘Lexicographical and Other e-Tools for Consultation Purposes: Toward the Individualization of Needs Satisfaction’ his belief that there is no need for a new theory at the highest level of abstraction. However, he adds that every discipline must cope with the epistemological process of acquiring new knowledge and, consequently, any general theory about any subject field must constantly be improved, and sometimes replaced. This is the current situation in lexicography, where we are witnesses to a paradigm shift in which lexicographers are paying attention to elements that existed earlier but were ‘hidden’ or unnoticed, and to completely new elements which are related to the new media and technologies. The combination of both types of elements has given rise to the consideration of lexicography as a consultation discipline integrated into information science, which initially demands a true definition of two concepts: e-lexicography and lexicographical e-tools. Both are frequently used to refer to any reference work made available on an electronic platform. Tarp, however, believes that both concepts have to be understood in a narrower way and, hence, restricts his classification to what he calls Model T Fords and Rolls Royces, which are lexicographical works that have been (or will be) constructed with the aim of offering dynamic articles with dynamic data that correspond to the specific types of information needs which specific types of users performing specific types of lexicographically relevant activities might have in any consultation situation.
He adds that Model T Fords are already in use, for example the Accounting Dictionaries or the Danish Music Dictionary, whereas Rolls Royces are still on the drawing board. The main difference is that the former allow access to the data selected in a prepared database by browsing on the internet, but do not allow a recreation and re-representation of the data made available in this way. In his contribution to this volume, Verlinde, however, contradicts Tarp’s view by indicating that Rolls Royces might have already come into existence, and points to his Interactive Language Toolbox – we will present it in the second part of this introduction – as an exemplar of a lexicographical Rolls Royce. Finally, Tarp claims that a theory like the function theory cannot be built directly upon concrete and individual phenomena that might differ from each other in many aspects, but must be built upon an abstraction with which we can work by referring to types of users, types of user situations, types of user needs and types of data that might satisfy these needs. Within this general framework, however, each user, user situation, user need, data and consultation is individual and, therefore, the individualization of user-needs satisfaction is a question to be taken seriously, especially because the internet allows lexicographers to provide the necessary mechanisms for an individualization of dictionary content. Tarp describes three methods to achieve individualization: the interactive method, which will allow users to be assisted in making a personal profile as
well as in indicating the specific type of situation or activity in which information needs occur, and even the specific type of need; the active method, which will allow users to design their own ‘master article’ in terms of the types of data wanted and their arrangement on the screen; and the passive method, which consists of automatic tracking of the users’ behaviour during a number of consultations, thus facilitating the creation of a profile of the type of data that the users generally look for in order to furnish the same type of data when the users once more consult the e-tool. The individualization of needs satisfaction has another important consequence: it blurs the very concept of dictionary typology, since in lexicographical e-tools conceived according to the principles of individualization there are neither monofunctional nor multifunctional data access routes, but only individualized ones. Such e-tools can therefore be viewed as one multifunctional dictionary with individualized search options within the framework of its defined functions. As a consequence, the best dictionary in terms of needs satisfaction is not necessarily a monofunctional dictionary, but any dictionary – whether monofunctional, pluri-functional or multifunctional – that allows either monofunctional access or individualized access in the framework of its specific and foreseen functions. Tarp’s vision needs to be implemented by means of an array of devices that are described in Chapter 4. Bothma’s ‘Filtering and Adapting Data and Information in an Online Environment in Response to User Needs’ addresses two closely related issues, which are crucial not only for the future of e-lexicography, but also for advancing the development of user-based theoretical and practical lexicography.
He investigates to what extent modern information technology can facilitate the design and implementation of e-dictionaries and/or e-information tools for specific user groups and situations, and can enable the user to ‘create’ his/her own e-dictionary and/or information tool(s). Both questions are answered by reviewing a number of information technologies and techniques, namely, searching and navigating, user profiling, filtering, adaptive hypermedia, metadata markup, linked open knowledge, recommender systems and annotation systems – all of which can be used in e-dictionaries to customize, that is, personalize, information access in response to user needs. Bothma’s views on the above information technologies and techniques are, on the one hand, encouraging, considering that most of them are already in use in specific e-dictionaries, and on the other hand, rather disappointing, as they are not used to their full potential. Hence, he concludes his chapter by formulating a wish: ‘If lexicographers were to embrace these technologies, it would be possible to provide customized information tools that can satisfy the user needs of all individual users. It would therefore be possible to create information tools that would address the information needs not only of the ‘average’ user, but those of a specific user in ‘one out of a thousand
consultations’, by providing ‘dictionaries capable of meeting all the users’ needs in specific types of situations’. In such a customizable e-information tool,

• the user will be able to:
  ◦ set up a complex profile indicating his/her preferences;
  ◦ change the profile based on specific information needs for any given situation; and
  ◦ drill down to the required level of complexity and/or detail in any given situation;
• the system will:
  ◦ further adapt the profile based on the user’s information behaviour; and
  ◦ present information to the user based on the characteristics of such a profile;
• the database will require that:
  ◦ the data be marked up through a complex metadata schema
    – to enable matching the characteristics of the user’s profile with the characteristics of the data;
  ◦ there be links to external data sources (linking open knowledge)
    – either through direct linking by the lexicographer; or
    – by on-the-fly searching of such external data sources
  ◦ to enable the user to get additional information on demand;
• the system will also be able to:
  ◦ make recommendations to the user based on his/her profile and expressed information need; and
  ◦ allow the user to make private, group or public annotations to the database to
    – enhance the user’s future use of the data; and
    – help the lexicographer to keep the database more current and up-to-date.

(Bothma, this volume)

Spohr’s Chapter 5, for example, ‘A Multi-Layer Architecture for “Plurimonofunctional” Dictionaries’, is one of the first attempts to translate the aforementioned theoretical considerations into practice. He reports on recent efforts to develop a model for a lexical resource that enables the definitions of function-based views on lexicographic content (i.e. he offers a summary of the main ideas discussed in the previous chapters), and provides an infrastructure that can be extended so that it allows for an individualization of access and presentation of such content. The lexicographical data model he proposes argues for a hierarchical organization of the entities in a lexical database in order to express information on different levels of granularity, and highlights the benefits that the use
of semantic web standards like Resource Description Framework/Resource Description Framework Schema (RDF/RDFS) and Web Ontology Language (OWL) has for the definition of the lexical resource model here described, which distinguishes between several types of lexical entities, ‘with free and bound units located at the highest level, and more specific subtypes below each of them’. Moreover, this structure is reinforced by the fact that the specifications of the user needs are not included directly in the database but on a separate layer, ‘in order to ensure modularity and thus extensibility’. For example, returning to the question of pluri-monofunctionality, the model deals with three specific questions related to the user’s needs: (i) in which language and vocabulary an indication should be presented; (ii) what should be presented and in which language(s); and (iii) which indications should be accessible for which users. Spohr answers these three questions by detailing the functioning of the model for the German user, and by commenting on a prototypical implementation of the proposed architecture, which includes a prototype of ‘a web-based electronic dictionary containing roughly 14,000 lexemes with 44,000 example sentences and almost 35,000 morpho-syntactic preferences’. Finally, Leroyer’s Chapter 6, ‘Change of Paradigm: From Linguistics to Information Science and from Dictionaries to Lexicographic Information Tools’, is a kind of bridge between Chapters 1–6 and Chapters 7–14. On the one hand, he summarizes in a very direct and convincing way the flaws observed in linguistic approaches to lexicography – mostly those by Atkins and Rundell (2008), and Béjoint (2010) – which still defend the purely practical nature of lexicography and thus neglect a theoretical basis for the practical activity of dictionary making.
Hence, he formulates his claim for a shift of paradigm, which views lexicography as an integrated part of the social and information science paradigm and refers to the interdisciplinary discipline concerned with the study, design and development of functional tools aimed solely at the gratification of human information needs and problems. The distinctive feature of lexicographic tools is the triangulation of three interrelated sets of social, logical and semiotic parameters, corresponding respectively to the following dimensions of the tool: user, access and data. Social parameters are in every single case determined by the systematic identification of the specific information problems, needs and profiles of the potential user of the information tool. The social parameters are decisive for both the functional genesis (communicative, cognitive, operative and interpretative functions) and the gratifying use of the information tool. Semiotic parameters are in every single case determined by such data selection and presentation that ensures gratifying extraction of information in accordance with the specific problems, needs and profiles of the potential
user. Data are by nature semiotic and consist of verbal and non-verbal signs. The most frequently used symbols in data selection and presentation are words. Logic parameters are in every single case determined by such structures, modes, indices, algorithms and computing technologies that ensure gratifying access to data in accordance with the specific problems, situations, needs and profiles of the intended user. On the other hand, Leroyer presents four different models of lexicographically designed information tools that illustrate the change of paradigm and demonstrate how lexicography is currently moving towards the realm of information science:

• A patient e-dictionary, Lexonco, whose genuine purpose, which was to solve information problems for cancer patients and their families, was achieved by presenting a modular, functional configuration with three types of individualized access modes: the consultational access mode; the interactive and participative access mode; and the automated access mode.
• An e-dictionary of real property that is an e-lexicographic guide to French real estate aiming to solve the communicative and cognitive needs of Danes in two specific use situations: Danes without any command of French and Danes with some command of French.
• An e-lexicographic mobile tourist guide with two kinds of access to its data: a conventional user-driven search and navigation mode; and a range of automated access modes.
• An e-lexicographic guide for scientific text production that aims, first, to build a complete and adaptive tool to provide assistance to scientific text production in English, and, secondly, to support the practical training of students of specialized translation.

In a word, Chapters 1–6 present some new arguments for considering lexicography an integral part of information science. Following suit, Chapters 7–10 reinforce this view by describing in detail how three internet dictionaries have been planned and compiled: the Accounting Dictionaries (Chapters 7 and 8), the Danish Music Dictionary (Chapter 9) and a dictionary of English Phrasal Verbs (Chapter 10). They are a clear example of the kind of e-tools that display ‘dynamic articles with dynamic data, which correspond to the specific types of information needs which specific types of users performing specific types of lexicographically relevant activities might have in any consultation situation’ (Tarp, this volume). The Accounting Dictionaries actually consist of a set of two monolingual and three bilingual online dictionaries in the following languages: Danish, English
and Spanish. The theoretical basis underlying the project gives priority to lexicographic functions, that is, the help these dictionaries can give to users in specific types of situation where users require knowledge to resolve issues relating to accounting. In Chapter 7, ‘From Data to Dictionary’, Nielsen and Almind first present and discuss the practical, technical basis of the project, which is made up of two distinct components: database and dictionary; then, they analyse the theoretical framework of the project (already summarized in Chapters 1–6 of this book), and finally show that the theoretical and practical bases are integral features of the project, and consequently, neither can stand alone. Regarding the database of the Accounting Dictionaries, Nielsen and Almind trace the chequered and difficult history of the conversion of the database, from its origin as a two-tier system for Danish, into its current hub-and-wheel structure for Danish, English and Spanish. They also add that the Accounting Dictionaries are best described as a triadic construction in that the structure consists of three main components: a database; (a) dictionary website(s); a search engine. Describing the dictionaries in terms of the above triadic structure has a number of practical and theoretical implications. First, the database is the source of several dictionaries. Secondly, the dictionaries contain, or might contain, several independent components. Finally, the Accounting Dictionaries do not have macrostructures in the text-linguistic sense, but a ‘data presentation structure’ that ‘is supported technically by an output device which arranges the data retrieved from the database according to type, and presents these data in a predetermined order depending on user needs as identified by the type of help sought’.
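The ‘data presentation structure’ just described can be made more concrete with a minimal sketch. It is only an illustration under stated assumptions and not the implementation used by Nielsen and Almind: the names `Entry`, `DATABASE`, `PRESENTATION_ORDER`, `search` and `present`, the three types of help sought, the data types and the placeholder content are all hypothetical. The sketch merely shows how one database, one search function and one output function can yield differently ordered articles depending on the type of help sought.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One database record: a lemma plus its data, keyed by data type."""
    lemma: str
    data: dict


# The database (placeholder content, not taken from the real dictionaries).
DATABASE = [
    Entry("asset", {
        "definition": "<definition of 'asset'>",
        "equivalent": "<equivalent in the target language>",
        "collocations": "<typical collocations>",
        "example": "<example sentence>",
    }),
]

# Hypothetical predetermined orders, one per type of help sought.
PRESENTATION_ORDER = {
    "reception": ["definition"],
    "translation": ["equivalent", "collocations", "example"],
    "knowledge": ["definition", "example", "collocations", "equivalent"],
}


def search(database, query):
    """Search engine: return the entry whose lemma matches the query, or None."""
    wanted = query.strip().lower()
    for entry in database:
        if entry.lemma == wanted:
            return entry
    return None


def present(entry, help_sought):
    """Output device: select the data types relevant to the help sought
    and return them in the predetermined order for that type of help."""
    order = PRESENTATION_ORDER[help_sought]
    return [(data_type, entry.data[data_type])
            for data_type in order if data_type in entry.data]


if __name__ == "__main__":
    hit = search(DATABASE, "Asset")
    for help_sought in PRESENTATION_ORDER:
        print(help_sought, [data_type for data_type, _ in present(hit, help_sought)])
```

In this sketch the database stores each data type once, and every ‘dictionary’ is simply a different selection and ordering of those types, which is one way of reading the claim that the database is the source of several dictionaries.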
Regarding functions, intended user groups, and data selection, both Nielsen and Almind and Fuertes-Olivera and Niño-Amo, in Chapter 8, ‘Internet Dictionaries for Communicative and Cognitive Functions: El Diccionario Inglés–Español de Contabilidad’, explain that these dictionaries have two main types of function, namely communicative and cognitive functions, which aim to satisfy the needs of three main user types: (i) translators and language staff; (ii) accounting experts and semi-experts; and (iii) students and laypersons interested in Danish, English and Spanish accounting matters. They add that the selection of data starts with making an external and internal subject classification. The external subject classification puts into focus accounting texts dealing with the rules for accounting as defined by the International Financial Reporting Standards (IFRSs), with which all companies listed on stock exchanges in the European Union must comply. The internal subject classification was originally limited to financial and management accounting. The preparation of the external and internal subject classification resulted in three language-specific electronic text corpora containing authentic texts such as International Financial Reporting Standards, national financial reporting and bookkeeping standards and statutes, and financial statements published by national and international companies.
In addition, Fuertes-Olivera and Niño-Amo add a new plane to their description of one of the Accounting Dictionaries, the Diccionario Inglés–Español de Contabilidad. They envisage the way ahead by exploring two possible developments that could contribute to the elaboration of more focused internet dictionaries at no extra cost: (i) to increase reliability by widening the use of hyperlinks in order to minimize or even eliminate the stress users have when working with texts that are in the forefront of new knowledge, a typical situation when translating, reading and producing specialized texts; and (ii) to write a systematic introduction. Bergenholtz and Bergenholtz, and Andersen and Almind employ the same theoretical framework in Chapters 9 and 10, ‘A Dictionary Is a Tool, a Good Dictionary Is a Monofunctional Tool’, and ‘The Technical Realization of Three Monofunctional Phrasal Verb Dictionaries’, respectively. Bergenholtz and Bergenholtz also discuss the characteristics of a good dictionary. Their answer, which is supported within the tenets of the function theory, is that dictionaries are tools and, hence, they ‘must be designed for a specific and limited purpose’; that is, they must be monofunctional dictionaries, and be the result of lexicographical-based questions such as the following: access to the data in dictionaries; access time; dictionary structures; dictionary functions; dictionary users; use situations and so on. Furthermore, they illustrate the workings of some of the above issues relating to the Danish Music Dictionary, a set of three dictionaries, one for reception, one for knowledge and one polyfunctional, which are the result(s) users can extract from a single database whose technical possibilities are similar to the ones already described in Chapters 2 and 7. 
Andersen and Almind, first, detail the data input of a work in progress: the construction of a lexicographical database with which users can access several dictionaries of English phrasal verbs that target Danish students and professional translators, to whom the expected dictionaries will offer explicit grammatical data and several linguistic labels, for example, diaphasic, diachronic, diatopic, diatextual labels and so on, as well as English collocations and Danish translations. Secondly, they indicate that the database will be capable of ‘generating three different dictionaries with three different functions’: one for checking the meaning of English phrasal verbs, one for translating English phrasal verbs into Danish, and one for assisting users in producing grammatically and stylistically correct English phrasal verbs. As well as having the option of accessing the types of data in the individual articles in the database that are relevant for the satisfaction of the given need, users will also have the option of getting access to all types of data in the article in the database, ‘since it may be consulted by users with other needs than the three communicative needs specified here’. In addition, there are four more chapters, which are devoted to reviewing internet dictionaries for English and Spanish, to presenting the Base Lexicale du Français, and to offering a case study on evaluating the usability of
internet dictionaries. Although English and Spanish are two world languages, the level of development of e-lexicographical tools in the two languages is so different that we are unsure why such a situation exists. Lew’s ‘Online Dictionaries of English’ (Chapter 11) presents an overview of the spectrum of available online English language dictionaries, and offers some general comments on a few selected key issues. In his overview, Lew indicates that English lexicography is influencing the work of lexicographers in every corner of the world, and reports on prominent and representative exemplars of specific types of online dictionaries, which he classifies by making use of well-known criteria, for example, general vs. special purpose dictionary, and newer criteria related to the characteristics of the internet, for example, the distinctions between institutional and collective and between free and paid, and the number of dictionaries retrieved when searching. The combination of both sets of criteria results in a proposal of ad hoc categories, which comprise General English Dictionaries, Learner’s Dictionaries, User-involved Dictionaries, Diachronic (historical) Dictionaries, Subject-field Dictionaries, Dictionaries with Restricted Macrostructures, Dictionaries with Restricted Microstructures and Onomasiological Dictionaries. Lew also calls attention to a new generation of internet dictionaries that are especially designed to satisfy specific users’ needs in specific use situations. For example, the main novelty of the Louvain EAP Dictionary, which is being developed as a dictionary for non-native writers, ‘is that it is customizable in terms of field domain (business, medicine) and mother tongue (French, Dutch). As a consequence, usage notes and equivalents match the L1 of the user, and some of the examples are domain-specific’. The second part of his chapter is devoted to analysing some issues with online dictionaries that he considers relevant.
For example, dictionary aggregators usually retrieve long and very similar entries, which results in ‘highly unhelpful, many-times redundant, tortuous assemblages of disconnected lexicographic data’. Similarly, he comments on the ‘step-wise approach to outer access’, and claims that the option taken by myCOBUILD.com, namely, listing partial entries, seems more adequate than other options. He also warns against the view espoused by some researchers who confound dictionaries and databases, and exemplifies uses of corpus interfaces and wrappers that are similar to dictionaries. Although the latter are highly sophisticated, Lew claims that ‘there is not much hope that their popularity will extend much beyond a relatively small group of power users; the others will just increasingly Google for any answers, irrespective of the nature of the problem and I fear that this tendency presents a real threat to more specialized reference tools, including dictionaries’. Sánchez and Cantos’ ‘e-Dictionaries in the Information Age: The Lexical Constellation Model (LCM) and the Definitional Construct’ (Chapter 12) is
amazing inasmuch as it is a vivid example of a change in focus due to external circumstances. They were to have presented a critical review of internet dictionaries of Spanish, but found that these are almost non-existent, and that the few they managed to find were not proper internet dictionaries but paper dictionaries that have been uploaded to the internet. Fortunately, Sánchez and Cantos were able to reorganize their contribution and develop an interesting proposal, one centred on the possibility of using the Lexical Constellation Model (they presented this model initially in Cantos & Sánchez, 2001) for producing better lexicographic definitions. Sánchez and Cantos’s main dissatisfaction with how many dictionaries deal with definitions stems from their conviction that words ‘are typically defined as if isolated and somehow monolithic units, with only occasional reference to contextual elements (especially collocates) and with scarce information on the intricate web of semantic relationships that shape the units of meaning we call words, or the complex semantic relationships among words associated to specific lexical fields’. This view, which can be disputed by making reference to internet dictionaries in their own right, for example the Accounting Dictionaries, translates into constructing more precise definitions by signalling semantic attraction among words, especially in cases in which such an attraction was apparently unexpected (Cantos & Sánchez, 2001). This attraction results from the clustering of specific semantic features that are perceived as units by the speakers, although this perception can vary when attending to personal and/or contextual factors.
Hence, they propose definitions that are tailored to the user’s needs, make use of every possible resource (words, pictures, photographs, videos, etc.), and result ‘from a modular, and hierarchical approach increasing in structure, data and complexity, which goes hand in hand with the user’s demand, user’s access to data and user’s access to (non)lexical tools’. Their proposal is illustrated with the Spanish word mano [Eng: hand], which is defined differently when attending to the condition of the user as a pupil, semi-expert or expert, and a list of resources – most of which are already in operation in internet dictionaries such as the Danish Music Dictionary and the Accounting Dictionaries – that constitute a complete novelty in Spanish lexicographical circles, which are still dominated by linguistic colonialism and the consideration that lexicography and lexicology are related disciplines. Verlinde’s ‘Modelling Interactive Reading, Translation and Writing Assistants’ (Chapter 13) presents the Base Lexicale du Français, which is hailed as a web-resource that has a task-oriented approach offering ‘an alternative to the individualization of lexicographical e-tools by recreating and re-representing data on user profiles to optimize electronic dictionaries’ (Tarp, this volume). Verlinde initially expresses his doubts about the usability of user-involved, bottom-up, collaborative lexicography that lacks critical thinking, leads to confusion and confirms the ‘more is less’ paradox, with which scholars, for
example Schwartz (2004), refer to the traditional view espoused by linguistic-oriented lexicographers, who defend the aggregation of as much data as possible in a dictionary article, regardless of its usefulness for the potential user. Thus, he formulates the question he addresses in his chapter, which is concerned with the kind of user interface that will give as many users as possible access to the information that is relevant in a given context in a user-friendly way. His answer, as in many other recent works by Verlinde and colleagues (e.g. Verlinde, Leroyer & Binon 2010), is the Base Lexicale du Français (BLF), which originated as a database for French only, but which ‘is being expanded to become a multilingual application which will be renamed Interactive Language Toolbox (ILT)’. The interface of the BLF is similar to the one described for the Danish Music Dictionary and the Accounting Dictionaries, and hence considers the possible needs of a user in order to create various small, monofunctional dictionaries. He adds that on the homepage the user does not find a text box for entering his or her search string, but several possibilities that force him or her to identify his or her consultation situation and needs (Tarp, 2008a). Furthermore, the current version of the BLF homepage not only offers access based on the user’s needs, but also on specific tasks performed by the user (typically, reading, translating or writing). Verlinde also shows some advantages of the BLF in a teaching and learning environment. Regarding the BLF reading assistant, Verlinde compares the BLF with the Alexandria tool and advances the possibility of incorporating syntactic analyses of the sentences included in the assistant as well as a contextual translation for multi-word expressions. The BLF translation assistant has the ability to adapt to the topic of a text, and provides the most common word combinations for many academic words.
The BLF writing assistant does not correct the submitted text, as spelling and grammar checkers do, but identifies ‘syntactic and lexical patterns which may contain errors’. Moreover, it also contains a tool, already available for Academic Dutch, which enables users to expand their vocabulary (words and word combinations), suggests (near) synonyms and hyperonyms as academic alternatives to general language, and displays specific word combinations for certain words in the text. Verlinde finishes his contribution by commenting on two future developments: (i) to have access to rich databases; and (ii) to carry out precise analyses of submitted texts. In ‘Electronic Dictionaries as Tools: Toward an Assessment of Usability’ (Chapter 14), Heid claims that if electronic dictionaries are to be understood as (software) tools, they should also be designed according to the principles applicable to software tools. One such principle is usability, a concept developed within information science with the aim of assessing the effectiveness and efficiency of the tool when used in a particular situation and for a particular task.
Heid reports on a set of usability tests applied to three electronic dictionaries: the ELDIT, Elektronisches Lernerwörterbuch Deutsch–Italienisch (Abel & Weber 2000; Abel & Campogianni 2005); the BLF, Base Lexicale du Français (Verlinde, this volume; Verlinde et al., 2010, etc.); and the OWID, Online-Wortinformationssystem (Müller-Spitzer, 2010). Heid’s case study is precisely described, and hence it can serve as inspiration for future studies on, say, the quality of internet dictionaries. In sum, Heid not only illustrates the close connection between lexicography and information science, an idea already discussed in several chapters of this book, but also offers an interesting methodology that differs from the one used by, say, Bergenholtz and colleagues, who have also reported on several methods for assessing empirically whether a user gets what he or she needs and how long this process takes. In particular, Heid’s proposal is really provocative as it tries to relate the basic objectives of usability testing to the concerns of lexicographic theory, and to propose steps towards usability engineering of electronic dictionaries. Finally, the book includes a summary (Chapter 15) – written by Samaniego Fernández and Pérez Cabello de Alba – of the key issues in e-lexicography that were hotly debated by the participants in the international symposium on e-lexicography. They follow Andersen and Nielsen (2009) and, consequently, raise a list of issues, some of which were initially discussed at an International 2008 Symposium hosted by the Centre of Lexicography at the University of Aarhus, and which are expected to generate controversy in the near future. To sum up, the contributions in this book have the following demands in common: a good information tool should be easy to use, easy to learn to use and be able to provide a result in a short span of time.
Most of the contributions, except the one by Theo Bothma (Chapter 4), have focused on e-dictionaries, and have emphasized that the dictionary concepts described and analysed are broader than those typically commented on in relation to printed dictionaries. We can summarize them by indicating that a dictionary might contain more than dictionary articles (for example, the Musikordbogen or Danish Music Dictionary), as dictionaries should be planned to satisfy the user’s needs in use situations. Hence, a question arises: will we still call the e-tools of the future dictionaries, especially when some of these also include systematic introductions, grammar books and other components that are very far from traditional dictionary articles? The question can also be asked in the words of Grefenstette (1998): ‘Will There Be Lexicographers in the Year 3000?’ Probably not. But our reasons for this answer are different from the ones typically espoused by Grefenstette and most British and American lexicographers, as we no longer see the future of lexicography connected more with linguistics than with many other disciplines. At the end of the day, the future of lexicography, and this is what this book is about, starts by planning, compiling and studying information tools, regardless of what we call them. We can call them dictionaries. And this is the name of a famous
and well-known information tool, an information tool that has the following specifications: it is small and handy; it is always available (outside, inside, on holiday, etc.); it provides answers to all questions; it provides answers in a way that the user understands them easily; and it gives the correct answer, but not a more detailed answer than the one that is necessary in order to solve the problem. For example, when its users, namely Huey, Dewey and Louie, do not know the directions, they look for help to find the right way; when they do not know whether a particular dish is poisonous or not, they look for help; and when they need help in order to talk to a gorilla, or to understand an inscription carved in an ancient language, their ‘dictionary’ provides them with assistance. This ‘dictionary’ is a very special one; it is a special information tool that is called a personal dictionary, that is, a tool that is only used by Huey, Dewey and Louie. Although we could call it an e-dictionary, it is in fact an intelligent computer, which is able to intuit what Huey, Dewey and Louie want to know and give them the right answers in a way that they can understand. It is a tool that can read their thoughts, and respond to them in a hand-held computer, which has a book jacket as protection (usually red, sometimes yellow). The answer comes up quickly; it is always true and contains all the relevant data – and nothing more. Of course, we do not have such an e-tool and we will not have one in the future. But what we will see in the near future is that the often-quoted division between dictionaries, lexicons, encyclopaedias, handbooks, manuals and so on, will be replaced by different monofunctional tools for special user types with special needs. We think that many of these tools can be used without having to pay for them, whereas some others will be more specialized and will be aiming at solving the specific needs of specialists who will pay for them provided that they are updated immediately. This book deals with both types.
Chapter 1
Learning, Unlearning and Innovation in the Planning of Electronic Dictionaries
Rufus H. Gouws
1.1. Introduction
When talking about the future of e-lexicography, that future should not be isolated from either the past or the present. In this paper, I show how future lexicography, with regard to both electronic and printed dictionaries as well as dictionary research, can benefit from the past by being cognizant of positive and negative aspects of both types of dictionaries. Therefore, I will be taking a present- and past-based look at the future. Although many remarks in this paper, and many comments regarding the development of lexicographic theory, are not explicitly directed at electronic dictionaries, it is implied that electronic dictionaries fall within the scope of these remarks and within the scope of a general theory of lexicography.
We often make a well-motivated distinction between a contemplative and a transformative approach in lexicography. While the former focuses on an investigation of the prevailing situation, the latter opts for a development from a current situation to something new. It is a valid distinction indicating two approaches based on diverse final goals. Although when employing a contemplative approach one could gain from the experiences of existing dictionaries, such an approach too often lacks new ideas. On the other hand, a transformative approach works with innovative ideas but sometimes fails to fully recognize the value of already accomplished success. When planning electronic dictionaries (and by ‘electronic dictionary’ I mean online/internet dictionaries and not, unless otherwise stated, CD-ROM dictionaries), the obvious approach could be to argue that we are dealing with something totally new and, therefore, we need to follow an exclusively transformative approach, without any attention to what has already been achieved in the area of printed dictionaries. With regard to CD-ROM dictionaries, one has to accept the fact that many of them are only paper dictionaries
on electronic platforms and that no new theory is needed for them. Therefore, they will not be the target of this discussion. With reference to the occurrence of real electronic dictionaries, it could also be argued that we merely have a change in medium and that a dictionary remains a dictionary. Paul Kruger, a nineteenth-century South African statesman, once remarked that one should take from the past what is good and use it to build the future. This belief also applies to lexicography. Although the electronic medium has its own demands, possibilities and challenges, it does not imply that lexicographers compiling an electronic dictionary should categorically eschew the work done in the production of printed dictionaries. One can benefit from learning some things from printed dictionaries, and the lexicographic wheel should not be re-invented when embarking on the planning and compilation of electronic dictionaries. What should be learned from the past, and this applies to both printed and electronic dictionaries, is to conscientiously avoid similar traps and mistakes, especially in cases where what are now seen as mistakes were then regarded as the proper way of doing things. This is where the transformative approach with its acknowledgement of the past, but also with its future visions, plays an important role. In these new endeavours, we as lexicographers are still bound to make mistakes in the future, but we have to restrict ourselves to making only new mistakes. Therefore, the planning of new dictionaries should not be dominated by tradition, but innovation and intelligent boldness should constitute the guiding principles. A good transformative approach should allow a reflexive component that partially overlaps with a contemplative approach. 
In this regard, it is important to be cognizant of an early remark by Zgusta (1971: 18) that lexicography ‘is an activity in which tradition plays a great role’, but he also states that ‘things may change slightly if more interest is given to lexicographic theory and if new work procedures or ways of presentation are developed, tried out, and used’ (19). He further continues ‘The lexicographer’s work is always creative, in a greater or a lesser degree, because he must always try to find new solutions to problems as yet unsolved’ (20). This is the challenge faced by lexicographers working within both the printed and the electronic medium. Many modern-day lexicographers are blessed or cursed with an overactive methodological memory that continuously reminds them of the ways in which they and their colleagues performed their earlier lexicographic endeavours. When employing a transformative approach in the planning of a new dictionary, such a methodological memory could be detrimental, and lexicographers might need to be compelled to adopt an unlearning mode (delete, delete, delete) to rid themselves of some of the traditions enforced by the methodological memory. Where lexicographers embarking on the planning of new, especially electronic, dictionaries are familiar with the traditions and practice of printed dictionaries, their new assignment might also demand the
unlearning of certain established habits which have no place in the electronic medium. With its practical and theoretical components, lexicography can be regarded as a two-legged animal. The lexicographic practice resulting in printed dictionaries, irrespective of being produced on clay tablets, that is, the real hardcover dictionaries, papyrus leaves or paper, developed in what can cautiously be called a pretheoretical era. However, this does not imply that there was no theory supporting some of these dictionaries. The history of the lexicographic practice clearly shows that some dictionaries had not been compiled in a haphazard way but according to a well-devised plan. In some cases, this plan had even been published, cf. Samuel Johnson’s famous The Plan of a Dictionary (1747) that preceded the work on his A Dictionary of the English Language (Johnson 1755). Such a plan or model can be regarded as the theoretical framework for such a dictionary. Unfortunately, too many dictionaries did not bear evidence of such a plan or a clear theoretical underpinning. Metalexicography or dictionary research, representing the formal theoretical component of lexicography, is a relative latecomer to the playing field. Consequently, in contrast to many of their predecessors, modern-day dictionaries have the advantage of a sound theoretical basis and lexicographers have no extenuating circumstances for not relying on the available theoretical framework when compiling their dictionaries. This is of special significance to electronic dictionaries. Today’s dictionaries must be better than their predecessors. A little while ago, a university colleague who knows I am involved in lexicography put a valid question to me. He wanted to know whether all the research in theoretical lexicography has led to an improvement in the quality of dictionaries. I immediately responded in a positive way, perhaps too positively. But his question made me think quite a bit. 
We pride ourselves that lexicography is an independent discipline with dictionaries as its subject matter. If modern-day dictionaries, including electronic dictionaries, are not really regarded by their intended and loyal users as being better than their older counterparts, some serious questions must be raised regarding the relevance and future of our discipline. In contrast to the early work on printed dictionaries, the practice of electronic dictionaries developed in an era where well-established theoretical frameworks are available. If electronic dictionaries do not utilize this advantage, then something is rotten in the state of Dictionopolis. However, it is important that the development of electronic dictionaries should not be isolated from that of printed dictionaries. New ideas formulated for new electronic dictionary projects will often also have applicability in printed dictionaries. Lexicographers involved in the planning of these electronic products should refrain from jealously guarding their new ideas in order to restrict their application only to this medium. These ideas should be made available to
the printed medium, as well, so that lexicography in general can benefit from these new developments.
1.2. Looking at the History of Theoretical Lexicography
The distinction between a contemplative and a transformative approach should not be regarded as only applicable to the planning of dictionaries. It can also be interpreted in a wider sense, where it applies to how one approaches the development of lexicographic theory. When revising, changing, adapting and improving lexicographic theory, our main focus should be on a transformative approach in order to reach new goals. For the future of electronic dictionaries, it is important that new theoretical horizons be identified and investigated. To do this, one should not totally eschew a contemplative approach, because a number of lessons are to be learned from the history of lexicographic theory. A historical view of the development of theoretical lexicography, a contemplative approach, gives evidence of some distinct phases in this ongoing progress, but also of some dominant influences (see Gouws, 2004). This forms an important background to a transformative approach that looks at innovative ideas for further developments, and contains a reflexive component to make provision for the inclusion of positive and valid aspects from the current and even older theory to support new ideas; for example, many of the suggestions introduced in the theoretical approach of Ščerba (1940) are still valid today.
Although there are a few exceptions, mainstream lexicography was introduced as a subsection of linguistics. Some of the early publications in this regard were Chapman (1948), Doroszewski (1954), Garvin (1955), Országh (1962) and the significant Householder & Saporta (1967), a book in which the majority of contributions focused on problems of a linguistic nature. This approach was maintained in Ladislav Zgusta’s famous and ground-breaking book, Manual of Lexicography (1971). 
In the foreword of this book, Zgusta states, ‘we also hope that a coherent statement and discussion of lexicographic problems will help to clarify them, and to demonstrate the importance of their being conceived in the framework of the linguistic theory more effectively’. He starts the introduction to his book as follows, ‘There can be no doubt that lexicography is a very difficult sphere of linguistic activity’. The Manual of Lexicography gave a brilliant introduction to various aspects of lexicography, but the discussion was presented within a strong linguistic framework, the then-prevailing way of looking at lexicography. Zgusta’s book had a big influence on the development of lexicographic theory in the 1970s. As a result, the main focus in earlier lexicographic research had been on the contents, especially the linguistic contents, of dictionaries, with emphasis on the different types of linguistic data presented in dictionary
articles. The linguistic bias is especially clear in the research regarding the treatment of meaning in dictionaries. Following the clear distinction between semantic and encyclopaedic data, as typically prevailing in the linguistics of the era of structuralism, linguistically biased theoretical lexicographers analyzed dictionaries, for example, to find the occurrence of encyclopaedic data in lexicographic definitions. They did not question the need for this type of data or the extent to which it might be permissible but they merely rejected it as nonlinguistic entries. I am not pleading for encyclopaedic data in dictionaries – in many cases the linguists were quite right in condemning the specific entries, but they condemned them on the wrong grounds. They are inappropriate not because they are of an encyclopaedic nature but because they go beyond the needs of the typical user of a given dictionary. As Bergenholtz & Gouws (2007a) have indicated, the distinction between linguistic and encyclopaedic data might be of interest to linguists but from a lexicographic perspective it has little relevance. Also in electronic dictionaries, where space restrictions do not play such an important role, the decision regarding the inclusion of data must be based on the needs of the users and not on the possibilities of the medium. The strong linguistic focus in at least some dictionaries can also be explained from another perspective – and this is where lexicographers of electronic dictionaries must play a guiding role. The early theoretical discussions of dictionaries mainly had linguists as participants, seeing that theoretical lexicography developed within a linguistic fold. The lexicographic practice is much older than theoretical lexicography, with ‘theoretical lexicography’ referring here to the scientific discipline. It must be accepted that theoretical lexicography had its foundations in the lexicographic practice, and not vice versa. The development went from the practice to the theory. 
Lexicographic theory developed as a response to the lexicographic practice and the original interest was primarily in the linguistic contents of dictionaries. In this regard, Afrikaans offers an interesting illustration. The development of meta-lexicographical discussions with regard to Afrikaans dictionaries runs parallel to the work on the comprehensive multivolume monolingual Afrikaans dictionary, the Woordeboek van die Afrikaanse Taal (henceforth abbreviated as the WAT). According to Gouws (1997: 19) the WAT played a central role in the development of the Afrikaans meta-lexicographical discussion; see also Botha (2003). This is due to the fact that early meta-lexicographical work in Afrikaans was directed at this dictionary, for example, a suggested model for this dictionary (Boshoff, 1926) and, especially later on, various critical reviews of the early volumes of the WAT, for example, Combrink (1962; 1979), Grobler (1978), Odendal (1979), Gouws (1985). Linguists wrote these discussions and reviews, and their comments focused specifically on the linguistic contents of the WAT and the success or failure of this dictionary as a source of linguistic data. In a similar way, linguists prompted the meta-lexicographic discussion regarding the dictionaries in other languages and the initial attempts to formulate a coherent lexicographic
theory. Current lexicographic theory owes much to the efforts by linguists, but this does not imply that current and future lexicographic theory must play a subordinate role in its relation to either linguistics or linguists. It is not surprising that the focus on the linguistic contents of dictionaries and the fact that the participants in the lexicographic discussions were primarily linguists resulted in lexicography being treated as a contents-based sub-discipline of linguistics, with limited attention to some issues regarding dictionary typology. But even in the typological classification of dictionaries, the distinguishing features often displayed a linguistic nature. Linguists were the princes of meta-lexicographic discussions, and meta-lexicography and practical lexicography were subsections of the work done by these linguists. This era in the history of lexicography can rightfully be regarded as representing a form of linguistic colonialism, with respect to both the lexicographic theory and practice. Where this approach still prevails, some serious unlearning is necessary to ensure the eventual success of e-lexicography. A second phase in the development of theoretical lexicography witnessed the introduction of an emphasis on dictionary structures, with Herbert Ernst Wiegand as a leader and an extremely productive participant in this phase. A focus on dictionary structures does not give an inferior position to the contents of dictionaries, but it does show that the packaging of data plays an important role in the success of dictionaries, in the words of McArthur (1986), as containers of knowledge. The importance of the movement from a contents-based to a structure-based approach in meta-lexicographic thinking should never be underestimated. 
This is not because the structures are important or the contents are of less importance, but because it represents a shift that indicates a totally different approach to the scope of the research domain of theoretical lexicography. It has often been said that language is the study or research object of linguistics. The early phases of theoretical lexicography primarily focused on the language in dictionaries and the analysis of the presented linguistic data. Language, albeit canned into a dictionary, remained the primary study object of the earlier lexicography. General language dictionaries are books about language and the linguistic contents of these dictionaries will always remain important, but the linguistic contents should not be the dominant study object of lexicography: dictionaries are the research object. With the focus on dictionary structures, Wiegand not only introduced a deviation from the contents, especially the linguistic-based, approach in lexicographic research, but he also introduced an approach that sees lexicography no longer as a sub-discipline of linguistics but as an independent discipline in its own right. Dictionary structures are of little interest to hardcore linguists, but they fascinate lexicographers – or at least some lexicographers. In the early phases of the development of theoretical lexicography, there was a noticeable emphasis on general monolingual and bilingual dictionaries. The current approach of no longer seeing lexicography as a sub-discipline of linguistics has opened
the door for other dictionaries, in which the focus is not on language, to be regarded as lexicographic projects of which the planning and compilation also have to rely on a sound theoretical basis. This basis cannot be a linguistic basis but must be a lexicographic basis. Whether compiling an explanatory monolingual dictionary, an etymological dictionary, a dictionary of place names or a dictionary of geology, the compiler is a lexicographer. Lexicographers do not have to be linguists; see Bergenholtz & Gouws (to be published). A third and equally important phase in the development of theoretical lexicography has been the shift in emphasis towards lexicographic functions. Although there have been frequent references to the intended target user, the needs and the reference skills of this user in earlier discussions, see for example Hartmann (1989), the lexicographic functions approach puts the dictionary user and the situation of use at the centre of the discussion. A first step in the planning of any new dictionary must be to determine the envisaged functions, in accordance with an identification of the intended target user. The functions are formulated as an answer to the question: ‘What do I want my target user to be able to do with the envisaged dictionary?’ All other aspects of the lexicographic process result from the functions allocated to the planned dictionary. By adhering to the lexicographic function approach, the need for contents and structures is not ignored, but is merely put into perspective and introduced in relation to the functions. In this regard, the work by Bergenholtz and Tarp, for example Bergenholtz & Tarp (2002) and Tarp (2000), paved the way for a new way of thinking about dictionary research and its application in the lexicographic practice. A lexicographic functions approach is not restricted to a certain type of dictionary (or to a specific medium). 
Each and every dictionary, irrespective of the typological nature, should satisfy one or more functions. These functions are not linguistically determined. Whereas the text reception and text production functions have successful communication, by means of language, as objective, the cognitive function can be satisfied without involving language as a subject matter. Yet again, this approach confirms the notion that lexicography is an independent discipline. When looking at the future of electronic dictionaries, it is important to realize that the above-mentioned development has primarily occurred in the field of printed dictionaries, albeit that some aspects of lexicographic theory have recently also been interpreted in terms of the electronic medium. With lexicographic theory going through different phases in its development, its application has resulted in the lexicographic practice, especially the practice of printed dictionaries, being subjected to changing theoretical demands. Lexicographers working in the field of electronic dictionaries should realize that the planning and compilation of dictionaries in this medium need not go through all the same phases that crossed the way of the development of printed dictionaries. We have already identified the user, the needs of the user and the functions to ensure the satisfaction of these needs. Lexicographers of future
printed and electronic dictionaries are in the fortunate position that they can employ the current state of lexicographic theory as point of departure. Electronic dictionaries should develop as a new medium and not the practical component of a new theory or discipline. Lexicographic theory must develop in such a way that it is not a medium-specific theory but rather a theory for, at least, all lexicographic tools.
1.3. Lexicographic Theory for Printed or Electronic Dictionaries?
One of the noteworthy attempts in the work of Wiegand was his desire to formulate a general theory of lexicography, see Wiegand (1984). In his theoretical discussions, he focuses primarily on what he calls Sprachlexikographie, that is, ‘language lexicography’. His theory primarily has printed dictionaries in its scope, with the structures introduced in his research therefore also directed at printed dictionaries. Some of these structures would also apply to electronic dictionaries but will need to be adapted, whereas other structures, for example, the macrostructure, have no application possibilities in electronic dictionaries, that is, in internet dictionaries, although they might be useful for those CD-ROM dictionaries which represent little more than a digital format of an already existing printed dictionary. If theoretical lexicography is to play its rightful role, the medium of a given dictionary should not determine whether such a dictionary falls within the scope of a given theory. The formulation of a general theory of lexicography should be general enough to provide the theoretical basis for dictionaries in every available medium. We do not need two separate theories for printed and electronic dictionaries – that would seriously diminish the power of our theory. In this regard, a significant advantage of the lexicographic function approach is that since its introduction it has not been a medium-specific theory. When a given function determines the way in which a given dictionary is presented, it does not matter whether it is a printed or an electronic dictionary. The theory requires no adaptation to embrace all different types of dictionaries within its scope. In the Transformational Generative Grammar of the early 1970s, the seminal work by John R. Ross (1967), Constraints on Variables in Syntax, had quite an influence on the formulation of syntactic theory. 
I am not going to discuss the theory of Ross or try to imply that it has any relevance for lexicography, but I would merely like to misuse the title of his dissertation. We do not need constraints on variables in lexicography. Rather, we need a theory that is powerful enough to embrace the variables – whether they stem from different typological classifications or different media employed to make data available to users.
1.4. Electronic Dictionaries: A New Start
Although we have had a decade or two of electronic dictionaries, we are still in an early phase of this type of lexicographic endeavour. We are still in the position where we can ensure that the mistakes made in the development of printed dictionaries are not repeated in the planning and compilation of their electronic counterparts. One of the major advantages of electronic lexicography is that it has had its point of departure in the years after the linguistic liberation. Although there will always be room for a variety of dictionary types and although the average person in the street, or behind the computer, will still rely on dictionaries as tools to help solve their communication needs by providing access to linguistic data, it is now recognized that linguists are not the only ones who have a role to play in the planning and compilation of dictionaries. It is also recognized that the planning and compilation of certain dictionaries have no need for any linguistic input. It is also recognized that even when making linguistic dictionaries, different categories of expertise are needed to ensure the best possible product. As a result of the linguistic liberation, a broadening in the expertise base of lexicographers has been established. And we must take this further.
The planning and compilation of dictionaries are determined by user needs. This user-directed approach leads to decisions regarding the functions, contents and structures of dictionaries. In an electronic environment, this can come much more easily to fruition compared to a printed dictionary. Where printed dictionaries too often tried to satisfy as many lexicographic needs of as many potential users as possible, or rather impossible, electronic dictionaries enjoy a point of departure without that phase. The fact that the species of lexicographers also includes experts in fields other than linguistics opens new ways of making data available to different users. 
One of the possibilities of electronic dictionaries that still have to be employed in a much more comprehensive way is the use of data banks from which different types of dictionaries and even different dictionaries of the same type can be extracted. The Centre for Lexicography at the Aarhus School of Business has produced a number of electronic dictionaries where users with different needs can use the dictionary to access the data needed to retrieve the type of information they seek. Bergenholtz, Gouws & Claassen (unpublished) recently developed a model for a series of German–Afrikaans dictionaries to be compiled from one data bank. This includes a text production dictionary for mother-tongue speakers of German, a text reception dictionary for mother-tongue speakers of Afrikaans, a dictionary for translation purposes, and so on, and the model makes provision for different levels of usage that can be retrieved from this data bank. Whereas a paper dictionary can undergo a number of applications of cut and paste to lead to other paper dictionaries, the basis developed for an electronic dictionary offers the
real and immediate possibility to extract different dictionaries from the same basis. But this demands a different way of planning dictionaries, and lexicographers must look beyond the idea of working on a single isolated dictionary. Planning must be done in such a way that the collection and structuring of data can offer access from different points of departure to retrieve different types of information by different users with different reference skills in need of different types of assistance for different user situations when consulting different dictionaries – all in the same data bank. Ensuring such procedures is one of the challenges of modern-day lexicographers.
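The idea of extracting several function-specific dictionaries from one data bank can be illustrated with a small sketch. It is only an illustration under stated assumptions, not a description of the Aarhus dictionaries or of the German–Afrikaans model: the record fields, the function names, the mapping from function to fields and the two sample entries are all hypothetical and were chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One record in the shared data bank (all field names are hypothetical)."""
    lemma: str                    # German headword
    equivalents: tuple            # Afrikaans equivalents
    grammar: str = ""             # inflection notes, mainly useful for text production
    collocations: tuple = ()      # typical word combinations


# Assumed mapping from lexicographic function to the data fields it exposes.
FUNCTION_FIELDS = {
    "production": ("lemma", "grammar", "collocations"),
    "reception": ("lemma", "equivalents"),
    "translation": ("lemma", "equivalents", "collocations"),
}


def extract_dictionary(data_bank, function):
    """Return one function-specific dictionary as an alphabetical list of dicts."""
    fields = FUNCTION_FIELDS[function]
    return [
        {name: getattr(entry, name) for name in fields}
        for entry in sorted(data_bank, key=lambda e: e.lemma.lower())
    ]


# Invented sample data; a real data bank would hold far richer records.
DATA_BANK = [
    Entry("Haus", ("huis",), "das; -es, Häuser", ("ein Haus bauen",)),
    Entry("Buch", ("boek",), "das; -es, Bücher", ("ein Buch lesen",)),
]

reception = extract_dictionary(DATA_BANK, "reception")
production = extract_dictionary(DATA_BANK, "production")
```

The design choice in this sketch is that the data are collected and structured once, while each dictionary is only a view selected by function. Adding a further dictionary then means adding one line to the mapping rather than compiling a new work.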
1.5. Towards a New Destiny
Wiegand (1989: 251) maintains that lexicography is a practice that is aimed at the production of dictionaries in order to initiate another practice; that is, the cultural practice of dictionary use. Enabling the eventual successful dictionary use should remain the prime objective of any lexicographic theory – regardless of whether the dictionary use refers to the use of a printed or electronic dictionary. The theories developed for printed dictionaries in the pre-electronic era were directed at dictionaries as sources of reference. The linguistic liberation also led to a linguist liberation. Not only is lexicography no longer a sub-discipline of linguistics, but lexicographers are no longer a subspecies of linguists. This implies that the endeavours to produce reference sources will not necessarily be language dictionaries – or even dictionaries, as such. The electronic era will take lexicography to a different level – and this is where the earlier-mentioned intelligent boldness comes into play. Learning and unlearning also apply to thoughts on the scope and application range of lexicographic theory. An overly narrow dictionary application of lexicographic theory must be unlearned and a broader reference source application learned in its place. A general theory of lexicography will focus on lexicographic work, but should not isolate itself from the theory underlying the development, planning, compilation and publication of other reference sources. The theory of lexicography should not be the theoretical discussion of a sub-discipline of linguistics but rather a sub-discipline of a broader theory of reference works. Such a theory should focus on the users of these reference sources, whether dictionaries or not, on the functions of these sources, the data presented in them, the structures to accommodate the data and, ever so important, access to the data in order to achieve an optimal retrieval of information. 
Consequently, dictionaries should no longer be the only products resulting from the application of lexicographic research, and the lexicographers doing the research should not occupy their efforts merely with dictionaries. In this regard, they should be cognizant of a different kind of access; that is, the access which lexicographic research opens to the much wider area of the world
of reference and its spectrum of reference sources – access to the information era. Lexicographic theory has ensured a scientific way of compiling dictionaries but, unfortunately, this scientific basis is still lacking when it comes to the production of so many other reference sources that are needed as information tools. Applying lexicographic theory to the more general reference source domain does not imply an attitude of lexicographic imperialism. We do not want to replace linguistic colonialism with lexicographic colonialism! It does not enforce lexicographic theory on anything else, but merely makes the results of lexicographic research available to many more applications. Bergenholtz & Gouws (2010) have indicated that dictionaries give better access to their users than do linguistic textbooks. The access process is but one of many aspects where other reference sources can benefit from the results of research in the field of lexicography. Compare in this regard also the work done by Leroyer (2008a; 2008b) with regard to tourist guides. One of the most important features that should be recognized for electronic dictionaries is the fact that linguists are not the only, or even the major, participants in the discussion of these dictionaries and the development of a lexicographic theory applicable to these dictionaries. Yet again, a form of unlearning is required: as modern-day lexicographers, we must move away from the traditional circle of people responsible for the discussion, planning and reviewing of dictionaries. We need a different species of role players, that is, a new generation of lexicographers, and identifying these lexicographers must be the result of pondering on the question ‘Who is a lexicographer?’ (Bergenholtz & Gouws – to be published). Lexicography, with regard to both the theoretical and the practical level, should be performed as a team effort and not as a Lone Ranger endeavour. 
The team members will typically come from different fields, depending on the type of dictionary to be compiled, and these different team members will acknowledge the diversity residing in and implied by the title of ‘lexicographer’. This implies a paradigm shift, and although this envisaged shift is directed at electronic dictionaries, it is not restricted to or exclusively initiated by thoughts regarding electronic dictionaries. This new way of dealing with the concept lexicographer must be applied to both electronic and printed dictionaries. One way of applying it is by convincing publishers that review copies of dictionaries should not only be submitted to linguists, but also to a wider spectrum of candidate lexicographers. Contrary to the nature of the development of the Afrikaans lexicographic discussion, as indicated earlier in this paper with regard to the role of the WAT, the further development of lexicographic theory should not result only from the reviews and comments by linguists. The unlearning process has another important aspect, along with the distinction between a contemplative and a transformative approach, and this has wide-ranging implications for future dictionaries and future developments in
meta-lexicography. The example regarding Afrikaans and the WAT might be unique, but the broader principle of theory being formulated to capture features of the lexicographic practice has a more general occurrence. One of the noteworthy aspects of lexicographic theory is that it has been a follower and not a leader. Theory followed practice and reflected what had been happening in practice. This implies that someone planning a new dictionary could use the existing theory to plan and compile a dictionary similar to existing dictionaries. Lexicographic theory was based on a description of what had previously occurred in the lexicographic practice. But adhering only to the guidelines and models presented in the prevailing theory leaves little room for innovation. Too often, the lexicographic practice has been a matter of more of the same. Unfortunately, this situation will not change if an exclusively contemplative approach, in which existing dictionaries are regarded as the only point of departure for the formulation of lexicographic theory and dictionary models, continues its dominating role. However, improvement is not guaranteed by the opposite approach either, where theory is divorced from the practice, formulated in isolation and then enforced on the lexicographic practice. The application of such a practice-alienated theory has little chance of leading to the compilation of successful dictionaries. The success of new dictionaries and new contributions to the theory of lexicography demands a comprehensive transformative approach – comprehensive because it includes a reflexive component that is cognizant of existing dictionaries.
1.6. Contributions by the User
The activity of referencing in the last decades has been characterized by, among other things, a comprehensive use of the internet as data source. Lexicographers must realize that this use will increase. Online dictionaries are the tools used by people who are active participants in information retrieval. Many lexicographers employ the internet as their most important source of data. We lexicographers have become accustomed to an era of downloading, and we would do well to recognize that users of our products are consumers of information and that they rely primarily, and in many cases exclusively, on the internet. This poses certain challenges to lexicographers, and one response to these challenges is the planning and compilation of online monofunctional dictionaries instead of standalone products. Online dictionaries are a response to the needs of users to retrieve information by means of downloading. However, the information era in which we live, the wiki era (see Gouws, 2009), has also witnessed the emergence of a need not only for downloading but also for uploading. Earlier in this paper I mentioned that we should not move towards lexicographic colonialism. Uploading offers lexicographers the opportunity to enhance a spirit of lexicographic democratization.
In this regard, the process of simultaneous feedback (see De Schryver & Prinsloo, 2000), where lexicographers receive feedback from their users while busy compiling the dictionary, can be supplemented by a process of continuous feedback, where users upload their comments and suggestions for the lexicographer to take into consideration. The online network Facebook illustrates an active uploading system. If all the users of Facebook lived in one country, it would be the country with the world’s third-largest population – beaten only by India and China. We are living in the age of uploading, and lexicographers will do well to participate, not only by creating opportunities for users to upload suggestions that might be helpful to the lexicographer, but also by uploading data themselves onto different sources of reference.
1.7. Conclusion
The focus of this volume is on e-lexicography and the construction of internet dictionaries. Looking back at the development of the theory and practice of lexicography, it is clear that for too long the practice of printed dictionaries endured without a sound theory; for too long lexicography did not establish itself as an independent discipline; for too long the pool of lexicographers was restricted to experts from a single field; for too long innovation in the lexicographic practice was impeded by its theory being based on practice, not preceding that practice; for too long lexicographic theory was exclusively directed at the production of dictionaries. Looking to the future, to the planning and compilation of electronic dictionaries and the further development of a coherent and medium-unspecific theory, we must unlearn a great deal of what we know, and we must learn anew so that we produce innovative reference tools, including dictionaries.
Chapter 2
Access to and Presentation of Needs-Adapted Data in Monofunctional Internet Dictionaries
Henning Bergenholtz
2.1. Status Quo of Present Theoretical and Practical Lexicography
It cannot be said that theoretical lexicography has had a significant effect on practical lexicography. On the contrary, it can be contended that practical lexicography has had a significant effect on theoretical lexicography, as the latter has to a large extent consisted of contemplative analyses of existing dictionaries – and the theories constructed on this basis are therefore of a deconstructive nature. This is also one of the reasons why both practical and theoretical lexicography to some extent suffer from the same errors and misunderstandings:
1. Dictionaries are classified as commodities, but only as a mere empty statement. Although this is acknowledged by many theorists, this statement has by and large had no consequences. In particular, meta-lexicographic theories do not – as is common practice in the conceptualization and production of commodities – take the real purpose of the dictionary as a point of departure in conceptual considerations and before dictionaries are compiled.
2. One consequence of this non-functional thinking is apparent in the practice of existing dictionaries and in almost all theoretical considerations, since polyfunctional dictionaries are deemed to be the sole and natural possibility.
3. Such polyfunctional dictionaries – and this is true of printed as well as electronic dictionaries – list as many item types and as much data as possible in order to fulfil their diverse functions. The overwhelming amount of data presented in such polyfunctional dictionaries makes it difficult or even impossible for the user to find the individual piece of information sought. This often results in information overload or even termination of the search, as happens in many Google searches.
4. Meta-lexicographic considerations also assume a polyfunctional concept, where the general-language lexicography focuses on the linguistic phenomena instead of on the user’s needs. Many meta-lexicographers are actually linguists, who assume that all users of dictionaries also want to see all the items in which a linguist is interested, because of the nature of their scientific work. However, when a user has a reception problem, an explanation of the meaning, and this explanation only, would normally be exactly what the user needs to solve this problem. In a monofunctional reception dictionary, such information stress would not occur, since it would show only the required explanation.
5. Furthermore, and this is one of the main negative surprises of the past ten years of lexicographic research and practice, truly novel attempts to develop new presentation forms and access options are more the exception than the rule. The most exciting might be the automatically compiled dictionaries such as the Wortschatz Universität Leipzig.1 (For the time being, however, the results do not satisfy any recognizable need for information, if one ignores the lexicography that is called lexicotainment [reading dictionaries for entertainment or to kill time]).
6. It is also true that lexicographic interest in the needs of users, who require fast and reliable access to data in information tools, has been astonishingly scant. In this regard, it was particularly the access structure that was directly counterproductive, as the research was based on the lexicographer’s idea of the best way to obtain certain data. For this reason, no attention was paid to the various user-relevant phenomena such as search steps, search time or search paths; in relation to this see Bergenholtz and Gouws (2010).
7. In particular, it might be observed that the large number of corpus-based and computer-based lexicographic contributions do not – as is usually the case in the production of commodities – take the real purpose of the dictionary as a point of departure for all conceptual decisions when dictionaries are planned and compiled. Programs are written that look for collocations or examples for all forms of dictionaries; corpora are compiled which should supposedly be used for all forms of dictionaries. Corpora, computer programs, pencils, etc., and also the people who work as computer experts, are helpers and are expected to produce special proposals, computer programs, corpora, etc., for certain types of dictionaries – as specified by the lexicographer. Today, the opposite situation often prevails. Lexicographers are required to adapt their work and their data selection and processing options to the results generated by the computer.
2.2. Access to Printed Dictionaries and Internet Dictionaries
Yet another point of criticism against meta-lexicography from 1985 up to the present day concerns the totally unscientific and actually almost meaningless surveys, in which the respondents were not selected in accordance with the principles of social science. Most lexicographic surveys are associated with percentage indications and generalizing statements, although at best they can only be applicable to the particular respondents. Such surveys usually involve 20 to 100 students, who are, in any case, being taught by the survey leader (mostly students of one or more languages). Thus, the survey leader has selected the respondents himself, without due regard for the rules applicable in the case of a representative selection of a whole population. A typical example of such unscientific questioning is that of Bogaards (1990), who asked 28 Dutch students of French which part of a collocation or of an idiom they would select as a search string when consulting a dictionary; for example:
Search string | un panneau fibreux | blanchi sous le harnais
noun | 4 | 16
adjective | 24 | 12
With such questions and such a non-representative group, Bogaards generates statistical data on the use of dictionaries and the search for set phrases. Not only are the questions and the answers unclear, as one cannot know what the respondents would actually do in a concrete situation (would they, for example, terminate a search if the initial result were negative?); the lexicographic consequences are unclear as well. For instance, should one, in the first example, list the phrase under the adjective and, in the second, under the noun? If so, should one then carry out surveys for all expressions consisting of more than one word? And should the same small population of respondents be decisive in all cases? In fact, as soon as one respondent looks up the noun and another the adjective, it is clear that, if users are to benefit, the set phrase should be listed under both the noun entry and the adjective entry. Moreover, there are surveys where the respondents – contrary to all rules of social science – have been selected by answering questionnaires on the internet. But in reality, it remains debatable whether ‘correct’ (i.e. scientifically founded) surveys would be worthwhile, as the questions usually enquire about linguistic phenomena and not about genuine information needs. In other words, what is asked is whether the respondent was looking for collocations or for examples, and not whether the dictionary was being consulted to solve reception, text production or translation problems. But even if real needs were the object of the enquiry, the answers would in the first place be of interest to commercial lexicography. As far as scientific considerations
are concerned, it is not at all certain that, if a given information need exists among only 10 per cent of the respondents, this need should not be covered. Log file analyses (Bergenholtz & Johnsen, 2005, 2007, 2011; and Almind, 2008) show that while general statistical results might lead to some interesting conclusions, only the individual user’s search behaviour produces truly relevant results, especially in the form of data that can be used for entirely new dictionary concepts. Starting from this postulate, I would now like to present the results of two tests, one with five respondents and the other with one. No valid results from a representative investigation can, therefore, be presented. Instead, it is an analysis of individual visits to the data that is the material aspect of the tests. Both the individual and the overall results show that all the talk about access structures is of little importance, since each respondent in each individual case chooses a search path, with its own search stages, that differs from those of other respondents. One could, of course, describe an access structure for each individual respondent for each individual search, but this does not appear scientifically relevant. What is relevant is to look at the selected individual search path and see which dictionaries – with which macro and micro structures, different search-relevant graphic markers in printed dictionaries and search sequences in certain fields in the database of an electronic dictionary – are associated with a particularly fast or particularly slow access.
The first test, a case study, deals with the use of molecular biology dictionaries. The starting point is an instructive explanation in a dictionary that leads to a follow-up question:
gene Your genes are the parts of the cells in your body responsible for passing on some of your physical characteristics to your children. (Harrap’s Essential English Dictionary)
We must first ask ourselves whether this dictionary definition must be taken literally. The following question is put to the respondent: ‘Do genes occur only in humans?’ Both the extract from the dictionary and the question are submitted to the respondent in writing. The overall results, listed by respondent, were as follows:
Dictionary | Search time, respondent 1 / 2 / 3 / 4 / 5 (average) | Result, respondents 1–5
Gentechnologie von A bis Z | 31’’ / 1’ 48’’ / 2’ 12’’ / 24’’ / 1’ 02’’ (average: 1’ 11’’) | ++–++
Dictionary of Biotechnology | 18’’ / 1’ 28’’ / 28’’ / 14’’ / 13’’ (average: 30’’) | +++++
Biotechnology from A to Z | 26’’ / 2’ 01’’ / 38’’ / 13’’ / 51’’ (average: 50’’) | +++++
A Multilingual Glossary of Biotechnological Terms | 28’’ / 52’’ / 40’’ / 41’’ / 3’ 14’’ (average: 1’ 11’’) | ++–+–
English–Spanish Dictionary of Biotechnology | 45’’ / 1’ 59’’ / 44’’ / 33’’ / 1’ 12’’ (average: 1’ 02’’) | +++++
Wörterbuch der Gentechnik | 21’’ / 25’’ / 27’’ / 20’’ / 50’’ (average: 29’’) | +++++
Biotechnology Glossary | 1’ 42’’ / 3’ 54’’ / 1’ 05’’ / 1’ 40’’ / 2’ 20’’ (average: 2’ 06’’) | –+–++
IATE (The EU’s multilingual term base) | 1’ 13’’ / 2’ 26’’ / 49’’ / 4’ 08’’ / 1’ 06’’ (average: 1’ 56’’) | –+–––
Hypermedia Glossary of Genetic Terms | 44’’ / 4’ 56’’ / 27’’ / 1’ 07’’ / 1’ 31’’ (average: 1’ 45’’) | +––++
Microbial Genetics Glossary | 51’’ / 3’ 09’’ / 12’’ / 1’ 00’’ / 1’ 49’’ (average: 1’ 24’’) | ++–––
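As a reading aid, the stated averages in this table can be ranked mechanically. The following is only a minimal sketch: the averages are copied as printed (written here as m:ss), the internet flag marks the last three dictionaries in the list, and the names `AVERAGES`, `to_seconds` and `rank_by_average` are illustrative assumptions rather than anything used in the study.

```python
def to_seconds(mmss: str) -> int:
    """Convert a time written as m:ss into whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (dictionary, average search time as printed in the table, internet dictionary?)
AVERAGES = [
    ("Gentechnologie von A bis Z", "1:11", False),
    ("Dictionary of Biotechnology", "0:30", False),
    ("Biotechnology from A to Z", "0:50", False),
    ("A Multilingual Glossary of Biotechnological Terms", "1:11", False),
    ("English-Spanish Dictionary of Biotechnology", "1:02", False),
    ("Wörterbuch der Gentechnik", "0:29", False),
    ("Biotechnology Glossary", "2:06", False),
    ("IATE", "1:56", True),
    ("Hypermedia Glossary of Genetic Terms", "1:45", True),
    ("Microbial Genetics Glossary", "1:24", True),
]


def rank_by_average(rows):
    """Return the rows ordered from fastest to slowest average access."""
    return sorted(rows, key=lambda row: to_seconds(row[1]))


ranking = rank_by_average(AVERAGES)
# Places (1 = fastest) occupied by the internet dictionaries.
internet_places = [place for place, row in enumerate(ranking, start=1) if row[2]]
fastest_three = [row[0] for row in ranking[:3]]
```

Ranking by the printed averages rather than recomputing them from the individual times keeps the sketch tied to the figures the chapter actually reports.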
The last three dictionaries in the list above are internet dictionaries and the other seven are printed dictionaries. The total search time for each respondent is shown in the second column. In the third column, a plus (+) indicates that the respondent found an answer and a minus (–) that he found no answer. It is clear that access to the internet dictionaries is not faster than to the printed dictionaries in these case studies. On the contrary, the three dictionaries with the fastest access time are printed dictionaries. Of the ten dictionaries, the three internet dictionaries occupy seventh, eighth and ninth places. Only the printed dictionary with a systematic macrostructure had slow access, as the users experienced difficulty distinguishing between the different registers and, especially, remembering the long five-digit numbers until the dictionary entries sought were found. One of the respondents in particular often had to return to the register entries several times during one and the same search, because he could no longer recall the number. It may be concluded that this dictionary was not made for users with a poor short-term memory for numbers or codes. Most interesting are the individual differences, ranging, for example, from 3 to 40 search stages, before the respondent found an answer or had to conclude that no answer was to be found.
A similar case study was made with one respondent, comprising a search for the meaning of given fixed expressions, partly in general-language dictionaries and partly in specialist dictionaries. Dictionaries (1), (2) and (7) are internet dictionaries. It can be seen that the two dictionaries with the fastest access time are internet dictionaries. In other words, internet dictionaries can – at least in this case study and unlike in the first case study – allow faster access than do printed dictionaries, which was to be expected, in any case. It should also be noted, however, that the dictionary with the second slowest access time is also an internet dictionary – surpassed, in the negative sense, only by the printed version of the same dictionary:
Dictionary | Average search steps | Average search time | Positive result + average search time | Negative result + average search time | Uncertain result + average search time
(1) Meaning of Fixed Expressions | 1.6 | 20’’ | 10 searches, 20’’ | none | none
(2) Den Danske Netordbog | 2.5 | 25’’ | 8 searches, 24’’ | 1 search, 33’’ | 1 search, 28’’
(3) Den Danske Ordbog | 5.9 | 56’’ | 8 searches, 40’’ | 1 search, 2’ 47’’ | 1 search, 1’ 13’’
(4) Ordbog over det danske sprog (printed) | 7.2 | 2’ 39’’ | 8 searches, 2’ 35’’ | 1 search, 4’ 36’’ | 1 search, 1’ 08’’
(5) Ordbog over det danske sprog (Internet) | 5.3 | 1’ 11’’ | 6 searches, 41’’ | 3 searches, 2’ 13’’ | 1 search, 1’ 01’’
(6) Danske Talemåder | 4.9 | 45’’ | 5 searches, 48’’ | 4 searches, 43’’ | 1 search, 31’’
(7) Nudansk Ordbog | 3.9 | 1’ 03’’ | 5 searches, 36’’ | 4 searches, 1’ 51’’ | 1 search, 10’’
(8) Talemåder i dansk | 4.2 | 39’’ | 3 searches, 30’’ | 6 searches, 48’’ | 1 search, 15’’
Remarkably, the dictionary with an average access time in the poor half (Den Danske Ordbog) joins an internet dictionary in second place if the order is changed to a list of dictionaries which were able to offer an answer to the questions fastest. And here it must be conceded that slow access is still better than no answer. The problem is that many users give up the search after 1–2 minutes in a printed dictionary and between 30 seconds and 1 minute in an internet dictionary. There are two important criteria when evaluating the use and quality of a dictionary:
(i) Can the required data be found in the dictionary? More precisely: Can the user find the item that contains the answer to the question that prompted the search? One might, for example, see that ude i hampen, an expression indicating that a given statement or act is exaggerated or untrue, could be found in the printed version of the Ordbog over det danske sprog but not in the electronic version of the same dictionary.
(ii) How long did the search take? Many users quit after a while, some sooner than others.
The best dictionary is probably the one rendering a usable result in a short time. The interesting question is whether it can be explained that some dictionaries have a relatively quick access and others a relatively slow access. One cannot take it for granted that each dictionary user will continue searching for a full six minutes in the printed version of the Ordbog over det danske sprog, as our respondent did. In general, it is also noteworthy and amazing that in the first component of the case study, a multivolume dictionary (Den Danske Ordbog) had a much quicker search time compared with both small single-volume specialized dictionaries of set phrases. A provisional explanation for this is that the multivolume dictionary employs good and clear markers, although few are graphic, to identify specific types of items. Contrary to this, the specialized dictionaries have an unclear layout and they could have had a much quicker access if one or more indexes had been added. Another feature is quite apparent in the first component: the static description of access, with the access structure regarded as identical to or part of the macrostructure, as in Wiegand (2007, 2008), does not comply with the real user situation; for instance, one can see, in the printed edition of Ordbog over det danske sprog, that the user often moves up and down like a yo-yo in the same article, and also often moves from one volume to another and back again in a multivolume dictionary. Once again, the interesting analyses start with the individual questions, where it is not only large differences in the search times which are observed, but also large differences in the number of search steps. A given number of search steps does not always imply the same search time; for example, the following question required 3 steps and 14 seconds in one case and 3 steps and 54 seconds in another. Basically, however, it is observed that more steps result in longer search times, that is, slower access. The search for the fixed expression ude i hampen
(literally: outside in the hemp, meaning ‘a specific statement or action is totally exaggerated or even untrue’) yielded the following results:
Dictionary | Search steps | Search time | Result
Meaning of Fixed Expressions (internet dictionary with 12,000 fixed expressions) | 3 | 14’’ | positive
Den Danske Netordbog (general-language internet dictionary) | 3 | 39’’ | positive
Den Danske Ordbog (single-volume general-language dictionary) | 4 | 31’’ | positive
Ordbog over det danske sprog (28-volume general-language dictionary) | 15 | 5’ 46’’ | positive
Ordbog over det danske sprog (internet edition of the 28-volume general-language dictionary) | 7 | 1’ 36’’ | negative
Danske Talemåder (single-volume dictionary of idioms) | 4 | 56’’ | positive
Nudansk Ordbog (single-volume, general-language dictionary) | 3 | 18’’ | positive
Talemåder i dansk (single-volume dictionary of idioms) | 3 | 54’’ | negative
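The two observations made about these figures (that equal step counts can take different times, and that more steps nevertheless tend to mean slower access) can be checked against the eight rows for ude i hampen. This is only a sketch under stated assumptions: the steps and times are taken from the table, times are converted to seconds, and the helper names `pearson` and `time_range_by_steps` are illustrative choices, not part of the study. With only eight observations, the coefficient is descriptive and nothing more.

```python
from math import sqrt

# (dictionary, search steps, search time in seconds) for "ude i hampen"
SEARCHES = [
    ("Meaning of Fixed Expressions", 3, 14),
    ("Den Danske Netordbog", 3, 39),
    ("Den Danske Ordbog", 4, 31),
    ("Ordbog over det danske sprog (printed)", 15, 5 * 60 + 46),
    ("Ordbog over det danske sprog (internet)", 7, 1 * 60 + 36),
    ("Danske Talemåder", 4, 56),
    ("Nudansk Ordbog", 3, 18),
    ("Talemåder i dansk", 3, 54),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def time_range_by_steps(rows):
    """Map each step count to the (shortest, longest) time observed for it."""
    spread = {}
    for _, steps, seconds in rows:
        low, high = spread.get(steps, (seconds, seconds))
        spread[steps] = (min(low, seconds), max(high, seconds))
    return spread


steps = [row[1] for row in SEARCHES]
times = [row[2] for row in SEARCHES]
r = pearson(steps, times)              # strongly positive for these rows
spread = time_range_by_steps(SEARCHES)  # e.g. the range of times for 3 steps
```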
2.3. Four Monofunctional Internet Dictionaries with Set Phrases
The dictionary with the fastest access in the above-mentioned case study was Meaning of Fixed Expressions, a dictionary compiled, like three other dictionaries of fixed expressions, from one and the same database. In order to remain au fait (up-to-date) with current events in this spring of 2010, after the destructive consequences of the Icelandic volcano, a user might be looking for fixed expressions with aske (ash), using it as his search string and finding the following:
fra asken i ilden (from the ash into the fire)
i aske (in ash)
sæk og aske (sackcloth and ashes)
These three expressions are not lemmas in the ordinary sense, as the headings only present the common denominator of the fixed expressions, which have a particular meaning and belong to a shared nucleus of expressions. When a user finds an expression with such a nucleus, for example, klæde sig i sæk og aske (to dress in sackcloth and ashes) and is not sure what the expression means, he can use this expression or a part of it as a search string in order to receive assistance with a reception problem. For example, he writes fra asken i ilden and presses the button ‘understand an expression’, and then finds the following entry in the dictionary Meaning of Fixed Expressions:
Betydning: udtryk for, at nogen kommer fra en ubehagelig situation ud i noget, der er endnu værre
Meaning: Expression used when someone escapes from an unpleasant situation into one that is even worse
Faste vendinger: bringe fra asken i ilden; komme fra asken i ilden; komme fra asken og i ilden
Fixed expressions: bring from the ash into the fire; getting from the ash into the fire; getting from the ash and into the fire
He writes i aske and presses the button ‘understand an expression’, and then the following entry in the dictionary Meaning of Fixed Expressions appears:
Betydning: udtryk for, at noget er eller bliver ødelagt ved brand. Der er her tale om, at noget, som oftest en bygning, er fuldstændig ødelagt og nedbrændt til grunden.
Meaning: Expression meaning that something was destroyed by fire, usually a building that burnt down to the ground
Faste vendinger: henfalde i aske; ligge i aske; lægge i aske; nedlægge i aske
Fixed expressions: to turn to ash; to lie in ash; to lay in ash; to lay in ash
Lastly, he writes sæk og aske and presses the button ‘understand an expression’, at which he finds the following entry in the dictionary Meaning of Fixed Expressions:
Betydning: udtryk for at vise ydre tegn på sorg eller anger
Meaning: Expression for certain external signs of sorrow, anguish or regret
Faste vendinger: iføre sig sæk og aske; klæde sig i sæk og aske; ligge i sæk og aske; omvende sig i sæk og aske; sidde i sæk og aske; være iført sæk og aske
Fixed expressions: to put on sackcloth and ashes; to wear sackcloth and ashes; to lie in sackcloth and ashes; to repent in sackcloth and ashes; to sit in sackcloth and ashes; to be dressed in sackcloth and ashes
If the user knows the expression or has obtained it by a search as shown above, he might be in doubt as to how the expression can be used. He enters the search string klæde i sæk og aske (to put on sackcloth and ashes) and presses the button ‘write a text’, encountering the following entry in the dictionary (this time, unlike in other entries, without collocation entries):
Faste vendinger: iføre sig sæk og aske; klæde sig i sæk og aske; ligge i sæk og aske; omvende sig i sæk og aske; sidde i sæk og aske; være iført sæk og aske
Fixed expressions: to put on sackcloth and ashes; to wear sackcloth and ashes; to lie in sackcloth and ashes; to repent in sackcloth and ashes; to sit in sackcloth and ashes; to be dressed in sackcloth and ashes
Betydning: udtryk for at vise ydre tegn på sorg eller anger
Meaning: used to say that someone shows signs of grief or sadness
Grammatik: nogen klæder sig i sæk og aske
Grammar: someone wears sackcloth and ashes
Eksempler:
Askeonsdag er den første af fastens 40 dage. I middelalderen mødte de bodfærdige denne dag i kirke iført sæk og aske.
Sammen med aflad skulle det syndige menneske også sone sin straf ved at få besked af præsten på at bede et vist antal Ave Maria – og evt. gå i sæk og aske og leve af vand og brød i en aftalt periode.
Quotations:
Ash Wednesday is the first of the 40 days of Lent. In the Middle Ages, penitents went to church dressed in sackcloth and ashes on this day.
Besides getting a letter of indulgence, the priest could also order the sinner to do penance by saying a number of Ave Marias – and, if necessary, to go dressed in sackcloth and ashes and to take nothing but bread for a set period.
Here the grammatical information, namely, that the subject should be a person, is given; in addition, there are two examples, which can be used for building a text. However, it is possible that the user might want further information, for instance, about the history behind the expression. He can then type in the same search string, press the button ‘know more about a fixed expression’, and then he will find the following entry:
Anmærkninger: Udtrykket bruges både i Det Gamle og Det Nye Testamente, hvor personerne giver udtryk for stor sorg eller beklagelse, fx i Esters Bog 4.1: ‘Da Mordokaj fik at vide, hvad der var sket, flængede han sine klæder; han klædte sig i sæk og aske, gik rundt i byen og skreg højt og bittert’. Man tager simple klæder på og strør aske på hovedet: ‘De fastede den dag og klædte sig i sæk og strøede aske på deres hoved og flængede deres klæder’ (Første Makkabæerbog 3,47).
Note: The expression occurs in the Old and the New Testaments when people express great sorrow or pity, for example: ‘When Mordecai perceived all that was done, Mordecai rent his clothes, and put on sackcloth with ashes, and went out into the midst of the city, and cried with a loud and a bitter cry; And came even before the king’s gate: for none might enter into the king’s gate clothed with sackcloth.’ (Esther 4.1–2)
Associationer: angerfuld; bedrøvelse; fortrydelse; kval; smerte
Associations: Bible; clothes; grief; mourning; repentance
The associations also serve as links to other fixed expressions with the same association. The association bedrøvelse (sorrow) refers the user to, among others, the following expressions, about which one can obtain information depending on whether one has a reception problem, a text production problem or wants to know as much as possible about a particular fixed expression:
halsen snører sig sammen (one’s throat tightens up)
hjertet synker ned i bukserne (one’s heart sinks into his shoes)
struben snører sig sammen (one’s throat tightens up)
hænge med næbet (with one’s pecker down)
hænge med næbbet (with one’s pecker down)
hænge med hovedet (to hang one’s head)
hænge med skuffen (with one’s jaw hanging)
hænge med ørerne (with drooping ears)
være ked af det (to be sad)
være iført sæk og aske (to be dressed in sackcloth and ashes)
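Linking expressions through shared associations amounts to an inverted index from association word to expressions. The sketch below is an assumption about how such links could be derived, not a description of the actual database. The associations of være iført sæk og aske are those shown in its entry; for the other two expressions only bedrøvelse is attested here, so only that word is listed.

```python
from collections import defaultdict

# expression -> association words (only what the text attests)
ASSOCIATIONS = {
    "være iført sæk og aske": ["angerfuld", "bedrøvelse", "fortrydelse", "kval", "smerte"],
    "hænge med hovedet": ["bedrøvelse"],
    "være ked af det": ["bedrøvelse"],
}


def build_index(associations):
    """Invert the mapping: association word -> set of expressions."""
    index = defaultdict(set)
    for expression, words in associations.items():
        for word in words:
            index[word].add(expression)
    return index


def related(expression, associations):
    """Other expressions sharing at least one association word."""
    index = build_index(associations)
    found = set()
    for word in associations.get(expression, []):
        found |= index[word]
    found.discard(expression)
    return sorted(found)


linked = related("være iført sæk og aske", ASSOCIATIONS)
```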
The purpose of these associations is to afford the user the possibility of finding other expressions with the same association, that is, an expression with a meaning similar to the one just found. This inclusion of associations is a novelty in electronic lexicography and one that can be developed further; more about this at the end of this chapter. At this point, it should first be added that the concept of ‘association’ was taken from recent psychology and is used quite informally: the lexicographer who is working on a dictionary entry has a maximum of 30 seconds to name up to five associated words intuitively (the dictionary in question, Meaning of Fixed Expressions, is produced by only two persons, namely, Esben Bjærge and the author of this chapter). In other words, it is not underpinned by any philosophical theory or systematic specialist ontology, but is quite simply the listing of intuitively indicated and semantically related individual words. The access addressed so far concerns the expression (the expression side, in Hjelmslev’s terminology). It often happens, however, that the user of a dictionary might not know the fixed expression or cannot exactly remember the idiom or saying as it is actually known and which he would like to use in a particular text. For this reason, the button ‘find an expression with a particular meaning’ was introduced in the above-mentioned dictionary. This allows a word to be entered that covers the meaning or part of the meaning of an unknown or temporarily forgotten fixed expression. (Once again, there are possibilities for improvement of the lexicographic concept, a matter which will be discussed at the end of the chapter). When the search string misfornøjet (dissatisfied) is used, the following headings of entries with fixed expressions are obtained, which have a meaning or
9781441128065_ch02_finals_txt_print.indd 40
7/6/2011 11:10:20 PM
Access to and Presentation of Needs-Adapted Data
41
association that corresponds to misfornøjet:
ved grød mildt kraftudtryk, som signalerer irritation
by golly mild oath indicating annoyance
besvære sig udtrykke sin utilfredshed med noget eller nogen
to object to express dissatisfaction with something or someone
bide fra sig komme med protester eller indvendinger mod noget, fordi man er utilfreds med det
snap to protest or raise objections against something one is not happy with
fanden annamme mig kraftudtryk, som signalerer stor misfornøjelse eller irritation
(the) devil take it expletive expressing strong annoyance or irritation
fanden stå i det kraftudtryk, som signalerer stor misfornøjelse eller irritation
what the devil expletive expressing strong annoyance or irritation
for guds skyld mildt kraftudtryk, som signalerer stor misfornøjelse eller irritation
for God’s sake mild oath indicating annoyance or irritation
for himlens skyld mildt kraftudtryk, som signalerer stor misfornøjelse eller irritation
in heaven’s name mild oath indicating annoyance or irritation
give ondt af sig udtrykke sin utilfredshed med noget eller nogen
sound off to express dissatisfaction with something or someone
lugten i bageriet udtryk for, om nogen kan eller ikke kan lide forholdene, som de er
the smell in the bakery expression indicating that someone likes or dislikes the existing situation
pis og papir udtryk for misfornøjelse eller irritation over et udsagn uden reelt indhold
piss and paper expression of dissatisfaction or irritation about a statement without real content
It cannot be denied that lists of this length border on the kind of information overload I criticized at the start of this chapter. I make suggestions for avoiding such long lists at the end of the paper, but these suggestions have yet to be put to the test.
2.4. Database Fields and Function-Specific Searching in this Database
So far, I have simply presented the individual dictionaries with fixed expressions without discussing the default searches in the database and without detailing the data presentations. This has been done in accordance with the function theory of lexicography (see Bergenholtz & Tarp, 2002 and 2003), depending on the intended functions of the respective monofunctional dictionary. I will provide the necessary explanations below in conjunction with dictionary examples with vulkan (volcano), that is, with topical examples at the time when the Icelandic volcano is spreading its ash over Europe and North America. With reference to the basic functions in information searches (the communicative functions of reception and text production and the cognitive function ‘knowledge of fixed expressions’), together with the two fundamentally different search options (expression-based search or meaning-based search), I will clarify the search possibilities and access to four different dictionaries compiled from the same database. The database contains the following fields for different entry types (the content of the individual fields will be explained in the context of the search in each of the four dictionaries). Field 7 comes closest to the lemma in a printed dictionary. Several variants can be entered, as in the case of a multiple lemma. All or part of an expression from field 7 is entered into field 1, on a more or less purely mechanical basis as all the expressions in field 7. The only function of field 1 is that it is the name of the dictionary entry, a name that can be used as a link from other dictionary entries, for example, as an indication of a synonymous fixed expression:
1. Core field
2. Meaning
3. Further meaning item
4. Grammar (more items if applicable)
5. Remark(s)
6. Internet link
7. Fixed expression(s)
8. Style
9. Classification of the fixed expression
10. Collocation(s)
11. Example(s)
12. Synonym(s)
13. Antonym
14. Association(s)
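As a rough illustration of the field list, one database record could be modelled as follows. This is only a sketch: the class name, the attribute names and the choice of which fields hold several items are assumptions made for the example, not a description of the actual database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FixedExpressionEntry:
    """One database record. Attribute names are hypothetical; the numbers
    in the comments refer to the fourteen fields listed in the text."""
    core: str                                                    # 1. Core field
    meaning: str                                                 # 2. Meaning
    further_meanings: List[str] = field(default_factory=list)    # 3. Further meaning item
    grammar: List[str] = field(default_factory=list)             # 4. Grammar
    remarks: List[str] = field(default_factory=list)             # 5. Remark(s)
    internet_link: Optional[str] = None                          # 6. Internet link
    fixed_expressions: List[str] = field(default_factory=list)   # 7. Fixed expression(s)
    style: Optional[str] = None                                  # 8. Style
    classification: Optional[str] = None                         # 9. Classification
    collocations: List[str] = field(default_factory=list)        # 10. Collocation(s)
    examples: List[str] = field(default_factory=list)            # 11. Example(s)
    synonyms: List[str] = field(default_factory=list)            # 12. Synonym(s)
    antonym: Optional[str] = None                                # 13. Antonym
    associations: List[str] = field(default_factory=list)        # 14. Association(s)


# Values taken from the 'på vulkaner' entry quoted in this chapter.
entry = FixedExpressionEntry(
    core="på vulkaner",
    meaning="udtryk for at gå i byen og feste efter alle kunstens regler",
    fixed_expressions=["komme på vulkaner", "være på vulkaner"],
    style="neutral",
    classification="idiom",
    associations=["beruset", "fest", "fuld", "sjov"],
)
```

Each of the four dictionaries can then be thought of as a different selection and ordering of these attributes.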
Access to the first dictionary is gained by pressing the button ‘I read the text, but do not understand the meaning of a fixed expression’. Here the user enters an expression or part of an expression in the search field and obtains the desired information, that is, the meaning of the fixed expression. This dictionary is called Meaning of Fixed Expressions. When a search is made in this dictionary, the program looks in two of the fields in the database in the order indicated by figures (see column 1 and note 2). For this dictionary, a maximizing search is carried out. The user obtains one or several entries with the content of three of the fields in the database (see column 3). If more than ten entries are found, they are displayed as a list, where only the content of the core field is shown. In other words, it is only a small part of the database that the user receives as an entry, but it is exactly the part that is needed to solve a reception problem.
Fields searched + order of search
1
2
Field
1. Core field 2. Meaning 3. Further meaning item 4. Grammar 5. Remark(s) 6. Internet link
7. Fixed expression(s) 8. Style 9. Classification of the fixed expression 10. Collocation(s) 11. Example(s) 12. Synonym(s) 13. Antonym 14. Association(s)
Dictionary entry
Indication whether a list is needed 1
1 2
3
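The maximizing search and the ten-entry list rule described for Meaning of Fixed Expressions can be sketched as follows. The sketch rests on several assumptions: the two searched fields are taken to be the core field and the fixed expressions, matching is plain case-insensitive substring matching (the real program is evidently more tolerant), and all function and key names are hypothetical.

```python
from typing import Dict, List, Sequence, Union

Entry = Dict[str, Union[str, List[str]]]
LIST_THRESHOLD = 10  # per the text: more than ten hits are shown as a list


def field_values(entry: Entry, name: str) -> List[str]:
    """Return a field's content as a list, whether stored as str or list."""
    value = entry.get(name, [])
    return [value] if isinstance(value, str) else list(value)


def maximizing_search(entries: Sequence[Entry], query: str,
                      search_fields: Sequence[str]) -> List[Entry]:
    """Collect every entry in which ANY of the searched fields contains the query."""
    q = query.lower()
    return [e for e in entries
            if any(q in v.lower() for f in search_fields for v in field_values(e, f))]


def present(hits: Sequence[Entry]) -> list:
    """Show reception data per entry, or only the core fields for long hit lists."""
    if len(hits) > LIST_THRESHOLD:
        return [e["core"] for e in hits]
    return [{"meaning": e["meaning"], "fixed_expressions": e["fixed_expressions"]}
            for e in hits]


# Two entries quoted in this chapter; the core value of the second is assumed.
ENTRIES: List[Entry] = [
    {"core": "på vulkaner",
     "meaning": "udtryk for at gå i byen eller tage på en ferie for at feste og more sig",
     "fixed_expressions": ["komme på vulkaner", "være på vulkaner"]},
    {"core": "på vulkaner",
     "meaning": ("udtryk for at befinde sig i en situation, som latent truer med at "
                 "ændre sig til uovervindelige problemer eller føre til åbne stridigheder"),
     "fixed_expressions": ["danse på vulkaner", "leve på vulkaner"]},
]

hits = maximizing_search(ENTRIES, "på vulkaner", ["core", "fixed_expressions"])
result = present(hits)
```

Because all sub-results are pooled, the order in which the fields are searched has no effect on the outcome of a maximizing search.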
We search with the string på vulkaner (on volcanoes). The same result is displayed as with the longer expression være på vulkaner (to be on volcanoes), or danse med vulkaner (dancing with volcanoes). It should be noted here that the entry does not start with one or more fixed expressions, but with what the user is looking for, that is, the explanation. The different fixed expressions which contain the core på vulkaner and have this meaning then follow. In this case two entries are found:
Betydning udtryk for at gå i byen eller tage på en ferie for at feste og more sig
Meaning Expression for going out to have a really wild party
Faste vendinger komme på vulkaner komme ud på vulkaner tage ud på vulkaner være på vulkaner være ude på vulkaner
Fixed expressions to go out on volcanoes to get on top of volcanoes to go out on volcanoes to be on volcanoes to be out on volcanoes
Betydning udtryk for at befinde sig i en situation, som latent truer med at ændre sig til uovervindelige problemer eller føre til åbne stridigheder
Meaning Expression for being in a hazardous situation that can lead to insurmountable problems or blatant confrontations
Faste vendinger danse på vulkaner leve på vulkaner
Fixed expressions dancing on (top of) volcanoes living on volcanoes
The second dictionary is activated by pressing the button ‘I am writing a text with a specific fixed expression’. Here the user enters a fixed expression or part of it in the search field and obtains information about the use of the fixed expression, including its meaning, grammar, collocations, specimen sentences, and synonymous or antonymous fixed expressions. In other words, the search is expression-specific. We call this dictionary Use of Fixed Expressions. When this dictionary is activated, four fields of the database are searched; however, this is a minimizing search, where the search is terminated after one field type has been searched and other fields are therefore not searched. The items relevant to text production are reflected as figures in column 3; if there are more than ten entries, a list is shown:
Fields searched + order of search
Field
1
1. Core field 2. Meaning 3. Further meaning item 4. Grammar
Order in dictionary entry
Indication whether a list is needed
2 3 4
1 2
Fields searched + order of search
2
3 4
Field
5. Remark(s) 6. Internet link 7. Fixed expression(s) 8. Style 9. Classification of the fixed expression 10. Collocation(s) 11. Example(s) 12. Synonym(s) 13. Antonym 14. Association(s)
Order in dictionary entry
Indication whether a list is needed
1
5 6 7 8
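A minimizing search differs from a maximizing one in that it stops at the first field that yields a hit. The following sketch shows the principle under stated assumptions: the search order used here (core field, fixed expressions, collocations, examples) is a guess for illustration only, matching is simple substring matching, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

Entry = Dict[str, Union[str, List[str]]]


def field_values(entry: Entry, name: str) -> List[str]:
    """Return a field's content as a list, whether stored as str or list."""
    value = entry.get(name, [])
    return [value] if isinstance(value, str) else list(value)


def minimizing_search(entries: Sequence[Entry], query: str,
                      ordered_fields: Sequence[str]) -> Tuple[Optional[str], List[Entry]]:
    """Search the fields one at a time and stop at the first field with hits."""
    q = query.lower()
    for name in ordered_fields:
        hits = [e for e in entries
                if any(q in v.lower() for v in field_values(e, name))]
        if hits:
            return name, hits  # later fields are deliberately not searched
    return None, []


# Assumed order of the four searched fields (not confirmed by the source).
SEARCH_ORDER = ["core", "fixed_expressions", "collocations", "examples"]

ENTRIES: List[Entry] = [
    {"core": "som sild i en tønde",
     "fixed_expressions": ["ligge som sild i en tønde", "stå som sild i en tønde"],
     "collocations": ["stå som sild i en tønde i toget på vej til arbejde"],
     "examples": []},
    {"core": "på vulkaner",
     "fixed_expressions": ["være på vulkaner", "være ude på vulkaner"],
     "collocations": [],
     "examples": ["Han havde fulgt traditionen med at være på vulkaner "
                  "natten før påmønstring."]},
]

matched_field, hits = minimizing_search(ENTRIES, "toget", SEARCH_ORDER)
```

Here the order of the fields matters: a query that matches a core field never reaches the collocations or examples.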
We search with the string på vulkaner (on volcanoes) and obtain two dictionary entries. Only the first is quoted:
Faste vendinger komme på vulkaner komme ud på vulkaner tage ud på vulkaner være på vulkaner være ude på vulkaner
Fixed expressions to get on a volcano to get on top of volcanoes to go out on volcanoes to be on volcanoes to be out on volcanoes
Betydning udtryk for at gå i byen eller tage på en ferie for at feste og more sig
Meaning Expression for going out to have a really wild party
Grammatik nogen er på vulkaner
Grammar someone is on volcanoes
Eksempler Han havde fulgt traditionen med at være på vulkaner natten før påmønstring. Men lørdag aften var jeg i stedet rigtigt ude på vulkaner, næste dag havde jeg nogle ordentlige tømmermænd.
Example He observed the tradition that in the night before reporting for duty one is on volcanoes. But instead I was really out on volcanoes on Saturday night, and the next day I had a terrible hangover.
For the dictionary entry with volcanoes, there are no collocations, so I give another entry that has them. Like individual words, fixed expressions also have collocations, and they are specific to each fixed expression. As in the case of individual words, one can quote only a few of the collocations that actually occur:
Faste vendinger ligge som sild i en tønde sidde som sild i en tønde stå som sild i en tønde være som sild i en tønde
Fixed expressions to lie like herrings in a barrel to sit like herrings in a barrel to stand like herrings in a barrel to be like herrings in a barrel
Betydning udtryk for, at mange mennesker opholder sig meget tæt på hinanden
Meaning Expression for many people packed close together
Kollokationer sidde som sild i en tønde til forelæsningerne sidde på cafeen stuvet sammen som sild i en tønde ligge som sild i en tønde i den stegende sommerhede fritidsfiskere, der står som sild i en tønde langs åen stå som sild i en tønde i toget på vej til arbejde være pakket sammen som sild i en tønde
Collocations to be packed like herrings in a barrel for the lecture to sit packed together in a pub like herrings in a barrel to lie in the burning sun like herrings in a barrel anglers standing on the bank like herrings in a barrel to be crammed into the commuter train like herrings crammed like herrings in a barrel
The third dictionary is activated by pressing the button ‘I am writing a text and am looking for a fixed expression with a specific meaning’. Here the user can enter one or several words with a specific meaning and find expressions with this meaning or part of this meaning. The user then receives information about the use of the expression, including its meaning, grammar, collocations, specimen sentences, and synonymous or antonymous fixed expressions. In other words, the point of departure is a meaning, which can be very wide and can, therefore, yield many hits. If a more restricted meaning is used as the search string, fewer hits might be found or even none at all. This dictionary is called Fixed Expressions with a Certain Meaning. When a search is made in this dictionary, the program looks in four of the fields in the database; this is a maximizing search. The data is presented as in the dictionary mentioned above (Use of Fixed Expressions), as the function is the same, that is, help with text production problems:
Fields searched + order of search
Field
1 2 3
1. Core field 2. Meaning 3. Further meaning item 4. Grammar
Order in dictionary entry
Indication whether a list is needed
2 3 4
1 2
Fields searched + order of search
4
Field
5. Remark(s) 6. Internet link 7. Fixed expression(s) 8. Style 9. Classification of the fixed expression 10. Collocation(s) 11. Example(s) 12. Synonym(s) 13. Antonym 14. Association(s)
Order in dictionary entry
Indication whether a list is needed
1
5 6 7 8
As an example, entering the meaning item fest (feast) we obtain a fairly long list with core expressions for different dictionary entries, including idioms, sayings and slogans, among others:
Når galt skal være, da jo galere des bedre. udtryk for, at man denne gang skal feste eller fråse så meget, man kan, uden at tænke på omkostningerne eller følgerne
If it has to be wild, then the wilder the better. Expression indicating that one is going partying or bingeing regardless of cost or consequences.
på vulkaner udtryk for at gå i byen og feste efter alle kunstens regler
on volcanoes Expression for going out to have a really wild party.
Skal der være fest, så lad der være fest. udtryk for, at man denne gang skal feste eller fråse så meget, man kan, uden at tænke på omkostningerne eller følgerne
If there’s going to be a party, let’s have a real party. Expression indicating that one is going partying or bingeing regardless of cost or consequences.
slå til skaglerne udtryk for at gå i byen og feste efter alle kunstens regler
to kick over the traces Expression for going out to have a really wild party.
til man styrter udtryk for at drikke og feste, indtil man er næsten bevidstløs af druk og træthed
till you drop Expression for drinking and partying until you almost keel over from drunkenness or fatigue.
gå i byen tage i byen eller til en festlig sammenkomst for at more sig med andre
to go to town going into town or to a party to have fun with others
Male byen rød udtryk for at gå i byen og feste efter alle kunstens regler
painting the town red Expression for going out to have a really wild party.
Æd, drik og vær glad. udtryk for at slippe hæmningerne og more sig
Eat, drink and be merry Expression for throwing all inhibitions overboard and having fun.
We can then click on the core expression to get to the dictionary entry that gives a meaning that fits the context. An entry will then be displayed with a set of corresponding data, as was illustrated above in the Use of Fixed Expressions dictionary. Although the data presentation of the two dictionaries is identical, the dictionaries are not. In the dictionary Fixed Expressions with a Certain Meaning, access is gained by means of a meaning-oriented search, as in a printed dictionary with a systematic macrostructure and with one or more registers; however, the dictionary Use of Fixed Expressions corresponds to a dictionary with an alphabetic macrostructure without registers. But the information the user is looking for to assist him with the production of a text is the same for both dictionaries.
The fourth dictionary is activated by pressing the button ‘I want to know as much as possible about fixed expressions’. Here the user can enter a fixed expression. He then receives all the information he received with the first three dictionaries, but in addition information about the background of the fixed expressions, including historical data. He is also informed about the type of the respective fixed expression, in other words, whether it is a saying, a slogan or an idiom. Lastly, he is also given information about the stylistic level of the fixed expression (there are three possibilities): (i) elevated style, (ii) low style or (iii) neutral style. This dictionary is called Knowledge about Fixed Expressions. When a search is carried out in this dictionary, the program looks in five of the fields in the database; this is a minimizing search. All fields are displayed in the order shown in column 3.
Fields searched + order of search
Field
1 2 3
1. Core field 2. Meaning 3. Further meaning item 4. Grammar 5. Remark(s) 6. Internet link 7. Fixed expression(s) 8. Style
5
Order in dictionary entry
Indication whether a list is needed
1 5 6 11 8 7 4 3
1 2
Fields searched + order of search
Field
Order in dictionary entry
9. Classification of the fixed expression 10. Collocation(s) 11. Example(s) 12. Synonym(s) 13. Antonym 14. Association(s)
4
Indication whether a list is needed
2 12 13 9 10 14
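The ‘Order in dictionary entry’ column can be read as a presentation profile for Knowledge about Fixed Expressions. The sketch below assumes that the fourteen figures of that column align row by row with fields 1 to 14; the key names and the `render` function are hypothetical.

```python
from typing import Dict, List, Tuple

# Display position of each field, read row by row from the
# 'Order in dictionary entry' column (alignment with the rows is assumed).
ORDER_IN_ENTRY: Dict[str, int] = {
    "core": 1, "meaning": 5, "further_meaning": 6, "grammar": 11,
    "remarks": 8, "internet_link": 7, "fixed_expressions": 4, "style": 3,
    "classification": 2, "collocations": 12, "examples": 13,
    "synonyms": 9, "antonym": 10, "associations": 14,
}


def render(entry: Dict[str, object]) -> List[Tuple[str, object]]:
    """Return the non-empty fields of an entry in display order."""
    ordered_names = sorted(ORDER_IN_ENTRY, key=ORDER_IN_ENTRY.get)
    return [(name, entry[name]) for name in ordered_names if entry.get(name)]


# Data from the 'den afskyelige snemand' entry quoted later in this chapter.
entry = {
    "meaning": ("et gigantisk mystisk menneskelignende dyr, som ifølge "
                "legenden skulle leve i Himalaya"),
    "internet_link": "http://en.wikipedia.org/wiki/Yeti",
    "style": "neutral",
    "classification": "idiom",
    "core": "den afskyelige snemand",
}
rendered = render(entry)
```

With this reading, an entry opens with the core expression followed by its classification and style, which agrees with the entries quoted in the chapter (‘på vulkaner idiom, neutral’).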
For example, if the search string være på vulkaner (to be on volcanoes) is entered, information on the type of expression, its style, as well as its historical background, is displayed in addition to the information from the other dictionaries discussed above:
på vulkaner idiom, neutral
on volcanoes idiom, neutral
Betydning udtryk for at gå i byen og feste efter alle kunstens regler
Meaning Expression for going out to have a really wild party.
Anmærkning Udtrykket tilskrives den franske statsmand (og tidligere journalist) Narcisse-Achille Comte de Salvandy (1795–1856), som på en fest til ære for kongen af Napoli den 31.5.1830, dvs. kort før Julirevolutionen i Paris, som der her hentydes til, sagde: ‘Voilà, Monseigneur, une fête toute napolitaine : nous dansons sur un volcan !’ (Det er min herre en helt neapolitansk fest, vi danser på en vulkan)
Note The expression is attributed to the French statesman (and former journalist) Narcisse-Achille Comte de Salvandy (1795–1856), who had been invited to a feast in honour of the King of Naples on 31.5.1830, that is, just before the July revolution in Paris, which he referred to in his famous remark: ‘Voilà, Monseigneur, une fête toute napolitaine : nous dansons sur un volcan !’ (This is, sir, a truly Neapolitan feast; we are dancing on a volcano!)
Associationer beruset fest fuld sjov
Associations drunk feast inebriated fun
The entries in the last field are the ‘associations’ mentioned earlier. These are associative expressions, or expressions with a related meaning, which the lexicographer, according to the lexicographic instructions, has a maximum of 30 seconds to find and write down. They are, therefore, not systematic concept items or the like, but simply associative expressions jotted down intuitively for use as search strings when the user consults dictionary 3 (Fixed Expressions with a Certain Meaning). It is assumed that the user also intuitively enters such meanings that simply occur to him at the time. The associations indicated in this way also serve as links, so that they can be employed to access all entries with the same association items.
Two item types also offered by the dictionary Knowledge about Fixed Expressions are not included in the quoted entries on ash and volcanoes. I will, therefore, present further examples; first, links to other home pages. This possibility is obviously not an innovation, but, on the other hand, it is amazing that only very few internet dictionaries make use of this self-evident option. The main problem with such links is that many of them disappear again fairly soon, and that it is, therefore, necessary to check regularly, either manually or by means of computer programs, whether the links used still connect to active home pages. One cannot, and should not, quote such a link in every entry; of the 12,700 entries in Knowledge about Fixed Expressions, 544 (about 4 per cent) have a link, for example the following dictionary entry:
den afskyelige snemand idiom, neutral
the abominable snowman idiom, neutral
Betydning et gigantisk mystisk menneskelignende dyr, som ifølge legenden skulle leve i Himalaya
Meaning an enormous man-like animal said to live in the Himalayas
Se også på internettet http://en.wikipedia.org/wiki/Yeti
Also see internet http://en.wikipedia.org/wiki/Yeti
Of course, it is not a real innovation that synonyms for fixed expressions are provided in a fixed expressions dictionary, although it is amazing that it is done in only very few dictionaries of this type. However, it is an innovation in the sense that the synonym items are not generated by lexicographers, but automatically by the database program. When two fixed expressions have the same items of meaning, they are listed as synonyms of each other, as in the following entry on herrings in a barrel:
som sild i en tønde idiom, neutral
like herrings in a barrel idiom, neutral
Faste vendinger ligge som sild i en tønde sidde som sild i en tønde stå som sild i en tønde være som sild i en tønde
Fixed expressions to lie like herrings in a barrel to sit like herrings in a barrel to stand like herrings in a barrel to be like herrings in a barrel
Betydning udtryk for, at mange mennesker opholder sig meget tæt på hinanden
Meaning Expression for many people packed close together.
Synonymer sort af mennesker så fyldt som en Noahs ark
Synonyms black with people filled up like Noah’s ark
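The automatic generation of synonym items can be sketched as a grouping of entries by meaning item. The sketch assumes that ‘the same items of meaning’ means sharing at least one identically worded meaning item; the function name and the data layout are hypothetical, and the sample meanings are taken from the list of hits for fest quoted earlier in the chapter.

```python
from collections import defaultdict
from typing import Dict, List, Set


def generate_synonyms(meaning_items: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each core expression to the other expressions that share a meaning item."""
    # Index: meaning item -> all core expressions that carry it.
    by_item: Dict[str, Set[str]] = defaultdict(set)
    for core, items in meaning_items.items():
        for item in items:
            by_item[item].add(core)

    synonyms: Dict[str, List[str]] = {}
    for core, items in meaning_items.items():
        related: Set[str] = set()
        for item in items:
            related |= by_item[item]
        related.discard(core)  # an expression is not its own synonym
        synonyms[core] = sorted(related)
    return synonyms


PARTY = "udtryk for at gå i byen og feste efter alle kunstens regler"
MEANINGS = {
    "på vulkaner": {PARTY},
    "slå til skaglerne": {PARTY},
    "male byen rød": {PARTY},
    "til man styrter": {"udtryk for at drikke og feste, indtil man er næsten "
                        "bevidstløs af druk og træthed"},
}
synonyms = generate_synonyms(MEANINGS)
```

One consequence of such a rule is that the wording of the meaning items has to be kept strictly uniform, since two entries with slightly different wordings would not be linked.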
I also quote the historical remark from this article, one of the many examples in this dictionary in which historical errors in other manuals or dictionaries are pointed out:
Anmærkninger Mange idiomer findes i mange sprog, uden at det er klart, hvor og i hvilken sammenhæng udtrykket er opstået. Man kan så godt anføre et citat fra William Shakespeare (1564–1616) eller en af de andre store forfattere og tilskrive ham æren for opfindelsen. Ofte har de dog bare brugt et allerede kendt udtryk. Når det så står i en ordbog, fx Pia Jarvads ‘Bevingede ord’ fra 2006, at ‘som sild i en tønde’ stammer fra Miguel de Cervantes Saavedras (1547–1616) ‘Don Quijote’, er det dobbelt forkert. Spaniere spiser og spiste normalt ikke sild. Cervantes skriver i ‘El ingenioso Hidalgo Don Quijote de la Mancha’ (første del 1605, anden del 1615) derfor heller ikke sild, men sardiner: ‘Oh canalla! gritó a esta sazón Sancho. Oh encantadores aciagos y mal intencionados, y quién os viera a todos ensartados por las agallas, como sardinas en lercha!’. Men rigtigt er det, at ‘som sild i en tønde’ findes i mange sprog, fx fransk: ‘être serrés comme des harengs en caque’ eller tysk: ‘wie Heringe in einer/der Tonne’.
Note Many idioms are found in many languages without it being clear where and in what context the expression originated. Of course, one can quote an expression from William Shakespeare (1564–1616) or some other great author and credit him with inventing it. However, such authors often merely used an already existing expression. But when a dictionary, for example Pia Jarvad’s ‘Bevingede ord’ of 2006, says that ‘like herrings in a barrel’ originates from ‘Don Quijote’ by Miguel de Cervantes Saavedra (1547–1616), it is a double error. The crucial fact is that Spaniards did not eat herrings then, nor do they do so today, as a rule. In accordance with Spanish habits, Cervantes in ‘El ingenioso Hidalgo Don Quijote de la Mancha’ (Part 1 in 1605, Part 2 in 1615) does not mention herrings, but sardines: ‘Oh canalla! gritó a esta sazón Sancho. Oh encantadores aciagos y mal intencionados, y quién os viera a todos ensartados por las agallas, como sardinas en lercha!’ But it is true that ‘som sild i en tønde’ is used in many languages, for example, in French: ‘être serrés comme des harengs en caque’ or in German: ‘wie Heringe in einer/der Tonne’.
2.5. The Use of Four Different Monofunctional Dictionaries
The third dictionary, namely, Fixed Expressions with a Certain Meaning, offering the option of a search based on a meaning or association item, was only added after the other three dictionaries had become available. For this first period, that is, from 25 January 2007 to 17 December 2007, the log files show the following percentage use for the three dictionaries:
1. Meaning of Fixed Expressions (understanding a text): 51,242 (60.3%)
2. Use of Fixed Expressions (writing a text): 5,294 (6.2%)
3. Knowledge about Fixed Expressions (learning more): 28,405 (33.4%)
I must point out here that searches by web crawlers and search engines were not included in these figures. If these searches were included, a multiplication factor of four would be applicable, as such searches by non-humans in these as well as in all other freely accessible internet dictionaries constitute about 75 per cent of all searches (see note 3). In the subsequent period, the use of what was now four dictionaries increased steeply. The following are the figures for the period 17 December 2007 to 17 December 2008, when all four dictionaries were freely accessible:
1. Meaning of Fixed Expressions (understanding a text): 95,024 (37.6%)
2. Use of Fixed Expressions (writing a text): 8,865 (3.5%)
3. Fixed Expressions with a Certain Meaning (writing a text): 8,646 (3.4%)
4. Knowledge about Fixed Expressions (learning more): 140,154 (55.5%)
If we compare the two phases with three and four dictionaries, we can see that use for satisfying cognitive needs has increased strongly. In phase 1, use for reception needs stood at 60.3 per cent and for cognitive needs at 33.4 per cent, but in phase 2 the results are virtually the opposite. Now the cognitive need accounts for 55.5 per cent and the reception need for 37.6 per cent. We can only guess at the reasons for this development. I assume that over time the users learned that Knowledge about Fixed Expressions offers much historical information which might – at least that is what many e-mails from the users say – be very interesting. In any case, we can observe with certainty that the number of users that require help with text production is relatively modest compared with those who need reception assistance or those who want to know more about fixed expressions. In phase 1, they account for 6.2 per cent of the searches and in phase 2 the total number of searches in the two production dictionaries represents 6.9 per cent; these are cases with the explicit intention of obtaining assistance with text production problems.
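The percentages quoted for the two phases can be rechecked from the raw search counts. The following is a small arithmetic sketch; the counts are those given in the text, while the variable and function names are made up for the example.

```python
from typing import Dict

# Human searches per dictionary, as reported in the text.
phase1 = {
    "Meaning of Fixed Expressions": 51_242,
    "Use of Fixed Expressions": 5_294,
    "Knowledge about Fixed Expressions": 28_405,
}
phase2 = {
    "Meaning of Fixed Expressions": 95_024,
    "Use of Fixed Expressions": 8_865,
    "Fixed Expressions with a Certain Meaning": 8_646,
    "Knowledge about Fixed Expressions": 140_154,
}


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage share of each dictionary, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


phase1_shares = shares(phase1)
phase2_shares = shares(phase2)

# Combined share of the two text-production dictionaries in phase 2.
production = (phase2["Use of Fixed Expressions"]
              + phase2["Fixed Expressions with a Certain Meaning"])
production_share = round(100 * production / sum(phase2.values()), 1)

# If non-human searches make up 75 per cent of all searches, the total
# is the human count multiplied by 1 / (1 - 0.75).
crawler_factor = 1 / (1 - 0.75)
```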
2.6. Further Work with Internet Dictionaries
A printed dictionary is already obsolete the moment it is published. This is not true to the same extent for an internet dictionary, which can, in principle, be extended or changed every day. But if this is not or no longer done, an internet dictionary will also degenerate relatively quickly into a less useful and less reliable tool. Specialist dictionaries suffer such degradation sooner than general-language dictionaries, but after a certain period of time, the latter also are of interest only as museum exhibits, at best. In the case of electronic dictionaries, there is the additional factor that new technical possibilities are created very quickly, and these have to be incorporated if these dictionaries are to remain of value to the seasoned internet user. For example, we intend to build advanced search options into the dictionaries presented here as soon as possible so that such a user can perform combination searches (Boolean searches). They should also be able to define which fields are to be searched, in which order, whether maximizing or minimizing, and from which fields items must be displayed. One could say that each user will be able to design their own individual dictionaries. It will therefore be possible to extract not only four, but thousands of different dictionaries from the database. Experiences of other internet media show that only about 5–10 per cent of users will be involved, but these will probably be those who are very frequent dictionary users.
Another extension is even more interesting, however. This concerns the association items. Today (25 April 2010), there are about 47,974 associations in Knowledge about Fixed Expressions – not different associations, but in total. This could be around 10,000 different associations. However, there is no doubt that associations are a highly personal matter. Consequently, it is as good as certain that access via the associations in their present form, with a maximum of five associations recorded by only one lexicographer, is not optimal; different people make different associations – not entirely, but still to a significant extent.
We are, therefore, planning an option whereby the users will be able to add associations themselves. Whether access will be restricted to the individual user or whether it should be made accessible to all users has not yet been decided. I favour the latter solution, as not all users will take the trouble to enter associations. If this solution is selected, care must be taken to avoid sabotage, for example, the addition of sexual expressions, to every individual article. For this reason, the lexicographer(s) would have to first accept any association provided by a user.
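The planned acceptance step for user-supplied associations could look roughly like the sketch below. Since the option is described only as a plan, everything here is an assumption: the class, its methods and the simple first-in, first-out review are hypothetical and merely illustrate that a suggestion stays invisible until a lexicographer accepts it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class AssociationModeration:
    """Holds approved associations per entry and a queue of user suggestions."""
    approved: Dict[str, Set[str]] = field(default_factory=dict)
    pending: List[Tuple[str, str]] = field(default_factory=list)

    def suggest(self, core: str, word: str) -> None:
        """A user proposes an association; it is queued, not yet shown."""
        self.pending.append((core, word.strip().lower()))

    def review(self, accept: bool) -> None:
        """The lexicographer decides on the oldest pending suggestion."""
        core, word = self.pending.pop(0)
        if accept:
            self.approved.setdefault(core, set()).add(word)

    def visible(self, core: str) -> List[str]:
        """Associations that all users can see and search by."""
        return sorted(self.approved.get(core, set()))


queue = AssociationModeration(
    approved={"på vulkaner": {"beruset", "fest", "fuld", "sjov"}}
)
queue.suggest("på vulkaner", "Druk")
before = queue.visible("på vulkaner")   # suggestion not yet visible
queue.review(accept=True)
after = queue.visible("på vulkaner")    # now includes the accepted word
```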
Notes
1 http://wortschatz.uni-leipzig.de
2 In a maximizing search, the order does not really matter, as all sub-results for each individual search are added up in the overall result. In a minimizing search, this is different. In this case, the search ends after searching one field if one or more results are found. Therefore, the next fields are not searched.
3 The total number of searches in the dictionaries of fixed expressions was 2,502,576 on 24 April 2009, 644,911 of which (that is, 25.8 per cent of all searches) were human searches.
Chapter 3
Lexicographical and Other e-Tools for Consultation Purposes: Towards the Individualization of Needs Satisfaction
Sven Tarp
3.1. Introduction
Do we need a new general theory guiding the conception and production of lexicographical e-tools, or can we use the general theory already developed in the era of lexicographical p-works (p for paper and printed)? The only right answer to this central question is NO, we do not need a new theory – and YES, we do need a new theory. What is meant by this apparently contradictory statement? During its more than four thousand years of existence, practical lexicography has passed through various stages in terms of its media: lexicographical works have been carved in clay, handwritten on paper or papyrus, printed with different technologies and, more recently, made available electronically on various platforms such as compact disks, handheld computers, mobile phones, the internet, etc. In the last resort, lexicography is a practical – and theoretical – response to needs detected in society, and, as such, it is strongly embedded in specific cultural, historical and technological environments. However, at the highest level of abstraction, the needs giving rise to lexicographical products are of the same types (or categories), as are the data selected to solve these needs, notwithstanding the specific type of medium in which these data are presented. At this level of abstraction, if a general theory of lexicography had existed four thousand years ago, there would be no reason to change it when lexicography’s practical tools passed from clay to papyrus, and later from handwritten to printed versions. And neither would there be any reason to invent a new general theory during the present paradigm shift from p-lexicography to e-lexicography. In this ideal world of supreme abstraction, the only thing
theoretically new to be developed would be specific theories related to the new media, for instance, those relating to data processing, data presentation, data access, data linking, and so on. (As to the relation between general and specific lexicographical theories, see Tarp, 2008a: 9–11.) As everybody knows, we are far from living in a perfect world. Human cognition, the epistemological process of acquiring knowledge about a specific subject field, is extremely complex and necessarily passes through various stages, where the already acquired knowledge is constantly confronted with the results of new observations; this leads to a spiral of growing cognition through the fruitful interaction of theory and practice. In this sense, a general theory about any subject field must constantly be improved, and sometimes even replaced with a new one (paradigm shift), as it has to adapt itself to the continuous flow of new data generated in the framework of the scientific research process. This is a general law of human cognition, and lexicographical research does not constitute an exception. However, there is another essential consideration which is valid not only for lexicography but for all theoretical work within disciplines, especially social sciences, where the research field occasionally is subject to qualitative transformation. Within such fields, the researchers might frequently find, apart from completely new elements, ones that existed earlier but only in their embryonic form and which are still not fully developed; for this reason, they might not have noticed them or, at least, not paid sufficient attention to them in terms of their incorporation and place in the theoretical model or system. When the subject field passes from one qualitative stage to another, such hitherto ‘hidden’ elements might ‘unfold’ and show themselves to be elements that necessarily must occupy an important and central place in the corresponding theory-building. 
This is the case with lexicography in its present paradigm shift from p-works to e-tools. Apart from the completely new elements that obviously exist in relation to the new media and technologies, there are also other elements that have existed since the very first dictionary was produced several thousand years ago, but which have never really been discussed or given sufficient attention in the theoretical literature until now. This new interest is due to it becoming evident that they are central and important elements which, correctly interpreted and understood, make it possible to project lexicography far beyond its traditional boundaries. In this way, the contours of a renewed general theory of lexicography are, little by little, becoming more clearly defined, and we see a theory which by means of the dialectical negation of the negation takes everything useful from previous theories, placing lexicography in a relevant and powerful position in the so-called information society. This will be discussed in the next section.
e-Lexicography
3.2. New Vision of Lexicography
The computer and information sciences have created a completely new technological environment in which lexicography is now developing. The increased focus on information in present-day society has made it clear that lexicography is an information discipline par excellence. The needs that lexicographical works have intended to meet during the last four millenniums have always – when an abstraction is made from their concrete and specific content – been information needs. Besides, if a distinction is established between global information needs, i.e., the needs related to a more profound study of a specific subject field (or part of it), and specific information needs related to a single and limited topic within a larger subject field, or else to the solution of specific tasks or problems, then it becomes clear that lexicographical works and tools – with very few exceptions not worth mentioning – are always artifacts designed to be consulted to meet the latter needs. (As to the concepts of global and specific information needs, see Tarp, 2008b.)
Lexicographical works are not the only types of text produced with a view to satisfying concrete information needs. A short panoramic overview shows that manuals, how-to books, user guides and indexes (in text books and other books) are all text types totally or partially designed to be consulted in order to retrieve specific information for one purpose or another. Moreover, the list can easily be extended, to include even telephone books, internet-based search engines and other similar reference tools. What, then, is the relation between lexicography and all these consultation tools which obviously have not been planned and produced according to lexicographical principles? This is a major question that has to do with the very essence of lexicography – and the essence of the philosophy and principles behind the other artefacts.
It is evident that they all have something fundamental in common, but it is also clear that they have developed from different traditions. In reality, what we are dealing with is one big discipline embracing all types of consultation tools designed to meet concrete information needs, a discipline which might be considered an integrated part of information science. This discipline should develop its own general theory as part of information science. In this respect, lexicography has, on the one hand, a lot to contribute to other theories dealing with specific consultation tools and to information science in general, and has, on the other, a lot to learn from these theories and this science. Although it will be the future harmonizing and contrasting of ideas that finally will decide to what extent the various traditions within the broader framework of information science might contribute and learn from each other, it might not be premature to claim that among lexicography’s strongest contributions are the very concepts of specific information needs, the satisfying of needs and access to prepared data from which the corresponding information might be retrieved. According to the lexicographical function theory developed over
the past two decades, users’ concrete information needs are intimately related not only to the type of user but also to the type of social situation, or activity, where the needs occur. The type of user depends on a number of criteria which, in fact, constitute an open list that might be still more detailed according to the type of situation and particular consultation tool, whereas the types of relevant social situation or activity can be grouped into four fundamental types – cognitive, communicative, operational and interpretive – each of which might be further subdivided into a number of subtypes (for instance, communication can be subdivided into production, reception, translation, text revision, etc.). This complex concept of specific information needs allows a much more precise determination of the exact nature and content of the needs and, consequently, a much more precise determination of the data required to satisfy these needs. The same holds true for access to this data, at least in the case of the controlled framework of a printed consultation tool or an e-tool linked to the limited amount of data in a prepared database, that is, without considering the possible relation to the ‘unlimited’ amount of data made available through connection to the internet. In this last respect, that is, the relation between a ‘limited’ database and the ‘unlimited’ data on the internet, lexicography will probably have a lot to learn from information science, although this prognosis hardly invalidates the assertion that ‘the applicability of the access process, as developed in lexicography, goes beyond dictionaries, illustrating the importance of a process not relevant within the field of linguistics but extremely important in the successful use of reference works’ (Bergenholtz & Gouws, 2010: 103). 
To sum up, the new technological environment in which lexicography is developing strongly suggests that this discipline should be considered part and parcel of a broader consultation discipline, or science, integrated into information science, through a process in which lexicography is neither big brother nor little brother, but an equal and open-minded partner who has both something to contribute and something to learn. In this way, the general theory of lexicography will form a synthesis with other theories and integrate into a completely new general theory of consultation tools. This new general theory will be a theory that is not only in a position to describe present lexicography and guide the conception of future consultation tools, whether lexicographical or not, but also to enlighten past lexicography and consultation practice in a wider perspective than hitherto. This is the new vision of lexicography put forward currently by the function theory.
3.3. The Concepts of e-Lexicography and Lexicographical e-Tools
The term e-lexicography has increasingly been introduced into the lexicographical community over the past few years, together with the term lexicographical
e-tool. But to what do the two terms refer? Might e-lexicography be used as a reference to any kind of lexicography where computers or other types of electronic media are involved? And might a lexicographical e-tool be understood as any lexicographical work made available on an electronic platform? These are essential questions where there might very well be some disagreement among researchers and practical lexicographers. Such terms are frequently introduced in theoretical and practical discussions without the necessary attention being paid to their proper scientific definition, and they might, therefore, be tacitly used with different meanings by different researchers, who might think that they agree when they actually disagree, and vice versa. It is, therefore, necessary to furnish a proper definition of the concepts of e-lexicography and lexicographical e-tools according to criteria that take into account the technological differences between printed and electronic media, as well as the options provided by the latter. In order to do so, it might be useful to formulate a preliminary classification of lexicographical works made available on electronic platforms, with a view to establishing first a relevant dividing line between lexicographical p-works and e-tools, and between p-lexicography and e-lexicography, and secondly a categorization of lexicographical e-tools in terms of both their present situation and their future possibilities. In the following, lexicographical works made available on electronic platforms will be classified in four main categories: Copycats, Faster Horses, Model T Fords and Rolls Royces. These categories are, as we will see, also valid for other consultation tools on electronic platforms.
3.3.1. Category 1: Copycats
The first category of lexicographical – and other consultation – tools on electronic platforms refers to works that have been either photocopied or directly copied from a text file and then placed on an electronic platform, frequently as PDF files. One such example was the first internet-based version of the Diccionario de la Lengua Española (Dictionary of the Spanish Language), edited by the Royal Spanish Academy, which was photocopied, not article by article but page by page, and then made available on the internet as PDF files, which were frequently crooked and dog-eared. This way of providing access to internet-based dictionaries was not unusual when electronic media was in its infancy but it is now becoming less common, with the exception of some old historical dictionaries that are now merely used for research purposes and not as consultation tools. The dictionary of the Royal Spanish Academy was long ago made available in a more user-friendly version, and it is now no longer possible to access the former electronic ‘paper’ version of this famous dictionary. However, when it comes to other types of consultation tools, Copycats are still by far the most frequent method when making those documents electronically
available. This holds especially true in the case of user guides and manuals, where perhaps 99 per cent of all electronic versions are PDF files, exact copies of the former printed versions. As for the remaining 1 per cent or less where some more advanced technology has been used, they are as a rule not very user friendly in terms of accessing, providing and understanding the relevant data. This sad situation is probably due to the fact that little attention has so far been paid to developing a scientific theory which might guide the conception and production of these important consultation tools used by millions of people each and every day, that is, a theory which, based upon a typologization of users and user activities, caters for the selection, preparation and presentation of relevant data to be included and for quick and easy access to these data. In this respect, it is evident that the most advanced lexicographical theories might have something to contribute to a new generation of more user-friendly electronic user guides and manuals.
3.3.2. Category 2: Faster Horses
This category refers to the famous words that Henry Ford is supposed to have pronounced when he introduced his Model T Ford more than a hundred years ago, when asked if he had consulted people before inventing this model. His laconic answer was that ‘if we had asked people what they wanted, they would have said faster horses’. The new revolutionary Model T Ford would never have seen the light of day if Ford and his collaborators had not had the vision and courage to go beyond the traditional boundaries and satisfy people’s needs for transportation in a completely new way. The same situation exists today with regard to lexicography. The vast majority of lexicographical works made available on electronic platforms, especially on the internet, belong to this category where the lexicographers do make use of the new technologies available, but only in a very restricted way in order to provide quicker access to their data by means of links or search strings (which are also considered a sort of linking in information science), and which might recognize not only inflectional forms but also parts of words, orthographic variants and even mistakes. The current online version of the Diccionario de la Lengua Española as well as the Longman Dictionary of Contemporary English Online and the Merriam Webster’s Online Dictionary are emblematic examples of this category of dictionaries. For instance, in the FAQ about the last of these three dictionaries, the user is informed that ‘the Merriam-Webster Online Dictionary is based on the print version of Merriam-Webster’s Collegiate Dictionary, Eleventh Edition’. The same applies to the Ordbog over det Danske Sprog Internet and Svenska Akademiens Ordbok, which are both digitalized versions of already existing dictionaries. However, a large part – perhaps even the majority – of the dictionaries belonging to this
category are not electronic versions of former printed ones, but have been made from scratch, based upon traditional models and concepts which have been taken over uncritically from the era of p-lexicography. One such emblematic example is the IATE 2010, where millions of Euros have been spent to create a lexicographically incurious and unimaginative tool. The result of this restrictive use of the new technological possibilities is faster lexicographical horses, where the data included are still organized in traditional and static articles, which are completely modelled on the corresponding articles in printed dictionaries. More than 99 per cent of all lexicographical works on electronic platforms are probably Faster Horses of this kind, which shows that lexicography has still a long way to go until it has fully adapted to the new technologies. And to this should be added the observation that some of these horses might run faster but that they have apparently chosen a longer distance, which means that the user frequently ends up wasting the same or even more time than in the corresponding printed dictionaries before accessing the desired data.
3.3.3. Category 3: Model T Fords
The lexicographical works belonging to this category have gone beyond the traditional boundaries and have made use of the existing technology not only in order to provide quicker data access, but also to adapt the dictionary articles to the various functions displayed by the dictionary. The result is dynamic articles with dynamic data, which correspond to the specific types of information needs which specific types of users performing specific types of lexicographically relevant activities might have in any consultation situation. These lexicographical Model T Fords provide different types of interactive options where the users might define themselves and the activity for which they need information. They also frequently link to the internet, where already existing data is reused in order to satisfy the users’ specific needs. Among such advanced lexicographical tools are Ordbogen over Faste Vendinger (Dictionary of Fixed Expressions), and Musikordbogen (Danish Music Dictionary). In both dictionaries, the user might adjust the search strategy to the different types of needs occurring in various types of user situation, although the dictionaries do not allow a further differentiation that also tailors data access in conformity with the various types of user profile (Tarp, 2009a: 57–61). With this option, the first of the two above-mentioned works is presented by the authors as four monofunctional dictionaries that can be accessed separately in accordance with each communicative or cognitive situation (see Bergenholtz & Bjærge, 2009). These four dictionaries are: Betydning af faste vendinger (Meaning of Fixed Expressions), Brug af faste vendinger (Use of Fixed Expressions), Faste vendinger med en bestemt betydning (Fixed Expressions
with a Specific Meaning), and Viden om faste vendinger (Knowledge about Fixed Expressions). The Musikordbogen, on the other hand, takes a further step and makes extensive use of links to selected web pages whose data is reused in order to support its cognitive function. The necessary precondition for the development of these highly technological dictionaries or lexicographical tools is an advanced theory guiding their conception and production. It goes without saying that it is not a mere question of giving users the option to choose between longer or shorter articles, for example, more or less data displayed on the screen, as in the online Macmillan Dictionary and Thesaurus 2010 and Den Danske Ordbog (The Danish Dictionary), which might be lexicographically categorized somewhere between Faster Horses and Model T Fords. The crucial question is, as already stated, whether there exists an advanced theory, such as the function theory, providing detailed guidelines that permit a meticulous study and classification of the types of users’ information needs and the corresponding types of data needed to satisfy them in each and every type of situation where they might occur. This might also be the reason why no electronic user guide or manual has been found that qualifies for the category of Model T Ford. Even Microsoft, the world’s biggest provider of software, has so far only developed electronic user guides for their surprisingly low-standard products, which only places them in the category of Faster Horses. However, it is a fact that some relatively simple consultation tools conceived to meet less complex needs, such as train, bus and flight information, have also qualified for the category of Model T Ford.
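The idea of a dynamic article, as described for this category, can be illustrated with a minimal Python sketch. It is only an assumption-laden illustration, not a description of how Ordbogen over Faste Vendinger or Musikordbogen are actually implemented: the function names, the field names and the mapping from functions to data fields are all hypothetical, and the sample entry is placeholder data.

```python
from typing import Dict, Tuple

# Hypothetical mapping: which data fields are shown for which function.
# The function labels and field names are illustrative assumptions only.
FIELDS_BY_FUNCTION: Dict[str, Tuple[str, ...]] = {
    "reception": ("lemma", "meaning"),
    "production": ("lemma", "meaning", "grammar", "example"),
    "cognitive": ("lemma", "meaning", "background"),
}


def dynamic_article(entry: Dict[str, str], function: str) -> Dict[str, str]:
    """Return only the data fields relevant to the function chosen by the user."""
    wanted = FIELDS_BY_FUNCTION[function]
    return {name: entry[name] for name in wanted if name in entry}


# Placeholder database record; one prepared entry serves several functions.
entry = {
    "lemma": "kick the bucket",
    "meaning": "to die (informal)",
    "grammar": "verb phrase",
    "example": "placeholder example sentence",
    "background": "placeholder encyclopedic note",
}

print(dynamic_article(entry, "reception"))
print(dynamic_article(entry, "production"))
```

The design point is that the database record stays the same while the article shown on screen changes with the declared activity, in contrast to the static articles of the Faster Horses.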
3.3.4. Category 4: Rolls Royces
The Rolls Royces are lexicographical and other consultation e-tools which permit individualized solutions for specific and individual users in concrete situations, and which might also combine access to selected data in a prepared database with browsing on the internet; the purpose is to obtain dynamic solutions based upon a recreation and re-representation of the data made available in this way, which is different from the lexicographical Model T Fords, which link to specific web pages in order to reuse their data. This category is so far an empty category, in the sense that no lexicographical tool has yet been developed with characteristics that justify its inclusion in this category, but it is nevertheless an extremely relevant category, pointing as it does to the future of lexicography – and consultation disciplines, in general – and to the new horizons gradually being reached by the new computer and information technologies. However, it is necessary to stress that the development of Rolls Royces for consultation purposes is not only a question of adapting to the existing technologies, but also of creating completely new ones. The necessary technology can only come into the world by means of fruitful interdisciplinary team work
between, on the one hand, computer experts, and, on the other, information specialists and lexicographers guided by an advanced theory.
3.4. Definition of e-Lexicography and Lexicographical e-Tools
Following the above classification of lexicographical works, and other consultation tools, made available on electronic platforms, we can now proceed to a definition of the terms lexicographical e-tool and e-lexicography. Initially, however, it is worth noting that the difference between lexicographical p-works and e-tools has nothing to do with the amount of data that the work or tool might contain. In this regard, the proof of the pudding is the existence of two gigantic lexicographical works completed in China in 1408 and 1726, respectively. The first, the Yongle Dadian 1408 (Great Canon of the Yongle Era), is a lexicographical h-work (handwritten dictionary) comprising no fewer than 370 million Chinese characters and 22,937 chapters, bound in 11,095 volumes, covering history, philosophy, Buddhism, Taoism, drama, arts and farming, among many other topics. The data included, which was partially transcribed character by character as exact copies of original texts produced during the previous decades, is structured according to a rhyming system for the characters and is also accessible through a complex system of indexes which, together with the preface, comprise 60 chapters. The idea was to make a complete canon of existing texts within a wide range of subject matter at a critical historical moment, when China was recovering from several devastating conflicts and needed this knowledge. The second of these gigantic lexicographical works is the Gujin Tushu Jicheng (Complete Collection of Illustrations and Writings from the Earliest to Current Times), which was printed in 1726 in about 60 copies, and contains 100 million Chinese characters, some 800,000 pages and 10,000 chapters, collected in 5,020 volumes.
If we take into account that each page of Chinese characters corresponds to about six pages written in our alphabet, this encyclopedia is, as such, the biggest lexicographic p-work ever published. Until now, no other lexicographical work, whether handwritten, printed or electronic (with the possible exception of Wikipedia), has incorporated so much data as these Chinese Gargantua and Pantagruel, which indicates that the dividing line between lexicographical e-tools, on the one hand, and lexicographical p- and h-works, on the other, cannot be established quantitatively. Hence, what has to be done in order to distinguish between p-works and e-tools and define the latter is to proceed qualitatively and look at the way in which they make use of the existing computer and information technology.
In the above classification of lexicographical works published on electronic platforms, it is clear that some lexicographers have simply placed their old or new works on these platforms without really making use of the new technological possibilities, except for that of quicker access. The lexicographical articles and data appearing on the screen are static, and exactly the same kind of articles and data that the users will find in printed dictionaries. The possibility of adapting the data to the specific type of information need has not yet been explored. The result is, in the case of lexicographical Copycats and Faster Horses, that we are dealing with electronic p-works and not with lexicographical e-tools, which require, at the very least, interaction with the user and the possibility of accessing dynamic articles with dynamic data that might even be connected with relevant data already made available on the internet. We can say, therefore, that it is only the few existing lexicographical Model T Fords and the future lexicographical Rolls Royces that might be considered lexicographical e-tools. Consequently, e-lexicography should be considered only the type of lexicography dealing with these lexicographical e-tools, that is, with lexicographical Model T Fords and Rolls Royces, and not all the versions of lexicography working with electronic media without really adapting to them and exploring their new and revolutionary possibilities for this four-thousand-year-old discipline.
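The classification and the resulting definition can be summarized as a small decision procedure. The following Python sketch is one possible reading of the criteria stated above, under the assumption that each category can be reduced to three yes/no properties; the names ToolProfile, classify and is_e_tool are hypothetical and are introduced here only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    COPYCAT = 1
    FASTER_HORSE = 2
    MODEL_T_FORD = 3
    ROLLS_ROYCE = 4


@dataclass(frozen=True)
class ToolProfile:
    """Simplified, assumed description of a consultation tool."""
    quick_access: bool      # links or search strings give quicker access
    dynamic_articles: bool  # articles adapt to the declared type of need
    individualized: bool    # individualized solutions with re-created data


def classify(tool: ToolProfile) -> Category:
    """Assign the highest category whose defining property the tool shows."""
    if tool.individualized:
        return Category.ROLLS_ROYCE
    if tool.dynamic_articles:
        return Category.MODEL_T_FORD
    if tool.quick_access:
        return Category.FASTER_HORSE
    return Category.COPYCAT


def is_e_tool(tool: ToolProfile) -> bool:
    """Only Model T Fords and Rolls Royces count as lexicographical e-tools."""
    return classify(tool) in {Category.MODEL_T_FORD, Category.ROLLS_ROYCE}


scanned_pdf = ToolProfile(False, False, False)
online_print_copy = ToolProfile(True, False, False)
function_based_tool = ToolProfile(True, True, False)

print(classify(scanned_pdf).name, is_e_tool(scanned_pdf))
print(classify(online_print_copy).name, is_e_tool(online_print_copy))
print(classify(function_based_tool).name, is_e_tool(function_based_tool))
```

Note that the amount of data plays no role in the procedure, which mirrors the point that the dividing line is qualitative rather than quantitative.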
3.5. Relation between the Type and the Individual
When Wiegand (2001) published what is so far the only serious criticism of the lexicographical function theory (in its first version), one of his critical comments was that the authors of this theory wrote about users, user situations and user needs, and not about types of users, types of user situations or types of user needs. His criticism was correct and, therefore, more than welcome. A theory like that relating to function cannot be built directly upon concrete and individual phenomena that might differ from each other in many aspects. Scientific and theoretical work presupposes an abstraction from some of the less important characteristics of concrete and individual phenomena, and the creation of concepts, categories and types which include phenomena with some common characteristics considered essential and relevant for the research field in question. The function theory was forced to re-saddle and, consequently, in its present and more mature version, this theory works with types of users, types of user situations, types of user needs and types of data that might satisfy these needs. However, no type of user has ever made a type of lexicographical consultation in order to access a type of data that might meet a type of information need occurring in a type of social situation. The only thing that has ever happened, and which happens every day, hour and minute, is that an individual user with
individual information needs occurring in an individual situation decides to make an individual lexicographical consultation in order to access the concrete data that might satisfy his or her individual needs. Although each user, user situation, user need, type of data and consultation might be assigned to specific types, they are not in themselves types but individual and concrete phenomena. Yet however justified and correct a typologization might be, it is a fact that, in the real world, the individual needs of individual users might differ from each other and that the concrete data required to meet these needs might as a result be slightly different from user to user as well as from user situation to user situation. Two potential users belonging to the same type might, for instance, both have problems related to text production, but one might have a problem in terms of morphology, whereas the other’s might be related to syntactic properties. And the same user who in one situation had a morphological problem might in the following situation have a syntactic problem, and in a third situation have a combination of morphological and syntactic problems, etc. This is the reason why the individualization of satisfying a user’s needs is a question to be taken seriously, especially at a time when the computer and information sciences are gradually providing the necessary technology to permit this gigantic, revolutionary step in the framework of revised lexicographical theory and practice. What does such individualization imply for lexicographical theory and for the lexicographers who conceive and plan dictionaries? First of all, it is important to remember that no lexicographer, however well prepared, can deal with each and every one of the infinite number of individual needs that an infinite number of individual users might have in an infinite number of situations. This is completely unthinkable and it cannot be the vision for future lexicography. 
Lexicographical planners – just as lexicographical theory-builders – still have to work with types of users, situations, needs, data, etc. What is needed is the gradual development of highly sophisticated tools that permit both individualized access to the data contained in a well-structured database or made available on the internet, and the re-creation of completely new data based upon the already existing data. As already mentioned above, part of the required technology is already available whereas additional technology still must be invented.
3.6. Individualization and the Fundamental Lexicographical Functions
Some of the ways to achieve user-needs satisfaction will be discussed later in this chapter, but before proceeding to this important question, it is appropriate to relate the concept of individualization to some of the basic elements of the function theory. As already mentioned, in its present version this theory
envisages four fundamental situations or activities where specific information needs might arise for potential users of lexicographical tools. These four situations, which are probably also valid for other consultation disciplines apart from lexicography, are those where the people engaged in various activities might need information in order (i) to know more about some topic (cognitive situation), (ii) to solve problems related to textual communication (communicative situation), (iii) to receive assistance to perform manual or mental operations (operational situation), and (iv) to receive help to interpret and understand non-textual and non-verbal phenomena or symbols (interpretive situation). Until now, lexicographical works have mostly been conceived with a view to assisting users with cognitive or communicative needs, but they have also in a few cases included data that might help users in operational and interpretive situations. One such example is M. Postlethwayt’s Universal Dictionary of Trade and Commerce from 1774. This interesting dictionary, which can be found in, among other places, the Biblioteca Histórica de Santa Cruz, at the University of Valladolid, contains recommendations (so-called remarks) to the users on how to behave and do business ‘in America’ in the years immediately preceding the independence of the first 13 states. However, there are some fundamental differences between three of the basic types of user situations or activities included in the function theory and the remaining one, that is, the cognitive type. In the cognitive type, it is very difficult to obtain a clear idea of the exact information that the users might need and, as a result, of the data required to provide this information.
Even within a small and restricted area of knowledge, there might be an infinite amount of information needed by the potential users, although it would be possible to determine what kind of information users ought to have in order to acquire a specific range of knowledge, for instance, in schools and in relation to teaching and learning, in general. All this makes it very difficult to select and prepare lexicographical data that might directly satisfy users’ needs in all, almost all, or even the majority of consultations related to cognitive situations. The information needed is, let us say, not easily predictable. The question is somewhat different when it comes to communicative, operational and interpretative situations. In the vast majority of cases, these activities might give rise to a limited number of information needs and, thus, require a limited amount of data. Such is, for instance, the case when a user needs instructions in order to operate a machine (operational situation); when somebody needs an explanation (sometimes followed by recommendations to take action) in order to interpret a sound, a symbol or some similar non-linguistic and non-verbal sign (interpretative situation); or when he or she needs explanations in order to understand a text or advice, in the form of data, in order to produce a text (communicative situations), and so on. In all these situations, the deductive method developed within the framework of the function theory
makes it possible to determine the data to be selected, prepared and presented so as to meet users’ information needs in the vast majority of consultations. Only in relatively few cases must additional data be found or recreated to meet unpredictable information needs. As a matter of fact, there are some important affinities between the three fundamental situations mentioned. Both in text reception and interpretative situations, users require explanations in order to satisfy their needs. Both in text production and operative situations, they need instructions, recommendations or advice in order to perform their different manual, mental or linguistic actions in the best possible way. These affinities are the reason why user guides and manuals have been mentioned several times in this chapter. There is little doubt that these tools can be improved considerably if their conception and production are guided by lexicographical principles and, especially, if they make use of the deductive method developed by the function theory in order to determine possible information needs within a restricted area of activity. In this respect, when, in the near future, lexicography and information science confront their different ideas and possibilities in terms of specific consultation e-tools, this confrontation will probably show that lexicography has most to contribute in relation to communicative, operational and interpretative information needs, whereas the existing information science might have its strongholds in cognitive situations.
3.7. Cognitive Situations, Textbooks and Individualized Consultations
It can be taken as a matter of course that the reflections in the previous paragraph do not exclude the possibility of lexicography having something to offer in terms of cognitive situations. Apart from the concrete cognitive needs satisfied by a large number of dictionaries, lexica and encyclopedias, lexicography has also, in certain cases, produced longer texts conceived to be both consulted and read from beginning to end for cognitive purposes. This is, for instance, the case of the 42-page systematic Introduction to Molecular Biology in the Encyclopedic Dictionary of Gene Technology English–Spanish, which is integrated into the central alphabetic word list by means of an elaborate system of cross-references. This and similar experiences in other dictionaries based upon the function theory (e.g. Musikordbogen) might be transferred to other cognitive information tools like textbooks. These are not only used to be read from beginning to end, chapter by chapter, in order to acquire global information about a specific subject field, but also to be consulted with a view to specific information needs being met in at least three different situations: (i) when a student needs
to consult a previous paragraph or chapter for information about a specific topic, for example, a definition of a central concept; (ii) when a student wants to memorize what he or she has just read in a given paragraph or chapter; and (iii) when a student is preparing for an examination. Satisfying the concrete information needs occurring in these three cognitive situations might well draw on the same types of data but require different access routes to the appropriate data, which might also be required to be structured in such a way that the textbook can meet both global and specific information needs. Lexicography, and especially the function theory, will most likely have something relevant to contribute to the development of such advanced textbooks, especially when they are gradually placed on electronic platforms with many more distinct access options, perhaps allowing each student to adapt the text e-book to his or her individual information needs and personal study technique.
3.8. Some Methods to Achieve Individualization
The subtitle of this contribution is ‘toward the individualization of needs satisfaction’. The use of the word ‘toward’ indicates both the process and the idea that individualization might never be fully achieved in the narrow sense of the word. However, it is beyond any doubt that it is possible to take important steps in this direction with the technology already existing as well as with that to be developed with this objective. In this regard, three main methods can be listed in accordance with the type of engagement required by the user: (i) the interactive method; (ii) the active method; and (iii) the passive method. The first of these three methods presupposes an interaction between the user and the e-tool. The user is in one way or another offered some fill-in options or questions, by means of which he or she will be assisted in making a personal profile and indicating the specific type of situation or activity where information needs occur, and even the specific type of need. This can be done in various steps so as to end up with a very detailed description and characterization of the specific user, user situation and particular need. The e-tool will then automatically select and filter the data required and adapt it to the individual needs according to the user profile and situation indicated. When the second method is applied, each individual user of an e-tool will be given the option to design his or her own ‘master article’ in terms of the type of data sought and its arrangement on the screen. The e-tool will then automatically filter the available data and present it as indicated by the user. This option is, of course, also interactive in the modern sense of the word, but as it requires much more engagement and probably some very advanced user skills, it is here called active in order to distinguish it from the former and stress the active role required by the user.
Both the above creation of a ‘master article’ and that of user profiles in the broad sense of the word (including activity) might be done at four different moments: (i) when the user enters the e-tool for the first time; (ii) when the user begins a specific activity where information needs might be expected; (iii) when the user starts a specific consultation; and (iv) when the user is already in the middle of a specific consultation. In this way, it is possible to re-saddle whenever necessary. The third method consists of automatic tracking of the user’s behaviour during a number of consultations. In this case, the user is ‘passive’ while the e-tool makes the calculations and creates a profile of the type of data that the user generally looks for, so that the same type of data is furnished when the e-tool is once more consulted. The biggest inconvenience regarding this method is that users’ needs, by definition, are not determined only by their personal profile in the narrow sense of the word, but also by the type of situation in which they occur. If consultation of the e-tool is made one day in one situation and the next day in a different one, then the needs will be of a completely distinct nature. For this reason, the best solution might be some kind of combination of all three methods mentioned, with the possibility to reset and change at any moment. If one looks at the three main methods of attempting to individualize user-needs satisfaction discussed above, a preliminary hypothesis would be that the two first methods, interactive and active, are quite appropriate in communicative, operational and interpretative situations, whereas the third, passive, method might be more useful in cognitive situations. This hypothesis is built upon the reflections in the previous paragraph, but it is up to future research to determine whether this hypothesis will prove right and to what extent a combination of the three methods should be recommended.
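The combination of the three methods can be illustrated with a minimal sketch. It is not taken from any existing e-tool: the article data, the mapping from situations to data categories, and the names `Profile`, `select_fields` and `render` are all hypothetical, and the rule that a user-designed master article overrides the other two methods is merely one assumed design choice.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical data categories of one dictionary article (illustrative only).
ARTICLE = {
    "definition": "placeholder definition text",
    "grammar": "placeholder grammar note",
    "collocations": "placeholder collocations",
    "synonyms": "placeholder synonyms",
    "etymology": "placeholder etymology",
}

# Assumed mapping from user situation to the data categories it calls for.
SITUATION_FIELDS = {
    "reception": ["definition"],
    "production": ["definition", "grammar", "collocations", "synonyms"],
    "cognitive": ["definition", "etymology"],
}


@dataclass
class Profile:
    # Interactive method: the situation is chosen through fill-in options.
    situation: str = "reception"
    # Active method: the user designs his or her own 'master article'.
    master_article: list = field(default_factory=list)
    # Passive method: the e-tool counts which categories the user opens.
    clicks: Counter = field(default_factory=Counter)


def select_fields(profile, top_n=2):
    """Return the ordered data categories to present for this profile."""
    if profile.master_article:
        # Assumed precedence: an explicit master article wins.
        return list(profile.master_article)
    chosen = list(SITUATION_FIELDS[profile.situation])
    # Add the categories the user has consulted most often so far.
    for name, _count in profile.clicks.most_common(top_n):
        if name not in chosen:
            chosen.append(name)
    return chosen


def render(article, profile):
    """Filter one article down to the categories selected for the profile."""
    return {name: article[name] for name in select_fields(profile) if name in article}
```

Because the profile is an ordinary object, the situation, the master article and the tracked behaviour can each be changed or cleared at any of the four moments mentioned above, which corresponds to the possibility to reset and change at any moment.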
3.9. Typology of Lexicographical e-Tools
Before concluding this contribution, it might be useful to return to the old discussion about dictionary typology. In the case of the ‘old’ lexicographical p-works, it was relatively easy to distinguish between one multifunctional dictionary and two monofunctional dictionaries: it was a single multifunctional dictionary if it was printed in one volume, while we were dealing with two monofunctional dictionaries if they were printed in two volumes with two different titles. However, in the era of e-lexicography, the typologization has become much more difficult. Some lexicographers claim that a lexicographical e-tool that permits two or more types of monofunctional data access should be considered a conglomerate of two or more monofunctional dictionaries, using the same database to extract the required data. Other lexicographers prefer the term pluri-monofunctional dictionary, while there are those who claim that a lexicographical e-tool of this type should be considered a single
multifunctional dictionary when it is embedded in the same web page or portal and extracts its data from the same database. In the latter case, typologization criteria are more or less the same as those employed to typologize lexicographical p-works. Such disagreements tend to become deeply rooted if something new does not evolve to change the whole framework of the discussion. This ‘something new’ is already taking shape and is called individualization of lexicographical user-needs satisfaction. In lexicographical e-tools conceived according to the principles of individualization, there are neither monofunctional nor multifunctional data access routes, but only individualized ones (although each of these individualized routes might be assigned to a specific type of function). And as it is impossible to list all the individualized access routes, it is also impossible, and therefore illogical, to list and name all the ‘individualized dictionaries’ contained in the e-tool. The only logical conclusion to be drawn is that lexicographical e-tools should not be viewed as a number of monofunctional or individualized dictionaries, but as one multifunctional dictionary with individualized search options within the framework of its defined functions. Of course, if the user first has to enter a lexicographical tool and choose a monofunctional option (dictionary) and then be allowed to individualize his or her access routes, then it might be justified to talk about a set of monofunctional dictionaries, each of which has individualized search and access options. But if the ‘individualization’ takes place before we enter a separate monofunctional dictionary, that is, if the interactive and active methods described in the previous paragraph are applied before this step is taken, then it makes little sense to speak about a set of monofunctional dictionaries defined in terms of their monofunctional search and access options. 
As a consequence, the best dictionary in terms of needs satisfaction is not necessarily a monofunctional dictionary, but any dictionary – whether monofunctional, pluri-monofunctional or multifunctional – that allows either monofunctional access or individualized access in the framework of its specific and foreseen functions. In this way, the discussion of (dictionary) typology might continue with all justification, but on much more solid ground. The question might even be posed as to whether the term dictionary should be maintained in the era of electronic lexicography and whether it is becoming obsolete as a scientific term (not necessarily as a commercial term) and should be replaced by another term, for example, lexicographical e-tool, lexicographical information tool or lexicographical consultation tool.
3.10. Conclusion
As a conclusion, one relevant question which has not been discussed in this chapter and which must be addressed in the future is what to do with the information that potential users of an e-tool believe they do not need, when actually
they do need it. If someone uses a computer, or is online, then software like Word’s Spelling and Grammar Checker, which does its work without the intervention of the user, and sometimes even without his or her knowledge, might be an option not only related to communicative situations and activities, but also to cognitive, operational and interpretative ones. The same holds true for a GPS device which might alert and even correct the user when he or she takes the wrong direction. But is there also a solution to this problem if someone is not using a computer, if they are not online or connected to the Global Positioning System? The new computer and information sciences do not only provide completely new and revolutionary solutions to old needs. They also create new challenges for the future.
Chapter 4
Filtering and Adapting Data and Information in an Online Environment in Response to User Needs
Theo JD Bothma
4.1. Introduction
Theory and praxis in lexicography are closely interlinked. This chapter accepts that a theory of lexicography is required to take the practical development of dictionaries and e-dictionaries forward and is written within the framework of the function theory as developed and expounded by Bergenholtz, Tarp and others at and in collaboration with the Centre for Lexicography (Centlex) at the Aarhus School of Business (see, for example, Bergenholtz, 2010; Bergenholtz & Bergenholtz, 2011; Bergenholtz & Gouws, 2007b; Bergenholtz & Tarp, 2005a; Gouws & Leroyer, 2009; Gouws & Steyn, 2005; Leroyer, 2009a; Nielsen, 2009; Tarp, 2007; Tarp, 2008a, 2008b, 2008c, 2009a, 2009b, 2011; Tono, 2010; Verlinde, Leroyer, & Binon, 2010). This chapter does not argue the validity of this theory when compared to other theories; it does, however, try to show that the theory may need further refinement when the possibilities of modern information technologies are applied to the praxis of the development of e-dictionaries. There are opposing theories of lexicography (see, for example, the chapter by Gouws in this volume); there are even those that argue that a theory of lexicography does not exist (for example Atkins & Rundell, 2008: 4, discussed in Bergenholtz & Gouws, 2010: 23). These issues are not addressed in this chapter. This chapter assumes that praxis informs and influences theory and that theory informs and influences praxis in the creation of e-dictionaries and e-information tools. The possibilities that modern information technologies offer the e-lexicographer may or may not be useful in the development of e-dictionaries. However, such technologies should not be used simply because they exist; they should only be adopted if they bring a higher level of efficiency to the dictionary and enhance the user experience with the e-dictionary, that
is, if they allow the user to satisfy his/her information needs more effectively and more efficiently. This has been the rationale for the use of new information technologies during the past 25-plus years. e-dictionaries developed from simple reproductions of paper-based dictionaries in a digital format to e-dictionaries that use highly complex relational databases to organize the data, provide fairly sophisticated searching and navigation (browsing) facilities to facilitate user access to the data, and so on. Distribution media have also changed, from delivery via a floppy disc to CD-ROM and currently via the Web. In all these cases, the use of the technologies was not because of the existence of the technologies, but in response to user needs. During the past ten-plus years, a number of further information technologies have matured that are currently not extensively or optimally used or not used at all in e-dictionaries. This chapter discusses these technologies in brief and shows how they are used in current general applications on the Web. It then also describes to what extent they are used in current e-dictionaries (if at all) and offers examples of how they can be implemented to enhance access to information in terms of user needs.
4.2. User Information Needs
In any given environment, information is included in an information artefact based on the author and/or publisher’s perceived or assumed understanding of the user’s information needs. The information is organized in such a way that it can easily be accessed by users in terms of their specific information needs in any given situation, again based on the author and/or publisher’s assumptions of easy accessibility in a given situation. In a paper-based environment, a book is, therefore, aimed at a specific target set of users: for example an elementary book for first grade primary school children, a textbook for university students, a highly specialized scientific book aimed at the expert and so on; in each case the information included as well as the organization of the information is based on this perceived or assumed understanding of the author and/or publisher. However, such categorizations are very broad and the specific information needs of an individual within this group cannot be taken into account. The information artefact is therefore aimed at providing the level of detailed information to satisfy the assumed information needs of the largest possible group within the target set; information needs outside this simply cannot be satisfied and the user has to find another source. This is currently, to a very large extent, the situation with digital information artefacts on the Web as well: users very often have to consult multiple sources to get exactly the information they need.
This is the case in e-lexicography as well: the perceived information needs of the largest possible group of the target population are addressed in any specific e-dictionary. Modern information technologies, however, allow for the personalization of information presented to the user by means of filtering and adaptive technologies based on the user’s profile. Sven Tarp already addressed this issue in various publications; a number of quotations from 2007 and 2009 illustrate his point of view:
• ‘[S]cientific lexicography would above all be interested in knowing in which situations – e.g. reception and production – these needs may occur. Then it would set itself the task of uncovering the needs users have in the last 20 percent of the look-ups, i.e. in one out of five consultations. And it would not stop here, but would try to go even deeper into the problem in order to discover the needs that only show up in one out of a hundred or one out of a thousand consultations, [ . . . ]’ (Tarp, 2009b: 292).
• ‘[ . . . ] or, even more rarely, in order to conceive dictionaries capable of meeting all the users’ needs in specific types of situations’ (Tarp, 2009b: 292).
• ‘ . . . dynamic articles . . . structured in different ways according to each type of search criteria’.
• ‘ . . . articles that are especially adapted . . . ’; ‘ . . . define their own profile . . . ’; ‘ . . . one direction will probably be the “individualization” of the lexical product, adapting to the concrete needs of a concrete user [ . . . ]’ (Tarp, 2009a: 57, 59, 61).
These quotations show the need for e-lexicographers to consider the possibilities of modern information technologies seriously and to experiment with the possibilities that such technologies offer to enhance the user experience in e-dictionaries; the same sentiments are expressed in Tarp’s contribution in this volume (Tarp, 2011). Against this background this chapter therefore addresses two main issues, viz. to what extent can modern information technology:
• facilitate the design and implementation of e-dictionaries and/or e-information tools for specific user groups and situations; and
• enable the user to ‘create’ his/her own e-dictionary and/or information tool(s) to access the information required for a specific information need on demand?
It cannot provide final and definite answers to these questions, but it hopes to make a contribution to further stimulate the debate around the theory
and praxis of e-lexicography. Some theoretical background and assumptions against which the potential use of the information technologies is to be understood must first be discussed.
4.3. Theoretical Background and Assumptions
The function theory of information needs has been described in detail in many publications from the Centre for Lexicography (Centlex) at the Århus School of Business and this chapter is written within this theory, as indicated above. It, therefore, accepts that there are four types of information needs, viz. communicative, cognitive, operative and interpretative needs. The theory works well in the current e-lexicography environment and examples of its practical implementation can be found in the dictionaries developed by Centlex and commercialized by Ordbogen.com (www.ordbogen.com). As stated in the introductory paragraphs, this chapter is written within the framework of the function theory. In addition to the function theory, issues about context and the characteristics of information and users need to be addressed.
4.3.1. Context
The function theory makes limited provision for the context in which the information need is experienced. Ingwersen and Järvelin (Ingwersen, 2001, 2007; Ingwersen & Järvelin, 2004, 2005) define six context dimensions in interactive information retrieval (IIR), viz. ‘Intra-object structures’ (dimension 1 in Figure 4.1); ‘Inter-object structures’ (dimension 2); ‘[t]he Session context dealing with features (evidence) of the interaction (or Activity)’ (dimension 3); ‘Social, systemic, domain and work task contexts’ which can be either individual (dimension 4a) or collective (dimension 4b); ‘Techno-economic-politico-societal infrastructures’ (dimension 5); and ‘The historic context operating across this stratification’ (dimension 6) (Ingwersen, 2007). Ingwersen (2007) explains the different context domains as follows:
1. ‘Intra-object contexts in the model – signs in context of sign structures or elements, constituting objects;
2. Inter-object structures refer to social networking, hyperlinks or citations between objects, identifying the research framework component in question (the core of the nested set of contexts of the model);
3. The Session context dealing with features (evidence) of the interaction (or activity) between two components or actors of the research framework – with the situation at hand as a central cognitive–emotional element. Session context is embedded in broader seeking and information behavior. The
situation at hand is constructed by the actor’s perception of work and search tasks (interest), knowledge gap and potential sources, and so on (Ingwersen & Järvelin, 2005: 278) in the context of
4. Social, systemic, domain and work task contexts – depending on the nature of the core component:
   a. Individual conceptual and emotional (actor: searcher; author); systemic (engine; interface; information object); and domain properties immediately surrounding the core actor or component (work task; interest, preference, product).
   b. Collective conceptual and emotional (actors: search teams; author groups); systemic (networks; meta-engines; information objects, information space); sociocultural and organizational structures in local settings.
5. Techno-economic-politico-societal infrastructures influencing (not necessarily always in a remote way) all actors, components and interactive sessions.
6. The historic context operating across this stratification, i.e. that of all participating actors’ experiences, forming their expectations. All IIR processes and activities are under influence of this temporal form of context.’
[Figure 4.1: nested contexts, from innermost to outermost: signs (component of cognitive IS&R framework); (1) intra-object structures; (2) inter-object contexts; (3) interaction (session) context; (4) social, systemic, domain/media, work task, conceptual, emotional ... contexts, (4a) individual and (4b) collective; (5) economic, techno-physical and societal contexts (infrastructures); (6) historic context.]
Figure 4.1 Nested general model of context stratification for interactive information retrieval (Ingwersen 2007, revised from Ingwersen and Järvelin, 2004 and 2005).
Even though these contexts are formulated in terms of interactive information retrieval, they are equally relevant in terms of information needs, information use and information use behaviour. The function theory is not in conflict with the context domains described above. To be satisfied, the information needs of the function theory require interaction of the user with information objects that have intra- and inter-object relationships. Equally, the user that has a specific information need operates in any given interaction with the information within an individual or collective/collaborative social or work task context with specific infrastructures and a specific historic context.
4.3.2. Characteristics of Information and Users
Typically, users don’t care what the source of the information is, whether it comes from a dictionary, encyclopaedia, thesaurus, Web document, research article, book/textbook and so on, as long as their information needs are satisfied and the criteria for good quality information are met. Bergenholtz and Gouws (2010: 3) formulated this as follows: ‘For the user the type of information source is not important. Important is that he/she retrieves the exact required information as quickly as possible’. In the context of dictionaries, Haas had already formulated this in 1962: ‘The perfect dictionary is one in which you can find the thing you are looking for preferably in the very first place you look’ (Haas, 1962: 48). In this quotation, ‘dictionary’ can equally well be replaced with ‘information source’ – any user consulting any information source would prefer it to provide him/her with exactly the information he/she needs in the given situation, not more and not less. Therefore, the user would prefer not to have to read through much detail if only a single fact is required; neither would the user like to have to consult multiple sources to get to the detail required for a more comprehensive view on a topic. Equally, a lay person in a specific domain would prefer not to get information that is written at a high level of scientific complexity and which is aimed at the expert and an expert would not be satisfied with an overview aimed at a lay person. This implies that the information required to satisfy a specific information need in a given situation has multiple characteristics; for example, it could have little, medium or much detail, it could be aimed at a lay person, a semi-expert or an expert.
A short definition can be a very easy definition for a lay person or a highly complex definition for an expert; similarly, a long essay can be written ‘in simple English’ or could contain highly complex scientific terminology and argumentation. Furthermore, an expert scientist in a specific discipline could require a short scientific definition aimed at an expert or a detailed scientific discussion aimed at an expert, depending on his/her information needs in a given situation. A person who has a good understanding of the same discipline might require a definition that is phrased in less technical terms than the expert might require, but something more technical than a
person who has no background in the field, and he/she might again require only a short definition or a detailed discussion. And since no user is an expert in all fields, a user might be an expert when one information need is to be satisfied, a semi-expert in another field and a lay person in a third field. In the preceding, only two characteristics have been identified and both can be plotted on a continuum, in the one case a continuum of detail and in the other case a continuum of complexity. Many other such characteristics of information can be identified. It is also possible to typify the user, for example in terms of his/her general language proficiency in his/her mother tongue or a second language, or, for example, in specialist scientific language. Bergenholtz and Gouws (2007b: 579–583) identify 30 user types based on subject knowledge and general and specialist scientific language proficiency. This results in an n-dimensional complex matrix of elements if all characteristics of information and user types are combined in a single diagram. In Figure 4.2, only two possible characteristics of information have been plotted in the matrix, viz. detail and complexity; it could, however, be any other variables or combination of variables. Within this context, users therefore expect information to be:
• accurate, up-to-date, relevant;
• available on demand;
• with a minimum amount of effort (time, clicks etc.);
• at the required level of detail and complexity
  - Factoid to comprehensive
  - Elementary to highly complex;
• delivered on the platform of their choice
  - Desktop
  - Mobile.
This can be summarized in the matrix depicted in Figure 4.2.
[Figure 4.2: matrix with a Detail axis (D1 little detail, D2 medium, D3 comprehensive) and a Complexity axis (C1 elementary, C2 medium, C3 complex); the information in every cell is accurate, up-to-date and relevant, and platform independent, from mobile to desktop; a further dimension shows different information needs and types of information need over time of work task.]
Figure 4.2 Matrix of possible characteristics of information.
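The detail and complexity continua can be made concrete with a small sketch of how an e-tool might choose among stored variants of one entry. The three-point scales follow D1 to D3 and C1 to C3 in Figure 4.2; everything else (the `VARIANTS` table, the function `pick_variant`, the distance measure and the tie-breaking rule) is an assumption made purely for illustration.

```python
# Hypothetical variants of a single entry, keyed by (detail, complexity).
# Both scales run from 1 to 3, mirroring D1-D3 and C1-C3 in Figure 4.2.
VARIANTS = {
    (1, 1): "short definition in plain language",
    (1, 3): "short definition in expert terminology",
    (3, 2): "comprehensive overview for the semi-expert",
}


def pick_variant(variants, detail, complexity):
    """Return the stored variant closest to the requested matrix cell.

    Closeness is the sum of the distances on both axes. Ties are broken
    in favour of the smaller key, i.e. the less detailed and then the
    less complex variant (an arbitrary but deterministic choice).
    """
    if not variants:
        raise ValueError("no variants available for this entry")
    best_key = min(
        variants,
        key=lambda k: (abs(k[0] - detail) + abs(k[1] - complexity), k),
    )
    return variants[best_key]


# An expert wanting a brief definition gets the (1, 3) variant directly;
# a request for a cell that is not stored falls back to the nearest one.
expert_short = pick_variant(VARIANTS, detail=1, complexity=3)
expert_long = pick_variant(VARIANTS, detail=3, complexity=3)
```

With more characteristics of information or user types, the key would simply gain further components, which corresponds to the n-dimensional matrix mentioned above.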
If information needs are plotted over a specific period of time in a specific environment, for example, in a given work task, different information needs and different types of information needs might be experienced. Two examples will suffice: z If a scientist writing a scientific paper his/her information needs can most
probably in general be characterized as a cognitive need at a level of comprehensive detail and high complexity. However, if the same scientist were to read a scientific article as part of his research in a language that he/she is not quite familiar with, he/she might have a communicative need at the level of text reception (‘understanding the meaning of an unknown word’). The same scientist might also need some contextual background regarding his/her research problem and might require a number of ‘overview articles’ that might contain much detail (i.e. be quite comprehensive) but are mainly aimed at the semi-expert. z A translator translating a text into a foreign language (working from L1 to L2) may typically need information for text production (‘what is the correct word to use in this context?’), which could in many cases be resolved by a single lookup in a dictionary for text production by finding a single translation equivalent. However, if he/she comes across an idiom or fixed expression, he/she might require cultural and/or historical background information to ensure that the idiom is translated in such a way that the target audience can understand the intended meaning of the idiom. An example of the latter would be if a mother tongue English speaker in South Africa were to be engaged in Bible translation from Greek (L2) to isiZulu (L3). Even if the translator were to be fluent in all three languages, cultural differences between idiom use in the era of New Testament Greek and modern-day Zulu culture might require a careful analysis of the intended meaning of an idiom in the Greek, as well as the possible translation equivalent in modern isiZulu, especially to find an idiom with the equivalent meaning (if possible), to ensure that the target audience understands the intended meaning of the original – a direct translation or paraphrase will obviously not suffice. 
The detail and complexity of the information interaction required to solve the communicative information need in this case are much greater than for finding a single translation equivalent, especially if no L2/L3 bilingual dictionary is available. (The information need remains a communicative need as the intention of the translator is to solve a translation problem, viz. a text production problem. Since the translator’s intention is not primarily to acquire new knowledge, it is not a cognitive problem, even though learning may take place as an unintended side effect and benefit of the interaction with the information.)
From the preceding it is clear that it is extremely complex to satisfy a specific information need of a specific user in a specific situation taking into account
all the possible permutations of the characteristics of users and information. Bergenholtz and Gouws (2007b: 854) come to the conclusion that a reference work is of high quality if it contains what the user needs – and only what the user needs – to satisfy his/her information needs. Such products are not feasible in a paper-based environment and are currently only to a very limited extent available in the e-environment. As indicated below, modern information technologies can start addressing these issues.
4.3.3. Information Needs Life Cycle
In the typical life cycle of information needs, a user must acknowledge that he/she has an actual information need which he/she must satisfy in a specific given context (Ingwersen’s dimension 4, above). The user then, based on his/her articulation of this need, executes a search and finds information; this information is analyzed and interpreted in terms of the information need. If the need is satisfied, the user can go ahead and use the information. If, however, the need is not satisfied, an iterative process occurs in which the user must re-identify the information need, re-do the search, re-analyze and re-interpret the search results and information found or any combination of the steps. Through using the information, the user can synthesize the information with his/her existing information/knowledge base and in the process internalize the information for future use. It is obviously also possible that the information is only of peripheral and ephemeral value for the user for only a limited time period and is soon forgotten. If the user uses the information in, for example, the creation of a new document, new information is created. In the writing process, new information needs can be identified and the process begins again. This is illustrated in Figure 4.3.
[Figure 4.3 diagram, box labels: Identify; Information need; Search, find; Analyse, interpret; Information use; Internalize, synthesize; Information creation; Identify new unknowns; New information need; Etc.]
Figure 4.3 The typical life cycle of information needs.
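The iterative part of this life cycle can be pictured as a simple loop. The following is only a minimal sketch under stated assumptions: the function name `satisfy_need`, its callback parameters and the toy entries are invented for illustration and do not describe any existing e-dictionary.

```python
from typing import Callable, List, Optional


def satisfy_need(
    need: str,
    search: Callable[[str], List[str]],
    is_satisfied: Callable[[str, List[str]], bool],
    reformulate: Callable[[str, List[str]], str],
    max_rounds: int = 5,
) -> Optional[List[str]]:
    """Iterate search -> analyse -> re-identify until the need is met."""
    for _ in range(max_rounds):
        results = search(need)                 # search, find
        if is_satisfied(need, results):        # analyse, interpret
            return results                     # ready for information use
        need = reformulate(need, results)      # re-identify the need
    return None                                # need remained unsatisfied


# Toy usage with a two-entry "dictionary" (illustrative data only).
ENTRIES = {
    "idiom": ["a fixed expression whose meaning is not literal"],
    "lemma": ["the base form under which a word is entered"],
}

found = satisfy_need(
    need="idoim",                              # misspelt first attempt
    search=lambda q: ENTRIES.get(q, []),
    is_satisfied=lambda q, r: len(r) > 0,
    reformulate=lambda q, r: "idiom",          # the user corrects the query
)
print(found)
```

The `max_rounds` limit is a design choice of the sketch: in practice a user may also abandon a need that remains unsatisfied.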
4.4. Information Technologies and Techniques
Current information technologies and techniques that could impact on the availability of customized data from e-dictionaries are discussed in brief, giving examples of general use, as well as examples from current e-dictionaries, where available. Some of these technologies are being used to a limited extent in e-dictionaries; extensive and general use has not been observed. Standard database technologies, including relational databases, are not discussed, since these are generally used in e-dictionaries. Of the list below, the most common technology currently being used is searching and navigating; however, this is used at different levels of complexity in many dictionaries and is included here to indicate the possibilities it offers.
• Searching and navigating
• User profiling
• Filtering
• Adaptive hypermedia
• Metadata markup
• Linked open knowledge (data/content)
• Recommender systems
• Annotation systems
These information technologies and techniques do not imply that the e-lexicographer should necessarily create all the information in the e-dictionary (or e-information tool). In many cases, a set of links to external information sources can contribute to the comprehensiveness of the information source. These links can be selected individually by the e-lexicographer, or can be generated automatically based on predefined criteria. It is also possible to incorporate the information directly into the e-dictionary (obviously with proper attribution and copyright being observed), to create so-called mashups1 (Ankolekar, Krötzsch, Tran & Vrandečić, 2008), where information from multiple sources is incorporated directly into the e-dictionary. The underlying principle in such cases would be to reuse existing information and not to recreate information; this would allow e-lexicographers to develop e-information tools much faster and make much more content and detail accessible to their users. E-lexicographers should, however, ensure that such integration is seamless and that the user is always aware of the origin of the information (created by the e-lexicographer, imported as a mashup or externally referenced).
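One way to keep the origin of each piece of information visible is to store it with the data. The sketch below is purely illustrative: the names `Origin`, `Block` and `render` are hypothetical, and the three origin labels are simply those named in the preceding paragraph.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    AUTHORED = "created by the e-lexicographer"
    MASHUP = "imported as a mashup"
    EXTERNAL = "externally referenced"


@dataclass(frozen=True)
class Block:
    text: str
    origin: Origin
    source: str = ""   # attribution; left empty for authored blocks


def render(block: Block) -> str:
    """Append the origin (and the source, if any) to the displayed text."""
    label = block.origin.value
    if block.source:
        label += f", source: {block.source}"
    return f"{block.text} [{label}]"


print(render(Block("An idiom is a fixed expression.", Origin.AUTHORED)))
print(render(Block("Encyclopaedic background text.", Origin.MASHUP, "Wikipedia")))
```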
4.4.1. Searching and Navigating
For the purpose of this chapter, searching is defined as the directed exploration of a defined information space with a defined objective in view and a defined search strategy. Navigating, on the other hand, is the exploration of a defined or undefined information space without using a defined strategy. Navigating can be split into two sub-categories, viz. browsing and surfing. Browsing implies that the user has a defined objective but has no defined search strategy; surfing implies that the user has no defined objective in mind, but is serendipitously following links that may seem interesting.2
4.4.1.1. Search
Searching implies that the user has a definite search strategy. The user is, therefore, required to define one or more search terms or phrases that are input into a search engine, by means of which the user searches for documents that match the search terms or phrases. Search can use a simple or complex search interface. In a simple search interface, the user usually has only one search box in which the terms are inserted and the search engine searches the full database or document for words that match the terms. In a complex search, the user is usually presented with various search boxes that enable him/her to specify where in the database or document specific terms should occur, for instance in an author field, a subject field, an abstract, body text and so on. In a complex search, users also use Boolean operators (and, or, not) by means of which terms are combined, as well as brackets to indicate the order in which the operators are to be interpreted. Typically, standard Web search engines have both options: a simple search interface and an advanced search interface, as in, for instance, Google3 and Bing.4 Bibliographic and full-text databases of journals also have both; however, in such databases, one usually has many more options to do a field-oriented search because of the structure of the database. Search engines also usually support wild card searching (e.g. using a symbol to replace a letter to make provision for spelling variations, as in ‘organi*e’ which will match both ‘organize’ and ‘organise’) and truncation (e.g. ‘librar*’ to match ‘library’, ‘libraries’, ‘librarian’, etc.). Matches can either be an exact match or best match, that is, the search term exactly or only closely matches the term in the source (Järvelin & Ingwersen, 2010). 
Relevance ranking of search results also occurs frequently; relevance ranking is simply in terms of system relevance and cannot take into account further dimensions of relevance (see Cosijn & Ingwersen, 2000; Cosijn, 2006; Järvelin & Ingwersen, 2010).
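Wild card searching, truncation and Boolean combination can be illustrated in a few lines. This is a minimal sketch rather than a description of any particular search engine: the three toy documents are invented, and the Python standard-library function `fnmatchcase` stands in for the engine’s pattern matcher (its ‘*’ matches any run of characters, including none).

```python
from fnmatch import fnmatchcase

DOCS = {
    1: "the library will organise a workshop for every librarian",
    2: "libraries organize their catalogues by subject",
    3: "an organ recital in the town hall",
}


def matches(pattern: str, text: str) -> bool:
    """True if any word of the text matches the pattern."""
    return any(fnmatchcase(word, pattern) for word in text.lower().split())


def search(pattern: str) -> set:
    """Ids of all documents containing a word that matches the pattern."""
    return {doc_id for doc_id, text in DOCS.items() if matches(pattern, text)}


# Boolean operators map onto set operations over the result sets.
both = search("organi*e") & search("librar*")       # AND
either = search("organi*e") | search("organ")       # OR
without = search("librar*") - search("organise")    # NOT

print(sorted(both), sorted(either), sorted(without))
# prints: [1, 2] [1, 2, 3] [2]
```

Note that the exact-match query ‘organ’ finds only document 3, whereas the truncated ‘librar*’ finds every inflected or derived form; no relevance ranking is attempted here.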
Typical support for searching in e-dictionaries is at the level of exact match and simple search, sometimes with an advanced search interface included. Advanced search interfaces can be of different complexity, as is evident from the examples taken from the Elektroniese Woordeboek van die Afrikaanse Taal (e-WAT) and the Oxford English Dictionary (OED) in Figure 4.4. From these two examples, it is evident that the OED supports much more specific searching than does the e-WAT. The OED makes a basic distinction between searching the full entry or only quotations and supports, inter alia, Boolean operators, a specification of the fields in which to search, proximity operators and a part-of-speech filter. The e-WAT supports only an ‘and’ or ‘or’ operator and phrase searching. This unequal use of advanced search options was also observed by Almind (2005) in his evaluation of e-dictionaries and he remarked that ‘[g]iving access to powerful search-functions is essential to a dictionary’s success’ (Almind, 2005: 104).
Figure 4.4 Advanced search interfaces in the e-WAT and the OED.
4.4.1.2. Navigate
Navigation is the most common way to move between discrete information entities on the Web: a user can either browse or surf from Web page to Web page, trying to find either specific information or simply following links that appear interesting. Links can be reference links to information outside the current website or cross-references to related material in the same website. It is quite common to indicate clearly whether links are cross-references or links to external sources, as is found in, for example, Wikipedia, where external sources are always indicated by means of a special symbol. In many cases, a Web article has an internal microstructure that is displayed at the beginning of an article (as a table of contents) and by means of which it is possible to find the required information more easily. One of the most well-known cases of such microstructures occurs in Wikipedia, where each article has an internal structure that facilitates access to the material within the article. This type of internal microstructure also sometimes occurs in e-dictionaries, for example in Wiktionary5 (see Figure 4.5a) and the floating clickable index icon in Webster’s Online Dictionary.6 This microstructure is called an ‘entry map’ in the OED; such an entry map provides easy access to individual parts of the dictionary, but is unfortunately not very helpful for the average user as it is totally opaque, as is evident from Figure 4.5b.
4.4.1.3. Combination of searching and navigation
Users usually combine searching and browsing to satisfy an information need. A search, be that through a search engine or in an e-dictionary, usually returns multiple results. Users then navigate from the list of results to the actual full text article by following links. Based on the user’s information needs, multiple links in an article to further articles can be followed (browsed) to enable the user to obtain sufficient detail to satisfy his/her information need. A user can also be distracted through interesting snippets and start surfing from article to article, not based on the original information need but on a serendipitous interest created by an interesting piece of information.
Figure 4.5 Part of an internal microstructure in Wiktionary and part of an entry map in the OED.
4.4.2. User Profiling/Modelling
Information retrieved through a specific search or navigation action can be filtered by means of profiling or modelling of the user. Therefore, two users searching the same database with the same terms may retrieve different results based on their individual profiles. This can help to reduce information overload and provide the user with customized information tailored to his/her specific needs in a specific situation. This obviously also implies that it should be possible to change one’s profile, based on the specific information need at any given time. User profiling can be accomplished through the user supplying the system with specific data, by the system tracking user behaviour and thereby automatically constructing a profile of the user or a combination of the two. User profiles can be transitory (e.g. applied to a single search only) or persistent (e.g. be stored on the system and be used for future retrieval actions; see also Amato & Straccia, 1999; Gauch et al., 2007; and Kobsa, 2007).
4.4.2.1. Form fill-in data
The easiest way to construct a user profile is by means of the user supplying data about his/her goals, preferences and/or levels of knowledge. For example, a user may be primarily interested in classical music with a good knowledge of music from the Romantic period, less knowledge of baroque music, and no knowledge of or interest in modern classical music. Based on this interest and knowledge, a user may indicate that he/she is a specialist in certain fields and a novice in other fields. If the user then searches a music database for specific information about Mozart, the system will return only information aimed at the specialist; if the same user searches for a modern classical composer, the system will not return references to specialist information but only information aimed at a novice or semi-expert. A profile based on form fill-in data may include personal data such as location-based data. However, such information should be requested only if it is relevant to the possible search results. For example, when searching a site for booking events, it might be useful to limit results to a specified distance from home. When searching for a live performance to attend, one could specify that the system returns only performances within a certain distance from the specified address; getting results for Cape Town when living in Pretoria is not particularly helpful and only results in unnecessary information overload. Personal data should only be included if it is relevant to filtering the search results. Age and gender, for instance, will not be relevant when choosing a live performance, but age may be relevant when choosing a hiking trip, in terms of its difficulty level. Such form fill-in may require a considerable amount of effort from the user and may require that the user revise his/her profile each time before a search is carried out. 
It obviously requires a sophisticated search system, both in terms of the variables provided for and also in terms of the database structure and the way in which data is characterized, as is discussed in more detail below.
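The interaction between a form fill-in profile and labelled data can be sketched as follows. Everything here is an assumption made for illustration: the class names, the catalogue titles and, in particular, the `VISIBLE` table that decides which audience labels are shown at each self-declared level.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed rule: which audience labels are shown to a user at each level.
VISIBLE = {
    "specialist": {"specialist"},
    "semi-expert": {"semi-expert"},
    "novice": {"novice", "semi-expert"},
}


@dataclass
class Profile:
    # Self-declared level per subject field; unlisted fields count as novice.
    expertise: Dict[str, str] = field(default_factory=dict)

    def level_for(self, subject: str) -> str:
        return self.expertise.get(subject, "novice")


@dataclass
class Item:
    title: str
    subject: str
    audience: str   # one of the labels used in VISIBLE


def filter_items(items: List[Item], profile: Profile) -> List[Item]:
    """Keep items whose audience label suits the user's level in that field."""
    return [i for i in items if i.audience in VISIBLE[profile.level_for(i.subject)]]


CATALOGUE = [
    Item("Harmonic analysis of a Romantic symphony", "romantic", "specialist"),
    Item("Introducing Romantic music", "romantic", "novice"),
    Item("A technical study of serialism", "modern", "specialist"),
    Item("Modern classical music for beginners", "modern", "novice"),
]

user = Profile(expertise={"romantic": "specialist"})
for item in filter_items(CATALOGUE, user):
    print(item.title)
```

The same query thus yields specialist material in one field and introductory material in another, which is only possible because every item carries an audience label.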
4.4.2.2. Automated tracking of user behaviour
A user profile can also be created automatically by the system, based on the user’s behaviour in the system. In this case, the profile is based on the pages that the user visited and the links that he/she followed, as well as, possibly, on the time spent on specific pages. Using this behaviour, the user’s interest and knowledge levels are calculated and this is then used to either filter information in a search or to suggest possible links of relevance that can be followed. Such analysis requires sophisticated characterization of the data in the database on the basis of which the data can be analyzed and matched to the profile.
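A very crude version of such tracking can be sketched as follows. The scoring rule (share of total reading time per topic) is an assumption chosen for simplicity, and the function names, topics and URLs are hypothetical; real systems use far richer behavioural models.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Visit = Tuple[str, float]   # (topic of the page visited, seconds spent on it)


def interest_profile(visits: Iterable[Visit]) -> Dict[str, float]:
    """Share of total reading time per topic, used as a crude interest score."""
    seconds: Dict[str, float] = defaultdict(float)
    for topic, spent in visits:
        seconds[topic] += spent
    total = sum(seconds.values())
    return {t: s / total for t, s in seconds.items()} if total else {}


def suggest(profile: Dict[str, float], links: Dict[str, str], top: int = 2) -> List[str]:
    """Rank candidate links (url -> topic) by the user's interest in the topic."""
    ranked = sorted(links, key=lambda url: profile.get(links[url], 0.0), reverse=True)
    return ranked[:top]


visits = [("etymology", 120), ("idioms", 30), ("etymology", 60), ("grammar", 30)]
profile = interest_profile(visits)
links = {"/entry/a": "grammar", "/entry/b": "etymology", "/entry/c": "phonetics"}
print(profile["etymology"])      # 0.75
print(suggest(profile, links))   # ['/entry/b', '/entry/a']
```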
4.4.2.3. Combination of both
A combination of form fill-in data and user tracking can create a powerful profile and thereby filter out all information that does not comply with the user’s pre-defined profile or his/her recent searching behaviour. It is, however, important that the user be aware of how a profile is created and that the user has the option to change or reset the profile at any stage. For example, a highly knowledgeable scientist would usually require only highly specialized information in his/her field of research; however, were this person to search in an effort to help a child find information for a school assignment, the user profile would be totally different and the usual profile would be totally irrelevant. Many e-commerce sites contain aspects of profiling and remember a user’s preferences when the user returns to the site. This profile can be based on the user selecting a number of topics that he/she is interested in when he/she registers on the site for the first time and/or on the user’s search and navigation behaviour on the site. Such sites then often use this information to filter information or recommend products to the user on his/her return to the site, as discussed below. All social network sites, such as Facebook7 and Twitter,8 including professional social network sites such as LinkedIn,9 require user profiles to function. To the best of my knowledge, user profiling where the user can specify his/her own profile or where a profile can be created automatically does not currently occur in e-dictionaries.
4.4.3. Filtering
Filtering can be user-controlled through the choices that the user indicates in the system or can be system-controlled.
4.4.3.1. User-controlled through choices
User-controlled filtering can be based on a persistent user profile that the user has created through filling in data in a form, or it can be based on selections the user makes at the time of consultation. The selections can be in the form of fill-in choices or choices through manipulating sliders (see, for example, Kang & Shneiderman, 2000; Shneiderman, 2003). The advanced search option in the OED (Figure 4.4 above) contains an element of filtering through its part-of-speech filter. Many standard search engines also contain filtering elements, for instance the possibility of specifying the language or the country of origin of the results in Bing, as well as the possibility of selecting text documents or images, and a date filter in Google Scholar.10 Such filtering options are much more sophisticated in e-journal
platforms such as EbscoHost11 and ScienceDirect,12 where one can specify databases and/or subject fields, date ranges, publication type (journals, books, reference works), language and so on.
In the e-dictionary environment some of the dictionaries by Ordbogen.com13 support limited filtering based on the type of user need, for example a communicative need (to understand a fixed expression or to produce a text) and a cognitive need, as in the Ordbogen over faste vendinger (Figure 4.6). In the Base lexicale du français,14 the user is presented with a large number of predefined filters on the database. The user can, for example, select to get information on a word and then choose to find out about the gender, spelling, verb form (if it is a verb) and meaning of the word, as illustrated in Figure 4.7. All information in the database not related to the specific query is then filtered out.
Figure 4.6 Filtering (search) options in the Ordbogen over faste vendinger.
Figure 4.7 Filtering options in the Base lexicale du français.
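The principle of such predefined, user-selected filters can be sketched in a few lines. The entry below and its field names are invented for the illustration and are not the actual schema of the Base lexicale du français or of any other dictionary.

```python
from typing import Dict, Iterable

# Hypothetical entry with a few labelled fields (illustrative schema only).
ENTRY = {
    "lemma": "maison",
    "gender": "feminine",
    "spelling": "maison",
    "meaning": "house",
    "examples": "une grande maison",
}


def apply_filter(entry: Dict[str, str], wanted: Iterable[str]) -> Dict[str, str]:
    """Return the lemma plus only those fields the user has selected."""
    keep = {"lemma", *wanted}
    return {name: value for name, value in entry.items() if name in keep}


print(apply_filter(ENTRY, ["gender"]))
# {'lemma': 'maison', 'gender': 'feminine'}
```

Everything not selected is withheld from the display rather than deleted from the database, so a different selection at the next consultation yields a different view of the same entry.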
4.4.3.2. System-controlled
The system can also automatically filter information presented to the user based on the user profile (provided by the user, created by tracking user behaviour or a combination of both). For examples, see the discussion on filtering in e-commerce sites and social networks above. No example of such filtering could be found in the e-dictionaries consulted.
4.4.4. Adaptive Hypermedia
According to Brusilovsky (1996: 100), adaptive hypermedia can be subdivided into two categories, namely adaptive presentation and adaptive navigation support. Adaptive presentation again consists of either text or multimedia presentation. Adaptive navigation support refers to a system’s ability to manipulate links, viz. sort, hide or annotate links, and to provide direct guidance through linking, either text-based, multimedia-based or based on a navigational map. These categorizations are depicted in Figure 4.8. (See also Brusilovsky & Maybury, 2002; Brusilovsky & Millán, 2007; Bunt, Carenini & Conati, 2007; Knutov, De Bra & Pechenizkiy, 2009; He et al., 2007.)
[Figure 4.8 diagram: adaptive technologies divide into adaptive presentation (adaptive multimedia presentation; adaptive text presentation) and adaptive navigation support (direct guidance; adaptive sorting of links; adaptive hiding of links; adaptive annotation of links; map adaptation).]
Figure 4.8 Adaptive hypermedia technologies, based on Brusilovsky (1996: 100).
4.4.4.1. Adaptive presentation
Adaptive presentation refers to the system’s ability to either expand (show) or hide (not show) data/information (be this text-based or multimedia-based),
based on a user’s preferences. It can be system controlled (based on the user’s profile) or manually controlled at the time of reading. A user can, therefore, be presented with the amount of information he/she requires at any given stage: a user can, for example, require only a limited overview of a topic, or an in-depth discussion of the same topic. In both cases the user should have the opportunity to control the system. If, for example, the user’s profile is set to ‘overview-level information’, he/she should have the opportunity to expand the information to an in-depth presentation if he/she were to require more detail. Conversely, if the profile is set to ‘in-depth information’, the user should have the opportunity to select an option to be provided with only an overview if the in-depth information provides too much detail. The preceding example refers only to the amount of information presented to the user. There are obviously many categories and many different levels of categories that can be used to describe the information need of the user, for example a continuum of information complexity aimed at the lay person vs. information for the expert and any sub-category in-between, such as ‘interested’ lay person (i.e. someone who has a basic knowledge of the topic), semi-expert and so on, as described above. Language can also be a selection criterion; the user can, for example, specify the languages in which the information should be written. Adaptive navigation support is covered by the five categories listed in Figure 4.8. In the case of direct guidance, the user is directed to read one page or article after another in, for example, an online learning system where readers should read the material in a predefined sequence. Adaptive hiding of links implies that only certain links are shown to the user, based on his/her profile. For example, links to information that is categorized as ‘complex’ can be hidden from a user whose profile is set to ‘lay person’. 
(See the references above and also, for example, Brusilovsky, 2007; Brusilovsky, Sosnovsky & Yudelson, 2009.) However, any user should at all times be able to change any level of information presented to him or her by the system. For example, the semi-expert may regard the data that is presented as either too scientific or not scientific enough, too detailed or not detailed enough, and should then have the ability to change all parameters to match his/her specific needs in the given situation and the system should enable the user to either drill down to further levels of information or to ‘zoom out’ to obtain only an overview. In the graphic environment, such zoom in/out is very common, for example in all geographical applications such as Google Maps15 and Google Earth.16 It also links to the HCI mantra of Shneiderman, viz. ‘Overview first, zoom and filter, then details-on-demand’ (Shneiderman, 2003) and can be found in much of the work done at the HCI laboratory at the University of Maryland (see, for example, Bederson & Shneiderman, 2003; and www.cs.umd.edu/hcil/). In the text environment, this is more complex. These characterizations of text-based information available to a user could be at the document level, page
level, paragraph level or even smaller. This implies that a whole document can be aimed at the expert and another document on the same topic at a lay person. However, the granularity can also be much finer. A single document can, therefore, be written in such a way that certain sections are aimed at the expert, others at the semi-expert and others at the lay person. The system should be able to present the correct level of information to the user, depending on his/her specified user profile in any given situation. Currently, the only way this can be done effectively is by means of marking up data in the document to indicate for which type of user the data is intended.
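Section-level adaptation of this kind, including the user’s ability to drill down or zoom out, can be sketched as follows. This is only an illustration under assumptions: the three-step `RANK` scale, the rule that a reader sees every section up to and including his/her own level, and the sample article are all invented for the purpose.

```python
from dataclasses import dataclass
from typing import List

# Assumed ordering of audiences from least to most specialized.
RANK = {"lay person": 0, "semi-expert": 1, "expert": 2}


@dataclass
class Section:
    text: str
    audience: str   # key of RANK: the reader the section is written for


def present(sections: List[Section], level: str) -> List[str]:
    """Show every section written for the given level or a less specialized one."""
    return [s.text for s in sections if RANK[s.audience] <= RANK[level]]


ARTICLE = [
    Section("Overview of the topic.", "lay person"),
    Section("Background and terminology.", "semi-expert"),
    Section("Full technical discussion.", "expert"),
]

print(present(ARTICLE, "lay person"))    # overview only
print(present(ARTICLE, "expert"))        # drill down: all three sections
```

Because the level is an argument of each call and not a fixed property of the user, the same reader can zoom out to the overview or drill down to full detail at any time.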
4.4.5. Data Markup
Metadata is used to describe, inter alia, the content, structure and administrative properties of digital data. Metadata can be created by a system or by a human. All Microsoft Office documents, for example, contain metadata fields that are automatically created by the system, for example versioning, document statistics, date and time, and so on. It is, however, possible to add further metadata to any document, such as a short description, key words, abstract, author details, status of a document (private/public, in revision/final, etc.). Typically Web pages are (or should be) described by a set of metadata elements commonly referred to as the Dublin Core metadata schema.17 However, Dublin Core is very limited in its abilities to describe complex properties of Web documents. Many different metadata standards have, therefore, been developed to describe data in more detail; a partial list can be found at Wikipedia.18 In the Learning Object Metadata schema it is, for example, possible to indicate the level of difficulty of an element, as well as the typical age range at which the object is targeted, and the typical amount of time a learner should spend on mastering the content, and so on.19 Such metadata standards are often described in an XML environment using RDF,20 that is, Resource Description Framework (see also Dolog & Nejdl, 2007). By means of metadata, it is therefore possible to describe the properties of any data element, at the macro level of a full document, but also at a much finer granularity, even down to the individual paragraph, sentence or even smaller. An existing metadata schema can be used, or a combination of metadata schemas or even a totally new metadata schema can be created, based upon the characteristics of the user profile referred to earlier. If data is then clearly defined through the metadata markup it is easy to match individual data elements to specific user profiles. 
For example, if a user profile specifies that the user needs detailed scientific information, the system will present the user only with information that has been marked up with the labels ‘detailed’ and ‘scientific’; if, on the other hand, the profile specifies that the user needs
only short definitions aimed at the lay person, only data with these attributes will be presented to the user. Complex markup of data by means of metadata on the Web is not common. It is, nevertheless, the ideal of the Semantic Web that such enriched data should exist on the Web. ‘The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing’ (www.w3.org/2001/sw). To the best of my knowledge, such complex metadata markup of data does not yet occur in e-dictionaries; it may, however, occur in a research environment in a limited fashion, as is evident from the contribution by Spohr in this volume (Spohr, 2011). In addition, to the best of my knowledge, the possibility of specifying (and changing) detailed user profiles and/or tracking user behaviour and adapting information presented to the user based on the profile and user tracking also does not exist in available e-dictionaries. Proper metadata markup of data in an e-dictionary and sophisticated user profiling go hand-in-hand: the one is only useful as a complement of the other. It is, therefore, not useful to profile a user extensively if this cannot have an influence on the data which is presented to the user; conversely, it is not useful to mark data in the dictionary up extensively if these characteristics of a data element are not used to present information selectively to the user, depending on his/her information needs. There is, therefore, much scope for research and experimental development work to see to what extent such features will make e-dictionaries more useful for users. 
Extensive metadata markup is, however, also required if an e-dictionary intends making use of external data sources as supplementary data for the dictionary, for example to refer to examples of real life usage of expressions in literature, newspapers and other corpora, by making use of open data sources, as discussed below.
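How metadata markup and a user profile complement each other can be illustrated with a small XML fragment. The element name `def` and the attributes `detail` and `register` are invented for this sketch and do not belong to Dublin Core, Learning Object Metadata or any other existing schema; only the Python standard-library XML parser is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are invented for this sketch.
ENTRY_XML = """
<entry lemma="photosynthesis">
  <def detail="short" register="lay">How plants make food from light.</def>
  <def detail="detailed" register="scientific">Conversion of light energy
  into chemical energy stored in carbohydrates.</def>
</entry>
"""


def select(xml_text: str, **wanted: str) -> list:
    """Return the text of every <def> whose attributes match all wanted labels."""
    root = ET.fromstring(xml_text.strip())
    return [
        " ".join(d.text.split())             # normalize internal line breaks
        for d in root.iter("def")
        if all(d.get(name) == value for name, value in wanted.items())
    ]


print(select(ENTRY_XML, detail="detailed", register="scientific"))
print(select(ENTRY_XML, register="lay"))
```

The keyword arguments play the role of the user profile: without labels on the data they would have nothing to match, and without a profile the labels would never be consulted.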
4.4.6. Linked Open Knowledge (Open Content / Open Data)
It is claimed that ‘[a] piece of knowledge is open if you are free to use, reuse and redistribute it – subject only, at most, to the requirement to attribute and share-alike’;21 in this context knowledge includes creative content such as ‘music, films, books’; ‘[d]ata be it scientific, historical, geographic or otherwise’, as well as ‘[g]overnment and other administrative information’. The term, therefore, implies that the knowledge is in the public domain (i.e. without copyright restrictions). There are currently many projects that
are aimed at making open data available. The Open Data Movement22 ‘aims at making data freely available to everyone’ ‘by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources’; see also Bizer et al. 2008. Figure 4.9 provides a diagrammatic representation of the datasets that have been published and interlinked. And ‘[c]ollectively, the data sets consist of over 38 billion RDF triples [ . . . ], which are interlinked by around 389 million RDF links [ . . . ]’ (August 2010).23 For the purpose of this chapter it is not necessary to describe the technical detail of how RDF triples and links work; it suffices to realize that there is a vast amount of data available that can be reused, including literary works such as books available in Project Gutenberg,24 U.S. government data available at the U.S. government portal,25 U.S. census data,26 geographical data,27 scientific data in many semantic repositories28 and so on. The principles of open linked data enable the developer to link to any individual datum since it is uniquely identifiable. Much work has been done on the role of corpora in dictionaries (see, for example, Prinsloo 2009), and opinions regarding the value of such corpora differ widely. Lexicographers should, however, consider exploring the vast stores of linked open data to provide access to additional information to satisfy especially cognitive needs of users. Currently, e-dictionaries provide links only to so-called outer texts and to manually selected external examples; examples of both principles can be found in a number of the dictionaries by Ordbogen.com,29 as well as in many other free online dictionaries. This, however, does not make systematic and/or automatic use of reusable data and requires a tremendous input in terms of time and effort from the lexicographer. 
One example of a dictionary that does provide the option to link to a huge external database of examples is the Base lexicale du français (BLF).30 After getting examples of the meaning of a word, the user has the option of linking to various corpora, including a set of documents of the European Parliament and Wikipedia (a selection of documents from the Corpuseye corpora31); see Figure 4.10 for an example. These examples are retrieved automatically by the BLF, and their selection does not require any input from the lexicographer (except, obviously, specifying the corpora to be searched). Linking to external text corpora requires more than metadata markup of the corpus. Inflected forms also need to be normalized to a root or lemma form for matching; for example, inflected forms such as ‘goes’ and ‘went’ need to be normalized to ‘go’ by means of natural language parsers. In highly inflected languages this becomes a huge additional task, although one that falls outside the domain of the lexicographer and is part of the work of a computational linguist. The Corpuseye corpora used in the BLF are automatically annotated (combined with manual linguistic revision) for part of speech and morphology.32 Markup of corpora becomes more complicated the more inflectional a language is. In the classics the most well-known project in this environment is the Perseus digital library,33 which provides support for automatic lemmatization and morphological analysis of Greek, Latin and Arabic texts.34

Figure 4.9 Linked datasets from the W3C SWEO Linking Open Data community project, as of July 2009. Source: www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009–07-14_colored.png.

Figure 4.10 An example of automated selection of corpus examples from the Base lexicale du français. Source: http://ilt.kuleuven.be/blf.
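The normalization step described above can be illustrated with a minimal Python sketch. It assumes a tiny hand-written lemma table (`LEMMA_TABLE`) and a three-sentence toy corpus, both hypothetical; a real system such as the ones mentioned here would rely on a morphological analyser rather than a lookup table.

```python
from __future__ import annotations

# Hypothetical lemma table standing in for a real morphological analyser.
LEMMA_TABLE = {
    "goes": "go",
    "went": "go",
    "going": "go",
    "gone": "go",
}


def lemmatize(token: str) -> str:
    """Return the lemma of a token, falling back to the lowercased token."""
    lowered = token.lower()
    return LEMMA_TABLE.get(lowered, lowered)


def find_examples(headword: str, sentences: list[str]) -> list[str]:
    """Return the sentences that contain any inflected form of the headword."""
    hits = []
    for sentence in sentences:
        # Crude tokenization: split on whitespace, strip punctuation.
        tokens = [t.strip(".,;:!?\"'()") for t in sentence.split()]
        if any(lemmatize(t) == headword for t in tokens):
            hits.append(sentence)
    return hits


# Toy corpus, invented for illustration only.
CORPUS = [
    "She goes to the market every day.",
    "They went home early.",
    "The market was closed.",
]

examples = find_examples("go", CORPUS)
```

Searching for the lemma rather than the surface form is what allows a single headword to retrieve sentences containing ‘goes’ and ‘went’; the more inflected the language, the larger and more complex the analysis step becomes.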
4.4.7. Recommender Systems
A system can recommend that the user access certain information or perform certain actions, based on the user’s current information behaviour or his/her user profile. In some cases, recommender systems are taken to a further level of sophistication in that recommendations are made on the basis of multiple user profiles. In this case, the system links the user to a group of users with similar interests and assumes that the specific user will share interests with other users of the group. (See, for example, He et al., 2007, and Von Reischach, Michahelles & Schmid, 2009.)

Recommender systems are very common on e-commerce sites. They are primarily used to market additional products based on a user’s prior information behaviour, for example, by providing details of items similar to the one the user has searched for. In Amazon,35 there is a sophisticated recommender system in which a user’s specific search is matched against the searches and buying patterns of other customers who bought the same product. The system then not only recommends a similar book but, as in Figure 4.11, also indicates that the book the user searched for was often bought together with another book (and then offers a special deal on buying both books); in addition, it provides a list of further books bought by people who bought the first book. A simple recommender system also occurs in Google Scholar36 by means of the link to ‘Related items’ that appears after each search result.

Figure 4.11 An example of recommendations in Amazon.

To the best of my knowledge, no recommender systems currently occur in e-dictionaries. The usefulness of such features is also not directly apparent: if the user is simply looking for the meaning of word A, a reference to the meanings of words B, C and D would not be very helpful. However, references to synonyms and antonyms in a dictionary could be seen as a type of recommendation; this feature occurs quite commonly in many dictionaries at the level of single words, as well as for fixed expressions and idioms. An example of this can be found in the Ordbogen over faste vendinger: if one searches for the idiom spille andenviolin (‘to play second violin’), the meaning (and other details, depending on the type of search executed) is provided, as well as a reference and link to the antonym, spille førsteviolin (‘play first violin’).

Recommendations can be very useful in a text production dictionary. If a user looks at the meaning of a specific word or looks at examples of how the word is used in context, recommendations for alternative words, idioms or examples could enable the user to make a better choice. For example, the dictionary could indicate that the specific word or idiom the user is looking at is not neutral in terms of style and that the user should consider a more neutral alternative (depending, obviously, on the style of the document the user is working on, which could already be specified in the user profile). In the case of an idiom, the dictionary could specify that the idiom is regional and that the user should consider a more general alternative (or vice versa, depending on the requirements specified in the user profile). Recommendations based on a collaborative profile of a workgroup could also be very useful: the system can, for example, recommend a specific technical term for a user based on selections of other users within the group.
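The two kinds of recommendation suggested above for a text production dictionary can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of any existing dictionary: the lexicon, its style labels, the function names and the workgroup selections are all hypothetical.

```python
from collections import Counter

# Hypothetical lexicon: each entry carries a style label and related entries.
LEXICON = {
    "kick the bucket": {"style": "informal", "alternatives": ["die"]},
    "die": {"style": "neutral", "alternatives": ["kick the bucket"]},
}


def suggest_alternative(entry, profile_style, lexicon):
    """Suggest a related entry whose style matches the style in the user profile.

    Returns None when the entry already matches or no suitable alternative exists.
    """
    if lexicon[entry]["style"] == profile_style:
        return None
    for alternative in lexicon[entry]["alternatives"]:
        if lexicon[alternative]["style"] == profile_style:
            return alternative
    return None


def recommend_term(candidates, group_selections):
    """Recommend the candidate term chosen most often by other workgroup members."""
    counts = Counter(term for term in group_selections if term in candidates)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# A user writing a neutral document looks up an informal idiom.
style_hint = suggest_alternative("kick the bucket", "neutral", LEXICON)

# Earlier selections by colleagues in the same (hypothetical) workgroup.
group_hint = recommend_term({"lemma", "headword"}, ["headword", "lemma", "headword"])
```

The first function uses only the individual profile, while the second uses the collaborative profile of a group; a real system would presumably combine both and weigh them against the user’s expressed information need.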
4.4.8. Annotation Systems
In the Web 2.0 environment, users of a system are not only the users of the information but are often also creators of new information (see http://en.wikipedia.org/wiki/Web_2.0). Users are, therefore, encouraged to share their thoughts and views with other users. This is a standard feature of blogs. Currently, this quite often occurs on news sites such as the CNN (http://edition.cnn.com) and Skynews (http://news.sky.com/skynews) websites. On both sites, users can recommend stories by clicking a tick box. Skynews allows readers to comment on stories and blog posts. The WunderPhotos sub-site of the WeatherUnderground website37 allows users to rate photographs (out of ten) and to add comments; the number of ratings and the average score of the photograph are displayed. YouTube (www.youtube.com) has similar features; videos can, however, only be rated as ‘good’ or ‘bad’. In these cases, the annotations are all public annotations, that is, visible to everyone. As such, the facility is open to abuse, and comments are often moderated, either by a moderator or by users reporting a comment as ‘inappropriate’. Annotations can, in addition to being public, be private (available only to the author of the annotation) or shared only with a predefined group of people (such as a workgroup or a group of friends). In all preceding cases, the annotation does not change the original in any way and the integrity of the original is maintained. In the wiki environment, Wikipedia and Wiktionary being prime examples, users are allowed to register as editors and change the original. A distinction should therefore be made between annotating an original and editing or changing an original. The latter requires much more rigorous security and quality control, as well as the possibility of reverting to earlier versions. Annotations need not be text-based only; users can also annotate items by adding multimedia content, as in Google Earth38 and Google Maps.39 Figure 4.12 provides an example of a Google Earth view of Segovia, with a photograph of the cathedral added by a user; all the small blue squares represent photographs.
Since the photographs are stored in another database, Panoramio,40 where they can be uploaded and viewed, this example can also be seen as a mashup. In some cases, systems do fairly sophisticated analysis of user comments and ratings and provide the results of the analysis as a summary for the user, in addition to the original comments. This occurs fairly commonly on accommodation booking sites. For example, in Booking.com,41 an online hotel reservation site, users who have stayed in a particular hotel are invited to rate the hotel in terms of various categories (staff, services, value for money, etc.) and to provide comments. These ratings are then aggregated into a summary table that gives the average score (based on the number of reviews) for the particular service. The site, in addition, provides a further analysis, viz. a breakdown of the ratings by various categories of visitors, for example ‘solo travellers’, ‘young couples’, ‘mature couples’ and so on, as is illustrated in Figure 4.13; the full comments of the reviewers are also available. To the best of my knowledge, no true annotation features are currently available in e-dictionaries. The only related feature is the possibility of providing feedback to the editor(s) by means of e-mail or an online form. A user can therefore request further clarification on a word from the editor(s) or add interesting facts that the editor(s) can incorporate in the dictionary, either immediately (if the editor works on the ‘live’ version of the dictionary) or for a future update.

Figure 4.12 An example of a photograph annotation in Google Earth. Source: http://earth.google.com.

Figure 4.13 Hotel evaluation analysis at Booking.com.
The ability to make private or workgroup annotations in an e-dictionary would be very useful. Users could thereby personalize specific entries in terms of their own use of specific words. For example, members of a technical workgroup could add comments on technical terms that are commonly used by the workgroup, to indicate specific usage or additional examples. By means of public annotations, the currency and completeness of an e-dictionary could be enhanced. Users could add detail about a word, expression or idiom that the lexicographer(s) did not include or did not know about, for example the exact meaning or use of an idiom in a regional context. Such annotations could be shown as user annotations and be included in the main body of the dictionary when a new edition appears, or the lexicographer could immediately incorporate the detail in the dictionary. In all cases, strict quality control is obviously required.
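The distinction between private, workgroup and public annotations can be made concrete with a short Python sketch. The `Annotation` class, the visibility values and the sample notes are assumptions made for illustration; note that the annotations are stored alongside the entry and never modify it, in line with the distinction drawn earlier between annotating and editing an original.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Annotation:
    entry: str                   # headword the note is attached to
    author: str
    text: str
    visibility: str              # "private", "group" or "public"
    group: Optional[str] = None  # only meaningful for "group" notes


def visible_annotations(annotations: List[Annotation], entry: str,
                        user: str, user_groups: Set[str]) -> List[Annotation]:
    """Return the annotations on `entry` that `user` is allowed to see."""
    visible = []
    for note in annotations:
        if note.entry != entry:
            continue
        if note.visibility == "public":
            visible.append(note)
        elif note.visibility == "private" and note.author == user:
            visible.append(note)
        elif note.visibility == "group" and note.group in user_groups:
            visible.append(note)
    return visible


# Hypothetical notes on a single entry.
NOTES = [
    Annotation("lemma", "anna", "Check regional usage.", "private"),
    Annotation("lemma", "ben", "Preferred term in our team.", "group", "terminology"),
    Annotation("lemma", "cara", "Additional example needed.", "public"),
]

seen_by_anna = visible_annotations(NOTES, "lemma", "anna", {"terminology"})
seen_by_dave = visible_annotations(NOTES, "lemma", "dave", set())
```

Moderation and quality control, which the text stresses, would sit on top of such a filter, for instance as an extra approval flag on public notes.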
4.5. Conclusion
The preceding section discusses some innovative technologies and techniques that can be used in e-dictionaries to customize information access in response to user needs. In each case, the technology is discussed briefly and examples from current products are used to illustrate the technologies. The use of these technologies, or (sometimes vaguely) similar technologies, in e-dictionaries is also discussed, where such examples exist. From these discussions, it is clear that the technologies are used very unevenly in e-dictionaries, and even those that are used are not used equally effectively in all dictionaries. Searching and navigating are the two technologies that are used most commonly. All e-dictionaries have a simple search interface where the user can input at least one word. Most e-dictionaries also have an advanced search interface, but the level of sophistication differs widely in different e-dictionaries. However, most of the other technologies and techniques that are discussed are either used in an extremely rudimentary way or not at all. Sven Tarp commented in 2007 that lexicography ‘should promptly and totally adapt itself to the new technologies and it should, among others, explore the possibilities of performing user needs adapted searches on the internet combining traditional static data with dynamic data made available through the internet’ (Tarp, 2007), a sentiment reiterated by Tarp in his chapter in this volume (Tarp, 2011). As indicated in the preceding discussions, these technologies are available; they have achieved a level of maturity that makes them eminently suitable for use in all e-environments, including e-dictionaries. If lexicographers were to embrace these technologies, it would be possible to provide customized and customizable information tools that could satisfy the user needs of all individual users. It would therefore be possible to create information tools that would address the information needs not only of the
‘average’ user, but also of a specific user in ‘one out of a thousand consultations’, by providing ‘dictionaries capable of meeting all the users’ needs in specific types of situations’, as referred to by Tarp in the quotations in the introductory paragraphs of this chapter (Tarp, 2009a: 292; Tarp, 2009b). In such a customizable e-information tool,
• the user will be able to:
  ◦ set up a complex profile indicating his/her preferences;
  ◦ change the profile based on specific information needs for any given situation; and
  ◦ drill down to the required level of complexity and/or detail in any given situation;
• the system will:
  ◦ further adapt the profile based on the user’s information behaviour; and
  ◦ present information to the user based on the characteristics of such a profile;
• the database will require that:
  ◦ the data be marked up through a complex metadata schema, to enable matching the characteristics of the user’s profile with the characteristics of the data;
  ◦ there be links to external data sources (linking open knowledge)
    – either through direct linking by the lexicographer; or
    – by on-the-fly searching of such external data sources
    to enable the user to get additional information on demand;
• the system will also be able to:
  ◦ make recommendations to the user based on his/her profile and expressed information need; and
  ◦ allow the user to make private, group or public annotations to the database to
    – enhance the user’s future use of the data; and
    – help the lexicographer to keep the database more current and up to date.

Obviously, a user will not be willing to set up or adapt his/her profile for each and every individual information need and consultation of the e-information source. The e-information tool should, therefore, also contain multiple predefined views on the database, each in effect a monofunctional e-information tool that would provide the user with a specified sub-set of the information in the database in accordance with the lexicographer’s analysis of the ‘average’ information need of the ‘average’ user (typically the 80 per cent of users referred to in the earlier quotation from Tarp, 2009a: 292). These monofunctional tools can typically be aimed at the main types of information needs identified in the publications by Bergenholtz, Tarp and others (as listed in the introduction to this chapter), viz. communicative needs (divided into text reception and text production needs), cognitive, operative and interpretative needs. However, the technologies will enable the user to create his/her own information tools to address his/her specific information needs, typically information at the cognitive level to extend, enhance or explain data presented, or very narrowly specified needs in the other three function categories. Such technologies should, therefore, have a huge impact on the praxis of modern e-lexicography. The question that has not been answered is, ‘To what extent will the use of these (and other) technologies and techniques influence the theory of e-lexicography and any proposed models or frameworks for e-lexicography?’ Currently, for example, the dictionaries produced by the Centre for Lexicography (Centlex) at the Århus School of Business and commercialized by Ordbogen.com (www.ordbogen.com) have a single database but multiple ‘views’ on the database, each view being a monofunctional tool aimed at a specified communicative, cognitive or other need and each described as a monofunctional dictionary. The Music dictionary (Musikordbogen), therefore, in fact consists of three monofunctional dictionaries, the Dictionary of Fixed Expressions (Ordbogen over faste vendinger) consists of four monofunctional dictionaries, and so on. In an information tool as described above, there can also be predefined monofunctional views on the database, that is, the equivalent of monofunctional dictionaries. However, if the user can customize his/her view of the database at a very fine level of granularity, it would be theoretically possible to create an infinite number of views of the database and therefore an infinite number of monofunctional dictionaries. Is this acceptable within the current theoretical framework?
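The matching of a user profile against metadata-marked data, on which such a tool depends, might be sketched as follows in Python. The metadata keys (`functions`, `level`), the profile fields and the three sample data items are hypothetical and far simpler than the ‘complex metadata schema’ the text calls for.

```python
# Hypothetical metadata: each data item records which functions it serves
# and how deep into the entry (level of detail) it sits.
ITEMS = [
    {"field": "gloss", "functions": {"reception", "production"}, "level": 1},
    {"field": "grammar", "functions": {"production"}, "level": 2},
    {"field": "etymology", "functions": {"cognitive"}, "level": 3},
]


def select_items(items, profile):
    """Keep the fields tagged for the profile's function and within its detail level."""
    return [
        item["field"]
        for item in items
        if profile["function"] in item["functions"]
        and item["level"] <= profile["max_level"]
    ]


# A predefined view: text production at the shallowest level of detail.
default_view = select_items(ITEMS, {"function": "production", "max_level": 1})

# The same user drilling down one level in a specific consultation.
detailed_view = select_items(ITEMS, {"function": "production", "max_level": 2})
```

In this reading a predefined monofunctional view is simply a stored profile, and a customized view is a profile the user has changed, which is one way of seeing why the number of possible views grows so quickly.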
Furthermore, a user could also include in his/her selection, for example, data that would typically answer a text production information need as well as data that would typically be part of a cognitive information need. How would this influence the current theoretical framework? Equally, if the user can specify that he/she needs only a small subset of the information typically presented for a text production need, is the specific view still a monofunctional view on the database? Would it therefore be theoretically possible that one comprehensive database with external links and linked open knowledge can support an unlimited number of monofunctional dictionaries? These technologies therefore raise many questions at both the practical and the theoretical level of e-dictionaries. At the practical level, questions to be answered include:
• How are these technologies to be programmed to ensure optimal access to information?
• How should the predefined monofunctional views of the database be determined?
• How should user profiling be enabled: through form fill-in, through user tracking, or both?
• What categories/elements should be used to determine a user profile?
• What is the optimal number of categories/elements according to which a user profile should be compiled?
• How should the data be marked up to enable a match between the user profile and the data needed?
• To what extent is open linked knowledge required to satisfy user needs?
• How is this linking to open linked knowledge to be optimized?
• How should recommendations be facilitated and what types of recommendations are useful?
• How should annotations be facilitated and what types of annotations are useful?
• What are the usability criteria for such an e-dictionary or e-information tool and how is usability testing to be performed?
• To what extent does the e-dictionary or e-information tool satisfy the diverse information needs of a full spectrum of possible users?

The only way these questions can be answered is by actually developing an e-dictionary or e-information tool that includes all these features and then doing proper usability testing. The results of this practical research should inform the theoretical research on the function theory of e-lexicography. The theoretical research should, however, in the meantime go ahead to inform the practical research. This simply means that, in the e-lexicography environment, theory informs praxis and praxis informs theory; the one cannot exist or advance without the other. This is very well summarized in a favourite remark of Henning Bergenholtz, ‘Nothing is more practical than a good theory’, used as the subtitle of the Introduction to the volume on lexicography published in his honour (Nielsen & Tarp, 2009a).
Notes
1. (http://en.wikipedia.org/wiki/Mashup_%28web_application_hybrid%29)
2. (www.businessdictionary.com/definition/ {searching.html; browsing.html; surfing.html})
3. (www.google.com)
4. (www.bing.com)
5. (http://en.wiktionary.org)
6. (www.websters-online-dictionary.org)
7. (www.facebook.com)
8. (www.twitter.com)
9. (www.linkedin.com)
10. (http://scholar.google.com)
11. (www.ebscohost.com/)
12. (www.sciencedirect.com/)
13. (www.ordbogen.com/)
14. (http://ilt.kuleuven.be/blf/)
15. (http://maps.google.com)
16. (http://earth.google.com)
17. (http://dublincore.org)
18. (http://en.wikipedia.org/wiki/Metadata_standards)
19. (http://ltsc.ieee.org/wg12/files/LOM_1484_12_1_v1_Final_Draft.pdf)
20. (www.w3.org/standards/xml, www.w3.org/RDF)
21. (www.opendefinition.org/)
22. (http://esw.w3.org/SweoIG/TaskForces/CommunityProjects/LinkingOpenData)
23. (http://esw.w3.org/SweoIG/TaskForces/CommunityProjects/LinkingOpenData#dbpedia-lod-.cloud)
24. (www.gutenberg.org)
25. (www.data.gov/semantic/index)
26. (www.rdfabout.com/demo/census/linked)
27. (http://linkedgeodata.org/About)
28. (http://eprints.rkbexplorer.com/)
29. (www.ordbogen.com)
30. (http://ilt.kuleuven.be/blf)
31. (http://corp.hum.sdu.dk)
32. (http://beta.visl.sdu.dk/corpus_linguistics.html)
33. (www.perseus.tufts.edu/hopper/)
34. (www.perseus.tufts.edu/hopper/opensource)
35. (www.amazon.com)
36. (http://scholar.google.com)
37. (www.wunderground.com/wximage)
38. (http://earth.google.com)
39. (http://maps.google.com)
40. (www.panoramio.com)
41. (www.booking.com)
Chapter 5
A Multi-Layer Architecture for ‘Pluri-Monofunctional’ Dictionaries
Dennis Spohr1
5.1. Introduction
The importance of user needs and the desire to develop lexicographic tools that are capable of adapting to different users and situations have been highlighted throughout this volume (see e.g. the contributions of Theo Bothma and Sven Tarp, this volume). In terms of function theory, the main idea behind this refers to the question of how lexicographic data can be modelled in such a way that a lexicographic tool is capable of satisfying the different types of needs of different types of users in different types of situations. As Tarp (this volume) points out, however, this account largely neglects the needs of the individual user, in his or her specific consultation situation. Therefore, what is needed is not only a lexicographic tool that is capable of dealing with types of users and situations, but one that provides the necessary mechanisms for an individualization of dictionary content, in terms of customizing the views that an individual user is given on the lexicographic data. From a theoretical perspective, some of the key ideas for modelling such tools have been introduced in Gouws (2006), who defined the concept of a Mutterwörterbuch (mother dictionary) as an abstract lexical database in which types of indications and lexical entities (i.e. both micro- and macro-structural items) are marked for inclusion in particular situations, in order to derive different monofunctional dictionaries. From a practical point of view, there are at least two possible technical realizations for this concept. The first one is in fact closely related to the initial theoretical conception, in that it operates directly on the lexical database itself.
Here, a set of export routines or filters is created which translate subsets of the data from a lexical database into some other format (possibly several different databases) that serves as basis for the dictionaries that users have access to and whose content they are presented with (see Andersen & Almind as well as Nielsen & Almind, both in this volume). These
dictionaries can be considered static in the sense that they need to be recompiled in case, for example, further data are added to the initial database. The second option is more dynamic in the sense that the lexical database is in fact the one that is accessed by the users, and the filters are defined on a separate layer between the user and the database, regulating access and presentation of the lexicographic content. While both of these approaches are perfectly valid for generating function-based views (which may appear to a user as different ‘dictionaries’), it seems that only the latter one brings with it the infrastructure that is necessary in order to further allow for active and interactive individualization, as discussed by Tarp (this volume). To understand this, consider a user who wishes to generally restrict the information that is presented to him or her in a dictionary entry, for example to information on orthography and part of speech. While in the second setting the user would define an additional filter like the one used for the different monofunctional dictionaries, the first setting seems to require a rather heterogeneous procedure in order to achieve the same result.2 Taking these considerations as point of departure, this article presents an approach to dealing with the satisfaction of user needs by means of a lexical resource model which is pluri-monofunctional in that it not only serves a single lexicographic function, but further allows for the dynamic extraction of different monofunctional dictionaries, that is, dictionaries serving exactly one lexicographic function. The different components in this architecture are modelled in the graph-based Resource Description Framework (RDF), by means of its standardized vocabularies RDFS and OWL. In addition to highlighting the benefits that these so-called Semantic Web formalisms bring with them, we outline how the architecture presented here can be extended to meet the needs of individual users.
As a result, this article contributes to ongoing research on the individualization of user need satisfaction that will most likely be a predominant topic in scientific publications on lexicography in the near future. In the following sections, we will first introduce the concept of ‘pluri-monofunctionality’ in relation to monofunctionality and multifunctionality. The subsequent sections will briefly introduce the main features of the formalisms used, and highlight their suitability for the definition of lexical databases, as opposed to relational database schemas or custom XML formats. Then we will discuss various aspects of the proposed architecture, and show how it has been implemented in a prototype of a lexicographic tool.
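The second, dynamic option discussed above, in which filters sit on a separate layer between the user and a single lexical database, can be illustrated with a minimal Python sketch. It is not the RDF-based implementation this chapter goes on to describe: the dictionary-based database, the filter sets and the function names are assumptions chosen for brevity, and the example sentence is a placeholder.

```python
# A single lexical database; the entry content is illustrative only.
DATABASE = {
    "Eindruck": {
        "orthography": "Eindruck",
        "part_of_speech": "noun",
        "gender": "masculine",
        "example": "(placeholder example sentence)",
    },
}

# Hypothetical function filters: one set of permitted indications per function.
FUNCTION_FILTERS = {
    "text_production": {"orthography", "part_of_speech", "gender", "example"},
    "text_reception": {"orthography", "part_of_speech"},
}


def lookup(lemma, function, user_filter=None):
    """Return the entry for `lemma` as seen through the function filter,
    optionally narrowed further by an individual user's own filter."""
    allowed = FUNCTION_FILTERS[function]
    if user_filter is not None:
        # The user filter is of the same kind as the function filter,
        # so the two compose by simple intersection.
        allowed = allowed & user_filter
    entry = DATABASE[lemma]
    return {name: value for name, value in entry.items() if name in allowed}


production_entry = lookup("Eindruck", "text_production")
restricted_entry = lookup("Eindruck", "text_production",
                          user_filter={"orthography", "part_of_speech"})
```

Because the user’s restriction is expressed in the same form as the function filter, no recompilation of a derived dictionary is needed, which is the homogeneity argument made above in favour of the second option.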
5.2. Background and Definitions
As the previous section suggests, this work is based within the framework of function theory (see e.g. Bergenholtz & Tarp, 2002), which is so prevalent in this volume that it need not be introduced further here. We will simply reproduce
the definition of a lexicographic function as given by Tarp (2008a), according to which a lexicographic function refers to the ‘satisfaction of the specific types of lexicographically relevant need that may arise in a specific type of potential user in a specific type of extra-lexicographical situation’ (Tarp, 2008a: 81). This definition assumes three fundamental concepts, namely the types of need, the specific type of potential user, and the specific type of situation. For types of need, Tarp distinguishes between primary and secondary needs, where primary needs are those that make a potential dictionary user become an actual one, while secondary ones are those that do not arise until the actual dictionary consultation takes place. A specific type of user is characterized by various factors, such as mother tongue, language proficiency or general knowledge of the subject field that is covered by the lexicographic tool, which in the context of language dictionaries refers, for example, to linguistic knowledge. Finally, types of situation cover, for example, communicative situations, such as a user in a text-productive or text-receptive situation, or cognitive situations, in which a user wishes to learn more about a given topic. In addition to this, it is important to distinguish between those situations where the reception or production occurs in the mother tongue and those where it occurs in a foreign language, as the needs that are associated with these situations are different. Based on the above definition of a lexicographic function, it is possible to define a monofunctional lexicographic tool as one whose genuine purpose is to serve the needs of exactly one combination of user type and situation type, for example, a text-productive situation in German for a user with L1 English and fair linguistic background. In contrast to this, a multifunctional lexicographic tool is one that is capable of serving several combinations of user and situation types.
However, in traditional lexicography the term ‘multifunctional’ has connotations of a tool that intends to satisfy the needs associated with different lexicographic functions by packing all relevant (and irrelevant) information items into a single dictionary entry. As function-theoretic scholars like Henning Bergenholtz or Sven Tarp put it, this commonly leads to information stress – or even ‘information death’ – by overloading the user with information, which is shown by Heid (this volume) in the context of usability studies of dictionary interfaces. It is, therefore, important to emphasize that this is by no means the intended interpretation of multifunctionality in this context. Rather, each function can be specified individually – for example, in terms of the relevant lexicographic indications – and the dictionary entries that are presented to a user are dependent on these specifications. In order to emphasize this, we will refer to the work presented here as the implementation of a pluri-monofunctional lexicographic tool, in the sense that it is capable of deriving multiple monofunctional dictionaries. The desire for pluri-monofunctionality entails a number of requirements, many of which can be inferred directly from the definition of a lexicographic function as given above. In particular, the definition refers to the question as
to what is to be part of the consultation when and how. Here, ‘what’ refers to the types of need (i.e. what is required in order to satisfy the need), ‘when’ refers to the type of situation (i.e. when is a particular piece of information relevant) and ‘how’ refers to the characteristics of the user, namely how should a particular piece of information be offered to a particular (type of) user, both with regard to the level of granularity of description and the vocabulary used. For example, while it may be desired to display fine-grained descriptions using technical terms to expert users, it is generally more applicable to use coarser-grained descriptions and common-language terms for users who are untrained in the respective subject field. However, it is important to note that granularity does not entail that various ‘versions’ of the same piece of information should co-exist in the resource, as this would complicate maintenance processes such as data updating and integrity control drastically. It rather means that it should be possible to derive different granularities from a single representation. In the following section, we will present an elegant way of achieving this.3 Finally, the above questions must be addressed not only in the context of the presentation of lexical content in the form of dictionary entries, but also in the context of the access to the content, that is, when should a user have access to what and how. Thus, the lexical resource needs to allow for variable content access as well as variable content presentation. Again, there are at least two ways of approaching this. In the first of these, users determine the shape of the information they are presented with (i.e. the content presentation) by selecting a predefined search field in the search interface, for example, ‘what is the gender of Eindruck?’, and the result provides an answer to their specific need.
In case the user wishes to have additional information beyond this, he or she is directed to other (possibly external) sites that provide further information. The other option is to have a separate model that determines both the shape of the search interface (i.e. the content access) and the content presentation. Here, this model contains specifications of the needs of different user and situation types on the basis of previous analysis (such as e.g. the one presented in Tarp, 2008a), the user simply selects the corresponding situation type (e.g. ‘I want to understand a certain expression’ in the Musikordbogen of Bergenholtz and Bergenholtz, this volume) and a dictionary entry is generated based on the specifications in this model. In the approach presented here, we have opted for the second option. Strong support for this decision is provided by the results reported by Heid (this volume) in the context of the interface to the Base lexicale du français (BLF; Verlinde et al., 2010), which had initially opted for the first option and apparently asked too much of its users (see, however, Verlinde, this volume). In the following, we will discuss a number of the requirements resulting from the desired pluri-monofunctionality, as well as the implications they have on the components of the lexicographic tool.
5.3. Requirements and Solutions
As emphasized by Bothma (this volume), one of the main prerequisites for a tool capable of adapting to the needs of different users is a structured data model underlying the tool. In other words, there must be a formal means for defining the elements of a lexicographic database. In the case of a language dictionary, these are, for example, the lexical entities and the entities which are used to describe them, the relationships that may hold between these entities, as well as the conditions under which their descriptions are well-formed and complete. As trivial as it may seem, it is crucial to see that these conditions are not the same for all entities. In a linguistic resource, it may, for example, be necessary to specify that a compound word consists of certain other free morphemes, whereas the opposite is the case, for example, for morphological roots. In order to make sure that entities of the former type can make use of such a relation whereas those of the latter type cannot, it is necessary to represent this type information in the lexical resource model both for entities and relations – that is, which types of relations are available to which types of entities, and which entities they relate to. In terms of the aforementioned formalisms RDFS and OWL, this can be achieved by specifying for each type of relation a domain (i.e. the set of entities for which this relation has been defined) and a range (i.e. the set of entities – or simple values like strings – to which this relation links the entities in its domain). In slightly more formal terms, each relation typically specifies one or more classes in its domain and one or more classes (or a data type) in its range, the instances of which are then the items that are actually linked by the relation.
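The role of domain and range can be made concrete with a minimal sketch in plain Python. It is not the chapter's OWL model: the class `Root`, the exact placement of classes in the hierarchy and the helper names `is_a` and `may_relate` are assumptions made for illustration. Note also that the sketch uses domain and range as an availability check (which relation may be used with which type of entity), whereas an RDFS reasoner would use them to infer the types of the linked entities.

```python
# Hypothetical class hierarchy (child -> parent); names loosely follow the chapter,
# "Root" and the exact placement of each class are assumptions.
SUBCLASS_OF = {
    "Compound": "MorphologicallyComplexFreeUnit",
    "MorphologicallyComplexFreeUnit": "FreeUnit",
    "MorphologicallySimpleFreeUnit": "FreeUnit",
    "Root": "BoundUnit",
    "FreeUnit": "Lexeme",
    "BoundUnit": "Lexeme",
}

# Each relation declares the class it is defined for (domain)
# and the class of the entities it links to (range).
RELATIONS = {
    "has-component": {
        "domain": "MorphologicallyComplexFreeUnit",
        "range": "FreeUnit",
    },
}


def is_a(cls, ancestor):
    """Return True if cls is ancestor or a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def may_relate(relation, subject_cls, object_cls):
    """Check whether a statement respects the relation's domain and range."""
    spec = RELATIONS[relation]
    return is_a(subject_cls, spec["domain"]) and is_a(object_cls, spec["range"])


# A compound may have a free unit as component; a root may not use the relation.
print(may_relate("has-component", "Compound", "MorphologicallySimpleFreeUnit"))
print(may_relate("has-component", "Root", "MorphologicallySimpleFreeUnit"))
```

Keeping the type information on the relation itself, rather than on individual entries, is what later allows a search interface to offer only those relations that make sense for the entity type at hand.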
For example, a relation like has-part-of-speech links instances of a class Lexeme – which includes, among others, simple words like ‘pressure’, compounds like ‘construction site’ and noun-verb collocations like ‘to exert pressure’ – to instances of a class PartOfSpeech, which has instances like noun, verb and adjective. This is, in fact, in direct opposition to traditional approaches in what has been labelled p-lexicography by Tarp (this volume), where the ‘type’ of an indication needed to be derived on the basis of typographical properties (e.g. italics) as well as the relative position in a dictionary article. However, the underlying model must meet further conditions in order to be able to offer the dictionary content to different users with varying granularities, which – as was mentioned above – should be derived on the basis of a single description, and not by means of reduplicating the same piece of information with different granularities. In order to achieve this, it is necessary to organize both the types of entities and the types of relations in hierarchical structures. For example, one could say that statements like ‘investigation is a nominalization of to investigate’ and ‘investigative is an adjectivization of to investigate’ are more fine-grained ways of expressing that ‘investigation and investigative are derivations of to investigate’. Thus, in a hierarchical system, one
could define the two relations (i.e. ‘is nominalization of’ and ‘is adjectivization of’) as sub-relations of ‘is derivation of’. However, while this would allow us to express the same piece of information with two different granularities, it is still not enough to derive the more general statement on the basis of the more specific statements. This is where RDF(S) and OWL come into play again. On the one hand, they provide the necessary vocabulary in order to express hierarchical structures. On the other hand, the fact that they have been standardized by the World Wide Web Consortium (W3C) means that there is a wide range of tools available which are capable of interpreting this vocabulary. Therefore, on the basis of the two specific statements above and the fact that the respective relations are subsumed under a more general relation, the more general statements are inferred automatically, without the need to reduplicate the information in the lexical database. As a result, the different granularities are derived from a single (most specific) statement.⁴ It is unclear how this could be achieved in a similar way in, for example, a relational database management system.
On the basis of such hierarchical structures, it is now in principle possible to express, for example, that the information on a certain level of granularity (e.g. at the level of ‘derivation’ in the example above) is relevant for particular non-expert users, while the more detailed version (e.g. ‘nominalization’) is relevant only for expert users. This could be done by specifying for each kind of information item in the lexical database in which functions it is relevant. However, this would mean that the set of possible dictionaries derived from the lexical database (i.e. the number of lexicographic functions that can be served by its dictionaries) is encoded directly in the database itself.
In other words, as soon as further functions are to be added, the underlying structure must be modified and extended. In view of a future development towards user individualization, this is a rather undesired configuration, as it would mean that individual users affect the structure of the underlying model. Instead, it is rather desired to have a modular architecture, in the sense that situation- and user-specific information is located on a layer that is separated from the lexicographic data in the database. This way, the addition of further functions only affects the layer in-between the user and the lexical database, not the database itself. In addition to the requirements discussed so far, which primarily refer to the general architecture of the tool and the formalism that is used to define it, there are further requirements which stem from the desire to be pluri-monofunctional as well, but which affect other components of the tool. For example, given that the tool is intended to be used by both beginners and expert users alike, it is necessary that easy access to the data is ensured for the former, without limiting the possibilities of deeper investigation and introspection for the latter. As a result, the interface to the lexicographic tool must allow for both simple and complex ways of access.
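The derivation example from this section can be sketched in a few lines of plain Python. This is only a stand-in for what an RDFS-aware tool does with sub-relation declarations; the relation identifiers and the functions `infer` and `view` are hypothetical names chosen for this illustration. Only the most specific statements are stored, and the choice of which granularity to show is passed in from outside the store.

```python
# Sub-relation hierarchy: specific relation -> more general relation.
SUB_RELATION_OF = {
    "is-nominalization-of": "is-derivation-of",
    "is-adjectivization-of": "is-derivation-of",
}

# Only the most specific statements are stored in the "database".
STORED = {
    ("investigation", "is-nominalization-of", "to investigate"),
    ("investigative", "is-adjectivization-of", "to investigate"),
}


def infer(statements):
    """Return the stored statements plus all those implied by the hierarchy."""
    inferred = set(statements)
    for subj, rel, obj in statements:
        parent = SUB_RELATION_OF.get(rel)
        while parent is not None:
            inferred.add((subj, parent, obj))
            parent = SUB_RELATION_OF.get(parent)
    return inferred


def view(statements, relations_shown):
    """Select the statements at the granularity requested by a caller."""
    return sorted(s for s in infer(statements) if s[1] in relations_shown)


# Coarse-grained view, e.g. for non-expert users.
print(view(STORED, {"is-derivation-of"}))
# Fine-grained view, e.g. for expert users.
print(view(STORED, {"is-nominalization-of", "is-adjectivization-of"}))
```

Because `relations_shown` is an argument rather than part of `STORED`, adding a further function only requires a new argument value, which mirrors the separation between the lexicographic data and the layer above it.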
As has been shown in the course of this section, the formalisms that have been defined within the so-called Semantic Web are able to provide solutions to some of the issues discussed. Due to the target audience of this volume, however, we skip technical details on these formalisms and instead turn directly to the individual components of the lexical resource that has been defined by means of these formalisms.⁵ In the following sections, we thus start by providing details on the rich hierarchies that have been defined as part of a lexicographic data model that, so to speak, ‘ontologizes’ the entities of the field covered by the lexicographic database – which is linguistics in the case of the resource described in this article. The subsequent sections then illustrate how the specifications of user needs relate to these entities, and how all components are accommodated in a multi-layer architecture.
5.4. Lexicographic Data Model and Object Language Data
The previous section has emphasized the need for hierarchical structures in order to allow for different granularities of description. In Spohr (2010), the author has presented a model for linguistic data inspired by recent proposals like the Lexical Markup Framework (LMF; ISO/FDIS 24613, 2008) and the General Ontology of Linguistic Description (GOLD; Farrar & Langendoen, 2003), which contains such hierarchies both for types of lexical entities and types of descriptive entities, that is, those entities which are used to describe the former. In the following, we will present and discuss a part of the hierarchy of lexeme types (Figure 5.1), as well as a part of the hierarchy of lexical relations that have been defined between these entities (Figure 5.2). As can be seen in Figure 5.1, the model distinguishes between several types of lexical entities, with free and bound units located at the highest level and more specific subtypes below each of them. As was mentioned above, these types can be assigned different properties and relations, some of which are shown in Figure 5.2. For example, there is a relation has-component-relation-to, with two direct sub-relations has-component and is-component-of. While it is reasonable to define the has-component-relation-to relation for all kinds of lexemes – since any lexeme can be in some sort of component relation to other lexemes – it makes more sense to restrict at least one of its more specific sub-relations to certain types. In particular, the relation has-component can be defined as being available only to those lexemes that have been classified as syntactically or morphologically complex, such as collocations or compounds. Similar hierarchies have been developed for simple descriptive entities like parts-of-speech or grammatical genders (so-called data categories; see e.g. ISO/FDIS 12620, 2009) and more complex descriptions like morpho-syntactic preferences or syntactic and semantic valence frames, in addition to the properties that are used to link the lexical instances to the instances of these classes.
Figure 5.1 Hierarchy of different types of lexemes (node labels: Lexeme, FreeUnit, BoundUnit, Clitic, SyntacticallySimpleFreeUnit, SyntacticallyComplexFreeUnit, MorphologicallySimpleFreeUnit, MorphologicallyComplexFreeUnit, Collocation, Idiom, Proverb, Compound, NounNounCompound, VerbNounCompound, Derived Unit, Nominalised Unit, Adjectivised Unit, Abbreviation, Acronym, Contraction, ...)
In addition to this, Figure 5.2 illustrates a further benefit of the hierarchical structure. As can be seen, the two sub-relations of has-component-relation-to have sub-relations themselves, such as has-morphological-component, which links a morphologically complex entity to its components, and is-collocation-of, which links collocations to their components. Due to the inheritance mechanism of RDFS, the more general relations subsume the more specific ones, and thus it is possible to make use of what has been called underspecification in Spohr and Heid (2006). For example, if the lexical database contains a noun-noun compound such as ‘construction site’, which has been defined as having the component ‘construction’ and the head ‘site’, then it can be extracted both by a specific query like ‘give me all noun-noun compounds that have the head “site” ’, and a general query like ‘give me all lexemes that contain “site” ’. An immediate benefit of this is that it is not necessary for users at the beginners’ level who wish to consult the lexicographic tool to have knowledge of the exact details, while it is still possible for expert users to query the tool in this detail, if this is desired. We will see later how this allows for the definition of a highly dynamic user interface.
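The two queries mentioned above can be sketched as follows in plain Python. The sketch assumes a relation `has-head` placed below `has-component`, a shortened class hierarchy and a function `query`; none of these are taken from the actual resource, and a real implementation would pose the query to an RDF store rather than iterate over a list.

```python
# Shortened, partly assumed hierarchies (child -> parent).
SUBCLASS_OF = {
    "NounNounCompound": "Compound",
    "Compound": "Lexeme",
    "Collocation": "Lexeme",
}
SUB_RELATION_OF = {
    "has-head": "has-component",  # assumed placement of the head relation
    "has-component": "has-component-relation-to",
}

# Lexical instances with their most specific type and relations.
TYPES = {
    "construction site": "NounNounCompound",
    "to exert pressure": "Collocation",
}
FACTS = [
    ("construction site", "has-component", "construction"),
    ("construction site", "has-head", "site"),
    ("to exert pressure", "has-component", "pressure"),
]


def ancestors(name, hierarchy):
    """Return name followed by all of its ancestors in the hierarchy."""
    chain = [name]
    while chain[-1] in hierarchy:
        chain.append(hierarchy[chain[-1]])
    return chain


def query(lexeme_type, relation, value):
    """Find lexemes of (a subtype of) lexeme_type that are linked to value
    by relation or by any of its sub-relations."""
    return sorted({
        subj
        for subj, rel, obj in FACTS
        if obj == value
        and relation in ancestors(rel, SUB_RELATION_OF)
        and lexeme_type in ancestors(TYPES[subj], SUBCLASS_OF)
    })


# Specific query: noun-noun compounds with the head "site".
print(query("NounNounCompound", "has-head", "site"))
# Underspecified query: any lexeme that contains "site" in some way.
print(query("Lexeme", "has-component-relation-to", "site"))
```

Both calls retrieve the same stored entry; the difference lies only in how much the person asking needs to know about the model.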
5.5. Specification of User Needs
With the main features of the model introduced, we will now turn to the specification of user needs in relation to entities defined therein. As was mentioned before, we emphasize the fact that these specifications are not included directly in the database but on a separate layer, in order to ensure modularity and, thus, extensibility. Returning to the initial thoughts on pluri-monofunctionality, the questions that must be answered in order to satisfy the specific (types of) needs of specific (types of) users in specific (types of) situations are: (i) ‘in which language and with which vocabulary should an indication be presented’; (ii) ‘what should be presented in which situation(s)’; and (iii) ‘which indications should be accessible⁶ for which users’. In addition to this, the characteristics of the user further determine how each of these is actually realized. For (i), this means that for users with little background knowledge in the respective field, indications should be presented using simple terms in the user’s mother tongue – or any other language that he or she desires – so that the information need does not remain unsatisfied merely because the tool is not capable of offering the information in an appropriate form. For example, in the case of the linguistic resource described here, it may be appropriate to present an indication like ‘masculine gender’ to German users who are linguistically untrained as Geschlecht: männlich, while for expert users it would be preferable to adhere to the technical terminology and present it as Genus: maskulin. For case (ii), one can make use of the very detailed analysis of Tarp (2008a), who specifies, for example, that in productive situations, users need information on synonyms and antonyms. In terms of the hierarchy of lexical relations shown in Figure 5.2, this means that the two relations has-synonymic-relation-to and has-antonymic-relation-to, as well as their four more specific sub-relations has-(quasi-)synonym and has-(quasi-)antonym, are relevant in a text-productive situation. However, users with very limited linguistic background knowledge might not be familiar with the distinction between synonymy and quasi-synonymy, and thus presenting them with a statement containing in particular the latter of these relations might puzzle users rather than satisfy their information need. Therefore, while these six relations are in general relevant in a text-productive situation, the four specific ones are not relevant if the user is one with little linguistic knowledge. Conversely, for expert linguists, one would specify that the two more general relations are irrelevant, while the four specific ones are those which probably meet their needs best. Again, it is important to remember that the internal representations are the same for all of them: only the most specific information has been stated in the lexical database, and the more general ones have been inferred by means of OWL’s inheritance mechanism.
For (iii), which refers to the access functionality rather than the presentation of dictionary entries, the situation is slightly different. Here, it rather seems that what must be accessible depends even more on the characteristics of the user than on a particular type of situation. However, it seems to be safe to assume that the indications which have been identified as being irrelevant for particular types of users with respect to the presentation are also irrelevant in the context of access. For example, in the case described above, it seems natural to assume that has-quasi-synonym is an indication that is relevant (or irrelevant) in the context of access for the same types of users as in the context of presentation.
Figure 5.2 Hierarchy of different types of lexical relations (relation labels: has-lexical-relation-to, has-lexical-semantic-relation-to, has-synonymic-relation-to, has-synonym, has-quasi-synonym, has-antonymic-relation-to, has-antonym, has-quasi-antonym, has-collocational-relation-to, has-collocation, is-collocation-of, has-component-relation-to, has-component, is-component-of, has-morphological-component, is-morphological-component-of, has-morphological-relation-to, has-proverbial-relation-to, has-idiomatic-relation-to, has-abbreviational-relation-to, ...)
As a result, what cases (ii) and (iii) above require from a representational point of view are properties which specify the ‘presentation status’ as well as the ‘access status’ of an indication with respect to a situation and a user type. In OWL, this can be achieved by simply creating the relations has-access-status and has-presentation-status, and specifying that they link an entity of the lexicographic data model to values like ‘primary’ or ‘secondary’ – in terms of the primary and secondary needs in Tarp (2008a) – or to ‘ignore’ in case an entity is irrelevant. For reasons which will be explained in the following section, it is advisable to keep situation-related status specifications separate from user-related ones, for example, in different files. In a case like the one above, where an indication is relevant in a certain situation but at the same time irrelevant for certain users – even though they are in the corresponding situation – it can be safely assumed that the values specified with respect to users override the ones that come from situation-related specifications. In a function-based approach, this basically means that user-related specifications reduce the set of indications that have been defined as relevant with respect to a particular situation and, thus, that ‘ignore’ overrides all other values. In view of further individualization, however, it is not the case that user-related specifications only reduce the set of indications.
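The override rule can be illustrated with a small sketch in plain Python, using the synonymy example from this section. The value ‘primary’ for the six relations, the default of ‘ignore’ for anything unspecified and the function names are assumptions made for this illustration; in the resource itself, the statuses are stated in OWL by means of has-presentation-status and has-access-status.

```python
GENERAL = ["has-synonymic-relation-to", "has-antonymic-relation-to"]
SPECIFIC = ["has-synonym", "has-quasi-synonym", "has-antonym", "has-quasi-antonym"]

# Situation-related specifications (kept separately, e.g. in one file).
# The status value "primary" is an assumed value for illustration.
SITUATION_STATUS = {
    "text-production": {rel: "primary" for rel in GENERAL + SPECIFIC},
}

# User-related specifications (kept separately, e.g. in another file).
USER_STATUS = {
    "non-expert": {rel: "ignore" for rel in SPECIFIC},
    "expert": {rel: "ignore" for rel in GENERAL},
}


def presentation_status(indication, situation, user):
    """A user-related value overrides the situation-related one.
    Indications mentioned in neither specification default to 'ignore'
    (an assumption of this sketch)."""
    user_value = USER_STATUS.get(user, {}).get(indication)
    if user_value is not None:
        return user_value
    return SITUATION_STATUS.get(situation, {}).get(indication, "ignore")


def relevant(situation, user):
    """List the indications that are not ignored for this combination."""
    indications = (
        SITUATION_STATUS.get(situation, {}).keys()
        | USER_STATUS.get(user, {}).keys()
    )
    return sorted(
        i for i in indications
        if presentation_status(i, situation, user) != "ignore"
    )


print(relevant("text-production", "non-expert"))  # the two general relations
print(relevant("text-production", "expert"))      # the four specific relations
```

Since a user-related value may also be ‘primary’ or ‘secondary’, the same lookup covers the individualization case in which a user adds indications rather than removing them.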
In addition to the status specifications, what is needed for case (i) above is a property that specifies labels in different languages and vocabularies. In order to be able to keep apart multiple specifications in one language, such as maskulin and männlich for German, as well as to be able to determine which one must be used when, these labels are stored in separate files: one file contains the labels in all supported languages for expert users, another contains the labels for beginners, and yet another those for intermediate – or any other (type of) – users. In terms of the Semantic Web formalisms mentioned above, each of these files receives its own namespace and can, thus, be identified easily.
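A minimal sketch of this label organization in plain Python is given below. Each dictionary stands for one label file (one per user type, holding all supported languages); the keys `gender` and `masculine`, the language code `de` and the function `render` are hypothetical, while the German labels are the ones used as examples in this chapter.

```python
# One "file" per user type, each holding labels for all supported languages.
# Keys and language codes are hypothetical; only German is filled in here.
EXPERT_LABELS = {
    "de": {"gender": "Genus", "masculine": "maskulin"},
}
BEGINNER_LABELS = {
    "de": {"gender": "Geschlecht", "masculine": "männlich"},
}

# Stand-in for the namespaces that identify each label file.
LABEL_FILES = {
    "expert": EXPERT_LABELS,
    "beginner": BEGINNER_LABELS,
}


def render(indication, value, user_type, language):
    """Wrap one stored indication in the vocabulary of a user type."""
    vocabulary = LABEL_FILES[user_type][language]
    return f"{vocabulary[indication]}: {vocabulary[value]}"


# The same stored fact, presented in two vocabularies.
print(render("gender", "masculine", "beginner", "de"))
print(render("gender", "masculine", "expert", "de"))
```

Adding labels for a further user type then amounts to registering one more entry in `LABEL_FILES`, without touching the stored indication.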
5.6. Multi-Layer Architecture
The previous paragraph has already introduced one of the main characteristics of the multi-layer architecture, namely the idea that information is distributed among different files. On the one hand, this modularity makes the architecture generally extensible for adding specifications for further situations. On the other hand, the fact that user-related specifications can be separated from situation-related ones means that as soon as a further situation is added to the existing ones, it is automatically available for all user types that have been defined. Likewise, in case a further user type is added, it is immediately available in all situations. As was mentioned above, user-related specifications should override situation-related ones, as there may be users who always want to be given information on the collocations that can be formed with a particular lexical item, even if they are in a situation where information on collocations is – as in text-receptive situations according to Tarp (2008a) – not necessary. Figure 5.3 summarizes the multi-layer architecture, with the lexicographic data model at the top, followed by the lexicographic data and the access and presentation model at the bottom. As is illustrated there, the data model contains classes and descriptive instances – that is, the types of the entities in the resource as well as possible values for the data categories – and the properties and relations used to connect them. On the lexicographic data layer, the types defined in the model are instantiated by actual lexical items, and are described by means of the properties and relations. Finally, the access and presentation layer defines which of the entities (classes, properties and instances alike) are relevant to which users in which situations. In the figure, this is illustrated by means of the horizontal line representing a kind of filter that lets through certain entities while holding back others.
In fact, this layer contains not only one but several such filters, each of which may let through or filter out different pieces of information, depending on the specific user and situation types (see Bothma, this volume). Finally, this layer specifies labels in different languages with varying degrees of expert terminology. In Figure 5.3, this is illustrated by means of ‘clouds’, which are meant to indicate that the actual nature of the information is hidden from the users; the information they get to see is not as it is actually represented in the database, but rather wrapped in a form that satisfies their needs in the most appropriate way.
Figure 5.3 Components of the multilayer architecture (layers: lexicographic data model, containing classes and descriptive instances as well as properties and relations; lexicographic data, containing lexical instances; access and presentation model)
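How the separation of situation-related and user-related specifications makes every situation available to every user type can be sketched as follows in plain Python. The status values, the user type names and the stored indications for Eindruck are illustrative assumptions, not data from the prototype; the point is only that the data layer is never modified when a profile is added.

```python
from itertools import product

# Lexicographic data layer: stored once, independent of any profile.
DATA = {
    "Eindruck": {
        "has-gender": "masculine",
        "has-collocation": "Eindruck erwecken",
    },
}

# Access and presentation layer: two independent sets of specifications
# (status values are assumed for illustration).
SITUATIONS = {
    "text-reception": {"has-gender": "primary", "has-collocation": "ignore"},
}
USERS = {
    "default": {},
    "wants-collocations": {"has-collocation": "secondary"},
}


def entry(lemma, situation, user):
    """Filter one lemma's data through a situation and a user profile."""
    status = {**SITUATIONS[situation], **USERS[user]}  # user values override
    return {
        indication: value
        for indication, value in DATA[lemma].items()
        if status.get(indication, "ignore") != "ignore"
    }


def all_views(lemma):
    """One monofunctional view per combination of situation and user type."""
    return {(s, u): entry(lemma, s, u) for s, u in product(SITUATIONS, USERS)}


print(all_views("Eindruck"))

# Adding a situation makes it available to every existing user type.
SITUATIONS["text-production"] = {"has-collocation": "primary"}
print(sorted(all_views("Eindruck")))
```

Each key of the returned dictionary corresponds to one of the filters described above, and the second call shows four views after a single situation profile has been added.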
5.7. Prototypical Implementation of the Proposed Architecture
The architecture that has been introduced on the previous pages has been implemented in the prototype of a Web-based electronic dictionary containing roughly 14,000 lexemes with 44,000 example sentences and almost 33,000 morpho-syntactic preferences. The data have been extracted from the manually annotated SALSA corpus (Burchardt et al., 2006) and the automatically acquired database of multi-word expressions presented in Weller and Heid (2010). In the following, we will discuss some of the features of the access functionality⁷ of a graphical user interface (GUI) that has been developed in the context of an undergraduate thesis at Universität Stuttgart (see Müller, 2010).
More information on the extraction and storage of the data as well as technical details on the implementation of the GUI can be found in Spohr (2010) and Müller (2010), respectively. To make use of the full potential of the representations in the model, the GUI must be highly dynamic in a number of respects. Since users frequently pass through a sequence of different situations, the GUI must allow for switching between different situations on the fly. For example, a user who is in a text-receptive situation at one moment may well be in a text-productive situation two minutes later, with the tool still running in the background, for example, in a Web browser. In addition to this, the GUI must be able to handle modifications to the model – in particular to the model containing the user-related and situation-related specifications. For example, it should not be the case that the GUI must be adapted every time the specifications for a new user or situation are added. Finally, one of the main characteristics of the GUI is that – while offering simple lemma-based access – it makes use of the information in the model in order to guide users in the formulation of complex queries as well. In particular, the complex search interface is made up of drop-down lists that dynamically adapt to what the user has specified. For example, if users wish to restrict the search to entities having a certain part-of-speech, they select the appropriate indication in the list (which is offered, for example, as ‘Part-of-speech’ or ‘Word class’) and are then offered only those values that are possible for this indication, such as ‘adjective’ or ‘noun’. In other words, only those continuations are offered that are possible according to the model, on the basis of the domain and range of each property and relation.
While this may seem trivial in the case of the simple data categories just mentioned, it is in fact a very powerful mechanism that allows for the specification of very complex configurations. For example, it can be used to formulate a query that extracts all noun-verb collocations that are (near) synonyms of kritisieren (to criticize) and express their Evaluee (i.e. that which is criticized) by means of a prepositional phrase. In each step of specifying this query, the user is supported by the GUI, which offers only those properties and relations whose domain is a superset of the range of the previously selected relation (see Spohr, 2010, for more details). Figure 5.4 shows a screenshot of the GUI, which is obviously not in a production-ready state. Nonetheless, it still illustrates the most important functionalities that the GUI offers, such as the selection of a situation (e.g. text reception in the mother tongue or production in a foreign language)⁸ and a presentation language in box 1 in the figure.⁹ Moreover, it shows the text field that is used for specifying a simple lemma-based query (box 2), as well as the complex search functionality (box 3), which appears only after the user has clicked on a corresponding ‘complex search’ button. In other words, users are offered the complex search functionality only when they have explicitly demanded it.
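The rule by which the drop-down lists are populated (offer only properties whose domain is a superset of the range of the previously selected relation) can be sketched in plain Python. The domains and ranges below, the class `ComplexUnit` and the function names are assumptions made for this illustration; they do not reproduce the prototype's model or its GUI code.

```python
# Assumed class hierarchy (child -> parent).
SUBCLASS_OF = {"Collocation": "ComplexUnit", "ComplexUnit": "Lexeme"}

# Assumed domain and range declarations for a handful of properties.
PROPERTIES = {
    "has-part-of-speech": {"domain": "Lexeme", "range": "PartOfSpeech"},
    "has-synonym": {"domain": "Lexeme", "range": "Lexeme"},
    "has-collocation": {"domain": "Lexeme", "range": "Collocation"},
    "has-component": {"domain": "ComplexUnit", "range": "Lexeme"},
}

# Possible values for simple data categories.
VALUES = {"PartOfSpeech": ["adjective", "noun", "verb"]}


def is_subclass(cls, ancestor):
    """True if every instance of cls is also an instance of ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def continuations(selected):
    """Properties whose domain covers the range of the selected property."""
    rng = PROPERTIES[selected]["range"]
    return sorted(
        name for name, spec in PROPERTIES.items()
        if is_subclass(rng, spec["domain"])
    )


def value_choices(selected):
    """Values to offer when the selected property points to a data category."""
    return VALUES.get(PROPERTIES[selected]["range"], [])


print(value_choices("has-part-of-speech"))  # values for the drop-down list
print(continuations("has-collocation"))     # relations available to collocations
print(continuations("has-synonym"))         # has-component is not offered here
```

Because the lists are computed from the declarations, a property added to the model shows up in the interface without any change to the interface code, which is the kind of robustness to model changes required of the GUI.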
Figure 5.4 Screenshot of the web-based GUI, for non-expert users in a text-receptive situation
Figure 5.5 Screenshot of the complex search in a text-productive situation for expert users
Box 4 then shows the result count and the lexical items that match the query, which – in this particular case – extracts entities of type Typische Wortverbindung (typical word combination) that consist of Eindruck (impression) and a verb,¹⁰ such as Eindruck bekommen (to get the impression) or Eindruck erwecken (to give the impression). As can be seen in comparison with Figure 5.5, which is a screenshot of box 3 after the user type has been changed from non-expert to expert user, the content of the interface interacts with the specifications on the access and presentation layer. Here, the rather short list of possible types from Figure 5.4 has been replaced by a longer one, which contains not only linguistically more sophisticated terms like Kollokation (collocation) instead of Typische Wortverbindung (typical word combination), but also more options from which to choose. In particular, it offers the selection of specific types of collocations, which had been assigned the status of being irrelevant for non-expert users. This shows how the search interface is determined on the basis of the access and presentation layer and is capable of adapting dynamically to changes in case a different situation or user type with different characteristics is selected.
5.8. Conclusion
As the previous pages have shown, the architecture presented in this article is capable of dealing with user needs in a dynamic way. This has been achieved by introducing a layer containing function-specific information for each lexicographic indication in-between the user interface and the lexicographic database. This layer acts as a kind of filter that lets through only those indications which are believed to be relevant to certain types of users in particular types of situations (on the basis of Tarp, 2008a). At the same time, these indications are ‘wrapped’ in a form that is appropriate for the particular users, using labels in different languages and vocabularies with different degrees of expert terminology. By separating these function-related specifications from the other lexicographic content, this additional layer can be used to dynamically generate several monofunctional views on the data in the lexicographic database, each covering a different combination of a type of situation and a type of user. This pluri-monofunctionality has been shown by means of a prototypical implementation of the different components in this architecture in the Web Ontology Language OWL – a typed logic-based formalism that has been developed in the context of the Semantic Web and standardized by the World Wide Web Consortium. OWL – and the RDFS vocabulary that it uses – offers the formal means to model subsumption hierarchies that can be used to automatically derive coarser levels of granularity from single finer-grained statements.
As of May 2011, the resulting prototypical lexical resource contains semi-automatically acquired descriptions for around 14,000 lexemes. Moreover, a GUI has been developed which exemplifies the dynamicity of the architecture. This GUI offers simple and complex search functionalities, and it has been shown to adapt dynamically to the characteristics of the respective type of user – both in terms of how the content is displayed on the screen and in terms of which kinds of indications are offered at a specific point in the consultation. In addition to this, further situation and user types can be added easily at a later stage. Especially in this latter respect, we have argued that the proposed architecture is modular enough to be extended such that it covers not only specifications relative to situation and user types, but also those which are needed in order to allow for a customization of several aspects of both the access to and the presentation of the content for individual users. While different levels of granularity and labels in different languages and vocabularies are certainly not all that is required in order to achieve this, we are convinced that they do, in fact, represent important steps in the direction of individualized lexicographic consultation.
Notes
1. Despite the author’s affiliation with the Universität Bielefeld at the time of writing, the research that has led to this article has been carried out at the Institut für Maschinelle Sprachverarbeitung at Universität Stuttgart, as part of the International Graduate School 609 ‘Linguistic representations and their interpretation’. In addition to this, many of the key ideas underlying this article have been developed in the project ‘Developing a Model for Electronic Dictionaries’, in a cooperation between the Stellenbosch Institute for Advanced Study and Universität Stuttgart. The author is indebted to the invaluable comments and ideas of Ulrich Heid and Rufus Gouws. Further thanks go to Pawel Müller for his work on the GUI.
2. This does not mean that there is no way to customize the data that are presented to a user in a setting that is based on the first approach. It does mean, however, that achieving this customization is expected to be rather different from the way the different monofunctional dictionaries have been compiled, as it seems unrealistic to have users define export filters operating directly on the database schema.
3. It should be noted, though, that certain pieces of information (e.g. definitions or example sentences) must exist in several versions in order to be offered to different users in different ways. Strictly speaking, however, this is not meant by granularity in this context.
4. For the purpose of this chapter, we will not go into details on further opportunities opened up by such inferences, such as the possibility to express underspecified information, which is interesting in particular in a setting where the data have (at least partly) been acquired automatically. See however later sections in this article (as well as Spohr & Heid, 2006) for the use of underspecification in querying the lexical database.
5. The interested reader is referred to Spohr (2008) and especially chapter 3 of Spohr (2010) for discussions of the features of the Resource Description Framework Schema (RDFS) and the Web Ontology Language (OWL) in a lexicographic context. The official specification of OWL can be found at www.w3.org/TR/owl-ref/.
6. The term ‘accessible’ is meant to refer to situations in which a user wishes to formulate a complex query that specifies certain values for particular indications. For example, in order to be able to formulate a complex query like ‘give me all nouns beginning with “p”’, at least the part-of-speech indication must be accessible – in addition to the ‘usual’ lemma specification.
7. The focus on the access functionality is due to the fact that the presentation functionality, that is, the function-based generation of dictionary entries that are displayed to the user, has not been implemented at the time of writing.
8. It should be noted that since the GUI has been developed in an undergraduate thesis as a kind of proof-of-concept implementation, the situation-related and user-related specifications have been grouped together. In the version displayed in Figure 5.4, the ‘Rezeption in L1’ profile is one that contains situation-related specifications together with those for linguistically untrained users, whereas ‘Produktion in L2’ contains those for linguistic expert users.
9. Boxes 1–4 are not part of the screenshot, but have been added solely for illustrative purposes in this context.
10. In the figure, the value ‘verb’ of the selected property Hat Wortart (‘has part-of-speech’) is obstructed by the drop-down list for specifying the type of the entity to be retrieved.
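The complex query used as an example in the note on ‘accessible’ indications can be made concrete with a small sketch. The Python below is an illustration only and is not the implementation described in the chapter, which builds on RDFS and OWL; the `Lexeme` class, its field names and the sample entries are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    """Hypothetical lexical entry with two accessible indications."""
    lemma: str
    part_of_speech: str


# Invented sample data, for illustration only.
LEXICON = [
    Lexeme("paradigm", "noun"),
    Lexeme("present", "verb"),
    Lexeme("profile", "noun"),
    Lexeme("query", "noun"),
]


def nouns_beginning_with(prefix, lexicon):
    """Return the lemmas of all nouns whose lemma starts with `prefix`.

    The query constrains both the lemma and the part of speech, so both
    indications have to be accessible to the user.
    """
    return [
        entry.lemma
        for entry in lexicon
        if entry.part_of_speech == "noun" and entry.lemma.startswith(prefix)
    ]


result = nouns_beginning_with("p", LEXICON)
print(result)
```

If the part-of-speech indication were hidden from a given user type, only the lemma condition could be expressed, and the verb ‘present’ would be returned as well.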
Chapter 6
Change of Paradigm: From Linguistics to Information Science and from Dictionaries to Lexicographic Information Tools
Patrick Leroyer
6.1. Introduction: Moving Lexicography into the Realm of Information Science

In a recent seminal publication on English lexicography, Henri Béjoint (2010) claims categorically that lexicography is a practical activity or a craft. According to Béjoint, there cannot be any such thing as a theory of lexicography. This claim will certainly sound familiar to many meta-lexicographers engaged in the epistemological and methodological debate on the status of lexicography. It had already been put forward in quite similar terms by two other coryphaei of English Lexicography, Sue Atkins & Michael Rundell (2008).

I simply do not believe that there exists a theory of lexicography, and I very much doubt that there can be one [ . . . ]. There are theories of language, there may be theories of lexicology, but there is no theory of lexicography. Lexicography is above all a craft, the craft of preparing dictionaries. (Béjoint, 2010: 381)

Rather than being the expression of a personal belief, this ontological judgment appears to follow from linguistic arguments. The dictionary is conceptualized as being identical with an artefact containing language data, a kind of language model firmly anchored in the linguistic paradigm as established, among others, by Rey-Debove (1971) almost 40 years ago. Naturally, dictionary making as a technique is bound to be reduced to an atheoretical activity; it becomes the craft (or art) of practically compiling (theoretical) language data. Furthermore, Béjoint’s anti-theoretical claim seems to be guided by a restrictive approach to the subject matter of lexicography, limiting the discipline to the exclusive making of language dictionaries, and ignoring the
fact that these only represent a minority of all lexicographic publications and lexicographically designed reference works today, as will be demonstrated in the second section of this article. Finally, one could suspect that Béjoint’s judgment is conditioned by a conventional view of dictionaries as standalone, printed artefacts. It seems to give no consideration to e-lexicography, which has been developing very rapidly during the past 20 years and has become a major field of lexicographic research. Surprisingly, this suspicion is invalidated in the conclusion of the book, in which the author radically modifies his view on the dictionary, and anticipates – quite correctly – the arrival of a new generation of tools made possible by a technological metamorphosis paving the way for innovation:

Probably the dictionary as we know it is on its way out, and we will see the emergence of new kinds of tools, reference tools encompassing more than the dictionary, containing other kinds of information and providing a better treatment of the more traditional presentations [ . . . ] We are entering a period where knowledge cannot be further than a click away. (Béjoint, 2010: 386)

‘The emergence of new kinds of tools’ will be exactly the central topic of the present discussion. Accordingly, the underlying thesis developed and illustrated in this article is that communication and information technology in general, and e-lexicography in particular, have profoundly altered the status of lexicography. Lexicography can no longer be categorized as a subset of disciplines within applied linguistics. It must be seen as a unique discipline at the crossroads of social and information sciences and technology. Technology has revealed that the dictionary, despite the huge diversity of its forms, is merely one basic form of lexicographic artefact. Still, much meta-lexicographic work is restricted to the study of this particular form.
The true subject matter of lexicography is the ongoing development of lexicographic information tools and lexicographically designed information tools, including, but by no means limited to, dictionaries. This development has been made possible by the Modern Theory of Lexicographical Functions (henceforth MTLF).1 For many centuries, lexicography – in the hands of writers and authors of dictionaries – has been developing dictionaries of all kinds, on various media, in a great variety of formats and structures and in practically all languages of the world. It has covered general language, specialized language, language variation and so on. Dictionaries have played a crucial role in history, being mirrors of their time in all domains of human activity – literature, art, commerce and industry, religion and science. Dictionaries have also been compiled as cultural and ideological instruments of language development and language planning, used by governments and language academies to standardize language. Some dictionaries have even been truly transformed into scientific
instruments, used by linguists to test, formalize and disseminate their theories on language structures and on lexical semantics. The overwhelming majority of dictionaries, however, have been made as commercial products of varying quality, and as such they have been submitted to the rules of modern marketing and consumption. But most importantly, dictionaries have always been made for people who find them useful in solving problems caused by language used (i) for oral and written communication and (ii) for knowledge representation and acquisition in all domains of human activity. Finally, dictionaries have been used as pedagogical instruments, as an interactive resource in language acquisition or in the knowledge acquisition of a particular subject field through the concepts of this field. No wonder then that the sciences of language – in all fields of Language for General Purposes (LGP) as well as of Language for Special Purposes (LSP) – have hijacked and rerouted the dictionary, and made it their favourite instrument alongside the grammar book. And it comes as no surprise, either, that the dictionary, for better or worse, has been studied and designed by the scientific language community.

As Béjoint seems to indicate, it is time to move forward, because the conventional dictionary is on its way out. The subject matter of lexicography has to be redefined through the looking-glass of modern information science and technology. In order to establish a change of paradigm, this article will argue for a recategorization of the subject matter and of the substance of lexicography in accordance with the functional framework developed at the Centre for Lexicography (hereafter Centlex),2 in which specific references are actually made to lexicographic and lexicographically designed information tools. Consequently, a new, general definition of lexicography and lexicographic information tools embedded in the functional framework will be put forward.
Finally, the adaptive quality of the new paradigm, which is moving lexicography into the realm of information science, will be illustrated through the innovative models of four lexicographically designed information tools. Consequences and perspectives for the future research agenda will be briefly outlined in the concluding remarks.
6.2. Underrepresentation of Language Dictionaries

As already stated, the true subject matter of lexicography is the study and design of lexicographical tools, including dictionaries. In this light, Tarp (2008a) is definitely correct when he emphasizes that this definition is partly deduced from a tradition, and that dictionaries, like other tools, must be seen as what they are – a range of information tools sharing common functional features. In fact, the number of lexicographical tools published worldwide, and not belonging to the category of language dictionaries, is quite substantial.
The term ‘dictionary’ itself is governed by the logics of language, both for general and special purposes, and lexicography is inevitably regarded as a matter of words and language descriptions and/or language engineering. This is definitely out of touch with reality. A close study of the directory of lexicographic publications from 2008 and 2009 at Dawson Books3 (see Dawson Books Limited online database), a global supplier of books to university libraries worldwide, reveals that the category ‘language dictionaries’, in which the vocabulary of a language is the true scientific object of the work,4 accounts for only a quarter of all lexicographic publications. The figures are as follows. In the two-year period of 2008 and 2009, 767 language dictionaries were published. For the first ten months of 2009 alone, the number reached 318. These figures must, of course, be seen in relation to the number of other lexicographic information tools published. In the entire period of 2008–2009, the number of such publications amounted to 2,349 titles, of which 1,152 appeared during the first ten months of 2009. In other words, every year sees the publication of three times as many lexicographic information tools, which are certainly made up of wordlists and language data organized in dictionary articles, but which nevertheless have nothing to do with language as a scientific object of study as such. If one were to include all artefacts which, all things considered, also should be counted as members of the vast family of reference works, such as almanacs, atlases, catalogues, directories, guides, handbooks, reference manuals and so on, the number of language dictionaries being published would fade below the level of significance.
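The proportions quoted above can be checked with a few lines of arithmetic. The sketch below assumes that the 2,349 other titles do not include the 767 language dictionaries, so that the two figures add up to the total number of lexicographic publications; the function name is introduced here for illustration only.

```python
def share_and_ratio(language_dictionaries, other_tools):
    """Return (share of language dictionaries, other tools per dictionary)."""
    total = language_dictionaries + other_tools
    share = language_dictionaries / total
    ratio = other_tools / language_dictionaries
    return share, ratio


# Two-year period 2008-2009, figures as quoted in the text.
share_all, ratio_all = share_and_ratio(767, 2349)
print(f"{share_all:.1%}, {ratio_all:.2f}")    # 24.6%, 3.06

# First ten months of 2009 alone.
share_2009, ratio_2009 = share_and_ratio(318, 1152)
print(f"{share_2009:.1%}, {ratio_2009:.2f}")  # 21.6%, 3.62
```

Under this assumption the two-year figures give a share of roughly a quarter and a ratio of about three to one, in line with the text; for the first ten months of 2009 the ratio is somewhat higher.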
In the information and knowledge society, the needs for access to data and information processing are far from being restricted to language data per se, although access to data is most often realized by means of systematically arranged lemmas or keywords according to some kind of semasiological or onomasiological structuring.
6.3. What Is Lexicography? Extension of the Subject Matter

At Centlex, the main research objectives include a necessary expansion of the subject matter of lexicography:

Lexicography at Centlex is much more than the several hundred years old scientific discipline dedicated to the investigation and development of dictionaries, consultation works, reference works and other types of lexicographically designed information tools. We go even further, and expand the subject matter to information tools that can benefit from a lexicographic design. (Centlex: www.asb.dk/article.aspx?pid=895&lang=en-gb) (see note 2)
This extension of the subject matter is in turn made possible by the output of a ground-breaking thesis based on the conceptual frame of ‘needs-adapted access to data and information’:

At Centlex we work on the ground-breaking main thesis that lexicography is a modern, interdisciplinary and independent discipline whose subject field is not limited to lexicographic user-needs adapted information tools, but also encompasses all kinds of information tools and reference works that can be optimized with the very same user-needs adapted information and data access – from handbooks and manuals to dynamic information websites and web portals. (www.asb.dk/article.aspx?pid=895&lang=en-gb) (see note 2)

Needs-adapted access to data and information is the true research agenda of Centlex; dictionaries per se are not. Therefore, the objective of Centlex is to devise theories necessary for the development of lexicographic projects aimed at solving information problems. This research has come a long way during the past 20 years. It has recently reached several important milestones, not only at the meta-level, like the development of the MTLF function theory of lexicography (Tarp, 2008a), but also at the micro-level in more specific research fields, such as specific types of lexicographic tools and their components, functions and structures.
These include studies of general dictionaries and learner’s dictionaries (Tarp 2008a), travel dictionaries and travel guides (Andersen & Leroyer, 2008; Gouws & Leroyer, 2009; Leroyer, 2008a and 2008b), specialized dictionaries for learners (Fuertes-Olivera, 2010a) and internet terminological dictionaries (Fuertes-Olivera, 2010c), types of user-situations, in particular communicative user-situations (Bergenholtz & Tarp, 2005) but also cognitive user-situations (Ptaszynski, 2010), types of dictionary use, especially the access process (Bergenholtz & Gouws, 2007a and 2010), the design of search interface in accordance with specific user-situations (Almind, 2005), and the role of lexicographic structures in relation to the lexicographic functions (Gouws, 2005), just to mention a few representative achievements.

In the knowledge and information society, the dazzling, exponential growth of information, information systems, and modes of storing, indexing, sharing and accessing information is logically accompanied by the growing risk for human users of not being able to access the right information at the right time, in the right place, for the right purpose and in the right quantity. There is a growing risk of information overload, and in this light, functional lexicography seems to have a more important role to play than ever in order to take up this challenge. But what is lexicography exactly? Or to put it more precisely, what is it exactly that makes an information tool lexicographic? What is the lexicographic quality of an information tool so to speak? Do not all information tools and systems
share, in one way or another, some common lexicographic heritage, by virtue of being tools and, thus, having functions to fulfil? To find satisfactory answers to these questions it is necessary to go back to the concept of lexicography, try to (re)categorize it, and isolate its distinctive features. One of the main problems faced by modern lexicography in its attempt to establish the functional paradigm has been embedded in the etymology of the word itself – the lexiko graphein, or the writing of words and search for the original – and in the definitions that consequently have been deduced from this etymology and linguistically determined. It is exemplified best in Johnson’s famous harmless drudge quotation:

Lexicographer: a writer of dictionaries, a harmless drudge, that busies himself in tracing the original, and detailing the signification of words. (Dictionary of the English Language, 1755).

French definitions also evolve along this line. In line with Bernard Quemada, they advocate splitting lexicography into two separate disciplines: the technique of writing and producing dictionaries on the one hand (= dictionnairique), and the scientific study of lexical facts on the other, a study that does not necessarily lead to the production of dictionaries (= lexicographie):5

Lexicographe, subst. Personne qui pratique la lexicographie. (TLFi 2010)
Lexicographe [ . . . ] Personne qui pratique la lexicographie, qui compose, rédige des dictionnaires, des lexiques. (Académie 2010)
Lexicographie [ . . . ] Technique de confection des dictionnaires. (TLFi 2010)

Bergenholtz (1995) is among the first to go against this trend, arguing convincingly that in terms of epistemology, lexicography is not limited to the activity of the lexicographer, but includes both theoretical and practical aspects.
The phenomenological argument – the inclusion of lexicons and encyclopaedias in the range of lexicographically designed products – is also a convincing argument, contributing to the necessary separation of lexicography from linguistics: Lexicography is not a linguistic discipline and especially not a genuine part of lexicology. Lexicography is not only the compiling of dictionaries, but theory and practice for dictionaries, lexica and encyclopaedias. (Bergenholtz 1995:37)
Finally, Tarp (2007) is the first meta-lexicographer to assert that the informational nature of lexicography is embedded in its methodology, independent of any particular lexicographical product. Any kind of situation involving an information problem and leading to a lexicographically relevant need for information should be matched by tailor-made data selection, presentation and access – ‘a lexicographic theory focusing on quick and easy access to data from which specifically user- and situation-adapted information can be extracted’ (Tarp, 2007:177). The obvious question that the above assumption leads to is: what exactly is the lexicographic core of the theory itself? Many information tools which apparently are not lexicographically designed (although some of them could benefit from some kind of lexicographic design) are indeed specially designed to provide quick and easy access to data from which specifically user- and situation-adapted information can be extracted. Examples include the homepage of a provider of TV programs and news, the home page of a railway, of an airline company or a real estate agency – the list of potential information tools focusing precisely on these very same parameters is endless.
6.3.1. Interdisciplinarity: Cooperation but No Compromises

Modern functional lexicography is characterized by its interdisciplinary methodology. Because lexicographical tools can reflect all fields of human knowledge and activity, lexicographers are naturally bound to work with experts from other disciplines (including, whenever necessary, language experts). There is nothing eclectic about this approach; it is simply fundamental to functional lexicographic methodology. Any language dictionary, for communicative user-situations (production, reception, translation), or for cognitive user-situations (using the dictionary for learning, for instance in connection with a language acquisition course), will definitely benefit from the participation of linguists, but only in so far as the linguistic contribution is relevant to the intended user group, particularly to its profile in terms of needs, competences and foreseen user situations. The same interdisciplinary approach also applies to specialized lexicography, in which experts of a specific LSP can work together with experts from the specific subject field covered by the information tool. For instance, dictionaries developed in the field of communication will benefit from the cooperation with scholars of that discipline, while a lexicographic corporate tool developed for the communication department of a given company should involve the contribution from experts from the field of corporate communication (see Leroyer, 2007) as well as the executive staff of this department. Many more examples could be given. Suffice it to say that lexicography in theory and in practice should lead to the design of genuine functional lexicographic tools, not as some kind of hybrid products and certainly not as the result of a cloning process from other disciplines.
6.3.2. The Triangulation of Lexicographically Designed Information Tools

Basically, lexicographically designed information tools can be perceived through the triangulation of three interrelated sets of parameters: the user, the access and the data parameters. To keep the tool in a functional balance and optimize its functional ergonomics, focus should as a rule be placed equally on all three sets of parameters. Unequal focus in the design of lexicographic products, however, seems to be the rule rather than the exception.

Lexicographers who focus excessively on the presentation of data tend to reduce the information tool to its contents and to the presentation of the data. This is often the case with language dictionaries in which data selection and presentation is guided by the uncritical use and promotion of specific linguistic, communicative or conceptual theories (depending on the terminological school involved), or by specific language-planning policies.

Lexicographers who focus primarily on the user and on users’ demands, evaluations and behaviour run the risk of confusing objective user needs with needs assessments resulting from market research analysis and segmentation, or from user surveys devoted to the registration of subjective user needs. Such an approach, which reduces the information tool to a mere commodity and its users to consumers, can be found in commercial lexicography, where it is common to recycle existing data collections independently of anticipated, objective user needs and profiles.

Lexicographers who focus mostly on access technology and search interface are likely to reduce the information tool to a computer gadget or to a plethora of intricate, advanced search options. This is the case, for instance, with linguistically determined dictionaries such as the TLFi (2010), in which data-filtering and advanced search options are strictly reserved for linguistic experts.
Access modes and combinations can in fact be seen as reflections of the data fields and data categories used throughout dictionary articles; the resultant lexicographic corruption consists in access to the data being determined by the data themselves and not by the information needs (as should be the case).
6.4. Lexicography as a Use and Gratification Theory in a Functional Framework

On the basis of the above argumentation on subject matter extension, interdisciplinary methodology and triangulation of lexicography to ensure functional balance, it is now possible to propose a general definition of lexicography and lexicographic tools within the functional framework (Table 6.1):
Definition

Lexicography is an integrated part of the social and information science paradigm and refers to the interdisciplinary discipline concerned with the study, design and development of functional tools aimed solely at the gratification of human information needs and problems. The distinctive feature of lexicographic tools is the triangulation of three interrelated sets of social, logical and semiotic parameters, corresponding respectively to the following dimensions of the tool: user, access and data.

Social parameters are in every single case determined by the systematic identification of the specific information problems, needs and profiles of the potential user of the information tool. The social parameters are decisive for both the functional genesis (communicative, cognitive, operative and interpretative functions) and the gratifying use of the information tool.

Semiotic parameters are in every single case determined by such data selection and presentation that ensures gratifying extraction of information in accordance with the specific problems, needs and profiles of the potential user. Data are by nature of a semiotic kind and consist of verbal and non-verbal signs. The most frequently used symbols in data selection and presentation are words.

Logic parameters are in every single case determined by such structures, modes, indices, algorithms and computing technologies that ensure gratifying access to data in accordance with the specific problems, situations, needs and profiles of the intended user.

Table 6.1: New social definition of lexicography as a use and gratification theory in a functional framework.

The new definition is one of the preliminary and necessary steps in the implementation of the change of paradigm.
It will make it possible to develop new types of lexicographically designed information tools according to the functional methodology, in which customization and personalization of access according to user needs and situations play the most important role in order to achieve gratification through the use of the tools.
6.5. Implementing the Change of Paradigm: Four Lexicographic Information Tools

In the following text, four different models of lexicographically designed information tools will be presented in order to illustrate the change of paradigm
discussed above. They will demonstrate how lexicography is currently moving towards the realm of information science.
6.5.1. Patient Dictionary

The first functional model to be discussed is related to Lexonco (= lexique oncologique), an online patient dictionary and patient guide. Lexonco was originally developed in the framework of a French research project (Delavigne, 2008) in which Centlex acted as a consultant. The genuine purpose of Lexonco was to solve information problems for cancer patients and their families. The data were collected from a number of data repositories managed under the French National Federation of Cancer Centres, such as cancer terminology, patient testimonies, information leaflets and so on.
6.5.1.1. Modular, functional configuration

The numerous information needs and information challenges encountered by cancer patients and their families arise from the necessity of understanding complex, individual pathologies with heavy psychological and social side effects, as well as entering a dialogue with medical staff. The communicative and conceptual complexity of cancer is reflected in the medical LSP of oncology and in the organization of treatment protocols and similar documents. Comprehension is necessary in order to break down cognitive barriers and optimize communication between hospital, doctor and patient, and thereby contribute to the success of treatment and return to normal life. This explains why the functional model developed at Centlex and proposed to the management of Lexonco features the modular combination of cognitive and communicative functions. Moreover, it involves access to external resources for interactive user participation.

The suggested architecture consists of independent modules governed by communicative, cognitive and operative functions. Lexical tables contain terms related to cancer and all the necessary data concerning the term (meaning explanation, illustrations, examples, etc.); introduction tables contain systematic introductions to the topic of cancer, including the biology of cancer; synoptic tables contain specific data on individual types of cancers; finally, link and participation tables govern access to external resources as well as the administration of active user participation.
6.5.1.2. Three types of individualized access modes

The functional model features three individualized access modes, all with user-filtering, meaning that access to data and data presentation is to be filtered according to the individual search criteria chosen by the user.
1. The consultational access mode corresponds to a certain extent to the conventional lexicographic access mode by way of word lists (lemmas). The main difference here is that access (= the available lemmas) is individualized in accordance with patient information records from the hospital journal system. This is the ‘my cancer’ access. In the long run, data selection could also include other medical data, such as screenings, examinations, diagnosis, treatment, controls and so on.
2. The interactive and participative access mode is realized by active links to a relevant selection of external information resources. Individualization of access is achieved by user interactivity. The user can obtain membership of patient associations, order medical guides, apply for services, sign up for experimental treatments and so on. This is the ‘my network’ access, allowing the possibility of communicating and sharing of information, for example, with other patients in the same situation.
3. The automated access mode with user filtering is achieved by the user giving permission to receive RSS feeds, alerts, blogs or newsletters from a selection of relevant resources.
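The individualization described for the consultational (‘my cancer’) access mode can be pictured as a filter over the lemma list. The following is a minimal sketch under stated assumptions, not the Lexonco implementation: the function name, the sample lemmas and the way a patient record is represented (a plain set of terms) are all hypothetical.

```python
def individualized_lemmas(all_lemmas, patient_record_terms):
    """Keep only the lemmas that match terms found in the patient's record.

    Matching is case-insensitive; the result is sorted so that it can be
    shown as a conventional word list.
    """
    relevant = {term.lower() for term in patient_record_terms}
    return sorted(lemma for lemma in all_lemmas if lemma.lower() in relevant)


# Invented sample data, for illustration only.
LEMMAS = ["biopsy", "chemotherapy", "mammography", "radiotherapy"]
RECORD_TERMS = {"Chemotherapy", "biopsy"}

my_cancer_list = individualized_lemmas(LEMMAS, RECORD_TERMS)
print(my_cancer_list)
```

In this picture the full lemma list stays the same for all patients; only the subset offered for consultation changes with the individual record.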
6.5.2. e-Lexicographic Guide to French Real Estate

The Ejendomsordbog (= Dictionary of Real Property) is an e-lexicographic guide to French real estate. It is currently being developed in close cooperation with a leading provider of online lexicographical solutions in Denmark. Many Danes are fascinated by France, and some of them consider buying property in some of the most popular regions of the country. The problem is normally not the matter of financing, as Danish banks today offer a range of financial products to their clients for this specific purpose. Therefore, the Ejendomsordbog need not include financial data types. Rather, the problems Danes encounter in this situation are of a communicative as well as a cognitive nature.

The communicative problems are due to the fact that many Danes have no command of French and are ipso facto prevented from actively participating in the real estate market. Limited reading comprehension hinders potential buyers unless they make use of international agencies and brokers providing descriptions of properties in English. However, English is still rarely found in French real estate market communication and is normally reserved for high-end properties.

The cognitive problems result from the fact that Danes normally do not know the characteristics and rules of the French market and its many subdomains, such as advertising and pricing, negotiation, documents, law, taxes, administrative procedures and so on. Real estate is indeed a culture-bound domain par excellence. What is truly needed then is also a practical, intercultural guide, and such a guide is actually integrated in the Ejendomsordbog.
6.5.2.1. Functional analysis of user profiles and market analysis
The model of the Ejendomsordbog is derived from a thorough functional analysis of the extra-lexicographical situations and of the lexicographically relevant needs of Danes who plan to buy or sell a property in France, or of Danes who already own a property there and encounter information problems in that connection. The functional analysis was supported by a market analysis conducted over a period of two years, particularly in connection with the big annual, national exhibitions organized by the main actors of the real estate business in Denmark. In addition, property-related communication problems encountered by Danes owning a property in France were identified through translation jobs commissioned by a number of Danish translation companies.
The model operates with two groups of users according to L2 competence: Danes without any command of French and Danes with some command of French. It also distinguishes two categories of subject-field competence: semi-experts and experts. Data selection is based on information needs in specific situations in the extra-lexicographic world, corresponding to prototypical scenarios: buying, selling, owning, or being a potential buyer. Also, translators and translation students (the domain being, as already mentioned, a market for professional translation) are included in the second group (Table 6.2):
User Profiles
I. Danes without any Command of French
– Buyers/sellers – experts
– Buyers/sellers – semi-experts
– Owners
– Potential buyers
II. Danes with some Command of French
– Buyers/sellers – experts
– Buyers/sellers – semi-experts
– Owners
– Potential buyers
– Translators and translation students
Table 6.2 User profiles of Ejendomsordbog.
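The two-dimensional user model of Table 6.2 can be sketched as a small data structure. The Python fragment below is illustrative only: the identifiers (`UserProfile`, `l2_competence`, `role`, `build_profiles`) are assumptions made for this sketch and are not part of the Ejendomsordbog; only the profile values are taken from the table.

```python
from dataclasses import dataclass
from itertools import product

# Values taken from Table 6.2; all identifiers are hypothetical.
L2_GROUPS = ("no command of French", "some command of French")
ROLES = (
    "buyer/seller (expert)",
    "buyer/seller (semi-expert)",
    "owner",
    "potential buyer",
)


@dataclass(frozen=True)
class UserProfile:
    l2_competence: str
    role: str


def build_profiles():
    """Enumerate the user profiles listed in Table 6.2."""
    profiles = [UserProfile(group, role) for group, role in product(L2_GROUPS, ROLES)]
    # Translators and translation students belong to the second group only.
    profiles.append(UserProfile(L2_GROUPS[1], "translator or translation student"))
    return profiles


PROFILES = build_profiles()
```

Keeping L2 competence and role as separate attributes would allow data selection to be keyed on either dimension, which is one possible reading of the model described above.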
6.5.2.2. Variable data presentation and clustering
Because of the many functional combinations to be taken into consideration, the Ejendomsordbog model offers variable data distribution and situation-adapted access. There are four sets of distinctive data presentation and data clustering (Table 6.3):
1. Data for reading and understanding real estate texts, featuring, among other things, short explanations of the French lemmata
2. Data for translating real estate texts from French into Danish
3. Data for checking knowledge concerning real estate concepts
4. Data for acquiring knowledge about the French real estate market. These data can also be accessed through Danish entries connected to the relevant French entries. They feature long, contrastive explanations in Danish of the connected French lemmata.
There are plans to include document resources in the future, and to add an operative dimension to the information tool. This would come in the form of a practical guide containing explicit instructions.
User situations
1. Reading and understanding French real property texts
– French lemma
– French lemma illustration
– Danish short explanation of French lemma
– Danish equivalents of French lemma
2. Translating French real property texts into Danish
– French lemma
– French grammatical information on lemma
– Danish equivalents of French lemma
– French collocations
– Danish equivalents of French collocations
– French standard phrases and expressions
– Danish equivalents of French standard phrases and expressions
3. Checking on French real estate knowledge – terms and expressions
– French lemma
– French lemma illustration
– Danish short explanation of French lemma
4. Acquiring knowledge about French real estate market
– Danish lemma
– Danish long explanation of French lemma
– Internal links (to integrated practical guide to buying, selling property and owning real property)
– External links (to selection of relevant online resources)
Table 6.3 Data clustering and variable data presentation according to user situations.
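The situation-dependent clustering of Table 6.3 amounts to selecting, from one underlying record, only those data categories that serve the user's current situation. A minimal sketch in Python follows; the situation keys, field names and the `present` function are hypothetical labels chosen for this illustration, and the sample record contains placeholder strings rather than real dictionary data.

```python
# Data categories per user situation, following Table 6.3.
# Keys and field names are hypothetical labels chosen for this sketch.
SITUATION_FIELDS = {
    "reading": [
        "french_lemma", "illustration", "danish_short_explanation",
        "danish_equivalents",
    ],
    "translating": [
        "french_lemma", "french_grammar", "danish_equivalents",
        "french_collocations", "danish_collocation_equivalents",
        "french_phrases", "danish_phrase_equivalents",
    ],
    "checking": [
        "french_lemma", "illustration", "danish_short_explanation",
    ],
    "acquiring": [
        "danish_lemma", "danish_long_explanation",
        "internal_links", "external_links",
    ],
}


def present(entry, situation):
    """Return only the fields relevant to the situation, in display order."""
    fields = SITUATION_FIELDS[situation]
    return [(name, entry[name]) for name in fields if name in entry]


# Placeholder record: the values are dummies, not real dictionary data.
entry = {
    name: f"<{name}>"
    for fields in SITUATION_FIELDS.values()
    for name in fields
}
```

Under these assumptions, `present(entry, "checking")` would yield three data categories and `present(entry, "translating")` seven, all drawn from the same stored record.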
6.5.3. Change of Paradigm: e-Lexicographic Mobile Tourist Guide
The third functional model discussed here is a practical lexicographic tool for tourists. It is based on theoretical research conducted at Centlex for the past three years (Andersen & Leroyer, 2008; Gouws & Leroyer, 2009; Leroyer, 2008a and 2008b). The research originated in the studies of lexicographic and meta-lexicographic contributions to the field of lexicography for tourists (or ‘travellers’, as they are often referred to in the literature). The definition presented below is a conventional, meta-lexicographic definition of what is to be understood by the term ‘travel dictionary’, which seems to be the most widespread lexicographic tool for tourists. Most travel dictionaries covered by this definition also include a phrase book, featuring a heteroclite collection of ready-made sentences and dialogues to be used in specific tourist situations, and structured according to the principles of a pragmatic, situational access (sentences needed for conversation at a hotel, restaurant, post office, etc.).
The travel dictionary is bilingual (or multilingual) and is aimed at people with little or no knowledge of the foreign language; it is a typical reference dictionary, primarily intended to solve practical communication problems. (Nordisk leksikografisk ordbok 1997: 224)
The above definition is based on the banal assumption that tourists normally have little or no knowledge of the language of their foreign destination, and that by virtue of this they must encounter serious communication problems. They are badly in need of lexicographic assistance. This is certainly true, but only to some extent. The real problem is what kind of lexicographic assistance is relevant and necessary. For someone travelling to Spain for the first time and having no command of Spanish but still wanting to communicate in the language, even the best travel dictionary and phrase book in the world will be of little help indeed. Its usefulness would be confined to informing the user how to produce basic greetings, such as hello, thank you, please and good bye. It should also be noted that even if the travel dictionary could provide some kind of primitive help to say something in Spanish, there is absolutely no guarantee that the utterance would be understood. Moreover, lacking the necessary listening skills and competences, the tourist user would not comprehend the answers to their questions, however nicely those questions were phrased in the foreign language with the help of the travel dictionary. In any case, the definition presented above is the result of a phenomenological, deductive way of thinking: meta-lexicographers have uncritically observed the lexicographic state of affairs. Their observations are apparently guided by commercial interests, travel dictionaries and tourist guides being a profitable business.
6.5.3.1. Absence of functional criteria for lemma selection
Given the above, it comes as no surprise that one of the main drawbacks of conventional lexicographic tools for tourists is the striking absence of any functional criteria for lemma selection. The editorial logic of such tools seems to be governed exclusively by a disproportionate focus on users and their communicative needs, and by the market. “Give them as many words as possible” seems to be the rule, as is confirmed by the following list of junk lexical items found in one of the most popular online phrase books in Denmark:
strand (beach), svinekød (pork), bil (car), grøn (green), butik (shop), blå (blue), opad (upwards), nord (North), grænse (border), gå (walk), pas (passport), til venstre (left), hus (house), spiseseddel (menu), hotel (hotel), kobling (clutch), vindue (window), udsigt (view), køre (drive), flyvemaskine (airplane), lufthavn (airport), station (station), høne (hen), pris (price), tjener (waiter), søge (look for), hospital (hospital), hjælpe (help), god (good), hund (dog), edderkop (spider), seng (bed), Danmark (Denmark), syg (sick), vand (water), vej (way), værdifuld (precious), kone (wife), hjem (home), penge (money), brød (bread), vandfald (waterfall). (Dansk parlør, 2010)
As tourists travel to foreign destinations, it seems quite obvious from the editor’s point of view that they need help coping with the foreign language of the destination in all relevant tourist situations in which communication problems could be expected to arise. But what are those relevant tourist situations in which problems might occur? What are the relevance criteria? In fact, it appears that the number of situations is virtually unlimited, as is clearly demonstrated by the list of words presented above. What is needed, instead, is a comprehensive dictionary of the foreign language, including an introduction to the grammar of the language. Or, better still, the tourist might consider learning the language, or some of it at least, before going to the foreign destination. If this is not possible, or does not produce the desired effect, the tourist might consider using English as a lingua franca.
6.5.3.2. Lexicographic tools for tourists revisited
The functional shortcomings of conventional lexicographic tools for tourists lead to the formulation of three theses necessary for the development of innovative lexicographic tools (Table 6.4):
1. Because of their functional drawbacks, conventional, dictionary-based lexicographic tools for tourists fail to fulfil their intended communicative functions. The travel dictionary is useless and is better considered as a talisman rather than as a tool.
2. Being a tourist is a mobile, multicultural and experiential activity. Information needs are far from limited to communication assistance; they also include a vast range of other lexicographically relevant information needs (practical, cultural, etc.), as well as information necessary for successful decision-making and performance.
3. New designs are required. Innovative mobile e-lexicographic tools for tourists could satisfy those needs and enhance the quality of tourist experience. Such information tools must be function-based and should make use of adaptive geo-communicative technologies.
Table 6.4 Three theses on tourist lexicography.
6.5.3.3. Designing a mobile e-lexicographic tourist guide
Data access in a mobile e-lexicographic tourist guide should be of two kinds. It should combine a conventional user-driven search and navigation mode on the one hand, and a range of automated access modes on the other.
The user-driven modes should include:
• A consultational search mode provided by primary entries
• A navigational mode provided by cross-reference entries, internal links to integrated thematic sections and external links to selected L1⁶ online resources in the user’s native language
The automated modes should include:
• Location-based access (GPS)
• Time-based access to time-dependent information (data filtering)
• Visual access to encyclopaedic data (the so-called augmented reality)
• Participative network access to comments and recommendations (sharing information and geo-tagging)
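Two of the automated modes listed above, location-based and time-based access, can be illustrated with a short sketch. The Python fragment below is an assumption-laden illustration and not a description of any existing tool: the `Entry` class, the `automated_access` function, the one-kilometre default radius and the sample coordinates and opening hours are all invented. The distance is computed with the standard haversine formula.

```python
import math
from dataclasses import dataclass
from datetime import time


@dataclass
class Entry:
    title: str
    lat: float
    lon: float
    opens: time
    closes: time


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (haversine formula), mean Earth radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def automated_access(entries, lat, lon, now, radius_km=1.0):
    """Keep entries that are open now and within the radius, nearest first."""
    hits = [
        (distance_km(lat, lon, e.lat, e.lon), e)
        for e in entries
        if e.opens <= now <= e.closes  # time-based filtering
    ]
    hits.sort(key=lambda hit: hit[0])
    return [e for d, e in hits if d <= radius_km]  # location-based filtering


# Invented sample data: coordinates and opening hours are made up.
ENTRIES = [
    Entry("museum", 41.6520, -4.7240, time(10, 0), time(18, 0)),
    Entry("market", 41.6530, -4.7250, time(8, 0), time(14, 0)),
    Entry("castle", 41.9000, -4.9000, time(10, 0), time(18, 0)),
]
```

In this sketch the user issues no query at all: the device position and the clock alone determine which entries are offered, which is the sense in which such access is automated rather than user-driven.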
6.5.3.4. Functions and data categories
Communicative functions should be supported by L2 equivalents and pronunciation sound files. Cognitive functions should be supported by L1 practical and cultural notes. Operative functions should be supported by L1 instructional notes and by an integrated lexicographic help.
6.5.4. e-Lexicographic Guide for Scientific Text Production⁷
The last model to be presented here is a lexicographic database aimed at generating an e-lexicographic assistant for scientific text production, as is currently being developed by the EILA department at the University of Paris Diderot. The goal of the project, according to its managers (Pecman et al., 2009), is twofold: (i) to build a complete and adaptive tool aimed at providing assistance to scientific text production in English, and (ii) to support the practical training of students of specialized translation:
• conception d’une base de données destinée à recevoir les ressources linguistiques développées sous forme d’un outil d’Aide à la Rédaction de Textes Scientifiques
• modélisation et intégration des ressources développées dans la base de données
• conception des applications pédagogiques portant sur la compréhension et l’acquisition des savoirs scientifiques
• Nous espérons ainsi pouvoir adapter l’outil ARTES pour intégrer ces informations linguistiques et aboutir ainsi à un outil complet, combinant les ressources terminologiques et phraséologiques et un accès aux données à la fois sémasiologique et onomasiologique
[‘
• Design of a database aimed at hosting available linguistic resources in the form of a scientific writing assistant
• Modelling and integration of the resources available in the database
• Design of pedagogical applications aimed at understanding and acquiring scientific knowledge
• We hope to be capable of adapting the ARTES assistant in order to integrate the linguistic information, and thus devise a comprehensive tool combining terminological and phraseological resources, and featuring a semasiological as well as an onomasiological access to the data.
’]
More specifically, the goal is to design a central database prepared for the import and indexing of several thousand terminological records presently stored in other database systems, and to provide advanced search options as well as flexible editing options. Besides terminological records, the database should offer access to documents, instructions, style guides and so on, providing textual and translational assistance to the users. The ongoing work also includes the design of the interface and its specifications. The methodology applied so far, however, has been largely governed by a terminological and linguistic approach, although an ‘adaptive design’ is indeed mentioned as an important issue. The functional aspects of the information tool have not yet been systematically analyzed, nor taken into consideration in the planning process. These are the aspects on which Centlex has been invited to offer its expertise.
6.5.4.1. Modular approach
The latest development of the model involves a proposal for a modular approach based on the MTLF and including an extra-lexicographic functional analysis of the intended user profiles and user situations. The analysis has revealed a large number of distinctive functions to be taken into consideration. This, among other things, poses a challenge to the optimal design of the interface. Without going into details, what is basically proposed is a modular architecture featuring six independent, basic modules:
• three communicative modules for reading, writing and translating situations
• one cognitive module for learning situations (knowledge acquisition)
• two operative modules for pedagogical situations (instructions and exercises), and for editing situations (for teachers, and for students)
Access to data within those six modules will be filtered and prioritized according to the functions assigned to the different user situations and user profiles (four major groups of users with different needs and degrees of expertise), for consultational as well as operative and editing use.
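The proposed architecture, six independent modules with access filtered and prioritized by user profile, can be sketched as follows. This Python fragment is purely illustrative: the four user groups are not named in the text, so the two profiles shown (`student`, `teacher`) and their priority orderings are invented for the example, as are all identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    function: str  # "communicative", "cognitive" or "operative"


# The six basic modules named in the proposal.
MODULES = [
    Module("reading", "communicative"),
    Module("writing", "communicative"),
    Module("translating", "communicative"),
    Module("learning", "cognitive"),
    Module("pedagogical", "operative"),
    Module("editing", "operative"),
]

# Hypothetical priorities: the chapter does not list the four user groups,
# so these two profiles and their orderings are invented for illustration.
PRIORITIES = {
    "student": ["writing", "translating", "reading", "learning", "pedagogical"],
    "teacher": ["pedagogical", "editing", "learning"],
}


def modules_for(profile):
    """Return the modules a profile may access, highest priority first."""
    by_name = {module.name: module for module in MODULES}
    return [by_name[name] for name in PRIORITIES[profile]]
```

Because the modules are independent, only the priority table would need to change when a user group is added or its needs are reassessed; the modules themselves stay untouched.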
6.6. Conclusion
As stated at the beginning of this article, the importance of future research in the field of lexicographically designed information tools is one of the key points in the research vision and mission of Centlex. The perspectives for the future are twofold. On the theoretical side, the main idea is to push forward the fundamental theories that will govern the implementation of a lexicographically designed, needs-adapted information and data access in other types of information tools that can benefit from such a design. It has been argued that this work must be carried out in accordance with the functional axioms of lexicography as a science of ergonomic information tools, but that the main prerequisite is to work for a re-categorization of lexicography and a change of paradigm. On the practical side, the most important mission is to experiment and apply the new findings of theoretical work in the design of innovative, lexicographic information tools.
This article has demonstrated that e-lexicography has truly revealed and generated the need for a shift of paradigm, and has simultaneously given the new lexicographer, with computing technologies, the means to offer real innovation. The case has been made for adaptive functional solutions in connection with data selection, presentation and access, and a modular approach, which could be compared to the design of multifunctional Swiss Army knives, has been proposed. Just like the specific and unique tools of the universal Swiss Army knife, modules must be specific and independent, but the rules governing their interaction must be functional. Finally, for users on the move, the case has been made for the development of a new generation of mobile lexicographic tools exploiting the advances of modern computer technologies.
It is true that (language) dictionaries have, for better or worse, been the subject matter of lexicography for centuries, but it is only a small part of the truth. Dictionaries, regardless of their intended users and their functions, only contain and realize, in their own inspired way, some of the general principles that govern lexicography as an interdisciplinary social and information science. The new science of lexicography is devoted to the development of unique, functional tools to match and satisfy a great variety of needs for information and experience in modern human societies. Its future lies in a change of paradigm, from linguistics into information science.
Notes
1. The theory has been developed in Denmark at Aarhus University, Aarhus School of Business and Social Sciences and is also internationally renowned as the Aarhus School of Lexicography. The concept of lexicographic functions which lies at the core of the theory must not be mistaken for the concept of dictionary functions (Wörterbuchfunktionen) developed by the German meta-lexicographer Herbert E. Wiegand (2001).
2. Centlex is a research centre at Aarhus University, Aarhus School of Business and Social Sciences. It can be contacted at: www.asb.dk/article.aspx?pid=895
3. Dawson Books is Europe’s largest supplier of academic books, ebooks, shelf ready services and information systems to the University and FE markets. Its online database can be retrieved from: www.DawsonEnter.com
4. These works encompass monolingual, bilingual and multilingual dictionaries and represent a large number of languages and a broad variety of dictionary functions: documentation, communication and cognition, pedagogical applications and so on.
5. To complete the picture, at the risk of adding to the confusion, J. Pruvost (2006) has added meta-lexicography (the scientific study of dictionaries) as the third component of ‘lexicography’.
6. L1 being the user’s native language, and L2 the user’s foreign language.
7. The goals of the ARTES (Aide à la Rédaction de Textes Scientifiques) project can be downloaded from www.eila.univ-paris-diderot.fr/recherche/clillac/ciel/index
Chapter 7
From Data to Dictionary
Sandro Nielsen, Richard Almind
The world does not contain any information. It is as it is. Information about it is created in the organism through its interaction with the world. To speak about the storage of information is to fall into a semantic trap. Books or computers are parts of the world. They can yield information when they are looked upon.
Illich 1973: 101
7.1. Introduction
Online dictionaries are integrated parts of a world that is constantly changing its reality. The first online dictionaries were based on existing printed dictionaries firmly grounded in text linguistics, and the second generation was based on specially developed databases, with input from linguistics as well as computer science. However, these dictionaries do not fully meet users’ need for help and knowledge, because they are seldom made to satisfy specific user needs in specific types of situation in which users consult dictionaries to find help to solve specific types of problem. For lexicographers to satisfy these needs in the best possible way, it is necessary to re-assess the practical and theoretical foundations of online lexicography in light of the electronic options available to produce targeted reference tools and the advances described in the recent literature.
The project referred to as the Accounting Dictionaries illustrates a combined approach to making electronic dictionaries. At present, the project consists of a set of two monolingual and three bilingual online dictionaries, with the languages Danish, English and Spanish – a Spanish dictionary and a Spanish–English dictionary are in the pipeline. The theoretical basis underlying the project is not text linguistic but gives priority to lexicographical functions, that is, the help these dictionaries can give to users in specific types of situation where users require knowledge to resolve issues relating to accounting.
The dictionaries are designed to satisfy certain types of user need through the careful selection of data and specific needs-adapted, technical options for accessing data so that the answers to questions match user needs in different contexts. We first present and discuss the practical, technical basis of the project, which is made up of two distinct components: database and dictionary. Then we discuss the theoretical framework and finally show that the theoretical and practical bases are integral features of the project in that neither can stand alone.
7.2. Distinguishing between Dictionaries and Databases from a Practical Perspective
Databases and dictionaries are two different things. Many people will find this obvious but when discussing online reference works such as the Danish Accounting Dictionary, many lexicographers quickly forget the distinction and jump to strange conclusions, often mixing terminology from linguistics and computer science, thereby confusing the matter. Databases are vessels that contain data and nothing else. They have no functionality per se. Data are stored in discrete fields defined to contain specific types of data, for instance, numbers, dates, alphanumerical strings and so on. Since most of the data elements in lexicographically relevant databases are of the type text and therefore rather uninteresting from a computational point of view, this paper describes the data elements using lexicographical terminology such as collocation, example, synonym and so on, and the focus is on how data elements relate to each other, which is much more interesting.
To gain a better understanding of this, it is necessary to stress that the most important thing in a database is to avoid redundancy. This means that the amount of duplicate data must be reduced to a minimum. Examples, for instance, can be reused and addressed to several specialized terms. Without a database each example would be typed into a fixed place in all the articles concerned (hard coding), and each copy would be a separate entity. If an error arises in one of these fixed places, the correction of that error must be carried out and verified at all other places, that is, in all the relevant articles. In a database, a specific example only exists virtually on the computer and can be related to many terms. Changing it in one place changes it everywhere it is being used. Relations can under certain circumstances be made automatically, but this is not recommended.
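The point about redundancy can be made concrete with a minimal sketch. The fragment below uses Python's standard `sqlite3` module; the table and column names and the sample sentence are assumptions invented for the illustration and do not reproduce the schema of the Accounting Dictionaries. One example sentence is stored once and related to two terms, so a single correction reaches both articles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE term (id INTEGER PRIMARY KEY, lemma TEXT NOT NULL);
    CREATE TABLE example (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
    -- Link table: one stored example can be addressed to many terms.
    CREATE TABLE term_example (
        term_id INTEGER NOT NULL REFERENCES term(id),
        example_id INTEGER NOT NULL REFERENCES example(id),
        PRIMARY KEY (term_id, example_id)
    );
    INSERT INTO term VALUES (1, 'asset'), (2, 'balance sheet');
    -- The sample sentence is invented and deliberately contains a typo.
    INSERT INTO example VALUES (1, 'The asett is shown in the balance sheet.');
    INSERT INTO term_example VALUES (1, 1), (2, 1);
""")


def examples_for(lemma):
    """Collect the examples related to a term through the link table."""
    rows = conn.execute(
        """SELECT e.text FROM example e
           JOIN term_example te ON te.example_id = e.id
           JOIN term t ON t.id = te.term_id
           WHERE t.lemma = ?""",
        (lemma,),
    )
    return [text for (text,) in rows]


# Correct the typo once; both articles pick up the corrected example.
conn.execute(
    "UPDATE example SET text = ? WHERE id = 1",
    ("The asset is shown in the balance sheet.",),
)
```

With hard-coded articles the same correction would have to be made, and verified, once per article in which the sentence appears.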
Although relations, virtuality and avoidance of redundancy are important features, the most interesting part is the interdependency of data. Typically, grammar cannot exist without, that is, is dependent on, a term; examples cannot exist without, that is, are dependent on, a definition; and so on. In order to create such dependencies, data elements are placed into tables, for instance a table for collocations, one for examples and so on. These fields are related hierarchically to each other and form paths along which queries can travel to retrieve data. Data retrieval then becomes a matter of querying the database in a predetermined order and collecting data as the query moves along the relational paths. Data elements that are not related cannot be retrieved without breaking rules, thereby putting the integrity of the database as a whole at risk by disrupting the coherence of the data. Lexicographers often see this stringency as a restricting factor that leads to many heated discussions, but complex databases cannot be maintained without it, especially when more than one editor is involved. In a relational database there is no room for exceptions: either the lexicographical element is coded into the relational hierarchy or it is omitted entirely. Usually, the process of creating a relational hierarchy reveals serious flaws in lexicographical instructions and, although painful, correction of those flaws always leads to more coherent dictionaries.
Users who wish to retrieve data will initiate queries that reflect their need for help. All data elements in a query and its result are interchangeable. Depending on the user’s need, some elements will be included and others will not. In some cases, the data elements queried will be displayed in the result and in others they will not, but quite often an element will be displayed which was not included in a query. The data displayed are called a data set. This set is formatted into one or more articles and each article contains data that are in some way related to all the other data in this particular article but which can be related to other articles as well. At this point, when the result is formatted and displayed to users, the data cease to be a part of the database and become the dictionary. In essence, a database is a very specialized and sophisticated text corpus.
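The dependency hierarchy and the idea of a query travelling along relational paths can be sketched as follows. The fragment again uses Python's standard `sqlite3` module; the schema, the placeholder data and the functions `article` and `add_orphan_example` are assumptions made for this illustration, not the actual database design. An example depends on a definition, which depends on a term, and the formatted result of the query is the article shown to the user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE term (id INTEGER PRIMARY KEY, lemma TEXT NOT NULL);
    CREATE TABLE definition (
        id INTEGER PRIMARY KEY,
        term_id INTEGER NOT NULL REFERENCES term(id),
        text TEXT NOT NULL
    );
    CREATE TABLE example (
        id INTEGER PRIMARY KEY,
        definition_id INTEGER NOT NULL REFERENCES definition(id),
        text TEXT NOT NULL
    );
    -- Placeholder data only.
    INSERT INTO term VALUES (1, 'asset');
    INSERT INTO definition VALUES (1, 1, '<definition of asset>');
    INSERT INTO example VALUES (1, 1, '<example sentence>');
""")


def article(lemma, show_examples=True):
    """Follow term -> definition -> example and format the data set."""
    lines = [lemma]
    definitions = conn.execute(
        "SELECT d.id, d.text FROM definition d "
        "JOIN term t ON t.id = d.term_id WHERE t.lemma = ? ORDER BY d.id",
        (lemma,),
    ).fetchall()
    for definition_id, definition_text in definitions:
        lines.append("  " + definition_text)
        if show_examples:
            examples = conn.execute(
                "SELECT text FROM example WHERE definition_id = ? ORDER BY id",
                (definition_id,),
            ).fetchall()
            lines.extend("    e.g. " + text for (text,) in examples)
    return "\n".join(lines)


def add_orphan_example():
    """Try to store an example whose definition does not exist."""
    try:
        conn.execute("INSERT INTO example VALUES (2, 99, '<orphan>')")
    except sqlite3.IntegrityError:
        return False  # rejected: no room for exceptions in the hierarchy
    return True
```

The `show_examples` switch stands in for the user's need: the same stored data yield different articles, and an element that cannot be placed in the hierarchy is refused rather than stored on the side.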
The Danish Accounting Dictionary originally consisted of a monolingual Danish dictionary and a bilingual Danish–English dictionary containing primarily terms from the International Financial Reporting Standards (IFRS). The underlying databases held no particularly interesting challenges from a programmer’s point of view. Designing the editorial system was very straightforward: based on the usual lexicographical set of instructions, it adopted a traditional two-tier approach, with lemma and definition on top and grammar, synonyms, antonyms, collocations, examples and so on, on level two. Alterations to this two-tier setup were made for purposes of the bilingual dictionary. After the entry of data had been completed, the database was inverted to form the basis for an English monolingual dictionary and an English–Danish bilingual dictionary. At this point, two separate databases existed. The editors intended the English part of both databases to be synchronous but since that was a manual task, synchronization gradually slipped. However, the core structures of the databases were basically in order and data had been added and updated in parallel, although many decisions made for the structure of the English–Danish version did not exist in the Danish–English version, which caused problems later on. Furthermore, the databases had suffered under a lenient data policy from the beginning and the inversion only made this worse. Despite these shortcomings, the databases worked well as repositories for the original Accounting Dictionaries for several years.
In 2008, Pedro Fuertes-Olivera approached the Centre for Lexicography with a request to create a Spanish variant of the Accounting Dictionaries. His idea was to copy the concept and create a copy of the structures, with Spanish replacing Danish. The idea would have had much merit had it not been for the problems mentioned above. Instead, a copy of the English–Danish core was converted into an English–Spanish version in 2009, with the promise of finding a more viable solution. Now three separate databases existed that had the English database in common, although they were independent of each other and still had to be synchronized manually. The lexicographers in charge decided that a complete conversion of the databases was necessary and all further editing, apart from the Spanish data, was halted. The conversion had to be planned and constructed with an open-ended database structure in mind. The structure had to accommodate not only the Spanish addition but also any other language that may be added in the future. For various reasons, the structure is Anglo-centric and the possibility of bypassing the English component to form, for instance, a Danish–Spanish relationship is not part of the plan. The resulting system of databases will become a hub-and-wheel structure like the one illustrated in Figure 7.1.
[Figure: hub-and-wheel diagram with the labels English, Danish, Spanish, L4, L5 and Ln.]
Figure 7.1: The intended structure of a complex set of accounting dictionaries.
The lexicographers expected the conversion process to last one year but we are only now, after two years of re-programming and conversion, reaching the point where the relational hierarchy can be said to be coherent, although some flaws remain. Much manual consolidation is still taking place but at the time of writing (August 2010), the editing of English and Danish data has been resumed. At present, we are finalizing the Spanish–English bilingual and Spanish monolingual structures and the database already contains the Spanish–English data, but the work on editing tools is still in progress. Conversion of the databases has been a frustratingly difficult and lengthy process and many good ideas had to be postponed or discarded. Usually, when designing a database the concept of ‘normalization’ takes precedence over any other decision. With a database that is already designed and where such decisions have been secondary, it is nearly impossible to find satisfactory solutions without the loss of data. In consequence, no normalization can take place before the conclusion of labour-intensive control and cross-checking of data integrity. Sandro Nielsen has just completed manual consolidation of the monolingual English and Danish data and is currently consolidating the data for bilingual consistency. Once the Spanish part is released for editing, we will have the following monolingual structures in place (Figure 7.2). As Figure 7.2 shows, the monolingual relational structures are hinged on the accounting term, with grammar taking precedence over definition, which is not the case in the bilingual relational structure. The next step is to cross-reference equivalence relations between the existing language pairs Danish–English, English–Danish and English–Spanish. These relations have only recently become stable and have been the single most difficult part of the conversion process. With the new structure in place, consolidation will take place at the level of definitions as shown in Figure 7.3.
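The consequence of the Anglo-centric hub-and-wheel structure of Figure 7.1 can be sketched in a few lines of Python. The function names and the comparison with a fully pairwise design are assumptions added for illustration; the chapter itself only states that every language is linked to English and that a direct Danish–Spanish relationship is not planned.

```python
HUB = "English"


def dictionary_pairs(languages, hub=HUB):
    """Bilingual directions available when every language links only to the hub."""
    pairs = []
    for language in languages:
        if language != hub:
            pairs.append((language, hub))
            pairs.append((hub, language))
    return pairs


def links_needed(n_languages, hub_based=True):
    """Number of language-to-language links that must be maintained."""
    if hub_based:
        return n_languages - 1  # one link per spoke
    return n_languages * (n_languages - 1) // 2  # every language with every other


PAIRS = dictionary_pairs(["Danish", "English", "Spanish"])
```

Under this reading, adding a further language costs one new link to the hub, whereas a design that related every language to every other would need a new link for each language already present.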
[Figure: three parallel columns for English, Danish and Spanish, each listing, from top to bottom, Lemma, Grammar, Definitions, and then Synonyms, Antonyms, Collocations, Examples, etc.]
Figure 7.2: Relational database structure for monolingual use.
[Diagram: three columns headed Danish Definition, English Definition and Spanish Definition. Under each definition come Lemma and Grammar, followed by Synonyms, Antonyms, Collocations, Examples, etc.]
Figure 7.3: Relational database structure for bilingual use.
For the purposes of the bilingual dictionaries, translation is defined as the process of finding, in response to a user query, the data related to the definition of a given L2 accounting term, which in turn is related to the definition of a given L1 accounting term. For instance, a user’s need to translate a Spanish accounting term will initiate a query that starts at a given data element in the Spanish column. If the preliminary result contains at least one element, the query algorithm resolves the elements’ relations to their respective definitions; at this point a preliminary (Spanish) data set is bundled. These bundled definitions are then isolated and their relations to English definitions resolved. At that point, all data sets are bundled; they cease to be part of the database and exist only virtually. The query algorithm then sorts and formats the data elements in each bundle according to parameters defined by the lexicographers and displays the resulting articles on the screen. These articles are the final dictionary.
With these structures in place and once resources are available again, the lexicographers can begin proper normalization processes. There is, for instance, a dire need to integrate the synonym, antonym and lemma tables into
the grammar table. Many lexicographers find this counterintuitive because they regard the elements as parts of a dictionary. However, these data elements are just lemmata that are related to other lemmata, and placing them in separate tables causes redundancy and complicates editing. The structures of the tables of synonyms, antonyms and lemmata are identical and there is no reason not to combine them; moreover, it makes good sense to combine them with the grammar table as this makes it possible to query grammatical variations in, for instance, collocations and examples. The grammar table then becomes the lemma table and another level of complication has been eliminated from the system. In databases it is necessary to reduce redundancy on as many levels as possible, including the structure itself. Simplicity makes for good systems. Nevertheless, online dictionaries should not merely be discussed from a practical perspective, as a different view may highlight new features.
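The table merge argued for here can be illustrated with a minimal sketch using Python's built-in sqlite3 module. It is not the project's actual schema: the table names `lemma` and `lemma_relation`, the column names and the three sample rows are assumptions made purely for illustration. The point is only that synonyms and antonyms are themselves lemmata, so one lemma table plus one typed relation table can stand in for three structurally identical tables.

```python
import sqlite3

# Hypothetical schema: a single lemma table (carrying grammar data such as
# word class) and one relation table instead of separate synonym/antonym tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lemma (
    id         INTEGER PRIMARY KEY,
    language   TEXT NOT NULL,
    form       TEXT NOT NULL,
    word_class TEXT NOT NULL
);
CREATE TABLE lemma_relation (
    from_id  INTEGER NOT NULL REFERENCES lemma(id),
    to_id    INTEGER NOT NULL REFERENCES lemma(id),
    relation TEXT NOT NULL CHECK (relation IN ('synonym', 'antonym'))
);
""")

# Illustrative rows only; they are not taken from the Accounting Dictionaries.
conn.executemany("INSERT INTO lemma VALUES (?, ?, ?, ?)", [
    (1, "en", "profit", "noun"),
    (2, "en", "earnings", "noun"),
    (3, "en", "loss", "noun"),
])
conn.executemany("INSERT INTO lemma_relation VALUES (?, ?, ?)", [
    (1, 2, "synonym"),
    (1, 3, "antonym"),
])


def related(form, relation):
    """Return the lemmata linked to `form` by the given relation type."""
    rows = conn.execute(
        """
        SELECT target.form
        FROM lemma AS source
        JOIN lemma_relation AS r ON r.from_id = source.id
        JOIN lemma AS target ON target.id = r.to_id
        WHERE source.form = ? AND r.relation = ?
        ORDER BY target.form
        """,
        (form, relation),
    ).fetchall()
    return [row[0] for row in rows]
```

Because every related item is an ordinary row in `lemma`, its grammar data is available through the same join, which is the querying benefit the paragraph above attributes to the merge.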
7.3. Presenting Dictionary and Database as a Triadic Setup
The view of the world generally changes as we change our perspective, and this is equally true for our perception of dictionaries. So far, we have analyzed and described the aspects that constitute the technical foundation of the Accounting Dictionaries from what may be called an insider’s perspective, but it is also appropriate to look at online dictionaries from the outside. When people talk about electronic dictionaries they often have in mind a database that is accessed by users from an interface whose sole function is to give direct access to the dictionary, that is, the database. In other words, the relationship between database and dictionary is a one-to-one relationship and the database is the dictionary. However, the Accounting Dictionaries have a different structural setup. Some of the components are similar to those outlined above, but the dictionaries are best described as a triadic construction in that the structure consists of three main components. First, there is a database containing specially selected data that have been structured in a way that facilitates search and retrieval. Secondly, users will see one or more dictionaries, for example, the English Accounting Dictionary and the English–Danish Accounting Dictionary, which are websites that, strictly speaking, do not contain the lexicographical data, as these are contained in the database and not in the user interface. Thirdly, in order to provide access to the lexicographical data, a search engine is introduced as a mediator between the dictionary (user interface) and the database. This search engine allows users to search for data in the database and from there it retrieves the relevant data for each of the dictionaries and presents the results of searches to users according to their requests.
In this case, there are several dictionaries and one database, and the relationship between database and dictionary is, thus, a one-to-many relationship.
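The triadic setup can be sketched in a few lines of Python. This is an illustration only, not the project's implementation: the dictionary-like tables, the `translate` function, the `Dictionary` class and the sample Danish and English rows are all assumptions made for the example. The search engine resolves a term to its definitions, follows the links to definitions in the other language (the route described for the bilingual structure earlier in this chapter), and returns the bundled result to whichever dictionary asked.

```python
from dataclasses import dataclass

# Component 1: the database (a hypothetical in-memory stand-in).
# Lemmata point to definitions; definitions are linked across languages.
LEMMA_TO_DEFINITIONS = {
    ("da", "aktiv"): ["da-def-1"],
    ("en", "asset"): ["en-def-1"],
}
DEFINITION_LINKS = {
    ("da-def-1", "en"): ["en-def-1"],
    ("en-def-1", "da"): ["da-def-1"],
}
DEFINITION_TO_LEMMAS = {
    "da-def-1": ["aktiv"],
    "en-def-1": ["asset"],
}


# Component 2: the search engine, mediating between dictionary and database.
def translate(term, source, target):
    """Resolve term -> source definitions -> linked target definitions -> lemmata."""
    bundles = []
    for source_def in LEMMA_TO_DEFINITIONS.get((source, term), []):
        for target_def in DEFINITION_LINKS.get((source_def, target), []):
            bundles.append({
                "source_definition": source_def,
                "target_definition": target_def,
                "equivalents": list(DEFINITION_TO_LEMMAS.get(target_def, [])),
            })
    return bundles


# Component 3: the dictionaries, which are interfaces that hold no data.
@dataclass(frozen=True)
class Dictionary:
    name: str
    source: str
    target: str

    def look_up(self, term):
        return translate(term, self.source, self.target)


danish_english = Dictionary("Danish–English Accounting Dictionary", "da", "en")
english_danish = Dictionary("English–Danish Accounting Dictionary", "en", "da")
```

Both `Dictionary` objects draw on the same three tables, which is the one-to-many relationship between database and dictionaries in miniature.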
Describing dictionaries in terms of the above triadic structure has a number of practical and theoretical implications. First of all, the database is the source of several dictionaries. The search engine allows users to make structured searches in structured data in a restricted universe through a specific set of online dictionaries. Moreover, the dictionaries contain, or may contain, several independent components, as the search engine provides links that give direct access to components that support the appropriate functions and use of the dictionary, such as a user guide and a planned subject-field component giving a structured presentation of and introduction to the field of accounting along the lines described in Bergenholtz and Nielsen (2006) and Fuertes-Olivera (2009a). Finally, lexicographers frequently insist that online dictionaries contain macrostructures. This may be true if the traditional structure from printed dictionaries is used, for instance where the dictionary is merely a printed dictionary presented in electronic form, or where the database is the dictionary, but the Accounting Dictionaries do not have any macrostructures in the text-linguistic sense of the word: a lexicographical structure that arranges lemmata in a specific order so that they can easily be found (e.g. Hausmann and Wiegand, 1989: 336). The Accounting Dictionaries have no wordlists in the traditional sense. Instead, they allow users to access data in the database and present the search results on the computer screen – and such a result may be one single article retrieved from the database no matter where its data were actually located. The macrostructure has been replaced by a ‘data presentation structure’ that is supported technically by an output device which arranges the data retrieved from the database according to type, and presents these data in a predetermined order depending on user needs as identified by the type of help sought. 
In order to cope satisfactorily with these practical and theoretical challenges, it is helpful to establish a set of principles that can guide lexicographers.
7.4. The Theoretical Basis Focuses on User Needs
A theoretical basis rooted in text linguistics does not allow lexicographers to fully exploit the options of online lexicographical tools. Therefore, it was necessary to find another framework within which to place the dictionaries and the lexicographers adopted a theoretical foundation that is grounded in a functionally based theory as described by, for example, Bergenholtz and Tarp (2010), Nielsen and Mourier (2007) and Tarp (2010). According to Bergenholtz and Tarp (2010: 30), a lexicographical function may be defined as ‘the satisfaction of the specific types of lexicographically relevant needs that may arise in a specific type of potential user in a specific type of extralexicographical situation’. The types of situation in which dictionaries may be helpful are numerous, so for practical purposes, the Accounting Dictionaries
have two main types of function. Communicative functions provide help where ongoing or planned communicative acts occur and cognitive functions provide help where people want to acquire knowledge. In summary, the functions of the Accounting Dictionaries are to:
• provide help translating accounting texts into and from Danish, English and Spanish
• provide help producing accounting texts in Danish, English and Spanish
• provide help understanding Danish, English and Spanish accounting texts
• provide help acquiring general or specific knowledge about accounting matters in Danish, English (and Spanish)
It should be appreciated that these are just examples of possible dictionary functions, and it is important to realize that they are all related to problems and needs that arise in an extra-lexicographical environment. The needs of users are not abstract, theoretical concepts conjured up by meta-lexicographers, but they are found in real-life user situations involving real people. User situations occurring in the real world are, in fact, unrelated to dictionaries. For example, persons who are writing texts are in an extra-lexicographical environment. At this point they are writers of texts and no more than potential dictionary users. While writing, they come across a problem specifically related to text production and think they can solve it by consulting a dictionary. Once they consult the dictionary, they have moved into the lexicographical environment and are now actual dictionary users. When they have found the answer to their questions, they leave the lexicographical environment and go back to writing their texts in the extra-lexicographical, text-production environment. The user situations that gave rise to the dictionary consultation are completed, and the writers are now potential dictionary users again. The writers may encounter a new text-production problem and have recourse to a dictionary – a new user situation – and this may be repeated several times until the texts have been completed, so that, during the writing process, there was a succession of discrete user situations pertaining to the same, similar or different user needs and the same or different dictionaries.
Having said that, it is necessary to link user situations and user needs to user types. The various competences and levels of competence dictionary users have play a significant role, depending on whether their competences help them or let them down in the specific types of user situation identified above.
Lexicographers can get an idea of the relevant competences by dividing users into general groups, and according to Nielsen (1990: 131) and Bergenholtz and Kaufmann (1997:98–99), it is relevant to distinguish between experts, semi-experts and laypeople. The levels of competence of these three groups indicate the lexicographically relevant user needs and, thus, provide
lexicographers with a workable basis for selecting dictionary functions. The intended user groups of the Accounting Dictionaries can be summarized as follows:
• Translators and language staff
• Accounting experts and semi-experts
• Students and laypersons interested in Danish, English and Spanish accounting matters
The members of these user groups have different factual, linguistic, production and translation competences (cf. the functions listed above) so the dictionaries need to contain data that help users where competences are inadequate. The competences of users may be identified through the profiling of intended users, and this can be done in several ways. Bergenholtz and Nielsen (2006: 285) suggest that one way in which to ensure that users get the help they need is to identify their characteristics by answering a number of questions in a diagnostic checklist, which has been adapted for the Accounting Dictionaries as follows:
• Which language is their native language?
• At what level do they master their native language?
• At what level do they master a foreign language?
• How extensive is their experience in translating between the languages in question?
• What is the level of their general cultural and factual knowledge?
• At what level do they master the special subject field of accounting?
• At what level do they master accounting LSP in their native language?
• At what level do they master accounting LSP in the foreign language?
• At what level do they master production of accounting texts in their native language?
• At what level do they master production of accounting texts in the foreign language?
The answers to the above questions will show the competences of the target group of the dictionary, and enable lexicographers to put data into the dictionary that will help users where the competences are insufficient. It seems reasonable to assume that translators and language staff have a considerable general linguistic competence, a medium to high competence in accounting LSP, a small to medium factual competence within accounting, considerable translation competence, and considerable competence in writing general and specialized texts in their native language and a foreign language. Accounting experts and semi-experts generally have a considerable factual accounting
competence and a small to medium linguistic competence in relation to a foreign language, little or no translation competence, medium to high competence in producing native-language texts, and little or no competence in producing texts in a foreign language. Students and laypeople interested in accounting matters can generally be assumed to have small to medium competence across the board and will therefore share many levels of competence with the two first groups. The result is that the first group will need more data on factual accounting matters than will accounting experts and semi-experts, whereas the second group will need more data that can help them with linguistic, production and translation problems than will translators and language staff. All user groups will need data on factual differences between equivalents and accounting specifics in the foreign culture, as they cannot be expected to have detailed knowledge about these types of particulars. This means that several types of data come into play when lexicographers attempt to help users with asymmetric competence levels.
7.5. Selecting Data for the Accounting Dictionaries Is a Multi-Stage Process
Each dictionary project should establish its own criteria for data selection. This selection encompasses lemmata, whether single-word or multi-word units, collocations, phrases, synonyms, antonyms, examples and any other types of data that support the functions of particular dictionaries. However, before they can embark on the selection of function-supporting data, lexicographers may build one or more balanced and representative text corpora that can provide relevant input for the selection process. Proposals on how this can be done with specialized subject fields have been put forward by Pedersen (1995) and Svensén (2009), who both advocate a multi-stage approach. The first step is to make an external subject classification that identifies the boundaries of accounting against other subject fields to ensure that the corpus contains only accounting texts that are relevant for the dictionary functions, the intended users, their needs and competences. Accounting is a subject field that is affected by other subject fields because there are specific rules for accounting for, for example, financial instruments, insurance contracts and income tax. The relevant texts selected for the Accounting Dictionaries are not domain-specific expert texts on insurance, taxation and so on, but accounting texts dealing with the rules for accounting for those subjects as defined by the International Financial Reporting Standards (IFRSs), with which all companies listed on stock exchanges in the European Union must comply. The next step is to make an internal subject classification that shows the structure of the subject field so that lexicographers can select and present data that reflect the structure of the field of accounting. This was originally limited
to financial accounting, that is, the preparation and presentation of financial statements for use by the general public, but as work progressed, it became clear that it was necessary to include management accounting. This branch involves the preparation and use (including bookkeeping) of accounting data for internal use by business managers and as these activities are directly reflected in the results in financial statements, the two branches of accounting are interrelated. The preparation of the external and internal subject classifications resulted in three language-specific electronic text corpora containing authentic, subject-specific texts of the following types:
• International Financial Reporting Standards (IFRS) in English and their official translations into Danish and Spanish.
• National financial reporting and bookkeeping standards and statutes.
• Financial statements published by international and national companies.
• Information material on financial and management accounting published by international and national accounting firms.
These electronic corpora are made up of publicly available electronic texts, but printed texts unavailable in electronic form were also found helpful, especially explanatory texts. Consequently, the electronic corpora were supplemented by corpora of the following types of printed texts:
• Textbooks on financial and management accounting in the three languages.
• Existing dictionaries covering accounting in the three languages.
A proper internal classification requires a detailed identification of the structure of accounting. For example, balance sheets are part of financial statements and lexicographers should divide the balance sheet into even smaller parts in order to establish the structure of financial statement components. Balance sheets can be divided into two parts, with assets in one, and equity and liabilities in the other. Once they complete this, lexicographers should dig even deeper and identify the smallest structural components of, for example, equity, which can be divided into share capital, other reserves and retained earnings. The internal classification may be illustrated by the following list showing the main components of that section of balance sheets that concerns equity and liabilities:
1. Equity and liabilities
1.1 Equity
1.1.1 Share capital
1.1.2 Other reserves
1.1.3 Retained earnings
1.2 Non-current liabilities
1.2.1 Long-term borrowings
1.2.2 Derivative financial instruments
1.2.3 Deferred income tax liabilities
1.2.4 Retirement benefit obligations
1.2.5 Other long-term provisions
Finally, a terminological classification helps to locate and identify the terms that are found in the structural make-up of financial and management accounting. However, lexicographers can only make an optimal terminological classification by identifying the terminological components of each structural component in, for instance, the balance sheet. This will provide lexicographers with a detailed picture of the structure and terminology of balance sheets. One example of such a terminological classification is the following showing the terminological components of a small section of balance sheets, namely other reserves under equity:
1.1.2 Other reserves
1.1.2.1 Share premium
1.1.2.2 Treasury shares
1.1.2.3 Fair value adjustments and other reserves
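The internal and terminological classifications listed above form a simple hierarchy keyed by outline number. As a rough sketch, and not a description of the tools the lexicographers used, the hierarchy could be held in a flat mapping and navigated with two small helpers; the names `CLASSIFICATION`, `children` and `path` are invented for the example, while the numbers and labels are those given in the text.

```python
# Outline numbers and labels as listed in the text.
CLASSIFICATION = {
    "1": "Equity and liabilities",
    "1.1": "Equity",
    "1.1.1": "Share capital",
    "1.1.2": "Other reserves",
    "1.1.2.1": "Share premium",
    "1.1.2.2": "Treasury shares",
    "1.1.2.3": "Fair value adjustments and other reserves",
    "1.1.3": "Retained earnings",
    "1.2": "Non-current liabilities",
    "1.2.1": "Long-term borrowings",
    "1.2.2": "Derivative financial instruments",
    "1.2.3": "Deferred income tax liabilities",
    "1.2.4": "Retirement benefit obligations",
    "1.2.5": "Other long-term provisions",
}


def children(number):
    """Return the outline numbers directly below `number`, in numeric order."""
    prefix = number + "."
    wanted_depth = number.count(".") + 2
    found = [
        n for n in CLASSIFICATION
        if n.startswith(prefix) and n.count(".") + 1 == wanted_depth
    ]
    return sorted(found, key=lambda n: [int(part) for part in n.split(".")])


def path(number):
    """Return the labels from the top of the classification down to `number`."""
    parts = number.split(".")
    return [CLASSIFICATION[".".join(parts[:i])] for i in range(1, len(parts) + 1)]
```

For instance, `path("1.1.2.2")` places treasury shares under other reserves, equity, and equity and liabilities, which is the kind of structural position a term is meant to receive from the classification.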
These are only examples of how a small part of the subject field of accounting can be divided and sub-divided for selection purposes. The three-step process described above ensures that only relevant texts and data are selected and that the focus is on texts and data that can support dictionary functions. However, it is important to note that internal and terminological classifications need to be made for Danish, English and Spanish, because structures vary. As English covers IFRS, UK and US English, lexicographers must also make internal and terminological classifications for each, because both structures and terminology vary. The selection of data is, to a large extent, based on three electronic and three printed text corpora, one for each language. These corpora have been gradually extended and now each electronic corpus contains more than 4 million words. The largest corpus is the English one as this is the lingua franca of accounting and covers three variants: international English (as used in the IFRSs), British English and American English. Care is taken to extend the corpora on an ongoing basis to keep up with the dynamics of accounting terminology and the development in national and international accounting. However, lexicographers working with text corpora cannot be expected to have covered and found everything relevant – for example, due to lacunae in corpora and lack of factual knowledge – so accounting experts were consulted
in the initial phase of building the corpora and are consulted regularly as work continues. This ensures that, inter alia, relevant terms and collocations are selected, and that definitions and equivalents are factually correct. The result of the selection of lemmata gives an indication of the size of the dictionaries. The English Accounting Dictionary, the Danish Accounting Dictionary, the English–Danish Accounting Dictionary, the Danish–English Accounting Dictionary and the English–Spanish Accounting Dictionary are revised and updated on an ongoing basis and at the time of writing (August 2010), each dictionary contains approximately 6,000 lemmata, which include single-word units as well as multi-word units. Since terms, according to Laurén (1993: 99–100), account for fewer than 20 per cent of the words in specialized texts, the dictionaries give users access to several word classes relevant for the functions described in Section 7.4 above: nouns, verbs, adjectives, adverbs and abbreviations. In order to provide the best help possible, users can search for inflected forms of all the word classes. Furthermore, each dictionary contains more than 20,000 collocations and phrases and between 1,000 and 2,000 examples, and the bilingual dictionaries contain translations of these types of data in order to help users produce, translate and understand accounting texts as well as acquire general or specific knowledge about accounting matters.
The entire process of selecting data for the dictionaries is governed by a single overriding principle. This principle may be referred to as the principle of relevance. For the purpose of lexicographical selection, relevance means the quality of being directly connected with the subject field in question, the function(s) of the dictionary, the types of user situation in which the dictionary is intended to be used, and the various competences of intended users.
Relevance is a fundamental qualitative characteristic as it can be used to distinguish useful lexicographical data from data that is not lexicographically useful, and in this context, useful lexicographical data means data that directly support a lexicographical function. For example, collocations are selected because they are important when producing accounting texts, because they are important when translating accounting texts (and often difficult to translate between the languages concerned), and examples are selected because they specifically show how to write and translate accounting texts as well as provide data for knowledge building. In other words, the lexicographical concept of relevance helps lexicographers decide which texts to include in corpora and which data types to include in dictionaries; and relevance ensures that the data selected are actually connected with the dictionary function(s), and that the data are presented in a way that satisfies user needs. Based on the competences revealed by the profiling of intended users, various data types have been selected for inclusion in the database, as indicated by Figures 7.2 and 7.3, above. The database contains the following data types
and the output unit will present them to users in a way that is structured to the needs of users in specific user situations:
• Lemma
• Grammatical data addressed to lemma (inflection, countability, active and passive forms)
• Equivalent
• Grammatical data addressed to equivalent (inflection, countability, active and passive forms)
• Definition
• Collocations (short and long phrases but no full sentences)
• Examples (full sentences)
• Synonyms and antonyms (addressed to lemma and/or equivalent)
• Source (reference and/or link)
• Grammar note (addressed to lemma or equivalent)
• Usage note (addressed to lemma or equivalent)
• Cross-reference (to relevant data)
• Not recommended, use instead (proscriptive note)
The most complete data set linked to a lemma will contain all these types of data, except the last one. Which data types will be presented on a given consultation, however, depends on the search option selected by users.
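The data types listed above can be pictured as the fields of a single record. The following is a minimal sketch in Python, assuming a hypothetical `Article` class whose field names simply paraphrase the list; it says nothing about how the data are actually stored in the database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Article:
    """One data set addressed to a lemma (field names are illustrative)."""
    lemma: str
    lemma_grammar: str = ""        # inflection, countability, active and passive forms
    equivalent: Optional[str] = None
    equivalent_grammar: str = ""
    definition: str = ""
    collocations: List[str] = field(default_factory=list)  # phrases, no full sentences
    examples: List[str] = field(default_factory=list)      # full sentences
    synonyms: List[str] = field(default_factory=list)
    antonyms: List[str] = field(default_factory=list)
    source: Optional[str] = None   # reference and/or link
    grammar_note: str = ""
    usage_note: str = ""
    cross_references: List[str] = field(default_factory=list)
    not_recommended_use_instead: Optional[str] = None      # proscriptive note

    def populated_types(self):
        """Return the names of the data types that actually hold data."""
        return [name for name, value in vars(self).items() if value]


# Placeholder content: the definition text is deliberately omitted.
article = Article(lemma="deemed cost", definition="(definition text omitted)")
```

Which of the populated types reach the screen is then a separate decision, made by the search option the user selects.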
7.6. Search Options Depend on User Needs
Access to data in the Accounting Dictionaries is linked to user needs. By focussing on the needs of users in various types of user situations, lexicographers can ensure that data are retrieved that satisfy a specific type of need and then presented in such a way that users can easily turn the data into useful information. In order to achieve this goal, the Accounting Dictionaries offer users a number of search options that depend on the dictionary consulted. In a particular type of user situation, users will need help of a specific kind and consult the dictionary that is most likely to help them. Users who consult one of the monolingual accounting dictionaries can elect to search for the following kinds of help:
• Help to understand an accounting term
• Help to produce an accounting text where the expression is known
• Help to find a term where the meaning is known
• Show all data
Users who consult a bilingual accounting dictionary can elect to search for the following kinds of help:
• Help to understand an accounting term
• Help to translate an accounting term
• Help to translate a collocation or phrase
As these two lists indicate, each dictionary has different functions and search options tailor-made for each function. It is thus possible to argue that there are not just three multi-functional monolingual dictionaries, but twelve mono-functional monolingual dictionaries, and that there are not three multi-functional bilingual dictionaries, but nine (soon to be twelve) mono-functional ones.
From the users’ point of view, the Accounting Dictionaries work in a relatively simple way. When they consult one of the dictionaries, users go to the appropriate dictionary website, where they use the search engine that searches the database and retrieves the relevant data. These data are then presented to the user on the dictionary website in a predetermined order. How the search in the database is conducted and how the data are presented depend on user needs. Users who want to know the meaning of the English accounting term ‘deemed cost’ found in an accounting text may consult the English Accounting Dictionary and select the option ‘help to understand an accounting term’. The search engine will then make a targeted search for the term entered in the search box, looking in the database field that contains the data type inflection. This data type covers the canonical form of the lemmata as well as their inflected forms, which allows users to type in inflected word forms in the search box. The search engine will retrieve the data addressed to the search word, which for this particular function includes the lemma, homonym index (if any), polysemy index (if any) and definition. Figure 7.4 shows the search result presented by the output device. The dictionary presents users with data that are intended to help them understand terms found in accounting texts, namely the meaning of the term searched for, which is what the users wanted.
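The count of mono-functional dictionaries follows from pairing each dictionary with each of its search options. A short Python sketch makes the arithmetic explicit; the list names are invented for the example, and the fourth bilingual dictionary is taken to be the Spanish–English one that the chapter describes as being finalized.

```python
from itertools import product

MONOLINGUAL_OPTIONS = [
    "help to understand an accounting term",
    "help to produce an accounting text where the expression is known",
    "help to find a term where the meaning is known",
    "show all data",
]
BILINGUAL_OPTIONS = [
    "help to understand an accounting term",
    "help to translate an accounting term",
    "help to translate a collocation or phrase",
]

MONOLINGUAL = ["Danish", "English", "Spanish"]
BILINGUAL = ["Danish–English", "English–Danish", "English–Spanish"]

# Each (dictionary, option) pair counts as one mono-functional dictionary.
mono = list(product(MONOLINGUAL, MONOLINGUAL_OPTIONS))
bilingual_now = list(product(BILINGUAL, BILINGUAL_OPTIONS))
bilingual_planned = list(product(BILINGUAL + ["Spanish–English"], BILINGUAL_OPTIONS))

print(len(mono), len(bilingual_now), len(bilingual_planned))  # 12 9 12
```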
However, definitions written for accounting experts would be difficult, if not impossible, for the intended users to understand, so the definitions have been written with the identified user competences in mind. This was done in such a way that the definitions fairly represent the meaning of the terms as defined by accounting experts, in order to meet the factual needs of semi-experts. The general principle is that of referential focus as described by, for example, Harris and Hutton (2007: 210–215), which implies that definitions in special domains such as accounting have narrow and specific, as opposed to broad and general, referential foci, in that
they contain the conceptual features that are found in the domain concerned in contextualized communication. Finally, all definitions are written as full sentences using natural language, so that users can easily turn the data into information by a mental process.
Figure 7.4: Definition as help to understand an accounting term.
Important aspects in connection with understanding the meaning of specialized terms are homonymy and polysemy. In order to find the correct meaning of a term, it is imperative that users can identify the correct meaning of words that are spelt identically but have different referential foci. The syntagmatic criterion ‘word class’ is used for treating terms as homonyms, so that the dictionaries clearly distinguish between homographs belonging to different word classes, for example the noun expense and the verb expense. Morphological criteria are generally used in cases of polysemy, so that homographs that can be both countable and uncountable are treated as being polysemous, such as the noun authority (‘power to make contracts on behalf of another’ (uncountable) and ‘governmental agency’ (countable)), and words of the same word class that have different inflectional paradigms are treated as polysemous. In a few cases the determinant is referential focus, for example, where the meaning is subject to jurisdictional constraints: the abbreviation ISA can mean both International Standard on Auditing (IFRS) and Individual Savings Account (UK). Even though definitions are often regarded as help to understand texts, they also contribute to other types of function.
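The interplay of inflected-form search with homonym and polysemy indices can be sketched as follows. This is a toy model under stated assumptions: the `SENSES` records, the numbering scheme and the function names are invented, the lists of inflected forms are ordinary English forms supplied for the example, and the definitions given for expense are placeholders. Only the two glosses of authority are quoted from the text.

```python
from collections import defaultdict

SENSES = [
    {"lemma": "expense", "word_class": "noun",
     "inflections": ["expense", "expenses"],
     "definition": "(placeholder definition, noun)"},
    {"lemma": "expense", "word_class": "verb",
     "inflections": ["expense", "expenses", "expensed", "expensing"],
     "definition": "(placeholder definition, verb)"},
    {"lemma": "authority", "word_class": "noun",
     "inflections": ["authority"],
     "definition": "power to make contracts on behalf of another"},
    {"lemma": "authority", "word_class": "noun",
     "inflections": ["authority", "authorities"],
     "definition": "governmental agency"},
]


def index_senses(senses):
    """Number homonyms by word class and polysemous senses within a word class."""
    by_lemma = defaultdict(lambda: defaultdict(list))
    for sense in senses:
        by_lemma[sense["lemma"]][sense["word_class"]].append(sense)
    indexed = []
    for classes in by_lemma.values():
        for h, group in enumerate(classes.values(), start=1):
            for p, sense in enumerate(group, start=1):
                indexed.append({
                    **sense,
                    "homonym_index": h if len(classes) > 1 else None,
                    "polysemy_index": p if len(group) > 1 else None,
                })
    return indexed


def understand(search_word, indexed):
    """Search the inflection field; return lemma, indices and definition."""
    word = search_word.lower()
    return [
        (s["lemma"], s["homonym_index"], s["polysemy_index"], s["definition"])
        for s in indexed
        if word in s["inflections"]
    ]


INDEXED = index_senses(SENSES)
```

Searching for "authorities" reaches only the countable sense, because the uncountable sense has no plural form in its inflection list, whereas searching for "expense" returns both homonyms.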
Figure 7.5: Data presented to give help to produce accounting texts where the expression is known.
In a different kind of scenario, users might need help to write texts in English and need to know how they might use the term ‘deemed cost’. In this case, they would select the option ‘help to produce an accounting text where the expression is known’ as they have identified the term they need help with. The search engine would search the data fields for the following three types of data: inflection, collocation and example. As a result of the search, the following data types addressed to the search word will be retrieved (as applicable): the lemma, homonym index, polysemy index, definition, inflection, collocations, examples, synonyms, antonyms. Users would then be presented with the result shown in Figure 7.5. The data types in Figure 7.5 all support the function of giving help to write accounting texts in English. The definition is necessary to determine that the word has the right meaning for the context and the grammar data, collocations and examples support the text-production process. It can generally be said that the production of texts involves a planning stage, an execution stage and a finalization stage, and for practical purposes, Nielsen (2006: 49) suggests that lexicographers should focus on providing help in the execution and finalization stages. Rude (2002: 15–16) explains that the execution stage involves the writing of drafts, the design of texts, as well as the revision and editing of texts, and Mossop (2007: 23–28) points out that revision and editing concern checking texts to make sure that grammar and spelling rules are
complied with (copyediting), that terminology is consistent, and that texts are free from errors (revising). These steps collectively concern lexis and syntax, for example, terminology and phraseology; grammar and syntax, for example, coherence and cohesion; and pragmatics, for example, implicatures and presuppositions. Dictionaries whose function is to help with text production should contain data that help users perform the activities covered by these steps of the production process. The two examples above show how the English Accounting Dictionary can assist users in communicative situations, but it also provides help in cognitive situations. Users might want to acquire general or specific knowledge about the accounting concept reinsurer and activate the dictionary’s cognitive function by selecting the option ‘show all data’, that is, all data addressed to the searched lemma. The search engine would make a targeted search in the database in two types of data, namely inflection and definition, and retrieve the relevant data types (as applicable): the lemma, homonym index, polysemy index, definition, inflection, cross-references (including any usage notes, and homonymy and polysemy indices), collocations, examples, synonyms, antonyms, sources (including any links), usage notes, and contrastive notes. The output device would present the data in a predetermined order like that shown in Figure 7.6.
Figure 7.6: Help in cognitive situations providing all data addressed to the search word.
The term ‘reinsurer’ occurs in seven articles and Figure 7.6 shows the article presented at the top of the list. However, it should be noted that the seven articles show users all the data in the database addressed to this particular term and which are intended to help users acquire knowledge. In Figure 7.6, for instance, the definition explains the meaning of the term, which is complemented by the example; and the synonym and antonym help users put the term ‘reinsurer’ in its terminological place in the internal structure of the subject field. By clicking the cross-reference (see also), users are directed to another article where relevant additional data can be found and the item indicating the source of the definition is clickable and will send users to the website where the international financial reporting standard is found. There, users can find more information about the term and thereby gain more knowledge, for example, a definition written for experts. The other six articles all contain the term ‘reinsurer’ in their definitions and provide users with the opportunity to expand their knowledge bases. If users want to use a word which they are unsure about because they only know the meaning but not which exact word to use, the English Accounting Dictionary can help them. Users who are writing accounting texts and want to use the correct word for the meaning ‘making securities available’ can type this phrase into the search box and select the search option ‘help to find a term where the meaning is known’. In this case, the search engine would search the database in the following data types: definition, usage note, synonym, and antonym. Users would then be provided with the following types of data addressed to the search word (as applicable): the lemma, homonym index, polysemy index,
Figure 7.7: Data providing help to find a term where the meaning is known.
definition, inflection, collocations, example, synonym, and antonym. The dictionary would show users the search result as shown in Figure 7.7. This search gives two possible solutions: a noun and a verb that have the meaning the users were looking for. On the basis of the data presented in Figure 7.7, users can then choose the appropriate word and find help in the execution and finalization stages of text production in the form of inflection, collocations and examples. As can be seen, this search looks for the meaning in the definitions (highlighted), and provides users with accounting terms that have the specific meaning required. The first four examples illustrate the use of monolingual dictionaries and it may be appropriate to compare them with searches in bilingual dictionaries. It should be appreciated that searches are conducted in and the results retrieved from one and the same database, whether dictionaries are bilingual or monolingual. Danish users who translate accounting texts might need to know how to translate the English term ‘deemed cost’ and consult the English–Danish Accounting Dictionary. After having typed the term into the search box, they would select the search option ‘help to translate an accounting term’. The search engine would search the database in one type of data: inflection. The output unit would present data addressed to the search word and the equivalent of the following types (as applicable): the lemma, homonym index, polysemy index, language code, definition, inflection, contrastive note, usage note, antonym (with language code), synonym (with language code), collocation (with language note), and examples, all addressed to the lemma; and equivalent as well as contrastive note, inflection, translation of collocations, translation of
Figure 7.8: Data giving help to translate an accounting term.
examples, synonym and antonym, all addressed to the equivalent. Furthermore, there might be cross-references to relevant terms and an indication of a source with a link, if any. Figure 7.8 shows the search result for ‘deemed cost’. This example shows how specific data types can benefit users who translate accounting texts into Danish. In addition to presenting the meaning of the term, the dictionary informs users that the official Danish equivalent of the IFRS term ‘deemed cost’ is anslået kostpris – this is the term used in the official Danish translation of the international financial reporting standard that introduced the concept – and the usage note explains that Danish accountants prefer two other terms: fastsat kostpris and ny kostpris. The two alternatives to the official term were created by Danish accountants before the publication of the European Union translation of the international accounting standard because they needed to be able to communicate about this concept in Danish. The important point is that users need to be made aware of the dynamics of accounting terminology in communicative user situations, so that they can act accordingly and appropriately. English collocations and examples with their translations into Danish are also data that help users to translate accounting texts. Moreover, the data presented in Figure 7.8 benefit users who produce, revise and copy-edit Danish accounting texts based on information and knowledge acquired from reading texts in English. Users might consult the Danish–English Accounting Dictionary because they need help to understand a Danish accounting text and the dictionary provides users with an explanation. If users need help understanding the Danish accounting term resultatopgørelse, they would select the option ‘help to understand an accounting term’. The search engine would search one data type in the database: inflection. 
The data types retrieved and presented in the dictionary are (as applicable): the lemma, homonym index, polysemy index, and definition, all addressed to the lemma; and equivalent(s), language code, usage note, and inflection, all addressed to the equivalent(s). The dictionary would show the following result of the search (Figure 7.9) for resultatopgørelse. The Danish definition in Figure 7.9 clarifies the meaning of the term and the two English equivalents might support this function as they work as synonyms for those who have the necessary foreign-language competence. Note that the English equivalents are marked as international English (IAS/IFRS), American English (US) and British English (UK), respectively. When they translate texts into a foreign language, users often need help to combine terms with other words to form collocations and phrases. Danish translators who have to translate collocations in which the term resultatopgørelse occurs might consult the Danish–English Accounting Dictionary and select the option ‘help to translate a collocation or phrase’. The search engine would search two data types: collocation and example. The dictionary would show the following types of data addressed to the search word (as applicable): the lemma, homonym index, polysemy index, collocations with translations, language code, examples
Figure 7.9: Data giving help to understand an accounting term.
Figure 7.10: Data giving help to translate a collocation or phrase.
with translations, equivalent(s) to lemma, and inflection of equivalent(s). Figure 7.10 shows an excerpt of the relevant data types presented by the dictionary relating to collocations and phrases in which the term resultatopgørelse occurs. The search finds 52 articles in which the term resultatopgørelse is found in a total of 95 collocations, phrases and examples. For the sake of comparison, Figure 7.10 shows an excerpt of the article resultatopgørelse and the dictionary gives users the same equivalents as in Figure 7.9. Another important point illustrated by this article is that all Danish collocations are translated twice to match the two recommended English equivalents, and each translation is clearly marked UK, US and IAS/IFRS, as appropriate. This is important in translation situations because the translation of collocations may affect more than the translation of terms themselves. This is evident from the first example, where the Danish collocation artsopdelt resultatopgørelse translates differently depending on whether it is translated into UK English or US/IFRS English:
artsopdelt resultatopgørelse
income statement classified by nature (US, IAS/IFRS)
profit and loss account classified by type of expenditure (UK)
Like text production, translation can be divided into a planning, an execution and a finalization stage. Fuertes-Olivera and Nielsen (2008: 670) propose that lexicographers should focus on the execution and finalization stages, primarily because they share many elements with the similar stages of text production described above. Following Bell (2000: 20–21) and Nord (2005: 35), the execution stage of translation involves analysis of source-language texts, transfer of the meaning of these texts into target languages, and recoding messages into final target-language texts in terms of lexis, syntax and pragmatics.
Furthermore, translating includes drafting, revision and editing of target-language texts along the lines described by Mossop (2007: 23–28). Consequently, dictionaries whose function is to provide help in translating texts should contain data that help users draft, transfer meaning, revise and edit target-language texts. The above examples show that this is not restricted to data on spelling and inflection, but also includes data on syntagmatic features, which are particularly important where these differ in the two languages involved, as well as differences in English-language variants. Examples 7.4 to 7.10 illustrate how the monolingual and bilingual dictionaries provide different help to users depending on the search option chosen and the function of the dictionary. Furthermore, they also show that the dictionaries contain dynamic data in that the same dictionary may provide different search results if users are looking for help concerning the same term, for example, ‘deemed cost’. This dynamic nature of lexicographical data may be illustrated in connection with the concept of proscription. Bergenholtz (2003) discusses the concept of proscription in lexicography and explains that in a proscriptive
approach, lexicographers do not merely describe language use but recommend specific use in case of alternatives, thereby helping dictionary users explicitly instead of letting users choose among several options. Readers who encounter the term ‘cash generating unit’ in an accounting text and do not know its meaning would get the following result from the English Accounting Dictionary when looking for help to understand a term (Figure 7.11). The data contained in Figure 7.11 explain the meaning of the term searched. However, writers who want to produce accounting texts in which the term ‘cash generating unit’ occurs and consult the same dictionary will select the option ‘help to produce an accounting text where the expression is known’. The search result is shown in Figure 7.12. Figure 7.12 includes the same data as Figure 7.11, but contains an additional proscriptive note informing users who want to write accounting texts that, although the unhyphenated spelling ‘cash generating unit’ exists, they should use the hyphenated variant ‘cash-generating unit’. By clicking on ‘cash-generating unit’, users will be directed to the full article containing collocations, examples and synonyms. When they encounter the term ‘cash generating unit’, readers only need the definition to understand what it means, so proscription is not relevant. When they write accounting texts, writers need guidance on how to use language and therefore proscription is helpful. The decision as to which spelling variant to recommend is made by the lexicographers after analyzing language use in accounting texts and in this case, the result was to recommend
Figure 7.11: Data explaining the meaning of an accounting term.
Figure 7.12: Data containing proscriptive note.
the spelling variant with the hyphen because it is the only one used in the international financial reporting standards, whereas US and UK accounting texts use the spelling with as well as without a hyphen.
7.7. Conclusion
Printed and computerized dictionaries contain data and not information, but lexicographers must collect and present data that dictionary users can readily convert into information. The traditional linguistic and text linguistic approaches to lexicography have serious shortcomings, so dictionaries based on these approaches do not fully satisfy the needs for help and knowledge users have in specific types of situation. One way to address this problem is to re-assess the practical and theoretical foundations of online lexicography in light of the electronic options available to produce well-crafted reference tools. The work on the multilingual Accounting Dictionaries shows that lexicographers should distinguish between the database and the dictionary: the database is not the dictionary, but a repository of structured data, and online dictionaries are in effect search engines that search for structured data in databases, retrieve the relevant data, and present them to users in predetermined ways. There is, thus, a clear distinction between databases and the editorial tools used to manipulate their data and the output facing end-users.
The core is an English database with Danish and Spanish databases related to it through the definitions of terms, thereby creating a set of monolingual and bilingual online dictionaries. Users access the data in these databases through online dictionaries that allow them to make structured searches relating directly to the problems they need to solve. The search engines are designed according to lexicographical functions, that is, the type of help dictionaries can provide in certain types of situation. All dictionaries have both communicative and cognitive functions, but they mainly help users to solve problems in communicative situations such as understanding, producing and translating accounting texts. The monolingual dictionaries also help users acquire knowledge about general or specific accounting matters in cognitive user situations. This theoretical foundation allows lexicographers to design and develop dictionaries that search in structured data sets and then retrieve and present data types explicitly selected because they provide help in specific situations. Users who want to know how to use a specific term or phrase in a text-production situation are presented with data that are different from the data presented to users who want to know what that particular term means in accounting texts. The theoretical foundation and practical implications of this type of dynamic online dictionary allow lexicographers to design dictionaries that satisfy the needs of modern users for practical lexicographical tools.
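The separation between database and dictionary summarized here can be illustrated with a small sketch. It is not the implementation behind the Accounting Dictionaries; it only assumes that each search option names the data fields to be searched and the data types to be presented, as the walkthroughs in this chapter describe. The option keys, the field name proscriptive_note, the lookup function and all article contents are hypothetical placeholders, and the field lists for the option ‘help to understand a term’ are a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchOption:
    """One lexicographical function: where to search and what to present."""
    search_fields: tuple
    output_fields: tuple


# Hypothetical option keys. Field lists follow the chapter's descriptions
# where it gives them (index fields omitted for brevity).
OPTIONS = {
    "understand_term": SearchOption(
        search_fields=("inflection",),
        output_fields=("lemma", "definition"),
    ),
    "produce_text_known_expression": SearchOption(
        search_fields=("inflection", "collocation", "example"),
        output_fields=("lemma", "definition", "proscriptive_note", "inflection",
                       "collocation", "example", "synonym", "antonym"),
    ),
    "find_term_known_meaning": SearchOption(
        search_fields=("definition", "usage_note", "synonym", "antonym"),
        output_fields=("lemma", "definition", "inflection", "collocation",
                       "example", "synonym", "antonym"),
    ),
}

# Toy database with invented placeholder content, one dict per article.
DATABASE = [
    {
        "lemma": ["cash generating unit"],
        "inflection": ["cash generating unit", "cash generating units"],
        "definition": ["<placeholder definition>"],
        "proscriptive_note": ["Recommended spelling: cash-generating unit"],
    },
    {
        "lemma": ["placeholder noun"],
        "inflection": ["placeholder noun"],
        "definition": ["<placeholder> the act of making securities available"],
    },
    {
        "lemma": ["placeholder verb"],
        "inflection": ["placeholder verb"],
        "definition": ["<placeholder> making securities available to investors"],
    },
]


def lookup(query, option_name, database=DATABASE):
    """Search only the fields tied to the chosen function and return
    only the data types that the function is meant to present."""
    option = OPTIONS[option_name]
    needle = query.casefold()
    results = []
    for article in database:
        hit = any(
            needle in value.casefold()
            for field in option.search_fields
            for value in article.get(field, [])
        )
        if hit:
            results.append(
                {f: article[f] for f in option.output_fields if f in article}
            )
    return results
```

Under these assumptions, looking up ‘cash generating unit’ with the understanding option returns the lemma and definition only, while the production option returns the same article together with its proscriptive note; the database is identical in both cases and only the option differs.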
Chapter 8
Internet Dictionaries for Communicative and Cognitive Functions: El Diccionario Inglés-Español de Contabilidad¹
Pedro A. Fuertes-Olivera and Marta Niño-Amo
8.1. Dictionaries for the Third Millennium
The investigation into old and new lexicographical solutions usually starts by making a distinction between paper and electronic dictionaries. Nesi (2000), and Bergenholtz and Tarp (2005b), among others, identify electronic dictionaries by referring to CD-ROM, DVD and internet reference works as well as to dictionaries conceived to support the spelling, hyphenation and other functions integrated into text processing programmes. Electronic dictionaries are associated with the advent of the computer that has been used throughout three different lexicographical periods, described by Cerquiglini (cited in De Schryver 2003: 133–34) as (i) computer-assisted (paper) lexicography, (ii) transfer of existing paper dictionary to an electronic medium, and (iii) electronic dictionaries in their own right, conceived afresh for the electronic environment. To the best of our knowledge, most published discussions on electronic dictionaries (for example, De Schryver, 2003) do not take into account the differences between an information database and an information tool. Consequently, they tend to leave key lexicographical aspects out of the discussion. For instance, De Schryver’s (2003) classification does not discuss ‘references to other homepages’, ‘searching with functions’, ‘searching with use of search strings combined after Boolean Operators’, ‘searching with meaning as search string’, ‘searching with associations and hyperonyms as search string’ or ‘searching by using pictures and other illustrations’ (Bergenholtz, this volume). This chapter aims at filling some of the above gaps by discussing the defining characteristics of internet dictionaries (section 3), within a sound and tested lexicographical theory (section 2) that makes room for constructing internet dictionaries in their own right. Within this general theoretical scenario, the
chapter describes some of the lexicographic characteristics of the internet dictionary, El Diccionario Inglés-Español de Contabilidad, and elaborates on two proposals that could be incorporated into internet dictionaries at no extra relevant reference costs (Nielsen, 2008). The proposals offered combine lexicographers’ own texts with the possibilities the internet offers (section 5). Finally, section 6 summarizes the main conclusions drawn.
8.2. Lexicographical Theory for Internet Dictionaries
Although dictionary writing is an activity with a long history, the theoretical foundations on which such an activity is carried out are seldom published, perhaps because lexicography is still unsure of its true nature. Lexicography is not just a theoretical, scholarly endeavour, but is very much influenced by its pragmatic function. Without putting into doubt its applied vocation, a theory of lexicography must go a step forward by giving a theoretical basis to the pragmatic functions of dictionaries and accepting that there is only one and the same lexicographical theory at the highest level of abstraction. This implies the presentation of lexicographically available choices with regard to the way of presenting lexicographical data, dealing with lexicographical functions, providing assistance in various use situations, and evaluating the lexicographical information costs incurred that relate to the effort associated with the consultation of a dictionary, whether these refer to the look-up activities or ‘to the user’s ability to understand and interpret the data presented in the dictionary’ (Nielsen 2009: 34). The so-called function theory of lexicography proposed by Bergenholtz and Tarp (2003, 2004, 2005a; see Tarp, 2008a, for a review) defends the elaboration of a transformative theory of lexicography that not only studies actual existing dictionaries but also influences lexicographical practice by means of indications and guidelines for future conception and implementation. In the case of e-dictionaries, e-lexicography must also make reflections on the characteristics of the internet in order to determine how this technology can assist the processes of dictionary elaboration and consultation.
Tarp (2007, 2008a), for example, claims that lexicography has much to contribute to the information and knowledge society, considering that the very essence of lexicography is its capacity to provide quick and easy access to data from which information needed by different types of users in different types of social situations can be retrieved, and adds that the truly unique thing about dictionaries is ‘the way in which this data is made accessible so users can quickly and easily find the exact data they need. In other words, the concept of accessibility is a key concept in any lexicographical theory claiming to be user-oriented.’ (Tarp 2008a: 101). Tarp’s leximat, the name he uses for the dictionary of the third millennium, aims at agreeing with the true nature of lexicography by
creating lexicographical tools with electronic support that allows the user to interact with the lexicographical resource: A leximat is a lexicographical tool consisting of a search engine with access to a database and/or the internet, enabling users with a specific type of communicative or cognitive need to gain access via active or passive searching to lexicographical data, from which they can extract the type of information required to cover their specific needs. (Tarp, 2008a: 123) Within the functional paradigm, a lexicographical theory for internet dictionaries must focus on two main elements. First, it must take into consideration the wealth of lexicographical knowledge achieved through reflection and practice on aspects such as potential users, users’ information needs, use situation, the access process, lexicographical structures, lexicographical functions and so on. Secondly, it should adapt itself to the advent of the information and knowledge society, especially to the constant development of information technologies. In sum, the construction of internet dictionaries must address known demands such as the theoretical foundations on which dictionary-writing is based, but also new challenges that might imply a thorough shake-up of lexicographical activities with the aim of adapting the construction of dictionaries to the possibilities offered by the internet. Three of them are discussed in the next section.
8.3. The Defining Elements of Internet Dictionaries
Internet dictionaries must not be compared with digitized, machine-readable versions of printed dictionaries, but must be considered as works in their own right, only limited by technological, economical, or practical (i.e. ‘common sense’) constraints. The defining elements here presented are viewed according to the functionalities the internet offers, and the constraints it imposes. By accepting that lexicography revolves around the concept of accessibility, a description of the defining elements of internet dictionaries must be narrowed down to cover quick and easy access. The concept of quick and easy access applies to all types of dictionaries, considering that users do not read dictionaries from one end to the other but consult the specific type of data that cover a specific type of user’s specific type of need in a specific type of use situation. Consequently, the first defining element of an internet dictionary deals with the access process. The biggest problem for a real information society is not lack of access to the needed data, but the fact that data cannot be found, or are found in such great quantity that information stress or information death results, both of which usually force potential users to abandon the search before finding the
results. Bergenholtz and Gouws (2010) address the concept of quick and easy access within accessology, a new discipline that demands empirical data and theoretical considerations ‘dealing with reference needs, reference acts and reference works in general’ (Lew 2008: 116), with the aim of understanding how users really access information sources in order to retrieve the information they need as quickly and successfully as possible. Bergenholtz and Gouws have started their reflections on accessology by, first, presenting and describing the terms needed to discuss the access process and, secondly, by offering some empirical results based on two case studies with Danish and Afrikaans reference sources. Regarding the terminology of the new discipline, Bergenholtz and Gouws describe terms such as information source usage situation, choice of information source, choice of the component of an information choice, choice of the component of an information source, consultation of an information source, search string, search option, situation-oriented access, user type-oriented access, accuracy of the access and the data presentation, combined search strings, access by means of an alphabetical macrostructure, access by means of a systematic macrostructure, index access, search in a part of a component, search route, search step, search speed, and search time. They also claim that an access process contemplates consulting a variety of different types of reference resources and test the hypothesis that dictionaries should offer better access to the data entries than other non-lexicographic reference sources. They focus on the notions search time and search steps and show that success goes hand in hand with well-developed access structures. 
Bergenholtz and Gouws’ new perspective on the access process indicates that the first defining characteristic of internet dictionaries is that they are part of potential multi-information source usage situations on offer, that is, several available information sources that can be accessed fairly quickly and easily within manageable time and place constraints by using the internet. This element reinforces the proscriptive nature of internet dictionaries, as lexicographers must be aware that the internet allows them to select specific data from the potential multi-information source usage situation on offer. For example, El Diccionario Inglés-Español de Contabilidad, in accordance with the accounting dictionaries, is both descriptive and normative: descriptive as it includes various spellings (even misspelt variants that potential users can come across in accounting texts); normative, as it advises the potential user on correctness, and makes recommendations regarding spellings and preferred variants. Recent research has also shown that internet dictionaries are highly conditioned by formal properties that tend to be outside the lexicographer’s control (see Almind, 2005). Following Almind’s (2005: 38) claim that on the internet ‘any or all decisions concerning fonts, font-sizes, colour, background and page size, are void beyond the designer’s browser’, we go a step forward by summarizing (see Table 8.1, below) a list of demands and solutions taken from research by Almind (2005), Bergenholtz and Johnsen (2005), Bergenholtz and Gouws
(2010), Church (2008), De Schryver (2003), Fuertes-Olivera (2009b, 2010c), Heid (2009), Lew and Doroszewska (2009), Nielsen and Mourier (2007), Ooi (2008), Prinsloo (2005), Storjohann (2005), Tarp (2007, 2009a), and Verlinde and Binon (2009). We have focused on demands for manageable costs and solutions already in operation (i.e. they are used in some lexicographic projects), as we believe that the second defining element of internet dictionaries is the fact that the demands a user makes towards a functioning internet dictionary are not only influenced by lexicographical theory but also by formal properties constrained by internet technology and reference costs, that is, the amount of money a potential user is set to pay (Table 8.1).
Demand
Solution
An internet dictionary must be easy to find
Make sure your dictionary has a simple internet address, still in operation.
It must have smart searches
Make sure it has several routes to the data included and brings together related items. Heid (2009), for example, proposes a model based on relational databases and typed formalisms.
It must strengthen interactivity
Make sure users can contact editors and, if possible, other users. Also they can have the possibility of being guided into different search possibilities. Tarp (2009a: 54) comments that Bergenholtz and colleagues have pioneered the way ahead by providing users ‘with the option of going through an interactive phase before being guided to the respective dictionary articles’.
It can be part of a reference portal
Make sure that it is included in the right portal.
It must use an elegant and pleasant layout
Use user-friendly colours, fonts, font-sizes and so on. Also divide the page into a static part (for example the part for header, footer, and navigational menu), and the dynamic data part.
It must have audio and record facilities.
Provide your dictionary with audio and record facilities.
It must use a familiar and reassuring virtual environment
Be sure that the screen does not contain ‘extra elements’ such as advertisements.
It must allow the retrieval of data in different formats
Make sure users access written data together with graphics, sounds, animation and so on, if necessary. For example, pedagogically oriented dictionaries connected with teaching/learning materials, games and so on.
It must be provided with extra software
Make sure that it allows links to other functionalities. For example, that it allows the copy-and-paste facility.
9781441128065_ch08_finals_txt_print.indd 172
7/6/2011 11:06:51 PM
Internet Dictionaries
173
Demand
Solution
The search field must be the centre of attention
Let the search field be visually centred near the top of your home page.
It must have readable articles
Use colours sparingly and keep the ‘good’ colours for important data. Black on white is good. Red on blue is not. Contrast is your friend. Put each piece of information on its own line and use headings prudently.
It must be regularly updated and corrected
Be sure that the date in which the dictionary is updated and corrected is visible. This is particularly a must in specialized lexicography. For example, users must be sure that recent modifications of, say, IAS/IFRS have been included in the dictionary.
It must offer instant, simple, and reduced numbers of results
Almind (2005) recommends that we keep small databases and web pages, but large servers, and that the dictionary should show only a limited number of results. If a search returns many results, we should allow users to redefine the search.
It must use advanced searches
Let users find alternatives to an alphabetically sorted list. A possibility is using proper headings for synonyms, antonyms, collocations, examples, and similar parts of an article that will allow the user to refine his or her search by searching both lemma and specific elements such as collocations or by searching within the found set of articles.
It must display the results logically
Is an alphabetical sorting really informative? How about sorting by word length or by relevance? Consider giving alternatives to the default sort especially when displaying the results of advanced searches.
Table 8.1 Demands and possible solutions related to formal properties of internet dictionaries.
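Two of the demands in Table 8.1, a reduced number of results and alternatives to the default alphabetical sort, can be illustrated with a minimal Python sketch. The function names, the ranking rule used for ‘relevance’ and the sample lemmas are assumptions made for this illustration; they do not describe how any of the dictionaries discussed here is implemented.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sort keys: each returns a tuple used to order the hits.
def by_alphabet(lemma: str, query: str) -> Tuple:
    return (lemma.lower(),)

def by_length(lemma: str, query: str) -> Tuple:
    return (len(lemma), lemma.lower())

def by_relevance(lemma: str, query: str) -> Tuple:
    # Assumed ranking: exact match first, then lemmas that begin with
    # the query, then everything else; shorter lemmas come first.
    l, q = lemma.lower(), query.lower()
    if l == q:
        rank = 0
    elif l.startswith(q):
        rank = 1
    else:
        rank = 2
    return (rank, len(l), l)

SORT_KEYS: Dict[str, Callable[[str, str], Tuple]] = {
    "alphabet": by_alphabet,
    "length": by_length,
    "relevance": by_relevance,
}

def show_results(hits: List[str], query: str,
                 sort: str = "alphabet", limit: int = 10) -> Tuple[List[str], bool]:
    """Return at most `limit` hits plus a flag telling the interface
    to invite the user to redefine the search."""
    key = SORT_KEYS[sort]
    ordered = sorted(hits, key=lambda lemma: key(lemma, query))
    return ordered[:limit], len(ordered) > limit

# Illustrative lemmas only.
HITS = ["petty cash", "cash flow statement", "cash", "cash account"]
shown, refine = show_results(HITS, "cash", sort="relevance", limit=3)
```

When the flag is true, the interface would show the short list and offer the user a way to narrow the search rather than display every hit.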
Most of the above demands and solutions are present in internet dictionaries such as El Diccionario Inglés-Español de Contabilidad:
• it is easy to find at www.accountingdictionary.dk/;
• editors can be contacted easily through e-mail;
• it will be integrated into the portal for the accounting dictionaries at www.ordbogen.com/ordboger/regn/index.php?dict=a007;
• it uses an elegant and pleasant layout. For example, it divides the page into a static part and a dynamic part;
• it uses a familiar and reassuring virtual environment;
• the search field is at the centre of the screen;
• it has readable articles;
• it is regularly updated and corrected;
• it displays the results logically;
• it is equipped with smart searches that retrieve different types of data. The search engine offers four different options: ‘is’, ‘begins with’, ‘ends with’, and ‘contains’, each retrieving a different number of articles (Table 8.2) that can be used for satisfying different needs.
In sum, the second defining element of internet dictionaries focuses on formal properties constrained by available internet technology and economic considerations. As always happens in the economic system we are living in, economic costs tend to decrease once a particular technology is widespread. Hence, this second defining element must be re-analyzed at the start of every new lexicographical project.
Research has also insisted on the distinction between information databases and information tools. This distinction allows lexicographers to introduce new search criteria. Tarp (2009a), for example, comments on the introduction of new search criteria in Ordbogen over Faste Vendinger, an internet dictionary developed by Bergenholtz and colleagues at the Centre for Lexicography (Aarhus School of Business). Tarp (2009a: 57) claims that this dictionary provides users with the option of going through an interactive phase before being guided to the respective dictionary articles, and adds that the result is ‘dynamic articles including different types of data that are structured in different ways according to each type of search criteria’. Bergenholtz (this volume) mentions the accounting database as an example of what we must expect in e-lexicography, that is, in what this chapter refers to as lexicography for the third millennium, when paying attention to the difference between an information database and an information tool. He comments on some of the options available to the users of the English–Danish accounting dictionary, who can retrieve sets of data that are adapted to the function(s) they need: (i) to understand and translate an accounting expression; (ii) to translate an accounting expression or use it in Danish; (iii) to translate a word
Search option                   is    begins with    ends with    contains
Number of articles retrieved    3     4              24           54
Table 8.2 Number of articles retrieved for benefit by search option in the Diccionario Inglés-Español de Contabilidad
combination. In other words, users only retrieve the data necessary to carry out the function demanded. If users want to understand and translate an accounting expression (for instance, cash flow statement), they retrieve grammar information (i.e. a noun that is countable), pragmatic information (it is an international accounting term shown by means of the label IAS/IFRS), and conceptual information that is offered by means of a definition and a Danish equivalent with its grammar data (example 1):
Example (1) Dictionary entry retrieved for understanding and translating an accounting expression
cash flow statement noun IAS/IFRS
Definition The cash flow statement must, as a minimum, show the cash flows for the period classified by operating, investing and financing activities. Furthermore, the cash flow statement must show changes in cash and cash equivalents for the accounting period and the cash and cash equivalents as at the beginning and end of the period.
Equivalent aktieninvestering noun
If users want to translate the accounting expression cash flow statement or use it in Danish, they also retrieve several English collocations translated into Danish, an English example also translated into Danish, and hyperlinks (i.e. cross references) to related dictionary entries (example 2):
Example (2) Dictionary entry retrieved for translating an accounting expression or using it in Danish
cash flow statement noun IAS/IFRS
Definition The cash flow statement must, as a minimum, show the cash flows for the period classified by operating, investing and financing activities. Furthermore, the cash flow statement must show changes in cash and cash equivalents for the accounting period and the cash and cash equivalents as at the beginning and end of the period.
Equivalent aktieninvestering noun
Collocations cash flow statement per quarter pengestrømsopgørelse pr. kvartal
(...)
Examples Investing transactions that do not require the use of cash and cash equivalents are excluded from the cash flow statement. Investeringstransaktioner, der ikke kræver brug af likvider er ikke indregnet i pengestrømsopgørelsen.
See also direct method
( . . . )
To these three defining criteria of internet dictionaries, we have to add the option of facilitating access to freely available internet texts and reference materials, such as free multilingual internet dictionaries (for example, Wikipedia, Wiktionary; Fuertes-Olivera, 2009b). Below we will illustrate the working of these characteristics in El Diccionario Inglés-Español de Contabilidad.
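The difference between an information database and an information tool, as shown in Examples (1) and (2), can be sketched in a few lines of Python: the same stored entry yields a different article depending on the function the user selects. The field names and the mapping from functions to fields are assumptions made for illustration. The third function (translating a word combination) is left out because the text does not specify which data it retrieves.

```python
from typing import Dict, List

# One stored entry; values are taken or abridged from Examples (1) and (2).
ENTRY: Dict[str, object] = {
    "lemma": "cash flow statement",
    "pos": "noun",
    "label": "IAS/IFRS",
    "definition": "The cash flow statement must, as a minimum, show the cash "
                  "flows for the period classified by operating, investing "
                  "and financing activities.",
    "equivalent": "aktieninvestering",  # as printed in Example (1)
    "collocations": [("cash flow statement per quarter",
                      "pengestrømsopgørelse pr. kvartal")],
    "examples": ["Investing transactions that do not require the use of cash "
                 "and cash equivalents are excluded from the cash flow statement."],
    "see_also": ["direct method"],
}

# Hypothetical mapping from user function to the fields that are shown.
BASIC_FIELDS: List[str] = ["lemma", "pos", "label", "definition", "equivalent"]
FIELDS_BY_FUNCTION: Dict[str, List[str]] = {
    "understand_and_translate": BASIC_FIELDS,
    "translate_or_use": BASIC_FIELDS + ["collocations", "examples", "see_also"],
}

def retrieve(entry: Dict[str, object], function: str) -> Dict[str, object]:
    """Build the article for one function from the stored entry."""
    fields = FIELDS_BY_FUNCTION[function]
    return {name: entry[name] for name in fields if name in entry}

reception_article = retrieve(ENTRY, "understand_and_translate")
production_article = retrieve(ENTRY, "translate_or_use")
```

The database holds every field once; the tool decides which subset to display, so a user who only needs to understand a term is not shown collocations or examples.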
8.4. El Diccionario Inglés-Español de Contabilidad
El Diccionario Inglés-Español de Contabilidad was compiled at the University of Valladolid by a group of lexicographers, accountants and language experts who follow the lexicographical principles and ideas laid down by Nielsen, Mourier and Bergenholtz in the preparation of the accounting dictionaries (see Nielsen & Almind, this volume), a network of internet dictionaries designed to assist users in learning native-language as well as foreign-language accounting terminology and usage. In sum, this English–Spanish dictionary is a polyfunctional dictionary that helps users to read and understand accounting texts, to produce accounting texts, to translate English accounting texts into Spanish, and to acquire knowledge about accounting matters. It aims at satisfying the needs of three main user types.
Tarp (2009a) claims that the function theory of lexicography provides a set of statements about lexicographic users’ needs which may assist the formulation of a future theory of information and data access in lexicographic works and other text types conceived for consultation and retrieval of information. The most important part of this statement is that users in general never require information in general. Users require a concrete type of information that depends both on the concrete type of user and on the concrete type of situation in which the need occurs. Until now, the function theory has identified four use situations, that is, communicative, cognitive, operative and interpretative situations. Although the need for systematic information has been traditionally satisfied by text types such as books, articles and so on, the function theory has also assumed that the wish to gain new knowledge could arise in social situations
such as a teaching programme and a course of study, when users might wish to learn more about the area of knowledge they are dealing with, and where dictionaries and other reference works could be useful. For example, printed dictionaries aim at satisfying cognitive needs by the inclusion of encyclopaedic information, which is concerned with describing factual knowledge and extra-linguistic reality. More specifically, in specialized printed dictionaries, encyclopaedic information is usually given in encyclopaedic notes in the dictionary articles, encyclopaedic labels addressed to the individual lemmata or equivalents, and independent outside matter components, referred to as systematic introductions, subject-field components, encyclopaedic sections or subject field term systems (Bergenholtz & Nielsen 2006; Fuertes-Olivera, 2009a; Svensén, 2009). With very few exceptions, specialized printed dictionaries could be used for obtaining punctual knowledge but not for accessing systematic knowledge.
Following suit, El Diccionario Inglés-Español de Contabilidad could also be used in cognitive use situations, provided that users are well aware that some of its structures also offer needed factual punctual knowledge in connection with communicative functions such as text production, text reception and text translation: definitions, synonyms, antonyms, the lexicographical arrangements of synonyms and antonyms, pragmatic labels, contrastive notes, cross-references, hyperlinks and search engines. Definitions offer knowledge and are, therefore, necessary for cognitive functions, considering that a defining characteristic of bilingual LSP dictionaries is the impossibility of defining concepts, that is, terms, without drawing on extralinguistic knowledge.
In Example (3), below, the definition in English could be used to gain knowledge by providing users with additional background information such as the possibility of having positive or negative ‘benefits’, an accounting concept that goes against the folk interpretation most non-experts (for example, the primary user group of the dictionary) will make. In sum, definitions contain factual information that is essential in bilingual specialized lexicography, where potential users need a combination of language and background knowledge:
Example (3) Excerpt article from El Diccionario Inglés-Español de Contabilidad
past service cost
Definition Past service cost is the change in the value of pension obligations earned by employees in prior accounting periods occurring in the current accounting period as pension benefits are implemented or changed. Past service cost may be positive if new pension benefits are implemented or existing benefits are improved, but it may also be negative if existing benefits are cut down.
Equivalent coste de servicio pasado
The utility of definitions in cognitive use situations is reinforced by the presence of two more microstructural elements that could also be very helpful as they help to delimit the lemmata in particular contexts: (i) synonyms and antonyms in some entries; (ii) the lexicographical arrangement of homonymous terms, by means of superscripts, and of polysemous terms, by means of Arabic numerals. For example, in Example (4), the definition together with polysemy markers 1, 2, 3, and 4, homonymy markers in the form of superscripts, and synonyms add factual information about a polysemous term: they indicate that account is a term (i.e. 1 in account¹ and account³ refer to the accounting concept of ‘recording’), a semi-term (i.e. 2 and 3 in account¹ indicate two related meanings in the sub-field of trade), and a word (i.e. account² is consideration). These elements delimit the lemmata in particular cognitive contexts, and are exactly what we need for solving a reception problem:
Example (4) Excerpt article from El Diccionario Inglés-Español de Contabilidad
account¹
1
Definition An account is a record in monetary terms of accounting transactions listing items of a similar type on a debit or credit basis. Accounts are part of an accounting system and usually classified according to a specific category, for example, cash account, ledger account, nominal account, contra account, deposit account, and so on.
Equivalent cuenta
2
Definition An account is an arrangement with a firm allowing credit and deferring payments to a later date, usually the end of the month, or a statement of money paid or due for goods or services.
Equivalent cuenta
3
Definition An account is an expression for a regular client or customer who does a large amount of business and has an account with a particular supplier or other enterprise.
Equivalent cuentas
account²
Definition Account means consideration, for example, to give consideration to something when you plan.
Equivalent cuenta
synonyms consideración
account³
Definition To account means to show in the enterprise’s financial statements or bookkeeping records, usually according to specific rules or methods.
Equivalent contabilizar
synonyms anotar
The dictionary also offers labels such as IAS/IFRS, UK and US that correspond to three relevant English varieties for cognitive purposes, as these labels make explicit that accounting is a culture-dependent subject field and that there are differences among three related systems: the American, the British and the International accounting system ruled by the International Accounting Standards Board. If necessary (see example 5, below), the dictionary includes usage notes that explain differences among English varieties and the Spanish accounting tradition. For example, in (5), users are informed that there are accounting traditions where non-voting shares are admitted, a possibility that is ruled out in Spain, where shares can have different economic benefits but not different voting rights:
Example (5) Excerpt article from El Diccionario Inglés-Español de Contabilidad
A share (US)
Definition
usage note In some countries, A shares may be non-voting ordinary shares. Whether A shares are voting or non-voting shares will appear from a company’s articles of association
usage note In Spanish law, shares can have different economic benefits but all of them have the same voting rights
In this dictionary, we have included 261 ‘contrastive notes’, most of which make comments on specificities of the Spanish accounting system, as well as updated information about changed rules. For example, in (6), the dictionary indicates that Spanish law only requires training or degrees for auditors, but not for accountants:
Example (6) Excerpt article from El Diccionario Inglés-Español de Contabilidad
accountant
Definition (. . .)
usage note
Spanish law only requires special training or degrees for auditors and not for accountants.
Finally, the dictionary also includes cross-references to other lemmata that are relevant for comparison and additional knowledge. For example in (7), the label ‘see also’ cross-refers users to share class and B share, which could be used for explaining the existence of different share classes, subject to accounting peculiarities in the different accounting systems described in the dictionary. In El Diccionario Inglés-Español de Contabilidad, 1,390 out of the 6,063 dictionary articles (almost 23 per cent of the lemmata) include internal cross-references that are useful for obtaining factual knowledge about the subject field of accounting:
Example (7) Excerpt article from El Diccionario Inglés-Español de Contabilidad
A share (US)
see also
B share
share class
The dictionary also includes lexicographical structures that are characteristic of internet dictionaries: users are hyperlinked to internal dictionary structures such as ‘English synonyms’, ‘English antonyms’ and ‘English cross-references’, and also to external dictionary structures that link dictionary users to ‘sources’, that is, websites where more factual information can be accessed. The hyperlinks to ‘English synonyms’, ‘English antonyms’, and ‘English cross-references’ have been commented on previously, as these are microstructural components. Table 8.3 shows that these structures are frequently used in this dictionary: 37.67 per cent of the entries include English synonyms to the equivalent, 7.6 per cent of the dictionary articles go with English antonyms, and 22.92 per cent of the lemmas are cross-referred to a lemma that is relevant for comparison and additional knowledge. Table 8.3 also shows a defining characteristic of El Diccionario Inglés-Español de Contabilidad, which leads us to claim that the dictionary is in the forefront
English synonyms                        2,284/6,063 (37.67%)
English antonyms                        461/6,063 (7.60%)
Internal cross-references ‘See also’    1,390/6,063 (22.92%)
External websites ‘source’              174/6,063 (2.86%)
Table 8.3. Number and per cent of hyperlinks distributed per categories
of e-lexicography: it links dictionary articles to external sources in the form of websites where potential users can access data for acquiring extra knowledge. In particular, 2.86 per cent of the dictionary articles have this functionality or internet dictionary structure. For example, in the entry current service cost (example 8), potential users are cross-referred to the EU Single Market, an official website managed by the European Union where potential users can find updated and standardized information on accounting and financial standards, interpretations, and standardized terms in different European languages. In particular, we are linked to IAS 19, an accounting standard on ‘employee benefits’.
Example (8) Excerpt article from El Diccionario Inglés-Español de Contabilidad
current service cost
Definition Current service cost is the increase in the value of pension obligations as a result of services rendered by employees in the current accounting period.
sources IAS 19.
Supposing a user of the dictionary needs more precise information on the term current service cost, he or she could obtain it by clicking on IAS 19, which retrieves accounting standards, interpretations, and different draft texts. By activating the search engine of the PDF text, he or she will retrieve 25 instances of the term in IAS 19. Analysing them may lead the user to obtain additional and more precise information:
• a standardized definition: ‘Current service cost is the increase in the present value of a defined benefit obligation resulting from employee service in the current period.’ When compared with the definition found in the dictionary, this adds key factual knowledge: a current service cost is a kind of cost in a ‘defined benefit obligation’ that must be accounted for in defined benefit plans;
• the steps that must be taken to account for current service costs in a defined benefit plan;
• the actuarial valuation method that shall be used by an entity in order to determine the present value of its defined benefit obligations and related current service costs, and, where applicable, past service costs;
• the way of calculating current service costs using the Projected Unit Credit Method;
• a kind of warning, considering that the current service cost ‘reflects the probability that the employee may not complete the necessary period of service to earn part or all of the benefits’;
• several examples that illustrate the way of accounting for current service costs.
Finally, El Diccionario Inglés-Español de Contabilidad could also be used for gaining punctual knowledge by using the functionality ‘contains’, which retrieves all the articles where the search term is part of the lemma. For example, searching accounting with the search option ‘contains’ retrieves 92 instances. This equates El Diccionario Inglés-Español de Contabilidad with traditional thesauri such as El Tesauro ISOC de Economía, whose subsection 0601 covers contabilidad (Eng: accounting) by offering a hierarchy of the main terms used referring to analytic accounting, financial accounting, and management accounting. In sum, the search option ‘contains’ facilitates the acquisition of knowledge by allowing users to analyze the factual information included in related terms. In a brief span of time, the three user types of this dictionary can gain some knowledge about this key concept.
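The four search options named in Table 8.2 (‘is’, ‘begins with’, ‘ends with’ and ‘contains’) correspond to four simple string tests on the lemma. The following Python sketch shows the idea under stated assumptions: the lemma list is a toy sample, the function name is hypothetical, and nothing here describes the dictionary’s actual search engine, so the counts do not reproduce those reported in Table 8.2.

```python
from typing import Callable, Dict, List

# Toy lemma list for illustration only.
LEMMAS: List[str] = [
    "benefit",
    "benefit plan",
    "employee benefit",
    "defined benefit obligation",
    "past service cost",
]

def search(lemmas: List[str], term: str, option: str) -> List[str]:
    """Return the lemmas matching `term` under one of the four options."""
    t = term.lower()
    matchers: Dict[str, Callable[[str], bool]] = {
        "is": lambda lemma: lemma == t,
        "begins with": lambda lemma: lemma.startswith(t),
        "ends with": lambda lemma: lemma.endswith(t),
        "contains": lambda lemma: t in lemma,
    }
    match = matchers[option]
    return [lemma for lemma in lemmas if match(lemma.lower())]

# Number of hits per option for the toy list.
counts = {option: len(search(LEMMAS, "benefit", option))
          for option in ("is", "begins with", "ends with", "contains")}
```

Because ‘contains’ matches every lemma in which the term occurs, it returns the widest set, which is what makes it usable for browsing related terms in the way described above.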
8.5. The Way Ahead: Increasing Reliability and Systematism
The analysis offered in the previous sections highlights two main findings. First, the elaboration of internet dictionaries must be viewed in terms of the broad concept of accessology that is being developed. This means that we must accept that only internet dictionaries, but not other types of electronic dictionaries, assure quick and easy access to extra-lexicographical data. Secondly, some existing internet dictionaries have explored new ways of presenting data that could be used for acquiring punctual knowledge.
From here, we can envisage the way ahead by exploring two possible developments that can contribute to the elaboration of better and more user-friendly internet dictionaries at no extra cost. In particular, we defend the use of ‘sources’ for communicative functions and the creation of an internet integrated systematic introduction.
Our first proposal consists in increasing reliability by widening the use of hyperlinks in order to minimize or even eliminate the stress users experience when working with texts that are in the forefront of new knowledge, a typical situation when translating, reading, and producing some specialized texts. A case in point occurs when translators must translate recently coined English terms ending in –ing into Spanish. These English terms usually conceptualize processes that are difficult to adapt into Spanish. For example, English marketing has been calqued into Spanish, which has led to the oblivion of the Spanish equivalent mercadotecnia proposed by Spanish normative bodies. In such extra-lexicographical situations, Spanish translators tend to ‘Google’, trying to find a possible translation for the difficult-to-translate English term. In our view, the dictionary should hyperlink the English term, offered as equivalent, synonym, or antonym, to Spanish texts where the English expression is still used. This will increase users’ trust in the dictionary, especially
because this lexicographical measure agrees with the tenets of the function theory of lexicography that proposes the creation of proscriptive dictionaries, and with current translation practices by translators of specialized discourse (Fuertes-Olivera & Pizarro-Sánchez, 2002). Our proposal is illustrated in example (9):
Example (9) Hypothetical entry in El Diccionario Inglés-Español de Contabilidad.
direct costing
Definition Direct costing is a method for fixing the value of the inventory of an enterprise, which is based on recognition of all direct production costs and variable indirect production costs. The fixed indirect production costs are not included as part of the inventory value, but are considered as period costs.
Equivalent sistema de coste directo
Synonyms direct costing
collocations consequences of using direct costing consecuencias de la utilización del sistema de coste directo
Synonyms variable costing
See also
marginal costing
period cost
By hyperlinking direct costing to Spanish texts, the hypothetical entry could be used by users – typically Spanish translators of English accounting texts – to access several key concepts in ‘cost accounting’, as well as their Spanish equivalents in the field. In sum, this proposal enhances the proscriptive nature of El Diccionario Inglés-Español de Contabilidad that also uses the expression ‘not recommended, use instead’ and the symbol → to lead users to the expressions and words recommended by the lexicographers.
Our second proposal starts from Bergenholtz and Tarp’s (1995) seminal work on specialized lexicography, which recommends the construction of systematic introductions in order to offer users data that could be used to gain knowledge about a subject field. Following Bergenholtz and Nielsen’s (2006) proposal for the construction of integrated systematic introductions in specialized dictionaries, our proposal contemplates four measures:
• To write an English and Spanish list of basic accounting concepts. This can be placed on the left vertical menu, accessible through the banner ‘list of basic concepts’.
• To hyperlink the English and Spanish basic terms to respected and well-known free internet texts, if available. For example, cost accounting and contabilidad de costes can be hyperlinked to the Wikipedia.
• To hyperlink the English and Spanish basic terms to the appropriate dictionary entries where culture-dependent information should be included, if needed.
• To hyperlink the English and Spanish basic terms to specially prepared pop-up texts that will be divided into chapters, sections, and the like, each linking users to ‘sources’ (i.e. external texts, when available), to dictionary entries (if specific data must be included), and to other chapters, and/or sections of the prepared text.
The above measures are illustrated with the Spanish basic term contabilidad de gestión (Eng: management accounting). When the systematic introduction is in full operation, clicking on contabilidad de gestión (a basic accounting concept) will cross-refer users to internal dictionary structures (definition, and synonym), sources (two texts prepared by experts in the Wikipedia and a teaching blog), and to a pop-up text prepared by the lexicographers:
(a) Definition and list of synonyms:
La contabilidad de gestión, es un sistema de recogida y elaboración de información para ser utilizada por los usuarios internos, esencialmente por la gerencia, en la toma de decisiones de planificación, control y gestión de la actividad realizada por la empresa
management accounting [hyperlink to the definition given in El Diccionario Inglés-Español de Contabilidad]
a list of synonyms: contabilidad de costes, contabilidad interna, contabilidad analítica.
(b) Source: hyperlinks to two websites where we have found this term explained: http://ciberconta.unizar.es; http://es.wikipedia.org/wiki/Contabilidad_de_costos;
(c) a pop-up text that explains that the contabilidad de gestión, here differentiated from the contabilidad financiera, has specific characteristics and aims:
Example (10) Hypothetical construction of an internet integrated systematic introduction.
Para cumplir sus objetivos la información recogida y elaborada por este sistema contable presenta una serie de características propias entre las que cabe destacar las siguientes: (1) La información obtenida y utilizada tiene diversas fuentes: la proporcionada por la contabilidad financiera, información ex post, que se corresponde con datos reales e históricos y la obtenida ex ante en base a la utilización de datos estimados
previamente para conseguir unos estándares de trabajo deseados o unos objetivos perseguidos. (2) Se va a utilizar principalmente en el ámbito interno (por esta razón también se la denomina contabilidad interna), por lo que no está sujeta a regulación. (3) Es una información segmentada, proporciona datos de la empresa dividida en secciones, departamentos, etc. Los objetivos principales perseguidos por la contabilidad de costes son: (a) La planificación y el control empresarial mediante el cálculo de costes y de resultados de las distintas secciones en las que pueda dividirse la actividad de la empresa (almacenamiento, producción, distribución, administración). (b) Control de la gestión y presupuestos. Permitiendo comparar datos de la empresa en diversos ejercicios o datos de la empresa con los de otras empresas. (c) Control de las tareas o control de ejecución. Ya que las tareas tendrán asignadas unidades físicas estándares de realización (horas de mano de obra, horas de utilización de maquinaria) por lo que podrá realizarse un seguimiento del grado de cumplimiento de las principales tareas. (d) La valoración de los inventarios. Mediante la asignación de costes a las distintas fases del proceso productivo, la contabilidad de costes determinará el coste de producción de los productos elaborados por la empresa.
If necessary, a translation of the Spanish text will be offered in parallel. For contabilidad de gestión this English text is unnecessary as Wikipedia has a well-devised entry for management accounting. In sum, our proposal for increasing reliability and constructing an internet integrated systematic introduction rests on the tenets of the function theory, the main characteristics of the internet, and common sense, as these two proposals can be easily implemented at no significant extra lexicographic cost.
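The internet integrated systematic introduction proposed above amounts to a small linked data structure: each basic concept points to a dictionary entry, to synonyms, to external sources and to sections of a pop-up text. The Python sketch below illustrates one possible shape for such a record. The class, its field names and the two pop-up section titles are assumptions made for this illustration; the term, synonyms and URLs are those given in the example for contabilidad de gestión.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class BasicConcept:
    """One item in the hypothetical 'list of basic concepts' menu."""
    term: str                      # Spanish basic term
    english: str                   # lemma of the linked dictionary entry
    synonyms: List[str] = field(default_factory=list)
    sources: List[str] = field(default_factory=list)         # external texts
    popup_sections: List[str] = field(default_factory=list)  # prepared text

    def links(self) -> List[Tuple[str, str]]:
        """List every hyperlink shown when the user clicks the concept."""
        out: List[Tuple[str, str]] = [("entry", self.english)]
        out += [("synonym", s) for s in self.synonyms]
        out += [("source", url) for url in self.sources]
        out += [("popup", title) for title in self.popup_sections]
        return out

concept = BasicConcept(
    term="contabilidad de gestión",
    english="management accounting",
    synonyms=["contabilidad de costes", "contabilidad interna",
              "contabilidad analítica"],
    sources=["http://ciberconta.unizar.es",
             "http://es.wikipedia.org/wiki/Contabilidad_de_costos"],
    # Section titles are hypothetical labels for the pop-up text.
    popup_sections=["Características", "Objetivos"],
)
all_links = concept.links()
```

Keeping the links as data rather than as hand-written pages means that a new source or a new pop-up section can be added to a concept without touching the dictionary entries themselves.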
8.6. Conclusion
This chapter has reviewed the state-of-the-art of internet dictionaries and has defended the thesis that these reference works must be constructed by adapting current functional approaches to lexicography to the economic and technical constraints imposed by the development of the internet, which has allowed us to hypothesize three defining elements of internet dictionaries: (i) the access process in internet dictionaries allows lexicographers to nudge users into accessing lexicographically unprepared data easily and quickly; (ii) the construction of internet dictionaries is constrained by available internet technology and economic considerations; (iii) the elaboration of internet dictionaries must make room for a key distinction between information databases and information tools.
El Diccionario Inglés-Español de Contabilidad is a typical exemplar of an internet dictionary that has been constructed by paying attention to the above principles. Similarly, our contention is that this dictionary, and the twin dictionaries that we are currently constructing – El Diccionario Español-Inglés de Contabilidad and El Diccionario Español de Contabilidad – must make room for each new development that can be implemented, provided that it agrees with the defining characteristics of the internet, proven lexicographical principles and methods, and common sense, which is presented here in terms of reference costs. For example, the two specific proposals for increasing reliability and systematism can benefit from the above three principles and will therefore reinforce the idea that internet dictionaries can be used satisfactorily in both communicative and cognitive use situations.
Note
1 Our thanks go to the Ministerio de Ciencia e Innovación and la Junta de Castilla y León for financial support (grants FFI2008–01703/FILO and VA039A09, respectively).
Chapter 9
A Dictionary Is a Tool, a Good Dictionary Is a Monofunctional Tool
Henning Bergenholtz
Inger Bergenholtz
9.1. Polyfunctional and Monofunctional Dictionaries and Other Information Tools
It is generally assumed that dictionaries serve as tools; for example, as an aid when one does not understand a word in a text. The customary rule for tools is that, in order to be proper, safe and easy to use, they must be designed for a specific and limited purpose. For example, if you want to take down a large tree in the garden, you buy a special saw for this purpose in a hardware shop. This saw can probably also cut a large beam, but it would hardly be of any use on a very thin sheet of wood you want to cut into three equal pieces. If they are to work well, tools must be monofunctional. Just as there are different types of saws, there are also different types of dictionaries. However, in most cases, dictionaries are not monofunctional, but highly polyfunctional. This is reflected in the typical definitions of the term dictionary given in dictionaries and textbooks. However, the fact that dictionaries are polyfunctional information tools is stated only indirectly, since not only the function but also the individual parts of the tool are listed, comparable to the dozens of small parts of a Swiss Army knife which you can access individually:

A dictionary is a collection of words in a specific language, often listed alphabetically, with usage information, definitions, etymologies, phonetics, pronunciations, and other information; or a book of words in one language with their equivalents in another. (http://en.wikipedia.org/wiki/Dictionary)1

Definitions such as the above do not suggest that all dictionaries are exactly alike or that they all contain exactly the same item types, but that they contain
as many items and item types as possible in order to satisfy the largest number of different user needs. The type of dictionary described in this lexicographic definition is actually a very special dictionary. It is the dictionary that linguists who also masquerade as lexicographers on the side regard as the dictionary. It is a very narrow understanding of ‘dictionary’. A dictionary should help solve communication problems; it should solve cognitive problems only in very few cases.

Two highly polyfunctional dictionary types are concealed behind the definition. The first type contains all sorts of linguistic information (‘usage information, definitions, phonetics, pronunciations, and other information’) as an aid to reception and text production problems, and encyclopaedic information to meet special types of knowledge needs (‘etymologies’). The second type may also contain linguistic information similar to that of the first type of dictionary, but especially equivalents (‘book of words in one language with their equivalents in another’). It is a dictionary that aims to assist, especially with translation problems, but possibly also with the reception of a foreign-language text or with text production in the foreign language.

This is also reflected in the way the two types of dictionaries are advertised. The monolingual dictionary is intended for mother-tongue and foreign-language speakers and can, it is claimed, help solve communication problems of the target language. The same claims are usually also made for bilingual dictionaries, with the difference that they also, and especially, aim to help solve translation problems. Descriptions of or advertisements claiming such polyfunctionality are quite uncharacteristic of almost all other tools, where the special suitability of precisely this product for a specific task is emphasized instead.
To return once again to saws, we can see that there are hand saws and motorized saws; in the latter case, electric or petrol saws. However, the specific purpose for which each individual saw is particularly suitable is always emphasized: large trees for the professional tree feller, chipboard for the hobby woodworker, and thin Masonite sheets for children or grown-ups. When a particular saw is actually purchased, the first and also most important reason for the choice is whether the saw can cut specifically that for which it is bought. The next consideration for the choice of saw will be a mix of two secondary arguments: 1. How much does the saw cost? 2. Is it a familiar brand of a manufacturer known for durable, quality products?

Users of dictionaries probably think and argue in the same way, at least when it comes to the secondary considerations, namely price and quality assurance. We do not know this exactly, yet we do know that those who compile dictionaries and ponder over past and future lexicographic practice do not study existing monofunctional dictionaries – partly because hardly any exist. Only rarely do they discuss concepts for and theoretical considerations about monofunctional information tools. The focus is on tools corresponding to the first or the second part of the above definition of a polyfunctional dictionary.
Almost all of those who call themselves lexicographers, however, are of the view that lexicography is a linguistic discipline. This is remarkable. For the music dictionaries that are the subject of this contribution, there was no need for any co-operation with linguists or linguistic theories. Three types of experts participated in the planning and execution:
1. an expert in lexicography
2. an expert in music theory and the history of music
3. an expert in lexicographical databases

Obviously, there are dictionaries for which the co-operation of linguists is required, for example, in a general language text production dictionary; see Bergenholtz and Gouws, 2010:
1. an expert in lexicography
2. a linguist
3. an expert in lexicographical databases

For some dictionaries, but not for all, the co-operation of linguists is required. If one considers the total number of all dictionaries, the co-operation of expert linguists will be necessary for, at most, 30 per cent of them (Leroyer, 2010). It is therefore remarkable, and only a result of incidental scientific developments – wrong developments, one is tempted to say – that so many people regard lexicography as a linguistic discipline. But it is even more remarkable that only linguistics has a secure scientific status. That lexicography has any status at all is denied especially by British linguists, and particularly tersely in a newly published introduction to the practice of lexicography:

This is not a book about ‘theoretical lexicography’ – for the very good reason that we do not believe that such a thing exists. But that is not to say that we pay no attention to theoretical issues. Far from it. There is an enormous body of linguistic theory which has the potential to help lexicographers to do their jobs more effectively and with greater confidence. (Atkins & Rundell, 2008: 4)

Such a view is not only an expression of British arrogance, it is also simply untrue and a threat to the further development of lexicography, especially since it is typical of the majority of the contributions in the International Journal of Lexicography, which, because it appears in English, has considerable influence, despite its modest theoretical level. If lexicography were denied any scientific status, linguistics would remain a discipline at our universities, but lexicography would not. We would not be able to do any dictionary work at universities. And purely theoretical lexicographic contributions, for example,
about access to data in dictionaries, access times, dictionary structures and so on, would not be regarded as scientific research and, therefore, would not be included in the research databases at universities. If this opinion were to become accepted, a researcher at a university who wished to keep his position would be compelled to turn to subjects other than lexicography.

The attempt made by Atkins and Rundell to rob lexicography of any scientific status, together with many other attempts, such as that by Anna Wierzbicka (1985), has some prospect of success if the development of the theory of lexicography over the past 40 years is considered. After a rapid rise in the 1970s and 1980s, theoretical lexicography became partly entangled in an external relationship with computer lexicography, which operated without any consideration at all being given to the different functions of different dictionaries. The structural descriptions of existing dictionaries became partly a mixture of hard-to-grasp theories and contemplative analyses, which, at best, do not get in the way of future dictionary concepts. Lastly, most of the studies of dictionary usage were carried out in the most unscientific way imaginable, as they were conducted without any knowledge and without use of the methods of the social sciences. By this we mean that they do not satisfy the two fundamental requirements for scientific surveys: (1) A scientifically sound survey must be based on a section of the population surveyed that can be statistically viewed as representative of the entire population. (2) The respondents must be selected by the researcher on the basis of principle (1) above. Moreover, questions are hardly ever asked about the real needs of dictionary users, only about linguistic phenomena.

The general-language polyfunctional dictionary is assumed to be the one and only dictionary, and questions are asked such as: How often do you use meaning items, grammar items or etymological items? In other words, the conclusion is that the scientific development of lexicography is still experiencing substantial problems. However, that was not the objection raised by Wierzbicka (1985) or Atkins and Rundell (2008). They denied the scientific nature of lexicography altogether. We do not want to repeat what we think of this. We will say, however, that many of the lexicographic proposals put forward during the past 40 years for changing or improving lexicographic practice were often more harmful than useful. Overall, our explanation of this sorry state of affairs is that while many contributors call themselves – and may even be – lexicographers, their aims correspond to the criteria and objectives of a very broad linguistic presentation. Even though the catchword ‘user-friendly’ is used time and again, such lexicographers/linguists do not have the
monofunctional tool in mind, but rather a phenomenological description that is as broad as possible, as befits a good linguist.
9.2. Meta-lexicographic Proposals and Demands Leading to Information Stress
The assumption that dictionaries are and must be polyfunctional is associated with a focus on linguistic phenomena instead of on the user’s needs. A series of ideal requirements is laid out for dictionaries. These requirements are reminiscent of our earlier comparison with a Swiss Army knife: the larger and more comprehensive, the better. This goes not only for the whole dictionary, but also for the individual dictionary entries:
• The more lemmas a dictionary contains, the better.
• The more item types the dictionary user finds, the better the dictionary.
• The more items in each item type in each dictionary entry that the dictionary user finds, the better the dictionary.
• If the user finds information he is not looking for at all, it is a really good dictionary. This is a so-called bonus item.

Such claims may be in order in the case of a documentation dictionary or in certain other dictionaries with one or more cognitive functions, but for dictionaries with communicative functions exactly the opposite is true. The ideal communication dictionary (for reception or text production or for translation) is precisely the dictionary that deals quickly and clearly with the need the dictionary consultation should satisfy. It is nothing less, but, especially, nothing more than that. When the dictionary user has a specific information need, all irrelevant items will only complicate rapid access to the desired data and may even interfere with the interpretation of the data found. We are familiar with this through the use of search engines such as Google. In almost all cases, the main problem here is not that one does not get any answers, but that there are so many hits that one often gives up after the first one or two pages of these, discontinuing the search if the information sought cannot be found there. Such a reaction is called information stress and, in the worst case, leads to no information being obtained. Bergenholtz (2009) shows that long search times may lead to the user aborting the search prematurely and to data that was available in the dictionary not being found. The point at which the search is cancelled seems to vary widely from one individual to another. In a first case study (Bergenholtz, 2010), it was observed that a subject gave up after searching for 15–20 seconds in an electronic dictionary and after 20–30 seconds in a printed one. Another subject searched several times, for more than five minutes in both printed dictionaries and electronic dictionaries, until the result was found or the search was discontinued.
The second claim is a variant of the requirement we have just discussed, only the other way round; the issue is the inadequate quality of small dictionaries:
• This is only a small dictionary with a few thousand entries. A small dictionary can never be a quality dictionary.

Briefly, our response would be that a large saw is not necessarily better than a small saw. They merely serve different purposes. Dictionaries are no different. If a dictionary with, for example, 2,000 entries describes a language or subject field in such a way that it can satisfy a certain type of need, then it is also a good dictionary. Whether it is large or small is not the point.

If one turns the argument around, an interesting research question arises which nobody has tackled yet. Neither have we, although we collected some data three years ago. It is a fact that some entries in a dictionary are never used by any user. This cannot be determined exactly for printed dictionaries, but for electronic dictionaries it can be. In the Danish internet dictionary of 2002–2006, which was a text-production dictionary, a total of more than 6 million searches using a search string or a hyperlink from one entry to another had been logged at the beginning of 2006. But less than half of the 112,000 dictionary entries had been looked up or clicked on one or more times. In no less than 6 million searches, almost 60,000 dictionary entries had not been looked up even once by any user, or had at least not been found! A study of the data concerning dictionary entries consulted and those not found did not, however, reveal any systematic pattern. In both groups, there were common words, rare words, neologisms and obsolete words. Every one of these word types was represented among the large number of unused dictionary entries. In the Danish Music Dictionary of 2006, which had not then been accessible on the internet for very long, the number of dictionary entries consulted was already more than half of the 4,000 entries after a short time. But once again, we observed no clear system, only that there were fewer unused dictionary entries. Research here is still waiting to be undertaken.
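A coverage count of the kind reported for the two Danish dictionaries can be computed directly from a lookup log. The following is a minimal sketch in Python, not the procedure actually used for those dictionaries; the function name `entry_coverage`, the log format (one looked-up string per search or click) and the toy data are assumptions made purely for illustration.

```python
from collections import Counter


def entry_coverage(lemmas, lookup_log):
    """Compare a dictionary's lemma list with a log of lookups.

    lemmas: all lemmas in the dictionary (hypothetical input format).
    lookup_log: one string per logged search or hyperlink click.
    Returns (hits, unused, missed):
      hits   - Counter of lemmas consulted at least once
      unused - sorted list of lemmas never consulted
      missed - Counter of logged strings that match no lemma
    """
    lemma_set = set(lemmas)
    hits = Counter(s for s in lookup_log if s in lemma_set)
    missed = Counter(s for s in lookup_log if s not in lemma_set)
    unused = sorted(lemma_set - set(hits))
    return hits, unused, missed


# Toy data, invented for illustration only.
lemmas = ["akkord", "dur", "kvintcirkel", "mol", "zarzuela"]
log = ["dur", "mol", "dur", "akkord", "xylofon"]

hits, unused, missed = entry_coverage(lemmas, log)
share_unused = len(unused) / len(lemmas)
print(unused, share_unused, dict(missed))
```

The `missed` counter collects the strings users searched for without reaching any entry, which is a separate question from which existing entries went unused.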
However, we do not believe that an egg of Columbus will be discovered that will make it possible to omit those dictionary entries about which it can be predicted that they will not be used anyway. At least, we do not see any indication of this. Instead, the best solution consists of immediately writing those entries for which a user looked in vain, if the dictionary does not already contain them.

Recent learners’ dictionaries point out with great pride that they offer the user frequency indications and that they explain precisely the most common words and omit the rare ones. This may be all well and good for a production dictionary for non-native speakers, at least as far as lemma selection is concerned. However, this line of thinking is often generalized: every dictionary should omit rare words from its lemma stock. This is followed by the general statement that is often highlighted as criticism in dictionary reviews:
• Words and expressions which are rarely used do not belong in the dictionary.
When a user reads a rarely used word, he is much more likely to experience a reception problem than when he reads a very common word. Moreover, for a specialist dictionary, this argument is quite astonishing. It means that important musical terms should be omitted from the music dictionary simply because they are rarely used in texts. The argument is not relevant. What is relevant is whether the users want to look it up. As discussed above, we have here the not easily explained fact that many words in a reception dictionary are never looked up. For a specialist dictionary with a cognitive function, the argument of frequency is even less comprehensible. What is also important here is to give users a systematic insight into aspects of musical theory and the world of music. Frequency is not an argument; subject relevance is.

Another statement, again put forward by linguists, assumes that a dictionary has only certain item types and should at the same time not contain certain other types:
• This is a bad dictionary entry in a bad dictionary. Because here there are not only meaning items, but also encyclopaedic items.

Such a distinction is a central theme in both the American annual Dictionaries of 1993 and in the German annual Lexicographica of the same year. Atkins (1993) takes the view that a meta-lexicographical discussion should pay more attention to the commercial conditions of lexicography – time, dictionary scope and money – than it has done to date. One could, like Wierzbicka (1985), replace a semantic item for cup of four lines with one of 80 lines, says Atkins, but neither the space in the dictionary nor the lexicographer’s time would allow for this. In any case, the proposed extensive semantic item for cup – for example, that cup can be ‘with or without a handle’ – would be neither necessary nor relevant in all respects (Atkins, 1993: 9). But even Wierzbicka, whom Atkins criticizes, is not an absolute adherent of long explanations. Referring to the following example, Wierzbicka (1993: 49) in fact criticizes the amount of detail in some dictionary entries:
• dentist a person who is skilled in and licensed to practice the prevention, diagnosis, and treatment of diseases, injuries, and malformations of the teeth, jaws, and mouth and who makes and inserts false teeth

She regards this explanation as far too detailed; in her view, it belongs in an encyclopaedia and not in a dictionary, which should provide information on the meaning but not general knowledge. Wierzbicka, therefore, recommends ‘a short definition’ – which says so little that it applies to a lot more people than dentists:
• someone whose job is to look after teeth
It is far less a matter of long or short than for what user group and for what sort of consultation it is targeted. Below we will see some examples of how this difference is reflected in a receptive and in a cognitive music dictionary. But it is not so much a question of semantic or encyclopaedic; there is no scientific justification for this distinction (see Bergenholtz & Kaufmann, 1996). The question is rather whether the aim is reception or the acquisition of as much knowledge as possible about a matter, a word or a term, and also whether laypersons or semi-experts are targeted. In this regard, compare two different entries, both intended as entries in a reception dictionary, the first one for non-experts:
• (1) gene the basic unit of inheritance transmitted from parent to offspring.

The second entry comes from the proposal for a reception dictionary for semi-experts:
• (2) gene A gene is a DNA sequence encoding a protein, tRNA or rRNA

A dictionary aimed at presenting knowledge on molecular biology for laypersons can give the following in a cognitive dictionary compiled for this purpose:
• (3) gene the basic unit of inheritance transmitted from parent to offspring. An organism contains many genes – in humans approximately more than 100,000. Each gene has a specific characteristic, for example, one out of the potential blood groups. In chemical terms, genes are small sections of big complex molecules, the nucleic acids. In bacteria, these are coiled aggregates and in higher organisms, they are constituents of chromosomes.

A dictionary aimed at presenting knowledge about molecular biology for semi-experts can give the following in a cognitive dictionary compiled for this purpose:
• (4) gene a DNA sequence encoding a protein, tRNA or rRNA. For eukaryotes, a gene can also be defined as a transcribed DNA sequence or transcription unit. In prokaryotes, two or more proteins are often encoded in the same transcription unit, and such a transcription unit plus its associated regulatory sequences is termed an operon.

The Danish Music Dictionary (2010) presents meaning items formulated more for laypersons and perhaps partly for users who are at a level between non-experts and semi-experts. The presentation below therefore makes a distinction only between meaning items for reception and cognitive problems. Printing both as separate dictionaries would present no problems. Neither
would the above proposals for molecular biology. This would be a question of potential sales figures. For electronic dictionaries, this argument would not hold true. In this context one would be able to select dictionary 1, 2, 3 or 4, depending on need and condition, by pressing a button.
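The selection of dictionary 1, 2, 3 or 4 from a single database can be sketched as follows. This is a hypothetical Python illustration, not the data model of any existing dictionary: the class `Entry`, its field names and the function `consult` are invented here, and the longer texts are abridged from entries (3) and (4) above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One database record holding the data for all four dictionaries."""
    lemma: str
    short_lay: str          # shown in dictionaries 1 and 3
    short_semi_expert: str  # shown in dictionaries 2 and 4
    more_lay: str           # added in dictionary 3 only
    more_semi_expert: str   # added in dictionary 4 only


# Texts abridged from the gene examples; "[...]" marks the abridgement.
GENE = Entry(
    lemma="gene",
    short_lay="the basic unit of inheritance transmitted from parent to offspring.",
    short_semi_expert="a DNA sequence encoding a protein, tRNA or rRNA.",
    more_lay="An organism contains many genes [...]",
    more_semi_expert="For eukaryotes, a gene can also be defined as a "
                     "transcribed DNA sequence or transcription unit [...]",
)


def consult(entry, function, user_group):
    """Return only the data that one monofunctional dictionary would show.

    function: "reception" or "cognitive"; user_group: "lay" or "semi-expert".
    """
    if function not in ("reception", "cognitive"):
        raise ValueError(f"unknown function: {function}")
    if user_group not in ("lay", "semi-expert"):
        raise ValueError(f"unknown user group: {user_group}")
    lay = user_group == "lay"
    short = entry.short_lay if lay else entry.short_semi_expert
    if function == "reception":
        return short
    more = entry.more_lay if lay else entry.more_semi_expert
    return f"{short} {more}"


print(consult(GENE, "reception", "lay"))
print(consult(GENE, "cognitive", "semi-expert"))
```

The point of the sketch is that the database stays polyfunctional while each consultation returns a monofunctional view of it.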
9.3. Presentation of Musikordbogen – The Danish Music Dictionary
At the internet address www.ordbogen.com/ we find many dictionaries, some of them bilingual, some monolingual. Some are for general language, some are for special purposes. The Danish Music Dictionary is free of charge but the other dictionaries of Ordbogen.com are not, so the music dictionary user does not need the login – in fact, you should not log in, as you will then be able to look up only a few terms. First of all, it should be underlined that the dictionary has only recently been taken over by Ordbogen.com. This means that a few things do not function as they are supposed to. Secondly, when we speak of the Music Dictionary in the singular, we mean the database, and the Danish Music Dictionaries in the plural are the three dictionaries specified below.

Writing a search string and clicking on Musikordbogen at the bottom of the list under vælg ordbog (choose dictionary) leads to www.ordbogen.com/ordboger/musk. No matter what you have written, you are led to the search page of the dictionary. On the left-hand side of this very first page you will find a column with information about the dictionary. Om ordbogen (about the dictionary) tells us about the Music Dictionary, what it contains and who it is intended to help:

3.900 forskellige opslagsord med definitioner, historisk baggrund, synonymer, henvisninger til relevante hjemmesider, billedmateriale, m.m. Denne ordbog er en fagsproglig ordbog om musikalske begreber, hovedsagelig fra den klassiske verden, men også fra rytmisk musik og den såkaldte verdensmusik. Musikordbogen er tænkt som værktøj for musikstuderende, musikskoleelever, musikudøvere og alle musikinteresserede, som har behov for hjælp ved læsning af musiktekster eller ønsker at få yderligere viden om musik.

3,900 different lemmas with definitions, historical background, synonyms, references and links, pictures and so on. This dictionary is a dictionary for special purposes on musical terms mainly from the world of classical music, but also from commercial music and the so-called world music. The Music Dictionary is intended to be a tool for music students in universities and music schools, for both amateurs and professional musicians, and for every interested person who wants help in reading texts on music or who wishes to get further information on musical terms and topics.
As you can see, the Music Dictionary has been created for semi-experts and laymen. As for the number of lemmas, an internet dictionary can always tell
us exactly how many there are. I made lemma no. 3,926 on 17 April 2010. The dictionary is updated at regular intervals by ordbogen.com. When you have chosen Om ordbogen, you will find links on the right-hand side of the page to a more thorough introduction to the dictionary (Introduktion), which tells us how to use the dictionary:
En ordbog er et hjælpemiddel, et stykke værktøj. Ligesom meget værktøj findes der ordbøger, som har én funktion, og andre, som har flere. Musikordbogen har flere funktioner. . . .

A dictionary is an aid tool. Like many tools it has certain functions, some have only one function, others are polyfunctional. . . .
This introduction must still be written. Underneath is a link to a systematic part (Systematisk del) and to the possibility of writing a mail to the authors (Kontakt). The systematic part consists of six comprehensive articles on topics from music theory:
§ 1 notation (notation)
§ 2 intervaller (intervals)
§ 3 kirketonearter (church modes system)
§ 4 dur, mol og andre skalavarianter (major, minor, and other scale systems)
§ 5 kvintcirklen (cycle of fifths)
§ 6 akkorder (chords)
This systematic part is not intended as a complete manual on the topics. If you do not know anything about music theory you will not understand it. It simply offers a condensed description in an overview with many music examples. These music examples serve as illustrations that many dictionary entries link to.

On this page (www.ordbogen.com/opslag.php?word=mus&dict=mudd#musk), the user has three search possibilities, or – as it has been argued above – you could say three music dictionaries to choose from. The user can look up only terms, not names, but, as we shall see later, under certain circumstances the user can get hits on names. We shall now go through the dictionaries by imagining some user situations. As said above, we do not know much about these situations, but we assume that they can be divided according to three functions. Therefore, we have divided the music dictionary database into three dictionaries indicated by the three search buttons:

a reception dictionary: forstå et udtryk (i.e. understand a term)
a knowledge dictionary: viden om et udtryk (i.e. knowledge about a term)
a polyfunctional dictionary: finde et udtryk ved hjælp af et vilkårligt søgeord (i.e. find a term by means of an arbitrary expression)
9.3.1. The First Music Dictionary: A Reception Dictionary
The first music dictionary is meant for reception only, and gives a swift and short answer to a question. The first button under the search frame says forstå et udtryk (understand a term). If we imagine a user studying some music, reading a text on music or hearing something on the radio and so on, he or she can write the expression for which there is a need for an explanation and will then receive a short answer or definition of the word. Consider the following situations:

a. The user has bought a CD with Spanish music, reads the booklet and wants to know what zarzuela is.

A. zarzuela
Kort forklaring
en spansk form for syngespil med talt dialog

A. zarzuela
Short explanation
a Spanish form of ballad opera with spoken dialogue
b. The user is reading some music, in this case a piece of French piano music, and wants to know the meaning of the expression serrez:
B. serrez
fransk
stram til
pres tempoet op

B. serrez
French
tighten up
increase the tempo

The first is the literal translation of the French word, the second a translation that helps the user understand the term in a music text.

c. The user is reading a book about old music, and wishes to look up the meaning of kirketoneart (church mode):
C. kirketoneart
Kort forklaring
kirketonearterne er det middelalderlige diatoniske skalasystem, som gik forud for den senere tids dur- og moltonearter

C. church mode
Short explanation
the church modes constitute the diatonic system of scales of the Middle Ages, which led to the major and minor scales of later periods
Of course, this is a very short definition but enough to continue the reading, at least if the words major and minor mean something to the user.

d. The user is listening to a radio programme on African music, and wants to know what an mbira is. Here there might be a difficulty in spelling, but ordbogen.com offers the possibility of writing a part of the word and getting a list of 10 suggestions as to what the user might mean. We now assume that the radio presenter spells out the word and that the term is written correctly:
D. mbira
Kort forklaring
en lille håndholdt afrikansk idiofon, som består af nogle metaltunger, der spilles med tommelfingrene, med en lille kasse som resonator

D. mbira
Short explanation
a small hand-held African idiophone with metal tongues on a resonating box which are played with the thumbs
You might say that this explanation contains an expression, ‘idiophone’, which could lead to fresh questions. This is the price paid for the principle that the definition should be as short as possible. In this case, I should say that the short explanation is enough to understand what was meant in the context. e. Many music students trying out a music dictionary tend to think of their own instrument first. Here is an example:
E. violin
Kort forklaring
den vestlige kulturs højstudviklede strygeinstrument. I de sidste 300 år har dette instrument været det betydeligste i klassisk kompositionsmusik, når man ser bort fra klaveret.
Geige tysk violin
violon fransk violin

E. violin
Short explanation
the most highly developed bowed string instrument of Western culture. For the last 300 years this instrument has been the most important in classical art music, exceeded only by the piano.
Geige German violin
violon French violin
These last two items occur under the short explanation because this dictionary searches not only the lemma field but also the translations.
f. The dictionary has no names as lemmas but that does not mean that names do not occur at all in the lemma list. Let us try with the name of Mozart. Here again, ordbogen.com makes 10 proposals, and the first is Mozart-effekt (the Mozart effect). Or we can try Hoboken:
F. Hoboken
Kort forklaring
navnet på hollænderen A. van Hoboken (1887–1983), som i 1957 udgav første del af en tematisk-bibliografisk fortegnelse over Jos. Haydns værker

F. Hoboken
Short explanation
the name of the Dutchman A. van Hoboken (1887–1983), who published his first version of a thematic-bibliographical catalogue of the works of Jos. Haydn in 1957
In this case, Hoboken is not the name of a person but it is used as a designation of an important work of reference in music science. The presentation has shown that this dictionary consists of lemmas, translations and short explanations. We must admit that it has shown hits in every case. But, of course, it may happen that the search leads to no hits at all. It might be that the user has misspelt the term or that he is searching for something that is not lemmatized. In either case, ordbogen.com shows a list of possibilities (so you might say that it is a dictionary for production if you are unsure about the spelling of a term). If you are not satisfied, you have to press one of the other buttons.
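The behaviour described for the reception dictionary (a search that covers the translations as well as the lemma field, and a list of at most 10 proposals when nothing matches exactly) can be sketched in a few lines of Python. This is a hypothetical illustration only: how ordbogen.com actually matches search strings and generates its proposals is not documented here, so the prefix matching, the function name `lookup` and the record layout are all assumptions.

```python
def lookup(term, entries, max_suggestions=10):
    """Reception lookup: exact match on lemma or translation, else proposals.

    entries: list of dicts with "lemma", "short" and an optional
    "translations" list (a hypothetical record layout).
    """
    key = term.casefold()
    hits = [
        e for e in entries
        if e["lemma"].casefold() == key
        or key in (t.casefold() for t in e.get("translations", []))
    ]
    if hits:
        return {"hits": hits, "suggestions": []}
    # Assumption: proposals are the lemmas that start with the search string.
    proposals = sorted(
        e["lemma"] for e in entries if e["lemma"].casefold().startswith(key)
    )
    return {"hits": [], "suggestions": proposals[:max_suggestions]}


# Toy records based on the examples in this section; short texts abridged.
ENTRIES = [
    {"lemma": "violin", "short": "bowed string instrument [...]",
     "translations": ["Geige", "violon"]},
    {"lemma": "Hoboken", "short": "the name of the Dutchman A. van Hoboken [...]"},
    {"lemma": "Hob.", "short": "[synonym: Hoboken]"},
    {"lemma": "mbira", "short": "a small hand-held African idiophone [...]"},
]

print([e["lemma"] for e in lookup("Geige", ENTRIES)["hits"]])
print(lookup("Hob", ENTRIES)["suggestions"])
```

Searching for Geige reaches the entry violin through its translation field, while the string Hob has no exact match and therefore only produces proposals.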
9.3.2. The Second Music Dictionary: A Dictionary for Knowledge
This dictionary is meant for knowledge concerning musical topics, terms, genres, instruments and so on. It contains all the short explanations from the first dictionary, in addition to information on history, examples and works, synonyms, links and pictures. It also has references to other entries and to the systematic part, where this aids understanding:
g. Let us imagine a user reading a concert programme who, in connection with one of the works, comes across the abbreviation Hob. Now if he simply writes Hob (without the dot) he gets a list of proposals of which the first two are Hob. and Hoboken. (In the present version of the dictionary, Hob. will only contain a synonym Hoboken. But this synonym is linked to the lemma Hoboken, and by clicking on it the user can access the explanations.)
e-Lexicography
G. Hoboken
Kort forklaring
...
Uddybende forklaring
Hobokenfortegnelsen har siden da været brugt til at systematisere disse værker i lighed med fx KV, Köchels fortegnelse over Mozarts værker, og BWV, Bachwerkeverzeichnis. Nummereringen efter denne fortegnelse markeres med forkortelsen Hob. Anden del af fortegnelsen med vokale værker udkom i 1971. Tredje del, som er et registerbind, kom i 1978.
Se også
værkfortegnelse
Flere oplysninger
http://de.wikipedia.org/wiki/HobokenVerzeichnis_der_Werke_von_Joseph_Haydn

G. Hoboken
Short explanation
[see F., p. 199]
Elaboration
The Hoboken catalogue has since then been used to systematize these works just as occurs with, for example, KV, the catalogue of Mozart’s works, and BWV, Bachwerkeverzeichnis. The numbering in this catalogue is referred to by the abbreviation Hob. A second part of the catalogue was published in 1971. The third part, which is an index volume, appeared in 1978.
See also
catalogue
More information
http://de.wikipedia.org/wiki/HobokenVerzeichnis_der_Werke_von_Joseph_Haydn
As you can see, the so-called elaboration presupposes the short explanation (The Hoboken catalogue has since then . . . ). Under Se også (see also) is a link to the term værkfortegnelse (catalogue), under which you will find all the thematic catalogues lemmatized in the dictionary, and from there you can go on to any of the others. These cross-connections are deemed to be important and we will constantly be looking to include more of them.
h. The user is reading a book about Gregorian chant and wishes to obtain more information about the church modes mentioned above.
H. kirketoneart
Kort forklaring
...
Uddybende forklaring
Systemet fik allerede sin udformning omkring år 500 af syriske munke, der delte de (foreløbig) 8 tonearter op i 4 autentiske, som havde grundtonen nederst i skalaen, og 4 plagale, hvor den samme grundtone lå i midten, så tonearterne var parvis beslægtede. ...
Se også
tonal tonal autentisk plagal
Flere oplysninger
§3
Billede
H. church mode
Short explanation
[see F., p. 199]
Elaboration
The system had its rules made by Syrian monks around the year 500 A.D. They divided up the 8 known modes into 4 authentic ones with the tonic at the bottom, and 4 plagal ones with the tonic in the middle so that the modes were related as pairs. ...
See also
tonal tonal authentic plagal
More information
§3
Picture
The elaboration is not quoted as a whole. It gives a general view of the historical background. Again, it is a continuation of the short explanation and underneath we see links to other lemmas, which may increase our understanding of the topic. Furthermore, there is a link to the systematic part § 3, which has music examples to show more details of this tonal system.
i. We shall now return to our very first search string, the zarzuela:
I. zarzuela
Kort forklaring
en spansk form for syngespil med talt dialog
Uddybende forklaring
Zarzuela blev dyrket i det 17.–18. årh. i forbindelse med det spanske hofs fiestas, dvs. festspil. Syngespillet er opkaldt efter en af kong Philip IV af Spaniens jagthytter. Efter 1850 fik det en renaissance, som nærmede det stilistisk til det mere folkelige género chico.
Se også
género chico
Flere oplysninger
http://en.wikipedia.org/wiki/Zarzuela [lægges ind senere]
www.zarzuela.net/

I. zarzuela
Short explanation
a Spanish form of ballad opera with spoken dialogue
Elaboration
Zarzuela was practised in the seventeenth and eighteenth centuries in connection with the fiestas or festivals of the Spanish Court. The ballad opera is named after one of the hunting lodges of King Philip IV of Spain. After 1850 it had a renaissance which was in style nearer to the more popular género chico.
See also
género chico
More information
http://en.wikipedia.org/wiki/Zarzuela [will appear shortly]
www.zarzuela.net/
Under Se også is a link to another genre related to the zarzuela, and beneath that are links to Wikipedia and to a website on zarzuela in English. It must be admitted that the latter is a commercial link, but it gives a survey of the history and might lead to more information. There are many links to Wikipedia in the Music Dictionary: not to the Danish site, which is of little help, but to the German and the English sites. During work on the Politikens Musikordbog, a printed work from the 1990s that serves as a source for the internet Music Dictionary, we very often asked our colleagues in the music school for help. After the publication of the first internet dictionary, www.musikordbogen.dk, they were asked to proofread the new results. Our saxophone colleague has approved the following example of an instrument search with the second button:

J. saxofon
Kort forklaring
et blæseinstrument, der pga. sit rørblad traditionelt hører til under træblæserne, men som er bygget af metal

J. saxophone
Short explanation
a wind instrument which, because of its use of a reed, is grouped with the woodwind instruments, although it is made of metal
Uddybende forklaring
Saxofonen blev opfundet i 1846 som den belgiske instrumentmagers Adolphe Sax’ største succes. Han har derudover indlagt sig fortjeneste ved talrige forbedringer af andre træblæseinstrumenter, af mange ventilblæseinstrumenter og af pauken. Saxofonen er . . .
Flere oplysninger
http://de.wikipedia.org/wiki/Saxofon
Billede

Elaboration
The saxophone was invented in 1846 by the Belgian instrument maker Adolphe Sax and turned out to be his greatest success. He furthermore earns our gratitude for many improvements in other woodwind instruments, many valve wind instruments and the kettledrum. The saxophone is . . .
More information
http://de.wikipedia.org/wiki/Saxofon
Picture
If you try to find Adolphe Sax in this music dictionary you will not be successful. He can be found only in the next dictionary. But, before going on, we shall see an entry containing a sound example.
k. Again, we assume a user who has heard something interesting on the radio, this time with the term alpehorn (alpenhorn):

K. alpehorn
Kort forklaring
et gammelt blæseinstrument fra bjergegne
Uddybende forklaring
Alpehornet er fundet/findes forskellige steder i europæiske bjergegne, hvor det har været brugt af hyrder til signaler og senere til musikudøvelse. Det består af et flere meter langt konisk trærør, fremstillet af to halvdele, samlet og omviklet med barkstrimler. Der er ingen blæsehuller, så det kan kun spille naturtonerne. I Norge kaldes instrumentet for neverlur. Der er også eksempler på alpehorninstrumenter fra Tibet og andre asiatiske bjergområder.
Jævnfør
Alpehornet har fundet indpas i kompositionsmusikken. Således skrev W. A. Mozarts far, Leopold Mozart, en Sinfonia pastorella for alpehorn og strygere, og en tjekkisk komponist ved navn Georg Druschetzky (1745–1819) har komponeret en partita for bondeinstrumenter, som bl.a. omfatter alpehorn.
Flere oplysninger
www.webfoto-interaktiv.dk/Himmel/
Billede
K. alpenhorn
Short explanation
an old wind instrument used in mountainous regions
Elaboration
The alpenhorn is found in different parts of the European mountains, where it was used by shepherds for signalling and later for making music as such. It consists of a conical wooden tube several metres long, made in two halves that are joined together and wrapped in strips of bark. There are no finger holes, so the alpenhorn can only play the natural tones. In Norway this instrument is called neverlur. Alpenhorns are also found in Tibet and other Asian mountainous regions.
cf.
The alpenhorn has found its way into art music. W. A. Mozart’s father, Leopold Mozart, wrote a Sinfonia pastorella for alpenhorn and strings, and a Czech composer by the name of Georg Druschetzky (1745–1819) composed a partita for peasant instruments which includes the alpenhorn.
More information
www.webfoto-interaktiv.dk/Himmel/
Picture
In this example, the expression Jævnfør (cf.) may be changed – it is the name of the section where remarks of a more secondary nature are inserted.
9.3.3. The Third Music Dictionary: A Polyfunctional Dictionary
The third music dictionary is a polyfunctional dictionary. The user situations will be manifold, the users being the same. The terms need not be musical, but can be taken from common language such as, for example, the words spansk (Spanish), kinesisk (Chinese), or hurtig (fast). The system searches all text fields in the database and, if a search string happens to be a lemma, the lemma is planned to appear as well. It is not advisable to enter piano or violin as a search string, as you will experience so-called information death. In this dictionary the search for names might be successful, or the user can look up taffelform (table form), pæreform (pear-shape), and so on. Here are just a couple of examples:
l. We assume that in this place readers will be especially interested in the word Spanien (Spain). If we had written Spanien/Spain and used the first or the second button, we would not have got any hits. The search leads to 17 entries, which are later to be shown as a list from which you can choose what you find interesting:
L. Spanien aragonesa flaviol folia folie d’Espagne gaita género chico guitar jongleur mandola mozarabisk panharmonicon romance sarabande saz tonada ut zarzuela
L. Spain aragonesa flaviol folia folie d’Espagne gaita género chico guitar jongleur mandola mozarabisk panharmonicon romance sarabande saz tonada ut zarzuela
As a matter of fact, none of these terms need a translation. We shall pick out just one of them as representative:
gaita
1. Kort forklaring
en sækkepibe, som stadig findes i Spanien
Se også
sækkepibe
Flere oplysninger
http://www.banchetto-musicale.com/english/gaita.htm
2. Kort forklaring
en enhåndsfløjte med to gribehuller, som anvendes i (ældre) folkemusik i Spanien
Flere oplysninger
www.tamborileros.com/fotos.htm

gaita
1. Short explanation
a bagpipe which is still found in Spain
See also
bagpipe
More information
www.banchetto-musicale.com/english/gaita.htm
2. Short explanation
a pipe with two holes, played with one hand, which has been used in (older) folk music in Spain
More information
www.tamborileros.com/fotos.htm
Under these entries we find some very nice links with informative sound examples. We shall now try a name, for example, that of a composer, in this third dictionary.
m. Search string Berlioz shows 13 entries with short and long explanations in which the composer Berlioz is mentioned. As the list contains more than ten items, it can be seen as a list and the user is now invited to click on each of them to see via which connection Berlioz has found his way into the dictionary:

M. Berlioz
cimbasso
darbuka
Dies Irae
idée fixe
instrumentation
lamento
ledemotiv
Neudeutsche Schule
ouverture
programmusik
romantik
sonateform
symfonisk digtning

M. Berlioz
cimbasso, a type of bass trombone
darbuka, a type of vase drum from North Africa
Dies Irae, used by Berlioz in his Symphonie Fantastique
idée fixe, leitmotiv, a theme attached to a person in an opera or an orchestral work
instrumentation, of which Berlioz was a great expert
lamento, elegy, exemplified by Berlioz among others
leitmotiv – see above under idée fixe
Neudeutsche Schule, a group of German composers promoting Berlioz among others
ouverture, Berlioz has played a role in the history of this genre
programme music
the Romantic era
sonata form
symphonic poem, a genre of which Berlioz is one of the fathers
One can learn a lot about Berlioz merely by looking at this list. Some of the lemmas are somewhat surprising, for example, the darbuka. This may be seen as a bonus; some call it lexicotainment.
We should like to present another name, that of a modern composer known for his odd ideas and philosophical approach to composing, the American John Cage.
n. Search string John Cage shows five entries, some of which are English:

N. John Cage
aleatorisk
chance operation
indeterminacy
indetermineret musik
kalkant

N. John Cage
aleatoric
chance operation
indeterminacy
indeterminate music
bellows treader
Only the last is somewhat surprising, so we recommend clicking on the internet link: www.john-cage.halberstadt.de/new/index.php?seite=galerie&l=d. It does not explain the bellows treader as such, but it is a rather entertaining step into a very odd world, the world of slowness.
o. Search string taffelform (table form). Since this word is not a musical term it is not lemmatized but, as we shall see, this does not mean that we cannot obtain knowledge about it in the third dictionary:
O. taffelform flygel klaver taffelklaver
O. table form grand piano piano table piano
The word taffelform is marked in the text for the user to learn about it.
p. Search string pæreform (pear-shape) gives us five entries, all of them containing the Danish word in the form pæreformet (pear-shaped), which means that you do not even have to write exactly the right form of the word, as you might get hits anyway:
P. pæreform gadulka gige 1. og 2. pandura rebek sitar
P. pear-shape gadulka gige 1. and 2. pandura rebek sitar
All of these instrument names are very much the same in English. As you can see even from the Danish text, this is about old string instruments, partly from
Europe and partly from the Far East. This means that you acquire information even without going more thoroughly into the lists. Finally, we will show that this dictionary may even serve as a production dictionary. Imagine a user looking for the musical term for hurtig (fast). This word may be a part of the texts in the elaborations, but as it is common in the translations of the tempo terms, you can easily find what you are looking for.
q. Do not be overwhelmed; your search with hurtig (fast) offers:
Din søgning gav for mange resultater. Et begrænset antal bliver vist. (Your search gave too many hits. We offer you a limited number.)

Q. hurtig (fast)
pronto rapido tosto veloce vif vite vive a tempo primo accel. agile agité allegro allemande arie bariolage etc.

Q. fast
pronto rapido tosto veloce vif vite vive a tempo primo accel. agile agité allegro allemande arie bariolage etc.
From the beginning of the list it is seen that the system searches first in the translations and then in the short explanations. Further down the list you find the hits from the elaborations and cf. texts. And surfing from the list and back changes the buttons, so it is very useful to have on your left hand side a list of your last search strings to refer back to.
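The search order just described (translations first, then short explanations, then elaborations and cf. texts, with substring matching and a capped result list) can be sketched as follows. The field names, the miniature database and its Danish field contents are invented for illustration; only the ranking order, the substring behaviour and the idea of a limited hit list are taken from the text.

```python
# Field order as described in the text: translations are searched first,
# then short explanations, then elaborations and cf. texts.
FIELD_ORDER = ("translations", "short_explanation", "elaboration", "cf")

# Hypothetical miniature database; field names and contents are invented.
DATABASE = [
    {"lemma": "arie", "elaboration": "... en hurtig del ..."},
    {"lemma": "allegro", "short_explanation": "hurtig, livlig"},
    {"lemma": "pronto", "translations": "hurtig"},
    {"lemma": "vite", "translations": "hurtig"},
    {"lemma": "rebek", "short_explanation": "pæreformet strygeinstrument"},
]


def search(query, database=DATABASE, limit=20):
    """Return (lemmas, truncated) for a full-text search.

    Matching is by case-insensitive substring, so 'pæreform' also finds
    'pæreformet'. Each entry is listed once, ranked by the first field
    (in FIELD_ORDER) that contains the query.
    """
    q = query.casefold()
    ranked = []
    for entry in database:
        for rank, field in enumerate(FIELD_ORDER):
            if q in entry.get(field, "").casefold():
                ranked.append((rank, entry["lemma"]))
                break  # keep only the best-ranked field per entry
    ranked.sort(key=lambda pair: pair[0])  # stable: database order within a rank
    lemmas = [lemma for _, lemma in ranked]
    return lemmas[:limit], len(lemmas) > limit
```

The second element of the result signals that the hit list was cut off, which corresponds to the "too many hits" message shown to the user.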
9.4. Conclusion
A good tool is a tool that is able to fulfil the needs of a certain user group by giving quick access to the data, by giving relevant and correct data for this
user group in an understandable way. This is not what traditional polyfunctional dictionaries offer. Such paper dictionaries and electronic dictionaries show all data from all data fields in the database. However, if you really want to make tools for different purposes, you must provide search options and data presentation options for different needs; for a reception problem, for example, you need data about the meaning and nothing else. In the end, you can produce hundreds or even thousands of different monofunctional tools from the same database by using different search and presentation options.
Chapter 10
The Technical Realization of Three Monofunctional Phrasal Verb Dictionaries
Birger Andersen
Richard Almind
10.1. Introduction
This chapter deals with a dictionary project involving the creation of a database capable of generating three monofunctional phrasal verb dictionaries: one for checking the meaning of English phrasal verbs, one for translating English phrasal verbs into Danish, and one for assisting users in the grammatically and stylistically correct use of English phrasal verbs. The motivation for producing dictionaries for English phrasal verbs lies in the fact that these multi-word verbs are a difficult area for non-native speakers of English. First of all, the meaning of many phrasal verbs is difficult to deduce from their individual parts. This gives rise to problems with respect to understanding the meaning of phrasal verbs, and it is also the reason why they cause translation problems. A number of studies of the use of phrasal verbs by learners of English also point to problems in production. For learners with Hebrew as their mother tongue, it has been demonstrated that they generally use fewer phrasal verbs in connection with production in English (including translation into English) than do native speakers of English (Dagut & Laufer, 1985; Laufer & Eliasson, 1993). The same tendency has been demonstrated for learners with Chinese as their mother tongue (Liao & Fukuya, 2004). The general tendency is also, however, that advanced learners use more phrasal verbs than do intermediate learners. In all these investigations, it is emphasized that these learners’ underuse of English phrasal verbs is due to the fact that non-Germanic languages lack grammatical structures similar to English phrasal verbs. Although most Germanic languages have grammatical structures similar to English phrasal verbs, it has also been demonstrated that learners with Germanic languages as their mother tongue underuse English phrasal verbs (compared to native speakers of English) in connection with production in
English (including translation into English). This applies to, for example, learners with Swedish as their mother tongue (Laufer & Eliasson, 1993) and learners with Dutch as their mother tongue (Hulstijn & Marchena, 1989). In sum, the rationale for the project described here is that English phrasal verbs create problems for language users who do not have English as their mother tongue, in connection with reception, translation into Danish and production of English texts with phrasal verbs.
10.2. Data Input
10.2.1. Lemma Selection
The term ‘phrasal verb’ is used in this connection as a cover term for different types of multi-word verbs. Linguists distinguish three types of what are here called phrasal verbs: they consist either of a verb and a so-called particle, which is an adverb or a preposition, or of a verb plus two particles (an adverb and a preposition); see, for example, Darwin and Gray (1999), Biber et al. (1999: 404–427), Downing and Locke (2002: 332–336) and Andersen (2006: 157–163). As previously indicated, our lemma selection is not based on a strictly linguistic definition of phrasal verbs, but rather on assisting dictionary users with solving problems in connection with all the types of multi-word verbs mentioned. Hence, there are phrasal verbs proper, that is, a combination of a verb and an adverb (The chairman stepped down); then, there are prepositional verbs, which consist of a verb and a preposition (We will look into the matter); and finally there are phrasal prepositional verbs, consisting of a verb and both an adverb and a preposition (You’d better get on with it). All three types will be selected for the database. This is also in accordance with the practice within British and American lexicography, where monolingual so-called phrasal verb dictionaries cover all three types of multi-word verbs, for example, Collins Cobuild Dictionary of Phrasal Verbs (1989), Longman Phrasal Verbs Dictionary (2000), Macmillan Phrasal Verbs Plus (2005), American Heritage Dictionary of Phrasal Verbs (2005), Oxford Phrasal Verbs (2006) and Cambridge Phrasal Verbs Dictionary (2006). Apart from these, we have also included in the database fixed expressions containing phrasal verbs, as well as a number of so-called free combinations (We leave for Madrid by the next plane), considering that they may cause problems with respect to understanding, translation and production.
10.2.2. Target Groups
The target groups of the dictionaries are, on the one hand, Danish students at bachelor and master level, primarily students of English, but also other
students who are required to read large amounts of English-language literature during their studies and who also from time to time are required to produce English-language texts such as essays, reports and so on. On the other hand, professional translators are also a target group for the dictionaries.
10.2.3. The Lexicographic Functions of the Database
The lexicographic functions which the database is supposed to cover have already been alluded to. Since the meaning of phrasal verbs is often quite opaque, the database must be capable of generating a dictionary that assists dictionary users in the reception of English-language texts. For this function, the users will be primarily students. Since phrasal verbs also give rise to translation problems, the database must also be capable of generating a dictionary that gives assistance with respect to translation of English-language texts with phrasal verbs into Danish. Finally, as non-native speakers of English also have problems with the correct grammatical and stylistic use of English phrasal verbs, the database must be capable of generating a dictionary that assists in the production of English-language texts with phrasal verbs.
10.2.4. The Overall Structure of the Database
To illustrate the overall structure of the database, here is the ‘skeleton’ for the phrasal verb call back:

CALL BACK
1. RINGE TILBAGE:
   1. call back (‘ringe tilbage’)
   2. call somebody back (‘ringe tilbage til nogen’)
2. RINGE OP IGEN:
   1. call back (‘ringe op igen’)
   2. call somebody back (‘ringe op igen til nogen’)
3. KIGGE INDENFOR IGEN:
   1. call back (‘kigge indenfor igen’)
4. KALDE TILBAGE:
   1. call back somebody (‘kalde nogen tilbage’)
5. TILBAGEKALDE:
   1. call back something (‘tilbagekalde noget’)
6. GENKALDE SIG:
   1. call back something (‘genkalde sig noget’)
The main access route to the entry for this phrasal verb is a search string, which is the phrasal verb itself. The reason why access is through the English phrasal verb is, of course, the assumption that the majority of the intended target groups of the dictionaries are aware of the existence of these phrasal verbs. For potential dictionary users to whom the notion of phrasal verb may be unknown, for example, students who are not students of English, we may have to allow a search on the verb part of the phrasal verb alone. Such a
search will result in a list of all phrasal verbs that contain the verb in question, with the possibility of continuing the search from there.
The majority of phrasal verbs have more than one sense and these senses are given in the form of a Danish verbal expression. They serve two purposes. In connection with reception and translation, they serve to give an overview of the various senses of the individual phrasal verb so that the dictionary user can easily locate the relevant sense and translation. The sense indications also serve a purpose in connection with production, since they give access to data that are relevant for the correct use of the English phrasal verb that corresponds to the sense.
For individual senses of the given phrasal verb, there is often more than one grammatical structure into which the phrasal verb can be inserted. This mainly concerns intransitive and transitive use of the phrasal verb and, for the transitive use, different forms of objects such as noun phrases and various forms of finite and non-finite subordinate clauses. All this is given in the form of so-called pattern illustrations. The full dictionary article for the phrasal verb call back, showing all data types, looks like this:

CALL BACK
1. RINGE TILBAGE
call back
ringe tilbage
Examples:
Can you call back later? Mrs Cohen is in a meeting at present.
Kan De ringe tilbage senere? Mrs Cohen sidder i møde lige nu.
English synonyms:
ring back (British)
phone back
call back somebody, call somebody back
ringe tilbage til nogen
Examples:
Call me back as soon as you’ve got the results of the test.
Ring tilbage til mig så snart du har resultaterne af prøven.
English synonyms:
ring somebody back (British)
phone somebody back
2. RINGE OP IGEN
call back
ringe op igen
Examples:
He wasn’t in. I’ll call back later.
Han var der ikke. Jeg ringer op igen senere.
English synonyms:
ring back (British)
phone back
call back somebody, call somebody back
ringe nogen op igen
Examples:
I’ll try and call him back tomorrow.
Jeg prøver at ringe ham op igen i morgen.
English synonyms:
ring somebody back (British)
phone somebody back
3. KIGGE INDENFOR IGEN
call back (British)
kigge indenfor igen
Examples:
I’ll call back on my way home from work.
Jeg kigger ind igen på vej hjem fra arbejde.
Danish synonyms:
kigge ind igen
komme igen
English synonyms:
return
stop back (American)
4. KALDE TILBAGE
call back somebody, call somebody back
kalde nogen tilbage
Examples:
I ran off, but he called me back.
Jeg løb væk, men han kaldte mig tilbage.
5. TILBAGEKALDE
call back something, call something back
tilbagekalde noget
Examples:
The company has called back all models of this car built in 2002.
Virksomheden har tilbagekaldt alle modeller fra 2002 af denne bil.
English synonyms:
recall something
call something in
6. GENKALDE SIG
call back something, call something back
genkalde sig noget
Examples:
I cannot call his face back.
Jeg kan ikke genkalde mig hans ansigt.

We have first the search string, that is, the phrasal verb call back, and then the sense indication for the first sense, ringe tilbage. This is followed by the first grammatical structure for this sense, namely the intransitive use in the form of the pattern illustration call back. Then follows the Danish equivalent, also in the form of a pattern illustration. Following this, we have English example sentences with Danish translations, and in this case two English synonyms, of which one features the linguistic label (British). As far as the transitive use of the same sense of call back is concerned, the grammatical structure is supplemented by the information that it does not occur in the passive voice. If there are synonyms to the Danish equivalent, these appear separately, as in the third sense of call back. It is also shown explicitly when the particle can be placed both before and after the direct object.
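The article structure walked through above (search string, numbered senses, pattern illustrations with equivalents, examples and synonyms) maps naturally onto a small nested data model. The sketch below is an assumption about how such an entry could be represented, not the project's actual database schema; the class and field names are invented, while the sample data are taken from the call back article.

```python
from dataclasses import dataclass, field


@dataclass
class Pattern:
    english: str                                   # English pattern illustration
    danish: str                                    # Danish equivalent
    labels: list = field(default_factory=list)     # linguistic labels, e.g. ["British"]
    examples: list = field(default_factory=list)   # (English, Danish) sentence pairs
    english_synonyms: list = field(default_factory=list)
    danish_synonyms: list = field(default_factory=list)


@dataclass
class Sense:
    indication: str                                # Danish sense indication
    patterns: list = field(default_factory=list)


@dataclass
class Entry:
    lemma: str                                     # the phrasal verb (search string)
    senses: list = field(default_factory=list)

    def skeleton(self):
        """Return one overview line per sense, numbered from 1."""
        lines = []
        for number, sense in enumerate(self.senses, start=1):
            patterns = "; ".join(f"{p.english} ('{p.danish}')" for p in sense.patterns)
            lines.append(f"{number}. {sense.indication.upper()}: {patterns}")
        return lines


def by_verb(entries, verb):
    """List all phrasal verbs whose verb part equals `verb`."""
    return [e.lemma for e in entries if e.lemma.split()[0] == verb]


# First three senses of 'call back', abridged from the article above.
CALL_BACK = Entry("call back", [
    Sense("ringe tilbage", [
        Pattern("call back", "ringe tilbage",
                examples=[("Can you call back later?", "Kan De ringe tilbage senere?")],
                english_synonyms=["ring back (British)", "phone back"]),
        Pattern("call somebody back", "ringe tilbage til nogen"),
    ]),
    Sense("ringe op igen", [Pattern("call back", "ringe op igen")]),
    Sense("kigge indenfor igen", [
        Pattern("call back", "kigge indenfor igen", labels=["British"],
                danish_synonyms=["kigge ind igen", "komme igen"],
                english_synonyms=["return", "stop back (American)"]),
    ]),
])
```

The helper `by_verb` corresponds to the fallback search on the verb part alone: given entries for call back, call away and aim at, a search for call would list the first two.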
10.2.5. Grammatical Data
As mentioned already, the dictionary user is provided with explicit grammatical data, such as the fact that call back in sense two and the transitive use does not occur in the passive voice. The grammatical labels of this kind are the following: () () () () () and should require no further explanation, but there may be some reasons to comment further on the more controversial , and . So far, it has not been possible to come up with a solution that allows incorporation directly into the dictionary article of information to the dictionary user under what circumstances the given phrasal verb will occur in the passive voice, since it has nothing to
do with the individual phrasal verb as such, but with general rules and conventions in English for the choice of passive voice instead of active voice. For a phrasal verb such as acquit of, it is indicated that it often occurs in the passive voice, because in most cases it will be the patient rather than the agent which is the theme of the sentence. Such information may be given in a separate dictionary grammar.

ACQUIT OF
1. FRIKENDE FOR
acquit somebody of something (formal)
frikende nogen for noget
Examples:
He was acquitted of all charges.
Han blev frikendt for alle anklager.

The same applies to call away.

CALL AWAY
1. KALDE UD
call away sb
kalde nogen ud
Examples:
He has been called away to deal with a problem at our Birmingham branch.
Han er blevet kaldt ud for at klare et problem i vores filial i Birmingham.

At times a phrasal verb in a given sense will always occur in the passive voice. In such cases, this is simply indicated in the pattern illustration, as in the third sense of the phrasal verb aim at:

AIM AT
[...]
3. HENVENDE SIG TIL
be aimed at somebody
henvende sig til nogen
Examples:
The magazine is aimed at teenagers.
Bladet er rettet mod teenagere.

The same applies if a phrasal verb in a given sense always occurs in a negative structure, such as the sixth sense of the phrasal verb agree with:
AGREE WITH
[...]
6. KUNNE TÅLE
not agree with somebody
???
Examples:
Orange juice in the morning does not agree with me.
Jeg kan ikke tåle orange juice om morgenen.

Since the intended target groups of the dictionaries must be assumed to have some knowledge about the grammar of the English language, there are grammatical data which will not be included in the database. There are, for example, no morphological data about the English phrasal verbs such as inflectional suffixes, and there are no plans to include irregular verb forms as entries with reference to their base forms.
10.2.6. Linguistic Labels
The English pattern illustrations will also be provided with linguistic labels of various kinds. As has been seen, English synonyms will also be provided with these linguistic labels. They include the following:
(a) diaphatic labels:
(uformelt) (informal) (neutralt) (neutral) (formelt) (formal)
(b) diachronous labels:
(gammeldags) (old-fashioned)
(c) diatopical labels:
(britisk) (British) (amerikansk) (American) (australsk) (Australian)
(d) diatextual labels:
(litterært) (literary)
(e) diastratic labels:
(slang) (slang)
(f) diaevaluative labels:
(humoristisk) (humorous) (nedsættende) (derogatory)
(g) dianormative labels:
(ukorrekt) (incorrect)
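Because every label exists as a Danish and English pair, the inventory above can be held as a simple lookup table. The following is a hypothetical sketch: the dictionary layout and the function names `render_label` and `label_type` are assumptions, while the label pairs and type names are copied from the list above.

```python
# Label pairs (Danish, English) as listed in the chapter, grouped by label type.
LABELS = {
    "diaphatic": [("uformelt", "informal"), ("neutralt", "neutral"), ("formelt", "formal")],
    "diachronous": [("gammeldags", "old-fashioned")],
    "diatopical": [("britisk", "British"), ("amerikansk", "American"), ("australsk", "Australian")],
    "diatextual": [("litterært", "literary")],
    "diastratic": [("slang", "slang")],
    "diaevaluative": [("humoristisk", "humorous"), ("nedsættende", "derogatory")],
    "dianormative": [("ukorrekt", "incorrect")],
}


def render_label(english_label, language="en"):
    """Return the label in parentheses, in Danish ('da') or English ('en')."""
    for pairs in LABELS.values():
        for danish, english in pairs:
            if english == english_label:
                return f"({danish if language == 'da' else english})"
    raise KeyError(english_label)


def label_type(english_label):
    """Return the label type (e.g. 'diatopical') that a label belongs to."""
    for kind, pairs in LABELS.items():
        if any(english == english_label for _, english in pairs):
            return kind
    raise KeyError(english_label)
```

Storing the labels once and rendering them per interface language keeps the Danish and English forms from drifting apart across articles.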
10.2.7. Collocations and Fixed Expressions
The database will also be provided with English collocations with Danish translations. The functions of the collocations in the individual dictionaries
will be the same as those of the example sentences, that is, for reception, the English collocations will help the dictionary user ascertain the precise meaning of the English phrasal verb; for translation, the English collocations with Danish translations will be invaluable; and for production the collocations will give the dictionary user information about which words the English phrasal verb typically combines with. A large number of English phrasal verbs are part of fixed expressions which will also be selected for the database. As an example, the phrasal verb ask for is part of the fixed expressions be asking for trouble and be asking for it.

ASK FOR
[...]
4. SELV VÆRE UDE OM DET
be asking for trouble
selv være ude om det
Examples:
Anyone who goes into Chapeltown after dark is asking for trouble.
Enhver der vover sig ind i Chapeltown efter mørkets frembrud udfordrer skæbnen.
Danish synonyms:
udfordre skæbnen
English synonyms:
have it coming for you
bring it on yourself

be asking for it
selv være ude om det
Examples:
Anyone who drives while they’re drunk is just asking for it.
Enhver der kører i beruset tilstand er selv ude om det.

As can be seen, the dictionary user gets the explicit grammatical information that the phrasal verb in these fixed expressions always occurs in the progressive. Another example is the phrasal verb answer to, which is part of the fixed expression answer to the name of something:

ANSWER TO
[...]
7. LYDE NAVNET
answer to the name of something (literary) (often humorous)
lyde navnet
Examples:
They had two cats: one was called Treacle and the other answered to the name of Faustina.
De havde to katte. Den ene hed Treacle og den anden lød navnet Faustina.

As the labels indicate, this expression is literary and often humorous.
10.3. Generation of the Dictionaries
10.3.1. Selection of Data Types
The types of data included in the dictionary articles in the database are largely the traditional ones and the form in which they are realized is also largely traditional. The innovative aspect of the project is that the database will be capable of generating three different dictionaries with three different functions. The lexicographic function theory is based on the notion that dictionaries are tools for the satisfaction of extra-lexicographic needs of potential dictionary users. The function theory has traditionally distinguished between cognitive, communicative, operative and interpretive needs. The database of this project must be capable of generating dictionaries covering three different communicative needs, namely the need for dictionary assistance in connection with the reception of English-language texts, the need for dictionary assistance in connection with translation of English-language texts into Danish and finally the need for dictionary assistance in connection with the production of English-language texts. The project also assumes that not all types of data in the database are relevant for the three different communicative needs. From the start, the user of the database is confronted with three options reflecting the three different needs. These options must be formulated as simply as possible, for example:
For reception: What is the meaning of the English phrasal verb?
For translation: How do I translate the English phrasal verb into Danish?
For production: How do I use the English phrasal verb?
By clicking on one of these options, the user will be given access to the types of data in the individual articles in the database that are relevant for the satisfaction of the given need.
The user will also have the option to get access to all types of data in the article in the database, since it may be consulted by users with other needs than the three communicative needs specified here. But which types of data are relevant for the three communicative needs? With respect to reception, the user will look up phrasal verbs whose meaning he or she does not know. The display of data will, of course, include senses together with the grammatical structures followed by the Danish equivalent. It
could be argued that this should be sufficient to tell the user what the English phrasal verb means, but the display will also include the English example sentences, since the precise meaning of the phrasal verb is often dependent on the arguments it occurs with in the sentence. Danish synonyms may also contribute to giving the user a more precise indication of the meaning of the phrasal verb. For reception it is irrelevant to display grammatical information and information about the use of the phrasal verb in the form of linguistic labels. English synonyms are also irrelevant for understanding the meaning of the English phrasal verb. So by clicking on reception, the user will get access to the following set of data:
(a) phrasal verb
(b) sense(s)
(c) grammatical structure(s)
(d) Danish equivalent
(e) English example sentences
(f) Danish synonyms
With respect to translation, the starting point is again the phrasal verb, with its senses and grammatical structures. In connection with the translation of English phrasal verbs it is essential to have information about the stylistic properties of the phrasal verb in question, for example, whether the phrasal verb is old-fashioned, to be able to choose a Danish equivalent with the same style. All this information is given in the form of linguistic labels with the exception of diatopical labels, since it is irrelevant for the translation whether the phrasal verb in question is restricted to British, American or Australian English. Grammatical information about the English phrasal verb is also irrelevant for translation into Danish, whereas the Danish equivalent is of course of the utmost importance. English example sentences with their Danish translations are also extremely relevant as are Danish synonyms, whereas English synonyms are irrelevant in this connection.
So by clicking on translation, the user gets access to the following data set:
(a) phrasal verb
(b) sense(s)
(c) grammatical structure(s)
(d) diaphatic, diachronous, diatextual, diaevaluative and dianormative labels
(e) Danish equivalent
(f) English example sentences with their Danish translations
(g) Danish synonyms
For production the starting point is the same: the phrasal verb with its senses and grammatical structures. To be able to produce grammatically and stylistically correct texts in English it is of course necessary for the user to have access to the relevant grammatical data about the English phrasal verb together with all the stylistic information provided by the linguistic labels. It will also be of great help to the user to see the phrasal verb used in authentic English sentences, and in connection with production the display of data will also include English synonyms with all grammatical and stylistic information, since they give the user the possibility to choose the phrasal verb that fits in with a given communicative situation. The production dictionary will also display the Danish equivalent, so that the user of the dictionary has a possibility of checking that the English phrasal verb in fact has the relevant meaning, but for production purposes it is irrelevant to display Danish synonyms. So by clicking on production, the user gets access to the following data set:
(a) phrasal verb
(b) sense(s)
(c) grammatical structure(s)
(d) grammatical data
(e) all linguistic labels
(f) Danish equivalent
(g) English example sentences (without their Danish translations)
(h) English synonyms with their grammatical and stylistic information
To illustrate the generation of the three dictionaries, the phrasal verb cash in is given here first with all types of data:

CASH IN
1. SCORE KASSEN
cash in (informal)
score kassen
Examples:
Copying programs is a simple operation and the software pirates are cashing in.
Det er en simpel ting at kopiere programmer og softwarepiraterne scorer kassen.
Danish synonyms:
tjene en masse penge
English synonyms:
clean up (informal)
make a fortune
2. INDLØSE
cash in something, cash something in
indløse noget
Examples:
He cashed in all his bonds to raise the money to buy a boat.
Han indløste alle sine obligationer for at skaffe penge til at købe en båd.
3. OPHÆVE
cash in something, cash something in
ophæve noget
Examples:
You will lose money if you cash in your policy early.
De mister penge hvis De ophæver Deres forsikring før tid.
4. STILLE TRÆSKOENE
cash in (American) (informal)
stille træskoene
Examples:
My uncle finally cashed in after a long illness.
Min onkel himlede til sidst efter lang tids sygdom.
Danish synonyms:
sætte træskoene
himle
kradse af
tage billetten
English synonyms:
kick the bucket (informal)
pop off (British) (informal)
snuff it (British) (informal)
pop your clogs (British) (informal)
cash in your chips (American) (informal)
cash in your checks (American) (informal)
die
pass away (formal)

Cash in has four different senses. The first is score kassen, which means to earn a lot of money. In this sense, cash in is intransitive and informal. Then follow two related senses – expressed in Danish as indløse and ophæve – where the difference lies in the nature of what is cashed in – namely, on the one hand securities such as bonds and on the other hand insurance policies. The fourth sense – to die – has been given an informal Danish sense indication – stille træskoene – corresponding to the British expression to pop your
clogs. It is also indicated that this sense is restricted to American English. A stylistically equivalent Danish expression is stille træskoene with a number of synonyms. As can be seen, the display also includes a large number of English synonyms with the relevant linguistic labels. In the following illustrations, only the data for this fourth sense of cash in will be shown. If the dictionary user chooses the reception dictionary, the following data will be displayed:
(a) The sense
(b) The English pattern illustration which shows that cash in in this sense is intransitive
(c) The Danish equivalent
(d) English example sentences without Danish translations
(e) Danish synonyms which may help clarify the meaning if the dictionary user – against expectations – does not know the meaning of the Danish equivalent.

4. STILLE TRÆSKOENE
cash in (informal)
stille træskoene
Examples:
My uncle finally cashed in after a long illness.
Danish synonyms:
sætte træskoene
himle
kradse af
tage billetten

If the user chooses the translation dictionary, the following data are displayed:
(a) The sense
(b) The English pattern illustration with the added information that the English phrasal verb is informal
(c) The Danish equivalent
(d) English example sentences with Danish translations
(e) Danish synonyms for the dictionary user to choose between based on stylistic criteria.

4. STILLE TRÆSKOENE
cash in (informal)
stille træskoene
Examples:
My uncle finally cashed in after a long illness.
Min onkel himlede til sidst efter lang tids sygdom.
Danish synonyms:
sætte træskoene
himle
kradse af
tage billetten

If the user chooses the production dictionary, the following data are displayed:
(a) The sense
(b) The English pattern illustration with the added information that cash in in this sense is informal and restricted to American English
(c) The Danish equivalent so that the dictionary user can be absolutely certain of the meaning of the English phrasal verb
(d) English example sentences to illustrate the use of the phrasal verb in authentic English sentences
(e) English synonyms with information about style so that the dictionary user is offered alternatives to the English phrasal verb for different communicative situations.

4. STILLE TRÆSKOENE
cash in (American) (informal)
stille træskoene
Examples:
My uncle finally cashed in after a long illness.
English synonyms:
cash in your checks (American) (informal)
cash in your chips (American) (informal)
kick the bucket (informal)
pop your clogs (British) (informal)
pop off (British) (informal)
snuff it (British) (informal)
die
pass away (formal)
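The selection of data types for the three dictionaries can be illustrated with a short sketch in Python. It is not the project's implementation: the record layout, the names Sense, render_sense, LABEL_CATEGORY and TRANSLATION_CATEGORIES, and the abbreviated synonym list are assumptions made for illustration. The sketch follows the three lists of data types given in this section, so the translation view keeps only the label categories listed there, and the data are those of the fourth sense of cash in.

```python
from dataclasses import dataclass, field

# Label -> label category, following the categories named in 10.2.6.
LABEL_CATEGORY = {
    "informal": "diaphatic", "neutral": "diaphatic", "formal": "diaphatic",
    "old-fashioned": "diachronous",
    "British": "diatopical", "American": "diatopical", "Australian": "diatopical",
    "literary": "diatextual",
    "slang": "diastratic",
    "humorous": "diaevaluative", "derogatory": "diaevaluative",
    "incorrect": "dianormative",
}

# Label categories listed for the translation dictionary.
TRANSLATION_CATEGORIES = {
    "diaphatic", "diachronous", "diatextual", "diaevaluative", "dianormative",
}


@dataclass
class Sense:
    """One sense of a phrasal verb (hypothetical record layout)."""
    sense: str
    pattern: str
    grammar: str
    labels: list
    equivalent: str
    examples: list  # (English sentence, Danish translation) pairs
    danish_synonyms: list = field(default_factory=list)
    english_synonyms: list = field(default_factory=list)  # (synonym, labels) pairs


def render_sense(s, function):
    """Return only the data types relevant to the chosen function."""
    view = {"sense": s.sense, "pattern": s.pattern, "equivalent": s.equivalent}
    if function == "reception":
        view["examples"] = [en for en, _ in s.examples]
        view["danish_synonyms"] = list(s.danish_synonyms)
    elif function == "translation":
        view["labels"] = [
            label for label in s.labels
            if LABEL_CATEGORY.get(label) in TRANSLATION_CATEGORIES
        ]
        view["examples"] = list(s.examples)  # English with Danish translations
        view["danish_synonyms"] = list(s.danish_synonyms)
    elif function == "production":
        view["grammar"] = s.grammar
        view["labels"] = list(s.labels)
        view["examples"] = [en for en, _ in s.examples]
        view["english_synonyms"] = list(s.english_synonyms)
    else:
        raise ValueError(f"unknown function: {function!r}")
    return view


# Fourth sense of cash in; the synonym list is shortened for the example.
CASH_IN_4 = Sense(
    sense="STILLE TRÆSKOENE",
    pattern="cash in",
    grammar="intransitive",
    labels=["American", "informal"],
    equivalent="stille træskoene",
    examples=[(
        "My uncle finally cashed in after a long illness.",
        "Min onkel himlede til sidst efter lang tids sygdom.",
    )],
    danish_synonyms=["sætte træskoene", "himle", "kradse af", "tage billetten"],
    english_synonyms=[("kick the bucket", ["informal"]), ("pass away", ["formal"])],
)

if __name__ == "__main__":
    for function in ("reception", "translation", "production"):
        print(function, render_sense(CASH_IN_4, function))
```

Keeping one full record and filtering it at display time, rather than storing three separate entries, mirrors the idea that the three dictionaries are generated from a single database.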
10.3.2. Sense Ordering
It has already been mentioned that many phrasal verbs have more than one sense, and that these senses are indicated by means of Danish verbal expressions. At that point, however, principles for ordering senses were not discussed. Lew (2009) discusses the possibility of customizing functionally based sense ordering in the sense that for computer-based monofunctional dictionaries
based on polyfunctional databases, sense ordering could be made sensitive to the currently active function, either chosen by the dictionary user or activated by the computer based on different parameters, such as, for example, other running applications. Lew was mainly concerned with monolingual general-language learner’s dictionaries, and the criteria for sense ordering were mainly based on frequency of use of the senses. However, for a specific subset of linguistic items such as phrasal verbs, the basic criteria for sense ordering may be other than frequency of use of given senses. What is suggested here in this respect is that we take our point of departure in non-native language users’ specific difficulties with these particular structures. Basically, for reception, semantically opaque senses should come first, because these are the ones that are more likely to be consulted by the users of the reception dictionary. Since potential users will use the reception dictionary mainly in connection with reading various kinds of academic texts, there might also be some justification for prioritizing specifically academic senses of phrasal verbs, although such senses seem to be relatively rare. For translation purposes, it also seems sensible to prioritize opaque senses, since these are more likely to be consulted than transparent senses. For production purposes, there might be some justification in prioritizing the more semantically transparent senses, since the hypothesis here is that it is these senses of phrasal verbs that are more likely to be used in producing English texts, which means that dictionary users will want to access data about the grammatical and stylistic use of these senses. It must be emphasized that these observations are not based on empirical investigations, and any decisions about conscious sense ordering will have to rely on findings from investigations into these matters.
Also, the whole matter about sense ordering rests upon one important assumption which has not so far been discussed, namely that the look-up behaviour of the average dictionary user is the same for electronic dictionaries as for printed dictionaries in the sense that in their search for lexicographic data, users’ look-up process proceeds in a linear fashion from the first sense to the second sense to the third sense and so on, until they reach the sense that is relevant for the satisfaction of their look-up needs. However, this assumption has never been confirmed.
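The function-sensitive ordering suggested in this section can be sketched as follows. This is a minimal illustration in Python that rests on two assumptions: that each sense carries an editor-assigned flag marking it as semantically opaque, and that the flags given to the four senses of cash in below are plausible. Both the flag and its values are invented for the example, and the ordering rule itself reflects the untested hypothesis stated above, not an empirical finding.

```python
def order_senses(senses, function):
    """Reorder senses for the active function.

    sorted() is stable, so senses keep their editorial order within
    the opaque group and within the transparent group.
    """
    if function in ("reception", "translation"):
        # Opaque senses first: False sorts before True.
        return sorted(senses, key=lambda s: not s["opaque"])
    if function == "production":
        # Transparent senses first.
        return sorted(senses, key=lambda s: s["opaque"])
    raise ValueError(f"unknown function: {function!r}")


# Hypothetical opacity flags for the four senses of cash in.
CASH_IN_SENSES = [
    {"number": 1, "sense": "SCORE KASSEN", "opaque": True},
    {"number": 2, "sense": "INDLØSE", "opaque": False},
    {"number": 3, "sense": "OPHÆVE", "opaque": False},
    {"number": 4, "sense": "STILLE TRÆSKOENE", "opaque": True},
]

if __name__ == "__main__":
    for function in ("reception", "translation", "production"):
        ordered = order_senses(CASH_IN_SENSES, function)
        print(function, [s["number"] for s in ordered])
```

With these flags, the reception and translation dictionaries would list senses 1 and 4 before senses 2 and 3, while the production dictionary would do the reverse.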
10.4. Creating the Editing Tool
Creating a database from lexicographical instructions rarely if ever follows a straight and narrow path, and much discussion precedes it. The main problem is that the lexicographer projects complete formatted articles in his mind’s eye, whereas the programmer constructs the methods necessary to combine data
elements to that end by simulating the searches and their results based on the user’s needs as described. The actual process of entering and retrieving data in a database and later collating them into useful information for a user necessitates that all elements be properly placed in a hierarchical structure according to their internal relations. This hierarchy never intentionally mimics the formatted version of a finished dictionary article since the lexicographical elements, that is, collocations, synonyms, etc., in any given article are interchangeable according to the needs of the user. All elements are thus expendable; some will be included in one user need and others will not. Some will be included in a search and be shown in the result, others included in a search but not shown, and quite often an element will be shown which has not been included in a search. This complexity can only be described through a high level of abstraction and the first thing a programmer does or should do is to dismantle the lexicographer’s instructions into their smallest fragments and reassemble them to ensure coherence according to the perceived internal relations. This, of course, may cause changes in the lexicographical structure where elements are added, split into smaller parts, moved or removed entirely because the detailed review reveals flaws but also possibilities that were overlooked earlier. The resulting structure of the database cannot be altered without consequences once data has been added to it. The database for the English Phrasal Verb Dictionary is considered a work in progress and the following data elements have so far been identified:
1. Phrasal verbs
2. a) Definitions, that is, Danish verb combination
   b) Explanation to same
3. Grammatical structure for English phrasal verb
4. a) Danish equivalent structure to phrasal verb
   b) synonyms
5. a) English collocations
   b) Danish equivalents
6. a) English examples
   b) Danish equivalents
7. English synonyms
Previous experience shows that there is a strong possibility that many of the listed elements will be further broken down into minor data elements. Already, there are hidden elements – many editorial in nature – not listed here that have been added for editing purposes and although not strictly speaking a part of the lexicographical set of instructions they are an inherent part of the structure and must be treated as such. In a way it can be said that these types of
data are a part of the dictionary for the one special user that is the editor. With this in mind the database for the English Phrasal Verb Dictionary currently has the structure shown in Figure 10.1:
Figure 10.1. Workflow and structural diagram. [Boxes in the diagram: Phrasal verbs; Definitions; Gram.struct.s; Equivalent; Synonyms; Collocations; Examples; Synonyms.]
The hierarchy is read from left to right but might as well be described from top to bottom. Equivalent, Collocations, Examples and Synonyms are on the same level in the hierarchy. The reason this structure is displayed horizontally instead of vertically, as would usually be the case, is to emphasize the workflow, which starts in the leftmost table and progresses towards the right. For an article to be valid the three leftmost tables must contain data. As is evident, the lexicographical elements mentioned earlier have been translated into data elements roughly on a one-to-one basis, where each lexicographical element has been given a table of its own. In some cases a table can contain more than one lexicographical element, as is the case with the second table Definitions, which contains the element dansk verbal_forbindelse and its explanation forklaring_til_verb_forbindelse. This happens rarely and only when there is certainty that the two elements always correlate in a closed dependency. For open dependencies, related tables must be created, as is the case, for instance, for Collocations, where the translation element is in a separate table to allow for variants and alternate translations. The upstream relationships are called dependencies. These are necessary to ensure proper forming of articles. It is not possible to create an element of type grammatical structures related to the element of type phrasal_verb unless there is first an element of type dansk_verbal_forbindelse. This excludes the
possibility of having, for instance, a definition without a phrasal verb and, similarly, a grammatical structure without a definition, which ensures consistency of elements across the database. Unlike a text editor, the database is not a tool in itself. For it to have any functionality, for instance, adding and altering data, a set of tools must be programmed to form a database application. Some database systems, like Filemaker Pro, come with many tools included and creating an application is relatively straightforward. Other systems, for instance, MySQL, come with virtually no tools at all, save a command line interface (CLI) that allows for very powerful data management but is impractical in everyday use. For these, an interface must be programmed separately, but for both the rule is that without an interface, there is no database, at least not as far as the editor is concerned (Figure 10.2).
Figure 10.2: Default interface. Top level of the database.
The interface for the English Phrasal Verb Dictionary reflects the cascading nature of the structure. It contains seven interfaces that give access to the underlying dependency through portals that allow the editor to create and edit multiple related data directly. At the bottom of each layout are a number of buttons containing basic navigation, delete- and find-buttons and a button for sorting a found set of data (Figure 10.3). Editing lexicographic elements is done either directly in a visible field or by opening the element and its dependencies in a new window through the click of a button. In this interface (Figure 10.3), the button Ækvivalente strukturer opens the interface for the second level.
Figure 10.3: Interface for the Danish verbal form. Second level of the database.
Again, the system shows one lexicographical element and a portal to edit related data. Further editing of the related data is possible through the button marked Grammatiske strukturer. The system does not allow for direct parallel editing of data per se. Multiple windows with varying top-levels can be opened, but that can easily lead to confusion. The article structure and the amount of content make it necessary to keep a strict workflow through the structure for each phrasal verb’s dependencies. On the right side of each interface and independent of the level in the structure is a collation of all article elements that resembles a complete article as it would appear in a printed dictionary. This field gets updated live as data is validated. On the left side is a panel reserved for editorial notes, which can be used for various purposes. As mentioned earlier, Equivalent, Collocations, Examples and Synonyms are on the same level relative to each other and all dependent on one table (Figure 10.4). The interface for these uses a different technical solution that allows all four to be condensed into a single interface. This is only included here to show the possibility and has no impact on the way data is entered and validated. As this is a work in progress, many aspects and tools do not yet work as intended and the number of tools will increase, just as existing ones will change in functionality. The current database structure as such will remain, though, since a change at this point, where data has already been entered, will cause data either to be lost or to be painstakingly preserved and re-entered, either manually or by means of complicated algorithms.
Figure 10.4: Interface for Equivalent, Collocations, Examples and Synonyms. Fourth level of the database.
The main reason to use a database as a container for lexicographic data is that as long as the editor adheres to the structure and does not, for instance, write collocations into a synonym field, the data can be combined to suit any user’s need in their given situation, as long as it is within the scope of the data selected by the editor. The resulting article, therefore, is a formatted result of parts of the data in the database but not of all the data in the database. Therefore, the mass of data in the database is not a dictionary in itself but is a collection of data to be selected and used in any combination defined by the user’s requirements, which is comparable to a corpus. Seen from a programmer’s point of view, the distinction is even harsher: the dictionary is pure output and the database is pure input, and until the user initiates a search that results in at least one data element as output, no actual dictionary exists.
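The upstream dependencies described in this section can be illustrated with a small relational sketch. It uses Python's built-in sqlite3 module purely for convenience; SQLite is not the system used in the project, and the table and column names are assumptions, apart from dansk_verbal_forbindelse and forklaring_til_verb_forbindelse, which are taken from the text. Attaching collocations to the grammatical structure table is one reading of the statement that the fourth-level tables all depend on one table.

```python
import sqlite3

# Hypothetical schema: each table depends on the table to its left in the workflow.
SCHEMA = """
CREATE TABLE phrasal_verb (
    id INTEGER PRIMARY KEY,
    lemma TEXT NOT NULL
);
CREATE TABLE definition (
    id INTEGER PRIMARY KEY,
    phrasal_verb_id INTEGER NOT NULL REFERENCES phrasal_verb(id),
    dansk_verbal_forbindelse TEXT NOT NULL,
    forklaring_til_verb_forbindelse TEXT  -- closed dependency: same table
);
CREATE TABLE grammatical_structure (
    id INTEGER PRIMARY KEY,
    definition_id INTEGER NOT NULL REFERENCES definition(id),
    pattern TEXT NOT NULL
);
CREATE TABLE collocation (
    id INTEGER PRIMARY KEY,
    structure_id INTEGER NOT NULL REFERENCES grammatical_structure(id),
    english TEXT NOT NULL
);
-- Open dependency: translations live in a related table to allow variants.
CREATE TABLE collocation_translation (
    id INTEGER PRIMARY KEY,
    collocation_id INTEGER NOT NULL REFERENCES collocation(id),
    danish TEXT NOT NULL
);
"""


def open_database():
    """Create an in-memory database with dependency checking switched on."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def rejects_orphan_structure(conn):
    """Try to add a grammatical structure without a definition."""
    try:
        conn.execute(
            "INSERT INTO grammatical_structure (definition_id, pattern) VALUES (?, ?)",
            (999, "cash in"),  # no definition with id 999 exists
        )
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    conn = open_database()
    print("orphan structure rejected:", rejects_orphan_structure(conn))
```

In a layout of this kind the consistency rule is enforced by the database itself, so an editing interface built on top of it cannot create a grammatical structure that lacks a definition.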
10.5. Conclusion
This article has given an outline of a project aimed at producing an electronic bilingual English–Danish phrasal verb database, designed for the needs of advanced learners of English and English–Danish translators. These needs have been defined as understanding the meaning(s) of phrasal verbs, translating English phrasal verbs into Danish and using English phrasal verbs grammatically and stylistically correctly. The database, therefore, must cover the lexicographic functions of reception, translation and production and a discussion of several
lexicographical aspects – lemma selection, sense ordering, grammatical information and style labels – has been offered in connection with a presentation of the structure and contents of the database. However, the most important aspect of the proposed database – and one which is of general relevance to modern lexicography – is that the electronic format offers the opportunity of dynamic data presentation, that is, presenting to the users only that set of data out of all data in each individual entry which is relevant to a given user situation, be it reception, translation or production. In other words, the project aims at creating a database capable of generating three monofunctional dictionaries, one for reception, one for translation and one for production. In that way, the proposed database follows the theoretical and technical requirements already discussed in previous chapters, especially those dealing with the Accounting Dictionaries and the Danish Music Dictionary.
Chapter 11
Online Dictionaries of English
Robert Lew
11.1. Introduction
The chapter is intended as an overview of online dictionaries of English, often seen, and probably rightly, as the leading lexicographic tradition of the present. Although a balanced overview is my primary goal, I will also touch upon some general issues and adopt a more evaluative position here and there. However, this will only be a secondary perspective, as the specific issues are covered in greater depth in some of the other chapters in the present volume. Obviously, given the sheer number of the currently available online dictionaries, no one can hope to produce a complete catalogue, and this is not the purpose here. Rather, the idea is to present prominent and representative exemplars of specific types of dictionaries and focus on their properties of interest. But what are those types of dictionaries? As dictionaries can be, and have been, compared on a number of different levels, classifying them has traditionally been problematic. This has become even more of a challenge in the age of electronic dictionaries. What, then, could be the basic classifying criteria for online dictionaries? Clearly, most of the traditional criteria can still be applied to online products. Here, of course, we find the complex (and at times confusing) network of overlapping oppositions: general/specialized subject, general/special purpose, L1/L2/FL speaker, expert/layman, contemporary/historical and so on. There do appear, however, to be some criteria or oppositions that have not been inherited from printed dictionaries but rather are specific to online dictionaries.
11.2. Some Additional Criteria for Classifying Online Dictionaries
11.2.1. Institutional versus Collective
A variety of overlapping classification criteria have been used to categorize online dictionaries. For example, in terms of user involvement, there is the
institutional versus collective opposition (Fuertes-Olivera, 2009b); the latter category signifies a collaborative effort by a community of non-professionals, who can themselves be dictionary users; an earlier paper by Carr (1997) has also used the terms bottom-up and collaborative. User-involvement is yet another designation for a similar concept, while open stresses a slightly different aspect of what might again be a fairly similar formula.
11.2.2. Free versus Paid
Collective dictionaries would normally be free to use. Conversely, institutional dictionaries need not necessarily involve fee-based access, so the free versus paid contrast is an independent one. It is also increasingly difficult to demarcate sharply between free and paid, with the clear cases leaving a substantial grey area in the middle, as revenue to the publisher can take different forms. For example, individual pay-per-view or subscription-based access is a clear case, but when a dictionary is syndicated as part of a more comprehensive service and sold, say, to libraries, the end user often does not bear the direct cost. Then there are cases where online access is offered (perhaps for a limited time) as a bonus for buyers of paper editions. Still closer to the free end of the cline are ad-supported dictionaries, and this appears to be a rather popular model at the moment.
11.2.3. Number of Dictionaries
In terms of how many dictionaries are offered by the specific services, at least the following four options come to mind:
1. individual dictionaries: much like traditional printed dictionaries, there exist standalone, single online dictionaries;
2. dictionary sets consisting of clusters of related dictionaries may be offered from a single landing page; a good example is the Cambridge dictionaries online page;1
3. dictionary portals only include hyperlinks to actual dictionaries (examples will be presented below);
4. dictionary aggregators excel at pasting together the content of various dictionaries and serving them on a single page (again, examples of these will be discussed further down in the chapter).
In my overview below, I will begin with some notable representatives of institutional dictionaries offered free of charge to the world internet community.
11.3. Institutional Dictionaries
11.3.1. General English Dictionaries
General English dictionaries are traditional general-purpose dictionaries that provide a relatively rich microstructural treatment of (primarily) contemporary
English, which is traditionally expected from general reference desk dictionaries, and where the word list is not restricted by domain or register.
11.3.1.1. American
Traditional U.S. dictionary publishers seem to have embraced the Web: as many as three of the major American players on the market of general desk and college dictionaries make their dictionaries available online free of charge. These are the Merriam-Webster Online Dictionary, American Heritage Dictionary and Random House Unabridged Dictionary, the last one being included only as part of the Dictionary.com service (on which see 11.4.1. below).
11.3.1.2. British Until recently, the available offer of online general-purpose dictionaries on the British scene had been less complete, with the traditional and most prestigious publishers (notably Oxford University Press) apparently hesitant about placing their products online for free. Only very recently did Oxford University Press create the new oxforddictionaries.com 2 lexicographic portal, built around two of the publisher’s recent dictionaries: the newest (third) edition of the Oxford Dictionary of English (under the heading World English), and its American counterpart, the New Oxford American Dictionary, also in its third edition. A premium subscription service is also available, with one year’s free access for buyers of the printed copy. The availability of the free/premium combination for these Oxford dictionaries exemplifies rather well the new business model that is currently being followed by a number of publishers: the model known by the linguistic blend freemium. The approach works on the principle that basic content and functionality are offered essentially free of charge (in response, we might say, to the free-lunch mindset of today’s netizens). The free offer, however, is used as an opportunity to market and sell extra content, which might be richer lexicographic data and/or non-lexicographic content, such as exercises or language testing materials. To continue with our example, the premium oxforddictionaries.com service offers the following extra features (Judy Pearsall, personal communication):
• sense-linked thesaurus of 600,000 synonyms and antonyms;
• advanced search and browse features;
• 1.9 million sense-linked examples from the Oxford English Corpus;
• audio pronunciations;
• My Oxford Dictionary personalization features;
• browsing and search by subject area, meaning category, part of speech and so on;
• four additional zones fully linked to dictionary content, including Writing Skills zone, Writers and Editors zone, Example sentences zone, and Puzzles zone.
To some extent, free online versions may drive the sales of paper copies – but, of course, this argument could be reversed, with online access deterring some potential buyers from purchasing a printed copy. Apart from the two Oxford dictionaries, there are also other notable British dictionaries available free of charge. Collins offers what it refers to as the Collins English Free Dictionary.3 A closer examination reveals that this is not the same as the authoritative Collins English Dictionary; the latter, however, does seem to be available, but only as part of TheFreeDictionary service (on which see 11.4.1 below). The venerable Scottish publisher Chambers offers on its website the Chambers 21st Century Dictionary.4 Again, though not really the same as the renowned Chambers English Dictionary, the 21st Century is still a usable, solid reference work for general consultation. The Encarta World English Dictionary,5 having originated in a cooperation between the London-based Bloomsbury publisher and Microsoft, actually comes in two versions, and both are available via the same website. There is the World English version, marketed as the dictionary that provides unrivalled treatment of the regional varieties of English, and the localized U.S. version. The site provides an option to switch quickly between the two, and it is fascinating to observe, by switching back and forth, the differences in the coverage of regional terms as well as meaning, spelling and pronunciation.
11.3.2. Learners’ Dictionaries: The Big Five According to data from internetworldstats,6 English is the foreign language of some 86 per cent of Europe’s active internet users. Now, given that English is today’s de facto lingua franca and that WWW content in English dwarfs that in any other language, it becomes clear that non-native speakers are a significant section of online dictionary users, present or future. In this context, the category of English learners’ dictionaries comes into focus, since these are the reference works designed specifically with the non-native speaker in mind. English learners’ dictionaries enjoy a long-standing tradition, which goes back to around the 1940s or, as some claim, the 1930s (see Cowie, 1999). Their content has been meticulously reworked over numerous successive editions, and thanks to their worldwide customer base and the corresponding sales volumes, publishers of monolingual English learners’ dictionaries have been able to take advantage of select teams of expert lexicographers. These
dictionaries have enjoyed high levels of prestige, as have their traditionally British publishers. The last few years have seen free versions of British monolingual dictionaries (MLDs) for advanced learners appear online, one by one. On the whole, the major British MLDs have followed a pattern of remarkable similarity (Yamada, 2010), perhaps as part of the competitive drive, and this is also reflected in the features offered in their online versions. There is also a more down-to-earth reason for the similarities found in a number of British MLDs: they tend to use the same software dictionary production platform from IDM. The range of available English MLDs opens with the pioneer in this segment, the Oxford Advanced Learner’s Dictionary,7 whose free version is now roughly based on the eighth print edition. A long-time competitor, the Longman Dictionary of Contemporary English, currently in its fifth edition, has also offered a free online version8 for some time. The dictionary’s landing page specifically mentions a limitation of the free version: recordings of spoken pronunciation are only available for a subset of headwords and example sentences (more specifically, the audio is available for the entries in the letter stretches D and S). The note further states that audio recordings for all entries are available in ‘the CD-ROM version’: this is not quite accurate, as the optical disk version is actually offered on a DVD-ROM. But the free version is not the only online version of this dictionary: there is also a radically different premium online edition,9 which offers essentially the same content as the off-line DVD-ROM version. Cambridge Dictionaries Online10 represents an example of an institutional dictionary set (as defined in 11.2.3, above): apart from the flagship Cambridge Advanced Learner’s Dictionary, four other learners’ dictionaries from the publisher are available at the same address.
Among the major British learners’ dictionaries, Macmillan English Dictionary may well be the one to have made available the most complete set of lexicographic content online11 free of charge, including audio pronunciations of all headwords and a sense-linked thesaurus. The one member of the Big Five set which has remained apparently sceptical when it comes to offering free online access of any kind is COBUILD. Although it has provided subscription-based access for some time,12 none of this is available free of charge, if we disregard an outdated fourth edition being hosted on a third-party service.13 Recently, it looked as if COBUILD was set to become the most widely used learner’s dictionary when, in autumn 2009, Google apparently obtained a licence for COBUILD content and placed it online as the main Google dictionary for English. This was a questionable choice, as COBUILD is not really well-suited for the type of uses for which Google users would be most likely to need the dictionary, that is, problems with text reception: of all the major learners’ dictionaries, COBUILD has the smallest coverage (Rundell, 2006). On the other hand, the features supporting text production would remain underused. Google’s half-hearted implementation of the interface
certainly would not have made users more sympathetic towards the dictionary. For example, Google dictionary included COBUILD’s syntactic codes, but without a word of explanation anywhere. Surely, it is a long shot to assume that a casual user of the Google dictionary will appreciate the significance of a code such as ‘NVAR’ (in this case, an indication that a noun in the sense so marked has both mass and individuated uses). Considering all this, it is not at all surprising that in August 2010, COBUILD was replaced as the database for Google dictionary with The Oxford American College Dictionary (Judy Pearsall, personal communication).
11.3.2.1. American learners’ dictionaries Although it is the British publishers who lead the market of monolingual English learners’ dictionaries, such dictionaries have also been published elsewhere, and one particular dictionary that made a premiere recently with quite a bit of publicity is the Merriam-Webster’s Learner’s Dictionary.14 What is rather unusual about this dictionary is that the launch of its online version coincided with the publication of the first paper edition. The free online content includes audio pronunciation, and the user interface is at least as good as those of the British dictionaries; but despite the marketing claims, the lexicographic content itself is not groundbreaking, and still lacks a number of modern features now taken for granted in the leading British products (Bogaards, 2010; Hanks, 2009). The dictionary does have more examples than the competition, but their quality has been questioned (Hanks, 2009). Despite what some might be led to believe, the Merriam-Webster’s Learner’s Dictionary is by no means the first American dictionary of its type: several have already been published, and at least one of them, Heinle’s Newbury House Dictionary of American English,15 is freely available online. However, the latter is a rather small dictionary and not a particularly impressive one. All in all, learners of American English may actually be better off using British-published dictionaries of American English, such as the Cambridge Dictionary of American English.16
11.3.2.2. Louvain EAP Dictionary (LEAD) Apart from the established publishers, some academic centres are also trying to enter the field of learners’ dictionaries. One particularly promising project currently in progress (not yet publicly accessible) is the Louvain EAP Dictionary (LEAD), which is being developed as a dictionary for non-native writers. Its main novelty is that it is customizable in terms of field domain (business, medicine) and mother tongue (French, Dutch). In consequence, usage notes and equivalents match the L1 of the user, and some of the examples are
domain-specific. The dictionary will also have (as you might expect from a product created at the Centre for English Corpus Linguistics) a solid grounding in corpora, and integrated corpus access.
11.3.3. User-Involvement (Bottom-Up) Lexicography In the democratic world of the internet, users can play lexicographer as well and create their own online dictionaries. There is quite an impressive range of these, but let us have a look at three representative exemplars.
11.3.3.1. Urban Dictionary A success story in its own right, the Urban Dictionary17 is a true bottom-up initiative, which recently celebrated its tenth anniversary. One of the community features exemplified here is that users vote on the ‘best’ definitions. But such democracy does not necessarily serve lexicography well: as it turns out, the most liked definitions are not of the type that would really help someone who does not already know the meaning. Clearly, true explanatory definitions are too predictable and thus not ‘interesting’ enough, and are being pushed back to the bottom of the list. Instead, collaborative dictionary entries, unless properly moderated, tend to become the playing ground for showing off wit, marking in-group membership and venting prejudice. For example, one entry at the headword bootyism runs as follows: ‘The gospel according to Beyonce. Often confused with Buddhism.’ This entry is written in an abbreviated style posing as lexicographese, and manages to allude rather cleverly to the semantics as well as the origins of the slang term, but it would probably not be of much help to a user who has no clue about the meaning. In this case, the author seems to be aware of this deficiency, and makes up for it in the (entirely invented) example exchange:
Todd: I’m thinking about converting to Bootyism.
Michael: Nah man, it’s BUDDHISM.
Todd: No, ’cause in Bootyism all you do is worship ass.
11.3.3.2. Wiktionary Wiktionary18 may be the ultimate collaborative dictionary. A recent in-depth analysis of this resource (Fuertes-Olivera, 2009b) presents a number of interesting findings. It is observed that, contrary to what is often claimed, Wiktionary is not a multilingual dictionary, but rather an English dictionary with a translation overlay for several other languages. It is also noted that very similar items may receive radically different treatments, lacking internal consistency and contradicting the Wiktionary guidelines.
11.3.3.3. Wordnik Wordnik19 presents an interesting blend of online dictionary genres, involving a collaborative community-driven component built around a ‘professional’ core. According to the founder Erin McKean (personal communication), user-generated content is encouraged here but in ‘guided’ ways, with less emphasis on user-created definitions than is usual in collaborative projects. Wordnik embeds content from other datasets: at this time, Twitter and Flickr are being tapped for real-time citations and relevant images, respectively. The service employs modern data mining techniques to identify in corpora citations of the self-defining and exemplar types (McKean, personal communication). Overall, there is less reliance on traditional definitions and the emphasis is shifted to citations.
11.3.3.4. Collaborative-institutional dictionaries Commercial publishers also try to get their users actively interested and involved in lexicography, perhaps in an effort to persuade them to stay on the site and come back for more. Examples of collaborative sections hosted on institutional dictionary sites suggest that the opposition institutional versus collective dictionary (Fuertes-Olivera, 2009b) may no longer be a sharp one. Two such examples from well-known institutional publishers are the Merriam-Webster’s Open Dictionary20 and Macmillan Open Dictionary.21 A perusal of the user-added entries reveals that most of the entries added would not meet the criteria for inclusion in the regular edition of the dictionary, and their presence merely provides evidence of the conventional wisdom that ‘the dictionary’ is a collection of ‘all the words’ of a language. Apart from adding open dictionary components, online dictionaries sometimes offer other extras aimed at involving the users. Recent add-ons include social networking features, such as the award-winning Macmillan Dictionary blog.22 So far we have discussed general dictionaries of contemporary English, aimed at both native speakers of English and foreign learners. Let us now move beyond these common types, to diachronic and specialized dictionaries.
11.3.4. Diachronic (Historical) Dictionaries Users of diachronic dictionaries are most typically language scholars, and so their level of sophistication and language awareness is normally far beyond that of lay users. As language experts, they can reasonably be trusted to make choices that a non-expert user will not be in a position to make, such as the explicit selection of microstructural data categories (and we will revisit the issue of customization in a later part of this chapter). The makers of scholarly
diachronic dictionaries appear to be aware of these ramifications, as exemplified by the online version of what is perhaps the most famous dictionary world-wide (at least for English), the Oxford English Dictionary. Access to the OED is subscription-based, and affiliated scholars would normally rely on their institutional subscription rather than a personal one. In contrast, a more restricted (in terms of period) but no less voluminous Middle English Dictionary23 has been freely available online since 2007, when the University of Michigan completed the digitization process with the help of a government grant. The dictionary offers a rather large number of technically complex search options, but these should be manageable for language scholars and their students.
11.3.5. Subject Field Dictionaries There are countless online specialized dictionaries out there on the Web, most of them fairly small in size, dealing with the vocabulary of a specific subject field (as well as narrower sub-fields). Because of the sheer number, many users will find it useful to consult online directories of such dictionaries, one of the most comprehensive being Glossarist.com: an example of a dictionary portal as listed in my provisional taxonomy under 11.2.3 above. Indexing portals of this type only include links to dictionaries on external pages, without themselves hosting or displaying actual lexicographic content. The lexicographic wisdom that content and presentation are largely two separate aspects is strengthened by those products where there is a sharp contrast in quality between one and the other. One case in point is Dorland’s Medical Dictionary24 from the respectable pair Merck Medicus and Elsevier, where solid content is marred by the uninspired (to say the least) access interface. Users are presented with a long chain of alphabetic stretches which must be navigated linearly in a fashion resembling page-turning, only much slower (although there is a term-search window, it does not apply to the dictionary itself, but to other services). To meta-lexicographers, this dictionary serves as a warning against sweeping generalizations about electronic dictionaries being faster and superior in terms of access: apparently, it is perfectly possible to produce an online dictionary where access is more cumbersome than in a paper book.
11.3.6. Dictionaries with Restricted Macrostructure One way to think of special-purpose dictionaries is that they often involve systematically restricted treatment in either macrostructure or microstructure. In the former case, only a distinct subset of the vocabulary is included in the wordlist. Field dictionaries, already covered in 11.3.5 above, may be included
here. Another exemplar of a restricted macrostructure dictionary is the well-known and highly successful Acronym Finder,25 which aims to include acronyms of both kinds: those pronounced as one word and those pronounced letter by letter (sometimes called initialisms). Although Acronym Finder does not limit its headword list to English acronyms, it is a fact that English clearly dominates.
11.3.7. Dictionaries with Restricted Microstructure In contrast to dictionaries with restricted macrostructure, restricted-microstructure dictionaries are characterized by a systematic reduction, not in the word list itself, but in the lexicographic data categories presented at each entry, compared to a general dictionary. The free Online Etymology Dictionary26 is a representative of the genre: the lexicographic data for a given headword is restricted to an explanation of the word’s origins. Pronouncing dictionaries are another major category of restricted-microstructure dictionaries, where the chief lexicographic data given indicates the phonetic form of the entry word. Semantic information is only given in exceptional cases, such as to disambiguate between graphemically identical words that are pronounced differently (i.e. homographs that are not homophones). There is the question of the exact form in which information on pronunciation is conveyed. In printed books, transcription (in one of a number of standards, the most universal being the IPA) used to be the only option, but in the multimedia environment of the Web, the expectation of users is to be able to hear an audio rendition of an item’s pronunciation. This expectation is met by the popular free online talking English dictionary howjsay.com,27 which provides recorded audio clips, but no written transcription. At the other end of the cline are academic pronouncing dictionaries such as the Carnegie Mellon University Pronouncing Dictionary,28 which presents transcriptions in the ARPAbet respelling system, or Péter Szigetvári’s English Pronouncing Dictionary,29 which employs a variant of the SAMPA respelling system (both being attempts at representing detailed phonetic symbols with ASCII characters only). There is no denying that being able to hear what the word or phrase sounds like is an asset, but does this mean, as most people seem to assume, that phonetic transcription is now dispensable?
It probably is for native speakers of English, but hardly so for speakers of other languages looking up English pronunciation. For them, it is an illusion to believe that just hearing a word pronounced in a foreign language is enough to register, still less learn, its correct pronunciation. Due to the effect known as categorical perception, speakers of a language tend to hear foreign language sounds through the filter of their native language phonology. Consequently, foreigners will mostly hear the sounds of their native language and will tend to miss the distinctions not present in it. For example, a speaker of Polish may easily miss the
difference between met and mat. The important advantage of phonemic transcription is that it provides an explicit graphic representation of the phonemes involved, drawing attention to the phonemes as entities. (This is not to say that the two academic dictionaries cited above do this in a very user-friendly way: they do not.) Of course, it is also true that efficient use of phonetic transcription does not usually come naturally to a language learner and requires guided training. But that is not the end of the story. Apart from pure phonemic identity, there is the important sub-phonemic phonetic detail, including positional allophony, which, again, is very hard to hear for the untrained learner. Although traditional printed pronouncing dictionaries tend not to give sub-phonemic detail, there is no principled reason why future online dictionaries should not be able to offer a choice of the level of transcription, including a narrow phonetic rendition for those who might want or need it. Technically, it should not be terribly difficult to take stock of at least the rule-based variants. As noted by Sobkowiak (2009), phonetic transcription has a representational function and an indexical function. The former has to do with the representation of the phonetic form of a word (or, more generally, other linguistic string). The indexical function allows the user to use symbols for accessing (sets of) lexical items, such as when looking for words that exhibit a given phonetic pattern. A systematic transcription system is at present a prerequisite for the indexical function to be possible, although not all dictionaries that do have transcription allow ‘sound search’ options. Clearly, of the three free pronouncing dictionaries here presented, Szigetvári’s English Pronouncing Dictionary is the most sophisticated in this respect.
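The two functions of transcription distinguished above can be made concrete with a minimal sketch. It assumes a tiny hand-entered word list with ARPAbet-style symbols (stress marks omitted); the names LEXICON, represent, sound_search and ends_with are hypothetical and do not reproduce the interface of any of the dictionaries discussed.

```python
from typing import Dict, List

# Hypothetical mini-lexicon: headword -> phoneme symbols in an ARPAbet-style
# ASCII notation (stress marks omitted). Hand-entered for illustration only.
LEXICON: Dict[str, List[str]] = {
    "met": ["M", "EH", "T"],
    "mat": ["M", "AE", "T"],
    "cat": ["K", "AE", "T"],
    "team": ["T", "IY", "M"],
    "dream": ["D", "R", "IY", "M"],
    "steal": ["S", "T", "IY", "L"],
}


def represent(word: str) -> str:
    """Representational function: show the phonemic form of one headword."""
    return " ".join(LEXICON[word])


def _contains(phonemes: List[str], pattern: List[str]) -> bool:
    """True if `pattern` occurs as a contiguous run inside `phonemes`."""
    n = len(pattern)
    return any(phonemes[i:i + n] == pattern for i in range(len(phonemes) - n + 1))


def sound_search(pattern: List[str]) -> List[str]:
    """Indexical function: headwords whose transcription contains the pattern."""
    if not pattern:
        raise ValueError("pattern must contain at least one phoneme symbol")
    return sorted(w for w, p in LEXICON.items() if _contains(p, pattern))


def ends_with(pattern: List[str]) -> List[str]:
    """Indexical function, rhyme-like variant: transcription ends in the pattern."""
    if not pattern:
        raise ValueError("pattern must contain at least one phoneme symbol")
    return sorted(w for w, p in LEXICON.items() if p[-len(pattern):] == pattern)


if __name__ == "__main__":
    print(represent("met"), "vs", represent("mat"))
    print(sound_search(["IY", "M"]))
    print(ends_with(["AE", "T"]))
```

Because each phoneme is stored as a separate symbol, the met/mat contrast is explicit in the data (EH versus AE), and the same symbols double as search keys. A service that offers audio clips only would have nothing comparable to search on.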
11.3.8. Onomasiological Dictionaries Onomasiological dictionaries are those that are specifically designed to take the user from a concept or idea to linguistic form, rather than explaining the meaning or use of a given form. A traditional paper dictionary of this type would most typically be a thesaurus or synonym dictionary. Thesaurus.com 30 is a companion site to the popular Dictionary.com aggregator (see 11.4.1, below). A more interesting online example of such a dictionary is RhymeZone,31 which started off as a synonym dictionary calling itself the Semantic Rhyming Dictionary. Somewhat predictably, probably because of the phrase ‘rhyming dictionary’ in the name, users arrived at the dictionary from search engines looking for traditional phonetic rhymes, and this is what the default search mode now offers. In fact, searching for rhyming words is also an onomasiological query, albeit in a broader sense. In the more restricted sense of onomasiological, the dictionary offers lists of synonyms, antonyms and ‘related words’. For these, RhymeZone relies on data from the English WordNet 32 lexical database, just as
so many other lexical resources do these days: WordNet has become the dataset of choice for many online dictionaries, because it is free and NLP-tractable in ways that make such integration relatively easy. One interesting way in which WordNet data are used is in graphic visualization engines such as VisuWords33 or Visual Thesaurus,34 where the idea is to represent WordNet’s lexical relations in a visually appealing graphical form. The latter now shows up in Cambridge Dictionaries Online entries. Having completed a quick tour of the representative online dictionaries of English, we now move on to a number of overarching issues that are relevant and topical for online dictionaries of today and tomorrow.
11.4. Some Issues in Online Dictionaries 11.4.1. The Dictionary Web The World Wide Web is built around the concept of hypertext, where texts, documents and media make up an interconnected network. Like most other sites, online dictionaries hyperlink, interlink, embed and integrate, and it will not take long for a careful user of online dictionaries to start noticing that quite a lot of the same content crops up again and again on a variety of dictionary sites. For example, the very same Visual Thesaurus images which feature in Cambridge Dictionaries Online are also present at the Dictionary.com 35 site. The latter is an example of a dictionary resource that does not rely on its own data, but instead aggregates lexicographic content from other electronic (online) dictionaries. Dictionary.com is a particularly popular aggregator of this kind. The popularity, one might suspect, has a lot to do with the attractive domain name, which to many users (and search engines?) strongly suggests that this is the dictionary (see e.g. Béjoint, 2010, on the popular image of the dictionary). As of this writing, the resource brings together lexicographic content from 15 dictionaries, including the American favourites Random House Dictionary and American Heritage Dictionary, as well as half a dozen special-purpose and special-subject dictionaries. Another aggregator is TheFreeDictionary, with American Heritage Dictionary (again!), WordNet (again!) and Collins English Dictionary (and Thesaurus). The resource is worth consulting for this last one, as this time (compare 11.3.1.2 above) it is indeed the respectable Collins English Dictionary, which is generally not freely available elsewhere.
While the ability to hyperlink and embed is one that lies at the heart of the World Wide Web, in dictionary aggregators the idea is taken to extremes, with the result that such dictionary portals produce absurdly long articles by mechanically pasting together, back-to-back, entries from several online dictionaries. These individual entries are often very similar, which results in
highly unhelpful, many-times redundant, tortuous assemblages of disconnected lexicographic data.
11.4.2. Access Electronic dictionaries, including online dictionaries, are often praised for their access functionality, which is claimed to be superior compared to paper book form. Clearly, the electronic interface is by definition more flexible and has a potential for efficiency that is not achievable in static printed form, but it is also true that this potential is not always properly utilized, especially if the online dictionary is retrospectively digitalized (Wiegand et al., 2010: 209). One example of a respectable online dictionary with paper-like access is the American Heritage Dictionary, which has no search facility at all; worse still is Dorland’s Medical Dictionary (see 11.3.5, above), where outer access is even slower and more cumbersome than in a printed book. However, some online dictionaries do take advantage of the electronic media and explore alternative access routes. As an illustration of this issue, let us consider some access options in cases where a search term potentially returns large amounts of data.
11.4.2.1. The step-wise approach to outer access? More than ten years ago, Hulstijn and Atkins (1998) proposed what they called ‘step-wise access’ for electronic dictionaries. In this connection, it is interesting to observe how this proposal stands up in view of the practical implementations in online English dictionaries. For this, we must examine the volume of data that a dictionary presents to the user in those cases when a search term matches more than a single treatment unit, such as multiple lemmata (for instance items of different parts of speech), or includes multi-word expressions (MWEs), such as fixed phrases, idioms or phrasal verbs. The spectrum of actual solutions seen in English online dictionaries can essentially be reduced to three options:
1. a menu of target items is presented;
2. a menu is presented, but the most likely choice opens by default;
3. partial entries are listed.
The first option, by far the most common, can be illustrated using Macmillan Dictionary Online as an example. Here, a search on a word-long string team returns a vertical menu of nine matches, each one hyperlinked to an entry or subentry. The top of the menu looks like this:
team noun
team verb
dream team noun
sales team noun
Option 2 features in the Merriam-Webster’s Advanced Learner’s English Dictionary, where a search for team produces a similar list of seven items, but the first of these (here again, team noun) is already given as a complete entry immediately below the list. Option 3 is implemented in the online dictionary at myCOBUILD.com,36 available to buyers of the printed copy of the Collins COBUILD Advanced Dictionary. The approach is an intermediate one between a bare lemma list (Option 1) and complete entries (Option 2). As seen in Figure 11.1, showing the entry team in myCOBUILD.com, the dictionary interface alerts the user that multiple entries have been found, and then displays the top of each lemma with a More link leading to the complete entry for that lemma.
Figure 11.1: The entry for team in myCOBUILD.com as an example of a stepwise interface.
Which of the three options is best? A universal answer, ignoring lexicographically relevant details such as the nature of the lookup situation and specific user needs and skills, rarely makes sense in lexicography, but let us offer some observations that might have a more universal appeal. Option 2 looks attractive, but there is a danger here that users may fail to recognize that the default choice (as here team noun) is the wrong one in their case. In contrast, Option 1 seems relatively safe in terms of the risk of missing the right option, but the problem here lies in the economy of effort (aka laziness): users may lack the patience to navigate through the menu to actual full treatment, and may decide instead to ditch a tool which requires too much clicking work. In view of the above reservations, Option 3 might perhaps be optimal (other things being equal), and it is surprising that so few dictionaries have adopted it.
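To make the contrast between the three options easier to see, here is a minimal sketch of how each might render the same set of matches for the search string team. Everything in it is an assumption made for illustration: the Entry class, the three function names and the glosses are invented, and nothing here reproduces the actual behaviour of the sites mentioned.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    headword: str
    pos: str
    senses: List[str]  # invented glosses, for illustration only


# Hypothetical matches for the search string "team".
MATCHES = [
    Entry("team", "noun", ["a group of people who play a sport together",
                           "a group of people who work together"]),
    Entry("team", "verb", ["to put people or things together"]),
    Entry("dream team", "noun", ["the best possible group for a task"]),
    Entry("sales team", "noun", ["the people who sell a company's products"]),
]


def _full_entry(entry: Entry) -> List[str]:
    """Complete treatment of one entry: headline plus all numbered senses."""
    lines = [f"{entry.headword} ({entry.pos})"]
    lines.extend(f"  {i}. {s}" for i, s in enumerate(entry.senses, 1))
    return lines


def option1_menu(matches: List[Entry]) -> List[str]:
    """Option 1: a bare menu of target items (each would be a hyperlink)."""
    return [f"{e.headword} {e.pos}" for e in matches]


def option2_menu_with_default(matches: List[Entry]) -> List[str]:
    """Option 2: the same menu, with the first match opened in full below it."""
    lines = option1_menu(matches)
    if matches:
        lines.append("")
        lines.extend(_full_entry(matches[0]))
    return lines


def option3_partial_entries(matches: List[Entry], preview: int = 1) -> List[str]:
    """Option 3: the top of every entry, with a 'More' cue where senses are cut."""
    lines = [f"{len(matches)} entries found"]
    for e in matches:
        lines.append(f"{e.headword} ({e.pos})")
        lines.extend(f"  {i}. {s}" for i, s in enumerate(e.senses[:preview], 1))
        if len(e.senses) > preview:
            lines.append("  More...")
    return lines


if __name__ == "__main__":
    for render in (option1_menu, option2_menu_with_default, option3_partial_entries):
        print(f"--- {render.__name__} ---")
        print("\n".join(render(MATCHES)))
```

In this sketch Option 3 differs from Option 1 only in that some lexicographic data is already visible for every match, which is the property that may spare the impatient user a click without committing to a single default.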
11.4.3. Customization and Profiling in Online English Dictionaries
A recent study by Tono (2011), the first dictionary use study ever to employ eye tracking, confirms the suspicion that dictionary users differ greatly in their consultation habits and strategies. The realization that different users have different needs and expectations lies behind efforts to vary or customize e-dictionaries (De Schryver, 2009; Verlinde et al., 2010), and, indeed, in some of the online dictionaries of English we have reviewed above, users do have some ability to control the presentation of lexicographic data.
The Oxford English Dictionary online has control buttons to display or hide the following data types: Pronunciation, Spellings, Etymology, Quotations, Date Chart, Additions. It should be observed that this solution is not really lexicographic-function-driven (Tarp, 2008a), as the user here is required to explicitly select the data fields included in the dictionary. However, the users of a scholarly dictionary such as this one usually represent a high level of sophistication (many being language scholars), and so they are much more likely than naive users to know directly and explicitly what data types they actually need.
Macmillan English Dictionary Online offers two pre-packaged presentation modes which can be selected by flipping the Show Less/Show More control button located next to the lemma sign. The choice is suggestive of the difference between a text reception mode and a text production mode, respectively. Switching to the more basic mode hides the phonetic transcription, collocations (with examples), grammar labels and some of the examples. However, synonym links are still included, even though, arguably, a synonym list is not very useful for text reception.
Only a minority of dictionary users will be aware that the dictionary has a third, even simpler mode, available via the so-called interstitial page, accessible from collaborating news sites37 by double-clicking on any word in the text (luckily, the engine includes lemmatization, so the wordform stealing takes the user to the lemma steal). In this mode, all examples and synonyms are absent, as one would expect in a true reception mode.
User profiling is one of the highlights of the new Louvain EAP Dictionary (see also 11.3.2.2 above), now in development, where the content presented depends on the user-selected native language and discipline (field domain) of interest.
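The presentation modes and the lemmatized lookup described in this section can be sketched as a field filter over a single entry record. Everything below is a hypothetical illustration: the field names, the mode names, the sample entry and the one-item wordform table are assumptions made for this sketch, not the data model or wording of Macmillan English Dictionary Online.

```python
# Toy wordform table standing in for a real lemmatizer (assumption).
WORDFORMS = {"stealing": "steal", "stole": "steal", "stolen": "steal"}

# Invented entry content, for illustration only.
ENTRY = {
    "lemma": "steal",
    "pronunciation": "/sti:l/",
    "grammar": "verb",
    "definition": "to take something that belongs to someone else",
    "examples": ["Someone stole my bike.", "He was caught stealing."],
    "collocations": ["steal a car", "steal money"],
    "synonyms": ["rob", "pinch"],
}

# Which data fields each mode displays, following the description in the text.
MODES = {
    "show_more": set(ENTRY),
    "show_less": {"lemma", "definition", "examples", "synonyms"},
    "interstitial": {"lemma", "definition"},
}


def lemmatize(wordform):
    """Map an inflected wordform to its lemma; unknown forms pass through."""
    form = wordform.lower()
    return WORDFORMS.get(form, form)


def render(entry, mode):
    """Return only the fields that the chosen presentation mode displays."""
    view = {k: v for k, v in entry.items() if k in MODES[mode]}
    if mode == "show_less":
        # The basic mode hides some, not all, of the examples.
        view["examples"] = view["examples"][:1]
    return view


print(lemmatize("stealing"))
print(render(ENTRY, "interstitial"))
```

The point of the sketch is that the three modes share one stored entry and differ only in the set of fields passed to the display, which is also what makes per-user profiling cheap to add.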
11.4.4. Multimedia in Online Dictionaries
Online dictionaries can potentially include a range of multimedia content. This potential is utilized in online dictionaries of English to varying degrees.
11.4.4.1. Graphics
Graphical elements are not the sole domain of electronic dictionaries, as drawings and (to a lesser extent) photographs, diagrams and tables have been used for a long time in paper dictionaries. However, pictorials are more easily and cheaply included in electronic dictionaries (Lew, 2010). For example, illustrations are present in some entries in Cambridge Dictionaries Online or the free online version of Longman Dictionary of Contemporary English. Thanks to the linkability of the Web, it is quite possible to embed media from other providers. However, one has to reckon with the ramifications of limited control over hyperlinked content. For example, between (roughly) November 2009 and June 2010, the Google Dictionary used to display popular images from Google’s own image search service next to some entries. As a consequence, the Google Dictionary entry for kilt included a photograph which, likely without conscious intent, conveyed all too clearly the cultural information that kilts need no accompanying underwear (in the interest of propriety, no screenshot is included here). As of this writing, the Google Dictionary has discontinued the inclusion of images.
11.4.4.2. Audio
It is becoming increasingly popular for online dictionaries of English to offer audio recordings of entry words. However, recordings of other verbal elements (definition, examples) are rarely included: of the dictionaries discussed in this chapter, it is only the subscription version of Longman Dictionary of Contemporary English which offers spoken recordings of all example sentences. One novel use of audio is to present characteristic sounds associated with the entry word: an interesting subgenre of ostensive defining. Proposals to include such elements in electronic dictionaries have been made by Dodd (1989: 91) and Ooi (1998: 112). Dodd called them sound effects, and such recordings are now available in the free Macmillan English Dictionary Online. There, the user can hear the sounds produced by musical instruments under their relevant headwords, both popular ones (guitar, piano, violin, recorder), and less well-known (sitar). Animal noises and bird calls are likewise included (roar, hoot: perhaps also worth linking under the entries lion and owl), as well as sounds made by humans (clap, laugh, hiccup) and noisy machines (train, helicopter).
11.4.4.3. Video and animation
With the speed of the internet steadily on the increase, video content is becoming mainstream on the Web. However, English online dictionaries have not really embraced video technology so far. This caution may, in fact, be well-founded: Chun and Plass (1996) point out that video sequences are too transient to allow the spectator to build a stable mental model. Thus, videos may not make good cognitive sense, because the viewer may be unable to pace the information processing at the rate that works for them. Similar reservations can be raised for animated graphics, and there is at least one empirical study that appears to substantiate the pessimistic view of the effectiveness of animations, at least for dictionary-induced vocabulary learning. Lew and Doroszewska’s recent study (2009) found a strong and significant negative impact of viewing animations on vocabulary retention.
11.4.5. Dictionaries, Corpora and Lexical Databases
We have repeatedly seen above online dictionaries that use WordNet data. In fact, WordNet is often loosely referred to as a ‘dictionary’, even though, in more careful usage, it is a lexical database rather than a dictionary. I suspect that for the average user, the distinction is too fine a point. Yet, if we look at the recent history of dictionary-making, we see the growing role of information technology and structured data: corpora, databases, the use of structured markup such as XML. The current trend then is towards a clearer separation of the data layer from presentation, in line with Sue Atkins’ visionary proposal (1996). Increasingly, the dictionary as the user sees it is likely to be but an epiphenomenon on a structured lexical database or corpus, and the presentation layer is set to become an automated procedure, requiring little or no human intervention (De Schryver, 2009; Atkins et al., 2010; Kilgarriff & Rychlý, 2010; also see Nielsen & Almind, Chapter 7).
Indeed, as corpus interfaces and wrappers get increasingly sophisticated, they can be used in ways similar to dictionaries, so that even a more cultured user may not care what’s ‘under the hood’ as long as the interface can be used as a sort-of dictionary. As an example, consider the fully automatic collocations dictionary ForBetterEnglish.com,38 which uses the SketchEngine and GDEX technologies (Kilgarriff et al., 2008) on server-resident corpora to automatically produce entries such as the one in Figure 11.2. Clearly, it takes quite an expert to tell that this is not your usual human-made dictionary entry. The illusion would have been even better if the type-of-collocation indicators (object of, etc.) had been given less technical and more user-friendly names.
Figure 11.2: Entry for tooth in the ForBetterEnglish.com automated collocations dictionary.
Another corpus-based online resource, also having to do with English collocations, JustTheWord,39 is even capable of correcting unnatural word combinations. Figure 11.3 shows the output for the query powerful tea with the ‘find alternatives’ option selected. The interface indicates whether the word combination is ‘good’ (green bar on the right, colours not shown in print), or ‘bad’ (red bar) and the length of the bar indicates the (un)typicality of the word combination. Further, the narrow blue bar directly underneath each combination indicates the degree of meaning similarity between the combination to be replaced and each candidate for replacement. Here, the collocation strong tea has the longest blue bar, and indeed this is the idiomatic phrase that a learner of English would have wanted to use instead of the nonidiomatic powerful tea, had they known any better themselves. All in all, the information provided is useful and relevant, and it may actually be hard to believe that this output has been computed fully automatically.
Figure 11.3: JustTheWord alternative collocation suggestions for ‘powerful tea’.
There exist other ‘smart’ interfaces to corpora. One of them is http://corpus.byu.edu, created and maintained by Mark Davies, and it offers free access to several corpora, including the Corpus of Contemporary American English (COCA),40 currently the largest publicly available corpus of English. Another one is the SketchEngine,41 available by subscription. A subset of the British Academic Spoken English corpus is available through IBM’s many eyes42 clever visualizing interface, allowing the user to investigate the syntagmatic relationships of the most common words, though it is not all that useful for the less common combinations, due to the small corpus size. A rich and comprehensive lexical database of English with a dictionary-like interface will very soon become publicly available online as part of the DANTE43 project.
These resources represent a high level of sophistication and so there is not much hope that their popularity will extend much beyond a relatively small group of power users; the others will just increasingly Google for any answers, irrespective of the nature of the problem, and I fear that this tendency presents a real threat to more specialized reference tools, including dictionaries.
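The separation of the data layer from the presentation layer discussed in this section can be illustrated with a minimal sketch. It assumes an invented XML structure and invented collocates for tooth; it does not reproduce the SketchEngine, GDEX or ForBetterEnglish.com data formats. The LABELS table shows how technical relation names such as object_of could be swapped for friendlier wording at display time without touching the stored data.

```python
import xml.etree.ElementTree as ET

# Data layer: a hypothetical structured entry (element names are assumptions).
ENTRY_XML = """
<entry>
  <lemma>tooth</lemma>
  <pos>noun</pos>
  <collocations relation="object_of">
    <item>brush</item>
    <item>clean</item>
    <item>extract</item>
  </collocations>
  <collocations relation="modifier">
    <item>front</item>
    <item>wisdom</item>
  </collocations>
</entry>
"""

# Presentation layer: user-friendly names for technical relation labels.
LABELS = {"object_of": "verbs used with", "modifier": "words describing"}


def render(xml_text, labels):
    """Turn a structured entry into display text with no manual editing."""
    root = ET.fromstring(xml_text)
    lemma = root.findtext("lemma")
    lines = [f"{lemma} ({root.findtext('pos')})"]
    for group in root.findall("collocations"):
        relation = group.get("relation")
        label = labels.get(relation, relation)  # fall back to the raw name
        items = ", ".join(item.text for item in group.findall("item"))
        lines.append(f"  {label} {lemma}: {items}")
    return "\n".join(lines)


print(render(ENTRY_XML, LABELS))
```

Passing an empty label table to the same function falls back to the raw relation names, so the wording shown to users can be changed independently of the lexical database.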
11.5. Summary and Conclusion
In our necessarily sketchy overview of English online dictionaries, we have seen that a great variety of dictionaries exist, and that, without proper guidance, users run the risk of getting lost in the riches. It is surprising to see so many of the online dictionaries (including quite a few from respectable publishers) still largely constrained by the paper model, with access mechanisms to lexicographic data often being substandard for today’s technology. Furthermore, users may get flooded with irrelevant and highly repetitive information, especially by dictionary aggregators. And even if hyperlinking to external sources embodies the best practice in hypertext philosophy, it is not without danger, as it relinquishes much of the control over the content of ‘our’ dictionary page. More generally, the universal use of search engines (or one dominant search engine) presents a risk of dictionaries (or any specialized online works of reference) being marginalized. Finally, learners of English are still waiting for a function-driven lexical resource of the type represented by the excellent Base lexicale du français44 (Verlinde et al., 2010, and Chapter 13).
Notes
1 http://dictionary.cambridge.org
2 http://oxforddictionaries.com
3 www.collinslanguage.com
4 www.chambersharrap.co.uk/chambers/features/chref/chref.py/main
5 http://encarta.msn.com/encnet/features/dictionary/dictionaryhome.aspx
6 www.internetworldstats.com
7 www.oup.com/elt/catalogue/teachersites/oald7/lookup?oup_jspFileName=document.jsp&cc=pl
8 www.ldoceonline.com
9 http://ldoce.longmandictionariesonline.com/dict/SearchEntry.html
10 http://dictionary.cambridge.org
11 www.macmillandictionary.com
12 www.mycobuild.com
13 http://dictionary.reverso.net/english-cobuild
14 www.learnersdictionary.com
15 http://nhd.heinle.com/home.aspx
16 http://dictionary.cambridge.org/Default.asp?dict=A
17 www.urbandictionary.com
18 http://en.wiktionary.org
19 www.wordnik.com
20 www3.merriam-webster.com/opendictionary/
21 www.macmillandictionary.com/open-dictionary/latestEntries.htm
22 www.macmillandictionaryblog.com, winner of the 2009 Edublog award for best education blog on the web
23 http://quod.lib.umich.edu/m/med
24 www.merckmedicus.com/pp/us/hcp/thcp_dorlands_content_split.jsp?pg=/ppdocs/us/common/dorlands/drlnd/misc/dmd-a-b-000.htm
25 www.acronymfinder.com
26 www.etymonline.com
27 www.howjsay.com, the domain name being an eye-dialect rendition of the casual pronunciation of the phrase ‘how do you say?’
28 www.cmu.edu
29 http://seas3.elte.hu/epd.html
30 http://thesaurus.com/?regHome=true
31 www.rhymezone.com
32 http://wordnetweb.princeton.edu
33 www.visuwords.com
34 www.visualthesaurus.com
35 http://dictionary.reference.com
36 www.myCobuild.com
37 One example is www.shanghaidaily.com
38 http://forbetterenglish.com
39 http://www.just-the-word.com/ (Sharp Laboratories)
40 www.americancorpus.org
41 www.sketchengine.co.uk
42 http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/3e335458358611de909d000255111976
43 www.webdante.com
44 http://ilt.kuleuven.be/blf
Chapter 12
e-Dictionaries in the Information Age: The Lexical Constellation Model (LCM) and the Definitional Construct1
Aquilino Sánchez and Pascual Cantos
12.1. e-Dictionaries on the Web: The Case of Spanish e-Dictionaries
A quick Google search for ‘Spanish dictionaries online’ produces more than 10,400,000 references. A very high figure indeed! However, if we repeat the same search in the advanced mode, looking for the exact phrase, the results are dramatically reduced to 8,240 references, a more realistic picture. If we go further and access the links, the results come as a real surprise: online monolingual Spanish dictionaries are extremely rare. In fact, we find:
1. Diccionario de la Real Academia Española (DRAE)
2. Clave (SM)
3. Several other sites that offer definitions of Spanish words, almost always retrieving the information from the DRAE
4. A few other sites that offer the definition of words in Spanish under certain conditions (after registering and paying a membership fee). This is the case of the monolingual dictionary VOX and the DUE (María Moliner)
In addition to this, there are several monolingual dictionaries in electronic format, on CD or DVD (some of them also on the Web, as mentioned above). This is the case of the
1. Diccionario de Lengua Española (Real Academia de la Lengua Española)
2. DUE (Gredos)
3. GDUEsA (SGEL)
4. CLAVE (SM)
5. VOX
This quick survey of monolingual Spanish e-dictionaries on the internet does not inspire much enthusiasm. Spanish does not offer the user abundant, varied and free lexicographical materials in this medium. The supply of e-dictionaries on CD/DVD is also quite limited in number.
These figures refer only to quantity. Spanish e-dictionaries can also be analysed from the point of view of the content they provide. In this respect, all but one of them are mere copies of standard paper dictionaries. The advantages of e-dictionaries are those peculiar to e-word technologies: easy and quick access to information, instant information retrieval and, in some cases, access to basic cross-reference hypertext-like searches. The microstructure of the entries is the same as in printed dictionaries: the lemma, followed by the word class, gender and number, meanings (ordered by etymological or historical criteria), idioms (when this is the case), and synonyms and antonyms (when this is the case).
Only one of the monolingual Spanish dictionaries available on CD includes some new information: the GDUEsA.2 It offers information on the frequency of each word by means of frequency bands (ranging from 0 to 5 stars), and on the number of meanings, sub-meanings and set phrases included under each headword. On top of that, it is possible to adjust word searches by means of frequency criteria (frequency bands). Searches can be restricted according to frequencies, from frequency 5 (the highest) to frequency 1 (the lowest), or frequency 0 (very rare words: hapax legomena/dislegomena). Searches can also be carried out by means of part-of-speech criteria (nouns, adjectives, etc.). In addition to these search facilities, the interface takes advantage of colouring to differentiate the headword, the grammatical class, the lexical/semantic context (whenever made explicit), and the examples of usage (extracted from the corpus).
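A search restricted by frequency band and part of speech, as described for the GDUEsA, might look roughly like the sketch below. The Headword class, the search function and the band values assigned to the sample words are all invented for illustration; they are not taken from the GDUEsA or its corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Headword:
    lemma: str
    pos: str
    band: int  # frequency band, 0 (very rare) to 5 (most frequent)


# Hypothetical mini-lexicon; the band values are made up.
LEXICON = [
    Headword("casa", "noun", 5),
    Headword("comer", "verb", 5),
    Headword("verde", "adjective", 4),
    Headword("alféizar", "noun", 1),
    Headword("zaherir", "verb", 0),
]


def search(lexicon, min_band=0, max_band=5, pos=None):
    """Return lemmas within a frequency band range, optionally of one word class."""
    return [
        h.lemma
        for h in lexicon
        if min_band <= h.band <= max_band and (pos is None or h.pos == pos)
    ]


print(search(LEXICON, min_band=4))          # frequent words only
print(search(LEXICON, max_band=1, pos="noun"))  # rare nouns only
```

Both criteria are plain attributes of the headword record, which is why this kind of filtering is trivial in an electronic dictionary and impractical on paper.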
Spanish monolingual dictionaries on the internet are, therefore, paper dictionaries transferred to electronic format. Consequently, it can be stated that Spanish e-dictionaries have not yet entered phase 3 (Cerquiglini, cited in Pruvost, 2000: 188), that is, a phase in which dictionaries pass from phase 1 (computer-assisted paper lexicography) to phase 2 (transfer of existing paper dictionaries to an electronic medium) and arrive at phase 3 (electronic dictionaries in their own right). In this phase, e-dictionaries become truly independent from paper dictionaries, with the addition of meaningful changes in the content offered and in the access structures and results provided (Atkins, 1996; Grefenstette, 1998; Leech & Nesi, 1999; Nesi, 2000; de Schryver, 2003, among others).
12.2. From Printed Words to e-Words
Currently e-dictionaries are basically digitized versions of paper dictionaries, in other words, the result of shifting from the printed word to the electronic word. Consequently, the transfer from paper dictionaries to e-dictionaries has not brought with it substantial changes in lexicography and/or lexicographic data, except in accessibility to lexicographic information and in the way in which data is stored. It is obvious, however, that we cannot say that e-dictionaries are exactly the same as paper dictionaries. Compared with paper dictionaries, e-dictionaries gain in, among other features:
• Portability
• Accessibility
• Retrieval speed
• Cost and price
• Circulation potential
• Economy in size and
• Ecology and resources needed
The shift from paper to e-dictionaries may imply important consequences and, hopefully, benefits and advantages. But such a shift cannot be restricted to merely moving from paper words to e-words, or to improving accessibility to lexicographic information. We must reach ‘Phase 3 e-dictionaries’ (Cerquiglini, cited in Pruvost, 2000: 188), a phase in which the potential of the e-word must join with the potential derived from easily accessing abundant, relevant lexicographic information, the potential increase in its size and the varied possibilities offered by the Internet. e-dictionaries, in fact, should not be considered only from the perspective of the e-word upon which they are based. The e-word, with its accessibility, storing and processing potential, has opened up new approaches to dictionary making and offers new possibilities for lexicographers, publishers and users.
Printed dictionaries have been produced for the last 500 years or so in book format, and early dictionaries were very simple in layout and in content. Paper dictionaries today are significantly more comprehensive; printing and paper are of much higher quality; the content has evolved for the better: part of the information provided in modern dictionaries was absent in previous centuries, and so on. The conclusion is that modern paper dictionaries have, as a whole, come a long way from the paper dictionaries of the beginning of the sixteenth century in terms of quality of presentation, the number of words defined, the explanatory power of the definitions and so on. We may easily imagine the process of change from manuscript to printed dictionaries. Lexicographers at that time engaged in discussions similar to the ones we engage in today, now that we have entered the electronic age. The process of change from manuscript format to printed books required an adaptation period, perhaps a ‘painful one’ for some people.
Writing a dictionary by hand versus producing it on a press implied, among other things, significant differences in readability, production efficiency, access to information and
retrieval speed, cost and price, circulation potential, the amount of content included and so on. At least some of the circumstances peculiar to this ‘transition’ are similar to the ones we face nowadays. The transition from manuscript to printed dictionaries was no doubt gradual. And, most likely, throughout the whole of this period the issues occurring underwent a maturing process. The facilities of printed books favoured the progressive increase of the lexical information provided. New meanings were added, and new words were included, hand in hand with the development of society and the communicative needs of the speakers. Printed dictionaries gradually added information on etymology, directions for pronunciation (in English lexicography at least), illustrative examples of usage, specification of the part of speech and so on. The definitions originally based on providing merely synonyms or near synonyms (as had been the case of the glosses or margin notes in manuscripts) evolved to more complex meaning descriptions, as the dictionaries by John Florio (1598), Bullokar (1623), Johnson (1755) and Merriam-Webster (2002) – to mention but a few – reveal.
12.3. e-Dictionaries
De Schryver (2003: 152) highlights eight positive points when referring to paper dictionaries. They . . .
• make language palpable.
• can be admired in a library.
• are easy to browse and read recreationally.
• do not stress the eyes as much as computer displays.
• are easy to annotate.
• are durable.
• do not require electricity.
• do not have a potentially obsolete interface.
Indeed, more advantages (and some disadvantages as well) could be added to these. Paper dictionaries belong to what we may call ‘the age of printing’, a long period (more than 500 years) in which the transmission and storage of knowledge has almost exclusively depended on paper. e-dictionaries belong to ‘the electronic age’, recently ‘inaugurated’, where storing and transmission of knowledge, information and data have changed and do not depend on paper alone, but on various other electronic devices, with their own characteristics and potential for circulation and storage. The electronic word brings with it a potentially revolutionary change and challenges traditional printing in several ways. Such a challenge does not only refer to the nature of the e-word, or to its accessibility. A large number of positive consequences for lexicographers relate to the potential of the e-word for
enriching lexicographical resources, for accessing and managing data useful for the elaboration of lexicographical products and for presenting those products to final users. Several scholars (Crystal, 1986; Dodd, 1989; Zgusta, 1991; Atkins, 1996; Grefenstette, 1998; Leech & Nesi, 1999; Zaenen, 2002, among others) have referred to the dictionary of the future as something that must undergo substantial changes. The e-word opens the door to new resources and new ways of extracting relevant information, or the information needed for improving dictionaries. This runs parallel to the need for ‘new’ lexicographers. In order to innovate in the lexicographical field, lexicographers must also learn about the new resources available and how to manage them. Extracting relevant lexical information from language use depends, when all is said and done, on lexicographers.
The real potential of e-dictionaries is still unexploited and undefined. On the one hand, the basic source of e-dictionaries still lies in printed dictionaries. On the other hand, the physical support itself, the devices and tools for processing the e-word have evolved significantly throughout the short life of ‘the electronic age’, and keep evolving fairly quickly. The early computers and modern PCs, notebooks or netbooks, are very different in terms of price, potential and accessibility. Devices and means for processing and storing electronic information have also evolved from the diskette to the CD, DVD, Blu-ray or flash memory gadgets; the storing and transmission of information have enlarged their capabilities, from the restricted potential of the early personal computer to the global and far-reaching potential of the Internet. The PC platform has given way to the internet server. At the end of the first decade of the twenty-first century, website hosts may be somewhere between 15 and 20 million in number (Roby, 2004: 50) and are easily accessed anywhere in the world.
Storage, processing and transmission of information are extremely relevant in dictionary-making. On the one hand, dictionaries are tools for information (information on word meanings). The kind of information dictionaries provide is large and extensive, and highly itemized. It is, therefore, important that the specific information users look for can be accessed or retrieved quickly and easily. On the other hand, dictionaries need physical space for storing data. Paper dictionaries had well-known standards concerning suitable weight and number of pages (ca. 3 kg and 2,000 pages for large-size dictionaries). e-dictionaries must also consider space, but the ratio against paper dictionaries is much more advantageous. The information contained in a large paper dictionary may take 100 MB (excluding voice recordings). The increase in storage space to one or more GB (that is, ten or more times higher) is within the reach of any PC today. In the near future, this potential will certainly increase. Consequently, the storage capacity of computers overcomes some of the restrictions typical of paper dictionaries.
Paper dictionaries must be carefully and efficiently structured and the information provided for each lemma must be adequately filtered in order to include only the essentials or most prominent features. Space is crucial. Additional information, such as grammatical irregularities, phonetic transcription, real examples, set phrases, syntactical information, collocates, more extensive and clear definitions and so on, must be tightly controlled in order to keep to specific space limits. e-dictionaries do not have such space restrictions. The use of powerful servers allows for a significant increase of the storage and computing facilities. Internet dictionaries may, therefore, store more lexicographic information. At the same time, retrieval facilities are very quick and open to a wider audience.
12.4. Beyond Paper Dictionaries: The Definition, a Construct in Progress
The core of dictionaries lies in the definitions, since these convey the meanings of words. Consequently, definitions are and should be the main issue in lexicography, yet not the only one. e-dictionaries are still dictionaries, not encyclopaedias, or general guides for specific knowledge fields. We may put them together and blend them in a new kind of product (dictionaries and encyclopaedias are in some way complementary), but even in this case the new product will still include the function typically associated with dictionaries. In any case, the users of a language need to keep and preserve their code for communication, and in doing so they must keep to the meaning assigned to words. From that perspective, dictionaries are essential: they are reference works, repositories of the meanings associated with specific words or lexical units. The electronic word and various other technological tools may facilitate the function of dictionaries but will not change their substance. Databases, huge as they may be, access to audiovisual devices, speed in the transfer of information and so on, may help in illustrating and understanding the meanings of words, or in making them more transparent; they will not, however, fully replace word definitions as usually found in dictionaries.
Emphasis on the centrality of definitions (and, hence, meaning) in lexicographical works does not aim to play down the importance, efficacy and usefulness of media and technological tools, or the needs of the users. Our conviction is, though, that change in lexicography will be incomplete if it does not affect the definitional process and the nature of definitions themselves. Dictionary entries in general are far from simple, even though some words are defined in an extremely simple way, as is the case of definitions through synonyms.
The history of lexicography reveals that definitions have been varied in nature and heterogeneous in extension. Regarding the nature of the definition itself, the Aristotelian genus et differentiae type is the most common
one. Other types of definition may also be found, such as the ‘synthetic definition’, the ‘rule-method definition’ (Landau, 2001), or the ‘intensional’ or ‘extensional’ types (Svensen, 1993). The ‘functional’ definition, often used in informal and colloquial settings and based on specification of the ‘function and purpose’ of the object defined, may also be added to the list. Some dictionaries are more detailed and explicit in their explanation of meaning (they may include abundant descriptive features of the object or thing itself), while others are more concise and look for the essential features of the definiendum, which require a more abstract definition, centred on class features and not so much on the specific description of the object or thing itself. The extension of the entry is also conditioned by the number of meanings each lemma has. Obviously, more meanings take up more space, and require more defining and contrastive features, which again means more space and, therefore, a larger physical capability (size). In this respect, paper dictionaries are much more limited than e-dictionaries. The space at our disposal in e-dictionaries significantly enlarges the frontiers for increasing the amount of information we may include in order to enrich and make more transparent the meaning of the words in focus. Approaching meaning from a lexicographical point of view must take two questions into consideration: (i) where to find it, and (ii) how to present it to the user. Meaning lies in words in use. Discovering and identifying word meanings requires, as a result, relevant and representative linguistic samples. Only linguistic samples may offer real instances of language use. Dictionaries usually take words as basic units of meaning. Meaning, however, goes beyond words: in fact, word meaning is finally shaped by context and words themselves are ‘disambiguated’ through contextual clues (Sánchez, Cantos & Almela, 2007). 
The power of context in the shaping of lexical meaning is widely admitted (Almela, 2006; Hoey, 2005, among others). This generally accepted conviction is, however, poorly represented in lexicographical work and praxis. Words are typically defined as if they were isolated and somehow monolithic units, with only occasional reference to contextual elements (especially collocates) and with scant information on the intricate web of semantic relationships that shape the units of meaning we call words, or the complex semantic relationships among words associated with specific lexical fields. Words of the same lexical field typically share some features, which function as the semantic ‘glue’ that guarantees cohesion and supports semantic dependency among them; lexical differences emerge by adding novel semantic features to the ones they share with other words in the group. The structure of meaning in phrases or sentences adjusts to a similar pattern. Most dictionaries are still in their infancy if analyzed from the point of view of the information they include on the semantic power of context. The e-word offers an excellent opportunity to compile representative amounts of
linguistic data (corpora), analyze them, extract individual and contextual lexical information, systematize it and present it to the lexicographer. This is one of the faces of meaning in the electronic age: the potential to access linguistic materials capable of providing abundant and reliable input for meaning identification. The other side of ‘meaning’ considered here and closely related to and depending on the previous one, refers to the building of the definition itself, the definitional construct. The Aristotelian type of definition (genus et differentiae) assumes that we first classify the object or thing defined, and then we proceed to specify the ‘differences’, that is, the unique characteristics that make it possible to differentiate the things or objects included in a class. This definitional procedure applies in most dictionary entries. Specification of the differentiae permits, however, a wide spectrum; if the object or thing defined is well known by the speaker/ hearer, the amount of differentiae is reduced to a minimum. The knowledge of the world we store in our mind supplies most of the information we need in the identification process. Yet if the definiendum is unknown, the need for more differentiae may increase significantly. General purpose dictionaries will by necessity face the problem of how much information (differentiae) should be included, and the problem is usually solved by adjusting the size, number and substance of definitions to ‘common sense’, a polite way for stating that the lexicographer alone decides – consciously or not – on the needs of average dictionary users. Moreover, lexicographical tradition plays a key and decisive role in this respect. Consequently, dictionary entries, as definitional constructs, have been and will probably continue to be constructs in progress, varying in content, size and format. 
The characteristics of the speakers of each language, the different and heterogeneous kind and amount of knowledge users have of the language, of the words themselves and of the world around them can only result in a variety of definitional constructs. Lexicographers may systematize such variety and produce a reasonable typology of definitional constructs. e-dictionaries may face that problem afresh, with more information available, better tools at their disposal and more elaborate models.
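The genus et differentiae procedure described above, with a variable number of differentiae depending on what the user already knows, can be sketched in a few lines of Python. This is only a minimal illustration: the class name `DefinitionalConstruct`, the `render` method and the ordering of the differentiae are assumptions made for the example, not part of any existing dictionary system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DefinitionalConstruct:
    """A genus-et-differentiae definition (hypothetical structure)."""
    headword: str
    genus: str
    # Assumption: differentiae are stored from most to least distinctive.
    differentiae: list[str] = field(default_factory=list)

    def render(self, max_differentiae: int | None = None) -> str:
        """Return the definition, keeping at most max_differentiae features."""
        if max_differentiae is None:
            chosen = self.differentiae
        else:
            chosen = self.differentiae[:max_differentiae]
        if not chosen:
            return f"{self.headword}: {self.genus}"
        return f"{self.headword}: {self.genus}, " + ", ".join(chosen)


lion = DefinitionalConstruct(
    headword="lion",
    genus="a large cat",
    differentiae=["tawny coloured", "found in Africa and Asia", "the male has a mane"],
)

# A well-known definiendum needs few differentiae; an unknown one needs more.
print(lion.render(max_differentiae=1))
print(lion.render())
```

The same construct thus yields a concise or a fuller definition, which is one way of treating definitions as constructs that vary in content and size.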
12.5. The Semantic ID of Words
One of the leading principles in lexicographical definitions has been ‘conciseness’. Conciseness aims to avoid superfluous or non-essential information and gain in efficiency by selecting the semantic features strictly necessary for the identification of meanings. This is why conciseness is a most precious goal in paper dictionaries, where space is limited. From the perspective of e-dictionaries, the urgent need for space is not as pressing; definitions could,
therefore, favour completeness rather than concision, be generous in details and include more relevant information. An increase in the definitional potential of entries requires not only more space, but also having access to adequate sources of meaning. The information provided by corpora and the internet offers sufficient linguistic data to illustrate usage in the right context. Specific corpus linguistics techniques also facilitate a deeper and more thorough lexicological analysis of meaning. The synergy of (i) a lexicological analysis of authentic data and lexical resources and (ii) a novel lexical meaning-structuring model, such as the Lexical Constellation Model (Cantos & Sánchez, 2001; Sánchez, Cantos & Almela, 2007), might result in a more plausible model for better capturing the structure and the complexity of word-meaning(s), and thereby lead to a more efficient definition. Each definition is (or should be) and functions as (or should function as) the exclusive semantic ID of the word defined. All the words of a language have (or should have) their ‘exclusive’ ID in a dictionary. But the information provided in lexical IDs is not to be necessarily restricted to the minimal features needed to differentiate words. It may also include additional and redundant information in order to further specify some features or simply to offer non-essential information, if such information helps with identification of the meaning of the word. e-dictionaries may ‘tolerate’ redundancy at very low cost in terms of space and search facilities. The addition of more information in definitions may also favour a more fluent and rapid comprehension of the lexical items defined. An elephant and a lion share the feature ‘mammal’, or ‘walking on four legs’. The addition of the feature ‘long curved tusks’ suffices to differentiate the one from the other (applying to elephants but not to lions). 
But more features could be mentioned as well in the definition, even if they are not strictly necessary. The feature ‘trunk’, ‘vegetarian’ or ‘carnivorous’ would contribute to reinforcing the contrast between both animals. If dictionary users lack some basic knowledge of the objects, things, beings or concepts defined, the presence of abundant and redundant features will no doubt help to disambiguate potential meanings.
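The elephant and lion example can be expressed with simple set operations. The sketch below is purely illustrative: the feature labels are taken informally from the text, and the helper names (`shared`, `distinctive`, `is_exclusive_id`) are hypothetical.

```python
# Informal feature sets based on the elephant/lion example in the text.
FEATURES = {
    "elephant": {"mammal", "walks on four legs", "long curved tusks", "trunk", "vegetarian"},
    "lion": {"mammal", "walks on four legs", "carnivorous"},
}


def shared(a: str, b: str) -> set:
    """Features the two words have in common (the semantic 'glue')."""
    return FEATURES[a] & FEATURES[b]


def distinctive(word: str, other: str) -> set:
    """Features of `word` that `other` lacks; any one of them differentiates the two."""
    return FEATURES[word] - FEATURES[other]


def is_exclusive_id(word: str) -> bool:
    """True if no other word in the inventory has exactly the same feature set."""
    return all(FEATURES[word] != feats for w, feats in FEATURES.items() if w != word)


print(sorted(shared("elephant", "lion")))
print(sorted(distinctive("elephant", "lion")))
print(is_exclusive_id("lion"))
```

One distinctive feature is enough for an exclusive ID; the remaining ones are redundant in the strict sense, yet they cost an e-dictionary very little to store and may help users who lack the relevant world knowledge.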
12.6. Definitions: Should They Be Different?
Dictionaries typically define words as if they were isolated units, kept in the mind of the lexicographer as separate entities. Lexicographers, however, do not fully agree on the exact externalization of the concepts associated with words (perhaps because their concepts do not fully match either). When a lion is defined, the reader needs to differentiate the lion from any other being or thing in the world. Dictionaries typically include a variable array of features.
The word lion is found in four English dictionaries with the following defining features:

Webster’s New World College: 1. a cat (Panthera leo), 2. large, 3. powerful, 4. found in Africa and SW Asia, 5. with a tawny coat, 6. a tufted tail, 7. and, in the adult male, a shaggy mane; 8. in folklore and fable the lion is considered king of the beasts.

Merriam-Webster’s: 1. a large 2. carnivorous 3. chiefly nocturnal 4. mammal (Felis leo) 5. of the cat family 6. that is now found mostly in open or rocky areas of Africa but also in southern Asia 7. and that has a tawny body 8. with a tufted tail 9. and a shaggy blackish or dark brown mane in the male.

Cambridge Advanced Learners’ Dictionary: 1. a large 2. wild 3. animal 4. of the cat family 5. with yellowish brown fur 6. which lives in Africa and southern Asia.

The New Oxford Dictionary of English: 1. a large 2. tawny coloured 3. cat 4. that lives in prides 5. found in Africa and NW India 6. the male has a flowing mane 7. and takes little part in hunting, which is done cooperatively by the females.

The four dictionaries share some features (five of them): cat, large, with a ( . . . ) mane, tawny/brown (coloured), found in Africa and Asia. Several others are not shared (mammal, carnivorous, wild, powerful, nocturnal, tufted tail, (cooperative) hunting). Moreover, categorization varies significantly, with three categories appearing as the basic ones: mammal, animal and cat. All cats are animals and mammals, but not all animals or mammals are cats, and not all animals are mammals. If the base-category is different, the differentiae or defining features within each category should be at least partially different. The different categorization, however, does not seem to have been taken into account in the definition offered in each dictionary. Definitions including the category animal or mammal specify that it is of the ‘cat family’. The one including the category cat does not specify that cat is included within the animal or mammal categories. This information is assumed to be already known by the user. Could lexicographers assume that users have also other bits of information? Why not start by defining ‘lion’ as a ‘carnivore’, or the ‘largest cat’? Moreover, who decides which additional features should be selected in the definition? Is it more efficient to mention the tail, wild and so on, than nocturnal or mammal? How do lexicographers come to those decisions? There is an obvious lack of homogeneity and systematization; disparity abounds. Depending on
the knowledge of the world the reader has on the concept of the thing defined, one definition or another could be more important and useful for distinguishing between a lion and other animals or things in the world. Ideally, if words mean the same to all speakers of a language, there is no reason to define them in different ways, at least regarding their lexical ID. Nevertheless, this is not the case: a variety of definitions is found in most dictionaries, especially regarding the amount of additional differentiae included. Definitions would gain in homogeneity and perhaps accuracy if they adjusted to the same model of meaning analysis. The example of lion illustrates obvious differences in the initial categorization: two of them (Webster’s New World College (WNWC), The New Oxford Dictionary of English (NODE)) begin by including the lion in the category of ‘cat’; the Merriam-Webster’s (MW) begins with the category ‘mammal’, while the Cambridge Advanced Learners’ Dictionary (CALD) prefers the category ‘animal’. Is this a good idea?
12.7. The Lexical Constellation Model (LCM). A Platform for Establishing the Genus Et Differentiae
The process of language acquisition by human beings runs parallel to the process of categorization and knowledge acquisition in general. Our brain perceives reality and is genetically prepared to identify and compare the objects perceived. The comparison of perceptions (sensory perceptions first) drives the brain to detect similarities and differences, on which it is possible later to shape categories. The categorization process of human beings is a key issue in knowledge acquisition and is obviously intimately related to the concepts we build in our minds and to the use of language, which is the vehicle for externalizing our concepts. Ultimately, the concepts we build and store in our brain are the source of the definitions of words. Definitions will, in turn, reveal the categories and hierarchical order of the concepts we have in mind.
The LCM was first presented as a device to illustrate semantic attraction among words, especially in cases in which such an attraction was apparently unexpected (Cantos & Sánchez, 2001). In the physical world, a constellation refers to a group of celestial bodies or stars, with boundaries of some kind, perceived as forming a pattern; a constellation, therefore, implies an organized set of elements or units related to each other in some way. Cantos and Sánchez apply the term to lexical semantics and complex lexical units with elements inside, which bear some kind of relationship to each other and are hierarchically organized. They assume that ‘each sentence unit is formed by minor units and these in their turn are formed by other minor units, and so on; this indicates that each unit is a structure formed by other sub-structures, and each sub-structure by sub-sub-structures and so on’ (Cantos & Sánchez, 2001: 222). A hierarchical structure implies that each element is directly or indirectly dependent on other elements.
Figure 12.1: The structure of a lexical constellation.
Following a cosmological simile, the
lexical constellation resembles the solar system, with a central sun around which planets and moons orbit. Figure 12.1 shows a visual picture of the model. Any element in the constellation may connect with any other element and in many directions. Figure 12.1 illustrates how the core meaning of C is shared by three other lexical units, while D connects with C and E and the latter shares its meaning with two other lexical nuclei, and so on. Lexical units result from the clustering of specific semantic features which are perceived as units by the speakers. These units are, however, not isolated entities; they may share part of their features with other lexical units, so that the units intervening in the same set of connections are not fully independent as regards their semantic properties. Such interconnectivity is the very foundation of a lexical constellation. If we go back to the word lion, a hierarchical categorization of the concept it represents may, in accordance with the LCM, be illustrated in the following way: a lion is categorized first as an ‘animal’, later ‘a mammal’, and then ‘a cat’. Regarding ‘differences’ or additional descriptive features, we can affirm that a lion is ‘tawny coloured’ (the result of visual perception), it has a large head, a tail, four legs, it is powerful and so on (Figure 12.2). The hierarchy of categories is obvious: a lion must first be an ‘animal’ before being a ‘mammal’ or a ‘cat’. The knowledge we have in our mind concerning what a lion is obeys this categorization, be it consciously or not. More distinctive features are specified in order to differentiate a lion from other ‘cats’. However, not all the categories need always be mentioned, as is also the case with differentiae. The situation is clearly reflected in lexicographical definitions (as the ones offered above on the word lion by four different dictionaries). 
Figure 12.2: The hierarchy and interdependence of categories and lexical features for lion. [Diagram labels: categories animal, mammal, cat; features four legs, powerful, with a tail, large head, tawny coloured, carnivorous, etc.]
Some categories or features are not mentioned because they are taken for granted or implicitly assumed as ‘world knowledge’ users already have or are supposed to have. In doing so, lexicographers may run the risk of presenting ambiguous or incomplete definitions for some users. Some lexical features in fact are strictly
necessary, as they belong to the very nature of this animal (being an animal of the cat family, for example) and it is risky not to mention them; others are peculiar to lions, but not so strictly essential to being a lion. A lion will still be a lion (i) even if due to an accident the lion has lost a leg or its tail; (ii) even if it is not powerful enough to survive in the wild; (iii) even if it does not live in Central Africa; (iv) even if its tail, for some reason, is shorter than normal; or (v) even if it hunts alone and not cooperatively and so on. Consequently, definitions admit some kind of variation (as traditional dictionaries clearly reveal), depending on the lexicographer and on the kind of information and knowledge targeted users are supposed to have. The perception human beings have of the world is not exactly the same for everybody. Besides differences in perception, there are also differences in the emphasis given to the features perceived. People living in Africa, close to regions where lions still survive, may perceive them more in terms of their ferocity, how much meat they eat, how dangerous they are for human beings and so on. Urban dwellers, used to seeing lions in a zoo, will probably emphasize other features, like being large, the colour of their hair, their big head and so on. In any case, people may normally never mention that lions are animals or mammals, because these are already features implicitly associated to the lion. Widespread and consolidated knowledge is often taken for granted, while less common knowledge will be more decisive for identifying the world around us. Obvious physical features are, nevertheless, often taken as distinctive features and mentioned in dictionaries. This is the case of the colour of the hair, the size of the head, the length of the tail, the brown mane in the male and so on. Diversity
and differences in the knowledge we have of the world will exert a significant influence on the definitions. This is one more reason for referring to definitions as ‘constructs in progress’, subject to improvement.
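The hierarchy of categories for lion, and the distinction between strictly necessary and merely peculiar features, might be represented along the following lines. This is a hedged sketch and not the authors’ implementation of the LCM: the `PARENT` table, the `Feature` class and its boolean `essential` flag are simplifying assumptions introduced for the example.

```python
from dataclasses import dataclass

# Category chain stated in the text: a lion is an animal, then a mammal, then a cat.
PARENT = {"lion": "cat", "cat": "mammal", "mammal": "animal", "animal": None}


@dataclass(frozen=True)
class Feature:
    label: str
    essential: bool  # assumption: a yes/no flag; real features may be graded


LION_FEATURES = [
    Feature("of the cat family", True),
    Feature("tawny coloured", False),
    Feature("large head", False),
    Feature("with a tail", False),
    Feature("four legs", False),
    Feature("powerful", False),
]


def category_chain(word: str) -> list:
    """Walk up the hierarchy from the word to its most general category."""
    chain = []
    node = PARENT.get(word)
    while node is not None:
        chain.append(node)
        node = PARENT.get(node)
    return chain


def still_qualifies(features: list, lost: set) -> bool:
    """A lion that loses only non-essential features is still a lion."""
    return all(not f.essential for f in features if f.label in lost)


print(category_chain("lion"))
print(still_qualifies(LION_FEATURES, {"with a tail", "four legs"}))
print(still_qualifies(LION_FEATURES, {"of the cat family"}))
```

A definition may mention any subset of the chain and of the features; which ones are left implicit is exactly the decision that varies from one dictionary to another.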
12.8. A Step Forward in the Definitional Construct of e-Dictionaries
We have mentioned above the advantages and potential of e-dictionaries in relation to paper dictionaries. To recapitulate, we have highlighted: (i) storage capacity, (ii) processing speed and presentation of data, (iii) access to abundant linguistic and non-linguistic sources and data useful for the identification of meanings. We have also pointed out: (iv) the variety in definitions offered by traditional dictionaries and, consequently, their heterogeneity, and (v) the possibility of restructuring the definitional construct following the LCM. We may add to this the possibility of organizing entries by (vi) addressing various levels of completeness in the definitional construct, and by (vii) having access to various sources for illustration of meaning, thereby satisfying users’ needs (according to previously defined patterns of expectations) more efficiently.
12.8.1. The Definitional Construct of e-Dictionaries
Several ‘prophecies’ have been made regarding the dictionaries of the future. We agree with Grefenstette (1998) as to some of the features that are likely to characterize e-dictionaries. One of them is closely connected with size. This cannot be dissociated from processing power, precisely what computers and the internet are so good at. Increasing the amount of information included in a dictionary may change dramatically the content of the information offered. Once this information is included, we only need suitable databases from which the user may retrieve meaning or directly access it through the internet. Furthermore, the ‘physical’ e-dictionary is not to be found at the other end of the computer in a specific and closed format. Users may type the word they are interested in and retrieve customized information on this word from anywhere in the world. The information to be retrieved may also vary in size, depending on the users’ needs or profile. Our proposal assumes that users’ needs are different and that they do not necessarily look for all the information available on each word. This assumption provides for one of the main characteristics of our proposal: entries can be organized in different levels so that the user may access the one most appropriate for them. We agree with Tono (2001: 216) when he states, ‘Electronic dictionaries have great potential for adjusting the user interface to users’ skill
level so that learners with different needs and skills can access information in a different way.’ Our proposal takes the lexical (and traditional) definition as pivotal, but this is not the only aid available. Other linguistic and non-linguistic sources (e.g. pictures) may also be efficient in terms of identifying the object or thing being defined. If the function of conveying the meaning of the word is accomplished, the means used to achieve that goal have been efficient.
Figure 12.3: The meanings of mano, following the LCM.
We take the LCM as the point of departure for the organization and specification of lexical categories and differences of the words defined. The LCM offers a reliable tool for the semantic analysis of words; it also presents a clear and transparent visualization of lexical features and their importance in the identification of each word in contrast with others. In addition to this, the LCM is a sound platform for deciding on the amount of lexical information to be included in each of the definitional levels previously established. The lexical constellation offers the lexicographer a systematic and organized construct in which it is easy to detect primary vs. dependent, secondary or subsidiary features, the hierarchical relationships among them and, consequently, the identification of sub-units within units and their underlying cohesion. The sample previously presented (lion) may be taken as an illustrative example of the model we have in mind. We will take now a more complex word, hand/ mano, to illustrate our proposal at work.
12.8.2. The Definition of ‘Mano’ Following the LCM
Mano, ‘hand’ in Spanish, is defined in a paper dictionary, GDUEsA (note 2), with 17 different meanings and more than 100 idioms and set phrases in which mano appears as the head word. We looked for the meanings of mano in the Cumbre corpus (note 3) and then organized them in accordance with the LCM. Figure 12.3 offers a visualization of the results.
12.9. A Proposal
The proposal that follows envisages the potential design of a future e-dictionary based on a new concept of word/sense definition according to the LCM. We understand word/sense definition as that resulting from a modular and hierarchical approach which increases in structure, data and complexity, and which goes hand in hand with the user’s demand, access to data and access to (non-) lexical tools. Let us illustrate our notion of this cyber dictionary with the Spanish word mano (hand). Intuitively, we assume that different users have had a different education and, what is more, have different degrees of world knowledge. Consequently, each user should be able to access data in a unique and idiosyncratic way, depending on their individual demands and needs. This leads us to think of the user’s profile. In other words, the system elicits information and data from each user, either in a profile-like manner (user-assisted) or inferred from the various search options and search paradigms selected by the user (semi-automatic) when carrying out word searches. Next, the system uses this information in order to define the individual’s profile, which might respond
in a more realistic way to their need each time they access the e-dictionary. Similarly, and by default, the system might also supply the default-defined macro-profiles (i.e. pupils, adults and experts). Users would first need to register and the system would elicit personal data such as:
• name
• nationality
• age
• sex
• education
• professional profile
• goal/interest in using the dictionary
• search functionality
• defined macro-profile, etc.
For example, if the user were a primary school pupil and interested in looking up the word mano, the information displayed would, by default, be different from the information an adult would get. The data available to pupils would be less complex and word definitions would adjust to a simpler LCM of the word mano (see Figure 12.4). In contrast, an adult user might obtain a more elaborate and complex definitional spectrum of the word looked up (see Figure 12.5). Finally, experts or highly demanding users would be given even more data and information on the word mano (see Figure 12.6).
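The idea that pupils, adults and experts receive increasingly complex views of the same constellation can be sketched as a depth-limited traversal. Everything named below is an assumption made for illustration: the `UserProfile` and `Sense` classes, the numeric `level` scale and the `MAX_LEVEL` table are hypothetical, and the three senses attached to mano are placeholder examples, not the constellation shown in the authors’ figures.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """A few of the registration fields listed above (hypothetical record)."""
    name: str
    age: int
    macro_profile: str = "adult"  # one of: pupil, adult, expert


@dataclass
class Sense:
    gloss: str
    level: int  # assumed scale: 1 = core sense, higher = more peripheral
    children: list[Sense] = field(default_factory=list)


# Assumed mapping from macro-profile to the deepest level displayed.
MAX_LEVEL = {"pupil": 1, "adult": 2, "expert": 3}

# Placeholder constellation for mano; the levels are invented for the example.
MANO = Sense("body part at the end of the arm", 1, [
    Sense("help, assistance (echar una mano)", 2, [
        Sense("round in a card game", 3),
    ]),
    Sense("coat of paint (mano de pintura)", 2),
])


def visible_senses(sense: Sense, profile: UserProfile) -> list[str]:
    """Collect the glosses this profile is allowed to see, depth first."""
    if sense.level > MAX_LEVEL[profile.macro_profile]:
        return []
    glosses = [sense.gloss]
    for child in sense.children:
        glosses.extend(visible_senses(child, profile))
    return glosses


pupil = UserProfile(name="Ana", age=9, macro_profile="pupil")
expert = UserProfile(name="Luis", age=45, macro_profile="expert")
print(visible_senses(MANO, pupil))
print(visible_senses(MANO, expert))
```

A single stored constellation then serves all three access modes; only the cut-off changes with the profile.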
Figure 12.4: Pupils’ access mode.
Figure 12.5: Adults’ access mode.
Our final aim is to balance users’ access to data, in line with Verlinde, Leroyer & Binon (2010), avoiding any sort of cognitive overflow. We want to offer different users: (i) different word/sense definitions, (ii) different data, and (iii) different tools. As a result, a pupil, an adult or an expert might, for the same word, be given different information (Figure 12.7). Nevertheless, the focus is not just on lexical information made available to users but also on non-linguistic elements (i.e. visual information: pictures, photographs, videos, etc.). In addition, the information access mode should be designed for a wide audience; therefore, it makes no sense to incorporate complex, poorly intuitive search mechanisms. We are in favour of user-friendly interfaces, intuitive search facilities, easy database manipulation and so on, oriented towards inherent functionalities (Bergenholtz, Nielsen & Tarp, 2009; Tarp, 2006, 2008a). The convergence of the LCM and inherent functionalities could lead to a more ambitious notion of e-dictionary, as it might meet users’ lexicographical as well as non-lexicographical needs, for example:
• information on words, multi-word units (MWUs), expressions, etc.
• additional lexical gadgetry:
  ◦ translation of words, MWUs, expressions, etc.
  ◦ usage information of words, MWUs, expressions, etc.
  ◦ help with reading, writing and translation, etc.
• real world images and contextualization
• encyclopedic information, etc.
Figure 12.6: Experts’ access mode.
Figure 12.7: Pupil, adult and expert access mode.
12.9.1. Information on Words, Multi-Word Units (MWUs), Expressions, etc.
This refers to more lexical information, such as access to:
• dictionaries (general language and/or special purpose language)
• thesaurus (cross references/related words; synonyms; antonyms; phrases/collocations)
• etymologies
• lexical constellations
• cognitive synonyms (synsets), etc.
However, the main emphasis and novelty of this e-dictionary proposal reside in the fact that the available/accessible dictionaries contain definitions based on the LCMs and are, therefore, distinguishable hierarchically, depending on the user’s profile or the mode selected.
12.9.2. Additional Lexical Gadgetry
12.9.2.1. Translation of words, multi-word units (MWUs), expressions, etc.
This module allows the user to obtain data on different languages for contrasts, translations and so on. Access is not restricted only to bi- or multi-lingual dictionaries (general language and/or special purpose language) but also to
online translation tools (online translators, online translation memories, parallel bi-/multi-lingual aligned corpora, databases, terminology, usage-based statistics and so on). Once again, the uniqueness and novelty of this proposal is that the dictionaries made accessible to users in our e-dictionary are LCM-definition based. This has the unique advantage of offering contrastive data of collocates associated to the word searched and enlarging on the different world views of apparently the same or similar words. In turn, the LCM-definition based bi- or multilingual dictionaries offer the possibility of grading and structuring word data depending on user profiles or the mode selected.
12.9.2.2. Etymologies
A user interested in knowing about the history of words, their derivation, and how their form and meaning have changed over time, can access online etymology dictionaries and databases.
12.9.2.3. Help with reading, writing, etc.
Our concept of e-dictionary also considers offering access to more linguistic-oriented data for special purposes, such as reading and writing, among others: pronunciation aid, style checkers, grammar checkers, spell checkers, online corpora (integrating corpus search tools; Sketch Engine-like software), access to real language usage (examples taken from the web, social networking websites, etc.) and also visual non-linguistic data (pictures, images and videos) related to the search word. We wish to promote not only access to sophisticated LCM-based lexical tools but also to genuine sources with real word usage, additionally aiding users with any supplementary information accessible online.
12.9.3. This Is How It Might Look
The different modules or facilities are interconnected. However, the e-dictionary’s core module is the dictionary-nucleus, formed by different LCM-lexical resources. No search or access paradigms are defined a priori. Users define their own profile or select any of the three default modes given. For instance, if we selected pupil level and chose a search word such as mano, we would be given a potential screen such as the one in Figure 12.8.
Figure 12.8: Pupil access mode screenshot.
Next, the e-dictionary automatically checks the spelling of the search word and visualizes correct or incorrect spelling. In the case of incorrect spelling,
the system would produce alternatives or suggest correct spelling. However, the user can still continue searching for an apparently misspelled word. A simple word search of mano with the pupil access mode might display different sorts of information (Figure 12.9):
• Word sense information (based on an online LCM dictionary): Qué significa
• Access to other online dictionaries: Otros diccionarios
• Lexical constellation and words related/associated with the search word: Palabras/Ideas relacionadas
• Pronunciation aid: Cómo se pronuncia
• Translation information (access to online bi-lingual dictionaries): Cómo se traduce
• Visual aid (pictures and videos): Imágenes
• Access to online corpora: Conocer más
• Others: spell checker, online grammar, etc.: Otras ayudas
The whole e-dictionary works as a multi-functional unit accessible either as a stand-alone application or as part of an integrated tool within any other software application (word processor, internet browser, etc.).
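The spelling step described in this section (check the search word, suggest alternatives, but let the user proceed anyway) could be approximated with Python’s standard `difflib` module. This is only a sketch under stated assumptions: the word list `LEXICON` is a toy stand-in for the dictionary’s lemma list, and `check_spelling` is a hypothetical helper, not a component of the proposed system.

```python
import difflib

# Toy lemma list standing in for the e-dictionary's headwords (assumption).
LEXICON = ["mano", "mesa", "libro", "casa"]


def check_spelling(word, lexicon=LEXICON, max_suggestions=3):
    """Return (is_known, suggestions) for a search word.

    Unknown words are not rejected: the caller may still run the search,
    as the text requires for apparently misspelled words.
    """
    if word in lexicon:
        return True, []
    suggestions = difflib.get_close_matches(
        word, lexicon, n=max_suggestions, cutoff=0.6
    )
    return False, suggestions


known, suggestions = check_spelling("mnao")
print(known, suggestions)
```

The `cutoff` value controls how similar a candidate must be before it is offered; a production system would more likely rank candidates by frequency and keyboard distance as well.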
Figure 12.9: Pupils’ access mode screenshot: word mano.
12.10. Some Final Considerations
This new concept of e-dictionary radically changes our present notion of it. It is not an online paper dictionary or a paper dictionary in e-format. The goal of this proposal is to set a new standard and a new base line for forthcoming e-dictionaries. We do not aim to offer an ultra-sophisticated e-dictionary, but one that contains ‘better’ and more data, not just in breadth but also in depth. Here is where the notion of LCM is intrinsically rooted. The LCM allows structured and hierarchical sense definitions, grading the information offered according to users’ demands, from very simple, accessible data to the extremely elaborate type. In addition, we wish to go deeply into user-dictionary dynamics. So far, little research has been carried out regarding user-behaviour patterns when online dictionaries are accessed (Bergenholtz, Nielsen & Tarp, 2009). It would be interesting to analyze users’ search traces and use their search paths in order to model their potential behaviour.
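One simple way of using search paths to model potential behaviour is to count how often users move from one module to another. The sketch below assumes that each session is logged as an ordered list of the module labels mentioned in Section 12.9.3; the log format, the sample sessions and the function names are all hypothetical.

```python
from collections import Counter, defaultdict


def transition_counts(sessions):
    """Count module-to-module moves across all logged sessions."""
    counts = defaultdict(Counter)
    for path in sessions:
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return counts


def most_likely_next(counts, module):
    """Return the module most often visited right after `module`, if any."""
    if module not in counts or not counts[module]:
        return None
    return counts[module].most_common(1)[0][0]


# Invented sample sessions, for illustration only.
sessions = [
    ["Qué significa", "Imágenes"],
    ["Qué significa", "Imágenes", "Cómo se traduce"],
    ["Qué significa", "Cómo se traduce"],
]
counts = transition_counts(sessions)
print(most_likely_next(counts, "Qué significa"))
```

Such first-order counts are a crude model, but they would already allow an interface to offer the most probable next module first for a given profile.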
Notes
1 This research is financially supported by the Spanish Ministry of Science and Innovation, research project (Ref.: FFI2009–07722), funded by the Plan Nacional de Investigación Científica, Desarrollo e Innovación Tecnológica; and the Fundación Séneca (Ref.: 08594/PHCS/08), the Murcian Regional Agency of Science and Technology.
2 This is the first and only corpus-based Spanish dictionary to date.
3 Cumbre is a 20-million word corpus of general Spanish, financed by SGEL S.A., and compiled in 1995.
Chapter 13
Modelling Interactive Reading, Translation and Writing Assistants (note 1)
Serge Verlinde
13.1. Introduction
e-lexicography has become widely used on the Web, for English as well as for many other languages. Both institutional lexicography, with its well-established (learner’s) dictionaries, and user-involved, bottom-up, collaborative lexicography (Lew, this volume) offer a wide range of resources. However, as Schwartz (2004) stated, more choice does not necessarily lead to greater customer satisfaction. And indeed, how would the user be able to compare and evaluate the quality of all these resources? Bottom-up lexicography does not carry the quality guarantee that institutional dictionaries do. Furthermore, how is a user to know where these numerous online resources can be found? Portals such as Glossarist (note 2), or dictionary aggregators such as OneLook (note 3), can be helpful, but overall they remain insufficient and lack critical evaluation. Finally, users do not always know how to use and interpret lexicographic resources that differ significantly from traditional paper dictionaries or well-known electronic dictionaries (e.g. the Visual Thesaurus; note 4). Alternative representations of data may completely confuse the user. Moreover, even correct interpretation of straightforward lexicographic descriptions continues to pose problems for many users (Nesi and Meara, 1994). All this could lead to an absurd situation where more is less, as the subtitle of Schwartz’s book (2004) has it. Given these challenges, the question to be addressed in this paper is the following: what kind of user interface will give as many users as possible access to the information that is relevant in a given context in a user-friendly way? Such an approach relies heavily on the function theory (Tarp, 2008a, and this volume). An earlier article (Verlinde, Leroyer & Binon, 2010), also drawing on function theory, discussed the underlying principles and structure of a reference site for French lexical resources, the Base lexicale du français (BLF; note 5) (Section 13.2).
However, an interface that meets all these theoretical
requirements may not necessarily be in line with users’ expectations, consultation habits or search strategies. Other ways of accessing lexicographic information may, therefore, be needed (Section 13.3). These applications should allow more interactivity and be better adapted to the tasks performed by the user: reading (Section 13.4), translation (Section 13.5) and writing (Section 13.6). French as well as Dutch examples will be provided, but similar applications could be developed for other languages as well. The BLF website, for example, which originated as a database for French only, is being expanded to become a multilingual application which will be renamed Interactive Language Toolbox (ILT). This chapter ends with some suggestions for future developments (Section 13.7) and a conclusion (Section 13.8).
13.2. The Base Lexicale du Français The interface of the BLF differs significantly from other electronic dictionaries. Like the Danish Music Dictionary (Bergenholtz & Bergenholtz, Chapter 9), it considers the possible needs of a user in order to create various small, monofunctional dictionaries. On the home page, the user does not simply find a text box in which to enter a word or word combination, but must first identify the user consultation situation and needs (Tarp, 2008a: 146ff). The structure of the homepage reflects these choices and the possible needs users may have: finding information, looking up the translation of a word or word combination, or verifying how a word or a word combination is used or whether its translation is correct. A more structured description of the lexicon (e.g. the position of adjectives, lexical functions, words that often co-occur) and exercises are also provided in order to meet users’ cognitive needs. The exercises are both contextual and non-contextual (e.g. for verb conjugation). The contextual exercises (e.g. on gender agreement or use of prepositions) are automatically generated by matching the lexical information stored in the database with sentences taken from newspaper articles (Verlinde & Selva, 2001). For every need, the user must fill in a form and answer specific questions that will lead to the information requested (Verlinde, Binon & Leroyer, 2010: 8ff). Consequently, the interface of the BLF avoids most of the decoding work performed by users to interpret a traditional dictionary article, which is often underestimated by lexicographers (Gouws, 2007) and one of the main reasons for many unsuccessful searches.
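The automatic generation of contextual exercises described above can be illustrated with a minimal Python sketch. It assumes a toy lexical database that maps nouns to their gender and a few invented sentences standing in for newspaper material. The names LEXICON, SENTENCES and make_gender_exercise are hypothetical and are not taken from the BLF implementation.

```python
import re

# Hypothetical lexical database: noun -> grammatical gender.
LEXICON = {"maison": "f", "livre": "m", "voiture": "f"}

# Invented sentences standing in for newspaper material.
SENTENCES = [
    "Elle a vendu la maison l'an dernier.",
    "Il a lu le livre en deux jours.",
    "Nous partons demain.",
]

ARTICLE = {"m": "le", "f": "la"}


def make_gender_exercise(sentence, lexicon=LEXICON):
    """Return (gapped sentence, expected article), or None if no known noun follows le/la."""
    for match in re.finditer(r"\b(le|la)\s+(\w+)", sentence, flags=re.IGNORECASE):
        article, noun = match.group(1), match.group(2).lower()
        gender = lexicon.get(noun)
        if gender is None:
            continue  # noun not described in the database
        # Only build an item when the sentence agrees with the database.
        if ARTICLE[gender] == article.lower():
            gapped = sentence[:match.start(1)] + "___" + sentence[match.end(1):]
            return gapped, ARTICLE[gender]
    return None


# Keep only the sentences that yield an exercise item.
exercises = [ex for ex in map(make_gender_exercise, SENTENCES) if ex]
```

In this sketch a sentence is used only if the stored gender confirms the article found in the text, so the database, not the corpus, decides what counts as the correct answer.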
13.3. Needs and Task-Oriented Access Paths In the field of human-computer interaction, usability studies evaluate the ease with which a particular tool is used. Heid (this volume) discusses the results
of a preliminary study on the usability of three electronic dictionaries. The scores of the BLF may appear somewhat disappointing, particularly for more complex searches. Users are clearly more accustomed to ‘simplistic’ electronic dictionaries (Heid, this volume). As it has become clear that the user may get lost in the access paths of the BLF, the website now also offers access based on specific tasks performed by the user, in addition to access based on the user’s needs. This task-oriented access is more user friendly and allows the user to submit a sentence, a paragraph or an entire text at once. According to the task selected (reading, translation or writing), the application adds a layer of relevant lexicographic information to the original text. At any time the user can access this information through hyperlinks on the words and word combinations in the text. To a certain extent, this approach echoes the idea of augmented reality. The advantages of task-oriented access to lexicographic information are obvious. There is no need to launch several successive searches, because all the information is immediately available and the lexicographer has full control of the data provided. This task-oriented approach offers an alternative to the individualization of lexicographical e-tools by recreating and re-representing data on user profiles to optimize electronic dictionaries (Tarp, this volume). In what follows, a detailed overview will be given of the three applications currently under development, that is, the reading (Section 13.4), translation (Section 13.5) and writing (Section 13.6) assistants, comparing them to similar existing tools.
13.4. Reading Assistant Some research has already been conducted in the field of computer-assisted reading support for foreign language learners (e.g. Feldweg & Breidt, 1996, for the COMPASS project).6 Prószéky and Földes (2005) developed a tool for reading assistance called MoBiMouse.7 Basically, the idea is to provide a (non-) contextual L1 translation of the words in a text written in a foreign language. The most sophisticated reading assistant for French is the Alexandria application. It is available in two versions and uses data from the Sensagent website.8 A first version is aimed at webmasters and allows them to integrate the script in the source code of a webpage. Double-clicking any word on such a webpage makes a pop-up screen appear, with the definitions or possible translations of the word. A second version (Alexandria édition familiale)9 can be downloaded and installed on a computer. It works in the same way for any text or file displayed on the screen. The application supports 37 languages. The screenshots below illustrate the kind of information available in the monolingual French version (on the left) and in the bilingual French–English version (on the right) for any verb form of the verb trouver appearing in a text (Figure 13.1).
Figure 13.1: Alexandria édition familiale.
Notice that the tool only identifies single words. For proper nouns, the user is redirected to the relevant Wikipedia article. The BLF reading assistant differs from the Alexandria tool in three ways. The user must submit his text, which is less user-friendly but allows the text to be analyzed. For instance, multi-word expressions can already be identified and we hope to integrate superficial syntactic analyses of the sentences as well (see Section 13.7). Identification of multi-word expressions is important in a language-learning context: many multi-word expressions are opaque from a receptive point of view and thus cause many difficulties for the foreign language learner. When available, the reading assistant offers a contextual translation for multi-word expressions by referring to the Opus website and its sentence-aligned parallel corpora. The amount of information provided is a second important difference between the two applications. For French, Alexandria not only lists definitions, synonyms and expressions, but also data taken from a historical dictionary (Littré). Again, the user may be overwhelmed by all this information, as part of it is irrelevant for receptive needs. In the BLF, information is reduced to a strict minimum (Figure 13.2). The BLF pop-up screens also display links to external websites. This allows users to make the most of the wealth of information already available on the Web.
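How analysing a submitted text makes it possible to recognize multi-word expressions, rather than single words only, can be sketched as follows. The example uses a simple longest-match strategy over a tiny invented lexicon. The glosses, the names SINGLE_WORDS, MULTI_WORD and annotate, and the matching strategy itself are assumptions made for illustration, not a description of the BLF reading assistant.

```python
# Hypothetical mini-lexicon; the glosses are illustrative only.
SINGLE_WORDS = {"pomme": "apple", "terre": "earth", "tout": "all", "coup": "blow"}
MULTI_WORD = {
    ("pomme", "de", "terre"): "potato",
    ("tout", "à", "coup"): "suddenly",
}
MAX_LEN = max(len(key) for key in MULTI_WORD)


def annotate(text):
    """Return (unit, gloss) pairs, preferring multi-word expressions over their parts."""
    tokens = text.lower().split()
    result, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first so expressions win over single words.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in MULTI_WORD:
                result.append((" ".join(candidate), MULTI_WORD[candidate]))
                i += n
                break
        else:
            # No expression starts here: fall back to the single word.
            word = tokens[i]
            result.append((word, SINGLE_WORDS.get(word)))
            i += 1
    return result
```

A word-by-word tool would gloss pomme de terre as 'apple', 'de', 'earth'. The longest-match step returns the opaque expression as a single unit instead.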
Figure 13.2: BLF, reading assistant.10
13.5. Translation Assistant Non-professional users who are familiar with Web-based translation tools (Babelfish,11 Google Translate12) or online bilingual dictionaries (Reverso,13 Alexandria) are the target group of our translation assistant. Rule-based machine translation (Babelfish) and statistical machine translation (Google Translate) perform quite well, although there is still a long way to go before it will be possible to translate a text fully automatically. These tools can be very helpful to understand a text written in a foreign language, offering translations from L2 to L1. For production purposes, however, the results are less convincing. One of the main characteristics of the translation assistant we have developed is its ability to adapt to the topic of a text. Working with a database makes it possible to upload a domain-specific list of translations or to define the priority order in which datasets have to be browsed. As part of an interuniversity project on the translation of Dutch and French academic
Figure 13.3: Translation assistant (academic text).
terminology to English in collaboration with the Centre for English Corpus Linguistics14 (CECL, Université de Louvain-la-Neuve), we developed a tool that is similar to the reading assistant. The translation assistant adds a specific layer of relevant information to any submitted text. Moreover, to enhance the user-friendliness of the tool, all terms listed in the academic translation list and their morphological variants are automatically highlighted in the text (Figure 13.3).15 This useful tool no longer requires the user to launch multiple searches and is, thus, very time-efficient. It also promotes the standardization of translations by indicating the official translation to be used for certain words. However, a simple translation assistant does not resolve the most delicate problem in translation: choosing the correct term in a given context. Therefore, this tool also provides the most common word combinations for many academic words. This information is based on corpus analysis of the entire websites of five major English universities (Oxford, Cambridge, Manchester, Leeds and Birmingham) performed by the CECL. For example, the translation diplôme > degree is followed by specific word combinations such as to have a degree, a masters degree and a first degree in. In addition, a supplementary functionality allows the user to easily verify the existence of a word (combination) on the webpages of the five universities mentioned above. To this end, a shortcut has been created which automatically launches the search engines of these universities.
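The two mechanisms mentioned above, browsing datasets in a defined priority order and highlighting listed terms together with their morphological variants, can be sketched in a few lines of Python. The datasets, the variant table and the function names lookup and highlight are invented for illustration. Only the pair diplôme > degree comes from the text.

```python
import re

# Hypothetical datasets, listed in the priority order in which they are browsed.
ACADEMIC = {"diplôme": "degree", "cours": "course"}
GENERAL = {"diplôme": "diploma", "cours": "lesson", "livre": "book"}
DATASETS = [("academic", ACADEMIC), ("general", GENERAL)]

# Hypothetical table mapping morphological variants to their lemma.
VARIANTS = {"diplômes": "diplôme", "livres": "livre"}


def lookup(word):
    """Return (dataset name, translation) from the first dataset that lists the lemma."""
    lemma = VARIANTS.get(word.lower(), word.lower())
    for name, dataset in DATASETS:
        if lemma in dataset:
            return name, dataset[lemma]
    return None


def highlight(text):
    """Mark every word form whose lemma appears in the academic translation list."""
    def mark(match):
        hit = lookup(match.group(0))
        if hit and hit[0] == "academic":
            return f"[{match.group(0)} > {hit[1]}]"
        return match.group(0)
    return re.sub(r"\w+", mark, text)
```

Changing the order of DATASETS, or uploading another domain list in first position, changes the translation offered for the same word. In this simplified form, that is how the same tool can adapt to the topic of a text.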
13.6. Writing Assistant Spelling and grammar checkers incorporated in Office and OpenOffice are well known and used extensively. For French, other well-known software packages include Antidote16 and Cordial,17 as well as the semi-commercial website Bon patron. For English, we have, for example, the Right Writer software and the SpellCheckPlus18 online checker, which is the English counterpart of Bon patron.19 Spelling and grammar checkers currently achieve reasonable accuracy and recall rates (Souque, 2006; and Véronis, 2005, on spelling checkers for French) and their performance continues to improve (Fontenelle, 2006 and 2009, on the Word grammar checker). For French, grammar checkers detect certain errors, such as noun-adjective agreement very well, while others, such as tense usage, are recognized much less or even not at all.20 In general, only 60 per cent of all grammar errors in French texts are identified by a grammar checker.21 The usefulness of spelling and grammar checkers for (foreign) language learning is proportional to the quality of the comments supplied with any error detection. Some tools, such as the Bon patron website or the Antidote software, provide very relevant comments (Figure 13.4), whereas others, such as
Figure 13.4: Grammar checker Antidote.
the grammar checker for Word, fail to do so. Moreover, all these tools indicate the errors in a text and offer corrections automatically, so the user is hardly encouraged to reflect critically on his writing. For our writing assistant, we opted for a radically different approach to grammar correction. The recall rate, that is, the proportion of the errors actually present in a text that the analysis detects, is too low to use a grammar checker as a reliable tool for improving writing skills. In addition, grammar checkers have been shown to perform poorly on texts written in non-standard language with many irregular syntactic constructions. Often feedback is also too restricted to allow the language learner to really
Figure 13.5: Writing assistant for French (grammar).
improve his writing skills. Therefore, the BLF writing assistant does not correct the submitted text. It only identifies syntactic and lexical patterns that may contain errors. These syntactic patterns include gender and number agreement, agreement of the past participle, use of the past tenses, position of the adjective, use of the subjonctif (subjunctive) and use of the passive voice, as well as error patterns specific to native Dutch speakers. Since interference of the mother tongue is a significant source of mistakes made by foreign language learners, our writing assistant also includes error patterns registered in a learner corpus (the FRIDA corpus;22 Granger, 2003). As a result of the analysis performed by the application, all occurrences of problematic patterns in the text are displayed in small boxes (Figure 13.5). The example below shows patterns associated with the translation of the Dutch verb worden. Both syntactic (passive voice) and lexical patterns (use of the verb devenir) are mixed. The feedback is matched as adequately as possible to the presentation of the error patterns. This format helps the user to perform a quick check of all patterns detected. The user may also save any occurrences in his personal database by ticking the box in front of the relevant occurrence. Besides this application, which focuses mainly on grammar, there is also a tool which enables users to expand their vocabulary (words and word combinations). Texts written by foreign language students often prove to be somewhat inferior in terms of lexical richness. Learners often fail to choose precise words (e.g. they use hyperonyms instead of specific terms) and do not master specific word combinations (e.g. to make a questionnaire instead of design or develop a questionnaire). This tool is still under development for French, but it is already available for (Academic) Dutch.
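The principle of the grammar component described above, flagging patterns that may contain errors without correcting them, can be illustrated with a small Python sketch. The two regular expressions and the feedback messages are toy assumptions built around the worden example (devenir versus passive voice). They are far cruder than any real pattern inventory, and the names PATTERNS and flag_patterns are hypothetical.

```python
import re

# Hypothetical pattern inventory: (label, regular expression, feedback).
PATTERNS = [
    ("devenir",
     re.compile(r"\b(devient|deviennent|devenu(?:e|s|es)?|devenir)\b", re.IGNORECASE),
     "Dutch 'worden': change of state (devenir) or passive voice (être + participle)?"),
    ("passive",
     re.compile(r"\b(est|sont|a été|ont été)\s+(\w+(?:é|ée|és|ées))\b", re.IGNORECASE),
     "Check the agreement of the past participle with the subject."),
]


def flag_patterns(text):
    """Return (position, label, matched text, feedback) for each pattern occurrence.

    Nothing is corrected: the function only points to places worth checking.
    """
    findings = []
    for label, regex, feedback in PATTERNS:
        for match in regex.finditer(text):
            findings.append((match.start(), label, match.group(0), feedback))
    return sorted(findings)
```

Each finding carries its feedback message, so an interface could display the occurrences in small boxes and leave the decision to the learner.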
Submitted texts are scanned for three criteria: precision of formulation, register (general language versus academic language) and correct word choice in a given context. Just as in the previous tools, feedback is displayed in pop-up screens. By selecting one of the three categories, all relevant occurrences are highlighted in the text, as shown below for an example of vague formulation (Figure 13.6). In the above example, vague lexical patterns such as there was (Dutch: er was) and a certain number of (Dutch: een aantal) are highlighted, but also the proper noun Europe (Dutch: Europa), which in a text on politics should be further specified as European Union or European Commission, depending on the context. Comments may thus be adapted to the subject of the text. As is the case for the translation assistant, the same tool yields different results for different users in different situations depending on the text’s domain. The writing assistant also suggests (near) synonyms and hyperonyms as academic alternatives to general language. In the above example, norm, procedure, law or convention are offered as alternatives to the word rule (Dutch: regel), together with other members of this word family and relevant word
Figure 13.6: Writing assistant for Academic Dutch (Lexicon).
combinations. As always, it is up to the user to make a choice between these alternatives according to the context. To improve word choice, the tool also displays specific word combinations for certain words in the text. For the word advice (Dutch: advies) it lists verbs such as give, receive, ignore or follow and adjectives such as sound, bad or conflicting. Similar to one of the functionalities of the Check My Words23 software, examples of each of these word combinations can be consulted by using a shortcut to the KwicFinder website,24 an online web concordance offering examples which are more easily readable than the results of a Google search.
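The lexical checks described in this section can be sketched along the same lines. The vague patterns and the word combinations for advice are the English glosses given in the text. The domain table, the function names find_vague and suggest_combinations and the data layout are assumptions made for the purpose of illustration.

```python
import re

# Vague patterns taken from the English glosses in the text above.
VAGUE = ["there was", "a certain number of"]
# Hypothetical domain-specific additions, keyed by the domain of the text.
DOMAIN_VAGUE = {"politics": ["Europe"]}
# Word combinations for 'advice' as listed in the text above.
COMBINATIONS = {
    "advice": {
        "verbs": ["give", "receive", "ignore", "follow"],
        "adjectives": ["sound", "bad", "conflicting"],
    },
}


def find_vague(text, domain=None):
    """Return the vague formulations found, in order of appearance."""
    patterns = VAGUE + DOMAIN_VAGUE.get(domain, [])
    hits = []
    for pattern in patterns:
        regex = r"\b" + re.escape(pattern) + r"\b"
        for match in re.finditer(regex, text, re.IGNORECASE):
            hits.append((match.start(), match.group(0)))
    return [phrase for _, phrase in sorted(hits)]


def suggest_combinations(text):
    """Return the stored word combinations for every listed word in the text."""
    words = set(re.findall(r"\w+", text.lower()))
    return {word: COMBINATIONS[word] for word in words if word in COMBINATIONS}
```

Passing a domain changes the result for the same sentence: Europe is flagged only when the text is declared to be about politics. This mirrors the observation that the same tool yields different results depending on the text's domain.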
13.7. Future Developments The usability study mentioned by Heid (this volume) illustrates the tension between theory and how to implement this theory in a practical tool. As suggested by Heid (this volume), creating small-scale mock-up dictionaries to test possible interfaces could help the lexicographer create tools that allow you to ‘find the thing you are looking for preferably in the very first place you look’ (Haas, 1962: 48, quoted by Bothma, this volume). Developing high quality reading, translation and writing tools requires rich databases and precise analyses of submitted texts. It is obvious that a wealth of information is available on the Web. Unfortunately, most of this information is presented on static Web pages: as is the case with paper resources, information is not always easily accessible
and not reusable. Therefore, a labour-intensive process of reformatting this information is required in order to integrate it into interactive assistants. A better analysis of submitted text could be another significant improvement of the proposed tools. At the moment, the analysis is limited to simple word form matching and lemmatization. As homonyms are not differentiated, the information layer added lacks precision. Multi-word expressions are easily identifiable, but it is almost impossible at this stage to detect collocations or any other discontinuous lexical elements. In order to enhance the results of the analysis performed by the application, natural language processing techniques should be integrated, as in other CALL applications (Heift & Schulze, 2007: 52ff) or intelligent CALL applications (iCALL). The challenge here is the degree of robustness of, for example, shallow parsing and chunking applied to non-standard sentence patterns, specific to non-native language productions. A syntactic analysis which identifies the most important components of a sentence without being time-consuming would be sufficient for the purposes discussed here.
13.8. Conclusion This contribution presents new access possibilities to lexicographic data, both in dictionary format and as truly interactive tools, adapted to the most common tasks performed by (foreign) language learners: reading, translation and writing. As a reading, translation or writing assistant, the ‘dictionary’ becomes a kind of intelligent tutor. However, in the tools presented here, the focus still remains on language, and less on structure and content as in the writing assistants included in the electronic versions of a number of English learner’s dictionaries (Oxford iWriter25 and Longman Writing Assistant26). The reading, translation and writing assistants are very flexible. For each application, specific datasets can be uploaded, making the information layer added to the text as relevant as possible. In addition to these three innovative tools, the BLF provides many other needs-driven access paths to lexicographic data in the form of a single access mode with numerous shortcuts to external websites. In the future this access will be extended to lexicographic information for both Dutch and English, resulting in a true multilingual tool, which will be called Interactive Language Toolbox (ILT). Nevertheless, the usability study quoted by Heid (this volume) shows that this tool still needs improvement, both from a conceptual and a technical point of view. With the help of today’s technology (tracing and logging, comparison of mock-up dictionaries, task-based testing [Heid, Chapter 14]), it should finally be possible to catch a glimpse of users’ consultation habits, a must in order to develop true lexicographic Rolls-Royces (Tarp, this volume).
Notes
1. This article is a revised version of ‘La conception de didacticiels intégrés d’aide à la lecture, à la traduction et à la rédaction’. In P. Desmet and A. Rivens. ELAO et production écrite. Revue française de linguistique appliquée XV (2010). This article is available on www.cairn.info/resume.php?ID_ARTICLE=RFLA_152_0053
2. www.glossarist.com/
3. www.onelook.com/
4. www.visualthesaurus.com/
5. The Base lexicale du français is the new interface of the Dafles (Dictionnaire d’apprentissage du français langue étrangère ou seconde – Verlinde, Selva & Binon, 2009), a lexical database for French that includes links to several websites dedicated to French lexicon. All references are listed on the website.
6. www.sfs.uni-tuebingen.de/Compass/compassinfo.html
7. www.morphologic.hu/A-MoBiMouse-6-hasznalata.html
8. www.memodata.com/
9. http://alexandria.sensagent.com/alexandria-dll/v2/download/alexAndriafamilial.fr.jsp?from=square
10. We plan to replace the link sens in the pop-up screen with short definitions taken from the Dafles.
11. http://babelfish.altavista.com
12. http://translate.google.com
13. www.reverso.net/text_translation.asp
14. http://cecl.fltr.ucl.ac.be/
15. The tool is available on the websites of the universities of Louvain-la-Neuve (http://sites.uclouvain.be/lexique/lexique.php) and Leuven (http://ilt.kuleuven.be/kucl), but is password protected.
16. www.druide.com/a_description.html
17. www.synapse-fr.com/correcteur_orthographe_grammaire.htm
18. http://spellcheckplus.com/
19. The use of the Bon patron website is free for short texts. http://bonpatron.com/
20. An exhaustive list of errors detected by Antidote can be found on this website: www.druide.com/a_correcteur.html
21. Lecture by Dominique Laurent (Synapse, editor of the Cordial software) at the Forum des industries de la langue (Louvain-la-Neuve, 17 March 2010).
22. www.fltr.ucl.ac.be/fltr/germ/etan/cecl/Cecl-Projects/Frida/frida.htm
23. http://mws.ust.hk/cmw/index.php
24. http://kwicfinder.com/KWiCFinder.html
25. www.oup.com/elt/catalogue/isbn/6713?cc=global
26. www.pearsonlongman.com/ldoce/about.html
Chapter 14
Electronic Dictionaries as Tools: Toward an Assessment of Usability Ulrich Heid
14.1. Introduction The fact that dictionaries are tools has been the focus of many discussions in recent lexicographic theories. Tarp (2008a: 123) defines the notion of ‘lexicographic tool’ as follows: A lexicographical tool is a tool that can be used via consultation or passive searching by users with a specific type of communicative or cognitive need to gain access to lexicographical data, from which they can extract the type of information required to cover their specific needs. Bergenholtz and Bergenholtz (this volume) go even one step further, in the title of their article: ‘A Dictionary Is a Tool, a Good Dictionary Is a Monofunctional Tool’. The tool aspect of electronic dictionaries points to their basic similarity with, for example, search engines for the internet, or interfaces for the access to databases. The latter two would clearly be classified, also by anyone not familiar with the Function Theory of lexicography, as being online information tools. As Bothma (this volume) shows convincingly, electronic dictionaries indeed share numerous characteristics with information systems, both search engines and database access tools: the needs-driven access, the ways of consultation (search and browsing), the access to particularized information and so on. In this chapter, we intend to apply a method from information science to electronic dictionaries, namely usability testing. Usability is commonly seen as a cluster of properties of (software) products which ensures effective and efficient use of the software, as well as user satisfaction. Consequently, developers of professional software (especially of consumer products, such as mobile telephones, search engines, text processors and other consumer software) attach much importance to usability; they design their products with a view to usability, and they carry out usability tests on prototypes of their software, to
understand which properties of the upcoming products are easy to handle and which require redesign before being marketable. For electronic dictionaries, we are not aware of information scientific research on their usability; there is, however, a vast literature on dictionary use, as well as a few studies on specific aspects of the behaviour of users of electronic dictionaries (e.g. Bergenholtz & Johnsen 2005). We present a first set of experiments on usability testing of online electronic dictionaries; these tests were carried out by Christina Bank (see Bank, 2010), in cooperation with the author, in winter 2009–2010, at the University of Hildesheim.1 We explain the lexicographic and the information scientific background of the experiments, as well as the first results obtained. These are not to be seen as representative or in any way final: they basically serve as a starting-point for further hypotheses that would need to be tested. The remainder of this chapter is organized as follows: in section 14.2, we summarize the basic lexicographic reasoning behind the experiments we report on, and we briefly introduce the electronic dictionaries with which we work. Section 14.3 gives a brief overview of current notions of usability and of methods for usability testing, as far as these are applicable to electronic dictionaries. In section 14.4, we report on the tests carried out, describing the test conditions, the subjects involved and the methods used for gathering data. We also provide a first interpretation of the results.
Section 14.5 is a conclusion, devoted to a more general discussion of our experiments, with a view to improvements: we criticize our approach and try to come up with proposals for more significant usability tests; ultimately, we do not intend to only do usability testing, but we think that it is necessary to come up with constructive proposals for the usability design of electronic dictionaries, very much the same way as this is customary in software development.
14.2. Lexicographic Background: Relationship with Usability The citation from Tarp (2008a: 123) given above sets the scene for the discussion of dictionaries as tools. According to the Function Theory of lexicography, dictionaries should provide data that allow a person to infer information required to satisfy a need, either in a communicative situation (text production vs. reception etc.) or in a cognitive situation (need to learn about a word, a concept etc.).
14.2.1. Lexicographic Data Selection and Data Presentation Obviously, the lexicographer has a double task to accomplish, in order to provide adequate lexicographic data: one must select carefully the data that best
fit the user’s needs, and one must present such data in a convivial, pedagogical and easy-to-use way. Both these tasks can be seen as being directly related to usability, insofar as data selection has an influence on the effectiveness of a dictionary, and a convivial presentation should support its efficient use; both will lead to user satisfaction. Effectiveness can be translated (in an oversimplified way, see below) as the property of a given software to provide the right data and the right amount of data to the user. Efficiency may be paraphrased (in an equally simplified way) as the time to task completion typical of the use of a given piece of software. The quicker the user gets access to the data needed, the more efficient the software. In this sense, the two basic building blocks of lexicographic work in the preparation of dictionaries, data selection and data presentation, are both key aspects of the dictionary’s usability. This applies to all kinds of lexicographic products, be they language dictionaries (where data selection is a matter of linguistics) or dictionaries providing domain knowledge (as is the case, for example, in the Danish Music Dictionary, where data selection is done by a domain expert). Obviously, not only electronic dictionaries can be analyzed from the point of view of usability, but also paper dictionaries. A crucial ingredient of efficient data presentation is, for example, the access to lexicographic data. Access paths in printed dictionaries as well as those in electronic ones are thus an aspect to be studied when it comes to a usability assessment (see below, section 14.4.3.4). According to Function Theory, any dictionary design must start from the needs of users, hence from the (extra-lexicographic) situations in which a user may have a given need that can possibly be satisfied by data retrieved from a dictionary.
The diversity of needs implies almost by itself that diverging needs cannot be satisfied, or at least not optimally, by one and the same result of lexicographic data selection and presentation. Thus, it is only consistent to require that monofunctional dictionaries be constructed. Bergenholtz and Bergenholtz (this volume) present three versions of their music dictionary, according to different needs: one for reception tasks (understanding a term), one for cognitive tasks (to provide knowledge about objects designated by terms) and a third, polyfunctional one. A similar segmentation into different needs-specific (versions of) dictionaries can be found in other electronic dictionaries created by the Aarhus Centre for Lexicography, most prominently the Dictionary of Fixed Expressions (Ordbog over faste vendinger).
14.2.2. Monofunctional Dictionaries Monofunctional dictionaries are intended to satisfy both needs of effectiveness and of efficiency. An effective piece of software provides the data or processing it is intended to provide, and only this. Or, to put it differently: an effective piece of software avoids an information overload, by means of selecting data. The same can be stated for an effective electronic dictionary: it helps avoid
an information overload, by providing exactly the data categories and data instances needed for a given usage situation (according to the available knowledge of the user). Hence neither standard search engines for the internet, which tend to provide very large amounts of results for most queries, nor general multifunctional dictionaries, which tend to provide all the data contained in the pre-lexicographic data collection in one article, can be seen as effective: they overload the user with data; users have to make the selection themselves, but often do not have sufficient criteria to carry out an adequate selection. An efficient piece of software is one that gives quick access to the relevant data. This implies short and clearly reproducible access paths. With respect to dictionaries, efficiency involves devices that guide the user quickly to the relevant data, ideally with low complexity. Nesting, as known from many printed dictionaries, is sub-optimal with respect to access efficiency, whereas a simple alphabetical macrostructure is more efficient. A historically based sense ordering in a monolingual dictionary is likely less efficient for a user working with contemporary texts in a receptive situation than one based on the frequency of readings. For software, efficiency is often calculated in terms of operations a user has to carry out, or of links one has to follow to arrive at the data one is interested in. Bergenholtz (2009) has shown that long search times may lead users to abort the search prematurely, with the result that data which would be available in a given dictionary are not found. Thus, monofunctional dictionaries tend to be more efficient than multifunctional ones, as they provide single answers in response to specific questions.
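The idea of calculating efficiency as the number of operations or links needed to reach the data can be made concrete with a minimal Python sketch. The two interface graphs below are invented for illustration and do not model any of the dictionaries discussed in this chapter. The names DIRECT_LOOKUP, GUIDED_LOOKUP and operations_needed are hypothetical. The shortest access path is found with a breadth-first search.

```python
from collections import deque

# Hypothetical interface graphs: page -> pages reachable in one user operation.
DIRECT_LOOKUP = {
    "home": ["entry"],
    "entry": [],
}
GUIDED_LOOKUP = {
    "home": ["choose_need"],
    "choose_need": ["fill_form"],
    "fill_form": ["answer_questions"],
    "answer_questions": ["entry"],
    "entry": [],
}


def operations_needed(graph, start, goal):
    """Breadth-first search: minimum number of operations from start to goal."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        page, steps = queue.popleft()
        if page == goal:
            return steps
        for nxt in graph.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return None  # goal cannot be reached from start
```

Such a count captures only one aspect of efficiency. It says nothing about how long each operation takes or whether the data reached are the right ones, which is the concern of effectiveness.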
14.2.3. Innovative Electronic Dictionaries – the Online Tools Analyzed
In the study we are reporting on, three highly innovative dictionaries are analyzed: two of them are electronic learners’ dictionaries, and all three have been designed exclusively for the electronic medium, presumably with usability in mind. The three dictionaries analyzed are the following:
• ELDIT, Elektronisches Lernerwörterbuch Deutsch-Italienisch (Abel & Weber, 2000; Abel & Campogianni, 2005);
• BLF, Base lexicale du français (Verlinde et al., 2010, etc.);
• OWID, Online-Wortinformationssystem (Müller-Spitzer, 2010).
ELDIT covers German (DE) and Italian (IT), BLF French (FR) and OWID German (DE). The two learners’ dictionaries have been designed with principles from lexicographic theory in mind: ELDIT is intended to serve learners of Italian with German as a mother tongue, as well as, conversely, learners of German with an Italian background. Thus, ELDIT is not a bilingual dictionary (Abel & Weber,
2000), but combines two production-oriented learners’ dictionaries designed as primarily monolingual dictionaries, with glosses in the mother tongue.
BLF has been conceived as a learners’ dictionary of French for Dutch-speaking learners; it integrates functionality inspired by the principles of the Function Theory of lexicography, for example, by offering very specific task-related lexicographic data which are truly monofunctional. An example of such a device is BLF’s function for providing data on the gender and on the inflection of French nouns: under the task-specific access path ‘Is it le or la?’, the dictionary provides the article used with the nouns and their singular and plural forms.
OWID is intended for advanced users with a solid language background and with a detailed interest (cognitively or, in certain cases, communicatively motivated) in the linguistic properties of German words. OWID provides different degrees of detail, depending on the amount of data the user wants to see. While the two learners’ dictionaries each cover several thousand entries, OWID is a prototype restricted, for the time being, to a much smaller number of entries.
It should be noted that all three electronic dictionaries are constantly being developed further; none of them is still in the state in which it was when it was used for our experiments, in January 2010. This in turn means that our analysis is, unfortunately, not exactly reproducible; however, the tendencies we identified may still be applicable to similarly structured electronic dictionaries.
14.3. Usability Testing – Methods and Application to Electronic Dictionaries
14.3.1 Notions of Usability
Usability is considered a property of (software) products. It has been described and defined in a variety of ways, which, however, share a common perspective. In the following, we give a short summary of the key notions, focusing on software products (and in particular on online dictionaries), even though the definitions can be applied to any manufactured product. A basic definition is given by the international standard ISO 9241–11 (1998); it contains the elements of effectiveness, efficiency and user satisfaction mentioned above:
• effectiveness of a given product concerns the degree of detail and of completeness with which users achieve their goals by using the product;
• efficiency of a given product concerns the effort users have to invest in order to achieve their goals, in relation to the degree of detail and the completeness of task fulfilment;
• user satisfaction is defined both negatively, as the absence of obstacles in the work with the product, and positively, as the users’ (positive) attitude towards the product.
These definitional elements have a number of important implications:
• Usability is a property of a product which emerges only in the interaction with the user; as users are widely different, for example, with respect to their background, their prior knowledge about the application domain or the ways of using products of a given type (e.g. electronic dictionaries), usability is not a constant feature of a given product, but relative to the intended user group(s), and likely even to individual users.
• An assessment of usability is only possible (or at least most meaningful) in a situation where the product is used for a given task; in other words: a generic assessment of the usability of a given electronic dictionary is difficult to carry out; an assessment should rather be embedded in a given usage situation.
There is a broad literature on usability, and we cannot, within the framework of the present discussion, go into detail about the different strands and tendencies. As Krömker (2007: 15) puts it, these traditions use different formulations and terms, but the basic principles are relatively stable. The discussion about the usability of internet applications and about usability testing includes most prominently work by Jakob Nielsen (1993), Jeffrey Rubin (1994), Rubin and Chisnell (2008), Dumas and Redish (1999), and Shneiderman and Plaisant (2005). Krömker’s (2007: 14) comparison of the most prominent definitional elements of the notion of usability contains the following aspects:
• objectively measurable components concerning task completion vs. subjective components concerning user satisfaction;
• among the objectively measurable components the most prominent ones are the error rate (which is an element of a measure of effectiveness), efficiency (Shneiderman & Plaisant, 2005: ‘speed of performance’), learnability and memorability (Nielsen, 1993); learnability implies easy understanding of the use of a given product by the user (Shneiderman: ‘time to learn’), and memorability means that users are able to memorize the basics of the use of the tool over longer periods (Shneiderman: ‘retention over time’); both of the latter make users ‘feel at home’ with the product;
• the subjective aspect of ‘satisfaction’ or ‘subjectively pleasing’ properties (Nielsen, 1993) of a given piece of software concerns the attitude of the user towards the software.
Research on usability is a part of information science; it is often seen as a subdomain of areas such as man–machine interaction (MMI), human–computer interaction (HCI), or user-centred design (UCD). Usability research involves usability design (i.e. the specification and realization of properties of a given software product that enhance its usability), and usability testing; the latter is used, like most software evaluation work, both comparatively, to assess differences between alternative software solutions, and as a tool in usability design, that is, to identify potential usability problems in the course of the development of software.
14.3.2 Measuring Usability
For practical reasons, there is much interest in ways of assessing both qualitative and quantitative aspects of usability. Usability tests can be carried out in different ways, involving usability and domain experts (analytical methods) or lay users (empirical methods). Both subjective assessment, via questionnaires, and objective measurements are commonly used. The latter are typically carried out in a usability laboratory (sometimes also in the environment) where the product to be tested is typically used by lay persons; they involve logging of software use (navigation and keystrokes), recording of think-aloud protocols, measurements of time to task completion, eye tracking (for protocols of the gaze movement of users during software use), and so on. The test results can be interpreted qualitatively, and, according to many experts, also to some extent quantitatively. Nielsen and collaborators state that a single expert will only identify about 35 per cent of all usability problems of a given piece of software. Thus, they suggest working with at least five to ten experts. Good practice has shown that 12 to 15 lay testers of a homogeneous user group will be sufficient to identify the majority of usability problems. Given that users always come with their own background and prior knowledge, a crucial factor is to understand how a homogeneous group of users can be set up. For electronic dictionaries, it seems close to impossible to set up a completely homogeneous group. However, experiments with about 30 test persons should give an adequate picture. In this context, it is important to clarify the interpretation of the data obtained from measurements carried out in a usability test. What is important is not the absolute figures (e.g. ‘58.3 per cent of the users’), but rather proportions.
When the majority of the test persons, for example, more than 75 per cent, encounter a given error, then it is evident that the error is not a matter of the test persons’ individual deficiencies, but a problem of the tested product.
Typically, this qualitative interpretation of quantitative data is the objective of most user tests. Thus, in an application of usability testing methods to electronic dictionaries, instead of a comparative ranking of different dictionaries, the focus is on finding and removing usability problems, in order to identify features of the dictionaries that are effective, efficient and contribute to the users’ ‘joy of use’. For user interfaces, ISO 9241–110 (2006) proposes seven principles of man–machine interaction (called ‘Dialogue principles’ in the standard), which are in principle also applicable to electronic dictionaries:
• suitability for the task: the user interface should provide closed dialogue sequences for individual tasks, and a match between the system and the real world (Nielsen, 1993);
• self-descriptiveness: user interfaces should give clear and informative feedback about the system status; data should be visible and (type-wise) recognizable;
• controllability: users should at any time be able to control the operation of the system; they should feel free in their interaction with it;
• conformity with user expectations: the system should be consistent, that is, a given type of data should always be presented in a given fashion, such that the system ‘behaves’ in a way users expect it to behave;
• error tolerance: the system should allow for easy reversal of actions (Shneiderman, 2002) and help users recognize, diagnose and recover from errors (Nielsen, 1993);
• suitability for individualization: frequent users should be able to define and use shortcuts (Shneiderman, 2002), and the look-and-feel of a user interface should be influenceable, at least to some extent, by the user;
• suitability for learning: the system should reduce short-term memory load in users (Shneiderman, 2002), provide adequate documentation and thus reduce the amount of time needed to learn how to manipulate it.
These principles set the general expectation horizon for usability tests of user interfaces; they have to some extent also been applied to the electronic dictionaries under review.
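The quantitative remarks made earlier in this section can be illustrated with a short sketch. It rests on two assumptions that are not stated in the text above: first, that evaluators find problems independently of one another, in which case the expected share of problems found by n evaluators is 1 − (1 − p)^n (a common simplification, often associated with Nielsen and Landauer); second, that the ‘more than 75 per cent’ rule of thumb is applied as a simple threshold on the share of affected test persons. The function names and the example counts are hypothetical.

```python
def share_of_problems_found(p_single, n_evaluators):
    """Expected share of problems found by n independent evaluators.

    Assumes every evaluator finds each problem with the same probability
    p_single, independently of the others: 1 - (1 - p)^n.
    """
    if not 0.0 <= p_single <= 1.0 or n_evaluators < 0:
        raise ValueError("p_single must be in [0, 1] and n_evaluators >= 0")
    return 1.0 - (1.0 - p_single) ** n_evaluators


def is_product_problem(n_affected, n_subjects, threshold=0.75):
    """True if more than `threshold` of the test persons ran into the same error."""
    if n_subjects <= 0 or not 0 <= n_affected <= n_subjects:
        raise ValueError("need 0 <= n_affected <= n_subjects and n_subjects > 0")
    return n_affected / n_subjects > threshold


# With p = 0.35 per expert, as quoted above:
for n in (1, 5, 10):
    print(n, "experts:", round(share_of_problems_found(0.35, n), 3))

# Hypothetical counts for a group of 33 test persons:
print(is_product_problem(25, 33))  # 25/33 is above the 0.75 threshold
print(is_product_problem(24, 33))  # 24/33 is below it
```

Under the independence assumption, five experts would be expected to find somewhat under 90 per cent of the problems and ten experts close to 99 per cent, which is in line with the recommendation of five to ten experts; real evaluators overlap in what they notice, so these values are best read as an optimistic estimate.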
14.4. A Case Study on the Usability of Electronic Dictionaries: Tests Results Interpretation
In this section, we report on the usability tests carried out on the three online dictionaries, ELDIT, OWID and BLF (see section 14.2.3). The overall layout of the case study comprises three steps: a pre-test questionnaire, test sessions in a usability laboratory and a post-test questionnaire.
All three stages were carried out with a population of 33 students (second to fourth year of studies, see section 14.4.1 for details). The pre-test questionnaire was intended to capture the general expectations of the students with respect to electronic dictionaries. The test sessions in the usability laboratory provided measurable data, and the post-test questionnaire was administered in order to get subjective feedback related to user satisfaction. The user tests were preceded, in Bank’s work (Bank, 2010), by an expert usability evaluation that was intended to formulate hypotheses to be tested afterwards with the students. In the following, we report on the three steps of the user tests.
14.4.1. Subjects
The 33 students (31 female, 2 male) asked to participate in the tests are students of language-related study programs at the University of Hildesheim; they study ‘International Information Management’, ‘International Communication and Translation’ and ‘International Specialized Communication’; at the time of the tests, they were in their second, third or fourth year of study (the year of study had no impact on the results obtained). At the time when the study was carried out (in January 2010), all the students were taking courses on translation from English to German (general language texts) and were thus, in principle, acquainted with the use of dictionaries in general and electronic dictionaries in particular. Most of them, however, had not previously known the electronic dictionaries that were tested, but the results do not show any difference between the few students who had already seen the dictionaries before and those who had not.
14.4.2 Pre-Test Questionnaire: Expectations of Students with respect to Electronic Dictionaries
The pre-test questionnaire was administered in the framework of university courses; the objective of the questionnaire was to understand which functions the students would find most important in electronic dictionaries. The questions asked in the questionnaire concerned the following aspects of electronic dictionaries:
(1) advantages of electronic dictionaries over paper dictionaries, especially with respect to (a) structure and navigation within the dictionary, (b) contents of the dictionary, (c) possibilities for interaction with the dictionary;
(2) expected added-value of online dictionaries;
(3) most important functions and properties of an electronic dictionary:
• for users in general,
• for the individuals filling in the questionnaire, in terms of frequency of use:
(a) error tolerance in the search functions, (b) help functions, (c) self-descriptiveness of the dictionary site, (d) clarity of the structure of the site, (e) hyperlinks between articles, (f) possibility to get back to the start page at any time during the use of the dictionary, (g) amount of information given in the articles, (h) relevance ranking of search results, (i) presence of pronunciation data (sound files), (j) presence of illustrations.
The questionnaire concerns both aspects of lexicographic data selection and data presentation (1b, 3g–j) and aspects of software usability in the strict sense; we wanted to understand the expectations of the students with respect to controllability and navigation (1a, 3b, 3d–f), as well as with respect to the properties of error tolerance (3a) and self-descriptiveness (3c). We selected these criteria because they can also be tested relatively easily in the usability laboratory.
The results of the questionnaire show that the students consider efficiency the most prominent advantage of an electronic dictionary over a paper dictionary: quick access to lexicographic data, ease of use and easy searching were mentioned most often as an answer to questions 1a, 1c and 2. The students also considered quick and easy search functions to be the most important functionality for all users (question 3). With respect to the lexicographic content of the dictionary (questions 1b and 3), the students thought that electronic dictionaries were more complete, up to date and better documented with example sentences than printed ones (‘more data is better data’).
This result is interesting in so far as it seems to indicate that the test persons thought that the presentation of large amounts of lexicographic data was an advantage in itself; obviously, more detailed studies about the acceptance of the focused monofunctional presentation of data would need to be undertaken, but it seems that the search habits of students are massively influenced by internet search engines, for which they accept being confronted with large amounts of candidate results from which they have to manually select. The same influence of internet search engines seems to shine through in the answers given to questions 3a–3j. The students consider a broad and detailed set of lexicographic indications (3g) the most important feature of a good electronic dictionary, followed by comfortable and error tolerant search functions
(3a) and relevance ranking (3h). The latter is known to them from search engines, and it might be interesting to investigate whether the metaphor of relevance ranking could be used in the presentation of search results in electronic dictionaries.2 On the basis of the results of the pre-test questionnaire, the search functions, the presentation of lexicographic data in search results and aspects of navigation (efficiency) were selected as targets for the user tests.
14.4.3 Task-Based Tests in the Usability Laboratory
The tests administered in the usability laboratory concern all three tested dictionaries. All tests are based on tasks which are relatively close to work situations in text reception or text production. The wording of each task clearly makes reference to one of these situations, as well as to the communicative need to be satisfied by means of the search. A sample task is reproduced below:
Du sitzt an einer Textproduktion Deutsch und bist Dir nicht sicher, ob Du an einer Stelle die Wortverbindung aus gutem Grund oder mit gutem Grund setzen sollst/kannst. Existieren überhaupt beide Ausdrücke? Könnte man beides sagen, oder nur eines der beiden? (Bank, 2010: 96)3
(‘You are writing a text in German and are not sure whether, at a certain point, you should/can use the word combination aus gutem Grund or mit gutem Grund. Do both expressions exist at all? Could one say both, or only one of the two?’)
14.4.3.1. Tasks
As the three dictionaries, ELDIT, OWID and BLF, have been conceived on the basis of different approaches, and with different user groups and usage situations in mind, not all tasks can be parallel. Some tasks have been adapted to specific features of the analyzed dictionaries. In this sense, we used a mainly summative test, which is only partly usable comparatively. Tasks which are relatively comparable between the three dictionaries concern the following properties:
– efficiency of search:
  – for single word items and their meaning explanation (based on polysemous items):
    – OWID: readings of DE integrieren;
    – BLF: readings of FR absorber, given as a synonym of prendre;
    – ELDIT: synonyms of IT famiglia;
  – for multiword items:
    – OWID: aus gutem Grund vs. mit gutem Grund (see above);
    – BLF: meaning of c’est une question de vie ou de mort;
    – ELDIT: meaning of IT prendere di petto qualcuno;
– navigation:
  – access to collocational and syntactic data:
    – OWID: typical adjectives for DE Familie;
    – ELDIT: syntactic valency patterns of IT prendere.
Alongside these tasks concerning search and navigation, specific tasks were used to test peculiarities of the dictionaries under analysis. These include most prominently the portal function (and the related aspects of navigation, controllability and self-descriptiveness) of the version of BLF then in use. In fact, this BLF version had links to a large number of dictionary-external sources of information, especially on the internet. For example, translations are accessible from BLF via the online dictionary Interglot, and via parallel corpora. The analysed version of BLF also provides the first implementation of a truly task-oriented access to lexicographic data we are aware of, by offering a search function entitled ‘Is it le or la?’ This function provides the gender and number inflection of French nouns. It was tested with the noun honneur.
14.4.3.2. Test setup
The tests were carried out with 33 students in individual sessions of about 75 minutes each. All subjects got the above-mentioned tasks (and a few more), as well as the possibility of using a few minutes at the beginning of the work with each of the dictionaries to explore the websites. The tasks were presented in written form and explained; the students could get help if they got lost in one of the tasks. The sessions were carried out in a usability laboratory using the Morae Observer software.4 For each task and each subject, keystrokes and mouse movements were recorded, to get a picture of the navigation behaviour of the subjects; furthermore, screen video and sound recording were used to capture think-aloud protocols; finally, time to task completion was measured. The observer further annotated the data obtained with markers for the success level (completed with ease – completed with difficulties – failure to complete the task) and with markers for events and attitudes observed during the sessions: system errors, comments by the subjects, frustration and so on. In the analysis phase, task completion, time to task completion, as well as particular usability problems thrown up by the tasks were identified.
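The kind of annotated record described above can be sketched as a small data structure. This is only an illustration under our own assumptions: the field names, the `summarize` helper and the sample values are hypothetical and do not reproduce the export format of the recording software or any data from the study.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean


class Success(Enum):
    """The three success levels annotated by the observer."""
    EASE = "completed with ease"
    DIFFICULTY = "completed with difficulties"
    FAILURE = "failure to complete the task"


@dataclass
class TaskRecord:
    """One subject working on one task (hypothetical field names)."""
    subject_id: int
    dictionary: str
    task: str
    success: Success
    seconds: float        # time to task completion (or until the task was abandoned)
    markers: tuple = ()   # observer markers, e.g. "frustration", "system error"


def summarize(records):
    """Return the share of each success level and the mean time in seconds."""
    if not records:
        raise ValueError("no records to summarize")
    total = len(records)
    shares = {
        level: sum(r.success is level for r in records) / total
        for level in Success
    }
    return shares, mean(r.seconds for r in records)


# Invented sample values, for illustration only.
sample = [
    TaskRecord(1, "DictA", "multiword lookup", Success.EASE, 30.0),
    TaskRecord(2, "DictA", "multiword lookup", Success.EASE, 40.0),
    TaskRecord(3, "DictA", "multiword lookup", Success.DIFFICULTY, 50.0, ("comment",)),
    TaskRecord(4, "DictA", "multiword lookup", Success.FAILURE, 60.0, ("frustration",)),
]
shares, mean_seconds = summarize(sample)
print({level.value: share for level, share in shares.items()}, mean_seconds)
```

Whether failed attempts should count towards the mean time is a design decision; the sketch includes them, but reporting times for completed tasks only would be equally defensible.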
14.4.3.3. Sample results: search facilities
Some of the results of the user test are worth discussing in some detail. The focus in the tests was on search and navigation, as these are crucial elements of the efficiency of electronic dictionaries. The search for single word items is indeed quite efficient in OWID: 60 per cent of the subjects completed the search for integrieren and its readings with ease, another 40 per cent with difficulty. Similarly, more than 70 per cent of the students got the readings of absorber in BLF (45 per cent with ease). Searching for multiword items, however, is much more difficult: 70 per cent of the students did not manage to get OWID’s answer to the question about the difference between aus gutem Grund and mit gutem Grund. A reason for this might be the more complex access path: one has to search for Grund, select the reading Motiv within the entry and then go to the submenu typische Verwendungen. With the exception of two students, all tried to insert the multiword expressions directly into the search field, which failed. Those who looked up the multiword under the noun Grund needed to understand that they had to select a reading (Motiv) to access multiwords. This latter part of the access path is highly debatable: should the lexicographer assume that the dictionary user is familiar with the reading distinction operated, and able to link multiwords with respect to which he/she has a communicative need with readings presented in the dictionary? For reception-oriented needs, we think that this is not the case: if the user searches for the meaning of a multiword item, he/she will typically not be able to identify the readings of the component elements (e.g. of bases of collocations). For text production, for example, for finding collocates of a given reading of a base, the access path obviously has to involve readings of polysemous bases.5 The search for c’est une question de vie ou de mort in BLF was not very successful; only 10 per cent of the students managed to get access to the intended data.
One of the reasons for this is that the subjects did not manage to correctly manipulate the search window for multiwords, which consists of two parts. Even though the window contains information on how it must be used (examples, given on the right-hand side of the ‘GO!’ button), the students did not take note of this help. The users spent 3.28 minutes (on average) on this task. The search for IT prendere di petto qualcuno in ELDIT worked much better; 80 per cent of the students managed to get access to the lexicographic data related to this expression. With a task completion time of 1.87 minutes, this was the fastest access to multiwords observed. One reason for this is the flexibility of ELDIT’s search function, which allows multiword input, as well as its tolerance with respect to typographic errors. As mentioned above, we also tested BLF’s task-oriented access device, exemplified by the search option ‘Is it le or la?’ Indeed, all students managed to get the correct result; 90 per cent of them with ease. With 37 seconds for task completion, the highly efficient nature of this device was clearly shown. Summarizing the tests on access to single-word and multiword expressions, a correlation between the complexity of the search interface and the time to
task completion, as well as the percentage of correctly completed tasks, may be seen. Simple search interfaces work better than more complicated ones. Our test persons did not read all available support messages, and thus failed on the multiword task with BLF; even though this part of BLF is self-descriptive, the layout and the path to be followed do not seem to conform to user expectations. With OWID, multiword search follows a path inspired by good practice of production-oriented paper lexicography for collocations (base lemma → reading of base lemma → collocates, see the Oxford Collocations Dictionary for Students of English, OCDSE), but this is less natural for non-standard tasks on multiwords, and counter-intuitive for reception-oriented tasks. Bank (2010) lists numerous details of interface design that also played a role in the results obtained; for reasons of space we refrain from discussing these in detail.
14.4.3.4. Sample results: navigation
The subjects had considerable difficulty in navigating through the more complex parts of the electronic dictionary websites. Partly, this has to do with the titles given by the lexicographers to the different types of indications present in the dictionaries. In OWID, the search for typical adjective collocates of Familie caused this type of problem: the students checked the indication ‘typische Verwendungen’ (‘typical uses’), but did not find anything; they should have looked up ‘Semantische Umgebung und lexikalische Mitspieler’ (‘semantic environment and lexical partners’); in this item, the sub-item ‘Wie ist eine Familie’ (‘how is a family’) contains the adjective collocates, which again was not found by all subjects. In principle, it is a good (and user-friendly) device to avoid using too much lexicographic and linguistic terminology in a dictionary, at least if it targets lay persons, not linguists. In this particular case, however, the replacement of linguistic terms by non-standard synonyms seems to have been a big part of the usability problem, as it reduced the self-descriptiveness of the dictionary. Another part of the problem is the fact that the dictionary is not fully consistent in its way of marking links; consequently, not all links were interpreted as such. In ELDIT, searching for the syntactic subcategorization of prendere caused similar problems, but less acute ones; 40 per cent of the students immediately understood that they needed to look up ‘Verwendung’ (‘use’) to get access to the syntactic information; another 43 per cent managed with difficulty, mainly after some time of browsing. A particular case is the navigation-related aspect of BLF’s portal function; this was tested with the task of finding a German equivalent of FR clôture; only one-sixth of the students managed to get access to the translation without difficulties, and more than 50 per cent did not find the equivalent. There
are several reasons for this. The subjects had difficulties with using the menu ‘Get the translation of’, with selecting the translation direction, and with the online dictionary Interglot, which is opened when the search is started correctly: the Interglot dictionary website contains dynamic advertisements which make it appear very different from BLF; many students assumed they had reached Interglot by mistake, and thus closed the Interglot window immediately. Many of them also had great difficulty in getting back to BLF.6
14.4.4. Post-Test Questionnaire
After the user tests, the subjects were asked to fill in another questionnaire, mainly aimed at getting feedback on their satisfaction with respect to the structure of the dictionary websites, to navigation, to the portal function and to error tolerance. Questions asked included the following (translated from Bank, 2010: 117–125):
(1) Is the dictionary structured in a clear way?
(2) Can you find the relevant data at the place where you expect it?
(3) Do you know at each point in time at which place in the dictionary you are? Did you arrive at web sites which you didn’t expect?
(4) Can you revert your actions at any point in the search process? Do you know at any point in time how to get back to the start page?
(5) How do you evaluate the ease of use of the dictionary?
In terms of structure and controllability, ELDIT was ranked ahead of OWID and BLF. A major problem, which may have caused rather critical views on BLF, is the lack of controllability and conformity with user expectations of the version analysed: BLF got bad marks for the questions under (2) and (3) and not very good ones for those under (4). A reason for this is the fact that the links to external resources provided by BLF are not all expectable, and that some external resources do not link back to the web site from which they have been called; furthermore, the structure of the external web sites may greatly differ from that of the calling BLF site (see advertisements in Interglot), which adds to the non-expectability. Finally, certain language resources, such as parallel corpus viewers, lists of automatically produced word co-occurrences and so on, may be hard to interpret if encountered for the first time. It seems that caution is called for when it comes to the creation of lexicographic portal functions. In terms of ease of use, the subjects were asked to give marks between 1 (best) and 6 (worst).
ELDIT was ranked best (2.53 on average), followed by OWID (3.13) and BLF (4.0). This subjective ranking seems to reflect the problems that the students had in using BLF’s portal function. In free text comments,
several students noted that BLF was easy to use when the users knew exactly what they had to search for. From a lexicographic point of view, this suggests that task-specific data presentation may be a good device for specific and well-circumscribed tasks, while search functions for less focused tasks may need to be kept as simple as possible.
14.5. Towards Better Usability Testing of Electronic Dictionaries
The experiments discussed in this chapter can at best be interpreted as a preliminary study; as this was the first study of electronic dictionaries in a usability laboratory, we needed to learn whether and how user testing of electronic dictionaries could be set up at all. Later experiments should be much more systematic, in several respects. The citation from Tarp (2008a) given in the introduction mentions use of lexicographic tools via consultation or passive searching. In information science, these two consultation modes, an active task-specific one and a less focused one, may be called ‘search’ and ‘browsing’. Search in this sense is a targeted one-shot action where the user aims at satisfying a particular need, for example, for linguistic information usable in the production of a text. Browsing is a less targeted activity of inspecting data provided by a given entry of the dictionary, possibly following links to other entries, and so on; it may thus consist of several successive steps. In the experiments reported here, we have only investigated a few search actions, no browsing. In further tests, a broader range of search tasks should be analyzed, and these should be contextualized as much as possible within typical actions of text production and text reception. It will also be necessary to address questions of monofunctionality in more detail: the example of BLF’s function ‘Is it le or la’ shows that simple, easy-to-memorize task-specific search functions are effective and efficient. A task for user-centred design of dictionary interfaces will be to make sure more such functions can be provided (see also Trap-Jensen, 2010). Our experiments also did not address in detail which lexicographic data categories should be presented to the user, depending on the needs of the user, and how they should be presented.
The results obtained on OWID, for example, suggest that users interpret a rich structure as a sign of lexicographic quality, but that such a structure at the same time risks overloading them and causing navigation problems. This issue is related to that of the depth of access paths: some users seem to be so accustomed to internet search engines and their one-shot search function (followed by manual inspection of candidate results on a trial-and-error basis) that they are reluctant to follow deeper search paths,
where a sequence of decisions has to be taken, for example, going first to a lemma, then to its reading, and finally to its collocations. While such search paths are standard for printed (specialized) dictionaries, they seem to be less efficient (and less well accepted) in electronic dictionaries. Not only did we lack the opportunity to address a broader range of lexicographically relevant usability issues; it should also be emphasized that the experiments reported here are preliminary with respect to their internal consistency. As mentioned above, we did not conceive of our tests as comparative, but rather as summative; this, however, relativizes the results: if usability tests are to be applied as a tool for investigating principles of electronic dictionary interface design, then more parameters should be kept constant than we were able to. In fact, to investigate access paths and the presentation of search results, a set of mock-ups should be used instead of existing electronic dictionaries; these mock-ups would contain the same lexicographic data (categories), but different structures and/or different presentational layouts. Similarly, one could keep the presentational aspect constant, but provide lexicographic data of different granularity and degree of detail (e.g. with/without example sentences, more/less grammatical indications, etc.), in order to test aspects of lexicographic data selection. We are in the process of preparing our first test on mock-ups: a small subset of entries from a collocation dictionary will be tested with different approaches to data presentation and with different access paths. We expect that this will provide an opportunity not only to gain insights into the usability of certain lexicographic devices for data presentation in electronic dictionaries, but also to contribute further elements to an assessment of the usability testing methodology itself: for electronic dictionaries, usability testing is only at its very beginning.
Notes
1 Many thanks to Christina Bank and Thomas Mandl (Hildesheim) for discussions on the topics of the present article. All experiments were carried out by Christina Bank; the present summary, interpretation and lexicographic assessment of Bank’s work is the sole responsibility of the author, as are all misconceptions thereby introduced.
2 It corresponds to some extent to the lexicographic objective of providing exactly the kinds of lexicographic data needed for solving a particular problem in a particular situation. As users may have different types of needs in connection with one and the same search in a dictionary, a sort of relevance ranking may spare them from having to switch user profiles or task profiles: assume a user has a text-receptive need with respect to a given lexical item; the user looks up the item and gets a meaning explanation, including a synonym. Instead of leaving the dictionary, the user finds their curiosity stimulated by the synonym indication and feels a need to understand the details of the difference between the two synonymous items. To satisfy this ‘secondary’ need, a direct display of the synonym (and an explanation of how it differs from the first item searched) in the set of results of the first search could be useful, provided it is clearly marked as being of secondary relevance.
3 You are producing a German text and you are not sure whether you can write aus gutem Grund or mit gutem Grund. Do both multiword expressions exist? Could both be used, or only one?
4 MORAE: www.techsmith.de/morae.asp
5 Note that the task is less clearly related to text production than it may seem; it is intended to make users search for the multiword; the perspective from which they search is, however, one of uncertainty about the meaning (reception-oriented) and, at the same time, about the form of the multiword (production-oriented).
6 Outside the user tests, we also noticed that students have difficulty in interpreting the parallel corpus website (OPUS, http://urd.let.rug.nl/tiedeman/OPUS), which could be called from the tested version of BLF: interpreting parallel example sentences (in which obviously no word or phrase equivalences are marked) in order to find word equivalence candidates is a non-trivial task.
Chapter 15
Conclusions: Ten Key Issues in e-Lexicography for the Future
Eva Samaniego Fernández and Beatriz Pérez Cabello de Alba
15.1. Introduction
Andersen and Nielsen (2009) summarize ten key issues in lexicography that were discussed at the international symposium Lexicography at a Crossroads: Dictionaries and Encyclopaedias Today, Lexicographical Tools Tomorrow, which was organized by the Centre for Lexicography (Aarhus School of Business) on 19–21 May 2008. Subsequently, a similar international symposium, devoted to discussing the state of the art in e-lexicography, was held at the University of Valladolid (14–16 June 2010). At the outset, it should be said that two of the issues summarized by Andersen and Nielsen were no longer considered contentious by most of the participants: the position of lexicography in the research community, and the role of lexicographers as language planners. Most of the participants agreed that lexicography is an independent discipline with its own research object (the dictionary) and its own scientific concepts, categories, methods of analysis and so on. Hence, the view that treats lexicography as a sub-discipline of linguistics was considered erroneous. For instance, no linguist took part in the compilation of the Danish Music Dictionary described in Chapter 9. Moreover, participants also agreed that every single lexicographical decision ‘has language policy relevance and therefore, in the end, a political dimension’ (Andersen & Nielsen, 2009: 361). This reinforces the view of the lexicographer as a language planner, and hence most participants in the symposium agreed that lexicographers are expected to give recommendations by adopting a proscriptive approach, that is, ‘users get explicit recommendations instead of being left to choose between competing alternatives’ (Andersen & Nielsen, 2009: 362).
Participants also agreed that lexicographical structures must be determined by lexicographers and not by the technological ambitions of the computer experts (Andersen & Nielsen, 2009: 358), and defined lexicographers as dictionary experts, that is, professionals who take an active part in the process of dictionary-making. In sum, for the participants in Valladolid, the basic premise was that the research object of lexicography is the dictionary presented as a tool that aims to satisfy the specific needs specific users have in a specific use situation. Within this framework, we summarize below a list of key issues that are expected to shape the lexicographical debate in the near future. The issues discussed, which can be connected with Andersen and Nielsen’s list, were adapted to the characteristics of e-lexicography described in detail in this volume.
15.2. Discussion of the Ten Key Issues
The key issues identified by the participants of the symposium are set out below, each with a short introduction that puts it in context. In line with the main topics discussed in this book, our list broadens and narrows the lexicographic debate at the same time. On the one hand, it broadens the debate by insisting that the future of e-lexicography is connected with its multidisciplinarity, that is, the construction and analysis of e-dictionaries can be approached from more perspectives than the ones traditionally employed. On the other hand, it narrows the debate by calling for a more precise definition of the concept of the e-dictionary, with the aim of avoiding the proliferation of options that are beyond the reach of most users, either because they are very complicated or because they involve very high reference costs (Nielsen, 2008).
15.2.1. What Is an Internet Dictionary?
Whereas some lexicographers have assumed that new theories are needed to construct internet dictionaries (e-dictionaries, from now onwards), participants at the symposium stated that lexicographic theory must develop in such a way that it is not a medium-specific theory but a theory for all lexicographical tools. In other words, participants defended the view that we need only one lexicographical theory, albeit one that can be adapted to the different access and data presentation possibilities of each medium. This implies that e-lexicography is viewed as a change of medium, which will force lexicographers to reconcile lexicographic theoretical assumptions with the characteristics of the new medium, and to ask whether we need more than users, access and data for defining e-dictionaries. For example, can lexicographic theory help
in avoiding information death and/or information stress? Within this framework, most participants accepted a narrow view of the e-dictionary by claiming that e-dictionaries are constructed with the aim of offering dynamic articles with dynamic data that correspond to the specific types of information needs which specific types of users performing specific types of lexicographically relevant activities may have in any consultation situation.
15.2.2. Is Lexicography Part of Information Science?
In the age of the internet, the big problem is not lack of access to the needed data, but the fact that data cannot be found, or is found in a quantity that leads to information stress or information death, both of which usually force potential users to abandon the search before finding the results. Participants in the symposium discussed whether lexicographical assumptions can help users to avoid being engulfed by the large quantity of data retrieved, and agreed that only proper internet dictionaries, and not other types of electronic dictionaries, ensure quick and easy access to extra-lexicographical data. Hence, old typologies of electronic dictionaries must be replaced by better-informed ones, which make room for considering lexicography as an interdisciplinary discipline of consultation, concerned with the study, design and development of functional tools aimed solely at satisfying human information needs and solving the related problems.
15.2.3. Which Consultation Use Situations Have Been Identified So Far?
Participants agreed that a lexicographic theory cannot be built directly from concrete and individual phenomena that may differ from each other in many ways. For example, an analysis of the log files of several e-dictionaries found no systematic search path from which theoretical assumptions on how users actually search could be formulated. The theory therefore has to be formulated from an abstraction that allows us to refer to types of users, types of user situations, types of user needs and types of data that may satisfy these needs. Within this theoretical framework, participants discussed two related questions, one concerned with use situations and the other (dealt with in 15.2.4 below) with possibilities for the customization of user-needs satisfaction. Regarding use situations, participants presented several search options connected with cognitive and communicative use situations and envisaged several options connected with interpretive and operative use situations, both of which are starting to gain
the attention of lexicographers and are therefore expected to be of interest in the near future.
15.2.4. How Can We Reconcile Abstract Models with Individual Search Options?
As previously indicated, participants took a very active part in the debate on the best way of adapting abstract models to the fact that each consultation is an individual act. The individualization of user-needs satisfaction is therefore a question to be taken seriously, especially since the internet allows lexicographers to provide the necessary mechanisms for individualizing dictionary content. Several options presented at the symposium were promising and might signal the way ahead for reconciling abstract models with individual options. These options involve several information technologies and techniques, some of which, for example, Verlinde’s new interface, merited more attention than others. Our next question presents a real challenge for constructing genuine e-dictionaries and for classifying them.
15.2.5. Is an Information Tool the Same as an Information Database?
As several chapters in this book note, most participants in the symposium considered this question already answered: for them, a defining element of e-lexicography is the difference between an information database and an information tool. In particular, several lexicographical projects, for example, the accounting dictionaries, bear witness to this distinction, which has serious implications for both theoretical and practical lexicography. The distinction means that a single database can be the source of several dictionaries, for example, a dictionary for translating, a dictionary for reception and so on. It is also blurring the very concept of dictionary typology: in lexicographical e-tools conceived according to the principles of individualization, there are neither monofunctional nor multifunctional data access routes, but only individualized ones, so that such e-tools can be viewed as a single multifunctional dictionary with individualized search options within the defined framework of its functions. For example, it was interesting to see in action how four or more dictionaries can be constructed from the same database, which brings us to our next question, relating to the quality of dictionaries. In sum, the future of e-lexicography is in one way or another going to
be connected to the acceptance of the difference between an information tool and an information database.
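The distinction between an information database and an information tool can be illustrated with a minimal sketch. It is not the architecture of the accounting dictionaries or of any other project mentioned in this volume: the record fields, the function names and the sample entry are all assumptions made for illustration. The sketch only shows how one shared record might yield several function-specific articles.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One entry in a hypothetical shared lexicographic database."""
    lemma: str
    definition: str
    equivalents: dict = field(default_factory=dict)    # language code -> equivalent
    collocations: list = field(default_factory=list)


# Assumed mapping from a dictionary function to the data fields it displays.
FUNCTIONS = {
    "reception": ("definition",),
    "translation": ("equivalents",),
    "production": ("definition", "collocations"),
}


def article(record, function):
    """Build the article shown to the user for one function.

    The database record is never changed; each 'dictionary' is only a
    different selection of fields from the same record.
    """
    selected = FUNCTIONS[function]
    return {"lemma": record.lemma, **{name: getattr(record, name) for name in selected}}


# Invented sample data, used only to exercise the sketch.
DATABASE = {
    "asset": Record(
        lemma="asset",
        definition="a resource from which future economic benefits are expected",
        equivalents={"es": "activo"},
        collocations=["current asset", "intangible asset"],
    )
}

if __name__ == "__main__":
    for function in FUNCTIONS:
        print(function, article(DATABASE["asset"], function))
```

In this sketch, adding a further dictionary amounts to adding one line to `FUNCTIONS`, while the database itself stays untouched; whether a real project is organized this way is a separate question.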
15.2.6. What Is a Good Dictionary?
For hundreds of years, the quality of a dictionary has been equated with the amount of data it contains: the more data a dictionary contained, the better. However, this principle has proved to be flawed and should be discarded, as it may lead users to retrieve much more data than they need, a risk that is of the utmost importance for e-lexicography, which is not hampered by space restrictions. Participants in the symposium agreed that a good dictionary is a tool that offers users the help they are looking for, no more and no less, and therefore called for a rethinking of some of the approaches commonly used in dictionary-making. One of them is the role of corpora in e-lexicography (see below). In short, there are two important criteria when evaluating the use and quality of a dictionary: (i) whether the user can find the item that contains the answer to the question that prompted the search, and (ii) how long the search took.
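The two criteria just named can be turned into simple figures once consultations have been recorded. The following sketch assumes a hypothetical record per consultation, with a flag for whether the answer was found and the time taken in seconds; neither the record format nor the function name comes from the studies discussed in this book.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Consultation:
    """One recorded dictionary consultation (hypothetical format)."""
    found_answer: bool   # criterion (i): did the user find the answer?
    seconds: float       # criterion (ii): how long did the search take?


def summarize(consultations):
    """Return the success rate and the mean time of successful searches."""
    if not consultations:
        raise ValueError("at least one consultation is required")
    successes = [c for c in consultations if c.found_answer]
    return {
        "success_rate": len(successes) / len(consultations),
        # The mean is undefined if no search succeeded, hence None.
        "mean_seconds_when_found": mean(c.seconds for c in successes) if successes else None,
    }


if __name__ == "__main__":
    # Invented sample data.
    log = [Consultation(True, 10.0), Consultation(True, 20.0), Consultation(False, 60.0)]
    print(summarize(log))
```

Reporting the time only for successful searches is one possible design choice; an evaluation might equally report the time spent on abandoned searches as a separate figure.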
15.2.7. What Is the Role of Corpora in e-Lexicography?
It is generally observed, particularly in publications that are totally influenced by British lexicographical traditions, that corpus is the magic word in lexicography. We are usually confronted with the sad truth that lexicographers are required to adapt their work and their data selection and processing options to the results generated by the computer. On the other hand, nobody has convincingly shown that aspects typically associated with corpora, for example, frequency markers, have a real influence on the way users actually select data. An open question for the future of e-lexicography is therefore whether corpora have a real role to play, especially in connection with some of the options already available, which were discussed in connection with English dictionaries and which allow users to retrieve data from corpora. More important for the construction of e-dictionaries is our next question, which refers to the human factor in lexicography.
15.2.8. Who Is a Lexicographer?
The basic premise for our future discussions is that lexicography is an independent discipline that aims at making reference tools by integrating
knowledge and skills from different quarters. This means that the construction of e-dictionaries may demand the joint effort of experts in several fields. Consequently, the question of who is a lexicographer was subject to several interpretations, particularly in connection with the democratization trend observed in some lexicographic projects, for example, Wikipedia, which are the result of contributions from different people, many of whom would never call themselves lexicographers. Participants agreed that a lexicographer is any person who takes part in a lexicographic project, and envisaged rapid changes in conjunction with the development of wiki technologies and, therefore, with the necessity of considering lexicography an integral part of information science. This leads us to our next question, which is concerned with the theoretical and practical considerations that have to be learned and unlearned from the past.
15.2.9. What Can We Learn and Unlearn from the Past?
Most participants in the symposium accepted a narrow approach to the concept of electronic lexicography by claiming that only some internet dictionaries, that is, those constructed afresh for the new medium, really have the potential both to challenge accepted views of lexicography and to illustrate the aforementioned paradigm shift, which argues for the inclusion of lexicography as an integral part of information science. However, participants also accepted that the future of e-lexicography should not shy away from its past and referred to the distinction between contemplative and transformative approaches as a way of illuminating the future with experiences from the past. In sum, this view is connected with our first question and reinforces the necessity of avoiding a two-string theory of lexicography, one for printed dictionaries and another for electronic dictionaries.
15.2.10. How Useful Can Information Technologies and Techniques Be?
Last but not least, our final question highlights that some of the systems presented offer options that, first, are concerned with the question of pluri-monofunctionality and, secondly, allow different levels of granularity. The question of pluri-monofunctionality will lead to options that depend on users’ needs (for example, the language and vocabulary in which the dictionary is constructed). Similarly, the level of granularity will be presented in terms of how and when information technologies and techniques such as ‘searching and navigating’, ‘user profiling’, ‘filtering’, ‘adaptive hypermedia’, ‘metadata mark-up’, ‘linked open knowledge’, ‘recommender systems’ and ‘annotated
systems’ can be used. Within this framework, we can expect that in the near future these technologies will be used to upgrade searches, some of which, for example, Boolean searches and maximizing and minimizing searches, were detailed in some of the presentations at the symposium.
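Of the search types mentioned, the Boolean search is the easiest to sketch. The example below is only an illustration under stated assumptions: the entries are invented, the function `boolean_search` and its parameters `all_of`, `any_of` and `none_of` are hypothetical names, and matching is done on whole lower-cased words. It does not reproduce any system presented at the symposium, and it makes no attempt to model maximizing or minimizing searches.

```python
def tokens(text):
    """Split a text into a set of lower-cased words."""
    return set(text.lower().split())


def boolean_search(entries, all_of=(), any_of=(), none_of=()):
    """Return the lemmas whose entry satisfies a simple Boolean query.

    all_of  -> every word must occur (AND)
    any_of  -> at least one word must occur (OR); ignored if empty
    none_of -> no word may occur (NOT)
    """
    hits = []
    for lemma, text in entries.items():
        words = tokens(lemma + " " + text)
        if (
            all(w in words for w in all_of)
            and (not any_of or any(w in words for w in any_of))
            and not any(w in words for w in none_of)
        ):
            hits.append(lemma)
    return sorted(hits)


# Invented sample entries.
ENTRIES = {
    "balance sheet": "statement of assets and liabilities at a given date",
    "income statement": "statement of revenue and expenses for a period",
    "goodwill": "intangible asset arising from a business acquisition",
}

if __name__ == "__main__":
    print(boolean_search(ENTRIES, all_of=["statement"]))
    print(boolean_search(ENTRIES, all_of=["statement"], none_of=["revenue"]))
    print(boolean_search(ENTRIES, any_of=["asset", "assets"]))
```

Because the sketch matches whole words, `asset` and `assets` have to be listed separately; a real dictionary interface would presumably normalize inflected forms before matching.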
15.3. Conclusion
Whenever lexicographical scholars from different traditions meet in the same room, there are bound to be divergent opinions about the topics discussed. This also happened at the symposium held at the University of Valladolid. However, we have to say that some consensus as well as valuable conclusions and recommendations emerged, as the above questions show. Perhaps the most valuable conclusion reached was that lexicography is a term for an independent university discipline and a professional activity. As has been shown elsewhere, ‘nothing is more practical than a good theory’ (Nielsen & Tarp, 2009: ix) and, therefore, we conclude this summary by stating our conviction that the future of e-lexicography will be bright, provided nobody forgets that, whatever their names, dictionaries are essentially tools that aim at satisfying the specific needs specific users may have in specific use situations and, therefore, both theoretical and practical lexicographical activities are, and must always be, interwoven.
References
A. Dictionaries
Internet Dictionaries, CD-ROM Dictionaries, Dictionary Portals, and Dictionary Aggregators:
Accounting dictionaries:
  Nielsen, S., Mourier, L. and Bergenholtz, H. 2003–2009. Danish Accounting Dictionary. www.ordbogen.com [Last accessed: 30 October 2010].
  Nielsen, S., Mourier, L. and Bergenholtz, H. 2003–2009. Danish–English Accounting Dictionary. www.ordbogen.com [Last accessed: 30 October 2010].
  Nielsen, S., Mourier, L. and Bergenholtz, H. 2004–2009. English Accounting Dictionary. www.ordbogen.com [Last accessed: 30 October 2010].
  Nielsen, S., Mourier, L. and Bergenholtz, H. 2004–2009. English–Danish Accounting Dictionary. www.ordbogen.com [Last accessed: 30 October 2010].
  Nielsen, S., Mourier, L., Bergenholtz, H., Fuertes-Olivera, P. A., Gordo Gómez, P., Niño Amo, M., de los Ríos Rodicio, Á., Sastre Ruano, Á., Tarp, S. and Velasco Sacristán, M. 2009. El Diccionario Inglés–Español de Contabilidad. Online at: www.accountingdictionary.dk/regn/gbsp/regngbsp_index.php [Last accessed: 30 October 2010].
Acronym Finder 1988–2010: Acronym Finder: www.acronymfinder.com [Last accessed: 30 October 2010].
Alexandria: Alexandria: www.memodata.com/ [Last accessed: 30 October 2010].
Alexandria Familiale: Alexandria (édition familiale): http://alexandria.sensagent.com [Last accessed: 30 October 2010].
American Heritage Dictionary 2006: The American Heritage® Dictionary of the English Language. 4th edn: http://dictionary.reference.com/help/ahd4.html [Last accessed: 30 October 2010].
Base lexicale du français (BLF) 2010: http://ilt.kuleuven.be/blf [Last accessed: 30 October 2010].
BusinessDictionary 2010: BusinessDictionary.com: www.businessdictionary.com/404.php [Last accessed: 30 October 2010].
Cambridge Dictionaries Online: http://dictionary.cambridge.org [Last accessed: 30 October 2010].
Carnegie Mellon Dictionary: The CMU Pronouncing Dictionary: www.speech.cs.cmu.edu/cgi-bin/cmudict [Last accessed: 30 October 2010].
Chambers Dictionary: Chambers 21st Century Dictionary: www.chambersharrap.co.uk/chambers/index.shtml [Last accessed: 30 October 2010].
CLAVE 2006: Diccionario CLAVE. Madrid: SM (CD-ROM).
Cobuild Dictionary: myCOBUILD.com: www.mycobuild.com/homepage.aspx [Last accessed: 30 October 2010].
Collins: Collins English Free Dictionary: http://collinslanguage.com [Last accessed: 30 October 2010].
Danish Internet Dictionary 2010: Bergenholtz, H., in cooperation with Filip Bodilsen, Kathrine Brosbøl Eriksen, Rasmus Elmelund, Stine Busk Hedegaard, Helene H. Jensen, Torben Jensen, Emilie Dittmer Laursen, Katja Å. Laursen, Heidi Agerbo Pedersen: Den Danske Netordbog. Database and design: Richard Almind: www.dendanskenetordbog.dk [Last accessed: 30 October 2010].
Dansk Parlør: Dansk Parlør: www.parlor.dk/ [Last accessed: 30 October 2010].
Den Danske Netordbog 2011. Edited by H. Bergenholtz, H. A. Pedersen, in collaboration with Filip Bodilsen, Rasmus Elmelund, Kathrine Brosbøl Eriksen, Stine Busk Hedegaard, Helene H. Jensen, Torben Jensen, Emilie Dittmer Laursen. Database: Richard Almind: Den Danske Netordbog. Odense: Ordbogen.com 2011. www.ordbogen.com [Last accessed: 12 April 2011].
Den Danske Ordbog 2010. Edited by Ebba Hjorth, Iver Kjær, Kjeld Kristensen, Ole Norling Kristensen and Lars Trap-Jensen. www.ordnet.dk/ddo [Last accessed: 30 October 2010].
Diccionario de la Lengua Española. www.rae.es [Last accessed: 30 October 2010].
Dictionnaire de l’Académie. 9th edn. www.academie-francaise.fr/dictionnaire [Last accessed: 30 October 2010].
Dictionary: Dictionary.com: http://dictionary.reference.com [Last accessed: 30 October 2010].
Dictionary of fixed expressions 2010:
  Bergenholtz, H. in collaboration with Esben Bjærge. 2010. Fixed Expressions with a Certain Meaning (Faste vendinger med en bestemt betydning). Database and design: Richard Almind. Odense: Ordbogen.com. www.ordbogen.com [Last accessed: 30 October 2010].
  Bergenholtz, H. in collaboration with Esben Bjærge. 2010. Knowledge about Fixed Expressions (Betydning af faste vendinger). Database and design: Richard Almind. Odense: Ordbogen.com. www.ordbogen.com [Last accessed: 30 October 2010].
  Bergenholtz, H. in collaboration with Esben Bjærge. 2010. Meaning of Fixed Expressions (Betydning af faste vendinger). Database and design: Richard Almind. Odense: Ordbogen.com. www.ordbogen.com [Last accessed: 30 October 2010].
  Bergenholtz, H. in collaboration with Esben Bjærge. 2010. Use of Fixed Expressions (Brug af faste vendinger). Database and design: Richard Almind. Odense: Ordbogen.com. www.ordbogen.com [Last accessed: 30 October 2010].
Dorland’s Medical Dictionary 2007: Dorland’s Medical Dictionary: www.merckmedicus.com/pp/us/hcp/thcp_dorlands_content_split.jsp?pg=/ppdocs/us/common/dorlands/drlnd/misc/dmd-a-b-000.htm [Last accessed: 30 October 2010].
DRAE 2003: Diccionario de la Lengua Española. Madrid: Espasa (CD-ROM).
DUE 2001: Diccionario de Uso del Español. Madrid: Gredos (CD-ROM).
ELDIT 2010: Elektronisches Lernerwörterbuch Deutsch-Italienisch: http://dev.eurac.edu:8081/MakeEldit1/Eldit.html [Last accessed: 30 October 2010].
e-WAT 2010: Elektroniese Woordeboek van die Afrikaanse Taal (e-WAT): www.spel.co.za/spwat.html [Last accessed: 30 October 2010].
Encarta: Encarta World English Dictionary: http://encarta.msn.com/encnet/features/dictionary/dictionaryhome.aspx [Last accessed: 30 October 2010].
English Pronouncing Dictionary: English Pronouncing Dictionary: http://seas3.elte.hu/epd.html [Last accessed: 30 October 2010].
Etymology Dictionary: Online Etymology Dictionary: www.etymonline.com [Last accessed: 30 October 2010].
Free Dictionary: The Free Dictionary.com: www.thefreedictionary.com/ [Last accessed: 30 October 2010].
GDEX Dictionary 2008: The GDEX Demo Dictionary: http://forbetterenglish.com/ [Last accessed: 30 October 2010].
GDUEsA 2001: Sánchez, A. 2001. Gran Diccionario de Uso del Español Actual. Madrid: SGEL (CD-ROM).
Glossarist: Glossarist. A searchable directory of glossaries and topical dictionaries: http://glossarist.com/ [Last accessed: 30 October 2010].
Google 2010: Free Google Dictionaries: www.google.com/dictionary?hl=es [Last accessed: 30 October 2010].
Howjsay: Howjsay.com Talking Dictionary: www.howjsay.com/ [Last accessed: 30 April 2010].
Hypermedia Glossary of Genetic Terms 2010: Schlindwein, B. (ed.): Hypermedia Glossary of Genetic Terms: http://hal.weihenstephan.de/genglos/asp/genreq.asp [Last accessed: 30 October 2010].
IATE 2010: Inter Active Terminology for Europe: http://iate.europa.eu/iatediff/SearchByQueryLoad.do?method=load [Last accessed: 30 October 2010].
Longman 2010: Longman Dictionary of Contemporary English Online: www.ldoceonline.com [Last accessed: 30 October 2010].
Macmillan 2010: Macmillan Dictionary: www.macmillandictionary.com [Last accessed: 30 October 2010].
Macmillan Open Dictionary 2010: Macmillan Dictionary. Open Dictionary: www.macmillandictionary.com/open-dictionary/latestEntries.htm [Last accessed: 30 October 2010].
Merriam Webster 2002: Third New International Dictionary. Springfield: Merriam Webster (CD-ROM).
Merriam-Webster’s Online Dictionary 2010: Merriam-Webster’s Learner’s Dictionary: www.learnersdictionary.com [Last accessed: 30 October 2010].
Merriam-Webster Open Dictionary 2010: Merriam-Webster Open Dictionary: www3.merriam-webster.com/opendictionary/ [Last accessed: 30 October 2010].
Microbial Genetics Glossary 2006: Microbial Genetics Glossary: www.sci.sdsu.edu/~smaloy/Glossary/ [Last accessed: 30 October 2010].
Middle English Dictionary 2001: Middle English Dictionary: http://quod.lib.umich.edu/m/med [Last accessed: 30 October 2010].
Musikordbogen 2010: The Danish Music Dictionary: Bergenholtz, I., in cooperation with R. Almind and H. Bergenholtz. The Danish Music Dictionary. www.ordbogen.com [Last accessed: 30 October 2010].
Newbury House 2010: Heinle’s Newbury House Dictionary of American English: http://nhd.heinle.com/home.aspx [Last accessed: 30 October 2010].
OED 2010: Oxford English Dictionary: www.oed.com/ [Last accessed: 30 October 2010].
OneLook: One Look Dictionary Search: www.onelook.com/ [Last accessed: 30 October 2010].
Ordbog over det danske sprog Internet 2010: Ordbog over det danske sprog. Dansk i perioden 1700–1950. ODS på nettet 2010: http://ordnet.dk/ods/ [Last accessed: 30 October 2010].
Ordbogen over Faste Vendinger: Ordbogen over Faste Vendinger. See: Dictionary of fixed expressions 2010.
OWID 2010: Online-Wortschatz-Informationssystem Deutsch: www.owid.de/ [Last accessed: 30 October 2010].
Oxford: Oxford Dictionary of English. 3rd edn; and the New Oxford American Dictionary 2010. 3rd edn: http://oxforddictionaries.com [Last accessed: 30 October 2010].
Oxford Advanced Learner’s Dictionary 2010: Oxford Advanced Learner’s Dictionary. 7th edn: www.oup.com/elt/catalogue/teachersites/oald7/ [Last accessed: 30 October 2010].
Reverso: Online Dictionary: www.reverso.net [Last accessed: 30 October 2010].
RhymeZone 2010: RhymeZone Semantic Dictionary: www.rhymezone.com/ [Last accessed: 30 October 2010].
Svenska Akademiens Ordbok 2010: Svenska Akademiens Ordbok: http://g3.spraakdata.gu.se/saob [Last accessed: 30 October 2010].
Urban Dictionary: Urban Dictionary: www.urbandictionary.com [Last accessed: 30 October 2010].
Thesaurus 2010: Thesaurus.com: http://thesaurus.com/?regHome=true [Last accessed: 30 October 2010].
TLFi 2010: Trésor de la langue française informatisé: http://atilf.atilf.fr/tlf.htm [Last accessed: 30 October 2010].
Visual Thesaurus 1998–2010: The Visual Thesaurus®: www.visualthesaurus.com/ [Last accessed: 30 October 2010].
VOX 2001: Diccionario General de la Lengua Española. Barcelona: VOX (CD-ROM).
W3C 2001: W3C Semantic Web Activity: www.w3.org/2001/sw/ [Last accessed: 30 October 2010].
Webster New World College 2001: Webster New World College Dictionary. Springfield: Merriam Webster (CD-ROM).
Webster’s Online Dictionary 2010: Webster’s Online Dictionary: www.websters-onlinedictionary.org/ [Last accessed: 30 October 2010].
Wiktionary 2010: http://en.wiktionary.org [Last accessed: 30 October 2010].
Wikipedia 2010: http://en.wikipedia.org/wiki/Main_Page [Last accessed: 30 October 2010].
Wordnik 2010: www.wordnik.com [Last accessed: 30 October 2010]. Wortschatz Universität Leipzig 2010: Wortschatz Universität Leipzig 2010: http://wortschatz.uni-leipzig.de [Last accessed: 30 October 2010].
Printed Dictionaries
American Heritage Dictionary of Phrasal Verbs 2005. – The American Heritage Dictionary of Phrasal Verbs. Boston/New York: Houghton Mifflin Company. Biotechnology from A to Z 1993. – Bains, W. 1993. Biotechnology from A to Z. Introduction by G. Kirk Raab. Oxford/New York/Tokyo: Oxford University Press. Biotechnology Glossary 1990. Biotechnology Glossary: Glossaire de Biotechnologie. etc. English français deutsch italiano nederlands dansk español português ελληνικά. London/New York: Elsevier. Bullokar 1623. – Bullokar, J. 1623. An English Expositor. London: John Legatt. Cambridge Advanced Learner’s Dictionary 2005. – The Cambridge Advanced Learner’s Dictionary 2005. Cambridge: Cambridge University Press. Cambridge Phrasal Verbs Dictionary 2006. – Cambridge Phrasal Verbs Dictionary. Cambridge: Cambridge University Press. Collins Cobuild Dictionary of Phrasal Verbs 1989. – Collins Cobuild Dictionary of Phrasal Verbs. London: HarperCollinsPublishers. Danske Talemåder 1998. – Røder, R. 1998. Danske Talemåder. Copenhagen: Gad. Den Danske Ordbog 2003–2005. – Den Danske Ordbog. Copenhagen: Det Danske Sprog- og Litteraturselskab/Gyldendal. Dictionary of Biotechnology 1986. – Combs, J. 1986. Macmillan Dictionary of Biotechnology. London/Basingstoke: Macmillan. English–Spanish Dictionary of Biotechnology 1998. – Kaufmann, U. and Bergenholtz, H. in cooperation with Bjarne Stumman, Sven Tarp, Laura de la Rosa Marabet, Nelson la Serna Torres and Gladys la Serna Miranda 1998. Encyclopedic Dictionary of Gene Technology. English–Spanish. Toronto: Lugus. Florio 1598. – Florio, J. 1598. A Worlde of Wordes. London. GDUEsA 2001. – Sánchez, A. (ed.). 2001. Gran Diccionario de Uso del Español Actual. Madrid: SGEL. Gentechnologie 1990. – Ibelgaufts, H. 1990. Gentechnologie von A bis Z. Weinheim usw.: VCH. Gujin Tushu Jicheng. 1726. Volume 1–5,020. Ed. Menglei Chen and Tingxi Jiang. China. Harrap’s Essential English Dictionary 1995. – Higgleton, E. (Managing Editor) and Anne Seaton (Senior Editor) 1995. Harrap’s Essential English Dictionary. Edinburgh: Harrap. Johnson 1755. – Johnson, S. 1755. A Dictionary of the English Language, 2 vols. London: J. & P. Knapton, T. & T. Longman et al. Longman Phrasal Verbs Dictionary 2000. – Longman Phrasal Verbs Dictionary. Harlow: Pearson Longman. Macmillan Phrasal Verbs Plus 2005. – Macmillan Phrasal Verbs Plus. Oxford: Macmillan Education.
Merriam-Webster 2002. – Third New International Dictionary. Springfield: Merriam-Webster. Multilingual Glossary of Biotechnology Terms 1995. – Leuenberger, Hans Georg W., Nagel, B., Köbl, H. 1995. A Multilingual Glossary of Biotechnological Terms (IUPAC Recommendations) in English, French, German, Japanese, Portuguese, Russian, and Spanish. Basel: Verlag Helvetica Chimica Acta/Weinheim usw.: VCH. New Oxford 1998. – The New Oxford Dictionary of English 1998. Oxford: Oxford University Press. Nordisk leksikografisk ordbok 1997. – Bergenholtz, H., Cantell, I., Fjeld, R. V., Gundersen, D., Jónsson, J. H. and Svensén, B. 1997. Nordisk leksikografisk ordbok. Oslo: Universitetsforlaget. Nudansk Ordbog 2005. – Politikens Nudansk Ordbog 2005. Copenhagen: Politikens Forlag. Ordbog over det danske Sprog 1918–1956. Ordbog over det danske Sprog Bind I–XXVIII, Copenhagen: Gyldendal. Ordbog over det danske sprog paper 1992–2005. Ordbog over det danske sprog (1918–1956) + Ordbog over det danske sprog Supplementsbind. Ordbog over det danske sprog Supplementsbind 1992–2005. Ordbog over det danske sprog. Supplement. Copenhagen: Gyldendal. Oxford Phrasal Verbs 2006. – Oxford Phrasal Verbs Dictionary for Learners of English. Oxford: Oxford University Press. Politikens Musikordbog 1996. – Bergenholtz, Inger: Politikens Musikordbog. Copenhagen: Politiken. Talemåder i dansk 1998. – Andersen, S. T. 1998. Talemåder i dansk. Copenhagen: Munksgaard. Tesauro ISOC de Economía 1995. – Valverde, A. (ed.). Tesauro ISOC de Economía. Madrid: Centro de Información y Documentación Científica. Universal Dictionary of Trade and Commerce 1774. – Postlethwayt, M. (ed.). 1774. The Universal Dictionary of Trade and Commerce With large Additions and Improvements, Adapting the same to the Present State of British Affairs in America, since the last Treaty of Peace made in the Year 1763.
With Great Variety of New Remarks and Illustrations Incorporated through the Whole: Together with Anything essential that is contained in Savary’s Dictionary: Also, all the Material Laws of Trade and Navigation relating to these Kingdoms, and the Customs and Usages to which all Traders are Subject. The Fourth Edn. Vol. 1. London: Printed for W. Strahan, etc. Woordeboek van die Afrikaanse Taal 1951. – Botha, W. F. et al. (eds) 1951. Woordeboek van die Afrikaanse Taal. Stellenbosch: Buro van die WAT. Wörterbuch der Gentechnik 1998. – Oliver, S. G. and Ward, J. M. 1998. Wörterbuch der Gentechnik. Stuttgart: Gustav Fischer Verlag. Yongle Dadian 1408. Volume 1–11,095. Ed. Jin Xie. China.
B. Other Literature
Abel, A. and Campogianni, S. 2005. “Facetten der Bedeutungsbeschreibung – ein integrativer Ansatz in der elektronischen Lernerlexikographie (aufgezeigt am
Beispiel von ELDIT)”, in K. Mard-Miettinen and N. Niemilä (eds): Fachsprachen und Übersetzungstheorie. Vakki-Symposium XXV., Vörå 12.-13.02.2005. Publikationen der Studiengruppe für in Fachsprachenforschung. (Vaasa: Universität Vaasa), 62–72. Abel, A. and Weber, V. 2000. ‘ELDIT – A Prototype of an Innovative Dictionary’, in U. Heid et al. (eds), Vol. II, 807–818. Almela, M. 2006. From Words to Lexical Units: A Corpus-Driven Account of Collocation and Idiomatic Patterning in English and Spanish. Frankfurt: Peter Lang. Almind, R. 2005. ‘Designing Internet Dictionaries’, in I. Barz, H. Bergenholtz and J. Korhonen (eds), 103–118. Almind, R. 2008. ‘Søgemønstre i Logfiler’. LexicoNordica 15, 33–56. Amato, G. and Straccia, U. 1999. ‘User Profile Modeling and Applications to Digital Libraries’. Research and Advanced Technology for Digital Libraries. (Lecture Notes in Computer Science, Volume 1696), 184–197. Andersen, B. 2006. Basic English Grammar. 2nd edn. Frederiksberg: Samfundslitteratur. Andersen, B. and Almind, R. 2011. ‘The Technical Realization of Three Monofunctional Phrasal Verb Dictionaries’. (This volume) Andersen, B. and Leroyer, P. 2008. ‘The Dilemma of Grammatical Data in Travel Dictionaries’. Lexikos 18, 27–45. Andersen, B. and Nielsen, S. 2009. ‘Ten Key Issues in Lexicography for the Future’, in H. Bergenholtz, S. Nielsen and S. Tarp (eds), 355–363. Ankolekar, A., Krötzsch, M., Tran, T. and Vrandečić, D. 2008. ‘The Two Cultures: Mashing up Web 2.0 and the Semantic Web’. Journal of Web Semantics 6: 70–75. Atkins, B. T. S. 1993. ‘Theoretical Lexicography and its Relation to Dictionary-making’. Dictionaries. Journal of The Dictionary Society of North America 14, 4–43. Atkins, B. T. S. 1996. ‘Bilingual Dictionaries – Past, Present and Future’, in M. Gellerstam et al. (eds), 515–546. Atkins, B. T. S. and Rundell, M. 2008. The Oxford Guide to Practical Lexicography. Oxford: Oxford University Press. Atkins, B. T. S., Kilgarriff, A. and Rundell, M. 2010.
‘Database of Analysed Texts of English (Dante): The Neid Database Project’, in A. Dykstra and T. Schoonheim (eds), 549–556. Bank, C. 2010. Die Usability von Online-Wörterbüchern und elektronischen Sprachportalen, (Hildesheim: University of Hildesheim, IwiSt), ms. [MA thesis]. Barz, I., Bergenholtz, H. and Korhonen, J. (eds) 2005. Schreiben, Verstehen, Übersetzen, Lernen. Zu ein- und zweisprachigen Wörterbüchern mit Deutsch. Frankfurt am Main: Peter Lang. Bederson, B. B. and Shneiderman, B. 2003. The Craft of Information Visualization: Readings and Reflections. San Francisco: Morgan Kaufmann. Béjoint, H. 2010. The Lexicography of English. Oxford: Oxford University Press. Bell, R. T. 2000. Translation and Translating. Theory and Practice. London & New York: Longman. Bergenholtz, H. 1995. ‘Leksikografi hvad er det?’, in Á. Svavarsdóttir, G. Kvaran, J. Jónssonn (eds), Nordiske studier i leksikografi 3. Rapport fra Konference om leksikografi i Norden. Reykjavik 7–10. June 1995. Reykjavik: Nordisk forening for leksikografi, 37–49.
Bergenholtz, H. 2003. ‘User-oriented Understanding of Descriptive, Proscriptive and Prescriptive Lexicography’. Lexikos 13, 65–80. Bergenholtz, H. 2009. ‘Hurtig og sikker tilgang til informationer om faste forbindelser’. LexicoNordica 16, 29–54. Bergenholtz, H. 2010. ‘Access to and Presentation of Needs-adapted Data in Monofunctional Internet Dictionaries’. Presentation at the acceptance of an honorary doctorate at the University of Valladolid, Valladolid, Spain, 17 June 2010. Bergenholtz, H. (in preparation). ‘Zugriff und Präsentation von Daten in Fachwörterbüchern, Lexika und Enzyklopädien’, in H. R. Spiegel, Z. Berdychowska, H. Bergenholtz and S. Habscheid (eds), Fachsprachen in Theorie und Praxis. (= Akten des IVG-Kongresses 2010 in Warschau, 30 July to 7 August). IVG: Warschau 2011. Bergenholtz, H. and Bergenholtz, I. 2011. ‘A Dictionary Is a Tool, a Good Dictionary Is a Monofunctional Tool’ (This volume). Bergenholtz, H. and Bjærge, E. 2009. ‘Konception af fire monofunktionelle ordbøger med faste vendinger’. LexicoNordica 16, 55–74. Bergenholtz, H. and Gouws, R. H. 2007a. ‘The Access Process for Fixed Expressions’. Lexicographica 23, 236–260. Bergenholtz, H. and Gouws, R. H. 2007b. ‘Korrek, Volledig, Relevant. Dit is die vraag aan leksikografiese definisies’. Tydskrif vir Geesteswetenskappe 47/4: 568–586. Bergenholtz, H. and Gouws, R. H. 2010. ‘A New Perspective on the Access Process’. Hermes. Journal of Language and Business Communication 44: 103–127. Bergenholtz, H. and Gouws, R. H. (to be published). ‘Who Is a Lexicographer?’ Bergenholtz, H., Gouws, R. H. and Claassen, W. T. (unpublished). ‘Ein Modell für ein Wörterbuch Deutsch-Afrikaans’. Bergenholtz, H. and Johnsen, M. 2005. ‘Log Files as a Tool for Improving Internet Dictionaries’. Hermes. Journal of Linguistics 34, 117–141. Bergenholtz, H. and Johnsen, M. 2007. ‘Log Files Can and Should Be Prepared for a Functionalistic Approach’. Lexikos 17, 1–20. Bergenholtz, H. and Johnsen, M. 2011. 
‘User Research in the Field of Electronic Dictionaries: Methods, First Results’, in R. H. Gouws, U. Heid, W. Schweickard and H. E. Wiegand (eds). Bergenholtz, H. and Kaufmann, U. 1997. ‘Enzyklopädische Informationen in Wörterbüchern’, in N. Weber and H. E. Wiegand (eds), Semantik, Lexikographie und Computeranwendungen. Tübingen: Niemeyer, 168–182. Bergenholtz, H. and Nielsen, S. 2006. ‘Subject-field Components as Integrated Parts of LSP Dictionaries’. Terminology 12(2), 281–303. Bergenholtz, H. and Tarp, S. (eds) 1995. Manual of Specialised Lexicography. The Preparation of Specialised Dictionaries. Amsterdam/Philadelphia: John Benjamins. Bergenholtz, H. and Tarp, S. 2002. ‘Die moderne lexikographische Funktionslehre. Diskussionsbeitrag zu neuen und alten Paradigmen, die Wörterbücher als Gebrauchsgegenstände verstehen’. Lexicographica 18: 253–263.
Bergenholtz, H. and Tarp, S. 2003. ‘Two Opposing Theories: On H. E. Wiegand’s Recent Discovery of Lexicographic Functions’. Hermes. Journal of Linguistics 31: 171–196. Bergenholtz, H. and Tarp, S. 2004. ‘The Concept of Dictionary Usage’. Nordic Journal of English Studies 3: 23–36. Bergenholtz, H. and Tarp, S. 2005a. ‘Verteilungsstrukturen in Wörterbüchern’, in I. Barz, H. Bergenholtz and J. Korhonen (eds), 119–126. Bergenholtz, H. and Tarp, S. 2005b. ‘Electronic Dictionaries: Old and New Lexicographic Solutions’. Hermes. Journal of Linguistics 34: 7–9. Bergenholtz, H. and Tarp, S. 2005c. ‘Wörterbuchfunktionen’, in I. Barz, H. Bergenholtz and J. Korhonen (eds), 11–25. Bergenholtz, H. and Tarp, S. 2010. ‘LSP Lexicography or Terminography? The Lexicographer’s Point of View’, in P. A. Fuertes-Olivera (ed.), 27–38. Bergenholtz, H., Nielsen, S. and Tarp, S. (eds) 2009. Lexicography at a Crossroads, Dictionaries and Encyclopedias Today, Lexicographical Tools Tomorrow. Bern: Peter Lang. Bernal, E. and DeCesaris, J. (eds) 2008. Proceedings of the XIII Euralex International Congress. Barcelona: Universitat Pompeu Fabra. Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Longman. Bizer, C., Heath, T., Idehen, K. and Berners-Lee, T. 2008. ‘Linked Data on the Web (LDOW2008)’. Proceedings WWW2008, Beijing, China. ACM, 1265–1266. Bogaards, P. 1990. ‘Où cherche-t-on dans le dictionnaire?’ International Journal of Lexicography 3(2): 79–103. Bogaards, P. 2010. ‘The Evolution of Learners’ Dictionaries and Merriam-Webster’s Advanced Learner’s English Dictionary’, in I. Kernerman and P. Bogaards (eds), 11–27. Boshoff, S. P. E. 1926. ‘’n Standaardwoordeboek van Afrikaans’. Gedenkboek ter ere van die GRA. Potchefstroom, 307–328. Bothma, TJD. 2011. ‘Filtering and Adapting Data and Information in the Online Environment in Response to User Needs’. (This volume) Botha, W. F. 2003.
Die impak van die leksikografieteorie op die samestelling van die Woordeboek van die Afrikaanse Taal. Unpublished doctoral dissertation. University of Stellenbosch. Brusilovsky, P. 1996. ‘Methods and Techniques of Adaptive Hypermedia’. User Modeling and User-Adapted Interaction 6(2–3): 87–129. Brusilovsky, P. 2007. ‘Adaptive Navigation Support’, in P. Brusilovsky, A. Kobsa and W. Nejdl (eds), The Adaptive Web: Methods and Strategies of Web Personalization. Berlin: Springer, 263–290. Brusilovsky, P. and Maybury, M. T. 2002. ‘From Adaptive Hypermedia to the Adaptive Web’. Communications of the ACM 45(5): 31–33. Brusilovsky, P. and Millán, E. 2007. ‘User Models for Adaptive Hypermedia and Adaptive Educational Systems’, in P. Brusilovsky, A. Kobsa and W. Nejdl (eds), The Adaptive Web: Methods and Strategies of Web Personalization. Berlin: Springer, 3–53. Brusilovsky, P., Sosnovsky, S. and Yudelson, M. 2009. ‘Addictive Links: The Motivational Value of Adaptive Link Annotation’. New Review of Hypermedia and Multimedia 15(1): 97–118. Bunt, A., Carenini, G. and Conati, C. 2007. ‘Adaptive Content Presentation for the Web’. The Adaptive Web: Methods and Strategies of Web Personalization. Berlin: Springer, 409–432.
Burchardt, A., Erk, K., Frank, A., Kowalski, A., Padó, S. and Pinkal, M. 2006. ‘The SALSA Corpus: A German Corpus Resource for Lexical Semantics’, in Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC2006). Genoa, Italy. Cantos, P. and Sánchez, A. 2001. ‘Lexical Constellations: What Collocates Fail to Tell’. International Journal of Corpus Linguistics 6(2): 199–228. Carr, M. 1997. ‘Internet Dictionaries and Lexicography’. International Journal of Lexicography 10(3): 209–230. Chapman, R. W. 1948. Lexicography. London: Oxford University Press. Chun, D. and Plass, J. L. 1996. ‘Effects of Multimedia Annotations on Vocabulary Acquisition’. Modern Language Journal 80(2): 183–198. Church, K. W. 2008. ‘Approximate Lexicography and Web Search’, in T. Hanks (ed.), Special Issue: The Legacy of John Sinclair. International Journal of Lexicography 21(3): 325–336. Combrink, J. G. H. 1962. ‘’n Prinsipiële beskouing oor WAT IV’. Tydskrif vir Geesteswetenskappe 2(4): 199–221. Combrink, J. G. H. 1979. ‘Die sesde deel van die WAT’. Standpunte 140: 49–64. Cosijn, E. 2006. ‘Relevance Judgements within the Context of Work Tasks’. Proceedings of the 1st international conference on Information interaction in context, IIiX. ACM, 20–29. Cosijn, E. and Ingwersen, P. 2000. ‘Dimensions of Relevance’. Information Processing and Management: an International Journal 36(4): 533–550. Cowie, A. P. 1999. English Dictionaries for Foreign Learners: A History. Oxford: Oxford University Press. Crystal, D. 1986. ‘The Ideal Dictionary, Lexicographer and User’, in R. F. Ilson (ed.), 72–81. Dagut, M. and Laufer, B. 1985. ‘Avoidance of Phrasal Verbs – A Case for Contrastive Analysis’. SSLA 7: 73–80. Darwin, C. M. and Gray, L. S. 1999. ‘Going after the Phrasal Verb: An Alternative Approach to Classification’. TESOL Quarterly 33(1): 65–83. Delavigne, V. 2008. ‘Élaboration d’une ressource lexicographique informatisée pour les patients atteints de cancer’.
Interactions & Usages autour du Document Numérique. CIDE 11: 97–108. De Schryver, G. M. 2003. ‘Lexicographers’ Dreams in the Electronic-Dictionary Age’. International Journal of Lexicography 16(2): 143–199. De Schryver, G. M. 2009. ‘State-of-the-Art Software to Support Intelligent Lexicography’, in R. Zhu (ed.), Proceedings of the International Seminar on Kangxi Dictionary & Lexicology. Beijing: Beijing Normal University, 565–580. De Schryver, G. M. and Prinsloo, D. J. 2000. ‘The Concept of Simultaneous Feedback: Towards a New Methodology for Compiling Dictionaries’. Lexikos 10, 1–31. DIN EN ISO 9241:10 (2003), (Berlin: Beuth) 2003 [= DIN-Taschenbuch 354]. DIN EN ISO 9241:110 (2006), (Berlin: Beuth) 2006. Dodd, W. S. 1989. ‘Lexicocomputing and the Dictionary of the Future’, in G. James (ed.), Lexicographers and Their Works. Exeter Linguistic Studies 14. Exeter: Exeter University Press, 83–93. Dolog, P. and Nejdl, W. 2007. ‘Semantic Web Technologies for the Adaptive Web’, in P. Brusilovsky, A. Kobsa and W. Nejdl (eds), The Adaptive Web: Methods and Strategies of Web Personalization. Berlin: Springer, 617–696.
Doroszewski, W. 1954. Z zagadnień leksykografii polskiej. Warszawa. Downing, A. and Locke, P. 2002. A University Course in English Grammar. London/New York: Routledge. Dumas, J. S. and Redish, J. C. 1999. A Practical Guide to Usability Testing. Bristol, UK: Intellect Books. Dykstra, A. and Schoonheim, T. (eds) 2010. Proceedings of the XIV Euralex International Congress. Ljouwert: Afûk. Elisenda, B. and DeCesaris, J. (eds) 2008. Proceedings of the XIII EURALEX International Congress. Barcelona: Universitat Pompeu Fabra. Farrar, S. and Langendoen, D. T. 2003. ‘A Linguistic Ontology for the Semantic Web’. GLOT International 7(3): 97–100. Feldweg, H. and Breidt, E. 1996. ‘COMPASS – An Intelligent Dictionary System for Reading Text in a Foreign Language’. Papers in Computational Lexicography (COMPLEX 96). Budapest: Linguistics Institute, 53–62. Fontenelle, T. 2006. ‘Les nouveaux outils de correction linguistique de Microsoft’, in P. Mertens, C. Fairon, A. Dister and P. Watrin (eds): ‘TALN06 – Verbum ex machina – Actes de la 13e Conférence sur le traitement automatique des langues naturelles’ (Leuven, 10–13 avril 2006). UCL: Presses Universitaires de Louvain, 3–19. Fontenelle, T. 2009. http://blogs.msdn.com/correcteurorthographiqueoffice/archive/2009/07/16/un-correcteur-contextuel-français-dans-office-2010.aspx Fontenelle, T., Hiligsmann, P., Michiels, A., Moulin, A. and Theissen, S. (eds) 1998. ACTES EURALEX’98 PROCEEDINGS, Communications soumises à EURALEX’98 (Huitième Congrès International de Lexicographie) Liège, Belgique/Papers submitted to the Eighth EURALEX International Congress on Lexicography in Liège, Belgium. Liège: English and Dutch Departments, University of Liège. Fuertes-Olivera, P. A. 2009a. ‘Systematic Introductions in Specialised Dictionaries’, in S. Nielsen and S. Tarp (eds), 161–178. Fuertes-Olivera, P. A. 2009b.
‘The Function Theory of Lexicography and Electronic Dictionaries: Wiktionary as a Prototype of Collective Free Multiple-Language Internet Dictionary’, in H. Bergenholtz, S. Nielsen and S. Tarp (eds), 99–134. Fuertes-Olivera, P. A. (ed.) 2010a. Specialised Dictionaries for Learners. Berlin & New York: De Gruyter. Fuertes-Olivera, P. A. 2010b. ‘Introduction. Specialised Dictionaries for Learners’, in P. A. Fuertes-Olivera (ed.), 17–24. Fuertes-Olivera, P. A. 2010c. ‘Lexicography for the Third Millennium. Free Institutional Internet Terminological Dictionaries for Learners’, in P. A. Fuertes-Olivera (ed.), 193–209. Fuertes-Olivera, P. A. and Nielsen, S. 2008. ‘Translating Politeness in Bilingual English–Spanish Business Correspondence’. Meta 53(3): 667–678. Fuertes-Olivera, P. A. and Pizarro-Sánchez, I. 2002. ‘Translation and ‘Similarity-creating Metaphors’ in Specialised Languages’. Target 14(1): 43–73. Garvin, P. L. 1955. ‘Problems in American Indian Lexicography and Text Edition’. Anais do XXXI Congr. International de Americanistas: 1013 ff. Gauch, S., Speretta, M., Chandramouli, A. and Micarelli, A. 2007. ‘User Profiles for Personalized Information Access’, in P. Brusilovsky, A. Kobsa and W. Nejdl (eds), The Adaptive Web: Methods and Strategies of Web Personalization. Berlin: Springer, 54–89.
Gellerstam, M., Jarborg, J., Malmgren, S. G., Noren, K., Rogström, L. and Röjder Papmehl, C. (eds) 1996. EURALEX’96 Proceedings. Göteborg: Department of Swedish, Göteborg University. Gottlieb, H. and Mogensen, J. E. (eds) 2007. Dictionary Visions, Research and Practice. Amsterdam/Philadelphia: John Benjamins. Gouws, R. H. 1985. ‘Die sewende deel van die Woordeboek van die Afrikaanse Taal’. Standpunte 178: 13–25. Gouws, R. H. 1997. ‘Linguistische Theorie, lexikographische Praxis und das Woordeboek van die Afrikaanse Taal’, in K.-P. Konerding and A. Lehr (eds), Linguistische Theorie und lexikographische Praxis. Tübingen: Max Niemeyer, 17–31. Gouws, R. H. 2004. ‘Milestones in Metalexicography’, in P. G. J. van Sterkenburg (ed.), Linguistics Today – Facing a Greater Challenge. Amsterdam/Philadelphia: John Benjamins, 187–205. Gouws, R. H. 2005. ‘Oor die verhouding tussen woordeboekstrukture, woordeboekinhoud en leksikografiese funksies’. Lexikos 15, 52–69. Gouws, R. H. 2006. ‘Die zweisprachige Lexikographie Afrikaans-Deutsch – Eine metalexikographische Herausforderung’, in A. Dimova, V. Jesenšek and P. Petkov (eds), Zweisprachige Lexikographie und Deutsch als Fremdsprache. Hildesheim: Georg Olms Verlag, 49–58. Gouws, R. H. 2007. ‘Sublemmata or Main Lemmata: A Critical Look at the Presentation of some Macrostructural Elements’, in H. Gottlieb and J. E. Mogensen (eds), 55–69. Gouws, R. H. 2009. ‘Dictionaries as Innovative Tools in a New Perspective on Standardisation’, in H. Bergenholtz, S. Nielsen and S. Tarp (eds), 265–283. Gouws, R. H. 2011. ‘Learning, Unlearning and Innovation in the Planning of Electronic Dictionaries’. (This volume) Gouws, R. H. and Leroyer, P. 2009. ‘V leksikografiese toeganklikheid in die oorgang van ’n toeristewoordeboek na ’n toeristegids as naslaanbron’. Tydskrif vir Geesteswetenskappe 49(1): 145–159. Gouws, R. H. and Steyn, M. 2005. ‘Integrated Outer Texts: a Transtextual Approach to Lexicographic Functions’, in I.
Barz, H. Bergenholtz and J. Korhonen (eds), Schreiben, Verstehen, Übersetzen, Lernen. Zu ein- und zweisprachigen Wörterbüchern mit Deutsch. Frankfurt am Main: Peter Lang, 127–136. Gouws, R. H., Heid, U., Schweickard, W. and Wiegand, H. E. (eds) 2011. Dictionaries. An International Encyclopedia of Lexicography. Supplementary volume: Recent Developments with Special Focus on Computational Lexicography. Berlin: Mouton de Gruyter. (in print, to be published in 2012) Granger, S. 2003. ‘Error Tagged Learner Corpora and CALL: A Promising Synergy’. CALICO Journal 20(3): 465–480. Grefenstette, G. 1998. ‘The Future of Linguistics and Lexicographers: Will There be Lexicographers in the Year 3000?’, in T. Fontenelle et al. (eds), 25–41. Grefenstette, G., Heid, U., Schulze, B. M., Fontenelle, T. and Gerardy, C. 1996. ‘The DECIDE Project: Multilingual Collocation Extraction’, in M. Gellerstam et al. (eds), 93–107. Grobler, H. 1978. ‘’n Voorlopige toepassing van S. P. E. Boshoff se kriteria vir ’n groot woordeboek op WAT I–VI’. Klasgids 12(4): 29–46.
Haas, M. R. 1962. ‘What Belongs in a Bilingual Dictionary?’, in F. W. Householder and S. Saporta (eds), 45–50. Hanks, P. 2009. ‘Review of Stephen J. Perrault (ed.). 2008. Merriam-Webster’s Advanced Learner’s English Dictionary’. International Journal of Lexicography 22(3): 301–315. Harris, R. and Hutton, C. 2007. Definition in Theory and Practice. Language, Lexicography and the Law. London: Continuum. Hartmann, R. R. K. (ed.). 1984. LEX’eter ’83 Proceedings. Tübingen: Max Niemeyer. Hartmann, R. R. K. 1989. ‘Sociology of the Dictionary User: Hypothesis and Empirical Studies’, in F. J. Hausmann et al. (eds), 102–111. Hausmann, F. J. and Wiegand, H. E. 1989. ‘Component Parts and Structures of General Monolingual Dictionaries’, in F. J. Hausmann et al. (eds), 328–360. Hausmann, F. J., Reichmann, O., Wiegand, H. E. and Zgusta, L. (eds) 1989–1991. Wörterbücher. Dictionaries. Dictionnaires. An International Encyclopedia of Lexicography. Berlin: De Gruyter. He, D., Brusilovsky, P., Grady, J., Li, Q. and Ahn, J. W. 2007. ‘How Up-to-Date Should It Be? The Value of Instant Profiling and Adaptation in Information Filtering’. Proceedings of the 2007 international conference on Web Intelligence, WI ’07 (Silicon Valley, CA, USA). IEEE, 699–705. Heid, U. 2009. ‘Aspects of Lexical Description for Electronic Dictionaries’, in Proceedings of eLexicography 2009, Louvain-La-Neuve, 2009. Paper read at ‘e-Lexicography in the Twenty-First Century: New Challenges, New Applications (eLEX2009)’, held at the Université Catholique de Louvain, 22–24 October 2009. Heid, U. 2011. ‘Electronic Dictionaries as Tools: Toward an Assessment of Usability’. (This volume) Heid, U., Evert, S., Lehmann, E. and Rohrer, C. (eds) 2000. Proceedings of the Ninth Euralex International Congress, EURALEX 2000, Stuttgart, Germany, 8th–12th August 2000. Stuttgart: Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart. Heift, T. and Schulze, M. 2007.
Errors and Intelligence in Computer-Assisted Language Learning. Parsers and Pedagogues. London & New York: Routledge. Hein, P. 2010. Piet Hein’s Homepage. www.piethein.com/usr/piethein/HomepagUK.nsf Herbst, T. and Popp, K. (eds) 1999. The Perfect Learners’ Dictionary. Tübingen: Max Niemeyer Verlag. Hoey, M. 2005. Lexical Priming. A New Theory of Words and Language. London: Routledge. Householder, F. W. and Saporta, S. (eds) 1967. Problems in Lexicography, Report of the Conference on Lexicography held at Indiana University, 11–12 November 1960. Bloomington: Indiana University. Hulstijn, J. H. and Atkins, B. T. S. 1998. ‘Empirical Research on Dictionary Use in Foreign-Language Learning: Survey and Discussion’, in B. T. S. Atkins (ed.), Using Dictionaries. Studies of Dictionary Use by Language Learners and Translators. Tübingen: Niemeyer, 7–19.
Hulstijn, J. H. and Marchena, E. 1989. ‘Avoidance: Grammatical or Semantic Causes’. SSLA 11: 241–255. Illich, I. 1973. Tools for Conviviality. London: Fontana. Ilson, R. F. (ed.) 1986. Lexicography: An Emerging International Profession. (Fulbright Papers 1.) Manchester: Manchester University Press. Ingwersen, P. 2001. ‘Cognitive Information Retrieval’. Annual Review of Information Science and Technology 34, 3–51. Ingwersen, P. 2007. ‘Context in Information Interaction – Revisited 2006’, in TJD Bothma and A. Kaniki (eds), ProLISSA 2006: Proceedings of the Fourth Biennial DISSAnet Conference. Farm Inn, Pretoria, 2–3 November. Pretoria: Infuse, 13–23. Ingwersen, P. and Järvelin, K. 2004. ‘Context in Information Interaction’, in TJD Bothma and A. Kaniki (eds), Progress in Library and Information Science in Southern Africa (ProLISSA): Proceedings of the Third Biennial DISSAnet Conference. Pretoria: Infuse, 301–310. Ingwersen, P. and Järvelin, K. 2005. The Turn: Integration of Information Seeking and Retrieval in Context. Dordrecht: Springer. ISO 9241–11. 1998. Ergonomic Requirements for Office Work with Visual Display Terminals (VDTs) – Part 11. Geneva: International Organization for Standardization. ISO 9241–110. 2006. Ergonomics of Human-System Interaction – Part 110: Dialogue Principles. Geneva: International Organization for Standardization. ISO/FDIS 12620. 2009. Terminology and Other Language and Content Resources – Specification of Data Categories and Management of a Data Category Registry for Language Resources. Geneva: International Organization for Standardization. ISO/FDIS 24613. 2008. Language Resource Management – Lexical Markup Framework (LMF). Geneva: International Organization for Standardization. Järvelin, K. and Ingwersen, P. 2010. ‘User-oriented and Cognitive Models of Information Retrieval’. Encyclopedia of Library and Information Sciences. Third Edn. London: Taylor & Francis. Johnson, S. 1747. The Plan of a Dictionary of the English Language.
Facsimile edition – 1970. Menston: The Scolar Press. Kang, H. and Shneiderman, B. 2000. ‘Visualization Methods for Personal Photo Collections: Browsing and Searching in the PhotoFinder’. 2000 IEEE international conference on multimedia and expo, ICME 2000. IEEE. Vol 3, 1539–1542. Kernerman, I. and Bogaards, P. (eds) 2010. English Learners’ Dictionaries at the DSNA 2009. Tel Aviv: K. Dictionaries. Kilgarriff, A. and Rychlý, P. 2010. ‘Semi-Automatic Dictionary Drafting’, in G-M. de Schryver (ed.), A Way with Words: Recent Advances in Lexical Theory and Analysis. A Festschrift for Patrick Hanks. Kampala: Menha Publishers, 299–312. Kilgarriff, A., Husak, M., McAdam, K., Rundell, M. and Rychlý, P. 2008. ‘GDEX: Automatically Finding Good Dictionary Examples in a Corpus’, in E. Bernal and J. DeCesaris (eds), 425–432. Knutov, E., De Bra, P. and Pechenizkiy, M. 2009. ‘AH 12 Years Later: A Comprehensive Survey of Adaptive Hypermedia Methods and Techniques’. New Review of Hypermedia and Multimedia 15(1): 5–38.
Kobsa, A. 2007. ‘Generic User Modeling Systems’, in P. Brusilovsky, A. Kobsa and W. Nejdl (eds), The Adaptive Web: Methods and Strategies of Web Personalization. Berlin: Springer, 136–154. Krömker, H. 2007. ‘Usability – Stand der Forschung’, in J. Henning and M. Tjarks-Sobhani (eds), Usability und Technische Dokumentation. Lübeck: Schmidt-Römhild, 12–23. Landau, S. I. 2001. Dictionaries: The Art and Craft of Lexicography (2nd edn). Cambridge: Cambridge University Press. Laufer, B. and Eliasson, S. 1993. ‘What Causes Avoidance in L2 Learning – L1–L2 Difference, L1–L2 Similarity, or L2 Complexity?’ SSLA 15: 35–48. Laurén, C. 1993. Fackspråk. Form, innehåll, funktion [Specialised Language. Form, content, function]. Lund: Studentlitteratur. Leech, G. and Nesi, H. 1999. ‘Moving towards Perfection: The Learners’ (Electronic) Dictionary of the Future’, in T. Herbst and K. Popp (eds), 295–306. Leroyer, P. 2007. ‘Bringing Corporate Dictionary Design into Accord with Corporate Image. From Words to Messages and back again’, in H. Gottlieb and J. E. Mogensen (eds), 109–117. Leroyer, P. 2008a. ‘Maultasche og Novillada. En teori for turistordbøger’, in Á. Svavarsdóttir, G. Kvaran, G. Ingólfsson and J. H. Jónsson (eds), Nordiske Studier i Leksikografi 9, Rapport fra Konference om Leksikografi i Norden, Akureyri 22–26. maj 2007. Reykjavik, Stofnun Árna Magnússonar í íslenskum fræðum og Nordisk Forening for Leksikografi i samarbejde med Språkrådet i Norge. Leroyer, P. 2008b. ‘Les Mutations de la Lexicographie Touristique’. Cahiers de linguistique, 103–116. Leroyer, P. 2009a. ‘Balancing the Tools: The Functional Transformation of Lexicographic Tools for Tourists’, in S. Nielsen and S. Tarp (eds), 103–122. Leroyer, P. 2009b. ‘Lexicography Hits the Road: New Information Tools for Tourists’, in H. Bergenholtz, S. Nielsen and S. Tarp (eds), 285–310. Leroyer, P. 2010. ‘Ej blot til lyst. Konsultation og navigation i leksikografiske informationsværktøjer’.
Nordiske Studier i Leksikografi 10. red. av Harry Lönnroth och Kristina Nikula. Tammersfors: Tammersforsk Universitet, 313–328. Lew, R. 2008. ‘Lexicographic Functions and Pedagogical Lexicography: Some Critical Notes on Sven Tarp’s Lexicography in the Borderland between Knowledge and Non-knowledge’, in K. Iwan and I. Korpaczewska (eds), Przegląd Humanistyczny. Pedagogika. Politologia. Filologia. Szczecin: Szczecińska Szkoła Wyższa Collegium Balticum, 114–123. Lew, R. 2009. ‘Towards Variable Function-Dependent Sense Ordering in Future Dictionaries’, in H. Bergenholtz, S. Nielsen and S. Tarp (eds), 237–264. Lew, R. 2010. ‘Multimodal Lexicography: The Representation of Meaning in Electronic Dictionaries’. Lexikos 20: 290–306. Lew, R. and Doroszewska, J. 2009. ‘Electronic Dictionaries Entries with Animated Pictures: Lookup Preferences and Word Retention’. International Journal of Lexicography 22(3): 239–257. Liao, Y. and Fukuya, Y. J. 2004. ‘Avoidance of Phrasal Verbs: The Case of Chinese Learners of English’. Language Learning 54: 193–226. McArthur, T. 1986. Worlds of Reference. Cambridge: Cambridge University Press.
Mossop, B. 2007. Revising and Editing for Translators. Manchester/Kinderhook: St. Jerome Publishing. Müller, P. 2010. Entwicklung einer dynamischen Web-Benutzeroberfläche für ein graph-basiertes Lexikonmodell. Undergraduate thesis. Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart. Müller-Spitzer, C. 2010. ‘OWID – A dictionary net for corpus-based lexicography of contemporary German’, in A. Dykstra and T. Schoonheim (eds): Proceedings of the Fourteenth EURALEX International Congress, Leeuwarden, Netherlands, 6th–10th July 2010, 442–452. Nesi, H. 2000. ‘Electronic Dictionaries in Second Language Vocabulary Comprehension and Acquisition: the State of the Art’, in U. Heid et al. (eds), 839–847. Nesi, H. and Meara, P. 1994. ‘Patterns of Misinterpretation in the Productive Use of ESL Dictionary Definitions’. System, 22: 1–15. Nielsen, J. 1993. Usability Engineering. San Diego, CA: Academic Press. Nielsen, S. 1990. ‘Contrastive Description of Dictionaries Covering LSP Communication’. Fachsprache/International Journal of LSP 3–4/1990, 129–136. Nielsen, S. 2002. Lexicographical Basis for an Electronic Bilingual Accounting Dictionary: Theoretical Considerations. www.sprog.asb.dk/sn/lexicographicalbasis.htm (Last accessed 31 August 2010). Nielsen, S. 2006. ‘Monolingual Accounting Dictionaries for EFL Text Production’. Ibérica 12: 43–64. Nielsen, S. 2008. ‘The Effect of Lexicographical Information Cost on Dictionary Making and Use’. Lexikos 18: 170–189. Nielsen, S. 2009. ‘Reviewing Printed and Electronic Dictionaries. A Theoretical and Practical Framework’, in S. Nielsen, and S. Tarp (eds), 23–42. Nielsen, S. and Almind, R. 2011. ‘From Database to Dictionary’. (This volume) Nielsen, S. and Mourier, L. 2005. ‘Internet Accounting Dictionaries: Present Solutions and Future Possibilities’. Hermes – Journal of Linguistics 34: 83–116. Nielsen, S. and Mourier, L. 2007. ‘Design of a Function-based Internet Accounting Dictionary’, in Gottlieb, H. and J. E.
Mogensen (eds), 119–135. Nielsen, S. and Tarp, S. (eds) 2009. Lexicography in the 21st Century. In honour of Henning Bergenholtz. Amsterdam / Philadelphia: John Benjamins. Nielsen, S. and Tarp, S. 2009a. ‘Introduction: Nothing Is More Practical Than a Good Theory’, in S. Nielsen and S. Tarp (eds), ix–xi. Nord, C. 2005. Text Analysis in Translation. Theory, Methodology, and Didactic Application of a Model for Translation-Oriented Text Analysis. Amsterdam & New York: Rodopi. Odendal, F. F. 1979. ‘Plus positief en plus negatief’. Tydskrif vir Geesteswetenskappe 19(1): 24–41. Ooi, V. B. Y. 1998. Computer Corpus Lexicography. Edinburgh: Edinburgh University Press. Ooi, V. B. Y. 2008. ‘The Lexis of Electronic Gaming on the Web: A Sinclairian Approach’, in T. Hanks (ed.), Special Issue: The Legacy of John Sinclair. International Journal of Lexicography 21(3): 311–323. Országh, L. 1962. A szótáríras elmélete és gyarkorlate a Magyar nyelv ertelmezo szótárában. Budapest.
Pecman, M., Juilliard, C., Kübler, N., and Volanschi, A. 2009. ‘Processing Collocations in a Terminological Database Based on a Cross-disciplinary Study of Scientific Texts’, in S. Granger and M. Paquot (eds), eLexicography in the 21st Century: New Challenges, New Applications. Proceedings of eLex 2009. Presses Universitaires de Louvain, Cahiers du Cental, 249–262. Pedersen, J. 1995. ‘Systematic Classification’. In H. Bergenholtz and S. Tarp (eds), 83–90. Prinsloo, D. J. 2005. ‘Electronic Dictionaries Viewed from South Africa’. Hermes. Journal of Linguistics 34: 11–35. Prinsloo, D. J. 2009. ‘The Role of Corpora in Future Dictionaries’, in S. Nielsen and S. Tarp (eds), 181–206. Prószéky, G. and Földes, A. 2005. ‘Between Understanding and Translating: A Context-sensitive Comprehension Tool’. Archive of Control Sciences, 15(4): 637–644. Pruvost, J. 2000. ‘Colloquium Report: “Des dictionnaires papier aux dictionnaires électroniques”. VIIe Journée des Dictionnaires (22 mars 2000)’. International Journal of Lexicography 13(3): 187–193. Pruvost, J. 2006. ‘Avant-propos: A propos du troisième élément du triptyque: la métalexicographie, aux côtés de la lexicographie et de la dictionairique’. Cahiers de lexicologie, 88: 5–8. Ptaszynski, M. O. 2010. ‘Forbedring af datatilgang i elektroniske opslagsværker i forbindelse med kognitive brugersituationer’, in H. Lönnroth and K. Nikula (eds), Nordiska studier i lexigografi 10. Rapport fra Konferensen om lexikografi i Norden, Tammerfors 3–5 juni 2009. Nordiska förening för leksikografi, skrift nr 11, 417–431. Rey-Debove, J. 1971. Étude linguistique et sémiotique des dictionnaires français contemporains. Paris: Mouton. Roby, W. B. 2004. ‘The Internet, Autonomy and Lexicography: A Convergence?’ Mélanges CRAPEL, Nr. 28: 47–66. Ross, J. R. 1967. Constraints on Variables in Syntax. Bloomington: Indiana University Linguistics Club. Rubin, J. 1994. Handbook of Usability Testing. New York: Wiley. Rubin, J. and Chisnell, D. 2008. 
Handbook of Usability Testing. 2nd edition. New York: Wiley. Rude, C. D. (ed.) 2002. Technical Editing. Longman: New York. Rundell, M. 2006. ‘More Than One Way to Skin a Cat: Why Full-Sentence Definitions Have Not Been Universally Adopted’, in E. Corino, C. Marello and C. Onesti (eds), Atti Del XII Congresso Di Lessicografia, Torino, 6–9 Settembre 2006. Allessandria: Edizioni dell’Orso, 323–337. Sánchez, A., Cantos, P. and Almela, M. 2007. ‘Lexical Constellations and the Structure of Meaning: A Prototype Application to WSD’, in A. Gelbukh (ed.), Computational Linguistics and Intelligent Text Processing, CICLing 2007, Mexico City, Berlin: Springer Verlag, 275–287. Ščerba, L. V. 1940. ‘Towards a General Theory of Lexicography’. International Journal of Lexicography 8(4): 315–350. Schwartz, B. 2004. The Paradox of Choice. Why More is Less. Harper Perennial: New York.
Shneiderman, B. 2002. User Interface Design (German edn.). Bonn: mitp-Verlag. Shneiderman, B. 2003. ‘The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations’, in B. B. Bederson and B. Shneiderman (eds), The Craft of Information Visualization: Readings and Reflections. San Francisco: Morgan Kaufmann, 364–371. Shneiderman, B. and Plaisant, C. 2005. Designing the User Interface. Harlow: Pearson Education. Sobkowiak, W. 2009. ‘Review of Wells, John C., Longman Pronunciation Dictionary (3rd Edn)’. International Journal of Lexicography 22(2): 191–209. Souque, A. 2006. ‘Approche critique des produits IDL. Word/OpenOffice.org Writer: correction orthographique’. Available at: (September, 2010). http://fr.openoffice.org/docs/analyseCritiqueIDL-CorrecteurOrthographique.pdf Spohr, D. 2008. ‘Requirements for the Design of Electronic Dictionaries and a Proposal for their Formalisation’, in E. Bernal and J. DeCesaris (eds), 617–629. Spohr, D. 2010. Towards a Multifunctional Lexical Resource – Design and Implementation of a Graph-based Lexicon Model. Unpublished doctoral dissertation, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart. Spohr, D. 2011. ‘Multi-layer Architecture for Pluri-monofunctional Dictionaries’. (This volume). Spohr, D. and Heid, U. 2006. ‘Modelling Monolingual and Bilingual Collocation Dictionaries in Description Logics’, in P. Rayson, S. Sharoff, and S. Adolphs (eds), Proceedings of the EACL Workshop on Multi-Word-Expressions in a Multilingual Context – EACL 2006. Trento, Italy. Storjohann, P. 2005. ‘eLexiko: A Corpus-based Monolingual German Dictionary’. Hermes. Journal of Linguistics 34: 55–82. Svensén, B. 1993. Practical Lexicography. Principles and Method of Dictionary Making. Oxford: OUP. Svensén, B. 2009. ‘Subject-field Classification for Metalexicography Revisited’, in S. Nielsen, and S. Tarp (eds), 147–159. Tarp, S. 2000. ‘Theoretical challenges to LSP lexicography’. Lexikos 10: 189–208. Tarp, S. 2006.
Leksikografien i grænselandet mellem viden og ikke-viden: generel leksikografisk teori med særlig henblik på lørnerleksikografi: doktorafhandling. Bind. Aarhus: Center for Lexicography, Aarhus School of Business. Tarp, S. 2007. ‘Lexicography in the Information Age’. Lexikos 17: 170–179. Tarp, S. 2008a. Lexicography in the Borderland between Knowledge and Non-knowledge. General Lexicographical Theory with Particular Focus on Learner’s Lexicography. Tübingen: Niemeyer. Tarp, S. 2008b. ‘The Third Leg of Two-Legged Lexicography’. Hermes. Journal of Linguistics 40: 117–131. Tarp, S. 2008c. ‘Revival of a Dusty Old Profession’. Hermes – Journal of Language and Communication Studies 41: 175–188. Tarp, S. 2008d. ‘Kan brugerundersøgelser overhovedet afdække brugernes leksikografiske behov?’ LexicoNordica, 15: 5–32. Tarp, S. 2009a. ‘Reflections on Data Access in Lexicographic Works’, in S. Nielsen and S. Tarp (eds), 43–62. Tarp, S. 2009b. ‘Reflections on Lexicographical User Research’. Lexikos 19: 275–296.
Tarp, S. 2010. ‘Functions in Specialised Dictionaries for Learners’, in P. A. Fuertes-Olivera (ed.), 39–53. Tarp, S. 2011. ‘Lexicographical and Other e-Tools for Consultation Purposes: Toward the Individualization of Needs Satisfaction’. (This volume.) Tono, Y. 2001. Research on Dictionary Use in the Context of Foreign Language Learning. Tübingen: Max Niemeyer Verlag. Tono, Y. 2010. ‘A Critical Review of Lexicographical Functions’. Lexicon 40: 1–26. Tono, Y. 2011. ‘Application of Eye-Tracking in EFL Learners’ Dictionary Look-up Process Research’. International Journal of Lexicography 24(1): 124–153. Trap-Jensen, L. 2010. ‘One, Two, Many: Customization and User Profiles in Internet Dictionaries’, in: A. Dykstra and T. Schoonheim (eds), Proceedings of the Fourteenth EURALEX International Congress, Leeuwarden, Netherlands, 6th–10th July 2010, 1133–1143. Turnbull, J. 2010. Oxford iWriter with Oxford Advanced Learner’s Dictionary (8th edn). Oxford: Oxford University Press. Verlinde, S. 2011. ‘Modelling Interactive Reading, Translation and Writing Assistants’. (This volume) Verlinde, S. and Binon, J. 2009. ‘Pedagogical Lexicography Revisited’, in H. Bergenholtz, S. Nielsen and S. Tarp (eds), 69–89. Verlinde, S. and Selva, Th. 2001. ‘Nomenclature de dictionnaire et analyse de corpus’. Cahiers de lexicologie, 79: 113–139. Verlinde, S., Leroyer, P. and Binon, J. 2010. ‘Search and You Will Find. From Stand-alone Lexicographic Tools to User Driven Task and Problem-oriented Multifunctional Leximats’. International Journal of Lexicography 23(1): 1–17. Verlinde, S., Selva, Th. and Binon, J. 2009. ‘Les bases de données au service d’un dictionnaire d’(auto)apprentissage pour allophones’. Lexique, 19: 217–233. Véronis, J. 2005. ‘Ortograf:OpenOffice vs. Microsoft’. Available at (September, 2010). http://blog.veronis.fr/2005/11/ortograf-openoffice-vs-microsoft.html Von Reischach, F., Michahelles, F. and Schmid, A. 2009. ‘The Design Space of Ubiquitous Product Recommendation Systems’.
Mobile and Ubiquitous Multimedia. Proceedings of the 8th International Conference on Mobile and Ubiquitous Multimedia, MUM 2009. ACM. Weller, M. and Heid, U. 2010. ‘Multi-parametric Extraction of German Multiword Expressions from Parsed Corpora’, in Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010). ELRA, Valletta, Malta. Wiegand, H. E. 1984. ‘On the Structure and Contents of a General Theory of Lexicography’. Hartmann, R. R. K. (ed.) 1984, 13–30. Wiegand, H. E. 1989. ‘Der gegenwärtige Status der Lexikographie’, In: Hausmann, F. J. et al. (eds) 1989–1991, 246–280. Wiegand, H. E. 1998. Wörterbuchforschung. Untersuchungen zur Wörterbuchbenutzung, zur Theorie, Geschichte, Kritik und Automatisierung der Lexikographie. 1. Teilband. Berlin & New York: de Gruyter. Wiegand, H. E. 2001. ‘Was eigentlich sind Wörterbuchfunktionen? Kritische Anmerkungen zur neueren und neuesten Wörterbuchforschung’. Lexicographica 17: 217–248.
Wiegand, H. E. 2007. ‘Zugriffspfade in Printwörterbüchern. Ein Beitrag zur Schnittstelle von Benutzungshandlungen und Wörterbuchform’. Lexikos 17: 180–211. Wiegand, H. E. 2008. ‘Zugriffsstrukturen in Printwörterbüchern. Ein zusammenfassender Beitrag zu einem zentralen Ausschnitt einer Theorie der Wörterbuchform’. Lexicographica 24: 209–315. Wiegand, H. E., Beißwenger, M., Gouws, R. H., Kammerer, M., Storrer, A., and Wolski, W. 2010. Wörterbuch zur Lexikographie und Wörterbuchforschung. Dictionary of Lexicography and Dictionary Research. Vol. 1 (A–C). Berlin: Walter de Gruyter. Wierzbicka, A. 1985. Lexicography and Conceptual Analysis. Ann Arbor: Karoma. Wierzbicka, A. 1993. ‘What Are the Uses of Theoretical Lexicography?’. Dictionaries. Journal of the Dictionary Society of North America 14: 1993, 44–78. Yamada, Shigeru. 2010. ‘EFL Dictionary Evolution: Innovations and Drawbacks’, in I. Kernerman and P. Bogaards (eds), 147–168. Zaenen, A. 2002. ‘Musings about the Impossible Electronic Dictionary’, in M.-H. Corréard (ed.), Lexicography and Natural Language Processing: A Festschrift in Honour of B. T. S. Atkins, EURALEX, 230–244. Zgusta, L. 1971. ‘Manual of Lexicography’. The Hague: Mouton. Zgusta, L. 1991. ‘Probable Future Developments in Lexicography’, in F. J. Hausmann et al. (eds), 1989–91, 3158–3168.
Index
Abel, A. 15, 290 access 1–3, 30–53, 56–8, 68–9, 103–4, 106, 108, 113–15, 124, 127–31, 133–4, 136, 147, 155, 169–71, 182, 190, 210, 226, 238, 242–3, 247–9, 268, 272, 277, 285, 289–90, 308 accessibility (accessology) 170–2, 182, 253–4 accounting dictionaries 5, 9–10, 13–14, 141–67, 171, 173–4, 176, 229, 308 El Diccionario Inglés–Español de Contabilidad 10–11, 176–86 Acronym Finder 239 adaptive hypermedia 88–90 see also information technology Afrikaans 21–2, 25, 28, 171 aggregator 241, 276 Aide à Rédaction de Textes Scientifiques see ARTES Alexandria 14, 277–8 Almela, M. 257, 259 Almind, R. 10–11, 33, 83, 103, 125, 171, 173, 176, 246 Amato, G. 84 Amazon 94–5 American Heritage Dictionary 232, 241–2 American Heritage Dictionary of Phrasal Verbs 209 Andersen, B. 11, 15, 103, 125, 134, 209, 305–6 Ankolekar, A. 80 annotation systems (information technologies) 95–8 Antidote 281 antonym 113, 143, 146–7, 151, 155, 158, 161–2, 177–8, 180, 252
ARTES 137–8, 140 assistant 275–86 see also individualization Atkins, B. T. S. 8, 71, 121, 189–90, 193, 242, 246, 252, 255 audio 172, 245–6 see also multimedia Babelfish 279 Bank, C. 288, 297, 300 Base lexicale du Français see BLF Bederson, B. B. 89 Béjoint, H. 8, 121–3, 241 Bell, R. T. 164 Bergenholtz, H. 2, 3–5, 11, 21, 23, 25, 27, 31, 33, 42, 57, 60, 71, 76–7, 79, 100–1, 104, 106, 125–6, 148–50, 164, 168–9, 171–2, 174, 176–7, 183, 189, 191, 268, 273, 276, 287–90 Bergenholtz, I. 11, 71, 106, 276, 287, 289 Betydning of faste vendinger see Meaning of Fixed Expressions Biber, D. 209 bilingual 145–8, 156, 161, 164, 177, 188 Binon, J. 14, 71, 172, 268, 275–6 Biotechnology from A to Z 33 Biotechnology Glossary 34 Bizer, C. 92 Bjærge, E. 60 BLF 11, 13–15, 87, 92, 94, 106, 249, 275–9, 283, 290–1, 294, 297, 299–302 see also ITL Bogaards, P. 32, 235 Bon patron 281 Boshoff, S. P. E. 21 Botha, W. F. 21 Bothma, TJD. 6, 103, 107, 284, 287 Breidt, E. 277
browsing 72, 81, 83–4, 302 see navigation Brug af faste vendinger see Use of Fixed Expressions Brusilovsky, P. 88–9 Bullokar, J. 254 Bunt, A. 88 Burchardt, A. 115 Cambridge Advanced Learner’s Dictionary 234, 260–1 Cambridge Dictionary of American English 235 Cambridge Dictionary Online 234, 241, 245 Cambridge Phrasal Verbs Dictionary 2009 Campogianni, S. 15, 290 Cantos, P. 12–13, 257, 259, 261 Carenini, G. 88 Carnegie Mellon University Pronouncing Dictionary 239 Carr, M. 231 categorical perception 239 Cerquiglini 186, 252–3 Chambers 21st Century Dictionary 233 Chambers English Dictionary 233 Chapman, R. W. 20 Check My Words 284 Chinese 208 Chisnell, D. 292 Chun, D. 246 Church, K. W. 172 Claassen, W. T. 25 CLAVE 251 Cobuild / COBUILD 234 cognitive 65–9, 77–8, 92, 105, 130–2, 137–8, 149, 159, 168–86, 188, 191, 194, 217, 287–8 see use situation, lexicographic function and knowledge Collins Cobuild Advance Dictionary 243 Collins Cobuild Dictionary of Phrasal Verbs Collins English Dictionary 241 Collins English Free Dictionary 233 collocation(s) 31, 142–3, 151, 154, 158–62, 164–5, 215–17, 224–5, 227–8, 285 see function theory
Combrink, J. G. H. 21 communication / communicative 66–9, 78, 105, 123, 130–2, 135–8, 149, 159, 168–86, 188, 191, 217, 287 see use situation, lexicographic function, text production, text reception, text translation, understanding and writing COMPASS 277 Conati, C. 88 contemplative 17–20, 27–8, 30–1, 190–1 context domain 74–6, 257–9 copycat 58–9, 63 see also online dictionary and information tools Cordial 281 corpus / corpora 31–2, 92, 115, 151–2, 246–8, 252, 258–9, 266, 271–2, 278, 280, 309 Cosijn, E. 81 cost 169–70, 172, 306 Cowie, A. P. 233 cross-reference(s) 155, 159–60 Crystal, D. 255 Cumbre 266, 274 customization 2–3, 6–7, 12, 98, 222, 244–5, 307–8 see also individualization Dagut, M. 208 Danish 9–10, 131–4, 141–5, 149, 152–3, 161–2, 164, 171, 174–5, 208–9, 211, 213, 215–22, 224–5, 227–8 Danish internet dictionary 192 Danish Music Dictionary 5, 9, 11, 13–15, 60–1, 100, 106, 192, 195–206, 229, 276, 289, 305 see also Musikordbogen Dansk parlør 135 Danske Talemåder 35–6 Darwin, C. M. 209 data 63–4, 103, 107, 109–11, 124–5, 127–31, 133–4, 141–67, 169, 172, 182, 209, 217–22, 288–9, 296–7, 306, 308
data markup 90–1 see also information technology presentation structure 148 selection 151–5 databank 25–6 database 81–3, 99–101, 103–4, 111–14, 142–9, 167, 174, 195, 203, 208–13, 217, 223, 226–8, 256, 264, 279, 287, 308–9 lexical 7–8, 108, 111, 246–8 lexicographic(al) 1–3, 10–11, 37–8, 42–6, 107, 118, 137–8, 189 relational 143–6 De Bra, P. 88 de Schryver, G. M. 29, 168, 244, 246, 252, 254 definition 13, 21, 76–7, 142–3, 145–6, 155–62, 164, 177, 178, 181, 188, 193, 197–8, 224–6, 236, 245, 254, 256–74, 277–8 Delavigne, V. 130 Den Danske Netordbog 35–6 Den Danske Ordbog 35–6, 61 dialogue principles 294 see also user interface Diccionario de la Lengua Española 58–9, 251 Diccionario de la Real Academia Española (DRAE) 251 dictionary(ies) 30–1, 103, 121–4, 134–5, 142–7, 186–95, 211–13, 217, 228, 287, 308–9, 311 online dictionary 30, 61–2 structure(s) 3, 22–3 survey 32 see also usage Dictionary.com 240–1 Dictionary of Biotechnology 33 Dictionary of Fixed Expressions 60, 87, 95, 100, 174, 289 see also Ordbogen over faste vendinger Dictionary of the English Language 19, 126 Dodd, W. S. 245, 255 Dolog, P. 90 domain 107 Dorland’s Medical Dictionary 238, 242 Doroszewska, J. 172, 246 Doroszewski, W. 20
Downing, A. 209 Dublin core 90 DUE 251 Dumas, J. S. 292 Dutch 279, 283–5, 291 dynamic (recreation and re-representation) 60–2, 64–5, 73, 103–4, 164–5, 172, 174, 229, 277, 307 e-dictionary see online dictionary e-lexicography 5, 8–9, 25–8, 57–63, 69, 73–4, 99–101, 122, 275–6 see also lexicography, lexicographic(al) e-tool, lexicographical information tool, lexicographical consultation tool and online dictionary e-WAT (Elektroniese Woordeboek van die Afrikaanse Taal) 82 EbscoHost 86 effectiveness 289–91 efficiency 288–91, 296 Ejendomsordbog (Dictionary of Real Property) 131–4 ELDIT (Elektronisches Lernerwörterbuch Deutsch–Italienisch) 15, 290, 294, 297–8, 301 Eliasson, S. 208–9 Encarta World English Dictionary 233 English 9–10, 12, 105, 135, 137, 141–6, 149, 152–3, 157–8, 161–2, 164, 175, 182–4, 189, 208–10, 213, 215–25, 228, 230–50, 277, 280–1, 285, 295 English Phrasal Verb(s) Dictionary 9, 224–6 English Pronouncing Dictionary 239–40 English–Spanish Dictionary of Biotechnology (Encyclopedic Dictionary of Gene Technology English–Spanish) 34, 66 entry map 83 equivalent 78, 155, 161–2, 164, 188, 218–19, 221–2, 224–5, 227–8, 235 etymologies 271 example 31, 142–3, 151, 155, 159, 161–2, 164, 199, 213, 216–19, 222,
224–5, 227–8, 236, 244, 252 explanation 197–201 external subject classification 152–3 (eye) tracking 84–6, 91, 244, 293 see also usability Farrar, S. 109 Faste vendinger med en bestemt betydning see Fixed Expressions with a Certain (Specific) Meaning faster horses 59–61, 63 see online dictionary and information tools Feldweg, H. 277 filtering 86–8 see information technology Fixed Expressions with a Certain (Specific) Meaning 46–53, 60–1 see also Faste vendinger med en bestemt betydning Florio, J. 254 Földes, A. 277 Fontenelle, T. 281 ForBetterEnglish.com 246 French (FR) 131–4, 277–83, 290–1, 300 Frequency 191–4, 252, 309 see corpus FRIDA 283 Fuertes-Olivera, P. A. 10–11, 125, 148, 164, 172, 176–7, 183, 231, 237 Fukuya, Y. J. 208 function theory (dictionary functions, Modern Theory of Lexicographical Functions (MTLF) and functional theory) 2–3, 5, 10–11, 23–4, 42–4, 56–7, 63–6, 71–103, 104–5, 121–40, 148–51, 169–70, 176–8, 217, 275, 288–9, 291 Garvin, P. L. 20 Gauch, S. 84 GDUEsA 251–2, 266 General Ontology of Linguistic Description 109 Gentechnologie von A bis Z 33 German 25, 105, 111, 114, 290–1, 295, 300
Glossarist.com 275 glosses 291 GOLD see General Ontology of Linguistic Description Google (Google Dictionary, Google Earth, Google Maps and Google Scholar) 182, 86, 89, 94, 96, 191, 245, 251, 279, 284 Gouws, R. H. 2–3, 25, 20–3, 25, 27–8, 31, 57, 71, 76–7, 79, 103, 125, 134, 171, 189, 276 grammar / grammatical 142–3, 145, 147, 155, 159–62, 213–15, 218– 19, 223–4, 226, 229, 244, 252, 280–3 Granger, S. 283 granularity 106–9 see customization graphics (multimedia) 245 Gray, L. S. 209 Grefenstette, G. 15, 252, 255, 264 Grobler, H. 21 Gujin Tushu Jicheng 62 Haas, M. R. 76, 284 Hanks, P. 235 hapax legomena 252 Harrap’s Essential English Dictionary 33 Harris, R. 156 Hartmann, R. R. K. 23 Hausmann, F. J. 148 HCI 293 see also human computer interaction He, D. 88, 94 Heid, U. 14–15, 105–6, 111, 115, 172, 276–7, 284–5 Heift, T. 285 Heinle’s Newbury House Dictionary of Amercian English 235 Hjelmslev 40 Hoey, M. 257 homonymy 155–2, 164, 285 Householder, F. W. 20 Howjsay.com 239 Hulstijn, J. H. 209, 242 human computer interaction 293 see also HCI Hutton, C. 156
(hyper)link(ing) 180–5, 199–203, 277, 301 Hypermedia Glossary of Genetic Terms 34
Järvelin, K. 74–5, 81 Johnsen, M. 33, 171, 288 Johnson, S. 19, 254 JustTheWord 247
IAS / IFRS (International Accounting Standards / International Financial Reporting Standards) 143, 151–3, 157, 162, 164, 173, 175, 179, 181 IATE (The EU’s multilingual term base) 34, 60 Illich, I. 141 ILT see Interactive Language Toolbox individual(ization) 5–6, 9, 54–62, 63–9, 103–20, 130–1, 276–7, 308 see also customization and personalization active method 6, 67–8 interactive method 5–6, 67–8 passive method 6, 67–8 information 73–102, 124–5, 171, 176–7, 252–4, 268, 287 needs (user needs) 9, 99–101, 113, 130–40, 191–2, 217–18, 307 retrieval 74–100 science 8–9, 54–7, 72–4, 121–40, 287–8, 293, 302, 307, 310–11 stress (information death,information overload) 105, 125, 170–1, 191–5, 307 technologies 80–98 , 310–11 see also mashup tool (consultation tool) 3, 15–16, 56, 99–101, 107–9, 121–40, 174, 187–9, 223–8, 255, 284–6, 308–9 type (user type) 107 Ingwersen, P. 74–5, 81 Interactive Language Toolbox 5, 14, 276, 285 Interglot 301 internal subject classification 151–3 internet dictionary see online dictionary interpretative / interpretive (use situation) 65–6, 176, 217 ISO 109, 291, 294 Italian 290
Kang, H. 86 Kaufmann, U. 149 Kilgarriff, A. 246–7 knowledge (cognitive and information technologies) 91–4, 123, 133–4, 160, 177, 181–2, 188, 193–4, 199–203, 256, 289 Knowledge about Fixed Expressions (Viden om faste vendinger) 48–53, 61 Knutov, E. 88 Kobsa, A. 84 Krömker, H. 292 Krötzsch, M. 80 KwicFinder 284
label 177, 180, 215, 218–19, 229 Landau, S. I. 257 Langendoen, D. T. 109 language dictionary 123–4, 127–8, 139 Language for General Purposes 123 see also LGP language planner 305–6 Laufer, B. 208–9 Laurén, C. 154 LCM 13, 259, 261–4 LEAD see Louvain EAP Dictionary learnability 292–3 see also usability learner 192, 208–11, 228–9, 239–40, 275, 278, 285, 290–1 see also student and pupil Leech, G. 252, 255 lemma (selection, lemmata) 135–6, 143, 146–8, 151, 155, 158–62, 164, 178, 191–2, 198–9, 209, 229, 252 Leroyer, P. 8, 14, 27, 71, 125, 127, 134, 189, 268, 275–6 Lew, R. 12, 171–2, 222–3, 245–6, 275 lexeme 109–11 lexical constellation model see LCM Lexical Markup Framework 109 see also LMF lexicographic(al) arrangement(s) 176–8
lexicographic(al) function 105–6, 108, 141–2, 148–9, 210 lexicographic(al) tool 105, 123–5, 287–90, 302 lexicographer 306, 309–10 lexicography (theory and practice) 19–29, 30–3, 54–6, 73–6, 98–9, 121–9, 190–1, 305 paradigm shift 54–6, 121–40 see information science leximat 169–70 Lexonco 9, 130 LGP see Language for General Purposes Liao, Y. 208 linguistic (colonialism) 3–5, 8, 20–2, 189–91 linked open knowledge 91–3 Littré 278 LMF see Lexical Markup Framework Locke, P. 209 Longman Dictionary of Contemporary English (Online) 59, 234, 245 Longman Phrasal Verbs Dictionary 209 Longman Writing Assistant 285 Louvain EAP Dictionary 12, 235, 244–5 see also LEAD LSP (Language for Specific Purposes, special purpose language) 123, 127, 130, 150, 269–71 see subject field McArthur, T. 22 Macmillan Dictionary and Thesaurus 61 Macmillan Dictionary Online 242 Macmillan English Dictionary (Online) 234, 244, 245 Macmillan Open Dictionary 237 Macmillan Phrasal Verbs Plus 209 macrostructure 148, 252, 290 man-machine interaction see MMI Marchena, E. 209 mashup 80–98, 96 see also technologies under information Maybury, M. T. 88 meaning 257–60
Meaning of Fixed Expressions (Betydning of faste vendinger) 35– 41, 43–53 Meara, P. 275 memorability 292 see also usability Merriam-Webster’s 254, 260–1 Merriam-Webster’s Advanced Learner’s English Dictionary 243 Merriam-Webster Collegiate Dictionary 59 Merriam-Webster’s Learner’s Dictionary 235 Merriam-Webster Online Dictionary 59, 232 Merriam-Webster Open Dictionary 237 metadata 90–1 meta-lexicography 193–4 Michaelles 94 Microbial Genetics Glossary 34 Middle English Dictionary 238 Millán, E. 88 mining techniques 237 see also technologies under information MMI (man-machine interaction) 293 mobile e-lexicographic tourist guide 136–7 MoBiMouse 277 model(ling) 275–86 see also individualization model T Fords 60–1, 63 see also online dictionary and information tools modular 130–40, 266–8 see individualization and multi-layer architecture mono-functional (monofunctionality) 105–6, 118–19, 156–7, 187–91, 208–29, 276, 289–90, 302 monolingual 144–8, 155–6, 161, 164, 188 Mossop, B. 158, 164 Mourier, L. 148, 172, 176 Müller, P. 115–16 Müller-Spitzer, C. 15, 290 multidisciplinarity 306 multifunctional 105, 156, 290
multi-layer architecture 114–18 see modular Multilingual Glossary of Biotechnological Terms 33 multimedia 245–6 see also graphics, picture and video multi-word unit(s) 268–71 Musikordbogen see Danish Music Dictionary MyCOBUILD.com 12, 243 navigate / navigation 80–4, 98, 136–7, 293, 297, 300, 302 see information technology and usability Nejdl, W. 90 Nesi, H. 168, 252, 255, 275 nesting 290 New Oxford American English 232 New Oxford Dictionary of English 260–1 Nielsen, J. 292–4 Nielsen, S. 10, 15, 71, 101, 103, 148–50, 158, 164, 169, 172, 176–7, 183, 246, 268, 273, 305–6, 311 Niño-Amo, M. 10–11 Nord, C. 164 Nordisk leksikografisk ordbok 134 notes 155, 159–62, 164, 177, 179, 235 Nudansk Ordbog 35–6 Odendal, F. F. 21 OED (Oxford English Dictionary) 82–4, 86, 238, 244 Office 281 OneLook 275 online dictionary 1, 15–16, 25–8, 141–2, 147–8, 168–76, 180, 182, 251–6, 264, 271–4, 295–6, 306–7 see also e-dictionary and internet dictionary bottom-up 231, 236–8, 275 collaborative 231, 236–7, 275 collective 230–1, 237 diachronic (historical) 237–8 dictionary aggregators 231 dictionary portals 231, 238 dictionary sets 231 free 231–4
freemium 232 general English 231–3 Google dictionary 234–5 individual 231 institutional 230–41 learners’ 233–6 onomasiological 240–1 open 231 paid 231 pronouncing 239 restricted macrostructure 238–9 restricted microstructure 239 subject-field 238–9 user-involvement 231, 236–8, 275 Online Etymology Dictionary 239 Ooi, V. B. Y. 172, 245 OpenOffice 281 operative/operational 65–6, 130–1, 137–8, 176, 217 see also situation under use Opus 278 Ordbog over det danske sprog (Internet) 35, 59 Ordbogen 87, 92, 195–6, 199 Ordbogen over faste vendinger see Dictionary of fixed expressions Országh, L. 20 OWID (Onlinewortinformationssystem) 15, 290–1, 294, 297–302 OWL 107–8, 113, 118 Oxford Advanced Learner’s Dictionary 234 Oxford American College Dictionary 235 Oxford Collocations Dictionary for Students of English 300 Oxford Dictionary of English 232 Oxford English Dictionary see OED Oxford Phrasal Verbs 209 Oxford iWriter 285 Pechenizkiy, M. 88 Pecman, M. 137 Pedersen, J. 151 Pérez Cabello de Alba, B. 15 personalization 73–4 phoneme 24, 188, 239–40 phrasal verb(s) 208–29
picture 265, 268, 271–2 see also multimedia Pizarro-Sánchez, I. 183 Plaisant, C. 292 Plass, J. L. 246 pluri-monofunctionality (plurifunctional) 8, 103–20, 310 polyfunctional 176–86, 187–91, 203–6, 223, 289 polysemy 155–62, 164, 178 portability 253 Postlethwayt, M. 65 presentation 30–53, 106, 113–14, 124–5, 306 see access and data Prinsloo, D. J. 29, 92, 172 pronunciation 272 proscriptive 164–6, 171 Prószéky, G. 277 Pruvost, J. 140, 252–3 Ptaszynski, M. O. 125 pupil 267, 270–2 see also learner and student Quemada, B. 126 questionnaire 295–7, 301–2 Random House Dictionary 241 Random House Unabridged Dictionary 232 range 107 RDF(S) (Resource Description Framework (Schema)) 90, 92, 104, 107–8, 111 reading 268, 271–2, 277–9 see (text) reception recall rate 282 recommender systems 94 see also technologies under information Reddish 292 redundancy 142, 259 relevance 154–6 reliability 182–5 Resource Description Framework (Schema)) see RDF(S) Reverso 279 Rey-Debove, J. 121
RhymeZone 240 Right Writer 281 Roby, W. B. 255 Rolls Royces 61–3, 285 see also online dictionary and information tools Ross, J. R. 24 Rubin, J. 292 Rude, C. D. 158 Rundell, M. 8, 71, 121, 189–90, 234 Rychlý, P. 246 Samaniego Fernández, E. 15 Sánchez, A. 12–13, 257, 259, 261 Saporta, S. 20 Ščerba, L. V. 20 Schmid, A. 94 Schulze, M. 285 Schwartz, B. 275 ScienceDirect 87 search (engine)/ searching (information technology) 68–9, 80–4, 72–98, 106, 116–17, 136–7, 147–8, 155–66, 168, 171–3, 177, 191, 201–5, 224, 252, 273, 287, 290, 297–300, 302 Boolean 42–53 default 4, 173, 240, 267 maximizing 42–53 minimizing 42–53 selection 151–5 Selva, Th. 276 semantic 257–9, 266 Semantic Rhyming Dictionary 240 Semantic Web 91, 104, 109, 114, 118 see OWL and RDF(S) Sensagent 277 sense 211, 218–19, 221–3, 229, 266, 272, 290 Shneiderman, B. 86, 89, 292, 294 SketchEngine 246–8, 271 Sosnovsky, S. 89 sound effect 245 see also multimedia Souque, A. 281 Spanish 10–12, 134–5, 141, 143, 145–6, 149, 152–3, 179, 182–5, 251–3, 266 SpellCheckPlus 281 Spohr, D. 3, 7–8, 91, 109, 111, 116
Sprachlexikographie 24 Steyn, M. 71 Storjohann, P. 172 Straccia, U. 84 student 283, 295–7 see also learner and pupil (subject) classification 151–3 subject-field (specialist dictionary) 53, 123, 132, 151–2, 154, 160, 193, 238 surfing 81, 83–4 see navigation Svensén, B. 151, 177, 257 Svenska Akademiens Ordbok 59 Swedish 209 synonym(y) 113, 142–3, 146–7, 151, 155, 158–62, 177–8, 180, 184, 199, 213, 215, 218–19, 221–2, 224–5, 227–8, 244, 252, 256, 278 system 99–101 systematic (introduction) 11, 66–7, 182–5, 196, 199 Talemåder i dansk 35–6 Tarp, S. 2, 3, 5, 13–15, 23, 42, 244, 55–6, 60, 71, 73, 98–101, 103–7, 111, 113–14, 118, 123, 127, 148, 168–70, 172, 174, 176, 183, 268, 273, 275–7, 285, 287–8, 302, 311 term 142–3, 145–6, 154, 157, 160, 162, 171, 199, 280, 300 terminological classification 153–4 Tesauro ISOC de Contabilidad 182 (text) production (productive) 32–3, 42–3, 56, 64, 73–4, 78, 105, 111, 113, 116–17, 137, 149, 158–9, 164–5, 177, 182–3, 188–9, 191–2, 199, 206, 208–9, 216–18, 222–3, 228–9, 244, 279, 288, 297 (text) reception (receptive) 32–3, 42–3, 57, 73–4, 78, 105, 114, 116–17, 149, 177, 182–3, 188, 191, 193–4, 197–9, 209–10, 218, 223, 225, 228–9, 244, 288–9, 297 (text) translation 32–3, 57, 78, 132–4, 137, 146, 149, 158–9, 161–4, 174–5, 177, 182–3, 185, 188, 191, 198, 208–9, 211, 215–19, 221, 223, 228–9, 268, 277, 279–80, 300–1
TheFreeDictionary 241 Thesaurus.com 240 TLFi 126, 128 Tono, Y. 2, 71, 244, 264 Tran, T. 80 transformative 17–20, 27–8, 169 Trap-Jensen, L. 302 Trésor de la langue française informatisé see TLFi UCD (user-centred design) 293 under-specification 111 understand(ing) 155–7, 162–3, 175, 197–8, 218, 289 see reception Universal Dictionary of Trade and Commerce 65 Urban Dictionary 236 usability 11–12, 14–15, 101, 105, 276–7, 284, 287–304 see also information science usage 188, 190, 268, 290 use 51–3, 257, 288, 301 situation 2–3, 10, 16, 56–7, 63–8, 73–7, 103–6, 111–14, 127, 132–3, 148–9, 154, 307–8 Use of Fixed Expressions (Brug af faste vendinger) 44–53 user 23–5, 28–9, 67–8, 76–9, 99–101, 106, 115–16, 127–30, 143, 147, 155, 176–7, 188, 190, 203, 223–4, 243–6, 258, 260–1, 275, 288–90, 292, 306 expert 76–8, 85–6, 89, 117, 127, 149–50, 156, 160, 189, 194–5, 237–8, 267–8, 270 interface 275–86, 294–5 lay(men) (novice, non-expert) 76–7, 85–6, 89, 149–50, 194–6, 237–8 needs 2, 8–9, 13, 15–16, 25, 30–53, 56, 60–1, 63–4, 71–102, 103–6, 111–14, 118, 124–5, 141, 148–1, 188, 191–6, 243–4, 264, 276–7, 307 profile (individualization, customization) 84–98, 132, 150, 244–5, 266–7, 271–2, 277
semi-expert 76–8, 85–6, 89, 149–50, 156 type(s) 2–3, 10, 16, 56–7, 63–4, 76–7, 103–6, 111–14, 169–70, 307 user-centred design see UCD Verlinde, S. 13–14, 71, 106, 172, 244, 249, 268, 275–6, 290, 308 Véronis, J. 281 Viden om faste vendinger see Knowledge about Fixed Expressions video 268, 271–2 see also multimedia Visual Thesaurus 241, 275 VisuWords 241 Von Reischach, F. 94 VOX 251 Vrandečić, D. 80 WAT (Woordeboek van die Afrikaanse Taal) 21, 27–8, 82–3 Web Ontology Language see OWL Weber, V. 15, 290 Webster’s New World College 260–1 Webster’s Online Dictionary 83
Weller, M. 115 Wiegand, H. E. 1, 3, 22–4, 26, 36, 63, 139, 148, 242 Wierzbicka, A. 190, 193 Wikipedia 62, 83–4, 90, 92, 96, 176, 184–5, 201, 310 Wiktionary 83–4, 96, 176, 236 Word 281–2 Woordeboek van die Afrikaanse Taal see WAT WordNet 240–1, 246–8 Wordnik 237 Wörterbuch der Gentechnik 34 Wortschatz Universität Leipzig 31 write / writing 155–8, 160, 197, 268, 271–2, 281–4 see also (text) production Yamada, Shigeru 234 Yongle Dadian 62 Yudelson, M. 89 Zaenen, A. 255 Zgusta, L. 18, 20, 255