English Pages 356 Year 2009
Lexicography in the 21st Century
Terminology and Lexicography Research and Practice (TLRP)
Terminology and Lexicography Research and Practice aims to provide in-depth studies and background information pertaining to Lexicography and Terminology. General works include philosophical, historical, theoretical, computational and cognitive approaches. Other works focus on structures for purpose- and domain-specific compilation (LSP), dictionary design, and training. The series includes monographs, state-of-the-art volumes and course books in the English language.
Editors
Marie-Claude L’Homme, University of Montreal
Kyo Kageura, University of Tokyo
Consulting Editor
Juan C. Sager
Volume 12 Lexicography in the 21st Century. In honour of Henning Bergenholtz Edited by Sandro Nielsen and Sven Tarp
Lexicography in the 21st Century In honour of Henning Bergenholtz
Edited by
Sandro Nielsen Sven Tarp University of Aarhus
John Benjamins Publishing Company Amsterdam / Philadelphia
The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.
Library of Congress Cataloging-in-Publication Data
Lexicography in the 21st century : in honour of Henning Bergenholtz / edited by Sandro Nielsen, Sven Tarp.
p. cm. (Terminology and Lexicography Research and Practice, ISSN 1388-8455 ; v. 12)
Includes bibliographical references and index.
1. Lexicography. I. Bergenholtz, Henning. II. Nielsen, Sandro. III. Tarp, Sven.
P327.L4328 2009
413'.028--dc22 2009021345
ISBN 978 90 272 2336 4 (hb; alk. paper)
ISBN 978 90 272 8901 8 (eb)
© 2009 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA
Prof. dr. phil et habil Henning Bergenholtz
Table of contents
Introduction: Nothing is more practical than a good theory
Sandro Nielsen and Sven Tarp
part i. The dictionary, dictionary structures and access routes
1. Sinuous lemma files in printed dictionaries: Access and lexicographic functions
Rufus H. Gouws
2. Reviewing printed and electronic dictionaries: A theoretical and practical framework
Sandro Nielsen
3. Reflections on data access in lexicographic works
Sven Tarp
4. Hybrid text constituent structures of dictionary articles: A contribution to the expansion of the theory of textual dictionary structures
Herbert Ernst Wiegand
part ii. Dictionary functions and users
5. On production-oriented information in Swedish monolingual defining dictionaries
Sven-Göran Malmgren
6. Balancing the tools: The functional transformation of lexicographic tools for tourists
Patrick Leroyer
7. Lexicography and language planning in Scandinavia and the Netherlands
Lars S. Vikør
part iii. Subject-field classification and introductions
8. Subject-field classification for metalexicography revisited
Bo Svensén
9. Systematic introductions in specialised dictionaries: Some proposals in relation to accounting dictionaries
Pedro A. Fuertes-Olivera
part iv. Data retrieval and corpus lexicography
10. The role of corpora in future dictionaries
D.J. Prinsloo
11. Lexicographical data in natural-language systems
Franziskus Geeb
part v. Collocations and phraseology
12. A methodology for describing collocations in a specialised dictionary
Marie-Claude L’Homme
13. Lexicographic description: An onomasiological approach on the basis of phraseology
Jón Hilmar Jónsson
14. Item-specific syntagmatic relations in dictionaries
Thomas Herbst
Henning Bergenholtz: Bibliovita
Sandro Nielsen and Sven Tarp
Notes on contributors
Subject index
Introduction
Nothing is more practical than a good theory
Sandro Nielsen and Sven Tarp
The history of dictionaries can be traced back several thousand years. During all this time a huge number of lexicographical products have seen the light of day, but many dictionaries have been relegated to obscurity whereas few have become part of the popular idiom. Similarly, the number of great lexicographers is small compared to the totality of lexicographers since the dawn of lexicography. Until the middle of the 20th century these household names included only lexicographical practitioners, who were primarily linguists, and lexicography was widely regarded as applied linguistics. However, during the last four decades in particular the academic community has realised that the scope of lexicography is not limited to linguistics, and academics started to develop general and specific theoretical principles that could explain the nature of dictionaries and help lexicographers to develop new and improved dictionaries. Since the 1980s, one of the prominent members of the lexicographical community is Professor Henning Bergenholtz, who is both a well-known dictionary author and theoretical scholar, thereby bridging the gap between theory and practice. Henning Bergenholtz started his career as a primary and secondary school teacher and went on to study mathematics and German at the University of Aarhus in Denmark, but moved to Germany to study mathematical linguistics at Technische Universität in Berlin from where he got his master’s degree in 1973. In 1975 Henning Bergenholtz was awarded the degree of Doctor of Philosophy and in 1978 was awarded the degree of Doctor habil. from the University of Essen. He has been professor at a number of universities in Europe and Africa and in 1986 he was appointed the world’s first professor of lexicography at the Aarhus School of Business, University of Aarhus, and has since gone on to establish himself as a preeminent expert in lexicography. 
Through his work he has gained an international reputation as an expert in morphology, the use of computer readable corpora, the relationship between grammar and lexicography, and the development of the
theory of lexicographical functions for printed and electronic dictionaries. The Bibliovita shows that Henning Bergenholtz’s contributions to lexicography are fundamental and span both theoretical and practical aspects. It is difficult to give a full picture of a man with as much scope as Henning Bergenholtz, so this volume is dedicated to the field of lexicography, which has been greatly influenced by his extensive research and other activities, teaching of lexicography, dictionary reviewing and post-graduate teaching through the last four decades. He is also known as an organiser of conferences, editor and referee of lexicographical journals and most recently as director of Centre for Lexicography – Research into Needs-Adapted Information and Data Access, Aarhus School of Business, University of Aarhus. Coupled with his extensive contacts around the world, especially in Europe, Africa and Latin America, he has made his mark on the lexicographical landscape. The encouragement and maintenance of excellent scholarly values and critical inquiry is a hallmark of his career. He takes genuine pleasure in interacting with intelligent and inquiring young minds, spurring them on to greater things and this has spawned close and lasting friendships all over the world. The contributions to this volume make up only a small proportion of them. This volume aims at describing the state-of-the-art in lexicography at the beginning of the 21st century and making proposals for future theoretical and practical work in the field. Lexicography is now in a transitional period and needs to find its place in the research world. The work of Henning Bergenholtz can be seen as a bridge between lexicography as it was practiced in the 20th century and lexicography as it will be practiced in the 21st century. 
The contributors, given their various relationships with him and his work, have taken off from their own experiences and research journeys in the light of Henning Bergenholtz’s rich work, and gone in many of their own exciting directions in this volume. The contributors are all highly regarded international scholars in the field of lexicography and have published authoritative theoretical works on lexicography as well as numerous printed and electronic dictionaries of high quality. Their contributions show how the new theory of lexicographical functions can extend the forefront of the discipline by focusing on dictionary functions and how these can meet the needs of users in various types of user situations. This emphasizes the proposition that the true object of lexicography is the dictionary as a tool that can help users solve problems they encounter in communicative, cognitive and operative situations. We are greatly indebted to the contributors for taking time out of extremely busy schedules to honour Professor Henning Bergenholtz on the occasion of his 65th birthday. We are grateful for a diverse and excellent set of papers to honour
Professor Bergenholtz and are pleased that the papers also reflect the true spirit of Professor Bergenholtz’s research in and dedication to lexicography as well as his wide network and influence in the international lexicographical community.
part i
The dictionary, dictionary structures and access routes
Sinuous lemma files in printed dictionaries
Access and lexicographic functions
Rufus H. Gouws
In general monolingual and bilingual dictionaries a horizontal ordering of lemmata by means of lemma niching and nesting has primarily been developed as macrostructural space-saving procedures. Too often it impedes optimal access to data and eschews the relevant lexicographic functions. This paper argues that neither space-saving nor linguistic motivations suffice as the only criteria for macrostructural ordering procedures. Various access problems are discussed and it is shown that the use of nesting and niching is only permissible if it contributes to satisfying the lexicographic functions. All articles in a dictionary should embody the functions of that dictionary. It is shown how different levels of functional success can be achieved by different types of lemma clustering and when clustering should be avoided.
Keywords: macrostructural procedures; niching; nesting; sinuous lemma files; lemma selection; access routes; lexicographic functions
1. Introduction
The twenty-first century sees dictionaries as products of a scientific practice with a well-developed underlying theory. The needs of dictionary users are constantly changing and these changing needs should compel lexicographers to continue developing lexicographic theory and improving the quality, relevance and efficiency of dictionaries. Like all reference sources, dictionaries are consulted because the users need to retrieve specific information. Having identified the user-profile and being familiar with the needs of the intended target users lexicographers have to include the required data in their dictionaries. The focus of lexicographers of general monolingual and bilingual dictionaries has to be on the data they need to present but it is equally important to present the data in such a way that the users have rapid and unimpeded access to the data to ensure an optimal retrieval of information. The
selection and presentation of data should be done in such a way that the envisaged function(s) of the dictionary can be satisfied in order to achieve its genuine purpose. This applies to both printed and electronic dictionaries. Both these types of dictionaries are containers of knowledge and the specific user-profile and lexicographic functions will determine the nature and extent of the data to be included. The approach to dictionary structures that applies to printed dictionaries needs a serious re-planning when devising electronic dictionaries. Real differences do exist. As an example, the notion of a macrostructure only applies to printed dictionaries. The users of electronic dictionaries, and here the term electronic dictionaries excludes the CD-ROM version of printed dictionaries where users see the same page layout on their computer screens that they see in the printed product, are not confronted with an ordered set of lexical items included in the dictionary as treatment units. Access to a given lemma sign takes the user to that lemma sign and its article but not within the alphabetical context seen in a printed dictionary but rather as an independent article, isolated from other articles and not part of a typical article stretch. Given these imminent changes to be formulated with regard to the structures of electronic dictionaries and looking at the future of dictionaries as utility tools one can assume that the notion of macrostructure will increasingly become less relevant. Many aspects relevant to the macrostructure of printed dictionaries are no longer relevant or even interesting in the domain of electronic dictionaries. One such aspect is the use of sinuous lemma files. Electronic dictionaries do not have to embark on a space-saving procedure of this kind. However, the immediate future of lexicographic tools still sees printed dictionaries as an important and persisting role player. 
Although research in theoretical lexicography should be directed at the future and the tools of the future, it should not eschew the importance of tools that will still be in use for, at least, the immediate future. This demands that metalexicographic research, besides its focus on electronic dictionaries, should still also be directed at printed dictionaries – including issues regarding the macrostructure of printed dictionaries. Metalexicographic development should have all aspects of dictionaries in its scope and innovative developments still need to be introduced with regard to printed dictionaries. Theoretical lexicographers should not try to escape this responsibility. This paper focuses on one aspect of printed dictionaries, i.e. the use of a sinuous lemma file. Although this macrostructural issue is of no importance to electronic dictionaries the procedure of horizontal ordering still prevails in printed dictionaries and still needs to be the target of ongoing metalexicographic research. Looking at lexicography in the twenty-first century implies that existing procedures should be re-evaluated to see whether they still contribute to the best possible way of ensuring the success of the genuine purpose of a given dictionary. The
horizontal macrostructural ordering is a candidate for such an evaluation. Horizontal ordering can be found in different types of dictionaries and represents a well-established macrostructural procedure. This paper focuses primarily on general monolingual and bilingual dictionaries, with the main emphasis on aspects regarding access and lexicographic functions.
2. Macrostructural procedures: A brief overview
2.1 Different procedures
The macrostructural items in general dictionaries show different ordering procedures. Two prominent methods of arrangement are the vertical and the horizontal methods, also referred to as straight alphabetical ordering and the use of a sinuous lemma file. The vertical arrangement is the default method and all dictionaries, also those that employ a horizontal arrangement, primarily display a vertical arrangement. Dictionaries that have a sinuous lemma file will always have this horizontal ordering complementing the vertical arrangement as primary macrostructural ordering system. The occurrence of a horizontal ordering always presupposes the existence of a more comprehensive vertical ordering. The notion of a sinuous lemma file implies a deviation from the default ordering. Horizontally ordered articles are niched or nested within the overall inclusive vertical presentation. A horizontal ordering must always be regarded as a motivated deviation from the default and never as a default ordering system in its own right. The nature of and motivation for this deviation must not be taken for granted but needs to result from a critical evaluation of its contribution within the domain of the envisaged functions of the given dictionary. Two types of horizontal ordering can be distinguished, i.e. niching and nesting.
2.2 Niching
Niching is characterised by a strict adherence to the alphabetical principle. This applies within the niche but also in the relation between the first lemma in the niche and the immediately preceding vertically-ordered lemma, and the last lemma in the niche and the immediately following vertically-ordered lemma respectively. Although a niche typically contains words with the same first stem and these words are often semantically related, this is no prerequisite for this type of sinuous lemma file because some dictionaries also display niches where no semantic relation exists and even where the lemmata do not have the same lexical item as their
first stem. Example 1 shows a niche attached to the article of the preceding lemma sign, here the lemma sign limf (lymph).
limf, lymph; ~a´ties, (-e), lymphatic; ~klier, lymph gland; ~sel, (-le), lymphocyte; ~stelsel, lymphatic system; ~vat, lymphatic (vessel); ~vog, chyle; ~weefsel, lymphoid tissue.
Example 1: Excerpt from Groot woordeboek/Major dictionary (1997)
The application of niching as a type of horizontal ordering is dominated by its adherence to the alphabetical ordering, even if it means that niched words are removed from their semantic environment, as can be seen in Example 2 where the niche has been attached to the article of the lemma sign date2.
date1, (n) dadel (vrug).
date2, (n) datum, dagtekening; afspraak; jaartal; tydperk; BEAR the ~, gedateer wees; BRING up to ~, (boeke) bywerk; BLIND ~, onbeplande afspraak, lukraak afspraak, toe-oë-afspraak, molafspraak; ~ of EXPIRY, vervaldatum; FIX the ~, jaar en dag bepaal; ~ of ISSUE, datum van uitgifte; KEEP up to ~, byhou; ~ of MATURITY, vervaldag; OUT of ~, verouder; ouderwets, uit die mode; UP to ~, nuwerwets, byderwets, modern; by (tot op) datum; op hoogte van die tyd; in die mode; (v) dateer, dagteken; reken; die datum vasstel; afspraak maak (met ’n meisie); ~ from, dagteken van; reken van; met ingang van; ~d, gedateer, verouderd; be ~d, genooi word (deur ’n man), geskiet word (stud.); ~less, ongedateer, sonder datum; ~ line, datumgrens; datering;… ~ loaf, dadelbrood; ~ mark, jaarmerk, datummerk; datumstempel; ~ oil, dadelolie; ~ palm, dadelpalm; ~ plum, dadelpruim; ~ press, stempelmasjien, datummasjien; ~r, stempelmasjien; ~ stamp, datumstempel; ~ stone, dadelpit; ~ wine, dadelwyn.
Example 2: Excerpt from Groot woordeboek/Major dictionary (1997)
English has two homonyms date, both presented as guiding elements of separate articles in this dictionary. Horizontal ordering in this dictionary is dominated by the alphabetical ordering, resulting in niching as the prevailing type of sinuous lemma file.
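In Examples 1 and 2 the niched lemmata are printed in condensed form, with a tilde standing in for the lemma sign of the article to which the niche is attached. As a rough illustration only, the following Python sketch restores the full form of such a condensed lemma. The function name expand_condensed and its parameters are hypothetical and introduced here for illustration; stress marks and homonym numbers are assumed to have been stripped beforehand.

```python
def expand_condensed(main_lemma: str, condensed: str, placeholder: str = "~") -> str:
    """Restore the full form of a condensed niched or nested lemma.

    main_lemma: the vertically ordered lemma sign, e.g. "date" or "limf"
    condensed:  the horizontally ordered lemma as printed, e.g. "~ loaf"
    The place-keeping symbol is replaced once by the main lemma; everything
    else, including a space between the stems, is kept as printed.
    """
    return condensed.replace(placeholder, main_lemma, 1)


# Condensed lemmata taken from Examples 1 and 2.
full_forms = [
    expand_condensed("date", "~ loaf"),
    expand_condensed("date", "~r"),
    expand_condensed("limf", "~klier"),
]
```

A user of the printed dictionary has to perform this substitution mentally for every condensed lemma in the cluster, which is part of the effort that condensation shifts from the page to the reader.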
Although both lexical items date occur as first stem in compounds or derivations selected as niched lemmata, the compounds semantically related to the first homonym date are not attached as a niche to the article of this lemma. All the complex forms with date as first stem are combined in a single niche, maintaining the alphabetical order of the article stretch, attached to the article of the second lemma date. In this niche the lexical items date loaf, date oil, date palm, date plum, date stone and date wine have the lexical item date, given as first member of the homonym pair, as their first component. They bear no semantic relation
to the lexical item date represented by the lemma sign date2 or to any of the other lemmata in the niche attached to the article of this lemma sign. In its horizontal ordering this dictionary consistently works with a system of niching and strictly adheres to the alphabetical ordering. Niching typically but not exclusively sees compounds and derivations attached to the article of the vertically ordered lemma sign formally resembling the first stem or root of the specific item. The Chambers 20th Century Dictionary (1983) follows the most typical approach by applying it only to derivations and compounds. In the preface it is stated that this is a space-saving procedure. In the 1996 edition of the Duden Deutsches Universalwörterbuch the alphabetical ordering results in lemmata representing simplexes sometimes being attached to the articles of lemmata representing complex forms. The word Lehnbildung is entered as a vertically ordered lemma sign with words like Lehnbildung, Lehne and lehnen attached as niched lemmata. The fifth edition (2003) of this dictionary no longer employs procedures of horizontal ordering but only uses a straight alphabetical ordering. Monolingual dictionaries often include a brief treatment of the niched lemmata. In the Duden Deutsches Universalwörterbuch (2003) a niche is attached to the article of the lemma sign Pudding and this niche contains the lemmata Puddingabitur, Puddingform and Puddingpulver. The article of each of these niched lemmata contains a stress marker, syllable division, grammatical data and a brief paraphrase of meaning.
2.3 Nesting
A distinction can be made between first and second level nesting (cf. Gouws and Prinsloo 2005). Like niching, first level nesting also adheres to a strict alphabetical ordering on nest-internal level. An alphabetical deviation prevails between the last lemma in the nest and the first subsequent vertically ordered lemma, cf. the stretch presented in Example 3.
ba´ba (-s; -tjie) 1 Baie jong kindjie; suigeling: ’n Baba vang (geselst.), behulpsaam wees met die geboorte van ’n kind. UITDR.: Die baba met die badwater uitgooi – sien onder BADWATER. 2 Die jongste: Die baba van die span. 3 (fig.) Iets wat deur iemand aan die gang gesit is en dus deur hom (vertroetel word): Die skema is sy baba.
baba: ~bed, ~dogtertjie, ~handjie, ~kleertjies, ~seuntjie, ~taal, ~tehuis, ~uitset.
ba´ba·lief Liewe baba – as liefkosende naam.
ba´ba·sit·ter Iemand wat (teen betaling) kinders oppas wanneer die ouers nie tuis is nie.
Example 3: Excerpt from HAT (2005)
The nest attached to the article of the lemma sign baba internally displays a strict alphabetical ordering. However, the two vertically ordered lemmata babalief and babasitter, immediately following the nest of horizontally ordered lemmata, do not follow the last lemma of the nest alphabetically. This results in first level nesting. It may be argued that a cluster like the one in the previous example, and this also applies to some examples of niching, does not represent a partial article stretch but rather a listing of microstructural entries presenting word formation products containing the lexical item represented by the lemma sign. This argument is countered by dictionaries like HAT and the comprehensive multivolume monolingual Afrikaans dictionary Woordeboek van die Afrikaanse Taal (1951-) where the users’ guidelines specifically refer to these entries as lemmata. Hausmann and Wiegand (1989: 336) refer to this as a lemma clustering and thereby acknowledge the macrostructural status of these entries. This claim is substantiated by examples like the following where the user can retrieve additional information, i.e. syllable stress, along with the presentation of orthographic data, from the articles of the horizontally ordered lemmata minnegodin and modellering (Example 4).
min´negod, -e. God van liefde: Amor, Kupido, Eros; minnegodin´.
modelleer´, ge-. 1. Voorstel, boetseer, vorm (in klei, was, ens.). 2. Klere dra om dit bekend te stel; modelleerder;.le´ring.
Example 4: Excerpt from Verklarende Afrikaanse woordeboek (1993)
In dictionaries like the already mentioned Chambers 20th Century Dictionary and the Duden Deutsches Universalwörterbuch a less restricted treatment is offered, including a brief paraphrase of meaning. This can also be seen in examples from different dictionaries discussed in Wiegand (1989: 387–392).
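The ordering criteria that separate niching from nesting can be stated as a small decision procedure. The Python sketch below is an illustration under stated assumptions, not a procedure given in the literature cited: it takes the horizontally ordered lemmata in their printed order as expanded full forms, compares them letter by letter while ignoring spaces and hyphens, and checks only the boundary with the following vertically ordered lemma. Second level nesting, where the cluster-internal order is itself not strictly alphabetical, is included for completeness. The names classify_cluster and sort_key are hypothetical.

```python
def sort_key(lemma: str) -> str:
    """Letter-by-letter key: lower-cased, keeping alphabetic characters only."""
    return "".join(ch for ch in lemma.casefold() if ch.isalpha())


def classify_cluster(cluster: list[str], next_vertical: str) -> str:
    """Classify a cluster of horizontally ordered lemmata.

    cluster:       full forms of the clustered lemmata, in printed order
    next_vertical: the first vertically ordered lemma after the cluster
    """
    keys = [sort_key(lemma) for lemma in cluster]
    internally_alphabetical = all(a <= b for a, b in zip(keys, keys[1:]))
    if not internally_alphabetical:
        return "second level nest"
    if keys and keys[-1] > sort_key(next_vertical):
        # Alphabetical inside the cluster, but the next vertical lemma
        # does not follow the last clustered lemma alphabetically.
        return "first level nest"
    return "niche"


# Example 3 (HAT): the baba cluster is followed by the vertical lemma babalief.
baba_cluster = ["bababed", "babadogtertjie", "babahandjie", "babakleertjies",
                "babaseuntjie", "babataal", "babatehuis", "babauitset"]
baba_type = classify_cluster(baba_cluster, "babalief")
```

For the stretch in Example 3 the function returns "first level nest", because babauitset sorts after babalief even though the cluster itself is alphabetical.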
The focus in this paper will not be on the potential debate amongst lexicographers whether examples like those from HAT and Verklarende Afrikaanse Woordeboek qualify as lemmata or not. In this paper, as in Hausmann and Wiegand (1989), they are regarded as lemmata participating in niching and nesting procedures by means of lemma clustering. Second level nesting prevails where the cluster-internal arrangement does not adhere to a strict alphabetical ordering, as seen in the partial article stretch reproduced in Example 5.
broei (ge-) ww. 1. op eiers sit en hulle warm hou om hulle te laat uitkom. 2. voortkom, ontspruit. Daaruit sal onheil -. 3. peins, planne maak. Oor iets -. 4. ontwikkel, in wording wees. Daar is iets aan die – 5. hitte ontwikkel, warm word. Die mis, lug – 6. warm word deur gisting. Die hooi -. 7. deur spesiale verwarming vroeër laat bloei of ryp word. 8. warm kry. In die son sit en -. 9. kleintjies voortbring. Die jakkalse – in die lente. ‘broeiery, broeiing; broeieend, -eiers, -gans, -hen, -hok, -kamer, -kolonie, -paar, – proses, -sak, -tent (by 1); -mis (by 5); -aarde, -bed (by 6); -bak, -glas, -huise (by 7).
‘broeiend (-e; -er, -ste) b.nw. 1. drukkend, …
Example 5: Excerpt from Nasionale woordeboek (1987)
Gouws and Prinsloo (2005) indicate that the nest in this example not only interrupts the alphabetical ordering with regard to the subsequent vertically ordered lemma (broeiend) but the text block internal ordering also deviates from a strict alphabetical ordering. Morpho-semantic motivations determine the second level presentation in this example. The first two nested lemmata (broeiery and broeiing) are derivations, presented in their own partial article stretch within the nest, whereas all the other lemmata are compounds. The nest-internal partial article stretch of compounds displays its own subdivisions with grouping in terms of the different polysemous senses of the first stem of the compounds, as indicated in the treatment of the article of the vertically ordered lemma broei (to breed) to which the nest has been attached. The very limited treatment allocated to these nested lemmata can help users to know which sense of the polysemous lexical item broei is activated in a given compound.
3. Motivations for the use of a sinuous lemma file
3.1 Space-saving endeavours
Deviations from a straight alphabetical ordering have traditionally been motivated on space-saving grounds. The use of niched and nested lemmata has become a frequent procedure in modern-day lexicography. Although this procedure has been discussed in various research papers (cf. Wolski 1989; Wiegand 1989; 2002), these discussions have primarily been of a contemplative nature. The focus has been on the ways in which existing dictionaries present a horizontal macrostructural ordering. Development in printed dictionaries for the twenty-first century demands a transformative approach to a topic like this in order to ascertain whether horizontal ordering is still a valid lexicographic procedure and to determine where possible improvements can be made in the event of a continued use of this form of macrostructural arrangement in printed dictionaries.
The space-saving motivation for horizontal ordering does not only influence the direction of the ordering of lemmata but niched and nested lemmata often receive a much more restricted treatment compared to vertically ordered lemmata. The space-saving drive often also targets the form of the lemma sign, resulting in nested and niched lemmata being subjected to textual condensation. These procedures of textual condensation see horizontally-ordered lemma signs being formally reduced to a place-keeping symbol and a lemma part, as in the condensed lemma sign ~ loaf, representing the lexical item date loaf in the niche attached to the article of the lemma date2 in Groot woordeboek/Major dictionary, as seen in Example 2.
3.2 The selection of lemmata
When employing a sinuous lemma file lexicographers need to decide which lemmata to allocate to the niches or nests and what the selection criteria should be. Although space-saving has been a primary motivation for the use of a sinuous lemma file it cannot be a determining criterion in the selection of the lexical items to participate in procedures of horizontal ordering, because it has nothing to do with the functions of a dictionary. Lemmata included in niches and nests usually represent complex lexical items like compounds and derivations. Compounds and derivations are not exclusively accommodated in nests and niches. The selection and presentation of data in any dictionary should be determined by the functions of the specific dictionary. This also applies to the selection of lemmata for inclusion in the sinuous lemma files. When taking a transformative approach to the planning and compilation of dictionaries and the formulation of lexicographic theory, aspects that fall outside the scope of the functions of the dictionary should not be taken into account. Because the formulation of these functions results from the identification of the user-profile of the dictionary the macrostructural selection and presentation necessarily should also respond to the needs and reference skills of the intended target user group. The selection of candidates for nested and niched lemmata is often done on purely linguistic grounds, especially lexicological and semantic grounds. Whether linguistic motivations like these could and should still be employed in the macrostructural selection needs to be evaluated, and it may be opposed in a function-based approach. Monolingual dictionaries often display a selection of complex lexical items in a nest or a niche with few or no microstructural entries addressed at these lemmata.
The macrostructural presentation of these lemmata bears witness to their existence in the lexicon of the language and users get guidance regarding their spelling and certain aspects of the word formation process, e.g. whether a given lexical item requires a linking phoneme when it occurs as a stem in a complex
lexical item. In Afrikaans the occurrence of the lexical item universiteit (university) as first stem in a compound demands the linking phoneme -s-, except when the second stem starts with an s. On morphological grounds the two compounds universiteitsgebou (university building) and universiteitstudent (university student) could then qualify for selection as candidates for lemmata in a niche or nest attached to the article of the lemma sign universiteit. A typical criterion used for the selection of these niched or nested lemmata in monolingual dictionaries is that of semantic transparency. Even with the focus of many monolingual dictionaries on an explanation of the meaning of the lexical items to be treated, a restricted treatment and therefore an occurrence in the horizontal ordering is allocated to complex lexical items where the meaning of the complex word can presumably be deduced from the meaning of the components of that word. This is a very subjective approach that can hardly be approved from a functional perspective. With regard to these so-called self-explanatory complex items Philip Gove, editor of the Webster’s Third New International Dictionary (1961), remarked that the self in self-explanatory should refer to the intended user; not to the lexicographer. Too often this is not the case and the users who consult the dictionary for a given word because they are not familiar with its meaning have to make their own deductions regarding the meaning of the word. Pure linguistic motivations are used to select the candidates for niche or nest inclusion and the user has no guidance as to determine when a specific linguistic criterion will apply. 
The notion of a self-explanatory complex item in a monolingual dictionary does not have the concept of a self-translatable complex item as counterpart in a bilingual dictionary, because a self-explanatory item in the source language seldom has a self-explanatory target language item that maintains a one-to-one relation between the respective stems. No linguistic or other motivation is given for the selection of items to be included in the horizontal ordering in bilingual dictionaries. The only indication is the fact that the complex words are related to the main lemma by being its derivations or compounds. No consistency or predictability can be found in this selection because some complex items are lemmatised in the vertical and some in the horizontal ordering. It seems to be a haphazard approach, and this neither enhances the confidence with which a user consults a bilingual dictionary nor helps to ensure rapid access to these lemmata. Linguistic criteria are not sufficient to introduce a different type of lemmatisation. These criteria are based on the linguistic expertise of the lexicographers and their interpretation of the presumed transparency of a given complex word, and they rely on the linguistic intuition of dictionary users.
Rufus H. Gouws
4. Access

4.1 Rapid access and successful access
Without disregarding the importance of space and the need to save space, a problem not uncommon in the making of printed dictionaries, a lexicographer can hardly argue that space-saving procedures can or should play a role in achieving the functions of a given dictionary. The importance given to space-saving procedures in the implementation of niching, nesting and especially textual condensation often coincides with the ill-fated notion that the layout of a dictionary is not part of the lexicographic process but merely an assignment, too often only a cosmetic one, to be undertaken by the publisher (cf. Almind and Bergenholtz 2002). In the planning and compilation of any dictionary it has to be a priority to ensure successful access to data by the intended target users. Different users will employ different access routes to reach their required destination in a dictionary, and the layout of the dictionary and of the individual articles plays an extremely important role in this access process. Space-saving procedures that impede optimal access to data, whether in the dictionary and article layout, the use of textual condensation or the use of a horizontally ordered macrostructure, also impede an optimal retrieval of information and therefore the fulfilment of the lexicographic functions and the achievement of the genuine purpose of the dictionary. The access process (cf. Bergenholtz and Gouws 2007) is initiated by a specific extra-lexicographic problem that needs to be solved. During the pre-consultation phase of the access process the user decides where and how to look for the solution to a specific problem. Users then embark on the outer search route to try and reach the required lemma sign. When evaluating the access process one should not only look at the eventual success of reaching the required lemma sign that functions as guiding element of a nested or a niched article.
For users employing a dictionary as a utility tool the speed of access is also important. Where a dictionary is used, e.g. by language practitioners or translators who need rapid access to data while performing their professional duties, the layout and presentation play a vital role in defining the success of the dictionary as a tool for these users. But although rapid access is important and adds to the value of the dictionary as a practical instrument, it is not part of the functions of the dictionary. A successful dictionary consultation procedure and an optimal retrieval of information to achieve a specific lexicographic function and to satisfy a given user need are still possible whether access is rapid or slow. The speed and ease of access have no direct bearing on the lexicographic functions but successful
access is a prerequisite for the fulfilment of the lexicographic functions of any given dictionary. Adherence to a strict alphabetical ordering, even within the sinuous lemma file, when employing a system of niching can be negotiated by many dictionary users, although not with the same success as an ordering according to a consistently applied straight alphabetical procedure. Where the intended target users lack the appropriate dictionary-using skills lexicographers will do well to refrain from using a sinuous lemma file. First level nesting offers a slightly more complicated access route than niching because the user may not know that a given lemma is to be found in the nest and may continue searching down the vertical ordering. Having reached the next vertically ordered lemma sign, a lemma sign that alphabetically should precede the required lemma sign, the user may further continue down the vertical ordering, still expecting to find the relevant lemma sign. Failure to find it may then compel the user to backtrack on the search route and to enter the nest where the required lemma sign can be found. This problem of finding the lemma sign may impede successful dictionary use.

4.2 Access to nested lemmata
As indicated earlier a grouping into partial article stretches in the lemma cluster attached to the lemma sign broei (cf. Example 5) is done on account of word-formation types, i.e. derivations versus compounds. Within the partial article stretch containing the compounds a semantic motivation, i.e. the specific sense of the lexical item represented by the first stem of the compound, leads to the introduction of different subclassifications. The implementation of these methods of ordering may enhance the quality of the retrieval of linguistic information from the dictionary. However, this ordering of the sublemmata within the article nest is exclusively done on linguistic grounds. The lexicographers have decided that although the nested lemma signs represent semantically transparent complex words, the dictionary user still needs additional guidance to ensure the proper interpretation of the meaning of the given word; hence the grouping into partial article stretches of lemmata in which the same sense of the lexical item represented by the first stem occurs. Within this cluster two main additional ordering types are used, i.e. a word formation and a semantic ordering. They relegate the alphabetical ordering to a second level procedure. In a dictionary where the alphabetical ordering prevails, the introduction of new types of ordering in selected partial article stretches confuses the users. If a user consults this dictionary to find the word broeibed he/she may or may not know that the word is a compound and not a derivation. Not knowing the meaning of a word, and therefore consulting the
dictionary, implies that this user will not know which sense of the lexical item broei prevails in the compound broeibed. A typical consultation procedure would therefore have to negotiate all the lemmata in the nest from the first one until the required lemma is found. Users who are familiar with the system used in the dictionary might realize, when they come to the end of the partial article stretch reproduced in Example 6 without having found the required word, i.e. broeibed, that its absence from this particular alphabetically ordered partial article stretch does not necessarily imply that the word has been omitted from the dictionary.

broei-eend, -eiers, -gans, -hen, -hok, -kamer, -kolonie, -paar, -proses, -sak, -tent
Example 6: Partial article stretch adapted from Nasionale woordeboek (1987)

They would realize that their search route may take them via different alphabetically ordered partial article stretches within the one article cluster and that they may still find the required word. Users less familiar with the system may not realize that and may easily quit the search if the required word has not been found in an alphabetically ordered partial article stretch where the last lemma(ta) would alphabetically have followed the lemma sign they are looking for. Here successful dictionary consultation relies on prior linguistic knowledge on the side of the user, knowledge not to be expected from someone having to consult this dictionary. Had this dictionary resorted to niching as procedure for the horizontal ordering of lemmata the result would have been as in Example 7.
broei-aarde, -bak, -bed, -eend, -eiers, -gans, -glas, -hen, -hok, -huise, -kamer, -kolonie, -mis, -paar, -proses, -sak, -tent
Example 7: Niching resulting in horizontal ordering of lemmata

In spite of the condensed form of the lemma signs, a presentation of nothing more than place-keeping symbols and lemma parts, access to a word like broeibed would have been much easier in a niched cluster. From an access perspective future dictionaries should rather avoid the nested cluster version of sinuous lemma files.

5. Niching, nesting and lexicographic functions

From the motivations for the use of a sinuous lemma file presented in a preceding section it is clear that lexicographic functions did not play any role at all. This is typical of many older dictionaries with regard to many decisions. Lexicography in
the twenty-first century needs to break away from this tradition and dictionaries should be planned and compiled with lexicographic functions having their rightful impact. Within the functional theory it has been emphasized that lexicographic functions should determine every aspect of the dictionary (cf. Tarp 2000). This implies that when evaluating a dictionary on the basis of lexicographic functions one has to view each aspect of such a dictionary in terms of the contribution it makes (or does not make) to satisfying the functions identified for the given dictionary. From the perspective of dictionary functions niching does not necessarily have to be abandoned as a macrostructural ordering type. Problems with regard to niching are primarily restricted to the speed of access, with rapid access being impeded by both the horizontal ordering and the frequent use of condensed lemma signs. Although the functional approach does not imply the abandonment of procedures of horizontal ordering, this macrostructural procedure, as seen in some of the dictionaries employing this system, has a limited value compared to the typical application of vertically ordered macrostructures in similar dictionary types. In the examples from both monolingual and bilingual dictionaries presented in this article the space-saving endeavours in the use of a horizontal ordering of lemmata also induce a restricted treatment in the articles with these lemmata as guiding elements. In the examples from the bilingual dictionary the treatment was restricted to orthographic guidance, rudimentary data on inflection in some articles and a presentation of one or more translation equivalents without any cotextual data and, as an exception, perhaps an entry giving some contextual guidance.
Although dictionaries like the Chambers 20th Century Dictionary and the Duden Deutsches Universalwörterbuch give horizontally ordered lemmata a treatment not too different from that given to their vertically ordered counterparts, the examples of articles from the monolingual dictionaries given in this paper display an even worse situation than that prevailing in the bilingual dictionaries, with no sign of a paraphrase of meaning, the most typical entry in the default articles of these dictionaries. An evaluation of the procedures of nesting and niching from a functional perspective should be directed not only at the macrostructural ordering but also at the implications of the ordering, i.e. the type of treatment allocated to the horizontally ordered lemmata. A well-balanced evaluation of such an ordering should not look at the target of the evaluation in isolation but should contextualise it within the given dictionary as a whole. It should be evaluated whether the functions foreseen for the dictionary as a whole can also prevail in the dictionary sections to be investigated, here the articles of niched and nested lemmata. Unfortunately many dictionaries, including the ones referred to in this paper, do not explicitly state their envisaged functions. Having been compiled in an era where the focus in theoretical lexicography had not yet been on lexicographic functions one cannot,
however, argue that the functional approach had not played any role in the planning and compilation of these dictionaries. Even if the term lexicographic functions had not come into the discussion the selection and presentation of data had to be done in accordance with the dictionary type and the needs of the typical user of such a dictionary. The advent of the modern theory of lexicographic functions should not be seen as a phase in the development of lexicography where new theoretically developed ideas are merely imposed on the lexicographic practice. The theory of lexicographic functions brings the assignment of a dictionary as a utility tool to the fore and emphasises the way in which such a tool needs to be directed at the specific needs of specific users in a specific situation of dictionary use. Compiling a function-based dictionary does not imply the introduction of a whole new range of macrostructural or microstructural entry types. It rather leads to a selection of items that will best ensure the satisfaction of the relevant function(s). Although dictionaries do not always refer to their function(s) it may never be presumed that they do not have functions. Users consult those dictionaries for specific purposes and these dictionary consultations are invariably linked to specific lexicographic functions, whether explicitly mentioned or not. A transformative approach needs to create an increased awareness of dictionary functions and the way in which they should be dealt with in the lexicographic practice. The dictionaries referred to in this paper clearly employ a multifunctional approach, with the focus on both cognitive and communicative functions and, in the latter, directed at text reception as well as text production functions. The use of macrostructural procedures of horizontal ordering should be evaluated against this background.
One should compare the information users can retrieve from the articles of horizontally ordered lemmata to the retrieval from articles of vertically ordered lemmata. This comparison could lead to a determination of the success in satisfying the relevant dictionary functions. Lexicographic treatment makes provision for varying degrees of assistance with regard to lexicographic functions. Users may consult a given dictionary for text production purposes and their need may be satisfied by a brief paraphrase of meaning, which confirms a sense or meaning of the word they need in the text production process, or even by only getting the proper spelling form of the given word. This is a light or a weak form of text production assistance. A stronger form would include, besides the paraphrase of meaning, additional entries like example sentences illustrating the typical use of the given word, morphological entries enabling the user to use the word in its inflected forms, etc. A weak version of a text reception need prevails, e.g. where the user merely wants to link a complex word to the simplex word which is semantically related to the first stem of the complex form. The user may find the word babadogtertjie in a text and by consulting a monolingual dictionary a niched
occurrence like the following may help the user to understand the word – especially if the user is a mother-tongue speaker of Afrikaans (Example 8).

ba´ba (-s; -tjie) 1 Baie jong kindjie; suigeling:..... baba: ~bed, ~dogtertjie,...
Example 8: Excerpt from HAT (2005)

In a similar way the nature and extent of the treatment in horizontally ordered articles in a bilingual dictionary may ensure the degree of assistance regarding a given function that will sufficiently meet the needs of certain users. Limiting the entries in a given article does not necessarily mean a decrease in the lexicographic quality or the functional value. What is important is that the restriction, whether it applies to all articles or only to the horizontally ordered articles, should still result in an article that embodies the functions of that dictionary. An approach based on lexicographic functions implies that all entries in a dictionary should be selected and treated accordingly. This also applies to all the different components; consequently Gouws and Steyn (2005) have suggested the use of a transtextual approach to lexicographic functions so that the outer texts could also actively support the functions identified for the given dictionary. The decision to embark on a procedure of horizontal ordering should be made taking cognisance of the functions of the intended dictionary. From a functional perspective the examples taken from the monolingual dictionaries HAT and Nasionale woordeboek (cf. Examples 3 and 5) offer no assistance in terms of the cognitive function, a function that should prevail where the genuine purposes of these dictionaries need to be achieved. Where a weak version of text reception assistance applies, and even in an extremely weak version of text production assistance, with the user interested in ascertaining only the spelling of a given form, the horizontal ordering as seen in the given examples of niching and even first level nesting may suffice.
But the value of the dictionary may be of a more limited nature compared to other dictionaries with a consistent straight alphabetical ordering. When using niched and first level nested lemmata the lexicographer introduces a discriminating system into the dictionary whereby an envisaged function cannot be satisfied to the same degree in all the articles of the dictionary. Where the dictionary is designed to assist with a strong degree of text reception the given examples fail to satisfy the needs of the user, whereas the need for a moderate to strong text production function is totally ignored. Where the niched examples still allow reasonably easy access to the lemmata the application of second level nesting (cf. Example 9) yet again impedes access and eschews the lexicographic functions.
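The difference in access between the niched cluster of Example 7 and the nested cluster of Nasionale woordeboek (Example 9) can be illustrated with a rough sketch. The lemma parts and sense numbers are taken from those examples; everything else is an assumption made for illustration only: the function names are hypothetical, the user is modelled as reading lemma parts strictly from left to right, and a less experienced user is assumed to give up at the end of a partial article stretch whose last lemma alphabetically follows the word sought.

```python
# Niched cluster (Example 7): one strictly alphabetical run of lemma parts.
NICHE = ["aarde", "bak", "bed", "eend", "eiers", "gans", "glas", "hen", "hok",
         "huise", "kamer", "kolonie", "mis", "paar", "proses", "sak", "tent"]

# Nested cluster (Example 9): partial article stretches, each tied to a
# sense number of the main lemma broei via the marker "(by n)".
NEST = [
    (1, ["eend", "eiers", "gans", "hen", "hok", "kamer", "kolonie",
         "paar", "proses", "sak", "tent"]),
    (5, ["mis"]),
    (6, ["aarde", "bed"]),
    (7, ["bak", "glas", "huise"]),
]


def read_niche(target):
    """Return how many lemma parts are read before the target is reached."""
    for steps, part in enumerate(NICHE, start=1):
        if part == target:
            return steps
    return None  # not lemmatised in the niche


def read_nest(target, persistent):
    """Return (lemma parts read, sense number); sense is None on failure.

    A non-persistent user (an assumption of this sketch) quits at the end
    of a stretch whose last lemma part alphabetically follows the target.
    """
    steps = 0
    for sense, stretch in NEST:
        for part in stretch:
            steps += 1
            if part == target:
                return steps, sense
        if not persistent and stretch[-1] > target:
            return steps, None
    return steps, None


print(read_niche("bed"))                    # 3
print(read_nest("bed", persistent=True))    # (14, 6)
print(read_nest("bed", persistent=False))   # (11, None)
```

Under these assumptions broeibed is reached after three lemma parts in the niche but only after fourteen in the nest, and the less persistent user never reaches it. Even on success the nest yields only a sense number, which still has to be looked up at the main lemma broei.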
‘broeiery, broeiing; broei-eend, -eiers, -gans, -hen, -hok, -kamer, -kolonie, -paar, -proses, -sak, -tent (by 1); -mis (by 5); -aarde, -bed (by 6); -bak, -glas, -huise (by 7).
Example 9: Excerpt from Nasionale woordeboek (1987)

A user looking for text production assistance in order to use, e.g., the word broeibed will in the first instance have problems accessing the lemma sign, as previously discussed. Semantic information cannot be retrieved in a direct way because no paraphrase of meaning is addressed at this lemma sign. The retrieval of semantic information regarding this word can only be achieved by means of complex and cumbersome methods in which an indirect form of remote partial addressing is involved. The entry (by 6) indicates that sense number six given for the preceding main lemma broei applies in this compound. The paraphrase of meaning given as sense number 6 is primarily addressed at the lemma sign broei and indirectly, by means of article external remote addressing, at the sublemmata broeiaarde and broeibed, each represented by the combination of a place-keeping symbol and a lemma part, i.e. -aarde and -bed respectively. As is the case with access, the clustering of nests as a procedure of horizontal ordering is also detrimental to the fulfilment of the lexicographic functions and should rather be avoided in general dictionaries.

6. New uses of lemma nesting

In spite of some critical remarks regarding the use of lemma nesting in many existing dictionaries, a specific version of this procedure can be employed to ensure an improvement in the lexicographic practice, and lexicographers should be open to new uses of old procedures. Being under constant scrutiny, the lexicographic presentation of treatment units will necessarily undergo changes from time to time and dictionaries in the twenty-first century should also give evidence of innovative applications in this section of the lexicographic practice.
Lemmata are regarded as the default treatment units but microstructural entries can also be elevated to treatment units by means of procedures of non-lemmatic addressing. Lemma nesting can play an important role in ensuring a better presentation of certain treatment units. Traditionally, procedures of lemma nesting have primarily been applied to complex words. However, it has been indicated in this paper that a dictionary like the Duden Deutsches Universalwörterbuch has also included simplex forms in the nests attached to the article of a lemma sign representing a compound word. In general dictionaries, both monolingual and bilingual, multiword expressions are
typically included and treated in the article of a lemma sign representing a word regarded as core of the multiword expression. A fixed expression like blood is thicker than water will typically be included in a slot for fixed expressions within the article of the lemma sign blood. Multiword fixed expressions are fully-fledged lexical items but their inclusion within the article of another lemma sign does not reflect their status as independent lexical items. Instead of including them as microstructural treatment units they need to be regarded as macrostructural entries. Gouws (2008) argues that acknowledging fixed expressions as macrostructural items demands allocating them an appropriate macrostructural position. Due to many practical problems regarding the alphabetisation of fixed expressions as main lemmata in the straight alphabetical vertical ordering of a dictionary, their occurrence in a text block attached to the article of a vertically ordered lemma, representing a word that has been identified as guiding element of the fixed expression, seems to be the best possible way of entering them in a dictionary. Such a text block could accommodate from a single to an infinite number of fixed expressions. This text block should not be seen as a microstructural slot but as a nested partial article stretch that includes multiword expressions as macrostructural treatment units. This might imply the sacrifice of the traditional notion of a straight alphabetical ordering in these dictionaries. In current dictionaries the access process guiding users to fixed expressions goes via a word selected as guiding element of the fixed expression. This will remain the same. As fully-fledged treatment units fixed expressions can now be dealt with in a systematic way, as is the case with other sublemmata. Different types of general dictionaries do not have to devise ad hoc methods to include and treat these items and as lemmata they can receive a more extensive treatment. 
This will ensure the systematic inclusion of fixed expressions in general dictionaries but will also enhance the way in which their presentation and treatment are in accordance with the lexicographic functions identified for a given dictionary. This is an important improvement on the current presentation and treatment that too often fail to assist users who consult general dictionaries with a strong text production need with regard to the use of fixed expressions.

7. Concluding remarks

In the planning of future dictionaries lexicographers should base their decision whether or not to employ sinuous lemma files, not merely on lexicographic tradition. These procedures may be used but their use should be motivated in terms of the user-profile and the envisaged functions of the dictionary. Distinctions need to
be made between different types of sinuous lemma files, including lemma clusters, and their implications for the various functions of the dictionary and the successful access to the data on offer.

References

A. Dictionaries

Botha, W.F. et al. (eds.). 1951–. Woordeboek van die Afrikaanse Taal. Stellenbosch: Buro van die WAT.
De Villiers, M. et al. (eds.). 1987. Nasionale Woordeboek. Seventh edition. Cape Town: Nasou.
Drosdowski, G. et al. (eds.). 1996. Duden Deutsches Universalwörterbuch. Mannheim: Dudenverlag.
Dudenredaktion. 2003. Duden Deutsches Universalwörterbuch. Fifth edition. Mannheim: Dudenverlag.
Eksteen, L.C. et al. (eds.). 1997. Groot Woordeboek/Major Dictionary. Fourteenth edition. Cape Town: Pharos.
Labuschagne, F.J. and Eksteen, L.C. 1993. Verklarende Afrikaanse Woordeboek. Eighth edition. Pretoria: J.L. van Schaik.
Gove, P.B. et al. (eds.). 1961. Webster’s Third New International Dictionary of the English Language. Springfield, Massachusetts: Merriam-Webster.
Kirkpatrick, E.M. et al. (eds.). 1983. Chambers 20th Century Dictionary. Edinburgh: Chambers.
Odendal, F.F. and Gouws, R.H. 2005. Verklarende Handwoordeboek van die Afrikaanse Taal. Fifth edition. Cape Town: Pearson Education.

B. Other literature

Almind, R. and Bergenholtz, H. 2000. “Die ästhetische Dimension der Lexikographie.” In Bild im Text – Text und Bild, U. Fix and H. Wellmann (eds.), 259–288. Heidelberg: C. Winter.
Bergenholtz, H. and Gouws, R.H. 2007. “The access process for fixed expressions.” Lexicographica 23: 236–260.
Gouws, R.H. 2008. “Fixed word combinations as second level treatment units in dictionaries.” Paper read at the Fifth International Colloquium on Lexicography, Bratislava.
Gouws, R.H. and Prinsloo, D.J. 2005. Principles and Practice of South African Lexicography. Stellenbosch: African SunMedia.
Gouws, R. and Steyn, M. 2005. “Integrated outer texts: a transtextual approach to lexicographic functions.” In Schreiben, Verstehen, Übersetzen, Lernen, I. Barz, H. Bergenholtz and J. Korhonen (eds.), 127–136. Frankfurt: Peter Lang.
Hausmann, F.J. and Wiegand, H.E. 1989. “Component parts and structures of general monolingual dictionaries: a survey.” In Wörterbücher. Dictionaries. Dictionnaires. An International Encyclopedia of Lexicography, F.J. Hausmann, O. Reichmann, H.E. Wiegand and L. Zgusta (eds.), 328–360. Berlin/New York: Walter de Gruyter.
Tarp, S. 2000. “Theoretical challenges to LSP lexicography.” Lexikos 10: 189–208.
Wiegand, H.E. 1989. “Aspekte der Makrostruktur im allgemeinen einsprachigen Wörterbuch: alphabetische Anordnungsformen und ihre Probleme.” In Wörterbücher. Dictionaries. Dictionnaires. An International Encyclopedia of Lexicography, F.J. Hausmann, O. Reichmann, H.E. Wiegand and L. Zgusta (eds.), 371–409. Berlin/New York: Walter de Gruyter.
Wiegand, H.E. 2002. “Über textuelle Strukturen der Wörterbuchartikel und Artikelnischen im De Gruyter Wörterbuch Deutsch als Fremdsprache.” In Perspektiven der pädagogischen Lexikographie des Deutschen II, H.E. Wiegand (ed.), 413–442. Tübingen: Max Niemeyer.
Wolski, W. 1989. “Das Lemma und die verschiedenen Lemmatypen.” In Wörterbücher. Dictionaries. Dictionnaires. An International Encyclopedia of Lexicography, F.J. Hausmann, O. Reichmann, H.E. Wiegand and L. Zgusta (eds.), 360–371. Berlin/New York: Walter de Gruyter.
Reviewing printed and electronic dictionaries
A theoretical and practical framework

Sandro Nielsen

Dictionary reviewing is an integral part of the lexicographic universe. However, lexicographers have called for generally applicable principles embracing both printed and electronic dictionaries. I propose that scholarly reviews contain information that is useful to their intended audiences and a set of principles that are generally applicable to achieve this goal. I suggest that reviewers may write useful reviews by treating the dictionary as an object of analysis and examination. I further propose that the lexicographic, factual and linguistic approaches should be combined with the three significant features of dictionaries: lexicographic functions, data and structures. Finally, I argue that a review should give a true and fair view of the dictionary, and this will strengthen the role of dictionary reviews in the scholarly discourse community.

Keywords: dictionary reviewing; lexicographic functions; lexicographic data; lexicographic structures; significant features; useful information
1. Introduction

Dictionary reviewing is an integral part of the lexicographic universe. As this universe has gradually expanded – the advent of electronic dictionaries is a case in point – it is necessary to take a fresh look at the work involved in dictionary reviewing. Reviews come in many shapes and sizes and some contribute to the development of lexicography more than others, for instance because of their substantive contents and the media in which they are published. Nonetheless, the communicative situations in which they are rooted affect readers’ perception of reviews, the way reviewers express themselves, and the extent of their contribution to lexicography. Some reviews are intended to be read by authors of dictionaries, other lexicographers, publishers, subject-field experts, and the specialist readers of a particular journal. Other dictionary reviews are written for potential buyers and the general readers of a popular magazine, both experts and laypersons.
Dictionary reviews are generally ranked low among the various types of scholarly writings. One reason is the lack of generally applicable principles, and as Jackson (2002: 173) suggests one of the most important issues for dictionary reviewing “is to establish a sound and rigorous basis on which to conduct the criticism together with a set of applicable criteria.” A statement like this calls for a set of general principles that embrace both printed and electronic dictionaries regardless of type, i.e. general and specialised dictionaries, monolingual and bilingual dictionaries etc. In this paper I will first attempt to identify the characteristic features of the academic or scholarly dictionary review and then propose some guidelines that can function as the basis of a theoretical and practical framework for writing reviews of printed and electronic dictionaries. The framework should be seen as a set of general principles underlying the detailed information that makes up the actual review, so that the information contained in the review is useful to readers in the sense that it contributes to theoretical and/or practical lexicography. It does not attempt to impose a duty on reviewers to use specific and detailed information, words, etc.; the aim of the framework is to set out the objectives and basic principles of dictionary reviews. This involves the examination of the dictionary as an object of analysis, examination and description; the actual approach to reviewing; and the fundamental requirements as to the informative value of the review.

2. Identifying key features of scholarly reviews

Reviews published in academia are examples of what is generally referred to as scholarly communication. More specifically, dictionary reviews can be viewed as a specific genre within the disciplinary community called lexicography.
Individual disciplinary communities often share principles and practices, but it is widely accepted that each discipline has developed its own practices, principles and requirements. Kling and McKim (1999: 905) provide a general standard by claiming that scholarly communication is “a communicative practice anchored in three dimensions: publicity, accessibility, and trustworthiness.” Dictionary reviews meet the first two criteria if they are published in journals that are publicly accessible and potential readers are made aware of their existence. Fulfilment of the criterion of trustworthiness, also referred to as reliability, partly depends on whether reviews are peer reviewed (they are in effect published peer reviews) and partly on the practices, principles and requirements of the lexicographic community. The following discussion proposes a number of principles and requirements that may be used as benchmarks for trustworthy dictionary reviews. Not all dictionary reviews can rightly be described as academic or scholarly. One reason is that they are published in non-reviewed journals, do not have as
Reviewing printed and electronic dictionaries
their genuine purpose to contribute to the development of lexicography, and nor do their authors intend them to do so. Most reviews in newspapers, popular weeklies and trade magazines are examples of non-academic reviews, and their genuine purpose is to draw attention to a new publication, to describe it and to inform consumers in general that it has been introduced to the market. This does not mean that such reviews can never contribute to the development of lexicography, but they are usually written and published in non-academic settings in which the actors and media are not known for their scholarly interests. In contrast, Jackson (2002: 183) argues that academic dictionary reviews generally have the potential of contributing to lexicographic research, and such reviews are written and read by researchers as a natural part of their job. Most people regard researchers and their scholarly products as being in a different league than ordinary people and ordinary products. But what distinguishes reviews that contribute to research from those that do not? One answer is provided by Phillips and Pugh (2005: 48), who argue that “Research goes beyond description and requires analysis.” If this is extended to lexicography, one result is that dictionary reviews should not merely be descriptive but contain more or less thorough analyses, evaluations and reflections. This is supported by findings reported in the theoretical literature. Ripfel (1989: 32–33) proposes that an academic dictionary review is an examination that should contain a description of the contents of the dictionary (or dictionaries) under review, make an evaluation, explain the reasons for making the evaluation (including an explicit statement about the underlying theories and methods employed), and draw positive, negative or mixed conclusions. 
In other words, scholarly reviews that contribute to lexicographic research are at a level above everyday reflection and contain analyses, evaluations and findings that can form the basis of research conducted by others. The next step is to examine the objective of academic reviews. In general, the objective or goal of reviews is to give useful information to readers. The potential reader groups of any review are many and varied, but the following are the most important for this discussion. The primary audience of academic reviews is made up of scholars and researchers. This group includes researchers within lexicography and within other disciplines, who read reviews to discover new insights that they can use in their own research. The authors and publishers of the dictionaries examined are also primary audiences, even though some authors of specialised and technical dictionaries are experts in research areas other than lexicography and not all publishers employ in-house lexicographers. Another important group consists of dictionary users, either existing or potential users who look for state-of-the-art lexicographic products to satisfy their needs in specific types of situations. Authors and publishers of other dictionaries constitute a group whose members may find inspiration and solutions in reviews, which they themselves can use
Sandro Nielsen
in their future work. The fifth relevant group consists of ordinary dictionary buyers, including existing and potential users, librarians and donors who search for high-quality gifts to, for instance, relatives. Finally, persons generally interested in literature or in domain-specific literature are also likely readers of dictionary reviews. It should be noted that the above groups may overlap, but they represent different, though general, types of primary and secondary audiences that may find the information in dictionary reviews useful. Information that is useful to readers may take many forms. In this context the term useful information means information that enables readers of dictionary reviews to make informed decisions in the following general types of situation:

1. When they develop practical and/or theoretical lexicographic principles;
2. When they form opinions about dictionaries in practical and theoretical contexts;
3. When they use dictionaries in their practical and theoretical work;
4. When they buy or contemplate buying dictionaries;
5. When they recommend dictionaries.

The above discussion makes it possible to provisionally identify the key features of scholarly reviews: they are reviews providing information that is useful to readers in making decisions about the use, purchase or recommendation of dictionaries, in forming and holding opinions about dictionaries, or in contributing to the practical and theoretical development of lexicography. In order to create the proper basis for achieving this objective, it is necessary to take a closer look at dictionaries as the objects of examination.

3. Defining the object of dictionary reviews

In every examination it is necessary to have a clear definition of what you are studying. Obviously, the objects of dictionary reviews are dictionaries. But what actually is a dictionary?
First of all, it should be appreciated that in a review process the dictionary changes from being mainly an object for use and becomes an object of examination, description and analysis. A study of various theoretical contributions to dictionary criticism, including Chan and Taylor (2001), Jackson (2002: 173–183), Osselton (1989), Steiner (1979) and Tomaszczyk (1986), reveals a relatively strong bias towards a linguistic approach to dictionary reviewing. They generally treat dictionaries as containers of purely linguistic concepts and focus on linguistic categories such as word classes, affixes, pronunciation, grammatical information (e.g. inflection, comparison and syntax), semantic and encyclopaedic information and equivalents. This is in line with the traditional understanding of
dictionaries, one that is also found in the article dictionary in Oxford English Dictionary Online (2008):

A book dealing with the individual words of a language (or certain specified classes of them), so as to set forth their orthography, pronunciation, signification, and use, their synonyms, derivation, and history, or at least some of these facts: for convenience of reference, the words are arranged in some stated form, now, in most languages, alphabetical; and in larger dictionaries the information given is illustrated by quotations from literature; a word-book, vocabulary, or lexicon.
This definition is traditional in that it is based on linguistic categories. A reviewer who bases his or her review on this or a similar definition runs the risk that the linguistic categories become a surrogate for the dictionary as they, and not the dictionary, become the object of the review. The approach to reviewing will then be linguistic and not lexicographic. This linguistically based understanding of the dictionary is the likely reason why, according to Bergenholtz and Mogensen (1993: 29), the two most frequent topics addressed in reviews are the lemma stock and semantic/encyclopaedic information. Rather than approaching the dictionary from a linguistically inspired point of view, it may be more appropriate to introduce what may be called a lexicographic approach to the concept of dictionary. According to Nielsen (2003: 111–112) this approach focuses on the significant features of a dictionary, i.e. features that are inherent in any dictionary and are not trivial for classification purposes. A significant feature helps to shed light on the existence of a dictionary as an object of investigation, description and analysis. The advantage of this approach is that it applies to printed, CD-ROM as well as online dictionaries, regardless of type; it highlights the fact that dictionaries are more than linguistic categories. Furthermore, it is theoretically neutral as it can be used as a starting point no matter which “theoretical school” reviewers belong to. Both printed and electronic dictionaries have at least three significant features. Firstly, the overriding feature of a dictionary is that it has been designed to fulfil one or more functions, referred to as lexicographic functions, e.g. communicative functions such as the understanding of texts, translation and text production, and cognitive functions such as knowledge acquisition in communication-free contexts. 
Secondly, the dictionary contains lexicographic data that have been selected to support the function(s) of the dictionary. The interrelationship between the first and second feature is important because each lexicographic function has been materialised through the items containing the data. Thirdly, the lexicographic structures combine and link the data in order to support and fulfil the dictionary function(s). Lexicographic structures may be order structures (e.g. the macrostructure), network structures (e.g. the cross-reference structure), information
structures (e.g. the distribution structure) and links in online dictionaries. These features should not be seen in isolation, as the dictionary is made up of the totality of the above three significant features and their interrelationship. A dictionary may have features other than those described above as being significant, but they will not be considered here. It follows that a dictionary is not merely a container of the lexicon of a language. It is much more than that. Only one of the three significant features directly relates to linguistics and linguistic principles, namely the lexicographic data. The other two features do not fall within linguistics, with the result that linguistics concerns one-third of the concept of dictionary. Lexicographic functions deal with the help dictionaries can give users when they are engaged in a particular type of activity that has nothing to do with dictionaries per se, for instance translation. The way in which the data are structured and linked to give optimal help is also of no concern to linguistics, as evidenced by e.g. the creation of electronic links. Reviewers should therefore regard dictionaries as complex entities that provide specific types of help to specific types of users in specific types of situations. On the basis of the above discussion, it is appropriate to revert to the provisional identification of key features discussed in Section 2. The fact that the object of dictionary reviews is a complex entity with several significant features means that the provisional view expressed at the end of Section 2 needs to be extended. The objective of scholarly reviews is to provide information about the functions, data types and structures of one or more dictionaries that is useful to readers in making decisions about the use, purchase or recommendation of dictionaries, in forming and holding opinions about dictionaries, or in contributing to the practical and theoretical development of lexicography.
This objective can be achieved through different approaches to reviewing.

4. Three approaches to reviewing dictionaries

A framework embracing both printed and electronic dictionaries should not be biased towards linguistically based approaches to reviewing. In some instances, however, it may be appropriate to focus on linguistic elements, e.g. if the dictionary in question is a language dictionary designed for learners, such as Collins Cobuild English Dictionary (1995), Longman Dictionary of Contemporary English (2005) and Oxford Advanced Learner’s Dictionary (2005), particularly if the reviewer only or mainly addresses linguistic aspects, rules, structures etc. Nevertheless, if the full potential of a review is to be realised, including the review of dictionaries for specialised languages and technical dictionaries, it is necessary to look at reviewers and their competences in a new light.
There are three basic approaches to reviewing dictionaries that can and should be combined irrespective of dictionary type. The first is the lexicographic approach, which analyses, describes and evaluates the lexicographic practices and principles used in compiling the dictionary under review. These practices and principles may be compared with those lexicographic practices and principles that, for one reason or another, were not used by the dictionary authors. This approach requires knowledge of lexicographic theories, principles and practices, and it is relevant whether the dictionary is a printed or an electronic one. Secondly, the factual approach focuses on an analysis, description and evaluation of the factual (semantic and encyclopaedic) data and topics contained and treated in the dictionary related to the lexicographic functions described above. This approach requires knowledge of specific subject fields, and it is important whether the dictionary is a sub-field, single-field or multi-field dictionary as described in Nielsen (1990: 132–135). Finally, the linguistic approach involves an analysis, description and evaluation of the language treated by the dictionary; it requires knowledge of language for general purposes (LGP) and/or language for specific purposes (LSP) as appropriate, and it is relevant whether the dictionary is mono-, bi- or multilingual. In order to make reviews as useful to readers as possible, these three approaches may be directly related to the significant features of dictionaries.

5. Combining significant features and approaches to reviewing

When striving to realise the full potential of a review, reviewers may combine the three significant features of dictionaries and the three approaches to reviewing. There are various options available to reviewers and they can be more or less detailed, but examples of potential combinations are discussed below.

5.1 Using the lexicographic approach
Using the lexicographic approach, reviewers may address general issues that are relevant to all three significant features. However, it is important to note that printed and electronic dictionaries are not a text type. As pointed out by Bergenholtz, Tarp and Wiegand (1999: 1763), the printed dictionary is a collection of text types, e.g. the preface, the user guide, the dictionary grammar, the subject-field component and the wordlist. These texts are structural components and it is appropriate to establish some sort of relationship between them. At a functional level, the components are related in terms of data contents, and at a structural level they are related in terms of the actual form of the dictionary. Accordingly, all the individual components constitute a coherent whole: the dictionary; and the relationships
between the components form the basis of defining the scope of dictionary reviews. The position is similar with online dictionaries, as they may and often do consist of a number of text files that are linked to the database in which searches are made. Some online dictionaries have prefaces, user guides and additional texts, for instance Oxford English Dictionary Online (2008) and Merriam-Webster Online Dictionary (2008). Each dictionary component may then be examined on the basis of criteria that apply to each text type or genre, and these criteria can then be related to the three significant features of the dictionary. For instance, it may be relevant to discuss whether the outside matter is integrated or non-integrated in the sense described in Nielsen (1999: 94), as this is directly related to the function(s) and structures of the dictionary; further see Section 5.2 below. In its front matter, American Heritage College Dictionary (2004) has a text titled “Style Manual”, which is clearly relevant to the communicative functions text production, text revision and text editing, but no mention is made of this fact in the user guide. Reviewers should point this out as it would have been desirable if the dictionary had explicitly made users aware of the potential help that can be found in the dictionary. This lack of attention to user guides by reviewers is also noted by Chan and Taylor (2001: 168), who claim that most reviewers do not “appreciate the potential contribution made by information contained in the user’s guide to using a dictionary to its fullest extent.” Reviewers who take into account the various independent, though interrelated, dictionary components are more likely to provide useful information to readers than those who do not. 
For instance, after having read the review of Van Dale Pocketwoordenboek Nederlands als tweede taal (Van Dale Pocket Dictionary Dutch as a Second Language) in Hiligsmann (2005), readers still do not know what the dictionary looks like, nor do they know which components it contains and how the data in the components relate and interact. Even if dictionaries contain a minimum of components whose data do not relate to each other or the lexicographic functions, this would be useful information. Similarly, reviews that merely mention that the dictionaries examined contain front and back matter texts (e.g. Koltzé 2008) without explaining whether and how the data support the functions of the dictionary do not fully live up to the requirement of providing useful information in the sense described in Section 2 above. Use of the lexicographic approach would lead to more accurate evaluations of dictionaries and, one would hope, more useful reviews. Reviewers may present the lexicographic approach by using the functional method, which looks at the function(s) of the dictionary. It is relevant to analyse, describe and evaluate what functions the dictionary may fulfil, e.g. communicative functions such as the understanding of texts, text production, text revision, editing or translation, and cognitive functions such as knowledge acquisition unrelated to specific oral or written communicative acts. This method allows the reviewer to
demonstrate that the dictionary may be used for functions and in user situations that the authors did not mention in the dictionary’s metatext (usually in the preface), or that the dictionary is suitable − or not suitable − for all or some of the functions and user situations mentioned in the metatext. It is possible to make a comparative review by combining the functional method with the temporal method, which introduces a time element, e.g. by looking at different editions of a dictionary. This involves two methods of presentation: the synchronic presentation looks at the edition of the dictionary under review and includes comparisons with the latest editions of other dictionaries – or a comparison of the printed version, the CD-ROM version and the online version of the same dictionary – whereas the diachronic presentation includes prior editions of the dictionary examined and places it in an evolutionary and historical context (see for instance Kleinedler 2000: 140–141 and Algeo 2003). Here improvements of any kind, technological advances and options in printed and electronic dictionaries may be highlighted. Comparative reviews thus allow reviewers to place the dictionary concerned in a larger and broader context so that the analyses, evaluations and conclusions relate to other dictionaries and the development of lexicographic principles and practices.

5.2 Using the factual approach
The factual approach may come into play regardless of whether the dictionary treats general language or specialised language, because even general-language dictionaries contain technical terms and diatechnical labels, see for instance Collins Cobuild English Dictionary (1995), which contains subject and style labels such as journalism, legal, medical and technical. As most reviewers are not experts in all the subject fields covered by multi-field dictionaries, it may be necessary to limit the review to selected subject fields, whether linguistic, technical or otherwise. A problem like this may partly be overcome by forming review teams consisting of lexicographers, linguists and subject-field experts. However, both linguists and subject-field experts should keep in mind that the object of the review is the dictionary and not linguistics, nor one or more subject-fields. One example of a review written by a team of reviewers is Meer and Sansome (2001), in which one reviewer addresses linguistic issues and the other considers pedagogical issues. As explained by Miranda (1996: 196), it is important that reviewers of books in general live up to the standards of a scholarly community, which implies that a person reviewing a dictionary must be “competent in doing so because she or he is an active participant in and contributor to the book’s particular area of research.” Reviews made by such competent persons also support the criterion of trustworthiness in scholarly discourse.
In this context it is possible to distinguish between two types of reviews. The review may either be maximising, i.e. it attempts to give as exhaustive a description, evaluation and analysis of the dictionary as possible, or it may be minimising in the sense that it is deliberately limited to selected aspects regarding the dictionary, e.g. subject fields. When making this distinction it is helpful to have a quantitative scale, and a number of writers have suggested “checklists” for reviewers that may serve as guidelines, for instance Steiner (1979), Chan and Loong (1999), Jackson (2002: 176–182) and Bergenholtz (2003). These lists contain, in various degrees of detail, a range of topics or issues that may be considered review topics and the most comprehensive is that found in Bergenholtz (2003: 20); see the adapted list of review topics in the Appendix. It is tentatively suggested that a review addressing 17 or more categories is maximising, whereas one that addresses fewer than 17 is minimising. This is, admittedly, a crude and purely quantitative distinction, but any other differentiation is difficult to use and uphold in practice. It is impossible to say definitively which categories are mandatory – perhaps with the exception of functions, as all dictionaries have at least one function – because no two dictionaries are alike. Synonyms are undoubtedly relevant in a review of a dictionary of synonyms, but this topic may be irrelevant in reviews of other dictionaries. Similarly, equivalents are only mandatory in reviews of bi- or multilingual dictionaries whose function is to provide help in case of translation. This distinction between minimising and maximising reviews is based on a similar distinction described in Nielsen (1994: 37–38) between minimising and maximising dictionaries. It is suggested that maximising scholarly reviews are prima facie more useful than minimising ones.
Reviewers are free to write about any possible and relevant review issue and its relation to the significant features of the dictionary. This does not mean, however, that reviewers may exclude substantially all issues and only focus on one or two. It makes no sense to talk about a review of a/the dictionary if the review only focuses on, for instance, the dictionary’s treatment of one specific word class. A “review” like this does not inform the reader about the dictionary, but merely about a single issue at the micro-level of the dictionary. An extremely minimising “review” does not qualify as a review of the dictionary, because it does not have the dictionary as its object or theme, and it does not meet the objective of dictionary reviews as described in Sections 2 and 3 above. Reviewers may also examine the lexicographic data and to what extent their factual contents relate to the functions of the dictionary. Lexicographic data are primarily found in the wordlist but may be placed anywhere in the dictionary depending on its distribution structure. The relevant data are found in the lexicographic components of the dictionary, i.e. those components that contain data about its use and function(s), for instance the user guide, the wordlist and a
subject-field component. Other components such as acknowledgements, advertising texts, picture credits and single user licence agreements are extra-lexicographic components as described in Tarp (1998: 128–130) and Nielsen (2006: 7–8). Some lexicographic components contain data that aim to facilitate the actual use of the dictionary (called use-related components) and others contain data supporting the function(s) of the dictionary (called function-related components), and a description, analysis and evaluation should take this dichotomy into account. It may also be relevant to compare the data of one dictionary with the data of one or more other dictionaries (comparative reviewing) and/or previous editions of the dictionary concerned. Other aspects to consider in connection with lexicographic data are whether definitions and explanations are correct, whether they can be easily understood by the dictionary’s target group, whether they support the dictionary function(s), whether they have been subjected to textual condensation and, if so, to what extent this impacts on the understanding of the data; further see Section 5.4 below.

5.3 Using the linguistic approach
The linguistic approach focuses on linguistic categories, principles, structures etc. and is relevant to all three significant features of dictionaries. Here I do not distinguish between LGP and LSP as most general dictionaries contain both LGP and LSP terms. Moreover, the language of specialised subject fields is characterised by the special use of linguistic options, for instance a higher frequency of complex sentence structures than in normal, unmarked language. The following examples form a brief and non-exhaustive list of issues that reviewers may take into consideration when using the linguistic approach: 1) Grammar, including syntax, morphology, and orthography; 2) Pronunciation, realised in writing or through sound files; 3) Synonyms, antonyms and hyperonyms; 4) Collocations, phrases and examples; 5) Language usage; 6) Translation strategies. It is impossible to address all the aspects that fall under the linguistic approach in a paper like this, but a few comments are appropriate. If the object of a review is a bilingual dictionary designed for translation, reviewers may examine whether its data support the translation function from a linguistic perspective. For instance, translation dictionaries that do not indicate the gender of the French or German equivalents are not well suited for translation. If the articles contain gender and morphology items, both users and reviewers of the dictionary can rightly expect the data to be easy to decode, understand and use, i.e. they are correct and usable. This analysis relates the linguistic approach to lexicographic data. It may also be appropriate to analyse how language usage relates to dictionary functions. This could include the question whether a translation dictionary uses a
particular translation strategy consistently or whether its equivalents and translations of phrases are arbitrarily source-language and target-language oriented. For a detailed discussion of translation strategies in bilingual dictionaries, see Nielsen (2000). Finally, reviewers may examine whether synonyms, antonyms and hyperonyms are used consistently so as to provide users with a systematic presentation of a small sub-field within a larger subject field, for instance by informing users whether a lemma and its synonyms and antonyms are general terms, specific terms or co-hyponyms. Information like this will help users to read and understand, for instance, LSP texts.

5.4 Lexicographic information costs
An aspect that extends to all three significant features of dictionaries as well as the three approaches to reviewing is the concept of lexicographic information costs. Based on Nielsen (2008: 173–174) lexicographic information costs may be defined as the effort, especially difficulty or inconvenience, that the user believes or feels is associated with consulting a dictionary, an article or any other text part of a dictionary. The focus is on the relationship between the expected and actual information costs and the expected and actual informative value, i.e. what users gain from consulting the dictionary. Search-related information costs are costs related to the lookup activities users have to perform when consulting a dictionary to get access to the data they are searching for; this is significant in connection with the search activities necessary to use CD-ROM, Intranet and Internet dictionaries. Examples of situations involving search-related information costs include cross-references without any clear purpose; the number of individual steps users must take in the lookup process before they find what they are looking for (printed and electronic dictionaries); the need to scroll up and down the screen to find something; the possibility of searching for specific words instead of scrolling long lists; the possibility of searching for parts of words and words in their inflected forms. Comprehension-related information costs are the costs related to the user’s ability to understand and interpret the data presented in a dictionary. They are basically answers to the question: how easy or difficult is it for users to understand the data presented? The design of the dictionary may contribute to keeping the lexicographic information costs at a low level, as an inappropriate design and structure may lead to high or increasing information costs. 
The actual wording and presentation of the data in the articles, for instance a high degree of textual condensation in definitions, may increase the information costs, whereas clear and consistent search routes may reduce the lexicographic information costs. For a discussion of the effects of textual condensation in dictionary articles, see Nielsen (2002); and for a discussion of the presentation of knowledge in printed dictionaries, see Wiegand (2000).
5.5 Three general themes of scholarly dictionary reviewing
A principal aim of dictionary reviewing is to provide readers with useful information. The above discussion shows that, by combining the three approaches to reviewing with the three significant features of dictionaries, reviewers may realise the full potential of reviews. In sum, scholarly dictionary reviews should centre around three general themes:

1. Which function(s) does the dictionary have? The focus is on the types of help the dictionary is intended to give to particular types of users in particular types of situations in the real world.
2. Do the data presented in the dictionary support the function(s) identified? The focus is on whether the data match the needs of the target group(s) and whether the data are presented so that users can process them to get the information they need to solve their problems.
3. Do the structures used in the dictionary organise and link the data so that they combine with the function(s) to meet the needs of users? The focus is on whether the structures combine the data placed in different places in the dictionary in such a way that user needs are fulfilled in the best possible manner.

These general themes allow reviewers to focus on the relationship and interplay between the three significant features of a dictionary and the three approaches to reviewing. This focus may help reviewers to realise the full potential of reviews so that they become good examples of scholarly discourse. However, in order to properly meet the requirements of trustworthiness, it is helpful to identify some of the most important qualitative characteristics of useful information.

6. Requirements to the informative value of reviews

Any person who agrees to review a dictionary assumes a responsibility towards two groups of people. Firstly, the reviewer owes a duty to the author(s) of the dictionary to ensure that his review does not contain irrelevant, personal, unfair or unsubstantiated points of criticism (Harvey 1986: 54).
Secondly, the reviewer owes a duty to the readers of his review to ensure that it does not give a wrong or misleading picture of the dictionary, and that his review contains information that is useful to readers. Consequently, the reviewer must comply with what may be called generally accepted reviewing principles, i.e. the review must be made according to those methods and principles that competent (“scholarly”) and responsible lexicographers and reviewers would apply. This means that there are certain requirements to the informative value of reviews that must be complied with.
Sandro Nielsen
Irrespective of the approach and focus adopted in the review, there are certain fundamental requirements that highlight the characteristics of useful review information. The overriding qualitative requirement of a review – which also emphasises the objective of a dictionary review – is that it gives a true and fair view (ein den tatsächlichen Verhältnissen entsprechendes Bild; une image fidèle) of the dictionary concerned. To ensure that the review is as informative as possible for readers, it is important that it does not present a wrong or misleading picture of the dictionary. The term true and fair view is inspired by the terminology of financial reporting, which requires, inter alia, that financial statements must give a fair presentation of the financial position of the enterprise for the benefit of investors and other users of financial statements when they make their decisions (Alexander and Britton 2004: 198–201). This may be extended to dictionary reviews so that reviews must present fairly the functions, data, structures and interrelationships between text types of the dictionary for the benefit of intended readers when they make informed decisions in the types of situations specified in Section 2 above. A review can only satisfy the general requirement of a true and fair view if it meets a number of underlying assumptions (zu Grunde liegende Annahmen; hypothèses de base). In this context, underlying assumptions are qualitative attributes ensuring that the information presented in dictionary reviews meets the readers’ requirement of usefulness and the scholarly requirement of trustworthiness. When analysing the qualitative characteristics of dictionary reviews, it is possible to distinguish between four underlying assumptions. The first is the assumption of relevance (Relevanz; pertinence). The information, observations, comments, etc.
are relevant if they actually relate to the objective of the review and can effectively be used by its intended audiences when they make decisions in the five general types of situation listed in Section 2 above. An example of what Hartmann (1999: 48) calls a product test in a dictionary review may serve to illustrate the assumption of relevance. It is difficult to see the relevance of the exercise when the reviewer of a business dictionary published in the year 2000 tests its lemma stock against two texts from 1986 and 1988 respectively (Bogaards 2002: 112). The terminology used in business life changed considerably between 1986 and 2000, and is still changing, so the findings of the test only show that the dictionary can be used when reading old texts that have no relation to the time when the dictionary and the review were published. The test, and consequently the information about the findings, would have been more relevant if the texts had been from the same year as the dictionary and the review. Readers may also expect reviews to comply with the assumption of materiality (Wesentlichkeit; importance relative). A piece of information, an observation, a comment, etc. is material if its omission will distort the true and fair view of the dictionary. For instance, it is immaterial to point out spelling mistakes unless there
Reviewing printed and electronic dictionaries
are exceptionally many of them and they reduce the informative value of the lexicographic data. Similarly, it is hardly material to observe that a closing parenthesis is missing in a particular article unless this example is used to identify and illustrate a general, negative characteristic of the dictionary as a whole (cf. Algeo 2003: 251). Put differently, a piece of information, an observation, a comment, etc. is material if it influences the reader’s decision to use and/or buy and/or recommend the dictionary and/or apply its lexicographic principles theoretically or practically, or if it influences the reviewer’s assessment of one or more review topics, alternatively the entire dictionary. Materiality may be regarded as a threshold and be linked to the review topics listed in the Appendix. When they have identified the topics that apply to the dictionary examined, reviewers can then go on to determine the materiality of their findings in respect of those topics as they proceed with the review work. The assumption of materiality thus implies that more weight should be given to significant findings and issues than insignificant ones, because insignificant findings and issues are unlikely to influence readers when making decisions on the basis of useful information as defined in Section 2 above. Finally, in view of the usually limited space available for dictionary reviews, it seems sensible to include only material issues. The reliability (Verlässlichkeit; fiabilité) of the review is also important because information is only useful if readers can depend on it. A review must not contain improbable, unrealistic or wrong statements, etc. that distort the true and fair view of the dictionary. In my experience reviews are generally reliable, but the following example illustrates that this is not always the case. 
When a reviewer criticises a bilingual law dictionary treating British legal terminology for having lemmatised a specific American legal term, this seems justified (Jørgensen 1994). But when a closer study of the dictionary subsequently reveals that the term selected by the reviewer is not contained anywhere in the dictionary, the review is not reliable. The assumption of reliability implies that the review should be regarded as a scholarly or “serious” text and be treated accordingly by reviewers, with the result that statements must be substantiated so as to avoid pure value judgements. Finally, readers must be entitled to expect reviews to comply with the assumption of neutrality (Neutralität; neutralité). In this context neutrality means that the review must contain all material and relevant aspects irrespective of whether they have a negative or positive influence on the reviewer’s assessment. In other words, the review must be free from bias (frei von verzerrenden Einflüssen; sans parti pris). It follows from this assumption that the reviewer’s personal relation (if any) to the dictionary, its author(s) and publishers must not lead to the assessment and evaluation of single aspects or the dictionary in its entirety being (excessively) negative or (excessively) positive. However, the assumption of neutrality does not prevent reviewers from discussing lexicographic theories and principles that
compete with or contradict those used by the authors of the dictionary under review. It should be noted that the relative importance of the underlying assumptions discussed above is a matter of informed or “professional” judgement to be exercised on a case-by-case basis. Furthermore, it should be the duty of any reviewer to give a fair, and preferably comprehensive, description and evaluation of the dictionary concerned.
7. Concluding remarks
The reviewing of electronic and printed dictionaries is not an exercise in linguistics or in subject fields but an exercise in lexicography. It does not follow from this that dictionary reviews cannot or should not be based on a linguistic approach, but that the linguistic approach is only one of three approaches to dictionary reviewing. Similarly, the linguistic and factual competences of reviewers should not be relegated to an insignificant position in the review process. Moreover, reviewers should define the object of their reviews, the dictionary, in terms of significant lexicographic features in order to give priority to lexicography and dictionary functions, as this emphasises the fact that dictionaries are much more than mere vessels of linguistic categories, namely lexicographic tools that have been developed to fulfil specific types of needs of specific types of users in specific types of situations in the real (extra-lexicographic) world. The above proposed basis for a framework contains an outline of general theoretical and practical principles that underlie the true nature of dictionary reviews, and places the reviews in a lexicographic universe with the dictionary and lexicography at its centre. This seems to be in line with the modern understanding of lexicography as a separate academic discipline concerned with the compilation, design, evaluation and use of dictionaries.
Moreover, a set of generally applicable principles may lead the discourse community to accept dictionary reviews as an important part of the scholarly discourse. Finally, it may result in reviews that actually contribute to the development of lexicographic theories and dictionary compilation.
References
A. Dictionaries
Pickett, J.P. et al. (eds.). 2004. American Heritage College Dictionary. Fourth edition. Boston and New York: Houghton Mifflin.
Sinclair, J. (ed.). 1995. Collins Cobuild English Dictionary. Third edition. London: HarperCollins.
Summers, D. et al. (eds.). 2005. Longman Dictionary of Contemporary English. Fourth edition. Harlow: Longman.
Mish, F.C. et al. (eds.). 2004. Merriam-Webster’s Collegiate Dictionary. Eleventh edition. Springfield, MA: Merriam-Webster.
Mish, F.C. et al. (eds.). 2008. Merriam-Webster Online Dictionary. Merriam-Webster Inc. (Accessed 3 September 2008).
Wehmeier, S. et al. (eds.). 2005. Oxford Advanced Learner’s Dictionary. Oxford: Oxford University Press.
Simpson, J. (ed.). 2008. Oxford English Dictionary Online. Oxford University Press. (Accessed 3 September 2008).
B. Other literature
Alexander, D. and Britton, A. 2004. Financial Reporting. London: Thomson.
Algeo, J. 2003. “[Review of] The New Oxford American Dictionary. Ed. Elizabeth J. Jewell, Frank Abate, and others. Oxford: Oxford University Press, 2001.” Dictionaries 24: 236–252.
Bergenholtz, H. 2003. “Ordbogsanmeldelser i LexicoNordica.” LexicoNordica 10: 7–26.
Bergenholtz, H. and Mogensen, J.E. 1993. “Wörterbuchkritik in Dänemark.” Lexicographica 9: 8–35.
Bergenholtz, H., Tarp, S. and Wiegand, H.E. 1999. “Datendistributionsstrukturen. Makro- und Mikrostrukturen in neueren Fachwörterbüchern.” In Fachsprachen. Ein internationales Handbuch zur Fachsprachenforschung und Terminologiewissenschaft, Teilband 2, L. Hoffmann, H. Kalverkämper and H.E. Wiegand (eds.), 1762–1832. Berlin/New York: Walter de Gruyter.
Bogaards, P. 2002. “[Review of] DAF: An Innovative Learners’ Dictionary of Business French.” International Journal of Lexicography 15(1): 105–116.
Chan, A.Y.W. and Loong, Y. 1999. “Establishing criteria for evaluating a learner’s dictionary.” In Language Analysis, Description and Pedagogy, R. Berry, B. Asker, H. Hyland and M. Lam (eds.), 298–307. Hong Kong: Hong Kong University of Science and Technology.
Chan, A.Y.W. and Taylor, A. 2001. “Evaluating learner dictionaries: What the reviews say.” International Journal of Lexicography 14(3): 163–180.
Hartmann, R.R.K. 1999. Teaching and Researching Lexicography. Harlow: Longman.
Harvey, J. 1986. “Social sciences.” In Reviews and Reviewing: A Guide, A.J. Walford (ed.), 53–89. London: Mansel Publishing.
Hiligsmann, P. 2005. “[Review of] Marja Verburg and Ruud Stumpel (eds.) Van Dale Pocketwoordenboek Nederlands als tweede taal. Utrecht and Antwerpen: Van Dale Lexicografie. 2003.” International Journal of Lexicography 18(3): 384–391.
Jackson, H. 2002. Lexicography. An Introduction. London/New York: Routledge.
Jørgensen, S. 1994. “[Review of] Juridisk Basisordbog.” Sprog & Erhverv 2: 18.
Kleinedler, S. 2000. “[Review of] Computing Dictionary. Fourth Edition. Ed. Ronald D. Kobler, Trevor Meers, and others. Lincoln, NE: Sandhills Publishing. 1999.” Dictionaries 21: 136–142.
Kling, R.K. and McKim, G. 1999. “Scholarly communication and the continuum of electronic publishing.” Journal of the American Society for Information Science 50(10): 890–906.
Kotzé, E. 2008. “[Review of] Phillip Louw, Lorna Hiles (eds.) et al. Oxford Afrikaans-Engels/English-Afrikaans Skoolwoordeboek/School Dictionary. Cape Town: Oxford University Press. 2007.” International Journal of Lexicography 21(2): 207–212.
Meer, G. van der and Sansome, R. 2001. “OALD in a linguistic and a language teaching perspective.” International Journal of Lexicography 14(4): 283–306.
Miranda, E.O. 1996. “On book reviewing.” Journal of Educational Thought 30(2): 191–202.
Nielsen, S. 1990. “Contrastive description of dictionaries covering LSP communication.” Fachsprache/International Journal of LSP 3–4: 129–136.
Nielsen, S. 1994. The Bilingual LSP Dictionary. Principles and Practice for Legal Language. [Forum für Fachsprachen-Forschung 24]. Tübingen: Gunter Narr.
Nielsen, S. 1999. “Mediostructures in bilingual LSP dictionaries.” Lexicographica 15: 90–113.
Nielsen, S. 2000. “Translation strategies for culture-specific textual conventions in bilingual dictionaries.” Lexicographica 16: 152–168.
Nielsen, S. 2002. “Textual condensation in the articles of de Gruyter Wörterbuch Deutsch als Fremdsprache.” In Perspektiven der pädagogischen Lexikographie des Deutschen II. Untersuchungen anhand des de Gruyter Wörterbuch Deutsch als Fremdsprache, H.E. Wiegand (ed.), 597–608. Tübingen: Max Niemeyer.
Nielsen, S. 2003. “Changes in dictionary subject matter.” In Untersuchungen zur kommerziellen Lexikographie der deutschen Gegenwartssprache I. »Duden. Das Große Wörterbuch der deutschen Sprache in zehn Bänden«, H.E. Wiegand (ed.), 109–114. Tübingen: Max Niemeyer.
Nielsen, S. 2006. “A functional approach to user guides.” Dictionaries 27: 1–20.
Nielsen, S. 2008. “The effect of lexicographical information costs on dictionary making and use.” Lexikos 18: 170–189.
Osselton, N. 1989. “The history of academic dictionary criticism with reference to major dictionaries.” In Dictionaries. An International Handbook of Lexicography, vol. 1, F.J. Hausmann, O. Reichmann, H.E. Wiegand and L. Zgusta (eds.), 225–230. Berlin/New York: Walter de Gruyter.
Phillips, E.M. and Pugh, D.S. 2005. How to get a PhD. A handbook for students and their supervisors. Maidenhead: Open University Press.
Ripfel, M. 1989. Wörterbuchkritik. Eine empirische Analyse von Wörterbuchrezensionen. Tübingen: Max Niemeyer.
Steiner, R.J. 1979. “Guidelines for reviewers of bilingual dictionaries.” Dictionaries 1: 166–181.
Tarp, S. 1998. “Leksikografien på egne ben. Fordelingsstruktur og byggedele i et brugerorienteret perspektiv.” Hermes 21: 121–137.
Tomaszczyk, J. 1986. “The bilingual dictionary under review.” In ZüriLex ’86 Proceedings. Papers read at the EURALEX International Congress, University of Zürich, 9–14 September 1986, M. Snell-Hornby (ed.), 289–297. Tübingen: Francke.
Wiegand, H.E. 2000. “Wissen, Wissensrepräsentation und Printwörterbücher.” In Proceedings of the Ninth Euralex International Congress, EURALEX 2000, Stuttgart, Germany, August 8th–12th, 2000, U. Heid et al. (eds.), 15–38. Stuttgart: Institut für Maschinelle Sprachverarbeitung.
Appendix: List of review topics, adapted from Bergenholtz (2003: 20).
1. Communicative, cognitive and operative lexicographic functions
2. Dictionary users: cultural, factual, linguistic and translation competences
3. Guidance for use: straight-text format, illustrated, extent to which it relates to functions
4. Price
5. Layout/web-design
6. Information about the author(s)
7. Comparison with other dictionaries or earlier editions
8. The history and background of the dictionary
9. References to other reviews
10. Information about the reviewer(s)
11. Empirical basis of the dictionary
12. Outside matter or metatexts: integration/non-integration, interrelationship, support of function(s)
13. Lemma selection: derivatives, affixes, irregular spellings, spelling variants, multi-word units
14. Lemma presentation
15. Access routes and search options
16. Article structure, data types, article-internal search zones
17. Prescriptive/descriptive/proscriptive language usage guidance
18. Equivalents and equivalence: full, partial or zero equivalence, translation strategies
19. Cultural information: culture-specific, comparative
20. Grammar: grammar codes, natural language presentation
21. Orthography
22. Pronunciation: written, audible
23. Semantic/encyclopaedic data: restricted defining vocabulary, easy to understand
24. Diasystematic labelling: logical, easy to decode
25. Etymology
26. Example sentences: grammatical, citation or competence examples
27. Collocations and phrases
28. Idioms
29. Illustrations: drawings, pictures, video clips
30. Synonyms/antonyms/hyperonyms
31. Cross-references and links: extent to which they support function(s)
32. Information costs: search-related, comprehension-related
33. The entertainment value of the dictionary
34. Positive, negative or mixed overall conclusion
Reflections on data access in lexicographic works
Sven Tarp
This article discusses the need for quick and easy access to the relevant lexicographic data in the light of the function theory. In this respect, it raises the question: How many data does a user need in each consultation? As many as possible? Or as few as possible? The answer provided in the article is that the user needs exactly the amount of data that are necessary in order to meet his or her information needs, neither more nor less. If there are too few data the user’s needs will not be met, whereas too many data may confuse the user and make the spotting of the needed data slower and more difficult. On this basis, the article raises a new question: What is exactly the necessary amount of data needed in each consultation? In order to answer this question, the article introduces a distinction between concrete user needs and types of user needs related to a specific type of user situation. In this light, the article discusses a number of articles taken from printed and electronic dictionaries as well as a number of hypothetic articles based upon the former in order to explore some of the possibilities made available to lexicography by the new information technologies.
Keywords: lexicography; data access; user needs; functional theory; article structures; search criteria
1. Introduction
Nothing remains forever what it was. Everything is on the move. Everything is transformed by nature and forced into new paths. (Titus Lucretius Carus 99–55 BC)
Lexicography is on the move, transformed not by nature, but by society. The new technologies have placed lexicography in the troubled waters between the “old” printed dictionaries and the new electronic ones. However, the metamorphosis is far from completion. Lexicography is making for a safe port which it has still not entered. Just as a ship cannot make for its final destination without a captain, lexicography cannot evade the many hidden rocks on its long and never-ending Odyssey without an advanced theory. In this respect, neither the stubborn pragmatism
which has dominated lexicographic literature for so long nor the blind empiricism which, in the name of user research, has tried to impose itself during the last two or three decades has anything to offer but endless troubled waters. In his posthumously published essay Natural Science and the Spirit World, the German philosopher Friedrich Engels identified “the most certain path from natural science to mysticism” as “the shallowest empiricism that spurns all theory and distrusts all thought.” And he continued:
Indeed, dialectics cannot be despised with impunity. However great one’s contempt for all theoretical thought, nevertheless one cannot bring two natural facts into relation with one another, or understand the connection existing between them, without theoretical thought. The only question is whether one’s thinking is correct or not, and contempt of theory is evidently the most certain way to think naturalistically, and therefore incorrectly. (Engels 1990: 354)
What holds true for natural science also applies to social sciences to which lexicography as a theory belongs according to the views shared by Professor Henning Bergenholtz and the author of this contribution. The only difference is that contempt for theoretical thought may not lead lexicography to mysticism as such, but to mystified users of its practical products, i.e. dictionaries, lexicons, thesauri, encyclopaedias, etc. Although the very concept of a lexicographic theory, i.e. the possibility of developing a theory, has been questioned by some modern supporters of Popper, such a theory is not only highly desirable as mentioned above, but also perfectly possible provided it is adapted to the nature and characteristics of the discipline in question (cf. Bergenholtz et al. 2008). At an abstract level, a theory is a systematically organised set of statements about a given subject field. In this sense, the basic statement of lexicographic theory is that dictionaries and other types of lexicographic works are utility products, i.e. cultural, man-made artefacts originally conceived and manufactured with the genuine purpose of satisfying specific types of human needs. Whatever the individual lexicographer thinks of the result of his practical work, it is a fact that the first dictionaries ever produced saw the light of day in response to certain needs observed in ancient society and that whoever consults a dictionary, even today, does it in order to satisfy needs of a specific type. In this light, the fundamental challenge of lexicography is to characterise and typologise these needs in order to establish a basis upon which the corresponding lexicographic solutions can be found and developed. However, the German scholar H.E. Wiegand, who was the first to explicitly state what many lexicographers before him had implicitly taken for granted, i.e. that dictionaries are utility tools, made his statement in the age of the printed dictionaries (cf. Wiegand 1987). 
For this reason, the response to lexicography’s fundamental challenge has to be reconsidered, and
maybe even modified, in the light of the new technologies made available to practical lexicography.
2. The need for information
The common nature of all lexicographically relevant needs is that they are needs for information which may subsequently be used for a huge variety of purposes such as successful communication, storing of knowledge, performing of practical and mental operations, and interpretation of non-verbal signs (cf. Tarp 2008b). This concept of user needs transcends the ideas reflected in lexicographic practice for a very long period, at least in Europe, although recent research has shown that dictionaries produced during the Enlightenment display features of surprising “modernity” long ago forgotten. The Universal Dictionary of Trade and Commerce published in 1774 is, for instance, a dictionary containing lexicographic data of which some can only be understood as instructions helping the user to perform non-verbal operations of the mentioned types, i.e. a lexicographic function that has subsequently been relegated to manuals, handbooks, how-to’s, instruction books, user guides, and similar text types (cf. Tarp 2007; 2008a). Meeting needs for information has certainly been the purpose of dictionaries from the very beginning. In this respect there is nothing new under the sun. What is new is the focus on information in the present-day society which has even nicknamed itself “the information society”. Although dictionaries, and lexicographic works in general, have never been the only source for human beings interested in acquiring new information, today the number of “competitors” has grown considerably, especially due to the large amount of texts placed on the internet and made accessible through various types of search engines, among them Google, Yahoo, and others. Never before have so many data been available to so many people. However, one thing is the amount of available data; another thing is the access to these data.
Access to relevant data has always been a problematic topic. In the Middle Ages, when far fewer books and other text types were available, it was quite a challenge to find the library that stored exactly the book or other text type where the relevant data could be found. Many researchers spent their whole life without discovering the text that might have solved their specific information needs and taken their research to a higher level. Even today, the famous Santa Cruz Library in Valladolid in Spain does not have a complete list of all lexicographic works included in its collection which comprises more than 13,000 titles from before 1815, for which reason it is very difficult, if not impossible, to find the relevant title and then the relevant data from which the needed information can be retrieved.
In many respects, things have not improved in the modern world in spite of the introduction of the marvellous new information technologies. Today it is possible to track down the needed data, especially on the internet, but all too often the process of finding them is very time-consuming. For instance, when someone “googles” on the internet in order to find data from which information can be retrieved, he or she will frequently, even when specifying the search criteria, end up with references to hundreds, if not thousands, of web pages. In this way, the user runs the risk of suffering what has been called “the information death”, i.e. being suffocated by the overwhelming amount of data popping up. In a thought-provoking contribution, Nielsen (2008) discusses the enormous costs that are the result of this process in relation to the use of lexicographic works. And the costs are far higher when viewed in relation to data access in general. Quick and easy access to the relevant data is therefore not only a must in terms of user friendliness, but also in economic and logistic terms because the corresponding waste of time and manpower is synonymous with undue costs as argued by Nielsen. In this light, it is highly surprising that the rapid introduction of the new information technologies, with a few exceptions (e.g. Bergenholtz and Gouws 2007), has not been accompanied by a broader discussion of the concepts of data accessibility and information retrieval from a needs-adapted perspective. For instance, although there is a relatively comprehensive literature about information retrieval, this literature seems to have a completely different approach in terms of specifying the real user needs and relating them to real types of users and user situations (cf. Ingwersen 1992; Manning et al. 2008).
3. The relevance of the function theory
The modern function theory, developed by researchers at the Aarhus-based Centre for Lexicography of which Professor Bergenholtz has been the director since its foundation in 1995, provides a set of statements about lexicographic user needs which may assist the formulation of a future theory of information and data access not only in lexicographic works, but also in other text types conceived for consultation and retrieval of information. The most important of these statements is that users in general never need information in general. The type of information needed is always concrete and depends both on the concrete type of user and on the concrete type of situation in which the need occurs. Analogous to other utility tools, this means that the needs of potential users of information tools cannot be defined without specifying who needs what and in which situation. Research has shown that the situation in which the needs occur is the basic element in defining the type of information required and that the criteria for the
characterization of the various user types depend on the specific user situation in question. The function theory has until now discovered four fundamental types of user situations, i.e. the communicative, cognitive, operative and interpretative situations, although it cannot be ruled out that further situations will be identified as the result of future research. In the communicative, operative and interpretative situations the users’ needs for information will always be punctual, i.e. related to the solution of specific here-and-now problems or clarification of similar doubts, whereas in the cognitive situations the users may need either punctual information in order to fill a specific knowledge gap or global information in order to memorise it as part of a systematic study of a given topic. The need for global (or systematic) information is traditionally satisfied by books, text books, scientific articles and similar text types and does not belong to the subject field of lexicography which only deals with punctual needs that can be covered through consultation of the corresponding information tools such as dictionaries, lexicons, thesauri, encyclopaedias, and other tools based upon lexicographic principles. Although the terms data and information are frequently confused, especially in the lexicographic literature, it is important to underline that texts do not contain information, but only data from which the corresponding information may or may not be retrieved by a particular reader or user (cf. Wiegand 2000a; 2002). In this light, lexicographic works contain data which have been, or ought to have been, selected, prepared and presented according to lexicographic principles with the genuine purpose of meeting the specific types of information needs which a specific type of users may have in a specific type of situation.
4. Types of needs versus concrete needs
The selection, preparation and presentation of needs-adapted data in lexicographic works, as well as in other information tools conceived for consultation, is only part of the solution, because these data should also be quickly and easily accessible in order to meet all the needs of the target user group and avoid the costs mentioned by Nielsen (2008). However, before discussing the possible solution that can ensure this quick and easy data access process it is necessary to answer the following questions: How many data does a user need in each consultation? As many as possible? Or as few as possible? The right answer to these questions is without any doubt that the user needs exactly the amount of data that are necessary in order to meet his or her information needs, neither more nor less. If there are too few data the user’s needs will not be met, whereas too many data may confuse the user and make the spotting of the needed data slower and more difficult.
Consequently, a new question arises: What is exactly the necessary amount of data needed in each consultation? In order to answer this question, a distinction should be made between concrete user needs and types of user needs related to a specific type of user situation. Here there is a basic contradiction in terms of approach. As a rule, lexicographers when planning and compiling a dictionary do only, and can only, work with types of needs to be satisfied by the lexicographic data, whereas users always consult dictionaries with a concrete need for information. The determination of the types of user needs is the result of a process of abstraction and these types of needs may, as such, also be considered abstract needs as opposed to concrete needs. They can be achieved by means of at least two different methods, i.e. deduction (the quick method) and the study of empirical data (the slow method), and they may in any case, if so desired, be verified – or falsified – empirically through the observation of the concrete needs occurring for concrete users in concrete situations (e.g. text production and text reception) and based upon the methods of modern sociology and statistics (cf. Bergenholtz et al. 2008; Tarp 2008c). In some cases the data needed to satisfy the type of needs are identical with the data needed to satisfy the concrete needs, but in other cases there may be a bigger or smaller discrepancy between the former and the latter in terms of both the content and the amount of data. How to deal with this discrepancy? There are basically two solutions, of which one is only possible in electronic dictionaries and tools. 
The first is the production of mono-functional dictionaries where the data in each individual article are presented or structured in such a way that the user can easily locate the data considered most relevant and, if needed, proceed to additional data included in the same article or elsewhere in the dictionary (although cross-references always slow down the overall consultation process). The second solution, which is only possible in electronic tools, is to prepare a preliminary interactive phase where the lexicographic tool helps the users to identify and specify their concrete needs before being guided to the corresponding data (article). In the following sections we will take a closer look at the two solutions.

5. The well-structured article

In poly-functional dictionaries the total amount of data included in the respective articles has to serve two or more functions. Although one type of data may sometimes support more than one function, in many cases it will only be relevant to one function. Consequently, when users who find themselves in a specific situation consult a poly-functional dictionary in order to get lexicographic assistance, they may end up with a lot of superfluous data whose only real function is to complicate and
Reflections on data access
slow down their specific consultation process. If quick and easy data access is desired and given top priority, this is a strong argument for the production of mono-functional dictionaries. The article reproduced in Example 1 from a poly-functional English learner’s dictionary which, according to the publishing house, is compiled to assist users with both text production and reception illustrates the problem.

tooth /tu:θ/ noun (pl. teeth /ti:θ/)
1 any of the hard white structures in the mouth used for biting and chewing food: I’ve just had a tooth out at the dentist’s. ○ to brush/clean your teeth ○ tooth decay ○ She answered through clenched teeth (= opening her mouth only a little because of anger). ○ The cat sank its teeth into his finger. – picture => body – see also buck teeth, false teeth, milk tooth, wisdom tooth
2 a narrow pointed part that sticks out of an object: the teeth on a saw – picture => fastener – see also fine-tooth comb
idm
cut your teeth on sth to do sth that gives you your first experience of a particular type of work
cut a ‘tooth (of a baby) to grow a new tooth
get your ‘teeth into sth (informal) to put a lot of effort and enthusiasm into sth that is difficult enough to keep you interested: Choose an essay topic that you can really get your teeth into.
have ‘teeth (BrE, informal) (of an organization, a law, etc.) to be powerful and effective
in the teeth of sth 1 despite problems, opposition, etc: The new policy was adopted in the teeth of fierce criticism. 2 in the direction that a strong wind is coming from: They crossed the bay in the teeth of a howling gale.
set sb’s ‘teeth on edge (of a sound or taste) to make sb feel physically uncomfortable: Just the sound of her voice sets my teeth on edge.
– more at armed v., bare v., bit., eye n., eye teeth, fight v., gnash, grit v., hell, kick n., kick n., lie2 v., long adj., red adj., skin n., sweet adj.
Example 1: Dictionary article from Oxford Advanced Learner’s Dictionary (2005)

If a user, in this case a foreign learner of English, has a reception problem related to tooth it is relatively easy to find the first meaning, i.e. the hard white structures in the mouth used for biting and chewing food, but with the design chosen for the article in Example 1 it is much more difficult to access the second meaning, i.e. a narrow pointed part that sticks out of an object. This problem could, of course, be partially solved by means of a convenient internal structure based upon the use of search fields (cf. Wiegand 2000b). But the ideal way to deal with this problem in printed dictionaries would be to design them as mono-functional dictionaries and leave out all data that are not relevant for this function and only keep the relevant data, i.e. the data required to meet the so-called primary (function-related) and
secondary (use-related) user needs (cf. Tarp 2008d). Based upon these criteria the article tooth will have the content reproduced in Example 2.

tooth noun (pl. teeth)
1 any of the hard white structures in the mouth used for biting and chewing food: I’ve just had a tooth out at the dentist’s.
2 a narrow pointed part that sticks out of an object: the teeth on a saw.

Example 2: Hypothetical dictionary article based upon Oxford Advanced Learner’s Dictionary (2005)

The dictionary article in Example 2, which is structured according to the principle of search fields, contains, apart from the lemma, data about part of speech and the plural form of tooth as well as two short definitions followed by authentic examples whose function is to support the definitions. The indication of part of speech and the plural form serves the secondary user needs and helps users to confirm that they have accessed the right article. Indeed, the above data are everything that foreign learners of English need in order to be assisted lexicographically when having a reception problem related to the single word tooth, although the concrete user in the concrete consultation may only need one of the definitions without the supporting text example. However, if users have reception problems related to word combinations with tooth, there are various possible solutions in printed dictionaries, the first of which is the traditional one where the respective word combinations are addressed to a lemma representing one of the words contained in them (Example 3).

tooth noun (pl. teeth)
1 any of the hard white structures in the mouth used for biting and chewing food: I’ve just had a tooth out at the dentist’s.
2 a narrow pointed part that sticks out of an object: the teeth on a saw.
word combinations
cut your teeth on sth to do sth that gives you your first experience of a particular type of work
cut a ‘tooth (of a baby) to grow a new tooth
get your ‘teeth into sth to put a lot of effort and enthusiasm into sth that is difficult enough to keep you interested: Choose an essay topic that you can really get your teeth into.
have ‘teeth (of an organization, a law, etc.) to be powerful and effective
in the teeth of sth 1 despite problems, opposition, etc: The new policy was adopted in the teeth of fierce criticism. 2 in the direction that a strong wind is coming from: They crossed the bay in the teeth of a howling gale.
set sb’s ‘teeth on edge (of a sound or taste) to make sb feel physically uncomfortable: Just the sound of her voice sets my teeth on edge.

Example 3: Hypothetical dictionary article based upon Oxford Advanced Learner’s Dictionary (2005)

This may be the only realistic solution when dealing with very voluminous dictionaries like the Oxford Advanced Learner’s Dictionary (2005), which already includes more than 70,000 lemmata, inasmuch as a more focussed solution would require several volumes and thus create a new type of problem in terms of consultation price. However, if the publishers’ real problems are ignored for a brief moment, a more focussed solution would be to select lemmata for each of the many word combinations which may generate reception problems. In this case the dictionary will, among others, contain the lemmata included in Example 4.

cut a ‘tooth idiom (of a baby) to grow a new tooth
cut your teeth on sth idiom to do sth that gives you your first experience of a particular type of work
get your ‘teeth into sth idiom to put a lot of effort and enthusiasm into sth that is difficult enough to keep you interested: Choose an essay topic that you can really get your teeth into.
have ‘teeth idiom (of an organization, a law, etc.) to be powerful and effective
in the teeth of sth idiom 1 despite problems, opposition, etc: The new policy was adopted in the teeth of fierce criticism. 2 in the direction that a strong wind is coming from: They crossed the bay in the teeth of a howling gale.
set sb’s ‘teeth on edge idiom (of a sound or taste) to make sb feel physically uncomfortable: Just the sound of her voice sets my teeth on edge.
tooth noun (pl. teeth) 1 any of the hard white structures in the mouth used for biting and chewing food: I’ve just had a tooth out at the dentist’s. 2 a narrow pointed part that sticks out of an object: the teeth on a saw.

Example 4: Hypothetical dictionary articles based upon Oxford Advanced Learner’s Dictionary (2005)
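The two moves illustrated in Examples 2 to 4, i.e. dropping the data that do not serve the reception function and then giving each word combination a lemma of its own, can be sketched in a few lines of Python. This is only a minimal sketch: the class names, the function labels and the assignment of each data item to a function are assumptions made for illustration and are not taken from the Oxford Advanced Learner’s Dictionary.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Item:
    kind: str                    # e.g. "definition", "pronunciation"
    text: str
    functions: Tuple[str, ...]   # assumed labels: the functions this item serves


@dataclass
class Article:
    lemma: str
    items: List[Item]
    combinations: List["Article"] = field(default_factory=list)


def project(article: Article, function: str) -> Article:
    """Return a copy that keeps only the items serving one function."""
    return Article(
        article.lemma,
        [item for item in article.items if function in item.functions],
        [project(sub, function) for sub in article.combinations],
    )


def promote(article: Article) -> List[Article]:
    """Give every nested word combination a lemma of its own (cf. Example 4)."""
    flat = [Article(article.lemma, article.items)]
    for sub in article.combinations:
        flat.extend(promote(sub))
    return sorted(flat, key=lambda a: a.lemma.lower())


BOTH = ("reception", "production")

# Abridged from Example 1; the function tags are illustrative assumptions.
tooth = Article(
    "tooth",
    [
        Item("pronunciation", "/tu:θ/", ("production",)),
        Item("definition", "any of the hard white structures in the mouth "
                           "used for biting and chewing food", BOTH),
        Item("collocation", "to brush/clean your teeth", ("production",)),
    ],
    [
        Article("have teeth", [
            Item("label", "BrE, informal", ("production",)),
            Item("definition", "(of an organization, a law, etc.) "
                               "to be powerful and effective", BOTH),
        ]),
        Article("cut a tooth", [
            Item("definition", "(of a baby) to grow a new tooth", BOTH),
        ]),
    ],
)

word_list = promote(project(tooth, "reception"))
for entry in word_list:
    print(entry.lemma, "|", "; ".join(item.text for item in entry.items))
```

In a printed dictionary both steps are taken once, at compilation time; in an electronic tool the same projection could in principle be computed for each consultation.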
Of course, the lemmatised forms of these word combinations can be discussed. But apart from that, the solution presented in Example 4 is probably much more user friendly in terms of quick and easy access to exactly the place where the needs of the foreign learners with reception problems related to English word combinations may be satisfied. For instance, users who have a reception problem related to the idiom have teeth may go directly to the article where a short explication of this idiom is provided instead of first going to the article tooth (Example 1) and then speed-reading it in order to locate the small idiom squeezed in between others. However, this solution requires that the actual users in each case are aware that they have a reception problem related to the whole idiom have teeth and not only to one of the words contained in it, in this case tooth, because they may think that tooth has other meanings than the ones already known to them. In such cases, the users will need cross-references from the single-word article to the multi-word articles (Example 5).

tooth noun (pl. teeth)
1 any of the hard white structures in the mouth used for biting and chewing food: I’ve just had a tooth out at the dentist’s.
2 a narrow pointed part that sticks out of an object: the teeth on a saw.
Cross references
→ cut your teeth on sth
→ cut a ‘tooth
→ get your ‘teeth into sth
→ have ‘teeth
→ in the teeth of sth
→ set sb’s ‘teeth on edge

Example 5: Hypothetical dictionary article based upon Oxford Advanced Learner’s Dictionary (2005)

There are no known examples of learner’s dictionaries where both single words and word combinations have been lemmatised on a full scale, but there are various dictionaries of idioms specially conceived for foreign learners or other types of users, for instance the Cambridge International Dictionary of Idioms (1998).
However, in these cases users who have reception problems will have to consult at least two different dictionaries, one when they have problems related to single words and another when they have problems related to idioms. Such a solution is possible, but it requires either a complex system of cross-references between the dictionaries or that, before each consultation, users know whether they have a single-word problem or a word combination problem, something that can hardly be expected from learners who have no real knowledge of the respective linguistic
categories. On the other hand, there is at least one known example of a dictionary for Danish learners of Danish that includes both single words and word combinations in its central word list. This dictionary is Dansk Glossarium, published in 1800 by the vicar Jens Høier Leth, but it has only managed to include all these words and combinations in one volume because the total number of lemmata is relatively small. Thus, it may inspire future lexicographers but it has still not met the challenge of designing a one-volume dictionary able to meet all needs related to one lexicographic function. Tarp (2008d) writes in relation to the primary information needs in terms of text reception in the mother tongue:

When people encounter problems in understanding their mother tongue, the reason is normally that they do not understand the meaning of a particular word, idiom or proverb. But sometimes the reason is that they do not know whether the word in question has any stylistic or cultural restrictions. As a result, potential users need information about meanings and pragmatic and cultural restrictions in order to understand a text in their mother tongue. Finally, the reason for any lack of understanding may be that people do not recognise a particular irregular inflection, making it necessary to select such forms with references to the basic form. (Tarp 2008d: 70)
For this reason the following information needs are listed by Tarp: meaning of lemmata, idioms, meaning of idioms, proverbs, meaning of proverbs, pragmatic and cultural restrictions, and irregular inflection forms as lemmata. This way of describing the problem is valid inasmuch as it refers to traditional dictionaries and dictionary articles like the one in Example 1 from the Oxford Advanced Learner’s Dictionary (2005). But at the highest level of abstraction the number of data types required in order to meet the primary information needs in terms of text reception can be reduced to only two, i.e. the lemma and the meaning. Although the concept of a lemma may differ from the one suggested by Wiegand (1983), at this level of abstraction the lemmata could be single words, irregular inflection forms, combinations of words, or even something else, whereas the users’ need for information about meaning could be covered by explanations, equivalents, synonyms, examples and illustrations, or a combination of these, employing the users’ mother tongue or a foreign language depending on the type of dictionary. This solution may not be practicable in printed dictionaries compiled for anything other than a reduced vocabulary, but it nevertheless points to the future when electronic dictionaries become part of the discussion and the new technologies are used as they ought to be. This will be discussed in the following section.
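As a rough illustration of this reduction to lemma and meaning, the sketch below treats single words, irregular inflection forms and word combinations as entries of one and the same kind. The field names and the `see` reference from an inflected form to its basic form are hypothetical choices made for the example, not a description of any existing dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Entry:
    lemma: str                                         # word, inflected form or word combination
    meanings: List[str] = field(default_factory=list)  # explanations, equivalents, synonyms, ...
    see: Optional[str] = None                          # hypothetical reference to a basic form


def build_word_list(entries: Iterable[Entry]) -> Dict[str, Entry]:
    """Index all entries by lemma, whatever kind of lemma it is."""
    return {entry.lemma: entry for entry in entries}


def look_up(word_list: Dict[str, Entry], query: str) -> Optional[Entry]:
    """Return the entry holding the meaning data, following one reference if needed."""
    entry = word_list.get(query)
    if entry is not None and entry.see is not None:
        return word_list.get(entry.see)
    return entry


word_list = build_word_list([
    Entry("tooth", [
        "any of the hard white structures in the mouth used for biting and chewing food",
        "a narrow pointed part that sticks out of an object",
    ]),
    Entry("teeth", see="tooth"),   # irregular inflection form as lemma
    Entry("have teeth", ["(of an organization, a law, etc.) to be powerful and effective"]),
])

for query in ("teeth", "have teeth", "fang"):
    hit = look_up(word_list, query)
    print(query, "->", hit.lemma if hit else "no entry")
```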
6. The interactive phase

The last two decades or so have seen the massive introduction of dictionaries based upon various electronic platforms such as CD-ROMs and the internet. However, generally speaking they have still not completely renewed lexicography, but are what Henry Ford would have called faster horses, i.e. the same old horses just running a little faster due to the new search facilities. The data presentation from which the enthusiastic user finally has to retrieve the needed information is still more or less the traditional dictionary article that has not been adapted to the new possibilities. The reason for this sad situation within modern lexicography is basically the dominating pragmatic and empirical trends that reject the necessity of a theoretical superstructure that may guide the conception and production of a completely new generation of lexicographic tools. Professor Henning Bergenholtz is one of the few lexicographers who have experimented with the new information technologies and introduced the first lexicographic Model T Fords that may be the forerunners of future lexicographic tools and very much desired lexicographic Mercedes Benzes and Rolls Royces. This has been reflected in a number of dictionaries authored or co-authored by Professor Bergenholtz such as the Danish OnLine Dictionary [Den Danske Netordbog] (2008), the Dictionary of Fixed Expressions [Ordbogen over Faste Vendinger] (2007), the Music Dictionary [Musikordbogen] (2006), and the Danish-English/English-Danish Dictionary of Accounting [Regnskabsordbøgerne] (2003–2008). Apart from introducing new search criteria such as associations in the Ordbogen over Faste Vendinger, the authors of this dictionary have also provided users with the option of going through an interactive phase before being guided to the respective dictionary articles (Example 6).
Immediately below the above search mechanism the user will find an instruction guide explaining the various search criteria, both in terms of the options available in the scroll menu, i.e. contains, begins with, ends with and is, and in terms of the other four options which may be described as different user situations, i.e. text reception, text production 1, text production 2, and cognition. The difference between text production 1 and 2 is that in the first case the user “knows the fixed expression or part of it, and wants to know how it is used (e.g. grammar or word combination with the expression), or if it really has the meaning that he or she thinks”, whereas in the second case “the user wants to use any fixed expression with a specific meaning, but without considering a specific expression.” (Ordbogen over Faste Vendinger 2007)
Search for a word that: contains / begins with / ends with / is

Find the expression (understand a text)
Find the usage 1 (write a text 1)
Find the usage 2 (write a text 2)
All information (if you want to know more)

Start the search

Example 6: Translation of interactive search mechanism in Ordbogen over Faste Vendinger (2007)
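The four search forms in the scroll menu can be thought of as four different string predicates applied to the expressions in the word list. The following is a minimal sketch under stated assumptions: it treats contains as plain substring matching and ignores case, neither of which is documented for Ordbogen over Faste Vendinger, and the function and variable names are invented for the example.

```python
from typing import Callable, Dict, List

# The four search forms from the scroll menu in Example 6.
# Substring matching for "contains" is an assumption of this sketch.
SEARCH_FORMS: Dict[str, Callable[[str, str], bool]] = {
    "contains": lambda expression, query: query in expression,
    "begins with": lambda expression, query: expression.startswith(query),
    "ends with": lambda expression, query: expression.endswith(query),
    "is": lambda expression, query: expression == query,
}


def find_expressions(expressions: List[str], query: str,
                     form: str = "contains") -> List[str]:
    """Return the expressions that match the query under the chosen search form."""
    if form not in SEARCH_FORMS:
        raise ValueError(f"unknown search form: {form!r}")
    matches = SEARCH_FORMS[form]
    needle = query.strip().lower()
    return [e for e in expressions if matches(e.lower(), needle)]


# A handful of Danish fixed expressions used as sample data.
EXPRESSIONS = [
    "spise brød til",
    "klappe hesten",
    "koldt vand i blodet",
    "se tiden an",
    "stoppe en halv",
]

print(find_expressions(EXPRESSIONS, "brød"))                 # default form: contains
print(find_expressions(EXPRESSIONS, "se", "begins with"))
print(find_expressions(EXPRESSIONS, "an", "ends with"))
print(find_expressions(EXPRESSIONS, "klappe hesten", "is"))
```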
The search form contains is the default function and allows the user to access the respective dictionary article after performing only two actions, i.e. to write the word or word combination and activate the start function. But if the user wants to refine the search criteria, he or she may perform up to four different actions, i.e. define the search form, write the word or word combination, click on the user situation and activate the start function. The lexicographic data popping up will then depend on the criteria used. If the user for instance writes the Danish idiom spise brød til (hold your horses) and activates contains and Find the expression (reception), then the article will pop up on the screen (Example 7).

spise brød til
betydning (meaning)
• udtryk for, at nogen bør vente lidt og tænke sig om, inden de reagerer på noget
Dette udtryk bruges som en opfordring til nogen om at tage det roligt, sætte tempoet ned og ikke lade sig rive med i en overilet reaktion på en hændelse eller udvikling, som gør en spændt eller ophidset. Man kan også bruge udtrykket som en kommentar, når nogen overdriver.
faste vendinger (fixed expressions)
spise brød til

Example 7: Dictionary article from Ordbogen over Faste Vendinger (2007)

However, if users activate contains and All information (cognition), they will get to the article in Example 8, which apart from meaning and fixed expression includes data on synonyms, grammar, word combinations, authentic examples and associations.
spise brød til
idiom • neutral
betydning (meaning)
• udtryk for, at nogen bør vente lidt og tænke sig om, inden de reagerer på noget
Dette udtryk bruges som en opfordring til nogen om at tage det roligt, sætte tempoet ned og ikke lade sig rive med i en overilet reaktion på en hændelse eller udvikling, som gør en spændt eller ophidset. Man kan også bruge udtrykket som en kommentar, når nogen overdriver.
faste vendinger (fixed expressions)
spise brød til
synonymer (synonyms)
klappe hesten
klappe kebabben
koldt vand i blodet
se tiden an
Stop en halv.
stoppe en halv
grammatik (grammar)
nogen spiser brød til
ordforbindelser (word combinations)
anbefale samtidigt at spise brød til
begynde at spise brød til
burde spise brød til
heldigvis have lov til at spise brød til
spise brød til sin hysteri
spise brød til sin vrede
vælge at spise brød til
være grund til at sige spis nu lige brød til
eksempler (examples)
Aktiekursen er høj, ja – men spis nu lige brød til!
Men selv om alle nu lader sig rive med og råber “sammenbrud”, bør man lige spise brød til. For vel vil weekendens begivenheder i Bruxelles betyde en udsættelse af EU-Forfatningens ikrafttræden, men helt sikkert er det, at der blot bliver tale om en forsinkelse.
På The Computer Virus Myths home page bliver man bedt om at spise brød til sit virushysteri.
Skal vi ikke lige spise brød til? Disse 100 millioner kan vist bruges bedre på mange andre områder i Odense kommune.
Spis nu lige brød til – når man møder overraskende påstande som disse, bør ens første reaktion altid være: Kan det være rigtigt, eller er der tale om fup?
associationer (associations)
forbehold
nøle
skeptisk
tålmodig
tøve
Example 8: Dictionary article from Ordbogen over Faste Vendinger (2007)

If users activate Find the usage 1 or 2 (text production), other types of data will pop up on the screen. Consequently, what we have in this dictionary are dynamic articles including different types of data that are structured in different ways according to each type of search criterion. This represents an important innovation in lexicographic practice. However, the Ordbogen over Faste Vendinger is still not conceived completely in accordance with the principles of modern lexicographic function theory, which relates the users’ specific types of information needs not only to the respective types of user situations, but also to the specific types of users. In order to achieve this, it is necessary to work with a two-dimensional search criterion (Figure 1).

[Figure 1. Abstract model of two-dimensional search criterion. Dimensions: types of user situations; types of users]

However, as mentioned above, the criteria for the typologization of the users depend on the respective types of user situations. It is therefore necessary to introduce a two-step search mechanism where users will first have to select the respective user situation and then, in a second interactive phase, have to choose between two or more user types in order to identify themselves.

The relevance of this method becomes even clearer in relation to another of Professor Bergenholtz’ electronic dictionaries, i.e. Regnskabsordbøgerne [Dictionary of Accounting] (2003–2008). In fact, this dictionary is described by the authors as four independent dictionaries, i.e. a monolingual Danish one, a bilingual Danish-English one, a monolingual English one, and a bilingual English-Danish one. This way of typologizing the dictionaries is problematic because it is exclusively based upon the dictionary form (monolingual or bilingual) and not upon the respective functions. According to the criteria of lexicographic functions, Regnskabsordbøgerne should be considered one multi-functional dictionary. The dictionary still does not (May 2009) include a search mechanism where the users may relate the search process to their respective situations and, even less, to their own characteristics. However, if such a mechanism had existed it would be clear that it was actually only one dictionary with many functions. This can be illustrated by the discussion of two central types of user situations, i.e. text reception and text production.

If Danish accountants or anyone else interested in accounting literature are reading an English text and have reception problems, the corresponding need for information may be met in various ways which depend on the characteristics of the reader. If the readers’ proficiency level in terms of English accounting terminology and English language in general is relatively high, then an English explanation would be sufficient in order to meet their needs. But if the readers’ general and special English proficiency level is low, then their needs can only be satisfied by means of Danish equivalents or explanations. And finally, if the proficiency level is somewhere in between the two former, then a combination of Danish and English data would be the best way to ensure that their needs are covered.

Similarly, if Danish accountants are writing English texts and run into production problems of any sort, then the lexicographic solution will once more depend on their specialised and general proficiency level in English. Accountants with a high proficiency level in this foreign language will generally be able to write directly in English without making any mental outline in their mother tongue.
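The two-step mechanism outlined above, first the user situation and then the user type, can be sketched as a lookup keyed on both dimensions. In the sketch below only the reception row is filled in, following the three proficiency levels just described; the enum names, the table layout and the wording of the data types are assumptions of this illustration and do not describe how Regnskabsordbøgerne is implemented.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Situation(Enum):
    RECEPTION = "text reception"
    PRODUCTION = "text production"


class Proficiency(Enum):
    LOW = "low"
    INTERMEDIATE = "intermediate"
    HIGH = "high"


# Step 2 depends on step 1: the user types offered are defined per situation.
USER_TYPES: Dict[Situation, List[Proficiency]] = {
    Situation.RECEPTION: [Proficiency.LOW, Proficiency.INTERMEDIATE, Proficiency.HIGH],
}

# Data assumed to be shown for each (situation, user type) pair.
DATA_FOR: Dict[Tuple[Situation, Proficiency], Tuple[str, ...]] = {
    (Situation.RECEPTION, Proficiency.HIGH): ("English explanation",),
    (Situation.RECEPTION, Proficiency.LOW): ("Danish equivalent", "Danish explanation"),
    (Situation.RECEPTION, Proficiency.INTERMEDIATE): (
        "Danish equivalent", "Danish explanation", "English explanation"),
}


def user_types_for(situation: Situation) -> List[Proficiency]:
    """First step done: return the user types offered for this situation."""
    return USER_TYPES.get(situation, [])


def select_data(situation: Situation, user_type: Proficiency) -> Tuple[str, ...]:
    """Second step done: return the data types to display."""
    if user_type not in user_types_for(situation):
        raise LookupError(f"no user profile defined for {situation.value}")
    return DATA_FOR[(situation, user_type)]


for level in user_types_for(Situation.RECEPTION):
    print(level.value, "->", ", ".join(select_data(Situation.RECEPTION, level)))
```

Further rows, for instance for text production, would be added to both tables in the same way once the corresponding user types have been defined.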
This means that their production problems will frequently be related to English words or terms which they already know, but whose use they want to check (e.g. grammar or word combination with the expression), or whose meaning they want to confirm. On the other hand, if the accountants in question have a low proficiency level, their text production will take place through a Danish outline which is then translated into English, for which reason they will need a Danish-English solution in order to be guided to the right word or expression in the target language. And for many Danish accountants, text production in English will probably take place as a combination of the two mentioned methods, for which reason they will sometimes need a Danish-English solution and sometimes a monolingual English solution. A similar discussion could be conducted in relation to the user needs in terms of translation and cognition. All this shows that Regnskabsordbøgerne and other electronic dictionaries could benefit from an overall and more stringent functional approach
where the corresponding search mechanisms may lead their users to dynamic data (articles) that are especially adapted to each and every one of the various types of users and situations covered by the dictionary in question. Such mechanisms could be refined in various ways. They could, as in the Ordbogen over Faste Vendinger, allow users to click on the right user situation, and not only the ones used in this dictionary, but also others, for instance translation if the dictionary is bilingual. They could also allow users to define their own profile, either at a general level which the electronic dictionary may remember from consultation to consultation, or in relation to each consultation after determining the relevant user situation. However, there are also other possibilities. For instance, in the scroll menu which in Example 6 includes the search options contains, begins with, ends with and is, other options such as means the same as, means the opposite of or is associated with are possible in various types of dictionaries. And the same holds true for the possibility of writing not only individual words, but also word combinations in the search field as discussed in the previous section. One example from the English-Danish part of Regnskabsordbøgerne illustrates the possibilities in this respect. When writing the term accelerated depreciation in the search field, the user will end up with the data in Example 9.

accelerated depreciation noun
merafskrivning noun
Definition
The accelerated depreciation method recognises higher amounts of depreciation in the earlier years and lower amounts in the later years of a fixed asset’s life.
Collocations
• eliminate accelerated depreciation in the acquirer
  eliminere merafskrivning i den købende virksomhed
• accelerated depreciation on fixed assets traded within the group
  merafskrivning på koncerninternt handlede anlægsaktiver
• cumulative accelerated depreciation
  akkumulerede merafskrivninger
• accelerated depreciation for tax purposes
  skattemæssige merafskrivninger
• accelerated depreciation allowances
  skattemæssige merafskrivninger
Examples
• The reducing-balance method is a type of accelerated depreciation.
  Saldometoden indebærer en form for merafskrivning.
Synonyms
Saldoafskrivning
See also: reducing-balance method; declining-balance method
Example 9: Dictionary article from Regnskabsordbøgerne (2003–2008)

However, if the user writes the word combination accelerated depreciation allowances in the present version of the dictionary (May 2009), no data will pop up. But if it were possible to search directly for word combinations like accelerated depreciation allowances, an ideal result for the user would be the hypothetical article reproduced in Example 10.

accelerated depreciation allowances word combination
skattemæssige merafskrivninger word combination
Related to
accelerated depreciation noun
merafskrivning noun
Definition
The accelerated depreciation method recognises higher amounts of depreciation in the earlier years and lower amounts in the later years of a fixed asset’s life.
Synonyms
Saldoafskrivning
See also: reducing-balance method; declining-balance method

Example 10: Hypothetical dictionary article based upon Regnskabsordbøgerne (2003–2008)

In Example 10, users are guided directly to a dictionary article which is especially adapted to their needs in terms of the word combination accelerated depreciation allowances, which is presented as lemma with the corresponding Danish translation immediately below. These are the data that may be considered most relevant for users in this case. But if, for one reason or another, users need more data, these data are also included in such a way that they do not disturb the quick and easy
access to the data considered most relevant when users need specific information about accelerated depreciation allowances. For most concrete consultations, these additional data may be completely superfluous, but as they do no harm there is no reason not to include them if they can provide some assistance in a number of consultations.

7. Concluding remarks

This article has explored and discussed some of the possibilities made available to lexicography by the new information technologies. It has not tried to provide final answers and solutions, but only to give rise to a discussion about the need to “rethink” lexicography based upon an advanced theory and making extensive use of the new technologies. Nobody knows how dictionaries will develop in the future, but one direction will probably be the “individualization” of the lexicographic product, adapting it to the concrete needs of a concrete user in a concrete situation and providing much quicker and easier access to the relevant data. In this light, this article should be understood as an invitation to enhance the theoretical and scientific discussion about lexicography, data access and information retrieval, for which Professor Henning Bergenholtz has appealed for years.

References

A. Dictionaries
Walter, E. et al. (eds.). 1998. Cambridge International Dictionary of Idioms. Cambridge: Cambridge University Press.
Leth, J.H. (ed.). 1800. Dansk Glossarium. En Ordbog til Forklaring over det danske Sprogs gamle, nye og fremmede Ord og Talemaader for unge Mennesker og for Ustuderede. Et Forsøg. Med en Fortale af Professor Rasmus Nyerup. Kiøbenhavn: Trykt paa Hofboghandler Simon Poulsens Forlag hos Bogtrykker Morthorst’s Enke & Comp.
Bergenholtz, H., Vrang, V. and Almind, R. 2007. Ordbogen over Faste Vendinger. Aarhus: Center for Lexicography. (Accessed 1 May 2009).
Bergenholtz, H. et al. (eds.). 2008. Den Danske Netordbog. Odense: Ordbogen.com. (Accessed 1 May 2009).
Bergenholtz, I. (ed.). 2006. Musikordbogen. Aarhus: Aarhus School of Business. (Accessed 1 May 2009).
Wehmeier, S. et al. (eds.). 2005. Oxford Advanced Learner’s Dictionary. Oxford: Oxford University Press.
Nielsen, S., Mourier, L. and Bergenholtz, H. (eds.). 2003–2008. Regnskabsordbøgerne. Aarhus: Aarhus School of Business. (Accessed 1 May 2009).
Postlethwayt, M. (ed.). 1774. The Universal Dictionary of Trade and Commerce: With large Additions and Improvements, Adapting the same to the Present State of British Affairs in America, since the last Treaty of Peace made in the Year 1763. With Great Variety of New Remarks and Illustrations Incorporated through the Whole: Together with Anything essential that is contained in Savary’s Dictionary: Also, all the Material Laws of Trade and Navigation relating to these Kingdoms, and the Customs and Usages to which all Traders are Subject. Fourth Edition, Volume 1, London: Printed for W. Strahan, etc.

B. Other literature
Bergenholtz, C., Bergenholtz, H. and Tarp, S. 2008. “Leksikografi i videnskabsteoretisk perspektiv: sand, falsk eller irrelevant.” LexicoNordica 15: 155–168.
Bergenholtz, H. and Gouws, R.H. 2007. “The access process in dictionaries for fixed expressions.” Lexicographica 23: 237–260.
Engels, F. 1990. “Natural science and the spirit world.” In Collected Works, Volume 25, 345–355. Moscow: Progress Publishers.
Ingwersen, P. 1992. Information Retrieval Interaction. London: Taylor Graham.
Lucretius Carus, T. 1994. On the Nature of the Universe. London: Penguin.
Manning, C.D., Raghavan, P. and Schütze, H. 2008. Introduction to Information Retrieval. Cambridge: Cambridge University Press.
Nielsen, S. 2008. “The effect of lexicographical information costs on dictionary making and use.” Lexikos 18: 170–189.
Tarp, S. 2007. “Lexicography in the information age.” Lexikos 17: 170–179.
Tarp, S. 2008a. “The third leg of two-legged lexicography.” Hermes 40: 117–131.
Tarp, S. 2008b. “Revival of a dusty old profession.” Hermes 41: 175–188.
Tarp, S. 2008c. “Kan brugerundersøgelser overhovedet afdække brugernes leksikografiske behov?” LexicoNordica 15: 5–32.
Tarp, S. 2008d. Lexicography in the Borderland between Knowledge and Non-knowledge: General Lexicographical Theory with Particular Focus on Learner’s Lexicography [Lexicographica Series Maior, 134]. Tübingen: Max Niemeyer.
Wiegand, H.E. 1983. “Was ist eigentlich ein Lemma? Ein Beitrag zur Theorie der lexikographischen Sprachbeschreibung.” In Studien zur neuhochdeutschen Lexicographie III, H.E. Wiegand (ed.), 401–474. Hildesheim/New York: Georg Olms.
Wiegand, H.E. 1987. “Zur handlungstheoretischen Grundlegung der Wörterbenutzungsforschung.” Lexicographica 3: 178–227.
Wiegand, H.E. 2000a. “Wissen, Wissenrepräsentationen und Printwörterbücher.” In Proceedings of the Ninth Euralex International Congress, Euralex 2000. Stuttgart, Germany, August 8th–12th, 2000. Volume 1, U. Heid, S. Evert, E. Lehmann and C. Rohrer (eds.), 15–38. Universität Stuttgart: Institut für Maschinelle Sprachverarbeitung.
Wiegand, H.E. 2000b. “Über Suchbereiche, Suchzonen und ihre textuellen Strukturen in Printwörterbüchern.” In Wörterbücher in der Diskussion IV. Vorträge aus dem Heidelberger Lexikographischen Kolloquium, H.E. Wiegand (ed.), 233–301. Tübingen: Max Niemeyer.
Wiegand, H.E. 2002. “Wissen in der Sprachlexikografie. Ein Plädoyer für einige immer noch notwendige Differenzierungen.” In Linguistik jenseits des Strukturalismus. Akten des II. Ost-West-Kolloquiums Berlin 1998, K. Ezawa, W. Kürschner, K.H. Rensch and M. Ringmacher (eds), 265–281. Tübingen: Gunter Narr.
Hybrid text constituent structures of dictionary articles
A contribution to the expansion of the theory of textual dictionary structures
Herbert Ernst Wiegand
First, it is indicated with which methods of segmentation the article internal text segments of different types can be ascertained. The following types of text segments are distinguished: items, item texts, non-typographical microstructural indicators, functional item additions, item segments, item form segments. Only text segments with text constituent status belong to the first three types. They exhibit their own form, at least one separate function and their own position with neighbours for that position. Only text segments without text constituent status belong to the last three types. Non-hybrid textual structures display element-homogeneous structure-carrying sets, containing only text segments with text constituent status. Hybrid textual structures, by contrast, display element-heterogeneous structure-carrying sets. Among other things, this makes it possible, in forming structures, to take explicit and formal account of the upward and downward expanding as well as the internally expanding functional item additions. Consequently, they can have their own structural place in the textual structures presented by means of tree diagrams. Previously, this has not been possible. Many types of hybrid textual structures are distinguished and formally presented by means of tree diagrams. Finally, a comprehensive typological excerpt of hybrid text constituent structures is presented.
Keywords: dictionary articles; article internal text segments; text constituent status; non-hybrid textual structures; hybrid textual structures; functional items
1. Preliminary remark regarding the connection of terminology, method and theory
German Hybride originates from the Latin hybrida ‘half-caste’. The accompanying German adjective hybrid shows a spectrum of meaning that can be paraphrased as follows according to Duden-GFWB (2007): “mixed, from two origins, composed of several; formed by cross-breeding, mixture.” The theory of lexicographic texts which I have developed includes a detailed heuristics (compare e.g. Wiegand 1990: 20–26; 2005 [2006]: 217–226), which also provides different methods for the segmentation of condensed lexicographic texts. For the application of these methods of segmentation, application conventions and conditions of correctness have been fixed. For the linking of methods and parts of theory demanded by the philosophy of science the following principle applies along with others: only those text elements are considered as elements for concrete textual dictionary structures that are obtained through the proper application of one of the methods of segmentation. The following distinctions apply:
1. Through an application of the method of exhaustive functional-positional segmentation, which is one of the variants of the method of functional-positional segmentation, all types of text constituents can be obtained. These are only those functional text segments that display a discrete continuous form of the text segment, at least one genuine function of a text segment and exactly one fixed textual position in the linguistic chain of the text. These are the following: items, item texts and non-typographical structural indicators (cf. Wiegand 2008b: Figure 4–1).
2. Through an application of the method of non-functional-positional segmentation all non-functional text segments as well as all internally expanded functional item additions are obtainable (cf. Wiegand 2007: 201 et seq.).
3. Through an application of the method of functional segmentative isolation all top and bottom expanded functional item additions, including those without their own position in the linguistic chain, are obtainable (cf. Wiegand 2007: 193 et seq.).
The following arrangements also apply:
1. Concrete hierarchical text constituent structures of which the structure-carrying sets are element-homogeneous in the sense that they display as elements only text segments with text constituent status (cf. 1. above) are called concrete pure (or: non-hybrid) text constituent structures: All their elements have the same origin in the sense that they are obtainable through the application of the same method. Accordingly, abstract hierarchical constituent structures of which the structure-carrying sets only present classes of text segments with
text constituent status are called abstract pure (or: non-hybrid) text constituent structures.
2. Concrete hierarchical text constituent structures of which the structure-carrying sets are element-heterogeneous in the sense that they display as elements both text segments with and also those without text constituent status (cf. 1–3 above), are called concrete hybrid text constituent structures: Their elements differ in origin (cf. the translation of the paraphrase in Duden-GFWB (2007)), in the sense that they are obtainable through applications of different methods. Accordingly, abstract hierarchical text constituent structures of which the structure-carrying sets display both classes of text segments with text constituent status and classes of text segments without text constituent status are called abstract hybrid text constituent structures.
From this the following can be determined: the composition of the structure-carrying sets is a criterion to distinguish between concrete hierarchical pure and concrete hierarchical hybrid text constituent structures; concrete hierarchical pure text constituent structures display element-homogeneous structure-carrying sets whereas concrete hierarchical hybrid text constituent structures display element-heterogeneous structure-carrying sets. This relation applies mutatis mutandis to the abstract structures. In addition, hierarchical textual structures that are not text constituent structures, e.g. certain item structures, can also be divided with the help of the same criteria into pure and hybrid structures (cf. 3.).
2. Pure article internal text constituent structures
In condensed dictionary articles (cf. Wiegand 2003 [2004]: 203 et seq.) text constituent structures occur that belong to three types: the pure article constituent structure, the pure article microstructure and the article internal search area structure (cf. Wiegand 2000: 269 et seq.).
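The criterion from Section 1 for telling pure from hybrid structures can be illustrated with a minimal Python sketch. The names `Segment` and `classify` are hypothetical and not part of the theory. The six type names follow the abstract, and the dictionary article itself, which also belongs to a structure-carrying set, is left out for brevity.

```python
from dataclasses import dataclass

# Segment types with text constituent status (Section 1, distinction 1).
CONSTITUENT_TYPES = {"item", "item text", "non-typographical structural indicator"}
# Segment types without text constituent status (distinctions 2 and 3).
NON_CONSTITUENT_TYPES = {"functional item addition", "item segment", "item form segment"}


@dataclass(frozen=True)
class Segment:
    form: str
    kind: str  # one of the type names listed above

    @property
    def is_constituent(self) -> bool:
        if self.kind in CONSTITUENT_TYPES:
            return True
        if self.kind in NON_CONSTITUENT_TYPES:
            return False
        raise ValueError(f"unknown segment type: {self.kind}")


def classify(carrier_set):
    """Classify a structure-carrying set by the composition of its elements."""
    statuses = {segment.is_constituent for segment in carrier_set}
    if statuses == {True}:
        return "pure"    # element-homogeneous
    if statuses == {True, False}:
        return "hybrid"  # element-heterogeneous
    raise ValueError("not a text constituent structure under this sketch")


pure_set = {Segment("Schurz", "item"), Segment("der", "item")}
hybrid_set = {Segment("richts", "item"), Segment("sunken dot", "functional item addition")}
print(classify(pure_set), classify(hybrid_set))
```

The sketch deliberately rejects sets that contain no text constituents at all, since the text treats such structures (e.g. certain item structures) separately.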
To understand at least on an intermediate level of abstraction why the introduction of hybrid text constituent structures presents an “expansion of the theory of textual dictionary structures” (cf. the subtitle), the following paragraph first discusses an example of pure article microstructures in a slightly simplified way. The basis of the theory cannot be dealt with here (cf. Wiegand 1989a; 1989b). Compare the completely condensed dictionary articles da1–da6 in Figure 1.
Figure 1. Dictionary articles da1 and da2 from HWDG (1984), da3 from Duden-DUW (2007), da4 from Sanders-WDS (1876), da5 from Hollós (2001) and da6 from Stowasser (1994)
In the following discussion we firstly look at the concrete hierarchical article microstructure of da1. No non-typographical microstructural indicators belong to the structure-carrying set of a concrete hierarchical article microstructure. They rather belong to the structure-carrying sets of concrete hierarchical article constituent structures of which the most prominent partial structures are the concrete hierarchical article microstructures. Consequently, to obtain its concrete hierarchical microstructure da1 should be segmented in a first methodological step in such a way that all other text constituents prevail. Because da1 contains no item text these will only be elementary items. Therefore the method of non-exhaustive functional-positional segmentation will be applied to da1. This presents a second variant of the method of functional-positional segmentation of which the correctness conditions require that all segmentation results only apply to the items. The segmentation convention C1 applies, according to which hyphens (e.g. those in -es and -e in da1) are regarded as part of the item form of an item with cohesion instructions. Another segmentation convention C2, not to be applied here, could indicate that both the hyphens should be regarded as immediate partial items, i.e. as condensed word stem items. This would imply that -es and -e are not elementary items but non-elementary items. In the following discussion the segmentation procedures are not explained in detail (cf. e.g. Wiegand 2000: 235 et seq.). Only the segmentation result is listed in
such a way that each elementary and non-elementary item, presented in da1, is given a lower-case letter, preceding it in round brackets, as an item name. It should be noted that item names are individual names. Furthermore, every item of the class of items is related to the same general dictionary object directed item function to which it belongs. The latter goes for statements of the form “α ∈ β” with “α” as variable for items and “β” as variable for class symbols that are abbreviations of item class names. In addition “∈” is the symbol for the element-class-relation, in the sense of is an element of or in the sense of is included in.
(a) Schurz, der; -es, -e [iBIj]; a ∈ CF (= comment on form)
(b) Schurz; b ∈ IFLS (= item giving the form of the lemma sign; here: lemmatic item giving a noun)
(c) der; -es, -e; c ∈ IMor.n (= item giving the morphology at nouns)
(d) der; d ∈ Iart (= item indicating an article)
(e) -es, -e; e ∈ IDecC (= item indicating the declination class)
(f) -es; f ∈ c.ISF (= condensed item giving the singular formation)
(g) -e; g ∈ c.IPlF (= condensed item giving the plural formation)
(h) [iBIj]; h ∈ I-Pron (= item for the normal pronunciation; this is a zero item, presented in the concrete structure by a blank item (BI) which is also identified positionally over its neighbouring variables “i” and “j”)
(i) bes. von Handwerkern [...] lederner S.; i ∈ CS (= comment on semantics)
(j) [iBIj] bes. von Handwerkern [...] aus festem Material; j ∈ pragsemI (= pragmatic-semantic item)
(k) [iBIj]; k ∈ I-pragZL (= item for the pragmatic zero labelling; this also is a zero item)
(l) bes. von Handwerkern [...] aus festem Material; l ∈ c.IPM (= condensed item giving the paraphrase of meaning)
(m) ein lederner S; m ∈ c.IComEx (= condensed item giving the competence example).
The items b, d, f, g, h, k, l and m are elementary items, i.e. without the possibility of further functional-positional segmentation.
The items a, c, e, i and j are non-elementary items, for which further functional-positional segmentation is possible. In a second methodological step we construct the element-homogeneous structure-carrying set for ${}^{p}MiS^{c}_{h}(da_1)$, the concrete (c) hierarchical (h) pure (p) microstructure (MiS) of da1. The elements of this structure-carrying set are all methodically obtained items a–m and the complete dictionary article da1. The structure-carrying set is called $S^{c}_{MiS}(da_1)$. With its 14 elements it can be presented as follows:
$S^{c}_{MiS}(da_1) = \{da_1, a, b, c, d, e, f, g, h, i, j, k, l, m\}$
In a third methodological step we define for $S^{c}_{MiS}(da_1)$ a two-place non-reflexive (and consequently asymmetrical) as well as transitive relation of the type of the precedence (p) relation, called $R^{c}_{p}(da_1)$, with the relation term x precedes y, in which “x” and “y” are variables for items. $R^{c}_{p}(da_1)$ belongs to the non-reflexive partial ordering relations. The following applies: $R^{c}_{p}(da_1) \subseteq S^{c}_{MiS}(da_1) \times S^{c}_{MiS}(da_1)$. The structure-shaping relation $R^{c}_{p}(da_1)$ contains as elements all those ordered pairs, e.g. $\langle b, d \rangle$, that, when their coordinates are put in the relation term x precedes y, will deliver true sentences, e.g. b precedes d (≈ “Schurz” precedes “der”). $R^{c}_{p}(da_1)$ therefore fully determines which items from da1 will precede which other items from da1. The same also applies to other relations (in further examples) that belong to the type of the precedence relation. Precedence relations are also called predecessor-successor-relations.
In a fourth methodological step we define for the structure-carrying set $S^{c}_{MiS}(da_1)$ a two-place reflexive, antisymmetrical and transitive relation of the partitive (part) relation type, called $R^{c}_{part}(da_1)$, with the relation term x is an item as part of y, with “x” as variable for items and “y” as variable for items and da1. The following applies: $R^{c}_{part}(da_1) \subseteq S^{c}_{MiS}(da_1) \times S^{c}_{MiS}(da_1)$ and $R^{c}_{p}(da_1) \cap R^{c}_{part}(da_1) = \emptyset$. The relation $R^{c}_{part}(da_1)$ belongs to the reflexive partial ordering relations. It contains as elements those ordered pairs, e.g. $\langle d, c \rangle$, that, when their coordinates are put in the relation term x is an item as part of y, will deliver true sentences, e.g. d is an item as part of c (≈ “der” is an item as part of “der; -es, -e”). The relation $R^{c}_{part}(da_1)$ therefore fully determines (i) which items from da1 are partial items from which other items from da1, as well as (ii) which items are parts of da1.
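The four methodological steps can be traced in a small Python sketch. It is an illustration under stated assumptions, not Wiegand's own formalisation: the immediate part-of links in `PARENT` are read off the item list (a)–(m), the names `PARENT`, `LEAF_ORDER`, `span` and `is_transitive` are hypothetical, and precedence between non-elementary items is assumed to mean that one item ends before the other begins.

```python
from itertools import product

# Immediate part-of links for da1, read off items (a)-(m); an assumption of this sketch.
PARENT = {
    "a": "da1", "i": "da1",
    "b": "a", "c": "a", "h": "a",
    "d": "c", "e": "c",
    "f": "e", "g": "e",
    "j": "i", "m": "i",
    "k": "j", "l": "j",
}
# Elementary items in their left-to-right order in the article text.
LEAF_ORDER = ["b", "d", "f", "g", "h", "k", "l", "m"]

ITEMS = set(PARENT)
CARRIER = ITEMS | {"da1"}  # structure-carrying set with 14 elements


def ancestors(x):
    """Yield every node that x is a proper part of, up to da1."""
    while x in PARENT:
        x = PARENT[x]
        yield x


# Partitive relation: reflexive on items, plus (x, y) for each y containing x.
R_PART = {(x, x) for x in ITEMS} | {(x, y) for x in ITEMS for y in ancestors(x)}


def span(x):
    """Positions of the first and last elementary item covered by item x."""
    covered = [n for n, leaf in enumerate(LEAF_ORDER) if (leaf, x) in R_PART]
    return covered[0], covered[-1]


# Precedence relation: x precedes y if x ends before y begins.
R_P = {(x, y) for x, y in product(ITEMS, repeat=2) if span(x)[1] < span(y)[0]}


def is_transitive(rel):
    """Check that (x, y) and (y, w) in rel always imply (x, w) in rel."""
    return all((x, w) in rel for (x, y) in rel for (z, w) in rel if y == z)


print(len(CARRIER), ("b", "d") in R_P, ("d", "c") in R_PART, R_P & R_PART == set())
```

Deriving both relations from one tree plus a leaf order keeps them consistent by construction, which is why the disjointness of the two relations can be checked rather than stipulated.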
Following the execution of the fourth methodological step, ${}^{p}MiS^{c}_{h}(da_1)$, the pure concrete hierarchical microstructure of da1, is given as an ordering structure, determining which items from which item classes are presented in which order in the dictionary article da1. By doing so it can also be determined that hierarchical pure article microstructures from completely condensed dictionary articles (cf. Wiegand 2003 [2004]: 207 et seq.) can occur as article internal item distribution structures. Concrete hierarchical pure microstructures can be presented by means of ordered tree diagrams (cf. Wiegand 1989a: 464 et seq.; 2000: 235 et seq.). They can furthermore occur along with the relevant and isomorphous abstract hierarchical microstructures in a commented structural diagram. Abstract microstructures can be obtained because of the fact that both the structure-shaping ordering relations are defined with regard to one structure-carrying set that does not contain da1 and the concrete items a–m, but rather the class DA of the dictionary articles (e.g. of a specific dictionary) as well as the item classes to which the items a–m belong. This structure-carrying set, called $S^{a}_{MiS}(da_1)$, can be presented as follows when used according to the class symbols introduced in (a)–(m):
$S^{a}_{MiS}(da_1) = \{\text{DA}, \text{CF}, \text{IFLS}, \text{IMor.n}, \text{Iart}, \text{IDecC}, \text{c.ISF}, \text{c.IPlF}, \text{I-Pron}, \text{CS}, \text{pragsemI}, \text{I-pragZL}, \text{c.IPM}, \text{c.IComEx}\}$
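The step from the concrete to the abstract structure-carrying set can be sketched as a mapping from item names to class symbols. The mapping below follows the list (a)–(m); the names `ITEM_CLASS` and `lift` are hypothetical helpers introduced only for this illustration.

```python
# Class symbol for each element of the concrete structure-carrying set of da1.
ITEM_CLASS = {
    "da1": "DA", "a": "CF", "b": "IFLS", "c": "IMor.n", "d": "Iart",
    "e": "IDecC", "f": "c.ISF", "g": "c.IPlF", "h": "I-Pron", "i": "CS",
    "j": "pragsemI", "k": "I-pragZL", "l": "c.IPM", "m": "c.IComEx",
}

# Abstract structure-carrying set: the classes instead of the concrete items.
ABSTRACT_CARRIER = set(ITEM_CLASS.values())


def lift(relation):
    """Translate a relation on concrete items into one on their classes."""
    return {(ITEM_CLASS[x], ITEM_CLASS[y]) for x, y in relation}


# The mapping is one-to-one for da1, so both sets have 14 elements.
print(len(ABSTRACT_CARRIER) == len(ITEM_CLASS))
print(lift({("d", "c")}))
```

Because no two items of da1 share a class in this mapping, lifting a relation loses no pairs, which is one way to read the claim that the concrete and abstract structures are isomorphous.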
The single commented structural diagram for the concrete and isomorphous abstract hierarchical pure microstructure of da1 is given in Figure 2.
[Figure 2: tree diagram with root DA and the partial structures A. H. LEFT CORE STRUCTURE/MIS, A. H. BASE STRUCTURE/MIS and A. H. RIGHT CORE STRUCTURE/MIS; the node labels are explained in the caption.]
Figure 2. Single commented structural diagram of the abstract (and isomorphous concrete) hierarchical microstructure of da1 in Figure 1. Conventions of presentation: “x ___ y” means (read from the bottom to the top) the same as x is an item as part of y; “u ––– v” means (read from the bottom to the top) the same as u is an element of v (= u ∈ v, where “v” is a class symbol). “|” means the same as at the same time; LEFT CORE STRUCTURE/MiS should be read as left core structure as partial structure of the microstructure (MiS); abbreviations (all abbreviations which label nodes are class symbols for classes of items with the same dictionary topic directed genuine item function): A. = ABSTRACT; H = HIERARCHICAL; DA = dictionary article; CF = comment on form; CS = comment on semantics; IFLS = item giving the form of the lemma sign; IWF.ns = item giving the word form of the nominative singular; Isp = item giving the spelling; IMor.n = item giving the morphology at nouns; I-nPron = item for the normal pronunciation; Iart = item indicating an article; IDecC = item indicating the declination class; c.ISF = condensed item giving the formation of the singular form; c.IPF = condensed item giving the formation of the plural; pragsemI = pragmatic-semantic item; I-pragZL = item for the pragmatic zero labeling; c.IPM = condensed item giving the paraphrase of meaning; c.IComEx = condensed item giving a competence example; IBIj = blank item of which the structural position is determined via the citation of the preceding variable “i” and the following variable “j” with the corresponding class symbols of both neighbours
Numerous types of pure microstructures can be distinguished. An incomplete overview can be found in Wiegand (2002: 573–580).
3. Hybrid article internal text constituent structures
In this section we look at hybrid article microstructures as an example of hybrid article internal text constituent structures. A dictionary article displays a hybrid article microstructure when it contains at least one functional item addition. Dictionary articles with functional item additions have both a pure and a hybrid article microstructure. Functional item additions are functional text segments by means of which, as is the case with items, something is presented, but which, in contrast with items, do not display text constituent status (cf. with regard to functional item additions e.g. Wiegand 2005 [2006]: 326–330; 2007: 192 et seq.). They are not, as items are, obtainable as a result of a functional-positional segmentation. This is either because they do not have their own position in the language chain (e.g. the sunken dot underneath the item form segment “i” of the item giving the form of the lemma sign in da3, cf. Figure 1) or because, when they do have their own position in the language chain, a functional-positional isolation is not possible (e.g. the semantic inner gloss “Facharzt” in da2 in Figure 1, where a segmentation leads to the two non-functional text segments “er muß einen Spezialisten” and “aufsuchen”). The sunken dot in da3 belongs to the bottom expanded functional item additions; it is a bottom expanded bifunctional item addition that is upwardly addressed at the item form segment “i”. It realizes a word accent marker which is also a marker of the vowel quantity as being short (Wacc|vocq.s). The same applies to the sunken dot in da2. If a concrete and an abstract hierarchical pure microstructure are allocated to da3, they are obtainable in the same way as the microstructures of da1. The result can be presented as in Figure 3.
When looking at the structural diagram in Figure 3 it is noticeable that in the abstract hierarchical microstructure the word accent marker which also marks the short vowel quantity does not have its own node. It therefore displays no own structural place in the ordered tree diagram that is linked to another structural place in the same tree diagram by means of at least one interface. The occurrence of this bottom expanded bifunctional addition in da3 can only be seen from the two node labels “IFLS Wacc|vocq.s | IWF.ns | Isp” and “Ias Wacc|vocq.s”. In the accompanying isomorphous concrete microstructure the sunken dot in the bottom expanded item giving the syllable accent can therefore also not display an own structural place. It rather appears in “rịchts” as in the concrete text of the dictionary article! If in the presentation of article microstructures you do not only want to take explicit cognizance of the article internal distribution of items and, when
[Figure 3: tree diagram with root DA and the partial structures A. H. LEFT CORE STRUCTURE/MIS, A. H. BASE STRUCTURES/MIS and A. H. RIGHT CORE STRUCTURE/MIS; the node labels are explained in the caption.]
Figure 3. Single commented structural diagram for the abstract (and isomorphous concrete) hierarchical microstructure, displayed by da3 in Figure 1. New abbreviations: ISyl = item giving a syllable; IsylD = item giving the syllable division; Ias = item giving the accentuated syllable; means the same as bottom expanded; Wacc|vocq.s = marker for word accent and vowel quantity as short; IComEx2 = item consisting of two items giving the competence examples
necessary the distribution of item texts, but, if available, also of functional item additions, you have to proceed from pure to hybrid microstructures. Without going into all formal details, da3 has been used as an example representing all bottom expanded functional item additions to explain this. Next we first look at the bottom expanded item giving the syllable accent “rịchts” from da3. It is an immediate partial item of the bottom expanded item giving the form of the lemma sign and therefore belongs to the lemmatic partial items, as is the case with the four syllable and four syllable division items. All lemmatic partial items with the exception of the item giving the syllable accent are non-expanded elementary items. In contrast, the item giving the syllable accent belongs to the single expanded elementary items and among these to the bottom expanded ones (cf. Wiegand 2005 [2006]: 289). While non-expanded elementary items do not display any internal structure the situation is different in the case of expanded elementary items. They can display an internal hybrid structure. Their elements are obtainable as top or bottom expanded elementary items through the application of a method of functional segmentative isolation. With an application
of this method horizontal segmentation sections are structured in such a way that the top and the bottom expanded functional item additions are separated from the elementary items. In the case of the bottom expanded item giving the syllable accent “rịchts” it means that the result of the segmentation consists of the segment “richts” and the sunken dot. It should be noted that both segments are parts of “rịchts”. To be able to determine the exact position of the sunken dot, a non-functional-positional segmentation of “rịchts” is performed, resulting in the following item form segments: “r|ị|chts” (with “|” marking the segmentation joint); “r” is the front, “i” the middle and “chts” the back item form segment. The following statement applies: r < ị < chts [...]
[...] 35000 Components and structures > 35300 Microstructure > 35330 Components and positions):
35330–1 subdivision
35330–1.1 lemma section
35330–1.2 explanatory section
35330–1.3 demonstration section
35330–2.1 entry position
35330–2.1.1 lemma position
35330–2.2 entry field
35330–2.2.1 entry head
35330–3 part-of-speech section
35330–4.1 formal section
Bo Svensén
35330–4.2 semantic-pragmatic section
35330–4.3 contextual section
35330–4.3.1 collocation section
35330–4.3.2 idiom section
35330–5 sense
35330–5.1 sub-sense
35330–6.1 right-hand context
35330–6.2 left-hand context
35330–7 extra column
3. Lexicographic terms and linguistic terms
How many of the terms used in metalexicographic text should be regarded as strictly lexicographic? It is almost in the nature of things that the ratio should vary between the different sectors of the subject field. In sectors such as dictionary typology, information types, dictionary components and dictionary structures, the number ought to be comparatively large. Other sectors can be expected to have a less conspicuous ratio of such terms, for instance dictionary functions and dictionary use, and also the collection, selection, processing, presentation and distribution of lexicographic data. In yet other sectors, the ratio of strictly lexicographic terms is probably rather insignificant, for instance lexicographic project work and lexicographic aids and tools.
However, it may sometimes be difficult to make a clear distinction between terms belonging to linguistics and terms that are more or less restricted to the technical language of metalexicography. For instance, the reviewers of NLO failed to make this distinction when they criticized the dictionary for having classified the term equivalent under “Interlingual relations” (main class “General theory of linguistics”) instead of entering it under main class “Special theory of general bilingual dictionaries” (Braasch et al. 1999: 190). The reason for the classification in NLO is that the term equivalent is not exclusively lexicographic but is used in several other contexts as well; the reviewers may have been misled by the fact that equivalent is often used in everyday metalexicographic parlance to denote a particular type of information item in dictionaries, namely the one for which a more correct term would be indication of equivalent or equivalent indication. A parallel case commented on by the reviewers (Braasch et al. 1999: 190–191) is diasystematic marking, which was classified in NLO under “Marking; pragmatic information” (main class “General theory of lexicography”) but which, according to the reviewers, should have been classified instead under “Diasystematics” (main class “General theory of linguistics”). In this case, their criticism was justified:
Subject-field classification for metalexicography revisited
NLO had not managed to account satisfactorily for the distinction that, for the sake of clarity, should be made between marking and marking information (= labelling). The term marking should be understood to denote a quality of a lexical item (“the quality of being marked”) and, consequently, not be regarded as an exclusively lexicographic term. The term labelling, on the other hand, should be regarded as having the restricted meaning of “the action or system of providing lexical items with indications of marking (= labels) in a dictionary”.7
The difficulties just mentioned are not alleviated by the fact that there is a rather close resemblance between the structure of subclass “Information types” (see Section 2.3 above) and that of the first subclass under “General theory of linguistics”. At present, this subclass has the following structure:
21000 Linguistic items and their characteristics
  21100 Morphology
    21110 Inflectional morphology
    21120 Derivational morphology
    21130 Compositional morphology
    21140 Abbreviations and clippings
  21200 Realization
    21210 Graphical realization
    21220 Phonetic realization
  21300 Grammatical category
  21400 Meaning
    21410 Meaning components
    21420 Meaning structure
    21430 Meaning types
    21440 Meaning change
  21500 Diasystematics
    21510 Diachronics
    [---]
  21600 Syntagmatics
    21610 Word combinations in general
    21620 Constructions
    21630 Collocations
    21640 Idioms and other fixed word combinations
  21700 Paradigmatics
    21710 Content paradigmatics
    21720 Expression paradigmatics
  21800 Interlingual relationships
    21810 Types of equivalence and equivalents
    21820 Equivalence relationships
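The numerical codes in this scheme appear to encode the hierarchy positionally: each level fills one further digit from the left. Assuming that reading, and assuming that no intermediate level ever uses the digit 0, the superordinate classes of a code can be derived mechanically. The function names `parent_code` and `path` are hypothetical.

```python
from typing import List, Optional


def parent_code(code: str) -> Optional[str]:
    """Return the superordinate class code, or None for a main class such as 20000."""
    digits = code.rstrip("0")   # significant digits, e.g. "2111" for "21110"
    if len(digits) <= 1:
        return None             # main classes have a single significant digit
    # Drop the last significant digit and pad back to the original width.
    return digits[:-1].ljust(len(code), "0")


def path(code: str) -> List[str]:
    """Return the chain of codes from the main class down to the given code."""
    chain = [code]
    parent = parent_code(code)
    while parent is not None:
        chain.append(parent)
        parent = parent_code(parent)
    return chain[::-1]


print(path("21110"))
print(path("35330"))
```

Under these assumptions a merged scheme could place “21110 Inflectional morphology” and “34131 Inflectional morphology” side by side simply by comparing the paths of the two codes.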
However, those writing on metalexicography will be better served if the classification scheme allows them to find, for instance, equivalent alongside equivalent indication, phonetic transcription alongside pronunciation key, marking alongside labelling, and not under different main classes. With the production needs of this user group in mind, then, it might be useful to establish a special classification scheme where the categories belonging to “Linguistic items and their characteristics” have been incorporated into “Information types”. This would mean, for example, that “21110 Inflectional morphology” and “34131 Inflectional morphology” were merged into one class, while “21500 Diasystematics” and “34600 Diasystematic labelling” were merged into another.
As a further consequence, the entire linguistic main class can be excluded from the true subject-field classification scheme for metalexicography, which would then, after this final pruning, include the following main classes:
10000 Lexicography as a discipline
20000 General theory of lexicography
30000 Special theory of general-purpose dictionaries
40000 Special theory of specialized dictionaries
50000 Lexicographic methods and tools
4. Concluding remarks
Future work will include verifying the appropriateness of the restricted classification scheme in all its details, investigating the usefulness of the special classification scheme for those writing on metalexicography, testing out the additional numerical codes at the lowest levels and, not least, subjecting all the terms to a close scrutiny as to the justification of their presence in a true subject-field classification scheme for metalexicography.
Notes:
1. What is called term classification here is actually not a classification of terms in the normal sense of the word, but should rather be described as a systematic procedure of selecting (or creating) terms that reflect as much as possible the corresponding concept classification.
2. The list is based on the table of contents (cf. Hausmann et al. 1989–1991: XXXV–LII).
3. It should be noted that NLO, apart from definitions and explanations in Norwegian and terms in all the major Scandinavian languages, also has terms in English, German and French.
4. Not translated by me but by a native British lexicographer in cooperation with a Swedish-born professional translator working in England.
5. In what follows, the number of term records quoted does not include synonym records headed by cross-reference lemmas.
6. A residual category “69000 Others”, made up of terms that should rightly have been classified under “31900 Interlingual relations”, may be left out of account here.
7. As a matter of fact, the reviewers do not seem to have been able to make this distinction either, since they apparently equate marking and labelling, which makes their criticism only partly justified.
References
A. Dictionaries
Bergenholtz, H., Cantell, I., Fjeld, R.V., Gundersen, D., Jónsson, J.H. and Svensén, B. 1997. Nordisk leksikografisk ordbok [NLO; “Nordic Dictionary of Lexicography”]. Oslo: Universitetsforlaget.
Hartmann, R.R.K. and James, G. 1998. Dictionary of Lexicography. London/New York: Routledge.
B. Other literature
Béjoint, H. 2000. Modern Lexicography: An Introduction. Oxford: Oxford University Press.
Bergenholtz, H. and Tarp, S. (eds.) 1995. Manual of Specialised Lexicography. The Preparation of Specialised Dictionaries. Amsterdam/Philadelphia: John Benjamins.
Braasch, A., Hjorth, E. and Madsen, B.N. 1999. “[Review of] Henning Bergenholtz, Ilse Cantell, Ruth Vatvedt Fjeld, Dag Gundersen, Jón Hilmar Jónson, Bo Svensén. Nordisk leksikografisk ordbok. Oslo: Universitetsforlaget. 1997.” International Journal of Lexicography 12(2): 185–192.
Hartmann, R.R.K. 2001. Teaching and Researching Lexicography. Harlow: Longman.
Hausmann, F.J., Reichmann, O., Wiegand, H.E. and Zgusta, L. (eds.) 1989–91. Wörterbücher / Dictionaries / Dictionnaires. Volumes 1–3. Berlin/New York: Walter de Gruyter.
Jackson, H. 2002. Lexicography. An Introduction. London/New York: Routledge.
Landau, S.I. 2001. Dictionaries. The Art and Craft of Lexicography. Second edition. Cambridge: Cambridge University Press.
Sterkenburg, P. van (ed.) 2003. A Practical Guide to Lexicography. Amsterdam/Philadelphia: John Benjamins.
Svensén, B. 1993. Practical Lexicography. Principles and Methods of Dictionary-Making. Oxford/New York: Oxford University Press.
Svensén, B. 1994. “Nordisk lexikografisk ordbok – terminologisk systematik [‘Nordic Dictionary of Lexicography – terminological systematics’]”. LexicoNordica 1: 229–238.
Svensén, B. 2004. Handbok i lexikografi [“A Handbook of Lexicography”]. Stockholm: Norstedts Akademiska Förlag.
Svensén, B. 2009. A Handbook of Lexicography. The Theory and Practice of Dictionary-Making. Second edition. Cambridge: Cambridge University Press.
Systematic introductions in specialised dictionaries
Some proposals in relation to accounting dictionaries

Pedro A. Fuertes-Olivera

Bergenholtz and Tarp (1995) claim that systematic introductions adapted to lexicographical function(s) are important components in specialised dictionaries. In essence, they are independent dictionary components supplemented by cross-references to and from individual articles in the wordlist. This paper reviews previous work on systematic introductions and makes proposals that are considered for inclusion in a planned English-Spanish online Dictionary of Accounting, which will be integrated into the Dictionary of Accounting, a network of two monolingual Danish and English dictionaries and two bilingual ones. The proposals will be developed using the function theory of lexicography and, in particular, we will discuss systematic introductions in relation to selected cognitive and communicative user situations, such as specialised translation.

Keywords: specialised lexicography; subject-field components; specialized dictionaries; lexicographical functions; user needs
1. Encyclopaedic information in specialised dictionaries

Encyclopaedic information is concerned with describing factual knowledge and extra-linguistic reality. In specialised dictionaries encyclopaedic information is usually given in encyclopaedic notes in the dictionary articles, encyclopaedic labels addressed to the individual lemmata or equivalents, and independent outside matter components, referred to as systematic introductions, subject-field components, encyclopaedic sections, or subject field term systems (Bergenholtz and Nielsen 2006: 284).

Encyclopaedic notes in specialised dictionaries are regarded as key components that fulfil dictionary functions and satisfy the needs of intended users. For example, in the Encyclopedic Dictionary of Gene Technology, Kaufmann et al. (1998) claim that encyclopaedic notes are similar to definitions for non-specialists, a revolutionary assumption that illustrates one of the roles encyclopaedic notes
should play in specialised dictionaries. First, they are useful for eliminating uncertainty about a sub-field and/or subject field in cases of text production or text reception by an expert or semi-expert who happens to be unfamiliar with a specific issue (for example, a Spanish accountant unfamiliar with accounting concepts used by American companies). Second, the notes explain concepts, primarily to laypersons and semi-experts, and therefore should be addressed only to the specialised terms proper. Third, if adequately cross-referred to from other lemmata, framed articles or systematic introductions, encyclopaedic notes are appropriate lexicographical means of presenting a full introduction to the subject field (or sub-field) in question. Fourth, the contents, language, and style of encyclopaedic notes have to be adjusted to the factual and linguistic competences of the intended users and their needs. For instance, dictionaries for learners should replace the so-called encyclopaedic style (typically incomplete sentences) with complete sentences (Fuertes-Olivera and Arribas-Baño 2008). Fifth, together with their primarily cognition-oriented functions, encyclopaedic notes can also support communication-oriented functions by offering collocations and showing other linguistic characteristics of the lemma (Bergenholtz and Tarp 1995: 143–150).

Encyclopaedic labels are often special symbols or abbreviated terms used in reference works to help the user find a particular lemma, choose the correct equivalent, or indicate the association of a term with a particular subject field. In specialised dictionaries taking into account dictionary functions and users’ needs, encyclopaedic labels will assist users more in communication-oriented situations involving a foreign language than in situations involving their native language, provided that the target users have at least a working knowledge of the subject field (or sub-field) treated, particularly in culture-dependent domains.
The labels are also useful for cognition-oriented functions, although they cannot replace encyclopaedic notes or systematic introductions (Bergenholtz and Tarp 1995: 150–154). Systematic introductions are separate dictionary components, placed either in the front matter or in the back matter (usually following the user’s guide), and aim at providing help in cognition-oriented and communication-oriented user situations. In the former type of situations, the component provides an introduction to or a systematic, detailed presentation of the subject field covered by the dictionary, thereby fulfilling its cognitive function. In communication-related situations, the component supplements the encyclopaedic information offered with language information in the form of collocations, auxiliary words, and examples of standard LSP usage (Bergenholtz and Tarp 1995: 154–159 and 176–178; Bergenholtz and Nielsen 2006: 290–293). On the whole, this type of outer text helps users in a systematic way
to understand the relevant subject field. In relation to the systematic introduction of the Encyclopedic Dictionary of Gene Technology, Humbley explains that:

[i]n forty-odd closely printed pages the reader is presented with the basics of microbiology, DNA and gene technology and the ethics involved in gene technology. This section is illustrated with tables and structures of the entities described, and a couple of schemata to indicate how knowledge about the field has evolved, but there are generally few purely didactic illustrations. In spite of the presentation, which is predictably dry due to the extreme compression, it is obviously very useful to the reader to have a potted survey of the field. (Humbley 2003: 594)
2. Systematic introductions in specialised dictionaries

Systematic introductions should be adapted to the function(s) of the dictionary, because these introductions aim to cater for users who may need a general understanding, a more systematic presentation, or a comparative description of a subject field either for comprehending (for example, students preparing for an exam), producing (for instance, journalists writing articles), and/or translating texts. Although most published dictionaries pay only scant attention to lexicographical functions, usually because the process of compiling dictionaries is primarily based on economic considerations which lead to publishing dictionaries for the largest possible audience, there are some notable exceptions. The Encyclopedic Dictionary of Gene Technology (Kaufmann et al. 1998) contains a well-developed systematic introduction written by lexicographers who paid attention to several variables affecting the length, nature, scope, and structure of systematic introductions: subject-field coverage, specialised knowledge, language of the intended users, number of languages covered, and culture dependence or independence of the subject field covered by the dictionary in question.

The contents of systematic introductions are directly affected by the fact that specialised dictionaries may cover one entire subject field (single-field dictionary), several subject fields (multi-field dictionary), or one or more sub-fields (sub-field dictionary), in a very detailed (maximising dictionary) or limited (minimising dictionary) way. Consequently, systematic introductions are likely to be more relevant in either maximising or minimising single-field or sub-field dictionaries than in multi-field dictionaries, as the introductions would become too complex and voluminous in the last situation.

In terms of factual knowledge of the target users, interested laypersons and semi-experts will benefit greatly from systematic introductions.
In contrast, experts are unlikely to need them because their expected degree of knowledge will force lexicographers to produce very detailed and complex systematic introductions
if they are to be of any use. Laypersons and semi-experts may benefit from two different types of systematic introductions: semi-experts require a more detailed and systematic presentation than interested laypersons, who may benefit from a general and brief introduction to the subject field described in the dictionary.

The language of the target users affects systematic introductions in respect of the explication language used. The best solution is for the explication language to be the user’s native language if the dictionary primarily aims to help users comprehend texts, and the user’s foreign language if the dictionary supports foreign-language translation and production. In addition, the degree of language complexity in the systematic introductions must be adapted to the linguistic competences of the intended users.

The number of languages covered by the dictionary also affects systematic introductions, as spatial constraints are likely to rule out the inclusion of encyclopaedic introductions in specialised multilingual dictionaries (Bergenholtz and Tarp 1995: 177). However, monolingual and bilingual LSP dictionaries are suitable for including systematic introductions in the language(s) covered by the dictionary. This means that in bilingual dictionaries, systematic introductions should be provided in both languages, with different possibilities.
For example, Bergenholtz and Nielsen (2006) propose that a multifunctional specialised dictionary designed for laypersons and semi-experts needs two systematic introductions each written in the relevant language and style; in a bilingual dictionary covering a culture-independent subject field, the two aforementioned subject introductions will have to be converted into four; and three-times-two systematic introductions are needed for bilingual dictionaries covering culture-dependent subject fields: “one in L1 about L1, one in L1 about L2, one in L1 about differences between L1 and L2, one in L2 about L1, one in L2 about L2, and one in L2 about the differences between L1 and L2.” (Bergenholtz and Nielsen 2006: 294)

The cultural component is a key lexicographical element in dictionaries for culture-dependent subject fields such as business and economics, where the subject matter shows major or minor differences between languages and cultures. However, matters of business and economics are subject to the workings of two opposing forces: 1) globalisation and systematisation of rules promoted by international organisations; 2) the presence of cultural traditions. In the field of accounting, there are both international standards developed by, for instance, the International Accounting Standards Board (IASB) and national rules and traditions, most of which are being constantly adapted to the international requirements.
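The “three-times-two” combination quoted from Bergenholtz and Nielsen (2006: 294) earlier in this section can be enumerated mechanically. The short Python sketch below is only an illustration: the variable names are hypothetical, and it simply crosses the two explication languages with the three topics named in the quotation.

```python
from itertools import product

# Explication languages and topics as named in the quotation;
# the variable names themselves are illustrative assumptions.
explication_languages = ["L1", "L2"]
topics = ["L1", "L2", "differences between L1 and L2"]

# Every pairing of an explication language with a topic
# corresponds to one systematic introduction.
introductions = [
    f"one in {language} about {topic}"
    for language, topic in product(explication_languages, topics)
]

for line in introductions:
    print(line)
```

Two explication languages combined with three topics give the six introductions listed in the quotation for a culture-dependent subject field.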
3. Data distribution and access in systematic introductions

Lexicographers are expected not only to balance their texts in systematic introductions according to the characteristics in the user profile, both in terms of factual information and the language in which it is written, but also to adopt adequate data distribution and access structures. Bergenholtz and Nielsen present different options for the distribution of data:

1. All data are placed in the subject-field component.
2. The data in the subject-field component are all restatements of the data found in the articles.
3. The data in the subject-field component are partly restatements of the data in the articles.
4. The data in the subject-field component complement the data in the articles.
5. The data in the subject-field component are restatements of all the data in the articles as well as new and supplementary data.
6. The data in the subject-field component are partly restatements of some of the data in the articles and partly new and supplementary data.
(Bergenholtz and Nielsen 2006: 293)

They add that the best option for a multifunctional specialised dictionary for laypersons and semi-experts is option 6 for several reasons. One is that it allows lexicographers to omit very specific data about a particular term in the systematic introduction and place these data in the relevant article, where they are more easily understood by potential users (for example, a variant form of a term). Furthermore, it favours communication-oriented functions by including relevant data in the article (for example, word class, pronunciation, grammar, etc.). Option 6 also provides easy access routes through the use of explicitly and fully integrated systematic introductions.
The access routes are made up of direct cross-references from the individual dictionary article to the systematic introduction, which needs to be divided into numbered chapters or paragraphs, in which key terms have been highlighted, and where illustrations (if necessary), a separate table of contents, and a subject index have also been provided. Finally, a stand-alone section covering partly new and supplementary data will assist users in a self-study context as a genuine text in its own right (Tarp 2005) if it is organised according to a systematic subject classification that disregards the alphabetic ordering, as alphabetisation will upset the subject-field inherent logic and make it more difficult for users, especially learners, to understand and relate details, particularly when references are not explicitly given but are merely implied.
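The access structure described above can be pictured as a small data model. The Python sketch below is only an illustration under stated assumptions: the class names Section and Article, their fields, and the sample data are hypothetical and are not taken from Bergenholtz and Nielsen (2006) or from any existing dictionary software. It shows articles pointing to numbered sections of a systematic introduction, a subject index built from highlighted key terms, and a check for cross-references that lead nowhere.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """A numbered section of the systematic introduction (hypothetical model)."""
    number: str
    title: str
    text: str
    key_terms: list[str] = field(default_factory=list)  # highlighted terms


@dataclass
class Article:
    """A dictionary article with cross-references to section numbers."""
    lemma: str
    definition: str
    see_sections: list[str] = field(default_factory=list)


def build_subject_index(sections):
    """Map each highlighted key term to the numbers of the sections treating it."""
    index = {}
    for section in sections:
        for term in section.key_terms:
            index.setdefault(term, []).append(section.number)
    return index


def resolve(article, sections):
    """Follow an article's cross-references into the systematic introduction."""
    by_number = {s.number: s for s in sections}
    missing = [n for n in article.see_sections if n not in by_number]
    if missing:
        raise KeyError(f"{article.lemma}: dangling cross-reference(s) {missing}")
    return [by_number[n] for n in article.see_sections]


# Sample data invented for illustration only.
sections = [
    Section("1", "The accounting equation", "Assets = Capital + Liabilities ...",
            key_terms=["assets", "capital", "liabilities"]),
    Section("2", "Double-entry bookkeeping", "Debits and credits ...",
            key_terms=["debit", "credit", "assets"]),
]
article = Article("equation",
                  "set of mathematical rules applied to solve a problem",
                  see_sections=["1"])

subject_index = build_subject_index(sections)
targets = [s.title for s in resolve(article, sections)]
print(subject_index["assets"], targets)
```

Keeping the cross-reference as a section number rather than a page number reflects the requirement that the introduction be divided into numbered chapters or paragraphs; the dangling-reference check is an added convenience, not something the sources prescribe.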
4. Encyclopaedic information in business dictionaries

Business dictionaries are reference works that include the lexical units occurring in the field of business together with some specific terms typically used in economics and many general-language words. More recent analyses have shown that most existing business dictionaries do not inform users of their orientation in terms of functions (Andersen and Fuertes-Olivera 2009), and that they do not pay much attention to a new breed of users demanding more pedagogically-sound reference works (Fuertes-Olivera and Arribas-Baño 2008).

The above findings have at least two important lexicographical consequences. First, a proper process of lemma selection in these reference works is difficult to implement because the processes of terminologization and de-terminologization are very active in the hybrid nature of business discourse. Second, lexicographers seem to follow a long lexicographical tradition characterised by not knowing which encyclopaedic information to include and where to place it. A result of the above processes is the proliferation of monolingual multi-field and single-field or sub-field business dictionaries containing similar data although they are targeting different user groups. Consequently, the market for business dictionaries is full of reference works with insufficient encyclopaedic information.

For example, Peter Collin Publishing has a series of printed monolingual English dictionaries comprising multi-field business dictionaries (including an American business dictionary, and an English business dictionary), together with single-field and sub-field dictionaries of accounting, banking and finance, government and politics, personnel management, marketing, etc.
In addition to encyclopaedic notes and encyclopaedic labels, these dictionaries contain a back matter component called ‘Supplement’, providing an overview of selected topics thought to be relevant for potential users: international telephone codes; local times around the world; numbers; weights and measures; world currencies; examples of business letters and curriculum vitae; descriptions of a number of very basic concepts. An analysis of these supplements shows that the multi-field business dictionaries and single-field or sub-field business dictionaries published by Peter Collin Publishing all include similar topics and that these supplements only focus on a handful of relevant concepts.

This type of outside matter component has three main drawbacks. First, the individual articles contain no cross-references to the supplements. Second, the supplements are not divided into numbered chapters or paragraphs to which references may be made by indicating the number of the relevant section in the dictionary article, nor do they highlight relevant terms by using bold face. In short, the supplements contain no structural indicators that help users navigate the texts and find what they are looking for. Third, the supplements are not based on a carefully prepared subject classification and consequently the
concepts given are not treated as integrated systems of a domain but as separate elements. Moreover, none of the supplements is accompanied by a separate table of contents or references to relevant literature and therefore does not facilitate more extensive studies on the part of users.

To sum up, two conclusions can be drawn from the lexicographical practice of including supplements instead of appropriate systematic introductions. First of all, the back matter components do not really work as systematic introductions. In the second place, the considerable similarity of the various supplements shows that lexicographers have not paid due attention to users’ needs, user situations, and lexicographical functions. The two conclusions can be illustrated by analysing the different structures found in the Dictionary of Accounting (1992). Examples 1, 2, and 3 show the lemmas accounting and equation presented in the wordlist, and the concept the accounting equation described in the back matter.

◊ accounting noun work of recording money paid, received, borrowed or owed; accounting methods or accounting procedures; accounting system; accounting bases = the possible ways in which accounting concepts may be applied to financial transactions (the methods used to depreciate assets, how intangible assets or work in progress are dealt with, etc.); (….)
comment: note the various theoretical bases for accountancy, moving from the general to the specific: ‘accounting concepts’ are general: so various concepts, such as the ‘accruals concept’ may apply to depreciation; ‘accounting bases’ are more specific, so in the case of depreciation, the bases could be straight-line depreciation, reducing balance depreciation, etc.; ‘accounting policies’ are the policies applied by a company, so the company policy could be to apply straight-line depreciation in its financial statements
quote applicants will be professionally qualified and have a degree in Commerce or Accounting Australian Financial Review

Example 1: Accounting in the wordlist of the Dictionary of Accounting (1992)

◊ equation noun set of mathematical rules applied to solve a problem; the basic accounting equation is that assets equal liabilities plus capital

Example 2: Equation in the wordlist in the Dictionary of Accounting (1992)

Assets = Capital + Liabilities
Which can be restated as:
Capital = Assets – Liabilities
Or:
Liabilities = Assets – Capital
Assets are: the resources owned by the company (cash, debtors, machinery, stocks, etc.)
Liabilities and capital are: the amounts of money owed to people who have provided those resources (banks, trade creditors, shareholders, debenture holders, etc.)

Example 3: The Accounting Equation in the back matter ‘supplement’ in the Dictionary of Accounting (1992)

The analysis of the three examples and of the outside matter in the dictionary yields the following findings:

1. The Preface states that the dictionary provides a basic vocabulary of terms used in the fields of accounting, bookkeeping and general finance (i.e. it primarily targets laypersons and semi-experts), that it covers both British and American usage, that the words and phrases are defined in simple English, and that explanatory comments are given in some cases to expand the definitions, as shown in Example 1.
2. The above examples show that the dictionary supports text reception more than text production and translation. It only shows the word class of the entry word, together with a short definition, collocations, and examples in the form of quotes, although this only occurs in very few instances (for example, only 28 of the 248 entries and sub-entries under the letter a have a quote).
3. Although the encyclopaedic information given in the form of comments is interesting, its value for the intended primary users is negligible because comments are scarce (only 20 of the entries under the letter a have a comment) and not integrated: examples 1, 2, and 3 refer to the same concept, without cross-references from one to another.
4. The lack of an appropriate system of cross-references is particularly evident in the treatment of accounting where accounting equation is not mentioned, nor are users cross-referred to the supplement or to the entry equation, although the dictionary contains a lot of similar expressions (accounting methods; accounting procedures; accounting system; accounting bases; accounting concept; accounting entity; accounting period; accounting policies; accounting rate of return; accounting standards; Accounting Standard Board; accounting technician; accounting unit; cost accounting; current cost accounting; financial accounting; management accounting).
5. The concepts described in the back matter are very few in number, unrelated to the dictionary articles and do not form part of a description of the conceptual structure of the subject field. The supplement refers to a few basic concepts
(statements of accounting standard practice, the accounting equation, T accounts, modified accounts, basic double-entry bookkeeping), sometimes with examples of typical accounting texts: a profit and loss account, a balance sheet, a statement of source and application of funds, a cash flow statement, and a value added statement.

Related to the above discussion is the fact that many compilers of business dictionaries are unsure where to place the necessary conceptual information. Instead of producing systematic introductions, they prefer a system of cross-references among dictionary articles for offering encyclopaedic data. A typical example is found in the Dictionary of Business and Management (Rooney et al. 2003). Its User’s Guide (2003: vii) claims that the dictionary provides definitions of more than 6,000 international business and management terms, covering the fields of e-commerce, economics, finance, banking and accounting, human resources and personnel, marketing, operations and production, and statistics; it also adds that there is a back matter component called ‘Facts and Figures’, containing “fascinating data on the world economy”, “practical examples of key business components”, and “comprehensive listings of world currencies and stock exchanges.” On closer inspection, however, the 19-page back matter is not useful because it shows the same deficiencies as the Peter Collin dictionaries and because most of the pages include randomly selected information:

1. 12 pages are devoted to constantly-changing economic indicators for a range of countries: national income, growth, GNI per capita, growth of output, external debt, manufacturing output, new patent applications, passenger car ownership, TV ownership, billion-dollar brands, retail sales growth, list of billionaires per country. This information is useless conceptually (it changes almost on a daily basis) and communicatively, as it is only a list.
2. 4 pages are reserved for listing currencies and capitals of the world’s top business centres and stock exchanges. Again, this information has a very limited use and does not offer any hint regarding the conceptual structure of the domain.
3. 3 pages are used for explaining CVs and accompanying covering letters. Although they are acceptable illustrations they only cover a tiny part of the conceptual domain and therefore have very limited value.

As in the case of the Peter Collin collection, the Dictionary of Business and Management (2003) resorts to including detailed encyclopaedic data in the dictionary articles by means of a well-developed system of cross-references among the entries. Direct and indirect cross-references are used to link terms that are conceptually
related (Examples 4 and 5 respectively), or in need of expansion in another article (Example 6).

administration school Gen Mgt see business administration

Example 4: Use of see together with bold face and italics for direct cross-references in the Dictionary of Business and Management (2003)

activity-based management Gen Mgt a management control technique that focuses on the resource costs of organisational activities and processes, and the improvement of quality, profitability, and customer value. This technique uses activity-based costing information to identify strategies for removing resource waste from operating activities. Main tools employed include: strategic analysis, value analysis, cost analysis, life-cycle costing, and activity-based budgeting.

Example 5: Use of bold face and italics for indirect cross-references in the Dictionary of Business and Management (2003)

Adair, John Eric (b. 1934) Gen Mgt British academic. Best known for his three-circle model of leadership, which is based on overlapping circles representing the task, the team, and the individual. Adair’s model, otherwise known as action-centred leadership, is described in the book of the same name (1973). Like Warren Bennis, Adair, who has a military background, believes that leadership can be taught.

Example 6: Expanded information in the Dictionary of Business and Management (2003)

More interesting are the back matter components ‘Apéndices’ (Appendices) in the Diccionario de Economía y Finanzas [Dictionary of Economics and Finance] (Tamames and Gallego 1994), a very popular dictionary in the Spanish market since its first edition was published in 1988. In its 1994 edition, the dictionary lemmatises around 5,000 Spanish business/economics terms in its main wordlist and includes a well-developed frame structure containing different front matter and back matter components.
The former consist of a table of contents, a preface written by a famous Spanish linguist highlighting the importance of cognition-oriented dictionaries, and an introduction and a user’s guide by the authors commenting on lexicographical aspects of the dictionary. The back matter is devoted to encyclopaedic data:

1. A graphic and mathematical 154-page appendix containing very detailed verbal definitions, including mathematical formulae, of 154 terms. In the user’s guide the authors indicate that these terms are also described in the main wordlist in simpler language and without the help of mathematical language. Although the authors do not explain the reasons for the inclusion of the appendix (in the user’s guide, they only indicate that the 154 terms were included because all of them can be explained by means of graphs and figures), this lexicographical practice is in line with the tenets of the function theory as it equates the complexity of definitions with the needs of the target group (Bergenholtz and Kaufmann 1997). For unknown reasons, this appendix does not appear in the recent editions of the dictionary, even though such a component is very useful for semi-experts. For example, students can use it as a study book of 154 important concepts in the field.
2. A list of world countries specifying currencies and their symbols.
3. A list of prestigious economic journals, indicating publishing house, periodicity, and contents.
4. A list of business/economic terms which should be included in the Diccionario de la Lengua Española (Spanish Language Dictionary published by the Spanish Royal Academy).
5. A list of all the Spanish terms included in the wordlist with their English equivalents and the sub-field(s) to which they belong. In the user’s guide the authors refer to this list as a kind of “fast vocabulary” that is very useful for reception purposes as it allows easy access to terms in both Spanish and English and the discovery of related conceptual terms as they all cross-refer users to the classification system used in the next component of the dictionary.
6. A classification system based on the classification proposed by the Journal of Economic Literature, perhaps the most widely used thematic classification employed in business/economics, containing all the terms assigned to each of the 29 sub-fields covered by the dictionary.
7. An English-Spanish wordlist that is very suitable for the comprehension of English texts by Spanish-speaking users who can look up the listed terms in the main wordlist.
8. An index of proper names.
9. A list of references containing more than 100 dictionaries and handbooks.
Only one of the above components is incorporated in the 2006 edition (13th edition) of the dictionary: the classification system, which has been simplified to make it more suitable for semi-experts and laypersons. This edition introduced a new component: a list of abbreviations and acronyms, some of which are also in the main wordlist. Finally, the Diccionario de Términos Económicos, Financieros y Comerciales [Dictionary of Economics, Finance and Business Terms] (2008) contains a front matter component of twenty-odd pages called ‘introducción’ (introduction), in which the authors explain that the dictionary targets primarily Spanish students of
ESP and translation. Consequently, the encyclopaedic information offered includes encyclopaedic labelling and notes suitable for the intended users, and a 10-page academic description of the register focusing mainly on aspects of term formation (for example, many English financial terms are metaphorical) and the linguistic processes used in the dictionary for giving equivalents (calques, loan translations, translations, adaptations, etc.). In Example 7, the lexicographers use superscript, sub-field labels such as MER FINAN/PROD/DINER, BOLSA (financial market/product/money/stock market), language and etymology comments (pipo is a calque whose origin can be traced to domino or dice games), cross-references to other articles (the symbol V), and style markers (col.):

pip1 n: MER FINAN/PROD/DINER, BOLSA pipo; se aplica este término, calcado del inglés, a la fluctuación en un tipo de cambio, equivalente a 0,00001 unidades; en su origen, la palabra inglesa se refiere a cada uno de los puntos que aparecen en las fichas de dominó o de los dados; V. junior pip. [Exp: pip2 (ganar por un margen escasísimo ◊ The company was just pipped for/to the contract), pipped at the post, be col (perder en los últimos metros/por un pelo)].

(pip1 n: FINANC MAR/PROD/MON/, STOCK MARKET pipo; this term, an English calque, refers to the fluctuation in an exchange rate, equivalent to 0.00001 units; in its origin, the English word refers to each one of the points that appear on the domino tiles or on dice; V. junior pip. [Exp: pip2 (ganar por un margen escasísimo ◊ The company was just pipped for/to the contract), pipped at the post, be col (perder en los últimos metros/por un pelo)])

Example 7: The entry pip in the Diccionario de Términos Económicos, Financieros, y Comerciales (2008)

The above analysis shows that business dictionaries do not have proper systematic introductions: the encyclopaedic information given is mostly addressed to the dictionary articles, and the separate components contained in the outside matter lack a full presentation of the domain, or do not refer to the subject field but to something else. In addition, these components do not support the genuine purpose of the dictionary and do not function interactively with the wordlist to assist users in retrieving the required information from the dictionary.

The way in which lexicographical structures are employed gives lexicographers the opportunity to be innovative. For instance, we can devise systematic introductions that explain the field of accounting to users by incorporating them into the Accounting Dictionaries (Nielsen, Mourier and Bergenholtz 2003–2008), thus utilising the outer texts in a much more creative and innovative way than many published business dictionaries do. Moreover, assuming that dictionaries
Systematic introductions in specialised dictionaries
can only be used effectively if the target users can have unimpeded access to the data they need (Bergenholtz and Gouws 2007), the proposed outer text must be of the type called extended aided integrated systematic structure (extended aided integrated subject-field component in Bergenholtz and Nielsen 2006: 292–3).

5. Proposing a systematic introduction for the Accounting Dictionaries

The Accounting Dictionaries (Nielsen, Mourier and Bergenholtz 2003–2008) consist of a network of four online accounting dictionaries for learning accounting and its specialised language. The network contains two monolingual (a Danish and an English) and two bilingual (a Danish-English and an English-Danish) accounting dictionaries linked to each other so that users can easily move from one dictionary to another. Nielsen (2007) explains that the network was designed to assist users in learning L1 as well as L2 accounting terminology and usage. Each of the four accounting dictionaries is polyfunctional. For example, the English dictionary provides data for reading, understanding and producing accounting texts; and the English-Danish dictionary is designed for translating texts into and producing texts in Danish. The incorporation of a Spanish part to the dictionary network targeting mainly Spanish students in need of learning native-language as well as foreign-language accounting terminology is proposed. These students need to acquire knowledge about accounting through terminology as well as register-specific usage in Spanish and English to understand textbooks and to write factually and grammatically correct accounting texts. Those users who are enrolled in translation programmes also need to acquire knowledge that enables them to produce factually and grammatically correct translations of accounting texts.
This “requires factual and linguistic knowledge about the native-language source text, including register, and knowledge about the foreign-language register and usage of the target text.” (Nielsen 2007: 369). The factual and linguistic data are distributed throughout the dictionary, forcing lexicographers to carefully decide their arrangement and to devise a system of relations between them. In specialised dictionaries, lexicographers should provide part of the encyclopaedic information in systematic introductions in order to reduce the length of individual articles, avoid repetitions, and provide an overall, systematic overview of factual (and linguistic) aspects of the subject field. The proposed systematic introductions will have a primarily cognition-oriented function for semi-experts (Spanish students) and deal secondarily with the communication-oriented functions of text reception, text production and translation (English into Spanish and Spanish into English). Another important
Pedro A. Fuertes-Olivera
aspect is the process of standard-setting, which has led to a convergence of European and American norms and rules, and whose lexicographical consequence is that the systematic introduction envisaged will be similar to one intended for a culture-independent subject field. Although a possible solution is to prepare three-times-two systematic introductions as proposed by Bergenholtz and Nielsen (2006), this will not be necessary in the field of accounting in light of the aforementioned trend towards systematisation and harmonisation of rules. Instead, a culture-dependent presentation of the subject field of accounting in the form of a comparative systematic introduction in the different languages covered by the dictionary is called for, together with ample room for commenting on peculiarities. This solution will imply that the systematic introduction in an English-Spanish bilingual dictionary of accounting should be written in both languages, focus on the international rules, and, when necessary, refer briefly to relevant national rules and traditions explained in full in the corresponding dictionary article(s). For instance, the entries for cost value and market value must cross-refer users to the systematic introduction explaining accounting methods and procedures and to the entry swap (permuta) where lexicographers can comment on an important difference between Spanish and English accounting traditions: when recording the value of a swap, English accounting systems tend to record the market value of the swap whereas Spanish accountants are required by law to record the cost value of the swap. Moreover, there are some exceptions to this general principle and data on these should be presented in the dictionary article for the Spanish term permuta. These specific data should be given in encyclopaedic notes, reserving the systematic introduction for explaining more general concepts such as the way of recording values in balance sheets and accounting books. 
In particular, the proposed introduction to the subject field will be an extended aided fully-integrated systematic introduction (Bergenholtz and Nielsen 2006) based on the following principles:
1. It will provide a systematic overview of the entire subject field and can therefore be used as an independent textbook on accounting (Bergenholtz and Svensén 1994) written in pedagogical language.
2. It will be written in English and Spanish with information based on the conceptual framework prepared by the International Accounting Standards Board, supplemented, if necessary, by comparative information about differences among Spanish, English and American practices; only general principles will be explained in the systematic introduction whereas specific details will be explained in the dictionary articles.
3. It will contain new and supplementary data together with restatements of the data already given in the dictionary articles; specific data about a particular term will be included in the relevant article treating the term.
4. The two language versions will be presented in parallel columns, as this will facilitate comparison of terms and usage (Bergenholtz and Tarp 1995: 158–159).

Example 8 shows a draft of the initial chapter of the systematic introduction to be incorporated into the Accounting Dictionaries.

La contabilidad es la ciencia económica que analiza la realidad económico-financiera de un agente con vistas a ofrecer una información sobre el estado patrimonial pasado, presente y futuro del mismo. Consiste en anotar en los diferentes libros de contabilidad las diferentes partidas desglosando lo que el agente paga, recibe, presta, o toma prestado. El producto final de la contabilidad son las cuentas anuales, que son un conjunto de estados contables o estados financieros integradas por el balance, la cuenta de pérdidas y ganancias, la memoria explicativa, el estado de cambio en el patrimonio neto, y el estado de flujos de efectivo. En España es costumbre acompañar las cuentas anuales con el informe de gestión.
Accounting is the economic science devoted to the keeping of accounts, and preparation and audit of financial and economic statements with the aim of offering a fair and true view of the past, present, and future wealth of an organization. The practice of accounting, referred to as accountancy in the USA, consists in recording in different accounting books money paid, received, borrowed, or owed. The final product is the annual accounts or financial statements which comprises documents such as the balance (sheet), the profit and loss account, the annual report, changes in equity, cash-flow statement, and auditor’s report.
Example 8: § 1. La contabilidad / accounting

Example 8 informs users that accounting is a science (i.e. its procedures are subject to scientific practices accepted everywhere) whose records offer a fair and true view of the money paid, received, borrowed, or owed by an organization, thus helping market agents make informed decisions. Users also learn that a Spanish tradition requires the joint publication of annual accounts and management reports. In addition, users are cross-referred to specific dictionary articles where necessary data for cognition-oriented and communication-oriented functions are given.
6. Concluding remarks

As practical tools, specialised dictionaries should assist users in enhancing their knowledge of both the subject field and the related language. The preparation of well-developed systematic introductions will contribute to both tasks, especially if their design is based on users’ profiles and dictionary functions. The length, nature, scope, and structure of systematic introductions in specialised dictionaries should be adapted to variables such as the subject fields covered by the dictionary, the degree of factual knowledge target users have together with their native language competence, users’ foreign-language competence, the number of languages covered, and the nature of the specialised language recorded. The analysis of the encyclopaedic information contained in business dictionaries indicates that there is room for improving their quality by incorporating extended aided integrated systematic introductions, because most existing business dictionaries do not usually include well-thought-out outer matter components but continue to include traditional ‘supplements’ whose information is almost useless, particularly because the information is selected at random and does not target specific users. The proposed Spanish part to the Accounting Dictionaries will aim mainly at Spanish students of accounting but also at Spanish translators of economic texts. For these users we will include an extended aided systematic introduction to accounting, written in Spanish and English, which will cater for both the cognitive and communicative needs of the intended users. This introduction will offer an overview of the subject field of accounting which may be used as a textbook, and the Spanish and English texts will illustrate typical accounting language that is useful for producing texts in and translating texts into English and/or Spanish.
The English systematic introduction will be particularly relevant when companies have to present their annual financial records in English either because of legal requirements or because they want to have their shares quoted on international stock markets.

References

A. Dictionaries

[Accounting Dictionaries] Nielsen, S., Mourier, L. and Bergenholtz, H. 2003–2008. Danish Accounting Dictionary. URL: http://www.regnskabsordbogen.dk/regn/dkdk/dkregn.aspx; Danish-English Accounting Dictionary. URL: http://www.regnskabsordbogen.dk/regn/dkgb/dkgbregn.aspx; English Accounting Dictionary. URL: http://www.regnskabsordbogen.dk/regn/gbgb/gbregn.aspx; English-Danish Accounting Dictionary. URL: http://www.regnskabsordbogen.dk/regn/gbdk/gbdkregn.aspx (Accessed June 23 2008).
[American Business Dictionary] Collin, P.H., Weiland, C. and Dohn, D.S. 1991. American Business Dictionary. London: Peter Collin Publishing.
[Dictionary of Business and Management] Rooney, K. et al. 2003. Chartered Management Institute Dictionary of Business and Management. London: Bloomsbury.
[Diccionario de Economía y Finanzas] Tamales, R. and Santiago G. 1994. Diccionario de Economía y Finanzas. First edition. Madrid: Alianza Editorial.
[Diccionario de Términos Económicos, Financieros y Comerciales] Alcaraz Varó, E. and Hughes, B. 2008. Diccionario de Términos Económicos, Financieros y Comerciales. A dictionary of economic, financial and commercial terms. Inglés-Español / Spanish-English. Fifth edition. Barcelona: Ariel.
[Dictionary of Accounting] Collin, P.H. and Joliffe, A. 1992. Dictionary of Accounting. London: Peter Collin Publishing.
[Encyclopedic Dictionary of Gene Technology] Kaufmann, U., Bergenholtz, H., in cooperation with B. Stumman, S. Tarp, L. de la Rosa Marabet, N. la Serna Torres and G. la Serna Miranda. 1998. Encyclopedic Dictionary of Gene Technology. English-Spanish. Diccionario Enciclopédico de Ingeniería Genética Español-Inglés. Toronto: Lugus.
[English Business Dictionary] Collin, P.H. 1986. English Business Dictionary. London: Peter Collin Publishing.

B. Other Literature

Andersen, B. and Fuertes-Olivera, P.A. 2009, forthcoming. “The application of function theory to the classification of English monolingual dictionaries.” Lexicographica 25.
Bergenholtz, H. and Gouws, R. 2007. “The access process in dictionaries for fixed expressions.” Lexicographica 23: 236–260.
Bergenholtz, H. and Kaufmann, U. 1997. “Terminography and lexicography. A critical survey of dictionaries from a single specialised field.” Hermes 18: 91–125.
Bergenholtz, H. and Nielsen, S. 2006. “Subject-field components as integrated parts of LSP dictionaries.” Terminology 12(2): 281–303.
Bergenholtz, H. and Svensén, B. 1994. “Systematisk inledning till Nordisk lexikografisk ordbok (NLO)” [Systematic introduction to the Nordic Dictionary of Lexicography]. LexicoNordica 1: 149–185.
Bergenholtz, H. and Tarp, S. (eds). 1995. Manual of Specialised Lexicography. The Preparation of Specialised Dictionaries. Amsterdam/Philadelphia: John Benjamins.
Bergenholtz, H. and Tarp, S. 2003. “Two opposing theories: On H.E. Wiegand’s recent discovery of lexicographic functions.” Hermes 31: 171–196.
Bergenholtz, H. and Tarp, S. 2004. “The concept of dictionary usage.” Nordic Journal of English Studies 3: 23–36.
Fuertes-Olivera, P.A. and Arribas-Baño, A. 2008. Pedagogical Specialised Lexicography. The representation of meaning in English and Spanish business dictionaries [Terminology and Lexicography in Research and Practice 11]. Amsterdam/Philadelphia: John Benjamins.
Humbley, J. 2003. “Review of Kaufmann, U. and Bergenholtz et al. (1998): Encyclopedic Dictionary of Gene Technology, vol. 1, English (with Spanish equivalents), 385 p. vol. 2, Spanish (with English equivalents): Toronto: Lugus Libros, 411 p.” Meta 48(4): 593–595.
Nielsen, S. 2007. “Dictionary networking in an LSP learning context.” In Teaching and Learning LSP: Blurring Boundaries. Proceedings of the 6th International AELFE Conference, Lisbon 13–15 September, M. Kuteeva and H.F. Martins (eds.), 367–373. Lisbon: ISCAL.
Tarp, S. 2005. “The pedagogical dimension of the well-conceived specialised dictionary.” Ibérica 10: 7–22.
Part IV
Data retrieval and corpus lexicography
The role of corpora in future dictionaries
D.J. Prinsloo

The aim of this article is to reflect on current utilization of corpora and lexicographic tools built from corpora as well as on future perspectives on the development of corpora and corpus tools. The point of departure is a lexicographic perspective for Afrikaans and the Bantu languages spoken in South Africa. The crucial development steps to future corpus-based lexicography, in chronological order, are: corpus creation, corpus annotation, qualitative corpus query outputs and advanced dictionary writing systems capable of extracting relevant data from corpora and other lexicographic sources. Such advanced dictionary writing systems place the lexicographer in the position of a final editor rather than a dictionary compiler.

Keywords: corpus lexicography; lemma selection; corpus annotation; data processing; lexical resource building
1. Introduction

As a leading international lexicographer, Henning Bergenholtz appreciates and utilizes electronic corpora for his lexicographic work. The corpus is generally acknowledged as an indispensable resource for the creation of dictionaries and lexicographic tools, and is a source for a variety of query programs. The value of a corpus or ‘the corpus’ should, however, not be over- or underestimated – it cannot replace the lexicographer, nor should it be regarded as inferior to the knowledge of the lexicographer in any respect (cf. Landau 2001: 273–342 for detailed descriptions of the history of corpus development, corpus compilation and corpus use). Corpora should thus be seen as a significant resource to be used judiciously. In the words of Teubert, “by exploiting corpora, bilingual and multilingual lexicography can reach a new quality level” (1996: 241). Roberts and Montgomery provide a sound perspective:

… despite the many advantages that corpora present, they must be used with some caution. The lexicographic evidence they provide must be subjected to the sound judgement of lexicographers. And lexicographers must ensure that they do not become overwhelmed by corpus evidence! (1996: 463)
It will be argued in this article that the future power of corpora as lexicographic resources will reach far beyond mere size, and corpora must be both balanced and representative. Their value as an aid to lexicography will increasingly depend on several levels of annotation, e.g. of a morphosyntactic kind and/or bi- or multilingual alignment, and on the existence of advanced corpus query programs capable of processing vast quantities of data and of rendering user-friendly and machine-readable outputs. Currently available corpus query outputs (ranging from the most basic, such as alphabetical and frequency word and lemma lists and keywords in context, to more sophisticated outputs, such as lexicographic rulers and block systems, word sketches and automated translation equivalent detection) are expected to remain the cornerstone of corpus-based or corpus-driven lexicography. However, the contribution of the corpus of the future and of the future of the corpus depends on the extent to which comprehensive behavioural patterns of words can be compiled and presented to lexicographers in a condensed but user-friendly, machine-readable format. Brief descriptions of the corpus-query outputs mentioned above will be given, and the main obstacles in terms of the compilation of comprehensive behavioural patterns will be outlined and aligned with the work and vision of John Sinclair, Sue Atkins, Patrick Hanks and Henning Bergenholtz.

2. Corpora and corpus lexicography for Afrikaans and the nine official Bantu languages in South Africa

Corpora created for isiZulu, isiNdebele, Siswati, Xitsonga, Tshivenda, isiXhosa, Sepedi, Setswana, Sesotho and Afrikaans range from one million to ten million tokens. Mega-corpus data, however, exist for Afrikaans in the form of the Media24 archive estimated at 1,000 million tokens.
Data culled from all of these raw corpora has contributed substantially to dictionary compilation for these languages as well as to Human Language Technology (HLT) activities, such as the compilation of spelling checkers. Efforts to annotate these corpora for parts of speech are in progress, opening the way for advanced HLT products, such as grammar checkers, parsers, machine translation software and text to speech converters.

2.1 Major English corpora

Major English corpora include the American National Corpus, Bank of English, British National Corpus, BYU Corpus of American English, Oxford English Corpus, and the Scottish Corpus of Texts & Speech. These corpora contain hundreds of millions of tokens and are indispensable as databases in modern lexicography. Such corpora also constitute the foundation for future lexicographic development.

2.2 Corpus design, balance, representativeness
Apart from the issue of corpus size, corpus compilers pay particular attention to the nature, design, types and quantities of material collected and included in their corpora. This generally concerns concepts such as “balance” and “representativeness”.

A general corpus is typically designed to be balanced, by containing texts from different genres … including spoken and written …. (Kennedy 1998: 20)

For a corpus to be ‘representative’ there must be a clearly analysed and defined population to take the sample from. (Kennedy 1998: 52)
Bergenholtz and Tarp (1995: 94) assert that the requirement for exemplary composition of corpora consisting of LSP texts implies that:
1. The corpus should cover all sub-fields of the subject field in question.
2. The text types which the dictionary intends to consider should be included in relation to their presumed relevance to the intended dictionary users and situations.

Where corresponding translations for source texts exist, Bergenholtz and Tarp add that these might be incorporated as baseline material in the preparation of a bilingual LSP dictionary but warn against a number of problematic aspects that might introduce errors into the compilation of such a dictionary:

One single translator does not usually have at his disposal such a comprehensive and up-to-date collection of texts that the material may be considered sufficiently large to provide an exemplary section of the LSP(s) which the dictionary is intended to cover. (Bergenholtz and Tarp 1995: 95)
Corpus compilers have also paid much attention to corpus planning and design – the design of the Longman Lancaster English Language Corpus (Summers 1993: 201) is a good example of such efforts. The reality for most of the Bantu languages is, by contrast, such that a neatly designed collection strategy is not yet possible. Ultimately, the selection process simply involves the collection of all available texts for the specific language in order to reach corpus sizes of a few million running words.
2.3 The lemma list
One of the basic uses of corpora and corpus-query tools is the extraction and compilation of lemma lists from existing corpora. In practice, this means the generation of frequency and alphabetical lists of words from corpora and their subsequent lemmatisation. For languages such as Sepedi, Setswana, Sesotho, Tshivenda and Xitsonga which use a disjunctive orthography, the correlation between orthographic word and lemma is very close, that is, a word list generated from the corpus is roughly equal to the eventual dictionary lemma list with items smaller than words (e.g. affixes) or multiword lemmas as exceptions. For the conjunctively written languages isiZulu, isiXhosa, isiNdebele and Siswati for which traditionally a stem lemmatisation strategy is followed, a rather complex lemmatisation process has to be performed on such word lists, especially in the absence of lemmatisers for these languages. Such frequency and alphabetical word lists are largely used to determine what should be included or excluded from the dictionary – often regarded as the most basic challenge in lexicography (cf. De Schryver and Prinsloo 2000a) in respect of combating inconsistencies at the macrostructural level.

2.4 Sense distinction, extraction of examples, clusters, collocates, idioms and expressions
Keyword-in-context extraction, commonly referred to as concordance lines, can also be regarded as a basic corpus-query output of significant value as a lexicographic aid. Analysis of a digestible number of concordance lines can assist the lexicographer in terms of sense distinction (cf. De Schryver and Prinsloo 2000b) and also in pinpointing examples, clusters, collocations, idioms, etc. It is of interest that taking examples from the corpus as opposed to generating author-constructed examples became a point of dispute for many years (cf. Prinsloo and Gouws 2000).

2.5 Balancing alphabetical stretches
Proper balance in alphabetical stretches in dictionaries is a lexicographic aspect that has arguably not enjoyed sufficient attention in lexicographic studies. It is a generally accepted principle that alphabetical categories in any given language do not contain an equal number of words. In Sepedi, for example, the categories B, D, L, S, and especially M, contain the majority of words while categories such as J, Q, U, V, X, Y and Z are relatively small. Prinsloo and De Schryver (2002) describe the design of so-called lexicographic rulers that reflect the presumed balance in alphabetical stretches. Figure 1 illustrates such a lexicographic ruler devised for Setswana.
Figure 1. A lexicographic ruler for Setswana (Prinsloo 2004: 165)
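As a rough illustration of how a ruler of this kind might be derived, the sketch below computes the percentage share of each initial letter in a word list. The function name `alphabetical_ruler` and the toy word list are assumptions made for this example only; Prinsloo and De Schryver (2002) describe the actual procedure, which may select and weight the input data differently.

```python
from collections import Counter


def alphabetical_ruler(words):
    """Return {initial letter: percentage of the word list}, in alphabetical order."""
    initials = Counter(w[0].upper() for w in words if w and w[0].isalpha())
    total = sum(initials.values())
    if total == 0:
        return {}
    return {letter: 100.0 * count / total for letter, count in sorted(initials.items())}


# Hypothetical toy word list; it is not drawn from any real corpus.
toy_words = ["monna", "mmotoro", "mollo", "kgabo", "kae", "batho", "dira", "lapa"]
ruler = alphabetical_ruler(toy_words)

for letter, share in ruler.items():
    print(f"{letter}: {share:5.1f}%")
```

Comparing such percentages with the share of pages or articles actually devoted to each letter in a dictionary would show which alphabetical stretches are over- or under-treated.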
Prinsloo and De Schryver (2003) also highlight examples of glaring imbalances even in the compilation of major dictionary projects. They indicate, for example, that the alphabetical stretch K in the Woordeboek van die Afrikaanse Taal (WAT) has been grossly over-treated in the absence of a lexicographic ruler for Afrikaans. It is apparent, therefore, that balancing alphabetical stretches with the aid of a lexicographic ruler is axiomatic in the compilation of corpora to enhance the quality of future dictionaries.

2.6 Compilation process regulators
In practice, few dictionary compilation projects are ‘on schedule’ and programme managers often find it difficult to ‘pace’ the dictionary compilation process. Once again, the corpus offers an indispensable solution to compilers, in terms of a percentage-based subdivision of the Setswana Ruler referred to as a Block System in Table 1, for instance.

Table 1. A Block System for Setswana (Prinsloo 2004: 165)
3. Measuring the lexicographic impact factor

It can be argued that the inclusion (or exclusion) of each lemma in a dictionary should be justified in terms of certain criteria, e.g. frequency, cultural/curriculum term, etc. The authors of corpus-based dictionaries often defend inclusion versus omission of lemmas either in terms of frequency in the general corpus or, in the case of school dictionaries, in terms of a frequency playoff between the general corpus and so-called dedicated or domain-specific corpora containing the study material of learners (cf. De Schryver and Prinsloo 2003). Dictionaries compiled by the state-funded National Lexicography Units for Bantu languages in South Africa have a strong obligation to include cultural terms and pedagogic terminology. From the point of view of frequency, the question, especially in the case of a restricted dictionary, is to what extent the dictionary covers the existing vocabulary of the language. To rephrase this in corpus-linguistics terms: what is the percentage of tokens in the corpus covered by the dictionary? Word frequencies in any given corpus are truly Zipfian in the sense that a relatively small percentage of highly used words accounts for a large percentage of occurrences in the language.

Zipf’s law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, which occurs twice as often as the fourth most frequent word, etc. For example, in the Brown Corpus “the” is the most frequently occurring word, and all by itself accounts for nearly 7% of all word occurrences (69971 out of slightly over 1 million). True to Zipf’s Law, the second-place word “of” accounts for slightly over 3.5% of words (36411 occurrences), followed by “and” (28852). Only 135 vocabulary items are needed to account for half the Brown Corpus. (Wikipedia 2009)
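The Brown Corpus counts quoted above can be set against the idealised prediction f(r) = f(1)/r, where f(r) is the frequency of the word at rank r. The short sketch below does this for the three counts given in the quotation; the helper name `zipf_prediction` is introduced here for illustration only.

```python
def zipf_prediction(top_frequency, rank):
    """Expected frequency at a given rank under an idealised Zipf's law, f(r) = f(1) / r."""
    return top_frequency / rank


# Counts quoted in the text for the Brown Corpus, in rank order.
observed = [("the", 69971), ("of", 36411), ("and", 28852)]
top = observed[0][1]

for rank, (word, count) in enumerate(observed, start=1):
    expected = zipf_prediction(top, rank)
    print(f"{rank} {word:<4} observed={count:6d} expected={expected:9.1f} ratio={count / expected:.2f}")
```

The observed counts for “of” and “and” lie somewhat above the idealised values, which is consistent with the law being stated as an approximation.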
In COBUILD2 the most frequent 14,700 lemmas are marked by means of filled diamonds on a scale of five filled diamonds to one filled diamond in descending order (Table 2). From Table 2, it is clear that the top 1,900 lemmas represent 75% of English (tokens) and the top 14,700 an astonishing 95%.1 A Sepedi dictionary in which the top 5,000 tokens in the Pretoria Sepedi Corpus (PSC)2 have been lemmatised, covers almost 90% of the corpus or, roughly generalised, 90% of Northern Sotho in a given context. From Table 2 and Table 3 the impact factor of high frequency for English and Sepedi is clear and the implication for lexicography is quite significant.
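The coverage percentages reported for Sepedi follow from dividing the summed frequencies of the top-ranked types by the total number of tokens in the corpus. The sketch below recomputes them from the figures given in Table 3; the function name `coverage` is an assumption of this example, and in practice the summed counts would be read from a frequency list of the corpus.

```python
def coverage(summed_frequency, total_tokens):
    """Percentage of corpus tokens accounted for by a set of types."""
    return 100.0 * summed_frequency / total_tokens


TOTAL_TOKENS = 5_957_553  # corpus size as given in Table 3

# Summed frequencies of the top-ranked types, as given in Table 3.
top_n_totals = {1_000: 4_615_053, 5_000: 5_250_768, 10_000: 5_462_500}

for n, summed in top_n_totals.items():
    print(f"Top {n:>6,}: {coverage(summed, TOTAL_TOKENS):.1f}% of tokens")
```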
Table 2. Summary of frequency band values in COBUILD 2 (1995)

Number of filled diamonds   Lemmas per category   Totals   % of all written and spoken English
5                           700                   -        -
4                           1200                  -        -
(Total 5+4)                 -                     1900     75
3                           1500                  -        -
2                           3200                  -        -
1                           8100                  -        -
(Total 3+2+1)               -                     12800    20
(Total 5+4+3+2+1)           -                     14700    95
Table 3. Types versus tokens in Sepedi

Types (number of different words)   Total frequencies (sum of all counts)   Tokens (total number of words in the corpus)   % of tokens
Top 1,000                           4,615,053                               5,957,553                                      77.5
Top 5,000                           5,250,768                               5,957,553                                      88.1
Top 10,000                          5,462,500                               5,957,553                                      91.7

3.1 Frequency trajectories
Corpora reflecting language over a time frame of at least a few decades can be very useful to the lexicographer and this aspect of corpus data has not been utilised to its full potential. In the case of lexicographic labelling, for example, lexicographers rely on introspection in the allocation of chronolectic labels such as obsolete, antiquating, neologism, fashion word, etc. Corpus frequency data over an extended period, however, offers a firm(er) basis for allocating such labels. Prinsloo and Gouws (2006) study frequency trajectories of Afrikaans words in the Media24 archive and show how frequency trajectories, even for a relatively short period of two to three decades, can provide clear evidence for chronolectic labelling (Table 4).

Table 4. Frequencies per 50 million tokens 1985–2003

Word                         1985–89 /50M   1990–94 /50M   1995–99 /50M   2000–03 /50M
MISSIE ‘Mission’             108            320            527            338
BOSBERAAD ‘Bush summit’      0              205            201            119
REGSTELLENDE ‘Affirmative’   45             1386           2196           1248
Figure 2. Frequency trajectories of missie, bosberaad and regstellende
Figure 2 demonstrates that the frequency of use of missie (mission) in the late nineties was five times higher than in the late eighties but that its frequency has declined in the first few years of the new millennium to the levels of the early nineties. Similar rise-and-fall patterns are observed for bosberaad (bush summit) and regstellende (affirmative). Compare Table 5 and Figure 3, which cover the words gonswoord (buzz word) and scenario (scenario), both labelled as neologisms in Verklarende Handwoordeboek van die Afrikaanse Taal (HAT), and the words uitset (output) and bemagtig (empower), both of which have a relatively new but unlabelled sense. In summary, the compilers of this particular dictionary are able to consider labelling the specific sense of missie and bosberaad in a revised edition, e.g. as verouderend (antiquating), whereas scenario, uitset, bemagtig and gonswoord can be unlabelled.

Table 5. Fashion words and neologisms scenario, uitset, bemagtig and gonswoord

Word                     1985–89 /50M   1990–94 /50M   1995–99 /50M   2000–03 /50M
SCENARIO                 58             236            265            325
UITSET ‘output’          19             57             67             86
GONSWOORD ‘buzzword’     0              1              12             24
BEMAGTIG ‘empower’       1              44             459            632
Figure 3. Frequency trajectories of scenario, uitset, gonswoord and bemagtig
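A trajectory of this kind can be summarised numerically before any label is considered. The sketch below, using the counts from Tables 4 and 5, reports the peak period for each word and the latest count as a fraction of the peak. The function name `summarise` and the peak-to-latest ratio are assumptions of this example; they are not the procedure of Prinsloo and Gouws (2006), and any labelling decision remains with the lexicographer.

```python
PERIODS = ["1985-89", "1990-94", "1995-99", "2000-03"]

# Frequencies per 50 million tokens, copied from Tables 4 and 5.
TRAJECTORIES = {
    "missie": [108, 320, 527, 338],
    "bosberaad": [0, 205, 201, 119],
    "regstellende": [45, 1386, 2196, 1248],
    "scenario": [58, 236, 265, 325],
    "uitset": [19, 57, 67, 86],
    "gonswoord": [0, 1, 12, 24],
    "bemagtig": [1, 44, 459, 632],
}


def summarise(counts):
    """Return (peak period, latest count as a fraction of the peak count)."""
    peak = max(counts)
    peak_period = PERIODS[counts.index(peak)]
    latest_share = counts[-1] / peak if peak else 0.0
    return peak_period, latest_share


for word, counts in TRAJECTORIES.items():
    period, share = summarise(counts)
    note = "highest in the latest period" if share == 1.0 else "below its earlier peak"
    print(f"{word:<13} peak {period}  latest/peak = {share:.2f}  ({note})")
```

On these figures the words of Table 4 all peak before 2000–03, while the words of Table 5 are at their highest in the latest period.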
3.2 Homonym separation
Frequency lists culled from raw corpora or even from corpora annotated for part-of-speech do not clearly reflect frequency in the case of homonyms. The Sepedi token kae occurs 5,178 times in the PSC but no conclusion can be drawn in terms of the frequency breakdown of the interrogative kae (where) versus the adjective kae (how many). The actual respective frequency counts can, however, easily be detected in the POS-tagged PSC, because they carry different POS-tags and occur in different morpho-syntactic environments. Co-texts for kae (where) typically include a verb (V) in close proximity (e.g. in L1, the position directly preceding kae), while co-texts of kae (how many) usually have a demonstrative (CDEM) in the L1 position. In the case of kgabo (flame) versus kgabo (ape), however, POS-tags do not offer a solution, because both are tagged as a noun of class 9 (N09) and, unlike kae (where) versus kae (how many), kgabo (flame) versus kgabo (ape) typically occur with a similar co-text as far as POS is concerned. Disambiguation by means of typical collocations is in an experimental phase but preliminary results are quite impressive. The positions L1 to L5 to the left and R1 to R5 to the right are computationally scanned for the respective sets of collocates mollo (fire), swa/fiša (burn), etc., versus (di)tšhwene (baboon(s)), (di)phoofolo (wild animal(s)), etc. So, for example, from 215 occurrences of kgabo in the PSC, 138 (64.2%) could be detected as referring to a flame, 29 (13.5%) as indicating an ape and 31 (14.4%) as referring to the personal name Kgabo, with only 17 (7.9%) instances where no indications could be found within 100 characters of co-text to the left or to the right of kgabo.
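The collocate-based procedure described above might be sketched as follows. The collocate sets contain only the words mentioned in the text, the five-token window mirrors the L1–L5 and R1–R5 positions, and the function name, sense labels and sample token sequences are hypothetical. The experimental system itself is not documented here and may work differently, for instance by also recognising the personal name Kgabo.

```python
# Collocate sets limited to the words mentioned in the text; a working system
# would need far larger lists. The sense labels are chosen for this example.
COLLOCATES = {
    "flame": {"mollo", "swa", "fiša"},
    "ape": {"tšhwene", "ditšhwene", "phoofolo", "diphoofolo"},
}


def guess_sense(tokens, index, window=5):
    """Guess the sense of tokens[index] from collocates within `window` tokens.

    Returns a sense label, or None when no collocate is found or the
    evidence for two senses is tied.
    """
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    context = {t.lower() for t in left + right}
    scores = {sense: len(context & words) for sense, words in COLLOCATES.items()}
    ranked = sorted(scores.values(), reverse=True)
    if ranked[0] == 0 or (len(ranked) > 1 and ranked[0] == ranked[1]):
        return None
    return max(scores, key=scores.get)


# Made-up token sequences for illustration only; they are not corpus data.
sample_a = ["kgabo", "ya", "mollo"]
sample_b = ["ditšhwene", "le", "kgabo"]
sample_c = ["kgabo", "ya", "gagwe"]

for sample in (sample_a, sample_b, sample_c):
    print(sample, "->", guess_sense(sample, sample.index("kgabo")))
```

Returning None for ties and for contexts without collocates corresponds to the residue of instances in which no indication of the sense could be found.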
D.J. Prinsloo
4. Corpus annotation for lexicographic purposes
4.1
Part-of-speech (POS) tagging
Grammatical or part-of-speech tagging assigns a part-of-speech label to each element in the corpus, as in the Sepedi example Monna yo o tlile ka mmotoro (This man came by car). See Example 1.
Monna [N01] yo [CDEM01] o [CS01] tlile [V] ka [PAINST] mmotoro [N03] . [.]
Example 1: Sepedi example of part-of-speech tagging (N01: noun of class 1, CDEM01: concord demonstrative of class 1, CS01: concord subject of class 1, V: verb, PAINST: particle instrumental, N03: noun of class 3)
4.2
Grammatical parsing
Part-of-speech tagging is often followed by syntactic analysis, which assigns structural markers to each constituent in the corpus. On a first, non-recursive level, chunking may be performed. A second level of syntactic analysis entails the detection of recursive constituents. A fully syntactically annotated corpus (or Treebank) offers significant assistance to lexicographers when used, for example, for the detection of collocations and subcategorization information. Consider Example 2 from the University Centre for Computer Corpus Research on Language (UCREL).
[S [N Nemo NP1,, [N the_AT killer_NN1 whale_NN1 N],, [Fr [N who_PNQS N] [V ‘d_VHD grown_VVN [J too_RG big_JJ [P for_IF [N his_APP$ pool_NN1 [P on_II [N Clacton_NP1 Pier_NNL1 N] P]N]P]J]V]Fr]N],, [V has_VHZ arrived_VVN safely_RR [P at_II [N his_APP$ new_JJ home_NN1 [P in_II [N Windsor_NP1 [NP safari_NN1 park_NNL1 NP] N]P]N]P]V]. S]
Example 2: Treebank from the University Centre for Computer Corpus Research on Language (UCREL 2009)
4.3
Semantic analysis
For lexicographic purposes, semantic tagging is aimed at sense distinction as illustrated in Example 3.
PPIS1   I            Z8
VV0     like         E2+
AT1     a            Z5
JJ      particular   A4.2+
NN1     shade        O4.3
IO      of           Z5
NN1     lipstick     B4
Example 3: Sense distinction by semantic tagging
The semantic tags on the right are composed of:
1. an upper case letter indicating general discourse field;
2. a digit indicating a first subdivision of the field;
3. (optionally) a decimal point followed by a further digit to indicate a finer subdivision;
4. (optionally) one or more ‘pluses’ or ‘minuses’ to indicate a positive or negative position on a semantic scale.
For example, A4.2+ indicates a word in the category ‘general and abstract words’ (A), the subcategory ‘classification’ (A4), the sub-subcategory ‘particular and general’ (A4.2), and ‘particular’ as opposed to ‘general’ (A4.2+). Likewise, E2+ belongs to the category ‘emotional states, actions, events and processes’ (E), subcategory ‘liking and disliking’ (E2), and refers to ‘liking’ rather than ‘disliking’ (E2+). (UCREL 2009)
5. Enhanced processing of corpus data
Editors-in-chief or those in charge of production know that time spent on studying concordance lines often leads to reduced lexicographic output. The gap between corpus-query output formats and the dictionary article is as yet too wide, i.e. too
much manual processing of the provided data by lexicographers is required. The key to the reduction of such manual processing probably lies in what Atkins and Rundell (2008: 103) call the Corpus Query System (CQS) of which lexical profiling as performed by Word Sketches (cf. discussion below) is an excellent example. Future systems should even reach the level of sophistication where corpus query software suggests a draft article for a specific lemma, similar, for example, to suggestions provided in semi-automated translation. The ideal role of the lexicographer would thus be comparable to that of the pilot of a fully automated and computerised modern jetliner overseeing processes with limited manual intervention. Sinclair et al. (1996: 177) emphasize the fact that “the correlation between the environment of a word and its meaning is very high.” The Malvern Workshop held in 1994 reflects on what is probably the most significant work ever done on translation equivalence in relation to bilingual lexicography (cf. Sinclair et al. 1996). Tognini-Bonelli (1996: 200) in reference to Firth (1957) emphasizes the point that “a major part of the meaning of an item arises from its relations with its immediate co-text as well as the general context of the situation” and that “the formal features observable in the co-text of a word or phrase will inevitably be so integrated that the boundaries between text and context, item and environment will become more and more difficult to define” (Tognini-Bonelli 1996: 202). Atkins (2005, forthcoming) says that the database of a bilingual dictionary is expected to give information on parameters, such as various senses, semantic content, semantic scope, morphological properties, valency, participation in idiomatic phrases, collocates, etc. Hanks describes a strategy for detailed pattern analysis of each verb. 
Such patterns “consist of a verb with its valencies, plus semantic values for each valency and other relevant clues, and are associated with an implicature that associates the meaning with the context rather than with the word in isolation” (2006: 1165). Thus the focus is on pattern analysis of the verb: The first step is to identify, by corpus analysis, all the patterns of normal use associated with each verb. The verb is the pivot of the clause, and many nouns will fall into place in a semantic ontology once their relationship – their normal relationship to verbs – is known. (Hanks 2006: 1168)
Calzolari (1996) refers to automated techniques for analysing and extracting lexical information from textual corpora. She gives a correlation between linguistic phenomena and levels of analysis represented here in table format (Table 6). Given the state-of-the-art in our field, some of the linguistic analyses listed above can be performed automatically with good coverage and a good success rate (those at the top in the above figure [Table 6]), others allow at least semi-automatic processing, while the last ones (at the bottom) are more difficult to perform successfully either for coverage or for adequacy or both. (Calzolari 1996: 7)
Table 6. Correlations between linguistic phenomena and levels of analysis (Calzolari 1996: 6)

Levels of analysis             Correlations
1. Form layer                  1 → a, b, c, d, e and g
2. Lemma layer                 2 → b, c, d, f and h
3. Category layer              3 → h and i
4. Syntactic function layer    4 → i and j
5. Semantic layer              5 → f, g and j

Phenomena
a. Morphological idiosyncrasies (pluralia tantum etc.)
b. Compounds
c. Collocations
d. Idioms
e. Fixed phrase
f. Semantic fields
g. Lexical disambiguation
h. Subcategorization frames
i. Syntactic regularities
j. Knowledge extraction
Atkins and Grundy describe a method of ‘lexicographic profiling’ performed for over 400 lemmas “with regard to all aspects of inherent properties and corpus use” (2006: 1103). The example given for echo comprises 31 such properties, including lexical form, corpus profile, part-of-speech, syntax, semantics, morphology, etc. (Atkins and Grundy 2006: 1104–1105). What will be required according to Pérez Hernández is “a methodology for automatic pattern recognition procedures …” (1996: 218). For the compilation of bilingual dictionaries such pattern recognition procedures have to be performed on corpora, thus in a sense bringing together what is advocated by Hanks with what is produced through Word Sketches, FrameNet, WordNet, etc. Although Word Sketches, FrameNet, WordNet etc. differ in nature, they are all corpus based and the lexicographer regards them as advanced tools for pattern recognition.
6. Major initiatives of lexical resource building
With corpus sizes on the increase, lexicographers were faced with the overwhelming task of processing ‘raw’ data effectively. They soon realised that the amount of data, e.g. thousands of concordance lines generated for each lemma, was becoming impossible to study in such a ‘raw’ format. The logical next step was to design systems that could further process, interpret, sort and calculate corpus data, such as FrameNet, WordNet and Word Sketches, which hold the key to the future of the corpus. Each of these resources is described briefly below.
6.1
FrameNet
The developers of the FrameNet project perceive the lack of large-scale lexical resources as an impediment to the development of Natural Language Processing (NLP) applications: A significant roadblock to the development of practical natural language processing applications is the lack of large-scale lexical resources with the right kind and amount of information. Such resources must cover the breadth of a language’s basic vocabulary, but also provide appropriate syntactic, semantic and statistical information about individual lexical items. The FrameNet project is an attempt to create this next generation of lexical database. (FrameNet 2009)
Consider Table 7 and Figure 4, which serve to illustrate the efficacy of the FrameNet platform with its context-specific definitions.
Table 7. FrameNet entry for the lemma argue from Atkins et al. (2003: 337)

No.  frames        lemma  pos  Display  Definition
1    Evidence      argue  V    With FE  constitute evidence in support of something
2    Conversation  argue  V    With FE  exchange diverging or opposite views
3    Reasoning     argue  V    With FE  give reasons or cite evidence in support of something

FE = Frame Elements
Figure 4 shows the FrameNet entry concerning an aspect of South Africa’s defence programme.
Figure 4. Full-text annotation reports: South Africa Introduction (FrameNet 2009)
6.2
WordNet
WordNet is a large lexical database of English and may be described as a network of meaningfully related words and concepts: Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. (WordNet 2009)
Consider a selection of the data for bank in Figure 5 which shows these interrelationships.
Figure 5. A selection of the data for bank in WordNet (2009)
6.3
Word sketches
Word sketches are described by Kilgarriff et al. (2004 and http://www.sketchengine.co.uk/) as one-page automatic, corpus-based summaries of a word’s grammatical and collocational behaviour. From Figure 6 it is for instance clear that exercise frequently appears as the object of verbs such as do, undertake, complete, etc. or takes as a modifier practical, regular, physical, etc. See Atkins and Rundell (2008: 109) for a more detailed description of the basics of lexical profiling with specific reference to word sketches. In addition to the three Internet platforms outlined above, some other useful and interesting lexicographic tools have been developed. These are now briefly outlined.
Figure 6. Word sketch for exercise ukWaC freq = 224381 (Sketch Engine 2009)
6.4
Human Brain Cloud and FrameGrapher
The Human Brain Cloud is a variation on the ‘ubiquitous beetle’ mind-map strategy where the keyword is graphically illustrated in relation to associated concepts. Although intended more as a word association game, this interesting word-relation spider may be of assistance to lexicographers and has potential for future development as a lexicographic source reflecting semantic relations in an accessible, non-complicated way. Figure 7, for instance, is a snapshot of a semantic relations build-up in relation to Monday. Monday is first and essentially related to other days of the week, but also to emotive and practical responses as in ‘blue Monday’ and ‘work day’. FrameGrapher, as with WordNet, provides a framework of interrelationships. Compare FrameGrapher’s relation network for Criminal_investigation in Figure 8a and its definition in Figure 8b. The Core Frame Elements (FEs) are investigator and incident. See Atkins and Rundell (2008: 144) for a more detailed description of the basics of frames and frame elements.
Figure 7. Snapshot taken during the build-up of Human Brain Cloud Monday (Human Brain Cloud 2009)
Figure 8a. Frame relations of Criminal_investigation in FrameGrapher (FrameNet 2009)
Figure 8b. Definition of Criminal_investigation (FrameNet 2009)
6.5
Brigham Young University Corpus query tools
The BYU Corpus of American English is a text corpus of 360 million tokens. The tokens are equally divided among spoken words, fiction, popular magazines, newspapers, and academic texts and are also historically balanced – 20 million words for each year from 1990–2007. Free online searches include advanced search options, such as those for collocates, part-of-speech, comparison of related words, and the frequency and distribution of synonyms, as well as combinations of such search functions. Compare Figures 9 and 10 for beautiful, Glasnost, CD and DVD.
Figure 9. Related words, statistical and graphical information for beautiful (The Corpus of Contemporary American English 2009)
Glasnost, for example, was frequently used in the period 1990–1994 as a fashion word, but its use rapidly declined in the period 1995–2007 from 3.6 per million words to 0.1 per million words. CD enjoyed a steady increase in frequency between 1990 and 2004 but then started to decline. DVD grew rapidly in frequency, from zero in the period 1990–1994 to 19.5 per million words in the period 2005–2009, probably at the expense of CD.
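Per-million figures of this kind are obtained by normalising a raw count against the size of the sub-corpus searched. The following sketch shows the arithmetic only. The function name `per_million`, the five-year slice size and the raw counts are illustrative assumptions: the counts were chosen so that they reproduce the rates of 3.6 and 0.1 quoted above and are not actual corpus figures.

```python
def per_million(raw_count, corpus_size):
    """Return the frequency of an item per million running words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size


# Assumption: a five-year slice at 20 million words per year.
SLICE_SIZE = 5 * 20_000_000

# Hypothetical raw counts, not taken from the corpus itself.
early = per_million(360, SLICE_SIZE)  # 3.6 per million words
late = per_million(10, SLICE_SIZE)    # 0.1 per million words
print(early, late)
```

Normalising in this way is what makes periods or genres of different sizes comparable, which is the precondition for reading frequency trends off a chart such as Figure 10.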
Figure 10. BYU Corpus frequency breakdown in terms of genre and history for Glasnost, CD and DVD (The Corpus of Contemporary American English 2009)
6.6
Automatic level detector for defining vocabulary
Restricting the lexicographer to a set of defining vocabulary is common practice in many dictionary compilation projects. Macmillan English Dictionary for Advanced Learners (MED), for example, even includes its list of defining vocabulary in the back matter. Lexicographic tools that alert the lexicographer when words outside the list of defining vocabulary, i.e. words presumed to be too difficult for the target user to understand, are used in, for example, the paraphrase of meaning or in examples of usage need to be developed further. The dictionary writing system Mātāpuna has a built-in test facility ensuring that all words used in definitions come from a defining vocabulary.
6.7
Dictionary writing systems, the incorporation of lexicographic tools and sophisticated corpus queries
The absence of commercially available dictionary writing systems up to approximately the year 2000 was quite apparent and detrimental to lexicographic practice by individuals and smaller publishing houses. The past decade, however, has seen the development of dictionary writing systems such as Mātāpuna, SIL Toolbox, TshwaneLex, and IDM. In the absence of dictionary writing systems, dictionary compilers generally reverted to general-purpose tools, especially to word processors that could offer little more than ‘what-you-put-in-is-what-you-get-out’. The Mātāpuna Dictionary Writing System is a web-based, multi-user, multilingual dictionary writing system currently used by Te Taura Whiri i te Reo Māori (the Māori Language Commission) to compile its monolingual dictionary of the Māori language. According to its own description (http://www.matapuna.org/), the system “assists with many aspects of lexicography, including team collaboration, routine error and consistency checking, corpus searching, publishing, and progress monitoring in addition to the traditional headword and entry management.” TshwaneLex is a dictionary writing system for the compilation of dictionaries or terminology lists. It contains many specialised features, of which a few will be briefly outlined below. According to its developers, TshwaneLex enables: any organisation producing dictionaries to dramatically reduce dictionary compilation time, increase the quality and consistency of their dictionaries, and manage larger projects with larger teams. These features include an integrated Corpus Query System, immediate article preview, full customisability, styles system, automatic cross-reference tracking, automated lemma reversal, online and electronic dictionary modules, export to MS Word and typesetting systems (such as InDesign and XPP). (Tshwanelex 2009)
Typical features and advantages of using dictionary writing systems instead of standard word processing software include the following. Dictionaries can be compiled in a shorter space of time. Functions can be automated, such as control of numbering, font size, font type (normal, italics, boldface) and capitalization; lemma reversal; cross-reference checking and updating; and defining vocabulary utilization. Some dictionary writing systems also include corpora and advanced corpus query outputs such as word sketches, word behavioural patterns and word nets (cf. Word Sketches, WordNet, FrameNet and BYU corpus queries, already discussed). Multiple dictionaries can be generated from the same database. Dictionary writing systems can also provide a friendly, adaptable user interface. Compare, for example, Figures 11 and 12 for user-friendly and multi-functional input screens of Tshwanelex. While treating the lemma setlaela (a dumb person), the lexicographer sees the cross-reference to e.g. lešilo (fool) and also that setlaela is a reference address of its plural form ditlaela.
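Two of the automated checks listed above, cross-reference checking and defining vocabulary utilization, can be sketched on a toy database. This is a minimal illustration and does not reflect how Tshwanelex or Mātāpuna are implemented: the entry structure, the field names `definition` and `see`, the function names and the small defining vocabulary are all hypothetical. Only the lemmas setlaela, lešilo and ditlaela and their glosses come from the text.

```python
import re

# Toy database: lemma -> article fields (hypothetical structure).
ENTRIES = {
    "setlaela": {"definition": "a dumb person", "see": ["lešilo"]},
    "lešilo": {"definition": "fool", "see": []},
    "ditlaela": {"definition": "plural of setlaela", "see": ["setlaela"]},
}

# Hypothetical defining vocabulary, far smaller than any real list.
DEFINING_VOCABULARY = {"a", "dumb", "person", "fool", "plural", "of"}


def dangling_cross_references(entries):
    """List (lemma, target) pairs whose target is not a lemma in the database."""
    return [
        (lemma, target)
        for lemma, article in entries.items()
        for target in article["see"]
        if target not in entries
    ]


def incoming_references(entries, lemma):
    """List the lemmas that refer to the given lemma (its reference positions)."""
    return sorted(
        source for source, article in entries.items() if lemma in article["see"]
    )


def words_outside_defining_vocabulary(entries, vocabulary):
    """Map each lemma to the definition words not covered by the vocabulary.

    Assumption: lemmas of the dictionary itself may be used in definitions.
    """
    report = {}
    for lemma, article in entries.items():
        words = re.findall(r"\w+", article["definition"].lower())
        unknown = [w for w in words if w not in vocabulary and w not in entries]
        if unknown:
            report[lemma] = unknown
    return report
```

On this toy data `incoming_references(ENTRIES, "setlaela")` returns `["ditlaela"]`, which mirrors what the lexicographer sees on the input screen while treating setlaela, and both checking functions return empty results because every cross-reference resolves and every definition word is covered.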
Figure 11. Tshwanelex: indicating cross-references in relation to setlaela (Tshwanelex 2009)
Figure 12. Integrated corpus query tool, with auto-extracted usage example in Tshwanelex (2009)
Figure 13. LexiView screen for the word Markt (Heid et al. 2004)
In Figure 12 usage examples are automatically offered to assist the lexicographer in the treatment of absolutely. Heid et al. (2004) and Evert et al. (2004) describe tools for upgrading printed dictionaries by means of corpus-based lexical acquisition. The activities include a comparison between corpus and dictionary data. The results of the comparison are presented in LexiView, a graphical user interface (GUI) for interactive inspection. The lexicographer can see a wealth of information with e.g. inclusion versus deletion indicators.
7. Concluding remarks
The corpus of the future is expected to be a huge, but thoroughly selected and heavily annotated database constituting the basis for corpus query programs. The future of the corpus for lexicographic use revolves around the sophistication and ability of corpus query programs to extract, compile and present the corpus data to the lexicographer. The ideal situation would be one where the lexicographer could play the role of a final editor of dictionary articles prepared by dictionary writing software.
Notes
1. It is not indicated in the introduction section of COBUILD 2 how these figures were calculated.
2. The University of Pretoria Sepedi Corpus (PSC) is a collection of ca. six million running words of Northern Sotho, containing texts from different genres and domains.
References
A.
Dictionaries
[COBUILD 2] Sinclair, J. (ed.). 1995. Collins COBUILD English Dictionary. London: HarperCollins.
[HAT] Odendal, F.F. and Gouws, R.H. 2004. Verklarende Handwoordeboek van die Afrikaanse Taal. Doornfontein: Perskor.
[MED] Rundell, M. (ed.). 2007. Macmillan English Dictionary for Advanced Learners. Second Edition. Oxford: Macmillan.
[WAT] Hauptfleisch, D.C. (ed.). 1984. Woordeboek van die Afrikaanse Taal. VII. Bureau of the WAT: Stellenbosch.
B.
Other literature
American National Corpus: http://www.cs.vassar.edu/~ide/papers/anc-lrec04.pdf (Accessed 20 February 2009).
Atkins, B.T.S. 2005, forthcoming. “Me Lexicographer, You Translators or Context-free (vs. context-sensitive) translation and what it involves.” In Translation and Meaning Part 7: Proceedings of the Maastricht Session of the 2005 Maastricht-Lódz Duo Colloquium on “Translation and Meaning”, held in Maastricht, The Netherlands, 18–21 May 2005, M. Thelen and B. Lewandowska-Tomaszczyk (eds.).
Atkins, B.T.S. and Grundy, V. 2006. “Lexicographic profiling: an aid to consistency in dictionary entry design.” In Proceedings of the Twelfth EURALEX International Congress on Lexicography, University of Turin, Italy, 6–9 September 2006, E. Corino et al. (eds.), 1097–1107.
Atkins, B.T.S., Rundell, M. and Sato, H. 2003. “The contribution of Framenet to practical lexicography.” International Journal of Lexicography 16(3): 333–357.
Atkins, B.T.S. and Rundell, M. 2008. The Oxford Guide to Practical Lexicography. Oxford/New York: Oxford University Press.
Bank of English: (Accessed 20 February 2009).
Bergenholtz, H. and Tarp, S. 1995. Manual of Specialised Lexicography: The preparation of specialised dictionaries. Amsterdam/Philadelphia: John Benjamins.
Braasch, A. and Povlsen, C. (eds.). 2002. Proceedings of the Tenth EURALEX International Congress, EURALEX 2002, Copenhagen, Denmark, August 13–17, 2002. Copenhagen: Center for Sprogteknologi.
British National Corpus: (Accessed 20 February 2009).
BYU Corpus of American English: (Accessed 20 February 2009).
Calzolari, N. 1996. “Lexicon and corpus: a multi-faceted interaction.” In Euralex ‘96 Proceedings I-II, Papers submitted to the Seventh EURALEX International Congress on Lexicography in Göteborg, Sweden, M. Gellerstam, J. Järborg, S-G. Malmgren, K. Norén, L. Rogström and C.R. Papmehl (eds.), 3–16. Gothenburg: Department of Swedish, Göteborg University.
Corino, E. et al. (eds.). 2006. Proceedings of the Twelfth EURALEX International Congress on Lexicography, University of Turin, Italy, 6–9 September 2006. Alessandria, Italy: Edizioni dell’Orso.
The Corpus of Contemporary American English: (Accessed 20 February 2009).
De Schryver, G-M. and Prinsloo, D.J. 2000a. “Electronic corpora as a basis for the compilation of African-language dictionaries, Part 1: The macrostructure.” South African Journal of African Languages 20(4): 291–309.
De Schryver, G-M. and Prinsloo, D.J. 2000b. “Electronic corpora as a basis for the compilation of African-language dictionaries, Part 2: The microstructure.” South African Journal of African Languages 20(4): 310–330.
De Schryver, G-M. and Prinsloo, D.J. 2003. “Compiling a lemma-sign list for a specific target user group: The Junior Dictionary as a case in point.” Dictionaries. Journal of The Dictionary Society of North America 24: 28–58.
Evert, S., Heid, U., Säuberlich, B., Debus-Gregor, E. and Scholze-Stubenrecht, W. 2004. “Supporting corpus-based dictionary updating.” In Proceedings of the Eleventh EURALEX International Congress, EURALEX 2004, Lorient, France, July 6–10, 2004, G. Williams and S. Vessier (eds.), 255–264. Lorient: Faculté des Lettres et des Sciences Humaines, Université de Bretagne Sud.
Firth, J.R. 1957. Papers in Linguistics 1934–1951. London: Oxford University Press.
FrameNet: (Accessed 20 February 2009).
Gellerstam, M., Järborg, J., Malmgren, S.-G., Norén, K., Rogström, L. and Papmehl, C.R. (eds.). 1996. Euralex ‘96 Proceedings I-II, Papers submitted to the Seventh EURALEX International Congress on Lexicography in Göteborg, Sweden. Gothenburg: Department of Swedish, Göteborg University.
Hanks, P. 2006. “The organization of the lexicon: semantic types and lexical sets.” In Proceedings of the Twelfth EURALEX International Congress on Lexicography, University of Turin, Italy, 6–9 September 2006, E. Corino et al. (eds.), 1165–1168. Alessandria, Italy: Edizioni dell’Orso.
Heid, U., Säuberlich, B., Debus-Gregor, E. and Scholze-Stubenrecht, W. 2004. “Tools for upgrading printed dictionaries by means of corpus-based lexical acquisition.” (Accessed 20 February 2009).
Human Brain Cloud: (Accessed 20 February 2009).
IDM: (Accessed 20 February 2009).
Kennedy, G. 1998. An Introduction to Corpus Linguistics. London: Longman.
Kilgarriff, A., Rychly, P., Smrz, P. and Tugwell, D. 2004. “The Sketch Engine.” In Proceedings of the Eleventh EURALEX International Congress, EURALEX 2004, Lorient, France, July 6–10, 2004, G. Williams and S. Vessier (eds.), 105–115. Lorient: Faculté des Lettres et des Sciences Humaines, Université de Bretagne Sud.
Landau, S.I. 2001. Dictionaries: The Art and Craft of Lexicography. Second Edition. Cambridge: Cambridge University Press.
Mātāpuna: (Accessed 20 February 2009).
Media24: (Accessed 20 February 2009).
Oxford English Corpus: (Accessed 20 February 2009).
Pérez Hernández, C. 1996. “A Pilot study on translation equivalence between English and Spanish.” International Journal of Lexicography 9(3): 218–237.
Prinsloo, D.J. 2004. “Revising Matumo’s Setswana – English – Setswana Dictionary.” Lexikos 14: 158–172.
Prinsloo, D.J. and De Schryver, G.-M. 2002. “Designing a measurement instrument for the relative length of alphabetical stretches in dictionaries, with special reference to Afrikaans and English.” In Proceedings of the Tenth EURALEX International Congress, EURALEX 2002, Copenhagen, Denmark, August 13–17, 2002, A. Braasch and C. Povlsen (eds.), 483–494. Copenhagen: Center for Sprogteknologi.
Prinsloo, D.J. and De Schryver, G.-M. 2003. “Effektiewe vordering met die Woordeboek van die Afrikaanse Taal soos gemeet in terme van ’n multidimensionele Liniaal [Effective Progress with the Woordeboek van die Afrikaanse Taal as Measured in Terms of a Multidimensional Ruler].” In ’n Man wat beur: Huldigingsbundel vir Dirk van Schalkwyk, W. Botha (ed.), 106–126. Stellenbosch: Buro van die WAT.
Prinsloo, D.J. and Gouws, R.H. 2000. “The use of examples in polyfunctional dictionaries.” Lexikos 10: 138–156.
Prinsloo, D.J. and Gouws, R.H. 2006. “Fashion words in Afrikaans dictionaries: a long walk to lexicographic freedom or just a lexical fly-by-night?” In Proceedings of the Twelfth EURALEX International Congress on Lexicography, University of Turin, Italy, 6–9 September 2006, E. Corino et al. (eds.), 301–312. Alessandria, Italy: Edizioni dell’Orso.
Roberts, R.P. and Montgomery, C. 1996. “The use of corpora in bilingual lexicography.” In Euralex ‘96 Proceedings I-II, Papers submitted to the Seventh EURALEX International Congress on Lexicography in Göteborg, Sweden, M. Gellerstam, J. Järborg, S-G. Malmgren, K. Norén, L. Rogström and C.R. Papmehl (eds.), 457–464. Gothenburg: Department of Swedish, Göteborg University.
Scottish Corpus of Texts & Speech: http://www.scottishcorpus.ac.uk/ (Accessed 20 February 2009).
SIL: (Accessed 20 February 2009).
Sinclair, J., Payne, J. and Pérez Hernández, C. (eds.). 1996. “Corpus to corpus: a study of translation equivalence.” International Journal of Lexicography 9(3): 171–273.
Sketch Engine: (Accessed 20 February 2009).
Summers, D. 1993. “Longman/Lancaster English Language Corpus – criteria and design.” International Journal of Lexicography 6(3): 181–208.
Teubert, W. 1996. “Comparable or Parallel Corpora?” International Journal of Lexicography 9(3): 238–264.
Tognini-Bonelli, E. 1996. “Towards Translation Equivalence from a Corpus Linguistics Perspective.” International Journal of Lexicography 9(3): 197–217.
Tshwanelex: (Accessed 20 February 2009).
UCREL: (Accessed 20 February 2009).
Wikipedia: (Accessed 20 February 2009).
WordNet: (Accessed 20 February 2009).
Williams, G. and Vessier, S. (eds.). 2004. Proceedings of the Eleventh EURALEX International Congress, EURALEX 2004, Lorient, France, July 6–10, 2004. Lorient: Faculté des Lettres et des Sciences Humaines, Université de Bretagne Sud.
Lexicographical data in natural-language systems
Franziskus Geeb
Natural Language Systems, in the sense of interactive, text-based conversations between man and machine such as a chatbot, are widely used in business, entertainment, and research. The success of a chat robot depends substantially on its ability to lead a dialogue with a chat partner and make meaningful statements. Terminological/lexicographical data are conceivably useful as knowledge bases for this communication. The large structural depth and well-known definitions of lexicographical data offer a good basis for an automated discussion between a chatbot and a human user. This project clarifies the circumstances under which lexicographical data can be used as a knowledge base for a chatbot and evaluates the chances for success of such data.
Keywords: knowledge bases; lexicographical data; chat robots; monolingual online dictionaries; database ontologies
1. Communication with chatbots and lexicography
1.1
State of the art
Communication between man and machine (computer) is widely known and was written about as early as 1966 by Joseph Weizenbaum. The main idea was and is to write a program that enables the computer to have a meaningful conversation with its user in natural language. Example 1 reproduces a conversation between a human and one of the best-known chatbots, “ELIZA” (Weizenbaum 1966).
1. User: hi
2. Chatbot: How are you today? What would you like to discuss?
3. User: how can you help me
4. Chatbot: Don’t you believe that I can help you?
5. User: yes
6. Chatbot: I understand.
7. User: I’m so sad
8. Chatbot: Tell me more...
9. User: My hamster died
10. Chatbot: That is quite interesting.
11. ....
Example 1: NLP-addiction (2008)
Chat communication is located somewhere between written language and spoken language, and information technology does not really limit the conversation, e.g. by a limitation of words or syntax (Geeb 2007). Users are in general allowed to type anything into the input field. Thus, the main problems for the software of a chatbot are:
1. Is the input written in a language that is familiar to the chatbot?
2. Is the input a complete sentence, part of a sentence, or something else (exclamation)?
3. Is the input a correct sentence?
4. Does the chatbot know the words from the input (usually after weak/strong stemming and deleting stop words from the input)? If not, it could be a dialect, sociolect or even usual LGP words that are not part of the chatbot’s knowledge base.
5. Is the input somehow related to the chatbot’s former answers to earlier input from the user or answers from the chatbot?
6. Is the user input some kind of small talk, or is it a question to a given knowledge base of the chatbot?
7. How do I (chatbot) answer questions or user input in general that I (chatbot) do not understand?
8. How do I (chatbot) learn from user input, or should I not use the user’s input as logical and semantically relevant information?
Consider the dialogue in Example 2.
1. User: “What do you know about cars?”
2. Chatbot: “They have four wheels.”
3. User: “Cars have five wheels.”
4. Chatbot: “Are you sure that cars have five wheels?”
5. User: “Yes”
6. Chatbot: “Ok, I have learned that cars have five wheels.”
Example 2: Dialogue between user and chatbot
The wrong information from the user need not be given on purpose but could be the result of insufficient knowledge in this specific area. The effect would be the same: incorrect information in the knowledge base.
9. How do I (chatbot) handle offensive input, which tends to make up a considerable part of the user input in some environments (usually young users just playing or technically interested users who want to take the chatbot to its limit)?
In a lexicographical environment, chatbots are fairly new (Geeb 2007) but can be used for different purposes, always related to text reception of L1-texts by L1-users (Geeb 1998: 39 et seq.; Bergenholtz and Tarp 2003: 174):
1. Extending a search interface from a keyword search to a user-centred natural language search by extending the search in the lexicographical database from the known one-word search to a search of different words from the user input, applying some kind of ranking on the knowledge base (dictionary) such as term weight. This would be a knowledge-orientated lexicographical function (Bergenholtz and Tarp 2003: 176).
2. Clarifying the user’s needs and intentions in the search through checkbacks for better search results (Montero and Araki 2007; Popp et al. 2007).
3. Giving the user cross-references to information in the lexicographical database that is not part of the search results (e.g. lemma) that the user expects. This could be achieved through an elaborate cross-reference structure (Bergenholtz and Tarp 1995: 215 et seq.) or, even better, with a systematic classification (Bergenholtz and Tarp 1995: 195 et seq.), taxonomy, or ontology connected to the information in the lexicographical database.
4. The chatbot could learn from the user’s input regarding linguistic and semantic or even cross-referenced information, irrespective of whether the user can be trusted or not.
On the other hand, user participation is an important feature not only of modern lexicography but also of the current Internet trends known as Web 2.0.
5. Leading the user to relevant content that the user otherwise would not read. This makes sense if a dictionary user is interested in some information but the intention of the author and/or publisher (Geeb 1998a: 60 et seq.) is to give the user more information, hoping that this could affect the user’s knowledge. Teenagers who want to know when they are allowed to buy alcoholic drinks may look up the lemma “Alcopops” searching for semantic and/or encyclopaedic information. There, they are informed about the legal age, but the encyclopaedic information for this lemma is rather long, and the users probably do not have the patience to read it. Instead, they are led to a quiz that contains roughly the same content as the encyclopaedic information but in a more entertaining way, with interaction and immediate feedback after each question and answer.
Franziskus Geeb
Reading or knowing the encyclopaedic content of this lemma could be important to the users because they are informed, e.g., about the period of time in which alcohol remains in the body. The intention of the author and publisher of this dictionary is obviously not just to inform users about the legal aspects of alcohol but to give users a better basis for a decision about whether to drink alcohol or not.
6. Providing a better search interface to a (lexicographical) database (Hammer and Schmoecker 2004: 163 et seq.) and thus theoretically including more lexicographical functions (Bergenholtz 1997).
7. Establishing some kind of customer relationship between the website (lexicographical) user and the (lexicographical) product (Koch 2002: 145; Pelka 2003: 24). This can be achieved by providing the chatbot with a character (e.g. images) expressing emotions etc. (Thiedeke 2001: 55 et seq.).
This is just one example, but the idea of using a chatbot to lead the user to information that is not part of the user’s original intention (Geeb 1997) but may be of interest to the user can be applied to many other use cases. Last but not least, in learning environments, e.g. software for all kinds of “lifelong learning” while in active employment, this connection between a chatbot, a lexicographical database, and the concept of pushing information to the user in a certain situation is promising.
From the point of view of a chatbot programmer, the main reason for using lexicographical products (“lexikographische Nachschlagewerke” as described by Geeb (1998a: 35 et seq.) and “ordbog” as described by Bergenholtz (1994: 56 et seq.)) as a layer in the knowledge base of a chatbot is the highly structured information that can usually be found in lexicographical databases. Here an important distinction has to be made, however.
Monolingual LGP dictionaries that mainly contain linguistic information such as the German Duden (1999) are in general not that useful for a chatbot knowledge base apart from the fact that this kind of linguistic information could be used for syntactic or semantic parsing of the user input or construction of chatbot answers. Monolingual LGP dictionaries with semantic and/or encyclopaedic information (Geeb 1998; 1998a) such as Webster’s (1996) or even encyclopaedias such as Brockhaus (2006) would be useful for the chatbot if the areas to be covered by the chatbot are part of the dictionary. The same restriction applies to specialized dictionaries. If the chatbot’s conversation is to be limited to the field of the LSP dictionary and if this dictionary contains not only linguistic information but also semantic and encyclopaedic information, the dictionary would be of great value for the chatbot knowledge base. Bilingual LGP dictionaries (lexicographical products) and bilingual LSP dictionaries (lexicographical products) tend to have some semantic information in L1 or L2 related to their user function. They may be of some value for a theoretical chatbot that is able not only
to answer but also to translate input and answers (Shigeoka 2002: 269). In addition to all the problems a regular monolingual chatbot has to take into account, many other unsolved problems from machine translation and bilingual LSP dictionaries (Bergenholtz and Pedersen 1999) would apply to this chatbot. Therefore, this kind of chatbot relying on an L1-L2 or L2-L1 dictionary in one user function or another is currently only of theoretical interest. In general, monolingual dictionaries (or, more broadly, “lexicographical products”) are useful as part of a knowledge base for a chatbot if they contain semantic and encyclopaedic information as well as many cross-references or even a systematic classification, taxonomy, etc.
1.2 Lookedup4you
The project lookedup4you (Geeb and Spree 2005), which was carried out for the consumer advice centre “Verbraucherzentrale Nordrhein Westfalen” in Germany, is a dictionary for young consumers aged from about 10 to 16. Students of the University of Applied Sciences, Hamburg (Germany), wrote the programs as well as the content of the dictionary, while the main idea and the concept were developed by researchers at the same university. The project uses two main technologies to establish a lexicographically enabled chatbot: AIML, a special chatbot language, and a relational database. Both will be discussed here as technology on the one hand and as a concrete implementation at lookedup4you on the other. This will lead to several conclusions regarding the possibilities of the current system as well as concepts for future developments in this area. The dictionary at lookedup4you (2005) is a specialized monolingual dictionary in German with lemmata from different fields. All lemmata are related to the main Web site, shown in Figure 1, even though the dictionary is a system of its own. The concept of using a chatbot as a search interface for the lexicographical database aims to give better search results as well as to motivate the user group to use the well-structured information of a dictionary, almost without knowing that the chatbot is part of the dictionary interface. The dictionary has been online for several years and is used by many of the visitors to the centre’s main Web site even though it contains no more than about 270 articles. Nevertheless, all articles contain text and cross-references to other articles, content on the main Web site, and other Web sites. Short films that explain the lexicographical content of an article and quizzes that present the content in a different and almost entertaining way are additional features of some articles (Figure 1).
Figure 1. Lookedup4you main Web site (2005)
The main problem for this chatbot is to decide whether the user input is only regular chat or whether it should be treated as a query on the dictionary. The strategy used in this particular implementation is to run both kinds of conversation at the same time. This sometimes results in confusing answers, such as the dialogue reproduced in Example 3.
1. User: Was ist eigentlich Homebanking
2. Chatbot: Ich weiss nicht genau. – Der Artikel “Homebanking” passt genau zu Deinem Thema.
(1. User: What does Homebanking mean? 2. Chatbot: I don’t know exactly. – The article “Homebanking” perfectly matches your topic.)
Example 3: Dialogue between user and chatbot
The first answer, “Ich weiss nicht genau” (I don’t know exactly), comes from the small-talk part, and the second comes from the dictionary search that will be described in detail below. The reason for using this strategy is that it would be hard to decide where small talk ends and a query that can be answered by the dictionary begins. With this in mind, the obvious solution would be to connect small talk and dictionary information in a single system such as an ontology, but this concept has not been implemented in the current installation, as shown above.
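The strategy of running small talk and dictionary lookup side by side can be illustrated with a minimal Python sketch. It is not the lookedup4you implementation: the lemma table `LEMMATA`, the small-talk patterns in `SMALL_TALK`, and the function names are assumptions made for the illustration, and the dictionary lookup is reduced to a plain word match against lemmata. The sketch reproduces the kind of contradictory combined answer seen in Example 3.

```python
import re

# Hypothetical stand-ins for the dictionary and the small-talk rules.
LEMMATA = {"homebanking": "Homebanking", "alcopops": "Alcopops"}
SMALL_TALK = [
    (re.compile(r"^hallo\b", re.IGNORECASE), "Hallo!"),
    (re.compile(r"^was ist\b", re.IGNORECASE), "Ich weiss nicht genau."),
]


def small_talk(text):
    """Return the first canned answer whose pattern matches, else None."""
    for pattern, answer in SMALL_TALK:
        if pattern.search(text):
            return answer
    return None


def dictionary_hits(text):
    """Return the lemmata that occur as words in the user input."""
    words = re.findall(r"\w+", text.lower())
    return [LEMMATA[word] for word in words if word in LEMMATA]


def respond(text):
    """Run both channels on the same input and concatenate their answers."""
    parts = []
    chat = small_talk(text)
    if chat:
        parts.append(chat)
    for lemma in dictionary_hits(text):
        parts.append(f'Der Artikel "{lemma}" passt genau zu Deinem Thema.')
    return " ".join(parts) if parts else "Erzaehl mir mehr."
```

Because neither channel knows about the other, `respond("Was ist eigentlich Homebanking")` first claims ignorance and then points to the matching article, which is exactly the confusion a shared knowledge structure such as an ontology would be meant to avoid.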
2. Structured data for chatbots
2.1 ELIZA and AIML
“ELIZA is a program that makes natural language conversation with a computer possible” (Weizenbaum 1966: 36). Even though this claim has proved to be incorrect, the main idea in Weizenbaum’s program is still being used by several chatbots. The program looks for keywords and transforms them into other words that can be used in an answer. In the example of the centre’s dictionary, a dialogue could be performed in the way reproduced in Example 4.
1. User: Wer hat das Telefon erfunden (User: Who invented the telephone?)
2. Chatbot – Software:
• Finding words: {(Wer) (hat) (das) (Telefon) (erfunden)}
• Finding known parts (“keywords”): (Wer)
• Finding transformation rule: (Wer) = Ich weiss nicht wer (...) (I don’t know who (...))
Answer: Ich weiss nicht wer (hat) (das) (Telefon) (erfunden) (I don’t know who (invented) (the) (telephone))
Example 4: Dialogue between user and chatbot
In this way, a conversation would be possible in many cases just by asking back or repeating the input of the user in a different way. This is not just small talk but a way to keep the conversation with the user going even if the meaning of this conversation is doubtful. In a lexicographical context, there is no advantage in using this system apart from the fact that the keywords could be, e.g., lemmata. This concept is used by the current chatbot in lookedup4you.
“AIML (artificial intelligence markup language) is an XML-compliant language that’s easy to learn and makes it possible for you to begin customizing an Alicebot or creating one from scratch within minutes” (AIML 2008). The fact that XML is an easy-to-learn and widely used standard for representing data, together with several AIML applications in different fields (Lundqvist et al. 2006; Popp and Huber 2007; Shawar and Atwell 2004), supports this statement. AIML is the key to using a so-called Alicebot (Wallace 2005), which is a chatbot and nothing more than a piece of software parsing the directives in the AIML file(s) using pattern matching.
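The keyword transformation walked through in Example 4 can be sketched as follows. This is a minimal illustration in Python, not Weizenbaum’s program and not the lookedup4you code: the rule table `RULES` and the function `eliza_reply` are hypothetical names, and only the single rule from Example 4 is included.

```python
# Hypothetical rule table: keyword -> text that replaces the keyword.
RULES = {"wer": "Ich weiss nicht wer"}


def eliza_reply(text):
    """Apply the first matching keyword rule and echo the rest of the input."""
    # Step 1: find the words of the input.
    words = text.strip().rstrip("?.!").split()
    # Step 2: find the first known part ("keyword").
    for index, word in enumerate(words):
        replacement = RULES.get(word.lower())
        if replacement is not None:
            # Step 3: apply the transformation rule and repeat what follows.
            return " ".join([replacement] + words[index + 1:])
    return None  # no keyword found; another strategy would have to answer
```

For the input of Example 4, `eliza_reply("Wer hat das Telefon erfunden")` returns `"Ich weiss nicht wer hat das Telefon erfunden"`. The input is never analysed linguistically; the answer is produced purely by string substitution, which is why the meaning of such a conversation remains doubtful.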
The user input is not parsed in a linguistic sense but just matched against strings containing possible user input (Shawar and Atwell 2004: 408). Other well-known chatbot projects running without AIML still use the pattern matching strategy, too
(Tatai et al. 2003). The software to use AIML in a front-end chatbot can be found in many programming languages and is mainly for Web use. A typical conversation written in AIML could be formalised as shown in Example 5. KANNST DU DAS NICHT AENDERN