Lexicography in the 21st Century: In honour of Henning Bergenholtz [1 ed.] 9789027289018, 9789027223364

This is a state-of-the-art volume on lexicography at the beginning of the 21st century. It also offers proposals for fut


English Pages 356 Year 2009



Author / Uploaded
Sandro Nielsen
Sven Tarp


Lexicography in the 21st Century

Terminology and Lexicography Research and Practice (TLRP) Terminology and Lexicography Research and Practice aims to provide in-depth studies and background information pertaining to Lexicography and Terminology. General works include philosophical, historical, theoretical, computational and cognitive approaches. Other works focus on structures for purpose- and domain-specific compilation (LSP), dictionary design, and training. The series includes monographs, state-of-the-art volumes and course books in the English language.

Editors
Marie-Claude L’Homme, University of Montreal
Kyo Kageura, University of Tokyo

Consulting Editor
Juan C. Sager

Volume 12 Lexicography in the 21st Century. In honour of Henning Bergenholtz Edited by Sandro Nielsen and Sven Tarp

Lexicography in the 21st Century In honour of Henning Bergenholtz

Edited by

Sandro Nielsen and Sven Tarp
University of Aarhus

John Benjamins Publishing Company Amsterdam / Philadelphia


The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

Library of Congress Cataloging-in-Publication Data

Lexicography in the 21st century : in honour of Henning Bergenholtz / edited by Sandro Nielsen, Sven Tarp.
p. cm. (Terminology and Lexicography Research and Practice, ISSN 1388-8455 ; v. 12)
Includes bibliographical references and index.
1. Lexicography. I. Bergenholtz, Henning. II. Nielsen, Sandro. III. Tarp, Sven.
P327.L4328 2009
413'.028--dc22 2009021345
ISBN 978 90 272 2336 4 (hb; alk. paper)
ISBN 978 90 272 8901 8 (eb)

© 2009 – John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA

Prof. dr. phil et habil Henning Bergenholtz

Table of contents

Introduction: Nothing is more practical than a good theory
    Sandro Nielsen and Sven Tarp

Part I. The dictionary, dictionary structures and access routes
1. Sinuous lemma files in printed dictionaries: Access and lexicographic functions
    Rufus H. Gouws
2. Reviewing printed and electronic dictionaries: A theoretical and practical framework
    Sandro Nielsen
3. Reflections on data access in lexicographic works
    Sven Tarp
4. Hybrid text constituent structures of dictionary articles: A contribution to the expansion of the theory of textual dictionary structures
    Herbert Ernst Wiegand

Part II. Dictionary functions and users
5. On production-oriented information in Swedish monolingual defining dictionaries
    Sven-Göran Malmgren
6. Balancing the tools: The functional transformation of lexicographic tools for tourists
    Patrick Leroyer
7. Lexicography and language planning in Scandinavia and the Netherlands
    Lars S. Vikør

Part III. Subject-field classification and introductions
8. Subject-field classification for metalexicography revisited
    Bo Svensén
9. Systematic introductions in specialised dictionaries: Some proposals in relation to accounting dictionaries
    Pedro A. Fuertes-Olivera

Part IV. Data retrieval and corpus lexicography
10. The role of corpora in future dictionaries
    D.J. Prinsloo
11. Lexicographical data in natural-language systems
    Franziskus Geeb

Part V. Collocations and phraseology
12. A methodology for describing collocations in a specialised dictionary
    Marie-Claude L’Homme
13. Lexicographic description: An onomasiological approach on the basis of phraseology
    Jón Hilmar Jónsson
14. Item-specific syntagmatic relations in dictionaries
    Thomas Herbst

Henning Bergenholtz: Bibliovita
    Sandro Nielsen and Sven Tarp
Notes on contributors
Subject index

Introduction
Nothing is more practical than a good theory

Sandro Nielsen and Sven Tarp

The history of dictionaries can be traced back several thousand years. During all this time a huge number of lexicographical products have seen the light of day, but many dictionaries have been relegated to obscurity whereas few have become part of the popular idiom. Similarly, the number of great lexicographers is small compared to the totality of lexicographers since the dawn of lexicography. Until the middle of the 20th century these household names included only lexicographical practitioners, who were primarily linguists, and lexicography was widely regarded as applied linguistics. However, during the last four decades in particular the academic community has realised that the scope of lexicography is not limited to linguistics, and academics started to develop general and specific theoretical principles that could explain the nature of dictionaries and help lexicographers to develop new and improved dictionaries. Since the 1980s, one of the prominent members of the lexicographical community is Professor Henning Bergenholtz, who is both a well-known dictionary author and theoretical scholar, thereby bridging the gap between theory and practice. Henning Bergenholtz started his career as a primary and secondary school teacher and went on to study mathematics and German at the University of Aarhus in Denmark, but moved to Germany to study mathematical linguistics at Technische Universität in Berlin from where he got his master’s degree in 1973. In 1975 Henning Bergenholtz was awarded the degree of Doctor of Philosophy and in 1978 was awarded the degree of Doctor habil. from the University of Essen. He has been professor at a number of universities in Europe and Africa and in 1986 he was appointed the world’s first professor of lexicography at the Aarhus School of Business, University of Aarhus, and has since gone on to establish himself as a preeminent expert in lexicography. 
Through his work he has gained an international reputation as an expert in morphology, the use of computer readable corpora, the relationship between grammar and lexicography, and the development of the




theory of lexicographical functions for printed and electronic dictionaries. The Bibliovita shows that Henning Bergenholtz’s contributions to lexicography are fundamental and span both theoretical and practical aspects. It is difficult to give a full picture of a man with as much scope as Henning Bergenholtz, so this volume is dedicated to the field of lexicography, which has been greatly influenced by his extensive research and other activities, teaching of lexicography, dictionary reviewing and post-graduate teaching through the last four decades. He is also known as an organiser of conferences, editor and referee of lexicographical journals and most recently as director of Centre for Lexicography – Research into Needs-Adapted Information and Data Access, Aarhus School of Business, University of Aarhus. Coupled with his extensive contacts around the world, especially in Europe, Africa and Latin America, he has made his mark on the lexicographical landscape. The encouragement and maintenance of excellent scholarly values and critical inquiry is a hallmark of his career. He takes genuine pleasure in interacting with intelligent and inquiring young minds, spurring them on to greater things and this has spawned close and lasting friendships all over the world. The contributions to this volume make up only a small proportion of them. This volume aims at describing the state-of-the-art in lexicography at the beginning of the 21st century and making proposals for future theoretical and practical work in the field. Lexicography is now in a transitional period and needs to find its place in the research world. The work of Henning Bergenholtz can be seen as a bridge between lexicography as it was practiced in the 20th century and lexicography as it will be practiced in the 21st century. 
The contributors, given their various relationships with him and his work, have taken off from their own experiences and research journeys in the light of Henning Bergenholtz’s rich work, and gone in many of their own exciting directions in this volume. The contributors are all highly regarded international scholars in the field of lexicography and have published authoritative theoretical works on lexicography as well as numerous printed and electronic dictionaries of high quality. Their contributions show how the new theory of lexicographical functions can extend the forefront of the discipline by focusing on dictionary functions and how these can meet the needs of users in various types of user situations. This emphasizes the proposition that the true object of lexicography is the dictionary as a tool that can help users solve problems they encounter in communicative, cognitive and operative situations. We are greatly indebted to the contributors for taking time out of extremely busy schedules to honour Professor Henning Bergenholtz on the occasion of his 65th birthday. We are grateful for a diverse and excellent set of papers to honour


Professor Bergenholtz and are pleased that the papers also reflect the true spirit of Professor Bergenholtz’s research in and dedication to lexicography as well as his wide network and influence in the international lexicographical community.

Part I

The dictionary, dictionary structures and access routes

Sinuous lemma files in printed dictionaries
Access and lexicographic functions

Rufus H. Gouws

In general monolingual and bilingual dictionaries a horizontal ordering of lemmata by means of lemma niching and nesting has primarily been developed as macrostructural space-saving procedures. Too often it impedes optimal access to data and eschews the relevant lexicographic functions. This paper argues that neither space-saving nor linguistic motivations suffice as the only criteria for macrostructural ordering procedures. Various access problems are discussed and it is shown that the use of nesting and niching is only permissible if it contributes to satisfying the lexicographic functions. All articles in a dictionary should embody the functions of that dictionary. It is shown how different levels of functional success can be achieved by different types of lemma clustering and when clustering should be avoided.

Keywords: macrostructural procedures; niching; nesting; sinuous lemma files; lemma selection; access routes; lexicographic functions

1. Introduction

The twenty-first century sees dictionaries as products of a scientific practice with a well-developed underlying theory. The needs of dictionary users are constantly changing and these changing needs should compel lexicographers to continue developing lexicographic theory and improving the quality, relevance and efficiency of dictionaries. Like all reference sources, dictionaries are consulted because the users need to retrieve specific information. Having identified the user-profile and being familiar with the needs of the intended target users lexicographers have to include the required data in their dictionaries. The focus of lexicographers of general monolingual and bilingual dictionaries has to be on the data they need to present but it is equally important to present the data in such a way that the users have rapid and unimpeded access to the data to ensure an optimal retrieval of information. The




selection and presentation of data should be done in such a way that the envisaged function(s) of the dictionary can be satisfied in order to achieve its genuine purpose. This applies to both printed and electronic dictionaries. Both these types of dictionaries are containers of knowledge and the specific user-profile and lexicographic functions will determine the nature and extent of the data to be included. The approach to dictionary structures that applies to printed dictionaries needs a serious re-planning when devising electronic dictionaries. Real differences do exist. As an example, the notion of a macrostructure only applies to printed dictionaries. The users of electronic dictionaries, and here the term electronic dictionaries excludes the CD-ROM version of printed dictionaries where users see the same page layout on their computer screens that they see in the printed product, are not confronted with an ordered set of lexical items included in the dictionary as treatment units. Access to a given lemma sign takes the user to that lemma sign and its article but not within the alphabetical context seen in a printed dictionary but rather as an independent article, isolated from other articles and not part of a typical article stretch. Given these imminent changes to be formulated with regard to the structures of electronic dictionaries and looking at the future of dictionaries as utility tools one can assume that the notion of macrostructure will increasingly become less relevant. Many aspects relevant to the macrostructure of printed dictionaries are no longer relevant or even interesting in the domain of electronic dictionaries. One such aspect is the use of sinuous lemma files. Electronic dictionaries do not have to embark on a space-saving procedure of this kind. However, the immediate future of lexicographic tools still sees printed dictionaries as an important and persisting role player. 
Although research in theoretical lexicography should be directed at the future and the tools of the future, it should not eschew the importance of tools that will still be in use for, at least, the immediate future. This demands that metalexicographic research, besides its focus on electronic dictionaries, should still also be directed at printed dictionaries – including issues regarding the macrostructure of printed dictionaries. Metalexicographic development should have all aspects of dictionaries in its scope and innovative developments still need to be introduced with regard to printed dictionaries. Theoretical lexicographers should not try to escape this responsibility. This paper focuses on one aspect of printed dictionaries, i.e. the use of a sinuous lemma file. Although this macrostructural issue is of no importance to electronic dictionaries the procedure of horizontal ordering still prevails in printed dictionaries and still needs to be the target of ongoing metalexicographic research. Looking at lexicography in the twenty-first century implies that existing procedures should be re-evaluated to see whether they still contribute to the best possible way of ensuring the success of the genuine purpose of a given dictionary. The

horizontal macrostructural ordering is a candidate for such an evaluation. Horizontal ordering can be found in different types of dictionaries and represents a well-established macrostructural procedure. This paper focuses primarily on general monolingual and bilingual dictionaries, with the main emphasis on aspects regarding access and lexicographic functions.

2. Macrostructural procedures: A brief overview

2.1 Different procedures

The macrostructural items in general dictionaries show different ordering procedures. Two prominent methods of arrangement are the vertical and the horizontal methods, also referred to as straight alphabetical ordering and the use of a sinuous lemma file. The vertical arrangement is the default method and all dictionaries, also those that employ a horizontal arrangement, primarily display a vertical arrangement. Dictionaries that have a sinuous lemma file will always have this horizontal ordering complementing the vertical arrangement as primary macrostructural ordering system. The occurrence of a horizontal ordering always presupposes the existence of a more comprehensive vertical ordering. The notion of a sinuous lemma file implies a deviation from the default ordering. Horizontally ordered articles are niched or nested within the overall inclusive vertical presentation. A horizontal ordering must always be regarded as a motivated deviation from the default and never as a default ordering system in its own right. The nature of and motivation for this deviation must not be taken for granted but needs to result from a critical evaluation of its contribution within the domain of the envisaged functions of the given dictionary. Two types of horizontal ordering can be distinguished, i.e. niching and nesting.

2.2 Niching

Niching is characterised by a strict adherence to the alphabetical principle. This applies within the niche but also in the relation between the first lemma in the niche and the immediately preceding vertically-ordered lemma, and the last lemma in the niche and the immediately following vertically-ordered lemma respectively. Although a niche typically contains words with the same first stem and these words are often semantically related, this is no prerequisite for this type of sinuous lemma file because some dictionaries also display niches where no semantic relation exists and even where the lemmata do not have the same lexical item as their





first stem. Example 1 shows a niche attached to the article of the preceding lemma sign, here the lemma sign limf (lymph).

    limf, lymph; ~a´ties, (-e), lymphatic; ~klier, lymph gland; ~sel, (-le), lymphocyte; ~stelsel, lymphatic system; ~vat, lymphatic (vessel); ~vog, chyle; ~weefsel, lymphoid tissue.

    Example 1: Excerpt from Groot woordeboek/Major dictionary (1997)

The application of niching as a type of horizontal ordering is dominated by its adherence to the alphabetical ordering, even if it means that niched words are removed from their semantic environment, as can be seen in Example 2 where the niche has been attached to the article of the lemma sign date2.

    date1, (n) dadel (vrug).
    date2, (n) datum, dagtekening; afspraak; jaartal; tydperk; BEAR the ~, gedateer wees; BRING up to ~, (boeke) bywerk; BLIND ~, onbeplande afspraak, lukraak afspraak, toe-oë-afspraak, molafspraak; ~ of EXPIRY, vervaldatum; FIX the ~, jaar en dag bepaal; ~ of ISSUE, datum van uitgifte; KEEP up to ~, byhou; ~ of MATURITY, vervaldag; OUT of ~, verouder; ouderwets, uit die mode; UP to ~, nuwerwets, byderwets, modern; by (tot op) datum; op hoogte van die tyd; in die mode; (v) dateer, dagteken; reken; die datum vasstel; afspraak maak (met ’n meisie); ~ from, dagteken van; reken van; met ingang van; ~d, gedateer, verouderd; be ~d, genooi word (deur ’n man), geskiet word (stud.); ~less, ongedateer, sonder datum; ~ line, datumgrens; datering;… ~ loaf, dadelbrood; ~ mark, jaarmerk, datummerk; datumstempel; ~ oil, dadelolie; ~ palm, dadelpalm; ~ plum, dadelpruim; ~ press, stempelmasjien, datummasjien; ~r, stempelmasjien; ~ stamp, datumstempel; ~ stone, dadelpit; ~ wine, dadelwyn.

    Example 2: Excerpt from Groot woordeboek/Major dictionary (1997)

English has two homonyms date, both presented as guiding elements of separate articles in this dictionary. Horizontal ordering in this dictionary is dominated by the alphabetical ordering, resulting in niching as the prevailing type of sinuous lemma file. 
Although both lexical items date occur as first stem in compounds or derivations selected as niched lemmata, the compounds semantically related to the first homonym date are not attached as a niche to the article of this lemma. All the complex forms with date as first stem are combined in a single niche, maintaining the alphabetical order of the article stretch, attached to the article of the second lemma date. In this niche the lexical items date loaf, date oil, date palm, date plum, date stone and date wine have the lexical item date, given as first member of the homonym pair, as their first component. They bear no semantic relation

to the lexical item date represented by the lemma sign date2 or to any of the other lemmata in the niche attached to the article of this lemma sign. In its horizontal ordering this dictionary consistently works with a system of niching and strictly adheres to the alphabetical ordering.

Niching typically but not exclusively sees compounds and derivations attached to the article of the vertically ordered lemma sign formally resembling the first stem or root of the specific item. The Chambers 20th Century Dictionary (1983) follows the most typical approach by applying it only to derivations and compounds. In the preface it is stated that this is a space-saving procedure. In the 1996 edition of the Duden Deutsches Universalwörterbuch the alphabetical ordering results in lemmata representing simplexes sometimes being attached to the articles of lemmata representing complex forms. The word Lehnbildung is entered as a vertically ordered lemma sign with words like Lehnbildung, Lehne and lehnen attached as niched lemmata. The fifth edition (2003) of this dictionary no longer employs procedures of horizontal ordering but only uses a straight alphabetical ordering.

Monolingual dictionaries often include a brief treatment of the niched lemmata. In the Duden Deutsches Universalwörterbuch (2003) a niche is attached to the article of the lemma sign Pudding and this niche contains the lemmata Puddingabitur, Puddingform and Puddingpulver. The article of each of these niched lemmata contains a stress marker, syllable division, grammatical data and a brief paraphrase of meaning.

2.3 Nesting

A distinction can be made between first and second level nesting (cf. Gouws and Prinsloo 2005). Like niching first level nesting also adheres to a strict alphabetical ordering on nest-internal level. An alphabetical deviation prevails between the last lemma in the nest and the first subsequent vertically ordered lemma, cf. the stretch presented in Example 3.

    ba´ba (-s; -tjie) 1 Baie jong kindjie; suigeling: ’n Baba vang (geselst.), behulpsaam wees met die geboorte van ’n kind. UITDR.: Die baba met die badwater uitgooi – sien onder BADWATER. 2 Die jongste: Die baba van die span. 3 (fig.) Iets wat deur iemand aan die gang gesit is en dus deur hom (vertroetel word): Die skema is sy baba.
    baba: ~bed, ~dogtertjie, ~handjie, ~kleertjies, ~seuntjie, ~taal, ~tehuis, ~uitset.
    ba´ba·lief Liewe baba – as liefkosende naam.
    ba´ba·sit·ter Iemand wat (teen betaling) kinders oppas wanneer die ouers nie tuis is nie.

    Example 3: Excerpt from HAT (2005)

The nest attached to the article of the lemma sign baba internally displays a strict alphabetical ordering. However, the two vertically ordered lemmata babalief and babasitter, immediately following the nest of horizontally ordered lemmata do not follow the last lemma of the nest alphabetically. This results in first level nesting.

It may be argued that a cluster like the one in the previous example, and this also applies to some examples of niching, does not represent a partial article stretch but rather a listing of microstructural entries presenting word formation products containing the lexical item represented by the lemma sign. This argument is countered by dictionaries like HAT and the comprehensive multivolume monolingual Afrikaans dictionary Woordeboek van die Afrikaanse Taal (1951-) where the users’ guidelines specifically refer to these entries as lemmata. Hausmann and Wiegand (1989: 336) refer to this as a lemma clustering and thereby acknowledge the macrostructural status of these entries. This claim is substantiated by examples like the following where the user can retrieve additional information, i.e. syllable stress, along with the presentation of orthographic data, from the articles of the horizontally ordered lemmata minnegodin and modellering (Example 4).

    min´negod, -e. God van liefde: Amor, Kupido, Eros; minnegodin´.
    modelleer´, ge-. 1. Voorstel, boetseer, vorm (in klei, was, ens.). 2. Klere dra om dit bekend te stel; modelleerder;.le´ring.

    Example 4: Excerpt from Verklarende Afrikaanse woordeboek (1993)

In dictionaries like the already mentioned Chambers 20th Century Dictionary and the Duden Deutsches Universalwörterbuch a less restricted treatment is offered, including a brief paraphrase of meaning. This can also be seen in examples from different dictionaries discussed in Wiegand (1989: 387–392). 
The focus in this paper will not be on the potential debate amongst lexicographers whether examples like those from HAT and Verklarende Afrikaanse Woordeboek qualify as lemmata or not. In this paper, as in Hausmann and Wiegand (1989) they are regarded as lemmata participating in niching and nesting procedures by means of lemma clustering.

Second level nesting prevails where the cluster-internal arrangement does not adhere to a strict alphabetical ordering, as seen in the partial article stretch reproduced in Example 5.

    broei (ge-) ww. 1. op eiers sit en hulle warm hou om hulle te laat uitkom. 2. voortkom, ontspruit. Daaruit sal onheil -. 3. peins, planne maak. Oor iets -. 4. ontwikkel, in wording wees. Daar is iets aan die – 5. hitte ontwikkel, warm word. Die mis, lug – 6. warm word deur gisting. Die hooi -. 7. deur spesiale verwarming vroeër laat bloei of ryp word. 8. warm kry. In die son sit en -. 9. kleintjies voortbring. Die jakkalse – in die lente. ‘broeiery, broeiing; broeieend, -eiers, -gans, -hen, -hok, -kamer, -kolonie, -paar, – proses, -sak, -tent (by 1); -mis (by 5); -aarde, -bed (by 6); -bak, -glas, -huise (by 7).
    ‘broeiend (-e; -er, -ste) b.nw. 1. drukkend, …

    Example 5: Excerpt from Nasionale woordeboek (1987)

Gouws and Prinsloo (2005) indicate that the nest in this example not only interrupts the alphabetical ordering with regard to the subsequent vertically ordered lemma (broeiend) but the text block internal ordering also deviates from a strict alphabetical ordering. Morpho-semantic motivations determine the second level presentation in this example. The first two nested lemmata (broeiery and broeiing) are derivations, presented in their own partial article stretch within the nest, whereas all the other lemmata are compounds. The nest-internal partial article stretch of compounds displays its own subdivisions with grouping in terms of the different polysemous senses of the first stem of the compounds, as indicated in the treatment of the article of the vertically ordered lemma broei (to breed) to which the nest has been attached. The very limited treatment allocated to these nested lemmata can help users to know which sense of the polysemous lexical item broei is activated in a given compound.

3. Motivations for the use of a sinuous lemma file

3.1 Space-saving endeavours

Deviations from a straight alphabetical ordering have traditionally been motivated on space-saving grounds. The use of niched and nested lemmata has become a frequent procedure in modern-day lexicography. Although this procedure has been discussed in various research papers (cf. Wolski 1989; Wiegand 1989; 2002), these discussions have primarily been of a contemplative nature. The focus has been on the ways in which existing dictionaries present a horizontal macrostructural ordering. Development in printed dictionaries for the twenty-first century demands a transformative approach to a topic like this in order to ascertain whether horizontal ordering is still a valid lexicographic procedure and to determine where possible improvements can be made in the event of a continued use of this form of macrostructural arrangement in printed dictionaries.
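The distinction drawn in section 2 between niching, first level nesting and second level nesting rests on two alphabetical checks: the ordering inside the cluster, and the ordering at the boundary with the surrounding vertically ordered lemmata. The following Python sketch illustrates these checks. It is not a procedure taken from the literature cited here. The names `sort_key` and `classify_cluster` are hypothetical, and the letter-by-letter comparison that ignores spaces, hyphens and stress marks is an assumption that happens to fit the clusters quoted in Examples 1, 3 and 5.

```python
from typing import List, Optional


def sort_key(lemma: str) -> str:
    """Letter-by-letter key: lower-case letters only (assumed ordering rule).

    Spaces, hyphens, stress marks and syllable dots are dropped. Letters
    with diacritics are compared by code point, which is a simplification.
    """
    return "".join(ch for ch in lemma.casefold() if ch.isalpha())


def classify_cluster(head: str, cluster: List[str],
                     next_vertical: Optional[str] = None) -> str:
    """Classify a horizontally ordered cluster attached to the article of `head`.

    `next_vertical` is the first vertically ordered lemma after the cluster.
    If it is None, the boundary with the following lemma is not checked.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one lemma")
    keys = [sort_key(lemma) for lemma in cluster]
    # Check 1: strict alphabetical ordering inside the cluster.
    if not all(a < b for a, b in zip(keys, keys[1:])):
        return "second level nesting"
    # Check 2: alphabetical fit with the surrounding vertical lemmata.
    follows_head = sort_key(head) < keys[0]
    precedes_next = next_vertical is None or keys[-1] < sort_key(next_vertical)
    if follows_head and precedes_next:
        return "niching"
    return "first level nesting"


# Example 3: the nest under baba, followed by the vertical lemma babalief.
baba_nest = ["bababed", "babadogtertjie", "babahandjie", "babakleertjies",
             "babaseuntjie", "babataal", "babatehuis", "babauitset"]
print(classify_cluster("baba", baba_nest, "ba´ba·lief"))  # first level nesting

# Example 5: the first three nested lemmata under broei.
print(classify_cluster("broei", ["broeiery", "broeiing", "broeieend"],
                       "broeiend"))                        # second level nesting

# Example 1: the niche under limf (the following lemma is not quoted).
limf_niche = ["limfaties", "limfklier", "limfsel", "limfstelsel",
              "limfvat", "limfvog", "limfweefsel"]
print(classify_cluster("limf", limf_niche))                # niching
```

A check of this kind only describes the formal ordering of a cluster. It says nothing about whether the clustering serves the functions of the dictionary, which is the question the remainder of this paper addresses.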






The space-saving motivation for horizontal ordering does not only influence the direction of the ordering of lemmata but niched and nested lemmata often receive a much more restricted treatment compared to vertically ordered lemmata. The space-saving drive often also targets the form of the lemma sign, resulting in nested and niched lemmata being subjected to textual condensation. These procedures of textual condensation see horizontally-ordered lemma signs being formally reduced to a place-keeping symbol and a lemma part, as in the condensed lemma sign ~ loaf, representing the lexical item date loaf in the niche attached to the article of the lemma date2 in Groot woordeboek/Major dictionary, as seen in Example 2.

3.2 The selection of lemmata

When employing a sinuous lemma file lexicographers need to decide which lemmata to allocate to the niches or nests and what the selection criteria should be. Although space-saving has been a primary motivation for the use of a sinuous lemma file it cannot be a determining criterion in the selection of the lexical items to participate in procedures of horizontal ordering, because it has nothing to do with the functions of a dictionary. Lemmata included in niches and nests usually represent complex lexical items like compounds and derivations. Compounds and derivations are not exclusively accommodated in nests and niches. The selection and presentation of data in any dictionary should be determined by the functions of the specific dictionary. This also applies to the selection of lemmata for inclusion in the sinuous lemma files. When taking a transformative approach to the planning and compilation of dictionaries and the formulation of lexicographic theory, aspects that fall outside the scope of the functions of the dictionary should not be taken into account. Because the formulation of these functions results from the identification of the user-profile of the dictionary, the macrostructural selection and presentation should necessarily also respond to the needs and reference skills of the intended target user group. The selection of candidates for nested and niched lemmata is often done on purely linguistic grounds, especially lexicological and semantic grounds. Whether linguistic motivations like these could and should still be employed in the macrostructural selection needs to be evaluated, and it may be opposed in a function-based approach. Monolingual dictionaries often display a selection of complex lexical items in a nest or a niche with few or no microstructural entries addressed at these lemmata. 
The macrostructural presentation of these lemmata bears witness to their existence in the lexicon of the language and users get guidance regarding their spelling and certain aspects of the word formation process, e.g. whether a given lexical item requires a linking phoneme when it occurs as a stem in a complex

Sinuous lemma files in printed dictionaries

lexical item. In Afrikaans the occurrence of the lexical item universiteit (university) as first stem in a compound demands the linking phoneme -s-, except when the second stem starts with an s. On morphological grounds the two compounds universiteitsgebou (university building) and universiteitstudent (university student) could then qualify for selection as candidates for lemmata in a niche or nest attached to the article of the lemma sign universiteit. A typical criterion used for the selection of these niched or nested lemmata in monolingual dictionaries is that of semantic transparency. Even though many monolingual dictionaries focus on explaining the meaning of the lexical items they treat, a restricted treatment, and therefore a place in the horizontal ordering, is allocated to complex lexical items whose meaning can presumably be deduced from the meaning of their components. This is a very subjective approach that can hardly be approved from a functional perspective. With regard to these so-called self-explanatory complex items Philip Gove, editor of the Webster’s Third New International Dictionary (1961), remarked that the self in self-explanatory should refer to the intended user, not to the lexicographer. Too often this is not the case, and users who consult the dictionary for a given word because they are not familiar with its meaning have to make their own deductions regarding the meaning of the word. Purely linguistic motivations are used to select the candidates for niche or nest inclusion, and the user has no guidance as to when a specific linguistic criterion will apply.
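The linking-phoneme behaviour described above for universiteit can be restated as a small sketch. The Python below is illustrative only: the function name `compound_with_linking_s` is hypothetical, and the rule it encodes is just the one stated for universiteit (insert -s- unless the second stem starts with s). It is not offered as a general account of Afrikaans compounding.

```python
def compound_with_linking_s(first_stem: str, second_stem: str) -> str:
    """Join two stems, inserting the linking phoneme -s- unless the
    second stem already starts with s (the rule stated for universiteit)."""
    if second_stem.lower().startswith("s"):
        return first_stem + second_stem
    return first_stem + "s" + second_stem


# The two compounds mentioned in the text.
print(compound_with_linking_s("universiteit", "gebou"))    # universiteitsgebou
print(compound_with_linking_s("universiteit", "student"))  # universiteitstudent
```

Listing both forms in a niche or nest shows the user the outcome of such a rule without requiring the user to know it.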
The notion of a self-explanatory complex item in a monolingual dictionary does not have the concept of a self-translatable complex item as its counterpart in a bilingual dictionary, because a self-explanatory item in the source language seldom has a self-explanatory target-language equivalent that maintains a one-to-one relation between the respective stems. No linguistic or other motivation is given for the selection of items to be included in the horizontal ordering in bilingual dictionaries. The only indication is the fact that the complex words are related to the main lemma by being its derivations or compounds. No consistency or predictability can be found in this selection because some complex items are lemmatised in the vertical and some in the horizontal ordering. It seems to be a haphazard approach, and it neither enhances the confidence with which a user consults a bilingual dictionary nor helps to ensure rapid access to these lemmata. Linguistic criteria are not sufficient to introduce a different type of lemmatisation. These criteria are based on the linguistic expertise of the lexicographers and their interpretation of the presumed transparency of a given complex word, and they rely on the linguistic intuition of dictionary users.





Rufus H. Gouws

4. Access

4.1 Rapid access and successful access

Without disregarding the importance of space and the need to save space, a problem not uncommon in the making of printed dictionaries, a lexicographer can hardly argue that space-saving procedures can or should play a role in achieving the functions of a given dictionary. The importance given to space-saving procedures in the implementation of niching, nesting and, especially, textual condensation often coincides with the ill-fated notion that the layout of a dictionary is not part of the lexicographic process but merely an assignment, too often only a cosmetic one, to be undertaken by the publisher (cf. Almind and Bergenholtz 2002). In the planning and compilation of any dictionary it has to be a priority to ensure successful access to data by the intended target users. Different users will employ different access routes to reach their required destination in a dictionary, and the layout of the dictionary and of the individual articles plays an extremely important role in this access process. Space-saving procedures that impede optimal access to data, whether they concern the dictionary and article layout, the use of textual condensation or the use of a horizontally ordered macrostructure, also impede optimal retrieval of information and therefore the fulfilment of the lexicographic functions and the achievement of the genuine purpose of the dictionary.

The access process (cf. Bergenholtz and Gouws 2007) is initiated by a specific extra-lexicographic problem that needs to be solved. During the pre-consultation phase of the access process the user decides where and how to look for the solution to a specific problem. Users then embark on the outer search route to try and reach the required lemma sign. When evaluating the access process one should not only look at the eventual success of reaching the required lemma sign that functions as guiding element of a nested or a niched article.
For users employing a dictionary as a utility tool the speed of access is also important. Where a dictionary is used by, e.g., language practitioners or translators who need rapid access to data while performing their professional duties, the layout and presentation play a vital role in defining the success of the dictionary as a tool for these users. But although rapid access is important and adds to the value of the dictionary as a practical instrument, it is not part of the functions of the dictionary. A successful dictionary consultation procedure and an optimal retrieval of information to achieve a specific lexicographic function and to satisfy a given user need are still possible, whether by means of rapid or slow access. The speed and ease of access have no direct bearing on the lexicographic functions, but successful access is a prerequisite for the fulfilment of the lexicographic functions of any given dictionary. Adherence to a strict alphabetical ordering, even within the sinuous lemma file, when employing a system of niching can be negotiated by many dictionary users, although not with the same success as an ordering according to a consistently applied straight alphabetical procedure. Where the intended target users lack the appropriate dictionary using skills lexicographers will do well to refrain from using a sinuous lemma file. First level nesting offers a slightly more complicated access route than niching because the user may not know that a given lemma is to be found in the nest and may continue searching down the vertical ordering. Having reached the next vertically ordered lemma sign, a lemma sign that alphabetically should precede the required lemma sign, the user may further continue down the vertical ordering, still expecting to find the relevant lemma sign. Failure to find it may then compel the user to backtrack on the search route and to enter the nest where the required lemma sign can be found. This problem of finding the lemma sign may impede successful dictionary use.

4.2 Access to nested lemmata

As indicated earlier, a grouping into partial article stretches in the lemma cluster attached to the lemma sign broei (cf. Example 5) is done on account of word-formation types, i.e. derivations versus compounds. Within the partial article stretch containing the compounds a semantic motivation, i.e. the specific sense of the lexical item represented by the first stem of the compound, leads to the introduction of different subclassifications. The implementation of these methods of ordering may enhance the quality of the retrieval of linguistic information from the dictionary. However, this ordering of the sublemmata within the article nest is exclusively done on linguistic grounds. The lexicographers have decided that although the nested lemma signs represent semantically transparent complex words, the dictionary user still needs additional guidance to ensure the proper interpretation of the meaning of the given word; hence the grouping into partial article stretches of lemmata in which the same sense of the lexical item represented by the first stem occurs. Within this cluster two main additional ordering types are used, i.e. a word formation and a semantic ordering. They relegate the alphabetical ordering to a second level procedure. In a dictionary where the alphabetical ordering prevails, the introduction of new types of ordering in selected partial article stretches confuses the users. If a user consults this dictionary to find the word broeibed he/she may or may not know that the word is a compound and not a derivation. Not knowing the meaning of a word, and therefore consulting the dictionary, implies that this user will not know which sense of the lexical item broei prevails in the compound broeibed. A typical consultation procedure would therefore have to negotiate all the lemmata in the nest from the first one until the required lemma is found. If users are familiar with the system used in the dictionary they might realize, when they come to the end of the partial article stretch reproduced in Example 6 without having had success in finding the required word, i.e. broeibed, that its absence from this particular alphabetically ordered partial article stretch does not necessarily imply that the word has been omitted from the dictionary.

broei-eend, -eiers, -gans, -hen, -hok, -kamer, -kolonie, -paar, -proses, -sak, -tent
Example 6: Partial article stretch adapted from Nasionale woordeboek (1987)

They would realize that their search route may take them via different alphabetically ordered partial article stretches within the one article cluster and that they may still find the required word. Users less familiar with the system may not realize that and may easily quit the search if the required word has not been found in an alphabetically ordered partial article stretch where the last lemma(ta) would alphabetically have followed the lemma sign they are looking for. Here successful dictionary consultation relies on prior linguistic knowledge on the side of the user, knowledge not to be expected from someone having to consult this dictionary. Had this dictionary resorted to niching as procedure for the horizontal ordering of lemmata the result would have been as in Example 7.

broei-aarde, -bak, -bed, -eend, -eiers, -gans, -glas, -hen, -hok, -huise, -kamer, -kolonie, -mis, -paar, -proses, -sak, -tent
Example 7: Niching resulting in horizontal ordering of lemmata

In spite of the condensed form of the lemma signs, a presentation of nothing more than place-keeping symbols and lemma parts, access to a word like broeibed would have been much easier in a niched cluster. From an access perspective future dictionaries should rather avoid the nested cluster version of sinuous lemma files.

5. Niching, nesting and lexicographic functions

From the motivations for the use of a sinuous lemma file presented in a preceding section it is clear that lexicographic functions did not play any role at all. This is typical of many older dictionaries with regard to many decisions. Lexicography in
the twenty-first century needs to break away from this tradition and dictionaries should be planned and compiled with lexicographic functions having their rightful impact. Within the functional theory it has been emphasized that lexicographic functions should determine every aspect of the dictionary (cf. Tarp 2000). This implies that when evaluating a dictionary on the basis of linguistic functions one has to view each aspect of such a dictionary in terms of the contribution it makes (or does not make) to satisfying the functions identified for the given dictionary. From the perspective of dictionary functions niching does not necessarily have to be abandoned as a macrostructural ordering type. Problems with regard to niching are primarily restricted to the speed of access, with the rapidness of access being impeded by both the horizontal ordering and the frequent use of condensed lemma signs. Although the functional approach does not imply the abandonment of procedures of horizontal ordering, this macrostructural procedure, as seen in some of the dictionaries employing this system, has a limited value, compared to the typical application of vertically ordered macrostructures in similar dictionary types. In the examples from both monolingual and bilingual dictionaries presented in this article the space-saving endeavours in the use of a horizontal ordering of lemmata also induce a restricted treatment in the articles with these lemmata as guiding elements. In the examples from the bilingual dictionary the treatment was restricted to orthographic guidance, rudimentary data on inflection in some articles and a presentation of one or more translation equivalents without any cotextual data and, as an exception, perhaps an entry giving some contextual guidance. 
Although dictionaries like the Chambers 20th Century Dictionary and the Duden Deutsches Universalwörterbuch give horizontally ordered lemmata a treatment not too different from that given to their vertically ordered counterparts, the examples of articles from the monolingual dictionaries given in this paper display an even worse situation than that prevailing in the bilingual dictionaries, with no sign of a paraphrase of meaning, the most typical entry in the default articles of these dictionaries. An evaluation of the procedures of nesting and niching from a functional perspective should be directed not only at the macrostructural ordering but also at the implications of the ordering, i.e. the type of treatment allocated to the horizontally ordered lemmata. A well-balanced evaluation of such an ordering should not look at the target of the evaluation in isolation but should contextualise it within the given dictionary as a whole. It should be evaluated whether the functions foreseen for the dictionary as a whole can also prevail in the dictionary sections to be investigated, here the articles of niched and nested lemmata. Unfortunately many dictionaries, including the ones referred to in this paper, do not explicitly state their envisaged functions. These dictionaries were compiled in an era when the focus in theoretical lexicography had not yet been on lexicographic functions; one cannot,
however, argue that the functional approach played no role in the planning and compilation of these dictionaries. Even if the term lexicographic functions had not come into the discussion, the selection and presentation of data had to be done in accordance with the dictionary type and the needs of the typical user of such a dictionary. The advent of the modern theory of lexicographic functions should not be seen as a phase in the development of lexicography where new theoretically developed ideas are merely imposed on the lexicographic practice. The theory of lexicographic functions brings the assignment of a dictionary as a utility tool to the fore and emphasises the way in which such a tool needs to be directed at the specific needs of specific users in a specific situation of dictionary use. Compiling a function-based dictionary does not imply the introduction of a whole new range of macrostructural or microstructural entry types. It rather leads to a selection of items that will best ensure the satisfaction of the relevant function(s). Although dictionaries do not always refer to their function(s) it may never be presumed that they do not have functions. Users consult those dictionaries for specific purposes and these dictionary consultations are invariably linked to specific lexicographic functions, whether explicitly mentioned or not. A transformative approach needs to create an increased awareness of dictionary functions and the way in which they should be dealt with in the lexicographic practice. The dictionaries referred to in this paper clearly employ a multifunctional approach, with the focus on both cognitive and communicative functions and, in the latter, directed at text reception as well as text production functions. The use of macrostructural procedures of horizontal ordering should be evaluated against this background.
One should compare the information users can retrieve from the articles of horizontally ordered lemmata to the retrieval from articles of vertically ordered lemmata. This comparison could lead to a determination of the success in satisfying the relevant dictionary functions. Lexicographic treatment makes provision for varying degrees of assistance with regard to lexicographic functions. Users may consult a given dictionary for text production purposes and their need may be satisfied by a brief paraphrase of meaning, ascertaining a sense or meaning of the word which they need in the text production process, or even by only getting the proper spelling form of the given word. This is a light or a weak form of text production assistance. A stronger form would include, besides the paraphrase of meaning, additional entries like example sentences illustrating the typical use of the given word, morphological entries enabling the user to use the word in its inflected forms, etc. A weak version of text reception needs prevails, e.g., where the user merely wants to link a complex word to the simplex word which is semantically related to the first stem of the complex form. The user may find the word babadogtertjie in a text, and when consulting a monolingual dictionary a niched occurrence like the one in Example 8 may help the user to understand the word, especially if the user is a mother-tongue speaker of Afrikaans.

ba´ba (-s; -tjie) 1 Baie jong kindjie; suigeling:..... baba: ~bed, ~dogtertjie,...
Example 8: Excerpt from HAT (2005)

In a similar way the nature and extent of the treatment in horizontally ordered articles in a bilingual dictionary may ensure the degree of assistance regarding a given function that will sufficiently meet the needs of certain users. Limiting the entries in a given article does not necessarily mean a decrease in the lexicographic quality or the functional value. What is important is that the restriction, whether it applies to all articles or only to the horizontally ordered articles, should still result in an article that embodies the functions of that dictionary. An approach based on lexicographic functions implies that all entries in a dictionary should be selected and treated accordingly. This also applies to all the different components of the dictionary; consequently Gouws and Steyn (2005) have suggested the use of a transtextual approach to lexicographic functions so that the outer texts could also actively support the functions identified for the given dictionary. The decision to embark on a procedure of horizontal ordering should be made with due cognisance of the functions of the intended dictionary. From a functional perspective the examples taken from the monolingual dictionaries HAT and Nasionale woordeboek (cf. Examples 3 and 5) offer no assistance in terms of the cognitive function, a function that should prevail where the genuine purposes of these dictionaries need to be achieved. Where a weak version of text reception assistance applies, and even in an extremely weak version of text production assistance, with the user interested in ascertaining only the spelling of a given form, the horizontal ordering as seen in the given examples of niching and even first level nesting may suffice.
But the value of the dictionary may be of a more limited nature compared to other dictionaries with a consistent straight alphabetical ordering. When using niched and first level nested lemmata the lexicographer introduces a discriminating system into the dictionary whereby an envisaged function cannot be satisfied to the same degree in all the articles of the dictionary. Where the dictionary is designed to assist with a strong degree of text reception the given examples fail to satisfy the needs of the user, whereas the need for a moderate to strong text production function is totally ignored. Where the niched examples still allow reasonably easy access to the lemmata the application of second level nesting (cf. Example 9) yet again impedes access and eschews the lexicographic functions.
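The access difference between the niched cluster of Example 7 and the clustered nest of Example 9 can be made concrete with a minimal sketch. The lemma parts are copied from those two examples; everything else is an assumption made for illustration: the function names are hypothetical, a "persistent" user is modelled as reading every sublemma in turn, and an "alphabetical" user is modelled as giving up at the first sublemma that sorts after the word sought. Real consultation behaviour is of course less mechanical.

```python
# Lemma parts copied from Examples 7 and 9; the user models are assumptions.
STEM = "broei"

# Example 7: a single niche in strict alphabetical order.
NICHE = ["aarde", "bak", "bed", "eend", "eiers", "gans", "glas", "hen", "hok",
         "huise", "kamer", "kolonie", "mis", "paar", "proses", "sak", "tent"]

# Example 9: compounds clustered by the sense of broei; alphabetical order
# holds only inside each partial article stretch.
NEST = [
    ["eend", "eiers", "gans", "hen", "hok", "kamer", "kolonie", "paar",
     "proses", "sak", "tent"],   # (by 1)
    ["mis"],                     # (by 5)
    ["aarde", "bed"],            # (by 6)
    ["bak", "glas", "huise"],    # (by 7)
]


def expand(lemma_part):
    """Restore the full lemma from a condensed lemma sign (stem + lemma part)."""
    return STEM + lemma_part


def persistent_scan(stretches, target):
    """Count the sublemmata read by a user who reads every stretch in turn."""
    seen = 0
    for stretch in stretches:
        for part in stretch:
            seen += 1
            if expand(part) == target:
                return seen
    return None  # word not in the cluster


def alphabetical_scan(stretches, target):
    """Same scan, but the user stops at the first sublemma that sorts after
    the target, assuming the whole cluster is alphabetical."""
    seen = 0
    for stretch in stretches:
        for part in stretch:
            seen += 1
            full = expand(part)
            if full == target:
                return seen
            if full > target:
                return None  # user presumes the word is absent
    return None


print(persistent_scan([NICHE], "broeibed"))    # 3
print(alphabetical_scan([NICHE], "broeibed"))  # 3
print(persistent_scan(NEST, "broeibed"))       # 14
print(alphabetical_scan(NEST, "broeibed"))     # None
```

Under these assumptions the niche yields broeibed after three sublemmata for either kind of user, whereas in the nest the persistent user reads fourteen sublemmata and the user who trusts the alphabet abandons the search at the first one.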




‘broeiery, broeiing; broei-eend, -eiers, -gans, -hen, -hok, -kamer, -kolonie, -paar, -proses, -sak, -tent (by 1); -mis (by 5); -aarde, -bed (by 6); -bak, -glas, -huise (by 7).
Example 9: Excerpt from Nasionale woordeboek (1987)

A user looking for text production assistance in order to use, e.g., the word broeibed will in the first instance have problems accessing the lemma sign, as previously discussed. Semantic information cannot be retrieved in a direct way because no paraphrase of meaning is addressed at this lemma sign. The retrieval of semantic information regarding this word can only be achieved by means of complex and cumbersome methods in which an indirect form of remote partial addressing is involved. The entry (by 6) indicates that sense number six given for the preceding main lemma broei applies in this compound. The paraphrase of meaning given as sense number 6 is primarily addressed at the lemma sign broei and indirectly, by means of article-external remote addressing, at the sublemmata broeiaarde and broeibed, each represented by the combination of a place-keeping symbol and a lemma part, i.e. -aarde and -bed respectively. As is the case with access, the clustering of nests as a procedure of horizontal ordering is also detrimental to the fulfilment of the lexicographic functions and should rather be avoided in general dictionaries.

6. New uses of lemma nesting

In spite of some critical remarks regarding the use of lemma nesting in many existing dictionaries, a specific version of this procedure can be employed to ensure an improvement in the lexicographic practice, and lexicographers should be open to new uses of old procedures. Being under constant scrutiny, the lexicographic presentation of treatment units will necessarily undergo changes from time to time, and dictionaries in the twenty-first century should also give evidence of innovative applications in this section of the lexicographic practice.
Lemmata are regarded as the default treatment units but microstructural entries can also be elevated to treatment units by means of procedures of non-lemmatic addressing. Lemma nesting can play an important role in ensuring a better presentation of certain treatment units. Traditionally, procedures of lemma nesting have primarily been applied to complex words. However, it has been indicated in this paper that a dictionary like the Duden Deutsches Universalwörterbuch has also included simplex forms in the nests attached to the article of a lemma sign representing a compound word. In general dictionaries, both monolingual and bilingual, multiword expressions are
typically included and treated in the article of a lemma sign representing a word regarded as core of the multiword expression. A fixed expression like blood is thicker than water will typically be included in a slot for fixed expressions within the article of the lemma sign blood. Multiword fixed expressions are fully-fledged lexical items but their inclusion within the article of another lemma sign does not reflect their status as independent lexical items. Instead of including them as microstructural treatment units they need to be regarded as macrostructural entries. Gouws (2008) argues that acknowledging fixed expressions as macrostructural items demands allocating them an appropriate macrostructural position. Due to many practical problems regarding the alphabetisation of fixed expressions as main lemmata in the straight alphabetical vertical ordering of a dictionary, their occurrence in a text block attached to the article of a vertically ordered lemma, representing a word that has been identified as guiding element of the fixed expression, seems to be the best possible way of entering them in a dictionary. Such a text block could accommodate from a single to an infinite number of fixed expressions. This text block should not be seen as a microstructural slot but as a nested partial article stretch that includes multiword expressions as macrostructural treatment units. This might imply the sacrifice of the traditional notion of a straight alphabetical ordering in these dictionaries. In current dictionaries the access process guiding users to fixed expressions goes via a word selected as guiding element of the fixed expression. This will remain the same. As fully-fledged treatment units fixed expressions can now be dealt with in a systematic way, as is the case with other sublemmata. Different types of general dictionaries do not have to devise ad hoc methods to include and treat these items and as lemmata they can receive a more extensive treatment. 
This will ensure the systematic inclusion of fixed expressions in general dictionaries and will also enhance the way in which their presentation and treatment are in accordance with the lexicographic functions identified for a given dictionary. This is an important improvement on the current presentation and treatment, which too often fail to assist users who consult general dictionaries with a strong text production need with regard to the use of fixed expressions.

7. Concluding remarks

In the planning of future dictionaries lexicographers should not base their decision whether or not to employ sinuous lemma files merely on lexicographic tradition. These procedures may be used but their use should be motivated in terms of the user-profile and the envisaged functions of the dictionary. Distinctions need to
be made between different types of sinuous lemma files, including lemma clusters, and their implications for the various functions of the dictionary and the successful access to the data on offer.

References

A. Dictionaries

Botha, W.F. et al. (eds.). 1951–. Woordeboek van die Afrikaanse Taal. Stellenbosch: Buro van die WAT.
De Villiers, M. et al. (eds.). 1987. Nasionale Woordeboek. Seventh edition. Cape Town: Nasou.
Drosdowski, G. et al. (eds.). 1996. Duden Deutsches Universalwörterbuch. Mannheim: Dudenverlag.
Dudenredaktion. 2003. Duden Deutsches Universalwörterbuch. Fifth edition. Mannheim: Dudenverlag.
Eksteen, L.C. et al. (eds.). 1997. Groot Woordeboek/Major Dictionary. Fourteenth edition. Cape Town: Pharos.
Labuschagne, F.J. and Eksteen, L.C. 1993. Verklarende Afrikaanse Woordeboek. Eighth edition. Pretoria: J.L. van Schaik.
Gove, P.B. et al. (eds.). 1961. Webster’s Third New International Dictionary of the English Language. Springfield, Massachusetts: Merriam-Webster.
Kirkpatrick, E.M. et al. (eds.). 1983. Chambers 20th Century Dictionary. Edinburgh: Chambers.
Odendal, F.F. and Gouws, R.H. 2005. Verklarende Handwoordeboek van die Afrikaanse Taal. Fifth edition. Cape Town: Pearson Education.

B. Other literature

Almind, R. and Bergenholtz, H. 2000. “Die ästhetische Dimension der Lexikographie.” In Bild im Text – Text und Bild, U. Fix and H. Wellmann (eds.), 259–288. Heidelberg: C. Winter.
Bergenholtz, H. and Gouws, R.H. 2007. “The access process for fixed expressions.” Lexicographica 23: 236–260.
Gouws, R.H. 2008. “Fixed word combinations as second level treatment units in dictionaries.” Paper read at the Fifth International Colloquium on Lexicography, Bratislava.
Gouws, R.H. and Prinsloo, D.J. 2005. Principles and Practice of South African Lexicography. Stellenbosch: African SunMedia.
Gouws, R. and Steyn, M. 2005. “Integrated outer texts: a transtextual approach to lexicographic functions.” In Schreiben, Verstehen, Übersetzen, Lernen, I. Barz, H. Bergenholtz and J. Korhonen (eds.), 127–136. Frankfurt: Peter Lang.
Hausmann, F.J. and Wiegand, H.E. 1989. “Component parts and structures of general monolingual dictionaries: a survey.” In Wörterbücher. Dictionaries. Dictionnaires. An International Encyclopedia of Lexicography, F.J. Hausmann, O. Reichmann, H.E. Wiegand and L. Zgusta (eds.), 328–360. Berlin/New York: Walter de Gruyter.
Tarp, S. 2000. “Theoretical challenges to LSP lexicography.” Lexikos 10: 189–208.
Wiegand, H.E. 1989. “Aspekte der Makrostruktur im allgemeinen einsprachigen Wörterbuch: alphabetische Anordnungsformen und ihre Probleme.” In Wörterbücher. Dictionaries. Dictionnaires. An International Encyclopedia of Lexicography, F.J. Hausmann, O. Reichmann, H.E. Wiegand and L. Zgusta (eds.), 371–409. Berlin/New York: Walter de Gruyter.
Wiegand, H.E. 2002. “Über textuelle Strukturen der Wörterbuchartikel und Artikelnischen im De Gruyter Wörterbuch Deutsch als Fremdsprache.” In Perspektiven der pädagogischen Lexikographie des Deutschen II, H.E. Wiegand (ed.), 413–442. Tübingen: Max Niemeyer.
Wolski, W. 1989. “Das Lemma und die verschiedenen Lemmatypen.” In Wörterbücher. Dictionaries. Dictionnaires. An International Encyclopedia of Lexicography, F.J. Hausmann, O. Reichmann, H.E. Wiegand and L. Zgusta (eds.), 360–371. Berlin/New York: Walter de Gruyter.

Reviewing printed and electronic dictionaries
A theoretical and practical framework

Sandro Nielsen

Dictionary reviewing is an integral part of the lexicographic universe. However, lexicographers have called for generally applicable principles embracing both printed and electronic dictionaries. I propose that scholarly reviews contain information that is useful to their intended audiences and a set of principles that are generally applicable to achieve this goal. I suggest that reviewers may write useful reviews by treating the dictionary as an object of analysis and examination. I further propose that the lexicographic, factual and linguistic approaches should be combined with the three significant features of dictionaries: lexicographic functions, data and structures. Finally, I argue that a review should give a true and fair view of the dictionary, and this will strengthen the role of dictionary reviews in the scholarly discourse community.

Keywords: dictionary reviewing; lexicographic functions; lexicographic data; lexicographic structures; significant features; useful information

1. Introduction

Dictionary reviewing is an integral part of the lexicographic universe. As this universe has gradually expanded – the advent of electronic dictionaries is a case in point – it is necessary to take a fresh look at the work involved in dictionary reviewing. Reviews come in many shapes and sizes and some contribute to the development of lexicography more than others, for instance because of their substantive contents and the media in which they are published. Nonetheless, the communicative situations in which they are rooted affect readers’ perception of reviews, the way reviewers express themselves, and the extent of their contribution to lexicography. Some reviews are intended to be read by authors of dictionaries, other lexicographers, publishers, subject-field experts, and the specialist readers of a particular journal. Other dictionary reviews are written for potential buyers and the general readers of a popular magazine, both experts and laypersons.




Dictionary reviews are generally ranked low among the various types of scholarly writings. One reason is the lack of generally applicable principles, and, as Jackson (2002: 173) suggests, one of the most important issues for dictionary reviewing “is to establish a sound and rigorous basis on which to conduct the criticism together with a set of applicable criteria.” A statement like this calls for a set of general principles that embrace both printed and electronic dictionaries regardless of type, i.e. general and specialised dictionaries, monolingual and bilingual dictionaries etc. In this paper I will first attempt to identify the characteristic features of the academic or scholarly dictionary review and then propose some guidelines that can function as the basis of a theoretical and practical framework for writing reviews of printed and electronic dictionaries. The framework should be seen as a set of general principles underlying the detailed information that makes up the actual review, so that the information contained in the review is useful to readers in the sense that it contributes to theoretical and/or practical lexicography. It does not attempt to impose a duty on reviewers to use specific and detailed information, words, etc.; the aim of the framework is to set out the objectives and basic principles of dictionary reviews. This involves the treatment of the dictionary as an object of analysis, examination and description; the actual approach to reviewing; and the fundamental requirements as to the informative value of the review.

2. Identifying key features of scholarly reviews

Reviews published in academia are examples of what is generally referred to as scholarly communication. More specifically, dictionary reviews can be viewed as a specific genre within the disciplinary community called lexicography.
Individual disciplinary communities often share principles and practices, but it is widely accepted that each discipline has developed its own practices, principles and requirements. Kling and McKim (1999: 905) provide a general standard by claiming that scholarly communication is “a communicative practice anchored in three dimensions: publicity, accessibility, and trustworthiness.” Dictionary reviews meet the first two criteria if they are published in journals that are publicly accessible and potential readers are made aware of their existence. Fulfilment of the criterion of trustworthiness, also referred to as reliability, partly depends on whether reviews are peer reviewed (they are in effect published peer reviews) and partly on the practices, principles and requirements of the lexicographic community. The following discussion proposes a number of principles and requirements that may be used as benchmarks for trustworthy dictionary reviews. Not all dictionary reviews can rightly be described as academic or scholarly. One reason is that they are published in non-reviewed journals, do not have as
their genuine purpose to contribute to the development of lexicography, and nor do their authors intend them to do so. Most reviews in newspapers, popular weeklies and trade magazines are examples of non-academic reviews, and their genuine purpose is to draw attention to a new publication, to describe it and to inform consumers in general that it has been introduced to the market. This does not mean that such reviews can never contribute to the development of lexicography, but they are usually written and published in non-academic settings in which the actors and media are not known for their scholarly interests. In contrast, Jackson (2002: 183) argues that academic dictionary reviews generally have the potential of contributing to lexicographic research, and such reviews are written and read by researchers as a natural part of their job. Most people regard researchers and their scholarly products as being in a different league than ordinary people and ordinary products. But what distinguishes reviews that contribute to research from those that do not? One answer is provided by Phillips and Pugh (2005: 48), who argue that “Research goes beyond description and requires analysis.” If this is extended to lexicography, one result is that dictionary reviews should not merely be descriptive but contain more or less thorough analyses, evaluations and reflections. This is supported by findings reported in the theoretical literature. Ripfel (1989: 32–33) proposes that an academic dictionary review is an examination that should contain a description of the contents of the dictionary (or dictionaries) under review, make an evaluation, explain the reasons for making the evaluation (including an explicit statement about the underlying theories and methods employed), and draw positive, negative or mixed conclusions. 
In other words, scholarly reviews that contribute to lexicographic research are at a level above everyday reflection and contain analyses, evaluations and findings that can form the basis of research conducted by others. The next step is to examine the objective of academic reviews. In general, the objective or goal of reviews is to give useful information to readers. The potential reader groups of any review are many and varied, but the following are the most important for this discussion. The primary audience of academic reviews is made up of scholars and researchers. This group includes researchers within lexicography and within other disciplines, who read reviews to discover new insights that they can use in their own research. The authors and publishers of the dictionaries examined are also primary audiences, even though some authors of specialised and technical dictionaries are experts in research areas other than lexicography and not all publishers employ in-house lexicographers. Another important group consists of dictionary users, either existing or potential users who look for state-of-the-art lexicographic products to satisfy their needs in specific types of situations. Authors and publishers of other dictionaries constitute a group whose members may find inspiration and solutions in reviews, which they themselves can use in their future work. The fifth relevant group consists of ordinary dictionary buyers, including existing and potential users, librarians and donors who search for high-quality gifts, for instance for relatives. Finally, persons generally interested in literature or in domain-specific literature are also likely readers of dictionary reviews. It should be noted that the above groups may overlap, but they represent different, though general, types of primary and secondary audiences that may find the information in dictionary reviews useful. Information that is useful to readers may take many forms. In this context the term useful information means information that enables readers of dictionary reviews to make informed decisions in the following general types of situation:

1. When they develop practical and/or theoretical lexicographic principles;
2. When they form opinions about dictionaries in practical and theoretical contexts;
3. When they use dictionaries in their practical and theoretical work;
4. When they buy or contemplate buying dictionaries;
5. When they recommend dictionaries.

The above discussion makes it possible to provisionally identify the key features of scholarly reviews: they are reviews providing information that is useful to readers in making decisions about the use, purchase or recommendation of dictionaries, in forming and holding opinions about dictionaries, or in contributing to the practical and theoretical development of lexicography. In order to create the proper basis for achieving this objective, it is necessary to take a closer look at dictionaries as the objects of examination.

3. Defining the object of dictionary reviews

In every examination it is necessary to have a clear definition of what you are studying. Obviously, the objects of dictionary reviews are dictionaries. But what actually is a dictionary?
First of all, it should be appreciated that in a review process the dictionary changes from being mainly an object for use and becomes an object of examination, description and analysis. A study of various theoretical contributions to dictionary criticism, including Chan and Taylor (2001), Jackson (2002: 173–183), Osselton (1989), Steiner (1979) and Tomaszczyk (1986), reveals a relatively strong bias towards a linguistic approach to dictionary reviewing. They generally treat dictionaries as containers of purely linguistic concepts and focus on linguistic categories such as word classes, affixes, pronunciation, grammatical information (e.g. inflection, comparison and syntax), semantic and encyclopaedic information and equivalents. This is in line with the traditional understanding of dictionaries, one that is also found in the article dictionary in Oxford English Dictionary Online (2008):

A book dealing with the individual words of a language (or certain specified classes of them), so as to set forth their orthography, pronunciation, signification, and use, their synonyms, derivation, and history, or at least some of these facts: for convenience of reference, the words are arranged in some stated form, now, in most languages, alphabetical; and in larger dictionaries the information given is illustrated by quotations from literature; a word-book, vocabulary, or lexicon.

This definition is traditional in that it is based on linguistic categories. A reviewer who bases his or her review on this or a similar definition runs the risk that the linguistic categories become a surrogate for the dictionary as they, and not the dictionary, become the object of the review. The approach to reviewing will then be linguistic and not lexicographic. This linguistically based understanding of the dictionary is the likely reason why, according to Bergenholtz and Mogensen (1993: 29), the two most frequent topics addressed in reviews are the lemma stock and semantic/encyclopaedic information. Rather than approaching the dictionary from a linguistically inspired point of view, it may be more appropriate to introduce what may be called a lexicographic approach to the concept of dictionary. According to Nielsen (2003: 111–112) this approach focuses on the significant features of a dictionary, i.e. features that are inherent in any dictionary and are not trivial for classification purposes. A significant feature helps to shed light on the existence of a dictionary as an object of investigation, description and analysis. The advantage of this approach is that it applies to printed, CD-ROM as well as online dictionaries, regardless of type; it highlights the fact that dictionaries are more than linguistic categories. Furthermore, it is theoretically neutral as it can be used as a starting point no matter which “theoretical school” reviewers belong to. Both printed and electronic dictionaries have at least three significant features. Firstly, the overriding feature of a dictionary is that it has been designed to fulfil one or more functions, referred to as lexicographic functions, e.g. communicative functions such as the understanding of texts, translation and text production, and cognitive functions such as knowledge acquisition in communication-free contexts. 
Secondly, the dictionary contains lexicographic data that have been selected to support the function(s) of the dictionary. The interrelationship between the first and second feature is important because each lexicographic function has been materialised through the items containing the data. Thirdly, the lexicographic structures combine and link the data in order to support and fulfil the dictionary function(s). Lexicographic structures may be order structures (e.g. the macrostructure), network structures (e.g. the cross-reference structure), information
structures (e.g. the distribution structure) and links in online dictionaries. These features should not be seen in isolation, as the dictionary is made up of the totality of the above three significant features and their interrelationship. A dictionary may have features other than those described above as being significant, but they will not be considered here. It follows that a dictionary is not merely a container of the lexicon of a language. It is much more than that. Only one of the three significant features directly relates to linguistics and linguistic principles, namely the lexicographic data. The other two features do not fall within linguistics with the result that linguistics concerns one-third of the concept of dictionary. Lexicographic functions deal with the help dictionaries can give users when they are engaged in a particular type of activity that has nothing to do with dictionaries per se, for instance translation. The way in which the data are structured and linked to give optimal help is also of no concern to linguistics, as evidenced by e.g. the creation of electronic links. Reviewers should therefore regard dictionaries as complex entities that provide specific types of help to specific types of users in specific types of situations. On the basis of the above discussion, it is appropriate to revert to the provisional identification of key features discussed in Section 2. The fact that the object of dictionary reviews is a complex entity with several significant features means that the provisional view expressed at the end of Section 2 needs to be extended. The objective of scholarly reviews is to provide information about the functions, data types and structures of one or more dictionaries that is useful to readers in making decisions about the use, purchase or recommendation of dictionaries, in forming and holding opinions about dictionaries, or in contributing to the practical and theoretical development of lexicography.
This objective can be achieved through different approaches to reviewing.

4. Three approaches to reviewing dictionaries

A framework embracing both printed and electronic dictionaries should not be biased towards linguistically based approaches to reviewing. In some instances, however, it may be appropriate to focus on linguistic elements, e.g. if the dictionary in question is a language dictionary designed for learners, such as Collins Cobuild English Dictionary (1995), Longman Dictionary of Contemporary English (2005) and Oxford Advanced Learner’s Dictionary (2005), particularly if the reviewer only or mainly addresses linguistic aspects, rules, structures etc. Nevertheless, if the full potential of a review is to be realised, including the review of dictionaries for specialised languages and technical dictionaries, it is necessary to look at reviewers and their competences in a new light.

There are three basic approaches to reviewing dictionaries that can and should be combined irrespective of dictionary type. The first is the lexicographic approach, which analyses, describes and evaluates the lexicographic practices and principles used in compiling the dictionary under review. These practices and principles may be compared with those lexicographic practices and principles that, for one reason or another, were not used by the dictionary authors. This approach requires knowledge of lexicographic theories, principles and practices, and it is relevant whether the dictionary is a printed or an electronic one. Secondly, the factual approach focuses on an analysis, description and evaluation of the factual (semantic and encyclopaedic) data and topics contained and treated in the dictionary related to the lexicographic functions described above. This approach requires knowledge of specific subject fields, and it is important whether the dictionary is a sub-field, single-field or multi-field dictionary as described in Nielsen (1990: 132–135). Finally, the linguistic approach involves an analysis, description and evaluation of the language treated by the dictionary; it requires knowledge of language for general purposes (LGP) and/or language for specific purposes (LSP) as appropriate, and it is relevant whether the dictionary is mono-, bi- or multilingual. In order to make reviews as useful to readers as possible, these three approaches may be directly related to the significant features of dictionaries.

5. Combining significant features and approaches to reviewing

When striving to realise the full potential of a review, reviewers may combine the three significant features of dictionaries and the three approaches to reviewing. There are various options available to reviewers and they can be more or less detailed, but examples of potential combinations are discussed below.

5.1 Using the lexicographic approach

Using the lexicographic approach, reviewers may address general issues that are relevant to all three significant features. However, it is important to note that printed and electronic dictionaries are not a text type. As pointed out by Bergenholtz, Tarp and Wiegand (1999: 1763), the printed dictionary is a collection of text types, e.g. the preface, the user guide, the dictionary grammar, the subject-field component and the wordlist. These texts are structural components and it is appropriate to establish some sort of relationship between them. At a functional level, the components are related in terms of data contents, and at a structural level they are related in terms of the actual form of the dictionary. Accordingly, all the individual components constitute a coherent whole: the dictionary; and the relationships
between the components form the basis of defining the scope of dictionary reviews. The position is similar with online dictionaries, as they may and often do consist of a number of text files that are linked to the database in which searches are made. Some online dictionaries have prefaces, user guides and additional texts, for instance Oxford English Dictionary Online (2008) and Merriam-Webster Online Dictionary (2008). Each dictionary component may then be examined on the basis of criteria that apply to each text type or genre, and these criteria can then be related to the three significant features of the dictionary. For instance, it may be relevant to discuss whether the outside matter is integrated or non-integrated in the sense described in Nielsen (1999: 94), as this is directly related to the function(s) and structures of the dictionary; further see Section 5.2 below. In its front matter, American Heritage College Dictionary (2004) has a text titled “Style Manual”, which is clearly relevant to the communicative functions text production, text revision and text editing, but no mention is made of this fact in the user guide. Reviewers should point this out as it would have been desirable if the dictionary had explicitly made users aware of the potential help that can be found in the dictionary. This lack of attention to user guides by reviewers is also noted by Chan and Taylor (2001: 168), who claim that most reviewers do not “appreciate the potential contribution made by information contained in the user’s guide to using a dictionary to its fullest extent.” Reviewers who take into account the various independent, though interrelated, dictionary components are more likely to provide useful information to readers than those who do not. 
For instance, after having read the review of Van Dale Pocketwoordenboek Nederlands als tweede taal (Van Dale Pocket Dictionary Dutch as a Second Language) in Hiligsmann (2005), readers still do not know what the dictionary looks like, nor do they know which components it contains and how the data in the components relate and interact. Even if dictionaries contain a minimum of components whose data do not relate to each other or the lexicographic functions, this would be useful information. Similarly, reviews that merely mention that the dictionaries examined contain front and back matter texts (e.g. Koltzé 2008) without explaining whether and how the data support the functions of the dictionary do not fully live up to the requirement of providing useful information in the sense described in Section 2 above. Use of the lexicographic approach would lead to more accurate evaluations of dictionaries and, one would hope, more useful reviews. Reviewers may present the lexicographic approach by using the functional method, which looks at the function(s) of the dictionary. It is relevant to analyse, describe and evaluate what functions the dictionary may fulfil, e.g. communicative functions such as the understanding of texts, text production, text revision, editing or translation, and cognitive functions such as knowledge acquisition unrelated to specific oral or written communicative acts. This method allows the reviewer to
demonstrate that the dictionary may be used for functions and in user situations that the authors did not mention in the dictionary’s metatext (usually in the preface), or that the dictionary is suitable – or not suitable – for all or some of the functions and user situations mentioned in the metatext. It is possible to make a comparative review by combining the functional method with the temporal method, which introduces a time element, e.g. by looking at different editions of a dictionary. This involves two methods of presentation: the synchronic presentation looks at the edition of the dictionary under review and includes comparisons with the latest editions of other dictionaries – or a comparison of the printed version, the CD-ROM version and the online version of the same dictionary – whereas the diachronic presentation includes prior editions of the dictionary examined and places it in an evolutionary and historical context (see for instance Kleinedler 2000: 140–141 and Algeo 2003). Here improvements of any kind, technological advances and options in printed and electronic dictionaries may be highlighted. Comparative reviews thus allow reviewers to place the dictionary concerned in a larger and broader context so that the analyses, evaluations and conclusions relate to other dictionaries and the development of lexicographic principles and practices.

5.2 Using the factual approach

The factual approach may come into play regardless of whether the dictionary treats general language or specialised language, because even general-language dictionaries contain technical terms and diatechnical labels, see for instance Collins Cobuild English Dictionary (1995), which contains subject and style labels such as journalism, legal, medical and technical. As most reviewers are not experts in all the subject fields covered by multi-field dictionaries, it may be necessary to limit the review to selected subject fields, whether linguistic, technical or otherwise. A problem like this may partly be overcome by forming review teams consisting of lexicographers, linguists and subject-field experts. However, both linguists and subject-field experts should keep in mind that the object of the review is the dictionary and not linguistics, nor one or more subject-fields. One example of a review written by a team of reviewers is Meer and Sansome (2001), in which one reviewer addresses linguistic issues and the other considers pedagogical issues. As explained by Miranda (1996: 196), it is important that reviewers of books in general live up to the standards of a scholarly community, which implies that a person reviewing a dictionary must be “competent in doing so because she or he is an active participant in and contributor to the book’s particular area of research.” Reviews made by such competent persons also support the criterion of trustworthiness in scholarly discourse.

In this context it is possible to distinguish between two types of reviews. The review may either be maximising, i.e. it attempts to give as exhaustive a description, evaluation and analysis of the dictionary as possible, or it may be minimising in the sense that it is deliberately limited to selected aspects regarding the dictionary, e.g. subject fields. When making this distinction it is helpful to have a quantitative scale, and a number of writers have suggested “checklists” for reviewers that may serve as guidelines, for instance Steiner (1979), Chan and Loong (1999), Jackson (2002: 176–182) and Bergenholtz (2003). These lists contain, in various degrees of detail, a range of topics or issues that may be considered review topics and the most comprehensive is that found in Bergenholtz (2003: 20); see the adapted list of review topics in the Appendix. It is tentatively suggested that a review addressing 17 or more categories is maximising, whereas one that addresses fewer than 17 is minimising. This is, admittedly, a crude and purely quantitative distinction, but any other differentiation is difficult to use and uphold in practice. It is impossible to say definitively which categories are mandatory – perhaps with the exception of functions, as all dictionaries have at least one function – because no two dictionaries are alike. Synonyms are undoubtedly relevant in a review of a dictionary of synonyms, but this topic may be irrelevant in reviews of other dictionaries. Similarly, equivalents are only mandatory in reviews of bi- or multilingual dictionaries whose function is to provide help in case of translation. This distinction between minimising and maximising reviews is based on a similar distinction described in Nielsen (1994: 37–38) between minimising and maximising dictionaries. It is suggested that maximising scholarly reviews are prima facie more useful than minimising ones.
Reviewers are free to write about any possible and relevant review issue and its relation to the significant features of the dictionary. This does not mean, however, that reviewers may exclude substantially all issues and only focus on one or two. It makes no sense to talk about a review of a/the dictionary if the review only focuses on, for instance, the dictionary’s treatment of one specific word class. A “review” like this does not inform the reader about the dictionary, but merely about a single issue at the micro-level of the dictionary. An extremely minimising “review” does not qualify as a review of the dictionary, because it does not have the dictionary as its object or theme, and it does not meet the objective of dictionary reviews as described in Sections 2 and 3 above. Reviewers may also examine the lexicographic data and to what extent their factual contents relate to the functions of the dictionary. Lexicographic data are primarily found in the wordlist but may be placed anywhere in the dictionary depending on its distribution structure. The relevant data are found in the lexicographic components of the dictionary, i.e. those components that contain data about its use and function(s), for instance the user guide, the wordlist and a subject-field component. Other components such as acknowledgements, advertising texts, picture credits and single user licence agreements are extra-lexicographic components as described in Tarp (1998: 128–130) and Nielsen (2006: 7–8). Some lexicographic components contain data that aim to facilitate the actual use of the dictionary (called use-related components) and others contain data supporting the function(s) of the dictionary (called function-related components), and a description, analysis and evaluation should take this dichotomy into account. It may also be relevant to compare the data of one dictionary with the data of one or more other dictionaries (comparative reviewing) and/or previous editions of the dictionary concerned. Other aspects to consider in connection with lexicographic data are whether definitions and explanations are correct, whether they can be easily understood by the dictionary’s target group, whether they support the dictionary function(s), whether they have been subjected to textual condensation and, if so, to what extent this impacts on the understanding of the data; further see Section 5.4 below.

5.3 Using the linguistic approach

The linguistic approach focuses on linguistic categories, principles, structures etc. and is relevant to all three significant features of dictionaries. Here I do not distinguish between LGP and LSP as most general dictionaries contain both LGP and LSP terms. Moreover, the language of specialised subject fields is characterised by the special use of linguistic options, for instance a higher frequency of complex sentence structures than in normal, unmarked language. The following examples form a brief and non-exhaustive list of issues that reviewers may take into consideration when using the linguistic approach: 1) Grammar, including syntax, morphology, and orthography; 2) Pronunciation, realised in writing or through sound files; 3) Synonyms, antonyms and hyperonyms; 4) Collocations, phrases and examples; 5) Language usage; 6) Translation strategies. It is impossible to address all the aspects that fall under the linguistic approach in a paper like this, but a few comments are appropriate. If the object of a review is a bilingual dictionary designed for translation, reviewers may examine whether its data support the translation function from a linguistic perspective. For instance, translation dictionaries that do not indicate the gender of the French or German equivalents are not well suited for translation. If the articles contain gender and morphology items, both users and reviewers of the dictionary can rightly expect the data to be easy to decode, understand and use, i.e. they are correct and usable. This analysis relates the linguistic approach to lexicographic data. It may also be appropriate to analyse how language usage relates to dictionary functions. This could include the question whether a translation dictionary uses a
particular translation strategy consistently or whether its equivalents and translations of phrases are arbitrarily source-language and target-language oriented. For a detailed discussion of translation strategies in bilingual dictionaries, see Nielsen (2000). Finally, reviewers may examine whether synonyms, antonyms and hyperonyms are used consistently so as to provide users with a systematic presentation of a small sub-field within a larger subject field, for instance by informing users whether a lemma and its synonyms and antonyms are general terms, specific terms or co-hyponyms. Information like this will help users to read and understand, for instance, LSP texts.

5.4 Lexicographic information costs

An aspect that extends to all three significant features of dictionaries as well as the three approaches to reviewing is the concept of lexicographic information costs. Based on Nielsen (2008: 173–174) lexicographic information costs may be defined as the effort, especially difficulty or inconvenience, that the user believes or feels is associated with consulting a dictionary, an article or any other text part of a dictionary. The focus is on the relationship between the expected and actual information costs and the expected and actual informative value, i.e. what users gain from consulting the dictionary. Search-related information costs are costs related to the lookup activities users have to perform when consulting a dictionary to get access to the data they are searching for; this is significant in connection with the search activities necessary to use CD-ROM, Intranet and Internet dictionaries. Examples of situations involving search-related information costs include cross-references without any clear purpose; the number of individual steps users must take in the lookup process before they find what they are looking for (printed and electronic dictionaries); the need to scroll up and down the screen to find something; the possibility of searching for specific words instead of scrolling long lists; the possibility of searching for parts of words and words in their inflected forms. Comprehension-related information costs are the costs related to the user’s ability to understand and interpret the data presented in a dictionary. They are basically answers to the question: how easy or difficult is it for users to understand the data presented? The design of the dictionary may contribute to keeping the lexicographic information costs at a low level, as an inappropriate design and structure may lead to high or increasing information costs. 
The actual wording and presentation of the data in the articles, for instance a high degree of textual condensation in definitions, may increase the information costs, whereas clear and consistent search routes may reduce the lexicographic information costs. For a discussion of the effects of textual condensation in dictionary articles, see Nielsen (2002); and for a discussion of the presentation of knowledge in printed dictionaries, see Wiegand (2000).

5.5 Three general themes of scholarly dictionary reviewing

A principal aim of dictionary reviewing is to provide readers with useful information. The above discussion shows that, by combining the three approaches to reviewing with the three significant features of dictionaries, reviewers may realise the full potential of reviews. In sum, scholarly dictionary reviews should centre around three general themes:

1. Which function(s) does the dictionary have? The focus is on the types of help the dictionary is intended to give to particular types of users in particular types of situations in the real world.
2. Do the data presented in the dictionary support the function(s) identified? The focus is on whether the data match the needs of the target group(s) and whether the data are presented so that users can process them to get the information they need to solve their problems.
3. Do the structures used in the dictionary organise and link the data so that they combine with the function(s) to meet the needs of users? The focus is on whether the structures combine the data placed in different places in the dictionary in such a way that user needs are fulfilled in the best possible manner.

These general themes allow reviewers to focus on the relationship and interplay between the three significant features of a dictionary and the three approaches to reviewing. This focus may help reviewers to realise the full potential of reviews so that they become good examples of scholarly discourse. However, in order to properly meet the requirements of trustworthiness, it is helpful to identify some of the most important qualitative characteristics of useful information.

6. Requirements to the informative value of reviews

Any person who agrees to review a dictionary assumes a responsibility towards two groups of people. Firstly, the reviewer owes a duty to the author(s) of the dictionary to ensure that his review does not contain irrelevant, personal, unfair or unsubstantiated points of criticism (Harvey 1986: 54).
Secondly, the reviewer owes a duty to the readers of his review to ensure that it does not give a wrong or misleading picture of the dictionary, and that his review contains information that is useful to readers. Consequently, the reviewer must comply with what may be called generally accepted reviewing principles, i.e. the review must be made according to those methods and principles that competent (“scholarly”) and responsible lexicographers and reviewers would apply. This means that there are certain requirements to the informative value of reviews that must be complied with.



Sandro Nielsen

Irrespective of the approach and focus adopted in the review, there are certain fundamental requirements that highlight the characteristics of useful review information. The overriding qualitative requirement of a review – which also emphasises the objective of a dictionary review – is that it gives a true and fair view (ein den tatsächlichen Verhältnissen entsprechendes Bild; une image fidèle) of the dictionary concerned. To ensure that the review is as informative as possible for readers, it is important that it does not present a wrong or misleading picture of the dictionary. The term true and fair view is inspired by the terminology of financial reporting, which requires, inter alia, that financial statements must give a fair presentation of the financial position of the enterprise for the benefit of investors and other users of financial statements when they make their decisions (Alexander and Britton 2004: 198–201). This may be extended to dictionary reviews so that reviews must present fairly the functions, data, structures and interrelationships between text types of the dictionary for the benefit of intended readers when they make informed decisions in the types of situations specified in Section 2 above.

A review can only satisfy the general requirement of a true and fair view if it meets a number of underlying assumptions (zu Grunde liegende Annahmen; hypothèses de base). In this context, underlying assumptions are qualitative attributes ensuring that the information presented in dictionary reviews meets the readers’ requirement of usefulness and the scholarly requirement of trustworthiness. When analysing the qualitative characteristics of dictionary reviews, it is possible to distinguish between four underlying assumptions.

The first is the assumption of relevance (Relevanz; pertinence). The information, observations, comments, etc. are relevant if they actually relate to the objective of the review and can effectively be used by its intended audiences when they make decisions in the five general types of situation listed in Section 2 above. An example of what Hartmann (1999: 48) calls a product test in a dictionary review may serve to illustrate the assumption of relevance. It is difficult to see the relevance of the exercise when the reviewer of a business dictionary published in the year 2000 tests its lemma stock against two texts from 1986 and 1988 respectively (Bogaards 2002: 112). The terminology used in business life changed considerably between 1986 and 2000, and is still changing, so the findings of the test only show that the dictionary can be used when reading old texts that have no relation to the time when the dictionary and the review were published. The test, and consequently the information about the findings, would have been more relevant if the texts had been from the same year as the dictionary and the review.

Readers may also expect reviews to comply with the assumption of materiality (Wesentlichkeit; importance relative). A piece of information, an observation, a comment, etc. is material if its omission will distort the true and fair view of the dictionary. For instance, it is immaterial to point out spelling mistakes unless there are exceptionally many of them and they reduce the informative value of the lexicographic data. Similarly, it is hardly material to observe that a closing parenthesis is missing in a particular article unless this example is used to identify and illustrate a general, negative characteristic of the dictionary as a whole (cf. Algeo 2003: 251). Put differently, a piece of information, an observation, a comment, etc. is material if it influences the reader’s decision to use and/or buy and/or recommend the dictionary and/or apply its lexicographic principles theoretically or practically, or if it influences the reviewer’s assessment of one or more review topics, alternatively the entire dictionary. Materiality may be regarded as a threshold and be linked to the review topics listed in the Appendix. When they have identified the topics that apply to the dictionary examined, reviewers can then go on to determine the materiality of their findings in respect of those topics as they proceed with the review work. The assumption of materiality thus implies that more weight should be given to significant findings and issues than to insignificant ones, because insignificant findings and issues are unlikely to influence readers when making decisions on the basis of useful information as defined in Section 2 above. Finally, in view of the usually limited space available for dictionary reviews, it seems sensible to include only material issues.

The reliability (Verlässlichkeit; fiabilité) of the review is also important because information is only useful if readers can depend on it. A review must not contain improbable, unrealistic or wrong statements, etc. that distort the true and fair view of the dictionary. In my experience reviews are generally reliable, but the following example illustrates that this is not always the case. When a reviewer criticises a bilingual law dictionary treating British legal terminology for having lemmatised a specific American legal term, this seems justified (Jørgensen 1994). But when a closer study of the dictionary subsequently reveals that the term selected by the reviewer is not contained anywhere in the dictionary, the review is not reliable. The assumption of reliability implies that the review should be regarded as a scholarly or “serious” text and be treated accordingly by reviewers, with the result that statements must be substantiated so as to avoid pure value judgements.

Finally, readers must be entitled to expect reviews to comply with the assumption of neutrality (Neutralität; neutralité). In this context neutrality means that the review must contain all material and relevant aspects irrespective of whether they have a negative or positive influence on the reviewer’s assessment. In other words, the review must be free from bias (frei von verzerrenden Einflüssen; sans parti pris). It follows from this assumption that the reviewer’s personal relation (if any) to the dictionary, its author(s) and publishers must not lead to the assessment and evaluation of single aspects or the dictionary in its entirety being (excessively) negative or (excessively) positive. However, the assumption of neutrality does not prevent reviewers from discussing lexicographic theories and principles that compete with or contradict those used by the authors of the dictionary under review.

It should be noted that the relative importance of the underlying assumptions discussed above is a matter of informed or “professional” judgement to be exercised on a case-by-case basis. Furthermore, it should be the duty of any reviewer to give a fair, and preferably comprehensive, description and evaluation of the dictionary concerned.

7. Concluding remarks

The reviewing of electronic and printed dictionaries is not an exercise in linguistics or in subject fields but an exercise in lexicography. It does not follow from this that dictionary reviews cannot or should not be based on a linguistic approach, but that the linguistic approach is only one of three approaches to dictionary reviewing. Similarly, the linguistic and factual competences of reviewers should not be relegated to an insignificant position in the review process. Moreover, reviewers should define the object of their reviews, the dictionary, in terms of significant lexicographic features in order to give priority to lexicography and dictionary functions, as this emphasises the fact that dictionaries are much more than mere vessels of linguistic categories, namely lexicographic tools that have been developed to fulfil specific types of needs of specific types of users in specific types of situations in the real (extra-lexicographic) world.

The above proposed basis for a framework contains an outline of general theoretical and practical principles that underlie the true nature of dictionary reviews, and places the reviews in a lexicographic universe with the dictionary and lexicography at its centre. This seems to be in line with the modern understanding of lexicography as a separate academic discipline concerned with the compilation, design, evaluation and use of dictionaries. Moreover, a set of generally applicable principles may lead the discourse community to accept dictionary reviews as an important part of the scholarly discourse. Finally, it may result in reviews that actually contribute to the development of lexicographic theories and dictionary compilation.

References

A. Dictionaries

Pickett, J.P. et al. (eds.). 2004. American Heritage College Dictionary. Fourth edition. Boston and New York: Houghton Mifflin.

Sinclair, J. (ed.). 1995. Collins Cobuild English Dictionary. Third edition. London: HarperCollins.
Summers, D. et al. (eds.). 2005. Longman Dictionary of Contemporary English. Fourth edition. Harlow: Longman.
Mish, F.C. et al. (eds.). 2004. Merriam-Webster’s Collegiate Dictionary. Eleventh edition. Springfield, MA: Merriam-Webster.
Mish, F.C. et al. (eds.). 2008. Merriam-Webster Online Dictionary. Merriam-Webster Inc. (Accessed 3 September 2008).
Wehmeier, S. et al. (eds.). 2005. Oxford Advanced Learner’s Dictionary. Oxford: Oxford University Press.
Simpson, J. (ed.). 2008. Oxford English Dictionary Online. Oxford University Press. (Accessed 3 September 2008).

B. Other literature

Alexander, D. and Britton, A. 2004. Financial Reporting. London: Thomson.
Algeo, J. 2003. “[Review of] The New Oxford American Dictionary. Ed. Elizabeth J. Jewell, Frank Abate, and others. Oxford: Oxford University Press, 2001.” Dictionaries 24: 236–252.
Bergenholtz, H. 2003. “Ordbogsanmeldelser i LexicoNordica.” LexicoNordica 10: 7–26.
Bergenholtz, H. and Mogensen, J.E. 1993. “Wörterbuchkritik in Dänemark.” Lexicographica 9: 8–35.
Bergenholtz, H., Tarp, S. and Wiegand, H.E. 1999. “Datendistributionsstrukturen. Makro- und Mikrostrukturen in neueren Fachwörterbüchern.” In Fachsprachen. Ein internationales Handbuch zur Fachsprachenforschung und Terminologiewissenschaft, Teilband 2, L. Hoffmann, H. Kalverkämper and H.E. Wiegand (eds.), 1762–1832. Berlin/New York: Walter de Gruyter.
Bogaards, P. 2002. “[Review of] DAF: An Innovative Learners’ Dictionary of Business French.” International Journal of Lexicography 15(1): 105–116.
Chan, A.Y.W. and Loong, Y. 1999. “Establishing criteria for evaluating a learner’s dictionary.” In Language Analysis, Description and Pedagogy, R. Berry, B. Asker, H. Hyland and M. Lam (eds.), 298–307. Hong Kong: Hong Kong University of Science and Technology.
Chan, A.Y.W. and Taylor, A. 2001. “Evaluating learner dictionaries: What the reviews say.” International Journal of Lexicography 14(3): 163–180.
Hartmann, R.R.K. 1999. Teaching and Researching Lexicography. Harlow: Longman.
Harvey, J. 1986. “Social sciences.” In Reviews and Reviewing: A Guide, A.J. Walford (ed.), 53–89. London: Mansell Publishing.
Hiligsmann, P. 2005. “[Review of] Marja Verburg and Ruud Stumpel (eds.) Van Dale Pocketwoordenboek Nederlands als tweede taal. Utrecht and Antwerpen: Van Dale Lexicografie. 2003.” International Journal of Lexicography 18(3): 384–391.
Jackson, H. 2002. Lexicography. An Introduction. London/New York: Routledge.
Jørgensen, S. 1994. “[Review of] Juridisk Basisordbog.” Sprog & Erhverv 2: 18.
Kleinedler, S. 2000. “[Review of] Computing Dictionary. Fourth Edition. Ed. Ronald D. Kobler, Trevor Meers, and others. Lincoln, NE: Sandhills Publishing. 1999.” Dictionaries 21: 136–142.
Kling, R.K. and McKim, G. 1999. “Scholarly communication and the continuum of electronic publishing.” Journal of the American Society for Information Science 50(10): 890–906.
Kotzé, E. 2008. “[Review of] Phillip Louw, Lorna Hiles (eds.) et al. Oxford Afrikaans-Engels/English-Afrikaans Skoolwoordeboek/School Dictionary. Cape Town: Oxford University Press. 2007.” International Journal of Lexicography 21(2): 207–212.
Meer, G. van der and Sansome, R. 2001. “OALD in a linguistic and a language teaching perspective.” International Journal of Lexicography 14(4): 283–306.
Miranda, E.O. 1996. “On book reviewing.” Journal of Educational Thought 30(2): 191–202.
Nielsen, S. 1990. “Contrastive description of dictionaries covering LSP communication.” Fachsprache/International Journal of LSP 3–4: 129–136.
Nielsen, S. 1994. The Bilingual LSP Dictionary. Principles and Practice for Legal Language. [Forum für Fachsprachen-Forschung 24]. Tübingen: Gunter Narr.
Nielsen, S. 1999. “Mediostructures in bilingual LSP dictionaries.” Lexicographica 15: 90–113.
Nielsen, S. 2000. “Translation strategies for culture-specific textual conventions in bilingual dictionaries.” Lexicographica 16: 152–168.
Nielsen, S. 2002. “Textual condensation in the articles of de Gruyter Wörterbuch Deutsch als Fremdsprache.” In Perspektiven der pädagogischen Lexikographie des Deutschen II. Untersuchungen anhand des de Gruyter Wörterbuch Deutsch als Fremdsprache, H.E. Wiegand (ed.), 597–608. Tübingen: Max Niemeyer.
Nielsen, S. 2003. “Changes in dictionary subject matter.” In Untersuchungen zur kommerziellen Lexikographie der deutschen Gegenwartssprache I. »Duden. Das Große Wörterbuch der deutschen Sprache in zehn Bänden«, H.E. Wiegand (ed.), 109–114. Tübingen: Max Niemeyer.
Nielsen, S. 2006. “A functional approach to user guides.” Dictionaries 27: 1–20.
Nielsen, S. 2008. “The effect of lexicographical information costs on dictionary making and use.” Lexikos 18: 170–189.
Osselton, N. 1989. “The history of academic dictionary criticism with reference to major dictionaries.” In Dictionaries. An International Handbook of Lexicography, vol. 1, F.J. Hausmann, O. Reichmann, H.E. Wiegand and L. Zgusta (eds.), 225–230. Berlin/New York: Walter de Gruyter.
Phillips, E.M. and Pugh, D.S. 2005. How to get a PhD. A handbook for students and their supervisors. Maidenhead: Open University Press.
Ripfel, M. 1989. Wörterbuchkritik. Eine empirische Analyse von Wörterbuchrezensionen. Tübingen: Max Niemeyer.
Steiner, R.J. 1979. “Guidelines for reviewers of bilingual dictionaries.” Dictionaries 1: 166–181.
Tarp, S. 1998. “Leksikografien på egne ben. Fordelingsstruktur og byggedele i et brugerorienteret perspektiv.” Hermes 21: 121–137.
Tomaszczyk, J. 1986. “The bilingual dictionary under review.” In ZüriLex ’86 Proceedings. Papers read at the EURALEX International Congress, University of Zürich, 9–14 September 1986, M. Snell-Hornby (ed.), 289–297. Tübingen: Francke.
Wiegand, H.E. 2000. “Wissen, Wissensrepräsentation und Printwörterbücher.” In Proceedings of the Ninth Euralex International Congress, EURALEX 2000, Stuttgart, Germany, August 8th–12th, 2000, U. Heid et al. (eds.), 15–38. Stuttgart: Institut für Maschinelle Sprachverarbeitung.

Reviewing printed and electronic dictionaries 

Appendix: List of review topics, adapted from Bergenholtz (2003: 20).

1. Communicative, cognitive and operative lexicographic functions
2. Dictionary users: cultural, factual, linguistic and translation competences
3. Guidance for use: straight-text format, illustrated, extent to which it relates to functions
4. Price
5. Layout/web-design
6. Information about the author(s)
7. Comparison with other dictionaries or earlier editions
8. The history and background of the dictionary
9. References to other reviews
10. Information about the reviewer(s)
11. Empirical basis of the dictionary
12. Outside matter or metatexts: integration/non-integration, interrelationship, support of function(s)
13. Lemma selection: derivatives, affixes, irregular spellings, spelling variants, multi-word units
14. Lemma presentation
15. Access routes and search options
16. Article structure, data types, article-internal search zones
17. Prescriptive/descriptive/proscriptive language usage guidance
18. Equivalents and equivalence: full, partial or zero equivalence, translation strategies
19. Cultural information: culture-specific, comparative
20. Grammar: grammar codes, natural language presentation
21. Orthography
22. Pronunciation: written, audible
23. Semantic/encyclopaedic data: restricted defining vocabulary, easy to understand
24. Diasystematic labelling: logical, easy to decode
25. Etymology
26. Example sentences: grammatical, citation or competence examples
27. Collocations and phrases
28. Idioms
29. Illustrations: drawings, pictures, video clips
30. Synonyms/antonyms/hyperonyms
31. Cross-references and links: extent to which they support function(s)
32. Information costs: search-related, comprehension-related
33. The entertainment value of the dictionary
34. Positive, negative or mixed overall conclusion

Reflections on data access in lexicographic works

Sven Tarp

This article discusses the need for quick and easy access to the relevant lexicographic data in the light of the function theory. In this respect, it raises the question: How many data does a user need in each consultation? As many as possible? Or as few as possible? The answer provided in the article is that the user needs exactly the amount of data that are necessary in order to meet his or her information needs, neither more nor less. If there are too few data the user’s needs will not be met, whereas too many data may confuse the user and make the spotting of the needed data slower and more difficult. On this basis, the article raises a new question: What is exactly the necessary amount of data needed in each consultation? In order to answer this question, the article introduces a distinction between concrete user needs and types of user needs related to a specific type of user situation. In this light, the article discusses a number of articles taken from printed and electronic dictionaries as well as a number of hypothetic articles based upon the former in order to explore some of the possibilities made available to lexicography by the new information technologies.

Keywords: lexicography; data access; user needs; functional theory; article structures; search criteria

1. Introduction

Nothing remains forever what it was. Everything is on the move. Everything is transformed by nature and forced into new paths. (Titus Lucretius Carus 99–55 BC)

Lexicography is on the move, transformed not by nature, but by society. The new technologies have placed lexicography in the troubled waters between the “old” printed dictionaries and the new electronic ones. However, the metamorphosis is far from completion. Lexicography is making for a safe port which it has still not entered. Just as a ship cannot make for its final destination without a captain, lexicography cannot evade the many hidden rocks on its long and never-ending Odyssey without an advanced theory. In this respect, neither the stubborn pragmatism which has dominated lexicographic literature for so long nor the blind empiricism which, in the name of user research, has tried to impose itself during the last two or three decades has anything to offer but endless troubled waters. In his posthumously published essay Natural Science and the Spirit World, the German philosopher Friedrich Engels identified “the most certain path from natural science to mysticism” as “the shallowest empiricism that spurns all theory and distrusts all thought.” And he continued:

Indeed, dialectics cannot be despised with impunity. However great one’s contempt for all theoretical thought, nevertheless one cannot bring two natural facts into relation with one another, or understand the connection existing between them, without theoretical thought. The only question is whether one’s thinking is correct or not, and contempt of theory is evidently the most certain way to think naturalistically, and therefore incorrectly. (Engels 1990: 354)

What holds true for natural science also applies to social sciences to which lexicography as a theory belongs according to the views shared by Professor Henning Bergenholtz and the author of this contribution. The only difference is that contempt for theoretical thought may not lead lexicography to mysticism as such, but to mystified users of its practical products, i.e. dictionaries, lexicons, thesauri, encyclopaedias, etc. Although the very concept of a lexicographic theory, i.e. the possibility of developing a theory, has been questioned by some modern supporters of Popper, such a theory is not only highly desirable as mentioned above, but also perfectly possible provided it is adapted to the nature and characteristics of the discipline in question (cf. Bergenholtz et al. 2008). At an abstract level, a theory is a systematically organised set of statements about a given subject field. In this sense, the basic statement of lexicographic theory is that dictionaries and other types of lexicographic works are utility products, i.e. cultural, man-made artefacts originally conceived and manufactured with the genuine purpose of satisfying specific types of human needs. Whatever the individual lexicographer thinks of the result of his practical work, it is a fact that the first dictionaries ever produced saw the light of day in response to certain needs observed in ancient society and that whoever consults a dictionary, even today, does it in order to satisfy needs of a specific type. In this light, the fundamental challenge of lexicography is to characterise and typologise these needs in order to establish a basis upon which the corresponding lexicographic solutions can be found and developed. However, the German scholar H.E. Wiegand, who was the first to explicitly state what many lexicographers before him had implicitly taken for granted, i.e. that dictionaries are utility tools, made his statement in the age of the printed dictionaries (cf. Wiegand 1987). 
For this reason, the response to lexicography’s fundamental challenge has to be reconsidered, and maybe even modified, in the light of the new technologies made available to practical lexicography.

2. The need for information

The common nature of all lexicographically relevant needs is that they are needs for information which may subsequently be used for a huge variety of purposes such as successful communication, storing of knowledge, performing of practical and mental operations, and interpretation of non-verbal signs (cf. Tarp 2008b). This concept of user needs transcends the ideas reflected in lexicographic practice for a very long period, at least in Europe, although recent research has shown that dictionaries produced during the Enlightenment display features of surprising “modernity” long ago forgotten. The Universal Dictionary of Trade and Commerce published in 1774 is, for instance, a dictionary containing lexicographic data of which some can only be understood as instructions helping the user to perform non-verbal operations of the mentioned types, i.e. a lexicographic function that has subsequently been relegated to manuals, handbooks, how-to’s, instruction books, user guides, and similar text types (cf. Tarp 2007; 2008a).

Meeting needs for information has certainly been the purpose of dictionaries from the very beginning. In this respect there is nothing new under the sun. What is new is the focus on information in the present-day society which has even nicknamed itself “the information society”. Although dictionaries, and lexicographic works in general, have never been the only source for human beings interested in acquiring new information, today the number of “competitors” has grown considerably, especially due to the large amount of texts placed on the internet and made accessible through various types of search engines, among them Google, Yahoo, and others. Never before have so many data been available to so many people. However, one thing is the amount of available data; another thing is the access to these data.
Access to relevant data has always been a problematic topic. In the Middle Ages, when far fewer books and other text types were available, it was quite a challenge to find the library that stored exactly the book or other text type where the relevant data could be found. Many researchers spent their whole life without discovering the text that might have solved their specific information needs and taken their research to a higher level. Even today, the famous Santa Cruz Library in Valladolid in Spain does not have a complete list of all lexicographic works included in its collection which comprises more than 13,000 titles from before 1815, for which reason it is very difficult, if not impossible, to find the relevant title and then the relevant data from which the needed information can be retrieved.



In many respects, things have not improved in the modern world in spite of the introduction of the marvellous new information technologies. Today it is possible to track down the needed data, especially on the internet, but all too often the process of finding them is very time-consuming. For instance, when someone “googles” on the internet in order to find data from which information can be retrieved, he or she will frequently, even when specifying the search criteria, end up with references to hundreds, if not thousands, of web pages. In this way, the user runs the risk of suffering what has been called “the information death”, i.e. being suffocated by the overwhelming amount of data popping up. In a thought-provoking contribution, Nielsen (2008) discusses the enormous costs that are the result of this process in relation to the use of lexicographic works. And the costs are far higher when viewed in relation to data access in general.

Quick and easy access to the relevant data is therefore not only a must in terms of user friendliness, but also in economic and logistic terms because the corresponding waste of time and manpower is synonymous with undue costs as argued by Nielsen. In this light, it is highly surprising that the rapid introduction of the new information technologies, with a few exceptions (e.g. Bergenholtz and Gouws 2007), has not been accompanied by a broader discussion of the concepts of data accessibility and information retrieval from a needs-adapted perspective. For instance, although there is a relatively comprehensive literature about information retrieval, this literature seems to have a completely different approach in terms of specifying the real user needs and relating them to real types of users and user situations (cf. Ingwersen 1992; Manning et al. 2008).

3. The relevance of the function theory

The modern function theory, developed by researchers at the Aarhus-based Centre for Lexicography of which Professor Bergenholtz has been the director since its foundation in 1995, provides a set of statements about lexicographic user needs which may assist the formulation of a future theory of information and data access not only in lexicographic works, but also in other text types conceived for consultation and retrieval of information. The most important of these statements is that users in general never need information in general. The type of information needed is always concrete and depends both on the concrete type of user and on the concrete type of situation in which the need occurs. Analogous to other utility tools, this means that the needs of potential users of information tools cannot be defined without specifying who needs what and in which situation.

Research has shown that the situation in which the needs occur is the basic element in defining the type of information required and that the criteria for the characterization of the various user types depend on the specific user situation in question. The function theory has until now discovered four fundamental types of user situations, i.e. the communicative, cognitive, operative and interpretative situations, although it cannot be ruled out that further situations will be identified as the result of future research. In the communicative, operative and interpretative situations the users’ needs for information will always be punctual, i.e. related to the solution of specific here-and-now problems or clarification of similar doubts, whereas in the cognitive situations the users may need either punctual information in order to fill a specific knowledge gap or global information in order to memorise it as part of a systematic study of a given topic. The need for global (or systematic) information is traditionally satisfied by books, text books, scientific articles and similar text types and does not belong to the subject field of lexicography which only deals with punctual needs that can be covered through consultation of the corresponding information tools such as dictionaries, lexicons, thesauri, encyclopaedias, and other tools based upon lexicographic principles.

Although the terms data and information are frequently confused, especially in the lexicographic literature, it is important to underline that texts do not contain information, but only data from which the corresponding information may or may not be retrieved by a particular reader or user (cf. Wiegand 2000a; 2002). In this light, lexicographic works contain data which have been, or ought to have been, selected, prepared and presented according to lexicographic principles with the genuine purpose of meeting the specific types of information needs which a specific type of users may have in a specific type of situation.

4. Types of needs versus concrete needs

The selection, preparation and presentation of needs-adapted data in lexicographic works, as well as in other information tools conceived for consultation, is only part of the solution, because these data should also be quickly and easily accessible in order to meet all the needs of the target user group and avoid the costs mentioned by Nielsen (2008). However, before discussing the possible solution that can ensure this quick and easy data access process it is necessary to answer the following questions: How many data does a user need in each consultation? As many as possible? Or as few as possible? The right answer to these questions is without any doubt that the user needs exactly the amount of data that are necessary in order to meet his or her information needs, neither more nor less. If there are too few data the user’s needs will not be met, whereas too many data may confuse the user and make the spotting of the needed data slower and more difficult.



Consequently, a new question arises: What is exactly the necessary amount of data needed in each consultation? In order to answer this question, a distinction should be made between concrete user needs and types of user needs related to a specific type of user situation. Here there is a basic contradiction in terms of approach. As a rule, lexicographers when planning and compiling a dictionary do only, and can only, work with types of needs to be satisfied by the lexicographic data, whereas users always consult dictionaries with a concrete need for information. The determination of the types of user needs is the result of a process of abstraction and these types of needs may, as such, also be considered abstract needs as opposed to concrete needs. They can be achieved by means of at least two different methods, i.e. deduction (the quick method) and the study of empirical data (the slow method), and they may in any case, if so desired, be verified – or falsified – empirically through the observation of the concrete needs occurring for concrete users in concrete situations (e.g. text production and text reception) and based upon the methods of modern sociology and statistics (cf. Bergenholtz et al. 2008; Tarp 2008c). In some cases the data needed to satisfy the type of needs are identical with the data needed to satisfy the concrete needs, but in other cases there may be a bigger or smaller discrepancy between the former and the latter in terms of both the content and the amount of data. How to deal with this discrepancy? There are basically two solutions, of which one is only possible in electronic dictionaries and tools. 
The first is the production of mono-functional dictionaries where the data in each individual article are presented or structured in such a way that the user can easily locate the data considered most relevant and, if needed, proceed to additional data included in the same article or elsewhere in the dictionary (although cross-references always slow down the overall consultation process). The second solution, which is only possible in electronic tools, is to prepare a preliminary interactive phase where the lexicographic tool helps the users to identify and specify their concrete needs before being guided to the corresponding data (article). In the following sections we will take a closer look at the two solutions.

5. The well-structured article

In poly-functional dictionaries the total amount of data included in the respective articles has to serve two or more functions. Although one type of data may sometimes support more than one function, in many cases it will only be relevant to one function. Consequently, when users who find themselves in a specific situation consult a poly-functional dictionary in order to get lexicographic assistance, they may end up with a lot of superfluous data whose only real function is to complicate and

Reflections on data access 

slow down their specific consultation process. If quick and easy data access is desired and given top priority, this is a strong argument for the production of mono-functional dictionaries. The article reproduced in Example 1, taken from a poly-functional English learner’s dictionary which, according to the publishing house, is compiled to assist users with both text production and reception, illustrates the problem.

tooth /tu:θ/ noun (pl. teeth /ti:θ/) 1 any of the hard white structures in the mouth used for biting and chewing food: I’ve just had a tooth out at the dentist’s. ○ to brush/clean your teeth ○ tooth decay ○ She answered through clenched teeth (= opening her mouth only a little because of anger). ○ The cat sank its teeth into his finger. – picture => body – see also buck teeth, false teeth, milk tooth, wisdom tooth 2 a narrow pointed part that sticks out of an object: the teeth on a saw – picture => fastener – see also fine-tooth comb idm cut your teeth on sth to do sth that gives you your first experience of a particular type of work cut a ‘tooth (of a baby) to grow a new tooth get your ‘teeth into sth (informal) to put a lot of effort and enthusiasm into sth that is difficult enough to keep you interested: Choose an essay topic that you can really get your teeth into. have ‘teeth (BrE, informal) (of an organization, a law, etc.) to be powerful and effective in the teeth of sth 1 despite problems, opposition, etc: The new policy was adopted in the teeth of fierce criticism. 2 in the direction that a strong wind is coming from: They crossed the bay in the teeth of a howling gale. set sb’s ‘teeth on edge (of a sound or taste) to make sb feel physically uncomfortable: Just the sound of her voice sets my teeth on edge. – more at armed v., bare v., bit., eye n., eye teeth, fight v., gnash, grit v., hell, kick n., kick n., lie2 v., long adj., red adj., skin n., sweet adj.

Example 1: Dictionary article from Oxford Advanced Learner’s Dictionary (2005)

If a user, in this case a foreign learner of English, has a reception problem related to tooth, it is relatively easy to find the first meaning, i.e. the hard white structures in the mouth used for biting and chewing food, but with the design chosen for the article in Example 1 it is much more difficult to access the second meaning, i.e. a narrow pointed part that sticks out of an object. This problem could, of course, be partially solved by means of a convenient internal structure based upon the use of search fields (cf. Wiegand 2000b). But the ideal way to deal with this problem in printed dictionaries would be to design them as mono-functional dictionaries, leave out all data that are not relevant for the function in question and keep only the relevant data, i.e. the data required to meet the so-called primary (function-related) and



secondary (use-related) user needs (cf. Tarp 2008d). Based upon these criteria the article tooth will have the content reproduced in Example 2.

tooth noun (pl. teeth)
1 any of the hard white structures in the mouth used for biting and chewing food: I’ve just had a tooth out at the dentist’s.
2 a narrow pointed part that sticks out of an object: the teeth on a saw.

Example 2: Hypothetical dictionary article based upon Oxford Advanced Learner’s Dictionary (2005)

The dictionary article in Example 2, which is structured according to the principle of search fields, contains, apart from the lemma, data about part of speech and the plural form of tooth as well as two short definitions followed by authentic examples whose function is to support the definitions. The indication of part of speech and the plural form serves the secondary user needs and helps users to confirm that they have accessed the right article. Indeed, the above data are everything that foreign learners of English need in order to be assisted lexicographically when having a reception problem related to the single word tooth, although the concrete user in the concrete consultation may only need one of the definitions without the supporting text example. However, if users have reception problems related to word combinations with tooth, there are various possible solutions in printed dictionaries, the first of which is the traditional one where the respective word combinations are addressed to a lemma representing one of the words contained in them (Example 3).

tooth noun (pl. teeth)
1 any of the hard white structures in the mouth used for biting and chewing food: I’ve just had a tooth out at the dentist’s.
2 a narrow pointed part that sticks out of an object: the teeth on a saw.
word combinations
cut your teeth on sth to do sth that gives you your first experience of a particular type of work
cut a ‘tooth (of a baby) to grow a new tooth
get your ‘teeth into sth to put a lot of effort and enthusiasm into sth that is difficult enough to keep you interested: Choose an essay topic that you can really get your teeth into.
have ‘teeth (of an organization, a law, etc.) to be powerful and effective
in the teeth of sth 1 despite problems, opposition, etc: The new policy was adopted in the teeth of fierce criticism. 2 in the direction that a strong wind is coming from: They crossed the bay in the teeth of a howling gale.
set sb’s ‘teeth on edge (of a sound or taste) to make sb feel physically uncomfortable: Just the sound of her voice sets my teeth on edge.

Example 3: Hypothetical dictionary article based upon Oxford Advanced Learner’s Dictionary (2005)

This may be the only realistic solution when dealing with very voluminous dictionaries like the Oxford Advanced Learner’s Dictionary (2005), which already includes more than 70,000 lemmata, inasmuch as a more focussed solution would require several volumes and thus create a new type of problem in terms of consultation price. However, if the publishers’ real problems are ignored for a brief moment, a more focussed solution would be to select lemmata for each of the many word combinations which may generate reception problems. In this case the dictionary will, among others, contain the lemmata included in Example 4.

cut a ‘tooth idiom (of a baby) to grow a new tooth
cut your teeth on sth idiom to do sth that gives you your first experience of a particular type of work
get your ‘teeth into sth idiom to put a lot of effort and enthusiasm into sth that is difficult enough to keep you interested: Choose an essay topic that you can really get your teeth into.
have ‘teeth idiom (of an organization, a law, etc.) to be powerful and effective
in the teeth of sth idiom 1 despite problems, opposition, etc: The new policy was adopted in the teeth of fierce criticism. 2 in the direction that a strong wind is coming from: They crossed the bay in the teeth of a howling gale.
set sb’s ‘teeth on edge idiom (of a sound or taste) to make sb feel physically uncomfortable: Just the sound of her voice sets my teeth on edge.
tooth noun (pl. teeth) 1 any of the hard white structures in the mouth used for biting and chewing food: I’ve just had a tooth out at the dentist’s. 2 a narrow pointed part that sticks out of an object: the teeth on a saw.

Example 4: Hypothetical dictionary articles based upon Oxford Advanced Learner’s Dictionary (2005)
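
The step from the nested article in Example 3 to the separately lemmatised articles in Example 4 can be pictured as a simple flattening of the word list. The following is only a minimal sketch under stated assumptions: the class `Article`, its field names and the function `lemmatise_combinations` are hypothetical illustrations and do not describe how any of the dictionaries mentioned is actually compiled; the stress marks of the original lemmata are left out for simplicity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Article:
    """Hypothetical article: a lemma with its definitions and nested word combinations."""
    lemma: str
    part_of_speech: str
    definitions: List[str]
    # Word combinations addressed to this lemma, as in Example 3.
    combinations: List["Article"] = field(default_factory=list)


def lemmatise_combinations(articles: List[Article]) -> List[Article]:
    """Give every nested word combination an article of its own, as in Example 4."""
    flat: List[Article] = []
    for article in articles:
        # The single-word article keeps only its own definitions.
        flat.append(Article(article.lemma, article.part_of_speech, article.definitions))
        # Each word combination becomes an independent lemma.
        flat.extend(lemmatise_combinations(article.combinations))
    # One alphabetical word list containing single words and combinations alike.
    return sorted(flat, key=lambda a: a.lemma.lower())


tooth = Article(
    "tooth",
    "noun",
    [
        "any of the hard white structures in the mouth used for biting and chewing food",
        "a narrow pointed part that sticks out of an object",
    ],
    combinations=[
        Article("have teeth", "idiom", ["to be powerful and effective"]),
        Article("cut a tooth", "idiom", ["(of a baby) to grow a new tooth"]),
    ],
)

word_list = lemmatise_combinations([tooth])
print([a.lemma for a in word_list])
```

In this sketch a user with a reception problem related to a whole word combination can go directly to the corresponding lemma in the word list instead of reading through the article tooth.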





Of course, the lemmatised forms of these word combinations can be discussed. But apart from that, the solution presented in Example 4 is probably much more user friendly in terms of quick and easy access to exactly the place where the needs of foreign learners with reception problems related to English word combinations may be satisfied. For instance, users who have a reception problem related to the idiom have teeth may go directly to the article where a short explanation of this idiom is provided instead of first going to the article tooth (Example 1) and then speed-reading it in order to locate the small idiom squeezed in between others. However, this solution requires that the actual users in each case are aware that they have a reception problem related to the whole idiom have teeth and not only to one of the words contained in it, in this case tooth, because they may think that tooth has other meanings than the ones already known to them. In such cases, the users will need cross-references from the single-word article to the multi-word articles (Example 5).

tooth noun (pl. teeth)
1 any of the hard white structures in the mouth used for biting and chewing food: I’ve just had a tooth out at the dentist’s.
2 a narrow pointed part that sticks out of an object: the teeth on a saw.
Cross references
→ cut your teeth on sth
→ cut a ‘tooth
→ get your ‘teeth into sth
→ have ‘teeth
→ in the teeth of sth
→ set sb’s ‘teeth on edge

Example 5: Hypothetical dictionary article based upon Oxford Advanced Learner’s Dictionary (2005)

There are no known examples of learner’s dictionaries where both single words and word combinations have been lemmatised on a full scale, but there are various dictionaries of idioms specially conceived for foreign learners or other types of users, for instance the Cambridge International Dictionary of Idioms (1998).
However, in these cases users who have reception problems will have to consult at least two different dictionaries, one when they have problems related to single words and another when they have problems related to idioms. Such a solution is possible, but it requires either a complex system of cross-references between the dictionaries or that, before each consultation, users know whether they have a single-word problem or a word combination problem, something that can hardly be expected from learners who have no real knowledge of the respective linguistic

categories. On the other hand, there is at least one known example of a dictionary for Danish learners of Danish that includes both single words and word combinations in its central word list. This dictionary is Dansk Glossarium, published in 1800 by the vicar Jens Høier Leth, but it has only managed to include all these words and combinations in one volume because the total number of lemmata is relatively small. Thus, it may inspire future lexicographers, but it has still not met the challenge of designing a one-volume dictionary able to meet all needs related to one lexicographic function. Tarp (2008d) writes in relation to the primary information needs in terms of text reception in the mother tongue:

When people encounter problems in understanding their mother tongue, the reason is normally that they do not understand the meaning of a particular word, idiom or proverb. But sometimes the reason is that they do not know whether the word in question has any stylistic or cultural restrictions. As a result, potential users need information about meanings and pragmatic and cultural restrictions in order to understand a text in their mother tongue. Finally, the reason for any lack of understanding may be that people do not recognise a particular irregular inflection, making it necessary to select such forms with references to the basic form. (Tarp 2008d: 70)

For this reason the following information needs are listed by Tarp: meaning of lemmata, idioms, meaning of idioms, proverbs, meaning of proverbs, pragmatic and cultural restrictions, and irregular inflection forms as lemmata. This way of describing the problem is valid inasmuch as it refers to traditional dictionaries and dictionary articles like the one in Example 1 from the Oxford Advanced Learner’s Dictionary (2005). But at the highest level of abstraction the number of data types required in order to meet the primary information needs in terms of text reception can be reduced to only two, i.e. the lemma and the meaning. Although the concept of a lemma may differ from the one suggested by Wiegand (1983), at this level of abstraction the lemmata could be single words, irregular inflection forms, combinations of words, or even something else, whereas the users’ need for information about meaning could be covered by explanations, equivalents, synonyms, examples and illustrations, or a combination of these, employing the users’ mother tongue or a foreign language depending on the type of dictionary. This solution may not be practicable in printed dictionaries compiled for anything other than a reduced vocabulary, but it nevertheless points to the future when electronic dictionaries become part of the discussion and the new technologies are used as they ought to be. This will be discussed in the following section.
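
The reduction to only two data types, lemma and meaning, can be made concrete with a small data model. The sketch below is an illustration under stated assumptions and not a description of any existing dictionary database: the names `LemmaKind`, `MeaningKind`, `Meaning`, `Entry`, `build_word_list` and `consult` are hypothetical, and the explanation given for the inflected form teeth is invented for the purpose of the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional, Tuple


class LemmaKind(Enum):
    SINGLE_WORD = "single word"
    INFLECTION_FORM = "irregular inflection form"
    WORD_COMBINATION = "combination of words"


class MeaningKind(Enum):
    EXPLANATION = "explanation"
    EQUIVALENT = "equivalent"
    SYNONYM = "synonym"
    EXAMPLE = "example"


@dataclass(frozen=True)
class Meaning:
    kind: MeaningKind
    text: str


@dataclass(frozen=True)
class Entry:
    """At the highest level of abstraction: a lemma (of any kind) plus its meaning data."""
    lemma: str
    lemma_kind: LemmaKind
    meanings: Tuple[Meaning, ...]


def build_word_list(entries: Iterable[Entry]) -> Dict[str, Entry]:
    """Index all lemmata, whatever their kind, in one common word list."""
    return {entry.lemma.lower(): entry for entry in entries}


def consult(word_list: Dict[str, Entry], query: str) -> Optional[Entry]:
    """Return the entry for the word or word combination typed by the user, if any."""
    return word_list.get(query.strip().lower())


word_list = build_word_list([
    Entry("tooth", LemmaKind.SINGLE_WORD, (
        Meaning(MeaningKind.EXPLANATION,
                "any of the hard white structures in the mouth used for biting and chewing food"),
        Meaning(MeaningKind.EXAMPLE, "I've just had a tooth out at the dentist's."),
    )),
    # Invented explanation: an irregular form selected as lemma with reference to the basic form.
    Entry("teeth", LemmaKind.INFLECTION_FORM, (
        Meaning(MeaningKind.EXPLANATION, "plural of tooth"),
    )),
    Entry("have teeth", LemmaKind.WORD_COMBINATION, (
        Meaning(MeaningKind.EXPLANATION, "to be powerful and effective"),
    )),
])

hit = consult(word_list, " Have teeth ")
print(hit.lemma_kind.value if hit else "no entry")
```

The point of the sketch is only that a single lookup serves all kinds of lemmata; which kinds of meaning data are shown would depend on the type of dictionary.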



6. The interactive phase

The last two decades or so have seen the massive introduction of dictionaries based upon various electronic platforms such as CD-ROMs and the internet. However, generally speaking they have still not completely renewed lexicography, but are what Henry Ford would have called faster horses, i.e. the same old horses just running a little faster due to the new search facilities. The data presentation from which the enthusiastic user finally has to retrieve the needed information is still more or less the traditional dictionary article that has not been adapted to the new possibilities. The reason for this sad situation within modern lexicography is basically the dominating pragmatic and empirical trends that reject the necessity of a theoretical superstructure that may guide the conception and production of a completely new generation of lexicographic tools. Professor Henning Bergenholtz is one of the few lexicographers who have experimented with the new information technologies and introduced the first lexicographic Model T Fords that may be the forerunners of future lexicographic tools and very much desired lexicographic Mercedes Benzes and Rolls Royces. This has been reflected in a number of dictionaries authored or co-authored by Professor Bergenholtz such as the Danish OnLine Dictionary [Den Danske Netordbog] (2008), the Dictionary of Fixed Expressions [Ordbogen over Faste Vendinger] (2007), the Music Dictionary [Musikordbogen] (2006), and the Danish-English/English-Danish Dictionary of Accounting [Regnskabsordbøgerne] (2003–2008). Apart from introducing new search criteria such as associations in the Ordbogen over Faste Vendinger, the authors of this dictionary have also provided users with the option of going through an interactive phase before being guided to the respective dictionary articles (Example 6).
Search for a word that [contains | begins with | ends with | is] [Start the search]
Find the expression (understand a text)
Find the usage 1 (write a text 1)
Find the usage 2 (write a text 2)
All information (if you want to know more)

Example 6: Translation of interactive search mechanism in Ordbogen over Faste Vendinger (2007)

Immediately below the above search mechanism the user will find an instruction guide explaining the various search criteria, both in terms of the options available in the scroll menu, i.e. contains, begins with, ends with and is, and in terms of the other four options which may be described as different user situations, i.e. text reception, text production 1, text production 2, and cognition. The difference between text production 1 and 2 is that in the first case the user “knows the fixed expression or part of it, and wants to know how it is used (e.g. grammar or word combination with the expression), or if it really has the meaning that he or she thinks”, whereas in the second case “the user wants to use any fixed expression with a specific meaning, but without considering a specific expression.” (Ordbogen over Faste Vendinger 2007)
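
The four options of the scroll menu described above (contains, begins with, ends with, is) can be read as four simple string conditions on the lemma. The following minimal sketch only illustrates this reading; the function `search` and the small sample list `expressions` are hypothetical and say nothing about how the search is actually implemented in Ordbogen over Faste Vendinger.

```python
from typing import Callable, Dict, List


def search(lemmata: List[str], query: str, mode: str = "contains") -> List[str]:
    """Return the lemmata matching the query under one of the four scroll-menu options."""
    q = query.strip().lower()
    conditions: Dict[str, Callable[[str], bool]] = {
        "contains": lambda lemma: q in lemma,
        "begins with": lambda lemma: lemma.startswith(q),
        "ends with": lambda lemma: lemma.endswith(q),
        "is": lambda lemma: lemma == q,
    }
    if mode not in conditions:
        raise ValueError("unknown search option: " + mode)
    matches = conditions[mode]
    return [lemma for lemma in lemmata if matches(lemma.lower())]


# A few Danish fixed expressions used as sample data only.
expressions = ["spise brød til", "klappe hesten", "koldt vand i blodet", "se tiden an"]

print(search(expressions, "brød"))                   # default option: contains
print(search(expressions, "klappe", "begins with"))
print(search(expressions, "an", "ends with"))
```

Treating contains as the default mirrors the translated form, where the user can start a search after typing only the word or word combination.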

The search form contains is the default option and allows the user to access the respective dictionary article after performing only two actions, i.e. to write the word or word combination and activate the start function. But if the user wants to refine the search criteria, he or she may perform up to four different actions, i.e. define the search form, write the word or word combination, click on the user situation and activate the start function. The lexicographic data popping up will then depend on the criteria used. If the user for instance writes the Danish idiom spise brød til (hold your horses) and activates contains and Find the expression (reception), then the article will pop up on the screen (Example 7).

spise brød til
betydning (meaning)
• udtryk for, at nogen bør vente lidt og tænke sig om, inden de reagerer på noget
Dette udtryk bruges som en opfordring til nogen om at tage det roligt, sætte tempoet ned og ikke lade sig rive med i en overilet reaktion på en hændelse eller udvikling, som gør en spændt eller ophidset. Man kan også bruge udtrykket som en kommentar, når nogen overdriver.
faste vendinger (fixed expressions)
spise brød til

Example 7: Dictionary article from Ordbogen over Faste Vendinger (2007)

However, if users activate contains and All information (cognition), they will get to the article in Example 8, which apart from meaning and fixed expression includes data on synonyms, grammar, word combinations, authentic examples and associations.



spise brød til
idiom • neutral
betydning (meaning)
• udtryk for, at nogen bør vente lidt og tænke sig om, inden de reagerer på noget
Dette udtryk bruges som en opfordring til nogen om at tage det roligt, sætte tempoet ned og ikke lade sig rive med i en overilet reaktion på en hændelse eller udvikling, som gør en spændt eller ophidset. Man kan også bruge udtrykket som en kommentar, når nogen overdriver.
faste vendinger (fixed expressions)
spise brød til
synonymer (synonyms)
klappe hesten
klappe kebabben
koldt vand i blodet
se tiden an
Stop en halv.
stoppe en halv
grammatik (grammar)
nogen spiser brød til
ordforbindelser (word combinations)
anbefale samtidigt at spise brød til
begynde at spise brød til
burde spise brød til
heldigvis have lov til at spise brød til
spise brød til sin hysteri
spise brød til sin vrede
vælge at spise brød til
være grund til at sige spis nu lige brød til
eksempler (examples)
Aktiekursen er høj, ja – men spis nu lige brød til! Men selv om alle nu lader sig rive med og råber “sammenbrud”, bør man lige spise brød til. For vel vil weekendens begivenheder i Bruxelles betyde en udsættelse af EU-Forfatningens ikrafttræden, men helt sikkert er det, at der blot bliver tale om en forsinkelse. På The Computer Virus Myths home page bliver man bedt om at spise brød til sit virushysteri. Skal vi ikke lige spise brød til? Disse 100 millioner kan vist bruges bedre på mange andre områder i Odense kommune. Spis nu lige brød til – når man møder overraskende påstande som disse, bør ens første reaktion altid være: Kan det være rigtigt, eller er der tale om fup?
associationer (associations)
forbehold
nøle
skeptisk
tålmodig
tøve

Example 8: Dictionary article from Ordbogen over Faste Vendinger (2007)

If users activate Find the usage 1 or 2 (text production), other types of data will pop up on the screen. Consequently, what we have in this dictionary is dynamic articles including different types of data that are structured in different ways according to each type of search criterion. This represents an important innovation of lexicographic practice. However, the Ordbogen over Faste Vendinger is still not conceived completely in accordance with the principles of modern lexicographic function theory, which relates the users’ specific types of information needs not only to the respective types of user situations, but also to the specific types of users. In order to achieve this, it is necessary to work with a two-dimensional search criterion (Figure 1).

Figure 1. Abstract model of two-dimensional search criterion (one dimension: types of user situations; the other: types of users)

However, as mentioned above, the criteria for the typologization of the users depend on the respective types of user situations. It is therefore necessary to introduce a two-step search mechanism where users will first have to select the respective user situation and then, in a second interactive phase, have to choose between two or more user types in order to identify themselves. The relevance of this method becomes even clearer in relation to another of Professor Bergenholtz’s electronic dictionaries, i.e. Regnskabsordbøgerne [Dictionary of Accounting] (2003–2008). In fact, this dictionary is described by the authors as four independent dictionaries, i.e. a monolingual Danish one, a bilingual Danish-English one, a monolingual English one, and a bilingual English-Danish one. This way of typologizing the dictionaries is problematic because it is exclusively based upon the dictionary form (monolingual or bilingual) and not upon the respective functions. According to the criteria of lexicographic functions, Regnskabsordbøgerne should be considered one multi-functional dictionary. The dictionary still does not (May 2009) include a search mechanism where the users may relate the search process to their respective situations and, still less, to their own characteristics. However, if such a mechanism existed, it would be clear that this is actually only one dictionary with many functions. This can be illustrated by the discussion of two central types of user situations, i.e. text reception and text production. If Danish accountants or anyone else interested in accounting literature are reading an English text and have reception problems, the corresponding need for information may be met in various ways, depending on the characteristics of the reader. If the readers’ proficiency level in terms of English accounting terminology and English language in general is relatively high, then an English explanation would be sufficient in order to meet their needs. But if the readers’ general and special English proficiency level is low, then their needs can only be satisfied by means of Danish equivalents or explanations. And finally, if the proficiency level is somewhere in between the two former, then a combination of Danish and English data would be the best way to ensure that their needs are covered. Similarly, if Danish accountants are writing English texts and run into production problems of any sort, then the lexicographic solution will once more depend on their specialised and general proficiency level in English. Accountants with a high proficiency level in this foreign language will generally be able to write directly in English without making any mental outline in their mother tongue.
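
The two-step mechanism, in which the user first selects a user situation and then a user type, can be pictured as a lookup in a two-dimensional table. The sketch below is a hedged illustration that covers only the reception case described above: the table `SEARCH_CRITERION`, the three proficiency labels and the functions `user_types_for` and `data_types_for` are hypothetical names introduced here, and the dictionary discussed does not (as of May 2009) offer such a mechanism.

```python
from typing import Dict, List, Tuple

# First dimension: type of user situation. Second dimension: type of user.
# Only text reception is filled in; other situations would be added in the same way.
SEARCH_CRITERION: Dict[str, Dict[str, Tuple[str, ...]]] = {
    "text reception": {
        "high proficiency": ("English explanation",),
        "intermediate proficiency": ("English explanation", "Danish equivalent or explanation"),
        "low proficiency": ("Danish equivalent or explanation",),
    },
}


def user_types_for(situation: str) -> List[str]:
    """Step 2 of the interactive phase: the user types offered for a chosen situation."""
    return list(SEARCH_CRITERION[situation])


def data_types_for(situation: str, user_type: str) -> Tuple[str, ...]:
    """The data to be shown for one combination of user situation and user type."""
    return SEARCH_CRITERION[situation][user_type]


print(user_types_for("text reception"))
print(data_types_for("text reception", "low proficiency"))
```

Because the user types are stored per situation, the sketch also reflects the point that the typologization of users depends on the user situation selected in the first step.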
For accountants with a high proficiency level, this means that production problems will frequently be related to English words or terms which they already know, but whose use they may want to check (e.g. grammar or word combination with the expression), or whose meaning they want to confirm. On the other hand, if the accountants in question have a low proficiency level, their text production will take place through a Danish outline which is then translated into English, for which reason they will need a Danish-English solution in order to be guided to the right word or expression in the target language. And for many Danish accountants, text production in English will probably take place as a combination of the two mentioned methods, for which reason they will sometimes need a Danish-English solution and sometimes a monolingual English solution. A similar discussion could be conducted in relation to the user needs in terms of translation and cognition. All this shows that Regnskabsordbøgerne and other electronic dictionaries could benefit from an overall and more stringent functional approach

where the corresponding search mechanisms may lead their users to dynamic data (articles) that are especially adapted to each and every one of the various types of users and situations covered by the dictionary in question. Such mechanisms could be refined in various ways. They could, as in the Ordbogen over Faste Vendinger, allow users to click on the right user situation, including not only the ones used in this dictionary, but also others such as translation if the dictionary is bilingual. They could also allow users to define their own profile, either at a general level which the electronic dictionary may remember from consultation to consultation, or in relation to each consultation after determining the relevant user situation. However, there are also other possibilities. For instance, in the scroll menu which in Example 6 includes the search options contains, begins with, ends with and is, other options such as means the same as, means the opposite of or is associated with are possible in various types of dictionaries. And the same holds true for the possibility of writing not only individual words, but also word combinations in the search field as discussed in the previous section. One example from the English-Danish part of Regnskabsordbøgerne illustrates the possibilities in this respect. When writing the term accelerated depreciation in the search field, the user will end up with the data in Example 9.

accelerated depreciation noun
merafskrivning noun
Definition
The accelerated depreciation method recognises higher amounts of depreciation in the earlier years and lower amounts in the later years of a fixed asset’s life.
Collocations
• eliminate accelerated depreciation in the acquirer
eliminere merafskrivning i den købende virksomhed
• accelerated depreciation on fixed assets traded within the group
merafskrivning på koncerninternt handlede anlægsaktiver
• cumulative accelerated depreciation
akkumulerede merafskrivninger
• accelerated depreciation for tax purposes
skattemæssige merafskrivninger
• accelerated depreciation allowances
skattemæssige merafskrivninger
Examples
• The reducing-balance method is a type of accelerated depreciation.
Saldometoden indebærer en form for merafskrivning.
Synonyms
Saldoafskrivning
See also:
reducing-balance method
declining-balance method

Example 9: Dictionary article from Regnskabsordbøgerne (2003–2008)

However, if the user writes the word combination accelerated depreciation allowances in the present version of the dictionary (May 2009), no data will pop up. But if it were possible to search directly for word combinations like accelerated depreciation allowances, an ideal result for the user would be the hypothetical article reproduced in Example 10.

accelerated depreciation allowances word combination
skattemæssige merafskrivninger word combination
Related to
accelerated depreciation noun
merafskrivning noun
Definition
The accelerated depreciation method recognises higher amounts of depreciation in the earlier years and lower amounts in the later years of a fixed asset’s life.
Synonyms
Saldoafskrivning
See also:
reducing-balance method
declining-balance method

Example 10: Hypothetical dictionary article based upon Regnskabsordbøgerne (2003–2008)

In Example 10, users are guided directly to a dictionary article which is especially adapted to their needs in terms of the word combination accelerated depreciation allowances, which is presented as lemma with the corresponding Danish translation immediately below. These are the data that may be considered most relevant for users in this case. But if, for one reason or another, users need more data, these data are also included in such a way that they do not disturb the quick and easy

access to the data considered most relevant when users need specific information about accelerated depreciation allowances. For most concrete consultations, these additional data may be completely superfluous, but as they do no harm there is no reason not to include them if they can provide some assistance in a number of consultations.

7. Concluding remarks

This article has explored and discussed some of the possibilities made available to lexicography by the new information technologies. It has not tried to provide final answers and solutions, but only to give rise to a discussion about the need to “rethink” lexicography based upon an advanced theory and making extensive use of the new technologies. Nobody knows how dictionaries will develop in the future, but one direction will probably be the “individualization” of the lexicographic product, adapting it to the concrete needs of a concrete user in a concrete situation and providing much quicker and easier access to the relevant data. In this light, this article should be understood as an invitation to enhance the theoretical and scientific discussion about lexicography, data access and information retrieval, for which Professor Henning Bergenholtz has appealed for years.

References

A. Dictionaries

Walter, E. et al. (eds.). 1998. Cambridge International Dictionary of Idioms. Cambridge: Cambridge University Press.
Leth, J.H. (ed.). 1800. Dansk Glossarium. En Ordbog til Forklaring over det danske Sprogs gamle, nye og fremme Ord og Talemaader for unge Mennesker og for Ustuderede. Et Forsøg. Med en Fortale af Professor Rasmus Nyerup. Kiøbenhavn: Trykt paa Hofboghandler Simon Poulsens Forlag hos Bogtrykker Morthorst’s Enke & Comp.
Bergenholtz, H., Vrang, V. and Almind, R. 2007. Ordbogen over Faste Vendinger. Aarhus: Center for Leksicography. (Accessed 1 May 2009).
Bergenholtz, H. et al. (eds.). 2008. Den Danske Netordbog. Odense: Ordbogen.com. (Accessed 1 May 2009).
Bergenholtz, I. (ed.). 2006. Musikordbogen. Aarhus: Aarhus School of Business. (Accessed 1 May 2009).
Wehmeier, S. et al. (eds.). 2005. Oxford Advanced Learner’s Dictionary. Oxford: Oxford University Press.
Nielsen, S., Mourier, L. and Bergenholtz, H. (eds.). 2003–2008. Regnskabsordbøgerne. Aarhus: Aarhus School of Business. (Accessed 1 May 2009).
Postlethwayt, M. (ed.). 1774. The Universal Dictionary of Trade and Commerce: With large Additions and Improvements, Adapting the same to the Present State of British Affairs in America, since the last Treaty of Peace made in the Year 1763. With Great Variety of New Remarks and Illustrations Incorporated through the Whole: Together with Anything essential that is contained in Savary’s Dictionary: Also, all the Material Laws of Trade and Navigation relating to these Kingdoms, and the Customs and Usages to which all Traders are Subject. Fourth Edition, Volume 1. London: Printed for W. Strahan, etc.

B. Other literature

Bergenholtz, C., Bergenholtz, H. and Tarp, S. 2008. “Leksikografi i videnskabsteoretisk perspektiv: sand, falsk eller irrelevant.” LexicoNordica 15: 155–168.
Bergenholtz, H. and Gouws, R.H. 2007. “The access process in dictionaries for fixed expressions.” Lexicographica 23: 237–260.
Engels, F. 1990. “Natural science and the spirit world.” In Collected Works, Volume 25, 345–355. Moscow: Progress Publishers.
Ingwersen, P. 1992. Information Retrieval Interaction. London: Taylor Graham.
Lucretius Carus, T. 1994. On the Nature of the Universe. London: Penguin.
Manning, C.D., Raghavan, P. and Schütze, H. 2008. Introduction to Information Retrieval. Cambridge: Cambridge University Press.
Nielsen, S. 2008. “The effect of lexicographical information costs on dictionary making and use.” Lexikos 18: 170–189.
Tarp, S. 2007. “Lexicography in the information age.” Lexikos 17: 170–179.
Tarp, S. 2008a. “The third leg of two-legged lexicography.” Hermes 40: 117–131.
Tarp, S. 2008b. “Revival of a dusty old profession.” Hermes 41: 175–188.
Tarp, S. 2008c. “Kan brugerundersøgelser overhovedet afdække brugernes leksikografiske behov?” LexicoNordica 15: 5–32.
Tarp, S. 2008d. Lexicography in the Borderland between Knowledge and Non-knowledge: General Lexicographical Theory with Particular Focus on Learner’s Lexicography [Lexicographica Series Maior, 134]. Tübingen: Max Niemeyer.
Wiegand, H.E. 1983. “Was ist eigentlich ein Lemma? Ein Beitrag zur Theorie der lexikographischen Sprachbeschreibung.” In Studien zur neuhochdeutschen Lexikographie III, H.E. Wiegand (ed.), 401–474. Hildesheim/New York: Georg Olms.
Wiegand, H.E. 1987. “Zur handlungstheoretischen Grundlegung der Wörterbuchbenutzungsforschung.” Lexicographica 3: 178–227.
Wiegand, H.E. 2000a. “Wissen, Wissensrepräsentationen und Printwörterbücher.” In Proceedings of the Ninth Euralex International Congress, Euralex 2000. Stuttgart, Germany, August 8th–12th, 2000. Volume 1, U. Heid, S. Evert, E. Lehmann and C. Rohrer (eds.), 15–38. Universität Stuttgart: Institut für Maschinelle Sprachverarbeitung.
Wiegand, H.E. 2000b. “Über Suchbereiche, Suchzonen und ihre textuellen Strukturen in Printwörterbüchern.” In Wörterbücher in der Diskussion IV. Vorträge aus dem Heidelberger Lexikographischen Kolloquium, H.E. Wiegand (ed.), 233–301. Tübingen: Max Niemeyer.
Wiegand, H.E. 2002. “Wissen in der Sprachlexikografie. Ein Plädoyer für einige immer noch notwendige Differenzierungen.” In Linguistik jenseits des Strukturalismus. Akten des II. Ost-West-Kolloquiums Berlin 1998, K. Ezawa, W. Kürschner, K.H. Rensch and M. Ringmacher (eds), 265–281. Tübingen: Gunter Narr.

Hybrid text constituent structures of dictionary articles
A contribution to the expansion of the theory of textual dictionary structures

Herbert Ernst Wiegand

First, it is indicated which methods of segmentation can be used to ascertain the different types of article internal text segments. The following types of text segments are distinguished: items, item texts, non-typographical microstructural indicators, functional item additions, item segments and item form segments. The first three types comprise only text segments with text constituent status: they exhibit a form of their own, at least one function of their own and a position of their own with neighbours for that position. The last three types comprise only text segments without text constituent status. Non-hybrid textual structures display element-homogeneous structure-carrying sets, containing only text segments with text constituent status. Hybrid textual structures, by contrast, display element-heterogeneous structure-carrying sets. Among other things, this makes it possible to take formal and explicit account of the upward and downward expanding as well as the internally expanding functional item additions when structures are formed. Consequently, they can have their own structural place in the textual structures presented by means of tree diagrams, which was not possible previously. Many types of hybrid textual structures are distinguished and formally presented by means of tree diagrams. Finally, a comprehensive typological excerpt of hybrid text constituent structures is presented.

Keywords: dictionary articles; article internal text segments; text constituent status; non-hybrid textual structures; hybrid textual structures; functional items




1. Preliminary remark regarding the connection of terminology, method and theory

German Hybride originates from the Latin hybrida ‘half-caste’. The accompanying German adjective hybrid shows a spectrum of meaning that can be paraphrased as follows according to Duden-GFWB (2007): “mixed, from two origins, composed of several; formed by cross-breeding, mixture.” The theory of lexicographic texts which I have developed includes a detailed heuristics (compare e.g. Wiegand 1990: 20–26; 2005 [2006]: 217–226), which also provides different methods for the segmentation of condensed lexicographic texts. For the application of these methods of segmentation, application conventions and conditions of correctness have been fixed. For the linking of methods and parts of theory demanded by the philosophy of science the following principle applies along with others: only those text elements are considered as elements of concrete textual dictionary structures that are obtained through the proper application of one of the methods of segmentation. The following distinctions apply:

1. Through an application of the method of exhaustive functional-positional segmentation, which is one of the variants of the method of functional-positional segmentation, all types of text constituents can be obtained. These are only those functional text segments that display a discrete continuous text segment form, at least one genuine text segment function and exactly one fixed textual position in the linguistic chain of the text. These are the following: items, item texts and non-typographical structural indicators (cf. Wiegand 2008b: Figure 4–1).
2. Through an application of the method of non-functional-positional segmentation all non-functional text segments as well as all internally expanded functional item additions are obtainable (cf. Wiegand 2007: 201 et seq.).
3. Through an application of the method of functional segmentative isolation all top and bottom expanded functional item additions, including those without their own position in the linguistic chain, are obtainable (cf. Wiegand 2007: 193 et seq.).

The following arrangements also apply:

1. Concrete hierarchical text constituent structures of which the structure-carrying sets are element-homogeneous, in the sense that they display as elements only text segments with text constituent status (cf. 1. above), are called concrete pure (or: non-hybrid) text constituent structures: all their elements have the same origin in the sense that they are obtainable through the application of the same method. Accordingly, abstract hierarchical constituent structures of which the structure-carrying sets only present classes of text segments with text constituent status are called abstract pure (or: non-hybrid) text constituent structures.
2. Concrete hierarchical text constituent structures of which the structure-carrying sets are element-heterogeneous, in the sense that they display as elements both text segments with and text segments without text constituent status (cf. 1–3 above), are called concrete hybrid text constituent structures: their elements differ in origin (cf. the translation of the paraphrase in Duden-GFWB (2007)) in the sense that they are obtainable through applications of different methods. Accordingly, abstract hierarchical text constituent structures of which the structure-carrying sets display both classes of text segments with text constituent status and classes of text segments without text constituent status are called abstract hybrid text constituent structures.

From this the following can be determined: the composition of the structure-carrying sets is a criterion to distinguish between concrete hierarchical pure and concrete hierarchical hybrid text constituent structures; concrete hierarchical pure text constituent structures display element-homogeneous structure-carrying sets whereas concrete hierarchical hybrid text constituent structures display element-heterogeneous structure-carrying sets. This relation applies mutatis mutandis to the abstract structures. In addition, hierarchical textual structures that are not text constituent structures, e.g. certain item structures, can also be divided into pure and hybrid structures with the help of the same criterion (cf. 3.).

2. Pure article internal text constituent structures

In condensed dictionary articles (cf. Wiegand 2003 [2004]: 203 et seq.) text constituent structures occur that belong to three types: the pure article constituent structure, the pure article microstructure and the article internal search area structure (cf. Wiegand 2000: 269 et seq.).
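The criterion named at the end of Section 1, the composition of the structure-carrying set, can be sketched in a few lines of Python before the worked example. This is only an illustrative sketch under stated assumptions: the names `SegmentType`, `CONSTITUENT_TYPES` and `classify_structure` are hypothetical and not part of the theory, the six segment types are those listed in the abstract, and the dictionary article itself, which also belongs to a concrete structure-carrying set, is left out for brevity.

```python
from enum import Enum


class SegmentType(Enum):
    # Types with text constituent status
    ITEM = "item"
    ITEM_TEXT = "item text"
    STRUCTURAL_INDICATOR = "non-typographical structural indicator"
    # Types without text constituent status
    FUNCTIONAL_ITEM_ADDITION = "functional item addition"
    ITEM_SEGMENT = "item segment"
    ITEM_FORM_SEGMENT = "item form segment"


# Hypothetical helper set: the segment types that have text constituent status.
CONSTITUENT_TYPES = frozenset({
    SegmentType.ITEM,
    SegmentType.ITEM_TEXT,
    SegmentType.STRUCTURAL_INDICATOR,
})


def classify_structure(segment_types):
    """Classify a text constituent structure by its structure-carrying set."""
    types = set(segment_types)
    with_status = types & CONSTITUENT_TYPES
    without_status = types - CONSTITUENT_TYPES
    if with_status and not without_status:
        return "pure"    # element-homogeneous
    if with_status and without_status:
        return "hybrid"  # element-heterogeneous
    # Sets without any text constituent are not covered by this sketch.
    raise ValueError("no text segment with text constituent status")


print(classify_structure([SegmentType.ITEM, SegmentType.ITEM_TEXT]))
print(classify_structure([SegmentType.ITEM, SegmentType.FUNCTIONAL_ITEM_ADDITION]))
```

The function looks only at the types of the elements, which mirrors the point that the distinction between pure and hybrid depends on the composition of the structure-carrying set and not on the structure-shaping relations.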
To understand, at least on an intermediate level of abstraction, why the introduction of hybrid text constituent structures presents an “expansion of the theory of textual dictionary structures” (cf. the subtitle), the following paragraph first discusses an example of pure article microstructures in a slightly simplified way. The basis of the theory cannot be dealt with here (cf. Wiegand 1989a; 1989b). Compare the completely condensed dictionary articles da1–da6 in Figure 1.

[Figure 1: images of the dictionary articles da1–da6]

Figure 1. Dictionary articles da1 and da2 from HWDG (1984), da3 from Duden-DUW (2007), da4 from Sanders-WDS (1876), da5 from Hollós (2001) and da6 from Stowasser (1994)

In the following discussion we first look at the concrete hierarchical article microstructure of da1. No non-typographical microstructural indicators belong to the structure-carrying set of a concrete hierarchical article microstructure. They belong instead to the structure-carrying sets of concrete hierarchical article constituent structures, of which the concrete hierarchical article microstructures are the most prominent partial structures. Consequently, to obtain its concrete hierarchical microstructure, da1 should be segmented in a first methodological step in such a way that all other text constituents remain. Because da1 contains no item text these will only be elementary items. Therefore the method of non-exhaustive functional-positional segmentation will be applied to da1. This is a second variant of the method of functional-positional segmentation, and its correctness conditions require that all segmentation results are items. The segmentation convention C1 applies, according to which hyphens (e.g. those in -es and -e in da1) are regarded as part of the item form of an item with cohesion instructions. Another segmentation convention C2, not to be applied here, could stipulate that both hyphens should be regarded as immediate partial items, i.e. as condensed word stem items. This would imply that -es and -e are not elementary items but non-elementary items. In the following discussion the segmentation procedures are not explained in detail (cf. e.g. Wiegand 2000: 235 et seq.). Only the segmentation result is listed, in such a way that each elementary and non-elementary item in da1 is given a lower-case letter, preceding it in round brackets, as an item name. It should be noted that item names are individual names. Furthermore, every item of the class of items is related to the same general dictionary object directed item function to which it belongs. The latter goes for statements of the form “α ∈ β” with “α” as variable for items and “β” as variable for class symbols that are abbreviations of item class names. In addition “∈” is the symbol for the element-class-relation, in the sense of is an element of or in the sense of is included in.

(a) Schurz, der; -es, -e [iABj]; a ∈ CF (= comment on form)
(b) Schurz; b ∈ IFLS (= item giving the form of the lemma sign; here: lemmatic item giving a noun)
(c) der; -es, -e; c ∈ IMor.n (= item giving the morphology at nouns)
(d) der; d ∈ Iart (= item indicating an article)
(e) -es, -e; e ∈ IDecC (= item indicating the declination class)
(f) -es; f ∈ c.ISF (= condensed item giving the singular formation)
(g) -e; g ∈ c.IPlF (= condensed item giving the plural formation)
(h) [iABj]; h ∈ I-Pron (= item for the normal pronunciation; this is a zero item, presented in the concrete structure by a blank item (BI) which is also identified positionally over its neighbouring variables “i” and “j”)
(i) bes. von Handwerkern [...] lederner S.; i ∈ CS (= comment on semantics)
(j) [iABj] bes. von Handwerkern [...] aus festem Material; j ∈ pragsemI (= pragmatic-semantic item)
(k) [iABj]; k ∈ I-pragZL (= item for the pragmatic zero labelling; this also is a zero item)
(l) bes. von Handwerkern [...] aus festem Material; l ∈ c.IPM (= condensed item giving the paraphrase of meaning)
(m) ein lederner S.; m ∈ c.IComEx (= condensed item giving the competence example).

The items b, d, f, g, h, k, l and m are elementary items, i.e. without the possibility of further functional-positional segmentation.
The items a, c, e, i and j are non-elementary items, for which further functional-positional segmentation is possible. In a second methodological step we now construct the element-homogeneous structure-carrying set for ${}^{p}\mathrm{MiS}^{c}_{h}(da_1)$, the concrete (c) hierarchical (h) pure (p) microstructure (MiS) of da1. The elements of this structure-carrying set are all methodically obtained items a–m and the complete dictionary article da1. The structure-carrying set is called $S^{c}_{\mathrm{MiS}}(da_1)$. It can be presented as follows with 14 elements:

$S^{c}_{\mathrm{MiS}}(da_1) = \{da_1, a, b, c, d, e, f, g, h, i, j, k, l, m\}$




In a third methodological step we define on $S^{c}_{\mathrm{MiS}}(da_1)$ a two-place non-reflexive (and consequently asymmetrical) as well as transitive relation of the type of the precedence (p) relation, called $R^{c}_{p}(da_1)$, with the relation term x precedes y, in which “x” and “y” are variables for items. $R^{c}_{p}(da_1)$ belongs to the non-reflexive partial ordering relations. The following applies: $R^{c}_{p}(da_1) \subseteq S^{c}_{\mathrm{MiS}}(da_1) \times S^{c}_{\mathrm{MiS}}(da_1)$. The structure-shaping relation $R^{c}_{p}(da_1)$ contains as elements all those ordered pairs, e.g. $\langle b, d \rangle$, that, when their coordinates are put in the relation term x precedes y, will deliver true sentences, e.g. b precedes d (≈ “Schurz” precedes “der”). $R^{c}_{p}(da_1)$ therefore fully determines which items from da1 precede which other items from da1. The same also applies to other relations (in further examples) that belong to the type of the precedence relation. Precedence relations are also called predecessor-successor-relations.

In a fourth methodological step we define on the structure-carrying set $S^{c}_{\mathrm{MiS}}(da_1)$ a two-place reflexive, antisymmetrical and transitive relation of the partitive (part) relation type, called $R^{c}_{part}(da_1)$, with the relation term x is an item as part of y, with “x” as variable for items and “y” as variable for items and da1. The following applies: $R^{c}_{part}(da_1) \subseteq S^{c}_{\mathrm{MiS}}(da_1) \times S^{c}_{\mathrm{MiS}}(da_1)$ and $R^{c}_{p}(da_1) \cap R^{c}_{part}(da_1) = \emptyset$. The relation $R^{c}_{part}(da_1)$ belongs to the reflexive partial ordering relations. It contains as elements those ordered pairs, e.g. $\langle d, c \rangle$, that, when their coordinates are put in the relation term x is an item as part of y, will deliver true sentences, e.g. d is an item as part of c (≈ “der” is an item as part of “der; -es, -e”). The relation $R^{c}_{part}(da_1)$ therefore fully determines (i) which items from da1 are partial items of which other items from da1, as well as (ii) which items are parts of da1.
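The third and fourth methodological steps can be made concrete with a small Python sketch. It rests on assumptions that the text does not state in this form: the immediate part-of links are read off the segmentation result (a)–(m), the partitive relation is computed as the reflexive-transitive closure of those links, and x is taken to precede y when neither is part of the other and every elementary item of x comes before every elementary item of y. All identifiers (`IMMEDIATE_PARTS`, `partitive_relation`, `precedence_relation`) are hypothetical.

```python
from itertools import product

# Immediate parts, read off the segmentation result (a)-(m); an assumption
# of this sketch, not a listing given in the text.
IMMEDIATE_PARTS = {
    "da1": ["a", "i"],
    "a": ["b", "c", "h"],
    "c": ["d", "e"],
    "e": ["f", "g"],
    "i": ["j", "m"],
    "j": ["k", "l"],
}
ELEMENTS = ["da1"] + list("abcdefghijklm")  # the 14 elements


def leaves(x):
    """Elementary items contained in x, in textual order."""
    parts = IMMEDIATE_PARTS.get(x)
    if not parts:
        return [x]
    return [leaf for p in parts for leaf in leaves(p)]


def partitive_relation():
    """Reflexive-transitive closure of the immediate part-of links."""
    rel = {(x, x) for x in ELEMENTS}

    def descend(whole, node):
        for p in IMMEDIATE_PARTS.get(node, []):
            rel.add((p, whole))
            descend(whole, p)

    for whole in ELEMENTS:
        descend(whole, whole)
    return rel


def precedence_relation(part_rel):
    """x precedes y if they are not part-related and x ends before y starts."""
    order = {leaf: n for n, leaf in enumerate(leaves("da1"))}
    rel = set()
    for x, y in product(ELEMENTS, repeat=2):
        if (x, y) in part_rel or (y, x) in part_rel:
            continue
        if max(order[l] for l in leaves(x)) < min(order[l] for l in leaves(y)):
            rel.add((x, y))
    return rel


def is_transitive(rel):
    return all((x, z) in rel for (x, y) in rel for (w, z) in rel if y == w)


PART = partitive_relation()
PREC = precedence_relation(PART)

# Properties named in the text, checked on this reconstruction.
assert all((x, x) in PART for x in ELEMENTS)                # reflexive
assert all(x == y for (x, y) in PART if (y, x) in PART)     # antisymmetrical
assert is_transitive(PART)
assert all(x != y for (x, y) in PREC)                       # non-reflexive
assert is_transitive(PREC)
assert ("d", "c") in PART   # d is an item as part of c
assert ("b", "d") in PREC   # b precedes d
```

Under these assumptions no pair belongs to both relations, because part-related elements are excluded from the precedence relation by construction.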
Following the execution of the fourth methodological step, ${}^{p}\mathrm{MiS}^{c}_{h}(da_1)$, the pure concrete hierarchical microstructure of da1, is given as an ordering structure that determines which items from which item classes are presented in which order in the dictionary article da1. It can thereby also be determined that hierarchical pure article microstructures of completely condensed dictionary articles (cf. Wiegand 2003 [2004]: 207 et seq.) can occur as article internal item distribution structures. Concrete hierarchical pure microstructures can be presented by means of ordered tree diagrams (cf. Wiegand 1989a: 464 et seq.; 2000: 235 et seq.). They can furthermore be presented together with the relevant isomorphous abstract hierarchical microstructures in a commented structural diagram. Abstract microstructures are obtained by defining both structure-shaping ordering relations on a structure-carrying set that contains not da1 and the concrete items a–m but the class DA of dictionary articles (e.g. of a specific dictionary) together with the item classes to which the items a–m belong. This structure-carrying set, called $S^{a}_{\mathrm{MiS}}(da_1)$, can be presented as follows, using the class symbols introduced in (a)–(m):

$S^{a}_{\mathrm{MiS}}(da_1) = \{\text{DA}, \text{CF}, \text{IFLS}, \text{IMor.n}, \text{Iart}, \text{IDecC}, \text{c.ISF}, \text{c.IPlF}, \text{I-Pron}, \text{CS}, \text{pragsemI}, \text{I-pragZL}, \text{c.IPM}, \text{c.IComEx}\}$
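The step from the concrete to the abstract structure-carrying set amounts to replacing each item by its class symbol and da1 by DA. A minimal Python sketch of that replacement follows; the mapping is taken from the class memberships stated in (a)–(m), while the function names are hypothetical and serve only as illustration.

```python
# Class symbol of each item, as stated in (a)-(m).
ITEM_CLASS = {
    "a": "CF", "b": "IFLS", "c": "IMor.n", "d": "Iart", "e": "IDecC",
    "f": "c.ISF", "g": "c.IPlF", "h": "I-Pron", "i": "CS",
    "j": "pragsemI", "k": "I-pragZL", "l": "c.IPM", "m": "c.IComEx",
}


def to_abstract(element):
    """Map da1 to the class DA and every item to its item class."""
    return "DA" if element == "da1" else ITEM_CLASS[element]


def abstract_carrier(concrete_carrier):
    """Abstract structure-carrying set, in the order of the concrete one."""
    return [to_abstract(x) for x in concrete_carrier]


def abstract_relation(concrete_relation):
    """Carry a concrete relation over to the class symbols."""
    return {(to_abstract(x), to_abstract(y)) for x, y in concrete_relation}


concrete = ["da1"] + list("abcdefghijklm")
abstract = abstract_carrier(concrete)
print(abstract)
print(abstract_relation({("d", "c")}))
```

Because each of the thirteen items of da1 belongs to a different item class, the mapping is one-to-one here, which is what allows the concrete and the abstract microstructure to be isomorphous and to share a single diagram.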

The single commented structural diagram for the concrete and isomorphous abstract hierarchical pure microstructure of da1 is given in Figure 2.

[Figure 2: ordered tree diagram. Panel titles: A. H. PURE MICROSTRUCTURE; A. H. BASE STRUCTURE/MIS; A. H. LEFT CORE STRUCTURE/MIS; A. H. RIGHT CORE STRUCTURE/MIS. Root node: DA. Left core structure under CF: IFLS|IWF.ns|Isp (Schurz); IMor.n with Iart (der) and IDecC, the latter with c.ISF (-es) and c.IPF (-e); I-nPron ([iBIj]). Right core structure under CS: pragsemI with I-pragZL ([iBIj]) and c.IPM (bes. von Handwerkern [...] Material); c.IComEx (ein lederner S.).]

Figure 2. Single commented structural diagram of the abstract (and isomorphous concrete) hierarchical microstructure of da1 in Figure 1. Conventions of presentation: “x ___ y” means (read from the bottom to the top) the same as x is an item as part of y; “u ––– v” means (read from the bottom to the top) the same as u is an element of v (= u ∈ v, where “v” is a class symbol). “|” means the same as at the same time; LEFT CORE STRUCTURE/MiS should be read as left core structure as partial structure of the microstructure (MiS); abbreviations (all abbreviations which label nodes are class symbols for classes of items with the same dictionary topic directed genuine item function): A. = ABSTRACT; H = HIERARCHICAL; DA = dictionary article; CF = comment on form; CS = comment on semantics; IFLS = item giving the form of the lemma sign; IWF.ns = item giving the word form of the nominative singular; Isp = item giving the spelling; IMor.n = item giving the morphology at nouns; I-nPron = item for the normal pronunciation; Iart = item indicating an article; IDecC = item indicating the declination class; c.ISF = condensed item giving the formation of the singular form; c.IPF = condensed item giving the formation of the plural; pragsemI = pragmatic-semantic item; I-pragZL = item for the pragmatic zero labeling; c.IPM = condensed item giving the paraphrase of meaning; c.IComEx = condensed item giving a competence example; IBIj = blank item of which the structural position is determined via the citation of the preceding variable “i” and the following variable “j” with the corresponding class symbols of both neighbours




Numerous types of pure microstructures can be distinguished. An incomplete overview can be found in Wiegand (2002: 573–580).

3. Hybrid article internal text constituent structures

In this section we look at hybrid article microstructures as an example of hybrid article internal text constituent structures. A dictionary article displays a hybrid article microstructure when it contains at least one functional item addition. Dictionary articles with functional item additions therefore have both a pure and a hybrid article microstructure. Functional item additions are functional text segments which, like items, present something but which, in contrast with items, do not display text constituent status (cf. with regard to functional item additions e.g. Wiegand 2005 [2006]: 326–330; 2007: 192 et seq.). Unlike items, they are not obtainable as a result of a functional-positional segmentation. This is either because they do not have their own position in the language chain (e.g. the sunken dot underneath the item form segment “i” of the item giving the form of the lemma sign in da3, cf. Figure 1) or because, when they do have their own position in the language chain, a functional-positional isolation is not possible. An example of the latter is the semantic inner gloss “Facharzt” in da2 in Figure 1, where a segmentation leaves the two non-functional text segments “er muß einen Spezialisten” and “aufsuchen”. The sunken dot in da3 belongs to the bottom expanded functional item additions; it is a bottom expanded bifunctional item addition that is upwardly addressed at the item form segment “i”. It realizes a word accent marker which is also a marker of the vowel quantity as being short (Wacc|vocq.s). The same applies to the sunken dot in da2. If a concrete and an abstract hierarchical pure microstructure are allocated to da3, these are obtainable in the same way as the microstructures of da1. They can be presented as in Figure 3.
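The observation that the sunken dot has no position of its own in the language chain has a loose analogue in Unicode text, where a dot below can be encoded as a combining mark attached to a base letter. The following Python sketch uses this analogy only as an illustration; it is not the segmentation method of the theory. It assumes that the dotted letter in “rịchts” is encoded as U+1ECB (or as “i” plus U+0323), and the function names are hypothetical.

```python
import unicodedata


def isolate_additions(item_form):
    """Separate combining marks from the base letters they are attached to.

    Returns the base string and a list of (index of addressed base letter, mark).
    """
    decomposed = unicodedata.normalize("NFD", item_form)
    base = []
    additions = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            # A combining mark occupies no position of its own in the base
            # string; it is addressed at the letter that precedes it.
            additions.append((len(base) - 1, ch))
        else:
            base.append(ch)
    return "".join(base), additions


def split_at(base, index):
    """Front, middle and back segment around the addressed letter."""
    return base[:index], base[index], base[index + 1:]


base, additions = isolate_additions("r\u1ecbchts")
addressed_index, mark = additions[0]
print(base)                              # the segment without the dot
print(unicodedata.name(mark))            # the isolated mark
print(split_at(base, addressed_index))   # front, middle, back segment
```

The sketch returns the base segment and the mark separately, together with the index of the letter at which the mark is addressed, which roughly corresponds to saying that the addition is addressed at an item form segment rather than standing in a position of its own.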
When looking at the structural diagram in Figure 3 it is noticeable that in the abstract hierarchical microstructure the word accent marker, which also marks the short vowel quantity, does not have its own node. It therefore has no structural place of its own in the ordered tree diagram that is linked to another structural place in the same tree diagram by means of at least one edge. The occurrence of this bottom expanded bifunctional addition in da3 can only be seen from the two node labels “IFLS  Wacc|vocq.s | IWF.ns | Isp” and “Ias  Wacc|vocq.s”. In the accompanying isomorphous concrete microstructure the sunken dot in the bottom expanded item giving the syllable accent can therefore also not have a structural place of its own. It rather appears in “rịchts” as in the concrete text of the dictionary article!

[Figure 3: ordered tree diagram for da3 (lemma sign Ge|richts|voll|zie|her). Panel titles: A. H. PURE MICROSTRUCTURE; A. H. BASE STRUCTURE/MIS; A. H. LEFT CORE STRUCTURE/MIS; A. H. RIGHT CORE STRUCTURE/MIS. Root node: DA, with the left core structure under CF and the right core structure under CS.]

Figure 3. Single commented structural diagram for the abstract (and isomorphous concrete) hierarchical microstructure, displayed by da3 in Figure 1. New abbreviations: ISyl = item giving a syllable; ISylD = item giving the syllable division; Ias = item giving the accentuated syllable; the symbol between an item class symbol and Wacc|vocq.s means the same as bottom expanded; Wacc|vocq.s = marker for word accent and vowel quantity as short; IComEx2 = item consisting of two items giving the competence examples

If, in the presentation of article microstructures, one wants to take explicit account not only of the article internal distribution of items and, where necessary, of item texts but also, where available, of functional item additions, one has to proceed from pure to hybrid microstructures. Without going into all formal details, da3 is used here as an example representing all bottom expanded functional item additions to explain this. We first look at the bottom expanded item giving the syllable accent “rịchts” from da3. It is an immediate partial item of the bottom expanded item giving the form of the lemma sign and therefore belongs to the lemmatic partial items, as do the four syllable items and the four syllable division items. All lemmatic partial items with the exception of the item giving the syllable accent are non-expanded elementary items. In contrast, the item giving the syllable accent belongs to the single expanded elementary items and among these to the bottom expanded ones (cf. Wiegand 2005 [2006]: 289). While non-expanded elementary items do not display any internal structure, the situation is different in the case of expanded elementary items. They can display an internal hybrid structure. For top or bottom expanded elementary items, the elements of this structure are obtainable through the application of a method of functional segmentative isolation. With an application
of this method horizontal segmentation cuts are made in such a way that the top and the bottom expanded functional item additions are separated from the elementary items. In the case of the bottom expanded item giving the syllable accent “rịchts” this means that the result of the segmentation consists of the segment “richts” and the sunken dot. It should be noted that both segments are parts of “rịchts”. To be able to determine the exact position of the sunken dot, a non-functional-positional segmentation of “rịchts” is performed, resulting in the following item form segments: “r|ị|chts” (with “|” marking the segmentation joint); “r” is the front, “i” the middle and “chts” the back item form segment. The following statement applies: r < ị < chts [...]

[...] 35000 Components and structures > 35300 Microstructure > 35330 Components and positions):

35330–1 subdivision
35330–1.1 lemma section
35330–1.2 explanatory section
35330–1.3 demonstration section
35330–2.1 entry position
35330–2.1.1 lemma position
35330–2.2 entry field
35330–2.2.1 entry head
35330–3 part-of-speech section
35330–4.1 formal section

 Bo Svensén

35330–4.2 semantic-pragmatic section
35330–4.3 contextual section
35330–4.3.1 collocation section
35330–4.3.2 idiom section
35330–5 sense
35330–5.1 sub-sense
35330–6.1 right-hand context
35330–6.2 left-hand context
35330–7 extra column

3. Lexicographic terms and linguistic terms

How many of the terms used in metalexicographic text should be regarded as strictly lexicographic? It is almost in the nature of things that the ratio should vary between the different sectors of the subject field. In sectors such as dictionary typology, information types, dictionary components and dictionary structures, the number ought to be comparatively large. Other sectors can be expected to have a lower proportion of such terms, for instance dictionary functions and dictionary use, and also the collection, selection, processing, presentation and distribution of lexicographic data. In yet other sectors, the proportion of strictly lexicographic terms is probably rather insignificant, for instance lexicographic project work and lexicographic aids and tools.

However, it may sometimes be difficult to make a clear distinction between terms belonging to linguistics and terms that are more or less restricted to the technical language of metalexicography. For instance, the reviewers of NLO failed to make this distinction when they criticized the dictionary for having classified the term equivalent under “Interlingual relations” (main class “General theory of linguistics”) instead of entering it under main class “Special theory of general bilingual dictionaries” (Braasch et al. 1999: 190). The reason for the classification in NLO is that the term equivalent is not exclusively lexicographic but is used in several other contexts as well; the reviewers may have been misled by the fact that equivalent is often used in everyday metalexicographic parlance to denote a particular type of information item in dictionaries, namely the one for which a more correct term would be indication of equivalent or equivalent indication.

A parallel case commented on by the reviewers (Braasch et al. 1999: 190–191) is diasystematic marking, which was classified in NLO under “Marking; pragmatic information” (main class “General theory of lexicography”) but which, according to the reviewers, should have been classified instead under “Diasystematics” (main class “General theory of linguistics”). In this case, their criticism was justified:

Subject-field classification for metalexicography revisited 

NLO had not managed to account satisfactorily for the distinction that, for the sake of clarity, should be made between marking and marking information (= labelling). The term marking should be understood to denote a quality of a lexical item (“the quality of being marked”) and, consequently, not be regarded as an exclusively lexicographic term. The term labelling, on the other hand, should be regarded as having the restricted meaning of “the action or system of providing lexical items with indications of marking (= labels) in a dictionary” (see note 7).

The difficulties just mentioned are not alleviated by the fact that there is a rather close resemblance between the structure of subclass “Information types” (see Section 2.3 above) and that of the first subclass under “General theory of linguistics”. At present, this subclass has the following structure:

21000 Linguistic items and their characteristics
  21100 Morphology
    21110 Inflectional morphology
    21120 Derivational morphology
    21130 Compositional morphology
    21140 Abbreviations and clippings
  21200 Realization
    21210 Graphical realization
    21220 Phonetic realization
  21300 Grammatical category
  21400 Meaning
    21410 Meaning components
    21420 Meaning structure
    21430 Meaning types
    21440 Meaning change
  21500 Diasystematics
    21510 Diachronics
    [---]
  21600 Syntagmatics
    21610 Word combinations in general
    21620 Constructions
    21630 Collocations
    21640 Idioms and other fixed word combinations
  21700 Paradigmatics
    21710 Content paradigmatics
    21720 Expression paradigmatics
  21800 Interlingual relationships
    21810 Types of equivalence and equivalents
    21820 Equivalence relationships
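The five-digit codes in the list above appear to encode the hierarchy positionally: each further non-zero digit marks one level below the class obtained by replacing it with zero. The scheme itself is not spelled out in the text, so the following Python sketch rests on that inferred assumption; the function names are hypothetical, and the suffixed codes at the lowest levels (such as 35330–1.1) are not handled.

```python
from typing import List, Optional


def parent_code(code: str) -> Optional[str]:
    """Return the code of the superordinate class, or None for a main class.

    Assumption: the parent is obtained by zeroing the last non-zero digit,
    e.g. 21110 -> 21100 -> 21000 -> 20000.
    """
    significant = code.rstrip("0")
    if len(significant) <= 1:
        return None  # main classes such as 20000 have no parent
    return significant[:-1].ljust(len(code), "0")


def path_to(code: str) -> List[str]:
    """Codes from the main class down to the given class."""
    path = [code]
    parent = parent_code(code)
    while parent is not None:
        path.append(parent)
        parent = parent_code(parent)
    return list(reversed(path))


print(path_to("21110"))
print(path_to("21640"))
```

A lookup of this kind would let a tool show where a class such as “21110 Inflectional morphology” sits in the scheme without storing the parent of every class explicitly.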


However, those writing on metalexicography will be better served if the classification scheme allows them to find, for instance, equivalent alongside equivalent indication, phonetic transcription alongside pronunciation key, marking alongside labelling, and not under different main classes. With the production needs of this user group in mind, then, it might be useful to establish a special classification scheme where the categories belonging to “Linguistic items and their characteristics” have been incorporated into “Information types”. This would mean, for example, that “21110 Inflectional morphology” and “34131 Inflectional morphology” were merged into one class, while “21500 Diasystematics” and “34600 Diasystematic labelling” were merged into another. As a further consequence, the entire linguistic main class can be excluded from the true subject-field classification scheme for metalexicography, which would then, after this final pruning, include the following main classes:

10000 Lexicography as a discipline
20000 General theory of lexicography
30000 Special theory of general-purpose dictionaries
40000 Special theory of specialized dictionaries
50000 Lexicographic methods and tools

4. Concluding remarks

Future work will include verifying the appropriateness of the restricted classification scheme in all its details, investigating the usefulness of the special classification scheme for those writing on metalexicography, testing out the additional numerical codes at the lowest levels and, not least, subjecting all the terms to a close scrutiny as to the justification of their presence in a true subject-field classification scheme for metalexicography.

Notes

1. What is called term classification here is actually not a classification of terms in the normal sense of the word, but should rather be described as a systematic procedure of selecting (or creating) terms that reflect as much as possible the corresponding concept classification.
2. The list is based on the table of contents (cf. Hausmann et al. 1989–1991: XXXV–LII).
3. It should be noted that NLO, apart from definitions and explanations in Norwegian and terms in all the major Scandinavian languages, also has terms in English, German and French.
4. Not translated by me but by a native British lexicographer in cooperation with a Swedish-born professional translator working in England.
5. In what follows, the number of term records quoted does not include synonym records headed by cross-reference lemmas.
6. A residual category “69000 Others”, made up of terms that should rightly have been classified under “31900 Interlingual relations”, may be left out of account here.
7. As a matter of fact, the reviewers do not seem to have been able to make this distinction either, since they apparently equate marking and labelling, which makes their criticism only partly justified.

References

A. Dictionaries

Bergenholtz, H., Cantell, I., Fjeld, R.V., Gundersen, D., Jónsson, J.H. and Svensén, B. 1997. Nordisk leksikografisk ordbok [NLO; “Nordic Dictionary of Lexicography”]. Oslo: Universitetsforlaget.
Hartmann, R.R.K. and James, G. 1998. Dictionary of Lexicography. London/New York: Routledge.

B. Other literature

Béjoint, H. 2000. Modern Lexicography: An Introduction. Oxford: Oxford University Press.
Bergenholtz, H. and Tarp, S. (eds.) 1995. Manual of Specialised Lexicography. The Preparation of Specialised Dictionaries. Amsterdam/Philadelphia: John Benjamins.
Braasch, A., Hjorth, E. and Madsen, B.N. 1999. “[Review of] Henning Bergenholtz, Ilse Cantell, Ruth Vatvedt Fjeld, Dag Gundersen, Jón Hilmar Jónson, Bo Svensén. Nordisk leksikografisk ordbok. Oslo: Universitetsforlaget. 1997.” International Journal of Lexicography 12(2): 185–192.
Hartmann, R.R.K. 2001. Teaching and Researching Lexicography. Harlow: Longman.
Hausmann, F.J., Reichmann, O., Wiegand, H.E. and Zgusta, L. (eds.) 1989–91. Wörterbücher / Dictionaries / Dictionnaires. Volumes 1–3. Berlin/New York: Walter de Gruyter.
Jackson, H. 2002. Lexicography. An Introduction. London/New York: Routledge.
Landau, S.I. 2001. Dictionaries. The Art and Craft of Lexicography. Second edition. Cambridge: Cambridge University Press.
Sterkenburg, P. van (ed.) 2003. A Practical Guide to Lexicography. Amsterdam/Philadelphia: John Benjamins.
Svensén, B. 1993. Practical Lexicography. Principles and Methods of Dictionary-Making. Oxford/New York: Oxford University Press.
Svensén, B. 1994. “Nordisk lexikografisk ordbok – terminologisk systematik [‘Nordic Dictionary of Lexicography – terminological systematics’]”. LexicoNordica 1: 229–238.
Svensén, B. 2004. Handbok i lexikografi [“A Handbook of Lexicography”]. Stockholm: Norstedts Akademiska Förlag.
Svensén, B. 2009. A Handbook of Lexicography. The Theory and Practice of Dictionary-Making. Second edition. Cambridge: Cambridge University Press.

Systematic introductions in specialised dictionaries
Some proposals in relation to accounting dictionaries

Pedro A. Fuertes-Olivera

Bergenholtz and Tarp (1995) claim that systematic introductions adapted to lexicographical function(s) are important components in specialised dictionaries. In essence, they are independent dictionary components supplemented by cross-references to and from individual articles in the wordlist. This paper reviews previous work on systematic introductions and makes proposals that are considered for inclusion in a planned English-Spanish online Dictionary of Accounting, which will be integrated into the Dictionary of Accounting, a network of two monolingual Danish and English dictionaries and two bilingual ones. The proposals will be developed using the function theory of lexicography and, in particular, we will discuss systematic introductions in relation to selected cognitive and communicative user situations, such as specialised translation.

Keywords: specialised lexicography; subject-field components; specialized dictionaries; lexicographical functions; user needs

1. Encyclopaedic information in specialised dictionaries

Encyclopaedic information is concerned with describing factual knowledge and extra-linguistic reality. In specialised dictionaries encyclopaedic information is usually given in encyclopaedic notes in the dictionary articles, encyclopaedic labels addressed to the individual lemmata or equivalents, and independent outside matter components, referred to as systematic introductions, subject-field components, encyclopaedic sections, or subject field term systems (Bergenholtz and Nielsen 2006: 284).

Encyclopaedic notes in specialised dictionaries are regarded as key components that fulfil dictionary functions and satisfy the needs of intended users. For example, in the Encyclopedic Dictionary of Gene Technology, Kaufmann et al. (1998) claim that encyclopaedic notes are similar to definitions for non-specialists, a revolutionary assumption that illustrates one of the roles encyclopaedic notes

should play in specialised dictionaries. First, they are useful for eliminating uncertainty about a sub-field and/or subject field in cases of text production or text reception by an expert or semi-expert who happens to be unfamiliar with a specific issue (for example, a Spanish accountant unfamiliar with accounting concepts used by American companies). Second, the notes explain concepts, primarily to laypersons and semi-experts, and therefore should be addressed only to the specialised terms proper. Third, if adequately cross-referred to from other lemmata, framed articles or systematic introductions, encyclopaedic notes are appropriate lexicographical means of presenting a full introduction to the subject field (or sub-field) in question. Fourth, the contents, language, and style of encyclopaedic notes have to be adjusted to the factual and linguistic competences of the intended users and their needs. For instance, dictionaries for learners should replace the so-called encyclopaedic style (typically incomplete sentences) with complete sentences (Fuertes-Olivera and Arribas-Baño 2008). Fifth, together with their primarily cognition-oriented functions, encyclopaedic notes can also support communication-oriented functions by offering collocations and showing other linguistic characteristics of the lemma (Bergenholtz and Tarp 1995: 143–150).

Encyclopaedic labels are often special symbols or abbreviated terms used in reference works to help the user find a particular lemma, choose the correct equivalent, or indicate the association of a term with a particular subject field. In specialised dictionaries taking into account dictionary functions and users’ needs, encyclopaedic labels will assist users more in communication-oriented situations involving a foreign language than in situations involving their native language, provided that the target users have at least a working knowledge of the subject field (or sub-field) treated, particularly in culture-dependent domains.
The labels are also useful for cognition-oriented functions, although they cannot replace encyclopaedic notes or systematic introductions (Bergenholtz and Tarp 1995: 150–154).

Systematic introductions are separate dictionary components, placed either in the front matter or in the back matter (usually following the user’s guide), and aim at providing help in cognition-oriented and communication-oriented user situations. In the former type of situation, the component provides an introduction to or a systematic, detailed presentation of the subject field covered by the dictionary, thereby fulfilling its cognitive function. In communication-related situations, the component supplements the encyclopaedic information offered with language information in the form of collocations, auxiliary words, and examples of standard LSP usage (Bergenholtz and Tarp 1995: 154–159 and 176–178; Bergenholtz and Nielsen 2006: 290–293). On the whole, this type of outer text helps users in a systematic way

to understand the relevant subject field. In relation to the systematic introduction of the Encyclopedic Dictionary of Gene Technology, Humbley explains that:

[i]n forty-odd closely printed pages the reader is presented with the basics of microbiology, DNA and gene technology and the ethics involved in gene technology. This section is illustrated with tables and structures of the entities described, and a couple of schemata to indicate how knowledge about the field has evolved, but there are generally few purely didactic illustrations. In spite of the presentation, which is predictably dry due to the extreme compression, it is obviously very useful to the reader to have a potted survey of the field. (Humbley 2003: 594)

2. Systematic introductions in specialised dictionaries

Systematic introductions should be adapted to the function(s) of the dictionary, because these introductions aim to cater for users who may need a general understanding, a more systematic presentation, or a comparative description of a subject field for comprehending (for example, students preparing for an exam), producing (for instance, journalists writing articles), and/or translating texts. Although most published dictionaries pay only scant attention to lexicographical functions, usually because the process of compiling dictionaries is primarily based on economic considerations which lead to publishing dictionaries for the largest possible audience, there are some notable exceptions. The Encyclopedic Dictionary of Gene Technology (Kaufmann et al. 1998) contains a well-developed systematic introduction written by lexicographers who paid attention to several variables affecting the length, nature, scope, and structure of systematic introductions: subject-field coverage, specialised knowledge, language of the intended users, number of languages covered, and culture dependence or independence of the subject field covered by the dictionary in question.

The contents of systematic introductions are directly affected by the fact that specialised dictionaries may cover one entire subject field (single-field dictionary), several subject fields (multi-field dictionary), or one or more sub-fields (sub-field dictionary), in a very detailed (maximising dictionary) or limited (minimising dictionary) way. Consequently, systematic introductions are likely to be more relevant in either maximising or minimising single-field or sub-field dictionaries than in multi-field dictionaries, as the introductions would become too complex and voluminous in the last situation.

In terms of factual knowledge of the target users, interested laypersons and semi-experts will benefit greatly from systematic introductions.
In contrast, experts are unlikely to need them because their expected degree of knowledge will force lexicographers to produce very detailed and complex systematic introductions

if they are to be of any use. Laypersons and semi-experts may benefit from two different types of systematic introductions: semi-experts require a more detailed and systematic presentation than interested laypersons, who may benefit from a general and brief introduction to the subject field described in the dictionary.

The language of the target users affects systematic introductions in respect of the explication language used. The best solution is for the explication language to be the user’s native language if the dictionary primarily aims to assist users in comprehending texts, and the user’s foreign language if the dictionary supports foreign-language translation and production. In addition, the degree of language complexity in the systematic introductions must be adapted to the linguistic competences of the intended users.

The number of languages covered by the dictionary also affects systematic introductions, as spatial constraints are likely to rule out the inclusion of encyclopaedic introductions in specialised multilingual dictionaries (Bergenholtz and Tarp 1995: 177). However, monolingual and bilingual LSP dictionaries are suitable for including systematic introductions in the language(s) covered by the dictionary. This means that in bilingual dictionaries, systematic introductions should be provided in both languages, with different possibilities.
For example, Bergenholtz and Nielsen (2006) propose that a multifunctional specialised dictionary designed for laypersons and semi-experts needs two systematic introductions, each written in the relevant language and style; in a bilingual dictionary covering a culture-independent subject field, the two aforementioned subject introductions will have to be converted into four; and three-times-two systematic introductions are needed for bilingual dictionaries covering culture-dependent subject fields: “one in L1 about L1, one in L1 about L2, one in L1 about differences between L1 and L2, one in L2 about L1, one in L2 about L2, and one in L2 about the differences between L1 and L2.” (Bergenholtz and Nielsen 2006: 294)

The cultural component is a key lexicographical element in dictionaries for culture-dependent subject fields such as business and economics, where the subject matter shows major or minor differences between languages and cultures. However, matters of business and economics are subject to the workings of two opposing forces: 1) globalisation and systematisation of rules promoted by international organisations; 2) the presence of cultural traditions. In the field of accounting, there are both international standards developed by, for instance, the International Accounting Standards Board (IASB) and national rules and traditions, most of which are being constantly adapted to the international requirements.
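
The “three-times-two” count that Bergenholtz and Nielsen (2006: 294) give for a bilingual dictionary of a culture-dependent subject field can be checked by pairing each explication language with each topic to be described. The following Python sketch is purely illustrative; the function name and the labels are assumptions made here and are not part of their proposal.

```python
from itertools import product


def culture_dependent_introductions(languages=("L1", "L2")):
    """Pair each explication language with each topic to be described."""
    # Topics: the subject field in each language community, plus the
    # differences between the two (hypothetical labels).
    topics = list(languages) + ["differences between " + " and ".join(languages)]
    return [
        f"one in {language} about {topic}"
        for language, topic in product(languages, topics)
    ]


introductions = culture_dependent_introductions()
for item in introductions:
    print(item)
```

Two explication languages combined with three topics give the six introductions listed in the quotation.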

3. Data distribution and access in systematic introductions

Lexicographers are expected not only to balance their texts in systematic introductions according to the characteristics in the user profile, both in terms of factual information and the language in which it is written, but also to adopt adequate data distribution and access structures. Bergenholtz and Nielsen present different options for the distribution of data:

1. All data are placed in the subject-field component.
2. The data in the subject-field component are all restatements of the data found in the articles.
3. The data in the subject-field component are partly restatements of the data in the articles.
4. The data in the subject-field component complement the data in the articles.
5. The data in the subject-field component are restatements of all the data in the articles as well as new and supplementary data.
6. The data in the subject-field component are partly restatements of some of the data in the articles and partly new and supplementary data.
(Bergenholtz and Nielsen 2006: 293)

They add that the best option for a multifunctional specialised dictionary for laypersons and semi-experts is option 6 for several reasons. One is that it allows lexicographers to omit very specific data about a particular term in the systematic introduction and place these data in the relevant article, where they are more easily understood by potential users (for example, a variant form of a term). Furthermore, it favours communication-oriented functions by including relevant data in the article (for example, word class, pronunciation, grammar, etc.). Option 6 also provides easy access routes through the use of explicitly and fully integrated systematic introductions.
The access routes are made up of direct cross-references from the individual dictionary article to the systematic introduction, which needs to be divided into numbered chapters or paragraphs, in which key terms have been highlighted, and where illustrations (if necessary), a separate table of contents, and a subject index have also been provided. Finally, a stand-alone section covering partly new and supplementary data will assist users in a self-study context as a genuine text in its own right (Tarp 2005) if it is organised according to a systematic subject classification that disregards the alphabetic ordering, as alphabetisation will upset the subject-field inherent logic and make it more difficult for users, especially learners, to understand and relate details, particularly when references are not explicitly given but are merely implied.
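
In an electronic dictionary, the integration requirement described above (every cross-reference in an article should lead to an existing numbered section of the systematic introduction) can be checked mechanically. The sketch below is a hypothetical illustration in Python; the data layout, the section numbers, and the function name are assumptions made for the example and do not describe any existing dictionary.

```python
def unresolved_references(articles, introduction_sections):
    """Return (lemma, section) pairs whose target section does not exist."""
    missing = []
    for lemma, referenced_sections in articles.items():
        for section in referenced_sections:
            if section not in introduction_sections:
                missing.append((lemma, section))
    return missing


# Hypothetical numbered sections of a systematic introduction.
sections = {
    "1.1": "The accounting equation",
    "1.2": "Double-entry bookkeeping",
}

# Hypothetical articles with the section numbers they cross-refer to.
articles = {
    "accounting equation": ["1.1"],
    "equation": ["1.1"],
    "T account": ["1.2", "2.4"],  # "2.4" is deliberately absent above
}

print(unresolved_references(articles, sections))
```

An empty result would indicate that every access route from the wordlist ends in a section that actually exists.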

4. Encyclopaedic information in business dictionaries

Business dictionaries are reference works that include the lexical units occurring in the field of business together with some specific terms typically used in economics and many general-language words. Recent analyses have shown that most existing business dictionaries do not inform users of their orientation in terms of functions (Andersen and Fuertes-Olivera 2009), and that they do not pay much attention to a new breed of users demanding more pedagogically-sound reference works (Fuertes-Olivera and Arribas-Baño 2008).

The above findings have at least two important lexicographical consequences. First, a proper process of lemma selection in these reference works is difficult to implement because the processes of terminologization and de-terminologization are very active in the hybrid nature of business discourse. Second, lexicographers seem to follow a long lexicographical tradition characterised by not knowing which encyclopaedic information to include and where to place it. A result of the above processes is the proliferation of monolingual multi-field and single-field or sub-field business dictionaries containing similar data although they are targeting different user groups. Consequently, the market for business dictionaries is full of reference works with insufficient encyclopaedic information.

For example, Peter Collin Publishing has a series of printed monolingual English dictionaries comprising multi-field business dictionaries (including an American business dictionary, and an English business dictionary), together with single-field and sub-field dictionaries of accounting, banking and finance, government and politics, personnel management, marketing, etc.
In addition to encyclopaedic notes and encyclopaedic labels, these dictionaries contain a back matter component called ‘Supplement’, providing an overview of selected topics thought to be relevant for potential users: international telephone codes; local times around the world; numbers; weights and measures; world currencies; examples of business letters and curriculum vitae; descriptions of a number of very basic concepts. An analysis of these supplements shows that the multi-field business dictionaries and single-field or sub-field business dictionaries published by Peter Collin Publishing all include similar topics and that these supplements only focus on a handful of relevant concepts.

This type of outside matter component has three main drawbacks. First, the individual articles contain no cross-references to the supplements. Second, the supplements are not divided into numbered chapters or paragraphs to which references may be made by indicating the number of the relevant section in the dictionary article, nor do they highlight relevant terms by using bold face. In short, the supplements contain no structural indicators that help users navigate the texts and find what they are looking for. Third, the supplements are not based on a carefully prepared subject classification and consequently the

concepts given are not treated as integrated systems of a domain but as separate elements. Moreover, none of the supplements is accompanied by a separate table of contents or references to relevant literature and therefore does not facilitate more extensive studies on the part of users.

To sum up, two conclusions can be drawn from the lexicographical practice of including supplements instead of appropriate systematic introductions. First of all, the back matter components do not really work as systematic introductions. In the second place, the considerable similarity of the various supplements shows that lexicographers have not paid due attention to users’ needs, user situations, and lexicographical functions.

The two conclusions can be illustrated by analysing the different structures found in the Dictionary of Accounting (1992). Examples 1, 2, and 3 show the lemmas accounting and equation presented in the wordlist, and the concept the accounting equation described in the back matter.

◊ accounting noun work of recording money paid, received, borrowed or owed; accounting methods or accounting procedures; accounting system; accounting bases = the possible ways in which accounting concepts may be applied to financial transactions (the methods used to depreciate assets, how intangible assets or work in progress are dealt with, etc.); (….)
comment: note the various theoretical bases for accountancy, moving from the general to the specific: ‘accounting concepts’ are general: so various concepts, such as the ‘accruals concept’ may apply to depreciation; ‘accounting bases’ are more specific, so in the case of depreciation, the bases could be straight-line depreciation, reducing balance depreciation, etc.; ‘accounting policies’ are the policies applied by a company, so the company policy could be to apply straight-line depreciation in its financial statements
quote applicants will be professionally qualified and have a degree in Commerce or Accounting Australian Financial Review

Example 1: Accounting in the wordlist of the Dictionary of Accounting (1992)

◊ equation noun set of mathematical rules applied to solve a problem; the basic accounting equation is that assets equal liabilities plus capital

Example 2: Equation in the wordlist in the Dictionary of Accounting (1992)

Assets = Capital + Liabilities
Which can be restated as:
Capital = Assets – Liabilities

Or:
Liabilities = Assets – Capital
Assets are: the resources owned by the company (cash, debtors, machinery, stocks, etc.)
Liabilities and capital are: the amounts of money owed to people who have provided those resources (banks, trade creditors, shareholders, debenture holders, etc.)

Example 3: The Accounting Equation in the back matter ‘supplement’ in the Dictionary of Accounting (1992)

The analysis of the three examples and of the outside matter in the dictionary yields the following findings:

1. The Preface states that the dictionary provides a basic vocabulary of terms used in the fields of accounting, bookkeeping and general finance (i.e. it primarily targets laypersons and semi-experts), that it covers both British and American usage, that the words and phrases are defined in simple English, and that explanatory comments are given in some cases to expand the definitions, as shown in Example 1.
2. The above examples show that the dictionary supports text reception more than text production and translation. It only shows the word class of the entry word, together with a short definition, collocations, and examples in the form of quotes, although this only occurs in very few instances (for example, only 28 of the 248 entries and sub-entries under the letter a have a quote).
3. Although the encyclopaedic information given in the form of comments is interesting, its value for the intended primary users is negligible because comments are scarce (only 20 of the entries under the letter a have a comment) and not integrated: Examples 1, 2, and 3 refer to the same concept, without cross-references from one to another.
4. The lack of an appropriate system of cross-references is particularly evident in the treatment of accounting, where accounting equation is not mentioned, nor are users cross-referred to the supplement or to the entry equation, although the dictionary contains a lot of similar expressions (accounting methods; accounting procedures; accounting system; accounting bases; accounting concept; accounting entity; accounting period; accounting policies; accounting rate of return; accounting standards; Accounting Standard Board; accounting technician; accounting unit; cost accounting; current cost accounting; financial accounting; management accounting).
5. The concepts described in the back matter are very few in number, unrelated to the dictionary articles and do not form part of a description of the conceptual structure of the subject field. The supplement refers to a few basic concepts

(statements of accounting standard practice, the accounting equation, T accounts, modified accounts, basic double-entry bookkeeping), sometimes with examples of typical accounting texts: a profit and loss account, a balance sheet, a statement of source and application of funds, a cash flow statement, and a value added statement.

Related to the above discussion is the fact that many compilers of business dictionaries are unsure where to place the necessary conceptual information. Instead of producing systematic introductions, they prefer a system of cross-references among dictionary articles for offering encyclopaedic data. A typical example is found in the Dictionary of Business and Management (Rooney et al. 2003). Its User’s Guide (2003: vii) claims that the dictionary provides definitions of more than 6,000 international business and management terms, covering the fields of e-commerce, economics, finance, banking and accounting, human resources and personnel, marketing, operations and production, and statistics; it also adds that there is a back matter component called ‘Facts and Figures’, containing “fascinating data on the world economy”, “practical examples of key business components”, and “comprehensive listings of world currencies and stock exchanges.” On closer inspection, however, the 19-page back matter is not useful because it shows the same deficiencies as the Peter Collin dictionaries and because most of the pages include randomly selected information:

1. 12 pages are devoted to constantly-changing economic indicators for a range of countries: national income, growth, GNI per capita, growth of output, external debt, manufacturing output, new patent applications, passenger car ownership, TV ownership, billion-dollar brands, retail sales growth, list of billionaires per country. This information is useless conceptually (it changes almost on a daily basis) and communicatively, as it is only a list.
2. 4 pages are reserved for listing currencies and capitals of the world’s top business centres and stock exchanges. Again, this information has a very limited use and does not offer any hint regarding the conceptual structure of the domain.
3. 3 pages are used for explaining CVs and accompanying covering letters. Although they are acceptable illustrations, they only cover a tiny part of the conceptual domain and therefore have very limited value.

As in the case of the Peter Collin collection, the Dictionary of Business and Management (2003) resorts to including detailed encyclopaedic data in the dictionary articles by means of a well-developed system of cross-references among the entries. Direct and indirect cross-references are used to link terms that are conceptually

related (Examples 4 and 5 respectively), or in need of expansion in another article (Example 6).

administration school Gen Mgt see business administration

Example 4: Use of see together with bold face and italics for direct cross-references in the Dictionary of Business and Management (2003)

activity-based management Gen Mgt a management control technique that focuses on the resource costs of organisational activities and processes, and the improvement of quality, profitability, and customer value. This technique uses activity-based costing information to identify strategies for removing resource waste from operating activities. Main tools employed include: strategic analysis, value analysis, cost analysis, life-cycle costing, and activity-based budgeting.

Example 5: Use of bold face and italics for indirect cross-references in the Dictionary of Business and Management (2003)

Adair, John Eric (b. 1934) Gen Mgt British academic. Best known for his three-circle model of leadership, which is based on overlapping circles representing the task, the team, and the individual. Adair’s model, otherwise known as action-centred leadership, is described in the book of the same name (1973). Like Warren Bennis, Adair, who has a military background, believes that leadership can be taught.

Example 6: Expanded information in the Dictionary of Business and Management (2003)

More interesting are the back matter components ‘Apéndices’ (Appendices) in the Diccionario de Economía y Finanzas [Dictionary of Economics and Finance] (Tamames and Gallego 1994), a very popular dictionary in the Spanish market since its first edition was published in 1988. In its 1994 edition, the dictionary lemmatises around 5,000 Spanish business/economics terms in its main wordlist and includes a well-developed frame structure containing different front matter and back matter components.
The former consist of a table of contents, a preface written by a famous Spanish linguist highlighting the importance of cognition-oriented dictionaries, and an introduction and a user’s guide by the authors commenting on lexicographical aspects of the dictionary. The back matter is devoted to encyclopaedic data:

1. A graphic and mathematical 154-page appendix containing very detailed verbal definitions, including mathematical formulae, of 154 terms. In the user’s guide the authors indicate that these terms are also described in the main wordlist in simpler language and without the help of mathematical language. Although the authors do not explain the reasons for the inclusion of the appendix (in the user’s guide, they only indicate that the 154 terms were included because all of them can be explained by means of graphs and figures), this lexicographical practice is in line with the tenets of the function theory as it equates the complexity of definitions with the needs of the target group (Bergenholtz and Kaufmann 1997). For unknown reasons, this appendix does not appear in the recent editions of the dictionary, even though such a component is very useful for semi-experts. For example, students can use it as a study book of 154 important concepts in the field.
2. A list of world countries specifying currencies and their symbols.
3. A list of prestigious economic journals, indicating publishing house, periodicity, and contents.
4. A list of business/economic terms which should be included in the Diccionario de la Lengua Española (Spanish Language Dictionary published by the Spanish Royal Academy).
5. A list of all the Spanish terms included in the wordlist with their English equivalents and the sub-field(s) to which they belong. In the user’s guide the authors refer to this list as a kind of “fast vocabulary” that is very useful for reception purposes as it allows easy access to terms in both Spanish and English and the discovery of related conceptual terms as they all cross-refer users to the classification system used in the next component of the dictionary.
6. A classification system based on the classification proposed by the Journal of Economic Literature, perhaps the most widely used thematic classification employed in business/economics, containing all the terms assigned to each of the 29 sub-fields covered by the dictionary.
7. An English-Spanish wordlist that is very suitable for the comprehension of English texts by Spanish-speaking users who can look up the listed terms in the main wordlist.
8. An index of proper names.
9. A list of references containing more than 100 dictionaries and handbooks.

Only one of the above components is incorporated in the 2006 edition (13th edition) of the dictionary: the classification system, which has been simplified to make it more suitable for semi-experts and laypersons. This edition introduced a new component: a list of abbreviations and acronyms, some of which are also in the main wordlist. Finally, the Diccionario de Términos Económicos, Financieros y Comerciales [Dictionary of Economics, Finance and Business Terms] (2008) contains a front matter component of twenty-odd pages called ‘introducción’ (introduction), in which the authors explain that the dictionary targets primarily Spanish students of

ESP and translation. Consequently, the encyclopaedic information offered includes encyclopaedic labelling and notes suitable for the intended users, and a 10-page academic description of the register focusing mainly on aspects of term formation (for example, many English financial terms are metaphorical) and the linguistic processes used in the dictionary for giving equivalents (calques, loan translations, translations, adaptations, etc.). In Example 7, the lexicographers use superscript, sub-field labels such as mer finan/prod/diner, bolsa (financial market/product/money/stock market), language and etymology comments (pipo is a calque whose origin can be traced to domino or dice games), cross-references to other articles (the symbol V), and style markers (col.):

pip1 n: MER FINAN/PROD/DINER, BOLSA pipo; se aplica este término, calcado del inglés, a la fluctuación en un tipo de cambio, equivalente a 0,00001 unidades; en su origen, la palabra inglesa se refiere a cada uno de los puntos que aparecen en las fichas de dominó o de los dados; V. junior pip. [Exp: pip2 (ganar por un margen escasísimo ◊ The company was just pipped for/to the contract), pipped at the post, be col (perder en los últimos metros/por un pelo)].

(pip1 n: FINANC MAR/PROD/MON/, STOCK MARKET pipo; this term, an English calque, refers to the fluctuation in an exchange rate, equivalent to 0.00001 units; in its origin, the English word refers to each one of the points that appear on the domino tiles or on dice; V. junior pip. [Exp: pip2 (ganar por un margen escasísimo ◊ The company was just pipped for/to the contract), pipped at the post, be col (perder en los últimos metros/por un pelo)])

Example 7: The entry pip in the Diccionario de Términos Económicos, Financieros, y Comerciales (2008)

The above analysis shows that business dictionaries do not have proper systematic introductions: the encyclopaedic information given is mostly addressed to the dictionary articles, and the separate components contained in the outside matter lack a full presentation of the domain, or do not refer to the subject field but to something else. In addition, these components do not support the genuine purpose of the dictionary and do not function interactively with the wordlist to assist users in retrieving the required information from the dictionary.

The way in which lexicographical structures are employed gives lexicographers the opportunity to be innovative. For instance, we can devise systematic introductions that explain the field of accounting to users by incorporating them into the Accounting Dictionaries (Nielsen, Mourier and Bergenholtz 2003–2008), thus utilising the outer texts in a much more creative and innovative way than many published business dictionaries do. Moreover, assuming that dictionaries

can only be used effectively if the target users can have unimpeded access to the data they need (Bergenholtz and Gouws 2007), the proposed outer text must be of the type called extended aided integrated systematic structure (extended aided integrated subject-field component in Bergenholtz and Nielsen 2006: 292–3).

5. Proposing a systematic introduction for the Accounting Dictionaries

The Accounting Dictionaries (Nielsen, Mourier and Bergenholtz 2003–2008) consist of a network of four online accounting dictionaries for learning accounting and its specialised language. The network contains two monolingual (a Danish and an English) and two bilingual (a Danish-English and an English-Danish) accounting dictionaries linked to each other so that users can easily move from one dictionary to another. Nielsen (2007) explains that the network was designed to assist users in learning L1 as well as L2 accounting terminology and usage. Each of the four accounting dictionaries is polyfunctional. For example, the English dictionary provides data for reading, understanding and producing accounting texts; and the English-Danish dictionary is designed for translating texts into and producing texts in Danish. We propose incorporating a Spanish part into the dictionary network, aimed mainly at Spanish students who need to learn native-language as well as foreign-language accounting terminology. These students need to acquire knowledge about accounting through terminology as well as register-specific usage in Spanish and English to understand textbooks and to write factually and grammatically correct accounting texts. Those users who are enrolled in translation programmes also need to acquire knowledge that enables them to produce factually and grammatically correct translations of accounting texts.
This “requires factual and linguistic knowledge about the native-language source text, including register, and knowledge about the foreign-language register and usage of the target text.” (Nielsen 2007: 369). The factual and linguistic data are distributed throughout the dictionary, forcing lexicographers to carefully decide their arrangement and to devise a system of relations between them. In specialised dictionaries, lexicographers should provide part of the encyclopaedic information in systematic introductions in order to reduce the length of individual articles, avoid repetitions, and provide an overall, systematic overview of factual (and linguistic) aspects of the subject field. The proposed systematic introductions will have a primarily cognition-oriented function for semi-experts (Spanish students) and deal secondarily with the communication-oriented functions of text reception, text production and translation (English into Spanish and Spanish into English). Another important

aspect is the process of standard-setting, which has led to a convergence of European and American norms and rules, and whose lexicographical consequence is that the systematic introduction envisaged will be similar to one intended for a culture-independent subject field. Although a possible solution is to prepare three-times-two systematic introductions as proposed by Bergenholtz and Nielsen (2006), this will not be necessary in the field of accounting in light of the aforementioned trend towards systematisation and harmonisation of rules. Instead, a culture-dependent presentation of the subject field of accounting in the form of a comparative systematic introduction in the different languages covered by the dictionary is called for, together with ample room for commenting on peculiarities. This solution will imply that the systematic introduction in an English-Spanish bilingual dictionary of accounting should be written in both languages, focus on the international rules, and, when necessary, refer briefly to relevant national rules and traditions explained in full in the corresponding dictionary article(s). For instance, the entries for cost value and market value must cross-refer users to the systematic introduction explaining accounting methods and procedures and to the entry swap (permuta) where lexicographers can comment on an important difference between Spanish and English accounting traditions: when recording the value of a swap, English accounting systems tend to record the market value of the swap whereas Spanish accountants are required by law to record the cost value of the swap. Moreover, there are some exceptions to this general principle and data on these should be presented in the dictionary article for the Spanish term permuta. These specific data should be given in encyclopaedic notes, reserving the systematic introduction for explaining more general concepts such as the way of recording values in balance sheets and accounting books. 
In particular, the proposed introduction to the subject field will be an extended aided fully-integrated systematic introduction (Bergenholtz and Nielsen 2006) based on the following principles:

1. It will provide a systematic overview of the entire subject field and can therefore be used as an independent textbook on accounting (Bergenholtz and Svensén 1994) written in pedagogical language.
2. It will be written in English and Spanish with information based on the conceptual framework prepared by the International Accounting Standards Board, supplemented, if necessary, by comparative information about differences among Spanish, English and American practices; only general principles will be explained in the systematic introduction whereas specific details will be explained in the dictionary articles.

3. It will contain new and supplementary data together with restatements of the data already given in the dictionary articles; specific data about a particular term will be included in the relevant article treating the term.
4. The two language versions will be presented in parallel columns, as this will facilitate comparison of terms and usage (Bergenholtz and Tarp 1995: 158–159).

Example 8 shows a draft of the initial chapter of the systematic introduction to be incorporated into the Accounting Dictionaries.

La contabilidad es la ciencia económica que analiza la realidad económico-financiera de un agente con vistas a ofrecer una información sobre el estado patrimonial pasado, presente y futuro del mismo. Consiste en anotar en los diferentes libros de contabilidad las diferentes partidas desglosando lo que el agente paga, recibe, presta, o toma prestado. El producto final de la contabilidad son las cuentas anuales, que son un conjunto de estados contables o estados financieros integradas por el balance, la cuenta de pérdidas y ganancias, la memoria explicativa, el estado de cambio en el patrimonio neto, y el estado de flujos de efectivo. En España es costumbre acompañar las cuentas anuales con el informe de gestión.

Accounting is the economic science devoted to the keeping of accounts, and preparation and audit of financial and economic statements with the aim of offering a fair and true view of the past, present, and future wealth of an organization. The practice of accounting, referred to as accountancy in the USA, consists in recording in different accounting books money paid, received, borrowed, or owed. The final product is the annual accounts or financial statements which comprises documents such as the balance (sheet), the profit and loss account, the annual report, changes in equity, cash-flow statement, and auditor’s report.

Example 8: § 1. La contabilidad / accounting

Example 8 informs users that accounting is a science (i.e. its procedures are subject to scientific practices accepted everywhere) whose records offer a fair and true view of the money paid, received, borrowed, or owed by an organization, thus helping market agents make informed decisions. Users also learn that a Spanish tradition requires the joint publication of annual accounts and management reports. In addition, users are cross-referred to specific dictionary articles where necessary data for cognition-oriented and communication-oriented functions are given.

6. Concluding remarks

As practical tools, specialised dictionaries should assist users in enhancing their knowledge of both the subject field and the related language. The preparation of well-developed systematic introductions will contribute to both tasks, especially if their design is based on users’ profiles and dictionary functions. The length, nature, scope, and structure of systematic introductions in specialised dictionaries should be adapted to variables such as the subject fields covered by the dictionary, the degree of factual knowledge target users have together with their native-language competence, users’ foreign-language competence, the number of languages covered, and the nature of the specialised language recorded. The analysis of the encyclopaedic information contained in business dictionaries indicates that there is room for improving their quality by incorporating extended aided integrated systematic introductions, because most existing business dictionaries do not include well-thought-out outer matter components but continue to include traditional ‘supplements’ whose information is almost useless, particularly because the information is selected at random and does not target specific users. The proposed Spanish part of the Accounting Dictionaries will be aimed mainly at Spanish students of accounting but also at Spanish translators of economic texts. For these users we will include an extended aided systematic introduction to accounting, written in Spanish and English, which will cater for both the cognitive and communicative needs of the intended users. This introduction will offer an overview of the subject field of accounting which may be used as a textbook, and the Spanish and English texts will illustrate typical accounting language that is useful for producing texts in and translating texts into English and/or Spanish.
The English systematic introduction will be particularly relevant when companies have to present their annual financial records in English either because of legal requirements or because they want to have their shares quoted on international stock markets.

References

A. Dictionaries

[Accounting Dictionaries] Nielsen, S., Mourier, L. and Bergenholtz, H. 2003–2008. Danish Accounting Dictionary. URL: http://www.regnskabsordbogen.dk/regn/dkdk/dkregn.aspx; Danish-English Accounting Dictionary. URL: http://www.regnskabsordbogen.dk/regn/dkgb/dkgbregn.aspx; English Accounting Dictionary. URL: http://www.regnskabsordbogen.dk/regn/gbgb/gbregn.aspx; English-Danish Accounting Dictionary. URL: http://www.regnskabsordbogen.dk/regn/gbdk/gbdkregn.aspx (Accessed June 23 2008).

[American Business Dictionary] Collin, P.H., Weiland, C. and Dohn, D.S. 1991. American Business Dictionary. London: Peter Collin Publishing.
[Dictionary of Business and Management] Rooney, K. et al. 2003. Chartered Management Institute Dictionary of Business and Management. London: Bloomsbury.
[Diccionario de Economía y Finanzas] Tamales, R. and Santiago G. 1994. Diccionario de Economía y Finanzas. First edition. Madrid: Alianza Editorial.
[Diccionario de Términos Económicos, Financieros y Comerciales] Alcaraz Varó, E. and Hughes, B. 2008. Diccionario de Términos Económicos, Financieros y Comerciales. A dictionary of economic, financial and commercial terms. Inglés-Español / Spanish-English. Fifth edition. Barcelona: Ariel.
[Dictionary of Accounting] Collin, P.H. and Joliffe, A. 1992. Dictionary of Accounting. London: Peter Collin Publishing.
[Encyclopedic Dictionary of Gene Technology] Kaufman, U., Bergenholtz, H., in cooperation with B. Stumman, S. Tarp, L. de la Rosa Marabet, N. la Serna Torres and G. la Serna Miranda. 1998. Encyclopedic Dictionary of Gene Technology. English-Spanish. Diccionario Enciclopédico de Ingeniería Genética Español-Inglés. Toronto: Lugus.
[English Business Dictionary] Collin, P.H. 1986. English Business Dictionary. London: Peter Collin Publishing.

B. Other Literature

Andersen, B. and Fuertes-Olivera, P.A. 2009, forthcoming. “The application of function theory to the classification of English monolingual dictionaries.” Lexicographica 25.
Bergenholtz, H. and Gouws, R. 2007. “The access process in dictionaries for fixed expressions.” Lexicographica 23: 236–260.
Bergenholtz, H. and Kaufmann, U. 1997. “Terminography and lexicography. A critical survey of dictionaries from a single specialised field.” Hermes 18: 91–125.
Bergenholtz, H. and Nielsen, S. 2006. “Subject-field components as integrated parts of LSP dictionaries.” Terminology 12(2): 281–303.
Bergenholtz, H. and Svensén, B. 1994. “Systematisk inledning till Nordisk lexikografisk ordbok (NLO)” [Systematic introduction to the Nordic Dictionary of Lexicography]. LexicoNordica 1: 149–185.
Bergenholtz, H. and Tarp, S. (eds). 1995. Manual of Specialised Lexicography. The Preparation of Specialised Dictionaries. Amsterdam/Philadelphia: John Benjamins.
Bergenholtz, H. and Tarp, S. 2003. “Two opposing theories: On H.E. Wiegand’s recent discovery of lexicographic functions.” Hermes 31: 171–196.
Bergenholtz, H. and Tarp, S. 2004. “The concept of dictionary usage.” Nordic Journal of English Studies 3: 23–36.
Fuertes-Olivera, P.A. and Arribas-Baño, A. 2008. Pedagogical Specialised Lexicography. The representation of meaning in English and Spanish business dictionaries [Terminology and Lexicography in Research and Practice 11]. Amsterdam/Philadelphia: John Benjamins.
Humbley, J. 2003. “Review of Kaufmann, U. and Bergenholtz et al. (1998): Encyclopedic Dictionary of Gene Technology, vol. 1, English (with Spanish equivalents), 385 p. vol. 2, Spanish (with English equivalents): Toronto: Lugus Libros, 411 p.” Meta 48(4): 593–595.
Nielsen, S. 2007. “Dictionary networking in an LSP learning context.” In Teaching and Learning LSP: Blurring Boundaries. Proceedings of the 6th International AELFE Conference, Lisbon 13–15 September, M. Kuteeva and H.F. Martins (eds.), 367–373. Lisbon: ISCAL.
Tarp, S. 2005. “The pedagogical dimension of the well-conceived specialised dictionary.” Ibérica 10: 7–22.

Part IV

Data retrieval and corpus lexicography

The role of corpora in future dictionaries

D.J. Prinsloo

The aim of this article is to reflect on current utilization of corpora and lexicographic tools built from corpora as well as on future perspectives on the development of corpora and corpus tools. The point of departure is a lexicographic perspective for Afrikaans and the Bantu languages spoken in South Africa. The crucial development steps to future corpus-based lexicography, in chronological order, are: corpus creation, corpus annotation, qualitative corpus query outputs and advanced dictionary writing systems capable of extracting relevant data from corpora and other lexicographic sources. Such advanced dictionary writing systems place the lexicographer in the position of a final editor rather than a dictionary compiler.

Keywords: corpus lexicography; lemma selection; corpus annotation; data processing; lexical resource building

1. Introduction

As a leading international lexicographer, Henning Bergenholtz appreciates and utilizes electronic corpora for his lexicographic work. The corpus is generally acknowledged as an indispensable resource for the creation of dictionaries and lexicographic tools, and is a source for a variety of query programs. The value of a corpus or ‘the corpus’ should, however, not be over- or underestimated – it cannot replace the lexicographer, nor should it be regarded as inferior to the knowledge of the lexicographer in any respect (cf. Landau 2001: 273–342 for detailed descriptions of the history of corpus development, corpus compilation and corpus use). Corpora should thus be seen as a significant resource to be used judiciously. In the words of Teubert, “by exploiting corpora, bilingual and multilingual lexicography can reach a new quality level” (1996: 241). Roberts and Montgomery provide a sound perspective:

… despite the many advantages that corpora present, they must be used with some caution. The lexicographic evidence they provide must be subjected to the sound judgement of lexicographers. And lexicographers must ensure that they do not become overwhelmed by corpus evidence! (1996: 463)

It will be argued in this article that the future power of corpora as lexicographic resources will reach far beyond mere size, and corpora must be both balanced and representative. Their value as an aid to lexicography will increasingly depend on several levels of annotation, e.g. of a morphosyntactic kind and/or bi- or multilingual alignment, and on the existence of advanced corpus query programs capable of processing vast quantities of data and of rendering user-friendly and machine-readable outputs. Currently available corpus query outputs (ranging from the most basic, such as alphabetical and frequency word and lemma lists and keywords in context, to more sophisticated outputs, such as lexicographic rulers and block systems, word sketches and automated translation equivalent detection) are expected to remain the cornerstone of corpus-based or corpus-driven lexicography. However, the contribution of the corpus of the future and of the future of the corpus depends on the extent to which comprehensive behavioural patterns of words can be compiled and presented to lexicographers in a condensed but user-friendly, machine-readable format. Brief descriptions of the corpus-query outputs mentioned above will be given, and the main obstacles in terms of the compilation of comprehensive behavioural patterns will be outlined and aligned with the work and vision of John Sinclair, Sue Atkins, Patrick Hanks and Henning Bergenholtz.

2. Corpora and corpus lexicography for Afrikaans and the nine official Bantu languages in South Africa

Corpora created for isiZulu, isiNdebele, Siswati, Xitsonga, Tshivenda, isiXhosa, Sepedi, Setswana, Sesotho and Afrikaans range from one million to ten million tokens. Mega-corpus data, however, exist for Afrikaans in the form of the Media24 archive estimated at 1,000 million tokens. Data culled from all of these raw corpora has contributed substantially to dictionary compilation for these languages as well as to Human Language Technology (HLT) activities, such as the compilation of spelling checkers. Efforts to annotate these corpora for parts of speech are in progress, opening the way for advanced HLT products, such as grammar checkers, parsers, machine translation software and text-to-speech converters.

2.1 Major English corpora

Major English corpora include the American National Corpus, Bank of English, British National Corpus, BYU Corpus of American English, Oxford English Corpus, and the Scottish Corpus of Texts & Speech. These corpora contain hundreds of millions of tokens and are indispensable as databases in modern lexicography. Such corpora also constitute the foundation for future lexicographic development.

2.2 Corpus design, balance, representativeness

Apart from the issue of corpus size, corpus compilers pay particular attention to the nature, design, types and quantities of material collected and included in their corpora. This generally concerns concepts such as “balance” and “representativeness”.

A general corpus is typically designed to be balanced, by containing texts from different genres … including spoken and written …. (Kennedy 1998: 20)

For a corpus to be ‘representative’ there must be a clearly analysed and defined population to take the sample from. (Kennedy 1998: 52)

Bergenholtz and Tarp (1995: 94) assert that the requirement for exemplary composition of corpora consisting of LSP texts implies that:

1. The corpus should cover all sub-fields of the subject field in question.
2. The text types which the dictionary intends to consider should be included in relation to their presumed relevance to the intended dictionary users and situations.

Where corresponding translations for source texts exist, Bergenholtz and Tarp add that these might be incorporated as baseline material in the preparation of a bilingual LSP dictionary but warn against a number of problematic aspects that might introduce errors into the compilation of such a dictionary:

One single translator does not usually have at his disposal such a comprehensive and up-to-date collection of texts that the material may be considered sufficiently large to provide an exemplary section of the LSP(s) which the dictionary is intended to cover. (Bergenholtz and Tarp 1995: 95)

Corpus compilers have also paid much attention to corpus planning and design – the design of the Longman Lancaster English Language Corpus (Summers 1993: 201) is a good example of such efforts. The reality for most of the Bantu languages is, by contrast, such that a neatly designed collection strategy is not yet possible. Ultimately, the selection process simply involves the collection of all available texts for the specific language in order to reach corpus sizes of a few million running words.

2.3 The lemma list

One of the basic uses of corpora and corpus-query tools is the extraction and compilation of lemma lists from existing corpora. In practice, this means the generation of frequency and alphabetical lists of words from corpora and their subsequent lemmatisation. For languages such as Sepedi, Setswana, Sesotho, Tshivenda and Xitsonga, which use a disjunctive orthography, the correlation between orthographic word and lemma is very close; that is, a word list generated from the corpus is roughly equal to the eventual dictionary lemma list, with items smaller than words (e.g. affixes) and multiword lemmas as exceptions. For the conjunctively written languages isiZulu, isiXhosa, isiNdebele and Siswati, for which a stem lemmatisation strategy is traditionally followed, a rather complex lemmatisation process has to be performed on such word lists, especially in the absence of lemmatisers for these languages. Such frequency and alphabetical word lists are largely used to determine what should be included in or excluded from the dictionary, which is often regarded as the most basic challenge in lexicography (cf. De Schryver and Prinsloo 2000a) in respect of combating inconsistencies at the macrostructural level.

2.4 Sense distinction, extraction of examples, clusters, collocates, idioms and expressions

Keyword-in-context extraction, commonly referred to as concordance lines, can also be regarded as a basic corpus-query output of significant value as a lexicographic aid. Analysis of a digestible number of concordance lines can assist the lexicographer in terms of sense distinction (cf. De Schryver and Prinsloo 2000b) and also in pinpointing examples, clusters, collocations, idioms, etc. It is of interest that taking examples from the corpus, as opposed to using author-constructed examples, was a point of dispute for many years (cf. Prinsloo and Gouws 2000).

2.5 Balancing alphabetical stretches

Proper balance in alphabetical stretches in dictionaries is a lexicographic aspect that has arguably not enjoyed sufficient attention in lexicographic studies. It is a generally accepted principle that alphabetical categories in any given language do not contain an equal number of words. In Sepedi, for example, the categories B, D, L, S, and especially M, contain the majority of words while categories such as J, Q, U, V, X, Y and Z are relatively small. Prinsloo and De Schryver (2002) describe the design of so-called lexicographic rulers that reflect the presumed balance in alphabetical stretches. Figure 1 illustrates such a lexicographic ruler devised for Setswana.

Figure 1. A lexicographic ruler for Setswana (Prinsloo 2004: 165)

Prinsloo and De Schryver (2003) also highlight examples of glaring imbalances even in the compilation of major dictionary projects. They indicate, for example, that the alphabetical stretch K in the Woordeboek van die Afrikaanse Taal (WAT) has been grossly over-treated in the absence of a lexicographic ruler for Afrikaans. It is apparent, therefore, that balancing alphabetical stretches with the aid of a lexicographic ruler is axiomatic in the compilation of corpora to enhance the quality of future dictionaries.

2.6 Compilation process regulators

In practice, few dictionary compilation projects are ‘on schedule’ and programme managers often find it difficult to ‘pace’ the dictionary compilation process. Once again, the corpus offers an indispensable solution to compilers, for instance in the form of a percentage-based subdivision of the Setswana Ruler, referred to as a Block System (Table 1).

Table 1. A Block System for Setswana (Prinsloo 2004: 165)
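As a rough illustration of how a ruler and a percentage-based block subdivision could be derived from a word list, the following Python sketch counts items per initial letter, converts the counts into percentage shares, and reports which alphabetical stretch corresponds to a given percentage of the schedule. The function names, the toy word list and the assumption that schedule time maps linearly onto cumulative percentage are all illustrative; they are not taken from Prinsloo (2004).

```python
from collections import Counter
from itertools import accumulate


def build_ruler(lemmas):
    """Return [(letter, percentage share)] in alphabetical order."""
    counts = Counter(lemma[0].upper() for lemma in lemmas if lemma)
    total = sum(counts.values())
    return [(letter, 100.0 * counts[letter] / total) for letter in sorted(counts)]


def cumulative_ruler(ruler):
    """Return [(letter, cumulative percentage reached at the end of that letter)]."""
    running = accumulate(share for _, share in ruler)
    return [(letter, cum) for (letter, _), cum in zip(ruler, running)]


def letter_at(ruler, percent_done):
    """Alphabetical stretch that corresponds to percent_done % of the schedule."""
    for letter, cum in cumulative_ruler(ruler):
        if percent_done <= cum + 1e-9:  # small tolerance for float rounding
            return letter
    return ruler[-1][0]


# Hypothetical toy word list (illustrative strings, not corpus data).
toy_lemmas = ["bana", "bona", "dira", "lapa", "mme",
              "mmino", "motho", "mosadi", "sela", "tau"]
ruler = build_ruler(toy_lemmas)
for letter, share in ruler:
    print(f"{letter}: {share:.0f}%")
print("Halfway through the schedule, compilers should be in:", letter_at(ruler, 50))
```

In a real project the word list would be the lemmatised frequency list of the corpus, and the cumulative percentages could be cut into equal blocks to monitor progress.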

3. Measuring the lexicographic impact factor

It can be argued that the inclusion (or exclusion) of each lemma in a dictionary should be justified in terms of certain criteria, e.g. frequency, cultural/curriculum term, etc. The authors of corpus-based dictionaries often defend inclusion versus omission of lemmas either in terms of frequency in the general corpus or, in the case of school dictionaries, in terms of a frequency playoff between the general corpus and so-called dedicated or domain-specific corpora containing the study material of learners (cf. De Schryver and Prinsloo 2003). Dictionaries compiled by the state-funded National Lexicography Units for Bantu languages in South Africa have a strong obligation to include cultural terms and pedagogic terminology. From the point of view of frequency, the question, especially in the case of a restricted dictionary, is to what extent the dictionary covers the existing vocabulary of the language. To rephrase this in corpus-linguistics terms: what is the percentage of tokens in the corpus covered by the dictionary? Word frequencies in any given corpus are truly Zipfian in the sense that a relatively small percentage of highly used words accounts for a large percentage of occurrences in the language.

Zipf’s law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, which occurs twice as often as the fourth most frequent word, etc. For example, in the Brown Corpus “the” is the most frequently occurring word, and all by itself accounts for nearly 7% of all word occurrences (69971 out of slightly over 1 million). True to Zipf’s Law, the second-place word “of” accounts for slightly over 3.5% of words (36411 occurrences), followed by “and” (28852). Only 135 vocabulary items are needed to account for half the Brown Corpus. (Wikipedia 2009)
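The proportionality in the quotation can be checked with a little arithmetic. The Python sketch below takes the three Brown Corpus counts quoted above and compares them with the frequencies that an idealised Zipf distribution, f(r) = f(1)/r, would predict. The function name and the choice of exponent 1 are assumptions made for illustration.

```python
def zipf_predicted(top_frequency, rank, exponent=1.0):
    """Frequency predicted for a given rank under an idealised Zipf distribution."""
    return top_frequency / rank ** exponent


# Counts quoted in the passage above (Brown Corpus).
observed = {1: ("the", 69971), 2: ("of", 36411), 3: ("and", 28852)}
top = observed[1][1]

for rank, (word, count) in observed.items():
    predicted = zipf_predicted(top, rank)
    print(f"rank {rank} {word!r}: observed {count}, "
          f"predicted {predicted:.0f}, observed/predicted {count / predicted:.2f}")
```

With these figures the observed counts for “of” and “and” come out somewhat above the idealised prediction, which is a reminder that the law describes a tendency rather than an exact relationship.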

In COBUILD2 the most frequent 14,700 lemmas are marked by means of filled diamonds on a scale of five filled diamonds to one filled diamond in descending order (Table 2). From Table 2, it is clear that the top 1,900 lemmas represent 75% of English (tokens) and the top 14,700 an astonishing 95%.1 A Sepedi dictionary in which the top 5,000 tokens in the Pretoria Sepedi Corpus (PSC)2 have been lemmatised, covers almost 90% of the corpus or, roughly generalised, 90% of Northern Sotho in a given context. From Table 2 and Table 3 the impact factor of high frequency for English and Sepedi is clear and the implication for lexicography is quite significant.
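The coverage question posed above reduces to a simple ratio: the summed frequencies of the word types a dictionary treats, divided by the total number of tokens in the corpus. The Python sketch below recomputes the Sepedi percentages from the figures reported in Table 3 and adds a generic helper for an arbitrary frequency list. The helper name and the toy frequency list are illustrative assumptions.

```python
from collections import Counter


def coverage(freqs, top_n):
    """Percentage of all tokens accounted for by the top_n most frequent types."""
    total = sum(freqs.values())
    covered = sum(count for _, count in freqs.most_common(top_n))
    return 100.0 * covered / total


# Figures reported in Table 3 (Pretoria Sepedi Corpus).
corpus_tokens = 5_957_553
summed_frequencies = {1_000: 4_615_053, 5_000: 5_250_768, 10_000: 5_462_500}
for n, summed in summed_frequencies.items():
    print(f"top {n:>6,} types: {100.0 * summed / corpus_tokens:.1f}% of tokens")

# Hypothetical toy frequency list, only to show the helper in use.
toy = Counter("a a a a b b c d".split())
print(f"toy corpus, top 2 types: {coverage(toy, 2):.1f}% of tokens")
```

Run on the Table 3 figures, the ratios round to 77.5%, 88.1% and 91.7%, matching the percentages given there.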

Table 2. Summary of frequency band values in COBUILD 2 (1995)

| Number of filled diamonds | Lemmas per category | Totals | % of all written and spoken English |
|---|---|---|---|
| 5 | 700 | | |
| 4 | 1200 | | |
| (Total 5+4) | | 1900 | 75 |
| 3 | 1500 | | |
| 2 | 3200 | | |
| 1 | 8100 | | |
| (Total 3+2+1) | | 12800 | 20 |
| (Total 5+4+3+2+1) | | 14700 | 95 |

Table 3. Types versus tokens in Sepedi

| Types (Number of different words) | Total frequencies (Sum of all counts) | Tokens (Total number of words in the corpus) | % of tokens |
|---|---|---|---|
| Top 1,000 | 4,615,053 | 5,957,553 | 77.5 |
| Top 5,000 | 5,250,768 | 5,957,553 | 88.1 |
| Top 10,000 | 5,462,500 | 5,957,553 | 91.7 |

3.1 Frequency trajectories

Corpora reflecting language over a time frame of at least a few decades can be very useful to the lexicographer and this aspect of corpus data has not been utilised to its full potential. In the case of lexicographic labelling, for example, lexicographers rely on introspection in the allocation of chronolectic labels such as obsolete, antiquating, neologism, fashion word, etc. Corpus frequency data over an extended period, however, offers a firm(er) basis for allocating such labels. Prinsloo and Gouws (2006) study frequency trajectories of Afrikaans words in the Media24 archive and show how frequency trajectories, even for a relatively short period of two to three decades, can provide clear evidence for chronolectic labelling (Table 4).

Table 4. Frequencies per 50 million tokens 1985–2003

| Word | 1985–89 /50M | 1990–94 /50M | 1995–99 /50M | 2000–03 /50M |
|---|---|---|---|---|
| MISSIE ‘Mission’ | 108 | 320 | 527 | 338 |
| BOSBERAAD ‘Bush summit’ | 0 | 205 | 201 | 119 |
| REGSTELLENDE ‘Affirmative’ | 45 | 1386 | 2196 | 1248 |

Figure 2. Frequency trajectories of missie, bosberaad and regstellende

Figure 2 demonstrates that the frequency of use of missie (mission) in the late nineties was five times higher than in the late eighties but that its frequency has declined in the first few years of the new millennium to the levels of the early nineties. Exactly the same patterns are observed for bosberaad (bush summit) and regstellende (affirmative). Compare Table 5 for the words gonswoord (buzz word) and scenario (scenario), both labelled as neologisms in Verklarende Handwoordeboek van die Afrikaanse Taal (HAT), and the words uitset (output) and bemagtig (empower), both of which have a relatively new but unlabelled sense (Figure 3). In summary, the compilers of this particular dictionary are able to consider labelling the specific sense of missie and bosberaad in a revised edition, e.g. as verouderend (antiquating), whereas scenario, uitset, bemagtig and gonswoord can be unlabelled.

Table 5. Fashion words and neologisms scenario, uitset, bemagtig and gonswoord

| Word | 1985–89 /50M | 1990–94 /50M | 1995–99 /50M | 2000–03 /50M |
|---|---|---|---|---|
| SCENARIO | 58 | 236 | 265 | 325 |
| UITSET ‘output’ | 19 | 57 | 67 | 86 |
| GONSWOORD ‘buzzword’ | 0 | 1 | 12 | 24 |
| BEMAGTIG ‘empower’ | 1 | 44 | 459 | 632 |

Figure 3. Frequency trajectories of scenario, uitset, gonswoord and bemagtig
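One simple way to turn such period counts into a labelling hint is to compare the most recent period with the peak. The Python sketch below does this for the normalised frequencies in Tables 4 and 5. The 0.7 threshold, the function name and the hint wording are purely illustrative assumptions and are not part of the method described by Prinsloo and Gouws (2006).

```python
PERIODS = ["1985-89", "1990-94", "1995-99", "2000-03"]

# Frequencies per 50 million tokens, as given in Tables 4 and 5.
FREQ = {
    "missie": [108, 320, 527, 338],
    "bosberaad": [0, 205, 201, 119],
    "regstellende": [45, 1386, 2196, 1248],
    "scenario": [58, 236, 265, 325],
    "uitset": [19, 57, 67, 86],
    "gonswoord": [0, 1, 12, 24],
    "bemagtig": [1, 44, 459, 632],
}


def trajectory_hint(counts, decline_ratio=0.7):
    """Crude hint based on the latest period relative to the peak period.

    decline_ratio is an arbitrary illustrative threshold, not an
    empirically established value.
    """
    peak = max(counts)
    latest = counts[-1]
    if peak == 0:
        return "no evidence"
    if latest == peak:
        return "at peak"
    if latest <= decline_ratio * peak:
        return "declining"
    return "stable"


for word, counts in FREQ.items():
    print(f"{word:<13} {counts}  ->  {trajectory_hint(counts)}")
```

A hint of this kind would only flag candidates for the lexicographer's attention; the decision on a chronolectic label remains an editorial one.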

3.2 Homonym separation

Frequency lists culled from raw corpora or even from corpora annotated for part-of-speech do not clearly reflect frequency in the case of homonyms. The Sepedi token kae occurs 5,178 times in the PSC but no conclusion can be drawn in terms of the frequency breakdown of the interrogative kae (where) versus the adjective kae (how many). The actual respective frequency counts can, however, easily be detected in the POS-tagged PSC, because they carry different POS-tags and occur in different morpho-syntactic environments. Co-texts for kae (where) typically include a verb (V) in close proximity (e.g. in the position directly preceding kae, L1), while co-texts of kae (how many) usually have a demonstrative (CDEM) in the L1 position. In the case of kgabo (flame) versus kgabo (ape), however, POS-tags do not offer a solution, because both are tagged as a noun of class 9 (N09) and, unlike kae (where) versus kae (how many), kgabo (flame) and kgabo (ape) typically occur with a similar co-text as far as POS is concerned. Disambiguation by means of typical collocations is in an experimental phase but preliminary results are quite impressive. The positions L1 to L5 to the left and R1 to R5 to the right are computationally scanned for the respective sets of collocates mollo (fire), swa/fiša (burn), etc., versus (di)tšhwene (baboon(s)), (di)phoofolo (wild animal(s)), etc. So, for example, from 215 occurrences of kgabo in the PSC, 138 (64.2%) could be detected as referring to a flame, 29 (13.5%) as indicating an ape and 31 (14.4%) as referring to the personal name Kgabo, with only 17 (7.9%) instances where no indications could be found within 100 characters of co-text to the left or to the right of kgabo.
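The two strategies described above can be sketched in a few lines of Python: a rule that inspects the POS tag in the L1 position for kae, and a window scan for collocates for kgabo. The tag names (V, CDEM) and the collocates come from the text; the function names, the prefix match on CDEM tags, the five-token window and the placeholder token sequences are assumptions made for illustration, not the procedure actually applied to the PSC.

```python
FLAME_COLLOCATES = {"mollo", "swa", "fiša"}
APE_COLLOCATES = {"tšhwene", "ditšhwene", "phoofolo", "diphoofolo"}


def classify_kae(tagged, index):
    """Guess the reading of kae from the POS tag directly to its left (L1).

    tagged is a list of (token, tag) pairs; index is the position of kae.
    """
    if index == 0:
        return "unknown"
    left_tag = tagged[index - 1][1]
    if left_tag == "V":
        return "where"
    if left_tag.startswith("CDEM"):  # e.g. CDEM01; prefix match is an assumption
        return "how many"
    return "unknown"


def classify_kgabo(tokens, index, window=5):
    """Guess the reading of kgabo from collocates in L1-L5 and R1-R5."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    context = {token.lower() for token in left + right}
    flame_hits = len(context & FLAME_COLLOCATES)
    ape_hits = len(context & APE_COLLOCATES)
    if flame_hits > ape_hits:
        return "flame"
    if ape_hits > flame_hits:
        return "ape"
    return "undecided"


# Placeholder sequences (w1, w2 ... stand for arbitrary tokens; not real sentences).
print(classify_kae([("w1", "V"), ("kae", "?")], 1))
print(classify_kgabo(["w1", "mollo", "w2", "kgabo", "w3"], 3))
print(classify_kgabo(["kgabo", "w1", "w2", "ditšhwene"], 0))
```

A fuller version would also need a rule for the personal name Kgabo, which the text reports as a third reading.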

 D.J. Prinsloo

4. Corpus annotation for lexicographic purposes

4.1 Part-of-speech (POS) tagging

Grammatical or part-of-speech tagging assigns a part-of-speech label to each element in the corpus, as in the Sepedi example: Monna yo o tlile ka mmotoro. (This man came by car). See Example 1.

Monna [N01] yo [CDEM01] o [CS01] tlile [V] ka [PAINST] mmotoro [N03].[.]

Example 1: Sepedi example of part-of-speech tagging. (N01: noun of class 1, CDEM01: concord demonstrative of class 1, CS01: concord subject of class 1, V: verb, PAINST: particle instrumental, N03: noun of class 3)

4.2 Grammatical parsing

Part-of-speech tagging is often followed by syntactic analysis, which assigns structural markers to each constituent in the corpus. On a first, non-recursive level, chunking may be performed. A second level of syntactic analysis entails the detection of recursive constituents. A fully syntactically annotated corpus (or Treebank) offers significant assistance to lexicographers when used, for example, for the detection of collocations and subcategorization information. Consider Example 2 from the University Centre for Computer Corpus Research on Language (UCREL).

[S [N Nemo_NP1 ,_, [N the_AT killer_NN1 whale_NN1 N] ,_, [Fr [N who_PNQS N] [V ‘d_VHD grown_VVN [J too_RG big_JJ [P for_IF [N his_APP$ pool_NN1 [P on_II [N Clacton_NP1 Pier_NNL1 N]P]N]P]J]V]Fr]N] ,_, [V has_VHZ arrived_VVN safely_RR [P at_II [N his_APP$ new_JJ home_NN1 [P in_II [N Windsor_NP1 [NP safari_NN1 park_NNL1 NP] N]P]N]P]V] ._. S]

Example 2: Treebank from the University Centre for Computer Corpus Research on Language (UCREL 2009)

4.3 Semantic analysis

For lexicographic purposes, semantic tagging is aimed at sense distinction as illustrated in Example 3.

PPIS1   I            Z8
VV0     like         E2+
AT1     a            Z5
JJ      particular   A4.2+
NN1     shade        O4.3
IO      of           Z5
NN1     lipstick     B4

Example 3: Sense distinction by semantic tagging

The semantic tags on the right are composed of:

1. an upper case letter indicating general discourse field;
2. a digit indicating a first subdivision of the field;
3. (optionally) a decimal point followed by a further digit to indicate a finer subdivision;
4. (optionally) one or more ‘pluses’ or ‘minuses’ to indicate a positive or negative position on a semantic scale.

For example, A4.2+ indicates a word in the category ‘general and abstract words’ (A), the subcategory ‘classification’ (A4), the sub-subcategory ‘particular and general’ (A4.2), and ‘particular’ as opposed to ‘general’ (A4.2+). Likewise, E2+ belongs to the category ‘emotional states, actions, events and processes’ (E), subcategory ‘liking and disliking’ (E2), and refers to ‘liking’ rather than ‘disliking’ (E2+). (UCREL 2009)

5. Enhanced processing of corpus data

Editors-in-chief or those in charge of production know that time spent on studying concordance lines often leads to reduced lexicographic output. The gap between corpus-query output formats and the dictionary article is as yet too wide, i.e. too much manual processing of the provided data by lexicographers is required. The key to the reduction of such manual processing probably lies in what Atkins and Rundell (2008: 103) call the Corpus Query System (CQS), of which lexical profiling as performed by Word Sketches (cf. discussion below) is an excellent example. Future systems should even reach the level of sophistication where corpus query software suggests a draft article for a specific lemma, similar, for example, to suggestions provided in semi-automated translation. The ideal role of the lexicographer would thus be comparable to that of the pilot of a fully automated and computerised modern jetliner overseeing processes with limited manual intervention.

Sinclair et al. (1996: 177) emphasize the fact that “the correlation between the environment of a word and its meaning is very high.” The Malvern Workshop held in 1994 reflects on what is probably the most significant work ever done on translation equivalence in relation to bilingual lexicography (cf. Sinclair et al. 1996). Tognini-Bonelli (1996: 200), in reference to Firth (1957), emphasizes the point that “a major part of the meaning of an item arises from its relations with its immediate co-text as well as the general context of the situation” and that “the formal features observable in the co-text of a word or phrase will inevitably be so integrated that the boundaries between text and context, item and environment will become more and more difficult to define” (Tognini-Bonelli 1996: 202).

Atkins (2005, forthcoming) says that the database of a bilingual dictionary is expected to give information on parameters such as various senses, semantic content, semantic scope, morphological properties, valency, participation in idiomatic phrases, collocates, etc. Hanks describes a strategy for detailed pattern analysis of each verb. Such patterns “consist of a verb with its valencies, plus semantic values for each valency and other relevant clues, and are associated with an implicature that associates the meaning with the context rather than with the word in isolation” (2006: 1165). Thus the focus is on pattern analysis of the verb:

The first step is to identify, by corpus analysis, all the patterns of normal use associated with each verb. The verb is the pivot of the clause, and many nouns will fall into place in a semantic ontology once their relationship – their normal relationship to verbs – is known. (Hanks 2006: 1168)

Calzolari (1996) refers to automated techniques for analysing and extracting lexical information from textual corpora. She gives a correlation between linguistic phenomena and levels of analysis, represented here in table format (Table 6).

Given the state-of-the-art in our field, some of the linguistic analyses listed above can be performed automatically with good coverage and a good success rate (those at the top in the above figure [Table 6]), others allow at least semi-automatic processing, while the last ones (at the bottom) are more difficult to perform successfully either for coverage or for adequacy or both. (Calzolari 1996: 7)


Table 6. Correlations between linguistic phenomena and levels of analysis (Calzolari 1996: 6)

Levels of analysis              Correlations
1. Form layer                   1 → a, b, c, d, e and g
2. Lemma layer                  2 → b, c, d, f and h
3. Category layer               3 → h and i
4. Syntactic function layer     4 → i and j
5. Semantic layer               5 → f, g and j

Phenomena
a. Morphological idiosyncrasies (pluralia tantum etc.)
b. Compounds
c. Collocations
d. Idioms
e. Fixed phrase
f. Semantic fields
g. Lexical disambiguation
h. Subcategorization frames
i. Syntactic regularities
j. Knowledge extraction
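Table 6 can also be read in the opposite direction, from a phenomenon to the layers of analysis it draws on. The following small sketch encodes the table as data and inverts it; the variable and function names are illustrative assumptions, while the correlations themselves are copied from the table.

```python
# Table 6 as data: analysis layer -> phenomena it correlates with.
LAYER_TO_PHENOMENA = {
    "form": ["a", "b", "c", "d", "e", "g"],
    "lemma": ["b", "c", "d", "f", "h"],
    "category": ["h", "i"],
    "syntactic function": ["i", "j"],
    "semantic": ["f", "g", "j"],
}

PHENOMENA = {
    "a": "morphological idiosyncrasies",
    "b": "compounds",
    "c": "collocations",
    "d": "idioms",
    "e": "fixed phrase",
    "f": "semantic fields",
    "g": "lexical disambiguation",
    "h": "subcategorization frames",
    "i": "syntactic regularities",
    "j": "knowledge extraction",
}


def layers_for(phenomenon_key):
    """Return the analysis layers that Table 6 links to a phenomenon."""
    return [
        layer
        for layer, keys in LAYER_TO_PHENOMENA.items()
        if phenomenon_key in keys
    ]


for key, name in PHENOMENA.items():
    print(f"{key}. {name}: {', '.join(layers_for(key))}")
```

Read this way, collocations, for instance, are linked to the form and lemma layers, whereas knowledge extraction is linked to the syntactic function and semantic layers.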

Atkins and Grundy describe a method of ‘lexicographic profiling’ performed for over 400 lemmas “with regard to all aspects of inherent properties and corpus use” (2006: 1103). The example given for echo comprises 31 such properties including lexical form, corpus profile, part-of-speech, syntax, semantics, morphology, etc. (Atkins and Grundy 2006: 1104–1105). What will be required according to Pérez Hernández is “a methodology for automatic pattern recognition procedures …” (1996: 218). For the compilation of bilingual dictionaries such pattern recognition procedures have to be performed on corpora, thus in a sense bringing together what is advocated by Hanks with what is produced through Word Sketches, FrameNet, WordNet, etc. Although Word Sketches, FrameNet, WordNet, etc. differ in nature, they are all corpus based, and the lexicographer can regard them as advanced tools for pattern recognition.

6. Major initiatives of lexical resource building

With corpus sizes on the increase, lexicographers were faced with the overwhelming task of processing ‘raw’ data effectively. They soon realised that the amount of data, e.g. thousands of concordance lines generated for each lemma, was becoming impossible to study in such a ‘raw’ format. The logical next step was to design systems that could further process, interpret, sort and calculate corpus data, such as FrameNet, WordNet and Word Sketches, which hold the key to the future of the corpus. Each of these resources is described briefly below.


6.1 FrameNet

The developers of the FrameNet project perceive the lack of large-scale lexical resources as an impediment to the development of Natural Language Processing (NLP) applications:

A significant roadblock to the development of practical natural language processing applications is the lack of large-scale lexical resources with the right kind and amount of information. Such resources must cover the breadth of a language’s basic vocabulary, but also provide appropriate syntactic, semantic and statistical information about individual lexical items. The FrameNet project is an attempt to create this next generation of lexical database. (FrameNet 2009)

Consider Table 7 and Figure 4, which serve to illustrate the efficacy of the FrameNet platform with its context-specific definitions.

Table 7. FrameNet entry for the lemma argue from Atkins et al. (2003: 337)

No.   Frame          Lemma   POS   Definition
1     Evidence       argue   V     constitute evidence in support of something
2     Conversation   argue   V     exchange diverging or opposite views
3     Reasoning      argue   V     give reasons or cite evidence in support of something

Display: With FE (FE = Frame Elements)

Figure 4 shows the FrameNet entry concerning an aspect of South Africa’s defence programme.

Figure 4. Full-text annotation reports: South Africa Introduction (FrameNet 2009)


6.2 WordNet

WordNet is a large lexical database of English and may be described as a network of meaningfully related words and concepts:

Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. (WordNet 2009)

Consider a selection of the data for bank in Figure 5 which shows these interrelationships.

Figure 5. A selection of the data for bank in WordNet (2009)

6.3 Word sketches

Word sketches are described by Kilgarriff et al. (2004 and http://www.sketchengine.co.uk/) as one-page automatic, corpus-based summaries of a word’s grammatical and collocational behaviour. From Figure 6 it is for instance clear that exercise frequently appears as the object of verbs such as do, undertake, complete, etc. or takes as a modifier practical, regular, physical, etc. See Atkins and Rundell (2008: 109) for a more detailed description of the basics of lexical profiling with specific reference to word sketches.

In addition to the three Internet platforms outlined above, some other useful and interesting lexicographic tools have been developed. These are now briefly outlined.


Figure 6. Word sketch for exercise ukWaC freq = 224381 (Sketch Engine 2009)
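The principle behind a word sketch, grouping the collocates of a word by grammatical relation, can be illustrated with a toy example. This is not the Sketch Engine implementation: the triples and their counts are invented for illustration, the relation names are hypothetical, and production systems rank collocates by an association score rather than by the raw counts used here.

```python
from collections import Counter, defaultdict

# Hypothetical (relation, headword, collocate) triples, as a parser
# might extract them; the counts are invented for illustration.
TRIPLES = [
    ("object_of", "exercise", "do"),
    ("object_of", "exercise", "do"),
    ("object_of", "exercise", "undertake"),
    ("object_of", "exercise", "complete"),
    ("modifier", "exercise", "regular"),
    ("modifier", "exercise", "regular"),
    ("modifier", "exercise", "physical"),
    ("modifier", "exercise", "practical"),
    ("object_of", "home", "leave"),
]


def word_sketch(triples, headword, top_n=3):
    """Group a headword's collocates by grammatical relation.

    Returns {relation: [(collocate, count), ...]} with the most
    frequent collocates first.
    """
    by_relation = defaultdict(Counter)
    for relation, head, collocate in triples:
        if head == headword:
            by_relation[relation][collocate] += 1
    return {
        relation: counts.most_common(top_n)
        for relation, counts in by_relation.items()
    }


for relation, collocates in word_sketch(TRIPLES, "exercise").items():
    print(relation, collocates)
```

The output is a small, relation-by-relation summary of the kind shown for exercise in Figure 6, although at a vastly reduced scale.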

6.4 Human Brain Cloud and FrameGrapher

The Human Brain Cloud is a variation on the ‘ubiquitous beetle’ mind-map strategy where the keyword is graphically illustrated in relation to associated concepts. Although intended more as a word association game, this interesting word-relation spider may be of assistance to lexicographers and has potential for future development as a lexicographic source reflecting semantic relations in an accessible, uncomplicated way. Figure 7, for instance, is a snapshot of a semantic relations build-up in relation to Monday. Monday is first and essentially related to other days of the week, but also to emotive and practical responses as in ‘blue Monday’ and ‘work day’.

FrameGrapher, like WordNet, provides a framework of interrelationships. Compare FrameGrapher’s relation network for Criminal_investigation in Figure 8a and its definition in Figure 8b. The Core Frame Elements (FEs) are investigator and incident. See Atkins and Rundell (2008: 144) for a more detailed description of the basics of frames and frame elements.


Figure 7. Snapshot taken during the build-up of Human Brain Cloud Monday (Human Brain Cloud 2009)

Figure 8a. Frame relations of Criminal_investigation in FrameGrapher (FrameNet 2009)

Figure 8b. Definition of Criminal_investigation (FrameNet 2009)


6.5 Brigham Young University Corpus query tools

The BYU Corpus of American English is a text corpus of 360 million tokens. The tokens are equally divided among spoken words, fiction, popular magazines, newspapers, and academic texts and are also historically balanced – 20 million words for each year from 1990–2007. Free online searches include advanced search options, such as those for collocates, part-of-speech, comparison of related words, and the frequency and distribution of synonyms, as well as combinations of such search functions. Compare Figures 9 and 10 for beautiful, Glasnost, CD and DVD.

Figure 9. Related words, statistical and graphical information for beautiful (The Corpus of Contemporary American English 2009)

Glasnost, for example, was frequently used in the period 1990–1994 as a fashion word but its use rapidly declined in the period 1995–2007, from 3.6 per million words to 0.1 per million. CD enjoyed a steady increase in frequency between 1990 and 2004 but then started to decline. DVD grew rapidly in frequency, from zero in the period 1990–1994 to 19.5 per million words in the period 2005–2009, probably at the expense of CD.
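Figures such as these are normalised frequencies, which makes periods or genres of different sizes comparable. As a minimal sketch of the arithmetic (the function name is hypothetical and the raw counts are invented so as to reproduce the rates quoted above, assuming five-year blocks of 20 million words per year):

```python
def per_million(raw_count, corpus_tokens):
    """Normalise a raw frequency to occurrences per million tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_count * 1_000_000 / corpus_tokens


# Size of a five-year block, assuming 20 million words per year as
# stated for the BYU corpus.
FIVE_YEARS = 5 * 20_000_000

# Hypothetical raw counts chosen to reproduce the quoted rates.
print(per_million(360, FIVE_YEARS))   # 3.6
print(per_million(1950, FIVE_YEARS))  # 19.5
```

A rate of 3.6 per million in a 100-million-word block thus corresponds to roughly 360 occurrences, and 0.1 per million to about ten.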


Figure 10. BYU Corpus frequency breakdown in terms of genre and history for Glasnost, CD and DVD (The Corpus of Contemporary American English 2009)


6.6 Automatic level detector for defining vocabulary

Restricting the lexicographer to a set of defining vocabulary is common practice in many dictionary compilation projects. Macmillan English Dictionary for Advanced Learners (MED), for example, even includes its list of defining vocabulary in the back matter. Lexicographic tools that alert the lexicographer when words that are not in the list of defining vocabulary have been used, for example in the paraphrase of meaning or in examples of usage, need to be developed further. In other words, tools that flag those words presumed to be too difficult for the target user to understand should be developed. The dictionary writing system Mātāpuna has a built-in test facility ensuring that all words used in definitions come from a defining vocabulary.

6.7 Dictionary writing systems, the incorporation of lexicographic tools and sophisticated corpus queries

The absence of commercially available dictionary writing systems up to approximately the year 2000 was quite apparent and detrimental to lexicographic practice by individuals and smaller publishing houses. The past decade, however, has seen the development of dictionary writing systems such as Mātāpuna, SIL Toolbox, TshwaneLex, and IDM. In the absence of dictionary writing systems, dictionary compilers generally resorted to general-purpose tools, especially to word processors that could offer little more than ‘what-you-put-in-is-what-you-get-out’.

The Mātāpuna Dictionary Writing System is a web-based, multi-user, multilingual dictionary writing system currently used to compile the monolingual dictionary of the Māori language of Te Taura Whiri i te Reo Māori (the Māori Language Commission). According to its own description (http://www.matapuna.org/), the system “assists with many aspects of lexicography, including team collaboration, routine error and consistency checking, corpus searching, publishing, and progress monitoring in addition to the traditional headword and entry management.”

TshwaneLex is a dictionary writing system for the compilation of dictionaries or terminology lists. It contains many specialised features, of which a few will be briefly outlined below. According to its own claims, TshwaneLex enables:

any organisation producing dictionaries to dramatically reduce dictionary compilation time, increase the quality and consistency of their dictionaries, and manage larger projects with larger teams. These features include an integrated Corpus Query System, immediate article preview, full customisability, styles system, automatic cross-reference tracking, automated lemma reversal, online and electronic dictionary modules, export to MS Word and typesetting systems (such as InDesign and XPP). (Tshwanelex 2009)


Typical features and advantages of using dictionary writing systems instead of standard word processing software include the following. Dictionaries can be compiled in a shorter space of time. Functions can be automated, such as automatic control of numbering, font size, font type (normal, italics, boldface) and capitalization; lemma reversal; cross-reference checking and updating; and defining vocabulary utilization. Some dictionary writing systems also include corpora and advanced corpus query outputs such as word sketches, word behavioural patterns and word nets (cf. Word Sketches, WordNet, FrameNet and BYU corpus queries, already discussed). Multiple dictionaries can be generated from the same database. Dictionary writing systems can also provide a friendly, adaptable user interface. Compare, for example, Figures 11 and 12 for user-friendly and multi-functional input screens of TshwaneLex. While treating the lemma setlaela (a dumb person) the lexicographer sees the cross-reference to e.g. lešilo (fool) and also that setlaela is a reference address of its plural form ditlaela.
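Cross-reference tracking of the kind just described can be sketched as follows. This is an illustration of the general idea only and not TshwaneLex's implementation: the lemmas are taken from the example above, while the data layout and function names are assumptions.

```python
from collections import defaultdict

# Toy entries: lemma -> lemmas it cross-refers to. The lemmas come from
# the example above; the data layout is an assumption.
ENTRIES = {
    "setlaela": ["lešilo"],
    "ditlaela": ["setlaela"],
    "lešilo": [],
}


def incoming_references(entries):
    """Map each lemma to the entries that refer to it."""
    incoming = defaultdict(list)
    for source, targets in entries.items():
        for target in targets:
            incoming[target].append(source)
    return dict(incoming)


def dangling_references(entries):
    """List (source, target) pairs whose target has no entry."""
    return [
        (source, target)
        for source, targets in entries.items()
        for target in targets
        if target not in entries
    ]


print(incoming_references(ENTRIES))
print(dangling_references(ENTRIES))
```

With such an index, a system can show the lexicographer both where an entry points and which entries point to it, and can warn when an entry that is still a reference address is about to be deleted or renamed.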

Figure 11. Tshwanelex: indicating cross-references in relation to setlaela (Tshwanelex 2009)


Figure 12. Integrated corpus query tool, with auto-extracted usage example in Tshwanelex (2009)

Figure 13. LexiView screen for the word Markt (Heid et al. 2004)


In Figure 12 usage examples are automatically offered to assist the lexicographer in the treatment of absolutely. Heid et al. (2004) and Evert et al. (2004) describe tools for upgrading printed dictionaries by means of corpus-based lexical acquisition. The activities include a comparison between corpus and dictionary data. The results of the comparison are used by LexiView, a graphical user interface (GUI) for interactive inspection. The lexicographer can see a wealth of information with e.g. inclusion versus deletion indicators.

7. Concluding remarks

The corpus of the future is expected to be a huge, but thoroughly selected and heavily annotated database constituting the basis for corpus query programs. The future of the corpus for lexicographic use revolves around the sophistication and ability of corpus query programs to extract, compile and present the corpus data to the lexicographer. The ideal situation would be one where the lexicographer could play the role of a final editor of dictionary articles prepared by dictionary writing software.

Notes

1. It is not indicated in the introduction section of COBUILD 2 how these figures were calculated.
2. The University of Pretoria Sepedi Corpus (PSC) is a collection of ca. six million running words of Northern Sotho, containing texts from different genres and domains.

References

A. Dictionaries

[COBUILD 2] Sinclair, J. (ed.). 1995. Collins COBUILD English Dictionary. London: HarperCollins.
[HAT] Odendal, F.F. and Gouws, R.H. 2004. Verklarende Handwoordeboek van die Afrikaanse Taal. Doornfontein: Perskor.
[MED] Rundell, M. (ed.). 2007. Macmillan English Dictionary for Advanced Learners. Second Edition. Oxford: Macmillan.
[WAT] Hauptfleisch, D.C. (ed.). 1984. Woordeboek van die Afrikaanse Taal. VII. Bureau of the WAT: Stellenbosch.

B. Other literature

American National Corpus: http://www.cs.vassar.edu/~ide/papers/anc-lrec04.pdf (Accessed 20 February 2009).
Atkins, B.T.S. 2005, forthcoming. “Me Lexicographer, You Translators or Context-free (vs. context-sensitive) translation and what it involves.” In Translation and Meaning Part 7: Proceedings of the Maastricht Session of the 2005 Maastricht-Lódz Duo Colloquium on “Translation and Meaning”, held in Maastricht, The Netherlands, 18–21 May 2005, M. Thelen and B. Lewandowska-Tomaszczyk (eds.).
Atkins, B.T.S. and Grundy, V. 2006. “Lexicographic profiling: an aid to consistency in dictionary entry design.” In Proceedings of the Twelfth EURALEX International Congress on Lexicography, University of Turin, Italy, 6–9 September 2006, E. Corino et al. (eds.), 1097–1107.
Atkins, B.T.S., Rundell, M. and Sato, H. 2003. “The contribution of Framenet to practical lexicography.” International Journal of Lexicography 16(3): 333–357.
Atkins, B.T.S. and Rundell, M. 2008. The Oxford Guide to Practical Lexicography. Oxford/New York: Oxford University Press.
Bank of English: (Accessed 20 February 2009).
Bergenholtz, H. and Tarp, S. 1995. Manual of Specialised Lexicography: The preparation of specialised dictionaries. Amsterdam/Philadelphia: John Benjamins.
Braasch, A. and Povlsen, C. (eds.). 2002. Proceedings of the Tenth EURALEX International Congress, EURALEX 2002, Copenhagen, Denmark, August 13–17, 2002. Copenhagen: Center for Sprogteknologi.
British National Corpus: (Accessed 20 February 2009).
BYU Corpus of American English: (Accessed 20 February 2009).
Calzolari, N. 1996. “Lexicon and corpus: a multi-faceted interaction.” In Euralex ‘96 Proceedings I-II, Papers submitted to the Seventh EURALEX International Congress on Lexicography in Göteborg, Sweden, M. Gellerstam, J. Järborg, S-G. Malmgren, K. Norén, L. Rogström and C.R. Papmehl (eds.), 3–16. Gothenburg: Department of Swedish, Göteborg University.
Corino, Elisa et al. (eds.). 2006. Proceedings of the Twelfth EURALEX International Congress on Lexicography, University of Turin, Italy, 6–9 September 2006. Alessandria, Italy: Edizioni dell’Orso.
The Corpus of Contemporary American English: (Accessed 20 February 2009).
De Schryver, G-M. and Prinsloo, D.J. 2000a. “Electronic corpora as a basis for the compilation of African-language dictionaries, Part 1: The macrostructure.” South African Journal of African Languages 20(4): 291–309.
De Schryver, G-M. and Prinsloo, D.J. 2000b. “Electronic corpora as a basis for the compilation of African-language dictionaries, Part 2: The microstructure.” South African Journal of African Languages 20(4): 310–330.
De Schryver, G-M. and Prinsloo, D.J. 2003. “Compiling a lemma-sign list for a specific target user group: The Junior Dictionary as a case in point.” Dictionaries. Journal of The Dictionary Society of North America 24: 28–58.
Evert, S., Heid, U., Säuberlich, B., Debus-Gregor, E. and Scholze-Stubenrecht, W. 2004. “Supporting corpus-based dictionary updating.” In Proceedings of the Eleventh EURALEX International Congress, EURALEX 2004, Lorient, France, July 6–10, 2004, G. Williams and S. Vessier (eds.), 255–264. Lorient: Faculté des Lettres et des Sciences Humaines, Université de Bretagne Sud.
Firth, J.R. 1957. Papers in Linguistics 1934–1951. London: Oxford University Press.
FrameNet: (Accessed 20 February 2009).
Gellerstam, M., Järborg, J., Malmgren, S.-G., Norén, K., Rogström, L. and Papmehl, C.R. (eds.). 1996. Euralex ‘96 Proceedings I-II, Papers submitted to the Seventh EURALEX International Congress on Lexicography in Göteborg, Sweden. Gothenburg: Department of Swedish, Göteborg University.
Hanks, P. 2006. “The organization of the lexicon: semantic types and lexical sets.” In Proceedings of the Twelfth EURALEX International Congress on Lexicography, University of Turin, Italy, 6–9 September 2006, E. Corino et al. (eds.), 1165–1168. Alessandria, Italy: Edizioni dell’Orso.
Heid, U., Säuberlich, B., Debus-Gregor, E. and Scholze-Stubenrecht, W. 2004. “Tools for upgrading printed dictionaries by means of corpus-based lexical acquisition.” (Accessed 20 February 2009).
Human Brain Cloud: (Accessed 20 February 2009).
IDM: (Accessed 20 February 2009).
Kennedy, G. 1998. An Introduction to Corpus Linguistics. London: Longman.
Kilgarriff, A., Rychly, P., Smrz, P. and Tugwell, D. 2004. “The Sketch Engine.” In Proceedings of the Eleventh EURALEX International Congress, EURALEX 2004, Lorient, France, July 6–10, 2004, G. Williams and S. Vessier (eds.), 105–115. Lorient: Faculté des Lettres et des Sciences Humaines, Université de Bretagne Sud.
Landau, S.I. 2001. Dictionaries: The Art and Craft of Lexicography. Second Edition. Cambridge: Cambridge University Press.
Mātāpuna: (Accessed 20 February 2009).
Media24: (Accessed 20 February 2009).
Oxford English Corpus: (Accessed 20 February 2009).
Pérez Hernández, C. 1996. “A Pilot study on translation equivalence between English and Spanish.” International Journal of Lexicography 9(3): 218–237.
Prinsloo, D.J. 2004. “Revising Matumo’s Setswana – English – Setswana Dictionary.” Lexikos 14: 158–172.
Prinsloo, D.J. and De Schryver, G.-M. 2002. “Designing a measurement instrument for the relative length of alphabetical stretches in dictionaries, with special reference to Afrikaans and English.” In Proceedings of the Tenth EURALEX International Congress, EURALEX 2002, Copenhagen, Denmark, August 13–17, 2002, A. Braasch and C. Povlsen (eds.), 483–494. Copenhagen: Center for Sprogteknologi.
Prinsloo, D.J. and De Schryver, G.-M. 2003. “Effektiewe vordering met die Woordeboek van die Afrikaanse Taal soos gemeet in terme van ’n multidimensionele Liniaal [Effective Progress with the Woordeboek van die Afrikaanse Taal as Measured in Terms of a Multidimensional Ruler].” In ’n Man wat beur: Huldigingsbundel vir Dirk van Schalkwyk, W. Botha (ed.), 106–126. Stellenbosch: Buro van die WAT.
Prinsloo, D.J. and Gouws, R.H. 2000. “The use of examples in polyfunctional dictionaries.” Lexikos 10: 138–156.
Prinsloo, D.J. and Gouws, R.H. 2006. “Fashion words in Afrikaans dictionaries: a long walk to lexicographic freedom or just a lexical fly-by-night?” In Proceedings of the Twelfth EURALEX International Congress on Lexicography, University of Turin, Italy, 6–9 September 2006, E. Corino et al. (eds.), 301–312. Alessandria, Italy: Edizioni dell’Orso.
Roberts, R.P. and Montgomery, C. 1996. “The use of corpora in bilingual lexicography.” In Euralex ‘96 Proceedings I-II, Papers submitted to the Seventh EURALEX International Congress on Lexicography in Göteborg, Sweden, M. Gellerstam, J. Järborg, S-G. Malmgren, K. Norén, L. Rogström and C.R. Papmehl (eds.), 457–464. Gothenburg: Department of Swedish, Göteborg University.
Scottish Corpus of Texts & Speech: http://www.scottishcorpus.ac.uk/ (Accessed 20 February 2009).
SIL: (Accessed 20 February 2009).
Sinclair, J., Payne, J. and Pérez Hernández, C. (eds.). 1996. “Corpus to corpus: a study of translation equivalence.” International Journal of Lexicography 9(3): 171–273.
Sketch Engine: (Accessed 20 February 2009).
Summers, Della. 1993. “Longman/Lancaster English Language Corpus – criteria and design.” International Journal of Lexicography 6(3): 181–208.
Teubert, W. 1996. “Comparable or Parallel Corpora?” International Journal of Lexicography 9(3): 238–264.
Tognini-Bonelli, E. 1996. “Towards Translation Equivalence from a Corpus Linguistics Perspective.” International Journal of Lexicography 9(3): 197–217.
Tshwanelex: (Accessed 20 February 2009).
UCREL: (Accessed 20 February 2009).
Wikipedia: (Accessed 20 February 2009).
WordNet: (Accessed 20 February 2009).
Williams, G. and Vessier, S. (eds.). 2004. Proceedings of the Eleventh EURALEX International Congress, EURALEX 2004, Lorient, France, July 6–10, 2004. Lorient: Faculté des Lettres et des Sciences Humaines, Université de Bretagne Sud.

Lexicographical data in natural-language systems

Franziskus Geeb

Natural Language Systems, in the sense of interactive, text-based conversations between man and machine such as a chatbot, are widely used in business, entertainment, and research. The success of a chat robot depends substantially on its ability to conduct a dialogue with a chat partner and make meaningful statements. As knowledge bases for this communication, terminological/lexicographical data are conceivably useful. A large structural depth and well-known definitions of user lexicographical data offer a good basis for an automated discussion between a chatbot and a human user. This project clarifies the circumstances under which lexicographical data can be used as a knowledge base for a chatbot and evaluates the chances for success of such data.

Keywords: knowledge bases; lexicographical data; chat robots; monolingual online dictionaries; database ontologies

1. Communication with chatbots and lexicography

1.1 State of the art

Communication between man and machine (computer) is widely known and was written about as early as 1966 by Joseph Weizenbaum. The main idea was and is to write a program that enables the computer to have a meaningful conversation with its user in natural language. Example 1 reproduces a conversation between a human and a chatbot, taken from one of the best-known chatbots, “ELIZA” (Weizenbaum 1966).

1. User: hi
2. Chatbot: How are you today? What would you like to discuss?
3. User: how can you help me
4. Chatbot: Don’t you believe that I can help you?
5. User: yes
6. Chatbot: I understand.
7. User: I ’m so sad
8. Chatbot: Tell me more...
9. User: My hamster died
10. Chatbot: That is quite interesting.
11. ....

Example 1: NLP-addiction (2008)

Chat communication is located somewhere between written language and spoken language, and information technology does not really limit the conversation, e.g. by a limitation of words or syntax (Geeb 2007). Users are in general allowed to type anything into the input field. Thus, the main problems for the software of a chatbot are:

1. Is the input written in a language that is familiar to the chatbot?
2. Is the input a complete sentence, part of a sentence, or something else (exclamation)?
3. Is the input a correct sentence?
4. Does the chatbot know the words from the input (usually after weak/strong stemming and deleting stop words from the input)? If not, it could be a dialect, sociolect or even ordinary LGP (language for general purposes) words that are not part of the chatbot’s knowledge base.
5. Is the input somehow related to earlier input from the user or to earlier answers from the chatbot?
6. Is the user input some kind of small talk, or is it a question to a given knowledge base of the chatbot?
7. How do I (chatbot) answer questions or user input in general that I (chatbot) do not understand?
8. How do I (chatbot) learn from user input, or should I not use the user’s input as logical and semantically relevant information?

Consider the dialogue in Example 2.

1. User: “What do you know about cars?”
2. Chatbot: “They have four wheels.”
3. User: “Cars have five wheels.”
4. Chatbot: “Are you sure that cars have five wheels?”
5. User: “Yes”
6. Chatbot: “Ok, I have learned that cars have five wheels.”

Example 2: Dialogue between user and chatbot


The wrong information from the user need not be given on purpose; it could equally be the result of insufficient knowledge in this specific area. The effect would be the same: incorrect information in the knowledge base.

9. How do I (chatbot) handle offensive input, which tends to make up a considerable part of the user input in some environments (usually young users just playing or technically interested users who want to take the chatbot to its limit)?

In a lexicographical environment, chatbots are fairly new (Geeb 2007) but can be used for different purposes, always related to text reception of L1-texts by L1-users (Geeb 1998: 39 et seq.; Bergenholtz and Tarp 2003: 174):

1. Extending a search interface from a keyword search to a user-centred natural-language search: the known one-word search in the lexicographical database is extended to a search for several words from the user input, with some kind of ranking, such as term weight, applied to the knowledge base (dictionary). This would be a knowledge-orientated lexicographical function (Bergenholtz and Tarp 2003: 176).
2. Clarifying the user’s needs and intentions in the search through checkbacks for better search results (Montero and Araki 2007; Popp et al. 2007).
3. Giving the user cross-references to information in the lexicographical database that is not part of the search results (e.g. lemma) that the user expects. This could be achieved through an elaborate cross-reference structure (Bergenholtz and Tarp 1995: 215 et seq.) or, even better, with a systematic classification (Bergenholtz and Tarp 1995: 195 et seq.), taxonomy, or ontology connected to the information in the lexicographical database.
4. The chatbot could learn from the user’s input regarding linguistic, semantic, or even cross-referenced information, irrespective of whether the user can be trusted or not. On the other hand, user participation is an important feature of modern lexicography and also of the current Internet trends known as Web 2.0.
5. Leading the user to relevant content that the user otherwise would not read. This makes sense if a dictionary user is interested in some information but the intention of the author and/or publisher (Geeb 1998a: 60 et seq.) is to give the user more information, hoping that this could affect the user’s knowledge. Teenagers who want to know when they are allowed to buy alcoholic drinks may look up the lemma “Alcopops”, searching for semantic and/or encyclopaedic information. There, they are informed about the legal age, but the encyclopaedic information on this lemma is rather long, and the users probably do not have the patience to read it. Instead, they are led to a quiz that contains roughly the same content as the encyclopaedic information but presents it in a more entertaining way, with interaction and immediate feedback after each question and answer.

 Franziskus Geeb

Reading or knowing the encyclopaedic content of this lemma could be important to the users because they are informed, e.g., about how long alcohol remains in the body. The intention of the author and publisher of this dictionary is obviously not just to inform users about the legal aspects of alcohol but to give them a better basis for deciding whether to drink alcohol or not. This is just one example, but the idea of using a chatbot to lead the user to information that is not part of the user’s original intention (Geeb 1997) but may be of interest to the user can be applied to many other use cases.

6. Providing a better search interface to a (lexicographical) database (Hammer and Schmoecker 2004: 163 et seq.) and thus theoretically including more lexicographical functions (Bergenholtz 1997).
7. Establishing some kind of customer relationship between the website (lexicographical) user and the (lexicographical) product (Koch 2002: 145; Pelka 2003: 24). This can be achieved by providing the chatbot with a character (possibly with images) expressing emotions etc. (Thiedeke 2001: 55 et seq.).

Last but not least, in learning environments, e.g. software for all kinds of “lifelong learning” while in active employment, this connection between a chatbot, a lexicographical database, and the concept of pushing information to the user in a certain situation is promising.

From the point of view of a chatbot programmer, the main reason for using lexicographical products (“lexikographische Nachschlagewerke” as described by Geeb (1998a: 35 et seq.) and “ordbog” as described by Bergenholtz (1994: 56 et seq.)) as a layer in the knowledge base of a chatbot is the highly structured information that can usually be found in lexicographical databases. Here an important distinction has to be made, however.
Monolingual LGP dictionaries that mainly contain linguistic information such as the German Duden (1999) are in general not that useful for a chatbot knowledge base apart from the fact that this kind of linguistic information could be used for syntactic or semantic parsing of the user input or construction of chatbot answers. Monolingual LGP dictionaries with semantic and/or encyclopaedic information (Geeb 1998; 1998a) such as Webster’s (1996) or even encyclopaedias such as Brockhaus (2006) would be useful for the chatbot if the areas to be covered by the chatbot are part of the dictionary. The same restriction applies to specialized dictionaries. If the chatbot’s conversation is to be limited to the field of the LSP dictionary and if this dictionary contains not only linguistic information but also semantic and encyclopaedic information, the dictionary would be of great value for the chatbot knowledge base. Bilingual LGP dictionaries (lexicographical products) and bilingual LSP dictionaries (lexicographical products) tend to have some semantic information in L1 or L2 related to their user function. They may be of some value for a theoretical chatbot that is able not only


to answer but also to translate input and answers (Shigeoka 2002: 269). In addition to all the problems a regular monolingual chatbot has to take into account, many other unsolved problems from machine translation and bilingual LSP dictionaries (Bergenholtz and Pedersen 1999) would apply to this chatbot. Therefore, this kind of chatbot relying on an L1-L2 or L2-L1 dictionary in one user function or another is currently only of theoretical interest. In general, monolingual dictionaries (or, more broadly, “lexicographical products”) are useful as part of a knowledge base for a chatbot if they contain semantic and encyclopaedic information as well as many cross-references or even a systematic classification, taxonomy, etc.

1.2 Lookedup4you

The project lookedup4you (Geeb and Spree 2005), which was carried out for the consumer advice centre “Verbraucherzentrale Nordrhein Westfalen” in Germany, is a dictionary for young consumers aged from about 10 to 16. Students of the University of Applied Sciences, Hamburg (Germany), wrote the programs as well as the content of the dictionary, while the main idea and the concept were developed by researchers at the same university. The project uses two main technologies to establish a lexicographically enabled chatbot: AIML, a special chatbot language, and a relational database. Both will be discussed here as technologies on the one hand and as concrete implementations at lookedup4you on the other. This will lead to several conclusions regarding the possibilities of the current system as well as concepts for future developments in this area.

The dictionary at lookedup4you (2005) is a specialized monolingual dictionary in German with lemmata from different fields. All lemmata are related to the main Web site, shown in Figure 1, even though the dictionary is a system of its own. The concept of using a chatbot as a search interface for the lexicographical database aims to give better search results as well as to motivate the user group to use the well-structured information of a dictionary, almost without knowing that the chatbot is part of the dictionary interface. The dictionary has been online for several years and is used by many of the visitors of the centre’s main Web site even though it contains no more than about 270 articles. Nevertheless, all articles contain text and cross-references to other articles, content on the main Web site, and other Web sites. Short films that explain the lexicographical content of an article and quizzes that present the content in a different and almost entertaining way are additional features of some articles (Figure 1).


Figure 1. Lookedup4you main Web site (2005)

The main problem for this chatbot is to decide whether the user input is only regular chat or whether it should be treated as a query on the dictionary. The strategy used in this particular implementation is to run both kinds of conversations at the same time. This sometimes results in confusing answers, such as the dialogue reproduced in Example 3.

1. User: Was ist eigentlich Homebanking
2. Chatbot: Ich weiss nicht genau. – Der Artikel “Homebanking” passt genau zu Deinem Thema.
(1. User: What does Homebanking mean? 2. Chatbot: I don’t know exactly. – The article “Homebanking” perfectly matches your topic.)
Example 3: Dialogue between user and chatbot

The first answer, “Ich weiss nicht genau” (I don’t know exactly), is from the small-talk part, and the second is from the dictionary search that will be described in detail below. The reason for using this strategy is that it would be hard to decide where small talk ends and a query that can be answered by the dictionary begins. With this in mind, the obvious solution would be to connect small talk and dictionary information in a single system such as an ontology, but this concept has not been implemented in the current installation, as shown above.


2. Structured data for chatbots

2.1 ELIZA and AIML

“ELIZA is a program that makes natural language conversation with a computer possible” (Weizenbaum 1966: 36). Even though this claim has proved to be incorrect, the main idea in Weizenbaum’s program is still being used by several chatbots. The program looks for keywords and transforms them into other words that can be used in an answer. In the example of the centre’s dictionary, a dialogue could be performed in the way reproduced in Example 4.

1. User: Wer hat das Telefon erfunden (User: Who invented the telephone?)
2. Chatbot – Software:
• Finding words: {(Wer) (hat) (das) (Telefon) (erfunden)}
• Finding known parts (“keywords”): (Wer)
• Finding transformation rule: (Wer) = Ich weiß nicht wer (.) (I don’t know who (...))
Answer: Ich weiss nicht wer (hat) (das) (Telefon) (erfunden) (I don’t know who (invented) (the) (telephone))
Example 4: Dialogue between user and chatbot

In this way, a conversation would be possible in many cases just by asking back or repeating the input of the user in a different way. This is not just small talk but a way to keep the conversation with the user going even if the meaning of this conversation is doubtful. In a lexicographical context, this system offers no advantage except that the keywords could be, e.g., lemmata. This concept is used by the current chatbot in lookedup4you.

“AIML (artificial intelligence markup language) is an XML-compliant language that’s easy to learn and makes it possible for you to begin customizing an Alicebot or creating one from scratch within minutes” (AIML 2008). The fact that XML is an easy-to-learn and widely used standard for representing data, together with several AIML applications in different fields (Lundqvist et al. 2006; Popp and Huber 2007; Shawar and Atwell 2004), supports this statement. AIML is the key to using a so-called Alicebot (Wallace 2005), which is a chatbot and nothing more than a piece of software parsing the directives in the AIML file(s) using pattern matching.
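The keyword transformation walked through in Example 4 can be sketched in a few lines of Python. This is a minimal illustration and not Weizenbaum's program or the lookedup4you code: the rule table `RULES` and the function name `eliza_answer` are hypothetical, and only the single rule for "Wer" is taken from the example.

```python
from typing import Dict, Optional

# Hypothetical rule table: keyword -> answer prefix.
# Only the rule for "WER" is taken from Example 4.
RULES: Dict[str, str] = {
    "WER": "Ich weiss nicht wer",
}


def eliza_answer(user_input: str, rules: Dict[str, str] = RULES) -> Optional[str]:
    """Return a transformed echo of the input, or None if no keyword is found."""
    # Step 1: find the words of the input (punctuation at the edges is dropped).
    words = user_input.strip(" ?!.").split()
    for position, word in enumerate(words):
        # Step 2: look for a known keyword.
        prefix = rules.get(word.upper())
        if prefix is not None:
            # Step 3: apply the rule and reuse the rest of the input unchanged.
            rest = " ".join(words[position + 1:])
            return f"{prefix} {rest}".strip()
    return None


print(eliza_answer("Wer hat das Telefon erfunden"))
```

Input without a known keyword returns `None`, which is the point at which a real system would need a fallback answer.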
The user input is not parsed in a linguistic sense but just matched against strings containing possible user input (Shawar and Atwell 2004: 408). Other well-known chatbot projects running without AIML still use the pattern matching strategy, too (Tatai et al. 2003). The software to use AIML in a front-end chatbot can be found in many programming languages and is mainly for Web use. A typical conversation written in AIML could be formalised as shown in Example 5.

KANNST DU DAS NICHT AENDERN

Nein, das uebersteigt meine Faehigkeiten.

Bin ich Superwoman?

Ich bin hier ja nicht der Chef...

(COULDN’T YOU CHANGE THAT Laugh No, that exceeds my skills. Am I Superwoman? I’m not the boss here...)
Example 5: Typical conversation written in AIML

The <pattern> element is the input of the user. If this input is typed exactly this way, the bot will answer with one of the random answers in the <random> element and show the image referred to in the image element. The image part is not standard AIML but an extension from the lookedup4you project (2005), enabling different expressions with different answers (sad, happy, joking, etc.), adding features of a personal nature to the chatbot answer, and thus extending the features of a common lexicographical product. The obvious disadvantage of this answer strategy is that the input of the user has to match the predefined pattern in the AIML file exactly, though the match is not case sensitive. Typos like “KANST DU...” would result in a non-match. But AIML has a wildcard feature, too. The wildcard “*” represents anything typed by the user. So with a lexicographical context in mind, Example 6 could be the AIML representation of the lemma Homebanking from lookedup4you.

* HOMEBANKING
Von Homebanking spricht man, wenn die Abwicklung diverser Bankgeschäfte, wie z.B. Überweisungen, von zu Hause erledigt wird. Homebanking wird in der Regel mit Hilfe von Telefon oder Internet durchgeführt. Der große Vorteil ist, dass die Bankgeschäfte unabhängig von den Geschäftszeiten der Banken getätigt werden können. Beim Homebanking unterscheidet man zwischen Telefonbanking und Online-Banking.
(“explain” One speaks of homebanking when various banking transactions, e.g. transfers, are carried out from home. Homebanking is usually done by telephone or Internet. The main advantage is that banking transactions can be made independently of the office hours of the banks. In homebanking, one distinguishes between telephone banking and online banking.)
Example 6: AIML representation of the lemma Homebanking from lookedup4you (2005)

User input such as “Was ist Homebanking” (What is homebanking?) or “Kennst Du Homebanking” (Do you know homebanking?) would result in a match on this lemma. If the input was “Kann ich Homebanking benutzen” (Can I use homebanking?), this would not trigger the lemma because the wildcard is only in front of the lemma. AIML allows only one wildcard in a pattern, and this wildcard is, in addition, very simple compared to the regular expressions known from different programming languages or XML schemas, even though AIML is very similar to an XML application. The following category element can compensate for part of this problem by using the wildcard after the lemma and, in the template, referring to the original lemma, with the <srai> element as the link to the lemma and the <star/> element representing the wildcard from the user’s input, as shown in Example 7.

HOMEBANKING *
(“explain”)
HOMEBANKING
Example 7: Use of wildcard element in search

However, one problem remains unsolved. If the user input is something like “Ist Homebanking gefährlich” (Is homebanking dangerous?), none of the above rules would lead to a match or to the original category for Homebanking. The pattern


* HOMEBANKING *

would match this input but is not legal according to the AIML Reference Manual (AIML 2001). On the other hand, the AIML Overview by the inventor of AIML, Richard Wallace (Wallace 2002), states that these patterns are possible. One of the implementations of AIML in a programming language, Program E (Program E 2007), also implements this feature, so that the following category element is correct (Example 8).

* HOMEBANKING *
(“explain”) Von Homebanking...
Example 8: Category element accepted by Program E (2007)

With these differences between the official AIML documents in mind, the conclusion is that, in a lexicographical context, the wildcard on both sides of the lemma is crucial. In an even more elaborate version, a lexicographical AIML should be regular expression-enabled despite the problems of the different regex (i.e. regular expression) standards and engines (Wall 2006). With regular expressions, several spelling variants of “HOMEBANKING” such as “HOME BANKING,” “HOME-BANKING,” “HOME--BANKING,” etc. could be matched and linked to the lemma with a simple expression like

.* HOME[\s-]*?BANKING.*

The same statement in AIML would result in many category elements because each spelling variant of the lemma has to be defined as a category of its own, preventing uncontrolled matches such as the tempting solution of a wildcard inside the lemma, which, by the way, is not possible in AIML 1.0.1:

* HOME * BANKING *
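How a single regular expression covers the spelling variants that would each need their own AIML category can be tried out with Python's standard `re` module. The sketch below is an adaptation under stated assumptions, not the expression from the text verbatim: the surrounding `.*` is replaced by `re.search()`, word boundaries and case-insensitive matching are added, and the name `mentions_homebanking` is hypothetical.

```python
import re

# Adapted from the expression above: any run of whitespace or hyphens may
# separate HOME and BANKING. re.search() makes the leading and trailing ".*"
# unnecessary; IGNORECASE mirrors AIML's case-insensitive matching.
LEMMA_PATTERN = re.compile(r"\bHOME[\s-]*BANKING\b", re.IGNORECASE)


def mentions_homebanking(user_input: str) -> bool:
    """True if the input contains the lemma in one of its spelling variants."""
    return LEMMA_PATTERN.search(user_input) is not None


for text in ("Was ist Homebanking",
             "Ist Home-Banking gefährlich",
             "Kann ich Home Banking benutzen",
             "Was ist eine Bank"):
    print(text, "->", mentions_homebanking(text))
```

Because the pattern is searched anywhere in the input, it behaves like a wildcard on both sides of the lemma, which is the case the text identifies as crucial.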

AIML also has some kind of systematic classification, as known especially from LSP dictionaries. The <topic> element can be used to handle homographs and polysemy as in Example 9, which is not part of a dictionary but was chosen to illustrate this feature.

* Geldkarte *
Eine Geldkarte bekommst Du bei der Bank
* Küche einrichten *
Frage mich mehr zum Thema Wohnen
* BANK *
Eine Bank hilft dir bei der Abwicklung von Dingen rund ums Geld.
* BANK *
Es gibt die verschiedensten Bänke.
(cash card: You can get a cash card from a bank. Furnish kitchen: Ask me more about the topic of living. A bank helps you with any transaction that involves money. There are all kinds of benches.)
Example 9: Using the <topic> element to handle homographs and polysemy

This system of topics allows only disjoint classes at one level, which is an obvious drawback. A lexicographical version of AIML must therefore allow classification at multiple levels and the assignment of multiple classifications to one lemma (faceted classification).
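The effect of the topic mechanism in Example 9 can be imitated with a small Python sketch. It is an illustration only: the topic labels `GELD` and `WOHNEN`, the table `CATEGORIES`, and the class `TopicBot` are hypothetical, and the assignment of the two BANK answers to these topics is an assumption, because the markup of the original example is not preserved here.

```python
from typing import Dict, Optional, Tuple

# (required topic, keyword) -> (answer, topic set by this answer).
# A required topic of None means the category matches in any topic.
CATEGORIES: Dict[Tuple[Optional[str], str], Tuple[str, Optional[str]]] = {
    (None, "GELDKARTE"): ("Eine Geldkarte bekommst Du bei der Bank", "GELD"),
    (None, "KÜCHE EINRICHTEN"): ("Frage mich mehr zum Thema Wohnen", "WOHNEN"),
    ("GELD", "BANK"): ("Eine Bank hilft dir bei der Abwicklung von Dingen rund ums Geld.", None),
    ("WOHNEN", "BANK"): ("Es gibt die verschiedensten Bänke.", None),
}


class TopicBot:
    """Answers depend on the topic set by an earlier turn of the dialogue."""

    def __init__(self) -> None:
        self.topic: Optional[str] = None

    def answer(self, user_input: str) -> Optional[str]:
        text = user_input.upper()
        for (topic, keyword), (reply, new_topic) in CATEGORIES.items():
            if keyword in text and topic in (None, self.topic):
                if new_topic is not None:
                    self.topic = new_topic  # remember the context
                return reply
        return None


bot = TopicBot()
bot.answer("Ich brauche eine Geldkarte")
print(bot.answer("Was ist eine Bank"))
```

Each answer is tied to exactly one topic, so the sketch shares the limitation named above: a lemma cannot belong to several classes at once.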


Elements forming the lexicographical microstructure can easily be added to the template element, as shown with the image element, and are therefore not discussed here in detail. As illustrated above, AIML is capable of containing lexicographical information, but the standard version of AIML is not really suited to expressing the many different aspects of a systematic classification, which is an important part of lookedup4you (2005) as a specialized dictionary. Therefore, in this project only small talk that does not contain any lexicographical information is stored in AIML. The lexicographical information, in contrast, is stored in a relational database.

2.2 Relational databases

An important part of the chatbot in a lexicographical context is the data structure of the dictionary. The data model of the dictionary’s database in lookedup4you is illustrated in Figure 2.

Figure 2. Data model of the dictionary’s database in lookedup4you (2005)


The table “t_artikel” is the main resource of the database containing the information about the lemma. As a result of the user profile and the purpose of a specialized dictionary, the linguistic information is not elaborate. Optional linguistic information is a translation of the lemma into English and plural. Important in this table are the fields “kurztext” and “langtext,” which both contain encyclopaedic information. Kurztext is a short definition, e.g. the lemma Acrylamid: Acrylamid ist ein chemisches Produkt. Es wird als Baustein für Kunststoffe, Lebensmittelverpackungen und Kosmetika verwendet. Acrylamid kann aber auch in Lebensmitteln entstehen, wenn stärkehaltige Nahrungsmittel (wie Kartoffeln und Getreide) stark erhitzt werden. (Acrylamide is a chemical product. It is used as a component in plastics, food packing and cosmetics. In addition, acrylamide can develop in food, when starchy food (like potatoes and grain) are strongly heated up.)

In most cases, the main structure of “kurztext” is a definition with “genus proximum” and “differentia specifica”, but there is no rule requiring the editor to write definitions in the field “kurztext” in exactly this way. “Langtext”, on the other hand, contains more specific information about the lemma, e.g. the lemma Geldkarte: Wie und wo kann man mit einer Geldkarte bezahlen? Der Mikro-Chip ist auf der Geldkarte als kleiner goldener Chip zu erkennen. Auf diesen Chip kann nach einer PIN-Prüfung, d.h. nach Prüfung deiner Geheimzahl, ein gewünschter Betrag bis maximal 200 Euro geladen werden. Dieser Betrag wird von deinem Konto bei der Bank abgebucht. Eine Geldkarte kann im Prinzip beliebig oft aufgeladen werden. Zum Aufladen der Geldkarte können die meisten Geldautomaten oder spezielle Ladeterminals verwendet werden. In den Geschäften oder Firmen, die die Geldkarte anerkennen, kann der im Chip gespeicherte Betrag zur Zahlung von Waren und Dienstleistungen genutzt werden. Wenn du mit deiner Geldkarte bezahlt hast, wird die entsprechende Kaufsumme von deinem Chip abgebucht. Mit speziellen Chipkartenlesern kannst du sowohl das aktuelle Guthaben auf der Geldkarte als auch die letzten Abbuchungen und Ladevorgänge ablesen. Die Geldkarte kannst du als Kontoinhaber von deiner Bank bekommen. Aber auch als Jugendlicher ohne Konto kannst du eine Geldkarte beantragen und somit bargeldlos bezahlen. (Lookedup4you 2005) (How and where can one pay with a cash card? The microchip on the cash card can be recognized as a small golden chip. On this chip a desired amount up to 200 euros can be loaded after a Pin code is entered i.e. after the secret code is confirmed. This amount is deducted from your bank account. A cash card can in principle be loaded very often. To load the cash card you can use most of the ATM machines or special load terminals. In shops or businesses that accept the cash card, the amount stored in the chip can be used for the payment of goods and services. 
When you pay with your cash card, the corresponding purchase sum is deducted from your chip. With special card readers you


can read off both the current amount on the cash card and the last deductions and loading transactions. You can get the cash card as an account owner from your bank. But as a minor without an account you can request a cash card and this way you don’t have to pay cash.)

There is no limit on the size of this text, and it can theoretically be from 0 to 2^24 − 1 characters (about 16 MB). This also applies to the “kurztext,” but only for technical reasons in the database. In the main entry interface, the editor has a limited number of characters for the “kurztext”; this limit can be changed by the administrator. The distinction between “kurztext” and “langtext” is mainly based on the discussion of semantic and encyclopaedic information in a specialized dictionary (Geeb 1998: 130 et seq.). The theoretical background of these concepts is not made known to the user, though. The fields “datum” (date of editing), “verfasser” (editor), “status” (state in the editing process), and “zugriffe” (number of edits) are editorial only. More lexicographical content, which is important for the lexicographical user, is provided in this table in the fields “quiz” and “film”. Both types of information are optional. A quiz is always based on the encyclopaedic information of an article but extends this information in some way (Example 10).

Lemma: Handy (short definition):
Die richtige Bezeichnung ist eigentlich Mobiltelefon. Der Begriff “Handy” kam als Bezeichnung für handliche und tragbare Funkgeräte zustande. Er wurde von der Industrie geschaffen und wird in der englischen Sprache gar nicht verwendet. Allerdings hat er sich durchgesetzt und ist so in Deutschland bzw. in deutschsprachigen Ländern zum Synonym für das Mobiltelefon geworden.
(The correct designation is actually mobile phone. The term “Handy” arose as a name for handy and portable radio sets. It was created by the industry and is not used at all in English. However, the term has become established and is now a synonym for mobile phone in Germany and other German-speaking countries.)

Quiz: Handy
1. Wofür steht das Wort Handy ursprünglich?
Für eine kleine Hand
Für ein schnurloses Haustelefon
Für ein kleines Funkgerät
(1. What does the word Handy originally stand for? For a small hand / For a cordless telephone / For a small radio set)
Example 10: Lemma definition and quiz for Handy


The purpose is to make young users read the important information that may be overlooked in a longer article (“langtext”). A quiz, on the other hand, features lexicographical content as some kind of game and is therefore frequently used by teenagers, who make up the main user group of this dictionary. Quizzes can be used without reading the article first but just by clicking the answers and trying to find the right answer. There is no limit on the number of tries. Users who have already read the article can use the quiz to check their own knowledge in this field. The same purpose as in a quiz is provided by the lexicographical film. Encyclopaedic information from the article is transferred to a film script showing the content in a more entertaining way than the written text using multimedia. Again, the dictionary’s main purpose is to get young users interested in its content. Therefore, content is in some cases provided in two ways—as text and as film. A lemma can have one or more synonyms (table “t_synonyme”). The synonyms are not provided with any further information, and they can theoretically be stored as lemmata as well. The same features apply to search words (table “t_suchbegriffe”) that are not limited to a given list. The editors are allowed to use any word as a search word even if the word is already stored in the database as a synonym or lemma. It is the responsibility of the editor to find and use search words that provide additional ways to find the content. Search words for the lemma Nachfrage (demand) are Angebot (offer), Preisbildung (price formation), Markt (market), and Marktgeschehen (market situation). If a user searches for Preisbildung, there is no lemma but a hit for a search word, and the software will redirect the user’s query to the lemma Nachfrage, which has important information on Preisbildung, too. Another way to access the content of this dictionary is browsing the categories and subcategories. 
There is a many-to-many relationship between lemma and subcategories (table “t_artikel_has_t_subkategorien”), allowing a lemma to be registered under several subcategories and opening the subcategories for an unlimited number of lemmata. A subcategory can be part of several categories, although this very complex feature is only used for the subcategory “Allgemeines,” which is found in nearly every category. The tables “t_linksext” and “t_linksint” store information about external links (World Wide Web and also the main Web site on which the dictionary is based) and internal links (the actual dictionary). These two tables store the traditional cross-references that can be found in many dictionaries, but they are also important as an extension of the encyclopaedic information and as a path telling the chat robot where additional information is located.

2.2.1 Chatbot answers from the dictionary database

The chatbot uses the database in the current version of the program as one of two possible answer strategies. The first answer strategy is a conversation in small talk as shown above in the section on AIML. Every input from the user is matched against the content of the AIML database to provide some kind of interaction with the user through a dialogue. The second answer, the one from the dictionary, is the result of an extended search in the lexicographical database of lookedup4you. The following steps are carried out:

1. Stop words such as abgesehen, alle, allein, aller, alles, als, also (apart from, everyone, alone, all, everything, as, thus), etc. and special characters are removed from the input.
2. The remaining words are stemmed using a weak stemming as known from the Porter algorithm (Porter 1980). A strong stemming could easily result in overstemming, and the lemmata, synonyms and stop words are stored in a structured way (nominative singular) anyway. Matching the weakly stemmed version against these words should therefore produce good results. We therefore assume that everything a user enters in the form and sends to the chatbot, and which is not excluded as a stop word, is a word suitable for searching.
3. For all words with more than two characters, a full-text query is run on several fields with the stemmed and truncated words. As shown in the data model, a (full-text) index is set up on the fields “lemma”, “kurztext”, “langtext”, and “plural”. This full-text index is provided by the storage engine MyISAM and calculates the relevance value of each word with this formula:

w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)

“Where dtf is the number of times the term appears in the document, sumdtf is the sum of (log(dtf)+1)’s for all terms in the same document, U is the number of Unique terms in the document, N is the total number of documents and nf is the number of documents that contain the term” (MySQL 2008).

This full-text query includes different searches on the fields “lemma”, “synonym”, and “suchbegriff” (search word). At the same time, a full-text query is run with a non-stemmed version of the words but on a full-text index that includes different fields: “lemma”, “plural”, “kurztext”, “langtext”. Each full-text query results in 0 to n matches for a lemma and a relevance value as shown above. If the relevance value is higher than a predefined threshold, the match is registered as relevant for the user. All relevant matches are sorted according to the relevance value and presented to the user in this order of ranking. The system then picks a random answer text for each of the following cases: 0 matches, 1 match, n matches. The only purpose of the random answers is to introduce the search result in natural language. In this way, the chatbot always answers with some kind of small talk and a highly structured query result from the dictionary database.

The disadvantage of this solution is that the user always receives two answers at the same time: one from the small-talk engine and one from the dictionary database. It would be more convincing if the chatbot sent only one answer, so the program must decide at an early stage of the input parsing whether the input is small talk or a question that has to be executed as a query in the dictionary database. This problem could be solved by running a word list of weighted terms that are part of usual small talk, as stored in the small-talk database, against the user input. This operation should result in a probability value that is used to decide whether the user input should be classified as small talk or as a dictionary question.

Even then, one problem would remain unsolved, and it may be the hardest one to solve: in the case of a dictionary question, the dialogue part of the answer is a random answer containing phrases like “The article of the word xyz may give you more information.” This is obviously not wrong, but it is not ideal, either. It would be much more natural if the chatbot included the interesting parts of the “kurztext” or “langtext” in the answer or, in an even more advanced system, produced new answer texts using the information contained in the “kurztext” and “langtext.”

The tables “t_artikel_has_t_subkategorien,” “t_subkategorien,” and “t_kategorien” contain important information for linking different articles together. The lemma Acrylamid is part of the subcategory “Gefahrenstoffe” (hazardous materials), which is in turn part of the category “Körper und Gesundheit” (body and health). Consider the dialogue in Example 11.

1. User: “Was kannst Du über giftige Sachen im Essen sagen?”
2. Chatbot: Small Talk + “Gefährlich im Essen ist Acrylamid. Hilft Dir das weiter?”
3. User: Reads article from the dictionary database about Acrylamid + “Ja, etwas.”
4. Chatbot: “Canthaxanthin ist ebenfalls ein wichtiges Thema bei der gesunden Ernährung.”
(What can you say about poisonous things in food? Dangerous in food is Acrylamide. Does that help you? Yes, a little. Canthaxanthin is likewise an important topic of healthy nutrition.)
Example 11: Dialogue between user and chatbot

This dialogue could be performed with the database shown above because the connection between Acrylamid and Canthaxanthin is stored in the database. However, there is no knowledge available in the database explaining that Acrylamid is related to, e.g., “gesundes Essen” (healthy food), only the contrary information that it is “giftig” (poisonous).
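The relevance formula quoted from the MySQL documentation and the subsequent threshold-and-ranking step can be made concrete with a short Python sketch. It only restates the quoted formula term by term; the actual MyISAM implementation may differ in detail. The function names, the guard for terms that occur in every document, and all numbers in the usage example are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def term_weights(document: List[str], total_docs: int,
                 docs_with_term: Dict[str, int]) -> Dict[str, float]:
    """Weight every term of one document with the formula quoted above."""
    counts = Counter(document)                        # dtf for each term
    unique = len(counts)                              # U
    sumdtf = sum(math.log(c) + 1 for c in counts.values())
    weights: Dict[str, float] = {}
    for term, dtf in counts.items():
        nf = docs_with_term[term]                     # documents containing the term
        if nf >= total_docs:
            # Sketch-only guard: log(0) is undefined when every document
            # contains the term, so such terms get no weight here.
            weights[term] = 0.0
            continue
        local = (math.log(dtf) + 1) / sumdtf          # (log(dtf)+1)/sumdtf
        length_norm = unique / (1 + 0.0115 * unique)  # U/(1+0.0115*U)
        rarity = math.log((total_docs - nf) / nf)     # log((N-nf)/nf)
        weights[term] = local * length_norm * rarity
    return weights


def rank(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep lemmata scoring above the threshold, best match first."""
    relevant = [(score, lemma) for lemma, score in scores.items() if score > threshold]
    return [lemma for score, lemma in sorted(relevant, reverse=True)]


# Illustrative numbers only: 270 documents, "geldkarte" in 2 and "bank" in 40.
weights = term_weights(["geldkarte", "bank", "bank"], 270,
                       {"geldkarte": 2, "bank": 40})
print(weights)
print(rank({"Homebanking": 3.2, "Geldkarte": 1.1, "Handy": 0.2}, threshold=1.0))
```

In this made-up example the rarer term receives the higher weight even though it occurs less often in the document, which is the behaviour the `log((N-nf)/nf)` factor is meant to produce.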


The user input in line 1, “Was kannst Du über giftige Sachen im Essen sagen?”, could also be phrased as “Was kannst Du mir über gesundes Essen sagen?” (What can you tell me about healthy food?), with a slightly different meaning but pointing at the same group of lemmata, such as Acrylamid. The relational database described above carries no information that could lead to this conclusion (“gesundes Essen” is the opposite of “Gift im Essen”).

2.3 OWL ontologies

Ontologies, as known from the semantic web (Berners-Lee 2001; Daconta et al. 2003), are a way to store information in a structured way but with connections between the information blocks that add a semantic meaning to these blocks. Information blocks could be, e.g., lemmata such as Acrylamid and Canthaxanthin and a connection between these lemmata. Both are part of the subcategory “Gefahrenstoffe”. “Gesundes Essen” could be “opposite of” “giftige Stoffe im Essen,” which again could be stored as some kind of synonym of “Gift im Essen” and “Nahrungsmittelgifte.” “Nahrungsmittelgifte” could also contain Acrylamid and Canthaxanthin as subcategories. Therefore, the chatbot could answer in the way reproduced in Example 12.

1. User: “Was kannst Du über giftige Sachen im Essen sagen?”
2. Chatbot: Small Talk + “Nahrungsmittelgifte sind das Gegenteil von gesundem Essen und z.B. Acrylamid und Canthaxanthin. Beides sind Giftstoffe.”
(What can you say about poisonous things in food? Food poisons are the opposite of healthy meals and e.g. Acrylamide and Canthaxanthin. Both are poisonous substances.)
Example 12: Dialogue between user and chatbot

OWL (Bechhofner et al. 2004) is an XML application for storing structured information such as in ontologies: “The OWL Web Ontology Language is designed for use by applications that need to process the content of information instead of just presenting information to humans. OWL facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF schema (RDF-S) by providing additional vocabulary along with a formal semantics. OWL has three increasingly expressive sublanguages: OWL Lite, OWL DL, and OWL Full” (W3C 2008). In OWL, structure, data, and rules connecting structural elements and/or data are stored. This is not so different from a relational database, where structure (tables), data (lexicographical information), and rules (constraints) can be stored.
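The relations described for Example 12 can be written down as simple subject-relation-object triples to show how "gesundes Essen" and "Gift im Essen" end up pointing at the same lemmata. The sketch below is plain Python and not OWL; the relation names `member_of`, `synonym_of`, and `opposite_of` and both function names are hypothetical labels chosen for this illustration.

```python
from typing import List, Set

# Triples (subject, relation, object) for the relations named in the text.
TRIPLES = {
    ("Acrylamid", "member_of", "Nahrungsmittelgifte"),
    ("Canthaxanthin", "member_of", "Nahrungsmittelgifte"),
    ("Gift im Essen", "synonym_of", "Nahrungsmittelgifte"),
    ("giftige Stoffe im Essen", "synonym_of", "Gift im Essen"),
    ("gesundes Essen", "opposite_of", "giftige Stoffe im Essen"),
}


def synonyms(term: str) -> Set[str]:
    """All terms reachable through synonym_of links in either direction."""
    found, todo = {term}, [term]
    while todo:
        current = todo.pop()
        for subject, relation, obj in TRIPLES:
            if relation != "synonym_of":
                continue
            for a, b in ((subject, obj), (obj, subject)):
                if a == current and b not in found:
                    found.add(b)
                    todo.append(b)
    return found


def related_lemmata(term: str) -> List[str]:
    """Lemmata belonging to the term, its synonyms, or its direct opposite."""
    base = synonyms(term)
    expanded = set(base)
    for subject, relation, obj in TRIPLES:
        if relation == "opposite_of":
            if subject in base:
                expanded |= synonyms(obj)
            elif obj in base:
                expanded |= synonyms(subject)
    return sorted(s for s, r, o in TRIPLES if r == "member_of" and o in expanded)


print(related_lemmata("gesundes Essen"))
print(related_lemmata("Gift im Essen"))
```

Both queries reach Acrylamid and Canthaxanthin, which is exactly the conclusion the relational data model described earlier could not support. A real OWL ontology would additionally give these relations a formal semantics.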
The difference between a database and an OWL ontology is the approach. Databases can store OWL and ontologies, but they need special entity-relationship modelling and programming to do so. OWL is made only for storing ontologies. In addition, OWL ontologies are Web-enabled by default through the use of XML.

2.3.1 Modelling domain content with OWL

The main aspect of OWL is storing data and information in classes and subclasses or as individuals of these classes. Modelling subclasses according to the actual classification system of lookedup4you can easily be done in OWL. Example 13 shows one subclass to “Verbraucherrecht” (consumer legislation): “Gesetze” (laws). “Jugendschutzgesetz” (law for the protection of children and youth) again is a subclass of “Gesetze,” etc.

Example 13: Classes and subclasses of data stored in OWL

All lemmata are stored as classes, too, with a functional data type property as string (Example 14).

(intellectual property, legal capacity, law for the protection of young workers, dependent persons, age of criminal responsibility, pocket money section, advertising promise, competition, copyright)

Example 14: Classes of lemmata with functional data type properties

In this way, lemmata are classes, and they carry information such as “kurztext” (short text) in individuals connected to the class (Example 15).

Das Jugendschutzgesetz beinhaltet gesetzliche Sondervorschriften zum Schutz der Kinder und Jugendlichen vor gesundheitlichen und sittlichen Gefahren in der Öffentlichkeit.
(The law for the protection of children and youth contains special statutory provisions to protect children and young persons from health and moral dangers in public.)

Example 15: Lemma with short explanatory text

As part of the individual there could also be any kind of semantic connection (Geeb 1997a), such as a simple hypernym/hyponym relation (Example 16).

Lexicographical data in natural-language systems 

Das Jugendschutzgesetz beinhaltet gesetzliche Sondervorschriften zum Schutz der Kinder und Jugendlichen vor gesundheitlichen und sittlichen Gefahren in der Öffentlichkeit.

Example 16: Simple hypernym/hyponym relation

The strength of OWL in this area is the ability to store different kinds of relations between classes and individuals in a functional and a non-functional way, i.e. a lemma can be restricted to exactly one hypernym or, in the non-functional case, be given several hypernyms. Systems with several levels in the classification have another interesting feature, “transitivity,” which means that a relation that holds from A to B and from B to C also holds from A to C. In a lexicographical system, these features are of great value, particularly if these data are part of a chatbot.

2.3.2 Modelling domain structure with OWL

OWL and the theory of Web ontologies in OWL can be used in different ways to store the lexicographical data and information. In the above example, the main aspect was the relation between the lemmata (domain content) represented as classes and/or individuals in a systematic classification. Another starting point is the modelling of the microstructure and therefore the structure of the relational database in OWL. In this case, the dictionary (lexicographical product) would be the root of the system: All other features of an article as known from the tables of the original database in the dictionary would be subclasses (Example 17).

 Franziskus Geeb

Example 17: Subclasses in the dictionary microstructure

The semantic relations between these classes are modelled as object properties. Example 18 shows a non-functional connection between domain and range where the domain contains 1-n elements in the range.

Example 18: Non-functional connection between domain and range where the domain contains 1-n elements in the range

In this case, data type properties are used as slots for the concrete information on the lemma, as shown below for the synonym, which is basically a string on a 1-n relation between lemma and synonym (Example 19).


Example 19: Non-functional connection between domain and range where the domain contains 1-n elements in the range

Converting the object property for synonym to a functional property while keeping the data type property non-functional guarantees the traditional 1-n relationship known from the database already shown (Example 20).

Example 20: Non-functional guarantees traditional 1-n relationship after conversion of object property

All information about categories and subcategories from the dictionary could be stored as a data type property, like the short example of two main categories reproduced in Example 21.

Trends + Shopping
Körper + Gesundheit

Example 21: Two main categories of data type property

The instance of a lemma with a short text, one subcategory, and two synonyms would be what is reproduced in Example 22.

Von Homebanking spricht man, wenn die Abwicklung diverser Bankgeschäfte, wie z.B. Überweisungen, von zu Hause erledigt wird.
Homebanking
Geld + Job
(One speaks of homebanking when various banking transactions, e.g. transfers, are carried out from home.)

Example 22: Instance of a lemma with a short text, one subcategory, and two synonyms
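The shape of such an instance can be sketched in Python as a record with one functional slot (the short text) and two non-functional, 1-n slots (subcategories and synonyms). The class name LemmaInstance and its field names are assumptions made for this sketch, and the two synonym strings are placeholders, since the synonyms of the original example are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LemmaInstance:
    """Hypothetical record mirroring the structural approach described above."""
    lemma: str
    kurztext: str                      # functional slot: exactly one short text
    subcategories: List[str] = field(default_factory=list)  # 1-n slot
    synonyms: List[str] = field(default_factory=list)       # 1-n slot


homebanking = LemmaInstance(
    lemma="Homebanking",
    kurztext=(
        "Von Homebanking spricht man, wenn die Abwicklung diverser "
        "Bankgeschäfte, wie z.B. Überweisungen, von zu Hause erledigt wird."
    ),
    subcategories=["Geld + Job"],
    # Placeholders only: the actual synonyms are not given in the text.
    synonyms=["<synonym 1>", "<synonym 2>"],
)

print(homebanking.lemma, homebanking.subcategories, len(homebanking.synonyms))
```

Keeping the synonyms in a list rather than a single string is what preserves the 1-n relationship between lemma and synonym.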


The main difference between modelling content and modelling structure in OWL is obviously the retrieval of the content. While the second approach resembles a database but with more logical features applied, the first concept is pure domain modelling with less regard for the lexicographical structure of the information as known from the common microstructure and the cross-reference structure.

3. Concluding remarks

The main problem addressed in this paper is how to use lexicographical data and information as a knowledge base for a chatbot. Using these data for an automatic answer machine would add value to monolingual online dictionaries if they contained some amount of semantic and/or encyclopaedic information and if the dictionary contained an elaborate cross-reference structure plus synonyms, etc. There are three different ways that could be part of the solution to this problem: the chatbot programming language AIML, a relational database with full-text search and ranking, and OWL ontologies. AIML and the full-text database were used in a concrete project as shown above, but they were not connected to each other in a semantic way. The considerations for OWL are still at the prototype stage.

By now, it seems that AIML is a possible solution for storing small talk and even lexicographical information. Retrieval from AIML is not difficult, but AIML needs to be extended with a lexicographical tagset such as leXeML (Geeb 2001) and features such as regular expressions.

Relational databases were and are the typical tool for representing lexicographical information today. They are widely known and offer features such as full-text retrieval with term weight and relevance ranking, not only as a way to structure data but also as a technical solution for the retrieval. Therefore, this technology must be seen as the most stable and effective way to store data in general and lexicographical information in particular today. XML databases and application servers are not (yet) as fast and as well-known as these databases.

OWL ontologies, on the other hand, are the best tool of the three to represent semantic meaning in structure and/or content. Because OWL is used as a standard worldwide and its use will grow in the future, it may be the solution for interoperability of lexicographical data around the world. Still, more research needs to be done to explore the way in which lexicographical data should be stored in OWL and whether the content approach or the structural approach will result in the best retrieval under different circumstances.

Keeping in mind that relational databases are quick and efficient in retrieval and are even fuzzy match-enabled, and considering the semantic data structure of OWL, use of both concepts and technologies at the same time should give the best results. Still, the problem of using these lexicographical OWL database ontologies in a lexicographical chat is not solved. AIML seems to be the easiest and best way today to deal with the conversation part of the problem. Even if AIML needs to be extended functionally and as a tagset, its use has become established on the one hand, and on the other hand it is based on the same technology as OWL: XML. A future project should therefore clarify the possibility of using a chatbot on top of a full-text search-enabled lexicographical database containing the lexicographical information in some kind of OWL ontology connected to the dialogue feature through an extended AIML application.

References

A. Dictionaries

[Brockhaus] Brockhaus. Enzyklopädie. 2006. 30 volumes. Twenty-first edition. Mannheim: Brockhausverlag.
[Duden] Wissenschaftlichen Rat der Dudenredaktion: Werner Scholze-Stubenrecht et al. (eds.) 1999. Das große Wörterbuch der deutschen Sprache in zehn Bänden. Mannheim: Dudenverlag.
[Webster’s] Webster’s Encyclopedic Unabridged Dictionary of the English Language. 1996. New York: Random House.

B. Other literature

AIML. 2001. AIML Reference Manual. (Accessed 15 February 2008).
AIML. 2008. ALICE and AIML Documentation. (Accessed 15 February 2008).
Bechhofer, S. et al. 2004. OWL Web Ontology Language. Reference. W3C Recommendation 10 February 2004. (Accessed 12 December 2007).
Bergenholtz, H. 1994. “Ordbøgers funktion og æstetik.” In Sprogets funktion og æstetik, P. Skyum-Nielsen (ed.), 55–82. Copenhagen: Gad.
Bergenholtz, H. 1997. “Polyfunktionale ordbøger.” LexicoNordica 4: 5–29.
Bergenholtz, H. and Pedersen, J. 1999. “Fachwörterbücher als Hilfsmittel bei der Übersetzung von Fachtexten.” In Fachsprachen. Ein internationales Handbuch zur Fachsprachenforschung und Terminologiewissenschaft, L. Hoffmann, H. Kalverkämper and H.E. Wiegand (eds.), 1884–1889. Berlin/New York: Walter de Gruyter.
Bergenholtz, H. and Tarp, S. (eds.). 1995. Manual of Specialised Lexicography. The preparation of specialised dictionaries. Amsterdam/Philadelphia: John Benjamins.
Bergenholtz, H. and Tarp, S. 2002. “Die moderne lexikographische Funktionslehre. Diskussionsbeitrag zu neuen und alten Paradigmen, die Wörterbücher als Gebrauchsgegenstände verstehen.” Lexicographica 18: 253–263.
Bergenholtz, H. and Tarp, S. 2003. “Two opposing theories: On H.E. Wiegand’s recent discovery of lexicographic functions.” Hermes 31: 171–196.
Bergenholtz, H. and Tarp, S. 2005. “Wörterbuchfunktionen.” In Schreiben, Verstehen, Übersetzen und Lernen: Zu ein- und zweisprachigen Wörterbüchern mit Deutsch, I. Barz, H. Bergenholtz and J. Korhonen (eds.), 11–25. Frankfurt am Main: Peter Lang.
Berners-Lee, T. 2001. “The Semantic Web.” Scientific American 2001(5). (Accessed 12 April 2008).
Daconta, M., Obrst, L.J. and Smith, K.T. 2003. The Semantic Web. A Guide to the Future of XML, Web Services, and Knowledge Management. Indianapolis: Wiley.
Geeb, F. 1997. “Die Benutzertypologie als Grundstein terminologischer und lexikographischer Arbeit.” In Proceedings from XXII International Association Language & Business Conference ‘Language and Business Life’, A. Grinsted (ed.), Volume 2, 215–235. Duisburg: Internationale Vereinigung Sprache und Wirtschaft e.V.
Geeb, F. 1997a. “Bedeutungsbeziehungen in fachlexikographischen Nachschlagewerken.” Hermes 18: 127–155.
Geeb, F. 1998. “Semantische und enzyklopädische Informationen in zweisprachigen Fachwörterbüchern.” In Symposium on Lexicography VIII. Proceedings of the Eighth International Symposium on Lexicography May 2–5, 1996 at the University of Copenhagen, A. Zettersten, J.E. Mogensen, V. Hjørnager Pedersen (eds.), 175–186. Tübingen: Max Niemeyer Verlag.
Geeb, F. 1998a. Semantische und enzyklopädische Informationen in Fachwörterbüchern. Eine Untersuchung zu fachinformativen Informationstypen mit besonderer Berücksichtigung wortgebundener Darstellungsformen. Aarhus: Aarhus School of Business.
Geeb, F. 2001. “leXeML – Vorschlag und Diskussion einer metalexikographischen Auszeichnungssprache.” Sprache und Datenverarbeitung. International Journal for Language Data Processing: 27–61.
Geeb, F. 2007. “Chatbots in der praktischen Fachlexikographie und Terminologie.” LDV-Forum. GLDV-Journal for Computational Linguistics and Language Technology 22(1): 51–70.
Geeb, F. and Spree, U. 2005. “lookedup4you – Mikrostruktur und Makrostruktur und all das. Produktion eines Online-Nachschlagewerks als studentisches Projekt. Ein Erfahrungsbericht.” Fachzeitschrift Information – Wissenschaft & Praxis (nfd) 3: 133–142.
Hammer, C. and Schmoecker, M. 2004. “Navigation als Instrument der Kundenführung und -bindung.” In Praxishandbuch Portalmanagement, P. Gantsch and S. Lee (eds.), 157–168. Wiesbaden: Gabler.
Koch, M. 2002. “Globale Benutzerprofile und kundenindividuelle Produkte.” In Der Mensch im Netz – Ubiquitous Computing. 4. Liechtensteinisches Wirtschaftsinformatik-Symposium an der FH Liechtenstein, B. Britzelmeier, S. Geberl and S. Weinmann (eds.), 137–146. Wiesbaden: Teubner.
Lundqvist, K., Williams, S. and Baker, K. 2006. “Evaluation of quality of e-learning environments.” In Proceedings of the 5th European Conference on e-Learning. University of Winchester, UK, 11–12 September 2006, D. Remeney (ed.), 216–223. Reading: Academic Conferences.
Montero, S.C. and Araki, K. 2007. “Evaluation of trivial dialogue phrase databases through practical application to user-computer conversation – case study: English-Spanish.” In Knowledge-Based Intelligent Information and Engineering-Systems: KES 2007 – WIRN 2007. 11th International Conference, KES 2007, XVII Italian Workshop on Neural Networks, Vietri sul Mare, Italy, September 2007, Proceedings, B. Apolloni, R.J. Howlett and L. Jain (eds.), 361–368. Heidelberg: Springer.
MySQL. 2008. MySQL Forge: MySQL Internals Algorithms. (Accessed 2 April 2008).
NLP-addiction.com. 2008. NLP-addiction.com. (Accessed 2 June 2008).
Pelka, B. 2003. Künstliche Intelligenz und Kommunikation: Delphi Studie zur Technologiefolgeabschätzung des Einsatzes von Künstlicher Intelligenz auf Kommunikation, Medien, Gesellschaft. Münster: Lit-Verlag.
Popp, H. and Huber, E. 2007. “Steigerung des Humankapitals in KMU durch virtuelle Weiterbildung.” In Wissensmanagement für KMU, A. Belliger, D. Krieger and V. Friedrich (eds.), 219–236. Zürich: vdf Hochschulverlag.
Porter, M.F. 1980. “An algorithm for suffix stripping.” Program 14(3): 130–137.
Program E. 2007. Program E. (Accessed 26 October 2007).
Shawar, B.A. and Atwell, E. 2004. “ALICE and Machine Learning Chatbots.” In Natural Language Processing and Information Systems. 9th International Conference on Applications of Natural Language to Information Systems NLDB 2004, Salford, UK, June 2004, Proceedings, F. Meziane and E. Méthais (eds.), 407–412. Heidelberg: Springer.
Shigeoka, I. 2002. Instant Messaging in Java. The Jabber Protocols. Greenwich: Manning Publications.
Tatai, G., Csordás, A., Kiss, Á., Szalo, A. and Laufer, L. 2003. “Happy chatbot, happy user.” In Intelligent Virtual Agents. 4th International Workshop, IVA 2003, Kloster Irsee, Germany, September 2003, Proceedings, T. Rist, R. Aylett, D. Ballin and J. Rickel (eds.), 5–12. Heidelberg: Springer.
Thiedeke, U. 2001. “Die Gemeinschaft der Eigensinnigen. Interaktionsmediale Kommunikationsbedingungen und virtuelle Gemeinschaften.” In Internet-Bildung-Gemeinschaft. Medienbildung und Gesellschaft, F. von Gross, W. Marotzki and U. Sander (eds.), 45–94. Wiesbaden: VS-Verlag.
W3C. 2008. Web Ontology Language (OWL). (Accessed 4 January 2008).
Wall, L. 2006. Apocalypse 5: Pattern Matching. (Accessed 16 January 2008).
Wallace, R.S. 2002. AIML Overview. (Accessed 1 February 2008).
Wallace, R.S. 2005. Artificial Intelligence Markup Language (AIML) Version 1.0.1. A.L.I.C.E. AI Foundation Working Draft 8 August 2005 (rev 008). (Accessed 16 January 2008).
Weizenbaum, J. 1966. “ELIZA – A computer program for the study of natural language communication between man and machine.” Communications of the ACM 9(1): 36–45.

Part V

Collocations and phraseology

A methodology for describing collocations in a specialised dictionary
Marie-Claude L’Homme

This contribution describes a methodology for encoding and organizing collocations in a French terminological database. First, we will review previous work (i.e., specialised dictionaries and a term bank that take into account lexical combinations) in different fields: e.g., stock exchange, business, Internet. Second, we will present our own methodology based on that devised within Explanatory Combinatorial Lexicology (ECL) (Mel’čuk et al. 1995; 1984–1999). We will propose a definition of “collocation” and examine some of its repercussions on the kinds of lexical combinations that appear in our dictionary. We will then show how collocations are listed and described in the entries. The description takes into account three linguistic properties of collocations: 1) their syntactic structure; 2) their general meaning; and 3) the role played by actants (i.e., arguments). Finally, we will show how explanations can be adapted so as to be more user-friendly.

Keywords: specialised dictionary; terminological database; collocation; lexical relations; actantial structure; computing; Internet

1. Introduction

Since the 1980s, it has been recognised that collocations are extremely useful additions to the microstructure of general language dictionaries (Benson et al. 1986; Fontenelle 1997; Hausmann 1979; Mel’čuk et al. 1984–1999) as well as specialised dictionaries (Cohen 1986; Heid and Freibott 1991; Bergenholtz and Tarp 1995). Access to collocations, especially if some form of explanation of their meaning is provided, can help solve problems related to the production of correct lexical units to be used in combination with a specific term or another lexical unit. However, points of view as to how collocations should be defined, collected, encoded, and organised in printed or electronic reference works can differ. There does not appear to be a consensual framework or methodology that lexicographers or terminologists can refer to in order to deal with the specific linguistic phenomenon of lexical combinatorics.

This contribution presents a general methodology devised to help terminologists deal with collocations in a specialised lexical database. The database, called the DiCoInfo or the Dictionnaire fondamental de l’informatique et de l’Internet (2009; Fundamental Dictionary of Computing and the Internet), contains terms related to the fields of computing and the Internet. Our proposal draws on the theoretical principles defined within Explanatory Combinatorial Lexicology, henceforth ECL (Mel’čuk et al. 1995), on encoding methods used in printed specialised dictionaries and term banks that include collocations, and on work we carried out while compiling our database. Our methodology comprises a number of steps that aim to guide terminologists when: 1) collecting collocations; 2) encoding them in the database; and 3) organizing them. These steps are presented in detail in Section 4. But first, Section 2 presents some previous work on specialised collocations. Section 3 gives a brief description of the DiCoInfo. Most work reported in this contribution, including the DiCoInfo, has been carried out in French. Hence, we will provide English examples or an English translation of French examples wherever possible.

2. Collocations in specialised dictionaries

Terminologists and specialised lexicographers have started to list collocations in specialised dictionaries or term banks in order to provide users with information regarding the proper lexical combinations in which terms can be used in specialised texts. It has often been pointed out that standard terminological reference works focus on conceptual information about terms. This information, although extremely useful, does not provide answers to other questions users may have regarding the linguistic behaviour of terms. Although they can become quite familiar with a given subject matter, non-experts may have difficulties producing the correct verb, noun or adjective that is typically found in combination with a specific term. Scholars have explained this by the fact that some word combinations are unpredictable, meaning that they cannot be generated based on their components’ syntactic or semantic properties. This observation applies to general language as well as to specialised languages. However, in the latter case, native speakers of a language cannot rely exclusively on their intuition to produce correct combinations; they must reproduce consensual usages that have been defined within specific subject fields.

Describing collocations in a specialised dictionary 

Table 1. Collocations listed in specialised dictionaries

Field | Headword | Collocation(s) | Dictionary
Stock exchange | Devise (currency) | ~ s’affaiblit, ~ baisse (~ decreases); consolider une ~ (strengthen a ~) | Cohen (1986)
Internet | Link | activate a ~, click on a ~; broken ~, external ~ | Meynard (2000)
Business | Actif (assets) | ~ stratégique (strategic ~); détenir des ~s (have ~); vendre des ~s (sell ~) | Binon et al. (2000)

Heid and Freibott (1991) illustrated the above observation with the French example créer un fichier (to create a file). Although verbal alternatives (equivalent as far as meaning is concerned) are available in French (établir, concevoir, développer), créer is preferred in the field of computing. Interestingly, développer (to develop) is used with programme (program), application (application software) or logiciel (software program) to express the same basic sense (“cause something to exist”). Many other examples can be found in different domains. Table 1 gives examples of collocations that have been listed in specialised dictionaries. (Our list does not take into account the specific classification systems for collocations in these dictionaries. Encoding methods will be reviewed in the following subsections.)

In terminology, there is still no general agreement on what kinds of word combinations should be encoded in dictionaries. In fact, terminologists do not agree on the name to give to word combinations comprising a term; several denominations can still be found in the literature: phraseologisms, collocations, cooccurrents, specialised lexical combinations, etc. However, when browsing through specialised dictionaries or term banks that encode collocations, a number of similarities can be observed:

1. Collocations are listed under a headword that has already been defined as a term in a specialised subject field;
2. The keyword of the collocation, being a term, is usually a noun or a noun phrase;
3. Lexical units that typically combine with terms, defined as collocates, can be verbs, nouns, or adjectives.

However, specialised dictionaries that take into account collocations differ with respect to the method chosen to list and represent them in entries. The following subsections review common encoding methods used in specialised dictionaries or terminological databases.

2.1 Listing collocations according to the parts of speech of collocates

A simple method for taking into account specialised collocations consists in adding collocates to existing terminological descriptions (a dictionary entry or a term record that already contains common information categories such as lists of synonyms, definitions, observations, etc.). This method is applied in the Termium Plus (2009) term bank, where new fields have been added to the standard structure of term records in order to list collocations (in fact, the fields are labelled PHR, for phraseology). Collocates are classified according to their part of speech. Figure 1 gives an example of this encoding for the term data.1 Three new fields (one for verbal collocates, one for adjectives, and one for nouns) have been added to the record structure. This method assumes that a user already knows the part of speech of the collocate he or she is looking for and will access the right part of the record.

data CORRECT, VOIR OBS, PLUR, NORMALISE
DEF – Reinterpretable [representations] of information in a formalised manner suitable for communication, interpretation, or processing.
OBS – Data can be processed by humans or by automatic means.
OBS – The singular term “datum” is rarely used in the field of information technology. The plural form “data”, which is commonly used in this field, is often perceived as a collective noun, which takes a singular verb. However, “data” can take a plural verb if attention is being focused on each individual piece of data (e.g. The data were transferred.).
OBS – data: term and definition in the singular standardised by ISO/IEC and CSA International.
PHR – To analyze, archive, assemble, collect, compile, convert, edit, process, save, transfer, translate data.
PHR – Alphanumeric, analog, binary, digital, dynamic, input, output, static data.
PHR – Data analysis, archiving, collection, compilation, conversion, creation, processing, transmission.

Figure 1. Data in Termium Plus (2009) and collocates
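To make the record structure concrete, the three PHR fields of Figure 1 can be pictured as collocate lists keyed by part of speech. The following Python sketch is only an illustration under that assumption: the dictionary layout, the key names and the helper function are hypothetical and do not reflect how Termium Plus stores its records.

```python
# Hypothetical encoding of the PHR fields shown in Figure 1.
TERM_RECORD = {
    "term": "data",
    "phr": {
        "verb": ["analyze", "archive", "assemble", "collect", "compile",
                 "convert", "edit", "process", "save", "transfer", "translate"],
        "adjective": ["alphanumeric", "analog", "binary", "digital",
                      "dynamic", "input", "output", "static"],
        "noun": ["analysis", "archiving", "collection", "compilation",
                 "conversion", "creation", "processing", "transmission"],
    },
}

# How a collocate of each part of speech combines with the term in Figure 1.
PATTERNS = {
    "verb": "to {collocate} {term}",
    "adjective": "{collocate} {term}",
    "noun": "{term} {collocate}",
}


def collocations(record, part_of_speech):
    """Spell out every collocation listed for one part of speech."""
    pattern = PATTERNS[part_of_speech]
    return [
        pattern.format(collocate=collocate, term=record["term"])
        for collocate in record["phr"][part_of_speech]
    ]


print(collocations(TERM_RECORD, "noun")[:3])
```

The lookup key is the part of speech, which mirrors the limitation noted above: a user must already know which kind of collocate is needed before consulting the record.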

2.2 Listing collocations in more than one language

In addition to classifying collocates according to their part of speech, Meynard (2000), who compiled a dictionary of word combinations in the field of the Internet, provides users with English and French translations of typical lexical combinations. An example is given in Figure 2.

The dictionary is divided into two separate parts: English -> French; French -> English. Not only can users have access to the term’s equivalent, but also to the idiomatic translation of collocations in which the term can be found. In addition, the categories defined to classify collocations take into account the syntactic position of the term (this distinction is not made, for example, in the fields added to the Termium Plus records).

LINK
Definition: Hypertext object that allows a word, an image or a data element to connect with another.
French base noun: lien

Collocate noun + Base noun
• Attributes of a link | Attributs d’un lien
• Behaviour of a link | Comportement d’un lien
• Colour of a link | Couleur d’un lien
• Creation of a link | Création d’un lien
• Deletion of a link | Suppression d’un lien
• […]

Collocate verb + Base noun
• To activate a link | Activer un lien
• To click on a link | Cliquer sur un lien
• To create a link | Créer un lien
• To delete a link | Supprimer un lien
• To display a link | Afficher un lien
• […]

Base noun + Collocate verb
• Link displays | Lien affiche
• Link initiates | Lien lance
• Link points to | Lien évoque
• Link specifies | Lien spécifie
• Link works | Lien fonctionne

Collocate adjective + Base noun
• Basic link | Lien élémentaire
• Broken link | Lien rompu
• Direct link | Lien direct
• External link | Lien externe
• Functional link | Lien fonctionnel
• […]

Figure 2. Link in Meynard (2000: 144–145)
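The organisation of Figure 2, in which collocations are grouped by syntactic pattern and paired with their French equivalents, can be sketched as a small lookup structure. This is a hypothetical Python illustration containing only a few of the pairs shown in the figure; the key names and the function are assumptions and not part of Meynard’s dictionary.

```python
# Hypothetical encoding of part of the entry in Figure 2:
# syntactic pattern -> list of (English, French) collocation pairs.
LINK_ENTRY = {
    "headword": "link",
    "french_base_noun": "lien",
    "collocations": {
        "collocate noun + base noun": [
            ("creation of a link", "création d'un lien"),
            ("deletion of a link", "suppression d'un lien"),
        ],
        "collocate verb + base noun": [
            ("to activate a link", "activer un lien"),
            ("to click on a link", "cliquer sur un lien"),
        ],
        "base noun + collocate verb": [
            ("link displays", "lien affiche"),
            ("link works", "lien fonctionne"),
        ],
        "collocate adjective + base noun": [
            ("broken link", "lien rompu"),
            ("external link", "lien externe"),
        ],
    },
}


def find_translation(entry, english):
    """Return (pattern, French equivalent) for an English collocation, or None."""
    wanted = english.lower()
    for pattern, pairs in entry["collocations"].items():
        for english_form, french_form in pairs:
            if english_form == wanted:
                return pattern, french_form
    return None


print(find_translation(LINK_ENTRY, "Broken link"))
```

Because the pattern is returned together with the equivalent, the sketch keeps the information about the syntactic position of the term that a plain part-of-speech field would lose.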

2.3 Listing collocations according to their meaning

Listing collocates according to their part of speech and to the syntactic position of the term is useful, but this method does not discriminate between collocates belonging to the same part of speech and expressing different meanings. For example, on the term record taken from Termium Plus (Figure 1) nothing informs the user that assemble and edit convey very different meanings when combined with data. Cohen (1986), who compiled a French collocations dictionary in the field of stock exchange, addresses this problem. The lexicographer classifies collocates according to their part of speech and according to their general meaning. All collocates are arranged in a table of recurrent meanings expressed by collocates in the field being analysed (e.g., CROISSANCE (increase) or DÉCLIN (decrease)). The entry devise (currency) is reproduced in Figure 3. This method allows users to select a collocate that expresses a specific meaning (for example, “increase”) and to choose the part of speech that corresponds to the context they must produce. The table also shows that for each sense/part of speech correspondence, more than one collocate may be available.

DEVISE: Effet de commerce exprimé en monnaie étrangère et devant être payé par un débiteur étranger à un agent économique; par extension, la monnaie étrangère elle-même.

 | NOMS | VERBES (SUJET) | VERBES (OBJET) | ADJECTIFS
DÉBUT | | | |
CROISSANCE | raffermissement, renforcement, revalorisation | se raffermir, se renforcer | consolider, raffermir, revaloriser | forte
INDÉTERMINÉS | évolution, fluctuation, stabilisation, stagnation | évoluer, fluctuer, se stabiliser, stagner | |
DÉCLIN | affaiblissement, baisse, dépréciation, recul | s’affaiblir, baisser, se déprécier | affaiblir, déprimer | faible
FIN | | | |
AUTRES COOCCURENTS | avoir, disponibilités, réserves, apport, sortie | | expatrier, rapatrier |

Figure 3. Devise in Cohen (1986: 48)
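Access by meaning first and part of speech second, as in Figure 3, amounts to a two-key lookup. The Python sketch below illustrates this with a subset of the collocates of devise; the cell assignments follow our reading of the figure, and the key names and both functions are assumptions made for the illustration.

```python
# Hypothetical encoding of part of Figure 3:
# (general meaning, part of speech) -> collocates of "devise".
DEVISE = {
    ("CROISSANCE", "noun"): ["raffermissement", "renforcement", "revalorisation"],
    ("CROISSANCE", "verb_subject"): ["se raffermir", "se renforcer"],
    ("DÉCLIN", "noun"): ["affaiblissement", "baisse", "dépréciation", "recul"],
    ("DÉCLIN", "verb_subject"): ["s'affaiblir", "baisser", "se déprécier"],
}


def collocates(table, meaning, part_of_speech):
    """Return the collocates expressing a meaning in a given part of speech."""
    return table.get((meaning, part_of_speech), [])


def meaning_of(table, collocate):
    """Return the general meaning under which a collocate is filed, or None."""
    for (meaning, _part_of_speech), items in table.items():
        if collocate in items:
            return meaning
    return None


print(collocates(DEVISE, "DÉCLIN", "verb_subject"))
print(meaning_of(DEVISE, "renforcement"))
```

A user who wants to express “decrease” with a verb can thus go straight to the relevant cell and choose among several candidates.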

2.4 Representing the entire set of the linguistic properties of collocations

The above subsections have shown that specialised dictionaries listing collocations can represent: 1) collocates of terms which belong to the parts of speech of the adjective, the verb, and the noun; 2) different syntactic positions occupied by the term in certain collocations; and 3) different meanings conveyed by collocates when combined with a given term.

1 un PRIX
1.1 Somme d’argent (ou partie d’un autre bien ou service) qui représente la valeur d’un bien ou d’un service (Y) offert à la vente par un agent économique (un particulier, un commerçant, une entreprise, une banque – X) à un autre agent économique (le client: un particulier, un commerçant, une entreprise, une banque – Z). […]

+ adjectif (sens 1.1)
un prix courant
un prix unique
TYPE DE PRIX (sens 1.1)
Les prix + adjectif qui désigne un type de produits. Les prix agricoles; pétroliers; énergétiques.
Un prix (de vente) net
[…]

+ nom (sens 1.1)
Une politique des prix
Une guerre des prix
Le rapport qualité-prix
[…]

+ verbe: qui fait quoi? (sens 1.1)
X | fixer le ~ de Y | la fixation du ~ (de Y)
  | déterminer le ~ (de Y) | la détermination du ~ (de Y)
  | établir le ~ (de Y) | l’établissement du ~ (de Y)
× | un ~ s’établir sur le marché |
X (un commerçant, une entreprise, une banque) | pratiquer un/des ~ (fréq.: + adjectif) | […]
× Z | obtenir un/des ~ (fréq.: + adjectif) | l’obtention d’un ~ (+ adjectif)
[…]

Figure 4. Part of the entry prix in Binon et al. (2000: 431–435)
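The link that this entry establishes between the actants named in the definition (X, Y and Z) and the verbal collocations can be sketched as follows. The Python classes and names are hypothetical, the actant glosses are paraphrases of the definition in Figure 4, and only three collocations are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbalCollocation:
    """Hypothetical record: a pattern in which "~" stands for the keyword."""
    pattern: str
    subject_actant: str  # actant of the keyword that acts as subject of the verb


# Actants of "prix", paraphrased from the definition in Figure 4.
PRIX_ACTANTS = {
    "X": "economic agent offering the good or service",
    "Y": "good or service offered for sale",
    "Z": "client",
}

PRIX_VERBS = [
    VerbalCollocation("fixer le ~ de Y", "X"),
    VerbalCollocation("pratiquer un/des ~", "X"),
    VerbalCollocation("obtenir un/des ~", "Z"),
]


def realise(collocation, keyword="prix"):
    """Replace the tilde with the keyword."""
    return collocation.pattern.replace("~", keyword)


def by_subject(collocations, actant):
    """Select the collocations whose subject is the given actant."""
    return [c for c in collocations if c.subject_actant == actant]


for collocation in by_subject(PRIX_VERBS, "X"):
    print(realise(collocation), "<-", PRIX_ACTANTS[collocation.subject_actant])
```

Storing the actant with each collocation lets the entry answer the question “who does what?” rather than merely listing verbs.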

 Marie-Claude L’Homme

In addition to all these properties (meaning and part of speech of collocates, syntactic structures of collocations), Binon et al. (2000), a learner’s dictionary containing French terms2 related to the field of business, establishes an explicit relationship between the definition of the keyword (in which its actants, i.e., arguments, are stated) and the linguistic realizations of collocations.3 We reproduce in Figure 4 part of the entry devoted to prix (price); the information given concerns the first sense in the entry (two other senses are described). As can be seen in Figure 4, collocations are first listed according to the parts of speech of the collocates: + adjectif (adjectives), + nom (nouns), and + verbe (verbs). Adjective and noun collocates are further subdivided according to their meaning: e.g., TYPE DE PRIX (prices according to their type).

The organization of verbal collocates differs from what has been seen in the other specialised dictionaries examined up to now. The description takes into account the actantial structure of the keyword. This structure is highlighted in the definition, which states that prix has three actants, first represented by variables (X, Y and Z), and then explained using generic terms that normally appear as actants (e.g., the first actant X is explained in the definition as being an economic agent: a bank, a store owner, etc.). When listing verbal collocations, the lexicographers have specified which actant is involved in the meaning. For example, in fixer le prix (establish the price), the first actant (X) of prix is the subject of fixer.

We will come back to the representation of actants used by Binon et al. (2000) when we present our own proposal, especially in Section 4.2, which is devoted to the encoding of our collocations in our own terminological database. First, however, we will give a short general description of the database.

3. The DiCoInfo: a brief description

The DiCoInfo is a lexical database which contains French terms related to the fields of computing and the Internet. More than 1000 articles can be accessed on a website (http://www.olst.umontreal.ca/dicoinfo).4 It aims at providing rich lexico-semantic information on terms: it accounts for fine-grained semantic distinctions, describes the actantial structure of terms, and includes long lists of lexical relations shared by the term being described and other lexical units. The database comprises fundamental terms, i.e., terms that can be found in different texts on computing, as opposed to terms attached to a very specific subdivision of the field (for example, programming with Java). The terms described can be nouns (e.g., données, ordinateur; data, computer), verbs (e.g., afficher, naviguer; display, browse), adjectives (e.g., dynamique, virtuel; dynamic, virtual), or adverbs (e.g., numériquement, dynamiquement; digitally, dynamically). Multi-word

Describing collocations in a specialised dictionary 

expressions can also appear as headwords if their meaning is non-compositional (e.g., système d’exploitation, par défaut; operating system, by default). Terms are selected using a method combining a term extractor (TermoStat, Drouin 2003) and the application of lexico-semantic criteria by terminologists.

Each entry in the dictionary corresponds to a specific sense. Sense distinctions are made using a combination of four lexico-semantic criteria: 1) compatible and differential cooccurrence (Mel’čuk et al. 1995: 64–65); 2) substitution with a synonym; 3) differential derivation; 4) other paradigmatic relationships. It is important to point out that only senses related to the fields of computing and the Internet are described in the dictionary, even if other senses can be found in specialised corpora. Hence, three separate entries deal with the senses of adresse: 1) adresse1: Lorsque le microprocesseur demande l’accès à une adresse en mémoire (address1: When the microprocessor requests access to an address in the memory); 2) adresse2: Vous envoyez une requête au serveur proxy (une adresse de site par exemple) (address2: You send a request to the proxy server (the address of a website for example)); 3) adresse3: L’utilisateur dispose d’une adresse qui lui est propre (address3: The user has an address of their own).

The entries of the DiCoInfo are subdivided into 10 information categories that can be accessed on the website. As can be seen in Figure 5, the headword, grammatical information, state (this category indicates if the entry is completed or if some information categories are still missing), actantial structure, definition (for terms in entries that are completed), and synonyms appear in the default display. Some administrative information (the date the entry was last updated and the code of the terminologist who wrote the entry) is also given at this stage.

barre d’espacement1, n. f.
Statut: 0
Structure actancielle: la barre d’espacement: ~ utilisée par AGENT{utilisateur 1} pour intervenir sur PATIENT{espace 2}
Réalisations linguistiques des actants
Définition: Touche de forme allongée placée au bas du clavier qu’un UTILISATEUR utilise afin d’insérer des ESPACES.
Synonyme(s): barre espace, barre d’espace
Contextes
Liens lexicaux

Figure 5. Entry for barre d’espacement1 (space bar)
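The default display shown in Figure 5 can be pictured as a small record. The following Python sketch is only an illustration: the class and field names (`Actant`, `Entry`, `status`, and so on) are assumptions made here and do not reproduce the actual format of the DiCoInfo; the values are copied from Figure 5.

```python
from dataclasses import dataclass, field


@dataclass
class Actant:
    role: str                 # semantic role label, e.g. "AGENT"
    typical_terms: list[str]  # terms that typically instantiate the actant


@dataclass
class Entry:
    headword: str
    grammar: str
    status: int               # completion state of the entry (hypothetical name)
    actants: list[Actant]
    definition: str
    synonyms: list[str] = field(default_factory=list)

    def actant(self, position: int) -> Actant:
        """Return the actant at a 1-based position (1 = first actant)."""
        return self.actants[position - 1]


# Values copied from Figure 5 (barre d’espacement1).
space_bar = Entry(
    headword="barre d’espacement1",
    grammar="n. f.",
    status=0,
    actants=[
        Actant("AGENT", ["utilisateur 1"]),
        Actant("PATIENT", ["espace 2"]),
    ],
    definition=(
        "Touche de forme allongée placée au bas du clavier qu’un "
        "UTILISATEUR utilise afin d’insérer des ESPACES."
    ),
    synonyms=["barre espace", "barre d’espace"],
)

print(space_bar.actant(1).role, space_bar.actant(2).role)
```

Keeping the actants as an ordered list means that "first actant" and "second actant", as used in the explanations of collocations, can be resolved by position.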

barre d’espacement1, n. f.
Statut: 0
Liens lexicaux

Explication – terme typique (Explanation – Typical Term) | Lexie reliée (Related Lexical Unit)
≈5 | touche1
≈ | pavé1
≈ | flèche1
L’utilisateur utilise la b. | appuyer1 sur la ~
L’utilisateur utilise la b. | enfoncer1 la ~
-> NOM6 | enfoncement1 de la ~
L’utilisateur utilise la b. pour intervenir sur un espace | insérer2... avec la ~
-> NOM | insertion2 de... avec la ~
L’utilisateur cesse d’utiliser la b. | relâcher1 la ~
-> NOM | relâchement1 de la ~
Lieu où on trouve une b. | clavier1
Lieu où on trouve une b. | mini-clavier1

Figure 6. Lexical relations for barre d’espacement1 (space bar)
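Note 6 explains that a "-> NOM" row stands for the nominalization of the verb listed just above it, so that the explanation is not repeated. A reader-side expansion of this shorthand could look like the following Python sketch. The function name `expand_nominalizations` and the tuple layout are assumptions made for illustration; the rows are taken from Figure 6.

```python
NOM_MARK = "-> NOM"

# (explanation, related lexical unit) rows copied from Figure 6.
ROWS = [
    ("L’utilisateur utilise la b.", "appuyer1 sur la ~"),
    ("L’utilisateur utilise la b.", "enfoncer1 la ~"),
    (NOM_MARK, "enfoncement1 de la ~"),
    ("L’utilisateur cesse d’utiliser la b.", "relâcher1 la ~"),
    (NOM_MARK, "relâchement1 de la ~"),
]


def expand_nominalizations(rows):
    """Replace each '-> NOM' marker with the explanation of the row above.

    Returns (explanation, unit, is_nominalization) triples.
    """
    expanded = []
    last_explanation = None
    for explanation, unit in rows:
        if explanation == NOM_MARK:
            if last_explanation is None:
                raise ValueError(f"no preceding explanation for {unit!r}")
            expanded.append((last_explanation, unit, True))
        else:
            last_explanation = explanation
            expanded.append((explanation, unit, False))
    return expanded


for explanation, unit, is_noun in expand_nominalizations(ROWS):
    tag = "noun" if is_noun else "verb"
    print(f"{unit:25} [{tag}] {explanation}")
```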

Additional information categories can be displayed if the user wishes to access them: linguistic realizations of actants, contexts and lexical relations. Entries list up to three contexts extracted from the corpus used during the preparation of the entries. Some entries also include an information category called “Informations complémentaires” (additional information), which contains links to other sites providing additional encyclopedic information on concepts. The most complex information category, called “Lexical relations”, is the one on which we will focus in the remainder of the article. This information category lists all lexical units (some of which can be terms of the domains of computing and the Internet) which share a paradigmatic or syntagmatic relation with the term appearing as a headword. An explanation of the relationship is also provided. Finally, if the related lexical unit appears as a headword in the dictionary, a hyperlink redirects the user to the entry where it is described. Figure 6 shows how lexical relations are encoded for the term barre d’espacement1 (space bar). Figure 6 shows that recurrent paradigmatic relationships are given at the top of the list (e.g., touche (key), which is a hyperonym). Then, verbal collocations are listed (e.g., appuyer sur la ~; press the space bar) along with a general explanation of their meaning (e.g., L’utilisateur utilise la b.; The user uses the space bar). Finally, less standard paradigmatic relationships are listed at the bottom of the table (e.g., clavier; keyboard). The list reproduced here for barre d’espacement is rather


short, but some entries can contain more than 100 lexical relations (for example, the entry fichier (file) contains approximately 150 lexical relations). More than 20,000 lexical relations are provided for the 1000 terms available online, which represents about 20 relationships per term. A search engine can be used to access entries (as well as synonyms and variants), and lexical relations.

4. Collocations in the DiCoInfo

In this section, we will focus on one part of our terminological descriptions, i.e., collocations that appear in the “lexical relations” category of our database. We present our own framework and method for taking into account this linguistic phenomenon in the DiCoInfo. As will be seen further on, our perspective bears some similarities to the work presented above, but also takes into account the specificities of the terms we aim to describe and the field to which they belong.

4.1 Definition

Our definition of “collocation” is based on Explanatory Combinatorial Lexicology (ECL; Mel’čuk et al. 1995), which views it as an unpredictable combination of lexical units, i.e., a combination that cannot be produced based on the regular syntactic and semantic properties of the units involved.7 Collocations comprise a keyword (also called base), which is selected by the speaker to express a given meaning, and a collocate, the selection of which is imposed by the keyword. The types of collocations that are taken into account in our specialised dictionary are those in which the keyword is a term. We assume that the term is freely selected, but that the selection of the collocate (another term or a lexical unit that has not been defined as a term in the field under analysis) is imposed by the term. For example, verbs used to express the idea of “use” differ according to the term selected, as shown in Example 1.

“use” Internet: surf the ~
“use” file: edit a ~
“use” mouse: move a ~
“use” password: use a ~

Example 1: Verbs that express “use”

Similarly, verbs used to express “create” also differ according to the key term selected, as shown in Example 2.

“create” password: create a ~, define a ~
“create” file: create a ~
“create” interface: develop an ~
“create” code: generate ~
“create” program: develop a ~, write a ~

Example 2: Verbs that express “create”

Given this “unpredictability”,8 it is thus necessary to encode collocations in specialised dictionaries in order to give users access to this information. We will also adhere to the general principle according to which collocates are listed in an entry whose headword is a term.

4.2 Collection of collocations

Referring to the definition above, terminologists collect collocations from a corpus of texts on computing and the Internet that contains approx. 2 million words. When browsing through the corpus with a concordancer, terminologists collect nouns, verbs, and adjectives that are typically combined with noun terms. A few adverbial collocates are also collected for adjectival and verbal terms. However, a 2 million word corpus is not enough to ensure that all relevant collocates are covered. After collecting a first series of collocations from the corpus, terminologists then make a series of queries on the web to find combinations that did not appear in the corpus. Web queries are also extremely useful to confirm the frequency of given combinations. At the end of this stage, terminologists obtain a provisional list of potentially valid combinations (an example is given in Table 2 for the term fichier (file)), which will be further refined and organised during the following stages.

Table 2. Provisional list of collocations containing the term fichier (file)

transférer + fichier | Le FTP est un protocole qui permet de transférer des fichiers entre votre ordinateur et un ordinateur distant (FTP).
gérer + fichier | en cliquant deux fois (double cliquant) sur cette icône vous pourrez voir les composants de votre ordinateur et gérer vos fichiers (Prise_en_main_de_Windows).
enregistrer + fichier | permettant d’enregistrer les fichiers dans une arborescence (systeme_d_exploitation)
taille + fichier | a suppression du bridage lié à la taille des fichiers (Unix).
sauvegarder + fichier | Assurez-vous de sauvegarder le fichier sur le lecteur A (Access).
fichier + données | Les programmes d’application La plupart des données stockées sur le disque dur de l’ordinateur sont contenues dans des programmes d’application ou des fichiers de données (Adevim)
échange + fichier | les systèmes d’exploitation de réseau facilitent l’échange de fichiers et l’impression des documents (Adevim)
format + fichier | demander un ‘Téléchargement’ en indiquant le format du fichier (PDF ou TIFF) et le nombre de pages (Aidenum).

4.3 Encoding of collocations

Once collocations are collected, they are encoded in the dictionary. Our encoding takes into account three linguistic properties of collocations in accordance with ECL.9, 10

1. The syntactic relationship between the base and the collocate. For example, we distinguish the verbal collocates of mouse, namely move and click, based on the fact that, when used in combination with move, mouse is the first complement (direct object); when used in combination with click, mouse is the second complement.

move a mouse: verb + 1st complement
click on... (e.g., an icon) with a mouse: verb + 2nd complement

2. The actantial structure of the base. Many collocations, especially verbal and nominal collocations, convey a meaning that involves one or more actants that appear in the actantial structure of the keyword. Another example of move and click being used in combination with mouse is given below. As can be seen in the example, the sense of move a mouse involves the first actant of mouse; the sense of click on something with the mouse involves both actants of the term.

Mouse, n.
Actantial structure: ~ used by someone (1st actant) to act on something (2nd actant)
Someone (1st actant) moves a ~
Someone (1st actant) clicks on something (2nd actant) with a ~

3. The sense of the collocate. Collocates convey different senses when combined with specific terms. For example, both move and click refer to typical uses of the mouse; however, other collocates convey a meaning of “creation”, e.g., create a file, a password, develop a program.

Figure 7 shows how the encoding of the three linguistic properties listed above is implemented in the Web version of the DiCoInfo for the term Internet. As can be seen in Figure 7, the syntactic relationship between Internet and the verbal collocate is evidenced by the phrase containing the collocate and the keyword. In se connecter à ~ (connect to the Internet), naviguer dans ~ (surf the Internet), se déconnecter d’~ (disconnect from the Internet), Internet is the first complement. However, in chercher … dans ~ (search for something on the Internet), Internet is the second complement.

Internet, n. m.
Actantial structure: ~ utilisé par AGENT{utilisateur 1} pour intervenir sur PATIENT {ressource 2, site 1}

Explanations (in the order given):
L’utilisateur commence à utiliser I.
L’utilisateur utilise I.
L’utilisateur utilise I. pour intervenir sur la ressource ou le site
L’utilisateur cesse d’utiliser I.

Collocates (in the order given):
accéder à ~, aller dans ~
se connecter à ~
naviguer dans ~
chercher … dans ~, trouver… dans ~
se déconnecter d’~

Figure 7. Encoding of collocations for Internet1

Secondly, the actantial structure of the base is taken into account by the explanations of the meanings of collocations. Figure 7 shows that Internet has two actants. They are represented using two different systems: the first one states the semantic role of the actant (AGENT, PATIENT); the second one gives the typical term or terms that instantiate an actant (e.g., utilisateur (user) is the typical agent of Internet). In naviguer dans ~, the first actant is involved and stated in the explanation: L’utilisateur utilise I. (the user uses the Internet). In chercher … dans ~, both actants are involved and thus appear in the explanation: L’utilisateur utilise I. pour intervenir sur la ressource ou le site (The user uses the Internet to act on the resource or the site).

Finally, the meanings of collocations are highlighted in the explanations, by means of a basic vocabulary (use, create, cause, start to use, etc.). In the example given above, all collocates are related to the use of the Internet; however, distinctions appear according to the stage of the use: L’utilisateur commence à utiliser I. (The user starts using the Internet); L’utilisateur utilise I. (The user uses the Internet); L’utilisateur cesse d’utiliser I. (The user stops using the Internet).
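The three properties encoded for each collocation (syntactic relationship, actants involved, sense of the collocate) can be pictured as the fields of a single record. The following Python sketch uses the mouse example given earlier in this section. The class name `Collocation`, its field names, and the `explain` helper are illustrative assumptions; they do not reproduce the format actually used in the DiCoInfo.

```python
from dataclasses import dataclass

# Actantial structure of "mouse" as given in the text:
# ~ used by someone (1st actant) to act on something (2nd actant)
MOUSE_ACTANTS = {1: "someone", 2: "something"}


@dataclass(frozen=True)
class Collocation:
    pattern: str              # collocate, with "~" standing for the keyword
    keyword_function: str     # syntactic relationship of the keyword to the verb
    actants: tuple[int, ...]  # actants of the keyword involved in the meaning
    sense: str                # basic-vocabulary gloss, e.g. "use", "create"


MOUSE = [
    Collocation("move a ~", "1st complement", (1,), "use"),
    Collocation("click on ... with a ~", "2nd complement", (1, 2), "use"),
]


def explain(c: Collocation, actants: dict[int, str]) -> str:
    """Build a plain-language explanation from the sense and the actants.

    Naive third-person form: simply appends "s" to the sense gloss.
    """
    text = f"{actants[c.actants[0]]} {c.sense}s the keyword"
    if len(c.actants) > 1:
        text += f" to act on {actants[c.actants[1]]}"
    return text


for c in MOUSE:
    print(f"{c.pattern}: {c.keyword_function}; {explain(c, MOUSE_ACTANTS)}")
```

The generated strings follow the pattern of the explanations in Figure 7: one actant yields "X uses the keyword", two actants yield "X uses the keyword to act on Y".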

4.4 Ordering of collocations

Finally, collocations are organised in entries according to the following criteria. As was seen above, collocations are listed between standard paradigmatic relationships (e.g., near-synonymy, hyperonymy, etc.) and less standard ones (location, typical instrument, etc.). First, collocations referring to a “sort of” keyword are listed along with an explanation. This first group comprises term + noun, and adjective + term or term + adjective collocations. The explanations take the form of relative clauses as can be seen in Example 3.

Mot de passe, n. m.
Que l’utilisateur remplace par un autre (That the user replaces with another) | ancien ~ (old ~)
Qui fonctionne bien (That works well) | bon ~, ~ valide (good ~, valid ~)
Qui ne fonctionne pas bien (That does not work well) | mauvais ~, ~ non valide (bad ~, invalid ~)

Example 3: Explaining the collocations of mot de passe (password)

Then, verbal collocates are listed according to two principles: 1) the main actant involved in the meaning of the collocation; and 2) the order in which the activities denoted by the verbs are carried out in the real world. We will illustrate these two principles using the term mot de passe and part of its verbal collocates (Figure 8).

The first verbal collocations listed are those in which the first actant of mot de passe is involved, namely the user. This can be seen in the explanations starting with L’utilisateur appearing in Figure 8 (lines 1 to 12): créer un mot de passe (create a password); oublier un mot de passe (forget a password). Then, collocations in which the keyword itself is the subject of the verb are listed (lines 13 and 14): un mot de passe expire, arrive à échéance (a password expires). Afterwards, collocations in which another actant is involved are listed. For example, in the mot de passe entry, a RESPONSABLE (RESPONSIBLE) role has been defined and this role is typically filled by the terms fournisseur (service provider) or ordinateur (computer). This actant is the subject of a number of verbal collocates which appear in Figure 8, namely those on lines 15 to 17: demander un mot de passe (ask for a password); accepter un mot de passe (accept a password). Finally, external participants (which do not appear in the actantial structure) may be involved in the meaning of collocates. They appear at the end of the list. In Figure 8, the last line contains a collocation in which an external participant (expressed by Qqn.; Somebody) is involved: Qqn. met fin au fonctionnement du m. (Someone stops the password from working).

mot de passe1, n. m.
Structure actancielle: un mot de passe: ~ remis par AGENT{utilisateur 1} à RESPONSABLE{fournisseur1; ordinateur 1} pour intervenir sur DESTINATION{compte 1; site 1}

1 | L’utilisateur crée un m. | créer1 un ~, choisir1 un ~, définir1, un ~
2 | -> NOM | création1 d’un ~, choix1 d’un ~, définition1, d’un ~
3 | L’utilisateur cause qu’il ait un m. | apprendre un ~, mémoriser un ~, se munir d’un ~, obtenir un ~
4 | -> NOM | obtention d’un ~
5 | L’utilisateur a un m. | avoir un ~, posséder un ~
6 | L’utilisateur modifie un m. | modifier son ~, changer son ~
7 | -> NOM | changement du ~, modification du ~
8 | L’utilisateur n’a pas un m. | oublier son ~, perdre son ~
9 | -> NOM | oubli du ~, perte du ~
10 | Un m. que l’utilisateur n’a plus | ~ oublié, ~ perdu
11 | L’utilisateur utilise un m. pour le remettre au fournisseur ou à l’ordinateur | s’authentifier1 avec ~
12 | L’utilisateur utilise un m. pour intervenir sur le compte ou le site | accéder2 à avec un ~
13 | Le m. intervient sur le compte ou le site | le ~ protège1a …
14 | Le m. cesse de fonctionner | le ~ expire, le ~ arrive à échéance
15 | Le fournisseur ou l’ordinateur se prépare à utiliser un m. | demander un ~
16 | Le fournisseur ou l’ordinateur commence à utiliser un m. | vérifier un ~
17 | Le fournisseur ou l’ordinateur commence à utiliser un m. | accepter1 un ~
18 | Qqn met fin au fonctionnement d’un m. | découvrir un ~, pirater1 un ~, voler un ~

Figure 8. Verbal collocations for mot de passe1 (password)
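The two ordering principles for verbal collocates (the main actant involved, then the real-world order of the activities) amount to sorting on a two-part key. The following Python sketch illustrates this with a few collocations from Figure 8. The group labels, the numeric `step` values, and the function name are assumptions introduced here, since the text does not describe how the order is stored in the database.

```python
# Order of the groups described in the text: first actant, keyword itself
# as subject, other actants, external participants. Labels are hypothetical.
GROUP_ORDER = {"first_actant": 0, "keyword": 1, "other_actant": 2, "external": 3}

# (collocation, group, step): `step` is an assumed rank for the real-world
# order of the activity within its group. The list is deliberately unsorted.
COLLOCATIONS = [
    ("accepter1 un ~", "other_actant", 3),
    ("oublier son ~", "first_actant", 4),
    ("le ~ expire", "keyword", 2),
    ("créer1 un ~", "first_actant", 1),
    ("pirater1 un ~", "external", 1),
    ("demander un ~", "other_actant", 1),
    ("modifier son ~", "first_actant", 3),
    ("vérifier un ~", "other_actant", 2),
]


def order_collocations(items):
    """Sort by actant group first, then by real-world order of the activity."""
    return sorted(items, key=lambda item: (GROUP_ORDER[item[1]], item[2]))


for collocation, group, step in order_collocations(COLLOCATIONS):
    print(f"{group:13} {step}  {collocation}")
```

With these assumed ranks, the sorted output follows the relative order of the same collocations in Figure 8.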

Once this first subdivision is made, verbal collocates are organised according to the order in which the activities are normally carried out. For example, looking at the collocations in which the user is involved, the creation activity (créer un mot de passe; create a password) will be carried out before the memorization (apprendre un mot de passe; learn a password), the loss of a password (perdre un mot de passe; lose a password), or the use of a password (accéder à avec un mot de passe; to access something with a password). Similarly, a service provider or a computer will first ask for a password (demander un mot de passe), before checking (vérifier un mot de passe), accepting (accepter un mot de passe) or refusing it (refuser un mot de passe). Another example appears in Section 4.3 where the verbal collocates of Internet are organised according to the different steps involved in the use of the Internet.

Figure 8 shows that nominalizations (e.g., modification; change) and adjectivations of verbs (e.g., oublié; forgotten) also appear in this part of the list of collocations. Derivatives of verbs are listed under the verb to which they are related semantically.

5. Concluding remarks

In this contribution, we presented a methodology for collecting, encoding and organizing collocations in a specialised lexical database devoted to the fields of computing and the Internet.
We based our methodology on a framework devised for general language (namely, ECL) which proves extremely useful in representing the many linguistic properties of collocations and on other concrete implementations in specialised dictionaries. We also adapted the ECL framework and other proposals to the specificities of the word combinations we needed to represent, namely, combinations comprising a term related to the field of computing or the Internet. Collocations represent a complex linguistic phenomenon and should be encoded in such a way as to account for their linguistic properties, i.e., their syntactic structure, their meaning, and the actantial structure of the keyword. The representation of these linguistic properties ensures that users will reproduce the collocations in proper contexts. However, the challenge is to define a representation system that is user-friendly enough to ensure that users be able to locate the collocation they are


looking for or find relevant information regarding a specific collocation. We based our explanations on a proposal made by Polguère (2003) for general-language lexicography, but we are fully aware that more work needs to be done in order to adapt our own explanations to what users of specialised dictionaries are expecting. Finally, our proposal does not yet deal with the problem of representing relationships between collocations across languages. This will be addressed in the near future.

Notes

1. Note that many records are trilingual in Termium Plus (English, French, and Spanish). In the term record reproduced here, collocates are given in all three languages.
2. In fact, the dictionary gives equivalents in other languages (English, German, Spanish, etc.), but detailed entries are given for French terms only.
3. The relationship between the actantial structure of the keyword of a collocation and the linguistic realizations of collocations and the method for representing them was first defined in Explanatory Combinatorial Lexicology (Mel’čuk et al. 1984–1999; 1995). We will come back to this further on.
4. The Web version of the DiCoInfo is encoded in HTML. This format is generated from an XML representation designed with the collaboration of Benoît Alain, Guy Lapalme and Vincent St-Amour. The XML formalism can also be used to generate a PDF “printed” version of the dictionary.
5. The ≈ symbol is used to represent all senses that are considered as neighbouring: near synonyms, hyperonyms, co-hyponyms, etc.
6. -> NOM (NOUN) is used to describe the relationship between the keyword and the nominalization of a verb. Since the verb itself appears above, we find it unnecessary to duplicate the explanation.
7. A similar definition is given in Hausmann (1979).
8. It could be argued that collocations are less unpredictable in specialised languages than in general language. It has been observed that some collocates combine with terms that belong to the same semantic class (L’Homme 2000, 2003, for example). Although this is true (for example, develop combines with interface, program, application, etc.), there is no rule that makes it possible to predict that develop will be chosen instead of create or generate.
9. ECL represents collocations using a formal system called lexical functions, LFs. Collocations in the DiCoInfo are also encoded using this system. However, in this contribution, we focus on the natural language explanation of collocations, an adaptation proposed by Polguère (2003) in order to make LFs more user-friendly for those who are not familiar with the formalism. “User-friendly” versions of lexical functions have been applied in a general language dictionary, the Lexique actif du français (Mel’čuk and Polguère 2007).
10. As was seen above, these properties are also taken into account in a specialised dictionary (cf. Section 2.4).

Acknowledgments

The work reported in this contribution was supported by the Social Sciences and Humanities Research Council of Canada (SSHRC) and the Fonds québécois de recherche sur la société et la culture (FQRSC). The author would like to thank Crystal Crow for her revision of the English.

References

A. Dictionaries

Benson, M., Benson, E. and Ilson, R. 1986. The BBI Combinatory Dictionary of English: A guide to word combinations. Amsterdam/Philadelphia: John Benjamins.
Binon, J., Verlinde, S., Van Dyck, J. and Bertels, A. 2000. Dictionnaire d’apprentissage du français des affaires. Dictionnaire de compréhension et de production de la langue des affaires. Paris: Didier.
Dictionnaire fondamental de l’informatique et de l’Internet (DiCoInfo). <http://olst.ling.umontreal.ca/dicoinfo/> (Accessed 15 February 2009).
Mel’čuk, I. et al. 1984–1999. Dictionnaire explicatif et combinatoire du français contemporain. Montréal: Presses de l’Université de Montréal.
Termium Plus® (Accessed 15 February 2009).

B. Other literature

Bae, H.S. and L’Homme, M.C. 2008. “Converting a monolingual lexical database into a multilingual specialised dictionary.” In Multilingualism and Applied Comparative Linguistics, Volume 2, F. Boers, J. Darquennes, K. Kerremans and R. Temmerman (eds.), 225–255. Cambridge: Cambridge Scholars Publishing.
Bergenholtz, H. and Tarp, S. (eds.). 1995. Manual of Specialised Lexicography. The preparation of specialised dictionaries. Amsterdam/Philadelphia: John Benjamins.
Cohen, B. 1986. Lexique de cooccurrents. Bourse–conjoncture économique. Montréal: Linguatech.
Drouin, P. 2003. “Term extraction using non-technical corpora as a point of leverage.” Terminology 9(1): 99–117.
Fontenelle, T. 1997. Turning a Bilingual Dictionary into a Lexical-Semantic Database. Tübingen: Max Niemeyer.
Hausmann, F.J. 1979. “Un dictionnaire des collocations est-il possible?” Travaux de linguistique et de littérature 17(1): 187–195.
Heid, U. and Freibott, G. 1991. “Collocations dans une base de données terminologique et lexicale.” Meta 36(1): 77–91.
L’Homme, M.C. 2000. “Understanding specialised lexical combinations.” Terminology 6(1): 89–110.
L’Homme, M.C. 2003. “Les combinaisons lexicales spécialisées (CLS). Description lexicographique et intégration aux banques de terminologie.” In Les collocations. Analyse et traitement, F. Grossmann and A. Tutin (eds.), 89–103. Amsterdam: De Werelt.
Mel’čuk, I., Clas, A. and Polguère, A. 1995. Introduction à la lexicologie explicative et combinatoire. Louvain-la-Neuve (Belgique): Duculot/Aupelf – UREF.
Mel’čuk, I. and Polguère, A. 2007. Lexique actif du français. L’apprentissage du vocabulaire fondé sur 20 000 dérivations sémantiques et collocations du français. Bruxelles: De Boeck.
Meynard, I. 2000. Internet. Répertoire bilingue de combinaisons lexicales spécialisées français-anglais. Montréal: Linguatech.
Polguère, A. 2003. “Collocations et fonctions lexicales: pour un modèle d’apprentissage.” In Les collocations. Analyse et traitement, F. Grossmann and A. Tutin (eds.), 117–133. Amsterdam: De Werelt.

Lexicographic description
An onomasiological approach on the basis of phraseology

Jón Hilmar Jónsson

The paper discusses the validity of using an onomasiological approach to a general dictionary description to a larger extent than has been standard in lexicographic work. This approach allows for the description to focus more clearly on the entire vocabulary and the internal relations between the lexical units than has been possible through a traditional semasiological description with its emphasis on an independent description of individual words (single-word lemmas). Simultaneously, different types of phrases can be treated more actively and as independent units. Icelandic wordnet is a dictionary description of this kind, where a comprehensive and wide-ranging description of Icelandic phrases, which has been published in three phraseological dictionaries, forms the basis and provides material for the description. On the one hand, the phrases appear as multi-word lemmas, and on the other they reveal the syntactic and semantic relations within the vocabulary.

Keywords: lexicographic description; phraseology; semantic relations; wordnet; lemma list structure; tagging; disambiguation

1. Introduction

In the same way as the corresponding words in the other Nordic languages, orðabók (a book of words), the Icelandic term for a dictionary, emphasises the main characteristic of dictionaries, i.e. to present descriptions of a given number of words. In this presentation, each word is described individually as a separate entity, and this description is in most cases without any active correlation to descriptions of other words in the dictionary. Counting the number of words described has been the most direct way to measure the validity of a dictionary; the more words included, the more comprehensive the description, and thus the more valid the dictionary. From the users’ point of view (at least as consumers), this way of

 Jón Hilmar Jónsson

measuring the validity of a work can be considered satisfactory, but anyone with experience of lexicographic work knows that the number of words in a dictionary does not say much about what its authors set out to do. Nor does a finished dictionary say anything about the problems of selecting the appropriate words while omitting others. Actually, it is interesting how rarely those problems are mentioned by authors and compilers when presenting their work and the prerequisites for the selection of lemmas are usually not discussed. The same applies to the selection of and, in particular, the limitation of various features that appear in the description of each lemma, on which principles phrases and word combinations are selected, the function of examples, etc. The structuring of most printed dictionaries, including the majority of the most comprehensive and highly respected ones for single languages, reflects the role of the word (i.e. the lemma/entry word in the lexicographic sense) as the basic element for the lexicographic description. This testifies to the fact that the lexicographer’s work is to a large extent concerned with illustrating the different meanings of individual lemmas and arranging them in the appropriate order. The phrases included in lemma articles are not presented in any collective way, instead they are subordinated to individual lemmas; exactly which lemma can be difficult for users to determine, and the phrases can also be hard to find in comprehensive articles. This results in phrases generally being given little space in dictionaries, so it can be presumed that meaningful set phrases are subject to much stricter demands for limitations and reductions than single words. 
When this is kept in mind, it has to be admitted that even the most elaborate and respected dictionaries have only in limited ways succeeded in giving a complete description of their vocabulary, and many of the data they contain do not reach users because they are not presented in the appropriate context. This applies for instance to some types of multiword lexical units and their internal relationship, as well as the internal relationship between the lemmas in general, especially their semantic relationship, which, usually, is left to specialised dictionaries, such as thesauri. Traditionally, however, the semantic relationship of set phrases gets even less attention, although many types of specialised phraseological dictionaries have been published. The situation described here is related to two main points of view which the lexicographer has had to choose between when organising and presenting the lexicographic description. On the one hand, the basic units of the description (the lemmas) appear as formal units which can have various meanings, on the other hand they can appear as separate lemmas to represent different meanings. This is the main distinction between a semasiological and an onomasiological lexicographic description. In reality, this distinction is not completely clear. In semasiological dictionaries one can often find examples of two or more homonymous lemmas,

Lexicographic description of phraseology 

especially when the etymology of the words is clearly different. Still, it is more common that a lexicographic description dealing with semantic relationships between words only uses one lemma for each word, thus uniting different submeanings which again are related to other words in their own way. This, for instance, applies to most traditional thesauri in which the lemmas also function as keywords to various set phrases which, semantically, can be unrelated to the lemma. The semasiological approach has, without any doubt, been the most prevalent for a long time, and its benefits as well as its limitations have heavily influenced the social attitudes towards dictionaries and their use. This is for example evident from the attention this approach is given in the most important lexicographic handbooks and readers, compared to the little attention usually given to the onomasiological approach (see e.g. Svensén 2004: 47–48; Béjoint 2000: 32–41). Although it is difficult to measure the influence of those two approaches, it is safe to say that semasiological lexicographic descriptions have much more often been used as bases for onomasiological dictionaries than the other way around, and that onomasiological dictionaries are often considered of little use as sources or material for semasiological dictionaries. An important factor here, undoubtedly, is that onomasiological lexicographic descriptions have only to a limited degree been supported by examples of use or other sources which often take up considerable space in dictionaries of the other type. It can be argued that this situation is in many ways unfortunate; that there is reason to give the onomasiological approach greater attention and a more active role in the order and presentation of a general lexicographic description, instead of limiting it to specialised types of dictionaries. 
With the arrival of electronic dictionaries, this has become clearer than before as it is now possible to create new description methods where those two main approaches intertwine and support each other.

2. A few main criteria for a general lexicographic description

For various reasons, lexicographic descriptions can now be presented in a more thorough, complete and continuous manner than was possible just a few decades ago. Similarly, a general lexicographic description can now span a wider area than before, and combine roles which have, until now, been left to different types of dictionaries according to a traditional dictionary classification. This widening of the field is especially important when taking into account that many dictionary users have only a vague idea about the limitations of individual dictionaries, and it is even less likely that users choose their dictionaries from accurate type definitions when it comes to electronic publications. Finally, it must be kept in mind that in small language communities with a limited market for dictionaries there is a


clear need for a versatile lexicographic description. This is particularly the case in Iceland and generally in all the Nordic language communities. Bearing those prerequisites in mind, it can be expected that a general lexicographic description manages to emphasise, trace and reflect the internal relationships that appear within the vocabulary, both semantically (synonyms, antonyms, semantic fields), syntactically (collocations, phrases, constructions) and morphologically (compounds and derivations). In this context, it is important that, as far as possible, the description should be based on parallel data, which to a certain extent have already been analysed and categorised, and which combined create an active basis for the dictionary. This description must show the characteristics and particularities of individual parts of speech and make it possible for users to use those parts of speech as active elements for sorting and classification. The decision of which type of lexical units should appear as lemmas and how they should be presented must also be based on clear main rules for lemmatisation. The dictionary description must be open and allow constant additions to the basis of the dictionary, and the lemma list must similarly be open to new words and phrases. The onomasiological point of view has to play a more important part in the selection of lemmas than before, and vocabulary that mostly appears in rhematic positions should be given more attention. In regard to the semantic properties of words and phrases, and the analysis of semantic relations, it must be transparent how to treat polysemy and homonymy. For one thing, it is important that clearly distinguished meanings of words (and phrases) be treated as independent lemmas in order for the semantic relations to appear clearly and correctly. 
In the separation of submeanings (and a possible evaluation of their internal importance) it must be possible to get support from the data forming the basis of the dictionary. For another thing, the analysis of semantic relations has to be conducted at different stages, from the closest units (synonyms, antonyms) to the more distant near-synonyms and wider conceptual fields. If the basis of the dictionary is comprehensive enough, this analysis can be expected to shed light on the internal position and importance of semantically related words, especially synonyms, in regard to usage activity. Thirdly, the semantic analysis must include set phrases (phraseological lemmas) as well as single words. Finally, it is quite realistic to formally co-ordinate the analysis of synonyms and antonyms among phrases, so that the definition of synonymous and antonymous relations is based on parallel formal properties to some degree. The position of phrases as lemmas in such a description will for instance be decided on the basis of the following criteria: 1. The lemmatisation of phrases is not only dependent on semantic independence and clarity (cf. set phrases) or frequency in the dictionary basis, but also on any internal semantic (and to some extent formal) relations to other lemmas.


2. The arguments behind the lemmatisation of phrases and the decision about their lemma form must in each case be supported by evidence from the dictionary basis. 3. Phrases can be semantically polysemous (bisemous), and if so must be treated as separate lemmas (just like polysemous single words). 4. Lemma forms of phrases must be standardised so it is possible to get an overview over single types and use formal analytical processes in the sorting and classification. This includes the distinction between constant and variable elements within the phrase. The combination of a semasiological and an onomasiological description can to a certain extent be achieved by using elements from a traditional semasiological description, especially common elements (such as hyperonyms) in definitions (see Trap-Jensen and Lorentzen 2006). However, the problem is that the prerequisites of the semasiological description continue to apply (shape and limit the topic), and the semantic context of the lemmas can only to a limited degree be traced on the basis of the elements found in such a description, for instance because the presentation of definitions and explanations is insufficiently standardised, so that the semantic classification achieved only includes a small part of the vocabulary. The general characteristics of semasiological dictionary descriptions also apply. First, it is the neutral characteristics of the lemma forms which can contain polysemy as well as combine varying and contrasting units described in the entry article. Secondly, the prerequisites for selection of lemmas in such a description have a limiting effect. Thirdly, there is real emphasis on the thematic vocabulary, and therefore, many words and submeanings are presented individually and in isolation, but there is less emphasis on the extensive and varied vocabulary that typically appears in a rhematic position. Finally, the inclusion of phrases is usually limited and their position unclear. 
In light of the above discussion, there are good reasons to approach this type of general dictionary description in a more independent way. The method described below is based on the idea of using dictionaries that when combined create a comprehensive description of phrases as well as the entire material they contain as the basis for such a description. This basis is then analysed further and categorised from the point of view of the respective needs and goals. In this way, phrases are treated much more independently than in previous general dictionary descriptions. The description of the phrases also provides the basic material for examining semantic relations between individual words. The lexicographic project in question, Íslenskt orðanet (Icelandic wordnet) is currently under construction by Þórdís Úlfarsdóttir and myself at The Árni Magnússon Institute of Icelandic Studies in Reykjavík.1 Our aim is to create an electronic


dictionary description of the Icelandic language with emphasis on semantic and syntactic relations within the vocabulary, both between individual words and phrases. Below, the main characteristics of this lexicographic project will be discussed. The value of independent phraseological descriptions in that context will be considered, and the phraseological dictionaries that the project is based upon will briefly be presented, along with the structuring and presentation of their material. The use of phraseological material to construct a complete lexical-semantic wordnet will be discussed, as well as the characteristics of the lemma list, how it is built up simultaneously with the processing of the material, and the specifics of individual parts of speech will be illustrated. The treatment of multiword lexical units as fully valid lemmas will briefly be considered along with their standardised presentation and the value of grammatical tagging of the lemma forms, before explaining how the polysemy of words and set phrases will be treated, and how this problem varies for different parts of speech. Then the methods and options for the semantic analysis and the semantic classification of the lemmas will be accounted for, partly by tracing together semantically analogous words and expressions in different ways on the basis of collocations and other types of phrases, and partly directly through the lexicographer’s semantic analysis. Finally, the validity of this type of lexicographic description will be discussed in brief, and it will be argued that it is more successful in presenting a complete picture of the vocabulary and its inner context than a traditional semasiological description, and is in fact an important basis for the analysis focused on here.

3. An independent phraseological description

A lexicographic description of set phrases can in one way or another be found in most dictionaries and can also be the main subject of certain types of dictionaries, focusing on various types of phrases from different perspectives. In most general dictionaries, such as typical definition dictionaries, the phraseological description usually receives limited attention. Collocations, not to mention free combinations, are usually allowed very little space in the dictionary text, and in the editing process, they are often subject to further limitations and reductions, losing out to elements such as examples of use and semantic explanations. Furthermore, it must be kept in mind that the selection of collocations and their inclusion in the dictionary text entirely depend on the prerequisites that apply for the type and contents of individual articles, and that the main function of each set phrase included is to show a typical use of the (single-word) lemma under which the phrase is placed (and which is usually defined as the essence of the phrase).


This means that a general lexicographic description does not present any collective picture of the collocation material contained in dictionaries, and even lexicographers may find it difficult to appreciate the extent and characteristics of their material. Set phrases are usually placed under single-word lemmas and thus belong to the microstructure of the dictionary. This means that the users’ access to the phrases is through an intermediary, as the decision where to place single phrases can be unclear (both in relation to the lemma list and within the individual article). In addition, the presentation of set phrases is only co-ordinated up to a point, and the formal variation of individual phrases is only visible to a limited degree. Moreover, because of their structural placing, the phrases are much less likely to be explained through examples of use than the single words and their different submeanings. Finally, it is evident that there is a strong tendency to limit the number of set phrases included, especially in articles that could potentially contain many of them. When working towards a lexicographic description such as that outlined in the previous section, it must be assumed that phraseology plays a more visible and active role in the description. Therefore, there is a clear need for a comprehensive and diverse collection of collocations which reflects various contexts of use. The role of such a collection of collocations is especially to bring to light the relevant semantic relations within individual parts of speech, and thus to act as glue between synonyms and semantically related units. This collection can also be used as a source to evaluate and compare the closeness of the semantic relations between phrases. However, it is a matter of opinion to what extent it should appear in the description of single lemmas in a finished dictionary, or whether it should rather be available in the background of the description. 
In the light of the previous reflections, there are compelling reasons to treat phrases as individual lemmas in the same way as single words. This means that the mapping of semantic relations within the vocabulary equally includes single words, set phrases and semantically meaningful constructions. By allowing an inclusion of standardised multi-word units in the lemma list, the lemma strings can be tagged grammatically, and this tagging can be used for sorting and classifying, organising, and, when needed, for various lexicological investigations in connection with the semantic analysis.

4. A lexicographic work on Icelandic phraseology

Between 1990 and 2005, this author worked on a lexicographic description of Icelandic phraseology, which was published in three dictionaries. In the first one, Orðastaður (lit. ‘a place of words’, referred to in the following as ‘the Phrase dictionary’), published in 1994, the main focus was on the syntactic context of


phrases, collocations and constructions. Here, the word formation of the lemmas is also considered, showing a semantically categorised collection of their compounds in a separate component in the article. In 2001, an enlarged and revised version of the Phrase dictionary was published (see Figure 1), and a year later, Orðaheimur (lit. ‘a world of words’, below referred to as ‘the Concept dictionary’) was brought out. There, Icelandic phraseology is described in a semantic context, and the description is divided in two: on the one hand, a lemma list containing 840 alphabetised concepts, where a categorised description of phrases is found under each concept (see Figure 2); on the other hand, a universal list of phrases, alphabetised on the basis of the main words of the phrases – 33,000 set phrases in total (see Figure 3). Here, the main emphasis is on set phrases, such as idioms. The material from those two dictionaries was then combined in a single dictionary, published in 2005 as Stóra orðabókin um íslenska málnotkun (‘the big dictionary on Icelandic language use’, below referred to as ‘the Combined dictionary’). There, the lemma lists are combined, but words and concepts are separated through the use of different fonts.

Figure 1. Excerpt from the Combined dictionary (2005)


Here, the complete list of the phrases in the Phrase dictionary is included, so the number of phrases in the Combined dictionary is approx. 80,000 in total. The third main part of the dictionary contains lists of the corresponding concepts in English, Danish and German. The printed version of the Combined dictionary also included a CD, which in many ways offers a more direct and flexible access to the material, for instance through various sorting alternatives.

Figure 2. Sorting alternatives in the Combined dictionary CD-ROM

During this same period, a comprehensive phraseological database was compiled, based on an electronic collection of citations from the archive of the Lexicographic Institute of the University of Iceland, using the Phrase dictionary as a model for the presentation. That database now contains more than 130,000 different set phrases, categorised under 50,000 lemmas with flexible search access (www.lexis.hi.is/osamb/osamb.pl).


When combined, those works present a comprehensive description of Icelandic phraseology, together with the description of word formation, originally published in the Phrase dictionary. The material collected shows the diverse characteristics of the Icelandic vocabulary, not only the syntactic relations within the set phrases themselves, but also the semantic relations between words, which can be found on the basis of set phrases and compounds and their classifications.

Figure 3. Semantic relations of compounds and their classifications

Through this work, an important foundation for a comprehensive electronic description of the Icelandic vocabulary has been developed. Semantic relations of both single words and set phrases are in the foreground, and a diverse phraseological description, in which the syntactic and morphological relations are analysed, is in the background.


5. Icelandic wordnet and its structure

On the basis of the above principles, an electronic lexical semantic wordnet is now under construction for the Icelandic language. The bulk of the material, more than 200,000 different set phrases, is taken from the Combined dictionary and the phraseological database, as well as the approx. 100,000 compounds from the word formation component of the Phrase dictionary, and examples of phrases found in other digital text collections. In addition, other types of appropriate material have been included, such as a collection of synonymous Icelandic equivalents from bilingual dictionaries, with Icelandic as the target language, in order to provide further insights into the semantic relations within the vocabulary.

5.1 The making and structure of the lemma list

The database of the wordnet includes a list of keywords, i.e. the main words of each phrase (mostly nouns, adjectives and verbs) as well as all compounds and the words (or word parts) they are constructed of and which make up the vocabulary of the Phrase dictionary. This list of keywords spans the entire database and functions as the first outline for a lemma list for the wordnet. Thus, each keyword in this preliminary lemma list is visibly related to other keywords, syntactically, semantically or on the basis of word-formation, and this relation is used as foundation for a further analysis of the lemmas within the wordnet. The database is constructed so that each lemma (keyword) includes, together with an alphabetical list of the phrases and compounds it is used with, an active list of all other lemmas (keywords) those phrases and compounds are connected to, sorted by word class and alphabetically within each word class (see Figure 4). A special icon points to the type of relation the keyword has to the lemma, and as the analysis of the material and the description of the lemmas escalates, new types of relations appear which are then distinguished by their own special icons within the list of keywords. The purpose of the analysis of phrases within individual lemma articles is twofold. Firstly, to group together set phrases by keywords that show synonymous relations and to connect antonyms within the keywords and secondly, to build up the lemma list itself, especially by adding the appropriate multi-word lemmas on the basis of the phrases. The latter primarily relates to the treatment of verbs and adverbs (and adverbial phrases).
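The keyword list described above can be pictured as a simple index. The following is only a minimal sketch under stated assumptions: the function names (`build_index`, `connected_lemmas`), the data layout and the three sample phrases are hypothetical illustrations and do not reproduce the actual wordnet database. Sorting uses plain Unicode order rather than Icelandic collation.

```python
from collections import defaultdict

# Hypothetical sample data: (phrase, [(keyword, word_class), ...]).
# The phrases are illustrative and not taken from the wordnet database.
PHRASES = [
    ("hlýr vindur", [("hlýr", "adjective"), ("vindur", "noun")]),
    ("svalur vindur", [("svalur", "adjective"), ("vindur", "noun")]),
    ("vindurinn blæs", [("vindur", "noun"), ("blása", "verb")]),
]


def build_index(phrases):
    """Map each keyword to its phrases and to the keywords it co-occurs with."""
    index = defaultdict(lambda: {"phrases": set(), "connected": set()})
    for phrase, keywords in phrases:
        for word, _word_class in keywords:
            entry = index[word]
            entry["phrases"].add(phrase)
            # Every other keyword in the same phrase is a connected lemma.
            entry["connected"].update(k for k in keywords if k[0] != word)
    return index


def connected_lemmas(index, lemma):
    """Connected keywords sorted by word class, then alphabetically
    (plain Unicode order, not Icelandic collation)."""
    return sorted(index[lemma]["connected"], key=lambda k: (k[1], k[0]))


index = build_index(PHRASES)
print(sorted(index["vindur"]["phrases"]))
print(connected_lemmas(index, "vindur"))
```

In this sketch, looking up vindur returns its phrases together with the adjectives and the verb it is connected to, which mirrors the "active list" of connected lemmas described for each keyword.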


Figure 4. Active list of phrases and the lemmas they are connected to from the Icelandic wordnet

In the case of nouns and adjectives, and their relations within collocations, each step in the categorising process, the purpose of which is to collect synonyms, is a landmark on the way to gather all the synonymous material that the entire collection of collocations contains. Nouns indicating ‘wind’ (which can have slightly different meanings, for instance based on strength or temperature) can for instance be connected on the basis of various adjectives, e.g. hlýr (warm) connects synonymous nouns for a (gentle) breeze such as andvari, blær and gola; hrákaldur (bitter cold) e.g. connecting gustur (gust of wind), stormur (storm) and súgur (draft of wind); kaldur (cold) e.g. connecting gjóstur and nepja, both meaning bitterly cold wind; and beljandi (screamingly loud) e.g. connecting stormur and rok (storm). In the same way, the nouns can show the semantic relation of adjectives, for instance the noun vindur (wind); on the one hand it connects the adjectives svalur (cool), napur (bitingly cold), hráslagalegur (cold and damp) and andkaldur (cool) to indicate cold wind, and on the other hand it connects heitur (hot) and hlýr (warm) to indicate warm wind. A further purpose of categorising the synonyms is to analyse the semantic variety of the lemma, especially adjectives. The situation regarding verbs and their relations to other word classes is not as simple. Under the noun vindur, for example, a substantial number of the phrases are found which are connected to verbs. Many of those are set phrases, including metaphoric idioms such as fá vind í seglin ‘get the wind in one’s sails’ (gather momentum),


aka seglum eftir vindi ‘drive sails according to wind’ (be opportunistic). Others can be defined as collocations, for instance vindurinn blæs (the wind blows), vindinn herðir (the wind is strengthened). The verbs fá (get) and aka (drive) in the idiomatic phrases above have no active relations within the vocabulary to the nouns in the phrases, and the connection to the verbs blása (blow) and herða (strengthen) in the last phrases is conditioned by the syntactic characteristics of the verbs. This means that if verbs and verbal lemmas are supposed to play an active role as keywords towards the word classes they function with within phrases, in the same way as nouns and adjectives, we must consider the syntactic characteristics for each case. Furthermore, it is clear that semantic differences among verbs cannot be distinguished without the full consideration of the formal characteristics. Verbs rarely stand out in the lemma lists of general dictionaries, although they can certainly be very visible in the dictionary text itself. If the description of verbs is to include their context of use, syntactic placing, etc., it is necessary to use the formal characteristics to distinguish and categorise, and, for dictionary users, as ways to search and sort the material. Here, there are a number of possible options, but generally speaking, the choices are whether, and if so, to what extent, the description should be expressed in the macrostructure of the dictionary, or whether it should rather be included in the internal layers of the text. A fundamental aspect of the lexicographic works that the Icelandic wordnet is based on is that the presentation of verbs and verbal phrases will show their syntactic characteristics, for instance that arguments will reflect case government and the characteristics of the subject. 
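The way adjectives connect the nouns for ‘wind’ in this section can be illustrated with a small sketch. It uses only the adjective and noun pairings quoted in the text; the function name `shared_collocates` and the data layout are assumptions made for illustration, and a shared adjective is treated merely as a hint of semantic closeness that the lexicographer would still need to confirm.

```python
from collections import defaultdict
from itertools import combinations

# Adjective -> nouns it collocates with, taken from the 'wind' examples in the text.
COLLOCATIONS = {
    "hlýr": ["andvari", "blær", "gola"],
    "hrákaldur": ["gustur", "stormur", "súgur"],
    "kaldur": ["gjóstur", "nepja"],
    "beljandi": ["stormur", "rok"],
}


def shared_collocates(collocations):
    """For each pair of nouns, collect the adjectives that occur with both."""
    links = defaultdict(set)
    for adjective, nouns in collocations.items():
        # Sorting makes each pair appear in one canonical order.
        for first, second in combinations(sorted(nouns), 2):
            links[(first, second)].add(adjective)
    return links


links = shared_collocates(COLLOCATIONS)
for (first, second), adjectives in sorted(links.items()):
    print(f"{first} ~ {second}: linked by {', '.join(sorted(adjectives))}")
```

The same function can be run in the opposite direction, with nouns as keys and adjectives as values, to group adjectives such as svalur and napur under vindur. Verbs would need their syntactic frame added to the key before they could be used in this way, as the text points out.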
A further step towards user-friendliness was taken in the Concept dictionary (and later in the Combined dictionary) when all the included phrases were compiled in a continuous alphabetised index, based on a fixed organising method. That way, users can easily search for individual phrases and be referred to the appropriate lemma(s) in the main part of the dictionary. The lemma forms in the index are still traditional, including the infinitive as the lemma form for verbs. To give an idea of the scale for the most frequently used verbs, in the Combined dictionary each of the verbs hafa (have) and koma (come) is found in approx. 1,500 phrases. Another important factor of this categorisation is that formally parallel phrases will largely be grouped together, making various characterising aspects of the use of each verb distinguishable. It can be said that in the Combined dictionary, a small step has been taken towards treating verbal phrases as perfectly valid and separate lemmas. In the CD version of the dictionary, users’ access to set phrases, including verbal phrases, is even more direct and more flexible since there are two options to access the material. On the one hand, it is possible to look up phrases found under a certain lemma in the main part of the dictionary, and organise them in various ways; firstly by the form of the phrase, secondly by other keywords connected to the phrase, and


thirdly by the word classes of the words used with the lemma in each phrase. On the other hand, it is possible to look up all set phrases where a given keyword is used, and similarly organise them in different ways: by the form of the phrase, by connecting lemmas from the main part of the dictionary (including concepts), and by the word class of the lemma. In the lexicographic context that applies to the wordnet, which is based on already analysed phraseological material whose description was intended for an electronic presentation, there is reason to go even further and consider fully how verbs, their position and use are dependent on different types of sentences and constructions, and treat them as separate lemmas. Only through that kind of separation can verbs play a comparable role to nouns and adjectives by uncovering the synonymous relations between the other words in the set phrases. This also underlines how set phrases and verbal constructions are fully valid, meaningful lemmas that should be regarded as equal to single-word lemmas. The phraseological description in the Combined dictionary, with its fixed presentation, is a precondition for a possible and realistic treatment of verbs in this way. The presentation of the lemmas in the wordnet is also similar to the one in the Combined dictionary, as will be explained below. This specific treatment of verbs and verbal constructions results in verbs taking up much more space than other word classes compared to a description only consisting of single-word lemmas. How to categorise verbal phrase lemmas in exact word classes depends on their formal presentation, but still there are tens of thousands of verbal lemmas in the wordnet, and if the widest definition is used, there would be close to 70,000. In comparison, there are 125,000 noun lemmas. 
Although the change from traditional single-word lemmas to multi-word ones is mostly noticeable among the verbs, it is also an interesting option in the description of adjectives, as will briefly be discussed below. Moreover, the inclusion of multi-word lemmas in the wordnet is especially characteristic for adverbial phrases, but this will not be discussed in this article.

5.2 Verbal lemmas – presentation and grammatical tagging

As already mentioned, the presentation of verbal lemmas follows the same main pattern as that for the set phrases in the Combined dictionary and the phraseological database. This presentation will not be described in detail, but some of the most important aspects are the separation of constant and variable elements within the lemma strings, as well as the interchangeable presentation on the basis of the characteristics of the subject (and to some extent, the characteristics of the object), where the main division is between verbs that require an animate subject and those that take an inanimate subject.


By including verbal phrases and other multi-word phrases in the lemma list, the lexicographic description contains a far larger number of meaningful units than a single-word lemma list could. As a result, the word class is no longer the only grammatical sorting feature; the formal and syntactic properties of multiword lemmas can actively be used in the sorting and classification of the lemmas.

Figure 5. From the database of the Icelandic wordnet. The tag strings combine syntactically identical constructions and phrases
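Grouping lemmas by their tag strings can be sketched as follows. The tag notation here (V = verb, N = noun, P = preposition) is invented purely for illustration and is much coarser than a real grammatical tag set would be; the wordnet's actual tags are not reproduced in the text, so both the tags and the function name `group_by_tag` are assumptions. The lemma strings are the verbal phrases with vindur quoted earlier in the chapter.

```python
from collections import defaultdict

# Phrases quoted in the text, paired with HYPOTHETICAL coarse tag strings.
# The real tag set of the wordnet is not given in the text.
TAGGED_LEMMAS = [
    ("fá vind í seglin", "V N P N"),
    ("aka seglum eftir vindi", "V N P N"),
    ("vindurinn blæs", "N V"),
    ("vindinn herðir", "N V"),
]


def group_by_tag(tagged_lemmas):
    """Collect lemma strings that share the same tag string."""
    groups = defaultdict(list)
    for lemma, tag in tagged_lemmas:
        groups[tag].append(lemma)
    return dict(groups)


groups = group_by_tag(TAGGED_LEMMAS)
for tag, lemmas in groups.items():
    print(f"{tag}: {lemmas}")
```

Because the grouping key is independent of the words themselves, the same lookup serves both purposes: inspecting all constructions of a single verb and surveying one syntactic type across the whole lemma file.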


In order to present those elements as clearly and usefully as possible, multi-word lemmas are tagged grammatically in the wordnet. Samples of grammatically parallel verbal lemmas found with this tagging can be seen in Figure 5. Those samples also shed light on important aspects within the presentation of the lemmas. It is evident that such a classification and organisation of phrases, independent of specific words or strings of words, are applicable both to the analysis of the characteristics of individual verbs and to an analytical approach towards the entire lemma file. The tagging of the lemma strings increases the importance of multi-word lemmas and plays a practical role for the semantic analysis of the lemmas, since formally parallel lemmas often have common semantic characteristics. Last but not least, the tagging facilitates a study of certain semantic characteristics and the semantic categories of the lemmas in light of their formal characteristics, and in the same way, it is possible to analyse the prevalent semantic characteristics of individual syntactic types.

5.3 Disambiguation of polysemous words and phrases

The treatment of verbs described above, where the syntactic environment of the verb and the main arguments appear in the lemma, implies that individual lemmas in the wordnet are in almost all cases semantically monosemous. Such an organisation is fundamentally different from the reigning tradition in general lexicographic descriptions where the lemmas can easily be polysemous, and the object is to distinguish between the submeanings and organise them in the appropriate internal order. This does not only apply to verbs. The need for a disambiguation of the lemmas must also apply to other word classes, as unfortunate polysemy can hinder a natural connection between semantically related lemmas and thus damage the complete semantic classification of the vocabulary (see also Jónsson 2008). The problem of analysing synonyms and antonyms, and other semantic relations between adjectives, stems from the difficulty of establishing clear boundaries between their submeanings, and how much the meaning of an adjective is influenced by the noun it refers to. In a large collection of collocations, such as the one used as basis for the wordnet, the adjectives will, as can be expected, show wider semantic variations than are usually reflected by a traditional lexicographic description. In the wordnet, the semantic analysis is first and foremost intended to give a realistic and confirmed description of the internal semantic relations between the adjectives, and the semantic disambiguation needed serves that purpose. At least to some extent, this can be achieved in a similar way as for the verbs, i.e. with multi-word (especially bipartite) lemmas where the adjective is followed by an illustrative noun for the respective meaning: bitur [frost] (bitter frost), bitur [háð] (bitter irony), bitur [hnífur] (bitter knife).

Lexicographic description of phraseology 

As for the adjectives themselves, it is natural that such a distinction should mostly be limited to the clearest submeanings. However, composite lemma forms with disambiguating nouns can also be a relevant alternative in order to fix the meaning to the appropriate noun, even though the adjective is monosemous, partly to clarify its meaning and partly to create a common denominator for the adjective lemmas that are used with the noun(s) in question. In that way, the noun in the lemma bitur [hnífur] is related to other composite lemmas with the same noun, and at the same time provides an overview of semantically related adjectives: beittur [hnífur], egghvass [hnífur], hvass [hnífur], skarpur [hnífur] (sharp knife); bitlaus [hnífur], egglaus [hnífur] (blunt knife). Under those lemmas, a diverse collection of nouns is combined with the noun hnífur as the common denominator, thus forming a basis for a semantic analysis of the words. It is then possible to add information from other sources to this basis, for instance verbal lemmas where the noun hnífur appears as an argument: brýna , hvessa , hvetja , skerpa (sharpen the knife/edge/axe/saw), bítur , bítur (the knife cuts well/poorly). Using composite adjective lemmas is also an active way to bring together vocabulary referring to important and prominent nouns, regardless of whether the adjectives are monosemous or polysemous. In this way, lemmas with the noun [vindur] can unite adjectives that describe different aspects of the noun (and its synonyms); both words that are found in semantically different lemmas: hvass [vindur] (blowing wind), hvass [hnífur] (sharp knife), hvass [gagnrýni] (harsh criticism), hvass [augnaráð] (sharp look); and words that are clearly monosemous: andkaldur [vindur], byljóttur [vindur] (cool, stormy). 
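The grouping described here can be sketched in a few lines of Python. The sketch is only an illustration and not a description of how the wordnet is actually implemented: the list `LEMMAS` simply repeats the bipartite lemmas quoted above, and the helper names `parse` and `index` are assumptions introduced for the example.

```python
from collections import defaultdict

# Bipartite lemmas quoted in the text, written as "adjective [noun]".
LEMMAS = [
    "bitur [frost]", "bitur [háð]", "bitur [hnífur]",
    "beittur [hnífur]", "egghvass [hnífur]", "hvass [hnífur]",
    "skarpur [hnífur]", "bitlaus [hnífur]", "egglaus [hnífur]",
    "hvass [vindur]", "hvass [gagnrýni]", "hvass [augnaráð]",
    "andkaldur [vindur]", "byljóttur [vindur]",
]


def parse(lemma):
    """Split 'adjective [noun]' into the pair (adjective, noun)."""
    adjective, noun = lemma.split(" [")
    return adjective, noun.rstrip("]")


def index(lemmas):
    """Build two indexes: adjectives per noun and nouns per adjective."""
    by_noun = defaultdict(list)
    by_adjective = defaultdict(list)
    for lemma in lemmas:
        adjective, noun = parse(lemma)
        by_noun[noun].append(adjective)
        by_adjective[adjective].append(noun)
    return by_noun, by_adjective


by_noun, by_adjective = index(LEMMAS)

# The noun as common denominator: all adjectives recorded with hnífur.
print(by_noun["hnífur"])

# Adjectives recorded with more than one noun.
print({a: n for a, n in by_adjective.items() if len(n) > 1})
```

With this input, the first index brings together the seven adjectives listed with hnífur, and the second singles out bitur and hvass as the adjectives that occur with more than one noun and are therefore candidates for disambiguation.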
Polysemy among nouns is generally clearer than among adjectives, and it is also usually treated more clearly in a general semasiologic lexicographic description, partly by distinguishing between homonyms within the lemma list, but mostly by differentiating between separate submeanings in the articles with some sort of a lemma code. In the wordnet there is a clear need for a wide-ranging disambiguation of polysemous single-word noun lemmas, to avoid the confusion between different meanings which would lead to incorrect semantic connections. The distinction can be achieved with the use of a lemma code together with a label that identifies the actual meaning.

5.4 The semantic analysis of the lemmas

As indicated above, the main task in constructing the wordnet is to analyse semantic relations between single-word and multi-word lemmas on the basis of their morphological and syntactic properties. Given the comprehensiveness and many-sidedness of the material, it is clear that this analysis is

 Jón Hilmar Jónsson

time-consuming, and that the work will have to be done in stages, which for example involve scanning the lemma file with a view to investigating the relations of individual word classes within set phrases, for instance noun lemmas with regard to the adjectives they combine with and their internal semantic relations, or the verbal lemmas in order to categorise the nouns they bring together. Some stages of the work are well ahead while others have taken a longer time, so it is impossible to give a complete description of the analytical process, and only limited aspects of its results have been uncovered. Therefore, only a few main points will be explained, as well as the main ideas that have affected the methodology of the semantic analysis. First, it should be mentioned that the analysis is mainly two-sided. On the one hand it is based on a lexicographic evaluation which, for example, results in the categorisation of synonyms under single lemmas and a semantic labelling of the lemmas. On the other hand, a great part of the semantic relations are mechanically traceable, and the results of such a trace will have an independent value, for the project as well as for the dictionary users, and support the lexicographic evaluation. Regarding the prospective users of the wordnet, the aim is to give different options when investigating the semantic context of individual lemmas, e.g. to find synonyms and antonyms or get an overview of continuous semantic fields (by widening or narrowing the field), and to let a mechanical semantic trace give an even more complex picture of semantically correlated words and phrases. Furthermore, a mechanical trace of the relations between semantically related words will give a more complex picture, including statistical information on their internal position, e.g. which lemmas within a group of synonyms are more actively related than others.

5.4.1 Synonyms and semantic relations in the light of phraseology

The synonymous and antonymous relations are the clearest semantic relations when the lexicographic material is observed, and the most accessible analytical method in the first stages of the lexicographic work is to connect and categorise the vocabulary on that basis. In the assessment of synonyms, there are naturally many instances of doubt and uncertainty, and it can be difficult to decide whether certain words or lemmas should be regarded as synonymous or whether there is a looser semantic relation. In the cases of multi-word verbal phrases, the form must also be considered, and a certain syntactic consistency must be required between lemmas that are considered synonymous. The distinction of multi-word lemmas is the first and, to an extent, the most comprehensive approach in order to analyse synonymous relations, both within the verbs themselves and the nouns they bring together in set phrases. The verb yrkja (compose), for instance, is used in 20 multi-word lemmas with changeable arguments, especially objects and prepositional phrases. Most of those phrases are connected to the lemma yrkja (compose a verse/poem), and under that lemma, they can further be categorised into groups of synonymous phrases, such as those listed in Example 1.

a. yrkja kvæði (poem)
   yrkja ljóð
   yrkja brag

b. yrkja erfikvæði (funeral poem)
   yrkja erfiljóð

c. yrkja vísu (stanza)
   yrkja stöku
   yrkja bögu

d. yrkja ástakvæði (love poem)
   yrkja mansöng

Example 1: Synonymous phrases under the lemma yrkja

In the case of noun lemmas, the noun kvæði is connected to more than 30 (multi-word) verbal lemmas, and under the lemma vísa, there are connections to a similar number of verbal lemmas. Here, there are also cases of synonymous relations, for instance those in Example 2.

a. flytja fram (perform the poem)
   mæla af munni fram
   segja fram

b. bagla saman (patch up a poem, compose with difficulty)
   berja saman
   böggla saman
   hnoða saman
   klambra saman

Example 2: Synonymous relations

Under the lemma kvæði, connections to over 70 adjectives appear, among which, for example, the synonymous phrases reproduced in Example 3 can be found.

a. ljóðrænt kvæði (lyrical poem)
   lýrískt kvæði

b. íburðarmikið kvæði (grandiose poem)
   skrúðmikið kvæði

c. myndrænt kvæði (visual poem)
   sjónrænt kvæði

Example 3: Synonymous phrases with kvæði


Only synonymous connections under single lemmas have been mentioned above. Those can also be achieved, and in some cases additions can be found by comparing the complete collection of formally parallel phrases under semantically related lemmas, for instance by combining all phrases of the type [adjective + noun] found with the 40 nouns connected to the verbal lemma yrkja (write ) in order to find synonyms among them. For individual words and their inclusion in groups of synonyms compiled this way, it is obvious that the most active words can connect a great number of differently yet semantically related words. This, for instance, applies to central nouns like bátur (boat) and skip (ship) found in various groups of synonyms which have been created by categorising phrases with verbs as well as adjectives and which, in addition, contain a number of semantically related nouns. In these circumstances it is most appropriate to use a mechanical trace to establish through statistical information the nature of the relation between the synonyms, the frequency of individual words, how often single pairs of synonyms are found together, etc. However, the mechanical trace does not always have to be based on the lexicographic categorisation and evaluation. It can also be used more freely as a basic analytical tool whose results can be used as the basis for a further semantic categorisation by the lexicographer. A free electronic analysis of the relations between words in set phrases can be appropriate in various circumstances, and it will be offered to some degree as an option for users of the wordnet. An example of such an operation that often shows interesting results is a combination of co-ordinate words, especially co-ordinate nouns and adjectives. The word fátækt (poverty) is a central word in the collection of those nouns referring to a bad financial situation and is more general and neutral than those nouns which could be considered synonymous (cf. Sigmundsson 1985). 
Phrases where this word is found reveal rather few characteristic adjectives, and there are also few prevalent verbal phrases. However, there are many examples of co-ordinate nouns, over 50 in total, e.g. fátækt og bjargarleysi (poverty and helplessness), fátækt og erfiðleikar (poverty and difficulties), hungur og fátækt (hunger and poverty). The words used here only to a limited extent show any semantic relation to the word fátækt; on the other hand, they reflect and reveal the various connotations of the word, and in some ways, they are semantically parallel internally: fátækt og veikindi (poverty and illness), fátækt og heilsuleysi (poverty and bad health); fátækt og aumingjaskapur (poverty and misery), fátækt og vesalmennska (poverty and wretchedness). Therefore, it is interesting to trace the relations that appear in other word-pairs where these words are used in order to give a


comprehensive picture of a broad semantic field, even with further information on the position of individual words.

5.4.2 Semantic categorisation and concepts

Although mechanical actions can be helpful in connecting semantically parallel words and phrases, there is still a need for a lexicographic evaluation of the semantic characteristics of the lemmas, as well as the lexicographer’s hands-on control over their semantic categorisation. This applies specifically to the semantic context of the lemmas which stretch further than the synonymous relations and formal characteristics, which cannot be defined without referring to conceptual characteristics. In the semantic description of Icelandic set phrases in the Combined dictionary (and before that in the Concept dictionary) the lemma list consists of 840 concepts in alphabetical order. In addition, each of these concepts also includes referrals to other semantically related concepts. As mentioned above, the dictionary includes lists of the parallel concept headings in three foreign languages including English. Naturally, a transferral to English gives an incomplete picture of the semantic relations, but if we look at the English concepts, the concept Anger (Icelandic ‘Reiði’) refers for instance to the concepts Vehemence, Rage, Excitement, Sensitivity, Excitability, Insanity, Irritability, Scolding, Quarrel, Surliness, Callousness, Rudeness, Insult and Discontent, and the concept Friendship (Icelandic ‘Vinátta’) refers to other concepts such as Friendliness, Goodwill, Sincerity, Fondness, Affection, Love, Empathy, Solidarity, Loyalty and Hostility. Under each concept, the set phrases are categorised more thoroughly with semantic comments and notes on each chapter. Although the comprehensiveness of the concepts varies and they include different numbers of phrases, there is no hierarchy among them, and there is a considerable overlap between concepts in the way that individual phrases can be connected to more than one concept. 
This arrangement suits the topic of these dictionaries particularly well, and an alphabetised lemma list including single-word concepts fits well with the traditional lemma organisation in semasiologic phraseological descriptions, as can be seen when the descriptions of the Phrase dictionary and the Concept dictionary have been combined. A good example of this is how a concept in many cases directly follows a homonymous lemma in the lemma file, where collocations and other set phrases with the lemma itself are described, for instance the lemma vinátta (friendship) is followed by the concept Vinátta, and the lemma reiði (anger) precedes the concept Reiði. In this way, the traditional description intertwines with the conceptual description in the dictionary text, and this relation becomes even stronger and more visible in the phrase index of the Combined dictionary where single keywords combine set phrases which can either be found under word lemmas or concepts.
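The system of referrals between concepts can be pictured as a simple graph in which a conceptual field is widened by following referrals one step at a time. The Python sketch below is an illustration under stated assumptions, not the structure of the Combined dictionary itself: `REFERS_TO` contains only the English referrals quoted above for Anger and Friendship (which are given there as examples and need not be complete), and the function name `field` and its `radius` parameter are hypothetical.

```python
from collections import deque

# Referrals between concepts, limited to the two examples quoted in the text.
REFERS_TO = {
    "Anger": [
        "Vehemence", "Rage", "Excitement", "Sensitivity", "Excitability",
        "Insanity", "Irritability", "Scolding", "Quarrel", "Surliness",
        "Callousness", "Rudeness", "Insult", "Discontent",
    ],
    "Friendship": [
        "Friendliness", "Goodwill", "Sincerity", "Fondness", "Affection",
        "Love", "Empathy", "Solidarity", "Loyalty", "Hostility",
    ],
}


def field(concept, radius, refers_to=REFERS_TO):
    """Return the concepts reachable from `concept` in at most `radius`
    referral steps, mapped to their distance from the starting concept."""
    distance = {concept: 0}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        if distance[current] == radius:
            continue  # do not widen the field beyond the requested radius
        for neighbour in refers_to.get(current, []):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance


narrow = field("Anger", 0)   # the concept on its own
wide = field("Anger", 1)     # the concept plus its direct referrals
print(len(narrow), len(wide))
```

Since there is no hierarchy among the concepts, a breadth-first trace of this kind treats all referrals alike; a larger radius only adds concepts once the referrals of further concepts are entered in `REFERS_TO`.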


In the wordnet, it is an obvious choice to use a similar categorising method for concepts as in the Concept dictionary and the Combined dictionary, although it will not be limited to the concepts used there. The choice of concepts as they appear in the Combined dictionary is, of course, arbitrary and unknown to the user in advance. Therefore, the main way of searching the material will most likely be through the set phrases themselves in their tangible form via the respective keywords in the phrase index where the user is referred and directed to the correct phrase on the basis of the sorting rule. In the wordnet, the position of multi-word phrases in the lexicographic description is fundamentally different, and it is therefore not given that the material can be accessed in the same way as in the Combined dictionary. For one thing, the phrases are treated more individually than before, as independent (multi-word) lemmas, which to a large extent are the abstract forms of the phrases with a certain lexical diversity. By doing this, the active relation to individual words within the phrases is severed, especially to verbs in their basic form. For another, the whole presentation and organisation of the dictionary text is determined by the fact that the description will be used electronically. Instead of the support from the clearly organised phrase index of the Combined dictionary, the wordnet offers users the possibility of deciding for themselves the search conditions by entering the appropriate string or row of strings within the lemma. In addition, they can use the grammatical tagging of the lemma strings to qualify the choice of lemmas. This allows them to make connections from individual lemmas to semantically related lemmas without using and expressing visible concepts. 
The alternative is to use the concepts basically in the background as invisible common denominators for semantically related lemmas which can be conjured and combined under individual lemmas and, when needed, combined with other search conditions. Thus, the overlap of such background concepts becomes a key to widening and narrowing the conceptual field.

6. Onomasiological emphasis in general lexicographic description

Above, it has broadly been explained how a lexicographic description can be built up gradually on the basis of a comprehensive description of a language’s phraseology combined with a semantically classified list of compound words. From the beginning, such a lexicographic description contains a diverse analysis and categorisation of the vocabulary where the relations of the lexical units have been exposed, especially the syntactic and morphological relations, but to some extent also the semantic relations. This large collection of collocations and compounds lays the basis for a more comprehensive analysis of the semantic relations within the whole vocabulary,


both concerning close relations (synonymous and antonymous) and more wide-ranging semantic fields. The more fixed phrases, which to a large extent have already been semantically analysed in the phraseological lexicographic description where they are presented in a standardised way, are placed in the lemma list as multi-word lemmas, which in addition to the semantic analysis are given formal analytical elements through the grammatical tagging of the lemma strings. The vocabulary added to the wordnet from the phraseological description is enormous. This entire vocabulary plays an active role in the structure of the wordnet and is not limited in the usual way applying to the selection of lemmas in general semasiological dictionaries. A lexicographic description focusing on the semantic relations of the vocabulary directly stimulates the collection of as large and as comprehensive a vocabulary as possible. When such a description also includes set phrases, it reveals clearly the activity of individual words, which gives vital information on internal semantic relations of synonyms and other closely related words (i.e. which words are more actively used than others). Because of the nature of the lemma list of the wordnet, it is easily open to new words and phrases being added with the appropriate connections to the previous vocabulary. One of the most important characteristics of the wordnet, compared to a semasiological description, is the extent to which the analytical elements of the lemmas appear in the lemma forms themselves (in the form of multi-word lemmas). Thus many of the elements that usually belong to the microstructure of the dictionary have been moved to the macrostructure. 
This does not only lead to a much larger lemma file, it also provides a desirable division between the analysis of the lemma file which is the essence of the wordnet as an independent lexicographic description and the material which that description is based on, and which originally belongs to a different type of lexicographic description. In the beginning of this article, I described a set of main criteria a general lexicographic description must fulfil. The approach described here meets, in my opinion, those criteria better than has so far been possible in a traditional semasiological dictionary description. An important factor here is that the description does not treat individual words as isolated topics but focuses on the entire vocabulary. Instead of only analysing and defining the semantic characteristics of single words, the submeanings are here separated from the start and each single lemma is treated as monosemous. This work then results in a comprehensive lexicographic description allowing, for instance, new and more reliable foundations for the explanations and definitions of word meaning.


Note: 1. The project is supported by Rannis, The Icelandic Centre for Research.

References

A. Dictionaries

[Orðastaður] Jónsson, J.H. 2001. Orðastaður. Orðabók um íslenska málnotkun. Second edition. Reykjavík: JPV útgáfa.
[Orðaheimur] Jónsson, J.H. 2002. Orðaheimur. Íslensk hugtakaorðabók með orða- og orðasambandaskrá. Reykjavík: JPV útgáfa.
[Stóra orðabókin um íslenska málnotkun] Jónsson, J.H. 2005. Stóra orðabókin um íslenska málnotkun. Reykjavík: JPV útgáfa.
Orðasambandaskrá Orðabókar Háskólans. (Accessed 10 October 2008).
Sigmundsson, S. (ed.). 1985. Íslensk samheitaorðabók. Reykjavík: University of Iceland.

B. Other literature

Béjoint, H. 2000. Modern Lexicography. An Introduction. Oxford: Oxford University Press.
Jónsson, J.H. 2008. “Entydiggjøring – en grunnleggende operasjon i en begrepsorientert ordbok.” In Nog ordat? Festskrift till Sven-Göran Malmgren den 25 april 2008, K. Jóhannesson, H. Landqvist, A. Lundqvist, L. Rogström, E. Sköldberg, and B. Wallgren Hemlin (eds.), 192–200. Göteborg: Meijerbergs institut för svensk etymologisk forskning.
Lorentzen, H. and Trap-Jensen, L. 2006. “ordnet.dk – et nyt sprogligt opslagsværk på internettet.” In Nordiske Studier i Leksikografi 9, H. Lorentzen and L. Trap-Jensen (eds.), 253–264. Copenhagen: Nordisk forening for leksikografi.
Svensén, B. 2004. Handbok i lexikografi. Ordböcker och ordboksarbete i teori och praktik. Stockholm: Norstedts Akademiska Förlag.

Item-specific syntagmatic relations in dictionaries

Thomas Herbst

Starting off with the verb patterns of the Oxford Advanced Learner’s Dictionary (1948), information on complementation and valency patterns has been a central issue in all kinds of production-oriented dictionaries. This article will focus on different ways in which information of this kind has been represented in dictionaries for foreign learners of English and German, including valency dictionaries. Furthermore, the question of how larger units of meaning (collocations and item-specific constructions) can be included in monolingual dictionaries will be discussed at some length, taking into account more recent corpus-based dictionaries. In both cases, the question of accessibility of information for the users is a central one.

Keywords: valency; grammatical relations; coding systems; learner’s dictionaries; lexical elements; collocations

1. Dictionaries and words

Dr. Johnson (1755) defines dictionary as “A book containing the words of any language in alphabetical order, with explanations of their meaning; a lexicon; a vocabulary; a word-book.” While the term word-book is not common in present-day English, combinations such as ordbog, Wörterbuch, woordenboek exist in other Germanic languages. Dr. Johnson’s definition shows, like that of the New Shorter Oxford English Dictionary (1993) – “1 a A book explaining or translating, usu. in alphabetical order, words of a language or languages, giving their pronunciation, spelling, meaning, part of speech, and etymology, or one or some of these” – that the use of the term dictionary does not necessarily entail a different concept from that of a ‘word book’. Interestingly, the definition of dictionnaire to be found in Le Nouveau Petit Robert (1993/1996) describes a wider concept: “Recueil d’unités signifiantes de la langue (mots, termes, éléments...) rangées dans un ordre convenu, qui donne des définitions, des informations sur les signes” [A collection of meaningful units of the language (words, terms, elements...) arranged in an agreed order, which gives definitions and information about the signs] in that it is not restricted to words but allows for larger units of meaning. This definition would seem a more appropriate description of the meaning of the English word dictionary as well, since dictionaries are about more than individual words and their meanings, contrary to what the definition provided by Dr. Johnson might suggest.1 This is, of course, true in a number of different ways. Firstly, there are dictionaries that describe units of meaning or lexemes that comprise more than one word, which is of course revealed in their lemma structure. Examples of this type are dictionaries of idioms and idiomatic expressions such as Henning Bergenholtz’s Ordbogen over Faste Vendinger. Secondly, however, there are many dictionaries which are lemmatized in terms of individual words but which aim at providing information about how words combine with other words and as such are much more extensive than is indicated in the SOED definition. This applies to all learner’s dictionaries that aim to be production dictionaries but also to collocation dictionaries or valency dictionaries. In fact, as is obvious from such concepts as Leech’s (1981) collocative meaning or Sinclair’s (2004) insights into the relationship between syntax and lexis, the distinction between multi-word units and single lexemes is by no means easy to make, which is also reflected in the design and character of dictionaries.

The syntagmatic relations a word enters are no doubt an important part of its lexicographical description. They can be identified and described at different levels of abstraction: the co-occurrence of classes of words (article + noun), the co-occurrence of particular lexical items (guilty conscience), but also the co-occurrence of a particular lexical item with a particular class of words (avoid + V-ing-clause).2 In dictionaries, syntagmatic information thus has to be given which refers to different linguistic levels of abstraction and also takes different forms. 
This article will focus on the lexicographical types of information and lexicographical devices used in English learner’s dictionaries to express item-specific syntagmatic relations, i.e. those that generally fall under the scope of valency and collocation.3

2. Valency and other grammatical relations

2.1 Information conveyed by word class labels

One type of information concerning syntagmatic relations is presented by word class labels since they provide information not only about morphological but also about distributional properties of words, which, as Bergenholtz and Schaeder point out, makes them relevant for production purposes.4 They argue: “... man muß dabei von solchen Angaben verlangen können, daß sie systematisch angebracht und einem leicht nachvollziehbaren Wortartensystem entnommen werden” [... one must be able to expect of such information that it is applied systematically and taken from an easily comprehensible system of word classes] (1977: 43). In modern English learner’s dictionaries, this undoubtedly is the case for most of the items included, especially those that can be attributed to such classes as nouns, verbs or adjectives. Other words are more problematic, however: for the word both, for example, a wide range of labels can be found, as is illustrated by Table 1.

Table 1. Treatment of both in five learner’s dictionaries
Dictionaries compared: OALD2, OALD6, COBUILD4, LDOCE4, MEDAL2
Labels recorded (the assignment of labels to dictionaries is not recoverable from the extracted layout): det; Det, predet, Quant, Pron; determiner, predeterminer; determiner, predeterminer; Pron; pronoun; Conj; conjunction; no label; Adj; both... and; Pron, Adv, Adv; pron

This list shows that the demands made on the users are quite high. The fact that there is no generally accepted system of word classes for a language such as English means that the importance to be attributed to a thorough explanation of the terms within the dictionary is very high. While LDOCE4 only defines the term predeterminer in the actual entry of predeterminer (and does not contain any explicit section on its metalanguage), the MEDAL2 entry of both contains a very explicit account of what is meant by the word class labels used.

Both can be used in the following ways:
1. as a determiner (followed by a noun, but not by a pronoun): Both children are at school.
2. as a predeterminer (followed by a word such as ‘the’, ‘this’, ‘his’, etc): I like both these pictures. • Both her children are boys.
3. as a pronoun: Both arrived at the same time.
   (followed by ‘of’): Both of them are learning English.
   (after a noun or pronoun subject): The twins both have black hair.
   (following a pronoun object): I like them both.
   (after a modal or auxiliary verb, or after the verb ‘to be’): We can both speak Spanish. • They are both good singers.
4. in the expression both...and...: a method that is both simple and effective
(MEDAL2)

Although this description covers the linguistic facts with an admirably high degree of discretion, it also shows how theory-bound any such description is: since in some terminologies words such as her or these are classified as pronouns, some users might


find the explanation that both as a predeterminer cannot be followed “by a pronoun” confusing. Although it would be inappropriate to blame the compilers of one dictionary for using a different terminological framework from that employed in other dictionaries or grammars, it has to be said that the use of the term predeterminer in LDOCE4 is not entirely unproblematic. LDOCE4 defines predeterminer as “a word that is used before a determiner (...)” and applies this label to words such as all or both (which are labelled determiner, predeterminer, pron) (1).

(1) They were quarrelling all the time.

Any or no, however, are not classified as predeterminers although they can be followed by more in such sentences as the LDOCE4 examples (2 and 3).

(2) Are there any more sandwiches?

(3) I have no more questions.

Even if LDOCE4 labels the lemma more2 “determiner, pron” – wisely without indicating which word class label applies to which of the examples provided – one can safely assume that (2) and (3) are intended as examples of determiner uses, which contrast with pronoun uses such as in (4).

(4) Don’t waste any more of my time.

It is at least debatable whether a category such as that of predeterminers, which in CGEL are characterized by the fact that they “can occur before certain central determiners” (1985: 257) and contrast with postdeterminers, can be interpreted correctly by many users (especially if the term postdeterminer is not employed). What is certainly much more helpful is highlighting certain (very frequent) combinations of this kind in bold type (Example 5).

more
1... much/a lot/far more...
2... a little/many/some/any more...

Example 5: Highlighting combinations of predeterminers

In fact, it must be doubted whether, from both a lexicological and a lexicographical point of view, distinguishing between determiners and pronouns as two different word classes is really appropriate.5 Similarly, the question must be raised of how many users are to benefit from a classification of a word as prep, conj, adv (OALD7). In fact, it could be argued (Huddleston and Pullum 2002: 612–617; Herbst and Schüller 2008: 61–67) that the uses of words such as before which dictionaries describe in terms of word class labels such as preposition, conjunction and adverb are described more appropriately as differences at the level of valency – all the more so since the definitions


given, for example, in LDOCE4 do not indicate any semantic reasons for attributing a word such as before to different word classes.

before1... conjunction 1 earlier than a particular event or action
before2... prep 1 earlier than someone or something
before3... adv 1 at an earlier time
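The alternative mentioned above, in which the uses of before are treated as differences at the level of valency, can be illustrated with a small data structure. The Python sketch below is only one possible reading and is not taken from any of the dictionaries discussed: the class `Use`, the function `valency_entry` and, in particular, the `complement` values are assumptions made for the example, since the LDOCE4 entries quoted here give only labels and definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Use:
    label: str       # word class label as printed in LDOCE4
    complement: str  # assumed complement type (not stated in the entries)
    definition: str  # first sense as quoted from LDOCE4


# The three LDOCE4 entries for before, one per word class label.
BEFORE = [
    Use("conjunction", "clause", "earlier than a particular event or action"),
    Use("prep", "noun phrase", "earlier than someone or something"),
    Use("adv", "none", "at an earlier time"),
]


def valency_entry(lemma, uses):
    """Merge separate word-class entries into a single entry that lists
    complement options instead of word class labels."""
    return {"lemma": lemma, "complements": [use.complement for use in uses]}


entry = valency_entry("before", BEFORE)
print(entry)
```

On this view the three entries collapse into one lemma with three complement options, which is essentially how a verb that takes either a noun phrase or a clause is already presented in a single entry.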

It is indeed remarkable that, as Huddleston and Pullum (2002: 600) point out, the fact that a word can be followed by a noun phrase or by a clause, as in examples (5a) to (6b),

(5a) I can remember everything (remember + noun phrase)
(5b) I remember the elocution teacher was very keen on the modern plays that were being published then. (remember + clause)
(6a) I put my work before everything. (before + noun phrase)
(6b) He could read and write before he went to school. (before + clause)

is taken in traditional grammar as a reason for assigning it to different word classes in the case of words such as before but not in the case of words such as remember. What is even more remarkable, however, is that this traditional distinction has had an enormous impact on the structure of dictionaries: while LDOCE4 provides three different entries for before, there is, of course (?), only one for remember. This is not the place for discussing the linguistic reasons that might justify subsuming all uses of words such as before under one word class particle as has been done by Herbst and Schüller (2008) or for extending the category preposition in the way Huddleston and Pullum (2002) suggest. From a lexicographical point of view, however, one must raise the question why most compilers of dictionaries seem to find it appropriate to give semantic considerations priority over syntactic features in the case of words such as eat or remember and to take the opposite policy in the case of the distinction between prepositions, conjunctions and adverbs. The current policy must also be questioned with respect to accuracy and interpretability. 
Firstly, one must doubt whether many users will be able to interpret a label such as adverb correctly in the case of words such as before or on, especially when the definition of adverb provided by the dictionary reads as follows:

a word that adds to the meaning of a verb, an adjective, another adverb, or a whole sentence, such as ‘slowly’ in ‘He ran slowly’, ‘very’ in ‘It’s very hot’, or ‘naturally’ in ‘Naturally, we want you to come’.

a word that adds more information about place, time, manner, cause or degree to a verb, an adjective, a phrase or another adverb

Secondly, the labels conjunction, preposition and adverb are not sufficiently discrete to describe the fact that a word such as before can occur in constructions such as the one illustrated in (6c).

(6c) FIVE WEEKS before Brazil’s first presidential elections in 29 years, an increasingly volatile political and economic situation favours a struggle between youthful candidates of the right and left.

It may thus make sense to recognize the fact that not all words fall neatly into classes and to provide explicit accounts of their syntactic features in cases where word class labels obviously do not suffice to capture the linguistic facts, which of course is also one important function of example sentences. Attaching a word class label to a word or a word use is no great help to the users if it does not allow them to draw any conclusions about its use. For users of dictionaries, word class labels serve as a shortcut for a list of morphological and syntactic features shared by a class of words.6 Given the differences in the use of linguistic terminology, the properties associated with a particular label in the dictionary must be listed in a kind of dictionary grammar (Tarp 2008: 243–247);7 if not all of these properties are met by the word in question, this must either be stated explicitly or the word must not be given this label.8 It must be seen as a major drawback of the present generation of English learner’s dictionaries that they do not contain proper dictionary grammars that would enable users to find out, for example, what a particular word class label is supposed to cover.

2.2 The labels transitive and intransitive

One of the most established ways of making at least rudimentary statements about the valency of verbs is in terms of the traditional distinction between transitive and intransitive: labels such as vi or vt can be considered indicators of subclasses of verbs. The categories transitive and intransitive have long been shown to be inadequate for the description of languages such as English and German: one reason for this is that a label such as transitive does not provide a sufficiently explicit description of a verb’s syntactic properties (cf. e.g. Herbst and Schüller 2008: 167–172). In the description of English, grammars such as Quirk et al. (1985: 54) and, in a similar form, Huddleston and Pullum (2002: 216–217) replace the term transitive by a threefold distinction between monotransitive, ditransitive and complex transitive. Furthermore, as Bergenholtz (1984: 34) has shown convincingly in an analysis of German dictionaries, the term transitive is by no means employed on the basis of one generally accepted definition (cf. Klotz 2001: 68–69).9 Despite the obvious shortcomings of the terms for the purposes of linguistic analysis and lexicographical application, the distinction between transitive and intransitive still features prominently in many dictionaries, which, in the case of bilingual dictionaries, can lead to unnecessarily complex entries if the same sense distinctions are made for intransitive and transitive uses of the same verb (Herbst and Klotz 2003: 180–182). It is encouraging to see that although LDOCE4 and MEDAL2 still make use of codes such as [I] and [T] to indicate the traditional distinction, COBUILD1 and OALD (from the fourth edition onwards) refrain from doing so and express this information in the form of codes such as [V] and [VN] (OALD7).10 This is exactly one of the options that Bergenholtz suggested for German dictionaries:11

... Man verzichtet ganz auf die fragwürdige Information transitiv und führt statt dessen rudimentäre Satzmusterhinweise an. (...) Eine ersatzlose Streichung wie in DUDEN-GWB muß als Kapitulation vor nicht unüberwindbaren theoretischen und empirischen Problemen interpretiert werden. (Bergenholtz 1984: 34)

(One dispenses entirely with the doubtful information transitive and instead provides rudimentary indications of sentence patterns. (...) Deleting the information without any replacement, as in DUDEN-GWB, must be interpreted as a capitulation to theoretical and empirical problems that are not insurmountable.)

A further argument against codes such as [I] or [T] is that – although the terms transitive and intransitive belong to the standard repertoire of linguistic terms of whose existence laypeople are aware – many users may not be sufficiently acquainted with them to be able to interpret such labels correctly.12

2.3 Coding systems: valency information in learner’s dictionaries of English

It is one of the main features of learner’s dictionaries that they provide valency information in a more specific form than by the labels transitive and intransitive. The fact that Hornby introduced verb patterns in the OALD13 can be taken as an indication of the importance he attributed to this type of syntagmatic relation in a dictionary that aims to be usable as a production dictionary. At the same time, it has to be said that certain features of the coding system were more appropriate to the purposes of teaching material than to lexicographical purposes. It is thus not surprising that in the course of the development of learner’s dictionaries, new methods for providing this kind of information have been developed. It seems appropriate to divide the different approaches taken in English learner’s dictionaries into four types (Herbst and Klotz 2003: 78–83):

type I: non-transparent, non-mnemotechnic codes
type II: non-transparent mnemotechnic codes
type III: transparent codes
type IV: explicit pattern indicators

The different systems of verb patterns used in the first three editions of the OALD are an example of type I: in OALD2, for example, under the different senses identified for the verb want one finds:

want
... 1. (VP1, 6, 9, 18) require; be in need of ...
... 2. (VP 1, 2, 3) wish for; have a desire for ...
... 3. (VP 1, 2, 17c) need, ought ...

The different verb patterns are listed in the front matter (OALD 2: xiv-xxviii) and it is to be feared that users will have to consult these lists whenever they want to interpret the codes because no mnemotechnic value is attached to the symbols. In this sense, LDOCE1 (1978) can be seen as a great improvement because it makes use of symbols which are still not (or only partially) transparent but have some mnemotechnic value (i.e. type II) in that in the case of the code [T3] given for want

... 4 [T3]

T stands for a (mono)transitive verb and 3 for an infinitive: the fact that all monotransitive verb uses contain the letter T and infinitives are always coded as 3 makes this system superior to type I systems. Nevertheless, both type I and type II systems have the great disadvantage of not immediately making clear to the user of the dictionary what kind of information it is that is provided by the code. A user who is not sufficiently experienced in the use of this particular dictionary may not even be aware of the fact that codes such as VP1 or T3 provide exactly the information they are looking for. The great advantage of such coding systems is, of course, that they do not expect the user to be familiar with any kind of linguistic terminology. This is different in the case of transparent coding systems as they are used in COBUILD1, LDOCE2, OALD7 or in a specialised dictionary such as VDE, for example. A code such as

• to-INF

seems immediately interpretable, but of course only to users who are familiar with the notion of a ‘to-infinitive’, which may not always be an unproblematic assumption to make.14 Furthermore, even codes such as

• n
• V-ing

require the user to deduce that these symbols – like to-INF – do not stand for individual words or word classes but for particular types of phrase or clause. Thus even the use of such transparent coding systems involves some familiarity with the principles of linguistic analysis. This is the case to a much lesser degree with the so-called pattern indicators (type IV), in which the structure is spelt out in terms of lexical words as in:

able ... be ~ to do sth
want   want to do sth   want sb to do sth   want sth to happen
want ... want to do sth ... want sb/sth to do sth

Although this may perhaps appear to be the most user-friendly way of presenting information on valency or complementation, the fact that verbs such as do or happen and pronouns such as somebody and something have to be used is not totally unproblematic. Thus it is open to question to what extent sentences such as (7) and (8)

(7) ... we want her to be a teacher (BNC)
(8) I don’t want her to be alone. (BNC)

will be perceived by users as being covered by the codes given in LDOCE4 or MEDAL2 since be is a stative verb whereas do and happen could be interpreted to stand for dynamic verbs. From a descriptive point of view, a coding system of type III, which would cover (7) and (8) in a form such as + N + to-INF, definitely seems more adequate.

2.4 Different types of valency information

The development of the various English learner’s dictionaries shows a strong tendency towards types III and IV and thus towards systems which are considerably more user-friendly than those of earlier editions. Nevertheless, this improvement is restricted to the actual design of the patterns as such and does not extend to the overall presentation of valency information. In fact, there is a considerable amount of inconsistency in two respects: the way that the optionality of complements is indicated and the treatment of particle complements. As shown above, the fact that, in the terminology of valency theory, a (noun phrase) complement can be optional can be shown in dictionaries in different ways (see also Table 2):

1. By assigning the uses to different word classes such as adverb (no valency complement) versus preposition or conjunction (with “obligatory” complement).
2. By marking verbs as intransitive or transitive (LDOCE4, MEDAL2).
3. By indicating different patterns such as [V] and [V n] (COBUILD, OALD7, VDE).

Table 2. Noun phrase complements in learner’s dictionaries

          without complement            with complement                 dictionaries
before    They had never met before.    She had met him before this.
          adverb                        preposition                     LDOCE4, MEDAL2, COBUILD, OALD7
read      She was reading.              She was reading a book.
          I                             T                               LDOCE4, MEDAL2
          V                             Vn                              COBUILD, OALD7, VDE

3. Particles: valency between lexis and construction

3.1 Lexical elements in valency

The relation between a governing word and certain types of complement such as a noun phrase complement [N] or a [V-ing]-complement

(9) ... avoid caffeine-containing drugs and diet aids.

(10) How can I avoid making mistakes?

illustrates the co-occurrence of a lexical unit with a grammatical construction. Other types of complement can be described in lexical terms because the complement contains a fixed lexical element. This is the case, for instance, with that-clauses or wh-clauses, which are characterized by the presence of the word that (unless the grammar wishes to operate with such a category as a ‘that-clause without that’) or a member of the very small set of wh-words.15 The lexical character of such combinations is most apparent in the LDOCE4 coding system described above, where pattern illustrations such as “wonder who/what/how etc”, “wonder if/whether” or “decide that” are given. MEDAL2 uses a plus sign “+ how/what/when etc”, whereas OALD7, for instance, includes these lexical elements in grammatical pattern codes such as [V wh-] (for wonder) or [V that] (for conclude).

3.2 The prepositional verb fallacy

A similar, or perhaps even more obvious, case is presented by examples such as

(11) He had to look after his dahlias.
(12) Your future might depend on doing well in maths.

In cases like these, many grammars and dictionaries consider the verb and the particle to form a complex lexical unit, which in itself has valency properties. Thus LDOCE4, OALD6 and MEDAL2 classify look after and depend on as ‘phrasal verbs’. This is broadly in line with the analysis provided by Quirk et al. (1985; hereinafter called CGEL), who, however, refer to this class of multi-word verb as prepositional verbs. In the case of look after such an analysis is quite convincing since there are obvious semantic reasons for it. On the other hand, semantic classifications of this kind are a matter of degree. It seems at least debatable whether there is any such difference in the meaning of the verb decide in the uses (13) and (14).

(13) Stirling, however, was worried that their cover was blown and decided on a change of plan.
(14) He decided to change his tactics.

Nevertheless, modern learner’s dictionaries (OALD6, LDOCE4, MEDAL2) tend to identify a separate phrasal verb use for decide on, providing explanations of the kind:

decide on/upon sth phr v to choose something or someone after thinking carefully: Have you decided on a date for the wedding?

Interestingly, neither OALD6 nor LDOCE4 treats the combinations decide against and decide in favour of in the same way. These are included in the main entry of decide:

1 ... to make a choice or judgment about something, especially after considering all the possibilities or arguments ... decide against/in favour of (doing) sth He eventually decided against telling her. | After a long discussion, they decided in favour of (= chose) the older applicant.

Within a theoretical framework such as the valency approach, it seems more consistent to deal with such cases as decide + [on_N], [on_V-ing] etc. and identify specified particle complements as complements of the verb in very much the same way as noun phrase complements.
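The valency treatment just described can be pictured as a simple data structure. The following is only a minimal sketch under stated assumptions, not the format of any actual dictionary: the class names Sense and VerbEntry and the method names are hypothetical, the gloss is abridged from the dictionary explanation quoted above, and of the complement labels only [on_N] and [on_V-ing] come from the text, while the others merely extend the same notation for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One sense of a verb together with every complement pattern it licenses."""
    gloss: str
    complements: list[str] = field(default_factory=list)


@dataclass
class VerbEntry:
    """A verb entry in which particle complements sit next to all other complements."""
    lemma: str
    senses: list[Sense] = field(default_factory=list)

    def patterns(self) -> list[str]:
        """Spell out each complement as 'lemma + [complement]'."""
        return [f"{self.lemma} + [{c}]" for s in self.senses for c in s.complements]

    def with_particle(self, particle: str) -> list[str]:
        """Return the complements that are introduced by the given particle."""
        prefix = particle + "_"
        return [c for s in self.senses for c in s.complements if c.startswith(prefix)]


# Hypothetical entry: only on_N and on_V-ing are labels used in the text;
# the remaining labels are assumptions that follow the same notation.
decide = VerbEntry(
    lemma="decide",
    senses=[
        Sense(
            gloss="make a choice or judgment about something",
            complements=["to-INF", "on_N", "on_V-ing", "against_N", "against_V-ing", "in favour of_N"],
        )
    ],
)

for pattern in decide.patterns():
    print(pattern)
```

In such a representation decide on never becomes a lemma of its own: a lookup such as decide.with_particle("on") finds the particle complements inside the one sense of decide, next to the infinitive complement.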

Table 3. Treatment of prepositional complements in three learner’s dictionaries

                      LDOCE4            OALD6             MEDAL2
beg for               beg               beg               beg
belong to             phr v             phr v             phr v
decide against        decide            decide            decide + phr v
decide between        decide                              decide
decide in favour of   decide            decide            decide
decide on             decide on         phr v             phr v
delight in            phr v             phr v             phr v
depend on             phr v             phr v             phr v
feed into             feed + phr v      feed              feed
invest in             invest + phr v    invest + phr v    invest + phr v
lecture on            lecture           lecture           lecture
look at               look + phr v      look              look + phr v
look for              look + phr v      look              look + phr v
object to             object            object            object
rely on               rely on/upon      phr v             phr v
respond to            respond           respond           respond
talk about            talk              talk              talk
wonder at             wonder            wonder            wonder

The valency analysis – verb + phrase introduced by particle – has the great advantage of not separating this type of complement from other types of complementation in the entry because it means that all uses of a verb with the same sense are given in the same place in the dictionary. In any case, it seems that a considerable amount of inconsistency can be found in English learner’s dictionaries in this respect, as can be seen from Table 3, which compares the treatment of prepositional complements (which are all listed as such in VDE, for example) in three learner’s dictionaries.16 Apparently, LDOCE4 tends to list cases such as ... looking at his watch in the main verb entry (sense 1: “to turn your eyes towards something, so that you can see it”) but also in the phrasal verb subentry (example: ... looked at each other ...; sense 1, same explanation). In the phrasal verb subentry, further senses are identified such as “to examine something and try to find out what is wrong with it” as in (15).

(15) Can you look at my car? There’s a strange noise coming from the front wheel.

At the same time, look at that is listed as unit 16 in the entry for look, whereas look at sb/sth and not much to look at are given as units 5 and 7 of the phrasal verb subentry. OALD6 is linguistically more consistent in that it does not duplicate uses of look at given in the entry for look in the phrasal verbs section. Nevertheless, the problem remains that constructions with at are listed in different places in these dictionaries because on the one hand they follow a formal structure (main verb entry look followed by subentries for phrasal verbs or a phrasal verbs section) and on the other hand their structure is determined by semantic classifications. This complication may be due to the fact that these dictionaries aim to be encoding dictionaries and decoding dictionaries at the same time. For purposes of decoding or marking it is convenient for the user to be able to identify all uses of a verb with a particular construction as quickly as possible. For encoding purposes the starting point is the sense expressed by the verb – and there the user should find the (most frequent) constructions in which this sense occurs. The present practice of many dictionaries seems to be:

1. To include particle complements in the main entry of the verb if the complements are optional, i.e. if the verb can be used without them, or if other complements can be used in the sense identified.
2. To see the particle as part of a phrasal verb if the particle complement is obligatory in the particular sense of the verb identified.

This has the disadvantage, however, that the same sense of the verb may appear under two different sublemmata in the dictionary. Thus LDOCE4 identifies two phrasal verbs pass as and pass for, which could be subsumed under one sense of the verb pass, as in VDE, where a monovalent use of the verb is identified too (16).

(16) Anything unexpected, whatever it is, counts as expressive. It will pass.

OALD6 and MEDAL2 also identify two phrasal verbs but refer the user from pass as to pass for.
Nevertheless, one could argue that the idea of treating such combinations as lexical units is not entirely convincing – especially since, as the discussion of decide against and decide on has shown, the decision as to what one considers to be a separate lexical unit will always be an arbitrary one – at least to a certain extent.17

3.3 Phrasal verbs as idioms

Cases such as decide on and look at (referred to as prepositional verbs in CGEL) must be distinguished from cases such as read out, in which the particle can occur before or after the other complement (and which are referred to as phrasal verbs in CGEL):

(19) Claire... was given the job of reading out the instructions.
(20) Don’t memorise notes or read them out like a sermon.

In fact, the difference between these two types can be seen as a reason for treating them as lexical units lexicographically because the patterns in which they occur need to be specified. While LDOCE1 makes use of rather abstract labels such as v prep and v adv, LDOCE4 and OALD7 (like VDE) make use of double-headed arrows to indicate shiftability. MEDAL2, strangely enough, does not indicate the difference between (non-shiftable) prepositional verbs and (shiftable) phrasal verbs at all (unless the fact that the former have a stress mark on the verb and the latter on the particle should be taken as an indication of the difference in syntactic behaviour). It is certainly true that phrasal verbs (in this narrow sense of the term) present a stronger case for being treated as idiomatic combinations than constructions of the decide on type discussed above and as such can be analysed as separate lexemes.18 If, however, verb particle combinations such as read out, look up (and possibly also look after) are indeed regarded as complex lexemes, they can be seen either as compounds such as washing-up liquid or as idioms such as read between the lines. However, LDOCE4, OALD7 and MEDAL2 assign them special status by grouping all phrasal verbs together in the form of a phrasal verbs block or a section of subentries which only comprises phrasal verbs. The problem of finding the most appropriate place for multi-word units in the dictionary is one that also poses itself with combinations of words that do not fall within the scope of word formation, namely collocations.

4. Collocations

4.1 Semantically significant collocations

Syntagmatic relations between individual lexical items can be described in terms of collocation – a term introduced into linguistics by Harold Palmer and John Rupert Firth (e.g. Firth 1956/1968).19 It has received a number of different interpretations, which are relevant to lexicography in different ways: the statistically oriented analysis of the co-occurrence of word forms propagated by corpus linguistics is a useful tool for designing examples that contain frequent word combinations and for identifying recurrent phraseological units.20 In the context of dictionaries for foreign learners, one type of phraseological unit, which is based on the semantically oriented view of collocation, is of particular interest. Combinations such as raise objections, make coffee or weak tea can be described as “encoding idioms” (Fillmore, Kay and O’Connor 1988) since they do not present any problem in terms of text comprehension but are unpredictable with respect to text production. As far as the lexicographical treatment of such collocations is concerned, Hausmann’s (2007: 218) distinction between Basis (base) and Kollokator (collocate) is particularly relevant:

Kollokationen (Beispiele: confirmed bachelor, to lay the table / célibataire endurci, mettre la table / eingefleischter Junggeselle, Tisch decken) sind normtypische phraseologische Wortverbindungen, die aus einer Basis und einem Kollokator bestehen. Die Basis ist ein Wort, das ohne Kontext definiert, gelernt und übersetzt werden kann (bachelor, table). Der Kollokator ist ein Wort, das beim Formulieren in der Abhängigkeit von der Basis gewählt wird und das folglich nicht ohne die Basis definiert, gelernt und übersetzt werden kann (confirmed, to lay...).

(Collocations (e.g. confirmed bachelor, to lay the table / célibataire endurci, mettre la table / eingefleischter Junggeselle, Tisch decken) are typical phraseological word combinations that are composed of a base and a collocate. The base is a word that can be defined, learned and translated without context (bachelor, table). The collocate is a word that is chosen in the process of formulation in dependence on the base and that consequently cannot be defined, learned and translated without the base (confirmed, to lay...).)

Despite the criticism that has been levelled at Hausmann’s concept of collocation (Tarp 2008: 249–254), it cannot be denied that the distinction between base and collocate addresses the needs of the foreign language user in at least some prototypical situations of language production. This concerns in particular combinations of adjective and noun (weak tea, confirmed bachelor) or adverb and verb (agree entirely, badly want)21 involving ‘degree’ or ‘intensification’, where one can certainly argue that the base will be a likely starting point for formulating the utterance, which is a good reason for including such collocations in the entry of the base in a dictionary that is intended to serve as a production dictionary. This is not always the case, however. One could argue that in the case of combinations such as change one’s mind or bear resemblance to, the nouns mind and resemblance need not necessarily be regarded as starting points of the utterance, but rather that the combination as such is one way of expressing a certain meaning a user may want to express. While this could be taken as an argument for including change one’s mind under change and mind, pessimistically, it could also be argued that for production purposes neither is particularly promising. Furthermore, there is the obvious problem that there is no way of actually determining the unpredictability of a combination: if one identifies a sense of mind in terms of ‘current opinion or inclination of a person’, then change one’s mind and have a mind to do something could be subsumed as free combinations under this sense but one could also find reasons for treating change one’s mind as a phraseological unit. Monolingual dictionaries designed for users of different mother tongues are confronted with a kind of paradoxical situation: on the one hand, the unpredictability of a combination may depend on the mother tongue of the users (hohe Geschwindigkeit – high speed; hohe Geldstrafe – heavy fine) (Herbst and Klotz 2003: 138–140); on the other hand, having understood the collocational character of language may make learners want to check whether a combination they want to use actually exists in the language. This is definitely an argument for including such collocations in the dictionary; it can also be taken as an argument for listing such collocations under the collocate,22 but it is perfectly obvious that a dictionary aiming to be usable as a production dictionary must in any case list collocations under the base, as Hausmann (1985: 121–122)23 argues. This is indeed the policy of collocation dictionaries such as the Oxford Collocations Dictionary for students of English (2002), which is lemmatised according to the bases. Thus under compromise (noun) one finds:

• ADJ. acceptable, fair, good, happy, honourable, possible, pragmatic, reasonable, sensible, suitable | ideal | muddled, uneasy ... | inevitable, necessary
• VERB + COMPROMISE agree on, arrive at, come to, find, make, reach, work out ... | look for, seek ... offer, suggest ... | accept | reject
• COMPROMISE + NOUN agreement, deal, formula, proposal, resolution, solution | candidate ...
• PREP. ∼ between ... ∼ on/over ... ∼ with ...

Although it neither provides any information on the frequency of the collocations listed (which would be desirable) nor contains explicit information on meaning, the Oxford Collocations Dictionary is an excellent production tool for advanced foreign users of English.24 It is obvious that general learner’s dictionaries, which have many more purposes to serve, could not possibly be as explicit as that, but what is important is that collocations included should be recognizable as special combinations.
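The two access routes discussed here, listing under the base for production and listing under the collocate for checking, can be sketched with a small lookup structure. This is only an illustrative sketch under stated assumptions: the data are a small subset of the compromise entry quoted above, and the names COLLOCATIONS, lookup_base and build_collocate_index are hypothetical and do not describe how any published dictionary is stored.

```python
from collections import defaultdict

# Collocations stored under the base (here: the noun), grouped by slot.
# The items are a subset of the entry for compromise quoted above.
COLLOCATIONS = {
    "compromise": {
        "ADJ": ["acceptable", "fair", "uneasy"],
        "VERB + COMPROMISE": ["agree on", "arrive at", "reach"],
        "PREP": ["between", "on/over", "with"],
    },
}


def lookup_base(base):
    """Production route: start from the base and list its collocates by slot."""
    return COLLOCATIONS.get(base, {})


def build_collocate_index(data):
    """Checking route: invert the data so that a collocate leads back to its bases."""
    index = defaultdict(list)
    for base, slots in data.items():
        for slot, collocates in slots.items():
            for collocate in collocates:
                index[collocate].append((base, slot))
    return dict(index)


collocate_index = build_collocate_index(COLLOCATIONS)

# A learner who wants to say something about a compromise starts from the base ...
print(lookup_base("compromise")["VERB + COMPROMISE"])
# ... whereas a learner checking whether "reach" combines with it starts from the collocate.
print(collocate_index["reach"])
```

The inverted index is derived mechanically from the base-oriented data, which mirrors the point that listing under the base is the primary requirement for a production dictionary, while access via the collocate is an additional service.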
One can distinguish between six different lexicographic devices used to provide collocational information in general learner’s dictionaries (Herbst and Mittmann 2008):25

1. Collocation given as or like an example but not marked in any special way:
   I entirely agree with the comments you made about public transport.
   have a clear (guilty) ~

2. Collocation highlighted in bold type in an example of a particular use (subtypes: with or without glossed explanation):
   to have a clear/guilty conscience (= to feel that you have done right/wrong)
   If moderates fail to reach a compromise, the extremists will dominate the agenda.

3. Collocation highlighted in bold type under a particular use (subtypes: with or without glossed explanation):
   conspiratorial whisper/smile/wink etc. Britta gave him a conspiratorial smile
   take/suffer the consequences (of sth) (= accept the bad results of something you have done) He broke the law, and now he must face the consequences of his actions.
   • guilty conscience (= a bad feeling because you have done something wrong) It’s hard to imagine how people live with a guilty conscience.
   • clear conscience (= the knowledge that you have done nothing wrong) We want to leave with a clear conscience, knowing we did the job right.

4. Collocation highlighted in bold type in explanation of the headword with a separate explanation:
   Your conscience is the part of your mind that tells you whether what you are doing is right or wrong. If you have a guilty conscience, you feel guilty about something because you know it was wrong. If you have a clear conscience, you do not feel guilty because you know you have done nothing wrong.

5. Collocation highlighted in bold type or in colour as a separate use with explanation:
   3 the dubious honour/distinction/pleasure (of doing sth) a dubious honour etc is the opposite of an honour – used about something unpleasant that happens
   2 change your mind [blue in original] to change your decision, plan, or opinion about something

6a. Box with collocations: Separate boxes are a particularly prominent way of drawing the users’ attention to collocations, as is shown by the following examples:
   come to a conclusion/reach a conclusion (= decide something)
   draw a conclusion (from sth) (= decide something because of information you have)
   ...
   logical conclusion
   firm conclusion
   inescapable conclusion (= the conclusion that you must come to)

   make/draw up/write a list
   compile a list formal (= make a list)
   ...
   a long/short list ...

   Collocation
   Adjectives frequently used with consequence
   • dire, disastrous, fatal, inevitable, serious, tragic, unforeseen, unfortunate
   Verbs frequently used with consequence as the object
   • accept, consider, face, suffer, take

   SYNONYMS satisfying
   rewarding • pleasing • gratifying • fulfilling ….
   PATTERNS AND COLLOCATIONS
   • very/highly/extremely satisfying/rewarding/pleasing/gratifying/fulfilling
   • a satisfying/rewarding/gratifying/fulfilling experience/feeling
   • (a) satisfying/rewarding/fulfilling job/career/work
   • to be pleasing/gratifying to the eye/ear/senses
   • to find sth satisfying/rewarding/pleasing/gratifying/fulfilling

6b. List of collocations after headword and explanation

This wide range of lexicographic devices for presenting collocational information is an indication of the importance attributed to the aspect of collocation in this type of dictionary. It is important that users should be made aware of the special character of the combination, which is the case with all of the devices employed with the exception of type 1, which, however, is used relatively sparingly in the present generation of learner’s dictionaries. In particular, the boxes used in LDOCE4, MEDAL2, OALD7 and COBUILD4 can be seen as extremely useful for production purposes.

4.2 Further phraseological units

Semantically significant collocations represent one type of phraseological unit that must be covered in a dictionary for foreign learners. Phrased even more carefully: for the purposes of foreign language lexicography the concept of semantically significant collocations can be regarded as a methodological tool for identifying elements that ought to be treated as phraseological units in a dictionary.26 At the same time, other combinations must also be included and indicated with equal prominence. Certainly, elements such as in black and white, in line with, the two of them or in comparison with must also be seen as “Halbfertigprodukte der Sprache” (semi-finished products of language), a label Hausmann (1984: 398) attaches to collocations. These examples can also serve to illustrate the fact already indicated above that no clear dividing line can be drawn between collocations as “sequences of lexical items which habitually co-occur, but which are nonetheless fully transparent in the sense that each lexical constituent is also a semantic constituent” (Cruse 1986: 40) and idioms as “a combination of two or more words which function as a unit of meaning” (Cowie and Mackin 1974: viii-ix) or as “a lexical complex which is semantically simplex” (Cruse 1986: 37). Part of the problem is that whether a combination of words should be seen as being composed of different parts or not may depend on the perspective one chooses: from the point of view of the semanticist, bear resemblance can certainly be analysed into two meaningful units (and thus be classified as a semantically significant combination). This does not mean, however, that for the user this would not present a single choice (in the words of Sinclair 1991: 110) as one way of expressing similarity. In fact, bear resemblance can be seen as part of a lexico-grammatical chunk of the following form (Herbst and Klotz 2008: 339):27

bear + {a/an + adjective / negative determiner} + resemblance + to

A similar example is presented by combinations such as let alone (cf. Fillmore, Kay and O’Connor 1988) or in case, which, again, could be analysed as consisting of two constituents, but could equally be seen as a pre-fabricated alternative to the single word if. Whether this should be taken as an argument for treating it as a separate lexeme and assigning it lemma status in a dictionary is a different matter, of course. It is interesting to see how dictionaries deal with combinations of this kind (Table 4). There seems to be a certain reluctance in monolingual dictionaries to attribute lemma status to such items, although other combinations of two or more words such as black and white (listed under IDM in OALD6) or black humour are listed as separate lemmata. What is even more surprising perhaps is that even a large bilingual dictionary such as Langenscheidt Collins Großwörterbuch Englisch should list of course, at all and in case as equivalents of German natürlich, überhaupt and falls but not include them as separate lemmata in the English-German parts of the dictionary.

Table 4. Treatment of word combinations in four learner’s dictionaries

| Combination | LDOCE4 | OALD6 | MEDAL2 | CamALD3 |
|---|---|---|---|---|
| at all | phraseological unit as use 6 of all determiner, predeterminer, pronoun | idiom under all pronoun | phrase under at | lemma at all adv |
| in case | phraseological unit as use 3 under case noun (given as: (just) in case) | idiom under case noun (given as: (just) in case) | phrase under case | phraseological unit under case noun |
| let alone | phraseological unit 14 under let | idiom under alone (cross-reference under let) | phrase under let | in bold type under let suggest adv (!!) |
| of course | phraseological unit as use 1 of course n | idiom under course noun | lemma of course adv | sublemma of course noun; lemma of course adv |

Partly, this may have to do with the fact that one would not expect users to identify a combination of words such as of course as a
single lexeme and look for it under a separate lemma, which, of course, can just be taken as an illustration of the complexity of the problem because it could also be said of a combination such as black and white or black humour. On the basis of linguistic criteria of classification, multi-word lexemes of any kind ought to be given lemma status because they cannot sensibly be related to any lexical unit of one of their components. (As with idioms, a cross-referencing system taking the user from course to of course would still be desirable for decoding purposes.) It is obvious that the question of what is to be considered a word or a lexeme affects the question of the scope of the subject of the present article. Strictly speaking, it is not only compounds such as county council but also idioms such as it takes all sorts that do not fall under the scope of syntagmatic relations of words. On the other hand, the gradience character of the distinction between idioms and collocations makes a consistent policy that distinguishes between different types of multi-word units almost impossible, especially if it is to be understood by the users of such a dictionary.28

5. Solutions and problems

All in all, one can say that modern English learner’s dictionaries address the question of syntagmatic relations of words with a great number of different devices. It is obvious that the lexicographical devices employed in the various dictionaries need not and cannot be identical since a device may be regarded as particularly suitable for the specific user group targeted by the dictionary in question. Nevertheless, certain devices seem to have become standard in the present-day generation of English learner’s dictionaries. For instance, all learner’s dictionaries include systematic information on the valency patterns of verbs as they were introduced by Hornby in OALD1, but none would make use of a non-transparent coding system such as that of the original verb patterns. Rather, transparent codes such as [N] or [to-INF] seem to be the established practice (OALD7, COBUILD, VDE) – with LDOCE4 and MEDAL2 going a step further by making use of pattern illustrations. While these at first sight appear to be easier to interpret, they may also give rise to misinterpretation. Strangely enough, the dictionaries opting for this seemingly simple way of providing valency pattern information make use of the traditional labels [I] and [T], which are not only linguistically problematic but which may well cause difficulties of interpretation to many users. Given the overall attempt to achieve greatest possible user-friendliness, it is surprising to see to what extent traditional word class labels such as conjunction, preposition or adverb and less established labels such as determiner or predeterminer are being used. MEDAL2 stands alone in creating explanatory adequacy by providing explicit information on these uses in the actual entries.
One area where there seems to be scope for improvement in future English learner’s dictionaries is the inclusion of what Bergenholtz (2002: 43) and Tarp (2008) refer to as a dictionary grammar: such a dictionary grammar should contain a clear definition of all grammatical terms used in the dictionary so that the dictionary user has at least the chance to find out which syntactic or morphological features a particular word class label entails. From a valency point of view it would certainly seem desirable to replace terms such as transitive and intransitive by unambiguous codes (as OALD and COBUILD have done) and to describe the uses of words such as before, on, both or much explicitly without providing them with a label which is linguistically unsatisfactory and probably incomprehensible to most users.

As far as the level of lexical co-occurrence is concerned, there is definitely a justification for taking semantically significant collocations into account (see also Herbst and Klotz 2008). The practice of highlighting such collocations in bold type under particular uses or in example sentences is certainly a very good one and is far superior to collocations simply being given as examples without being specially marked. Drawing the user’s attention to such collocations in special boxes is certainly also a very positive innovation. It is debatable whether this should be done in rather unspecific lists also containing other types of word combinations and word formations as in LDOCE4, or whether more specific types of boxes with labels such as “collocations” as in MEDAL2 or “word partnership” as in COBUILD4, where the collocates are distinguished according to their word classes, should be preferred. In any case, the policy of including mini-collocation dictionaries for production purposes in the entries of general learner’s dictionaries is definitely a positive feature which future generations of these dictionaries should maintain and perhaps improve in terms of quantity and systematicity. On the whole, one can come to the conclusion that syntagmatic relations of words such as valency patterns and semantically significant collocations are expressed in modern learner’s dictionaries for English by a number of rather adequate lexicographical devices and that the insights about the idiomatic nature of language and the role of syntagmatic relations emphasized by modern corpus linguistics – as, for example, by John Sinclair (2004: 133), who said “patterns of co-selection among words, which are much stronger than any description has yet allowed for, have a direct connection with meaning” – have certainly been addressed with great care.
What is more problematical, perhaps, is the treatment of multi-word units – either as lexical units of one lexeme or as separate lexemes. Here, the lemmatisation practice of many dictionaries currently on the market seems questionable. Both for considerations based on the practical needs (and skills) of dictionary users and from a valency point of view, there is little justification for treating non-idiomatic verb particle combinations of the type decide on as separate lexemes outside the corresponding sense of the governing verb. Whether idiomatic combinations such as look after or look up should actually be listed in a special phrasal verbs section or be included amongst all other multi-word lexemes given under a particular headword seems worth considering. The established practice of making a difference between phrasal verbs on the one hand and phrases or idioms on the other results in two complexes violating the alphabetical order of lemmata in the dictionary, which could indeed be taken as an argument for giving up the distinction between idiomatic phrasal verbs and other idiomatic expressions. As far as lemmatisation is concerned, the crucial question is, as so often, the expected needs of the users. Linguistically, there are very good reasons for treating combinations such as of course, let alone or in case as complex lexemes, which means that like washing-up liquid they should have lemma status in any case. For decoding purposes, however, one could imagine users not recognizing the lexeme character of the combination and searching in the dictionary under course or case, for example, which is why the combinations ought to be listed there as well (at least in the form of a cross-reference). For production purposes, any alphabetical arrangement is unsuitable anyway: here, semantically-based access methods will have to be developed – some of the boxes or cross-referencing systems in the existing paper dictionaries can be seen as attempts in that direction; electronic dictionaries may open up more promising options. On the whole, it seems that the insights about the central role of that aspect of language for which Sinclair (1991: 110) coined the term idiom principle are indeed being taken account of in modern learner’s dictionaries of English. But even if dictionaries are increasingly not merely about words, as Dr. Johnson’s definition might suggest, but about combinations of words, it is important that syntagmatic phraseological units should not only be contained in dictionaries but should also be found by the users – and this includes especially those users who are not looking for them.

Notes

1. This definition does not make clear, however, that dictionnaire is also used to cover reference works with encyclopaedic character. (F.J. Hausmann; personal communication)
2. Compare the distinction between colligation and collocation as made by Firth (1956/1968: 106, 113) and Sinclair (2004). Compare also Stubbs (2009): “[1] COLLOCATION is the relation between the node word and individual word-forms which co-occur frequently with it. [2] COLLIGATION is the relation between the node word and grammatical categories which co-occur frequently with it.”
3. For the distinction between “Informationstypen” and “lexikografische Darstellungsformen” see Herbst and Klotz (2003: 32).
This article is not concerned with the aspect of quantitative analysis of different dictionaries.
4. For the different ways in which information on word class could be useful to the user of a dictionary see Bergenholtz and Mugdan (1982: 27–29). For a specification of users’ needs see Tarp (2008: 69–78).
5. Compare the approach taken by Aarts and Aarts (1988) or Herbst and Schüller (2008), for instance.
6. Cf. the two main functions of word class labels identified by Bergenholtz and Mugdan (1982: 29): “(b) Der Benutzer möchte die Flexion des betreffenden Lemmas nachschlagen.... (c) Der Benutzer möchte Genaueres über den syntaktischen Gebrauch des Lemmas erfahren; auch hier können Wortartangaben als erste Hinweise verstanden werden. Wie bei der Flexion muß das jedoch in der Kurzgrammatik erläutert werden.” [(b) The user wants to look up the inflection of the lemma in question.... (c) The user wants to find out more about the syntactic use of the lemma; here, too, word class labels can be understood as first pointers. As with inflection, however, this must be explained in the short grammar.]
7. Cf. also Bergenholtz and Pedersen (1994) and compare the grammatical explanations given in the Madagassisch-deutsches Wörterbuch (1991).
8. Interestingly, dictionaries are relatively inconsistent here. Thus, LDOCE1 has a code (Wa5) to show that an adjective such as nuclear does not usually take comparative or superlative forms. LDOCE4 and OALD7 mark it as “usually before noun”, which COBUILD1 expresses by a label “ATTRIB” and COBUILD4 by a pattern [ADJ n]; MEDAL2 contains no such information for nuclear.
9. For transitivity see also Bergenholtz (1985).
10. OALD7, however, makes use of the terms transitive and intransitive on its study pages, where the codes are explained in some detail with reference to these categories.
11. Interestingly, a German learner’s dictionary such as the Langenscheidt Großwörterbuch Deutsch als Fremdsprache makes use of such labels as vi and vt, whereas the Duden Universalwörterbuch 4 does not.
12. In a questionnaire given to students in a linguistics Hauptseminar (third and fourth year students of English) at the University of Erlangen-Nürnberg, for instance, about one-third were not able to give any definition, or a precise one, of these terms (unpublished).
13. For the development of verb patterns by Palmer and Hornby see Cowie (1999). Compare Hornby (1954), Hanks (2008) and OALD2.
14. This applies in particular to codes referring to functional categories such as “object”, which receive rather different interpretations in different frameworks and teaching traditions. Thus a code such as [T+to-v; obj] used in LDOCE2 for want (You want to see a doctor about your cough) is meant to indicate that no noun phrase object can occur in this sense, which may contradict many users’ notion of object.
15. In systems of pattern indicators, such patterns are indicated by listing these lexical elements such as “1... tell sb (that)... tell sb what/how/where/who etc”. Here, it is not made clear whether (or when) the items listed can realize the complement position on their own and when they are to be interpreted as symbols for a clause: Ingrid told Peter that he should go ahead and tell Pia that her parents were getting a divorce. Jack had to go, but he didn’t tell me why. The danger of misinterpretation is reduced in pattern codes such as those of OALD7 [VN(that)] and [VNwh-] or by a notation such as the one used in VDE: (that)-CL, wh-CL and wh to-INF.
16. “phr v” means that a verb particle construction is listed as a phrasal verb; otherwise it is included as a pattern of the verb entry. Certain discrepancies between the three dictionaries are due to the fact that sometimes certain uses are only given in one dictionary: thus the use of feed into identified in LDOCE4 as a phrasal verb is not covered at all in OALD7 or MEDAL2.
17. One should note that the situation with particle complements does not differ from that of other types of complementation, where, however, it is not seen as a reason in lexicography to create two different entries or subentries. Thus, look in looked after clearly represents a different sense of look from If you look carefully you can see that the painting represents a human figure (LDOCE4), but this applies equally to the di- and the trivalent uses of a verb such as consider in He said he would take time to consider the matter and ... they considered the prize money too low. Still, look after is listed as a separate sublemma in such dictionaries as LDOCE4, OALD7, or MEDAL2, whereas the different uses of consider are indicated as different patterns under different senses of consider.
18. Nevertheless, it has to be said that the liberal practice of identifying so-called phrasal verbs has a number of major drawbacks. Thus, for instance, LDOCE4, OALD7 and MEDAL2 give eat out in the phrasal verbs section, although it could equally well be given under eat ‘have a meal’. For production purposes it certainly does not make sense to separate it from eat because of the supposed idiomaticity of out.
19. Compare Cowie (1999: 54–65), Firth (1956/1968) and Hausmann (2007). See also Hausmann and Blumenthal (2006) for a survey of research on collocation.
20. See Herbst (1996), Nesselhauf (2004), Jehle (2007) or Siepmann (2007) for a survey of different uses of the term collocation.
21. See Greenbaum (1988).
22. For situations of language production where the collocation should be listed under the collocate compare Tarp (2008: 253).
23. See Hausmann (1985: 121–2), Herbst and Klotz (2003: 85). For a detailed analysis of the coverage of collocations in earlier editions of the English learner’s dictionaries see Bahns (1996) and Mittmann (1999).
24. For the Oxford Collocations Dictionary see Klotz (2003) and Lea (2007).
25. Note that the examples given here serve to illustrate a type of presentation but should not necessarily be taken to represent the policy typical of the dictionary from which they are taken.
26. Compare also Hausmann (2007: 217–220).
27. If one considers language use, the notion that one way of expressing a concept or idea in a specific context is more probable than others will also have to be taken account of in this context. This notion, in German referred to as ‘Probabeme’, was developed in the context of bilingual lexicography (Herbst and Klotz 2003; 2008); it covers, for example, the fact that six months is a much more frequent way of referring to the time span in question than half a year or that it is more common to say the two etc. of us etc. than we two.
28. The two extreme points of the gradient can be identified relatively clearly: (1) One end point is made up of words such as washing line or railway station, which consist of more than one word form but which are nevertheless seen as single lexemes by linguists and lexicographers. In dictionaries, such complex lexemes are usually dealt with in a separate entry or possibly a subentry. (2) The other end is represented by single lexemes which enter valency relations with particular grammatical constructions (enjoy + [NP] or + [V-ing-clause]) or collocational relations with particular other words of the language (entirely agree). In dictionaries, this type of information must be given under the respective sense of the lexeme in question.

References

A. Dictionaries

[COBUILD1] Sinclair, J. (ed.). 1987. Collins COBUILD English Language Dictionary. First edition. London: Collins.
[COBUILD4] Sinclair, J. (ed.). 2003. Collins COBUILD Advanced Learner’s English Dictionary. Fourth edition. Glasgow: HarperCollins.
Langenscheidt Collins Großwörterbuch Englisch. 2004. Fifth edition. Berlin/München: Langenscheidt.
[LDOCE1] Procter, P. (ed.). 1978. Longman Dictionary of Contemporary English. First edition. Harlow: Longman.
[LDOCE2] Summers, D. (ed.). 1987. Longman Dictionary of Contemporary English. Second edition. Harlow: Longman.
[LDOCE4] Summers, D. (ed.). 2003. Longman Dictionary of Contemporary English. Fourth edition. Harlow: Longman.
[LGwbDaF] Götz, D., Haensch, G. and Wellmann, H. (eds.). 1993. Langenscheidt Großwörterbuch Deutsch als Fremdsprache. Berlin/München: Langenscheidt.
[MEDAL2] Rundell, M. (ed.). 2007. MacMillan English Dictionary for Advanced Learners. Second edition. Oxford: MacMillan Education.
[OALD2] Hornby, A.S., Gatenby, E.V. and Wakefield, H. (eds.). 1963. The Advanced Learner’s Dictionary of Current English. Second edition. London: Oxford University Press.
[OALD3] Hornby, A.S., Cowie, A.P. and Lewis, J.W. (eds.). 1974. Oxford Advanced Learner’s Dictionary of Current English. Third edition. London: Oxford University Press.
[OALD6] Wehmeier, S. (ed.). 2000. Oxford Advanced Learner’s Dictionary. Sixth edition. Oxford: Oxford University Press.
[OALD7] Wehmeier, S., McIntosh, C. and Turnbull, J. (eds.). 2005. Oxford Advanced Learner’s Dictionary. Seventh edition. Oxford: Oxford University Press.
[OCD] Deuter, M. (ed.). 2002. Oxford Collocations Dictionary for Students of English. Oxford: Oxford University Press.
[SOED] Brown, L. (ed.). 1993. The New Shorter Oxford English Dictionary. Oxford: Clarendon.
[VDE] Herbst, T. (ed.). 2004. A Valency Dictionary of English. Berlin: Mouton de Gruyter.

B. Other literature

Aarts, F. and Aarts, J. 1982/1988. English Syntactic Structures. New York/Leyden: Prentice Hall/Martinus Nijhoff.
Bahns, J. 1996. Kollokationen als lexikographisches Problem: Eine Analyse allgemeiner und spezieller Lernerwörterbücher des Englischen. Tübingen: Max Niemeyer.
Bergenholtz, H. 1984. “Grammatik im Wörterbuch: Syntax.” In Studien zur neuhochdeutschen Lexikographie V, H.E. Wiegand (ed.), 1–46. Hildesheim/Zürich/New York: Georg Olms.
Bergenholtz, H. 1985. “Vom wissenschaftlichen Wörterbuch zum Lernerwörterbuch.” In Lexikographie und Grammatik, H. Bergenholtz and J. Mugdan (eds.), 225–277. Tübingen: Max Niemeyer.
Bergenholtz, H. 2002. “Das de Gruyter Wörterbuch Deutsch als Fremdsprache und das neue Duden-Wörterbuch in zehn Bänden: Ein Vergleich in Hinblick auf die Grammatik.” In Perspektiven der pädagogischen Lexikographie des Deutschen II, H.E. Wiegand (ed.), 35–53. Tübingen: Max Niemeyer.
Bergenholtz, H. et al. (eds.). 1991. Madagassisch-deutsches Wörterbuch/Rakibolana Malagasy-Alema. Moers: Edition Aragon.
Bergenholtz, H. and Mugdan, J. 1982. “Grammatik im Wörterbuch: Probleme und Aufgaben.” In Studien zur neuhochdeutschen Lexikographie II, H.E. Wiegand (ed.), 17–36. Hildesheim/New York: Georg Olms.
Bergenholtz, H. and Pedersen, J. 1994. “Grammar in bilingual LSP dictionaries, with a special view to technical English.” In Fachlexikographie: Fachwissen und seine Repräsentation in Wörterbüchern, B. Schaeder and H. Bergenholtz (eds.), 351–383. Tübingen: Gunter Narr.
Bergenholtz, H. and Schaeder, B. 1977. Die Wortarten des Deutschen. Stuttgart: Klett.
Cowie, A.P. 1999. English Dictionaries for Foreign Learners. Oxford: Oxford University Press.
Cowie, A.P. and Mackin, R. 1974. Oxford Dictionary of Current Idiomatic English. Oxford: Oxford University Press.
Cruse, D.A. 1986. Lexical Semantics. Cambridge: Cambridge University Press.
Fillmore, C.J., Kay, P. and O’Connor, M.C. 1988. “Regularity and idiomaticity in grammatical constructions: The case of let alone.” Language 64: 501–538.
Firth, J.R. 1956/1968. “Descriptive linguistics and the study of English.” In Selected Papers of J R Firth 1952–59, F. Palmer (ed.), 96–114. London/Harlow: Longman.
Greenbaum, S. 1988. “Some verb-intensifier collocations in American and British English.” In Good English and the Grammarian, 113–124. London: Longman.
Hanks, P. 2008. “Lexical patterns: From Hornby to Hunston and beyond.” In Proceedings of the XIII EURALEX International Congress, E. Bernal and J. DeCesaris (eds.), 89–129. Barcelona: IULA.
Hausmann, F.J. 1984. “Wortschatzlernen ist Kollokationslernen.” Praxis des neusprachlichen Unterrichts 31: 395–406.
Hausmann, F.J. 1985. “Kollokationen im deutschen Wörterbuch: Ein Beitrag zur Theorie des lexikographischen Beispiels.” In Lexikographie und Grammatik, H. Bergenholtz and J. Mugdan (eds.), 118–129. Tübingen: Max Niemeyer.
Hausmann, F.J. 2007. “Die Kollokationen im Rahmen der Phraseologie: Systematische und historische Darstellung.” Zeitschrift für Anglistik und Amerikanistik 55.3: 217–234.
Hausmann, F.J. and Blumenthal, P. 2006. “Présentation: collocations, corpus, dictionnaires.” In Collocations, Corpus, Dictionnaires [Langue Française 150], P. Blumenthal and F.J. Hausmann (eds.), 3–13.
Herbst, T. 1996. “What are collocations: sandy beaches or false teeth?” English Studies 77(4): 379–393.
Herbst, T. and Klotz, M. 2003. Lexikografie. Paderborn: Schöningh.
Herbst, T. and Klotz, M. 2008. “Chunks statt Chomsky: Zur Rolle von lexikalischen Patterns im Fremdsprachenerwerb.” In FFF Fortschritte im frühen Fremdsprachenlernen, H. Böttger (ed.), 336–247. München: Domino.
Herbst, T. and Mittmann, B. 2008. “Collocation in English dictionaries at the beginning of the twenty-first century.” In Kollokationen in der europäischen Lexikographie und Wörterbuchforschung, F.J. Hausmann (ed.). Lexicographica 24, 103–120. Tübingen: Max Niemeyer.
Herbst, T. and Schüller, S. 2008. Introduction to Syntactic Analysis: A Valency Approach. Tübingen: Gunter Narr.
Hornby, A.S. 1954. A Guide to Patterns and Usage in English. London: Oxford University Press.
Huddleston, R. and Pullum, G.K. (eds.). 2002. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press.
Jehle, G. 2007. The Advanced Foreign Learner’s Mental Lexicon. Hamburg: Dr. Kovač.
Johnson, S. 1755. A Dictionary of the English Language. London.
Klotz, M. 2001. “Valenzinformation im monolingualen englischen Lernerwörterbuch und im bilingualen Wörterbuch englisch-deutsch.” Zeitschrift für Angewandte Linguistik 2001: 61–79.
Klotz, M. 2003. “[Review of] J. Crowther, S. Dignen & D. Lea (eds.), Oxford Collocations Dictionary for Students of English.” International Journal of Lexicography 16(1): 57–61.
Lea, D. 2007. “Making a Collocations Dictionary.” Zeitschrift für Anglistik und Amerikanistik 55.3: 261–271.
Leech, G. 1981. Semantics. Harmondsworth: Penguin.
Mittmann, B. 1999. “The treatment of collocations in OALD5, LDOCE3, COBUILD2 and CIDE.” In The Perfect Learners’ Dictionary, T. Herbst and K. Popp (eds), 101–111. Tübingen: Max Niemeyer.
Nesselhauf, N. 2004. Collocations in a Learner Corpus. Amsterdam/Philadelphia: John Benjamins.
Quirk, R., Greenbaum, S., Leech, G. and Svartvik, J. 1985. The Comprehensive Grammar of the English Language. London/New York: Longman. [CGEL]
Siepmann, D. 2007. “Collocations and examples: Their relationship and treatment in a new corpus-based learner’s dictionary.” Zeitschrift für Anglistik und Amerikanistik 55.3: 235–260.
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Sinclair, J. 2004. Trust the Text: Language, corpus and discourse. London/New York: Routledge.
Stubbs, M. 2009. “Technology and phraseology: With notes on the history of corpus linguistics.” In Exploring the Lexis-Grammar Interface, U. Römer and R. Schulze (eds), 15–32. Amsterdam/Philadelphia: John Benjamins.
Tarp, S. 2008. Lexicography in the borderland between Knowledge and Non-Knowledge: General lexicographical theory with particular focus on learner’s lexicography [Lexicographica. Series Maior 134]. Tübingen: Max Niemeyer.

Henning Bergenholtz: Bibliovita

Sandro Nielsen and Sven Tarp

Henning Bergenholtz’s list of publications is so long and he has addressed so many topics that the best way in which to do him justice is to list the most important of his published works chronologically. This temporal presentation also gives an indication of the long period in which he has contributed to research and therefore influenced both national and international research communities. Not only has Henning Bergenholtz published theoretical works on lexicography and other topics, but he is also the author and co-author of a range of dictionaries. His theoretical publications in a variety of languages include books and papers on the development of theories and principles for the compilation of dictionaries as well as reviews of dictionaries and monographs in a constant quest to learn from other scholars. The dictionaries he has authored and co-authored cover the spectrum from monolingual general dictionaries to bilingual specialised dictionaries with languages such as Danish, Dutch, English, Malagasy and Spanish within fields as diverse as microbiology and accounting. In addition, he has been the editor of several books and academic journals, e.g. Hermes and LexicoNordica (he is one of the founding fathers of both), he has been, and still is, on the advisory and editorial boards of several international journals, and he has acted as consultant for several publishing houses around the world.

The list is divided into two parts. The first is a list of the dictionaries authored and co-authored by Henning Bergenholtz and the second is a list of theoretical contributions. Titles with Henning Bergenholtz as the sole author or editor contain no author name, whereas the names of co-authors and co-editors are indicated in all other cases.

1. Dictionaries

With Katja Laursen, Sophie Leegaard, Maj Bukhave, Marta Sørensen and Heidi Agerbo Pedersen. Den Danske Netordbog. Second edition. Aarhus: Centre for Lexicography, Aarhus School of Business 2008.
With Vibeke Vrang and Richard Almind. Ordbogen over Faste Vendinger. Aarhus: Centre for Lexicography, Aarhus School of Business 2007.
With Sandro Nielsen, Lise Mourier, Mia Johnsen, Jóna Ellendersen, Rie Bobjerg Nielsen and Malene Vendelbo Visholm. Regnskabsordbogen engelsk-dansk. Copenhagen: Thomson 2007.
With Vibeke Vrang, Richard Almind, Katja Laursen, Sophie Leegaard, Maj Bukhave, Mia Johnsen, Lena Lund, Helle Grønborg, Maria Bruun Jensen, Signe Rixen Larsen and Rikke Refslund. Den Danske Netordbog. First edition. Odense: Cool Systems 2006.
With Inger Bergenholtz and Richard Almind. Musikordbogen. Aarhus: Centre for Lexicography, Aarhus School of Business 2006.
With Sandro Nielsen and Lise Mourier. Den Engelsk-Danske Regnskabsordbog/English-Danish Dictionary of Accounting. Aarhus: Centre for Lexicography, Aarhus School of Business 2006.
With Sandro Nielsen and Lise Mourier. Den Engelske Regnskabsordbog/English Dictionary of Accounting. Aarhus: Centre for Lexicography, Aarhus School of Business 2006.
With Annelies van Hees, Harry Perridon, Gitte Möller and Andrea Voigt. Prisma groot woordenboek Deens-Nederland. Utrecht: Het Spectrum 2004.
With Gitte Möller, Andrea Voigt, Annelies van Hees and Harry Perridon. Dansk-Hollandsk Ordbog. Copenhagen: Gyldendal 2004.
With Sandro Nielsen and Lise Mourier. Regnskabsordbogen dansk-engelsk. Copenhagen: Thomson 2004.
With Sandro Nielsen, Lise Mourier, Brit Sørensen, Helle Grønborg, Mads Melgaard, Trine Middelboe and Richard Almind. Den Danske Regnskabsordbog. Aarhus: Centre for Lexicography, Aarhus School of Business 2003.
With Sandro Nielsen, Lise Mourier, Brit Sørensen, Mads Melgaard, Trine Middelboe and Richard Almind. Den Dansk-Engelske Regnskabsordbog/Danish-English Dictionary of Accounting. Aarhus: Centre for Lexicography, Aarhus School of Business 2003.
With Vibeke Vrang and Richard Almind. Den Danske Netordbog. Aarhus: Centre for Lexicography, Aarhus School of Business 2002.
With Susanne Frandsen. Ordbogen om kvinden. Copenhagen: Forum 2001.
With Vibeke Vrang, Richard Almind and Jette Pedersen. DanskOrdbogen. Aarhus: Systime 1999.
With Uwe Kaufmann, Sven Tarp, Bjarne Stumman and Laura de la Rosa Marabet. Diccionario Enciclopédico de Ingeniería Genética Español-Inglés. Toronto: Lugus 1998.
With Uwe Kaufmann, Sven Tarp, Laura de la Rosa Marabet and Bjarne Stumman. Encyclopedic Dictionary of Gene Technology. English-Spanish. Toronto: Lugus 1998.
With Ilse Cantell, Ruth V. Fjeld, Dag Gundersen, Jón Hilmar Jónsson and Bo Svensén. Nordisk leksikografisk ordbok. Oslo: Nordisk forening for leksikografi 1997.
With Suzy Rajaonarivo. Deutsch-Madagassisches Wörterbuch/Rakibolana Alema-Malagasy. Moers: Aragon 1994.
With Ilse Cantell, Ruth V. Fjeld, Dag Gundersen and Jón Hilmar Jónsson. Ordliste til Nordisk leksikografisk ordbog (NLO). Oslo: Nordisk forening for leksikografi 1993.
Dansk frekvensordbog. Baseret på tekster fra danske romaner, ugeblade og aviser fra 1987–1990. Copenhagen: Gad 1992.
With Uwe Kaufmann. Genteknologisk ordbog. Dansk-engelsk/engelsk-dansk molekylær-biologi og DNA-teknologi. Copenhagen: Gad 1992.
With Suzy Rajaonarivo. Madagassisch-Deutsches Wörterbuch/Rakibolana Malagasy-Alema. Moers: Aragon 1991.
Frekvensordbog. Baseret på danske romaner, ugeblade og aviser fra 1987–1988. Aarhus: Aarhus School of Business 1989.

2. Theoretical contributions

2009
“Language policy.” In: Dictionaries. An international encyclopedia of lexicography. Supplementary volume: Recent developments with special focus on computational lexicography. Berlin: Walter de Gruyter (forthcoming).
With Sven Tarp. “LSP Lexicography or Terminography? The lexicographer’s point of view.” In Future Trends in Specialised Dictionaries for Learners. A Festschrift in honour of Enrique Alcaraz Varó, Pedro Fuertes-Olivera (ed.) (forthcoming).
 Sandro Nielsen and Sven Tarp

With Loránd-Levente Pálfi. “Das erste dänische Nationalwörterbuch.” In Die großen Lexika Europas, unknown editors. Berlin: Walter de Gruyter (forthcoming). 2008 With Carsten Bergenholtz and Sven Tarp. “Leksikografi i videnskabsteoretisk perspektiv: sand, falsk eller irrelevant.” LexicoNordica. 2008–15: 155–168. With Loránd-Levente Pálfi. “Danmarks første nationalordbog: Om Videnskabernes Selskabs Ordbog I-VIII (1793–1905).” Fund og Forskning i det Kongelige Biblioteks samlinger. 2008, vol. 47: 181–222. “Hvilke krav skal en e-ordbog opfylde for at kunne anvendes som brugbart værktøj?” DF Revy. 2008, vol. 5: 12–14. “Ordbogsanmeldelser før og nu.” In Nog orddat? Festskrift for Sven-Göran Malmgren, K. Jóhannesson, H. Landqvist, A. Lundqvist, L. Rogström, E. Sköldberg, B.W. Hemlin (eds), 50–58. Göteborg: Meijerbergs institut för svensk etymologisk forskning 2008. With Rufus Gouws. “The access process in dictionaries for fixed expressions.” Lexicographica 23/2007: 237–260. With Sven-Göran Malmgren. “Ordbogsbrug i Norden.” LexicoNordica 2008, vol. 15: 1–4. “Von Wortverbindungen, die sie Kollokationen nennen.” Lexicographica. 24/2008: 9–20. 2007 With Richard Almind. “Fysisk genstand med ben og bagdel: funktionel fraseologi.” In: Det bedre argument: Festskrift til Ole Togeby, H. Jørgensen, P. Widell (eds), 13–34. Aarhus: Wessel og Huitfeldt 2007. With Inger Bergenholtz. “A timeless music dictionary.” Lexikos 2007, vol. 17: 407–415. “Concrete treatment of culture bounded terms and collocations in translational dictionaries.” In 6th Symposium on Translation, Terminology and Interpretation in Cuba and Canada. Toronto: Canadian Translators, Terminologists and Interpreters Council 2007.

With Wolfgang Koch. “Hermes in the past and in the future.” Hermes 2007, vol. 39: 7–9. With Rufus Gouws. “Korrek, volledig, relevant: Dít is die fraag aan leksikografiese definisies.” Tydskrif vir Geesteswetenskappe 2007, vol. 47, no. 4: 568–586. With Inger Bergenholtz. “Kvaliteten af angivelser i encyklopædier og leksika.” LexicoNordica 2007, vol. 14: 11–34. With Mia Johnsen. “Log files can and should be prepared for a functionalistic approach.” Lexikos 2007, vol. 17: 1–20. With Sven-Göran Malmgren. “Nutidens och framtidens lexikon och encyklopedier.” LexicoNordica 2007, vol. 14: 1–9. “Om at parkere aben og andre fraseologiske udtryk.” In 11. Møde om Udforskningen af Dansk Sprog. Aarhus: Nordisk Institut, Aarhus Universitet 2007: 37–51. With Sven Tarp. “Politik und Sprachpolitik in der Lexikographie.” In Dictionary Visions, Research and Practice: Selected papers from the 12th International Symposium on Lexicography, Copenhagen 2004, H. Gottlieb, J.E. Mogensen (eds), 217– 240. Amsterdam/Philadelphia: John Benjamins 2007. With Esben Bjærge. “Værkstedsrapport.” In Ordbogen over Faste Vendinger, H. Bergenholtz et al. (eds). Aarhus: Centre for Lexicography, Aarhus School of Business 2007. With Loránd-Levente Pálfi. “Dansk Glossarium – til at hjelpe de af Lægfolk tilrette, som gjerne gad læst en dansk Bog.” Nomos 2007, vol. 5, no. 1: 55–91. 2006 With Richard Almind and Vibeke Vrang. “Theoretical and Computational Solutions for Phraseological Lexicography.” Linguistik online 2006, vol. 27, no. 2/06. With Vibeke Vrang. “Den Danske Ordbog: en ordbog for lingvister!” LexicoNordica 2006, vol. 13: 185–196. With Jan Engberg. “Hermes in the future.” Hermes 2006, vol. 36: 7–8. With Sven-Göran Malmgren. “Historiske ordbøger.” LexicoNordica 2006, vol. 13: 1–5. “How to do language policy with dictionaries.” Lexikos 2006, vol. 16: 13–45.

“Idiomwörterbücher und ihre Benutzer.” In Wörter – Verbindungen: Festschrift für Jarmo Korhonen, U. Breuer and I. Hyvärinen (eds), 19–30. Frankfurt am Main: Peter Lang 2006. With Mia Johnsen. “Language Policy and Communication Policy – Same Same but Different?” Hermes 2006, vol. 37: 95–114. With Rufus Gouws. “Lexicography and Language Policy.” In African Association for Lexicography. Afrilex 2006. The User Perspective in Lexicography. Pretoria: SF Press 2006: 25–26. “Ny dansk disputats med forslag til en generel metaleksikografisk teori.” LEDA-nyt 2006, nr. 42: 31–32. “Sprachkultur, Lexikographie und Wörterbuchbenutzung: Bemerkungen zu Jürgen Scharnhorst (Hrsg.): Sprachkultur und Lexikographie. Von der Forschung zur Nutzung von Wörterbüchern. Frankfurt am Main: Peter Lang 2004.” Hermes 2006, vol. 37: 131–137. With Sandro Nielsen. “Subject-field components as integrated parts of LSP dictionaries.” Terminology 2006, vol. 12, no. 2: 281–303. “Towards a Definition of ‘communication policy’, ‘language policy’, and ‘language planning’.” Stellenbosch Papers in Linguistics. SPIL PLUS 2006, no. 34: 1–34. “US moet nog ver vorder.” Die Burger 25 March 2006: 8. 2005 “Alt for lidt om meget.” LexicoNordica 2005, vol. 12: 153–164. With Vibeke Vrang. “Den Danske Ordbog bind 2 (E-H) og 3 (I-L) – en ordbog for folket eller for akademikere?” LexicoNordica 2005, vol. 12: 169–188. “Den usynlige elektroniske produktions- og korrekturordbog.” LexicoNordica 2005, vol. 12: 19–38. “Deskriptive, præskriptive og proskriptive angivelser om substantivers pluralis.” In Nordiske studier i leksikografi 7: Rapport fra Konferanse om leksikografi i Norden, R.V. Fjeld and D. Worren (eds), 36–51. Oslo: Nordisk forening for leksikografi 2005. With Sven Tarp. “Dictionaries and inflectional morphology.” In Encyclopedia of Language and Linguistics, Vol. 3, K. Brown (ed), 577–580. Oxford: Elsevier 2005.

With Sven Tarp. “Dictionaries and word formation.” In Encyclopedia of Language and Linguistics, Vol. 3, K. Brown (ed), 580–583. Oxford: Elsevier 2005. With Sven Tarp. “Electronic dictionaries: old and new lexicographic solutions.” Hermes 2005, vol. 34: 7–9. “Eva Wiesmann: Rechtsübersetzung und Hilfsmittel zur Translation. Wissenschaftliche Grundlagen und computergestützte Umsetzung eines lexikographischen Konzepts.” Hermes 2005, vol. 35: 259–266. “Falsche und richtige lexikographische Definitionen.” In Symposium on Lexicography XI. Proceedings of the Eleventh International Symposium on Lexicography May 2–4, 2002 at the University of Copenhagen, H. Gottlieb, J.E. Mogensen, A. Zettersten (eds), 125–132. Tübingen: Max Niemeyer 2005. With Sven-Göran Malmgren. “Förord.” LexicoNordica 2005, vol. 12: 1–6. With Sandro Nielsen. “Henning Koch og Anne Lise Kjær (red.): Europæisk retskultur – på dansk. København: Forlaget Thomson, 2004.” Hermes 2005, vol. 36: 181–189. With Jan Engberg. “Hermes introduces new initiatives.” Hermes 2005, vol. 35: 9–10. “Lemmaselektion in deutschen und deutsch-finnischen Wörterbüchern.” In Schreiben, Verstehen, Übersetzen und Lernen: Zu ein- und zweisprachigen Wörterbüchern mit Deutsch, I. Barz, H. Bergenholtz and J. Korhonen (eds), 147–164. Frankfurt am Main/Bern/New York/Paris: Peter Lang 2005. “Marta Chromá: Legal Translation and the Dictionary.” Hermes 2005, vol. 35: 255–258. “Metaleksikografiske bidrag om bilingvale juridiske ordbøger.” LexicoNordica 2005, vol. 12: 165–168. With Sven Tarp. “Política Lingüística: Conceptos y definiciones.” In 5th Symposium on Translation, Terminology and Interpretation in Cuba and Canada. Ontario, Canada: Canadian Translators, Terminologists and Interpreters Council 2005: 1–19. Co-edited with Irmhild Barz and Jarmo Korhonen. Schreiben, Verstehen, Übersetzen und Lernen: Zu ein- und zweisprachigen Wörterbüchern mit Deutsch. Frankfurt am Main: Peter Lang 2005. With Sven Tarp. “Verteilungsstrukturen in Wörterbüchern.” In Schreiben, Verstehen, Übersetzen und Lernen: Zu ein- und zweisprachigen Wörterbüchern mit Deutsch, I. Barz, H. Bergenholtz and J. Korhonen (eds), 119–126. Frankfurt am Main: Peter Lang 2005. With Irmhild Barz and Jarmo Korhonen. “Vorwort.” In Schreiben, Verstehen, Übersetzen und Lernen: Zu ein- und zweisprachigen Wörterbüchern mit Deutsch, I. Barz, H. Bergenholtz and J. Korhonen (eds), 9–10. Frankfurt am Main: Peter Lang 2005. “Wortbildungen im Text und im Wörterbuch.” In Zwischen Lexikon und Text. Lexikalische, stilistische und textlinguistische Aspekte, U. Fix, G. Lerchner, M. Schröder and H. Wellmann (eds), 125–143. Stuttgart: Verlag der Sächsischen Akademie der Wissenschaften zu Leipzig 2005. With Sven Tarp. “Wörterbuchfunktionen.” In Schreiben, Verstehen, Übersetzen und Lernen: Zu ein- und zweisprachigen Wörterbüchern mit Deutsch, I. Barz, H. Bergenholtz and J. Korhonen (eds), 11–25. Frankfurt am Main: Peter Lang 2005. With Mia Johnsen. “Log files as a tool for improving Internet dictionaries.” Hermes 2005, vol. 34: 117–141. Co-edited with Sven Tarp. Thematic Section: Online Dictionaries. Aarhus School of Business: Fakultet for Sprog og Erhvervskommunikation 2005. 2004 “Den Danske Ordbog er en helt ny ordbog i seks store bind.” LEDA Nyt 2004, vol. 34: 4–5. With Vibeke Vrang. “Den danske ordbog imponerer og skuffer.” Hermes 2004, vol. 33: 149–178. With Sven-Göran Malmgren. “Forord.” LexicoNordica 2004, vol. 11: 1–4. With Vibeke Vrang. “Ny dansk ordbog i seks bind for sekretærer og forskere.” LexicoNordica 2004, vol. 11: 165–190. “Spannende Geschichte der deutschen Wörterbuchgeschichte? Zu Ulrike Haß-Zumkehr: Deutsche Wörterbücher – Brennpunkt von Sprach- und Kulturgeschichte. Berlin/New York 2001: Walter de Gruyter.” Neuphilologische Mitteilungen 2004, vol. 104: 506–509. “Sprachpolitik: Also: Deutsch sprechen Deutsche in deutschen Landen. Aber warum schreibt man manchmal Deutsch und manchmal deutsch?” EliS_e, Temanummer gewidmet K.-D.Bünting zum 65sten Geburtstag 2004, vol. 4, no. 1: 43–54.

With Sven Tarp. “The concept of ‘dictionary usage’.” In Nordic Journal of English Studies. Worlds of Words. A tribute to Arne Zettersten, Cay Dollerup (ed). Special issue 2004, vol. 3, no. 1: 23–36. “Weiwei Wang: Zweisprachige Fachlexikographie. Benutzungsforschung, Typologie und mikrostrukturelle Konzeption. Frankfurt a.M.: Peter Lang 2002.” Lexicographica 19/2003: 333–378. “Wörterbuchkritik in den nordischen Ländern: Bericht über ein Symposium in Kopenhagen am 8.2 und 9.2.2003.” Lexicographica 19/2003: 314–319. 2003 With Richard Almind. “Internetordbøgers layout.” In Nordiske studier i leksikografi 6. Rapport fra Konference om leksikografi i Norden, Z.S. Hansen and A. Johansen (eds), 15–29. Tórshavn: Nordisk forening for leksikografi 2003. “Bryder Dansk Sprognævn den danske sproglov? Sprogpolitik i teori og praksis.” In Från Närpesdialekt till EU-svenska. Festskrift till Kristina Nikul, H. Lönnroth (ed), 17–31. Tammerfors: Tampere University Press 2003. “Die Entwicklung der Lemmaselektion.” In Untersuchungen zur kommerziellen Lexikographie der deutschen Gegenwartssprache I. »Duden. Das große Wörterbuch der deutschen Sprache in zehn Bänden«, H.E. Wiegand (ed), 83–98. Tübingen: Max Niemeyer 2003. With Sven Tarp. “Die moderne lexikographische Funktionslehre: Diskussionsbeitrag zu neuen und alten Paradigmen, die Wörterbücher als Gebrauchsgegenstände verstehen.” Lexicographica 18/2002: 253–263. “Förord.” LexicoNordica 2003, vol. 10: 1–5. “Ordbogskritik i LexicoNordica.” LexicoNordica 2003, vol. 10: 7–26. “Sprachnormierung in nordischen Wörterbüchern: Bericht über ein Symposium in Kopenhagen am 8.-10. Februar 2002.” Lexicographica 18/2002: 264–269. With Jonna Bisgaard, Majken Brunsborg and Kamilla Kvist Wichmann. “Sprogpolitik: So ein Ding müssen wir auch haben.” Hermes 2003, vol. 31: 135–166. With Sandro Nielsen. “Terms in the Language of Culture-Dependent LSP Dictionaries.” Lexicographica 18/2002: 5–18. With Sven Tarp. “Two opposing theories: On H.E. Wiegand’s recent discovery of lexicographic functions.” Hermes 2003, vol. 31: 171–196.

“User-oriented Understanding of Descriptive, Proscriptive and Prescriptive Lexicography.” Lexikos 2003, vol. 13: 65–80. 2002 “Das de Gruyter Wörterbuch Deutsch als Fremdsprache und das neue DUDEN-Wörterbuch in zehn Bänden. Ein Vergleich im Hinblick auf die Grammatik.” In Perspektiven der pädagogischen Lexikographie des Deutschen II. Untersuchungen anhand des de Gruyter Wörterbuches Deutsch als Fremdsprache, H.E. Wiegand (ed), 36–56. Tübingen: Max Niemeyer 2002. With Sven-Göran Malmgren. “Forord.” LexicoNordica 2002, vol. 9: 1–4. With Jan Engberg. “Forord til Hermes 29 og NetHermes 29.” Hermes 2002, vol. 29: 9–11. “Hemmelighed om kvinder og penge.” Finans. Fagblad for Finansforbundet 2002, no. 6. With Christina Bøgelund. “Hvor præskriptiv er en deskriptiv ordbog? Hvor deskriptiv er en præskriptiv ordbog?” LexicoNordica 2002, vol. 9: 79–108. Introduction to Morphology (in Korean) (translation of book from 1979). Younghyundong, Namgu, Incheon. Korea: Inha University Press 2002. “Leksikologiske analyser og beskrivelser ctr. leksikografiske angivelser. Bemærkninger til Undine Kramer: Lexikologisch-lexikographische Aspekte der deutschen Gegenwartssprache. Berlin 2000.” Hermes 2002, vol. 29: 21–31. With Vibeke Vrang. “Ret og pligt. Om nye danske retskrivningsordbøger.” Hermes 2002, vol. 29: 197–216. “Towards a User-oriented Understanding of Descriptive, Proscriptive and Prescriptive Lexicography.” Afrilex 2002. Culture and Dictionaries. 2002: 12–13. 2001 With Richard Almind. “Orddelingsregler.” Festskrift til Martin Gellerstam den 15. oktober 2001. Gäller stam, suffix och ord, S. Allén, S. Berg, S.-G. Malmgren, K. Norén and B. Ralph (eds), 19–32. Göteborg: Göteborgs Universitet, Meijerbergs institut för svensk etymologisk forskning 2001.

With Irma Hyvärinen and Jarmo Korhonen. “Bericht über Internationale Lexikografiekonferenz in Helsinki, 30.3 bis 1.4.2000.” Lexicographica 16/2000: 66–72. With Henning Andersen. “Brug af surveys til lingvistiske undersøgelser. Om Erik Jørgen Hansen & Bjarne Hjort Andersen: Et sociologisk værktøj: introduktion til den kvantitative metode.” Hermes 2001, vol. 27: 201–209. With Jan Engberg. “Forord.” Hermes 2001, vol. 26: 9–10. “Förord.” LexicoNordica 2001, vol. 8: 1–4. “Kontrastive Analysen Deutsch-Madagassisch: Eine Übersicht.” In: Deutsch als Fremdsprache. Ein internationales Handbuch, G. Helbig, L. Götze, G. Henrici and H.-J. Krumm (eds), 470–480. Berlin/New York: Walter de Gruyter 2001. With John Bergenholtz. “Lars Eriksen: Tysk-dansk computerordbog med de engelske fagudtryk/Deutsch-dänisches Computerwörterbuch mit den englischen Fachausdrücken. 2000.” Hermes 2001, vol. 27: 193–200. “Leksikon eller encyklopædi – støtte eller ej.” Jyllands-Posten 7 September 2001. “Lexikografie ist Selektion ist Selektion ist Selektion....” In Von der mono- zur bilingualen Lexikografie für das Deutsche, J. Korhonen (ed), 11–30. Frankfurt am Main: Peter Lang 2001. “Lexikographische Wörterbücher. Werkstattbericht am Beispiel der Termini Deskription und Präskription.” In Germanistentreffen. Deutschland – Dänemark – Finnland – Island – Norwegen – Schweden 9.-13.10.2000. Dokumentation der Tagungsbeiträge. Bonn: DAAD 2001: 179–195. “Mindre kan være bedre end mere.” VID-Nyt. Tidsskrift for Voksenundervisningssektionen i Dansklærerforeningen 2001, no. 15: 31–34. “Proskription, oder: So kann man dem Wörterbuchbenutzer bei Textproduktionsschwierigkeiten am ehesten helfen.” In Sprache im Alltag. Beiträge zu neuen Perspektiven in der Linguistik. Herbert Ernst Wiegand zum 65. Geburtstag gewidmet, A. Lehr, M. Kammerer, K.-P. Konerding, A. Storrer, C. Thimm and W. Wolski (eds), 499–520. Berlin/New York: Walter de Gruyter 2001. Co-edited with Karen M. Lauridsen, Jan Engberg. Text & Translation. Aarhus: Aarhus School of Business, Det Erhvervssproglige Fakultet 2001.

2000 With Richard Almind. “Die ästhetische Dimension der Lexikographie.” In Bild im Text – Text und Bild, U. Fix and H. Wellmann (eds), 259–288. Heidelberg: Universitätsverlag C. Winter 2000. With Richard Almind. “Orddeling: Principper og omsætning til leksikografisk praksis.” LexicoNordica 2000, vol. 7: 133–148. With Sven-Göran Malmgren. “Förord.” LexicoNordica 2000, vol. 7: 1–4. “Hvordan er ordforrådet systematiseret i sprogbrugernes hjerner? Anmeldelse af: Gerhard Augst: Wortfamilienwörterbuch der deutschen Gegenwartssprache. Tübingen: Max Niemeyer 1998.” LexicoNordica 2000, vol. 7: 205–212. With Sanne Jensen. “Inddragelse af informanter ved ordbogsarbejde.” LexicoNordica 2000, vol. 7: 149–166. “Informantenbefragung zur Auffindung von grammatischen Regeln für Textproduktion, für Textrezeption und für die Erlernung einer Sprache. Reflexionen zu: Bengt Sandberg: Zum ‘es’ bei transitiven Verben vor satzförmigem Akkusativobjekt. Tübingen: Narr, 1998.” Hermes 2000, vol. 25: 143–158. With Udo Miebs. “Lernerwörterbuch und Lernwörterbuch. Zu ‘de Gruyter Wörterbuch Deutsch als Fremdsprache’. Anmeldelse af Günter Kempcke m.fl.: Wörterbuch Deutsch als Fremdsprache. Berlin/New York: Walter de Gruyter, 2000.” Hermes 2000, vol. 26: 159–170. “Lexikographie und Wortbildungsforschung.” In: Praxis- und Integrationsfelder der Wortbildungsforschung, I. Barz, M. Schröder and U. Fix (eds), 19–30. Heidelberg: Winter 2000. “Nullelemente in der Morphologie.” In Morphologie: ein internationales Handbuch zur Flexion und Wortbildung/Morphology: An International Handbook on Inflection and Word-Formation, G. Booij, C. Lehmann and J. Mugdan (eds), 435–450. Berlin/New York: Walter de Gruyter 2000. With Irma Hyvärinen and Jarmo Korhonen. “Om den ‘Internationale Lexikografiekonferenz’ i Helsinki.” LexicoNordica 2000, vol. 7: 195–204. “So ein Ding müssen wir auch haben. Anmeldelse af: Wolfgang Müller: Das Gegenwort-Wörterbuch. Ein Kontrastwörterbuch mit Gebrauchshinweisen. Berlin/New York: Walter de Gruyter 1998.” LexicoNordica 2000, vol. 7: 213–222.

1999 “Das schlaue Buch. Vermittlung von Informationen für textbezogene und textunabhängige Fragestellungen.” In Proceedings of the Eighth International Symposium on Lexicography at the University of Copenhagen 1998, A. Zettersten, J.E. Mogensen and V.H. Pedersen (eds), 93–110. Tübingen: Max Niemeyer 1999. With Jette Pedersen. “Fachwörterbücher als Hilfsmittel bei der Übersetzung von Fachtexten.” In Fachsprachen. Ein internationales Handbuch zur Fachsprachenforschung und Terminologiewissenschaft, L. Hoffmann, H. Kalverkämper and H.E. Wiegand (eds), 1884–1889. Berlin/New York: Walter de Gruyter 1999. With Sven-Göran Malmgren. “Förord.” LexicoNordica 1999, vol. 6: 1–2. With Jan Engberg. “Hermes i sit 11. år.” Hermes 1999, vol. 22: 9–11. “Interjektionen im Kontrast. Am Beispiel der deutschen, madagassischen, englischen und französischen Sprache/Janie Noëlle Rasoloson. Frankfurt a.M.: Peter Lang.” Lexicographica 14/1998: 241–243. “Interjektionen im Kontrast. Am Beispiel der deutschen, madagassischen, englischen und französischen Sprache/Janie Noëlle Rasoloson. Frankfurt a.M. usw.: Peter Lang.” LexicoNordica 1999, vol. 6: 282–284. With Hans-Peder Kromann, Herbert Ernst Wiegand. “Die Berücksichtigung der Fachlexikographie in der neueren Wörterbuch- und Fachsprachenforschung: eine sachliche und bibliographische Übersicht.” In Fachsprachen: Ein internationales Handbuch zur Fachsprachenforschung und Terminologiewissenschaft, L. Hoffmann, H. Kalverkämper, H.E. Wiegand together with Christian Galinski and Werner Hüllen (eds), 1889–1909. Berlin/New York: Walter de Gruyter 1999. 1998 With Richard Almind. “Integreret cd-rom- og Internet-koncept til elektronisk referenceværk.” In Elefant – se også myg: Festskrift til Jens Axelsen, A. Garde, P. Jarvad and K.T. Thomsen (eds), 203–215. Copenhagen: Gyldendal 1998. “Center for Leksikografi.” Hermes 1998, vol. 20: 181–187. With Sven Tarp and Herbert Ernst Wiegand. “Datendistributionsstrukturen, Makro- und Mikrostrukturen in neueren Fachwörterbüchern.” In Fachsprachen: ein internationales Handbuch zur Fachsprachenforschung und Terminologiewissenschaft, L. Hoffmann, H. Kalverkämper and H.E. Wiegand (eds), 1762–1832. Berlin/New York: Walter de Gruyter 1998.

“Deskriptiv, proskriptiv og præskriptiv leksikografi.” In Normer og regler: festskrift til Dag Gundersen, R.V. Fjeld and B. Wangensteen (eds), 79–102. Oslo: Nordisk forening for leksikografi 1998. With Jens Erik Mogensen. “Die Grammatik der Verben in Langenscheidts Grosswörterbuch Deutsch als Fremdsprache.” In Perspektiven der pädagogischen Lexikographie des Deutschen. Untersuchungen anhand von Langenscheidts Grosswörterbuch Deutsch als Fremdsprache, H.E. Wiegand (ed), 77–87. Tübingen: Max Niemeyer 1998. With Gregor Meder. “Die äussere Selektion in Langenscheidts Grosswörterbuch Deutsch als Fremdsprache.” In Perspektiven der pädagogischen Lexikographie des Deutschen, Untersuchungen anhand von Langenscheidts Grosswörterbuch Deutsch als Fremdsprache, H.E. Wiegand (ed), 285–296. Tübingen: Max Niemeyer 1998. With Sven-Göran Malmgren. “Förord.” LexicoNordica 1998, vol. 5: 1–3. With Finn Frandsen. “Hermes. Tidsskrift for Sprogforskning 1988–1998.” Hermes 1998, vol. 21: 9–13. “Linguistische und lexikographische Fachwörterbücher in Skandinavien.” Lexicographica 13/1997: 3–16. “Norsk Ordbok – nynorskens leksikografiske kanon? Rapport frå eit seminar på Blindern 31. mai 1996.” Norsk Lingvistisk Tidsskrift 1998: 289–290. “Studien zur kontextuellen Fachlexikographie. Das deutsch-französische Wörterbuch der Rechnungslegung/Franz Schneider.” LexicoNordica 1998, vol. 5: 285–286. “Variantangivelser i en dansk produktionsordbog ud fra eksempler med fleksionsangivelser i Retskrivningsordbogen.” Hermes 1998, vol. 21: 95–119. “Wörterbücher für kleine Sprachen in den nordischen Ländern: Bericht über ein Symposium in Kopenhagen am 30.11 und 1.12 1996.” Lexicographica 13/1997: 280–284. 1997 With Sven-Göran Malmgren. “Förord.” LexicoNordica 1997, vol. 4: 1–3. With Annelies van Hees. “Gammel vin på ny flaske.” Hermes 1997, vol. 18: 246–250. “Polyfunktionale ordbøger.” LexicoNordica 1997, vol. 4: 15–29.

With Uwe Kaufmann. “Terminography and lexicography. A critical survey of dictionaries from a single specialised field.” Hermes 1997, vol. 18: 91–125. With Finn Frandsen. “At anmelde i Hermes – tidsskrift for sprogforskning.” Hermes 1997, vol. 19: 169–184. 1996 “Dansk sprogbrug. En stil- og konstruktionsbog/Erik Bruun. Kbh.: Gyldendal, 1995.” Hermes 1996, vol. 16: 223–225. With Uwe Kaufmann. “Enzyklopädische Informationen in Wörterbüchern.” In Semantik, Lexikographie und Computeranwendungen, N. Weber (ed), 167–180. Tübingen: Max Niemeyer 1996. With Uwe Kaufmann. “Forklaring følger.” Mosaik 1996, no. 8: 17. With Sven-Göran Malmgren. “Förord.” LexicoNordica 1996, vol. 3: 1–4. “Grundfragen der Fachlexikographie.” In Proceedings I-II: papers submitted to the 7th EURALEX International Congress on Lexicography in Göteborg, Sweden, M. Gellerstam, J. Järborg, S.-G. Malmgren, K. Norén, L. Rogström and C.R. Papmehl (eds) 731–758. Göteborg: Göteborg University 1996. “Håndbogens dage er talte.” Info 1996, vol. 5, no. 3: 2–3. “Korpusbaseret leksikografi.” LexicoNordica 1996, vol. 3: 5–17. “Korpusbasierte Lexikographie: Bericht über ein Symposium in Kopenhagen am 10.2 und 11.2 1996.” Lexicographica 12/1996: 255–260. “Norsk international ordbog: anmeldelse af: Norsk ordbok: ordbok over det norske folkemålet og det nynorske skriftmålet. Band III. Flusker-Gigla. Reidar Bø, Arnbjørg Hageberg, Laurits Killingbertrø, Sigurd Norlie, Gunnar Pedersen (red.). Oslo: Det Nors.” Norsk Lingvistisk Tidsskrift 1996, no. 14: 75–84. “Norsk Riksmålsordbok: Bind V. Første tilleggsbind A – hogstplass. Bind VI. Annet tilleggsbind hohenstaufer – a-y. Udarb. af Trygve Knudsen, Alf Sommerfelt, Harald Noreng. Oslo: Kunnskapsforlaget, 1995.” LexicoNordica 1996, vol. 3: 195– 207. With Joachim Mugdan. “Wortstrukturen.” In Sprachwissenschaft. Ein Reader, L. Hoffmann (ed) 417–426. Berlin/New York: Walter de Gruyter 1996.

1995 “Dansk grundordbog. Basic dictionary of Danish. Shortcut to the Danish language. Genvej til det danske sprog. Cay Dollerup og Inge Padkær Nielsen. Kbh.: Høst & Søn, 1994.” LexicoNordica 1995, vol. 2: 147–156. With Sven-Göran Malmgren. “Förord.” LexicoNordica 1995, vol. 2: 1–3. With Jens Erik Mogensen. “Geschichte der Lexikographie mit Deutsch und Dänisch.” In Studien zur zweisprachigen Lexikographie mit Deutsch II, H.E. Wiegand (ed), 191–222. Hildesheim/New York: Georg Olms 1995. “Grammatik i bilingvale ordbøger.” LexicoNordica 1995, vol. 2: 5–18. “Grammatik in bilingualen Wörterbüchern: Bericht über ein Symposium in Kopenhagen am 18.2 und 19.2.1995.” Lexicographica 11/1995: 252–258. “Hvor mange ordbøger er der brug for?” Info 1995, vol. 4, no. 7: 20–22. “Leksikografi: hvad er det?” In Nordiske studier i leksikografi III, J.H. Jónsson (ed), 37–49. Reykjavik: Nordisk forening for leksikografi 1995. “LGP and LSP.” In Manual of Specialised Lexicography. The Preparation of Specialised Dictionaries, H. Bergenholtz and S. Tarp (eds), 16–20. Amsterdam/Philadelphia: John Benjamins 1995. “Specialised lexicography.” In Manual of Specialised Lexicography. The Preparation of Specialised Dictionaries, H. Bergenholtz and S. Tarp (eds), 28–31. Amsterdam/ Philadelphia: John Benjamins 1995. “Material for the dictionary.” In Manual of Specialised Lexicography. The Preparation of Specialised Dictionaries, H. Bergenholtz and S. Tarp (eds), 90–96. Amsterdam/Philadelphia: John Benjamins 1995. “Lexicographical instructions.” In Manual of Specialised Lexicography. The Preparation of Specialised Dictionaries, H. Bergenholtz and S. Tarp (eds), 96–97. Amsterdam/Philadelphia: John Benjamins 1995. “Lemma selection.” In Manual of Specialised Lexicography. The Preparation of Specialised Dictionaries, H. Bergenholtz and S. Tarp (eds), 98–104. Amsterdam/Philadelphia: John Benjamins 1995. “Linguistic information.” In Manual of Specialised Lexicography. 
The Preparation of Specialised Dictionaries, H. Bergenholtz and S. Tarp (eds), 111–142. Amsterdam/ Philadelphia: John Benjamins 1995.

“Further work on the dictionary.” In Manual of Specialised Lexicography. The Preparation of Specialised Dictionaries, H. Bergenholtz and S. Tarp (eds), 224–231. Amsterdam/Philadelphia: John Benjamins 1995. “Dictionary criticism.” In Manual of Specialised Lexicography. The Preparation of Specialised Dictionaries, H. Bergenholtz and S. Tarp (eds), 232–235. Amsterdam/ Philadelphia: John Benjamins 1995. Co-edited with Sven Tarp. Manual of Specialised Lexicography. The Preparation of Specialised Dictionaries. Amsterdam/Philadelphia: John Benjamins 1995. “Nordiske Studier i leksikografi II. København 1994.” Hermes 1995, vol. 14: 236– 238. “Ordbøgers funktion og æstetik.” In Sprogets funktion og æstetik, P. Skyum-Nielsen (ed), 55–88. Copenhagen: Gad 1995. “Oversættelse af fagsproglige tekster.” Info 1995, no. 7: 17–18. With Jan Engberg. “Schwerpunkte der neueren Fachsprachenforschung in Dänemark.” Fachsprache 1995, no. 12: 55–62. With Jan Engberg. “Tendenser inden for den nyere fagsprogsforskning i Danmark.” Hermes 1995, vol. 15: 179–206. “Wodurch unterscheidet sich Fachlexikographie von Terminographie?” Lexicographica 11/1995: 50–59. With Karl Dieter Bünting. Einführung in die Syntax: Grundbegriffe zum Lesen einer Grammatik. Weinheim: Beltz/Athenäum 1995. 1994 “Almensprog og fagsprog.” In Manual i fagleksikografi. Udarbejdelse af fagordbøger – problemer og løsningsforslag, H. Bergenholtz and S. Tarp (eds), 12–16. Herning: Systime 1994. “Fagleksikografi.” In Manual i fagleksikografi. Udarbejdelse af fagordbøger – problemer og løsningsforslag, H. Bergenholtz and S. Tarp (eds), 25–28. Herning: Systime 1994. “Materiale til ordbogen.” In Manual i fagleksikografi. Udarbejdelse af fagordbøger – problemer og løsningsforslag, H. Bergenholtz and S. Tarp (eds), 90–97. Herning: Systime 1994.

“Leksikografiske instruktioner.” In Manual i fagleksikografi. Udarbejdelse af fagordbøger – problemer og løsningsforslag, H. Bergenholtz and S. Tarp (eds), 97–98. Herning: Systime 1994. “Lemmaselektion.” In Manual i fagleksikografi. Udarbejdelse af fagordbøger – problemer og løsningsforslag, H. Bergenholtz and S. Tarp (eds), 99–105. Herning: Systime 1994. “Sproglige oplysninger.” In Manual i fagleksikografi. Udarbejdelse af fagordbøger – problemer og løsningsforslag, H. Bergenholtz and S. Tarp (eds), 112–144. Herning: Systime 1994. “Yderligere arbejde med ordbogen.” In Manual i fagleksikografi. Udarbejdelse af fagordbøger – problemer og løsningsforslag, H. Bergenholtz and S. Tarp (eds), 246– 253. Herning: Systime 1994. “Ordbogskritik.” In Manual i fagleksikografi. Udarbejdelse af fagordbøger – problemer og løsningsforslag, H. Bergenholtz and S. Tarp (eds), 254–257. Herning: Systime 1994. Co-edited with Sven Tarp. Manual i fagleksikografi. Udarbejdelse af fagordbøger – problemer og løsningsforslag. Herning: Systime 1994. With Jette Pedersen and Sven Tarp. “Basic Issues in LSP Lexicography.” In Translating LSP Texts, H. Bergenholtz (ed), 151–187. Copenhagen: Copenhagen Business School 1994. “Beispiele in Fachwörterbüchern.” In Fachlexikographie. Fachwissen und seine Repräsentation in Wörterbüchern, B. Schaeder and H. Bergenholtz (eds), 421–439. Tübingen: Gunter Narr 1994. “Die empirische Basis zweisprachiger Wörterbücher (mit madagassisch-deutschen Beispielen).” In Portugiesisch-Deutsche Lexikographie. Grundlagen, Makro- und Mikrostruktur, Computerunterstützung, Anwendung, U.L. Figge (ed), 47–63. Tübingen: Max Niemeyer 1994. “En almen alordbog? Anmeldelse af: Der Brockhaus in fünf Bänden. Achte, neu bearbeitete Auflage. Erster Band: A-Eis 1993, Zweiter Band: Ney-Sil 1994.” LexicoNordica 1994, vol. 1: 249–251. With Burkhard Schaeder. “Fachlexikographie. Fachwissen und seine Repräsentation in Wörterbüchern. Ein Vorwort.” In Fachlexikographie. 
Fachwissen und seine Repräsentation in Wörterbüchern, B. Schaeder and H. Bergenholtz (eds), 1–8. Tübingen: Gunter Narr 1994.

With Jette Pedersen. “Grammar in bilingual LSP dictionaries, with a special view to technical English.” In Fachlexikographie. Fachwissen und seine Repräsentation in Wörterbüchern, B. Schaeder and H. Bergenholtz (eds), 351–383. Tübingen: Max Niemeyer 1994. “Fachsprache und Gemeinsprache: Lemmaselektion im Fachwörterbuch.” In Fachlexikographie. Fachwissen und seine Repräsentation in Wörterbüchern, B. Schaeder and H. Bergenholtz (eds), 285–304. Tübingen: Gunter Narr 1994. “Zehn Thesen zur Fachlexikographie.” In Fachlexikographie. Fachwissen und seine Repräsentation in Wörterbüchern, B. Schaeder and H. Bergenholtz (eds), 43–56. Tübingen: Gunter Narr 1994. With Jette Pedersen. “Zusammensetzung von Textkorpora für die Fachlexikographie.” In Fachlexikographie. Fachwissen und seine Repräsentation in Wörterbüchern, B. Schaeder and H. Bergenholtz (eds), 161–176. Tübingen: Gunter Narr 1994. Co-edited with Burkhard Schaeder. Fachlexikographie. Fachwissen und seine Repräsentation in Wörterbüchern. Tübingen: Gunter Narr 1994. With Sven Tarp. “Mehrworttermini und Kollokationen in Fachwörterbüchern.” In Fachlexikographie. Fachwissen und seine Repräsentation in Wörterbüchern, B. Schaeder and H. Bergenholtz (eds), 385–419. Tübingen: Gunter Narr 1994. “Faglige oplysninger i monolingvale betydningsordbøger.” LexicoNordica 1994, vol. 1: 13–26. “Grammatik i ordbøger.” SPRÅU. Sprogvidenskabelige Arbejdspapirer fra Aarhus Universitet 1994: 115–124. “Grundordbog til genbrug. Anmeldelse af: Lexin. Språklexikon för indvandrare. Svenska ord – med uttal och förklaringar. Andre Upplagan 1993.” LexicoNordica 1994, vol. 1: 243–247. “Hvilken relation er der mellem terminologi og leksikografi? Anmeldelse af: Proceedings af 2. nordisk symposium om Terminologi, EDB & Vidensteknik.” Hermes 1994, vol. 13: 305–310. “Hvorfor ikke skrive af? Anmeldelse af: Flisnes, Leiv: Musikkordboken.” LexicoNordica 1994, vol. 1: 239–242. “Ordbøger skal anmeldes af fagfolk.” LEDA Nyhedsbrev 1994, vol. 14: 2–6. 
“Plan eines nordischen Wörterbuchs der Lexikographie. Ein Werkstattbericht.” In Proceedings of the Sixth International Symposium on Lexicography at the University of Copenhagen, K. Hyldgaard-Jensen and V.H. Pedersen (eds), 315–332. Tübingen: Max Niemeyer 1994.
With Bo Svensén. “Systematisk inledning till Nordisk lexikografisk ordbog (NLO).” LexicoNordica 1994, vol. 1: 149–185.
With Uwe Kaufmann and Sven Tarp. “Vore mænd i Havanna: Udarbejdelse af konception til spansk-engelsk genteknologisk ordbog.” Hermes 1994, vol. 13: 291–304.
With Jens Erik Mogensen. “Wörterbuchkritik in Dänemark.” Lexicographica 9/1993: 8–35.
Translating LSP Texts. Conference papers for the OFT Symposium 11–12 April 1994. H. Bergenholtz (ed). Copenhagen: Copenhagen Business School 1994.

1993
With Jette Pedersen. “Tekstkorpora til brug ved udarbejdelse af fagordbøger.” In Proceedings af seminar om korpuslingvistik i fagsprogsforskningen, G. Engel (ed), 37–55. Kolding: Handelshøjskole Syd 1993.

1992
“Ein Textkorpus der geschriebenen dänischen Gegenwartssprache.” Folia Scandinavica 1992, vol. 1: 89–94.
With Uwe Kaufmann. “Konception af en ny fagordbog.” In Oversættelse af fagsproglige tekster. Indlæg fra Sandbjergkonferencen, 21.-22. november 1991, A.L. Jakobsen (ed), 183–203. Copenhagen: Handelshøjskolen i København 1992.
With Richard Barlach and Wilhelm Gubba. “Steen Hjelmblink: Retsplejeordbog.” Language International 1992: 41–42.

1991
“Almensproglige informationer i fagordbøger.” In Nordiske studier i leksikografi. Rapport fra Konferanse om leksikografi i Norden 28.-31. mai 1991, R.V. Fjeld (ed), 244–259. Oslo: Nordisk forening for leksikografi 1991.
With Ole Lauridsen. “Berücksichtigung und Einfluss der historischen Grammatik einschliesslich der Wortbildung im Deutschen Wörterbuch.” In Studien zum Deutschen Wörterbuch von Jacob Grimm und Wilhelm Grimm, Band I, A. Kirkness, P. Kühn and H.E. Wiegand (eds), 265–297. Tübingen: Max Niemeyer 1991.
“Deutsche Grammatiken am Scheideweg.” In Wegweiser durch die Grammatik von Heinrich Bauer. Verzeichnisse und Erläuterungen, H. Bergenholtz, A. Dörner, R. Karatas and G. Meder (eds), 61–89. Berlin/New York: Walter de Gruyter 1991.
“Kommentar zum grammatischen Index: wenn aber jeder dem Gegenstande einen andern Namen giebt, wo soll denn die Verständigung herkommen?” In Wegweiser durch die Grammatik von Heinrich Bauer. Verzeichnisse und Erläuterungen, H. Bergenholtz, A. Dörner, R. Karatas and G. Meder (eds), 123–129. Berlin/New York: Walter de Gruyter 1991.
With Ramona Karatas, Andreas Dörner and Gregor Meder. Wegweiser durch die Grammatik von Heinrich Bauer. Verzeichnisse und Erläuterungen. Berlin: Walter de Gruyter 1991.
“DK87-DK90: Dansk korpus med almensproglige tekster.” In 3. Møde om Udforskningen af Dansk Sprog, Aarhus Universitet 11.-12. Oktober 1990, M. Kunøe and E.V. Larsen (eds), 31–42. Aarhus: Aarhus Universitet 1991.
With Uwe Kaufmann. “Leksikografi og molekylærbiologi.” Dansk Veterinær Tidsskrift 1991, vol. 74, no. 16: 619–622.
“Lemmaselektion in zweisprachigen Wörterbüchern.” In Worte, Wörter, Wörterbücher. Lexikographische Beiträge im Essener linguistischen Kolloquium, G. Meder and A. Dörner (eds), 49–65. Tübingen: Max Niemeyer 1991.
With Joachim Mugdan. “Munksgaards Tysk-Tysk Ordbog.” Hermes 1991, vol. 6: 139–160.
With Uwe Kaufmann. “Ny dansk fagordbog om molekylærbiologi.” Forskning og Samfund 1991, vol. 2: 25–26.
“Rechtschreibewörterbuch, bleib bei deinen Leisten.” In Rechtschreibwörterbücher in der Diskussion, G. Augst and B. Schaeder (eds), 403–417. Frankfurt am Main/Bern/New York/Paris: Peter Lang 1991.

1990
With Richard Almind. “Klæder skaber folk: Ordbøgers lay-out.” Hermes 1990, vol. 4: 31–47.
With Joachim Mugdan. “Formen und Probleme der Datenerhebung II: Gegenwartsbezogene synchronische Wörterbücher.” In Wörterbücher. Dictionaries. Dictionnaires. Ein internationales Handbuch zur Lexikographie. Zweiter Teilband, F.J. Hausmann, O. Reichmann, H.E. Wiegand and L. Zgusta (eds), 1611–1625. Berlin/New York: Walter de Gruyter 1990.
“Lexikographische Instruktionen für ein zweisprachiges Wörterbuch.” Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 1990, vol. 43: 31–47.
With Helle Vrønning Dam and Torben Henriksen. “Udarbejdelse af en spansk-dansk juridisk ordbog: juridiske og sproglige problemer.” Hermes 1990, vol. 5: 127–136.

1989
With Joachim Mugdan. “Korpusproblematik in der Computerlinguistik: Konstruktionsprinzipien und Repräsentativität.” In Computational Linguistics. Computerlinguistik. An International Handbook on Computer Oriented Language Research and Applications. Ein internationales Handbuch zur computerunterstützten Sprachforschung und ihrer Anwendungen, E.S. Bátori, W. Lenders and W. Putschke (eds), 141–149. Berlin/New York: Walter de Gruyter 1989.
With Sven-Olaf Poulsen. “Leksikografi på HHÅ: Udvikling og perspektiver.” In Handelshøjskolen i Århus 50 år: Festskrift i anledning af Handelshøjskolens 50-års jubilæum 31. august 1989, unknown editors, 110–117. Aarhus: Aarhus School of Business 1989.
“Probleme der Selektion im allgemeinen einsprachigen Wörterbuch.” In Wörterbücher. Dictionaries. Dictionnaires. Ein internationales Handbuch zur Lexikographie. Erster Teilband, F.J. Hausmann, O. Reichmann, H.E. Wiegand and L. Zgusta (eds), 772–778. Berlin/New York: Walter de Gruyter 1989.
With Sandro Nielsen. “Veje til bedre tosprogede ordbøger.” Hermes 1989, vol. 2: 217–220.

1988
With Ann-Theres Faets. “Angest, Angst, vorhte, Furcht: Vorschläge für ein historisches Wörterbuch des Gefühlswortschatzes.” In Historisches Wörterbuch des Gefühlswortschatzes. Aspekte, Probleme und Beispiele seiner lexikographischen Erfassung, L. Jäger (ed), 56–94. Aachen: Rader 1988.
With Sandro Nielsen. “Den hidtil største dansk-engelske/engelsk-danske fagordbog.” Hermes 1988, vol. 1: 251–255.


“DK87: Et korpus med dansk almensprog.” Hermes 1988, vol. 1: 229–237.
“Empiriske metoder i lingvistisk forskning.” Hermes 1988, vol. 1: 7–23.
“Projekt eines madagassisch-deutschen/deutsch-madagassischen Wörterbuchs.” GAL-Bulletin 1988, vol. 8: 53–60.

1986
With Joachim Mugdan. “Der neue ‘Super-Duden’ – die authentische Darstellung der deutschen Gegenwartssprache?” Studien zur neuhochdeutschen Lexikographie 1986, vol. 6, no. 1: 1–149.
“Eva Pauline Diedrichs: Johann Bödikers Grund-Sätze der deutschen Sprache mit den Bearbeitungen von Johann Leonard Frisch und Johann Jakob Wippel.” Beiträge zur Geschichte der deutschen Sprache und Literatur 1986, vol. 109: 426–428.

1985
“Kasuskongruenz der Apposition.” Beiträge zur Geschichte der deutschen Sprache und Literatur 1985, vol. 107: 21–44.
Henning Bergenholtz (ed). Lexikographie und Grammatik. Akten des Essener Kolloquiums zur Grammatik im Wörterbuch 28.-30.6.1984. Tübingen: Max Niemeyer 1985.
With Joachim Mugdan. “Linguistic Terms in German and English Dictionaries.” Lexicographica 1/1985: 3–23.
With Marlis Becher. “Sei oder nicht sei. Probleme des Modusgebrauchs in der indirekten Rede.” Nouveaux Cahiers d’Allemand 1985, vol. 3: 443–457.
“Vom wissenschaftlichen Wörterbuch zum Lernerwörterbuch.” In Lexikographie und Grammatik. Akten des Essener Kolloquiums zur Grammatik im Wörterbuch 28.-30.6.1984, H. Bergenholtz and J. Mugdan (eds), 225–256. Tübingen: Max Niemeyer 1985.
With Joachim Mugdan. “Vorwort.” In Lexikographie und Grammatik. Akten des Essener Kolloquiums zur Grammatik im Wörterbuch, H. Bergenholtz and J. Mugdan (eds), 7–19. Tübingen: Max Niemeyer 1985.
With Joachim Mugdan. “Wortstrukturen.” In Nach-Chomskysche Linguistik, T. Ballmer and R. Posner (eds), 417–426. Berlin/New York: Walter de Gruyter 1985.


1984
“Grammatik im Wörterbuch: Syntax.” In Studien zur neuhochdeutschen Lexikographie V, H.E. Wiegand (ed), 1–46. Hildesheim/New York: Georg Olms 1984.
With Joachim Mugdan. “Grammatik im Wörterbuch: von Ja bis Jux.” In Studien zur neuhochdeutschen Lexikographie V, H.E. Wiegand (ed), 47–102. Hildesheim/New York: Georg Olms 1984.
“Grammatik im Wörterbuch: Wortarten.” In Studien zur neuhochdeutschen Lexikographie IV, H.E. Wiegand (ed), 19–72. Hildesheim/New York: Georg Olms 1984.

1983
With Bjarne Ulvestad. “Es als ‘Vorgreifer’ eines Objektsatzes, Teil II.” Deutsche Sprache 1983, vol. 1: 1–26.
“Klaus-Michael Köpcke: Untersuchungen zum Genussystem der deutschen Gegenwartssprache.” Informationen zur Deutschdidaktik 1983, vol. 8: 123.
“Sind systematische Textuntersuchungen eine unnötige Zeremonie?” Zeitschrift für germanistische Linguistik 1983, vol. 11: 211–213.
“Zur Terminologie und zur empirischen Basis.” Kopenhagener Beiträge zur germanistischen Linguistik 1983, vol. 21: 70–92.

1982
“Appositionssyntax im Deutschen.” In La linguistique à la session 1982 de l’agrégation d’allemand. Journée annuelle des linguistes de l’association des germanistes de l’enseignement supérieur, 12 décembre 1981, E. Faucher (ed), 61–104. Nancy: Nancy University 1982.
With Joachim Mugdan. “Die Grammatik im Wörterbuch: Probleme und Aufgaben.” In Studien zur neuhochdeutschen Lexikographie II, H.E. Wiegand (ed), 17–323. Hildesheim/New York: Georg Olms 1982.

1980
Das Wortfeld ‘Angst’. Eine lexikographische Untersuchung mit Vorschlägen für ein grosses interdisziplinäres Wörterbuch der deutschen Sprache. Stuttgart: Klett 1980.


1979
With Sabine Harling. “Das Wortfeld fear. Erforschung und Ansätze einer Didaktisierung.” Die neueren Sprachen 1979, vol. 4: 339–354.
With Joachim Mugdan. Einführung in die Morphologie. Stuttgart: Kohlhammer 1979.
Henning Bergenholtz (ed). Empirische Textwissenschaft. Aufbau und Auswertung von Text-Corpora. Königstein: Scriptor 1979.
With Bjarne Ulvestad. “Es als ‘Vorgreifer’ eines Objektsatzes.” Deutsche Sprache 1979, vol. 3: 97–116.
With Joachim Mugdan. “Ist Liebe primär? Über Ableitung und Wortarten.” In Deutsche Gegenwartssprache. Entwicklungen, P. Braun (ed), 339–354. München: Fink 1979.

1978
With Burkhard Schaeder. “Ausblicke auf eine deskriptive Lexikographie.” In Interdisziplinäres deutsches Wörterbuch in der Diskussion, H. Henne, W. Mentrup, D. Möhn and H. Weinrich (eds), 116–172. Düsseldorf: Schwann 1978.
“Zur Sprache der Psychologie und ihrer lexikographischen Erfassung.” In Interdisziplinäres deutsches Wörterbuch in der Diskussion, H. Henne, W. Mentrup, D. Möhn and H. Weinrich (eds), 102–115. Düsseldorf: Schwann 1978.

1977
With Burkhard Schaeder. “Deskriptive Lexikographie.” Zeitschrift für germanistische Linguistik 1977, vol. 5, no. 1: 2–33.
With Burkhard Schaeder. Die Wortarten des Deutschen. Versuch einer syntaktisch orientierten Klassifikation. Stuttgart: Klett 1977.

1976
Zur Morphologie deutscher Substantive, Verben und Adjektive. Bonn: Dümmler 1976.
With Burkhard Schaeder. “Zur Syntax nominaler Einheiten der geschriebenen deutschen Standardsprache.” Text & Kontext 1976, vol. 4, no. 2: 3–36.


1975
“Volksetymologie oder synchrone Etymologie.” Muttersprache 1975, vol. 85, no. 2: 89–94.
“Zur Wortfeldterminologie.” Muttersprache 1975, vol. 85, no. 4: 78–85.

Notes on contributors

Pedro A. Fuertes-Olivera, Escuela Universitaria de Estudios Empresariales, University of Valladolid, Spain
Franziskus Geeb, Fakultät Design, Medien & Information, University of Applied Sciences of Hamburg, Germany
Rufus Gouws, Department of Afrikaans and Dutch, Stellenbosch University, South Africa
Thomas Herbst, Institut für Anglistik und Amerikanistik, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
Jón Hilmar Jónsson, Linguistic Society of Iceland, Reykjavík, Iceland
Patrick Leroyer, Centre for Lexicography, Aarhus School of Business, University of Aarhus, Denmark


Marie-Claude L’Homme, Observatoire de linguistique Sens-Texte (OLST), Université de Montréal, Canada
Sven-Göran Malmgren, Institutionen för svenska språket, Göteborg University, Sweden
Sandro Nielsen, Centre for Lexicography, Aarhus School of Business, University of Aarhus, Denmark
D.J. Prinsloo, Department of African Languages, University of Pretoria, South Africa
Bo Svensén, Stockholm, Sweden
Sven Tarp, Centre for Lexicography, Aarhus School of Business, University of Aarhus, Denmark
Lars S. Vikør, Norsk Ordbok 2014, University of Oslo, Norway


Herbert Ernst Wiegand, Germanistisches Seminar, Universität Heidelberg, Germany

Subject index A abstract hierarchical constituent structure 64 abstract hybrid text constituent structure 65 abstract microstructure 68 abstract need 48 abstract pure text constituent structure 65 academic dictionary review 25 access process 12, 30 access route 12, 165 access structure 111, 165 accessibility 24, 115 accounting dictionary 57, 173 accounting terminology 58, 173 actant 99, 244 actantial structure 244, 249 adjective 99, 244, 268, 273 AIML database 222 AIML language 213 alphabetical stretch 184 annotated corpus 190 antonym 98, 260 antonymous relation 98, 274, 279 architectonically enriched article constituent structure 84 architectonically enriched item structure 86 architecture 72 argument 269 audience of academic reviews 25 B balanced focus 112 base 98, 247, 295 bilingual LGP dictionary 210 bottom expanded item 71 business dictionary 166 C categorising process 268 chat communication 208

chatbot 207, 209, 211, 221 chronolectic labelling 187 chunking 190 classification scheme 149 codification 128 coding system 287 cognition-oriented function 162 cognitive function 17, 27, 30, 93, 116 co-hyponym 98 collocate 99, 240, 242, 247, 295 collocation 98, 238, 247, 262, 263, 294, 296 collocation dictionary 282, 302 communication-oriented function 165, 173 communication policy 124 communication problem 106 communicative function 30, 93, 115 communicative tourist lexicography 105 comparative review 31 compound 6, 97, 133, 264, 267, 300 comprehension-related information cost 34 concept 126, 148, 195, 277 concept classification 148 concordance 184, 193 concrete hierarchical article microstructure 66 concrete hybrid text constituent structure 65 concrete pure text constituent structure 64 concrete user needs 48 co-occurrence 282, 294 corpus 181, 182, 248 corpus annotation 190 corpus balance 183 corpus data 191, 193 corpus lexicography 182 corpus planning 126, 129 corpus size 183

cross-reference 52, 117, 165, 168, 200, 209 cross-reference structure 27, 209 culture-dependent subject field 164 D data 32, 35, 47, 113 data access 46 data distribution 165 data structure 218 description 260 diachronic presentation 31 diasystematic information 98 dictionary 26–29, 282 dictionary component 30, 32–33 dictionary function 4, 10, 33 dictionary grammar 286, 301 dictionary of idioms 282 dictionary review 24, 26, 35 dictionary reviewing 23, 28–35 disambiguation 189, 272 disciplinary community 24 E elementary item 67 encyclopaedic classification 148 encyclopaedic information 161, 166, 219 encyclopaedic note 161, 162 encyclopaedic section 161 exhaustive functional-positional segmentation 64 expanded elementary item 71 experiential information need 115 Explanatory Combinatorial Lexicology 247 extended aided fully-integrated systematic introduction 174 external subject classification 148 extra-lexicographic component 33

F factual approach 29, 31 factual knowledge 31, 163, 173 field of computing 244 field of the Internet 244 first level nesting 7, 13 free combination 262, 295 frequency trajectory 187 functional approach 15, 115 functional method 30 functional segmentative isolation 64 functional shortcoming 109 functional transformation 114 function-related component 33 function-related user needs 49 function theory 15, 46, 171 fundamental challenge of lexicography 44 G generally accepted reviewing principle 35 geolocalised communicative assistance 110 global information 47 gloss conditioned item structure 77 grammatical code 140 grammatical parsing 190 grammatical term 301 H homonym separation 189 horizontal ordering 5, 10 hybrid article microstructure 70 hypernym 227 I idiom 52, 268, 300 inclusion of lemma 186 inflection 53, 95, 140 inflectional variation 96 information 12, 24, 30, 45, 47, 94, 104, 209, 282 information death 46 information retrieval 46 informative value of reviews 35–37 internal subject classification 148 interpretive function 116 intransitive 286

K keyword 182, 208, 247, 267 keyword-in-context extraction 184 knowledge base 209, 210 knowledge-orientated lexicographical function 209 L language acquisition planning 124 language corpus planning 124 language planning 124 language policy 124 language status planning 124 layperson 164 learner’s dictionary 49, 283, 287 lemma 53, 221, 228, 267 lemma list 184, 267 lemma status 300 lemmatisation 184, 260, 302 lexical combination 238 lexical relation 246 lexical resource 193 lexicographic approach 29–30 lexicographic component 32, 110 lexicographic data 27, 32, 45 lexicographic description 259, 263 lexicographic function 5, 16, 27, 94 lexicographic information cost 34, 47 lexicographic ruler 184 lexicographic structure 27 lexicographic term 156 lexicographic theory 44 lexicographic tool 54, 105, 110, 199 lexicographic triangle 112 lexicographic user needs 46 lexicographical-enabled chatbot 211 lexicographical data 227 lexicographical database 209, 210 lexico-semantic criterion 245 linguistic approach 29, 33–34 linguistic information 219 linguistic knowledge 173 linguistic term 156 LSP dictionary 210

M macrostructural procedure 5 macrostructure 4 mandatory spelling 139 man-machine communication 207 materiality 36 maximising dictionary 32, 163 maximising review 32 meaning 11, 53, 139, 192, 242, 258, 282 metalexicographic term 156 metalexicographic terminology 151 metalexicography 147 microstructure 70, 82, 227 minimised gloss-conditioned partial structure 82, 86 minimising dictionary 32, 163 minimising review 32 mono-functional dictionary 48 monolingual LGP dictionary 210 morphological information 97, 135 morphology 97 multi-field dictionary 29, 163 multi-word lexeme 300 multi-word phrase 271 multi-word unit 282, 301 N needs-adapted data 47 nest 8, 138 nesting 7, 18 neutrality 37 niche 6 niching 5, 13, 15 non-exhaustive functional-positional segmentation 66 non-expanded elementary item 71 non-functional-positional segmentation 64 O objective of dictionary reviews 25, 28, 36 object of dictionary reviews 26, 28 onomasiological approach 259 ontology 224 operative function 116 optional forms 138

outside matter 30, 168 OWL ontology 224, 227 P paradigmatic relation 246 part-of-speech label 190 part-of-speech tagging 190 phrasal verb 291, 293 phrase 258, 260, 267 phraseological database 265 phraseological description 262, 266 phraseological lemma 260 phraseological unit 294 phraseology 263, 264 poly-functional dictionary 48 primary user need 49, 112 production-oriented information 95, 100 pronunciation 95, 136 publicity 24 punctual information 47 pure article constituent 65 R rapid access 12 reception-oriented information 100 relevance 36 reliability 37 representativeness of corpus 183 research 25 reviewing 28–35 review team 31 review topic 32, 37, 41 S scholarly dictionary review 24, 26, 35 search criterion 54 search-related information cost 34 secondary user need 50 second level nesting 7 selection of lemmata 10, 258 semantic categorisation 277 semantic property 260 semantic relation 228, 260, 263, 266, 274 semantic role 250

semantic tag 190 semantic tagging 191 semantically significant collocation 294, 298 semasiological approach 259 semasiological description 261 semi-expert 164 semi-finished products of language 299 set phrase 262, 263 shallow hierarchical structure 79 significant feature 27, 35 single-field dictionary 29, 163 single lexeme 282 single word 260 sinuous lemma file 5, 10 space-saving procedure 12 specialised dictionary 147, 161, 238 specialised lexical database 238 spelling dictionary 128, 130, 132, 134 straight alphabetical ordering 5, 12 sub-field dictionary 29, 163 subject-field classification 147, 148 subject-field component 161, 165 successful access 12 synchronic presentation 31 synonym 98, 221, 225, 228, 268, 274 synonymous relation 274, 279 syntactical analysis 190 syntagmatic relation 246, 282, 294 systematic introduction 163, 165, 173 T target user 12, 164 temporal method 31 term 148, 244 term bank 238 term classification 148 term extractor 245 terminological classification 148

text production 16, 30, 57–58, 99, 115 text reception 17, 49, 53, 58, 100 textual dictionary structure 64 theory of lexicographic functions 16, 46 theory of textual dictionary structure 65 tourist lexicography 103, 113 transitive 286 transitivity 227 translation dictionary 33 true and fair view 36 trustworthiness 24 two-dimensional search criterion 57 type of user needs 48 typical term 250 U underlying assumption 36 useful information 26 use-related component 33 use-related user needs 50 user guide 30 user need 3, 35, 45 user profile 165 user situation 47 utility product 44 V valency 99, 291 valency dictionary 282 valency information 99, 287, 289 verbal phrase 269, 271 verb pattern 288 vertical item architecture 72 vertical ordering 5 W word 258 word class 140, 267, 283 word combination 239 word division 141 word formation 97, 264, 266 word frequency 186 wordnet 195, 267

In the series Terminology and Lexicography Research and Practice the following titles have been published thus far or are scheduled for publication:
12 Nielsen, Sandro and Sven Tarp (eds.): Lexicography in the 21st Century. In honour of Henning Bergenholtz. 2009. xi, 341 pp.
11 Fuertes-Olivera, Pedro A. and Ascensión Arribas-Baño: Pedagogical Specialised Lexicography. The representation of meaning in English and Spanish business dictionaries. 2008. ix, 165 pp.
10 Gottlieb, Henrik and Jens Erik Mogensen (eds.): Dictionary Visions, Research and Practice. Selected papers from the 12th International Symposium on Lexicography, Copenhagen 2004. 2007. xii, 321 pp.
9 Yong, Heming and Jing Peng: Bilingual Lexicography from a Communicative Perspective. 2007. x, 229 pp.
8 Antia, Bassey (ed.): Indeterminacy in Terminology and LSP. Studies in honour of Heribert Picht. 2007. xxii, 236 pp.
7 Görlach, Manfred: English Words Abroad. 2003. xii, 189 pp.
6 Sterkenburg, Piet van (ed.): A Practical Guide to Lexicography. 2003. xii, 460 pp.
5 Kageura, Kyo: The Dynamics of Terminology. A descriptive theory of term formation and terminological growth. 2002. viii, 322 pp.
4 Sager, Juan C.: Essays on Definition. With an introduction by Alain Rey. 2000. viii, 257 pp.
3 Temmerman, Rita: Towards New Ways of Terminology Description. The sociocognitive approach. 2000. xvi, 258 pp.
2 Antia, Bassey: Terminology and Language Planning. An alternative framework of practice and discourse. 2000. xxiv, 265 pp.
1 Cabré Castellví, M. Teresa: Terminology. Theory, methods and applications. Edited by Juan C. Sager. Translated by Janet Ann DeCesaris. 1999. xii, 248 pp.