Symposium on Lexicography VII: Proceedings of the Seventh International Symposium on Lexicography May 5–6, 1994 at the University of Copenhagen [Reprint 2017 ed.] 9783110937916, 9783484309760

The proceedings cover new perspectives in the field of lexicography, including both theoretical and practical topics, an

English Pages 263 [264] Year 1996

Table of contents :
TABLE OF CONTENTS
INTRODUCTION
Prospects for Automatic Lexicography
Über die Mediostrukturen bei gedruckten Wörterbüchern
An Analysis of Danish Nominal Inflectional Paradigms Built on the Computer Based System DANBØJ - with a Presentation of the System
Herausforderungen der Textlexikographie: Der Belegschnitt
Masculine, feminine and epicene nouns revisited: informant reactions versus lexicographic definitions
Probleme der Aussprache englischer Wörter im Deutschen und ihre Behandlung im Anglizismen-Wörterbuch
A Concise Dictionary of Early Modern English Pronunciation
Schlüsselwörter der Wendezeit. Ein lexikologisch-lexikographisches Projekt zur Auswertung des IDS-Wendekorpus
The Electronic Conversion of a Dictionary: From Norwegian-French to French-Norwegian
Computerwörterbücher der Bulgarischen Neologismen
Exploiting the Masses: the corpus-based study of language
Überlegungen zu einem Wörterbuch der Archaismen
Metaphorization in the Prefixation of Hungarian and Russian Verbs
How Alphabetical Should A Dictionary Be? (the case of HIGH and its combinations in some dictionaries)
Cognitive linguistics and lexicography: Suggestions for a diagrammatic representation of un-Adj-formations in a monolingual dictionary
Hans Christian Andersen and Bilingual Lexicography
Introspection and Computer Corpora: The Meaning and Complementation of start and begin
Idiom Transformation, Idiom Translation and Idiom Dictionaries
Bibliography

Author / Uploaded
Arne Zettersten (editor)
Viggo Hjornager Pedersen (editor)

LEXICOGRAPHICA Series Maior Supplementary Volumes to the International Annual for Lexicography Suppléments à la Revue Internationale de Lexicographie Supplementbände zum Internationalen Jahrbuch für Lexikographie

Edited by Sture Allén, Pierre Corbin, Reinhard R. K. Hartmann, Franz Josef Hausmann, Ulrich Heid, Oskar Reichmann, Ladislav Zgusta 76

Published in cooperation with the Dictionary Society of North America (DSNA) and the European Association for Lexicography (EURALEX)

Symposium on Lexicography VII Proceedings of the Seventh Symposium on Lexicography May 5-6, 1994 at the University of Copenhagen edited by Arne Zettersten and Viggo Hjørnager Pedersen

Max Niemeyer Verlag Tübingen 1996

To the memory of Professor Karl Hyldgaard Jensen

Die Deutsche Bibliothek - CIP-Einheitsaufnahme [Lexicographica / Series maior] Lexicographica: supplementary volumes to the International annual for lexicography / publ. in cooperation with the Dictionary Society of North America (DSNA) and the European Association for Lexicography (EURALEX). Series maior. - Tübingen : Niemeyer. Früher Schriftenreihe Reihe Series maior zu: Lexicographica NE: International annual for lexicography / Supplementary volumes 76. Symposium on Lexicography (7, 1994, København): Symposium on Lexicography VII. - 1996 Symposium on Lexicography (7, 1994, København): Symposium on Lexicography VII : proceedings of the Seventh Symposium on Lexicography, May 5-6, 1994 at the University of Copenhagen / ed. by Arne Zettersten and Viggo Hjørnager Pedersen. - Tübingen : Niemeyer, 1996 (Lexicographica : Series maior ; 76) ISBN 3-484-30976-8

ISSN 0175-9264

© Max Niemeyer Verlag GmbH & Co. KG, Tübingen 1996. This work, including all of its parts, is protected by copyright. Any use outside the narrow limits of copyright law without the consent of the publisher is prohibited and punishable by law. This applies in particular to reproduction, translation, microfilming, and storage and processing in electronic systems. Printed in Germany. Printed on ageing-resistant paper. Printing: Weihert-Druck GmbH, Darmstadt. Binding: Industriebuchbinderei Hugo Nädele, Nehren.

TABLE OF CONTENTS

Introduction (English, French, German)

John Sinclair, Prospects for Automatic Lexicography
Herbert Ernst Wiegand, Über die Mediostrukturen bei gedruckten Wörterbüchern
Christian Becker-Christensen and Peter Widell, An Analysis of Danish Nominal Inflectional Paradigms Built on the Computer Based System DANBØJ - with a Presentation of the System
Evelyn Breiteneder, Herausforderungen der Textlexikographie: Der Belegschnitt
Ingmari Bergquist and Gunnar Persson, Masculine, Feminine and Epicene Nouns Revisited: Informant Reactions Versus Lexicographic Definitions
Ulrich Busse, Probleme der Aussprache englischer Wörter im Deutschen und ihre Behandlung im Anglizismen-Wörterbuch
Bernhard Diensberg, A Concise Dictionary of Early Modern English Pronunciation
Dieter Herberg, Schlüsselwörter der Wendezeit. Ein lexikologisch-lexikographisches Projekt zur Auswertung des IDS-Wendekorpus
Tove Jacobsen and Randi Sæbøe, The Electronic Conversion of a Dictionary from Norwegian-French to French-Norwegian
Ljubima Jordanowa, Computerwörterbücher der bulgarischen Neologismen
Ramesh Krishnamurthy, Exploiting the Masses: The Corpus-based Study of Language
Klaus-Dieter Ludwig, Überlegungen zu einem Wörterbuch der Archaismen
Sándor Martsa, Metaphorization in the Prefixation of Hungarian and Russian Verbs
Geart van der Meer, How Alphabetical Should a Dictionary Be? The case of HIGH and its combinations in some dictionaries
Arthur Mettinger, Cognitive Linguistics and Lexicography: Suggestions for a Diagrammatic Representation of un-Adj-formations in a Monolingual Dictionary
Viggo Hjørnager Pedersen, Hans Christian Andersen and Bilingual Lexicography
Hans-Jörg Schmid, Introspection and Computer Corpora: The Meaning and Complementation of start and begin
Andrejs Veisbergs, Idiom Transformation, Idiom Translation and Idiom Dictionaries

INTRODUCTION

The Seventh International Symposium on Lexicography at the University of Copenhagen, the proceedings of which are hereby published, took place on 5-6 May 1994. This time we enjoyed meeting participants from Austria, Bulgaria, Denmark, Germany, Great Britain, Hungary, Latvia, the Netherlands, Norway, Russia, Sweden and Ukraine. There was one plenary lecture on the first day, by John Sinclair, University of Birmingham, and one on the second day, by Herbert E. Wiegand, University of Heidelberg. The sectional lectures were divided into two groups: an English-speaking section and a German-speaking one.

As from 1994, the Otto Jespersen Memorial Lecture is planned to be an annual event. This year it was given as the opening plenary lecture of the Symposium by John Sinclair, of the University of Birmingham, the creator of the Cobuild Dictionaries and the Birmingham English language corpus, The Bank of English. In his paper, Prospects for Automatic Lexicography, John Sinclair describes the problems connected with automatic lexicography and the tools and means with which to overcome them. The paper ends with the prediction that automatic dictionary production may well become possible within a few years.

In the second plenary paper of the conference, Herbert Wiegand discusses central dictionary problems in Über die Mediostrukturen bei gedruckten Wörterbüchern. Printed dictionaries may be regarded as a traditional form of knowledge representation. The dictionary-internal mediostructure interconnects the knowledge elements represented in different sectors of the dictionary on several levels of lexicographic description to form a network. This paper focuses on dictionary-internal mediostructures, introducing a formalized theory of mediostructure in an informal way.

Christian Becker-Christensen and Peter Widell present a joint paper analyzing Danish nominal inflectional paradigms built on the system DANBØJ. The Danish system of inflected forms may be regarded as rather complicated, the nominal inflection being the most complicated. There is therefore a need for a computer-based system which can generate the required information in a proper and reliable form.

Ingmari Bergquist and Gunnar Persson's paper, Masculine, Feminine and Epicene Nouns Revisited: Informant Reactions Versus Lexicographic Definitions, is a study of informant reactions to the sexual implications of pejorative and ameliorative lexemes like bastard, twit, kitten, peach. Recent dictionaries have tended to give epicene or sex-neutral definitions of such words. However, despite considerable disagreement among informants, the study shows that for most native speakers, many of these words still suggest either masculinity or femininity: bastard to most informants is male, chatterbox female. The article suggests that dictionary definitions should be made to correspond with these facts.

Evelyn Breiteneder in her paper, Herausforderungen der Textlexikographie: Der Belegschnitt, discusses delimitations of lexicographic references in text lexicography. She shows that a distinction should be made between delimitations of the lexicographic reference as a whole and those of reference texts which are to function as lexicographic references together with other components.

Ulrich Busse in his paper Probleme der Aussprache englischer Wörter im Deutschen und ihre Behandlung im Anglizismen-Wörterbuch deals with transcription problems in the new dictionary of anglicisms produced at Paderborn, Germany. This dictionary project, founded by the late Professor Broder Carstensen, is now being completed by Ulrich Busse.

Bernhard Diensberg describes his plan for making A Concise Dictionary of Early Modern English Pronunciation, based on the idea of the DEMEP Dictionary presented by Professor Bror Danielsson, Stockholm, in the late 1970s. The plan is to draw mainly on primary sources of early Modern English pronunciation, such as the evidence given by orthoepists and spelling reformers.

Dieter Herberg in his paper on Schlüsselwörter der Wendezeit. Ein lexikologisch-lexikographisches Projekt zur Auswertung des IDS-Wendekorpus presents the outline of a project dealing with the years 1989-90, when the former "German Democratic Republic" ceased to exist and words and concepts in the German language changed.

Tove Jacobsen and Randi Sæbøe in The Electronic Conversion of a Dictionary: From Norwegian-French to French-Norwegian describe the problems of conversion, which in the case of the Norwegian-French dictionary seem to have been rather fewer than anticipated. Quoting Masereeuw and Serail (1992), the article concludes that "the electronic form of the dictionary should no longer be considered a spin-off of the printed form, but rather the other way around".

Ljubima Jordanowa's Computerwörterbücher der bulgarischen Neologismen describes a dictionary of 4.5 megabytes covering the period 1980 to 1993 and listing not only undoubtedly new lemmata, but also new senses of established words and, in some cases, nonce-formations. The format gives the user access to information about the source, first occurrence and etymology of the words.

Ramesh Krishnamurthy is one of the senior compilers of the Cobuild Dictionary, Birmingham. In this paper, he explains the importance of using a large database like the Bank of English for dictionary work. He particularly stresses the significance of the additional grammatical information provided by the dictionary and the wealth of collocations given.

Klaus-Dieter Ludwig in his paper on Überlegungen zu einem Wörterbuch der Archaismen presents new plans for a dictionary of archaisms in the German language. The period after 1995 is the starting-point for looking at archaisms. These should be collected from the mid-19th century onwards.

Sándor Martsa's Metaphorization in the Prefixation of Hungarian and Russian Verbs describes Hungarian and Russian verbs of motion with indications of direction or location and their extended metaphorical use, where the spatial sense may retreat into the background or be entirely lost. The article concludes that "literal and metaphoric meanings form an interrelated network within which they can be identified relative to one another".

Geart van der Meer asks the very pertinent question, How Alphabetical Should a Dictionary Be? Taking combinations of high + noun as his point of departure, he examines the treatment of this phenomenon in a couple of English/Dutch dictionaries and in some monolingual English ones, demonstrating many differences between the various dictionaries and inconsistencies within individual dictionaries. He recommends the use of strict alphabetization wherever possible, no matter whether the English combination consists of one or more orthographical words, and makes the additional point that in bilingual dictionaries, source language collocations should be given entry status if their translation cannot be predicted from the translations of their constituent parts.

Arthur Mettinger's Cognitive Linguistics and Lexicography: Suggestions for a Diagrammatic Representation of un-Adj-formations in a Monolingual Dictionary discusses the meaning of the adjectival prefix un- and possibilities of representing it in diagram form. A comparison of structuralist and cognitivist approaches to a definition leads to the conclusion that "... both approaches yield practically identical typologies, though from different points of departure ...: while the structuralist framework and its descriptive apparatus concentrate on the intralinguistic aspects of un-Adj-formations, the cognitivist approach tries to represent linguistic phenomena in the context of human cognition in general and thus operates with basically non-linguistic tools of explanation". Both typologies may be helpful, however, and furthermore the use of diagrams may help to clarify to the lexicographer and the dictionary user the various meanings of the prefixes studied.

Viggo Hjørnager Pedersen's Hans Christian Andersen and Bilingual Lexicography describes work with Danish and English Andersen corpora, and discusses ways of presenting the many different English translations. Solutions include access via WordCruncher to a CD-based corpus, presentation of the translations in a variorum edition, and the compilation of a Danish Andersen dictionary with English translations of high-frequency items.

Hans-Jörg Schmid in his paper Introspection and Computer Corpora: The Meaning and Complementation of start and begin tackles two problems which have interested scholars and students of English for a long time. He has studied the semantic difference between the verbs start and begin and also the difference in meaning and use of the TO and ING complements. Both the Lancaster-Oslo-Bergen Corpus and the London-Lund Corpus of spoken English are used. Besides the general conclusion that begin + TO is preferred in writing and start + ING in speech, many interesting observations are made.

Andrejs Veisbergs' Idiom Transformation, Idiom Translation and Idiom Dictionaries describes the problems involved in translating idioms and discusses how far existing idiom dictionaries help to solve them. Often there are no "equivalents", and often idioms are alluded to rather than quoted in full, which does not make the translator's task any easier.

Acknowledgements: The editors wish to thank the authors of the contributions for placing their manuscripts at our disposal and all participants, old friends and newcomers, for joining the symposium. We are indebted for financial support to the Danish Research Council for the Humanities, Einar Hansen's Forskningsfond, the Center for Translation Studies and Lexicography, and the Faculty of the Humanities, Copenhagen University, and we cordially thank the British Council, Copenhagen, for grants to the social frame of the symposium.

Copenhagen, May 1994

The editors

INTRODUCTION

The Seventh International Symposium on Lexicography, organized by the University of Copenhagen, the papers of which are published in the present volume, was held on 5 and 6 May 1994. This time we had the pleasure of welcoming participants from Germany, Austria, Bulgaria, Denmark, Great Britain, Hungary, Latvia, Norway, the Netherlands, Russia, Sweden and Ukraine. The first day opened with a plenary session chaired by John Sinclair, of the University of Birmingham, and the second day with another, chaired by Herbert E. Wiegand, of the University of Heidelberg. The sectional sessions were divided into two: one section in English, the other in German.

From now on, a lecture in memory of Otto Jespersen is to be given every year. In 1994 it was delivered at the opening plenary session of this symposium by John Sinclair, of the University of Birmingham, founder of the Cobuild dictionaries and of the Birmingham corpus of the English language, the Bank of English. In his paper, Prospects for Automatic Lexicography, John Sinclair describes the problems connected with automatic lexicography and the instruments intended to solve them. The paper concludes with the prediction that the automatic production of dictionaries will become possible in the near future.

At the second plenary session of the symposium, Herbert E. Wiegand discusses several crucial problems relating to dictionaries in Über die Mediostrukturen bei gedruckten Wörterbüchern (On the mediostructures of printed dictionaries). Dictionaries can be conceived of as a traditional form of knowledge representation. The mediostructures within the dictionary link the knowledge elements represented in different sectors of the dictionary, at several levels of lexicographic description, so as to form a network. This paper focuses on dictionary-internal mediostructures, introducing a formalized theory of mediostructure in an informal way.

Christian Becker-Christensen and Peter Widell present a joint paper in which they analyse the Danish paradigms of nominal inflection built on the DANBØJ system. The Danish system of inflected forms may be considered rather complex, and nominal inflection the most complicated part of it. There is therefore a need for a computer-based system that can supply the required information in an appropriate and reliable form.

The paper by Ingmari Bergquist and Gunnar Persson, Masculine, Feminine and Epicene Nouns Revisited: Informant Reactions Versus Lexicographic Definitions, analyses the reaction of certain informants to the sexual implications of pejorative and ameliorative lexemes such as bastard, twit, kitten, peach. Recent dictionaries tend to give epicene or sex-neutral definitions of these words. Yet, despite deep disagreement among the informants, this analysis demonstrates that, for most native speakers, many of these words still suggest masculinity or femininity: thus bastard is, for the majority of informants, masculine, chatterbox feminine. The paper proposes that dictionary definitions should reflect these facts.

Evelyn Breiteneder, in her paper Herausforderungen der Textlexikographie: Der Belegschnitt (Challenges of text lexicography: the delimitation of references), addresses the thorny question of the delimitation of lexicographic references in text lexicography. She shows that a distinction should be drawn between the delimitations of the lexicographic reference considered as a whole and those of the reference texts that are to function as lexicographic references in conjunction with other components.

Ulrich Busse's paper, Probleme der Aussprache englischer Wörter im Deutschen und ihre Behandlung im Anglizismen-Wörterbuch (Problems of the pronunciation of English words in German and their treatment in the dictionary of anglicisms), discusses the transcription problems in the new dictionary of anglicisms produced at Paderborn, Germany. This dictionary project, launched by the late Professor Broder Carstensen, is now being continued by Ulrich Busse.

Bernhard Diensberg sets out his plan to produce A Concise Dictionary of Early Modern English Pronunciation, taking up the idea of the DEMEP dictionary presented by Professor Bror Danielsson, Stockholm, at the end of the 1970s. The project consists in drawing above all on the primary sources for early Modern English pronunciation, such as the testimony of orthoepists and spelling reformers.

Dieter Herberg, in his paper on Schlüsselwörter der Wendezeit. Ein lexikologisch-lexikographisches Projekt zur Auswertung des IDS-Wendekorpus (Keywords of a turning point. A lexicological and lexicographic project for the exploitation of the "IDS-Wendekorpus"), presents the outline of a project concerning the years 1989-1990, the time when the former German Democratic Republic ceased to exist and terms and concepts of the German language changed.

Tove Jacobsen and Randi Sæbøe, in their paper The Electronic Conversion of a Dictionary: From Norwegian-French to French-Norwegian, review the problems of conversion, which, in the case of the Norwegian-French dictionary, seem to have been far fewer than expected. Quoting Masereeuw and Serail (1992), the paper concludes that one should stop regarding the electronic form of the dictionary as a derivative of the printed form and rather adopt the opposite point of view.

In Computerwörterbücher der bulgarischen Neologismen (Computerized dictionaries of Bulgarian neologisms), Ljubima Jordanowa describes a dictionary of 4.5 megabytes covering the period from 1980 to 1993 and recording not only indisputably new lemmata but also new senses of well-established words and, in a few cases, nonce-formations. The format gives the user access to information on the source, first occurrence and etymology of the words.

Ramesh Krishnamurthy is one of the leading compilers of the Cobuild dictionary, Birmingham. In his paper, he explains the importance of using a large database such as the Bank of English for the production of dictionaries. He lays particular stress on the significance of the additional grammatical information provided by the dictionary and on the wealth of collocations offered.

In his paper, Überlegungen zu einem Wörterbuch der Archaismen (Reflections on a dictionary of archaisms), Klaus-Dieter Ludwig presents a new plan for a dictionary of archaisms in German. The period following 1995 will be the starting point for the consideration of archaisms, which will be collected from the middle of the 19th century onwards.

Sándor Martsa, in his paper Metaphorization in the Prefixation of Hungarian and Russian Verbs, describes Hungarian and Russian verbs of motion containing indications of direction or location, together with their extended metaphorical use, where the spatial sense may recede into the background or even disappear completely. The paper concludes that the literal and the metaphorical sense form an interconnected network within which they can be identified in relation to one another.

Geart van der Meer asks the very pertinent question, How Alphabetical Should A Dictionary Be? Taking combinations of high + noun as his point of departure, he examines the treatment of this phenomenon in some English-Dutch and monolingual English dictionaries, bringing out numerous disparities between the different dictionaries as well as inconsistencies within individual dictionaries. He advocates the application of strict alphabetization in all possible cases, regardless of whether the English combination consists of one or several orthographical words. In addition, he insists that, in bilingual dictionaries, source-language collocations should be given entry status if their translation cannot be predicted from the translation of their constituent elements.

In his paper, Cognitive Linguistics and Lexicography: Suggestions for a Diagrammatic Representation of un-Adj-formations in a Monolingual Dictionary, Arthur Mettinger discusses the meaning of the adjectival prefix un- and the possibilities of representing it in the form of diagrams. A comparison of the structuralist and cognitivist approaches to a definition leads to the conclusion that the two approaches arrive at practically identical typologies despite their different points of departure: while the structuralist framework and its descriptive system concentrate on the intralinguistic aspects of adjective formations with the prefix un-, the cognitivist approach endeavours to represent linguistic phenomena in the context of human cognition in general and therefore operates with a fundamentally non-linguistic explanatory apparatus. Nevertheless, both typologies may be useful, and the use of diagrams in particular helps to clarify, for the lexicographer and the dictionary user, the different meanings of the prefixes studied.

Viggo Hjørnager Pedersen, in his paper Hans Christian Andersen and Bilingual Lexicography, describes the work carried out on Danish and English Andersen corpora and reviews the ways of presenting the many different English translations of Andersen's texts. The solutions include access via WordCruncher to a corpus on compact disc, the presentation of the translations in a variorum edition, and the compilation of a Danish Andersen dictionary with English translations of the high-frequency items.

The paper by Hans-Jörg Schmid, Introspection and Computer Corpora: The Meaning and Complementation of start and begin, deals with two problems that have long held the attention of specialists and students of English. The author has studied the semantic difference between the verbs start and begin as well as the difference in meaning and use of the complements with TO and -ING. The Lancaster-Oslo-Bergen Corpus and the London-Lund Corpus of spoken English are used for this purpose. Besides the general conclusion that begin + TO is preferred in the written language and start + -ING in the spoken language, the paper contains many interesting observations.

In Idiom Transformation, Idiom Translation and Idiom Dictionaries, Andrejs Veisbergs sets out the problems inherent in the translation of idioms and discusses how useful existing idiom dictionaries are in overcoming these problems. Often there are no "equivalents", and often idioms are merely sketched rather than given in full, which does not make the translator's task any easier.

Acknowledgements: The editors wish to express their thanks to the authors of the papers, who kindly entrusted them with their manuscripts, and to all the participants, old friends and new acquaintances, for taking part in this symposium. We are indebted for financial support to the Danish Research Council for the Humanities, the Einar Hansen Research Foundation, the Center for Translation Studies and Lexicography, and the Faculty of the Humanities of the University of Copenhagen. We cordially thank the British Council, Copenhagen, for having helped to provide the social setting of this symposium.

Copenhagen, May 1994

The editors

PREFACE

The 7th International Symposium on Lexicography at the University of Copenhagen, whose proceedings are published here, took place on 5 and 6 May 1994 with participants from Bulgaria, Denmark, Germany, Great Britain, Latvia, the Netherlands, Norway, Austria, Russia, Sweden, Ukraine and Hungary. The symposium was opened by two plenary lectures. One was given on the first day by John Sinclair, University of Birmingham, the second on the following day by Herbert Ernst Wiegand, University of Heidelberg. The section papers were divided into two groups, an English-language section and a German-language section. From 1994 the 'Otto Jespersen Memorial Lecture' is to be held regularly. This year it was given, as a plenary lecture opening the symposium, by John Sinclair, University of Birmingham, the founder of the Cobuild dictionaries and of the Birmingham English language corpus, The Bank of English. In his contribution Prospects for Automatic Lexicography, John Sinclair describes the problems associated with automatic lexicography and the various methods that could help to solve them. His paper closes with the prediction that the production of automatic dictionaries will be a realistic possibility within a few years. In the second plenary lecture of the conference, Über die Mediostrukturen bei gedruckten Wörterbüchern, Herbert Ernst Wiegand dealt with important problems of lexicography. Printed dictionaries are to be understood as a traditional form of knowledge representation. Through the dictionary-internal mediostructure, the information given in different parts of the dictionary is interwoven into a network on several levels of lexicographic description. The contribution concentrates on dictionary-internal mediostructures, presenting a formalized theory of mediostructure in an informal way.
In their jointly written contribution, Christian Becker-Christensen and Peter Widell analyse the paradigms of Danish nominal inflection using the DANBØJ system. The Danish system of inflected forms is fairly complicated, and that of nominal inflection extremely so. For that very reason there is a need for a computer-based system that can generate the desired information correctly and reliably. The contribution by Ingmari Bergquist and Gunnar Persson, Masculine, Feminine and Epicene Nouns Revisited: Informant Reactions Versus Lexicographic Definitions, is a study of informant reactions to the sexual implications of pejorative and ameliorative lexemes such as bastard, twit, kitten and peach. Recent dictionaries tend towards sex-neutral definitions of such words. Despite considerable disagreement among the informants, it can be established that many of these words nevertheless imply maleness or femaleness: bastard, for example, is male for the majority of informants, whereas chatterbox is female. The paper proposes that this fact be taken into account when dictionary definitions are written in future. In her contribution Herausforderungen der Textlexikographie: Der Belegschnitt, Evelyn Breiteneder deals with the delimitation of lexicographic citations in text lexicography. She shows that a distinction should be made between the delimitation of lexicographic citations as such and the delimitation of those citations whose function only emerges from the interplay of the citation concerned with other components. Ulrich Busse, in his contribution Probleme der Aussprache englischer Wörter im Deutschen und ihre Behandlung im Anglizismen-Wörterbuch, deals with problems of transcription in the new Anglizismen-Wörterbuch produced at the University of Paderborn. The dictionary project founded by the late Professor Broder Carstensen is now being continued by Ulrich Busse. Starting from the ideas put forward in the late 1970s by Professor Bror Danielsson, Stockholm, for a DEMEP dictionary, Bernhard Diensberg describes his plan to produce A Concise Dictionary of Early Modern English Pronunciation. The plan is to draw chiefly on primary sources for early English pronunciation, e.g. statements by contemporary orthoepists and spelling reformers. Dieter Herberg, in his contribution Schlüsselwörter der Wendezeit. Ein lexikologisch-lexikographisches Projekt zur Auswertung des IDS-Wendekorpus, presents a pilot project covering the period from 1989 to 1990, when the former GDR ceased to exist and many words and concepts of German changed. Tove Jacobsen and Randi Sæbøe describe in The Electronic Conversion of a Dictionary: From Norwegian-French to French-Norwegian the problems of conversion, which, however, proved smaller for the Norwegian-French dictionary than initially assumed. With reference to a quotation from Masereeuw and Serail (1992), the paper concludes that an electronic dictionary should no longer be a spin-off version of the printed dictionary; rather, the reverse should be the case.
In the paper Computerwörterbücher der bulgarischen Neologismen, Ljubima Jordanowa describes a 4.5-megabyte dictionary covering the period 1980 to 1993, which records not only new lemmata but also new meanings of existing words and, in some cases, ad hoc formations. The dictionary articles contain information on the source, first occurrence and etymology of the individual lemmata. In his contribution, Ramesh Krishnamurthy, one of the founders of the Cobuild Dictionary, Birmingham, sets out his view that all dictionary work must be based on a large database such as the 'Bank of English'. In particular he emphasizes the importance of additional grammatical information in the dictionary and the wealth of collocations offered. Klaus-Dieter Ludwig, in his contribution Überlegungen zu einem Wörterbuch der Archaismen, presents a new plan for producing a dictionary of archaisms in German. The period after 1995 serves as the starting point for recording the archaisms. These in turn are to be collected from the middle of the 19th century onwards. Sándor Martsa describes in Metaphorization in the Prefixation of Hungarian and Russian Verbs Hungarian and Russian verbs of motion, with indications of direction and location and of their potential metaphorical use, in which the spatial meaning often either recedes into the background or disappears altogether. The conclusion of the paper is that literal and metaphorical meanings form a related network within which they can be identified relative to one another. Geart van der Meer asks a highly relevant question: How Alphabetical Should a Dictionary Be? Starting from the construction 'high' + noun, he examines a number of bilingual English/Dutch and monolingual English dictionaries. In doing so he identifies a variety of differences between the individual dictionaries as well as inconsistencies within one and the same dictionary. He recommends, where possible, a strictly alphabetical procedure, regardless of whether the English construction consists of one or more orthographic words. With regard to bilingual lexicography, he further advocates the lemmatization of source-language collocations in cases where the translation does not simply follow from a translation of the individual constituents. Viggo Hjørnager Pedersen describes in Hans Christian Andersen and Bilingual Lexicography the work on the Danish and English Andersen corpora. He pays particular attention to how the many English translations can be presented. Possible solutions include access via a CD-based corpus, presentation of the translations in a "variorum" edition, and the compilation of a Danish Andersen dictionary with English translations of high-frequency lexemes. Arthur Mettinger deals in Cognitive Linguistics and Lexicography: Suggestions for a Diagrammatic Representation of un-Adj-formations in a Monolingual Dictionary with the meaning of the adjectival prefix un- and the possibility of representing this meaning by means of a diagram. A comparison of the structuralist and cognitivist approaches to a definition leads to the conclusion that both approaches, albeit from different starting points, yield practically identical typologies. While the structuralist description concentrates on the language-internal aspects of un-Adj formations, the cognitivist approach tries to see the linguistic data in the context of human cognition in general and therefore uses, in principle, an explanatory model that is not directly linguistic.
Both typologies can nevertheless be useful, and the use of diagrams also gives the lexicographer good help in clarifying the different meanings of the prefixes in question. Hans-Jörg Schmid deals in his contribution Introspection and Computer Corpora: The Meaning and Complementation of start and begin with two questions that have occupied English scholars for a very long time. He has examined closely the semantic difference between the verbs start and begin, as well as the meaning and use of TO and -ING, using two corpora, the Lancaster-Oslo-Bergen corpus and the London-Lund corpus of spoken English. Besides the general conclusion that begin + TO is preferred in writing and start + -ING in speech, the paper contains a wealth of other interesting observations. Andrejs Veisberg describes in his contribution Idiom Transformation, Idiom Translation and Idiom Dictionaries the problems associated with translating idioms and asks to what extent existing idiom dictionaries can help to solve them. Frequently no "equivalents" are to be found, and often the idioms are not cited in full but merely referred to. This does not make the translator's task any easier.

The editors warmly thank the authors for their willingness to make the manuscripts of their contributions available for printing, and the members of our research group for their active participation in the symposium. We are also greatly indebted to the Danish Research Council for the Humanities, Einar Hansens Forskningsfond, the Faculty of Humanities of the University of Copenhagen and the Centre for Translation Studies and Lexicography for financial support of the symposium, and to the British Council for contributions to the social programme.

Copenhagen, May 1994

The editors

John Sinclair

Prospects for Automatic Lexicography

Contents
1. Homage to Otto Jespersen
2. The Drudgery of Lexicography
3. Automatic Lexicography
4. A model of lexicography
5. Defining Language
6. The Corpus
7. Retrieval
8. Analysis
9. Organization
10. Drafting
11. Conclusion

1. Homage to Otto Jespersen

Otto Jespersen is one of the Great Grammarians of English, whose work remains relevant today, and is so well argued and documented that it will always be relevant and interesting. My own biggest single investment as a student was in a copy of his complete Modern English Grammar, and it has been for me a model of how to study language as much as a store of information about English grammar. My title, then, may seem a little inappropriate. Jespersen was a grammarian, not a lexicographer, and his time was before the time of automation. True, I have just had the honour of presenting to the University, on behalf of Dr Jespersen's family, the typewriter on which all his famous works were first manifested; I am sure that many of his contemporaries were still struggling with the technology of the fountain pen. But how can we tell what his attitude would have been to the IT revolution, the huge corpora and the automatic tagging and parsing of today's linguistics? As regards lexicography, we should note that Jespersen was one of the most lexically oriented of grammarians; although he was doubtless remote from the semantically obsessed dictionaries of his time, which paid very little attention to usage and collocation, he would I am sure have approved very much of the trend, especially in learners' dictionaries, of giving more and more grammatical and phraseological information, linking the lexis and the grammar. In particular, he would have been delighted at the way in which computers can organize and present examples taken straight from texts. His own work is remarkable for the profusion of real examples, in marked contrast, I regret to say, to most of the grammars that have come after him. We have gone through a lengthy period - not all ascribable to the influence of MIT - of relative neglect of the evidence of usage, and Jespersen is a monument to the importance of the attestation of patterns in the activity of communication. I remember as an undergraduate hearing from one of his contemporaries about Jespersen's desk. It was a huge roll-top affair, with dozens of pigeonholes and little drawers and compartments in front, and down the sides several large drawers with various subdivisions in them. Apparently Jespersen would find in his reading a juicy example of something grammatically interesting, and make out a slip - just like any lexicographer - with the example and its source. He would then place it in the appropriate compartment in his desk - there was one for passives, one for auxiliaries, one for adverbials, etc. Then, when he wanted to write a piece on a grammatical matter, he would tip out the contents of the compartment and build the paper from the evidence that he had collected. His desk was a simple database. Nowadays he would probably use one of a thousand software packages to achieve the same effect, but his methodology is quite contemporary, and I think he would have found modern computer technology entirely to his taste.

2. The Drudgery of Lexicography

Corpus linguistics is at its heart a way of increasing efficiency in the handling of linguistic evidence. It is not, at first sight, an agent for change. This role would only be possible if the evidence had not been efficiently handled in the past - if linguists had been restricted in their access to the evidence of their languages; if not even Jespersen's meticulous hoarding and sorting of examples had been enough to form the basis of accurate and reliable grammars. The reason that change is coming fast is that the capacity and processing power of modern computers so far exceed the ability of an unaided human being that new observations and new challenges to the assumptions of linguistics are an everyday certainty in the presence of corpora. There is also the difference in skills between human and machine - the indefatigable routine of counting and sorting precisely defined phenomena, in which the machine excels, as against the speculation, hypothesizing and lateral thinking of the human, who has access to personal experience of language that goes beyond our present ability to understand. Wherever lexicographers are talked about, the famous phrase of Dr. Johnson has to be reckoned with - the "harmless drudge". Johnson himself was a lot more than a drudge and was certainly not harmless, but he managed to lay a stigma on the whole profession, suggesting that it was largely a clerical enterprise. He drew attention to the substantial responsibility of the lexicographer to collect and organize the evidence on which to base the lexicographical statements. This is the part that the computer can do far better than the human being. Instead of copying out every context once for every word in it, in a monstrous version of Jespersen's desk, it can just hold any number of complete texts, and retrieve anything a lexicographer wants as and when it is needed.
It replaces the armies of readers, who were never properly briefed on what to look for, and who applied human judgement just where it would cause maximum disruptive effect - as a filter for the evidence before the expert had a chance to examine it. That part of the drudgery has, then, been removed for Dr Johnson's successors. Other parts have been substantially relieved. For example in traditional lexicography an entry is built up physically; the little piles of paper slips are stacked according to provisional criteria, and moved around as more examples come in and are added to the picture. Sometimes there will be major rethinkings, and wholesale restructurings of big entries, and there are dark stories of crucial slips getting lost, stuck to coffee cups or used as bookmarks, thus changing the description of a language! In the structured text files of a present-day language computing environment, there are hazards, to be sure, but not of the same type. Building up an entry, changing one's mind, revisiting, editing and moving material from corpus to dictionary - all these are handled effortlessly by the machine, which can also calculate the length for you, validate and check aspects of the work against automatically applied criteria, instantly show a printed version of what has been compiled and cut out the whole drudgery of proof-reading.

3. Automatic Lexicography

We might, then, see the achievement of automatic lexicography as the continuation of this process of reduction of drudgery. Each new software tool improves life for the lexicographer, uses his or her human skills more efficiently and speeds up the work. Good news for those who pay the bills. Nowadays a massive new dictionary can be completed in two or three years rather than two or three decades. My feeling is that this would not be a safe or adequate method for arriving at automatic lexicography. However small the human role becomes in relation to the general effort, it is the determining factor, and in this model cannot be removed. Everything depends on the moments of human judgement, to pick out the significant patterns and impose a meaningful organization on the mass of material. All the computer does is reduce the drudgery, not make the decisions. Many, if not most, lexicographers and language analysts feel that the computer is not likely to go much further than that. The track record in areas such as machine translation is poor, extravagant claims are made, and deep cynicism is the result. If I am going to assert that automatic lexicography is attainable within present understanding and technology, and within a reasonable period of time, then it is certainly not a straightforward automation of present practice. Nor is the result guaranteed to replicate the organization and format of present-day, human-compiled books. We have to go back to what a dictionary is, and design an appropriate set of procedures with which to assemble a dictionary automatically. We usually think of a dictionary as an account of the meanings of words.
Words are listed alphabetically and alongside each is an account of its meaningful behaviour; its various meanings, if it has more than one; its participation in phrases, at least those where it is one of the principal words; sometimes an example or two; often some other useful information, like its textual provenance, or how it is pronounced or what its origins are. The last two of these features are not recoverable from textual evidence with any reliability, so an automatic dictionary will not generate pronunciations or etymologies without assistance. Of the rest, the examples should be fairly easy, since they are the starting point of the lexicography. The other two features pose some problems. The distinction between word and phrase in the context of usage is complex. No reliable criteria exist for establishing a line between a word meaning which is governed by a tightly restricted context and a phrase involving that word. Perhaps that is a human problem, and no distinction is ultimately necessary. The feature that is central to the process of lexicography, and seemingly impossible to replicate in a machine, is the statement of meaning, the definiens. Even if the computer could organize the instances from a corpus into groups that would correspond to the human impression of meanings (and this is a big "even if"), how could it express the meanings in its own words?

4. A model of lexicography

It is necessary to build up an appropriate model of how a dictionary might be written from a collection of evidence, and then to see what, if anything, stands in the way of implementing the model. The outline of the process is:

CORPUS -> PROCEDURES -> DICTIONARY

That is, we begin with a corpus, apply certain procedures to it, and end up with an output that is a dictionary. The design of the corpus is the first critical factor, which controls everything else. Unless all the necessary information is available somewhere in the corpus, the eventual dictionary will be deficient, because the computer cannot originate information. Next, the various procedures will have to be worked out. They can initially be classified as follows:

RETRIEVAL, ORGANIZATION, DRAFTING

First the necessary information will have to be retrieved from the corpus - not necessarily all at once; then each set or type of information will have to be organized into a lexicographic structure; then the text of the definitions will have to be generated. There is, of course, an assumption here already - that conventional statements of meaning such as can be found in human dictionaries are an essential component of any dictionary worth considering. Perhaps other means can be found for expressing the meaning. In these days of multimedia CD-ROMs and virtual reality it may indeed be unnecessary to limit the automatic dictionary to what can be recovered and stated in the language that is being described. However, to keep our feet on the ground and avoid accusations of sharp practice I shall set myself the goal today of designing a method for automatic generation of the definitions as well as everything else.
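As an illustration only, the CORPUS -> PROCEDURES -> DICTIONARY outline can be sketched as three small Python functions, one per class of procedure. Everything here is an assumption made for the sketch: the function names, the toy corpus, the choice of "the word following the headword" as the organizing pattern, and above all the drafting step, which merely lists patterns with an example and makes no attempt at a real statement of meaning.

```python
import re
from collections import defaultdict


def tokens(sentence):
    """Lower-case word tokens of a sentence (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", sentence.lower())


def retrieve(corpus, headword):
    """RETRIEVAL: collect every sentence of the corpus that contains the headword."""
    return [s for s in corpus if headword in tokens(s)]


def organize(sentences, headword):
    """ORGANIZATION: group sentences by the word that follows the headword.

    This stands in for a real lexicographic structure; it is an assumption
    of the sketch, not a method proposed in the paper.
    """
    groups = defaultdict(list)
    for s in sentences:
        toks = tokens(s)
        i = toks.index(headword)
        following = toks[i + 1] if i + 1 < len(toks) else "<end>"
        groups[following].append(s)
    return dict(groups)


def draft(headword, groups):
    """DRAFTING: placeholder that lists each pattern with one example."""
    lines = [headword]
    for n, (following, examples) in enumerate(sorted(groups.items()), start=1):
        lines.append(f"  {n}. pattern '{headword} {following}': e.g. {examples[0]}")
    return "\n".join(lines)


def build_entry(corpus, headword):
    """Run the three procedures in sequence for one headword."""
    return draft(headword, organize(retrieve(corpus, headword), headword))


# Hypothetical toy corpus, for illustration only.
CORPUS = [
    "She will start to read the report.",
    "They start working at nine.",
    "We start to see the pattern.",
    "The meeting was short.",
]
entry = build_entry(CORPUS, "start")
print(entry)
```

The point of the sketch is the division of labour: nothing reaches the drafting step that was not first retrieved from the corpus, which is why a deficient corpus yields a deficient dictionary.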

5. Defining Language

The language of dictionary definition is often referred to as a metalanguage, and this is rather misleading, since it is possible (cf. the Cobuild range of dictionaries) to write definitions in ordinary sentences, apparently of the object language. Certainly defining is one of the frequent kinds of language behaviour that we indulge in, interspersing it with all sorts of other kinds of behaviour without appearing to switch between an object language and a metalanguage. Harris (1988) points out that, uniquely, language is its own metalanguage, and there seems to be no line that can be drawn to circumscribe the metalanguage. Here is an instance of defining behaviour, from a recent conversation.

"There's a new phrase in cafes in Italy - caffè grande. There's caffè ristretto, you know, very small, then caffè normale, that's espresso, then caffè lungo, with water. And now caffè grande, in a big cup, for Americans."

So any corpus is likely to contain lots of definitions or at least defining behaviour, where it is, as in the example above, not formally expressed as a set of definitions.

6. The Corpus

There has been little activity in the design of corpora for particular purposes. The first corpora were simply attempts to make available better evidence than was previously available, and they used fairly straightforward sampling techniques. Naturally, if someone wants to build a corpus of the language of a restricted area of experience, they will select texts from that area, thus restricting the vocabulary and other features of the texts; otherwise the criteria are the same as for general reference corpora, namely a set of parameters that will ensure good variety and coverage. For automatic lexicography we must design a more complex structure because we must retrieve two kinds of data from the corpus, and not just one. So far, corpora have been used almost exclusively to provide examples of words and phrases in use. Linguists have not attempted to explore the propositional content of the texts in the corpora. Of course, for many this is the ultimate goal, but few think that our understanding of language is nearly good enough to be able to devise a useful system at the present time. Only the methods of content analysis, keyword searches and the like, are able to get results in this area, and these are not regarded as very interesting to linguists. There is a halfway house that is worth building - a means of recognising different kinds of propositions now, even if we are a long way from being able to say what they mean. What lexicography needs is definitions, and it may be the case that definitions can be recognised, even if not understood. We can be reasonably sure that there are instances of definition in almost all text. It is very likely that software can be designed that can pick out definitions, or sentences that make some sort of contribution to definition - defining behaviour, as it was called above. Research in progress by Geoff Barnbrook and Jennifer Pearson is moving in this direction.
This would allow us to relate texts to each other by a kind of intertextuality, one text containing definitional material relevant to another text. Intertextuality in general is the relationship of one text to another through quotation, echo or reference. Let us define lexicographical intertextuality as a relationship where a word or phrase used in one text is defined in the other. We need to have, then, two sets of texts at least in the corpus - one set, called the central component, which contains the language for which a dictionary is being prepared, and another, called the supporting component, which contains defining material for some of the words and phrases in the central component. There is no requirement on a text to contain its own definitions, and indeed it seems natural that every text takes for granted a large proportion of its vocabulary; but given the large amount of defining behaviour that goes on, it should be possible to identify a set of supporting texts for any central component. This intertextuality is particularly valuable for more specialised kinds of lexicography, which the prospect of automation may attract. It would be a mistake just to gather lots of representative text and try to write a dictionary on that basis. The mere statistics of word occurrence militate against that kind of approach; in all large wordlists published so far, about half the vocabulary consists of single occurrences of words, and only a small proportion of the words occur often enough to support an accurate dictionary entry. In technical and scientific texts, and other specialised ones, there will be a lot of important words that are not defined because they are taken for granted; similarly there will be a lot of words that are not defined because they are not important enough, and only occur once or twice in a text. Nevertheless, the reader of the text will expect to find them all in a dictionary. The less common words may need some help from other texts too, texts where these words are more central. There is also, then, the extension component, which can supply definitions or partial definitions of some or all of these words. So the vocabulary of the central component, the target texts, will consist of several sets of words:

words that are articulated in the central component, that is, used often enough for a definition to be checked against usage. Of these,
    some will be defined in the central component;
    others will be defined in the support component.

words that are only mentioned in the central component; they do not occur often enough to yield enough evidence for a definition. Of these,
    some may be defined in the support component;
    others will be defined in the extension component.

It is not possible to say as yet how many instances of a word are needed to provide the evidence for a machine-generated definition; hence a corpus design that gives high priority to the possibility of definitions being found in support or extension texts, and if not full definitions then at least a number of statements that can be checked against a draft machine-generated definition. It has been clear from the early days of computer-assisted lexicography that the statistics of word occurrence made it impossible for any corpus to contain sufficient evidence for the meanings of its vocabulary to be defined. This leads to the requirement of the extension corpus.
From informal study of technical texts over the years, it has also become clear that some words are assumed to be well known to the reader, and therefore although they may be used quite often, the evidence in the corpus may not be of the kind that would lead to a non-controversial definition; that is to say, the meaning of one such word in the corpus may be ascertainable, but may be very partial or biased as compared with the broad spread of the word in a variety of contexts.
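The division of the central vocabulary into articulated and mentioned words, each looked up in the appropriate components, can be sketched roughly as follows. The threshold `min_count` is purely an assumption, since the text stresses that the number of instances needed is not yet known; the function name, the toy texts and the `defined_in` table (which would in practice come from a definition-finding tool) are likewise hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Lower-case word tokens of a text (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def classify_vocabulary(central_texts, defined_in, min_count=3):
    """Label each word of the central component and say where its definition was found.

    central_texts: list of strings, the target texts.
    defined_in:    dict mapping a component name ("central", "support",
                   "extension") to the set of words defined there.
    min_count:     assumed threshold separating 'articulated' from 'mentioned'.
    """
    counts = Counter(t for text in central_texts for t in tokens(text))
    report = {}
    for word, n in counts.items():
        if n >= min_count:
            # Used often enough for a definition to be checked against usage.
            status, order = "articulated", ("central", "support")
        else:
            # Too rare in the central component to yield enough evidence.
            status, order = "mentioned", ("support", "extension")
        source = next((c for c in order if word in defined_in.get(c, set())), None)
        report[word] = (status, source)
    return report


# Hypothetical toy data, for illustration only.
CENTRAL = [
    "The codec compresses audio. The codec is fast. A codec needs a bitrate.",
    "The bitrate of the codec varies with the quantizer.",
]
DEFINED_IN = {
    "central": {"codec"},
    "support": {"bitrate"},
    "extension": {"quantizer"},
}
report = classify_vocabulary(CENTRAL, DEFINED_IN)
print(report["codec"], report["bitrate"], report["quantizer"])
```

Words for which no component supplies a definition come back with `None` as their source, which is one way of flagging where the corpus design still has gaps.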

7. Retrieval

It is clear that retrieval will have two rather separate sides to it. On the one hand, the defining behaviour will have to be identified, evaluated and sorted, and on the other side the central component will have to be exhaustively processed for linguistic information. For each separate job to be done, I shall specify a piece of software to do the job. Sometimes such a tool is already working, at least in prototype form; other times it may still be at the design stage; occasionally it may not even be there, but the job does not seem intractable. I shall indicate the stage of development of each tool as far as I am personally aware, but there may be groups working under conditions of commercial confidentiality that are more advanced in some areas.

7.1 A Lexicographical Evaluator

This tool reads the sentences in a text and derives from them information which is potentially useful in a definition. In particular it finds definition sentences and related types of sentence. It is also used to establish intertextuality relations in a corpus, and to supply settings for variables in draft definitions. The central algorithm for this tool is nearly ready since its main criteria will be the categories of the definition parser (see Sinclair and Barnbrook 1994). The parser's initial strategy is to identify which of a number of types a definition statement belongs to, and it should not be difficult to turn this into a searching device for any sentence that falls into any one of its types. It will also be necessary to extend the types - at present covering any sentence which could be a Cobuild definition - to include less formal kinds of defining statement. In particular, the tool will be sensitive to textual restrictions on the occurrence of words, and to discriminators - aspects of meaning that differentiate a word from others round it.
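To make the idea of a searching device for definition sentences concrete, here is a minimal Python sketch that assigns each sentence to the first of a few surface patterns it matches. The three patterns and their labels are invented for the illustration; they are not the categories of the definition parser cited above, and a working tool would need a far richer and more carefully tested inventory.

```python
import re

# Hypothetical definition types, each with a crude surface pattern.
DEFINITION_PATTERNS = [
    ("is-a", re.compile(
        r"^(?:an?|the)\s+(?P<term>[\w-]+)\s+is\s+(?:an?|the)\s+(?P<gloss>.+)$", re.I)),
    ("means", re.compile(
        r"^(?P<term>[\w-]+)\s+means\s+(?P<gloss>.+)$", re.I)),
    ("called", re.compile(
        r"^(?P<gloss>.+?)\s+is\s+called\s+(?:an?\s+|the\s+)?(?P<term>[\w-]+)$", re.I)),
]


def find_definitions(text):
    """Return (type, term, gloss) for every sentence that matches a pattern."""
    results = []
    # Split on whitespace that follows sentence-final punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        body = sentence.rstrip(".!?")
        for kind, pattern in DEFINITION_PATTERNS:
            match = pattern.match(body)
            if match:
                results.append((kind, match.group("term").lower(), match.group("gloss")))
                break  # one type per sentence is enough for the sketch
    return results


# Hypothetical sample text, for illustration only.
TEXT = ("A lexeme is a unit of lexical meaning. The meeting ran late. "
        "Collocation means the habitual co-occurrence of words.")
found = find_definitions(TEXT)
for item in found:
    print(item)
```

Recognising that a sentence has the shape of a definition requires no understanding of what it says, which is the halfway house the paper argues for.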

7.2 Pattern Retrieval Kit

I will not go into details, since most students of corpus linguistics will be well aware of the characteristics of concordances and the like. In addition to the normal toolkit, we shall need:

7.2.1 A Fuzzy Matcher

This software compares a given pattern with a text and returns instances of matches at various levels of exactness. Such a tool is already working, called CHOC (Osborne 1994), and is being developed to fit in with other tools.
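The general idea of matching at various levels of exactness can be illustrated with Python's standard `difflib` module. This is not a description of how CHOC works: the sliding-window strategy, the position-by-position word comparison, the function name `fuzzy_find` and the threshold are all assumptions made for the sketch.

```python
import re
from difflib import SequenceMatcher


def fuzzy_find(pattern, text, threshold=0.8):
    """Return (score, window) pairs for word windows that resemble the pattern.

    Each window has as many words as the pattern. The score is the mean of
    the character-level similarity of the words compared position by
    position, so 1.0 is an exact match and lower values are looser matches.
    """
    pat = re.findall(r"\w+", pattern.lower())
    toks = re.findall(r"\w+", text.lower())
    hits = []
    for i in range(len(toks) - len(pat) + 1):
        window = toks[i:i + len(pat)]
        similarities = [SequenceMatcher(None, p, w).ratio() for p, w in zip(pat, window)]
        score = sum(similarities) / len(pat)
        if score >= threshold:
            hits.append((round(score, 2), " ".join(window)))
    # Best matches first.
    return sorted(hits, reverse=True)


# Hypothetical sample text, for illustration only.
TEXT = "We must take into account the cost. They took into account every risk."
hits = fuzzy_find("take into account", TEXT)
for score, window in hits:
    print(score, window)
```

Lowering `threshold` admits progressively looser matches, which is one simple way of exposing the "levels of exactness" to the user.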

8. Analysis

A number of standard tools are brought into play here, and I shall mention them briefly. The first group provide some basic grammatical information that will be extremely useful in sorting out the meanings, since although not all meanings follow grammatical classifications, a considerable number do.

8.1 A Word Class Tagger This tool ascribes a word class to each successive word in a corpus. The word classes are chosen from a prearranged tagset.

8.2 A Parser A parser provides a syntactic analysis for every sentence, clause etc. in a text. Again this comes from a pre-existing grammar, which may not always fit the facts of the corpus well but for the time being we must accept one of those available.


John Sinclair

8.3 A Collocator This tool finds the significant co-occurrence of words and phrases in the immediate textual environment of a word. Various versions exist, differing largely in the type of statistical tests used, but this is one of the central tools in the toolkit.
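As an illustration only, the following sketch counts the words occurring within a fixed span of a node word and ranks them by a simple t-score. The choice of t-score, the span of four words and the names `collocates`, `span` and `min_count` are assumptions made for this example; the text notes that existing collocators differ precisely in the statistical test they use.

```python
import math
from collections import Counter


def collocates(tokens, node, span=4, min_count=2):
    """Rank words co-occurring with `node` within `span` tokens on either side.

    Returns (word, observed_count, t_score) triples, highest t-score first.
    Assumption: expected frequency is estimated from overall word frequency
    and the total window size around all occurrences of the node.
    """
    freq = Counter(tokens)
    n = len(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - span), min(n, i + span + 1)
        for j in range(lo, hi):
            if j != i:
                observed[tokens[j]] += 1
    scored = []
    for word, count in observed.items():
        if count < min_count:
            continue
        expected = freq[node] * freq[word] * 2 * span / n
        t_score = (count - expected) / math.sqrt(count)
        scored.append((word, count, round(t_score, 2)))
    return sorted(scored, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    sample = "fire the gun then fire the gun again and load the gun".split()
    for row in collocates(sample, "fire", span=2):
        print(row)
```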

9. Organization The software in this segment of the work is in general not so well developed as in the retrieval and analysis sections, but it is taking shape.

9.1 Grouping Examples The beginnings of sense discrimination are in the grouping of examples according to the patterns of the words round about. It is a cornerstone of corpus-based lexicography that there is a recoverable association between meaning and form, whether it is the unambiguous single word or the patterning of two or more words together. The Saussurean relation of arbitrariness between sign and meaning persists, so that the meaning cannot be deduced from the pattern, but meanings, again in Saussurean doctrine, can be differentiated. This association can be taken literally - in fact the more unhesitatingly it is assumed that each different pattern correlates with a different meaning, the clearer the job becomes. Some important pieces of software are in preparation at this stage.

9.1.1 A Typicaliser This is an example selector which considers each line of a concordance in relation to the concordance as a whole, and groups lines with similar characteristics. The sensitivity of the discrimination is such that lines with the same value turn out to be examples of the same pattern/meaning complex. A prototype typicaliser called TYPICAL is in operation and under development in current work; it is an exceptionally powerful tool.

9.1.2 A Phrase Finder This is a general kind of sense discriminator. Where analysis indicates that the presence of one or more words in the context causes the rest of the context to differ from any of the other patternings, a putative phrase is constructed and is considered potentially a different sense. Phrase finders deal with the output of fuzzy matchers, typicalisers, and second-stage collocators (dealing with the intercollocation of collocates). This tool is at present little more than a rough design, since it relies on the output of other tools which themselves are not stabilized in their output protocols.


9.2 Selecting Examples This is possibly the most important activity in computational lexicography, because it provides the evidence on which the statements about meaning rest. It is not easy for a human being, who has little alternative to reading every example and agonizing over almost impossible choices. Where the computers are responsible for the decision making, selecting examples may well be a very late stage in the actual construction of a dictionary text, for the computer needs to have all the examples available throughout so that it can make principled, and often statistically supported decisions. When it does come, however, it proves to be rather easy for the machine.

9.2.1 C-LECT This piece of software accepts as input a concordance, with one collocation picked out to indicate the sense that is relevant. The program hunts for normalcy among the examples, calculated arithmetically, and usually provides an example that is at least satisfactory. C-Lect is in daily use in a number of routines. It is not the only possible way of selecting examples, and a number of others are in preparation, from variations on the TYPICAL program to Classifiers, which handle each successive pair of lines and progressively classify them all.
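The arithmetic that C-LECT uses is not specified here, so the following is only a sketch of one way "normalcy" might be calculated. It assumes that a concordance line is more normal the more of its words recur in other lines containing the chosen collocate. The function name `select_example` and the scoring formula are hypothetical.

```python
from collections import Counter


def select_example(concordance, collocate):
    """Pick the most 'normal' concordance line among those containing `collocate`.

    Assumption: normalcy of a line = mean number of candidate lines in which
    each of its words occurs. Returns None if no line contains the collocate.
    """
    candidates = [line for line in concordance if collocate in line.lower().split()]
    if not candidates:
        return None
    tokenised = [line.lower().split() for line in candidates]
    # Count, for every word, the number of candidate lines it appears in.
    line_freq = Counter(word for toks in tokenised for word in set(toks))

    def normalcy(toks):
        return sum(line_freq[w] for w in toks) / len(toks)

    best = max(range(len(candidates)), key=lambda i: normalcy(tokenised[i]))
    return candidates[best]


if __name__ == "__main__":
    lines = ["he fired the gun", "she fired the gun twice", "they fired the manager"]
    print(select_example(lines, "gun"))
```

Because the score is an average, a short line made up of frequent words beats a longer line padded with rare ones, which is one plausible reading of "hunting for normalcy".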

10. Drafting Once the organization of the vocabulary is established via the previous routines, we must find a way of drafting entries. These will be based on a major piece of software, the dictionary definition parser (Sinclair and Barnbrook 1994).

10.1 A Definition Parser This suite of programs breaks down full-sentence definitions into components of the defining process. It has already provided the basic engine for the Evaluator (7.1), and it also powers the next item.

10.2 A Definition Drafter This is the reverse of a definition parser. From database information about a putative sense of a word or phrase and its grammar, the parser can offer a framework and a specification for the definition. Where there are variables, the parser's analysis of the verbal environment can be consulted for an appropriate setting. For example, if a sense is labelled "ergative verb, usually passive", the parser/generator will supply on the left hand side "If VI is verbed or verbs, ...". The context is now scanned for the most suitable setting of variable VI; if it is "you", then the definition continues with "you"; if "someone", the pronoun will be "they", and so on. The parser has not yet been turned into an operational generator, but because of its pattern-matching algorithms, this is not expected to be a big job when the time comes.
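Since the generator is not yet operational, the following is merely a hedged sketch of the step just described: choosing a setting for the variable from the context and letting that setting determine the pronoun. The table `SETTINGS`, the default "something" with pronoun "it", and the function names are assumptions introduced for illustration; only the pairings "you"/"you" and "someone"/"they" come from the text.

```python
from collections import Counter

# Hypothetical agreement table: variable setting -> (copula, anaphoric pronoun).
SETTINGS = {
    "you": ("are", "you"),
    "someone": ("is", "they"),
    "something": ("is", "it"),  # assumed default, not taken from the text
}


def choose_setting(context_subjects):
    """Pick the most frequent known setting observed in the context."""
    counts = Counter(s for s in context_subjects if s in SETTINGS)
    if not counts:
        return "something"
    return counts.most_common(1)[0][0]


def draft_left_hand_side(base, third_person, participle, context_subjects):
    """Draft the left hand side for a sense labelled 'ergative verb, usually passive'."""
    setting = choose_setting(context_subjects)
    copula, pronoun = SETTINGS[setting]
    # 'you' takes the base form of the verb; third-person settings take the -s form.
    active = base if setting == "you" else third_person
    return f"If {setting} {copula} {participle} or {active}, {pronoun} ..."


if __name__ == "__main__":
    print(draft_left_hand_side("freeze", "freezes", "frozen",
                               ["something", "something", "someone"]))
```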


10.3 The drafting of definitions often involves the choice of a superordinate, which is rarely available in the text. Use has to be made of two further derivatives of the definition parser.

10.3.1 A Thesaurus Matcher This program examines an occurrence of a word or phrase in relation to its place in a thesaurus. It is not appropriate here to explain how a semi-automatic thesaurus will be constructed, but plans are well advanced for it. Based on it, the next item is of great importance in automatic lexicography.

10.3.2 A Generalizer This software groups together words and phrases that share the same superordinate. It is based on the thesaurus matcher, which in turn is based on the definition parser. For example there is a meaning of the verb fire which is to do with weapons. Now the word weapon may not itself collocate importantly with fire, but words like gun do. The thesaurus will show that a gun is a weapon, and so is a rifle, a revolver etc. Another class of things that are fired are projectiles, such as bullets, arrows and missiles. The generalizer identifies any appropriate superordinate and, depending on the collocation analysis, offers a phrasing of the first part of the right hand side of the definition.

11. Conclusion It is contended that, given a corpus of the kind envisaged above, and of sufficient size for the job, and a kit of software tools like the one just described, there is nothing in the way of satisfactory fully automatic lexicography being developed over the next few years. To get the best out of the evidence, there will have to be further tools built to co-ordinate output from more than one source and to reconcile competing claims from different pieces of software. The most critical pieces of software are those that deal with the groupings of words and phrases into senses. The individual instances can readily be grouped into sets on the basis of physical similarity, but the further grouping of them around senses is not yet established as a technique.

References
Cobuild Dictionaries are published by HarperCollins in London.
Harris, Z. (1988): Language and Information; New York: Columbia University Press.
Osborne, G. (1993): Computer Based Analysis of Idioms and Idiom-like Phrases in English; M.Phil. thesis, University of Birmingham.
Sinclair, J. and Barnbrook, G. (1994): "The grammar and parser" in Sinclair et al. (eds), The Languages of Definition; Studies in Machine Translation and NLP; Luxembourg, CEC.

Herbert Ernst Wiegand

Über die Mediostrukturen bei gedruckten Wörterbüchern (On the Mediostructures of Printed Dictionaries)

ABSTRACT The present article informally introduces a formalized theory of mediostructure. Printed dictionaries are a traditional form of knowledge representation. The dictionary-internal mediostructure interconnects the knowledge elements represented in different sectors of the dictionary on several levels of lexicographic description, more or less systematically depending on the dictionary type, to form a network. In addition, many lexicographically highly developed scholarly dictionaries of languages and cultures are interconnected by reference relations, for instance in the form of dictionary items. Such reference relations also exist with respect to the sources used in the dictionary, e.g. via items giving the source. This applies particularly to specialized lexicography, where reference is made to scientific literature. The dictionary-internal mediostructures are thus complemented by different kinds of intertextual mediostructures. This article focuses on dictionary-internal mediostructures. The theory is introduced by taking the terms (in italics) listed in the following paragraph and placing them in a theoretical framework. A lexicographer refers the potential user from a reference position, which contains the reference item or other reference-transmitting items, to the reference address, which possibly provides access to the lexicographic data relevant to the user's objective. Thus, a reference relation is established from the reference item or other reference-transmitting items to one or more reference addresses. Referring is a methodical lexicographic action guided implicitly or explicitly by reference conditions related to the dictionary subject or the dictionary form. The motive for referencing is that a user infers a reference and follows it, which fulfils the condition for his inferring the dictionary-subject-related reference condition and thus possibly attaining his user objective.

HJ^E.^W. (Proposal for the mediostructure of a postmodern dictionary of abbreviations)

1. Preliminary remark: structures in printed dictionaries Ever since recent dictionary research came to understand that dictionaries are carriers of text types whose texts, partial texts and text segments stand in particular relations to one another, at least the following structures of printed dictionaries can be precisely distinguished and exactly investigated within the framework of Systematic Dictionary Research (in the sense of Wiegand 1989a: 262f. and 1994[95]):
- the textual frame structure (cf. Hausmann/Wiegand 1989: 331ff.; Wiegand 1990[91]: 107ff.)
- the textual inner structure (also: word list structure; cf. Hausmann/Wiegand 1989: 331ff. and Wiegand 1990[91]: 110ff.)
- the macrostructures (cf. Wiegand 1989b)
- the outer access structures (cf. Hausmann/Wiegand 1989: 334ff. and Wiegand 1989b: 383ff.)
- the following two pure text constituent structures:
  - the microstructures (cf. Wiegand 1989c, 1989d, 1990[91]: 34ff., 1991: 360ff., and 1993[94]: section 3)
  - the article structures (understood as microstructures extended by the non-typographic microstructure indicators; cf. Wiegand 1989c: 441ff. and 1990[91]: 96f.)
- the position structures (cf. Kämmerer 1994)
- the item structures (cf. Wiegand 1989c: 445ff. and 1990[91]: 96f.)
- the addressing structures (cf. Hausmann/Wiegand 1989: 349ff. and Wiegand 1989c: 445ff. and 1990[91]: 97ff.)
- the cohesion structures (cf. Wiegand 1988b: 81ff. and 1993[94]: section 2)
- the theme-rheme structures (cf. Gerzymisch-Arbogast 1989)
- the coherence structures (cf. Wiegand 1988b: 79ff.)
- the inner access structures (cf. Wiegand 1994[95])
- the inner rapid access structures (cf. Hausmann/Wiegand 1989: 337ff.)
- the microarchitecture (cf. Wiegand 1993[94]: section 3 and 1994)
and, last but not least,
- the mediostructures (or: reference structures; cf. Wiegand 1988a: 559ff.; Blumenthal/Lemnitzer/Storrer 1989; Rey-Debove 1989)

Some of the structures mentioned have been relatively well researched. This holds especially for the ordering structures, i.e. the macrostructures, the access structures and the text constituent structures. It does not hold for the mediostructures. As far as I can see, there are only two recent studies on the problem of cross-referencing that are of theoretical interest: Rey-Debove 1989 and Blumenthal/Lemnitzer/Storrer 1988. In the meantime I have worked out a theory of mediostructures as part of a General Theory of Lexicography. This theory cannot be presented in full in a single lecture. What I present today is an introductory overview, not of all, but of the most important conceptual distinctions and reference-theoretical terms that are needed in order to write down the theory of mediostructures formally in a few pages.


2. The object domain of a theory of mediostructures in printed dictionaries Printed dictionaries are a traditional form of knowledge representation. The dictionary-internal mediostructure interlinks, more or less systematically depending on the dictionary type, the knowledge elements that are represented textually in different sectors of the dictionary, chiefly by linguistic means but also by item symbols and by illustrations of all kinds, on several levels of lexicographic description (e.g. the phonetic, the inflectional-morphological, the word-formation-morphological level and others). Furthermore, above all the scholarly dictionaries of lexicographically well-described languages of culture (such as the historical individual languages 'German' or 'Danish') are partly connected with one another by reference relations, e.g. via dictionary items (Wörterbuchbuchungsangaben, WbBA), so that alongside the dictionary-internal mediostructure there is a mediostructure reaching across dictionaries, in which n dictionaries with n > 2 are interlinked, and one can speak of a dictionary-linking mediostructure. For example, the dictionary article wa1 from the FWB (cf. Wiegand 1991: 394ff.) contains five dictionary items (WbBA). In Fig. 1 I illustrate the section of the dictionary-linking mediostructure that is "opened" by wa1.

a b f a s s e n , V. 1. >erw. schriftlich niederlegen, niederschreiben ecw. konzipierenecw. abmessen, v e r m e s s e n e H A L L M A N N , Marianne 3, 169 (Breslau 1670): Hindu. | Dir Spruch ilt abgefaßt: Crttfft die l/irralhcr an.

- Drcr. C E R M . - G A L I _ - L A T . 6a; Rwn 1,65; W R E O E , Aköln. Sprachsch. 14b; D I E T . / W O . 6; TRÜO.NEÄ Dt. Wb. 1, 12. 2. > (Hoffnung o. ä.) fassen, hegen, haben. Gäz. Leichabd. 196, 15 (Jena 1664): xtm (...] Jas menschliche Her.£ durch und durch Beriüilet [...] Hoffnung ¡intm lieblichen Mittage unjpyieifth len.

j auch abfal-

- FWB, 104

Further reference relations leading out of the dictionary exist, e.g. via the items giving the source (Belegstellenangaben, BStA), to the sources of the dictionary basis, so that a source-related mediostructure must be distinguished. In wa1, for example, two reference relations to the sources of the FWB are opened. Finally, dictionaries (especially specialized dictionaries) also refer to scientific literature by means of literature items (Literaturangaben, LitA), so that a literature-related mediostructure must be distinguished. Altogether, then, the dictionary-internal mediostructure is set against at least three kinds of intertextual mediostructures, so that the object domain of a theory of mediostructures in printed dictionaries is divided as shown in Fig. 2.

Fig. 1: Illustration of a section of a DICTIONARY-LINKING MEDIOSTRUCTURE, based on wa1. Each of the five dictionary items (WbBA) consists of an item giving the dictionary mark (A.WbM) and an item giving the text passage (TStA) and leads to a dictionary-external outer reference address (A; see 3.8 below) in one of five dictionaries (Wb): Dict. Germ.-Gall.-Lat.; RWB 1, 65; Wrede, Aköln. Sprachsch.; Dief./Wü.; Trübner, Dt. Wb. 1, 12.

Mediostructures
- dictionary-internal mediostructures
- intertextual mediostructures
  - dictionary-linking mediostructures
  - source-related mediostructures
  - literature-related mediostructures

Fig. 2: Rough division of the object domain of a theory of mediostructures in printed dictionaries

The dictionary-internal mediostructure can comprise many different kinds of reference relations. Fig. 3 merely hints at this. "X" is a variable for a text segment, and "X → X" is to be read as: X stands in a dictionary-internal reference relation to X. The two "X" may lie in the same part or in different parts of the dictionary. "X ⇒ X" is to be read as: X stands in an intertextual reference relation to X. To keep the figure simple, only one variable letter ("X") was used. It is clear that the variable for the domain of the relation must in each case be instantiated with a different text segment from the variable for its range.

Fig. 3: Overview of the object domain of a theory of mediostructures 1) Inside the DICTIONARY the figure shows the front matter (Vorspann), the word list (Wörterverzeichnis) with its articles and illustrations, the back matter (Nachspann), the register, the list of sources, the literature and a bar (Leiste), linked by the dictionary-internal mediostructure. Outside it are other dictionaries (dictionary-linking mediostructure), the sources of the dictionary basis (source-related mediostructure) and the literature (literature-related mediostructure).

3. Basic concepts of a theory of mediostructures In what follows I shall introduce the basic concepts of a theory of mediostructures by explaining those terms that are printed in italics in the following sentence and placing them, against the background of my theory of lexicographic texts, in a theoretical context:

1) I do not deal here with the special reference technique that works with a bar (Leiste) beneath the word list; cf. Wiegand 1988a: 559ff. It becomes necessary as a result of a special text-condensation procedure and, unlike the usual reference procedures, requires that the text segments to which a reference relation is established can be perceived together!

A lexicographer refers (cf. 3.1), on the basis of dictionary-subject-related reference conditions (cf. 3.2) and on the basis of dictionary-form-related reference conditions (cf. 3.3), either explicitly by means of a reference item (cf. 3.4), or explicitly by means of a reference-transmitting item (which is not a reference item) extended by a reference relation marker (cf. 3.5), or implicitly by means of a reference-transmitting item whose item form shows no mediostructure-indicating property (cf. 3.6), the potential user, with the motive for referencing that the user infers a reference and follows it (cf. 3.7 and 3.9), so that the precondition is thereby given for him to infer the dictionary-subject-related reference condition and possibly to attain his user objective, from the reference position, in which the reference item or the other reference-transmitting items stand, to the reference address (cf. 3.8), under which the lexicographic data relevant to attaining the user objective may be accessible, so that a reference relation (cf. 3.10) is established either from the reference item or from the other reference-transmitting items to one or more reference addresses.

3.1. Referring as a methodical lexicographic action An action of the generic type REFERRING SOMEONE FROM SOMETHING TO SOMETHING is, in the present context, an action type to which lexicographic actions belong. In short: referring counts here only as a lexicographic action. The goal of most acts of referring consists, to put it deliberately in general terms at first, in creating the textual preconditions for at least two discrete semiotic entities to be brought into a relation by sign recipients, namely semiotic entities (e.g. lexicographic text segments) of which it holds that, on account of their coordinates in perceptual space (e.g. a place on different dictionary pages), they cannot be perceived together by an assumed perceiver (e.g. a user-in-actu) (but cf. note 1).


Entities are semiotic entities precisely when someone, e.g. on the basis of knowing the conventions for their use, arrives at other entities through them. If we call these other entities knowledge elements and the relation from the semiotic entities to the knowledge elements the representation relation, we can state: by indicating reference relations, e.g. by setting a right arrow "->", between lexicographic text segments that represent knowledge elements and cannot be perceived together, the lexicographer creates the textual preconditions for a user-in-actu to interlink the represented knowledge elements cognitively. It would obviously be nonsense to say that a lexicographer interlinks knowledge elements, for knowledge elements, as parts of a body of knowledge, are intra-individual entities, and lexicographers cannot, thank God, go rummaging about in our heads. A lexicographic act of referring is carried out by applying a lexicographic reference method (which is a set of instructions to be applied in order). The most common reference method in printed dictionaries consists in mentioning, within the text passage from which the reference is made (the reference position), the reference address to which the reference is made. In addition, a reference relation can be indicated in the reference position by item symbols (e.g. "↑") or by abbreviations (e.g. vgl.). The method has numerous variants. The counterpart to the lexicographer's act of referring is the reference-following acts of use performed by users, of which there are numerous different types (cf. Wiegand 1994[95]; ch. II). Decisive for the successful interplay of lexicographer and potential user is that the lexicographic data in the reference position are presented in such a way that the user-in-actu can infer from them a reference and a guiding element (e.g. a particular sequence of letters, a sequence of digits or an alphanumeric string) for locating the reference address.

3.2. Dictionary-subject-related reference conditions In order to understand what reference conditions determined by the dictionary subject are, it must first be made clear what is meant by a dictionary subject. For this purpose the following definition is given (after Wiegand 1994[95]):

(D-1: dictionary subject) The dictionary subject of a particular dictionary is the set of the property values, lexicographically treated in this dictionary, of at least one but at most finitely many linguistic properties for a particular set of linguistic expressions mentioned in the dictionary which belong to a particular dictionary subject area.

Note that (D-1) speaks of a set of linguistic expressions mentioned in the dictionary, and not of a set of mentioned lemma signs; for, especially in polyinformative dictionaries, it is not only properties of lemma signs that are lexicographically treated; rather, changes of topic frequently occur (cf. Gerzymisch-Arbogast 1989), so that properties of items relating to the lemma sign are also lexicographically treated. The definiens of (D-1) uses the term dictionary subject area. It can be defined as follows (following Wiegand 1994[95]):


(D-2: dictionary subject area) The dictionary subject area is the linguistic area from which those linguistic expressions originate that are lexicographically treated in a dictionary with regard to particular property values.

Dictionary subject areas are individual languages and varieties such as dialects, technical languages etc., as well as language stages. It may be remarked that the distinction between dictionary subject and dictionary subject area also has more far-reaching reasons grounded in the philosophy of science (cf. e.g. Holzkamp 1968). We are now in a position to state what a dictionary-subject-related reference condition is. It is a relation between at least two linguistic expressions. Such relations occur on all levels of description: the phonological, the morphological, the lexical-semantic and so on. Simple examples are:

(i) ging, gehen
(ii) Lift, Fahrstuhl, Aufzug
(iii) schreiben, Schrift
(iv) Haustür, Tür
(v) gut, böse

On the meta-level the relations are designated by relational terms such as:

(i') x is an inflected form of y
(ii') x is synonymous with y
(iii') x is a derivation of/from y
(iv') x is a compound of y
(v') x and y are antonymous

Which of such relations counts as a reference condition depends on the mediostructure programme (or: reference programme) of a dictionary, which lays down from where to what and how references are made.
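The interplay between relations (i) to (v) and a reference programme can be sketched in a few lines. This is an illustrative data model only: the class `Relation`, the relation labels and the sample programme are assumptions introduced here, and the formal theory is not claimed to be encoded this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A relation between two linguistic expressions (hypothetical encoding)."""
    kind: str    # e.g. "inflected_form", "synonym"
    source: str  # x in the relational terms (i') to (v')
    target: str  # y in the relational terms (i') to (v')


# Examples (i) to (v) from the text, with assumed labels for the relation kinds.
RELATIONS = [
    Relation("inflected_form", "ging", "gehen"),
    Relation("synonym", "Lift", "Fahrstuhl"),
    Relation("derivation", "Schrift", "schreiben"),
    Relation("compound", "Haustür", "Tür"),
    Relation("antonym", "gut", "böse"),
]

# A hypothetical reference programme: only these relation kinds count as
# reference conditions in the imagined dictionary.
REFERENCE_PROGRAMME = {"inflected_form", "synonym"}


def planned_references(relations, programme):
    """Return (reference position lemma, reference address lemma) pairs."""
    return [(r.source, r.target) for r in relations if r.kind in programme]


if __name__ == "__main__":
    for position, address in planned_references(RELATIONS, REFERENCE_PROGRAMME):
        print(f"{position} -> {address}")
```

A different dictionary would keep the same set of relations but change `REFERENCE_PROGRAMME`, which mirrors the point that the relations themselves do not decide where references are set.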

3.3. Dictionary-form-related reference conditions A dictionary-form-related reference condition consists in the fact that the text segments representing the knowledge elements to be interlinked have, within the two-dimensional representation space of a paper dictionary, a place of representation whose coordinates are such that "joint" perception in the course of an act of use is not guaranteed. A dictionary form has particular parts. One part, for example, is that the lemmata are always arranged in a particular way, e.g. in straight-alphabetical order. The form of arrangement as part of the dictionary form entails that, for example, ging must stand many pages after gehen in the g/G lemma series. If the lemmatization convention stipulates that verbs are entered as lemmata in the infinitive, and if it is further stipulated, as e.g. in Duden-GW and Duden-²GW, that the principal parts (Stammformen) of the strong verbs are entered as lemmata, then the reference programme can stipulate that pure reference articles are written for the principal parts, such as the following:

wa2: ging: ↑gehen. - Duden-²GW, 1341 -

wa3: buk: ↑¹backen - Duden-²GW, 607 -

Pure reference articles are one-line articles, have rudimentary microstructures (in the sense of Wiegand 1991, 479ff.), and are present if and only if the lemma is followed by just one reference-transmitting item and nothing else. Pure reference articles have a reference lemma (in the sense of Wiegand 1983, 454f., Def. 50). However, the reference programme (in coordination with the microstructure programme) may also stipulate, as e.g. in Duden-GW, that simply extended reference articles are to be written for the principal-part lemmata, such as the following:

wa4: ging [gɪŋ]: ↑gehen. - Duden-GW, 1039 -

wa5: buk [bu:k]: ↑¹backen. - Duden-GW, 444 -

A reference article is called simply extended if and only if exactly one further elementary item occurs alongside the reference-transmitting item. In wa4 and wa5 this is the pronunciation item (Ausspracheangabe, AusA).
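The two "if and only if" conditions above lend themselves to a small classification sketch. The representation of an article as a list of (item type, text) pairs following the lemma, the type label "reference" and the function name are assumptions made for illustration; real articles would first have to be segmented into items.

```python
def classify_reference_article(items):
    """Classify an article by the items that follow its lemma.

    `items` is a list of (item_type, text) pairs; the type "reference" marks
    a reference-transmitting item (assumed labelling, not from the source).
    Returns "pure", "simply extended", or None for anything else.
    """
    transmitting = [item for item in items if item[0] == "reference"]
    others = [item for item in items if item[0] != "reference"]
    if len(transmitting) != 1:
        return None
    if not others:
        return "pure"  # one reference-transmitting item and nothing else
    if len(others) == 1:
        return "simply extended"  # exactly one further elementary item
    return None


if __name__ == "__main__":
    wa2 = [("reference", "↑gehen")]
    wa4 = [("pronunciation", "[gɪŋ]"), ("reference", "↑gehen")]
    print(classify_reference_article(wa2))
    print(classify_reference_article(wa4))
```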

3.4. Reference-transmitting items I: reference items Let the two dictionary articles wa6 and wa7 from the Lexikon der Jugendsprache (Müller-Thurau 1985) be given: The following text segments are reference items (Verweisangaben, VerwA): 2

↑gehen (Tv wa2 and Tv wa4)
↑¹backen (Tv wa3 and Tv wa5)
s. Softi (Tv wa

Apposition" outer reference addresses are mentioned in the lemma series of the Allbuch that is found in the front matter of Knaurs-GW.


In wa1 from the FWB there is, among others, the dictionary item "Trübner, Dt.Wb 1, 12". With items of this type, the volume item 1 and the page item 12 mention part of a dictionary-external outer reference address; the user obtains the missing part from the list of sources, and the guiding element is supplied by the lemma abfassen. Now let the following reference article from the FWB be given:

wa15: angebunden, s. anbinden 3 - FWB, 1126

In the reference address item "anbinden 3", an outer reference address, the lemma anbinden, is mentioned first, and immediately after it an inner reference address, namely the one with the sense identification item "3", which, together with the other two polysemy items occurring in the article on the lemma anbinden (FWB, 999), forms an inner rapid access structure "(1. < 2. < 3.)" (with ">

Christian Becker-Christensen and Peter Widell

An Analysis of Danish Nominal Inflectional Paradigms Built on the Computer Based System DANBØJ

3.1. The computer based inflectional representation system DANBØJ As indicated above, the Danish system of inflected forms is rather complicated. It has therefore been a long-felt desire to be rid of the tedious work with Danish inflected forms by way of a computer based system that could generate the required information in a proper and reliable form. With the computer based system DANBØJ we have tried to meet the need for such a system. The word 'DANBØJ' is an amalgamation of the phrase 'danske bøjninger', i.e. Danish inflected forms. The system is at present implemented as a prototype in a database management system. It is our intention to implement the system as a stand-alone application. The main intention with DANBØJ has been to offer the lexicographer a system that makes it possible automatically:

(1) to generate information on inflected forms in various formats for representation in dictionaries.

(2) to assign to Danish words information on how they are inflected.

These two features can be found in DANBØJ. DANBØJ also includes tools for generating expanded forms lists for spelling checkers, tools for indicating division of words, and tools for building syntactic parsers. We will not describe these additional features here. The system is based on 3 databases (see the three cylinders in fig. 1):

(A) The database SIMPLEX, which is a closed database consisting of Danish simple words in all their inflected forms, supplied with parts of speech and coded information on the inflected forms of the words. In the prototype the database amounts to the lemma list from Politikens Retskrivnings- og Betydningsordbog (21,000 words). In the final version the database will comprise all simple words from the Dansk Retskrivningsordbog (60,000 words).

(B) The database SMS, which is an open database consisting mainly of composite words. Danish, like German but unlike English, is rich in composite words. By the phrase 'open database' we mean that the user is allowed to feed the database with new words from text files (or word lists) in DOS. The database contains inflectional codes of the same type as those of the words in the SIMPLEX database. Obviously, the words must be provided with this information before they can be stored in the database. It is quite easy, though, to provide the information with the set of automatic and semiautomatic tools provided in DANBØJ. The tools are chiefly connected to the third database, which we have called:

(C) ANALYSE, i.e. analysis. ANALYSE is an open, temporary or ad hoc database consisting of the words read from a text file chosen by the user of the system. The role of the ANALYSE database is to store the results of a set of analyses supplying the words in the base with their proper inflectional code. The analysis is conducted either automatically, semiautomatically or by manually filling in a template. We will return to that.

Christian Becker-Christensen and Peter Widell

To manipulate the various databases and to exploit the information stored in them, a set of tools has been developed. The user chooses among the tools from a range of menus. The menus are distributed over a hierarchy of forms (see the menu forms in fig. 1, connected with dotted lines). The best way to present the system is probably to invite you on a guided tour of the menu system. When the system starts, the menu form DANBØJ appears on the screen. Here the user finds four buttons. Firstly, a button to close the system after use. Secondly, a 'help'-button which gives the user access to information on how to operate the system. The user will find a 'help'-button in every menu form throughout the system for information concerning local problems. The most important buttons are located to the right on the DANBØJ form. They lead to the two main parts of the system. The button 'formater', i.e. formats, leads to the format part of the system (the dotted line down to the left), which controls the building of inflectional representation strings. The button 'database' leads to the analysis part of the system (the dotted line down to the right), where words not yet analyzed acquire their inflectional code. These may be simplex words not contained in the SIMPLEX base, or composite words not yet stored in the SMS base.

3.2. The format part of DANBØJ

On pressing the 'formater'-button a new form, 'vælg format', appears. From the menu in this form the user can build inflectional representation strings and store them in a character-delimited DOS file. As you can see, several formats are available; at present DANBØJ offers the user five. Here is a brief presentation of each:

Format 1, as in Politikens Retskrivnings- og Betydningsordbog:
asparges BØJES: no., -en, flt. -er, -erne BØJES OGSÅ: no., -en, flt. asparges, aspargesene

Format 2:
asparges BØJES: no., -en, flt. -er, -erne BØJES OGSÅ: no., -en, flt. -, -ene

Format 3, as in Politikens Nudansk Ordbog:
asparges BØJES: subst., -en, plur. -er, -erne BØJES OGSÅ: subst., -en, plur. asparges, aspargesene

Format 4 (the same as format 1, but with expanded forms):
asparges BØJES: no., aspargesen, flt. aspargeser, aspargeserne BØJES OGSÅ: no., aspargesen, flt. asparges, aspargesene

Format 5, as in Dansk Retskrivningsordbog:
asparges -en, -er el. asparges, bf. flt. aspargese(r)ne

Besides these five types of strings, the format menu in DANBØJ can also generate a list of all expanded forms for every word in the base. The generated inflectional strings and expanded forms can be sent to the screen, to the printer or to a character-delimited ASCII file. The user simply chooses the desired format by pressing the respective button with the mouse, then selects the output medium, i.e. screen, printer or file, and presses the 'ok'-button; the program runs and provides the output. The user can return to the previous menu form via the button 'tilbage', i.e. back. There is a 'tilbage'-button on every form in DANBØJ with the exception of the DANBØJ form itself, which is highest in the hierarchy of menu forms.
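The relation between formats 1, 2 and 4 can be illustrated with a short sketch. It is not DANBØJ's implementation: the function names and the representation of a paradigm as three suffixes (definite singular, plural, definite plural) are assumptions made for illustration, and the rules are inferred only from the asparges examples above.

```python
# Illustrative sketch only; names and data layout are hypothetical,
# not taken from DANBØJ. A paradigm is (def. sg., pl., def. pl.) suffixes.

def render_paradigm(lemma, paradigm, fmt):
    """Render one noun paradigm in format 1, 2 or 4."""
    sg_def, pl, pl_def = paradigm
    if fmt == 4:
        # Format 4: every form is written out in full.
        forms = [lemma + sg_def, lemma + pl, lemma + pl_def]
    elif fmt == 2:
        # Format 2: suffixes only; a zero ending appears as a bare "-".
        forms = ["-" + sg_def, "-" + pl, "-" + pl_def]
    elif fmt == 1:
        # Format 1: suffixes, but plural forms in full when the plural is zero.
        if pl == "":
            forms = ["-" + sg_def, lemma, lemma + pl_def]
        else:
            forms = ["-" + sg_def, "-" + pl, "-" + pl_def]
    else:
        raise ValueError("only formats 1, 2 and 4 are sketched here")
    return f"no., {forms[0]}, flt. {forms[1]}, {forms[2]}"


def render_entry(lemma, paradigms, fmt):
    """Join the main paradigm and an optional alternative paradigm."""
    labels = ["BØJES", "BØJES OGSÅ"]
    parts = [f"{label}: {render_paradigm(lemma, p, fmt)}"
             for label, p in zip(labels, paradigms)]
    return lemma + " " + " ".join(parts)


ASPARGES = [("en", "er", "erne"), ("en", "", "ene")]
print(render_entry("asparges", ASPARGES, 1))
print(render_entry("asparges", ASPARGES, 2))
print(render_entry("asparges", ASPARGES, 4))
```

Storing the paradigm once and rendering it per format reflects the idea that all formats are views of the same coded inflectional information.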

3.3. The database part of DANBØJ

We now turn to the database part of DANBØJ. Let us return to the DANBØJ form and press the 'database'-button. We then find ourselves in a form (connected to the DANBØJ form by the dotted line) where we can choose between three functions. We can:

(1) import an ordinary ASCII text file, analyze the imported words automatically and assign an inflectional code to them (arrow 1 in fig. 1; all data transport is indicated by arrows in fig. 1);

(2) manually trim the words that couldn't be dealt with automatically and start another automatic analysis on the trimmed words (the form 'TILRETNING AF ORD I ANALYSEBASE');

(3) select inflected forms by choosing manually from four special templates (the forms at the bottom of fig. 1).

The automatic system for assigning inflectional codes in (1) is based on a morphological parser, i.e. a program module designed to split a composite word into its morphological constituents (arrow 2 in fig. 1). To assign a code it is sufficient to find the last stem morpheme plus its inflectional suffixes. Our parser does just that. The last stem morpheme with its suffixes is looked up in the SIMPLEX database, and the code for the simple word in the base is assigned to the composite word. Occasionally it is difficult for the parser to do its job properly. For instance, the parser cannot 'know' whether a certain composite word has defective or suppletive inflected forms. There are also problems with the rather few Danish words of the type bondegård 'farm' ~ bøndergårde 'farms', i.e. composite words with plural modification in the first constituent. When the parser has done its job, the result of the analysis is stored in the ad hoc database ANALYSE. The base will contain only composite words and simplex words that couldn't be looked up in the SIMPLEX base.

You get access to the base by choosing the number 2 button on the form and confirming with 'ok'. A new form then appears: TILRETNING AF ORD I ANALYSEBASE, which is a form for trimming the words in the ANALYSE base. Words from the ANALYSE base are displayed in a special window on the form. You can browse through the words, and you can modify, delete or manually create words as required. To the right of the word window there is a window for the part of speech. The automatic analysis suggests a part of speech for some of the words to which it was not able to assign a code. The tiny window beneath the word window is for administrative purposes. The trimming work is ready to start when the filter 'ord ikke ok' is turned on. Then all the words to which DANBØJ couldn't assign a code are shown. Your job is to choose the lemma form of the last simplex word constituent of the composite word and supply the word with the correct part of speech. When you have looked through all the words, you can press the 'ok'-button in the box 'kodning af tilrettede ord', which starts another automatic analysis. Now all the composite words in the ANALYSE database should have a correct coding.

As for the simple words from the analyzed text file, most of them will be found in the SIMPLEX database. Those words are discarded, since the inflectional information on them is already stored in the database. It happens, however, that simple words not in the SIMPLEX base are imported from a text file. Since they cannot be looked up in the SIMPLEX base, they must be coded manually. You code manually by choosing inflectional information from four templates: one for each of the three major word classes (substantive, verb and adjective) and one for all the other parts of speech.
You choose a template by pressing the respective button in the box in the middle of the form 'TILRETNING AF ORD I ANALYSEBASE'. On the chosen template you then compose the inflected forms of the word to be coded. We will not go into details about these templates, but a glance at the template for nouns will give you a fairly good picture of the procedure. The procedure is to choose elements for the inflected forms via the buttons. After you confirm with the 'ok'-button, DANBØJ automatically assigns the internal code for the word. As soon as the remaining codes have been chosen from the templates and stored in the ANALYSE base, the words in the ANALYSE base must be transferred to the more permanent base SMS, i.e. the base for composite words that are to be of general use in the system. The procedure is quite simple: you return via the 'tilbage'-button to the previous form (the form just below the DANBØJ form to the right in fig. 1). Here you press the number 3 button 'indføring af ord fra ANALYSE base til SMS base', and the words in the ANALYSE base are immediately imported to the SMS base with all their inflected forms (arrow 3 in fig. 1). Any word from the ANALYSE database that duplicates a word already in the SMS database is of course discarded. Properly stored in the SMS database, the words from the ANALYSE database can now contribute to the generation of inflectional representation strings via the previously described menu form in the format part of DANBØJ. With this transfer of inflectional information from the database part of DANBØJ to the format part of DANBØJ the cycle is closed, and we have reached the end of our tour.
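The automatic part of this cycle (look up the last constituent in SIMPLEX, store the result in ANALYSE, transfer to SMS without duplicates) can be sketched roughly as follows. This is a minimal illustration under simplifying assumptions, not the DANBØJ parser: the databases are modelled as plain dictionaries, the code values ("S1", "S2") and all function names are hypothetical, and the longest final segment found in SIMPLEX is simply taken to be the last stem with its suffixes.

```python
# Rough sketch; dictionaries stand in for the three databases and the
# inflectional codes "S1", "S2", ... are invented placeholders.

def last_stem_code(word, simplex):
    """Return the code of the longest proper final segment found in SIMPLEX."""
    for start in range(1, len(word)):
        tail = word[start:]
        if tail in simplex:
            return simplex[tail]
    return None  # left for manual trimming or template coding


def analyse(words, simplex):
    """Build the ad hoc ANALYSE base: word -> code, or None if unresolved."""
    result = {}
    for word in words:
        if word in simplex:
            continue  # simple word already stored in SIMPLEX: discard
        result[word] = last_stem_code(word, simplex)
    return result


def transfer(analysed, sms):
    """Copy coded words into SMS, discarding duplicates; return count added."""
    added = 0
    for word, code in analysed.items():
        if code is not None and word not in sms:
            sms[word] = code
            added += 1
    return added


simplex = {"bog": "S1", "gård": "S2", "ord": "S3"}
analysed = analyse(["ordbog", "bog", "bondegård", "xyz"], simplex)
sms = {"ordbog": "S1"}
added = transfer(analysed, sms)
print(analysed, sms, added)
```

Pure suffix matching of this kind cannot detect the problem cases mentioned above, such as defective or suppletive forms or plural modification in the first constituent (bondegård ~ bøndergårde), which is why the manual trimming step remains necessary.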

References

Carstairs, Andrew (1987): Allomorphy in Inflexion. - Beckenham: Croom Helm.
Danske Dobbeltformer (1992). Ed. by Henrik Galberg Jacobsen. - København: Munksgaard (= Dansk Sprognævns Skrifter 18).
Politikens Retskrivnings- og Betydningsordbog (1994). - København: Politikens Forlag.
Politikens Nudansk Ordbog (1990) 14th ed. - København: Politikens Forlag.
Politikens Nudansk Ordbog (1992) 15th ed. - København: Politikens Forlag.
Retskrivningsordbogen (1986). Ed. by Dansk Sprognævn. - København: Gyldendal.

Evelyn Breiteneder

Herausforderungen der Textlexikographie: Der Belegschnitt [Challenges of text lexicography: the citation cut]

ABSTRACT

The essay considers how citations (Belege) are cut in text lexicography, taking the text dictionary of 'Die Fackel' as its example. It argues that lexicography should distinguish between cutting the citation as a whole and cutting the citation text, which functions as a citation only together with other components. It also uses an example to illustrate the problem of cuts made inside a citation text.

Obeying necessity and my own impulse, willing and compelled, I use the occasion and the shared space of this lecture, which takes place in no other »frame« than its own, for a clarification. For while the printed word may confidently rise above the presupposed shared knowledge and shared feeling of its adherents and again and again elevate the familiar action between me and the counter-world into a new work, the spoken word, spoken before those who virtually have a share in the material, indeed need not tell them what they not only know but have themselves had a hand in. Though the printed word would come to be and exist even without readers, the listeners belong to the lecture, which would not exist without them, not for want of the ear but for want of the element. But it is not you who are at issue, but those who do not hear.

What you have thus 'heard' is not the author's elaborate introduction. What you have just 'read' is an excerpt from a citation text of the kind one might pick up in a text dictionary. Detached from its function in a dictionary, this citation excerpt is a montaged text excerpt from 'Die Fackel' that is not marked as a quotation. By way of introduction it should be added that this contribution pursues a twofold interest: first, it aims to draw attention to the dictionary project 'Fackellex'; second, it aims to show that the internal citation-text cut (innerer Belegtextschnitt) is unusable, at least for text lexicography. These interests are, however, presented in reverse order: first, it is clarified what is meant by a citation cut (Belegschnitt) in the context of this contribution, and second, the work on a 'Wörterbuch der Fackel' is reported on.

1. The citation cut (Belegschnitt)

Before the citation excerpt presented at the outset can be analysed as an example, a terminological clarification is needed, which is meant to serve the discussion that follows and is at the same time part of the workshop language of 'Fackellex'.

1.1. What is a citation (Beleg)?

In text lexicography the citation counts as the relevant lexicographic example. Hermanns' (1988:165) proposal for a definition of the citation reads: "In a dictionary, a citation is an authentic example that is a verbatim quotation and whose source is verifiable. It need not be documented in the dictionary itself, and certainly not with full precision, with page and line number, say. For something to be a citation it suffices that it is documented somewhere at all, for instance in a card index or in the computer of the editorial team that made the dictionary, so that the citation can in principle be checked, even if that may take some effort and one has to travel to Mannheim specially to do so." Hermanns' definition serves to distinguish citations from lexicographic examples of another kind, namely those constructed by the dictionary maker, who should be equipped with 'example competence' (on the distinction between authentic and constructed examples cf. Nikula 1986). These examples 'produced' for lexicographic purposes are often called competence examples (Kompetenzbeispiele), in contrast to citation examples (Belegbeispiele) (cf. among others Haß 1991 and, with a more differentiated delimitation of the terms (Kompetenzbeispiel / Beispielbeleg), Reichmann 1988).

A survey of the specialist literature can contribute little to answering the question 'What is a citation?'. Reichmann's (1988) remarks concentrate on the topics named in the title of his essay, 'Zur Funktion, zu einigen Typen und zur Auswahl von Beispielbelegen im historischen Bedeutungswörterbuch'. Haß (1991) presents the traditions of examples and citations that are relevant to the history of lexicography and describes their functions in relation to the respective views of language and dictionary concepts. The term she uses, Belegbeispiele "(for short I also use Belege)" (Haß 1991:541), undergoes a terminological change to Beispielbelege in her Habilitation thesis only two years later; these correspond to the Belege used by Sanders "in present-day terminology" (Haß 1993:216). The short life and lack of clarity of the terminology are obvious where the citation in present-day discussion is concerned. Lexicography has, however, not yet produced a comprehensive theory of the lexicographic citation and hence a well-founded definition of the citation; with Wiegand it is still "in progress" (Wiegand 1994:17) and has already been announced for the second volume of his monograph 'Wörterbuchforschung. Untersuchungen zur Wörterbuchbenutzung, zur Theorie, Geschichte, Kritik und Automatisierung der Lexikographie'.

A contribution oriented towards text lexicography, such as this one, is concerned with citations; competence examples do not in principle occur in text dictionaries. Citations in the 'Wörterbuch der Fackel' are primarily text segments from 'Die Fackel' which must be marked as quotations in the dictionary article. This marking can be achieved, for example, by giving the source reference, by typography, formatting and positioning, by inserting quotation marks and omission marks, and by combinations of these components. It must be clearly signalled to the dictionary user that the citation represents a text level of its own in the dictionary article; in short, the citation must be clearly recognisable as a citation to the user. A citation accordingly consists of several components, which are to be named according to their form and function. The most important components are the citation text (Belegtext) and the source reference (Belegstellenangabe).

1.2. What is a citation cut (Belegschnitt)?

A citation cut is the reduction of a citation by individual components; this reduction is always goal-oriented, e.g. so that the citation excerpt can fulfil the function of a lexicographic example. A citation cut must be preceded, technically and in time, by a citation-text cut (Belegtextschnitt). I will try to fix a distinction between the two 'types of cut': a citation cut segments a citation. In lexicographic practice this is the case when a lexicographer works with a card index containing the citation texts that have been excerpted from the corpus texts, usually by assistants, and have already been assigned to a lemma. In principle the lexicographer works on the citations found in the card index in question. In this connection it is completely immaterial whether this 'card index' consists of wood and index cards or of electronic data carriers and the files of a database.

For text lexicography such a procedure is undesirable for several reasons. Whereas the lexicographer of a general-language dictionary can 'trim' and rework a citation until the resulting citation excerpt can serve as a lexicographic example for the lemma or some other article position, a text lexicographer (e.g. in compiling the 'Wörterbuch der Fackel') is obliged to adapt all article positions to the requirements of the citation text and to subordinate them accordingly. Unlike his colleague working on the general language, the text lexicographer needs to carry out the citation-text cut himself. In a text dictionary the use of citation excerpts would be counterproductive. One will therefore have to ask, as a matter of principle, what example function a citation can take on in a text dictionary, and whether it can serve as a lexicographic example in the conventional sense at all. Reduced to a minimum, a citation can in any case only attest the occurrence of a linguistic unit in a particular context. Nikula's (1986:189) observation that "'authentic' examples are thus in principle not suitable examples in the sense of 'instances of general rules'" indirectly confirms this consideration. But this also means that the function of the citation in a text dictionary will have to be determined anew in lexicographic terms, for it by no means fulfils the lexicographic example function or any other of the traditional functions known from the history of lexicography.
In the 'Wörterbuch der Fackel', for example, contextual factors of the citation texts and interpretive elements of 'Die Fackel', which presuppose that the compilers know the entire source text, are to be attested. Determining the function of a citation in a text dictionary anew in general terms is not possible within the scope of this contribution. A citation in a text dictionary can possibly no longer be understood as a citation in the traditional sense: for what is there to 'attest' with a citation text in a text dictionary? All that is really 'attestable' are correspondences between the sign combinations of system elements of the language and text elements. In determining the function of citations in text dictionaries one must ask, for example, in what way contextual conditions of the source text can be lexicographically recorded and presented. For the description of the citation-text cut Wiegand has provided a usable set of instruments and has also pointed to the problems of forming text excerpts (Wiegand 1994:24): "When a text excerpt is detached from a text of the lexicographic corpus (e.g. by excerpting, copying and cutting up, etc.), one does not thereby automatically have a citation text for the property or properties of a linguistic expression that are to be attested; this is the case only if, in forming the text excerpt, one already knows exactly what one is actually going to attest, and how (cf. on this problem also Reichmann 1990a)¹."

1) "Reichmann 1990a" corresponds to "Reichmann (1990)" in this contribution.

2. The 'Wörterbuch der Fackel': a text dictionary

At the Austrian Academy of Sciences, work has been under way since 1992, under the direction of Werner Welzig, on a text dictionary of 'Die Fackel' (cf. Welzig/Breiteneder/Sellner 1990/91, Welzig 1993, Wiegand 1994). 'Die Fackel' is the journal that was published in Vienna from 1899 to 1936 in 922 numbers distributed over 415 fiery-red issues; it comprises 22,578 pages, so that the size of the source text for the dictionary corresponds roughly to that of the Weimar edition of Goethe. Karl Kraus, who was born in 1874 in Jicin in Bohemia and lived in Vienna until his death on 12 June 1936, was the founder and sole editor of 'Die Fackel'. The 'Wörterbuch der Fackel' is being compiled in three dictionary types: a 'Wörterbuch der Redensarten' (dictionary of idioms), a 'Schimpf- und Schmähwörterbuch' (dictionary of invective and abuse) and an 'ideological dictionary', so called after L.V. Scerba. The first two types are arranged alphabetically; the 'ideological dictionary' is to be structured according to thematic criteria. The lexicographic coverage relates to subsections of the vocabulary of the corpus text; the data are collected selectively (cf. Reichmann 1990:1591). The entire text of 'Die Fackel' is available on electronic data carriers for the dictionary work; it is at the disposal of all compilers for search queries and for the intelligent execution of citation-text cuts. The project, which Alan Kirkness has called "Fackellex", is for the time being carried by four full-time staff members, a student supervisory board and the project leader. At present work is in progress on the 'Wörterbuch der Redensarten', which is to appear in 1999, on the 100th birthday of 'Die Fackel'. We and the funding bodies have bindingly set the time for compiling the entire dictionary at 15 years.

3. The citation cut: a clarification of the introduction

A citation excerpt was used as the introduction to this contribution.
The citation text, a text segment from 'Die Fackel', is present as an important citation component. What was "cut away" includes the source reference, the quotation and omission marks, the typographic highlighting and the format; positioning specific to citations was dispensed with, and an orthographic change was made at the vertical right-hand cut² of the internal citation-text cut, which is not itself marked in the citation excerpt. According to Wiegand (1994:24) the formation of text excerpts is determined by the "following two decisive factors": "(i) the aim of forming the text excerpt (ii) the current state of knowledge of the text (i.e. the textual content which is operative at the time of cutting the text and which is available to the person performing this action)."

2) For the terminology of 'cuts' see Wiegand (1994:20ff.).

It remains to be examined whether these 'crystal-clear' factors were also taken into account in the citation cut carried out here:

(i) The aim of this citation cut was to use the citation text as the introduction to this contribution and, further, to use this citation excerpt to present the problem of the internal citation-text cut.

(ii) The context factor depends on the 'text competence of the cutter', comparable, say, to the 'example competence of the lexicographer' who produces competence examples.

It is doubtful whether you, as listeners or readers, could immediately recognise the citation excerpt used here as an introduction as the result of a citation cut. It is also possible that you recognised the 'text montage' and, beyond that, are even able to insert the three dots in square brackets [...], the omission mark deliberately cut away here, which is in general use as the marking of an internal citation-text cut, at the right place in the citation text. One thing, however, you can certainly not gauge from this or from any other internal citation-text cut: the amount of text concealed behind the three dots of the omission mark. To forestall misunderstandings: the problem of the internal citation-text cut lies not in the quantity of "hidden" characters but in the 'higher stage of processing' of the citation text, which only in the rarest cases leads to a qualitative improvement of the citation text. After excerption the citation text has already passed once through the stages of de- and recontextualisation and is then, often merely for reasons of printing space, condensed by internal citation-text cuts, but in many cases simply hollowed out and reduced to a linguistic skeleton. I return to the citation cut from 'Die Fackel', to the beginning of my remarks.
The internal citation-text cut that I made, and deliberately did not mark with an omission mark, represents 158 pages of text in 'Die Fackel'; an extent, that is, which stretches across two issues of 'Die Fackel' and, measured by the publication period of the issues, leaps over the turn of the year from December 1924 to January 1925. To the first sentence of a speech by Karl Kraus, delivered by him on 5 October 1924 and printed in 'Die Fackel' under the heading Klarstellung (F 668-675, 60-63), a text segment from the speech Zweihundert Vorlesungen und das geistige Wien, delivered on 1 January 1925 (F 676-678, 47-68), was appended. After the citation-text cut, the two citation texts originally have the following form:

"Obeying necessity and my own impulse, willing and compelled, I use the occasion and the shared space of this lecture, which takes place in no other »frame« than its own, for a clarification." (F 668-675, 60)

"for while the printed word may confidently rise above the presupposed shared knowledge and shared feeling of its adherents and again and again elevate the familiar action between me and the counter-world into a new work, the spoken word, spoken before those who virtually have a share in the material, indeed need not tell them what they not only know but have themselves had a hand in. Though the printed word would come to be and exist even without readers, the listeners belong to the lecture, which would not exist without them, not for want of the ear but for want of the element. But it is not you who are at issue, but those who do not hear." (F 676-678, 54)

The internal citation-text cut always yields two text excerpts, namely the citation text and the text excerpt that is not used as citation text; in the case given, the 'waste product' is many times larger than the citation text that remains. What was to be shown? The use of the internal citation-text cut makes it possible in principle to combine different text segments from one corpus text. In every case the text standing between two text segments can be cut away, and two text excerpts originally separated by their context can thus be joined. The possibilities for manipulation in the production of citations are enormous. The internal citation-text cut can be applied without difficulty only to citation texts that already exist, i.e. after a citation-text cut has already been made in the corpus text. This cutting technique belongs to a stage of processing that concerns the citation. In a text dictionary, however, all citation-text cuts should be made on the source text itself. The internal citation-text cut is not a suitable technique for text lexicography and should be avoided in the citation texts for text dictionaries.

This contribution has tried to name and outline some problem areas connected with the "citation cut". In conclusion, attention is drawn to the answers still outstanding on the following points:

- components of a lexicographic citation
- function of a citation in a text dictionary
- citation-text cut: consequences for the understanding of the text

References

My thanks go to Herbert Ernst Wiegand for references to the literature and for the unconditional sharing of bibliographic knowledge.

F = Die Fackel. Wien, Nr. 1, April 1899 - Nr. 922, Februar 1936. Ed. by Karl Kraus. Photomechanical reprint, ed. by Heinrich Fischer. 39 volumes. München: Kösel 1968-1976. Also: reprint in 12 volumes, with an index of persons by Franz Ögg. Frankfurt/M.: Zweitausendeins 1977.
Haß, Ulrike (1991): Zu Bedeutung und Funktion von Beleg- und Kompetenzbeispielen im Deutschen Wörterbuch. In: Studien zum Deutschen Wörterbuch von Jacob Grimm und Wilhelm Grimm. Vol. II. Ed. by Alan Kirkness, Peter Kühn and Herbert Ernst Wiegand. Tübingen: Niemeyer (= Lexicographica. Series Maior 34), 535-594.
Haß-Zumkehr, Ulrike (1993): Daniel Sanders (1819-1897). Kulturkonzept, Sprachauffassung und Lexikographie im Kontext von Wissenschaft und Gesellschaft. Habilitation thesis [typescript]. Heidelberg.
Hermanns, Fritz (1988): Das lexikographische Beispiel. Ein Beitrag zu seiner Theorie. In: Das Wörterbuch: Artikel und Verweisstrukturen. Jahrbuch 1987 des Instituts für deutsche Sprache. Ed. by Gisela Harras. Düsseldorf: Schwann (= Sprache der Gegenwart 74), 161-195.
Nikula, Henrik (1986): Wörterbuch und Kontext. Ein Beitrag zur Theorie des lexikalischen Beispiels. In: Akten des VII. Internationalen Germanisten-Kongresses, Göttingen 1985. Kontroversen, alte und neue. Ed. by Albrecht Schöne. Vol. 3: Textlinguistik contra Stilistik? Wortschatz und Wörterbuch. Grammatische oder pragmatische Organisation von Rede? Ed. by Walter Weiss, Herbert Ernst Wiegand and Marga Reis. Tübingen: Niemeyer, 187-192.
Reichmann, Oskar (1988): Zur Funktion, zu einigen Typen und zur Auswahl von Beispielbelegen im historischen Bedeutungswörterbuch. In: Symposium on Lexicography III. Proceedings of the Third International Symposium on Lexicography May 14-16, 1986 at the University of Copenhagen. Ed. by Karl Hyldgaard-Jensen and Arne Zettersten. Tübingen: Niemeyer (= Lexicographica. Series Maior 19), 413-444.
Reichmann, Oskar (1990): Formen und Probleme der Datenerhebung I: Synchronische und diachronische historische Wörterbücher. In: Wörterbücher. Dictionaries. Dictionnaires. Ein internationales Handbuch zur Lexikographie [...]. Ed. by Franz Josef Hausmann, Oskar Reichmann, Herbert Ernst Wiegand and Ladislav Zgusta. Second volume. Berlin, New York: de Gruyter (= Handbücher zur Sprach- und Kommunikationswissenschaft [= HSK] 5.2), 1588-1611.
Welzig, Werner (1993): Uns geht die Sprache verloren. Österreicher: die wortlosen Europäer? Für ein Wörterbuch der 'Fackel'. Ein Manifest. In: Die Presse. Spectrum, 23./24.10.1993, I-II.
Welzig/Breiteneder/Sellner (1990/91) = Werner Welzig/Evelyn Breiteneder/Angela Sellner: Vorarbeiten zu einem Wörterbuch der 'Fackel'. Typescript. Wien (= project proposal).
Wiegand, Herbert Ernst (1994): Kritische Lanze für Fackel-Redensartenwörterbuch. Bericht und Diskussion zu einem Workshop in der Österreichischen Akademie der Wissenschaften am 14.2.1994. Typescript. Heidelberg (to appear 1994 in: Lexicographica 9.1993).

Ingmari Bergquist Gunnar Persson

Masculine, feminine and epicene nouns revisited: informant reactions versus lexicographic definitions

In Persson (1994) a pilot study of the lexicographic treatment of pejorative and ameliorative lexemes in English was made with a view to finding out whether the often epicene or sex-neutral dictionary definitions of such words (e.g. bastard, bully, lout, etc.) are in accord with the usage of ordinary language users. Four native speakers of English were given a questionnaire with a hundred such lexemes and were asked to judge whether the referents of the words were felt to be predominantly female, male or sex-neutral (see further below). The results hinted at a great deal of idiolexy among native speakers, but they also indicated that ordinary speakers identify either female or male referents in many cases in which dictionaries show a predilection for epicene definitions. It was argued that in this respect dictionaries ought to provide better and more precise information for foreign learners of the language.

As the small pilot study was based on too few informants to allow safe conclusions to be drawn, we have now collected reactions from more informants and also taken a closer look at the definitions of the lexemes in a number of dictionaries. In the present paper, a study of some 60 informants will be presented together with a comparative investigation of the treatment of the same lexemes in nine monolingual English dictionaries. Before presenting and discussing this data, we will summarize the theoretical preliminaries of the former as well as the present study.

Many evaluative lexemes which used to be lexically defined as either male or female in older dictionaries are now described as epicene (that is, sex-neutral) nouns in recent dictionaries. One example is bastard, which is defined as 'a man that one strongly dislikes' in LDCE (1978), but as 'an obnoxious or despicable person' in CED (1986) and similarly as 'an offensive or disagreeable person' in LDEL (1984). In the latter two dictionaries its definition no longer includes the concept 'male'.
(OED 1989, however, still defines this sense as "...a term of abuse for a man or a boy..."). The same semantic change - if that is the proper description - may be traced for a number of other lexemes, e.g. boor, which is defined as 'clumsy or ill-bred fellow' in COD (1964), as 'a rude ungraceful ungentlemanly person, esp. a man' in LDCE (1978), and as 'an ill-mannered, clumsy, or insensitive person' in CED (1986). These lexical descriptions seem to represent a development from a stage in which 'male' is a conceptual or analytic component of boor (COD), via an intermediate stage where 'male' is no longer obligatory (LDCE) (though it is difficult to understand how a woman could be described as an "ungentlemanly" person!) to the final stage in which boor can be freely ascribed to persons of both sexes (CED). As has been repeatedly pointed out to us, a definition containing the term person need not imply that the lexicographer has had an epicene definition in mind. It is thus claimed that when e.g. lout is defined "a crude or oafish person" (CED) everybody will know that person really refers to men and not to women. While this may be obvious to native speakers, it may be misleading to foreign learners of English, who after all are likely to be the ones who use dictionaries most frequently. And to use person in such a way is of course not in line with all standard definitions of the term, e.g. "an individual human being, a man, woman, or child" (OED). Such a usage is in fact reminiscent of the strong efforts made by British universities and courts of law in the 19th and early 20th centuries to keep women out of higher education and learned professions. According to Ordoubadian (1986) the term person was defined by the

Ingmari Bergquist and Gunnar Persson

70

courts as referring to the "male, to the exclusion of women" to prevent women from holding offices or entering such professional schools as medicine and law. The historical reason for this blatant disregard of linguistic facts was that earlier legal and parliamentary documents sometimes used the term man, and sometimes with exclusively male reference expressions such as "office holder must be a person of sound mind". A similar confusion of sense and reference is still found in some dictionaries, though, as is to be hoped, for quite different reasons. The lexicographers' use ofperson seems rather to be the effect of a kind of non-sexist policy, but as we shall see, the results are often very doubtful in view of the informant reactions we will present. The question is what manner of description would be more accurate than the one suggested in recent dictionaries. A good theoretical starting-point seems to be provided by the prototype theory, as developed by for example Rosch (1975), Kay & McDaniel (1978), Coleman & Kay (1981) and Aitchison (1987). When a predicate, in the logical sense of the term, may be applied to both sexes, though in most cases only to one of them, it is useful to think of the preferred sex as the "prototype", that is, as the typical member of the extension of the predicate (cf. Hurford & Heasley 1983:87-100), as illustrated in Figure 1: Figure 1: Central and peripheral referents of chatterbox

men

y

Chatterbox is a good example of this point, because it was deemed to have primarily female reference by a large majority of the informants, but thought of as epicene by a minority (cf. below). Hence we can regard men as marginally or peripherally included in its extension. On the basis of this principle, which inevitably makes statistical analysis of the responses of native speakers very important, the extended pilot study reported on below was devised.

The material consists of 57 questionnaires (37 female and 20 male informants) and 100 evaluative terms collected mainly from CED (1986). The informants have stated their age, sex and nationality (or rather, their regional variety of English). They are all educated adults between the ages of 19 and 88. No close analysis of the possible response differences between the sexes has been made yet, but there do not seem to be any striking discrepancies between them. We have also yet to look more closely at different age groups to see how usage may have changed over time.

The informants were given the following instructions:

Below you will find a list of words which may be used figuratively about human beings. Some of them are perhaps used only about men/boys, some only about women/girls, while others may be used about members of both sexes. I would like you to indicate for each word whether you think it applies only to men/boys (= M), only to women/girls (= F) or to both sexes (= B). It may be useful in each case to think of the sentence That person is a (real)..., which may help you to find out what kind of referent (male or female or both) you associate with the word. Don't think for too long about each word, just tick off your spontaneous reaction.
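The way individual questionnaire ticks add up to the cells reported in the results table, e.g. 52(35f+17m), can be sketched as follows. This is a minimal illustration in Python, not the procedure the authors describe: the function names `tally` and `format_cell` and the representation of a response as a pair (informant sex, answer) are assumptions made for the example. The sample data reproduce the counts reported for bastard.

```python
from collections import Counter

def tally(responses):
    """Count answers F/M/B per informant sex ('f' or 'm').

    `responses` is an iterable of (informant_sex, answer) pairs; an answer
    of None stands for a non-response and is skipped. (Hypothetical data
    layout, chosen only for this sketch.)
    """
    cells = {"F": Counter(), "M": Counter(), "B": Counter()}
    for sex, answer in responses:
        if answer in cells:
            cells[answer][sex] += 1
    return cells

def format_cell(by_sex):
    """Render one cell in the notation of the results table, e.g. 52(35f+17m)."""
    total = sum(by_sex.values())
    if total == 0:
        return "0"
    parts = [f"{by_sex[s]}{s}" for s in ("f", "m") if by_sex[s]]
    return f"{total}({'+'.join(parts)})"

# Responses for "bastard" as reported in the results table:
# no F answers, 35 women and 17 men answered M, 2 women and 3 men answered B.
bastard = ([("f", "M")] * 35 + [("m", "M")] * 17 +
           [("f", "B")] * 2 + [("m", "B")] * 3)
cells = tally(bastard)
print(format_cell(cells["F"]), format_cell(cells["M"]), format_cell(cells["B"]))
```

Keeping the breakdown by informant sex in each cell makes it possible to check a row against the 37 female and 20 male informants.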


The results are presented in Table 1 below. The small letters in brackets indicate the sex of the informants. The capital letters in the left-hand column indicate the most frequent response type. The symbol + indicates a response type exceeding the others by more than 10, a difference taken as an indication of the existence of a potential prototype; ++ indicates a difference of more than 20, which is judged as a confirmed prototype; a plain letter indicates a difference of less than 10 between the largest and the second largest response types, which is taken to be non-significant.

Table 1. Informant reactions

F 1(1m)

M

ace

B+

angel

B M++

ass

l(lm)

29(19f+10m)

bastard

0

52(35f+17m)

5(2f+3m)

M++

beast

0

42(27f+15m)

14(10f+4m)

35(20f+15m)

19(16f+3m)

B

B

0

24(16f+8m) 22(17f+5m) 31(17f+14m)

F++

beauty

M++

blackguard

4(2f+2m)

41(29f+12m)

M+ B++

boor

0

37(25f+12m)

18(10f+8m)

brat

0

1 l(7f+4m)

46(30f+16m)

46(32f+14m)

l(lm)

9(4f+5m) 3(lf+2m)

M++

bugger

0

43(27f+16m)

13(10f+3m)

M++ M++

bully

l(lm)

38(24f+14m)

18(13f+5m)

butcher

0

49(3 lf+18m)

5(4f+lm)

M+

champ

0

34(24f+10m)

21(12f+9m)

F++

chatterbox

41(27f+14m)

B+

chicken

14(8f+6m)

M B++

churl

7(5f+2m)

17(12f+5m)

16(8f+8m)

crank

3(2f+lm)

15(7f+8m)

39(28f+l lm)

B++

creature

B

creep

M B

16(10f+6m) 32(21f+l lm)

3(lf+2m)

34(26f+8m)

27(20f+7m)

29(16f+13m)

crook

l(lf) 0

32(22f+10m)

25(15f+10m)

daredevil

0

26(13f+13m)

31(24f+7m)

B++

dimwit

2(lf+lm)

6(4f+2m)

49(32f+17m)

F F++

doormat

24(19f+5m)

7(2f+5m)

22(15f+7m)

dragon

46(3 lf+15m)

5(3f+2m)

B++

dupe

F

flirt

32(21f+l lm)

M B++

fox

1 l(6f+5m)

freak gem

14(6f+8m)

0 8(6f+2m)

0

l(lt) 27(16f+l lm)

12(8f+4m) 0 21(14f+7m)

3(2f+lm) 35(21f+14m) 25(16f+9m) 20(13f+7m)

5(4f+lm)

51(32f+19m)

0

29(21f+8m)

9(4f+5m)

43(27f+16m)

B B++

go-getter

F+

gold-digger

34(23f+l lm)

5(3f+2m)

18(1 lf+7m)

F

goose

25(16f+9m)

3(lf+2m)

21(13f+8m)

M++ F++

gorilla

0

gossip

39

F+

harpy

29(18f+l lm)

M+

hawk

l(lt)

33(20f+13m)

14(9f+5m)

M M++

heavy

30(21f+9m)

22(13f+9m)

hooligan

100 l(lf)

46(28f+18m)

10(8f+2m)

M++

jerk

0

47(32f+15m)

10(5f+5m)

5(5t)

51(32f+19m)

4(3f+lm)

0

18(1 lf+7m)

2(lf+lm)

10(7f+3m)

Table 1 (continued)

Type   Lexeme           F              M              B
F++    kitten           53(34f+19m)    3(?)           0
M++    knucklehead      0              42(26f+16m)    9(6f+3m)
B      lamb             25(14f+11m)    4(3f+1m)       26(19f+7m)
B+     leech            5(2f+3m)       16(11f+5m)     35(23f+12m)
M++    lion             0              46(29f+17m)    6(4f+2m)
B++    loony            0              11(6f+5m)      46(31f+15m)
B++    loudmouth        1(1m)          16(9f+7m)      40(28f+12m)
M++    lout             0              52(36f+16m)    2(1f+1m)
B++    madcap           5(4f+1m)       11(6f+5m)      35(23f+12m)
M      milksop          5(3f+2m)       20(12f+8m)     12(7f+5m)
B      moron            0              27(17f+10m)    29(19f+10m)
B      mouse            24(15f+9m)     2(2f)          29(19f+10m)
F++    nag              48(31f+17m)    1(1f)          8(5f+3m)
B++    nitwit           4(2f+2m)       7(4f+3m)       46(33f+13m)
M++    oaf              0              48(31f+17m)    7(5f+2m)
B++    parrot           12(8f+4m)      2(1f+1m)       36(23f+13m)
F++    peach            49(31f+18m)    2(1f+1m)       6(5f+1m)
F++    pearl            49(31f+18m)    1(1f)          3(2f+1m)
M++    pervert          0              43(29f+14m)    13(7f+6m)
M      pig              0              30(16f+14m)    27(21f+6m)
B+     prig             15(13f+2m)     7(4f+3m)       29(17f+12m)
M++    piss artist      0              42(27f+15m)    7(4f+3m)
M++    prick            0              50(31f+19m)    4(3f+1m)
F,B    prude            28(20f+8m)     1(1f)          28(16f+12m)
B++    punk             0              14(10f+4m)     41(26f+15m)
B      ragbag           19(13f+6m)     5(2f+3m)       23(16f+7m)
M++    rat              1(1f)          41(29f+12m)    15(7f+8m)
M++    rogue            0              48(33f+15m)    8(4f+4m)
M++    roughneck        0              53(34f+19m)    2(1f+1m)
M++    ruffian          0              48(29f+19m)    9(8f+1m)
B++    saint            7(6f+1m)       13(8f+5m)      36(23f+13m)
F+     saucebox         27(17f+10m)    2(1f+1m)       8(5f+3m)
M++    savage           1(1f)          43(28f+15m)    13(8f+5m)
B      serpent          17(10f+7m)     11(7f+4m)      22(13f+9m)
M++    shark            1(1f)          44(30f+14m)    12(6f+6m)
B++    sheep            8(5f+3m)       7(5f+2m)       37(23f+14m)
B      shit             0              26(17f+9m)     29(18f+11m)
B++    simpleton        1(1m)          13(10f+3m)     43(27f+16m)
M++    skunk            1(1m)          36(25f+11m)    14(8f+6m)
B++    smarty-pants     9(6f+3m)       9(6f+3m)       39(25f+14m)
B      snake            7(4f+3m)       22(15f+7m)     23(14f+9m)
M+     sod              0              33(20f+13m)    20(14f+6m)
F      sourpuss         32(23f+9m)     0              25(14f+11m)
M      stinker          0              31(21f+10m)    24(16f+8m)
M++    stuffed shirt    0              48(32f+16m)    6(2f+4m)
F++    stunner          46(28f+18m)    0              11(9f+2m)
B++    sucker           0              17(11f+6m)     38(25f+13m)
M++    swine            0              43(29f+14m)    14(8f+6m)

(?) = breakdown by informant sex illegible.

Table 1 (continued)

Type   Lexeme           F              M              B
F      tit              22(14f+8m)     8(5f+3m)       16(9f+7m)
M      toad             1(?)           23(15f+8m)     17(13f+4m)
M      tough            0              30(17f+13m)    27(20f+7m)
B      twat             12(6f+6m)      17(10f+7m)     19(13f+6m)
B++    twit             3(2f+1m)       7(4f+3m)       46(30f+16m)
M++    villain          0              45(30f+15m)    11(7f+4m)
B+     vulture          9(6f+3m)       14(9f+5m)      33(22f+11m)
M      whippersnapper   1(1f)          27(19f+8m)     22(12f+10m)
B      whiz kid         0              25(16f+9m)     31(20f+11m)
M++    wimp             0              44(29f+15m)    13(8f+5m)
M      wino             0              30(21f+9m)     23(13f+10m)
M++    wolf             1(1f)          52(33f+19m)    3(2f+1m)
M+     worm             0              34(26f+8m)     22(11f+11m)

(?) = breakdown by informant sex illegible.
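The marking scheme used in the left-hand column of Table 1 can be sketched as a small Python function. This is an illustration under stated assumptions, not the authors' procedure: the name `prototype_label` and its parameters are invented for the example. The text speaks of differences of "more than" 10 and 20, but lexemes with a difference of exactly 20 (bully: 38M, 18B) are listed among the confirmed prototypes, so the sketch treats the thresholds as inclusive.

```python
def prototype_label(counts, potential=10, confirmed=20):
    """Return the Table 1 style label for one lexeme.

    `counts` maps the response types "F", "M" and "B" to informant counts.
    Assumption of this sketch: thresholds are inclusive (>= 10 gives "+",
    >= 20 gives "++"), since a difference of exactly 20 is treated as a
    confirmed prototype in the tables.
    """
    order = ("F", "M", "B")
    values = sorted((counts.get(k, 0) for k in order), reverse=True)
    top, second = values[0], values[1]
    if top == second:
        # Equal scorings are reported with both letters, e.g. "F,B" for prude.
        return ",".join(k for k in order if counts.get(k, 0) == top)
    winner = next(k for k in order if counts.get(k, 0) == top)
    diff = top - second
    if diff >= confirmed:
        return winner + "++"
    if diff >= potential:
        return winner + "+"
    return winner

# Counts taken from Table 1.
print(prototype_label({"F": 0, "M": 52, "B": 5}))    # bastard
print(prototype_label({"F": 1, "M": 38, "B": 18}))   # bully
print(prototype_label({"F": 9, "M": 14, "B": 33}))   # vulture
print(prototype_label({"F": 28, "M": 1, "B": 28}))   # prude
```

With the thresholds exposed as parameters, the effect of a stricter or looser cut-off on the number of "confirmed" prototypes can be checked directly against the response counts.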

The findings are of interest in several respects. First of all, the high degree of idiolexy or individual variation among the informants is noteworthy. This is, however, to some extent due to sociolinguistic factors such as age and geographical distribution. The considerable number of non-responses is also due to similar factors, which it will not be possible to account for within the scope of the present paper. Some lexemes have more than one evaluative sense, which may also give rise to varying responses. On the whole, the data presents an interesting picture of a lexical field in which uniform sense perceptions are rare - not one lexeme has been given a unanimous scoring! - and meaning changes are fairly frequent. Informant reactions indicated confirmed prototypes in 56% of the cases. These are listed in Table 2 and discussed below.

Table 2. Nouns with confirmed male, female or epicene prototypes

Male           Male             Female        Epicene        Epicene
bastard        piss artist      beauty        brat           sheep
beast          prick            chatterbox    crank          simpleton
blackguard     rat              dragon        creature       smarty-pants
bugger         rogue            gossip        dimwit         sucker
bully          roughneck        kitten        dupe           twit
butcher        ruffian          nag           freak
gorilla        savage           peach         go-getter
hooligan       shark            pearl         loony
jerk           skunk            stunner       loudmouth
knucklehead    stuffed shirt                  madcap
lion           swine                          nitwit
lout           villain                        parrot
oaf            wimp                           punk
pervert        wolf                           saint

It is interesting to find that so many more nouns were judged to have prototypically male rather than female reference. Another interesting point is to see to what extent these judgements are in accord with dictionary definitions. Of the 28 nouns associated primarily with males by the informants, 26 are defined as epicene in CED (1986), e.g. bastard: "an obnoxious or despicable person", beast: "a brutal, uncivilised, or filthy person", etc. Only two of these nouns, gorilla and prick, are given male definitions in this dictionary.

In order to find out to what extent informant reactions agree with dictionary definitions, we made a study of the definitions of the lexemes in nine monolingual dictionaries published between 1964 and 1992. This study may be partly read in terms of recent lexical and lexicographic history. Some lexemes or senses were suppressed for reasons of prudery at the beginning of the period, some have become or are becoming obsolete, some have come into existence during the period, some have undergone meaning changes, and some evince various developments in lexicographic descriptive techniques. The scope and purpose of the dictionary also account for a great many quantitative as well as qualitative differences. The dictionary definitions are in many cases as varied as the informant responses, which is no wonder, as they are also likely to rely on introspection to a great extent, especially in older dictionaries.

Any definition making exclusive reference to either girls and women or boys and men has been judged as Female (F) and Male (M) respectively. Definitions with tags such as "esp. of men (women)" are referred to as Epicene with Male Prototype (EMP) and Epicene with Female Prototype (EFP) respectively. The absence of an entry in the relevant sense or word-class is marked 0. The presence of two alternative evaluative senses is indicated by &. "Epicene" (E) has been applied to all cases in which the noun person or the pronoun someone (somebody) figures without any explicit mention of either sex. In some cases it has been difficult to make these judgements because of the circumlocutory use of synonyms employed to a great extent, in particular in older dictionaries, but in most instances it has been possible to make a fairly firm decision.
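The coding scheme just described can be approximated mechanically. The following Python sketch is only a rough keyword heuristic and not the authors' method, which relied on the judgement of the investigators: the function name `code_definition`, the word lists and the "?" fallback are all assumptions of the example, and the & notation for two alternative senses is not handled.

```python
import re

# Hypothetical keyword lists; a real coding would need human judgement.
MALE = re.compile(r"\b(?:man|men|boy|boys|fellow)\b")
FEMALE = re.compile(r"\b(?:woman|women|girl|girls)\b")
NEUTRAL = re.compile(r"\b(?:person|someone|somebody)\b")
ESP_TAG = re.compile(r"\besp(?:\.|ecially)?\s+(.*)$")

def code_definition(definition):
    """Assign one of the codes F, M, EFP, EMP, E, 0 to a definition text.

    Returns "?" when no keyword is found, i.e. when the definition would
    have to be judged by hand.
    """
    if not definition or not definition.strip():
        return "0"  # no entry in the relevant sense
    text = definition.lower()
    tag = ESP_TAG.search(text)
    if tag:
        # A tag such as "esp. a man" marks an epicene definition with a prototype.
        tail = tag.group(1)
        male, female = bool(MALE.search(tail)), bool(FEMALE.search(tail))
        if male and not female:
            return "EMP"
        if female and not male:
            return "EFP"
    male, female = bool(MALE.search(text)), bool(FEMALE.search(text))
    if male and not female:
        return "M"
    if female and not male:
        return "F"
    if NEUTRAL.search(text) or (male and female):
        return "E"
    return "?"

# Definitions quoted in this paper.
print(code_definition("a man that one strongly dislikes"))
print(code_definition("an obnoxious or despicable person"))
print(code_definition("a rude ungraceful ungentlemanly person, esp. a man"))
print(code_definition("a young person, esp. a young girl"))
```

Word boundaries keep "ungentlemanly" and "woman" from being counted as occurrences of "man". Definitions built on synonyms rather than on person or man, which the text singles out as the difficult cases, fall through to "?".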
The largest and the second largest response categories are given in brackets after each lexeme. Table 3: The treatment of lexemes with confirmed female prototypes in nine dictionaries COD 1964 EFP E EFP

PED 1969 F E EFP

gossip(39F, 18B) kitten(53F,3M) nag(48F,8B) peach(49F,6B)

EFP F

E F

0 EFP

0 F

pearl(49F,3B) stunner(46F,11B)

0 E

E E

beauty (46F, 9B) chatterbox(41F, 16B) dragon(46F,5M)

ACD OALD LDCE LDEL CED 1970 1974 1978 1984 1986 EFP E EFP EFP F E E E E E E& F F E EFP F EFP E E E E 0 0 0 0 0 0 0 E EFP EFP E E& EFP EFP E F 0 E EFP E E E E EFP E E

CELD BBCD 1987 1992 F F E E F 0 E 0

E 0

0 E

0 0

E F

0 F

Some interesting observations can be made here. We can see that beauty is judged as either prototypically female, exclusively female or epicene in earlier dictionaries, but only as exclusively female in the latest ones. Yet some informants judged it to be epicene. Chatterbox is considered to be female by a vast majority of informants, but is nevertheless defined as epicene by all dictionaries. CED and CELD define pearl as epicene, but most informants regard it as female. Stunner can be epicene according to some informants, but it is defined as exclusively female in the latest dictionaries.


Table 4: The treatment of lexemes with confirmed male prototypes in nine dictionaries: COD 1964

PED 1969

ACD 1970

OALD LDCE LDEL CED CELD BBCD 1974 1978 1984 1986 1987 1992

M& E E

0

E

M

E

E

E

E

E

E

E

E

E

E

0

E

E& M E

E

M

E

E

E

E

E

E

E

E

E

EMP E

E

E

E

E

E

EMP E 0 0 M M

E M E

E 0 E

EM P E M E

EM P E E M E

E M E

M 0 E

0 0 E

bastard(52M, 5B) 0 beast(42M,14B) blackguard(41M,4F) bugger(43M,13B) bully(38M,18B)

M& E E M

M

butcher(49M,5B) gorilla(51M,4B) hooligan(46M,10B) jerk(47M,10B) knucklehead(42M,9B) lion(46M,6B)

0 0

0 0

E 0

0 0

E 0

E E

E E

E 0

E 0

E

E

É

E

E

E

E

Ô

lout(52M,2B) oaf(48M,7B)

M M

M M

E E

M E

M E

E

E

E

M EM P E

M E

pervert(43M,13B) piss artist(42M,7B) prick(50M,4B) rat(41M,15B)

M E& M E

M& E E M

E

E

E

E

0

0

0

0

0

E

E

0

0

0 E& M M

0 E

0 E

E E

M M

E E

M E

M E

0 E

É

E

Ë

E

0

EM P E

M

M

EM P E

M

E

ËM P M

0

0

EMP E E M E

E E E E E

E E E E E

M 0 E E

E E E E E

E E E E E

É

E

E E& M

E E

E E& M

E E& M

M E E E ÉM P E E

E E

E E

M E

E E

rogue(48M,8B) roughneck(53M,2B) ruffian(48M,9B) savage(43M,13B) shark(44M,12B) skunk(36M,14B) stuffed shirt(48M,6B) swine(43M,14B) villain(45M,11B)

É

E E E E

Kl

E 0 0

wimp(44M,13B)

0 E

0 E& M

0 Ë& M

0 M

0 ù

0 E

E

E

E

Ë

Ô

Ô

Informant reactions strongly indicate that a number of lexemes, all of which are markedly pejorative, e.g. bastard, beast, blackguard, bugger, hooligan, jerk, oaf, pervert, savage, shark, stuffed shirt, villain and wolf, ought to be defined as epicene with male prototype rather than as merely epicene, as is the case in many dictionaries, in particular the most recent ones. On the other hand, we do find male definitions in recent dictionaries of some nouns that were treated as epicene in older ones, e.g. butcher and swine in CELD.

Table 5: The treatment of lexemes with confirmed epicene prototypes in nine dictionaries

                         COD   PED   ACD   OALD  LDCE  LDEL  CED   CELD  BBCD
                         1964  1969  1970  1974  1978  1984  1986  1987  1992
brat(46B,11M)            E     E     E     E     E     E     E     E     E
crank(39B,15M)           E     E     E     E     E     E     E     E     E
creature(34B,14F)        E     E     E     E     EFP   E     E     E     E
dimwit(49B,6M)           0     E     E     0     E     E     E     E     0
dupe(35B,6M)             E     E     E     E     E     E     E     E     E
freak(51B,5M)            E     E     E     E     E     E     E     E     E
go-getter(43B,9M)        E     E     E     E     E     E     E     E     E
loony(46B,11M)           E     M     E     E     E     E     E     E     E
loud-mouth(40B,16M)      0     0     0     0     E     E     E     E     E
madcap(35B,11M)          E     E     EFP   E     E     E     E     0     0
nitwit(46B,7M)           E     E     E     E     E     E     E     E     E
parrot(36B,12F)          E     E     E     E     E     E     E     0     0
punk(41B,14M)            0     M     0     E     M     E     E     E     E
saint(36B,13M)           E     E     E     E     E     E     E     E     E
sheep(37B,8F)            E     E     E     0     0     E     E     0     0
simpleton(43B,13M)       E     E     E     E     E     E     E     E     E
smartypants(39B,9F,9M)   0     0     0     0     0     E     E     0     0
sucker(38B,17M)          E     E     E     E     E     E     E     E     E
twit(46B,7M)             0     0     E     E     E     E     E     E     E

Not surprisingly, there is a very high degree of conformity between informant reactions and dictionary definitions in these cases. We can note that creature and madcap are considered to have female prototypes in a couple of older dictionaries, while loony and punk were taken to have male prototypes, but otherwise lexicographers and lay language users are in agreement. Another interesting observation is that the second largest response category is nearly always Male in these cases with the exception of creature and parrot. The nouns that in our assessment (based on a response difference between 10 and 20) may have a potential prototype are listed in Table 6, and their treatment in the nine dictionaries is listed in the following tables.


Table 6. Nouns with potential male, female or epicene prototypes

Male     Female         Epicene
boor     angel          chicken
champ    gold-digger    leech
hawk     harpy          prig
sod      saucebox       vulture
worm

Table 7. The treatment of lexemes with potential female prototypes in nine dictionaries.

                         COD   PED   ACD   OALD  LDCE  LDEL  CED   CELD  BBCD
                         1964  1969  1970  1974  1978  1984  1986  1987  1992
angel(3^F,176)           E     E     EFP   EFP   EFP   EFP   E     E     E
gold-digger(34F,18B)     F     E     F     F     F     F     F     0     0
harpy(29F,10B)           E     E     E     F     E     EFP   E     F     F
saucebox(27F,8B)         E     0     E     0     0     E     E     0     0

The extension of angel, in its evaluative sense, has probably widened to include men in recent years, as indicated both by informant reactions and the most recent dictionaries. The dictionaries are firmer in their treatment of gold-digger than the informants, who only indicate epicene usage with a potential female prototype. Harpy should probably rather be treated as epicene with female prototype according to informant reactions, not as exclusively female as is done in the latest dictionaries. Saucebox is probably a rare item, as indicated both by the paucity of informant reactions and the lack of entries in the majority of the dictionaries studied.

Table 8. The treatment of lexemes with potential male prototypes in nine dictionaries.

                         COD   PED   ACD   OALD  LDCE  LDEL  CED   CELD  BBCD
                         1964  1969  1970  1974  1978  1984  1986  1987  1992
boor(37M,18B)            M     E     E     EMP   E     E     E     E     E
champ(34M,21B)           0     E     E     E     E     E     E     E     E
hawk(33M,14B)            E     E     E     E     E     E     E     E     E
sod(33M,20B)             M     E     0     EMP   M     EMP   E     E     E
worm(34M,22B)            E     E     E     E     E     E     E     E     0

Informant responses show that boor and hawk should probably be judged as epicene with male prototype rather than as epicene, as is done in most dictionaries. Sod may have widened its extension to include women in recent years, as indicated by both informants and recent dictionaries.


Table 9. The treatment of lexemes with potential epicene prototypes in nine dictionaries. COD 1964

PED 1969

ACD 1970

OALD LDCE LDEL CED CELD BBCD 1992 1974 1978 1984 1986 1987

chicken(32B,14F)

0

E

EFP

0

E

E

E

E

leech(35B,16M)

E

E

E

E

E

E

E

E

0 E

prig(29B,15F)

E

E

E

E

E

E

E

E

E

vulture(33B,14M)

E

E

E

E

E

E

E

E

0

The dictionary definitions are on the whole in accord with the informant data. In the case of chicken there are two possible figurative senses: (1) 'a young person'; (2) 'a coward' (cf. LDEL). In the American dictionary (ACD) the former sense is defined 'a young person, esp. a young girl'. Table 10. The treatment of lexemes with non-significant informant response differences (< 10) in nine dictionaries. 10 a: female majority with epicene in second place

doormat(24F,22B)

COD 1964 0

PED ACD 1969 1970 E ó

OALD LDCE LDEL CED CELD BBCD 1974 1978 1984 1986 1987 1992 0 0 È È E ó

flirti,

EFP

EFP

E

E

EFP

EFP

E

E

E

goose(25F,21B)

fe

U.

E

fe

E^P

É

fc

E

Ó

sourpuss(32F,25B)

E

È

Ó

Ó

E

Ó

Ó

E& F

E

F

Ò

fe E

E

tit(22F,16B)

E

E

F

É E

In these cases the dictionaries are on the whole as divided in their definitions as are the informants in their reactions. Flirt and goose have probably widened their extensions to include men during the last thirty years. In the case of sourpuss, folk etymology may lie behind the high response frequency for Female. The second element puss is derived from IrGael pus 'mouth', but is likely to have become associated with puss 'girl' or 'woman' (cf. LDEL and CED). 10 b: male majority with epicene in second place

churl(17M,16B) crook(32M,25B)

COD 1964 M& E

PED 1969

ACD 1970

OALD LDCE LDEL CED CELD BBCD 1974 1978 1984 1986 1987 1992 E È E E fi fi

E

E

M& E

E

E

E

E

E

E

E

É

fe

M

fe

ó

È

E

E& F

E

Ó

heavy(30M,22B)

Ó

Ó

Ó

Ó

Ó

E

M

M

M

milksop(20M,12B)

M E ò

M

M

M

Ó

Ó

6

E

É

E

È

M E fi

M

É

M E 0

fe tì

E ò

E ó

pig(30M,27B)

stinker(31M,24B)

toad(23M,17B) tough(30M,27B) whippersnapper(27M,22B) wino(30M,23B)

E M E 0

0 M E E

E E E 0

0 E E 0

E E E E

0 E E 0

E E E E

0 E E 0

0 0 E 0

In some cases, e.g. churl, fox, pig, tough and whippersnapper, the informant reactions are nearly evenly divided between male and epicene usage, which must be taken to justify the epicene definitions given in the dictionaries. Interestingly, heavy, which is defined as exclusively male in recent dictionaries, is judged to be epicene by quite a few informants (30M, 22B) and as female by one (female) informant. 10 c: epicene majority with female in second place

gem(29B,27F) lamb(26B,25F) mouse(29B,24F) ragbag(23B,19F) serpent(22B,17F)

COD

PED

ACD

OALD LDCE LDEL CED CELD BBCD

1964 0 E E 0 E

1969 0 E E 0 E

1970 0 E 0 0 E

1974 0 E E E E

1978 E E EFP 0 E

1984 E E EFP E E

1986 E E E E E

1987 E E Ó 0 0

1992 E 0 0 0 0

Epicene definitions seem to be justified in all these cases, and the female prototype for mouse given in LDCE and LDEL is not supported by the informant data. 10 d: epicene majority with male in second place COD

PED

1964

1969 1970 EMP E Ë M M E 6

ace(24B,19M)

E

ass0ÌB,MM)

ô

creep(WM?M) daredevil(31 B,26M) moron(29B,27M)

ô

shit(29B,26M)

E E ô

E E E

snake(23B,22M) twat(19B,17M)

Ë ó ó

ACD

OALD LDCE LDEL

1974

CED CELD BBCD

1984

1986

EMP E

E

E

Ë

Ë

Ë

Ë

Ë

É É E

E

Ë Ë

E

E ô ô E

E E E E

MË

ó

Mi

ó

1978

E

E E E

ô

ô É

E E

E

E

E

E E E M

1987 1992 ô E Ë È E

E

E E E

E E E

ô ó

ô ô

E

E

E

E

E

There are few surprises in Table 10 d, as definitions and informant responses can be said to agree with each other in most cases. Interestingly, CELD does not record the nominal, but only the adjectival use of ace in its figurative sense: "A person who is ace at something is extremely good at it; an informal use. EG an ace marksman." 10 e: equal female and epicene scorings

prude(28F,28B)

COD

PED

1964

1969 1970

ACD

OALD LDCE LDEL CED

1974

CELD BBCD

1978 1984 1986 1987

1992

This lexeme, which derives from Fr. prudefemme ('good woman', 'prudish woman' LDEL) has probably undergone an epicene meaning change from 'prudish woman' to 'prudish person' in the last thirty years. CELD gives an example with a male referent: "Come on Frank, don't be such a prude."

The present study largely confirms the suspicions aroused by the pilot study that in many cases the non-sexist mode of definition adopted by recent dictionaries is not in line with the views held by ordinary native speakers. Even though dictionaries are not involved in any direct kind of two-way communication, they ought to follow Grice's (1975) co-operative principle, in particular the Maxim of Quantity (i): "make your contribution as informative as is required for the current purposes of the exchange". For the present purpose it would seem to be very easy to indicate prototypical usage by inserting a phrase such as "esp. a man (woman)" as in LDEL's (1984) definition of bugger in its pejorative sense: "...an offensive or disagreeable person, esp. a man...". The difficulty lies in what should constitute the basis of such an assessment. Single-handed introspection is a very uncertain method leading to subjective statements, as we have seen from a number of dictionaries investigated. Judgements based on large corpora might be able to identify prototypes (although the many non-entries in CELD, which is based on a very large corpus, are noteworthy), but the safest procedure is probably to make large-scale informant studies with more strict control of various sociolinguistic factors than we have been able to exercise in the present study.

References (with abbreviations used)

ACD Barnhart, C.L. & Stein, J. (eds) (1970) The American College Dictionary. New York: Random House.
Aitchison, J. (1987) Words in the Mind. An Introduction to the Mental Lexicon. Oxford: Basil Blackwell.
BBCD Sinclair, J. et al. (eds) (1992) The BBC English Dictionary. London: BBC English, Harper Collins Publishers.
CED Hanks, P. et al. (eds) (1986) The Collins English Dictionary. 2nd ed. London: Collins.
CELD Sinclair, J. et al. (eds) (1987) The Collins Cobuild English Language Dictionary. London and Glasgow: Collins.
Cole, P. & Morgan, J.L. (eds) (1975) Syntax and Semantics 3. New York: Academic Press.
Coleman, L. & Kay, P. (1981) 'Prototype semantics: The English verb lie'. Language 57:26-44.
COD Fowler, H.W., Fowler, F.G. et al. (eds) (1964) The Concise Oxford Dictionary of Current English. 5th ed. Oxford: Oxford University Press.
Grice, H.P. (1975) 'Logic and conversation'. In Cole, P. & Morgan, J.L. (eds) (1975).
Hurford, J.R. & Heasley, B. (1983) Semantics: a coursebook. Cambridge: Cambridge University Press.
Hyldgaard-Jensen, K. & Hjørnager Pedersen, V. (eds) (1994) Symposium on Lexicography VI. Proceedings of the Sixth International Symposium on Lexicography May 7-9, 1992 at the University of Copenhagen. Lexicographica, Series Maior 57. Tübingen: Max Niemeyer Verlag.
Kay, P. & McDaniel, C. (1978) 'The linguistic significance of the meanings of basic colour terms'. Language 54:610-646.
LDCE Procter, P. et al. (eds) (1978) The Longman Dictionary of Contemporary English. Harlow: Longman.
LDEL (1984) Longman Dictionary of the English Language. Harlow: Longman.
OALD Hornby, A.S. & Cowie, A.P. (eds) (1974) The Oxford Advanced Learner's Dictionary of Current English. London: Oxford University Press.
OED Simpson, J.A. & Weiner, E.S.C. (eds) (1989) The Oxford English Dictionary. 2nd ed. Oxford: Clarendon Press.
Ordoubadian, R. (1986) 'A History of the Term "Person"'. The SECOL Review X, 3:104-113.
PED Garmonsway, G.N. & Simpson, J. (eds) (1969) The Penguin English Dictionary. Harmondsworth: Penguin Books.
Persson, G. (1994) 'Masculine, feminine and epicene nouns in English dictionaries: presentation of a pilot study'. In Hyldgaard-Jensen, K. & Hjørnager Pedersen, V. (eds) (1994).
Rosch, E. (1975) 'Cognitive representations of semantic categories'. Journal of Experimental Psychology: General 104:192-233.

Ulrich Busse

Probleme der Aussprache englischer Wörter im Deutschen und ihre Behandlung im Anglizismen-Wörterbuch 1. Einleitung Die Arbeiten, die sich mit den theoretischen oder praktischen Problemen der Aussprache von Anglizismen im Deutschen beschäftigen, sind bisher nicht sehr zahlreich. Die mehr als 1450 Titel umfassende Bibliographie des Anglizismen-Wörterbuches [=AWb] zur Interferenzliteratur mit besonderem Schwerpunkt des englischen Einflusses auf die deutsche Sprache nach 1945 verzeichnet nur wenige Arbeiten zu diesem Problem. Stellvertretend seien vier Arbeiten mit unterschiedlichem methodischen Ansatzpunkt angeführt: Neubert (1962) behandelt die Aussprache der haupttonigen Vokale englischer Wörter wie Derby und Manager, die im Deutschen verwendet werden, indem er die charakteristischen Unterschiede zwischen dem englischen und dem deutschen Vokalsystem herausarbeitet. Fink (1980) legt eine umfassende Studie auf empirischer Grundlage vor. Hansen (1986) analysiert die Unterschiede im Phoneminventar und in den phonotaktischen Regeln des Englischen und des Deutschen. Sein besonderes Augenmerk gilt dabei der Realisierung lautlich vergleichbarer Phoneme in beiden Sprachen, und Pawlowski (1986) legt einen ausfuhrlichen Rezensionsaufsatz zu den phonetischen Angaben in einsprachigen Wörterbüchern der deutschen Gegenwartssprache anhand von Brockhaus/Wahrig (1980-1984), Das Große Wörterbuch der deutschen Sprache (1976) und Mackensen (1982) vor, in dem er u.a. auf die Transkription von Anglizismen eingeht. 1.1. Vorgehensweise Ausgangspunkt meiner Ausführungen zur Aussprache englischer Wörter im Deutschen sind die Ausspracheangaben in dem von Broder Carstensen begründeten und von Regina Schmude und mir zu Ende geführten AWb. Dabei sollen die wesentlichen Prinzipien der Darstellung und Behandlung der Ausspracheangaben und ihre Prämissen, aber auch einige der bei der praktischen Wörterbucharbeit aufgetretenen Probleme dargestellt werden. 1.2. 
Die Transkription in anderen deutschen Wörterbüchern Das entscheidende Problem bei der Darstellung der Aussprache von aus dem Englischen entlehnten Lexemen ergibt sich aus der Tatsache, daß die Entlehnungen einer mehr oder minder starken lautlichen Angleichung an das phonologische System der deutschen Sprache unterliegen, "die vor allem durch die Unterschiede zwischen der Ursprungssprache und der Muttersprache in der Artikulationsbasis und in der phonologischen Struktur sowie die damit gegebenen Möglichkeiten der Interferenz bedingt ist." (Hansen 1986: 89). Häufig ist es schwierig zu entscheiden, inwieweit sich die Aussprache von im Deutschen verwendeten Wörtern englischer Herkunft von der englischen Aussprache entfernt und sich an das deutsche Lautsystem angeglichen hat, da dieser Vorgang sich in einem zeitlichen Kontinuum vollzieht und insbesondere bei jüngeren Übernahmen eher ein dynamischer Prozeß denn ein abgeschlossener Zustand ist. Zahlreiche deutsche Wörterbücher versuchen, dieses Problem durch die Angabe von Aussprache-Varianten zu lösen; so fuhrt etwa Duden Aussprachewörterbuch [=DA] (1990) bei Anglizismen entweder nur die deutsche Aussprache, deutsche und englische Aussprache nebeneinander oder nur die im Englischen gültige Aussprache an (vgl. DA 1990: 20). "In

84

Ulrich Busse

vielen Fällen steht dort neben der angenäherten Aussprache die aus der Ursprungssprache" (Pawlowski 1986: 288), die als solche gekennzeichnet wird. Die stichprobenartige Überprüfung der Anglizismen des Buchstabens ' J' hat jedoch keine einheitlichen Prinzipien erkennen lassen, nach denen etwa Alter oder Verwendungsfrequenz der Anglizismen den Ausschlag für die Zuordnung zur Rubrik "deutsche Aussprache" bzw. "englische Aussprache" gegeben haben können: Gruppe 1: dt. Aussprache [....] + dt. Vokalphoneme Jacketkrone, Jackpot, Jeep, Jet, Jetten, jobben, Jobber, Jockette, Jockey, joggen, Jogger, Jogging, Joule, Juice, Jukebox, Jumbo, Jump, jumpen, Jumper Gruppe 2: engl. Aussprache [....] + engl. Vokalphoneme Jamboree, Jam-Session, Jeans, Jet-Set, Jet-Stream, Jingle, Jitterbug, Jive, Job, Joint-venture, Joystick, Jumpsuit, Junk-Food, Junkie Gruppe 3: dt. Aussprache/engl. Aussprache: [j/....] Jazz, Jazzer, Job, Jockey Ein System, nach dem etwa häufige Wörter anders als Exotismen oder Neologismen behandelt werden, ist daraus nicht erkennbar. Darüber hinaus darf der Wert einer Ausspracheangabe engl, x in einem deutschen Aussprachewörterbuch bezweifelt werden, weil aufgrund des unterschiedlichen Phoneminventars und der phonotaktischen Strukturen der beiden Sprachen eine "echte" englische Aussprache wohl kaum vorkommen wird.

2. The pronunciation information in the AWb

2.1. Prescriptive or descriptive information?

The two colloquia held in Paderborn in 1980 and 1983 in connection with the AWb also addressed, among other things, the pronunciation information. The question whether the AWb should descriptively document the different possible pronunciations and thus treat them as alternatives, or whether as a prescriptive dictionary it should give the pronunciation that is "correct" either by the English model or by the German standard pronunciation, was answered in 1980 relative to the user group for which the dictionary is to be designed. Whereas the layperson interested in language tends to expect the normative pronunciation, the scholarly user hopes to find "rather a descriptive documentation of the types of realisation" (cf. Reichmann/Wiegand 1980: 338). At the second colloquium in 1983, the following procedure was proposed for handling pronunciation:

For lemmata taken directly from English, the English pronunciation according to Jones/Gimson [...] is given. For all lemmata that are borrowed from English on the expression side or are formed (in German) with expression-side linguistic material of English origin, the most important and more frequently attested pronunciation variants are given, following the notational conventions of the Duden West and East. (Kirkness/Wiegand 1983: 325)

Neubert (1962: 621 ff.), however, has argued convincingly that the pronunciation of English words in German cannot take its bearings from the pronunciation of English informants or from that codified in English dictionaries, because appeal to such an authority is only a pseudo-solution and ignores the existence of two sound systems; the inability of the average German speaker to produce the relevant English sounds can serve as proof that two different sound systems are real. Neubert assigns German phonetic equivalents to the 20 vowels listed in the English Pronouncing Dictionary [=EPD] and concludes that the sound correspondences or substitutions fall into three groups:

1. sound substitution by ear
2. sound substitution by spelling
3. use of an approximately identical sound

(ibid.: 623)

A good half, namely 58%, are rendered approximately true to their origin with only slight articulatory changes, but 42% differ considerably in their phonetic realisation from the English model (Received Pronunciation). For these reasons, giving the English normative pronunciation for the German linguistic sign was judged inappropriate.

As to notation, it was finally decided that phonetic transcription in the AWb should be based on the alphabet of the International Phonetic Association and not on the popular respelling, oriented towards the Latin alphabet, of the Rechtschreibduden (West). On the history of pronunciation information in the Rechtschreibduden cf. Busse (1993: 31 f.). The symbols listed separately for "German" and "foreign" pronunciation in the Duden dictionaries (e.g. DA, DU, GWb), as described above, were merged into a single, slightly modified transcription system, but the symbols used in DA (1990) in particular were retained as far as possible.

The AWb tries to state how anglicisms used in present-day German are pronounced by the majority of German speakers in the standard language. In most cases only one pronunciation is given. Where several are given, they reflect the presumed differing degrees of integration of the lexemes into the German sound system. The one in first position is, in the opinion of the AWb's compilers, to be regarded as the more frequent. This claim is problematic, above all because the AWb is based on a written-language corpus and was compiled by specialists in English studies, who by profession can hardly be regarded as average native speakers of German. This inconsistency was, however, consciously accepted for several reasons, above all with a view to foreign users.
Since the AWb, alongside the German pronunciation, also gives at another place in the microstructure of the dictionary entry the English pronunciation that is normative according to the EPD (1977, 1991), readers interested in interference problems are given the opportunity to retrace the processes that occur on the phonological level when an anglicism is integrated into the German language system. This, however, raises the problem of determining the standard pronunciation(s) of an anglicism in German. The question of the standard in pronunciation information arises not only for German dictionaries but equally for English ones, for the pronunciation norm conveyed in (pronouncing) dictionaries and linguistic reality are not congruent in all cases. "Within the English-speaking world itself the number of pronunciation variants and deviations from any norm is unmanageably large" (Fink 1980: 113), and Received Pronunciation, which serves as the basis of the EPD, is a relatively artificial construct, which can nevertheless claim, especially in teaching and research, the advantage of an internationally intelligible standard. Good attempts to take actual pronunciation into account in the dictionary as well can, in my opinion, be found in the Longman Pronunciation Dictionary [=LPD], which for numerous entries supplies the pronunciation variants with the results of the "BE poll panel preference", e.g. for data: 'deɪtə 92%, 'dɑ:tə 6%, 'dætə 2%.

2.2. English, German or mixed pronunciation?

In his empirical field study, Fink (1980: 118) distinguishes between

an "acceptable" English pronunciation* [*footnote: [...] full realisation of the English vowel and diphthong phonemes] and a German pronunciation oriented towards the grapheme-phoneme relation valid for German [...]. The third variant, which we designated mixed pronunciation, consisted of mixtures of English and non-English or unidentifiable phonemes.

The AWb does not entirely share this view but modifies it:

What is certain is that a purely English pronunciation is encountered only in the rarest cases, and it is also clear that sounds are frequently realised which, with respect to the characteristic phonetic features of the two languages, represent a "compromise" between an English and a German pronunciation. (AWb 1993: 81*)

The AWb is aware that the degree to which English word material is integrated in pronunciation can vary greatly with the age, educational level, dialect and especially the knowledge of English of the individual speaker, and that it allows considerable variation. But since the information in a dictionary has to be reduced for the sake of clarity and usability, a high degree of phonetic integration is assumed as a matter of principle and reflected accordingly in the transcription, with certain regularities of the integration process being handled systematically as follows:

2.3. Vocalism

2.3.1 The treatment of diphthongs

The principle stated above becomes particularly clear in the representation of the English diphthongs, which are Germanised to monophthongs and transcribed accordingly:

Engl. [eɪ] -> Ger. [e:], rarely [ɛ:], as in Laser, Spray, Steak etc.


For Trainer, for example, the different degree of openness is taken into account by giving both possibilities [e:, ɛ:].

Engl. [əʊ] -> Ger. [o:] as in Boat People, Nobody, Poster, Roadster, Show etc.

Hardly any German speaker will use these diphthongs with a fully English diphthongal pronunciation, but probably will not pronounce them as the monophthongs [e:] and [o:] either; rather, they are likely to be realised as monophthongs with a more or less distinct diphthongal component. If they are nevertheless transcribed as [e:] and [o:] in the AWb, the intention is to represent the end point of a process of phonetic integration, without ruling out that many speakers, especially those with a knowledge of English, also show a diphthongal or more or less diphthongised pronunciation.

Engl. [ɜ:] -> Ger. [ø:ɐ] as in Callgirl, Learning by doing, Service

2.3.2 The treatment of monophthongs

Engl. [æ] -> Ger. [ɛ] as in Big Bang, Camp, Gang, Jam-Session, Tramp, Uncle Sam etc.

The pronunciation of English [æ] [...]

Engl. [ʌ] -> Ger. [ʊ, a, œ] as in Bluff, Bungalow, Butler, Curry, Cut, Pumps etc.

For the English vowel [ʌ] no uniform German rendering can be established, for here the question whether German speakers realise [ʊ], [a] or [œ] seems to depend on the age of the borrowing and on the age and knowledge of English of the speakers. I have tested the range of variation documented in the AWb empirically in a small informant survey (Busse in press b) and found that linguistic reality deviates in some cases from the information recorded in German (pronouncing) dictionaries.

Engl. [ɑ:] -> Ger. [a:] as in Fast Food, last not least, Showmaster etc.

If these vowels are followed by an <r>, this is as a rule rendered by [ɐ], as in Party ['pa:ɐti].

Engl. [ə]


The treatment of Engl. [ə] depends on its position in the word and on the stress pattern, which may differ in German from English. Word-initially or word-internally the [ə] sound is either retained, as in Permanent Press, Rent a Car, or becomes [ɔ] or [o], as in Lean Production, Opinion-leader, or [ɛ], as in law and order etc. The English schwa in suffixes, by contrast, was treated systematically: the suffix -ment as in Apartment, Disengagement, Establishment etc. was systematically transcribed as [mənt] or [mɛnt]. The pronunciation variants are meant to document the different stages of phonetic integration, with the less integrated variant listed first in such cases. Engl. [ə] in word-final position is systematically transcribed as [ɐ], e.g. in hire and fire and in words ending in -er, such as Designer, Killer, Manager, Western. In the derived feminine forms Designerin, Killerin, Managerin etc., however, [ə] is retained.

Engl. [ɒ] -> Ger. [ɔ] as in Cotton, Pop, Rock, Spotlight etc.

Engl. [ɔ:] -> Ger. [ɔ:] as in Callboy, Fallout, Talkmaster, Windfall-Profits etc.

Unlike DA (1990), the AWb retains the [ɔ:] sound, which occurs in Callboy, Talkmaster and other anglicisms that have English [ɔ:]. This decision is certainly not uncontroversial, since the [ɔ:] sound is not part of the German phoneme system, and using the same transcription symbol for English and German wrongly suggests that the sounds are largely identical, as Carstensen assumes in the introduction to the AWb (1993: 85*). DA (1990) transcribes these words with [o:], which in my opinion corresponds just as little to German linguistic reality. Moreover, this symbol is reserved in the AWb for words such as Show, Smoking etc., which have [əʊ] in English.

Engl. [ʊ] -> Ger. [ʊ] as in Look, Notebook etc.

Engl. [w] -> Ger. [v]

The English semivowel [w] becomes Ger. [v], as in Gangway, Lambswool, Swing, Whisky etc.

Engl. [θ, ð] -> Ger. [θ, ð] as in Thriller

The English phonemes [θ, ð] are retained, although, as the experience of many teachers of English confirms, these sounds, which are absent from the German phoneme system, cause difficulties for German speakers.
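The vowel correspondences described in 2.3 can be collected in a small lookup table. The following Python sketch is only an illustration: the table name, the helper functions and the choice of entries are assumptions made here, not part of the AWb, and the symbols follow the transcriptions as read from the text above.

```python
# Illustrative lookup of English (RP) vowels and the German renderings
# named in section 2.3. Table and helper names are hypothetical.
GERMAN_RENDERINGS = {
    "eɪ": ["e:", "ɛ:"],     # Laser, Spray, Steak; [ɛ:] is the rarer variant
    "əʊ": ["o:"],           # Show, Poster
    "æ": ["ɛ"],             # Camp, Gang
    "ʌ": ["ʊ", "a", "œ"],   # Bluff, Curry, Cut: no uniform rendering
    "ɑ:": ["a:"],           # Fast Food
    "ɒ": ["ɔ"],             # Pop, Rock
    "ɔ:": ["ɔ:"],           # Callboy: retained in the AWb
}


def renderings(english_vowel):
    """Return the listed German variants, or the vowel unchanged if unlisted."""
    return GERMAN_RENDERINGS.get(english_vowel, [english_vowel])


def has_variation(english_vowel):
    """True when more than one German rendering is recorded for the vowel."""
    return len(renderings(english_vowel)) > 1
```

A call such as `has_variation("ʌ")` then flags the vowels for which a dictionary entry would have to list several variants, while `renderings("əʊ")` gives the single integrated form.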


2.4. Consonantism

Compared with the vocalism, the changes in the consonantism that accompany the adoption of English word material and its phonetic transcription are neither as numerous nor as serious, which is explained in part by the history of the two Germanic sister languages: unlike German, English has greatly changed its vowel system. The main problem to be addressed here is final devoicing (Auslautverhärtung).

2.4.1 Final devoicing

There is hardly any doubt that word-final English voiced consonants become voiceless in German:

Engl. [b, d, g] -> Ger. [p, t, k] as in Job, Trend, Gag etc.

Here too, speakers of German with a knowledge of English are more likely to pronounce voiced final consonants than those with little or no English. In many cases there are intermediate values or a pronunciation not clearly identifiable as English or German. The AWb generalises in these cases too and as a matter of principle gives the voiceless variant as the only German pronunciation. Final devoicing occurs not only at the end of a word but also at the end of a syllable; cf. e.g. Job and its compounds Jobhopper, -killer, -rotation, -sharing etc.

2.4.1.1 Voiced or voiceless consonants in word-initial position

Engl. [s] -> Ger. [z] or [s] in word-initial position, as in Service, Sex, Single etc.

No general rule, independent of the individual case, can be set up as to whether the sound is realised as voiced or voiceless. The same applies, incidentally, to [sp]/[ʃp] and to [st]/[ʃt], for which a (presumed) supraregionally common pronunciation is given, although it is clear that the German-speaking area by no means shows uniform pronunciations and that dialect influences also appear in the pronunciation of anglicisms. Thus in northern Germany one will hear different pronunciations of the consonants than in other parts of the German-speaking area, for instance with [st]/[ʃt] etc.
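The final devoicing rule of 2.4.1, applied at the end of every syllable and not only at the end of the word, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name is hypothetical, the input is assumed to be already split into syllables, and the sample transcriptions are illustrative and not quoted from the AWb.

```python
# Final devoicing as described in 2.4.1: [b, d, g] -> [p, t, k]
# at the end of each syllable. Names and sample data are hypothetical.
DEVOICE = {"b": "p", "d": "t", "g": "k"}


def devoice_syllable_finals(syllables):
    """Devoice a final b, d or g in every syllable of a pre-syllabified word."""
    result = []
    for syllable in syllables:
        if syllable and syllable[-1] in DEVOICE:
            # Only the last segment changes; initial consonants stay voiced.
            syllable = syllable[:-1] + DEVOICE[syllable[-1]]
        result.append(syllable)
    return result
```

For a compound such as Jobkiller, an assumed syllabification `["dʒɔb", "kɪ", "lɐ"]` comes out with voiceless [p] at the end of the first syllable, which mirrors the treatment of Job and its compounds.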

2.5. Geographical peculiarities

As already noted, the AWb tries to give a supraregional pronunciation. Regionally and dialectally conditioned pronunciation variants of anglicisms which, like native words, follow the pronunciation rules of the respective dialect are disregarded; geographical peculiarities that relate exclusively to anglicisms are recorded (on a modest scale). Examples are Austrian peculiarities in Cottage and Jumper.


2.6. Stress

Problems arising from the indication of word stress, especially in phraseological units, are left aside here. I refer in this connection to the introduction to the AWb, esp. p. 85 f., and to Broeders (1987), who examines stress problems of phonological idioms in English dictionaries and their linguistic background.

3. Summary

The pronunciation of English words in German depends on many factors, such as the age of the borrowing, its familiarity, its integration into the sound and/or writing system of German, and the knowledge of English and the age of the speakers. Whether an anglicism was transmitted acoustically or optically also seems to matter. A pronunciation heard once or several times may become established among speakers. Even though my hypothesis (Busse in press b) concerning the influence of radio and television could not be proven with certainty in the small survey mentioned above, what Fink (1986: 178) says is, in my opinion, correct:

It thus looks as if radio and television could indeed in some cases set the direction for the pronunciation of English word material in German. For lack of unambiguous evidence, however, we must at the same time underline the speculative character of our conjecture.

A further factor that could be relevant to pronunciation, among other things, is the comprehension of anglicisms; cf. (Fink: ibid.) and Carstensen/Hengstenberg (1983). No dictionary, the Anglizismen-Wörterbuch included, can do justice to all these factors. The AWb therefore deliberately does not regard itself as a norm; many speakers are not even aware of the norm and hence of a violation of it. It cannot be ruled out, however, that the AWb, contrary to its intention as a descriptive dictionary, might also have a prescriptive, normative effect. As described, the AWb assumes a high degree of phonetic integration as regards certain vowel qualities or quantities or the quality of final consonants; on the other hand, it takes account of the sometimes considerable fluctuation in the actual realisation of some sounds and sound groups by German speakers. This concerns, for example, the rendering of Engl. [ʌ], initial [s] or [st], which often appear in the dictionary with variants, and here decisions on individual cases are also necessary because no generally valid tendencies are discernible. On the whole, the pronunciation information in a dictionary, particularly one based essentially on a written-language corpus, cannot claim the same degree of authority as, say, the grammatical information, but is nevertheless, in my opinion, a valuable source of information both for laypeople interested in language and for scholarly users. It remains to be seen, however, whether the AWb can do equal justice to the diverging interests of both user groups (cf. Busse in press a). Perhaps raising awareness of the differences between the English and German phoneme systems can, as Neubert (1962) hoped for his contribution, be put to productive use in English teaching, to make clear to learners of English that English sounds are not to be reached by modifying sounds of the mother tongue but have to be phonemicised anew.


Bibliography

Dictionaries

AWb = Anglizismen-Wörterbuch. Der Einfluß des Englischen auf den deutschen Wortschatz nach 1945. Begründet von Broder Carstensen, fortgeführt von U. Busse. (1993) Bd. 1. - Berlin: de Gruyter.
DA = Wörterbuch der deutschen Standardaussprache (1990) Bearbeitet von M. Mangold in Zusammenarbeit mit der Dudenredaktion. Der Duden in 10 Bänden. Bd. 6, 3. Aufl. - Mannheim: Dudenverlag.
DU = Duden. Deutsches Universalwörterbuch (1989) Herausgegeben und bearbeitet vom Wissenschaftlichen Rat und den Mitarbeitern der Dudenredaktion unter der Leitung von Günther Drosdowski. 2. Aufl. - Mannheim: Dudenverlag.
EPD = Jones, D. (1991) English Pronouncing Dictionary. Ed. by A. C. Gimson and S. Ramsaran. 14th ed. - Cambridge: Cambridge University Press [and 14th ed. 1977].
GWb = Duden. Das große Wörterbuch der deutschen Sprache in sechs Bänden (1976-1981) Herausgegeben und bearbeitet vom Wissenschaftlichen Rat und den Mitarbeitern der Dudenredaktion unter Leitung von Günther Drosdowski. - Mannheim: Bibliographisches Institut.
LPD = Wells, J. C. (1990) Longman Pronunciation Dictionary. - Burnt Mill: Harlow.

Secondary literature

Benson, M. et al. (1986) Lexicographic Description of English. - Amsterdam: John Benjamins.
Brazil, D. (1987) "Representing Pronunciation". - In: J. M. Sinclair, ed. (1987) Looking up. An Account of the Cobuild Project in lexical Computing and the Development of the Collins COBUILD English Language Dictionary. - London: Collins, 160-166.
Broeders, T. (1987) "The Treatment of phonological Idioms". - In: A. Cowie, ed. (1987) The Dictionary and the Language Learner. Papers from the EURALEX Seminar at the University of Leeds, 1-3 April 1985. - Tübingen: Niemeyer (= Lexicographica. Series Maior 17) 246-256.
Busse, U. (1993) Anglizismen im Duden. Eine Untersuchung zur Darstellung englischen Wortguts in den Ausgaben des Rechtschreibdudens von 1880-1986. - Tübingen: Niemeyer (= Reihe Germanistische Linguistik 139).
- (im Druck a) "Das Anglizismen-Wörterbuch und seine Benutzer". - In: Fremdsprachen Lehren und Lernen 23 (1994).
- (im Druck b) "Wenn die Kötterin mit dem Baddibuilder". Ergebnisse einer Informantenbefragung zur Aussprache englischer Wörter im Deutschen. - In: D. Halwachs et al., Hgg. (1994) Sprache - Sprechen - Handeln. Akten des 28. Linguistischen Kolloquiums, Graz 1993. - Niemeyer: Tübingen (= Linguistische Studien 320, Bd. 1) 23-30.
Carstensen, B./Hengstenberg, P. (1983) "Zur Rezeption von Anglizismen im Deutschen". - In: H. E. Wiegand, Hg. (1983) Studien zur neuhochdeutschen Lexikographie III. - Hildesheim: Olms (= Germanistische Linguistik 1-4/82) 67-118.
Fink, H. (1980) "Zur Aussprache von Angloamerikanischem im Deutschen". - In: W. Viereck, Hg. (1980) Studien zum Einfluß der englischen Sprache auf das Deutsche. Studies on the Influence of the English Language on German. - Tübingen: Narr (= Tübinger Beiträge zur Linguistik 132) 109-183.
Gimson, A. C. (1981) "Pronunciation in EFL Dictionaries". - In: Applied Linguistics 2, 250-262.
Hansen, K. (1986) "Zur Aussprache englischer Wörter und Namen im Deutschen". - In: H. Stiller, Hg. (1986) Der angloamerikanische Einfluß auf die deutsche Sprache der Gegenwart in der DDR. Dem Wirken Martin Lehnerts gewidmet. - Berlin: Akademie Verlag, 89-102.
Kirkness, A./Wiegand, H. E. (1983) "Wörterbuch der Anglizismen im heutigen Deutsch". - In: Zeitschrift für Germanistische Linguistik 11, 321-328.
Landau, S. I. (1984) Dictionaries. The Art and Craft of Lexicography. - New York: The Scribner Press.
Lewis, J. W. (1975) "Symbols for the General British English Vowel Sounds". - In: Zielsprache Englisch 2, 1-4.
Neubert, A. (1962) "Linguistische Betrachtungen zur Aussprache englischer Wörter im Deutschen (haupttonige Vokale)". - In: Wissenschaftliche Zeitschrift der Karl-Marx-Universität Leipzig 11, 621-626.
Pawlowski, K. (1986) "Die phonetischen Angaben in einsprachigen Wörterbüchern der deutschen Gegenwartssprache". - In: H. E. Wiegand, Hg. (1986) Studien zur neuhochdeutschen Lexikographie VI, 1. Teilbd. - Hildesheim: Olms (= Germanistische Linguistik 84-86) 279-326.


Reichmann, O./Wiegand, H. E. (1980) "Wörterbuch der Anglizismen im heutigen Deutsch". Kolloquium vom 14. bis 16. Februar 1980 an der Universität-GH-Paderborn. - In: Zeitschrift für Germanistische Linguistik 8, 328-343.
Standop, E. (1985) Englische Wörterbücher unter der Lupe. - Tübingen: Niemeyer (= Lexicographica. Series Maior 2).
Ternes, E. (1989) "Die phonetischen Angaben im allgemeinen einsprachigen Wörterbuch". - In: F. J. Hausmann et al., Hgg. (1989) Wörterbücher. Ein internationales Handbuch zur Lexikographie. - de Gruyter: Berlin (= Handbücher zur Sprach- und Kommunikationswissenschaft. Bd. 5.1) 508-518.
Wells, J. C. (1985) "English Pronunciation and its Dictionary Representation". - In: R. Ilson, ed. (1985) Dictionaries, Lexicography and Language Learning. - Oxford: Pergamon Press, 45-51.

Bernhard Diensberg

A Concise Dictionary of Early Modern English Pronunciation

Every scholar and student of English is supposed to be familiar with the standard pronunciation dictionaries such as The English Pronouncing Dictionary (EPD) by Jones/Gimson/Ramsaran (first published 1917; 14th edition 1977; revised edition 1988), A Pronouncing Dictionary of American English by Kenyon/Knott (1953), and the Longman Pronunciation Dictionary (LPD) by John Wells (published in 1990). A large number of you are well acquainted with the Middle English Dictionary (MED) by Kurath/Kuhn/Lewis (1952ff.) and of course know the famous Oxford English Dictionary (OED), the corrected re-issue of the New English Dictionary (NED), published between 1884 and 1928 (supplements published between 1972 and 1986: OEDS 1-4; integrated edition published 1989; two supplementary volumes published 1992). However, you may never have heard of A Dictionary of Early Modern English Pronunciation (DEMEP), a project which was conceived in Uppsala/Sweden nearly 90 years ago. At the beginning of this century the collection of materials for DEMEP was started at Uppsala University. A powerful stimulus to this dictionary project was given in 1923 when Professor Wilhelm Horn (Humboldt University, Berlin) stated that such a dictionary was one of the greatest desiderata of English historical linguistics (Danielsson 1976: 1 Introduction). The project's center was then shifted to Stockholm University under the direction of the late Professor Bror Danielsson. The following meetings took place:

1972 An editorial committee, based on international cooperation, was set up in Bonn on 1st May 1972;
1973 Preliminary international discussion of intensifying the DEMEP project in Edinburgh: 27-30 March 1973;
1974 Symposium on DEMEP and editorial meeting in Edinburgh: 23-26 October 1974.

All in all six volumes were projected for DEMEP:

vol. I: 1500-1650 Bror Danielsson (general editor)/Arne Zettersten (co-editor);
vol. II: 1650-1700 Bertil Sundby (Bergen);
vol. III: 1700-1750 Klaus Dietz (then Bonn);
vol. IV: 1750-1800 Horst Weinstock (Aachen);
vols. V and VI were planned as companion volumes containing material from old pronouncing dictionaries published before Daniel Jones's English Pronouncing Dictionary (EPD), first edition issued in 1917.

The more modest version of the DEMEP which I envisage, namely the Concise Dictionary of Early Modern English Pronunciation (CDEMEP), is intended to fill the gap between the (as yet unfinished) MED and the OED. The OED, which mostly draws on literary sources, provides no information on the pronunciation of the Early Modern English (EModE) period (1500-1880). The pronunciation given under the OED entries tells us nothing about the period before 1884 (when the first fascicle of the NED came out).


There is plenty of primary material, i.e. statements about the pronunciation of specific sounds, available for the period from about 1550 through 1800 (and, of course, beyond). It consists of the numerous works of English orthoepists (i.e. teachers of correct pronunciation) and spelling reformers of that period. Only some of these treatises, and not even the most important ones, have been edited so far. The greater part exists either in manuscript and early printed form or has been collected in a microfiche edition, entitled English Linguistics 1500-1800, selected and edited by Robin C. Alston (published 1968). As a consequence, information about a crucial period in the development of the English language, during which the system of (long) vowel phonemes underwent radical changes, called the Great/Long Vowel Shift, is restricted to the few scholars working in that field. Non-specialists, such as students of English literature who want to know more about rhymes and puns in Shakespeare's works, rarely consult treatises and manuals like Helge Kökeritz's Shakespeare's Pronunciation (1953) or Fausto Cercignani's Shakespeare's Works and Elizabethan Pronunciation (1981). As for poets and dramatists before and after Shakespeare, even fewer manuals of that type are available.

Precursors of modern pronouncing dictionaries were first published in the course of the 18th century. Among many others Thomas Sheridan, General Dictionary of the English Language, 2 vols. (London, 1780) and John Walker, A Critical Pronouncing Dictionary (London, 1791) deserve mention. Their works were in due course superseded by the above-mentioned EPD by Daniel Jones (1917ff.).

Now what can be expected from a Concise Dictionary of Early Modern English Pronunciation (CDEMEP)? It will be able to show, in a concise fashion, pronunciations current at the time of Spenser, Shakespeare, Marvell, Dryden and Pope, to name but a few. Thus, rhymes like ear : repair : there (A. Pope, Essay on Criticism, 1711) can be shown to correspond to the standards of the 17th or 18th century, for instance. We have to reckon with the coexistence of more than one standard of cultivated pronunciation during the EModE period. Besides pure dialectalisms, e.g. cloud [tloud] and glory ['dlori] (Robinson, 1617), pronunciations can be shown to have existed which are no longer standard today, e.g. mean v.1 [mjen] alongside meaneth ['mened] 'means' and dear adj. [djer], both recorded by John Hart (1551/69/70). This is equally true of gyarded ['gjarded], recorded by Robert Robinson (1617). Educated speakers (the king and his court, the church, the law courts, the schools, the administration) may have switched from one standard of pronunciation to another during their lifetime. In Present-Day English (PDE), what has become known as Received Pronunciation (RP) is widely regarded as a model. It is anything but a monolithic block, although differences on the phonemic level no longer exist, in contrast to the situation in Early Modern English.

1. Editorial Methods

The CDEMEP is planned as a one-volume pronunciation dictionary of ca. 2500 entries - roughly situated between the size of the COD and the SOED - and will be based on material published so far, which is therefore readily available. The CDEMEP will, first and foremost, draw on primary sources, i.e. on editions and monographs on individual Early Modern English orthoepists and spelling reformers. The existing manuals on Early Modern English pronunciation, e.g. Kökeritz 1953, Dobson 1968, Cercignani 1981, will be critically scrutinised. As far as possible sociological factors, e.g. age, sex, social status and profession, will be taken into account. Thus Alexander Gil in his Logonomia Anglica (1619/21) refers to the pronunciation of the so-called Mopsae (women of the higher strata of society but with little formal education), of which he disapproves. Diatopical factors, such as the provenance of a witness to Early Modern English pronunciation from areas far away from the capital, e.g. Northern England, Scotland, Southwest England, will be taken into consideration. Although most of our material will come from native speakers (i.e. Englishmen), we will not exclude outright the evidence of non-native speakers, e.g. Frenchmen or Germans. Use may be made more sparingly of secondary sources such as rhymes (cf. Gabrielson 1909), puns/jingles and eventually of spellings. The latter will be classified as tertiary sources. Rhymes of the major poets (Spenser, Shakespeare, Pope, Dryden) will be taken into account, as will the lists of homophones (words pronounced alike or near-alike) by witnesses such as Richard Hodges, Christopher Cooper and others.

2. Preliminary Remarks

CDEMEP is intended to cover the period from ca. 1500 to ca. 1800, as was its greater parent DEMEP. The sources drawn upon should be as evenly distributed as possible. The orthoepists and spelling reformers should be representative of their decade or period. However, direct pronouncements on the English language begin only around 1550 (Sir Thomas Smith/John Hart). The Hymn to the Virgin (a 1500), also called the Welsh Hymn, is of the secondary type of evidence. The turn from the 17th century to the 18th century is particularly rich in primary evidence, e.g. 1699 John Wallis (5th and definitive edition), 1704 Anon., Right Spelling (quoted from Dobson 1968: 1020, 1022). A transitional period between Early Modern English proper (ca. 1500-1700) and Modern English should be intercalated (ca. 1700 to ca. 1800). Evidence from the works and manuals of Johnston 1764, Sheridan 1780, Walker 1791, Browne 1800 (quoted from D. Sherman 1976: 127) may selectively be considered for inclusion. The best and most reliable witnesses, such as John Hart (1551/69/70), Alexander Gil 1619/21, Richard Hodges 1643/44, John Wallis 1653, Christopher Cooper 1685, could serve as anchors and touchstones against which to judge less reliable sources, e.g. Owen Price 1665/68. Of course, the interrelation of the works on Early Modern English pronunciation and their mutual or chronological indebtedness must be closely looked into.

3. CDEMEP and Phonetic Transcription of the Evidence

At this point a rough sketch of the notation is called for or, more precisely, an answer to the question: what will the entries look like? In fact, universal agreement was reached by the participants of the DEMEP symposium in Edinburgh (23-26 October 1974) on the following system of transcription, which they dubbed inclusive notation (cf. Danielsson 1976: 60ff.). By inclusive they understood that it will be neither completely phonemic (usually referred to as broad transcription) nor exclusively phonetic (usually referred to as narrow transcription). However, broad transcription will be the rule; narrow transcription will be used only if it is sufficiently vouched for by the sources, e.g. clear or dark l, a given r-quality, vowel quality (which may be rare). The capital letters V (unspecified vowel sound), D (unspecified diphthong), and C (unspecified consonant) will be used where a segment cannot be identified more precisely. I don't think that there will be many occurrences where a given sound

Bernhard Diensberg


segment cannot be specified at all. Anyway, the entries would somehow be disfigured by using both capital and lower-case letters for the same word. As a consequence capital letters will be reduced to the size of lower-case letters. It is well known that EModE grammarians (i.e. orthoepists and spelling reformers) rarely made a distinction between the mid vowels, namely close and open e, close and open o. Moreover, there is the problem of the short a-vowel. Like his later colleagues, John Hart gives no indication in his treatise Orthographie (1569) whether short a (as in hat) is front or back. In these cases the use of the capital letter A had been suggested. There is, of course, the possibility of reducing it to the size of a lower-case letter. On the other hand, an alternative solution to that problem could be envisaged. In the introduction to the Concise Dictionary of Early Modern English Pronunciation (CDEMEP), idiosyncrasies and peculiarities of the sources drawn upon could be briefly stated. John Hart, as I have just pointed out, doesn't make clear whether his short low vowel a is front or back. In view of the subsequent development of EModE a, the phonetic symbol a (= cardinal vowel 4) might be justified. Use will be made of the system of cardinal vowels (basically 4 vowel heights) and, in some cases, of vowel symbols used for transcribing RP (= Received Pronunciation; cf. Gimson/Ramsaran 1989). This does not mean that the evidence on EModE pronunciation is taken to lead directly to RP. As a result the vowel system will look as follows:

i e E a, ae
y o e oe A
u o O

R (a frictionless continuant or, in modern terms, an approximant) will be used alongside r (all other r-qualities). For a comprehensive list of phonetic symbols for DEMEP, see B. Sundby in Danielsson 1976: 65-67 and A.J. Aitken in Danielsson 1976: 117-120; see also Sundby in Stanley/Gray 1983: 151-155. It stands to reason that even primary evidence, i.e. statements of EModE authors of treatises on pronunciation or reformed spelling, cannot be taken at face value, but is clearly in need of interpretation. Both the methods and the terminology used in the description of individual sound segments are a far cry from modern English phonetics. Sometimes even Latin terms are used, e.g. by Alexander Gil in the two editions of his Logonomia Anglica (1619 and 1621). Some authors give an impressionistic or even folkloristic description of individual sounds. One has to be aware of the fact that, more often than not, different phonetic terms refer to the same sound segment; in other words, the terminology frequently lacks consistency. Unlike the modern pronouncing dictionaries referred to at the beginning of this paper, CDEMEP will record the contemporary pronunciation of both inflected and uninflected word forms. As a matter of fact, this may include parsing. This holds for both regular and irregular inflection of verbs. In some cases, word class and/or meaning cannot be determined with certainty, so that a question mark will have to be used. Both lexical and grammatical homonyms will be distinguished. In order to disambiguate these homonyms, the German (G) equivalent will be given in some cases (see below). Allowance must be made for lexical homonyms which existed in Early Modern English only, e.g. boy n. 1) ModE buoy, G 'Boje' (Ho 1644); 2) ModE boy, G 'Junge'. As to the

A Concise Dictionary of Early Modern English Pronunciation


numbering of these homonyms I shall follow the OED (as did the late Professor Bror Danielsson in his edition of John Hart's Works, 1955). Actually, in quite a few cases lexical and grammatical homonyms are not distinguished in the sources. Thus broil may stand for a noun, glossed by Latin tumultus 'noisy disturbance' (OED: broil sb.1), or a verb, meaning 'to roast on open fire' (OED: broil v.1). In such cases a question mark should be added in brackets, i.e. broil v.1 (?). For the sake of user-friendliness, question marks should be used as sparingly as possible. With some orthoepists and spelling reformers, there may be an alternative pronunciation which would certainly be worth recording. Towards the end of the 17th century, both a diphthongal and a monophthongal pronunciation may be vouched for in the lexical sets bone, cloak, know, audible, cause, fall and cane, lane, praise, made (quoted from Sundby in Danielsson 1976: 59). Furthermore, the so-called homophone lists found in many treatises of our witnesses on Early Modern English pronunciation will be drawn upon. The following symbols will be used:

=   exactly alike word pairs, e.g. ModE meet v./adj. = meat n.
..  nearly alike word pairs
—   ambiguous pairings
≠   words not alike, i.e. homographs, e.g. ModE bow n., G 'Bogen' vs. bow v./n., G 'sich verbeugen, Verbeugung'

Thus, the alike list found in Fox and Hookes (1673) turns out on inspection to be a patchwork of alike and near-alike pairs taken from Hodges (1644/53). Here the ambiguous symbol is called for, and similarly in dealing with, for instance, Wharton (1654) (quoted from Sundby in Stanley/Gray 1983: 152). Like the homophone lists just discussed, the following kind of secondary source may be much less informative than direct statements on Early Modern English pronunciation (which I have termed primary material). Consequently, the CDEMEP will contain two more symbols:

—   words rhyming with each other, e.g. EModE ear : repair : there, love : move
=   words not rhyming with each other

(quoted from Sundby in Stanley/Gray 1983: 152).

Sources (editions) listed in chronological order

The following editions of primary sources, i.e. orthoepists and spelling reformers, will be drawn upon and consulted:


16th Century:

Bjurman, M. 1977. The Phonology of Jacques Bellot's Le Maistre d'Ecole Anglois (1580). Together with readings of the anonymous editions of 1625, 1639, 1647, 1652, 1657, 1670, 1679 and 1695. Stockholm Studies in English, 40. Stockholm: Almqvist and Wiksell.
Danielsson, B. 1955. John Hart's Works on English Orthography and Pronunciation (1551/1569/1570). Part I. Biographical and Bibliographical Introductions. Text and Index Verborum. Stockholm Studies in English, 5. Stockholm: Almqvist and Wiksell.
Danielsson, B. 1963. Sir Thomas Smith. Literary and Linguistic Works. Part I. Certaigne Psalmes and Songes of David (1542, 1549). Stockholm Studies in English, 12. Stockholm: Almqvist and Wiksell.
Danielsson, B. 1978. Sir Thomas Smith. Literary and Linguistic Works. Part II. De Recta et Emendata Linguae Graecae Pronuntiatione Dialogus (1568). Stockholm Studies in English, 50. Stockholm: Almqvist and Wiksell.
Danielsson, B. 1983. Sir Thomas Smith. Literary and Linguistic Works. Part III. De Recta et Emendata Linguae Graecae Scriptione Dialogus (1568). Stockholm Studies in English, 56. Stockholm: Almqvist and Wiksell.
Danielsson, B. and R.C. Alston. 1966. The Works of William Bullokar. A Short Introduction or Guiding (1580/81). Leeds Texts and Monographs, New Series, I. Vol. I. The University of Leeds: School of English.
Turner, J. R. 1969. The Works of William Bullokar. Aesop's Fables (1585). Leeds Texts and Monographs, New Series, I. Vol. IV. The University of Leeds: School of English.
Turner, J. R. 1970. The Works of William Bullokar. Booke at Large (1580). Leeds Texts and Monographs, New Series, I. Vol. III. The University of Leeds: School of English.
Turner, J. R. 1980. The Works of William Bullokar. Pamphlet for Grammar (1586). Leeds Texts and Monographs, New Series, I. Vol. III. The University of Leeds: School of English.

17th Century:

Danielsson, B. and A. Gabrielson. 1972. Alexander Gill's Logonomia Anglica (1619). Part I: Facsimiles of Gil's Presentation Copy in the Bodleian Library. Part II: Biographical and Bibliographical Introductions. Translation and Notes. Stockholm Studies in English, 26/27. Stockholm: Almqvist and Wiksell.
Dobson, E.J. 1957. The Phonetic Writings of Robert Robinson (1617). Early English Text Society, 238. London: Oxford University Press.
Ekwall, E. 1911. The Writing Scholar's Companion (1695). Neudrucke frühneuenglischer Grammatiken, Band 6. Halle: Niemeyer.


Jiriczek, O.L. 1903. Alexander Gill's Logonomia Anglica (1621). Quellen und Forschungen zur Sprach- und Culturgeschichte der germanischen Völker, Band 90. Straßburg: Trübner.
Jones, J.D. 1911. Christopher Cooper's Grammatica Linguae Anglicanae (1685). Neudrucke frühneuenglischer Grammatiken, Band 5. Halle: Niemeyer.
Lehnert, M. 1936. Die Grammatik des Sprachmeisters John Wallis (1616-1703). Sprache und Kultur der germanischen Völker, Serie A. Anglistische Reihe, Band 21. Breslau: Priebatsch's Buchhandlung.
Nöjd, T. 1977. Richard Hodges' The English Primrose (1644). A Study of the Strong-Stressed Vowels and Diphthongs with some Regard to A Special Help to Orthography (1643), The Plainest Directions (1649), and Most Plain Directions for True-Writing (1653). Stockholm Studies in English, 45. Stockholm: Almqvist and Wiksell.
Sundby, B. 1953. Christopher Cooper. The English Teacher (1687). Lund Studies in English, 22. Lund: Gleerup.

18th Century:

Gabrielson, A. 1930. Edward Bysshe's Dictionary of Rhymes (1702) as a Source of Information on Early Modern English Pronunciation. Uppsala/Stockholm: Almqvist and Wiksell.
Kern, K.L. 1913. Die englische Lautentwicklung nach Right Spelling (1704) und anderen Grammatiken. Unpublished dissertation, University of Giessen.
Popp, M. 1989. Die englische Aussprache im Lichte englisch-französischer Zeugnisse. Teil I: Das Dictionnaire de la Prononciation Angloise 1756. Anglistische Forschungen, Band 199. Heidelberg: Winter.
Sheridan, T. 1780. General Dictionary of the English Language. Two volumes. London.
Walker, J. 1791. A Critical Pronouncing Dictionary. London.

19th Century:

Zettersten, A. 1974. A Critical Facsimile Edition of Thomas Batchelor. An Orthoepical Analysis of the English Language and An Orthoepical Analysis of the Dialect of Bedfordshire (1809). Lund Studies in English, 45. Lund: Gleerup.

Sample entries (CDEMEP):

For sample entries of DEMEP, see B. Sundby in Danielsson 1976: 68; see also Sundby in Stanley/Gray 1983: 155-156.


Sources (given in chronological order): Sir Thomas Smith (Sm) 1542/68; John Hart (Ha) 1551/1569/1570; Robert Laneham (La) 1575; Bullokar (Bul) 1580; Bellot (Be) 1580; Richard Mulcaster (Mu) 1582; Robert Robinson (Ro) 1617; Alexander Gill (Gi) 1619/21; Charles Butler (But) 1633/34; Simon Daines (Da) 1640; Richard Hodges (Ho) 1644; John Wallis (Wa) 1653; Owen Price (Pr) 1665/68; John Wilkins (Wi) 1668; Christopher Cooper (Co) 1685/87; The Writing Scholar's Companion (WSC) 1695; Right Spelling (RS) 1704.
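Since every attestation in the sample entries is identified by a siglum and a date such as "Ha 1551/69", a small helper for expanding such references may be useful when the material is processed by computer. The following Python sketch is offered only as an illustration: the function names are hypothetical, the table contains merely an excerpt of the sigla listed above, and the rule that an abbreviated year borrows the leading digits of the preceding year is an assumption about how dates like 1551/69 are to be read.

```python
import re

# Excerpt of the sigla listed above (siglum -> author); not the full list.
SIGLA = {
    "Sm": "Sir Thomas Smith",
    "Ha": "John Hart",
    "Mu": "Richard Mulcaster",
    "Gi": "Alexander Gill",
    "Ho": "Richard Hodges",
}


def expand_years(spec: str) -> list[int]:
    """Expand '1551/69' to [1551, 1569]; four-digit years pass through unchanged."""
    years: list[int] = []
    for part in spec.split("/"):
        if len(part) == 4:
            years.append(int(part))
        else:
            # Assumption: an abbreviated year borrows the leading digits
            # of the year that precedes it.
            previous = str(years[-1])
            years.append(int(previous[: 4 - len(part)] + part))
    return years


def parse_reference(ref: str) -> tuple[str, list[int]]:
    """Split a reference such as 'Ha 1551/69' into author and list of years."""
    match = re.fullmatch(r"(\w+)\s+([\d/]+)", ref.strip())
    if match is None:
        raise ValueError(f"unrecognised source reference: {ref!r}")
    siglum, spec = match.groups()
    return SIGLA[siglum], expand_years(spec)
```

A call such as parse_reference("Gi 1619/21") would then yield the author's name together with the two years 1619 and 1621.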

Note: For technical reasons long i, e, o, a, u will be rendered by i, e, o, a, u. The letter e in unstressed syllables is usually taken to represent schwa, and so are o, a, u (e.g. continual, custom/custom). The voiceless dental spirant th will be rendered by þ, and its voiced counterpart by the exact IPA symbol, namely ð. The voiceless palatal and velar fricatives will be represented by ç and x respectively. The combinations tS and dz stand for the voiceless and voiced palatal affricate respectively (e.g. ModE chin/gin). The symbols l, n, r, when following a consonantal segment, will be regarded as syllabic (cf. able/ladle, brethren/children, father/mother). Hart's variation between s and z after voiced segments in weakly stressed syllables will be left unchanged (e.g. birds, eksampls, elders). Stress marks are inserted only if found in the sources (e.g. offend, offended). As was stated above, in the case of homographs or homonyms the German (G) equivalents will be given, e.g. ear n.1 G 'Ohr'; ear n.2 G 'Ähre'.

able adj. abl Ha 1569, abl Ha 1569;
above prep. abuv Ha 1569;
above-said adj. abuv-sed ~ abuv-sed Ha 1569;
abuse n. abius Ha 1569;
abuse v. abiuz inf. and pres. pl. Ha 1569; abiuzeþ pres. 3 sing. Ha 1569, abiuzd adj., pa.t. and p.p. Ha 1569;
acute adj. akiut Ha 1569;
add v. ad inf. Ha 1569; aded p.p. Ha 1569;
again adv. agen Ha 1569/70;
against prep. agenst Ha 1569, agenst Ha 1569/70; Ha 1551
age n. adzes pl. Ha 1569;
all indef. pron. al Ha 1551/69; aul Ha 1569; cf. also adv.;
allow v. alou pres. 1 sing. Ha 1569; aloued p.p. Ha 1569;


almighty adj. aulmi^ti Ha 1569/70; almifti Ha 1570;
also adv. aulso Ha 1569; also Ha 1569; cf. so adv.;
although conj. auldox Ha 1569;
among prep. emoŋg Ha 1569;
amongst prep. emoŋgst ~ amoŋgst Ha 1569;
ancient adj. aunsjent Ha 1569;
anoint v. anweinted pa.t./p.p. Ho 1644;
another indef.adj. anuder Ha 1569, an uder Ha 1569; cf. other indef. pron.;
appoint v. apuint Bul 1580; apweinted Ho 1644;
avoid v. avoid Bul 1580; avoiding p.pr. Mu 1582; avoid Ho 1644;

back adj. bak Ha 1569;
bee n.1 bi Ha 1569;
bird n. birds pl. Ha 1569;
blood n. blud Ha 1551;
blue adj. bliu Ha 1570;
boil n. beil Sm 1542/68;
boil v. builiŋg p.pr. Bul 1580; buil Gi 1619, buil (dial.) Gi 1621; bweiliŋg p.pr. Ho 1644;
boiler n. 'a cook' bweilers pl. Ho 1644;
book n. buk Ha 1569; buks pl. Ha 1569;
bow n.1 G 'Bogen' bou Ha 1570;
bow v.1 G 'sich verbeugen' bou Ha 1570;
boy n. boi Sm 1568; bwe Ha 1569, Ha 1551, Ha 1551; boi Ha 1569; boi Bul 1580; boi Mu 1582; boi (dial.) bwoi Gi 1621; boi (or buoy?) Ho 1644;
boyish adj. boiis Ho 1644;
boystios adj. 'boisterous' boisties Bul 1580;


bread n. bred Ha 1570;
break v. brèk Ha 1569;
breath n. breþ Ha 1569, bredes pl. Ha 1569; bréþ Ha 1569, bréòs pl. Ha 1569, bréóz pl. Ha 1569;
breathe v. bréd inf. and pres. 1 pl. Ha 1569, brèdd adj. and p.p. Ha 1569; brèóiŋg p. pr. Ha 1569; bredd p.p. Ha 1569; breóed adj. Ha 1569;
brethren n.pl. brìdrn Ha 1570, Ha 1551;
bring v. briŋg inf. Ha 1569; bringing p.pr. Ha 1569; broxt (up) p.p. Ha 1569; brduxt Ha 1570; brouxt up Ha 1570;
broad adj. brfid Ha 1569; brdder compar. Ha 1569;
broil n.1 (tumultus) broil ~ bruil ~ brùil (dial.) Gi 1621; broil (or v.?) Ho 1644; brueil (or v.?) Ho 1644;
broil v.1 (?) broil ~ bruil La 1575; broil (or n.?) Ho 1644; brueil (or n.?) Ho 1644;
buoy n. bwei Ha 1569; bui Bul 1580; bui Gi 1619, bui (dial.) Gi 1621; boi (or boy?) Ho 1644;
bury v. biuried p.p. Ha 1570;
buyer n. beier Ha 1569;

cause n. kauz Ha 1569, kauzes pl. Ha 1569, cf. Ha 1551;
cause v. kauz Ha 1569; kauzeþ pres. 3 sing. Ha 1569; kauzed p.p. Ha 1569, cf. Ha 1551;
change n. and v. tSandz, tSandz, tsaundz inf. Ha 1569/70 1569; tSandzd, tSandzed, tsaundzd p.p. Ha 1569; tSandzdiŋg Ha 1569;
chief adj. tiff Ha 1551; tsifest superl. Ha 1569, Ha 1551;
child n. tSild, tSeild Ha 1569; children pl. tsildrn Ha 1570, tsilder Ha 1569, Ha 1551, Ha 1569;
choice n. tSois Ha 1569; tSois Bul 1580; tSois ~ tsuis Mu 1582; tsois Ho 1644;
clean adv. klin Ha 1569;


cloy v. kloi Sm 1568; klueid iŋg Ha 1569;

nought indef.n. noxt Ha 1569;
now adv. *nou Sm 1568; nou Ha 1569; nou Bul 1580; nou Be 1580; nou Ro 1621; nou Gi 1621; nou Ho 1644;

oar n. oer Ha 1570;
obtain v. obténd p.p. Ha 1569, Ha 1551; see also contain, retain v.;
occasion n. okazjon Ha 1569;
occupy v. okupeied p.p. Ha 1569, okupied Ha 1569;
offence n. o'fens Ha 1551/69;
offend v. o'fended p.p. Ha 1551/69;
office n. ofis Ha 1569, ofises pl. Ha 1569;
often adv. oftn Ha 1569;
oil n. oil Bul 1580; oil ~ uil Mu 1582; oil ~ weil Ho 1644;
oint v. oint ~ uint Mu 1582;
ointment n. ointment Bul 1580;
old adj. old Ha 1569, elder compar.; see also elders n.pl.;
once num. dns Ha 1569;


one num. on Ha 1569, don

el</k> være) til h- être monté (<k>el</k> à cheval); legge i seg som en h- dévorer; slite som en h- travailler comme un bœuf, être un vrai cheval à l'ouvrage; stige til h- monter en selle; bruke apostlenes hester aller à pied, aller pedibus (cum jambis) (2) sport (<k>gymnastikk</k>) cheval m d'arçons (3) tekn (<k>i pl fam for hestekrefter</k>) chevaux

NB410 FR410 NB410 FR410 NB070 FR001 NB070 FR001 end of record

In cases where the headword is provided with different numbers denoting different meanings, these have been generated as separate records. As shown by (3), three headwords have been generated from the original record in the Norwegian-French Dictionary. The information on gender has been moved to a separate field, as have the metalinguistic comments associated with the translation equivalents.

(3)
FR001 cheval
YY001 m
YY003 -aux
NB014 s
NB001 hest
NB015 m

The Electronic Conversation of a Dictionary

FR410 le cheval ailé, Pégase
NB410 den vingede hest
FR410 il a été désarçonné, il a vidé les étriers
NB410 han ble kastet av hesten
FR410 descendre de cheval
NB410 stige av hesten
FR410 monter sur ses grands chevaux
NB410 sette seg på sin høye hest
FR410 être monté (<k>el</k> à cheval)
NB410 sitte (<k>el</k> være) til hest
FR410 dévorer
NB410 legge i seg som en hest
FR410 travailler comme un bœuf, être un vrai cheval à l'ouvrage
NB410 slite som en hest
FR410 monter en selle
NB410 stige til hest
FR410 aller à pied, aller pedibus (cum jambis)
NB410 bruke apostlenes hester

FR001 cheval d'arçons
YY001 m
XX000 <k>gymnastikk</k>
NB014 s
FR070 sport
NB001 hest
NB015 m

FR001 chevaux
YY001 mpl
XX000 <k>i pl fam for hestekrefter</k>
NB014 s
FR070 tekn
NB001 hest
NB015 m


In cases where the French translation field also contains information on inflection in the feminine gender and/or in the plural, this information has been moved to separate fields. As for lexical information on the Norwegian language, it was decided to include, in addition to information pertaining to parts of speech, information on grammatical gender for nouns. This information is implied by the inflectional pattern (in the field NB015), and could therefore be generated automatically. In cases where the headword of the Norwegian-French Dictionary is provided with illustrative phrases, the entire set of phrases has been copied to all the new headwords generated. This implies that the illustrative phrases do not necessarily reflect the usage of the new headword, a fact which of course makes it necessary to pay close attention to these in the form of further analyses and processing. In the conversion no identifiers have been used as pointers to the original record in the Norwegian-French data (e.g. by adding a polysemy number). All the same, provisions for traceability will be established by loading the Norwegian-French Dictionary as a data base where in principle all information can be made retrievable, thus ensuring easy access to information on the original context of the word.
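The inversion step described above can be sketched in a few lines of Python. This is not the conversion program actually used in the project, which is not reproduced in this paper; the class and function names are hypothetical, and only the field tags (FR001, NB001, NB014, FR410, NB410) are taken from the examples. The sketch generates one new record per translation equivalent and copies the entire phrase set of the sense to each of them, without adding any pointer back to the source record.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    equivalents: list[str]  # French translation equivalents (FR001)
    # Illustrative phrases as (Norwegian NB410, French FR410) pairs.
    phrases: list[tuple[str, str]] = field(default_factory=list)


@dataclass
class SourceEntry:
    headword: str  # Norwegian headword (NB001)
    pos: str       # part of speech (NB014)
    senses: list[Sense]


def invert(entry: SourceEntry) -> list[dict]:
    """Generate one French-headword record per translation equivalent."""
    records = []
    for sense in entry.senses:
        for equivalent in sense.equivalents:
            records.append({
                "FR001": equivalent,
                "NB014": entry.pos,
                "NB001": entry.headword,
                # The whole phrase set of the sense is copied, French first,
                # whether or not the new headword occurs in each phrase.
                "phrases": [(fr, nb) for nb, fr in sense.phrases],
            })
    return records


hest = SourceEntry("hest", "s", [
    Sense(["cheval"], [("stige av hesten", "descendre de cheval"),
                       ("stige til hest", "monter en selle")]),
    Sense(["cheval d'arçons"]),
    Sense(["chevaux"]),
])
records = invert(hest)
```

For the simplified entry hest, the sketch yields three records, one for each numbered meaning, in line with example (3).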


Tove Jacobsen and Randi Sceboe

3. Adaptation to the SGML format

It was decided to adapt the converted version of the dictionary to SGML (Standard Generalized Markup Language), an international standard. There were several reasons for this choice: SGML is a flexible markup language for publishing in different media (printed or electronic dictionaries). Moreover, SGML provides for efficient data interchange between different types of computers and easy data conversion from one text editor to another. As for reusability, it was desirable to establish a dictionary format which could form the basis for future dictionaries. In order to arrive at a suitable SGML format, the proposal from the Text Encoding Initiative (TEI), an international project to develop a common format for the encoding of machine-readable literary and linguistic data, was adopted. The SGML format for the French-Norwegian Dictionary has not yet been definitively settled, because it would be of interest to consider in greater detail the most recent proposal from TEI. (4) does, however, show the main structure and the basic elements of the format which will be adopted.

(4)
<entry>

<orth>cheval</orth> <plur>-aux</plur>

n m

<trans> hest m

a c- <trans> <tr>til hest</trans> aller k c-

ri </trans>

<sense n='2'> <usg type=style>fig</usg>

monter sur ses grands chevaux <trans>sette seg på sin høye hest</trans>

<usg type=dom>sport</usg>

cheval d'arçons m <plur2>chevaux d'arçons</plur2>

<tr>bøylehest</tr>

The following abbreviations are used:

eg        example containing at least one occurrence of the headword
entry     dictionary entry
form      (headword) orthography and pronunciation
gen       gender
gramGrp   grammatical information
orth      orthographic form of headword
plur      plural headword
plur2     plural compound
pos       part of speech
pron      pronunciation of headword
q         quotation
re        compound
sense     information relating to one word sense
tr        translation of headword, compound or example
trans     translation text and related information
usg       information on usage
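How the elements listed above might nest in a simple entry can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: the element names are those of the abbreviation list, but the nesting (form containing orth and plur, gramGrp containing pos and gen, sense containing trans and tr) is a guess at the structure, not the project's DTD, and the standard-library XML module is used as a stand-in for an SGML tool. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_entry(orth, plural, pos, gender, translation):
    """Assemble a minimal entry; the nesting is an assumption, not the DTD."""
    entry = ET.Element("entry")
    form = ET.SubElement(entry, "form")
    ET.SubElement(form, "orth").text = orth
    ET.SubElement(form, "plur").text = plural
    gram = ET.SubElement(entry, "gramGrp")
    ET.SubElement(gram, "pos").text = pos
    ET.SubElement(gram, "gen").text = gender
    sense = ET.SubElement(entry, "sense", n="1")
    trans = ET.SubElement(sense, "trans")
    ET.SubElement(trans, "tr").text = translation
    return entry


entry = build_entry("cheval", "-aux", "n", "m", "hest")
markup = ET.tostring(entry, encoding="unicode")
```

Serializing the element tree in this way gives a single line of tagged text for cheval, which could then be checked against whatever document type definition is finally adopted.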

The tag "", which is not explained above, contains administrative information pertaining to data management and documentation of the entry. This is an important aspect of project quality assurance, making it possible to trace the individual responsible, updates or modifications, and final approval. As for the SGML editor, Author/Editor is a likely choice. This editor is flexible in the sense that it can also present data in an approximately typographical version with no codes showing. The internal data format has an SGML-compatible syntax. Author/Editor operates on DTDs (Document Type Definitions) compiled into a so-called rules file against which validations are made. The final check of the field contents must be supplemented by means of Pascal programs. A drawback of Author/Editor may be that, as we fear, it will prove unable to handle all the French-Norwegian data simultaneously. For the checking of cross-references, this implies that the data will have to be subjected to a separate SGML parsing. The entry hest used above to illustrate the multiplication of new entries is a fairly simple entry. A more complex entry like ødelegge ("destroy"), which has three different meanings, will generate several new headwords, each with its own set of illustrative phrases. These will be identical if the headwords belong to the same sense number in the Norwegian-French Dictionary. (5) below shows the complete Norwegian entry and (6) a sample of the new headwords (there are seven in all) and the illustrative phrases derived exclusively from ødelegge (1):

(5)

NB001 øde/legge
NB014 v
NB070 (1)
FR001 (<k>om fysisk skade</k>) casser, briser, détruire, (<k>delvis</k>) abîmer, endommager, (<k>fam</k>) esquinter, (<k>om mat</k>) gâter:
NB410 disse husene ble ødelagt under krigen
FR410 ces maisons ont été détruites pendant la guerre;
NB410 kjøtt ødelegges fort i varmen
FR410 la viande se gâte (<k>el</k> s'abîme) vite par temps chaud;
NB410 denne gutten ødelegger alt
FR410 cet enfant casse (<k>el</k> brise) tout, cet enfant est un brise-fer;
NB410 hun fikk ødelagt ansiktet i en kollisjon
FR410 elle a été défigurée dans une collision;
NB410 regnet ødela avlingene
FR410 la pluie a endommagé les récoltes;
NB410 skarpe steiner ødela dekkene
FR410 des cailloux pointus ont endommagé (<k>el</k> abîmé, esquinté) les pneus;
NB410 ø- sin helse
FR410 ruiner sa santé;
NB410 jeg har ødelagt en tann
FR410 je me suis cassé une dent
NB070 (2)
FR001 (<k>om moralsk skade</k>) gâter, gâcher, compromettre, nuire à, briser, faire du tort à, porter préjudice à
NB410 hun ødela forholdet mellom dem
FR410 elle les a brouillés;
NB410 ø- sitt liv
FR410 gâcher sa vie;
NB410 ø- moroa
FR410 gâcher (<k>el</k> gâter) le plaisir;
NB410 ø- sitt gode navn og rykte
FR410 compromettre (<k>el</k> faire du tort à, nuire à) sa réputation;
NB410 mangel på disiplin ødelegger undervisningen
FR410 le manque de discipline compromet (<k>el</k> nuit à) l'enseignement;
NB410 ø- for sine konkurrenter
FR410 faire du tort (<k>el</k> nuire) à ses concurrents
NB070 (3) tekn
FR001 :
NB410 ødelagte deler må skiftes ut
FR410 il faut remplacer les pièces défectueuses (<k>el</k> endommagées);
NB410 bilen/maskinen er helt ødelagt
FR410 la voiture/la machine est hors d'usage (<k>el</k> hors service) .
inn TJ

(6)
FR001 casser
XX000 <k>om fysisk skade</k>
NB014 v
NB001 øde/legge
FR410 ces maisons ont été détruites pendant la guerre
NB410 disse husene ble ødelagt under krigen
FR410 la viande se gâte (<k>el</k> s'abîme) vite par temps chaud
NB410 kjøtt ødelegges fort i varmen
FR410 cet enfant casse (<k>el</k> brise) tout, cet enfant est un brise-fer
NB410 denne gutten ødelegger alt
FR410 elle a été défigurée dans une collision
NB410 hun fikk ødelagt ansiktet i en kollisjon
FR410 la pluie a endommagé les récoltes
NB410 regnet ødela avlingene
FR410 des cailloux pointus ont endommagé (<k>el</k> abîmé, esquinté) les pneus
NB410 skarpe steiner ødela dekkene
FR410 ruiner sa santé
NB410 øde/legge sin helse
FR410 je me suis cassé une dent
NB410 jeg har ødelagt en tann

FR001 briser
NB014 v
NB001 øde/legge
FR410 ces maisons ont été détruites pendant la guerre
NB410 disse husene ble ødelagt under krigen
FR410 la viande se gâte (<k>el</k> s'abîme) vite par temps chaud
NB410 kjøtt ødelegges fort i varmen
FR410 cet enfant casse (<k>el</k> brise) tout, cet enfant est un brise-fer
NB410 denne gutten ødelegger alt
FR410 elle a été défigurée dans une collision
NB410 hun fikk ødelagt ansiktet i en kollisjon
FR410 la pluie a endommagé les récoltes
NB410 regnet ødela avlingene
FR410 des cailloux pointus ont endommagé (<k>el</k> abîmé, esquinté) les pneus
NB410 skarpe steiner ødela dekkene
FR410 ruiner sa santé
NB410 øde/legge sin helse
FR410 je me suis cassé une dent
NB410 jeg har ødelagt en tann

FR001 détruire
NB014 v
NB001 øde/legge
FR410 ces maisons ont été détruites pendant la guerre
NB410 disse husene ble ødelagt under krigen
FR410 la viande se gâte (<k>el</k> s'abîme) vite par temps chaud
NB410 kjøtt ødelegges fort i varmen
FR410 cet enfant casse (<k>el</k> brise) tout, cet enfant est un brise-fer
NB410 denne gutten ødelegger alt
FR410 elle a été défigurée dans une collision
NB410 hun fikk ødelagt ansiktet i en kollisjon
FR410 la pluie a endommagé les récoltes
NB410 regnet ødela avlingene
FR410 des cailloux pointus ont endommagé (<k>el</k> abîmé, esquinté) les pneus
NB410 skarpe steiner ødela dekkene
FR410 ruiner sa santé
NB410 øde/legge sin helse
FR410 je me suis cassé une dent
NB410 jeg har ødelagt en tann

These examples show that the work of sorting and deleting will be rather extensive. In fact, in its present form, the inverted dictionary has 80,000 entries, which is 35% more than the source dictionary.
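Part of the sorting could perhaps be prepared automatically by flagging those copied phrases in which the new headword does not occur at all. The Python sketch below is a rough heuristic under stated assumptions, not a procedure used in the project: it strips accents, takes the first four letters of the headword as a crude stem (an arbitrary choice), and keeps a phrase only if some word in it begins with that stem. The function names are hypothetical; the phrases are taken from example (6).

```python
import re
import unicodedata


def normalise(text):
    """Lower-case and strip accents, so that 'cassé' is compared as 'casse'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def mentions_headword(headword, phrase, stem_length=4):
    """Crude test: does any word of the phrase start with the headword's stem?"""
    stem = normalise(headword)[:stem_length]
    words = re.findall(r"\w+", normalise(phrase))
    return any(word.startswith(stem) for word in words)


phrases = [
    "ces maisons ont été détruites pendant la guerre",
    "cet enfant casse tout, cet enfant est un brise-fer",
    "je me suis cassé une dent",
    "ruiner sa santé",
]
# Phrases that would be kept under the new headword casser.
keep = [p for p in phrases if mentions_headword("casser", p)]
```

Such a filter would only propose candidates for deletion; irregular verb forms and multi-word headwords would still have to be checked by the editors.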

4. Easy access to lexical material

An interesting result of the conversion so far is that the editors will have access to lexical material not easily found elsewhere. Traditionally, learners of a foreign language have been taught to think that more information is to be found in a monolingual dictionary than in a bilingual one. Sometimes, however, the opposite is true, as shown by (7) below, where a series of prepositional phrases has been generated automatically. These are all translations of Norwegian adverbs. In a French monolingual dictionary like Le Petit Robert only about half of these prepositional phrases are to be found in the entries for acharnement, admiration etc. As these expressions are quite frequent in written French, and often translated incorrectly into Norwegian, they ought to be included in a French-Norwegian dictionary. Hopefully, there will be many cases like this:

(7)
FR001 avec acharnement (<k>el</k> ténacité)
NB014 adj
FR070 adv
NB001 iherdig

FR001 avec admiration
NB014 adj
FR070 adv
NB001 beundrende

FR001 avec agilité
NB014 adj
FR070 adv
NB001 behendig

FR001 avec âme
NB014 adj
FR070 adv
NB001 sjel/full

FR001 avec amertume
NB014 adj
FR070 adv
NB001 bitter
NB015 pl bitre
FR410 regretter qch amèrement
NB410 angre bittert på noe
FR410 dire qch avec amertume (<k>el</k> amèrement)
NB410 si noe bittert
FR410 ce remède a un goût amer
NB410 medisinen smaker bittert

FR001 avec application
NB014 adj
FR070 adv
NB001 flittig
FR410 travailler avec application (<k>el</k> entrain)
NB410 arbeide flittig

FR001 avec approbation
NB014 adj
FR070 adv
NB001 bifallende

FR001 avec ardeur
NB014 adj
FR070 adv
NB001 idig
FR410 travailler avec ardeur
NB410 arbeide idig

etc.
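Once the converted records are available in machine-readable form, a list like (7) could be retrieved with a very simple query. The following Python sketch assumes, purely for illustration, that each record is held as a dictionary keyed by the field tags of example (7); the assignment of the values adj and adv to NB014 and FR070 follows the order of the tags in the example and is itself an assumption, as is the function name.

```python
# Hypothetical in-memory form of a few converted records; the field tags
# (FR001, NB014, FR070, NB001) are those shown in example (7).
records = [
    {"FR001": "avec admiration", "NB014": "adj", "FR070": "adv", "NB001": "beundrende"},
    {"FR001": "avec agilité", "NB014": "adj", "FR070": "adv", "NB001": "behendig"},
    {"FR001": "cheval", "NB014": "s", "NB001": "hest"},
]


def phrases_with_preposition(records, preposition="avec"):
    """Return sorted (French phrase, Norwegian headword) pairs whose French
    headword is a multi-word expression introduced by the given preposition."""
    prefix = preposition + " "
    hits = [(r["FR001"], r["NB001"]) for r in records
            if r["FR001"].startswith(prefix)]
    return sorted(hits)


avec_phrases = phrases_with_preposition(records)
```

The same query with another preposition would produce the corresponding list for comparison with a monolingual dictionary.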

5. Reusability

The editors like to think that whatever new material is put into the dictionary will be of great assistance in further dictionary compiling (e.g. when preparing a new edition of the Norwegian-French Dictionary). Extraction of this material could be done quite easily with the use of a tag like the ones used for notes for data management. In fact, theoretically, this process could be repeated an infinite number of times. Thus the assertion that "the electronic form of the dictionary should no longer be considered a spin-off of the printed form, but rather the other way around" (Masereeuw and Serail, 92), seems well founded.

Bibliography

Grundt, L.O. et al. (1991): Stor norsk-fransk ordbok, Oslo
Honselaar, W. and Elstrodt M. (1992): The electronic conversion of a dictionary: from Dutch-Russian to Russian-Dutch. - In: Euralex '92 Proceedings I-II, Tampere
ISO 8879: Standard Generalized Markup Language
Masereeuw, P.C. and Serail, I. (1992): DictEdit: a computer program for dictionary data entry and editing. - In: Euralex '92 Proceedings I-II, Tampere
Sperberg-McQueen, C.M. & Burnard, L. (eds.) (1994): Guidelines for Electronic Text Encoding and Interchange, London

Ljubima Jordanowa

Computerwörterbücher der Bulgarischen Neologismen (Computer Dictionaries of Bulgarian Neologisms)

In 1993, work was completed on two computer dictionaries of Bulgarian neologisms: the Neological Computer Dictionary of the 1970s and Early 1980s and the Neological Computer Dictionary of the 1980s and Early 1990s. The dictionary was compiled by Ljubima Jordanowa, edited by Miroslav Janakiev and programmed by Bojan Baev. This paper comments briefly on the second dictionary (covering the 1980s and the early 1990s up to October 1993), since I believe it may be of interest to a wider circle of colleagues. It contains the vocabulary of the political turn (Wende) in Eastern Europe, which decisively shaped the processes of development in the lexicon. The dictionary comprises a text corpus of 4.5 megabytes and contains lexicalized units collected between 1982 and 1993. The year given with the source therefore indicates when the attestation was recorded, not when the unit was coined or entered the language.

The following are entered as lemmata:

1. Words that emerged or became very active in the period mentioned above and are not included in the explanatory dictionaries of contemporary Bulgarian, such as: evrocentrizam/Eurocentrism, evrotunel/Eurotunnel, nepalnikravnost/inadequacy, abfart/Abfahrt, aviolainer/jet, avionositel/jet transporter, gabenist/Gabenist, gazdovci (after Major Gasdov, who was a guard in concentration camps during the socialist period), gazoanalicen/gas-analytical, gastroenterolog/gastroenterologist, evrogangster/Eurogangster, evrodeputat/deputy to the Council of Europe, and many others.
2. Unlike the dictionary of neologisms of the 1970s and early 1980s, the present dictionary includes occasionalisms, which are marked with the label individual (indiv). These are as a rule expressive designations that arose in specific speech situations and are short-lived, cf. nistopravene/doing nothing, Grand na otecestvenata zurnalistika/the Grand of Bulgarian journalism (referring to the newspaper Duma of the former Communists and present-day Socialists).
3. New senses are marked with an asterisk (*), e.g. *gore/up there, adv., colloq., pej. (referring to the upper circles of the ruling party from the opposition's point of view): "Njakoi hora 'gore' govorjat za stabilizirane/Some people up there are talking about stabilization". Podkrepa, 3.4.1990.
4. New coinages are as a rule listed under the base word after the sign (#), cf. * zvezden/star-, # zvezden dazd/star shower, see dazd/rain; # zvezden mesec, see mesec; # zvezdna era, see era; # zvezdni voini, see voina.
5. Base and determining constituents of compounds are also entered as headwords, e.g. samo-/self-, auto-; -oeko-; evro-/Euro-; mono-/mono-; etc.
6. For abbreviations the pronunciation norm is given, cf. * ABFK, which is read out in its full form, "Active Fighters against Fascism and Capitalism", and ABBA (abba).
7. Trademarks, which form a word class of their own and are widely used, are also included, cf. Avtkoop/Alisa i Ko, etc.
8. New stylistic nuances are likewise entered as lemmata, cf. * drugar, -jat, -ja, mn. -i, iron. (referring to a member or sympathizer of the Bulgarian Socialist, formerly Communist, Party). Hajde, drugarju polkovnik/Come on, Comrade Colonel! Call from a BSP march on 5.11.1991; Az sam izsmukan ot drugarite/I have been sucked dry by the comrades. Slogan from a rally of the Union of Democratic Forces (UDK), held on 3.3.1990 on the square in front of the Alexander Nevsky Cathedral.

The dictionary thus covers vocabulary from the most varied spheres of life: politics, which is highly characteristic of the period, electronics, industry and agriculture, the economy, everyday life, sport, and many others. The material comes from my own archive, which contains attestations from newspapers and magazines and directly from colloquial usage: slogans, appeals, election manifestos, songs, advertisements, posters, inscriptions from the "City of Truth", etc., which I collected at rallies and protest marches.

The headwords are arranged alphabetically and are accompanied by grammatical and stylistic labels. The explanations are based on description and/or synonymy. Because the political spectrum after 1989 is broad, headwords from the political domain had to be given a differentiating label, cf. tamnosin, ../dark blue, pej. (applied to extreme supporters of the UDK by supporters of the BSP); tamnocerven, ../dark red, pej. (applied to extreme supporters of the BSP by supporters of the UDK); tamnosinjoturskosin, ../dark-blue-turquoise, pej. (applied by the BSP to the coalition between the UDK and the Movement for Rights and Freedoms after the elections of 13 October 1992). The explanation of such words is my own responsibility; it reflects my own orientation in the complicated political situation in the country during the period in question. Explanation by synonymy is used for foreign words or their components, with a pointer to the corresponding Bulgarian word or component, e.g. superdemokraticen, -cna, -cno, mn. -cni, adj. Svrahdemokraticen/extremely democratic; mach, numeral; superkomjuter/supercomputer.
Next comes the citation material, which represents one of the recorded attestations. Often these are attestations collected at rallies and protest marches after 1989. My aim here was also to record how society perceived the period of the political turn. Entries close with information on the origin of the word and, in some cases, on its first use. Origin is indicated by "from", and the word is given in the source language unless it comes from Arabic, Chinese, Japanese, Korean, Mongolian or Vietnamese. See dilar .. from English dealer; seksi .. from English sexy. Occasionally the first use of the word is given, e.g. for * ljuspi, obikn. mn./scales, usually plural (referring to a person or party that has left the UDK). The word was first used by Christophor Sabev in a television interview in 1992.

The dictionary is available as a computer edition for the following configuration: IBM PC/AT-compatible, EGA/VGA monitor, 6.5 MB of free disk space. Once the dictionary has been installed and the program started, a window appears on the screen with three options: 1) Word list; 2) Class of the CWB (computer dictionary); 3) Sort, and three further settings: 1) Source of attestation; 2) Year of attestation; 3) Origin. Word list displays the list of all headwords. Class of the CWB selects how the headwords are listed, i.e. in alphabetical or reverse order. Sort allows lemmata to be singled out by listing them alphabetically or by source, year of attestation, or origin. On start-up, both Class of the CWB and Sort are set to alphabetical order. The settings in the lower part of the window cannot be used on their own; they become available only when Sort is selected.

The dictionaries of neologisms from the 1970s, 1980s and 1990s are the first computer versions of an explanatory dictionary of the Bulgarian language. I hope to continue this first attempt at recording Bulgarian neologisms in my future work.
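The listing and sorting options described above can be illustrated with a small Python sketch. It is not the program written for the dictionary: the `Entry` fields and function names are hypothetical, "reverse order" is assumed to mean a tergo ordering (sorting headwords by their endings), and the source and year values are invented placeholders attached to headwords quoted in the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    headword: str
    source: str   # where the attestation was recorded (placeholder values below)
    year: int     # year the attestation was recorded (placeholder values below)
    origin: str   # source language, or "" if none is given

def alphabetical(entries):
    """List entries by headword, A to Z."""
    return sorted(entries, key=lambda e: e.headword)

def reverse_order(entries):
    """Assumed a tergo ordering: compare headwords from the last letter backwards."""
    return sorted(entries, key=lambda e: e.headword[::-1])

def sort_by(entries, field):
    """Group entries by 'source', 'year' or 'origin', then by headword."""
    return sorted(entries, key=lambda e: (getattr(e, field), e.headword))

# Headwords are taken from the paper; sources and years are invented for illustration.
SAMPLE = [
    Entry("seksi", "newspaper", 1991, "English"),
    Entry("dilar", "magazine", 1992, "English"),
    Entry("evrotunel", "newspaper", 1990, ""),
]
```

Calling `reverse_order(SAMPLE)` groups words with similar endings together, which is what makes such a listing useful for studying suffixes.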

Ramesh Krishnamurthy

Exploiting the Masses: the corpus-based study of language

ABSTRACT

In this paper, I will describe some of the methods currently used at Cobuild in our corpus-based study of language and show how these methods help us to give a more reliable and more detailed description of current English.

Introduction

The Cobuild project was set up in 1980 by Collins the Publishers (now HarperCollins) and the University of Birmingham. Our aims were to collect a large amount of modern English text in a computer, to use the computer to help us in analyzing the language, and to publish our findings in user-friendly reference books. By 1983 we had collected a corpus of 7.3 million words of written and spoken texts, and by 1985 we had analyzed this data and recorded the results, accompanied by authentic examples from the corpus, in a computer database. In 1986 we extracted a draft dictionary text from this database. Meanwhile, our corpus had grown to 20 million words, and we used this extra data to enhance our analysis as we edited the dictionary text. Features of our user-friendly presentation included writing our definitions in full sentences, showing the headword in its typical syntactic patterns and semantic contexts, avoiding abbreviations and traditional lexicographic ellipsis, and retaining as many of the corpus examples as we could. Our first product, the Collins Cobuild English Language Dictionary (henceforth CCELD), was published in 1987. The details of this initial phase of Cobuild activities can be found in Sinclair (1987). Since then, our publications list has grown considerably, and now includes several dictionaries, grammars, and usage books. A new corpus-building initiative, the Bank of English, was launched in 1991 and recently passed the 200 million word mark. The figures in this paper relate to the Bank of English when it stood at 167 million words.

Why use a corpus?

Over the past ten years, the main focus of academic argument has shifted from the usefulness of language corpora to the ideal size and composition of a corpus, and the computational methodology used in its analysis, so I will only summarize the main arguments for using a corpus here:

1. Even expert users have only a partial knowledge of a language, constrained by various domestic, geographical, social and educational factors, as well as by their own linguistic aptitudes and personal interests. A corpus can sample a more comprehensive range of language input.
2. Expert users have an innate tendency to consider what is possible. Informants asked whether a particular utterance is valid can find situations and contexts to justify it relatively easily. A corpus can show us what is common and typical.
3. Expert users cannot quantify their knowledge. A corpus can give us reasonably accurate statistics about the words contained in it.
4. It is difficult for expert users to generate natural examples by introspection. Many collocational and syntactic features cannot be recalled consciously, but manifest themselves effortlessly when language is being used in authentic communicative situations. A corpus can provide numerous examples of such genuine usage, with much of its linguistic context preserved.

A brief comparison between CCELD and non-corpus-based dictionaries will demonstrate these points. In many of the latter we find an entry for the word 'overstrung' with the meaning 'nervous, tense, excited'. CCELD does not have an entry for this item, because there was no evidence for it in the 20 million word corpus. In fact it is also totally absent from the 200 million word Bank of English, so its omission was amply justified. To a native-speaker's intuition, the item is eminently 'possible', but evidently it has not been used even once by hundreds and thousands of expert users in authentic communicative situations, including novelists, journalists, and broadcasters as well as informal writers and speakers. It certainly cannot be described as common or typical usage.

Some dictionaries define the word 'racialism' and cross-refer to it at 'racism'. Others define 'racism' and cross-refer at 'racialism'. We found 20 occurrences of 'racialism' and 147 of 'racism' in the 20 million word corpus, so 'racism' was defined in CCELD and cross-referred at 'racialism'. By looking closer at the occurrences, we were also able to add the information that 'racialism' was more 'old-fashioned'.

Several dictionaries give 'an overbearing manner' as their example for 'overbearing'. In 12 out of 18 occurrences in the Cobuild 18-million word corpus, the word describes a person directly, not their 'manner': '...when parents are overbearing...', '...a spoilt and overbearing boy...', '...an arrogant, overbearing, bullying little drunkard...', '...her jealous, overbearing mother-in-law...', the last being the example given in CCELD. Such subtleties as this are revealed on innumerable occasions when looking at a corpus, and are very difficult to be confident about on the basis of introspection alone.
Having thus demonstrated some of the advantages of using a corpus, let us now look at some of the methods that we currently employ at Cobuild to inspect the corpus. For example, what can I find out about the phrase used in the title of this paper, 'exploiting the masses', in the Bank of English? The main computer program for inspecting the corpus is called 'lookup', and the simplest and most direct method is to type in 'exploiting+the+masses'. However, this yields only one occurrence: '...branded them agents of neo-colonialism, exploiting the masses of the Third World, expropriating capital...'. One of the problems of being over-reliant on intuition, and hence too precise in our search parameters at the outset is that we may unwittingly miss various important features of the language. So let us adopt a more comprehensive methodology, which will also serve to illustrate more of the corpus investigation facilities.

Frequency

The Cobuild system allows us to look at the frequency of occurrence of all the word-forms in the corpus. We can therefore begin by looking at the frequencies of all the forms which start with the letters 'exploit'. This tells us that 'exploit' itself is the most frequent form, with 1770 occurrences, followed by 'exploitation' (1313), 'exploited' (1231), 'exploiting' (741), and so on (see Appendix 1). Further down the frequency list, we find rare forms such as 'exploitativeness' and 'exploitatively' (only 1 occurrence each), and obvious typing or printing errors such as 'exploitedby, exploition, exploites' (also only 1 occurrence each). A similar investigation of forms beginning with the letters 'mass' shows that there are 199 of these in the Bank of English, of which only the forms 'mass' (12123 occurrences) and 'masses' (1809) are of relevance here.
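A frequency list of this kind can be sketched in a few lines of Python. This is only an illustration and not the Cobuild software: the tokenizer, the function names and the toy text are all assumptions.

```python
import re
from collections import Counter

def word_form_frequencies(text):
    """Count every word-form (lower-cased, hyphenated forms kept whole)."""
    return Counter(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))

def forms_starting_with(freqs, prefix):
    """Return (form, frequency) pairs for a prefix, most frequent first."""
    hits = [(form, n) for form, n in freqs.items() if form.startswith(prefix)]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

# Toy text standing in for a corpus (invented for illustration).
TOY_TEXT = (
    "They exploit workers. Exploiting fears is easy; exploit the crisis. "
    "The masses saw the exploitation."
)
FREQS = word_form_frequencies(TOY_TEXT)
```

With a real corpus, `forms_starting_with(FREQS, "exploit")` would produce a list in the same shape as the totals column of Appendix 1.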

This raises the problem of what level of frequency constitutes reliable information. For practical lexicographic purposes, Cobuild regards any item with a frequency of less than 15 in the 167-million word Bank of English (i.e. less than 1 occurrence in 10 million words) as unreliable for further analysis. Although it is not required in this case, the computer can, of course, present frequency lists in alphabetical rather than numerical order, or selected according to the ending of a word rather than the beginning. For example, a recent enquiry from a member of the public as to whether there was a word that specifically meant 'the killing of a husband' led, at one stage, to the investigation of forms ending in '-cide' (see Appendix 1 for the most frequent of these). Other procedures can help us to assess the significance of a particular word-form in relation to the lexicon. For example, the computer can tell me that 'exploit' is the 7487th most common form in the corpus and 'exploiting' the 13,399th. Lemmatization (grouping all the related forms of a word) can be advantageous, and Cobuild has developed lemmatizing programs. However, it is often difficult to predict the particular grouping of forms which would be relevant for a particular enquiry, and equally difficult to predict when forms that obey the lemmatization rules are nevertheless unlikely to be required (eg 'canning' would be redundant for an analysis of the modal verb 'can').
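The reliability threshold, the listing by word ending and the ranking of a form can be sketched as follows. The function names are hypothetical, and the small frequency table uses a handful of totals quoted in the paper and in Appendix 1, so the rank it yields is a rank within this toy table only, not within the corpus.

```python
CORPUS_SIZE = 167_000_000   # Bank of English size quoted in the paper
MIN_FREQ = 15               # threshold below which an item is treated as unreliable

def per_ten_million(freq, corpus_size=CORPUS_SIZE):
    """Express a raw frequency as occurrences per 10 million words."""
    return freq * 10_000_000 / corpus_size

def is_reliable(freq, min_freq=MIN_FREQ):
    """True if the item is frequent enough for further analysis."""
    return freq >= min_freq

def forms_ending_with(freqs, suffix):
    """Return (form, frequency) pairs for a word ending, most frequent first."""
    hits = [(form, n) for form, n in freqs.items() if form.endswith(suffix)]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

def rank_of(freqs, form):
    """1-based rank of a form by descending frequency within freqs."""
    return 1 + sum(1 for n in freqs.values() if n > freqs[form])

# A few totals from the paper and Appendix 1 (a toy table, not the full lexicon).
TOTALS = {
    "decide": 12151, "suicide": 4166, "exploit": 1770, "genocide": 707,
    "homicide": 579, "exploitive": 23, "exploitations": 6,
}
```

The check `per_ten_million(15) < 1` confirms the equivalence stated in the text: 15 occurrences in 167 million words is just under one per 10 million.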

Distribution

Appendix 1 contains 16 vertical columns of figures. This is because the Bank of English is held in the computer as fifteen separate subcorpora, which can be inspected together, individually, or in any grouping. The final column on the right shows the total frequency. So the next piece of information that we might care to consider is the distribution of the frequency information across the subcorpora. At present, the division into subcorpora allows us a very general assessment of whether a word occurs more frequently, for example, in written texts rather than spoken texts, in American English rather than British English, or in books as opposed to newspapers, and so on. The frequency list in Appendix 1 already gave us details of raw frequencies for the forms beginning with 'exploit' within each subcorpus, but there is another problem here, because the subcorpora are of different sizes. So Cobuild has introduced another display which shows the average frequency in each subcorpus, in terms of 'occurrences per million words'. For example, the list in Appendix 1 includes the forms 'exploitative' and 'exploitive'. While it is important to bear in mind that the first is substantially more frequent than the second (178 against 23), and the second is actually relatively rare, it is nevertheless interesting to note that only 18 out of 178 occurrences of 'exploitative' are from American sources (npr, National Public Radio; ambooks, American Books; wsj, Wall Street Journal), whereas the proportion is much higher (11 out of 23 occurrences) for 'exploitive' (Appendix 2). I would cautiously suggest on the basis of this evidence that 'exploitive', to the extent that it is used at all, is favoured by American rather than British users of English.
Such a statement can be made with total conviction in other cases, for example in the choice of the spelling 'defense' or 'defence' (Appendix 2), where the figures show the overwhelming preference for 'defense' in American texts and 'defence' in British texts.
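The 'occurrences per million words' figure and the American share discussed above are simple ratios. In the sketch below the function names are hypothetical and the 10-million-word subcorpus size is an assumed round figure, since the paper does not state the sizes of the subcorpora; the counts 18/178 and 11/23 are the ones quoted in the text.

```python
def per_million(count, subcorpus_words):
    """Normalize a raw count by subcorpus size (occurrences per million words)."""
    return count * 1_000_000 / subcorpus_words

def share(part, total):
    """Proportion of all occurrences that come from a group of subcorpora."""
    return part / total

# Assumed size, for illustration only: 5 occurrences in a 10-million-word subcorpus.
example_rate = per_million(5, 10_000_000)

# Counts quoted in the paper: American occurrences out of all occurrences.
american_share_exploitative = share(18, 178)
american_share_exploitive = share(11, 23)
```

On these figures roughly a tenth of the occurrences of 'exploitative' are American, against nearly half for 'exploitive', which is the contrast the text draws.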

Concordances

The next facility at our disposal at Cobuild is the ability to see all the occurrences of a word in concordance form. Let us look at 'exploitive', as it has a low frequency and will not occupy too much space. We can first look at the occurrences sorted alphabetically by the word to the right (Appendix 3). This shows us that 'exploitive' frequently occurs immediately before nouns: activities, attitude, boss, economy, element, greed, nature, presentation, purposes, sex, start, strangers, way. When we look at the same concordances sorted alphabetically by the word to the left (Appendix 3), we might notice the conjunctions and and or. If we look more closely at the words to the left of these conjunctions, we find shoddy, sadistic, sleazy, cruel, heedless, coercive, indifferent, which emphasize the negative evaluations associated with the word 'exploitive'. If we scrutinize the concordance lines yet further, we see that the word sometimes occurs in predicative or complement position after verbs such as be, consider, and conceive.

The study of the concordances can be enhanced by displaying various additional information. For example, the general distributional analysis given earlier can be given a more specific focus by displaying the exact text references for each concordance line (Appendix 4). In some cases, this will help us to see patterns and collocations that are frequently used in particular text-types or by particular authors. In the case of 'exploitive' the display assures us that, although there is a high incidence of American uses (7 npr, 4 amb), most of the occurrences are from different texts, so do not point to the linguistic preferences of a particular author or broadcaster.
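Right-sorted and left-sorted concordances can be sketched as below. This is a minimal keyword-in-context illustration under stated assumptions, not the 'lookup' program: the function names, the fixed context width and the toy text (pieced together from fragments of Appendix 3) are all hypothetical.

```python
import re

def concordance(text, keyword, width=30):
    """Return (left context, keyword, right context) for every occurrence."""
    pattern = re.compile(r"\b%s\b" % re.escape(keyword), re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines

def first_word(context):
    """First word of a right context, lower-cased ('' if there is none)."""
    words = re.findall(r"\w+", context.lower())
    return words[0] if words else ""

def last_word(context):
    """Last word of a left context, lower-cased ('' if there is none)."""
    words = re.findall(r"\w+", context.lower())
    return words[-1] if words else ""

def right_sorted(lines):
    """Sort alphabetically by the word immediately to the right of the keyword."""
    return sorted(lines, key=lambda line: first_word(line[2]))

def left_sorted(lines):
    """Sort alphabetically by the word immediately to the left of the keyword."""
    return sorted(lines, key=lambda line: last_word(line[0]))

TOY_TEXT = (
    "the exploitive nature of capitalism. shoddy and exploitive presentation. "
    "worker and exploitive boss."
)
LINES = concordance(TOY_TEXT, "exploitive")
```

Right-sorting brings the following nouns together (boss, nature, presentation), while left-sorting brings the lines with a preceding conjunction together, which is how the patterns described above become visible.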

Grammar

Another way in which we can enhance concordances is by the addition of grammatical information. All the texts in the Bank of English have been passed through a tagging program which attaches a word-class identification tag to each word. As all the occurrences of 'exploitive' are adjectival, this would not be very interesting, so I have asked the computer to select 50 random lines from the total of 564 concordances for 'exploits', and asked for the display to reveal the word-class tag for 'exploits' for each of the 50 occurrences selected (Appendix 5). This information can be seen at the left-hand margin. The code NNS stands for 'plural noun', and VBZ for '3rd person singular, present tense, of the verb'. It is difficult to evaluate the accuracy of the tagging program very precisely. If we look only at the lines labelled VBZ, we find that eight out of the ten in this display are correctly assigned (...Leatherhead's FA Cup exploits of the mid-70s... and ...to describe the true-life exploits of the suburban Los Angeles kid... are incorrectly labelled as verbs). If we look only at the lines labelled NNS, we find that 39 out of 40 are correct (...Dylan Thomas jokes, exploits Welsh speech rhythms and wallows... is incorrectly labelled as a noun). In general then, we can rely on an accuracy level of 80-90%. Given the vast size of the corpus and the variety of language data that it contains, and given that the results of automatic processes are always subjected to human lexicographic scrutiny, this is an acceptable level for most of our purposes. However, we accept that this is an area in which we can improve our tools, and we are currently collaborating with the University of Helsinki to re-tag and parse the entire Bank of English.
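The figures from this 50-line sample can be worked through as a short calculation. The sketch below only restates the counts given in the text (8 of 10 VBZ lines and 39 of 40 NNS lines correct); the variable names are hypothetical, and the result describes this one sample, not the tagger in general.

```python
def proportion_correct(correct, labelled):
    """Share of lines carrying a given tag that were tagged correctly."""
    return correct / labelled

# (correct, labelled) counts for the 50 random 'exploits' lines in the paper.
SAMPLE = {"VBZ": (8, 10), "NNS": (39, 40)}

per_tag = {tag: proportion_correct(c, n) for tag, (c, n) in SAMPLE.items()}
overall = sum(c for c, _ in SAMPLE.values()) / sum(n for _, n in SAMPLE.values())
```

For this sample the verb tag is right 80% of the time, the noun tag 97.5% of the time, and 47 of the 50 lines (94%) are correct overall.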

Collocation

Collocation, the significant co-occurrence of specific lexical items, has been of particular interest to Cobuild since the beginning of the project. Why do some words attract each other, while other words with apparently very similar meanings do not? This is an area in which the computer can be of particular help, and a corpus is essential. For example, several dictionaries give crisp apples and crisp toast among their primary examples for 'crisp'. The Bank of English has 1717 examples for 'crisp', but only 5 are for crisp apples, and 3 for crisp toast. More common are salads (13), vegetables (12) and lettuce (9). The reason may be that apples and toast are considered by native-speakers to be inherently or notionally 'crisp', therefore their crispness does not need to be stated lexically.

However, we cannot rely on raw frequency figures for collocation, because the commonest words in the corpus such as the, of, to, and, a, in, that, is, it would tend to occur more frequently with all the other words anyway. Therefore, we are experimenting with various statistical measures that will help to highlight the words that genuinely occur more significantly in the context of a particular word. I will not go into great detail here, but the general principle is: word X occurs so many times in the corpus, therefore I can calculate how many times it is likely to occur within 3 words of word Y; if X occurs near Y more frequently than expected, X is a collocate of Y. Two statistical measures are currently in use. The first, known as 'T-score', measures the number of standard deviations of the observed co-occurrence from the expected co-occurrence, and is therefore regarded as a measure of 'the confidence with which we can assert that an association exists between two words'. The 'T-score' is higher when the co-occurrence is seen many times, because we can be more confident that it is not a freak of chance.
The second, known as 'Mutual Information', compares the expected frequency of co-occurrence with the observed frequency, and is therefore regarded as a measure of the strength of association between two words. 'Mutual Information' tends to highlight lower-frequency words as collocates. In Appendix 6, we see the results of using 'T-score' to look at collocates of the form 'exploiting'. The first column lists the collocates, the second column says how often the collocate occurred within three words of 'exploiting', and the last column tells us the 'T-score' for the collocation. The collocates are arranged in order of 'T-score' significance. We see immediately that 'T-score' has not completely eliminated the commonest words in the corpus: the, by and of are the top three collocates. However, they are at the top of the list of collocates because they occur even more frequently within three words of 'exploiting' than their corpus frequency would lead us to expect. A detailed discussion of the specific functionality of these words would be too lengthy, so let us just say that a quick look at the concordance lines shows that the frequently introduces the object of 'exploiting' (e.g. ...exploiting the crisis...), by introduces 'exploiting' as the means by which another action is achieved (e.g. ...speculators who try to make a killing on the Bourse by exploiting advance knowledge of poll results...), and of introduces 'exploiting' as the charge that someone is being accused of (e.g. ...the trade unions were accused of exploiting non-union workers...). Of greater interest from the point of view of collocation are the items in the list with a greater semantic content, such as accused, crisis, resources, fears, potential, companies, technology, and so on. The Cobuild program allows us to select the concordance lines containing these, and this enhances our ability to choose dictionary examples containing typical collocates.
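The two measures are often formulated as in the sketch below. It uses common textbook definitions (t-score as (O - E) / sqrt(O), Mutual Information as log2(O / E)), which are assumptions here: the paper does not give Cobuild's exact formulas. The frequency of 'exploiting' (741) and the corpus size come from the paper; the span of six positions (three words either side), the collocate frequency and the observed counts are invented for illustration.

```python
import math

def expected(f_node, f_collocate, corpus_size, span=6):
    """Co-occurrences expected by chance within `span` positions of the node."""
    return f_node * f_collocate * span / corpus_size

def t_score(observed, exp):
    """Confidence that an association exists (grows with the observed count)."""
    return (observed - exp) / math.sqrt(observed)

def mutual_information(observed, exp):
    """Strength of association: log ratio of observed to expected."""
    return math.log2(observed / exp)

# 741 and 167 million are from the paper; 10,000 and the observed counts are assumed.
E = expected(f_node=741, f_collocate=10_000, corpus_size=167_000_000)
T_20 = t_score(20, E)            # collocate seen 20 times near 'exploiting'
T_5 = t_score(5, E)              # same expectation, seen only 5 times
MI_20 = mutual_information(20, E)
```

With the expectation held fixed, the pair observed 20 times receives a higher t-score than the pair observed 5 times, which reflects the remark that the T-score rises as the co-occurrence is seen more often.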
However, the list only tells us that, for example, accused occurs within three words of 'exploiting', but does not tell us the position in which the collocate occurs in relation to 'exploiting'. For that, we have the benefit of a different display called 'picture' (Appendix 7). The first column lists the collocates that occur three words before 'exploiting', in 'T-score' order. The second column lists the collocates that occur two words before 'exploiting', and so on. Now we can see more clearly the specific phraseologies that the collocates form part of. For example, accused occurs two or three words before 'exploiting'. The columns to the left generally indicate the subjects of the verb: countries, companies, slogans, opposition, traders; the columns to the right indicate who or what is exploited: them, crisis, loophole, fears, economies, fans (in this display, because of space constraints, only the first ten letters of a word are shown: opportunit, difficulti).
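A positional display of this kind can be sketched as one counter per offset from the node word. The sketch is an illustration only: the names are hypothetical, the columns hold raw counts rather than T-scores (a deliberate simplification), and the token list is built from two example phrases in the style of those quoted in the paper.

```python
from collections import Counter

def picture(tokens, node, span=3):
    """Count collocates separately for each position from -span to +span."""
    columns = {offset: Counter() for offset in range(-span, span + 1) if offset != 0}
    for i, token in enumerate(tokens):
        if token != node:
            continue
        for offset in columns:
            j = i + offset
            if 0 <= j < len(tokens):
                columns[offset][tokens[j]] += 1
    return columns

def display(word, width=10):
    """Truncate a word to the first ten letters, as in the 'picture' display."""
    return word[:width]

TOKENS = (
    "unions were accused of exploiting non-union workers".split()
    + "he was accused of exploiting the crisis".split()
)
COLUMNS = picture(TOKENS, "exploiting")
```

Here `COLUMNS[-2]` shows 'accused' twice, two words before the node, while `COLUMNS[1]` holds the objects that follow it. A fuller version would stop at sentence boundaries instead of counting across them.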

Conclusion

In the preceding paragraphs, I have given some indication of the richness and subtlety of the information that can be derived from a corpus-based study of language. However, I have not fully covered my point of departure: 'exploiting the masses'. My intuition had suggested that this would be the most typical form for the phrase, but the corpus contained only one occurrence. Looking back at the top 50 collocates of 'exploiting' (Appendix 6), the list disappointingly does not include 'masses'. Nor does the 'picture' display (Appendix 7). My final recourse is to ask the computer to find all the occurrences of all the words in the Bank of English beginning with the letters 'exploit' (5956 occurrences in all), and select the concordance lines that contain the letters 'mass'. This process yields 29 lines (Appendix 8). So there is a relationship between forms of 'exploit' and forms of 'mass', but the phraseology that my intuition proposed is only one among many phraseologies that contain the two forms. We can see occurrences of 'mass exploitation', 'exploited masses', and so on. But the most significant is 'the exploitation of the masses', which might have been a better title for this paper. As a final check, I looked at the text references for the 29 lines (Appendix 9), and noticed that only 4 were from American sources, 2 of which referred to exploiting the mass media, and that 'the exploitation of the masses' occurs only in British books in the corpus.
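The final search, selecting lines for words beginning with 'exploit' that also contain the letters 'mass', can be sketched as a simple filter. The function name is hypothetical and the three sample lines are short fragments in the style of the appendices; note that matching on letters also admits forms such as 'massive' and 'massage', as the appendix lines show.

```python
import re

def lines_with(lines, node_prefix, letters):
    """Keep lines that contain a word starting with node_prefix and the given letters."""
    node = re.compile(r"\b%s\w*" % re.escape(node_prefix), re.IGNORECASE)
    return [ln for ln in lines if node.search(ln) and letters.lower() in ln.lower()]

SAMPLE_LINES = [
    "exploiting the masses of the Third World",
    "banks which can exploit their massive resources",
    "exploiting the crisis",
]
SELECTED = lines_with(SAMPLE_LINES, "exploit", "mass")
```

The third line is dropped because it has a form of 'exploit' but no 'mass', which mirrors how the 5956 occurrences were reduced to 29.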

Appendix 1

FREQUENCY

(fifteen subcorpus columns followed by the total; "?" marks a digit that is illegible in the source)

exploit 91 245 224 13 73 1?0 14? 31 11? ?4 43 137 122 39 211 1770
exploitation 56 126 392 23 33 205 73 28 71 33 13 101 73 27 40 1313
exploited 59 146 239 9 51 166 83 32 94 53 17 14? 43 23 64 1231
exploiting 32 120 36 4 27 73 58 13 75 27 12 72 43 12 32 741
exploits 33 39 74 0 16 121 25 5 65 58 10 56 25 13 23 564
exploitative 5 2 73 0 4 35 11 5 9 11 2 12 2 1 6 178
exploiters 7 5 15 2 1 8 2 1 0 0 2 3 2 3 5 61
exploitable 1 1 14 0 3 2 0 0 5 1 0 5 6 0 0 33
exploitive 4 0 2 0 2 6 7 0 0 1 0 1 0 0 0 23
exploiter 1 0 7 0 0 1 2 0 4 0 1 1 0 1 2 20
exploitations 0 0 2 0 0 2 0 0 1 0 0 0 0 0 1 6
exploitants 0 0 2 0 0 1 0 0 0 0 0 0 0 0 0 3
exploitational 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 2
exploition 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
exploites 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
exploitedby 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
exploitech 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
exploitativeness 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
exploitatively 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1

decide 666 1659 1323 153 347 1694 2048 490 685 725 338 953 224 186 670 12151
suicide 261 426 462 9 121 718 690 61 261 472 30 391 54 89 121 4166
coincide 27 215 97 8 34 369 79 9 91 38 25 99 20 12 45 1169
genocide 12 137 32 0 5 136 199 11 56 8 10 63 1 6 31 707
homicide 90 5 130 2 1 47 161 10 26 19 10 40 1 29 8 579
pesticide 28 72 131 1 17 38 77 1 13 6 51 11 59 1 21 537
insecticide 7 ?5 33 0 6 127 10 1 8 3 15 11 23 4 ? 309
fungicide 6 2 28 0 1 7 5 3 0 0 ? 3 2 3 1 2 131
herbicide 7 25 19 0 0 12 5 0 1 1 19 6 16 0 3 114
spermicide 2 0 35 44 0 5 5 0 0 1 0 2 0 0 0 94
infanticide 24 1 8 0 1 10 4 1 4 2 1 12 6 1 3 73
deicide 0 0 0 0 1 27 0 0 3 0 0 2 0 0 0 33

Appendix 2

SUBCORPUS DISTRIBUTION

("?" marks a digit that is illegible in the source)

'exploitative': 178 occurrences

subcorpus   occurrences   per million
britbooks   73            3.0
mags        35            1.2
today       11            1.1
guardian    12            0.9
times       9             0.9
econ        6             0.9
indy        4             0.8
newsci      2             0.6
spoken      5             0.6
npr         11            0.5
ambooks     5             0.5
wsj         2             0.3
scbooks     1             0.3
bbc         2             0.1
ephem       0             0.0

'exploitive': 23 occurrences

subcorpus   occurrences   per million
indy        2             0.4
ambooks     4             0.4
npr         7             0.3
mags        6             0.2
today       1             0.1
britbooks   2             0.1
guardian    1             0.1
times       0             0.0
ephem       0             0.0
spoken      0             0.0
wsj         0             0.0
bbc         0             0.0
econ        0             0.0
newsci      0             0.0
scbooks     0             0.0

'defense': 11648 occurrences

subcorpus   occurrences   per million
npr         7620          345.?
wsj         1701          270.9
ambooks     1665          150.1
newsci      67            21.?
britbooks   281           11.7
scbooks     38            9.5
mags        140           4.7
bbc         73            3.9
today       24            2.3
guardian    21            1.7
indy        8             1.6
ephem       2             1.3
econ        4             0.6
times       4             0.4
spoken      0             0.0

'defence': 22459 occurrences

subcorpus   occurrences   per million
bbc         8079          431.7
econ        2050          290.8
indy        1352          268.5
guardian    3028          238.7
times       2027          195.7
today       1570          152.3
newsci      407           130.9
britbooks   2206          92.1
scbooks     176           44.0
mags        1311          43.6
ephem       48            32.2
spoken      171           19.5
ambooks     31            2.8
npr         3             0.1
wsj         0             0.0

Appendix 3

CONCORDANCES

'exploitive': 23 lines: RIGHT-SORTED

< that were cruel,heedless, or exploitive. Social structures are
< some futile defence of the exploitive activities of his industry.
< true religion by cheap, shoddy, exploitive and trivialising religious
< a society which is conceived as exploitive and alienating (although
< of films which I don't consider exploitive at all.The imagines of--of pimps
< who seek to benefit off the exploitive nature of capitalism, and they
< enough to understand the exploitive nature of capitalism # And then
< it may also lead to shoddy and exploitive presentation, warns David
< exercise discretion for private, exploitive purposes, for political purposes
< life bearable in an oppressive/exploitive social context and then calling
< campaign got off to a typically exploitive start with the offer of 200
< exhaustion among indifferent or exploitive strangers.If the enjoyment of
shows it in some light that isn't exploitive, then that's interesting to
Michael Korda's fictionalized, if exploitive, version of the last days of
find myself trying to justify the exploitive way in which I and hundreds of

'exploitive': 23 lines: LEFT-SORTED

< in the 19th century. In an exploitive economy very little domestic
< it may also lead to shoddy and exploitive presentation, warns David
< old struggle between worker and exploitive boss. Another legacy of
< of sadistic manipulation and exploitive greed are something else.
< a society which is conceived as exploitive and alienating (although
do the movie, thinking it will be exploitive at best (this is the same agency
< of films which I don't consider exploitive at all.The imagines of--of pimps
Michael Korda's fictionalized, if exploitive, version of the last days of
< Do you sense any kind of exploitive element to it? I mean, you know,
< life bearable in an oppressive/exploitive social context and then calling

Appendix 8

< repression and American exploitation of the masses. What shall we do
< overboard.Through the ruthless exploitation of mass media, popular music >
< them agents of neo-colonialism, exploiting the masses of the Third World, >
< Yeltsin, she said, is exploiting the mass media to call for the >
< operators will be free to exploit the massive market of maturing >
< power structures have continued to exploit the masses. They have >
< social marketing". This technique exploits the private sector # >
by the big clearing banks which can exploit their massive resources far >
< But this in turn led managers to exploit their freedom to massage their >
it is still unclear why the 'super-exploited" Third World can provide markets
of cavalry immediately available to exploit to the full a successful assault >
< and war, a massive crime wave, exploitation,unemployment, homelessness, and
< solely for the purpose of mass exploitation, we still witness the outcome >


Appendix 9

CONCORDANCES for words beginning 'exploit' and containing the letters 'mass' WITH TEXT REFERENCES

gua00009072  pollution from mass exploitation". But human vapours are >
tim00080892  his successors were exploiting a vast mass of legend and >
tim00040992  as a wicked state that exploits and represses the masses. Those >
gua00009071  class and one set of exploiters. And as mass parties of the >
gua00009071  and one villain 5 one exploited class and one set of exploiters.
amb00000435  impoverished, equally exploited, equally dependent proletarians,
bri00000551  consolidating class exploitation, had raised the masses beyond a
npr00111092  a mass murderer who exploited Indians 5 The Italian community >
bri00000828  to develop momentum and exploit mass = the so-called front>
npr00071290  artist to really exploit mass media images in works of :•
gua00009071  but the anonymous exploited masses, including women, >
bri00000843  wealth through exploitation meant that, if the masses were >
tim00050992  about capitalist exploitation of the masses? 'People were >
bri00000552  slavery, and on the exploitation of the mass of peasants, serfs >
bri00000551  condemned the exploitation of the masses with the same >
bri00000550  it intensifies its exploitation of the labouring masses in the >
bri00000123  and American exploitation of the masses. What shall we do
bri00000673  Through the ruthless exploitation of mass media, popular music >
bri00000826  of neo-colonialism, exploiting the masses of the Third World, >
npr00100391  Yeltsin, she said, is exploiting the mass media to call for the >
mag00000171  will be free to exploit the massive market of maturing >
mag00000761  have continued to exploit the masses. They have >
new00001055  This technique exploits the private sector # >
bri00000558  banks which can exploit their massive resources far >
eco00009067  in turn led managers to exploit their freedom to massage their >
bri00000820  unclear why the 'super-exploited" Third World can provide markets
bri00000551  available to exploit to the full a successful assault >
mag00000887  massive crime wave, exploitation, unemployment, homelessness, and
mag00000324  the purpose of mass exploitation, we still witness the outcome >
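The query named in the appendix heading (words beginning 'exploit' in lines that also contain the letters 'mass') and the left-sorted layout used in the earlier appendices can be sketched in a few lines of Python. This is only an illustrative sketch and not the software used to produce the appendices: the function names `kwic`, `left_sorted` and `render`, the 40-character context window, the tie-breaking on further words to the left, and the three sample sentences are all assumptions made here.

```python
import re

def kwic(sentences, prefix="exploit", must_contain="mass", width=40):
    """Collect (left, node, right) triples for words starting with `prefix`
    whose context window also contains the letters `must_contain`."""
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\w*", re.IGNORECASE)
    hits = []
    for sentence in sentences:
        for m in pattern.finditer(sentence):
            left = sentence[max(0, m.start() - width):m.start()]
            right = sentence[m.end():m.end() + width]
            if must_contain in (left + right).lower():
                hits.append((left, m.group(0), right))
    return hits

def left_sorted(hits):
    """Sort by the words to the left of the node, reading outwards from it.
    (Breaking ties on further words is an assumption of this sketch.)"""
    def key(hit):
        return [w.lower() for w in reversed(hit[0].split())]
    return sorted(hits, key=key)

def render(hits, width=40):
    """Right-align the left context so that all node words share one column."""
    return [f"{left:>{width}}{node}{right}" for left, node, right in hits]

# Hypothetical sample sentences, invented for illustration only.
sample = [
    "They condemned the exploitation of the masses.",
    "The artist wanted to exploit mass media images.",
    "Farmers exploit the land.",
]
hits = left_sorted(kwic(sample))
lines = render(hits)
for line in lines:
    print(line)
```

The third sample sentence is dropped because its context does not contain 'mass'; the remaining two are ordered by the word immediately to the left of the node ('the' before 'to').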

Ramesh Krishnamurthy

References

Sinclair, J.M., ed. (1987): Looking Up. An account of the COBUILD Project in lexical computing. HarperCollins.

Klaus-Dieter Ludwig

Reflections on a Dictionary of Archaisms (Überlegungen zu einem Wörterbuch der Archaismen)

1. Introduction

I would like to begin my reflections with a few quotations from literary fiction. First, two short passages from Thomas Mann's novel "Lotte in Weimar"1: "Der Kellner des Gasthofes 'Zum Elephanten' in Weimar, Mager, ein gebildeter Mann, hatte an einem fast noch sommerlichen Tage ziemlich tief im September des Jahres 1816 ein bewegendes, freudig verwirrendes Erlebnis. Nicht, daß etwas Unnatürliches an dem Vorfall gewesen wäre; und doch kann man sagen, daß Mager eine Weile zu träumen glaubte. Mit der ordinären Post von Gotha trafen an diesem Tage, morgens kurz nach acht Uhr, drei Frauenzimmer vor dem renommierten Hause am Markte ein, denen auf den ersten Blick - und auch auf den zweiten noch - nichts Sonderliches anzumerken gewesen war. Ihr Verhältnis untereinander war leicht zu beurteilen: es waren Mutter, Tochter und Zofe. Mager, der, zu Willkommsbücklingen bereit, im Eingangsbogen stand, hatte zugesehen, wie der Hausknecht den beiden ersteren von den Trittbrettern auf das Pflaster half, während die Kammerkatze, Clärchen gerufen, sich von dem Schwager verabschiedete, bei dem sie gesessen hatte und mit dem sie sich gut unterhalten zu haben schien. [...] 'Guten Tag, mein Freund!' sagte die mütterliche der beiden Damen, eine Matrone allerdings, schon recht bei Jahren, Ende Fünfzig zumindest, ein wenig rundlich, in einem weißen Kleide mit schwarzem Umhang, Halbhandschuhen aus Zwirn und einer hohen Capotte, unter der krauses Haar, von dem aschigen Grau, das ehemals blond gewesen, hervorschaute. 'Logis für dreie brauchten wir also, ein zweischläfrig Zimmer für mich und mein Kind' (das Kind war auch die Jüngste nicht mehr, wohl Ende Zwanzig [...]) - 'und eine Kammer, nicht zu weitab, für meine Jungfer. Wird das zu haben sein?'" (1965:9 f.).
Let us add two sentences from Theodor Fontane's novel "Effi Briest": "Johanna, die mit im Garten war, brachte ihr denn auch Umhang, Hut und Entoutcas, und mit einem freundlichen 'Guten Tag' trat Effi aus dem Hause heraus und ging auf das Wäldchen zu, neben dessen breitem chaussierten Mittelweg ein schmalerer Fußsteig auf die Dünen und das am Strand gelegene Hotel zulief." (1969:114). "Effi [...] nahm ihren Platz bei den alten Damen, für die, ganz in der Nähe der Musikempore, die Fauteuils gestellt waren." (1969:173).

In these texts we encounter linguistic units that are conspicuous, in that they have moved to the periphery of present-day usage. Where they are used today, it is as part of the "expressive potential of the contemporary language" (Fleischer 1991:37), in literary works, to create period colour. They are no longer intelligible to the average speaker. In our passages this applies to the following lexical units, or to particular senses of these expressions: ordinäre Post, Frauenzimmer, Zofe, Hausknecht, Kammerkatze, Schwager, Capotte, Logis, Jungfer, Entoutcas, chaussieren, Fauteuils. For many readers today the relevant senses can only be worked out with the help of a dictionary. In the standard general monolingual dictionaries of contemporary German these lexemes carry a diachronic label, specifically one relating to "backward diachrony" ("Diachronie nach rückwärts", Hausmann 1977:113 f.). In monolingual synchronic defining dictionaries of German, the conspicuous lexical units occurring in the passages quoted are (where they are recorded with the sense in question) given the following labels ("Markierungsetiketten", Hausmann 1989:649):

Frauenzimmer ('female person'): "veraltet" [obsolete] (WDG); "veraltet, noch landschaftlich" [obsolete, still regional] (Duden-GWB2, Duden-UW)
Zofe ('female person in personal attendance on a lady of rank, usually of the nobility'): "historisch" [historical] (WDG); "früher" [formerly] (Duden-GWB1, Duden-UW)
Hausknecht ('hotel employee whose duties consist of providing services'): "veraltet" (WDG, Duden-GWB)
Schwager ('driver of a mail coach'): "historisch" (WDG); "früher" (Duden-GWB1, Duden-UW); "veraltet" (Wahrig-DW, Knaur)
Capotte, in the dictionaries Kapotte ('small lady's hat held in place by ribbons under the chin'): "veraltet" (WDG); "früher" (Duden-GWB2, Duden-UW)
Logis ('accommodation, lodging'): "veraltend" [becoming obsolete] (HDG)
Jungfer ('maidservant, lady's maid'): "veraltet" (WDG, Wahrig-DW)
Entoutcas ('umbrella against rain and sun'): "veraltet" (WDG, Duden-GWB)
chaussieren ('to asphalt'): "Straßenbau veraltend" [road construction, becoming obsolete] (Duden-GWB); "veraltet" (Wahrig-DW)
Fauteuil ('comfortable upholstered chair with armrests'): "veraltend" (WDG); "besonders österreichisch, sonst veraltend" [especially Austrian, otherwise becoming obsolete] (Duden-GWB, Duden-UW); "veraltet" (Knaur)

1) Emphasis on particular lexical units is mine. K.-D.L.

In the phrase ordinäre Post, ordinär is used without pejorative connotation in the old sense of 'generally customary, usual, everyday, common'. For Post in the sense 'mail coach' the dictionaries give the labels "historisch" [historical] (WDG) and "früher" [formerly] (Duden-GWB, Duden-UW, Wahrig-DW, Knaur). Kammerkatze for 'Zofe' is not recorded as a lemma in the dictionaries; what is recorded is Kammerkätzchen, with the following definitions and labels: 'chambermaid' - "veraltet scherzhaft" [obsolete, jocular] (WDG, Duden-GWB); 'pretty young lady's maid' "scherzhaft" (Wahrig-DW); 'young, pretty lady's maid' and 'young chambermaid' "scherzhaft" (Knaur). Besides the labels already mentioned, "veraltet" [obsolete], "veraltend" [becoming obsolete], "historisch" and "früher", "backward diachrony" is also conveyed through the definition itself, e.g.: Zofe 'female person who formerly attended personally on a wealthy (noble) lady' (HDG); Kapotte '(worn around the turn of the century) small hat sitting high on the head and held in place by two ribbons under the chin' (Duden-GWB1) or 'small lady's hat of the Biedermeier period, tied under the chin' (Wahrig-DW).
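The variation among these labels becomes easy to see once they are tabulated. The following Python sketch records a few of the labels quoted in this section and lists, for each lexeme, the distinct labels assigned to it. The names `LABELS`, `distinct_labels` and `is_heterogeneous` are illustrative assumptions; the data are restricted to dictionary attributions that are stated unambiguously in the text.

```python
# Labels as quoted in this section: lexeme -> {dictionary: label}.
LABELS = {
    "Schwager": {
        "WDG": "historisch",
        "Duden-UW": "früher",
        "Wahrig-DW": "veraltet",
        "Knaur": "veraltet",
    },
    "Fauteuil": {
        "WDG": "veraltend",
        "Duden-GWB": "besonders österreichisch, sonst veraltend",
        "Duden-UW": "besonders österreichisch, sonst veraltend",
        "Knaur": "veraltet",
    },
    "Hausknecht": {"WDG": "veraltet", "Duden-GWB": "veraltet"},
    "Jungfer": {"WDG": "veraltet", "Wahrig-DW": "veraltet"},
}

def distinct_labels(lexeme):
    """Return the sorted set of labels the dictionaries assign to a lexeme."""
    return sorted(set(LABELS[lexeme].values()))

def is_heterogeneous(lexeme):
    """A lexeme is labelled heterogeneously if the dictionaries disagree."""
    return len(distinct_labels(lexeme)) > 1

heterogeneous = [lexeme for lexeme in LABELS if is_heterogeneous(lexeme)]
for lexeme in heterogeneous:
    print(lexeme, distinct_labels(lexeme))
```

In this small sample Schwager and Fauteuil each receive three different labels, whereas Hausknecht and Jungfer are labelled uniformly.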

2) Unless otherwise noted, Duden-GWB refers both to the six-volume edition (1976 - 1981) and to the eight-volume edition (1993/94) of the "Großes Wörterbuch der deutschen Sprache"; otherwise: Duden-GWB1 or Duden-GWB2.


We note that the labelling of lexical units with respect to the dimension "time" (here with respect to the pole "old") is quite heterogeneous. At the Third International Copenhagen Symposium on Lexicography in 1986, D. Herberg (1988:445 ff.) characterised the practice of diachronic labelling in general monolingual dictionaries and discussed some problems of these labels relative to their purpose. In this connection he points, on the one hand, to the regular labelling of archaisms in general monolingual dictionaries of German and, on the other, deplores the restraint these dictionaries show in labelling neologisms. In what follows I would like to draw attention to "backward diachrony (old words)" ("Diachronie nach rückwärts (Alte Wörter)", Hausmann 1977:113). Since the subject of my reflections, or rather preliminary reflections, is a "dictionary of archaisms", I shall first discuss the term Archaismus, that is, pursue the question: "What actually are 'old words' from today's point of view?" (2.). I would then like to deal briefly with the labelling of archaisms and historisms in general monolingual dictionaries of contemporary German (3.), and finally give an outlook on a possible "dictionary of archaisms" and sketch the problems connected with it (4.).

2. On the terms 'Archaismus' and 'Historismus'

2.1. As is well known, changes in the vocabulary come about in particular through the formation of neologisms, through archaisation, borrowing and change of meaning (cf. e.g. Braun 1993:158 ff.; Schippan 1992:240 ff.). Archaisation (Archaisierung) in this context means the process by which lexical units, particular senses of lexical units or particular grammatical forms "become obsolete" and "die out". That is, lexemes or individual senses of lexemes move from the centre to the periphery of the vocabulary or, better, to the periphery of language use. The use of Archaisierung in this sense is often taken for granted in accounts of vocabulary change. In specialist dictionaries of linguistics Archaisierung does not appear. In general monolingual dictionaries of German we find for the lemma archaisieren, for example, the definitions "art history: to fall back on early forms" (WDG); "to use archaic forms of language or art" (Duden-GWB; Duden-UW); "to imitate old-fashioned forms of language or art" (Knaur). Archaisation, that is, the "becoming obsolete" and "dying out" of lexical units, gives rise to archaisms. Historisms are often mentioned in this connection as well. How are the two terms explained? Let us first consult general monolingual dictionaries in which lexemes or particular senses of lexemes are labelled as archaisms or historisms. Archaismus according to WDG: "1. old-fashioned form: many archaisms are found in his language, his art 2. art history: the falling back on early forms [...]"; according to Duden-GWB2 and Duden-UW: "(linguistics, stylistics, art studies):

1. individual archaic element (in language or art): the archaisms in Thomas Mann's novels; 'weiland' is an a. (obsolete, archaising expression)

2. archaising linguistic or artistic stance or manner of composition: a. in modern art, poetry"; according to HDG: "linguistic form that stems from old times, is old-fashioned, obsolete: many archaisms are found in this novel"; according to Wahrig-DW: "Altertümelei [affected archaising], revival of old-fashioned forms; old-fashioned form"; according to Knaur: "1 imitation of archaic art forms 2 old-fashioned form, old-fashioned word".

We note that, of the dictionaries consulted, only the HDG restricts Archaismus exclusively to language. In the other dictionaries Archaismus is related, on the one hand, to an "old-fashioned form" or an "individual archaic element" in language or art and, on the other, to the "falling back on", "imitation" or "revival" of archaic forms, to the "archaising linguistic or artistic manner of composition", which Wahrig-DW equates with "Altertümelei". It is striking that the WDG assigns Archaismus, in the sense "the falling back on early forms", to the field of art history, whereas Duden-GWB and Duden-UW assign Archaismus in both uses ("archaic element" and "archaising linguistic or artistic manner of composition") to linguistics, stylistics and art studies.

Recently published reference works on linguistics give, for example, the following explanations of Archaismus:

Sachwörterbuch für die deutsche Sprache, ed. by Sommerfeldt/Spiewok (1989:26 f.): "Term for words and grammatical forms that were once in common use, that are today felt to be becoming obsolete or already obsolete, and that have as a rule been replaced by synonyms". A distinction is drawn between lexical archaisms (e.g. Perron instead of 'Bahnsteig', Gendarm instead of 'Polizist') and grammatical archaisms (e.g. the dative -e in the singular of masculine and neuter nouns, auf dem Hofe, mit dem Kinde; the use of the genitive object, er erinnert sich meiner, instead of the prepositional object an mich that is usual today). Variant forms (e.g. gülden instead of 'golden', Verlöbnis instead of 'Verlobung') are likewise counted among the archaisms. It is pointed out that in the historical novel archaisms are "often used deliberately to create a particular period colour" and that in research the term Archaismus is "also used in a wider sense", so that it "includes the historisms".

Bußmann, Lexikon der Sprachwissenschaft (1990:95): "Stylistic device of rhetoric: effective use of obsolete expressions, with poetic, solemn or ironic connotation (e.g. Minne, Wonne, Hort, sintemal, Anbeginn) or for ideological reasons (e.g. Gau, Maid and the like in Nazi vocabulary). Occasionally also used more generally for lexical relics such as Ungeziefer (from OHG zebar >Opfertier

neutral > y = neg.

2.4.2 Non-scalar un-Adj-formations

True - untrue, able - unable, equal - unequal constitute non-scalar semantic dimensions, which is inferred from the (canonical) non-gradability of both members of the pairs. The feature y that characterises the prefixed member can be represented as (NEG x), i. e. as the negation of


Arthur Mettinger

the feature x characterising the unprefixed member. It is for this type of un-Adj-formations that Marchand's assumption of the meaning 'not' for the prefix is justified.

3. The Cognitive Linguistics framework

3.1 Some axioms of Cognitive Linguistics (CL)

Though it is impossible to characterise the cognitive linguistics approach to the study of language adequately within this paper (cf. Radden 1992) I would nevertheless like to point out some basic assumptions that cognitive linguists hold about language in order to establish the theoretical background against which I will try to characterise un-prefixation in this subchapter. Above all, CL emphasises the experientially embodied nature of language. It is assumed that "lexical meaning is not ... an autonomous phenomenon, but is ... inextricably bound up with the individual, cultural, social, historical experience of the language user" (Geeraerts 1992:266). Methodologically, this leads to giving up the structuralist attempts to achieve a language-immanent approach with a sharp distinction between linguistic and encyclopaedic knowledge and to replacing it by a basically hermeneutic method that "consists of an interpretative attempt to recover the original experience behind the expressions" (Geeraerts 1992:267). This emphasis on lived experience has three consequences:

a) The postulation of schemata because of the assumption that "in order for us to have meaningful, connected experiences that we can comprehend and reason about, there must be pattern and order to our actions, perceptions, and conceptions" (Johnson 1987:29). A schema is thus defined in the following way:

A schema is a recurrent pattern, shape, and regularity in, or of, these ongoing ordering activities. These patterns emerge as meaningful structures for us chiefly at the level of our bodily movements through space, our manipulation of objects, and our perceptual interactions. It is important to recognize the dynamic character of image schemata. I conceive of them as structures for organizing our experience and comprehension... They are dynamic in two important respects.
(1) Schemata are structures of an activity by which we organize our experience in ways that we can comprehend. They are primary means by which we construct or constitute order and are not mere passive receptacles into which experience is poured. (2) Unlike templates, schemata are flexible in that they can take on any number of specific instantiations in varying contexts... (Johnson 1987:29f.)

b) As it is claimed that "language is an integral part of human cognition" (Langacker 1987:12) the task of linguistic semantics is conceived of as attempting the structural analysis and explicit description of abstract entities like thoughts and concepts: Meaning is equated with conceptualization.... Because conceptualization resides in cognitive processing, our ultimate aim must be to characterize the types of cognitive events whose occurrence constitutes a given mental experience. ... It is claimed ... that semantic structures (which I call "predications") are characterized relative to "cognitive domains", where a domain can be any sort of conceptualization: a perceptual experience, a concept, a conceptual complex, an elaborate knowledge system, etc. The semantic description of an expression therefore takes for its starting point an integrated conception of arbitrary complexity and possibly encyclopedic scope. The basic observation supporting this position is that certain conceptions presuppose others for their characterization (Langacker 1991:2f.)


c) The grammar of a language is characterised as "a structured inventory of conventional linguistic units" (Langacker 1987:57), where a unit "is a structure that a speaker has mastered quite thoroughly, to the extent that he can employ it in largely automatic fashion, without having to focus his attention specifically on its individual parts or their arrangement" (Langacker 1987:29). Furthermore, it is claimed that grammatical structures do not constitute an autonomous formal system or level of representation: they are claimed instead to be inherently symbolic, providing for the structuring and conventional symbolization of conceptual content. Lexicon, morphology, and syntax form a continuum of symbolic units, divided only arbitrarily into separate components; ... (Langacker 1991:1)

Within this concept of grammar, a "semantic structure is then defined as a conceptual structure that functions as the semantic pole of a linguistic expression. Hence semantic structures are regarded as conceptualizations shaped for symbolic purposes according to the dictates of linguistic convention." (Langacker 1987:98)

3.2 The descriptive format of un-Adj-formations

3.2.1 Adjectives in CL On the basis of the axioms established above we can now proceed to attempt a characterisation of English adjectives containing the prefix un-. In Langacker's (1987, 1991) cognitive grammar framework every 'predication' (i.e. the meaning of a linguistic expression) imposes a 'profile' on a 'base' where "the base of a predication is its domain" (Langacker 1991:5) and the profile "is a substructure elevated to a special level of prominence within the base, namely that substructure which the expression 'designates'" (Langacker 1991:5). Moreover, a broad distinction is made between basic classes of predications depending on the nature of their profile: a noun is regarded as a symbolic structure that designates a 'thing', "where 'thing' is a technical term defined as a 'region in some domain'" (Langacker 1991:20), whereas verbs, adverbs, adjectives and prepositions are regarded as 'relational' expressions profiling "the 'interconnections' among conceived entities" (Langacker 1991:20). They are thus conceptually dependent in that "one cannot conceptualize interconnections without also conceptualizing the entities that they interconnect" (Langacker 1987:215). Relational expressions must therefore always be characterised in terms of two participants, viz. the 'Trajector' (Tr) and the 'Landmark' (Lm). "The Tr is the more salient participant in the relation. The less salient participant constitutes the Landmark ..., which serves as a kind of reference point for the specification of the Tr." (Taylor 1992:10). For the topic under discussion here the semantic structure of scalar adjectives is of prime importance. Taylor (1992) assumes the following schematic representation of a scalar adjective:

Figure 1

[A scalar adjective] designates a relation between its Tr (a thing) and its Lm, a region on a scale.... The large box encloses the relevant cognitive domain of the adjective, e.g., 'length', 'height', 'speed', etc., symbolized by [Dom], while the horizontal line represents the dimension itself. The heavy portions of the diagram represent the profiled elements of semantic structure. The heavy circle represents the Tr of the adjective, the small box surrounding the circle symbolizing the cognitive domain of the Tr. The Tr is located within a profiled region of the dimension, represented by the heavy portion of the horizontal line. The profiled region of the dimension lies in excess of some norm, represented by the region surrounding the point n. The norm is represented as a region so as to capture the 'fuzziness' of scalar adjectives. There is, namely, no precise point on the dimension of, e.g., tallness, which clearly cuts off the class of 'tall' entities from the class of 'not tall' entities (Taylor 1992:10f.)

The semantic structure of the adjective old in attributive position as in old man, old box, etc., can be represented as an elaboration of the scalar adjective schema, cf.:

Figure 2

Essentially, old denotes that its Tr has been in existence for a period of time in excess of some norm. In Figure 2, the passage of time is represented by the horizontal time-line, at the bottom of the diagram, while the double appearance of the Tr entity represents the continued existence of the Tr over the intervening period of time. The broken line linking the two instantiations of the Tr symbolizes the perceived identity of the Tr at the different times. R denotes the reference time, i.e. the time at which the Tr is characterized with respect to its oldness. (Taylor 1992:11)


The combination of old with a nominal predication is possible if this nominal predication is able to elaborate the schematic Tr of the adjective. Moreover, the norm n associated with the adjectival predication receives its precise value only in this composite structure in that it depends on the kind of entity that serves as the adjective's Tr. (Taylor 1992:12f.) From a methodological point of view such schematic representations are meant to capture degrees of specificity: a schema is more abstract and less specified than its elaborations or instantiations; i.e. an instantiation is always fully compatible with the specifications of the schema it instantiates, but is characterised in finer detail. In the following, I will try to sketch the basic schemata which are instantiated by un-Adj-formations.

3.2.2 Schematic representation of un-Adj The observation that un-Adj-formations in English can be either scalar (unimportant, unhappy, uncertain) or non-scalar (untrue) requires the assumption of two basically different conceptual characterisations.

3.2.2.1 The SCALE schema According to Mark Johnson "the SCALE schema is basic to both the quantitative and qualitative aspects of our experience" (Johnson 1987:122) and exhibits the following properties: i) the SCALE schema has a more or less fixed directionality. ... Normally, the further along the scale one moves, the greater the amount or intensity... ii) Scales have a cumulative character of a special sort. If you are collecting money and have $15, then you also have $10.... iii) SCALES are typically given a normative character; ... Having more or less of something may either be good or bad, desirable or undesirable. Having more heat in the winter can be desirable, while having more heat in the summer might be awful. In either case, however, norms are mapped on to the scale. iv) [Scales] can be either closed or open... At any rate, SCALARITY does seem to permeate the whole of human experience, even where no precise quantitative measurement is possible. Consequently, this experientially basic, value-laden structure of our grasp of both concrete and abstract entities is one of the most pervasive image-schematic structures in our understanding. The image schema which emerges in our experience of concrete, physical entities is figuratively extended to cover abstract entities of every sort... (Johnson 1987:122f.)

If this image schema is accepted as basic to scalar un-Adj-formations we can schematically characterise them as in Figure 3:


Figure 3

All scalar formations schematically locate the trajector within a profiled region of the scale that is situated "below" some norm, thus indicating that they denote LESS of the property represented by the scale than is required by the norm. This very general schematic representation must, however, be further elaborated to accommodate three different instantiations of scalar un-Adj-formations:

a) Adjectives like unimportant (such as in unimportant person) suggest a scale that is bounded at the lower end. The Trajector is then positioned anywhere in the region between the norm and the zero-point on the scale: the more unimportant a person is thought to be, the closer the Trajector is moved towards the zero value until the Trajector ultimately reaches it when s/he is regarded to be completely/totally/absolutely unimportant, cf.:

Figure 4


b) The second instantiation concerns cases like uncertain (an uncertain affair), unsafe (an unsafe car), or unsteady (an unsteady hand) where the scale is bounded at the upper end by the norm. The Trajector is placed in the "negative" region of the scale, and the more uncertain, unsafe or unsteady the Trajector is thought to be the farther it is removed from the norm, cf.:

Figure 5

c) In the case of unhappy (an unhappy child) or unpleasant (an unpleasant situation) the Trajector is again positioned "below" the norm region which does not, however, bound the scale: on the contrary, it extends into the "positive" direction as well as into the "negative" one; when an expression containing un- is used the Trajector is always situated in the "negative" part of the scale - the more negative it is conceived of, the farther the Trajector is removed from the norm which represents some kind of "neutral state", cf.:

Figure 6


3.2.2.2 The CONTAINER schema

I will claim that non-scalar un-Adj-formations of the type untrue (an untrue statement) must be accounted for in terms of the CONTAINER schema - "a schema consisting of a boundary distinguishing an interior from an exterior. The CONTAINER schema defines the most basic distinction between IN and OUT" (Lakoff 1987:271). Johnson (1987:39) claims that this schema is also responsible for our understanding of negation: together with the metaphorical understanding of propositions as locations we assume that to hold a proposition is understood in terms of being located in a definite bounded space (the space defined by the proposition), whereas to hold the negation of that proposition is understood as being located outside that bounded space. Figure 7 is intended to illustrate this phenomenon for cases like untrue.

Figure 7

The Trajector is placed outside the bounded region, thus capturing our intuition that untrue can be adequately paraphrased as 'not true'. On the other hand, this schema can also accommodate cases like almost true (the Trajector is moved into the direct vicinity of the bounded region) or half true (the Trajector is conceptualised as having in part entered the bounded region) as well as slightly untrue (the Trajector is located partly inside and partly outside the bounded region).
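The four schematic types described in this section can be summarised, very roughly, as conditions on where the Trajector may be located. The Python sketch below is one possible reading and not part of the framework itself: the class names, the numeric scale values and the treatment of the norm as a single point (rather than a fuzzy region, as Taylor's diagrams have it) are simplifying assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class ScaleSchema:
    figure: int                     # figure number in this paper
    norm: float                     # norm simplified to a point (assumption)
    lower: Optional[float] = None   # None: scale is open at the lower end
    upper: Optional[float] = None   # None: scale is open at the upper end

    def on_scale(self, pos: float) -> bool:
        above_lower = self.lower is None or pos >= self.lower
        below_upper = self.upper is None or pos <= self.upper
        return above_lower and below_upper

    def profiles(self, pos: float) -> bool:
        """True if a Trajector at `pos` lies on the scale and below the norm."""
        return self.on_scale(pos) and pos < self.norm

@dataclass(frozen=True)
class ContainerSchema:
    figure: int
    inside: Tuple[float, float]     # the bounded region (arbitrary interval)

    def profiles(self, pos: float) -> bool:
        """True if a Trajector at `pos` lies outside the bounded region."""
        start, end = self.inside
        return not (start <= pos <= end)

# Arbitrary illustrative values; only the boundedness pattern matters.
UNIMPORTANT = ScaleSchema(figure=4, norm=5.0, lower=0.0)  # bounded below by zero
UNCERTAIN = ScaleSchema(figure=5, norm=0.0, upper=0.0)    # bounded above by norm
UNHAPPY = ScaleSchema(figure=6, norm=0.0)                 # open in both directions
UNTRUE = ContainerSchema(figure=7, inside=(0.0, 1.0))

print(UNIMPORTANT.profiles(0.0))   # Trajector at the zero-point
print(UNCERTAIN.profiles(1.0))     # position beyond the norm, off the scale
print(UNTRUE.profiles(2.0))        # Trajector outside the container
```

Under these assumptions a 'completely unimportant' Trajector sits at the zero-point of the Figure 4 scale, whereas the Figure 5 scale admits no position beyond the norm. Cases such as half true, where the Trajector is partly inside the region, would need an extended Trajector rather than a point and are not modelled here.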

4. Comparison and summary

4.1 Similarities and differences of the two approaches

Both the structuralist and the cognitivist approach thus offer a typology of un-Adj-formations that can serve as the basis for the lexicographer's work. It will have become obvious that both approaches yield practically identical typologies, though from different points of departure and with different metalanguages: while the structuralist framework and its descriptive apparatus concentrate on the intralinguistic aspects of formations, the cognitivist approach tries to represent linguistic phenomena in the context of human cognition in general and thus operates with basically non-linguistic tools of explanation. Another major difference between the two approaches lies in the fact that while the structuralist account has been empirically substantiated (cf. Mettinger 1994), the viability of the cognitive framework proposed in this paper is still being tested: I am presently working on a large-scale, computer-assisted empirical investigation of the various 'meanings' the prefix un- can have, i.e. I am trying to establish its semantic, or rather, conceptual, profile. This invokes two perspectives: the onomasiological perspective that takes as its starting point the various schemata in whose elaboration un- is involved, and the semasiological perspective that takes its starting point in the linguistic unit as a form, and describes what semantic values (as dependent variable) un- (as independent variable) may receive (cf. Geeraerts et al. 1994:5f.). The result of an analysis of this type is expected to be an array of interrelated senses of un- that exhibits different degrees of prototypicality.

4.2 Consequences for the monolingual dictionary

Despite the undeniable fact that the cognitivist approach is little worked out at present, it seems worth while discussing some advantages this approach might have: The diagrammatic representation of the schematic characteristics of un-Adj-formations as illustrated by figures 4 - 7 could, I think, easily be incorporated into an explanatory appendix to a monolingual dictionary, and each individual lemma could then be related to the schema it elaborates (by a numerical index, for example). It is, of course, probably the case that, depending on context, one un-Adj-formation can be regarded as instantiating more than one of the schematic types represented by figures 4 - 7: in these cases a corpus-based analysis would have to find out which schema acts as prototype and in which circumstances different (less prototypical) schemata are invoked, and the results of such an analysis should then be mirrored in the format of the dictionary entry. Thus the schema diagrams suggested in this paper can be regarded as transpositions of explanatory paraphrases into a different semiotic system. According to Schelbert (1988:64) the use of pictures in dictionaries saves long explanations and taps the user's facilities of interpreting such pictures, even to the extent of inferencing metaphorical and metonymic relations (cf. Schelbert 1988:66f.). The schemata act as foils against which actual occurrences of un-Adj-formations can be checked; they are recurrent patterns which are, however, flexible enough to be adapted according to the domains and trajectors (i.e. nouns) that are involved.
Finally, the approach suggested in this paper is in line with Dirk Geeraerts' observation that the semantic data that are brought to the fore by lexicography support prototype theory [i.e., the idea that a lexical concept is a set of closely related senses with rather vague boundaries, which are grouped round a central, 'prototypical' sense], or, to put it the other way round, that the prototypical model is the best theory available to deal theoretically with the phenomena that are the material of lexicography, and with the empirical observations that result from the descriptive efforts of the lexicographer (Geeraerts 1987:4)

The purpose of this contribution could be summed up in the following way: the schemata suggested (together with their diagrammatic representations) are meant as models of the conceptual substratum underlying the prototypical senses of the respective un-Adj-formations. Even if their ontological status is far from meeting with consent in the linguistic community, they might give the lexicographer some food for thought as to how the treatment of un-Adj-formations in the monolingual dictionary might be put on a new basis.


Arthur Mettinger

References

Ayto, J. R. (1983): On specifying meaning. Semantic analysis and dictionary definitions. - In: R. R. K. Hartmann (ed.): Lexicography: Principles and Practice (London etc.: Academic Press) 89-98.
Bauer, Laurie (1983): English word-formation. - Cambridge: Cambridge University Press (= Cambridge Textbooks in Linguistics).
Geeraerts, Dirk (1987): Types of semantic information in dictionaries. - In: R. Ilson (ed.): A spectrum of lexicography. Papers from AILA Brussels 1984 (Amsterdam, Philadelphia: Benjamins) 1-10.
Geeraerts, Dirk (1992): The return of hermeneutics to lexical semantics. - In: M. Pütz (ed.): Thirty years of linguistic evolution. Studies in honour of René Dirven on the occasion of his sixtieth birthday (Philadelphia, Amsterdam: Benjamins) 257-282.
Geeraerts, Dirk (1993): Cognitive semantics and the history of philosophical epistemology. - In: R. A. Geiger, B. Rudzka-Ostyn (eds.): Conceptualizations and mental processing in language (Berlin, New York: Mouton de Gruyter) (= Cognitive Linguistics Research 3) 53-79.
Geeraerts, Dirk, Grondelaers, Stefan, Bakema, Peter (1994): The structure of lexical variation. Meaning, naming, and context. - Berlin, New York: Mouton de Gruyter (= Cognitive Linguistics Research 5).
Herberg, Dieter (1992): Antonymische Beziehungen im Wortschatz und im Wörterbuch. - In: K. Hyldgaard-Jensen, A. Zettersten (eds.): Symposium on Lexicography V. Proceedings of the Fifth International Symposium on Lexicography May 3-5, 1990 at the University of Copenhagen (Tübingen: Niemeyer) (= Lexicographica. Series Maior 43) 245-264.
Johnson, Mark (1987): The body in the mind. The bodily basis of meaning, imagination, and reason. - Chicago, London: The University of Chicago Press.
Kastovsky, Dieter (1982): Wortbildung und Semantik. - Düsseldorf, Bern, München: Bagel und Francke (= Studienreihe Englisch 14).
Kastovsky, Dieter (1992): The formats change - the problems remain: Word-formation theory between 1960 and 1990. - In: M. Pütz (ed.): Thirty years of linguistic evolution. Studies in honour of René Dirven on the occasion of his sixtieth birthday (Philadelphia, Amsterdam: Benjamins) 285-310.
Lakoff, George (1987): Women, fire, and dangerous things. What categories reveal about the mind. - Chicago, London: The University of Chicago Press.
Langacker, Ronald W. (1987): Foundations of cognitive grammar. Vol. 1: Theoretical prerequisites. - Stanford: Stanford University Press.
Langacker, Ronald W. (1991): Concept, image, and symbol. The cognitive basis of grammar. - Berlin, New York: Mouton de Gruyter (= Cognitive Linguistics Research 1).
Lewandowska-Tomaszczyk, Barbara (1988): Universal concepts and language-specific meaning. - In: M. Snell-Hornby (ed.): ZüriLEX '86 Proceedings. Papers read at the EURALEX International Congress, University of Zürich, 9-14 September 1986 (Tübingen: Francke) 17-26.
Lipka, Leonhard (1990): An outline of English lexicology. Lexical structure, word-semantics, and word-formation. - Tübingen: Niemeyer (= Forschung und Studium Anglistik 3).
Marchand, Hans (1960; 2nd ed. 1969): The categories and types of present-day English word-formation. A synchronic-diachronic approach. - Munich: Beck.
Mettinger, Arthur (1988): Negativpräfixe im Englischen: Opposition oder Negation? - In: K. Hyldgaard-Jensen, A. Zettersten (eds.): Symposium on Lexicography III. Proceedings of the Third International Symposium on Lexicography May 14-16, 1986 at the University of Copenhagen (Tübingen: Niemeyer) (= Lexicographica. Series Maior 19) 485-501.
Mettinger, Arthur (1990): Oppositeness of meaning, word-formation, and lexicography: the English prefix un-. - In: J. Tomaszczyk, B. Lewandowska-Tomaszczyk (eds.): Meaning and lexicography (Amsterdam, Philadelphia: Benjamins) (= Linguistic and Literary Studies in Eastern Europe 28) 93-112.
Mettinger, Arthur (1994): Aspects of semantic opposition in English. - Oxford: Clarendon Press (= Oxford Studies in Lexicography and Lexicology).
Radden, Günter (1992): The cognitive approach to natural language. - In: M. Pütz (ed.): Thirty years of linguistic evolution. Studies in honour of René Dirven on the occasion of his sixtieth birthday (Philadelphia, Amsterdam: Benjamins) 513-541.
Schelbert, Tarcisius (1988): Dictionaries - too many words? - In: M. Snell-Hornby (ed.): ZüriLEX '86 Proceedings. Papers read at the EURALEX International Congress, University of Zürich, 9-14 September 1986 (Tübingen: Francke) 63-70.


Snell-Hornby, Mary (1984): The bilingual dictionary - help or hindrance? - In: R. R. K. Hartmann (ed.): LEXeter '83 Proceedings. Papers from the International Conference on Lexicography at Exeter, 9-12 September 1983 (Tübingen: Niemeyer) (= Lexicographica. Series Maior 1) 274-281.
Taylor, John R. (1992): Old problems: Adjectives in cognitive grammar. - In: Cognitive Linguistics 3/1, 1-35.

Viggo Hjørnager Pedersen

Hans Christian Andersen and Bilingual Lexicography

The title of this paper may seem somewhat contrived. What have Hans Christian Andersen and bilingual lexicography to do with each other? Precious little at the moment, I must confess; and I might add that this is at least one reason why so many mediocre translations of Andersen still appear, even though most translation problems in the tales have been tackled several times in the very many English translations that have appeared since the start in 1846. But as I happen to be interested both in Andersen and in Danish/English lexicography, I have found it interesting to see how far these two areas of research may stimulate each other.

Andersen in English

As most of us are aware, Andersen is one of the most widely translated writers in the world, a household name for Indians and Chinese no less than for Germans and Englishmen. He has been translated repeatedly into most European languages: there are about 30 more or less complete English translations of the Eventyr og Historier¹, and many more of the most popular tales. As I have argued in an earlier paper (Pedersen 1984), Andersen is indeed part of the English literary tradition, as of that of many other languages. However, it is a relatively small part of his oeuvre which is kept in print; and often this is only available in translations which leave a lot to be desired. We are not here concerned with the sociological and educational reasons for this. But even for TL-oriented, conscientious translators, Andersen is very difficult to handle. Much of his language is his own 'parole' rather than standard Danish 'langue', and he is full of specific references, puns and stylistic peculiarities which may well escape even conscientious and well-qualified non-Danish translators, and which are very difficult to render once they are understood. Moreover, Andersen has certain words and phrases that he uses again and again, frequently with ironic reference to earlier texts.

Therefore, for a start it would be a great help for students of Andersen, including translators, to have a complete Andersen vocabulary in a concordance format. But it seems to be equally obvious that modern translators of the texts should draw on the efforts of their predecessors - in a rather more systematic fashion than at present, where one has the impression that many new translations are made with reference to one or two older ones, picked more or less at random, if they are not simply unacknowledged revisions of such texts.
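As a rough illustration of what a concordance format involves, the following Python sketch builds a keyword-in-context listing for a short passage. It is only a sketch under stated assumptions: the function names `tokenize` and `kwic`, the context width of three words, and the choice of sample text (the opening of the English page in Fig. 1) are all illustrative and do not describe any existing Andersen concordance.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-cased word tokens.

    The pattern accepts any Unicode letter, so Danish forms such as
    'sværme' stay intact, and it keeps word-internal apostrophes.
    """
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())


def kwic(text, width=3):
    """Map each word form to a list of (left, keyword, right) contexts.

    Each context holds up to `width` tokens on either side of the keyword.
    """
    tokens = tokenize(text)
    index = defaultdict(list)
    for i, token in enumerate(tokens):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        index[token].append((left, token, right))
    return index


# Sample text: the opening of the English page shown in Fig. 1.
sample = (
    "There was a student, who lived in the attic and didn't own anything. "
    "There was also a grocer, who lived on the ground floor and owned "
    "the whole house, and the nisse stuck to the grocer."
)

concordance = kwic(sample, width=3)
for left, word, right in concordance["grocer"]:
    print(f"{left:>25} | {word} | {right}")
```

A full concordance would also record the tale and the position of each occurrence, so that Andersen's recurring words and phrases could be traced from one text to another.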
When a serious literary student or translator is in doubt as to the precise meaning of a detail, there should be available, over and above an editorial note, a list of possible solutions attempted by earlier translators. In an earlier paper (Pedersen 1987) I have proposed a bilingual annotated variorum edition of Andersen. Fig. 1 shows how the English page might look - without the notes, of course. However, such a plan is not incompatible with other ways of treating the material, and the production of such works as well as of dictionaries proper may nowadays be facilitated by means of data processing.
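One way in which such a list of earlier solutions might be held in machine-readable form is sketched below in Python. The layout is purely hypothetical: the names `Variant`, `APPARATUS`, `solutions_for` and `by_translator` are invented for illustration, and the sample readings are copied from the apparatus of Fig. 1, with the translators' sigla as printed there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One translator's rendering of a passage."""
    reading: str      # the wording chosen
    translator: str   # siglum as printed in the apparatus of Fig. 1


# Hypothetical layout: (line number in Fig. 1, headword) -> recorded readings.
APPARATUS = {
    (6, "pat"): [
        Variant("pat", "H."),
        Variant("lump", "Brækstad"),
        Variant("piece", "Mrs. Paul"),
    ],
    (12, "Good evening"): [
        Variant("Good evening", "H."),
        Variant("Good night", "Brækstad"),
    ],
    (21, "eightpence"): [
        Variant("eightpence", "Peachey"),
        Variant("eight pennies", "H."),
        Variant("two groschen", "Dulcken"),
        Variant("sixpence", "Kingsland"),
    ],
}


def solutions_for(headword):
    """Return every recorded reading for a headword, whatever its line."""
    found = []
    for (_line, word), variants in APPARATUS.items():
        if word == headword:
            found.extend(variants)
    return found


def by_translator(siglum):
    """Return (line, headword, reading) triples for one translator."""
    return [
        (line, word, v.reading)
        for (line, word), variants in sorted(APPARATUS.items())
        for v in variants
        if v.translator == siglum
    ]


for v in solutions_for("eightpence"):
    print(f"{v.reading:15} {v.translator}")
```

Keying the readings by line and headword keeps them tied to a place in the text, while the second lookup shows that the same data can be read across the whole text for a single translator.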

1) Most English Andersen editions are incomplete, notable exceptions being those of Jean Hersholt (1942-47) and Erik Haugaard (1985). I am here referring to editions containing between a third and 100 % of the Eventyr og Historier. Considering that some of the late stories in particular are by no means found in all Danish editions either, such a selection will give a high proportion of all well known tales and stories.


Fig. 1

THE NISSE AT THE GROCER'S

There was a student, who lived in the attic and didn't own anything. There was also a grocer, who lived on the ground floor and owned the whole house, and the nisse stuck to the grocer, for every Christmas Eve it was the grocer who could afford him a bowl of porridge with a big pat of butter in it. So the nisse stayed in the grocery shop, and that was very educational.

One evening the student came in by the back door to buy some candles and cheese. He had no one to send, and that's why he came himself. He got what he came for, paid for it, and the grocer and his wife nodded "Good evening." She was a woman who could do more than just nod, she had the gift of the gab. The student nodded, too, but while he was reading something on the piece of paper which was wrapped around his cheese, he suddenly stopped. It was a page torn out of an old book that ought never to have been torn up, an old book full of poetry.

"There's more of it," said the grocer. "I gave an old woman a few coffee beans for it. If you will give me eightpence, Sir, you shall have the rest."

1. The Nisse at the Grocer's The Goblin and the Grocer H.; The Brownie at the Butterman's Brækstad; The Goblin and the Huckster Dulcken, Sievers; The Goblin at the Provision-Dealer's Kingsland.
2. student H.; real student Peachey; proper student Kingsland; (He was) a student of the good old sort Brækstad.
2. attic Peachey; garret H.
2. owned H.; possessed Brækstad;
6. porridge H.; plum porridge Dulcken; gruel Peachey; jam Mrs. Paul.
6. pat H.; lump Brækstad; piece Mrs. Paul
7. stayed H.; settled down in Brækstad.
7. educational H.; where there was much to learn Brækstad; and was right comfortable there Peachey; which was very cunning of him Mrs. Paul.
11. and that is why H., Dulcken; so Brækstad.
11. got what he came for H.; procured what he wanted Dulcken.
12. Good evening H.; Good night Brækstad.
13. gift of the gab Kingsland; she was gifted with a glib tongue H.; she had an unusual gift of speech Brækstad.
17. torn up Dulcken; put to this purpose H.; torn to pieces Brækstad.
21. eightpence Peachey; eight pennies H.; two groschen Dulcken; sixpence Kingsland.

The Odense-Copenhagen Corpus and its Uses

Over the last few years, the Danish standard edition of the tales by Erik Dal has been copied and stored in Macintosh format at the H.C. Andersen Center in Odense, and the Jean Hersholt translation at the Center for Translation Studies and Lexicography in Copenhagen. The purpose has been to make it possible to prepare educational machine-readable editions of the various tales. But obviously the material has other possible uses, and will prove increasingly useful as it is enlarged and expanded, now by means of optical scanning.² Until recently, the texts were typed in, a slow and laborious process, and one which must be followed up by proofreading - but then again, so must scanning. The texts are also available, with 4 other Andersen texts, on the CD-ROM "Magnus", issued by the Royal Library. The system used is a poor man's WordCruncher, but it does allow you to search for individual words and words in combination, and to look at more extensive contexts - cf. fig. 2, which shows a search word, a list of mini-contexts, and one search word in its full context.

In the following, I shall describe, and try to demonstrate, a number of different but related uses of the corpus, and so, I hope, eventually approach a discussion of the connection between Andersen and lexicography.

2) At the H.C. Andersen Center in Odense more of Andersen's Danish texts are to be made available. At the Translation Centre in Copenhagen, work has begun on Dulcken's 19th century translation, which should be ready before the end of the spring term 1995.

Fig. 2 Filnavn:

HCA_EV    WCView

[F1]: Se sidste oversigtsliste    [F2]: Kombiner flere ord    [Enter]: Find dette ord    Skriv: Det ord du søger

Søgeinterval: Inaktiv

VÆLG ORD
sværere 3
sværeste 4
sværger 1
sværm 5
sværme 6
sværmeriske 1
svært 22
sværte 2

OVERSIGTSLISTE

Sneedronningen. (1845), Anden Historie. En lille, 3
fygede Sneen. "Det er de hvide Bier, som sværme," sagde den gamle Bedstemoder. "Har de ogsaa en Bidronning?" spurgte den lille Dreng, for han vidste, at

Sneedronningen. (1845), Anden Historie. En lille, 5
imellem de virkelige Bier er der saadan een. "Det har de!" sagde Bedstemoderen. "Hun flyver der, hvor de sværme tættest! hun er størst af dem alle, og aldrig bliver hun stille paa Jorden, hun flyver

En Historie fra Klitterne. (1860), Tekst, 141
selv der vilde sluge dem; Skyerne kastede Skygge over vandet, og igjen kom blikende Solstraaler; skrigende Fugle, i store Sværme, foer hen over ham, og Vildænderne, der tunge og søvnige lod sig drive paa Vandet, fl