Turning a Bilingual Dictionary into a Lexical-Semantic Database [Reprint 2014 ed.] 9783110920116, 9783484309791

This volume addresses the complex problem of the use and exploitation of bilingual lexical resources available in machin


English Pages 344 [348] Year 1997

Table of contents :
Abbreviations
Mel’čuk’s lexical functions and lexical-semantic relations
Acknowledgements
1 General introduction
2 Acquisition of lexical and co-occurrence knowledge: an overview
2.1 Lexical acquisition and machine-readable dictionaries
2.2 Defining collocations
2.2.1 Lexical collocations
2.2.2 Grammatical collocations
2.2.3 Support verbs
2.3 Acquiring lexical knowledge from textual corpora
2.4 Conclusion
3 A few collocational dictionaries
3.1 Introduction
3.2 The BBI Dictionary
3.3 Selected English Collocations (SEC)
3.4 English Adverbial Collocations (EAC)
3.5 Langenscheidts Kontextwörterbuch Französisch-Deutsch
3.6 Conclusion
4 Pustejovsky’s Generative Lexicon
4.1 The theory
4.1.1 Argument structure
4.1.2 Event structure
4.1.3 Qualia structure
4.2 Conclusion
5 Meaning-Text Theory and the Explanatory Combinatory Dictionary
5.1 Introduction
5.2 The Explanatory Combinatory Dictionary
5.3 Structure of an ECD entry
5.3.1 Introductory Zone
5.3.2 Semantic Zone
5.3.3 Syntactic Zone
5.3.4 Lexical Functions Zone
5.3.4.1 Standard Lexical Function
5.3.4.2 Paradigmatic vs. syntagmatic relations
5.3.4.3 Compound lexical functions
5.3.4.4 Fused lexical functions
5.3.4.5 List of standard lexical functions
5.3.4.6 Subscripted modifiers
5.3.4.7 Paraphrasing rules
5.4 Meaning-Text Theory and Natural Language Processing
5.5 Conclusion
6 Constructing a database from the Collins-Robert Dictionary
6.1 Introduction
6.2 The Collins-Robert Dictionary
6.3 The Collins-Robert metalinguistic apparatus
6.3.1 Part of speech of the source item
6.3.2 Meaning equivalents, explanations and micro-definitions
6.3.3 Subject field codes
6.3.4 Grammar notes
6.3.5 Selection restrictions
6.4 Metalanguage and lexical functions
6.5 Collocations, terminology and lexical functions
6.6 A relational database
6.6.1 Structure of the database
6.6.2 Modifying the metalinguistic apparatus
6.6.3 Enriching the database with lexical-semantic information
6.6.4 Retrieving information from the database: application programs
7 Defining formulae and lexical functions
7.1 Introduction
7.2 Mult
7.2.1 body of
7.2.2 group of
7.2.3 set of
7.2.4 series of
7.2.5 list of
7.2.6 cluster of
7.2.7 number of
7.2.8 A note on “collective”
7.3 Sing
7.3.1 piece of
7.3.2 a member of
7.3.3 flash of
7.3.4 act of
7.4 Sinstr
7.5 Caus/Perm
7.5.1 cause to/allow to
7.5.2 make + adj
7.6 A0
7.6.1 concerning
7.6.2 of the/of a
7.6.3 made of
7.7 Aj
7.7.1 having + N
7.7.2 causing + N
7.7.3 lacking + N
7.7.4 who + V
7.7.5 without
7.8 Anti
7.8.1 not
7.8.2 lack of
7.8.3 as opposed to
7.8.4 no longer
7.9 Contr
7.10 Son
7.10.1 sound of/noise of
7.11 Incep
7.11.1 become
7.11.2 grow
7.11.3 get
7.11.4 turn
7.12 Smod
7.12.1 way of + V-ing
7.13 Able
7.13.1 meant to
7.13.2 fit to
7.13.3 can be
7.13.4 able to
7.13.5 capable of
7.14 Perf
7.14.1 thing + Ven
7.15 Magn//
7.16 Centr
7.17 Conclusion
8 A few suggestions towards the creation of additional functions
8.1 Introduction
8.2 Unit
8.3 Part
8.3.1 part of
8.3.2 branch of
8.4 Child/Parent
8.5 Male/Female
8.6 Process
8.7 Telic
8.8 Spec
8.9 Conclusion
9 Assigning lexical functions: tests and consistency
9.1 Introduction
9.2 IncepPredMinus vs. FinFunc0
9.3 Culm vs. Centr
9.4 Telic
9.5 Real vs. AntiReal
9.6 Morphological clues
9.6.1 Up/Down
9.6.2 Off
9.6.3 Away
9.7 Conclusion
10 A closer look at the lexical function Son
10.1 Introduction
10.2 Phonaesthesia and onomatopoeia
10.3 Sound verbs in the Collins-Robert dictionary
10.4 Analyzing regularities
10.5 Final clusters
10.6 Conclusion
11 Noun alternations and sense extensions
11.1 Introduction
11.2 Lexical Implication Rules
11.3 LIRs and the bilingual dictionary
11.3.1 Mass/Count alternation
11.3.2 The Animal -> Fur LIR
11.3.3 The Fruit/Flower of plant -> Plant LIR
11.3.4 The Container -> Contents LIR
11.4 Conclusion
12 Transitivity alternations
12.1 Introduction
12.2 The causative/inchoative alternation
12.3 Ergative verbs and the Collins-Robert dictionary
12.4 Ergativity and translation
12.4.1 No modification
12.4.2 Pronominalization
12.4.3 Causative operator: Faire + infinitive
12.4.4 Causative operator: Rendre + adjective
12.4.5 Passive construction
12.5 Automatic vs. semi-automatic acquisition of ergative verbs
12.6 Ergative verbs and lexical functions
12.7 Conclusion
13 Metaphors and lexical functions
13.1 Introduction
13.2 Lakoff & Johnson’s Metaphors We Live By
13.3 Lexical functions
13.3.1 Mult
13.3.2 Basic shapes and lexical functions
13.4 Conclusion
14 Pedagogical applications
14.1 Introduction
14.2 Teaching collocations in a business English class: an experiment
14.3 Towards CALL exercises
14.3.1 Studying the collocational potential of a given base
14.3.2 Recovering the base
14.3.3 Studying lexical functions
14.4 Improving reference works
14.5 Conclusion
15 General conclusions
16 Appendices
16.1 Appendix A: The Son lexical function (sample list)
16.2 Appendix B: Verbs of sound (in reverse alphabetical order)
16.3 Appendix C: The Mult lexical function (sample list)
16.4 Appendix D: The Culm lexical function
16.5 Appendix E: The Centr lexical function
16.6 Appendix F: The Sloc lexical function (sample list)
16.7 Appendix G: The Oper1 lexical function (sample list)
16.8 Appendix H: The Liqu lexical function (sample list)
16.9 Appendix I: The noun PRICE and lexical functions
16.10 Appendix J: List of ergative verbs
17 Résumé
18 Zusammenfassung
19 Bibliography
20 Index

Author / Uploaded
Thierry Fontenelle


LEXICOGRAPHICA
Series Maior

Supplementary Volumes to the International Annual for Lexicography
Suppléments à la Revue Internationale de Lexicographie
Supplementbände zum Internationalen Jahrbuch für Lexikographie

Edited by Sture Allén, Pierre Corbin, Reinhard R. K. Hartmann, Franz Josef Hausmann, Ulrich Heid, Oskar Reichmann, Ladislav Zgusta

79

Published in cooperation with the Dictionary Society of North America (DSNA) and the European Association for Lexicography (EURALEX)

Thierry Fontenelle

Turning a Bilingual Dictionary into a Lexical-Semantic Database

Max Niemeyer Verlag Tübingen 1997

Die Deutsche Bibliothek - CIP-Einheitsaufnahme

[Lexicographica / Series maior] Lexicographica : supplementary volumes to the International annual for lexicography / publ. in cooperation with the Dictionary Society of North America (DSNA) and the European Association for Lexicography (EURALEX). Series maior. - Tübingen : Niemeyer. Früher Schriftenreihe. Reihe Series maior zu: Lexicographica

79. Turning a bilingual dictionary into a lexical semantic database. - 1997

Turning a bilingual dictionary into a lexical semantic database / Thierry Fontenelle. - Tübingen : Niemeyer, 1997 (Lexicographica : Series maior ; 79)

ISBN 3-484-30979-2

ISSN 0175-9264

© Max Niemeyer Verlag GmbH & Co. KG, Tübingen 1997

This work, including all of its parts, is protected by copyright. Any use outside the narrow limits of copyright law without the publisher's consent is prohibited and punishable. This applies in particular to reproductions, translations, microfilming, and storage and processing in electronic systems.

Printed in Germany.
Printing: Weihert-Druck GmbH, Darmstadt
Binding: Industriebuchbinderei Hugo Nadele, Nehren


Abbreviations

Dictionaries

AHD: American Heritage Dictionary (1991)
BBI: BBI Combinatory Dictionary of English (Benson et al. 1986a)
CIDE: Cambridge International Dictionary of English (Procter 1995)
Cobuild: Collins Cobuild English Language Dictionary (Sinclair 1987a)
CR: Collins-Robert E-F, F-E Dictionary (Atkins & Duval 1978)
DCFC: Dictionnaire de collocations français-chinois (Liang, in press)
EAC: English Adverbial Collocations (Kozlowska 1991)
LDOCE: Longman Dictionary of Contemporary English (Procter 1978)
LKWB: Langenscheidts Kontextwörterbuch Französisch-Deutsch (Ilgenfritz et al. 1989)
LOLEX: Longman Lexicon of Contemporary English (McArthur 1981)
OALD: Oxford Advanced Learner's Dictionary of English (Cowie 1989)
ODCIE: Oxford Dictionary of Current Idiomatic English (Cowie 1983)
OH: Oxford-Hachette E-F, F-E Dictionary (Corréard & Grundy 1994)
SEC: Selected English Collocations (Kozlowska & Dzierzanowska 1993)
W7: Webster's Seventh New Collegiate Dictionary (1963)

Miscellaneous

AI: artificial intelligence
ASCII: American Standard Code for Information Interchange
CALL: computer-assisted language learning
CEC: Commission of the European Communities
CL: computational linguistics
DBMS: database management system
ECD: Explanatory Combinatory Dictionary
EFL: English as a Foreign Language
ELRA: European Language Resources Association
ESL: English as a Second Language
EU: European Union
Euralex: European Association for Lexicography
GP: Government Pattern
IR: Information Retrieval
ISA: is a member of the class of
LCP: Lexical Conceptual Paradigm
LDB: lexical database
LE: Language Engineering (CEC)
LF: lexical function
LIR: lexical implication rule
LKB: lexical knowledge base
LRE: Linguistic Research and Engineering (CEC)
LSP: Language for Special Purposes
MI: mutual information
MLAP: Multilingual Action Plan (CEC)
MLIS: Multilingual Information Society (CEC)
MRD: machine-readable dictionary
MT: machine translation
MTD: machine-tractable dictionary
MTT: Meaning-Text Theory
NLP: natural language processing
POS: part of speech
RDBMS: relational database management system
SGML: Standard Generalized Markup Language
s.v.: sub voce
TLA: three-letter abbreviation

Mel'čuk's lexical functions and lexical-semantic relations

A0: adjective. Example: A0 (law) = legal
Ai: typical modifier for ith actant. Example: A1 (surprise) = surprised
Ablei: adjective for capability of ith actant. Example: Able1 (read) = literate
Adv0: adverb. Example: Adv0 (honest) = honestly
Advi: adverbial for ith actant. Example: Adv1 (dismay) = in dismay
Anti: antonym. Example: Anti (like) = dislike
Bon: 'good' (expression of praise). Example: Bon (advice) = sound
Cap: leader/chief. Example: Cap (school) = head
Caus: cause. Example: Caus (rise) = raise
CausPredMinus: cause to decrease. Example: CausPredMinus (price) = drop
CausPredPlus: cause to increase. Example: CausPredPlus (price) = increase
Centr: centre/middle. Example: Centr (problem) = crux
Cont: continue. Example: ContOper1 (influence) = maintain
Contr: contrastive term. Example: Contr (heaven) = earth
Convij: converse term. Example: Conv21 (more) = less
Culm: culmination. Example: Culm (anger) = paroxysm
Degrad: degrade, get worse. Example: Degrad (milk) = go/turn sour
Equip: team, crew. Example: Equip (hospital) = staff
Excess: function excessively. Example: Excess (heart) = palpitate
Fact0,i: be realized. Example: Fact0 (dream) = come true
Figur: standard metaphor. Example: Figur (smoke) = cloud
Fin: cease, stop. Example: FinOper1 (influence) = lose
Func0,i: nearly empty verb (keyword = subj.). Example: Func0 (silence) = reign
Gener: superordinate. Example: Gener (anger) = feeling
Germ: germ, core. Example: Germ (evil) = root
Imper: order, command. Example: Imper (silence) = shut up!
Incep: begin. Example: IncepFunc0 (war) = break out
IncepPredMinus: start to decrease. Example: IncepPredMinus (price) = fall
IncepPredPlus: start to increase. Example: IncepPredPlus (price) = skyrocket
Instr: typical preposition (≈ with the help of). Example: Instr (car) = by
Involv: keyword = subject. Example: Involv (smell) = fill [the room]
Laborij: nearly empty verb (ith actant = subject; jth actant = direct obj.). Example: Labor12 (consideration) = take into
Liqu: liquidate, delete. Example: Liqu (disease) = eradicate
Locab/ad/in: locative prepositions. Example: Locin (list) = on
Magn: intensifier. Example: Magn (bachelor) = confirmed
Manif: be manifest. Example: Caus1Manif (opinion) = express
Minus: less. Example: IncepPredMinus (price) = fall
Mult: regular group/set. Example: Mult (dog) = pack
Nocer: damage, attack. Example: Nocer (mosquito) = bite
Obstr: function with difficulty. Example: Obstr (voice) = falter
Oper1,2: nearly empty verb (keyword = obj.). Example: Oper1 (attention) = pay
Pejor: worse. Example: CausPredPejor (prospect) = darken
Perf: perfective (completed action). Example: S1Perf (marry) = spouse
Perm: permit. Example: Perm1Fact0 (passion) = succumb to
Plus: more. Example: IncepPredPlus (price) = skyrocket
Posi: positive evaluation of ith actant. Example: Pos2 (opinion) = favorable
Pred: predicate (to be). Example: Pred (actor) = act
Prepar: prepare. Example: PreparFact0 (rifle) = load
Propt: typical preposition (≈ because of). Example: Propt (fear) = for
Prox: on the verge of. Example: ProxFunc0 (storm) = approach
Quali: Ablei + highly probable. Example: Qual1 (deceive) = deceitful
Real1,2: satisfy the requirements of. Example: Real1 (promise) = keep
Result: result of an event. Example: Result (learn) = know
S0: noun. Example: S0 (die) = death
Si: typical noun for ith actant. Example: S1 (murder) = murderer
Sinstr: typical instrument. Example: Sinstr (paint) = brush
Sloc: typical place. Example: Sloc (lion) = den
Smed: typical means. Example: Smed (write) = ink
Smod: typical mode. Example: Smod (write) = handwriting
Sres: typical result. Example: Sres (copy v) = copy n
Sing: regular 'portion'. Example: Sing (rice) = grain
Son: typical sound. Example: Son (elephant) = trumpet
Sympt: physical symptoms. Example: Degrad (speech) + Sympt23 (surprise) = speechless
Syn: synonym. Example: Syn (help) = aid
V0: verb. Example: V0 (advice) = advise
Ver: as it should be. Example: Ver (excuse) = legitimate

Other lexical-semantic relations

Unit: unit of. Example: Unit (gravity) = g
Part: part of. Example: Part (table) = leg
Child: child/young of. Example: Child (horse) = foal
Parent: parent of. Example: Parent (lamb) = sheep
Male: male of. Example: Male (pig) = boar
Female: female of. Example: Female (elephant) = cow
Process: typical noun for process. Example: S0Process (apply) = application
Telic: relation between instrument and verb. Example: Telic (rubber) = erase (see Sinstr)
Spec: specific term (hyponym). Example: Spec (flower) = rose, tulip... (see Gener)
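The notation used in this list, F (keyword) = value, can be read as a lookup from a function name and a keyword to one or more values. The following Python fragment is only a minimal sketch of that reading, not the representation used in the book's database: the triples are copied from the list above, while the names `LEXICAL_FUNCTIONS`, `build_index` and `apply_lf` are hypothetical and introduced purely for illustration.

```python
from collections import defaultdict

# (function, keyword, value) triples copied from the list above.
# Compound functions such as IncepPredMinus are treated as opaque labels here.
LEXICAL_FUNCTIONS = [
    ("Mult", "dog", "pack"),
    ("Sing", "rice", "grain"),
    ("Son", "elephant", "trumpet"),
    ("Magn", "bachelor", "confirmed"),
    ("Anti", "like", "dislike"),
    ("Oper1", "attention", "pay"),
    ("IncepPredMinus", "price", "fall"),
    ("IncepPredPlus", "price", "skyrocket"),
    ("CausPredMinus", "price", "drop"),
    ("CausPredPlus", "price", "increase"),
]


def build_index(triples):
    """Index the triples by (function, keyword) pair and by keyword alone."""
    by_pair = defaultdict(list)
    by_keyword = defaultdict(list)
    for function, keyword, value in triples:
        by_pair[(function, keyword)].append(value)
        by_keyword[keyword].append((function, value))
    return by_pair, by_keyword


def apply_lf(by_pair, function, keyword):
    """Return every value recorded for function(keyword), or an empty list."""
    return list(by_pair.get((function, keyword), []))


by_pair, by_keyword = build_index(LEXICAL_FUNCTIONS)
```

With this index, `apply_lf(by_pair, "Mult", "dog")` returns `["pack"]`, and `by_keyword["price"]` gathers the four functions recorded for the noun price, which is the kind of keyword-centred view that Appendix I of the table of contents announces.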

Acknowledgements

Writing a book such as this one is hardly ever the result of purely individual work, not only because the writer usually benefits from discussions with supervisors, colleagues and friends, but also because he usually has to rely on a number of people for practical or moral support to bring his research to a successful end. In some cases, however, it is not possible to thank everyone individually and viva voce, especially when some of the most influential protagonists have passed away.

I wish to express my gratitude to the late Professor Jacques Noël and to my father, who both departed this life while I was preparing this book. By teaching me linguistics and lexicography and subsequently offering me the opportunity of working with him at the University of Liège, Jacques Noël persuaded me to embark on the exploitation of the Collins-Robert dictionary. Without his unflagging energy and contagious enthusiasm, without his unfailing support in the early stages, this work would never even have started. My father's role was undoubtedly different but, by constantly setting himself new goals to achieve and by obstinately refusing self-satisfaction, he was, and still is, a shining example to me.

I was lucky enough to benefit from fruitful discussions with and critical comments by Professor André Moulin, Professor Anthony P. Cowie, Professor Siegfried Theissen, Professor Ladislav Zgusta, Dr Ulrich Heid and Dr Michel Kefer and I wish to thank them all. I am also especially indebted to Professor Archibald Michiels who taught me how to program in Clipper. His contribution is not limited to the programming skills he inculcated into me, however, or to the critical remarks he passed about an earlier version of the manuscript, and the numerous references to his dissertation and papers are only the tip of the iceberg of inspiration he was to me. The usual disclaimer applies, however, since I am solely responsible for errors and inconsistencies that may remain.

A similar vote of thanks goes to Jacques Jansen who devoted a lot of energy to the construction of the Collins-Robert database. Nothing would have been possible without the meticulous care he took in analyzing the typesetting tapes of the dictionary. Luc Alexandre has also been most helpful, writing some of the retrieval programs used in this work. His efforts are especially appreciated. Thanks are also due to the publishers for granting access to the tapes of the Collins-Robert dictionary for research purposes. Editorial and clerical assistance from Jean Robertson, Anne-Michèle Debruche, Cécile Vanoorschot, Gisèle Fontenelle and Georges Delbrouck was also invaluable, especially given the time constraints under which I had forced them to work. I must also thank other members of my former department in Liège, in particular Claire Gérardy and Luc Thomas, for their support and comments. I am also grateful to my colleagues Carlo Mergen and Achim Blatt for helping me with the German summary.

Finally, I want to thank my wife, Cécile, who suffered uncomplainingly and tried to put up with my idiosyncratic behaviour during the genesis of this book. Her contribution extended much beyond typing the first draft and keyboarding around 40,000 lexical-semantic labels. Her unfailing support defies any attempt at conventional expression. With the help of our family, she took on the parental responsibilities I was too often prone to shirk, especially in the final stages of the project, and managed to provide me with the encouragement and impetus I needed so sorely. This book is dedicated to her and to our two sons who, I hope, have not resented my unavailability too much.

Ambition: Toujours précédé de folle quand elle n'est pas noble.
Défaite: S'essuie, et elle est tellement complète qu'il ne reste personne pour en porter la nouvelle.
Dictionnaire: En dire: "N'est fait que pour les ignorants." - Dictionnaire de rimes: s'en servir? Honteux!
Enterrement: A propos du défunt: "Et dire que je dînais avec lui il y a huit jours!" S'appelle obsèques quand il s'agit d'un général, enfouissement quand c'est celui d'un philosophe.
Question: La poser c'est la résoudre.

Gustave Flaubert, Dictionnaire des idées reçues

1 General introduction

A little over ten years ago, as a B.A. student in Germanic philology at the University of Liège, I had to choose a topic for my final-year thesis and, since I was mainly interested in English language and linguistics, I was left with a Gordian knot which was rather difficult to untie. The choice I was faced with at the time was indeed a pragmatic one. Quite naively, I tended to make a crude distinction between grammar on the one hand and vocabulary or dictionaries on the other. In the course of my studies, I had experienced a number of problems in trying to find my way about grammars. I had always had difficulty finding solutions to very specific problems of word order, modal auxiliaries or determiners. The non-alphabetical ordering of the problems in a traditional grammar was of course to blame (I did not feel blameworthy at all) and having to resort to an index at the end of a grammar book was felt to be a real nuisance. Short indexes are usually of no use at all whereas a 100-page index such as can be found in Quirk et al. (1985) is so detailed that many students consider it tedious to have to leaf through it to find an answer to a concrete problem. The seemingly hopeless quandary I was in was then solved fairly quickly. I turned to dictionaries, quite naturally. My conception of dictionaries and of lexicography in general, was a rather simplistic one. Since dictionaries are usually organized alphabetically, I thought, accessing information is trivial (at the time, I had known the alphabet for around 20 years) and the only interesting problems for a linguist are probably those that are related to the macrostructure of a dictionary (which entries are included? which are not? why? etc.). Dictionaries were therefore a safe choice. The information I needed was buried in a dictionary entry and I was sure to find it quickly simply by looking up the relevant entry, without having to consult the dreaded index of a grammar. 
My then supervisor, the late Professor Noël, advised me to read a Ph.D. dissertation which had been written a few years before by a member of the English Department of the University of Liège, Dr Michiels (see Michiels 1982). This dissertation, which dealt with the exploitation of the machine-readable version of the Longman Dictionary of Contemporary English (Procter 1978, henceforth LDOCE) offered me the opportunity of getting acquainted with the emerging field of what had come to be known as computational lexicography and with dictionaries and natural language processing (NLP) in general. Part of my naivety then began to disappear as I realized that there is more to dictionaries than meets the eye. After writing my B.A. thesis on a simulated application of LDOCE in a machine translation (MT) perspective, I then joined the Eurotra team in Liège and availed myself of the opportunity to learn a little more about dictionaries in general and computational lexicography in particular.

In the meantime, Professor Noël had signed a contract with Collins Publishers in Glasgow and Dictionnaires Le Robert in Paris and had obtained, for research purposes, a copy of the machine-readable version of the Collins-Robert English-French French-English dictionary (Atkins & Duval 1978, henceforth CR). The CR magnetic tapes differed significantly from the LDOCE material Michiels had used in his dissertation, however, insofar as what had now been made available to the team was basically a machine-readable dictionary (MRD), i.e. a dictionary which had been encoded in machine-readable form for typesetting purposes. It was not, like LDOCE, a computerized dictionary, i.e. a dictionary whose organization of information is based on a set of explicit and well-defined conventions, to paraphrase Michiels (1982:7).

To avoid having to transform CR into a database with formatted fields, i.e. explicit locations for specific types of lexical information, Jacques Jansen, the team's computer engineer, took on the task of processing the tapes in order to make them more readily accessible for consultation and research. To give us access to the dictionary files in a fast and efficient way, it was decided to use the Brigham Young University ETC Text Retrieval Software (formerly known as BYU Concordance, a.k.a. WordCruncher), a software package running under MS-DOS. Although access was very fast and enabled the linguist to browse quickly through the contents of the dictionary, we felt this text-retrieval system was inadequate for several reasons which will be dealt with in more detail in this book.

Realizing the importance of the linguistic material we had at our disposal, we decided in 1992 to embark on the construction of a true lexical database whose design and structure would enable linguists, teachers and language students alike to access information from various angles and hence to answer questions few people had ventured to ask so far, for want of appropriate resources. This book is basically concerned with the creation of a large bilingual lexical-semantic database from the Collins-Robert dictionary. Like most projects in computational lexicography, and in lexicography in general, it should not be viewed as a purely individual piece of work but rather as a collaborative effort aimed at paving the way for a new generation of dictionaries.
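The difference between browsing a typesetting text and querying a database with formatted fields can be made concrete with a small sketch. The following Python fragment uses the standard `sqlite3` module (the project itself was programmed in Clipper). Everything in it is an assumption made for illustration: the table name, the column names and the three sample rows are invented and do not reproduce the structure or the contents of the Collins-Robert database.

```python
import sqlite3

# Hypothetical columns: each type of lexical information gets an explicit location.
COLUMNS = ("headword", "pos", "collocate", "lf", "translation")

SCHEMA = """
CREATE TABLE entry (
    headword TEXT NOT NULL,
    pos TEXT,
    collocate TEXT,
    lf TEXT,
    translation TEXT
)
"""

# Invented sample rows, for illustration only (not taken from the dictionary).
SAMPLE_ROWS = [
    ("trumpet", "vi", "elephant", "Son", "barrir"),
    ("pack", "n", "dog", "Mult", "meute"),
    ("den", "n", "lion", "Sloc", "tanière"),
]


def make_db(rows=SAMPLE_ROWS):
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO entry VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def lookup(conn, column, value):
    """Retrieve rows through any field, not only through the headword."""
    if column not in COLUMNS:
        raise ValueError(f"unknown field: {column}")
    query = (
        "SELECT headword, pos, collocate, lf, translation "
        f"FROM entry WHERE {column} = ? ORDER BY headword"
    )
    return conn.execute(query, (value,)).fetchall()
```

Because every field is an access key, the same table answers `lookup(conn, "headword", "trumpet")`, `lookup(conn, "collocate", "lion")` and `lookup(conn, "lf", "Mult")`, which is one way of reading the phrase "access information from various angles"; an alphabetical text file only supports the first of these directly.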
Thanks to EU funding in the framework of the DECIDE Project (Designing and Evaluating Extracting Tools for Collocations in Dictionaries and Corpora, Multilingual Action Plan MLAP-93/19), Jacques Jansen was able to implement the English-French part of the CR dictionary in database format. Nothing would have been possible without his expertise and the care he took in designing the structure of this database. Luc Alexandre, another computer engineer who joined our team at a later stage of the project, wrote a series of application programs to help the linguist or student to retrieve information from the database in a simple and efficient way. This does not mean I have entrusted colleagues with all the computing aspects of this project, however. The menu-driven front-end for enriching the database with lexical-semantic information and the application program for querying and updating the database are all my own work.

As is frequent in the rapidly-evolving field of computational lexicography, the research project which is the topic of this book has been discussed in a number of published papers and oral presentations at national and international conferences (Fontenelle 1992a,b; 1994a,b; 1996a,b; 1997; Fontenelle et al. 1996). Some aspects have also been dealt with, albeit in lesser detail, in a number of unpublished research reports and deliverables in the framework of the DECIDE project (Fontenelle et al. 1994b). Some of the published papers were written well before the database could be implemented and enriched, however, and could therefore only describe preliminary results. The full rationale of the methodological positions adopted here has never received any detailed treatment and the potential use of the semantic network described in this book has only been alluded to. This work is therefore meant to stand on its own.

At this juncture, I should also like to comment briefly on the quotations this book opens with. The various entries excerpted from Flaubert's brilliant Dictionnaire des idées reçues are, I think, particularly relevant illustrations of the type of material which is going to be dealt with below. Flaubert was primarily concerned with denouncing in a sarcastic and humorous way the pervasive use of stereotyped expressions, clichés and fixed phrases which, he thought, tend to make language uniform and preclude any form of originality. Although it is tempting to agree with this criticism, I cannot help thinking that the types of word combinations and associations the quotations illustrate are precisely the sort of knowledge which has been so neglected in foreign-language teaching so far and which is so crucial to achieve native-like competence.1 Such recurrent word combinations, which are generally known as collocations, do not usually pose any serious problem in the understanding process since any non-native speaker is likely to recognize and understand them. The converse, however, is far from true and, in an encoding perspective, selecting the appropriate term in context is much more difficult and may even be considered as one of the most serious stumbling blocks in language learning. As an element of our lexical knowledge, collocations cannot be accounted for in terms of grammatical rules, which is why linguists such as Aisenstadt (1979), Mackin (1978), Hausmann (1979) or Cowie (1986) argue for the compilation of specialized dictionaries of collocations. As will be made clear in Chapter 3, the past few years have seen the emergence of a range of both bilingual and monolingual collocational dictionaries.
In the field of computational linguistics, much effort has also been put into the extraction and acquisition of co-occurrence knowledge from large bodies of textual material in machine-readable form, or corpora, using sophisticated statistical techniques. Although the Collins-Robert dictionary is not a collocational dictionary, its transformation into a semantically-enriched database in the framework of this project has been motivated by a wish to make its wealth of metalinguistic and collocational knowledge readily accessible. The attempt is therefore fully in line with current efforts to reuse, extract and formalize the explicit or implicit lexical knowledge contained in computerized lexical resources, be they huge corpora or machine-readable dictionaries. Such efforts, which testify to the strong revival of interest in lexical studies over the past 15 years, have led to the development of numerous large-scale, long-term linguistic studies, focussing primarily on the lexical aspects of the linguistic description of natural languages. As is pointed out by Wilks et al. (1996), most of them can be traced back to Amsler's seminal work on the machine-readable version of the Merriam-Webster Pocket Dictionary (Amsler 1980), immediately followed by Michiels's dissertation on the computerized version of LDOCE (Michiels 1982). The various projects which were launched later to investigate the possibility of re-formatting, re-using and extracting the information contained in the computerized versions of lexical resources (dictionaries, term banks, thesauri, etc.) have aimed (and still aim) at exploiting these resources in an NLP perspective. The boom of the information market and the increasing need for translation and other linguistic services have indeed made it necessary to develop various types of NLP products, ranging from machine-translation systems to natural-language front-ends to information retrieval or speech-processing systems. The major stumbling block in the development of such systems, however, is the construction of the large lexicons these systems generally have to draw upon. It is a well-known fact that building these lexicons from scratch is both time-consuming and costly and the idea soon emerged that linguistic information about words could be derived from already existing computerized resources. It should be realized, however, that applications in computational lexicography tend to be limited to NLP and often discard other goals such as language teaching and language learning which seem to be considered less ambitious, perhaps because the computer scientists and computational linguists who research MRDs often tend to be more computer-oriented than teaching-minded. Yet, it should be borne in mind that teaching the behaviour of words to language students and formalizing the syntactic, semantic, pragmatic or collocational properties of lexical items for computers often boil down to describing the same linguistic facts, albeit in an altogether different terminology. The special attention paid by computational lexicographers to the computerized versions of learners' dictionaries (see Boguraev & Briscoe 1989 who devote a whole book to LDOCE) testifies to this absence of clear-cut distinction between computational and pedagogical applications. In a way, this book could also be seen as an attempt to bridge the gap between computational research and pedagogical concerns which continually cross-fertilize each other. A word should also be said about the linguistic framework which I have opted for in this book.

Moreover, this criticism begs a number of crucial questions. Indeed, it is not sure that all multi-word units are clichés and there are fixed phrases we definitely need to be aware of. The main problem is therefore to distinguish the cliché and the non-cliché.
The ultimate purpose was to build a repository of lexical-semantic information by systematically tapping the metalinguistic information contained in the CR dictionary. As will be made clear below, the metalinguistic apparatus of the dictionary is remarkable not only for the sheer amount of data it contains (around 70,000 items appearing in italics in the printed version), but also for the impressive range of lexical-semantic relations it covers. A major part of the project has consisted in making these relations explicit by labelling them appropriately in order to make them computationally accessible. The manifold relations which are illustrated by the metalinguistic material range from selection restriction features to collocational constraints, which is why the enriched version of the dictionary as it stands now can best be seen as a multi-access collocational database. The term "collocational database" does not appear in the title of this book, however. Instead, I have preferred the term "lexical-semantic database" because the metalinguistic material also includes a large amount of non-collocational data consisting mainly of synonyms, hyperonyms, antonyms and various types of derived forms. To perform the systematic enrichment of the lexical database as explained in Chapter 6, it was therefore necessary to look for a model which was powerful enough to cope with the numerous types of lexical-semantic relations holding between the metalinguistic items and the headwords of the dictionary. The approach to the lexicon which was required for the tasks I had in mind was one that uses relational semantics. As noted by Evens (1988:2), relational semantics is one of the three major competing approaches to the study of word meaning. The other two are, on the one hand, componential analysis (see Katz & Fodor 1963) and, on the other, the structural approach which views concepts as semantic fields (see Lehrer 1974).

These approaches are not mutually exclusive, however, and distinctions within fields can be stated in terms of semantic components. Although semantic fields have been most useful in descriptive linguistics, Evens argues, the theoretical framework they provide is too weak for building NLP lexicons, even if one cannot ignore Lyons's (1977) and Cruse's (1986) outstanding contribution to the study of sense relations. Componential analysis is also considered to be inadequate and of limited use when one wishes to use it outside the restricted domains of colour or kinship terms, or personal pronouns (for a useful summary of the criticism levelled against componential analysis, see McCarthy 1991). The relational approach adopted in this book owes a lot to the survey of lexical-semantic relations carried out by Evens et al. (1980) and Evens (1988). This approach tries to make explicit the structural organization of a semantic domain by describing how the elements are related to one another. Evens defines the basic concepts of this approach as follows: "The links that connect the elements of the domain are called lexical or semantic relations. Relations between words are called lexical relations. Relations between concepts are called conceptual or semantic relations. Since words and concepts are inextricably intertwined the phrase lexical-semantic relations is used when it is unnecessary or impossible to make a distinction" (1988:2). Evens et al. (1980) have shown that relational models are found useful by several categories of researchers involved in problems of language, ranging from linguists, of course, to psychologists, computer scientists and anthropologists or ethnographers. One of these models, developed by the Russian linguist Igor Mel'čuk, has been the centre of attention for the past 15 years or so and is the underlying framework adopted in this book.
His theory, called the Meaning-Text Theory (MTT), aims at developing a formal model of language based on a set of rules which convert meanings into corresponding texts and the other way round. The lexical component of the MTT model is called the Explanatory Combinatory Dictionary (ECD), a dictionary which is intended to provide a non-native speaker with a detailed description of everything he or she is expected to know about the vocabulary of a language in order to generate well-formed sentences. More particularly, the key concept which is used in this book is that of lexical functions, the name Mel'čuk uses for lexical relations. Anecdotally, Evens (1988:11) relates how Mel'čuk invented this concept about 30 years ago, when experiencing a gale in the Russian countryside. Reflecting on the fact that, in English, the noun rain combines with heavy while light can be used with bright led him to invent the first lexical function which he called Magn (from Magnitude) to refer to an intensifying meaning. Pursuing this idea, he soon came up with a body of over 50 such lexical functions which are used to label systematic and recurring lexical-semantic relations. As will be shown in the book, the apparatus of lexical functions comes in handy to account for a whole range of collocational phenomena. Moreover, and this is something many linguists tend to forget, lexical functions can also be used to describe purely paradigmatic relations (e.g. hyperonymy, synonymy, etc.). Although there is no consensus as to the number of relations posited by relational models2, it is clear, as I argue in Chapter 5, that Mel'čuk's theory forms a most interesting starting point to perform a systematic semantic labelling of the material contained in the CR dictionary. This theory is certainly not faultless or infallible and some reservations must definitely be expressed about the overgenerality of the lexical functions or about the capacity of the model to cope with fine-grained distinctions of style or register. Such reservations, together with other criticism which has been aptly levelled against the ECD, will be found mainly in Chapter 5 and Chapter 9. The usefulness of this model in NLP is also a moot point when one considers the difficulties computational linguists have had in trying to adopt MTT in machine translation projects (see also Heylen 1993 for a summary of the problems raised by Mel'čuk's framework). Despite all the criticism and reservations of which I am fully aware and which will become apparent in the body of this book, the model provides the linguist with a powerful descriptive tool, not only based on sound theoretical research but also capable of coping with a wide range of lexical-semantic relations, both paradigmatic (on the vertical axis) and syntagmatic (on the horizontal axis), which is precisely the type of material I wanted to retrieve from and formalize in the Collins-Robert dictionary.

Before concluding this general introduction, I now wish to give a general outline of the contents of each chapter of this book. Chapter 2 offers a state-of-the-art survey of current trends in computational lexicography and in lexical semantics. Since the book basically describes an attempt to extract lexical-semantic information from a bilingual MRD, this chapter is mainly concerned with surveying major recent initiatives in the field of lexical knowledge acquisition.

2 Werner (1988) argues that all lexical knowledge can be expressed in terms of only three basic relations (modification, taxonomy and queuing/sequencing) while Ahlswede & Evens (1988a) use more than 100 relations for adjectives only.
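The idea that a lexical function labels a relation between a keyword and its values, both syntagmatic (Magn) and paradigmatic (hyperonymy), can be pictured with a very small data structure. What follows is only a minimal sketch under stated assumptions, not the structure of the Collins-Robert database: the class name, the method names and the use of the label Gener for hyperonymy are illustrative choices, and the three sample entries are limited to pairs mentioned in this book (heavy rain, bright light, a rose is a flower).

```python
from collections import defaultdict


class LexicalFunctionStore:
    """Hypothetical in-memory store of (function, keyword) -> values."""

    def __init__(self):
        self._by_key = defaultdict(set)    # (function, keyword) -> values
        self._by_value = defaultdict(set)  # value -> (function, keyword) pairs

    def add(self, function, keyword, value):
        # Index the relation in both directions so that it can be
        # reached from the keyword or from the value.
        self._by_key[(function, keyword)].add(value)
        self._by_value[value].add((function, keyword))

    def values(self, function, keyword):
        # .get avoids creating empty entries on lookup.
        return sorted(self._by_key.get((function, keyword), set()))

    def keywords_for(self, value):
        return sorted(self._by_value.get(value, set()))


store = LexicalFunctionStore()
store.add("Magn", "rain", "heavy")     # syntagmatic: intensifier
store.add("Magn", "light", "bright")   # syntagmatic: intensifier
store.add("Gener", "rose", "flower")   # paradigmatic: assumed label for hyperonymy

print(store.values("Magn", "rain"))
print(store.keywords_for("bright"))
```

Indexing every relation in both directions is what makes access "from various angles" possible: the same entry can be reached from the base of a collocation or from its collocate.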
The acquisition of collocational knowledge from textual corpora is also broached because the tools and techniques developed here to identify and retrieve collocations from the CR dictionary may fail to give access to central and representative linguistic usages. Since the collocations included in the CR dictionary sometimes prove to be arbitrary and since we have no evidence that they reflect the actual behaviour of words, the description of lexical items will have to be enriched by lexical data extracted from large statistically-analyzed corpora. This is not the main concern of this book but the complementary nature of the two approaches deserves more than a passing remark. Chapter 2 also examines the multiple facets of what has come to be known as collocations, since around 80% of the database consists of co-occurrence knowledge.3 Chapter 3 provides a critical analysis of a few prominent collocational dictionaries and Chapter 4 briefly describes Pustejovsky's theory of the Generative Lexicon, paying special attention to the apparatus of qualia structures which has attracted a lot of attention over the last few years. Chapter 5 deals with the basic concepts which underlie Mel'čuk's Meaning-Text Theory. It focuses on the key notion of 'lexical function' and shows the role it plays in the structure of the explanatory combinatory dictionary. While concentrating on the power of this descriptive model and its capacity to formalize a wide range of general-language lexical relations, it nevertheless expresses a number of reservations about the vagueness or overgenerality of some lexical functions. Next to a detailed list of lexical functions, this chapter also surveys a few current attempts to use MTT in an NLP perspective. Chapter 6 deals with the construction of the database proper. After describing a dictionary entry, it explains the structure of the relational database in minute detail. The application programs for enriching the database with lexical functions and the functionalities of the retrieval software are also dealt with. This chapter also contains a description of the adjustments carried out to make the metalinguistic information readily accessible and computationally more tractable. Chapter 7 is concerned with the use that was made of the dictionary's micro-definitions and defining formulae (i.e. recurring structural patterns in definitions) to semi-automate the assignment of lexical functions and the identification of frequent lexical relations. Chapter 8 tries to show that Mel'čuk's list of lexical functions should be extended to cope with a number of relations which are repeatedly illustrated in the CR dictionary but are not considered as lexical relations proper by Mel'čuk and his colleagues. The part-whole relation, for example, which is of cardinal importance in an information retrieval perspective, plays a crucial part in the database and needs to be dealt with using the lexical function mechanism. The difficulties met with during the assignment of lexical functions and the lack of operational criteria for some functions are addressed in Chapter 9.

The second part of this book deals with a number of potential applications of the bilingual database I have constructed. It should be clear from the start that this list is far from complete and that the material we now have at our disposal opens up exciting vistas for a linguist or a language teacher. Some types of linguistic investigations which are made possible by the database include the study of the realization of particular lexical functions across the whole dictionary. In Chapter 10, for example, the function Son, which accounts for typical sounds and noises, is examined closely with a view to improving on existing descriptions of the relationship between sound and meaning.

The CR material this book deals with also includes various types of derived forms and paradigmatic relations.
Another set of potential applications, which may sound more appealing to the traditional syntactician or semanticist, includes basic research into the structure of the lexicon, the nature of sense extensions and the relation between syntax and semantics. Chapter 11 deals with predictable meaning shifts and what Ostler & Atkins (1991) have called Lexical Implication Rules (LIRs). Examples of such LIRs are shown to be retrievable partly automatically on the basis of some structural properties of the database and of its metalinguistic material. In Chapter 12, I tackle the problem of extracting a subset of verbs (ergatives) from the database. This chapter relates to current efforts to account for the syntactic behaviour of certain classes of verbs in terms of their membership of well-defined semantic classes. Although the need to exploit lexical regularities in the design of natural language processing systems has long been recognized (Katz & Levin 1988, Boguraev 1991), the extraction from bilingual dictionaries of lists of verbs participating in transitivity alternations has, to my knowledge, never been attempted. Nor has there been any attempt to correlate semantico-syntactic properties of verbs with Mel'čukian lexical functions. Capitalizing on recent work in cognitive semantics, I show in Chapter 13 that the Collins-Robert database can be used to establish a correlation between lexical functions on the one hand and metaphorical phenomena on the other. The very organization of the collocational material in database format and the relational-semantic approach adopted in this book are used to bring to the fore a number of metaphors which provide linguistic motivation for a series of seemingly idiosyncratic collocations.

Finally, computer-assisted language learning with a huge repository of semantic information is probably the most straightforward type of application which comes to a teacher's mind. After all, existing collocational dictionaries are still in their infancy and their access mechanisms are most often limited to conventional access paths (alphabetical ordering of the base of the collocation). By providing learners, and teachers alike, with diversified access to collocational data, the pedagogical perspective adopted in Chapter 14 makes it possible to envisage using the CR material both to give students a feel for idiomaticity and to improve existing reference works used in language teaching. The applications envisaged in the second part of this book provide ample evidence that the Collins-Robert database is a most useful resource for linguists, lexicographers and teachers insofar as it helps to shed light on the structure of the lexicon of the English and French languages. It will also be shown that such a computerized dictionary, with the numerous access keys the application programs provide, forms a major type of reference work which definitely ought to be exploited in the production of the future generation of collocational dictionaries and thesauri.

2 Acquisition of lexical and co-occurrence knowledge: an overview

2.1 Lexical acquisition and machine-readable dictionaries

In this chapter, I should like to introduce the basic concepts I will make use of in this book. Since the work described here can actually be seen as a sort of practical exercise in lexical acquisition and representation, it is essential to situate this contribution in the wider context of current research into the use of machine-readable dictionaries and into lexical semantics in general (see Zernik 1991 or Boguraev 1991 for a definition of 'lexical acquisition' and the problems related to it). The main purpose of the project described in this book is the construction of a semantic network from the collocational and paradigmatic knowledge it contains. This means that I have also deemed it necessary to try to clarify this concept of collocation (see section 2.2) and to provide a critical analysis of a few collocational dictionaries (see Chapter 3), which should facilitate a comparison with the results I have obtained. Finally, corpus-based research will also be alluded to since a lot of effort is currently being put into the statistical analysis of large textual corpora in order to extract the very information contained in the CR database. Although the techniques and methods applied for CR do not resort to any statistical apparatus, some reference will be made to the commonest statistical techniques used by lexicographers in their everyday work. The generative tradition of the 1960s and 1970s in linguistics tended to neglect the role of the lexicon in the description of language (Chomsky 1965). The advent of computer technology, however, has enabled linguists to test their hypotheses and intuitions and it soon became apparent that the development of a whole range of natural language processing systems required a lot more information about words than what was needed to parse a lexically simple sentence such as John loves Mary. 
Developers were rapidly confronted with what came to be known as the 'lexical acquisition bottleneck' (see also Boguraev & Briscoe 1989; Wilks et al. 1989, 1993, 1996). The problem was very simple: in order to develop a large-scale NLP system, for example a machine translation system or an information retrieval system, one needs to feed the lexical component of the system with the description of tens of thousands of lexical items. The provenance of these descriptions, however, is rather problematic. Should the developer hire a team of highly specialized lexicographers to code the whole lexicon from scratch or should other resources be tapped to reduce the costs and save time? A number of researchers in computational linguistics are convinced that the former solution is undesirable because it is too time-consuming and because lexicographical skills for computational applications cannot readily be found. The latter solution, i.e. exploiting existing resources to build the lexicon, then seems a more promising approach, although the nature of the resources remains to be defined. In a paper which was originally written for the Grosseto Workshop on Automating the Lexicon (1986), Boguraev (1994) carried out a survey of MRD-based research and showed that the exploitation of machine-readable commercial dictionaries can be traced back to two doctoral dissertations which exerted a profound influence on the field of computational lexicography. Amsler's seminal work on the structure of the Merriam-Webster Pocket Dictionary (Amsler 1980) described various procedures for connecting definition texts and for revealing implicit semantic information buried in the microstructure of the dictionary entries. His goal was to make explicit the conceptual hierarchies buried in this MRD. His basic assumption was that most definitions comply with the traditional Aristotelian distinction between genus and differentiae and that the very structure of these definitions could be used as a starting point to semi-automatically create taxonomies usable in NLP. Shortly after Amsler's thesis, Michiels's dissertation (1982) explored the elaborate grammatical coding system of LDOCE, which he described as the first truly computerized dictionary of English. In his dissertation, Michiels proposed a database organization of the lexical data which would make it possible to reuse the syntactic and semantic information of LDOCE entries to feed the lexical component of a parser of English. Such proposals have been recently put into practice in HORATIO, an experimental definite-clause grammar parser for a subset of English whose lexical material is directly imported from LDOCE (Michiels 1995a,b,c). The suggestions for a pedagogical exploitation of the learner's dictionary should also be mentioned when one considers the fact that, surprisingly enough, relatively few people in the computational lexicography community have considered using MRDs in a language learning or teaching perspective (although this could be changing, witness the Compass Project - see Segond & Zaenen 1994, Bauer et al. 1994, Breidt et al. 1996).
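The genus-and-differentiae assumption mentioned above lends itself to a small illustration. The sketch below is not Amsler's procedure; it is a deliberately naive heuristic, assuming that the genus term is the last word before the first word that opens the differentiae. The word lists, the function names and the two toy definitions are all hypothetical.

```python
import re

ARTICLES = {"a", "an", "the"}
# Words assumed (for this sketch only) to mark the start of the differentiae.
BOUNDARY = {"for", "of", "that", "which", "who", "with", "in", "on", "used"}


def genus_term(definition):
    """Return the presumed genus: the last word before the differentiae."""
    tokens = re.findall(r"[a-z]+", definition.lower())
    if tokens and tokens[0] in ARTICLES:
        tokens = tokens[1:]
    head = []
    for token in tokens:
        if token in BOUNDARY:
            break
        head.append(token)
    return head[-1] if head else None


def hypernym_chain(word, definitions):
    """Follow genus links from word upwards until no definition is left."""
    chain = [word]
    seen = {word}
    while chain[-1] in definitions:
        genus = genus_term(definitions[chain[-1]])
        if genus is None or genus in seen:  # stop on failure or on a cycle
            break
        chain.append(genus)
        seen.add(genus)
    return chain


# Toy definitions invented for the example, not taken from any dictionary.
TOY_DEFINITIONS = {
    "rose": "a flower with a sweet smell",
    "flower": "the part of a plant that produces seeds",
}

print(hypernym_chain("rose", TOY_DEFINITIONS))
```

Because the heuristic works on word forms and not on word senses, it follows a genus link blindly whatever reading of the genus term was intended, which is why a real taxonomy-building procedure needs sense information as well.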
Amsler's and Michiels's revolutionary work led to the worldwide recognition that MRDs house a lot of NLP-relevant information the extraction of which could be partly automated. Several other research groups managed to obtain the tapes of these dictionaries and a group at Cambridge University started using LDOCE in an attempt to deliver a computational grammar of English with a 50,000 strong word list indexed to it. Some of the results of this project are reported on in a book which is entirely devoted to the computational analysis of LDOCE, its grammar coding system, its definitions, etc. (Boguraev & Briscoe 1989). In the United States, the Lexical Systems group at IBM Yorktown Heights purchased the LDOCE tapes together with several other MRDs such as W7 to create WordSmith, an automated dictionary whose powerful browsing functionalities enable users to retrieve words that are close to a given lexical item along dimensions of sound, spelling, syntax or even meaning (Chodorow et al. 1985). The same group also embarked on a long-term project called CompLex, a computational lexicon which provides semantic information to NLP systems (Byrd 1989; Byrd et al. 1987; Klavans et al. 1993; Klavans & Tzoukerman 1990). Unlike earlier attempts which focus on the grammatical components of MRDs (Boguraev & Briscoe 1989), CompLex is described by Byrd (1989) as a network of word senses or, to put it slightly differently, as a Lexical Knowledge Base (LKB). Among the relationships between word senses they are interested in finding, Byrd cites the following: homography, hyperonymy (genus) and hyponymy, synonymy, typical arguments and selection restrictions (1989:72). To these pieces of information, which he seeks to identify in the definition part of an entry, Byrd adds less 'central', more 'exotic' relations such as the one which links merchants to the goods they typically sell (e.g. pusher = a person who sells narcotics illegally). The IBM team derive the latter category by extracting automatically the nouns whose definition contains the combination "who"+"sell" within a window of 7 words.

Disambiguation is undoubtedly a key concept in computational linguistics and most efforts to extract lexical information from MRDs stem from the recognition that disambiguation should be done at word sense level (see also Michiels 1982, 1995a,b; Michiels & Noël 1982). In this context, a key lexical relation is definitely hyperonymy, which corresponds to the lexicographical notion of genus or superordinate term in a definition (e.g. a rose is a flower). Researchers have long been trying to collect all hyperonym links in a dictionary to produce complete taxonomies of concepts (see Michiels & Noël 1984 and Michiels 1982:188 who search for items belonging to the thesauric 'instrument' set in LDOCE; see also work by Vossen et al. 1989, Vossen 1992 or Calzolari's work on the Italian Lexical Database (1984, 1988) and the recent EuroWordNet project which aims at creating multilingual semantic networks à la WordNet - Vossen 1996). One of the main stumbling blocks which hampers the automatic generation of taxonomies, however, is that the language used in definition texts is usually not disambiguated. The following examples from W9 make it clear that basic genus filiation techniques such as those described by Amsler (1980) or Michiels (1982) need to rely on a previously disambiguated defining vocabulary (the genus terms are italicized for clarity's sake):

pigsty: pigpen 1
pigpen (1): a pen for pigs
quill: a pen for writing
pen1: a small enclosure for animals
pen2: an implement for writing

Most NLP systems need an inference mechanism based on the ISA link to make sure that hyponyms inherit properties known to apply to the genus term. The ambiguity of the word pen above means that it is of cardinal importance to block the wrong inference that a pigsty is an implement and the quill is an enclosure (see Klavans et al.
1990 and Kilgarriff 1992:17). Of course, the correct links can be identified on the basis of some similarity between the definitions to be matched ("a pen for pigs" and "an enclosure for animals" definitely share a similar structure, viz. "FOR+N", and pigs are animals, another piece of information which should be retrievable from the dictionary; "a pen for writing" and "an implement for writing" are even closer - "FOR+Ving"). Pattern-matching procedures are not sufficient, however, and the automatic identification of the genus term and its disambiguation must resort to more in-depth syntactic and semantic analysis of the definitions (Alshawi 1989, Vossen et al. 1989, Klavans et al. 1993). The preceding considerations all refer to work carried out on monolingual English dictionaries which stick to the traditional lexicographical defining tradition based on the distinction between genus and differentiae. In the late 1980s, however, the Collins Cobuild dictionary (Sinclair 1987a), another learner's dictionary developed in the English Department of the University of Birmingham, set revolutionary standards in the presentation of information about the language. This dictionary was innovative in at least two respects:


1. The first edition was based on a 20-million-word text corpus of general English from which a huge database aiming to describe general English had been compiled;
2. It used an inventory of explanatory devices designed to help learners write in English (encoding perspective).

Further details on the Cobuild Project and the innovative strategies employed by its designers can be found in Sinclair (1987b). Let me point out, however, that the defining strategies used by the Cobuild lexicographers represent a major breakthrough in lexicographic tradition. Consider the following definition for the verb abdicate, which is a case in point:

Cobuild entry: ABDICATE
1. If a king or queen abdicates, he or she resigns. EG ... the day Edward VIII abdicated [extra column: V or V+O; = step down]
3. If you abdicate a right or some power that you have, you choose not to take advantage of it. EG ... abdicating her ability to control her environment [extra column: V+O]

Besides grammatical codes which account for the syntactic environment in which a lexical item can be inserted (V+O = monotransitive verb - verb followed by one direct object), the extra-column contains synonyms, antonyms and superordinates (respectively introduced by the symbols =, ≠, ↑). This entry could certainly be improved because the verb resign in the first sense is rather imprecise (in fact, it means 'give up the position of king or queen'). The most innovative and revolutionary feature of Cobuild, however, is undoubtedly the double structure of the definitions in which the first part places the headword in a typical environment and the second part identifies the meaning (see Hanks 1987 for more details on the Cobuild defining practice; see also p.59). Moreover, Cobuild also departs from traditional practice in overtly mentioning restricted collocates such as the typical subject (king/queen in def. 1) or the typical object (right/power in def. 3 - note, however, that def. 1 does not indicate that, if an object is used, it must be throne or crown). This enables the reader/user to infer, for example, that the subject of the verb must be an animate being whereas the object, if any, must be an abstract entity. It should be noted that this is precisely the kind of information which is crucial for word sense disambiguation in an NLP task. It also corresponds to the information which is high on the priority list of researchers who are engaged in lexical knowledge acquisition (see Byrd et al. 1987; Byrd 1989:72-76; Boguraev 1991 among others). It is therefore not surprising that the availability of the Cobuild dictionary in machine-readable form has given a new impetus to the field, spurring some researchers to try to formalize the Cobuild definitions in an attempt to extract, represent and use the syntactic-semantic information they contain.
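The double structure of a Cobuild definition (a first part placing the headword in a typical environment, a second part giving the meaning) can be illustrated with a small pattern-matching sketch. This is only a toy approximation written for the two abdicate definitions quoted above and makes no claim about how any actual Cobuild parser works: the regular expression, the determiner list and both function names are assumptions, and the generic subject "you" is simply treated as imposing no restriction.

```python
import re

# Determiners stripped from collocate phrases (assumed list, for this sketch).
DETERMINERS = {"a", "an", "the", "some"}


def split_collocates(phrase):
    """Turn 'a right or some power that you have' into ['right', 'power']."""
    if not phrase or phrase == "you":
        return []  # generic 'you' is treated as no restricted collocate
    collocates = []
    for part in phrase.split(" or "):
        part = part.split(" that ")[0]  # drop a trailing relative clause
        words = [w for w in part.split() if w not in DETERMINERS]
        if words:
            collocates.append(words[-1])
    return collocates


def parse_definition(headword, definition):
    """Split an 'If X verbs (Y), ...' definition into its two parts."""
    pattern = re.compile(
        rf"^If (?P<subj>.+?) {re.escape(headword)}s?"
        r"(?: (?P<obj>.+?))?, (?P<rest>.+)$"
    )
    match = pattern.match(definition)
    if match is None:
        return None
    return {
        "subject": split_collocates(match.group("subj")),
        "object": split_collocates(match.group("obj")),
        "explanation": match.group("rest"),
    }


sense_1 = parse_definition(
    "abdicate", "If a king or queen abdicates, he or she resigns."
)
sense_3 = parse_definition(
    "abdicate",
    "If you abdicate a right or some power that you have, "
    "you choose not to take advantage of it.",
)
print(sense_1)
print(sense_3)
```

A surface pattern of this kind only recovers the word forms king, queen, right and power; deciding which sense of each of these words is meant is a separate step.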
The CEC project ET-10/51, "Semantic Analysis using a Natural Language Dictionary" carried out by the Universities of Birmingham, Bochum and Pisa attempts to do just that by constructing a parser for Cobuild definitions (Sinclair et al. 1995; Barnbrook & Sinclair 1995; Calzolari et al. 1995). Their aim is, more specifically, to use the dictionary itself to construct a series of disambiguated combinatorial links. The tools they apply recognize the correct sense of the words which are used as arguments of the headword in the first part of a Cobuild definition. In the case of the entry for abdicate (sense 1) above, for example, the disambiguation procedure makes it possible to infer that the subject must be [+MALE] and [+HUMAN] or [+FEMALE] and [+HUMAN] ([+MONARCH] would even be more appropriate here). The noun king in the definition is then linked to the first sense of the headword king of the dictionary, discarding other readings such as the chess piece or the playing card. Similarly, the occurrence of queen is disambiguated and mapped onto the first sense of the headword queen which has 'woman' as a genus term, discarding the 'homosexual' reading of the noun, for example (see also Peters et al. 1994).

The above-mentioned project on Cobuild illustrates the Commission of the European Communities' interest in supporting research projects that deal with language processing or, as it is now also frequently called, language engineering (LE). The tremendous economic potential of language technologies has long been recognized and the CEC has been active in the field of machine translation for around 15 years, as the EUROTRA project (see Allegranza et al. 1991 and the references given therein) will attest. However, Alberto & Bennett (1995:9) rightly point out that, "in EUROTRA, dictionaries generally played a secondary role and that grammatical modules have been accorded primacy". It is therefore not surprising that the main bulk of research into lexical knowledge acquisition and representation has taken place within the framework of other CEC-funded programmes. Two of the most prominent projects are the ACQUILEX Esprit Project and the EUROTRA-7 Project. Both projects are directly related to the concept of "reusability of lexical resources", a buzzword employed to refer to the construction of large computerized lexicons for real-life NLP systems from already existing lexical resources.
Acquilex is specifically concerned with the construction of a single integrated multilingual lexical knowledge base in which lexical information is extracted from multiple MRDs in a multilingual context (see Zampolli 1994:11). The languages covered by this project are English, Dutch, Italian and Spanish. As Calzolari (1991a) notes, the extraction procedure focuses not only on information which is already explicit (such as word lists, part-of-speech information, spelling variants, etc.) but also on implicit information such as semantic relations, taxonomies and argument structures. One of the ultimate goals is to make it possible for the user to navigate within the LKB with access through concepts and semantic relations. The theoretical framework is Pustejovsky's notion of 'qualia structure' (see chapter 4), which can be seen as a sort of argument structure for nouns. To populate their lexical entries, the Acquilex team make use of general conceptual templates whose argument slots contain attributes such as agent, set_of, location, used_for, cause_of, colour, etc. (see Calzolari 1991a:191-193; see also Briscoe et al. 1993, Vossen 1992, Antelmi & Roventini 1992 or Briscoe et al. 1990 for more information on Acquilex). The work described in this book undoubtedly owes a lot to the results obtained by this project. In particular, it shares with it the dual approach to acquisition based on empirical observations on the one hand (see Chapter 7 which concentrates on the use of 'defining formulae' to extract lexical-semantic relations) and on theoretical hypotheses on the other. The main difference is that my own work uses Mel'čuk's model, which covers a much broader range of relations than Pustejovsky's qualia structures.

The EUROTRA-7, or ET-7, Project was a feasibility study whose objective was to recommend measures to be taken to construct shareable lexical resources (Heid & McNaught 1991). Acknowledging the need for standards in the description of lexical items, this survey assessed the various basic standards at several levels of description: orthography, phonology, phonetics, morphology, collocation, syntax, semantics and pragmatics. In particular, this feasibility study suggested exploring the existing machine-readable versions of paper dictionaries for the discovery of collocational information (Heid et al. 1991:36-38). More specifically, the authors of this study stressed the fact that there is no point in assembling corpora or in counting occurrences of specific phenomena, as long as we do not know what to look for. This is also true of turning an MRD into an LKB or LDB, a fruitless exercise if one does not know what to look for or what to do with the information the browsing capabilities of the system reveal. Conversely, Heid et al. also note that representation as such is not interesting if it remains at the theoretical level without data to be represented. This dissertation, which aims to construct a large lexical-semantic database from a bilingual MRD, using Mel'čuk's descriptive framework, falls in line with these suggestions, even if it lacks the inheritance mechanisms and the knowledge representation features used by AI specialists (see Briscoe et al. 1993).

The preceding survey provides ample evidence that lexical relations, semantic networks and machine-readable dictionaries have attracted a lot of attention in the past few years, both in Europe and in the United States. Some projects tend to concentrate on one or two prominent lexical relations such as antonymy and synonymy, as is the case with WordNet, a large on-line lexical database developed at the Department of Psychology at Princeton University by George Miller and his colleagues (Miller 1990; Miller et al. 1990; Fellbaum 1990). Other projects have considered a broader range of relations, mainly exploiting structural regularities in definitions to identify recurring relations (Calzolari 1984, 1988 for an Italian lexical database).
To my knowledge, however, no attempt has ever been made to build a large knowledge base around the concept of lexical functions. The main dictionary fragments which have been produced in the framework of the Meaning-Text Model (see Chapter 5) are not available in machine-readable form, which severely limits the possibilities of accessing implicit information. It should also be stressed that most projects described so far focus on monolingual dictionaries (whether intended for learners, such as LDOCE, Cobuild or OALD, or for a general audience, such as W7). Relatively little work has been done on the exploitation of bilingual dictionaries. As is pointed out in Jansen & Fontenelle (1994), this is probably due to two factors:

1. Few research teams have been granted access to the tapes of bilingual MRDs. To my knowledge, only the English Department at the University of Liège and the IBM Yorktown Heights Lexical Systems Group (see Byrd 1989, Byrd et al. 1987) managed to acquire a copy of the large Collins-Robert dictionary. Interestingly, the IBM group has also used the Collins English-German dictionary as one of the lexical components of the LMT machine translation system (see Neff & McCord 1990). Neff & McCord have concentrated on the typical objects and subjects of verbs in the bilingual dictionary, in an attempt to use this information for sense disambiguation and target term selection. The example they give is the following sentence which is to be translated into German (p.89):

The basement flooded yesterday.

The information on selection restrictions given in the dictionary is presented as follows (simplified for the sake of clarity - the square brackets indicate that the italicized item is a potential subject):

flood vi
[river] über die Ufer treten
[bath] überfließen
[cellar] unter Wasser stehen
[garden, land] überschwemmt werden

Neff & McCord aim to establish a unifying link between basement in the sentence above and one of the typical subjects in the entry. Linking the bilingual database with other, monolingual resources such as LDOCE and W7 enables them to create a chain of relations facilitating sense disambiguation (LDOCE giving cellar as a synonym of basement and W7 giving basement as a synonym of cellar). Using such a network of lexical knowledge is enough to yield the following translation:

Der Keller stand gestern unter Wasser.

It should be noted that this type of application is very close to what the Liège group is currently doing with the Collins-Robert and Oxford-Hachette databases (see Michiels 1996, Michiels & Dufour 1996). This of course presupposes that the metalinguistic apparatus is readily accessible, which is precisely the topic of this book. My own work departs from Neff & McCord's approach, however, insofar as it makes the metalinguistic information readily accessible and also enriches it systematically, whenever possible, with lexical-semantic labels with a view to enhancing the knowledge base and creating new access paths. Other teams have had access to some bilingual dictionaries in the Collins series. The Pisa group, for instance, has used the Collins English-Italian pocket dictionaries as a lexical resource to establish links between parallel texts and hence align sentences in bilingual corpora (Picchi et al. 1992).
The ISSCO group in Switzerland has also used the pocket machine-readable version of the English-French Collins-Robert dictionary to implement a network-based dictionary-consultation tool on the campus of the University of Geneva (Petitpierre et al. 1994). However, it should be borne in mind that these two projects make use of the pocket versions of these bilingual dictionaries, which means that the collocational knowledge they contain is much too poor to compete with what can be extracted using traditional corpus-based statistical tools.

2. A second, perhaps more fundamental, reason for which bilingual dictionaries tend to have been somewhat neglected by the research community is the less structured format of the typesetting tapes. In this respect, it is important to bear in mind the distinction I alluded to in the general introduction (see p.2) between computerized dictionaries, which display a logical organization making them suitable for transformation into a lexical database, and machine-readable dictionaries proper which are usually hardly more than flat text files with codes aimed at driving the typesetting process. The former dictionaries are of course easier to process since their logical format makes it possible to identify each piece of information more readily (definitions, examples, parts of speech, grammar codes etc. are already partly formalized and identifiable). The Collins-Robert clearly belongs to the latter class insofar as what was made available to us was the typesetting tape, which means that the transformation of this file into the relational database described in Chapter 6 was far from trivial. Fortunately, the new generation of dictionaries, which usually rely on an SGML¹ system of tags and codes, is easier to exploit. In this respect, it is interesting to see the results obtained by the COMPASS project, a CEC-funded project coordinated by the Rank Xerox Research Centre in Grenoble which aims to exploit machine-readable versions of bilingual dictionaries to build the context sensitive look-up component of an interactive text comprehension aid for advanced language learners (Segond & Zaenen 1994; Bauer et al. 1994). More specifically, the Compass project aims at exploiting the tape of the recent Oxford-Hachette dictionary (Corréard & Grundy 1994), the first corpus-based bilingual dictionary.² Unlike the Liège database, which concentrates on the collocational knowledge contained in the dictionary, Compass mainly focuses on the description of idioms, i.e. fixed multi-word units which are found towards the end of the cline defined in the following section. Compass does not use the metalinguistic information which covers selection restrictions and constraints imposed on the type of subjects, objects and complements. In this respect, the Compass project is more limited in scope than the five-year (1995-2000) DEFI project run by the University of Liège which specifically addresses the exploitation of two English-French dictionaries (Collins-Robert and Oxford-Hachette) in a word sense disambiguation perspective, using every single bit of information available in the electronic versions of the dictionaries (subject field labels, selection restrictions, part-of-speech information, examples, idioms... - see Michiels & Dufour 1996, Dufour 1997).
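The linking step attributed to Neff & McCord earlier in this section can be pictured with a minimal sketch. It is not their implementation: the data structures, the function name select_translation and the one-step synonym lookup are assumptions made for illustration, and the entry only mirrors the simplified flood example quoted above.

```python
# Illustrative data only: the entry mirrors the simplified 'flood' example
# quoted in the text; the synonym links stand in for what LDOCE and W7 are
# said to provide. All names here are hypothetical.
FLOOD_VI = {
    "river": "über die Ufer treten",
    "bath": "überfließen",
    "cellar": "unter Wasser stehen",
    "garden": "überschwemmt werden",
    "land": "überschwemmt werden",
}

SYNONYMS = {
    "basement": {"cellar"},  # LDOCE: cellar given as a synonym of basement
    "cellar": {"basement"},  # W7: basement given as a synonym of cellar
}


def select_translation(subject, entry, synonyms):
    """Return the target verb whose typical subject matches `subject`,
    either directly or through one synonym link; None if nothing matches."""
    if subject in entry:
        return entry[subject]
    for candidate in sorted(synonyms.get(subject, ())):
        if candidate in entry:
            return entry[candidate]
    return None


print(select_translation("basement", FLOOD_VI, SYNONYMS))
```

The subject basement is absent from the entry, so the lookup falls back on the synonym link to cellar and retrieves the equivalent used in the German sentence above. A realistic system would need sense-tagged links and longer chains, which this sketch deliberately leaves out.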

1 SGML = Standard Generalized Markup Language.
2 The University of Tübingen also participates in the Compass project and specifically deals with the exploitation of the Collins-Klett English-German German-English dictionary. This dictionary belongs to the same range of intuition-based bilingual dictionaries as the Collins-Robert (Breidt & Segond 1996, Breidt et al. 1996).

2.2 Defining collocations

2.2.1 Lexical collocations

Although the CR dictionary also contains material that may be interpreted in terms of paradigmatic relations (antonyms, synonyms, hyperonyms, etc.), a major component of the database is made up of what linguists generally call collocations. This concept is far from being a clear-cut one, however, and in this section I should like to clarify the situation by attempting to define what is meant by "collocation". Given the wealth of articles on collocations that have appeared in the past few years, it is not possible (nor is it desirable) to quote and review all the papers that have something to contribute to the topic. Excellent introductions to the field exist (Cowie 1986, 1994; Heid 1994b, Fernando 1996, Benson 1985, 1989..., to name but a few), but it is nevertheless necessary to clarify this phenomenon, or at least to try to make it sound less confused by outlining the properties of collocations.

Consider the French sentence 'La chaleur avait fait tourner le lait'. On the occasion of one of the small tests which spice every teacher's life, my students had been asked to translate this sentence into English. The main purpose of such an exercise was to check that the students were aware that the structure faire + infinitive in French does not always correspond to make + infinitive in English. Among the many translations produced, one attracted my attention immediately, namely 'The heat had made the milk rotten'. Since I expected something like 'the heat had turned the milk' or 'the heat had turned the milk sour', I concluded the student had produced that deviant construction because he did not know that the noun 'milk' does not usually co-occur with 'rotten' to render the absence of freshness. The problem is that the English sentence is syntactically (i.e. grammatically) correct and any native speaker of English would most certainly understand the meaning of a combination of words such as 'rotten milk', but this very combination is likely to elicit some kind of mocking smile. Why is it then that we can say that an egg is rotten, bad or addled, while milk can go or turn sour and butter can become rancid? The adjectives bad, rotten, addled, sour or rancid can all be combined with nouns denoting foodstuffs but are by no means interchangeable. This means that some words are more likely to combine with specific items and form natural-sounding combinations while other types of combinations are simply not found, even though they would be possible and understandable, at least theoretically.
Knowing that the noun milk combines with the verb turn and the adjective sour is certainly most interesting, but it tells us little, if anything at all, about the very nature of such a combination. After all, an expression such as 'to lick somebody's boots' can also be said to illustrate a particular type of combination; yet all linguists would agree that this expression is not a collocation proper, but an idiom. What characterizes idioms in the popular but debatable view is the fact that they constitute a single semantic entity, and that their meaning is not equal to the sum of the meanings of the words they are made up of. This notion of noncompositionality is a moot point because compositionality itself is very hard to define operationally, as Michiels (1977) points out (Fernando & Flavell 1981 make a similar point). In the above example, it is clear, however, that there is no actual 'licking' whatsoever and that the expression is not about boots either. More importantly, perhaps, idioms are not entirely variable and cannot be subjected to various standard syntactic manipulations such as the following ones (the asterisk indicates that the sentences are not grammatical):

(a) Passivization: * The teacher's boots had been licked by one of his pupils.
(b) Pronominalization: * The student had licked my colleagues' boots but hesitated to lick mine.
(c) Cleft sentence: * It was my boots that he had tried to lick.
(d) Insertion of material: * The student tried to lick the teacher's leather boots.

Of course, the variations described above are possible if one wants to sound jocular or if the sentence is taken literally, in its non-idiomatic sense (consider (a) or (d)). Moreover, there are idioms which may undergo only some transformations while excluding others: consider bury the hatchet, which can be passivized. This indicates that, as Fraser (1970) puts it, there is a 'frozenness hierarchy' (on the fixity of idioms, see also Fernando 1996). However, despite the reservations stated above, it is commonly admitted that an idiom is basically a fixed multi-word unit whose meaning cannot be computed from the meanings of its components. As such, it is easy to see that the combinations "sour milk" or "the milk turned" cannot be considered as idioms since the noun milk keeps its basic sense in any case. These combinations are not subject to the same types of syntactic constraints as idioms. This is typical of what Aisenstadt (1979) calls 'restricted collocations', i.e. word combinations whose constituents are restricted in their commutability. Unlike idioms, restricted collocations do not form one single semantic unit and they display some variability. Unlike totally free phrases, however, the elements they are made up of are not freely interchangeable, which explains why we cannot say that the milk was rotten or that the egg was rancid or sour. As Aisenstadt notes, such collocations are not limited to adjective-noun or subject-verb combinations. We also find adverb-adjective (stark naked, dead drunk) or verb-object collocations (to command admiration, to pay attention, to make a mistake). Howarth (1997), Cowie (1997) and Cowie & Howarth (1995) also extend the category of restricted collocations along the cline to include combinations which are semantically linked and for which there is an open choice (see also my discussion of combinations such as blame unjustly and blame subsequently on p.40). Moreover, as noted by Halliday (1966:151) or Carter and McCarthy (1988:35), the concept of collocation is, to some extent, independent of grammatical categories: the relationship which holds between the verb argue and the adverb strongly is the same as that holding between the noun argument and the adjective strong.
Interestingly, contemporary research into the nature of word combinations owes a lot to the Firthian tradition that the statement of collocation is one of the most fruitful approaches to the study of lexical items and their relations (Firth 1968; see also Halliday 1966). Recognizing that collocations (and other types of multi-word units) should be studied in relation to their grammatical and pragmatic functions, Cowie (1994:3169) defines collocations as "associations of two or more lexemes (or roots) recognized in and defined by their occurrence in a specific range of grammatical constructions".³

3 This definition clearly rules out combinations of semantically-related items such as doctor and nurse which some researchers tend to consider as collocations because they frequently co-occur in the same contexts. Cowie also notes that collocations are characterized by arbitrary limitation of choice at one or more points (e.g. light/heavy exercise).

In a paper entitled 'Words shall be known by the company they keep', Mackin (1978) tackles the problem of how to teach collocations to foreign language learners. He stresses the fundamental distinction between production and understanding and argues that collocations do not pose any serious problems in the understanding process. Any non-native speaker is likely to recognize and understand a collocation but using collocations and selecting the appropriate term is much more difficult and may even be considered as one of the most serious stumbling blocks in language learning. This is also why both Aisenstadt and Mackin argue for the compilation of specialized dictionaries, since it is generally admitted that collocations cannot be accounted for in terms of grammatical rules. It is therefore natural to consider them as an element belonging to lexis. Their idiosyncratic nature makes them particularly well-suited for inclusion in a special type of dictionary designed not so much for decoding, i.e. understanding, text (like most traditional dictionaries) as for encoding text. The problem is that we first have to address the question of deciding on what elements to include in such a dictionary. Defining collocations as non-idiomatic expressions on the one hand and as non-free combinations on the other hand enables us to discard the two extremes of a continuum, but the area we are eventually left with is still too fuzzy and we immediately feel the need for further clarification.

In one of his papers on the topic, Cowie argues that "journalistic prose draws very heavily on verb-noun collocations that are already well-established and widely known" (Cowie, 1992:1). This does not seem to be typical of journalese only, however, and research in applied linguistics has shown that native speakers often memorize ready-made word combinations. They usually have some predisposition to store these combinations as wholes, which accounts for the pervasiveness of lexical collocations in everyday language. The term 'lexical collocation' is used here to refer to the privileged, idiosyncratic relationship that holds between some verbs and their subjects/objects or between some nouns and the adjectives that modify them. It should be noted that this term may also denote the special relationship between two nouns or between a verb and an adverb. Some authors (e.g. Cowie 1986, 1991; Howarth 1993) also distinguish between free collocations and restricted collocations. The former refer to combinations that "allow substitution of either, or any, of their elements without semantic change in the other element or elements" (Cowie, 1986:62). The combination fire staff, for instance, is considered as free because other verbs can be substituted for fire (dismiss, lay off, sack) and staff can be replaced by worker, employee, clerk, etc.
This, I think, is more a co-occurrence restriction on the type of nouns that can be inserted in the direct object slot than a collocation: in this case, besides the requirement that the direct object of these verbs should be [+HUMAN], an additional selection restriction demands that this object NP should be [+SUBORDINATE] and that the subject NP should be [+SUPERIOR]. In contrast, restricted collocations are semi-fixed combinations and are characterized by the limited and arbitrary collocational range of their elements. In the case of verb + direct object restricted collocations, Howarth (1993) distinguishes between delexical collocations (usually with a nearly semantically empty verb such as have, make, do, get, take: to make a mistake, to have a drink, to take/have a bath...), collocations with a figurative verb (adopt a policy, blow a fuse, hazard a guess) and technical or specialized collocations (snuff out a candle). Typically, the verb often loses its primary meaning to acquire a figurative meaning under the influence of the noun it accompanies (or 'supports', to use Gross's terminology - see Gross 1981). The French noun choix, for example, is responsible for the selection of the verbs faire or opérer and it can be safely asserted that opérer in opérer un choix does not have the meaning of "pratiquer une intervention chirurgicale sur" illustrated by opérer un patient. In most dictionaries, the verb "opérer" is given various readings, one of which is "faire", "réaliser". Similarly, the delexical verb pay in pay attention has acquired a figurative or nearly empty meaning in the vicinity of attention. We will come back to this notion of support verb below. It should be stressed here that knowledge of such restrictions is of paramount importance since cross-linguistic divergences are the rule (payer attention is unacceptable in French, the delexical verb faire having to be used instead).
Learners should therefore be made aware of these restrictions and taught how to identify, memorize and use these semi-frozen expressions that account for a native speaker's phraseological competence. Similarly, computers must have access to that type of knowledge for any NLP application, especially in the generation process (see section 5.4; see also Nirenburg 1989). For example, it is necessary to encode lexical constraints that account for the well- or ill-formedness of the following sentences which illustrate overlapping collocations:

1. John quenched the fire.
2. John extinguished the fire.
3. * John slaked the fire.
4. John quenched his thirst.
5. * John extinguished his thirst.
6. John slaked his thirst.

The following table illustrates a very simple case where such co-occurrence knowledge is essential in a multi-lingual perspective. The equivalents of the noun question in the various languages below do not pose any problem but the verbs that take these nouns as direct objects are definitely not equivalents or translations of each other and can be said to be unpredictable in any of the languages in question because they heavily depend on the noun with which they have to co-occur. Allerton (1982:27-28) claims that the noun actually "tailors", i.e. is responsible for the selection of, the verb which collocates with it, very much in the same way as a verb "tailors" the choice of the accompanying preposition.

              English    French     Dutch      German     Spanish
Verb          ask        poser      stellen    stellen    hacer
Object noun   question   question   vraag      Frage      pregunta
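One simple way of encoding the lexical constraints behind the six quench/extinguish/slake sentences above is a table listing, for each verb, the object nouns it accepts. The sketch below is only an illustration under that assumption: the table and the function names is_acceptable and verbs_for are hypothetical, and the data cover nothing beyond the six example sentences.

```python
# Hypothetical mini-lexicon: each verb lists the object nouns it collocates
# with, following the six example sentences (the asterisked combinations
# are simply absent from the table).
VERB_OBJECT_COLLOCATIONS = {
    "quench": {"fire", "thirst"},
    "extinguish": {"fire"},
    "slake": {"thirst"},
}


def is_acceptable(verb, noun, lexicon=VERB_OBJECT_COLLOCATIONS):
    """True if the verb + object noun pair is recorded in the lexicon."""
    return noun in lexicon.get(verb, set())


def verbs_for(noun, lexicon=VERB_OBJECT_COLLOCATIONS):
    """List the verbs that accept `noun` as their object (useful in generation)."""
    return sorted(verb for verb, nouns in lexicon.items() if noun in nouns)


print(is_acceptable("slake", "fire"))  # sentence 3 is starred in the text
print(verbs_for("thirst"))
```

The second function reflects the generation perspective mentioned in the text: starting from the noun, it retrieves the verbs that may accompany it, so the overlap between quench and the other two verbs falls out of the table.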

As pointed out by Heid (1994b:230), collocations can generally be classified according to the part of speech of their elements: noun-verb (NV), noun-adjective (NA), noun-noun (NN), verb-adverb (VB), adjective-adverb (AB) collocations. Noun-verb collocations can in turn be subdivided in terms of the grammatical function of the noun (usually, though not exclusively, subject or object). The following examples, excerpted from Hausmann (1989) and cited by Heid (1994b:230) illustrate the various types of collocations in a multi-lingual perspective:

Grammatical type      English                German                        French
Noun + Adjective      confirmed bachelor     eingefleischter Junggeselle   célibataire endurci
Noun (Subj) + Verb    his anger abates       Zorn verraucht                la colère s'apaise
Noun (Obj) + Verb     to withdraw money      Geld abheben                  retirer de l'argent
Verb + Adverb         it's raining heavily   es regnet in Strömen          il pleut à verse
Adjective + Adverb    seriously injured      schwer verletzt               grièvement blessé
Noun + Noun           a gust of anger        Wutanfall                     une bouffée de colère

This view of collocations as polar combinations also owes much to Hausmann (1985) who distinguishes the base of a collocation, i.e. the element which is responsible for the selection of the other member of the combination, and the collocator, i.e. the element whose meaning is modified or, so to speak, tailored to fit the meaning of the base. In the examples above, the noun is the base in a NA or NV collocation, the verb is the base in a VB collocation and the adjective is the base in an AB collocation.
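The base/collocator distinction lends itself to a record structure in which the base is the access key and the collocator is the value to be retrieved. The following sketch assumes such a structure. The class Collocation, the language codes and the function collocator_for are hypothetical names, and the data are limited to the ask/question equivalents tabulated earlier in this section.

```python
from typing import NamedTuple, Optional


class Collocation(NamedTuple):
    language: str    # language code (an assumption of this sketch)
    base: str        # element that selects its partner (Hausmann's base)
    collocator: str  # element selected by the base
    pattern: str     # part-of-speech pattern, e.g. "NV" for noun + verb


# Data taken from the ask/question table given earlier in this section.
COLLOCATIONS = [
    Collocation("en", "question", "ask", "NV"),
    Collocation("fr", "question", "poser", "NV"),
    Collocation("nl", "vraag", "stellen", "NV"),
    Collocation("de", "Frage", "stellen", "NV"),
    Collocation("es", "pregunta", "hacer", "NV"),
]


def collocator_for(language: str, base: str, pattern: str = "NV") -> Optional[str]:
    """Look up the collocator selected by `base` in the given language."""
    for entry in COLLOCATIONS:
        if (entry.language, entry.base, entry.pattern) == (language, base, pattern):
            return entry.collocator
    return None


print(collocator_for("fr", "question"))
print(collocator_for("es", "pregunta"))
```

Keying the lookup on the noun rather than on the verb mirrors the claim that the base "tailors" the choice of its partner: the verb is never translated directly but retrieved through the noun in each language.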

2.2.2 Grammatical collocations

All the examples given in the preceding sections involve two items belonging to open (non-finite) classes, for instance a verb and a noun or an adjective and a noun. These collocations are frequently referred to as 'lexical collocations', as opposed to 'grammatical collocations'. Unlike the former, grammatical collocations involve one element from an open class and an element from a closed class, typically, but not necessarily, a preposition. For example, the verb depend collocates with the preposition on and not with of. This subclass mostly, but not exclusively, contains items such as verbs or nouns which subcategorize for a prepositional phrase headed by a specific preposition (account for, rely on, likely to, good at...). Similarly, we say 'on the stock exchange' and not 'at the stock exchange', even if one cannot say here that the noun subcategorizes for a specific preposition. It should be noted that, for some linguists (e.g. Benson et al. 1986a,b), grammatical collocations also include what is commonly considered to be the realm of complementation, viz. the various types of syntactic structures a given lexical item subcategorizes for. For instance, verbs such as avoid or suggest require an -ing form while the verbs offer or decide require a to-infinitive. That-clauses can also be part of grammatical collocations, as illustrated by the fact that an adjective such as adamant can be used with this type of clause (consider the sentence 'She was adamant that I should come with her'). Traditionally, British learners' dictionaries such as LDOCE, OALD, Cobuild or CIDE are very good at capturing grammatical collocations (in the narrow sense) and other complementation patterns. Foreign students are (or at least should be) used to consulting these reference works to find out, for example, which specific preposition a given lexical item requires.
A sophisticated (and sometimes rather obscure) system of grammar codes is proposed to help the learner select the correct syntactic construction in which an item has to be inserted. These dictionaries, however, usually provide uneven help when it comes to supplying a user with information on lexical collocations, which is just the area on which this book is going to concentrate.
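To make the notion of grammatical collocation more concrete, the patterns cited in this section can be stored as a small table of required prepositions and complement types. This is a sketch under stated assumptions, not the coding scheme of any learners' dictionary: the table layout, the labels and the function check_preposition are invented for illustration.

```python
# Hypothetical pattern table built only from the examples cited in this
# section; the keys "preposition" and "complement" are labels chosen for
# this sketch, not codes from any dictionary.
GRAMMATICAL_PATTERNS = {
    "depend": {"preposition": "on"},
    "account": {"preposition": "for"},
    "rely": {"preposition": "on"},
    "good": {"preposition": "at"},
    "avoid": {"complement": "-ing"},
    "suggest": {"complement": "-ing"},
    "offer": {"complement": "to-infinitive"},
    "decide": {"complement": "to-infinitive"},
    "adamant": {"complement": "that-clause"},
}


def check_preposition(item, preposition):
    """Return True or False when a preposition is recorded for `item`,
    and None when the table has nothing to say about it."""
    expected = GRAMMATICAL_PATTERNS.get(item, {}).get("preposition")
    if expected is None:
        return None
    return expected == preposition


print(check_preposition("depend", "on"))
print(check_preposition("depend", "of"))
```

Returning None for items without a recorded preposition keeps "no information" apart from "wrong preposition", a distinction that matters when the table is as incomplete as this one.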

2.2.3 Support verbs

Machonis (1991) defines support verbs as verbs that carry little semantic content and are used for syntactic support (e.g. commit suicide, make an analysis, to do somebody a disservice, to give a sigh). Gross (1981) analyses support verbs within the lexicon-grammar framework and notes that they embody semantic restrictions that are much more complex than selection restrictions. In another paper (Gross 1994:237), he notes that support verbs are in fact verbs which do not present selectional restrictions with respect to their subject and complement. The selectional relation holds instead between the subject of the verb and the complement. Indeed, many deverbal nouns usually need some kind of semantically almost empty verb which only conveys information about person, tense and aspect, as in:

(1) John brought forward an argument.

John can be seen as the first semantic participant (a.k.a. actant) of argument (i.e. the person who argues). Support verbs tend to preserve the relationship between the first actant and the supported noun (a predicative noun). They cannot really be characterized as totally empty, however, because they do have some (abstract) meaning and their function is to exclude other potential candidate verbs (make is used in conjunction with the noun mistake to mean what it means and to exclude do or give). In the following sentence, however, reject cannot be considered as a support verb:

(2) John rejected the argument.

Reject is not a semantically empty verb either. It carries a special (negative) meaning together with information about tense, person and aspect and is therefore an ordinary verb. It should be noted that support verbs roughly correspond to the type of lexical relation that can be encoded through the Oper lexical function used by Mel'čuk (see Chapter 5 and the section devoted to lexical functions and Oper in particular, p.81). Oper1, Oper2 ... refer to the semantically empty verb which takes the first, second ... actant of the keyword as its subject and the keyword as its direct object (Steele & Meyer 1990:57).
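As a rough illustration of this definition, a lexical function can be modelled as a lookup keyed by the function name and the keyword. The sketch below rests on an assumption of mine: the support verbs cited from Machonis (1991) at the start of this section, together with make a mistake, are treated as Oper1 values. The names LEXICAL_FUNCTIONS, apply_lf and is_support_verb are hypothetical.

```python
# Assumption of this sketch: the support verb + noun pairs cited earlier in
# this section are recorded as values of Oper1 for the noun concerned.
LEXICAL_FUNCTIONS = {
    ("Oper1", "suicide"): ["commit"],
    ("Oper1", "analysis"): ["make"],
    ("Oper1", "mistake"): ["make"],
    ("Oper1", "sigh"): ["give"],
}


def apply_lf(function, keyword):
    """Return the recorded values of a lexical function for a keyword."""
    return LEXICAL_FUNCTIONS.get((function, keyword), [])


def is_support_verb(verb, noun):
    """True if `verb` is recorded as an Oper1 value of `noun`."""
    return verb in apply_lf("Oper1", noun)


print(apply_lf("Oper1", "mistake"))
print(is_support_verb("do", "mistake"))
```

Because the keyword is part of the key, the exclusion of do or give with mistake follows from the table itself and does not require a separate rule.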
The examples given in Steele & Meyer's paper are:

Oper1(attention) = pay
Oper2(attention) = attract

An analysis of the collocations of attention in the Collins-Robert dictionary shows that this noun can be combined with many more verbs (see Chapter 6 for a detailed account of how this information is extracted from the dictionary):

Oper1(attention) = concentrate, focus, turn
Oper2(attention) = arrest, capture, draw, engage, engross, excite, fix, invite, occupy, stir up, take up, win

It could be objected that the link between pay and attention differs significantly from the link between concentrate/focus/turn and attention. Indeed, the former combination should be considered more as an idiom (or fixed multi-word unit) than as a collocation proper. Although the distinction between idioms and collocations is far from being a clear-cut one (I pointed out above that it is more a cline than a dichotomy), it should be realized that the expression pay attention (to sth) displays some characteristics that are typical of idioms. For example, it cannot undergo certain syntactic manipulations that are normally possible with regular collocations:

a. no possibility of using a pro-form or pronominal anaphoric reference:
(3) * She didn't pay attention to me but I paid some/it to her.
b. no possible cleft sentence:
(4) * It is attention that should be paid to ...

Moreover, the mere fact that attention cannot be modified by a determiner (* He paid his attention to...) clearly demonstrates that pay attention is different from focus attention or concentrate attention. The possibility of passivizing the expression (Attention should be paid to...), inserting some modifying material (They paid scant/little/considerable attention to...) or using a relative clause (It is something to which you should pay close attention) shows, however, that it is also different from truly fixed, totally frozen expressions such as to take place. Cruse (1986:41) uses the term 'bound collocations' for combinations such as pay attention, which belong to the transitional area bordering on idiom and represent the most fixed kind of collocations.

Before concluding this section on support verbs, it should be repeated that delexical verbs such as make, do, get, take in English or the French verbs prendre, faire, avoir, etc. cannot be regarded as totally devoid of any meaning, contrary to what is generally claimed in the literature. Michiels (personal communication) points out that, in the collocation prendre un bain, prendre implies a feature like [+immersion] applied to the whole body or to the body part to which the action is restricted (consider prendre un bain de pieds, un bain de siège, un bain de minuit, un bain de foule, etc. vs. faire un bain de bouche, un bain d'yeux). Moreover, the translational equivalences (prendre ↔ take/have in prendre un bain vs. take/have a bath) cannot be extended to cases where bain is the head of a noun phrase with a further specification introduced by de, and the whole NP must be considered before trying to translate the support verb (consider the following pairs given by the Oxford-Hachette and Collins-Robert dictionaries: prendre un bain de foule = to mingle with the crowd; prendre un bain de soleil = to sunbathe).
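The Oper values listed earlier in this section lend themselves to a simple lookup structure. The following is only a minimal sketch, assuming a plain Python dictionary keyed by (lexical function, keyword); the names LEXICAL_FUNCTIONS, values and functions_linking are hypothetical and do not describe the database presented in Chapter 6. The verbs are those quoted above from Steele & Meyer and from the Collins-Robert analysis.

```python
# Values of Oper1 and Oper2 for "attention", as quoted in the text.
# The (function, keyword) key layout is an illustrative assumption.
LEXICAL_FUNCTIONS = {
    ("Oper1", "attention"): ["pay", "concentrate", "focus", "turn"],
    ("Oper2", "attention"): [
        "attract", "arrest", "capture", "draw", "engage", "engross",
        "excite", "fix", "invite", "occupy", "stir up", "take up", "win",
    ],
}


def values(function, keyword):
    """Return the verbs recorded for function(keyword), or an empty list."""
    return LEXICAL_FUNCTIONS.get((function, keyword), [])


def functions_linking(verb, keyword):
    """Return the names of the functions under which verb is a value for keyword."""
    return sorted(
        function
        for (function, key), verbs in LEXICAL_FUNCTIONS.items()
        if key == keyword and verb in verbs
    )


print(values("Oper1", "attention"))
print(functions_linking("draw", "attention"))
```

Keying on the pair rather than on the keyword alone keeps the two directions of lookup symmetrical: one can ask which verbs realize Oper1 for a noun, or which function relates a given verb to that noun.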

2.3 Acquiring lexical knowledge from textual corpora

Although this book is exclusively concerned with exploiting the metalinguistic information of a bilingual dictionary in a lexical-semantic perspective, it is necessary to devote a section to efforts that are currently being made to extract collocations from corpora. A dictionary such as the Collins-Robert indeed typically aims for breadth of description. Unlike MRDs, however, a corpus is more useful to analyze frequent phenomena in depth. As pointed out in Fontenelle (1992a), one of the major problems with dictionaries is that the collocations they include are often arbitrary and we have no evidence that they reflect the actual behaviour of words.[4] Only a corpus can provide us with statistical information on the frequency of combinations.

A number of techniques have been proposed in the past few years and reviewing them all in depth would fall outside the scope of this book. Since some of the material included in the Collins-Robert database may be found debatable by those who argue for the use of statistical techniques in collocation studies, however, it seems reasonable to introduce the basic concepts current corpus-based approaches draw on. Interested readers will find detailed presentations of such techniques in Church & Hanks (1990), Church et al. (1991, 1994), Choueka (1988), Smadja (1991, 1993), Sinclair (1991), Zernik (1991), Clear (1993) or Baugh et al. (1996). In the framework of the DECIDE project, Fontenelle et al. (1994b) also provide a critical survey of some corpus-based tools used by lexicographers in their everyday work, focussing more specifically on the identification and extraction of statistically significant collocations.

Electronic corpora are now becoming widely available and lexicographers who analyze concordances (a.k.a. KWIC[5] lines) when preparing or writing a dictionary entry are now awash with an ocean of data and evidence. Detecting patterns and regularities in word meaning is a daunting task indeed and lexicographers therefore badly need tools to group concordance lines into meaning and usage patterns (Church et al. 1994:154). More specifically, lexicographers who are working on a lexical item X are mostly interested in finding answers to questions such as:

* What does X typically do?
* What does one typically do to or with X?
* What is X typically like?

In addition to these questions pertaining to the lexical collocations the item X participates in, the lexicographer is also interested in identifying the most frequent prepositions and constructions associated with X. To sift through the mass of information contained in a large[6] corpus and to retrieve collocations ordered according to some statistical measure of significance, researchers have developed a set of techniques which summarize concordances and present the lexicographer with a picture of words which are significantly associated with a given item.

[4] The new generation of monolingual learner's dictionaries relies on corpus-based material, however, and the lexicographers in most British publishing houses are now used to working with statistically-analyzed data (see inter alia Cobuild (Sinclair 1987a), the fifth edition of OALD (Crowther 1995), or the brand-new CIDE dictionary, Procter 1995). In the field of bilingual dictionaries, the situation is changing as well, witness the new Oxford-Hachette English-French dictionary (Corréard & Grundy 1994) which is the first entirely corpus-based bilingual dictionary (note that the 1993 version of CR is also largely based on corpus data).

[5] KWIC = Key Word In Context

[6] The Bank of English, which is used by the Cobuild research team at the University of Birmingham, is a collection of over 320 million words of written and spoken English (Clear et al. 1996). The British National Corpus, constructed by a consortium consisting of Longman Publishers, Chambers Publishers, the Universities of Lancaster and Oxford, the British Library and led by Oxford University Press, contains around 100 million words. The Cambridge Language Survey corpus used in compiling CIDE (Procter 1995) also contains around 100 million words.

One such technique is Mutual Information (MI), which is used to identify relations which occur more often than chance. It compares the probability of observing two words X and Y together (i.e. joint probability) with the probability of observing these words X and Y independently (chance). MI scores can be calculated as follows:

$$\mathrm{MI}(X,Y) = \log_2 \frac{P(X,Y)}{P(X)\,P(Y)}$$

where P(X,Y) = probability that X occurs with Y within a given span of words, P(X) = freq(X) (frequency of X) divided by N (total number of words in the corpus), P(Y) = freq(Y) (frequency of Y) divided by N (total number of words in the corpus).

In a clear and enlightening attempt to explain the usefulness of statistical techniques in corpus analysis, Church et al. (1994:159-163) show how Mutual Information values may be used in deciding whether a sequence of two words such as "requested and" is more or less interesting than the sequence "requested anonymity" (their figures are calculated on the basis of a corpus of over 44 million words):

"In our corpus of N = 44,344,077 words from the 1988 Associated Press Newswire, we observed 161 instances of requested anonymity. Thus, the joint probability of requested anonymity is 161/N ≈ 3.6 per million. Mutual Information compares this probability with chance: the probability of requested times the probability of anonymity. Since we have 1,419 instances of requested and 4,764 instances of anonymity, chance is 1419/N × 4764/N ≈ 0.0034 per million. If we now compare the joint (3.6 per million) to chance (0.0034 per million), we see that the joint is much larger than chance (3.6/0.0034 ≈ 1059), indicating that requested anonymity is probably very interesting, and it might be worthwhile to ask a lexicographer to see if the apparent pattern is linguistically significant. Since the ratio of the joint probability to chance tends to be quite large, mutual information expresses the ratio as a logarithm. (...) Thus, MI(requested, anonymity) ≈ log2 1059 ≈ 10, which is a relatively large score. In contrast, MI(requested, and) ≈ -0.2, because the joint (22/N ≈ 0.50 per million) is slightly less than chance (1419/N × 793296/N ≈ 0.57 per million). Since the mutual information is small (near zero), it is unlikely that requested and is interesting, and there is no statistical reason why a lexicographer should look at it further."

For the sake of clarity, I also reproduce Church et al.'s table (1994:160) which is to be interpreted as a quick summary of what the lexicographer ought to look for in a concordance:

A summary of words after requested:

MI(X;Y)   freq(X,Y)   freq(X)   freq(Y)   X           Y
   10.0         161     1,419     4,764   requested   anonymity
    8.2          14     1,419     1,529   requested   permission
    7.8           5     1,419       698   requested   asylum
    7.3           5     1,419       968   requested   copies
    7.1           4     1,419       935   requested   detailed
    6.8           4     1,419     1,090   requested   background
    6.2           9     1,419     3,744   requested   documents
    6.0           5     1,419     2,519   requested   protection
    5.7           6     1,419     3,498   requested   additional
    5.4           4     1,419     2,928   requested   meetings
    5.0         199     1,419   190,545   requested   by
   -0.2          22     1,419   793,296   requested   and
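The scores in Church et al.'s table can be recomputed from the raw frequencies with the MI formula given earlier. The following is a minimal sketch rather than the authors' own program: the function name mutual_information and the selection of three rows are choices made for this illustration, while the frequencies and the corpus size N are those quoted from Church et al. (1994).

```python
import math

N = 44_344_077  # corpus size quoted from Church et al. (1994)


def mutual_information(freq_xy, freq_x, freq_y, n=N):
    """log2 of the ratio between the joint probability and chance."""
    joint = freq_xy / n                     # P(X,Y)
    chance = (freq_x / n) * (freq_y / n)    # P(X) * P(Y)
    return math.log2(joint / chance)


# (Y, freq(X,Y), freq(Y)) for X = "requested", freq(X) = 1,419
rows = [("anonymity", 161, 4764), ("by", 199, 190545), ("and", 22, 793296)]

for y, freq_xy, freq_y in rows:
    score = mutual_information(freq_xy, 1419, freq_y)
    print(f"requested {y}: MI = {score:.1f}")
```

Rounded to one decimal, the three scores agree with the first column of the table (10.0, 5.0 and -0.2).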

The table clearly shows that a combination such as requested by, although much more frequent in the corpus (199 occurrences) than requested anonymity (161 occurrences) or requested asylum (5 occurrences), is lexicographically much less interesting than the combinations which get a higher MI score. This summary also provides ample evidence that the data obtained is precisely what a lexicographer is looking for since it makes it possible to find out what one typically requests (one may request anonymity, permission to do something, political asylum[7], etc.). This procedure does not give any information about the degree of restrictedness of the combinations it identifies, however. Some of the collocations in the table above are free and one looks in vain for restricted collocations such as Your presence/attendance is kindly requested, etc.

It should not be inferred from the above description that Mutual Information is the only statistical measure at the lexicographer's disposal to acquire information about the different uses of a word and about the combinations which are not mentioned in dictionaries or which are simply overlooked by human introspection. Other statistical tools include the so-called t-score statistic, which measures the confidence with which one can claim that there is some association between two items, unlike MI which measures the strength of this association (Clear 1993:281). Church et al. (1994:163-168) argue that the t-test is useful to find subtle differences between near synonyms, thereby answering questions such as: What is the difference between strong and powerful, or between request and demand?
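The difference between strength and confidence of association can be made concrete with two rows of Church et al.'s table. No formula for the t-score is given here, so the sketch below assumes the approximation commonly used in collocation work, t ≈ (O − E)/√O, where O is the observed co-occurrence frequency and E = freq(X) × freq(Y)/N is the frequency expected by chance. The function names are hypothetical, and the figures should be read as an illustration, not as values reported by Church et al. or Clear.

```python
import math

N = 44_344_077  # corpus size quoted from Church et al. (1994)


def expected(freq_x, freq_y, n=N):
    """Co-occurrence frequency expected if X and Y were independent."""
    return freq_x * freq_y / n


def t_score(freq_xy, freq_x, freq_y, n=N):
    """Assumed approximation: (observed - expected) / sqrt(observed)."""
    return (freq_xy - expected(freq_x, freq_y, n)) / math.sqrt(freq_xy)


def mutual_information(freq_xy, freq_x, freq_y, n=N):
    """log2(observed / expected), equivalent to log2(P(X,Y) / (P(X) P(Y)))."""
    return math.log2(freq_xy / expected(freq_x, freq_y, n))


# (Y, freq(X,Y), freq(Y)) for X = "requested", freq(X) = 1,419
for y, freq_xy, freq_y in [("anonymity", 161, 4764), ("by", 199, 190545)]:
    mi = mutual_information(freq_xy, 1419, freq_y)
    t = t_score(freq_xy, 1419, freq_y)
    print(f"requested {y}: MI = {mi:.1f}, t = {t:.1f}")
```

Under this assumption the two measures disagree: MI places requested anonymity far above requested by, whereas the t-score of requested by comes out slightly higher, because the combination is frequent enough for its (weak) association to be asserted with confidence.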

[7] In collocation studies, 'co-occurrence' is defined as co-occurrence within the span of words that appear to the left and right of a given item (the node, here requested). It should therefore not be equated with immediate adjacency, which explains why the combination request asylum in He requested political asylum would get a high MI score, even if asylum is premodified and is not immediately adjacent to requested. Most programs used by professional lexicographers make it possible to change the parameters, i.e. to customize the number of words which are scanned to the left and right of the node. By default, the range is frequently set to 5 (-5,+5). Clear (1993:278) and Fontenelle et al. (1994b) provide more information on settings and parameters in collocation extraction tools. Baugh et al. (1996) also provide interesting information on the use of corpus tools in compiling CIDE (Procter 1995) and describe in detail the statistical indices they used to measure the strength of collocations.
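The notion of span described in this note can be sketched as follows. This is a minimal illustration, assuming a pre-tokenized, lower-cased text; the function name cooccurrences and its span parameter are hypothetical and do not reproduce any of the tools cited.

```python
from collections import Counter


def cooccurrences(tokens, node, span=5):
    """Count the words found within `span` positions left or right of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left = tokens[max(0, i - span):i]       # up to `span` words before the node
        right = tokens[i + 1:i + 1 + span]      # up to `span` words after the node
        counts.update(left + right)
    return counts


tokens = "he requested political asylum".split()
wide = cooccurrences(tokens, "requested", span=5)
narrow = cooccurrences(tokens, "requested", span=1)
print(wide["asylum"], narrow["asylum"])  # prints "1 0"
```

With the default span of 5, asylum is counted as co-occurring with requested even though political intervenes; with a span of 1 (immediate adjacency) it is not.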

In their Associated Press corpus, for example, Church and his colleagues note that anonymity is much more likely to be requested than demanded, even though the two combinations are equally common (they found 175 instances of request anonymity and 165 instances of demand anonymity). Since request is much less common in their corpus than demand, they note, one would not have expected so many occurrences of request anonymity, which gets a higher t-score.

The increasing availability of lemmatizers, taggers (programs which label items with their word class) and parsers (programs which identify the syntactic function of a given item) also makes it possible to refine the queries formulated by lexicographers. Smadja's lexicographic tool, Xtract, capitalizes on a combination of linguistic analysis and statistical techniques to retrieve co-occurrence knowledge from large textual corpora (Smadja 1991, 1993). Xtract takes a parsed corpus as input (i.e. a corpus with syntactic labelling) and produces lists of statistically-significant lexical collocations. The following examples illustrate some results for two corpora consisting of around 2,300,000 words (Smadja 1991). Each row represents a productive word (the node of the query) and its collocates are automatically classified according to the part of speech (N=noun; V=verb; A=adjective):

law:
  V: change, enforce, pass, violate
  N: court, universe
  A: Jewish, penal, physical
argument:
  V: have, reject, settle, use
  N: court
  A: side, valid
food:
  V: eat, have, prepare, sell
  N: health, industry, product, shortage, supply
  A: good, kosher, spicy, Yemenite

These results obviously reflect the small size of the corpus and its specialized nature (mostly Israeli newspapers).
The composition (size and balance) of a corpus is obviously an important issue in corpus construction and the criteria which are available to the researcher and the lexicographer to build a representative corpus involve several dimensions which depend on the nature of the applications envisaged by the compiler (see Engwall 1994). Smadja is not concerned with constructing a balanced corpus, however, since his main goal is to design and implement collocation extraction tools. He has also refined his algorithm to be able to specify that a noun used as the object of a verb is a direct or indirect object (taking into account the distribution of words, the presence of relative clauses or passive constructions, etc.). This program is mainly efficient with respect to predicative relations (verb-noun, adjective-noun, noun-noun relations), which are precisely the type of relations on which this book is going to concentrate. It will be noted that a program such as Xtract has to rely on a previously analyzed corpus and that no tagger can be considered as infallible (most surveys report success rates of around 90%) and parsers are even more error-prone, which may result in failure to identify combinations which, to a human eye, would inevitably be considered as an interesting collocation.[8]

In addition to these, Xtract can also be used to identify rigid noun phrases (e.g. "New York Stock Exchange", "The Dow Jones average of 30 industrials"...) or even multi-word expressions which Smadja calls phrasal templates and which can contain empty slots. A typical example of such phrase-long collocations is 'The Dow Jones average of 30 industrials fell X points to Y', where X and Y are empty slots that must be filled in by a number. Other linguists (e.g. Nattinger & DeCarrico 1992) would not consider such a combination as a collocation proper, but as a lexical phrase.

The use of part-of-speech tagged and syntactically-analyzed corpora has also led some researchers to use statistical techniques to extract subject nouns and object nouns of verbs in an attempt to automatically or semi-automatically generate thesauri (see Grefenstette 1994a,b). Such work relies on a measure of semantic closeness which is based on the widely-held assumption that words that share a large number of collocates belong to the same semantic set. Implications for the semi-automatic compilation of domain-dependent lexicons and thesauri are obvious, especially in an information retrieval perspective. The primary goal of such corpus-based techniques, however, is to assist lexicographers in compiling dictionaries, whether general-purpose reference works such as learners' dictionaries or specialized collocational dictionaries such as those which are described in Chapter 3 and which are still usually compiled manually.

Before concluding this chapter, I should like to comment briefly on a distinction which is often made between general-language and sublanguage collocations (see Martin 1992). Since the Collins-Robert dictionary, from which the material of this book is extracted, is a general-purpose dictionary, I have decided to deliberately exclude terminology, which means that emphasis will be laid on general-language collocations, even if there is no sharp cut-off, as is evidenced by the fact that the Collins-Robert dictionary includes some items that are unarguably technical. It should be realized that the set of collocations extracted from a corpus heavily depends on the nature of the corpus. The noun "coffee", for example, will most often be associated with words such as drink, prepare, strong, black, hot, etc. in a general-language corpus. In a corpus of financial texts, however, the same noun will most likely be associated with verbs such as trade, buy, sell, exchange, import, export or even, through a process of metonymy, with verbs such as increase, drop, fall, surge... (as in "Coffee dropped dramatically on the New York Stock Exchange", meaning "the price of coffee....").
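The assumption mentioned earlier in this section, that words sharing many collocates are semantically close, can be illustrated with a deliberately simple overlap measure. The sketch below uses the Jaccard coefficient, which is an assumption of this example and not necessarily the measure used by Grefenstette; the collocate sets are the verb and noun collocates reported above for Xtract, and the names COLLOCATES, jaccard and closeness are hypothetical.

```python
# Verb and noun collocates of three nodes, as listed in the Xtract examples.
COLLOCATES = {
    "law": {"change", "enforce", "pass", "violate", "court", "universe"},
    "argument": {"have", "reject", "settle", "use", "court"},
    "food": {"eat", "have", "prepare", "sell",
             "health", "industry", "product", "shortage", "supply"},
}


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closeness(word1, word2):
    """Collocate overlap between two nodes of the toy table above."""
    return jaccard(COLLOCATES[word1], COLLOCATES[word2])


for pair in [("law", "argument"), ("argument", "food"), ("law", "food")]:
    print(pair, round(closeness(*pair), 3))
```

On data as sparse as these three entries the scores say little; the point is only that the measure depends entirely on which collocates the corpus happens to supply, which is why the nature of the corpus matters so much.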

[8] Consider the sentence This law was done away with in 1983. It combines several difficulties which hinder the automatic recognition of the collocation: passivization and the use of a complex phrasal verb. Most collocation extraction tools work best with bigrams, i.e. pairs of words co-occurring within a given span of words. The phrasal verb do away with is actually made up of three "words" and the combination with law involves four highly frequent items, which makes the identification of this V-N collocation very difficult.

2.4 Conclusion

The aim of this section was not to suggest yet another definition of collocations but rather to try to sketch the various facets of these quaint combinations of words and to provide a picture of current attempts to extract them from MRDs and corpora. It should now be clear that there is no such thing as a clear, non-controversial and all-embracing definition of a collocation. This very notion should be conceived as a rather fuzzy area along a cline ranging from totally free combinations on the one hand to completely fixed multi-word units on the other. Various factors need to be taken into account to get a clear picture of collocations, from purely linguistic factors to more statistical phenomena. It should be realized that the various competing schools would probably not reach any agreement as to the exact place a given combination of items occupies on this continuum. As will be made clear below in the chapter devoted to a brief survey of a few prominent collocational dictionaries, compilers themselves often have difficulty in adhering to the principles they establish in the front matter of their dictionaries.

For the purpose of this book, what should be borne in mind is that the primary material which I will seek to extract from the Collins-Robert dictionary belongs to the central area of the cline. Idioms will not be considered here because they are found in the section devoted to examples in the printed dictionary and this section has not been exploited. Free collocations, which are usually predictable and do not pose too serious problems for language learners, will normally not be taken into consideration either, although it should not be forgotten that the Collins-Robert dictionary does not make any distinction between restricted and semi-restricted or free collocations, between delexical, figurative, specialized or bound collocations. These technical terms definitely come in handy when one tries to characterize the linguistic properties of these combinations but, for reasons which will become clear in Chapter 6, which describes the construction of the database proper, no effort has been made in this dissertation to label the 70,000-odd combinations contained in our database with such technical terms.

To sum up briefly, the analysis of the Collins-Robert metalinguistic apparatus in Chapter 6 will make it clear that most of the material contained in the database consists of lexical collocations belonging to the 'restricted' type defined above. Because of the very nature of the metalinguistic apparatus, a non-negligible number of 'free' collocations, of selectional restrictions and of non-collocational relations will be treated as well, using, whenever possible, Mel'čuk's theory of lexical functions (see section 5.3.4). All in all, then, the entire enterprise could be considered as an exercise in lexical knowledge acquisition from a bilingual MRD with the ultimate aim of making its lexical-semantic (co-occurrence and paradigmatic) information readily accessible for a number of NLP or human applications.

3 A few collocational dictionaries

3.1 Introduction

It hardly needs to be said that the Collins-Robert dictionary is not a collocational dictionary. It is a standard, bilingual, general-purpose dictionary which happens to be a treasure trove of collocational information. It is precisely this wealth of co-occurrence knowledge, together with the availability of the dictionary in a machine-readable form, which convinced me that there was even more in it than meets the eye and that it was worth embarking on its exploitation from a collocational perspective. The project described in this book, however, builds on earlier attempts to compile collocational lexicons, since the past 10 or 15 years have seen the emergence of a new generation of combinatorial dictionaries intended for learners of foreign languages. Cowie (1997) notes that EFL dictionaries of collocations and idioms have been strongly influenced by Russian (and in general Eastern) phraseological research. In his survey of some major developments in British and Russian phraseology over the period 1930 to 1960 and their influence on recent EFL dictionary-making, Cowie focuses his attention on collocational dictionaries proper and on dictionaries of idioms such as Cowie, Mackin & McCaig (1983/1993) or Long & Summers (1979). The latter dictionaries are devoted to idioms and restricted collocations, strongly emphasizing the transformational potential (possibilities and constraints) of these combinations. Cowie notes that, as far as the Oxford Dictionary of Current Idiomatic English (ODCIE, Vol.2, Cowie et al. 1983) is concerned, "in the case of restricted collocations, only those items were included which were entirely invariable (e.g. break one's journey, curry favour) or which displayed limited collocability (e.g. a chequered career/history, do the necessary/needful)". Interestingly, when the headword in ODCIE is an idiom or a restricted collocation, the collocability of this headword is also shown.
An entry such as a flying start, for example, includes collocating verbs such as give, have, present or get off to, which illustrates the problem of recursive collocability (see also p.79). Since such true idioms and multi-word combinations which belong to the area bordering on idiom are most often not found in the italicized material of the Collins-Robert dictionary, but are usually dealt with in the boldface section (i.e. frequently buried among the examples), I will not consider dictionaries of idioms such as ODCIE or LDOEI any further. The type of material they contain, however important it may be for language teaching or computerized applications, is not included in the database described in this book, which is essentially derived from the italicized material of the CR dictionary. In this chapter, I would therefore like to survey a few collocational dictionaries whose design has, to some extent, exerted a strong influence on the structure of the Liège database and the rationale of the semantic information which has been added to it.

3.2 The BBI Dictionary

The BBI Combinatory Dictionary of English: A Guide To Word Combinations (Benson et al. 1986a) is a monolingual dictionary of English that is designed to tell users which words go together. For a noun entry, for example, it gives the set of verbs that take this noun as subject or as direct object, the adjectives that typically modify it or the other nouns which form a collocation with it. Like a learner's dictionary, it also mentions the various grammatical collocations the items enter into, which, for the BBI team, comprise genuine grammatical collocations such as verb/adjective + preposition combinations but also complementation patterns (e.g. verb followed by a that-clause). As noted by Cowie (1997), the latter can hardly be called 'collocations', since they do not involve lexical items but rather grammatical categories (+ infinitive, + gerund, etc). As a matter of fact, they are subcategorization phenomena and a number of linguists would prefer to call them 'colligations', even if the term is mainly used for the combination of grammatical categories.

The principles on which the BBI dictionary is based are expounded in Benson (1985, 1989, 1990) and in Benson et al.'s book on the Lexicographic Description of English (1986b). Suffice it to say here that the authors insist that only restricted collocations are entered in their dictionary. The underlying principle for the inclusion of given combinations is based on their conception of collocations as 'arbitrary recurrent word combinations', the notion of 'recurrence' being applied rather empirically and intuitively, since the dictionary only includes made-up material and excludes any type of corpus-based data. Benson (1989:6-8) insists that the BBI adheres to the principle that lexical collocations should be placed at the entries for bases (e.g. at the noun entries for noun + verb collocations such as to do/perform a caesarean section, to gain/get access...). As a rule, the BBI authors have therefore stuck to Hausmann's division between base and collocator, which provides an appropriate framework for the compilation of encoding dictionaries. The following figure illustrates the BBI entry for the noun impression:

impression n. 1. to create an ~ on 2. to make an ~ on, upon 3. to gain an ~ 4. an accurate; deep, indelible, lasting, profound, strong; erroneous, false, inaccurate, wrong; excellent; favorable; first; fleeting; general; good; painful; personal; pleasant; unfavorable; unpleasant; vivid ~ 5. an ~ that + clause (she created the erroneous ~ that her family is wealthy) 6. under an ~ (I was under the ~ that you would come)

Despite the fact that the use of an indefinite article is most disputable in under an impression (but the example in parentheses is correct), it has to be admitted that the BBI dictionary provides the user with a rich source of decision-making material in the form of lists of collocations to be used by someone who does not know which term to employ in a given context. The combinations are arranged as follows: lexical collocations are listed first and the so-called grammatical collocations are found at the end of the entry. A swung dash (~) replaces the headword and its position indicates the syntactic function it plays in a given construction (in a noun entry, a swung dash appearing to the right of a verb indicates that the noun is the direct object of a transitive verb, as in to create an ~ above). To save space, collocations are listed in strings and a comma separates synonyms or near synonyms in a given string. Non-synonymous collocations are separated by a semi-colon.

In a very interesting and informative introduction, the BBI authors outline the various principles that have governed the compilation of their dictionary. They describe the major types of grammatical and lexical collocations which are included in the BBI. The eight grammatical collocations described in their introduction (IX-XXIII) are labelled as follows:

G1: noun + preposition: a witness to, blockade against
G2: noun followed by to+infinitive: an obligation to do something, a pleasure to...
G3: noun followed by that-clause: the gossip that he intends to give up his job
G4: preposition + noun: on the Stock Exchange, by accident
G5: adjective + preposition: angry at someone, busy at/with
G6: predicate adjective + to-infinitive: he is ready to go
G7: adjective + that-clause: She is adamant that I should accompany her.
G8: this category is subdivided into 19 verb patterns indicating whether a verb is followed by to-infinitive, a gerund, a that-clause, etc. Some patterns also specify whether the verb undergoes a specific type of alternation such as, for example, the dative movement alternation (John gave Mary the book vs. John gave the book to Mary).

All in all, the eight types of grammatical collocations, including the 19 verb patterns, roughly correspond to the complementation and valency patterns which have made learners' dictionaries such as LDOCE, OALD, COBUILD or CIDE so popular. The latter dictionaries, however, explicitly indicate possible constructions in the form of grammatical codes appearing in the entries themselves, most often at word sense level. Whether such codes are "easy to understand", as the Cobuild team claims they are, or more obscure, is immaterial here.
What is important is that all learners' dictionaries provide this information (even the CIDE illustrative examples are assigned grammar codes), while the BBI has decided not to adopt the principles set out in its introduction. Cowie (1997) argues that "nothing has been lost by deciding (in the great majority of cases) not to include pattern codes in the entries themselves (because) syntax is clearly and economically conveyed through examples". One may then wonder why the BBI authors have invented such codes as Gl, G2... which are hardly ever used in the dictionary itself (verb patterns are more frequently mentioned but the level of specificity in the description of the syntactic environment in which a given item may be inserted is much lower than the detailed grammatical coding provided by the learners' dictionaries cited above). The introduction to the BBI also includes an interesting description of the principles which guided the selection of lexical collocations. Benson et al, (1986a: XXIV-XXVIII) distinguish seven major types of lexical collocations, which they label as follows: LI : transitive verb + noun: make an impression, inflict a wound, wind a watch. The authors note that many verbs in this category denote creation (compose music) and/or activation (set an alarm), which explains why they call these arbitrary fixed combinations CA collocations.

L2: transitive verb + noun: annul a marriage, quench one's thirst, eradicate a disease, abolish a law. Syntactically speaking, these combinations resemble L1 collocations. From a semantic point of view, however, the verbs in the L2 class basically denote eradication and/or nullification, which is why the authors call these combinations EN collocations.
L3: adjective + noun: a formidable challenge, a confirmed bachelor; warm/kind/best regards.
L4: subject noun + verb: telephones ring, elephants trumpet, water boils, freezes...
L5: noun + noun: a herd of cattle, a school of fish, a piece of advice. The collocator denotes the unit which is typically associated with the base. This unit can refer to a regular group or to a smaller, specific unit.
L6: adverb + adjective: closely related, deeply absorbed.
L7: verb + adverb: argue strongly/heatedly, affect deeply, protest vigorously.

Several remarks are called for here. First of all, the seven categories of lexical collocations bear a strong resemblance to Mel'čuk's lexical functions (see Chapter 5). The L1 class includes collocations which are usually formalized in terms of the Oper and Real LFs. The L2 class contains the EN verbs which are exponents of the Liqu function in Mel'čuk's framework. The L3 category is slightly more complex: in many cases, the adjectives have an intensifying function, which can best be represented in terms of the Magn LF. In other cases, however, they express meanings which are covered by the Ver, Bon or Qual functions. L5 collocations correspond to Sing and Mult, depending on whether the collocator refers to a single unit or to a regular group.1 In many cases, the L6 and L7 categories contain exponents of the Magn function applied to adjectives and verbs respectively. L4 is a mixed bag of noun-verb collocations which, in Mel'čuk's formalism, correspond to LFs as diverse as Son (elephants trumpet), Fact (bombs explode, adjectives modify) or Degrad (milk turns sour), to name but a few.
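The correspondences just described can be summarized as a small lookup table. The following Python fragment is only an illustrative sketch: the names BBI_CLASS_TO_LFS and classes_for_lf are hypothetical, and the lists of lexical functions contain only those named in the text (for L4 in particular, the list is explicitly open-ended).

```python
# Hypothetical lookup table: the seven BBI lexical classes and the
# lexical functions (LFs) that the text associates with each of them.
# The LF lists are not exhaustive; L4 in particular is a "mixed bag".
BBI_CLASS_TO_LFS = {
    "L1": {"pattern": "transitive verb + noun (CA)", "lfs": ["Oper", "Real"]},
    "L2": {"pattern": "transitive verb + noun (EN)", "lfs": ["Liqu"]},
    "L3": {"pattern": "adjective + noun", "lfs": ["Magn", "Ver", "Bon", "Qual"]},
    "L4": {"pattern": "subject noun + verb", "lfs": ["Son", "Fact", "Degrad"]},
    "L5": {"pattern": "noun + noun", "lfs": ["Sing", "Mult"]},
    "L6": {"pattern": "adverb + adjective", "lfs": ["Magn"]},
    "L7": {"pattern": "verb + adverb", "lfs": ["Magn"]},
}


def classes_for_lf(lf):
    """Return the BBI classes whose collocations may express the given LF."""
    return sorted(
        code for code, info in BBI_CLASS_TO_LFS.items() if lf in info["lfs"]
    )


print(classes_for_lf("Magn"))
```

Read in the other direction, the table shows that one and the same lexical function (Magn, for instance) cuts across several part-of-speech patterns.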
It is therefore clear that the BBI classification is basically syntactic since it depends on the part of speech and, occasionally, the syntactic function of the constituents of the collocation. Only in the L1 and L2 categories do we see an attempt at providing a broad semantic classification to distinguish CA and EN collocations. The most serious drawback of the BBI approach, however, is that these codes, however superficial they may be, are never used in the dictionary itself. Consider the entry for habit, which reads as follows:

habit n. ['custom'] ['usual manner'] 1. to acquire, develop, form a ~ 2. to make a ~ of smt. 3. to get into a ~ 4. to break a ~; to get out of a ~; (slang) to kick the ~ 5. to break smb. of a ~ 6. an annoying; bad; entrenched, ingrained; filthy; good; incurable; nasty; repulsive ~ 7. irregular; regular ~s 8. a ~ of (he has a bad ~ of interrupting people) 9. by force of ~ 10. out of ~ (I did it out of ~) ['costume'] 11. a monk's; nun's; riding ~

Even though the BBI dictionary manages to capture many arbitrary collocations, it does not attempt to provide the 'semantic' interpretation the authors describe in their introduction. The distinction between CA and EN collocations, for example, is an important one, but the user

1 The term unit used by the BBI compilers in their definition of the L5 class is an odd choice, since it can refer either to a set (shoal, school, herd, colony...) or to small pieces (bit, piece...).

is left to discover which combinations listed in the entry correspond to these semantic distinctions. Of course, synonymous collocations that share a common semantic component are listed in a string (to acquire, develop, form a habit), but the reader has to discover the deep meaning of the collocation since no indication tells him or her that the verb has, say, a durative, a terminative or a causative meaning. In this respect, the BBI information is very limited. In fact, the authors could even have dispensed with the nice classifications and subcategories described in the prefatory matter since the entries contain hardly more than lists of collocations without any attempt at guiding the user through some kind of semantic organization. The principle that commas separate synonymous collocators in a given string is, in some cases, not adhered to by the compilers, which means that the user is likely to be misled by the internal organization of an entry, as in the following example, abridged for the sake of clarity:

horse n. ['animal'] (...) 6. ~s canter; gallop; neigh; snicker; trot; whinny

The string of collocations above (item 6) is intended to provide information about the types of intransitive verbs which may co-occur with horse, thereby answering a question such as: What do horses typically do? It is clear, however, that the material within this string could, and even should, have been arranged differently. On the one hand, canter, gallop, and trot, although not synonymous, all refer to the gait of a horse, speed being the distinguishing factor. On the other hand, neigh, snicker and whinny refer to typical sounds made by horses and should appear in one and the same category. In an ECD-like entry (see section 5.3.4.5), the lexical function Son would be used to formalize the meaning relationship between horse and these three verbs.
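The rearrangement suggested for item 6 of the horse entry can be made concrete with a short Python sketch. It first splits a BBI string according to the comma/semi-colon convention and then regroups the collocators by semantic class. The function names and the SEMANTIC_CLASS table are hypothetical, and the labels "gait" and "Son" are supplied by hand on the basis of the discussion above; the BBI itself provides no such labels.

```python
def parse_bbi_string(string):
    """Split a BBI collocator string into groups of (near-)synonyms.

    Semi-colons separate non-synonymous collocators; commas separate
    synonyms or near synonyms inside one group.
    """
    groups = []
    for chunk in string.split(";"):
        members = [word.strip() for word in chunk.split(",") if word.strip()]
        if members:
            groups.append(members)
    return groups


# Item 6 of the entry for "horse" as printed: every verb stands alone.
printed = parse_bbi_string("canter; gallop; neigh; snicker; trot; whinny")

# Hand-assigned labels (an assumption, not BBI data): movement vs. sound.
SEMANTIC_CLASS = {
    "canter": "gait", "gallop": "gait", "trot": "gait",
    "neigh": "Son", "snicker": "Son", "whinny": "Son",
}


def regroup(groups, labels):
    """Regroup collocators by semantic label, keeping their original order."""
    regrouped = {}
    for group in groups:
        for verb in group:
            regrouped.setdefault(labels[verb], []).append(verb)
    return regrouped


print(regroup(printed, SEMANTIC_CLASS))
```

With the labels in place, the alphabetical list falls apart into the two categories (movement and noise) which the printed entry leaves the user to work out unaided.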
Even if no semantic labels are introduced in the dictionary (this would entail a drastic reorganization of the microstructure of the entries and would dramatically increase the size of the dictionary), it is obvious that the arrangement of collocators should, as much as possible, reflect the semantic components they share. The alphabetical ordering adopted in the entry above implies that the user of the dictionary who does not know the meaning of the verbs in question will almost certainly fail to perceive the link between these verbs and the two broad categories the compilers wished to cover (viz. movement and noise). A few words should also be said about the criteria which governed the selection of the material in the entries. Benson et al. (1986a:IX) make it clear that "collocations should be included in dictionaries (while) free combinations should generally not be included". The two attributes which the authors use to characterize restricted collocations are recurrent and fixed. Cowie (1997), however, shows that this definition of restricted collocations is based on two misconceptions. First, Cowie argues, there are many arbitrary restrictions which are not entirely fixed (quench/extinguish a fire, but *slake a fire). Second, recurrence is a statistical measure and whether the frequency of occurrence can be correlated with restrictedness is a moot point, as is testified by the fiery debate opposing the proponents of corpus-based studies (see inter alia Sinclair 1987b, 1991) to those who argue for the inclusion of invented examples based on the intuition of the lexicographer (Cowie 1991, Hausmann 1979, Laufer 1992). The debatable nature of the BBI authors' definition of restricted collocations explains why the BBI lists many free combinations, alongside genuine restricted ones. While to proofread a book

or to censor a book are restricted collocations, to write a book is unquestionably a free combination. The BBI compilers have decided to include the latter because it typically refers to the main actions associated with books. The basic questions one asks with respect to books are the following (see also Fontenelle 1992a:224; Howarth 1997):

- What are books typically like? (a comic, illustrated, rare... book)
- What do books typically do? (they appear, come out, are published, go out of print...)
- What does one typically do with books? (one writes, publishes, censors, translates, edits2... books)

All these acts and attributes help us build a sort of semantic network around the notion of book, thereby helping us understand the nature of books in general. Cowie (1997) notes that this analysis draws on the concept of lexical function and accounts for the presence of numerous free collocations in the BBI dictionary. We will see in the subsequent chapters of this book that the database built from the Collins-Robert material is based on the same principles insofar as the distinction between restricted and free collocations has not always been respected. Neither did it have to, if one considers that the Collins-Robert dictionary is not a collocational dictionary. Moreover, the applications envisaged in this book are likely to make full use of the wealth of information provided by this database even though, theoretically, a distinction ought to have been made between free and restricted collocations.

3.3 Selected English Collocations (SEC)

Kozlowska & Dzierzanowska's Selected English Collocations (1982; 2nd ed. 1993) is a dictionary which is intended to give guidance on the verbs, adjectives and nouns that can be used in combination with the noun headwords of the dictionary. The latter are listed alphabetically and the short preface describes the typical structure of an entry (pp.7-11):

- a headword,
- a list of transitive verbs that can have the headword as object,
- a list of verbs that can have the headword as subject,
- a list of adjectives that can qualify the headword,
- a list of nouns that can form with the headword noun+noun collocations.
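The five-part entry structure lends itself to a simple record type. The Python sketch below is purely illustrative: the class SecEntry and its field names are hypothetical, and the sample values are a small selection taken from the SEC entry for habit quoted later in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SecEntry:
    """Hypothetical record mirroring the five parts of an SEC entry."""
    headword: str
    object_verbs: List[str] = field(default_factory=list)   # headword as object
    subject_verbs: List[str] = field(default_factory=list)  # headword as subject
    adjectives: List[str] = field(default_factory=list)
    nouns: List[str] = field(default_factory=list)          # noun+noun collocations

    def all_collocates(self) -> List[str]:
        """Merge the four lists into one alphabetical list without duplicates."""
        merged = (
            self.object_verbs + self.subject_verbs + self.adjectives + self.nouns
        )
        return sorted(set(merged))


# A few collocates selected from the SEC entry for "habit".
habit = SecEntry(
    headword="habit",
    object_verbs=["acquire", "break", "develop", "form"],
    subject_verbs=["develop", "persist", "stick"],
    adjectives=["bad", "persistent", "regular"],
)
print(habit.all_collocates())
```

Keeping the four lists apart preserves the only access key SEC offers, namely the part of speech and syntactic role of the collocate.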

Unlike the BBI dictionary, SEC, which draws on newspaper data, does not indicate grammatical collocations or complementation patterns. Another feature which distinguishes SEC from the BBI is that the compilers of the former have deliberately excluded verbs and

2 Strangely enough, the verb read, which immediately springs to mind when talking about books, is omitted from this list of "collocations".

adjectives as headwords, all the headwords in SEC actually being nouns. The compilers justify this decision on the grounds that "the most important kind of collocation sought by a writer or translator is one based on the noun, for it is the noun that sets the semantic context in the sentence" (p.8). The user who is interested in non-nominal bases is referred to English Adverbial Collocations (Kozlowska 1991), which can be considered as SEC's companion volume and contains lists of adverbs collocating with verbal and adjectival entries (see below). The following extract from SEC illustrates the entry for habit:

HABIT
V. abandon, acquire, adopt, break, break oneself of, catch on to, conceal, develop, divest oneself of, drop, eschew, fall into, follow, form, get into, get out of, get rid of, give up, go back, grow out of, have, incur, keep to, learn, lose, pick up, pursue, resort to, take up ~
V. ~ become entrenched/ingrained, develop, grow, grow up, persist, stick
Adj. bad, compulsive, dangerous, (un)desirable, disgusting, (un)healthy, horrible, incorrigible, irritating, obnoxious, obstinate, odd, offensive, old, pernicious, persistent, (un)pleasant, regular, reprehensible, repulsive, strict, unfortunate, useful, usual, vicious, well-worn ~

The compilers note (p. 10) that the adjectives listed in SEC are mostly qualitative (reliable is given as a potential collocate of data) and not defining (statistical is not listed in the entry for data although the two words enter into a collocation). This reflects the encoding perspective adopted by the compilers who wanted to provide users with memory-jogging material. The most frequent problem non-native speakers have is to choose an item from an arbitrarily limited set of words (see also Cowie 1997; Hausmann 1979).
In the case of habit, this policy explains why an adjective such as persistent is listed (to be contrasted with persisting which forms with habit an unacceptable collocation) and why smoking or drinking do not appear in the entry, even though smoking (etc.) habit is a frequent combination. Cowie (1997) notes that "SEC - fortunately for the user - provides the opposite of what [the authors] claim it provides". Indeed, they insist their dictionary focuses on 'free' or 'open' collocations to the exclusion of restricted phraseological units while, in fact, SEC lists restricted collocations only (annotate, edit, publish a book) and excludes the free collocations which are to be found in the BBI (to write a book, for example). The latter, which also contribute to our understanding of the keywords under which they appear, are neglected because their inclusion would have entailed a dramatic increase in the size of the dictionary. Theoretically speaking, it is also more appealing to focus on the class of restricted collocations, which are likely to pose more problems to foreign learners, but the inclusion of material bordering on the 'free' zone also presents numerous advantages, such as the possibility of constructing semantic networks or envisaging applications in information retrieval, as is the case with the Collins-Robert database described in this book. Before concluding this section, I should also say a word about the 'retrieval' facilities offered by the dictionary. Since no semantic grouping is provided, the classification by part of speech is the only type of access key. Within a given grammatical class, the collocates are listed alphabetically. Consider the adjectives which can collocate with enthusiasm:

ENTHUSIASM
Adj. ardent, catching, delirious, eager, fanatical, fickle, great, guarded, lasting, lukewarm, much, overwhelming, surging, tremendous, violent ~

The presence of much in the class of adjectives is highly questionable. More serious, perhaps, is the criticism which can be levelled against the alphabetical ordering of collocates. Cowie (1986, 1997) suggests that it would be more appropriate to put together words of related meaning, using a superordinate term as a keyword introducing the semantic sub-class. Terms such as BIG or SMALL could be used to introduce classes containing adjectives such as ardent, catching, great, overwhelming, etc. on the one hand and lukewarm on the other. Such metalinguistic superordinates would help the learner to retrieve the appropriate items on the basis of a meaning component he or she would be armed with. Cowie's suggestion falls into line with the proposals put forward by Hausmann (1979). The latter also argues for the introduction of capitalized superordinates which enable the dictionary user to directly zero in on the relevant semantic sub-class, which dramatically reduces the retrieval time. Hausmann's entry for doute in French (also cited in Cowie 1986:66) clearly illustrates this methodology which makes a collocational dictionary much more user-friendly than SEC or the BBI:

doute
1. [S+v] NAITRE, EXISTER: naître, surgir, m'envahit, plane, subsiste, persiste; DISPARAITRE: s'évanouir, s'envoler.
2. [v+S] AVOIR: avoir, concevoir, éprouver, il me vient des doutes; FAIRE NAITRE: inspirer; EXPRIMER: émettre, formuler; FAIRE DISPARAITRE: lever, écarter, éclaircir, dissiper, balayer.
3. [v+prép+S] (être) assailli de doutes, rongé, torturé, tourmenté par le doute; être, laisser dans le doute; mettre, révoquer en doute.
4. [(a)+S+(a)]: légers ~, ~ affreux, subits, persistants, bien fondés.
5. [s+prép+S]: le supplice du doute

There is clearly some logic behind the ordering of the superordinate labels, although Hausmann leaves it implicit. The five main categories clearly reflect the traditional distinctions between N+V collocations, V+N collocations (where N is the direct object of the transitive verb), V+Prep+N collocations, Adj+N and N+N collocations. Interestingly, Hausmann and Cowie's suggestion that superordinates should be used as labels for semantic subclasses looks very much like Mel'čuk's use of lexical functions (see 5.3.4). The following correspondences between the metalinguistic labels above and lexical functions can be established:

NAITRE, EXISTER: IncepFunc0, Func0, ContFunc0
DISPARAITRE: FinFunc0
AVOIR: Oper1
FAIRE NAITRE: CausFunc0
EXPRIMER: CausManif
FAIRE DISPARAITRE: Liqu

The superordinate labels are much more telling because they are natural language expressions and have a mnemonic quality. Pedagogically, they are certainly more effective than the

somewhat obscure lexical functions whose opaqueness and complexity often confuse and baffle potential users of the ECD. The very complexity of these lexical functions, however, makes it possible to refine the formalization of the lexical-semantic relations between the base and the collocator. The broad class of verbs which are assigned the NAITRE/EXISTER label above are in fact far from being synonymous and would be distinguished with aspectual operators in Mel'čuk's formalism. Naître and surgir are inchoative verbs and should be the exponents of the IncepFunc0 function. Envahir is also inchoative but takes a direct object, which means that the IncepFunc1 LF should be used, making use of the subscripted reference to the first semantic actant of doute. Subsister and persister are durative verbs and should therefore be considered as exponents of the ContFunc0 LF. In the remainder of this book, I shall try to show the advantages of such semantic coding in a collocational database, capitalizing on hypotheses and suggestions put forward by Hausmann and Cowie, among others, and making use of the general rules formulated by the BBI and SEC compilers, in an attempt to improve these dictionaries in a principled way.
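The difference in granularity between Hausmann's superordinate labels and lexical functions can be illustrated with a brief Python sketch. The names LABEL_TO_LFS, VERB_TO_LF and group_by_lf are hypothetical; the data simply restate the correspondences and the verb-by-verb analysis given above, with subscripts written inline (Oper1, IncepFunc1). Verbs which the text does not analyse individually (plane, for example) are deliberately left out.

```python
# Coarse level: Hausmann's labels for "doute" and the LFs they cover.
LABEL_TO_LFS = {
    "NAITRE, EXISTER": ["IncepFunc0", "Func0", "ContFunc0"],
    "DISPARAITRE": ["FinFunc0"],
    "AVOIR": ["Oper1"],
    "FAIRE NAITRE": ["CausFunc0"],
    "EXPRIMER": ["CausManif"],
    "FAIRE DISPARAITRE": ["Liqu"],
}

# Fine level: the verbs filed under NAITRE, EXISTER, coded one by one.
VERB_TO_LF = {
    "naître": "IncepFunc0",
    "surgir": "IncepFunc0",
    "envahir": "IncepFunc1",   # inchoative, but takes a direct object
    "subsister": "ContFunc0",
    "persister": "ContFunc0",
}


def group_by_lf(verb_to_lf):
    """Group verbs by lexical function, keeping the order of first mention."""
    grouped = {}
    for verb, lf in verb_to_lf.items():
        grouped.setdefault(lf, []).append(verb)
    return grouped


print(group_by_lf(VERB_TO_LF))
```

A single mnemonic label thus splits into three aspectually distinct groups once the verbs are coded individually, which is the refinement the lexical functions make possible.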

3.4 English Adverbial Collocations (EAC)

Kozlowska's English Adverbial Collocations (1991) originated as a companion volume to the SEC dictionary described in the preceding section. While SEC focuses exclusively on noun headwords, EAC only includes verb and adjective headwords and is intended to provide lists of verb-adverb and adjective-adverb collocations. The book consists of two parts. Part one (pp.9-50) offers an elaborate definition of collocations, paying special attention to the concept of chain collocation (three-unit collocations in which the units are interlinked in a chain). Part two is the dictionary itself (pp.53-159). Like its predecessor (SEC), the EAC dictionary was produced without any computational support, which is quite remarkable when one considers the wealth of information it contains. The corpus from which evidence was extracted basically consists of newspapers (printed material only) and we are not given any information about its size and the criteria which were used to justify the exclusion or inclusion of collocations. The compiler acknowledges that "the material collected was intuitively recognized as likely to be of aid to learners, or writers, or translators" (p. 12). She considers her dictionary as a 'time-saver' which is intended to give ready access to a range of possible suitable words for a given context. No grammatical information is given, nor is there any attempt to assign stylistic or pragmatic labels. The following examples illustrate a few EAC entries:

BLAME sb/sth Adv. entirely, rightly, subsequently, unjustly, wrongly
BLEED (lose blood) Adv. copiously, heavily, profusely, to death
BLUSH Adv. deeply, furiously, hotly, painfully, violently

BOAST Adv. openly, rashly
BORING adj. Adv. deadly, excruciatingly
IGNORE sth Adv. altogether, blatantly, blithely, completely, conveniently, deliberately, endlessly, entirely, flatly, happily, indefinitely, pointedly, prudently, safely, shrewdly, stoically, stubbornly, studiously, uncharitably, utterly, wilfully

The entries above make it clear that the EAC material contains both free and restricted collocations. On the semantic level, this material also displays some heterogeneity. While rightly, unjustly or wrongly have a clear semantic link with the verb blame (roughly speaking, they correspond to the exponents of the Ver and AntiVer lexical functions in Mel'čuk's formalism), it is hard to perceive anything other than a free combination in blame subsequently where the adverb is not bound specifically to the verb and allows free substitution. The author's decision to deliberately reject frequency studies in the compilation of a collocational dictionary is justified as follows: "which collocates are better? - a high-frequency collocate, but somewhat commonplace and banal, or a low-frequency collocate, but expressive? Which should be included in a dictionary of collocations? Probably both are needed. But it may be that the high frequency of a word, or collocation, is not always the same as high value. Possibly a rare collocation is more valuable than a common one" (p. 14). Kozlowska is certainly right when one considers the needs of translators who require a profusion of decision-making material in their everyday activities. I believe, however, that an adverb such as subsequently does not contribute in any way to our understanding of the verb blame and should therefore be excluded from a collocational dictionary.
To use Heylen's terminology (1993), "blame subsequently" is a p-collocation insofar as a corpus-based statistical analysis would probably reveal that the two items frequently co-occur, but subsequently is a peripheral element that is much freer than verbs such as write or read which, when combined with the nouns letter or book, bear a syntagmatic and semantic relationship with the bases of the collocation, even though they cannot be called restricted collocations (see section 3.2, p.35). In fact, this illustrates the need for a term other than 'free' or 'restricted', since these concepts can be combined to form the following categories (see also Howarth 1997):

- semantically linked + restricted choice: advise strongly, bleed profusely
- semantically linked + open choice: blame unjustly/wrongly, write/read a book
- semantically unlinked + open choice: blame subsequently

Finally, the question of retrieval should also be mentioned. Unlike the SEC dictionary, EAC only provides two related classes of collocates, viz. adverbs and adverbial phrases. Within these classes, access is limited to the alphabetical order, very much as in the SEC material. No semantic information guides the user in his or her quest for the appropriate collocator. It is clear that many adverbs have an intensifying function (bleed profusely, affect deeply/profoundly...), but many collocators can also convey other shades of meaning, such as evaluation (justifiably/unduly optimistic, sing beautifully) or frequency (hardly/scarcely used). The precise meaning, however, is not specified and the user is advised to consult a traditional dictionary

for further explanations. The drawbacks are therefore the same as those which have been described in the section devoted to SEC. Some effort has been made in EAC, however, to guide the user more quickly to the appropriate word by introducing, when necessary, a few contextual or semantic indications such as the following:

ACCLAIM sb/sth Adv. (with active) enthusiastically, loudly, rapturously, rightly, warmly (with passive) justly, loudly, much, wildly
DESERVE sth (sth positive; e.g. reward, success) Adv. fully, richly, truly
DESERVED adj (sth negative; e.g. punishment) Adv. richly, well

The restrictions in parentheses may sound most useful, at first glance. The syntactic (active vs. passive) or semantic ([±POSITIVE] direct object) constraints they stand for indeed exercise a strong influence on the membership of the set of possible collocators. Knowles (1993:301-302) remarks rather harshly that such indications "are contentious for three reasons: firstly, they are dogmatic statements, rather than nicely stylistic observations; secondly, they introduce criteriality into the situation without fully naming and defining the criteria; and thirdly, they do not take proper account of actual linguistic usage in terms of a cline of acceptability/acceptance rather than the polarization of right vs. wrong". Even if it is true that such value-judgmental claims are often debatable and have not been checked against corpus data, their usefulness cannot be denied.
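The three categories distinguished earlier in this section (semantically linked or unlinked, restricted or open choice) amount to two independent binary features. The Python sketch below is an illustration only: the class Collocation and its attribute names are hypothetical, and the feature values assigned to the examples follow the judgments expressed in this section rather than any corpus evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collocation:
    """Hypothetical record: a base, a collocator and two binary features."""
    base: str
    collocator: str
    semantically_linked: bool
    restricted_choice: bool

    def category(self) -> str:
        """Name the category; the fourth combination is not discussed here."""
        if self.semantically_linked and self.restricted_choice:
            return "semantically linked + restricted choice"
        if self.semantically_linked:
            return "semantically linked + open choice"
        if not self.restricted_choice:
            return "semantically unlinked + open choice"
        return "not discussed in the text"


examples = [
    Collocation("bleed", "profusely", True, True),
    Collocation("blame", "unjustly", True, False),
    Collocation("blame", "subsequently", False, False),
]
for item in examples:
    print(item.base, item.collocator, "->", item.category())
```

Treating the two features separately makes it clear why a single 'free' versus 'restricted' opposition cannot accommodate combinations such as blame unjustly or write a book.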

3.5 Langenscheidts Kontextwörterbuch Französisch-Deutsch

Ilgenfritz et al.'s Langenscheidts Kontextwörterbuch Französisch-Deutsch (LKWB, 1989) is one of the few existing bilingual collocational dictionaries. It is intended for German-speaking learners of French and provides around 20,000 collocations for a set of approximately 3,500 noun entries. The structure of an entry is fairly straightforward. The French base is given, together with its German translation. Adjective-noun collocations are then listed, with their German equivalents. Verb-noun collocations and their translations come at the end of the entry and are frequently illustrated with a French example (which, unfortunately, is not translated). The following entry illustrates the collocational behaviour of the noun enthousiasme:

enthousiasme m Begeisterung
~ ardent flammende Begeisterung / ~ aveugle, irréfléchi blinde Begeisterung / ~ débordant überschwengliche Begeisterung / ~ délirant tobende Begeisterung; Taumel der Begeisterung / ~ mitigé gedämpfte Begeisterung
brûler, déborder d'~ vor Begeisterung brennen, überströmen: Lorsqu'à l'âge de 20 ans, il a pris la firme en mains, il brûlait d'~, rêvant de tout réformer. / se laisser emporter, gagner par l'~ sich von der allgemeinen Begeisterung mitreißen, anstecken lassen: Il criait à pleins poumons, se laissant emporter par l'~. / refroidir, calmer, (F) doucher l'~ die Begeisterung dämpfen, abkühlen: Il s'est rendu compte qu'il s'était fait beaucoup d'illusions et cela a refroidi son ~. / son ~ se relâche seine Begeisterung läßt nach: Dès la première difficulté, son ~ se relâcha. / soulever l'~ Begeisterung hervorrufen: Son discours a soulevé l'~ de ses adeptes. / transporter qn d'~ j-n hinreißen: Ce n'est pas un spectacle de nature à transporter d'~ les spectateurs.

Within a given grammatical category, collocations are listed in alphabetical order. In some cases, however, collocators which are deemed to be synonymous may appear together (consider the collocations enthousiasme aveugle, irréfléchi above). One may then wonder whether a more semantically-oriented approach would not have been preferable, especially when the entries are so long that looking for a given combination entails reading them from A to Z, which is like looking for a needle in a haystack. The artificial nature of some of the examples can also be criticized and one may definitely doubt their usefulness in a pedagogical perspective (Consider "C'est terrible, j'ai oublié de lui souhaiter son anniversaire et elle y tient tellement!", s.v. anniversaire, or, even worse, the example found under respect: "Cette personne, bien qu'elle soit très petite, inspire le respect"). Proponents of intuition-based dictionaries may have cogent arguments against the use of corpus data (see Cowie 1991) but the compilers of this dictionary would have been well-advised to consult a native speaker and make sure that the examples they had constructed were not too unnatural or odd, to put it mildly. Problems of consistency are present in any type of dictionary.
Consider the following two entries which illustrate complex multi-word terms:

durée du temps de travail f Arbeitszeit
réduire, diminuer, raccourcir la ~ die Arbeitszeit verkürzen: Ces dernières décennies, on a progressivement réduit la ~.

temps de travail m Arbeitszeit
récupérer le ~ perdu ausgefallene Arbeitszeit hereinarbeiten, nacharbeiten: Comment voulez-vous récupérer le ~ perdu? / réduire le ~ die Arbeitszeit verkürzen: Le ~ sera réduit à trente-six heures.

One first thing to note is that the verbs diminuer and raccourcir should have been listed next to réduire in the latter entry because they can also collocate with temps de travail, at least in the active voice (but consider the impossible constructions: * le temps de travail sera diminué/raccourci à 36 heures). Another, more fundamental criticism is that the compilers should not have deliberately excluded the all-important class of Noun-Noun collocations from their dictionary. Knowledge of collocations such as réduire le temps de travail is definitely essential in an encoding perspective, but I firmly believe that it is equally important to know that temps de travail is much more likely to collocate with réduction or diminution than with raccourcissement. In the field of nouns expressing emotions and feelings, to take another example, it is important to know that the noun colère can collocate with bouffée or accès, inter alia, to express a single "unit/portion" or manifestation of the feeling in question. In this respect, it is a pity that the compilers of the Kontextwörterbuch have ignored all the nominal

collocations which, in Mel'čuk's Meaning-Text Theory, correspond to the exponents of the lexical functions Sing, Mult, Culm, Centr or the complex LFs with S0.

3.6 Conclusion

As stated in the introduction to this chapter, I deliberately excluded from this survey dictionaries of idioms such as Cowie et al. (1983/1993) or Long and Summers (1979) on the grounds that they are not mainly concerned with the type of restricted collocations which can be found in the database derived from the Collins-Robert italicized material. The dictionaries surveyed include only one bilingual collocational dictionary, although it must be recognized that other such reference works are currently emerging such as, for example, Benson & Benson's Russian-English Dictionary of Verbal Collocations (1993), which is based on the same principles as the BBI dictionary. Very much in the same vein as Ilgenfritz et al.'s Kontextwörterbuch, Liang's Dictionnaire de collocations français-chinois (DCFC, in press) lists various types of collocations for around 2,500 nominal bases. Like the French-German dictionary, Liang's lexicon provides bases and their translations, followed by verbal collocations, adjectival collocations and noun-noun collocations (the latter type of combinations being excluded from LKWB, as I pointed out above). Descamps (1994:566) notes that the very structure of this French-Chinese dictionary is based on semantic principles: "(...) nul ordre alphabétique n'ordonne la file de ces collocations mais un ordre - autant que faire se peut - sémantique. Principale hypothèse de travail des auteurs: sous le gros des bases de la nomenclature - c'est-à-dire les bases désignant des états, des événements ou des actions il serait d'ordinaire possible de disposer les collocations verbales en cycle, de sorte qu'apparaisse l'évolution logique de la notion désignée par la base, "avec sa naissance, son développement, son apogée, son déclin, sa disparition et enfin sa renaissance éventuelle" (Liang 1991)". 
Such organization, which is designed to facilitate access to the collocations, strongly resembles the proposals put forward by Hausmann (1979), although Liang has chosen not to use the metalinguistic labels suggested by Hausmann to introduce the semantic subclasses corresponding to the different phases of the processes described. In a way, this semantic structuring also draws on Mel'čuk's approach, but without resorting to the formal apparatus of lexical functions. In the entry for joie, the first class of collocations will therefore refer to causative+inchoative verbs (inspirer, causer, apporter, procurer, donner de la joie à qn) while causative+terminative combinations (ôter la joie de qn) will appear at the very end of the entry (although this collocation sounds slightly unnatural to me). I have also excluded specialized (LSP) dictionaries although the past few years have seen the emergence of a number of glossaries and specialized lexicons containing collocational information. Canada, for example, has a long tradition of terminological work and several recent sublanguage combinatory dictionaries were produced there (Cohen 1986 for a collocational lexicon of the stock exchange; Lainé 1993: Vocabulaire combinatoire de la CFAO mécanique). Verlinde et al.'s Dictionnaire contextuel du français économique - Tome

A: L'entreprise (1992) is also an interesting attempt to describe the combinatorial properties of a set of terms in a given sublanguage. The pedagogical approach adopted in the latter dictionary demonstrates the usefulness of such collocational information in a language-teaching situation. However, it should be borne in mind that these specialized vocabularies cannot readily be described and formalized in terms of sets of relations such as Mel'čuk's lexical functions, which are geared to the description of general-language collocational phenomena. Blampain (1993:47) argues that it would be dangerous to try to apply Mel'čuk's theories to sublanguage phenomena because LSP dictionaries must resort to a large number of relations which Mel'čuk would call non-standard. It cannot be denied, however, that Mel'čuk's approach has exerted a strong influence upon the design of a few specialized collocational dictionaries and it is sensible to argue that LSP dictionaries could follow the general principles laid down by Mel'čuk, admittedly with different and very specific individual functions. Cohen (1986), for example, draws heavily on Mel'čuk's phasal and aspectual lexical functions in compiling entries for her Lexique de Cooccurrents (Bourse - conjoncture économique). Consider the following entry from her dictionary, which deals with the collocations of the noun emprunt (also cited in Heid 1994b:245):

emprunt

              nom                          sujets des verbes             objet des verbes                     adjectifs
START         émission lancement                                         émettre lancer
INCREASE      accroissement augmentation   s'accroître augmenter monter  accroître augmenter                  considérable élevé gros
UNDETERMINED
DECREASE      baisse diminution réduction  baisser diminuer              réduire restreindre                  petit
END                                                                      clore liquider rembourser restituer

It is clear that the metalinguistic labels which appear in the left-hand column are very close to the lexical functions Caus, CausPredPlus, IncepPredPlus, CausPredMinus, IncepPredMinus and Liqu. Cohen's contention is that it is possible to describe the behaviour of stocks, bonds, interest rates and share prices in terms of the various stages in an economic cycle (1993:505-506). She also argues that the use of her dictionary does not presuppose prior knowledge of the metalinguistic descriptive apparatus, whereas Mel'čuk's lexical functions do. START, INCREASE, DECREASE, END, etc. are indeed thought to be self-explanatory although it has to be admitted that the description is less specific than Mel'čuk's. Consider a label such as INCREASE which does not indicate whether the verb is causative or inchoative; in this respect, it merely corresponds to the lexical function Plus. The adjectives petit, considérable, élevé and gros do not display the increase/decrease seme and would be more aptly described in terms of BIG or SMALL, which would be closer to the Magn and AntiMagn functions, rather than INCREASE and DECREASE. Moreover, it may be that the field of economics, and more particularly that of economic fluctuations on the Stock Exchange, is more amenable to formalization in terms of lexical functions à la Mel'čuk than other sublanguages. The richness of the co-occurrence potential of terms belonging to this field is also demonstrated by Verlinde (1995) who uses corpus-analysis techniques to come up with most useful generalizations and suggestions for improved lexical entries in the field of economic fluctuations in French. Such a study also shows that, contrary to what one might expect, sublanguage collocations are not easier to circumscribe, even though a restricted field of activity and a limited number of possible events and actions could lead us to think that Mel'čuk's lexical functions would suffice (see also Spencer 1975 who, in his collocational dictionary of legal English, provides multiple access keys).

All the dictionaries that have been surveyed in this chapter are general-language collocational dictionaries, which facilitates comparison with the approach adopted in this book, since the Collins-Robert database described here contains, mutatis mutandis, approximately the same types of collocations (bases and collocators) as the BBI, SEC, EAC, LKWB... dictionaries.
It is mainly in the area of the lexical-semantic information which has been added to the original dictionary and the diversified access potential that the Liège CR database differs from the other products I have discussed in this chapter. Access to collocations is undoubtedly a crucial feature in collocational dictionaries (see also Cowie 1986:65 and Heid 1994a). We have seen that, in most cases, collocates are listed in the base entries, in keeping with Hausmann's proposals (1979), which take the user's point of view into account. The arrangement of collocations may vary widely, depending on whether the compilers have adopted a syntactic or a more semantically-oriented approach. Some dictionaries, such as LKWB or SEC, have adopted a part-of-speech classification and, within each subclass, list collocates in alphabetical order, without attempting to group synonyms or near-synonyms. Other dictionaries, such as the BBI, adopt a theory-oriented approach in making a distinction between semantic subclasses such as eradication/nullification verbs or creation/activation verbs, but without labelling them explicitly in the entries. The meaning-based organization of collocates may be implicit, as in the BBI dictionary or in DCFC, or more explicit with metalinguistic labels introducing each subclass (Hausmann 1979, Cohen 1986). Since, at the time of going to press, none of the dictionaries under review here is available in machine-readable form, access is limited to the internal structuring of the entries. In an electronic collocational dictionary, collocations can of course be retrieved even more readily since the user may be provided with several access keys: the base, the collocator, the lexical function à la Mel'čuk or the semantic label à la Hausmann, the translation, the part of speech, etc. These multiple access points, which are an essential feature of the collocational database described in this book, may be used in isolation or in combination, unlike what is generally possible in a traditional printed collocational dictionary which usually provides one type of access key only or which simply lists potential collocates without any attempt at formalizing the meaning relationship holding between the constituents of the collocation.
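To make the notion of multiple access keys more concrete, here is a minimal Python sketch of how collocations might be retrieved by any combination of keys. It is only an illustration under stated assumptions: the record layout, the names Collocation, RECORDS and search, and the function labels attached to each collocator (which merely follow the rough correspondences suggested above for Cohen's emprunt entry) are hypothetical and do not reproduce the actual structure of the database described in this book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collocation:
    base: str        # the base of the collocation
    collocator: str  # the collocate listed under the base
    pos: str         # part of speech of the collocator
    lf: str          # function-like label (illustrative assignment)


# Sample records built from Cohen's entry for 'emprunt'; the labels are
# assumptions made for this example only.
RECORDS = [
    Collocation("emprunt", "émettre", "verb", "Caus"),
    Collocation("emprunt", "lancer", "verb", "Caus"),
    Collocation("emprunt", "liquider", "verb", "Liqu"),
    Collocation("emprunt", "rembourser", "verb", "Liqu"),
    Collocation("emprunt", "considérable", "adj", "Magn"),
    Collocation("emprunt", "gros", "adj", "Magn"),
    Collocation("emprunt", "petit", "adj", "AntiMagn"),
]


def search(records, **keys):
    """Return the records matching every supplied access key."""
    return [r for r in records
            if all(getattr(r, name) == value for name, value in keys.items())]


# Keys can be used in isolation or in combination, for example:
# search(RECORDS, collocator="petit")
# search(RECORDS, base="emprunt", lf="Liqu")
```

Further keys mentioned above, such as the translation, could be added as extra fields in the same way.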

4 Pustejovsky's Generative Lexicon

4.1 The theory

In the past few years, Pustejovsky's theory of the Generative Lexicon (Pustejovsky 1991, Pustejovsky et al. 1993) has aroused a lot of interest in the lexical-semantics community. Pustejovsky's contention is that it is necessary to develop a theory which accounts for the generative devices which give rise to novel word senses. Such a theory should therefore allow for the flexible and creative aspects of language. While a lot of attention has generally been paid to the semantics of verbs (Levin's (1993) recent book on verb alternations is but one example of this), nouns tend to have been somewhat neglected, although they also give rise to numerous problems when one wishes to tackle the modelling of their semantic structure (see inter alia Chapter 11 on noun alternations and Lexical Implication Rules). Generative Lexicon theory postulates the existence of various representational devices for the formalization of the syntactic and semantic structures of lexical items and of nouns in particular. A word's meaning can thus be characterized by the following aspects which are part of its lexical structure:

4.1.1 Argument structure

This structure characterizes the general syntactic and semantic environment into which a given lexical item can be inserted. Typically, it specifies that the verb 'eat', in its basic sense, subcategorizes for two arguments, viz. a subject NP marked as [+ANIMATE] and a direct object NP marked as [+CONCRETE]. This type of syntactic information can generally be found in learners' dictionaries (LDOCE 1978 even offers a whole range of semantic restrictions in the form of semantic features à la Katz & Fodor; they are only available in the computerized version, however - see Michiels 1982 or Boguraev & Briscoe 1989).
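As a rough illustration of how such an argument structure could be checked mechanically, the sketch below encodes the two selection restrictions mentioned for 'eat'. The feature assignments for the sample nouns, as well as the names ARG_STRUCTURE, NOUN_FEATURES and satisfies, are assumptions made for the sake of the example and are not drawn from LDOCE or from any other dictionary.

```python
# Argument structure of 'eat' as described above: an animate subject
# and a concrete direct object.
ARG_STRUCTURE = {
    "eat": {"subject": "ANIMATE", "object": "CONCRETE"},
}

# Hypothetical feature sets for a few nouns (illustrative only).
NOUN_FEATURES = {
    "John": {"ANIMATE", "CONCRETE"},
    "apple": {"CONCRETE"},
    "idea": set(),
}


def satisfies(verb, subject, obj):
    """Return True if both arguments carry the features the verb requires."""
    frame = ARG_STRUCTURE[verb]
    return (frame["subject"] in NOUN_FEATURES.get(subject, set())
            and frame["object"] in NOUN_FEATURES.get(obj, set()))


# satisfies("eat", "John", "apple") is True, whereas an abstract noun such
# as "idea" fails both the subject and the object restriction.
```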

4.1.2 Event structure

This structure specifies the event-based interpretation of a word. Some nouns can refer to transitions, for example (Pustejovsky et al. 1993 use the nouns merger, acquisition and joint venture), but can also be used to refer to states (see my discussion of [+RESULT] or [+STATE] nouns in section 8.6). Verbs can also be defined in terms of their belonging to one of the numerous classes identified by Levin (1993): change-of-state verbs, verbs of activity, transfer-of-possession verbs, grooming verbs... This information is extremely important since it is argued that the syntactic behaviour of verbs can be predicted on the basis of the semantic class they belong to.

4.1.3 Qualia structure

The theory of qualia forms the crux of Pustejovsky's model. The underlying hypothesis is that nouns have to be described in terms of four roles ('qualia structures') which are not unlike the frames of the knowledge representation language described by Bobrow & Winograd (1977). These four roles are to be represented in a structured way to account for the semantics of nominals. The roles distinguished by Pustejovsky are as follows:

A. Constitutive role. This role refers to the relation between an object and its constituents (e.g. parts of the object).
B. Formal role. This role distinguishes the object within a larger domain. It houses information on the colour, shape, dimensions, position, etc. of the object in question.
C. Telic role. This role refers to the purpose which is typically associated with the noun. It can also specify the function of an object.
D. Agentive role. This role characterizes the factors which bring about the object. The information specified in this slot refers to whether the object is an artifact, for instance, or indicates how it can be created.

An example is certainly in order here to illustrate how such information is formalized in the qualia structure. Pustejovsky et al. (1993) show what the entry for book looks like:

book (x, y)
[Const: information(y)]
[Form: bound-pages(x) or disk(x)]
[Telic: read(T, w, y)]
[Agentive: artifact(x) & write(T, z, y)]

Admittedly, part of the information here is clearly encyclopedic and definitely not lexical. As I argue in Chapter 8, I do not see this as a problem since formalizing encyclopedic information makes it accessible by the function mechanism. Moreover, much of this information is crucial in full-text information retrieval (IR), and the qualia structures provide a sort of template with slots which have to be filled in with their respective values. Pustejovsky et al. (1993) argue that such information for book can be used to formalize (and hence predict) the environment in which the noun is generally found.
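The idea of a template with four slots can be sketched in a few lines of Python. This is only an illustrative encoding of the entry for book quoted above: the class name Qualia and the helper functions render and event_predicates are my own assumptions, and the slot values are kept as plain strings rather than parsed logical forms.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Qualia:
    """A template with one slot per qualia role."""
    constitutive: str
    formal: str
    telic: str
    agentive: str


# Slot values copied from the entry for 'book' quoted above.
BOOK = Qualia(
    constitutive="information(y)",
    formal="bound-pages(x) or disk(x)",
    telic="read(T, w, y)",
    agentive="artifact(x) & write(T, z, y)",
)


def render(noun, args, q):
    """Print an entry in the bracketed notation used in the text."""
    return (f"{noun}({', '.join(args)}) [Const: {q.constitutive}] "
            f"[Form: {q.formal}] [Telic: {q.telic}] [Agentive: {q.agentive}]")


def event_predicates(q):
    """Collect predicates whose first argument is the event variable T."""
    pattern = re.compile(r"([\w-]+)\s*\(\s*T\s*,")
    return {"telic": pattern.findall(q.telic),
            "agentive": pattern.findall(q.agentive)}


# event_predicates(BOOK) yields 'read' for the telic slot and 'write' for
# the agentive slot, i.e. the two events associated with a book.
```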
Pustejovsky et al. illustrate this claim with the following sentences:

(1) This book weighs four ounces.
(2) John finished a book.
(3) This is an interesting book.

(1) clearly refers to the Formal role of "book" while (3) refers to the Constitutive role. (2) can be considered as ambiguous since it can refer to either the Telic role or the Agentive role (an Agent has to write a book and the primary purpose of a book is to be read), depending on whether John is the writer of the book or a mere reader. Several remarks are in order here. First of all, Generative Lexicon theory is only in its infancy and even its proponents do not always achieve consistency in defining the fundamentals of the model they advocate. Pustejovsky (1991:426) clearly states that "weight" is a possible value of the Constitutive role (next to Material and Part/Component elements), while this very factor is used by Pustejovsky et al. (1993:333) in (1) above to refer to the Formal role of the book! A second thing to note is that it is not entirely clear to me how an adjective such as "interesting" in (3) above can be mapped onto the Constitutive role which states that a book is a container of information (although Bouillon & Viegas 1994 suggest a tentative method for doing just that, starting from the hypothesis that "adjectives will submodify the different qualia roles and the arguments inside them and, depending on the information they modify, will acquire their different senses" - p.40).
Third, a comparison with the qualia structure of dictionary as proposed by Pustejovsky (1991) may be useful at this juncture:

dictionary (x)
[Const: alphabetized-listing (x)]
[Form: book (x), disk (x)]
[Telic: reference (P, y, x)]
[Agentive: artifact (x), compile (T, z, x)]

This "definition" is somewhat disappointing from a lexicographical point of view since dictionaries are not necessarily alphabetized listings of words (reverse dictionaries are not alphabetized in a standard way and yet are called dictionaries; there are also thematic dictionaries in which information is not presented in alphabetical order but rather in a thesaurus-like way). Artificial Intelligence (AI) researchers have, for a long time, been confronted with this type of problem related to the existence of prototypes and concepts which deviate from a norm. Moreover, an inheritance mechanism is required to make sure that the noun dictionary inherits the properties of book and that compile can be viewed as a near-synonym of write, to account for the well-formedness of 'Here is a new dictionary written by Sue Atkins'.

Qualia structures have exerted some influence on researchers in computational lexicography. Briscoe et al. (1990) argue that they can be used to provide a semantic interpretation for sentences such as 'John enjoyed the paper', in which the object NP does not denote any particular activity. Verbs such as enjoy, finish, prefer or start can be used with either an NP or an infinitival/progressive VP (I enjoyed the paper vs. I enjoyed reading the paper). These verbs are normally associated with an event or activity (see section 2 above) but the sentence needs some kind of interpretation if the NP is not an event-related noun (e.g. cigarette in 'John finished his cigarette'). The solution suggested by Briscoe et al. to interpret such sentences is to expand them with a present participle inserted between the verb and the object NP. The present participle is derived from the default activity verb which is to be found in the telic slot of the qualia structure of the noun. If the telic role states that the primary purpose of cigarettes is to be smoked, the sentence 'John finished his cigarette' is expanded to 'John finished smoking his cigarette'. Similarly, 'I enjoyed the film' is interpreted as 'I enjoyed watching the film' because watch is the verb which expresses the telic role of the noun film. Briscoe et al. argue that the information contained in the telic role can account for the default interpretation, at least in unmarked cases. In marked cases, however, the context should make it possible to assign a different interpretation. Consider the sentence 'Kubrick directed over 10 films but The Shining is the one he enjoyed best'. This should definitely not be interpreted by resorting to the telic role of film (telic = watch), but the event type should be found in the Agentive slot (Agentive = make/direct/shoot). Hence, this sentence should be read as 'Kubrick enjoyed [shooting] The Shining'.

Another aspect of Pustejovsky's theory is that the Generative Lexicon as a whole is organized around a number of paradigms which nouns fall into. Such paradigms, known as Lexical Conceptual Paradigms (LCPs), include various types of noun alternations like the following (sample list quoted from Pustejovsky 1991:432):

a) Count/Mass alternation
b) Container/Containee alternation
c) Product/Producer alternation
d) Plant/Fruit alternation

These paradigms account for systematic ambiguities and alternations (e.g. the noun 'glass', which prototypically refers to a container, can also be used as a containee in 'He went back home after drinking several glasses'). LCPs roughly correspond to Ostler & Atkins's (1991) Lexical Implication Rules which are dealt with in chapter 11.
To put it briefly, the purpose of organizing the lexicon around such paradigms is to reduce the size of lexical entries to a minimum while capturing useful generalizations. The ultimate aim is to be able to predict the behaviour of lexical items by simply stating the semantic class to which they belong. It should be stressed that Pustejovsky gives new names to concepts which have been used for quite a long time in NLP-related fields. The qualia structures closely resemble the thesaurus-like structures which are so popular in the Information Retrieval community and in circles which advocate the use of relational models (see inter alia Evens et al. 1980, Evens 1988...). Noun alternations are similar to Levin's work on verb diathesis (Levin 1987, 1993) and can be traced back to Apresyan's notes on polysemy (1973). Finally, some of the information contained in the qualia structures can also be expressed in terms of Mel'čuk's lexical functions (Real and Labreal in particular roughly convey the same type of information as Telic although, as I argue in section 8.7, the need arises to devise a function-like relation called Telic in Mel'čuk's formalism to distinguish instrumental/purpose relations). Although many ideas developed in the Generative Lexicon theory are not new and the range of relations it covers is limited, Pustejovsky's framework is interesting insofar as it attempts to provide an integrated theory capitalizing on most of the current hot issues in lexical semantics.
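The expansion procedure attributed to Briscoe et al. (1990) earlier in this section lends itself to a small sketch. The Python fragment below is a deliberately naive rendering of the idea, not their implementation: the lexicon DEFAULT_EVENTS, the function name expand and the convention of taking the last word of the object NP as its head noun are all assumptions introduced for illustration.

```python
# Present participles of the default event verbs stored in the telic and
# agentive slots of a few nouns (values taken from the examples above).
DEFAULT_EVENTS = {
    "cigarette": {"telic": "smoking"},
    "film": {"telic": "watching", "agentive": "shooting"},
    "paper": {"telic": "reading"},
}


def expand(subject, verb, object_np, role="telic"):
    """Insert the participle found in the given qualia slot of the head noun.

    The head noun is naively assumed to be the last word of the object NP.
    If no event is recorded for the noun, the sentence is left unexpanded.
    """
    head = object_np.split()[-1]
    participle = DEFAULT_EVENTS.get(head, {}).get(role)
    if participle is None:
        return f"{subject} {verb} {object_np}"
    return f"{subject} {verb} {participle} {object_np}"


# Unmarked case: the telic role supplies the default reading.
# expand("John", "finished", "his cigarette")
#   -> "John finished smoking his cigarette"
# Marked case: the context selects the agentive role instead.
# expand("Kubrick", "enjoyed", "the film", role="agentive")
#   -> "Kubrick enjoyed shooting the film"
```

Choosing between the telic and the agentive slot is left to the caller here, since the text only says that the context should make the marked reading available.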

4.2 Conclusion

Pustejovsky and his colleagues argue that corpus analysis techniques make it possible to go somewhat further than the mere detection of co-occurrence relations. They argue that the collocation systems which can be arrived at by purely statistical techniques can be enriched significantly if they rest on an elaborate semantic framework such as Generative Lexicon theory and one of its fundamental components, qualia theory. However, this theory, which is the underlying framework of the Acquilex project to which I alluded in the introduction, does not provide as broad a spectrum of lexical-semantic relations as another, somewhat competing, theory called the Meaning⇔Text Theory developed by Mel'čuk. It is to this framework that I would now like to turn in the following chapter.

5 Meaning⇔Text Theory and the Explanatory Combinatory Dictionary

5.1 Introduction

In this chapter, I should like to introduce the basic concepts which underlie the formalism I have opted for in this book. Since I am primarily concerned with turning the machine-readable Collins-Robert dictionary into a large repository of lexical-semantic information, it is essential to draw on a linguistic formalism which is powerful enough to enable the linguist or lexicographer to capture the numerous types of relations that may connect lexical items. I believe that Mel'čuk's theory is, in spite of its limitations, one of the best candidates because it offers an impressive spectrum of relations which, for general language at least, can be used to formalize most recurrent links in the lexicon. Mel'čuk calls these relations lexical functions but, before turning to a presentation of these LFs, it is important to situate them within the broader context of the so-called Meaning-Text Theory (henceforth MTT). The best introductions to MTT are to be found in the first three volumes of the Explanatory Combinatory Dictionary of Modern French (ECD) (Mel'čuk et al. 1984, 1988, 1992). Scores of other articles have also been published on the topic and the repetitive nature of many of them allows me to limit my review to the most interesting and relevant papers (e.g. Mel'čuk 1978a, 1978b, 1988, 1989...). One of the seminal articles is unquestionably Apresyan et al. (1969) which paves the way for all subsequent combinatory dictionaries. It originally dealt with the Russian language and with a new type of dictionary which departs from traditional lexicographical practice insofar as it 'displays the process of text generation as an integral succession of steps', which means that the dictionary Apresyan, Mel'čuk and Zolkovsky have in mind 'must display in an explicit and logical form whatever information may be necessary for the correct choice and usage of words and phrases to convey a given idea in a given speech context' (1969:1).
One of the key concepts in the quotation above is undoubtedly the word 'generation' which is the essence of the Meaning-Text Theory. MTT aims at modelling language by developing a set of rules which can convert meanings into corresponding texts. The word "text" is here to be understood as any kind of linguistic production (words, morphemes, sentences, phrases, paragraphs, etc.). In other words, "this theory describes a natural language as a kind of logical device which associates with any given meaning M the set of all the texts in this language which are expressions of M (and which are consequently synonymous with one another), and with any text T, the set of all the meanings which are expressed by T (and which are, so to speak, homonymous with one another)" (Mel'čuk & Zholkovsky 1988:42). This explains why Meaning-Text Theory is also frequently represented as Meaning⇔Text Theory, the double arrow (⇔) indicating that the relationship works both ways (synthesis and analysis). Since the correspondence between meanings and texts is not a one-to-one relationship (one given text may have different meanings - polysemy or ambiguity - and one given meaning may be expressed in various ways - synonymy), it is essential to distinguish various levels of representation:

1. semantic representation
2. deep-syntactic representation
3. surface-syntactic representation
4. deep-morphological representation
5. surface-morphological representation
6. phonetic representation

These six levels of representation, which account for the transition between a given abstract meaning (semantics) and the actual utterance of a message (with all its phonological/prosodic features), are described in great detail in Mel'čuk et al. (1992:14-31). Each level is independent and has its distinct rules of combination. In this respect, it resembles the stratificational approach adopted in various machine translation systems such as Eurotra (see Allegranza et al. 1991), although the latter system can hardly be said to draw on a large body of (deep) semantic information. Nakhimovsky (1990:5-6) notes that the MTT approach to syntax draws on three sets of concepts:

a) a situation and its participants (a.k.a. actants)
b) a word and its semantic actants (semantic valence of the word)
c) a word and its syntactic actants (syntactic valence of the word).

This is very close to Wierzbicka's scenarios as a means of analysis and description of lexical meaning (Wierzbicka 1987, 1988). It is also related to Fillmore's notion of 'case' (Fillmore 1968) and, more recently, of 'frame' (Fillmore 1982a, 1991, Fillmore et al. 1994, Fillmore & Atkins 1994) insofar as the 'situation' represents a piece of reality in which keywords are contrasted with one another and classified as a function of the relationships which hold between the various actants in a given frame. The classical example is the so-called commercial-transaction scene, which involves four semantic actants (Fillmore would use the term 'participant' or 'semantic role' but it is only a matter of terminology):

1. a seller (S)
2. goods (G)
3. a buyer (B)
4. the price/money (M)

A speaker who wishes to describe a commercial transaction may resort to a series of verbs such as sell, buy, pay, charge or cost. The choice of a given verb means that the speaker in a way imposes a point of view from which he considers the situation as a whole. All the verbs above can be contrasted as a function of the ways in which they enable the various actants to be realized syntactically. Consider the following sentences which can be considered as paraphrases insofar as they describe the same situation or frame:

1. John sold the car to Peter for £1,000.
2. Peter bought the car from John for £1,000.
3. Peter paid John £1,000 for the car.
4. John charged Peter £1,000 for the car.
5. The car cost Peter £1,000.

As can be noted above, the various actants can occupy different positions in the sentence, which means that they are realized differently in terms of grammatical functions. This has strong implications for the lexical description of the verbs, of course. As a matter of fact, for each lexical entry, the number and the nature of the actants need to be specified, together with information on how a given actant is to be realized at surface level (1 = subject; 2 = second, or direct, object; 2' = indirect object). The commercial-transaction scene can be schematically represented as follows (the typical preposition needs to be specified if a given actant is realized as the head of a prepositional phrase):

          B            S            M       G
buy       1            (from)       (for)   2
sell      (to) / (2')  1            (for)   2
pay       1            (to) / (2')  2       (for)
charge    (2')         1            2       (for)
cost      (2')         -            2       1

(Fillmore 1991 - Seminar on "Semantic Interpretation and Construction Grammar", Prague Summer School in Computational Linguistics, Formal and Computational Models of Meaning - July 1991)

This way of formalizing relations between semantic and syntactic actants has clear parallels with Mel'čuk's linguistic model. The various actants are identified and labelled in the syntactic description of the lexeme and this labelling is of cardinal importance in the identification of lexical functions associated with a given lexeme. I shall return to the concept of lexical function below (see 5.3.4). Suffice it to say here that verbs such as buy and sell involve the same actants as shown above, though with different labels. If one considers that the lexical functions S1, S2, S3 and S4 stand for the typical terms for the numbered participants in Mel'čuk's model, we come up with the following sets of relations:

S1 (buy) = buyer     S1 (sell) = seller
S2 (buy) = goods     S2 (sell) = goods
S3 (buy) = seller    S3 (sell) = buyer
S4 (buy) = price     S4 (sell) = price

It should be noted that Mel'čuk's notion of actant/participant is to be construed more broadly than, say, Tesnière's actant (1959). First of all, the latter, who does not even take prepositional objects into account, is primarily concerned with a purely syntactic approach to the lexical description of items. Second, while buy and sell are trivalent items in Tesnière's framework, Mel'čuk and Fillmore introduce a fourth actant, viz. price, which makes it possible to distinguish buy and steal semantically. It should therefore be realized that the traditional distinction between arguments and modifiers is not crucial in Mel'čuk's model as far as the identification of semantic roles is concerned.
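The realization table and the S1-S4 relations given above can be turned into a small generator of paraphrases. The Python sketch below is offered only as an illustration of the mechanism: the names FRAME, PAST, TYPICAL_NAME, realize and S are hypothetical, the past-tense forms are listed by hand, and where the table allows two realizations, (to) / (2'), one of them is simply picked so as to reproduce the five sentences quoted earlier.

```python
# How each verb realizes the four actants: "1" = subject, "2" = direct
# object, "2'" = indirect object, a preposition = prepositional phrase,
# None = actant not expressed.
FRAME = {
    "buy":    {"B": "1",  "S": "from", "M": "for", "G": "2"},
    "sell":   {"B": "to", "S": "1",    "M": "for", "G": "2"},
    "pay":    {"B": "1",  "S": "2'",   "M": "2",   "G": "for"},
    "charge": {"B": "2'", "S": "1",    "M": "2",   "G": "for"},
    "cost":   {"B": "2'", "S": None,   "M": "2",   "G": "1"},
}

PAST = {"buy": "bought", "sell": "sold", "pay": "paid",
        "charge": "charged", "cost": "cost"}

# Typical names of the numbered participants, as listed in the text.
TYPICAL_NAME = {
    "buy":  {1: "buyer",  2: "goods", 3: "seller", 4: "price"},
    "sell": {1: "seller", 2: "goods", 3: "buyer",  4: "price"},
}


def S(i, verb):
    """Return the typical name of the i-th participant of the verb."""
    return TYPICAL_NAME[verb][i]


def realize(verb, actants):
    """Build a past-tense sentence describing the scene from the verb's viewpoint."""
    subject = indirect = direct = None
    prepositional = []
    for role in "BSMG":
        code = FRAME[verb][role]
        if code is None:
            continue
        filler = actants[role]
        if code == "1":
            subject = filler
        elif code == "2":
            direct = filler
        elif code == "2'":
            indirect = filler
        else:
            prepositional.append(f"{code} {filler}")
    parts = [subject, PAST[verb], indirect, direct, *prepositional]
    sentence = " ".join(p for p in parts if p)
    return sentence[0].upper() + sentence[1:] + "."


SCENE = {"B": "Peter", "S": "John", "M": "£1,000", "G": "the car"}
# realize("sell", SCENE) -> "John sold the car to Peter for £1,000."
# realize("cost", SCENE) -> "The car cost Peter £1,000."
```

The same scene thus yields five different sentences simply by changing the verb, which is the sense in which each verb imposes its own point of view on the frame.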

The preceding considerations point to the need for a detailed description of the syntactic and semantic properties of lexical items. The examples given so far have typically illustrated the description of verbs, but it is clear that nouns also undergo a similar treatment and need to be characterized in terms of their syntactic and semantic actants as well. If Mel'čuk's Meaning⇔Text Theory is concerned with devising formal rules to account for the correspondence between meanings and texts, such a theory must draw on an elaborate mechanism to represent the manifold properties of the lexical elements of a language. The lexical component on which the Meaning⇔Text Model is based is the so-called Explanatory Combinatory Dictionary (henceforth ECD). I now turn to a description of the ECD and of the underlying model for representing lexical-semantic relations. Since it is basically this model which is used in this book to enrich the dictionary database with semantic information, this description of the ECD will be more detailed than is usually deemed necessary in a state-of-the-art survey.

5.2 The Explanatory Combinatory Dictionary

Mel'čuk & Zholkovsky (1988:43) note that, "until recently, the predominant type of monolingual dictionary was the comprehension dictionary", which is oriented towards making texts understandable (providing tools to decipher a text and discover its meaning). A new generation of active dictionaries, oriented towards helping users produce or encode texts, began to appear in the 1970s and 1980s. Following the EFL tradition established by Palmer (1938) and A.S. Hornby (1954) and continued by A.P. Cowie and Crowther for the Oxford Advanced Learner's Dictionary (Cowie 1989, Crowther 1995), the Longman Dictionary of Contemporary English (Procter 1978, Summers 1987, 1995), the Collins Cobuild English Language Dictionary (Sinclair 1987a, 1995) and the Cambridge International Dictionary of English (CIDE, Procter 1995) resort to a set of (more or less explicit) grammatical codes which aim to specify the type of syntactic environment into which a given lexical item may be inserted (e.g. He said a few words to John, but not He said John a few words). These learner's dictionaries are also very good at capturing grammatical collocations (for example, the typical preposition required by a given verb, as in to depend on something or to deal with something). Foreign students are used, or at least should be used, to consulting these reference works to find out information about the complementation of verbs, nouns or adjectives (to suggest and to avoid require an -ing form while to offer and to decide are constructed with a to-infinitive). Very much like learner's dictionaries, the ECD provides the user with information about the syntactic potential of the items it describes. Such information is to be found in the "Government Pattern" section of an entry, to which I shall return below. Next to the more traditional types of lexical information (entry word, morphological information, examples), the ECD is characterized by a formal approach to definition writing.
The basic lexicographical principles for definitions are described at length in Mel'čuk (1988:171 ff). The first principle is that the defining language must not contain any ambiguous term. This contrasts with traditional dictionaries which, unfortunately, hardly ever indicate which sense of a word used in a definition is intended (see Michiels 1982 who argues for the use of a controlled defining vocabulary in which each defining lexeme is associated with a given sense; see also p.11). As a matter of fact, the suggestion formulated by Mel'čuk (1988:172) to use only one definition pattern such as 'intended for' in the ECD instead of the several variants found in other dictionaries ('serving', 'having the goal of', 'for', 'that one uses for'...), closely resembles the proposals put forward by Michiels & Noël (1984) to restrict the choice of definition patterns for key semantic relationships (I show in Chapter 7 how this harmonization has been achieved in the Collins-Robert database). Another principle is that the ECD definitions should only contain terms that are semantically simpler than the lexeme they define. This means that if a lexeme L is defined in terms of a word W, W cannot be defined in terms of L (although Cowie 1990 demonstrates that this is not always feasible). The main purpose of this rule is to avoid the pervasive circularity which can be found in any dictionary (e.g. harm = damage and damage = harm). Mel'čuk's definitions also depart from traditional lexicographical practice insofar as they reflect the number of actants involved by a lexeme. This means that each item is defined with respect to the frame it belongs to and its definition includes variables representing its semantic actants. The following definition for teach, quoted from Meyer & Steele (1990:66), illustrates this novel practice:

[TEACH] 1.1. X teaches Y to Z = X, having knowledge of, or skills in, Y, causes Z intentionally and methodically to learn Y

As can be seen above, the definition is actually composed of two units.
The definiendum (in italics) is a propositional form including the lexical unit and the variables (slots into which the semantic actants have to be inserted). The definiens, or definition proper, tries to make the semantic invariants as explicit as possible, using the variables (actants) in the body of the definition itself. The preceding principles all relate to the 'explanatory' nature of the ECD. The term 'combinatory' deserves much more than just a brief favourable comment since it is precisely the combinatory nature of the ECD that has made this dictionary so famous. As will be shown below, one of the essential aspects of the ECD is that it aims to describe the grammatical and lexical collocations of the lexeme as precisely and systematically as possible. To do so, it resorts to the concept of 'lexical function', to which I shall return below in greater detail since it provides a theoretical basis for the construction of my database. Before turning to a description of the general structure of an ECD entry, it is useful to briefly summarize the main properties of the ECD:

A. It is a production-oriented dictionary insofar as it supplies all the necessary information to turn a given meaning into a well-formed and idiomatic text.
B. It is a semantically-based dictionary insofar as it resorts to a special defining language to account for the various syntagmatic and paradigmatic relations a lexeme can enter into.
C. It is a combinatory dictionary insofar as it provides a systematic account of the co-occurrence (collocational) potential of the lexeme.
D. It is a formal dictionary insofar as the lexical information it contains is presented in a formalized, explicit, consistent and structured way.
E. It is a theory-oriented dictionary insofar as it is to be viewed as the lexical component of a broader theoretical framework, viz. Meaning⇔Text Theory.
F. It is a general-purpose dictionary insofar as it includes all types of lexical information (morphological, semantic, syntactic information, co-occurrence potential, definitions, examples...), unlike specialized dictionaries which usually focus on specific information types.

5.3

Structure of an ECD entry

The Explanatory Combinatory Dictionary is the lexical component of Mel'čuk's MTT model. Explanatory means that the dictionary describes what a lexical item means (very much like any other dictionary, albeit more systematically and in a more formal way, as is stressed above). Combinatory means that an ECD entry has to make it clear how the item may be combined with other items in a given context. The term combination covers various types of information since the dictionary specifies morphological, syntactic and collocational combinations. An ECD entry consists of six major zones which can be further sub-divided. The six zones are labelled as follows:

1. Introductory Zone
2. Semantic Zone
3. Syntactic Zone
4. Lexical Functions Zone
5. Examples Zone
6. Phraseology Zone

A detailed description of the contents of each zone can be found in Mel'čuk et al. (1984) and in Meyer & Steele (1990:62-94). Since I am not concerned with the construction of an ECD proper but only intend to draw on the formal MTT apparatus for the description of lexical-semantic relations in the Collins-Robert dictionary, I will not reproduce these detailed descriptions here. I will merely give a brief summary of what the user may expect to find in each section of an ECD entry.

5.3.1

Introductory Zone

This zone contains information on the headword (its definition number) and on its morphological properties (irregular forms, information on the non-existence of some forms, etc). It also specifies the part of speech of the lexeme and provides syntactic information that is common to all the definitions of a lexeme. The introductory zone also contains usage labels, if appropriate (vulgar, taboo, colloquial, formal, etc). The following example illustrates the kind of information one may find in this section:

EXPLAIN verb, no dative movement transformation (one cannot say *He explained me why he had not answered the question)

5.3.2

Semantic Zone

As its name indicates, this section describes the meaning of a lexeme. It can be subdivided into two parts: the definiendum (the item which is being defined) and the definiens (or definition proper). The definition for teach which is used above (see p.57) in the general presentation of the ECD typically illustrates the strategy adopted by Mel'čuk and his colleagues. This definition, quoted from Meyer & Steele (1990:66), is reproduced here for the sake of clarity:

[TEACH] 1.1. X teaches Y to Z = X, having knowledge of, or skills in, Y, causes Z intentionally and methodically to learn 1 Y.

The first part (up to the = sign) is a propositional form which introduces the semantic actants involved in the process, action or state referred to by the lexeme. These actants are specified in the form of variables (capital letters: X, Y, Z). The definiens explains the meaning of the lexeme in terms of simpler meaning components (which, as noted by Mel'čuk 1989, must eventually lead to the discovery of semantic primitives). As can be seen, the lexeme is considered as a predicate with 'slots' for arguments represented by the variables. Another thing to note is that the words used in the definition are disambiguated (learn in the example above is not merely the verb learn, it is the verb learn-sense 1). This of course presupposes that all the words appearing in the definition have been treated in the ECD, which is far from being the case since the three volumes of the French ECD which Mel'čuk and his team have produced so far cover around 200 vocables (in MTT parlance, vocable corresponds to the traditional notion of lexical unit). Surprisingly, no one seems to have noted the similarity between Mel'čuk's definitions and the new style of definition writing advocated by the Cobuild English Language Dictionary (Sinclair 1987a).
Sinclair (1987b), which describes the genesis of the Cobuild project, does not contain any specific reference to Mel'čuk's work, but it seems sensible to assume that Mel'čuk has, to some extent at least, inspired the Cobuild lexicographers. To quote Hanks (1987:117-118), Cobuild's Managing Editor in the 1980s, "each explanation consists of two parts; the first part (...) actually places the word being explained in a typical structure whereas (...) the second part consists of a more traditional-looking dictionary definition and identifies the meaning". It is instructive to compare the definition for teach cited above to illustrate Mel'čuk's model and the Cobuild entry which I reproduce below:

teach 1 If you teach someone something 1.1 you give them instructions so that they know about it or how to do it.

As can be seen, the first part of the definition also gives details of the semantic valence of the verb. You, someone and something (which correspond to the variables X, Z and Y respectively) can also be considered as slots in the predicate-argument structure of the verb. Of course, the information is not formalized in the same way as in Mel'čuk's ECD: the Cobuild user has to rely on his or her intuition, judgment and experience to discover that them and someone are co-referential while it and something are different names for the same variable. Cobuild is a learner's dictionary and, as such, targets a different kind of public altogether. Mel'čuk's theory-grounded dictionary has often been criticized for its non-pedagogical characteristics: in many cases, it is indeed hardly decipherable for the layman but it is undeniable that it has paved the way for better learner's dictionaries such as Cobuild.

5.3.3

Syntactic Zone

This section aims at establishing the correspondence between the semantic actants and the syntactic actants of a lexeme. To do so, it draws on the so-called Government Pattern (GP) table. Mel'čuk & Zholkovsky (1988:53) define GP as "a table in which each column represents one semantic actant of the lexeme (marked by the corresponding variable), and each element in the column represents one of the possible surface realizations of the corresponding syntactic actant". Very much like Fillmore's table referred to above (p.55), the GP table will indicate whether an English verb is followed by an infinitive or a gerund, whether it subcategorizes for a THAT-clause or selects a specific preposition, etc. Roughly speaking, it could be argued that this table corresponds to the various grammatical codes which are meant to formalize the syntactic environment of lexical items in English monolingual learner's dictionaries such as Cobuild (Sinclair 1987a), LDOCE (Procter 1978/Summers 1987), OALD (Cowie 1989) or CIDE (Procter 1995). The ECD differs from the latter dictionaries in several respects, however. First, the GP table also gives information on the possible realizations of the subject, while learner's dictionaries usually provide very little or no information on the first actant. A second difference is that, next to the GP table, the syntactic zone includes possible restrictions specifying that certain combinations of actants are, for example, impossible or undesirable. In EFL dictionaries, such unacceptable combinations are specified by default but this presupposes that all possible constructions have been listed, which is clearly not always the case (see Fontenelle & Vanandroye (1989:29) for a few examples of errors of omission in the assignment of LDOCE grammar codes). Third, unlike learner's dictionaries, the ECD explicitly links the semantic actants with their surface-syntactic realizations.
Cobuild, OALD, CIDE and LDOCE all resort to an elaborate system of grammatical codes to account for what might be called 'frame' information, i.e. information on the syntactic environment into which a lexical item can or must be inserted (see Michiels 1982 or Boguraev & Briscoe 1989 for a detailed analysis of the LDOCE system of codes; see also Aarts 1991 for a comparative analysis of Cobuild, LDOCE and OALD). In the ECD, every syntactic valence is supplied with a semantic interpretation. Apresyan et al. (1969:5) note, for example, that the noun aggression involves two actants. The first column of the government pattern for this word is then interpreted as 'the one who commits aggression' while the second column refers to the victim (the one against whom aggression is committed). An example may be in order here to illustrate the GP table and the restrictions which can be associated with it. I reproduce the Government Pattern table for the entry teach (1.1) as defined above (from Chen 1990:160). For the sake of clarity, I also reproduce the definition which makes it possible to see the relationship between the variables in the definition proper and their treatment in the GP table:

teach 1.1 X teaches Y to Z = X, believed to have knowledge of, or skills in, Y, causes Z intentionally and methodically to learn 1 Y.

1 = X        2 = Y                  3 = Z
1. N         1. N                   1. N
             2. to Vinf             2. to N
             3. that PROP
             4. WHinterr Vinf
             5. WHinterr PROP

This table should be read as follows: the first actant of teach (X) takes the form of a noun phrase (N). The third actant (Z) can be either a noun phrase or a prepositional phrase introduced by to. The second actant (Y) can be realized in several ways, ranging from a simple NP or a to-infinitive to a THAT-clause or a clause introduced by a WH-pronoun. It is possible to refer to a specific realization of an actant in the table with a symbol of the form Ci.j, in which C (which stands for Complement) refers to the actant, "i" is the number of the actant and "j" is the number of one of the possible implementations of this surface-syntactic actant (Meyer & Steele 1990:71). Each example can then be labelled so that the type of syntactic construction which is being illustrated is made explicit:

C1.1: John is not teaching today.
C1.1 + C2.1: John teaches English.
C1.1 + C2.1 + C3.2: John teaches English to law students.
C1.1 + C3.1 + C2.4: John taught me how to drive.

The restriction zone can specify the following constraint on the co-occurrence of actants:

C2.4 or C2.5 without C3: impossible

This accounts for the ungrammatical nature of: *John taught what we were supposed to do. (vs. the correct John taught us what we were supposed to do). C2.3 without C3 is possible, however, as is testified by the following acceptable sentence:

The philosopher taught that all truth was relative.
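The way the GP table and its restriction interact can be made concrete with a small Python sketch. It is only an illustration, not part of the ECD formalism: the names GP_TEACH and is_allowed are hypothetical, and the numbering of the realizations is assumed to follow the order in which they are listed in the table for teach.

```python
# Government Pattern for "teach" (1.1), transcribed from the table above.
# Outer keys are actant numbers (1 = X, 2 = Y, 3 = Z); inner keys are the
# numbers of the surface realizations (assumed to follow the table order).
GP_TEACH = {
    1: {1: "N"},
    2: {1: "N", 2: "to Vinf", 3: "that PROP",
        4: "WHinterr Vinf", 5: "WHinterr PROP"},
    3: {1: "N", 2: "to N"},
}


def is_allowed(realizations):
    """Check a set of (actant, realization) pairs, i.e. a set of C(i.j) labels.

    A combination is rejected if it uses a realization that is not listed
    in the table, or if it violates the restriction
    'C2.4 or C2.5 without C3: impossible'.
    """
    for actant, number in realizations:
        if number not in GP_TEACH.get(actant, {}):
            return False
    actants_present = {actant for actant, _ in realizations}
    has_wh_clause = (2, 4) in realizations or (2, 5) in realizations
    if has_wh_clause and 3 not in actants_present:
        return False
    return True


# John taught me how to drive: C1.1 + C3.1 + C2.4
print(is_allowed({(1, 1), (3, 1), (2, 4)}))
# *John taught what we were supposed to do: C1.1 + C2.5, no third actant
print(is_allowed({(1, 1), (2, 5)}))
```

The first call reports an acceptable combination and the second an excluded one; a that-clause without a third actant, as in the philosopher example, passes the check because the restriction only mentions the fourth and fifth realizations of the second actant.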

5.3.4

Lexical Functions Zone

This zone represents a major part in an ECD entry since it addresses the problem of the representation of restricted lexical co-occurrence. To describe the lexical combinability of a keyword, the ECD resorts to the apparatus of what Mel'čuk, Apresyan and their colleagues have called 'lexical functions' (henceforth LF). The core of the idea is relatively simple. In Lat. dominus/domina, servus/serva, a change of gender is expressed morphologically. In the absence of vir/*vira and *feminus/femina, it is reasonable to accept that the relation vir/femina, though not expressed morphologically, is the same at the level of semantics. Along with Wilkins (1668) who posited the existence of abstract particles or, much more recently, Katz & Fodor (1963) who developed the concept of markers, the Meaning-Text Theory would conceive of a lexical function called 'Female' to capture and describe this gender difference as part of the lexical meaning. Mel'čuk (1994) defines this concept formally as follows:

Lexical Function [= LF]
A function f associating with a lexical expression L a set f(L) of lexical expressions is called a Lexical Function if and only if one of the following two conditions is met:
A. Either f is applicable to several Ls; in this case, for any two different L1 and L2, if f(L1) and f(L2) both exist, then:
   1. Any elements of f(L1) and of f(L2) bear an almost identical relationship to L1 and L2, respectively, as far as their meaning and the Deep-Syntactic role are concerned
   2. At least in some cases, f(L1) ≠ f(L2)
B. or f is applicable to one L only (maybe to two or three semantically related Ls).

Mel'čuk calls LFs which satisfy criterion A normal LFs whereas LFs which meet criterion B are called degenerate LFs. In the same paper, Mel'čuk introduces two other criteria which make it possible to distinguish standard and non-standard LFs. He defines Standard Lexical Functions as follows:

5.3.4.1 Standard Lexical Function

A normal LF is called a Standard Lexical Function if and only if the following two (additional) conditions are simultaneously met:¹

3. f is defined for a relatively large number of arguments (in other words, f has a vast semantic co-occurrence and is general enough to be applicable to many different meanings).
4. f has a relatively large number of lexical expressions as its possible values.

¹ The standard notation for lexical functions is f(X)=Y, where X is the argument of the function f and Y is the value.

Non-standard LFs are either degenerate functions or LFs which do not satisfy criteria 3 and 4. At this juncture, it may be worthwhile to compare Mel'čuk's rather abstract and formal definition with the explanation provided by Steele & Meyer (1990:41): "A lexical function (f) is used together with a keyword to signify a set of either phraseological combinations related to the keyword or those words which can replace the keyword under certain conditions". While Mel'čuk is concerned with a formal description of meaning relationships between lexemes, Steele & Meyer give a more pedagogically-oriented definition, taking into account the fact that the arguments and values of a lexical function can either be combined or substituted (see the discussion about syntagmatic vs. paradigmatic relations below). To illustrate the type of phraseological combinations Steele & Meyer have in mind in their definition, one can consider the Magn lexical function which links the keyword (or argument) to the items that express the highest degree of the concept denoted by the keyword, as in:

Magn (thirst)= unquenchable        Magn (smoker)= heavy
Magn (liar)= arrant                Magn (pain)= gnawing, excruciating
Magn (bachelor)= confirmed         Magn (rely)= heavily

Another function is the Oper_n LF which provides for the keyword (X) a semantically impoverished verb (Y) that takes X as its direct object (the subscript n refers to the nth actant which is taken as subject). The following examples are cases in point:

Oper1 (complaint)= lodge        Oper1 (attention)= pay
Oper1 (support)= lend           Oper2 (attention)= attract
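Since the notation f(X)=Y treats a lexical function as a mapping from a keyword to a set of values, the Magn and Oper examples given so far can be stored in a simple lookup table. The following Python sketch is only one possible encoding: the names LEXICAL_FUNCTIONS, apply_lf and arguments_of are hypothetical, and the table contains nothing beyond the examples quoted above.

```python
# (function name, keyword) -> list of values, limited to the examples above.
LEXICAL_FUNCTIONS = {
    ("Magn", "thirst"): ["unquenchable"],
    ("Magn", "liar"): ["arrant"],
    ("Magn", "bachelor"): ["confirmed"],
    ("Magn", "smoker"): ["heavy"],
    ("Magn", "pain"): ["gnawing", "excruciating"],
    ("Magn", "rely"): ["heavily"],
    ("Oper1", "complaint"): ["lodge"],
    ("Oper1", "support"): ["lend"],
    ("Oper1", "attention"): ["pay"],
    ("Oper2", "attention"): ["attract"],
}


def apply_lf(function, keyword):
    """Return f(keyword) as a list of values (empty if nothing is recorded)."""
    return LEXICAL_FUNCTIONS.get((function, keyword), [])


def arguments_of(function):
    """Return, in alphabetical order, the keywords the function is recorded for."""
    return sorted(kw for fn, kw in LEXICAL_FUNCTIONS if fn == function)


print(apply_lf("Magn", "pain"))         # values of Magn for 'pain'
print(apply_lf("Oper2", "attention"))   # support verb with the 2nd actant as subject
print(arguments_of("Oper1"))            # keywords with a recorded Oper1 value
```

Returning a list rather than a single word reflects the fact that a function may have several values for one keyword, as with Magn (pain).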

The relevance of the concept of LFs in a multilingual perspective is obvious as it provides a systematic and consistent way of representing restricted collocations. Consider the following expressions:

English: to ask a question
French: poser une question
Dutch: een vraag stellen
German: eine Frage stellen
Spanish: hacer una pregunta

It can be argued that, in an encoding perspective, the choice of the verb is lexically determined by the base. In a decoding perspective, it is impossible to assign a sense to the verb without considering its direct object. The examples above illustrate what Gross (1981) calls 'support verb constructions', i.e. constructions in which the role of the verb is limited to 'supporting' the direct object with which it co-occurs, thereby establishing a link between the object and the subject and conveying information on tense, person and aspect (see section 2.2.3). As I argue in Chapter 2, such constructions often prove to be a stumbling block for foreign language learners who frequently have difficulty in selecting the appropriate verb (*faire or demander une question are out in French while make/do a question definitely sound un-English). In terms of Mel'čuk's lexical functions, these idiosyncratic relationships can be represented as follows (based on the table on p.20):

Fr. Oper1 (question) = poser        Ge. Oper1 (Frage) = stellen
Eng. Oper1 (question) = ask         Du. Oper1 (vraag) = stellen
Sp. Oper1 (pregunta) = hacer
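The encoding perspective described above, in which the verb is selected by the noun and not translated independently, can be sketched as follows. The Oper1 values are those listed above; the dictionary NOUN_EQUIVALENTS and the function support_verb are hypothetical names introduced for the illustration, and the two-letter language codes are merely a convenient labelling.

```python
# Oper1 values for the noun meaning 'question' in five languages (from the text).
OPER1 = {
    "en": {"question": "ask"},
    "fr": {"question": "poser"},
    "es": {"pregunta": "hacer"},
    "de": {"Frage": "stellen"},
    "nl": {"vraag": "stellen"},
}

# Hypothetical table of noun equivalents, keyed by the English noun.
NOUN_EQUIVALENTS = {
    "question": {"en": "question", "fr": "question", "es": "pregunta",
                 "de": "Frage", "nl": "vraag"},
}


def support_verb(english_noun, target_language):
    """Translate the base first, then let the base select its support verb."""
    noun = NOUN_EQUIVALENTS[english_noun][target_language]
    verb = OPER1[target_language][noun]
    return verb, noun


print(support_verb("question", "fr"))   # verb and noun for French
print(support_verb("question", "de"))   # verb and noun for German
```

The point of the design is that the English verb ask never enters the computation for the target language: only the base is translated, and the collocate is retrieved through the lexical function.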

Interestingly, a lexical function can be viewed as a structuring element which provides a basis for a relation of proportionality. Cruse (1986:118ff) establishes a parallel between traditional numerical proportionalities and lexical proportional series. Starting from a single four-element cell such as:

A ---- B
|      |
C ---- D

the relations between the elements can be determined and expressed as follows:

A is to B as C is to D
C is to A as D is to B

Replacing the letters by lexical items and extending the cell along one of its axes, one may come up with the following proportional series:

bachelor ---- confirmed
   |              |
  liar   ---- arrant
   |              |
 smoker  ---- heavy

The Magn function is the lexical-semantic relation which connects the elements of the left column with items on the right (bachelor is to confirmed as liar is to arrant...). It should be stressed, however, that proportional series do not show that the choice of the adjective is arbitrary.

5.3.4.2 Paradigmatic vs. syntagmatic relations

Unlike what is generally found in superficial accounts of Mel'čuk's theory, lexical functions should not necessarily be equated with relations whose purpose is to capture various kinds of lexical co-occurrence only. LFs can undeniably be used to account for a whole range of general-language restricted collocations. It should be realized, however, that several LFs also account for paradigmatic relations. Steele & Meyer's definition, which is cited above (1990:41), illustrates the dual character of lexical functions which can also "point to the words which can replace the keyword under certain conditions". In the same chapter, Steele & Meyer (1990:43ff) suggest a tentative classification of LFs, breaking them down into paradigmatic and syntagmatic functions. In their classification, the paradigmatic class consists of functions or relations which can serve as the basis of substitutions for the keyword itself. It includes the basic logical relations of synonymy or antonymy which are denoted by the Syn and Anti LFs respectively:

Syn (timid)= shy
Anti (like)= dislike

This class also includes various types of derivations (S0 = substantive derivate; A0 = adjectival derivate; Adv0 = adverbial derivate - each function is described in detail below):

S0 (hate)= hatred
A0 (sun)= solar
Adv0 (ease)= easily

It is clear that words such as sun and solar stand in a paradigmatic relationship with each other. Lyons (1968:73ff) notes that paradigmatic refers to the set of relations a given linguistic unit can enter into with all the units which can also occur in the same context. As a rule, then, the value of a paradigmatic function is used instead of the argument of this function while the values of syntagmatic functions are likely to co-occur with the keyword, thereby constituting its context (Lyons talks of 'potentiality of occurrence' for the latter type of relation). This dichotomy is not as clear-cut as it may seem, however, as witnessed by the recent work carried out by Justeson & Katz (1991a,b), whose statistical analyses demonstrate that antonymous adjectives co-occur in texts much more significantly than was previously imagined (and not only in prototypical fixed phrases or binomials such as black and white). This study is of course concerned with the co-occurrence of antonyms in discourse but, within the language system as such, antonymy is definitely a paradigmatic (choice) relation. Although I fully agree with Steele & Meyer that some of the lexical functions definitely belong to the paradigmatic group, I cannot but disagree with them when they suggest that qualifying functions such as Magn (=very, to a high degree), Bon (=good; standard expression of praise) or Ver (=as it should be) should also be considered as paradigmatic functions.
The three examples they give (p.45) clearly testify to the strong collocational link holding between the arguments and their values:

Magn (temper)= hot
Bon (argument)= forceful
Ver (knife)= sharp

Steele & Meyer admit that "the placement of a function in a particular class is (...) necessarily somewhat arbitrary" (p.42) but I am afraid I cannot see any sense in making the above distinction (paradigmatic vs. syntagmatic) if the classification they arrive at contradicts the definitions of the classes on which it is based. To give but one other example, Mult (grape) = bunch and Sing (soap)= bar, cake are also considered as paradigmatic relations in their classification even though anybody would identify a bunch of grapes and a bar/cake of soap as well-formed collocations and hence as syntagmatic relations. Some examples are difficult to classify, however. The collocation a loaf of bread undoubtedly illustrates a syntagmatic relation (Sing (bread)= loaf), but in She bought a sliced loaf, loaf is clearly in paradigmatic relation with bread. In an unpublished manuscript on the relationship between phraseology and language teaching, Mel'čuk (1992b) makes a similar distinction but sticks to the traditional definition of paradigmatic (i.e. substitutive) relationships as opposed to syntagmatic (i.e. combinatory) relationships. He also further subdivides each category as a function of the part of speech of the value of the function. For example, the Oper LF introduced above is a verbal function (since it formalizes the notion of support verb) which belongs to the syntagmatic class (the argument and the value can be combined to form a verb phrase - Oper (pressure)= exert; Oper (mistake)= make...).

5.3.4.3 Compound lexical functions

Mel'čuk has identified around 60 lexical functions (see below for a detailed description). His contention is that they can be used to formalize most of the systematic and recurrent lexical-semantic relationships in a general-language lexicon. It should be realized, however, that the spectrum of lexical relations cannot be covered by these LFs taken in isolation only. Mel'čuk's apparatus therefore makes it possible to combine syntactically related lexical functions to form what Mel'čuk & Zholkovsky (1988:64) call compound lexical functions. The meaning of a compound LF can be derived from the meanings of the simple LFs it is made up of.
A classic example is the combination of an adjectival function with Anti, which yields negative items, as in the following examples (Ver= as it should be):

AntiMagn (sum)= modest
AntiVer (suggestion)= unreasonable

Anti can appear in isolation to yield antonyms, as in:

Anti (like)= dislike
Anti (hot)= cold

Other functions, like Plus (more) or Minus (less), hardly ever occur in isolation: they are usually associated with Pred, which means 'to be the keyword of the function' and with another LF expressing the aspectual component of a collocate. Aspect is conveyed through the use of the following functions which express the phases of a process:

- Incep (to express the beginning of a process or state) (inchoative verb)
- Cont (to express the continuation of the process or state) (durative verb)
- Fin (to express the end of a process or state) (terminative verb)

Causative operators may also come into play in order to build compound LFs:

- Caus: to cause
- Perm: to permit/allow

The latter two LFs convey the diathetic value of the verb insofar as they account for the intervention of an external actant who causes someone or something to do something (Levin 1993:2 defines diathesis alternations as "alternations in the expression of arguments, sometimes accompanied by changes of meaning"). Cowie & Mackin (1993) and Cowie (1997) note that there are in English multi-purpose (delexical) verbs which regularly occur with each other to convey such aspectual meaning. The same verb system is manifested with the same LFs for several bases and the role played by prepositions in signalling the total meaning of the combination should not be neglected (into corresponds to Incep; out of corresponds to Fin, etc). The following examples are cases in point:

Incep: come into play        CausIncep: bring into play
Cont: stay in play           CausCont: keep in play
Fin: go out of play          CausFin: put out of play

The following examples illustrate various types of compound (or complex) LFs:

CausPredPlus (price)= raise, increase
IncepPredPlus (price)= rise, increase, rocket, surge
CausPredMinus (anger)= assuage, soften, soothe
FinFunc0 (anger)= subside, wear off
FinOper1 (influence)= lose
ContOper1 (habit)= keep
IncepOper1 (habit)= get into

It should be noted that this depth of analysis enables the lexicographer or the linguist to distinguish the verbs raise and rise in the examples above, since the former verb appears as the value of the function CausPredPlus (≈ to cause to be more) while the latter is the value of the function IncepPredPlus (≈ to begin to be more). In a pedagogical perspective, it is clear that Mel'čuk's apparatus is a powerful tool to account for these subtle distinctions which often pose a number of problems for foreign-language students. Of course, the distinction between transitive and intransitive is a valid one, but it is only syntactic and does not take into account the semantic differences between the two verbs. The explanation is sometimes to be found in a diachronic account of the evolution of these verbs but a synchronic perspective must resort to the aspectual and diathetic concepts of inchoativity or causativity to prevent students from producing ill-formed sentences such as:

*The government has decided to rise the price of petrol.
*The price of bread raised three times last year.
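The claim that the meaning of a compound LF can be derived from the simple LFs it is made up of lends itself to a small Python sketch. The function names split_compound and gloss are hypothetical, the segmentation relies on the assumption that every simple LF name starts with a capital letter and may end in a digit, and the one-word glosses (in particular 'cease' for Fin) are informal paraphrases chosen for the illustration.

```python
import re

# Informal glosses for the phasal and causative operators mentioned above.
VERBAL = {"Caus": "cause", "Perm": "permit", "Incep": "begin",
          "Cont": "continue", "Fin": "cease", "Pred": "be"}
DEGREE = {"Plus": "more", "Minus": "less"}


def split_compound(name):
    """Segment a compound LF name such as 'CausPredPlus' into simple LFs."""
    parts = re.findall(r"[A-Z][a-z]+[0-9]*", name)
    if "".join(parts) != name:
        raise ValueError(f"cannot segment {name!r}")
    return parts


def gloss(name):
    """Build a rough paraphrase; unknown components are kept in brackets."""
    words = []
    for part in split_compound(name):
        if part in VERBAL:
            words.append("to " + VERBAL[part])
        elif part in DEGREE:
            words.append(DEGREE[part])
        else:
            words.append(f"[{part}]")
    return " ".join(words)


print(split_compound("FinFunc0"))   # ['Fin', 'Func0']
print(gloss("CausPredPlus"))        # to cause to be more  (raise)
print(gloss("IncepPredPlus"))       # to begin to be more  (rise)
```

The two glosses differ only in their first component, which is precisely the causative vs. inchoative contrast that separates raise from rise.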

5.3.4.4 Fused lexical functions

The distinction between syntagmatic and paradigmatic functions which is alluded to above is not a clear-cut one, as I pointed out when discussing the classification proposed by Steele & Meyer (1990). At this juncture, mention should also be made of the notion of fused lexical function. The value of a fused LF actually covers the meaning of the keyword and hence cannot co-occur with it. In the following example,

AntiMagn (wind)= light

it is clear that the adjective light modifies and co-occurs with the noun wind. However, there is a word which refers to a light wind, viz. breeze. The relationship between wind and breeze can also be accounted for in terms of the AntiMagn LF but, since the two words are not likely to co-occur in a well-formed phrase, the relationship is represented as follows with the symbol //:

AntiMagn (wind)= light // breeze

In Mel'čuk's ECD, the symbol // separates the fused values (on the right) from the non-fused ones.
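The difference between the two kinds of value can be illustrated with a short Python sketch. The names parse_values and realize are hypothetical; the sketch further assumes that several values may be separated by commas and that a non-fused value is an adjective placed before the keyword, which holds for the AntiMagn example but not for every LF.

```python
def parse_values(entry):
    """Split an LF value string such as 'light // breeze'.

    Values to the left of '//' are non-fused, values to the right are fused.
    """
    non_fused, _, fused = entry.partition("//")

    def split(text):
        return [value.strip() for value in text.split(",") if value.strip()]

    return {"non_fused": split(non_fused), "fused": split(fused)}


def realize(keyword, entry):
    """Non-fused values combine with the keyword; fused values replace it."""
    values = parse_values(entry)
    combined = [f"{value} {keyword}" for value in values["non_fused"]]
    return combined + values["fused"]


print(parse_values("light // breeze"))
print(realize("wind", "light // breeze"))   # ['light wind', 'breeze']
```

The second call yields one expression in which the value co-occurs with the keyword and one in which the fused value stands on its own.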

5.3.4.5 List of standard lexical functions

Although lists of lexical functions can be found in most accounts of the Meaning-Text Theory (Mel'čuk et al. 1984, 1988 and 1992; Steele & Meyer 1990; Mel'čuk & Zholkovsky 1988; Frawley 1988a,b...), it is both useful and necessary to introduce each function briefly in this section. Moreover, most of the lists to be found in the published literature usually contain only French or Russian examples, with the notable exception of Steele & Meyer's illustrative material in English (1990:52-59). Since the relational lexicon I describe in this book is based on Mel'čuk's lexical functions and is derived from an English-French dictionary, it is essential that the reader should be in a position to fully understand the underlying principles of my database. The functions are listed below in alphabetical order.

1. A0: adjective derived from the keyword.

A0 (sun)= solar
A0 (spring)= vernal
A0 (law)= legal

2. A1, A2, A3...: typical modifier for the first, second, third... actant

A1 (surprise)= surprised
A2 (surprise)= surprising

It should not be forgotten that the choice of the numerical subscript depends on the identification of the actants that play a part in the scenario triggered by the keyword. This means that, in Mel'čuk's ECD, the lexical co-occurrence zone and the lexical functions have to be interpreted in terms of the information contained in the definition and in the Government Pattern Zone. The noun surprise, for example, would be defined as follows:

Surprise of X at Y: a short-term feeling experienced by X and caused by an event Y which has taken place and which X did not expect.

The government pattern zone will specify that X is the first actant and that it usually takes the form of an of-phrase or an Anglo-Saxon genitive (John's surprise or The surprise of the people who attended the lecture). Y is the second actant and usually appears as a noun in a prepositional phrase introduced by at (e.g. There was some surprise at his return). Therefore, the adjective that characterizes the first actant of the noun surprise is surprised (A1) while the adjective which refers to the second actant is surprising (A2). The distinction between A0 and An, where n is the number of the actant which is considered, may not seem relevant at first sight, but is perfectly justified. The nouns sun or spring do not involve any actant, which accounts for the subscript 0 for the derived adjectives solar and vernal. For a noun such as stomach, the definition must undeniably refer to an actant, namely the person or animal whose stomach is being referred to (stomach: part of the body of a person or animal X in which digestion takes place...). Two adjectives are closely related to the noun stomach: ventral (of or relating to the stomach or belly) and potbellied (having a large rounded stomach). The former adjective clearly does not qualify any actant of the keyword while the latter refers to the person who has this type of stomach (potbelly). This means that the relationship between the noun stomach and these two adjectives must be expressed by two different lexical functions:

A0 (stomach)= ventral           A0 (ventre)= ventral
A1 (stomach)= potbellied        A1 (ventre)= ventru

For further detail on these functions and the ways in which they can be identified in the definitions of a bilingual dictionary, the reader is referred to Chapter 7.

3. Able1, Able2...: adjective denoting a capability of the first, second... actant to perform an action inherent in the keyword. In many cases, there is a correlation with an active or passive meaning, as in the following examples:

Able1 (read)= literate
Able2 (read)= legible

The adjective literate can be paraphrased as 'who is able to read' while legible means 'which can be read' (assuming that the verb read - the keyword - takes two actants: X reads Y).

4. Adv0: adverb derived from the keyword.

Adv0 (honest)= honestly

5. Adv1, Adv2...: typical adverbial which characterizes the behaviour of the first, second... actant of the keyword.

Adv1 (dismay)= in dismay
Adv2 (dismay)= to [someone's] dismay

6. Anti: exact or near antonym

Anti (respect)= disrespect
Anti (like)= dislike

Mel'čuk has introduced a further classification, using the subscripted symbols ⊃, ⊂ and ∩ to subdivide this type of relation. The subscript ⊃ indicates meanings which are broader than the meaning of the keyword. The symbol ⊂ indicates a narrower meaning and ∩ denotes an intersecting meaning. The examples cited by Steele & Meyer (1990:53) are:

Anti⊂ (respect)= scorn
Anti⊃ (hope)= doubt, fear, dread
Anti∩ (scorn)= consideration

These three symbols are also used for the Syn LF, which denotes synonyms or quasi-synonyms (see p. 89). In the database described in this book, however, I have deliberately ignored such refinements because they need to be used in relation to an elaborate system of definitions for the keywords. Since no definitions are provided for the keywords in my database, such a level of refinement would have been achieved on a purely arbitrary basis, which is why I preferred to stick to the classical notion of antonymy or synonymy.

7. Bon: adjective referring to a standard expression of praise or approval.

Bon (advice)= sound, sensible, good
Bon (cause)= worthy, deserving

8. Cap: noun denoting the leader or chief of something

Cap (school)= head
Cap (ship)= captain
Cap (university)= chancellor, president

9. Caus: to cause

Caus (rise)= raise
CausFunc1 (problem)= create, pose

CausPredMinus (price)= drop, decrease

As can be seen here, Caus is often, though not necessarily, used in combination with other LFs such as Func, Oper, Pred, Plus or Minus (see below). Moreover, Caus is frequently associated with the phasal operators since causing something necessarily involves a change of state with a beginning, a continuation or an end. Caus will then be found in the following compound LFs: CausIncep, CausCont and CausFin. For the sake of clarity, Mel'čuk suggests abbreviating CausIncep to Caus while CausFin is rewritten as Liqu (see below).

10. Centr: refers to the centre or middle of something.

Centr (wheel)= hub, nave
Centr (atom)= core
Centr (town)= heart
Centr (problem)= crux, knot
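The compound functions mentioned under Caus (item 9), such as CausPredMinus or CausContFunc0, are written as concatenations of simple LF names, each optionally followed by a subscript. The following is a minimal sketch of how such names could be split mechanically. The list of simple names is partial, and the names `SIMPLE_LFS`, `SHORTHAND`, `parse_lf` and `expand` are hypothetical helpers introduced for illustration only; they are not part of the ECD formalism or of the database described in this book.

```python
import re

# Simple LF names drawn from this chapter's list. The set is partial, and the
# splitting scheme is an illustrative assumption, not part of the ECD formalism.
SIMPLE_LFS = [
    "Anti", "Caus", "Perm", "Incep", "Cont", "Fin", "Liqu",
    "Pred", "Plus", "Minus", "Oper", "Func", "Labor", "Fact", "Real", "Manif",
]

# Longest names first, so that a longer name is never cut short by a shorter one.
_TOKEN = re.compile(
    "(" + "|".join(sorted(SIMPLE_LFS, key=len, reverse=True)) + r")(\d*)"
)

# Shorthand noted in the text: Liqu stands for CausFin.
SHORTHAND = {"Liqu": ["Caus", "Fin"]}


def parse_lf(name):
    """Split a compound LF name into (simple LF, subscript) pairs."""
    parts, pos = [], 0
    while pos < len(name):
        match = _TOKEN.match(name, pos)
        if match is None:
            raise ValueError(f"unknown LF component in {name!r} at position {pos}")
        parts.append((match.group(1), match.group(2)))
        pos = match.end()
    return parts


def expand(parts):
    """Replace unsubscripted shorthand names by the LFs they abbreviate."""
    result = []
    for lf, subscript in parts:
        if lf in SHORTHAND and not subscript:
            result.extend((component, "") for component in SHORTHAND[lf])
        else:
            result.append((lf, subscript))
    return result


print(parse_lf("CausPredMinus"))
print(parse_lf("CausContFunc0"))
print(expand(parse_lf("Liqu")))
```

Subscripts are kept as strings so that double subscripts (as in Labor12) survive unchanged. How a subscript on a shorthand name should be distributed over its components is left open here, which is why `expand` only rewrites unsubscripted names.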

11. Cont: to continue

This phasal operator is most often found in combination with the Oper, Func or Labor LFs.

ContOper1 (contact)= keep, stay, remain [in contact with sb]
ContOper1 (influence)= maintain
ContOper2 (influence)= remain [under the influence of]
CausContFunc0 (peace)= maintain

12. Contr: contrastive term

Contr (heaven)= earth
Contr (zenith)= nadir

As is argued in section 7.9, knowledge of contrastive terms, or polar opposites (Lyons 1977:695ff), is crucial to perceive how a given lexical item schematizes the world. This function is therefore essential in a frame-semantics perspective as developed by Fillmore (1982a) and Fillmore & Atkins (1994) (see p.167 for further details).

13. Conv_ij: converse terms

The value of this function is a lexical item which has the same meaning as the keyword but with the actants i and j inverted.

Conv21 (more)= less
Conv21 (precede)= follow

A typical example of a converse term is the passive voice of a verb:

Conv21 (build)= be built

The examples above all involve lexical items with two actants, hence the subscript 21 to denote the permutation of the first and second actants while the overall situation is left unchanged. It is also possible to have converse terms with more than two semantic actants, as in:

Conv3214 (sell)= buy

This notation requires some explanation. The verb sell involves four actants, as can be inferred from the following propositional form (see also the table above, p.55): a person X sells an object Y to a person Z at a price W. The actants X, Y, Z and W are equated with the first, second, third and fourth actants respectively. The notation 3214 applied to the LF Conv means that the third actant of sell (Z) becomes the first actant of buy while the first actant of sell (X) becomes the third actant of buy. The second (Y) and fourth (W) actants of these two verbs remain unchanged. This enables us to infer that the logical proposition of the verb buy has the following form: a person Z buys an object Y from a person X at a price W. As can be seen, the syntactic functions of the names of the participants have changed because there has been a change of information structure but the situation (or frame, script, scenario or whatever name we can give to it) remains basically unchanged.

14. Culm: term referring to the culmination of something

Culm (anger)= paroxysm
Culm (glory)= summit, height
Culm (absurdity)= height

Interestingly, this function was not present in the earlier presentations of Mel'čuk's ECD. In Apresyan et al. (1969:12), for example, such combinations were accounted for in terms of the Centr LF. The example cited in this paper is:

Centr (glory)= summit

The definition for Centr was: the name of the 'central', 'culminating' part of an object or process. It is only later that Mel'čuk decided that the Centr LF should be further subdivided into two functions, Centr and Culm, to account for fairly similar though admittedly different types of co-occurrence.
It seems that Culm is reserved for nouns which can be evaluated along a scale and which therefore can collocate with verbs such as increase or decrease. Conversely, Centr is reserved for items which represent a geographical (spatial) or temporal entity whose centre or middle part can be identified. Note that it is also used to capture metaphorical sense extensions (Centr (problem)= crux, knot). This metaphorical interpretation of a function, which is by no means infrequent in Mel'čuk's formalism, is extremely dangerous insofar as it allows too much freedom and severely limits the formal nature of the description.

15. Degrad: to degrade, to get worse.

Degrad (milk)= go/turn sour
Degrad (marriage)= fall apart

This verbal function, which always yields a verb when used in isolation, is very often used in conjunction with A0 to generate the adjectives which can be combined with nouns denoting foodstuffs and which render the idea of absence of freshness, as in:

A0Degrad (butter)= rancid
A0Degrad (egg)= bad, addled, rotten

This lexical function clearly illustrates the difficulty of drawing a dividing line between collocations proper and selection restrictions. Heylen (1993) focuses on this very function to show that the distinction is not a clear-cut one:

"From a definitional point of view, the difference between collocations and selection restrictions may be clear: in the former case, we are dealing with the selection of lexical items but in the second case, we are dealing with semantic selection. A nice example of a selection restriction is 'rancid' which can only be said (by definition, or in its primary sense) of 'oily' or 'fatty' foodstuffs. Notice that in this case we would say that 'rancid' selects nouns that denote objects with these properties, whereas as collocational constructs we would reverse the direction of selection. From a statistical point of view, we may have a p-collocation here (statistically significant 'probability' or 'performance' collocation), and we could also find a lexical function for it (say Degrad), but strictly speaking, it is probably not a collocation in our sense. In practice, it may be hard to decide on the distinction between selection restrictions and pure collocations and we will sometimes treat cases of selection restrictions as collocations" (Heylen, 1993:32).

Another thing worth noting is that there may be more than one way of degrading or decaying.
Gentilhomme (1992:106) remarks that, in French: "le pain peut rassir, durcir, sécher mais aussi moisir, le vin [aigrit] mais peut se casser; le miroir [se ternit et peut] se piquer, le métal [rouille mais peut aussi] s'oxyder". Gentilhomme's contention is that it is necessary to account for these variants in a more fine-grained analysis. At first, the basic feature characterizing the combination, viz. Degrad, has to be extracted. Nuances can subsequently be accounted for by narrowing down the basic LF with specific indices (s'oxyder, for instance, is more technical than rouiller). This problem of overgenerality of Mel'čuk's theory is addressed in section 5.3.4.6 devoted to sub- and superscripts. It should also be pointed out that a lexical function such as Degrad cannot account for the existence of some arbitrary variants which originate as alternative metaphors (aigrir vs. se casser).

16. Epit: standard (empty) qualifier

The status of this function is not very clear as it is not used extensively in the ECD. Mel'čuk gives the following example in volume III (p. 127):

Epit (océan)= immense

Steele & Meyer (1990:54) add that the qualifier represents part of the meaning of the keyword as in:

Epit (pedal)= foot

These two examples show that Epit is used to denote a redundant descriptive word or phrase acting as a marker of a literary stylistic level.

17. Equip: noun denoting the team or crew.

Equip (ship)= crew
Equip (hospital)= staff

Interestingly, Equip and Cap can be considered as 'organizational qualifiers' (Steele & Meyer 1990:45). Organizations can indeed usually be described in terms of a hierarchy with a leader at the top (Cap) controlling a body of members (Equip).

18. Excess: verb meaning 'to function excessively'.

Excess (heart)= palpitate
Excess (motor)= race

This function is often used in combination with a subscripted modifier such as trem to indicate that the excessive functioning is related to trembling, for instance:

Excess_trem (earth)= quake
Excess_trem (voice)= shake

19. Fact0, Fact1, Fact2, ...: denotes a verb meaning 'to be realized', which takes the keyword as subject and the first, second... actant, if any, as direct object.

AntiFact1 (memory)= fail
Fact0 (dream)= come true
AntiFact0 (plan)= collapse

Fact means 'to function, to work, to be realized, fulfilled or implemented' and can be used in conjunction with aspectual or causative operators to form complex lexical functions. Compare the following examples which illustrate the use of the LF Fact with artefacts:

FinFact0 (brake)= fail
CausFact0 (brake)= operate, apply

20. Figur: denotes a standard metaphor of the keyword. The combination of the keyword with the value of this function usually results in a narrower synonym of the keyword itself.

Figur (smoke)= cloud [of smoke]
Figur (passion)= flame

Figur seems to be used mainly to formalize frozen metaphors and conveys a stylistic, not a semantic, value. It could therefore be used next to other functions which specifically address the semantic dimension of a collocation. When one wishes to indicate that the verb which expresses the typical sound made by an elephant is trumpet, one has to represent this in the following way: Son (elephant)= trumpet (see below). Trumpet is clearly a metaphor but its basic meaning has to be extracted and represented in terms of the Son function. Similarly, when we say that prices fall or that someone's enthusiasm has withered, we use other metaphors for which Figur could also be applied, next to IncepPredMinus or FinFunc0. We may therefore expect to find Figur alongside other functions when trying to provide a full picture of the collocational environment of a given item. Steele & Meyer (1990), for instance, use two different functions to code the same pair of collocates, viewing the situation from two different angles (stylistically with Figur and semantically with Culm):

p. 46: Figur (despair)= depths [of despair]
p. 54: Culm (despair)= depths [of despair]

Culm obviously applies to collocations that can also be described in terms of metaphors, as is also evidenced by the examples given above (14). I shall return to the problem of the relationship between collocations, metaphors and lexical functions in Chapter 13.

21. Fin: denotes a terminative verb meaning 'to cease' or 'to stop'. Like Cont (see above) or Incep (see below), this phasal operator is usually found in conjunction with Oper, Func or Labor:

FinOper1 (influence)= lose
FinFunc0 (enthusiasm)= wither

22. Func0, Func1, Func2, ...: denotes a semantically empty verb which takes the keyword as its subject and the first, second... actant, if any, as its main object.
Func0 (silence)= reign
Func1 (remorse)= gnaw [at sb]
Func2 (blow)= fall [on sb]

The subscripted figures should not be omitted since, in the examples above, they indicate that the noun silence does not involve any actant while remorse definitely involves one (the experiencer), who is the object of the verb gnaw. In the case of blow, it should be realized that two actants are involved, viz. a causer (the person who deals a blow - actant number 1) and a victim (actant number 2). Since the second actant is the object of fall, Func has to be associated with the subscript 2.

23. Gener: superordinate or generic word

Gener (anger)= feeling [of anger]
Gener (yellow)= colour
Gener (crawl)= move

The Gener LF actually captures the genus term in a traditional lexicographical definition. This crucial lexical relation makes it possible to construct taxonomies with hyperonyms (the values of the function) and hyponyms (the keywords). It is also worth noting that the status of this function is not easy to define when one wishes to classify it as a paradigmatic function. As the examples above indicate, the phrase a feeling of anger is well-formed, which provides evidence that the Gener function is also a syntagmatic relation, at least in some cases (see my discussion of the dichotomy between paradigmatic and syntagmatic functions above, section 5.3.4.2).

24. Germ: the germ or core of something.

Germ (problem)= root
Germ (discontent)= seed, root
Germ (evil)= root

Mel'čuk's definition in his ECD ("le germe de...", Vol. III, p.128) might lead the user to infer that there is some potential confusion with the Centr LF. As a matter of fact, the paraphrase given by Steele & Meyer (1990:55) seems slightly more informative and makes it possible to avoid any such confusion, Germ being paraphrased as 'small amount of X beginning X'. Unlike Centr, Germ is therefore not concerned with the central part of a geographical or temporal entity.

25. Imper: denotes the order or command which is expressed by a fixed formula other than the imperative form of the verb.

Imper (silence)= silence!, shut up! ()
Imper (go away)= beat it!, vamoose! ()...
Imper (shoot)= fire!

26. Incep: denotes a verb meaning 'to begin'. Like Fin or Cont, this phasal operator is most often combined with other LFs such as Oper, Func or Labor.

IncepFunc0 (war)= break out
IncepOper1 (trouble)= run into
IncepPredMinus (anger)= cool down

It should be noted that the function Incep is associated with inchoative verbs, i.e. verbs which express the beginning of a change of state. This function is therefore bound to play a crucial part in the analysis of verbs which display the so-called causative/inchoative alternation discussed in Chapter 12.

27. Instr: denotes the preposition which governs the keyword and means 'with the help of':

Instr (car)= by
Instr (foot)= on
Instr (hammer)= with

28. Involv: verb which describes the action of the situation of the keyword upon an object which is not a participant. The keyword is the subject and the person or thing which undergoes the action in the situation of the keyword is the direct object.

Involv (smell)= fill [the room]
IncepInvolv (snowstorm)= catch [people]

29. Labor_ij: semantically empty verb which takes the ith actant as its subject, the jth actant as its direct object and the keyword as second object:

Labor12 (esteem)= hold [someone in (high) esteem]
Labor12 (consideration)= take into

The term 'indirect object' used by Steele & Meyer (1990:56) to refer to the function of the keyword with respect to the value of the function is most certainly debatable since the keyword actually occurs in a prepositional phrase.

30. Labreal_ij: verb meaning 'to realize' which takes the actant i as its subject and the actant j as its main direct object; the keyword is the second object, as in:

Labreal12 (mind)= bring to
Labreal12 (saw)= cut [something with a saw]

Two things should be noted here. First, Labreal is actually a combination of Labor and Real. Second, the relationship between cut and saw is, in my opinion, wrongly described by Steele & Meyer (1990:56) as a relation between a predicate and its indirect object. It can in fact be argued that saw enters into an instrumental relationship with the verb cut, and the analysis of such a relationship in Chapter 8.7 is used to argue for the creation of a new function which I call Telic (the reader is referred to this chapter and to Chapter 4 for an explanation of this term).

31. Liqu: to liquidate, to destroy, to cause not to be.

Liqu (marriage)= annul
Liqu (trace)= wipe out
Liqu (law)= abolish, abrogate
Liqu (disease)= eradicate

Roughly speaking, the combinations of words illustrated above correspond to what Benson et al. (1986a) call EN collocations in their BBI dictionary since the verbs basically mean eradication and/or nullification (see the wealth of examples given in the introduction to the BBI dictionary - p.XXVI: to lift a blockade, to withdraw an offer, to quench one's thirst, to exterminate vermin...). Mel'čuk et al. (1992:129) note that Liqu is actually a shorthand notation for CausFin since it involves an agent who causes the end of the existence of something.

32. Loc_ab, Loc_ad, Loc_in: typical prepositions used with the keyword to mean 'from', 'to' or 'in' respectively. The examples below show that this category includes restricted as well as free types.

Loc_ab (distance)= from
Loc_in (list)= on
Loc_ad (place)= to

To distinguish temporal location from spatial/geographical location, the ECD also uses the temp superscript, as in the following examples:

Loc^temp_in (night)= at [night]
Loc^temp_in (morning)= in [the morning]
Loc^temp_in (Easter)= at [Easter]
Loc^temp_in (Sunday)= on [Sunday]

33. Magn: adjective or adverb meaning big, large, intense, very, much or to a large extent.

Magn (bachelor)= confirmed
Magn (pain)= excruciating
Magn (fear)= mortal
Magn (contrast)= sharp, vivid

Gentilhomme (1992:108ff) notes that intensity can be expressed in numerous ways. This lexical function commonly illustrates stereotypes and clichés which often reflect some kind of racial prejudice or class bias. Gentilhomme is only concerned with French but English can definitely be shown to display the same characteristics:

Magn (swear)= like a bargee
Magn (happy)= as a king
Magn (sober)= as a judge

As can be seen here, similes can be formalized in terms of the Magn function. Interestingly, some similes refer to famous characters belonging to our cultural patrimony:

Magn (rich)= as Croesus
Magn (work)= like a Trojan

In a multilingual perspective, it is important to note that the underlying mechanism used to express the notion of intensity may vary from one language to another. The following example in French illustrates a cultural allusion while the corresponding simile in English is based on the productive domain of animal metaphors (compare with as blind as a bat, as busy as a bee, as stubborn as a mule - note that fier comme un paon is also possible in French):

Magn (fier)= comme Artaban
Magn (proud)= as a peacock

Van der Wouden (1992a, 1997) shows that such examples should force us to reconsider the traditional definition of the term collocation. While collocations are generally assumed to involve idiosyncratic restrictions on the combinations of two words, it is clear that working at word level is much too narrow. The Magn LF definitely applies to collocational combinations which go beyond that level when they involve complex phrases, clauses or even sentences as in:

(1) He talked till he was blue in the face.
(2) It's raining cats and dogs.

Although these expressions are certainly best conceived of as idioms, given the relative fixity and rigidity of their constituents, it is clear that they can also be formalized in terms of Magn applied to the arguments talk² and rain respectively. The collocation [rain [cats and dogs]] is actually made up of a verb and an idiom, the former being the sole verb determined by the latter. This level of embedding testifies to the recursive character of collocability and idiomaticity, as noted by Mitchell (1971). Van der Wouden also notes that Magn can be expressed morphologically, i.e. below the word level, by means of noun-adjective compounds, as in the following Dutch examples:

Magn (duur)= peperduur (lit. 'pepper-expensive')
Magn (naakt)= poedelnaakt (lit. 'poodle-naked' - stark naked)

It is therefore important to be able to capture such combinations and to be aware that different processes may come into play (syntagmatic vs. morphological) to express this notion of intensity (such collocational restrictions in compounds are unfortunately neglected by most collocational dictionaries):

French: Magn (sourd)= comme un pot (syntagmatic)
English: Magn (deaf)= stone-deaf (morphological)

² Disregarding the fact that an expression such as till [sb is] blue in the face conveys the additional meaning of something which is done in vain.

34. Manif: to be manifest in something, to become apparent. This LF is often used in combination with other functions such as Caus.

Manif (courage)= shine through
Caus1Manif (opinion)= express
Caus1Manif (hatred)= profess
AntiPerm1Manif (feeling)= hide

35. Minus: this function, which means 'less', is only used in combination with other LFs (especially with the aspectual functions Incep, Cont, Fin and the causative functions - Caus and Perm).

IncepPredMinus (price)= toboggan, tumble, fall, drop
CausPredMinus (anger)= assuage, soften, soothe

This LF is usually contrasted with Plus (which means 'more').

36. Mult: noun denoting a group or collection of something

Mult (dog)= pack
Mult (goose)= gaggle
Mult (bee)= swarm

As is shown above, this function often applies to animals to denote a regular group. It is also used for people when the noun which refers to a group of specific persons cannot be predicted:

Mult (runner)= bunch
Mult (fireman)= platoon
Mult (gangster)= ring, gang, mob (US), firm (Brit)

Note that French uses the term peloton in combination with both cycliste (runner) and pompier (fireman).

37. Nocer: verb meaning 'to damage' or 'to attack'.

Nocer (mosquito)= bite
Nocer (car)= damage, hurt

Mel'čuk (personal communication, June 1993) is now reluctant to consider Nocer as a standard lexical function because the range of possible values is too limited. He finds it difficult to invent more examples than the three or four collocations which are repeatedly cited in the MTT literature.

38. Obstr: verb meaning 'to function with difficulty'.

Obstr (memory)= falter
S0Obstr (memory)= lapse
Obstr (voice)= falter
A0Obstr (voice)= hoarse

39. Oper1, Oper2, ...: almost semantically empty verb which takes the first/second... actant as subject and the keyword as direct object.

Oper1 (attention)= pay
Oper2 (attention)= attract
Oper1 (pressure)= exert
Oper1 (support)= lend
Oper1 (blow)= strike, deal, deliver
Oper1 (measure)= take
Oper1 (mistake)= make

The verbs which combine with the arguments of the Oper LF roughly correspond to Gross's 'support' verbs (1981) (see 2.2.3). Several papers describe them as semantically empty (Mel'čuk 1978a; Mel'čuk et al. 1984, 1988, 1992; Steele & Meyer 1990...). In fact, they cannot really be characterized as totally empty because they do have some (abstract) meaning such as 'to feel' or 'to make'. Some verbs are semantically emptier than others, however. One could say that the verb make is used in conjunction with the noun mistake to mean what it means, which excludes the verbs give or do. The verb make actually serves to join the first actant of the noun mistake to the noun in question. It is important to realize that the subscript is essential in the relation between the noun and the verb. As the following diagram shows, the noun attention may be considered to trigger a scenario (Fillmore (1982a) would call it a 'frame') involving two actants, X (usually a person) and Y (a person or an object):

[Diagram: the noun attention is linked to the actant X by the verb pay and to the actant Y by the verb attract.]

When the first actant is the subject of the verb, the noun attention 'selects' the verb pay (Oper1). When the second actant is the subject, it is the verb attract which must be chosen (Oper2). As pointed out earlier in this chapter, it is crucial to realize that the choice of the subscript is linked to the propositional form of the definition found in the semantic zone of the ECD. To understand which function links the noun admission to the verb elicit, for example, one must go back to a formal proposition of the following type based on the verb admit:

X admits Y to Z

where X, the first actant, is the person who admits, Y, the second actant, refers to what is admitted and Z, the third actant, is the person to whom one admits something. Since the subject of the verb elicit corresponds to Z, the relationship has to be represented as follows:

Oper3 (admission)= elicit

This also shows that Oper only allows gross categorization since it lumps together verbs some of which are closer in meaning than others (pay, lend, take, make are definitely closer in meaning than attract). More fine-grained distinctions are only possible through the use of subscripts. Moreover, the Oper LF does not enable us to distinguish between delexical verbs proper (have, make, get, bring, come, etc.) and figurative verbs (Howarth 1993). Like many other LFs, Oper is also often found in complex functions, most often in combination with causative or phasal operators. The following French examples, excerpted and adapted from Mel'čuk's ECD (Vol. I), illustrate the fine-grained analysis which may be arrived at with this system of complex LFs (ART=determiner):

LF          Value                             Example
Oper1       avoir, éprouver [ART -]           Il éprouve du respect pour ses parents.
ContOper1   garder [ART -]                    Je garde un profond respect pour elle.
FinOper1    perdre [ART / tout -]             Ce chanteur a perdu tout respect pour son public.
CausOper1   inciter [N à ART -]               Nous devons les inciter au respect de ces valeurs.
Oper2       jouir [de ART -], avoir [ART -]   Le directeur jouit du respect de ses employés.
ContOper2   conserver [le -]                  Le ministre a conservé le respect de ses proches.

This table clearly shows that weakened metaphors are at stake here. The verbs garder, conserver, perdre (and similarly in English, lose, keep...) illustrate the metaphor that respect is something of value.

40. Pejor: adjective meaning 'worse'. This function is often used in combination with other LFs, especially Caus, Incep and Pred.

CausPredPejor (prospect)= darken

Pejor can in fact be considered as a shorthand notation for MinusBon. IncepPredPejor is actually hardly ever used because it corresponds to the Degrad LF.

41. Perf: refers to the perfective aspect of a verb, meaning that an action or a process is carried through to its limit. As Apresyan et al. (1969) note, the standard expression is the verb used in a perfect tense, as in:

Perf (die)= to have died

This type of relation is so regular and predictable, however, that it is practically never used in isolation in the ECD. It nevertheless appears in complex LFs, mostly with S1, which refers to the typical name for the first actant, as in the following examples:

S1Perf (marry)= spouse
S1Perf (escape)= escapee

42. Perm: verb meaning 'to permit'. A subscript is usually attributed to the LF to account for the actant who is the subject of the verb. Like Caus, this function is also often used in complex LFs, but this need not necessarily be the case, as the first example shows:

Perm (go)= let
Perm1Fact0 (passion)= succumb to

43. Plus: this function, which means 'more', is, like Minus, exclusively used in combination with other LFs (Incep, Cont, Fin, Caus...).

IncepPredPlus (price)= rise, soar, skyrocket, increase
CausPredPlus (enthusiasm)= whip up

44. Pos1, Pos2, ...: standard expression for the positive evaluation of the first, second... actant of the keyword. In the following example, the keyword opinion involves two actants (the opinion of X about someone/something - Y). The adjective which is used to convey a positive evaluation of the second actant (Y) is favorable, which is represented as follows:

Pos2 (opinion)= favorable

45. Pred: this function (meaning 'predicate') is used to verbalize adjectival functions and is roughly equivalent to the verb to be. It is most often used in compound LFs.

IncepPredMinus (pressure)= fall

Pred can also be used to verbalize nouns to mean 'to be the keyword', as in:

Pred (actor)= act

46. Prepar: verb meaning 'to prepare' (before using or activating something)

PreparFact0 (rifle)= load
Prepar (table)= lay

It should be noted that some criticism could be levelled against the nature of this LF. If we postulate that a function cannot have several values (mathematically, a function assigns a single value to each argument), Prepar does not qualify as a function, but rather as a semantic relation (see my description of the Part relation in section 8.3). Consider the following example excerpted from Mel'čuk et al. (1992:130):

PreparFact0 (voiture)= mettre au point; faire le plein

It is clear that mettre au point and faire le plein are by no means interchangeable or synonymous. They do share a common component of meaning, however, since they can roughly be interpreted as 'to make ready for use'. This explains why they are treated similarly.

47. Propt: typical preposition meaning 'because of'.

Propt (fear)= for
Propt (interest)= out of

It should be noted that Propt, Instr (27) and Loc (32) are the only lexical functions specifically devoted to capturing the privileged links between a keyword and its accompanying prepositions. It is also worth noting that the prepositions which are under discussion here are those that govern the keyword. Prepositions governed by the keyword (depend on, responsible for, busy with) are not dealt with in the lexical function zone, but rather in the syntactic zone (Government Pattern).

48. Prox: function meaning 'be about to' or 'on the verge of'.

ProxFunc0 (storm)= approach
Prox (death)= on the brink of
Prox (tear)= on the verge of

49. Qual_i: adjective meaning that Able_i is entailed with a high degree of probability. The subscript i refers to the actant who is considered in the scenario in question. Consider the following examples:

Qual1 (deceive)= deceitful
Qual2 (deceive)= naive

Since the verb deceive involves two actants (a 'deceiver' and a 'deceivee', to coin a term which comes in handy here), the person who is likely to deceive other people may be said to be deceitful while the person who is easily deceived is said to be naive.
It should be stressed that what actually distinguishes Qual_i and Able_i is only a matter of degree, which means that the assignment of such a function is bound to be debatable in many cases. No test can be found in the MTT literature to ensure that consistency is preserved. However, one could probably design a test on the basis of the acceptability of a sentence which combines the value of the argument and the argument itself used with an adverb expressing low frequency, as in:

?/* He is naive, but he is rarely deceived.

The unacceptable, or at least odd, nature of this sentence shows that the adjective naive entails a high frequency or probability of deception, which corroborates the fact that Qual should be used instead of Able. Interestingly, the LF Qual is also used by some linguists in combination with the subscript 0. Slightly departing from Mel'čuk's contention that standard LFs ought to account for 1→1 relations only, Frawley (1988a,b) uses Qual0 to account for non-standard relations which are pervasive in scientific sublanguage. Starting from a standard definition of water as a 'transparent, tasteless, odorless and colorless liquid', Frawley (1988b:359) postulates that an ECD entry for water should contain the following triples:

Qual0 (water)= transparent
Qual0 (water)= tasteless
Qual0 (water)= colorless
Qual0 (water)= odorless

Qual0 is therefore used here to refer to the standard qualities of the keyword, bearing in mind that the different values are by no means synonymous or substitutable. In zoological sublanguage, to take another example, the term endostyle, defined as 'a glandular ciliated groove running along the floor of the pharynx in [...] the larvae of lampreys' (Frawley 1988b:351) has the following LFs:

Qual0 (endostyle)= ciliated, glandular

In fact, Frawley makes use of the function mechanism to capture 1→n relations, very much like Grimes (1990) who accounts for part-whole relations in terms of the Part 'function' (see my discussion of this topic in section 8.3).

50. Real1, Real2, ...: verb meaning 'to comply with the demands of', 'to satisfy the requirements of' being the first, second... actant of the keyword. The verb can then be paraphrased as 'realize' and takes the first, second... actant as its subject.

Real1 (law)= abide by
Real1 (promise)= keep
Real1 (order)= obey, carry out
Real1 (advice)= follow

It is important to distinguish Oper and Real. Consider the word exam, which can be used in the following constructions: to sit for/take an exam and to pass an exam. The normal goal which has to be achieved in an exam situation is exemplified by the latter collocation (implying success). This goal is not necessarily reached when one merely takes an exam. Such differences are accounted for in the ECD as follows:

Oper2 (exam)= take, sit for
Real2 (exam)= pass
AntiReal2 (exam)= fail

Let me point out in passing that the well-known pair of false friends pass/passer can be explained in terms of such LFs, which bring out the opposition between the two verbs:

English: Real2 (exam)= pass
French: Oper2 (exam)= passer
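The contrast between pass and passer can be made concrete with a small lookup over LF triples. The following Python fragment is only an illustrative sketch: the dictionary layout and the function names (functions_for, is_false_friend) are assumptions introduced here, not part of the ECD or of the database described in this book, and the keyword is written exam in both lexicons simply because the examples above do so.

```python
# Toy LF lexicons: (lexical function, keyword) -> list of values.
# Only the triples quoted in the text are included; the layout is hypothetical.
ENGLISH = {
    ("Oper2", "exam"): ["take", "sit for"],
    ("Real2", "exam"): ["pass"],
    ("AntiReal2", "exam"): ["fail"],
}
FRENCH = {
    ("Oper2", "exam"): ["passer"],
}


def functions_for(lexicon, keyword, verb):
    """Return the LFs under which `verb` is recorded as a value for `keyword`."""
    return sorted(
        lf for (lf, kw), values in lexicon.items()
        if kw == keyword and verb in values
    )


def is_false_friend(keyword, english_verb, french_verb):
    """True when both verbs are recorded but realize different LFs."""
    en = functions_for(ENGLISH, keyword, english_verb)
    fr = functions_for(FRENCH, keyword, french_verb)
    return bool(en) and bool(fr) and set(en).isdisjoint(fr)


print(functions_for(ENGLISH, "exam", "pass"))    # LF realized by English pass
print(functions_for(FRENCH, "exam", "passer"))   # LF realized by French passer
print(is_false_friend("exam", "pass", "passer"))
```

Under these assumptions, pass/passer is flagged because the two verbs fall under Real2 and Oper2 respectively, whereas take/passer is not, since both are values of Oper2.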

From a pedagogical perspective, the application of lexical functions to language teaching certainly deserves more than a passing remark (see Chapter 14).

51. Result: expresses what normally results from an event. The keyword is usually a verb, as in the following examples:

Result (get up)= be standing
Result (learn)= know

52. S0: noun which is derived from the keyword. The argument and the value often have a common root (morphological derivation), though this is not necessary.

S0 (move)= movement
S0 (happy)= happiness
S0 (honest)= honesty
S0 (die)= death

S0 belongs to the same category of functions as A0, Adv0 and V0 (see below), which account for the paradigmatic relations of derivation.

53. S1, S2, S3, S4: typical noun for the first, second, third or fourth actant of the keyword.

S1 (murder)= murderer
S2 (murder)= victim

For the sake of clarity, it may be interesting to reproduce here the functions associated with the four actants of the commercial transaction described at the beginning of this chapter. For a verb such as sell, the participants (or frame elements, to use Fillmore's terminology) are referred to as follows:

S1 (sell)= seller
S2 (sell)= goods
S3 (sell)= buyer, customer
S4 (sell)= price

54. Sinstr, Sloc, Smed, Smod, Sres: noun used to express the typical instrument, location, means, mode or result respectively. In all accounts of Mel'čuk's ECD, these functions are grouped together, but it is clear that they are in fact five different functions. Unlike the functions in the preceding category (53), they correspond to concepts which do not play a central defining role in the situation. The following examples illustrate the spectrum of relations these functions are expected to capture:

Sinstr (paint)= brush
Sinstr (write)= pen, pencil
Sloc (fox)= kennel, earth
Sloc (lion)= den
Smed (paint)= paint (substance)
Smed (write)= ink
Smod (write)= handwriting
Sres (catastrophe)= consequences, aftermath
Sres (copy_V)= copy_N

55. Sing: noun denoting a regular portion of something.

Sing (rice)= grain
Sing (grass)= blade
Sing (straw)= wisp

The function Sing is also used to indicate which partitive noun has to be used to express countability with uncountable nouns denoting an undifferentiated mass:

Sing (information)= piece
Sing (clothing)= item
Sing (news)= item, piece

As noted by Quirk et al. (1985:249-250), piece, bit and item are general partitive nouns. In addition to this class, we find the more restricted category of typical partitives which co-occur with concrete uncountable nouns, as in the first three examples above. This function has always attracted a lot of attention in language courses, as is testified by the number of chapters devoted to making uncountable words countable (consider the numerous examples given by McCarthy & O'Dell (1994:60): a good spell of summer, a flash of lightning, a gust of wind, a breath of fresh air, a stroke of luck...).

56. Son: verb denoting the typical sound or cry of the keyword. Most examples in the MTT literature illustrate animal sounds, though this need not be the case, as the following set of examples clearly shows:

Son (dog)= bark
Son (elephant)= trumpet
Son (horse)= neigh, whinny


Son (brake)= screech, squeal
Son (wind)= whistle

Gentilhomme (1992:112) points out that two different functions could have been devised to account for the distinction between animal subjects and artefacts: Cry would be used for the former category while Sound would be reserved for [-ANIMATE] subjects. The principle of economy can be applied, however, to devise a more general function (Son) which captures the common component of meaning shared by Cry and Sound. Gentilhomme represents this economical generalization as follows, using the symbol ∪ to refer to the union of two LFs: Cry ∪ Sound = Son.

Son is also used in a complex LF to refer to the typical noun used for an animal's cry. Son is then combined with S0, as in:

S0Son (elephant)= trumpeting
S0Son (cat)= caterwauling

It should also be stressed that there is a clear relationship between the values of the Son LF and onomatopoeic phenomena. This relationship will be studied in more detail in Chapter 10.

57. Sympt: verb or verbal expression referring to the physical symptoms of an emotion, a feeling or a state of mind. This function differs from the other LFs in several respects. First, the keyword refers to the emotion or the feeling which is responsible for the symptom in question. Second, unlike other LFs, Sympt is usually accompanied by another function which specifies what happens to the body part affected by the feeling or emotion. This function usually expresses the degradation (Degrad), impairment (Obstr) or excessive use (Excess) of this body part. A few examples are in order here to illustrate this rather complex representation:

Degrad (speech) + Sympt23 (surprise)= be speechless
Degradcolor (face) + Sympt23 (envy)= turn green [with envy]
Excess (hair) + Sympt13 (horror)= stand on end

As can be seen, Sympt is assigned subscripted figures referring to the grammatical functions of the various actants. Sympt takes three actants:

(1) the part of the body or organ affected by the emotion, which usually undergoes some change of state;
(2) the person whose body part is affected by the emotion;
(3) the emotion or feeling itself (stimulus causing the symptom).

The order in which the references to the actants appear makes it possible to identify the grammatical function: the first figure is reserved for the subject of the verb, the second figure points to the direct object and the third figure (if any) to the indirect object. In Sympt23 (surprise) above, the subject of be speechless is the person (2nd actant). In Sympt13 (horror), the subject of stand on end is the body part affected by the emotion, viz. hair (e.g. His hair stood on end with horror).
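The reading of the Sympt subscripts described above can be spelled out mechanically. The short Python sketch below is offered purely as an illustration: the function name decode_sympt and the labels used for the actants and grammatical functions are assumptions made here, and the mapping simply follows the convention stated in the text (first figure = subject, second = direct object, third = indirect object).

```python
# Actant numbering of Sympt as stated in the text.
ACTANTS = {1: "body part", 2: "person", 3: "emotion"}

# The position of each subscript figure identifies a grammatical function.
SLOTS = ("subject", "direct object", "indirect object")


def decode_sympt(subscript):
    """Map a Sympt subscript such as '23' to {grammatical function: actant}."""
    if not 1 <= len(subscript) <= len(SLOTS):
        raise ValueError("expected between one and three figures")
    return {SLOTS[i]: ACTANTS[int(figure)] for i, figure in enumerate(subscript)}


print(decode_sympt("23"))  # be speechless (with surprise)
print(decode_sympt("13"))  # (his hair) stand on end (with horror)
```

With this convention, the subscript 23 assigns the subject slot to the person, while 13 assigns it to the body part, which matches the two readings given for surprise and horror.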

Consider the following two sentences:

(a) John suffocated with anger.
(b) Anger suffocated John.

The verb suffocate participates in an interesting type of alternation since its experiencer can appear either in subject or in object position. The stimulus (anger) can appear in the form of a prepositional phrase, as in (a), or as the subject, as in (b). This alternation is reflected in the order in which the subscripts of the Sympt LF are listed:

(a) Degrad (breath) + Sympt23 (anger)= suffocate
(b) Degrad (breath) + Sympt32 (anger)= suffocate

Such decompositions pose a number of problems insofar as they closely resemble the generative semanticists' attempts to break down meaning into semantic primitives. Consider the example of turn green above, where Degradcolor (face) + Sympt23 (envy) only accounts for He turned green with envy, but is unable to account for His face turned green with envy (which requires Sympt13 instead of Sympt23). Such a decomposition procedure is faced with almost insurmountable difficulties (see Wilks et al. 1996).

58. Syn: exact or near synonym.

Syn (help)= aid
Syn (approval)= acceptation

As with Anti, Mel'čuk uses the subscripts ⊂, ⊃ and ∩ to indicate broader, narrower or intersecting meanings (see also Cruse 1986). In my database, synonymy is treated like antonymy since I have deliberately chosen not to resort to these subscripted symbols, given the absence of definitions for the keywords. I have therefore opted for the classical approach to synonymy which rests on the intuitive notion of substitutability (cf. Church et al. 1994), bearing in mind that there may be no such thing as a pair of perfect synonyms.

59. V0: verb which is derived from the keyword. The argument and the value usually have a common root, as is the case with the other paradigmatic functions of the same category (A0, Adv0, S0).

V0 (advice)= advise
V0 (promise_N)= promise_V

60. Ver: adjective or adverb meaning 'as it should be' or 'correct'.

Ver (instruction)= exact
Ver (excuse)= legitimate
Ver (remark)= opportune

This function definitely refers to a rather subjective view of the world insofar as the range of values for a given item depends on our favourite view of the situation it evokes. Its usefulness for natural language processing therefore seems doubtful.

5.3.4.6 Subscripted modifiers

Heylen & Maxwell (1994:302) observe that the verb oppose appears in a large number of verb-adverb constructions. In their corpus, this verb collocates with the following adverbs: adamantly, bitterly, consistently, steadfastly, strongly, vehemently, vigorously, deeply and resolutely. They note that all of these combinations can be formalized in terms of the Magn function since the adverbs are basically used to intensify the meaning of oppose. However, it is of paramount importance to bear in mind that all these adverbs differ in denotation and in the types of connotations they have. While consistently implies a durative component, bitterly entails some animosity on the opponent's part and strongly is probably more neutral in this respect. To cope with such imprecision and to address this issue of overgenerality, Mel'čuk has introduced a range of subscripted modifiers which enable him to arrive at a more fine-grained analysis of the relationship between a keyword and the possible values of LFs applied to this keyword.

Two subscripted modifiers are used mainly with S, Fact, Degrad and Excess:

actual: to refer to something which is currently taking place
usual: to refer to a usual, common situation

Mel'čuk's functions often derive their names from Latin or French roots, which can pose problems when the term chosen has a different meaning in English, as is the case with actual, which is in fact based on the French word actuel.

In early accounts of the ECD and in Vol. I and II (Mel'čuk et al. 1984, 1988), the Magn LF could be assigned the following two modifiers:

quant: related to quantity
temp: related to duration

These two subscripts may be illustrated as follows:

Magntemp (beard)= stubbly
Magnquant (beard)= thick
AntiMagnquant (beard)= thin

These distinctions should not obscure the fact that stubbly also differs stylistically from the other adjectives, being more informal. Another potential modifier appeared in Vol. III (Mel'čuk et al. 1992), which demonstrates the need for an even finer analysis of lexical-semantic relations in MTT:

qual: related to the quality of something

In the database described in the remaining chapters of this book, I have chosen to introduce a further distinction:

speed: related to speed/velocity

This modifier makes it possible to distinguish the following combinations:

Magnqual (car)= powerful
Magnspeed (car)= fast

The following series of subscripts is also commonly used, mainly with Degrad and Excess:

color: related to colour
dim: related to dimension/size
fulg: related to fulgency
motor: related to movement
stat: related to vertical position
trem: related to trembling
t°: related to temperature

The examples below illustrate a few possibilities:

Excesstrem (earth)= quake
Degrad (metal)= rust
Degradfulg (metal)= tarnish
Excessmotor (tooth) + Sympt13 (cold)= chatter

There does not seem to be any stopping point, however, and one may reasonably question the usefulness of some modifiers which sometimes seem to have been created by Mel'čuk to provide a solution to more basic semantic problems with what resembles semantic primitives. In fact, it seems that the more specific Mel'čuk's lexical functions become, the more problematic their nature turns out to be.

Finally, three superscripted modifiers make it possible to specify the level at which a certain event may be realized:

I: realization at the psychological or mental level
II: realization at the final stage of the mental level
III: realization at the physical level

These superscripts, which are mainly used with the LFs Fact, Real, Labreal, Prepar and Sres, appear in the following examples illustrating the keyword advice:

Real3^I (advice)= listen to
Real3^II (advice)= follow

The third actant of advice (namely the person to whom some advice has been given) first has to listen to this advice (mental level) before being able to follow it (closer to the physical level).

5.3.4.7 Paraphrasing rules

We have seen that the essence of the Meaning-Text Theory is the development of a set of rules which account for the process of text generation. To complement the lexical component of his theory, Mel'čuk has established a paraphrasing system for synonymic transformations of sentences. This system makes it possible to produce, for any text, a set of synonymous or nearly synonymous paraphrases. Compare the following two sentences:

(1) The enemy forces resisted fiercely.
(2) The enemy forces put up a fierce resistance.

These two sentences are almost synonymous and the transformation can be explained as follows: the verb resist is turned into a noun (nominalization process - resistance) which requires a support verb (to put up) in order to establish a link between the first actant (enemy forces) and the predicative noun. Mel'čuk (1992a:35ff) identifies around 50 paraphrasing rules, which he further subdivides into 'substitution' and 'fission' rules. The former, which can be represented as X ⇔ Y, account for the substitution of one term for another in order to view the situation from a different angle while preserving the meaning. The verbs teach and learn, for example, can be substituted for one another, but this substitution entails some reshuffling of the surface grammatical functions of the deep-syntactic actants, as in:

(3) John learnt linguistics from Professor Kiparski.
(4) Professor Kiparski taught John linguistics.

The paraphrasing rule which comes into play here is:

K(V) ⇔ Conv321(K)

This rule makes it possible to substitute the value of the Conv321 LF for a given keyword. Needless to say, this rule can only be triggered if the lexical function zone of the ECD entry for learn explicitly specifies that Conv321 (learn)= teach.

Unlike substitution rules, which involve one word substituted for another, fission rules can be represented as follows: X ⇔ Y + Z. The rule which applies for (1) and (2) above is:

K(V) ⇔ S0(K) + Oper1(S0(K))

where
K= resist
S0 (resist)= resistance
Oper1 (resistance)= put up

To put it slightly differently, K is the keyword and a first operation transforms it into a noun with the lexical function S0. The Oper1 LF is then applied to this predicative noun to generate the appropriate support verb.
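The two kinds of paraphrasing rule discussed above can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: the table LF, and the helper names fission and substitute, are invented here for illustration, and the code only manipulates lexemes and actant order without attempting inflection or the adverb-to-adjective conversion (fiercely/fierce) that a real generator would need.

```python
# Lexical-function table restricted to the values quoted in the text.
LF = {
    ("S0", "resist"): "resistance",
    ("Oper1", "resistance"): "put up",
    ("Conv321", "learn"): "teach",
}


def fission(keyword):
    """Fission rule K(V) <=> S0(K) + Oper1(S0(K)).

    Returns the pair (support verb, predicative noun); a KeyError signals
    that the lexicon does not license the rule for this keyword.
    """
    noun = LF[("S0", keyword)]
    support_verb = LF[("Oper1", noun)]
    return support_verb, noun


def substitute(keyword, actants):
    """Substitution rule K(V) <=> Conv321(K).

    `actants` lists the 1st, 2nd and 3rd actants of the keyword; the
    converse verb takes them in the order 3, 2, 1.
    """
    converse = LF[("Conv321", keyword)]
    first, second, third = actants
    return converse, (third, second, first)


print(fission("resist"))
print(substitute("learn", ("John", "linguistics", "Professor Kiparski")))
```

The KeyError raised for an unlisted keyword mirrors the condition stated above: a rule can only be triggered if the relevant LF value is explicitly recorded in the entry.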

5.4 Meaning-Text Theory and Natural Language Processing

The publicity leaflets that are distributed to market the Dictionnaire Explicatif et Combinatoire du Français Contemporain explicitly state that the "ECD of Modern French will serve as a useful tool for linguists, translators, terminologists, editors, journalists, computer scientists and teachers of French at all levels of language instruction". The inclusion of computer scientists among the set of potential users may seem slightly odd at first glance. Yet it should be borne in mind that the computational linguistics community has been trying, for several decades now, to model language with a view to designing natural language processing applications ranging from fully-automated high-quality machine translation to question-answering systems to information retrieval or generation systems. The need for more detailed descriptions of the lexical coverage of these systems is so imperative that several researchers have turned to Meaning-Text Theory since, as noted by Haenelt & Wanner (1992:5), MTT is based on the postulate that language is a logical device which establishes the correspondence between the infinite set of all possible meanings and the infinite set of all possible texts. This logical device is used to synthesize possible texts (i.e. to generate utterances expressing a given meaning) or to analyze texts (i.e. to discover the possible meanings of a given utterance). MTT has now attracted so much attention in NLP circles that several researchers felt it necessary to organize an "International Workshop on Meaning-Text Theory" at the Integrated Publication and Information Systems Institute of the Gesellschaft für Mathematik und Datenverarbeitung in Darmstadt (30 July - 1 August 1992; see Haenelt & Wanner 1992 and Wanner 1996).

The proceedings of the above-mentioned workshop make it abundantly clear that MTT is most often used in a production or generation perspective. Escalier & Fournier (1992), for example, describe a natural language generation system developed at Dassault Aviation to generate paraphrases of the semantic representation derived from Prolog rules in a deductive database. The paraphrasing power of lexical functions is at the heart of the system insofar as they make explicit the nature of the links between lexemes. In passing, the two researchers note that a traditional dictionary such as Le Petit Robert only uses four relations: Syn (to cover synonyms), Ant (to refer to antonyms), CF (confer, compare, for idioms and multi-word expressions) and V. (= voir aussi, to refer to related terms). The latter two symbols actually cover a whole range of lexical-semantic relations, which makes them a much less powerful tool than the highly explicit, or at least more explicit, lexical functions.

Iordanskaya et al. (1996) discuss some of the problems they were faced with when implementing lexical functions in a text generation system which automatically generates Labour Force Statistics and Retail Trade Statistics reports. They discuss various decisions they had to take in the practical development of the system, such as whether the values L1, L2... for a given LF applied to a lexeme L should be fully described with morphological and syntactic information in the entry for L or should have separate entries of their own (they provide arguments for the latter option). Their approach allows them to generate the following paraphrases:

(a) Retail sales decreased sharply in October.
(b) Retail sales showed a sharp decrease in October.

The relational information required to generate such paraphrases is formalized in the following way:

Magn (decrease_V)= sharply
Magn (decrease_N)= sharp
S0 (decrease_V)= decrease_N
Oper1 (decrease_N)= show

Lee & Evens (1996) describe a medical expert system for the diagnosis and management of stroke patients. Mel'čuk's lexical functions are used to form appropriate collocations and cater for what Halliday & Hasan (1976:318ff) call 'lexical cohesion' (which embraces both reiteration, i.e. the repetition of a lexical item or a synonym in the context of reference, and collocation, i.e. the tendency for one word to occur frequently in the environment of another). LFs such as Magn, Cap, Equip, Centr, Gener, Plus or Minus are therefore of paramount importance in such a framework. To describe the treatment of a patient, for instance, the following basic information is used by the system:

Sing (drug)= dose
Mult (drug)= dosage
Gener (aspirin)= drug

Thanks to its inheritance mechanism, the program is able to conclude that Sing (aspirin)= dose and to generate sentences such as:

Aspirin was prescribed for Mrs Smith. The dosage was 325 mg, the average oral dose.

Implementing an expert system obviously requires a formal description of the sublanguage it deals with. This presupposes the identification of the numerous types of relations between lexemes. Although Mel'čuk's MTT is better suited for general language (because it mainly relies on standard lexical functions as described above; see section 5.3.4), some technical relations in the medical sublanguage can be formalized using the LF mechanism.
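The inheritance step reported for Lee & Evens' system, whereby Sing (aspirin)= dose is concluded from Sing (drug)= dose and Gener (aspirin)= drug, can be sketched as follows. The code is an assumption-laden illustration and not a description of their implementation: the FACTS table and the lookup function are hypothetical names, and inheritance is modelled simply as climbing Gener links until a value is found.

```python
# Facts quoted in the text, stored as (lexical function, keyword) -> value.
FACTS = {
    ("Sing", "drug"): "dose",
    ("Mult", "drug"): "dosage",
    ("Gener", "aspirin"): "drug",
}


def lookup(function, keyword, facts=FACTS):
    """Return function(keyword), inheriting along Gener links if necessary.

    Returns None when neither the keyword nor any of its generic terms
    has a recorded value. The `seen` set guards against circular links.
    """
    seen = set()
    while keyword is not None and keyword not in seen:
        seen.add(keyword)
        if (function, keyword) in facts:
            return facts[(function, keyword)]
        keyword = facts.get(("Gener", keyword))  # move up to the generic term
    return None


print(lookup("Sing", "aspirin"))
print(lookup("Mult", "aspirin"))
```

Storing the partitive and collective nouns once, on the generic term drug, avoids repeating them for every individual drug name in the sublanguage.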
To give but one specific example, Lee & Evens (1996:305) use the following compound lexical functions to link verbs denoting an activity and the nouns which express the loss of ability to perform these activities (usually after a stroke):

S0AntiAble1 (talk)= aphasia
S0AntiAble1 (swallow)= dysphagia
S0AntiAble1 (write)= agraphia

Other specific relations must also be described and formalized, possibly in the MTT framework. Gérardy et al. (1995) discuss an attempt to describe in detail the internal structure of medical terms appearing in the International Classification of Diseases (ICD-9). Such an experiment, carried out in the framework of the EU-funded ANTHEM project, revealed that it is essential to use specific relations to label the links between a lexeme denoting an organ and lexemes denoting the removal of, or the typical infection associated with, this organ. Adapting the terminology used in ANTHEM to the formalism of lexical functions (e.g. 'disdia' refers to diseases affecting a given organ, which can be substituted for 'Infection'), one may suggest the following LF-like representation for such relations:

S0Liqu (appendix)= appendectomy
Infection (appendix)= appendicitis

In the past few years, MTT researchers have also sought to apply Mel'čuk's ideas to machine translation, in the hope that the concept of lexical function would help them refine the translation process by giving them access to the appropriate terms in context. Van der Wouden (1992a) shows, for example, that the expressions toute la vérité in French and the whole truth in English are actually exponents of the Magn function applied to truth/vérité. He argues for the use of such functions as a sort of pivotal language providing an abstract semantic representation.

In a paper presented at the Fourth International Conference on Theoretical and Methodological Issues in Machine Translation, Danlos & Samvelian (1992) suggest a similar methodology to use support verbs in machine translation. Their contention is that it is preferable to start from the predicative noun, which is the most informative element in a sentence, insofar as it is responsible for the selection of the accompanying verb. The ultimate aim of such an approach is to avoid bilingual context-sensitive rules where the translation of a verb is determined by its object. Danlos & Samvelian (1992:21) illustrate the traditional approach with French-English examples of such rules, which are used in most transfer-based MT systems (Fontenelle et al. (1994a:5ff) provide other examples in the context of the METAL machine translation system developed by Siemens-Nixdorf):

avoir (_ habitude) → be in
perdre (_ habitude) → get out of
prendre (_ habitude) → get into

Danlos & Samvelian suggest that these rules are unnecessary and that information at transfer level should be limited to a simple translation rule such as habitude → habit. In the monolingual lexicon, however, each noun should be described in terms of the set of support verbs with which it can be associated. Each verb should also be assigned a given semantic feature reflecting the aspectual, diathetic or modal values conveyed by the combination of the two. Some of the semantic values suggested by these two researchers are:

neuter: be in (_ habit)
inchoative (beginning of a process or state): get into (_ habit)
terminative (end of a process or state): get out of (_ habit)
durative (duration of a process or state): keep (_ ascendancy)
causative (diathetic value - causative meaning): give (_ hangover)

Instead of having numerous (and costly) context-sensitive rules at transfer level, the system would contain a richer monolingual description of lexical items. A noun such as habit would then be described in the English lexicon as follows:

habit → {be in (neuter), get out of (terminative), get into (inchoative)}

A similar entry would characterize habitude in the French monolingual lexicon:

habitude → {avoir (neuter), perdre (terminative), prendre (inchoative)}

The proponents of this approach then explain how these types of information can be used in a transfer-based MT system such as EUROTRA. They also argue in favour of the automatic extraction of such collocational information from large corpora. As I show in the remainder of this book, however, this information can also be extracted from a machine-readable dictionary such as the Collins-Robert English-French dictionary. The verbal collocates of habit/habitude extracted from CR can be classified as follows, using Danlos & Samvelian's terminology:

English: habit | Semantic value | French: habitude

acquire, contract, develop, form, get into, take to | inchoative | prendre, contracter

break off, drop, conquer, forsake, get rid of, give up, outgrow, relinquish, shake off, slough off, throw off | terminative | abandonner, se débarrasser de, se défaire de, perdre, renoncer à, rompre avec, surmonter, vaincre

infix | causative | inculquer

wean (from) | causative + terminative | détacher (de), détourner (de)
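The division of labour advocated by Danlos & Samvelian (a single noun-to-noun rule at transfer level, with support verbs selected from the monolingual lexicons through a shared semantic value) can be sketched in Python. The data structures and the function name translate are assumptions made for this illustration only, the lexicons contain just the entries quoted above for habit/habitude, and the sketch does not reflect how EUROTRA or any actual MT system is organized.

```python
# Monolingual lexicons: noun -> {semantic value: list of support verbs}.
EN = {"habit": {"neuter": ["be in"],
                "inchoative": ["get into"],
                "terminative": ["get out of"]}}
FR = {"habitude": {"neuter": ["avoir"],
                   "inchoative": ["prendre"],
                   "terminative": ["perdre"]}}

# The only bilingual (transfer-level) rule: noun to noun.
TRANSFER = {"habitude": "habit"}


def translate(verb, noun):
    """Translate a French support verb + noun into English.

    The verb is never translated directly: its semantic value is read off
    the French entry and used to select a verb in the English entry.
    """
    value = next((v for v, verbs in FR[noun].items() if verb in verbs), None)
    if value is None:
        raise KeyError(f"{verb!r} is not a recorded support verb of {noun!r}")
    target_noun = TRANSFER[noun]
    return EN[target_noun][value][0], target_noun


print(translate("perdre", "habitude"))
print(translate("prendre", "habitude"))
```

Taking the first verb listed for a semantic value is a deliberate simplification; as the discussion of connotation in this section suggests, choosing among several verbs that share a value would require additional information.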

Although Danlos & Samvelian hardly ever refer to Mel'čuk's lexical functions, it is clear that their approach draws on the methodology adopted in the combinatory dictionaries. The aspectual values of the support verbs above can all be represented in terms of lexical functions: the inchoative value is labelled as Incep in Mel'čuk's ECD, the durative aspect corresponds to Cont, while the end of a process or state (terminative) is labelled as Fin. The usefulness of such information for machine translation, however, is doubtful since the items appearing in a given category are not synonymous. While conquer, give up, outgrow, relinquish, slough off, etc. can all be considered as values of the FinOper1 LF applied to habit, it is clear that they do not display the same connotations at all. Give up or relinquish are most certainly neutral in this respect, but slough off, throw off and conquer, for instance, will usually only collocate with potentially "bad" or "negative" habits (consider the definitions given by LDOCE: slough off: to get rid of as something worn out or unwanted; throw off: free oneself from something bad).

I have already pointed out that the philosophy of lexical functions is to group and treat similarly words that share a common semantic component when used in the vicinity of a given lexeme, but it should be recognized that lexical functions are by no means sufficient for such complex tasks as machine translation, which require a great deal of pragmatic knowledge. Such knowledge could perhaps be inferred on the basis of a subclassification of each semantic value of the table above into a cluster of metaphorical verbs on the one hand (shake off, slough off, throw off) and a cluster of delexical verbs on the other. Mel'čuk's use of subscripted and superscripted modifiers shows that he is very much aware of the strong limitations of this mechanism. The new modifiers which appear in the more recent volumes of the ECD also provide ample evidence that LFs are, in some cases, far from satisfactory, since the multiplication of such modifiers reflects a desperate need for an even more fine-grained classification. It is unclear, however, how an explosion in the number of complex LFs can be avoided if the number of combinations of simple LFs and modifiers is allowed to increase exponentially.

The non-formalizable nature of certain pieces of information associated with lexical functions thus complicates the task of the researcher who wishes to use LFs in an MT perspective. To give but one last example, which is closely linked to the discussion above of the support verbs collocating with habit, consider the following data, excerpted from the French ECD (Volume I, Mel'čuk et al. 1984:78):

Oper1 (colère)= éprouver, ressentir, avoir [ART / de ART ~]
être [en ~]

In Volume III (1992a:55), Mel'čuk admits that the values of a given LF such as Oper may include verbs which are semantically opposed. Éprouver/ressentir are semantically different from être (en), as becomes obvious when one compares the following sentences:

(a) Laisse-la tranquille! Tu vois bien qu'elle est en colère.
(b) Laisse-la tranquille! (???) Tu vois bien qu'elle éprouve/ressent de la colère.

Mel'čuk uses these two sentences to justify his claim that the values of Oper1 should be assigned semantic distinguishers to account for the fact that être (en) refers to the general state of the person who is angry, while éprouver/ressentir only refers to the emotion itself ('éprouver "actualise" l'émotion', to quote Mel'čuk). Sentence (b) is therefore considered odd because only general states can be seen (emotions themselves can hardly be seen). The notation Mel'čuk would then like to introduce in a revised version of the entry for colère would be as follows:

Oper1 (colère)= être [en ~] [état général et/ou émotion]
éprouver, ressentir [de la ~] [émotion]

This calls for two comments. First, it is clear that the values of the Oper1 LF are not totally depleted from a semantic point of view, since they are not interchangeable or substitutable. Second, computing the value of the semantic distinguishers above is beyond state-of-the-art machine translation systems and is likely to remain so for quite a while when one considers the coverage and depth of the lexical components of state-of-the-art NLP systems (see Bouillon & Clas 1993, Zampolli et al. 1994).

5.5 Conclusion

The all-important nature of the lexicon in the Meaning-Text Theory should now be evident. As noted by Mel'čuk (1992a:53), the Explanatory Combinatory Dictionary relies upon the crucial concept of paraphrase, while paraphrasing rules themselves extensively draw on, and are expressed in terms of, lexical functions. We have seen that an increasing number of researchers consider MTT as an interesting testbed for various types of NLP applications. Such enthusiasm should certainly be tempered, however, when one considers the criticisms which can be levelled against certain aspects of the theory. Alonso Ramos & Tutin (1996), for example, consider that many lexical functions are much too imprecise to be used in NLP, and I have shown in the preceding sections that using support verb information in a machine translation perspective poses a number of problems which cannot be solved readily if one only uses functions such as Oper1 without any information on register, style or connotation.

Alonso Ramos & Tutin also argue for a grammar of LFs which would describe the combinatory potential of LFs themselves. Their contention is that LFs should be classified according to syntactic and semantic criteria: Germ and Caus are related on the grounds of their common meaning, for example, but should be treated differently because the application of Germ always yields nouns while Caus is associated with verbs. Conversely, Perm and Caus share obvious meaning components and should therefore be treated similarly, unlike the unnatural fragmentation suggested by Steele & Meyer (1990:46-47).

The nature of certain lexical functions which are not defined very clearly is also a major problem. In this respect, the absence of tests is certainly responsible for a number of inconsistencies and in Chapter 9, I shall try to show that, in some cases, very simple linguistic tests can be developed to make sure the appropriate lexical function is assigned to a given collocational pattern.
The grammar of LFs should also be complemented by a classification of the functions according to their nature. Such a classification would show that lexical functions actually fall under several different headings such as:

1. Derivational morphology (A0, A1, S0, Adv0, Able1, V0)
2. Deep cases: S1, S2, numerical subscripts in Real1, Real2, Oper1, Oper2...
3. Circumstantial functions: Sinstr, Smed, Smod, Sloc...
4. Aspectual distinctions (inchoative: Incep; durative: Cont; terminative: Fin; causative: Caus, Perm)
5. View of the world (functions which depend on the speaker's favourite and hence subjective view of the world: Ver, Bon, Pos)
6. Quantifiers: Mult, Sing
7. Stylistics: Figur
8. Purely semantic functions: Liqu, Degrad, Culm, Magn, Son... (which could in turn be subdivided into negative and non-negative functions)
etc.

This type of classification resembles Alonso Ramos & Tutin's (1996) in several respects. It departs from their headings, however, insofar as it introduces further distinctions such as "View of the world" or "Stylistics", which could explain why several functions applied to the same item are not in complementary distribution and can yield the same value. In the example excerpted from Steele & Meyer (1990), Culm and Figur applied to despair both yield depths because the same piece of lexical reality (despair) is being looked at from two different angles, from a semantic point of view in the former case, from a stylistic point of view in the latter. This classification also shows that only a subset of these lexical functions is necessary to specify the environment of lexical items.

I have also pointed out that there does not seem to be any limit to the number of lexical functions, which means that the nature of the highly specific functions unfortunately tends to become more problematic. Although Mel'čuk claims that the maximum number of standard lexical functions is around 60, it is clear that hundreds, perhaps thousands, of relations are theoretically possible if one makes full use of the compounding power of LFs (consider the following triple, which makes use of five LFs: S0CausPredPlusBon (health)= improvement). Each time a new function is added, the theoretical maximum of compound LFs increases exponentially and there is sometimes no agreement as to whether a given type of relation should be granted LF status.
In Chapter 8, I make a few tentative suggestions to cope with some well-known 1→n relations (such as the part-of relation) which, I argue, deserve to be treated as LFs if one wishes to employ the lexicon developed in the framework of this project in an information retrieval perspective or in a language teaching environment. To conclude this brief overview of MTT, it may be said that, despite its obvious limitations, Mel'čuk's theory proves to be a most interesting framework which makes it possible to account for and describe an impressive range of collocations and lexical-semantic (both syntagmatic and paradigmatic) relations, using the formal and descriptive apparatus of lexical functions. Together with other linguists who have not necessarily adopted MTT, we may agree with Mel'čuk's contention that any formal linguistic model should be based on a highly structured and detailed lexical component accounting for all the syntactic, semantic, pragmatic and combinatory phenomena the ECD aims to capture. Every entry in the dictionary is very lengthy, however, and, unfortunately, hardly decipherable for the layman since the very use of the ECD presupposes profound knowledge of the sophisticated system of LFs. It is also common knowledge that hardly any users ever read the preface to a dictionary, apart perhaps from those rather odd and freakish linguists who spend most of their time closely scrutinizing dictionaries to try to improve them. One may then reasonably wonder whether Mel'čuk's dictionaries are ever likely to become a popular tool among translators, editors, journalists and teachers, as is argued on the back cover. It is undeniable, however, that Mel'čuk's contribution to lexicography has paved the way for better collocational dictionaries in the future.

6 Constructing a database from the Collins-Robert Dictionary

6.1 Introduction

As I have pointed out above, machine-readable dictionaries have been the object of a lot of research in natural language processing but researchers have tended to concentrate mainly on monolingual English learners' dictionaries, somewhat neglecting bilingual dictionaries. The less structured format of the magnetic tapes of the latter is probably partly responsible for this lack of interest. The reluctance of publishers to distribute the machine-readable versions of their bilingual dictionaries has also certainly contributed to the concentration of efforts on the exploitation of monolingual MRDs. However, it has to be admitted, as Atkins & Levin point out (1991:255), that "the explicit treatment often accorded to restrictions on subjects/objects of verbs in dictionaries for the foreign learner (i.e. bilingual dictionaries) renders such works a valuable source of material for the semi-automatic construction of a lexical database". In this chapter, I describe the database constructed from the Collins-Robert dictionary, paying attention to the structure of this database built by Jacques Jansen and to the retrieval programs which make it possible to readily extract the collocational and thesauric information it contains. Emphasis will also be laid on the enormous amount of manual work which was required in order to enrich the database with lexical-semantic information based on Mel'čuk's descriptive apparatus. The coding of around 70,000 pairs of items, which took more than two and a half years, forced me to write a number of application programs to facilitate the task of the coder and to ensure consistency.
As I show below in Chapter 9, however, such consistency may be open to criticism because Mel'čuk's definitions of lexical functions sometimes lack the operational criteria which should normally lead to the assignment of one and only one lexical function to a given pair of lexical items entering into some kind of paradigmatic or syntagmatic association. In some cases, I put forward a number of suggestions to better circumscribe the range of application of lexical functions, but, in other cases, the descriptive nature of some functions is so imprecise that inconsistencies are inevitable in a single-handed task which started in mid-1992 and ended in late 1994. The general conclusions I draw concerning the imprecise nature of some functions and the evaluation of the material would never have been possible, however, if only a fragment of the dictionary had been analyzed in detail. Exploiting the whole dictionary and focussing on such a large section of the lexicon had never been attempted and the potential applications, whether purely linguistic or pedagogical, could only have been hinted at if I had decided to focus on a few dozen items and their semantic environment instead of taking into account the whole metalinguistic apparatus of the dictionary.

6.2 The Collins-Robert Dictionary

Michiels (1996) points out that bilingual and monolingual dictionaries share a number of oversimplifications, the most important being the following:
- meaning is divisible into discrete units, so that
- words have enumerable, listable meanings.
A further simplification is that a bilingual dictionary such as the Collins-Robert divides the semantic space of source items as a function of the target language. This entails that a word which is considered as monosemic in a monolingual dictionary may be regarded as polysemic in a bilingual dictionary because the target language makes distinctions which are non-existent in the source language. Consider the entry for the verb croak in CIDE below which offers one definition to cover the general SOUND meaning:
croak [SOUND] v (of animals) to make deep rough sounds such as a FROG or CROW makes, or (of people) to speak with a rough voice because of a sore or dry throat. I could hear frogs croaking by the lake. [I]. "Water, water", he croaked. [+ clause]
In comparison, CR makes distinctions which are based solely on the existence of different potential translations:
croak 1 vi (a) [frog] coasser; [raven] croasser; [person] parler d'une voix rauque; (grumble) maugréer, ronchonner.
These examples point to the all-important nature of the metalinguistic indicators (frog, raven, person, grumble) in a good bilingual dictionary (see also Duval 1986, 1990, 1991). The CR dictionary partly owes its reputation to the extensive use it makes of this metalinguistic information about the semantic, syntactic and combinatory properties of words. In this book, I intend to focus on the exploitation of these indicators which appear in italics in the printed version of the dictionary. Since I am primarily concerned with the English language, I have chosen to concentrate on the English-French part of the dictionary which was transformed into a database to enable diversified access. However, because I have decided to use a bilingual dictionary, i.e. a lexical resource in which metalinguistic annotations are provided mainly when the source and target languages do not divide the semantic space covered by lexical items similarly, it must be acknowledged that the English codings available in this database in a way reflect the lexical organisation of the target language. Before turning to a description of the database, then, it is in order to examine an example of a CR entry in the printed version. Consider the following entry for the verb render and the phrasal verbs associated with it:
render vt

(a) (frm: give) service, homage, judgment rendre; help donner; explanation fournir. ~ unto Caesar the things which are Caesar's rendez donc or il faut rendre à César ce qui est de César; to ~ thanks to sb remercier qn; to ~ thanks to God rendre Grâce à Dieu; to ~ assistance prêter assistance or secours; to ~ an account of sth rendre compte de qch.

(b) (Comm) account remettre, présenter, (to) account ~ed £10 rappel de compte or facture de rappel - 10 livres.
(c) music interpréter; text rendre, traduire (into en).
(d) (make) rendre. his accident ~ed him helpless son accident l'a rendu complètement infirme.
(e) (Culin) fat faire fondre.
(f) (Constr) plâtrer.
render down vt sep fat faire fondre.
render up vt sep (liter) fortress rendre; prisoner, treasure livrer.

The preface to the dictionary gives useful indications on how to use the dictionary and it would not make sense to reproduce it here. However, it is important to notice that part of speech is the main criterion for dividing entries. Small letters between parentheses are used to indicate senses and within a given meaning, further sense distinctions are separated by a semi-colon and end with a full stop. Example sentences, idioms and set phrases (e.g. to render thanks to sb, his accident rendered him helpless) appear in boldface at the end of a given meaning. The role of commas is of cardinal importance, as can be seen above: commas are indeed used to separate selection restrictions (e.g. service, homage, judgment) but also to separate alternative translations which have the same or a very similar meaning (e.g. remettre, présenter). It will also be noted that typography plays a crucial role insofar as the English text (headword, examples, idioms) appears in boldface while the French text is not emphasized. The metalinguistic apparatus, to which I shall return below, appears in italics and covers a number of concepts ranging from selection restrictions or parts of speech to subject fields, style labels or synonyms. The example above illustrates a word which has only one part of speech (vt for transitive verb). When a given headword is assigned several POSs, the main division is by POS, a number preceding each POS label as in the following structure for the headword wrong:
wrong 1 adj (a)... (b)... (c)... 2 adv ... 3 n (a)... (b)... (c)... 4 vt... 5 cpd...
At this stage, it should be made clear that the basic material used in this book includes the primary translations in each meaning and their associated pieces of information (style labels, synonyms, collocations, etc.), to the exclusion of example sentences and their translations. Although I am fully aware that the latter also contain collocationally-relevant information (witness an example such as to render thanks to sb or to render an account of sth), no attempt has been made to exploit this rich source of information which undoubtedly deserves an in-depth analysis in the framework of another project. The availability of a database of around

60,000 example sentences and their translations is indeed an invaluable resource which merits fuller treatment in a teaching or computational perspective. The analysis of this material, however, which ranges from purely illustrative examples to completely fixed multi-word units (idioms, set phrases, proverbs and other types of phraseological material), would require the use of sophisticated parsing techniques which fall outside the scope of this book, at least if one aimed at an exhaustive treatment (see Michiels & Dufour (1996) who describe the computational exploitation of metalinguistic annotations and example sentences of two bilingual dictionaries in a word sense disambiguation perspective). In this project, I have chosen to limit my efforts to the exploitation of the first part of each "sense" in an entry, concentrating my attention on the numerous items of information belonging to the italicized metalinguistic apparatus, to which I would now like to turn.

6.3 The Collins-Robert metalinguistic apparatus

The prefatory matter uses the term 'general indicating material' to refer to the metalinguistic information appearing in italics. It should be realized that this material covers a wide range of linguistic information which will have to be sorted out when constructing a lexical database since the primary purpose of placing a dictionary on-line is to make explicit and implicit information readily accessible, which entails prior identification and analysis of the material it contains. The following types of metalinguistic indicators may be distinguished:

6.3.1 Part of speech of the source item

Since the various POS are listable (n, vt, vi, prep, adj, adv, cpd...), they can be formalized and used in a retrieval program. It is important to realize that the identification of POS information is crucial in constructing a database. In the example of render down above, the string of words in italics includes two types of information which must be distinguished and assigned to different fields in the database. In vt sep fat, vt sep is clearly a POS indicator (transitive phrasal verb whose direct object can be inserted between the verb itself and the particle). Fat, however, has a different status and clearly belongs to the set of collocational restrictions. Typographically, nothing indicates that vt and sep go together and that fat should be treated differently.1

1 In the 1993 version, the Collins-Robert dictionary makes a clear typographical distinction between POS and selection restrictions. The former appear in small boldface while the latter are italicized. This is undoubtedly an improvement and facilitates the analysis of the typesetting tapes used in the DEFI project on word sense disambiguation (Dister & Bourseau 1997).
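The separation of POS indicators from collocational restrictions discussed in this section can be illustrated with a minimal sketch. It is not the program actually used to build the database: the list `POS_LABELS` and the function `split_italics` are hypothetical names introduced here, and the sketch assumes that the italicized string has already been isolated from the rest of the entry.

```python
# Hypothetical, incomplete list of POS indicators (the dictionary uses more).
POS_LABELS = ["vt sep", "vt fus", "vt", "vi", "n", "adj", "adv", "prep", "cpd"]


def split_italics(italic_string):
    """Split an italicized string into (POS label, list of restrictions).

    Assumption of this sketch: the POS label, if any, comes first and the
    restrictions that follow are separated by commas.
    """
    text = italic_string.strip()
    # Try the longest labels first so that "vt sep" wins over "vt".
    for label in sorted(POS_LABELS, key=len, reverse=True):
        if text == label or text.startswith(label + " "):
            rest = text[len(label):]
            restrictions = [r.strip() for r in rest.split(",") if r.strip()]
            return label, restrictions
    # No known POS label: treat everything as restrictions.
    return None, [r.strip() for r in text.split(",") if r.strip()]


print(split_italics("vt sep fat"))
print(split_italics("vt fus custom, law, document"))
```

Matching the longest label first is what keeps vt sep together, which is precisely the information the typography of the 1978 edition does not provide.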

6.3.2 Meaning equivalents, explanations and micro-definitions

Such material appears in parentheses and includes synonyms (see give or make in the entry above) or what I call micro-definitions in Chapter 7. Consider the following examples which illustrate this category:
chorus n (part of song) refrain m
confederacy n (group of states) confédération f
division n (that which divides) séparation f; (in room) cloison f; (fig: between social classes etc) barrière f
The material in round brackets also includes a large number of hyperonyms, synonyms or even antonyms:
eglantine n (flower) églantine f; (bush) églantier m
tipper n (vehicle) camion m à benne (basculante); (back of vehicle) benne f (basculante)
pipe vt (say) dire d'une voix flûtée; (sing) chanter d'une voix flûtée
easy adj (not difficult) problem, sum, decision facile; person facile, accommodant

6.3.3 Subject field codes

The prefatory material indicates that labels referring to subject fields are used to differentiate various meanings of the headword or when the meaning in the source language is clear but may be ambiguous in the target language:
cleavage n (lit) (Chem, Geol) clivage m; (Bio) [cell] division f; (fig) [opinion] division, clivage.
The usefulness of such information for disambiguation purposes need not be demonstrated. Codes such as Chem (Chemistry), Geol (Geology) or Bio (Biology) are indeed crucial in word sense assignment and, hence, in the selection of the appropriate translation in context. Some researchers have convincingly demonstrated that subject field codes contained in MRDs can also be used to identify the topic of a text automatically (Walker 1987 and Walker & Amsler 1986 tap the subject field codes contained in the magnetic version of the LDOCE dictionary; Jansen 1989 reports on a similar experiment with subject field codes from LDOCE and from the Collins-Robert dictionary and comes to the conclusion that these two MRDs rate similarly when one tries to compute the main topic of a text using sophisticated statistical techniques against a large corpus). Although a list of subject fields can be found on the inside covers of the dictionary, it must be acknowledged that this list is far from complete (witness the labels Acoustics, Advertising, Alpinism, Archery, Athletics, Bacteriology, Badminton, Ballet and dozens of other codes

which are not mentioned).2 These labels are easy to identify, however, insofar as the bracketed element is capitalized, which makes it possible to distinguish it from a hyperonym, synonym or partial explanation (see above). Consider the following three examples where the string sport plays three different roles, depending on whether a capital letter is used and the item is bracketed or not:
coach n (tutor) répétiteur m, -trice f; (Sport) entraîneur m.
football n (sport) football m; (ball) ballon m (de football), balle f.
go in for vt fus (fig) sport, hobby pratiquer, s'adonner, faire; style, idea, principle, cause adopter...
Sport in the first example is clearly a subject field label, while it is a hyperonym in the second example (corresponding to the genus term in a traditional definition for football). In the third example, sport must be seen as a restriction placed on the direct object of go in for and refers to the head of a thesauric class comprising all sports. It can of course also refer to the lexical item sport itself (see below for further details on the representation of selection restrictions). Obviously, this type of distinction will have to be made in the lexical database so that a user who wishes to examine the collocational properties of the item sport by retrieving occurrences of the string sport is not inundated with lists of sports or with lists of words which are assigned this subject field code. I shall return to this below when describing the structure of the database. A few exceptions to the principle above have been encountered when processing the database. In the following example, the word Bible is of course capitalized and appears in a sequence of bracketed words but does not refer to a subject field:
word n (Rel) the Word (logos) le Verbe; (the Bible, the Gospel; also the Word of God) le Verbe (de Dieu), la parole de Dieu
Such cases had to be treated manually to make sure that their function in an entry was represented appropriately in the database. The English-French part of the dictionary contains a staggering number of subject field codes (39,382 labels in the database) but this book is mainly concerned with exploiting lexical-semantic relations holding between headwords and lexical items appearing in the metalinguistic apparatus. Although this information is now readily accessible in the database and could be exploited in other frameworks, no attempt has been made to evaluate its consistency or its usefulness for such tasks as information retrieval or word sense discrimination.3
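The three roles played by the string sport can be told apart with a very simple rule based on bracketing and capitalization. The following is only a minimal sketch of such a rule, with a hypothetical function name (`classify_indicator`); testing the first character only is an assumption of the sketch, and exceptions such as the Bible example above would still have to be checked by hand.

```python
def classify_indicator(token, bracketed):
    """Guess the role of one italicized metalinguistic item.

    token:     the item as it appears in the entry, without its brackets
    bracketed: True if the item appeared between round brackets
    """
    if not bracketed:
        # e.g. "sport" in: go in for vt fus (fig) sport, hobby pratiquer
        return "selection restriction"
    if token[:1].isupper():
        # e.g. "Sport" in: coach n ... (Sport) entraîneur m
        return "subject field"
    # e.g. "sport" in: football n (sport) football m
    return "synonym, hyperonym or explanation"


for token, bracketed in [("Sport", True), ("sport", True), ("sport", False)]:
    print(token, bracketed, "->", classify_indicator(token, bracketed))
```

Storing the result of such a classification in a separate field is what allows a query on the collocational properties of sport to ignore the occurrences of the subject field label.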

2 The list given on the inside covers of the dictionary only includes abbreviations used in the dictionary (Comm: Commerce; St Ex: Stock Exchange...). It would however be useful to have access to the exhaustive list of subject fields.

3 See work described in Jansen (1989) and alluded to above on the use of these codes in text indexing. In an experiment in translation selection and word sense discrimination, Michiels (1996) makes use of these CR field codes and puts them in relation with the corresponding LDOCE codes in a Prolog analyzer of English.

6.3.4 Grammar notes

The dictionary contains a series of loose grammatical notes appearing in parentheses, as in the following examples:
in (phr vb elem) prep (after superlative) de. The best pupil in the class le meilleur élève de la classe.
measly* adj minable, misérable, piètre (before n)
this dem adj, pl these ce, (before vowel and mute 'h') cet, f cette, pl ces
Information such as after superlative or before vowel is far from being formalized and is not made use of in this book, despite its obvious usefulness in a generation perspective.

6.3.5 Selection restrictions

A systematic approach has been adopted by the CR lexicographers to account for a whole range of collocational constraints:
- typical noun subjects of a verb headword appear in square brackets [];
- nouns which usually complement a noun headword appear in square brackets;
- typical objects of a verb headword or nouns typically modified by an adjective headword appear unbracketed next to the part-of-speech indicator;
- adjectives, verbs and adverbs modified by an adverb are unbracketed.
The following examples illustrate this approach:

Typical objects: (V+N combinations)
abolish vt practice, custom supprimer; death penalty abolir; law abroger, abolir
do away with vt fus (a) custom, law, document supprimer; building démolir
infringe 1 vt obligation contrevenir à; law, rule enfreindre, transgresser, contrevenir à
validate vt claim, document valider; argument prouver la justesse de

Typical subjects: (N+V combinations)
go through 1 vi [law, bill] passer, être voté; [business deal] être conclu, être fait, se faire
slacken vi (...) [gale] diminuer de force; [speed] diminuer; [activity, business, trade] ralentir, diminuer; [effort, enthusiasm, pressure] diminuer, se relâcher
wear out vi [clothes, material, machinery] s'user; [patience, enthusiasm] s'épuiser

N+N combinations
revocation n [order, promise, edict] révocation; [law, bill] abrogation; [licence] retrait; [decision] annulation
burst n [shell etc] explosion, éclatement; [anger, indignation] explosion; [anger, laughter] éclat; [affection, eloquence] élan, transport; [activity] vague; [enthusiasm] accès, montée; [thunder] coup; [applause] salve; [flames] jaillissement, jet

Adj+N combinations
forceful adj person, character énergique; argument, reasoning vigoureux, puissant
incontrovertible adj fact indéniable; argument, explanation irréfutable; sign, proof irrécusable
unflagging adj person, devotion, patience infatigable, inlassable; enthusiasm inépuisable; interest soutenu jusqu'au bout
devouring adj hunger, passion dévorant; zeal, enthusiasm ardent

Adv+V combinations
appropriately adv speak, comment avec à propos, pertinemment; decide à juste titre; design convenablement

A few remarks ought to be made at this juncture. First, most of the examples above illustrate collocational constraints of the restricted type (Aisenstadt 1979). Applying Hausmann's terminology (1979), it is easy to notice that the base of the collocation appears in italics while the collocator itself is the headword. What is important to bear in mind is that, in an encoding perspective, this way of presenting collocations may not prove very useful if one wants to start from the base in order to discover a collocator which expresses a given meaning. Imagine, for example, a translator or a student who wishes to know which verbs can take the noun law as direct object and express the 'destruction' of a law. The entry for law in the dictionary is not going to provide any useful answer but a quick glance at the first category of examples above (V+N combinations) reveals that verbs such as do away with, abolish or infringe all share an important feature insofar as they all include some reference to a possible collocation with law which appears in italics in the entries. Working with the printed version of the dictionary and trying to locate all occurrences of the italicized unbracketed item law in verbal entries would undoubtedly provide a partial answer to the question above but it would boil down to looking for a needle in an 833-page haystack.
The availability of the machine-readable version, however, makes it possible for the user to access information via any element of the dictionary, including - though not exclusively - the elements in italics. When the magnetic tapes of the CR dictionary were made available to the English Department in Liège in the 1980s, the WordCruncher (ETC) Text Retrieval Software package (running under MS-DOS) was chosen to exploit the contents of the dictionary. Unlike the lexical database organization we opted for a few years later and which is adopted in this book (see below), WordCruncher is a full-text retrieval system, i.e. a software package which takes as input any ASCII file and makes it possible to retrieve any string of characters or combination of strings thanks to an index it generates automatically. In preparing a WordCruncher version of the dictionary, Jacques Jansen had processed the files to distinguish the English words (in capital letters in the resulting file), the French words (in small letters) and the metalinguistic information (between angled brackets < > because italics cannot be represented in an ASCII file). Instead of being provided with one access path only, viz. the alphabetical order, the user can then resort to the WordCruncher version of CR to instantaneously retrieve all the occurrences of any string of characters, for example a given word in italics, together with the headword under which it is found. Starting from the hypothesis that co-occurrence knowledge is spread across the entire MRD, it is therefore

possible to retrieve a given base and its collocators by concentrating on the occurrences of a given italicized item in the whole English-French part of the dictionary. Doing so yields a list of 89 items containing law in italics. Making use of the functionalities offered by WordCruncher, Boolean operators (and, or, not) can be used to refine the queries against the dictionary. Searching for the co-occurrence of the POS label vt and the italicized item law yields the list of transitive verbs that can take law as direct object. The 35 verbs which meet this criterion in CR are: abolish, annul, carry out, circumvent, contravene, defy, disobey, do away with, elude, enforce, establish, evade, get round, infringe, invoke, keep, neglect, obey, offend against, put into operation, override, promulgate, reform, repeal, rescind, respect, revoke, sanction, stretch, subvert, trespass against, uphold, vote in. Similarly, searching for law within an entry and between square brackets reveals that law can be the subject of the following verbs or expressions: fall into abeyance (s.v. abeyance), take effect (s.v. effect), become effective (s.v. effective), come into force (s.v. force), go through, lapse, operate, be in operation (s.v. operation), come into operation (s.v. operation), ordain, stand. It is important to realize that knowledge of typographical information is essential to identify the grammatical role played by a given selection restriction. In the following entry, law is the typical subject of ordain and, as such, is enclosed between square brackets:
ordain vt (a) [God, fate] décréter (that que); [law] décréter (that que), prescrire (that que + subj)
Searching for law and vt in italics within the same entry could lead one to infer incorrectly that law is a typical object of ordain.
Extreme caution should therefore be exercised in phrasing such queries and a number of criteria should be met for a given italicized item to be considered as a direct object:
- the POS of the headword should include vt (POS = vt, vt sep, vt fus);
- the italicized item should be unbracketed.
Similarly, for a given italicized item to be considered as a typical subject, it is insufficient to specify that it should appear in an entry whose POS is vi, witness ordain above. Rather, the query should be formulated so as to retrieve items which fulfil the following conditions:
- the POS of the headword should be vi, vt, vt sep or vt fus;
- the italicized item should appear between square brackets.
Consider the following examples which make it abundantly clear that working with a file which is too close to that of the printed dictionary is fraught with problems:
summon vt servant, police appeler, faire venir; [monarch, president, prime minister] mander
stupefy vt [blow] étourdir; [drink, drugs, lack of sleep] abrutir
slump 2 vi [popularity, morale, production, trade] baisser brutalement

The words president, drugs, morale and production in the entries above appear in italics and carry no brackets of their own. They are all typical subjects of their respective headwords but it should be realized that square brackets have not been used around every typical subject for reasons of economy (only one pair of brackets is used). A full-text retrieval system such as WordCruncher is therefore clearly unable to treat fine-grained queries drawing on information that is only implicit in the dictionary. In the database version whose structure is described below, it has therefore been necessary to 'decompact' this space-saving coding and to redistribute this information to make sure that words such as president or drugs in the entries above inherit the properties indicated by the presence of square brackets. A decompacted entry should then look like the following:
slump 2 vi [popularity], [morale], [production], [trade] baisser brutalement
Another important feature of the metalinguistic information contained in the dictionary is that the italicized item, which stands for the base of a collocation, may be part of a restricted collocation but can also function as the head of a thesauric class. Consider the entries for shoal and school below which explicitly indicate that some kind of link holds between these items and the word fish:
school2 n [fish] banc m
shoal n [fish] banc m
Fish can of course refer to the English word fish, but it is also part of the dictionary metalanguage, being the head of a thesauric class comprising several types of fish. A school of fish is of course a well-formed phrase, next to a school of sardines and of herring. Lexicographers usually only use a superordinate term because exhaustive listing of all the hyponyms would be too space-consuming in most cases.
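The 'decompacting' step and the criteria for typical objects and typical subjects given earlier in this section can be sketched as follows. This is a simplified illustration rather than the actual conversion program: `decompact`, `role` and the toy `entries` list are hypothetical names, and the sketch assumes that each group of restrictions has already been separated from the translations that follow it.

```python
import re

TRANSITIVE = {"vt", "vt sep", "vt fus"}
VERBAL = TRANSITIVE | {"vi"}


def decompact(restriction_string):
    """Return (item, bracketed) pairs so that every item listed inside one
    shared pair of square brackets inherits the 'bracketed' property."""
    pairs = []
    # Alternate between [...] groups and unbracketed stretches of text.
    for match in re.finditer(r"\[([^\]]*)\]|([^\[\]]+)", restriction_string):
        bracketed = match.group(1) is not None
        group = match.group(1) if bracketed else match.group(2)
        for item in group.split(","):
            if item.strip():
                pairs.append((item.strip(), bracketed))
    return pairs


def role(pos, bracketed):
    """Apply the two sets of criteria listed in the text."""
    if pos in VERBAL and bracketed:
        return "typical subject"
    if pos in TRANSITIVE and not bracketed:
        return "typical object"
    return None


# Toy records (headword, POS, one restriction group) taken from the examples.
entries = [
    ("ordain", "vt", "[law]"),
    ("abolish", "vt", "law"),
    ("slump", "vi", "[popularity, morale, production, trade]"),
]

verbs_with_law_as_object = [
    headword
    for headword, pos, restrictions in entries
    for item, bracketed in decompact(restrictions)
    if item == "law" and role(pos, bracketed) == "typical object"
]
print(verbs_with_law_as_object)
```

Because the bracket information is attached to each item individually, ordain is no longer retrieved by a query on the typical objects of transitive verbs, which is the error a plain co-occurrence search on vt and law would make.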
The typical relationship between dog and bark has been quoted to death in the linguistics literature but linguists are too prone to forget that dog can be replaced by greyhound, whippet, poodle, Alsatian, etc. As far as their representation in CR is concerned, the dividing line between collocations proper and selection restrictions is far from being a clear-cut one, as I argue in Chapter 5 (see also Heylen's (1993) remark quoted on p. 73). The example cited earlier in this section and which I reproduce here for the sake of clarity is also a case in point:
forceful adj person, character énergique; argument, reasoning vigoureux, puissant
The italicized item person is clearly a selection restriction standing for any [+HUMAN] noun. Words such as person, thing, object or animal are definitely near the top of the noun taxonomy, as noted by Byrd (1989:76) and the following equivalences can be deduced:

typical object: person = [_ [+human]] (human object)
typical object: animal = [_ [+animal]] (animal object)
typical object: thing = [_ [-animate]]4 (non-human object)
typical object: object = [_ [+concrete]] (concrete object)
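The equivalences above amount to a small lookup table from top-level restriction words to subcategorization frames. A minimal sketch, in which the names `FEATURES` and `subcat_frame` are hypothetical and only the four words listed above are covered:

```python
# Mapping from restriction words to features, as given in the equivalences.
FEATURES = {
    "person": "+human",
    "animal": "+animal",
    "thing": "-animate",
    "object": "+concrete",
}


def subcat_frame(restriction):
    """Return the frame for a top-level restriction word, or None when the
    word is an ordinary collocate or the head of a narrower class."""
    feature = FEATURES.get(restriction)
    return f"[_ [{feature}]]" if feature else None


print(subcat_frame("person"))
print(subcat_frame("law"))
```

Words lower in the taxonomy, such as law or snake, deliberately return nothing here, since the table only applies to the literal strings person, animal, thing and object.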

A by-product of the database described in this book is that it is of course possible to retrieve relevant subsets of lexical items sharing a given subcategorization rule (e.g. display the 800-odd English verbs which subcategorize for a [+human] object or the 62 intransitive verbs which are marked for a [+animal] subject).5 At this juncture, a general remark ought to be made with respect to the collocations listed in the Collins-Robert dictionary and quoted in this book. On the occasion of a EURALEX Conference, I was given the opportunity to present a few preliminary ideas related to the construction of the CR database and to a few potential applications (see Fontenelle 1994b). One of the examples I was quoting was the combination a school of fish which is reproduced above. Two eminent English lexicographers strongly objected that "school of fish" is not a well-formed collocation because everyone in England says a shoal of fish (which is equally attested in CR, s.v. shoal). According to these two lexicographers, the noun school would be reserved for large sea animals, especially mammals, like dolphins or whales. They objected to my trying to re-use an old dictionary (the 1978 CR version) which is entirely based on the intuition of the lexicographers, on the grounds that the CR dictionary contains statistically insignificant, or even erroneous, collocations (see section 2.3 for a brief survey of the most common statistical techniques currently used by lexicographers in their everyday quest for collocations). Yet, LDOCE (1987), which is based on a computerized collection of texts, gives the following definition:
school 3 n (of) a large group of one kind of fish or certain other sea animals swimming together: a school of whales
This definition does not indicate that a school of fish is acceptable in modern English but does not rule it out either. The Cobuild example is more explicit:
school n (...) EG. a school of tiny, glittering fish.
As I pointed out in the introduction to this book, Cobuild is the first truly corpus-based dictionary whose examples are all selected from a balanced corpus. This example was therefore included because it was deemed to be significant by the lexicographer. Another explanation could be the following: statistics such as Mutual Information scores are used to

4 Byrd (1989:76) argues that a word like thing in object position should be equated with a [+concrete] selection restriction. In the following examples, however, thing can also be used to refer to [+abstract] objects, which is why the equivalence should rather be "thing" => [_ [-animate]]: dull vt thing atténuer (see the Cobuild example: The taking of food dulled pain).

5 Verbs which subcategorize for animals which are lower in the animal taxonomy will of course not appear here since such a query would only apply to the string animal. The following pair of examples clearly illustrates the cline between selection restrictions and more restricted thesauric classes (person refers to [+human] nouns while snake is itself superordinate to viper, python, cobra, etc.):
die vi [person] mourir, décéder, s'éteindre; [animal, plant] mourir, crever
coil vi [snake] se lover

detect significant patterns of co-occurrence. It is reasonable to think that the following pattern emerged (the figures are invented):

school of
    fish: 8 occurrences
    whales: 7 occurrences
    dolphins: 5 occurrences

School of fish is obviously more frequent than either of the remaining collocations. Taken together, however, whales and dolphins in collocation with school total 12 occurrences. It would therefore be more appropriate to say that school usually refers to a group of large sea mammals, although the Cobuild example provides strong evidence that this is only a tendency and that a school of fish should not be rejected. In this project, no attempt whatsoever has been made to take frequency information into account and to check CR evidence against corpus data. The 70,000 word pairs I have had to deal with in constructing the lexical-semantic database are obviously not all statistically significant. Moreover, value-judgmental arguments are most frequent in semantics and it is certainly difficult for a non-native speaker to say whether a given combination of words is acceptable or not, especially when highly-qualified, professional lexicographers from competing publishing houses hold widely divergent opinions on such controversial issues. Anyway, as pointed out by Varantola (1994:607) in an attack against dictionaries which only give examples of what is perceived as prototypical usage, professional translators and advanced dictionary users also need infrequent collocations, rare and 'exotic' combinations. Obviously, the CR dictionary, which was compiled well before large corpora were made available to lexicographers, certainly contains rare and exotic collocations that would not come on top of the list of significant combinations, but it is undeniable that they are, linguistically, equally interesting and it would certainly be a pity to ignore them. In general, I have therefore decided to take the CR information for granted and the numerous examples cited in this book should normally convince any reader that this dictionary is a most useful source of collocational information which ought to be made available to linguists, translators, language teachers or students.
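The arithmetic behind the school of example above can be spelled out in a few lines. The sketch below uses only the invented figures quoted in the text; the grouping of collocates into semantic classes is a hypothetical mapping introduced here for illustration and is not part of the CR data.

```python
from collections import Counter

# Invented figures quoted in the text for "school of + N".
counts = Counter({"fish": 8, "whales": 7, "dolphins": 5})

# Hypothetical grouping of the collocates into semantic classes
# (an assumption made for this illustration only).
semantic_class = {
    "fish": "fish",
    "whales": "large sea mammal",
    "dolphins": "large sea mammal",
}

# Sum the occurrences per semantic class.
by_class = Counter()
for noun, occurrences in counts.items():
    by_class[semantic_class[noun]] += occurrences

most_frequent_noun = counts.most_common(1)[0][0]
most_frequent_class = by_class.most_common(1)[0][0]
print(most_frequent_noun, most_frequent_class, dict(by_class))
```

Counting individual nouns puts fish first, whereas counting by class puts the large sea mammals first, which is the tension discussed above.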

6.4 Metalanguage and lexical functions

The discussion on the exploitation of the CR metalinguistic information above was triggered by the hypothetical question asked by a translator wishing to know which verbs can take the noun law as direct object and express the 'destruction' of a law (I deliberately misuse the term 'destruction' here since it is precisely words which express this rough meaning that I wish to identify). The list of around 35 transitive verbs which contain some reference to the noun law in the microstructure of the entries is indisputably most useful since it contains the type of decision-making material any translator or writer would like to have at his or her fingertips. As noted by Cop (1988), someone writing a text about a given concept, in this case law,

frequently needs a list of contextual partners to jog his or her memory or to spark his or her imagination. The abundance of collocations in the CR dictionary makes it possible to answer part of the question above and to discover what one typically does to a law. Lexicographically speaking, this is equally important since it is just the type of information a lexicographer is looking for when preparing a dictionary entry.

It should be realized, however, that this list of verbs is rather heterogeneous and resembles what can be found in a traditional collocational dictionary such as the BBI dictionary (see section 3.2). Such a compilation does not provide any semantic interpretation and the user is left in the lurch when it comes to working out the exact nature of the link which unites the components of the collocation. The verbs abolish, carry out, enforce, repeal or trespass against all appear in the same category of V+N collocations but it hardly needs to be pointed out that they do not express the same meaning and are far from being interchangeable. Most collocational dictionaries also fail to differentiate between collocations, and corpus-based studies drawing on sophisticated statistical techniques often yield the same type of heterogeneous data, even if tagged and parsed corpora are used. Smadja (1991) admits that the type of semantic interpretation he has in mind to distinguish semantic sub-classes cannot be performed automatically. In the example under scrutiny here, this semantic interpretation would be based on the recognition that the relationship between the base of the collocation (law) and the collocator (the transitive verbs) is variable. Establish, promulgate and vote in, for example, can be considered as near synonyms while abolish, annul, do away with, repeal, rescind and revoke all express the opposite meaning. The former set of verbs refers to what Benson et al. (1986a) call CA collocations (verbs denoting creation and/or activation) and the latter set refers to EN collocations (eradication/nullification).

The lexical collocations extracted from the CR dictionary can also be analyzed in terms of Mel'čuk's lexical functions and the CA collocations for law can be represented as follows in the ECD framework (establish a law ≈ cause a law to come into force):

CausFunc0 (law)= establish, promulgate, vote in

Similarly, EN collocations correspond to the Liqu LF and are represented as follows:

Liqu (law)= abolish, annul, do away with, repeal, rescind, revoke

A computer program which would simply extract V+N or N+V collocations, however useful, would only enable the user to answer part of the question (viz. what can one do to a law or what does a law typically do?). In order to allow more flexible queries and semantically motivated questions, I decided to embark on the construction of a database enriched with lexical-semantic relations based on Mel'čuk's lexical functions. In order to enable the user to retrieve the verbs which collocate with law and which express the destruction of this law, for instance, the dictionary was enriched with information about the LFs linking law and the various transitive verbs it can be combined with. For verbs such as abolish, annul, do away with, the function Liqu obviously applies and the idea was to construct a large database in which the link between the metalinguistic indicator and the headword, if any, was made explicit in the form of a lexical function à la Mel'čuk. Since it proved to be impossible to

automate this labelling process, the assignment of this 'semantic tag' had to be performed entirely manually (with the aid of a program to ensure consistency - see below).

Several remarks are in order here. First, it should be realized that enriching the MRD with lexical-semantic information entails adding information and hence changing the dictionary files. A software package such as WordCruncher is a full-text retrieval system; it uses a flat file without identifying the structural properties of dictionary entries, which makes it impossible to display only selected types of information (e.g. only the headword, translations, etc.). Moreover, it is virtually impossible to update the contents of the dictionary, to add, delete or modify information, which is precisely what needed to be done here. The various typographical problems to which I alluded above (see the space-saving strategies used in the bracketing system) also contributed to our giving up WordCruncher and opting for a representation model of the lexical database which would enable us to update the dictionary, change the contents and have powerful searching and retrieving functionalities at our disposal to formulate fine-grained queries against the database. The relational model was chosen, and the structure of the resulting relational database is detailed below.

Another thing to note is that the systematic enrichment of the dictionary aims to make explicit a number of relationships which are only implicit in the dictionary and which are based on the same typographical conventions. The italicized item fish is used in around 100 entries but the following NN collocations make it abundantly clear that the bracketing system is used to cover totally different types of relationships:

school2 n [fish] banc
shoal n [fish] banc
backbone n [fish] arête centrale
beard n [fish, oyster] barbe; [goat] barbiche; [grain] barbe, arête
bone n os; [fish] arête
fin n [fish, whale, seal] nageoire; [shark] aileron

The first two examples undeniably illustrate a GROUP relationship which, in MTT parlance, is represented by the Mult LF (Mult (fish)= school, shoal). The remaining examples, however, illustrate the part-whole relation, a.k.a. meronymy (Cruse 1986). This relationship is definitely not a lexical function (it is a semantic, one-to-many relation) and is not accounted for in Mel'čuk's model of LFs but the availability of the material in the CR dictionary and its potential use in information retrieval, in (machine) translation or in language teaching is so obvious that I have decided to treat it in the database like other types of relations (see also section 8.3 for more detail on this relation). Coding it systematically across the entire dictionary then makes it possible to answer such questions as:

Q: What are the main parts of a fish?
A: Part (fish)= backbone, beard, bone, fillet, fin, mandible, scale, spine, wattle.

Again, it should be stressed here that the absence of items such as head, tail or gills in the list above depends on the methodology adopted in this work. Since the lexical resource used to construct the lexical-semantic database is a bilingual dictionary, more attention is paid in this resource to areas where English and French are not lexically homomorphous. When they

are homomorphous, i.e. when they divide semantic areas covered by lexical items in the same way, the metalinguistic annotations in italics are simply not necessary and therefore not used, except if the lexicographer deliberately provided the information to reassure the readers and prevent them from thinking that something is missing.

To turn back to the correspondence between metalinguistic indicators and lexical functions, it is clear that lexicographers have sometimes applied different typographical conventions to account for the same relations. Such inconsistencies are perfectly understandable in the context of a lexicographical project which usually involves a large number of collaborators over several years. One of the purposes of the construction of the CR lexical-semantic database was therefore to harmonize all these discrepancies and to make sure that relationships can be accessed by the LF mechanism, irrespective of the original form in the printed dictionary. Consider the following entries which are cases in point:

mouthful n [food] bouchée
gulp 1 n (b) (mouthful) [food] bouchée, goulée
portion n (...) (of food: helping) portion
ration n (allowance: of food, goods etc) ration

As human readers, we are able to infer that the relationship between food and the four entries above is the same even if brackets are used in the first two examples and a parenthesized of-phrase is used in the last two entries. Moreover, even if an intelligent computer program were able to recognize the similarity between these entries, it could not figure out the nature of this relationship. The corresponding LF, viz. Sing, was therefore assigned manually to each pair of items above in an attempt to formalize the lexical-semantic link in a uniform way. This enables the user of the database to retrieve the words which can be used to express a 'single unit' of food since it now contains the following facts:

Sing (food)= gulp, mouthful, morsel, portion, ration...
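The two typographical conventions just described (square brackets vs. a parenthesized of-phrase) can at least be reduced to a common form mechanically, even though the lexical function itself still has to be supplied by a human. The following is a minimal sketch under that assumption: the regular expressions and the function name `bases` are hypothetical, and the input strings are plain-text renderings of the four entries quoted above, not the actual CR file format.

```python
import re

# [food]  or  [fish, oyster]
BRACKET = re.compile(r"\[([^\]]+)\]")
# (of food: helping)  or  (allowance: of food, goods etc)
OF_PHRASE = re.compile(r"\(([^()]*\bof [^()]+)\)")

def bases(entry):
    """Return the metalinguistic items (bases) found in a plain-text entry."""
    found = []
    for group in BRACKET.findall(entry):
        found += [word.strip() for word in group.split(",")]
    for group in OF_PHRASE.findall(entry):
        # Keep what follows "of", drop a trailing ": gloss" and a final "etc".
        after_of = group.split("of ", 1)[1].split(":", 1)[0].strip()
        after_of = re.sub(r"\s+etc$", "", after_of)
        found += [word.strip() for word in after_of.split(",")]
    return found

entries = [
    "mouthful n [food] bouchée",
    "gulp 1 n (b) (mouthful) [food] bouchée, goulée",
    "portion n (...) (of food: helping) portion",
    "ration n (allowance: of food, goods etc) ration",
]
result = {entry.split()[0]: bases(entry) for entry in entries}
print(result)
```

The sketch only recovers the pair (headword, base); deciding that the pair instantiates Sing rather than some other relation is exactly the step that had to be done manually.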
To give but one other example of the need to harmonize the representation of collocations in the database, consider the following entries which all illustrate the S0Son LF (typical noun for the sound of something):

lapping n [waves] clapotis
plash n [waves] clapotis, clapotement
wash n (sound: of waves etc) clapotis
whine n [bullet, shell, siren, machine] plainte stridente or monocorde
zip n (sound of bullet) sifflement
zing n (noise of bullet) sifflement

All the typographical devices (brackets vs. parentheses) are used to express exactly the same type of relationship but the semantic interpretation in terms of Mel'čuk's LFs relies on the intelligence of the human reader. The third entry above should in fact be decoded as follows: wash is a noun which expresses a type of sound and which primarily applies to waves and other things (the interpretation of etc is left to the reader's intuition but it is doubtful that

computers are capable of such interpretation). The colon which separates sound and the of-phrase is obviously an obstacle to recognizing the similarity with the last two examples which resort to restricted definition patterns or defining formulae. The frequent use of such devices in micro-definitions has enabled me to speed up and even semi-automate the assignment of lexical functions (see Chapter 7 for more details), but it should be borne in mind that the link between wave and lapping or plash above had to be figured out and coded entirely manually, which, for the 70,000 pairs or so, represented more than two years of hard work. Now that the database is completed, the lexical-semantic links implicit in the entries above are uniformly represented as follows:

S0Son (wave)= lapping, plash, wash
S0Son (bullet)= whine, zing, zip

Before concluding this section, I should like to point out that the method for compiling what corresponds to the lexical co-occurrence zone of an entry slightly differs from the approach adopted in Mel'čuk's ECD. Indeed, Mel'čuk basically starts from an entry (a lexeme, called a vocable in MTT parlance) and a pre-determined list of LFs and then tries to find out which values (mostly collocators) are yielded when one applies the LF to the lexeme. In the CR database, I have worked the other way round since I have started from potentially interesting collocations and tried to identify the LF, i.e. the lexical-semantic relationship which holds between the lexeme (the italicized metalinguistic item in the printed dictionary) and its collocator or associated word (the headword of the printed book). This poses a number of problems to which I shall return below in Chapter 9. The ultimate aim is to try to build a sort of semantic network of associated items and collocational partners, in keeping with the general principle that 'words shall be known by the company they keep' (Firth 1968, Mackin 1978).
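The reversal of perspective described above, from records stored under the headword (collocator) to facts of the form LF (base)= values, amounts to inverting an index. The sketch below illustrates the idea with the S0Son pairs quoted in this section; the record layout and the function name `invert` are assumptions made for the illustration and do not reproduce the actual database tables.

```python
from collections import defaultdict

# (headword of the printed entry, italicized metalinguistic item, manually assigned LF)
records = [
    ("lapping", "wave", "S0Son"),
    ("plash", "wave", "S0Son"),
    ("wash", "wave", "S0Son"),
    ("whine", "bullet", "S0Son"),
    ("zing", "bullet", "S0Son"),
    ("zip", "bullet", "S0Son"),
]

def invert(records):
    """Group headwords under the (LF, base) pair they are values of."""
    table = defaultdict(list)
    for headword, base, lf in records:
        table[(lf, base)].append(headword)
    return {key: sorted(values) for key, values in table.items()}

lf_table = invert(records)
for (lf, base), values in sorted(lf_table.items()):
    print(f"{lf} ({base})= {', '.join(values)}")
```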
The idea is therefore to construct a flexible database providing users (translators, linguists, students, etc.) with a formal modelling of the collocational environment of a base, drawing on the descriptive power of lexical functions and offering a variety of access paths ranging from the collocator (as in a traditional semasiological dictionary), the base (as in the ECD and in most collocational dictionaries), the translation, the part of speech or even the lexical function, all of these features being accessible separately or in combination. To illustrate the descriptive power of the LF mechanism, I should like to conclude this section with an example of such a semantic network constructed around the noun suspicion. This item appears under the following 26 entries in the dictionary: arouse, avert, awake, baseless, confirm, dissipate, drive away, eliminate, entertain, harbour, just, quieten, remove, rest, rouse, suspicious(2x), suspiciously (2x), suspiciousness (2x), unsuspicious (2x), verify, well-founded, well-grounded. For the noun suspicion, part of my task has therefore been to figure out which LF, if any, connects the base with each of the items appearing in the list above. As can be seen below, the resulting relational network illustrates a wide variety of LFs and shows that their descriptive power makes it possible to model the paradigmatic and syntagmatic relations this

word enters into, despite the various limitations expressed in Chapter 5. The LF-based network looks as follows:

suspicion
CausFunc0: arouse, awake, rouse
Liqu: avert, dissipate, drive away, eliminate, quieten, remove
Oper1: entertain, harbour
Real1: confirm, verify
Ver: just, well-founded, well-grounded
AntiVer: baseless
A1: suspicious
A2: suspicious
Adv1: suspiciously
Adv2: suspiciously
AntiA1: unsuspicious
AntiA2: unsuspicious
S0Qual1: suspiciousness
S0Qual2: suspiciousness

This network clearly comprises syntagmatic relations (the first six categories of LFs) and paradigmatic (substitution) relations. Besides support verbs à la Gross (1981) (one can entertain or harbour suspicions and their respective translations into French - on entretient ou nourrit des soupçons à l'égard de quelqu'un), the list also includes Eradication/Nullification verbs (see the Liqu LF) and a number of most interesting restricted adjectival collocations. Several items appear twice in the paradigmatic series because the values are ambiguous in English and apply to the two different actants involved in the 'suspicion' scenario: the one who suspects and the person or thing that is suspected. Compare the different translations in French:

AntiA1 (suspicion)= unsuspicious (peu soupçonneux)
AntiA2 (suspicion)= unsuspicious (qui n'a rien de suspect)

The original entry in the printed dictionary may be necessary to illustrate how this information is derived:

unsuspicious adj (feeling no suspicion) peu soupçonneux, peu méfiant. (causing no suspicion) qui n'a rien de suspect, qui n'éveille aucun soupçon

As I argue in Chapter 7, the defining formulae "causing + N" or "feeling + N" are used as clues to identify the appropriate lexical function and to semi-automate the assignment procedure.
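How such defining formulae might serve as clues can be sketched as a small pattern table. The mapping of "feeling no N" to AntiA1 and "causing no N" to AntiA2 follows the unsuspicious entry above; the two positive patterns (A1, A2) are an extrapolation assumed here for illustration, and the function name `suggest_lf` is hypothetical. The procedure actually used is the one described in Chapter 7, which this sketch does not claim to reproduce.

```python
import re

# Ordered clue table: the negative formulae are listed before the positive ones.
FORMULAE = [
    (re.compile(r"^feeling no (\w+)$"), "AntiA1"),  # from the entry above
    (re.compile(r"^causing no (\w+)$"), "AntiA2"),  # from the entry above
    (re.compile(r"^feeling (\w+)$"), "A1"),         # assumed by analogy
    (re.compile(r"^causing (\w+)$"), "A2"),         # assumed by analogy
]

def suggest_lf(micro_definition):
    """Return (LF, base) suggested by a micro-definition, or None if no clue applies."""
    text = micro_definition.strip().lower()
    for pattern, lf in FORMULAE:
        match = pattern.match(text)
        if match:
            return lf, match.group(1)
    return None

senses = {
    "feeling no suspicion": "peu soupçonneux, peu méfiant",
    "causing no suspicion": "qui n'a rien de suspect, qui n'éveille aucun soupçon",
}
suggestions = {definition: suggest_lf(definition) for definition in senses}
print(suggestions)
```

A suggestion of this kind would still need to be confirmed by the lexicographer, which is why the text speaks of semi-automating rather than automating the assignment.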


6.5 Collocations, terminology and lexical functions

The Collins-Robert dictionary is a general-purpose dictionary, not an LSP lexicon. This means that the user should normally not expect to find in it the domain-specific multi-word units which attract so much attention in translators' schools and in terminology handbooks. It is therefore not my intention to tackle the problem of the relationship between collocations and language for special purposes. Symposia have been devoted to the subject (see de Bessé 1992 or the special issue of Terminologies Nouvelles 1993) and doing so would lead me too far away from the topic of this book. It should be realized that Mel'čuk's apparatus of lexical functions is primarily designed to account for recurrent general-language lexical-semantic relationships and the emphasis laid by Mel'čuk and his colleagues on the description of abstract terms shows that he is very much aware that lexical functions can hardly be used for the semantic description of LSP complex terms. However, it should also be borne in mind that elaborate collocational information is usually not found in specialized dictionaries (Moulin 1983).6 By systematically specifying co-occurrence information which other dictionaries usually leave unsaid, Mel'čuk's ECD has drawn the lexicographer's attention to the need for new LSP lexicons whose design should make it possible to answer crucial questions translators, students or simply laymen usually cannot answer straightway. In a penetrating and stimulating article on new forms of specialized dictionaries, Frawley (1988a) notes that specialized lexicons do not usually provide anything about how a given term selects co-occurrence. A geological term such as winze, for instance, is defined as 'a vertical or inclined opening, or excavation, connecting two levels in a mine, differing from a raise only in construction' (AGI 1962).
This information is not sufficient, Frawley argues (p.201), "since we need to know what kind of verbs co-occur with winze, if any, and what kind of prepositions it can take. We also need to know more about the paradigmatic structure, such as whether or not the term has any contrasts. We need to know whether or not winze takes certain copulas and if winze can be augmented, degraded, eliminated, caused, etc. In short, we need to know more of the actual geological discourse about winze". This specialized discourse can be elicited through interviews conducted with specialists who act as informants. Among the various questions a geologist might be asked, Frawley lists the following:

- Is there a name for a group of winzes? (Mult)
- Is there some way of talking about the closing off of a winze? (Liqu)
- Can there be more or less of a winze? (Minus, Plus)
- Can the word winze be used attributively, as in Winze XI (A0)

Of course, as Frawley points out, some of these questions may sound absurd to a geologist but they are primarily intended to provide the lexicographer with information about the lexical structure of the item under scrutiny, which means that there may not be any answer in some cases. What is important, however, is that lexical functions can be used to describe some of

6 Although this seems to be changing when one considers the new generation of technical/specialized dictionaries which have emerged in the past few years (see my discussion in Chapter 3 and, more specifically, Verlinde 1995, Verlinde et al. 1992; Cohen 1986, Lainé 1993).


the relationships that can be established between a term such as winze and related items in the domain. It is reasonable to expect the lexicographer to discover a number of domain-specific lexical-semantic relations, however, which is why Mel'čuk's model is often found too shallow by terminologists. Blampain et al. (1992) and Merten (1992), for example, illustrate a number of notional relations, i.e. relations between notions, not between terms, which are specific to a given subject field and do not seem to have any equivalent in Mel'čuk's model (e.g. the relationship between an object and the material it is made of, the relationship between a given discipline and what this discipline studies - e.g. pedology => soil - or even the relation of temporal succession).

The material contained in the CR dictionary is, this point is worth labouring, primarily made up of general-language items. This does not mean that it is totally useless from a terminological point of view, however, since it contains collocational information for artifacts which bears a strong resemblance to what terminologists are looking for in their quest for the specialized discourse Frawley (1988a) is alluding to. In fact, this dictionary contains collocations which are technical but familiar to the layman. Consider the noun sail which can by no means be considered as a very specialized nautical term, since it is part of general vocabulary. The collocational environment of this word, however, provides ample evidence that it deserves extensive treatment in a terminological description of nautical terms (see Van Campenhoudt 1994). The entry for sail in CR is very poor from a phraseological point of view (the only interesting expressions listed in it are "under sail" and "to set sail for"), but the numerous occurrences of sail in the metalinguistic sections of entries enable us to get a more appropriate picture of the actual nautical discourse about sails. Some of the words which enter into a syntagmatic relationship with sail can be grouped and described in terms of some of Mel'čuk's standard LFs. From the CR data, one might imagine the following lexicographic interview conducted by a terminologist or lexicographer with a specialist in navigation:

Q: Is there a name for a regular group of sails?
A: Mult (sail)= set (Fr. jeu)
Q: Is there a verb for the typical sound of sails?
A: Son (sail)= flap (Fr. claquer)
Q: Is there a noun for the typical sound of sails?
A: S0Son (sail)= flap (Fr. claquement)
Q: Are there transitive verbs taking sail as a direct object and meaning that the typical purpose associated with sails is realized?
A: Real1 (sail)= get up, haul up, hoist, put up, shake up, spread (Fr. hisser, déployer)
Q: Are there transitive verbs taking sail as a direct object and meaning that the typical purpose associated with sails is no longer realized?
A: AntiReal1 (sail)= haul down (Fr. affaler), lower (Fr. abaisser), strike (Fr. amener)
Q: What are the typical parts of sails?
A: Part (sail)= belly (Fr. creux), gore (Fr. pointe)

Q: Are sails part of a larger entity?
A: Whole (sail)= barge (Fr. barge), yacht (Fr. yacht, voilier)

The last two relations are not covered by Mel'čuk's apparatus because they are not functions proper, but rather 1 to n relations. As I said earlier in this section, I have introduced the Part and Whole relations because they are among the most frequent semantic relations used in terminology and because they form an important subset of syntagmatic relations and therefore cannot be ignored in a lexical-semantic modelling of the collocational environment of a lexical item (see section 8.3 for additional arguments).

It should not be inferred from the above data that all collocations can be interpreted in terms of lexical functions, however. As I was arguing above, some relationships may be too domain-specific and I have not tried to 'force' the data or to be too 'pushy' in assigning LFs to pairs of related items. The following examples show that the CR dictionary also contains a certain amount of collocational information which cannot be readily modelled with LFs and I have not tried to study every subject field in detail to invent LF-like relations which would only be used once or twice in a database of 70,000 records. It is undeniable, however, that the following data are of paramount importance in the compilation of a collocational lexicon, specialized or not. Continuing the imaginary dialogue started above, one might elicit the following information which complements the combinations above:

Q: Are there any other transitive verbs which can take sail as a direct object and which indicate what can be done to sails?
A: One can bend (Fr. enverguer), furl (Fr. ferler), gore (Fr. mettre une pointe à), puff (Fr. gonfler), reef (Fr. prendre un ris dans), swell (Fr. gonfler) or trim (Fr. gréer) a sail or sails.
Q: Are there any verbs taking sail as subject and indicating what sails typically do?
A: Sails can billow or billow out (Fr. se gonfler), fill out (Fr. gonfler), puff out or puff up, swell or swell out (Fr. se gonfler), they can also gibe (passer d'un bord à l'autre du mât).
Q: Are there any Adj+N or N+N combinations which can be of any interest in describing sails? (e.g. what can sails be like?)
A: Sails can be billowy (Fr. gonflé par le vent), full (Fr. plein) or swelling (Fr. gonflé). The billow of a sail (Fr. gonflement) is also a regular combination, besides other N+N collocations which correspond to a standard LF, like flap of sails above (S0Son).

Undoubtedly, the CR metalinguistic apparatus then proves to be a crucial starting point to access collocational data and the combinations above testify to the richness of the bilingual dictionary. The need to collect, interpret and systematically restructure all this information is also obvious when one considers the use that can be made of all this decision-making material in a translation perspective. The range of acceptable combinations of related items above is precisely the memory-jogging source of inspiration one is seeking when writing a text, especially in a foreign language perspective and the availability of the dictionary in machine-readable form is the prerequisite for making this information both explicit and readily accessible. The systematic enrichment of the 70,000 contextual partners contained in the dictionary could not have been carried out on the basis of a full-text retrieval system such as


WordCruncher, however, and a drastic reorganization of the lexical data was necessary in order to implement the updating and querying functionalities required by the applications I had in mind. I now wish to turn to a description of the structure of this lexical database and of the programs I implemented to assign lexical functions whenever possible.
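The kind of querying functionality alluded to here, where the base, the lexical function, the part of speech or the translation can be used separately or in combination, can be sketched on a handful of the sail records quoted in this section. Everything about the representation is hypothetical: the `Record` fields, the `query` helper and the part-of-speech labels are assumptions made for the illustration, and records for which the text gives no LF carry `None`.

```python
from typing import NamedTuple, Optional

class Record(NamedTuple):
    base: str
    collocator: str
    pos: str            # part-of-speech labels here are illustrative assumptions
    lf: Optional[str]   # None when no lexical function was assigned
    translation: str

RECORDS = [
    Record("sail", "set", "n", "Mult", "jeu"),
    Record("sail", "flap", "v", "Son", "claquer"),
    Record("sail", "flap", "n", "S0Son", "claquement"),
    Record("sail", "haul down", "vt", "AntiReal1", "affaler"),
    Record("sail", "lower", "vt", "AntiReal1", "abaisser"),
    Record("sail", "strike", "vt", "AntiReal1", "amener"),
    Record("sail", "belly", "n", "Part", "creux"),
    Record("sail", "gore", "n", "Part", "pointe"),
    Record("sail", "reef", "vt", None, "prendre un ris dans"),
    Record("sail", "trim", "vt", None, "gréer"),
]

def query(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [r for r in records
            if all(getattr(r, field) == value for field, value in criteria.items())]

lowering = [r.collocator for r in query(RECORDS, base="sail", lf="AntiReal1")]
by_translation = [r.collocator for r in query(RECORDS, translation="claquement")]
unlabelled = [r.collocator for r in query(RECORDS, base="sail", lf=None)]
print(lowering, by_translation, unlabelled)
```

Keeping the unlabelled pairs in the same structure reflects the decision, stated above, not to force an LF onto every collocation.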

6.6 A relational database

I pointed out above that the CR file as originally delivered to us by the publishers is not a computerized dictionary as defined by Michiels (1982), but rather a machine-readable dictionary whose form is very close to that of the printed book. Unlike the LDOCE file, the CR file does not display any other structure than what was necessary to drive the typesetting process. This implies that the various types of information - we will call them fields - i.e. part-of-speech, headword, translation equivalents, selection restrictions, etc., are not formalized and can only be identified on the basis of typographical criteria (e.g. the presence of a code signalling the beginning of italics or boldface - see also Michiels 1983a for further details). To give the reader an idea of the complexity of the task Jacques Jansen was faced with in converting the typesetting file into a database, I reproduce an extract of the CR magnetic tape illustrating the entries abandon and abandoned (quoted from Michiels 1983a:8):

>u1< abandon >u155< [ >u11< >u18< b >u43< nd >u11< n ] >u2< 1 >u6< vt >u2< (a) >u8< >u6< forsake >u9< >u6< person >u5< abandonner, quitter, de>u129u8< >u6< fig >u9< >u4< to >u40< o.s. to >u5< se livrer a>u128< . S'abandonner a>u128< , se laisser aller a>u128< , >u7< >u3< (b) >u8< >u6< Jur etc >u5< : >u6< give up >u9< >u6< property, right >u5< renoncer a>u128< ; >u6< action >u5< se de>u129u7< >u3< (c) >u5< faire (acte de) de>u129u7< >u3< 2 >u6< n >u8< U >u9< >u5< laisser-aller >u6< m >u5< , abandon >u6< m >u5< , rela>u132u6< m. >u4< with (gay) >u40< >u5< avec (une belle) de>u129u7<
>u1< abandoned >u155< [ >u11< >u18< b >u43< nd >u11< nd ] >u6< adj >u2< (a) >u8< forsaken >u9< >u6< person >u5< abandonne>u129< , de>u129u129< ; >u6< place >u5< abandonne>u129< . >u2< (b) >u8< dissolute >u9< de>u129u129< , >u7<

The complexity of the sample above clearly demonstrates that the exploitation of the CR file entailed prior identification and interpretation of the numerous structural markers such as typesetting codes or font change controls (>u6< is used to switch to italics), fields and record delimiters (>u1< introduces a new entry) or special characters (>u129< corresponds to an acute accent and >u132< designates a circumflex accent). Boguraev (1991) discusses a dictionary entry parser aimed at converting a series of MRDs to a common LDB format along the lines suggested here and presents some of the problems encountered in analyzing dictionary data.
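To make the nature of this conversion task more concrete, here is a minimal sketch of how typesetting codes of the form >uN< might be tokenized. It is not the conversion program used for the CR tape. Only the four code meanings stated above are relied on (1 = new entry, 6 = italics, 129 = acute accent, 132 = circumflex); all other codes are treated as opaque, the function names are hypothetical, and the input is a simplified fragment of the abandoned entry from which the damaged stretches of the extract have been left out.

```python
import re
import unicodedata

CODE = re.compile(r">u(\d+)<")
# Accent codes named in the text, mapped to Unicode combining characters.
ACCENTS = {129: "\u0301", 132: "\u0302"}  # acute, circumflex
NEW_ENTRY, ITALICS = 1, 6                 # meanings stated in the text

def decode_accents(raw):
    """Replace accent codes by combining marks and compose the result."""
    def replace(match):
        return ACCENTS.get(int(match.group(1)), match.group(0))
    return unicodedata.normalize("NFC", CODE.sub(replace, raw))

def segments(raw):
    """Split a tape fragment into (code, following text) pairs."""
    parts = CODE.split(decode_accents(raw))  # [text, code, text, code, text, ...]
    pairs = iter(parts[1:])
    return [(int(code), text.strip()) for code, text in zip(pairs, pairs)]

# Simplified fragment, adapted from the extract quoted above.
fragment = (">u1< abandoned >u6< adj >u2< (a) >u8< forsaken >u9< "
            ">u6< person >u5< abandonne>u129< ; >u6< place >u5< abandonne>u129< .")

parsed = segments(fragment)
headword = next(text for code, text in parsed if code == NEW_ENTRY)
italic_runs = [text for code, text in parsed if code == ITALICS and text]
print(headword, italic_runs)
```

Even on this small fragment the italic runs mix a part of speech (adj) with selection restrictions (person, place), which is precisely why typographical criteria alone do not identify the functional nature of a field.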

In the preceding sections, I pointed out a number of drawbacks associated with the use of a full-text retrieval system such as WordCruncher. Let me briefly summarize the reasons why this tool was felt to be unable to meet our needs and totally inadequate in several respects.

1. Although access is very fast, only full indexing of strings of characters is possible. Since the various fields are not formalized, i.e. there is no precise identification of the functional nature of a sequence of characters, it is for example not possible to figure out whether the italicized string prep is part of a grammatical comment (e.g. him pers pron (after prep etc)) or refers to the part of speech of the headword. Moreover, in an italicized sequence of words, one pair of opening and closing square brackets or parentheses frequently surrounds several items, for reasons of economy, which means that it is impossible to achieve a sufficient level of detail in the queries without determining the functional nature of the italicized expressions.

2. Customizing functionalities in standard text retrieval systems do not offer the possibility of selecting the information one wishes to display or export. This all-or-nothing policy does not allow a linguist to display and export only selected fields such as the headword and its translations, for example, or the headword and its typical subjects only, neglecting POS information and translations.

3. Automatic checking facilities are badly needed in order to avoid the errors and inconsistencies which are so common in dictionary making. Such facilities presuppose that the retrieval system is sensitive to the functional nature of the information items appearing in an entry. Checking the consistency of POS labels or of subject field codes presupposes that one can list them for further analysis, which can only be done once the dictionary is available in computerized form.

4. Full-text retrieval systems do not offer updating facilities. Yet, it should be borne in mind that the main bulk of the research reported on in this book has consisted in adding lexical-semantic information in the form of LFs or LF-like labels. This implied that a new field had to be created to house this information, besides other types of additional material to which I shall return below. Moreover, for reasons that will become clear in the remaining part of this chapter, the metalinguistic apparatus of the dictionary needed to be slightly modified and the adjustments could not have been carried out had the dictionary not been available in database form.

For all these reasons, it was decided to embark on the construction of a lexical database in which the various types of information would be structured and formalized. The relational model enabled me to make full use of the retrieval and indexing facilities of standard relational database management systems (RDBMS). Since other monolingual dictionaries are already available in the same type of format in the English department, this would also enable the Liège team to combine a monolingual dictionary and a bilingual dictionary with a view to carrying out experiments in word sense disambiguation and translation selection in an MT perspective (see Michiels 1996). As pointed out by Michiels (1995b), some criticism has been levelled against the use of the relational database model for the implementation of lexical databases (see the unpublished reports of the EUROTRA-7 study or, more recently, Boguraev et al. 1992 and Ide et al. 1994). Arguments countering this criticism are put forward by Michiels (1995b:94) as follows:

"It is pointed out that the restrictions of the relational model (fixed number of fields, fixed field length) make it extremely difficult to implement a lexical database in such a model. For instance, lexical items have different numbers of homographs; homographs have different numbers of associated definitions; definitions have different numbers of associated examples; definitions and examples are of varying length, from a single word to a full sentence. However, it is in the very essence of the relational model to work with a series of tables, rather than a single one. Consequently, the fact that one definition has one example, whereas the following has six, is not really a problem if the definition table and the example table are distinct tables related by one or several common fields, as they are in the Liège LDOCE database".

Like LDOCE, the Collins-Robert dictionary is now also in relational database format. The database design and the transformation of the CR tape into dBaseIII+ relational tables were carried out by Jacques Jansen.7 The retrieval software consists of a series of application programs, written in C (by Luc Alexandre) or in Clipper (by myself).8 I also wrote a Clipper program for enriching the database with lexical-semantic labels (see below).

6.6.1 Structure of the database

In this section, I should like to describe the structure of the relational database as designed by Jacques Jansen. As pointed out above, it is in the essence of an rdbms to work with a set of distinct files (called tables) related by one or several common pieces of information (called fields). Ide et al. (1994:288) indicate that a relational database consists of a set of relations between entities and each role in that relation is called an attribute. Conceptually, they note, a relation is a table whose columns correspond to attributes and each row specifies all the values of attributes for a given entity. Working with tables implies that the user has only a fragmented view of the data and a table taken in isolation does not seem to make any sense. It is therefore the task of the application program to reconstruct a coherent view of the data by selecting information in the various tables on the basis of their common fields. In order to avoid duplicating data, the CR information has therefore been split across several tables, containing either source headwords, or part-of-speech information, metalinguistic indicators, French translation equivalents, etc.

An example is in order here to illustrate this fragmentation of the data and to give the reader an idea of what a table looks like. I shall start from the entry for abandon in the printed dictionary to show how redundancy can be avoided.

abandon 1 vt (a) (forsake) person abandonner, quitter, délaisser (b) (Jur etc: give up) property, right renoncer à; action se désister de; (c) (Naut) ship évacuer; (Jur) cargo faire (acte de) délaissement 2 n (U) laisser-aller, abandon

7 dBase is a trademark of Ashton Tate, Inc.
8 Clipper is an enhanced dBase language and is a trademark of Nantucket Corporation.

The following sample records illustrate the first records of the lemma database, called enhead.dbf. This table includes three fields, one for the entry number (ENTRY), one for the homograph number (HMG) and one for the headword proper (HDWORD). As can be noticed, abandon is only cited once here even if the printed entry comprises two main senses, one for the noun and another one for the verb. This information is to be found in the part-of-speech table, called enpos.dbf, in which one clearly sees that entry no. 6 has two main meanings (MNG), the first one comprising three sub-meanings introduced by a small letter (1a, 1b, 1c). It should also be noticed that redistribution of information has taken place since a POS label such as vt, which appears only once in the printed version of the dictionary, actually applies to all three sub-meanings.

Lemma database: enhead.dbf

ENTRY  HMG  HDWORD
1      1    A
1      1    a
1      2    a
2      1    Aachen
3      1    aback
4      1    abacus
5      1    abaft
6      1    abandon
7      1    abandoned
8      1    abandonment

Part-of-speech database: enpos.dbf

ENTRY  HMG  MNG  POS
4      1    1b   n
5      1    1a   adv
5      1    2a   prep
6      1    1a   vt
6      1    1b   vt
6      1    1c   vt
6      1    2a   n
7      1    1a   adj
7      1    1b   adj
8      1    1a   n
9      1    1a   vt
10     1    1a   n

The table called italic.dbf contains all metalinguistic indicators appearing in the dictionary. As can be seen below, each indicator is associated with the entry under which it appears, its homograph number, its meaning number and the translation number to which it applies. One of the most important pieces of information in this database is the field TYP which refers to the nature of the string in italics (the ITWORD field). This 1-character marker can have the following values:

M or m: subject field code (parenthesized and capitalized in the printed version: Econ, St Ex, Anat...);
P or p: the string appears in parentheses (micro-definition, explanation, hyperonym, synonym...);
C or c: the string appears in square brackets ("C" for "crochet" - square bracket - in French); e.g. the deep subject of a verb;
S or s: the string is unbracketed (it can be viewed as appearing at "surface" level, hence the "S"); e.g. the deep object of a verb.

This field is of paramount importance since it can be used to retrieve all the items pertaining to a given subject field (i.e. display all items belonging to the field of Anatomy: TYP = "M" and ITWORD = "Anat"). This field also houses information about the functional nature of a given metalinguistic indicator: to retrieve the verbs which can have the noun ship as typical direct object, one has to establish a relation between the files enhead.dbf, enpos.dbf and italic.dbf, testing on the presence of the string "vt" in the field POS, on the presence of the string "ship" in the field ITWORD and checking that TYP = "S" ("ship" appearing at "surface" level, i.e. unbracketed, in a verbal entry).

Needless to say, all this information was computed automatically and the various problems discussed in the preceding sections have been taken into consideration to make sure that what was implicit in the printed version is now fully explicit (see the problems related to the necessary decompacting and redistribution of the bracketing information I alluded to above - see p. 110). Another thing to note is that the field ITWORD does not necessarily contain single-word strings of characters. A look at the examples below is enough for one to realise that phrasal verbs such as give up can appear in this field, which means that the program written by Jacques Jansen to capture this information has used various types of separators (brackets, full stops, square brackets, commas, etc), with the exception of the blank. The purpose was obviously to make sure that words that belong to the same type of information should be treated similarly (e.g. phrasal verbs, compounds, micro-definitions, etc).

Metalinguistic database: italic.dbf

ENTRY  HMG  MNG  TRN     ORI  USE  GRP  TYP  ITWORD
6      1    1a   010001  H         10   P    forsake
6      1    1a   010001  H         20   S    person
6      1    1a   010002  H         10   P    forsake
6      1    1a   010002  H         20   S    person
6      1    1a   010003  H         10   P    forsake
6      1    1a   010003  H         20   S    person
6      1    1b   010001  H         10   M    Jur
6      1    1b   010001  H         10   M    etc
6      1    1b   010001  H         10   P    give up
6      1    1b   010001  H         20   S    property
6      1    1b   010001  H         20   S    right
6      1    1b   020001  H         10   M    Jur
6      1    1b   020001  H         10   M    etc
6      1    1b   020001  H         10   P    give up
6      1    1b   020001  H         20   S    action
6      1    1c   010001  H         10   M    Naut
6      1    1c   010001  H         20   S    ship
6      1    1c   020001  H         10   M    Jur
6      1    1c   020001  H         20   S    cargo
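The three-table relation described above (verbs that take ship as typical direct object) can be sketched as follows. This is only an illustration: it uses Python's sqlite3 module in place of the dBase/Clipper environment actually used, the SQL schema and the function name verbs_taking_object are assumptions, and the rows are a small subset of the sample records for abandon.

```python
import sqlite3

# Sample rows taken from the abandon entry; the SQL layout is an assumption,
# not the original dBase structure (columns USE, ORI and GRP are left out).
ENHEAD = [("6", "1", "abandon"), ("7", "1", "abandoned")]
ENPOS = [("6", "1", "1a", "vt"), ("6", "1", "1b", "vt"), ("6", "1", "1c", "vt"),
         ("6", "1", "2a", "n"), ("7", "1", "1a", "adj")]
ITALIC = [  # ENTRY, HMG, MNG, TRN, TYP, ITWORD
    ("6", "1", "1a", "010001", "P", "forsake"),
    ("6", "1", "1a", "010001", "S", "person"),
    ("6", "1", "1c", "010001", "M", "Naut"),
    ("6", "1", "1c", "010001", "S", "ship"),
    ("6", "1", "1c", "020001", "M", "Jur"),
    ("6", "1", "1c", "020001", "S", "cargo"),
]


def build_sample_db():
    """Create an in-memory database holding the three sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE enhead (ENTRY TEXT, HMG TEXT, HDWORD TEXT)")
    conn.execute("CREATE TABLE enpos (ENTRY TEXT, HMG TEXT, MNG TEXT, POS TEXT)")
    conn.execute("CREATE TABLE italic (ENTRY TEXT, HMG TEXT, MNG TEXT, "
                 "TRN TEXT, TYP TEXT, ITWORD TEXT)")
    conn.executemany("INSERT INTO enhead VALUES (?, ?, ?)", ENHEAD)
    conn.executemany("INSERT INTO enpos VALUES (?, ?, ?, ?)", ENPOS)
    conn.executemany("INSERT INTO italic VALUES (?, ?, ?, ?, ?, ?)", ITALIC)
    return conn


def verbs_taking_object(conn, noun):
    """Headwords with POS containing 'vt' and an unbracketed (TYP = S) indicator equal to noun."""
    sql = """
        SELECT DISTINCT h.HDWORD
        FROM enhead h
        JOIN enpos p ON p.ENTRY = h.ENTRY AND p.HMG = h.HMG
        JOIN italic i ON i.ENTRY = p.ENTRY AND i.HMG = p.HMG AND i.MNG = p.MNG
        WHERE p.POS LIKE '%vt%' AND UPPER(i.TYP) = 'S' AND i.ITWORD = ?
        ORDER BY h.HDWORD
    """
    return [row[0] for row in conn.execute(sql, (noun,))]


conn = build_sample_db()
print(verbs_taking_object(conn, "ship"))
```

The same pattern, with TYP = "M" and ITWORD = "Anat", would cover the subject field query mentioned above.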

Translations appear in the field TRLINE of the table called frtran.dbf. As can be seen below, translation equivalents, which are separated by commas in the printed version, are considered as separate pieces of information. Abandonner, quitter and délaisser, which are the translation equivalents for the first sub-meaning (MNG = 1a) of the entry for abandon (ENTRY = 6; HMG = 1), are distinguished thanks to the TRN field whose first two digits indicate that they belong to the first group of translations for the relevant sub-meaning (there is only one group of translations for 1a). The last two digits indicate the respective position of these equivalents within a given group. The second sub-meaning (MNG = 1b) comprises two groups of translation equivalents, depending on the metalinguistic indicator they are associated with. TRN = 010001 indicates that renoncer à is the first equivalent in the first group of translations for sub-meaning 1b. TRN = 020001 indicates that se désister de is the first equivalent in the second group of translations for this sub-meaning.

Translation database: frtran.dbf

ENTRY  HMG  MNG  TRN     ORI  USE  L  M  TRLINE
6      1    1a   010001  H         1  1  abandonner
6      1    1a   010002  H         1  1  quitter
6      1    1a   010003  H         1  1  délaisser
6      1    1b   010001  H         1  1  renoncer à
6      1    1b   020001  H         1  1  se désister de
6      1    1c   010001  H         1  1  évacuer
6      1    1c   020001  H         1  1  faire (acte de) délaissement
6      1    2a   010001  H         1  1  laisser-aller
6      1    2a   010002  H         1  1  abandon
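The reading of the TRN field given above can be illustrated with a short sketch. The function name decode_trn is hypothetical; only the first two digits (translation group) and the last two digits (position within the group) are interpreted, since the text does not say what the two middle digits encode for headwords.

```python
def decode_trn(trn):
    """Split a 6-digit TRN value into (group, position).

    Assumption: digits 1-2 give the translation group and digits 5-6 the
    position within that group; digits 3-4 are deliberately left uninterpreted.
    """
    if len(trn) != 6 or not trn.isdigit():
        raise ValueError(f"not a 6-digit TRN value: {trn!r}")
    return int(trn[:2]), int(trn[-2:])


# 'se désister de' is the first equivalent of the second group under 1b.
print(decode_trn("020001"))
# 'délaisser' is the third equivalent of the first group under 1a.
print(decode_trn("010003"))
```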

In this database, and in other tables, the one-byte field ORI indicates whether the record is associated with a headword (ORI = H) or with a compound (ORI = C). In the latter case, the relevant compound will appear in a separate table called encpds.dbf (see below). In the following entry, for instance, the italicized item friend appears in the "compound" section of the entry for life and should in fact be associated with the compound lifelong, a piece of information which is to be found in the field ORI (for "origin" of the information):

life 1 n (...) vie (...) 2 cpd life-and-death struggle combat à mort, lutte désespérée; life jacket gilet de sauvetage; (...) lifelong friend, friendship de toujours (...)

The sample in the table above, chosen for the sake of simplicity, does not make it clear that the program written by Jacques Jansen to build the database has taken care of the redistribution of information when the lexicographer has used separators such as commas or conjunctions in translations. A conjunction such as or can indeed lead to serious misunderstandings when the machine has to interpret translation equivalents which have been truncated for reasons of economy. Consider the following examples:

advertise vt (Comm etc) goods faire de la publicité or de la réclame pour
alphabetize vt mettre en or classer par ordre alphabétique
adult classes classes pour or d'adultes
advance payment paiement anticipé or par anticipation

The translations have been adapted and reconstructed for the user to be able to query the database against a possible translation such as "faire de la réclame pour", "mettre en ordre alphabétique", "classes pour adultes" or "paiement par anticipation".
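To make the reconstruction problem concrete, the sketch below enumerates every way of reading a translation containing or as two alternatives sharing a common prefix and suffix. It is not Jacques Jansen's program, whose internals are not described here; the function name or_expansions and the word-based splitting are assumptions. The point is simply that several candidate readings exist for each string, so a choice has to be made among them.

```python
def or_expansions(phrase):
    """Return all candidate (translation 1, translation 2) pairs for a phrase with 'or'.

    Assumption: the alternatives are the last i words before 'or' and the
    first j words after it; whatever precedes and follows is shared by both.
    """
    left, right = phrase.split(" or ", 1)
    left_words, right_words = left.split(), right.split()
    candidates = []
    for i in range(1, len(left_words) + 1):
        for j in range(1, len(right_words) + 1):
            prefix = left_words[:len(left_words) - i]
            first_alt = left_words[len(left_words) - i:]
            second_alt = right_words[:j]
            suffix = right_words[j:]
            candidates.append((" ".join(prefix + first_alt + suffix),
                               " ".join(prefix + second_alt + suffix)))
    return candidates


# Each candidate pair would be shown to a human operator for selection.
for pair in or_expansions("mettre en or classer par ordre alphabétique"):
    print(pair)
```

The intended pair is always among the candidates for the word-separated examples above, but nothing in the string itself says which one it is. The elided form d'adultes is not covered, because the sketch splits on blanks only.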
The use of the conjunction or as a space-saving device should indeed not obscure the fact that we have two possible translations and that it is extremely hard to recover the various bits and pieces of each translation in an automatic way, because the lexicographer usually relies on the human user's common sense to do so. For the 20,000-odd occurrences of or, an interactive program had to be applied to make the correct choice in reconstructing the intended translations. In the first example above, it is clear for a human reader/user that the first word of the translation only (faire) is to be retained in the second translation (faire de la réclame). For alphabetize, "mettre en" is to be kept and combined with the last two words only ("classer par" has to be deleted), which yields "mettre en ordre alphabétique" vs. "classer par ordre alphabétique". The last two examples feature two words on either side of or but, if we can keep the first, second and fourth words to yield "classes pour adultes", applying the same strategy in the last example would yield the absurd translation "paiement anticipé anticipation". Reorganizing this information purely automatically would have required syntactic, lexical and semantic knowledge which is just what researchers have been trying to extract for a number of years from MRDs and textual corpora. Doing so without manual intervention and with a reasonable rate of success was therefore impossible. This reorganization was carried out by Jacques Jansen thanks to an interactive program and each translation unit now forms a record on its own in the frtran.dbf database (e.g. "faire de la publicité pour" vs. "faire de la réclame pour").

The same kind of restructuring was also necessary in the case of micro-definition patterns in which a preposition is used only once but is valid for a whole range of items, as in the following example:

case n (for watch, pen, necklace) écrin; (for camera, binoculars, umbrella, violin) étui

It is clear that for is used for all the items in italics, and not just for the first one. The redistribution had to be carried out before the implementation of the database proper. The italic.dbf file generated by Jacques Jansen's programs therefore contains information which has been restructured along these lines, which means that the records for the entry corresponding to case in the various tables contain the following items, simplified for the sake of clarity:

Enhead.dbf:  case
pos.dbf:     n
italic.dbf:  for watch; for pen; for necklace; for camera; for binoculars; for umbrella; for violin
frtran.dbf:  écrin; étui
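The redistribution of a shared preposition over a comma-separated list can be sketched as follows. The function name distribute_preposition is hypothetical and the sketch assumes that the first word of the indicator is the element to be repeated. It says nothing about cases where the scope of the first word does not extend to every item, so its output would still need to be checked by hand.

```python
def distribute_preposition(indicator):
    """Repeat the leading word of an indicator before each comma-separated item.

    Assumption: the first blank-delimited word (e.g. 'for') has scope over
    every item in the list.
    """
    head, _, rest = indicator.partition(" ")
    return [f"{head} {item.strip()}" for item in rest.split(",") if item.strip()]


print(distribute_preposition("for watch, pen, necklace"))
print(distribute_preposition("for camera, binoculars, umbrella, violin"))
```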

It is of course the task of the application programs to reconstruct the entries and to make sure that case is translated as étui when the metalinguistic information specifies that for umbrella is the intended explanation. Since all the tables have several common fields used for indexing and pointing purposes, viz. ENTRY, HMG, MNG and TRN, redundancy of information is avoided. It will be noted that this reorganization was necessary for the user to be able to use the numerous defining formulae such as "FOR + NP" as clues to identify relevant lexical-semantic relations. In the present case, it is clear that the relationship linking case and the nouns watch, pen, necklace, etc. is one of typical location, which Mel'čuk formalizes in terms of the Sloc function. Without redistributing the data as shown above, it would have been impossible to locate all italicized sequences introduced by the preposition for with a view to semi-automating the assignment of LFs (see Chapter 7).9

The general structure of the Collins-Robert tables is described below.

Lemma database (source - English - headwords)

Structure for database: enhead.dbf
Number of data records: 32941
Date of last update   : 01/25/95
Field  Field Name  Type       Width  Dec
    1  ENTRY       Character      5
    2  HMG         Character      1
    3  HDWORD      Character     21
** Total **                      28

ENTRY: entry key (one per lexical item, irrespective of the number of homographs)
HMG: homograph number
HDWORD: headword

Database of English compounds

Structure for database: encpds.dbf
Number of data records: 17964
Date of last update   : 01/26/95
Field  Field Name  Type       Width  Dec
    1  ENTRY       Character      5
    2  HMG         Character      1
    3  MNG         Character      2
    4  DFN         Character      2
    5  VAR         Character      2
    6  CPWORD      Character     77
** Total **                      90

MNG (2 characters): 1 character for the subdivision by part of speech (digit), 1 character for the small letter inside the part of speech (letter)
DFN: main definition number (main definitions are separated by ";" in the printed dictionary)
VAR: variant number (distinguished by "/" or "or" in the printed dictionary)
CPWORD: compound word

9 A similar problem arose with the negative particle not, which can be used as a clue to identify antonyms (LF = anti). Consider the following example in which twisted comes within the scope of the negation: straight adj (not curved, twisted etc) line, stick, limb, edge droit; In order to be able to use not as a pointer to a relationship of antonymy between straight and twisted, the same type of redistribution was carried out (not curved, not twisted). It would have been dangerous to fully automate this procedure, however, as is shown in the following example: insecure adj (not firm, badly fixed) bolt, nail, padlock peu solide, qui tient mal. It is clear that the scope of the negation does not extend to badly fixed here and that the relationship between insecure and badly fixed is one of synonymy.

Database of part of speech specifications for source (English) headwords

Structure for database: enpos.dbf
Number of data records: 54503
Date of last update   : 03/11/95
Field  Field Name  Type       Width  Dec
    1  ENTRY       Character      5
    2  HMG         Character      1
    3  MNG         Character      2
    4  POS         Character     14
** Total **                      23

POS: part of speech (e.g. n, adj, vt prep, vt fus, vi...)

Database of French translations

Structure for database: frtran.dbf
Number of data records: 134207
Date of last update   : 01/26/95
Field  Field Name  Type       Width  Dec
    1  ENTRY       Character      5
    2  HMG         Character      1
    3  MNG         Character      2
    4  TRN         Character      6
    5  ORI         Character      1
    6  USE         Character      1
    7  L           Character      1
    8  M           Character      1
    9  TRLINE      Character     48
** Total **                      67

TRN: Translation number
ORI: Origin: H=Headword; C=Compound
USE: refers to the particular use of a word or compound (full information is to be found in the enuse.dbf - see below)
L: counter for translation lines
M: number of lines of the translation
TRLINE: French translation line

It will be noted that the maximum length of a translation line has been limited to 48 characters. This figure is the result of a statistical analysis of the average length of translations. For translations which exceed this number of characters, a new record has been created to house the following 48 characters (49->96), and so on, and it is the task of an application program to reconstruct the full translation on the basis of the fields L and M. These two fields act as counters keeping track of the current translation line and the number of translation lines for a given translation equivalent respectively. (L=1 and M=2: this record corresponds to the first translation line - characters 1 to 48 - of a translation equivalent which is made up of two lines of 48 characters each).

Database of metalinguistic information

Structure for database: italic.dbf
Number of data records: 157071
Date of last update   : 09/07/94
Field  Field Name  Type       Width  Dec
    1  ENTRY       Character      5
    2  HMG         Character      1
    3  MNG         Character      2
    4  TRN         Character      6
    5  ORI         Character      1
    6  USE         Character      1
    7  GRP         Character      2
    8  TYP         Character      1
    9  ITWORD      Character     36
** Total **                      56
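Returning to the L and M counters of frtran.dbf described above, the continuation scheme can be sketched as follows. The function names are hypothetical, and the sketch assumes that the 48-character slices are simply concatenated in the order given by L. How the original programs handled padding or blanks at a slice boundary is not stated in the text.

```python
TRLINE_WIDTH = 48  # maximum length of a translation line, as stated above


def split_translation(text, width=TRLINE_WIDTH):
    """Cut a translation into records carrying L (current line) and M (number of lines)."""
    chunks = [text[i:i + width] for i in range(0, len(text), width)] or [""]
    if len(chunks) > 9:
        raise ValueError("L and M are one-character fields: at most 9 lines")
    total = str(len(chunks))
    return [{"L": str(n), "M": total, "TRLINE": chunk}
            for n, chunk in enumerate(chunks, start=1)]


def reassemble(records):
    """Rebuild the full translation from the records of one translation equivalent."""
    ordered = sorted(records, key=lambda r: int(r["L"]))
    expected = int(ordered[0]["M"])
    if [int(r["L"]) for r in ordered] != list(range(1, expected + 1)):
        raise ValueError("missing or duplicated translation line")
    return "".join(r["TRLINE"] for r in ordered)


# Artificial 100-character string, used only to exercise the round trip.
demo = "x" * 100
records = split_translation(demo)
print(len(records), reassemble(records) == demo)
```

In the actual tables the records to be reassembled would first have to be grouped on ENTRY, HMG, MNG and TRN.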

The very high number of data records (more than 150,000 italicized items, not counting POS which are found in enpos.dbf) can be explained by the presence of subject field codes and collocations, selection restrictions, synonyms, etc. In the framework of this project, I have exclusively paid attention to the latter category of items (i.e. those items whose TYP field is not equal to "M" or "m"), a set of items which comprises over 70,000 records.

Database of example sentences

Structure for database: efexam.dbf
Number of data records: 62607
Date of last update   : 07/30/94
Field  Field Name  Type       Width  Dec
    1  ENTRY       Character      5
    2  HMG         Character      1
    3  MNG         Character      2
    4  L           Character      1
    5  M           Character      1
    6  EXLINE      Character    128
** Total **                     139

EXLINE: contains example lines with a maximum number of 128 characters. The method used to keep track of records whose length is greater than that number is similar to the treatment of translation lines in frtran.dbf. This file is not used in this work because the separation between English examples and their French translations has not been carried out. Exploiting this file of bilingual data fell outside the scope of this book, but it undoubtedly deserves an in-depth treatment which it gets in the framework of the DEFI project. It is for example clear that what the CR lexicographers have called "examples" may contain highly idiosyncratic, totally unpredictable combinations, some of which border on idiom, while others clearly belong to the collocation range. These combinations are usually nested within examples, however, and recovering them requires appropriate tagging and parsing strategies to be computationally exploitable (see Michiels & Dufour 1996).

Database of cross-references

Structure for database: enlink.dbf
Number of data records: 10391
Date of last update   : 07/30/94
Field  Field Name  Type       Width  Dec
    1  ENTRY       Character      5
    2  HMG         Character      1
    3  MNG         Character      2
    4  DFN         Character      2
    5  LINK        Character      1
    6  ORI2        Character      1
    7  ENTRY2      Character      5
    8  HMG2        Character      1
    9  MNG2        Character      2
   10  DFN2        Character      2
   11  RFWORD      Character     40
   12  RFNOTE      Character     35
** Total **                      98

The enlink.dbf database, which is not used in the application programs described below, contains information about the various types of links and cross-references nested in dictionary entries. The field LINK, for example, can have the following values:

=: equivalence
-: cross-reference
A: see also...
U: points to a US equivalent (American English)

ORI2: refers to the nature of the item which is referred to (H: headword; C: compound; X: unknown)
ENTRY2, HMG2, MNG2, DFN2 point to the entry, homograph, meaning and definition number of the item which is referred to.
RFWORD points to the item which is referred to (when the key - ENTRY2, HMG2... - has not been computed).
RFNOTE: optional comment on the link

Database of usage contexts

Structure for database: enuse.dbf
Number of data records: 6862
Date of last update   : 01/26/95
Field  Field Name  Type       Width  Dec
    1  ENTRY       Character      5
    2  HMG         Character      1
    3  MNG         Character      2
    4  TRN         Character      6
    5  CTWORD      Character     69
** Total **                      84

The enuse.dbf database contains information about a particular use of the word or compound. The field CTWORD will refer to the form of the headword in a given context. For instance, if a certain translation applies to the plural form of the headword only (e.g. abilities under ability), this inflected form will be specified in this field.

Database of registers/style labels

Structure for database: frmark.dbf
Number of data records: 4281
Date of last update   : 02/01/95
Field  Field Name  Type       Width  Dec
    1  ENTRY       Character      5
    2  HMG         Character      1
    3  MNG         Character      2
    4  TRN         Character      6
    5  TRNOTE      Character     15
** Total **                      30

TRNOTE: this field contains information about register or style labels of the French translation. In the printed version of the dictionary, these labels are usually represented by one, two or three asterisks, one indicating that the translation is informal, two that it should be handled with extreme care and three that the word is offensive or taboo. Other symbols are also used to indicate archaic or obsolete words. In the database, such symbols have been replaced by more telling labels (e.g. informal, offensive, obsolete, etc.).

In order to set relations between the various tables, the tables have been indexed as follows:

enhead.dbf: indexed on entry+hmg
frtran.dbf: indexed on entry+hmg+mng+trn
enpos.dbf: indexed on entry+hmg+mng
italic.dbf: entry+hmg+mng+trn
encpds: entry+hmg+mng+substr(trn,1,4) (= the 4 leftmost characters of the TRN field)
enuse: entry+hmg+mng+trn
frmark: entry+hmg+mng+trn
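The index expressions listed above concatenate fixed-width character fields into a single key. A rough Python analogue is sketched below, with the field widths taken from the structures given earlier. The helper names make_key and build_index are hypothetical, and padding values on the right is an assumption, since the text does not say how ENTRY values are justified within their five characters.

```python
KEY_WIDTHS = {"ENTRY": 5, "HMG": 1, "MNG": 2, "TRN": 6}
FULL_KEY = ("ENTRY", "HMG", "MNG", "TRN")


def make_key(record, fields=FULL_KEY):
    """Concatenate the key fields, each padded to its declared width (assumed left-aligned)."""
    return "".join(str(record[f]).ljust(KEY_WIDTHS[f]) for f in fields)


def build_index(records, fields=FULL_KEY):
    """Map each composite key to the list of records sharing it."""
    index = {}
    for record in records:
        index.setdefault(make_key(record, fields), []).append(record)
    return index


# Two frtran.dbf records and one italic.dbf record from the abandon entry.
FRTRAN = [
    {"ENTRY": "6", "HMG": "1", "MNG": "1b", "TRN": "010001", "TRLINE": "renoncer à"},
    {"ENTRY": "6", "HMG": "1", "MNG": "1b", "TRN": "020001", "TRLINE": "se désister de"},
]
ITALIC_ROW = {"ENTRY": "6", "HMG": "1", "MNG": "1b", "TRN": "020001",
              "TYP": "S", "ITWORD": "action"}

frtran_index = build_index(FRTRAN)
matches = [r["TRLINE"] for r in frtran_index.get(make_key(ITALIC_ROW), [])]
print(matches)
```

Looking up the key of the indicator action in the translation index retrieves the translation it is associated with, which is the kind of pointing the common fields are meant to support.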

6.6.2 Modifying the metalinguistic apparatus

Before moving to a description of the programs which enabled me to enrich the database with LF-like labels, I should like to stress the fact that the file italic.dbf underwent some adjustments that had to be carried out manually. In some cases, the contents of the field TYP in the italic.dbf database had to be modified because the lexicographer had not respected the guidelines concerning the typographical features of the metalinguistic apparatus. In the following example, the TYP field of the italicized items metal, gilded frame, mirror, reputation and memory was assigned the value "C" because these items were misrepresented in square brackets in the dictionary:

tarnish 1 vt [metal] ternir; [gilded frame etc] dédorer; [mirror] désargenter; (fig) [reputation, memory] ternir. 2 vi se ternir; se dédorer; se désargenter.

The italicized collocations are obviously direct objects in the first meaning of the entry and should therefore have appeared at surface level (TYP="S"), i.e. unbracketed. Such mistakes were of course rectified, as they are in the more recent editions of the dictionary, before embarking on the systematic lexical-semantic labelling of the contextual pairs of items.

More importantly, a series of adjustments also needed to be carried out to ensure consistency in the database and to make it more conducive to NLP applications. A given metalinguistic item was for instance likely to appear in the plural under certain entries and in the singular under others. In some cases, this was lexicographically justified but would have hampered any recognition of the link between the entries, had the adjustments not been carried out. Consider the following entries which feature occurrences of the nouns mouse and bird in the singular:

cheep n [bird] piaulement; [mouse] couinement
hole n (... for mouse; also Golf) trou
peep vi [bird] pépier, piauler; [mouse] pousser de petits cris aigus

Compare with the following entries in which the plural forms mice and birds are used:

brood n [birds] couvée, nichée; [mice] nichée; [children] progéniture, nichée (hum); [vipers, scoundrels] engeance
eat away vt sep [sea] saper, éroder; [acid, mice] ronger
nest n [birds, mice, turtles, ants etc] nid

Although it makes sense for the lexicographer to use a plural form when describing a set relation (LF = Mult) between bird/mouse and brood or a locative relation (LF = Sloc) with nest, it must be recalled that, in a word sense discrimination perspective based on the identification of collocations and genus terms such as described in Michiels (1996), any NLP program needs to be able to access information via the lemmatised and not the inflected forms. In order to be able to fully exploit the contents of the database, it was therefore necessary to lemmatise the italicized forms to remove inconsistencies as in:

double vt number doubler; salary, price doubler, augmenter du double
double vi [prices, incomes, quantity etc] doubler

Lemmatization of plural prices was a pre-requisite for the identification of the link between the causative (vt) and the inchoative (vi) uses of double (see Chapter 12 for details about the retrieval of causative/inchoative, a.k.a. ergative, verbs). Moreover, it would not make sense for a user who wishes to explore the combinatory potential of the noun price to have to specify that he is interested in retrieving all the occurrences of price or prices in italics (114 and 89 occurrences respectively).

Another type of adjustment to the metalinguistic apparatus I had to carry out at several places concerned the deletion of determiners in sequences of italicized words. The following table illustrates a few of the numerous cases where the presence of a determiner, however justified from a lexicographical point of view, would have been an obstacle to the formalization and computational use of selection restrictions.

English headword (HDWORD)  TYP  Metalinguistic sequence of items (ITWORD)
brood on                   S    the past
divine                     S    the future
expound                    s    the Bible
recover                    c    the economy
strain                     s    the economy
peerage                    P    the peers
shear through              s    the waves
shear through              s    the crowd
preach                     s    the Gospel
wipe out                   s    the past
associate                  c    a society
contraction                c    a habit
exercise                   s    a right
fasten                     s    a crime
further                    s    a cause
proffer                    s    a remark
shrug off                  s    a cold
tell                       s    a lie

Clearly, the use of a determiner is justified because it points to the meaning of the metalinguistic indicator (cold under shrug off is used in the medical and not in the meteorological sense). For retrieval purposes, however, it would not be natural to ask the user to formulate a query such as "List the transitive verbs which correspond to the Liqu function and which can have a cold as direct object". Furthermore, the presence of a given determiner should not obscure the fact that the collocate may, in some cases, appear with another determiner or with the zero article as well (to tell a lie is definitely a well-formed collocation but to tell lies is equally correct). Since word sense assignment in NLP systems usually relies on the computation of the heads of the deep grammatical arguments of a predicate, it has been necessary to revise the whole italic.dbf file to edit the field ITWORD accordingly.

A final category of adjustments relates to cases where a given italicized item is pre- or post-modified to convey subtle nuances. Such nuances draw on the human reader's common sense, however, and are hardly formalizable at all, as is clear in the examples below:

fence vt question éluder
fend off vt awkward question écarter, éluder

The relationship between fence and fend off is undoubtedly one of synonymy, at least in the vicinity of the noun question. Yet, this relationship can only be implicit because the two selection restrictions are formally different. The presence of the adjective awkward certainly contributes to refining the description of the collocational environment of fend off but leaving the sequence of metalinguistic items as is would prevent any NLP system from selecting the appropriate French target translation in the following (real) sentence quoted from Cobuild:

It was the first time Daniel had spoken, except to fend off questions.

Again, I edited such cases manually to make sure that only the head was preserved and could be used by the retrieval program described at the end of this chapter. Many other changes were also carried out systematically in the italic.dbf database but they will be dealt with elsewhere because they deserve a whole chapter to themselves. For the most part, these other changes relate to the use that has been made of the numerous definition patterns found in the dictionary's micro-definitions (see Chapter 7).
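The adjustments described in this section were carried out by hand. Purely as an illustration of the simplest of them, the deletion of a leading determiner from an ITWORD sequence could be sketched as below. The function name and the list of determiners are assumptions, and the sketch does not attempt lemmatisation or the identification of the head in sequences such as awkward question, which call for lexical knowledge.

```python
DETERMINERS = ("the", "a", "an")  # assumed list, for illustration only


def strip_determiner(itword):
    """Drop a leading determiner from a multi-word metalinguistic sequence."""
    words = itword.split()
    if len(words) > 1 and words[0].lower() in DETERMINERS:
        return " ".join(words[1:])
    return itword


for sequence in ("a cold", "the economy", "ship", "awkward question"):
    print(sequence, "->", strip_determiner(sequence))
```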

6.6.3 Enriching the database with lexical-semantic information

In order to enrich the database with lexical-semantic labels à la Mel'čuk, I wrote an application program in Clipper whose function was to create an additional table containing the originally italicized metalinguistic item with the lexical function which relates it to the headword under which it appears. Since this book is concerned with paradigmatic and syntagmatic relations, the program was designed so as to exclude the numerous subject field codes contained in the italic.dbf database. This was done by setting a filter on the field TYP, excluding cases where TYP = "M" (see above). The program, called lftool.exe, creates a distinct table, which I modestly called mybase.dbf and which shares several common fields with the other tables (see structure below). Mybase.dbf bears a strong resemblance to the table italic.dbf and can actually be considered as a subset of it but, as can be seen below, it contains two additional fields:

Structure for database: mybase.dbf
Number of data records: 71355
Date of last update   : 08/05/95
Field  Field Name  Type       Width  Dec
    1  ENTRY       Character      5
    2  HMG         Character      1
    3  MNG         Character      2
    4  TRN         Character      6
    5  TYP         Character      1
    6  ORI         Character      1
    7  ITWORD      Character     36
    8  TRADIT      Character     25
    9  LEXFUNC     Character     25
   10  USE         Character      1
** Total **                     104
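The filtering step can be pictured with the following sketch, which is not the Clipper code of lftool.exe but a Python approximation under stated assumptions: records are represented as dictionaries, the function name build_mybase is hypothetical, and the two additional fields TRADIT and LEXFUNC are created empty.

```python
COPIED_FIELDS = ("ENTRY", "HMG", "MNG", "TRN", "TYP", "ORI", "ITWORD", "USE")


def build_mybase(italic_records):
    """Copy every record that is not a subject field code and add two empty fields."""
    mybase = []
    for record in italic_records:
        if record["TYP"].upper() == "M":  # subject field codes are excluded
            continue
        new_record = {field: record[field] for field in COPIED_FIELDS}
        new_record["TRADIT"] = ""   # empty at creation time
        new_record["LEXFUNC"] = ""  # empty at creation time
        mybase.append(new_record)
    return mybase


# Two italic.dbf records from the abandon entry (sub-meaning 1c).
sample = [
    {"ENTRY": "6", "HMG": "1", "MNG": "1c", "TRN": "010001", "ORI": "H",
     "USE": "", "GRP": "10", "TYP": "M", "ITWORD": "Naut"},
    {"ENTRY": "6", "HMG": "1", "MNG": "1c", "TRN": "010001", "ORI": "H",
     "USE": "", "GRP": "20", "TYP": "S", "ITWORD": "ship"},
]
print(build_mybase(sample))
```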

While the majority of the fields above have been copied automatically from the italic.dbf database, the eighth and ninth fields, viz. TRADIT and LEXFUNC, exclusively contain information which was added manually and interactively. TRADIT contains the French translation of the metalinguistic item and LEXFUNC houses the lexical function linking the ITWORD (originally italicized item) and the headword (HDWORD) specified in the enhead.dbf database. One might justifiably wonder why it has been felt necessary to indicate the French translation of the metalinguistic items. It should be remembered, however, that the situation with these items is very similar to the treatment of the defining vocabulary in a monolingual

dictionary. I pointed out in the introduction to this book that computational linguists and lexicographers are faced with the difficulty of disambiguating the language of definitions (see the example of pen on p. 11). The analysis of the lists of words occurring as typical subjects, objects or head nouns in italics in CR reveals that they are most often used in their basic, prototypical meaning. Yet, there are cases where a given polysemous word is used by the lexicographer in its various senses. Obviously, this has strong implications for the study of the collocational environment of a given lexical item. Consider for example the list of verbs associated with the French noun prix in italics in the French-English part of the dictionary.10 The noun prix is indeed ambiguous and can refer to either Eng. price or prize. Since the verbs that collocate with this noun depend on its meaning, the resulting list of collocating verbs should be split into two sublists:

prix1 (= prize): attribuer, avoir, décerner, décrocher, donner, emporter, remporter ...
prix2 (= price): augmenter, baisser, débloquer, dégringoler, diminuer, s'effondrer, geler, grimper, majorer, plafonner, réduire ...
(sample lists extracted from the F-E part)

This means that extreme caution should be exercised in extracting collocations since word meaning obviously plays an important part in the use that is made of collocations. To give one other example from the English-French part of the dictionary, the item mean is used 12 times as a metalinguistic indicator but the twelve occurrences illustrate four different meanings. Consider the following entries:

drive at vt fus (fig: intend, mean) en venir à, vouloir dire...
import 3 vt b (mean, imply) signifier, vouloir dire
literally adv (...); mean littéralement, au sens propre
low-down 1 adj (mean) action bas (f basse), honteux, méprisable
small 1 adj (...); (pej: morally mean) person, mind petit, bas (f basse), mesquin
near 3 adj (e) (: mean) radin*, pingre
medium 1 b (mean) milieu m. the happy ~ le juste milieu.

Mean is clearly used in the sense of 'signifier' in the first three examples and functions as a verb. In the entries for low-down and small, however, it is used as an adjective in the sense of 'méchant' while it means 'avare' under near. The last example illustrates the use of mean as a noun for 'moyenne' in French.

10

Until recently, the French-English part of the dictionary was only available in a format usable by the WordCruncher retrieval system. Unlike the English-French part, it has not been transformed into a full database enriched with lexical functions because the circumstances under which this project originated meant that the emphasis was laid on the acquisition of English collocations. It goes without saying, however, that the method which is suggested here can also be applied to the French-English part of the dictionary to construct a lexical-semantic network around French collocations. Since French lexicography has not given birth to commercially-available dictionaries that would provide as rich a source of lexical information as some English learner's dictionaries do, the possibility of extracting such knowledge for French should certainly not be dismissed. The French-English part of the dictionary is currently used in the framework of the DEFI project in Liège in order to link French texts to bilingual lexical databases in a translation selection and word sense disambiguation perspective (Michiels 1996, Dister & Bourseau 1997, Michiels & Dufour 1996).


To provide partial disambiguation, I therefore decided to enrich the database with the translation of the metalinguistic indicator. The availability of this translation, housed in the field TRADIT, offers the additional advantage that the database now contains information about the collocators and the bases for both languages while, in the original version, only English bases, English collocators and French collocators could be accessed. To give but one example based on the preceding section, a user who wishes to get information about the collocational environment of the French noun soupçon can now access this information directly by querying the database against the presence of soupçon in the TRADIT field, without having to resort to the English equivalent, namely suspicion. Since it was impossible to automate the translation process, the French equivalent was provided manually for the 70,000-odd metalinguistic items. To facilitate the encoding task and spare the coder the trouble of having to keyboard the word soupçon for each of the 25 occurrences of the italicized item suspicion, the program I wrote was designed so as to ask the coder to supply the translation of the first occurrence of the metalinguistic item only. The TRADIT field of the remaining occurrences was then filled automatically by copying this translation equivalent. Only in the case of polysemous items, as for mean above, was the coder required to manually change the translation. To populate mybase.dbf with the information we are interested in, the program takes the user through a series of pull-down menus. These menus provide him with a list of potential lexical functions and the appropriate LF for a given pair of items can be selected simply by moving the arrow keys and placing the cursor on the desired function and pressing ENTER. The advantage of this method is twofold: 1. 
By providing a pre-determined list of LFs, the program spares the coder the need to type in the lexical-semantic label, which dramatically reduces the risk of typos or inconsistencies.
2. The lists of LFs the user is provided with are context-sensitive, which means that they vary as a function of the nature of the items the LF is supposed to relate. It is clear that the lexical functions which link two nouns are not the same as those which can link a noun and the intransitive verbs with which it can collocate. The menus were therefore designed so as to present the user with lists of potential LFs selected as a function of the part of speech of the headword and the typographical information contained in the field TYP (e.g. the menus for a metalinguistic item appearing between square brackets - TYP = "C" - in an entry whose POS = "vi" or "vt" would not contain any reference to the Caus function, but would contain LFs including the Incep function such as IncepPredPlus, IncepPredMinus, IncepFunc0, etc.).

To clarify the situation and give the reader an idea of what the coder could see on the screen when coding the tens of thousands of contextual pairs of items, I reproduce a typical situation in which the user is requested to answer a question about the collocational environment of the noun suspicion:


Item in italics (itword):        [ suspicion ]   [Type: S]
Headword:                        [ entertain ]   [POS: vt]
French equivalent of the base:   [ soupçon ]
Lexical function:                [ ]
Translation of headword:         [ nourrir ]

Do you wish to edit the translation of the itword? Enter Y or N
Do you wish to edit the lexical function? Enter Y or N

Choose the lexical function from the following list by pressing ENTER

FinOper1     Oper1         Real2
FinReal1     Oper2         Telic
FinReal2     Oper3
IncepOper1   PermFact0
Involv       PermManif
Liqu         PermOper2
MagnOper1    PreparFact0
Nocer        Real1

Not in the list above: enter the code of the function

Empty verb
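The context-sensitive menus can be pictured as a lookup table keyed by the part of speech of the headword and the TYP code. The following Python sketch is only an illustration of that idea, not a transcription of the Clipper program: the table MENUS, the function names and the short lists of LFs are assumptions, and the real menus were considerably longer.

```python
# Hypothetical menu table keyed by (part of speech of the headword, TYP code).
# The lists are short illustrative subsets, not the program's actual menus.
MENUS = {
    ("vt", "S"): ["incepoper1", "liqu", "oper1", "oper2", "real1", "telic"],
    ("vt", "C"): ["inceppredplus", "inceppredminus", "incepfunc0"],
    ("vi", "C"): ["inceppredplus", "inceppredminus", "incepfunc0"],
}

def candidate_lfs(pos, typ):
    """Return the LFs offered for this kind of contextual pair."""
    return MENUS.get((pos, typ), [])

def choose_lf(pos, typ, picked=None, typed=None):
    """Return the LF picked from the menu, or a code typed in by hand.

    An empty string is returned when the coder supplies nothing, which
    corresponds to leaving the LEXFUNC field blank."""
    menu = candidate_lfs(pos, typ)
    if picked is not None and picked in menu:
        return picked
    if typed:
        return typed.strip().lower()  # codes are stored in lower case
    return ""

# The pair suspicion / entertain (POS = vt, TYP = S) is coded as oper1.
lf_for_entertain = choose_lf("vt", "S", picked="oper1")
```

Restricting the choice to a menu is what removes most spelling variants from the LEXFUNC field; the free-typed fallback keeps the lists from having to be exhaustive.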

For each of the 71,355 metalinguistic items, such a screen was displayed to facilitate the coding of the LF. Of course, the context-sensitive lists of LFs were not necessarily exhaustive, which is why the coder was offered the possibility of keyboarding the function as well.11 It should be noted that the last line contains on-line help in the form of a short note reminding the coder of the broad meaning of the lexical function highlighted by the cursor. In the screen above, it is clear that the LF which relates the base suspicion and the verbal collocator entertain is Oper1. The short notes indicate that, for example, Oper1 stands for an empty (support) verb12, Liqu refers to "Liquidate" verbs or PermManif roughly means "to allow to be obvious or visible". At the risk of repeating myself, I should like to stress the fact that lexical functions have only been assigned when the link between the metalinguistic item and the headword corresponded to a standard paradigmatic or syntagmatic LF. No attempt has therefore been made to 'force' the data into a theoretical mould, which means that the field LEXFUNC has been left blank in a sizeable number of cases (18,687, to be precise). In addition to the

11 The form of the lexical functions in the Liège database slightly departs from standard practice insofar as the Liège LFs do not make use of capitals and sub- or superscripts, unlike Mel'čuk's functions. An LF such as IncepOper1 is therefore coded as incepoper1 to facilitate retrieval procedures.

12 As already stated in the general chapter devoted to the ECD and lexical functions, support verbs are not totally empty but should rather be viewed as semantically impoverished verbs. For reasons of concision and space in the Help file, the expression "empty verb" has been kept for the Oper and Func LFs, since the primary purpose of these short notes is to jog the coder's memory about what the functions stand for.

standard list which can be found in section 5.3.4.5, I have included a number of lexical-semantic relations which are widely attested in the CR dictionary. The motivation for the inclusion of such functions can be found in Chapter 8 (see for example the paradigmatic function Spec below, which can be seen as the inverse function of Gener, and Part, which denotes part-whole relations). It may be interesting to have a look at a breakdown of the LF figures as they appear in the mybase.dbf database. The combinatory nature of LFs is responsible for the large number of complex lexical functions in the database, which means that it would be useless to list the several hundred different LFs it contains. The figures for the most common relations (appearing more than 100 times in the database) are revealing, however, and testify to the breadth of analysis and the coverage of a resource such as the Collins-Robert dictionary.

Number of records in mybase.dbf: 71,355
No LF in 18,687 cases

Count  LF                Count  LF
18687  (no LF)             240  causmanif
15586  syn                 235  antireal1
 4165  spec                226  causfunc0
 2150  part                223  causobstr
 2004  magn                221  finfunc0
 1643  qual0               215  pos
 1265  liqu                210  real2
 1193  real1               206  causfact0
  808  antiver             197  pred
  698  antibon             193  inceppredminus
  635  antimagn            180  inceppredplus
  601  ver                 176  a0degrad
  569  preparfact0         164  sres
  564  mult                158  antipos1
  458  sinstr              156  nocer
  453  sing                156  magnbon
  414  oper1               152  s0real1
  358  sloc                149  bon+pos
  354  bon                 145  caus
  330  son                 141  finfact0
  310  anti                138  pos1
  308  causpredplus        132  telic
  306  s0son               128  incepoper1
  288  causpredminus       123  incepfact0
  284  causdegrad          120  s0liqu
  281  antipos             111  causpredantiver
  278  fact0               107  involv
  275  degrad              106  func0
  255  s1
  246  s0magn
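A breakdown of this kind amounts to counting the values of the LEXFUNC field and keeping those above a threshold. The Python sketch below shows one way of doing so on a toy list of records; the function name lf_breakdown and the sample data are assumptions, and the original figures were of course obtained from the database itself. The two totals quoted above also imply that 71,355 - 18,687 = 52,668 records carry a lexical function.

```python
from collections import Counter

def lf_breakdown(records, min_count=100):
    """Count LEXFUNC values and keep those occurring more than min_count
    times, most frequent first. A blank value stands for 'no LF'."""
    counts = Counter(r["LEXFUNC"] for r in records)
    return [(lf, n) for lf, n in counts.most_common() if n > min_count]

# Toy data, invented for illustration (the threshold is lowered accordingly).
toy_records = (
    [{"LEXFUNC": "syn"}] * 3 + [{"LEXFUNC": "spec"}] * 2 + [{"LEXFUNC": ""}]
)
toy_breakdown = lf_breakdown(toy_records, min_count=1)

# Figures quoted in the text.
total_records = 71355
without_lf = 18687
with_lf = total_records - without_lf  # records that carry a lexical function
```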

It is clear that the program I wrote to assign the LF labels and the translation equivalents of the metalinguistic items could only ensure a reduction in the number of typographical inconsistencies by offering automatic checking facilities. The linguistic decisions concerning the choice of LFs have been impossible to automate, however, except for a number of cases where a definition pattern could be used as a clue to assign a specific relation. In most cases, the semantic interpretation was then carried out manually, drawing on the descriptive power of Mel'čuk's lexical functions. In a database of more than 70,000 items, this obviously entails a number of inconsistencies which are unavoidable in a lexicographical project that took more than 2 years to complete. The vagueness of some LFs and the absence of linguistic tests to differentiate potentially ambiguous or conflicting LFs are partly responsible for such inconsistencies. In Chapter 9, the problems related to the difficulty I had in assigning some LFs are discussed at greater length and a few proposals are made to refine the descriptive nature of Mel'čuk's model. As I argue in this chapter, however, these tentative suggestions would deserve a more extensive treatment in the framework of a theoretical study of the "grammar" of lexical functions such as envisaged by Alonso Ramos & Tutin (1996).

6.6.4 Retrieving information from the database: application programs

The lftool.exe program described above was written to assist in the coding of the lexical-semantic relations in the database. After populating the mybase.dbf table with LFs and the French translations of the metalinguistic items, I was left with another table which had to be related with the remaining tables described in 6.6.1. In order to access information from various angles, a number of application programs were written by Luc Alexandre and myself. In this section, I should like to concentrate on the description of the functionalities of these retrieval programs. The Clipper program I implemented, called revlfex.exe, is specifically designed to enable the user to browse through the dictionary database and to access and retrieve information via the metalinguistic element (the base of the collocations) or via the lexical functions. The user is taken through a series of menus such as the following:

Do you wish to access information via:
1. the base (i.e. the italicized metalinguistic item in the printed CR)
2. the lexical function
Enter your choice: 1 or 2

The user is then asked whether he or she wishes to view each contextual pair separately (with the French translations, the lexical function, the headword, the metalinguistic item...) or to export the network to an ASCII file. The following example illustrates the ASCII file which can be generated when one queries the database against the item brake.
part ( brake / frein ) = drag (sabot de frein) [P]
degrad ( brake / frein ) = drag (frotter) [C]
finfact0 ( brake / frein ) = fail (lâcher) [C]
magnfact0 ( brake / frein ) = grip (mordre) [C]
causobstr ( brake / frein ) = jam (bloquer) [S]
obstr ( brake / frein ) = jam (se bloquer) [C]
preparfact0 ( brake / frein ) = line (garnir) [S]
part ( brake / frein ) = lining (garniture) [C]
antifact0 ( brake / frein ) = to be off (être desserré) [C]
fact0 ( brake / frein ) = to be on (être serré) [C]
causfact0 ( brake / frein ) = operate (faire marcher) [S]
alexcesst 0 ( brake / frein ) = overheated (qui chauffe) [S]
s0antireal1 ( brake / frein ) = release (dégagement) [C]
preparfact0 ( brake / frein ) = reline (changer la garniture de) [S]
son ( brake / frein ) = scream (hurler) [C]
s0son ( brake / frein ) = screech (grincement) [C]
son ( brake / frein ) = screech (grincer) [C]
s0son ( brake / frein ) = sound (bruit) [C]
s0son ( brake / frein ) = squeal (grincement) [C]
son ( brake / frein ) = squeal (grincer) [C]
( brake / frein ) = to tamper with (toucher à ( )) [S]

As can be noticed, the network preserves the standard notation of lexical functions f(X)=Y. A slash separates the metalinguistic item and the French equivalent which was added when enriching the database. The last item of information refers to the content of the field TYP and indicates, for example, that the noun brake appears in a parenthesized expression under drag (TYP = P) or that brake appears twice under the verbal entry for jam, first at surface level (unbracketed; TYP = S), then between square brackets (TYP = C), which accounts for the different translations (bloquer vs. se bloquer).

Another functionality of the revlfex.exe program is that it enables the user to update the database, for instance by changing the lexical function when an inconsistency or an error has been identified. It should be stressed here that the user need not know anything about the internal structure of the relational database. All the update, search and export operations are carried out automatically by selecting the relevant pieces of information from their respective tables.

For portability reasons, a retrieval program was also written in C by Luc Alexandre in the framework of the DECIDE project.13 In order to provide users with multiple access keys and to give them maximal freedom, a command line interface was created. The user simply has to type in the name of the program, viz. robcol, and the conditions on the fields he wishes to formulate (the symbol "*" can also be used for wildcards).
The fields which can be queried have to be preceded by a hyphen, using the following notation:

i: italicized metalinguistic item
h: headword
pos: part of speech
lex: lexical function

If we are interested in retrieving words which contain the string fox in italics, we may type the following command:

robcol -i fox

which yields the following results:

13

The dictionary can be queried on a UNIX Sun Sparc station and on PCs.

bark (n) : ~fox~ => glapissement (renard,s0son)
bark (vi) : ~fox~ => glapir (renard,son)
bitch (n) : ~fox~ => renarde (renard,female)
brush (n) : ~fox~ => queue (renard,part)
dog (n) : ~fox~ => mâle (renard,male)
earth (n) : ~fox~ => terrier (renard,sloc)
hole (n) : ~fox~ => terrier (renard,sloc)
kennel (n) : ~fox~ => repaire (renard,sloc)
muzzle (n) : ~fox~ => museau (renard,part)
run (vt) : ~fox~ => chasser (renard,)14
stink out (vt sep) : ~fox~ => enfumer (renard,)
yelp (n) : ~fox~ => glapissement (renard,s0son)
yelping (adj) : ~fox~ => glapissant (renard,a0son)
yelping (n) : ~fox~ => glapissement (renard,s0son)

The format here is the following: headword + part of speech + italicized item => French translation of the headword + French translation of the italicized item (here: renard) + the standard lexical function or lexical-semantic relationship (s0son: noun expressing the typical sound; sloc: noun for the typical place of something). The program makes it possible to retrieve collocational information by combining different types of criteria. In the following query, for example, the user is interested in extracting the verbs which can take 'law' as direct object and whose meaning in the vicinity of 'law' can be expressed in terms of the lexical function 'liqu' (liquidate, eradicate):

Input: robcol -i law -lex liqu

Output:
abolish (vt) : ~law~ => abroger (loi,liqu)
annul (vt) : ~law~ => abroger (loi,liqu)
do away with (vt fus) : ~law~ => supprimer (loi,liqu)
repeal (vt) : ~law~ => abroger (loi,liqu)
rescind (vt) : ~law~ => abroger (loi,liqu)
revoke (vt) : ~law~ => rapporter (loi,liqu)

Another example of a query combining lexical functions and the base of a collocation is the following:

14

The field for the lexical function is empty because the relationship between run, or stink out in the following line, and fox cannot readily be described in terms of any standard LF, which, as has already been stressed, definitely points to a limitation of MTT.

robcol -i liar -lex magn

which can be paraphrased as follows: list items expressing a 'high degree or intensity' (Magn) of the noun 'liar':

arrant (adj) : ~liar~ => fieffé (menteur,magn)
bare (cpd) [barefaced] : ~liar~ => éhonté (menteur,magn)
chronic (adj) : ~liar~ => invétéré (menteur,magn)
confirmed (adj) : ~liar~ => invétéré (menteur,magn)
habitual (adj) : ~liar~ => invétéré (menteur,magn)
hopeless (adj) : ~liar~ => invétéré (menteur,magn)
out (cpd) [out-and-out] : ~liar~ => fieffé (menteur,magn)
rank (adj) : ~liar~ => fieffé (menteur,magn)
straight (cpd) [straight-out] : ~liar~ => fieffé (menteur,magn)
unqualified (adj) : ~liar~ => fieffé (menteur,magn)

The program robcol.exe can of course also be used to retrieve information as if one were simply looking up a headword in the printed dictionary. The primary access key is therefore the headword field and the information which is displayed is similar to what can be found in the published version of the dictionary, although the output is reformatted. The interest of such information in a generation perspective is obvious:

Input: robcol -h confirmed -pos adj

Output:
confirmed (adj) : ~drunkard~ => invétéré (ivrogne,magn)
confirmed (adj) : ~liar~ => invétéré (menteur,magn)
confirmed (adj) : ~smoker~ => invétéré (fumeur,magn)
confirmed (adj) : ~bachelor~ => endurci (célibataire,magn)
confirmed (adj) : ~sinner~ => endurci (pécheur,magn)
confirmed (adj) : ~habit~ => incorrigible (habitude,magn)
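The behaviour of such queries can be approximated in a few lines. The sketch below is written in Python, not in the C of robcol, and makes no claim about how that program is implemented: it merely filters a small in-memory list of records on the fields i, h, pos and lex, with "*" acting as a wildcard through the standard fnmatch module. The dictionary keys trans and tradit, the function names and the choice of records are assumptions; the record values are taken from the listings in this section.

```python
from fnmatch import fnmatchcase

# A handful of records copied from the listings in this section.
RECORDS = [
    {"h": "bark", "pos": "n", "i": "fox", "trans": "glapissement",
     "tradit": "renard", "lex": "s0son"},
    {"h": "bark", "pos": "vi", "i": "fox", "trans": "glapir",
     "tradit": "renard", "lex": "son"},
    {"h": "abolish", "pos": "vt", "i": "law", "trans": "abroger",
     "tradit": "loi", "lex": "liqu"},
    {"h": "ring", "pos": "vi", "i": "bell", "trans": "sonner",
     "tradit": "cloche", "lex": "son"},
    {"h": "ring", "pos": "vt", "i": "bell", "trans": "(faire) sonner",
     "tradit": "cloche", "lex": "causson"},
    {"h": "muffle", "pos": "vt", "i": "bell", "trans": "assourdir",
     "tradit": "cloche", "lex": "causantimagnson"},
]

def query(records, **conditions):
    """Keep the records whose fields match every condition; '*' in a
    condition matches any sequence of characters."""
    return [
        r for r in records
        if all(fnmatchcase(r[field], pattern)
               for field, pattern in conditions.items())
    ]

def show(r):
    """Format one record in the style of the listings above."""
    return f'{r["h"]} ({r["pos"]}) : ~{r["i"]}~ => {r["trans"]} ({r["tradit"]},{r["lex"]})'

law_hits = query(RECORDS, i="law", lex="liqu")
bell_hits = query(RECORDS, i="bell", pos="v*", lex="*son")
```

Here query(RECORDS, i="law", lex="liqu") plays the role of robcol -i law -lex liqu, and each additional keyword narrows the result in the same way as an additional field condition on the command line.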

The use of wildcards is very important. In the following query, the user is interested in retrieving the verbs (whether transitive -vt- or intransitive -vi-, hence the command "v*") which can collocate with the word 'bell' and which express the typical sound made by this artifact (the wildcard * before 'son' means that we want to extract verbs whose lexical function can be 'son' proper - i.e. the intransitive verb expressing the typical sound - or 'causson' - i.e. causative verbs expressing the typical sound).

Input: robcol -i bell -pos "v*" -lex "*son"

Output:
chime (vi) : ~bell~ => carillonner (cloche,son)
chime (vt) : ~bell~ => sonner (cloche,causson)
go (vi) : ~bell~ => sonner (cloche,son)
jangle (vi) : ~bell~ => retentir avec un bruit de ferraille (cloche,son)
muffle (vt) : ~bell~ => assourdir (cloche,causantimagnson)
peal (vi) : ~bell~ => carillonner (cloche,son)
peal (vt) : ~bell~ => sonner (à toute volée) (cloche,causson)
ring (vi) : ~bell~ => sonner (cloche,son)
ring (vt) : ~bell~ => (faire) sonner (cloche,causson)
ring out (vi) : ~bell~ => retentir (cloche,son)
ring out (vi) : ~bell~ => sonner (cloche,son)
sound (vi) : ~bell~ => sonner (cloche,son)
sound (vt) : ~bell~ => sonner (cloche,causson)
toll (vi) : ~bell~ => sonner (cloche,son)
toll (vt) : ~bell~ => sonner (cloche,causson)

One can also ask the system to produce the list of support verb constructions (lexical function = Oper1) for bases starting with "a", which can be formulated as follows:

robcol -i "a*" -lex oper1

Sample output (sorted on the support verb):
bestow (vt) : ~admiration~ => accorder (admiration,oper1)
bring forward (vt sep) : ~argument~ => avancer (argument,oper1)
enjoy (vt) : ~advantage~ => jouir de (avantage,oper1)
exercise (vt) : ~authority~ => exercer (autorité,oper1)
exert (vt) : ~authority~ => exercer (autorité,oper1)
fling (vt) : ~accusation~ => lancer (à qn) (accusation,oper1)
give (vt) : ~answer~ => donner (réponse,oper1)
give (vt) : ~answer~ => faire (réponse,oper1)
lay (vt) : ~accusation~ => porter (accusation,oper1)
offer (vt) : ~apology~ => offrir (excuse,oper1)
present (vt) : ~apology~ => présenter (à) (excuse,oper1)
proffer (vt) : ~apology~ => offrir (excuse,oper1)
put forward (vt sep) : ~argument~ => avancer (argument,oper1)
put in (vt sep) : ~application~ => faire (candidature,oper1)
scream (vt) : ~abuse~ => hurler (à) (injure,oper1)
sling (vt) : ~accusation~ => lancer (à qn) (accusation,oper1)
tender (vt) : ~apology~ => offrir (excuse,oper1)
wield (vt) : ~authority~ => exercer (autorité,oper1)

If one is interested in retrieving the subset of support verbs which includes inchoative verbs that may collocate with the noun 'habit', one can type the following command:

robcol -i habit -lex incepoper1

The lexical function "incep" denotes the beginning of a process and is here combined with "oper1" to form a complex lexical function. The CR data for this query is the following:

acquire (vt) : ~habit~ => prendre (habitude,incepoper1)
contract (vt) : ~habit~ => prendre (habitude,incepoper1)
develop (vt) : ~habit~ => contracter (habitude,incepoper1)
form (vt) : ~habit~ => contracter (habitude,incepoper1)
take to (vt fus) : ~habit~ => prendre (habitude,incepoper1)

To give a final example of the possibilities offered by the retrieval program, it may be useful to stress once again that the database is not a collocational database proper but should be described more aptly as a lexical-semantic network. The thesauric and alphabetical organizations of the dictionary are no longer incompatible in a computerized dictionary (see Michiels 1982:359 or Atkins 1996), which is very much in keeping with prevailing ideas about the psycholinguistic organization of the lexicon. As a matter of fact, the Liège database features some characteristics which are typical of WordNet, the on-line lexical network built by George Miller's team at the University of Princeton (Miller et al. 1990, Fellbaum 1990, Miller 1990).
This becomes all the more obvious when one considers the output of the following query against the hyponyms of the noun contraceptive (see section 8.8 for more information on the Spec relation):

robcol -i contraceptive -lex spec

French (cpd) [French letter] : ~contraceptive~ => capote anglaise [informal] (contraceptif,spec)
coil (n) {the coil} : ~contraceptive~ => le stérilet (contraceptif,spec)
douche (n) : ~contraceptive~ => lavage vaginal (contraceptif,spec)
loop (n) {the loop} : ~contraceptive~ => le stérilet (contraceptif,spec)
prophylactic (n) : ~contraceptive~ => préservatif (contraceptif,spec)
rubber (n) : ~contraceptive~ => préservatif (contraceptif,spec)
sheath (n) : ~contraceptive~ => préservatif (contraceptif,spec)

This data makes it abundantly clear that the CR database could definitely be used as a starting point15 for the development of computerized or printed by-products such as thesauri à la Roget and monolingual or bilingual reference books in the tradition of LOLEX (McArthur 1981) or, more recently, the Language Activator (Summers 1993) or Word Routes (Walter 1994). I will come back to the pedagogical applications of this relational network in Chapter 14.
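How a thesaurus-like class could be derived from such records may be sketched as follows. The Python fragment below groups the headwords of all records labelled spec under the italicized item they specify; it is an illustration under stated assumptions (the function name thesaurus_classes, the dictionary keys and the in-memory records are hypothetical), with values copied from the contraceptive and brake listings.

```python
from collections import defaultdict

# Records reduced to the three fields needed here; values from the listings.
SAMPLE = [
    {"h": "coil", "i": "contraceptive", "lex": "spec"},
    {"h": "loop", "i": "contraceptive", "lex": "spec"},
    {"h": "sheath", "i": "contraceptive", "lex": "spec"},
    {"h": "rubber", "i": "contraceptive", "lex": "spec"},
    {"h": "lining", "i": "brake", "lex": "part"},
]

def thesaurus_classes(records, relation="spec"):
    """Group headwords under the italicized item they are related to by
    the given relation (by default spec, i.e. hyponymy)."""
    classes = defaultdict(list)
    for r in records:
        if r["lex"] == relation:
            classes[r["i"]].append(r["h"])
    return {superordinate: sorted(words)
            for superordinate, words in classes.items()}

classes = thesaurus_classes(SAMPLE)
```

Calling the same function with relation="part" would yield part-whole groupings instead, which is one reason for keeping the relation as a parameter.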

15

Obviously, the CR database cannot be considered as a final product. The absence of the word condom in the category above is due to the criteria used to construct the relational network. Indeed, condom is simply translated as préservatif in the dictionary (using contraceptive in parentheses could have been problematic because contraceptive is both synonymous with and superordinate to condom). Since there is no italicized indicator in the entry, no relationship can be established with the other lexical items belonging to the same thesauric class.

7 Defining formulae and lexical functions

7.1 Introduction

As noted by Ahlswede and Evens (1988a:216), relational models have been used widely in anthropology, in computer science, in linguistics and in psychology. Some disciplines tend to make extensive use of a small set of lexical-semantic relations only. Anthropology, for instance, draws heavily, though not exclusively, on the concept of taxonomy (consider the prototypical example used by anthropologists to illustrate the way in which they structure the vocabulary of the languages they study: a lion is a kind of animal). As has already been pointed out, the main lexical-semantic relations, viz. hyperonymy, synonymy and antonymy, can be represented in terms of the Gener, Syn and Anti lexical functions respectively. For a survey of the uses of these relations (and others) in the above-mentioned disciplines, the reader is referred to Evens et al. (1980) who provide a systematic comparison and a critical analysis of a whole range of relations.

We saw in section 6.3.5 that the extraction of collocations from the Collins-Robert dictionary could be carried out automatically. The semantic interpretation of the link between base and collocate (headword and element in italics), however, was shown to be impossible to perform automatically. In most cases, then, the assignment of the appropriate lexical function had to be done manually, which proved a rather labour-intensive task. This statement should be qualified, however, because it appeared that the assignment process could benefit from a range of regularities in the micro-definitions of the dictionary. The idea of exploiting regularities in definitions is not a new one. Michiels (1982:188) and Michiels & Noël (1984) point to the possibility of retrieving the link between a process and its typical instrument by searching for keywords such as "apparatus", "device" or "instrument" combined with definition patterns such as used for V-ing, made to V, used in NP to V, etc.
Ahlswede and Evens (1988a) use the term "defining formula" to refer to the words and/or phrases used repeatedly in definitions. Their contention is that these defining formulae are a valid clue to relations and they use them to identify lexical-semantic relations in the definitions of Webster's Seventh Collegiate Dictionary (Olney, 1968). In order to generate their relational lexicon, Ahlswede & Evens have to parse their definitions in order to identify what they call 'converters'. The function of a converter is to make a definition syntactically parallel with the word it defines (p. 220), which means that the word and its definition can (or at least should) be substitutable. Consider the following example:

legal adj concerning the law

The adjective legal is defined in terms of its relation to the noun law. The function of the word 'concerning' is to convert the noun phrase 'the law' into an adjective phrase. A converter can also act on a phrase of the same part of speech as the item defined, as shown in the following example:

peerage n list of peers

where 'list of' expresses a GROUP relationship (see below for other examples). Identifying converters makes it possible to pinpoint relevant relations and to enrich the database accordingly. Of course, it should be borne in mind that a given relation (or lexical function) need not be expressed by only one and the same converter. The contrary, though, is not true since a given converter can express one relation only.

The approach adopted here to exploit defining formulae in CR differs from earlier work on the topic in several respects:

- Earlier attempts all concentrated on monolingual dictionaries. Michiels (1982) and Vossen et al. (1989) use LDOCE, Ahlswede and Evens (1988a) use W7. LDOCE is a learner's dictionary whose definitions are written in a controlled vocabulary of around 2,000 items (the words used in definitions that do not belong to this defining vocabulary are capitalized). The main advantage of using a controlled vocabulary is that it enables the lexicographer to work more consistently by resorting extensively to defining formulae. The main drawback, however, is that the syntax of definitions is often more complex, which in some cases may lead to unnatural formulations.
- CR being a bilingual dictionary, we cannot expect to find many definitions in it. There are, however, what might be termed "micro-definitions" appearing in italics and in parentheses (see section 6.3.2). Such micro-definitions are usually characterized by a very simple syntax. Minimally, they may be no more than a synonym (one word only) but, in a number of cases, they resemble a more traditional definition.
The main purpose of such micro-definitions is to guide the dictionary user by providing him or her with information on how to select the appropriate translation. This means that they will only be found in cases of translationally relevant ambiguities. Moreover, it must also be stressed that some micro-definitions may be totally inaccurate and that the key-words they contain may relate to non-prototypical values, which, in some cases, may make them useless in the perspective adopted here.

Since I am concerned with the construction of a database which captures lexical-semantic relations, it is crucial to identify recurring defining formulae in micro-definitions and to use them to assign the corresponding lexical functions linking a headword and the item which comes immediately after the converter. The following sections consider the various LFs that can be readily identified and analyze the defining patterns which are used to express these LFs.

7.2

Mult

The first function I wish to examine is the exponent of the GROUP relationship. As has been pointed out, Mult (X) = Y is used to refer to a group or collection of X. In most cases, this function captures the syntagmatic relationship which holds between the two members of an N1+N2 collocation where N1 is a collective (a swarm of bees, a pack of dogs, a herd of cattle). In these examples, the Mult function definitely expresses a strong collocational link between two lexemes. In other cases, however, Mult links two items which will hardly ever co-occur (*a delegation of delegates) but are related by derivation: Mult (delegate) = delegation. In the latter example, one may follow Heylen (1993) and consider Mult as a substitution LF insofar as the meaning of the value of this function (delegation) includes the meaning of the argument (delegate).

The former category of syntagmatic relationships is expressed in the dictionary through the standard use of typographical devices such as brackets, as in the following example:

swarm n [bees, flying insects] essaim

It is clear that the nouns swarm and bee are likely to collocate in a significant way, even though it can be safely asserted that the meaning of swarm to some extent includes or implies some reference to bees or insects. As I pointed out earlier, however, the semantic link cannot be predicted and has to be worked out by the linguist who analyzes the pairs of collocations.

The latter category of the substitution Mult LF is somewhat easier to detect since it is signalled through the use of definition patterns in the parenthesized micro-definitions. The defining formulae which express this function are the following:

7.2.1

body of

judicature n (body of judges) magistrature
judiciary n (body of judges) magistrature
legislation n (body of laws) législation

The noun which follows the defining formula is the argument of the function, which enables us to generate the following relations:

Mult (judge) = judicature
Mult (judge) = judiciary
Mult (law) = legislation

The same converter is also used to establish a relationship between Christian and church, people and commission or men and force. Ideally, of course, a number of subscripted modifiers should be added in some cases to avoid the problem of overgeneralization as stated by Heylen & Maxwell (1994) and alluded to in section 5.3.4.6.

It may be useful to add here that a lexicographer may have had good reasons to opt for a given definition formula. Indeed, it would sound odd to define judicature as "a set of judges" because "set of" is rarely used in combination with human beings (see below). Of course, this nuance is lost in the database but what is important is that a new link has been established and captured between a noun X and the standard term, if any, which refers to a group of Xs.

7.2.2

group of

confederacy n (Pol: group of states) confédération
delegation n (group of delegates) délégation

Again, I have to stress that it is of utmost importance to identify the position and status of elements in italics. I am concerned here with the extraction of significant defining formulae, which means that it must be possible to concentrate on bracketed material (micro-definitions in CR are always bracketed). Hence, in our exploration of these micro-definitions and our search for meaningful patterns, we definitely do not want to come up with the following example (assuming we ask the system to list examples of items containing "group of"):

direct vt group of actors diriger

Since the database contains information on whether elements in italics appear in parentheses, in square brackets or at "surface" level (i.e. unbracketed), it is easy to filter out these occurrences of "group of" by stating that one wishes to focus on records containing this converter at P level (in parentheses). "Group of" can be used as a clue to identify the following relations:

Mult (officials) = board
Mult (chimney) = chimney stack, stack
Mult (state) = confederacy
Mult (delegate) = delegation
Mult (man) = draft, watch
Mult (speaker) = panel
Mult (troop) = patrol

Next to this standard converter, we also find a variant where the base of the collocations appears in square brackets and the noun group is mentioned in parentheses, as in the following examples:

pack n (...) (group) [hounds] meute; [wolves, thieves] bande; [brownies, cubs] meute; [runners] peloton.
flight n (group) [birds] vol, volée; [planes] escadrille.

It is important to realize that this notation is equivalent to a micro-definition containing "group of hounds" or "group of birds". In the database, such variations and inconsistencies have disappeared and only the lexical function remains.
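The filtering by bracket level and the treatment of the "(group) [hounds]" variant can be illustrated with a small Python sketch. The record layout is an assumption: only the label "P" for parenthesized material is taken from the text, while the labels "B" (square brackets) and "S" (surface level), the class Italic and both functions are hypothetical. The second function is deliberately naive, since it pairs every bracketed item with a "(group)" indicator found under the same headword without checking that both belong to the same sense.

```python
from dataclasses import dataclass


@dataclass
class Italic:
    headword: str
    text: str
    level: str  # "P" = parentheses; "B" = square brackets, "S" = surface (assumed codes)


RECORDS = [
    Italic("delegation", "group of delegates", "P"),
    Italic("confederacy", "group of states", "P"),
    Italic("direct", "group of actors", "S"),  # unbracketed: must be filtered out
    Italic("pack", "group", "P"),
    Italic("pack", "hounds", "B"),
]


def group_of_candidates(records):
    """Keep only records where the converter 'group of' occurs at P level."""
    prefix = "group of "
    return [
        (r.headword, r.text[len(prefix):])
        for r in records
        if r.level == "P" and r.text.startswith(prefix)
    ]


def bracket_variant(records):
    """Treat '(group) [hounds]' as equivalent to '(group of hounds)'."""
    heads = {r.headword for r in records if r.level == "P" and r.text == "group"}
    return [(r.headword, r.text) for r in records if r.level == "B" and r.headword in heads]


print(group_of_candidates(RECORDS))
print(bracket_variant(RECORDS))
```

Both functions return (headword, base) pairs, which can then be stored as Mult relations once the base has been reduced to its singular form.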


7.2.3

set of

The noun "set" is also frequently used as a converter to account for a collective meaning as in:

series n (set of books) collection; (set of stamps) série
quaternary n (set of four) ensemble de quatre

This defining formula is also used in four other cases:

Mult (clothes) = outfit
Mult (exam question) = paper
Mult (crockery) = service
Mult (item) = kit

7.2.4

series of

"Series" is another synonym of group which expresses a collective relationship, for instance in:

arcade n (series of arches) arcade, galerie

Again, arcade and arches do not participate in a collocational relationship and the Mult LF which relates them is to be considered here as a substitution function. A similar relationship holds between blow and beating, goods and line, sound and splash.

7.2.5

list of

This definition pattern is not used very frequently in the dictionary. The four occurrences are:

docket n (list of cases) rôle des causes
peerage n (list of peers) nobiliaires
petition n (list of signatures) pétition
poll n (list of voters) liste électorale

7.2.6

cluster of

"Cluster of" is the least frequently used converter. It makes it possible to assign the Mult LF in the following two cases:

Mult (hook) = drag
Mult (feather) = plume

7.2.7

number of

The formula "number of" is used three times to define a collective noun. It should be noted that "number" is used here as a synonym of 'group' or 'set'.

attendance n (number of people present) assistance
battery n (number of similar objects) batterie
membership n (number of members) [+ paraphrase in French to translate an example]

7.2.8

A note on "collective"

The use of the term collective in CR deserves more than a passing remark. Consider the following examples in which it is used:

ancestry n (collective n) ancêtres, aïeux, descendants
cattle n (collective n) bovins, bétail, bestiaux

The word 'collective' is used here to refer to a grammatical property of the headword. It can be considered neither as part of a micro-definition nor as a converter or a defining formula. In fact, it is closer to a grammar code such as those used in the first edition (Procter 1978) of LDOCE (an approximate equivalent in the latter dictionary would be GC or GU, which stand for group countable noun and group uncountable noun respectively; LDOCE also uses the code [P] to account for nouns that are always used with a plural verb - note that the Collins-Robert entries above do not signal that ancestry takes a singular verb while cattle takes plural verbs). What is important to note here is that the word collective can be used as a clue to discover potential values of the Mult function. In the two examples given above, however, the argument cannot be retrieved from the dictionary entry, unlike the following entries in which the argument appears in the parenthesized material in italics:

gunnery n (Mil: collective n: guns) artillerie
leadership n (collective n: leaders) dirigeants
chivalry n (collective: Hist: knights) chevalerie
fowl n (hens etc) (collective n) volaille, oiseaux
peerage n (collective: the peers) pairs, noblesse
linen n (collective n) (sheets, tablecloths etc) linge

Note that the base of the collocation need not appear in the same parentheses as the 'collective' indicator (e.g. fowl, linen). A similar remark can be made for the term 'collectively', which plays exactly the same part in denoting a group (sample only):

bench n (judges collectively) les magistrats
employment n (jobs collectively) emploi

knighthood n (knights collectively) chevalerie
machinery n (machines collectively) machinerie, machines
prelacy n (prelates collectively) prélats
press n (newspapers collectively) presse

It may be worthwhile to have a look at the words chivalry and knighthood above. The inconsistent treatment of the specification of the base disappears in the database since the information is now represented as follows, regardless of the variations in formulation which do not point to any difference in meaning (even though chivalry is used here in a historical sense): Mult (knight) = chivalry, knighthood.

Interestingly, this way of reorganizing the lexicon enables us to discover some regularities in morphological processes. Calzolari (1990) notes that the morphological phenomenon of suffixal word formation may be the surface realization of an underlying lexical-semantic relation. She is mainly concerned with identifying thematic roles (agent, patient, instrument, etc.) but the data contained in our database provide ample evidence that a clear link can be established between a lexical function such as Mult and certain suffixes such as -ery/-ary or -hood. The former suffix appears in the following examples (see also Marchand 1969:282ff for further details on this suffix):

Mult (knight) = chivalry
Mult (gun) = gunnery
Mult (machine) = machinery
Mult (four) = quaternary
Mult (statue) = statuary
Mult (rocket) = rocketry

The productive character of -ery/-ary was noted by the LDOCE lexicographers who specify that this suffix can be attached to a noun or an adjective to form an uncountable noun:

-ery suffix 1 [n, (adj)->n [U]] also (after -d,-t,-l,-n,-e) -ry (...) b a collection of: a lot of CROCKERY | modern MACHINERY | in all her FINERY

The suffix -hood does not seem to have been considered as productive enough to be identified as a clue to the Mult function (LDOCE only defines it as "the state or time of being" as in childhood). Yet, in Mult (priest) = priesthood and Mult (knight) = knighthood, it unmistakably refers to a body or group of priests and knights respectively.
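The two observations made in this section, namely that differently worded entries collapse into a single relation and that the values of Mult cluster around a few suffixes, can be sketched as follows. The list MULT is a small sample of the relations quoted above; the function names and the crude string test for suffixes (a final "ry" standing for -ery/-ary/-ry) are assumptions made for the purpose of illustration and do not amount to a morphological analysis.

```python
from collections import defaultdict

# (argument, value) pairs for the Mult function, sampled from this section.
MULT = [
    ("knight", "chivalry"),
    ("knight", "knighthood"),
    ("gun", "gunnery"),
    ("machine", "machinery"),
    ("statue", "statuary"),
    ("rocket", "rocketry"),
    ("priest", "priesthood"),
]


def merge(pairs):
    """Collect all values of the function for each argument."""
    merged = defaultdict(list)
    for argument, value in pairs:
        if value not in merged[argument]:
            merged[argument].append(value)
    return dict(merged)


def by_suffix(pairs, suffixes=("hood", "ry")):
    """Group the values by the first listed suffix they end in."""
    groups = defaultdict(list)
    for _, value in pairs:
        for suffix in suffixes:
            if value.endswith(suffix):
                groups[suffix].append(value)
                break
    return dict(groups)


print(merge(MULT)["knight"])   # ['chivalry', 'knighthood']
print(by_suffix(MULT)["hood"])  # ['knighthood', 'priesthood']
```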

7.3

Sing

Sing is the inverse lexical function of Mult. We have seen that it refers to the regular quantum or portion of something. The two examples given by Steele & Meyer (1990) are:

Sing (fleet) = ship
Sing (rice) = grain

It is important to realize that these two examples do not cover exactly the same type of relation. Syntagmatically speaking, a grain of rice is undoubtedly a well-formed collocation but a ship of fleet is not. This problem is therefore similar to the distinction I made earlier between delegation/delegate and swarm/bee (p.149). Mel'čuk et al. (1984, 1988) treat Sing and Mult as inverse LFs and no one seems to have questioned this classification, with the exception of Grimes (1990:361) who argues for the creation of four functions instead of two. Unit and Mass would replace Sing and its inverse while Mult and its inverse would be replaced by Multiple and Individual. Grimes does not justify his proposal, nor does he give any example to support it. The following examples would probably illustrate this new classification:

Unit (rice) = grain
Mass (grain) = rice [+salt, maize, wheat...]
Individual (fleet) = ship [+car, bus, aircraft...]
Multiple (ship) = fleet

The former category deals with uncountable mass nouns (typically, though not exclusively, foodstuffs) while the latter category refers to collective nouns with a plural meaning (see also Wierzbicka 1988 for more information on such mass/count nouns). For practical reasons, I have chosen to follow Mel'čuk and have only used Sing and Mult. In the long term, it will perhaps prove to be necessary to break down these functions into more detailed relations à la Grimes but a more thorough analysis will be required before embarking on this task.
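The fourfold classification can be made concrete with a short sketch. It rests on a strong assumption, namely that a list of uncountable mass nouns is available (MASS_NOUNS below is a hypothetical one-item list); the function grimes_labels is likewise hypothetical and merely relabels a Sing pair in the way the tentative examples above suggest.

```python
# Hypothetical list of uncountable mass nouns; in practice this information
# would have to come from a grammar code or from manual annotation.
MASS_NOUNS = {"rice"}


def grimes_labels(whole, unit):
    """Relabel Sing(whole) = unit and its inverse with the four functions
    attributed to Grimes (1990), depending on whether 'whole' is a mass noun."""
    if whole in MASS_NOUNS:
        return [("Unit", whole, unit), ("Mass", unit, whole)]
    return [("Individual", whole, unit), ("Multiple", unit, whole)]


print(grimes_labels("rice", "grain"))
print(grimes_labels("fleet", "ship"))
```

The sketch also makes the inverse relationship explicit: each pair yields two facts in which argument and value are simply swapped.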

7.3.1

piece of

Micro-definitions beginning with 'piece of' are used to point to the relationship that holds between an uncountable noun, denoting an undifferentiated mass, and another, countable, noun used to refer to an individual unit or a quantitative partition. Typical examples are:

body n (piece of matter) corps
job n (piece of work) travail, besogne, tâche, boulot
pot n (piece of pottery) poterie
shaving n (piece of wood, metal) copeau
sod n (piece of turf) motte
wedge n (piece: of cake, pie etc) part, morceau

The importance of this Sing function in a cross-linguistic perspective hardly needs to be stressed. While 'poterie' can be used in French either in its uncountable meaning to refer to earthenware (material) or to a single instance of it, pottery in English cannot be used for a single pot. The same holds for the relationship between job and work, a notorious stumbling block for many francophone learners of English who tend to generate odd-sounding sentences such as 'John has got a good work in a bank' or 'John is applying for a new work'.

7.3.2

a member of

The converter member of is used three times in micro-definitions to refer to the [+HUMAN] entity which forms the basic unit in a group:

deputy n (member of deputation) délégué
gun n (member of shooting party) fusil
teacher n (gen: member of a teaching profession) enseignant

It is clear that member of functions as the inverse of body of or group of. The noun deputation could be defined as 'body/group of deputies'. Interestingly, this also demonstrates that the LF Sing is not necessarily associated with inanimate, concrete arguments, as is too often the case in the literature, the traditional examples being Sing (rice) = grain or Sing (fleet) = ship, as pointed out above.

7.3.3

flash of

It is unclear whether 'flash of' can be called a definition pattern at all. However, even though it only occurs once in the dictionary, it undeniably plays the role of a converter in the following example:

sally n (flash of wit) boutade, saillie

7.3.4

act of

I am not concerned here with the structure 'act of V-ing', which is extremely frequent and which is dealt with in greater detail below in section 8.6. The definition pattern I am referring to here is the structure 'act of N', in which N is an uncountable noun, as in:

favour n (act of kindness) service, faveur, grâce.

It should be noted that kindness is used here in its uncountable sense, although it can also be used as a countable noun, as testified in the following example which also illustrates the use of 'act of':

kindness n (b) (act of kindness) bonté, gentillesse, grâce.

As has previously been noted for monolingual learner's dictionaries (Michiels 1982, Boguraev & Briscoe 1989), the defining vocabulary is frequently used in different senses and no English dictionary provides any indication about which sense of the defining words is being used (see also p. 11). One possible way of overcoming this difficulty would be to assign a subscript to each of the defining words to point back to the sense in which it is used (e.g. kindness2 = act of kindness1). This is precisely what Mel'čuk attempts to do in the ECD to avoid circularity. The relationship described above would then be formulated as follows:

Sing (kindness1) = kindness2

Interestingly, this way of formalizing the semantic/syntactic behaviour of kindness makes it clear that this word undergoes a type of noun alternation known as the count/mass alternation (Quirk et al. (1985:248) would call this phenomenon 'reclassification'). An extended treatment of noun alternations and of the related concept of Lexical Implication Rules can be found in Chapter 11.

7.4

Sinstr

Sinstr refers to the typical instrument used to perform a given task. This function has attracted a lot of attention among researchers, probably because capturing the link between a process and its typical instrument is essential in solving what is known as the PP attachment problem (i.e. deciding on where to attach a given prepositional phrase). To illustrate the complexity of this task, consider the following two sentences:

(1) John painted the jeep with the new brush.
(2) John painted the jeep with the new wheels.

It is clear that (1) and (2) should be assigned different constituent structures. With the new brush in (1) is an immediate constituent of the larger constituent painted the jeep with the new brush. Conversely, the jeep with the new wheels in (2) forms a constituent. The two analyses may be arrived at on the basis of the distinct types of lexical-semantic relationships playing a role in (1) and (2). While brush is undoubtedly the typical instrument associated with the process of painting, wheel participates in a strong semantic relationship with jeep (and with most vehicles in general), viz. a part_of relationship (see 8.3). Solving such potential ambiguities presupposes the description of such relations in the lexicon, possibly in the form of lexical functions which specify the following pieces of information:


Sinstr (paint) = brush
Part (vehicle) = wheel
Spec (vehicle) = jeep

To avoid specifying that wheels are parts of jeeps, ambulances, lorries, cars, it is much more economical to mention this information only once for the general hyperonym (vehicle) and to make sure that the taxonomy of vehicles is captured through the Spec (specific term) LF (see 8.8).

Michiels (1982) points to a fairly large number of definition patterns in LDOCE that can be used to locate typical instruments (machine/device/instrument used to V-inf/for V-ing...). CR lexicographers have only used one type of defining formula, viz. 'for V-ing', which makes it possible to locate potential instruments fairly quickly. The list is too long to be given here. Let me simply give a few examples:

birch n (for whipping) verge, fouet
buffer n (for polishing) polissoir
chisel n (for engraving) burin
guillotine n (for beheading) guillotine
lead n (for sounding) plomb
lens n (for magnifying) lentille
trepan n (for quarrying etc) trépan

It should be noted that the LF is, in a few cases, directly retrievable from the micro-definition itself. Consider the following entries which explicitly indicate the instrumental relationship between the headword and a process verb:

square n (drawing instrument) équerre
triangle n (drawing instrument) équerre

Whatever the device used by the lexicographer, the relationship between the process and the instrument is now indicated as follows, after manual editing of the micro-definitions:

Sinstr (magnify) = lens
Sinstr (sound) = lead
Sinstr (draw) = square, triangle
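The way in which the three facts Sinstr (paint) = brush, Part (vehicle) = wheel and Spec (vehicle) = jeep could guide PP attachment may be sketched as follows. This is a toy decision procedure, not a parser: it assumes that the verb, the object noun and the noun of the prepositional phrase have already been lemmatized, and the dictionaries and function names are hypothetical. The Part relation is stored once for the hyperonym and reached through Spec, as suggested above.

```python
# Toy lexicon built from the three facts given in the text.
SINSTR = {"paint": {"brush"}}
PART = {"vehicle": {"wheel"}}
SPEC = {"vehicle": {"jeep", "ambulance", "lorry", "car"}}


def hyperonyms(noun):
    """Generic terms of which 'noun' is a specific term (one level only)."""
    return {generic for generic, specifics in SPEC.items() if noun in specifics}


def has_part(whole, part):
    """True if 'part' is recorded as a part of 'whole' or of its hyperonym."""
    candidates = {whole} | hyperonyms(whole)
    return any(part in PART.get(candidate, set()) for candidate in candidates)


def attach_pp(verb, obj, pp_noun):
    """Return 'verb' for instrumental attachment, 'noun' for attachment to
    the object noun phrase, and 'undecided' when the lexicon gives no clue."""
    if pp_noun in SINSTR.get(verb, set()):
        return "verb"
    if has_part(obj, pp_noun):
        return "noun"
    return "undecided"


print(attach_pp("paint", "jeep", "brush"))  # verb
print(attach_pp("paint", "jeep", "wheel"))  # noun
```

Sentence (1) thus receives the instrumental reading and sentence (2) the reading in which the jeep with the new wheels forms a constituent; any other noun is left undecided rather than guessed at.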

7.5

Caus/Perm

7.5.1

cause to/allow to

Caus and Perm are the causative operators meaning 'cause, do something so that a situation occurs' and 'allow, permit' respectively. Both are normally used in combination with other verbal lexical functions, typically (though not exclusively) Oper, Func or Pred. The dictionary only infrequently makes use of "cause to" or "allow to" in micro-definitions (25 and 7 occurrences respectively), which means that these definition patterns cannot be used reliably to extract causative verbs on a large scale. Unlike what has been done on monolingual dictionaries (Fontenelle & Vanandroye 1989, Fontenelle 1991), it hardly seems possible to investigate the so-called causative/inchoative alternation by examining patterns involving causative operators. Other criteria have to be looked for, which is the topic of chapter 12 (on ergative verbs).

The Perm LF can be detected in 7 cases in which the "allow to" definition pattern is used, as in:

let sb go (allow to leave) laisser partir
leave vt (allow to remain) laisser
let away vt (allow to leave) laisser partir
let off vt (allow to leave) laisser partir
let out vt (allow to leave) faire/laisser sortir
let up vt (allow to rise) permettre de se lever
see vt (allow to be) laisser, permettre

Since "allow to" is followed by another verb, the Perm LF which is assigned here is used in isolation:

Perm (leave) = let go, let away, let off, let out
Perm (remain) = leave
Perm (rise) = let up
Perm (be) = see

Of course, the second and fourth examples may seem rather odd because the values of the functions are ambiguous (leave and see are used here in non-prototypical senses, which provides ample evidence that the use of 'primitives' is not unproblematic). In the following examples, the converter "cause to V" converts V into a causative verb:

drive back vt (cause to retreat) repousser, refouler, faire reculer
get vt (cause to be: gen+adj)
give vt (cause to have) donner
make vt (cause to be) rendre, faire
raise vt (cause to rise) lever

Here again, Caus is not used in combination with other verbal LFs (e.g. Caus (rise) = raise), unlike the situation described in the following section.

7.5.2

make + adj

As is pointed out by Levin (1987, 1993), work in generative semantics centres on the form of lexical semantic representation. The predicate decomposition approach is a formal device which has been adopted by numerous researchers and, although serious criticism has been levelled against it to provide evidence that it is unacceptable at the deep-syntactic level (see Fodor 1970), it has considerably influenced the lexicographic tradition. The underlying hypothesis is that the lexical composition of an item, say a verb, boils down to a combination of primitives which are called atomic predicates. CAUSE belongs to this set of atomic predicates insofar as it is an essential component of the lexical representation of causative verbs (see above), which is why many linguists would paraphrase 'kill' as 'CAUSE to die' or, refining the decomposition process even further, as 'CAUSE to BECOME NOT ALIVE'. CAUSE is a primitive concept which should not be confused with the English verb 'to cause'. The verb 'make', which we can decompose as 'CAUSE to BE', may be considered as another surface realization of this semantic primitive. In the CR dictionary, it is used extensively in combination with an adjective, as in the following examples:

advertise vt (make conspicuous) afficher
assure vt (make certain) garantir, assurer
befuddle vt (make tipsy) griser, émécher
burlesque vt (make ridiculous) tourner en ridicule
equate vt (make equal) égaler, égaliser
knock up vt (make pregnant) mettre enceinte, engrosser

Since Pred is the verbalizing lexical function (meaning 'to be + the keyword'), it will have to be used in combination with Caus to yield the following triples:

CausPred (conspicuous) = advertise
CausPred (certain) = assure
CausPred (equal) = equate

In some cases, the adjective is used in the comparative form, as in:

lighten vt (make lighter) éclaircir
lighten vt (make less heavy) alléger
improve vt (make better) améliorer

The functions Plus or Minus are then used to indicate the direction of the comparison. Note that the adjective has to be stripped of all affixes, if any, to be reduced to its base form in the database:

CausPredPlus (light) = lighten
CausPredMinus (heavy) = lighten
CausPredPlus (good) = improve

Browsing through the database reveals that Caus is often associated with "recurrent, overt morphological relationships between one word and another belonging to a different syntactic category and produced from the first by some process of derivation called paronymy" (Cruse 1986:130). This is a clear case of a paradigmatic relationship in which the adjective is the base and the derived form is called the paronym. The suffix -en, for example, can be added to an adjective to form a new (causative) verb, as in light/lighten. Other examples of such rules are 'adj + ize', as in public/publicize where -ize clearly evidences a causal relationship (LDOCE defines -ize - suffix - as 'to make of the stated type or put into the stated condition'). It is also possible to detect cases of conversion (Cruse 1986:132 would call this zero-derived paronymy), i.e. a paronymous relationship with no affixation process. The following examples are cases in point:

CausPred (level) = level
CausPred (round) = round

based on the following entries:

level vt (make level) niveler, aplanir
round vt (make round) arrondir

Adams (1973:37ff) gives a clear account of the various functions of the zero affix in word formation.
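The mapping from 'make + adj' micro-definitions to the compound functions CausPred, CausPredPlus and CausPredMinus can be sketched as follows. The function make_adj_to_lf and the table COMPARATIVES are hypothetical; the table is hand-built and only covers the two comparatives quoted in this section, which reflects the fact that the reduction to base forms is a matter of manual editing and that irregular forms such as better cannot be handled by suffix stripping.

```python
# Hand-built reduction of comparatives to their base form (sample only).
COMPARATIVES = {"lighter": "light", "better": "good"}


def make_adj_to_lf(headword, micro_definition):
    """Map a 'make + adj' micro-definition onto (LF, adjective, headword)."""
    words = micro_definition.split()
    if len(words) < 2 or words[0] != "make":
        return None
    rest = words[1:]
    # 'make less heavy' -> CausPredMinus (heavy)
    if len(rest) == 2 and rest[0] == "less":
        return ("CausPredMinus", rest[1], headword)
    if len(rest) == 1:
        # 'make lighter' -> CausPredPlus (light)
        if rest[0] in COMPARATIVES:
            return ("CausPredPlus", COMPARATIVES[rest[0]], headword)
        # 'make equal' -> CausPred (equal)
        return ("CausPred", rest[0], headword)
    return None


for entry in [("equate", "make equal"), ("lighten", "make lighter"),
              ("lighten", "make less heavy"), ("level", "make level")]:
    print(make_adj_to_lf(*entry))
```

Cases of conversion such as level fall out of the same rule, since the adjective and the headword simply coincide in the resulting triple.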

7.6

A0

A0 refers to an adjectival derivative of a lexeme, more precisely an adjective which coincides with the keyword in meaning. The relationship between law and legal, for instance, can be accounted for in terms of this LF: A0 (law) = legal. Unlike monolingual dictionaries, however, CR does not make extensive use of definition patterns to signal this relationship. The restricted set of formulae is given below:

7.6.1

concerning

This converter is only used in two cases: legal adj (concerning the law) judiciaire, juridique marital adj (concerning husband) marital

7.6.2

of the/of a

Micro-definitions resort to this converter to turn a noun into an adjective:

carnal adj (of the flesh) charnel
communal adj (of the community) communautaire
demotic adj (of the people) populaire
genetic adj (of the genes) génétique
physical adj (of the body) physique
vulgar adj (of the common people) vulgaire, commun
corporate adj (of the corporation) d'une corporation
vital adj (of life) vital

As can be noted, the -al, -ic and -ar suffixes seem to play an important part in this derivational process. Of course, these suffixes are not necessarily attached to the noun. The adjectives in the examples above belong to the Latin or Greek stock of words and etymology should be resorted to in order to account for these shifts. The diachronic study of such a morphological process falls outside the scope of this book, however; what is important at this stage, from a purely synchronic point of view, is that our database has to capture the relationship which holds between seemingly different lexical items such as flesh and carnal. After editing the micro-definitions and getting rid of the converters "of" and "of the", the lexical relation is now represented as follows in the database:

A0 (flesh) = carnal
A0 (life) = vital
A0 (body) = physical

These examples make it abundantly clear that the border between the dictionary and the thesaurus is a fuzzy one, as already shown by Michiels & Noël (1982) or Calzolari (1988).

7.6.3

made of

Made of is used to define adjectives derived from nouns expressing substances or materials. It should be noted that this definition pattern is used differently as a function of the part of speech of the headword. Compare the following two sets of definitions:

(i)
chocolate cpd (made of chocolate) de chocolat
ebony cpd (made of ebony) en ébène, d'ébène
terracotta cpd (made of terracotta) en terre cuite

(ii)
golden adj (made of gold) en or, d'or
leaden adj (made of lead) de or en plomb
pearly adj (made of pearl) en or de nacre

cpd means that the nouns denoting substances or material in (i) can be used to form compounds (the part of speech in the new edition of CR is comp) (a chocolate bar, a mahogany table). Ostler & Atkins (1991) use the concept of Lexical Implication Rules (LIR) to account for such regular and predictable meaning shifts. They note that "the LIR Material Adjective, which allows the names of materials to be used as adjectives to describe objects made of the material is unaffected by the application of the somewhat sporadic -en Derived Adjective rule" (pp.81-82). The latter rule is exemplified in (ii) above (e.g. gold -> golden). The advantage of using LIRs is that it makes it possible to state general rules and to avoid specifying these regularities in the lexicon. Only exceptions then have to be listed. Only a limited set of names of materials do not follow the LIR Material Adjective and form the corresponding adjective by adding a suffix, usually -en. Note that some of them use a variant of 'made of' and only resort to 'of' as a converter:

leathern adj (of leather) de or en cuir
ashen adj (of ashwood) en (bois de) frêne
waxen adj (of wax) de or en cire

Needless to say, the pairs of items are now treated in the database in exactly the same way as gold - golden or lead - leaden. Lexical Implication Rules will be dealt with in more detail in Chapter 11.
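The division of labour between a general rule and a list of exceptions can be sketched as follows. The table EN_DERIVED and the function material_modifiers are hypothetical, and the table only contains a few of the pairs quoted in this section. Following the passage cited from Ostler & Atkins, the sketch assumes that the bare noun remains available as a modifier even when a derived -en adjective is listed; this is an interpretation, not a claim about how the database is actually organized.

```python
# Listed exceptions: materials with a stored derived adjective (sample only).
EN_DERIVED = {
    "gold": "golden",
    "lead": "leaden",
    "wax": "waxen",
    "leather": "leathern",
}


def material_modifiers(material):
    """Forms usable to describe an object made of 'material'.

    General rule (LIR Material Adjective): the noun itself, as in
    'a chocolate bar'. Listed exception: a derived adjective, if any."""
    forms = [material]
    if material in EN_DERIVED:
        forms.append(EN_DERIVED[material])
    return forms


print(material_modifiers("chocolate"))  # ['chocolate']
print(material_modifiers("gold"))       # ['gold', 'golden']
```

Only the four pairs in the table need to be stored; every other material name is covered by the rule without any lexical entry of its own.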

7.7

Ai

This section deals with the LF Ai, which denotes the generic attribute of the ith participant of the situation. So far, I have been concerned with A0, i.e. the simplest form of semantic derivation. There are, however, many cases in which a participant (or actant) is involved (see my discussion of the relationship between stomach and ventral / potbellied on p.69). It is therefore important to identify which participant is being characterized by the value of Ai. Compare the following examples from Mel'čuk's French ECD (s.v. ventre and désespoir):

A0 (ventre) = ventral
A1 (ventre) = ventru
A1 (désespoir) = désespéré
A2 (désespoir) = désespérant, désespéré

Ventru clearly characterizes the possessor of that body part, hence the use of A1 to refer to the sole participant. The situation with désespoir is somewhat more complex since this lexeme involves two participants, who are specified as follows in the Government Pattern table and in the definition:

Désespoir de X devant Y: très forte émotion désagréable de X causée par le fait suivant: X, croyant très important d'échapper à un événement Y, n'espère plus pouvoir y échapper... (Mel'čuk et al. 1984:90, Vol.I)

1 = X          2 = Y
1. de N        1. de N
2. Aposs       2. à N
3. A           3. devant N
               4. de V-inf

As I have shown in section 5.3.3, each column represents one syntactic actant and each element in the columns represents a possible surface realization of the actant in question. 1 refers to the participant who experiences the feeling of despair while 2 is the event which causes that emotion. The following sentences may help to contrast these two actants (note the ambiguity of désespéré).

1. Le moral des troupes est au plus bas. Les hommes sont désespérés.
2. La situation est désespérante/désespérée.

Prima facie, the following data from the French ECD (Mel'čuk et al. 1984:103-104) may look odd because the numbers associated with the actants do not seem to correspond:

A1 (étonnement) = étonné
A2 (étonnement) = étonnant
A1 (étonner) = étonnant
A2 (étonner) = étonné

This seemingly inappropriate coding is justified, however, because the first actant of étonnement (the person who experiences this feeling) is actually the second actant of étonner. The two words share the same semantic roles but the ways in which these frames are realized differ and need to be accounted for. As I have already shown, a complete account of the lexical co-occurrence zone of an entry should go along with an exhaustive description of its government pattern and frame semantics. This was not feasible in the framework of this dissertation, however, which means that the assignment of the subscript number to refer to the nth participant is not explicitly accounted for and is not linked to any external description of the semantic roles. Consider the following two entries for unsuspicious:

a. unsuspicious adj (feeling no suspicion) peu soupçonneux, peu méfiant
b. unsuspicious adj (causing no suspicion) qui n'a rien de suspect, qui n'éveille aucun soupçon

Such entries have been reorganized as follows (see also p. 117):

a. AntiA1 (suspicion) = unsuspicious (peu soupçonneux, peu méfiant)
b. AntiA2 (suspicion) = unsuspicious (qui n'a rien de suspect...)

7.7.1

having + N

This converter is used to refer to a standard property of the first participant (usually the only one), hence the assignment of A1. The following entries are cases in point:

genial adj (having genius) génial
given adj (having inclination) adonné, enclin
ulcerous adj (having ulcers) ulcéreux

7.7.2

causing + N

Causing means that an external participant (a stimulus or an event) causes the first actant to be in a certain state:

anxious adj (causing anxiety) inquiétant, angoissant, alarmant
expansive adj (causing expansion) expansif
painful adj (causing physical pain) douloureux
pestilent adj (causing disease) pestilentiel
serious adj (causing concern) grave, sérieux
suspicious adj (causing suspicion) suspect, louche
ulcerous adj (causing ulcers) ulcératif
unsuspicious adj (causing no suspicion) qui n'a rien de suspect, qui n'éveille aucun soupçon

The database now contains the following facts:

A2 (anxiety) = anxious
A2 (concern) = serious
A2 (ulcer) = ulcerous
AntiA2 (suspicion) = unsuspicious

Note the recurrent use of the suffix -ous (defined appropriately in LDOCE as "causing or having the nature of"). It should also be noted that causing seems to be mainly associated with words that have a strong negative connotation.

7.7.3

lacking + Ν

Like having, lacking is used to convert a noun into an adjective, the only difference being that a negative element is added (Anti): dark adj (lacking light) obscur, noir ignorant adj (lacking education) ignorant intemperate adj (lacking moderation) immodéré

If there is a participant (actant) in the situation alluded to, A1 has to be assigned. Otherwise, A0 is used, which explains the difference in the database:

AntiA0 (light) = dark
AntiA1 (education) = ignorant
AntiA1 (moderation) = intemperate

7.7.4

who + V

Who typically exemplifies a converter which involves a participant. It is only used in the following example:

paying adj (who pays) payant

A1 (pay) = paying

The rarity of this formula is due to the fact that the morphological process involved here (V + ing → adj) is a highly regular one. Even if it is easy to describe in a decoding perspective, however, it does not apply in every case where it might, which may pose problems in a generation perspective.

7.7.5

without

Without can be used in combination with either a noun or a gerund. Depending on the part of speech of the headword under which it is found, it will lead to the assignment of the A_i or Adv_i LFs (Adv_i referring, as already explained on p.70, to the standard qualification, in adverbial form, of an action of the ith participant in a given situation). Unsurprisingly, this definition pattern is mostly associated with adjectives ending in -less:

artless adj (without guile) naturel, ingénu
bloodless adj (without blood) exsangue
bootless adj (without boots) sans bottes
listless adj (without energy) sans énergie, mou
heedlessly adv (without reflection) étourdiment, à la légère
heedlessly adv (without care) avec insouciance
silently adv (without speaking) silencieusement
modestly adv (without boasting) modestement

The noun or the verb which follows without is of course stripped of all affixes and reduced to its base form:

AntiA1 (boot) = bootless
AntiAdv1 (boast) = modestly
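The assignment just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the tuple layout and the hand-made `base_forms` table are hypothetical and are not part of the database described here; the reduction to the base form is simply looked up rather than computed.

```python
import re

# Hypothetical pattern: a micro-definition consisting of "without" + one word.
WITHOUT = re.compile(r"^\(without (?P<arg>[a-z]+)\)$")

def without_to_lf(headword, pos, microdef, base_forms=None):
    """Return an (LF, argument, value) triple, or None if the pattern fails."""
    match = WITHOUT.match(microdef)
    if match is None:
        return None
    # The part of speech of the headword selects the function (assumed labels).
    lf = {"adj": "AntiA1", "adv": "AntiAdv1"}.get(pos)
    if lf is None:
        return None
    arg = match.group("arg")
    # Base-form reduction is delegated to a hand-made table (an assumption).
    arg = (base_forms or {}).get(arg, arg)
    return (lf, arg, headword)

base_forms = {"boots": "boot", "boasting": "boast"}
print(without_to_lf("bootless", "adj", "(without boots)", base_forms))
print(without_to_lf("modestly", "adv", "(without boasting)", base_forms))
```

The lookup table stands in for the manual stripping of affixes; a real procedure would need a morphological component or lexicographic checking.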

7.8

Anti

The function Anti expresses the basic relation of antonymy. Predictably, the most frequent converter is the word not associated with adjectives or adverbs:

7.8.1

not

accidentally adv (not deliberately) accidentellement clean adj (not dirty) propre, net destructive adj (not constructive) destructif, destructeur easy adj (not difficult) facile Hundreds of antonyms can be located by searching for not at the very beginning of a microdefinition. Yet, this converter is not used with nouns, which means that other defining formulae come into play.

7.8.2

lack of

Consider the following examples:

ambiguity n (lack of clarity) ambiguïté, obscurité
distraction n (lack of attention) distraction, inattention
ease n (lack of difficulty) aisance, facilité

A number of antonyms are described with lack of, which makes it possible to enrich the database with the following records (sample only):

Anti (success) = failure
Anti (moderation) = intemperance
Anti (beauty) = plainness
Anti (experience) = rawness

Once again, some antonyms display interesting morphological features. The suffix -ness attached to some of the values above clearly refers to a quality or a condition (Anti (energy) = slowness and Anti (consideration) = thoughtlessness are cases in point).

7.8.3

as opposed to

Consider the following examples from CR:

apparent adj (not real) apparent, de surface
real adj (as opposed to apparent) véritable, vrai, réel

As opposed to clearly plays exactly the same role in the micro-definition as not. In the database, the subtle distinction, if any, has disappeared:

Anti (real) = apparent
Anti (apparent) = real

It should be stressed that as opposed to is not used here in its prototypical sense. More frequently, it introduces potential contrastive terms (see 7.9).

7.8.4

no longer

No longer is a variant of not, the only difference being that the former converter introduces a notion of time. Basically, though, the word introduced by no longer is an antonym of the headword: obsolete adj (no longer valid) dépassé, périmé retired adj (no longer working) retraité, à la retraite

7.9

Contr

Consider the following micro-definitions introduced by as opposed to, which indicates a contrastive item:

practice n (as opposed to theory) pratique
mind n (as opposed to matter) esprit
country n (as opposed to town) campagne
kind n (as opposed to money) nature
back n (as opposed to front) dos, derrière

The LF Contr is used to deal with such contrastive pairs, which are often classified as complementaries (dead vs. alive) or directional antonyms (up vs. down) by Cruse (1986):

Contr (theory) = practice
Contr (town) = country
Contr (money) = kind

Interestingly, as opposed to introduces the background context in which a word is used. Knowledge of contrastive terms is essential to perceive how a given word schematizes the world. Fillmore (1982a) gives a very interesting account of how the words land and ground differ in terms of their respective contrastive terms (land as opposed to sea; ground as opposed to air). The two words are used to identify the same concept (the dry surface of the earth). The main point, Fillmore argues, is that they differ "in how they situate that thing in a larger frame. It is by our recognition of this frame contrast that we are able to understand that a bird that 'spends its life on the land' is being described negatively as a bird that does not spend any time in water; a bird that 'spends its life on the ground' is being described negatively as a bird that does not fly" (1982a: 121). Similarly, if I get paid in kind for, say, a translation, it boils down to saying that I do not receive any money for the service I have rendered. In the examples above, as opposed to introduces the range of contexts in which the headword can be used and specifies the 'frame' which is applied by using that very word. Contrastive terms can also be introduced by opp of, which functions as a variant of as opposed to:

land n (U: opp of sea) terre
no part (opp of yes) non

7.10

Son

The Son LF refers to the typical cry or noise made by something or someone. The literature tends to focus on typical cries made by animals (frogs croak, dogs bark, elephants trumpet...). It should not be forgotten, however, that many objects are apt to make noises which are denoted by specific lexical items (bells ring; thunder booms, peals, grumbles, mutters, rolls or roars; a clock ticks...). The Son relation is, most of the time, left implicit in the dictionary entry, which means that the semi-automatic assignment of the function is practically impossible. Defining formulae are used in only 5 cases, of which 4 resort to the converter sound of.

7.10.1

sound of/noise of

crunch n (sound of teeth) coup de dents
puff n (sound of engine) teuf-teuf
wash n (sound : of waves) clapotis
zip n (sound of bullet) sifflement
zing n (noise of bullet) sifflement

Several remarks ought to be made here. First, it should be noted that the defining formula examined here points to the noun which expresses a sound. This converter then points to a complex lexical function since S0 has to be used in conjunction with Son: S0Son (engine) = puff. Second, the different treatment of the Son relation in the entries of zip and zing (sound vs. noise) is no longer an obstacle since the collocation is now represented consistently as S0Son (bullet) = zip, zing. Third, the entry for wash above shows how difficult the automatic processing of micro-definitions can be. The colon separating the word sound and the of-phrase complicates the analysis of the defining material since it hinders the use of a pattern-matching procedure (see my remark on p.116). The noun sound is here no longer used as a defining formula but as a hyperonym (a wash ISA sound). Scores of items pattern in a similar fashion and are "defined" in terms of this hyperonym used in isolation. This information can only be exploited in our database if the entry contains an explicit reference, in square brackets, to the entity (person, animal or thing) which makes the sound in question, as in the following examples:

blast n (sound) [bomb] explosion
drone n (sound) [bees] bourdonnement
pipe n (sound) [bird] chant
pop n (sound) [cork etc] pan

Sound is then used here as a clue to the appropriate lexical function (S0Son) that should be assigned to capture the relationship between the bracketed material (bomb, bee...) and the headword.
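The pattern-matching difficulty raised by the colon in the entry for wash, and the alternative clue offered by bracketed material, can be illustrated with a small Python sketch. The two regular expressions and the function name are assumptions made for the purpose of the example, not the procedure actually used to build the database; the source noun is returned as found, without reduction to its base form.

```python
import re

# "(sound of X)" or "(noise of X)", tolerating a stray colon as in "(sound : of waves)".
FORMULA = re.compile(r"^\((?:sound|noise)\s*:?\s*of (?P<source>[a-z]+)\)")
# "(sound) [X ...]": the hyperonym used alone, with the source in square brackets.
HYPERONYM = re.compile(r"^\(sound\)\s*\[(?P<source>[a-z]+)")

def son_triple(headword, rest_of_entry):
    """Return ('S0Son', source, headword) when one of the two clues is found."""
    for pattern in (FORMULA, HYPERONYM):
        match = pattern.match(rest_of_entry)
        if match:
            return ("S0Son", match.group("source"), headword)
    return None

print(son_triple("wash", "(sound : of waves) clapotis"))
print(son_triple("zing", "(noise of bullet) sifflement"))
print(son_triple("pop", "(sound) [cork etc] pan"))
```

Making the colon optional is enough for this one entry, but each such relaxation has to be added by hand, which is precisely why the procedure remains semi-automatic.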

7.11

Incep

Incep is the LF which yields inchoative verbs, i.e. verbs referring to the beginning of a change of state. Very much like Caus, it is often associated with Pred to form a complex lexical function. In terms of semantic primitives, the lexical representation of inchoative verbs always includes the predicate BECOME. Like CAUS, BECOME should not be confused with the English verb 'to become'; other verbs can function as the surface realization of this primitive, which means that the CR micro-definitions are going to make use of various defining formulae.

7.11.1

become

The most frequent formula is "become" used in combination with an adjective. If the adjective is used in the comparative form, the functions Plus or Minus will also be used, depending on the orientation of the comparison. The following examples illustrate the various cases: appear vi (become visible) apparaître, se montrer change vi (become different) changer, se transformer cool off vi (become less angry) se calmer, s'apaiser fill out vi (become fatter) forcir, grossir

These examples now appear as follows in the database:

IncepPred (visible) = appear
IncepPred (different) = change
IncepPredMinus (angry) = cool off
IncepPredPlus (fat) = fill out

Of course, as is also the case with Cause+Adj, the adjective has to be reduced to its base form by stripping affixes or deleting more/less.
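The mapping from a become-definition to a simple or complex LF can be sketched as follows. The function name and the `base_forms` table of comparatives are hypothetical; the table replaces genuine affix stripping, and any comparative listed in it is assumed to point in the Plus direction.

```python
import re

# Hypothetical pattern: "(become ...)" followed by one or two lower-case words.
BECOME = re.compile(r"^\(become (?P<rest>[a-z ]+)\)$")

def incep_triple(headword, microdef, base_forms=None):
    """Return an (LF, adjective, headword) triple, or None if no match."""
    base_forms = base_forms or {}
    match = BECOME.match(microdef)
    if match is None:
        return None
    words = match.group("rest").split()
    # Analytic comparative: delete more/less and record the orientation.
    if len(words) == 2 and words[0] in ("more", "less"):
        lf = "IncepPredPlus" if words[0] == "more" else "IncepPredMinus"
        return (lf, words[1], headword)
    if len(words) == 1:
        adjective = words[0]
        # Synthetic comparative (fatter): reduce it via the hand-made table.
        if adjective in base_forms:
            return ("IncepPredPlus", base_forms[adjective], headword)
        return ("IncepPred", adjective, headword)
    return None

print(incep_triple("cool off", "(become less angry)"))
print(incep_triple("fill out", "(become fatter)", {"fatter": "fat"}))
print(incep_triple("appear", "(become visible)"))
```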

7.11.2

grow

The verb grow expresses the same relationship as become. It is most of the time used with an adjective but it may also be found in combination with verbs (consider become below):

become vi (grow to be) devenir, se faire
fail vi (grow feeble) faiblir, baisser
sober down vi (grow sadder) être dégrisé

The micro-definition for sober down is questionable since it normally means that someone becomes serious or thoughtful, but definitely not sad.

7.11.3

get

The verb get can also be used as an inchoative operator. It is then associated with an adjective, as in the following examples: shrink vi (get smaller) rétrécir come round vi (get better) se rétablir, se remettre thaw vi (get warmer) (se) dégeler The use of an irregular comparative form (better) in the second example means that the adjective must be manually reduced to its base form when enriching the database with LF information: IncepPredPlus (good) = come round

7.11.4

turn

The verb turn is only used once in combination with an adjective:

blench vi (turn pale) pâlir, blêmir

7.12

Smod

Smod refers to the typical noun for the mode pertaining to a given situation. The typical example cited in the MTT literature is: Smod (write) = handwriting.

7.12.1

way of + V-ing

Way of acts as a converter to turn a verb into a noun. In the database, the verb is reduced to its base form and appears as the argument of the LF Smod:

accent n (way of speaking) accents accent, paroles
address n (way of speaking) conversation
address n (way of behaving) abord
dress n (way of dressing) tenue, mise
idea n (way of thinking) façon de penser, conception

7.13

Able

Able_i can be used with the verb argument to mean that the action denoted by this verb is possible. The subscript refers to the actant of the verb: Able1 (V) means 'such that it can V' while Able2 means 'such that one can V it'. Although this function is used in several cases in the database, the definition patterns that allow it to be detected semi-automatically are limited to the following five cases:

7.13.1

meant to

deceptive adj (meant to deceive) trompeur, mensonger, fallacieux
inedible adj (not meant to be eaten) non comestible

The two examples differ with respect to the actant that plays a role in the situation described. In the former example, the thing which is qualified as deceptive is the first actant of the verb deceive while in the latter example, the foodstuff which is described as being inedible is the second actant of the verb eat (hence the use of a passive voice). In terms of LFs this information is now represented as follows:

Able1 (deceive) = deceptive
AntiAble2 (eat) = inedible

At this juncture, it may be interesting to compare the sense of deceptive illustrated above with another sense which uses a slightly different micro-definition:

deceptive adj (liable to deceive) trompeur, illusoire

While the converter meant to clearly indicates a standard property of the adjective, liable to indicates a typical property of the actant which is likely to entail its participation. The latter definition pattern actually corresponds to the Qual_i lexical function since it entails Able1 with a high degree of probability:

Qual1 (deceive) = deceptive

The Qual LF is not treated in greater detail here because it cannot be assigned on the basis of other defining formulae.

7.13.2

fit to

Consider the following examples:

eatable adj (fit to eat) mangeable, bon à manger
inedible adj (not fit to be eaten) immangeable

As is shown in the former example, Able2 need not be associated with a passive voice. This also means that it would be extremely difficult to assign the appropriate function and subscript without manual intervention. Resolving the ambiguity in 'fit to eat' (Able1 vs. Able2) basically boils down to resolving an ambiguity of deep structure such as the one cited by Winograd (1984): 'The chickens are ready to eat'. For the examples given above, the database contains the following facts:

Able2 (eat) = eatable
AntiAble2 (eat) = inedible

7.13.3

can be

This definition pattern is used in three cases and, predictably, is associated with the suffix -able attached to the root of a verb. Since the word following the converter is a past participle, the LF Able2 is assigned to the resulting pair of items. One can also note that the converter can also be preceded by 'which', as in the first example below:

distinguishable adj (which can be differentiated) qui peut être distingué, qu'on peut distinguer
negotiable adj (can be sailed) navigable
negotiable adj (can be crossed) franchissable

7.13.4

able to

This definition pattern, used in the three examples below, is naturally associated with the LF Able_i. The specification of the subscript depends on the voice of the verb which follows this formula:

accessible adj (able to be influenced) ouvert, accessible
literate adj (able to read) qui sait lire et écrire
visible adj (able to be seen) visible

These entries form the basis of the following triples:

Able2 (influence) = accessible
Able1 (read) = literate
Able2 (see) = visible

7.13.5

capable of

Two words resort to this definition pattern to account for the Able LF:

knockdown adj (capable of being taken apart) démontable
expansive adj (capable of expanding) expansible, dilatable

This type of definition pattern being followed by an -ing form, it is essential to reduce the verb to its base form when modifying the database and deleting the converter. As in the other categories above, the choice of the subscript depends on the semantic role played by the actant (agent vs. patient), which is represented as follows:

Able2 (take apart) = knockdown
Able1 (expand) = expansive
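For the two formulae able to and capable of, where the voice of the following verb decides the subscript, the assignment can be sketched in Python as below. The function name and the `base_forms` table are hypothetical. The sketch deliberately leaves out fit to, since, as noted above, 'fit to eat' cannot be resolved from the surface form alone.

```python
import re

# Hypothetical pattern covering "able to ..." and "capable of ...".
ABLE = re.compile(r"^\((?:able to|capable of) (?P<rest>[a-z ]+)\)$")

def able_triple(headword, microdef, base_forms=None):
    """Return (Able1|Able2, verb, headword), or None if the pattern fails."""
    base_forms = base_forms or {}
    match = ABLE.match(microdef)
    if match is None:
        return None
    words = match.group("rest").split()
    # A passive ("be seen", "being taken apart") points to the second actant.
    passive = words[0] in ("be", "being") and len(words) > 1
    verb = " ".join(words[1:] if passive else words)
    # Base-form reduction via a hand-made table (an assumption).
    verb = base_forms.get(verb, verb)
    return ("Able2" if passive else "Able1", verb, headword)

base_forms = {"influenced": "influence", "seen": "see",
              "taken apart": "take apart", "expanding": "expand"}
print(able_triple("accessible", "(able to be influenced)", base_forms))
print(able_triple("knockdown", "(capable of being taken apart)", base_forms))
print(able_triple("expansive", "(capable of expanding)", base_forms))
```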

7.14

Perf

Perf (=perfective) is used to refer to a completed action, meaning that the process has been carried through to its natural limit. This function is most often used in combination with other LFs. The argument is typically a verb, as in the following example:

S1Perf (divorce) = divorcee

The MTT literature only mentions examples combining Perf with S1 (referring to the first actant) but it is sensible to think that it can also be combined with S2 (the second actant).

7.14.1

thing + Ven

The following examples all contain a two-word definition in which the word thing is followed by a past participle:

composition n (thing composed) composition, oeuvre
deletion n (thing deleted) rature
forgery n (thing forged) faux
pawn n (thing pledged) gage, nantissement
remembrance n (thing remembered) souvenir, mémoire

All these verbs in the micro-definition subcategorize for an object (hence the use of a past participle), i.e. the second actant in the situation. This information is now represented as follows in the database:

S2Perf (compose) = composition
S2Perf (forge) = forgery
S2Perf (pledge) = pawn

Of course the verb has to be reduced to its base form.

7.15

Magn//

Fused expressions are used when a single lexical unit covers both the meaning of the function itself and that of its argument (Mel'cuk & Zholkovsky 1988:64). The symbol // is used to show that the fused value can be substituted for the argument, as in: Magn (wind)= // hurricane

The examples cited in the literature usually involve nouns, but the entries below clearly show that CR contains mainly instances of the Magn fused function applied to adjectives or adverbs (very modifying the latter):

annoying adj (very irritating) ennuyeux, fâcheux
arctic adj (very cold) glacial
blatant adj (very obvious) criant, flagrant
desperate adj (very bad) atroce, abominable
shockingly adv (very badly) très mal, odieusement

In the database, the symbol // is put next to the lexical function to indicate that the argument and the value are in a paradigmatic relation with each other:

Magn // (irritating) = annoying
Magn // (badly) = shockingly

Identifying the word very as the first element of an italicized string of words in parentheses then makes it possible to assign this LF semi-automatically.

7.16

Centr

The Centr LF refers to the centre of something. When one considers the following two entries, excerpted from the same page of the printed dictionary, one may reasonably wonder why two different defining formulae are used to convey the same type of relationship:

midsummer n (height of summer) milieu or coeur de l'été
midwinter n (heart of winter) milieu or fort de l'hiver

One might argue that the choice of the lexicographer is probably semantically motivated and can be accounted for in terms of orientational metaphors such as those described by Lakoff & Johnson (1980). The latter argue that these metaphors organize a whole system of concepts with respect to one another, giving the concepts a spatial orientation (p. 14). Their favorite examples are the metaphors HAPPY IS UP and SAD IS DOWN, which account for expressions such as I'm feeling up today, That boosted my spirits, or I fell into a depression. The use of height in combination with summer is consistent with Lakoff & Johnson's observation that an UP orientation is given to general well-being (health, life, status, warmth, etc). Conversely, winter is associated with death, cold, disease and lots of other negative connotations. This might explain why the lexicographer chose to use heart instead of height to define midwinter. This explanation is hardly verifiable, of course, and may be considered a moot point because the heart is, in other contexts, associated with life and health. There must be an explanation, however, to account for what seems to be an inconsistency in the treatment of these two similar items. In the database, this distinction is suppressed since we are primarily interested in the relationship between summer/winter and midsummer/midwinter. Consequently, the database contains the following records:

Centr (summer) = midsummer
Centr (winter) = midwinter

The link between heart and winter on the one hand and between height and summer on the other is not irretrievably lost, however, since these very combinations are nested in the examples of the dictionary (s.v. height and heart). These examples clearly show that the link may simply be collocational, but it is difficult to discard a metaphorical explanation. The relationship between lexical functions and metaphors is the topic of Chapter 13 and is discussed in greater detail there.

7.17

Conclusion

The following table lists the various defining formulae described in this chapter and indicates which lexical functions they usually express.

(Micro-)Definition Pattern    Lexical Function

able to                       Able1
act of                        Sing
allow to                      Perm
as opposed to                 Anti
become + Adj                  IncepPred
body of                       Mult
can be + Ven                  Able2
capable of                    Able1,2
cause to + V                  Caus
causing + N                   A2
cluster of                    Mult
concerning                    A0
fit to                        Able1,2
flash of                      Sing
for + V-ing                   Sinstr
get + Adj                     IncepPred
group of                      Mult
grow + Adj                    IncepPred
having + N                    A1
lack of                       Anti
lacking N                     AntiA0,1
list of                       Mult
made of                       A0
make + Adj                    CausPred
meant to                      Able1
member of                     Sing
number of                     Mult
of a / of the                 A0
piece of                      Sing
series of                     Mult
set of                        Mult
sound of                      S0Son
thing + Ven                   S2Perf
turn + Adj                    IncepPred
way of V-ing                  Smod
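A table of this kind lends itself to a simple lookup. The Python sketch below takes a handful of rows from the table and returns the candidate LF suggested by the formula that opens a micro-definition. The dictionary name, the function name and the choice of rows are assumptions made for illustration; the result is only a candidate, to be confirmed by the lexicographer.

```python
# A few rows of the table above, stored as formula -> usual LF (illustrative subset).
FORMULA_TO_LF = {
    "allow to": "Perm",
    "body of": "Mult",
    "cluster of": "Mult",
    "flash of": "Sing",
    "piece of": "Sing",
    "lack of": "Anti",
    "made of": "A0",
    "concerning": "A0",
    "way of": "Smod",
}

def candidate_lf(microdef):
    """Return the LF usually signalled by the opening formula, or None."""
    text = microdef.strip("()").strip()
    # Try longer formulae first so that a short one cannot shadow a longer one.
    for formula in sorted(FORMULA_TO_LF, key=len, reverse=True):
        if text == formula or text.startswith(formula + " "):
            return FORMULA_TO_LF[formula]
    return None

print(candidate_lf("(lack of clarity)"))
print(candidate_lf("(way of speaking)"))
print(candidate_lf("(heart of winter)"))
```

Formulae whose subscript depends on the context (capable of, fit to, lacking) are left out of the subset on purpose, since a plain lookup cannot choose between the alternatives.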

8 A few suggestions towards the creation of additional functions

8.1

Introduction

I have shown in the preceding chapter that the micro-definitions used in CR entries often contain defining formulae (converters) which can be used as potential clues to assign the appropriate lexical function to pairs of lexical items. It has already been said that the core of Mel'cuk's Explanatory Combinatory Dictionary consists of a list of around 60 lexical functions. This does not mean, of course, that Mel'cuk's apparatus is meant to cover the whole range of existing lexical-semantic relations. Mel'cuk himself is aware of this, which is why he actually uses the term standard lexical function. As is argued in section 5.3.4, for a lexical function to be qualified as standard, two conditions must be met (the formulation is adapted from Mel'cuk 1997 and is reproduced here for the sake of clarity):

1. The lexical function f is defined for a relatively large number of arguments, which means, in other words, that f has a vast semantic co-occurrence;
2. The lexical function f has a relatively large number of lexical expressions as its possible values.

The lexical functions will be called non-standard if these conditions are not met. To these criteria, Gentilhomme (1992:114) adds two minor factors that should enable us to determine whether a given LF is standard or not:

3. The principle of economy makes it possible to determine a minimum number of functions (this very principle accounts for the union of two more specific relations, as in Cry ∪ Noise = Son; see above p.88).
4. Multilingualism is definitely a non-essential criterion but it may be interesting to select lexical functions which are used in several languages.

The additional conditions above are meant to prevent an explosion of lexical functions and aim at rationalizing the lexicographer's work while ensuring useful generalizations.
Of course, we should not lose sight of the fact that the 60-odd LFs can be combined to form complex lexical functions, which means that the potential number of relations covered by Mel'cuk's apparatus is theoretically unlimited. In practice, however, lexicographers or linguists may feel tempted to invent new lexical functions because the material they deal with does not always fit in nicely with the predefined set of functions. Consider the collocations of hair as extracted from the CR dictionary. Roughly speaking, two categories can be distinguished: A. colour adjectives (exhaustive list from CR) black, brown, brunette, carroty, dark, darkish, fair, flaxen, ginger, grey, greyish, grizzled, grizzly, hoary, light, red, snowy, unbleached, uncoloured, white, yellow.

B. miscellaneous (sample list) bristly, brushy, crinkly, crisp, curly, flowing, fluffy, frizzly, fuzzy, glossy, kinky, straight, thick, thin, tousled...

It is tempting to introduce a COLOUR lexical function, even though it does not meet the requirements formulated above. Mel'cuk (personal communication) argues that such a function is not general enough because colours can only apply to three parts of the body: eyes, hair (including beard) and skin. The number of potential arguments being limited, criterion 1 above is violated, which means that the Colour relation cannot be considered as a standard LF. Despite this dubious claim, I have decided to capture this relationship thanks to the apparatus of subscripted modifiers. Information about the typical colour of the keyword is therefore represented in the database as follows:

Qual0color (cheek) = pink
Qual0color (complexion) = cadaverous, chalky, coloured, dark, doughy, dusky, fair, florid, green, grey, high...
Qual0color (curl) = yellow
Qual0color (hair) = black, brown, brunette, carroty, dark...

As to the second category of collocations, further distinctions should be made to ensure that the collocations are described appropriately. For example, the "frame" (i.e. theoretical situation) of the lexeme should be taken into account. As a matter of fact, the lexeme hair typically appears in a grooming situation (at the hairdresser's, for example) or when we refer to its health (in anatomy or physiology). This frame should therefore be mentioned when specifying the lexical function, which only applies when one considers this particular situation. Such refinements are generally indicated through the use of subscripted modifiers. For example, the adjective tousled refers to hair that is not combed properly in a grooming situation, which is rendered as follows:

AntiVer [grooming] (hair) = tousled...

The Magn function can also be refined to denote quantitative importance:

Magn [quant] (hair) = luxuriant, opulent, thick

as opposed to the "default" value of Magn:

Magn (hair) = long

In the remainder of this chapter, I wish to turn to a number of relations which are not accounted for in Mel'cuk's theory but are nevertheless present in the dictionary.

8.2

Unit

Consider the following English sentence and its translation (both from CR, s.v. cover).

- We covered 8 km in 2 hours
- Nous avons parcouru or couvert 8 km en 2 heures.

The direct object (8 km) could be replaced by numerous other noun phrases typically, though not exclusively, consisting of a figure and a noun such as meter, mile, yard, etc. This property is rendered as follows in the dictionary:

cover vt distance parcourir, couvrir

It should be realized that distance does not obligatorily stand for the string d-i-s-t-a-n-c-e, unlike the following two entries in which the base of the collocation is to be taken literally:

long adj dress, hair, rope, distance, journey long
mean adj distance, temperature, price moyen

The problem with the entry cover is that the italicized item can point to a collocation proper (the sentence 'we covered the distance in 2 hours' is perfectly well formed) or to a whole thesauric class of words which could be described as [+UNIT] nouns. Such information ought to be stored in the lexicon, not necessarily in the form of a semantic feature à la Katz & Fodor but rather as word-relation-word triples like:

kilometer UNIT distance
mile UNIT distance
day UNIT time
pound UNIT currency

Klavans et al. (1993:127) have tried to extract potential UNIT relations from Webster's Seventh New Collegiate Dictionary (W7). Very much like Ahlswede & Evens (1988a), they search for defining formulae which they use as clues to the UNIT relation. They show that the use of 'unit' in a defining structure such as 'unit of N2' is ambiguous because it can refer to three different entities (the syntax of W7 definitions can be tapped in order to solve these ambiguities):

a. units as amounts (carat = unit of weight)
b. units as subdivisions (legion = ...unit of the Roman army)
c. units as individuals (distich = ...unit of two lines).
The structure above can also be used in CR but only yields poor results, as shown below:

day n (unit of time: 24 hours) jour
point n (counting unit: Scol, Sport, St Ex; also on scale) point
rand n (monetary unit) rand
wing n (air force unit) groupe (de deux ou de plusieurs escadrilles).

Day, point and rand all exemplify the unit-as-amount sense (with rand illustrating the subclass of currencies). Wing exemplifies the unit-as-subdivision sense. 'Unit' in italics is underrepresented in CR, which means that other clues have to be used to detect the information we are interested in. Klavans et al., who had also tried to apply their algorithms to LDOCE, had been struck by the few occurrences of unit: "Since UNIT is not part of LDOCE's controlled defining vocabulary, the UNIT relation is often expressed by 'measure' as in 'nautical mile: measure of distance'" (1993:130). This type of definition pattern is hardly ever used in CR, however, which does not mean that potential [+UNIT] nouns cannot be extracted. In fact, the only valid clue seems to be the use of unit or measure in parentheses, as in the following examples:

bushel n (measure) boisseau
cable n (Naut: measure) encablure
chain n (measure) chaînée
foot n (measure) pied (anglais)
gill n (measure) quart de pinte
horsepower n (unit) cheval-vapeur

Sadly enough, the examples above are useless in a relational perspective since no information is given on the sub-class of unit these headwords refer to (distance, time, etc). Ideally, the information in the database should be stored in the following way, assuming that the UNIT relation behaves very much like one of Mel'cuk's LFs:

Unit (distance) = cable, chain, foot...
Unit (currency) = dollar, pound, rand...
Unit (liquid) = gill

Unfortunately, the information offered by CR is too limited to be able to capture this data.
The only case for which such information is partially available is the following one:

g n (Phys: gravity, acceleration) g

The relationship which holds between the headword and the italicized items is clearly the UNIT relation, which is now represented as follows in the database:

Unit (gravity) = g
Unit (acceleration) = g

Needless to say, the identification and assignment of the relation must be made manually by the lexicographer since this information is left implicit in the dictionary entry. To turn to a slightly different, though related, point, I wish to argue that being able to identify currencies is also crucial if one wants to select the appropriate reading in the following examples:

recover vi [the economy, the dollar] se rétablir
strong adj (Econ) the pound, dollar solide
weak adj (Econ) the pound, dollar faible
weaken vt (Econ) the pound, dollar affaiblir, faire baisser

Like distance above (s.v. cover), the noun dollar can of course be considered as a collocation of the four headwords listed above. More importantly, its presence is mainly meant to indicate that the verb recover requires a [+CURRENCY] subject noun or that the adjectives strong and weak typically modify such a noun. It should be realized that dollar is NOT the head of a thesauric class but rather a prototypical example (like pound) which enables the lexicographer to save space while capturing what is in fact a selectional restriction. The human reader knows that dollar can be replaced by yen, Deutsche Mark, Ecu or Euro but any lexical database to be used in NLP ought to be able to make such inferences. In this respect, the data offered by CR is of little or no use at all, which is definitely one of the major limitations of a method which would concentrate exclusively on information extracted from a single source.
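The kind of inference at stake can be made concrete with a short Python sketch. It assumes, purely for the sake of illustration, that Unit records of the ideal kind discussed above were available from some source; the dictionary `UNIT`, the restriction table and both function names are hypothetical.

```python
# Hypothetical Unit records, in the ideal format discussed above.
UNIT = {
    "distance": {"cable", "chain", "foot"},
    "currency": {"dollar", "pound", "rand"},
    "liquid": {"gill"},
}

# Hypothetical selectional restrictions for the economic readings above.
REQUIRES = {"recover": "currency", "strong": "currency", "weak": "currency"}

def is_unit_of(noun, unit_class):
    """True if the noun is recorded as a value of Unit (unit_class)."""
    return noun in UNIT.get(unit_class, set())

def reading_applies(headword, noun):
    """True if the noun satisfies the restriction attached to the headword."""
    unit_class = REQUIRES.get(headword)
    return unit_class is not None and is_unit_of(noun, unit_class)

print(reading_applies("strong", "rand"))
print(reading_applies("strong", "foot"))
```

With such records, rand is accepted although only pound and dollar are printed in the entry, which is the generalization a prototypical example cannot provide by itself.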

8.3

Part

Mel'cuk does not consider the part-whole relationship as a lexical function because it is essentially semantic, insofar as it relates to world knowledge, and not lexical. I have introduced the Part lexical function in the database to account for this relation which Mel'cuk considers to be part of the encyclopedic description of the lexeme. Apresyan et al. argue that it belongs to the 'lexical universe' of the keyword, which they define as 'an informal description of a sufficiently broad 'piece' of reality including the given situation as a constituent element' (1969:19). Although I share Mel'cuk's claim that the part-whole relation is purely semantic and not lexical, I tend to agree with Grimes that "making lexical functions of them [i.e. of part-whole relations -T.F.] allows them to be accessed by the function mechanism, which may be an advantage in information management" (1990:358). Since part-whole relations are a sub-class of collocational relations, making them accessible in a collocational database is of paramount importance if one wants to use that database in applications such as language teaching, information retrieval or artificial intelligence. Ahlswede & Evens (1988b), for instance, have developed a lexicon for a medical expert system which makes extensive use of the PART relation to account for the relation between, say, cerebellum and brain. Iris et al. (1988b) refine the notion of part-whole relation and argue that this relationship actually covers four major sub-types:

a) functional parts: discrete entities composing a whole (wheels, pedals in a bicycle)
b) segmented wholes: a continuous whole is segmented into portions (lap is a segment of a journey)
c) collection-member relations: aggregate of objects (a gaggle of geese)
d) set-subset: (the set of foodstuffs contains the subset of edible fruit).

Iris et al. (1988b:264) asked Mel'čuk why the part-whole relation did not appear in the basic list of lexical functions and Mel'čuk replied that he felt it was too general to be useful. The four sub-categories above indeed show that some standard LFs actually cover some facets of the part-whole relation (Mult and Sing would then be very specific instances of it, as shown in the example illustrating sub-class (c) above). In the database derived from CR, I have deliberately restricted the use of the Part LF to functional components (the part being viewed as a functioning unit in a whole). Specific individuation and collective nouns (a grain of rice, a bunch of pebbles) are excluded from the part-whole relation and are accounted for in terms of the Sing and Mult LFs respectively (so are collection-member relations). Roughly speaking, then, the Part "function" has only been assigned to deal with the relation which holds between, for example, the noun brake on the one hand and the functional components brakes are made of, such as drags, drums, linings, shoes... on the other.

The PART relation may be expressed in a number of ways. In their work on W7, Iris et al. (1988b) have identified a large number of clues (converters and defining formulae) to detect potential part-whole relations in dictionary definitions. Most of the converters discussed in their paper (section of, division of, comprising, including...) are never used in CR. Other devices are used by lexicographers, however, as shown in the following examples:

inner [shoe] inner sole semelle intérieure
sole n [shoe] semelle
insole (part of shoe) première

The brackets represent the traditional N1+N2 collocations (the sole of a shoe) where N1 is the part and N2 the whole. The third example illustrates the use of the most frequent defining formula, viz. part of.

8.3.1 part of

Around 50 micro-definitions resort to this defining formula to account for the part-whole relationship. Typical examples are:

chorus n (part of song) refrain
gas ring n (part of cooker) brûleur
gauntlet n (part of glove) crispin
icebox n (part of refrigerator) compartiment à glace, freezer
presbytery n (part of church) choeur
vestry n (part of church) sacristie

The examples above clearly show that the part-whole relationship is a one-to-many relation, which is also why Mel'čuk does not consider it as a lexical function. Applying the Part function to a given argument may yield an indefinite number of values. Part (church) = presbytery, vestry is a case in point since presbytery and vestry are by no means interchangeable or synonymous. In an MTT-based generation system such as the one described by Mel'čuk & Polguère (1987), the Part relation could not be used as a basic device for generating paraphrases, unlike standard lexical functions. One cannot deny, however, that this relation is crucial for information retrieval or pedagogical purposes.
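The one-to-many character of Part can be made concrete with a small sketch. It is only an illustration under stated assumptions: the table name lf, its three columns and the helper names are hypothetical and do not reproduce the actual structure of the CR database; the entries are the six micro-definitions quoted above.

```python
import re
import sqlite3

# (headword, italicized micro-definition) pairs quoted in the text.
ENTRIES = [
    ("chorus", "part of song"),
    ("gas ring", "part of cooker"),
    ("gauntlet", "part of glove"),
    ("icebox", "part of refrigerator"),
    ("presbytery", "part of church"),
    ("vestry", "part of church"),
]

# The defining formula "part of X"; X is taken to be the whole.
PART_OF = re.compile(r"^part of (?P<whole>.+)$")


def part_triples(entries):
    """Yield (LF name, argument, value) triples for 'part of X' definitions."""
    for headword, microdef in entries:
        match = PART_OF.match(microdef)
        if match:
            yield ("Part", match.group("whole"), headword)


# Hypothetical schema: one row per triple, so an argument may have many values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lf (name TEXT, argument TEXT, value TEXT)")
conn.executemany("INSERT INTO lf VALUES (?, ?, ?)", part_triples(ENTRIES))


def part(whole):
    """Return every value recorded for Part(whole), in alphabetical order."""
    rows = conn.execute(
        "SELECT value FROM lf WHERE name = 'Part' AND argument = ? ORDER BY value",
        (whole,),
    )
    return [value for (value,) in rows]


print(part("church"))  # ['presbytery', 'vestry']
```

Storing one row per triple, rather than one row per argument, is what allows Part (church) to return two non-synonymous values.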

8.3.2 branch of

This formula is only used in three cases and refers to a unit viewed in the context of a broader organization or system:

arm n (branch of military service) arme
discipline n (branch of knowledge) discipline, matière
judiciary n (branch of government) pouvoir judiciaire

8.4 Child/Parent

Child is the relationship which holds between sheep and lamb, swan and cygnet or cat and kitten. Parent is the inverse relation. Strangely enough, Mel'čuk does not grant this relation LF status, perhaps because it is a specific case of converseness. Yet, it does not seem to violate the criteria listed at the beginning of this chapter since Child applies to a fairly large number of arguments or values (even if the list mainly consists of animals). Ahlswede & Evens (1988a:215) show how this type of lexical relation can be used in a relational thesaurus for a question-answering system or for information retrieval. In pedagogical applications, this relation is also undeniably useful and many primary-school exercises specifically resort to it in an L1 vocabulary-acquisition perspective (e.g. What do you call the young of a sheep/ a young dog...?). In some cases, the relationship is left implicit and has to be decoded by hand as in the following examples (note the inconsistent use of parentheses or square brackets):

foal n (horse) poulain; (donkey) ânon
calf n [elephant] éléphanteau; [deer] faon; [whale] baleineau; [buffalo] buffletin

In other cases, the relation is signalled through the use of young in the micro-definition. Young then functions as a converter and immediately precedes the parent:

calf n (young cow or bull) veau

In the CR database, the conjunction or was used to split this micro-definition into two new records:

calf n (young cow) veau
calf n (young bull) veau

After reorganization of this data, the triples now appear as follows:

Child (horse) = foal
Child (donkey) = foal
Child (cow) = calf
Child (bull) = calf
Child (elephant) = calf

It should be pointed out here that it would be extremely dangerous to try and assign the Child relation in a purely automatic way based on the detection of the converter young. Consider the following entries:

infant n (young child) enfant en bas âge, petit enfant
maid n (young girl) jeune fille
partridge n (young bird) perdreau

On the face of it, the micro-definitions resemble the material found in the entries for calf. It goes without saying, however, that an infant is not the child of a child, nor is a maid to be viewed as the child of a girl! This provides ample evidence that extreme caution is required in assigning lexical functions.
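The danger can be illustrated with a minimal sketch of the splitting step followed by naive detection of the converter young. The function names are hypothetical and the code is not the procedure actually used to build the database; it merely shows why the output can only be treated as a list of candidates awaiting manual validation.

```python
import re

# The converter "young" immediately precedes the parent term(s).
YOUNG = re.compile(r"^young (?P<parents>.+)$")


def split_on_or(microdef):
    """Split 'young cow or bull' into ['young cow', 'young bull']."""
    match = YOUNG.match(microdef)
    if not match:
        return [microdef]
    return [f"young {parent}" for parent in match.group("parents").split(" or ")]


def child_candidates(headword, microdef):
    """Return candidate Child triples; none of them is validated yet."""
    candidates = []
    for record in split_on_or(microdef):
        match = YOUNG.match(record)
        if match:
            candidates.append(("Child", match.group("parents"), headword))
    return candidates


print(child_candidates("calf", "young cow or bull"))
# [('Child', 'cow', 'calf'), ('Child', 'bull', 'calf')]

print(child_candidates("infant", "young child"))
# [('Child', 'child', 'infant')]   spurious: has to be rejected by hand
```

The second call returns a triple that is formally identical to the correct ones, which is exactly why the assignment cannot be left to pattern matching alone.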

8.5 Male/Female

Male and Female are two lexical relations which, like Child, are not granted LF status in Mel'čuk's ECD. Yet, it is clear that they all belong to the same set of basic relations in the animal world. Interestingly, Male and Female should not be conceived of as inverse relations (unlike the Child and Parent relations discussed earlier). The defining formulae are either male/female X or male/female of X, as in:

boar n (male pig) verrat
buck n (male of deer, rabbit, hare etc) mâle
bull n (male of elephant, whale etc) mâle
cow n (female of elephant etc) femelle
hen n (female bird) femelle

The incomplete nature of the dictionary is self-evident if one considers the use of etc in the micro-definitions. This word is the lexicographer's (and the user's) nightmare because it is an admission of helplessness; for lack of space, the compiler is forced to abridge the entries in order to condense the information he or she wishes to convey. Unfortunately, etc tells us little, if anything, about the number of elements it is supposed to represent. The entry for cow above is illuminating: does etc mean that cow can also be used for a female whale (compare with bull)? How can we infer that cow is used to refer to the female of a big mammal? How can we know which female big mammals can be called cows? These (and other) questions have to remain unanswered because of the inherent limitations of the bilingual dictionary. As a matter of fact, monolingual dictionaries are most likely to suffer from the same limitations, as the definition of cow in LDOCE testifies:

cow 2 [C] the female form of the elephant and certain other large sea and land animals: a cow elephant / That elephant is a cow

The words certain and other in the LDOCE definition are definitely as vague as etc in CR. The CR dictionary also makes use of a slightly different technique to make the Male/Female relation explicit. Consider the following examples:

dog n (male) [fox etc] mâle
bitch n [dog] chienne; [canines generally] femelle; [fox] renarde; [wolf] louve

For dog, the reference to the relation proper is to be found outside the material in square brackets which specifies the 'neutral' term (fox). For bitch, this relation is totally implicit. Whatever the system used in the printed dictionary, the information has now been restructured so as to make the relations queryable and as explicit as possible:

Male (pig) = boar
Male (fox) = dog
Female (fox) = bitch
Male (whale) = bull
Female (elephant) = cow

The same caveat as in the preceding section ought to be expressed when one considers the entry for alto:

alto n (female voice) contralto; (male voice) haute-contre

The bracketed material should clearly not be interpreted as meaning Female (voice) = alto or Male (voice) = alto!

8.6 Process

Byrd et al. (1987) describe an attempt to extract [+STATIVE] and [+ACTIVE] verbs from MRDs (more particularly from Webster's Seventh). Identifying this aspectual property of verbs is undoubtedly essential in NLP since this distinction accounts for the well- or ill-formedness of progressive or imperative forms of the verb. In this section, I wish to argue that nouns should also be coded with respect to an aspectual property, viz. their ability to refer to an action or a state. Consider the following entry:

imprisonment n (action, state) emprisonnement

The parenthesized material in italics refers to the semantic property of the headword. Imprisonment can be viewed as a process proper (action) or as the result of a process (state). The distinction is not translationally relevant as far as the entry above is concerned. It is relevant, however, to ensure the selection of the appropriate translation in the following entry:

tracing n (process: U) calquage; (result) calque

[+PROCESS] nouns can normally be identified if they can be run successfully through the following frame test:

The [N] took 2 hours.

Only [+PROCESS] nouns can be inserted in the subject slot for the resulting sentence to be grammatical. Many [+PROCESS] nouns can be extracted automatically on the basis of the presence of a hyperonym in italics referring to the [+ACTIVE] property of the headword (act, action, event, process...). Similarly, the presence of result or state in parentheses can be used as a clue to the [+RESULT] property of the entry.

Work on machine translation has shown that the correlation between semantics and derivational morphology has to be tapped when building the various types of lexicons needed by an MT system. Allegranza et al. (1991:71) show that the EUROTRA project has chosen to "establish a set of abstract interlingual affixes which are argument-taking governors at IS [IS = Interface Structure, i.e. the 'deepest' level of syntactic analysis in Eurotra], and to leave it to the synthesis component to determine the correct choice of affix in the target language". The transfer and synthesis rules make use of interlingual features such as #state or #action_nominal ([+RESULT] vs. [+PROCESS]).
This means that the lexical description of affixes must include information about their semantic value. In French, for example, the suffix -is can only be attached to verb stems in order to form [+RESULT] nouns (e.g. ramassis). This is specified in the Eurotra lexicon by using the feature {lu= -is; der_type= result} (lexical unit = -is; derivational type of the suffix = [result]). Other suffixes such as -age (e.g. emballage) can form nouns denoting a state or a process, which means that the attribute der_type can have the values result or process. Likewise, some English suffixes seem to be closely associated with a [+STATIVE] property. The following examples show how CR conveys such information:

-ness:
dizziness n (state) vertige
drunkenness n (state) ivresse

-ship:
membership n (state) adhésion

-hood:
manhood n (state) âge d'homme, âge viril

Other English suffixes (-ment, -sion, -tion) are associated with either the [+PROCESS] feature or the [+RESULT/STATE] feature:

conglomeration n (act, state) conglomération
decoration n (act) décoration; (state) décor
isolation n (action) isolation; (state) isolement

In a relational perspective such as the one adopted in this book, however, all the above examples are incomplete because they do not indicate the verbal stem from which the noun is derived. This means that the metalinguistic indicator (action, act, state, etc) can at best be used as a clue to an inherent semantic property of the headword. In many other cases, the micro-definition offers much more information in the form of a converter linking the headword with the semantically-related verb. The general pattern is "act of V-ing", as in the following examples:

abstraction n (act of removing) extraction
alteration n (act of altering) changement, modification
application n (act of applying) application
claim n (act of claiming) revendication
drainage n (act of draining) drainage, assèchement
erasure n (act of erasing) grattage, effacement
section n (act of cutting) section, sectionnement

"Act of V-ing" is clearly used to define [+PROCESS] nouns. In most cases, there is an obvious regular connection between the headword and the verb (alter-alteration; apply-application). However, the choice of the appropriate morphological rule to be applied in order to generate the correct derivation is purely idiosyncratic. Why do we use 'erasure' and not 'erasion', 'alteration' and not 'alterage' or 'alterment'? In other cases, the link between the verb and the noun is purely semantic and cannot be accounted for in terms of any morphological rule (remove-abstraction; cut-section). A few examples also illustrate cases of nouns derived by zero suffix (claim_v-claim_n; wear_v-wear_n).
Interestingly, Adams (1973:51) distinguishes three main groups of zero-derived nouns from verbs: in her classification, the noun may denote either the agent of the action expressed by the verb (spy), the concrete object or result of the action (catch), or the abstract result of the action (desire). The dictionary provides ample evidence that a fourth category should be added to account for zero-derived nominals which can refer to a process proper (cast, claim, draw, drift, rub, touch, watch, wear). To formalize such relationships and to make them accessible by the function mechanism, I have introduced the Process relationship whose function is precisely to capture the link between the verb and the noun, as in:

S0Process (remove) = abstraction
S0Process (alter) = alteration
S0Process (erase) = erasure
S0Process (claim) = claim

Interestingly, Mel'čuk uses S0 only to capture the link between the verb and the derived form (e.g. S0 (persecute) = persecution). In my opinion, this function is too general and should be combined with Process or Result (the latter being similar to Process but denoting a [+RESULT] noun) as in:

automation n (state of being automated) automation
dependence n (state of depending) dépendance

which is represented in the database as follows (after manual editing of the micro-definition):

S0Result (automate) = automation
S0Result (depend) = dependence
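The "act of V-ing" pattern discussed in this section lends itself to a simple extraction sketch. Two assumptions should be stressed: the gerund-to-verb table below is hand-built and merely stands in for the manual editing step (no morphological analyser is implied), and all names in the code are hypothetical.

```python
import re

# Defining formula "act of V-ing"; only a single-word gerund is captured.
ACT_OF = re.compile(r"^act of (?P<gerund>\w+ing)$")

# Hand-built lookup (an assumption): gerund -> verb stem.
LEMMA = {
    "removing": "remove",
    "altering": "alter",
    "claiming": "claim",
    "erasing": "erase",
}

# (headword, micro-definition) pairs taken from the entries quoted above.
ENTRIES = [
    ("abstraction", "act of removing"),
    ("alteration", "act of altering"),
    ("claim", "act of claiming"),
    ("erasure", "act of erasing"),
    ("imprisonment", "action, state"),  # no verb is given: nothing to extract
]


def process_triples(entries):
    """Return S0Process triples for entries matching 'act of V-ing'."""
    triples = []
    for headword, microdef in entries:
        match = ACT_OF.match(microdef)
        if match is None:
            continue
        verb = LEMMA.get(match.group("gerund"))
        if verb is not None:
            triples.append(("S0Process", verb, headword))
    return triples


for name, argument, value in process_triples(ENTRIES):
    print(f"{name} ({argument}) = {value}")
# S0Process (remove) = abstraction
# S0Process (alter) = alteration
# S0Process (claim) = claim
# S0Process (erase) = erasure
```

Note that the entry for imprisonment yields no triple: the indicator (action, state) signals a semantic property but does not name the verb.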

8.7 Telic

Consider the definition Steele & Meyer (1990:56) give for the lexical function Labreal:

"Labrealij: verb meaning to 'realize', which takes the actants i and j as its subject and direct object, respectively, and the keyword as its indirect object
(a) Labreal12 (head) = have, entertain [something in one's head]
(b) Labreal12 (saw) = cut [something with a saw]"

One first thing to note is that it is rather odd (and even inappropriate) to suggest that the keywords head and saw are indirect objects in the two examples. Mel'čuk uses the term 'deuxième complément d'objet' (Mel'čuk et al. 1992:129) to refer to the grammatical function of the keyword used with Labreal. More importantly, the basic semantic aspect conveyed by the Labreal function is the notion of 'realization' of some inherent objective of the keyword. In this section, I would like to argue for a further distinction which would enable us to assign different relations to example (a) and example (b) above. Clearly, the argument 'head' is an obligatory constituent in (a) (hence the unacceptability of *'John had the events' as opposed to 'John had the events in his head'). 'Saw', however, is undoubtedly optional in (b), as testified by the well-formedness of 'John cut the log'. Moreover, 'saw' has an instrumental meaning and 'cut' refers to the typical action associated with that instrument (what does one typically do with a saw?). I have introduced a function called Telic to capture the relation between a noun argument (functioning as an instrument) and the typical verb associated with it. TELIC is actually the inverse function of Sinstr, which takes a verb as its argument and associates it with its typical instrument. We then have the following associations:

Sinstr (open) = key
Telic (key) = open

Such information is implicit in the microstructure of many CR entries, as shown in the following examples:

stick vt (with glue) coller
stick back vt (with glue) recoller
chop at vt (...) (with axe) wood taillader (à la hache)
colour vt (...) (with paint) peindre; (with crayon) colorier
erase vt (with rubber) gommer

The keyword is often buried within a prepositional phrase headed by 'with'. Unfortunately, the dictionary abounds in WITH-phrases in italics, which makes the automatic assignment of the TELIC function hardly possible. After manual editing, however, such information can be accessed as follows:

Telic (glue) = stick, stick back
Telic (axe) = chop at
Telic (paint) = colour
Telic (crayon) = colour
Telic (rubber) = erase
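Since Telic is presented as the inverse of Sinstr, one set of triples can in principle be derived from the other. The sketch below inverts the five Telic records listed above; the dictionary layout and the helper name are hypothetical, and whether every inverted pair would be accepted as a genuine Sinstr value is an assumption that would still require the lexicographer's judgement.

```python
from collections import defaultdict

# Telic records as listed above: instrument noun -> typical verb(s).
TELIC = {
    "glue": ["stick", "stick back"],
    "axe": ["chop at"],
    "paint": ["colour"],
    "crayon": ["colour"],
    "rubber": ["erase"],
}


def invert(relation):
    """Swap arguments and values of a one-to-many relation."""
    inverse = defaultdict(list)
    for argument, values in relation.items():
        for value in values:
            inverse[value].append(argument)
    return {value: sorted(arguments) for value, arguments in inverse.items()}


# Candidate Sinstr records: verb -> instrument noun(s).
SINSTR = invert(TELIC)

print(SINSTR["colour"])      # ['crayon', 'paint']
print(SINSTR["stick back"])  # ['glue']
```

Storing only one direction and computing the other on demand avoids keeping two copies of the same information in step.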

TELIC can also be combined with S0 to yield a derived nominal which encapsulates the meaning of the argument as in:

S0Telic (fist) = blow, buffet, thump [= Fr. coup de poing]

The name of this 'function' has obviously not been chosen at random, although I am aware that the word telic is also used in the study of aspect (Comrie (1976:44-45) defines a telic situation as one "which involves a process that leads up to a well-defined terminal point, beyond which the process cannot continue"). In the framework I am concerned with here, it has a different meaning and is reminiscent of Pustejovsky's Generative Lexicon (1991) in which the 'telic role' of a noun is the purpose which is typically associated with the noun. In this theory, TELIC is one of the four slots in the qualia structure which characterize nouns (e.g. a cigarette has the telic role of being smoked and a film of being watched). In many cases, the Telic role roughly corresponds to Mel'čuk's Real function, although the latter is restricted to cases in which the argument is the direct object (for more information on qualia structures and the Generative Lexicon in general, the reader is referred to Chapter 4).

8.8 Spec

Mel'čuk uses the function Gener to account for the relationship holding between a word and its hyperonym (or generic word). The standard example is Gener (anger) = feeling. Surprisingly, Mel'čuk does not seem to have felt the need for an inverse function which would link a generic term with more specific items. Following Grimes (1990:358ff), I have introduced the Spec lexical function. Unlike Gener, which associates a given item with one and (normally) only one hyperonym, Spec associates the generic term with a large number of hyponyms. The fact that this is a one-to-many (1→n) relation is most certainly the reason why Mel'čuk did not grant it LF status. It is undeniable, however, that this thesauric relation is most useful in information retrieval or in any NLP-oriented task since it enables the linguist to capture properties pertaining to a whole range of items by only stating them once and percolating them down the taxonomical hierarchy. The following examples make it clear that it is impossible to do without such a relation if one considers the items in italics as keywords:

brake n (vehicle) break
gig n (vehicle) cabriolet
tipper n (vehicle) camion à benne basculante

Querying the database on vehicle makes it possible to retrieve the following information:

Spec (vehicle) = brake, gig, tipper...

When the values yielded by the application of the Spec function are numerous, the distinction between a hyperonym and a semantic feature becomes rather fuzzy. Consider the following examples extracted from the database:

Spec (animal) = badger, ermine, hopper, mink, musquash, sheep...

Other formalisms would probably assign these items a [+ANIMAL] semantic feature. I am convinced that this is just a question of notation. What is important at this stage is that this information ought to be captured in the lexicon.
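The idea of stating a property once and percolating it down the hierarchy can be sketched as follows. The Gener records are read off the examples above; the feature table, the feature label and the function names are hypothetical and only serve to show that the hyperonym and the semantic-feature notations carry the same information.

```python
# Gener records read off the italicized hyperonyms in the examples above.
GENER = {
    "brake": "vehicle",
    "gig": "vehicle",
    "tipper": "vehicle",
    "badger": "animal",
    "mink": "animal",
    "sheep": "animal",
}

# Hypothetical feature table: a property is stated once, on the generic term.
FEATURES = {"animal": {"+ANIMAL"}}


def spec(generic):
    """Spec(generic): all recorded hyponyms, i.e. the inverse of Gener."""
    return sorted(word for word, hyperonym in GENER.items() if hyperonym == generic)


def features(word):
    """Collect the features stated on the word and on each of its hyperonyms."""
    collected, seen = set(), set()
    while word is not None and word not in seen:
        seen.add(word)  # guards against accidental cycles in the data
        collected |= FEATURES.get(word, set())
        word = GENER.get(word)
    return collected


print(spec("vehicle"))   # ['brake', 'gig', 'tipper']
print(features("mink"))  # {'+ANIMAL'}
```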

8.9 Conclusion

I pointed out in the general introduction to this book that there is no consensus as to the number of relations posited by relational models. Clearly, the material contained in the CR database is so rich that the three basic relations posited by Werner (1988), viz. modification, taxonomy and queuing/sequencing, are of no use at all if one attempts to structure and describe the manifold relationships holding between the metalinguistic items and the headwords under which they appear. At the other end of the continuum, Ahlswede & Evens (1988a) make use of more than 100 relations for adjectives extracted from a monolingual dictionary. In comparison, Mel'čuk's model seems to strike a balance with a list of 60 standard lexical functions. However, the combinatory potential of these LFs is such that there does not seem to be any real limit to the number of compound LFs, and the wide range of relations in my database clearly reflects the flexibility of the descriptive potential of Mel'čuk's apparatus. Despite the coverage of the original model, however, I have felt it necessary to add a number of relations which Mel'čuk considers to be basically semantic, i.e. belonging to purely encyclopedic information. The border between encyclopedic and lexical information is a crucial problem in lexicography and in terminology. Instead of embarking on an in-depth analysis of this problem, which would have led me too far away from my main objective, I have preferred to opt for a more pragmatic approach. Since the CR dictionary contains information which borders on encyclopedic data, such as part-whole relations, I have preferred to use the LF mechanism, which comes in handy for the applications we have in mind. After all, all NLP systems need to draw on such knowledge of part-whole or hyponymy relations and it would have been a pity to ignore the presence of this information in the database. The LF-like relations described in this chapter should therefore be considered as means of enhancing the retrieval functionalities of the database by making information readily accessible.

9 Assigning lexical functions: tests and consistency

9.1 Introduction

I have already pointed out that one of the main drawbacks of Meaning-Text Theory is the ambiguous nature of some lexical functions. An example of such inconsistencies can be found in Steele & Meyer (1990), when they deal with the definition of the Real and Liqu LFs. Consider the two divergent interpretations:

p.47: LiquFunc0 (problem) = solve
p.51: Real1 (problem) = solve

Both positions are defensible here: Liqu is justified insofar as the problem no longer exists when it has been solved. Real entails that the problem is 'realized', i.e. that the task it implies is fulfilled or carried out. This divergence actually testifies to the somewhat imprecise or unclear nature of some functions. The absence of linguistic tests is responsible for these potential misinterpretations. Apresyan (personal communication, Prague, July 1991) asserts that no battery of tests is available to ensure consistency in the assignment of lexical functions. From a linguistic point of view, however, this empiricism is rather unsatisfactory and I would like to show in the remainder of this chapter that some tests can be devised to enable the linguist or lexicographer to select the proper LF.

9.2 IncepPredMinus vs. FinFunc0

In the field of nouns expressing emotions and feelings, a choice often has to be made between complex LFs such as IncepPredMinus and FinFunc0. As we have seen earlier, the former LF means that the keyword decreases in intensity while the latter LF refers to its disappearance. Consider an emotion noun such as enthusiasm which can collocate with the verbs slacken or decrease: IncepPredMinus (enthusiasm)= slacken, decrease The meaning is here fairly obvious. Besides, decrease is often considered as the prototypical exponent of the function IncepPredMinus. For verbs such as flag (tomber), wither (s'évanouir) or ooze away (disparaître, se dérober) which can also take the noun enthusiasm as subject, the lexicographer might hesitate and be tempted to assign the IncepPredMinus LF as well. In case of doubt, however, it is appropriate to use a fairly simple test, viz. to use the verb in the present perfect and verify whether the concept denoted by the keyword is still present or not:

His enthusiasm has withered (-> it no longer exists) => FinFunc0
His enthusiasm has slackened (-> it still exists, albeit in a small quantity) => IncepPredMinus

The same holds for the French verbs, which can be submitted to a test using the 'passé composé':

Son enthousiasme s'est évanoui (it no longer exists) => FinFunc0
Son enthousiasme s'est épuisé (it no longer exists) => FinFunc0
Son enthousiasme a baissé/décliné (it still exists) => IncepPredMinus

It should be pointed out that some verbs are vague insofar as the test does not enable the lexicographer to know precisely whether the emotion is still present or not. Mel'čuk himself seems to be aware of this vagueness. In the ECD (Vol.1, p.77), we find:

FinFunc0 ou IncepPredMinus (colère) = s'apaiser, se calmer, se refroidir

We find cases of vagueness in English as well, as the LDOCE entry for wear off testifies:

wear off v adv [I0] to be reduced until it disappears: the pain is wearing off

Reduced clearly refers to the IncepPredMinus function while disappears definitely implies FinFunc0. On the basis of such a definition, both LFs should therefore be assigned to capture this vagueness. The test shows, however, that the pain has disappeared completely in the following sentence, probably because of the particle off, which entails that FinFunc0 should be used:

His pain has worn off (-> it no longer exists) => FinFunc0.
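The decision procedure behind the present-perfect test can be summarized in a few lines. The sketch assumes that the native-speaker judgement is supplied by hand; nothing in it decides automatically whether the emotion still exists, and the names Judgement and assign_lf are hypothetical.

```python
from enum import Enum


class Judgement(Enum):
    """Outcome of the present-perfect test, as judged by the lexicographer."""
    GONE = "it no longer exists"
    REMAINS = "it still exists"
    UNDECIDED = "the test is inconclusive"


def assign_lf(judgement):
    """Map a test outcome to the LF(s) to be recorded for the collocation."""
    if judgement is Judgement.GONE:
        return ["FinFunc0"]
    if judgement is Judgement.REMAINS:
        return ["IncepPredMinus"]
    # Vague verbs receive both LFs, as in the ECD entry for colère.
    return ["FinFunc0", "IncepPredMinus"]


# Judgements reported in the text, keyed by (keyword, verb).
JUDGEMENTS = {
    ("enthusiasm", "wither"): Judgement.GONE,
    ("enthusiasm", "slacken"): Judgement.REMAINS,
    ("colère", "s'apaiser"): Judgement.UNDECIDED,
}

for (keyword, verb), judgement in JUDGEMENTS.items():
    print(f"{' ou '.join(assign_lf(judgement))} ({keyword}) = {verb}")
```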

9.3 Culm vs. Centr

I pointed out on p.72 that the function Culm, which refers to the culmination of something, was not to be found in early accounts of MTT. Instead of having triples such as Culm (anger) = paroxysm, as is currently the case in the more recent versions of the ECD (see Mel'čuk et al. 1984:77, Vol.1, s.v. colère), Apresyan et al. (1969:12) used Centr to cover both the central part of something and the culminating part of an object or a process. It is therefore clear that Mel'čuk and his collaborators have realized that it is necessary to have two distinct LFs to account for two types of co-occurrence phenomena. Since Culm actually refers to a kind of top or summit and is frequently associated with process nouns, it is possible to subject the noun base to a simple frame test in which the linguist is asked to fill in the slots (X and Y) with the base and the collocator respectively:

Test: The X started to increase, reached a/its Y and then decreased sharply.

If the sentence is grammatically correct, the function Culm is assigned to the collocation, as in the following examples:

Culm (career) = culmination, peak
Culm (pain) = paroxysm
Culm (fortune) = height
Culm (despair) = extremity
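The Culm frame can be instantiated mechanically so that the sentences to be judged are generated in a uniform way. In the sketch below, only the frame itself comes from the text; the function names and the boolean accepted, which stands for the linguist's grammaticality judgement, are assumptions.

```python
# Frame test for Culm; {det} is filled with "a" or "its".
CULM_FRAME = "The {x} started to increase, reached {det} {y} and then decreased sharply."


def culm_test_sentence(base, collocator, det="its"):
    """Build the sentence to be submitted to the linguist's judgement."""
    return CULM_FRAME.format(x=base, det=det, y=collocator)


def assign_culm(base, collocator, accepted):
    """Return a Culm triple only if the frame sentence was judged acceptable."""
    return ("Culm", base, collocator) if accepted else None


print(culm_test_sentence("pain", "paroxysm"))
# The pain started to increase, reached its paroxysm and then decreased sharply.

print(assign_culm("pain", "paroxysm", accepted=True))
# ('Culm', 'pain', 'paroxysm')
```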

Centr is actually to be considered as a specific instance of the part of relation and is usually associated with concrete objects or temporal notions. It will be assigned when the value of the function can appear in the following frame sentence:

[ART] Y is the central part of [ART] X

This yields the following sets of relations:

Centr (artichoke) = heart
Centr (cell) = nucleus
Centr (fruitstone) = kernel
Centr (bone) = pith

9.4 Telic

In section 8.7, I introduced a new function which I called Telic in order to capture the relation between a noun argument denoting an instrument and the verb associated with it. The need for such a function arose because the function Labrealij, which means to 'realize' and takes the actants i and j as subject and direct object and the keyword as indirect object, was too vague and covered too many types of co-occurrence phenomena. Telic, whose name is derived from Pustejovsky's theory of the generative lexicon (1991 - see also Chapter 4), can best be seen as the inverse of Sinstr, the instrumental LF. Telic obviously shares some partial meaning with Real, which refers to the realization or fulfilment of an inherent aim, goal or purpose associated with the keyword. However, Real takes the keyword as direct object while Telic yields a verb which is associated with an instrumental keyword. Telic is therefore assigned if the following test is passed successfully:

Telic (X) = Y iff the sentence 'John uses X to Y' is grammatical.

Y must be a verb but is not necessarily intransitive. If it is transitive, the sentence frame should allow for an optional direct object (something, someone). This test enables us to come up with the following relations:

Telic (helicopter) = lift out
Telic (gun) = bump off, shoot
Telic (gold) = electroplate, plate
Telic (paint) = blacken, colour, daub
Telic (powder) = blast

9.5 Real vs. AntiReal

The Collins-Robert dictionary contains numerous examples of practically free collocations. Even if they are not lexically restricted, these collocations may, in some cases, be analyzed in terms of lexical functions, provided they are not part of a specific sublanguage. The problem, however, may be to find out which function should be assigned to a given pair of collocates. Consider the following antonymous sentences:

(a) John opened the door.
(b) John closed the door.

It is clear that open and close are related to the main actions associated with a door (realizing or not the 'telic' role of a door, to quote Pustejovsky (1991), although I use the term telic in a narrower meaning above). The crux of the problem is that it is not easy to find out which verb refers to the basic aim or goal associated with door and should therefore be assigned the Real1 function. Basically, a door is an artifact which can be opened or closed, but is it primarily used to prevent access to a building, a house, a room, etc, or to give access? Cobuild seems to opt for the former proposal and defines door as follows:

1.1. a swinging or sliding piece of wood, glass, or metal, which is used to close the entrance to a building, room, cupboard, etc.
1.2. the space in a wall which a door can close and by which you enter a building or a room.

OALD (Cowie 1989) and CIDE (Procter 1995) give similar definitions, focusing on the closing function of doors. According to these lexicographers, then, the basic action which is associated with the concept of door is close, and not open. This, however, can hardly be verified and it is perfectly sensible to suggest that other lexicographers could have opted for the other interpretation. The French dictionary Le Petit Robert, for example, defines the noun porte as "ouverture spécialement aménagée dans un mur, une clôture, etc., pour permettre le passage" (Robert, 1987). Unfortunately, no syntactic test seems to be able to cut this Gordian knot. One solution would be to say that the various goals which are associated with the noun door include actions such as opening or closing and the resulting notation would then be:

Real1 (door) = open, close, shut...

Admittedly, many of the functions identified by Mel'čuk are not mathematical functions proper, but rather relations, which means that the values may not be synonymous. The problem with the above notation, however, is that the values may be diametrically opposed (Anti (open) = close), and that some refinement is felt to be necessary. Although the choice was somewhat arbitrary, I therefore decided to stick to Cobuild/OALD/CIDE's interpretation and to consider that a door is primarily used to close the entrance to something. Hence, the LF which is to be assigned to the collocation close a door is Real1, which means 'realizing the inherent purpose associated with a door'. Conversely, 'opening a door' will be analyzed in terms of the AntiReal1 LF. In the database, the following data are recorded:

Real1 (door) = close, push to, shut
AntiReal1 (door) = open, push, punch in, unbar, unbolt, unfasten, unlock

At this juncture, it should be realized that some of the verbs above are ergative (see also Chapter 12 and Levin 1993). The following sentences are therefore possible but should be interpreted in terms of different LFs:

(a) John opened the door.
(a') The door opened.
(b) John closed the door.
(b') The door closed.

While Real1 is used for the causative interpretation, Fact0 indicates that the inherent objective of the keyword is realized, which is coded as follows in the database:
(a) AntiReal1 (door)= open vt (+ the verbs cited above)
(a') AntiFact0 (door)= open vi
(b) Real1 (door)= close vt (+ the verbs cited above)
(b') Fact0 (door)= close vi, shut vi, shut to vi, swing to vi

The subscripts are used here to indicate whether the verb is transitive (vt) or intransitive (vi). The potential confusion is less acute in French since we usually resort to a pronominal verb for the inchoative interpretation:
AntiReal1 (porte)= ouvrir
AntiFact0 (porte)= (s') ouvrir
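The coding convention just described can be summarized in a short sketch. It is an illustration only: the function name and its two boolean parameters are hypothetical and are not part of the database software, and the LF names are written with their subscripts inline (Real1, Fact0).

```python
def lf_label(realizes_purpose: bool, transitive: bool) -> str:
    """Return the LF label for a verb that collocates with a keyword such as door.

    Assumption (illustrative only): causative, transitive verbs are coded with
    Real1, inchoative, intransitive verbs with Fact0, and the prefix Anti is
    added when the verb undoes the purpose chosen for the keyword.
    """
    base = "Real1" if transitive else "Fact0"
    return base if realizes_purpose else "Anti" + base


# With 'closing' taken as the inherent purpose of a door:
closing_vt = lf_label(realizes_purpose=True, transitive=True)    # John closed the door.
opening_vi = lf_label(realizes_purpose=False, transitive=False)  # The door opened.
print(closing_vt, opening_vi)
```

Reversing the decision about the inherent purpose of the keyword, as discussed for Russian below, would simply flip the first argument for every verb in the entry.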

It should be noted that the preceding considerations are true of other items such as window, cupboard, lid, etc. However, Igor Mel'cuk argues that this way of viewing things is culture-specific. According to him (personal communication, June 1993), the same situation is viewed differently in Russian. Surprisingly enough, for a Russian native speaker, a door or a window gives access to something, which means that the Real1 function applied to door should yield the equivalent of the verb open in Russian. I pointed out above that a French dictionary such as Le Petit Robert defines the noun porte as something which primarily gives access, which clearly shows that English and French monolingual dictionaries seem to take opposite views. If Mel'cuk is right, this entails that efforts to use LFs in a transfer-based machine translation system as advocated by Danlos & Samvelian (1992) or Van der Wouden (1992a) are vain. It would indeed be impossible to use LFs as a sort of pivotal semantic representation if the values for a given keyword are diametrically opposed in two different languages. Nothing can provide evidence that Mel'cuk is right, however, and I showed above that the choice may eventually be purely arbitrary, which constrains the potential use of such a lexicon in an NLP perspective even further.

9.6 Morphological clues

Calzolari (1990) analyzes the interrelation between 'thematic' relations (agents, places, instruments, etc.) and the phenomenon of suffixal word formation. She is mainly concerned with the construction of lexical databases (LDBs) and lexical knowledge bases (LKBs) from the typesetting tapes of machine-readable dictionaries. In order to automate, or at least semi-automate, the construction of such lexical repositories, she claims that derivational morphology provides 'surface signals' which can be used to detect underlying relations (nouns ending in -er in English are liable to refer to Agent names - teacher, singer, dancer - or Instrument names - computer, spreader). In Fontenelle & Vanandroye (1989:29-34), three suffixes (-en, -fy, -ate) are analyzed in detail and shown to be predictors of the causative/inchoative alternation (for an in-depth study of ergativity in the Collins-Robert database, the reader is referred to Chapter 12). Such morphological clues can therefore be used as pointers indicating potential lexical-semantic relations between an item and its derivatives. Along the same lines, I wish to show in this section that certain morphological phenomena can be used to ensure consistency in the construction of our database. More particularly, it may be interesting to try and establish a correspondence between the meaning of some particles in phrasal verbs and the lexical functions which express the semantic link between the phrasal verbs in question and the base of the collocation.

9.6.1 Up/Down

At first glance, one might be tempted to associate the particle up with verbs meaning to increase. The database indeed includes a large number of instances where a phrasal verb whose particle is up is the exponent of a compound lexical function containing Plus. The following examples are cases in point (note that the LFs may be IncepPredPlus or CausPredPlus, depending on whether the base is the subject of an inchoative verb or the object of a causative verb):

IncepPredPlus (tension)= build up
IncepPredPlus (price)= bump up
IncepPredPlus (production)= smarten up
CausPredPlus (production)= speed up, step up, build up
CausPredPlus (blood pressure)= push up, put up
CausPredPlus (enthusiasm)= whip up

It is essential to realize, however, that the presence of the particle up can in no way be used as a test to ensure that the function Plus is assigned to a collocation. Browsing through the database indeed reveals that, strangely enough, up is found in a phrasal verb which means to decrease, as in the following examples:
CausPredMinus (opposition)= soften up
CausPredMinus (resistance)= soften up
The Minus LF is justified because soften up is a hyponym of decrease in the neighbourhood of opposition or resistance. Unlike what happens in the former set of examples above, where up imposes a spatial orientation (upward direction) on a "neutral" verb expressing a change of state, the particle up here reinforces the meaning of the verb soften, which is already loaded with a downward orientation. Up is therefore to be considered as an intensifier of soften, rather than as a directional particle.

It is the task of the linguist to make generalizations and to propose rules for seemingly irregular patterns of behaviour. In assigning lexical functions to code the semantic link between base and collocator, it is of course tempting and natural to suggest a set of rules which would make it possible to automate the assignment process. Combining information on the part of speech of the collocator, the presence of a particle such as up or down and the typographical nature of the string in italics could then result in the formulation of the following rules (see section 6.6.1 for an explanation of the values of the Typ field):
POS= vi + Typ= C + "up" -> LF= IncepPredPlus
POS= vt + Typ= S + "up" -> LF= CausPredPlus
POS= vi + Typ= C + "down" -> LF= IncepPredMinus
POS= vt + Typ= S + "down" -> LF= CausPredMinus

The above discussion relative to soften up provides evidence that such rules only give rather general tendencies and must not be applied blindly. Moreover, they result from an oversimplification insofar as they do not take the polysemous nature of up and down into account. The database reveals that up also appears in phrasal verbs which are the values of the Real function, reinforcing the idea that something is realized, completed or achieved.
Real1 (achievement)= chalk up
Real1 (button)= do up
Real1 (deal)= tie up, wrap up
Real1 (form)= fill up
Real1 (point)= score up
Real1 (record)= set up
Real1 (sail)= get up, haul up, put up
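The tentative rules formulated earlier in this section, and the reason why they cannot be applied blindly, can be illustrated with a minimal sketch. The rule table reproduces the four rules; the function names are hypothetical, and treating all three sample verbs as transitive (vt) with Typ = S is an assumption made for the sake of the illustration.

```python
from typing import Optional

# The four tentative rules: (POS, Typ, particle) -> suggested LF.
RULES = {
    ("vi", "C", "up"): "IncepPredPlus",
    ("vt", "S", "up"): "CausPredPlus",
    ("vi", "C", "down"): "IncepPredMinus",
    ("vt", "S", "down"): "CausPredMinus",
}


def suggest_lf(pos: str, typ: str, phrasal_verb: str) -> Optional[str]:
    """Suggest an LF from the part of speech, the Typ value and the particle."""
    particle = phrasal_verb.split()[-1]
    return RULES.get((pos, typ, particle))


# Codings attested in the database, keyed by (base, collocator).
ATTESTED = {
    ("production", "step up"): "CausPredPlus",
    ("opposition", "soften up"): "CausPredMinus",
    ("button", "do up"): "Real1",
}


def mismatches(attested, pos="vt", typ="S"):
    """Return the collocators for which the rule-based suggestion is wrong."""
    return sorted(
        verb
        for (_base, verb), lf in attested.items()
        if suggest_lf(pos, typ, verb) != lf
    )


print(mismatches(ATTESTED))
```

The suggestion agrees with the database for step up, but not for soften up or do up, which is why such rules can at best propose a candidate LF for the lexicographer to confirm.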

In these examples, up has a positive value since it contributes to the idea that the prototypical goal associated with the keyword has been achieved. The particle can also convey a more negative meaning and be found in phrasal verbs denoting some eradication or nullification, to use Benson et al.'s terminology (1986a). The organization of the dictionary database makes it possible to check this assumption by listing the phrasal verbs containing up and associated with the Liqu LF, a sample of which is given below:
Liqu (ammunition)= use up
Liqu (army)= cut up
Liqu (bridge)= blow up
Liqu (business)= give up, sell up, shut up, wind up
Liqu (coalition)= break up
Liqu (deficit)= make up
Liqu (weed)= dig up, pluck up, pull up, tear up
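A listing of this kind can be obtained with a very simple query over the triples. The sketch below is only an assumption about how such a query might look: the tuple layout (LF, base, collocator) and the function name are hypothetical, and the sample data are a handful of triples quoted in this chapter.

```python
from collections import defaultdict

# (lexical function, base, collocator) triples quoted in this chapter.
TRIPLES = [
    ("Liqu", "bridge", "blow up"),
    ("Liqu", "coalition", "break up"),
    ("Liqu", "weed", "dig up"),
    ("Liqu", "weed", "pull up"),
    ("CausPredPlus", "enthusiasm", "whip up"),
    ("Liqu", "agreement", "call off"),
]


def verbs_by_base(triples, lexfunc, particle):
    """Group the phrasal verbs ending in `particle` and coded with `lexfunc` by base."""
    grouped = defaultdict(list)
    for lf, base, verb in triples:
        if lf == lexfunc and verb.split()[-1] == particle:
            grouped[base].append(verb)
    return dict(grouped)


print(verbs_by_base(TRIPLES, "Liqu", "up"))
```

Running the same query with another particle, or with another LF, gives the kind of listing used in the remainder of this section.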

Interestingly, down is also found in verbs associated with Liqu. This seems to support Lakoff & Johnson's claim that orientational metaphors are pervasive in the lexicon and organize a whole system of concepts with respect to one another (1980:14). Metaphors such as BAD IS DOWN or LESS IS DOWN can be used to explain why the particle down, which frequently appears in phrasal verbs meaning decrease, is also closely linked with eradication verbs. Of course, the examples above illustrating verbs with up have to be accounted for in terms of other metaphors with a different experiential basis (the relationship between metaphors and lexical functions is dealt with in more detail in Chapter 13). The following examples illustrate the link between down and Liqu:
Liqu (anxiety)= fight down
Liqu (barrier)= kick down, push down
Liqu (building)= burn down, knock down, pull down, take down, tear down
Liqu (business)= close down, shut down
Liqu (opposition)= break down
Liqu (rebellion)= stamp down (also: stamp out)

The relationship between down and Liqu is only possible when the phrasal verb is transitive since Liqu is basically a causative function with a negative meaning. When the phrasal verb is intransitive, down may indicate that a process or state comes to an end, which is represented in terms of the FinFunc0 LF, as in the following examples where the value of the function is a terminative phrasal verb:
FinFunc0 (argument)= break down
FinFunc0 (hope)= fall down
FinFunc0 (protest)= fall down
FinFunc0 (wind)= die down

9.6.2 Off

In many cases, the particle off seems to determine the meaning of the phrasal verb with respect to the bases with which it co-occurs. Consider the following relations where off indicates that an action has been accomplished or that some goal has been reached:
Real1 (aim)= bring off, pull off
Real1 (attack)= bring off, pull off
Real1 (deal)= bring off, pull off
Real1 (hoax)= bring off, pull off
Real1 (work)= finish off

The idea of completion is here formalized in terms of the Real LF. If a deal is pulled off or brought off, it means that it is completed successfully and that the goal associated with it is reached. The same base can also be used with other phrasal verbs containing off and representing other LFs, as in:
Liqu (deal)= cry off
In this case, off has a negative meaning since it implies that the deal cannot be carried out successfully. The relationship between off and the Liqu LF is pervasive in the lexicon, as the following examples clearly show:
Liqu (agreement)= call off
Liqu (animal)= finish off
Liqu (arrangement)= cry off
Liqu (ash)= flick off, flip off
Liqu (chain)= cast off
Liqu (disguise)= throw off
Liqu (engagement)= break off

Off can also be found in verbs meaning to decrease, in which case it imposes a spatial orientation (downward direction) in very much the same way as the particle down, which is dealt with above (see section 9.6.1).
IncepPredMinus (demand)= ease off, slack off
IncepPredMinus (enthusiasm)= fall off
IncepPredMinus (storm)= taper off
When one considers the following triples, which illustrate the association of phrasal verbs containing off with the LF Fact0 (which means that the goal denoted by the keyword is realized), it becomes clear that the presence of a given particle in a phrasal verb cannot be used as an infallible test to assign lexical functions:
Fact0 (attempt)= come off
Fact0 (event)= go off, pass off
Fact0 (gun)= go off
By and large, the meaning of off can be roughly subdivided into the following two broad categories:
a) indicates that something is realized, completed: LF = Real (with transitive verbs), Fact (with intransitive verbs)
b) indicates that something is removed, destroyed, disconnected or on the decrease: LF = CausObstr, Liqu, IncepPredMinus...
In some cases, off also conveys an additional meaning, specifically emphasizing the speed with which an action is carried out. In the database, this is represented with the subscripted modifier speed which is attached to the intensifying function Magn, as in the following examples:
MagnspeedReal1 (letter)= dash off, run off, write off
MagnspeedReal1 (work)= polish off

9.6.3 Away

A similar analysis can be made for the particle away, which is found in a heterogeneous set of phrasal verbs. The most frequent lexical functions associated with phrasal verbs containing away are the following: Degrad, FinFunc0, IncepPredMinus and Liqu. In some cases, the choice between FinFunc0 and IncepPredMinus is a subtle one, as I have pointed out above (see section 9.2). The fact remains nonetheless that away is most often associated with collocations expressing a degradation, a decrease or the end of a process or state. The following examples illustrate the strong negative potential of this particle:
Degrad (inscription)= wear away
Degrad (paint)= chip away, peel away
FinFunc0 (anger)= melt away
FinFunc0 (strength)= ooze away
IncepPredMinus (attendance)= drop away, fall away, tail away
IncepPredMinus (interest)= tail away
Liqu (care)= drive away
Liqu (dust)= brush away, sweep away, whisk away

9.7 Conclusion

In this chapter, I have tried to show the complexity of the task the lexicographer is faced with when trying to label potential collocations with the appropriate lexical function. The ambiguity and the imprecise nature of some functions, which, in some cases, only seek to capture a superficial and partial meaning component of the combinations, are responsible for the inevitable inconsistencies in a lexicon based on lexical functions. The fact that each new volume of Mel'cuk's ECD opens with a list of amendments and corrections to the preceding volumes is not fortuitous. Lexical functions are undoubtedly a very powerful descriptive device, but the small number of tests which can lead to their correct assignment clearly indicates that the theory in general needs more operational definitions than the somewhat fuzzy paraphrases we are used to finding in the MTT literature, especially if one wishes to employ the mechanism of lexical functions in an NLP perspective.

10 A closer look at the lexical function Son

10.1 Introduction

Since the Collins-Robert database has been enriched with lexical-semantic information, it is possible to use this lexical resource to test a number of linguistic hypotheses. In this chapter, I should like to concentrate on the realization of one particular lexical function, namely Son, which accounts for the typical sounds or noises made by a given entity, whether animate or inanimate. More specifically, I wish to show here that the very organization of the lexical data in database form makes it possible to investigate the manifold regularities that underlie the structure of the lexicon, improving on existing descriptions of linguistic phenomena.

10.2 Phonaesthesia and onomatopoeia

It is now widely recognized that the relationship between form and meaning is purely arbitrary (cf. what Saussure called the arbitrariness of the linguistic sign). However, linguists have also recognized that there is an exception to that principle insofar as onomatopoeic forms seem to be, at least to some extent, naturally representative of the sound or cry that they are meant to signify. Onomatopoeia probably accounts for the close resemblance between the following words which all refer to the typical cry produced by a duck: quack (English), kwak (Dutch), qua-qua (Italian) and coin-coin (French). This exception also extends to what linguists have called 'phonaesthesia', or sound symbolism. The latter term refers to the relationship that exists between certain combinations of sounds and the meaning of the words that include these combinations (see Lyons 1977:104; Jackson 1988:51; Sobkowiak 1990). Phonaesthesia therefore has to do with phonemes or groups of phonemes and does not necessarily involve sound. Tournier (1985:146-147) gives a very interesting and useful list of phonaesthemes which contribute to the meaning of words without denoting sounds (/tr/ for words meaning 'to walk' as in tread, trip, trot, trudge...; /sl/ for something which is thick, ugly, soft or particularly unpleasant, as in slime, slack, slop, slug...).¹ The phonaesthetic properties of words have been used extensively by poets but I will not be concerned with them here, although the Collins-Robert database provides a powerful means of studying sound symbolism. Consider for instance the words which are associated with the noun ink in the database:
Sing (ink)= blob, blot, blotch, blur, speck, splodge, spot
The lexical function Sing accounts for the link between these nouns and ink because they all refer to a small 'portion' of ink. Strikingly, all these nouns display interesting properties insofar as the initial consonant clusters, which consist of a plosive (/p/ or /b/) either followed by a liquid (/l/) or preceded by a sibilant (/s/) or both, seem to refer to the sound made by a liquid falling violently, as if a drop of ink were splashing on the paper. It is interesting to note that the French translations in the database do not display such phonaesthemes (pâté, tache, bavure, éclaboussure). Some of the English verbs which are associated with ink also pattern in a similar fashion since the database tells us that one can splash about or splodge ink. In this chapter, I will focus mainly on the onomatopoeic combinations of phonemes that are found in verbs denoting the production of a sound or noise. These verbs are extracted from the Collins-Robert dictionary and the method used for retrieving them is explained in greater detail below.

¹ Some of these lexical items may be evocative of sounds or noises (e.g. trot). Verbs such as snarl, snicker or snort, which are usually said to display phonaesthetic (sound-symbolic) properties (referring to the nose) actually also belong to the domain of sound verbs (onomatopoeia referring to the sound made when air goes through the nose). It should also be borne in mind that, for any consonant cluster, whether initial or final, there are words which do not convey the relevant phonaesthetic meaning (consider /kl/ in clear, clergy, close, cloud, club...).

10.3 Sound verbs in the Collins-Robert dictionary

Unlike learners' dictionaries, CR does not draw upon definitions to guide the user to the appropriate meaning. We saw in Chapter 7 that the bilingual dictionary occasionally resorts to what I called 'micro-definitions' which rely heavily on the concept of defining formulae. The very nature of these micro-definitions does not make it possible to extract verbs of sound by concentrating only on the recurring defining structures which combine verbs denoting the concept of making (produce, make) together with nouns, adjectives or adverbs denoting a sound. In this respect, then, any attempt to retrieve verbs of sound from CR automatically or semi-automatically is sure to differ radically from the method I described elsewhere (Fontenelle 1992d, 1993) to extract sound verbs from LDOCE. The following CR entries, excerpted from the printed version of the dictionary, clearly show that words of sound cannot be retrieved on the basis of some explicit feature indicating their inherent meaning:
croak 1 vi a [frog] coasser; [raven] croasser; [person] parler à voix rauque; (*: grumble) maugréer, ronchonner
crow 1 n [cock] chant du coq, cocorico; [baby] gazouillis; (fig) cri de triomphe
trumpet 3 vi [elephant] barrir
snap 6 vi b [whip, elastic, rubber band] claquer 7 vt b whip, rubber band, etc faire claquer

Apart from the translations, the most interesting pieces of information are of course the italicized items which designate the entity which produces the sound denoted by the verb or the noun headword. In the database, the lexical-semantic relationship which holds between the italicized item and the headword is Son, although this lexical function does not necessarily appear in isolation, but can also be part of complex LFs, as is shown in the following triples excerpted from the database:
Son (person)= croak
Son (frog)= croak
Son (raven)= croak
S0Son (cock)= crow
S0Son (baby)= crow
Son (whip)= snap vi
CausSon (whip)= snap vt
Son (elephant)= trumpet
Several remarks are in order here. First, the information contained in the triples above exceeds what can be extracted from a learner's dictionary by applying a simple pattern-matching procedure to the definition field. In the following LDOCE (1978) entry, for example, a pattern-matching procedure such as the one described in Fontenelle (1992d) is only able to come to the conclusion that croak is a verb of sound (because of the occurrence of make and noise in the definition):
croak v 1 [10] to make a deep low noise such as a FROG (2) makes.
The semantic and the syntactic link between frog and croak can only be detected if one manages to parse the definitions along the lines suggested by Alshawi (1989) or Ahlswede & Evens (1988a). A second thing to note is that this way of formalizing lexical-semantic relationships enables the linguist to capture the link between a meaning component (viz. sound) and syntactic properties such as ergativity: the verb snap is indeed a good example of a verb which displays the so-called causative/inchoative alternation and this property can be identified readily because the patient argument (whip) appears twice in the microstructure of the verb entry, first as a bracketed typical subject, then as an unbracketed typical object.
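The observation about snap can be turned into a rough test for the causative/inchoative alternation. The sketch below is only an illustration under simplifying assumptions: an entry is reduced to a hypothetical list of (part of speech, indicator nouns) pairs, with the bracketed subjects attached to the vi sense and the unbracketed objects to the vt sense.

```python
def ergative_candidates(senses):
    """Return the nouns that occur both as typical subject (vi) and typical object (vt)."""
    subjects, objects = set(), set()
    for pos, nouns in senses:
        if pos == "vi":
            subjects.update(nouns)  # bracketed items: typical subjects
        elif pos == "vt":
            objects.update(nouns)   # unbracketed items: typical objects
    return sorted(subjects & objects)


# Simplified rendering of the entry for snap quoted above.
SNAP = [
    ("vi", ["whip", "elastic", "rubber band"]),
    ("vt", ["whip", "rubber band"]),
]

print(ergative_candidates(SNAP))
```

A non-empty result only flags a candidate: whether the verb really takes part in the alternation still has to be confirmed by the lexicographer.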
The reader is referred to Chapter 12 for more detail on ergative verbs. A third thing to note is that the range of typical nouns associated with sounds in the examples above reflects Tournier's remark that the class of onomatopoeic verbs can be subdivided into three subclasses as a function of the semantic features of the entity which makes the sound (Tournier 1985: 158-159): 1. [+HUMAN] subjects 2. [+ANIMAL] subjects 3. [-ANIMATE]/[+CONCRETE] subjects It will be noted that ergative verbs are much more likely to be found in the third category because it is easier to make an object produce a sound than to force an animal to produce its typical cry. The causative reading seems to be limited to [-ANIMATE] entities (see the

Cobuild examples: ...the sound of the bells clanging vs. She was methodically clanging the brass bells). The database organization of the dictionary makes it possible to access information via any field. To conduct this survey of sound items, I therefore extracted all collocations (base and collocator) whose lexfunc field contains some reference to the Son lexical function. The wildcard functionalities of the extraction program had to be used to make sure that any item denoting a sound could be retrieved (lexfunc= Son, S0Son, CausSon, etc). The complete list, which comprises 232 items, is too large to be reproduced here, but a sample of this resulting list is given as Appendix A (16.1). It should be clear, of course, that the items which appear in this list can only be extracted because the original printed entries contained some reference to the base of the collocation (the entity which makes a noise), which made it possible to enrich the lexicon by labelling the collocation with a lexical-semantic relation. This means that the verb whinny does not appear in the list of sound verbs because the absence of collocational material in the microstructure of the entry prevented me from creating a triple with the relevant lexical function. Consider the entry for whinny:
whinny 1 n hennissement 2 vi hennir
Again, this shows that a combination of techniques is necessary to extract exhaustive lists, which means that the pattern-matching approach advocated in Fontenelle (1992d) should not be rejected since it concentrates on monolingual learners' dictionaries as an alternative source of lexical data.
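The wildcard extraction described in this section can be approximated as follows. This is a sketch under stated assumptions, not the extraction program actually used: the record layout (dictionaries with lexfunc, base and collocator keys) and the function name are hypothetical, and the standard-library fnmatch module stands in for the wildcard facility.

```python
from fnmatch import fnmatchcase

# A few records in the spirit of the database (field names are assumptions).
RECORDS = [
    {"lexfunc": "Son", "base": "frog", "collocator": "croak"},
    {"lexfunc": "S0Son", "base": "cock", "collocator": "crow"},
    {"lexfunc": "CausSon", "base": "whip", "collocator": "snap"},
    {"lexfunc": "Liqu", "base": "bridge", "collocator": "blow up"},
    {"lexfunc": "MagnSon", "base": "baby", "collocator": "howl"},
]


def sound_items(records, pattern="*Son*"):
    """Keep the records whose lexfunc field matches the wildcard pattern."""
    return [r for r in records if fnmatchcase(r["lexfunc"], pattern)]


for record in sound_items(RECORDS):
    print(record["lexfunc"], record["base"], record["collocator"])
```

A verb such as whinny can never be returned by a query of this kind, since no record exists for it in the first place.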

10.4 Analyzing regularities

A close study of the list of verbs immediately reveals that many of them display some phonetic patterns. As Tournier (1985:155ff) points out, a common component of meaning can be associated with the consonant clusters found at the beginning of these words. The sound denoted by the items may be produced by an animate or by an inanimate entity. I reproduce below the six combinations of onomatopoeic elements listed by Tournier (1985:162), together with the semantic value of each element:

Initial Consonant Clusters
/fl/ (liquid + sound produced by wings) e.g. flush, fly, flutter, flit
/sp/ (sound made when throwing something) e.g. spew, spit, sputter
/kr/ (sound of cracking or croaking) e.g. crack, croak, creak, crunch
/kl/ (general sound) e.g. clack, click, clink, clap, clang
/gr/ (sound conveying pain/displeasure) e.g. groan, grunt, grumble
/spl/ (sound of a solid falling into a liquid) e.g. splash, splatter, splutter

A careful analysis of the data extracted from CR demonstrates that the list above is far from complete. The set of initial consonant clusters that can denote sound verbs actually includes more than 20 combinations and, as I am going to show, some internal regularity can be found in the structure of these clusters. The resulting list of onomatopoeic combinations can be found below, together with a few examples from the CR database for each cluster (the combinations mentioned by Tournier are preceded by #):

  /bl/ bleat, blare, blow
  /br/ bray
# /kl/ clash, clack, clatter
# /kr/ crack, crackle, creak, croak
# /gr/ grate, groan, grumble, grunt
# /fl/ flap
  /fr/ fret
  /pl/ plop, pluck
  /pr/ prattle
  /tr/ trill, trumpet
  /tʃ/ chatter, chime, chug, cheep, chop
  /dʒ/ jangle, jingle, jeer
  /tw/ twang, twitter
  /dr/ drone, drum
  /ʃr/ shrill
# /sp/ spit
# /spl/ splutter
  /st/ stamp
  /str/ strike, strum
  /skr/ scratch, screech, scream
  /skw/ squall, squawk, squeak, squeal, squelch
  /sn/ snap, snarl, snort, snicker
  /sl/ slam
  /θr/ throb
  /sw/ swish

The list of sound verbs extracted from LDOCE (Fontenelle 1993) shows that three more categories should be added:
  /kw/ quack, quaver
  /gl/ glide
  /sm/ smack

They do not appear in the list here because a verb such as quack does not contain any explicit reference to its subject (duck) in the dictionary, which means that the lexical function Son cannot be used to link the verb to its typical subject noun. As can be seen, the onomatopoeic combinations can be further subdivided into two major classes: two-consonant clusters (CC pattern) and three-consonant clusters (CCC pattern where the sibilant /s/ is obligatorily used at the beginning of the word). It is worth noting that these initial clusters usually involve a plosive (either the voiceless /p/, /t/, /k/ or the voiced /b/, /d/, /g/) followed by a liquid (/l/ or /r/). Since /s/ is a voiceless sound, it cannot be combined with any of the three voiced stops (/sb/, /sd/ and /sg/ are impossible to utter). It will also be pointed out that no word in our list displays the combination of an alveolar stop (/t/ or /d/) with the liquid /l/, at least in initial position. This is due to the incompatibility of the places of articulation of these consonants: such combinations just do not occur in the phonological system of English. Another point to note is the occurrence of the phoneme /s/ in two-consonant clusters where it is usually combined with the liquid /l/ or the nasals /n/ or /m/. The following table summarizes the most frequent combinations of phonemes that play a part in the sound-symbolic system of English.²

voiceless sibilant     voiceless stop     liquid, /ʃ/ or semi-vowel
[/s/]               +  /p/             +  /r/ - [/l/]
                       /t/             +  [/r/] - /w/ - /ʃ/
                       /k/             +  [/r/] - /l/ - [/w/]

                       voiced stop        liquid
                       /b/             +  /r/ - /l/
                       /d/             +  /r/ - /ʒ/
                       /g/             +  /r/ - /l/

                       voiceless sibilant liquid or nasal
                       /s/             +  /l/ - /n/ - /m/
                       /ʃ/             +  /r/

                       voiceless fricative liquid
                       /f/             +  /l/ - /r/
                       /θ/             +  /r/

² It is interesting to note that the phonetic constraints illustrated here are not typical of sound verbs and onomatopoeia. Tournier (1985:60) notes that the genesis of general-language words in English is conditioned by a series of factors which determine the relationships between phonemes. In a three-consonant cluster (C1C2C3), for example, C1, C2 and C3 must be different and C1 is obligatorily /s/. C2 is obligatorily /p/, /t/ or /k/. C3 is obligatorily /r/, /j/ or /w/ if C2 is /k/ (e.g. scream, skew, squad). Similarly, C3 is obligatorily /l/, /r/ or /j/ if C2 is /p/ (splash, spray, spume). Finally, C3 is obligatorily /r/ or /j/ if C2 is /t/ (strand, stew). It should also be pointed out that many of the constraints listed above are not typical of English only. Some combinations do not occur because the places of articulation are incompatible, which is a physical constraint. The same phenomena are therefore likely to be found in other languages such as French, German, Dutch...
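The constraints on three-consonant clusters reported from Tournier (1985:60) in the note to this table lend themselves to a small checking routine. The sketch below encodes only those constraints as they are stated in the note; the function name and the plain-letter phoneme symbols are assumptions, and the table does not claim to cover every attested English onset (the main text also mentions /skl/, as in sclerosis).

```python
# C3 values allowed for each C2 in an initial C1C2C3 cluster, as stated in the note.
ALLOWED_C3 = {
    "p": {"l", "r", "j"},  # splash, spray, spume
    "t": {"r", "j"},       # strand, stew
    "k": {"r", "j", "w"},  # scream, skew, squad
}


def is_valid_ccc(c1: str, c2: str, c3: str) -> bool:
    """Check an initial three-consonant cluster against the constraints above."""
    if c1 != "s":                         # C1 is obligatorily /s/
        return False
    return c3 in ALLOWED_C3.get(c2, set())  # C2 must be /p/, /t/ or /k/


for cluster in [("s", "p", "l"), ("s", "t", "l"), ("s", "b", "r")]:
    print("".join(cluster), is_valid_ccc(*cluster))
```

Because C1 is always /s/ and the permitted values of C2 and C3 never overlap with it or with each other, the requirement that the three consonants be different is satisfied automatically.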

The square brackets around /l/, /r/, /w/ and /s/ in the first category mean that only /spl/, /str/, /skr/ and /skw/ are possible in a three-consonant cluster with a voiceless stop. /stl/ does not belong to the English sound system; /skl/ and /spr/ are possible (sclerosis, sprinkle...) but do not occur in verbs denoting sounds. In keeping with Tournier's remark that over a third of the onomatopoeic items concern man and his behaviour (Tournier 1985:158), it is interesting to note that a large number of verbs which appear in my list refer to the way people talk or speak. These verbs, which normally subcategorize for a human subject, can be detected in the database because they are flanked by a [+HUMAN] italicized item such as person, man, woman, people or baby, as in the following examples:
Son (person)= bellow, bleat, burble, caterwaul, chatter, cheep, chime in, chirp...
Son (people)= cackle, cluck, prattle
Son (child)= chatter, prattle
Son (baby)= babble, coo, crow
MagnSon (baby)= howl, scream, squall, squawk, wail
AntiMagnSon (baby)= whimper
As I pointed out earlier (see 5.3.4.6), Mel'cuk's lexical functions are in some cases much too vague and need to be refined with subscripted modifiers. It is clear that verbs such as babble, coo, crow, squawk and whimper, which can all be used to talk about babies, are far from synonymous. This explains why I made use of the functions Magn and Anti to refine the description of the lexical relation between baby and these verbs. In other cases, however, the distinctions cannot be made in terms of levels of intensity. For the verbs bellow, bleat, caterwaul or cheep above, for instance, other mechanisms obviously come into play. They can all be accounted for in terms of metaphorical extensions which, as I point out in Chapter 13, play a crucial part in the creation of new meanings. Indeed, these verbs all belong to the lexical field of animal noises and they are used here metaphorically to characterize human speech.
In the printed Collins-Robert dictionary, such metaphors can be spotted because the microstructure of the verb entry contains two types of metalinguistic indicators, one referring to the original, "basic" meaning with a [+ANIMAL] subject, the other with a [+HUMAN] subject in square brackets to refer to the second, more peripheral reading, as in:
cackle 2 vi [hens] caqueter; [people] (laugh) glousser; (talk) caqueter, jacasser
cheep 2 vi [bird] piauler; [mouse] couiner. 3 vt [person] couiner*
Such conventionalized metaphors, which ought to be captured and represented as such in the lexicon, can also be accounted for in terms of what Ostler & Atkins (1991) call Lexical Implication Rules (for more details on LIRs, see also Chapter 11). The presence of two metalinguistic indicators belonging to the source and target domain ontologies within a verbal entry is sufficient to predict that the verb participates in the LIR "TO UTTER SOUND (OF ANIMAL) -> TO SPEAK LIKE THAT (OF PERSON)". The following entries, excerpted from the database, illustrate cases where the alternation between person and animal signals this property:
Son (animal)= bellow (beugler, mugir)
Son (person)= bellow (brailler)
Son (animal/person)= cry (pousser un cri)
Son (animal/person)= growl (grogner)
Son (animal/person)= howl (hurler)
Son (animal/person)= squeal (pousser un cri aigu)

In other cases, however, it is more difficult to detect such metaphors automatically because the reference to a [+HUMAN] subject alternates with a specific animal, as in:
Son (snake/person)= hiss (siffler)
Son (insect/person)= hum (bourdonner/fredonner)
Son (parrot/person)= squawk (pousser un gloussement/pousser un cri rauque).
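The detection of verbs taking part in the animal/person Lexical Implication Rule, and the difficulty raised by specific animal names, can be sketched as follows. Everything in this example is an assumption made for illustration: the tuple layout, the function name and, above all, the small sets of [+HUMAN] and [+ANIMAL] labels, which stand in for a real ontology.

```python
# [+HUMAN] indicators mentioned in this section.
HUMAN = {"person", "people", "man", "woman", "baby"}

# (lexical function, base, collocator) triples quoted above.
TRIPLES = [
    ("Son", "animal", "bellow"),
    ("Son", "person", "bellow"),
    ("Son", "animal/person", "howl"),
    ("Son", "snake/person", "hiss"),
    ("Son", "elephant", "trumpet"),
]


def lir_candidates(triples, animal_labels):
    """Return the sound verbs attested with both a human and an animal base."""
    bases_by_verb = {}
    for lexfunc, base, verb in triples:
        if "Son" in lexfunc:
            bases_by_verb.setdefault(verb, set()).update(base.split("/"))
    return sorted(
        verb
        for verb, bases in bases_by_verb.items()
        if bases & HUMAN and bases & animal_labels
    )


print(lir_candidates(TRIPLES, {"animal"}))           # generic label only
print(lir_candidates(TRIPLES, {"animal", "snake"}))  # with one specific animal added
```

With the generic label alone, hiss goes undetected; it is only found once snake is known to be an animal, which is exactly the kind of knowledge the printed entries do not supply.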

10.5 Final clusters

The preceding section showed the importance of initial consonant clusters in the analysis of onomatopoeic phenomena. However, it is clear that phonaesthemes are also found in other positions, for example in the middle or at the end of words. Tournier (1985:162) gives an interesting list of final clusters and the meaning they suggest. Obviously, the interpretation assigned to these clusters may sound somewhat impressionistic and, perhaps, slightly subjective. It is also clear that these clusters are not necessarily meant to imitate, but also to suggest sounds and the distinction between the two aspects is, I think, artificial. Tournier's list is reproduced below and illustrated with examples taken from the Collins-Robert database.

1. /æʃ/ /æk/ /æp/: resounding blow
Son (cymbal)= clash
Son (whip)= crack, snap
Son (sail)= flap
Son (finger)= snap
Son (wave)= lap

2. /æbl/ /ækl/ /ætl/ /ætə/ /ʌtə/: rapid series of quick sounds (with repetition)
Son (stream)= babble
Son (hen)= cackle
Son (twig)= crackle
Son (typewriter)= clatter
Son (fire)= splutter

3. /ʌmp/: muffled noise or fall of a heavy object
CausSon (door)= thump

4. /ʌmbl/: indistinct, confused and prolonged noise
Son (thunder)= grumble, rumble
Son (cannon)= rumble

5. /u:m/ /u:p/ /u:ʃ/: noise related to the sweeping movement of an object
Son (wind)= boom
Son (engine)= zoom
Son (water)= swoosh

It is clear that the list above is far from complete. To discover other meaningful sequences, I extracted all the verbs which appear as values of a lexical function containing Son and sorted them automatically in reverse alphabetical order (the list can be found as Appendix B (16.2)). Most items in this list confirm Tournier's classification but it does not take too long to see that other patterns also emerge. The second category above includes the combination /ækl/, for instance, but evidence can be found that the voiced form /ægl/ is also productive. The following examples illustrate some of the final clusters which are not mentioned by Tournier:

/gl/ (e.g. /ægl/, /æŋgl/, /ɪŋgl/...)
Son (goose)= gaggle
Son (rain/stream/water)= gurgle
Son (bell/bracelet/chain/saucepan)= jangle
Son (coin/key)= jingle
CausSon (coin/key)= jingle
It is clear here that several phonemes combine to form complex onomatopoeia. In gurgle, for example, /g/ is a sound produced in the throat and /l/ suggests the noise made by a liquid. The combination of these two onomatopoeic elements suggests some kind of swallowing.

/sl/
Son (leaf/paper/skirt/wind/clothes)= rustle
Son (person)= whistle

/æŋ/
Son (door/firework)= bang
Son (gun)= bang, bang away
Son (bow/wire)= twang

/i:k/
Son (floorboard)= creak
Son (hinge/shoe)= creak, squeak
Son (mouse/pen/wheel/person)= squeak

/ʌm/
Son (aeroplane/engine/machine)= hum
Son (insect)= drum, hum
CausSon (banjo/guitar)= strum

As noted by Tournier (1985:161), simple onomatopoeic elements tend to be associated with general meanings (/η/ suggests vibrations; the liquid /l/ often suggests liquids but, in final position after a voiceless stop, it can also indicate a repeated sound. /r/ tends to be associated with vibration and repeated sounds; a vowel such as /i:/ suggests a long strident sound while ly.l suggests a long, grave sound). The CR data suggests that /æ/ is associated with brief sounds or a series of brief sounds. If these sounds are abruptly interrupted, a voiceless stop is used after /æ/. If the sound is prolonged, the phoneme /l/ is used in final position. Such simple elements combine into clusters and we may suspect that the list is open and that new onomatopoeic combinations are likely to be created, as is the case in cartoons, for example.
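The reverse alphabetical sort used in this section to bring out final clusters can be sketched as follows. This is only an illustration on the verbs quoted above: it works on spelling rather than on phonemic transcription (an assumption made for simplicity), and the function names are hypothetical.

```python
from collections import defaultdict

# Verbs quoted in this section as values of Son or CausSon.
VERBS = [
    "babble", "cackle", "crackle", "gaggle", "gurgle", "jangle", "jingle",
    "rustle", "whistle", "bang", "twang", "creak", "squeak",
    "hum", "drum", "strum",
]


def reverse_sorted(words):
    """Sort words by their reversed spelling so that shared endings are adjacent."""
    return sorted(words, key=lambda word: word[::-1])


def group_by_ending(words, length=3):
    """Group words by their last `length` letters (a spelling-based proxy)."""
    groups = defaultdict(list)
    for word in reverse_sorted(words):
        groups[word[-length:]].append(word)
    return dict(groups)


endings = group_by_ending(VERBS)
for ending, members in endings.items():
    print(ending, members)
```

Because the key is the reversed string, gaggle, jangle, jingle and gurgle end up next to one another, which is what makes the /gl/ pattern visible to the eye in a long list.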

10.6 Conclusion

In this chapter, I have shown that onomatopoeia and phonaesthesia are far from being clear-cut categories. The artificial dichotomy between the two concepts can thus be studied from another angle once the dictionary is organized in database format and enriched with lexical-semantic labels, as is the case with the Collins-Robert dictionary. Examining the values of the lexical function Son (and complex LFs containing Son) has shown that it is possible to investigate the manifold regularities which underlie the structure of the lexicon. I have been mainly concerned here with enriching, complementing and verifying existing descriptions of such phonological regularities but I have also shown that the availability of the Collins-Robert dictionary as a lexical-semantic database makes it possible to test and verify linguistic hypotheses on the basis of a large body of lexical data. For example, it has been possible to provide evidence that causative verbs of sound are hardly ever used with a [+ANIMATE] patient, probably because it is rather difficult to convince or force an animal or a human being to make a sound or a noise. Hence the ungrammaticality of *John managed to bellow the bull/to purr the cat or to howl/gurgle the baby: to mean that one makes the animal utter its characteristic sound, one must resort to the make construction (John made the bull bellow).1 Causative verbs of sound (LF= CausSon) thus seem to be limited to items which select artifacts as direct objects, especially when there is a direct contact between the agent and the patient (John rang the bell; John was jingling his coins/keys in his pocket), i.e. when the sound is not produced internally but externally (Atkins et al. (1996) provide an explanation in terms of internal vs. external sound emission to account for differences in the transitivity patterns of verbs of sound). The conclusions described in this chapter could not have been drawn, had the dictionary not been enriched with lexical-semantic information and structured so as to make multiple-access queries possible.

3

Note that one can burp a baby. Although burping does involve the emission of a sound, Levin (personal communication) considers burp as a bodily process verb, like cough or hiccup, and not as a verb of sound strictly speaking. Moreover, even if the sound originates inside the baby, it is brought about by external contact with a surface, which explains why it can be used transitively.

11 Noun alternations and sense extensions

11.1 Introduction

Polysemy has always attracted a lot of attention because it poses a number of problems when one attempts to construct a dictionary. Since polysemy is more frequently the rule than the exception, it is essential to develop a framework to capture the numerous types of sense extensions which can be found throughout the lexicon of any given language. A lot of effort has recently been put into devising techniques which enable the linguist to detect and formalize systematic cases of polysemy. In the Acquilex ESPRIT project, for example, Copestake & Briscoe (1991) and Briscoe et al. (1990) argue for the existence of lexical rules which account for productive metonymic sense extensions. Their favourite example is the so-called 'grinding' lexical rule which transforms a count noun with well-defined properties appropriate to an individuated physical object into a mass noun whose properties are those of a substance. Copestake & Briscoe (1991) give the following example which illustrates this 'grinding' process whereby the ground (mass) sense 'denotes some stuff which was at some time part of one or more individuals denoted by the count sense' (p.98):

After several lorries had run over the body, there was rabbit all over the road.

The proponents of lexical rules argue that such a framework is required to minimize the size of the lexicon. Indeed, if lexical rules make it possible to capture sense extensions such as the one described above, the number of lexical entries can be reduced dramatically since it is no longer necessary to foresee two distinct entries, one for the count reading, another for the mass reading. Of course, such a framework must also provide tools to block generalizations (exceptions). The typical example is the animal-meat rule which makes it possible to use the noun denoting an animal to refer to the meat itself (it is a specific case of the count/mass alternation: consider The chickens are pecking grains vs. He ate a full plate of chicken).
This rule is blocked in a limited number of cases, as is testified by the existence of specific words of French origin to refer to the meat of several common animals (pork vs. pig; mutton vs. sheep; beef vs. ox; veal vs. calf). Of course, it is much easier to deal with such exceptions when the application of a given rule has to be blocked in a finite number of cases. The main drawback of Copestake & Briscoe's grinding rule, however, is that it says nothing about when this rule should definitely not be applied. As a matter of fact, the grinding rule can only be applied if the 'substance' referred to by the ground sense is still clearly recognizable. 'There was rabbit on the street' is possible provided that the animal is still clearly identifiable because of its fur, its size or any other feature. If a lorry runs over a radio set, it hardly seems possible to say that 'There is radio all over the street', although blind application of the grinding rule allows this. The problem is that the individuated object (the countable noun) can be seen as being made up of various component parts and the grinding rule can only be applied if the object is identifiable even after being taken apart, disassembled or pulled to pieces. The ungrammaticality of 'There is radio on the street' stems from the fact that no one would be able to clearly recognize a radio set after it has been run over by a lorry. Copestake & Briscoe never address this objection, probably because it would prove extremely difficult to formalize and implement such a restriction. Another drawback of such a framework is that, by applying rules too rigidly, one runs the risk of being caught in a vicious circle. Consider the following sentences:

(a) The chickens are pecking corn.
(b) I ate chicken for lunch.
(c) Would you like some more coffee?
(d) He ordered three coffees and drank them down.

(a) and (b) clearly illustrate the count -> mass alternation described earlier in this section. (c) and (d) illustrate the opposite alternation, viz. the mass -> count alternation whereby a mass (substance) noun is used as a count noun to denote a specific item. In theory, all uncountable nouns denoting concrete substances display such potential but, as noted by Sue Atkins (personal communication), some real-world pragmatic constraints seem to limit the application of this rule ('three flours' being most unlikely, except to mean 'three varieties of flour', perhaps because the rule seems to be limited to institutionalized usages for common substances which are frequently bought and sold; similarly, 'three champagnes' is blocked because one never orders champagne by the glass at a bar). In any case, it is undeniable that it is extremely dangerous to postulate the existence of two rules, each being applicable to the output of the other.

11.2 Lexical Implication Rules

As stated at the beginning of this chapter, many linguists have tried to devise ways of capturing generalizations regarding regular polysemy. The lexical knowledge base (LKB) developed in the Acquilex project incorporates productive rules so that it is no longer necessary to list all syntactic properties for all words (Briscoe et al. 1993). Pustejovsky (1991) and Nunberg & Zaenen (1992) present alternative ways of dealing with regular cases of metaphorical and metonymic sense extensions. Interestingly, all these recent contributions are influenced by and draw heavily on Apresyan's (1973) definition of noun alternations and regular polysemy which occur when several words all have (at least) two senses and when all the words exhibit the same relationship between these two senses. The most elaborate and detailed description of regular sense extensions can probably be found in Ostler & Atkins (1991) and in the database compiled by Sue Atkins (see below). Their work owes a lot to Apresyan's contribution to polysemy. Moreover, it can be seen as a practical contribution to lexicography insofar as it attempts to systematize (and hence speed up) the compilation process. In their paper, Ostler & Atkins (1991) argue for the existence of Lexical Implication Rules (LIRs) which can be defined as follows:

Lexical Implication Rule
'A lexical unit LU1, consisting of a lexical component LC1 and a semantic component SC1, implies the potential existence of a lexical unit LU2, with the lexical component LC1 or LC2 and the semantic component SC2'.

Such a definition implies that one sense forms the basis for another sense (unlike metaphors which can be accounted for in terms of a mapping of one conceptual ontology onto another ontology - see Lakoff & Johnson 1980). The following examples illustrate the types of lexical implications LIRs are meant to capture (Ostler & Atkins 1991:77):

(a) LIR Vehicle - Verb
NC singular: vehicle -> VTI (verb transitive/intransitive): to travel/transport using that vehicle
eg. He's got a new cycle / Let's cycle into town
also motor, ferry, canoe, ship, jet etc.

(b) LIR Container - Amount/Content
NC: container (purpose-built only) -> NC: amount it contains/contents
eg. The glass broke / Add a glass of wine. Don't drink the whole glass.
also all purpose-built containers (trunk, tank, jug, bucket etc)

Ostler & Atkins note that their LIRs are centrally concerned with semantics and less so with formal changes. Turning a count noun into a mass noun is a single formal transformation but, semantically speaking, such a process may cover totally different LIRs altogether (animal -> meat; animal -> fur; tree -> wood are only 3 examples of LIRs which involve one and the same syntactic transformation, viz. NC -> NU). As was pointed out above, some mechanism has to be foreseen in order to block the application of a rule in certain circumstances.
Ostler & Atkins call such a mechanism pre-emption, which they further sub-divide into semantic pre-emption (the 'derived' sense is already taken up by a different word) and lexical pre-emption (the word already has a different sense which means that it cannot be used in the 'derived' sense after application of the LIR). To give but one example of semantic pre-emption, the LIR Vehicle - Verb illustrated above is blocked in the case of the noun car, which means that it is impossible to use this word as a verb (*We carred downtown) because the verb drive is available to account for the derived sense. Explaining such gaps in terms of semantic pre-emption may be dangerous, however, because it could constrain the phenomenon of synonymy. It should be noted that LIRs have a lot in common with Pustejovsky's Lexical Conceptual Paradigms - LCPs (see Chapter 4). Example (b) above corresponds to the container/containee LCP, for example. However, it should be admitted that Pustejovsky's descriptive framework only takes into account 7 or 8 types of noun alternations, while Atkins has compiled a database of around 130 noun and verb alternations. This database systematically lists the properties of both the 'base' sense and the 'derived' sense (the semantic class the rule involves: eg. amount, animal, meat, food,...; the syntactic property: nc for count nouns; nu for uncount nouns). It also lists the words which display the alternation in question and, if any, the items which are responsible for pre-emption. In the following section, I should like to examine how some of these LIRs are represented in the Collins-Robert dictionary and to what extent potential candidates which participate in some noun alternations can be identified automatically or semi-automatically.
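The kind of record described above (base sense, derived sense, participating words and pre-empting items) can be pictured with a minimal sketch. The class and field names are assumptions made for illustration; they do not reproduce the actual layout of Atkins's database.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalImplicationRule:
    """Illustrative record for one LIR; all field names are hypothetical."""
    name: str
    base: str      # syntactic and semantic class of the 'base' sense
    derived: str   # syntactic and semantic class of the 'derived' sense
    candidates: set = field(default_factory=set)
    # Maps a blocked word to the item responsible for pre-emption.
    preempted_by: dict = field(default_factory=dict)

    def applies_to(self, word):
        """True if the word belongs to the class and is not pre-empted."""
        return word in self.candidates and word not in self.preempted_by


# Values taken from the Vehicle - Verb example quoted above.
vehicle_verb = LexicalImplicationRule(
    name="Vehicle - Verb",
    base="NC: vehicle",
    derived="VTI: to travel/transport using that vehicle",
    candidates={"cycle", "motor", "ferry", "canoe", "ship", "jet", "car"},
    preempted_by={"car": "drive"},
)

print(vehicle_verb.applies_to("cycle"))
print(vehicle_verb.applies_to("car"), vehicle_verb.preempted_by["car"])
```

Keeping the pre-empting item alongside the blocked word, rather than a bare exception list, makes it possible to revise an entry later if usage changes.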

11.3 LIRs and the bilingual dictionary

11.3.1 Mass/Count alternation

Unlike learners' dictionaries, bilingual dictionaries rarely contain grammatical information in the form of sophisticated grammar codes. In the Longman Dictionary of Contemporary English (1978), for instance, various types of mass/count alternations can easily be identified because of the presence of the conjunction of grammar codes such as [U] (uncountable) and [C] (countable) in two different senses of a word. CR occasionally makes use of a U code as well (NonC in the 1993 version), but no formalized information can be found about the countable character of lexical items. In eight cases, however, the dictionary resorts to a periphrastic explanation to account for this property. Consider the following entries which are subdivided into 2 senses:

abnormality n (a) (U) caractère anormal or exceptionnel (b) (instance of this, also Bio, Psych) anomalie; (Med) difformité, malformation
sophistry n (U) sophistique; (instance of this) sophisme

The words absence and casuistry similarly draw on the string 'instance of this' to show that a basically abstract (uncountable) noun can be used as a count noun. It should be pointed out that the italicized explanation can only be interpreted with respect to the preceding (basic) senses of the word, the word this being used as an anaphoric reference pointing back to the mass sense (coded (U)). In two cases, this refers back to a mass sense which is rendered by the use of the noun act as in:

review n (act) révision; (instance of this) revue, examen
claim n (act of claiming; instance of this) revendication, réclamation

It should be noted that using 'instance of this' is in no way typical of a bilingual dictionary. Monolingual dictionaries actually make much more extensive use of this device, which enables them to shorten the definitions of derived senses. Consider the LDOCE (1978) entry for aberration:

aberration n
1 [U] a usu. sudden change away from the habitual way of thinking; sudden forgetfulness
2 [C] an example of this

The count sense refers to a particular instance of the more general sense and the second definition can only be interpreted because it co-occurs with and comes after a noun coded as [U]. The distinction between [COUNTABLE] and [UNCOUNTABLE] is sometimes expressed in the CR dictionary through the use of hyperonyms and explanations which do not necessarily refer back to the basic sense, as in:

stone n (substance; single piece; also gem) pierre

The metalinguistic indicator substance refers to the uncountable usage (and functions very much like the code U), while single piece clearly shows that the mass/substance in question can be partitioned into smaller, countable components. In still other cases, the mass/count alternation can be retrieved by searching for the co-occurrence of piece with the code U:

veal fillet (U) langue de veau; (one piece) escalope de veau
fillet steak (U) filet de boeuf, tournedos; (one piece) bifteck dans le filet, tournedos
peat n (U) tourbe; (one piece) motte de tourbe

'One piece' is definitely not a definition, nor is it a hyperonym of the headword. It simply functions like a grammatical code ([C] in LDOCE for example) but the unformalized character of such information in CR makes it much less reliable and consistent than in any learners' dictionary.
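The co-occurrence searches described in this section can be approximated with a minimal sketch. It assumes that each entry is available as a plain string keyed by headword; the dictionary below, the two patterns and the function name are hypothetical and only cover the markers quoted above.

```python
import re

# Entry bodies quoted in this section, plus one unrelated entry as a control.
ENTRIES = {
    "abnormality": "(a) (U) caractère anormal or exceptionnel "
                   "(b) (instance of this, also Bio, Psych) anomalie",
    "sophistry": "(U) sophistique; (instance of this) sophisme",
    "peat": "(U) tourbe; (one piece) motte de tourbe",
    "stone": "(substance; single piece; also gem) pierre",
    "mink": "(animal, fur) vison",
}

# Markers of the mass reading: the U code or the indicator 'substance'.
MASS = re.compile(r"\(U\)|\bsubstance\b")
# Markers of the count reading found in the examples above.
COUNT = re.compile(r"\b(?:instance of this|one piece|single piece)\b")


def mass_count_candidates(entries):
    """Return headwords whose entry contains both a mass and a count marker."""
    return sorted(
        headword
        for headword, text in entries.items()
        if MASS.search(text) and COUNT.search(text)
    )


candidates = mass_count_candidates(ENTRIES)
print(candidates)
```

Since the markers are not formalized in the dictionary, any such pattern list is necessarily open-ended and would have to be extended as new wordings are discovered.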

11.3.2 The Animal - Fur LIR

The following information, excerpted from Atkins's database of LIRs, illustrates a kind of noun alternation which turns a count noun denoting an animal into a mass noun denoting its fur:

ANIMAL -> ITS FUR OR SKIN
eg. NC: Minks are a kind of rodent
NU: the coat is made of mink
also: mink, squirrel, fox, leopard/crocodile, calf, snake

Information about whether a noun participates in this alternation is partly retrievable from the CR dictionary on the basis of the co-occurrence of several metalinguistic indicators. Consider the following entries:

ermine n (animal, fur, robes) hermine
mink n (animal, fur) vison
musquash n (animal) rat musqué, ondatra; (fur) rat d'Amérique
skunk n (animal) moufflette; (fur) sconse
white fox n (animal) renard polaire; (skin, fur) renard blanc

In some cases, the French equivalent participates in the same type of alternation (vison, hermine), which accounts for the lumping together of the two senses. Note that, in this case, the lexicographer has provided what might be called reassuring information. For ermine and mink, for instance, the metalinguistic annotations (animal, fur) are not really necessary because English and French cut up the semantic areas covered by ermine/mink and hermine/vison in the same way. This reassuring information is of paramount importance because it is used to construct the database and to identify items which participate in these types of alternations. When this information is missing, as is the case for squirrel, the alternation cannot be established, which is certainly one of the main drawbacks of this approach. In other cases, when different terms have to be used in French to refer to the animal or its fur, a subdivision has to be made. This shows that not all animals display this alternation in French (Le Petit Robert defines 'sconse' as 'fourrure de la moufflette' and 'moufflette' as 'animal chassé pour sa fourrure'). Interestingly, there are several cases of semantic pre-emption in English: the word lamb, for instance, can only be used to refer to the animal and cannot denote its fur because the term lambskin is used for that purpose (note that lamb participates in other alternations, however: the animal-meat LIR or the LIR which makes it possible to use the name of an animal to refer to a person with the characteristics usually ascribed to that animal). Several other words ending in -skin illustrate this type of semantic pre-emption (buckskin, deerskin, doeskin, goatskin, moleskin, pigskin, sealskin, sheepskin...).
The assignment of these items to the list of 'exceptions' should be taken cum grano salis, however, because other factors are likely to play a part in an item's potential to participate in a given LIR. Questions of register and of stylistic level will certainly influence usage and only in-depth corpus-based studies would be able to reveal the subtle stylistic differences, if any, between deer and deerskin, which, according to Webster's Third Dictionary, can both be used to denote the leather made from the skin of that animal. Querying the database to search for the co-occurrence of animal and fur in the same entry does not prove to be a sufficient condition since, in some cases, the lexicographers have used other devices to show that a given word refers to an animal. Consider the following entry:

beaver n (Zool) castor; (fur) (fourrure de) castor

The metalinguistic information which can be used as a clue to identify the 'departure' sense of our LIR is no longer a hyperonym but a label referring to the subject field (Zool for 'Zoology'). This discriminator plays exactly the same role as animal but, of course, the parallelism between the two senses is broken (a subject field indicator co-occurring with a hyperonym). One last thing to note is that the CR conforms to the general assumption that noun alternations rest on the distinction between a basic ('departure') sense and a derived ('arrival') sense. It is therefore logical for the basic meaning to be listed first, while the target meaning comes later in the entry. This need not necessarily be the case in dictionaries whose sense-distinction policy is based on frequency. Cobuild (Sinclair 1987a) is a case in point here, as shown in the following entry:

mink
1. Mink is a very expensive fur that is used to make coats, hats etc. EG. I'm very fond of mink... a mink coat. N UNCOUNT
2. A mink is a very small furry animal, from which mink is obtained. N COUNT

These definitions definitely sound odd because the reader intuitively feels that it is unnatural to define mink_as_fur without referring to mink_as_animal in any way. This is due to the fact that the former sense has come to be used much more frequently in our civilization than the latter reading, which gets reflected in the number of occurrences in the corpus on which Cobuild is based. Semantically speaking, however, this practice obscures the fact that sense 1 is actually derived from sense 2. In a way, Cobuild reflects an induction over a real-life situation while a bilingual dictionary such as CR tries to reflect a semantic property of the word it defines. All this points to the more general problem of sense ordering in dictionaries, which falls outside the scope of this book.
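The query discussed in this section, and the reason why searching for animal and fur alone is not sufficient, can be illustrated with a minimal sketch. It assumes that metalinguistic indicators are the comma-separated items found between parentheses in an entry string; the helper names are hypothetical, and the bare squirrel entry is a placeholder, since the text only says that the indicators are missing there.

```python
import re

ENTRIES = {
    "ermine": "(animal, fur, robes) hermine",
    "mink": "(animal, fur) vison",
    "skunk": "(animal) moufflette; (fur) sconse",
    "white fox": "(animal) renard polaire; (skin, fur) renard blanc",
    "beaver": "(Zool) castor; (fur) (fourrure de) castor",
    "squirrel": "écureuil",  # assumed bare entry without indicators
}


def indicators(text):
    """Collect the comma-separated items found inside parentheses."""
    found = set()
    for group in re.findall(r"\(([^)]*)\)", text):
        found.update(item.strip() for item in group.split(","))
    return found


def animal_fur_candidates(entries, animal_markers=("animal",)):
    """Return headwords marked both as an animal and as a fur."""
    markers = set(animal_markers)
    return sorted(
        headword
        for headword, text in entries.items()
        if "fur" in indicators(text) and indicators(text) & markers
    )


strict = animal_fur_candidates(ENTRIES)
relaxed = animal_fur_candidates(ENTRIES, animal_markers=("animal", "Zool"))
print(strict)
print(relaxed)
```

Treating the subject field label Zool as equivalent to the hyperonym animal retrieves beaver as well, whereas squirrel stays out of reach in both runs because the entry carries no indicator at all.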

11.3.3 The Fruit/Flower of plant - Plant LIR

This LIR turns the name of a fruit or flower into the name of the plant itself. Atkins's database lists the following examples to illustrate this alternation:

NC: We ate raspberries and cream.
NC: They planted a row of raspberries.

First of all, it should be noted that this LIR cannot be considered as a subset of the count/mass alternation since the countable character of the noun remains unchanged. A second thing to note is that the term Plant which refers to the derived sense should be extended to cover trees as well. Actually, Atkins's database explicitly specifies that nouns such as orange, pear or apple participate in this alternation and they can hardly be called plants. Interestingly, the French language does not allow this type of transfer to the same extent as English. French actually makes much more use of a derivation rule, adding the suffix -ier to the name of the fruit (pomme -> pommier, prune -> prunier, citron -> citronnier). In a bilingual perspective, the lexicographer must capture this distinction and, interestingly, the French derivational direction seems to determine the order of senses in English. It is therefore not surprising that this LIR should be treated much more consistently in CR than the animal-fur one, which hardly posed any translation problems because the two languages most often behave similarly. The keywords in italics which can be used to identify items participating in this type of sense extension are fruit and plant/tree, as shown in the examples below:

banana n (fruit) banane; (tree) bananier
breadfruit n (fruit) fruit de l'arbre à pain; (tree) arbre à pain
wild cherry n (fruit) merise; (tree) merisier
mango n (fruit) mangue; (tree) manguier
quince n (fruit) coing; (tree) cognassier
strawberry n (fruit) fraise; (plant) fraisier
tomato n (fruit, plant) tomate

Other examples include citron, date, guava, lemon, lime, nectarine, sorb or tamarind. Note that the noun tomato is translated as tomate in French (which means that tomate is one of the few cases where the fruit-plant alternation applies in French - kaki (persimmon) behaves similarly). In three cases, the dictionary uses the hyperonym nut to refer to a special variety of fruit:

nutmeg n (nut) noix de muscade; (tree) muscadier
pecan n (nut) noix pacane; (tree) pacanier
pistachio n (nut) pistache; (tree) pistachier

The use of such a hyperonym obviously complicates the task of the linguist who wishes to detect instances of nouns participating in a given LIR. Here, the hyperonym is used instead of fruit to denote the basic sense. In the following example, the target (derived) sense is signalled through the use of a hyperonym of tree, viz. bush.

s.v. black cpd blackcurrant (fruit, bush) cassis
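One way of coping with the variation in indicators noted above (nut for fruit, tree or bush for plant) is to map each indicator onto one of the two poles of the LIR before testing for co-occurrence. The sketch below is only an illustration: the mapping table is an assumption drawn from the examples in this section, and the entries are reduced to their sets of indicators.

```python
# Assumed mapping from the indicators seen above to the two poles of the LIR.
POLE = {
    "fruit": "fruit",
    "nut": "fruit",
    "plant": "plant",
    "tree": "plant",
    "bush": "plant",
}

# Indicator sets taken from entries quoted in this chapter; glass is a control.
ENTRIES = {
    "banana": {"fruit", "tree"},
    "strawberry": {"fruit", "plant"},
    "pistachio": {"nut", "tree"},
    "blackcurrant": {"fruit", "bush"},
    "glass": {"tumbler", "glassful"},
}


def fruit_plant_candidates(entries, pole=POLE):
    """Return headwords whose indicators cover both poles of the LIR."""
    hits = []
    for headword, labels in entries.items():
        poles = {pole[label] for label in labels if label in pole}
        if poles == {"fruit", "plant"}:
            hits.append(headword)
    return sorted(hits)


fruit_plant = fruit_plant_candidates(ENTRIES)
print(fruit_plant)
```

The table has to be built empirically: an indicator that is not listed, such as a further hyponym of tree, would silently cause an entry to be missed.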

11.3.4 The Container - Contents LIR

This LIR makes it possible to use a noun referring to a container to denote its contents (There's whisky in the bottle vs. He drank the whole bottle). The prototypical example, bottle, is easily identifiable since the general categories used to name the LIR are specified in the dictionary entry:

bottle n (container, contents) bouteille

Retrieving the following entry on the basis of information in italics is much less straightforward, however, because the metalinguistic information used to denote the alternation is rather infrequent:

glass n (tumbler) verre; (glassful) (plein) verre

Actually, it would be more appropriate to say that the latter example illustrates the Container - Amount LIR, rather than the Container - Contents one.

11.4 Conclusion

Atkins's database contains around 130 LIRs. I have not examined them all here (many of them are not even accounted for in any way in the dictionary, perhaps because lexicographers in the late seventies were much less aware of research in lexical semantics than they are now). What is important at this stage is to realize that some of these LIRs are at least partly retrievable on the basis of the occurrence of some metalinguistic pieces of information (mostly in parentheses). The main problem is to identify the clues which signal an entry's potential participation in a given alternation. This can only be done empirically because the dictionary compilers have not tried to capture these properties in terms of a consistent set of codes. They may be made explicit through the use of certain keywords in italics whose value can only be accounted for as a result of their co-occurring with other keywords in the same environment. In the database, the main lexical relation I used to identify LIRs was the Spec function which links a hyperonym and its various hyponyms. By studying the intersection of the sets of items appearing as values of the Spec function applied to animal and fur respectively, for example, one can identify words which participate in the Animal - Fur LIR. In other cases, the lexical implication rule is brought to the fore thanks to the use of definition patterns such as those studied in detail in Chapter 7 (consider the LIR which turns a noun denoting a material into an adjective meaning "made of that material" - gold (n) -> gold (adj); see 7.6.3). A major problem which must be mentioned is that this area of the lexicon is full of irregularities and that pre-emption for a given LIR is, like any other linguistic phenomenon, liable to change with time. Consider the LIR which turns a noun referring to a means of locomotion into a verb signifying 'to go by that means' (cycle (n) -> cycle (vi)).
Ship and cycle both designate vehicles and can definitely be used as verbs but only the latter can be used intransitively. In her database, Atkins points out that car, boat and plane do not participate in this LIR (because the existence of the verbs drive, sail and fly blocks the application of this rule). She seems to be right as far as car is concerned, but Webster's Ninth New Collegiate Dictionary (Mish 1987) explicitly marks plane and boat as intransitive verbs (defined respectively as 'to travel by airplane' and 'to go by boat'). Atkins also includes the word helicopter in the list of exceptions, albeit with a question mark, which reveals her uncertainty. It is clear, however, that helicopter does participate in the Vehicle LIR; there has definitely been an evolution and the more recent version of the CR dictionary makes it clear that helicopter can be used as a transitive verb of motion (as in US troops were helicoptered to Kuwait). Finally, it should be remembered that a bilingual dictionary aims at capturing translationally-relevant distinctions, which means that it may be considered natural for a lexicographer not to record a given noun alternation for an English headword because the French counterpart behaves in very much the same way.
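The set intersection mentioned in this conclusion can be sketched as follows. The dictionary-of-sets layout and the function name are assumptions; the values are limited to entries quoted in section 11.3 and are in no way a complete extract of the database.

```python
# Illustrative values of the Spec function for a few hyperonyms.
SPEC = {
    "animal": {"ermine", "mink", "musquash", "skunk", "white fox"},
    # beaver is marked (fur) but its animal sense carries the label Zool.
    "fur": {"ermine", "mink", "musquash", "skunk", "white fox", "beaver"},
    "fruit": {"banana", "mango", "quince", "strawberry", "tomato"},
    "tree": {"banana", "mango", "quince"},
}


def lir_candidates(spec, base, derived):
    """Return the items listed under both hyperonyms, in alphabetical order."""
    return sorted(spec.get(base, set()) & spec.get(derived, set()))


animal_fur = lir_candidates(SPEC, "animal", "fur")
fruit_tree = lir_candidates(SPEC, "fruit", "tree")
print(animal_fur)
print(fruit_tree)
```

The intersection only returns what the lexicographers chose to mark: beaver is absent from the first result because its animal sense is signalled by a subject field label rather than by the hyperonym animal.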

12 Transitivity alternations

12.1 Introduction

In this chapter, I should like to examine the way in which certain types of transitivity alternations are represented in, and can be extracted from, dictionary entries. The underlying assumption is, to quote Levin (1993:1), that "the behaviour of a verb, particularly with respect to the expression and interpretation of its arguments, is to a large extent determined by its meaning". Levin's contention, which is largely shared by other linguists and lexicographers and which goes back a long way (Halliday 1967-1968, Lyons 1968, Atkins et al. 1986, Boguraev 1991, Antelmi & Roventini 1992), is that knowing a verb entails knowing how the arguments of this verb can be realized syntactically. To illustrate the so-called unspecified object alternation, Atkins et al. (1986) take the example of the verb eat which can be used intransitively or transitively, as in:

(a) John ate the cake.
(b) John ate.

In (a) and (b), the subject refers to the one who does the eating. In (a), an object noun phrase refers to what was eaten. In terms of semantic roles, one may argue that the verb eat requires an agent realized as the subject and an optional theme argument realized as a direct object. If the verb is used intransitively, as in (b), the sentence still implies that something was eaten, although the nature of what was eaten is left unspecified (see also Allerton 1975). Atkins et al. (1986) and Levin (1993) note that this unspecified (or indefinite) object alternation is characteristic of an entire class of verbs denoting an activity (drink, sing, write, read...: John read the book entails John read). The entailments which are possible with verbs participating in this alternation cannot be made with other verbs, as the following example clearly shows:

(c) Julia washed her baby.
(d) Julia washed.

Although wash and eat both participate in a transitive/intransitive alternation, it is clear that they should not be treated similarly.
Indeed, the sentence Julia washed her baby does not entail Julia washed because the latter is equivalent to Julia washed herself. As a matter of fact, wash participates in what Levin (1993) calls the 'understood reflexive object alternation', which means that the intransitive use of the verb expresses an action directed toward the subject of the verb (reflexive meaning). This property is typical of 'verbs of caring for the whole body' (verbs referring to taking care of or grooming the whole body - Levin 1993:35, 227): change, dress, undress, shave, shower, strip, wash... Interestingly, 'verbs of caring for a specific body part', such as powder (nose), brush (teeth), or towel (hands, face) do not participate in this understood reflexive object alternation, which is why Levin (1993:227-228) subdivides the class of 'verbs of grooming and bodily care' into several subclasses, specifying whether the verbs apply to the whole body or to a specific body part only, for instance.

The considerations above provide ample evidence that the traditional dual distinction between transitive and intransitive verbs is not satisfactory. Knowing a verb entails knowing much more than the fact that a verb can, must or cannot have a direct object. One must in fact also consider the range of alternations in which a given verb may participate. In many cases, Levin argues, such syntactic properties can be predicted on the basis of the meaning of the verb. For example, knowing that disrobe refers to taking care of the whole body and hence belongs to the class of 'dress' verbs is enough to predict that it participates in the reflexive object alternation, as in the example given by LDOCE (1978):

disrobe v [I0;(T1)] fml to take off (esp. ceremonial outer) clothing: After the trial, the judge disrobed and left the court.

The transitivity alternations illustrated with eat and wash above are just two instances of a larger set of transitive/intransitive alternations. The exhaustive list can be found in Levin (1993) but it is worth exemplifying a few more of them to demonstrate the complexity of the task lexicographers are faced with when describing the syntactic potential of verbs.

Causative/Inchoative alternation
(e) John opened the door.
(f) The door opened.

Middle alternation
(g) They sell a new dictionary.
(h) The new dictionary sells like hot cakes.

Reciprocal alternation
(i) Peter kissed Vivian.
(j) Peter and Vivian kissed (= Peter and Vivian kissed each other).

Locative alternation
(k) John loaded hay onto the truck.
(l) John loaded the truck with hay.

Dative alternation
(m) John gave Mary the book.
(n) John gave the book to Mary.

Indefinite/Unspecified object alternation
(o) John started to sing a song.
(p) John started to sing.

Reflexive object alternation
(q) Vivian was dressing herself when I came in.
(r) Vivian was dressing when I came in.

Conative alternation
(s) Vivian cut the bread.
(t) Vivian cut at the bread.

It should be noted that some alternations are not directly linked to the distinction between transitive and intransitive usages of a verb. The locative alternation, for example, is typical of verbs belonging to the so-called 'spray/load' class and describing 'actions in which an agent puts some substance or material on a surface or in a container' (Atkins et al. 1986:20). The alternation here concerns the syntactic realizations of the substance and the container/surface which can appear as direct object NPs or as prepositional phrases.
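Levin's claim that alternations can be predicted from class membership can be illustrated with a small lookup. The following is only a minimal sketch, limited to the verbs cited in this section; the class labels, the dictionary names and the two functions are hypothetical and do not correspond to any existing resource.

```python
# Hypothetical class labels; members are restricted to verbs cited in the text.
VERB_CLASSES = {
    "activity": {"eat", "drink", "sing", "write", "read"},
    "whole_body_care": {"change", "dress", "undress", "shave",
                        "shower", "strip", "wash", "disrobe"},
    "body_part_care": {"powder", "brush", "towel"},
}

# Alternation licensed by each class, as described in the text.
CLASS_ALTERNATIONS = {
    "activity": {"unspecified object"},
    "whole_body_care": {"understood reflexive object"},
    "body_part_care": set(),
}


def alternations_for(verb):
    """Return the alternations predicted from the classes the verb belongs to."""
    found = set()
    for label, members in VERB_CLASSES.items():
        if verb in members:
            found |= CLASS_ALTERNATIONS[label]
    return found


def implied_object(verb):
    """Say what the intransitive use leaves understood, or None if nothing is predicted."""
    alternations = alternations_for(verb)
    if "understood reflexive object" in alternations:
        return "oneself"      # Julia washed = Julia washed herself
    if "unspecified object" in alternations:
        return "something"    # John ate = John ate something
    return None


if __name__ == "__main__":
    for verb in ("eat", "wash", "disrobe", "brush"):
        print(verb, sorted(alternations_for(verb)), implied_object(verb))
```

A verb absent from every class simply yields an empty set, which is a reminder that such a table predicts nothing beyond the classes that have actually been described.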

12.2 The causative/inchoative alternation

In this section, I would like to concentrate more particularly on one type of alternation which has attracted a lot of attention in the past few years, viz. the causative/inchoative alternation. Consider the verb boil in the following sentences which illustrate this type of alternation:

1. John boiled the water.
2. The water boiled.

It has often been noted that this alternation is typical of change-of-state verbs and the verbs which participate in it are frequently referred to as ergative verbs (Levin 1987, 1993; Atkins et al. 1986...). The Cobuild dictionary is, to my knowledge, the only dictionary that explicitly distinguishes ergative verbs, which are coded as V-ERG. It defines them as 'verbs which are both transitive and intransitive in the same meaning; (...) the object of the transitive verb can be used as the subject of the intransitive verb' (Sinclair 1987a:1620). In terms of semantic roles, the verbs involve an agent (generally an animate entity) and a patient (the entity that changes state). These verbs have a causative and a non-causative use (inchoative refers to verbs expressing the beginning of a change of state). The semantic relations between the verb and its arguments may be expressed in two different ways since the causative (i.e. transitive) construction implies that the agent is realized as the subject and the patient argument is the object. The inchoative (i.e. intransitive) construction only involves a patient which is realized as the subject. The term agent is used here in a broad sense and refers to the cause of the action: it may represent an animate being, an initiator (as in The general marched the soldiers) or a force (as in The winter wind froze the pond). Some justification for conflating these concepts under the more general heading of AGENT can be found in Allerton (1982). Levin (1993:31) makes a further distinction, however, and argues that verbs such as march, jump, trot (and other 'run' verbs) participate in what she calls the Induced Action Alternation insofar as the 'causee' is an animate volitional entity that is induced to act by the causer, as in:

3. The general marched the soldiers.
4. The soldiers marched.
5. The girl jumped the horse over the fence.
6. The horse jumped over the fence.

In earlier studies (Fontenelle & Vanandroye 1989, Fontenelle 1991,1992d), it was shown that ergativity is lexically governed and that this property should be coded at definition level if we want a computerized lexicon to account for the transitive and intransitive usages of this set of change-of-state verbs. It is indeed clear that not all change-of-state verbs are ergative: the verb fracture, for example, is definitely ergative (consider the two CIDE examples: She fractured her skull in the accident vs. Two of her ribs fractured when she was thrown from her horse), but the verb dislocate, which also refers to a change of state and belongs to the same sub-class of "mutilation" verbs, can only be used transitively (I dislocated my knee). Francis & Sinclair (1994) note that the class of ergative verbs seems to be in constant evolution and expands very quickly, as is testified by recent corpus-based studies (see also Stubbs 1993,1994). The problem, of course, is to be able to acquire and identify potential ergative verbs. Since these verbs are usually not explicitly tagged as such, the acquisition method is heavily dependent on the microstructure of the dictionary entries. In lexical resources such as LDOCE or OALD, for example, a set of definition patterns (or defining formulae, to quote Ahlswede & Evens 1988a) may be tapped to provide evidence that a given verb is ergative. Such definition patterns are, to some extent, the lexicographical application of the predicate decomposition approach developed by the generative semanticists in the 1960's. Their contention was that the meaning of words can be broken down into a combination of what they called semantic primitives or atomic predicates. CAUSE, BECOME and BE, among many others, belong to this limited set of atomic predicates (see also Evens et al. 1980; see also my note on p. 158). 
They can be used to account for the lexical representation of a verb such as boil for which two representations are possible, depending on whether the verb is (a) transitive or (b) intransitive.

[Diagram: two decomposition trees for BOIL. Tree (a) contains the nodes CAUSE, X and S, with S dominating BECOME STEAM; tree (b) contains the nodes X and S, with S dominating BECOME STEAM.]

The optional character of the primitive CAUSE accounts for the transitive/intransitive alternation since the two constructions can be paraphrased as follows:

(a) BOIL = CAUSE to BECOME STEAM
(b) BOIL = BECOME STEAM

It is important to realize that ergative verbs are to be distinguished from causative verbs which can only be used transitively (e.g. to lay: to cause to lie; to raise: to cause to rise).
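The difference between an optional and an obligatory CAUSE can be made concrete with a very small data structure. This is a sketch for illustration only: the class name, its fields and the labels "optional" and "obligatory" are assumptions introduced here, not a notation used by the generative semanticists or in the LDOCE work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbDecomposition:
    """Hypothetical record: a verb, its core predicate and the status of CAUSE."""
    verb: str
    core: str    # e.g. "BECOME STEAM"
    cause: str   # "optional" (ergative) or "obligatory" (causative only)

    def is_ergative(self):
        return self.cause == "optional"

    def readings(self):
        """Map each available construction (vt/vi) to its paraphrase."""
        causative = f"CAUSE to {self.core}"
        if self.cause == "optional":
            return {"vt": causative, "vi": self.core}
        if self.cause == "obligatory":
            return {"vt": causative}
        raise ValueError(f"unknown CAUSE status: {self.cause!r}")


boil = VerbDecomposition("boil", "BECOME STEAM", "optional")
lay = VerbDecomposition("lay", "LIE", "obligatory")

if __name__ == "__main__":
    print(boil.readings())  # both a transitive and an intransitive paraphrase
    print(lay.readings())   # a transitive paraphrase only
```

With CAUSE marked as optional, boil receives both paraphrases (a) and (b); with CAUSE obligatory, lay is left with the transitive reading only, which is the distinction drawn in the preceding sentence.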

Starting from the idea that ergative verbs, although not coded explicitly, could be located using a combination of particular (formalized) grammatical codes and of (semi-formalized) definition patterns, we were able to extract lists of ergative verbs from LDOCE (Fontenelle & Vanandroye 1989, Fontenelle 1992d). Of course, the fact that the LDOCE lexicographers had used a restricted vocabulary of around 2,000 items to write their definitions was of great help since this practice has imposed some constraints on the defining style: variations being somewhat limited, it is easy to locate ergative verbs once valid predictors of ergativity have been detected. The LDOCE definition patterns which correspond to the surface realization of the semantic primitive CAUSE are the following ones:

- to (cause to) V
- to (allow to) V
- to (help to) V
- to make or become
- to bring or come

The causative/inchoative alternation is made explicit by the lexicographer's use of parentheses in the first three cases whereas the disjunctive conjunction 'or' accounts for this alternation in the last two definition patterns. As noted by Boguraev (1991:245-246), this attempt to extract ergative verbs from the Liège LDOCE database rests upon a careful analysis of the semantically-motivated lexicalisation of causativity without taking into account purely structural properties of dictionary definitions (see below). This approach was also taken over by Sikra (1992) in a successful effort to acquire causative verbs from an on-line version of the Concise Dictionary of the Slovak Language. Antelmi & Roventini (1992) have also adopted a similar approach to study this alternation in a monolingual Italian machine-readable dictionary.
It should be borne in mind, however, that such acquisition strategies can only be implemented because the lexical resources which are exploited are monolingual dictionaries whose semi-formalized definitions can be searched with a view to identifying predictors of ergativity. Bilingual dictionaries, on the other hand, hardly ever make use of definitions, which means that other strategies will have to be devised to identify verbs which participate in the causative/inchoative alternation.
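The five LDOCE definition patterns listed in this section lend themselves to a simple pattern-matching sketch. The code below is not the program used for the LDOCE extraction, which also relied on grammar codes; it only illustrates the definition-level test, and the sample definitions are invented strings, not quotations from LDOCE.

```python
import re

# Surface realizations of an optional CAUSE, as listed in the text.
ERGATIVE_PATTERNS = [
    re.compile(r"\bto \((?:cause|allow|help) to\) \w+"),  # to (cause to) V, etc.
    re.compile(r"\bto make or become\b"),
    re.compile(r"\bto bring or come\b"),
]


def looks_ergative(definition):
    """Return True if the definition contains one of the ergativity predictors."""
    text = definition.lower()
    return any(pattern.search(text) for pattern in ERGATIVE_PATTERNS)


if __name__ == "__main__":
    # Invented sample definitions, for illustration only.
    samples = [
        "to (cause to) become liquid",
        "to make or become less",
        "to cause to lie",  # unbracketed 'cause': a causative-only verb
    ]
    for definition in samples:
        print(f"{definition!r}: {looks_ergative(definition)}")
```

The third sample shows why the parentheses matter: an unbracketed 'cause to' signals a purely causative verb such as lay, which the patterns deliberately leave out.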

12.3 Ergative verbs and the Collins-Robert dictionary

Boguraev (1991:244-247) argues that linguistic phenomena such as ergativity can be extracted from monolingual dictionaries on the basis of purely structural properties of the dictionary entries. This does not mean that one could dispense with a more semantics-oriented approach such as advocated in Fontenelle & Vanandroye (1989) or Fontenelle (1992d) (for a comparison between the two approaches, see also Fontenelle 1996c). A more structural approach is, however, better adapted to the format of bilingual dictionaries such as the Collins-Robert dictionary. Since definitions cannot be tapped to recover a set of potential ergative verbs, it is necessary to discover how a dictionary such as CR signals to the user that a given verb displays the causative/inchoative alternation. At this juncture, it is important to realize that monolingual and bilingual dictionaries differ in this respect insofar as the former have opted for the so-called 'lumping' strategy, which means that the two variants of the alternation (causative/transitive vs. inchoative/intransitive) are considered to instantiate a single sense of the verb. The identification techniques advocated above for LDOCE or COBUILD can only be applied because we are working at definition level to retrieve representative defining formulae which are used to lexicalize causativity. Bilingual dictionaries, however, usually adopt the 'splitting' strategy in organizing senses. Atkins et al. note that 'they [bilingual dictionaries] adopt strict divisions by parts of speech, supplemented with transitivity indications for verbs' (1986:9). They argue that there are pragmatic reasons for grouping all transitive senses together and treating intransitive senses separately: in their opinion, dictionary users are only able to guess the part of speech and the transitivity of an unknown word if it is contextualized (provided they are at all able to identify whether a given word is a verb, of course). This means that a prototypical ergative verb will be treated in CR as follows (the entries have been slightly edited for the sake of clarity):

turn
  3 vt g milk faire tourner
  4 vi d [milk] tourner

lessen
  1 vt (...) pain atténuer; (...) (Pol) tension relâcher
  2 vi (...) [pain] s'atténuer; [tension] se relâcher

Several things ought to be noted here. First, unlike what happens in monolingual dictionaries such as LDOCE, the information given in the entry is not restricted to the verb since the patient argument is here specified explicitly. The nouns milk, tension and pain in the examples above all refer to the entity which changes state. It must be borne in mind that the patient argument appears unbracketed in the transitive sense (direct object) and surrounded by square brackets in the intransitive sense (subject). This information is of cardinal importance since it will be used to extract verbs which pattern in a similar fashion. In fact, the structure of the CR database makes it possible to extract all the verbs which can be transitive (POS = vt) and intransitive (POS = vi) and for which a given item in italics (itword in the mybase.dbf database) can be used either as the subject of the verb (Typ = C for the square brackets which surround the subject - "crochet" in French) or as its direct object (Typ = S - surface level unbracketed). Access to ergative verbs is therefore made possible via the typical patients with which they can be combined. The following table illustrates the five ergative verbs (burn, congeal, curdle, sour, turn) which can be extracted when the noun milk in italics is chosen as primary access key:

Itword  Typ  Enhead   Pos  Frtran
milk    C    burn     vi   attacher
milk    S    burn     vt   laisser attacher
milk    C    congeal  vi   se cailler
milk    S    congeal  vt   faire cailler
milk    C    curdle   vi   se cailler
milk    S    curdle   vt   cailler
milk    C    sour     vi   tourner
milk    S    sour     vt   faire tourner
milk    C    turn     vi   tourner
milk    S    turn     vt   faire tourner

(Itword= word in italics; Typ= typographical information; Enhead = English headword; Pos= part of speech; Frtran= French translation - see chapter 6.6.1 for more information on the structure of the database). Montemagni (1994) points out that knowing whether a verb is ergative or not is necessary to characterize the linguistic properties of verbs, but she adds that it is far from being sufficient. In addition to that, she argues, the lexical-semantic description of a verb must also include restrictions on the possible arguments a verb can take. Such restrictions account for the wellformedness of (7) and (8) and the ill-formedness of (9) in the following contrastive pairs: (7) John rang the bell. (8) The bell rang. (9) * John rang the telephone. (10) The telephone rang. These sentences make it abundantly clear that the ergative property of the verb ring is restricted to cases where it co-occurs with specific patients such as bell to the exclusion of other potential nouns such as telephone, which can only be used as subject. Some explanation may also be found in the hypothesis that the causative use of an ergative verb seems to entail that there is a direct action, often with some kind of contact between the agent and the patient. This could perhaps account for the ungrammaticality of John rang the telephone above, since there is no direct contact between John and the other person's telephone. Another explanation is that verbs which describe sounds with an internal source of production can only be used intransitively. In (10) above, the sound is apparently produced autonomously with the sound emitter (the telephone) while in (7) and (8), the sound is produced externally, through contact with the external surface of the bell. Verbs which describe sounds produced externally may be used transitively with a causative interpretation (Atkins et al. 1996:348). 
Montemagni's criticism of traditional approaches to the extraction of ergative verbs is therefore justified in the case of monolingual dictionaries, which usually do not include any explicit reference to such restrictions. The CR dictionary, on the other hand, enables linguists to readily identify this syntactic property together with the lexical set of arguments which make this type of alternation possible (the list of ergative verbs and their typical patients extracted from the CR dictionary is given as Appendix J - 16.10).
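The extraction criterion described in this section (the same italicized item recorded as a bracketed subject of a vi sense and as an unbracketed object of a vt sense) can be sketched as a set intersection. The field names follow those given in the text (itword, typ, enhead, pos, frtran), but the in-memory list of dictionaries is an assumption made for the example and is not the format of the mybase.dbf file. The turn and lessen rows come from the entries quoted above; the ring rows and their translations are illustrative assumptions modelled on examples (7)-(10).

```python
RECORDS = [
    # Rows based on the CR entries quoted in the text.
    {"itword": "milk", "typ": "S", "enhead": "turn", "pos": "vt", "frtran": "faire tourner"},
    {"itword": "milk", "typ": "C", "enhead": "turn", "pos": "vi", "frtran": "tourner"},
    {"itword": "pain", "typ": "S", "enhead": "lessen", "pos": "vt", "frtran": "atténuer"},
    {"itword": "pain", "typ": "C", "enhead": "lessen", "pos": "vi", "frtran": "s'atténuer"},
    # Illustrative rows (assumed, not taken from the database).
    {"itword": "bell", "typ": "S", "enhead": "ring", "pos": "vt", "frtran": "faire sonner"},
    {"itword": "bell", "typ": "C", "enhead": "ring", "pos": "vi", "frtran": "sonner"},
    {"itword": "telephone", "typ": "C", "enhead": "ring", "pos": "vi", "frtran": "sonner"},
]


def ergative_verbs(records, itword):
    """Headwords for which itword is both a vi subject (Typ C) and a vt object (Typ S)."""
    as_subject = {r["enhead"] for r in records
                  if r["itword"] == itword and r["pos"] == "vi" and r["typ"] == "C"}
    as_object = {r["enhead"] for r in records
                 if r["itword"] == itword and r["pos"] == "vt" and r["typ"] == "S"}
    return sorted(as_subject & as_object)


if __name__ == "__main__":
    for patient in ("milk", "pain", "bell", "telephone"):
        print(patient, ergative_verbs(RECORDS, patient))
```

Because the patient is the access key, ring is returned for bell but not for telephone, which is the kind of argument restriction Montemagni asks for.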

12.4 Ergativity and translation

It is generally admitted that knowledge about the syntactic behaviour of verbs is essential for the development of practical natural language processing systems. This means that a computational lexicon requires an accurate and detailed representation of subcategorization, if it is to be used to disambiguate and interpret a natural language text. Therefore, ergativity is undoubtedly a property we wish to identify since it is essential for the assignment of semantic roles. In a translation perspective, this property is equally important since several patterns are possible to render the transitive and intransitive uses of English ergative verbs into French (Geradon 1994 deals with the similar question of finding German equivalents of English ergative verbs - albeit in a non-computational perspective):

12.4.1 No modification

The causative and non-causative uses of the verb are expressed by the same verb in French:

(a) E: The government has decided to increase the price of bread.
    F: Le gouvernement a décidé d'augmenter le prix du pain.
(b) E: The price of bread will increase in January.
    F: Le prix du pain augmentera en janvier.

Other examples are diminuer (

Is it small? Yes -> a square of chocolate, a segment of orange, a speck of dust, a flake of snow...; No, but it's quite thin: a slab of stone, a rasher of bacon, a bar of chocolate...; Is the substance solid? No -> a drop of water/wine, a dash/squirt/squeeze of lemon juice/oil, a puff/wisp of smoke. The algorithmic presentation is definitely a major improvement on Mel'čuk's use of lexical functions which, as I have argued repeatedly, sometimes require a deeper level of description and refinement (on the use of subscripted modifiers and the overgenerality of LFs, see section 5.3.4.6). My database indeed does not make any distinction between a bar/tablet of chocolate and a piece of chocolate which are all represented in terms of the Sing function. Yet, the former combinations refer to standard, marketed units which are themselves divisible into smaller pieces. A tablet/bar of chocolate can be divided into pieces/squares of chocolate. These distinctions are hardly formalizable in terms of lexical functions and are not accounted for in the CR database. In some cases, however, I have introduced modifiers such as [solid] and [liquid] which make it possible to distinguish the following examples in the database:

Sing_solid (lemon)= slice, zest
Sing_liquid (lemon)= dash

Two misconceptions which underlie the LDOCE table must be cleared up. First of all, the insistence on substances could persuade the learner that words to talk about a piece of something are limited to concrete objects which can be illustrated. Abstract notions can also be partitioned, however, and knowledge of their combinatory potential is equally important, even if it is obviously less easy to describe graphically. The following examples from the CR database should normally be enough to provide evidence that this category of nouns also deserves a similar treatment:

Sing (conversation)= scrap, snippet (bribe, fragment)
Sing (excitement)= flush (accès)
Sing (fever)= outbreak (accès)
Sing (generosity)= access (élan)
Sing (humour)= fund (fond)
Sing (truth)= grain, ounce, rag, shred, speck, spot (ombre, grain, brin)

The second danger is that the learner might be tempted to think that the collocations which are included in the table are the only possible ones. Of course, the compiler has cautiously noted that "the table shows some of them [i.e. words to talk about a piece of something T.F.] and gives an idea of which substances they can refer to". Dust, for example, is shown to collocate with speck but the CR database shows that its collocational range is much more extended:

Sing (dust)= fleck, mote, particle, speck

Puff and wisp are listed as potential collocates of smoke but, again, the CR data confirms my hunch that the LDOCE table (or any other such description found in vocabulary textbooks) would benefit from access to such a database:

Sing (smoke)= coil, curl, gust, puff, ring, trail, whiff, wisp

Of course, this data is not accompanied by frequency information, and MI and t-score data would most certainly reveal that some candidates above are much more likely than others to appear in the vicinity of the bases in question. I am firmly convinced, however, that such a database could form a most useful resource to improve on existing descriptions and compile new generations of pedagogical reference works. Ideally, some of Mel'čuk's lexical functions could also be used in the micro-structure of entries to help advanced learners encode their meanings with greater facility. The lexical functions would then be used alongside the traditional grammar codes. An adapted, less technical version of the lexical functions might be envisaged, along the lines suggested by Hausmann (1979) who uses natural-language labels to introduce semantic subclasses (see also p.38).
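The Sing examples discussed in this section, including the [solid] and [liquid] modifiers, can be pictured as a small keyed table that may be queried either from the base or from the collocate. This is a minimal sketch: the values are those quoted above, but the dictionary layout and the two function names are hypothetical and do not reflect the actual structure of the CR database.

```python
# (base, modifier) -> values of the Sing function, as quoted in the text.
SING = {
    ("lemon", "solid"): ["slice", "zest"],
    ("lemon", "liquid"): ["dash"],
    ("dust", None): ["fleck", "mote", "particle", "speck"],
    ("smoke", None): ["coil", "curl", "gust", "puff", "ring", "trail", "whiff", "wisp"],
    ("truth", None): ["grain", "ounce", "rag", "shred", "speck", "spot"],
}


def sing(base, modifier=None):
    """Values of Sing for a base; without a modifier, all subscripted variants are merged."""
    if modifier is not None:
        return sorted(SING.get((base, modifier), []))
    values = []
    for (entry_base, _modifier), collocates in SING.items():
        if entry_base == base:
            values.extend(collocates)
    return sorted(values)


def bases_for(collocate):
    """Reverse access: the bases for which the collocate is a value of Sing."""
    return sorted({base for (base, _modifier), collocates in SING.items()
                   if collocate in collocates})


if __name__ == "__main__":
    print(sing("lemon", "liquid"))
    print(sing("lemon"))
    print(bases_for("speck"))
```

The reverse lookup shows that speck partitions both a concrete noun (dust) and an abstract one (truth), a point that a table organized by substance alone tends to hide.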

14.5 Conclusion

In this chapter, I have sought to explore potential applications of the Collins-Robert database in the field of language teaching. The idea of teaching collocations is not new. As early as the beginning of the 20th century, Charles Bally, in his Traité de stylistique française (1909), proposed a battery of exercises, ranging from cloze tests to multiple-choice questions, which aimed at teaching collocations. Hausmann (1979:189) cites one of these exercises in which the reader is asked to fill in the blanks with the appropriate word selected from a list (the italicized item in the corrected version):

- Trahir une agitation fébrile.
- C'est d'un prix exorbitant.
- Etre d'une ignorance crasse.
- Elle est d'une laideur repoussante.
- J'ai une envie folle de partir.

Perhaps Bally had had access to Flaubert's Dictionnaire des idées reçues to which I alluded in the introduction to this book. More recently, Mackin (1978) devised a number of exercises inviting native speakers and advanced learners to complete a set of sentences taken from a corpus. In his exercises, Mackin made sure that the sentences were given as little context as possible to test the native speaker's intuitions as to the familiarity of the expressions and their variability. It should be noted that besides restricted collocations proper, Mackin considered pure idioms, clichés, discourse devices and similes. Similarly, the BBI compilers have felt the need to write a workbook with exercises aimed at exploiting the BBI material (Benson et al. 1991). This workbook, however, contains numerous types of tests exploiting grammatical collocations (typical prepositions...) and subcategorization properties. Less emphasis has been laid on lexical collocations. The approach I have adopted in this chapter is based on work on the organization of the mental lexicon (Aitchison 1987). Didactic research has indeed shown that language students learn best when they are presented with semantic networks whose arcs between the nodes correspond to the lexical functions linking italicized metalinguistic items and headwords in the dictionary. Baten et al. (1993), who are concerned with the use of semantic networks in foreign language teaching, argue that clustering information in meaningful chunks is a recognized memory technique in language learning. Semantic networks should not be used blindly, however, given their predominant tendency to organize concepts along the paradigmatic axis only (co-hyponyms, synonyms, antonyms, etc.). I have shown here that the Collins-Robert database contains enough material to be used as a source of countless exercises, taking the syntagmatic dimension into account. 
The exercises I have suggested in this chapter have not been implemented but the recent advances in Hypertext technologies open up exciting vistas for another type of open CALL software packages. Given the availability in the same type of database form of other dictionaries of English such as LDOCE, it even seems reasonable to envisage the development of an integrated lexical database which would allow the user to navigate from a bilingual to a monolingual dictionary, "surfing" a network of lexical data and providing learners (and teachers) with yet other pedagogical resources.
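Although the exercises have not been implemented, a multiple-choice item of the kind suggested in this chapter could be generated along the following lines. This is a hedged sketch only: the function name, the dictionary layout and the rule for choosing distractors are assumptions, and the data are limited to three Sing sets quoted earlier in the chapter.

```python
# Values of Sing quoted in the chapter, keyed by base.
COLLOCATIONS = {
    "dust": ["fleck", "mote", "particle", "speck"],
    "smoke": ["coil", "curl", "gust", "puff", "ring", "trail", "whiff", "wisp"],
    "truth": ["grain", "ounce", "rag", "shred", "speck", "spot"],
}


def multiple_choice(base, collocations, n_distractors=3):
    """Build one deterministic item: a recorded collocate plus unrecorded distractors."""
    valid = set(collocations[base])
    answer = sorted(valid)[0]
    # Distractors are collocates of other bases that are NOT recorded for this base.
    candidates = {c for other, values in collocations.items()
                  if other != base for c in values}
    distractors = sorted(candidates - valid)[:n_distractors]
    return {
        "prompt": f"a ___ of {base}",
        "answer": answer,
        "options": sorted([answer] + distractors),
    }


if __name__ == "__main__":
    item = multiple_choice("dust", COLLOCATIONS)
    print(item["prompt"], item["options"], "->", item["answer"])
```

A distractor is only a collocate that the data do not record for the base in question; since the database is not exhaustive, a teacher would still have to check that no distractor is in fact acceptable.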

15 General conclusions

In a paper presented at a workshop in Grenoble on 'The Future of the Dictionary' in October 1994, Veronis & Ide (1994) wonder whether the 15 years of research on machine-readable dictionaries constitute wasted effort, considering the fact that "the extraction of semantic information has proved to be a far greater problem than originally envisaged, and, as a result, not a single large-scale knowledge base has been created from MRDs to date". Veronis & Ide suggest that the answer to their question depends on the point of view one wishes to adopt. It is true that the original goals, i.e. turning totally automatically the MR version of a dictionary into an LKB directly usable by an NLP system have not been met. It is also true that the few taxonomies based on genus terms extracted from definitions are hardly usable at all in current systems. To be sure, lexical-semantic information has proved harder to extract from MRDs than was thought 15 years ago. These unsatisfactory results are balanced, however, by the outstanding contribution MRD research has made to the study of word meaning. I tend to share this point of view, insofar as what I have tried to develop in this book should be seen as a modest contribution to the study of lexical-semantic relations and, more specifically, of collocational knowledge. The vast resource of linguistic data which is now available in the form of a flexible database will provide lexicographers and linguists with another testbed for their linguistic hypotheses. The emphasis in this book has been laid on the construction of the database proper and on the systematic enrichment of the Collins-Robert material with lexical-semantic labels drawing on the descriptive apparatus of Mel'cuk's lexical functions. 
My working alone with so much data over such a long time-span has probably entailed a number of inconsistencies in the database but, at the same time, revealing the shortcomings of Mel'cuk's theory, both in the coverage of lexical functions and in the lack of operational criteria governing their assignment, was only possible because 1 had decided to work on the entire dictionary. I could obviously have selected a micro-domain, say all metalinguistic items beginning in -s, or a restricted semantic field as a testbed for more detailed analyses. This approach would, I think, have led to frustration, however, because, at the end of the day, I would still have been faced with the gnawing question: What if the whole dictionary had been processed? Reflections upon and generalizations about the nature of the relationship between ergativity and lexical functions, or between onomatopoeia and consonant clusters, or between metaphors and lexical functions, could not have been made on the basis of a small sample of data only. It should now be clear that the types of applications envisaged in the final part of this book are far from exhaustive. As a matter of fact, the selection of topics addressed in the last five chapters (10,11,12,13,14) was meant to illustrate the variety of current areas of interest in computational linguistics, lexical semantics, lexicology or applied linguistics. It is my feeling that the CR database, whose construction has been the main topic of this dissertation, can contribute to a better understanding of some of these areas, but, in fact, the exploitation of the lexical-semantic database of 70,000 contextual pairs of English and French lexical items

274 has only begun and is only limited by the researcher's imagination. The flexibility of the retrieval programs and the multiple access paths they provide open up new and exciting vistas of research into the structure of the lexicon. The methodology and the tools presented in this book offer almost endless possibilities for exploring the dictionary data and the manifold semantic links between words. Generalizations and insights often emerge when one is allowed to navigate through such a lexical database, whether in an opportunistic or in a theory-oriented mode, and it is reasonable to expect that the numerous query functionalities of the user interface, together with the richness of the semantic information I have added, may provide an exciting resource for translators, linguists or teachers, giving them insights into the relationships between semantics and syntax, phonetics and word meaning, or collocations and general or specialized language, to name but a few. As was pointed out at several places, my work on the computerized Collins-Robert dictionary capitalized on 15 years of research on MRDs. This implies that some of the ideas developed here are not necessarily new. Many researchers have indeed insisted on the wealth of information contained in MRDs and the use that can be made of them to extract and acquire lexical knowledge for NLP or human applications. This work, however, departs from earlier attempts in several respects which can be summed up as follows: 1. Most research projects in computational lexicography have concentrated on monolingual dictionaries, with a predilection for British learners' dictionaries. Bilingual dictionaries, whose coverage is much more extended, have been neglected, with only a few recent exceptions (see Neff & McCord 1990, Picchi et al, 1988, Segond & Zaenen 1994, Bauer et al. 1995, Breidt & Segond 1996, Breidt et al. 1996). 2. 
In most cases, research on MRDs has focused on the extraction of syntactic information, with a strong emphasis on subcategorization and the description of the syntactic behaviour of verbs (see Boguraev & Briscoe 1989, who, in chapter 4, describe attempts to retrieve interesting verb classes from LDOCE, such as equi or raising verbs, on the basis of an algorithm suggested by Michiels 1982). In this respect, the grammar coding systems used by learners' dictionaries have proved most useful but the semantic dimension tends to have been overlooked. In particular collocational knowledge is often sought out in computerized corpora, making use of sophisticated statistical techniques. The collocational potential of an MRD, whether monolingual or bilingual, had, to my knowledge, never been made so explicit and so accessible. 3. Most attempts to generate relational lexicons from MRDs are based on the use of defining formulae as clues to lexical-semantic relations (Amsler 1980, Ahlswede & Evens 1988a, Calzolari 1988). Although this has considerably influenced my own work on the Collins-Robert micro-definitions, the majority of the relations in the database were originally implicit and making them explicit entailed interpreting the semantic link between the italicized items and the headwords under which they appear. The methodology was different and the range of semantic relations much wider than what can be extracted when exploiting defining formulae (lexical functions such as Degrad, Liqu, Fune, Real, Oper or their equivalents are usually absent from such relational lexicons). 4. On-line lexical databases such as WordNet (Miller et al. 1990) pay considerable attention to taxonomical structures. The Collins-Robert database certainly cannot compete with such

hierarchies of hyper- and hyponyms, mainly because the explicit indication of synonyms and hyperonyms in a bilingual dictionary depends on a target-oriented division of the semantic space covered by a source language entry. Since hyperonyms are only rarely mentioned in bilingual dictionaries, it is not surprising that most attempts to generate taxonomies take monolingual dictionaries as a starting point and fare much better than what can be produced with the Collins-Robert dictionary. The situation is reversed in the case of collocational knowledge. The few MRD-based systems which offer some possibility of retrieving collocations are limited to very basic questions, however important from a lexicographical point of view (Boguraev 1991:251 cites the following query against the IBM version of the Collins-Robert dictionary: "What does one normally do with books?", i.e. "List the transitive verbs which can take book as direct object"). More sophisticated queries are usually not possible with existing systems because the relationship between the metalinguistic items and the headwords has not been figured out and formalized, as it has been in the Liège database.

5. Recent lexical-semantic theories such as Pustejovsky's Generative Lexicon attract a lot of attention in CL circles. In particular, the qualia structures provide an appealing apparatus for describing the argument structures of nouns. The four main roles identified by Pustejovsky (Formal, Constitutive, Telic and Agentive) are much too limited, however, and do not cover the whole gamut of lexical-semantic relations which can be found in an MRD such as the Collins-Robert dictionary. Moreover, these roles are mainly useful for the formal description of nouns and it is not clear how they can be adapted to the modelling of verbs.
The four roles above are definitely similar to some of Mel'čuk's lexical functions (the Telic role clearly corresponds to the Real LF, for instance), but the need to account for a wide range of other relations, both purely lexical and partly encyclopedic, renders this model less useful. The perspective adopted in this book required a more flexible relational model to cope with the multiple relations identified in the dictionary.

6. Mel'čuk's theory of lexical functions is usually applied to limited semantic domains or to very small fragments of the lexicon (Mantha & Mel'čuk (1988) examine the field of meteorological phenomena; Mel'čuk & Wanner (1994) analyze restricted lexical co-occurrence in the field of emotions and feelings; see also the detailed descriptions of the few dozen lexemes in the three volumes of the Explanatory Combinatory Dictionary). Testing this theory against a huge corpus of general-language bilingual data extracted from a machine-readable dictionary had never been attempted. Furthermore, even ECDs offer only limited access paths. Bases are listed in alphabetical order, which means that collocators or lexical functions cannot be taken as primary access keys, as they can in the CR database. While ECD compilers usually start from a pre-determined list of LFs and try to discover which words can be the values of these LFs, I have started from existing pairs of lexical items and tried to discover which function relates the members of these pairs. This approach has brought to light a number of shortcomings in the mechanism of lexical functions. The numerous problems related to the formalizable nature of these LFs and their potential use in an NLP perspective have also become apparent. Despite these limitations, which account for some of the errors I have probably made in assigning LFs or the inconsistencies I have created, the fact remains that the descriptive power of Mel'čuk's lexical functions has proved most useful in my

attempt to structure the CR combinatory and paradigmatic knowledge into lexical-semantic networks. This descriptive capacity has enabled me to add a thesauric dimension to the dictionary, the significance of which should be obvious to translators who are constantly (and often desperately) seeking the appropriate term in context, or to computational linguists working in the field of information retrieval or machine translation.

Before concluding, I should like to describe very briefly some of the problems this book has only touched upon and which definitely deserve further study. The applications which can be envisaged can be broken down into two categories, depending on whether the CR database is used for further linguistic studies or is used as a component of more ambitious systems drawing on other computerized resources (textual or lexical). To focus on the first category, it is clear that the exploitation of the database is not finished. In this book, I have concentrated more particularly on a few lexical functions (Son, Mult, Sing...), but it goes without saying that each LF (whether simple or complex) can be taken as a starting point for a systematic study. The dictionary database contains enough evidence for a linguist to try to come up with generalizations about the types of arguments or values which are associated with a particular function. In Chapter 9, I alluded to the relationship between some LFs and the meanings of some of the particles which can be found in phrasal verbs functioning as values of these LFs. Along the same lines, it would be most interesting to examine the relation between morphology, collocations and lexical functions, in particular the mechanisms of affixation or compounding which underlie the formation of the values of certain lexical functions. Compound LFs containing Anti, for example, are, quite naturally, closely associated with the negative prefix un- or the negative suffix -less.
Browsing through the database reveals that a given base may select collocators which illustrate a potential conflict between different morphological processes. Consider:

AntiPos (event) = luckless (malencontreux)
                = unlucky (malencontreux)
Liqu (bottle-top) = twist off (enlever en dévissant)
                  = untwist (dévisser)
CausFinObstr (thread) = ravel out (démêler)
                      = unravel (démêler)

Only statistical data (and the use of techniques such as t-scores) would make it possible to discover the nuances which distinguish the synonyms or near-synonyms above. It is clear, however, that the systematic study of such conflicting word formation processes deserves more than just a passing remark. It would also shed light on the semantics of morphological oppositions, showing, for example, that the presence of a negative prefix does not necessarily entail opposition with the root, as the following relations extracted from the CR database show:

CausFinPred (prisoner) = loose (relâcher)
                       = unloose (libérer)

The two verbs here refer to the same activity of freeing a prisoner, very much like the pair bone/debone cited by Fellbaum (1990:288), for which there is no semantic opposition either (bone a fish = debone a fish).

Another area of research which will have to be envisaged in the future concerns the possibility of improving the database, extending its coverage and complementing the data it contains with evidence from other sources. The example section of the dictionary is undoubtedly one such source. In this work, I have exclusively considered the CR metalinguistic apparatus because its size and scope already presented me with enough data (and problems). Examples often contain restricted collocations and idioms, however, and retrieving them with appropriate tagging and parsing techniques would probably be very useful and would complement the existing database.

Another area which deserves investigating and which the Liège team is currently exploring is that of word sense disambiguation and translation selection. Michiels (1996) shows how the CR metalinguistic information can be combined with genus terms extracted from the LDOCE database to select the appropriate French target translation for verbal predicates. This small-scale experiment, which makes use of the two relational databases and a Prolog syntactic parser, was carried out before the Collins-Robert database was enriched with lexical functions and before I was able to make the numerous adjustments described in section 6.6.2. Such experiments are currently being extended to other parts of speech and the members of the DEFI project on disambiguation through filtering have examined the benefit that can be derived from using the presence of lexical functions in the database as an additional discriminatory factor (Michiels & Dufour 1996). They have also linked the Collins-Robert and Oxford-Hachette databases to WordNet (see Fellbaum 1990, Miller et al. 1990, Miller 1990).
The latter lexical database should supply the lexical inheritance system which is so crucially lacking in the Collins-Robert database.¹ It has indeed been shown that the metalinguistic indicators in the bilingual dictionary often function as heads of thesauric classes. Since WordNet is primarily, though not exclusively, concerned with capturing relations of hyponymy and representing lexical knowledge in the form of taxonomies, combining the Princeton database with the Collins-Robert dictionary sheds light on some of the most interesting problems in NLP. To give only one example, in the case of a phrase such as a school of sardines analyzed by the parser, the system would not find any useful information linking school and sardine in the CR dictionary. In WordNet, however, sardine appears as one of the numerous hyponyms of the noun fish, for which an explicit link (with the Mult LF) is indicated in the Collins-Robert dictionary (s.v. school). Keeping track of this information is necessary to avoid translating the above phrase as une école de sardines.

Linking WordNet and the Collins-Robert would also be worthwhile in an effort to disambiguate the several thousand occurrences of the little word etc which appears so often at the end of a list of typical collocations or selection restrictions. Financial considerations obviously play an important part in the design and compilation of dictionaries and it is understandable that the lexicographer should be willing to save as much space as possible and

¹ The WordNet lexical database is freely available for research purposes and runs on PC-DOS platforms and under UNIX.

to compact information (Atkins 1993 provides an illuminating picture of extra-linguistic constraints which preside over the production of commercial lexicographical products). Financial reasons do not explain everything, however, and etc can in many cases be seen as the lexicographer's failure to account for very complex collocational phenomena. Consider the CR entry operate below:

operate
2(a) vt [person] machine, tool, vehicle, switchboard, telephone, brakes, etc faire marcher, faire fonctionner
2(b) business, factory diriger, gérer; coal mine, oil well, canal, quarry exploiter, faire valoir

It is clear that etc is used to indicate that the list is far from exhaustive and that mentioning all possible direct objects for sense 2a would be too space-consuming. In the NLP perspective alluded to above, however, one needs to give precise indications to make sure that the verbal predicate operate in to operate the lift, which is not explicitly mentioned as a potential collocation, would be translated as faire marcher/fonctionner, and not as diriger, gérer, exploiter or faire valoir. The taxonomical hierarchies offered by the WordNet system could perhaps be put to good use here in providing means of computing the semantic similarity of the various members of the list of metalinguistic items which immediately precede etc. Comparing the various hyperonyms of these items might possibly help to infer that the typical direct objects for sense 2a share a [+instrument] or [+apparatus] feature. It is difficult to predict the success of this approach, however, when one considers the large number of cases where etc is used after one metalinguistic item only.² In this respect, the concept of paradigm extension put forward by Montemagni et al. (1996) and implemented by Michiels (1996) in the DEFI dictionary-text matcher to measure semantic distance in terms of metalinguistic slot sharing is sure to prove very useful.

Another way of improving on the CR lists of collocations and semantic restrictions is to combine the CR database with the machine-readable version of the Oxford-Hachette English-French/French-English dictionary (Corréard & Grundy 1994). The SGML typesetting tapes of this corpus-based bilingual dictionary have also been made available to the Liège group for research purposes by the publishers and it will be interesting to compare the two reference works and see to what extent they can be used and merged in a word sense assignment perspective.
They do not differ significantly in size or scope and they target a similar public, but the underlying principles diverge insofar as the OH lexicographers have used statistically analyzed computerized corpora which were virtually non-existent when the first edition of CR was compiled (the DEFI project uses the 1993 version of CR, which is based on corpus data, however). It would therefore not be surprising to discover that the OH data better reflect central and typical usages, but this remains to be proved. Testing this hypothesis would require a detailed analysis of the OH metalinguistic apparatus as well and a systematic comparison of

² Consider the entry for compromise:
compromise vi reputation etc compromettre
Since one can also compromise one's beliefs or principles, it seems doubtful that WordNet, or any other system, would be of any use to disambiguate etc and figure out what it stands for.

the coverage of this apparatus in the two bilingual dictionaries. Such comparison is currently under way in the framework of the DEFI project, which has now reached the stage when the two dictionaries can be merged into one single machine-tractable dictionary usable for word sense disambiguation (Michiels & Dufour 1996, Dufour 1997). Furthermore, having the two dictionaries in the same format makes it possible to complement the lists of collocational descriptions.

Another area which would be interesting to investigate in the future is the relationship between the CR intuition-based data and evidence found in computerized corpora. It is a well-known fact that corpus-based lexicography relies on tremendous amounts of data. This raises the crucial question of how to sift through the data in order to find the evidence one is looking for without being overwhelmed by the thousands of occurrences of any common word in a corpus of several dozen million words. In section 2.3, I discussed a few of the most common statistical techniques used in corpus analysis, but, as Church et al. (1994:153) put it, "the lexicographer is like a person standing underneath Niagara Falls holding a rainwater gauge, while the evidence sweeps by in immeasurable torrents". Extracting lexicographically relevant facts is therefore undoubtedly a key problem in corpus-based dictionary making and it will be interesting to examine how the CR database can be used to enhance corpus-based exploration. Within the framework of the DECIDE project, the Liège lexical-semantic database has been linked with existing tools developed at the Rank Xerox Research Centre in Grenoble (Grefenstette 1994a,b) and at the Institut für maschinelle Sprachverarbeitung at the University of Stuttgart, in particular the IMS Corpus Workbench and the Xkwic tools, which produce KWIC concordances and feature very powerful sorting and collocate search functionalities (Atkins & Christ 1995; Schulze & Christ 1994).
One application we had in mind was to use the Mel'čukian lexical functions of the CR database to pose semantically motivated questions and to search for relevant facts in the corpus. For example, a lexicographer might be interested in extracting sentences which refer to the abolishment of a law or regulation. Querying the CR database against the lexical function Liqu and the keyword law would then produce a list of potential collocators (abolish, repeal, rescind, do away with...) whose combinations with the keyword would be retrieved from a part-of-speech tagged corpus for further analysis. One clearly sees the vistas this combination of dictionary-based extraction and computer-aided corpus analysis opens up in the field of information retrieval as well, since one of the major problems IR is faced with is how to reduce the inevitable noise generated by a query (see Fontenelle 1996a). This could also be a modest contribution to the new methodologies required to develop the future generation of hypertext bilingual dictionaries featuring frame and semantic nets, instantaneous access to relevant corpus data and thesaural information as described by Atkins (1996).

To conclude this book, I should like to quote Hausmann (1979:194-195) who, in a penetrating article about collocational dictionaries, attempts to justify the existence of these dictionaries and alludes to Flaubert's Dictionnaire des Idées Reçues which I used to open this thesis: "Une dernière objection reste à écarter. Elle se manifeste chaque fois qu'on présente des exemples devant un public d'enseignants. Enseigner des collocations, n'est-ce pas uniformiser le langage? N'est-ce pas réprimer la liberté de l'énonciation originale au profit de l'expression

stéréotypée, au profit du cliché? Le dictionnaire des collocations n'est-il pas plutôt un nouveau Dictionnaire des idées reçues dont le contenu mériterait les mêmes sarcasmes de la part du public éclairé? En ce qui concerne l'utilisateur étranger, l'objection n'est pas valable. Acquérir le fameux "sens de la langue", c'est, entre autres, assimiler les collocations. Une quelconque recherche d'originalité dans la combinaison des mots n'est concevable, pour l'étudiant du français comme langue étrangère, qu'à partir d'une bonne maîtrise de ces collocations consacrées par l'usage. Quant à l'utilisateur français, le débat a eu lieu entre 1953 et 1955 dans Vie et langage autour des mots-tandem qui ne sont autres que nos collocations. Aurélien Sauvageot résumait alors la question en disant: "Rien n'est plus fallacieux que de croire que nous nous exprimons librement" et "la part de l'automatisme dans l'expression linguistique est énorme". C'est pourquoi, à moins de renoncer à l'idéal du mot approprié (ou, mieux, des mots appropriés) et de vouloir transformer tous nos écoliers en poètes maniant le verbe à leur façon, le recueil des combinaisons usuelles peut être utile au francophone. Ne vaut-il pas mieux parler comme tout le monde que de parler mal?"

Hausmann's paper was written before the term computational lexicography was coined, long before the use of computers had become commonplace in dictionary-making and analysis. His emphasis on language teaching, however important, unfortunately tends to minimize the more fundamental linguistic aspects of collocational and lexical-semantic studies I have tackled here. It is hoped that the tools and methodologies described in this book will not only pave the way for better dictionaries but will also contribute to a better understanding of the lexicon.

16 Appendices

16.1 Appendix A: The Son lexical function (sample list)

Query against the CR database ("son" in the field lexfunc)

babble (n) : ~baby~ => babil (bébé,s0son)
babble (n) : ~stream~ => gazouillement (ruisseau,s0son)
babble (n) : ~voice~ => rumeur (voix,s0son)
babble (vi) : ~baby~ => gazouiller (bébé,son)
babble (vi) : ~stream~ => jaser (ruisseau,son)
bang (n) : ~door~ => claquement (porte,s0son)
bang (n) : ~explosive~ => détonation (explosif,s0son)
bang (n) : ~gun~ => détonation (arme,s0son)
bang (vi) : ~door~ => claquer (porte,son)
bang (vi) : ~firework~ => éclater (feu d'artifice,son)
bang (vi) : ~gun~ => détoner (arme,son)
bang away (vi) : ~gun~ => tonner (arme,son)
bark (n) : ~dog~ => aboiement (chien,s0son)
bark (n) : ~fox~ => glapissement (renard,s0son)
bark (vi) : ~dog~ => aboyer ( après) (chien,son)
bark (vi) : ~fox~ => glapir (renard,son)
bark (vi) : ~gun~ => aboyer (arme,son)
bark out (vt sep) : ~order~ => glapir (ordre,oper1+causson)
bawl (vt) : ~order~ => brailler (ordre,oper1+causson)
bay (n) : ~dog~ => aboiement (chien,s0son)
bay (n) : ~pack~ => abois (meute,s0son)
beat (n) : ~drum~ => battement (tambour,s0son)
beat (vi) : ~drum~ => battre (tambour,son)
beating (n) : ~drum~ => battement (tambour,s0son)
bellow (n) : ~animal~ => beuglement (animal,s0son)
bellow (n) : ~animal~ => mugissement (animal,s0son)
bellow (n) : ~bull~ => beuglement (taureau,s0son)
bellow (n) : ~cow~ => beuglement (vache,s0son)
bellow (n) : ~ocean~ => mugissement (océan,s0son)
bellow (n) : ~person~ => hurlement (personne,s0son)
bellow (n) : ~storm~ => mugissement (tempête,s0son)
bellow (vi) : ~animal~ => beugler (animal,son)
bellow (vi) : ~animal~ => mugir (animal,son)
bellow (vi) : ~bull~ => beugler (taureau,son)
bellow (vi) : ~cow~ => beugler (vache,son)

bellow (vi) : ~ocean~ => mugir (océan,son)
bellow (vi) : ~person~ => brailler (personne,son)
bellow (vi) : ~wind~ => mugir (vent,son)
bellow (vt) : ~order~ => brailler (ordre,oper1+causson)
bicker (vi) : ~stream~ => murmurer (ruisseau,son)
bickering (adj) : ~stream~ => murmurant (ruisseau,a0son)
blare (n) : ~hooter~ => bruit strident (sirène,s0son)
blare (n) : ~horn~ => bruit strident (klaxon,s0son)
blare (n) : ~music~ => beuglement (musique,s0son)
blare (n) : ~radio~ => beuglement (radio,s0son)
blare (n) : ~trumpet~ => sonnerie (trompette,s0son)
blare (vi) : ~horn~ => retentir (klaxon,son)
blare (vi) : ~music~ => retentir (musique,son)
blare (vi) : ~radio~ => beugler (radio,son)
blare (vi) : ~voice~ => trompeter (voix,son)
blare out (vt sep) : ~music~ => faire retentir (musique,causson)
blast (n) : ~rocket~ => grondement (fusée,s0son)
blast (n) : ~trumpet~ => fanfare (trompette,s0son)
bleat (n) : ~goat~ => bêlement (chèvre,s0son)
bleat (n) : ~sheep~ => bêlement (mouton,s0son)
bleat (n) : ~voice~ => bêlement (voix,s0son)
bleat (vi) : ~goat~ => bêler (chèvre,son)
bleat (vi) : ~person~ => bêler (personne,son)
bleat (vi) : ~sheep~ => bêler (mouton,son)
bleat (vi) : ~voice~ => bêler (voix,son)
blow (vi) : ~foghorn~ => mugir (corne de brume,son)
blow (vi) : ~trumpet~ => sonner (trompette,son)
blow (vt) : ~horn~ => jouer de (klaxon,causson)
blow (vt) : ~trumpet~ => jouer de (trompette,causson)
bray (vi) : ~ass~ => braire (âne,son)
bray (vi) : ~trumpet~ => résonner (trompette,son)
burble (n) : ~stream~ => murmure (ruisseau,s0son)
burble (vi) : ~person~ => marmonner (personne,son)
burble (vi) : ~stream~ => murmurer (ruisseau,son)
burbling (n) : ~person~ => marmonnement (personne,s0son)
burbling (n) : ~stream~ => murmure (ruisseau,s0son)
buzz (n) : ~conversation~ => bourdonnement (conversation,s0son)
buzz (n) : ~insect~ => bourdonnement (insecte,s0son)
buzz (vi) : ~hall~ => être (tout) bourdonnant ( de) (salle,son)
buzz (vi) : ~insect~ => bourdonner (insecte,son)
buzzing (adj) : ~insect~ => bourdonnant (insecte,a0son)
cackle (n) : ~hen~ => caquet (poule,s0son)
cackle (n) : ~people~ => caquetage (gens,s0son)
cackle (n) : ~people~ => gloussement (gens,s0son)
cackle (vi) : ~hen~ => caqueter (poule,son)
cackle (vi) : ~people~ => caqueter (gens,son)
cackle (vi) : ~people~ => glousser (gens,son)

call (n) : ~bird~ => cri <m> (oiseau,s0son)
call (n) : ~bugle~ => sonnerie <f> (clairon,s0son)
call (n) : ~drum~ => batterie <f> (tambour,s0son)
call (n) : ~trumpet~ => sonnerie (trompette,s0son)
call (vi) : ~bird~ => pousser un cri (oiseau,son)
call (vi) : ~person~ => appeler (personne,son)
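A query of this kind can be sketched in a few lines. The following is only an illustration in Python with an in-memory SQLite table, not the retrieval program used for the CR database: the table name cr and every column name except lexfunc are hypothetical, and the substring match is an inference from the fact that the sample list above includes values such as causson alongside son.

```python
import sqlite3

# A few rows taken from the appendices. The schema is an assumption:
# only the field name "lexfunc" is mentioned in the text.
ROWS = [
    ("bray", "vi", "ass", "braire", "âne", "son"),
    ("bellow", "vi", "wind", "mugir", "vent", "son"),
    ("blow", "vt", "horn", "jouer de", "klaxon", "causson"),
    ("bunch", "n", "key", "trousseau", "clé", "mult"),
]


def build_db(rows):
    """Create a toy in-memory table holding one row per dictionary relation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cr (headword TEXT, pos TEXT, collocate TEXT, "
        "translation TEXT, collocate_fr TEXT, lexfunc TEXT)"
    )
    conn.executemany("INSERT INTO cr VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def query_lexfunc(conn, pattern):
    """Return formatted rows whose lexfunc field contains `pattern`."""
    cur = conn.execute(
        "SELECT headword, pos, collocate, translation, collocate_fr, lexfunc "
        "FROM cr WHERE lexfunc LIKE ? ORDER BY headword, pos, collocate",
        (f"%{pattern}%",),
    )
    return [
        f"{h} ({p}) : ~{c}~ => {t} ({cf},{lf})"
        for h, p, c, t, cf, lf in cur
    ]


if __name__ == "__main__":
    for line in query_lexfunc(build_db(ROWS), "son"):
        print(line)
```

Matching on a substring rather than on the exact value is what lets a single query return simple and compound functions together; an exact match on "son" would silently drop the causative rows.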

16.2 Appendix B: Verbs of sound (in reverse alphabetical order)

throb sound thud voice breathe shake strike babble gobble rumble grumble warble burble muffle gaggle jangle jingle gurgle cackle crackle

ripple tootle whistle rustle rattle prattle chime whine drone tone pipe blare ululate grate vocalize bang twang ring sing zing

chug screech squelch punch scratch sigh sough clash swish swoosh speak creak squeak croak crack tick knock cluck pluck pink

honk bark squawk peal squeal wail call squall trill roll toll carol snarl caterwaul bawl howl growl yowl scream slam

boom zoom hum drum strum moan groan turn go coo lap flap snap rap yap cheep peep romp thump chop

plop pop chirp roar gibber thunder jeer bicker snicker whimper whisper chatter clatter patter twitter splutter mutter whirr purr murmur

hiss beat bleat trumpet fret spit chant grunt hoot snort shout mew low blow bellow crow bray cry fizz buzz
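Reverse alphabetical order, which groups words by their endings, amounts to sorting each word on its spelling read from right to left. A minimal Python sketch follows; the function name reverse_alphabetical is hypothetical and the sample is a handful of verbs from the list above.

```python
def reverse_alphabetical(words):
    """Sort words on their spelling read from right to left."""
    return sorted(words, key=lambda w: w[::-1])


if __name__ == "__main__":
    sample = ["babble", "voice", "thud", "throb", "sound", "breathe"]
    print(reverse_alphabetical(sample))
    # ['throb', 'sound', 'thud', 'voice', 'breathe', 'babble']
```

Sorting on the reversed string brings words with the same ending together (rumble, grumble, warble, burble), which is what makes such a list useful for studying suffixes and sound symbolism.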

16.3 Appendix C: The Mult lexical function (sample list)

mult ( abuse / injure ) = spate (torrent) [C]
mult ( abuse / injure ) = storm (torrent) [C]
mult ( ant / fourmi ) = nest (nichée) [C]
mult ( ant / fourmi ) = swarm (fourmillement) [C]
mult ( applause / applaudissement ) = burst (salve) [C]
mult ( applause / applaudissement ) = storm (tempête) [C]
mult ( applause / applaudissement ) = thunder (tonnerre) [C]
mult ( applause / applaudissement ) = volley (salve) [C]
mult ( asparagus / asperge ) = bunch (botte) [C]
mult ( banana / banane ) = bunch (régime) [C]
mult ( banana / banane ) = cluster (régime) [C]
mult ( bee / abeille ) = cluster (essaim) [C]
mult ( bee / abeille ) = swarm (essaim) [C]
mult ( camel / chameau ) = train (caravane) [C]
mult ( cattle / bétail ) = herd (troupeau) [C]
mult ( discussion / discussion ) = round (série) [C]
mult ( fish / poisson ) = school (banc) [C]
mult ( fish / poisson ) = shoal (banc ( )) [C]
mult ( goose / oie ) = flock (troupeau) [C]
mult ( goose / oie ) = gaggle (troupeau) [C]
mult ( key / clé ) = bunch (trousseau) [C]
mult ( key / clé ) = set (jeu) [C]
mult ( mountain / montagne ) = chain (chaîne) [C]
mult ( mountain / montagne ) = group (massif) [C]
mult ( mountain / montagne ) = range (chaîne) [C]
mult ( mountain / montagne ) = ridge (chaîne) [P]
mult ( mouse / souris ) = brood (nichée) [C]
mult ( mouse / souris ) = nest (nichée) [C]
mult ( oath / juron ) = stream (flot) [C]
mult ( onion / oignon ) = string (chapelet) [C]
mult ( oyster / huître ) = bed (banc) [C]
mult ( policeman / policier ) = platoon (peloton) [C]
mult ( policeman / policier ) = squad (escouade) [C]
mult ( protest / protestation ) = storm (tempête) [C]
mult ( protest / protestation ) = wave (vague) [P]
mult ( question / question ) = barrage (pluie) [C]
mult ( question / question ) = crop (série) [P]
mult ( radish / radis ) = bunch (botte) [C]
mult ( roe deer / chevreuil ) = bevy (harde) [C]
mult ( seal / phoque ) = rookery (colonie) [C]
mult ( spy / espion ) = ring (réseau) [C]
mult ( stag / cerf ) = herd (harde) [C]
mult ( thief / voleur ) = pack (bande) [C]
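The Mult pairs in this list are the kind of evidence the concluding chapter has in mind for a phrase such as a school of sardines: the dictionary links school to fish, not to sardine, so the link has to be inherited through a hyponymy hierarchy. The sketch below is a minimal illustration in Python under stated assumptions: the HYPERNYM table is a toy stand-in for WordNet (only the sardine/fish link is taken from the text; the other links are invented for the example), and the function name translate_collective is hypothetical.

```python
# Mult pairs copied from the sample list: base -> {collective noun: French}.
MULT = {
    "fish": {"school": "banc", "shoal": "banc"},
    "bee": {"cluster": "essaim", "swarm": "essaim"},
    "key": {"bunch": "trousseau", "set": "jeu"},
}

# Toy hierarchy standing in for WordNet (hypothetical data apart from
# the sardine -> fish link, which is the example discussed in the text).
HYPERNYM = {"sardine": "fish", "herring": "fish", "fish": "animal"}


def translate_collective(collective, noun, mult=MULT, hypernym=HYPERNYM):
    """Find a French collective for `collective of noun`.

    Climbs the hypernym chain from `noun` until a Mult entry listing
    `collective` is found. Returns (french, matched_base) or None.
    """
    seen = set()
    current = noun
    while current is not None and current not in seen:
        seen.add(current)  # guard against cycles in the toy hierarchy
        french = mult.get(current, {}).get(collective)
        if french is not None:
            return french, current
        current = hypernym.get(current)
    return None


if __name__ == "__main__":
    print(translate_collective("school", "sardine"))
```

Returning the matched base along with the translation keeps the inference inspectable: a caller can see that banc was licensed by fish rather than by sardine itself.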

16.4 Appendix D: The Culm lexical function

culm ( abuse / injure ) = grossness (énormité) [C]
culm ( ambition / ambition ) = consummation (couronnement) [C]
culm ( ambition / ambition ) = summit (summum) [C]
culm ( anger / colère ) = paroxysm (accès) [C]
culm ( art form / forme artistique ) = consummation (perfection) [C]
culm ( attack / attaque ) = the brunt (le (plus gros du) choc) [C]
culm ( blow / coup ) = the brunt (le (plus gros du) choc) [C]
culm ( business / affaire ) = peak period (période de pointe) [P]
culm ( career / carrière ) = copestone (couronnement) [C]
culm ( career / carrière ) = culmination (apogée) [C]
culm ( career / carrière ) = peak (sommet) [C]
culm ( despair / désespoir ) = extremity (extrême degré) [C]
culm ( disturbance / perturbation ) = culmination (point culminant) [C]
culm ( evening / soirée ) = the high spot (le clou) [C]
culm ( fortune / fortune ) = height (apogée) [C]
culm ( glory / gloire ) = height (sommet) [C]
culm ( glory / gloire ) = summit (sommet) [C]
culm ( goodness / bonté ) = epitome (modèle) [C]
culm ( grandeur / grandeur ) = height (sommet) [C]
culm ( grief / chagrin ) = paroxysm (paroxysme) [C]
culm ( happiness / bonheur ) = completion (comble) [C]
culm ( happiness / bonheur ) = extremity (extrême degré) [C]
culm ( holiday / vacances ) = the high spot (le grand moment) [C]
culm ( honour / honneur ) = summit (sommet) [C]
culm ( misfortune / malheur ) = completion (comble) [C]
culm ( pain / douleur ) = paroxysm (paroxysme) [C]
culm ( power / pouvoir ) = summit (sommet) [C]
culm ( quarrel / dispute ) = culmination (point culminant) [C]
culm ( road / route ) = crest (haut de côte) [C]
culm ( show / show ) = the high spot (le clou) [C]
culm ( success / succès ) = culmination (apogée) [C]
culm ( success / succès ) = height (point culminant) [C]
culm ( traffic / trafic ) = peak hours (heures d'affluence) [P]
culm ( traffic / trafic ) = peak period (période d'affluence) [P]
culm ( virtue / vertu ) = epitome (modèle) [C]
culm ( visit / visite ) = the high spot (le grand moment) [C]

16.5 Appendix E: The Centr lexical function

centr ( abscess / abcès ) = head (tête) [C]
centr ( activity / activité ) = ganglion (centre) [C]
centr ( argument / argument ) = point ((point) essentiel) [C]
centr ( artichoke / artichaut ) = heart (fond) [C]
centr ( atom / atome ) = core (noyau enveloppé de son électron(s)) [P]
centr ( bone / os ) = marrow (moelle) [C]
centr ( bone / os ) = pith (moelle) [C]
centr ( book / livre ) = matter (fond) [C]
centr ( book / livre ) = message (message) [C]
centr ( cabbage / chou ) = heart (coeur) [C]
centr ( cable / câble ) = core (âme) [C]
centr ( celery / céleri ) = heart (coeur) [C]
centr ( cell / cellule ) = nucleus (nucléus) [C]
centr ( commerce / commerce ) = seat (centre) [C]
centr ( conversation / conversation ) = gist (fond) [C]
centr ( earth / terre ) = womb (sein) [P]
centr ( energy / énergie ) = ganglion (foyer) [C]
centr ( fruit / fruit ) = core (trognon) [C]
centr ( fruit / fruit ) = stone (noyau) [P]
centr ( fruitstone / fruit à noyau ) = kernel (amande) [C]
centr ( fruitstone / fruit à noyau ) = pit (noyau) [P]
centr ( grape / raisin ) = seed (pépin) [P]
centr ( hurricane / ouragan ) = eye (oeil) [C]
centr ( idea / idée ) = epitome (quintessence) [C]
centr ( idea / idée ) = foundation (base) [C]
centr ( interest / intérêt ) = focus (centre) [C]
centr ( joke / blague ) = point (astuce) [C]
centr ( joke / blague ) = punch-line (astuce) [C]
centr ( lettuce / laitue ) = heart (coeur) [C]
centr ( magnet / aimant ) = core (noyau) [C]
centr ( nature / nature ) = womb (sein) [P]
centr ( nut / noix ) = kernel (amande) [C]
centr ( problem / problème ) = core (essentiel) [P]
centr ( problem / problème ) = crux (coeur) [C]
centr ( problem / problème ) = knot (noeud) [C]
centr ( question / question ) = gist (point principal) [C]
centr ( reactor / réacteur nucléaire ) = core (coeur) [C]
centr ( report / rapport ) = gist (fond) [C]
centr ( speech / discours ) = body (fond) [C]
centr ( subject / sujet ) = epitome (quintessence) [C]
centr ( summer / été ) = midsummer (milieu de l'été) [P]
centr ( table / table ) = centre-piece (milieu de table) [C]
centr ( target / cible ) = bull's-eye (centre) [C]
centr ( town / ville ) = heart (coeur) [C]
centr ( unrest / troubles ) = focus (foyer) [C]
centr ( wheel / roue ) = axle (axe) [C]
centr ( wheel / roue ) = hub (moyeu) [C]
centr ( wheel / roue ) = nave (moyeu) [C]
centr ( winter / hiver ) = midwinter (milieu de l'hiver) [P]
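Because each entry in these lists is a (lexical function, base, value) triple, the base is not the only possible access key; the collocator can serve equally well. The following sketch, which assumes nothing about how the CR database is actually implemented, shows the idea in Python with two in-memory indexes built from a few of the Centr entries above. The names build_indexes, by_base and by_value are hypothetical.

```python
from collections import defaultdict

# (lexical function, base, value) triples taken from the Centr list.
TRIPLES = [
    ("centr", "problem", "core"),
    ("centr", "problem", "crux"),
    ("centr", "fruit", "core"),
    ("centr", "magnet", "core"),
    ("centr", "town", "heart"),
    ("centr", "lettuce", "heart"),
]


def build_indexes(triples):
    """Index the same triples twice: by (LF, base) and by (LF, value)."""
    by_base = defaultdict(list)
    by_value = defaultdict(list)
    for lf, base, value in triples:
        by_base[(lf, base)].append(value)
        by_value[(lf, value)].append(base)
    return by_base, by_value


if __name__ == "__main__":
    by_base, by_value = build_indexes(TRIPLES)
    # From base to collocators: what is the "centre" of a problem?
    print(by_base[("centr", "problem")])
    # From collocator to bases: which nouns have a "core" as their centre?
    print(by_value[("centr", "core")])
```

The second lookup is the one an alphabetical, base-ordered dictionary cannot answer without a full scan, which is why the choice of access key matters.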

16.6 Appendix F: The Sloc lexical function (sample list)

sloc ( ammunition / munition ) = pouch (étui) [P]
sloc ( ant / fourmi ) = nest (nid) [C]
sloc ( archbishop / archevêque ) = see (archevêché) [C]
sloc ( arrow / flèche ) = quiver (carquois) [P]
sloc ( bacteria culture / culture de bactéries ) = incubator (incubateur) [C]
sloc ( beaver / castor ) = lodge (abri) [C]
sloc ( bee / abeille ) = hive (ruche) [P]
sloc ( bishop / évêque ) = see (siège épiscopal) [C]
sloc ( cattle / bétail ) = barn (étable) [C]
sloc ( cattle / bétail ) = shed (étable) [P]
sloc ( dagger / poignard ) = scabbard (gaine) [C]
sloc ( dagger / poignard ) = sheath (gaine) [C]
sloc ( eye / oeil ) = socket (orbite) [C]
sloc ( fox / renard ) = earth (terrier) [C]
sloc ( fox / renard ) = hole (terrier) [C]
sloc ( fox / renard ) = kennel (repaire) [C]
sloc ( judge / juge ) = chamber (cabinet) [C]
sloc ( jury / jury ) = box (banc) [C]
sloc ( left luggage / bagages en consigne ) = cloakroom (consigne) [P]
sloc ( lion / lion ) = den (tanière) [C]
sloc ( money / argent ) = box (caisse) [C]
sloc ( money / argent ) = pouch (bourse) [P]
sloc ( money / argent ) = safe (coffre-fort) [P]
sloc ( mortar / mortier ) = hod (oiseau) [P]
sloc ( mouse / souris ) = hole (trou) [P]
sloc ( mouse / souris ) = nest (nid) [C]
sloc ( pig / porc ) = sty (porcherie) [C]
sloc ( rabbit / lapin ) = hole (terrier) [C]
sloc ( rabbit / lapin ) = hutch (clapier) [C]
sloc ( rabbit / lapin ) = warren (terriers) [C]
sloc ( rainwater / eau de pluie ) = butt ((gros) tonneau) [C]
sloc ( rainwater / eau de pluie ) = storage tank (citerne) [C]
sloc ( rainwater / eau de pluie ) = tank (citerne) [P]
sloc ( rubbish / ordure ) = ash-bin (boîte à ordures) [P]
sloc ( scissors / ciseaux ) = sheath (étui) [C]
sloc ( sword / épée ) = scabbard (fourreau) [C]
sloc ( sword / épée ) = sheath (fourreau) [C]
sloc ( taking / recette ) = till (caisse) [P]
sloc ( tar / goudron ) = barrel (gonne) [C]
sloc ( tar / goudron ) = drum (gonne) [P]
sloc ( telephone / téléphone ) = booth (cabine) [C]
sloc ( tobacco / tabac ) = pouch (blague) [P]
sloc ( vinegar / vinaigre ) = cruet (vinaigrier) [P]
sloc ( witness / témoin ) = box (barre) [C]

16.7 Appendix G: The Oper1 lexical function (sample list)

oper1 ( accusation / accusation ) = fling (lancer (à qn)) [S]
oper1 ( accusation / accusation ) = lay (porter) [S]
oper1 ( accusation / accusation ) = sling (lancer (à qn)) [S]
oper1 ( action / action ) = prefer (intenter) [S]
oper1 ( admiration / admiration ) = bestow (accorder) [S]
oper1 ( advantage / avantage ) = enjoy (jouir de) [S]
oper1 ( apology / excuse ) = offer (offrir) [S]
oper1 ( apology / excuse ) = present (présenter (à)) [S]
oper1 ( apology / excuse ) = proffer (offrir) [S]
oper1 ( apology / excuse ) = tender (offrir) [S]
oper1 ( application / candidature ) = put in (faire) [S]
oper1 ( argument / argument ) = bring forward (avancer) [S]
oper1 ( argument / argument ) = develop (développer) [S]
oper1 ( argument / argument ) = pose (présenter) [S]
oper1 ( argument / argument ) = prefer (présenter) [S]
oper1 ( argument / argument ) = pull out (sortir) [S]
oper1 ( argument / argument ) = put forward (avancer) [S]
oper1 ( barricade / barricade ) = erect (dresser) [S]
oper1 ( barrier / barrière ) = put up (ériger) [S]
oper1 ( bath / bain ) = take (prendre) [S]
oper1 ( bend / virage ) = negotiate (prendre) [S]
oper1 ( bend / virage ) = round (prendre) [S]
oper1 ( bend / virage ) = take (prendre) [S]
oper1 ( blasphemy / blasphème ) = bellow (vociférer) [S]
oper1 ( blow / coup ) = aim (allonger) [S]
oper1 ( blow / coup ) = deliver (porter) [S]
oper1 ( blow / coup ) = fetch (flanquer) [S]
oper1 ( blow / coup ) = plant (appliquer) [S]
oper1 ( blow / coup ) = return (rendre) [S]
oper1 ( blunder / gaffe ) = perpetrate (faire) [S]
oper1 ( campaign / campagne ) = spearhead (mener) [S]
oper1 ( challenge / défi ) = throw out (jeter) [S]
oper1 ( charge / accusation ) = lay (porter) [S]
oper1 ( charge / accusation ) = prefer (porter) [S]
oper1 ( comment / commentaire ) = pass (faire) [S]
oper1 ( comment / commentaire ) = slip in (glisser) [S]
oper1 ( comparison / comparaison ) = draw (établir) [S]
oper1 ( complaint / plainte ) = present (déposer) [S]
oper1 ( condolences / condoléance ) = extend (présenter) [S]
oper1 ( congratulations / félicitation ) = extend (présenter) [S]
oper1 ( control / contrôle ) = exercise (exercer) [S]
oper1 ( control / contrôle ) = wield (exercer) [S]
oper1 ( crime / crime ) = commit (commettre) [S]
oper1 ( crime / crime ) = perpetrate (perpétrer) [S]

16.8 Appendix H: The Liqu lexical function (sample list)

liqu ( agreement / accord ) = call off (rompre) [S]
liqu ( agreement / accord ) = cancel (résilier) [S]
liqu ( agreement / accord ) = rescind (annuler) [S]
liqu ( appendix / appendice ) = take out (enlever) [S]
liqu ( ban / embargo ) = lift (lever) [S]
liqu ( blister / ampoule ) = prick (crever) [S]
liqu ( blockade / blocus ) = lift (lever) [S]
liqu ( cold / rhume ) = shake off (se débarrasser de) [S]
liqu ( cold / rhume ) = shrug off (se débarrasser de) [S]
liqu ( cold / rhume ) = sweat out (guérir en transpirant) [S]
liqu ( cold / rhume ) = throw off (se débarrasser de) [S]
liqu ( corruption / corruption ) = abate (faire cesser) [S]
liqu ( crease / pli ) = iron out (faire disparaître au fer) [S]
liqu ( crease / pli ) = press out (aplatir) [S]
liqu ( crease / pli ) = press out (aplatir au fer) [S]
liqu ( crease / pli ) = smooth out (faire disparaître) [S]
liqu ( custom / coutume ) = abolish (supprimer) [S]
liqu ( custom / coutume ) = do away with (supprimer) [S]
liqu ( custom / coutume ) = put down (faire cesser) [S]
liqu ( custom / coutume ) = stamp out (déraciner) [S]
liqu ( death penalty / peine de mort ) = abolish (abolir) [S]
liqu ( disguise / masque ) = throw off (jeter) [S]
liqu ( embargo / embargo ) = raise (lever) [S]
liqu ( epidemic / épidémie ) = stay (enrayer) [S]
liqu ( fire / feu ) = beat out (étouffer) [S]
liqu ( fire / feu ) = choke (étouffer) [S]
liqu ( fire / feu ) = damp (couvrir) [S]
liqu ( fire / feu ) = extinguish (éteindre) [S]
liqu ( fire / feu ) = quench (éteindre) [S]
liqu ( fire / feu ) = stamp out (piétiner) [S]
liqu ( fire / feu ) = stifle (étouffer) [S]
liqu ( fracture / fracture ) = set (réduire) [S]
liqu ( inheritance / héritage ) = squander (dissiper) [S]
liqu ( injustice / injustice ) = right (réparer) [S]
liqu ( suspicion / soupçon ) = avert (écarter) [S]
liqu ( suspicion / soupçon ) = dissipate (dissiper) [S]
liqu ( suspicion / soupçon ) = drive away (chasser) [S]
liqu ( suspicion / soupçon ) = eliminate (éliminer) [S]
liqu ( suspicion / soupçon ) = quieten (calmer) [S]
liqu ( suspicion / soupçon ) = remove (dissiper) [S]
liqu ( wrong / tort ) = redress (redresser) [S]
liqu ( wrong / tort ) = repair (réparer) [S]
liqu ( wrong / tort ) = right (redresser) [S]
liqu ( wrong / tort ) = undo (réparer) [S]

16.9 Appendix I: The noun PRICE and lexical functions

The general form is f(X)=Y, where f is the lexical function, if any, X is the base (viz. price) and Y is the collocator. The combinations are listed as a function of the LF.

( price / prix ) = adjustment (rajustement) [C]
( price / prix ) = agree (se mettre d'accord sur) [S]
( price / prix ) = all-in (net) [S]
( price / prix ) = average (moyen) [S]
( price / prix ) = determine (fixer) [S]
( price / prix ) = estimate (estimer) [S]
( price / prix ) = even out (s'égaliser) [C]
( price / prix ) = even out (égaliser) [S]
( price / prix ) = fix (fixer) [S]
( price / prix ) = forward (à terme) [S]
( price / prix ) = lay down (imposer) [S]
( price / prix ) = mean (moyen) [S]
( price / prix ) = minimum (minimum) [S]
( price / prix ) = name (fixer) [S]
( price / prix ) = net (net) [S]
( price / prix ) = opening (d'ouverture) [S]
( price / prix ) = overestimate (surestimer) [S]
( price / prix ) = police (contrôler) [S]
( price / prix ) = quote (coter (à)) [S]
( price / prix ) = quote (indiquer) [S]
( price / prix ) = range (échelle) [C]
( price / prix ) = regular (normal) [S]
( price / prix ) = round down (arrondir (au chiffre inférieur)) [S]
( price / prix ) = round up (arrondir (au chiffre supérieur)) [S]
( price / prix ) = schedule (barème) [C]
( price / prix ) = set (fixe) [S]
( price / prix ) = special ((tout) spécial) [S]
( price / prix ) = spread (gamme) [C]
( price / prix ) = stipulate (stipuler) [S]
( price / prix ) = suit (convenir à) [C]
( price / prix ) = survey (enquête (sur)) [C]
( price / prix ) = unbeaten (non battu) [S]
( price / prix ) = wholesale (de gros) [S]
a0fact0actual ( price / prix ) = current (courant) [S]
a0fact0actual ( price / prix ) = going (existant) [S]
a0fact0actual ( price / prix ) = ruling (pratiqué) [S]
a0fact0usual ( price / prix ) = usual (courant) [S]
a0inceppredminus ( price / prix ) = decreasing (en baisse) [S]
a0inceppredminus ( price / prix ) = diminishing (qui baisse) [S]
a0inceppredplus ( price / prix ) = rising (en hausse) [S]
a0inceppredplus ( price / prix ) = soaring (qui monte en flèche) [S]
a1culm ( price / prix ) = outside (maximum) [S]
antia0inceppredplus+minus ( price / prix ) = stable (stable) [S]
antia0inceppredplus+minus ( price / prix ) = steady (stable) [S]
antiinceppredplus ( price / prix ) = keep up (se maintenir) [C]
antimagn ( price / prix ) = competitive (concurrentiel) [S]
antimagn ( price / prix ) = giveaway (dérisoire) [S]
antimagn ( price / prix ) = low (bas) [S]
antimagn ( price / prix ) = moderate (modéré) [S]
antimagn ( price / prix ) = modest (modeste) [S]
antimagn ( price / prix ) = modest (modique) [S]
antimagn ( price / prix ) = sacrificial (bas (basse)) [S]
antipermpredplus ( price / prix ) = control (contrôler) [S]
antipermpredplus ( price / prix ) = freeze (bloquer) [S]
causfinpredplus ( price / prix ) = peg (stabiliser) [S]
causmanif ( price / prix ) = mark (marquer) [S]
causmanif ( price / prix ) = mark up (marquer) [S]
causpredminus ( price / prix ) = beat down (faire baisser) [S]
causpredminus ( price / prix ) = bring down (faire baisser) [S]
causpredminus ( price / prix ) = depress (faire baisser) [S]
causpredminus ( price / prix ) = drop (baisser) [S]
causpredminus ( price / prix ) = lower (baisser) [S]
causpredminus ( price / prix ) = mark down (baisser) [S]
causpredminus ( price / prix ) = reduce (baisser) [S]
causpredminus ( price / prix ) = send down (faire baisser) [S]
causpredminus ( price / prix ) = slash (casser) [S]
causpredplus ( price / prix ) = advance (augmenter) [S]
causpredplus ( price / prix ) = boost (hausser) [S]
causpredplus ( price / prix ) = bump up (faire grimper) [S]
causpredplus ( price / prix ) = enhance (augmenter) [S]
causpredplus ( price / prix ) = escalate (faire monter en flèche) [S]
causpredplus ( price / prix ) = increase (augmenter) [S]
causpredplus ( price / prix ) = inflate (faire monter) [S]
causpredplus ( price / prix ) = jack up (faire grimper) [S]
causpredplus ( price / prix ) = mark up (hausser) [S]
causpredplus ( price / prix ) = push up (augmenter) [S]
causpredplus ( price / prix ) = put up (augmenter) [S]
causpredplus ( price / prix ) = raise (majorer) [S]
causpredplus ( price / prix ) = up (augmenter) [S]
causpredplus(2x) ( price / prix ) = double (doubler) [S]
finpred(plus/minus) ( price / prix ) = level off (se stabiliser) [C]
finpred(plus/minus) ( price / prix ) = level out (se stabiliser) [C]
finpred(plus/minus) ( price / prix ) = steady (se stabiliser) [C]
incepoper1 ( price / prix ) = realize (atteindre) [S]
inceppred(plus/minus) ( price / prix ) = fluctuate (varier) [C]
inceppredminus ( price / prix ) = collapse (s'effondrer) [C]
inceppredminus ( price / prix ) = come down (baisser) [C]
inceppredminus ( price / prix ) = decline (baisser) [C]
inceppredminus ( price / prix ) = decrease (baisser) [C]
inceppredminus ( price / prix ) = dip (fléchir) [C]
inceppredminus ( price / prix ) = drop (baisser) [C]
inceppredminus ( price / prix ) = fall (baisser) [C]
inceppredminus ( price / prix ) = go down (baisser) [C]
inceppredminus ( price / prix ) = lower (baisser) [C]
inceppredminus ( price / prix ) = plummet (dégringoler) [C]
inceppredminus ( price / prix ) = plunge (dégringoler) [C]
inceppredminus ( price / prix ) = recede (baisser) [C]
inceppredminus ( price / prix ) = sag (fléchir) [C]
inceppredminus ( price / prix ) = sink (tomber très bas) [C]
inceppredminus ( price / prix ) = slump (s'effondrer) [C]
inceppredminus ( price / prix ) = toboggan (dégringoler) [C]
inceppredminus ( price / prix ) = weaken (fléchir) [C]
inceppredplus ( price / prix ) = advance (monter) [C]
inceppredplus ( price / prix ) = boom (être en forte hausse) [C]
inceppredplus ( price / prix ) = go up (monter) [C]
inceppredplus ( price / prix ) = harden (être en hausse) [C]
inceppredplus ( price / prix ) = hike (augmenter) [C]
inceppredplus ( price / prix ) = increase (augmenter) [C]
inceppredplus ( price / prix ) = jump (monter en flèche) [C]
inceppredplus ( price / prix ) = leap up (faire un bond) [C]
inceppredplus ( price / prix ) = mount (monter) [C]
inceppredplus ( price / prix ) = pick up (remonter) [C]
inceppredplus ( price / prix ) = rise (monter) [C]
inceppredplus ( price / prix ) = rocket (monter en flèche) [C]
inceppredplus ( price / prix ) = shoot up (monter en flèche) [C]
inceppredplus ( price / prix ) = skyrocket (monter en flèche) [C]
inceppredplus ( price / prix ) = soar (monter en flèche) [C]
inceppredplus ( price / prix ) = spiral (monter en flèche) [C]
inceppredplus ( price / prix ) = spiral up (monter en flèche) [C]
inceppredplus ( price / prix ) = to rise climb steeply (monter en flèche) [C]
inceppredplus(2x) ( price / prix ) = double (doubler) [C]
magn ( price / prix ) = dear (cher) [S]
magn ( price / prix ) = dear (élevé) [S]
magn ( price / prix ) = hefty (gros (grosse)) [S]
magn ( price / prix ) = high (élevé) [S]
magn ( price / prix ) = pretty (joli) [S]
magn ( price / prix ) = steep (élevé) [S]
magn ( price / prix ) = stiff (élevé) [S]
magn+a1excess ( price / prix ) = excessive (excessif) [S]
magn+a1excess ( price / prix ) = exorbitant (exorbitant) [S]
magn+a1excess ( price / prix ) = extortionate (exorbitant) [S]
magn+a1excess ( price / prix ) = extravagant (exorbitant) [S]
magn+a1excess ( price / prix ) = inflated (exagéré) [S]
magn+a1excess ( price / prix ) = outrageous (scandaleux) [S]
magn+a1excess ( price / prix ) = prohibitive (prohibitif) [S]
magn+a1excess ( price / prix ) = shocking (exorbitant) [S]
magn+a1excess ( price / prix ) = unreasonable (qui n'est pas raisonnable) [S]
mult ( price / prix ) = table (classement) [C]
mult ( price / prix ) = table (liste) [C]
oper3 ( price / prix ) = ask (demander) [S]
pos2 ( price / prix ) = attractive (intéressant) [S]
predplus ( price / prix ) = up (avoir augmenté) [C]
s0antimagn ( price / prix ) = lowness (modicité) [C]
s0antimagn ( price / prix ) = modesty (modicité) [C]
s0antipermpredplus ( price / prix ) = freeze (blocage) [C]
s0causpredminus ( price / prix ) = reduction (diminution) [C]
s0causpredplus ( price / prix ) = rising (augmentation) [C]
s0inceppredminus ( price / prix ) = collapse (effondrement) [C]
s0inceppredminus ( price / prix ) = cutting (réduction) [C]
s0inceppredminus ( price / prix ) = decrease (baisse (de)) [C]
s0inceppredminus ( price / prix ) = drop (baisse) [C]
s0inceppredminus ( price / prix ) = lowering (baisse) [C]
s0inceppredminus ( price / prix ) = sag (fléchissement) [C]
s0inceppredminus ( price / prix ) = slide (baisse) [P]
s0inceppredminus ( price / prix ) = slump (effondrement (de)) [P]
s0inceppredplus ( price / prix ) = advance (hausse) [P]
s0inceppredplus ( price / prix ) = boom (brusque hausse) [C]
s0inceppredplus ( price / prix ) = bulge (hausse) [C]
s0inceppredplus ( price / prix ) = hike (hausse) [P]
s0inceppredplus ( price / prix ) = increase (augmentation) [C]
s0inceppredplus ( price / prix ) = inflation (hausse) [C]
s0inceppredplus ( price / prix ) = rise (hausse) [P]
s0magn+s0excess ( price / prix ) = exorbitance (énormité) [C]
s0magn+s0excess ( price / prix ) = unreasonableness (caractère exorbitant) [C]
ver ( price / prix ) = keen (étudié (de près)) [S]
ver ( price / prix ) = reasonable (raisonnable) [S]
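The general form f(X)=Y used in this appendix lends itself to mechanical processing. The following is a minimal Python sketch, not the author's own program, of how such printed entries could be split into their parts and grouped by lexical function. The regular expression, the field names (`lf`, `base_en`, `colloc_en`, and so on) and the helper functions are assumptions introduced here for illustration; the sample lines are taken from the list above.

```python
import re
from collections import defaultdict

# Assumed layout of one entry: "lf ( base_en / base_fr ) = colloc_en (colloc_fr) [tag]".
# The lexical function (lf) is optional ("f is the lexical function, if any").
ENTRY = re.compile(
    r"^(?:(?P<lf>\S+)\s+)?"
    r"\(\s*(?P<base_en>[^/]+?)\s*/\s*(?P<base_fr>[^)]+?)\s*\)"
    r"\s*=\s*(?P<colloc_en>.+?)\s+\((?P<colloc_fr>.*)\)\s+\[(?P<tag>[A-Z])\]$"
)


def parse_entry(line):
    """Split one printed entry into a dict of named fields (hypothetical names)."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised entry: {line!r}")
    return match.groupdict()


def group_by_function(lines):
    """Map each lexical function (None when absent) to its English collocators."""
    grouped = defaultdict(list)
    for line in lines:
        entry = parse_entry(line)
        grouped[entry["lf"]].append(entry["colloc_en"])
    return dict(grouped)


SAMPLE = [
    "( price / prix ) = quote (coter (à)) [S]",
    "causpredminus ( price / prix ) = slash (casser) [S]",
    "causpredminus ( price / prix ) = mark down (baisser) [S]",
    "finpred(plus/minus) ( price / prix ) = level off (se stabiliser) [C]",
]

if __name__ == "__main__":
    for function, collocators in group_by_function(SAMPLE).items():
        print(function, collocators)
```

The French gloss is matched greedily so that nested parentheses such as "coter (à)" stay inside a single field; entries with a different layout would need a more careful parser.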

16.10 Appendix J: List of ergative verbs

The following list includes causative/inchoative verbs extracted from the Collins-Robert dictionary. The "patient" argument, i.e. the entity that changes state, can be the subject of the intransitive (inchoative) verb (→ in square brackets) or the direct object of the transitive (causative) verb (→ unbracketed).

Verb Patient

abate advance advance advance amalgamate amalgamate back out bake balance begin begin bend bend down blend blend blend block blow blow blow off blow out boil bounce bounce bounce break break break break break up break up break up brew brew brew brighten brighten brown brown bruise buckle buckle buckle bud build up build up build up burn burn burn burn burn burn out

rent price troops work company metal vehicle pottery account custom movement branch branch colour idea style wheel fuse trumpet hat light water ball cheque person bone health news stick crowd meeting road beer plot tea future person person skin fruit belt metal wheel tree excitement pressure tension cake meat milk person sauce candle

burst burst burst burst burst open button camber camber cease chafe charge chill chime chip away clash clash clear clog close close down close down close up clot cloud cloud cockle cockle coil collapse collapse conform congeal congeal congeal cook cool crack crack crack crack crack crack up crumble crumble crumble crush curdle curdle curl dangle dangle darken darken

balloon bomb bubble tyre container garment beam road activity rope battery wine bell point cymbal metallic object ship pipe shop business shop wound blood expression face cloth paper rope chair table action blood milk oil food air glass ground pottery wall whip plane bread earth plaster clothes blood milk hair arm leg colour room

darken decay decay decay deteriorate develop dim dim dim dim dim dim dim disband discolour discolour do up double drag drain drain away draw draw away drip drip drive away drive out droop drop drop dry out end end escalate evaporate even out evolve evolve extend fall in fasten fasten fasten fasten feed fill finish flare flash float flood fly focus focus fog fog fold foul fray fray fray fuel fuel gallop gather gather

sky food tooth wood material region beauty colour glory light memory outline sight army white material white material dress price object vegetable liquid tea person cheese washing person person head conversation price alcoholic series speech fight liquid price plan system visit troops box door dress window baby hole game skirt light currency river flag light ray glasses mirror chair fishing line cloth garment rope aircraft ship horse object people

gather get across get in get through get under way get up graduate grate graze ground group grow harden harmonize hatch hatch heal heighten heighten herd together herd together hold together hole hook up ice ice ice over increase increase increase increase increase increase increase increase increase increase increase increase increase increase increase increase increase increase increase increase increase increase increase increase increase infiltrate integrate iron jam jam jam keep off keep together ladder land leach leak out lengthen lessen

troops play person message progress person colour chalk cattle ship people hair steel colour chick egg wound fear tension animal people object sock dress aircraft wing windscreen river darkness effort firm friendship institution joy noise number pain population possession price pride rage rain riches sales sorrow strength supply surprise tax town trade wind liquid ethnic group clothes brake door gun person people stocking aircraft liquid news visit pain

lessen lighten link up load load load load load up load up lock lodge lodge loll out loosen loosen loosen lower lower mass match mellow mellow mellow mellow mellow mellow melt melt melt mildew mildew mildew mist mist mist move move down move forward move forward move forward move up move up naturalize naturalize offer open open open open open open open out open up open up operate operate operate overbalance overflow overlap overturn overturn overturn pack pair part

tension face spacecraft camera gun lorry ship lorry person door bullet person tongue knot rope screw pressure price troops cup character colour fruit person voice wine butter ice metal cloth paper vine eye mirror windscreen troops person person troops vehicle employee person person plant opportunity book debate door eye meeting shop business business career machine system vehicle person container tile boat car chair people animal boxer

part pass peal peel peel away pep up percolate perish perish pile up pitch pop pop pop pop puff out pull round pull round pull up pull up pump out puncture push out push out put down rally rattle refit reform register restart restart reverberate reverberate reverberate revive revive ring ring roast rock rock roll along roll over rotate rub off run run away run on run on run out run out rustle rustle sail scatter scatter scratch set set sever shake shake sharpen shear off shunt

crowd time bell fruit skin person coffee food rubber reason ball balloon cork corn press stud sail sick person unconscious person horse vehicle oil tyre root shoot aircraft troops box ship person part engine machine heat light sound hope person bell coin meat cradle ship ball person crop writing machine water letter word chain rope leaf skirt boat cloud crowd competitor concrete jam rope person window pain branch person

shut shut shut shut down shut down shut down simmer simmer simmer simmer sink sink sink slacken slacken slacken slacken slam slant slant slip off slot in slot in slot in slot together slot together slow slow slow slow slur smarten up smarten up smear smother snap snap snarl snarl snarl snarl up snarl up soak out soften soften soften soften soften soften soften soften soften soften soften up soften up soften up soften up soften up soften up soften up sound sound sour spill spin spin

door drawer shop business shop theatre soup stew vegetable water object person ship cable pressure rope screw door handwriting line cover item part piece part piece machine progress reaction vehicle sound person town point person rubber band whip hair rope wool plan traffic stain anger butter clay collar colour ground leather outline pitch skin butter clay collar ground leather pitch skin bell trumpet milk salt ball top

spin splay splay splay splinter splinter splinter splinter split split split split split split split split off split off split off split off split off split up split up split up split up spout spout spread spread spread spread spread spread spread spread spread out spread out spring spurt spurt squash stall stall stampede stampede stand start start start start stew stew stick stick down stick on stick on stick out stiffen stiffen stiffen stiffen stiffen stiffen stiffen stiffen stop stop

wheel end frame window frame bone glass party wood fabric garment party pole seam stone wood branch company department group piece crowd meeting party wood end of pipe liquid butter disease indignation infection knowledge news panic rumour fan wing timber flame water fruit car engine animal people object clock custom engine movement fruit meat needle envelope label stamp rod card dough fabric joint limb morale paste resistance allowance enjoyment

stop stop stop stop stop stop stop straighten strengthen strengthen stretch stretch stretch stretch stretch out stretch out stretch out strike surface sway sweat sweat sweeten sweeten swell swell swell swell swell swing swing swing swing swing swing round swing round swing round swing round swing round swish swish switch off switch on take apart take apart taper taper taper taper taper taper taper taper taper tear thaw thaw thaw thicken thread thread tie tie tie tighten tighten

machine pain person privilege process vehicle worry road limb muscle authority elastic glove rope arm foot hand match submarine object animal person person temper friend number river sail sound arm hammock leg object pendulum car convoy crane plane ship cane whip heater heater machine toy belt column end hair leg stick structure table leg trouser leg cloth food ice snow sauce bead needle necktie rope shoelace regulation rope

tighten tighten tip back tip forward tip up tip up tip up toll topple toss about toss about toughen toughen toughen toughen toughen toughen trail transfer transfer transfer transfer transfer transfer transfer trot tumble out tumble out turn turn turn turn turn turn turn back turn off turn on turn out turn round turn round turn round twirl twirl twirl twirl twist twist up unload unload veer veer waft warm warm warm warm warm up warm up warm up warm up wash out waste waste wave wave weaken

screw wheel chair chair box jug table bell government boat plume cloth condition glass leather metal person object civil servant diplomat employee player prisoner servant soldier horse contents object handle key knob milk screw wheel vehicle heater heater troops object person vehicle cane handle knob lasso rope rope ship truck car ship smell food person room water audience car discussion engine stain food resource flag hair country

weaken weaken weaken weaken wear wear wear wear wear down wear down wear out weather wedge in weigh in weigh in whirl whirl whirl whirl whirl round whirl round wiggle wilt wilt wind up withdraw wither wither wither wither work work work out work out work out wrinkle zip up

material person structure team clothes fabric stone wood courage resistance clothes wood person boxer jockey dust leaf sand smoke chair revolving chair screw flower plant meeting troops beauty hope limb plant machine mechanism plan problem puzzle nose dress

17 Résumé (French summary)

This book describes the construction and exploitation of a lexical-semantic database derived from the Robert & Collins English-French bilingual dictionary. The electronic version of the dictionary was made available to the English Department of the University of Liège for non-commercial purposes, and the work described here is in line with the efforts of the scientific community and industry to develop linguistic tools that ease the task of translators, writers and learners. The prospects opened up by the lexical-semantic database described here are not, however, limited to computerized aids for translation or writing. The wide variety of access paths offered by the query interface makes it possible to use the enriched database for more fundamental studies in lexical semantics, where access to large amounts of data is crucial for arriving at linguistically interesting generalizations.

The first part of the book places the study in the context of research carried out over the last fifteen years in the field of lexical acquisition, that is, the extraction of lexical knowledge (syntactic, semantic, pragmatic) from machine-readable dictionaries, but also from textual databases. Since the exploitation of the Robert & Collins dictionary is guided by the wish to reuse the collocational information it contains and make it more accessible, the notion of collocation is examined in detail and an overview of the various approaches to representing collocations is given. The theoretical framework underlying the whole enterprise, the Meaning-Text Theory of the Russian linguist Igor Mel'čuk, is also presented in detail. The complex system of lexical functions at the heart of Meaning-Text Theory was used to identify, and code in the database, the lexical-semantic relations linking the metalinguistic labels printed in italics in the paper dictionary to the entries under which these labels appear. The italicized metalinguistic apparatus covers a whole range of information that is crucial for disambiguation and for selecting the correct translation in a given context (selection restrictions and collocations indicating typical objects, subjects and complements, etc.).

In the version described in this book, the database is organized so as to allow access to any type of information: the English headword, the part of speech, the base or the collocator of a collocation, the translations, the register labels, etc. It is possible, for example, to extract the verbs that take the noun brake as direct object (jam, line, operate, reline) or the intransitive verbs that take brake as subject (drag, fail, grip, jam, scream, screech, squeal). To obtain the same result without a computer, one would have to go through the 800 or so pages of the printed dictionary to locate the occurrences of brake within the microstructure of the verbs mentioned above.

More fundamentally, the database was enriched manually with lexical functions that capture the diversity of the lexical-semantic relations linking the italicized items to their entries. The systematic labelling of these links allowed a more thorough modelling of the semantic networks present in the database. The user can now ask semantically much finer-grained questions, such as:

* Which verbs can be used in English or French to refer to the typical noise made by brakes? -> Son (brake / frein) = scream / hurler, screech / grincer, squeal / grincer
* Are there transitive/causative verbs meaning that something makes the brakes inoperative? -> CausObstr (brake / frein) = jam / bloquer
* What are the parts of a brake? -> Part (brake / frein) = drag / sabot, lining / garniture

The relational database resulting from the coding of more than 70,000 records is described in detail, as are the coding programs and the query interface. In a number of cases it was also possible to semi-automate the assignment of lexical functions by exploiting recurring patterns in certain micro-definitions (the strings noise of or sound of at the beginning of a definition, for example, indicate that the lexical function S0Son should be used to code the semantic link between a noun and another noun expressing the typical sound emitted by the former).

The second part of the book deals with ways of exploiting the Robert & Collins lexical-semantic database. The limitations of Mel'čuk's Meaning-Text Theory in the context of natural language processing are described in detail, and some reservations are expressed about the imprecision of certain lexical functions. The chapters devoted to studies in lexical semantics show how the database can be used to draw generalizations about certain classes of verbs, such as verbs of sound or so-called ergative verbs, or about the relation between lexical functions and certain types of metaphor. Prospects for pedagogical applications in language or translation teaching are also discussed, as are projects aimed at exploiting the dictionary's collocational information for automatic disambiguation in context. Research in computational linguistics and lexicography increasingly requires the development of large monolingual and multilingual lexical databases. The Robert & Collins database described in this book and the lexical-semantic networks it contains offer only part of the solution to the complex problem of disambiguation, but the thesauric and collocational dimension of this lexical resource makes it interesting material for exploring the structure of the lexicon while contributing to the creation of a new generation of lexicographical tools.
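The brake queries mentioned in this summary can be illustrated with a small relational sketch. The following Python example uses the standard sqlite3 module with an in-memory table; the table name `colloc`, its columns and the helper functions are assumptions made for illustration and do not reproduce the actual schema of the database described in the book. The rows are limited to the brake examples quoted above, and a lexical function is recorded only where the summary states one (linking the Son verbs to the intransitive-subject rows is itself an assumption).

```python
import sqlite3

# Hypothetical, simplified schema: one row per (headword, base) link.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE colloc (headword TEXT, pos TEXT, base TEXT, role TEXT, lf TEXT)"
)

ROWS = [
    # Transitive verbs with brake as direct object.
    ("jam", "vt", "brake", "object", "CausObstr"),
    ("line", "vt", "brake", "object", None),
    ("operate", "vt", "brake", "object", None),
    ("reline", "vt", "brake", "object", None),
    # Intransitive verbs with brake as subject.
    ("drag", "vi", "brake", "subject", None),
    ("fail", "vi", "brake", "subject", None),
    ("grip", "vi", "brake", "subject", None),
    ("jam", "vi", "brake", "subject", None),
    ("scream", "vi", "brake", "subject", "Son"),
    ("screech", "vi", "brake", "subject", "Son"),
    ("squeal", "vi", "brake", "subject", "Son"),
    # Nouns naming parts of a brake.
    ("drag", "n", "brake", None, "Part"),
    ("lining", "n", "brake", None, "Part"),
]
conn.executemany("INSERT INTO colloc VALUES (?, ?, ?, ?, ?)", ROWS)


def verbs_with(base, role):
    """Headwords that take `base` in the given syntactic role, sorted alphabetically."""
    cur = conn.execute(
        "SELECT headword FROM colloc WHERE base = ? AND role = ? ORDER BY headword",
        (base, role),
    )
    return [row[0] for row in cur]


def by_function(base, lf):
    """Headwords linked to `base` by the lexical function `lf`, sorted alphabetically."""
    cur = conn.execute(
        "SELECT headword FROM colloc WHERE base = ? AND lf = ? ORDER BY headword",
        (base, lf),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    print(verbs_with("brake", "object"))
    print(by_function("brake", "Son"))
```

Storing the base as an indexed column rather than leaving it buried in the verb entries is what replaces the page-by-page search of the printed dictionary.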

18 Zusammenfassung

This volume describes the construction and use of a lexical-semantic database based on the Robert & Collins English-French dictionary. An electronic version of the dictionary had been made available to the English Department of the University of Liège for non-commercial purposes. The research described here thus fits seamlessly into the efforts of academia and industry to develop linguistic aids for translators, authors and learners. The perspectives opened up by building this lexical-semantic database are not, however, limited to the development of such electronic tools. The user interface, with its versatile query facilities, also allows the enriched database to be used for further lexical-semantic analyses, because access to large amounts of data plays an important role in the later formulation of linguistically relevant generalizations. The first part of the volume places the work in the context of the last fifteen years of research on lexical knowledge acquisition, i.e. the extraction of syntactic, semantic and pragmatic knowledge from machine-readable dictionaries as well as from text databases. Since the aim of processing the Robert & Collins was to reuse the collocations it contains and make them more easily accessible, the concept of collocation is analysed in detail and the various approaches to representing collocations are presented. The Meaning-Text Theory of the Russian linguist Igor Mel'čuk, which forms the theoretical basis of the present work, is also described in depth.

The complex system of lexical functions underlying this theory was used to identify the lexical-semantic relations between the metalinguistic indications printed in italics in the dictionary and the relevant entries, and then to encode these relations in the database. The italicized indications carry a whole range of information that is indispensable for disambiguation and for selecting the correct translation in a given context (selection restrictions and collocations with typical objects, subjects, complements, etc.). In the version described here, the database structure gives access to any kind of information, such as English entry, part of speech, base, collocator, translations or registers. It is possible, for example, to retrieve the transitive verbs that take the noun brake as object (jam, line, operate, reline) or the intransitive verbs that take brake as subject (drag, fail, grip, jam, scream, screech, squeal). To obtain the same result without a computer, one would have to go through the 800 pages of the printed dictionary one by one in order to find the noun brake within the microstructure of the verbs mentioned. In addition, the database was enriched by hand with lexical functions in order to bring out the variety of lexical-semantic relations between the italicized indications and the relevant entries. The systematic encoding of these relations yielded a finer modelling of the semantic networks of the database. This approach now makes it possible to ask semantically refined questions such as:

* Which verbs can be used in English or French to express the typical sound of brakes? -> Son (brake / frein) = scream / hurler, screech / grincer, squeal / grincer
* Are there transitive/causative verbs that express causing brakes to stop working? -> CausObstr (brake / frein) = jam / bloquer
* What are the various parts of a brake? -> Part (brake / frein) = drag / sabot, lining / garniture

The relational database, with its more than 70,000 encoded entries, is described in detail, as are the encoding programs and the query interface. In certain cases, the assignment of lexical functions could be partly automated by exploiting recurrent structures in certain micro-definitions. For example, the strings noise of or sound of at the beginning of a definition indicate that the lexical function S0Son must be used to encode the semantic relation between two nouns in which the second noun expresses the typical sound produced by the first.

The second part of the volume shows how the lexical-semantic database based on the Robert & Collins can be used. The limitations of Mel'čuk's Meaning-Text Theory with respect to natural language processing are set out in detail, and some reservations are expressed about the imprecision of certain lexical functions. The various chapters on lexical semantics show how the database can be used to work out generalizations about certain types of verbs (e.g. those expressing a sound, or the so-called ergative verbs) or about the relation between lexical functions and certain types of metaphors. The possible uses for language and translation teaching are also examined, as are projects that aim to use the collocations contained in the dictionary for automatic disambiguation in context. Research in computational linguistics and lexicography depends more and more on the development of large monolingual and multilingual databases. The processing of the Robert & Collins described in this book, and the lexical-semantic networks it contains, offer only a partial solution to the complex problem of disambiguation. The additional extension into a thesaurus via collocations nevertheless makes this lexical resource a rich source for exploring the structure of the lexicon, while also enabling the development of a new generation of lexicographical tools.
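The kinds of query mentioned in this summary (verbs taking brake as object or subject, and lookups by lexical function) can be illustrated with a minimal sketch. The table layout, the column names and the `headwords` helper below are assumptions made for illustration only and do not reproduce the actual database structure described in the book. The rows contain only the brake examples quoted in the summary, and translations that the summary does not give are left empty.

```python
import sqlite3

# Hypothetical, simplified schema: one row per (headword, italicized base) pair.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE collocation (
        headword    TEXT,  -- English entry
        pos         TEXT,  -- part of speech of the entry: 'vt', 'vi' or 'n'
        base        TEXT,  -- italicized metalinguistic item
        role        TEXT,  -- 'object', 'subject' or NULL
        lexfun      TEXT,  -- manually assigned lexical function, if any
        translation TEXT   -- French equivalent, when quoted in the summary
    )
""")

# Only the examples quoted in the summary; None marks information not given there.
rows = [
    ("jam", "vt", "brake", "object", "CausObstr", "bloquer"),
    ("line", "vt", "brake", "object", None, None),
    ("operate", "vt", "brake", "object", None, None),
    ("reline", "vt", "brake", "object", None, None),
    ("drag", "vi", "brake", "subject", None, None),
    ("fail", "vi", "brake", "subject", None, None),
    ("grip", "vi", "brake", "subject", None, None),
    ("jam", "vi", "brake", "subject", None, None),
    ("scream", "vi", "brake", "subject", "Son", "hurler"),
    ("screech", "vi", "brake", "subject", "Son", "grincer"),
    ("squeal", "vi", "brake", "subject", "Son", "grincer"),
    ("drag", "n", "brake", None, "Part", "sabot"),
    ("lining", "n", "brake", None, "Part", "garniture"),
]
conn.executemany("INSERT INTO collocation VALUES (?, ?, ?, ?, ?, ?)", rows)


def headwords(base, pos=None, role=None, lexfun=None):
    """Return the sorted headwords linked to `base` under the given constraints."""
    sql = "SELECT DISTINCT headword FROM collocation WHERE base = ?"
    params = [base]
    # Column names come from this fixed tuple, never from user input.
    for column, value in (("pos", pos), ("role", role), ("lexfun", lexfun)):
        if value is not None:
            sql += f" AND {column} = ?"
            params.append(value)
    sql += " ORDER BY headword"
    return [row[0] for row in conn.execute(sql, params)]


transitive_with_brake_object = headwords("brake", pos="vt", role="object")
intransitive_with_brake_subject = headwords("brake", pos="vi", role="subject")
sound_verbs = headwords("brake", lexfun="Son")
parts = headwords("brake", lexfun="Part")
```

Keeping the lexical function in its own column is what allows the semantically refined questions (Son, CausObstr, Part) to be asked with the same filter mechanism as the purely syntactic ones.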

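The partial automation mentioned in the summary, where a micro-definition beginning with noise of or sound of signals the lexical function S0Son, could be sketched as follows. This is a hedged illustration, not the author's encoding program: the function name `suggest_lexical_function`, the tuple it returns and the sample entries are all invented for this example, and only the two cue strings come from the text.

```python
import re

# The two cues named in the summary; any other definition is left for manual coding.
SOUND_CUE = re.compile(r"^(?:noise|sound) of\s+(?P<source>.+)$", re.IGNORECASE)


def suggest_lexical_function(headword, micro_definition):
    """Suggest (lexical function, base, value) when the definition starts with a sound cue.

    Returns None when no cue is found, so the entry stays uncoded.
    """
    match = SOUND_CUE.match(micro_definition.strip())
    if match is None:
        return None
    # S0Son(base) = headword: the headword names the typical sound of the base.
    return ("S0Son", match.group("source"), headword)


# Invented sample entries, not taken from the dictionary.
suggestion = suggest_lexical_function("tick", "sound of clock")
no_suggestion = suggest_lexical_function("jam", "brakes")
```

A suggestion produced this way would still need to be checked by hand, since the summary states only that the assignment could be partly automated.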
19 Bibliography

Aarts, F. (1991): "OALD, LDOCE and COBUILD: Three learner's dictionaries of English compared", in Granger (ed.) Perspectives on the English Lexicon: A Tribute to Jacques Van Roey, Cahiers de l'Institut de Linguistique de Louvain, CILL 17.1-3, pp.221-226. Adamczewski, H. & Keen, D. (1973): Phonétique et phonologie de l'anglais contemporain, Paris, Armand Colin. Adams, V. (1973): An Introduction to Modem English Word-Formation, Longman, London and New York. AGI ( 1962): American Geological Institute - Dictionary of Geological Terms, Garden City, New York, Double Day. AHD ( 1991 ): American Heritage Dictionary, (ed. by W. Morris), Boston, Houghton-Miffl in Company. Ahlswede, T. & Evens, M. (1988a): "Generating a Relational Lexicon from a Machine-Readable Dictionary", in International Journal of Lexicography, 1/3, pp.214-237. Ahlswede, T. & Evens, M. (1988b): "A lexicon for a medical expert system", in Evens (ed.) Relational Models of the Lexicon, Cambridge University Press, pp.97-112. Aisenstadt, E. (1979): "Collocability restrictions in dictionaries", in ITL, 45-46, pp.71-74. Aitchison, J. (1987): Words in the Mind: An Introduction to the Mental Lexicon, Oxford: Basil Blackwell. Alberto, P. & Bennett, P. (eds) (1995): Lexical Issues in Machine Translation, Studies in Machine Translation and Natural Language Processing, Vol.8, European Commission, Luxembourg. Allegranza, V., Bennett, P., Durand, J., Van eynde, F., Humphreys, L., Schmidt, P. & Steiner, E. (1991): "Linguistics for Machine Translation: The Eurotra Linguistic Specifications", in Copeland, Durand, Krauwer, Maegaard (eds) The Eurotra Linguistic Specifications, Studies in Machine Translation and Natural Language Processing, Vol. 1, Commission of the European Communities, Luxembourg, pp. 15-123. Allerton, D.J. (1975): "Deletion and Preform Reduction", in Journal of Linguistics, 11, pp.213-237. Allerton, D.J. (1982): Valency and the English Verb, London and New York: Academic Press. Alonso Ramos, M. 
& Tutin, A. (1996): "A Classification and Description of the Lexical Functions for the Analysis of their Combinations", in Wanner (ed.) Lexical Functions in Lexicography and Natural Language Processing, Amsterdam and Philadelphia, Benjamins, pp. 147-167. Alshawi, H. (1989): "Analysing the Dictionary Definitions", in Boguraev & Briscoe (eds) Computational Lexicography for Natural Language Processing, Longman, London & New York, pp.153-169. Amsler, R.A. (1980): The Structure of the Merriam-Webster Pocket Dictionary, Ph.D. Thesis, University of Texas at Austin, Austin. Amsler, R.A. (1984): "Machine-Readable Dictionaries", in Williams (ed.) Annual Review of Information Science and Technology, Vol.19, pp.161-209. Amsler, R.A. (1994): "Research Toward the Development of a Lexical Knowledge Base for Natural Language Processing", in Zampolli, Calzolari & Palmer (eds) Current Issues in Computational Linguistics: in Honour of Don Walker, Series Linguistica Computazionale, IX-X, pp.155-175. Antelmi, D. & Roventini, A. (1992): "Semantic relationships within a set of verbal entries in the Italian Lexical Database", in Euralex'90 Proceedings, Barcelona: Biblograf, pp.247-255. Apresyan, Y. (1973): "Regular Polysemy", in Linguistics, 142, pp.5-32. Apresyan, Y., Mel'éuk, I. & Zolkovsky, A. (1969): "Semantics and Lexicography: A New Type of Unilingual Dictionary", in Kiefer (ed.) Studies in Syntax and Semantics, Reidel, Dordrecht-Holland, pp.1-33.

308 Atkins, Β.T. (1991): "Building a Lexicon: The Contribution of Lexicography", in International Journal of Lexicography, 4/3, pp. 167-204. Atkins, B.T. (1993): "Theoretical lexicography and its relation to dictionary making", in Dictionaries: Journal of the Dictionary Society of North America, Number 15, pp.4-44. Atkins, B.T. (1996): "Bilingual Dictionaries: Past, Present and Future", in EURALEX'96 Proceedings, Göteborg University, pp.515-546. Atkins, B.T. & Christ, O. (1995): Case Study: Developing a Dictionary Entry for the Perception Reading of the Verb "survey" with Support of the IMS Corpus Query Tools, DELIS Cookbook, Internal document, Oxford University Press and University of Stuttgart, MS, 81 p. Atkins, B.T. & Duval, A. (1978): Robert & Collins Dictionnaire Français-Anglais, Anglais-Français, Paris: Le Robert/Glasgow: Collins. Atkins, B.T. & Fillmore, C. (1995): "A HyperText Dictionary based on Frame Semantics", paper read at the DELIS Workshop on Approaches to corpus-based dictionary building, University of Stuttgart, 26-27 July 1995. Atkins, B.T., Kegl, J. & Levin, B. (1986): "Explicit and implicit information in dictionaries", Lexicon Project Working Papers 12, Center for Cognitive Science, MIT, Cambridge, MA. Also available as Cognitive Science Laboratory Report 5, Cognitive Science Laboratory, Princeton University, Princeton, NJ. Atkins, B.T., Kegl, J. & Levin, B. (1988): "Anatomy of a Verb Entry: from Linguistic Theory to Lexicographic Practice", in International Journal of Lexicography, 1/2, pp.84-126. Atkins, B.T. & Levin, B. (1991): "Admitting Impediments", in U.Zernik (ed.) Lexical Acquisition Using On-Line Resources to Build a Lexicon, Hillsdale, Lawrence Erlbaum Associates, pp.233-262. Atkins, B.T. & Levin, B. (1995): "Building on a corpus: A linguistic and lexicographical look at some near-synonyms", in International Journal of Lexicography, 8/2, pp.85-114. Atkins, B.T., Levin, B. & Zampolli, A. 
(1994): "Computational Approaches to the Lexicon: An overview", in Atkins & Zampolli (eds) Computational Approaches to the Lexicon, Oxford University Press, pp. 17-45. Atkins, B.T. & Zampolli, A. (eds) (1994): Computational Approaches to the Lexicon, Oxford University Press. Atkins, B.T., Levin, B. & Song, G. (1996): "Making Sense of Corpus Data: A Case Study", in EURALEX'96 Proceedings, Göteborg University, pp.345-354. Baker, M., Francis, G. & Tognini-Bonelli, E. (eds) (1993): Text and Technology: In Honour of John Sinclair. Amsterdam and Philadelphia: John Benjamins. Bally, C. (1909): Traité de stylistique française, 2 volumes. Barnbrook, G. & Sinclair, J. (1995): "Parsing Cobuild Entries", in Sinclair, Hoelter & Peters (eds) The Languages of Definition: The Formalization of Dictionary Definitions for Natural Language Processing, Studies in Machine Translation and Natural Language Processing, Vol.7, European Commission, Luxembourg, pp. 13-58. Baten, L., Greenman, C. & Vekemans, L. (1993): "Semantic networks in language learning and CALL", in ABLA papers, 15, pp. 1-10. Bates, M., Bobrow, R. & Weischedel, R. (1993): "Critical challenges for natural language processing", in Bates & Weischedel (eds) Challenges in Natural Language Processing, Cambridge University Press, pp.3-34. Bates, M. & Weischedel, R. (eds) (1993a): Challenges in Natural Language Processing, Cambridge University Press. Bates, M. & Weischedel, R. (1993b): "The future of computational linguistics", in Bates & Weischedel (eds) Challenges in Natural Language Processing, Cambridge University Press, pp.283-288. Baugh, S., Harley, A. & Jellis, S. (1996): "The Role of Corpora in Compiling the Cambridge Dictionary of English", in International Journal of Corpus Linguistics, 1/1, pp.39-59. Bauer, D., Segond, F. & Zaenen, A. (1994): Enriching an SGML-Tagged Bilingual Dictionary for Machine-Aided Comprehension, Rank Xerox Research Center Technical Report, MLTT, 11.

309 Bazell, C., Catford, J., Halliday, M. & Hobins, R. (eds) (1966): In memory o/J.R. Firth, London, Longmans. Beckwith, R. & Miller, G. (1990): "Implementing a Lexical Network", in International Journal of Lexicography, 3/4, pp.302-312. Benson, M. (1985): "Collocations and Idioms", in Ilson (ed.) Dictionaries, Lexicography and Language Learning, ELT Documents: 120, Pergamon Press, Oxford, pp.61-68. Benson, M. (1989): "The structure of the collocational dictionary", in International Journal of Lexicography, 2/1, pp. 1-14. Benson, M. (1990): "Collocations and General-purpose Dictionaries", in International Journal of Lexicography, 3/1, pp.23-34. Benson, M. & Benson, E. ( 1993): Russian English Dictionary of Verbal Collocations, Amsterdam and Philadelphia, John Benjamins. Benson, M., Benson, E. & Ilson, R. (1986a): The BBI Combinatory Dictionary of English, Amsterdam and Philadelphia, John Benjamins. Benson, M., Benson, E. & Ilson, R. (1986b): The Lexicographic Description of English, Amsterdam and Philadelphia, John Benjamins. Benson, M., Benson, E., Ilson, R. & Young, R. (1991): Using the BBI: A workbook with exercises for the BBI Combinatory Dictionary of English, Amsterdam and Philadelphia, John Benjamins. Besse, Β. de (ed.) (1992): Phraséologie et Terminologie en Traduction et en Interprétation: Numéro spécial de Terminologie et Traduction, n°2-3, C.E.C., Luxembourg (Proceedings of the "Colloque Anniversaire de FETI", Genève, October 1991). Binon, J. & Verlinde, S. (1992): "Le Dictionnaire d'Apprentissage du Français des Affaires", in Tommola, Varantola, Salmi-Tolonen & Schopp (eds) EURALEX'92 Proceedings /-//, Fifth EURALEXInternational Congress, Studia Translatologica, Ser.A, Vol.1, University of Tampere, pp.43-50. Binot, J-L. & Jensen, K. (1993): "A Semantic Expert Using an On-Line Standard Dictionary", in Jensen, Heidorn & Richardson (eds) Natural Language Processing: The PLNLP Approach, Kluwer Academic Publishers, pp. 135-147. Blampain, D. 
(1993): "Notions et phraséologie. Une nouvelle alliance?" in Terminologies Nouvelles, 10, déc. 1993, pp.43-49. Blampain, D., Petrussa, P. & Van Campenhoudt, M. (1992): "A la recherche d'écosystèmes terminologiques", in L'environnement traductionnel - La station de travail du traducteur de l'an 2001: Journées scientifiques du Réseau thématique de recherche "Lexicologie, Terminologie et Traduction", Actes du colloque (Möns, 25-27 avril 1991), Presses de l'Université du Québec et AUPELF-UREF, pp.273-282. Blum, S. & Levenston, E.A. (1978): "Universals of Lexical Simplification", in Language Learning, 28/2, pp.399-415. Bobrow, D. & Winograd, T. (1977): "An Overview of K.RL, a knowledge representation language", in Cognitive Science, 1/1, pp.275-304. Boguraev, B. (1991): "Building a Lexicon: The Contribution of Computers", in International Journal of Lexicography, 4/3, pp.227-260. Boguraev, B. (1994): "Machine-Readable Dictionaries and Computational Linguistics Research", in Zampolli, Calzolari & Palmer (eds) Current Issues in Computational Linguistics: in Honour of Don Walker, Series Linguistica Computazionale, IX-X, pp.119-154. Boguraev, B. & Briscoe, T. (1989): Computational Lexicography for Natural Language Processing, London and New York, Longman. Boguraev, B„ Briscoe, T., Carroll, J. & Copestake, A. (1992): "Database Models for Computational Lexicography", in EURALEX'90 Proceedings, Barcelona, Biblograf, pp.59-78. Bolinger, D. (1980): Language: The Loaded Weapon, London, Longman Group Ltd. Bouillon, P. & Clas, A. (1993): La Traductique, Collection "Universités Francophones", AUPELF/UREF et Presses de l'Université de Montréal.

310 Bouillon, P. & Viegas, E. (1994): "A Semi-Polymorphic Approach to the Interpretation of Adjectival Constructions: a Cross-Linguistic Perspective", in EURALEX'94 Proceedings, Vrije Universiteit Amsterdam, pp.36-44. Breidt, L. & Segond, F. (1996): "Compréhension automatique des expressions à mots multiples en français et en allemand", in Clas, A. & Thoiron, P. (eds) Lexicomatique et dictionnairique, Actes des Quatrièmes Journées scientifiques du réseau thématique de recherche "Lexicologie, terminologie, traduction", collection "Actualité scientifique", Beyrouth, AUPELF-UREF et FMA. Breidt, L., Segond, F. & Valetto, G. (1996): "Multiword lexemes and their automatic recognition in texts", COMPLEX'96 Proceedings, Budapest, pp. 19-29. Briscoe, T., Copestake, A. & Boguraev, B. (1990): "Enjoy the paper: lexical semantics via lexicology", in COLINO '90, Vol.2, Helsinki, pp.42-47. Briscoe, T., De Paiva, V. & Copestake, A. (eds) (1993): Inheritance, Defaults and the Lexicon, Cambridge University Press. Byrd, R. (1989): "Discovering Relationships among Word Senses", in Dictionaries in the Electronic Age - Proceedings of the Fifth Annual Conference of the UW Centre for the New Oxford English Dictionary, Oxford, pp.67-79. Byrd, R., Calzolari, N„ Chodorow, M„ Klavans, J., Neff, M. & Rizk, O. (1987): "Tools and Methods for Computational Lexicology", in Computational Linguistics, 13/3-4, pp.219-240. Calzolari, Ν. (1984): "Detecting patterns in a lexical database", in Proceedings of the 10th International Conference on Computational Linguistics, COLING '84, Stanford, California, pp. 170173. Calzolari, Ν. (1988): "The dictionary and the thesaurus can be combined", in Evens (ed.) Relational Models of the Lexicon, Cambridge University Press, pp.75-96. Calzolari, Ν. (1990): "Structure and access in an automated lexicon and related issues", in Linguistica Computazionale, 6/1, pp.139-163. Calzolari, Ν. 
(1991a): "Lexical Databases and Textual Corpora: Perspectives of Integration for a Lexical Knowledge Base", in Zernik (ed.) Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, Hillsdale, Lawrence Erlbaum Associates, pp.191-208. Calzolari, Ν. (1991b): "Acquiring and representing semantic information in a Lexical Knowledge Base", Pustejovsky & Bergler (eds) Lexical Semantics and Knowledge Representation, University of California, Berkeley, pp.188-197. Calzolari, Ν. (1994): "Issues for Lexicon Building", in Zampolli, Calzolari & Palmer (eds) Current Issues in Computational Linguistics: in Honour of Don Walker, Series Linguistica Computazionale, IX-X, pp.267-281. Calzolari, Ν. & Bindi, R. (1990): "Acquisition of Lexical Information from a Large Textual Italian Corpus", in Proceedings of COLING'90, Helsinki. Calzolari, Ν., Federici, S. Montemagni, S. & Peters, C. (1995): "Extracting, Representing and Using Syntactic-Semantic Information from Cobuild Definitions", in Sinclair, Hoelter & Peters (eds) The Languages of Definition: The Formalization of Dictionary Definitions for Natural Language Processing, Studies In Machine Translation and Natural Language Processing, Vol.7, European Commission, Luxembourg, pp.59-148. Calzolari, Ν., Peters, C. & Roventini, A. (1990): Computational Model of the Dictionary Entry Preliminary Report, ACQUILEX Deliverable, Esprit Project, BRA 3030, Università di Pisa, CNR. Calzolari, Ν. & Picchi, E. (1988): "Acquisition of semantic information from an on-line dictionary", in COLING'88, Budapest, pp.87-92. Calzolari, Ν. & Zampolli, A. (1991): "Lexical Databases and Textual Corpora: a Trend of Convergence between Computational Linguistics and Linguistic Computing", in Hockey & Ide (eds) Research in Humanities Computing, Oxford University Press, Oxford, pp.273-307. Carter, R. (1987): Vocabulary: Applied Linguistics Perspectives, Unwin Hyman, London. Carter, R. & McCarthy, M. (eds) (1988): Vocabulary and Language Teaching. 
Longman, Harlow.

311 Chen, Ζ. (1990): "The vocables TEACH and TEACHING: Two families of lexical entries for an Explanatory Combinatorial Dictionary of English", in Steele (ed.) Meaning-Text Theory, Linguistics, Lexicography and Implications, University of Ottawa Press, pp. 159-174. Chodorow, M., Byrd, R. & Heidorn, G. (1985): "Extracting Semantic Hierarchies from a Large OnLine Dictionary", in Proceedings of the Association for Computational Linguistics, pp.299-304. Chomsky, N. (1965): Aspects of the theory of syntax, Cambridge, The MIT Press. Choueka, Y. (1988): "Looking for Needles in a Haystack or Locating Interesting Collocational Expressions in Large Textual Databases", in Proceedings of the RIAO'88 Conference on UserOriented Content-Based Text and Image Handling, Cambridge, MA, pp. 1-15. Church, K„ Gale, W., Hanks, P. & Hindle, D. (1991): "Using Statistics in Lexical Analysis", in Zernik (ed.) Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, Hillsdale, Lawrence Erlbaum Associates, pp.115-164. Church, K„ Gale, W„ Hanks, P., Hindle, D. & Moon, R. (1994): "Lexical Substitutability", in Atkins & Zampolli (eds) Computational Approaches to the Lexicon, Oxford University Press, pp. 153-177. Church, K. & Hanks, P. (1990): "Word Association Norms, Mutual Information and Lexicography", in Computational Linguistics, 16/3, pp.22-29. Clas, A. (ed.) (1992): Le mot, les mots, les bons mots - Word, words, witty words: Hommage à Igor Mel'cuk à l'occasion de son soixantième anniversaire, Les Presses de l'Université de Montréal. Clas, A. & Thoiron, P. (eds) (1996): Lexicomatique et dictionnairique, Actes des Quatrièmes Journées scientifiques du réseau thématique de recherche "Lexicologie, terminologie, traduction", collection "Actualité scientifique", Beyrouth, AUPELF-UREF et FMA. Clark, H. & Clark, E. (1978): "Universels, Relativity, and Language Processing", in Greenberg (ed.) Universals of Human Language, vol. 
1: Method and Theory, Stanford, Stanford University Press, pp.225-277. Clear, J. (1993): "From Firth Principles: Computational Tools for the Study of Collocation", in Baker, Francis & Tognini-Bonelli (eds) Text and Technology - In Honour of John Sinclair, Amsterdam and Philadelphia, Benjamins, pp.271-292. Clear, J. (1994): " I Can't See the Sense in a Large Corpus", in COMPLEX'94 - Proceedings of the 3rd International Conference on Computational Lexicography, Linguistics Institute, Budapest, pp.33-48. Clear, J., Fox, G„ Francis, G., Krishnamurthy, R. & Moon, R. (1996): "COBUILD: The State of the Art", in International Journal of Corpus Linguistics, 1/2, pp.303-314. Cohen, B. (1986): Lexique de cooccurrents; Bourse-Conjoncture économique, Montréal, Linguatech. Cohen, B.(1993): "Méthodes de repérage et de classement des cooccurrents lexicaux", in Terminologie et Traduction, n°2/3-1992, Commission des Communautés européennes, Luxembourg, pp.505-511. Comrie, B. (1976): Aspect, Cambridge University Press. Cop, M. (1988): "The function of collocations in dictionaries", in Magay & Zigany (eds) BudaLEX'88 Proceedings: Papers from the EURALEX Third International Congress, Budapest, Akademiai Kiado, pp.35-46. Copestake, A. & Briscoe, T. (1991): "Lexical Operations in a Unification-based Framework", in Pustejovsky & Bergler (eds) Lexical Semantics and Knowledge Representation, Berkeley, University of California, pp.88-101. Corréard, M-H. & Grundy, V. (eds) (1994): The Oxford-Hachette French Dictionary (French-English, English-French), Hachette and Oxford University Press. Cowie, A.P. (1981): "The treatment of collocations and idioms in learners'dictionaries", in Applied Linguistics, 2/3, pp.223-235. Cowie, A.P. (1984): "EFL Dictionaries: past achievements and present needs", in R.R.K. Hartmann (ed.) LEXETER'83 Proceedings, Tübingen, Niemeyer. Cowie, A.P. (1986): "Collocational Dictionaries - A comparative view", in Murphy (ed.) 
Fourth Joint Anglo-Soviet Seminar, London: British Council, pp.61-69.

312 Cowie, A.P. (1988): "Stable and creative aspects of vocabulary use", in Carter & McCarthy (eds) Vocabulary and Language Teaching, Longman, Harlow, pp. 126-139. Cowie, A.P. (ed.) (1989): Oxford Advanced Learner's Dictionary of Current English, 4th edition, Oxford University Press. Cowie, A.P. (1990): "Language as words: lexicography", in Collinge (ed.) An Encyclopedia of Language, London, Routledge, pp.671-700. Cowie, A.P. (1991): "Multiword Units in Newspaper Language", in Granger (ed.) Perspectives on the English Lexicon: A Tribute to Jacques Van Roey, Cahiers de l'Institut de Linguistique de Louvain, CILL 17.1-3, pp.101-116. Cowie, A.P. (1992): "Multiword Lexical Units and Communicative Language Teaching", in Arnaud & Béjoint (eds) Vocabulary and Applied Linguistics, London, Macmillan, pp. 1-12. Cowie, A.P. (1994): "Phraseology", in Asher & Simpson (ed.) The Encyclopedia of Language and Linguistics, Vol.6, Oxford and New York, Pergamon, pp.3168-3171. Cowie, A.P. (1997): "Phraseological dictionaries: some East-West comparisons", in Cowie, A.P. (ed.) Phraseology: Theory, Analysis and Applications, Oxford University Press, Oxford (forthcoming). Cowie, A.P. & Howarth, P. (1995): "Phraseological Competence and Written Proficiency", paper presented at the BAAL (British Association of Applied Linguistics) annual meeting. Cowie, A.P. & Mackin, R. (1975): Oxford Dictionary of Current Idiomatic English, Vol.1 (Second edition, retitled Oxford Dictionary of Phrasal Verbs, 1993), Oxford, Oxford University Press. Cowie, A.P., Mackin, R. & McCaig, I.R. (1983): Oxford Dictionary of Current Idiomatic English, Vol.2 (retitled Oxford Dictionary of English Idioms, 1993), Oxford, Oxford University Press. Crowther, J. (ed.) (1995): Oxford Advanced Learner's Dictionary of Current English, 5th edition, Oxford University Press. Cruse, D. (1986): Lexical Semantics, Cambridge University Press. Danlos, L. & Samvelian, P. 
(1992): "Translation of the Predicative Element of a Sentence: category switching, aspect and diathesis", in TMI-92: Proceedings of the Fourth International Conference on Theoretical and Methodological Issues in Machine Translation, Montreal, pp.21-34. Descamps, J-L. (1994): "Tournoi pour l'accomodement des dictionnaires de collocations", in META, 39/4, pp.561-575. Dister, A. & Bourseau, M. (1997): Conversion du dictionnaire bilingue français-anglais Robert & Collins en base de données, DEFI Technical Report, CIPL, Liège (available from the following URL: http://engdepl.philo.ulg.ac.be/michiels/defi.htm). Dixon, R.M.W. (1991): A New Approach to English Grammar on Semantic Principles, Oxford University Press. Dostie, G., Mel'cuk, I. & Polguère, A. (1992): "Méthodologie d'élaboration des entrées lexicales du Dictionnaire Explicatif et Combinatoire (REPROCHER, REPROCHE et IRREPROCHABLE)", in International Journal of Lexicography, 5/3, pp. 165-198. Dufour, N. (1997): Turning the Oxford-Hachette SGML tape into a DEFI dictionary, DEFI Technical Report, CIPL, Liège (available from: http://engdepl.philo.ulg.ac.be/michiels/defi.htm). Duval, A. (1986): "La métalangue dans les dictionnaires bilingues", in Lexicographica, 2, pp.93-100. Duval, A. (1990): "Nature et valeur de la traduction dans les dictionnaires bilingues", in Cahiers de Lexicologie, 56-57, pp.27-33. Duval, A. (1991): "L'équivalence dans le dictionnaire bilingue", in Hausmann, Reichmann, Wiegand & Zgusta (eds) Wörterbücher, Dictionaries, Dictionnaires: An International Encyclopedia of Lexicography, Berlin and New York, Walter de Gruyter, pp.2817-2824. Engwall, G. (1994): "Not Chance but Choice: Criteria in Corpus Creation", in Atkins & Zampolli (eds) Computational Approaches to the Lexicon, Oxford University Press, pp.49-82. Escalier, M-C. & Fournier, C. 
(1992): "Towards a Notional Representation of Meaning in the Meaning-Text Model", in Haenelt & Wanner (eds) Proceedings of the International Workshop on the Meaning-Text Theory, Darmstadt, Arbeitspapiere der GMD, 671, pp. 143-162. E T C Text Retrieval Software (a.k.a. WordCruncher), Brigham Young University, Provo, Utah, U S A .

313 Evens, M. (ed.) (1988): Relational Models of the Lexicon, Cambridge University Press. Evens, M., Litowitz, Β., Markowitz, J., Smith, R. & Werner, O. (1980): Lexical-semantic relations: a comparative survey, Edmonton, Linguistic Research, Inc. Fellbaum, C. (1990): "English Verbs as a Semantic Net", in International Journal of Lexicography, 3/4, pp.278-301. Fernando, C. & Flavell, R. (1981): On Idiom: Critical Views and Perspectives, Exeter Linguistic Studies, Vol.5, University of Exeter. Fernando, C. (1996): Idioms and Idiomaticity, Oxford University Press, Oxford. Fillmore, C. (1968): "The Case for Case", in Bach & Harms (eds) Universals in Linguistic Theory, Holt, Rinehart and Winston, New York, pp. 1-88. Fillmore, C. (1982a): "Frame Semantics", in The Linguistic Society of Korea (ed.) Linguistics in the Morning Calm, Seoul, Hanshin, pp. 111-137. Fillmore, C. (1982b): "Monitoring the Reading Process", in The Linguistic Society of Korea (ed.) Linguistics in the Morning Calm, Seoul, Hanshin, pp.329-348. Fillmore, C. (1991): Seminar on Semantic Interpretation and Construction Grammar, Summer School in Computational Linguistics: Formal and Computational Models of Meaning, Charles University, Prague. Fillmore, C. (1995): "A Frame Semantics description of terms of health and sickness", paper read at the DELIS Workshop on Approaches to corpus-based dictionary building, University of Stuttgart, 26-27 July 1995. Fillmore, C. & Atkins, B.T. (1994): "Starting where the Dictionaries Stop: the Challenge of Corpus Lexicography", in Atkins & Zampolli (eds) Computational Approaches to the Lexicon, Oxford University Press, pp.349-393. Fillmore, C., Atkins, B.T. & Ostler, N. (1994): "Perception and Speech Act Vocabulary - Lexical Semantics as Frame Semantics", in Ostler (ed.) A Corpus-Based Syntactic and Lexical-Semantic Description of Lexical Items: Structured Collection of Data and Exemplary Description of Prototypical Items. 
Towards a Methodology for Corpus-Based Lexical Description Leading to Reusable Lexical Specifications. Deliverable D-II of the DELIS (LRE 61.034) Project, London, Stuttgart, Luxembourg, pp. 10-18. Fink, S. (1977): Aspects of a Pedagogical Grammar Based on Case Grammar and Valence Theory, Tiibingen, Niemeyer. Firth, J.R. (1968): Selected Papers of J. R. Firth 1952-59, edited by F.R. Palmer, London and Harlow, Longman's Linguistics Library. Flaubert, G., Dictionnaire des idées reçues, Edition diplomatique des trois manuscrits de Rouen/ G. Flaubert, Caminiti (ed.), Napoli, Liguori Editore; Paris, AG. Nizet, 1966, 345p. Fodor, J. (1970): "Three reasons for not deriving Kill from Cause to Die", in Linguistic Inquiry, 1, pp.429-438. Fontenelle, T. (1991): "Grammatical Codes and Definition Patterns: A Closer Look at a Computerized Dictionary", in Kiefer (ed.) Computational Lexicography, [Balatonfiired, Hungary 8-11 September 1990], Research Institute for Linguistics, Hungarian Academy of Sciences, pp.73-79. Fontenelle, T. (1992a): "Collocation Acquisition from a Corpus or from a Dictionary: a Comparison", in Tommola, Varantola, Salmi-Tolonen & Schopp (eds) EURALEX'92 Proceedings /-//, Fifth EURALEX International Congress, Studia Translatologica, Ser.A, Vol.1, University of Tampere, pp.220-228. Fontenelle, T. (1992b): "Co-occurrence Knowledge, Support Verbs and Machine-Readable Dictionaries", in Kiefer, Kiss & Pajzs (eds) Papers in Computational Lexicography COMPLEX'92, Linguistics Institute, Hungarian Academy of Sciences, Budapest, pp.137-145. Fontenelle, T. (1992c): "Using a bilingual computerized dictionary to retrieve support verbs and combinatorial information", in Acta Linguistica Hungarica, Budapest, 41/1-4, pp. 109-121. Fontenelle, T. (1992d): "Automatic Extraction of Lexical-Semantic Relations from Dictionary Definitions", in EURALEX'90 Proceedings, Barcelona, Biblograf, pp.89-103.

314 Fontenelle, T. (1993): "The Relationship between Sound and Meaning in a Lexical Database", in ITL: Review of Applied Linguistics, Leuven, 101-102, pp.41-52. Fontenelle, T. (1994a): "Towards the construction of a collocational database for translation students", in META, Presses de l'Université de Montréal, 39/1, pp.47-56. Fontenelle, T. (1994b): "Using lexical functions to discover metaphors", in EURALEX'94 Proceedings, Vrije Universiteit Amsterdam, pp.271-278. Fontenelle, T. (1996a): "Using a lexical-semantic database to enhance corpus-based collocation extraction", paper read at the 2nd International Symposium on Phraseology, Moscow Association of Applied Linguistics, Moscow State University. Fontenelle, T. (1996b): "Ergativity, collocations and lexical functions", in EURALEX'96 Proceedings, Göteborg University, pp.209-222. Fontenelle, T. (1996c): "Structural Regularities vs. Definition Patterns: A Riposte to Boguraev", in ITL: Review of Applied Linguistics, Leuven, 113-114, pp.321-334. Fontenelle, T. (1997): "Discovering significant lexical functions in dictionary entries", in Cowie, A.P. (ed.) Phraseology: Theory, Analysis and Applications, Oxford University Press, Oxford (forthcoming). Fontenelle, T., Adriaens, G. & De Braekeleer, G. (1994a): "The Lexical Unit in the METAL MT System", in Machine Translation, Kluwer Academic Publishers, 9, pp. 1-19. Fontenelle, T., Briils, W., Jansen, J., Thomas, L., Vanallemeersch, T. (1994b): "Survey of tools for the extraction of collocations from dictionaries and corpora", ms, DECIDE Deliverable D-la, Liège/Luxembourg, 81 p. Fontenelle, T., Gérardy, C., Alexandre, L., Thomas, L., Vanallemeersch, T. & Jansen, J. (1996): "Building a bilingual database of support verbs", in Thelen (ed.) Translation and Meaning Part 3 Proceedings of the Maastricht-Lodz International Duo Colloquium on Translation and Meaning, Maastricht, pp. 187-203. Fontenelle, T. & Vanandroye, J. 
(1989): "Retrieving ergative verbs from a lexical database", in Dictionaries: Journal of the Dictionary Society of North America, Number 11, pp.11-39. Francis, G. & Sinclair, J. (1994): " Ί Bet He Drinks Carling Black Label': A Riposte to Owen on Corpus Grammar", in Applied Linguistics, 15/2, pp. 190-200. Fraser, Β. (1970): "Idioms within a transformational grammar", in Foundations of Language, 6, pp.2242. Frawley, W. (1988a): "New Forms of Specialized Dictionaries", in International Journal of Lexicography, 1/3, pp.189-213. Frawley, W. (1988b): "Relational Models and Metascience", in Evens (ed.) Relational Models of the Lexicon, Cambridge University Press, pp.335-372. Gentilhomme, Y. (1992): "Panorama sur le Dictionnaire Explicatif et Combinatoire: Retombées Pédagogiques", in Mel'cuk, Arbatchewsky-Jumarie, Iordanskaja & Mantha (eds) Dictionnaire explicatif et combinatoire du français contemporain, Vol.III, Montréal, Les Presses de l'Université de Montréal, pp.95-120. Geradon, C. (1994): How to find equivalents of English ergative verbs in German, mémoire de licence, University of Liège, mimeographed. Gerardy, C., Vanallemeersch, T. & Barcena, E. (1995): "Lexicon for standardized medical diagnosis sublanguage in English", ANTHEM Deliverable D8-1, University of Liège. Greenbaum, S. (1970): Verb-Intensifier Collocations in English, The Hague and Paris, Mouton. Grefenstette, G. (1994a): "Corpus-Derived First, Second and Third-Order Word Affinities", in EURALEX'94 Proceedings, Vrije Universiteit Amsterdam, pp.279-290. Grefenstette, G. (1994b): Explorations in Automatic Thesaurus Discovery, Kluwer Academic Press, Boston. Grenfenstette, G. & Teufel, S. (1995): "Corpus-based method for automatic identification of support verbs for nominalizations", in EACL'95 Proceedings, Dublin, Association for Computational Linguistics.

Grefenstette, G., Heid, U., Schulze, B.M., Fontenelle, T., Gerardy, C. (1996): "The DECIDE Project: Multilingual Collocation Extraction", in EURALEX'96 Proceedings, Göteborg University, pp.93-107.
Grimes, J. (1990): "Inverse Lexical Functions", in Steele, J. (ed.) Meaning-Text Theory: Linguistics, Lexicography and Implications, University of Ottawa Press, pp.350-364.
Gross, M. (1981): "Les bases empiriques de la notion de prédicat sémantique", Langages 63, pp.7-52.
Gross, M. (1994): "Constructing Lexicon Grammars", in Atkins & Zampolli (eds) Computational Approaches to the Lexicon, Oxford University Press, pp.213-263.
Gross, D. & Miller, K. (1990): "Adjectives in WordNet", in International Journal of Lexicography, 3/4, pp.265-277.
Guthrie, J., Guthrie, L., Wilks, Y. & Aidinejad, H. (1991): "Subject-Dependent Co-occurrence and Word Sense Disambiguation", in Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics. Berkeley CA, June 1991, pp.146-152.
Haenelt, K. & Wanner, L. (1992): Proceedings of the International Workshop on the Meaning-Text Theory, Darmstadt, Arbeitspapiere der GMD, 671.
Halliday, M. (1966): "Lexis as a linguistic level", in Bazell, Catford, Halliday & Robins (eds) In Memory of J.R. Firth, Longmans, London, pp.148-162.
Halliday, M. (1967-1968): "Notes on Transitivity and Theme in English", in Journal of Linguistics, 3 (1967), pp.37-81; pp.199-244; 4 (1968), pp.179-215.
Halliday, M. & Hasan, R. (1976): Cohesion in English, London, Longman.
Hanks, P. (1987): "Definitions and Explanations", in Sinclair (ed.) Looking Up, London and Glasgow, Collins ELT, pp.116-136.
Hausmann, F.J. (1979): "Un dictionnaire des collocations est-il possible?", in Travaux de Linguistique et de Littérature, 17/1, pp.187-195.
Hausmann, F.J. (1985): "Kollokationen im Deutschen Wörterbuch. Ein Beitrag zur Theorie des Lexikographischen Beispiels", in Bergenholtz & Mugdan (eds) Lexikographie und Grammatik, Niemeyer, Tübingen, pp.118-129.
Hausmann, F.J. (1989): "Le dictionnaire des collocations", in Hausmann, Reichmann, Wiegand & Zgusta (eds) Wörterbücher, Dictionaries, Dictionnaires: An International Encyclopedia of Lexicography, Berlin and New York, Walter de Gruyter, pp.1010-1019.
Heid, U. (1992a): "Décrire les collocations: deux approches lexicographiques et leur application dans un outil informatisé", in de Bessé (ed.) Terminologie et Traduction, n°2-3, C.E.C., Luxembourg (Proceedings of the "Colloque Anniversaire de l'ETI", Genève, October 1991), pp.523-548.
Heid, U. (1992b): "Notes on the use of lexical functions for the description of collocations in an NLP lexicon", in Haenelt & Wanner (eds) Proceedings of the International Workshop on the Meaning-Text Theory, Darmstadt, Arbeitspapiere der GMD, 671, pp.217-229.
Heid, U. (1994a): "Access to collocational information in dictionaries", Input paper for the DECIDE deliverable D2a, MS, Stuttgart.
Heid, U. (1994b): "On Ways Words Work Together - Topics in Lexical Combinatorics", in EURALEX'94 Proceedings, Vrije Universiteit Amsterdam, pp.226-257.
Heid, U. (1996): "Using Lexical Functions for the Extraction of Collocations from Dictionaries and Corpora", in Wanner (ed.) Lexical Functions in Lexicography and Natural Language Processing, Amsterdam and Philadelphia, Benjamins, pp.115-146.
Heid, U. & Freibott, G. (1991): "Collocations dans une base de données terminologique et lexicale", in META: Special Issue on Terminology, 36/1, pp.77-91.
Heid, U. & McNaught, J. (eds) (1991): "Eurotra-7 Study: Feasibility and Project Definition Study on the Reusability of Lexical and Terminological Resources in Computerized Applications", EUROTRA-7 Final Report, Stuttgart.
Heid, U., Martin, W. & Posch, I. (1991): Feasibility of Standards for the collocational description of lexical items, Stuttgart & Amsterdam, EUROTRA-7 Study, Document DOC-9/4.

Heid, U. & Raab, S. (1989): "Collocations in Multilingual Generation", in Proceedings of EACL '89, Manchester, Association for Computational Linguistics, pp.130-136.
Herbst, T. (1987): "A Proposal for a Valency Dictionary of English", in Ilson (ed.) A Spectrum of Lexicography, Amsterdam and Philadelphia, John Benjamins, pp.29-47.
Heylen, D. (1993): Collocations - Final Report of the ET-10/75 Study on the Lexicalisation of Semantic Operations, Utrecht University, Stichting Taaltechnologie.
Heylen, D., Maxwell, K. & Armstrong-Warwick, S. (1993): "Collocations, Dictionaries and MT", in AAAI Proceedings: Building Lexicons for Machine Translation, 1993 Spring Symposium Series, Stanford University, pp.69-80.
Heylen, D. & Maxwell, K. (1994): "Lexical Functions and the Translation of Collocations", in EURALEX'94 Proceedings, Vrije Universiteit Amsterdam, pp.298-305.
Hornby, A.S. (1954): Guide to Patterns and Usage in English, Oxford University Press.
Howarth, P. (1993): "A Phraseological Approach to Academic Writing", in Blue (ed.) Language, Learning and Success: Studying through English, London, Macmillan.
Howarth, P. (1997): Phraseology in English Academic Writing, Lexicographica Series Maior, 75, Max Niemeyer Verlag, Tübingen.
Ide, N., Le Maitre, J. & Veronis, J. (1994): "Outline of a Model for Lexical Databases", in Zampolli, Calzolari & Palmer (eds) Current Issues in Computational Linguistics: in Honour of Don Walker, Series Linguistica Computazionale, IX-X, pp.283-320.
Ide, N., Veronis, J., Warwick-Armstrong, S. & Calzolari, N. (1992): "Principles for encoding machine-readable dictionaries", in Tommola, Varantola, Salmi-Tolonen & Schopp (eds) EURALEX'92 Proceedings I-II, Fifth EURALEX International Congress, Studia Translatologica, Ser.A, Vol.1, University of Tampere, pp.239-246.
Ilgenfritz, P., Stephan-Gabinel, N. & Schneider, G. (1989): Langenscheidts Kontextwörterbuch Französisch-Deutsch, Langenscheidt, Berlin-München.
Ilson, R. & Mel'čuk, I. (1989): "English BAKE Revisited (BAKE-ing an ECD)", in International Journal of Lexicography, 2/4, pp.325-345.
Iordanskaya, L., Kim, M. & Polguère, A. (1996): "Some Procedural Problems in the Implementation of Lexical Functions for Text Generation", in Wanner (ed.) Lexical Functions in Lexicography and Natural Language Processing, Amsterdam and Philadelphia, Benjamins, pp.279-297.
Iris, M.A., Litowitz, B. & Evens, M. (1988a): "Moving towards literacy by making definitions", in International Journal of Lexicography, 1/3, pp.238-252.
Iris, M.A., Litowitz, B. & Evens, M. (1988b): "Problems of the part-whole relation", in Evens (ed.) Relational Models of the Lexicon, Cambridge University Press, pp.261-288.
Jackson, H. (1988): Words and their Meaning, London and New York, Longman.
Jansen, J. (1989): "Apport contrastif des dictionnaires généraux de la langue au problème de l'indexation automatique dans le discours techno-scientifique", in META, 34/3, pp.412-427.
Jansen, J. & Fontenelle, T. (1994): Short description of an implementation of the Robert & Collins English-French dictionary under database format, in Deliverable D-2a of the DECIDE Project, Liège.
Justeson, J. & Katz, S. (1991a): "Co-occurrence of Antonymous Adjectives and their Context", in Computational Linguistics, 17/1, pp.1-19.
Justeson, J. & Katz, S. (1991b): "Redefining Antonymy: The Textual Structure of a Semantic Relation", in Using Corpora - Proceedings of the 7th Annual Conference of the UW Centre for the New OED and Text Research, Oxford, pp.138-153.
Katz, J. & Fodor, J. (1963): "The structure of a semantic theory", in Language, 39, pp.170-210.
Katz, B. & Levin, B. (1988): "Exploiting Lexical Regularities in Designing Natural Language Systems", in Proceedings of the 12th International Conference on Computational Linguistics, Budapest, pp.316-323.
Kilgarriff, A. (1992): Polysemy, Ph.D. Thesis, Cognitive Science Research Paper, Number 261, University of Sussex, Brighton.

Kipfer, B. (1986): "Investigating an Onomasiological Approach to Dictionary Material", in Dictionaries: Journal of the Dictionary Society of North America, Number 8, pp.55-64.
Klavans, J. (1988): "Building a Computational Lexicon Using Machine-Readable Dictionaries", in Magay & Zigany (eds) BudaLEX'88 Proceedings; Papers from the EURALEX Third International Congress, Budapest, Akademiai Kiado, pp.265-279.
Klavans, J. (1994): "Visions of the Digital Library: Views on Using Computational Linguistics and Semantic Nets in Information Retrieval", in Zampolli, Calzolari & Palmer (eds) Current Issues in Computational Linguistics: in Honour of Don Walker, Series Linguistica Computazionale, IX-X, pp.227-236.
Klavans, J., Chodorow, M. & Wacholder, N. (1990): "From dictionary to knowledge base via taxonomy", in Electronic Text Research: Proceedings of the 6th Annual Conference of the University of Waterloo Centre for the New OED, Waterloo, Canada, pp.110-132.
Klavans, J., Chodorow, M. & Wacholder, N. (1993): "Building a Knowledge Base from Parsed Definitions", in Jensen, Heidorn & Richardson (eds) Natural Language Processing: The PLNLP Approach, Kluwer Academic Publishers, pp.119-133.
Klavans, J. & Tzoukerman, E. (1990): "The BICORD System: Combining lexical information from bilingual corpora and machine-readable dictionaries", in Proceedings of the 13th International Conference on Computational Linguistics - Coling'90, Helsinki, pp.174-179.
Kleinmann, H. (1977): "Avoidance Behaviour in Adult Second Language Acquisition", in Language Learning, 27/1, pp.93-107.
Knowles, F. (1993): "Review of Kozlowska: English Adverbial Collocations", in International Journal of Lexicography, 6/4, pp.300-304.
Kozlowska, C.D. (1991): English Adverbial Collocations, Warszawa, PWN.
Kozlowska, C.D. & Dzierzanowska, H. (1982): Selected English Collocations (2nd edition, 1993), Warszawa, PWN.
Lakoff, G. (1993): "The Syntax of Metaphorical Semantic Roles", in Pustejovsky (ed.) Semantics and the Lexicon, Dordrecht, Kluwer Academic Publishers, pp.27-36.
Lakoff, G. & Johnson, M. (1980): Metaphors We Live By, The University of Chicago Press.
Laine, C. (1993): Vocabulaire combinatoire de la CFAO mécanique, Ottawa, Secrétariat d'Etat.
Laufer, B. (1992): "Corpus-based vs. lexicographer examples in comprehension and production of new words", in Tommola, Varantola, Salmi-Tolonen & Schopp (eds) EURALEX'92 Proceedings I-II, Fifth EURALEX International Congress, Studia Translatologica, Ser.A, Vol.1, University of Tampere, pp.71-76.
Lee, W. & Evens, M. (1996): "Generating Cohesive Text Using Lexical Functions", in Wanner (ed.) Lexical Functions in Lexicography and Natural Language Processing, Amsterdam and Philadelphia, Benjamins, pp.299-306.
Leed, R. & Nakhimovsky, A. (1990): "Lexical Functions and Language Learning", in Steele (ed.) Meaning-Text Theory, Linguistics, Lexicography and Implications, University of Ottawa Press, pp.365-375.
Lehrer, A. (1974): Semantic Fields and Lexical Structure, London, Elsevier.
Levin, B. (1987): "Approaches to Lexical Semantic Representation", Lexicon Project Working Paper, MIT, 49p.
Levin, B. (1991): "Building a Lexicon: The Contribution of Linguistics", in International Journal of Lexicography, 4/3, pp.205-226.
Levin, B. (1993): English Verb Classes and Alternations - A Preliminary Investigation, Chicago and London, the University of Chicago Press.
Lewis, M. (1993): The Lexical Approach: The State of ELT and a Way Forward, Hove, Language Teaching Publications.
Liang, S.Q. (1991): "A propos du dictionnaire français-chinois des collocations françaises", in Cahiers de Lexicologie, LIX, pp.151-167.

Liang, S.Q. (ed.) (in press): Dictionnaire de collocations français-chinois, University of Zhongshan, Guangzhou.
Lipka, L. (1972): Semantic Structure and Word Formation, Munich, Fink.
Lipka, L. (1990): An Outline of English Lexicology, Tübingen, Max Niemeyer Verlag.
Long, T.H. & Summers, D. (eds) (1979): Longman Dictionary of English Idioms, London, Longman.
Lyons, J. (1968): Introduction to Theoretical Linguistics, Cambridge University Press.
Lyons, J. (1977): Semantics, Vol.I and II, Cambridge University Press.
Machonis, P. (1991): "The support verb 'make'", in Computational Lexicography, [Balatonfüred, Hungary 8-11 September 1990], Research Institute for Linguistics, Hungarian Academy of Sciences, pp.141-153.
Mackenzie, I. & Mel'čuk, I. (1988): "Crossroads of Obstetrics and Lexicography: A Case Study", in International Journal of Lexicography, 1/2, pp.71-83.
Mackin, R. (1978): "On Collocations: 'Words shall be known by the company they keep'", in Strevens (ed.) In Honour of A S Hornby. Oxford University Press, pp.149-165.
Mantha, S. & Mel'čuk, I. (1988): "Le Champ Lexical 'Phénomènes Atmosphériques' - Formulation des Définitions", in Mel'čuk, Arbatchewsky-Jumarie, Dagenais, Elnitsky, Iordanskaja, Lefebvre & Mantha (eds) Dictionnaire explicatif et combinatoire du français contemporain, Vol.II, Montréal, Presses de l'Université de Montréal, pp.41-47.
Marchand, H. (1969): The Categories and Types of Present-Day English Word-Formation, 2nd ed., C.H. Beck, München.
Martin, J. (1991): "Representing and Acquiring Metaphor-Based Polysemy", in Zernik (ed.) Lexical Acquisition: Using On-Line Resources to Build a Lexicon, Hillsdale, Lawrence Erlbaum Associates, pp.389-415.
Martin, W. (1992): "Remarks on Collocations in Sublanguage", in de Bessé (ed.) Terminologie et Traduction, n°2-3, C.E.C., Luxembourg (Proceedings of the "Colloque Anniversaire de l'ETI", Genève, October 1991), pp.157-164.
McArthur, T. (1981): Longman Lexicon of Contemporary English, Longman, London.
McArthur, T. (1986): Worlds of Reference, Cambridge University Press.
McCarthy, M. (1991): "Lexis and Lexicology", in Malmkjaer (ed.) The Linguistics Encyclopedia, London and New York, Routledge, pp.298-305.
McCarthy, M. & O'Dell, F. (1994): English Vocabulary in Use, Cambridge University Press.
Mel'čuk, I. (1978a): "A new kind of dictionary and its role as a core component of automatic text processing systems", in T.A. Informations, Number 2, pp.3-8.
Mel'čuk, I. (1978b): "Théorie de langage, théorie de traduction", in META, 23/4, pp.271-302.
Mel'čuk, I. (1988): "Semantic Description of Lexical Units in an Explanatory Combinatorial Dictionary: Basic Principles and Heuristic Criteria", in International Journal of Lexicography, 1/3, pp.165-188.
Mel'čuk, I. (1989): "Semantic Primitives from the Viewpoint of the Meaning-Text Linguistic Theory", in Quaderni di Semantica, 10/1, pp.65-102.
Mel'čuk, I. (1992a): "Paraphrase et lexique: la théorie Sens-Texte et le Dictionnaire Explicatif et Combinatoire du Français Contemporain", in Mel'čuk et al. (eds) Dictionnaire Explicatif et Combinatoire du Français Contemporain: Recherches Lexico-Sémantiques III, Les Presses de l'Université de Montréal, Montréal, pp.9-58.
Mel'čuk, I. (1992b): "La phraséologie et son rôle dans l'enseignement/apprentissage d'une langue étrangère", unpublished manuscript.
Mel'čuk, I. (1997): "Collocations and Lexical Functions", in Cowie, A.P. (ed.) Phraseology: Theory, Analysis and Applications, Oxford University Press, Oxford (forthcoming).
Mel'čuk, I., Arbatchewsky-Jumarie, N., Dagenais, L., Elnitsky, L., Iordanskaja, L., Lefebvre, M.N., Mantha, S. (1988): Dictionnaire Explicatif et Combinatoire du Français Contemporain: Recherches Lexico-Sémantiques II, Les Presses de l'Université de Montréal, Montréal.

Mel'čuk, I., Arbatchewsky-Jumarie, N., Elnitsky, L., Iordanskaja, L. & Lessard, A. (1984): Dictionnaire Explicatif et Combinatoire du Français Contemporain: Recherches Lexico-Sémantiques I, Les Presses de l'Université de Montréal, Montréal.
Mel'čuk, I., Arbatchewsky-Jumarie, N., Iordanskaja, L., Mantha, S. (1992): Dictionnaire Explicatif et Combinatoire du Français Contemporain: Recherches Lexico-Sémantiques III, Les Presses de l'Université de Montréal, Montréal.
Mel'čuk, I., Clas, A. & Polguère, A. (1995): Introduction à la lexicologie explicative et combinatoire, Collection "Universités francophones", AUPELF-UREF, Duculot, Louvain-la-Neuve.
Mel'čuk, I., Iordanskaja, L. & Arbatchewsky-Jumarie, N. (1981): "Un nouveau type de dictionnaire: le dictionnaire explicatif et combinatoire du français contemporain", in Cahiers de Lexicologie, 38/1, pp.3-34.
Mel'čuk, I. & Polguère, A. (1987): "A formal lexicon in Meaning-Text Theory", in Computational Linguistics, 13/3-4, pp.261-275.
Mel'čuk, I. & Zholkovsky, A. (1988): "The Explanatory Combinatorial Dictionary", in Evens (ed.) Relational Models of the Lexicon, Cambridge, Cambridge University Press, pp.41-74.
Mel'čuk, I. & Wanner, L. (1994): "Towards an Efficient Representation of Restricted Lexical Cooccurrence", in EURALEX'94 Proceedings, Vrije Universiteit Amsterdam, pp.325-338.
Merten, P. (1992): "Apport des relations notionnelles à la description terminologique", in TAMA 92, Termnet, Vienna, pp.203-228.
Meyer, I. & Steele, J. (1990): "The Presentation of an Entry and Super-entry in an Explanatory Combinatorial Dictionary of English", in Steele (ed.) Meaning-Text Theory, Linguistics, Lexicography and Implications, University of Ottawa Press, pp.62-94.
Michiels, A. (1977): "Idiomaticity in English", in Revue des Langues Vivantes, XLIII, 2, pp.184-199.
Michiels, A. (1982): Exploiting a Large Dictionary Database. PhD Thesis, University of Liège, mimeographed.
Michiels, A. (1983a): Aspects de la lexicologie computationnelle de l'anglais, Conférence Faculté Ouverte, University of Liège, 19 p.
Michiels, A. (1983b): "Automatic Analysis of Texts", in Proceedings of the Conference (Informatics 7) held by the Aslib Informatics Group and the Information Retrieval Group of the British Computer Society, Cambridge, pp.103-120.
Michiels, A. (1995a): "Introducing HORATIO", in Alberto & Bennett (eds) Lexical Issues in Machine Translation, Studies in Machine Translation and Natural Language Processing, Vol.8, European Commission, Luxembourg, pp.77-91.
Michiels, A. (1995b): "Feeding LDOCE entries into HORATIO", in Alberto & Bennett (eds) Lexical Issues in Machine Translation, Studies in Machine Translation and Natural Language Processing, Vol.8, European Commission, Luxembourg, pp.93-115.
Michiels, A. (1995c): Horatio. A Middle-sized NLP Application in Prolog, L \ Liège.
Michiels, A. (1996): "An experiment in translation selection and word sense discrimination using the metalinguistic apparatus of two computerized dictionaries", DEFI Technical Report, University of Liège, 24 p. (available from the following URL: http://engdepl.philo.ulg.ac.be/michiels/defi.htm).
Michiels, A. & Dufour, N. (1996): From SGML tapes to DIC clauses: Identifying Multi-Word Units for Context-Sensitive Lookup, DEFI Technical Report, Liège (available from the following URL: http://engdepl.philo.ulg.ac.be/michiels/defi.htm).
Michiels, A. & Moulin, A. (1983): "The 'Longman Lexicon of Contemporary English': A Tentative Appraisal", in Grazer Linguistische Studien, 19, pp.106-123.
Michiels, A., Mullenders, J. & Noël, J. (1980): "Exploiting a Large Data Base by Longman", in COLING '80 Proceedings, Tokyo, pp.373-382.
Michiels, A. & Noël, J. (1982): "Approaches to Thesaurus Production", in Proceedings of Coling'82, Prague. Association for Computational Linguistics, pp.227-232.
Michiels, A. & Noël, J. (1983): "Retrieving and improving collocability information in LDOCE: an interactive treatment", in Linguistica Computazionale, III, pp.207-213.

Michiels, A. & Noël, J. (1984): "The pro's and con's of a controlled defining vocabulary in a learner's dictionary", in LEXeter'83 Proceedings, Tübingen, Max Niemeyer Verlag, pp.386-394.
Miller, G. (1990): "Nouns in WordNet: A Lexical Inheritance System", in International Journal of Lexicography, 3/4, pp.245-264.
Miller, G., Beckwith, R., Fellbaum, C., Gross, D. & Miller, K. (1990): "Introduction to WordNet: An On-Line Lexical Database", in International Journal of Lexicography, 3/4, pp.235-244.
Mish, F. (1987): Webster's Ninth New Collegiate Dictionary, Merriam-Webster Inc., Springfield, MA.
Mitchell, T.F. (1966): "Some English Phrasal Types", in Bazell, Catford, Halliday & Robins (eds) In Memory of J.R. Firth, Longmans, London, pp.335-358.
Mitchell, T.F. (1971): "Linguistic 'goings-on': Collocations and other lexical matters arising on the syntagmatic record", in Archivum Linguisticum, NS, 2, pp.35-69.
Montemagni, S. (1994): "Non Alternating Argument Structures: The Causative/Inchoative Alternation in Dictionaries", in EURALEX'94 Proceedings, Vrije Universiteit Amsterdam, pp.349-359.
Montemagni, S., Federici, S. & Pirrelli, V. (1996): "Example-based Word Sense Disambiguation: a Paradigm-driven Approach", in EURALEX'96 Proceedings, Göteborg University, pp.151-159.
Moulin, A. (1983): "LSP Dictionaries for EFL Learners", in Hartmann (ed.) Lexicography: Principles and Practice, New York, Academic Press, pp.144-152.
Nakhimovsky, A. (1990): "Word Meaning and Syntactic Structure: Some Comparative Notes", in Steele (ed.) Meaning-Text Theory, Linguistics, Lexicography and Implications, University of Ottawa Press, pp.3-17.
Nakhimovsky, A. (1990): "A Lexicon-Based Algorithm for Ambiguity Resolution in Parsing", in Steele (ed.) Meaning-Text Theory, Linguistics, Lexicography and Implications, University of Ottawa Press, pp.326-349.
Nation, P. & Carter, R. (eds) (1989): Vocabulary Acquisition, AILA Review 6.
Nattinger, J. (1988): "Some current trends in vocabulary teaching", in Carter & McCarthy (eds) Vocabulary and Language Teaching, Longman, Harlow, pp.62-82.
Nattinger, J.R. & Decarrico, J.S. (1992): Lexical Phrases and Language Teaching, Oxford University Press.
Neff, M. & Cantor, L. (1988): "Computational Tools for Lexicographers", in Magay & Zigany (eds) BudaLEX'88 Proceedings; Papers from the EURALEX Third International Congress, Budapest, Akademiai Kiado, pp.297-311.
Neff, M. & Boguraev, B. (1989): "Dictionaries, Dictionary Grammars and Dictionary Entry Parsing", in Proceedings of the 27th Annual Meeting of the ACL, Vancouver, BC, pp.91-101.
Neff, M. & McCord, M. (1990): "Acquiring lexical data from machine-readable dictionary resources for machine translation", in Proceedings of the 3rd International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Language, University of Texas at Austin, pp.85-90.
Nida, E. (1975): Componential Analysis of Meaning, The Hague, Mouton.
Nirenburg, S. (1989): "Lexicons for Computer Programs and Lexicons for People", in Dictionaries in the Electronic Age: Proceedings of the Fifth Annual Conference of the University of Waterloo Centre for the New Oxford English Dictionary, Oxford, pp.43-65.
Nunberg, G., Sag, I., Wasow, T. (1994): "Idioms", Language, 70/3, pp.491-538.
Nunberg, G. & Zaenen, A. (1992): "Systemic Polysemy in lexicology and lexicography", in Tommola, Varantola, Salmi-Tolonen & Schopp (eds) EURALEX'92 Proceedings I-II, Fifth EURALEX International Congress, Studia Translatologica, Ser.A, Vol.1, University of Tampere, pp.387-396.
Olney, J. (1968): "To all interested in the Merriam-Webster transcripts and data derived from them", Systems Development Corporation Document, L-13579.
Ostler, N. & Atkins, B.T. (1991): "Predictable Meaning Shift: Some Linguistic Properties of Lexical Implication Rules", in Pustejovsky & Bergler (eds) Lexical Semantics and Knowledge Representation, University of California, Berkeley, pp.76-87.

Palmer, H. (1933): Second Interim Report on English Collocations, Tokyo, Institute for Research in English Teaching.
Palmer, H. (1938): A Grammar of English Words, Longmans Green, London.
Pavel, S. (1993a): Bibliographie de la Phraséologie (1905-1992), Réseau International de Néologie et de Terminologie, Montréal.
Pavel, S. (1993b): "La phraséologie en langue de spécialité. Méthodologie de consignation dans les vocabulaires terminologiques", in Terminologies Nouvelles, 10, déc. 1993, pp.67-82.
Peters, C., Federici, S., Montemagni, S. & Calzolari, N. (1994): "From Machine-Readable Dictionaries to Lexicons for NLP: the Cobuild Dictionaries - a Different Approach", in EURALEX'94 Proceedings, Vrije Universiteit Amsterdam, pp.147-157.
Petitpierre, D., Robert, D. & Warwick-Armstrong, S. (1994): "DICO: A Network-Based Dictionary Consultation Tool", poster presented at the EURALEX'94 International Congress, Vrije Universiteit Amsterdam.
Picchi, E., Peters, C. & Calzolari, N. (1988): "Implementing a Bilingual Lexical Database System", in Magay & Zigany (eds) BudaLEX'88 Proceedings; Papers from the EURALEX Third International Congress, Budapest, Akademiai Kiado, pp.317-329.
Picchi, E., Peters, C. & Marinai, E. (1992): "The Pisa Lexicographic Workstation: The Bilingual Components", in Tommola, Varantola, Salmi-Tolonen & Schopp (eds) EURALEX'92 Proceedings I-II, Fifth EURALEX International Congress, Studia Translatologica, Ser.A, Vol.1, University of Tampere, pp.277-285.
Pin-Ngern, S. & Evens, M. (1994): "An Adverbial Lexicon for Natural Language Processing Systems", in International Journal of Lexicography, 7/3, pp.197-221.
Procter, P. (ed.) (1978): Longman Dictionary of Contemporary English, (2nd edition edited by D. Summers), Longman Group Ltd, Harlow.
Procter, P. (1994): "The Cambridge Language Survey", paper read at the Workshop on "The Future of the Dictionary" co-sponsored by Rank Xerox Research Centre and Acquilex-II, Grenoble.
Procter, P. (ed.) (1995): Cambridge International Dictionary of English, Cambridge University Press.
Pustejovsky, J. (1991): "The Generative Lexicon", in Computational Linguistics, 17/4, pp.409-441.
Pustejovsky, J. (ed.) (1993): Semantics and the Lexicon, Dordrecht, Kluwer Academic Publishers.
Pustejovsky, J., Bergler, S., Anick, P. (1993): "Lexical Semantic Techniques for Corpus Analysis", in Computational Linguistics, 19/2, pp.331-358.
Pustejovsky, J. & Boguraev, B. (1991): "Lexical Knowledge Representation and Natural Language Processing", in IBM Journal of Research and Development, Vol.35/4.
Pustejovsky, J. & Boguraev, B. (1994): "A Richer Characterization of Dictionary Entries: The Role of Knowledge Representation", in Atkins & Zampolli (eds) Computational Approaches to the Lexicon, Oxford University Press, pp.295-311.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. (1985): A Comprehensive Grammar of the English Language, Longman.
Renouf, A. & Sinclair, J. (1991): "Collocational Frameworks in English", in Aijmer & Altenberg (eds) English Corpus Linguistics, Longman Group Ltd, Harlow, pp.128-144.
Rey-Debove, J. (1971): Etude linguistique et sémiotique des dictionnaires français contemporains, Mouton, La Haye.
Richards, J. (1976): "The role of vocabulary teaching", in TESOL Quarterly, 10/1, pp.77-89.
Rizo Rodriguez, A. (1992): "A Proposal for a Valency Lexicon of English Catenative Verbs", in EURALEX'90 Proceedings, Barcelona: Biblograf, pp.381-390.
Robert, P. (1987): Le Petit Robert - Dictionnaire alphabétique et analogique de la langue française. Dictionnaires Le Robert, Paris.
Roget's Thesaurus of English Words and Phrases, Penguin Edition, 1966.
Sansome, R. (1986): "Connotation and lexical field analysis", in Cahiers de Lexicologie, 49-2, pp.13-33.

Schulze, B.M. & Christ, O. (1994): The CQP User's Manual. Institut für maschinelle Sprachverarbeitung, Universität Stuttgart, Version 1.0d, May 1994 (Revised October 1994).
Segond, F. & Zaenen, A. (1994): "Multi-word expressions in bilingual dictionaries and in Compass", paper read at the workshop on "The Future of the Dictionary" co-sponsored by Rank Xerox Research Centre and Acquilex-II, Grenoble.
Sikra, J. (1992): "Dictionary Defining Language", in Tommola, Varantola, Salmi-Tolonen & Schopp (eds) EURALEX'92 Proceedings I-II, Fifth EURALEX International Congress, Studia Translatologica, Ser.A, Vol.1, University of Tampere, pp.295-300.
Sinclair, J. (1966): "Beginning the study of lexis", in Bazell, Catford, Halliday & Robins (eds) In Memory of J.R. Firth, Longmans, London, pp.410-430.
Sinclair, J. (1987a): Collins Cobuild English Language Dictionary, HarperCollins Publishers, Glasgow (2nd edition in 1995).
Sinclair, J. (ed.) (1987b): Looking Up: An account of the Cobuild project in lexical computing, London, Collins ELT.
Sinclair, J. (1991): Corpus, Concordance, Collocation, Oxford University Press.
Sinclair, J. (ed.) (1996): Collins Cobuild Grammar Patterns 1: Verbs, HarperCollins Publishers, Glasgow & London.
Sinclair, J., Hoelter, M. & Peters, C. (eds) (1995): The Languages of Definition: The Formalization of Dictionary Definitions for Natural Language Processing, Studies in Machine Translation and Natural Language Processing, Vol.7, European Commission, Luxembourg.
Sinclair, J. & Renouf, A. (1988): "A lexical syllabus for language learning", in Carter & McCarthy (eds) Vocabulary and Language Teaching, Longman, Harlow, pp.140-160.
Smadja, F. (1991): "Macrocoding the Lexicon with Co-occurrence Knowledge", in Zernik (ed.) Lexical Acquisition: Using On-Line Resources to Build a Lexicon, Hillsdale, Lawrence Erlbaum Associates, pp.165-189.
Smadja, F. (1993): "Retrieving collocations from text: Xtract", in Computational Linguistics, 19/1, pp.143-177.
Sobkowiak, W. (1990): "On the phonostatistics of English onomatopoeia", in Studia Anglica Posnaniensia, XXIII, pp.15-30.
Spencer, A. (1975): Noun-Verb Expressions in Legal English, Khartoum, Khartoum University Press.
Steele, J. (ed.) (1990): Meaning-Text Theory: Linguistics, Lexicography, and Implications, University of Ottawa Press.
Steele, J. & Meyer, I. (1990): "Lexical Functions in an Explanatory Combinatorial Dictionary: Kinds, Descriptions, and English Examples", in Steele (ed.) Meaning-Text Theory: Linguistics, Lexicography, and Implications, University of Ottawa Press, pp.41-61.
Stubbs, M. (1993): "British Traditions in Text Analysis - From Firth to Sinclair", in Baker, Francis & Tognini-Bonelli (eds) Text and Technology: In Honour of John Sinclair, Amsterdam and Philadelphia: John Benjamins, pp.1-33.
Stubbs, M. (1994): "Grammar, text, and ideology: computer-assisted methods in the linguistics of representation", in Applied Linguistics, 15/2, pp.201-223.
Summers, D. (ed.) (1987/1995): Longman Dictionary of Contemporary English, 2nd/3rd edition, Longman Group Ltd, Harlow.
Summers, D. (1993): Longman Language Activator, Longman Group UK Limited, Harlow.
Telija, V. (1992): "Lexicographic description of words and collocations: Feature-functional model", in EURALEX'90 Proceedings, Barcelona, Biblograf, pp.315-320.
Terminologies Nouvelles, n°10, revue du Réseau International de Néologie et de Terminologie (RINT), December 1993, Special issue on phraseology, (proceedings of the International Seminar, Hull, 1993), in French.
Tesnière, L. (1959): Eléments de Syntaxe Structurale, Paris, Klincksieck.
Thoiron, P. & Béjoint, H. (eds) (1996): Les dictionnaires bilingues, Collection "Universités francophones et Champs linguistiques", Louvain-la-Neuve, AUPELF-UREF et Duculot.

Tournier, J. (1985): Introduction descriptive à la lexicogénétique de l'anglais contemporain, Paris and Genève, Champion-Slatkine.
Van Campenhoudt, M. (1994): "Idiomaticité et gestion de données terminologiques: une approche notionnelle", in META, Presses de l'Université de Montréal, 39/1, pp.97-106.
Van der Wouden, T. (1992a): "Prolegomena to a Multilingual Description of Collocations", in Tommola, Varantola, Salmi-Tolonen & Schopp (eds) EURALEX'92 Proceedings I-II, Fifth EURALEX International Congress, Studia Translatologica, Ser.A, Vol.1, University of Tampere, pp.449-456.
Van der Wouden, T. (1992b): "Beperkingen op het optreden van lexicale elementen", in De Nieuwe Taalgids, 85/6, pp.513-538.
Van der Wouden, T. (1997): Negative contexts: Collocation, polarity and multiple negation, Routledge Studies in Germanic Linguistics, Routledge, London.
Varantola, K. (1994): "The dictionary user as decision maker", in EURALEX'94 Proceedings, Vrije Universiteit Amsterdam, pp.606-611.
Verlinde, S. (1995): "La combinatoire du vocabulaire des fluctuations dans le discours économique", in Cahiers de Lexicologie, 66, pp.137-176.
Verlinde, S., Binon, J. & Van Dyck, J. (1992): Dictionnaire contextuel du français économique - Tome A: L'entreprise, Louvain, Garant.
Veronis, J. & Ide, N. (1994): "From Dictionaries to Knowledge Bases ... and Back", paper read at the workshop on "The Future of the Dictionary" co-sponsored by Rank Xerox Research Centre and Acquilex-II, Grenoble (abstract published under the title "Machine-Readable Dictionaries: Have we wasted our time?", in Cambridge Language Reference News, Cambridge University Press, Number 4, p.1).
Vossen, P. (1992): "The automatic construction of a knowledge base from dictionaries: a combination of techniques", in Tommola, Varantola, Salmi-Tolonen & Schopp (eds) EURALEX'92 Proceedings I-II, Fifth EURALEX International Congress, Studia Translatologica, Ser.A, Vol.1, University of Tampere, pp.311-326.
Vossen, P. (1996): "Right or Wrong: Combining Lexical Resources in the EuroWordNet Project", in EURALEX'96 Proceedings, pp.715-728.
Vossen, P., Meijs, W. & Den Broeder, M. (1989): "Meaning and Structure in Dictionary Definitions", in Boguraev & Briscoe (eds) Computational Lexicography for Natural Language Processing, London and New York, Longman, pp.171-192.
Walker, D. (1987): "Knowledge Resource Tools for Accessing Large Text Files", in Nirenburg (ed.) Machine Translation - Theoretical and Methodological Issues, Cambridge University Press, pp.247-261.
Walker, D. (1994): "The Ecology of Language", in Zampolli, Calzolari & Palmer (eds) Current Issues in Computational Linguistics: in Honour of Don Walker, Series Linguistica Computazionale, IX-X, pp.359-375.
Walker, D. & Amsler, R.A. (1986): "The Use of Machine-Readable Dictionaries in Sublanguage Analysis", in Grishman & Kittredge (eds) Analysing Language in Restricted Domains, Lawrence Erlbaum Associates, Hillsdale, New Jersey, pp.69-83.
Walter, E. (ed.) (1994): Cambridge Word Routes Anglais-Français: Lexique thématique de l'anglais courant, Cambridge University Press.
Wanner, L. (ed.) (1996): Lexical Functions in Lexicography and Natural Language Processing, John Benjamins Publishing Company, Amsterdam and Philadelphia.
Webster's Seventh New Collegiate Dictionary (W7) (1963), G.& C. Merriam, Springfield, Massachusetts.
Werner, O. (1988): "How to teach a network: minimal design features for a cultural acquisition device or C-KAD", in Evens (ed.) Relational Models of the Lexicon, Cambridge University Press, pp.141-166.
Wierzbicka, A. (1987): English Speech Act Verbs: A Semantic Dictionary, Sydney, Academic Press.

Wierzbicka, A. (1988): The Semantics of Grammar, Studies in Language Companion Series, Vol.18, Amsterdam & Philadelphia, Benjamins. Wilkins, J. (1668): An Essay Towards a Real Character and a Philosophical Language. London. Wilks, Y., Fass, D., Guo, C.-M., McDonald, J., Plate, T. & Slator, B. (1989): "A tractable machine dictionary as a resource for computational semantics", in Boguraev & Briscoe (eds) Computational Lexicography for Natural Language Processing, Longman, London and New York, pp.193-228. Wilks, Y., Fass, D., Guo, C.-M., McDonald, J., Plate, T. & Slator, B. (1993): "Providing Machine Tractable Dictionary Tools", in Pustejovsky (ed.) Semantics and the Lexicon, Dordrecht, Kluwer Academic Publishers, pp.341-401. Wilks, Y., Slator, B. & Guthrie, L. (1996): Electric Words - Dictionaries, Computers and Meanings, MIT Press, Cambridge, MA and London. Winograd, T. (1983): Language as a Cognitive Process, Vol.1, Syntax, Addison-Wesley. Winograd, T. (1984): "Computer Software for Working with Language", in Scientific American, Vol.251, Number 3, pp.90-101. Zampolli, A. (1994): "Introduction", in Atkins & Zampolli (eds) Computational Approaches to the Lexicon, Oxford University Press, pp.3-15. Zampolli, A., Calzolari, N. & Palmer, M. (eds) (1994): Current Issues in Computational Linguistics: in Honour of Don Walker, Series Linguistica Computazionale, IX-X, Pisa and Dordrecht, Giardini Editori and Kluwer Academic Publishers. Zernik, U. (ed.) (1991): Lexical Acquisition: Using On-Line Resources to Build a Lexicon, Hillsdale, Lawrence Erlbaum Associates. Zgusta, L. (1971): Manual of Lexicography, Janua Linguarum, Series Major, 39, The Hague, Mouton. Zolkovskij, A., Mel'čuk, I., Ilin, G., Fitialov, S. & Paduceva, E. (1971): La sémantique en URSS, Documents de Linguistique Quantitative, 10, Dunod.

20 Index

A0 68, 160
Able 69, 171
Access key 45, 275
ACQUILEX 13, 51, 219, 220
Actant 54
Actual 90
Adv 70
Adv0 69
Agentive role 48
AHD 249
Ai 68, 162
ANTHEM 95
Anti 70, 166
AntiReal 198
Antonymy 258
Application program 135, 140
Applied linguistics 255
Argument 63
Argument structure 47
Atomic predicate 232
Avoidance strategy 258
Away 204
Bank of English 24, 268
Base 21, 108, 116, 137, 262, 275
Basic shapes 251
BBI 32, 37, 78, 113, 270
Bilingual dictionary 101, 274
Bilingual MRD 14
Bon 70
Boolean operator 109
British National Corpus 24, 268
BYU Concordance 2
C 123
CA collocation 33, 113
CALL 255, 260
Cambridge Language Survey 268
Cap 70
case 54
Caus 70, 157, 240
Causative/inchoative alternation 209, 230, 231, 296
Centr 71, 175, 196, 287
Child 185
Clipper 123, 135
Cobuild 11, 24, 59, 111, 225, 231, 249, 258, 268
Cobuild defining practice 12
Colligation 32
Collins-Robert dictionary 1, 101, 274
Collocation 6, 16, 73, 79, 94, 110, 113, 247
Collocational dictionary 31
Collocator 21, 108, 116, 137, 275
Color 91
Compass 10, 16
CompLex 10
Componential analysis 4
Compound lexical function 66
Computerized dictionary 2, 15, 121
Conative alternation 231
Concordance 24
Consonant cluster 210
Constitutive role 48
Cont 71
Context-sensitive rule 96
Contr 71, 167
Conv 71
Converter 147, 184, 189
Cooking verb 242
Corpus 23, 111, 279
Corpus analysis 279
Culm 72, 196, 286

Dative alternation 230
DBaseIII+ 123
DECIDE 2, 24, 141, 279
DEFI 16, 104, 130, 136, 277-279
Definiendum 59
Definiens 59
Defining formula 127, 147, 184, 232, 274
Definition 147
Degrad 72, 241
Delexical collocation 19
Delexical verb 82

Diathesis 50
Diathesis alternation 67
Dictionnaire des idées reçues xvii, 3, 270, 279
Differentiae 10
Dim 91
Disambiguation 11, 14, 277
Down 202
EAC 39
EN collocation 34, 78, 113, 242
Epit 73
Equip 74
Ergative 199, 209, 231, 234, 296
ETC Text Retrieval Software 2
EURALEX 111
EUROTRA 13, 54, 96, 188
EUROTRA-7 13, 122
EuroWordNet 11
Event structure 47
Excess 74
Explanatory Combinatory Dictionary 5, 53, 56, 275
Fact 74, 204
Female 186
Figur 74
Figurative verb 19, 82
Fin 75
FinFunc0 195, 203, 242
Fission rule 92
Flaubert xvii, 3, 270, 279
Foreign-language teaching 257
Formal role 48
frame 54, 81
Frame semantics 55, 248
Free collocation 19
Fulg 91
Func 75, 267
Fused lexical function 68, 175
Gener 76
generation 53, 92, 93
Generative Lexicon 47, 191, 275
Genus 10, 11, 76
Germ 76
Government Pattern 56, 60
Grammatical collocation 21
HORATIO 10
Hyperonym 192, 275

Hyperonymy 11
Hyponym 192
Hyponymy 258
IBM 10, 14
Idiom 17, 79
Imper 76
IMS Corpus Workbench 279
Incep 76, 169, 240
IncepPredMinus 195
Inchoative verb 77
Indefinite/Unspecified object alternation 230
Induced Action Alternation 232, 241
Information retrieval 93
Instr 77
Introductory Zone 58
Involv 77
ISA link 11
ISSCO 15
KWIC 24, 279

Labor 77
Labreal 77
Language Activator 145
Language engineering 13
LDB 121, 200
LDOCE 1, 10, 105, 111, 121, 123, 148, 208, 211, 233, 248, 268, 277
Lemmatizer 27
Lexical acquisition 9
Lexical acquisition bottleneck 9
Lexical cohesion 94
Lexical collocation 19, 33, 255
Lexical Conceptual Paradigm 50, 221
Lexical database 122
Lexical function 5, 62, 113, 116, 137, 240, 247, 273
Lexical Functions Zone 62
Lexical Implication Rule 156, 161, 213, 221
Lexical Knowledge Base 10, 13, 200, 220, 273
Lexical phrase 250
Lexical pre-emption 221
Lexical relation 5, 14
Lexical-semantic database 273
Lexical-semantic network 145
Lexical-semantic relation 5
Liqu 77, 195, 202, 203, 242, 291

Loc 78
Locative alternation 230
LOLEX 145
LSP 118
LSP dictionary 43
Machine translation 13, 14, 93, 95-98, 188, 200
Machine-readable dictionary 2, 15, 121, 273
Magn 78
Magn// 174
Male 186
Manif 80
Meaning-Text Theory 5, 53
Merriam-Webster Pocket Dictionary 10
METAL 95, 237
Metalinguistic indicator 102
Metalinguistic information 104
Metaphor 74, 175, 202, 213, 245
MI score 25
Micro-definition 105, 127, 147, 148, 189, 208
MIDAS 245
Middle alternation 230
Minus 80, 240
Motor 91
MRD 10
Mult 80, 149, 247, 251, 285
Mutual Information 25, 111
Nocer 80
Notional relation 119
Noun alternation 50, 220
Obstr 80, 241
Off 203
Onomatopoeia 207
Oper 81, 97, 267, 290
Oxford-Hachette dictionary 16, 278
P-collocation 40, 73
Paradigmatic 64
Paradigmatic relation 64
Paraphrasing rule 92
Parent 185
Paronymy 159
Parser 27
Part 183
Part-whole relationship 183

Partitive noun 87
Pejor 82
Perf 82, 174
Perm 83, 157
Phonaestheme 207
phonaesthesia 207
Phrasal verb 200
Pisa group 15
Plus 83, 201, 240
Polarity 265
Polysemy 219, 220
Pos 83, 104
Pre-emption 221
Pred 83
Prepar 83, 242
Process 187, 190
Propt 84
Prox 84
Qual 84, 90
Qualia 48, 275
Qualia structure 48
Quant 90
Rank Xerox Research Centre 16, 279
Real 85, 195, 198, 202, 203
Reciprocal alternation 230
Reflexive object alternation 230
Reiteration 94
Relational database 114, 123
Relational database management system 122
Relational semantics 4
Restricted collocation 18, 19, 32, 35, 37, 40
Restricted lexical co-occurrence 62
Result 86, 190
Reusability of lexical resources 13
Roget 145
S0 86
S1,... 86
Scenario 81
SEC 36
Selection restriction 73, 107, 110, 183
Semantic field 4, 261
Semantic network 14, 116, 261
Semantic pre-emption 221
Semantic primitive 232
Semantic relation 5
Semantic Zone 59

Sense extension 219
SGML 16
Simile 79
Sing 87, 154, 251, 264, 268
Sinstr 87, 156
Sloc 87, 289
Smed 87
Smod 87, 171
Son 87, 168, 207, 240, 281, 284
Sound 87
Sound symbolism 207
Spec 192, 227
Specialized collocation 19
Speed 91, 204
Sres 87
Standard Lexical Function 62, 179
Stat 91
Subject field 105
Sublanguage 85, 94
Sublanguage collocation 45
Subscript 82
Subscripted modifier 90, 180, 204
Substitution rule 92
Superordinate 11
Support verb 19, 21, 63, 81, 95, 267
Sympt 88
Syn 89
synonymy 258
Syntactic Zone 60
Syntagmatic 64
T-score 26
Tagger 27
Taxonomy 11, 275, 277
Telic 77, 190, 197
Telic role 48, 191
Temp 90
Thesauric class 110, 181, 277
Thesaurus 28, 145
Transitivity alternation 229
Trem 91
T° 91
Understood reflexive object alternation 229
Unit 181
Universale 251
Unspecified object alternation 229
Up 201
Usual 90
V0 89

Valence 54
Value 63
Ver 89
Vocable 59
Vocabulary teaching 255
W7 148, 181, 184
Word formation 153, 200, 276
Word Routes 145
WordCruncher 2, 108, 114, 121, 122
WordNet 11, 14, 145, 274, 277
WordSmith 10
Xkwic 279
Zero suffix 189