Defining collocation for lexicographic purposes: From linguistic theory to lexicographic practice 9783034320542, 9783035109412, 2016932426

English Pages [346] Year 2016

Table of contents :
Cover
Contents
Introduction (Adriana Orlandi, Laura Giacomini)
Monolingual collocation lexicography: State of art and new perspectives (Adriana Orlandi)
Congruency Principles in Word Combination and Lexicography (Vincenzo Lo Cascio)
Distributional restrictions based on word content and their place in dictionaries (Michele Prandi)
For a typology of phraseological expressions: how to tell an idiom from a collocation? (Béatrice Lamiroy)
What do we talk about when we talk about collocation in Spanish? (Daniela Capra)
Collocations dictionaries for English and Spanish: the state of the art (Gloria Corpas Pastor)
Defining collocations for lexicographic purposes. A matter of boundaries and arrangement (Laura Giacomini)
Core vocabulary and core collocations: combining corpus analysis and native speaker judgement to inform selection of collocations in learner dictionaries (Veronica Benigno and Olivier Kraif)
NOUN PREP NOUN collocations in French: the case of scientific lexicon (Francis Grossmann, Agnès Tutin)
Italian dictionaries of collocations (Luigi Matt)
Notes on Contributors

Author / Uploaded
Adriana Orlandi
Laura Giacomini

Linguistic Insights

Studies in Language and Communication

Adriana Orlandi & Laura Giacomini (ed.)

Defining collocation for lexicographic purposes

Peter Lang

From linguistic theory to lexicographic practice

This volume aims to promote a discussion on the definition of collocation that will be useful for lexicographic purposes. Each of the papers in the volume addresses in detail one or more aspects of three key issues. The first issue concerns the boundaries between collocations and other word combinations, and the way in which lexicographers convey classifications to dictionary users. The second issue is the possibility, or even necessity, of adapting the definition of collocation to the objectives of different types of dictionaries, taking into account their specific micro- and macro-structural properties and their users’ needs. The third issue concerns the methods for collocation extraction. In order to tailor the definition of collocation to the actual dictionary function, it is necessary to develop hybrid methods relying on corpus-based approaches and combining data processing with criteria such as native speakers’ evaluation and contrastive analysis.

Adriana Orlandi has a PhD in French Linguistics, and teaches French linguistics and translation at the University of Modena and Reggio Emilia (Italy). Her main research interests are semantics, terminology and translation. She has been studying collocations since 2011, with a special interest in the definition of collocations and its possible applications in lexicography. In 2012, she organized the International Workshop “New perspectives on collocations” (Modena).

Laura Giacomini has a PhD in Applied Linguistics from the Department of Translation and Interpretation of Heidelberg University (Germany), where she is a teacher and researcher. Her research fields include lexicography, phraseology, LSP and translation studies. She is currently involved in different lexicographic projects (e.g. WLWF) and is working on her habilitation thesis on LSP databases of the technical domain, with special focus on the topic of phraseological variation in specialised language and its representation in e-lexicographic resources.

Linguistic Insights Studies in Language and Communication Edited by Maurizio Gotti, University of Bergamo

Volume 219

ADVISORY BOARD Vijay Bhatia (Hong Kong) David Crystal (Bangor) Konrad Ehlich (Berlin / München) Jan Engberg (Aarhus) Norman Fairclough (Lancaster) John Flowerdew (Hong Kong) Ken Hyland (Hong Kong) Roger Lass (Cape Town) Matti Rissanen (Helsinki) Françoise Salager-Meyer (Mérida, Venezuela) Srikant Sarangi (Cardiff) Susan Šarčević (Rijeka) Lawrence Solan (New York)

PETER LANG Bern • Berlin • Bruxelles • Frankfurt am Main • New York • Oxford • Wien

Bibliographic information published by die Deutsche Nationalbibliothek: Die Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available on the Internet at ‹http://dnb.d-nb.de›.

British Library Cataloguing-in-Publication Data: A catalogue record for this book is available from The British Library, Great Britain.

Library of Congress Control Number: 2016932426

The editors, Adriana Orlandi and Laura Giacomini, would like to thank the Department of Studies on Language and Culture at the University of Modena and Reggio Emilia and the Department of Translation and Interpreting at Heidelberg University for the assistance provided in elaborating on the manuscript and the financial support given to the project.

ISSN 1424-8689 pb.
ISBN 978-3-0343-2054-2 pb.
ISSN 2235-6371 eBook
ISBN 978-3-0351-0941-2 eBook

This publication has been peer reviewed.

© Peter Lang AG, International Academic Publishers, Bern 2016
Hochfeldstrasse 32, CH-3012 Bern, Switzerland
www.peterlang.com

All rights reserved. All parts of this publication are protected by copyright. Any utilisation outside the strict limits of the copyright law, without the permission of the publisher, is forbidden and liable to prosecution. This applies in particular to reproductions, translations, microfilming, and storage and processing in electronic retrieval systems.

Contents

Acknowledgements
Introduction (Adriana Orlandi, Laura Giacomini)
Monolingual collocation lexicography: State of art and new perspectives (Adriana Orlandi)
Congruency principles in word combination and lexicography (Vincenzo Lo Cascio)
Distributional restrictions based on word content and their place in dictionaries (Michele Prandi)
For a typology of phraseological expressions: how to tell an idiom from a collocation? (Béatrice Lamiroy)
What do we talk about when we talk about collocation in Spanish? (Daniela Capra)
Collocation dictionaries for English and Spanish: the state of the art (Gloria Corpas Pastor)
Defining collocations for lexicographic purposes. A matter of boundaries and arrangement (Laura Giacomini)
Core vocabulary and core collocations: combining corpus analysis and native speaker judgement to inform selection of collocations in learner dictionaries (Veronica Benigno, Olivier Kraif)
NOUN PREP NOUN collocations in French: the case of scientific lexicon (Francis Grossmann, Agnès Tutin)
Italian dictionaries of collocations (Luigi Matt)
Notes on Contributors

Acknowledgments

The editors, Adriana Orlandi and Laura Giacomini, would like to thank the Department of Studies on Language and Culture at the University of Modena and Reggio Emilia and the Department of Translation and Interpreting at Heidelberg University for the assistance provided in elaborating on the manuscript and the financial support given to the project.

Adriana Orlandi, Laura Giacomini

Introduction

While initially understood as a type of cognitive restriction (Firth 1957), very much in line with Coseriu’s lexical solidarities (1967), the term collocation is now most often used to denote a kind of distributional restriction. This notion has developed in two different directions. The first is the phraseological one, which can be found in frameworks such as Meaning-Text Theory, in relation to the notion of lexical functions (Mel’čuk/Clas/Polguère 1995; Mel’čuk 2003; Mel’čuk/Polguère 2006), and in the idea of a binary relation between two lexical components, the base and the collocate, where the collocate fully realizes its meaning only when coupled with its base (see Paillard [1997], Hausmann [1998], Grossmann/Tutin [2002, 2003]). The second approach, which originated in the work of the late John Sinclair in Great Britain, is strongly grounded in statistics and corpus analysis. The emphasis here is on the frequency of co-occurrences of word pairs, and on the distribution of meanings and lexical uses of words. The former approach led to extensive research in lexicology (Cruse 1986) and lexicography (Hausmann 1989, Mel’čuk 1998). The latter (Sinclair 1991, Evert 2008) underlies research in corpus linguistics.

In this volume, we take a lexicographic perspective. Our aim is to promote a discussion on the definition of collocations that can be useful for lexicographic purposes. Problems with the definition of collocations are related, first, to the boundaries between collocations and free combinations, and, second, to those between collocations and idioms. At this level, the question to be answered is: do lexicographers need to take these distinctions into consideration? If we analyse the boundaries existing between collocations and free combinations, for instance, what is under investigation is the very notion of linguistic restriction. Given the large amount of space made available by the digital medium, in the near future electronic dictionaries might make this distinction redundant and enable lexicographers to include all
combinatory possibilities of a word, even for lexical items that are not part of a collocation. This does not mean, however, that a classification of multi-word combinations and, consequently, criteria for defining collocations are superfluous to lexicographic needs. As Bergenholtz and Gouws (2013: 11) point out, what is in question is not the necessity of a classification system, which is actually becoming more and more crucial for lexicography, but the way in which lexicographers convey such a classification to dictionary users.

A second issue that has to be taken into consideration when focusing on the notion of collocation is the possibility, or even necessity, of adapting the definition of collocation to the ends of different types of dictionaries (Tarp 2008). The structure of a given dictionary, i.e. the sum of its micro- and macro-structural properties, results from and reflects the objectives of the lexicographer, who is aiming to serve specific users’ needs in specific usage situations (e.g. text production or active translation). The lexicographic formalization of the concept of collocation should take all this into account. The central issue is thus how to fit a theoretical model into the practice of lexicography. As suggested by Rundell (2012: 71), while lexicographers require a theory, the particular purpose of a dictionary precludes its uncritical application. However, the question could also be turned around by asking: is there any theoretical model for the description of collocations that is directly applicable to dictionary making?

One further issue that should be addressed concerns the methods for collocation extraction. Computational linguistics operationalizes morphosyntactic and statistical criteria (Heid/Weller 2010 and Evert 2005 respectively); however, these criteria cannot draw a clear distinction between collocations and other word combinations, since they rely on an empirical definition of collocation (Evert 2008) which often fails to capture cognitive and semantic distinctions. In order to tailor the definition of collocation to the actual dictionary function, it is therefore necessary to develop hybrid methods combining data processing with other criteria such as native speakers’ evaluation and contrastive analysis.

The primary aim of this volume is to reflect upon the relation between lexicographical practice and theorization of the notion of
collocation, and to stimulate discussion of issues relevant for future research in the field of lexicography. Each of the papers in this volume addresses in detail one or more aspects of the above-mentioned issues.

The book opens with a preliminary chapter by Adriana Orlandi, in which the author outlines the state of the art of research in monolingual collocation lexicography. The aim is to reflect upon the definition of collocation in lexicography and to point out the areas where lexicography needs further improvement in order to produce collocation dictionaries that are truly adequate to users’ needs. The paper shows that a functional definition of collocation does not necessarily contradict theoretical views on collocations, and can be useful in determining the role of colligations in a collocation dictionary, as well as the place of free combinations and idioms. The paper also discusses the possibility of a prototypical approach to collocations, emphasizing some paradoxical aspects that characterize the search for a prototype. Finally, the role of electronic lexicography is emphasized as a way to improve features which are missing or lack accuracy in present-day lexicography, such as information about the frequency and fixedness of collocations, authentic examples, direct access to corpora, and usage notes.

After this overview of collocation lexicography, the volume investigates some general questions that are very often underestimated by lexicographers, but rich in implications for collocation lexicography and for lexicography as a whole: the nature of the distributional restrictions underlying word combinations, and the boundaries between collocations, free combinations and idioms. The first issue is grafted onto a typology of syntactic and conceptual restrictions, and it is treated in the present volume mostly in relation to the boundary between lexical combinations and free combinations.

According to Vincenzo Lo Cascio, all word combinations, including free combinations, are regulated by some congruency principles, so that the study of word combinations should be based upon the knowledge of these principles. Lo Cascio distinguishes between formal congruency principles (syntactic and functional syntactic) and encyclopaedic-semantic ones. There are congruency principles that are not language-bound, and this is the case of free combinations, which only satisfy the requirements of the general congruency principles. When congruency principles are specific and culture-bound, they generate
idiosyncratic word combinations, to which collocations belong. Thus, according to the author, it is only within a comparison between languages that we can speak of collocations. Contrastive analysis becomes the central criterion of distinction between collocations and free combinations. The main consequence for lexicography is that the lexicographic description of the lexicon should concern the entire range of combinations allowed by a word, and that all the properties which determine the combinatorial preferences of a word should be described. Finally, Lo Cascio introduces the online version of his collocation dictionaries with the help of selected excerpts.

Michele Prandi’s paper focusses more deeply on Lo Cascio’s notion of “encyclopaedic-semantic congruency principles”. The very notion of content-based restrictions is investigated, and three different types of restrictions are taken into account: selection restrictions (confining, for instance, death to living beings), lexical solidarities (restricting barking to dogs) and cognitive models (restricting flying to birds). Unlike lexical solidarities, which are language-specific, selection restrictions, which correspond to consistency criteria and belong to a natural ontology, are universal. Cognitive models are conceptual structures shared on a very large scale, which admit the possibility of being falsified (birds that don’t fly). Consistency criteria are never stated in dictionaries, but if the aim of a dictionary is to account for the distribution of lexemes within sentence structures, and not simply to describe the content of isolated words, consistency criteria should be taken into account. Prandi proposes Gaston Gross’ model of “generative lexicon” as a model that makes consistency criteria as well as lexical solidarities and cognitive models explicit.

An interesting point that differentiates Lo Cascio’s approach from Prandi’s is their position vis-à-vis the relationship between syntax and lexicon. Whereas Lo Cascio considers that the lexical component has “a primary role above syntax”, in Prandi’s view “syntax goes far beyond lexicon”.

The paper by Béatrice Lamiroy focusses on the distinction between collocations and idioms. Starting from the description of a research project called “BFQS project” (an enquiry on idioms in the francophone area that takes as its starting point Maurice Gross’ corpus
of French idioms), Lamiroy raises the problem of establishing a protocol enabling researchers and lexicographers to easily recognize idioms and distinguish them from collocations. Two main problems are recalled: the multifaceted nature of expressions figées, and differences in the continuum of lexicalization due to their diachronic dimension. A detailed description of similarities and differences between idioms and collocations is then provided, giving the reader a comprehensive view of the subject.

Lo Cascio, Prandi and Lamiroy’s papers raise an issue that will require further investigation in the future, namely the relationship between different types of word combinations and encoding/decoding tasks. To give just one example, the selection restrictions and cognitive models described by Michele Prandi enable speakers of different cultures to decode collocations not based on figures of speech quite easily, while these restrictions are not sufficient for correct encoding (see for instance It. aspettare per un bel pezzo, which can be translated into English using the expression to wait for a fair amount of time and not to wait for a nice piece of time). On the other hand, collocations based on figures of speech represent a difficulty for a non-native speaker, especially when they are not transparent, not only in encoding but also in decoding tasks. Thus, the French collocation peur bleue (great fear) is quite difficult to decode for a non-native speaker if the context does not support interpretation. These considerations hint at the possibility of a different treatment for different types of collocations in collocation dictionaries, for instance providing more explanations and examples for opaque collocations.

The second part of the volume investigates more deeply some aspects of the definition of collocation in lexicography. The aim of Daniela Capra is to show the inherent instability of the definition of collocation. Her observations focus on the domain of Spanish linguistics, but her remarks are generalizable. The issues concern the disagreement over whether collocations are part of phraseology or not, fixity and the compositional character of collocations, frequency, and finally the semantic determination in the combinatory of collocations. She takes as an example the treatment of Spanish light verbs, and of some Noun + Prep. + Noun combinations. Special attention is given
to Bosque’s combinatory dictionary REDES (2004), where the choice is made to avoid the term collocation. The conclusion of the paper invites linguists to consider the instability of the concept of collocation as “part of its nature”, and encourages the application of Prototype Theory to the description of this complex and multi-layered category of word combinations.

Gloria Corpas Pastor analyses collocation dictionaries for English and Spanish, classifying them on the basis of the importance given to corpora, and from the viewpoint of their theoretical and methodological underpinnings. Standard dictionaries of collocations are based on the lexicographer’s intuition and do not take into account important information such as frequency or evidence of usage. Dictionaries of collocations that rely on statistical and frequency-based theories of collocation make use of corpora as sources of information, but here again the way in which corpora are used can differ greatly from one dictionary to another, with some dictionaries being corpus-based, and others corpus-driven (Tognini-Bonelli 1991). The paper provides a thorough classification of English and Spanish collocation dictionaries, offering an overview of the underlying approaches to collocation and its defining properties.

Laura Giacomini’s paper focusses on the definition of collocation for lexicographic purposes. According to the author, a well-designed concept of collocation is fundamental in order to have criteria for data selection, but it is necessary for a lexicographer to base theoretical considerations upon the user’s needs. Starting from her own experience as a dictionary compiler, Giacomini discusses the advantages of a “functional definition” of collocation as it has been employed for modelling an electronic dictionary of Italian collocations concerning the semantic field of fear. This working definition includes on the one hand a phraseological concept of collocation including idiomatic expressions, and on the other hand a wider concept of collocation as a combination with a high degree of familiarity in the speaker’s mental lexicon. The use of the electronic medium makes it possible for each headword to be treated very accurately at the microstructural and mediostructural level, exploiting both formal and conceptual parameters for the description of collocational meanings and their syntactic patterns.

Veronica Benigno and Olivier Kraif discuss the concepts of ‘core vocabulary’ and ‘core collocations’ and their implications for the treatment of collocations in monolingual learner phraseological dictionaries. They present the findings from a corpus-based study combining statistical analysis and native speakers’ evaluation in order to isolate the features that can be used to filter out core collocations from a set of potential candidates identified from a given pivot. The study shows that statistical measures such as frequency are appropriate but not sufficient to identify core collocations in language, because native speakers tend to assign more value to highly restricted and fixed units regardless of their frequency of occurrence. These findings are directly connected with the third part of the paper, which deals with phraseology from the pedagogical and lexicographical perspective of learner dictionaries of collocations. This section argues that both frequency and usefulness should be considered as main organizing principles for this kind of dictionary. Practical examples extracted from the Longman Dictionary of Contemporary English (LDOCE), 5th edition, illustrate this point.

Francis Grossmann and Agnès Tutin analyse Noun Prep Noun constructions, which represent a challenge for the study and classification of collocations in both general and specialised discourse. They choose to focus on cross-disciplinary scientific lexicon using a large corpus of scientific papers, and they analyse collocation candidates according to a list of parameters that includes the semantic characterization of N1 and N2, the role played by the preposition, the presence or absence of determiners behind the N2, and so on. They propose a typology of five Noun Prep Noun constructions: a) objective genitive constructions, b) subjective genitive constructions, c) predicative structures, d) specification structures, and e) classification structures. The study shows that among these types of constructions, two appear to be more directly linked to the emergence of collocations: predicative and specification structures (hypothèse de départ, recherche de terrain). The authors thus validate an approach that can help linguists and lexicographers to evaluate the collocational status of a construction in scientific lexicon. This approach is based on various criteria (syntactic, semantic, and sometimes pragmatic), in addition to statistical measures.
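The combination of statistical measures with further criteria, on which both of the studies summarised above rely, can be pictured with a small sketch. The Python code below is only an illustration under stated assumptions and not the procedure used by either pair of authors: it scores adjacent word pairs with pointwise mutual information (one association measure among many) and then keeps only those pairs that also appear in a hand-made list standing in for native speakers’ judgements. The function names, the toy corpus, the thresholds and the `accepted_by_speakers` set are all hypothetical.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs with pointwise mutual information (PMI).

    Pairs seen fewer than `min_count` times are skipped, because PMI
    is unreliable for very rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


def hybrid_candidates(tokens, accepted_by_speakers, threshold=1.0):
    """Keep pairs that pass the statistical filter AND the human filter.

    `accepted_by_speakers` is a hypothetical stand-in for the outcome of
    a native-speaker evaluation (a set of word pairs judged acceptable).
    """
    scores = pmi_scores(tokens)
    return sorted(
        pair
        for pair, score in scores.items()
        if score >= threshold and pair in accepted_by_speakers
    )


# Toy corpus, invented for illustration only.
tokens = (
    "heavy rain fell . heavy rain stopped . the rain fell . "
    "the dog ran . the dog sat"
).split()
judged_ok = {("heavy", "rain"), ("strong", "tea")}
print(hybrid_candidates(tokens, judged_ok))
```

On this toy input, pairs such as "the dog" reach a high PMI score but are dropped by the human filter, which is the kind of divergence between statistical salience and speaker judgement that a hybrid method is meant to handle.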

The final chapter of the volume is by Luigi Matt. Four Italian dictionaries of collocations recently published (Urzì 2009; Russo 2010; Tiberii 2012; Lo Cascio 2012, 2013) are illustrated in detail. The analysis focusses on all aspects of dictionary compiling, going from titles and target readership, to theoretical aspects, choice of headwords and collocations, treatment of collocations, definitions, examples and diaphasic markers. Matt shows that these dictionaries represent a first step taken by Italian linguists to fill the lexicographic gap between Italian and other major European languages. The most relevant features of each dictionary, including its main advantages and drawbacks, are described in the concluding chapter.

References

Bergenholtz, Henning / Gouws, Rufus 2013. A lexicographical perspective on the classification of multiword combinations. International Journal of Lexicography. 27/1, 1–24.
Coseriu, Eugenio 1967. Lexikalische Solidaritäten. Poetica, 1, 293–303.
Cruse, Alan 1986. Lexical Semantics. Cambridge: Cambridge University Press.
Evert, Stefan 2005. The Statistics of Word Cooccurrences: Word Pairs and Collocations. Stuttgart: University of Stuttgart, IMS.
Evert, Stefan 2008. Corpora and collocations. In Lüdeling, Anke / Kytö, Merja (eds) Corpus Linguistics. An International Handbook. Vol. 2. De Gruyter, 1212–1248.
Firth, John Rupert 1957. Papers in Linguistics 1934–1951. Oxford: Oxford University Press.
Grossmann, Francis / Tutin, Agnès (eds) 2003. Les collocations. Analyse et traitement. Travaux et recherches en linguistique appliquée. Amsterdam: De Werelt.
Grossmann, Francis / Tutin, Agnès 2002. Collocations régulières et irrégulières: esquisse de typologie du phénomène collocatif. Revue française de linguistique appliquée. (Lexique: problèmes actuels). 7/1, 7–25.
Hausmann, Franz Josef 1989. Le dictionnaire de collocations. In Hausmann, Franz Josef et al. (eds) Dictionaries. An International Encyclopedia of Lexicography. Vol. 2. Berlin/New York: De Gruyter, 1010–1019.
Hausmann, Franz Josef 1998. O diccionario de colocaciónes. Criterios de organization. In Ferro Ruibal, Jesus (ed.) Actas de I Coloquio galego de Fraseoloxía. Santiago de Compostela, Centre Ramon Piñeiro: Xunta de Galicia, 63–81.
Heid, Ulrich / Weller, Marion 2010. Corpus-derived data on German multiword expressions for lexicography. In Proceedings of the 6th International Conference on Language Resources and Evaluation, 331–340.
Lo Cascio, Vincenzo (ed.) 2012. Dizionario combinatorio compatto italiano. Amsterdam/Philadelphia: John Benjamins.
Mel’čuk, Igor / Clas, André / Polguère, Alain 1995. Introduction à la lexicologie explicative et combinatoire. Paris/Louvain-la-Neuve: Duculot.
Mel’čuk, Igor / Polguère, Alain 2006. Dérivations sémantiques et collocations dans le DiCo/LAF. Langue française. 150, 66–83.
Mel’čuk, Igor 1998. Collocations and Lexical Functions. In Cowie, Anthony P. (ed.) Phraseology. Theory, Analysis, and Applications. Oxford: Clarendon Press, 23–53.
Mel’čuk, Igor 2003. Collocations: définition, rôle et utilité. In Grossmann, Francis / Tutin, Agnès (eds) Les collocations. Analyse et traitement. Amsterdam: De Werelt, 23–31.
Paillard, Michel 1997. Co-texte, collocations, lexique. In Guimier, Claude (ed.) Co-texte et calcul du sens. Caen: PUC, 63–72.
Rundell, Michael 2012. ‘It works in practice but will it work in theory?’ The uneasy relationship between lexicography and matters theoretical. In Proceedings Euralex 2012 (electronic publication).
Russo, Domenico 2010. Modi di Dire. Lessico italiano delle collocazioni. Roma: Aracne.
Sinclair, John 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Tarp, Sven 2008. Lexicography in the Borderland between Knowledge and Non-Knowledge. Tübingen: Niemeyer.
Tiberii, Paola 2012. Dizionario delle collocazioni. Bologna: Zanichelli.
Tognini Bonelli, Elena 1991. Corpus Linguistics at Work. Amsterdam: John Benjamins.
Urzì, Francesco 2009. Dizionario delle combinazioni lessicali. Luxembourg: Convivium.

Adriana Orlandi

Monolingual collocation lexicography: State of art and new perspectives

Abstract: In this paper we outline the state of the art of research in monolingual collocation lexicography. Our aim is to give the reader an overview of the main results attained by linguists and lexicographers in this domain, starting from the definition of collocation (Section 1), examining (semi)automatic extraction of collocation candidates useful for lexicographic work (Section 2), and going through to the organization of a dictionary of collocations (Section 3), with its macro- and microstructure (Sections 3.1 and 3.2). The last part of this paper (Section 4) is devoted to the development of electronic dictionaries. Each topic is dealt with by considering both lexicographic theory and practice, and aspects relevant to the development of future collocation lexicography are particularly emphasized.

Keywords: collocations – lexicography – definition – linguistic theories

1. Defining collocations from a lexicographic perspective

According to Williams, “collocation is not a grammatical or lexical structure that accepts a precise definition, but a phenomenon of language that can only be defined within a school of thought and with a particular application in mind” (2013: 91). Three important issues come to the fore. First, the (im)possibility of specifying the concept of collocation, and most particularly the impossibility for a definition of collocation to be formulated outside a theoretical background. Second, the lexical and/or grammatical nature of the collocational phenomenon. Third, the relationship between the definition of collocation and its possible applications, which raises the question of a lexicographically viable definition. These three issues will be addressed in sections 1.1 to 1.3. We will then consider some open questions related to the practice of collocation dictionary making (section 1.4), and finally we will try to investigate the possibility of a prototypical approach to collocations (section 1.5).

1.1 Collocations and linguistic theories

It is widely known that, since its origins, the concept of collocation has been tightly linked to linguistic theories. Although Firth’s writings were collected and published later (Firth 1957; Palmer 1968), the word collocation was introduced by J.R. Firth, the father of the Contextualist approach, as early as the 1930s. Underlying the notion of collocation was the concept of frequently recurrent word combinations and the idea that the meaning and usage of a word, or node, can be characterized by its most typical collocates. As Williams (2003: 34) points out, Firth’s Contextualism was a reaction against Structuralism and a way to contribute to Hornby’s pedagogical lexicography. Contextualism encouraged the study of languages in context in order to improve understanding, teaching, and translating tasks. This approach dates back to Dr. Johnson’s lexicography. He undertook the description of words’ actual usage in contrast with the prescriptive tradition which aimed to impose the ‘correct’ use of a word.

The Birmingham School, led by John Sinclair, carried on Firth’s work and gave great impetus both to corpus linguistics and to computational lexicography (Sinclair 1966, 1991), giving rise to the COBUILD project, the first large-scale corpus established from a lexicographic perspective, which culminated in the production of the famous Collins COBUILD English Dictionary (1995). Since then, the empirical approach to collocations has been attempting to develop statistical parameters that assess the degree of significance of a co-occurrence, in order to provide good collocation candidates and thus improve methods for extracting collocations, especially for lexicographic purposes (see Section 2).

Another well-known use of the word collocation relates to phraseology.
In phraseology, the term refers to (semi-)compositional and lexically determined word combinations.1 This is seen as a more theoretical definition because it is more clearly concerned with the description of the lexical restrictions on the elements that make up a collocation (see Lo Cascio and Prandi in this volume) and understands a collocation as a binary relation between base and collocate, where the collocate fully realizes its meaning only when coupled with its base (confirmed bachelor, to withdraw money, etc.). Moreover, collocations are seen as units of langue and not of parole in the Saussurean sense. This goes back to Bally’s notion of groupement usuel (1909).2 It was developed by Hausmann (1979, 1989, 1999), and has found widespread acceptance: suffice it to mention the work carried out by Grossmann and Tutin (2002, 2003), and the formal approach to collocations developed by Mel’čuk and associates within the context of Meaning-Text Theory, where collocations are an application of the notion of Lexical Function (Mel’čuk/Clas/Polguère 1995; Mel’čuk 2003; Mel’čuk/Polguère 2006). Like the empirical approach to collocations, the phraseological approach is strongly linked to lexicography. Hausmann’s work on collocations aims essentially at discussing the conditions and criteria under which a collocation dictionary can be compiled, and one cannot but mention the DEC project (Dictionnaire explicatif et combinatoire du français contemporain, 1984, 1988, 1992, 2000) conducted by Mel’čuk, and its more popularized version, the LAF (Lexique actif du français, 2007). As Evert points out, “there is considerable overlap between the phraseological notion of collocation and the more general empirical notion put forward by Firth […], but they are also different in many respects” (2008: 1213). Broadly, the phraseological notion of collocation tends to be more restrictive than the empirical one: “good and time are strongly collocated in the empirical sense, but a good time can hardly be understood as a non-compositional or lexically restricted expression” (2008: 1213).

1 Mel’čuk (2012: 36) rejects the idea of partial or semi-compositionality. In his view, a phraseme is either compositional or non-compositional. We will come back to compositionality in § 1.5.
2 “Il y a série ou groupement usuel lorsque les éléments du groupe conservent leur autonomie, tout en laissant voir une affinité évidente qui les rapproche, de sorte que l’ensemble présente des contours arrêtés et donne l’impression du déjà vu” (Bally 1909: § 83).

The reason is that the phraseological approach is more interested in a qualitative evaluation of the concept of lexical restriction, whereas the empirical approach is mainly based on statistics, and

therefore it does not consider the nature and type of lexical restriction as essential to the definition of collocation. To make things more complicated, the two main approaches to the definition of collocation are both open to more or less restrictive interpretations and to the natural evolution of theories. On the one hand, the phraseological definition of collocation can extend its limits to include some free word combinations or even morphology. For instance, in Meaning-Text Theory the Lexical Function Magn (intensification) groups together expressions such as peur bleue (which is a collocation in the narrowest sense) and peur intense (which is a free combination in a strict phraseological sense). Still within the framework of the Meaning-Text Theory, Beck/Mel’čuk (2011) introduce the concept of morphological phrasemes, that is, phraseologized expressions at the morphological level found in both derivation and inflection. In this case, collocations are no longer considered a lexical phenomenon because phraseologization is said to go far beyond the boundaries of lexicon: phraseologization is not restricted to a particular type of linguistic sign — that is, not just to phrases. The defining characteristics of phrasemes, paradigmatic restrictedness and syntagmatic noncompositionality, characterize in principle all types of complex linguistic sign. Therefore, word forms built out of morphemes show the same properties of phraseologization as phrases built out of lexemes. (Beck/Mel’čuk, 2011: 177)

On the other hand, the empirical approach does not guarantee uniformity in the selection of collocation candidates, and not only because of the great variety of statistical measurements available and of parameters that can influence the resulting set or ranking of collocations (see below). Another important question is the nature of collocations, lexical or grammatical, which leads us to the second issue raised by Williams’s (2013) claim.

1.2 Delimiting collocations: collocation and colligation

Firth drew a clear line between collocation and colligation. McEnery/Xiao/Tono (2006: 11) illustrate the evolution of the concept of colligation:

According to Firth (1968: 181), colligation refers to the relations between words at the grammatical level, i.e. the relations of ‘word and sentence classes or of similar categories’ instead of ‘between words as such’. But nowadays the term colligation has been used to refer not only to significant co-occurrence of a word with grammatical classes or categories […] but also to significant co-occurrence of a word with grammatical words […].

Thus, the term colligation has extended its meaning and is used nowadays to refer to a large variety of structures which can be extracted from corpora by means of different tools. If we take the word agreement, agreement about + N (or NP), agreement + that clause, or in agreement with + N (or NP) can be seen as colligations. The distinction between collocation and colligation is now basically represented by the difference between lexical and grammatical collocations, where grammatical collocation has become a synonym of colligation in its widest sense, that is a structure made up of “a dominant word (noun, adjective, verb) and a preposition or grammatical structure such as an infinitive or a clause” (BBI 2009: XIX). An idea of what kinds of phenomena are covered by the concept of grammatical collocation is given in the preface to the BBI Combinatory Dictionary of English, where eight types of grammatical collocations (G1–G8) are identified:

G1: Noun + Preposition: blockade against, apathy towards;
G2: Noun + to + Infinitive: They made an attempt (an effort, a promise, a vow) to do it;
G3: Noun + that clause: We reached an agreement that she would represent us in court;
G4: Preposition + Noun: by accident, in advance;
G5: Adjective + Preposition: they were angry at everyone;
G6: Predicate Adjectives + to + Infinitive: it was necessary to work / she is ready to go;
G7: Adjective + that clause: she was afraid that she would fail the examination;
G8: Verb patterns (19 different patterns): verbs that form a collocation with a specific preposition (to come by train), verbs that are followed by to + infinitive (he decided to come), etc.
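For readers who work with tagged corpus data, the eight-way typology above can be held in a simple lookup table. The following Python sketch is only an illustration: the category labels, the table name `BBI_GRAMMATICAL_TYPES` and the function `bbi_type` are assumptions made here, not part of the BBI, and the 19 verb patterns of G8 are deliberately collapsed into a single label.

```python
# Hypothetical category labels; the BBI itself does not prescribe them.
# Keys: (category of the dominant or first element, category of the second element).
BBI_GRAMMATICAL_TYPES = {
    ("noun", "preposition"): "G1",
    ("noun", "to-infinitive"): "G2",
    ("noun", "that-clause"): "G3",
    ("preposition", "noun"): "G4",
    ("adjective", "preposition"): "G5",
    ("adjective", "to-infinitive"): "G6",  # predicate adjectives
    ("adjective", "that-clause"): "G7",
}


def bbi_type(first, second):
    """Return the BBI label for a two-element pattern, or None if no type applies."""
    if first == "verb":
        # G8 covers 19 verb patterns; they are not distinguished in this sketch.
        return "G8"
    return BBI_GRAMMATICAL_TYPES.get((first, second))


# blockade against -> noun + preposition
example_label = bbi_type("noun", "preposition")
```

A table of this kind would have to be redrawn for each language, since the inventory of grammatical patterns is language-specific.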


Given that the structure of grammatical collocations can be different from language to language, it is hoped that a typology of grammatical collocations can be identified for each language. The distinction between lexical and grammatical collocations is another point that distinguishes the phraseological from the empirical approach to collocations. Only lexical collocations are taken into account by Hausmann (1989, 1999), mainly because, in his view, collocations are basically “un savoir lexical, […] une propriété du lexique” (Hausmann/Blumenthal 2006: 3). On this account, only lexical collocations should be included in a collocation dictionary.

1.3 Defining collocations for lexicographic purposes

Crucially, the third issue raised by Williams (2013: 91) is the relationship between the definition of collocation and its possible applications (“collocation […] can only be defined […] with a particular application in mind”). It is not clear whether theoretical reflection about collocations must be strictly related to a particular application of the concept. However, if the focus is on lexicography, with the final aim of producing tools to help speakers and language learners in activities such as decoding and encoding texts, it is clear that a working definition of collocation would be useful. It might even be suggested that in collocation lexicography the notion of collocation should be moulded around the lexicographic function(s) of a dictionary of collocations. As Tarp (2008: 168) points out, “It is a dictionary’s functions that […] determine which data it should contain and how this data should be structured and made accessible”. The core question, then, seems not to be “What is a collocation?” but, basically, “What is a collocation from a lexicographic perspective?”. On this point, two main positions can be recognized. The first one (Fuertes-Olivera et al. 2012) clearly rejects the ‘linguistic’ point of view on the definition of collocations because neither the phraseological nor the distributional (frequency-based) definition responds to the true nature of lexicography. As stressed by Fuertes-Olivera et al., “the construction of lexicographical information systems presupposes circumscribed data categories and therefore the linguistic concept of collocation cannot be translated into the realm of lexicography” (2012: 299). The reason why the linguistic definition of collocation is not adequate for lexicography is that it is not sufficiently precise and agreed on by members of the scientific community. Following Gries (2008), Fuertes-Olivera et al. (2012: 299) give a list of the issues connected to the formal definition of collocation which still have to be clearly circumscribed by theorists: (i) the nature of the elements involved in a collocation, (ii) the number of elements necessary for constituting a collocation, (iii) the degree of lexical and syntactic flexibility of the elements involved in a collocation, (iv) the role played by non-compositionality in defining a collocation, […] (v) the permissible distance between the elements involved in a collocation, (vi) the number of times an expression appears before it is considered a collocation, and (vii) the different units that are identified as collocations.
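Points (v) and (vi) can be made concrete with a small sketch. The Python fragment below is not taken from Gries (2008) or from Fuertes-Olivera et al. (2012): the function name `candidate_pairs`, its parameters `max_distance` and `min_count`, and the toy sentence are all illustrative assumptions. It merely shows that the set of collocation candidates changes as soon as the permissible distance between the elements is changed, which is one reason why an unspecified definition cannot be applied mechanically.

```python
from collections import Counter


def candidate_pairs(tokens, max_distance=1, min_count=2):
    """Return ordered word pairs (left, right) in which `right` occurs within
    max_distance positions to the right of `left` at least min_count times."""
    counts = Counter()
    for i, left in enumerate(tokens):
        # Look ahead up to max_distance positions to the right of `left`.
        for right in tokens[i + 1 : i + 1 + max_distance]:
            counts[(left, right)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Toy corpus (hypothetical), already tokenised and lower-cased.
tokens = "she made a decision and he made a hard decision".split()

adjacent = candidate_pairs(tokens, max_distance=1, min_count=2)
wider = candidate_pairs(tokens, max_distance=3, min_count=2)
```

With this toy sentence, the only adjacent pair that reaches the threshold is made + a, whereas the verb-noun pair made + decision only surfaces when up to three positions are allowed between the two words.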

Given these unresolved problems, collocations are then defined in a way that differs considerably from the linguistic approach to collocation:

The term collocation was chosen as an umbrella term for referring to word combinations that are typical for the kind of language in question, and which can be useful for re-use in text production or for assisting in text translation. They are composed of two or more orthographic words, do not constitute a full sentence, but offer potential users the possibility of obtaining relevant information. (Fuertes-Olivera et al. 2012: 299)

Here the definition of collocation is tightly linked to the notion of relevance (Tarp 2008; Fuertes-Olivera/Nielsen 2011; Bothma/Tarp 2012), that is “the condition of being directly connected with the subject field, the dictionary function(s), the use situation in which the dictionaries are intended to be used, and the level of competence of the intended users” (Fuertes-Olivera et al. 2012: 294). Following this principle, a collocation can also coincide with a very long expression provided that it appears to be useful to dictionary users. This was the principle followed for the compilation of the accounting database described by Fuertes-Olivera et al. (2012), containing 26,000 collocations of the accounting domain. The so-called Accounting Dictionaries are a set of online specialized dictionaries connected to the database and intended particularly to help students and translators in encoding tasks. Collocations are used for offering relevant information (that is, information “directly connected with accounting, the dictionary functions […], and a user’s medium-to-low level of competence in accounting” [Fuertes-Olivera et al. 2012: 305]), but also implicit grammar knowledge and word knowledge.

Another approach to the definition of collocation that can be adequate for lexicography is modelled on the linguistic definition and consists of adopting a ‘functional definition’ which is not directly a ‘lexicographical definition’ but a reasonable compromise between linguistic theory and practical lexicographic needs. Within this frame, one definition that has received general approval among lexicographers is Bartsch’s (2004). In her view, collocations are understood as “lexically and/or pragmatically constrained recurrent co-occurrences of at least two lexical items which are in a direct syntactic relation with each other” (2004: 76). Bartsch carries out a statistical selection to extract collocation candidates, and then submits these candidates to qualitative criteria. Collocations are seen not only as the result of lexical constraints, but also of “pragmatic constraints on lexical selection” (Bartsch 2004: 178).3 As for the syntactic dimension, Bartsch argues that:

A syntactic relation between words in a word combination indicates a potential semantic relation. This relation can be assumed to become relatively stable and established in the language when the word combination occurs frequently in always the same constellation and meaning (2004: 71).

Thus, Bartsch’s definition of collocation “takes a middle road” (Evert 2008: 1213) between the theoretical (phraseological) and empirical (distributional and frequency-based) approaches, and it is sufficiently broad to be useful for lexicographic purposes. Moreover, it is favoured by those lexicographers who are interested in hybrid methods for the extraction of collocation candidates, that is, methods that are partly quantitative and partly qualitative (see Section 2.1). However, it should be noted that this definition focuses on lexical collocations, leaving grammatical collocations out.

3 “Other collocations become carriers of fixed and stereotyped embedded meanings which are deeply rooted in the cultural background of a linguistic community (e.g. age of consent, affirmative action)” (Bartsch 2004: 177).
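The two-stage logic of such hybrid methods (a statistical selection followed by a qualitative check) can be sketched as follows. This is a minimal illustration under stated assumptions, not Bartsch’s actual procedure: pointwise mutual information is used here only as a stand-in for whatever association measure is chosen, the counts are invented, and the qualitative stage is reduced to a hypothetical hand-made list of pairs judged to be in a direct syntactic relation.

```python
import math


def pmi(pair_count, left_count, right_count, total_pairs):
    """Pointwise mutual information (base 2) of a word pair."""
    return math.log2((pair_count * total_pairs) / (left_count * right_count))


def hybrid_select(pair_stats, total_pairs, min_pmi, passes_qualitative_check):
    """Stage 1: keep pairs whose PMI reaches min_pmi.
    Stage 2: keep only those accepted by a human-defined check."""
    statistical = [
        pair
        for pair, (n_pair, n_left, n_right) in pair_stats.items()
        if pmi(n_pair, n_left, n_right, total_pairs) >= min_pmi
    ]
    return [pair for pair in statistical if passes_qualitative_check(pair)]


# Invented counts: (pair frequency, frequency of first word, frequency of second word).
pair_stats = {
    ("withdraw", "money"): (8, 10, 40),
    ("of", "the"): (30, 200, 300),
    ("money", "yesterday"): (4, 40, 5),
}

# Hypothetical manual judgement: pairs standing in a direct syntactic relation.
syntactically_related = {("withdraw", "money")}

selected = hybrid_select(
    pair_stats,
    total_pairs=1000,
    min_pmi=2.0,
    passes_qualitative_check=lambda pair: pair in syntactically_related,
)
```

In this toy data set, of + the is discarded by the statistical stage, while money + yesterday passes it but is rejected by the qualitative stage, so that only withdraw + money is retained.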


Another functional definition of collocation is found in Giacomini (2010: 1184):

In a narrow sense, collocations were defined as idiomatic multiword expressions […], subject to restricted compositionality, substitutability and modifiability, and identifiable through standard idiom tests. In a broad sense, I considered collocations as familiar word combinations recurring in our mental lexicon […], mostly associated with typical scenes, i.e., situational contexts.

This twofold approach to collocations seems to assign the hard core of the category to lexical collocations in the phraseological sense, but is also open to considering other types of word combinations, and in particular familiar word combinations associated with specific situational contexts. Grammatical collocations do not appear to be included. Here again the definition of collocation is broad enough to be easily applied to lexicography. For a more detailed description of this approach see Giacomini in this volume.

A dictionary of collocations that has adopted a broad and layered concept of lexical collocation, very much like Bartsch (2004) or Giacomini (2010), is the Oxford Collocations Dictionary (2009). Having adopted a “pragmatic, rather than theoretical” (2009: V) approach to the question of which collocations to include in the dictionary, the editors decided to give

the full range of collocation – from the fairly weak (see a movie, an enjoyable experience, extremely complicated), through the medium-strength (see a doctor, direct equivalent, highly intelligent) to the strongest and most restricted (see reason, burning ambition, blindingly obvious) […]. Totally free combinations are excluded and so, for the most part, are idioms. Exceptions to this rule are idioms that are only partly idiomatic. An idiom like not see the wood for the trees has nothing to do with wood or trees, and is therefore excluded; but drive a hard bargain is very much about bargaining, even if the expression as a whole can be considered to be idiomatic. (2009: V–VI)

In this way the dictionary intends to satisfy the users’ needs (the need for students of English to “express [their] ideas naturally and convincingly” [back cover]). In a similar way, in the Dictionnaire des combinaisons de mots (2007), collocations (here named word combinations) are selected according to the following pattern:

des plus libres (ex. grand débat, obtenir une aide), aux plus idiomatiques (ex. colère noire, comptes d’apothicaire). En revanche, les locutions figées (ex. l’oreille basse, avoir voix au chapitre) et les associations très ponctuelles ou sans intérêt (ex. diverses prévisions, nouvelles certitudes) ont été écartées. (2007: VI)4

A dictionary of collocations that integrated the concept of grammatical collocation from its first edition (in addition to the concept of lexical collocation), thus expanding the notion of collocation to an even larger definition, is the above-mentioned BBI Combinatory Dictionary of English (2009):

Traditionally, the combination of words into grammatical patterns has been called colligation or complementation or construction (though in BBI it is called collocation, too) and its result has been called valency. A dictionary that provides both phraseology and valency is a dictionary of word combinations (3rd ed. 2009: VII).5

The BBI specifies exactly what type of grammatical and lexical collocations are included in the dictionary. As in the former cases, the dictionary “does not include free lexical combinations” (2009: XXXI), and “does not normally include idioms, i.e. frozen expressions in which the meaning of the whole does not reflect the meanings of the component parts” (2009: XXXIV). Some exceptions are made, such as phrases expressing a simile, like as free as a bird.

4 It is interesting to observe the use of terminology here. Fixed word combinations (locutions figées) are said to be left out of the dictionary, but the examples given by the authors (l’oreille basse, avoir voix au chapitre) clearly refer to idioms (non-compositional expressions), not simply to fixed word combinations. If idioms are necessarily fixed word combinations, fixed word combinations are not necessarily idiomatic (e.g. emergency landing). In order to signify both semantic fixedness (non-compositionality) and syntactic fixedness, Mel’čuk (see for instance Mel’čuk 2011) uses the term locution, as opposed to phraseme which refers only to syntactic fixedness, and G. Gross (1996) speaks of composé (or locution) exocentrique. The expression locution figée is thus not really appropriate in this context, and the use of the word idiomatique should also be specified.

5 The italics are added by the present author.


1.4 Open questions

An attempt is now made to summarize the main problems that lexicographers have to face when compiling and publishing a dictionary of collocations, as regards the definition of collocation. One of the first things to do, we have seen, is to decide whether or not to include grammatical collocations. The majority of collocation dictionaries do not appear to contain grammatical collocations, but this is an issue that should not be underestimated. It is likely that in the future lexicographers will pay increasing attention to grammatical collocations due to the importance that this class of constructions has in the process of language learning, especially considering the development of electronic dictionaries (see Section 4). These observations are echoed in Siepmann’s (2005: 2) definition of collocation, which accommodates grammatical collocations: “a collocation is any holistic lexical, lexico-grammatical or semantic unit which exhibits minimal recurrence within a particular discourse community”. This definition is explicitly presented as a “viable lexicographic definition of collocation”, based on the notions of statistical significance (Sinclair 1991) and holisticity (Siepmann 2003).6 It encompasses a huge variety of collocations:

(a) Colligation (you can stick your + NP, far be it from me to + INF, ignorer tout de + N, il n’y a qu’à + INF, ce/cette N [tradition, etc.] est resté(e), NP dans l’âme, typisch + N, etc.) […].
(b) Collocation between lexemes or phrasemes (just as + clause … so / in the same manner + clause, levy charges, briser ses chaussures, c’est-à-dire en l’occurrence, regarde où tu vas, bon ben, à la fin, etc.).
(c) Collocation between lexemes and semantic-pragmatic (contextual) features (beautifully + [result of creative activity], [uncertainty] + not so, [question] + eh bien, [expectation] + duly, [negative contextual aspect] + (not) detract from s.o.’s enjoyment, help! […]).
(d) Collocation between semantic-pragmatic features (e.g. long-distance collocations […]). (Siepmann 2005: 2)

6 Holisticity “refers to the facts that native speakers can ascribe meaning to general-language collocations even if these are divorced from context and that such units are intuitively considered as self-contained ‘wholes’ ” (Siepmann 2005: 2).


This passage accounts for the needs of modern lexicography and tries to frame the broadest lexicographic definition of collocation, including colligations and pragmatic features. As Siepmann points out, “the future [of lexicography] clearly belongs to collocation and colligation in the widest possible sense” (2005: 1). Back to lexical collocations, another open question concerns free combinations. Lexicographers must ask themselves to what extent free combinations are to be included in a dictionary: the question remains as to whether it is actually necessary to include combinations such as a good, bad, right, wrong decision or also regret a decision, as the semantic possibility of combining these words is neither unexpected nor beyond an advanced learner’s linguistic knowledge. (Götz-Votteler/Herbst 2009: 53)

One possible way to cope with this problem is suggested by Hausmann: il y a un problème du fait que la microstructure de l’article obéit à une logique onomasiologique, à savoir, du sens au mot, et que le sens peut souvent être exprimé par une combinaison libre. […] Voyons l’exemple hôtel. L’auteur de l’article doit se demander quels sont les verbes employés typiquement avec la base hôtel. On y descend, y couche, y passe la nuit, y loge, le quitte. Quitter semble plutôt autosémantique (son rayon combinatoire est immense), mais dans l’optique onomasiologique il ne semble pas inutile à sa place systématique. L’étranger peut ne pas avoir ce mot à sa disposition ou se demander: est-ce qu’on peut dire quitter un hôtel? (1999: 131)

Here, Hausmann underlines the positive role that some free combinations can play in a dictionary of collocations. This, of course, does not mean that all free combinations must be included in a dictionary, but encourages lexicographers to reflect upon the organization of the microstructure of the entries. Receiving free combinations in a collocation dictionary does not necessarily imply that free combinations are admitted within the range of what are defined, from a theoretical point of view, as collocations. It simply means that compiling a dictionary requires some practical adjustments in order to satisfy the users’ needs. As previously mentioned, the Oxford Collocations Dictionary (2009) accepts fairly weak combinations such as see a movie, an enjoyable experience, extremely complicated, and so does the Dictionnaire des combinaisons de mots (2007).


A third decision that lexicographers have to take is whether or not to include idioms in a collocation dictionary. As Heid/Weller observe,

As there is no clearcut boundary between more collocational and more idiomatic multiwords […], a dictionary aimed at several user types, usage situations and dictionary functions may need to cover both, collocations and idioms, in (almost) equal depth (2010: 332).

Heid (2008: 343) reminds us that the distinction between compositional and non-compositional (or only partial compositional) expressions is close […] to the way a dictionary user who reads a text and stumbles over a MWE would operate. For him or her, the most important question is not to know what type of MWE he or she is facing, but where in the text the MWE starts and ends, and what it means.

This is why Heid suggests that what is most crucial for NLP is not a particularly fine-grained subdivision into specific kinds of multiword phrases, but, in the first place, a classification into compositional and non-compositional (or only partly compositional) expressions (2008: 342).

These remarks raise a few interesting questions. The first concerns the degree of explicitness of information in dictionaries. If it is true that users are not necessarily interested in linguistic classifications, this does not mean that dictionaries can refrain from adopting a system of classification of MWEs. As Bergenholtz/Gouws point out:

lexicographers do need a classification system for the successful execution of their lexicographic endeavours in total. This does not mean that the lexicographer always has to convey such a classification to the users but in the execution of their tasks they do need a comprehensive classification system (2013: 11).

We believe that the classification of MWEs is not only a genuine task for linguistic phraseology, but that it plays an important role in lexicography too.


The second question concerns more specifically the presence of idioms in collocation dictionaries. If it is true that “a dictionary aimed at several user types, usage situations and dictionary functions may need to cover both”, idioms and collocations (Heid/Weller 2010: 332), one may ask what a dictionary aimed at more specific user profiles, usage situations and dictionary functions is supposed to do. Generally speaking, the presence of idioms in collocation dictionaries can of course be envisaged. This would, however, imply first of all abandoning purely ‘collocation’ dictionaries in favour of more general ‘word combination’ dictionaries. Moreover, if it is true that the communicative role of idioms and collocations is not the same because idioms appear to be less used than expected (Nuccorini 2003: 369), this difference should be taken into account, especially when compiling collocation dictionaries for encoding purposes.

A third issue relates to the lexicographical treatment of collocations and idioms. Nuccorini (2003: 368) observes that collocation dictionaries usually follow an onomasiological approach to the treatment of collocations, whereas dictionaries of idioms are based on a semasiological approach. This remark could be softened by saying that both idioms and collocations can undergo an onomasiological and a semasiological treatment. Once again, it depends on the dictionary type and function.

1.5 Envisaging a prototypical approach to collocations: the paradox of collocations

With regard to the definition of a collocation, one thing must be acknowledged: whatever the approach adopted (purely lexicographic or purely linguistic), the definition of a collocation has to be consistent with the characteristics of the dictionary one intends to compile. Thus, in the field of lexicography, Williams’s statement that a “collocation […] can only be defined […] with a particular application in mind” (2013: 91) is particularly relevant.
Adapting the definition of a collocation to lexicographic needs means, as we have seen, enlarging the boundaries of the category from collocations in a strict phraseological sense to “collocations and colligations in the widest possible sense” (Siepmann 2005: 1). This does not mean, however, that the concept of ‘collocation’ has to become so vague as to be emptied of all meaning. ‘Collocation’ is not a monolithic category but rather a heterogeneous one, so its definition cannot be moulded on the model of categories characterised by definite borders and whose members all share the same properties, but rather on the model of categories provided with a core and a complex stratification of peripheral members. This is why we think that an approach such as that of Prototype Theory could be envisaged to outline an appropriate definition of ‘collocation’. Prototype Theory, born in the domain of cognitive studies (Rosch 1973), has found many applications in linguistics, starting from the semantic domain (Kleiber 1990) and going on to the study of the lexicon, where the theory has been applied to the definition of lexical categories such as adjectives (Goes 1999). It has also reached lexicography and most particularly the treatment of meaning in dictionaries (Geeraerts 2013). As Prandi points out,

a prototype is neither an image of a paradigmatic instance […] nor a simple collection of empirical data […]. A prototype is not part of actual experience, but a model filtering our access to experience. The prototype of bird, for instance, is neither the image of a given bird nor a collection of random data about birds, but a hierarchy of shared ideas about what a bird should ideally be and what a bird is empirically allowed to be – a filter organizing the experience of real birds (2004: 192).

The search for this kind of “abstract prototype” (Goes 1999: 38) or “prototype-entité construi[t] d’attributs typiques” (Kleiber 1990: 65) has not yet been conducted in the field of phraseology, nor in the field of collocations in particular. Rather, what has been looked for is the best representative of the category, or a prototypical collocation. Interestingly, this path seems to lead to a paradoxical situation. According to the traditional phraseological view,

collocations are prototypically combinations of two lexical words which stand in some kind of grammatical relationship to one another. The meaning of one of these (the collocate) can be said to emerge from its use with the other word (the base). (Mittmann 2013: 500)

Adriana Orlandi

To agree with this position means recognizing a quite obvious fact: that collocations are essentially a lexical restriction, that is, a kind of “contrainte des mots sur les mots” (Hausmann 1979: 191). However, the implication of this statement is not obvious at all. If collocations are above all lexical restrictions, then prototypical collocations (that is, the most representative examples of lexical restrictions) cannot but coincide with expressions such as (fr) peur bleue, chocolat viennois or arme blanche, that is, with opaque collocations (Grossmann/Tutin 2003). As a matter of fact, since the meaning of bleue, viennois and blanche depends on their association with, respectively, peur, chocolat and arme, these examples highlight the process of lexical restriction where, as stated by Hausmann, “le collocatif […] ne réalise pleinement son signifié qu’en combinaison avec une base” (1979: 192), or as stated by Firth (1957: 11), one knows a word by the company it keeps. What we find paradoxical is that many linguists in fact hold the opposite view of what counts as a prototypical collocation.

Opaque collocations show a certain degree of semantic and syntactic fixedness which makes them difficult not only to encode but also to decode. For this reason, these collocations are considered peripheral members of the category, and they are at the opposite side of prototypical collocations which are compositional and hence, easily interpretable, e.g. rouge de honte ‘red with shame’, peinture fraîche ‘fresh paint’, avoir froid ‘to be cold’, etc. As pointed out by Hausmann (1997: 282), such collocations are easily decoded by foreign language learners, even if their encoding may be problematic, due to their idiosyncratic character. (Lamiroy in this volume)

The problem, so to speak, with these collocations, which Grossmann/Tutin (2003) call regular collocations (collocations régulières), is that they are not based on a lexical restriction; rather, as argued by Orlandi (2013), they are grounded on selection restrictions, or at most on cognitive models, a kind of content-based restriction belonging to a system of concepts that is not language-specific but shared by a very large community (see also Prandi in this volume). This is why they are easily decodable and to some extent predictable (cf. also Tutin 2013). What

differentiates these collocations from free combinations, then, is mostly syntactic and cognitive fixedness7. The paradox of collocations can thus be outlined in the following way: if the starting point for the definition of a collocation is the traditional concept of lexical restriction, the prototypical core of the category has to be identified with opaque collocations, that is, true lexical restrictions. However, if we favour compositionality as the main definitorial property of collocations (together with syntactic and cognitive fixedness), we end up by isolating regular collocations as the nuclear layer of the category, a subset which is not based on lexical restrictions but on selection restrictions or cognitive models. To solve this dilemma one can go back to the semantic notion of a prototype. According to Kleiber, “le prototype n’est vraiment considéré comme le meilleur exemplaire que s’il apparaît comme celui qui est le plus fréquemment donné comme tel” (1990: 49). If it is true that we have to rely on the speaker’s intuition to find the prototypical instances of a category, then we would probably turn to opaque collocations as being the most representative instances of the category. Given a word such as alley, a native speaker or even a non-native speaker would probably think of blind alley as a prototypical instance of a collocation rather than dark or empty alley. However, this is a matter for further research in the domain of psycholinguistics. What should be stressed here is that the definition of a collocation would certainly benefit from the application of the Prototype Theory both at a theoretical level and in the practice of lexicography. At a theoretical level, adopting a definition based on the idea of a prototype or prototypical properties would allow the description of collocations without specifying each time if an expression is a collocation ‘in a large sense’ or ‘in a strict sense’. One would speak in terms of prototypical vs. 
non-prototypical collocations. Moreover, this kind of approach would explain the terminological fluctuation when considering collocations with regard to compositionality. To date, there is no

7 By cognitive fixedness we mean that “les locuteurs savent que les mots apparaissent ensemble dans telle ou telle construction et que l’utilisation de l’expression est conventionnelle et partagée par la plupart des locuteurs” (Svensson 2002: 777).

agreement in scientific literature. Some scholars consider collocations as mainly or totally compositional expressions (Mel’čuk 2011; Tutin 2013; Squillante 2014a), while others prefer to talk of semi-compositionality (Seretan 2009). If it is true that “idiomaticity implies an element of non-compositionality” (Cap/Weller/Heid 2013: 35), then collocations where the collocate has an idiomatic meaning (fr. peur bleue, chocolat viennois or arme blanche) are admittedly only partially compositional. Some authors try to bring these examples back to the notion of full compositionality by saying that collocations are compositional “dans la mesure où il est possible d’associer à chaque élément de l’expression un contenu sémantique et de calculer le sens de l’ensemble à partir des composants” (Tutin 2013: 57), so peur bleue is compositional because bleue refers to intensity. We believe that defining collocations by means of the Prototype Theory would allow consideration of both full and partial compositionality as properties equally acceptable for collocations, with some collocations being more compositional than others. In the practice of lexicography, the advantage of adopting a definition based on the Prototype Theory would be the possibility of having a flexible concept fit for lexicographic purposes. It would cover the whole range of collocations, taking into account differences in specific types of collocations, and this definition could be adapted to the specific lexicographic functions of a dictionary. Just to give one example, frequency can be considered a prototypical feature of collocations, but as this property does not necessarily have to be shared by all members of the category, this would not prevent lexicographers from including, especially in specialised dictionaries, collocations with a low frequency of occurrence. 
As a matter of fact, it is common knowledge that “frequent lemmas are not very useful in specialised lexicography” (Fuertes-Olivera/Tarp 2014: 204). To conclude this section, research still has to be conducted in order to verify the possibility of applying the Prototype Theory to the concept of ‘collocation’. The search for prototypical instances of the category leads to a paradoxical situation where opaque collocations can be said to be prototypical as they enhance the concept of lexical restriction, and regular collocations can be said to be prototypical as

they embody compositionality in collocations. Maybe we should try to establish a hierarchy of shared ideas about what a collocation should ideally be and what a collocation is empirically allowed to be.

2. Extraction of collocation candidates

The automatic extraction of collocation candidates from corpora has become one of the most challenging issues in collocation lexicography. According to Gries, this can be viewed as an “[attempt] to come to a potentially much more generally applicable definition of ‘collocation’ ” (2013: 139), and this is the reason why we dedicate this chapter to extraction methods. Computational phraseology does not deal exclusively with collocations. Its domain of interest concerns the so-called MWE (multi-word expressions), which assemble all the phraseological units, “a broad continuum between non-compositional (or idiomatic) and compositional groups of words” (Moon 1998, in Heid 2008: 339). Heid (2008: 340) lists 13 types of MWE which have already been analysed by computational phraseologists, and among which we find colligations, proverbs, idioms, routine formulae of conversation and, of course, collocations. Hausmann (1989) and Tutin (2010a), among others, underline the importance of exploiting two types of sources to obtain collocation candidates: corpora and dictionaries, both of which allow automatic or semi-automatic extraction. In the next two sections we will try to summarize the main achievements in both research domains.

2.1 Extraction from corpora

The way a corpus is preprocessed is fundamental to determine if co-occurrence between words has to be interpreted as surface, textual or syntactic proximity (Evert 2008). Thus, “depending on extraction objective, raw text corpora, parts-of-speech (POS)-tagged and lemmatised corpora, or chunked and possibly parsed corpora may be used” (Heid

2008: 350). Preprocessing can also vary from language to language. For English, for instance, where morphological variation is limited, POS-tagged and lemmatised corpora have proved to be better suited. However, for German or Turkish the POS-pattern approach appears to be less convincing, and a dependency-parsing approach has proved to be more efficient (Uhrig/Proisl 2012). Different methods have been developed for the extraction of collocation candidates from corpora. These can be roughly divided into three categories: (mainly) statistical procedures, (mainly) symbolic procedures, and hybrid procedures. According to the statistical approach, “collocativity can be operationalised in terms of cooccurrence frequencies and quantified by mathematical association measures” (Evert 2008: 1242), which measure the attraction between cooccurring words. The number of association measures (AMs) has increased enormously over the years. One of the latest surveys of AMs (Pecina 2009) counted 82 different measures, but the number is expected to grow. An important factor that has often been emphasized is that “there is no single ‘best measure’ and that the selection of a measure may depend on the kinds of phraseological units the researcher wants to extract” (Heid 2008: 352; see also Evert 2005 and 2008; Barnbrook et al. 2013). Evert suggests that “It is perhaps better to apply several measures with well-understood and distinct properties than attempt to find a single optimal choice” (2008: 1243). The most recent trends in the elaboration of AMs suggest the development of directional measures, that is, measures that are able to account for the asymmetric status of words in a collocation or MWE: “p(word1|word2) [where p stands for probability] is not the same as p(word2|word1), just compare p(of|in spite) to p(in spite|of)” (Gries 2013: 141).
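As a minimal illustration of how association and directionality can be quantified, the following Python sketch computes pointwise mutual information (one common symmetric association measure) together with the two conditional probabilities and the corresponding ΔP values for a word pair. The helper name and all counts are invented for illustration; ΔP is taken here to be p(outcome|cue) minus p(outcome|cue absent), and the sketch does not claim to reproduce the procedure of any study cited above.

```python
import math

def association_scores(f12, f1, f2, n):
    """Toy association scores for a word pair (hypothetical helper).

    f12: number of co-occurrences of word1 and word2
    f1, f2: total frequencies of word1 and word2
    n: total number of observations in the corpus
    """
    # Symmetric measure: pointwise mutual information (observed vs. expected).
    pmi = math.log2((f12 * n) / (f1 * f2))
    # Directional conditional probabilities.
    p_w2_given_w1 = f12 / f1
    p_w1_given_w2 = f12 / f2
    # Delta P: p(outcome | cue present) minus p(outcome | cue absent).
    delta_p_w2_given_w1 = p_w2_given_w1 - (f2 - f12) / (n - f1)
    delta_p_w1_given_w2 = p_w1_given_w2 - (f1 - f12) / (n - f2)
    return {
        "pmi": pmi,
        "p_w2_given_w1": p_w2_given_w1,
        "p_w1_given_w2": p_w1_given_w2,
        "delta_p_w2_given_w1": delta_p_w2_given_w1,
        "delta_p_w1_given_w2": delta_p_w1_given_w2,
    }

# Invented counts: "in spite" occurs 100 times, "of" 10,000 times,
# and they co-occur 90 times among 1,000,000 observations.
scores = association_scores(f12=90, f1=100, f2=10_000, n=1_000_000)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

With these invented counts, "of" is highly predictable after "in spite", while "in spite" is a very unlikely continuation given "of"; a symmetric score such as PMI cannot show this difference, whereas the two directional values can.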
Gries complains about the fact that (as of 2009) “the issue of directionality has not received the attention that one of the most important notions in our field deserves” (2013: 151). The importance of directional measures for lexicography is that, as they provide directionality information, they can be useful to improve the ‘directionality’ of dictionary entries, “i.e. which part of a complex expression to choose as the headword under which an expression will be found” (Gries 2013: 152). Among the most promising directional measures we can mention Gries’ ΔP (Delta P, 2013) and Colson’s CPR (Corpus Proximity Ratio, 2014). Additionally, lexicographers may also apply other procedures, symbolic or hybrid. These methods have received great attention in the last ten years, and studies have become so numerous that it is impossible to give a thorough account of their achievements here. We will just mention a few of them, trying to account for the diversity of approaches. Symbolic procedures consider morpho-syntactic preferences (number or case preferences, definiteness, etc.) as indicative of collocativity (Bannard 2007; Weller/Heid 2010). Another type of symbolic procedure, the so-called semantic procedures, are not directly connected to the semantics of collocations; rather, they are ‘approximation’ procedures which measure the semantic opacity of MWEs by means of context analysis, substitutability and/or word alignment. Context analysis aims to show that compositional MWEs appear in contexts more similar to their constituents than non-compositional MWEs (Baldwin et al. 2003; Katz/Giesbrecht 2006). The underlying hypothesis of this kind of semantic approach is that

items with similar contexts share meaning components […]. Thus, nouns denoting pet animals (dog, cat, hamster) will occur with similar verbs (e.g. keep or feed). To identify hot dog as not being a kind of ‘dog’ (and thus being a non-compositional expression), one may check for typical verbs showing up with hot dog and compare its typical contexts with those of other words; verbs like eat, cook, serve will be shared by hot dog, meal, burger, sandwich, etc. (Heid 2008: 353)
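The context-analysis idea in the quotation above can be sketched in a few lines of Python. The verb counts below are invented toy data, and cosine similarity over raw counts is only one possible way of comparing contexts (an assumption made here for illustration, not the method of the studies cited).

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bags of context words."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Invented counts of verbs observed with each expression.
contexts = {
    "hot dog": Counter({"eat": 5, "cook": 3, "serve": 2}),
    "dog": Counter({"keep": 4, "feed": 4, "walk": 2}),
    "sandwich": Counter({"eat": 6, "serve": 3, "cook": 1}),
}

sim_to_dog = cosine(contexts["hot dog"], contexts["dog"])
sim_to_sandwich = cosine(contexts["hot dog"], contexts["sandwich"])
print(f"hot dog vs dog: {sim_to_dog:.2f}")
print(f"hot dog vs sandwich: {sim_to_sandwich:.2f}")
```

In this toy setting the contexts of hot dog resemble those of sandwich far more than those of its own constituent dog, which is the kind of signal a context-based procedure would read as an indication of non-compositionality.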

Studies on substitutability identify non-compositional MWEs by measuring to what extent their constituents can be substituted by synonymic terms (Van de Cruys / Villada Moirón 2007). Substitutability can also combine with syntactic procedures (Squillante 2014b). This method has proved to be particularly effective thanks to its capacity to obtain very similar results to those achieved by statistical tests, but without involving statistical calculations. According to Squillante, “the PI [Prototypicality Index, the algorithm based on syntactic and semantic properties] works better on larger scales and appears to be useful to lexicographers who are interested in retrieving more efficiently MWEs when considering a high coverage” (2014b: 936). Coming to word alignment, this

method uses parallel corpora to identify MWEs (Caseli et al. 2009). Tools have been developed for the automatic alignment of texts, like, for instance, the aligner Giza++ (Och/Ney 2003). In this case, “the assumption is that single words used compositionally will usually be consistently translated by one equivalent or one of very few alternatives” (Heid 2008: 353). Word alignment is more often combined either with (morpho)-syntactic procedures (Zarrieß/Kuhn 2009; Weller/Fritzinger 2010; Tsvetkov/Winter 2011; Cap/Weller/Heid 2013), or with statistical procedures, thus generating a hybrid extraction method (Ramisch et al. 2010; Tsvetkov/Winter 2012). As previously stated, hybrid methods are a combination of statistical measures and symbolic procedures. Hybrid methods can combine: statistical measures and semantic procedures (as mentioned above); statistical measures and syntactic analysis (Seretan 2009 and 2011); and statistical measures, syntactic and semantic procedures (Heid/Weller 2010; Squillante 2014a). In this case a very rich feature set is adopted. The extraction methods described are developing very fast, becoming more and more effective and accurate in the extraction of MWEs, with higher percentages of precision and recall. All of them have implications for the definition of a collocation. On the one hand, symbolic and hybrid procedures provide useful criteria for distinguishing collocations from idioms. Just to give an example, one of the most recent studies on Italian MWEs (Squillante 2014a) has proved, thanks to the application of a hybrid procedure measuring inflection variability, interruptibility and substitutability, that multiword units8 are characterized by low values of interruptibility and low values of substitutability. Lexical collocations can be more easily interrupted if they have low values of substitutability, while they do not allow for interruptibility if they have high substitutability. (Squillante 2014a: 80)

Moreover, “inflection variability does not play a role in discriminating between the categories” (Squillante 2014a: 79). These empirical results represent an important step towards a deeper knowledge of what collocations are, and of the boundaries between collocations and idioms.

8 The expression multiword units is here used as a synonym of idiom.
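To make the notions of interruptibility and substitutability discussed above more concrete, here is a small Python sketch. Both function definitions are simplifying assumptions introduced for illustration (they are not the indices used by Squillante), and the tokenised sentences and counts are invented.

```python
def interruptibility(sentences, first, second):
    """Share of occurrences in which other tokens separate the two components.

    Hypothetical definition: an occurrence is any sentence in which
    `second` follows `first`; it counts as interrupted if they are not adjacent.
    """
    hits = 0
    interrupted = 0
    for tokens in sentences:
        if first not in tokens:
            continue
        i = tokens.index(first)
        if second not in tokens[i + 1:]:
            continue
        j = tokens.index(second, i + 1)
        hits += 1
        if j - i > 1:
            interrupted += 1
    return interrupted / hits if hits else 0.0

def substitutability(original_count, variant_counts):
    """Share of all attested uses realised with a substituted component.

    Hypothetical definition: variant_counts maps each synonym-substituted
    variant to its corpus frequency.
    """
    variants = sum(variant_counts.values())
    total = original_count + variants
    return variants / total if total else 0.0

# Invented, pre-tokenised toy corpus.
corpus = [
    ["pay", "attention"],
    ["pay", "close", "attention"],
    ["pay", "attention"],
    ["pay", "more", "attention"],
]
inter = interruptibility(corpus, "pay", "attention")
subst = substitutability(40, {"give attention": 10})
print(inter, subst)
```

Low values on both scores would point towards idiom-like fixedness, while higher values would suggest a looser combination; the thresholds separating the categories would have to be established empirically.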

On the other hand, statistical measures can also contribute to better circumscribing the definition of a collocation. However, in order to do so they have to take into account the following issues (issues 1–5 are taken from Gries 2013; issue 6 is taken from Colson 2014):

1. directionality (the fact that there is a difference between a MWE read from left to right and one read from right to left);
2. dispersion (the distribution of MWEs over corpora);
3. type-token distributions;
4. extendability of statistical measures to MWEs made of more than two words;
5. cognitive and/or psychological relevance;
6. encapsulation (the fact that many MWEs are part of a larger expression or contain themselves smaller expressions).

The Corpus Proximity Ratio, the algorithm developed by Colson (2014), satisfies properties 1, 2, 4 and 5. What is particularly interesting is that CPR does not correlate with frequency of occurrence9. Concerning the definition of a collocation, this new measure leads to a definition very much like the functional definitions discussed in section 1.3, which explicitly consider the cognitive-psychological dimension in the description of what a collocation is.

2.2 Extraction from dictionaries

With Hausmann (1989: 1012) we claim that “On a […] intérêt à profiter de l’ensemble des collocations recensées par l’ensemble des dictionnaires existants”. Hausmann (1999) suggests exploiting information about collocations in general monolingual dictionaries, bilingual dictionaries and specialized dictionaries. Dictionaries are an important repository of collocations, but only a few works have used this approach in NLP (natural language processing). Tutin (2010a) mentions the pioneering work of Fontenelle (1997), who extracted collocations out of

9 A new version of CPR presented by Colson at EUROPHRAS 2015 in Malaga combines fixedness and frequency, and is used to differentiate MWEs within a continuum going from “partly fixed” to “very fixed and frequent”.

the bilingual dictionary Collins-Robert and classified them according to the model of Lexical Functions. Another experiment is described in Lux-Pogodalla/Tutin (2008), where the two linguists used the Trésor de la Langue Française informatisé to extract collocations belonging to the scientific domain. Collocations are not easy to find in a paper dictionary, but the possibilities offered by electronic dictionaries open a path towards the automatic extraction of collocations and MWEs. For example, in the Petit Robert (online version) it is possible to isolate collocations and idioms by accessing the field “exemples et expressions”. In the TLFi the field “syntagme” is dedicated to lexical collocations.10 Collocations can be extracted out of a single entry, or by using the function “recherche assistée” or “recherche complexe”, which make it possible to find a word’s collocations across the whole dictionary. However, access to collocations in a general dictionary (in a paper or electronic format) remains problematic, because very often collocations are scattered across the dictionary, and their treatment is not consistent: collocations are sometimes entered under the base, other times under the collocate, and they can be found not only under the “SYNT.” head, but also in examples, definitions and quotations.11 This is why Hausmann (1996, 1999) talks about the “visible” and the “hidden” face of a general dictionary, where the visible face refers to the information directly accessible (in this case regarding collocations), while the hidden face denotes the submerged information accessible only after targeted searches.

10 A detailed description of the exact location of collocations in Petit Robert and TLF (both paper and electronic version) can be found in Tutin (2005).

11 Hausmann (1996: 41–42) observes that in TLF, collocations can be found at eight different places: “1. dans la rubrique SYNT. […], 2. dans le microcosme nodal, 3. comme unité de traitement interne sans indicateur de statut, 4. comme unité de traitement interne précédé de l’indicateur Loc., 5. dans les citations détachées, 6. dans les citations enchaînées, 7. dans les définitions, 8. dans l’information synonymique et antonymique”.

2.3 Extraction methods and collocation dictionaries

If we now turn to the introductions to some dictionaries of collocations, we find that reference to the methods used for the extraction of collocations is only rarely included. Tutin (2010b) and Corpas Pastor (this volume) give some useful information on this point. The BBI dictionary of English word combinations (1997) does not even mention the type of corpus used, whereas for the 2009 edition, “many sources have been used, including: Internet searches; The British National Corpus; Reading and listening to English-language material” (back cover). However, no information about extraction methods is available. The LTP dictionary of selected collocations (2002) is based on a press corpus but does not mention extraction methods. On the back cover, the Oxford collocations dictionary for students of English (2009) claims to be “Based on the authority of the Oxford English Corpus, a collection of nearly two billion words of text”, but nothing is explicitly said about extraction methods (we know, however, that the Oxford English Corpus is usually analysed with Sketch Engine). The Macmillan Collocations Dictionary (2010) “has been compiled using leading-edge collocation-finding software and a 2-billion word corpus of modern English” (back cover). In this case, information is explicitly given about the name of the software (Sketch Engine). Much more limited is the corpus used by the French Dictionnaire collocationnel du français général (1990), 2 million words, and collocations have probably been extracted, according to Tutin (2010b: 1086), by manual selection. The Dictionnaire des cooccurrences (2001) is based on a corpus of novels of the 19th and 20th centuries, newspapers and magazines. The new edition (Grand dictionnaire des cooccurrences, 2009), which “comporte environ 5000 entrées, soit 800 de plus que le Dictionnaire des cooccurrences, ainsi qu’un plus grand nombre d’adjectifs, de verbes et de locutions verbales pour chaque entrée”, does not give further information about corpora and extraction methods. To compile the Dictionnaire des combinaisons de mots (2007), “un corpus de 500 millions de mots […] (presse généraliste et spécialisée des 6 dernières années, textes littéraires contemporains) a été utilisé, exploité par un outil d’extraction des collocations” (Tutin 2010b: 1086), but information about the extraction methods was only obtained thanks to

personal communication with Dominique Le Fur. Finally, while claiming that it pays special attention to standard language, the more recent Dictionnaire combinatoire compact (2011) says nothing about corpora and extraction methods. One dictionary worthy of mention is the French Le Grand Druide des cooccurrences (2012). In the Introduction, the editors specify that their dictionary made double recourse to IT tools: Nous avons d’abord écumé l’Internet pour constituer un corpus totalisant 1,8 milliard de mots, ou 92 millions de phrases, tiré de plus de 2.400 sources distinctes. Parmi celles-ci, des sites journalistiques, comme ceux du Monde et du Devoir, des bibliothèques numériques, comme Gallica et Projet Gutenberg, et de nombreux sites d’intérêt général, le tout en proportions mutuellement pondérés. […] Nous avons ensuite mis à contribution […] l’analyseur d’Antidote, qui forme le cœur du logiciel d’aide à la rédaction […] que Druide a créé et continuellement raffiné depuis 1996. Comme l’analyseur d’Antidote sait traiter automatiquement de grandes quantités de textes, nous lui avons fait disséquer notre gigantesque corpus pour en extraire toutes les associations pertinentes. […] Un filtre statistique a permis de ne retenir que les combinaisons fortes, c’est-à-dire significativement fréquentes et singulières. (2012: XI)

Here very precise information is provided regarding the way cooccurrences were selected. One peculiarity of this dictionary is that it is the printed version of an electronic dictionary, the dictionary of cooccurrences of the Antidote software. This is very unusual: electronic dictionaries are most often the product and reworking of paper dictionaries, not the other way round. The electronic origins of Le Grand Druide des cooccurrences thus explain the use of IT tools and represent the strongest feature of this product. The statistical filter enabled extraction of cooccurrence candidates, but then a manual selection of cooccurrences was applied: “Nos linguistes se sont ensuite longuement penchés sur ces résultats bruts afin de s’assurer de leur exactitude et de leur intérêt, retranchant parfois certains résultats anodins ou inexacts” (2012: XI). A similar procedure has been adopted for Quasthoff’s Wörterbuch der Kollokationen im Deutschen (2011), where significant cooccurrences (“signifikante Nachbarschaftskookkurrenzen”) have been extracted via statistical measurements as a first step, and a manual selection was

then carried out to establish lists of collocates. The Feste Wortverbindungen des Deutschen (2014) contains 2000 entries belonging to common language, and collocations (95,000) are said to have been automatically extracted from electronic corpora, but no precise information is given about the procedure. Turning to Italian dictionaries of collocations, Matt (this volume) shows that information about corpora and extraction methods is very poor. Two notable exceptions are the Dizionario combinatorio compatto italiano (2012) and Dizionario combinatorio italiano (2013):

In making this dictionary many sources have been used, ranging from the knowledge and intuition of many native speakers to monolingual and bilingual dictionaries and electronic corpora such as Sketchengine, and others on the Internet delivering a lot of authentic texts. Also used was a database compiled over more than 30 years in order to compile an electronic, bilingual, Italian-Dutch dictionary, […] in which many of these combinations had already been researched. (2013: XI)

It is worth noting that monolingual and bilingual dictionaries are mentioned here as sources of collocations in combination with electronic corpora. As for Spanish dictionaries of collocations (see Capra and Corpas Pastor in this volume), the Diccionario combinatorio práctico del español contemporáneo “está elaborado a partir del mismo corpus de prensa española y americana (68 publicaciones periódicas; 250 millones de palabras) con el que se preparó REDES” (2006: XIV), but no reference is made to extraction methods.

3. Principles of organization of a collocation dictionary: Open questions

This section addresses some of the questions that characterize current scientific discussion on the organization principles of collocation dictionaries.

3.1 The macrostructure

The first issue regards the macrostructure. As Tutin (2010a: 118) observes, dictionaries of collocations are almost exclusively organized around the Hausmannian distinction between base vs. collocate (Hausmann 1979), so that a collocation is usually mentioned under the base entry together with a list of collocates. Sometimes, as in the Dictionnaire collocationnel du français général (1990), a collocation is accessible under both the base and the collocate. This double access is also available in the electronic version of the Oxford Collocations Dictionary (2009), while its printed counterpart only provides access through the base entry. Recently some lexicographers have proposed considering collocations as specific entries (or sub-entries) in the dictionary, just like bases and collocates, treating collocations on the same footing as single word items (see, e.g., Heid/Gouws 2006: 982): collocations “should be ‘promoted’ to the same headword status as ‘normal’ lemmata: what is needed, is information about the collocation, not only its mention. The collocation becomes a fully-fledged lexicographic treatment unit”. Tutin (2010a) discusses the pros and cons of this proposal: on the one hand, as Heid/Gouws (2006) point out, collocations can be assigned not only a grammatical category as a whole, but also a pragmatic class, and they exhibit distributional preferences and semantic and morpho-syntactic relations with other words. Moreover, collocations do not always inherit the properties of their constituents, so that they constitute a lexical entity endowed with specific properties. On the other hand, “il serait regrettable de ne pas tenir compte de l’aspect régulier de nombreuses collocations, en particulier en ce qui concerne l’héritage de propriétés syntaxiques du collocatif” (Tutin 2010a: 136). Another problem would be the excessive volume of the information to code, since the number of collocations is much greater than the number of bases.
Some collocations presented as lemmata can be found in the Lexique Actif du Français (2007): levée de boucliers, ministre du culte, parties génitales, petit ami, plaque dentaire, poignée de main, prise de sang, etc. It is interesting to observe that parties génitales (genitals) is a plural entry, and that petit ami (fiancé) is treated at the same time as a collocation and as a compound noun (“locution nominale”). Also in

the Oxford Collocations Dictionary (2009) some collocations acquire the status of entries: per cent, physical therapist, point of view, political asylum, press conference, prime minister, pros and cons (for letter P). With the development of electronic dictionaries, a new way of organizing the macrostructure of a dictionary of collocations can be envisaged. Searching for collocations could then be carried out in multiple ways according to the encoding vs. decoding objectives of the dictionary. Just to give one example, in the Accounting Dictionaries, a search for English reserve retrieves 38 hits, some of which are multiword terminological expressions. Some of these terminological word expressions are included as collocations in the dictionary article for reserve and as lemmas, e.g. statutory reserve. (Fuertes-Olivera et al. 2012: 301–2)

In relation to the problem of organizing a dictionary around the base vs. collocate distinction, the question can be raised of the possibility of always identifying a dominant constituent in every instance of collocation. On one hand, the base-collocate hierarchy is “une réalité psychologique observable en discours” (Hausmann/Blumenthal 2006: 4), and this hierarchy (or collocation orientation in Hausmann and Blumenthal’s terms) “is de facto institutionalized in [the] lexicographic practice, even though it may be assumed that the collocation dictionary user probably uses more syntactic clues than the notions of base and collocate” (Tutin 2008: 1456). On the other hand, some lexical associations do not seem to have a dominant constituent. For instance, with reference to the expression a pack of dogs, Bartsch (2004: 36) queries which of the constituents should be considered the base. There is also the case of conjoined collocations such as rich and famous, sain et sauf, slowly but surely, etc., where “from a syntactic viewpoint, the coordination suggests that no dominant element emerges, and from a semantic viewpoint both elements seem to have an equal importance” (Tutin 2008: 1457). From a theoretical point of view these can be considered non-prototypical collocations. In lexicographic practice, one solution is to record these two-base collocations under both entries; they could also constitute a specific entry in an electronic dictionary. Moreover, regarding Bartsch’s critical example a pack of dogs, Tutin (2013) observes that these cases do not challenge the asymmetric status


of collocations. According to Tutin (2013: § 3.2), expressions like these show a semantic and syntactic ambiguity that allows at least two interpretations. In expressions sharing the structure NQUANT of N (a pack of dogs), if the noun quantifier is interpreted as a sort of semantically empty modifier, then we can speak of collocation and consider pack as the collocate. However, if the noun quantifier is analysed as a full lexical word, then the collocate is dogs.

Another issue that has received much attention is the binary status of collocations. A detailed analysis of this point can be found in Tutin (2008 and 2013), where the author distinguishes the case of merged and recursive collocations from true ternary (or more) collocations. Only in the latter case can the collocation not be decomposed, so that we can speak of true ternary collocations (e.g. in other words). In merged collocations, instead, two collocations have the same base and can combine syntactically (e.g. pay attention + close attention → pay close attention). Finally, in recursive collocations, collocations can themselves be used as base or as collocate (e.g. freshly baked bread, where freshly is the collocate of baked and freshly baked is the collocate of bread). As for their lexicographic treatment, Tutin (2008: 1456) argues that: “In dictionaries, merged collocations and recursive collocations should be decomposed whenever they can be, but very frequent clusters should be mentioned if necessary within the base entries. True ternary collocations could be mentioned within the base entry”.

3.2 The microstructure

For the sake of brevity, we set aside a full discussion of the organization of the microstructure of a dictionary of collocations, and concentrate on issues that appear to be significant for the development of collocation lexicography. We therefore do not deal with the question of the syntactic and semantic organization of lexical entries,12 and address instead issues such as information about frequency and fixedness of collocations, examples, and pragmatic markers.

12 For some hints on the semantic organization of the lexical entry, see, among others, Siepmann 2005; Walker 2009; Giacomini 2010; Tutin 2010b (Section 5.1).


3.2.1 Frequency and fixedness of collocations

As for frequency, Heid points out that

despite the quantitative importance of MWEs, no detailed data about their frequency is available […]. When general dictionaries provide frequency data, they do so for lemmata, not for their uses or the word combinations in which they occur. (2008: 349)

Likewise, in collocation dictionaries “l’information liée à l’usage (fréquence, registres de langue) est également peu développée, ce qui apparaît fort gênant pour des ouvrages qui sont souvent décrits comme destinés à l’encodage” (Tutin 2010a: 124). Walker (2009: 296) compares the treatment of collocations in three collocation dictionaries (Oxford Collocations Dictionary 2002, BBI Dictionary of English Word Combinations 1997, LTP Dictionary of Selected Collocations 1999), and finds that “one weakness in the design of the three collocational dictionaries is that there is no indication of the relative frequency of the collocates listed in an entry. […] Information of this kind would be very useful”. Tutin (2010a: 124) observes that electronic corpora could open up very interesting perspectives: exploiting sub-corpora would let users know how frequent a collocation is in a particular context, thus helping them choose the right collocate. What seems to be important, then, is not so much frequency as the dispersion of collocations across corpora and sub-corpora.

Only a few corpus-based editing tools give this type of information. In Antidote’s dictionary of cooccurrences, for example, cooccurrences are listed according to their syntactic structure and their frequency, so that the user can see which of the cooccurrences listed within a syntactic frame is the most frequent. The Sketch Engine also takes frequencies into account to create lists of cooccurrences. Some examples of collocations searched with the help of Antidote and the Sketch Engine are discussed in Tutin (2010a). Regarding collocation dictionaries, a printed dictionary including explicit information about frequency is Le Grand Druide des cooccurrences (2012). Thus, within the lexical entry balai (broom), cooccurrences are first grouped according to a syntactic criterion, and then, within each syntactic category, they are listed according to their


“statistical strength”: passer le balai (to sweep) is then understood to be more frequent than voler sur un balai (to fly on a broom), which appears lower in the list. Frequency of co-occurrence is also a central element of the Emolex database and the Lexical Database for French, two electronic tools that will be discussed in Section 4.

Frequency may become a more interesting piece of information if it is associated with information about fixedness. There is no discussion of this point in the scientific literature, but it is beyond doubt that this type of information could be very useful in collocation dictionaries to give users (especially students) a deeper awareness of the combinatorial dimension of a language. A project coordinated by Colson at the University of Leuven is worth mentioning in this connection. This project aims to develop a tool that will enable users not only to search for collocations in texts, but also to highlight them with different colours according to their degree of fixedness and frequency. Coming to collocation dictionaries, one dictionary that has systematically integrated information about frequency and fixedness is the Italian collocation dictionary Modi di dire. Lessico italiano delle collocazioni (2010), but unfortunately this is actually “one of the most arguable aspects of the work” (Matt in this volume). As a matter of fact, the “frequency and fixedness” datum is used as a criterion for classifying collocations at a macrostructural level. However, this classification is based on the erroneous assumption that fixed collocations invariably display low frequency, whereas non-fixed collocations are always frequent. This, of course, betrays a rather simplified and naïve view of collocational behaviour.

3.2.2 Examples

Another important improvement to be desired in collocation dictionaries is direct reference to authentic examples. As Blumenthal (2005: 280) asks, “peut-on se contenter d’exemples (au sens restreint) dans un dictionnaire de collocations ou bien faut-il des citations authentiques?”. Collocation dictionaries differ considerably from one another in this respect. The Diccionario combinatorio práctico del español


contemporáneo (2006) contains only decontextualized examples, as its editor observes in the introduction:

Los ejemplos que proporciona PRÁCTICO no están documentados. Han sido construidos por los redactores a imitación de otros muchos similares que se encuentran en el corpus con el que se ha elaborado el diccionario. […] Se ha procurado que los ejemplos sean sencillos, que resulten naturales a los oídos de los hablantes nativos y que puedan servir de pauta para que los estudiantes de español como segunda lengua construyan otros similares. (2006: L)

The French Dictionnaire des combinaisons de mots (2007) makes use of decontextualized sentence examples (la maison dispose d’un jardin privé), but also of noun phrases and famous literary quotations (“Il faut cultiver notre jardin” [Voltaire, Candide]). In the Macmillan Collocations Dictionary (2010) “each group of collocates (or single collocate) is followed by a contextualized example, or, less frequently, by more than one example; all examples are in the form of complete sentences” (Coffey 2011: 331). See for instance: Today we are on the verge of technological advances that will redefine how we wage war. On the back cover, the Oxford Collocations Dictionary (2009) highlights the presence of 75,000 examples (over 250,000 word combinations). These are in the form of complete contextualized sentences and noun phrases extracted from the Oxford English Corpus.

A striking difference between Antidote’s electronic dictionary of cooccurrences and its printed version (Le Grand Druide des cooccurrences, 2012) is the complete lack of examples in the printed version, whereas the electronic version is rich in examples manually selected from the enormous corpus used. Each example in the electronic version is a quotation from the corpus, and the source of the quotation is always cited (Gallica, Lire.fr, Cybersciences, etc.).

An alternative strategy adopted by some dictionaries is to replace examples with explanations. In the Dictionnaire combinatoire compact du français (2011) the majority of collocations or word combinations are followed by an explanation (célébrer la mémoire de qqn, dire du bien de qqn qui est mort). Examples are very rare and they usually follow the explanation (célébrer qqch, fêter qqch, ex. célébrer un anniversaire, un mariage, etc.). This strategy of using examples to support explanations is more systematically used by the Dizionario combinatorio


italiano (2013), where very often the contextualized examples, always in the form of complete sentences, follow explanations provided within square brackets that “help to give the context in which the expression can be used” (2013: IX): per nessuna cosa al mondo [per nessuna ragione] non lo farei mai, per nessuna ragione al mondo, lanciarmi col paracadute! This brief overview gives an idea of how differently examples are treated in collocation dictionaries, and shows that general agreement on the nature and function of examples is still lacking.

3.2.3 Markers

Another important aspect to be developed in collocation lexicography is the use of markers that can help users (students, translators, etc.) to place a collocation, or a specific aspect of its use (a plural form, and so on), in its proper context:

Not only the collocation as a whole, but also certain aspects of its use (e.g. specific morphosyntactic forms) may be marked with respect to style, register, region or time. We thus foresee a marking on the collocation as a whole […], on its elements, and on its morphosyntactic usage properties. (Heid/Gouws 2006: 984)

If we consider the existing collocation dictionaries from this point of view, the situation is very heterogeneous, and Tutin (2010b) complains of the lack of systematic treatment of markers in collocation dictionaries. Markers can be found in the BBI Combinatory Dictionary of English (2009), where diatopic markers (“American English”, “American”, “British English”, etc.) are distinguished from diaphasic (“colloquial”) and diachronic (“obsolete”) markers. Figurative use of language is also indicated (“figurative”), and markers referring to specific domains are quite numerous (“anatomical”, “linguistics”, “music”, and so on). The same type of markers can be found in the Oxford Collocations Dictionary (2012), where they are called “usage labels”, and sometimes differ in terminology (“old-fashioned” for “obsolete”) and quantity of labels (in the OCD the diaphasic markers are more numerous). The Macmillan Collocations Dictionary (2010) provides numerous “usage notes” concerning formal characteristics of the expression, but is quite poor in pragmatic markers, apart from the red “informal” style label. The


Dictionnaire des combinaisons de mots (2007) and the Grand Druide des cooccurrences (2012) make use of markers. The former uses diaphasic markers (“familier”, “ironique”, “humouristique”, etc.), diachronic markers (“vieilli”), rhetorical markers (“sens littéral”, “sens figuré”, “euphémisme”), and markers referring to specific domains (“Bourse”, “commerce”, “droit”, etc.). In the latter there are only two diaphasic markers (“familier” and “très familier”), but no fewer than 72 markers referring to specific domains. Other dictionaries have no pragmatic markers at all, e.g. the LTP Dictionary of Selected Collocations (2002) and Beauchesne’s (2009) Grand dictionnaire des cooccurrences. As regards Italian dictionaries, as Matt (this volume) points out, the only dictionaries that use markers are the Dizionario combinatorio compatto (2012) and the Dizionario combinatorio italiano (2013):

The semantic field is sometimes introduced by an indicator which specifies […] its discipline or field of terminology […]. This dictionary has identified almost 60 disciplines. A semantic field may also include an indicator showing its specific usage, for example fig. (figurative), iron. (ironic) or it may include a register indicator such as vulg. (vulgar) […] (2013: VIII).

Bosque’s Diccionario combinatorio práctico del español contemporáneo (2006) contains some pragmatic markers, but no labels referring to specific domains: “En cuanto a la información sociolingüística, las únicas marcas que contiene PRÁCTICO son coloquial (col.), despectivo (desp.), vulgar (vulg.), irónico (irón.) y poético (poét.)” (2006: LI). Finally, Quasthoff’s Wörterbuch der Kollokationen im Deutschen (2011) does not contain pragmatic markers, while the Feste Wortverbindungen des Deutschen (2014) contains diaphasic and diatopic markers.

4. Paper and electronic dictionaries

This paper concludes with a brief discussion of the transition from printed lexicographical works to electronic dictionaries. Electronic


dictionaries have greatly improved over the last few years, and the beginning of the new century has witnessed renewed interest in the subject (see Granger/Paquot (eds) 2012). This is due to a number of factors: (1) corpus integration; (2) more and improved data; (3) efficiency of access; (4) customization; (5) hybridization; and (6) user input (Granger 2012).

Rundell (2012: 29) highlights the importance of the democratization of access to corpus lexicography: “the cost of acquiring corpora (in time and money) has fallen dramatically, while general computing power has increased”. Billion-word corpora are now available, and offer the possibility of enriching the set of lexical entries and collocations, making a huge variety of examples available to lexicographers. Space limitations are no longer a frustration for dictionary professionals, but modern lexicographers have a new challenge to face: while the storage space within an electronic dictionary is unlimited, the presentation space (Lew 2011) is limited, so that “the article visualized on the screen should include the smallest possible amount of data, i.e. exactly the amount needed to satisfy the user’s needs in each consultation” (Tarp 2012: 118).

Accessibility of information and customization are also important elements. Not only does information have to be accessible in a quick and straightforward manner, but it also has to be accessible in a way that satisfies the needs of a particular user in a particular situation, so that all and only the information needed is provided. Users can have two types of information needs, global and punctual (Tarp 2012). Global information needs typically arise in cognitively oriented situations (users wanting to learn about facts or objects), whereas punctual information needs typically arise in communicatively oriented situations (text production or text reception) and concern a small amount of information needed to answer a specific question. Electronic dictionaries should allow different types of consultation: more of a browsing-type access in the case of global information needs, and more of a searching-type access in the case of punctual information needs (Debus-Gregor/Heid 2013: 1005).

Customization is one of the most important challenges for the future of electronic lexicography and lexicography in general. Tarp (2012) outlines the development of the Function Theory of Lexicography from


an interest in types of users to an interest in individual users, and laments a lack of electronic dictionaries “which permit individualized solutions to concrete and individual users in concrete situations” (2012: 117).

Finally, hybridization is becoming a central question for electronic lexicography. Hybridization is the combination of one or more types of reference work in a single product (Hartmann 2005) and examples of this kind of combination are beginning to appear: Paquot (2012) describes the Louvain English for Academic Purposes Dictionary (LEAD) as a “dictionary-cum-writing aid tool”; Kübler and Pecman (2012: 199) consider the ARTES dictionary (Aide à la Rédaction de TExtes Scientifiques) as “both a multilingual LSP dictionary and a terminological database”. Asmussen (2013) analyses combined products which result from hybridization of electronic dictionaries and corpus query tools. According to him, a dictionary-corpus product should give the opportunity to consult separately the dictionary and the corpus, although these two instruments should also be linguistically interlinked. Until now, “none of the existing approaches possesses both of these two features” (Asmussen 2013: 1084).

For the compilation of electronic dictionaries, lexicographers have at their disposal efficient corpus tools that facilitate their work. The word sketch, for instance, provides lexicographers with a convenient one-page summary of a word’s grammatical and collocational behaviour. The Sketch Engine’s GDEX program (Good Dictionary EXamples) “attempts to automatically sort the sentences in a concordance according to how likely they are to be good dictionary examples […]. GDEX scores sentences using heuristics for readability and informativeness” (Kilgarriff/Kosem 2012: 46). GDEX has been designed for the English language. However, as stated in the Manual for GDEX, “the exact way of sorting the sentences can be adapted for various languages” (web).
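The kind of heuristic ranking attributed to GDEX above can be illustrated with a deliberately simplified sketch. Everything in it is an assumption made for illustration: the three heuristics (a preferred sentence length, a penalty for words outside a small common-word list, a check that the candidate is a whole sentence), their weights, and the names `COMMON_WORDS`, `score_sentence` and `rank_examples`. It is not the actual GDEX scoring scheme and does not use any Sketch Engine interface.

```python
import re

# Hypothetical common-word list; a real system would rely on a large frequency list.
COMMON_WORDS = {
    "she", "paid", "close", "attention", "to", "every",
    "detail", "of", "the", "report",
}

def tokenize(sentence):
    """Lower-case the sentence and keep alphabetic tokens only."""
    return re.findall(r"[^\W\d_]+", sentence.lower())

def score_sentence(sentence, min_len=5, max_len=15):
    """Score a candidate example between 0 and 1 with three toy heuristics."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    score = 1.0
    # Readability (assumed heuristic): prefer sentences of moderate length.
    if not (min_len <= len(tokens) <= max_len):
        score -= 0.4
    # Readability (assumed heuristic): penalise the share of uncommon words.
    rare = sum(1 for token in tokens if token not in COMMON_WORDS)
    score -= 0.4 * rare / len(tokens)
    # Informativeness proxy (assumed heuristic): prefer whole sentences.
    stripped = sentence.strip()
    if not (stripped[0].isupper() and stripped[-1] in ".!?"):
        score -= 0.2
    return max(score, 0.0)

def rank_examples(sentences):
    """Sort concordance lines from best to worst candidate example."""
    return sorted(sentences, key=score_sentence, reverse=True)

candidates = [
    "paid close attention",
    "Notwithstanding epistemological quandaries, the committee paid close "
    "attention to heteroscedasticity.",
    "She paid close attention to every detail of the report.",
]
ranked = rank_examples(candidates)
```

With this toy word list, the short complete sentence is ranked first and the bare fragment last. Adapting such a scheme to another language would mainly mean replacing the word list and re-tuning the thresholds.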
Another useful tool is the TickBox Lexicography (TBL) program, which “allows lexicographers to select examples of collocates from a list of (good) candidates, and export the selected examples into the dictionary writing system” (Kilgarriff/Kosem 2012: 50). Another option is to combine a corpus tool and a dictionary writing system in a single program, in order to use one and the same interface to carry out


corpus analysis and write dictionary entries, as in the TLex Dictionary Production System.

A question we should ask at this point is whether collocation lexicography is ready to exploit these new tools. Tarp (2012: 116–117) categorizes electronic lexicographical works into four types:

1. Lexicographical works which have been either photocopied or directly copied from a text file and then placed on an electronic platform, frequently as a PDF-file.
2. Lexicographical works which have made use of the new technologies only […] to provide quicker access to their data by means of links or search strings, but where the data included are still organized in traditional and static articles […].
3. Lexicographical e-tools that […] make use of the existing technology […] also to adapt the dictionary articles to the various functions displayed by the dictionary. The result is dynamic articles with dynamic data […].
4. Lexicographical e-tools which permit individualized solutions to concrete and individual users in concrete situations […].

At present, collocation dictionaries clearly belong to the second category. Only a few dictionaries of collocations also exist in an electronic version, which in turn is very similar to the paper version. Tiberii’s Dizionario delle collocazioni (2012) on CD-Rom offers exactly the same content as the paper dictionary. The electronic version of the Oxford Collocations Dictionary (on CD-Rom) is more interactive and contains spoken pronunciation for every word and a section of interactive exercises. Moreover, in this version collocation searches can be made not only for the base but also for the collocate. The Longman Collocations Dictionary and Thesaurus (2013) provides online access by means of a PIN printed in the dictionary. Users can “get the full contents of the printed dictionary online plus additional collocations and thesaurus entries” (official website), and find interactive exercises to practise collocations. For a more detailed analysis, see Benigno and Corpas Pastor in this volume.

Two online French dictionaries of collocations belong to the second type of electronic dictionaries: the Dictionnaire des cooccurrences


of the Language Portal of Canada is the electronic version of Beauchesne’s Dictionnaire des cooccurrences (2001). It has not been updated to the Grand dictionnaire des cooccurrences (2009), the new version compiled by Beauchesne’s daughters, currently available only in printed format. The electronic dictionary is no more than a paper dictionary in electronic format, and it inherits the weaknesses of the printed version: lack of examples or markers, use of alphabetical order (instead of frequency order) for the list of collocates, absence of adjective or verb entries.

Another online electronic dictionary is González Rodríguez’s Dictionnaire des collocations, published on the Internet in 2004 and last updated in August 2014. Adjective, noun and verb collocations extracted from literature and newspapers can be searched, and for each word entry the dictionary gives lists of collocates that are divided into categories (“adjectives”, “verbs” – if the word entry is a noun – “recurrent segments”, “analogies”) and alphabetically ordered. Exercises are also available. González Rodríguez’s dictionary is not the electronic version of a paper dictionary, but it still remains a dictionary where the data are organized in traditional and static articles. No reference is made to frequency or fixedness of collocations, and no examples are provided.

A more dynamic dictionary is the electronic dictionary included in the writing aid tool Antidote. As discussed above (Section 3.2.1), co-occurrences are listed according to the syntactic structure in which they appear and their frequency, and authentic examples extracted from the corpus can be visualized together with their source. Antidote’s dictionary is the first electronic dictionary of collocations to have generated a paper dictionary (the Grand Druide).

Another electronic dictionary belonging to the second type, but more dynamic, is the electronic version of Dizionario combinatorio italiano (for a short presentation see Lo Cascio in this volume), which gives the opportunity of making cross-reference searches (for instance, all the word combinations belonging to a specialized domain), and of


partly customizing the dictionary by adding personal comments that can be shared by all users. It must be said, however, that the transfer of a printed dictionary to the electronic medium does not affect the underlying collection of lexicographic data in most cases, so that “the properties of the transferred product [are] somewhat ‘in between’ those of a paper dictionary and those of a dictionary conceived to exist exclusively as an electronic tool” (Debus-Gregor/Heid 2013: 1001–2).

An electronic tool which aims to be a dictionary of the third type (more a database than a dictionary) is the Lexical Database for French (cf. Verlinde 2010). In this case “dynamic articles with dynamic data” that match the specific needs of users are expected to be found. However, as Asmussen (2013: 1083) correctly points out, “the emphasis of BLF [Base Lexicale du Français] lies more on experimenting with a new way of theoretically conceiving the functionality of electronic dictionaries rather than on providing a fully-fledged product”. Just to give one example, when searching for a word it is possible to check its combinations, and to verify their usage contexts by clicking on a button that shows a list of examples taken from a corpus (Le Monde 1998). If we check the collocations for amour, we find, for instance, that mourir is the second most frequent verbal collocate. However, when we take a look at the corpus examples we find examples such as Son film, lumineux, tout en images impressionnistes (d’Alain Levent), est un hymne à la nature, à la vie, à l’amour jusqu’à l’approche de la mort, where mort is obviously not a collocate for amour.

A more specific e-tool focussing on collocations is EmoConc. This is a corpus query interface created to search the corpora of the Emolex project, a corpus-based project on the lexis of emotions in five languages (German, French, English, Russian, and Spanish). The project ended in June 2013, and the database has been available on the Internet since then. Users can choose the language of the corpus and decide whether the corpus should be monolingual or not. All corpora are taken from the press. Once the corpus has been selected, three types of queries can be executed: “lexicograms”, “concordances”, and “complex


expressions”. A lexicogram is “a chart which displays the frequencies of the main collocates of one or several search words” (official website); clicking on a base or collocate opens a pop-up window with a list of authentic examples. The “concordance” query gives access to a list of concordances. For each example it is possible to zoom out to a wider context, and to obtain a syntactic and morphological annotation. Complex expressions can be searched by clicking a tools icon through which more specific queries can be submitted. It is to be hoped that EmoConc will be extended to other semantic fields in the future.
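To make the notion of a lexicogram more concrete, the following minimal sketch counts, for each search word, the words that occur within a fixed window around it and keeps the most frequent ones. It is only an illustration of the general idea of a frequency chart of collocates: the function name `lexicogram`, the window size, the whitespace tokenization and the toy corpus are all assumptions, and nothing here reproduces how EmoConc actually computes or displays its results.

```python
from collections import Counter

def lexicogram(sentences, search_words, window=2, top_n=5):
    """For each search word, count the words found within `window` tokens of it."""
    table = {word: Counter() for word in search_words}
    for sentence in sentences:
        # Naive whitespace tokenization (assumption); punctuation is not handled.
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token in table:
                left = max(0, i - window)
                context = tokens[left:i] + tokens[i + 1:i + 1 + window]
                table[token].update(context)
    # Keep only the most frequent neighbours of each search word.
    return {word: counts.most_common(top_n) for word, counts in table.items()}

# Toy corpus invented for illustration.
corpus = [
    "he felt deep fear",
    "she felt deep joy",
    "they felt great fear",
]
result = lexicogram(corpus, ["fear", "joy"])
for word, collocates in result.items():
    print(word, collocates)
```

A realistic version would work on lemmatized and syntactically annotated text, and would probably rank collocates by an association measure rather than by raw frequency.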

5. Conclusions

Two types of conclusions can be drawn from the discussion. As regards the definition of collocation, the conflict between the qualitative and the quantitative definition can be settled, from a theoretical point of view, if we assume, as Blumenthal does, that

the proponents of a quantitative interpretation […] do not reject the idea that collocations arise from selectional restrictions, nor do they deny the existence of contextually enriched idiomatic meanings. What they are wary of rather, is the fact that these theoretical ideas can hardly be formalised for practical exploitations of text corpora. For this reason, their definition of the notion of collocation relies on probabilistic, statistical methods […] with the expectation that their definition would also cover those word pairs that qualify as collocations under a qualitative analysis (2007: 69).

Collocation lexicography, which relies more and more on the exploitation of text corpora, definitely needs a broad and functional definition of collocation, aiming at satisfying the needs of its users, which range from understanding a text to producing or translating it. A broad definition of collocation appears to meet the needs of lexicography, and makes it possible to envisage the production of wide-ranging and effective lexicographic works. As regards the compilation of collocation dictionaries, much work remains to be done in this domain, especially concerning the


microstructure of a dictionary of collocations. Information such as the frequency and fixedness of collocations or their pragmatic value in discourse is of great importance if editors seriously take into consideration the practical needs of dictionary users. A mere list of collocates, even one divided into semantic groups, or, worse, an alphabetical list of collocates is no longer conceivable. Direct access to corpora, to check the usage context of collocations, has become another essential task of lexicography. All these tasks are becoming more practicable thanks to electronic lexicography, which has received a strong boost especially in the last decade. Given the plurality of users and users’ needs, it is to be hoped that electronic dictionaries of collocations will become flexible objects, both adaptable (i.e. “involv[ing] manual customization by the user”, Granger 2012: 4) and adaptive (i.e. “adapt[ing] automatically to the users thanks to the dictionary logs”, Granger 2012: 4). For that to happen, however, it is essential that publishers invest in editorial projects of this type.

References

General References

Asmussen, Jörg 2013. Combined products. Dictionary and corpus. In Gouws, Rufus H. et al. (eds) Dictionaries. An International Encyclopedia of Lexicography. Supplementary volume. Berlin/Boston: De Gruyter, 1081–1090.
Baldwin, Timothy et al. 2003. An empirical model of multiword expression decomposability. In Proceedings of the ACL 2003 Workshop on Multiword expressions, Sapporo, Japan. Stroudsburg, PA, USA: Association for Computational Linguistics, 89–96.
Bally, Charles 1909. Traité de stylistique française. Paris: Klincksieck.
Bannard, Colin 2007. A measure of syntactic flexibility for automatically identifying multiword expressions in corpora. In Proceedings of the Workshop on a Broader Perspective on Multiword Expressions, Prague, Czech Republic, June 2007. Stroudsburg, PA, USA: Association for Computational Linguistics, 1–8.
Barnbrook, Geoff / Mason, Oliver / Krishnamurthy, Ramesh 2013. Collocation. Applications and implications. Palgrave Macmillan.
Bartsch, Sabine 2004. Structural and Functional Properties of Collocations in English. Tübingen: Gunter Narr Verlag Tübingen.
Beck, David / Mel’čuk, Igor 2011. Morphological phrasemes in Totonacan verbal morphology. Linguistics. 49/1, 175–228.
Bergenholtz, Henning / Gouws, Rufus 2013. A lexicographical perspective on the classification of multiword combinations. International Journal of Lexicography. 27/1, 1–24.
Blumenthal, Peter 2005. Le dictionnaire de collocation: un simple dictionnaire d’exemples? In Heinz, Michaela (ed.) L’exemple lexicographique dans les dictionnaires français contemporains. Niemeyer, 265–282 (“Lexicographica. Series Maior”, vol. 128).
Blumenthal, Peter 2007. A Usage-Based French Dictionary of Collocations. In Kawaguchi, Yuji et al. (eds) Corpus-based Perspectives in Linguistics. Amsterdam/Philadelphia: John Benjamins, 67–83.
Bothma, Theo J.D. / Tarp, Sven 2012. Lexicography and the relevance criterion. Lexikos. 22, 86–108.
Cap, Fabienne / Weller, Marion / Heid, Ulrich 2013. Using a rich feature set for the identification of German MWEs. In Monti, Johanna et al. (eds) Proceedings of the Workshop on Multi-word Units in Machine Translation and Translation Technology, Nice, France, September 2013, 34–42.
Caseli, Helena de Medeiros et al. 2009. Statistically driven alignment-based multiword expression identification for technical domains. In Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications, Singapore, August 2009. Stroudsburg, PA, USA: Association for Computational Linguistics, 1–8.
Coffey, Stephen 2011. A New Pedagogical Dictionary of English Collocations. International Journal of Lexicography. 24/3, 328–341.
Colson, Jean-Pierre 2014 (forthcoming). Set phrases around globalization: an experiment in corpus-based computational phraseology. In Alonso Almeida, Francisco et al. (eds) Input a Word, Analyse the World: Selected Approaches to Corpus Linguistics. Newcastle upon Tyne: Cambridge Scholars Publishing.
Debus-Gregor, Esther / Heid, Ulrich 2013. In Gouws, Rufus H. et al. (eds) Dictionaries. An International Encyclopedia of Lexicography. Supplementary volume. Berlin/Boston: De Gruyter, 1001–1013.
Evert, Stefan 2005. The Statistics of Word Cooccurrences – Word Pairs and Collocations. Stuttgart: University of Stuttgart, IMS.
Evert, Stefan 2008. Corpora and collocations. In Lüdeling, Anke / Kytö, Merja (eds) Corpus Linguistics. An International Handbook. Vol. 2. De Gruyter, 1212–1248.
Firth, John Rupert 1957. Papers in Linguistics 1934–1951. Oxford: Oxford University Press.
Fontenelle, Thierry 1997. Turning a Bilingual Dictionary into a Lexical-Semantic Database. Tübingen: Niemeyer Verlag (“Lexicographica. Series Maior”, vol. 79).
Fuertes-Olivera, Pedro / Tarp, Sven 2014. Theory and Practice of Specialised Online Dictionaries. Lexicography and Terminography. Berlin/Boston: De Gruyter (“Lexicographica. Series Maior”, vol. 146).
Fuertes-Olivera, Pedro A. / Nielsen, Sandro 2011. The dynamics of terms in accounting: what the construction of the Accounting Dictionaries reveals about metaphorical terms in culture-bound subject fields. Terminology. 17/1, 157–180.
Fuertes-Olivera, Pedro A. et al. 2012. Classification in Lexicography: the concept of collocation in the Accounting Dictionaries. Lexicographica. 28, 293–307.
Geeraerts, Dirk 2013. The treatment of meaning in dictionaries and prototype theory. In Gouws, Rufus H. et al. (eds) Dictionaries. An International Encyclopedia of Lexicography. Supplementary volume. Berlin/Boston: De Gruyter, 487–495.
Giacomini, Laura 2010. A proposal for an electronic dictionary of Italian collocations highlighting lexical prototypicality and the syntactic-semantic relations between collocation partners. In Dykstra, Anne / Schoonheim, Tanneke (eds) Proceedings of the 14th Euralex International Congress, Leeuwarden, July 2010. Fryske Akademy, 1183–1192.
Goes, Jan 1999. L’adjectif. Entre nom et verbe. Paris/Bruxelles: Duculot.
Götz-Votteler, Katrin / Herbst, Thomas 2009. Innovation in advanced learner’s dictionaries of English. Lexicographica. 25, 47–66.
Granger, Sylviane / Paquot, Magali (eds) 2012. Electronic Lexicography. Oxford: Oxford University Press.
Granger, Sylviane 2012. Introduction: Electronic lexicography – from challenge to opportunity. In Granger, Sylviane / Paquot, Magali (eds) Electronic Lexicography. Oxford: Oxford University Press, 1–11.
Gries, Stefan Th. 2008. Phraseology and linguistic theory. A brief survey. In Granger, Sylviane / Paquot, Magali (eds) Phraseology. An interdisciplinary perspective. Amsterdam/Philadelphia: John Benjamins, 3–25.
Gries, Stefan Th. 2013. 50-something years of work on collocations. What is or should be next… International Journal of Corpus Linguistics. 18/1, 137–165.
Gross, Gaston 1996. Les expressions figées en français. Paris: Ophrys.
Grossmann, Francis / Tutin, Agnès (eds) 2003. Les collocations. Analyse et traitement. Travaux et recherches en linguistique appliquée. Amsterdam: De Werelt.
Grossmann, Francis / Tutin, Agnès 2002. Collocations régulières et irrégulières: esquisse de typologie du phénomène collocatif. Revue française de linguistique appliquée. (Lexique: problèmes actuels). 7/1, 7–25.
Hartmann, Reinhard R.K. 2005. Pure or hybrid? The development of mixed dictionary genres. Facta Universitatis. 3/2, 193–208.
Hausmann, Franz Josef / Blumenthal, Peter 2006. Présentation: collocations, corpus, dictionnaires. Langue française. 150, 3–13.
Hausmann, Franz Josef 1979. Un dictionnaire de collocations est-il possible? Travaux de linguistique et de littérature. 17/1, 187–195.
Hausmann, Franz Josef 1989. Le dictionnaire de collocations. In Hausmann, Franz Josef et al. (eds) Dictionaries. An International Encyclopedia of Lexicography. Vol. 2.
Berlin/New York: De Gruyter, 1010–1019.

64

Adriana Orlandi

Hausmann, Franz Josef 1996. La syntagmatique dans le TLF informatisé. In Piotrowski, David (ed.) Autour de l’informatisation du TLF, Actes du Colloque International de Nancy, May 1995. Paris: Didier, 51–77. Hausmann, Franz Josef 1999. Le dictionnaire de collocations. Critères de son organisation. In Greiner, Norbert / Kornelius, Joachim / Rovere, Giovanni (eds) Texte und Kontexte in Sprachen und Kul turen. Festschrift für Jörn Albrecht. Trier: Wissenschaftlicher Verlag Trier, 121–139. Heid, Ulrich / Gouws, Rufus H. 2006. A Model for a Multifunctional Dictionary of Collocations. In Corino, Elisa / Marello, Carla / Onesti, Cristina (eds) Proceedings of the 12th Euralex Inter national Congress, Torino, September 2006. Torino: Edizioni Dell’Orso, 979–988. Heid, Ulrich / Weller, Marion 2010. Corpus-derived data on German multiword expressions for lexicography. In Proceedings of the 6th International Conference on Language Resources and Eval uation, 331–340. Heid, Ulrich 2008. Computational Phraseology. An overview. In Granger, Sylviane / Meunier, Fanny (eds) Phraseology. An interdisciplinary perspective. Amsterdam/Philadelphia: John Benjamins, 337–360. Katz, Graham / Giesbrecht, Eugenie 2006. Automatic identification of non-compositional multiword expressions using latent semantic analysis. In Proceedings of the Workshop on Multiword Expres sions: Identifying and Exploiting Underlying Properties, Sydney, Australia, July 2006. Stroudsburg, PA, USA: Association for Computational Linguistics, 12–19. Kilgarrif, Adam / Kosem, Iztok 2012. Corpus tools for lexicographers. In Granger, Sylviane / Paquot, Magali (eds) Electronic Lexicog raphy. Oxford: Oxford University Press, 31–55. Kilgarriff, Adam et al. 2004. The Sketch Engine. In Proceedings of the 11th Euralex International Congress, Lorient, France, 105–115. Kleiber, Georges 1990. La sémantique du prototype. Catégories et sens lexical. Paris: PUF. Kübler, Natalie / Pecman, Mojca, 2012. 
The ARTES bilingual LSP dictionary: From collocation to higher order phraseology. In

Monolingual collocation lexicography: State of art and new perspectives

65

Granger, Sylviane / Paquot, Magali (eds) Electronic Lexicogra phy. Oxford: Oxford University Press, 187–209. Lew, Robert 2011. Space restrictions in paper and electronic dictionaries and their implications for the design of production dictionaries. In Bański, Piotr / Wójtowicz, Beata (eds) Issues in Modern Lex icography. München: Lincom Europa. . Lux-Pogodalla, Veronika / Tutin, Agnès 2008. Extraction de collocations à partir du champ syntagmatique du TLFi: application aux noms transdisciplinaires des écrits scientifiques. In Lexicogra phie et informatique: bilan et perspective, Actes du Colloque International de Nancy, January 2008. McEnery, Tony / Xiao, Richard / Tono, Yukio 2006. Corpus-based lan guage studies: an advanced resource book. London/New York: Routledge. Mel’čuk, Igor / Clas, André / Polguère, Alain 1995. Introduction à la lexicologie explicative et combinatoire. Paris/Louvain-la-Neuve: Duculot. Mel’čuk, Igor / Polguère, Alain 2006. Dérivations sémantiques et collocations dans le DiCo/LAF. Langue française. 150, 66–83. Mel’čuk, Igor 2003. Collocations dans le dictionnaire. In Szende, Thomas (ed.) Les écarts culturels dans les dictionnaires bi lingues. Paris: Champion, 19–64. Mel’čuk, Igor 2011. Phrasèmes dans le dictionnaire. In Anscombre, Jean Claude / Mejri, Salah (eds) Le figement linguistique: la pa role entravée. Paris: Champion, 41–61. Mel’čuk, Igor 2012. Phraseology in the language, in the dictionary, and in the computer. In Kuiper, Koenraad (ed.) Yearbook of phraseol ogy. Vol. 3. New York/Berlin: De Gruyter, 31–56. Mittman, Brigitta 2013. New tendencies in the treatment of collocations. In Gouws, Rufus H. et al. (eds) Dictionaries. An Interna tional Encyclopedia of Lexicography. Supplementary volume. Berlin/Boston: De Gruyter, 500–509. Moon, Rosamund 1998. Fixed Expressions and Idioms in English. A Corpus-Based Approach. Oxford: Clarendon Press.

66

Adriana Orlandi

Nuccorini, Stefania 2003. Towards an Ideal Dictionary of Collocations. In Sterkenburg, Piet Van (ed.) A Practical Guide to Lexicogra phy. Amsterdam/Philadelphia: John Benjamins, 366–387. Och, Franz Josef / Ney, Hermann 2003. A systematic comparison of various statistical alignment models. Computational Linguistics. 29/1, 19–51. Orlandi, Adriana 2013. Pour une typologie des expressions figées Nom Adjectif. In Ligas, Pierluigi (ed.) Lexique Lexiques. Verona: QuiEdit, 183–200. Palmer, Franck Robert (ed.) 1968. Selected papers of J.R. Firth 1952– 59. London: Longman. Paquot, Magali 2012. The LEAD dictionary-cum-writing aid: An integrated dictionary and corpus tool. In Granger, Sylviane / Paquot, Magali (eds) Electronic Lexicography. Oxford: Oxford University Press, 163–185. Pecina, Pavel 2009. Lexical association measures and collocation extraction. Language, Resources and Evaluation. 44/1–2, 137–158. Prandi, Michele 2004. The Building Blocks of Meaning. Amsterdam/ Philadelphia: John Benjamins. Ramisch, Carlos et al. 2010. A hybrid approach for multiword expression identification. In Caseli, Helena et al. (eds) Computational Processing of the Portuguese Language. Springer, 65–74. Rosch, Eleanor 1973. Natural categories. Cognitive psychology. 4, 328– 350. Rundell, Michael 2012. The road to automated lexicography: An editor’s viewpoint. In Granger, Sylviane / Paquot, Magali (eds) Elec tronic Lexicography. Oxford: Oxford University Press, 15–30. Seretan, Violeta 2009. Extraction de collocations et leurs équivalents de traduction à partir de corpus parallèles. TAL. 50/1, 305–332. Seretan, Violeta 2011. Syntax-Based Collocation Extraction. Dor drecht/Heidelberg/London/New York: Springer (“Text, Speech and Language Technology Series”, vol. 44). Siepmann, Dirk 2003. Eigenschaften und Formen lexikalischer Kollokationen: Wider ein zu enges Verständnis. Zeitschrift für fran zösische Sprache und Literatur. 1, 260–283.

Monolingual collocation lexicography: State of art and new perspectives

67

Siepmann, Dirk 2005. Collocation, colligation and encoding dictionaries. Part II: lexicographical aspects. International Journal of Lexicography. 19/1, 1–39. Sinclair, John 1966. Beginning the Study of Lexis. In Bazell, Charles E. et al. (eds) In Memory of John Firth. London: Longmans, 410– 430. Sinclair, John 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press. Squillante, Luigi 2014a. Towards an empirical subcategorization of multiword expressions. In Proceedings of the 10th Workshop on Multiword Expressions, Gothenburg, Sweden, April 2014. Association for Computational Linguistics, 77–81. Squillante, Luigi 2014b. Syntax and semantics vs. statistics for Italian multiword expressions: empirical prototypes and extraction strategies. In Proceedings of the 16th Euralex International Con gress: The User in Focus. Bolzano, July 2014, 927–937. Svensson, Maria Helena 2002. Critères de figement et conditions nécessaires et suffisantes. Romansk Forum. 16/2, 777–783. Tarp, Sven 2008. Lexicography in the Borderland between Knowledge and Non-Knowledge. Tübingen: Niemeyer. Tarp, Sven 2012. Theoretical challenges in the transition from lexicographical p-works to e-tools. In Granger, Sylviane / Paquot, Magali (eds) Electronic Lexicography. Oxford: Oxford University Press, 107–118. Tsvetkov, Yulia / Winter, Shuly 2011. Identification of multi-word expressions by combining multiple linguistic information sources. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), 836–845. Tsvetkov, Yulia / Winter, Shuly 2012. Extraction of multi-word expressions from small parallel corpora, Natural Language Engineer ing. 18/4: 549–573. Tutin, Agnès 2005. Le dictionnaire de collocations est-il indispensable? Revue française de linguistique appliquée. 10/2, 31–48. Tutin, Agnès 2008. For an extended definition of lexical collocations. 
In Bernal, Elisenda / DeCesaris, Janet (eds) Proceedings of the 13th Euralex International Congress, Barcelona, Spain, July 2008, 1453–1460.

68

Adriana Orlandi

Tutin, Agnès 2010a. Sens et combinatoire lexicale: de la langue au dis cours. Dossier en vue de l’habilitation à diriger des recherches. Vol. 1: Synthèse. . Tutin, Agnès 2010b. Le traitement des collocations dans les dictionnaires monolingues de collocations de français et de l’anglais. In Neveu, Franck et al. (eds) Congrès Mondial de Linguistique Française. Paris: Institut de Linguistique Française, 1075–1088. Tutin, Agnès 2013. Les collocations lexicales: une relation essentiellement binaire définie par la relation prédicat-argument. Langages. 189/1, 47–63. Uhrig, Peter / Proisl, Thomas 2012. Less hay, more needles – using dependency-annotated corpora to provide lexicographers with more accurate lists of collocation candidates. Lexicographica. 28, 141–179. Van de Cruys, Tim / Villada Moirón, Begoña 2007. Semantic-based multiword expression extraction. In Proceedings of the Workshop on a Broader Perspective on Multiword Expressions, Prague, Czech Republic, June 2007. Stroudsburg, PA, USA: Association for Computational Linguistics, 25–32. Verlinde, Serge 2010. The Base lexicale du français: A multi-purpose lexicographic tool. In Granger, Sylviane / Paquot, Magali (eds) eLexicography in the 21st Century: New Challenges, New Ap plications. Proceedings of Euralex International Congress 2009. Louvain-la-Neuve: Presses universitaires de Louvain, 335–342. Walker, Crayton 2009. The treatment of collocation by learners’ dictionaries, collocational dictionaries and dictionaries of business English. International Journal of Lexicography. 22/3, 281–299. Weller, Marion / Fritzinger, Fabienne 2010. A hybrid approach for the identification of multiword expressions. Proceedings of the 7th international conference on language resources and evalua tion (LREC 2010), Malta, May 2010. Weller, Marion / Heid, Ulrich 2010. Extraction of German multiword expressions from parsed corpora using context features. 
In Pro ceedings of the 7th international conference on language re sources and evaluation (LREC 2010), Malta, May 2010.

Monolingual collocation lexicography: State of art and new perspectives

69

Williams, Geoffrey 2003. Les collocations et l’école contextualiste britannique. In Grossmann, Francis / Tutin, Agnès (eds) Les collocations. Analyse et traitement. Travaux et recherches en linguistique appliquée. Amsterdam: De Werelt, 33–44. Williams, Geoffrey 2013. Reviews. International Journal of Lexicogra phy. 26/1, 90–94. Zarrieß, Sina / Kuhn, Jonas 2009. Exploiting translational correspondences for pattern-independent MWE identification. In Proceed ings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications, Singapore, August 2009. Stroudsburg, PA, USA: Association for Computational Linguistics, 23–30. Collocation Dictionaries English Longman Collocations Dictionary and Thesaurus. 1st ed. 2013. Pearson Longman. LTP Dictionary of Selected Collocations. 1st ed. 2002. Boston: Thomson. Macmillan Collocations Dictionary. 1st ed. 2010. Macmillan Publishers Limited. Oxford Collocations Dictionary for Students of English. 2nd ed. 2009. Oxford University Press. The BBI Combinatory Dictionary of English: Your Guide to Colloca tions and Grammar. 3d ed. 2009. Amsterdam [etc.]: John Benjamins. The BBI Dictionary of English Word Combinations. 1st ed. 1997. Amsterdam/Philadelphia: John Benjamins. French Dictionnaire collocationnel du français général. 1st ed. 1990. Varsovie: Państwowe Wydawnictwo Naukowe. Dictionnaire combinatoire compact du français. 1st ed. 2011. La maison du dictionnaire. Dictionnaire des combinaisons de mots. 1st ed. 2007. Le Robert. Dictionnaire des cooccurrences. 1st ed. 2001. Montréal: Guérin.

70

Adriana Orlandi

Le grand dictionnaire des cooccurrences. 1st ed. 2009. Montréal: Guérin. Le Grand Druide des cooccurrences. 1st ed. 2012. Montréal: Éditions Druide. Lexique actif du français. 1st ed. 2007. Bruxelles: De Boeck. German Feste Wortverbindungen des Deutschen. Kollokationenwörterbuch für den Alltag. 1st ed. 2014. Tübingen: Francke Verlag. Wörterbuch der Kollokationen im Deutschen. 1st ed. 2011. Berlin/New York: De Gruyter. Italian Dizionario Combinatorio Compatto Italiano. 1st ed. 2012. Amsterdam/ Philadelphia: John Benjamins. Dizionario Combinatorio Italiano. 1st ed. 2013. Amsterdam/Philadelphia: John Benjamins. Dizionario delle collocazioni. Le combinazioni delle parole in italiano. 1st ed. 2012. Bologna: Zanichelli. Modi di dire. Lessico italiano delle collocazioni. 1st ed. 2010. Roma: Aracne. Spanish El diccionario combinatorio práctico del español contemporáneo. 1st ed. 2006. Madrid: SM. Other dictionaries Collins COBUILD English Dictionary. 2d ed. 1995. London: Harper Collins. Dictionnaire des expressions et locutions. 1st ed. 2007. Paris: Le Robert. Dictionnaire explicatif et combinatoire du français contemporain. 1984 (vol. I), 1988 (vol. II), 1992 (vol. III), 2000 (vol. IV). Montréal: Les Presses de l’Université de Montréal. Trésor de la langue française informatisé, . Petit Robert (en ligne), .

Vincenzo Lo Cascio

Congruency principles in word combination and lexicography¹

Abstract: The lexicon is a set of words forming a network, in which each word functions as a node from which links start in different directions and reach other nodes (Lo Cascio 2007). Lexicography and lexicology have to consider the lexicon as a system of word combinations connected to each other on the basis of certain congruency principles, which underlie the system. Words do not combine with just any word but only with a few of them. The combination is regulated by the grammatical category to which they belong, but also by semantic and encyclopaedic congruency principles. Word combinations range from fixed ones, such as idiomatic forms and proverbs, to free combinations. Some of these combinations, called collocations, are nevertheless typical and idiosyncratic, since they are specific combinations that are statistically very frequent and/or language- and culture-bound. Criteria should be given in order to define and establish the congruency principles, which are necessary to predict combinations. Whether a combination is a collocation or a free combination is determined by language comparison and contrastive analysis, and is therefore didactically relevant and predictive of learning processes.

Keywords: Word Combination, Congruency principles, semantic encyclopedic principles, free combinations, collocations, contrastive analysis, learning processes

0. Introduction

It is common knowledge nowadays that interest in the cognitive lexicon is growing and that, in both theoretical and applied linguistics, phraseological units, phrasal lexemes, or word combinations in written and spoken language play a very important role.

¹ The English text has been revised by Linda Pollack-Johnson, Philadelphia, but any mistake is mine.


Lexical functions (Lo Cascio 2007) are acquired and systematized in our brain as prefabricated lexical units composed in a systematic way. As a matter of fact, lexical functions consist of an infinite set of mini-systems. It is estimated that fluent speakers have hundreds of thousands of these prefabricated lexical units at their disposal. Indeed, phraseological expressions are basic to our learning process. If this statement proves to be true, then we should consider the lexis as a network made up of combinatorial systems. As a matter of fact, lexicology and lexicography and, in a wider sense, general linguistics should take into account the fact that we learn, speak and understand language using phrasemes, sequences and formulas as the output of complex ideas. Consequently, the lexis is, in general, located and registered in our brain in the form of phrasemes and phraseological units. The idea is that word combinations crystallize and that language is formed not by single words casually combined together but primarily by lexical sets, each of which can be considered a unit, a formula, a lexical packet, such as for example:

(1a) warrant of extradition; to issue a warrant of arrest
(1b) distress warrant; to order the arrest of an offender

and, in Italian

(1c) mandato di cattura; spiccare un mandato di cattura

This means that the lexicon may consist not only of single lexical entries but also of syntagmatic entries, in accordance with our cognitive behaviour. As a matter of fact, we record in our mind not only single words but also, and especially, sequences, syntagmas and formulas.


1. Lexicon as a network

A lexicon is built up of a set of words forming a network, in which each entry functions as a node from which links branch out in different directions and reach other nodes.² Lexicography and lexicology, in fact, treat a lexicon as a system of word combinations connected to each other on the basis of certain congruency principles, which underlie the system. This means that linguistic theories should abandon the notion that a lexicon is a list of entries, i.e. words characterized by a grammatical marker and a semantic tag and organized alphabetically.

² See for instance Lo Cascio (2007).

However, words in a lexical system do not combine with just any word of that specific language but only with a few of them. The combination is regulated by the grammatical category to which they belong, but also by semantic and encyclopaedic congruency principles. This means that words form a grammatical network as well as an encyclopaedic and semantic one.

Word combinations in the system range from fixed ones, such as idiomatic forms and proverbs, to free combinations. Some of the combinations, called collocations (Heid 1997, Lo Cascio 1997, Melʼčuk 1988), are nevertheless typical and idiosyncratic, meaning that they are specific combinations, occur very frequently, and/or are bound to that language and culture. However, although lexical items also appear in free combinations, this does not mean that any combination is allowed. For a free combination to be possible, syntactic, semantic and encyclopaedic congruency principles (Lo Cascio 2012) must be met. Free combinations are essentially predictable and supported by universal principles, but must follow congruency rules.

In every language some combinations are preferred over others that are theoretically possible. Very often word combinations appear to be acceptable, but sometimes speakers hesitate about the grammaticality or acceptability of some of them. Therefore, word combinations which are possible in theory but in reality are not very frequent or probable are marked (in this text) by a question mark, while combinations which are not used at all, or are to be considered unacceptable by a native speaker, are marked by an asterisk. In English, the expression:

(2) to drink in one breath

is in Italian

(2a) bere tutto d’un fiato

And it is possible in Italian to say:

(3a) leggere un libro tutto d’un fiato

However, in English it is

(3b) to read a book in one sitting or to read a book in one go

and not

(3c) *to read a book in one breath

If the preference is culturally bound, then we have a collocation, since the combination is a preference and not a generalized choice. If it is not culturally bound but the lexical combination is nevertheless unusual, this often means that a metaphorical interpretation has taken place, extending the combinatory principles so that the resulting combination has a meaning and can be understood. In that case, too, we can have a collocation. For example:

(4a) accendere un conto/prestito/un’ipoteca

cannot be translated into English as

(4b) *to turn on/switch on an account/loan/mortgage


but rather as

(4c) to open an account, to take out a loan/mortgage

which come from other similar metaphorization operations. Given that phrasemes and lexical components vary from one language to another, while always following semantic-encyclopaedic congruency principles, the lexical component of each language should contain the explicit list of word combinations belonging to each specific lexical entry of that language.
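The idea of an explicit, per-language list of combinations for each lexical entry can be pictured as a small data structure. The following Python sketch is only an illustration under stated assumptions: the names `LEXICON`, `combinations_for` and `is_listed` are hypothetical, and the data are restricted to examples (4a) and (4c).

```python
# Illustrative sketch only: a lexicon that stores, for each language and
# each entry, the explicit list of verbs the entry combines with.
# The data are limited to examples (4a) and (4c); all names are hypothetical.
LEXICON = {
    "it": {
        "conto": ["accendere"],
        "prestito": ["accendere"],
        "ipoteca": ["accendere"],
    },
    "en": {
        "account": ["open"],
        "loan": ["take out"],
        "mortgage": ["take out"],
    },
}


def combinations_for(language: str, entry: str) -> list[str]:
    """Return the verbs explicitly listed for an entry (empty if unlisted)."""
    return LEXICON.get(language, {}).get(entry, [])


def is_listed(language: str, verb: str, entry: str) -> bool:
    """True if the verb + entry combination is listed for that language."""
    return verb in combinations_for(language, entry)
```

With these data, `is_listed("it", "accendere", "conto")` holds, whereas `is_listed("en", "switch on", "account")` does not, mirroring the contrast between (4a) and (4b).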

2. Congruency Principles

Since combinations are not random but are regulated by principles which ensure congruency, a definition of these congruency principles is needed.

2.1 Syntactic congruency principles (scp)

Two or more lexical items in a language can combine, thus forming a network, on the condition that their grammatical markers are compatible. An adverb can combine with a verb, a noun with an adjective or a verb, or with another noun, and so on.

2.2 Functional syntactic congruency principles (fscp)

Two lexical items can combine if their syntactic functional roles allow the combination. An intransitive verb cannot combine with a noun or a nominal syntagma functioning as an object. A transitive verb requires a nominal syntagma selected in accordance with some semantic criteria. A verb has a certain valence and therefore allows for one or more arguments.
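The two syntactic principles can be pictured as simple checks. The sketch below is a minimal illustration, not a full grammar: the category pairs are only those named in 2.1 (the text's "and so on" means the table is partial), and the verb table, including the verb sleep, is a hypothetical assumption introduced for the example.

```python
# Illustrative sketch only. The category pairs follow the examples in 2.1 and
# are deliberately partial; the small verb table and all names are assumptions.
COMPATIBLE_CATEGORIES = {
    frozenset({"adverb", "verb"}),
    frozenset({"noun", "adjective"}),
    frozenset({"noun", "verb"}),
    frozenset({"noun"}),  # noun + noun
}

# Hypothetical valence table: does the verb allow an object (NP2)?
TAKES_OBJECT = {"open": True, "close": True, "sleep": False}


def scp_ok(cat_a: str, cat_b: str) -> bool:
    """Syntactic congruency: the pair of categories is listed as compatible."""
    return frozenset({cat_a, cat_b}) in COMPATIBLE_CATEGORIES


def fscp_ok(verb: str, has_object: bool) -> bool:
    """Functional syntactic congruency: an intransitive verb takes no object,
    while a transitive verb requires one."""
    return TAKES_OBJECT[verb] == has_object
```

For instance, `fscp_ok("sleep", True)` fails because an intransitive verb cannot take an object, while `fscp_ok("open", True)` succeeds.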


Curiously, a nominal syntagma in the role of object depends on a verb in the same phrase but, conversely, selects the lexical extension of that verb. So, for example, a noun such as account functioning as an object, i.e. as NP2, is selected by, and depends on, a transitive verb. The lexical choice of that verb is nevertheless determined by the noun account in the NP2 position, which allows verbs such as open or close but not verbs such as polish, half-open or leave ajar (It. socchiudere). In Italian, on the other hand, the same noun would also allow a lexical extension such as accendere (to put on, to switch on, to light) because of a metaphorical operation which is acceptable in that language. This is not the case in English, where an account can be opened or closed but not switched on (although “open” and “close” also have a metaphorical reading). However, using another metaphor, an account can be held. Cf. structures (5a) and (5b):

(5a)
S
  NP1
  VP
    V: aprire, chiudere, accendere; *lucidare, *socchiudere, *sollevare
    NP2: un conto

(5b)
S
  NP1
  VP
    V: open, close, hold; *polish, *half-open, *leave ajar, *switch on
    NP2: an account


2.2.1 The position of syntax

The syntactic component can only specify the syntactic relationships and a regular network; it is not able to assign definite lexical extensions to syntactic nodes. It is the lexicon which determines the appropriate lexical items every syntactic node can have. Since syntax cannot predict what kind of lexical choice a language makes in word combinations, that is, which words each linguistic entry or node combines with, it appears that in linguistics, whether theoretical, descriptive or applied, the lexical component should have primacy over syntax.

2.3 Encyclopaedic-semantic congruency principles (escp)

Consider again that a lexical item will combine with words when they are compatible with its semantic components. So if a verb contains a component which requires that its object be characterized as “liquid”, then a combination is allowed with nouns and nominal syntagmas that have the property of being “liquid”. Hence, to drink water is a good combination because water meets the requirement of being liquid in order to combine with drink. It is not possible, on the contrary, to combine water with eat, since this is a verb which requires that the object be “solid”. So eat and water do not form a (semantic) network.

If that semantic/encyclopaedic congruency principle is met, then the object, for instance one characterized as “liquid”, ultimately determines which lexical extension the verb requiring a liquid object should have, but always within the range of verbs which require a “liquid” object. The choice of the verb, however, is somehow culturally bound, or bound by other properties which the noun possesses, or bound to specific situations. So, for instance, a word such as wine can combine with verbs in the following lexical extensions:

(6a) Verb + wine: to drink, to pour, to gulp down a glass of, to sip, to flavour with, ?to dress with


But not *to fry with wine

The Italian counterpart would be

(6b) Verbo + vino: bere, versare, sorseggiare, tracannare un bicchiere di, aromatizzare con, insaporire con, ?condire con

But not *friggere con vino

For the noun oil, which is also characterized by the property “liquid”, the verbs allowing or requiring a liquid component are different from those required by wine.

(7a) Verb + oil: to flavour with, to pour, to fry with, to dress with, to take or swallow (as with medicine), ?to drink, *to gulp down, *to sip

And for the Italian counterpart

(7b) Verbo + olio: condire con, versare l’olio, friggere con l’olio, ?bere, ?tracannare l’olio, *sorseggiare l’olio

As a matter of fact, a property that belongs to oil (olio) allows it to be used for frying, which is not the case for wine. For water the combinations would be

(8a) Verb + water: to drink, to pour, to gulp down, to swallow, to sip, *to fry with

In Italian acqua would combine, among others, with

(8b) Verbo + acqua: bere, versare, tracannare, ingollare, ?sorseggiare, *friggere
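The interplay between the general semantic requirement ("liquid") and the noun-specific judgments in (6a), (7a) and (8a) can be sketched in code. This is a hedged illustration only: the table and function names are hypothetical, the feature table covers just the drink/eat example, and the judgments are copied from the English examples, with "?" for doubtful and "*" for unacceptable.

```python
# Illustrative sketch only: all names are hypothetical. Judgments copy the
# English examples (6a), (7a) and (8a): "" = acceptable, "?" = doubtful,
# "*" = unacceptable.
NOUN_FEATURES = {"wine": {"liquid"}, "oil": {"liquid"}, "water": {"liquid"}}

# Semantic requirement a verb imposes on its object (drink/eat example).
VERB_REQUIRES = {"drink": "liquid", "eat": "solid"}

# Noun-specific judgments recorded in the examples.
JUDGMENTS = {
    ("drink", "wine"): "", ("sip", "wine"): "",
    ("dress with", "wine"): "?", ("fry with", "wine"): "*",
    ("drink", "oil"): "?", ("sip", "oil"): "*",
    ("dress with", "oil"): "", ("fry with", "oil"): "",
    ("drink", "water"): "", ("sip", "water"): "",
    ("fry with", "water"): "*",
}


def escp_ok(verb: str, noun: str) -> bool:
    """General principle: the noun has the feature the verb requires."""
    return VERB_REQUIRES[verb] in NOUN_FEATURES[noun]


def judgment(verb: str, noun: str) -> str:
    """Judgment mark from the examples; raises KeyError if not recorded."""
    return JUDGMENTS[(verb, noun)]
```

Note that `escp_ok("drink", "oil")` holds although `judgment("drink", "oil")` is "?": the general principle delimits the range of possible verbs, while further properties of the noun narrow the choice within that range.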


The possibility of being swallowed or gulped down is a property of water which oil also possesses, but to a lesser degree: not because oil is not a liquid, but because its density, its taste and its function sometimes make it less appropriate to gulp down. In a sequence such as

(8c) ingollare un bicchiere d’acqua tutto d’un fiato (to swallow, gulp down a glass of water: to drink all in one breath, in one gulp)

the aspectual marker is [+quickly], whereas the following is not acceptable:

(8d) *ingollare un bicchiere d’acqua in un respiro

In (8c), the locution tutto d’un fiato (all in one breath, or in one gulp) is driven by the verb ingollare (to swallow or gulp down), which among its semantic components has the aspectual property “to drink QUICKLY”. This aspectual marker allows the combination with a locution such as in one gulp, stressing the speed with which the action takes place. The combination with the quantifier and adverbial locution is thus dictated by the semantic congruency principle and, as such, is not language bound. What is language bound, however, is that one of the quantifiers for swallow or gulp down in English can be in one breath, whereas in Italian it is in un fiato and not ?in un respiro.

A verb such as wait and an adverbial quantifying its duration, such as for a long time (per molto tempo or per lungo tempo), can form a sequence which is dictated by general, core rules going beyond the particularity of a specific language. The quantifier per un bel pezzo would, however, be language bound.

(9a) aspettare per molto tempo/per lungo tempo/per un bel pezzo
(9b) to wait for a long time/for a fair amount of time/for a chunk of time
(9c) ?for a nice piece of time, *for a nice piece

Semantic and encyclopaedic relationships such as perception (see, touch, smell) or possession are determinant. Aspectual information such as the [+/- durative] or [+/- dynamic] markers (Lo Cascio/Adelaar 1986; Lo Cascio 1995; Puglielli 1997) is basic in this regard (Lo Cascio/Jezek 1999). Such markers allow basic combinations which are not language specific but linked to a semantic and encyclopaedic principle. Aspectuality is important as a marker for every situation, event or state of affairs; it is inherent to every situation and therefore inherent to verbs. The special lexical extensions controlling the aspectual forms may differ from one language to another.

Componential semantics and generative semantics are of great help in stating congruency principles for lexical combinations, since they are able to highlight all the semantic markers and components that regulate word combination and that, as such, enter into the metalanguage of lexical definitions. Moreover, nouns possess components or markers which are determinant for their combinatory behaviour, expressing aspectuality or countability, such as [+/- count], [+/- animate] or [+/- abstract]:

(10a) two potatoes
(10b) *two dry airs
(10c) to deal out potatoes/*to change airs

A noun bearing a marker such as [+count] can appear with a quantifier, e.g. two potatoes (but not *two airs), or with a verb which requires a [+count] noun, e.g. to deal out potatoes. “Air” is instead [-count]. The same observations could be made for markers such as [+/-animate] and [+/-abstract], which can greatly influence combinatorial behaviour.

Pustejovsky (1995) uses the term qualia structures to define a set of properties or events which help identify the attributes, constituents, purpose and functions needed to represent the use and meaning of nouns and adjectives, and therefore the right lexical combinations. Although a novel and a dictionary are both books, one reads a novel but consults a dictionary. One writes a novel but compiles a dictionary.

(11a) to read a novel/to write a novel/?to compile a novel
(11b) to consult a dictionary/to compile a dictionary/?to read a dictionary
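The contrast in (11a) and (11b) can be sketched as a lookup over qualia roles. The sketch is an illustration under stated assumptions: the role names formal, telic (purpose) and agentive (coming into being) come from Pustejovsky's qualia theory rather than from the passage above, and the class and function names are hypothetical.

```python
# Illustrative sketch only. The role names "formal", "telic" and "agentive"
# are taken from Pustejovsky's qualia structure; the class and function names
# are hypothetical. Values follow examples (11a) and (11b).
from dataclasses import dataclass


@dataclass(frozen=True)
class Qualia:
    formal: str    # what kind of thing the noun denotes
    telic: str     # verb expressing its purpose
    agentive: str  # verb expressing how it comes into being


QUALIA = {
    "novel": Qualia(formal="book", telic="read", agentive="write"),
    "dictionary": Qualia(formal="book", telic="consult", agentive="compile"),
}


def preferred_verbs(noun: str) -> set[str]:
    """Verbs licensed by the noun's telic and agentive roles."""
    q = QUALIA[noun]
    return {q.telic, q.agentive}
```

Both nouns share the same formal role (book), yet `preferred_verbs("novel")` and `preferred_verbs("dictionary")` are disjoint, which is the point of the contrast between ?to compile a novel and ?to read a dictionary.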


In (11a) and (11b) different verbs are employed, each sensitive to the qualia structures characterizing its object. Relationships such as perception and possession are based not on inherent markers but on semantic components of verbs, which then combine with objects that can be touched or seen. Again, the combinability goes beyond the specificity of a language. In all those cases we are dealing with free combinations, in the sense that they are not language specific. So, for instance, consider the difference between salire and montare. You can

(12a) montare su una sedia (to go up on a chair), and
(12b) salire su una sedia (to climb on a chair), and
(12c) salire sul divano (to climb on a sofa)

but not

(12d) ?montare sul divano or ?montare su una collina³ (to go up on a sofa, to go up a hill)

The semantic-encyclopaedic markers can vary from one language to another and can be relevant for word combination. Let us take the Italian word cattura (capture, arrest, warrant). Its lexical environment changes according to the context, [HUMAN] or [ANIMAL], in which it is used:
(13)
1. cattura [UOMO/HUMAN] English = warrant
Nome + cattura: mandato di cattura, ordine di cattura
cattura + Nome: cattura del delinquente/del ricercato
Verbo + cattura: ordinare la cattura, sfuggire alla cattura (to escape capture)
cattura + Verbo: la cattura ha luogo
Aggettivo + cattura: difficile cattura
2. cattura [ANIMALE/ANIMAL] English = capture
Nome + cattura: attrezzature per la cattura
cattura + Nome: cattura di un cervo/di un animale
Verbo + cattura: tentare la cattura, autorizzare la cattura, disporre la cattura, avventurarsi nella cattura

3 Moneglia (1997: 260–261).


The difference in context results in two different words in English, warrant and capture:
(13a) the capture of an animal;
(13b) to flee/escape capture;
(13c) warrant (of arrest)/to issue a warrant (of arrest)

2.4 Metaphorical congruency principles (mcp)

Sometimes combinations are allowed which are not foreseen or expected by the normal congruency rules. Indeed, speakers are often very inventive in combining words which, in general, are not bound to each other unless a metaphorical process is involved. Metaphorical property markers are assigned to objects coming from similar situations. Consider once again the word account. In Italian, an account can be considered to be opened in the same way that a light can be switched on. This similarity would then allow, theoretically, that an account could be put on like a light, in order to express that the account starts to exist, that it comes alive. A light, or a lamp, can be put on; it can be started. We could therefore imagine saying in English to put/switch on an account, but we do not: this is not possible. In Italian, on the contrary, we say, as already noted, accendere un conto. An idiosyncratic construction! This metaphorical process depends on the creativity which people in a society can develop, and it cannot be generalized and extended to all languages. Metaphorical combinations are culturally bound but congruent, because they are allowed by similarity and through the evolution of congruency rules. However, since they are determined by an operation which is bound to a language and a culture, while still respecting some (metaphorized) rules, they are not universal and, as such, must be considered a kind of collocation and not free combinations.


3. Free Combinations

Criteria should be given in order to define and establish the congruency principles which are necessary to determine free combinations. From the previous considerations we can conclude that free combinations are
– not idiosyncratic,
– regulated by core rules,
– general, maybe innate and language independent,
– required only to satisfy the general congruency principles.

Free combinations, in short, highlight the range of qualities a word possesses independently of the language in question. Let us take as examples some words which are close to each other because of their semantics:
(14) conversation, dispute, quarrel, controversy, debate, discussion
conversazione, disputa, lite, controversia, dibattito, discussione

Let us consider the verbal combinations for the word conversation and its Italian counterpart conversazione:
(15a) conversation
Verb + conversation: keep going/to carry on a conversation in, to enter/fall into conversation with s.o., to have no conversation, to hold a conversation with s.o., to make conversation
conversation + Verb: the conversation is flagging
(15b) conversazione
Verb + conversazione: avere una conversazione, avviare la conversazione, fare una conversazione, tenere viva la conversazione, ?condurre una conversazione, ?tenere una conversazione, *entrare in conversazione con q.no
conversazione + Verb: la conversazione languisce

Consider that to hold a conversation has no counterpart in Italian, whereas you can have tenere viva la conversazione (keep going).
(16a) dispute
Adj + dispute: heated dispute
Verb + dispute: avoid a dispute (evitare una contestazione), to be in dispute with (avere una vertenza con), to intervene in a dispute, to settle a dispute (appianare una controversia)
Prep + dispute: beyond dispute (incontestabilmente, fuori discussione)
dispute + Verb: a dispute arises (nasce una controversia, si accende una controversia/disputa)
(16b) disputa
Verb + disputa: accendere, risolvere una disputa
Adj. + disputa: annosa disputa
disputa + Adj.: disputa accademica, familiare, legale
Loc + disputa: al centro di una disputa legale

The noun disputa in Italian is, as a matter of fact, not as productive as dispute in English. Italians make use of other specifications, using words such as contestazione or expressions such as appianare una lite.
(17a) quarrel
Verb + quarrel: to have a quarrel (avere una lite), to settle a quarrel (appianare una lite, fare la pace), to pick a quarrel with someone (litigare con q.no, attaccare briga con q.no), to begin a quarrel with s.o. (attaccare lite con q.no)
quarrel + Verb: a (bitter) quarrel arises (nasce una grossa lite, scoppia una lite)
(17b) lite
Verb + lite: attaccare lite con q.no, avere una lite, appianare una lite, sedare una lite
lite + Verb: nasce una lite, scoppia una lite, ?accendere una lite
(18a) controversy
Verb + controversy: to give rise to controversy, to hold a controversy, to carry on a controversy with s.o. on sthg.
controversy + Verb: a controversy arises
(18b) controversia
Verbo + controversia: avere una controversia, suscitare una controversia, *tenere una controversia, *condurre una controversia con q.no
(19a) discussion
Verb + discussion: to draft a discussion, to draw a discussion, to enter into a discussion, to face a discussion on, to give up a discussion, to start a discussion, to intervene in a discussion, to tackle a discussion


discussion + Noun: discussion of a bottle of wine (il gustare/la degustazione di una bottiglia di vino)
(19b) discussione
Verb + discussione: affrontare una discussione, avere una discussione, condurre una discussione, entrare in discussione con q.no, iniziare una discussione
(20a) debate
Verb + debate: to attend a debate, to chair a debate, to intervene in a debate, to lead a debate, to win/lose a debate
(20b) dibattito
Verb + dibattito: avviare un dibattito, condurre un dibattito, dirigere un dibattito, guidare un dibattito, innescare un dibattito, intervenire a un dibattito, presenziare a un dibattito, presiedere un dibattito, vincere/perdere un dibattito

The majority of those verbs are, in principle, free combinations. Most of them correspond in both languages and would not pose an acquisition problem. Free combinations as such should be easier to learn than collocations, since they comply with general principles which are more universal than those marking collocations. Some of the combinations are collocations in the sense that they have a specific morphological or phonological form, which is also language bound. In some cases, for example the term dispute, the corresponding Italian form is much poorer and needs the use of other lexical entries. The selection of verbs combining with these entries is also specific, since the verbs change from one language to another, and as such these combinations should nevertheless be considered collocations. In sum, all those general categories which regulate the relationship between things and are considered properties and markers of each word are involved (Alinei 1974; Levin/Rappaport 1992) and, as such, allow for free combinations. Free combination is based only on generic congruency principles and responds to general rules bound by the nature of things; as such, it is part of the competence of every language speaker. Collocations, on the other hand, are language specific and have the function of indicating the proper way to express things in a specific language.


4. Collocations

Consider now, for instance, that while in Dutch or German you can poetsen (polish) your teeth (Dutch tanden poetsen), in Italian you can only clean or wash (pulire or lavare) or brush (spazzolare) them. In Italian, on the contrary, you can only polish metal or shoes. This means that languages, as mentioned above, use different collocations and not free combinations. To wash/clean a shirt would however be allowed in all languages, so that those verbs (clean and wash) in combination with shirt would be considered a “free combination”. Wash and clean in combination with teeth, on the contrary, would not be “free”. White teeth, by contrast, would be considered a free combination, since the adjective white in this case is neither language bound nor idiosyncratic, but merely highlights the encyclopaedic nature of a tooth. This holds at least when Italian and English are compared. The degree of freedom in the combination must be measured and is not precisely given.
(21a) tanden poetsen / *tanden wassen
(21b) pulire/lavare i denti / *lucidare i denti
(21c) wash oneʼs teeth / brush oneʼs teeth
(21d) white teeth

4.1 Authentic collocations

Authentic collocations are those combinations, such as fare la doccia, which are language-typical but not the result of a metaphorical process. The same can be said of fissare un appuntamento (to make an appointment), or lavare i denti (to brush oneʼs teeth). Cf.:
(22a) fare la doccia / *prendere una doccia
(22b) fissare un appuntamento / *fare un appuntamento

The Italian verb tagliare (to cut) is used with a noun such as erba (grass) but also with carne (meat) or capelli (hair). This is not the case in other languages, where different verbs must be employed in combination with grass, meat, hair or bread. For instance in Dutch you will have, respectively,


maaien (gras, to mow the grass), snijden (vlees, to slice meat), knippen (haar, to cut the hair), snoeien (de heg snoeien/to clip a hedge, tagliare/potare la siepe), kappen (houten kappen, to chop wood, tagliare/spaccare la legna). In Dutch, speakers use specific verbs in combination with these nouns. Such verbs are of course also present in Italian (falciare, affettare, accorciare con le forbici, potare la siepe, spaccare la legna respectively), but in Italian they appear only in some particular contexts. We therefore have to conclude that, when languages are compared, what is a free combination in one language need not be one in another. In conclusion, stating whether a combination is a collocation or a free combination depends on language comparison and contrastive analysis, and is therefore didactically relevant (Lo Cascio 2000, 2004) and predictive of learning processes. It is within a comparison between languages that we can speak of collocations. In Dutch, there is only one verb: huren (affittare = to rent). In Italian, on the contrary, more than one verb is used for huren: affittare (prendere in affitto) and noleggiare. While affittare can be used for all objects, noleggiare can only be used for a mobile object (e.g. a bike or a car) but not for an immovable object (such as an apartment or a house). This means that, for a (native) Dutch speaker, verbs such as noleggiare or affittare are collocations of nouns such as bike or car, while for an apartment the collocation can only be affittare. Noleggiare is then a marked verb for Dutch speakers.
(23a) een auto huren, een fiets huren, een huis huren
(23b) affittare una macchina, una bicicletta, una casa
(23c) noleggiare una macchina, una bicicletta, *noleggiare una casa
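The contrast in (23a)–(23c) can be sketched as a small lookup in which the choice of Italian verb for Dutch huren depends on whether the rented object is mobile. The Python below is a minimal sketch under stated assumptions: the `MOBILE` table and the function `italian_verbs_for_huren` are hypothetical names introduced for illustration, and the only data used are the three nouns from the examples.

```python
# Hypothetical marker table for the nouns in (23a)-(23c):
# True means the object is mobile, False means it is immovable.
MOBILE = {
    "macchina": True,    # car
    "bicicletta": True,  # bike
    "casa": False,       # house
}


def italian_verbs_for_huren(noun: str) -> list[str]:
    """List the Italian verbs that can render Dutch 'huren' with this noun."""
    verbs = ["affittare"]           # affittare can be used for all objects
    if MOBILE[noun]:
        verbs.append("noleggiare")  # noleggiare only for mobile objects
    return verbs


for noun in ("macchina", "bicicletta", "casa"):
    print(noun, italian_verbs_for_huren(noun))
```

For a Dutch learner, the extra condition on noleggiare is exactly the piece of information that a bilingual dictionary entry for huren would need to make explicit.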

In other languages, the corresponding word for the Italian pane (bread) will allow different combinations (with adjectives or verbs) than in Italian. This is because bread as a word will have a different structure belonging to another organization, and because the specific language has developed other ways to say things. In the contact between two languages, a mechanism comes into play which brings interference from the native language into the foreign language. This mechanism, which stems from the contact between the two languages in the learner, generates three types of rules:

(24) General Rules (GR), Common Rules (CR) and Specific Rules (SR).
[Diagram: GR + CR, SRL1, SRL2]

Suppose a speaker uses two languages i.e. L1 (native language) and L2 (second or foreign language). He will try to create an efficient system with the aim of arriving at a common set of rules which he thinks are valid in both languages. In other words, we could predict that every speaker of two languages tends to organize the data into 3 subsets of rules. One subset (GR = General Rules) is a collection of all the rules which are common to all languages. In this case, a mature speaker doesn’t need to relearn this subset of rules because they belong to the capabilities and knowledge of all human beings. A second subset of rules, the Common Rules (CR), is more functional. It puts together all the rules which are shared by or common to those languages which are under consideration. In our case the CR would then be the rules common to L1 and L2. The CR are easy for a speaker to learn, to remember, and to use since they follow the rules of the mother tongue. The number of rules considered to be common to at least two languages depends, of course, on the linguistic knowledge of the speaker and can vary from one speaker to another and from one moment to another. The organization and reorganization of this set of rules is continuously evolving. The CR (Common Rules) subset includes, therefore, every type of rule which is common to two (or more) languages at different levels: lexical, semantic, syntactic, phonetic, pragmatic and so on. Specific Rules (SR) are rules belonging only to one of the languages. Therefore we have SRL1 and SRL2. An English speaker, when speaking about the “opening of an account” would tend to use, in Italian, the verb open/aprire but not the verb accendere/switch on because the two languages then behave differently and the use of account is peculiar. As stated earlier, you don’t “switch on” an account (accendere un conto) in English.
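The partition into GR, CR, SRL1 and SRL2 described above can be read as a set-theoretic operation. The following Python is a rough sketch under that assumption, not a model proposed in the chapter: rules are reduced to language-neutral glosses such as "take + shower", the three toy rule sets are illustrative only, and the names `partition_rules` and `is_transfer_error` are hypothetical.

```python
def partition_rules(l1: set[str], l2: set[str], general: set[str]):
    """Split the rules of two languages into GR, CR, SRL1 and SRL2."""
    shared = l1 & l2
    gr = shared & general  # rules assumed common to all languages
    cr = shared - general  # rules common to L1 and L2 only
    sr_l1 = l1 - l2        # rules specific to L1
    sr_l2 = l2 - l1        # rules specific to L2
    return gr, cr, sr_l1, sr_l2


# Toy rule sets built from the chapter's examples (glossed in English).
ITALIAN = {"wash + shirt", "open + account", "switch on + account", "make + shower"}
ENGLISH = {"wash + shirt", "open + account", "take + shower"}
UNIVERSAL = {"wash + shirt"}  # assumption: treated here as a General Rule

GR, CR, SR_L1, SR_L2 = partition_rules(ITALIAN, ENGLISH, UNIVERSAL)


def is_transfer_error(rule: str) -> bool:
    """True if an L1-specific rule is being carried over into L2."""
    return rule in SR_L1


print(sorted(CR))
print(is_transfer_error("make + shower"))
```

In these terms, the learner's mistake discussed in the text amounts to treating a member of SRL1, such as "make + shower", as if it belonged to CR.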


4.2 Code mixing

Many mistakes are made when speakers assign language-specific phenomena to the Common Rules (CR). An Italian might guess that in English, as in Italian, it is appropriate to speak about switching on an account. A bilingual speaker may consistently make the mistake of assigning to the set of CR constructions which belong to the SR (Specific Rules) of L2. Sometimes a speaker unwittingly applies to the L2 rules that come from the L1. The reverse, i.e. a speaker applying to the L1 (his mother tongue) rules belonging to the L2, is very rare. Such a mistake takes place when the speaker is very fluent in L2 or lives in the country of L2, so that the rules of L2 unconsciously influence and interfere with the rules of the L1. In other cases, an idiomatic expression can be so common and typical that mistakes in its usage are rare. For instance, a fluent speaker of Italian wishing to say that somebody is smart would say è un tipo in gamba (literally, he is a type in leg!) and not è un tipo superbo, even if that would make sense syntactically and is acceptable in another language. A speaker who thinks that both languages behave in the same way, when organizing the languages into the three sets of rules, makes a wrong classification by assigning to the Common Rules some rules which should remain separate, as part of the Specific Rules of L1, and as such distinguished from those of L2. An Italian who says to an English-speaking person I would like to make a shower instead of I would like to take a shower is making such a mistake: he classifies that construction under the Common Rules when he should use a Specific Rule. Contrary to Italian, where one fa la doccia (makes/does a shower), in English you take a shower, you donʼt make one (cf. Lo Cascio 1982, 1987). This is a case of code mixing. One of the tests that can help us understand the cognitive relevance of collocation is code-mixing.
Both internal (monolingual) and external (bilingual or multilingual) code-mixing are primarily involved in collocations. In monolingual situations it could happen that a collocate belonging to another word is used. An Italian speaking English, or even an English native speaker, might say

(25) *I tried to draft him into a conversation, in analogy to I tried to draft him into a discussion, instead of saying I tried to draw him into a conversation

While in a bilingual code-mixing situation it is conceivable that an Italian speaker addressing English-speaking people might say
(26) *I would like to make a shower (desidero fare una doccia)

while an Italian speaker addressing an English speaker who understands Italian might say
(27a) *I tried to imbastire a discussion
instead of
(27b) to draw, or to sketch or to draft a discussion

using, in this way, a collocate from his native language (imbastire una discussione) for a similar English word (discussion). The question is then whether Complex Formation Rules are at stake here and which cognitive mechanism is involved.

5. Electronic dictionaries

In any case, in lexicology and also in lexicography, the entire range of free combinations and collocations should be given in order to represent the whole behaviour of each word (Dizionario Combinatorio Compatto Italiano 2012), that is, the network to which every word belongs according to its grammatical character, its semantic markers and its contextual behaviour. It is vital that collocations, the free combinations allowed by the qualia structure of a word, and fixed forms such as idioms and proverbs all be given. A dictionary, in other words, should try to give all the syntagmatic relationships for every entry and present the entire lexical universe, the mini-system to which an entry, a lexical node, belongs. Kilgarriff asserted that “if we had a database containing all the facts and generalizations about the behaviour of all the words and


phrases of the language, optimally structured, then we wouldn’t need linguistics. But we don’t. That is what linguistics aims to do” (2012: 29). Still, I believe that we are in an era where electronic lexicography can already replace theoretical linguistics, since electronic lexicography is nowadays able both to simulate how language works and to produce language. But this will be the case only on condition that electronic lexicography is able to show how language is organized in a network by giving the right lexical combinations. Any electronic dictionary therefore has to describe the entire combinatorial lexical system, simulating the knowledge a native speaker possesses about a lexical entry. The dictionary, as such, must be compiled on a relational database, hierarchically organized, where semantics is dominant and where all combinations are given. Because of the relational system, and because such a dictionary makes it possible to surf from one word to another, the dictionary is able to simulate the functioning of the language in a dynamic way: every word can appear as a starting node from which all the allowed links depart to other nodes, as happens in our mind. This can be shown through an existing electronic Italian-Dutch dictionary, from whose databank a dictionary can be extracted in which all the word combinations are given or can be recovered. Let us take two nodes as examples and follow the lexical trip the dictionary provides. In order to reconstruct systematically the network and lexical behaviour of each lexical entry, I published a Dizionario Combinatorio Italiano (2013) in which idiomatic forms and formulas, proverbs, locutions, collocations and the more frequent free combinations are given. The on-line publication of such a dictionary, with an enormous amount of information and research possibilities, is now imminent (locasciodictionary.com). The examples below are taken from both dictionaries.
For discussione I present only the part in the Dizionario Combinatorio Italiano (2013) featuring the combination with adjectives:

discussione [di-scus-sió-ne] nome f. s. -i
1. dibattito, colloquio, in cui si esamina un argomento considerando le diverse opinioni a favore e contro
AGG. + discussione
sgangherata – [discussione scorretta, disordinata, priva di linea e di coerenza] → ci mancavi solo tu in questa sgangherata discussione. Per quanto mi riguarda, è meglio finirla qui.
discussione + AGG.
– accanita; – accesa/animata [discussione vivace]; – agitata → ci fu uno scambio di revolverate a seguito di un’agitata discussione tra i malviventi; – amichevole; – antipatica; – approfondita; – ardua [discussione difficile, impegnativa]; – arroventata; – articolata; (fig.) – aspra; – banale; – breve; – calma; – calorosa; – collegiale; – concernente/riguardante … [discussione che ha come argomento …]; – concitata; – cordiale; – costruttiva; – critica; – delicata [discussione difficile, che comporta prudenza, sensibilità] → non voglio affrontare la delicata discussione teologica sul rapporto tra fede e verità; – difficile; – enorme → nel nostro paese si dovrebbe aprire una discussione enorme sulla pulizia nei bagni e la cortesia del personale nei locali pubblici; – feconda; – franca; – fruttuosa; – furiosa; – futile [discussione insignificante, inutile]; – generale; – imbarazzante [discussione che per il suo tema mette a disagio]; – impegnativa [discussione difficile, ardua, che richiede coinvolgimento] → a giorni inizierà una discussione impegnativa per il partito su un tema molto delicato e rilevante come quello dell’eutanasia; – incandescente; – inconcludente; – incredibile; – infruttuosa; – infuocata; – insolita; – insulsa [discussione banale, futile] → credo che questa discussione insulsa sul modo di vestirsi della famiglia reale sia andata avanti anche per troppo tempo; – interessante; – interminabile; – inusitata [discussione inconsueta, insolita]; – inutile; – irritante; – noiosa; – oziosa; – pacata; (pol.) – parlamentare; – pericolosa; – piacevole; – politica; – pretestuosa [discussione falsa, che costituisce un pretesto]; – privata; – proficua; – pubblica; – rovente; – serena; – seria [discussione impegnativa]; – solita [discussione che torna spesso] → non vuole essere la discussione solita sull’inquinamento; – spiacevole; – sterile; – tematica [discussione riguardante uno specifico tema] → alcuni giornali italiani hanno deciso di aprire una discussione tematica sull’emigrazione italiana al confronto con l’attuale immigrazione nel nostro paese; – tranquilla; – vasta; – violenta → è scoppiata una discussione violenta tra di loro, forse una questione di gelosia; – viva [discussione animata accesa]; – vivace

This arrangement differs from the traditional one, illustrated by the following text taken from Lo Cascio Grande Dizionario Elettronico Italiano/Neerlandese-Neerlandese/Italiano 2006:


Figure 1. Lo Cascio’s CD-ROM: advanced search for the lemma discussione. In the left-hand box verbal phraseologisms are shown, in which discussione appears.

6. Rhetorical function of electronic dictionaries

Cicero states in De Oratore (1994: 139) that a speaker has to create the right linguistic profile in order to convince his audience. He must be able to push the audience where he wants. He must also be able (Cicero De Oratore 1994: 141) to construct an elegant speech with nice thoughts and fine words. Speakers must be able, through their speech, to modify people’s feelings, to reach agreement, and to foster adhesion. Nevertheless, without a lexicon it is impossible to communicate, to convince or to persuade an audience. The task of an orator is to say things in a pertinent way. As a matter of fact, it is not enough to use words; one must combine them in an appropriate way and according to the combinatorial


congruency rules which apply to that specific language. What is important is how the words are arranged. Every word must be placed in the right context. Words are organized in sequence as if they were formulas. We do not use words alphabetically; rather, we store them in our brain organized in a system. Every word is a node in a network of words belonging to a specific area. Rhetoric means the ability to persuade, which is the aim of eloquence. According to Quintilianus (Institutio Oratoria 1979: 285, vol. 1), rhetoric is in fact the art of persuading “with words”. Aristotle (Rhetoric 1988) would say that rhetoric is a way to guide the interlocutor with words to where the orator wants. For Cicero (De Oratore 1994: 141), the speaker must choose the right word combinations, using the linguistic profile in order to communicate in an appropriate way. Word combinations and collocations are then the main core of linguistic production and communication. Hence, communication takes place according to the right rhetorical laws and systems. Nowadays, electronic dictionaries are our rhetorical tools.


Figures 2 and 3: Presentation of the lemma applicazione in the online version of the Dizionario Combinatorio Italiano (locasciodictionary.com). Applicazione appears together with its “lexical combinations” (e.g. adjectives following [AGG+] or preceding the noun [AGG-], nouns following or preceding it, etc.).

7. On-line Dictionaries

Nowadays, society offers new ways to find out how to speak properly. Dictionaries are the right place to store the language system. Modern electronic dictionaries show how words combine properly, and enable the user to find the right word for the right context. Researching word combinations helps us reconstruct how linguistic systems work for a specific communication area: law, economics, fashion, cuisine and so on. On-line dictionaries are the right tools to organize information in monolingual or bilingual contexts in order to construct adequate rhetorical messages. An electronic dictionary created on a relational database is able to simulate how language functions. As a matter of fact, words in an electronic dictionary belong to a system whereby every entry is a lexical node from which other parts of the language branch off. This is the job of an electronic dictionary of the new generation.
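What it might mean for a dictionary to be built on a relational database can be suggested with a toy table in which each row links a headword to one of its combinations. The sketch below uses Python's standard `sqlite3` module; the table name `combination`, its three columns and the function `combinations_for` are assumptions made for illustration and do not describe the actual schema of locasciodictionary.com. The sample rows are taken from examples (15b) and (16b).

```python
import sqlite3

# In-memory database with one hypothetical table of word combinations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE combination (headword TEXT, pattern TEXT, phrase TEXT)")

rows = [
    ("conversazione", "Verb + N", "avere una conversazione"),
    ("conversazione", "Verb + N", "avviare la conversazione"),
    ("conversazione", "N + Verb", "la conversazione languisce"),
    ("disputa", "Adj + N", "annosa disputa"),
    ("disputa", "Verb + N", "risolvere una disputa"),
]
conn.executemany("INSERT INTO combination VALUES (?, ?, ?)", rows)


def combinations_for(headword: str, pattern: str) -> list[str]:
    """Return the phrases stored for a headword under one combination pattern."""
    cursor = conn.execute(
        "SELECT phrase FROM combination "
        "WHERE headword = ? AND pattern = ? ORDER BY phrase",
        (headword, pattern),
    )
    return [row[0] for row in cursor]


print(combinations_for("conversazione", "Verb + N"))
```

Because every phrase is a row keyed by its headword, the same table can be queried from any entry, which is one simple way of treating each entry as a node from which links depart.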


8. New electronic platform

A multilingual platform is now ready (locasciodictionary.com) with Italian as the central language. In particular, two resources are available: a bilingual Italian–Dutch dictionary, and a Combinatory dictionary of Italian (Locasciodictionary.com online, publisher Italned). These are on-line dictionaries which take us to a platform with many possibilities. Every entry is the branching point connecting all the word combinations belonging to that word. This enables the user to surf the system and find all kinds of information from the entire databank, as if it were functioning like our brain. The number of places where information about the chosen word can be found is enormous. There are more than 300,000 phraseological units, 200,000 explanations and more than 65 specific language jargons. All the words and phraseological units are marked with a tag which assigns them to a specific language jargon: economics, fashion, culinary, medical and so on. It is also possible to search by category, for instance a “feminine noun belonging to fashion” or “all the vulgar terms” belonging to the language. Another advanced search enables the user to go from the meaning to the entries or phraseologies. This is possible because all the entries, and also a large portion of the phraseological units, are explained. Given a keyword, the software is capable of recovering all the words and phraseological units which have that term in their explanation, expression or concept. For instance, if you write an expression (concept) such as “in movimento” (in movement), the program will give all the lexical items which have that concept in their explanation. The same holds true for the phraseological units. For instance, if you look at the combinatorial dictionary under advanced search and type “in movimento”, you get different entries (agitare, accompagnare, attrito, collisione, marcia, and so on) and many phraseological units.
For example, under the entry “dinamica”, you will find the phraseological unit dinamica dei fluidi which is explained as “ramo della fisica che studia la proprietà dei fluidi in movimento”. Imagine the endless enjoyment of surfing in this linguistic network!
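The search from meaning to entry described above can be approximated, in a very reduced form, by a substring search over explanations. The Python below is a minimal sketch under stated assumptions: the dictionary `EXPLANATIONS` and the function `search_by_concept` are hypothetical, the three explanations are quoted from the entries cited in this chapter, and only the explanation text is searched (not the expression or a separate concept field).

```python
# Explanations quoted from the entries cited in the chapter; the container
# itself is a hypothetical stand-in for the dictionary's databank.
EXPLANATIONS = {
    "dinamica dei fluidi": "ramo della fisica che studia la proprietà dei fluidi in movimento",
    "discussione": "dibattito, colloquio, in cui si esamina un argomento "
                   "considerando le diverse opinioni a favore e contro",
    "discussione sgangherata": "discussione scorretta, disordinata, priva di linea e di coerenza",
}


def search_by_concept(concept: str) -> list[str]:
    """Return, in alphabetical order, the units whose explanation contains the concept."""
    needle = concept.casefold()
    return sorted(
        unit for unit, text in EXPLANATIONS.items() if needle in text.casefold()
    )


print(search_by_concept("in movimento"))
```

A real platform would presumably rely on indexing rather than a linear scan, but the principle of going from a concept to the units explained by it is the same.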


References

Alinei, Mario 1974. La Struttura del lessico. Bologna: Il Mulino.
Aristotle 1988. Rhetoric, Poetics (Retorica, Poetica). Roma/Bari: Editori Laterza (translated by Armando Plebe & Manara Valgimigli).
Cicero, Marcus Tullius 1994. De Oratore (introd. Emanuele Narducci). Biblioteca Universale Rizzoli (Latin text, ed. K.F. Kumaniecki, Leipzig, Teubner 1969).
Cowie, Anthony (ed.) 1998. Phraseology: Theory, analysis, and applications. Oxford: Oxford University Press.
De Mauro, Tullio / Lo Cascio, Vincenzo (eds) 1997. Lessico e Grammatica. Roma: Bulzoni.
Dizionario Combinatorio Compatto Italiano. 1st ed. 2012. Amsterdam: John Benjamins.
Dizionario Combinatorio Italiano. 1st ed. 2013. Amsterdam: John Benjamins.
Heid, Ulrich 1997. Proposte per la costruzione di un dizionario elettronico delle collocazioni. In De Mauro, Tullio / Lo Cascio, Vincenzo (eds) Lessico e Grammatica. Roma: Bulzoni, 47‒62.
Kilgarriff, Adam 2012. Review of Fuertes-Olivera and Bergenholtz, e-Lexicography: The Internet, Digital Initiatives and Lexicography. Kernerman Dictionary News. 20, 26‒29.
Levin, Beth / Rappaport, Malka 1992. The Lexical Semantics of Verbs of Motion: the Perspective from Unaccusativity. In Roca, Iggy (ed.) Thematic Structure, its Role in Grammar. Dordrecht: Foris Publications, 247‒269.
Lo Cascio Grande Dizionario Elettronico Italiano-Neerlandese / Neerlandese-Italiano, 1st ed. 2006. Amstelveen: Italned Foundation.
Lo Cascio, Vincenzo 2000. Compétence linguistique et Collocations. In Collès, Luc et al. (eds) Didactique des langues romanes: le développement des compétences chez l’apprenant. Actes du colloque de Louvain-la-Neuve, janvier 2000. Bruxelles: De Boeck/Duculot, 349‒359.
Lo Cascio, Vincenzo (ed.) 2007. Parole in rete. Novara: Utet Università.
Lo Cascio, Vincenzo / Adelaar, Mascia 1986. Temporal relation, localization and direction in discourse. In Lo Cascio, Vincenzo / Vet, Co (eds) Temporal Structures in Sentence and Discourse. Dordrecht: Foris Publications, 251‒297.
Lo Cascio, Vincenzo / Jezek, Elisabetta 1999. Thematic-Role Assignment and Aspect in Italian Pronominal Verbs. In Mereu, Lunella (ed.) Boundaries of Morphology and Syntax. Amsterdam: John Benjamins, 253‒270.
Lo Cascio, Vincenzo 1995. The relation between tense and aspect in Romance and other languages. In Bertinetto, Piermarco et al. (eds) Temporal reference, Aspect and Actionality. Torino: Rosemberg/Sellier, 273‒293.
Lo Cascio, Vincenzo 1997. Semantica lessicale e i criteri di collocazione nei dizionari bilingui a stampa ed elettronici. In De Mauro, Tullio / Lo Cascio, Vincenzo (eds) Lessico e Grammatica. Roma: Bulzoni, 63‒88.
Lo Cascio, Vincenzo 2004. La lessicografia e il lessico nella mente 2. In Incontri, Amsterdam/Utrecht: APA Holland/University Press. 1, 17‒30.
Lo Cascio, Vincenzo 2012. Nelle reti del lessico. In Ferreri, Silvana (ed.) Lessico e Lessicologia. Roma: Società di Linguistica Italiana/Bulzoni, 3‒27.
Melʼčuk, Igor 1988. Semantic description of lexical units in an Explanatory Combinatorial Dictionary: Basic principles and heuristic criteria. International Journal of Lexicography. 1, 165‒188.
Milroy, Lesley / Muysken, Pieter (eds) 1995. One speaker, two languages: Cross-disciplinary perspectives on code-switching. New York: Cambridge University Press.
Moneglia, Massimo 1997. Teoria empirica del senso e proprietà del lessico: note sulla selezione. In De Mauro, Tullio / Lo Cascio, Vincenzo (eds) Lessico e Grammatica. Roma: Bulzoni, 259‒291.
Puglielli, Annarita 1997. Quale e quanta grammatica in un dizionario? In De Mauro, Tullio / Lo Cascio, Vincenzo (eds) Lessico e Grammatica. Roma: Bulzoni, 91‒111.
Pustejovsky, James 1995. The Generative Lexicon. Cambridge (Mass.): The MIT Press.
Quintilianus, Marcus Fabius 1979. Institutio Oratoria. Faranda, Rino / Pecchiura, Piero (eds), 2 vol., Unione Tipografico-Editrice Torinese.

Michele Prandi

Distributional restrictions based on word content and their place in dictionaries

Abstract: The aim of this paper is to sketch a typology of the different families of restrictions that constrain the distribution of conceptual contents within the structure of consistent sentences. Consistency criteria, traditionally named selection restrictions, constrain the access to processes and properties by the great categories of beings circumscribed by a shared natural ontology: for instance, human beings are allowed to dream or speak, living beings to be born and die, whereas inanimate nature is barred from all these processes. Lexical solidarities (Porzig 1934) are language-specific lexical restrictions: for instance, German draws a line between eating by human beings (essen) and eating by animals (fressen). Finally, there are cognitive restrictions, motivated by shared models about the typical structure of empirical facts: for instance, streams flow, trees bear fruit, birds fly. These different layers of restrictions form a hierarchy. Consistency criteria circumscribe from outside the area of consistent concepts, the same area that is organised by both language-specific lexical structures and cognitive models. Lexical solidarities draw subtler distinctions among consistent concepts: murder, for instance, is kept distinct from kill in that it requires a human being as a direct object. This supplementary restriction is internal to the area of animate beings that circumscribes the consistent objects of kill and the consistent subjects of die. Cognitive models simplify the structure of empirical experience but are in turn consistent: it is possible to see or imagine birds that do not fly and flying fish, but not walking trees or speaking stones. Unlike consistency criteria, lexical solidarities and cognitive models share the property of being internal to the area of consistency. Unlike lexical solidarities and like consistency criteria, cognitive models are shared far beyond the borderline of a single linguistic community.
Keywords: selection restrictions, consistency criteria, lexical solidarities, cognitive models, cognitive restrictions

In this paper, I shall examine a general question that cuts across the different subjects discussed in the present volume, that is, the distributional constraints based on word content. The label content-based restrictions covers a wide array of heterogeneous, although interconnected structures. The main aim of this paper is to turn it into an orderly map.

When one thinks of content-based restrictions, the first notion that comes to mind is collocation. In this paper, however, I do not speak of collocations, for two reasons. First of all, although often used to refer to a variety of different data, this cover notion easily leads one to think of a unitary set of structures, flattening relevant differences and discouraging an in-depth differential analysis. Secondly, some current definitions of collocations (see for instance Grossmann/Tutin 2002, 2003; Mel’čuk 2003; Mel’čuk/Polguère 2006; Orlandi, in this volume, § 1.1, for an overview) include both phraseology and word formation. Both, however, are located outside the domain of distribution, for phraseological units and compound words have a distribution but not a distributional structure: they occur in constructions, but their inner structure is not the outcome of a combinatory process. Like formal syntactic constraints, by contrast, content-based constraints presuppose construction and distributional structure, a datum that leaves no room for phraseological combinations of lexemes on the one hand and word formation on the other.

After examining some different ideas on lexicon to be found within contemporary linguistics and their manifold relations with conceptual contents and syntactic structures (§ 1), I shall suggest some criteria that enable us to identify three layers of content-based distributional restrictions, namely lexical solidarities, consistency criteria and cognitive models (§ 2). In the final section, I shall hypothesise that the three layers form a hierarchy, analyse the relation of each with lexical structures, lexical contents and formal syntax and examine the place of each in dictionaries (§ 3).

1. Conceptual contents, lexicon and syntax

Lexicon is far from being a univocal concept. First, there is not a shared notion of lexicon in linguistics. Furthermore, to speak of distributional restrictions implies analysing lexemes in use in syntagmatic structures, a methodological stance that raises some specific questions unknown to more traditional analyses based on systematic lexical values out of use. On the one hand, lexemes in use carry with them a nebula of information that goes far beyond the borderline of anything it is reasonable to call lexicon. On the other, when dealing with content-based distributional constraints it is impossible not to call into question the relationship with formal distributional constraints belonging to syntax.

If we consider the scientific literature of the last century, the documented notions of lexicon range between two opposite ends. According to the structural tradition (Trier 1931[1973], 1932[1973]; Lyons 1963, 1977; Coseriu 1967, 1968), lexicon is a language-specific structure like phonology or grammar. As Lyons (1963: 37) puts it, “Each language must be thought of as having its own semantic structure, just as it has its own phonological and grammatical structure”. The consequence of such a statement is that the formal organisation of lexical structures is highlighted at the expense of substantive conceptual contents. At the opposite end, some cognitive linguists share the idea that lexicon contains any kind of conceptual and factual information connected in some way or other to the use of words. This methodological stance questions the borderline between lexical and encyclopaedic information on the one hand and the borderline between the symbolic and the indexical dimension on the other. The former point is underlined by Haiman (1980: 331) when he claims that “dictionaries are encyclopaedias”.
The latter idea is shared by some cognitive linguists, among others Croft/Cruse (2004) and Fauconnier (1997), who anchor the meaning of sentences in contingent uses, leaving aside the distinction between long-lasting structures and contingent events and therefore the distinction between the symbolic and indexical dimension: “A language expression E does not have a meaning in itself; rather, it has a meaning potential (Fauconnier 1992), and it is only within a complete discourse and in context that meaning will actually be produced” (Fauconnier 1997: 37).

The relationship between lexicon and syntax is in turn intricate. In recent years, the study of lexicon has been moved from the paradigmatic dimension to the syntagmatic one, that is, from the correlations between lexemes within lexical fields to the relations between lexemes within the combinatory structure of model sentences. According to Gross (2012: Ch. 2), for instance, the relevant unit for lexical analysis
coincides with a structure formed by a predicative term saturated by its arguments, that is, with a model sentence. Against this methodological stance, however, the object of lexicography ends up overlapping with the object of syntax to a certain extent. Therefore, the question of their interaction comes to the foreground. Insofar as lexical meanings have a combinatory dimension, and therefore a distributional structure, what is the exact relation between the syntax of meanings, so to speak, and the syntax of forms? Once again, the developments of the last decades in linguistics suggest two opposite answers. Within a typical formal paradigm, as documented by the classical version of generative grammar (Chomsky 1957, 1965), syntactic structures are assumed to be independent of any kind of conceptual constraint: “grammar is autonomous and independent of meaning” (Chomsky 1957: 17) and “uniquely determines […] semantic interpretation” (Chomsky 1966: 5). Syntax is conceived of as formal not only in that it has a static form but also, and above all, because it imposes a form on the organised concepts. At the opposite end, within functional and cognitive approaches, syntax is assumed to be instrumental and iconic or, more specifically, diagrammatic. In this light, syntactic structures have neither an autonomous form nor a shaping power, but simply mirror independent networks of conceptual relations. The active role of syntax is confined to profiling conceptual relations (Langacker 1987), that is, to imposing a perspective on them (Fillmore 1977). According to Dik (1989[1997: 8]), for instance, “Semantics is regarded as instrumental with respect to pragmatics, and syntax as instrumental with respect to semantics. In this view there is no room for something like an ‘autonomous’ syntax”. The iconic view of syntax, in turn, is split into a weak and a strong version. 
In the weak version, the structure of sentences mirrors the structure of long-lasting conceptual models: “the linguistic form is a diagram of conceptual structure” (Haiman 1985: 2). In the strong version, the structure of sentences is assumed to depict contingent experiential situations. According to Langacker (1991[1992: 35]), for instance, the meaning of a sentence is the “image” of “a particular event known in full detail”: “When we use a particular construction or grammatical morpheme, we thereby select a particular image to structure the conceived situation for communicative
purposes” (Langacker 1991[1992: 12]). Once again, the borderline between the symbolic and the indexical dimensions is blurred. In any case, if we assume that syntactic structures mirror the structure of complex concepts, syntax is not an autonomous structural level but is reduced to the relational dimension of lexicon making room for complex conceptual structures and therefore taking the model sentence as its relevant unit. According to Langacker (2000: 18), “lexicon and grammar form a continuum, structures at any point along it being fully and properly described as symbolic in nature”: grammar is a repository of complex structures each of which, like a lexeme, has a meaning, or a family of interconnected meanings. In a similar light, construction grammar suggests that “Constructions themselves carry meaning” (Goldberg 1995: 1). The meaning of a ditransitive construction, for instance, “can be argued to be the sense involving successful transfer of an object from an agent to a recipient, with the referent of the subject agentively causing this transfer”.

2. The content-based restrictions: selection restrictions, lexical solidarities and cognitive models

This whole range of issues concerning the relation between conceptual structures, lexical structures, lexical contents and syntax is developed throughout the history of Twentieth Century linguistics, from Porzig to cognitive linguistics. The first linguist who isolated syntagmatic lexical structures, and therefore a first type of distributional restrictions grounded on contents, was Porzig (1934), who suggested the concept of lexical solidarity soon after Trier (1931) published his seminal work on lexical fields, which are paradigmatic lexical structures. In German, for instance, the difference between the verbs essen and fressen depends on the nature of the subject, which is a human being in the former case and an animal in the latter. A lexical solidarity is an asymmetric kind of structure, which connects a determinant term with a determined one. The determined
term is a relational, unsaturated concept – the verb essen, for instance – whereas the determinant term is provided by a class of arguments appropriate for its saturation: for instance, the class of human subjects. A lexical solidarity contributes to determining the value (Saussure 1916[1974: 116–117]) of a relational lexeme.

Unlike lexical solidarities, which belong to formal lexical analysis, the discovery of selection restrictions is one outcome of syntactic research, and in particular of the study of the constraints imposed on the distribution of formal classes of words and phrases within the structure of the sentence. When looking for the constraints that govern the distribution of such formal categories as noun, verb, noun phrase, verb phrase, distributional analysis inevitably meets with an independent layer of constraints that depend on conceptual contents. A verb like pour, for instance, requires a second argument whose form and conceptual content are equally constrained: in formal terms, it has the form of a noun phrase – the direct object; in conceptual terms, it refers to a concrete liquid substance. According to Harris (1946: 178), formal constraints belong to syntax, whereas conceptual constraints belong to lexicon, which is seen as a “semantic” complement of formal syntax:

“there are further limitations of selection among the morphemes so that not all sequences provided by the formulae [that is, the formal schemata of syntagmatic structures] actually occur. Individual limitations of selection cannot be described in these formulae; at best, the most important among them can be stated in special lists or in the dictionary.”

Unlike Harris, Chomsky (1965) considers selection restrictions as a kind of syntactic constraint. This methodological stance transfers into linguistics Carnap’s idea of a logical grammar (Carnap 1932) that includes both formal and conceptual distributional restrictions. Whereas the function of formal constraints is to account for the ungrammaticality of such ill-formed combinations as This and is perfectly, the function of conceptual constraints is to filter out such inconsistent conceptual combinations as Colorless green ideas sleep furiously, And Winter pours its grief in snow (Emily Brontë), They sleep, the mountain peaks (Alcman)¹.

¹ The first line of Alcman’s famous Nocturne, eúdousin d’oréon koryphaí, is quoted in the English translation by M.L. West, Greek Lyric Poetry, Oxford World’s Classics, Oxford: Oxford University Press, 1993: 35. As Jakobson (1959[1971]) immediately underlined, Chomsky’s move is in conflict with the idea of a formal grammar that, like Husserl’s ‘pure grammar’, should be independent of any substantive conceptual constraint. If syntactic connections are independent of the connected concepts, and in the first place of conceptual consistency, why should formal syntax filter out inconsistent combinations? As we shall see below, conflictual complex meanings provide both the most powerful argument in favour of formal syntax and a functional justification for it.

The idea that selection restrictions belong to syntax was immediately rejected by generative semantics (McCawley 1970[1971]; Lakoff 1971), which claimed the “semantic”, and therefore lexical, nature of selection restrictions. According to McCawley (1970[1971: 218]), selection restrictions “are not restrictions on how lexical items may be combined but rather restrictions on how semantic material may be combined”. Within the framework of generative semantics, the adjective semantic is used in a broad sense; it does not refer to language-specific lexical structures but roughly means conceptual, or notional in Jespersen’s (1924: Ch. III) sense. The same attitude was inherited some decades later by cognitive and functional scholars: according to Lakoff (1987: 539), for instance, “the meanings are concepts in a given conceptual system”. The idea that selection restrictions belong to lexicon is now a commonplace sanctioned by handbooks (see for instance Leech 1974; Palmer 1976; Lyons 1977). On the other hand, within the framework of cognitive linguistics, which neutralises the distinction between lexical contents on the one hand and cognitive structures, encyclopaedic information and contingent data on the other, selection restrictions are seen either as restrictions on shared cognitive models (Fillmore 1977) or as beliefs about the world (Haiman 1980: 345):

“semantic constraints and beliefs about the world are not to be distinguished. Thus, selection restrictions: the sentence ‘the rock is pregnant’ violates a selection restriction. But the categorization of rocks as inanimate and hence, a fortiori, barren, is a belief about the world, and one which is not necessarily shared by everyone.”

In addition to that, it should be stressed that functional and cognitive linguists who are acquainted with the structural tradition (see for instance Dik 1989[1997: 91] and Wierzbicka 1980: 87) equate selection
restrictions to lexical solidarities. According to Geeraerts (1991: 38), for instance, “syntagmatic semantic relations, known in transformational grammar as selection restrictions […] had actually been discussed earlier by Porzig (1934)”.

To sum up, linguists use three labels to refer to content-based constraints on the distribution of lexemes: lexical solidarities, selection restrictions and restrictions on cognitive models. According to some linguists, these labels simply constitute three different ways to refer to the same phenomena. The aim of this paper is to argue for the opposite hypothesis: lexical solidarities, selection restrictions and cognitive constraints, in spite of some surface analogies, represent three profoundly distinct categories which are linked by a web of complex and revealing connections. The distinction is relevant for the structure and aims of dictionaries. As we shall see below, in particular, selection restrictions, unlike lexical solidarities, are never stated in current dictionaries, whose aim is to give definitions of isolated words. By contrast, if a dictionary is meant to account for the use of words in model sentences – if it is “generative” in Gross’s sense – selection restrictions become an essential component.

2.1 Selection restrictions and lexical solidarities

Whereas the discovery of selection restrictions is connected to a distributional approach to syntactic structures, Porzig’s work lies outside syntax. His idea was not to describe distributional limitations based on word content but to identify a syntagmatic kind of formal and language-specific lexical structure. As pointed out by Coseriu (1967), the function of lexical solidarities is to provide the language-specific formal organisation of lexical fields with a peculiar kind of differential features, which are not inherent but relational.
The pair essen vs fressen, for instance, organizes the conceptual area of eating in German thanks to a differential dimension that is not based on some inherent features of the verbal content but depends on its relation with a specific kind of subject². This premise already suggests that lexical solidarities and selection restrictions are radically different. In order to examine this point in detail, we shall now observe some significant instances. The lexical paradigm organising the conceptual area of killing in English contains such values as murder, assassinate, slaughter, exterminate, execute, slay, butcher, and massacre. Each of these different lexemes is subjected to language-specific lexical solidarities, which impose rather arbitrary restrictions on the kinds of being that can hold as patient.

“Murder is restricted to persons: assassinate adds the restriction that the object must be a person in a position of political importance and that the agent has a political motive for killing. Slaughter and butcher seem to be terms used primarily for the killing of animals for food […] Slay is applied to humans or higher animals, overlapping somewhat with slaughter, but it has an archaic, especially biblical, connotation. Exterminate is usually used for intentionally killing in order to get rid of fairly low forms of animal life, e.g. insects, or animals that are considered pests, e.g. rats […] Massacre adds the feature that the object consists of a group of people […] Execute is like kill, and adds the qualification that the act is a punishment for a crime and is carried out according to the laws or mores of a social group” (Lehrer 1974: 123–124).

² Lounsbury (1964: 1073–1074) identifies a root meaning correlated with some oppositive dimensions: “We shall regard as a paradigm any set of linguistic forms wherein: (a) the meaning of every form has a feature in common with the meanings of all other forms of the set, and (b) the meaning of every form differs from that of every other form of the set by one or more additional features. The common feature will be said to be the root meaning of the paradigm. It defines the semantic field which the forms of the paradigm partition. The variable features define the oppositive dimensions of the paradigm”. The classical distinction, however, is not appropriate here, for the generic concept of eating is not a meaning in German because there does not exist a hypernym such as English eat.

As the examples show, a language is sovereign when imposing specific restrictions on the use of words, but it is so on one condition: all these restrictions are internal to the boundaries of consistency, and presuppose it. English can freely legislate about what kinds of being can be murdered, slaughtered or massacred, but on the preliminary condition that all these beings are mortal, and therefore living beings. It is not the task of English to state what kinds of being can die, and therefore be killed. Whereas the relation between assassinate and its direct object, which has to refer to a person of political importance, is a clear instance of lexical solidarity, the constraint that restricts death to living beings is
a clear case of selection restriction. Whereas the lexical solidarity is restricted to the English language, selection restrictions belong to a more general system of concepts that is shared by a very large community. The relation between selection restrictions and lexical solidarities should now be clear: lexical solidarities draw further language-specific restrictions within conceptual areas that are by definition consistent, and therefore within the conceptual borderline drawn by selection restrictions. If this is true, selection restrictions cannot possibly be seen as language-specific lexical structures. Selection restrictions are consistency requirements that belong to a natural ontology, namely to a sort of conceptual constitution shared by a very large number of people that cuts across many different linguistic communities and governs not only linguistic expression and thought but also, and in the first place, everyday behaviour. The reasons that lead one to think that a sentence such as The moon smiles (Blake) has an inconsistent meaning are the same that prevent one from asking questions and giving orders to the moon or to a tree (Prandi 2004: Ch. 8).

The differences between lexical solidarities and consistency criteria are confirmed by the observation of conceptual conflicts. The violation of a lexical solidarity is a lexical mistake. Even when it leads to a conflict, it does not end up in inconsistency. The utterance John murdered a spider, for instance, is conflictual but not inconsistent. In spite of the inappropriate lexical choice, the action itself is consistent in that a spider can be killed. The barrier between the spider and the act of murdering is not conceptual, but formal lexical. This is the reason why lexical conflicts can be considered shallow conflicts. The violation of a consistency criterion, on the other hand, gives rise to a process that is essentially and irreversibly inconsistent and does not admit a consistent framing.
The utterance describing a smiling moon, for instance, ascribes to the satellite an action that is inconsistent with its essential properties as an inanimate being. What is wrong in the utterance is not the choice of the word but the action itself. The barrier between the moon and the smile is not just lexical: it is conceptual. This is a case of true inconsistency.

As any consistent conceptual relation can by definition be framed in consistent words, lexical mistakes can be removed by substitution. In
the presence of a lexical mistake, it is always possible to find a more or less direct path through lexical structures, leading to at least one alternative non-conflicting lexeme on the basis of a common root meaning. Given an example such as John murdered a spider, for instance, the verb can be replaced by kill, the generic hypernym of the field containing murder. On the other hand, if we take into account the utterance John murdered a calf, the verb can be replaced either by the generic verb kill or by the more specific slaughter, which applies a correlative lexical solidarity to the common root meaning ‘kill’: John slaughtered a calf. As it is a question of conceptual lawfulness, by contrast, inconsistency cannot be removed by lexical substitution. If a poet describes the moon as a smiling creature, no consistent alternative formulation can be attained by lexical substitution. There is no English verb that is at the same time correlative or structurally connected to smile and fit for inanimate beings. What is barred to the moon is not a given lexeme, but the whole conceptual area of human expression, no matter what lexeme is used to describe it. A last argument in favour of this line of reasoning is offered by translation. Lexical structures are by definition language-specific, and so are lexical mistakes, which implies that translation does not necessarily preserve them. The anomaly of the English utterance The horse is mewing is preserved in French – Le cheval miaule – and Italian – Il cavallo miagola – because the same lexical solidarity is shared by all three languages. The anomaly of the German sentence Hans frißt, by contrast, disappears in English, French and Italian. The translation of an inconsistent meaning, for its part, has no effect on the conceptual conflict. 
In one of his sonnets, the French poet Charles Baudelaire attributes to the moon the inconsistent experience of dreaming: Ce soir la lune rêve avec plus de paresse, which is translated into English word by word as The moon tonight dreams vacantly³. On the sole condition that the same system of consistency criteria is shared, whatever language the utterance is translated into, the inconsistency does not disappear. For the utterance to lose its inconsistency, it is not enough to imagine another lexical structure; one has to imagine another conceptual landscape – a conceptual picture of the world which, unlike ours, would allow celestial bodies to dream. For a person grown up within such a strange ontology, on the other hand, this utterance would in any case be taken as consistent, no matter what language is used to express it.

³ Baudelaire, Charles, The flowers of evil, transl. by James McGowan, Oxford: Oxford University Press, 2008.

2.2 Selection restrictions and cognitive models

Cognitive models (Holland/Quinn 1987) impose substantive conceptual restrictions on the structure of typical kinds of things and states of affairs. Like lexical solidarities and unlike consistency criteria, cognitive models belong to the territory of consistent concepts, the same that is circumscribed by consistency criteria. Like consistency criteria and unlike lexical solidarities, cognitive models are conceptual structures shared by a very large group including many different linguistic communities. Cognitive models share an essential property with consistency criteria: neither conceptual system depicts a picture of the world as it actually appears, but rather a picture of some possible worlds. The two pictures, however, are depicted according to different relevance criteria, and therefore each bears a specific relationship to the world of real experience. The rule of cognitive modelling is typicality. Cognitive models draw a simplified picture of what our shared world would look like if it were inhabited only by typical beings behaving in a typical way. The typical world is a poorer version of ours, containing nothing more and much less. The typical world, for instance, does not contain a single bird unable to fly. The rule of consistency is substantive possibility, and therefore conceptual lawfulness. Consistency criteria uncover an indefinite set of worlds, including any possible worlds that combine in any imaginable way all kinds of being and properties compatible with conceptual lawfulness.
The inverted world of Baroque poetry, for instance, is as admissible as its real counterpart in terms of consistency. Strolling through consistent worlds, one could meet swimming birds, flying feathered fish – De l’océan de l’air les poissons emplumés (Chevreau) – and floating clouds – I wandered lonely as a cloud / That floats on high o’er vales and hills (Wordsworth) – under a black sun⁴:

Le feu brûle dedans la glace
le soleil est devenu noir (Théophile)

Fire burns within ice
the sun has become black

Though held to be generally true, cognitive models are sensitive to actual experience. They not only admit the possibility of being falsified by experience but also actually entail it, for the question about typicality arises only insofar as experience makes room for non-typical instances. Most instances that falsify the expectations raised by cognitive models are taken as natural: this is the case, for instance, of birds that cannot fly. Some others would be seen as somehow odd: to come across a feathered fish, for instance, would certainly be an amazing experience. All these kinds of being, however, are equally conceivable, because they do not cross the boundaries of consistency: cognitive modelling is internal to consistency. By contrast, there is no room for inconsistent beings or processes in either actual or possible experience. Inconsistent beings and processes such as the dreaming moon or sleeping mountains are conceivable only as complex meanings of significant expressions, that is, as semantic structures of the symbolic order.
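
The layering argued for in § 2 can be made concrete with a small sketch. The Python fragment below is only an illustration under stated assumptions: the toy ontology, the tables CONSISTENCY, LEXICON and TYPICAL, and the functions classify and substitutes are hypothetical names with hand-picked entries introduced here, not a lexical resource described in this paper. In particular, the consistency requirement given for fly is a simplification assumed for the example. The sketch checks a lexeme and its argument first against consistency criteria, then against language-specific lexical solidarities, and finally against cognitive models.

```python
# Toy model of the three layers of content-based restrictions.
# Every entry below is an illustrative assumption, not lexicographic data.

# Shared natural ontology: the categories each kind of being belongs to.
ONTOLOGY = {
    "John": {"human", "animate", "living"},
    "calf": {"livestock", "animal", "animate", "living"},
    "spider": {"animal", "animate", "living"},
    "sparrow": {"bird", "animal", "animate", "living"},
    "fish": {"animal", "animate", "living"},
    "moon": set(),  # inanimate: none of the categories above
}

# Layer 1, consistency criteria: category required for the process
# to be conceivable at all (not language-specific).
CONSISTENCY = {"kill": "living", "dream": "human", "fly": "animate"}

# Layer 2, lexical solidarities (English): each lexeme has a root meaning
# and, possibly, a further restriction inside the consistent area.
LEXICON = {
    "kill": ("kill", None),
    "murder": ("kill", "human"),
    "slaughter": ("kill", "livestock"),
    "dream": ("dream", None),
    "fly": ("fly", None),
}

# Layer 3, cognitive models: the typical participant of a process.
TYPICAL = {"fly": "bird"}


def classify(lexeme, argument):
    """Return the outermost layer that the combination violates, if any."""
    root, solidarity = LEXICON[lexeme]
    categories = ONTOLOGY[argument]
    if CONSISTENCY[root] not in categories:
        return "inconsistent"
    if solidarity is not None and solidarity not in categories:
        return "lexical mistake"
    typical = TYPICAL.get(root)
    if typical is not None and typical not in categories:
        return "consistent but atypical"
    return "consistent"


def substitutes(lexeme, argument):
    """Lexemes sharing the root meaning that fit the argument."""
    root = LEXICON[lexeme][0]
    return sorted(
        other
        for other, (other_root, _) in LEXICON.items()
        if other_root == root
        and classify(other, argument).startswith("consistent")
    )


if __name__ == "__main__":
    print(classify("murder", "spider"))    # lexical mistake
    print(substitutes("murder", "calf"))   # ['kill', 'slaughter']
    print(classify("dream", "moon"))       # inconsistent
    print(substitutes("dream", "moon"))    # []
    print(classify("fly", "fish"))         # consistent but atypical
```

On these toy data a lexical mistake can be repaired by moving to another lexeme with the same root meaning, whereas an inconsistency leaves the list of substitutes empty, which is the asymmetry discussed in § 2.1. The order of the three checks is a design choice meant to mirror the claim that lexical solidarities and cognitive models both operate inside the area delimited by consistency criteria.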

⁴ The examples are taken from Genette (1966: § 1).

3. Content-based constraints, lexicon and syntax

Consistency criteria, lexical solidarities and cognitive models are not different labels for the same structures but radically different structures; furthermore, they are not on the same level but form a hierarchy. Selection restrictions hold as general consistency criteria whose function is to delimit the territory of shared consistent concepts. Within the territory of consistent concepts, each language builds up a network of language-specific lexical relations and correlations, among which lexical solidarities find their place. Within the same territory of consistent concepts, general cognitive models, which are shared by a very large group including many linguistic communities, impose substantive constraints on the structure of typical states of affairs to be found in common experience. Against this background, we are now ready to answer the questions asked at the beginning of this paper: what is the place of each kind of constraint within the lexicon or, to put it another way, what idea of lexicon is consistent with each kind? What is the place of formal lexical structures, cognitive models and consistency criteria in a dictionary? What is the relationship between content-based restrictions and formal syntax?

3.1 Content-based constraints and lexicon

Like semantic fields, lexical solidarities belong to the formal, language-specific layer of lexical structures. Although relevant for the description of the inner structure of a given language, the formal idea of lexicon is not up to facing the challenge of syntax, that is, to providing the conceptual contents needed for the construction of complex meanings of complex expressions. In order to perform this task, formal lexical structures have to be filled with substantive conceptual contents, which however are not language-specific but shared by a larger community. Like Janus, a functionally adequate lexicon has two dimensions: it has a language-specific form but also a more general conceptual content. The formal lexicon is a system of relations and correlations between values that belongs to the specific structure of a linguistic system. The functional lexicon is a shared repository of consistent and substantive conceptual contents. The relevant relations and correlations that build up the formal lexical structure of a given language can be identified against the backdrop of any kind of substantive information thanks to the commutation test. The only relevant relations and correlations that are retained as lexical structures are those that are correlated to differences on the expression plane (Hjelmslev 1943[1961]; Coseriu 1967, 1968).
In Latin, for instance, in the area of old age, the conceptual distinction between humans, things, and a composite class including animals and vegetables is relevant in that it provides the distinction between the meanings of senex, vetus and vetulus with its differential dimension (Duchacek

1965: 58). A point is worth underlining here: although relevant on linguistic grounds, such classes, and in particular the grouping of animals and vegetables, bear no cognitive import. More generally, such lexical classes are devoid of conceptual relevance5. Once formal lexical structures have been isolated, we can assume that the functional side of the lexicon is an organised repository of concepts and conceptual relations. In order to draw a more exact portrait of a functional lexicon, however, we have to face two orders of questions. On the one hand, we have to take control over two centrifugal drifts that challenge its firmness, that is, the drift towards occasional data belonging to the indexical dimension and the drift towards encyclopaedic information. On the other hand, within the core of functional lexicon, we have to find a proper place for either layer of substantive conceptual structures, that is, for cognitive models and consistency criteria. One condition a substantive content must comply with in order to be included into functional lexicon is that it is both long lasting and shared by a wide community of speakers. This criterion ideally filters out any kind of contingent datum bound to a contingent speech event and occasionally shared by its contingent actors. Let us suppose that in order to understand a given message one has to realise that red is the colour of Ann’s shirt. Many pieces of contingent information of precisely this kind have to be part of the stock of data available to the interpreter when contingent interpretative acts are performed within the borders of an occasional interpretation field (Prandi 2004: Ch. 1). But it is obvious that the piece of occasional information about Ann’s shirt cannot possibly find its place within the lexical definition of the word red. The long-lasting datum that red is the colour of blood, by contrast, is a relevant cognitive anchor for circumscribing the shared concept. 
5 The gap between lexical classes and consistent conceptual classes is underlined by metaphorical lexical extensions of the meaning of such relational lexemes as verbs or adjectives. In English, for instance, one can harbour, gratify, obey, espouse a desire, which can fuel a project. Since they are shared by English speakers, all these uses are consistent: as we have remarked above, lexical structures are tautologically consistent. This however implies that the inclusion of wishes, boats and human beings in the same lexical class bears no consequence on ontological classification.

The activation of contingent information is a condition for understanding
a contingent message, but not a condition for understanding the long-lasting meaning of a word such as red. The borderline between lexical contents and encyclopaedic information is at one and the same time elusive, as stressed by Haiman (1980: 329), and inescapable: “One does not expect to find in a dictionary a compendium of everything that is known about horses: if one did, the entry for ‘horse’ alone would be considerably longer than an entire dictionary. But where exactly does one stop? And, more important, why does one stop?”. Haiman’s question presupposes that lexical and encyclopaedic information are similar kinds of data, so that their difference is a pure matter of granularity and any boundary an arbitrary choice. However, the difference between a lexical definition and an encyclopaedic description is not simply a matter of degree within a continuum – “where exactly does one stop?”, as Haiman puts it – but of relevance. An encyclopaedic description ideally contains anything people are assumed not to know and would like to know about the objects referred to by words, and takes for granted anything that can be assumed as shared by everybody. An encyclopaedic description of a car, for instance, is not expected to explain at length its socially assumed function, whereas one would be surprised not to find in it a set of technical and historical data, and an accurate description of different types and famous models of cars. This considerable amount of data about cars is likely to enrich one’s empirical knowledge. The definition of a concept, by contrast, tries to make explicit the assumptions that are ideally shared by everybody, namely everything everybody is supposed to take for granted and rely upon when using a word. As the nature of these tacit assumptions is not empirical, their explication does not expand our body of positive knowledge.
This is the reason why a concept, even if it necessarily contains some positive information about entities, is not simply a selection from encyclopaedic data but belongs to an incommensurable order of magnitude. Once the core of the functional lexicon is restricted to a repository of long-lasting conceptual structures shared and relied upon by its users, the following step is to define the place of the two independent orders of long-lasting and shared conceptual structures, that is, cognitive models and consistency criteria.

If we move from the structure of lexical information to the structure of dictionaries, the first point to be stressed is that consistency criteria are never stated in dictionaries. A good dictionary of English, for instance, is expected to state that the verb murder denotes “the deliberate and unlawful killing of a person” (Collins Cobuild) but not that the consistent use of the verb requires an animate object. A possible reason could be that consistency criteria are not language-specific. However, this hypothesis is falsified by cognitive models, which, although generally shared, are incorporated into definitions. In fact, the real reason lies at a deeper level. If it is true that both lexical structures and cognitive models belong to the territory of consistency, and that the function of consistency criteria is to circumscribe such a territory from outside, the conclusion is that the place of consistency criteria is not within the definitions of lexical contents but among the presuppositions of the act of defining and more generally of the lexical enterprise. This point is underlined by Black (1952[1954: 32]), who compares the consistency conditions of lexical definitions to the felicity conditions of speech acts. Just as an act of promise fails if it is addressed to a celestial body, the definition of such verbs as murder, assassinate, exterminate, kick the bucket or slaughter only makes sense against the presupposition – the tautology, in a sense – that death can only be predicated of animate beings. In the same way that the presuppositions of consistent actions are never stated but simply relied upon when acting, the presuppositions of definitions are not stated but simply relied upon when defining. Black’s conclusion is that consistency criteria remain “outside the jurisdiction of the definition”.
Black’s remark certainly holds for the traditional game of defining, which assumes that the object of lexical definitions is provided by consistent conceptual structures. In particular, it holds for both formal lexical structures and consistent conceptual models of things and processes, which are equally assumed to be consistent. However, if the scope of lexical description is moved from the individual lexeme to the structure of the model sentence and the aim of a dictionary is not simply to describe formal and substantive lexical contents but to account for the distribution of lexemes within sentence structures – for uses (emplois), in Gross’s terminology – consistency can no longer be taken for

granted but is in turn called into question. In the same light, consistency criteria can no longer be kept in the background but have to be focused on as objects of lexical analysis. When distributional analysis crosses the borderline of purely formal syntax to include the syntax of concepts, so to speak, Harris’s idea of lexicon as a conceptual complement of formal syntactic distribution becomes relevant. In formal distributional terms, namely in terms of formal syntactic structures, there is no difference between such a sentence as John poured wine into Mary’s glass and And Winter pours its grief in snow (Emily Brontë): both are perfectly well formed. The difference is that the former is consistent whereas the latter is not. The former builds up a complex meaning that corresponds to a shared conceptual model independently accessible to consistent thought, whereas the latter builds up a complex meaning that finds no place among consistent concepts. If its task, among others, is to account for such a difference, a dictionary seen as a systematic description of lexical structures and contents cannot help but contain a systematic analysis of consistency criteria. Gross’ model of “generative lexicon”, for instance, makes room for them (Gross 2012). In Gross’ model, consistency criteria are dealt with in terms of such “hyperclasses” as ‘abstract’, ‘concrete’, ‘human’, ‘animate’ and ‘vegetable’ and thus distinguished from distributional restrictions internal to consistency, including both lexical solidarities and cognitive models, dealt with in terms of “object classes”. Gross’ approach suggests that the difference between consistency criteria and distributional restrictions internal to consistency is a matter of granularity and thus somehow overlooks the essential difference between structures that account for consistency and structures that are by definition consistent. 
In spite of this, there is unquestionably room for consistency criteria in a dictionary that focuses on the syntax of concepts.

3.2 Content-based constraints and formal syntax

At this point, we are ready to deal with the last question: the relationship between content-based distributional restrictions and formal syntax. If it is conceived of as a systematic description of the consistent distribution of lexemes within the structure of model sentences, a

dictionary contains a syntax of its own – a syntax of concepts. This being the premise, the relevant question is whether the syntax of concepts, namely the set of relations displayed by consistent conceptual structures and made explicit in a dictionary, is the same as the syntax of forms, namely the structure of model sentences. As mentioned above, this question receives a positive answer within the framework of cognitive linguistics, and in particular of construction grammar: if it is true that they mirror the structure of consistent conceptual models, the formal syntactic structures of model sentences coincide with the relational dimension of the lexicon. Within such a model, there is no room for an autonomous formal syntax, that is, for a distribution of formal classes of expressions independent of the consistent distribution of concepts. Borrowing Husserl’s (1901[1970: 511]) words, there is no room for “that a priori system of the formal structures which leave open all material specificity of meaning”. In my opinion, the idea of a formal syntax coinciding with the relational dimensions of lexical contents is falsified by an inescapable datum belonging to the common experience of homo loquens: syntax is not only an instrumental device in the service of independent and consistent conceptual structures, but also a creative device capable of connecting concepts in unexpected ways. The complex conceptual structures that circumscribe the distribution of shared concepts are tautologically consistent. The distribution of syntactic classes in formal syntactic structures, by contrast, is not constrained by a requirement of consistency. The same formal syntactic structures that mirror consistent combinations of concepts when used in a purely instrumental way – as for instance in John poured wine into Mary’s glass – are capable of connecting concepts in such a way as to challenge the shared conceptual lawfulness: And Winter pours its grief in snow. 
A fortiori, the formal structures of syntax are not constrained by lexical solidarities – John murdered a spider – and by cognitive models: Fire burns within ice / the sun has become black. As Husserl (1901[1970: 511–512]) points out, inconsistent meanings do not belong to meaninglessness but are the outcome of a successful formal connection of significant parts to form a meaningful whole, or “unified meaning” (“einheitliche Bedeutung”). Since they connect concepts in

unexpected ways, they both document the autonomy of formal syntactic structures and provide a functional justification for it (Prandi 1987, 2004). Formal autonomy of syntax is the necessary condition for creative connection of concepts. If it is true that lexicon has a syntax, syntax goes far beyond lexicon.

References

Black, Max 1952(1954). Definition, presupposition and assertion. The Philosophical Review, 61. Repr. in Black, Max (ed.) Problems of Analysis. Philosophical Essays, London: Routledge & Kegan Paul, 24–45.
Carnap, Rudolf 1932(1959). Überwindung der Metaphysik durch logische Analyse der Sprache. In Erkenntnis II, 219–241. Engl. transl.: The elimination of metaphysics through logical analysis of language. In Ayer, Alfred (ed.) Logical Positivism, New York: Macmillan, 60–81.
Chomsky, Noam Avram 1957. Syntactic Structures. The Hague/Paris: Mouton.
Chomsky, Noam Avram 1965. Aspects of the Theory of Syntax. Cambridge/Mass: The MIT Press.
Chomsky, Noam Avram 1966. Topics in the theory of generative grammar. In Sebeok, Thomas (ed.) Current Trends in Linguistics. Vol. III: Theoretical Foundations. The Hague/Paris: Mouton, 1–60.
Coseriu, Eugenio 1967. Lexikalische Solidaritäten. Poetica. 1, 293–303.
Coseriu, Eugenio 1968. Les structures lexématiques. Zeitschrift für Französische Sprache und Literatur. Beiheft 1, 3–16.
Croft, William / Cruse, D. Alan 2004. Cognitive linguistics. Cambridge: Cambridge University Press.
Dik, Simon C. 1989(1997). The Theory of Functional Grammar. Part I: The Structure of the Clause. Dordrecht – Providence. 2nd ed. Berlin/New York: Mouton De Gruyter.
Duchacek, Otto 1965. Sur quelques problèmes de l’antonymie. Cahiers de lexicologie. 6/1, 55–66.
Fauconnier, Gilles 1992. Sens potentiel: grammaire et discours. In de Mulder, Walter / Schuerewegen, Franc / Tasmowski, Liliane (eds) Enonciation et Parti pris. Amsterdam: Rodopi, 159–171.
Fauconnier, Gilles 1997. Mappings in Thought and Language. Cambridge: Cambridge University Press.
Fillmore, Charles J. 1977. The case for case reopened. In Cole, Peter / Sadock, Jerrold Murray (eds) Syntax and Semantics. 8: Grammatical Relations. New York/San Francisco/London: Academic Press, 59–81.
Geeraerts, Dirk 1991. La grammaire cognitive et l’histoire de la sémantique lexicale. Communications. 53, 17–50.
Goldberg, Adele E. 1995. Constructions. A Construction Grammar Approach to Argument Structure. Chicago/London: The University of Chicago Press.
Gross, Gaston 2012. Manuel d’analyse linguistique. Approche sémantico-syntaxique du lexique. Villeneuve d’Ascq: Presses Universitaires du Septentrion.
Grossmann, Francis / Tutin, Agnès 2002. Collocations régulières et irrégulières: esquisse de typologie du phénomène collocatif. Revue française de linguistique appliquée. Lexique: problèmes actuels. 7/1, 7–25.
Grossmann, Francis / Tutin, Agnès (eds) 2003. Les collocations. Analyse et traitement. (Travaux et recherches en linguistique appliquée). Amsterdam: De Werelt.
Haiman, John 1980. Dictionaries and encyclopaedias. Lingua. 50, 329–357.
Haiman, John 1985. Introduction. In Haiman, John (ed.) Iconicity in Syntax. Amsterdam: John Benjamins, 1–7.
Harris, Zellig 1946. From morpheme to utterance. Language. 22, 161–183.
Hjelmslev, Louis 1943(1961). Omkring sprogteoriens grundlaeggelse. Copenhagen. Engl. transl.: Prolegomena to a Theory of Language. Madison: The University of Wisconsin Press.
Holland, Dorothy / Quinn, Naomi (eds) 1987. Cultural Models in Language and Thought. Cambridge: Cambridge University Press.
Husserl, Edmund 1901(1970). Logische Untersuchungen. Band 1: Halle 1900; Band II: Halle 1901. Rev. edition: Band 1 (Prolegomena), Band II, Teil I (Res. I–V), Halle 1913; Band II, Teil II (Res. VI), Halle 1921. Critical edition: Husserliana, Vol. XVIII (1975) – XIX, I–II (1984). The Hague: Nijhoff. Engl. transl.: Logical Investigations. London: Routledge/Kegan Paul.
Jakobson, Roman 1959(1971). Boas’ view of grammatical meaning. The Anthropology of Franz Boas: Essays on the Centennial of his Birth, Memoir LXXX. Stanford: American Anthropological Association. Repr. in Jakobson, Roman Selected Writings. Vol. II: Word and Language. The Hague/Paris: Mouton, 489–496.
Jespersen, Otto 1924. The Philosophy of Grammar. London: George Allen & Unwin.
Lakoff, George 1971. Presuppositions and relative well-formedness. In Steinberg, Danny D. / Jakobovits, Leo A. (eds) Semantics. Cambridge: Cambridge University Press, 329–340.
Langacker, Ronald 1987. Foundations of Cognitive Grammar. I. Stanford: Stanford University Press.
Langacker, Ronald 1991. Foundations of Cognitive Grammar. II. Stanford: Stanford University Press.
Langacker, Ronald 1991(1992). Concept, Image and Symbol. The Cognitive Basis of Grammar, 2nd ed. Berlin/New York: Mouton/DeGruyter.
Langacker, Ronald 2000. Grammar and Conceptualization. Berlin/New York: Mouton/DeGruyter.
Leech, Geoffrey 1974. Semantics. Harmondsworth: Penguin Books.
Longacre, Robert E. 1985(2006). Sentences as combinations of clauses. In Shopen, Timothy (ed.) Language typology and syntactic description. Vol. 2: Complex constructions. 2nd ed. Cambridge: Cambridge University Press, 372–420.
Lounsbury, Floyd G. 1964. The structural analysis of kinship semantics. In Lunt, Horace Gray (ed.) Proceedings of the Ninth International Congress of Linguists. The Hague: Mouton, 1073–1093.
Lyons, John 1963. Structural Semantics. Oxford: Blackwell.
Lyons, John 1977. Semantics. Cambridge: Cambridge University Press.
McCawley, John D. 1970(1971). Where do noun phrases come from? In Jacobs, Roderick A. / Rosenbaum, Peter S. (eds) Readings in English Transformational Grammar. Waltham/Mass: Blaisdell. Revised version in Steinberg, Danny D. / Jakobovits, Leo A. (eds) Semantics. Cambridge: Cambridge University Press, 217–231.
Mel’čuk, Igor 2003. Collocations dans le dictionnaire. In Szende, Thomas (ed.) Les écarts culturels dans les Dictionnaires bilingues. Paris: Champion, 19–64.
Mel’čuk, Igor / Polguère, Alain 2006. Dérivations sémantiques et collocations dans le DiCo/LAF. Langue française. 150, 66–83.
Palmer, Frank R. 1976. Semantics. Cambridge: Cambridge University Press.
Porzig, Walter 1934. Wesenhafte Bedeutungsbeziehungen. Beiträge zur deutschen Sprache und Literatur. 58, 70–97.
Prandi, Michele 1987. Sémantique du contresens. Paris: Les Editions de Minuit.
Prandi, Michele 2004. The Building Blocks of Meaning. Ideas for a Philosophical Grammar. Amsterdam/Philadelphia: John Benjamins.
Saussure, Ferdinand de 1916(1974). Cours de linguistique générale. Paris: Payot. Critical edition by T. de Mauro 1972. Engl. transl.: Course in General Linguistics. London: Fontana & Collins.
Trier, Jost 1931(1973). Der deutsche Wortschatz im Sinnbezirk des Verstandes. Die Geschichte eines sprachlichen Feldes. Part I: Von den Anfängen bis zum Beginn des 13. Jahrhunderts. Heidelberg: Winter. Repr. in Trier 1973, 40–65.
Trier, Jost 1932(1973). Sprachliche Felder. Zeitschrift für Deutsche Bildung. 8. Repr. in Trier 1973, 93–109.
Trier, Jost 1973. Aufsätze und Vorträge zur Wortfeldtheorie. Ed. by van der Lee Antony / Reichmann, Oskar. The Hague/Paris: Mouton.
Wierzbicka, Anna 1980. Lingua Mentalis. Sydney/New York: Academic Press.

Béatrice Lamiroy

For a typology of phraseological expressions: how to tell an idiom from a collocation?

Abstract: This article is devoted to French phraseology. Although phraseology has been researched for several decades now, to our knowledge no comprehensive typology of phraseological expressions has been proposed so far. We argue here that four phraseological types should be distinguished, both by theoretical linguists and by lexicographers: collocations (e.g. prendre la parole ‘take the floor’), idioms (e.g. prendre la mouche, lit. take the fly ‘to get furious’), conversational routines (e.g. à qui le dis-tu ! lit. to whom do you say this! ‘formula to confirm and reinforce what the interlocutor has said’) and proverbs (e.g. un tiens vaut mieux que deux tu l’auras ‘a bird in the hand is worth two in the bush’). All types share the property of being “fixed” combinations of words, i.e. multi-word units which are part of the lexical competence of a native speaker but have to be learned by heart by foreign language learners. However, each type also has properties of its own that make it distinct from any other type. We zoom in here on two types only, collocations and idioms. Collocations are binary units which usually consist of a base and a collocative and which are due to the recurrent co-occurrence of particular words. Their meaning is more often than not compositional. Idioms, by contrast, are multi-word expressions which are the result of a diachronic process in which the word combination became progressively frozen and whose meaning is prototypically opaque.

Keywords: collocation, idiom, phraseology, French

1. Introduction

The aim of this paper is to zoom in on the relation between two different types of linguistic expressions which belong to the phraseology of a language, viz. idiomatic expressions and collocations, and to shed light on their similarities and their differences. Although phraseology has attracted a lot of interest from both theoretical and applied linguists, it still suffers from an often confusing and ambiguous terminology and from having fuzzy borders (Cowie 1998; Granger/Paquot 2008; Lamiroy

2008). This is not only unfortunate from a theoretical point of view, but it is particularly problematic from a lexicographic vantage point for the simple reason that lexicographers have to decide which entries should figure in their dictionaries and which should not. This obviously holds for the words of a language but also for its phraseological expressions, since the latter also make up a language’s lexicon: like single words, their meaning is most often conventional, i.e. non-compositional, hence they have to be learned by heart by foreign language learners (Nesselhauf 2005; Verlinde et al. 2006), and therefore must be included in a dictionary. Although the question is well known to lexicographers (Blumenthal/Hausmann 2006; Cowie 1998; Hausmann 2008), mixing up different kinds of phraseological expressions is not unusual in dictionaries. Thus the main phraseological dictionary of French, viz. Dictionnaire Robert des expressions et locutions (Rey/Chantereau 1999), contains four types of phraseological expressions that we would rather distinguish as separate categories: (1) proverbs (e.g. qui dort dîne ‘when you are sleeping you do not have to eat’), (2) idioms (e.g. dormir sur ses deux oreilles lit. sleep on your two ears ‘to sleep soundly, without worry’), (3) sequences that will be analyzed here as collocations (e.g. tomber de sommeil lit. fall from sleep ‘be very sleepy’) and (4) what we consider to be situational sentences which are typical of spoken conversation (Klein/Lamiroy 2011) (e.g. à qui le dis-tu ! lit. to whom are you saying this ‘formula to confirm and reinforce what the interlocutor has said’). On the other hand, several metaphorical expressions such as ouvrir des horizons ‘to offer new perspectives’ or couper le souffle ‘to be astonishing’ appear both in Robert’s Dictionnaire des expressions et locutions (p. 510 and p. 855) and in Robert’s Dictionnaire des Combinaisons de mots, which is a dictionary of French collocations (p.
425 and 883): examples such as these show that lexicography in the phraseological domain is not always as coherent as it should be. The study presented here is meant as a contribution to “disentangling the phraseological web” (Granger/Paquot 2008), mainly from a theoretical viewpoint, but if our analysis is correct, it should also be useful for the lexicographic practice in the field of phraseology. We will argue that although idiomatic expressions and collocations share a number of properties, which is probably why they often get mixed up, they also have syntactic and semantic characteristics of their own.

All our data will be taken from French. Note, to begin with, that the two phraseological types under study have in common that they may belong to any of the major parts of speech: nouns (1a-2a), verbs (1b-2b), adjectives (2c) and adverbs (1c):

(1) a. un nid de poule lit. a chicken’s nest ‘a hole in the road’
b. casser les pieds à quelqu’un lit. to break someone’s feet ‘to bother someone’
c. (partir) en catimini/en douce ‘(to leave) in secret’

(2) a. une peur bleue lit. a blue fear ‘a major fright’
b. pousser un cri lit. to push a cry ‘to cry out’
c. grièvement blessé ‘seriously injured’

Idiomatic expressions and collocations, which will both be under analysis here, are not the only subtypes that make up a language’s formulaic lexicon (Wray 2002): at least1 two other types should be distinguished. On the one hand, there are proverbs or sayings such as

(3) a. Chat échaudé craint l’eau froide lit. a heated cat is afraid of cold water ‘if you have been traumatized by something, you may become excessively careful’
b. Qui veut aller loin ménage sa monture lit. he who wants to go far does not tire his horse ‘if you want to live a long life, it is better to avoid excesses’
c. La robe ne fait pas le moine lit. the gown does not make the monk ‘you cannot judge a book by its cover’

1 Some authors, e.g. Dannell (1992) or Machonis (2010), include phrasal verbs, i.e. verbs followed by a “fixed” adverb, among phraseological expressions. As they are much less frequent in French than in English, they will not be considered here as one of the phraseological subtypes.

Sayings as in (3) differ from idiomatic expressions both formally and semantically. Syntactically, they are always full sentences whose subject
is fixed. Semantically, they have a generic interpretation and function as precepts with a moralizing value. On the other hand, there is a much less studied subtype that we have called “situational” sentences or conversational “routines” (Klein/Lamiroy 2011). The notion of linguistic routine was launched by researchers in the ethnography of communication (Hymes 1989): routines correspond to highly conventionalized and pre-patterned expressions whose recurrent occurrence has led to automatization and which are tied to standardized communication situations such as greeting and parting, apologizing and making requests, etc. (Coulmas 1981), e.g. I beg your pardon, How is life, Good Morning, etc. Obviously, they vary according to the socio-cultural tradition of the speakers. Although a single word can correspond to a conversational routine, e.g. French Salut!, with respect to phraseology, we only apply the notion of linguistic routine to multi-word units which syntactically correspond to full sentences. Their peculiarity resides in that they only mean something in a particular discursive situation, i.e. they can only be used in a certain pragmatic context in which the hearer is called upon by the speaker. They therefore often appear in utterances containing 2nd person personal pronouns and are typical of spoken colloquial French. Examples are:

(4) a. Tu rigoles ! lit. you are joking! formula to express one’s disbelief
b. Tu parles ! lit. you speak! formula to say that the opposite is true
c. Cause toujours, tu m’intéresses lit. Keep talking, you interest me formula to express that one is not interested at all in what the interlocutor is saying

Although sentences of the kind illustrated in (4) have not traditionally been analyzed as being part of the formulaic lexicon, it should be noted that they are totally fixed, both formally and semantically, and therefore also belong to a language’s phraseology: rigoler cannot be replaced in (4a) by the synonymous verb rire. Similarly, parler and causer, although synonymous, are not interchangeable in (4b-c), and the 2nd person plural *Vous rigolez / *Vous parlez / *Causez toujours …

is not possible when these sentences are used as formulae. They are thus lexically and morpho-syntactically “fixed” sentences. Note that their meaning in (4a-b-c) is also fixed, i.e. it does not correspond to the literal or compositional meaning of the homonymous free sentences. Defining what all phraseological expressions have in common is not an easy task:

Nous sommes nombreux à trouver que c’est un thème admirable [le figement], sans pouvoir dire avec netteté ce que c’est. (Martin 1997: 291) [Many of us find that fixedness is an admirable topic, without being able to say clearly what it is.]

The general definition we propose here is the following: a phraseological expression is a multiword sequence which does not result from a free combination of words in a given language and which belongs to the lexical competence of a native speaker of that language. Although proverbs and situational sentences undoubtedly also belong to the formulaic expressions of a language (see Klein/Lamiroy, forthc.), they will not be examined here, as we will only focus on idioms and collocations. As mentioned before, all the data analyzed hereafter belong to French. However, we believe that the question of how to define, and hence to distinguish between, the different subtypes of phraseological expressions of a language applies to many languages other than French. We thus believe that some of our insights hold for phraseology in general. The following examples illustrate the four subtypes in English: idiomatic expressions (5a), collocations (5b), proverbs (5c) and situational sentences (5d):

(5) a. To kick the bucket
To go astray
b. Fast food (*quick food)
Quick meal (*fast meal)
c. Too many cooks spoil the broth
A bird in the hand is worth two in the bush
d. The pleasure was mine!
You are welcome!

The paper is organized as follows. The next section is devoted to a short presentation of a research project, called the BFQS project, which was devoted to the phraseology of native speakers of French in France, Belgium, Switzerland and Quebec (Lamiroy et al. 2010 and ). This project was based on pioneering work by Maurice Gross (Gross 1982 and 1984), who dedicated a major part of his research to the construction of a database of about 40,000 French verbal “fixed” expressions.2 However, one of the problems we encountered when using Gross’s database, was that it includes all kinds of phraseological expressions without distinction, ranging from real idioms (e.g. jouer avec le feu lit. to play with fire ‘to take high risks’) to what we consider to be situational sentences (e.g. ôte-toi de là que je m’y mette ‘get out of there so that I can take your place’) or collocations (e.g. donner un coup de balai lit. to give a hit of the broom ‘to sweep’ or prêter attention ‘pay attention’). As we will argue here, idiomatic expressions and collocations both belong to a language’s phraseology, and therefore have a number of properties in common, as we will see in section 4, but they correspond to separate types, as they display different syntactic and semantic properties. We will develop this hypothesis in sections 3 and 4, which deal with idiomatic expressions and collocations respectively. In section 5 we draw some conclusions.

2. The BFQS project

All the expressions gathered by M. Gross were in use in France, e.g. aller à Canossa (lit. go to Canossa ‘to do something humiliating’) or manger à tous les râteliers (lit. eat from all racks ‘to be opportunistic’), but his database did not take into consideration any of the expressions used by native speakers of French outside France, e.g. the Belgian expression tomber avec son derrière dans le beurre (lit. fall with your buttocks in the butter ‘to

2 The database unfortunately remained unpublished, as it was still in progress when Maurice Gross died (2001). Most of the data however can be consulted on line ().

For a typology of phraseological expressions


be lucky’), se lever le gros bout le premier (lit. stand up with the big toe first ‘to get up in the morning in a bad mood’), which is used in Quebec, or peindre le diable sur la muraille, used in the French speaking part of Switzerland (lit. draw the devil on the wall ‘to be pessimistic’). The main objective of the BFQS project was to complete M. Gross’s database with the idioms of the larger francophonie area, i.e. variants of French used by native speakers in three different countries outside the Hexagon: Belgium, Quebec and Switzerland.3 From a methodological viewpoint, the project was not meant as a sociolinguistic survey: this entails that some expressions may be better known or more frequently used by certain native speakers than by others (differing in age, education, gender etc.), as the individual phraseological lexicon varies considerably from speaker to speaker, just like the lexicon of single words. Nor did the BFQS database aim at giving any detailed information about regional variation within each country. Among the Belgian expressions for example, some idioms are mainly used in Brussels (e.g. tomber de son sus ‘to faint’), while others may be typical of Wallonia (e.g. avoir bien pour faire ‘to be rich’), but this goes beyond the description of the three national varieties of Belgian, Quebec and Swiss French and should be the object of a separate investigation, such as the one carried out by Rézeau (2001) on regional variation within France. In line with a variationist position with respect to French (Gadet 2007; Massot/Rowlett 2013), i.e. an approach which argues against the imposition of the normative (and exclusively Parisian) variant, we checked which of the expressions registered by M. Gross were also known by native speakers of French outside France. We found that this was the case for about 75 to 80% of them: BFQS expressions, i.e. expressions common to the four variants, thus constitute the hard core of all phraseological expressions.
3 Thus the project did not include any expressions belonging to African French or American French (except Quebec).

Examples are trainer le diable par la queue (lit. drag the devil by his tail ‘to be poor’), ne pas y aller par quatre chemins (lit. not go somewhere by four paths ‘to act or speak in a direct manner’) or faire porter le chapeau à quelqu’un (lit. make someone wear the hat ‘accuse someone of something he is not responsible
for’). Expressions such as these are thus shared by speakers of French in France and in the French speaking parts of Belgium, Switzerland and Canada. The remaining expressions of M. Gross’s database are of two types. Certain expressions are used in certain areas of the francophonie but not in all of them, thus belonging to intersections between varieties. For obvious reasons, these intersections take on different forms, as certain expressions may be common to France, Belgium and Switzerland but unknown in Canada, e.g. aller au charbon (lit. go to the coal ‘to go off to work’), while others are shared by Belgian and French speakers only, e.g. avoir les jambes en flanelle (lit. have legs made out of flannel ‘to be weak in your legs’), etc. The possible intersections are: BF, FQ, FS, BFQ, BFS or FQS. The other type corresponds to expressions exclusively used within the Hexagon. This was one of the surprising results of the BFQS project: given the traditional “centralist” approach of French, French idiomatic expressions are usually a priori considered to be common to all speakers of French. Yet this is not the case at all. Idioms such as peigner la girafe (lit. comb the giraffe), coincer la bulle (lit. catch the bubble) both meaning ‘to do something useless’, se faire appeler Arthur (lit. be called Arthur ‘to be yelled at’) or ça fait la rue Michel (lit. this makes the Michael street ‘this is very convenient’) are used in France and in France only, and should therefore be labeled ‘F’, i.e. they correspond to what should be called francismes, by analogy with the well-known notions of belgicismes, helvétismes and québecismes. After establishing which expressions are common to the four varieties (BFQS) and which exclusively belong to the French of France (pure F), a further objective of the project was to gather all the French idiomatic expressions that had not been listed by M. Gross. 
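The inventory of partial intersections given above (BF, FQ, FS, BFQ, BFS, FQS) can be checked mechanically. The following is only a minimal sketch in Python: it assumes, as the labels in the text suggest, that every expression in M. Gross’s database is known in France, so that each label must contain F. The function name partial_intersections is ours and purely illustrative.

```python
from itertools import combinations

VARIETIES = "BFQS"  # Belgium, France, Quebec, Switzerland


def partial_intersections(required="F", varieties=VARIETIES):
    """Labels for expressions shared by France and one or two other
    varieties, i.e. neither pure F nor common to all four (BFQS).

    Assumption (ours): every label must contain `required`, because the
    expressions considered all come from a database of French from France.
    """
    labels = []
    for size in (2, 3):
        for combo in combinations(varieties, size):
            if required in combo:
                labels.append("".join(combo))
    return labels


print(partial_intersections())
# ['BF', 'FQ', 'FS', 'BFQ', 'BFS', 'FQS']
```

Together with the two extreme cases, pure F and BFQS, these six labels exhaust the eight possible distributions of an expression that is attested in France.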
In other words, the aim was to add to the stock of French idioms those which specifically belong to the varieties spoken outside France. Here again, two types can be distinguished. On the one hand, for certain expressions from Gross’s database we found that there is a “regional” equivalent used in Belgium, Switzerland and/or Canada. For example, for the common expressions in (6) and (7), each of the three French speaking regions also has its own expression:

(6) a. Qu’il aille au diable ! lit. that he may go to the devil ‘To hell with him!’    BFQS
    b. Qu’il aille à la merde ! lit. that he may go to the shit    B
    c. Qu’il aille au balai ! lit. that he may go to the broom    Q
    d. Qu’il aille aux pives ! lit. that he may go to the fir cones    S

(7) a. Cette robe m’a coûté les yeux de la tête lit. this dress cost me the eyes of my head ‘This dress was extremely expensive’    BFQS
    b. Cette robe m’a coûté un os lit. this dress cost me a bone    B
    c. Cette robe m’a coûté un saladier lit. this dress cost me a salad bowl    S
    d. Cette robe m’a coûté un bras lit. this dress cost me an arm    Q

On the other hand, there are expressions which are totally sui generis, i.e. they do not correspond to any equivalent used within the Hexagon. For example, idioms such as faire de son nez (lit. make from your nose ‘to make a fuss’) or arriver comme des figues après Pâques (lit. arrive like figs after Easter ‘to be late’) are exclusively Belgian. Similarly, passer la nuit sur la corde à linge (lit. spend the night on the washing line ‘go out the whole night’) or avoir les yeux ronds (lit. have round eyes ‘to be drunk’) are only used in Quebec and être sur le balan (lit. be in unbalance ‘to hesitate’) and se miner le plot (lit. break one’s own head ‘to be worried’) only belong to Swiss French. In order to gather the data, we not only consulted the existing dictionaries of regional French but also found expressions by checking examples on the Internet.4 It will come as no surprise that the Quebec expressions (around 5000) largely outnumbered the Belgian and Swiss ones (around 2000 each), Quebec being geographically much further from the Hexagon. The three regional variants B, Q and S share two major properties with respect to the expressions from France: they are often archaisms, i.e. expressions that were used in France in former days (ex. 8a-b), or calques from the local neighboring language (Dutch, English or German respectively, ex. 9a-b-c):

(8) a. avoir une brette ‘to quarrel’    B
    b. être d’adon ‘to be friendly’    Q

(9) a. jouer avec les pieds de quelqu’un lit. to play with one’s feet ‘to fool someone’    B
    b. tomber en amour ‘to fall in love’    Q
    c. scier du bois lit. to saw wood ‘to snore’    S

Among the major theoretical results of the investigation, we should mention the following: In all varieties, we found that the idiomatic expressions did not differ syntactically from the so-called “free” sentences, i.e. they always correspond to one of the possible constituent structures of French, ranging from intransitive to ditransitive structures, as shown by the following examples:5

(10) a. Ça s’arrose ! ‘Let’s celebrate this with a drink’    N0 V
     b. Chacun doit jouer le jeu ‘Everyone has to play the game’    N0 V N1
     c. Ils jouent au chat et à la souris ‘They play cat and mouse’    N0 V Prep N1
     d. Paul prend des vessies pour des lanternes ‘Paul pulls the wool over his eyes’    N0 V N1 Prep N2
     e. Paul appelle un chat un chat ‘Paul calls a spade a spade’    N0 V N1 N2
     f. Paul saute du coq à l’âne ‘Paul jumps from one subject to another’    N0 V Prep N1 Prep N2

4 To make sure that the data taken from the net were reliable, we only used documents of which the source could be identified, e.g. the Belgian expression avoir un oeuf à peler (lit. have an egg to peel ‘to have a score to settle with someone’) is illustrated by an example found in the newspaper L’avenir based in Liège: (3) J’ai aussi un œuf à peler avec mes coéquipiers de club. Ceux qui font partie de la sélection m’ont beaucoup charrié. (L’avenir.net, 29/02/12) ‘I also have a score to settle with my teammates. Those who were in charge of the selection were very nasty to me.’

5 N, V and Prep are the symbols used for Noun, Verb and Preposition. The numbers 0, 1, etc. correspond to their syntagmatic order in the structure.


The reason for this is straightforward: idioms are fixed expressions from a synchronic point of view, but at some earlier stage they were free combinations which only became “frozen” over time. Thus a modern French expression such as porter le chapeau lit. wear the hat ‘accuse someone of something he is not responsible for’ is totally opaque nowadays, but was once transparent and goes back to the medieval habit of putting a hat on the head of those who were found guilty when they were taken through the streets of the village or town (Martin 1997). The sentences in (11) exemplify the same principle: they once were free sentences, but would not be grammatical in Modern French. Thus raison in avoir raison (lit. have reason ‘to be right’) can no longer be the antecedent of an anaphor (l’ in 11a), conseil in porter conseil (lit. bring counsel ‘to give advice’) could not be followed by a relative clause in Modern French (as in 11b), nor could la boule in perdre la boule (lit. lose the ball ‘to lose your head’) become the direct object of the have + past participle construction (as in 11c).

(11) a. Il me semble que vous avez raison ; et cependant il est vrai que vous ne l’avez pas. (Molière, quoted by Fournier 1998: 187)
        ‘It seems to me that you are right; however it is certain that you are not’
     b. Après que la nuit vous aura porté conseil, qui sera apparemment de nous séparer courageusement. (Mme de Sévigné, Lettre du 1.10.1677, quoted by Fournier 1998: 187)
        ‘After taking the counsel of a night’s sleep, which will be that we separate courageously’
     c. J’avais la boule complètement perdue quand nous nous sommes retrouvés au commencement de février. (Flaubert, Correspondance, letter to his niece Caroline, April 5, 1871)
        ‘I was totally out of my mind when we saw each other again at the beginning of February.’

A second result is that from a semantic point of view, the geographic variation is particularly striking in a number of general domains such as (un)happiness, health and illness, being rich or poor and weather conditions. Thus the following idioms (12a to 12f) which belong to common French and all mean ‘to have the blues’ also have their specific variants in non-hexagonal French, as illustrated by the Quebec examples in 12g to 12i:

(12) a. Avoir l’âme en peine / le blues    BFQS
     b. Ne pas avoir le moral    BFQS
     c. Avoir le moral dans les chaussettes    BFQS
     d. Avoir le/du vague à l’âme    BFQS
     e. Broyer du noir    BFQS
     f. Etre au / dans le 36e dessous    BFQS
     g. Avoir les bleus    Q
     h. Avoir le balai bas    Q
     i. Avoir le moral à terre    Q

Another result is that the notion of synonymy, which usually refers to formally different words with identical meaning within one language, does not apply to French expressions belonging to different countries. Examples (12a) to (12f) can be called synonyms, because they are interchangeable for the same speaker according to the context (e.g. avoir le moral dans les chaussettes, lit. to have your morale in your socks, is more colloquial than the more literary avoir l’âme en peine, lit. to have your soul in pain). This does not hold for (12g) to (12i), which are only “synonyms” for Canadian speakers, but not for French speakers from France, Belgium or Switzerland, to whom these expressions are unknown. In other words, the paradox here is that two French expressions can be equivalent from a referential point of view without being synonymous, because they are not shared by the same group of speakers. We therefore propose the term and the notion of geo-synonyms: (12h) for example is a geo-synonym of (12c) for speakers of French outside Quebec. Similarly, while all the expressions in (13) mean ‘to be fed up’, examples (13b) to (13e) are geo-synonyms of (13a):

(13) a. En avoir marre / ras le bol / plein le cul    BFQS
     b. En avoir ras le cul / les baskets / la patate / la casquette    BFS
     c. En avoir plein les bottes    BFS
     d. En avoir plein son capot / son sac    Q
     e. Avoir son load, avoir son voyage    Q
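The distinction between synonyms and geo-synonyms can be made explicit as a comparison of variety labels. The sketch below is our own simplification, not part of the BFQS database: it assumes that referential equivalence can be approximated by an identical English gloss, and that two equivalent expressions count as geo-synonyms whenever their sets of varieties differ. The class Expression and the function relation are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    form: str
    meaning: str          # English gloss, used as a stand-in for referential meaning
    varieties: frozenset  # subset of {"B", "F", "Q", "S"}


def relation(a: Expression, b: Expression) -> str:
    """Classify a pair of expressions (simplifying assumption: same gloss
    means referentially equivalent)."""
    if a.meaning != b.meaning:
        return "unrelated"
    if a.varieties == b.varieties:
        return "synonyms"      # shared by the same group of speakers
    return "geo-synonyms"      # equivalent, but not shared by the same speakers


# Examples (12a), (12c) and (12h) from the text
ame_en_peine = Expression("avoir l'âme en peine", "to have the blues", frozenset("BFQS"))
moral_chaussettes = Expression("avoir le moral dans les chaussettes", "to have the blues", frozenset("BFQS"))
balai_bas = Expression("avoir le balai bas", "to have the blues", frozenset("Q"))

print(relation(ame_en_peine, moral_chaussettes))  # synonyms
print(relation(balai_bas, moral_chaussettes))     # geo-synonyms
```

Under the same assumptions, (13b), labeled BFS, comes out as a geo-synonym of (13a), labeled BFQS, which matches the description given above.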

Interestingly, the reverse case also exists: a notion which usually applies to pairs of languages, viz. false friends (words or expressions which are formally similar but have a different meaning according to the language), also applies to regional variants within the same language. Thus avoir de l’allure means something different in « common » French (14a), in Quebec (14b) and in Belgium (14c):

(14) a. ‘to have a certain elegance’    BFQS
     b. ‘to be reasonable’    Q
     c. ‘to be orderly’    B

As mentioned at the beginning of this paper, a major challenge of the BFQS project was to extract the idiomatic expressions from Gross’s database, which includes all kinds of phraseological combinations, and hence to determine what an idiomatic expression exactly is. This appeared to be a much more difficult task than expected. The next section is therefore devoted to this question.

3. Idiomatic expressions

Although linguists often disagree on the exact definition of what idioms are, there seems to be a consensus in the literature on a certain number of issues. First, in recent decades linguists have come to agree that phraseology in general and idioms in particular are important because the “formulaic” part of a language (Wray 2002) is one of its central components, language use being essentially made up of recurrent word combinations. These combinations play a crucial role not only for cognitive reasons (Bolinger 1977; Everaert 1995; Langlotz 2001) but also for simple reasons of quantity, as they make up at least 30% of all produced speech6 (Dannell 1992; M. Gross 1984; Senellart 1998; Moon 1998). Secondly, idiomatic expressions are “fixed” in a threefold way (G. Gross 1996; M. Gross 1984; Lamiroy et al. 2010: 11–26), the notion of fixedness lying at the heart of the French approach to phraseology (Granger/Paquot 2008):7 (1) idioms have a global, i.e. non-compositional, meaning: the meaning of an idiom does not correspond to the sum of the meanings of the separate words that make up the expression. This is the reason why their meaning has to be learned by heart by language learners: although one knows the referential meaning of the French words porter ‘to wear’ and chapeau ‘hat’, this will not allow one to understand what the idiom faire porter le chapeau means, viz. to accuse someone who is not guilty; (2) they do not allow any lexical paradigmatic variation; and (3) they display some morpho-syntactic restrictions regarding categories such as number, person, tense, etc. Each of these characteristics is illustrated in the following examples:

• non-compositional meaning
(15) a. N cordon bleu lit. blue ribbon ‘excellent cook’
     b. V prendre la mouche lit. take the fly ‘to get very angry’
     c. Adv à l’anglaise lit. in the English way ‘secretly’

• absence of paradigmatic variation
(16) a. N *cordon rouge8 lit. red ribbon
     b. V *prendre le moustique lit. take the mosquito
     c. Adv *à la française lit. in the French way

• morpho-syntactic restrictions
(17) a. prendre la mouche / *les mouches
     b. tirer le diable par la queue / *par sa queue lit. pull the devil by the tail / *by its tail ‘to be poor’
     c. il a avalé son parapluie / *il avale / *il avalera lit. he swallowed his umbrella / *he swallows / *he will swallow ‘he is an inflexible person’

6 According to Dannell (1992), phraseological expressions even amount to 50% of our language production. It should however be noted that Dannell includes phrasal verbs in his statistics, which is not the case of Senellart 1998 for example.

However, it should be noted that the three above-mentioned criteria apply to a varying degree according to the expression. Thus certain idioms are semantically totally opaque (18a), whereas others are more easily interpretable, their meaning almost being compositional (18b):

(18) a. avoir la tête près du bonnet lit. have the head next to the hat ‘to be irascible’
     b. pleurer toutes les larmes de son corps lit. cry all the tears of one’s body ‘to sob out’

7 As pointed out by Granger/Paquot 2008, the Anglo-Saxon tradition often adopts a wider perspective and mainly insists on the frequent co-occurring of words, thus including combinations that would probably fall outside the scope of phraseology according to French linguists.

8 These combinations are of course totally well formed when taken in their literal (compositional) meaning.

The example in (18b) has however to be analyzed as an idiomatic expression because it is formally, i.e. lexically (19a) and morphosyntactically (19b), totally fixed:

(19) a. *pleurer toutes les larmes de son coeur lit. cry all the tears from one’s heart
     b. *pleurer beaucoup de larmes de son corps lit. cry many tears from one’s body

The same holds for possible lexical variations: in most cases a paradigmatic change is impossible or it converts the idiom into a “free” sentence (20b), but some idiomatic expressions do allow a certain lexical variation (21a):

(20) a. Il a cassé sa pipe lit. he broke his pipe ‘he died’
     b. Il a cassé sa tirelire ‘he broke his piggy bank’

(21) a. Il va / suit / poursuit / continue son petit bonhomme de chemin lit. he goes / follows / continues his little man of way ‘he goes his own way’

And similarly, morpho-syntactic constraints do most often apply, but sometimes they do not. Compare for example the possibility of introducing a negation in the following examples: (22b) is a grammatical sentence, but (23b) is not:

(22) a. Elle nous a donné carte blanche lit. she to-us has given white card ‘She let us act freely’
     b. Elle ne nous a pas donné carte blanche

(23) a. Les bras m’en tombent lit. the arms to-me fall down ‘I am very surprised’
     b. *Les bras ne m’en tombent pas

Moreover, the very existence of semantic, lexical and morpho-syntactic constraints, which is characteristic of idioms, is not an exclusive property of phraseological language: free sentences are also subject to various restrictions (Hausmann 1997; Lamiroy 2003 and 2008). It suffices to recall that verbs, for example, subcategorize for certain complements upon which they impose selection restrictions. Lexically, non-idiomatic language is also made up of preferred word combinations, much more so than one would expect, as has repeatedly been pointed out in the literature (cf. Biber/Conrad 1999; Stefanowitsch/Gries 2003; Grossmann/Tutin 2003; Evert 2005; Blumenthal/Hausmann 2006). Thus lexical solidarity between words and morpho-syntactic constraints also occur in so-called free sentences, as shown in example (24), due to Gaatone (1997: 71), and (25):

(24) a. Il faut de la volonté pour faire cela ‘one needs willpower to achieve this’
     b. Il faut une forte / grande volonté pour faire cela lit. one needs a strong / a great willpower to do this
     c. Il faut de la grande volonté pour faire cela lit. one needs great willpower to do this
     d. *Il faut de la forte volonté pour faire cela lit. one needs strong willpower to do this

(25) a. Il va à la maison lit. he goes to the house ‘he goes home’
     b. *Il va à une maison lit. he goes to a house
     c. Il va à une maison mystérieuse lit. he goes to a mysterious house

In sum, one has to admit that the phraseological, and hence fixed, character of an expression is a matter of degree, which is one of the reasons why it is not easy to define them in a straightforward way (Bolly et al. 2006; Forsberg 2006; Gonzalez Rey 2002; G. Gross 1996; Lamiroy 2003; Lamiroy 2008; Lamiroy/Klein 2005; Lamiroy et al. 2010; Svensson 2004). Three types of idiomatic expressions can therefore be distinguished: (1) prototypical idioms, which are totally unanalyzable and opaque semantically, e.g. aller à Canossa ‘to do something humiliating’ (2) semi-idiomatic expressions which are interpretable because their meaning, though not referential, can be understood metaphorically, e.g. prendre le taureau par les cornes lit. take the bull by its horns ‘to face a problem’ and (3) expressions which despite a compositional meaning are formally, i.e. lexically and morpho-syntactically, “fixed”, e.g. prendre ses rêves pour des réalités ‘to take your dreams for real’: *prendre ses souhaits pour du vrai and *prendre ses rêves pour une réalité. Thus one has to acknowledge that certain idioms are spectacularly idiomatic, for example when they contain totally opaque archaisms such as noise in (26a) or anaphors without antecedent, as the pronoun en in (26b), while others are closer to “free” language use, as is the case of (18b) or (26c). (26) a. chercher noise lit. look for noise ‘to pick a quarrel’ b. en avoir vu de toutes les couleurs lit. to have seen all colors of them ‘to have gone through many troubles’ c. noyer son chagrin dans l’alcool ‘to drown one’s sorrows in alcohol’

As the following examples show, all of which contain the verb prendre (Gaatone 1997), there is a continuum that goes from highly idiomatic expressions, which correspond to the prototypical definition of an idiom (27a-b), to free sentences (27f to 27h). What lies in between (27c to 27e) is what we consider to be collocations.

(27) a. prendre la mouche ‘to get furious’
     b. prendre une veste ‘to fail’
     c. prendre un risque ‘to take a risk’
     d. prendre la fuite ‘to flee’
     e. prendre conscience ‘to realize’
     f. prendre un avion ‘to take a plane’
     g. prendre une bière ‘to have a beer’
     h. prendre un crayon ‘to take a pencil’

4. Collocations

Collocations are preferred word combinations, i.e. words that “co-occur more often than their respective frequencies and the length of text in which they appear would predict” (Jones/Sinclair 1974: 19). Collocations are thus the result of a lexical attraction between certain words that is statistically significant. Examples in French are:

(28) a. N chat de gouttière lit. gutter cat ‘alley cat’
     b. N vin chaud lit. hot wine ‘mulled wine’
     c. V courir un risque lit. run a risk ‘to take a risk’
     d. V respirer la santé lit. breathe the health ‘to be in good health’
     e. Adj grièvement blessé lit. severely injured ‘seriously injured’
     f. Adj éperdument amoureux lit. lost in love ‘madly in love’
     g. Adv en tout et pour tout lit. in all and for all ‘all in all’
     h. Adv à la va vite lit. in the goes fast ‘fast’
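The statistical definition quoted above compares how often two words actually co-occur with how often they would be expected to co-occur given their individual frequencies and the length of the text. One common way of expressing this comparison is pointwise mutual information (PMI); we use it here purely as an illustration, without claiming that it is the measure used by Jones/Sinclair. The sketch below is in Python, considers adjacent word pairs only, and runs on a tiny invented token list built from the English example (5b); the function name pmi is ours.

```python
import math
from collections import Counter


def pmi(tokens, w1, w2):
    """Pointwise mutual information of the adjacent pair (w1, w2).

    observed = number of times w1 is immediately followed by w2
    expected = count(w1) * count(w2) / text length, i.e. the number of
               co-occurrences predicted if the two words were independent
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    observed = bigrams[(w1, w2)]
    if observed == 0:
        return float("-inf")  # the pair never occurs in this text
    expected = unigrams[w1] * unigrams[w2] / n
    return math.log2(observed / expected)


# Invented toy data, for illustration only (not corpus evidence)
tokens = ["fast", "food", "quick", "meal", "fast", "food", "quick", "meal"]

print(pmi(tokens, "fast", "food"))   # 2.0: observed 2, expected 0.5
print(pmi(tokens, "quick", "food"))  # -inf: never adjacent in the toy data
```

A positive score means that the pair occurs more often than chance would predict. Real studies work with large corpora, usually also consider non-adjacent co-occurrence within a span, and add a test of statistical significance.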

Collocations have several properties in common with idiomatic expressions, which is why the two easily get mixed up. First, both collocations and idioms illustrate what Sinclair (1991: 70) called the Idiom Principle, which views language as basically made of strings of co-occurring words. Language use is in great part the result of lexical affinities between words which are idiosyncratic: words that repeatedly occur together eventually entertain a privileged relationship, which in the case of an idiomatic expression even evolves to a point where the combination becomes frozen and the meaning of the individual words gets lost. As a result, collocations, just like idioms, must be learned by foreign language speakers, and are often not translatable word for word from one language into another: une pluie battante corresponds to heavy rain in English, not to “striking rain”, and donner la parole should not be translated by “to give the speech” but by to give the floor to someone, etc.

Since the formation of word combinations may go back to former stages of a language, both collocations and idioms may testify to older language structures, for example by not using a Determiner before the Noun in French, something which is virtually excluded in Modern French, as shown for example in collocations such as avoir sommeil / faim / soif ‘to be sleepy / hungry / thirsty’ and idioms such as avoir maille à partir ‘to have a bone to pick’ or rebrousser chemin ‘to turn back’.

Collocations and idioms also exploit similar rhetorical procedures such as metaphors, metonyms and comparisons. Thus the idiomatic expression tourner la page lit. turn the page ‘to move on to a new stage’ is a metaphor, se casser la tête ‘to rack your brains’ a metonym and être plus royaliste que le roi ‘to be too scrupulous’ a comparison. Similarly, the collocations un caractère de cochon lit. a character of a pig ‘a very bad character’ and bête comme ses pieds lit. stupid as your feet ‘extremely stupid’ are obviously a metaphor and a comparison, respectively.

Collocations, just like idioms, are word combinations that are subject to arbitrary restrictions:

Les collocations sont des séquences polylexicales constituées de deux ou plusieurs mots, contigus ou non dans l’usage, qui entretiennent entre eux une relation lexicale contrainte (Bolly 2008: 42) [Collocations are multiword sequences of two or more words, contiguous or not in usage, between which there is a restricted lexical relation]

La collocation est une cooccurrence lexicale plus ou moins immédiate et plus ou moins contrainte qui se distingue de la combinatoire libre et du figement lexical. (François/Manguin 2006) [A collocation is a lexical co-occurrence of more or less contiguous and more or less restricted words which is different from free word combinations as well as from lexical fixedness].

Collocations have sometimes been defined as lexical cooccurrences of words between which there is a syntactic relation (cf. also Evert 2005: 16 for the notion of relational collocation):

La collocation est une cooccurrence lexicale privilégiée de deux éléments linguistiques entretenant une relation syntaxique (Tutin/Grossmann 2002) [A collocation is a privileged lexical cooccurrence of two linguistic elements between which there is a syntactic relation].

Obviously, words that make up idiomatic expressions also have syntactic relations, in a trivial way. Thus we saw in the above section that idiomatic expressions display all kinds of syntactic structures which, by definition, are made of words that entertain syntactic relations with each other.

However, collocations also have properties of their own. From a semantic point of view, the meaning of collocations, contrary to that of idioms, is prototypically compositional, e.g. homme de confiance ‘man of confidence’, éternellement reconnaissant ‘forever grateful’, prêter attention ‘to pay attention’, etc. Some collocations are partially opaque, e.g. in larmes de crocodile ‘crocodile tears’, crocodile obviously does not have its referential meaning and in mariage blanc ‘white marriage’, the adjective blanc does not refer to the wedding dress, but to a fake marriage. Note however that the expressions still refer to a kind of tears, and a kind of wedding, respectively. Three types of collocations can in fact be distinguished:9

(1) prototypical collocations which are compositional and hence easily interpretable, e.g. rouge de honte ‘red with shame’, peinture fraiche ‘fresh paint’, avoir froid ‘to be cold’, etc. As pointed out by Hausmann (1997: 282), such collocations are easily decoded by foreign language learners, even if their encoding may be problematic, due to their idiosyncratic character;

(2) collocations which are still interpretable but whose collocative adds something to the referential meaning of the base, e.g. café noir ‘black coffee’ refers to a ‘coffee without sugar or milk’ rather than to its black color, fauteuil roulant is a particular kind of chair made for disabled people, viz. a wheelchair, and vin chaud is not only a hot wine, but it also contains sugar and spices such as cinnamon, etc.;

(3) the least prototypical collocations of all, which contain an element that is semantically opaque, like bleue in peur bleue lit. blue fear ‘terrible fear’ or blanche in nuit blanche lit. white night ‘night without sleep’. These collocations, however, are not completely non-compositional since they still refer to a kind of fear and a kind of night.

9 Tutin/Grossmann (2002) also distinguish three types of collocations: opaque collocations, transparent collocations and regular collocations. Our typology differs from theirs in that it is mainly based on the degree of compositionality of the collocation.

Obviously, the third type of collocation comes close to what we have considered idioms in the above section. However, two remarks are in order here. First, note that there still is a difference between the collocation peur bleue and the compound (idiomatic) noun cordon bleu ‘excellent cook’: in the first case, the NP refers to a kind of fear, whereas in the second, there is no reference to a ribbon nor to something blue. Secondly, collocations that are not directly interpretable are rather exceptional. In other words, the three semantic types that we have distinguished both for idioms and for collocations are in fact the reverse of each other: the meaning of prototypical idioms is global, i.e. non-compositional, whereas that of prototypical collocations is compositional and easily interpretable. We thus agree with Tutin (2005), according to whom collocations are « half way » (à mi-chemin) between idioms and free combinations. In sum, in both cases, that of collocations and that of idioms, one must accept that there are prototypical cases and others that are less so: idioms usually have a global meaning but they can be “half fixed” (for the notion of semi-figement, cf. Balibar-Mrabti/Vaguer 2005 and Lamiroy/Klein 2005). Collocations, by contrast, usually have compositional meanings, but opaque collocations (one of Tutin/Grossmann’s 2002 subtypes) and “semi-phrasemes” (Mel’čuk 2001) do exist. Another difference between the two regards the semantic domains which they refer to. Collocations arguably often indicate quantity or intensity (François/Manguin 2006; Tutin 2013), as the examples in (29) show. Idioms, instead, tend to deal with one of the major topics of life, such as health, poverty, happiness, birth and death, etc., as illustrated by the expressions in (30):

(29) a. froid de canard lit. duck’s cold ‘very cold’
     b. mort de fatigue lit. dead from tiredness ‘very tired’
     c. savoir pertinemment lit. know significantly ‘to know for sure’
     d. absolument convaincu ‘absolutely convinced’
     e. simple comme bonjour lit. simple as good day ‘very simple’

(30) a. nager dans l’or ‘to be wealthy’
     b. avoir la frite/la pêche ‘to be very fit’
     c. voir le jour ‘to be born’
     d. rendre l’âme ‘to die’
     e. battre de l’aile ‘to be poorly’

Formally, collocations are units that are made up of at least two words: at first sight they share their multiword character with idiomatic expressions. Thus casser sa pipe is idiomatic, as opposed to mourir. And grande surface is a collocation in contrast with the single synonymous word supermarché. But in the case of collocations, the binary structure¹⁰ seems to be an essential property, so much so that they are often analyzed in terms of one element being the base and the other one the collocative (Mel’čuk 2013; Tutin/Grossmann 2002), e.g.

(31)
a. chat de gouttière: chat = base, gouttière = collocative
b. courir un risque: un risque = base, courir = collocative
c. grièvement blessé: blessé = base, grièvement = collocative
d. éperdument amoureux: amoureux = base, éperdument = collocative, etc.

Finally, collocations and idioms differ crucially in nature: collocations, as was already hinted at by Firth (1957), are made of words that very often “keep each other company”, i.e. they are multi-word units whose co-occurrence is statistically significant. Thus the notion of collocation is essentially a frequency-based concept: if tall patterns with man and high with wall, this is a merely distributional matter. Like all words that often co-occur, collocations may eventually lexicalize, and hence become partly non-compositional, but lexicalization is not their raison d’être. In the case of idioms, by contrast, lexicalization, which leads to global meanings, seems to be a sine qua non for their existence.

¹⁰ Tutin/Grossmann (2002: 10) correctly point out that, strictly speaking, certain collocations contain more than two words, for example in comparisons such as fort comme un Turc ‘strong like a Turk’, nu comme un ver ‘naked like a worm’, etc.
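The frequency-based view of collocation described above can be made concrete with a small computational sketch. The chapter does not commit to any particular association measure; pointwise mutual information (PMI) is used here only as one common choice, and the function name `pmi_scores`, the `min_count` threshold and the toy corpus are all hypothetical, introduced purely for illustration.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Return PMI for adjacent word pairs seen at least `min_count` times.

    PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) ); a positive value
    means the pair co-occurs more often than chance would predict.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable estimates
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus (invented): 'tall' patterns with 'man', 'high' with 'wall'.
toy_tokens = (
    "a tall man saw a high wall . "
    "a tall man climbed a high wall . "
    "the man saw the wall ."
).split()

scores = pmi_scores(toy_tokens, min_count=2)
for pair, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(value, 2))
```

On a corpus this small the figures mean little; the sketch only shows that a purely distributional criterion can single out tall man and high wall without any appeal to lexicalization or global meaning.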

5. Conclusions

We have shown in this paper that it is no wonder that idioms and collocations are often mixed up, both in theoretical studies and in lexicographic practice. This is so because the two categories share a number of properties: both belong to the phraseology of a language, since both are multiword units which are not the result of a “free” combination. Collocations as well as idioms are subject to formal and semantic restrictions which make the lexical selection of the words they are made of relatively arbitrary. As a consequence, both have to be learned by heart by foreign language learners. However, idioms and collocations are also characterized by major differences, which can be summarized as follows. Collocations are mostly binary units consisting of a base and a collocative and correspond to a statistical phenomenon of frequent co-occurrence of particular words whose meaning more often than not is compositional. Idioms, by contrast, are multi-word expressions which are the result of a diachronic process in which the word combination became progressively frozen and whose meaning is prototypically opaque. We also hope to have shown that although the border between these two subtypes is not always easy to establish, both theoretical linguists and lexicographers could benefit from separating them rather than from putting them together.


References

Balibar-Mrabti, Antoinette / Vaguer, Céline (eds) 2005. Le semi-figement. LINX, numéro spécial 53.
Biber, Douglas / Conrad, Susan 1999. Lexical bundles in conversation and academic prose. In Hasselgard, Hilde / Oksefjell, Signe (eds) Out of corpora. Studies in Honor of Stig Johansson. Amsterdam: Rodopi, 181–190.
Blumenthal, Peter / Hausmann, Franz Josef (eds) 2006. Collocations, corpus, dictionnaires. Langue française 150.
Bolinger, Dwight 1977. Meaning and Memory. Forum Linguisticum. 1, 1–14.
Bolly, Catherine 2008. Les unités phraséologiques. Un phénomène linguistique complexe? Thèse de doctorat. Université Catholique de Louvain.
Bolly, Catherine / Klein, Jean René / Lamiroy, Béatrice (eds) 2006. La phraséologie dans tous ses états. Cahiers de l’Institut de Linguistique de Louvain. 31, 2–4.
Coulmas, Florian (ed.) 1981. Conversational Routine: Explorations in Standardized Communications. In Rasmus Rask Studies in Pragmatic Linguistics. Vol. 2. Berlin: Mouton-de Gruyter.
Cowie, Anthony P. 1998. Phraseological Dictionaries: some East-West comparisons. In Cowie, Anthony P. (ed.) Phraseology: Theory, Analysis and Applications. Oxford: OUP, 209–228.
Dannell, Karl Johan 1992. Nothing but phrases. About the distribution of idioms and stock phrases. In Edlund, Lars-Erik / Persson, Gunnar (eds) Language: the time machine. Umeå: Umeå University.
Everaert, Martin et al. (eds) 1995. Idioms. Hillsdale, New Jersey: Erlbaum.
Evert, Stefan 2005. The Statistics of Word Cooccurrences: Word Pairs and Collocations. Stuttgart: Stuttgart University.
Firth, John Rupert 1957. Modes of Meaning. Papers in Linguistics, 190–215.
Forsberg, Fanny 2006. Prêt-à-parler. Les séquences préfabriquées en français parlé L1 et L2. In Bolly, Catherine et al. (eds) La phraséologie dans tous ses états. Cahiers de l’Institut de Linguistique de Louvain. 31, 183–195.
Fournier, Nathalie 1998. Grammaire du français classique. Paris: Belin.
François, Jacques / Manguin, Jean-Luc 2006. Dispute théologique, discussion oiseuse et conversation téléphonique: les collocations adjectivo-nominales au coeur du débat. Cahiers de lexicologie. 82, 50–65.
Gaatone, David 1997. La locution: analyse interne et analyse globale. In Martins-Baltar, Michel (ed.) La locution entre langue et usages. Vol. 3. Paris: ENS éditions. Fontenay-St Cloud, 165–177.
Gadet, Françoise 2007. La variation de tous les français. Linx. 57, 155–164.
Granger, Sylviane / Paquot, Magali 2008. Disentangling the phraseological web. In Granger, Sylviane / Meunier, Fanny (eds) Phraseology: an interdisciplinary perspective. Amsterdam: Benjamins, 27–50.
González Rey, Isabel 2002. La phraséologie du français. Toulouse: Presses de l’Université du Mirail.
Gross, Gaston 1996. Les expressions figées en français. Noms composés et autres locutions. Paris: Ophrys.
Gross, Maurice 1982. Une classification des phrases “figées” du français. Revue Québécoise de Linguistique. 11/2, 151–185.
Gross, Maurice 1984. Une classification des phrases “figées” du français. In Attal, Pierre / Müller, Claude (eds) De la Syntaxe à la Pragmatique. Amsterdam: Benjamins, 141–180.
Gross, Maurice 1988. Les limites de la phrase figée. Langages. 90, 7–22.
Grossmann, Francis / Tutin, Agnès (eds) 2003. Les Collocations: analyse et traitement. Amsterdam: De Werelt.
Hausmann, Franz Josef 1997. Tout est idiomatique dans les langues. In Martins-Baltar, Michel (ed.) La locution entre langue et usages. Vol. 3. Paris: ENS éditions. Fontenay-St Cloud, 277–290.
Hausmann, Franz Josef (ed.) 2008. Collocations in European Lexicography and Dictionary Research. Lexicographica. 24.
Hymes, Dell 1989. Ways of speaking. In Bauman, Richard / Sherzer, Joel (eds) Explorations in the ethnography of speaking. Cambridge: CUP, 433–451.
Jones, Susan / Sinclair, John McHardy 1974. English Lexical Collocations. A study in Computational Linguistics. Cahiers de Lexicologie. 24, 15–61.
Klein, Jean René / Lamiroy, Béatrice 2011. Routines conversationnelles et figement. In Anscombre, Jean-Claude / Mejri, Salah (eds) Le figement linguistique: la parole entravée. Paris: Honoré Champion, 195–217.
Klein, Jean René / Lamiroy, Béatrice (forthcoming) Le figement: unité et diversité. Collocations, expressions figées, phrases situationnelles et proverbes. L’information grammaticale.
Lamiroy, Béatrice 2003. Les notions linguistiques de figement et de contrainte. Linguisticae Investigationes. 26/1, 1–14.
Lamiroy, Béatrice 2008. Le figement: à la recherche d’une définition. Zeitschrift für Französische Sprache und Literatur, Beiheft 36, 85–99.
Lamiroy, Béatrice (coord.), Klein, Jean René et al. 2010. Les expressions verbales figées de la francophonie. Belgique, Québec, France, Suisse. Paris: Ophrys.
Lamiroy, Béatrice / Klein, Jean René 2005. Le vrai problème du figement est le semi-figement. Linx. 53, 135–154.
Langlotz, Andreas 2001. Cognitive principles of idiom variation: idioms as complex linguistic categories. Studi Italiani di Linguistica Teorica e Applicata. 30/2, 289–302.
Machonis, Peter A. 2010. English Phrasal Verbs: from Lexicon-Grammar to Natural Language Processing. Southern Journal of Linguistics. 34/1, 21–48.
Martin, Robert 1997. Sur les facteurs du figement lexical. In Martins-Baltar, Michel (ed.) La locution entre langue et usages. Vol. 3. Paris: ENS éditions. Fontenay-St Cloud, 291–305.
Massot, Benjamin / Rowlett, Paul 2013. Le débat sur la diglossie en France: aspects scientifiques et politiques. Journal of French Language Studies. 23, 1–16.
Mel’čuk, Igor 1998. Collocations and Lexical Functions. In Cowie, Anthony P. (ed.) Phraseology, Theory, Analysis and Application. Oxford: Clarendon Press, 23–53.
Nesselhauf, Nadja 2005. Collocations in a Learner Corpus. Amsterdam: J. Benjamins.
Rey, Alain / Chantereau, Sophie 1999. Dictionnaire des expressions et locutions. Paris: Le Robert.
Rézeau, Pierre 2001. Dictionnaire des régionalismes de France. Bruxelles: De Boeck/Duculot.
Senellart, Jean 1998. Reconnaissance automatique des entrées du lexique-grammaire des phrases figées. In Lamiroy, Béatrice (ed.) Le lexique-grammaire. Travaux de Linguistique (Special Issue). 37, 109–127.
Sinclair, John 1991. Corpus, concordance, collocations. Oxford: Oxford University Press.
Stefanowitsch, Anatol / Gries, Stefan 2003. Collostructions: investigating the interaction between words and constructions. International Journal of Corpus Linguistics. 8/2, 209–243.
Svensson, Maria Helena 2004. Critères de figement. L’identification des expressions figées en français contemporain. Umeå, Umeå University.
Tutin, Agnès 2005. Le dictionnaire des collocations est-il indispensable? Revue de Linguistique Appliquée. 10/2, 1–14.
Tutin, Agnès 2013. Les collocations lexicales: une relation essentiellement binaire définie par la relation prédicat-argument. Langages. 189/1, 47–63.
Tutin, Agnès / Grossmann, Francis 2002. Collocations régulières et irrégulières: esquisse de typologie du phénomène collocatif. Revue de Linguistique Appliquée. 7/1, 7–25.
Verlinde, Serge / Binon, Jean / Selva, Thierry 2006. Corpus, collocations et dictionnaires d’apprentissage. Langue française. 150, 84–98.
Wray, Alison 2002. Formulaic language and the lexicon. Cambridge: CUP.

Daniela Capra

What do we talk about when we talk about collocation in Spanish?

Abstract: The aim of this paper is to offer an overview of the concept of collocation as it is seen in Spanish linguistics. Several perspectives are critically examined, among them those of E. Coseriu (1967), whose ‘lexical solidarities’ influenced most subsequent linguists; Corpas Pastor (1996), whose Manual is considered a key text on this topic; and Koike (2001), the author of the most complete corpus-based classification of collocations in Spanish. In addition, light verb constructions and other types of combinations are discussed. The basic definition of collocation in Spanish has to do with the idea of inner fixity and repetition (Zuluaga 2002); on the other hand, there are restrictions in word combinations, and it can be argued that this is the foundation of most definitions of collocation. From this point of view, Bosque’s dictionary (2004) could be regarded as a dictionary of collocations, even though he says it is not. Moreover, it is usually pointed out that the meaning of collocations is clear and in no case idiomatic, but this is not always true. In conclusion, collocation is a heterogeneous category whose boundaries are not clear-cut: on one side it is close to certain types of fixed expressions (called ‘locuciones’ in Spanish), and on the other it is quite similar to linguistic formulations that are considered free combinations (i.e., without fixity).

Keywords: Spanish collocation, lexical solidarity, word combination, light verb constructions, inner fixity, multi-word expressions

1. Introduction

In recent years Spanish linguistics has seen numerous contributions dealing with collocation, but in spite of the interest in this topic there is no unitary perspective, not even regarding the terms used to define it. In the following pages we will examine some of these conflicting positions and try to analyze the current situation.


Let us step back in order to situate the issue in its proper perspective. The debate on collocation cannot be separated from its relationship with phraseology and from the issue of the position occupied by collocation in relation to the other types of multiword expressions. Almost all the linguists who have worked on collocation have addressed this relationship, a question that falls within the theme known as ‘broad conception’ versus ‘narrow conception’ of phraseology. Let us, then, briefly recall what is meant by these two conceptions in Spanish phraseology.

The broad conception encompasses all fixed lexical combinations in syntactic relationships consisting of at least two words and up to an indefinite maximum – provided that the combination is syntactically and semantically coherent – so as to constitute even a complex sentence; the high frequency of the combinations, their formal stability and their institutionalization are defining properties of all these expressions. This conception has been proposed in particular by Corpas (1996), who established a systematization of the different types of units that fall under the above description, taking as initial criteria of distinction inner fixity (fijación) and whether or not the unit constitutes an utterance. In the category of expressions that constitute an utterance we find proverbs and several types of utterances with formulaic characteristics such as greetings, expressions of thanks, apology, congratulations, etc., but also the so-called winged words (Bergenholtz/Gouws 2013), that is, famous quotations, sayings, titles and slogans. In the category of expressions that do not constitute an utterance, we find lexical combinations that are part of a sentence, such as collocations (colocaciones), but also another class of fixed expressions, called locuciones in Spanish.
According to Corpas (1996) the difference between colocaciones and locuciones lies in the fact that the latter are fixed in the system of langue, while the former are a simple matter of linguistic norm.¹ While locuciones are fixed word combinations that function as a specific class of words and also include grammatical parts of speech such as conjunctions (antes bien = instead), prepositions (en torno a = around) and adverbs (de repente = suddenly),² colocaciones are defined in Corpas (1996: 53) as “phraseological units that from the point of view of the language system are free combinations, generated from rules, but at the same time they show a certain degree of combinatory restriction, which is determined by use (i.e., a certain inner fixity)”.³ The wording of these criteria and the classification itself are reassuring and apparently smooth out any possible obstacles. Unfortunately, however, language is a complex phenomenon that is not easily defined or captured by schematizations. We shall see that the category called collocation is very uneven, because it subsumes phenomena that differ greatly from one another.

The second outlook, that of the ‘narrow conception’ of phraseology, which constitutes the other pole of the broad – narrow dichotomy, encompasses exclusively what does not constitute an utterance, thus excluding both paremiology and pragmatic formulae. In some cases, it carries out further exclusions – particularly of collocations – or reclassifications between the two categories.⁴

¹ Corpas bases her notion of system and norm on Coseriu (1952).
² English translations are related to a particular meaning and are intended only to show cases of what we call locución in Spanish. Other cases have to do with nouns (lugar común = commonplace), adjectives (de rechupete = delicious), pronouns (alguno que otro = some) and verbs (caer en la cuenta = to realize). Corpas’ article (1998) could be interesting for the English reader, as it centers on general criteria for classifying phraseological units with examples taken from English and Spanish.
³ “Unidades fraseológicas que, desde el punto de vista del sistema de la lengua, son sintagmas completamente libres, generados a partir de reglas, pero que, al mismo tiempo, presentan cierto grado de restricción combinatoria determinada por el uso (cierta fijación interna)”.
⁴ Therefore, those who work within the ‘narrow conception’ of phraseology study above all locuciones. It is ironic that while Mellado, in the introduction to the book she was editing, stated that “[the narrow conception] seems finally passed” (2008: 9), García-Page published a book entitled Introducción a la fraseología española. Estudio de las locuciones (2008), where he basically identified the concept of locución with that of phraseological expression. García-Page (2001) claims that lexical restrictions – though less strong – also apply to free combinations of the language. On the other hand, Bosque (2001: 9) argues that “the study of collocations belongs to the (so-called) lexicon-syntax interface rather than to the proper domain of phraseology”. Cf. also Ruiz Gurillo (1998).


Many taxonomic studies of the internal structure of fixed expressions, and most lexicographers, in fact apply the narrow perspective, though sometimes they do so for practical reasons as well (the size of the dictionary, or its target users). The problem of the definition of collocation in particular and of phraseology in general should be addressed and solved by those who aim to study linguistic expressions as entities falling into the category of the so-called ‘repeated discourse’.⁵ It seems possible to affirm that, just as there is an opposition between two different conceptions of Spanish phraseology (broad versus narrow, as we have seen above), there is also a broad and a narrow conception of collocation.

⁵ Coseriu was the first to use this felicitous term; he meant to include any word combination that was not part of the ‘free combinations’ in language (the ones generated by syntactic rules).

2. From Coseriu’s solidaridades léxicas to Corpas’s combinaciones preferentes

In order to situate themselves in the field of phraseological studies, many scholars take a retrospective glance at the earlier texts that have come to form a sort of canon. Leaving aside what could be considered the general history of this subject (Firth, Halliday, Sinclair), in the domain of Hispanic lexicography an important linguist for many reasons was Casares, the lexicographer who first approached multi-word expressions (Casares 1950 [1992]); in the third part of his Introducción a la lexicografía moderna he grouped such expressions according to their function and called them locuciones, frases proverbiales, refranes and modismos. He did not mean to study any kind of multiword expression in itself, as his purpose was to discuss the possibility of including them in dictionaries: he therefore excluded modismos, questioning not only this category (a wide and generic one, as it included many types of figurative expressions) but also the label itself, which was not scientific at all and was taken from general language use; nevertheless, he understood their importance for speakers and so dedicated several pages to describing them. His classification of locuciones, based on their function in the sentence, is a starting point for the subsequent ones; he did not consider collocations at all.

Another, even more influential linguist in Spanish lexicography was Coseriu (1967), with his notion of solidaridad léxica (lexical solidarity). Starting from a semantic perspective, Coseriu studied the relationship between lexemes. Lexical solidarity describes the syntagmatic behaviour of a word; Coseriu finds three types of relationship, named affinitive, selectional and implicative solidarity, depending on the way the restriction works. While the first type of relationship is not relevant to a discussion on phraseology, in that it describes a semantic relationship between words that has nothing to do with collocation or other multiword combinations (cf. Corpas 1996; Koike 2001; Rivas 2010; Serra 2012), the other two are; in Coseriu’s words, they build up a multilateral solidarity, because in these cases a certain word establishes an interconnection with an archilexeme or another lexeme. More precisely, selectional solidarity means that the semantic determination relates a word to any of the words subsumed under an archilexeme; the verb tocar, to play, for example, is determined by the superordinate term “instrument”, which includes words like “piano, guitar, violin”, etc. (tocar el piano, la guitarra, el violín, etc.). Implicative solidarity is also based on a semantic determination that is external vis-à-vis the class of a given word, but in this case we have a word determining another word, as in vino seco, dry wine. Both types of solidarity consist of a special relationship between two words, one that involves a semantic constraint.
Again, the schema is well-balanced and clear and it definitely works with many word combinations, but one could ask why we need the selectional type of solidarity, if what we have in discourse is a combination between two lexemes, and the fact that one of them is part of an archilexeme does not seem relevant;⁶ to take an example: the word that has the task of selecting – the so-called base – is a lexeme (for instance, piano), and its selection of the verb “to play” is based on a semantic determination that does not take into account the fact that piano is part of a class that includes “violin”, “oboe” and other instruments. Moreover, in Spanish we can say tocar la guitarra, but also rasguear la guitarra, that is, to play the guitar with a particular technique, and this is a unique combination, but it is still regarded as a collocation.⁷

Other works on collocation deal with its relationship with the notion of contour; the term collocation has also been used in reference to the contour, as we find it in dictionaries (Ruiz Martínez 2007; Serra 2012). Lexical contour is a way of recording most selection restrictions within the lexicographic definition. While some of them deal only with the actant structure, as happens with the verb regalar in Spanish (someone gives something to someone), in most cases contour makes the arguments explicit, thus helping the user to combine words correctly, as we can read in DRAE (2001) about the adjective caro (expensive): “3. Dicho de cualquier cosa vendida, comprada u ofrecida: A un precio más alto que el de otra tomada como punto de referencia, la cual es más barata con relación a aquella”. Or fuerte (strong): “6. Dicho de un terreno: Áspero, fragoso. 7. Dicho de un lugar: Resguardado con obras de defensa que lo hacen capaz de resistir los ataques del enemigo. 8. Dicho de una cosa: Entre plateros, monederos y lapidarios, que excede en el peso o ley. 9. Dicho de un color o de un sabor: Intenso”.⁸ A good treatment of the contour would be very helpful, particularly for non-native speakers, but in Spanish dictionaries its treatment is not always as good as one would expect; Alonso (2002) and Martínez (2007), for example, address the problem of where this information is given: the former thinks that the entry of the collocate is not the adequate place and the latter criticizes the fact that examples have been intercalated in dictionaries oriented to the definition of one-word entries.

⁶ Gutiérrez Ordóñez (1989: 114–115) indeed asked himself this question and did not find an answer; he concluded that not all of Coseriu’s solidarities are syntagmatic structures.
⁷ Rivas (2008: 159) also describes Coseriu’s distinction into three types as “irrelevant”, adding that the problem of semic definition has not been solved. Nonetheless, he sees collocation as a phenomenon of lexical restriction.
⁸ The formula “Dicho de” (said of / about) is used to indicate restrictions.

Another issue about contour that involves the status of collocation is the debate as to whether contour is part of the content or of the context of a lexical item. The claim that contour is part of the context limits the possibility of speaking about collocation in these circumstances. On the other hand, some scholars extend the notion of “combination of two lexical units” – which is one of the bases of the definition of collocation – to include grammatical elements, such as verbs followed by the preposition(s) they take. Following Benson, Koike (2001: 14) divides collocations into two groups, named grammatical collocations (colocaciones gramaticales, such as pensar en, consistir en, soñar con, etc.) and lexical collocations (c. léxicas), whereupon he focuses only on the latter, especially those formed by a verb and a noun, or a noun and an adjective: the two largest classes for the Spanish language. It is important to remember that in Spanish phrasal verbs are not as developed as in English, if they exist at all; meaning is given by the verb (not by the combination of verb and preposition) and a verb combines in most cases with just one preposition; we should rather speak of prepositional verbs, as the argument is introduced by a given preposition.

Koike (2001) also reflects upon several classifications of collocations based on their structure and then gives his own, adding a category (verb + adjective) and reshaping the other ones; if there is a single pattern within a category, he reports which item is the base and which is the collocate. He also compares the relationship between the two lexemes of each category to the lexical functions of Mel’čuk’s model. Koike also proposes a definition of collocation through the analysis of its formal and semantic properties; on the formal side he highlights the frequent co-occurrence of its items, the presence of combinatory restrictions and its compositionality, and on the semantic side he emphasizes the correlation between two lexemes, the typicality of such a relation and the semantic precision of the combination.
He believes that this characterization is helpful to distinguish them from locuciones on one side and from free associations on the other. In this regard, Zuluaga (2002) sees collocation as a “phenomenon of intersection […] half way between free combinations and phraseological units” (2002: 99) and defines collocations as “hypotactical lexeme combinations which have become stable only through repetitive use” (2002: 97). He recognizes the arbitrary fixity of their components, but does not consider collocations a part of phraseological expressions, as they lack other kinds of fixity.⁹ Ruiz Gurillo (1997) has gone further with the idea of “half way” phenomena: for her the very notion of phraseological unit cannot be defined in a univocal way, because neither inner fixity nor idiomatic meaning – the two main attributes that define phraseological units – are present to the same degree in all the elements of every given type of unit. For this reason, she approaches the whole matter from a cognitive perspective; more precisely, she applies Prototype Theory and the idea of “family resemblance” to establish a center and a periphery where all phraseological units find their place. Penadés (1996) had approached the issue with a similar methodology.

Shifting from the notion of ‘fixed combination’ of lexical elements that we set out above to that of “preferential combination” (combinación preferente), we can incorporate into the concept of collocation a wide range of lexical associations – especially those including verbs + nouns and nouns + adjectives – that do not seem to be semantically motivated. Among a range of possible examples, let us consider deseo irresistible / ganas incontenibles (overwhelming desire / uncontainable wish),¹⁰ where the two adjectives are combined in this way and not the other way round, in spite of their very close meaning. Other cases illustrate different types of associations, but, similarly to the examples above, they show that semantics is not always essential. Such cases are entrar en vigor (to become effective, literally *to enter into vigour), prestar juramento (to swear, to take an oath; literally *to lend oath), librar la batalla (to engage in a battle, literally *to soar the battle); all of these, for different reasons, lack semantic motivation, as they are beyond figurative interpretation.¹¹ They are non-compositional and, as a consequence, not easy to understand, but nevertheless they are classified as collocations, not locuciones.

⁹ In the words of Zuluaga (1980: 99), another kind of fixity is defined as “suspension of some combination rules of the speech elements” (“suspensión de alguna regla de la combinación de los elementos del discurso”). He was probably thinking of a similar type of fixity, which in fact we see only in some locuciones.
¹⁰ The translation is only intended as functional, as a perfect synonym for ganas does not exist.
¹¹ The metaphorical dimension of some collocations has been generally recognized.


Against this backdrop, Corpas (2001: 48) follows Haensch and defines collocation as “the property of languages due to which speakers tend to produce certain word combinations among a large amount of theoretically possible combinations”.¹² These expressions, which become established through repeated context-dependent use, have reached stability without mutual semantic implications, a situation that makes it difficult to decide the role of the two lexemes in terms of base and collocate. Mendívil (1991) does not use the term collocation for these and other kinds of expressions; he talks about preferencias usuales, usual preferences, where “terms tend to bind to others, even when there is no simultaneous presence of both terms (as we can see in locuciones) or when an element does not imply the other one (as in Coseriu’s lexical solidarity)”.¹³

As for the qualifier preferente, which is used implicitly or explicitly to describe the above-mentioned collocations, it must be pointed out that there are sometimes differences in collocations due to other factors. One of the most significant has to do with diatopic variation, too large a topic to be fully discussed here; another, no less broad, with diastratic variation. These phenomena do not always constitute a case of preference on the part of speakers, as in most cases they use the only form they know (even if it can tell much about them); similarly, discourse genres encode certain kinds of speech uses, thus limiting the range of choices. However, a self-conscious speaker can select the right collocations for any given situation.

¹² “Aquella propiedad de las lenguas por las que los hablantes tienden a producir ciertas combinaciones de palabras entre una gran cantidad de combinaciones teóricamente posibles”.
¹³ “En las preferencias usuales normalmente unos términos tienden a vincularse a otros, sin que exista la copresencia obligada en el sintagma (como en las locuciones) o que un elemento suponga semánticamente al otro (como en las solidaridades léxicas)”. Among his examples are apagar la sed, trabar amistad, librar la batalla. His categorization of ‘fixed’ expressions does not rely on syntactic functions, nor on formal criteria; rather, it seems to be semantically grounded, as he also considers a group called locuciones ambiguas.

3. Light verb constructions and other types of collocation

According to Serra (2012), light verb constructions (see examples below), which go by several different names in Spanish, are the core of collocations, as this is the only type of collocation that does not fall into Coseriu’s lexical solidarities; for much the same reason, Rivas (2008, 2010) suggests considering light verb constructions as locuciones, provided that they have a unitary meaning. Light verb constructions in Spanish include two elements, a semantically weak verb and a noun phrase, where the noun is in most cases an abstract one. Traditional grammar has focused on the verb as the fulcrum of the sentence, but in this kind of sentence it is the noun that plays the major role. This does not mean that the verb is ‘empty’, as we say in Spanish; Wotjak (1998), for example, makes a distinction between associations where the verb retains its lexical meaning (such as albergar esperanzas, experimentar una gran alegría) and constructions with different degrees of ‘lightness’ of the verb. In this work he divides these constructions into three groups: those with a near-idiomatic meaning (close to locuciones; cf. Wotjak 2008), those with a verb that carries some aspectual information (inchoative, durative, causative, terminative, etc., e.g. coger miedo, mantener relaciones, poner en marcha, perder la confianza) and those where the meaning is given by both lexemes: a verb whose meaning is “reduced” or “restricted”, as he says (1998: 270), and a noun whose meaning is predominant. Constructions of the second group have no alternative formulation in a single verb, while many collocations of the last group can be replaced by a verb, such as dar un paseo – pasear (to take a walk – to walk).
Dar (to give) is perhaps the most common verb in this type of collocation; other common verbs are hacer (to do), tener (to have), poner (to put), tomar (to take), echar (to get), whereas pegar, prestar and a few more combine with a small number of nouns. In this type of collocation diastratic variation is particularly visible. It must be pointed out that the existence of a single verb form does not mean that it is a perfect synonym of the analytical form, as differences can arise at any level, from the semantic or pragmatic to the stylistic; in most

contexts dar un beso and besar are not interchangeable in Spanish: we believe that anything that is in the language has its own raison d’être and cannot be substituted or reduced to something else; in a context like this, the question should concern the reason why we have two different forms (cf. Dubský 1998). Another interesting related topic is the categorization of verbs into collocational and non-collocational (Koike 2001: 69), as there are verbs that do not form collocations. Serra (2012) insists on the difference between “not to have lexical meaning” and “not to have a meaning at all” and revises positions she considers incorrect, stressing that the role of these verbs is to carry information about the argument structure, as well as the morphological characterization of the sentence (mood, tense, person, number); she explains the formation of this type of collocation as the selection of a collocate (a verb whose primitive content is weakened) by the base, a predicative noun; together, they make a “unity of sense”. Serra also emphasizes the fact that a combination like this is the semantic equivalent of a full verb, even when there is no such verb in the lexicon of a given language. We believe this statement is intended to mean that the two lexemes that form a collocation ‘merge’ their meanings; what comes out is not the plain sum of the individual meanings, nor the mere meaning of the base, but a different meaning. Talking about collocation in general, Blasco (2002), following Alonso (1994–95), moves along this line.14 Nevertheless, we think it is inaccurate to extend this idea to all types of collocation. It can be said that light verb constructions challenge the notion of compositionality that is normally related to the definition of collocation. This problem affects the whole range of these constructions, as other types show the same condition.
However, there are discrepancies on whether a collocation is partially or completely compositional: some authors define collocation as a completely transparent combination (especially in contrast with locuciones), but the most widespread opinion is that its meaning is not always easy to grasp, especially for a non-native speaker, which makes collocations (partially) non-compositional (Alonso 1994–95; Wotjak 1998; Koike 2001; Blasco 2002; Zuluaga 2002; Valero 2012). In particular, the semantic tailoring that intervenes in some selections involves a semantic specialization of the collocate, which acquires a figurative sense, giving a partially compositional meaning to the whole collocation (Koike 2002).

Before concluding our survey on collocations, let us have a look at another problematic type of collocation, where a noun combines with another noun. According to most authors who have worked on collocation taxonomy (for Spanish linguistics, Corpas 1996; Castillo 1998 and 2015; Koike 2001) the noun + prep. + noun combination forms one type of collocation. The first noun is the collocate and the second one is the base; the first noun may refer (1) to a collection (often it is a collective noun) or (2) to a regular portion of what the second noun designates: (1) un enjambre de abejas (a swarm of bees), un ramo de flores (a bouquet of flowers); (2) un gajo de naranja (an orange slice), una rebanada de pan (a slice of bread). Semantic determination varies: the word enjambre, for example, refers to bees, but it can be used figuratively for people, while gajo applies to several kinds of fruit, particularly citrus fruits, but also pomegranate. Moreover, some noun associations of the second type show a change in the categorization of the base, which becomes discontinuous, as for example in un copo de nieve (a snowflake) or in un terrón de azúcar (a sugar cube). Some combinations include abstract nouns, such as un ciclo de conferencias (a lecture series) or un ataque de risa (a fit of laughter).

14 She explains that the meaning of the base A + the meaning of the collocate B does not equal AB; instead, it includes the meaning of A + a meaning C (where C may take different values depending on the type of collocation). This is an adaptation of Mel’čuk’s characterization of collocation to the Spanish language.
It seems quite obvious that they tend to be compositional; the two types of combination shown above could even be regarded as the two sides of a synecdoche (a movement from general to particular or the other way round), especially where semantic determination is stronger. In some context-dependent situations it is even possible to mention only the collocate, thus cancelling the collocation. Not every noun + prep. + noun combination is a collocation; some of them have to be considered compounds, in spite of their analytical form: a dandelion is a diente de león in Spanish; the word is a calque from French, while the English form is an adaptation from the

same language, dent de lion. Similarly, other noun associations, even without a preposition, can be considered compound nouns, like paquete bomba (parcel bomb) or coche cama (sleeper).15 Compounds are quite a big class of words in Spanish, closer to the category of locuciones, or even identifiable with it (García Platero 2002). The other noun-based combination is a noun + noun combination, but as the second noun modifies the first one it can be considered an adjective (Corpas 1996; Castillo 1998); among the examples, hombre clave (key man) and ciudad fantasma (ghost town); in our opinion other noun combinations are problematic, such as paquete bomba, which we classify among compounds; Koike (2001) and Val Álvaro (1999) also call this kind of word combination a compound. The name lexías complejas is sometimes found for this kind of compound, perhaps to avoid the word compound, which some authors reserve for one-word formations (like sacacorchos, corkscrew, agridulce, bitter-sweet, pelirrojo, redhead, etc.). On the other hand, Castillo (2015: 63), who follows Corpas’ classification,16 envisages the possibility of the creation of new combinations: “conviene puntualizar, por tanto, que, independientemente de los patrones morfológicos más habituales, hay que estimar como posible la coaparición de otros previamente no sistematizados que afectan no solo a la combinación categorial, sino al orden secuencial”.17

15 From a morphological perspective, the study of word formation may admit this kind of combination as lexemes, as they have a unitary meaning; many verb-noun compounds and noun-adjective compounds may also be seen from this perspective. However, the issue is problematic (Alonso 2009).

16 Corpas (1996) finds the following types of collocation in Spanish: V + N (usually the noun works as a complement of the verb); N + V (the noun tends to be the subject of the sentence); N + Prep. + N; Adj. + N (or N + Adj.); Adj. + Adv. (or Adv. + Adj.); V + Adv. From the examples it seems quite clear that Corpas does not take into account the notion of compound. Other linguists, like Koike (2001), partially modify this classification by adding combinations.

17 “It is convenient to point out, therefore, that besides the most common morphological patterns, we should estimate as possible the occurrence of new ones, not previously systematized, that could affect not only the categorial combination, but the sequential order”.

4. DiCE and REDES

DiCE is the Diccionario de Colocaciones del Español, a large project directed by Alonso Ramos since 1999, which has filled a gap in Spanish lexicography, as other Spanish dictionaries focused on phraseology in general.18 This dictionary of collocations is an ongoing project based on the concept of lexical function and refers to the theoretical framework of Mel’čuk et al. (1995), whose results are given form in Mel’čuk et al.’s Dictionnaire explicatif et combinatoire and Mel’čuk and Polguère’s Lexique actif du français (cf. DiCE’s bibliographical section). DiCE describes restricted word co-occurrence through lexical functions adapted to the Spanish language and systematizes all semantic constraints at work in word combinations. The entries of the dictionary are the bases, which according to Alonso Ramos are the natural starting point for displaying collocations. There is a semantic area where words are defined; each meaning is listed separately and has its own semantic label; the actant structure is also given, with an example to help users understand it. A syntactic area shows all possible syntagmatic combinations. Finally, in the lexical-semantic area, a list of collocates is given, together with the appropriate lexical function. We believe the online format offers great advantages for this type of dictionary, as consultation is quicker and more agile. Given that the theoretical framework of this work is very well known and its tools are recognized as very helpful, especially for encoding speech elements, we shall not discuss it further (see Buendía Castro / Faber 2014) and will focus instead on another dictionary, whose status is in a way more innovative.

Ignacio Bosque’s Redes (2004) is a combinatory dictionary of Spanish. According to Bosque himself, it is not a dictionary of collocations.
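The base-keyed layout described above for DiCE (a semantic area, a syntactic area and a lexical-semantic area listing collocates with their lexical functions) can be pictured as a small data structure. The following Python sketch is only an illustration under our own assumptions: the class and field names are hypothetical, the sample data for miedo are invented rather than copied from DiCE, and the lexical function labels (Oper1, IncepOper1) follow the usual Meaning-Text naming conventions only loosely.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Collocate:
    form: str               # the collocate itself, e.g. a verb or an adjective
    lexical_function: str   # label of the lexical function linking it to the base


@dataclass
class Sense:
    semantic_label: str     # semantic area: label for this meaning
    actants: List[str]      # semantic area: actant structure
    example: str            # example illustrating the actant structure
    syntactic_patterns: List[str] = field(default_factory=list)  # syntactic area
    collocates: List[Collocate] = field(default_factory=list)    # lexical-semantic area


@dataclass
class BaseEntry:
    base: str               # entries are keyed by the base, not the collocate
    senses: List[Sense] = field(default_factory=list)

    def collocates_for(self, lexical_function: str) -> List[str]:
        """Collect the collocate forms tagged with the given lexical function."""
        return [c.form
                for s in self.senses
                for c in s.collocates
                if c.lexical_function == lexical_function]


# Hypothetical data, invented for illustration (not copied from DiCE).
miedo = BaseEntry(
    base="miedo",
    senses=[Sense(
        semantic_label="feeling",
        actants=["X (who feels it)", "Y (what causes it)"],
        example="el miedo de Juan a las alturas",
        syntactic_patterns=["miedo de X a Y"],
        collocates=[Collocate("tener", "Oper1"),
                    Collocate("coger", "IncepOper1")],
    )],
)

print(miedo.collocates_for("Oper1"))
```

Keying the structure by the base mirrors the encoding perspective: a user who already has the noun in mind looks it up and retrieves the verbs that go with it.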
Lexemes are arranged in alphabetical order, but not all lexemes are listed; instead, only predicates find room as entries, especially in the so-called “analytical entries”. Words are not defined; instead they are combined with those lexemes with which a combination is possible.19 The work is corpus-based and draws especially on newspapers (informative articles, but also comments and essays) from both Spain and Latin America; it shows semantic restrictions, ignoring other kinds of restrictions, particularly the extralinguistic ones: for example, the verb romper, to break, is not taken into account in its combinations with breakable objects, but only with abstract nouns like promesa, pacto or compromiso (promise, agreement or commitment). Collocation is seen as a particular case of lexical selection, which exists only when the selection is restricted. In the extensive introduction to his dictionary, Bosque starts discussing collocation only on page CLII, as his investigation is based on a specific notion of lexical combination that does not fit all the different views of collocation, still too vague a category to be used to define his work. He admits (LXXXVII) that those who work on phraseology could call his work a dictionary of collocations, while a lexicographer would say it is a dictionary of contours, or perhaps of classematic information. He prefers to say that it is a combinatory dictionary.

To give an example of the treatment of a lexeme, we copy the wordlists selected by the verb admitir (admit):

admitir
• abiertamente, a concurso, alegremente, a medias, a regañadientes, con franqueza, con matices, con reservas, de antemano, de (buena/mala) gana, de buen grado, de plano, explícitamente, humildemente, implícitamente, incondicionalmente, lisa y llanamente, ni por asomo, sin ambages, sin dudar, sin pestañear, sin rechistar, sin reservas, sin tapujos, tácitamente, unánimemente, universalmente, veladamente
• autoría, cambio, compromiso, culpa, culpabilidad, defecto, derrota, desconocimiento, dolor, equivocación, error, existencia, fracaso, ignorancia, intención, límite, malestar, miedo, necesidad, participación, pasión, plan, posibilidad, relación, responsabilidad, verdad, otros sustantivos que designan sentimientos
Véase también: aceptación, admitir a trámite, confesar, reconocer.

18 General dictionaries of the language also contain collocations (Romero 2014), even if several problems have been highlighted: the bibliography on the many aspects of this complex matter is huge. Phraseological dictionaries are Varela/Kubarth (1994) and Seco/Andrés/Ramos (2004). Alonso’s DiCE is online at .

19 The selection of predicates has been done according to their capability to restrict and to identify arguments. Predicates chosen as entries are also called collocates (Bosque 2001). The same perspective is adopted by Castillo (2000), who refers to contour as “bases of collocational relationships” (quoted in Bosque 2001: 18).

The first list, from abiertamente to veladamente (openly – in a veiled manner), shows all significant combinations with admitir found in the corpus: we could consider them collocations; the second list, from autoría to verdad (authorship – truth), shows a different kind of association found in the corpus, namely the “fields of use” of the verb admitir: the list includes different ‘things’ you can admit in Spanish, for instance a fault (culpa) or a defeat (derrota). As one can see, the first list contains adverbs, the second one nouns. After the last noun, there is additional information about the field of application of the verb (“other nouns that designate feelings”). At the end, the dictionary refers the reader to other related entries.

This is not the only type of entry in the dictionary. The one shown above is called a “short entry” and displays possible combinations of a lexeme; not all “short entries” constitute a predicate. The other type of entry is called a “long” or “analytical entry” and it also shows possible combinations; its arguments are divided into “lexical classes”;20 the information contained in these entries includes the linguistic contexts in which the lexeme may appear, the ways in which it combines and its semantic restrictions. As in a language all words undergo some kind of combinatory restriction, here Bosque focuses on combinatory properties more than on the way a lexeme combines. Because of their length, we cannot quote a whole entry, but to give an idea of how they are structured, we copy the frame of one, skipping all nouns and full sentences quoted as examples (see also De Cesaris/Battaner 2008):21

aclamar v.
• En el sentido de ‘proclamar’ admite sustantivos que designan cargos o puestos. En el sentido de ‘vitorear o aplaudir’ admite frecuentemente sustantivos de persona, más frecuentemente si designan líderes, artistas, deportistas y otros individuos de renombre. También se combina con sustantivos que designan obras o composiciones artísticas.
Se combina con otros sustantivos, especialmente con…
A SUSTANTIVOS QUE DENOTAN ACTUACIÓN O INTERPRETACIÓN, MUY FRECUENTEMENTE MUSICAL O DRAMÁTICA: 1 actuación + 2 concierto + 3 gira 4 recital 5 interpretación + 6 show
B ALGUNOS SUSTANTIVOS QUE DENOTAN EL PROCESO DE MANIFESTARSE ALGUIEN O HACERSE PRESENTE: 7 llegada 8 aparición 9 presencia + 10 entrada
C SUSTANTIVOS QUE DENOTAN EL EFECTO O EL RENDIMIENTO OBTENIDOS AL FINAL DE ALGÚN PROCESO, MÁS FRECUENTEMENTE SI SE REFIEREN AL RESULTADO FELIZ DE LO QUE SE EMPRENDE: 11 resultado 12 conquista 13 triunfo +
D SUSTANTIVOS QUE DESIGNAN LANCES DEPORTIVOS O TAURINOS: 14 jugada 15 faena 16 pase
Se combina también con:
• a bombo y platillo, a los cuatro vientos, bulliciosamente, efusivamente

20 The “lexical classes” are formulated using a reduced set of semantic features shared by the lexemes; they should not be confused with the semantic fields.

21 Capital letters are in the original.

The sign + after a noun indicates an especially frequent combination.22 The formula sustantivos que designan / denotan (substantives that designate / denote) used in this kind of entry precedes the lexical classes found in combination with the given verb (“acclaim” in this case) or adjective. Should the predicate be an adverb, the formula for its combinations would be verbos que designan…. Nouns usually have a short entry and sometimes the same happens with the other classes of lexemes, depending on their collocability. Redes is an onomasiological dictionary (Serra 2012). It suggests all possible combinations of a given predicate and gives cross-references that expand the range of word usage. For example, going back to the previous example, we see that aclamar (acclaim) can combine with llegada (number 7 in the list); looking up llegada, we find a short entry:

llegada
• apoteósico¹⁵, a tiempo, en masa, en volandas, esperado, inesperado, inoportuno, intempestivo¹⁷, multitudinario, oportuno, sorpresivo, triunfal
• aclamar⁷, esperar, producir(se), revivir, tener lugar.

As one can see, aclamar is accompanied by an exponent, the number 7, that sends the reader back to the verb, and the same happens with the other exponents. Redes means “nets” or “networks”, and it seems to be a fitting title for such a dictionary.23

22 Bosque (LXXXIV) specifies that frequent co-occurrence does not necessarily mean that we are dealing with a collocation; instead, it is an index of the systematic nature of the combination. He insists on this idea in Bosque (2001: 11–15).
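The way an exponent in a short entry points back to a numbered slot in an analytical entry can be sketched as a pair of lookup tables. This is a minimal Python illustration, assuming a dictionary-based representation that is our own and not Bosque's; the data reproduce only class B of the aclamar excerpt and the verb list of the llegada entry quoted above, and the function names are hypothetical.

```python
# Analytical entry: predicate -> {number in the entry: noun}.
# Only class B of the aclamar excerpt quoted above is reproduced here.
ANALYTICAL = {
    "aclamar": {7: "llegada", 8: "aparición", 9: "presencia", 10: "entrada"},
}

# Short entry: noun -> list of (predicate, exponent); None means no exponent.
SHORT = {
    "llegada": [
        ("aclamar", 7),
        ("esperar", None),
        ("producir(se)", None),
        ("revivir", None),
        ("tener lugar", None),
    ],
}


def follow_exponent(noun, predicate):
    """Follow the exponent attached to `predicate` in the short entry for `noun`.

    Returns the noun listed under that number in the predicate's analytical
    entry, or None if there is no exponent (or no entry) to follow.
    """
    for pred, exponent in SHORT.get(noun, []):
        if pred == predicate and exponent is not None:
            return ANALYTICAL.get(pred, {}).get(exponent)
    return None


def round_trips(noun, predicate):
    """True when the cross-reference leads back to the noun we started from."""
    return follow_exponent(noun, predicate) == noun


print(round_trips("llegada", "aclamar"))
```

Read this way, each exponent is an edge in a network of entries, which is one way of making sense of the title Redes.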

23 In 2006 Bosque published another combinatory dictionary, known as Práctico. For a comparison with REDES (2004) see Barrios (2007). See also Lo Cascio’s contribution in this volume.

5. Conclusions

We have seen that there are conflicting positions on the definition of collocation. A major point of disagreement is whether it is part of phraseology or not, and, related to this, whether a collocation is a free combination (with some fixity), or is (partially or totally) compositional. Another problem is linked to the notion of frequency, as most authors reject the statistical approach to the concept of collocation, yet still make use of frequency of co-occurrence in their definitions. It seems quite clear that a collocation is a word combination. According to most authors, two lexemes are needed; they are referred to as base and collocate, or predicate and argument. Their relationship has been explained in different ways; semantic determination seems to be a widespread explanation, but other authors emphasize the lack of motivation of some selections, while others focus on combinatory restrictions. The very notion of collocation is not univocal, and this is the real problem. Lexical solidarity is seen by some as the main type of collocation, while others exclude it from the category. The same happens with light verb constructions and with some noun-based combinations. The instability of the concept seems to be part of its nature. Going back to the idea of the broad versus narrow conception of collocation, we could conclude that if we accept a broad perspective, we should define collocation in such a way that each characteristic admits several degrees, making all of them gradual, not oppositional; the cognitive perspective, with its theory of the prototype and its concept of “family resemblance”, may be able to offer a solution to these problems. By contrast, in a narrow conception, we should exclude one or more facets of what is referred to as collocation, depending on our own definition of this phenomenon.

Bibliography

Alonso Ramos, Margarita 1994–95. Hacia una definición del concepto de colocación: de J.R. Firth a I.A. Mel’čuk. Revista de Lexicografía. 1, 9–28.
Alonso Ramos, Margarita 2002. Colocaciones y contorno en la definición lexicográfica. Lingüística española actual. 24/1, 63–96.
Alonso Ramos, Margarita 2009. Delimitando la intersección entre composición y fraseología. Lingüística española actual. 31/2, 243–275.
Barrios Rodríguez, María A. 2007. Diccionarios combinatorios del español: diferencias y semejanzas entre Redes y Práctico. redELE revista electrónica de didáctica / español lengua extranjera. 11, 1–14.
Barrios Rodríguez, María A. 2015. Las colocaciones del español. Madrid: Arco Libros.
Bergenholtz, Henning / Gouws, Rufus 2013. A lexicographical perspective on the classification of multiword combinations. International Journal of Lexicography. 27/1, 1–24.
Blasco Mateo, Esther 2002. La lexicalización y las colocaciones. Lingüística española actual. 24/1, 35–62.
Bosque, Ignacio 2001. Sobre el concepto de colocación y sus límites. Lingüística española actual. 23/1, 9–40.
Bosque, Ignacio 2004. Presentación. In Bosque, Ignacio (ed.) Redes. Diccionario combinatorio del español contemporáneo. Madrid: SM, xv–clxix.
Bosque, Ignacio (ed.) 2006. Diccionario combinatorio PRÁCTICO del español contemporáneo. Madrid: SM.

Buendía Castro, Miriam / Faber Benítez, Pamela 2014. Collocation Dictionaries: A Comparative Analysis. MonTI. Monografías de Traducción e Interpretación. 6, 203–235.
Casares, Julio 1950 [1992]. Introducción a la lexicografía moderna. Madrid: Consejo Superior de Investigaciones Científicas (CSIC).
Castillo Carballo, Maria Auxiliadora 1998. El término ‘colocación’ en la lingüística actual. Lingüística española actual. 20/1, 41–54.
Castillo Carballo, Maria Auxiliadora 2000. Función adjetival y adverbial de algunas locuciones. Español Actual. 73, 57–63.
Castillo Carballo, Maria Auxiliadora 2015. De la investigación fraseológica a las decisiones fraseográficas. Un estudio de interrelaciones. Vigo: Academia del Hispanismo.
Corpas Pastor, Gloria 1996. Manual de fraseología española. Madrid: Gredos.
Corpas Pastor, Gloria 1998. Criterios generales de clasificación del universo fraseológico de las lenguas, con ejemplos tomados del español y el inglés. In Alvar Ezquerra, Manuel / Corpas Pastor, Gloria (eds) Diccionarios, frases, palabras. Málaga: Universidad de Málaga, 157–187.
Corpas Pastor, Gloria 2001. Apuntes para el estudio de la colocación. Lingüística española actual. 23/1, 41–56.
Coseriu, Eugenio 1967 [1991]. Las solidaridades léxicas. In Principios de semántica estructural. Madrid: Gredos.
Coseriu, Eugenio 1952. Sistema, norma y habla. RFHC. Revista de la Facultad de Humanidades y Ciencias. Montevideo. 9, 113–177.
De Cesaris, Janet / Battaner, Paz 2008. A New Kind of Dictionary: REDES, Diccionario combinatorio del español contemporáneo. In Proceedings Euralex 2006, 399–407.
DiCE = Alonso Ramos, Margarita (ed.) Diccionario de colocaciones del español. In progress, on line: .
DRAE 2001 = Real Academia Española: Diccionario de la lengua española. 22nd ed. Continuously updated, on line: .
Dubský, Josef 1998. Debilitamiento del valor comunicativo del verbo español. In Wotjak, Gerd (ed.) Estudios de fraseología y fraseografía del español actual. Madrid: Iberoamericana.

García-Page Sánchez, Mario 2001. El adverbio colocacional. Lingüística española actual. 23/1, 89–103.
García-Page Sánchez, Mario 2008. Introducción a la fraseología española. Estudio de las locuciones. Barcelona: Anthropos.
García Platero, Juan Manuel 2002. Aspectos semánticos de las colocaciones. Lingüística española actual. 24/1, 25–34.
Gutiérrez Ordóñez, Salvador 1989. Introducción a la semántica funcional. Madrid: Síntesis.
Koike, Kazumi 2001. Colocaciones léxicas en el español actual: estudio formal y léxico-semántico. Alcalá de Henares: Universidad de Alcalá / Takushoku University.
Koike, Kazumi 2002. Comportamientos semánticos en las colocaciones léxicas. Lingüística española actual. 24/1, 5–23.
Martínez López, Juan A. 2007. Sobre algunos elementos del contorno en el diccionario fraseológico. Revista de Lexicografía. 13, 55–65.
Mel’čuk, I. et al. 1995. Introduction à la lexicologie explicative et combinatoire. Louvain-la-Neuve: Duculot.
Mellado Blanco, Carmen 2008. Introducción: colocaciones y algunas cuestiones teórico-prácticas de fraseografía. In Mellado Blanco, Carmen (ed.) Colocaciones y fraseología en los diccionarios. Frankfurt am Main: Peter Lang, 7–31.
Mendívil, José Luis 1991. Consideraciones sobre el carácter no discreto de las expresiones idiomáticas. In Martín Vide, Carlos (ed.) Actas del VI Congreso de lenguajes naturales y lenguajes formales. Barcelona: PPU, 711–736.
Penadés Martínez, Immaculada 1996. Las expresiones fijas desde los conceptos de centro y periferia de los lingüistas praguenses. In Casas Gómez, Miguel (ed.) I Jornadas de Lingüística. Cádiz: Universidad de Cádiz, 91–134.
REDES 2004 = Bosque, Ignacio (ed.): Redes. Diccionario combinatorio del español contemporáneo. Madrid: SM, 2004.
Rivas González, Manuel 2008. Sobre la vinculación de algunas estructuras a la fraseología. Las solidaridades de Coseriu y sus derivaciones. In Mellado Blanco, Carmen (ed.) Colocaciones y fraseología en los diccionarios. Frankfurt am Main: Peter Lang, 147–161.

Rivas González, Manuel 2010. Posibilidades y límites de la investigación lingüística: el caso de la fraseología. Ph.D. dissertation. Online: .
Romero Aguilera, Laura 2014. Unidades pluriverbales y diccionarios: el tratamiento de las colocaciones en la historia de la lexicografía española. In Álvarez Vives, Vicente / Díez del Corral Areta, Elena / Oudot, Reynaud (eds) Dándole cuerda al reloj. Ampliando perspectivas en lingüística histórica de la lengua española. Valencia: Tirant Humanidades, 223–239.
Ruiz Gurillo, Leonor 1997. Aspectos de lexicografía teórica española. Anejo nº. XXIV de la Revista Cuadernos de Filología. Valencia: Universidad de Valencia.
Ruiz Gurillo, Leonor 1998. Una clasificación no discreta de las unidades fraseológicas del español. In Wotjak, Gerd (ed.) Estudios de fraseología y fraseografía del español actual. Madrid: Iberoamericana, 13–37.
Ruiz Martínez, Ana María 2007. La noción de colocación en las partes introductorias de algunos diccionarios monolingües del español. Revista de Lexicografía. 13, 139–182.
Seco, Manuel / Andrés, Olympia de / Ramos, Gabino 2004. Diccionario fraseológico documentado del español actual. Madrid: Aguilar.
Serra Sepúlveda, Susana 2012. Gramática y diccionario: contornos, solidaridades léxicas y colocaciones en lexicografía española contemporánea. Madrid, 2012. Ph.D. dissertation. On line: .
Val Álvaro, José Francisco 1999. La composición. In Bosque, Ignacio / Demonte, Violeta (eds) Gramática descriptiva de la lengua española. Madrid: Espasa Calpe, 4757–4841.
Valero Gisbert, Maria 2012. Fraseología, gramática y lexicografía. Mantova: Universitas Studiorum.
Varela, Fernando / Kubarth, Hugo 1994. Diccionario fraseológico del español moderno. Madrid: Gredos.
Wotjak, Gerd 1998. Reflexiones acerca de construcciones verbo-nominales funcionales. In Wotjak, Gerd (ed.) Estudios de fraseología y fraseografía del español actual. Madrid: Iberoamericana, 257–279.

Wotjak, Gerd 2008. Acerca del potencial combinatorio de las UL: procedimientos escenogenésicos y preferencias sintagmático-colocacionales. In Mellado Blanco, Carmen (ed.) Colocaciones y fraseología en los diccionarios. Frankfurt am Main: Peter Lang, 193–210.
Zuluaga, Alberto 1980. Introducción al estudio de las expresiones fijas. Tübingen: Hueber.
Zuluaga, Alberto 2002. Los ‘enlaces frecuentes’ de María Moliner. Observaciones sobre las llamadas colocaciones. Lingüística española actual. 24/1, 97–114.

Gloria Corpas Pastor

Collocation dictionaries for English and Spanish: the state of the art

Abstract: Collocations pose serious problems in language production and, to a certain extent, also in comprehension. Languages differ considerably as to the range of acceptable collocational patterns. Quite often users are badly in need of reliable resources to find, check or translate collocations. In the case of Spanish and English, there are currently no bilingual dictionaries of collocations. However, there are a number of monolingual collocation dictionaries for both languages. This paper provides an up-to-date survey of available collocation dictionaries for English and Spanish, presents an overview of the underlying approaches to collocation in those dictionaries, and puts forward a tentative classification based on the degree of corpus involvement which could serve as guidance and useful information for prospective users.

Keywords: collocation, collocations dictionary, dictionary classification, corpus-based, corpus-driven

1. Introduction

Words seem to have a natural tendency towards individual or mutual attraction. Word preferences, combinatory properties, lexical restrictions and syntagmatic structures are just some alternative terms to refer to this linguistic phenomenon and its explicit manifestations: collocations, i.e. arbitrary, domain-dependent, diasystematically restricted and cohesive lexical patterns which vary from one language to another. For instance, when you infer from evidence or premises, you draw a conclusion in English, whereas in Spanish the corresponding collocation is sacar una conclusión. Unlike idioms (e.g., draw fire, ‘attract criticism’; sacar a uno de quicio, ‘drive someone crazy’), collocations are semantically compositional and transparent. However, collocations can exhibit some

degree of opacity, as exemplified by avert a tragedy (‘prevent it from happening’) or cosechar una derrota (‘suffer a defeat’). Collocations pose problems mainly in production and especially in second language learning, languages for academic/special purposes, translation and interpreting. Quite often users are badly in need of reliable resources to find, check or translate collocations. To our knowledge, there are currently no bilingual Spanish-English dictionaries of collocations.1 However, there are a number of monolingual collocation dictionaries for both languages which can be used as writing aids and, occasionally, for comprehension purposes. This paper will present the wide range of collocation dictionaries available, provide a tentative classification on the basis of the degree of corpus involvement, and, at the same time, analyse their theoretical and methodological underpinnings.2 Our main aim is to provide some useful guidance to prospective users: students, teachers, language professionals and linguists.

1 On the lexicographic treatment of collocations in bilingual dictionaries (Spanish-English) and related translation issues, please refer to Corpas Pastor (2015a, 2015b and 2016/in press).
2 See McGee (2012) for a review of four monolingual dictionaries of English collocations (LTP, BBI, OCD and MCDSE) in the light of the data-driven learning (DDL) approach. Spanish collocation dictionaries REDES and PRÁCTICO have been compared by Barrios Rodríguez (2007). Ferrando (2013) offers a brief description of DE, REDES, PRÁCTICO and DICE in her study on collocations in German and Spanish lexicography from a contrastive perspective. Buendía Castro and Faber (2014) provide a comparative analysis of some of the dictionaries mentioned in this paper (BBI, OCD, MCDSE, DICE, REDES and PRÁCTICO) in terms of collocations encoding, information and display.

2. Standard dictionaries of collocations

Since ancient times, lexicographers have been well aware of the ‘invisible’ ties among words. Suffice it to mention the plethora of syntagmatic dictionaries that have appeared since the 15th century (cf. Zuluaga 1980; Hausmann 1989; Corpas Pastor 2001). No wonder lexicographers have been among the most active of those analysing the semantic framework of collocations. In this respect, Hausmann (1979, 1985, 1989, 1991, 1998, 2007) has contributed some of the most influential ideas in the advancement of the semantic approach to collocation. Hausmann conceives collocations as a bipartite structure,³ conventionally restricted, in which the two components have a different semantic status: for example, in den Tisch abräumen (‘clear the table’), the Basis (base) is the semantically autonomous word (Tisch) and abräumen is the Kollokator (collocate), that is, the semantically dependent component. In language production, the base selects its collocate in a unidirectional fashion. In other words, the selection of the collocate is contingent upon the prior selection of the base. Besides, he established that collocations enter into specific grammatical patterns: 1. verb + noun [object] (to tackle a problem), 2. adjective + noun (weak tea), 3. noun [subject] + verb (the heart palpitates), 4. noun + noun (a pack of dogs), 5. adverb + adjective (keenly aware) and 6. verb + adverb (hurt badly).⁴ By default, nouns tend to be the bases, except for patterns 5 and 6, where the bases are adjectives and verbs respectively. Types 1 and 3 provide basic information about the sentence function of the constituents. In addition, Hausmann also developed metalexicographic criteria for a more efficient placement of collocations: (i) they should be listed under the entry for the collocates in decoding dictionaries in order to aid comprehension (or for reassurance), and (ii) collocates should appear under their bases in the case of encoding dictionaries, which are designed for production.

3 There is room, however, for complex collocations like harsche Kritik üben (‘criticise harshly’) (Hausmann 2007: 218).
4 These are Hausmann’s (1989) types of collocations.

British English lexicography has pioneered the compilation of collocation dictionaries.⁵ The first dictionaries of English collocations follow the principles advocated by Hausmann:⁶ the Selected English Collocations [SEC] (Dzierzanowska/Kozłowska 1982), the English Adverbial Collocations [EAC] (Kozłowska 1991), the BBI Combinatory Dictionary of English [BBI] (Benson/Benson/Ilson 1986, 1997⁷, 2009) and the LTP Dictionary of Selected Collocations [LTP] (Hill and Lewis 1997). Both the SEC and the EAC were originally published in Poland and later merged into the LTP as one volume. All three are production dictionaries designed to help advanced learners write, translate or speak English accurately. Collocations are listed under their bases. Common and ‘strong’ (i.e. cognitively salient) collocations are included in the dictionaries: magnificent house, declare war; but technical collocations or highly colloquial ones have been systematically left out. SEC includes collocations with noun bases (Hausmann’s patterns 1–3) and EAC includes collocations with verb and adjective bases (Hausmann’s patterns 5–6). LTP is not just a combination of both dictionaries but a revised, updated, new dictionary, which also includes Hausmann’s pattern 4.

5 See Hausmann (1989) for a historical survey of collocational dictionaries. The ODCIE and the ODEI have not been included in our study as both editions are fully consistent with the traditional dictionaries of English idioms and include all kinds of composites, from restricted collocations to pure idioms (cf. Cowie 1981).
6 Hausmann (1991) also mentions The Dictionary of English Words in Context (1979), which was published in Germany as a production tool for German-speaking students and teachers of English. It contains collocations for verbs, substantives, adjectives and adverbs but, all in all, “the space is simply wasted with idiomatic expressions” (1991: 230).
7 The second edition of the BBI (1997) was published with minor revisions two years later in Germany under the title of Student’s Dictionary of Collocations [SDC] (Benson 1999).

A common practice in SEC and EAC is to provide semantic information in brackets right after the lemma. The main types of semantic information are the synonymic definition, e.g. “healed (healthy again)”; a restricted collocation, a semantic set or an instance of a typical set, e.g. “oppressive (laws, etc.)”. Lemmas are then given separate entries according to their different collocational senses and sets. The LTP appears to be more systematic (and modest) in this respect. See the entries for open in EAC and LTP for a comparison:

Table 1: Entry for open in the EAC and the LTP.

EAC
OPEN (become open, e.g. door), Adj. Adv. a crack, a little, inwards, outwards, quickly, unexpectedly, violently, wide
OPEN sth vt⁸ Adv. gently, noiselessly, quickly, wide
OPEN (eg. door), Adj. Adv. always, barely, constantly, half-, wide
OPEN (not decided, e.g. matter) Adj. Adv. completely, still

LTP
OPEN (verb) Open gently, inwards, outwards, suddenly, unexpectedly, wide
OPEN (adj) always, barely, completely, half-, wide open

8 The abbreviation vt stands for “transitive verb” in the list of Signs and Abbreviations of SEC (p. 7).

The BBI is a combinatory dictionary that includes grammatical collocations (or colligations) and lexical collocations. The dictionary has had three editions so far (1986, 1997, 2009). The first two are based on lexicographers’ intuitions, while the third one refers to corpora (see section 3). All editions declare a pedagogic orientation. The first edition of the BBI states that it is a learner’s dictionary of English intended to help users to express themselves as naturally as possible. In the third edition, the BBI claims to be a specialised dictionary designed to help learners of English find collocations so that they are able “to express themselves fluently and accurately in speech and writing” (xiii). Word combinations are divided into (i) free combinations (after lunch); (ii) idioms (to be beside oneself), which are syntactically frozen and semantically opaque; (iii) lexical and grammatical collocations (declare war, responsible for), which are placed between idioms and free combinations; (iv) transitional combinations (as light as a feather), which are more transparent than idioms and less variable than collocations; and (v) compounds, which are completely frozen and include ‘multiword lexical units’ (MLUs) and phrasal verbs (long shot, double take, hand in). In the first edition, collocations are defined as follows:

Students must learn how words combine or ‘collocate’ with each other. In any language, certain words regularly combine with certain other words or grammatical constructions. These recurrent, semi-fixed combinations, or collocations, can be divided into two groups: grammatical collocations and lexical collocations. Grammatical collocations consist of a dominant word – noun, adjective/participle, verb – and a preposition or a grammatical construction. Lexical collocations, on the other hand, do not have a dominant word; they have structures such as the following: verb + noun, adjective + noun, noun + verb, noun + noun, adverb + adjective, adverb + verb. (Benson/Benson/Ilson 1986: xiii)

Grammatical collocations (colligations) precede lexical collocations. They are further subdivided into types (see Table 2).

Table 2: The BBI typology of collocations (Benson/Benson/Ilson 2006).

Grammatical collocations
[G1] noun + preposition (blockage against)
[G2] noun + to + noun (a pleasure to)
[G3] noun + that-clause (an oath that)
[G4] preposition + noun (by accident)
[G5] adjective + preposition (angry at)
[G6] predicative adjective + to + infinitive (stupid to go)
[G7] adjective + that-clause (afraid that)
[G8] nineteen English verb patterns, designated by the capital letters A to S.⁹

Lexical collocations
[L1] CA verb + noun/pronoun (set a record)
[L2] EN verb + noun/pronoun (squander a fortune)
[L3] adjective + noun (reckless abandon)
[L4] noun + verb (bombs explode)
[L5] noun + of + noun (a pride of lions)
[L6] adverb + adjective (deeply absorbed)
[L7] verb + adverb (argue heatedly)

9 The verb patterns are: A = svo to o (or) svoo; B = svo to; C = svo for o (or) svoo; D = sv prep. o (or) svo prep. o; E = sv to inf.; F = sv inf.; G = svv-ing; H = svo to inf.; I = svo inf.; J = svov-ing; K = sv possessive v-ing; L = sv(o) that-clause; M = svo to be c; N = svoc; O = svoo; P = sv(o)a; Q = sv(o) wh-word; R = s(it) vo to inf. (or) s(it)vo that-clause; S = svc (adjective or noun). [List of abbreviations: s = subject; v = verb; o = object (direct or indirect); c = complement; a = adverbial (when obligatory); v-ing = verb form in -ing.]

Similarly to EAC and LTP, collocations are listed under their bases or ‘dominant words’: nouns (L1, L2, L3, L4, L5¹⁰), verbs (L7) and adjectives (L6). However, to aid comprehension, there are sense discriminators and short definitions for nouns, verbs and adverbs; occasionally, short examples and glosses are also provided (see Table 3).

10 L5 collocations are listed under the first noun.

Table 3: Entry for damage in the BBI (1986).

damage I n. [“harm”] 1. to cause, do ~ to; to inflict ~ on 2. to suffer, sustain ~ 3. to repair, undo ~ 4. grave, great, extensive, heavy, incalculable, irreparable, serious, severe; lasting, permanent; light, slight; widespread ~ 5. fire; flood; material; property; structural ~ 6. brain ~ (irreversible brain ~) 7. ~ from (~ from the fire) 8. ~ to (was there much ~ to the car? the ~ done to the house was extensive; grave ~ to one’s reputation)
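The arrangement of such an entry, with collocates grouped under a base, can be pictured as a simple mapping. The sketch below is only an illustration and not the BBI's actual data format: the names ENTRIES, collocates_of, build_decoding_index and bases_of are hypothetical, the pattern labels are informal, and the data are limited to a few collocates taken from the entry for damage and from examples cited earlier in this section (declare war, tackle a problem). It also shows how Hausmann's two placement criteria amount to two indexes over the same pairs: base to collocates for encoding, collocate to bases for decoding.

```python
from collections import defaultdict

# Hypothetical miniature of an encoding-style dictionary: each base maps to
# groups of collocates, keyed by an informal pattern label.
ENTRIES = {
    "damage": {
        "verb + noun": ["cause", "do", "inflict", "suffer", "sustain", "repair", "undo"],
        "adjective + noun": ["grave", "extensive", "irreparable", "slight"],
    },
    "war": {"verb + noun": ["declare"]},
    "problem": {"verb + noun": ["tackle"]},
}


def collocates_of(base, pattern=None):
    """Encoding lookup: start from the base and list its collocates."""
    groups = ENTRIES.get(base, {})
    if pattern is not None:
        return list(groups.get(pattern, []))
    return [c for members in groups.values() for c in members]


def build_decoding_index(entries):
    """Invert the entries so that each collocate points back to its bases."""
    index = defaultdict(set)
    for base, groups in entries.items():
        for members in groups.values():
            for collocate in members:
                index[collocate].add(base)
    return index


DECODING_INDEX = build_decoding_index(ENTRIES)


def bases_of(collocate):
    """Decoding lookup: start from the collocate and list the bases it serves."""
    return sorted(DECODING_INDEX.get(collocate, set()))


print(collocates_of("damage", "verb + noun"))
print(bases_of("declare"))
```

A production dictionary only needs the first lookup; the inverted index corresponds to the decoding arrangement, in which a user starts from an unfamiliar collocate.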

The BBI is heavily influenced by Mel’čuk’s Meaning-Text Theory (MTT)¹¹, the notion of Lexical Function (LF) and the extremely influential Dictionnaire explicatif et combinatoire du français contemporain (DECFC). Within the MTT, the term lexical function (LF) is used in the mathematical sense f(X) = Y in order to describe and represent semantic relationships among words.¹² The value of the function depends on the word that serves as its argument. A clear parallelism can be drawn, then, between lexical functions [fᵢ(argument) = valueᵢ] and the semantic approach to collocation [fᵢ(base) = collocateᵢ], since both values and collocates are selected by their respective arguments and bases in order to express a specific meaning relation (cf. Mel’čuk 1998; Gelbukh/Kolesnikova 2013). The BBI makes systematic reference to simplified functions only in relation to (i) L1 collocations with transitive verbs denoting creation or activation (CA collocations, launch a missile); (ii) L2 collocations with transitive verbs denoting eradication or nullification (EN collocations, annul a marriage), and (iii) L5 collocations, where the first noun reflects part-whole or general-specific relations (an article of clothing, a herd of buffalo, an act of violence). Besides, the BBI also includes semantic indications (sense discriminators and random short definitions) within the entries which might be somehow equated with simplified versions of LFs. In fact, Alonso Ramos (1993: 155–160) identified typical LFs realised in the types of lexical collocations which were included in the first edition of the BBI.

11 On the Meaning-Text Theory, see Mel’čuk (1973, 1996), Mel’čuk/Clas/Polguère (1995), Mel’čuk/Pertsov (1987), and Wanner (2007), among others.
12 Mel’čuk (1996) identified over 60 LFs, but the number keeps changing to accommodate the research findings and lexicographic practice associated with the compilation of combinatory dictionaries. In addition, simple LFs can be combined to form complex LFs. “Formally, an LF f is a function that associates with a given LU L a set {Lᵢ} of lexical expressions that express, contingent on L, the meaning (f) associated with f and bearing on the meaning (L): f(L) = {Lᵢ}, such that an Lᵢ expresses (f)((L)). L is called the argument, or keyword, of f; f(L) = {Lᵢ} is the value of f applied to L; and an Lᵢ is an element of this value.” (Mel’čuk 1996: 40).

In comparison with the rich English tradition, collocational dictionaries for Spanish appeared much later. The Diccionario Euléxico para expresarse con estilo y rigor [DE] (Boenu 2001) is the first collocations dictionary of this kind. This modest piece of work is intended as a writing aid to help users produce appropriate word combinations (Boenu 2001: i). But there is no mention of colocación (‘collocation’) anywhere in the dictionary. Instead, there is an implicit reference to collocations as genre indicators and stylistic devices within a text:

Una estudiada combinación del sustantivo, el verbo y el adjetivo permite expresar el concepto o idea deseados con mayor estilo y rigor. Según el contexto, y a mero título de ejemplo, quizá prefiramos utilizar: irradiaba una luz cegadora en lugar del simple daba mucha luz. (Boenu 2001: 9)¹³

13 ‘A careful combination of noun, verb and adjective can express the desired concept or idea in a more appropriate style. Depending on the context, and by way of example, it could be preferable to use “radiated a blinding light” instead of simply “gave great light”.’ (Our translation).

Entries are organised around nouns (bases) and their verbal and adjectival collocates. The microstructure is extremely poor. It contains only a list of collocates arranged by grammatical categories and, occasionally, random cross-references to other entries (cf. the entry for descontento, ‘dissatisfaction’, ‘unrest’). Sometimes relevant collocates are missing (estallar / crecer / provocar, etc. + descontento) or else specific determiners for the base have been included without any obvious reason.¹⁴

Table 4: Entry for descontento in the DE.

descontento
Verbo aflorar el, declarar el, despertar, dimanar el, evidenciar(se) el, expresar el, exteriorizar(se) el, germinar el, invadir el, manifestar(se) el, sembrar el, suscitar el

14 For instance, [aflorar + un + descontento] is perfectly possible in Spanish: “Lo que pasa es que en esta legislatura la crisis ha hecho aflorar un descontento generalizado, porque cada uno en su propia casa siente la mala política del Gobierno.” (Our emphasis. Retrieved from WebCorp http: ).

3. Corpus-based dictionaries of collocations

Standard dictionaries of collocations tend to reflect lexicographers’ intuitions plus certain underlying theoretical principles (basically Hausmann’s postulates and, to a certain extent, Mel’čuk’s lexical functions). This type of dictionary does not take into account important defining aspects such as frequency or evidence of usage. Other theories of collocation rely precisely on those aspects. From the late fifties, British Contextualism, represented by Firth, Halliday and Sinclair, started building a semantic theory of collocation, which later evolved into a statistical approach and influenced corpus-based lexicography. The term collocation¹⁵ was introduced by Firth (1957, 1968) to mean a mode of semantic analysis (meaning by collocation) and a stylistic means to characterise restricted languages. Later on, Halliday (1966) equated collocation with usual or habitual co-occurrence¹⁶ and Sinclair (1966) advanced his ideas for computer-assisted analysis of collocational patterns in large corpora, which were widely adopted in Systemic Functional Linguistics.¹⁷ A decade later, Jones and Sinclair (1974: 19) provided a frequency-based definition¹⁸ which has shaped most NLP studies so far:

“Collocation” is the co-occurrence of two items in a text within a specified environment. “Significant collocation” is regular collocation between items, such that they co-occur more often than their respective frequencies and the length of text in which they appear would predict.

15 Due to lack of space, only the most relevant authors and works will be mentioned. For a more detailed account, see Bartsch (2004), especially chapters 2–4, and Barnbrook/Krishnamurthy/Mason (2013), especially chapters 1–2.
16 Halliday’s redefinition of collocation in probabilistic terms marks the beginning of the distributional or statistical approach to collocation: “the syntagmatic association of lexical items, quantifiable, textually, as the probability that there will occur at n removes (a distance of n lexical items) from an item x, the items a, b, c…” (Halliday 1966: 158).
17 Collocates include the node (or word under study) and the word or set of words that can appear in combination with the node (the collocational range) within a given collocational span (distance among collocates or window). The basic tool in the analysis of collocations is the KWIC (keyword in context) concordance for the discovery of patterns.
18 Greenbaum (1970: 60) also refined the notion of collocation by introducing the concepts of principal collocate (to make a decision, to mow the lawn) and extended collocate (He really is a fool). In addition, collocations can be position-free vs. position-dependent (the more numerous class), casual vs. regular or habitual, and downward vs. upward (Sinclair 1991). Within British Contextualism, some discordant voices against “frequency” as the sole criterion for collocations are, for instance, Greenbaum (1974: 83, 1988: 114) and Hoey (2006[2005]: 5).

In order to account for language institutionalisation and creativity, Sinclair (1996) developed a corpus-driven model of analysis for identifying and describing lexical items as ‘extended units of meaning’.¹⁹ The model is composed of five categories of co-selection: (i) core, that is, the word(s) which is/are invariable and always present (for instance, naked eye); (ii) collocation (co-occurrence of words with the core, e.g. see, visible, invisible, apparent, evident, obvious, undetectable at L3²⁰ and to a lesser extent at L4); (iii) colligation (co-occurrence of grammar choices with the core, e.g. the at L1 position and with, to, by, from, as, upon, than at L2); (iv) semantic preference, that is, the restriction of regular co-occurrence to words which share a common semantic feature (e.g. about ‘vision’ or ‘visibility’); and (v) semantic prosody, in other words, the overall functional meaning of the lexical item, e.g. ‘difficulty’, also confirmed by the collocational range of see (small, faint, weak, difficult) and visible (barely, rarely, just and modal verbs can, could).

19 This notion lies at the heart of Hanks’s novel approach to lexical patterns (Hanks 2013).
20 Ln means n positions to the left of the node in a KWIC line; likewise, Rn means n positions to the right.

Neo-Firthian linguistics and British Contextualism have played a crucial role in the statistical and frequency-based theories of collocation. The use of corpora underpins this approach. However, the degree of penetration of corpora and their potential uses varies significantly among collocation dictionaries. Following a well-established distinction between corpus-based and corpus-driven language studies (Tognini-Bonelli 2001: 65, 84–85), corpus-based collocation dictionaries use corpora as a convenient methodology in order to explore combinatory properties of words, check the arrangement of collocation senses, validate, refute or refine the list of selected collocates, and find illustrative examples of use. By contrast, corpus-driven collocation dictionaries are more inductive and typically use corpus data as the main source from which lexicographers’ hypotheses and theories about the combinatory properties of languages emerge.

The third edition of the BBI (2009) qualifies as a corpus-based collocations dictionary (or “corpus-refined”, according to McGee 2012). Collocations have not been extracted (semi-)automatically, but are the result of lexicographers’ intuitions which have been ‘refined’ and ‘expanded’ through exposure to large corpora:

Nowadays, our task is eased not only by the availability of corpuses of contemporary English (such as the British National Corpus) but also by the amazing resource of the Internet itself, which enables us to search in it for a word and find superb examples of that word in context. Nor should it be forgotten that an important source of new information in BBI 3 is, paradoxically, BBI 2, now that the computer allows material from an entry in BBI 2 to be added to other entries in BBI 3 when such material is appropriate. (Benson/Benson/Ilson 2009: viii).

The BBI (2009) is a revised edition with more collocations, examples and usage notes (compare the new entry for damage in Table 5), alongside a substantial number of new collocations in the field of computing, e.g., “server n. an online ~”; “program […] 10. to boot up; debug; download; execute; load; reboot; run; write a ~ 11. a user-friendly ~ 12. a computer; software; word processing ~ [“academic course of study”]” (cf. Benson/Benson/Ilson 2009: xi).

Table 5: Entry for damage in the BBI (2009).

damage I n. [“harm”] 1. to cause, do ~ to; to inflict ~ on 2. to suffer, sustain ~ 3. to repair, undo ~ 4. to assess the ~ (trying to assess (the extent of) the ~) 5. grave, great, extensive, heavy, incalculable, irreparable, serious, severe; lasting, permanent; widespread ~ 6. light, slight ~ 7. environmental; fire; flood; material; property; structural ~ 8. brain ~ (irreversible brain ~) 9. ~ from (~ from the fire) 10. ~ to (was there much ~ to the car? the ~ done to the house was extensive; to do grave ~ to smb.’s reputation) (see USAGE NOTE at damage II)

This third edition is also much indebted to Mel’čuk’s lexical functions (LFs) for explanatory and combinatorial dictionaries (cf. Benson/Benson/Ilson 2009: vii–ix), as well as Hausmann’s types of collocations, even though Benson (1989) claims to have been unaware of the works by Hausmann during the compilation phase of the BBI. Collocations or recurrent combinations are defined as “fixed, identifiable, non-idiomatic phrases and constructions” (Benson/Benson/Ilson 2009: ix). However, the actual term collocation is used in BBI (2009) to refer at least to: (i) the collocational phenomenon or combinatory properties of words (cf. phraseology), (ii) the lexical combinatory properties of words and the resulting combinations, (iii) the grammatical combinatory properties of words and the resulting combinations, and (iv) a type of compositional lexical word combination, fixed and recurrent, which is midway between idioms and free combinations. This is reminiscent of some uses of ‘phraseology’ and ‘collocation’ within the statistical framework, of explanatory and combinatory dictionaries or, even, of the slovosočetanije or word combinations studied by the Russian school as part of syntax (Alexandrova/Ter-Minasova 1987).

Similarly to the third edition of the BBI, the Diccionario de colocaciones del español [DICE] (Alonso Ramos 2004)²¹ represents a compromise solution. The DICE is an electronic combinatory dictionary of Spanish.²² It follows Mel’čuk’s approach and the principles of the DECFC with a pedagogic orientation (Alonso Ramos 2010). It is both semantically-based and corpus-based: corpora are only used to illustrate collocations in context, enrich the dictionary-based description of lemmas and check frequency (Alonso Ramos 2003: 557–558). The DICE is organised in semantic fields, with a special focus on the domain of emotions. The microstructure of the entry includes the argument structure, frequency, examples retrieved from Spanish corpora (CREA, LexEsp), the Internet and other sources, quasi-synonyms and quasi-antonyms, and the syntactic schema (esquema de régimen) with the linguistic realisations of actants and collocations.²³ See the entry excerpt for descontento.

21 .
22 For a recent description, please refer to Vincze/Mosqueira/Alonso (2011). The DICE is still under construction.
23 The syntactic schema and the collocations are displayed in separate windows.

Table 6: Entry excerpt for descontento in DICE.

descontento 1 m. (Sentimiento) [ver ejemplos]
Frecuencia moderada
descontento de individuo X por hecho Y
Ejemplos
1. El incumplimiento de las promesas políticas provoca descontento en la población
2. Ya en diciembre de 1939 había descontento entre los altos mandos militares en contra de la situación política.
Cuasisinónimos agitación 3, desagrado 1, disgusto 1, insatisfacción 1
Cuasiantónimos agrado 1a, placer 1a, satisfacción 1a
Ver esquema de régimen
Colocaciones ver todas, atributo de los participantes, descontento + adjetivo, verbo + descontento, descontento + verbo, nombre de descontento

LFs are provided in their full form but also in a simplified version or lexicographic gloss for ease of comprehension. The DICE features an advanced search system which allows users to find (i) the collocates of a base by means of an LF (búsquedas directas, ‘direct searches’), (ii) the base of a collocation from the collocate or from a specified LF (búsquedas indirectas, ‘indirect searches’), or else (iii) correct collocations for a word according to a list of glosses (simplified LFs) through a writing aid (ayuda a la redacción). For instance, a direct search for the word tristeza (‘feeling of sadness’) plus the LF Magn + A1²⁴ retrieves three values or collocations: hundido, lleno and sumido. The microstructural information about each collocate (e.g., hundido) generally includes the syntactic pattern (e.g., [en ART ~]), a gloss (e.g., “que siente una ~ intensa”) and example(s) extracted from corpora, e.g. “María Cristina de Habsburgo está hundida en la tristeza, sumergida en un pozo trágico”. Conversely, the writing aid makes it possible to find adjectives that collocate with tristeza according to (i) the lexical units²⁵ associated with the lemma and (ii) the list of glosses (simplified LFs or collocational meanings) displayed on the screen:

• intensa (‘intense’): Magn (tristeza 1a) = enorme, grande, honda, inconmensurable, infinita, inmensa, insondable, marcada, profunda, sin límites;
• poco intensa (‘mild’, ‘not intense’): Anti Magn (tristeza 1a) = leve, ligera, sosegada, vaga;
• verdadera (‘true’): Ver (tristeza 1a) = verdadera;
• agradable (‘pleasant’): Bon (tristeza 1a) = dulce;
• que no se permite su manifestación (‘repressed’): A2 non Perm1 Manif (tristeza 1a) = reprimida;
• grande (‘big’): Magn (tristeza 1b) and Magn (tristeza 2) = grande, profunda

24 A1 refers to the Deep Syntactic Actant (active).
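Read as f(L) = {Lᵢ}, the direct and indirect searches described above amount to lookups in a table keyed by lexical function and lexical unit. The following sketch is only an illustration built on a selection of the values listed above; the names LF_VALUES, direct_search and indirect_search are hypothetical and say nothing about how the DICE is actually implemented.

```python
# A selection of the values listed above; keys are (lexical function, lexical unit).
# The table layout and all function names are assumptions made for illustration.
LF_VALUES = {
    ("Magn", "tristeza 1a"): ["enorme", "grande", "honda", "inmensa", "profunda"],
    ("Anti Magn", "tristeza 1a"): ["leve", "ligera", "sosegada", "vaga"],
    ("Ver", "tristeza 1a"): ["verdadera"],
    ("Bon", "tristeza 1a"): ["dulce"],
    ("Magn", "tristeza 1b"): ["grande", "profunda"],
    ("Magn", "tristeza 2"): ["grande", "profunda"],
}


def direct_search(lexical_function, lexical_unit):
    """From a base (lexical unit) and an LF to the collocates: f(L) = {L_i}."""
    return list(LF_VALUES.get((lexical_function, lexical_unit), []))


def indirect_search(collocate):
    """From a collocate back to every (LF, lexical unit) pair that yields it."""
    return sorted(key for key, values in LF_VALUES.items() if collocate in values)


print(direct_search("Anti Magn", "tristeza 1a"))
print(indirect_search("profunda"))
```

Keying the table on the lexical unit rather than the lemma keeps the senses of tristeza apart, so the same collocate can be returned for several senses without conflating them.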

25 The lemma tristeza has three senses (unidades léxicas) associated: tristeza 1a f. (Sentimiento), tristeza 1b f. (Hecho) and tristeza 2 f. (Cualidad).

4. Corpus-driven dictionaries of collocations

The corpus-driven approach is a methodology whereby the corpus serves as an empirical basis from which lexicographers extract their data and detect linguistic phenomena without prior assumptions and expectations (cf. Tognini-Bonelli 2001; McEnery/Hardie 2012). The first corpus-driven dictionaries of collocations relied exclusively on frequency and statistics, with no room for intuition or meaning considerations. There are two first-generation dictionaries of English collocations of this kind: A Dictionary of English Collocations. Based on the Brown Corpus [DEC] (Kjellmer 1994) and The Collins Cobuild − English Collocations on CD-ROM [CCEC] (Sinclair 1995). The DEC is based on n-gram extraction for all running words in the Brown Corpus. Words are not lemmatised and no specific frequency threshold is in place (for instance, there is an extremely long entry for the word the). Discontinuous collocational patterns are not provided, as only combinations of adjacent items (n-grams) have been included. The DEC offers collocates (on either side of the node) and a frequency analysis, regardless of meaning.²⁶ By way of example, see the entry for sentiment.

Table 7: Entry for sentiment in the DEC.

SENTIMENT CTy 5; CF 11; Cte 47

                            EF   IF
A SENTIMENT a                2    2
ANTI-SLAVERY SENTIMENT a     2    2
IN SENTIMENT a               2    2
OF SENTIMENT a               3    3
PUBLIC SENTIMENT a           2    2

[Columns RF, TC and DI, as extracted, with row alignment lost: RF * *; TC/DI 2 1 2 2 2 1 2 2 2 2]
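A minimal sketch of adjacent-item (n-gram) counting in the spirit of the approach described above: tokens are not lemmatised, only contiguous sequences are counted, and no frequency threshold is applied. The toy sentence, the function name ngrams_with_node and the lower-casing step are assumptions made for illustration; the DEC's own procedure, including its grammaticality filter, is not reproduced here.

```python
from collections import Counter


def ngrams_with_node(tokens, node, n=2):
    """Count contiguous n-grams (no lemmatisation) that contain `node`."""
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if node in gram:
            counts[gram] += 1
    return counts


# Invented toy text, lower-cased and split on whitespace.
text = "public sentiment shifted as public sentiment often does in matters of sentiment"
tokens = text.lower().split()

counts = ngrams_with_node(tokens, "sentiment")
for gram, freq in counts.most_common():
    print(" ".join(gram), freq)
```

Because only adjacent items are counted, a discontinuous pattern such as matters ... sentiment never surfaces, which is the limitation noted above for first-generation dictionaries of this kind.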

The CCEC uses statistical methods and the t-score in order to extract discontinuous co-occurrences. This methodology²⁷ lies at the heart of most corpus-driven dictionaries of collocations. Collocation is defined as co-occurrence in the Content menu: “two words which occur together”. The CCEC includes some 10,000 nodes of core vocabulary which have been linked to the Bank of English. It functions in a very simple way: once a word is selected on the screen, up to 20 collocates are shown, in decreasing frequency order. Stop-words can be used to refine searches. KWIC lines can be sorted left and right, examined (up to 20 lines) and expanded to view whole sentences. Rather than an electronic dictionary of collocations, the CCEC resembles a concordancer.²⁸

26 Only the grammaticality of the collocations has been ensured, to prevent highly frequent n-grams which are not collocations, like although he, hall to, etc.
27 On extraction methodologies currently used in computational phraseology, please refer to Corpas Pastor (2013).
28 On-line writing aids and websites for collocations lie beyond the scope of this paper. Some interesting samples for English are Just the Word, the Free Online Collocations Dictionary, the English Collocations Dictionary Online, Collocations Search, and Collocation Checker, among others. GDEX Demo Dictionary by Sketch Engine is an experimental automatic collocations dictionary that supports many languages, including Spanish and English.
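The idea of significant collocation (items co-occurring more often than their frequencies and the length of the text would predict) can be illustrated with a window-based t-score. The sketch below assumes one common formulation, t = (O − E) / √O, with the expected count E estimated from the two word frequencies, the window size and the corpus size; the exact computation used in the CCEC is not documented here, and the function names, the toy sentence and the span of one token are all hypothetical.

```python
import math
from collections import Counter


def window_cooccurrences(tokens, node, span=4):
    """Count words occurring within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - span), min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def t_score(observed, f_node, f_coll, n_tokens, span=4):
    """t = (O - E) / sqrt(O), where E is the co-occurrence count expected by chance."""
    expected = f_node * f_coll * (2 * span) / n_tokens
    return (observed - expected) / math.sqrt(observed)


def rank_collocates(tokens, node, span=4, stop_words=frozenset(), top=20):
    """Return up to `top` (word, observed, t-score) triples, highest score first."""
    freq = Counter(tokens)
    co = window_cooccurrences(tokens, node, span)
    scored = [
        (word, obs, t_score(obs, freq[node], freq[word], len(tokens), span))
        for word, obs in co.items()
        if word not in stop_words
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)[:top]


# Invented toy text; a real run would use a corpus of many millions of tokens.
tokens = "heavy rain fell and strong wind blew while heavy rain continued".split()
for word, obs, score in rank_collocates(tokens, "rain", span=1, stop_words={"and"}):
    print(word, obs, round(score, 3))
```

Unlike counting adjacent n-grams, the window lets a collocate be separated from the node by intervening words, which is what makes discontinuous co-occurrences retrievable.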

190

Gloria Corpas Pastor

By contrast, second-generation, corpus-driven dictionaries of collocations provide extensive coverage and rich microstructural information derived from corpus evidence which has been filtered by lexicographers’ knowledge and experience. An outstanding feature that singles out this type of dictionary is its semantically enhanced entries. This is in line with what most combinatorial dictionaries do by means of lexical functions. While second-generation dictionaries have been mainly designed for production, the aforementioned semantic enrichment turns them into decoding/encoding tools at the same time.

For English there are three main corpus-driven dictionaries: the Oxford Collocations Dictionary for Students of English [OCD]29 (Crowther/Lea/Dignen 2002, 1st ed.; McIntosh/Francis/Poole 2009, 2nd ed.), the Macmillan Collocations Dictionary for Learners of English [MCD] (Rundell 2010, 1st ed.) and the Longman Collocations Dictionary and Thesaurus [LCDT]30 (Mayor 2013, 1st ed.). They are learner dictionaries designed as writing aids for production. The 1st edition of the OCD is based on the 100-million-word British National Corpus (BNC), whereas the 2nd edition with CD-ROM is based on the Oxford English Corpus (OEC) of over 2 billion words, which has been mined from the web31 through Sketch Engine (Kilgarriff et al. 2004). The MCD also uses a giga-token corpus mined from the web with Sketch Engine, the World English Corpus of two billion words32. The LCDT has been compiled using the Longman Corpus Network, a database of over 450 million words from a wide range of real-life printed sources and recorded speech.33

The three dictionaries share a neo-Firthian approach to corpus data and collocation:

• “Collocation is the way words combine in a language to produce natural-sounding speech and writing. For example, in English you say strong wind but heavy rain. It would not be normal to say *heavy wind or strong rain.” (OCD)

29 On-line version: .
30 On-line version: (subscription required).
31 On web corpora, see Hundt/Nesselhauf/Biewer (2007), and Schäfer/Bildhauer (2013).
32 For more information on this corpus, you are referred to .
33 For a complete description of this corpus, see .

Collocation dictionaries for English and Spanish: the state of the art

• “[…] the property of language whereby two or more words seem to appear frequently in each other’s company.”34 (MCD)

• “In this book, a collocation is a word that you often use with another word. This can be an adjective used with a noun – for example when talking about the rain, you can use heavy rain when there is a lot of rain, or when talking about the wind, you use high winds about winds that are very strong.” (LCDT)

Both the OCD and the MCD have extracted collocations automatically through Sketch Engine35. This robust and stable software includes a corpus building and management system (web service) plus a corpus query tool (software) which is powerful enough to process giga-token corpora. It offers three core functions: Concordance, Word Sketch and Thesaurus. Word Sketch provides a one-page summary of a word’s grammatical and collocational patterns. Results are ranked according to raw frequency or score (the salience threshold) and linked to the concordancer.

The OCD (2009) lists word collocations, i.e. words that combine with each other and do not allow changes (small fortune, but not *little fortune), and category collocations,36 defined as “another area of collocation […] where a word can combine with any word from a readily definable set. This set may be quite large, but its members are predictable, because they are all words for nationalities, or measurements of time” (McIntosh/Francis/Poole 2009: vii). Category collocations bear a resemblance to Sinclair’s semantic prosody and to Mel’čuk’s lexical functions. Illustrative examples would be economic + words which denote ‘in high degree’ (extremely, fairly, very, etc.) and pollution + adjectives or nouns which denote ‘poisonous/poisoned substances’ (lead, mercury, oil, ozone, smoke, traffic, etc.).

34 On the Macmillan website, Rundell provides a more detailed definition of collocation: “Collocations are ‘semi-preconstructed phrases’ which allow language users to express their ideas with maximum clarity and economy. Not only that, there is strong correlation between frequency in a corpus and typicality, which means that the use of common collocations contributes to the naturalness of a text.” See .
35 . For a recent description of the latest version of the tool, see Kilgarriff et al. (2014).
36 The OCD also includes a separate section of ‘phrases’, i.e. multiword units that are frequent in general language and pose comprehension problems to non-native speakers.

Since the OCD is a production dictionary, collocations tend to be listed under their bases (therefore, noun collocates are not provided for verb and adjective entries). They are ordered first by lemma senses and then by grammatical categories. Within grammatical categories, collocations follow an implicit semantic grouping criterion. Most collocations included are of the types established by Hausmann, plus some others, like verb + adjective (declare safe) and noun (+ preposition) + noun (party dinner, light of the moon), as well as some colligational patterns with prepositions (noun + preposition, at the conference; adjective + preposition, safe from), phrases (the din of battle) and phrasal verbs (take off). The base entries contain sense discriminators for collocational meaning and illustrative examples of the collocations in context (see the entry for dip in Table 8). British and American English variants are systematically provided. As a bonus, the CD-ROM version lists collocates under both constituents, and provides definitions and spoken pronunciation in British and American English for all lemmas.

Table 8: Entry for dip in the OCD.

dip verb
1 in liquid
   adv. lightly ◊ She dipped the brush lightly in the varnish. | quickly ◊ Quickly ~ the tomatoes in boiling water.
   prep. in, into ◊ He dipped his finger in the water.
2 go/move downwards
   adv. gently ◊ hills which ~ gently to the east | slowly ◊ The sun was slowly dipping out of sight. | steeply | down ◊ The road dipped steeply down into the town.
   prep. below ◊ The sun dipped below the horizon.
3 prices, support, etc.
   adv. slightly | sharply ◊ Support dipped sharply to 51%.
   prep. below ◊ when unemployment ~s below a certain point.

As stated in the front matter, the MCD is a production dictionary especially suited for upper-intermediate to advanced students of English preparing for IELTS37 exams, but it is also valuable in other academic or professional settings. This dictionary includes noun-based patterns (e.g., strong desire), verb-based patterns (e.g., deserve applause) and adjective-based patterns (e.g., become desirable)38. The patterns comprise Hausmann’s collocation types alongside less common ones like verb + preposition + noun (dressed to kill), noun + preposition + noun (room for improvement) and noun + and/or + noun (impartial and unbiased), colligations of bases with prepositions (benefit from) or infinitives (reasonable to), and phrasal verbs as headphrases.

Unlike most collocation dictionaries (including the BBI and the OCD), the MCD provides noun collocates under the entries for verbs and adjectives. This departs from the traditional practice of listing collocations only under their bases, a practice which goes back to Hausmann’s lexicographic postulates and has already been challenged by Siepmann (2005) and Walker (2009), among others.

Collocations have been selected according to frequency and statistical significance. Corpus-driven meaning by collocation seems to be the underlying theoretical assumption. All lemmas are defined, collocates serve as sense discriminators and, whenever possible, they are grouped into semantic sets (‘semantic groupings’), each of which is preceded by its own short definition (some of them à la Mel’čuk, but in extremely simplified forms or glosses). For instance, under the noun entry empire the pattern V + N features four semantic groupings which resemble combinatory LFs or the nullification and activation verbs in the BBI: (a) ‘increase the size of an empire’: expand, extend; (b) ‘establish an empire’: build, create, establish, forge, found; (c) ‘rule an empire’: control, govern, rule; and (d) ‘destroy an empire’: destroy, dismantle, overthrow. Finally, there are plenty of usage notes that provide colligations and alternatives to collocations, as well as real examples which show collocations in use (cf. the entry for implement in Table 9).
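The source does not state which statistical measure underlies the selection of collocations in the MCD. Purely as an illustration of what selection by frequency and statistical association can look like, the following Python sketch combines a minimum co-occurrence frequency with the logDice score, the association measure documented for Sketch Engine word sketches. The counts, the thresholds and the function names are invented for the example and are not taken from any of the dictionaries discussed.

```python
import math
from typing import Dict, List, Tuple


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: co-occurrence frequency of base and collocate
    f_x:  corpus frequency of the base
    f_y:  corpus frequency of the collocate
    """
    if f_xy <= 0:
        raise ValueError("co-occurrence frequency must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def select_collocates(
    base_freq: int,
    candidates: Dict[str, Tuple[int, int]],
    min_freq: int = 5,       # hypothetical frequency threshold
    min_score: float = 7.0,  # hypothetical salience threshold
) -> List[Tuple[str, float]]:
    """Keep candidates that pass both thresholds, ranked by score."""
    selected = []
    for collocate, (f_xy, f_y) in candidates.items():
        if f_xy < min_freq:
            continue  # too rare to be considered at all
        score = log_dice(f_xy, base_freq, f_y)
        if score >= min_score:
            selected.append((collocate, score))
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Invented counts for the base "wind": collocate -> (f_xy, f_y).
candidates = {
    "strong": (400, 20_000),   # frequent and strongly associated
    "the": (900, 6_000_000),   # frequent but weakly associated
    "purple": (2, 3_000),      # below the frequency threshold
}
ranked = select_collocates(base_freq=10_000, candidates=candidates)
print([collocate for collocate, _ in ranked])
```

With these invented figures only strong survives: the co-occurs with the base very often in absolute terms, but the association score penalises its enormous overall frequency, which is the reason a statistical measure is used alongside raw frequency.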

37 IELTS stands for International English Language Testing System.
38 See Coffey (2011: 333) for a summary of the main collocation patterns included in the MCD.


Table 9: Entry excerpt for implement in MCD.

implement V
make an idea, plan, system or law start to work
• adv + V
   successfully: correctly, effectively, efficiently, properly, rigorously, successfully
      We successfully implemented the scheme on 17 February 2009.
   completely: fully
      The planned changes have not yet been fully implemented.
   immediately: immediately, quickly, swiftly, with immediate effect, without delay
      The adjudicator’s decision is final and must be implemented immediately.
   over a particular area: locally, nationally, widely
      National contracts will be locally implemented.
   badly: badly, poorly
      Legislation which is poorly implemented is not acceptable.
Usage: Implement is usually passive in all of the adv + V combinations shown above: Their advertising campaign was badly implemented. • We are now working to ensure that the recommendations are effectively implemented.
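The semantic groupings in Table 9 can be thought of as a mapping from a short gloss to the collocates listed under it. The Python sketch below encodes the adv + V pattern of implement in that way and looks up the grouping a given collocate belongs to. The data are copied from the table; the variable and function names are hypothetical, and nothing here is meant to describe how the MCD is actually stored or compiled.

```python
from typing import Dict, List, Optional

# Gloss -> collocates, as listed under "adv + V" for implement in Table 9.
IMPLEMENT_ADV_V: Dict[str, List[str]] = {
    "successfully": ["correctly", "effectively", "efficiently",
                     "properly", "rigorously", "successfully"],
    "completely": ["fully"],
    "immediately": ["immediately", "quickly", "swiftly",
                    "with immediate effect", "without delay"],
    "over a particular area": ["locally", "nationally", "widely"],
    "badly": ["badly", "poorly"],
}


def grouping_of(collocate: str, groupings: Dict[str, List[str]]) -> Optional[str]:
    """Return the gloss of the grouping that lists the collocate, if any."""
    for gloss, collocates in groupings.items():
        if collocate in collocates:
            return gloss
    return None


print(grouping_of("swiftly", IMPLEMENT_ADV_V))  # immediately
print(grouping_of("slowly", IMPLEMENT_ADV_V))   # None
```

Read in this direction, the gloss works as a sense discriminator: a learner who wants to express ‘badly’ is led to badly and poorly, while a collocate that is not listed returns nothing.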

It is worth pointing out again that grouping collocates semantically is a characteristic feature of second-generation, corpus-driven dictionaries of collocations. As seen above, the OCD uses implicit semantic groupings (like most combinatorial dictionaries). Similarly, the LCDT is semantically enriched with lemma definitions and collocation glosses, as well as collocational groupings. The entries contain a lemma and its definition (or different senses, in the case of polysemous items and/or collocational meanings). Then, a list of collocations is provided after each sense or single definition. Collocations are grouped by part of speech, listed according to frequency, and followed by an illustrative example. Both lexical (or “true”) collocations and grammatical collocations are included. For instance, under loyalty we find lexical collocations with adjectives (great/deep/strong loyalty), with nouns (family/company/party loyalty, a loyalty scheme) and with verbs (show/demonstrate loyalty), as well as grammatical collocations with prepositions (loyalty to/towards sb/sth, out of loyalty) and common phrases (a show of loyalty, where your loyalties lie). Glosses are provided for collocations with metaphorical, figurative or specialised senses, e.g., fierce/intense loyalty (=very great), unswerving/unwavering loyalty (=never changing), divided/conflicting loyalties (=to more than one person or group, especially when this causes problems), brand loyalty (=by customers who always buy a particular make of product), swear/pledge loyalty (=promise to be loyal), etc.

For some keywords, a thesaurus is integrated in the entry in order to guide users towards the right synonym and appropriate collocations. For instance, under the entry for main we find a grammar note (“Main is always used before a noun, usually with the or our/my etc.”), noun collocations (the main part / aspect / feature, the main conclusion) and a thesaurus with quasi-synonyms (principal, chief, primary, core, central, prime and predominant). See Table 10.

Table 10: Entry excerpt for main in LCDT.

THESAURUS: main
reason | cause | aim | objective | problem | difference | argument
   most important. Principal and chief are more formal than main: Most people work for the same principal reason – in order to make money. | Cutting down trees was the chief cause of floods and landslides. | The principal aim of the research is to examine people’s attitudes to technology. | Their chief problem was lack of funds.
primary aim | objective | purpose | reason | function | role | concern | focus
   most important – used especially about the reason why you are doing something. Primary is more formal than main: The primary aim of the research is to find out more about the causes of the disease. | Our primary objective is to collect information about students’ performance. | The primary function of government is to represent the wishes of the people. | His health is our primary concern
core business | beliefs | values | principles | skills | subject | issue | area
   most important – used especially about the things that people pay most attention to: The company needs to focus on its core business. | One of our core values is freedom of choice. | Students receive help with English, maths, and other core skills.


Grammar and usage notes are provided for words that tend to cause problems for students. Particular usages, language varieties and levels of formality tend to be indicated by means of diasystematic labels, e.g., cease publication is marked as “formal”, a leisure centre is marked as British English (“BrE”), alternative spellings are provided (a gruelling schedule BrE versus a grueling schedule AmE), and so on. In addition, lemmas which are included in the Academic Word List (AWL) (Coxhead 2000) are labelled accordingly (“Ac”). The AWL includes relatively high-frequency words in academic texts.39 In fact, a feature that singles out the LCDT is the Academic Collocation List (ACL) that appears at the end of the dictionary. It contains the 2,469 most common academic collocations found within the Pearson International Corpus of Academic English, as exemplified by political arena, become blurred, (a) vast / wide array of, communicate effectively, (be) widely dispersed, etc.40 The on-line version includes the contents of the print edition plus some additional collocations, synonyms, British and American pronunciation of lemmas, and a ‘study centre’ page (subscription required). The printed version comes with a year’s subscription to the on-line version of the dictionary.

In the case of Spanish, there are only two corpus-driven collocation dictionaries: REDES. Diccionario combinatorio del español contemporáneo [REDES] (Bosque 2004) and its by-product, the Diccionario combinatorio práctico del español contemporáneo [PRÁCTICO] (Bosque 2006). Both are second-generation collocation dictionaries which include frequent and usual collocations and other selectional restrictions. They are based on a large corpus of 250 million words of newspaper texts in Latin American and European Spanish.

39 Both OCD and MCD also include a substantial proportion of lemmas that are present in the AWL (13.5% and 16.2% respectively, according to Coffey 2011: 336–338), although such lemmas are not explicitly marked, nor is a list of academic collocations provided, as in the case of LCDT.
40 However, microstructural information as regards levels of formality is not always accurate. Quite often readers are left to their own devices with contradictory or incomplete information. For instance, while the verb agree is to be found in the AWL and typical academic collocations are generally agree and strongly agree (cf. ACL), agree wholeheartedly is glossed as “(=agree completely – more formal)” under the entry for agree in the main part of the dictionary.


In the extensive front matter of REDES there is no explicit mention of the statistical measures selected, nor any additional information about the (implied) distributional model of linguistic analysis in place. Lexicographers seem to have been free to use corpus data to write the entries, extract illustrative examples (documented examples) and provide intuitive ones when necessary (undocumented examples). In the case of PRÁCTICO, lexicographers have written learner-oriented examples based on corpus data.

Contrary to mainstream lexicographic practice for collocational dictionaries, REDES provides the bases (usually nouns) under the collocate entries or, in this case, predicates (cf. MCD and the on-line version of LCDT). In line with second-generation, corpus-driven collocation dictionaries, bases are then grouped into semantic and conceptual classes. The rationale lies in the linguistic theory which underpins the dictionary. Collocates are conceived as predicates which select their arguments (the bases), in the same way as they select their actants and grammar patterns. Emphasis is laid on the notion of lexical classes, on the restrictions on lexical insertion and on the directionality of lexical selection, which goes beyond the traditional selection restrictions of phrase structure and case grammars.

En Redes se exploran las combinaciones frecuentes tomando como punto de partida los criterios semánticos que permiten agrupar conceptualmente estas voces, y muy especialmente las formas en las que los predicados restringen a sus argumentos.41 (Bosque 2004: lxxxvi).

41 ‘Redes explores frequent word combinations; the starting point is the semantic criteria which allow conceptual groupings of words, especially the way arguments are restricted by their predicates.’ (Our translation).

Unlike the English collocation dictionaries, REDES has not been designed primarily as a learner’s dictionary. It is in fact a sophisticated combinatorial dictionary of Spanish designed for both production and comprehension. The dictionary offers two main types of entries: (i) analytical entries (entradas analíticas) and (ii) abbreviated entries (entradas abreviadas). To put it in simple terms, analytical entries are for selecting words (bases) and abbreviated entries are for selected words (collocates or predicates) (Bosque 2004: xxxviii).

Analytical entries are long entries that contain the lemma (word or sequence of words), its grammatical category and a subentry. The subentry provides general combinatory information, the different senses of polysemous words and the list of arguments (words or sequences of words) ordered by frequency. Frequency of argument words is indicated by means of diacritic marks: ++ (very frequent combination), + (fairly frequent combination), –– (relatively frequent). No mark indicates a hapax legomenon which the lexicographer has nevertheless considered to be a usual combination (e.g., optimismo ciego). Argument words which encompass different semantic groups are further divided into lexical classes (indicated by capital letters). This type of semantic ordering provides a clear and informative picture of the figurative semantic extensions of the lemmas or predicates. Finally, cross-references to other semantically close words are given at the end of the entry (see the excerpt of the entry for ciego, ‘blind’, in Table 11).42

Table 11: Excerpt of analytic entry for ciego in REDES.

ciego adj. ▌Se aplica a las personas y a los animales. En el sentido de ‘obstruido, sin aberturas o cegado’ acepta sustantivos que denotan canal, orificio o cavidad (conducto, canal, tubo, agujero, orificio, pozo) y otros que designan algunos elementos arquitectónicos que pueden carecer de abertura (arco, vano, pared, muro, ventana, puerta). En el sentido de ‘radical, cerrado a razones’ se combina con sustantivos que designan creencias, ideologías o posturas que se consideran extremas (conservadurismo, nacionalismo, fanatismo, intolerancia). Se combina asimismo con…

A. sustantivos que denotan convicción o seguridad en algo, especialmente en que el futuro sea favorable.
   1 fe ++: El deseo de modernidad lleva a una fe ciega en la educación experimental. rel011096.
   2 confianza ++: En ambos he encontrado una confianza ciega hacia mí y lo que yo puedo contribuir en el museo. epe1312201.
   3 esperanza +: …y siempre a la caza de una esperanza ciega y furtiva. epe290699
   4 optimismo: …viene a confirmar que los temores de las fuerzas políticas de oposición que critican el «optimismo ciego» de las autoridades. eme280996.
B. sustantivos que denotan inclinación, a menudo vehemente, hacia algo o alguien:
   5 pasión 6 amor + 7 admiración + 8 reconocimiento 9 entusiasmo.
C. sustantivos que denotan solidaridad, adhesión o defensa de alguna persona o alguna causa:
   10 fidelidad + 11 lealtad 12 apoyo 13 respaldo 14 adhesión 15 apología -
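The numbered arguments and frequency marks in Table 11 follow a regular pattern (position, argument word, optional mark), so they can be read mechanically. The following Python sketch parses items of the form "6 amor +" and attaches the label given in the text for each mark. It is only an illustration of the notation: the names are hypothetical, the items are typed in by hand from the table, and the example sentences and source codes that follow some items are not handled.

```python
import re
from typing import NamedTuple, Optional

# Labels as glossed in the text above; unmarked items have no entry here.
FREQUENCY_MARKS = {
    "++": "very frequent combination",
    "+": "fairly frequent combination",
    "––": "relatively frequent combination",
}

# position, argument word, optional frequency mark
_ITEM = re.compile(r"^(\d+)\s+(.+?)(?:\s+(\+\+|\+|––))?$")


class Argument(NamedTuple):
    position: int
    word: str
    mark: Optional[str]


def parse_argument(item: str) -> Argument:
    """Parse an item such as '1 fe ++' or '4 optimismo'."""
    match = _ITEM.match(item.strip())
    if match is None:
        raise ValueError(f"not a numbered argument: {item!r}")
    return Argument(int(match.group(1)), match.group(2), match.group(3))


def frequency_label(mark: Optional[str]) -> str:
    """Spell out a frequency mark; unmarked items get a fallback label."""
    return FREQUENCY_MARKS.get(mark, "no mark")


for raw in ["1 fe ++", "3 esperanza +", "4 optimismo", "6 amor +"]:
    argument = parse_argument(raw)
    print(argument.position, argument.word, frequency_label(argument.mark))
```

Keeping the mark separate from its label means that the unmarked case (optimismo ciego in the table) stays distinguishable from the three marked degrees of frequency.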

42 Due to lack of space, examples have been included only for the first lexical class (A). Classes D, E and F have been omitted altogether.


Abbreviated entries are generated automatically from analytical entries. Lemmas are not listed twice: they appear either in abbreviated entries or in analytical entries. Abbreviated entries list collocates, i.e. arguments under predicates and vice versa. As in most collocational dictionaries, abbreviated entries provide just a list of words which combine frequently with the lemma, ordered first by senses (with glosses when necessary), and then by grammatical categories. There are five types of abbreviated entries in REDES: (i) cross-references to terms (referencias cruzadas a las voces), (ii) cross-references to concepts (referencias cruzadas a los conceptos), (iii) entries of the conceptual index (entradas del índice conceptual), (iv) abbreviated series (series abreviadas) and (v) cross-references (remisiones). Words with superscripts cross-refer to the corresponding analytic entries where they are listed and to their position in them. For instance, in Table 12 (left column), “cándido25” indicates that fe is in position number 25 under the entry for cándido (‘innocent, naïve’). Cross-references to concepts are marked by superscript letters which identify lexical classes in the analytical entries, as can be seen for muestra (‘sample’) in Table 12. Finally, cross-references (in square brackets) are used to refer to a phrase (collocation or idiom) which is listed in an unexpected alphabetical order, as in the entry for largo (‘long’): “[largo] → de largo, de tiros largos, largo y tendido.”


Table 12: Abbreviated entry for fe and conceptual entry for muestra in REDES.

fe
♦ absoluto, apasionado, ardiente, cándido25, ciego1, del carbonero, entusiasta, escaso, ferviente54, fervoroso, ilimitado, imperecedero, inalterable, incondicional15, inconfesado, infundado, inquebrantable8, profundo, total, vivo
♦ sin menoscabo (de)11
♦ abjurar (de)1, abrazar, abrigar18, adherirse (a)7, afianzar(se)52, alcanzar, alimentar57, arraigar (en alguien), brotar, compartir, conservar35, cuatear(se), dar197, defender, defraudar11, difundir(se)115, erosionar, extinguir(se)46, fortalecer(se)39, infundir27, inspirar, irradiar8, manifestar, mantener, mellar(se)2, minar, perder, perseverar (en)18, predicar6, profesar2, quebrantar42, quebrar(se), reavivar25, recobrar, recuperar, refugiar(se), renovar, socavar50, tener, traicionar.
□ Véase también: buena fe, certeza, confianza, creencia, seguridad.

muestra
♦ (sustantivos) Véase: aleccionadorD, a título deA, cúmulo (de)H, descarnadoC, detectara, efusivoE, itineranteA, llamativoD, palpableA, prodigarA, reveladora, rotundoG
♦ (verbos) Véase: a lo lejosB, de refilónG, elocuentementeB, profusamenteE
□ Véase también: demostración, prueba.
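The numeric cross-references in the abbreviated entry for fe can be separated from their lemmas with a few lines of code. The Python sketch below assumes the flattened notation used in Table 12, where the superscript is printed as trailing digits (cándido25 for cándido, position 25), and builds an index from each cross-referenced lemma to its position in the corresponding analytical entry. The function names are hypothetical, and the letter superscripts of the conceptual entries are deliberately left out, since in plain text they cannot be told apart reliably from the last letter of a lemma.

```python
import re
from typing import Dict, Optional, Tuple


def parse_cross_reference(token: str) -> Tuple[str, Optional[int]]:
    """Split 'cándido25' into ('cándido', 25); 'absoluto' gives ('absoluto', None)."""
    match = re.fullmatch(r"(?P<lemma>.*?\D)(?P<position>\d+)?", token.strip())
    if match is None:
        raise ValueError(f"cannot parse token: {token!r}")
    position = match.group("position")
    return match.group("lemma"), (int(position) if position else None)


def position_index(entry: str) -> Dict[str, int]:
    """Map each cross-referenced lemma in a comma-separated list to its position."""
    index: Dict[str, int] = {}
    for token in entry.split(","):
        lemma, position = parse_cross_reference(token)
        if position is not None:
            index[lemma] = position
    return index


# First items of the adjective list for fe in Table 12.
adjectives = "absoluto, apasionado, ardiente, cándido25, ciego1, del carbonero"
refs = position_index(adjectives)
print(refs)  # {'cándido': 25, 'ciego': 1}
```

The resulting index can be read in the direction described in the text: fe is to be found in position 25 under cándido and in position 1 under ciego, which agrees with the analytic entry for ciego in Table 11.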

It is worth pointing out the similarities of the approach in REDES with the systemic concept of meaning by collocation and the Sinclairean notion of extended unit of meaning (cf. section 3). First, lemma senses are divided according to lexical classes or semantic groupings. Secondly, both predicates and arguments are in fact cores, i.e. a word or words which are invariable in their components, for instance the lemmas correo (‘post’) and correo electrónico (‘e-mail’). Lemmas make explicit selection of their valency in a way similar to colligation or grammatical collocations (cf. the lemma consumirse (de)). Frequent and/or habitual word combinations of predicates and arguments can be considered collocations (collocates and nodes). Thirdly, predicates which share common semantic features form lexical classes with selection restrictions, in the same way that semantic preference designates restriction of regular co-occurrence to words which share a common semantic feature. For instance, ciego collocates with words which denote ‘feelings (and actions out) of hatred, wrath, aggressiveness, etc.’ (semantic class D), whose actual instances could also be considered collocates (odio, violencia, ira, enojo, furia, furor, desesperación and temor). And finally, the semantic prosody (overall communicative function) of ciego for semantic class D would be that of ‘in a high degree, usually in a negative sense’ (cf. LF Magn).

PRÁCTICO is a conceptually simplified version of REDES with a more straightforward pedagogic orientation. The analytic and abbreviated entries in REDES have been merged into simple entries (entradas simples) in PRÁCTICO. Lemmas are again words or sequences of words (e.g. fuego, ‘fire’, and alto el fuego, ‘ceasefire’), either taken from REDES or newly added. There is no explicit reference to semantic classes, although they form the basis for the implicit semantic ordering within the entries, which is marked by the symbol ǁ (see oponerse, ‘oppose’, Table 13). As a novelty, simple entries include a section for phrases and usage notes, as well as definitions when needed (see pincho, ‘bite’, Table 13). Generic entries (entradas genéricas) are another new feature of PRÁCTICO. They serve the purpose of placing together in one single entry the combinatory properties shared by several lemmas which belong to the same lexical field. Lemmas in generic entries appear in capital letters (cf. the entries for día, mes, religión or título nobiliario). As in REDES, there are also cross-references (remisiones) with identical purpose, cf. “[libro] → como un libro abierto; libro”.


Table 13: Simple entries for oponerse and pincho in PRÁCTICO.

oponerse v.
● con advs. absolutamente • totalmente • por activa y por pasiva Nos hemos opuesto por activa y por pasiva a ese descabellado plan • por completo • diametralmente • de plano • radicalmente • rotundamente • en redondo Mis padres se oponían en redondo a que saliera con ella • terminantemente • a toda costa • categóricamente • con firmeza • con rotundidad • duramente • con todas {mis tus sus} fuerzas Los trabajadores se opusieron con todas sus fuerzas al cierre de la fábrica • decididamente • enérgicamente • firmemente • férreamente • drásticamente • vigorosamente ǁ abiertamente • frontalmente • lisa y llanamente • clamorosamente ǁ ardientemente • fervientemente • visceralmente • abruptamente ǁ con matices • moderadamente ǁ por sistema • sin fundamento ǁ por escrito • verbalmente • oficialmente

pincho s.m. ▌[trozo de comida que se toma de aperitivo]
● con adjs. de tortilla A media mañana siempre toma un pincho de tortilla
● con susts. bar (de) ǁ merluza (de) • pescadilla (de)
● con vbos. comer • tomar
□ expresiones pincho moruno [carne troceada y ensartada en una varilla]

5. Conclusion

In this paper we (i) presented an up-to-date survey of available collocation dictionaries for English and Spanish, (ii) provided an overview of the underlying approaches to collocation in those dictionaries, and (iii) put forward a tentative classification based on the degree of corpus involvement: standard, corpus-based and corpus-driven collocation dictionaries.

The semantic approach to collocations is represented by Hausmann’s theories of grammar patterning and semantic dependency within a collocation (base/collocate), as well as Mel’čuk’s inventory of lexical functions that formalise this semantic dependency in combinatory dictionaries. The distributional approach is also essentially semantic. It was initiated by neo-Firthian linguistics and the contextual theory of meaning. British Contextualism equates ‘meaning’ with meaning by collocation. Corpus-driven research is the path to this kind of semantic discovery. Digging for meaning presupposes an underlying extraction method based on frequency, quantitative distribution and statistical significance.

Standard dictionaries of collocations do not resort to corpora. Instead, they rely solely on lexicographers’ intuitions to select and present combinatory restrictions or preferences. They are production dictionaries that follow the traditional lexicographic practice of presenting collocates under their bases (SEC, EAC, BBI, LTP, DE).

With the advent of corpus linguistics and corpus-based lexicography, collocation dictionaries have gradually started to exploit corpora. Some standard dictionaries have appeared in new editions where corpora are used in order to refine lexicographers’ intuition, check data and find suitable examples of use (e.g., the third edition of the BBI); alternatively, new combinatory dictionaries have been designed on linguistic principles but resort to corpora in order to illustrate the usage of collocations, as in the case of the Spanish DICE. Those dictionaries are corpus-based and represent a compromise solution between lexicographers’ introspection and corpus-enhanced linguistic analysis.

A step further is represented by corpus-driven collocation dictionaries. The first inductive collocation dictionaries were strongly influenced by radical distributional approaches and corpus methodology, and were akin to mere corpus tools. This is the case of the English DEC and CCEC, which are both first-generation corpus-driven collocation dictionaries.

The analysis of large amounts of data allows for pattern generalisation and abstraction. Second-generation corpus-driven collocation dictionaries use corpora to unveil combinatory restrictions and detect (new types of) lexical patterns. This type of dictionary provides comprehensive collocational coverage and frequency of use, as well as semantic enhancement of entries (glosses, groupings, collocational sets, collocational thesaurus, diasystematic information, etc.). This feature turns them into both production and decoding tools. Most of them have a pedagogic orientation or can be used as learner dictionaries (OCD, MCD, LCDT and PRÁCTICO), although REDES is also a robust tool for linguistic analysis.

Second-generation corpus-driven collocation dictionaries have reached a high degree of sophistication. There is, though, some room for improvement, such as systematic information on levels of formality, registers, diatopic variation and so forth. Especially in the case of electronic versions, it would be desirable to offer multiple ways of access to data, include links to the actual corpora, or provide users with modular dictionaries that could be “customised” according to their needs.

Finally, second-generation corpus-driven collocation dictionaries that exist for both languages could serve as a valid starting point for compiling a bilingual dictionary of collocations, all the more so since most of them appear to have used the same software (Sketch Engine) at some point. Bilingual word sketches could provide invaluable additional data. The time is definitely ripe.

Acknowledgements

The research presented in this paper has been partially carried out in the framework of research projects Expert (317471-FP7-PEOPLE-2012-ITN), Inteliterm (FFI2012-38881) and Termitur (HUM2754).

Bibliography Monographs and articles Alexandrova, Olga / Ter-Minasova, Svetlana 1987. English Syntax (Collo cation, Colligation and Discourse). Moscow: University of Moscow.

Collocation dictionaries for English and Spanish: the state of the art

205

Alonso Ramos, Margarita 1993. Las funciones léxicas en el modelo lexicográfico de I. Mel’čuk. PhD Dissertation. Madrid: UNED.
Alonso Ramos, Margarita 2003. Hacia un Diccionario de colocaciones del español y su codificación. Lexicografía computacional y semántica. 63, 11–31.
Alonso Ramos, Margarita 2010. No importa si la llamas o no colocación, descríbela. In Mellado, Carmen et al. (eds) Nuevas perspectivas de la fraseología del S. XXI. Berlin: Frank & Timme, 55–80.
Barnbrook, Geoff / Mason, Oliver / Krishnamurthy, Ramesh 2013. Collocation: Applications and Implications. Basingstoke: Palgrave Macmillan.
Bartsch, Sabine 2004. Structural and functional properties of collocations in English. A corpus study of lexical and pragmatic constraints on lexical co-occurrence. Tübingen: Narr.
Barrios Rodríguez, María A. 2007. Diccionarios combinatorios del español: diferencias y semejanzas entre Redes y Práctico. RedELE. Revista Electrónica de Didáctica del Español como Lengua Extranjera. 11, 1–14.
Benson, Morton 1999. The Structure of the Collocational Dictionary. International Journal of Lexicography. 2/1, 1–3.
Buendía Castro, Míriam / Faber, Pamela 2014. Collocation Dictionaries: a Comparative Analysis. MonTI. 6, 203–235.
Coffey, Stephen 2011. A New Pedagogical Dictionary of English Collocations. International Journal of Lexicography. 24/3, 328–341.
Corpas Pastor, Gloria 2001. Corrientes actuales de la investigación fraseológica en Europa. Euskera. 46/1, 21–49.
Corpas Pastor, Gloria 2013. Detección, descripción y contraste de las unidades fraseológicas mediante tecnologías lingüísticas. In Olza, Inés / Manero, Elvira (eds) Fraseopragmática. Berlin: Frank & Timme, 335–373.
Corpas Pastor, Gloria 2015a (in press). Register-specific Collocational Constructions in English and Spanish: a Usage-based Approach. Journal of Social Sciences. 11 (3). 139-151 ISSN: 1549-3652 (print) / ISSN 1558-6987 (on-line). http://thescipub.com/abstract/10.3844/jssp.2015.139.151
Corpas Pastor, Gloria 2015b. Translating English Verbal Collocations into Spanish: on Distribution and other Relevant Differences related to Diatopic Variation. Lingvisticae Investigationes. 38:2 (2015), 229–262. Special Issue "Spanish Phraseology. Varieties and variations" (ed. P. Mogorrón Huerta). DOI 10.1075/li.38.2.03cor.
Corpas Pastor, Gloria 2016 (in press). Collocations in e-Bilingual Dictionaries: from Underlying Theoretical Assumptions to Practical Lexicography and Translation Issues. In Torner, Sergi / Bernal, Elisenda (eds) Collocations and other lexical combinations in Spanish. Theoretical and Applied Approaches. Chicago, IL: Ohio State University Press (“Theoretical Developments in Hispanic Linguistics”).
Cowie, Anthony P. 1981. The Treatment of Collocations and Idioms in Learner’s Dictionaries. Applied Linguistics. 2/3, 223–235.
Coxhead, Averil 2000. A New Academic Word List. TESOL Quarterly. 34/2, 213–238.
Ferrando, Verónica 2013. El tratamiento de las colocaciones en la lexicografía española y alemana: estudio contrastivo. Revista Internacional de Lenguas Extranjeras. 2, 31–53.
Firth, John Rupert 1957. Papers in Linguistics 1934–1951. London: Oxford University Press.
Firth, John Rupert 1968. Linguistic Analysis as a Study of Meaning. In Palmer, Frank Robert (ed.) Selected Papers of J. R. Firth 1952–59. London/Harlow: Longmans, 12–26.
Gelbukh, Alexander / Kolesnikova, Olga 2013. Semantic Analysis of Verbal Collocations with Lexical Functions. Springer (“Studies in Computational Intelligence”, vol. 414).
Greenbaum, Sidney 1970. Verb-intensifier collocations in English: an Experimental Approach. London: Longman.
Greenbaum, Sidney 1974. Some verb-intensifier collocations in American and British English. American Speech. 49/1–2, 79–89.
Greenbaum, Sidney 1988. Good English and the Grammarian. London: Longman.
Halliday, M.A.K. 1966. Lexis as a Linguistic Level. In Bazell, Charles E. et al. (eds) In Memory of John Firth. London: Longman, 148–62.
Hanks, Patrick 2013. Lexical Analysis. Norms and Exploitations. Cambridge, MA/London: The MIT Press.
Hausmann, Franz Josef 1979. Un dictionnaire des collocations est-il possible? Travaux de Linguistique et de Littérature. 17/1, 187–195.
Hausmann, Franz Josef 1985. Kollokationen im deutschen Wörterbuch: Ein Beitrag zur Theorie des lexikographischen Beispiels. In Bergenholtz, Herbert / Mugdan, Joachim (eds) Lexikographie und Grammatik: Akten des Essener Kolloquiums zur Grammatik im Wörterbuch 28.–30.6.1984. Tübingen: Max Niemeyer, 118–129 (“Lexicographica Series Maior”, vol. 3).
Hausmann, Franz Josef 1989. Le dictionnaire de collocations. In Hausmann, Franz Josef et al. (eds) Wörterbücher. Dictionaries. Dictionnaires. Ein internationales Handbuch zur Lexikographie. An International Encyclopedia of Lexicography. Encyclopédie internationale de lexicographie. Vol. I. Berlin/New York: Walter de Gruyter, 1000–1019.
Hausmann, Franz Josef 1991. Collocations in monolingual and bilingual English dictionaries. In Ivir, Vladimir / Kalogjera, Damir (eds) Languages in Contact and Contrast. Essays in Contact Linguistics. Berlin/New York: de Gruyter, 225–236.
Hausmann, Franz Josef 1998. O diccionario de colocacións. Criterios de organización. In Ferro Ruibal, Xesús (ed.) Actas do I Coloquio Galego de Fraseoloxía, 15–18 de septiembre de 1997. Centro de Investigacións Lingüísticas y Literarias Ramón Piñeiro: Xunta de Galicia, 63–81.
Hausmann, Franz Josef 2007. Die Kollokationen im Rahmen der Phraseologie – Systematische und historische Darstellung. Zeitschrift für Anglistik und Amerikanistik. 55, 217–234.
Hoey, Michael 2006 [2005]. Lexical Priming. London/New York: Routledge.
Hundt, Marianne / Nesselhauf, Nadja / Biewer, Carolin (eds) 2007. Corpus Linguistics and the Web. Amsterdam/New York: Rodopi.
Jones, Susan / Sinclair, John M. 1974. English Lexical Collocations. A Study in Computational Linguistics. Cahiers de Lexicologie. 24, 15–61.
Kilgarriff, Adam et al. 2004. The Sketch Engine. In Proceeding from the Euralex Conference. Lorient, France, 105–116.
Kilgarriff, Adam et al. 2014. The Sketch Engine: ten years on. Lexicography: Journal of ASIALEX. 1/1, 7–36.
McEnery, Tony / Hardie, Andrew 2012. Corpus Linguistics: Method, Theory and Practice. Cambridge: Cambridge University Press.
McGee, Iain 2012. Collocation Dictionaries as Inductive Learning Resources in Data-driven Learning: an Analysis and Evaluation. International Journal of Lexicography. 25/3, 319–361.
Mel’čuk, Igor 1973. Towards a Linguistic ‘Meaning Text’ Model. In Kiefer, Ferenc (ed.) Trends in Soviet Theoretical Linguistics. Vol. XVIII. Dordrecht: Reidel, 33–57.
Mel’čuk, Igor 1996. A Tool for the Description of Lexical Relations in the Lexicon. In Wanner, Leo (ed.) Lexical Functions in Lexicography and Natural Language Processing. Amsterdam/Philadelphia: John Benjamins, 37–102.
Mel’čuk, Igor 1988. Collocations and Lexical Functions. In Cowie, Anthony P. (ed.) Phraseology. Theory, Analysis, and Applications. Oxford: Clarendon Press, 23–53.
Mel’čuk, Igor / Clas, André / Polguère, Alain 1995. Introduction à la lexicologie explicative et combinatoire. Louvain-la-Neuve: Duculot/Aupelf-UREF.
Mel’čuk, Igor / Pertsov, Nikolai V. 1987. Surface Syntax of English: a formal model within the meaning-text framework. Amsterdam: John Benjamins.
Schäfer, Roland / Bildhauer, Felix 2013. Web Corpus Construction. San Francisco: Morgan & Claypool (“Synthesis Lectures on Human Language Technologies”).
Siepmann, Dirk 2005. Collocation, Colligation and Encoding Dictionaries. Part I: Lexicological Aspects. International Journal of Lexicography. 18/4, 409–443.
Sinclair, John M. 1966. Beginning the study of lexis. In Bazell, Charles E. et al. (eds) In Memory of John Firth. London: Longman, 410–430.
Sinclair, John M. 1996. The Search for Units of Meaning. TEXTUS. 9, 75–106.
Sinclair, John M. 1991. Corpus, Concordance, Collocation. Oxford/New York: Oxford University Press.
Tognini Bonelli, Elena 1991. Corpus Linguistics at Work. Amsterdam: John Benjamins.
Vincze, Orsolya / Mosqueira, Estela / Alonso Ramos, Margarita 2011. An Online Collocation Dictionary of Spanish. In Boguslavsky, Igor / Wanner, Leo (eds) Proceedings of the 5th International Conference on Meaning-Text Theory. Barcelona, 275–286.
Walker, Crayton 2009. Dictionaries, Collocational Dictionaries and Dictionaries of Business English. International Journal of Lexicography. 22/3, 281–299.
Wanner, Leo 2007. Selected Lexical and Grammatical Issues in the Meaning-Text Theory: In Honour of Igor Mel’čuk. Amsterdam/Philadelphia: John Benjamins.
Zuluaga Ospina, Alberto 1980. Introducción al estudio de las expresiones fijas. Frankfurt-am-Main/Bern/Cirencester: Peter Lang (“Studia Romanica et Linguistica”, vol. 10).

Dictionaries

BBI (1986) = Benson, Morton / Benson, Evelyn / Ilson, Robert 1986. The BBI Dictionary of English Word Combinations. 1st edition. Amsterdam/Philadelphia: John Benjamins.
BBI (1997) = Benson, Morton / Benson, Evelyn / Ilson, Robert 1997. The BBI Dictionary of English Word Combinations. 2nd edition. Amsterdam/Philadelphia: John Benjamins.
BBI (2009) = Benson, Morton / Benson, Evelyn / Ilson, Robert 2009. The BBI Dictionary of English Word Combinations. 3rd edition. Amsterdam/Philadelphia: John Benjamins.
CCEC = Sinclair, John 1995. The Collins Cobuild − English Collocations on CD-ROM. London/Glasgow: HarperCollins.
DE = Boneu, Javier 2001. Diccionario Euléxico para expresarse con estilo y rigor. Barcelona: Ed. Juventud.
DEC = Kjellmer, Göran (ed.) 1994. A Dictionary of English Collocations. Based on the Brown Corpus. 3 vols. Oxford: Clarendon Press.
DECFC = Mel’čuk, Igor et al. 1984–1999. Dictionnaire explicatif et combinatoire du français contemporain (DECFC). Recherches lexico-sémantiques I, II, III, IV. Montréal: Les Presses de l’Université de Montréal.
DICE = Alonso Ramos, Margarita (ed.) 2004. Diccionario de Colocaciones del Español [on line]. La Coruña: Universidad de La Coruña.
EAC = Kozłowska, Christian Douglas 1991. English Adverbial Collocations. Warsaw: Wydawnictwo Naukowe.
LCDT = Mayor, Michael 2013. Longman Collocations Dictionary and Thesaurus. Pearson Longman.
LTP = Hill, Jimmie / Lewis, Michael (eds) 1997. LTP Dictionary of Selected Collocations. Hove: Language Teaching Publications.
MCD = Rundell, Michael 2010. Macmillan Collocations Dictionary for Learners of English. Oxford: Macmillan Publishers Limited.
OCD (2002) = Crowther, Jonathan / Lea, Diana / Dignen, Sheila 2002. Oxford Collocations Dictionary for Students of English. Oxford: Oxford University Press.
OCD (2009) = McIntosh, Colin / Francis, Ben / Poole, Richard 2009. Oxford Collocations Dictionary for Students of English: A corpus-based dictionary with CD-ROM which shows the most frequently used word combinations in British and American English. Oxford: Oxford University Press.
ODCIE = Cowie, Anthony P. / Mackin, Ronald / McCaig, Isabel R. 1983. Oxford Dictionary of Contemporary English, vol. II. Oxford: Oxford University Press.
ODEI = Cowie, Anthony P. / Mackin, Ronald / McCaig, Isabel R. 1993. Oxford Dictionary of English Idioms. Oxford: Oxford University Press.
PRÁCTICO = Bosque, Ignacio (dir.) 2006. Diccionario combinatorio práctico del español contemporáneo. Madrid: Ediciones SM.
REDES = Bosque, Ignacio (dir.) 2004. Redes. Diccionario combinatorio del español contemporáneo. Madrid: Ediciones SM.
SDC = Benson, Morton / Benson, Evelyn / Ilson, Robert 1999. Student’s Dictionary of Collocations. Berlin: Cornelsen.
SEC = Dzierzanowska, Halina / Kozłowska, Christian Douglas 1982. Selected English Collocations. Warsaw: Panstwowe Wydawnictwo Naukowe.

Laura Giacomini

Defining collocations for lexicographic purposes. A matter of boundaries and arrangement

Abstract: This paper investigates the role that a pre-definition of the concept of collocation may play in the process of compiling a special dictionary focussing on this linguistic phenomenon. The issue is explored with reference to existing, well-known collocation dictionaries in different languages, and in particular to the model of an electronic dictionary of Italian collocations described in Giacomini (2012). After briefly surveying the mainstream definitions of collocation and their impact on applied linguistics in general, the paper concentrates on the steps of the lexicographic process (identification of the dictionary object and function according to the ideal user, corpus-based collocation selection and macrostructural arrangement, and determination of microstructural properties) to assess the extent to which each of them may require the immediate availability of a clear-cut concept of collocation. On the basis of these considerations, it can be concluded that a coherent definition of collocation is mostly required at the stage of collocate selection, and that this definition, while possibly grounded in extant theoretical assumptions, should be praxis-oriented and functional to the user’s needs and the targeted usage situations.

Keywords: collocation, definition of collocation, functional definition

1. Introduction

As is the case with other linguistic phenomena, especially those with a recent history of theoretical examination, collocations have been pinned down in several definitions that, however plausible and well grounded in theory they may be, can at times collide with the practical needs of applied studies. Lexicography is probably the field of applied linguistics that has most promptly and vigorously reacted to the theoretical input provided by the systematic study of collocations since the 1950s, striving to make the phenomenon of collocation available to the wider


community of language end users. Several definitions of collocation have been formulated since Firth’s vivid image of words keeping each other company. However, lexicography does not seem to have openly and entirely committed to any single one of them. Dictionary introductions and usage guides primarily address the language learner and have always set great value upon a comprehensible, user-friendly explication of the dictionary object, introducing collocations as, for instance, “grammatical and lexical recurrent word combinations” (BBI: vii), “fixed, identifiable, non-idiomatic phrases and constructions” (BBI: ix), “the way words combine in a language to produce natural-sounding speech and writing” (OCD: vii), “combinaisons de mots […] plus probables, plus naturelles que d’autres” (RCM: v), “words [which] typically combine with each other […], to form natural-sounding chunks of language” (MCD: ix) or “words which seek each other out and gather together to form a coherent set” (DCI: v). Yet, given its purpose and form, the user-oriented explication is not necessarily meant to fully reproduce, in its content-related and formal implications, the definition that substantiates the actual elaboration of lexicographic data.

This paper discusses the role of theoretical, often deductively originated definitions of collocations in the practical creation of a lexicographic resource. This issue will be explored with reference to existing collocation dictionaries in different languages, in particular to the model of an electronic dictionary of Italian collocations described in Giacomini (2012). However, many of the considerations made here can also be applied to general monolingual and bilingual dictionaries providing basic or extensive coverage of this linguistic phenomenon.

Section 2 is devoted to the identification of the relation between the mainstream definitions of collocation and the subfields of applied linguistics.
Section 3 ushers in the treatment of collocations in lexicography and analyses the role a definition of collocation may play at different stages of the lexicographic process: the characterisation of the dictionary user, object and function (section 3.1), the corpus-based data selection with initial considerations on a working definition of collocation (section 3.2) and the microstructural data representation (section 3.3). Section 4 provides the conclusions of the investigation, answering the key question concerning the role of a definition of collocation in lexicography and,


from this perspective, suggesting a summary assessment of state-of-the-art collocation dictionaries.

2. Collocations in applied linguistics: handling a multifaceted phenomenon

According to most publications which provide a detailed account of the different theoretical approaches to collocation (Klotz 2000; Bartsch 2004; Jehle 2007; Bergenholtz 2008; Fuertes-Olivera et al. 2012 inter alia), widespread definitions of collocation essentially arose from investigations in different linguistic subdisciplines and followed distinct paths that only at times met. The attention paid for the first time to meaningful syntagmatic relations by important forerunners like Porzig in the 1930s and Coseriu in the 1960s, which is not regarded as a theory on collocability in its current sense, moved much later into the path of lexical semantics and phraseology, with special attention paid to foreign language acquisition and teaching. The essential concept of collocation developed by scholars like Benson, Cowie and Hausmann shaped this theoretical current and was largely informed by the practical need to introduce the newly identified phenomenon to language learners and, more generally, dictionary users. These significance-oriented approaches are largely distinguished from the views of authors like Halliday and Hasan, who analysed the potential of collocation on the textual level (cf. Jehle 2007: 9–38). A wider approach to collocation is the one proposed by Mel’čuk in the Meaning-Text Theory, which accounts for deep semantic, syntactic and phonological associations between words building syntagmatic or paradigmatic relations that can be interpreted by means of restricted sets of language-independent Lexical Functions (Mel’čuk 1996). This approach aims at lexicographic description through explanatory combinatory dictionaries, but its practical reception has so far been quite limited: a good example of the lexicographic application of Lexical Functions is the online DICE, Diccionario de colocaciones del Español, developed at the Universidade da Coruña (still under construction).


The other mainstream theoretical perspective on collocations, though, is statistically oriented, and takes its inspiration from the British Contextualism of Firth, generally considered the pioneer of the concept of collocation, even though he never built a systematic theory around the concept. Undoubtedly, Firth’s idea of a “meaning by collocation”, i.e. that part of the meaning of a word is given by its habitual collocation with co-occurring items (Firth 1957: 195), contributed towards development of a semantic-based approach. However, his view of sets of words (not necessarily two, and not necessarily lexical words) habitually accompanying each other in syntagmatic chains substantially laid the foundations of the successful statistical (or, subsequently, computational) tradition in collocation studies initiated and championed by Sinclair and Kjellmer, who target a formal and quantitative description of recurring word combinations by observing the statistical behaviour of a given node within a certain collocational span. The computational path is closely related to rapid and exciting developments in corpus linguistics, which offer the opportunity of thoroughly investigating collocation in real language usage. This discipline, which has flourished in the wake of the rising interest in automated data extraction, has actually been able to account for both major approaches, the statistically- and the significance-oriented, not least due to the growing functionality of existing corpus query systems. The most common approach to collocation in modern corpus linguistics is well described in McEnery/Hardie (2012: 123) as follows: “most linguists today agree that the only way to reliably identify the collocates of a given word or phrase is to study patterns of co-occurrence in a text corpus”. 
As a consequence, a collocation is defined as a “co-occurrence pattern observed in corpus data”, or, in the summarising words of Fellbaum (2007: 8), “the regular and statistically discernible co-occurrence of words that can be expressed quantitatively”. The practice of introspection still adopted by lexicographers and other linguists to assess the collocational relevance of extracted word combinations is looked at with a certain amount of scepticism. While disagreeing with this opinion, at least as far as it does not differentiate between different goals of corpus analysis, this paper should instead stress the considerable weight that introspection has in the process of lexicographic data selection: acting in the


interests of the dictionary user, the lexicographer needs to evaluate and filter automatically extracted candidates, not only to identify collocations per se but also the most appropriate collocations for the intended usage situations (cf. section 3.2.1). Although significance-oriented approaches to the theory of collocation, with their special attention paid to phraseology, have been developed in a close relationship with the fields of foreign language acquisition/teaching and lexicography, the practice of applied linguistics reveals a general difficulty in committing to a single theoretical approach. Lexicography, as well as terminography and translation studies, seemingly benefit from hybrid approaches to collocations since they have to come to terms with extremely concrete needs, such as the lexicographic representation addressing a particular user group, data and access routes (Fuertes-Olivera et al. 2012: 304), the identification of collocations belonging to specialised languages, or a contrastive approach to the study of language- and culture-specific word combinations, which is highly relevant in translation. An approach which indeed has been more pervasive than others and has informed both dictionary introductions and the lexicographer’s work is the view of collocations as syntactically and semantically relevant binary co-occurrences made up of a semantic core, the base, and a specifying lexical element, the collocate (cf. section 3.3).
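The statistically oriented view described in this section, in which the behaviour of a given node is observed within a certain collocational span, can be made concrete with a minimal sketch. The function name `collocates_in_span`, the span of two tokens and the toy Italian sentence are assumptions introduced purely for illustration; they are not taken from any of the studies cited above.

```python
from collections import Counter


def collocates_in_span(tokens, node, span=4):
    """Count the word forms found within `span` tokens to the left or
    right of every occurrence of `node` (hypothetical helper)."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left = max(0, i - span)
        right = min(len(tokens), i + span + 1)
        for j in range(left, right):
            if j != i:  # skip the node occurrence itself
                counts[tokens[j]] += 1
    return counts


# Toy input, invented for illustration only.
tokens = "la paura di volare e la paura di sbagliare".split()
counts = collocates_in_span(tokens, "paura", span=2)
```

Raw span counts of this kind are only the starting point: as the section argues, the lexicographer still has to evaluate and filter the extracted candidates.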

3. Which collocations in lexicography?

3.1 Each to their own: ideal users, lexicographic objects and functions

The question regarding whether the pre-defined theoretical classification of the dictionary object (i.e. the phenomenon of collocation) is necessary to the lexicographer to shape the dictionary-making process or if it is ultimately this process that determines the theoretical orientation of the dictionary, cannot be answered without taking into account the role of the dictionary object and function, and the stages of


the lexicographic work which might require the presence of a clear-cut definition of collocation. Along the standard typological classification based on the dictionary subject matter, collocation dictionaries can be classified as mono-informative syntagmatic special dictionaries (cf. WBLEXWF: 205–207). They target a linguistic phenomenon at the interface between syntax and semantics, and usually address the non-native speaker who wishes to be supported in the production of natural-sounding word combinations in the foreign language. However, resources offering an advanced-level language coverage can be equally helpful to the native speaker for competence control and passive knowledge activation purposes. The monofunctional orientation of collocation dictionaries aims at text production inside or outside translation and is closely linked to the speaker’s ability to use appropriate combinatorial patterns to name specific entities. The dominant situation of potential usage is therefore a communicative situation, although cognitive situations, both sporadic and systematic, should also be taken into account, for instance in the case of non-native speakers consulting a dictionary in order to acquire knowledge about collocations in a foreign language (Tarp 2008: 45–50; Fuertes-Olivera et al. 2012: 294). Idiomaticity of communication as the primary goal of data representation, as well as the ideal user they wish to address (i.e. mainly language learners), is what existing collocation dictionaries seem to have in common across languages, despite all structural differences. Meanwhile, the widely recognized usefulness of this shared orientation clearly places the preferred view of collocations in lexicography within the field of phraseology. 
As already mentioned in the previous section, the concept of collocation being part of the complex family of phraseological expressions has turned out to be a feasible lexicographic option and has consequently been applied to most dictionaries published in the last three decades. The latest developments in collocation description generally show a stronger tendency towards the treatment of lexical rather than grammatical collocations, often giving up the sharp distinction that had guided the compilation of the BBI (ix–xxviii). Hybrid solutions clearly attach greater importance to lexical combinations, which constitute the


real dictionary object, but try to account for significant grammatical constructions as well, highlighting them, for instance, in lexicographic examples (cf. MCD). The dictionary model proposed with the help of a case study on the semantic field paura (fear) in Giacomini (2012) takes up the phraseological approach to collocation and concentrates on the specific needs of professional translators with Italian as their target working language. Language proficiency at native or non-native level is essential for the adequate use of this type of resource, the objects of which are typical general-language word combinations covering different pragmatic areas. The initial process of shaping a lexicographic object and function according to the language competence and needs of the ideal user did not require the immediate availability of a well-defined concept of collocation, and was empirically determined by the first-hand experience of the lexicographer as a native speaker and translator. A word should be said at this point about the notational terms currently employed in the titles of collocation dictionaries. The coexistence of different terms such as collocations and combinations since the publication of the first special dictionary, the BBI, may reflect on the terminological level the fuzzy boundaries of the concept itself and, of course, of its specific usage in lexicography. The BBI, which called itself a combinatory dictionary, provided three equivalent terms referring to the same phenomenon, namely recurrent combinations, fixed combinations and collocations (ix). Parallel notations have been similarly adopted in other languages (cf., among others, Italian collocazioni / combinazioni lessicali, or French combinaisons de mots / cooccurrences). In considering the aspects of habitual monofunctionality and bidirectionality of the existing collocation dictionaries, the issue of bipartition also deserves serious consideration. With few exceptions (e.g.
KolleX, described in Hollós 2013, or Konecny/Autelli 2015), the greatest number of collocation dictionaries is monolingual. On the one hand, many authors point out that the role of collocations in our mental lexicon is most likely to be appreciated if a direct comparison with another language takes place (DCI 2013: 5–6; Hausmann 1999: iv–v). As the BBI straightforwardly put it in 1986, “knowledge of other languages


is normally no help in finding English collocations” (vii), but it is certainly the necessary empirical premise for a full awareness of their existence. On the other hand, a contrastive analysis which leads to a bilingual lexicographic description of collocations, and which aims at going beyond the systematic alignment of false friends, inevitably sharpens the issue of tracing boundaries between different phraseological types and between phraseological and non-phraseological combinations. From a translational perspective, it is important to stress that the distribution of typologically different phraseologisms may vary strongly from one language to another and, as a consequence, a lexicographic resource should not suggest that collocations or other multiword expressions should be exclusively translated by means of analogous phraseological combinations in the target language.

3.2 Building the macrostructure: suitable corpora for suitable collocations

3.2.1 Data selection

After identification of the ideal user group, as well as the dictionary object and function, the next practical concern of the lexicographer is related to the criteria which need to be satisfied to select relevant collocations and the way in which an appropriate distribution on the macrostructural level can be obtained (Butina-Koller 2005: 25). The current availability of large-scale digital text collections offers the lexicographer the opportunity of extracting a vast amount of data but, at the same time, poses the urgent problem of data evaluation and selection. A first selection, however, concerns the type of corpus which is to be used as the primary source of lexicographic data. The process of designing a corpus first of all needs to ensure maximum representativeness of the chosen sample, for instance in terms of language variety, text type, genre, domain or time span covered.
Representativeness is also related to the size of a corpus, and to the way in which the issue of text sampling has been handled, i.e. either by a stratificational or a probabilistic approach (Baker et al. 2006).


The elaboration of corpus data for lexicographic purposes is, generally speaking, a semi-automated process in which the lexicographer uses introspection to decode and refine extracted data, either by interpreting them without referring to existing theories, or by testing linguistic hypotheses on automatically retrieved and refined material (cf. Heid 2008: 138–139, and Weller et al. 2011 for details on terminology extraction). This well-established distinction between corpus-driven and corpus-based methods (Tognini-Bonelli 2001) seems to exhibit a tendency towards the latter in collocation lexicography, which reinforces the impression that data selection occurs, if not yet on the basis of a predefined concept of collocation, at least according to a predefined, user-focused dictionary function. The corpus chosen for my case study was a stratified sample corpus of newspaper articles published in major Italian newspapers over a period of 8 years and totaling around 300 million words. Newspaper articles, although stylistically biased as far as words denoting emotions are concerned, have the advantage of providing an up-to-date coverage of language in use, displaying different text types and pragmatic features. Moreover, they are usually far more accessible than other texts and are, together with controlled WaC-samples, one of the most significant data sources for synchronic lexicological and lexicographic studies. The size of the corpus was large enough to allow for a qualitatively and also quantitatively relevant analysis of a subset of nouns belonging to the semantic field paura. The corpus was not annotated, which enormously increased the amount of time spent on non-automated data evaluation, but which, in turn, resulted in a very accurate investigation of the obtained concordances. Due to its considerable complexity, the issue of corpus annotation and its advantages and disadvantages will not be further discussed here. 
However, it is important to stress that different degrees of annotation, for instance at token, lemma or part-of-speech level, can contribute differently, and more or less efficiently, to the identification of relevant word combinations. The case study on the semantic field paura aimed at extracting from the available corpus the collocation candidates of the key nouns paura, timore, fobia, panico, terrore, orrore, ansia, angoscia and spavento.


The qualitative data assessment was carried out on a considerable number of collocation candidates that were automatically extracted by means of WordSmith Tools 5.0. The importance of up-to-date corpus query systems offering increasingly specific functions for the retrieval and statistical evaluation of word combinations is crucial for obtaining accurate results from large corpora (Kilgarriff/Kosem 2012: 31–32; Giacomini/Kilgarriff 2015). Even though frequency counts are still one of the key criteria for data selection, especially in the presence of large corpora and when a learners’ dictionary is the intended goal (Heid 2008: 140), statistical measures, in particular association measures based on hypothesis testing, are of great help even in the absence of text annotation, because they contribute towards highlighting significant co-occurrences that could hardly be captured with the sole aid of frequency indications. The log-likelihood ratio (ll) was a suitable measure of dependence for constituents of the collocation candidates, given the high frequency of the analysed substantives in general language and the good results obtained by means of this measure on sparse data (Manning/Schütze 1999: 172–175). Of the over 18,500 collocation candidates extracted for the most frequent noun paura, around 2000 were eventually selected as collocations to enter the dictionary. This is indeed an overwhelming number, if compared to the number of collocates of paura usually recorded in collocation dictionaries of Italian, which hardly exceeds 70 words. The selection method was based on introspection and a detailed comparison between absolute frequency and ll-value attributed to each candidate. 
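By way of illustration, the log-likelihood ratio for a single candidate pair can be sketched as follows. The sketch uses one common formulation of the measure, the G² statistic computed over a 2×2 contingency table of observed frequencies; whether the case study relied on exactly this formulation is an assumption, and the function name `log_likelihood` and all counts shown are hypothetical.

```python
import math


def log_likelihood(o11, o12, o21, o22):
    """G^2 log-likelihood ratio for a 2x2 contingency table.

    o11: node and collocate co-occur
    o12: node without the collocate
    o21: collocate without the node
    o22: neither node nor collocate
    """
    n = o11 + o12 + o21 + o22
    row1, row2 = o11 + o12, o21 + o22
    col1, col2 = o11 + o21, o12 + o22
    observed = (o11, o12, o21, o22)
    # Expected frequencies under the hypothesis of independence.
    expected = (row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n)
    # Empty cells contribute nothing to the sum (0 * log 0 is taken as 0).
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)


# Hypothetical counts: a perfectly independent pair and an associated pair.
independent = log_likelihood(10, 10, 10, 10)
associated = log_likelihood(30, 10, 10, 950)
```

A value close to zero indicates that the two words co-occur about as often as chance would predict, whereas larger values point to a dependence worth inspecting, which is why the score is read here alongside absolute frequency rather than in place of it.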
The essential criterion behind the final selection needed to be, at this stage, the presence of a concept of collocation which could make it possible for a lexicographer to take advantage of the automated data preprocessing and identify a fairly limited number of combinations fulfilling the requirements of the ideal user. Text (re)production tasks (Gerzymisch-Arbogast 2005) performed in the context of professional translation are often very demanding in terms of semantic, pragmatic and, more specifically, stylistic adequacy and require from the translator a concentrated effort to consult several lexicographic resources as well as online and offline documentation in order to find appropriate, context-specific collocations in the target language. Bilingual as well as monolingual general

Defining collocations for lexicographic purposes

language dictionaries usually provide a fairly limited coverage of current word combinations, whereas the most effective support is offered by the combined use of collocation dictionaries and thesauri. The goal of the proposed lexicographic model is, on the one hand, to widen the spectrum of lexical choices provided to the translator with an extensive coverage of syntagmatic and paradigmatic variation (cf. also Klosa et al. 2012: 76) and, on the other hand, to found the macrostructure of the dictionary on onomasiological (i.e. conceptual) principles. This goal can be efficiently achieved only on a digital medium, which is much less subject to space constraints and has the advantage of supporting hypertextual architectures and thus coherent and systematic mediostructures (Giacomini 2011). In the case exemplified here, the conceptual macrostructure rests to a significant extent upon the model of an existing lexical database, WordNet, and involves the representation of the selected nouns within a lexical semantic network, in which each lexeme constitutes a node and is linked to other items through semantic relations such as hypernymy or hyponymy. Positioning a lexical item inside a conceptual network brings out the semantic relations that also exist at collocational level. In general language, paura takes the role of a hypernym of words such as timore, spavento, or terrore, and by virtue of this semantic relation, a number of collocates of a hypernym can be inherited by the hyponym (this phenomenon has been described as downward inheritance, cf. Giacomini 2012: 270–275), whereas transfer in the opposite direction is much less common. This fact clearly accounts for the rich paradigmatic variation of collocations in general language and is best highlighted by a kind of lexicographic structure in which semantic relations are the criterion underpinning the arrangement of data.
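Downward inheritance in a lexical semantic network can be sketched as follows. This is a simplified illustration rather than the data model of the dictionary itself: the class Lexeme, the field blocked (used here to express that only some collocates are inherited) and the collocate assignments are assumptions made for the example, not corpus findings. Collocates are stored as adjective lemmas.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Lexeme:
    """A node in the network, linked upward to its hypernym (if any)."""
    lemma: str
    hypernym: Optional["Lexeme"] = None
    own_collocates: Set[str] = field(default_factory=set)
    # Hypothetical device: hypernym collocates that are NOT passed down.
    blocked: Set[str] = field(default_factory=set)

    def collocates(self) -> Set[str]:
        """Own collocates plus those inherited downward from the hypernym."""
        inherited = self.hypernym.collocates() if self.hypernym else set()
        return self.own_collocates | (inherited - self.blocked)


# Illustrative data only.
paura = Lexeme("paura", own_collocates={"profondo", "improvviso", "matto"})
terrore = Lexeme("terrore", hypernym=paura,
                 own_collocates={"puro"}, blocked={"matto"})
```

Because the lookup only follows the link from hyponym to hypernym, collocates recorded for terrore never surface under paura, which mirrors the observation that transfer in the opposite direction is much less common.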
In recent times, the semantic modeling of collocation clusters according to inheritance principles has been further investigated. This has been undertaken, on the one hand, by Lemnitzer/Geyken (2014), who take the theory of Lexical Functions as a central point of reference to develop a lexicographically oriented model in which regular correspondence patterns between bases and collocates are identified within automatically extracted word profiles (Wortprofile). Giacomini (2014), on the other hand, aims to apply criteria used in monolingual research on emotion concepts to

a multilingual environment, taking into consideration languages with relevant cultural similarities (Italian, French, German and English).

3.2.2 A working definition enabling data selection

At this stage in the lexicographic process, a concept of collocation is required to enable data selection, and the choice of this concept is inevitably affected by all considerations concerning the ideal user and the expected situations of consultation. A working definition might not totally comply with established theoretical definitions, but it is usually introduced to surmount a procedural obstacle and thus answers a very specific purpose. The ‘umbrella term’ collocation, which is often used by linguists to cover different kinds of multiword units, cannot be used for lexicographic purposes as long as fundamental issues concerning, for instance, the nature, number and positional features of the elements of a collocation, its lexical and syntactic flexibility as well as its non-compositionality remain unanswered (Fuertes-Olivera et al. 2012: 298–299). The functional definition designed in our example (this name deliberately foregrounds the applied role of the definition itself) aims at solving the phraseological problem of drawing boundaries between different units such as collocations, idiomatic expressions and free combinations, avoiding the difficulty caused by the impossibility of retaining sharp distinctions, and opens up the possibility of recording in the dictionary a considerable number of collocational variants by drawing closer to the computational view of collocation. The definition is, in fact, twofold (Giacomini 2012: 111–115). It includes, on the one hand, a phraseological concept of collocation as a multiword expression subject to no, or restricted, compositionality, substitutability and modifiability. Its syntactic, semantic and pragmatic features can be inferred only to a limited extent from the linguistic features of its components.
According to this concept, idiomatic expressions can also be included in lexicographic data but, rather than being physically severed from other collocations, they can be signaled to the dictionary user as, for instance, stable expressions with no syntagmatic or paradigmatic variants (e.g. avere una paura del diavolo/ to have the jitters). The close relation between collocations and idiomatic expressions is best described by the

words of Fellbaum (2007: 10): “collocations are lexical entities consisting of words that tend to be found together; their association is strong enough to make them candidates for ‘fixed’ expressions (though few, if any, truly fixed expressions exist)”. On the other hand, the definition includes a wider concept of collocation as a combination with a high degree of familiarity in our mental lexicon. The components of the combinations behave as a semantic unit and are generally associated with linguistically and culturally typical scenes (or frames, in the sense of the Frame Semantics Theory, cf. Fillmore 1977). Some of these combinations meet the requirements stated in the phraseology-oriented part of the definition, but, on the same grounds, some others might be easily mistaken for free combinations. For instance, paura di volare (fear of flying) is compositional, substitutable and, at least partially, modifiable. Still, this is a highly familiar word combination to the native speaker and could, with good reason, be recorded in a collocation dictionary. A working, functional definition of collocation is ultimately tailored to the practical goals pursued in the lexicographic process, taking, of course, existing theoretical approaches to a specific linguistic phenomenon as a starting point. Whatever its contents may be, this definition allows for coherent data selection, with inclusion or rejection of extracted candidates (Bartsch 2004: 76–77). Once again, however, a substantial difference emerges between the operational purpose of such a definition and the way in which it eventually affects data representation and data accessibility for the addressed user.
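The inclusion or rejection of candidates under the twofold definition can be pictured with a deliberately simplified sketch. It assumes that each property is reduced to a yes/no judgement made by the lexicographer, whereas the text describes a graded continuum; the names Candidate and is_collocation are hypothetical, and the third example is an invented free combination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    form: str
    compositional: bool   # meaning fully derivable from the parts
    substitutable: bool   # components freely replaceable
    modifiable: bool      # structure freely modifiable
    familiar: bool        # high familiarity in the mental lexicon


def is_phraseological(c: Candidate) -> bool:
    """First part of the definition: some restriction on at least one property."""
    return not (c.compositional and c.substitutable and c.modifiable)


def is_collocation(c: Candidate) -> bool:
    """Twofold definition: phraseologically restricted OR cognitively familiar."""
    return is_phraseological(c) or c.familiar


candidates = [
    Candidate("avere una paura del diavolo", False, False, False, True),
    Candidate("paura di volare", True, True, True, True),
    Candidate("paura del tavolo", True, True, True, False),  # invented example
]
selected = [c.form for c in candidates if is_collocation(c)]
```

In this sketch paura di volare is retained only through the familiarity criterion, which is the case the wider part of the definition is meant to capture.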
The continuum of lexical features and restrictions that can be recognized in phraseologisms confronts the lexicographer not only with the issue of establishing the boundaries between different phraseological types, but also, and perhaps foremost, with the decision as to whether such boundaries should be made visible to the dictionary user or whether they should simply play a metalexicographic role in the organisation of the macrostructure (L’Homme 2014). Whereas descriptive criteria (Lexical Functions, Frames etc.; cf. section 3.3.2) can play a crucial role in coherently arranging previously selected word combinations and presenting them in a user-friendly layout, definitional criteria that aim to draw fine-grained semantic and/or syntactic distinctions between

phraseological units are less relevant from the point of view of the concrete user’s needs, especially if the lexicographic product is intended for communicative rather than cognitive purposes. Particularly in the case of an expert user like a professional translator, who often needs to tackle production and translation problems related to highly specific contexts, the availability and quality of metalinguistic information about lemmas and collocations is far more significant than physical categorization of the linguistic data themselves. In the field of semantics, proposals for conceptually classifying and defining lexicographic items according to specific theoretical frameworks, for instance central vs. peripheral senses (in conformity with the principles of the Prototype Theory; cf. Geeraerts 2013) or semantic primes belonging to the Natural Semantic Metalanguage (Bullock 2011), can be tentatively applied to the collocation phenomenon. However, the key question of whether an actual user’s requirements in a particular usage situation would be satisfied or not should be addressed separately. Generally speaking, debate on e-lexicography has continued to point out the benefits a methodological shift towards onomasiology can produce in lexicography, with the possibility of representing the lexicon as a coherent, network-like structure (cf., for instance, Tutin/Falaise 2013, or Heid 2014). The way in which an onomasiological macrostructure is implemented in a specific dictionary can, in turn, depend on the underlying theoretical approach.

3.3 Microstructural representations of collocations

3.3.1 The interplay of syntax and semantics

Since the publication of the BBI in 1986, the procedure applied in microstructural description of collocations has rested, in the first instance, on syntactic principles.
Representation of the syntagmatic features of a given headword is indeed the most evident and unequivocal way of identifying its word combinations and, given the usually restricted number of language-specific combinatory patterns, it is also the most time- and space-saving type of classification that can be employed inside a dictionary entry (Klosa et al. 2012: 82). The lack

of explicit notation for the various syntactic patterns, which were only mentioned in the introduction to the BBI, made consultation a fairly demanding task for the user. Lexicographic successors such as the OCD greatly increased user-friendliness by adding syntactic tags to each treatment unit in the entry. In the meantime, the role of semantic features has become more and more important in collocation treatment, serving as a second descriptor with much analytical potential. Not only can semantic classification follow syntactic patterning and produce subgroups of semantically related collocates, it can also be effectively employed to distinguish meanings of polysemous headwords, thus enabling a syntactic arrangement of already disambiguated words. This now popular three-layered model, headword’s meaning disambiguation – syntactic patterns of collocations – semantically related collocates, is used in the microstructure of most collocation dictionaries and is often supported by special notational choices (e.g. partial or full naming of syntactic structures), as well as typographic and non-typographic devices (e.g. bold characters for collocates or a special symbol preceding each syntactic pattern). The purpose of metalexicographic content is to enhance data accessibility, whereas representation of lexicographic content should aim at reaching a marked degree of data usability, i.e. the possibility for the intended user to properly interpret and fully exploit any kind of linguistic information contained in the dictionary. Whereas exemplification, with the exception of the BBI, seems to have a similar distribution in older and newer dictionaries, and can be used only for a selected number of collocates, collocation dictionaries of the latest generation sometimes offer additional information on pragmatic features of collocates (cf. RCM and DCI).
In this way, the microstructural properties of a collocation dictionary are increasingly tailored to its primary function, with careful attention paid to the essential needs of a user who is performing a text production task. The practical concerns of the lexicographer during this phase of the dictionary-making process are less influenced by the presence of a definition of collocation than by the necessity of efficiently arranging the previously selected material. This statement can be substantiated by referring to the categories base/collocate, which have been briefly

mentioned in section 2. These categories, introduced by Hausmann (cf. 1989) in studies on collocation oriented towards didactics and lexicography, are designed to flag the status of semantic correlation occurring between the constituents of a collocation. A base is usually defined as the semantically autonomous, co-creative constituent, which is specified by a collocate. Accordingly, nouns are bases of verbs and adjectives, whereas adjectives serve as bases for adverbs. The two categories are meant to be intuitively distinguishable and this is indeed the case as far as the combinatory patterns just mentioned are concerned. However, the consistent application of this approach turns out to be quite problematic whenever other combinations come into play, for instance non-binary combinations or a prepositional phrase following a noun (cf. the example momento di ansia vs. ansia del momento / moment of anxiety vs. anxiety of the moment, discussed in Giacomini 2012: 154; cf. also Bartsch 2004: 36–37). From the point of view of the end user, the categorisation as bases and collocates can only contribute to the ease of use of the dictionary if it works as a strict, systematic criterion for lemmatisation, i.e. if the user needs to know that a collocation is only to be found in the entry of the base. Without entering into a complex discussion on lemmatisation principles in collocation dictionaries (cf., among others, Bahns 1987, 1994 and Tarp 2008: 253), it is essential to point out that this issue is likely to become redundant in the case of electronic resources, in which data representation benefits from ample space and the possibility of hypertextual content linking.

3.3.2 Adding new user-oriented features

The vast potential of the electronic medium in terms of microstructural space and mediostructural cross-referencing has been largely exploited in the dictionary model presented in Giacomini (2012).
As a matter of fact, as Fellbaum clearly points out, one of the fundamental shortcomings of dictionaries “covering the phraseological component of the lexicon [is that they] typically do not inform about the morphosyntactic and lexical flexibility of most multiword units” (2007: 3). The dictionary model implements, at the level of the entries, the onomasiological

principle outlined in section 3.2 and adds new devices of data description to account for existing collocational variation. Differences in syntagmatic structures strongly affect collocational meanings and, for this reason, the structure of the entries rests upon a surface classification of the syntactic patterns in which the headword realizes its collocations. The collocations of paura and of the other selected words (here N) may belong to one of the following five classes, which represent the basic syntagmatic framework of most Italian nouns: N + PP, N + AP, N + V, N in PP, N in other combinations. The subsequent, deeper description layer is shaped on each of these classes and takes advantage of different semantic or combined syntactic-semantic approaches to further group collocates into smaller sets:

• Thematic roles (primarily Agent/Cause, Experiencer and Beneficiary) have been largely used to sort substantival collocates in a PP following the headword, relational adjectives and the subjects of verbal collocates other than the headword.
• Ontology-based clusters of entities (People, Personified entities, Animals; Abstract entities; Natural elements and phenomena; Physical or psychological conditions etc.) have served as specific classifiers of substantival collocates in a PP following the headword and relational adjectives.
• Descriptive parameters derived from psychological studies (origin, quality, intensity, duration, and adequacy of emotions) have been employed to distinguish qualitative adjectives.
• Grammatical functions (subject, direct object, prepositional complement) have been attached to arguments of verbal collocates.
• Aktionsart has also been used to order verbal collocates.

Pragmatic labels marking register, style and domain features, as well as a coherent cross-referencing system (signalled by a downward or upward arrow) complete the lexicographic treatment of collocations. Figure 1 shows an excerpt from the microstructure of paura, in particular from the syntagmatic classes N + PP and N + AP.

PAURA + PP
  CAUSE
    ABSTRACT ENTITIES
      p. del male, dell’inferno, della fine (del mondo)
      p. del futuro ↓, del domani, dell’ignoto, della novità, delle novità, del cambiamento, dei cambiamenti
      p. della libertà, della verità
    NATURAL ELEMENTS AND PHENOMENA
      p. dell’acqua ↓, del mare, dei fulmini, dei tuoni, del terremoto ↓, del fuoco, delle frane, delle inondazioni, dell’inquinamento
      p. del buio ↓, delle tenebre, del vuoto ↓, dell’abisso, dei luoghi [+A]

PAURA + AP
  QUALITY
    INTENSITY, DURATION
      gran p., grande p. ↓, grossa p., profonda p. ↓, forte p. ↓, vera p. ↓, bella p., p. profonda ↓, p. matta [fig.][coll.], p. pazza [fig.], p. folle [fig.], p. tremenda, p. infinita, p. cieca, p. birbona [scherz.], p. orribile, p. violenta, p. enorme, p. immensa, p. terrificante, p. terribile, p. spaventosa, p. apocalittica, p. angosciosa, p. disperata, p. incredibile, p. maledetta [fam.], p. rotta, p. fantastica, p. da morire, p. mortale, p. indiavolata, p. del diavolo [fig.][coll.], p. dannata [fam.], p. fottuta [pop.][volg.]
      p. incontrollata, p. incontrollabile ↓, p. incontenibile ↓, p. irrefrenabile, p. invincibile, p. insuperabile, p. controllata
      p. dilagante, p. crescente, p. serpeggiante, p. sottile
      p. tangibile, p. palpabile
      p. improvvisa ↓, p. subitanea
      p. quotidiana, p. eterna, p. costante, p. ricorrente
      piccola p., p. modesta, p. leggera, p. debole, p. lieve, p. passeggera

Figure 1. – Excerpt from the syntagmatic classes N + PP and N + AP in the entry paura.
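The layered arrangement shown in Figure 1 lends itself to a nested data structure. The following sketch is one possible encoding, not the format actually used in the dictionary: syntagmatic class, then a first semantic label, then a second label, then the abbreviated collocations. It includes only a handful of items from the figure, and the function name expand is hypothetical.

```python
import re

# A small subset of the entry for "paura", keyed layer by layer.
ENTRY = {
    "N + PP": {
        "CAUSE": {
            "ABSTRACT ENTITIES": ["p. del male", "p. del futuro"],
            "NATURAL ELEMENTS AND PHENOMENA": ["p. del mare", "p. del buio"],
        }
    },
    "N + AP": {
        "QUALITY": {
            "INTENSITY, DURATION": ["gran p.", "p. matta", "p. improvvisa"],
        }
    },
}


def expand(entry, headword):
    """Flatten the entry into rows and spell out the abbreviated headword."""
    # The abbreviation is the first letter of the headword followed by a dot.
    abbreviation = re.compile(r"\b" + re.escape(headword[0]) + r"\.")
    rows = []
    for syntactic_class, groups in entry.items():
        for group, clusters in groups.items():
            for cluster, items in clusters.items():
                for item in items:
                    full_form = abbreviation.sub(headword, item)
                    rows.append((syntactic_class, group, cluster, full_form))
    return rows


rows = expand(ENTRY, "paura")
```

Each resulting row carries its full classification path, so the same data can be displayed as a nested entry or filtered by any single layer.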

This example shows that the multi-layered classification is able to supply a reliable account of syntagmatic and paradigmatic relations, enabling the user to identify and actively use marked and non-marked collocations and their variants. The microstructural treatment satisfies both practical distinction criteria stated in the previously formulated functional definition of collocation, including primarily familiar expressions (e.g. paura del domani) and expressions that are both cognitively familiar and phraseologically motivated (e.g. paura del diavolo). The phraseological orientation of this lexicographic model is clearly demonstrated by the attention paid to different phraseological types. Varying degrees of idiomaticity, in the sense of morphosyntactic and

lexical ‘fixedness’, however, have been covered without the creation of separate collocational subsets, and the focus is rather on the usage labels attached to each multiword unit. This kind of representation aims to reproduce the continuum in the collocational range of a word, i.e. its readiness to form a collocation by combining with other lexemes (Cowie 1981; Meunier/Granger 2008: 58–59), a property which can actually be used to outline any combinatory behaviour, even instances of free combinations. This descriptive approach produces a rich, homogeneous collection of relevant corpus-based word combinations and offers dictionary users a valuable resource for text production. Another promising approach to semantic collocation description comprises the application of frame-like categories. This approach has been explored both in general language (e.g. by Fontenelle 2000; Martin 2008) and in specialised language. In the latter, it has turned out to be a helpful descriptive tool in different (sub)fields of knowledge. In fact, frames can be used to classify typed conceptual units and can be combined with thematic roles to provide comprehensive coverage of complex word combinations, especially in the case of clusters including verbal or deverbal items that identify specific conceptual patterns (cf., among others, Kokkinakis/Gronostaj 2008 for the medical language, and Buendía Castro et al. 2014 for the environmental language). Giacomini and Heid (see footnote 1) apply the same kind of approach to patent law language in German and Italian: for instance, the collocations etwas durch (ein) Patent schützen / tutelare qc tramite brevetto (English to protect sth by a patent) as well as Patentschutz / tutela brevettuale (English patent protection) belong to the same frame PATENT PROTECTION. However, the former can be assigned the thematic role ‘process’, whereas the latter are best described by means of the thematic role ‘goal’.
From a lexicographic viewpoint, frames can be integrated into an onomasiological macrostructure at the interface between a superordinate ontology and a subordinate lexical level to which collocations belong. At the same time, frames constitute a useful access structure to lexical items that belong to the same conceptual pattern (for instance,

1 Laura Giacomini and Ulrich Heid: Die Präsentation von Fachkollokationen in einem online-Wörterbuch für die Textproduktion: Beispielsfall Patentrecht Deutsch-Italienisch. Presentation at the conference: mehrWortverbindungen. Kollokationen: Sprachgebrauch & Wörterbuch, Basel University 13.10.2014.

in the form of a search filter in e-dictionaries). Studies on specialised language might provide an excellent background for future application tests on general language collocations. The topic of collocations and user-oriented features is also dealt with in recent studies concerning search structures in e-lexicographic resources: the availability of customizable search options that allow for more targeted results may include the possibility of retrieving single collocations or clusters of collocations (cf., for instance, Heid/Prinsloo 2011; Bergenholtz/Bothma 2011; Spohr 2012; Granger 2012).
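The idea of a frame serving as an access structure can be sketched as a simple search filter. The four units, their frame and their thematic roles are the patent law examples given above; the record layout, the language codes and the function name search are assumptions made purely for illustration.

```python
from typing import List, NamedTuple, Optional


class Unit(NamedTuple):
    form: str
    language: str   # "de" or "it" (assumed codes)
    frame: str
    role: str       # thematic role assigned within the frame


UNITS = [
    Unit("etwas durch (ein) Patent schützen", "de", "PATENT PROTECTION", "process"),
    Unit("tutelare qc tramite brevetto", "it", "PATENT PROTECTION", "process"),
    Unit("Patentschutz", "de", "PATENT PROTECTION", "goal"),
    Unit("tutela brevettuale", "it", "PATENT PROTECTION", "goal"),
]


def search(units: List[Unit], frame: Optional[str] = None,
           role: Optional[str] = None,
           language: Optional[str] = None) -> List[str]:
    """Return the forms matching every filter that has been set."""
    return [
        u.form for u in units
        if (frame is None or u.frame == frame)
        and (role is None or u.role == role)
        and (language is None or u.language == language)
    ]


italian_goal = search(UNITS, frame="PATENT PROTECTION", role="goal", language="it")
```

Leaving a filter unset widens the result, so a user could retrieve either a single collocation or the whole cluster belonging to a frame.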

4. Conclusions

Over ten years ago, Bartsch emphasized the unsystematic treatment of collocations in existing dictionaries and complained that it still left much to be desired (2004: 51), but the present state of collocation lexicography can certainly be described as rosier: the extensive and effective employment of corpus data, supported by the inception of powerful corpus query tools, as well as the growing attention given to the interaction of ideal usage situations and data representation have generally contributed towards a rise in the quality of collocation dictionaries. While, on the theoretical side of linguistics, the approaches developed in the last few decades still seem to retain their high recognition, collocation lexicography is definitely trying to make up for lost time, is obtaining fairly positive results and, fortunately, still has a long way to go in exploiting, with the technology of the present, all the opportunities offered by those fundamental theories. In reminding us of the latest advancements and, at the same time, of the still unaccomplished tasks in electronic lexicography, Rundell/Kilgarriff mention the fact that sometimes, practical hurdles can be cleared with the help of even more practical thinking, and that structural matters should not be handled exclusively in terms of traditional theoretical categories but by acknowledging the real contribution of a word to a communicative situation: “It has become increasingly clear that the meaning of a word in a particular context is closely associated with the specific patterning

in which it appears – where ‘patterning’ encompasses features such as syntax, collocation, and domain information” (2011: 280). The question introduced at the beginning of section 3 as to whether a pre-defined theoretical conceptualisation of collocation is necessary for the lexicographer to describe collocations in use can now be answered by referring to the actual considerations made in this paper. As we have seen, a well-designed concept of collocation plays an essential role in data selection, whereas the implementation of a dictionary entry should be based primarily on the fulfilment of specific user’s needs. From this perspective, the availability of a theoretical framework concerning the treated linguistic phenomenon can only be beneficial insofar as it is subordinated to the applied purposes of lexicography. The presumably existing gap between established theories and practical goals, therefore, can be effectively filled by shaping a functional definition of collocation.

5. References

5.1 Dictionaries

BBI = Benson, Morton et al. 1986. The BBI Combinatory Dictionary of English. A Guide to Word Combinations. Amsterdam, Philadelphia PA: John Benjamins.
DCI = Lo Cascio, Vincenzo (ed.) 2013. Dizionario Combinatorio Italiano. Amsterdam, Philadelphia PA: John Benjamins.
DICE = Diccionario de colocaciones del Español.
MCD = Rundell, Michael et al. 2010. Macmillan Collocations Dictionary. For Learners of English. Oxford: Macmillan.
OCD = Oxford Collocations Dictionary for Students of English. 2010. Oxford: OUP.
RCM = Dictionnaire des combinaisons de mots. 2007. Paris: Le Robert.
WBLEXWF = Wiegand, Herbert Ernst et al. (eds) 2010. Wörterbuch zur Lexikographie und Wörterbuchforschung/Dictionary of Lexicography and Dictionary Research, Band 1, A-C. Berlin: Walter de Gruyter.

5.2 Monographs and articles

Bahns, Jens 1987. Kollokationen in englischen Wörterbüchern. Wortschatzarbeit. Anglistik & Englischunterricht. 32, 87–104.
Bahns, Jens 1994. Die Berücksichtigung von Kollokationen in den drei großen Lernerwörterbüchern des Englischen. Fremdsprachen Lehren und Lernen (FluL). 84–101.
Baker, Paul et al. 2006. A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press.
Bartsch, Sabine 2004. Structural and Functional Properties of Collocations in English. A corpus study of lexical and pragmatic constraints on lexical co-occurrence. Tübingen: Gunter Narr.
Bergenholtz, Henning 2008. Von Wortverbindungen, die sie Kollokationen nennen. Lexicographica. 24, 9–20.
Bergenholtz, Henning / Bothma, Theo J. D. 2011. Needs-adapted Data Presentation in e-Information Tools. Lexikos. 21, 53–77.
Buendía Castro, Miriam / Montero Martínez, Silvia / Faber, Pamela 2014. Verb collocations and phraseology in EcoLexicon. Yearbook of Phraseology. 5/1, 57–94.
Bullock, David 2011. NSM + LDOCE: A Non-Circular Dictionary of English. International Journal of Lexicography. 24/2, 226–240.
Butina-Koller, Ekaterina 2005. Kollokationen im zweisprachigen Wörterbuch: Zur Behandlung lexikalischer Kollokationen in allgemeinsprachlichen Wörterbüchern des Sprachenpaares Französisch/Russisch. Tübingen: Max Niemeyer.
Fellbaum, Christiane 2007. Introduction. In Fellbaum, Christiane (ed.) Idioms and Collocations. Corpus-based Linguistic and Lexicographic Studies. London: Continuum, 1–19.
Fillmore, Charles J. 1977. Scenes-and-frames semantics. In Zampolli, Antonio (ed.) Linguistic Structures Processing. Amsterdam: North-Holland Publishing Company, 55–81.
Firth, John Rupert 1957. Modes of meaning. Papers in Linguistics, 190–215.

Fontenelle, Thierry 2000. A bilingual lexical database for frame semantics. International Journal of Lexicography. 12, 232–248. Fuertes-Olivera, Pedro et al. 2012. Classification in Lexicography: The Concept of Collocation in the Accounting Dictionaries. Lexico graphica. 28, 291–305. Geeraerts, Dirk 2013. The treatment of meaning in dictionaries and prototype theory. In Gouws, Rufus H. et al. (eds) Dictionaries. An International Encyclopedia of Lexicography. Supplementary volume: Recent Developments with Focus on Electronic and Computational Lexicography, 487–495. Gerzymisch-Arbogast, Heidrun 2005. Introducing Multidimensional Translation. MuTra 2005 – Challenges of Multidimensional Translation: Conference Proceedings, 1–15. Giacomini, Laura 2011. An onomasiological dictionary of collocations: mediostructural properties and search procedures. Lexicograph ica. 27, 241–268. Giacomini, Laura 2012. Un dizionario elettronico delle collocazioni come rete di relazioni lessicali. Studio sul campo semantico della paura. Frankfurt am Main: Peter Lang. Giacomini, Laura 2014. Variational models in collocation: taxonomic relations and collocates inheritance. Proceedings of the Work shop on Computational, Cognitive, and Linguistic Approaches to the Analysis of Complex Words and Collocations (CCLCC 2014) at ESSLLI European Summer School in Logic, Language and Information. August 2014, Tübingen, 23–26. Giacomini, Laura / Kilgarriff, Adam 2016 forthcoming. Corpus evidence and lexicography. In Hanks, Patrick / de Schryver, Gilles-Maurice (eds) International Handbook of Lexis and Lexicography. Berlin: Springer. Granger, Sylviane 2012. Customisable dictionary-cum-corpus webbased tools. Proceedings of Lexicograffiti, 2–52. Hausmann, Franz Josef 1989. Le dictionnaire de collocations. In Hausmann, Franz Josef et al. (eds) Wörterbücher. Ein internationales Handbuch zur Lexikographie, 1. Teilband. Berlin: Walter de Gruyter, 1010–1019. Hausmann, Franz Josef 1999. 
Praktische Einführung in den Gebrauch des Student’s Dictionary of Collocations. In Benson, Morton

234

Laura Giacomini

et al. Student’s Dictionary of Collocations. Berlin: Cornelsen, iv–xiii. Heid, Ulrich 2008. Corpus linguistics and lexicography. In Lüdeling, Anke et al. (eds), Corpus Linguistics. An international hand book. Berlin: Mouton/de Gruyter, 131–153. Heid, Ulrich 2014. Natural Language Processing Techniques for Improved User-friendliness of Electronic Dictionaries. Euralex 2014 Proceedings, 47–61. Heid, Ulrich / Prinsloo, Daan J. 2011. Linking dictionary and corpus data in online language tools. Afrilex 2011 Proceedings. Hollós, Zita 2013. Interferenzkandidaten in zweisprachigen Lernerwörterbüchern, insbesondere im deutsch-ungarischen Kollokationslexikon KolleX. Lexicographica. 29, 92–116. Jehle, Günter 2007. The Advanced Foreign Learner’s Mental Lexicon. Storage and retrieval of verb-noun collocations like “to embezzle money”. Hamburg: Kovac. Kilgarriff, Adam / Kosem, Iztok 2012. Corpus tools for lexicographers. In Granger, Sylviane / Paquot, Magali (eds) Electronic Lexicog raphy. Oxford: OUP, 31–55. Klosa, Annette et al. 2012. Zum Nutzen von Korpusauszeichnungen für die Lexikographie. Lexicographica 28, 71–97. Klotz, Michael 2000. Grammatik und Lexik. Studien zur Syntagmatik englischer Verben. Tübingen: Stauffenburg. Kokkinakis, Dimitrios / Gronostaj, Maria Toporowska 2008. MEDLEX+: An Integrated Corpus-Lexicon Medical Workbench for Swedish. Euralex 2008 Proceedings, 703–712. Konecny, Christine / Autelli, Erica 2015. Kollokationen Italien ich-Deutsch. Tübingen: Buske. Lemnitzer, Lothar / Geyken, Alexander 2014. Semantic modeling of collocations for lexicographic purposes. Proceedings of the Work shop on Computational, Cognitive, and Linguistic Approach es  to the Analysis of Complex Words and Collocations (CCLCC 2014) at ESSLLI European Summer School in Logic, Language and Information. August 2014, Tübingen, 35–40. L’Homme, Marie-Claude 2014. Why Lexical Semantics is Important for E-Lexicography and Why it is Equally Important to Hide its

Formal Representations from Users of Dictionaries. International Journal of Lexicography 27/4, 360–377.
Manning, Christopher D. / Schütze, Hinrich 1999. Foundations of Statistical Natural Language Processing. Cambridge MA, London: MIT Press.
Martin, Willy 2008. A unified approach to semantic frames and collocational patterns. In Granger, Sylviane / Meunier, Fanny (eds) Phraseology. An Interdisciplinary Perspective. Amsterdam, Philadelphia PA: John Benjamins, 51–65.
McEnery, Tom / Hardie, Andrew 2012. Corpus Linguistics. Cambridge: CUP.
Mel’čuk, Igor 1996. Lexical Functions: A Tool for the Description of Lexical Relations in a Lexicon. In Wanner, Leo (ed.) Lexical Functions in Lexicography and Natural Language Processing. Amsterdam, Philadelphia PA: John Benjamins, 37–102.
Rundell, Michael / Kilgarriff, Adam 2011. Automating the creation of dictionaries: where will it all end? In Meunier, Fanny et al. (eds) A Taste for Corpora. A tribute to Professor Sylviane Granger. Amsterdam, Philadelphia PA: John Benjamins, 257–282.
Spohr, Dennis 2012. Towards a Multifunctional Lexical Resource: Design and Implementation of a Graph-based Lexicon Model. Berlin/Boston: Walter de Gruyter.
Tarp, Sven 2008. Lexicography in the Borderland between Knowledge and Non-Knowledge. Tübingen: Max Niemeyer.
Tognini-Bonelli, Elena 2001. Corpus Linguistics at Work. Amsterdam, Philadelphia PA: John Benjamins.
Tutin, Agnès / Falaise, Achille 2013. Multiword expressions in scientific discourse: a corpus-driven database. eLex 2013 electronic lexicography in the 21st century: thinking outside the paper.
Weller, Marion et al. 2011. Terminology extraction and term variation patterns: a study of French and German data. Proceedings of GSCL 2011.

Veronica Benigno and Olivier Kraif

Core vocabulary and core collocations: combining corpus analysis and native speaker judgement to inform selection of collocations in learner dictionaries

Abstract: This paper presents the concepts of “core vocabulary” and “core collocations” and discusses their implications for the treatment of collocations in monolingual phraseological learner dictionaries. In the first section, we explain what these concepts refer to, drawing on previous research. In the second section, we present the findings of a study (Benigno et al. 2015; Benigno et al. forthcoming) that used L1 speaker judgements to validate a method for automatically extracting core collocations from frWaC (Baroni et al. 2010), a very large web corpus. The study aims to identify which features can be used to define “core collocations” and to filter them from a set of potential candidates. The candidates were retrieved from the corpus by means of frequency, dispersion, and associative measures and were then evaluated by a group of native speakers, who were asked to judge how important each collocation is for communicating in everyday situations. The findings showed that frequency is an appropriate but not sufficient measure for identifying such central, nuclear units of language. Native speakers seem to attach importance (understood as usefulness in language use) to highly restricted and fixed units regardless of their frequency of occurrence, which suggests that what is core is not systematically a matter of frequency. Based on these findings, the third part of the paper deals with phraseology from the lexicographical perspective and argues that in learner dictionaries both frequency and usefulness should serve as main organizing principles. Our discussion is accompanied by practical examples taken from the Longman Collocations Dictionary and Thesaurus, a learner dictionary informed by corpus data as well as by the pedagogical judgements of expert lexicographers.

Keywords: core vocabulary, core collocations, vocabulary selection, usefulness, availability, frequency, dispersion, associative measures, fixedness

1. Theoretical context of the study

This section positions the study theoretically. We provide a brief overview of the most influential approaches to the study of collocations (1.1), introduce the notion of “core vocabulary” (1.2), and explain how the derived concept of “core collocation” is understood in our research (1.3).

1.1 Approaches to the study of collocations

Phraseology has become a major area of interest in theoretical and applied research, as is evident from the large number of studies in a range of interrelated disciplines: corpus linguistics, lexicography, psycholinguistics, translation studies, natural language processing, etc. The study of phraseology has emphasized the importance of prefabricated language in learning and teaching, questioning the validity of the traditional grammar-based approach, which treated grammar and vocabulary as two separate systems. Phraseology is an umbrella term for multiword units such as collocations, idioms, support verbs, pragmatic units, etc., each with a different status depending on its underlying semantic, syntactic, and conceptual restrictions. Among such units, great attention has been given to collocations.

Collocations are a particular type of phraseological unit. The term was originally introduced by Palmer, author of the combinatory dictionary A Grammar of English Words and Professor of English in Japan (1933, 1938, cited by Cowie 1999). To cite Palmer’s words:

it is not so much the words of English nor the grammar of English that makes English difficult, but that vague and undefined obstacle to progress in the learning of English consists for the most part in the existence of so many odd comings-together-of-words (Palmer 1933: 13, cited by Cowie 1999: 53).

The concept of collocation was then revived by authors like Bally (1951) and Firth (1957) in the second half of the last century. Collocations can generally be described as ready-made word combinations such as break

a promise, do a favour, and meet a deadline which cannot be predicted according to rules of syntactic and semantic usage and which reflect the way speakers convey particular meanings in a language. According to Manning and Schütze (1999: 51), collocations are a “conventional way of saying things”. Historically, two main approaches provided valid criteria to define collocations: the statistical approach and the phraseological approach. The statistical approach was originated by Firth (1957) and continued by Halliday (1985) and Sinclair (1991), who founded the first dictionary to make use of corpus-based examples, the COBUILD (1987) – Collins Birmingham University International Language Database. This approach sees collocations as “the statistical tendency of words to co-occur” (Hunston 2002: 12), i.e. it identifies collocations on the basis of their textual frequency. In this view, if two words co-occur with a frequency which is higher than what one would expect by chance based on the frequency of each component, then they are said to have a privileged relationship. On the one hand, high frequency but loosely fixed combinations (combining a lexical and a grammatical element and commonly referred to by the literature as “colligations”) gain the status of collocations. On the other hand, there is the risk of not finding low-frequency but strongly associated pairs, for example collocations of specialized domains. What is crucial in this approach is the social, psychological, and cognitive relevance of collocations and the importance of ritualization in language learning. The phraseological approach (Hausmann 1989; Cowie 1998; Mel’čuk 1998) sees collocations as directional pairs subjected to some sort of restriction, either of semantic or functional or syntactic nature. Hausmann (1989), for example, discards combinations between an article and a noun. 
Mel’čuk (1998) classifies collocations according to the abstract semantic relation between their components, expressed by the formula f(X)=Y. For instance, the lexical function “Magn” refers to intensifying adjectives such as big, strong, massive, etc. Howarth and Nesi describe “restricted collocations” as “combinations in which one component is used in its literal meaning, while the other is used in a specialised sense. The specialised meaning of one element can be figurative, delexical or in some way technical” (1996: 47).
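
As a rough illustration of the formula f(X)=Y, a lexical function can be pictured as a lookup from a keyword X to its conventional collocates Y. The Python sketch below is not taken from Mel’čuk (1998) or from the present study: the table MAGN, the helper names, and the entries other than heavy rain are hypothetical and chosen only for illustration.

```python
# Illustrative only: a lexical function f(X) = Y modelled as a lookup table.
# MAGN and its entries are hypothetical examples, not data from the literature cited.
MAGN = {
    "rain": {"heavy"},
    "wind": {"strong"},
    "smoker": {"heavy"},
}


def magn(keyword):
    """Return the intensifying collocates recorded for `keyword` (empty set if none)."""
    return MAGN.get(keyword, set())


def is_magn_collocation(collocate, keyword):
    """True when `collocate` is a recorded value of Magn(keyword)."""
    return collocate in magn(keyword)


print(magn("rain"))
print(is_magn_collocation("strong", "rain"))
```

With these toy entries, heavy is returned for rain, while strong is rejected for rain even though it is accepted for wind: the choice of intensifier depends on the keyword rather than on the meaning “intense” alone.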

A limitation of the phraseological approach is that genuine collocations which deviate from pre-established norms are sometimes erroneously discarded. For example, Grossmann and Tutin (2003: 7) point out that Hausmann’s enumeration of syntactic classes omits acceptable French patterns such as “adjective + preposition + noun” (e.g. ivre de colère). In this view, language pervasiveness becomes a secondary consideration. To summarize, if purely frequency-based analyses seem to adopt too comprehensive a definition of collocation, phraseological approaches seem to use too strict a categorization. Nowadays, however, the dividing line between the two perspectives is not as clear-cut, and mixed methods of investigation are far more common than absolute positions. The statistical approach, for example, typically uses associative measures such as Mutual Information in order to minimize the noise that results from a raw frequency count. A comprehensive and critical review of the two approaches is offered by Orlandi in this volume and by Granger and Paquot (2008), who point out the inherent variability of the study of phraseology and explain how the two main approaches differ in scope.

Second language acquisition research in particular stresses the importance of knowledge of collocations for achieving native-like fluency (Nattinger/DeCarrico 1992; Laufer/Nation 1995; Cowie 1998; Wray 2002; Ellis 2003; Durrant/Schmitt 2009; Laufer/Waldman 2011). Studies on the relationship between collocational use (an index of lexical depth) and proficiency level (Gitsaki 1999; Bonk 2001; Boers et al. 2002; Benigno/Vedder forthcoming) suggest that collocations are better mastered as proficiency increases. Psycholinguistic theories suggest that word combinations are stored as units in the mental lexicon (Aitchison 1987).
Ellis (2002), for example, describes vocabulary acquisition as a progression from unanalysed units to creative structures: from formula to low-scope pattern to creative constructions. According to Wray (2002), the limited lexical environments in which many words occur lead speakers to acquire them as unanalyzed chunks. These chunks are broken into parts as one’s language proficiency increases and only if necessary, but in any case, at a late stage. Whether lexical acquisition fossilizes or goes further

into the full understanding of definitional meaning would depend on collocational productivity, i.e. “the ease with which words enter into collocational relationships with a wide variety of other words” (2002: 132). Language acquisition is indeed a matter of pragmatic and communicative needs, i.e. it is driven by the need to maximize efficiency in communication. Decades of research on collocations have helped to shape a new view of vocabulary learning and to promote new teaching methodologies focused on contextualized language rather than on isolated lexical items. The traditional Chomskyan approach to linguistics made a sharp distinction between grammar and lexicon, relying on two basic assumptions: that language is a predictable system in which grammatical slots are filled with lexical items, and that it allows much freedom on both the paradigmatic and syntagmatic axes (in terms of synonymic substitution and use of the same word class). In reality language is much more patterned: each word co-occurs with a limited set of other words to convey specific meanings (as in break a promise instead of *interrupt a promise) and each word combination features in specific grammatical patterns (e.g. do the cooking and not *do a cooking). According to Stubbs (2002), every word typically occurs in specific contexts and selects other words, creating a specific universe of discourse. Part of the meaning depends on social context and part on linguistic context. Speakers make abundant use of prefabricated phrases, which they process as chunks to fulfil their communicative purposes. In isolation words are polysemous or ambiguous, but when they occur in sentences, i.e. in a given linguistic context, they are no longer ambiguous. Similarly, Brent (2009) explains that partial acquisition of a word meaning is a very common outcome. The author proposes to see lexical acquisition as a “meaning-last” process: we usually encounter a word in a specific lexical environment, i.e.
we understand words only in predictable lexical environments. Partial knowledge of a word is often functionally sufficient for our communicative needs: this is very true for foreign learners but also for native speakers that often don’t know all the meanings a word can have and mostly use words in frequent specific communicative contexts. On the same line of thought, Hoey (2005) sees repetition and experience (i.e. the number of encounters with a word) as the key to learning. He states that “every word is primed for

use in discourse as a result of the cumulative effects of an individual’s encounters with the word” (2005: 13). Every time an individual comes across a word (in a specific register, semantic context, and grammatical and syntactical pattern), his/her chances of automatically retrieving that word increase, i.e. the priming effect is reinforced. Grammar is then “the product of the accumulation of all the lexical primings of an individual’s lifetime” (2005: 159).

Another important strand in the study of phraseology is the cognitivist approach, which recognizes functional and cognitive motivations behind the use of collocations. According to Legallois (2005) there are two main cognitive and usage-based models in the British context. “Construction Grammar” considers phraseological units as symbolic chunks. For example, Fillmore’s theory of “semantic frames” (Fillmore 1982) shows the existence of linguistic routines, for example going to the supermarket, which are associated with entrenched cognitive structures. The other main approach is Francis and Hunston’s (2000) “Pattern Grammar”, according to which a pattern is a phraseology frequently associated with (a sense of) a word, particularly in terms of the prepositions, groups, and clauses that follow the word. Patterns and lexis are mutually dependent, in that each pattern occurs with a restricted set of lexical items, and each lexical item occurs with a restricted set of patterns (2000: 3).

One of the several examples provided by the authors is the noun matter, generally preceded by the article a and followed by the preposition of and a verb in the “-ing” form, e.g. a matter of knowing…

In the present study, collocations are defined by two typical traits:

• The transparency in meaning of the pivot (generally the noun in verb-noun combinations and the noun in adjective-noun combinations)
• The restricted semantism of the collocate, i.e. the fact that the collocate acquires a particular meaning when co-occurring with a given pivot.

Let’s take a few examples. In heavy rain, rain keeps its literal meaning, whereas heavy conveys the particular meaning of intensity when

co-occurring with rain and cannot be replaced by a synonym as in *strong rain. In the above examples the pivots are transparent, while the restricted semantism of the collocate is shown by its limited substitutability, or lack of substitutability, with synonymous words. If we apply a test of semantic commutability to the collocates, we often find that the use of a synonym is unnatural or limited to a few members. In the above examples, it seems that no other word would convey the same meaning in a natural (native-like) way. In commit a crime, conversely, perpetrate would probably sound like an acceptable substitute for commit (though the register would change), but we cannot make, operate, or perform a crime.

We would like to complete this brief overview of the different approaches to the study of collocations by outlining the main differences between collocations and other types of multiword combinations. Because of their hazy outline, collocations are variously labelled and lie along a continuum ranging from free combinations to idioms, but the difference between phraseological units is not always straightforward. The main difference between collocations and free combinations is the nature of the restrictions they are subject to. Free combinations are subject to general selection restrictions based on common semantic and syntactic rules (see Lo Cascio in this volume): a noun indicating a concrete object, for example, cannot be described with adjectives expressing emotions such as happy, and the grammatical category article cannot be combined with another article in English. Besides these general selection restrictions, collocations also appear to be subject to usage restrictions. In fact the affinity between the two components of a collocation is often arbitrary: heavy rain sounds acceptable whereas strong rain does not.
The above dichotomy was referred to by Bally (1951) already in the last century as contrainte lexicale and contrainte de signe and by Sinclair (1991) as open choice principle and idiom principle. Idioms are easier to distinguish from collocations because they appear to be more constrained and they respond to creative processes such as semantic transliteration which are typical of all languages. Idioms are fully non-compositional, i.e. both their components lose their literal meaning and they generate a new meaning, which is sometimes opaque, as in to pull one’s leg, and at other times can, conversely, be easily interpreted, as in to smoke like a chimney. Additionally, idioms are generally syntactically

more constrained, i.e. they do not admit other elements into their syntactic pattern: *take (another) rain check, instead of take a rain check once again, would not be acceptable. In the next sub-sections we present the concept of “core vocabulary” (1.2) and the derived concept of “core collocations” (1.3).

1.2 Core vocabulary: pioneering pedagogical lists and defining criteria

The basic assumption behind the idea of a core vocabulary inventory is that knowledge of basic words provides learners with “a survival kit […] that they could use in any situation” (McCarthy 1990: 49). According to Carter (1998), basic words in the lexicon are more “central, ‘nuclear’ or ‘core’ than others” and can be identified by a number of traits such as generic meaning (e.g. eat vs dine), superordinateness (e.g. flower vs tulip), and high collocability (e.g. bright vs gaudy). Core vocabulary is characterised by particular traits which make it easier for a foreign learner to learn, for example the fact that the majority of core words are concrete items and therefore easier to memorize, and that they generally refer to the most basic and prototypical meanings (of polysemous words), which are cognitively more relevant and also more useful for efficient communication in the target language (Benigno 2007).

At the beginning of the last century some pioneering research was conducted for major European languages such as English (Thorndike 1921), French (Henmon 1924, cited by Haygood 1937), German (Kaeding 1898, cited by Haygood 1937), and Spanish (Buchanan 1927, cited by Haygood 1937), with the aim of producing pedagogical lists for learning a foreign language. For English, for example, Thorndike (1921) compiled a list of about 10,000 words extracted from a corpus of fiction books, while Ogden (1930) produced an essential list of about 850 words which he claimed would provide the basics for communicating efficiently in English.
His work received a great deal of criticism because the list was judged to be too concise and to include too narrow a number of words and grammatical categories (18 verbs only). Pursuing a similar aim, West (1953) produced the General Service List using both frequency and subjective criteria such as the usefulness and universality of words. He indicated different word meanings and at what stage each

meaning was known by his students. His work is documented in the Interim Report on Vocabulary Selection for Teaching English as a Foreign Language drawn up by Palmer in collaboration with West, Faucett and Thorndike (Faucett et al. 1936, cited by Richards/Savard 1970: 24–25). The most widely known core vocabulary list for French, the Français Fondamental (originally published as Français Elémentaire 1954), was created by a team of researchers supervised by Gougenheim and represents another outstanding piece of work from the last century. The Français Fondamental used oral corpora and distinguished between frequent words and available or useful words (in Gougenheim’s words, “mots disponibles”). The authors use the term disponibilité to refer to a trait of low-frequency words such as fork and tooth, which represent very basic concepts or concrete objects from everyday life. These words are essential for basic communication, even though they are not frequently spoken or written by native speakers and are therefore difficult to retrieve by mere word counting. As such they are part of an individual’s core vocabulary and can be retrieved through interviews with native speakers, often a quicker method than, for example, manual extraction of onomasiological information from thesauri.

In what follows we briefly present the key compiling criteria which were used (individually or in combination) to produce pedagogical lists in the last century. These same criteria will be used to identify core collocations in the study on which we report in section 2.

Frequency. As Fuster Márquez and Pennock Speck (2008) point out, the main criterion for defining core vocabulary is frequency, considered as an objective criterion based on the empirical observation of language use and as a feature of the input which facilitates learning.
Studies in neurolinguistics have shown that repeated lexical exposure strengthens vocabulary knowledge: the more frequent items are, the easier it is to store them in our mental lexicon. Paradis’ Activation Threshold Hypothesis (2004) correlates the frequency of use of a linguistic item with its activation and availability to the language user. Similarly, Tomasello (2003) explains that frequency is crucial in learning, as demonstrated by the fact that high-frequency irregular chunks are easily learned despite their complexity. Research on the lexical diversity of texts produced by language learners has also shown that frequency is a

useful feature to differentiate between learners at different proficiency levels and between learners and L1 speakers (Meara/Bell 2001; Ellis 2002; Nation/Beglar 2007; Benigno/Vedder forthcoming).

Availability or usefulness. Frequency alone, however, is not fully reliable for selecting the most useful vocabulary for communicating in another language. A purely frequency-based pedagogical list is likely to be biased by the nature of the corpus, for example by the domains, the age of the writers/speakers, the text types, etc. represented in it. An additional practical limitation of frequency counts is that lemma or word form counts cannot discriminate between the different meanings of polysemous words. As Stubbs (2002) points out, the definition of what is basic depends on frequency, but also on functional criteria such as communicative relevance or usefulness. The usefulness of words depends on their textual distribution, e.g. core words are distributed over different kinds of texts, are more neutral in style, and can be used to explain the meaning of non-core words. Goddard and Wierzbicka (2007) are of the same opinion: core meanings are generally simpler than non-core meanings and can explain more complex meanings. Subjective evaluation will reflect the importance attached by speakers to words (meanings) and to the domains in which these words are used, e.g. to talk about eating, working, social interaction, etc. It is therefore of crucial importance to combine frequency analysis with considerations about communicative usefulness in order to identify which lexical items are more basic or core than others.

Dispersion. Another measure which helps identify core vocabulary and which we used in our study (see section 2) is dispersion, i.e. how evenly a word is distributed in a given corpus.
A corpus which claims to represent general language should include non-specific and non-genre oriented language which reflects the way ordinary speakers use the language. Such a corpus is obviously very difficult to create because different genres, domains, and writers would need to be equally represented. The British National Corpus is an example of a balanced and representative corpus of general language because it attempts to capture a broad cross-section of genres, text types, and registers. Now, when a sampling error occurs, the use of raw frequency can

be misleading, because a high-frequency word may have repeatedly occurred in only one section of a corpus, i.e. it has a low dispersion value. Adding a dispersion filter to a frequency analysis is therefore a useful measure to make sure a word is evenly distributed along the different sections of a corpus and therefore representative of general language.

To conclude, we would like to flag two important considerations. Firstly, although it is possible, and pedagogically relevant, to identify a relatively stable core vocabulary, vocabulary use is characterised by inherent variability. Language learning is a dynamic, multidimensional (vocabulary being one of these dimensions), and non-linear process which is difficult to capture within a pre-established frame. Language learning is a progressive experience favoured by a process of acculturation with the target language which consists of a gradual understanding of the target language pragmatic codes. Vocabulary learning in particular is affected by a number of intralinguistic and interlinguistic variables (i.e. characteristics of the L1 and of the target language) as well as by individual variables (e.g. learners’ motivation, personal experience with the language, personal or professional domain of interest, learning style, etc.) and by external variables (e.g. the effectiveness of the instructional method, the quality of the input, the number of encounters with the target language, the context of usage, e.g. formal vs. informal, etc.). Secondly, we would like to highlight the principal merits of the aforementioned efforts to create core vocabulary lists:

• It has been acknowledged that, beyond individual features of variability, there exists a core lexicon commonly shared by speakers.
• This core lexicon was identified as a desirable objective of vocabulary learning and teaching, an ideal starting point for providing learners with basic tools to engage in successful communication.
• In some cases, statistical methods or quantitative analysis were combined with considerations about words’ characteristics, such as usefulness, to identify core vocabulary.

The concept of core vocabulary has led us to investigate a related concept, that of “core collocation”, which we discuss in the next section.
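
The dispersion filter described in this section can be sketched in a few lines of Python. The text does not say which dispersion measure was used, so the sketch swaps in Gries’s deviation of proportions (DP) as a stand-in; the function name, the section sizes, and the counts are all invented for illustration.

```python
def deviation_of_proportions(word_counts, section_sizes):
    """Gries's DP: 0 means a perfectly even spread, values near 1 mean clumping.

    word_counts[i] is the number of hits in corpus section i;
    section_sizes[i] is the size (in tokens) of that section.
    """
    total_hits = sum(word_counts)
    total_size = sum(section_sizes)
    if total_hits == 0 or total_size == 0:
        raise ValueError("need at least one occurrence and a non-empty corpus")
    return 0.5 * sum(
        abs(hits / total_hits - size / total_size)
        for hits, size in zip(word_counts, section_sizes)
    )


# Hypothetical corpus split into four equally sized sections.
sections = [1000, 1000, 1000, 1000]

# Same raw frequency (100 hits), very different distribution.
even = deviation_of_proportions([25, 25, 25, 25], sections)
clumped = deviation_of_proportions([100, 0, 0, 0], sections)

print(even, clumped)
```

Both invented words have the same raw frequency, but the second occurs in a single section only and receives a much higher DP; a frequency list filtered by a DP threshold (the threshold itself would be a further assumption) would keep the first word and drop the second.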

1.3 Core collocations: learning vocabulary beyond individual words

In this study we argue for the need to extend the concept of core vocabulary to the syntagmatic dimension and introduce the concept of “core collocation”. Knowledge of core vocabulary is crucial to learning. West’s General Service List has been widely used to create vocabulary teaching, lexicographical, and assessment resources, for example as the basis of the Lexical Frequency Profile (Laufer/Nation 1995) and of the defining vocabulary in the Longman Dictionary of Contemporary English LDOCE (1978). However, an obvious limitation of West’s approach to the study of core vocabulary (which is reflected in other lists with a similar purpose) is the underlying assumption that vocabulary is learned through the retention of single word forms. The fact that little attention has generally been paid to the different meanings of polysemous words and to the syntagmatic environment in which these words occur has raised our interest in exploring core vocabulary in its phraseological dimension, with a particular focus on collocations. From a pedagogical and lexicographical point of view, the question arises as to how to identify which collocations are more central than others in order to help learners filter and prioritize vocabulary efficiently. With this in mind, we developed the main focus of the present research and our basic assumption that the criteria applied to define core vocabulary, mainly frequency and usefulness, should equally be applied to define core collocations. In our study, we define core collocations as frequent or useful (i.e. essential for accomplishing basic communicative tasks) word combinations consisting of two lexemes which yield a significant (collocational) relation and which represent the most basic co-occurrences of a word.
This definition is evaluated through a corpus-based study combining corpus analysis and L1 speaker judgements about the usefulness of collocations – which we report on in the next section.

2. A corpus-based study of core collocations

In this section we present the findings of a study combining corpus analysis and native speaker judgement in order to investigate the concept of “core collocation” (Benigno et al. 2015; Benigno et al. forthcoming). The following sections describe the research questions (2.1), the data and tools (2.2), the methodology to extract collocations from the corpus (2.3), the surveys with native speakers (2.4), and the findings of the study (2.5).

2.1 Research questions

A study on French collocations by Benigno et al. (2015) and Benigno et al. (forthcoming) was conducted with a twofold purpose:

1. To identify a valid methodology to automatically extract core collocations from corpora, retrieved by means of frequency, dispersion, and associative measures such as Mutual Information.
2. To determine the validity of the sample extracted from the corpus against native speaker judgements with the purpose to ascertain the relative importance of frequency and to identify which criteria play a role in the perceived usefulness by the speakers.
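
To make the role of an associative measure such as Mutual Information concrete, the following Python sketch compares the observed co-occurrence frequency of a word pair with the frequency expected by chance. It assumes the standard pointwise formulation of Mutual Information; it is not the study’s Perl implementation, and the function names and all counts are invented for illustration.

```python
import math


def expected_cooccurrences(freq_node, freq_collocate, corpus_size):
    """Co-occurrences expected by chance if the two words were independent."""
    return freq_node * freq_collocate / corpus_size


def pointwise_mutual_information(freq_pair, freq_node, freq_collocate, corpus_size):
    """log2 of observed over expected co-occurrence frequency."""
    if min(freq_pair, freq_node, freq_collocate, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    expected = expected_cooccurrences(freq_node, freq_collocate, corpus_size)
    return math.log2(freq_pair / expected)


# Invented counts: a node seen 1,000 times, a collocate seen 500 times,
# and 64 joint occurrences in a corpus of 1,000,000 tokens.
CORPUS_SIZE = 1_000_000
expected = expected_cooccurrences(1_000, 500, CORPUS_SIZE)
score = pointwise_mutual_information(64, 1_000, 500, CORPUS_SIZE)

print(expected, score)
```

With these invented counts the pair is expected to co-occur 0.5 times by chance but is observed 64 times, i.e. 128 times more often, which gives a score of 7. Because the score rewards the ratio rather than the raw count, it can surface low-frequency but strongly associated pairs that a plain frequency ranking would miss.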

The study explored the French corpus frWaC (Baroni et al. 2010), a lemmatized and POS-tagged corpus assembled by crawling the Web and comprising more than 1 billion tokens. Perl scripts based on pre-established parameters were developed to extract the 2,000 most frequent collocates of 10 basic words (for a total of about 20,000 co-occurrents extracted). For each collocation, the script produced a table in which frequency of occurrence, dispersion value, and associative measures were computed. Qualitative analysis and a combination of these different measures were used to narrow the sample to about 450 potential core collocations. The final set included the most frequent co-occurrents that were also selected by at least one of the associative measures, together with the top co-occurrents according to associative scores. A survey of 90 native speakers was then conducted to evaluate the usefulness of the selected word combinations with the purpose of understanding why some units were considered as

250

Veronica Benigno and Olivier Kraif

more central than others. Findings showed that frequency alone is not sufficient to identify core collocations and that units’ fixedness plays a role in their perceived usefulness.

2.2 Data and tools: node words, corpus, and Perl scripts

The language sample extracted from frWaC consists of about 20,000 phraseological units, i.e. the most frequent co-occurrents of 10 nouns functioning as node (in the node-collocate pair): colloque, conférence, congrès, conversation, débat, fête, interview, rencontre, réunion, séminaire. The 10 nouns are basic words included in the Dictionnaire Fondamental list (Gougenheim 1971) and belong to the lexical area of “social relations” according to the Thesaurus of Péchoin (1991). The extraction was restricted to specific syntactic types: V+N, N+V, A+N, N+A, and N+N. The choice of the noun as the grammatical category of the pivot is motivated by the fact that nouns usually play the role of node in the node-collocate co-occurrence (see Grossmann/Tutin 2003) and therefore have a main syntactic and semantic role in the syntagmatic structure (Lo Cascio 2000). Nouns are also more referential and denotative than adjectives and verbs, at least in general language, and lexical access is claimed to be quicker for nouns than for other categories (Ellis 1997). From a computational point of view, nouns are also easier to process than, say, verbs, because they are morphologically simpler (they vary in number and person or they are subject to derivational processes). Additionally, the analysis was limited to a specific semantic area for practical reasons, in order to exploit similarity in meanings and to ensure a homogeneous and well-defined language sample (see Benigno et al. 2015 for pedagogical implications of the concept of core collocations within the same semantic area).

The corpus frWaC (Baroni et al.
2010) is a very large corpus assembled through automatic procedures by crawling web pages found through Google queries. It is claimed to be representative of contemporary French in general and includes a variety of text types and topics. It was developed within the context of an international research project called WaCky (Web-As-Corpus Kool Yinitiative) initiated by a community of linguists and information technology specialists. WaCky researchers collaborate to develop tools (and interfaces to existing
tools) to allow linguists to crawl a section of the web, process the data, index them, and search them.

FrWaC was built in two key stages: the selection of “seed” URLs and the crawling. The “seed” URLs were selected to include a variety of content and genres, e.g. recipes, technical manuals, short stories (which the authors refer to as ‘pre-Web’ texts because they can be found in electronic format on the Web) but also personal pages, blogs, forums (more strictly Web-based texts). The procedure for the selection of the URLs consisted of submitting around 1,800 random bigram queries to Google. Bigrams consisted of words sampled from a basic vocabulary list, more specifically: a first set of 1,000 word pairs of mid-frequency content words selected from the newspaper Le Monde Diplomatique, intended to retrieve “public sphere” documents such as journalistic and academic texts; and a more basic set of 769 word pairs generated from a vocabulary list for children (from eight to ten years old), which allowed the team to capture “personal interest pages” such as blogs, forums, etc. (Baroni et al. 2009). The authors used the Heritrix crawler and limited the crawling to pages in the .fr Web domain. The crawled data were subsequently cleaned to reduce noise and annotated with part-of-speech and lemma information, using the TreeTagger (see Baroni et al. 2009 for further details).

A number of validation studies were conducted to evaluate the quality of the corpus content. Baroni et al. (2010), for example, showed that frWaC and the comparable English corpus ukWaC provided relevant content for use in lexicographic resources.

FrWaC may be considered an unconventional corpus for a number of reasons. The automatic sampling procedure of the corpus does not guarantee a perfect balance in terms of selected genres and types.
Additionally, the corpus has some inherent noise deriving from the fact that texts were crawled within a narrow time-frame and that web pages inevitably contain “boilerplate” text (menus, footers, etc.), text fragments, repeated quotations, lists, tables, etc. Despite these limitations, in our opinion frWaC is an invaluable resource for corpus linguistics research because:
• The size of the corpus is impressive
• It contains a wide variety of text types and genres. According to Sinclair, a general corpus is “[…] gathered from a variety of sources so that the individuality of a source is obscured, unless the researcher isolates a particular text” (1991: 17)
• It is a collection of texts written by a very heterogeneous group of writers, the cyber population
• It is more recent and therefore includes more up-to-date language than other well-established corpora such as the British National Corpus.

For further details about the WaCky corpora, see Baroni et al. (2009).

To explore this huge corpus, we had to build an analysis tool from scratch. We developed a purpose-specific extraction script using the Perl programming language (Kraif 2011). The scripts used a group of pre-set parameters. For example, the size of the span was set to 4 co-occurrents on the right and left of the pivot, including any punctuation signs, while the span size for concordancing in KWIC format was 50 tokens. The execution of the scripts produced a double output: concordance lines with an indication of their URL; and a frequency list accompanied by other statistical measures, i.e. dispersion, Pointwise Mutual Information, Log-likelihood, t-score, and z-score (Evert 2008). The simplest frequency-based method of detecting collocations would not have suited our analysis, since colligations, free combinations, and even false co-occurrences may appear frequent by chance without being in a significant relationship to each other. Conversely, associative measures are intended to measure the strength of association between the components of a unit and therefore allowed us to extract not only frequent collocations, but also collocations that were eventually considered useful by our informants despite their low frequency.

2.3 Methodology for corpus extraction

Based on the parameters set in our research tool, the top 2,000 co-occurrents of each of the 10 selected pivots were extracted (making a total of 20,000 co-occurrents).
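The windowed extraction described above can be sketched as follows. This is a minimal illustration in Python, not the Perl scripts used in the study: the function name `top_cooccurrents`, the toy token list, and the decision to work on a flat list of lemmas are assumptions made for the example. The sketch counts every token found within 4 positions to the left or right of each occurrence of the pivot (punctuation occupies a slot in the span, as in the text) and returns the most frequent co-occurrents.

```python
from collections import Counter

def top_cooccurrents(lemmas, pivot, span=4, top_n=2000):
    """Count lemmas found within `span` tokens to the left or right of `pivot`.

    `lemmas` is a flat list of lemmatized tokens; punctuation is kept, so it
    takes up a position in the span. Returns the `top_n` most frequent
    co-occurrents as (lemma, frequency) pairs.
    """
    counts = Counter()
    for i, lemma in enumerate(lemmas):
        if lemma != pivot:
            continue
        left = lemmas[max(0, i - span):i]
        right = lemmas[i + 1:i + 1 + span]
        counts.update(left + right)
    return counts.most_common(top_n)

# Hypothetical toy input, not taken from frWaC.
toy = ["le", "comité", "organiser", "un", "réunion", "public", ".",
       "le", "réunion", "avoir", "lieu", "demain"]
print(top_cooccurrents(toy, "réunion", span=4, top_n=5))
```

A real pipeline would also need the POS filter for the syntactic types listed in 2.2 and the stoplist mentioned in 2.3; both are left out here to keep the counting step visible.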
The threshold of 2,000 co-occurrents was set on the basis of the manual filtering of all co-occurrents returned by the script and the observation that this figure was large enough to absorb lemmatization errors and errors deriving from the web-based nature of the corpus without missing any genuine co-occurrent. For the pivot fête, for example, only 16 genuine co-occurrents were found in the frequency rank
between 1,000 and 1,100, compared with the 73 valid co-occurrents found in the frequency rank 0–100. This figure decreased considerably after automatically removing all co-occurrents which corresponded to functional grammatical categories (using a stoplist), all low-dispersed co-occurrents, and any remaining named entities (e.g. proper nouns, geographical places, titles, etc.) or bugs. These cleaning steps allowed us to reduce the list of co-occurrents by up to 300 for some pivots.

In a second step, frequency, dispersion, Mutual Information, Log-likelihood, t-score, and z-score were combined to identify the final sample of potential core collocations. We decided to keep the top 50 most frequent co-occurrents and any co-occurrent beyond this threshold which was selected by at least one of the other statistical measures (dispersion and associative measures) – with the purpose of including any significant low-frequency combination which would have been discarded on the basis of absolute frequency. For each of these measures, thresholds were empirically established based on a manual analysis of the candidates retained above or below a given cut-off, except for Mutual Information, for which the literature indicates a threshold of 3 (Church/Hanks 1990). While frequency and dispersion are likely to detect collocations which are interesting for language teaching or lexicographic purposes, associative measures tell us how significant collocations are. From an original figure of 20,000 co-occurrents (2,000 for each pivot), the size of the sample was eventually reduced to about 450 co-occurrents.

2.4 Surveys with native speakers

The sample selected using the procedure described in 2.3 was subsequently presented to a group of 90 French, Belgian, Swiss, and Canadian native speakers from different professional backgrounds and age groups.
The purpose of the survey was to determine the relative importance of frequency as a defining criterion for core collocations and to understand what other criteria motivated the informants’ decisions about the usefulness of collocations. The 90 informants were divided into three groups: the first group was asked to identify core collocations of the pivots conférence, débat, and rencontre; the second group colloque, fête, and réunion; and the third group congrès, séminaire, interview, and
conversation. Each pivot was therefore evaluated by a total of 30 informants. The informants’ task was to go through the list of automatically extracted collocations and to mark any collocation which they thought was important or useful for everyday communication. Collocations were presented in their most frequent syntactic pattern: for example, the combination of conférence and cycle was presented as un cycle de conférences. Each individual was sent a written briefing document about the nature and the purpose of the task, with examples extracted from the corpus. For further details about the test and the informants, see Benigno et al. (forthcoming).

2.5 Results

The analysis of the frequency data and the informants’ data yielded three main findings.

1. Calculation of the Pearson correlation coefficient between frequency (of collocations in the corpus) and native speaker score (corresponding to the number of times each collocation was chosen by the informants) showed a slightly positive relationship between these two variables (between 0.08 and 0.43) for 8 pivots out of 10. This correlation was not always found to be significant (the p-value was below 0.05 for only 40% of the sample), as we show in Table 1 below.

Table 1: Pearson’s correlation coefficient between frequency and native speaker score; p < 0.05.

PIVOT          PEARSON’S COEFFICIENT   P-VALUE
Colloque       -0.19                   *0.2362
Conférence     0.43                    0.0006
Congrès        0.08                    *0.5926
Conversation   0.27                    *0.1615
Débat          0.30                    0.0222
Fête           0.16                    *0.1858
Interview      0.05                    *0.7777
Rencontre      -0.11                   *0.4215
Réunion        0.29                    0.0413
Séminaire      0.29                    0.0290
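The coefficient reported in Table 1 can be illustrated with a short sketch, assuming the standard definition of Pearson’s r. The function names are ours, and the input is only the first six rows of Table 2 (pivot réunion), so the value printed is not expected to match the 0.29 obtained on the full list. The sketch also returns the t statistic on which the significance test is based; turning it into a p-value requires the Student t distribution with n − 2 degrees of freedom, which is left out here.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length (at least 3 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t statistic for testing r against zero (n - 2 degrees of freedom).

    Undefined for a perfect correlation (r equal to 1 or -1).
    """
    return r * math.sqrt((n - 2) / (1 - r * r))

# First six rows of Table 2: corpus frequency and informant score (in %).
frequency = [14868, 8170, 6826, 6218, 5358, 4699]
score = [87, 93, 67, 67, 80, 63]
r = pearson_r(frequency, score)
print(round(r, 2), round(t_statistic(r, len(frequency)), 2))
```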

Pearson’s coefficient provides us with information about the direction and the strength of the association between the two variables. The coefficient varies between –1 and +1, with +1 indicating a perfect
positive relationship, –1 a perfect negative relationship, and 0 indicating that the two variables are not associated. Observation of the values in Table 1 confirmed our original hypothesis that the feature of “coreness” is not entirely dependent on the frequency of occurrence of collocations and implied that other factors may play a role in determining the informants’ decisions. On closer inspection, we systematically observed some deviating cases in the data distribution, i.e. high-frequency collocations (within or close to the first quartile of the data distribution) which were chosen by only a few informants (i.e. with an average native speaker score within or close to the last quartile of the data distribution); or conversely, low-frequency collocations (within or close to the last quartile of the data distribution) which were nevertheless marked as important by a good percentage of informants (i.e. with an average native speaker score within or close to the first quartile of the data distribution).

Let us take as an example the pivot réunion. The relationship between the two variables was found to be slightly positive (0.29) and significant. Abnormal observations were: la prochaine réunion (frequency 4,303, chosen by 23% of informants), participer à la réunion (frequency 3,806, also chosen by 23% of informants), and la réunion a lieu (frequency 3,598, chosen by 30% of informants). These units were found to have a frequency value higher than 2,648, i.e. the threshold of the first quartile of the frequency distribution, and an informant score lower than 30%, i.e. the threshold of the last quartile of the score distribution. Conversely, une réunion préparatoire (frequency 767, chosen by 93% of informants), le calendrier de la réunion (frequency 638, chosen by 87% of informants), and une réunion consultative (frequency 379, chosen by 87% of informants) scored low in frequency and high in the native speaker score.
These units were found to have a frequency value lower than 1,016, i.e. the threshold of the last quartile of the frequency distribution, and an informant score higher than 73%, i.e. the threshold of the first quartile of the score distribution. In Table 2 below we present the list of collocations for the pivot réunion (replaced by an asterisk) and emphasize the abnormal observations in italics.

Table 2: Collocations of réunion sorted by frequency. Abnormal observations are marked in italics.

COLLOCATIONS                            FREQUENCY   SCORE
être en *                               14,868      87%
une * publique                          8,170       93%
organiser une *                         6,826       67%
une * d’information                     6,218       67%
la salle de *                           5,358       80%
une * de travail                        4,699       63%
_la prochaine *_                        4,303       23%
se rendre à la *                        4,231       87%
la * se tiendra …                       3,860       73%
_participer à la *_                     3,806       23%
_la * a lieu …_                         3,598       30%
la * du comité (central, etc.)          3,393       87%
la * du groupe x                        3,205       63%
la * de la commission (mixte, etc.)     3,048       53%
le compte-rendu de la *                 2,899       83%
la dernière *                           2,396       57%
assister à la *                         1,944       43%
faire une *                             1,908       20%
l’organisation de la *                  1,628       33%
la date de la *                         1,596       30%
une * de bureau                         1,431       37%
animer une *                            1,406       17%
une * de concertation                   1,397       40%
prévoir une *                           1,368       70%
l’issue de la *                         1,310       80%
une * collective                        1,263       97%
une * plénière                          1,242       7%
une * de section                        1,221       77%
la * de l’assemblée (générale, etc.)    1,196       47%
une * annuelle                          1,149       73%
la * des ministres                      1,073       27%
une * de quartier                       1,050       73%
préparation de la *                     981         63%
une * municipale                        974         53%
une * mensuelle                         971         40%
la participation à la *                 903         20%
le procès-verbal de la *                873         53%
_une * préparatoire_                    767         93%
une * thématique                        741         23%
la * se déroulera ...                   720         43%
_le calendrier de la *_                 638         87%
présider une *                          555         47%
une * hebdomadaire                      488         60%
une * informelle                        466         33%
convoquer une *                         416         50%
_une * consultative_                    379         87%
une * interministérielle                327         57%
convier une *                           293         73%
une * tripartite                        135         43%
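The way deviating cases are singled out for réunion can be sketched as follows, assuming the quartile thresholds reported in the text (frequency 2,648 and 1,016; score 30% and 73%) are passed in explicitly. The function name `flag_deviating` and the choice of rows are ours; the rows are a subset of Table 2. The low-score comparison is inclusive so that a score exactly on the threshold (la * a lieu, 30%) is flagged, as it is in the text.

```python
def flag_deviating(rows, freq_high, freq_low, score_low, score_high):
    """Split out collocations whose frequency and informant score disagree.

    rows: iterable of (collocation, frequency, score_percent).
    Returns two lists: frequent collocations that few informants chose,
    and rare collocations that many informants chose.
    """
    frequent_rarely_chosen = [
        name for name, freq, score in rows
        if freq > freq_high and score <= score_low
    ]
    rare_often_chosen = [
        name for name, freq, score in rows
        if freq < freq_low and score > score_high
    ]
    return frequent_rarely_chosen, rare_often_chosen

# A subset of Table 2 (the pivot réunion is written as *).
rows = [
    ("être en *", 14868, 87),
    ("la prochaine *", 4303, 23),
    ("participer à la *", 3806, 23),
    ("la * a lieu", 3598, 30),
    ("la dernière *", 2396, 57),
    ("faire une *", 1908, 20),
    ("une * collective", 1263, 97),
    ("une * préparatoire", 767, 93),
    ("une * thématique", 741, 23),
    ("le calendrier de la *", 638, 87),
    ("une * consultative", 379, 87),
    ("convier une *", 293, 73),
]

frequent_rare, rare_often = flag_deviating(
    rows, freq_high=2648, freq_low=1016, score_low=30, score_high=73
)
print(frequent_rare)
print(rare_often)
```

With other pivots the thresholds would have to be recomputed from the frequency and score distributions of that pivot; how the quartiles were computed in the study is not specified, so they are taken as given here.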

2. Our second finding was that the degree of fixedness between the components of the collocation appeared to determine the usefulness perceived by the informants. A qualitative analysis of the frequency data and the native speaker data suggested that the informants tended to discard high-frequency but loose collocations (with a low Mutual Information score) and, conversely, to mark as important low-frequency collocations whose components were strongly associated, e.g. conférence téléphonique, which in fact has a one-word synonym, téléconférence. This observation supported the soundness of the method used in extracting our sample of potential core collocations, since associative measures were used to detect associations which would otherwise have been missed in the frequency count.

3. The third outcome of our study was the evaluation of the relative contribution of the different statistical measures to the automatic extraction of core collocations. For our corpus, which has some inherent bias due to the massive replication of similar content on the web, we claim that frequency and z-score are the best predictors – provided that there is a stoplist to eliminate function words. We also claim that dispersion is a key measure to reduce the corpus bias and eliminate localized phenomena. In some cases, dispersion proved to be the only valid measure to signal the presence of a wrong co-occurrence: in our sample, the co-occurrence fête (party) + tarte (cake) was selected by all measures (probably because the two words often occur in similar contexts) except by dispersion, which indicated that tarte was found only in one source.

Log-likelihood seems to have a certain power of prediction but is less accurate than the two measures mentioned above (frequency and z-score). This measure gave interesting results, such as rencontre + organiser or rencontre + lieu, and helped filter out some high-frequency combinations including words such as être and avoir, which are likely to be random matches due to their extended use as auxiliary verbs. In fact log-likelihood is considered as one of the most popular significance measures, and according to mathematical theory appears to be “the most appropriate and convenient measure” (Evert 2008). However, in our study this measure was surprisingly outperformed by raw frequency counts with regard to its ability to extract genuine co-occurrences. In particular, log-likelihood tended to select very frequent named entities such as François Bayrou (the name of a French politician), because the corpus reflects the contents or topics discussed on the Web during the period of the crawling process. Overall, this measure seems to work relatively well if associated with dispersion.

Conversely, Mutual Information and t-score do not appear to be useful in retrieving core collocations. These measures mostly extracted multiword units that were poorly dispersed, named entities, or bugs. Table 3 below shows the top co-occurrents extracted by the different measures for the pivot rencontre.

Table 3. The top 10 co-occurrents of rencontre by statistical measure.

MEASURE          SCORE INTERVAL   TOP 10 RESULTS
t-score          1.339–4.234      blasheim, udf-modem, goody, jacques, paris, bordeaux, clae, franche, gilles, averroèes
PMI              10.36–11.64      Blaesheim, goody, franche, jacques, bordeaux, averroès, udf-modem, méditerranneés, Gilles, nationale
Log-likelihood   7.268–18.932     organiser, lieu, paris, échange, françois, occasionnel, être, udf-modem, bayrou, clae
Frequency        2,979–18,786     être, avoir, lieu, organiser, faire, aller, premier, échange, site, national
z-score          35.34–72.72      lieu, organiser, échange, premier, aller, occasion, international, occasionnel, annoncer, national
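For reference, the associative measures compared in Table 3 can be computed from four counts, as in the sketch below. It assumes the commonly used definitions based on observed and expected frequencies (PMI as log2 of observed over expected, t-score and z-score as the difference scaled by the square root of the observed and of the expected frequency respectively, log-likelihood as the G² statistic over the 2×2 contingency table). The exact variants implemented in the Perl scripts, and the way the sample size was counted for a span of 4, are not specified in the text; the function name and the counts are hypothetical.

```python
import math

def association_scores(o11, f1, f2, n):
    """Association measures for a word pair from simple corpus counts.

    o11: observed co-occurrence frequency of pivot and collocate (must be > 0)
    f1:  frequency of the pivot
    f2:  frequency of the collocate
    n:   sample size
    """
    # Observed 2x2 contingency table.
    o12, o21 = f1 - o11, f2 - o11
    o22 = n - f1 - f2 + o11
    # Expected frequencies under independence.
    e11 = f1 * f2 / n
    e12 = f1 * (n - f2) / n
    e21 = (n - f1) * f2 / n
    e22 = (n - f1) * (n - f2) / n

    def term(o, e):
        # A cell with zero observations contributes nothing to the sum.
        return o * math.log(o / e) if o > 0 else 0.0

    return {
        "pmi": math.log2(o11 / e11),
        "t_score": (o11 - e11) / math.sqrt(o11),
        "z_score": (o11 - e11) / math.sqrt(e11),
        "log_likelihood": 2 * (term(o11, e11) + term(o12, e12)
                               + term(o21, e21) + term(o22, e22)),
    }

# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
scores = association_scores(o11=20, f1=100, f2=50, n=1000)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Because PMI divides by the expected frequency, a pair seen only a handful of times can score very high, which is consistent with the noise reported for this measure.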

Mutual Information and t-score yielded a lot of noise. Mutual Information compares the number of times two words co-occur in the corpus with the number of times the two words occur separately. It is known to work well with low-frequency collocations and to detect more specialised collocations. T-score is more useful in measuring the significance of an association and in detecting relatively frequent collocations which would get lost in the frequency count: quite obvious but not highly specialised pairs, including some colligations. In our corpus, these two measures selected low-frequency collocations and brought in considerable noise, retrieving a good percentage of named entities. For instance, the top co-occurrent by Mutual Information is udf-modem (the name of a French political party), which co-occurs with rencontre 413 times. The dispersion value tells us that instances in which udf-modem and rencontre co-occur are poorly dispersed in the corpus because they feature in only two different domains.

We believe that the principal sources of noise in our corpus are information redundancy and the narrow timeframe of the crawling, 10 days. The two aspects are closely related. The news topics in the crawling timeframe were obviously very similar in terms of events reported and people mentioned. The same newspaper articles, in particular, are likely to feature in a number of different websites with slight edits (we
would like to remind the reader that the corpus was originally cleaned of any duplicated documents) and named entities therefore appear to have an overwhelming salience. If the sampling had taken place across a larger time window, the above sources of noise would have been considerably reduced. However, in our study we were able to reduce noise by using the dispersion filter.

In conclusion, we fully agree with Evert (2008) that “the suitability of an association measure also depends on many other parameters such as co-occurrence type, frequency threshold, language, genre, domain, corpus size, etc.”. Our own observations constituted a clear validation of this statement. A web-based corpus is a non-conventional source of data for various reasons: the duplication of information discussed above, the extra noise in the data deriving from (non-strictly) textual documents, the difficulty of sampling (when the editorial sources are not clearly identified), and the overrepresentation of certain subjects (because of news topics, for instance). When extracting statistical observations from this kind of corpus, the tools need to be fit for purpose, i.e. designed according to the particular structure of the corpus. For further details about the sample data, the informants of the study, and the test administered to the native speakers, see Benigno et al. (2015) and Benigno et al. (forthcoming).
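The dispersion filter and the selection rule of section 2.3 can be sketched together as follows. Several points are assumptions rather than a description of the original scripts: dispersion is approximated by the number of distinct web domains in which a pair occurs (our reading of the fête + tarte and udf-modem examples), only one associative measure (PMI, with the threshold of 3 from Church/Hanks 1990) stands in for the full set, and the minimum dispersion value, the function names, and all the data are hypothetical.

```python
from collections import defaultdict

def dispersion(hits):
    """Number of distinct web domains in which each (pivot, collocate) pair occurs.

    hits: iterable of (pivot, collocate, domain) tuples, one per concordance line.
    """
    domains = defaultdict(set)
    for pivot, collocate, domain in hits:
        domains[(pivot, collocate)].add(domain)
    return {pair: len(found) for pair, found in domains.items()}

def select_candidates(candidates, top_n=50, min_dispersion=3, min_pmi=3.0):
    """Drop poorly dispersed pairs, then keep the top_n pairs by frequency
    plus any lower-ranked pair whose PMI reaches min_pmi.

    candidates: list of dicts with keys "pair", "frequency", "dispersion", "pmi".
    """
    dispersed = [c for c in candidates if c["dispersion"] >= min_dispersion]
    ranked = sorted(dispersed, key=lambda c: c["frequency"], reverse=True)
    kept = ranked[:top_n]
    kept += [c for c in ranked[top_n:] if c["pmi"] >= min_pmi]
    return [c["pair"] for c in kept]

# Hypothetical concordance hits: fête + tarte comes from a single source.
hits = [
    ("fête", "tarte", "site-a.example"),
    ("fête", "tarte", "site-a.example"),
    ("fête", "organiser", "site-a.example"),
    ("fête", "organiser", "site-b.example"),
    ("fête", "organiser", "site-c.example"),
]
spread = dispersion(hits)
print(spread)

# Hypothetical candidate list with precomputed measures.
candidates = [
    {"pair": ("fête", "organiser"), "frequency": 900, "dispersion": 3, "pmi": 2.1},
    {"pair": ("fête", "tarte"), "frequency": 700, "dispersion": 1, "pmi": 6.0},
    {"pair": ("fête", "grand"), "frequency": 500, "dispersion": 40, "pmi": 1.0},
    {"pair": ("fête", "foraine"), "frequency": 30, "dispersion": 12, "pmi": 9.5},
]
selected = select_candidates(candidates, top_n=1, min_dispersion=2)
print(selected)
```

In this toy run the frequent but single-source pair fête + tarte is removed by the dispersion filter even though its PMI is high, while the infrequent but strongly associated fête + foraine is retained.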

3. Discussion: implications for the treatment of collocations in learner dictionaries

Our findings confirmed the original hypothesis that there is no perfect correlation between coreness and frequency and that usefulness is another important defining criterion in vocabulary selection – which in the case of collocations seems to depend on the degree of fixedness between the components of the collocation. We believe these findings are useful to inform the treatment of collocations in learner dictionaries. In the past few decades research in corpus linguistics has revolutionized the understanding of the role of multiword units in language. Most of
the currently available learner dictionaries claim to be corpus-based, e.g. the Collins Cobuild English Dictionary, the Longman Dictionary of Contemporary English, the Oxford Advanced Learner’s Dictionary, and the Cambridge International Dictionary of English. But to what extent is information about the frequency of collocations complemented by qualitative considerations about the usefulness of collocations? Although corpus linguistics and statistical approaches represent an invaluable resource for running complex analyses of large data sets in a much shorter time than would be possible by manual search, it is crucial that these methods are complemented by qualitative insight. In the case of lexicographic practice, judgements by lexicographers represent an invaluable source of qualitative input because they validate the reliability of a purely statistical approach and back up claims based on quantitative observations (see Giacomini in this volume).

The Longman Collocations Dictionary and Thesaurus is an example of such integration between corpus analysis and expert input. This dictionary is based on the Longman Corpus Network (LCN), a corpus used as the basis for all Longman dictionaries. LCN consists of about 450 million words of written and spoken data of British and American English. The Corpus consists of five sections: the Longman Written American Corpus, which consists of 100 million words of American newspaper and book text; the Longman Spoken American Corpus, which includes 5 million words of everyday American speech; the Spoken British Corpus, which is a section of the British National Corpus; the Longman/Lancaster Corpus, which consists of more than 30 million words; and the Longman Learners’ Corpus, which records learners’ productions. The written component of the corpus includes a variety of text types and genres, namely fiction, non-fiction, and news, and the subject categories range from Applied Science to Leisure to Academic English.
Texts were compiled between 1960 and 2007. In this dictionary, meanings of each entry are organized by part-of-speech and listed in order of frequency – based on data from LCN. Manual filtering by a team of expert lexicographers was required to filter genuine pairs among the collocations automatically extracted by means of frequency. Additionally, associative measures such as Mutual Information and t-score were used to identify significant pairs – which
were manually selected to make sure useful low-frequency collocations were not discarded. In Figure 1 we show the online entry for table.

Figure 1: Entry table.

From the above example, it is evident that the Longman Collocations Dictionary was compiled with a strong qualitative component. Every collocation is accompanied by a corpus example sentence to help the user contextualize the meaning. Collocations are grouped by meaning, which is consistent with current research evidence that vocabulary learning is a gradual process and that different word meanings are not acquired at once but rather gradually, as the learner’s proficiency increases and his/her encounters with a given word cover different contexts (Brent 2009). Additionally, for many other entries, thesaurus information is available, as for the entry ice in Figure 2:

Figure 2: Thesaurus information for the entry ice (noun).

We would also like to point out that the concept of collocation seems not to be interpreted in its narrow sense. Idiomatic units such as break your word feature in the entries (though sparingly) as well as units corresponding to functional categories, such as out of turn (see the entries break – verb – and turn – noun – below, in Figures 3 and 4 respectively).

Figure 3: Entry break (verb).

Figure 4: Entry turn (noun).

Pragmatic information is also provided through the indication of the degree of formality, as in the collocation conduct research – formal – (Figure 5) and through the indication of whether entries are British or American variants, as in the example of realize below (Figure 6).

Figure 5: Entry research (noun).

Figure 6: Entry realize (verb).

The above examples have shown the type of information provided to learners in the Longman Collocations Dictionary and Thesaurus to help them understand how to make appropriate use of vocabulary in English – with the intent to stress the importance of combining corpus-based or empirically-based methods with qualitative considerations by language experts. The treatment of meaning in learner dictionaries could certainly be refined to broaden users’ chances of making native-like use of language, for example by improving the information about the
contextual and pragmatic dimensions of language use; or by outlining what semantic or syntactic constraints operate in specific contexts; or by enriching entries with information about language use in relation to the speaker’s intention. The question of how to systematically tailor lexicographic entries to learners’ needs presents many challenges, and further research in this area is needed.

References

Aitchison, Jean 1987. Words in the mind: An introduction to the mental Lexicon. Oxford: Blackwell.
Bally, Charles 1951. Traité de stylistique française. Paris: Klincksieck.
Baroni, Marco et al. 2009. The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web Crawled Corpora. Language Resources and Evaluation. 43/3, 209–226.
Baroni, Marco et al. 2010. Web Corpora for Bilingual Lexicography: A Pilot Study of English/French Collocation Extraction and Translation. In Xiao, Richard (ed.) Using Corpora in Contrastive and Translation Studies. Newcastle: Cambridge Scholars Publishing.
Benigno, Veronica 2007. Il vocabolario di base: tratti costitutivi, rilevanza cognitiva e acquisizione in italiano L2. In Lo Cascio (ed.) Parole in rete. Torino: Utet Università, 115–128.
Benigno, Veronica / Grossmann, Francis / Kraif, Olivier 2015. Les collocations fondamentales: une piste pour l’apprentissage lexical. Revue française de linguistique appliquée. Le lexique: description et apprentissage. 20/1, 81–96.
Benigno, Veronica / Kraif, Olivier / Grossmann, Francis (forthcoming). La notion de collocation fondamentale. Une étude de corpus. Cahiers de Lexicologie.
Benigno, Veronica / Vedder, Ineke (forthcoming). Lexical richness and collocational competence in second-language writing. IRAL.
Boers, Frank et al. 2006. Formulaic sequences and perceived oral proficiency: Putting a lexical approach to the test. Language Teaching Research. 10, 245–261.
Brent, Wolter 2009. Meaning-last vocabulary acquisition and collocational productivity. In Fitzpatrick Tess / Barfield Andy (eds) Lexical Processing in Second Language Learners: Papers and Perspectives in Honour of Paul Meara. Bristol: Multilingual Matters (“Second Language Acquisitionˮ, vol. 39), 128–140. Carter, Ronald 1998. Vocabulary. Applied Linguistic Perspectives, London/New York: Routledge, 2nd edition. Church, Kenneth Word / Hanks, Patrick 1990. Word association norms, mutual information, and lexicography. Computational Linguis tics. 16, 22–29. COBUILD English Language Dictionary, 1987, London: Collins. Cowie, Anthony P. (ed.) 1998. Phraseology. Theory, Analysis, and Ap plications. Oxford: Clarendon Press. Cowie, Anthony P. 1999. English dictionaries for foreign learners: a history. Oxford: Clarendon Press. De Mauro, Tullio 1980/2003. Guida all’uso delle parole. Roma: Editori Riuniti. Durrant, Philip / Schmitt, Norbert 2009. To what extent do native and non-native writers make use of collocations? IRAL. 47, 157–177. Ellis, Nick C. 2002. Frequency effects in language processing. A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition. 24, 143–188. Ellis, Nick C. 2003. Constructions, chunking, and connectionism: The emergence of second language structure. In Doughty, Catherine J. / Long, Michael H. (eds) The handbook of second language acqui sition, Oxford: Blackwell, 63–103. Evert, Stefan 2008. Corpora and collocations. In Lüdeling, Anke / Kytö, Merja (eds) Corpus Linguistics. An International Handbook. Vol. 2. De Gruyter, 1212–1248. Fillmore, Charles J. 1982. Frame semantics. In Linguistics in the Morning Calm, ed. By The Linguistic Society of Korea. Seoul: Hanshin, 111–137. Firth, John Rupert 1957. A synopsis of linguistic theory, 1930–1955. In Firth, John Rupert et al. (eds) Studies in linguistic analysis, Spe cial volume of the Philological Society. Oxford, Blackwell, 1–32. 
Francis, Gill / Huston, Susan 2000. Pattern Grammar, a corpus-driv en approach to the lexical grammar of English. Amsterdam/ Philadelphia: John Benjamins.

Fuster Marquez, Miguel / Pennock Speck, Barry 2008. The spoken core of British English: a diachronic analysis based on the BNC. Miscelánea: a journal of English and American studies. 37, 53–74.
Gitsaki, Christina 1999. Second language lexical acquisition: A study of the development of collocational knowledge. Bethesda, MD: International Scholars Publications.
Goddard, Cliff / Wierzbicka, Anna 2007. Semantic primes and cultural scripts in language learning and intercultural communication. In Sharifian, Farzad / Palmer, Gary B. (eds) Applied Cultural Linguistics: Implications for second language learning and intercultural communication. Amsterdam/Philadelphia: John Benjamins, 105–124.
Granger, Sylviane / Paquot, Magali 2008. Disentangling the phraseological Web. In Granger, Sylviane / Meunier, Fanny (eds) Phraseology. An interdisciplinary perspective. Amsterdam/Philadelphia: John Benjamins, 27–50.
Grossmann, Francis / Tutin, Agnès (eds) 2003. Les collocations: analyse et traitement. Amsterdam: De Werelt.
Gougenheim, Georges 1958/1971. Dictionnaire fondamental de la langue française. Didier Edition international.
Halliday, Michael A. K. 1978. Language as Social Semiotic. The social interpretation of language and meaning. London: Arnold.
Halliday, Michael A. K. 1985. An introduction to functional grammar. London: Arnold, 2nd edition.
Halliday, Michael A. K. 1996. Lexis as a Linguistic Level. Journal of Linguistics. 2/1, 57–67.
Haygood, James Douglas 1937. Le vocabulaire fondamental du français. Etude pratique sur l’enseignement des langues vivantes. Genève: Librairie E. Droz.
Hoey, Michael 2005. Lexical priming. A new theory of words and language. London: Routledge.
Howarth, Peter A. / Nesi, Hilary 1996. The teaching of collocations in EAP. Technical report, Université de Leyde.
Hunston, Susan 2002. Corpora in applied linguistics. Cambridge: Cambridge University Press.

Kraif, Olivier 2011. Les concordances pour l’observation des corpus; utilité, outillage, utilisabilité. In Chuquet, Jean (ed.) Le langage et ses niveaux d’analyse. Presses Universitaires de Rennes, 67–80.
Laufer, Batia / Nation, Paul 1995. Vocabulary size and use: lexical richness in L2 written production. Applied Linguistics. 16, 307–322.
Laufer, Batia / Waldman, Tina 2011. Verb-noun collocations in second language writing: A corpus analysis of learners’ English. Language Learning. 61, 647–672.
Le Français élémentaire 1954. Ministère de l’éducation nationale, Centre National de Documentation Pédagogique, Paris.
Legallois, Dominique 2005. Du bon usage des expressions idiomatiques dans l’argumentation de deux modèles anglo-saxons: la Grammaire de Construction et la Grammaire des Patterns. In Bolly, Catherine / Klein, Jean René / Lamiroy, Béatrice (eds) La phraséologie dans tous ses états. Actes du colloque Phraséologie 2005, Louvain-la-Neuve, October 2005, Cahiers de l’Institut de Linguistique de Louvain. 31, 109–127.
Lo Cascio, Vincenzo 2000. La théorie des profils textuels et la compétence lexicale: les collocations. In Collès, Luc et al. (eds) Didactique des langues romanes: le développement des compétences chez l’apprenant, Actes du colloque de Louvain-la-Neuve, January 2000. Bruxelles: De Boeck/Duculot, 349–359.
Longman dictionary of contemporary English, 2013. London: Longman Pearson.
Manning, Christopher D. / Schütze, Hinrich 1999. Foundations of statistical natural language processing. Cambridge: MIT Press.
McCarthy, Michael 1990. Vocabulary. Oxford: Oxford University Press.
Meara, Paul Michael / Bell, Huw 2001. P_Lex: A simple and effective way of describing the lexical characteristics of short L2 texts. Prospect. 16/3, 5–19.
Mel’čuk, Igor 1998. Collocations and Lexical Functions. In Cowie, Anthony P. (ed.) Phraseology. Theory, Analysis, and Applications. Oxford: Clarendon Press, 23–53.
Nation, Paul 2001. Learning vocabulary in another language. Cambridge: Cambridge University Press.
Nation, Paul / Beglar, David 2007. A Vocabulary Size Test. The Language Teacher. 31/7, 9–13.

Nation, Paul 2006. How large a vocabulary is needed for reading and listening? The Canadian Modern Language Review. 63/1, 59–81.
Nattinger, James R. / DeCarrico, Jeanette S. 1992. Lexical Phrases and Language Teaching. Oxford: Oxford University Press.
Ogden, Charles Kay 1930. Basic English, a general introduction, with rules and grammar. London: K. Paul.
Paradis, Michel 2004. A neurolinguistic theory of bilingualism. Amsterdam: John Benjamins.
Pechoin, Daniel 1991. Thesaurus Larousse, Des mots aux idées, des idées aux mots. Paris: Larousse.
Richards, Jack C. / Savard, Jean Guy 1970. Les Indices d’utilité du vocabulaire fondamental français. Québec: Les Presses de l’université de Laval.
Sinclair, John 1991. Corpus, concordances, collocation. Oxford: Oxford University Press.
Stubbs, Michael 2002. Words and phrases: corpus studies of lexical semantics. Oxford: Blackwell Publishing.
Thorndike, Edward L. 1921. The Teacher’s Word Book. New York: Teachers College, Columbia University.
Tomasello, Michael 2003. Constructing a language. Usage-Based Theory of Language Acquisition. Cambridge, Massachusetts/London: Harvard University Press.
Vander Beke, G. E. 1929. French Word Book. Publications of the American and Canadian Committees on Modern Languages. 15. New York: The Macmillan Co.
West, Michael 1953. A General Service List of English Words. London: Longman.
Wray, Alison 2002. Formulaic language and the lexicon. Cambridge: Cambridge University Press.

Online References

Bonk, William J. 2000. Testing ESL Learners’ Knowledge of Collocations, .
Wacky, .

Francis Grossmann, Agnès Tutin

NOUN PREP NOUN collocations in French: the case of scientific lexicon¹

Abstract: In line with Explanatory and Combinatorial Lexicology (Mel’čuk 1998), and with continental European tradition (Hausmann 1989; Grossmann/Tutin 2003; Tutin 2013), we posit that collocations are recurrent binary associations of meaningful words, which have a syntactic and a semantic relation. They include two dissymmetric components (Hausmann 1989): the base (e.g. attention in pay attention) which works in an autosemantic way (the semantic meaning can be interpreted in isolation) and the collocate (e.g. pay in pay attention) which works in a synsemantic way (the semantic meaning is constructed in cooccurrence with the base). However, sometimes, it is not easy to draw the line between collocations and “full phrasemes”: it is the case for constructions such as NOUN PREP NOUN (e.g. cuiller à soupe, ‘tablespoon’). Due to their varying semantic and syntactic properties, these constructions constitute in the French language a real challenge for analysis, both in the field of general and specialized discourse. We focus here on cross-disciplinary scientific lexicon, i.e. lexicon dealing with methods, arguments, opinions and metadiscourse in scientific writing (e.g. hypothèse de travail ‘work hypothesis’ or cadre d’interprétation, ‘interpretative framework’), analyzed on the basis of a large corpus of scientific papers. The study examines in detail the criteria used to decide whether a NOUN PREP NOUN construction is or is not a collocation by carrying out a case study on nominal collocates associated with prototypical nouns in scientific lexicon. Candidate collocations were extracted from our corpora and the list of co-occurrences examined in order to classify them into different types on the basis of a combination of semantic and syntactic criteria: the semantic status of Noun 1 or Noun 2, the presence or absence of a PP argument and the syntactic status of Noun 2, the role played by the preposition, the determiners, and the grammatical number specification. 
By crosschecking the different criteria, our study allows five types to be distinguished: a) objective genitive constructions, b) subjective genitive constructions, c) predicative structures, d) specification structures, and e) classification structures. The results seem to indicate that predicative and specification structures establish particularly favourable conditions for the emergence of collocations, although other factors – lexical or pragmatic – may also be involved.

Keywords: Noun Phrase Construction, scientific lexicon, corpus-based study, collocations

¹ Special thanks to colleagues involved in this project: Olivier Kraif, Thi Thu Hoai Tran, Sylvain Hatier and Emmanuelle Dusserre.

1. Introduction

Among collocations in French, NOUN PREP NOUN constructions (hereafter N PREP N constr) such as principes de base (‘basic principles’) or champ d’étude (‘field of study’) are especially frequent, and the same is true of this structure in English.² This is particularly the case in the cross-disciplinary scientific lexicon pertaining to scientific activity, scientific notions, scientific reasoning, and scientific discourse. However, delimiting collocations of this structure can be a complex task for linguists and lexicographers because of its polyfunctional nature. In this study, we assume that a better understanding of the syntactic and semantic properties of N PREP N collocations is crucial for helping linguists in lexicographic treatment. We believe that a typology of constructional frames likely to produce collocations is a useful tool for the lexicographer. After defining our conception of collocation, inspired by the continental tradition, we outline the specific lexical field of our study, i.e. the cross-disciplinary scientific lexicon, as well as our corpus-based methodology and the collocation extraction method used. We then discuss problems raised by delimiting cross-disciplinary scientific collocations. We finally outline a classification of N PREP N constructional frames related to collocations in our field.

² It is nevertheless important to point out that a fundamental difference between English and French lies in the fact that in English the Noun 2 Noun 1 construction represents a serious competitor to forms in Noun 1 of Noun 2 (see Rossi et al. forthcoming).

2. A definition of collocation

Collocations now constitute a well-known subtype of multiword expressions. However, several competing definitions of this notion exist, among which two main trends stand out (Williams 2003), namely the British tradition and the continental tradition. In the framework of functional linguistics, the British tradition often defines collocations on the basis of usage-based and statistical criteria: they are simply sequences of words which tend to co-occur, as formulated by Sinclair (1991).

Collocation is the occurrence of two or more words within a short space of each other in a text … Collocations […] can be important in the lexical structure of the language because of being frequently repeated. This second kind of collocation, often related to measure of statistical significance, is one that is usually meant in linguistic discussions (Sinclair 1991: 170).

The continental tradition of phraseology, inspired by several sources, inter alia Bally (1909 [1951]) and the Russian tradition (e.g. Vinogradov 1947; Zolkovskij/Mel’čuk 1967; Cowie 1998), is part of lexicology and provides fine-grained typologies of multiword expressions (MWEs). In this line of research, definitions of collocations use formal criteria: they are broadly defined as recurrent binary associations of meaningful words, which have a syntactic and a semantic relation. They include two dissymmetric components (Hausmann 1989; Heid 1994): the base (e.g. attention in pay attention) which works in an autosemantic way (the semantic meaning can be interpreted in isolation) and the collocate (e.g. pay in pay attention) which works in a synsemantic way (the semantic meaning is constructed in co-occurrence with the base). We endorse this approach, whose most eminent representatives are Hausmann (1989) and Mel’čuk (1998). Following previous work (mainly Tutin 2010), we propose a set of criteria for defining collocations:

1. Collocations are associations which typically involve two semantically full linguistic elements. We thus exclude associations between a full element and a grammatical word, such as give and to in he gave a book to Laura. But we include collocations such as out of love in she committed the crime out of love, where out of has a causal meaning.
2. Most collocations are binary collocations and most ternary collocations such as pay close attention can be analyzed as merging binary collocations such as pay attention + close attention or as recursive collocations such as mettre (en colère) where en colère is both a collocation and the base associated with mettre.
3. Collocations are involved in specific syntactic relations such as subject-verb, verb-complement, noun-adjective, adverb-adjective, etc. which can be represented with the help of dependency relations such as in Tesnière’s model (1959). Collocations are endocentric multiword expressions which generally follow usual syntactic rules. Some collocations may nevertheless have specific syntactic constructions such as Adj Adj constructions (e.g. ivre mort, lit. ‘drunk dead’) or N N (amour passion, lit. ‘love passion’). They may also involve specific constructions associated with specific meanings, such as similes (e.g. Adj comme N, heureux comme un roi) which generally have an intensive meaning, or color adjectives with the pattern Adj N (e.g. bleu roi, vert pomme, gris souris).
4. As regards semantics, collocations have the following properties:
   a. They are fully or largely compositional: one element of the collocation, the base, retains its usual meaning. The other part, the collocate, can be a) regular (it applies to a large set of elements, e.g. grand in grand travailleur), b) transparent, if it is easy to decode, but unpredictable (e.g. faim de loup, lit. ‘wolf hunger’ but appétit d’ogre, lit. ‘ogre appetite’), or c) opaque, when it is not easily predictable (e.g. peur bleue, lit. ‘blue fear’ for ‘intense fear’) (for a semantic typology of collocations, see Tutin/Grossmann 2002; Grossmann/Tutin 2003). Regular collocations seem to be by far the most numerous.
   b. Collocations can generally be analyzed as predicate-argument relations (Tutin 2013). The predictable element, the base, is the argument of the relation, whereas the unpredictable element, the collocate, is a predicate. In peur bleue, the collocate bleue intensifies peur; in drive crazy, drive is the causative predicate of the argument crazy.
5. As regards usage, the constituent elements of collocations form frequent combinations which show a strong mutual attraction, often computed with the help of association measures (see Evert 2008, for an overview on this topic). While the meaning of the collocation is generally transparent, the collocates are quite arbitrary to encode. Non-native speakers generally understand them but may be unable to produce them. This is why collocations are a major issue in language teaching and learning, as reflected in the large number of dictionaries of collocations dedicated to second language acquisition (cf. Nesselhauf 2005). These multiword expressions must be learnt and stored by non-native speakers.

In the next section, we present the framework of this linguistic study – the semantic field of cross-disciplinary scientific lexicon, already studied in English, e.g. in the Academic Word List (Coxhead 2002) or the Academic Vocabulary (Paquot 2010).
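The base/collocate asymmetry and the three semantic types of collocate described above can be pictured as a small data structure. The following Python sketch is only an illustration under our own assumptions: the class and field names, the relation labels, and the choice to index entries under the base are hypothetical and are not taken from any resource cited in this chapter. The three examples and their types are those given in criterion 4a.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class CollocateType(Enum):
    """Semantic type of the collocate (criterion 4a above)."""
    REGULAR = "regular"          # applies to a large set of bases
    TRANSPARENT = "transparent"  # easy to decode, but unpredictable
    OPAQUE = "opaque"            # not easily predictable


@dataclass(frozen=True)
class Collocation:
    base: str        # autosemantic element, keeps its usual meaning
    collocate: str   # synsemantic element, interpreted relative to the base
    relation: str    # syntactic relation (hypothetical label)
    collocate_type: CollocateType

    def entry_key(self) -> str:
        # Assumption: the collocation is listed under its base.
        return self.base


# Examples taken from the criteria above; relation labels are illustrative.
EXAMPLES = [
    Collocation("travailleur", "grand", "noun-adjective", CollocateType.REGULAR),
    Collocation("faim", "loup", "noun-prep-noun", CollocateType.TRANSPARENT),
    Collocation("peur", "bleue", "noun-adjective", CollocateType.OPAQUE),
]


def index_by_base(collocations):
    """Group collocates under the base they combine with."""
    index = defaultdict(list)
    for c in collocations:
        index[c.entry_key()].append(c.collocate)
    return dict(index)


if __name__ == "__main__":
    print(index_by_base(EXAMPLES))
```

Indexing by the base reflects the idea that the base is chosen freely while the collocate has to be looked up; a real lexical database would of course need far richer information.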

3. Framework of the study: cross-disciplinary scientific lexicon

This project is part of the ANR Termith project³ – Terminologie et Indexation de Textes en Sciences Humaines (‘Terminology and Indexing in Human Sciences’) – aimed at developing lexical resources to facilitate the automatic indexing of terms in the human sciences. In order to make this process easier, non-terminological resources of cross-disciplinary scientific lexicon are used (see Jacquey et al. 2013). This general scientific lexicon is made up of notions which pertain to scientific activity, scientific notions, scientific reasoning, and scientific discourse (see Pecman 2004; Paquot 2010; Tutin 2014), e.g. words and expressions such as hypothesis, notion, analyze, encouraging, to a certain extent, contrary to our expectations, take into account, etc. In the Termith project, this lexicon is mainly used in two ways:

³ .

1. As an exclusion lexicon for term candidates. For example, the word conclusion would not be a likely term candidate in expressions such as draw a conclusion, where it is included in a cross-disciplinary collocation, used in any type of scientific text.
2. As term introducers. Jacquey et al. (2013) observed that verbs of cross-disciplinary lexicon, and especially some semantic classes, were more likely to introduce terms. For example, verbs of the ‘definition’ semantic class (define, call) often co-occur with terminology, e.g. we define strong collocations as collocations which have a hyponymic relation with the head noun of the expression.

We also wish to use this lexicon for pedagogical applications, mainly in the teaching of French for Academic Purposes. Currently, a lexicon of 1246 single words of cross-disciplinary scientific lexicon has been extracted from a large corpus of scientific articles (Hatier 2013), for the following categories: noun, verb, adjective, and adverb with cross-category semantic labels. In addition to single words, resources are also being developed for different kinds of phraseological expressions (Tutin 2014). Thus, we distinguish several kinds of multiword expressions, depending on their syntactic, semantic, pragmatic, and rhetorical functions, mainly:

1. Referential expressions, which are related to scientific activity, scientific objects, or scientific reasoning, among which
   a. Collocations such as faire une hypothèse, lit. ‘make a hypothesis’ or jouer un rôle ‘to play a role’
   b. Frozen non-compositional expressions (“full phrasemes” according to Mel’čuk’s terminology) such as tenir compte ‘to take into account’ or point de vue ‘point of view’
2. Discursive multiword expressions, which aim at linking textual parts and are mainly structural and metatextual markers (to begin with, in other words) and logical connectives such as that is why (see Tran 2014, for a study of reformulation discourse markers in scientific articles).
3. Attitudinal multiword expressions, which are mainly related to stance (e.g. jusqu’à un certain point ‘to a certain extent’) and modality (e.g. deontic modality, il est nécessaire de ‘it is necessary to’) (for a typology of attitudinal MWEs, see Grossmann et al. forthcoming).
4. Semantico-rhetorical routines, which are fully specific to the scientific genre. They are associated with rhetorical functions such as comparison with peers or definition of the main research objective.

In this article, we focus only on collocations which involve single words from the cross-disciplinary scientific lexicon, i.e. words belonging to the list of 1246 single words (see Hatier 2013). For this task, we used a corpus developed in the framework of the Scientext project and extended by Tran (2014) and Hatier (2013), which includes a set of 500 scientific articles from leading French scientific journals in the human sciences, covering 10 disciplines.⁴

⁴ Economics, sociology, political science, education, information science, psychology, linguistics, history, geography, and anthropology.

For collocation extraction, we used a corpus tool developed by Kraif/Diwersy (2014): the Lexicoscope. This tool extracts combinatory profiles from syntactically analyzed corpora (XIP, for our corpus), similar to the collocation extraction techniques developed by Seretan (2011) and Kilgarriff et al. (2010), but it also enables the extraction of complex collocations and the use of parameters such as frequency threshold or type of syntactic relation (Kraif/Diwersy 2014). It is thus possible to select specific syntactic relations, for example nouns which are complements of nouns, with a minimum frequency of 7 occurrences, a log-likelihood ratio of at least 10.7, and a distribution in at least 3 disciplines. Fig. 1 shows the most significant nominal co-occurrences of the noun résultat (as a head). With our criteria, 15 expressions were extracted. But, obviously, once the expressions are extracted, they must be examined carefully in order to decide which ones can be considered as cross-disciplinary collocations. Naturally, candidates resulting from analysis errors must be discarded. For example, the collocation résultat de DET présent should not be retained because it rests on a morpho-syntactic error: présent is analyzed as a noun instead of an adjective, as in the following example:

(1) Les résultats de la présente étude suggèrent que les relations entre ces deux niveaux d’information ne sont pas déterminées une fois pour toutes. [‘The results of this study suggest that the relations between these two levels of information are not fixed once and for all’] (Psychology).

Among our expressions, we wanted to differentiate those that could clearly be considered as cross-disciplinary scientific collocations from those that were more clearly phrasemes (e.g. discourse analysis) or belonged to a specific discipline. We also wanted to examine the semantic and syntactic relationship between the elements of the NOUN PREP NOUN collocations. These issues are addressed in the following two sections.

Fig. 1. Extraction of NOUN PREP NOUN collocations with the noun résultat with the help of the Lexicoscope.
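The three filtering criteria mentioned in this section (minimum frequency, log-likelihood ratio, and distribution across disciplines) can be sketched as follows. This is a minimal illustration, not the Lexicoscope implementation: the function and class names are hypothetical, the log-likelihood ratio is computed with the standard 2×2 contingency-table formula (see Evert 2008), which may differ in detail from the variant the tool actually uses, the thresholds are treated as inclusive, and all frequencies in the usage example are invented.

```python
import math
from dataclasses import dataclass


def log_likelihood_ratio(o11: int, f1: int, f2: int, n: int) -> float:
    """Log-likelihood ratio (G2) for a word pair.

    o11: co-occurrence frequency of the pair
    f1, f2: marginal frequencies of the two words in the relation
    n: total number of pairs observed (sample size)
    """
    # Observed 2x2 contingency table
    observed = [o11, f1 - o11, f2 - o11, n - f1 - f2 + o11]
    # Expected frequencies under the independence hypothesis
    expected = [
        f1 * f2 / n,
        f1 * (n - f2) / n,
        (n - f1) * f2 / n,
        (n - f1) * (n - f2) / n,
    ]
    # Cells with an observed count of 0 contribute nothing to the sum
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


@dataclass(frozen=True)
class Candidate:
    head: str               # N1, e.g. "résultat"
    dependent: str          # N2
    freq: int               # co-occurrence frequency
    head_freq: int          # frequency of the head in the relation
    dep_freq: int           # frequency of the dependent in the relation
    disciplines: frozenset  # disciplines in which the pair is attested


def keep(c: Candidate, n: int, min_freq: int = 7,
         min_llr: float = 10.7, min_disciplines: int = 3) -> bool:
    """Apply the three thresholds (assumed inclusive) to one candidate."""
    return (
        c.freq >= min_freq
        and len(c.disciplines) >= min_disciplines
        and log_likelihood_ratio(c.freq, c.head_freq, c.dep_freq, n) >= min_llr
    )


if __name__ == "__main__":
    # Invented counts, for illustration only
    cand = Candidate("résultat", "analyse", 30, 40, 50,
                     frozenset({"linguistics", "psychology", "sociology"}))
    print(keep(cand, n=10_000))
```

Such a filter only narrows down the list of candidates; as the text stresses, the retained pairs still have to be examined manually.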

4. Delimiting collocations

We believe that it is useful to differentiate collocations from other kinds of expressions, for two main reasons. First, as compositional expressions, collocations are encoded within the lexical entries of their elements, and some semantic relations between the elements, such as Lexical Functions (Mel’čuk 1998), can be used. From a practical viewpoint, dictionaries of collocations use single words as lexical entries (e.g. Dictionnaire des combinaisons de mots 2007, Oxford Collocations Dictionary for Students of English 2002), while general dictionaries usually list multiword expressions (e.g. pomme de terre) as entries in their own right. Second, for pedagogical purposes it is important to highlight the specificity of collocations, which are especially easy to understand but hard to produce and are particularly tricky for second language learners (Nesselhauf 2005).

Delimiting collocations from other kinds of expressions is a challenging task. In particular, collocations can be confused with “full phrasemes”. In a recent study on the annotation of multiword expressions in several types of texts (Tutin et al. 2015), we observed several kinds of interannotator disagreement between collocations and full phrasemes, due to specific syntactic problems (lack of determiner, for example) or to hyperonymic expressions: is cuiller à soupe (‘tablespoon’), for example, a collocation, since it is partially compositional, or is it a phraseme? We wanted to examine in detail the criteria used to decide whether a multiword expression is a collocation by carrying out a case study on collocations associated with prototypical nouns in scientific lexicon, pertaining to different semantic classes (such as analyse, cadre, champ, dimension, évaluation, facteur, hypothèse, mesure, méthode, résultats, pertinence, importance). Candidate collocations were extracted from our corpora with the Lexicoscope tool as presented in section 3. This list of co-occurrences was examined in order to refine our criteria for analyzing and describing collocations.

4.1 Expressions which are not collocations

Our extraction tool automatically extracts co-occurrences with a set of criteria (freq > 7; 3 disciplines out of 10; log-likelihood ratio > 10.7) but among the extracted elements, some are clearly not interesting for the task at hand.
As highlighted by Evert (2008), extraction results often require tedious manual processing, since automatic extraction is unable to distinguish between collocations and full phrasemes, for example.⁵ Among our co-occurrences, we obviously discarded N PREP N constr related to specific disciplinary fields such as activité de production (‘production activity’) or capacité de stockage (‘storage capacity’). Other than that, three main types of “non collocations” occurred: free combinations, expressions included in larger expressions, and full phrasemes.

⁵ Unless full phrasemes are directly encoded as single lexical units in our annotated corpus. This is the case for some prepositions (e.g. au fur et à mesure) but for very few nominal expressions. Verbal phrasemes are not analyzed with XIP.

4.1.1 Free combinations

As highlighted by Hausmann (1989), it is often difficult to distinguish free combinations from more routinized collocational expressions. In our project, we are guided by practical considerations: are the combinations typical of scientific writing and useful, e.g. for pedagogical purposes? If they are too common in all kinds of combinations, they are not included. For example, as a general rule, in our list of noun-adj collocations we do not include adjectives such as ordinals (première hypothèse,⁶ dernière hypothèse) or adjectives such as autre (‘other’) or différent (‘different’) which are very frequent and for which there are almost no combinatory restrictions. For the same reason, we exclude nouns such as différence (‘difference’) with a dependent prepositional noun phrase: différence de résultats (lit. ‘difference of results’), différence d’interprétation (lit. ‘difference of interpretation’), différence de performance (lit. ‘difference of performance’), etc., although deadjectival property nouns can sometimes be the syntactic head of a collocation, when N1 has a more specific meaning (see 5.2.1).

4.1.2 Subparts of larger multiword expressions

As our extraction method is fully automatic, it extracts co-occurrences of two words which may occur in larger expressions. Detecting such cases cannot easily be fully automated, although the extraction tool Lexicoscope also includes a complex collocation extraction device (cf. Kraif et al. 2014). The observation of some co-occurrences in concordances and our knowledge of expressions enabled us to discard several kinds of associations included in other expressions.
For example, large mesure (‘large extent’) is almost always included in the adverbial expression dans une large mesure (‘to a large extent’), as is the expression mesure du possible (‘possible extent’) in the adverbial expression dans la mesure du possible (lit. ‘to the possible extent’). The problem is even more complex when a collocation such as cadre d’analyse (lit. ‘frame of analysis’) seems to be integrated in phrases such as dans le cadre de notre analyse, which are part of semantico-rhetorical routines (see section 3 above).

⁶ In a postnominal position, however, premier (which means ‘main, crucial’) has a very specific meaning: cause première, objectif premier.

4.1.3 Nominal phrasemes

Distinguishing phrasemes from collocations is a bit more complicated, since nominal phrasemes in our semantic field are rarely completely non-compositional. It is widely known that collocations and phrasemes can be distributed along a continuum (see, for example, Gabrielatos 1994; Dubreuil 2008) and this is especially true in our semantic field. Prototypical phrasemes such as point de vue (‘point of view’) or cahier des charges (‘tender specifications’) are very rare. We considered expressions as phrasemes when the meanings of their components were not sufficient to derive the meaning of the whole compositionally. Such examples include cadre théorique (‘theoretical framework’), cadre conceptuel (‘conceptual framework’), or analyse de discours (‘discourse analysis’) which refer to specific concepts in scientific writing. The expression analyse de discours refers to a specific analytical method, borrowed from linguistics and widely used in the human sciences. Consequently, the notion of analyse du discours, contrary to expressions such as analyse quantitative or analyse qualitative, does not constitute a subtype of analysis but rather a disciplinary subfield, or a scientific methodology used in the human sciences. For these reasons, we consider that this expression is generally more a phraseme than a collocation. However, the expression cannot be considered as being completely non-compositional since the meaning of analyse remains transparent and accessible in the expression.
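The observation made in 4.1.2, namely that a pair such as large mesure occurs almost exclusively inside dans une large mesure, can be supported by a simple count over concordance lines. The sketch below is an assumption-laden illustration and not part of the method used in the study: the function names are hypothetical, matching is done on surface tokens rather than on lemmas or dependency relations, and the three sentences are invented.

```python
def find_all(tokens, pattern):
    """Return the start indices where `pattern` occurs in `tokens`."""
    k = len(pattern)
    return [i for i in range(len(tokens) - k + 1) if tokens[i:i + k] == pattern]


def embedded_share(sentences, candidate, larger):
    """Proportion of `candidate` occurrences located inside `larger`."""
    total = 0
    embedded = 0
    for tokens in sentences:
        # Token spans covered by the larger expression in this sentence
        larger_spans = [(i, i + len(larger)) for i in find_all(tokens, larger)]
        for start in find_all(tokens, candidate):
            end = start + len(candidate)
            total += 1
            if any(s <= start and end <= e for s, e in larger_spans):
                embedded += 1
    return embedded / total if total else 0.0


# Invented concordance lines, tokenized on whitespace
SENTENCES = [
    "cela dépend dans une large mesure du contexte".split(),
    "nous suivons dans une large mesure cette méthode".split(),
    "une large mesure de prudence reste nécessaire".split(),
]

if __name__ == "__main__":
    share = embedded_share(SENTENCES, ["large", "mesure"],
                           ["dans", "une", "large", "mesure"])
    print(f"{share:.2f}")
```

A high share suggests that the pair should be treated as a subpart of the larger expression, but the decision remains a matter of linguistic judgement, as the case of cadre d’analyse shows.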

5. N PREP N constr of cross-disciplinary scientific collocations

5.1 Criteria for a N PREP N constr classification

In the N PREP N constr, the relations between N1 and N2 need to be carefully examined, in order to choose the precise way in which to identify and sort the various cross-disciplinary collocations. Several criteria must be taken into account, and we would like to highlight four of them that seem particularly relevant to our perspective:

a) The semantic characterization of N1: is it predicative or not predicative? And if it is predicative, could a sub-categorization be necessary? If it is not predicative, what kind of nouns are found?
b) The semantic characterization of N2, and the semantic relation between N1 and N2: even if at first sight the semantic characterization of N1 seems more useful for our purposes than that of N2, the analysis of the latter is closely linked with that of the former.
c) The presence or absence of a PP argument and the syntactic role played by the Prep N2: is this an argument for a predicative N1 or not? Are the differences observed between the different kinds of argumental structures significant for the “collocational status” of the N PREP N constr?
d) The role played by the preposition, the determiners, and the grammatical number: this relates in particular to the presence or absence of a determiner before N2, and the type and order of the determiners (e.g. definite determiner with N1, indefinite determiner with N2, etc.): this is a very complex issue, which deserves to be studied in its own right; only a few remarks about this question will be presented here.

It is important to keep in mind that the different characteristics involved interact and therefore cannot be viewed in isolation. The next subsection briefly discusses some of their implications. 5.2 Semantic and syntactic constraints in N PREP N constr 5.2.1 The semantic status of N1 or N2 Some aspects related to criteria a and b (see 5.1. above) are considered in the present subsection. While the semantic status of N1 or N2 is important for classification purposes, it should be noted that this largely depends on the general characteristics of scientific discourse, which requires many abstract concepts often expressed by predicative

NOUN PREP NOUN collocations in French: the case of scientific lexicon

283

or ‘ideality’ nouns (see below). Both in N1 or in N2 position, two main types of predicative nouns are often used in this context, namely: –

–

(deverbal) activity nouns, and more specifically nouns that express a mental process or a language activity, e.g. évaluation de la performance (‘performance appraisal’); cadre d’interprétation (‘interpretative framework’) (deadjectival) property nouns or quality nouns, e.g. pertinence de l’analyse (‘relevance of the analysis’).

In N1 position, these predicative nouns entail a predicate-argument structure (see 5.2.2) whose collocational status needs to be checked. Another semantic category, also found in N1 or N2 position in our corpus, includes nouns of ‘mental objects’ (or ‘ideality nouns’), e.g.: l’hypothèse de départ (‘initial hypothesis’); la pertinence de l’hypothèse (‘the relevance of the hypothesis’). This class was established by Flaux (2012) and Flaux/Stosic (2014), following Husserl, to explain the linguistic behavior of nouns such as sonata, poem, engraving, and theorem.7 The subcategory of ‘free idealities’ (see note 7) has been further divided into the logical, mathematical, and symbolic subclasses, on the basis of syntactic and modal properties. In scientific writing, nouns such as théorie (‘theory’), hypothèse (‘hypothesis’), postulat (‘postulate’), and modèle (‘model’) seem to denote ‘free idealities’ of a logical type, but their syntactic behavior is not identical: for example, hypothèse can take a complement clause (l’hypothèse que … ‘the hypothesis that’) or a pseudo-relative (l’hypothèse selon laquelle…) whereas théorie only accepts pseudo-relatives (la théorie selon laquelle…). Unlike mathematical free idealities such as nombre (‘number’), logical free idealities can have a truth value, which explains the possibility of collocations such as cette hypothèse est fausse (‘this hypothesis is false’).

7 Such nouns “refer to those objects that are endowed with a spiritual content supposed to be interpreted by humans” (Flaux/Stosic 2014: 127). In this class, the authors make a distinction between nouns denoting “free idealities” (e.g. theorem, number, triangle) and nouns denoting “bound idealities” (e.g. symphony, novel, painting), on the basis of the fact that the latter, unlike the former, are instantiated in space and time.


There is another interesting class of nouns in which, semantically, N1 plays the role of a general ‘classifier’8 that indicates the frame or the perspective in which the meaning of N2 is to be considered: for example, cadre d’analyse (‘framework of analysis’) is the scientific frame in which an analysis is constructed. These nouns (coded as ‘espace_domaine’ in the Termith lexical database), such as cadre, champ (‘field’), domaine (‘domain’), generally have a spatial (metaphorical) meaning, with a very broad and fuzzy definition. Of course other common types must also be added to these classes, such as human (agent) nouns, which can appear in the context of cross-disciplinary scientific N PREP N collocations: for example, in the collocation évaluation par les pairs (‘peer review’), N1 is a deverbal activity noun and N2 denotes a [+HUMAN] agent. In conclusion, the semantic categorization of N1 or N2, as interesting as it may seem for linguistic analysis, does not provide sufficient input to define the collocational status of N PREP N constr. Let us now focus rather on the role of the argumental structure.

8 ‘Classifier noun’ is here used in the meaning of a noun with a categorizational function: if I speak of ‘the unemployment problem’, I present unemployment as a problem.

5.2.2 The presence or absence of a PP argument and the syntactic status of N2

The presence or absence of a PP argument is closely linked to the semantic status of N1 since predicative – often deverbal or deadjectival – nouns have an argumental structure (cf. above 5.2.1., examples such as évaluation de la performance or pertinence de l’analyse). In the first case, N2 is the object of the deverbal noun whereas, in the second, N2 is the subject argument of a deadjectival noun. This kind of argumental structure is very common, and it is crucial for us to know whether such constructions could be analyzed as collocations rather than free combinations (see 4.1.1.). Viewed in narrow terms, their inclusion is highly debatable. But, both from the continental perspective of phraseology9 and from the contextual statistical British perspective, provided they are frequent – which is the case with our selection criteria (see section 4) – they often do deserve collocational status. It should also be taken into account that the predicative N1 can have a greater or lesser degree of specificity: with frequent but specific combinations, in view of the practical implications these can have for second language learning, it seems appropriate not to exclude them. When the N PREP N constr does not have an argumental PP, what could its syntactic role be? There are two main possibilities:

a. N2 is the subject of an (implicit) attributive structure that marks its identity with N1: in an expression such as le principe d’uniformité (‘uniformity principle’), the N2 uniformité is identified as a principle. These combinations are sometimes analyzed as appositives.10 Our hypothesis is that in this kind of N PREP N constr, in which the identity judgment closely binds the N1 to the N2, a lot of good candidates for collocation status – or sometimes for phraseme status – are to be found.
b. N2 plays the role of modifier: in the cross-disciplinary context of our study, N1 is generally an abstract noun. We can find examples such as hypothèse de départ (‘initial hypothesis’) or hypothèse de travail (‘working hypothesis’) which semantically seem to operate in the same way as modifiers, with a descriptive value: an initial hypothesis is an assumption which is used as the starting point for a study and a working hypothesis a hypothesis accepted, during the groundwork, as the basis for further research. These modifiers are descriptive elements, which are not syntactically or semantically indispensable but, in combination with the N1, some of them (for example the two quoted above) are lexicalized and play a role in the scientific epistemology, in the same way as noun-adjective collocations (analyse descriptive ‘descriptive analysis’, analyse longitudinale ‘longitudinal analysis’).

9 Cf. examples such as arrêt du combat (‘cessation of fighting’), réalisation d’un désir (‘desire fulfilment’), témoignage d’admiration (‘expression of admiration’) in Dictionnaire explicatif et combinatoire du français contemporain (1984, 1988, 1992, 1999).

10 Today this categorization (as ‘appositives’) is rejected, for this kind of N PREP N constr, by a majority of French linguists (see for example Noailly 2000).


5.2.3 The role played by the preposition, the determiners, and the grammatical number specification

The role of the preposition in N PREP N constr, and especially the role of de,11 has already been discussed at length in the available literature. For the French language,12 several authors (Cadiot 1993; Bartning 1996; Rouget 2000; Marque-Pucheu 2008, among many others) have pointed out, with regard to constructions with the preposition de, that some general principles can be established to explain the semantic relation between N1 and N2, in accordance with their respective semantic and syntactic properties. In this paper, we will limit ourselves to the relations that can be really productive given the cross-disciplinary scientific field of collocations under examination. For example, the very specific category of “binominals” with an evaluative function – which has been extensively documented in both English (Quirk et al. 1985; Aarts 1998) and French (Milner 1978; Gaatone 1985) and which includes French equivalents of constructions such as: a hell of a problem, a wonder of a city, that idiot of a prime minister – does not seem useful for our purposes, considering that this type is very rare in academic or scientific discourse. We also exclude well known types such as pseudo-partitive constructions, e.g. quantifier-noun constructions (a number of people), measure-noun constructions (a pint of beer), container-noun constructions (box of chocolates), and collective classifiers (a herd of elephants), also well documented in French (Van de Velde 1995; Flaux / Van de Velde 2000). While all these constructions can be found in scientific papers, they are not productive in the field of cross-disciplinary collocations, except perhaps for some quantifier-noun constructions with série (une série de questions ‘a series of questions’, une série d’enquêtes ‘a series of surveys’, etc.) or ensemble (ensemble de facteurs ‘a combination of factors’, ensemble de propriétés ‘a set of properties’, etc.). For the same reasons, we also excluded certain other types that are very useful in academic papers: the Sort/Kind/Type-constructions (SKT) (with N1 such as sorte de, type de [‘sort of’, ‘type of’…]). Many examples of this construction can be found in scientific discourse, see example 2, found in our corpus:

(2) Les « théories of the middle range », selon l’expression de Merto, caractérisent le type de modélisation défendu par Fararo, Coleman, Karlsson et Simon. [‘According to Merto’s expression, “Theories of the middle range” characterize the type of modeling defended by Fararo, Coleman, Karlsson, and Simon’] (Social studies).

11 See the historical study by Englebert (1992).

12 For the Italian language, see Masini (2010).

In some cases, these SKT constructions could have an identification function (see example 2). But they frequently play a very different role: they express approximation and are used “to identify a marginal and/or unstable element with respect to the category N2” (Masini 2010: 9). In academic discourse, the latter are often used to hedge (Hyland 1998), as seen in example 3:

(3) […] les modèles évoqués (ici à l’état de projet) constituent une sorte de “fiction surveillée” […]. [‘The models mentioned (here at the draft stage) are a kind of “monitored fiction” […]’]. (Anthropology).

It could be worthwhile to take these constructions into account in a specific study, but in light of their high degree of generality, they do not match the objectives of our particular study on cross-disciplinary nouns. Another interesting question is the role of determination. While the absence of determiner before N2, considered by some linguists (Berrendonner 1995; Benetti 2008) as a ∅ determiner, is a complex matter, it often seems – on the basis of syntactic tests13 – to operate like an intensional subcategory.14 However, in our corpus, there are many cases of N PREP N constr with a ∅ determiner before N2, which can hardly be seen as operating as a subcategory, although, in some ways, they do have a categorizing function. For example, in alternative collocations such as résultats d’enquête (‘survey results’) vs résultats de l’enquête (‘results of the survey’), the semantic difference due to the absence of determiner before N2 does not seem to entail a sub-categorization – see examples 4 and 5:

(4) Imaginons qu’on dispose de N résultats d’enquête, […] [‘in the event that we have N survey results’] (Political science)

(5) Cette publication présente les résultats d’une enquête par questionnaire [‘This publication presents the results of a questionnaire survey’] (Political science)

13 For example, most specifiers are pronominalizable with possessive pronouns, the pronoun en or de lui, de là (for a more complete presentation of these tests, see Benetti 2008: 119), whereas this type of anaphor is not possible for subclassifiers.

14 Therefore, Bally (1945: 9, quoted by Berrendonner 1995: 18) analyzes the alternative phrases a) chien du berger vs b) chien de berger, distinguishing two types of PP: in a), the PP “localizes” the reference of chien (“actualization”) whereas in b) the PP is used to specialize the “virtual concept” expressed by the noun chien (“characterization”). But other equally well known considerations (see Reichler-Béguelin 1995: 194) must also be taken into account: for example, when N2 is plural and not preceded by a determiner, the determiner could be considered as a superficial allomorph of the indefinite plural.

Even if the meaning is naturally different due to the types of determiners (and to their respective succession in each of these occurrences), this does not mean that the ‘survey results’ are a subspecies of the ‘results’ category.

5.3 Toward an operational classification of N PREP N constr

The classification used here, crosschecked against the different criteria, follows Benetti (1995) and Bartning (1996) to some extent, whose proposals are the most relevant for the purposes of this study. In this regard, we have sorted them according to five main types:

5.3.1 N1 is a predicative noun (typically deverbal) and N2 is its object argument (‘objective genitive’)

In this category, we find mainly deverbal N1 that express, for the most part, a mental process or a language activity,15 e.g. évaluation de la performance (‘performance appraisal’). Although only évaluation is formally a deverbal noun, these nouns share with deverbal nouns the ability to generate a double interpretation: processive or resultative, the latter leading often – but not always – to transformation into a countable and concrete noun.

15 See the list of metalinguistic labels established by Francis (1994) for abstract nouns in academic writing, and quoted by Paquot (2010: 80) and the list of scientific cross-disciplinary items, established by Hatier (forthcoming).


Among scientific nouns, we can find deverbal nouns from general language such as conception (conception d’outils ‘tool design’) or utilisation (utilisation de méthodes ‘use of methods’, utilisation de modèles ‘use of models’) but also examples of scientific processes such as analyse (‘analysis’) and évaluation (‘evaluation’), which are all good illustrations of this category. N1 refers to a mental process or a language activity where N2 represents the object (and syntactically, the second argument of N1), and is mostly a cross-disciplinary noun. Determination and grammatical number are not fixed (see our comment in 5.2.3). For example, N2 refers to the object of the analysis or the evaluation; the object argument specifies the applied field of analysis or evaluation process, e.g. analyse de l’activité or des activités (‘activity analysis’), analyse de l’effet or analyse des effets (‘effect analysis’).

5.3.2 N1 is a predicative noun or a relational noun and N2 is the agent or the semantic subject (‘subjective genitive’)

Two cases must be distinguished here:

a) N1 is a predicative noun – deadjectival or deverbal: e.g. la complexité du processus (‘the complexity of the process’), la pertinence de l’indicateur (‘the relevance of the indicator’), le changement de paradigme (‘the paradigm shift’). Deadjectival nouns are mainly property nouns (importance des facteurs, complexité des processus). The class of deverbal N1 mainly includes evolving processes such as changement, baisse, évolution, transformation. N2 is frequently an ideality noun.
b) N1 is a relational noun or sometimes an ideality noun (when it functions as a relational noun). ‘Relational nouns’ are nouns that semantically require a complement in their argumental structure (Fillmore 1968; Löbner 1985; Keizer 2007). In general language, relational nouns usually refer to body parts or relationships, but in scientific writing they can be text nouns,16 for example, which specify the parts of an academic paper (la prochaine section de notre article ‘the forthcoming section of our paper’: a section is a part of a paper, seen as a whole). We also include in this class the category of logical relations such as cause, conséquence, corrélation, but this subcategory is not very productive for collocations in our corpus.

In some cases, nouns that belong to these predicative subjective constructions or are relational appear to impose more constraints on the grammatical number specification of N1 or N2: for example, in complexité du processus (‘complexity of the process’), N2 can be singular or plural but N1 is seldom used in the plural form (*Les complexités du processus).

16 “Text nouns refer to the formal textual structure of discourse, e.g. phrase, words, quotation, excerpt, section, term, etc.” (Paquot 2010).

5.3.3 N1 is a predicative noun and N2 the subject of an (implicit) predicative structure

N2 (in scientific writing) is typically an ideality noun. This category includes nouns such as principe (‘principle’), critère (‘criterion’), and facteur (‘factor’) in collocations such as le principe d’uniformité (‘uniformity principle’) or le facteur de production (‘factor of production’), sometimes analyzed as appositives. This construction corresponds to a predicative sentence whose subject is the N2, expressing a judgment of equivalence: ‘uniformity is a principle’. In scientific writing, N2 is typically a singular abstract noun, belonging to ideality or activity nouns:

– l’idéal d’objectivité (‘the ideal of objectivity’, epistemology)
– l’hypothèse de normalité (‘the assumption of normality’, economics)
– critère de sélection (‘selection criterion’, methodological element)

Few cross-disciplinary scientific collocations could be found among these N PREP N constr, which are mainly used to categorize a concept or a technique based on domain-specific knowledge.
5.3.4 N1 is an abstract activity or ideality predicative noun, the PP is a modifier with a specification function

The nouns belonging to this category (such as hypothèse de travail ‘working hypothesis’, hypothèse de départ ‘initial hypothesis’, enquête de terrain ‘field survey’, principes de base ‘basic principles’, see comment in 5.2.2 b) have a specification function, which may explain why the selection of the N2 is often quite constrained. The PREP N2 (de base, de terrain, de travail) works as an adjective (it is actually often translated with the help of an adjective into English, e.g. de base → basic). These PREP N2 are quite productive and are frequent in co-occurrence with specific semantic classes. For example, de terrain specifically applies to the class of scientific processes (enquêtes de terrain, études de terrain, recherche de terrain, travail de terrain). The determination of N2 is always constrained, with a ∅ determiner (see discussion at the end of section 5.2.3). These are clues that could lead to the conclusion that such combinations are often collocations.

5.3.5 N1 is a general ‘classifier’

This sub-class is presented in 5.2.1. We must remember that these nouns give a general shape or a perspective in which the meaning of N2 is to be considered. Combined with an N2, they could have a pragmatic function in scientific discourse, introducing the epistemic paradigm of the author. Actually, they are used in several prepositional phrasemes (dans le cadre de, sur le plan de, au plan de). Nouns such as cadre and champ belong to this category. They can be seen as ‘subspecified’ nouns (Legallois 2008) because their meaning is very vague, and they always occur in combination with an adjective or in a N PREP N constr. In some cases, e.g. cadre d’analyse, the N PREP N constr is lexicalized – in a meaning close to cadre théorique (‘theoretical framework’). The “free” alternations belong to sentences in which the specific meaning of the collocation is not always preserved:

(6) Notre intérêt pour les figures dépasse le cadre d’une analyse stylistique et se propose de les replacer dans le projet de la rhétorique antique. [‘Our interest in the figures goes beyond the scope of a stylistic analysis and intends to place them within the project of ancient rhetoric.’] (Linguistics)

In (6), we find a concatenation of collocations, both verbal and nominal (dépasser le cadre de + le cadre de l’analyse) with a determiner before N1 requiring a different approach, based on semantico-rhetorical routines (see above, section 3). The same could hold for collocations with champ as N1, but some differences can be observed. In N PREP N constr, champ (‘area’, ‘field’) is often associated with the noun études (‘studies’): the collocation champ d’études appears in phrases such as notre champ d’étude (‘the scope of our study’) or when the author wants to clarify his/her research area: le champ des études culturelles (‘the field of cultural studies’). Like cadre, champ is often part of formulaic phrases (e.g. entrer/ne pas entrer dans le champ de l’étude) when the author seeks to delineate the object of his/her research with more precision:

(7) Ce phénomène s’est accentué par la suite et les générations suivantes, qui n’entrent pas dans le champ de cette étude, terminent en moyenne leurs études après 20 ans. [‘this phenomenon became more pronounced afterwards and subsequent generations, which do not fall within the scope of this study, complete their education, on average, after the age of 20’] (Economics).

Despite their apparent similarity, the two constructions with their alternations (cadre d’analyse or cadre de l’analyse vs champ d’étude or champ de l’étude) do not have exactly the same behavior: the collocation cadre d’analyse delimits a specific concept (see example 8) in scientific writing, which does not appear in examples such as 9. By contrast, the two more compositional collocations champ d’étude and champ de l’étude (examples 10 and 7) are very close, even though the former has a broader extension (referring to a disciplinary or sub-disciplinary field), and the latter refers to the field of a specific study.

(8) Le cadre d’analyse est exposé dans la section 2. [‘the analytical framework is outlined in Section 2’] (Economics)

(9) Dans le cadre plus général de l’analyse des phénomènes coalitionnels, le cas suisse est riche en enseignements. [‘In the broader context of the analysis of coalitions, the Swiss case is instructive’].

(10) Je m’en tiendrai d’abord au seul espace des relations professionnelles […], champ d’étude suffisamment vaste pour nourrir la réflexion. [‘I will initially limit my analysis to the level of professional relationships […], a sufficiently broad scope of study to provide useful elements for reflection’].

We do not claim here that our classification covers every type of construction likely to be used for collocations in general, but we do believe that the most productive syntactic frames used in our specific genre are represented here. The table below summarizes the different kinds of constructions of NOUN PREP NOUN collocations, along with their semantic and syntactic properties.

Table 1: Synthesis of the most productive N PREP N constr for collocations.

| Constructions | Example | Semantic properties | Determiners | Grammatical number specifications |
| 1. Objective genitives (N2 is the “object”) | Évaluation de la performance; Analyse des effets | N1 is a predicative noun, generally a deverbal activity noun; N2 is a scientific object | Most N2 have a determiner (which can be omitted in specific cases) | No constraint on grammatical number |
| 2. Subjective genitives (N2 is the “subject”) | Complexité du processus; Changement de paradigme; Section de DET article | Several cases: N1 is a deadjectival or deverbal noun; N1 is a relational noun | Most N2 have a determiner | Few N1 are plural |
| 3. Predicative structure (N2 is a N1) | Principe d’uniformité; Idéal d’objectivité; Critère de sélection | Few constraints on N1; N2 is an ideality noun or an activity noun | No determiner for N2 | N2 is generally singular |
| 4. Specification structure (Prep N2 = Adj) | Hypothèse de départ; Recherche de terrain; Principes de base | Few constraints on N1; Prep N2 works as an adjective | No determiner for N2 | N2 is singular |
| 5. Classification structure (N1 classifies N2) | Champ d’étude; Cadre d’analyse/de l’analyse | N1 belongs to the semantic class of abstract space nouns | No determiner or determiner for N2 (slight differences) | N2 is generally singular |
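For readers who wish to reuse the classification in a lexical database, the rows of Table 1 can be held as plain data. The following is a minimal sketch under stated assumptions: the class `Construction`, the three-way coding of determiner behaviour ("usual", "none", "variable") and the function `typical_candidates` are hypothetical simplifications introduced here, not part of the Termith resources or of the procedure followed in this study.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Construction:
    label: str
    examples: Tuple[str, ...]
    n2_determiner: str   # assumed coding: "usual", "none" or "variable"
    number: str          # grammatical number tendency, quoted from Table 1


# The five rows of Table 1, with determiner behaviour reduced to three values.
TABLE_1 = (
    Construction("objective genitive",
                 ("évaluation de la performance", "analyse des effets"),
                 "usual", "no constraint on grammatical number"),
    Construction("subjective genitive",
                 ("complexité du processus", "changement de paradigme"),
                 "usual", "few N1 are plural"),
    Construction("predicative structure",
                 ("principe d'uniformité", "idéal d'objectivité",
                  "critère de sélection"),
                 "none", "N2 is generally singular"),
    Construction("specification structure",
                 ("hypothèse de départ", "recherche de terrain",
                  "principes de base"),
                 "none", "N2 is singular"),
    Construction("classification structure",
                 ("champ d'étude", "cadre d'analyse"),
                 "variable", "N2 is generally singular"),
)


def typical_candidates(n2_has_determiner: bool) -> List[str]:
    """Labels of the types whose usual determiner behaviour fits the observation."""
    if n2_has_determiner:
        wanted = {"usual", "variable"}
    else:
        wanted = {"none", "variable"}
    return [c.label for c in TABLE_1 if c.n2_determiner in wanted]


print(typical_candidates(n2_has_determiner=False))
print(typical_candidates(n2_has_determiner=True))
```

Because the table states tendencies rather than rules (changement de paradigme, for instance, is a subjective genitive without a determiner), such a filter can only narrow down candidate types; it cannot assign a construction on its own.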


6. Conclusion and perspective

Due to their varying semantic and syntactic structures, the N PREP N constr represent a real challenge for the study and classification of collocations both in the field of general and specialized discourse. Our focus here was on cross-disciplinary scientific lexicon, studied on the basis of a large corpus of scientific papers. In order to shed more light on the different constructions observed in the corpus, we used several criteria: the semantic characterization of N1 and N2 (predicative or not, and their mutual relation), the role played by the PREP N2 (presence or not of a syntactic argument), the role played by the preposition, the grammatical number specifications of N1 and N2 and the presence or absence of determiners before the N2. By crosschecking the different criteria, our study allows five clusters to be distinguished: a) objective genitive constructions, b) subjective genitive constructions, c) predicative structures, d) specification structures, and e) classification structures. We have seen evidence that, among these types, two are more directly linked to the ‘manufacturing’ of collocations: this is the case for predicative structures (idéal d’objectivité, critères de sélection) and specification structures (hypothèse de départ, recherche de terrain) which both allow the easy production of cross-disciplinary notions in different ways: the former due to the close equivalence relation established between N1 and N2, and the latter through a characterization of N1, which, working as an adjective, better specifies its scope when used. Although it is difficult to draw up a general rule, the tendency for these constructions is the absence of determiner before the N2 and a generally singular grammatical number specification with few semantic constraints on N1. Objective and subjective genitive constructions behave differently because, syntactically, they have fewer restrictions: most N2 have a determiner and their grammatical number specification is often free. The process of lexicalisation therefore seems linked to the characteristics of the nouns themselves, which denote scientific processes or activities. What we have called the classification structure is an intriguing case: these constructions seem fairly close to predicative structures because they are based on the same relation of equivalence. However, the N1 (e.g. champ or cadre) are so fuzzy that this equivalence is not meaningful. Syntactically, no real difference with the predicative structures could be found, which also argues in favour of their inclusion in the same category. Pragmatically, however, they play a specific role, allowing the scientific author to present the scientific context for his/her research or to set the epistemological scene, and are therefore sometimes included as elements of routines.

As a general conclusion, it appears that an approach based on various criteria (syntactic, semantic, and sometimes pragmatic), in addition to statistical measures, could at least partly clarify the status of multiword expressions such as N PREP N constr to determine whether they can be regarded as collocations (or not). A more in-depth analysis has to be attempted to verify the accuracy of the proposed classification, through the systematic examination of all the cross-disciplinary NOUN PREP NOUN occurring in the corpus. We also hope, at a future stage, to be able to indicate more accurately the statistical distribution of the most productive N PREP N constr for cross-disciplinary collocations.

References Aarts, Bas 1998. Binominal noun phrases in English. Transactions of the Philological Society. 96/1, 117–158. Bally, Charles 1909 [1951]. Traité de stylistique française. Genève: Georg et Cie. Bally, Charles 1945 [rééd. 1965]. Linguistique générale et linguistique française. Bern: Francke (4ème éd. revue et corrigée). Bartning, Inge 1996. Eléments pour une typologie des SN complexes en de en français. Langue française, 109, 29–43. Benetti, Laurence 1995. Typologie des syntagmes binominaux de type N1 de N2 recueillis dans les manuels homéopathiques. Tranel. 23, 57–76.

296

Francis Grossmann, Agnès Tutin

Benetti, Laurence 2008. L’article zéro en français contemporain: as pects syntaxiques et sémantiques. Bern/Berlin/Bruxelles: Peter Lang. Berrendonner, Alain 1995. Quelques notions utiles à la sémantique des descripteurs nominaux, Tranel. 23, 9–39. Cadiot, Pierre 1993. De et deux de ses concurrents: avec et à. Langag es. 110, 68–106. Cowie, Anthony Paul 1998. Phraseological Dictionaries: Some EastWest Comparisons. In Cowie, Anthony Paul (ed.) Phraseology: Theories, Analysis and Applications. Oxford: Oxford University Press, 209–227. Coxhead, Averill 2002. The academic word list: A corpus-based word list for academic purposes. Language and Computers. 42/1, 73–89. Dictionnaire des combinaisons de mots. 2007. Paris: Le Robert. Dictionnaire explicatif et combinatoire du français contemporain. Re cherches lexico-sémantiques. 1984 (I), 1988 (II), 1992 (III), 1999 (IV). Montréal: Les Presses de l’Université de Montréal. Dubreuil, Estelle 2008. Collocation: définitions et problématiques. Texto! 13/1–2. . Englebert, Annick 1992. Le petit mot DE. Étude de sémantique histori que. Genève: Droz. Evert, Stefan 2008. Corpora and collocations. In Anke Lüdeling / Merja Kytö (eds) Corpus Linguistics. An International Handbook. Berlin: Mouton/DeGruyter, 1212–1248. Fillmore, Charles. J. 1968. The Case for Case. In Emmon Bach / Robert Harms (eds) Universals in Linguistic Theory. New York: Holt, Rinehart, and Winston, 1–88. Flaux, Nelly / Stosic, Dejan 2014. Les noms d’idéalités et la modalité: marquage d’une opposition. Langages. 193, 127–142. Flaux, Nelly / Van de Velde, Danièle 2000. Les noms en français: es quisse de classement. Paris: Editions Ophrys. Flaux, Nelly 2012. Noms d’idéalités libres et noms d’idéalités liées. In De Saussure, Louis / Borillo, Andrée / Vuillaume, Marcel (eds) Grammaire, lexique, référence. Regards sur le sens. Mélanges of ferts à Georges Kleiber pour ses quarante ans de carrière. Bern: Peter Lang, 59–75.

NOUN PREP NOUN collocations in French: the case of scientific lexicon

297

Francis, Gill 1994. Labelling discourse: an aspect of nominal-group lexical cohesion. In Coulthard, Malcom (ed.) Advances in Writ ten Text Analysis. London/New York: Routledge, 82–101. Gaatone, David 1988. Cette coquine de construction. Remarques sur les trois structures affectives du français. Travaux de linguistique. 17, 159–176. Gabrielatos, Costas 1994. Collocations: Pedagogical implications, and their treatment in pedagogical materials. Unpublished essay, Research Centre for English and Applied Linguistics, University of Cambridge. . Grossmann, Francis / Tutin, Agnès (eds) 2003. Les collocations: ana lyse et traitement. Amsterdam: De Werelt. Grossmann, Francis / Tutin, Agnès / Augustyn, Magdalena (forthcoming). Les adverbiaux polylexicaux d’attitude dans l’écrit scientifique. Actes du colloque Approches théoriques et empiriques en phraséologie, Nancy, December 2014. Hatier, Sylvain 2013. Extraction des mots simples du lexique scientifique transdisciplinaire dans les écrits de sciences humaines: une première expérimentation. Actes de Recital’2013. Les Sables d’Olonne, France, 138–149. Hausmann, Franz Josef 1989. Le dictionnaire de collocations. In Hausmann Franz Josef et al. (eds) Wörterbücher: ein internation ales Handbuch zur Lexicographie. Dictionaries. Dictionnaires. Berlin/New York: DeGruyter, 1010–1019. Heid, Ulrich 1994. On Ways Words Work Together – Topics in Lexical Combinatorics. Proceedings of the 6th EURALEX Internation al Congress. Vrije Universiteit Amsterdam, August-September 1994, 226–257. Hyland, Ken 1998. Hedging in Scientific Research Articles. Amsterdam: John Benjamins. Jacquey, Evelyne et al. 2013. Filtrage terminologique par le lexique transdisciplinaire scientifique: une expérimentation en sciences humaines. TIA 2013. Villetaneuse, October 2013, 121–128. Keizer, Evelien 2007. The English Noun Phrase. The Nature of Linguis tic Categorisation. Cambridge: Cambridge University Press.

Kilgarriff, Adam et al. 2010. A quantitative evaluation of word sketches. In Proceedings of the 14th EURALEX International Congress. Leeuwarden, The Netherlands, July 2010, 372–379.
Kraif, Olivier / Diwersy, Sascha 2014. Exploring combinatorial profiles using lexicograms on a parsed corpus: a case study in the lexical field of emotions. In Blumenthal, Peter / Novakova, Iva / Siepmann, Dirk (eds) Nouvelles perspectives en sémantique lexicale et en organisation du discours. Frankfurt: Peter Lang, 381–394.
Kraif, Olivier / Tutin, Agnès / Diwersy, Sascha 2014. Extraction de pivots complexes pour l’exploration de la combinatoire du lexique: une étude dans le champ des noms d’affect. In SHS Web of Conferences. EDP Sciences. Vol. 8, 2663–2674.
Legallois, Dominique 2008. Sur quelques caractéristiques des noms sous-spécifiés. Scolia. 23, 109–127.
Löbner, Sebastian 1985. Definites. Journal of Semantics. 4, 279–326.
Marque-Pucheu, Christiane 2008. La couleur des prépositions à et de. Langue française. 157, 74–105.
Masini, Francesca 2010. Binominals Constructions in Italian of the N-di-N type. Towards a typology of Light Noun Constructions. Workshop on Binominal syntagms SLE, Vilnius, September 2010.
Mel’čuk, Igor 1998. Collocations and Lexical Functions. In Cowie, Anthony Paul (ed.) Phraseology. Theory, Analysis and Applications. Oxford: Clarendon Press, 23–53.
Milner, Jean-Claude 1978. De la syntaxe à l’interprétation. Quantités, insultes, exclamations. Paris: Seuil.
Nesselhauf, Nadja 2005. Collocations in a learner corpus. Amsterdam/Philadelphia: John Benjamins.
Noailly, Michèle 2000. Apposition, coordination, reformulation dans les suites de deux GN juxtaposés. Langue française. 125, 46–59.
Oxford Collocations Dictionary for Students of English. 1st ed. 2002. Oxford: Oxford University Press.
Paquot, Magali 2010. Academic vocabulary in learner writing: From extraction to analysis. Bloomsbury Publishing.
Quirk, Randolph et al. 1985. Comprehensive Grammar of the English Language. London/New York: Longman.
Reichler-Béguelin, Marie-José 1995. Déterminant zéro et anaphore. Tranel. 23, 177–201.

Rossi, Caroline / Frérot, Cécile / Falaise, Achille (forthcoming). Integrating controlled corpus data in the classroom: a case-study of English NPs for French students in specialised translation. In Alonso, Francisco / Cruz, Laura / González, Víctor (eds) Corpus-based studies on language varieties. Bern: Peter Lang.
Rouget, Christine 2000. Distribution et sémantique des constructions Nom de Nom. Paris: Champion.
Seretan, Violeta 2011. Syntax-based collocation extraction. Dordrecht/Heidelberg/London/New York: Springer Science.
Sinclair, John 1991. Corpus, concordance, collocation. Oxford: Oxford University Press.
Tesnière, Lucien 1959. Eléments de syntaxe structurale. Paris: Klincksieck.
Tran, Thi Thu Hoai 2014. Les séquences lexicalisées à fonction discursive comme outil d’aide à l’écriture auprès des étudiants étrangers. In Blumenthal, Peter / Novakova, Iva / Siepmann, Dirk (eds) Nouvelles perspectives en sémantique lexicale et en organisation du discours. Frankfurt: Peter Lang, 355–364.
Tutin, Agnès / Grossmann, Francis 2002. Collocations régulières et irrégulières: esquisse de typologie du phénomène collocatif. Revue française de linguistique appliquée. 7/1, 7–25.
Tutin, Agnès 2013. Les collocations lexicales: une relation essentiellement binaire définie par la relation prédicat-argument. Langages. 189/1, 47–63.
Tutin, Agnès 2014. La phraséologie transdisciplinaire des écrits scientifiques: des collocations aux routines sémantico-rhétoriques. In Tutin, Agnès / Grossmann, Francis (eds) L’écrit scientifique: du lexique au discours. Autour de Scientext. Rennes: Presses Universitaires de Rennes, 27–44.
Tutin, Agnès et al. 2015. Annotation of multiword expressions in French. Proceedings of Europhras. Malaga, Spain, June-July 2015.
Van de Velde, Danièle 2001. Les structures nominales dénominatives. In Amiot, Dany / De Mulder, Walter / Flaux, Nelly (eds) Le syntagme nominal: syntaxe et sémantique. Cahiers de l’Université d’Artois. 22, 289–311.

Vinogradov, Victor 1947. On the basic types of phraseological units in Russian. Collection of Articles and Materials by Acad S. P. Moscow-Leningrad.
Williams, Geoffrey 2003. Les collocations et l’école contextualiste britannique. In Grossmann, Francis / Tutin, Agnès (eds) Les collocations lexicales: analyse et traitement. Amsterdam: De Werelt.
Zolkovskij, Aleksandr / Mel’čuk, Igor 1967. O semantičeskom sinteze [Sur la synthèse sémantique]. Problemy kibernetiki. 19, 177–238. [French translation: T.A.Informations. 1970. 2, 1–85].

Luigi Matt

Italian dictionaries of collocations

Abstract: In this paper we describe all the Italian collocation dictionaries currently available. The four dictionaries are very different in their approach and results. We analyse the theoretical background of each dictionary (although this issue is addressed only briefly, or not at all, by the authors), as well as more practical aspects ranging from the creation of the lemma list and the selection of collocations to the lemma structure and the presentation of grammatical and semantic information. The paper highlights the major achievements and shortcomings of the four dictionaries, with the aim of providing some food for thought for future lexicographic work.

Keywords: collocations, lexicography, Italian

0. Introduction

Four Italian dictionaries of collocations have recently been published: Urzì (2009), Russo (2010), Tiberii (2012), and Lo Cascio (2012, 2013).1 These publications have at least partly closed the gap between Italian linguistic studies and those devoted to other major European languages such as English, French and Spanish. In fact, in Italy the term collocazione only came into use in recent years and is still struggling to establish itself in linguistic studies, to the extent that it is often ignored even in important reference texts. The first reference work to deal with collocations in a serious manner was probably Marello (1996). However, her model was rarely imitated. In fact, there is no reference to collocations in Dardano (1996), which is frequently used in university courses on Italian linguistics. Nor do collocations get a mention in Aprile (2005), whose book is expressly devoted to the study of the lexicon, though in a similar text (Adamo/Della Valle 2008) there is a passing reference to them.

Italian works usually refer to unità lessicali superiori or lessemi complessi [higher lexical units, complex lexemes], a category that includes combinations with varying levels of restriction which are not, however, further subcategorized. For example, amongst the “unità lessicali superiori” cited by Dardano (1996: 239–240), there are mainly combinations whose meaning derives from the sum of the meanings of the words involved (circolazione stradale, falsa testimonianza, vestito su misura, etc.), but there are also combinations whose meaning cannot be reconstructed from the meanings of the component parts (tavola rotonda, uomo rana, villaggio globale, etc.).

In Italian there are also very few specialist essays on collocations or on lexical combinations in general. The exceptions relate to specific aspects such as the use of the so-called “verbi supporto” (Cicalese 1999) or “tecnicismi collaterali”. The latter term, which has been widely used in recent studies, was coined by Serianni (1989: 103), who explains it as “particolari espressioni stereotipiche, non necessarie, a rigore, alle esigenze della denotatività scientifica, ma preferite per la loro connotazione tecnica” [particular stereotypical expressions, not strictly necessary for the needs of scientific denotation, but preferred for their technical connotation]. Some attention has been given to the role of collocations in the study of Italian as a second language (Bini/Pernas/Pernas 2007).

There is not even a common terminology. In all Italian studies that deal in some way with collocations the term base is used to indicate the semantic crux of a collocation, but for the added element, which in English is called collocate (or collocator), in French collocatif and in German Kollokator, there are three different terms in Italian: collocatore (Heid 1997; Giacoma/Kolb 2006; Siller/Runggaldier 2006), collocante (Lo Cascio 1997) and collocato (Faloppa 2010). The first term seems to be the most commonly used,2 but the third probably has a good chance of prevailing, since the Enciclopedia dell’italiano (which includes the discussion in Faloppa 2010) seems destined to become an essential reference for linguistic studies.

Regarding Italian lexicography, although all dictionaries give some space to collocations, they are not clearly differentiated either from idiomatic expressions or, often, from free combinations. The dictionary that offers the richest material is without doubt GRADIT, which lists the combinations in a separate section at the end of the article on the basic lemma; in fact, this is one of the most significant new features offered by this important dictionary. This section includes collocations (under the entry for amore are, amongst others, amore ancillare, amore di gruppo, libero amore), technical terms whose meaning cannot be deduced (amore perfetto and viola d’amore, which denote a type of ornamental plant and a string instrument respectively), and idiomatic expressions (maniglie dell’amore, defined as “gli accumuli adiposi che si formano sui fianchi” [deposits of excess fat at the side of one’s waistline]). In many cases, however, the collocations are not lemmatized and can only be found through the examples (scapolo impenitente, which is found under both entries).

Some bilingual dictionaries have focused on collocations; among the most convincing are three dictionaries that were all produced in the same year: Giacoma/Kolb (2001: German-Italian), Lo Cascio (2001: Italian-Dutch) and Ciesla/Jamrozik/Klos (2001–2010: Italian-Polish). Giacoma/Kolb’s approach (2001) has perhaps the clearest and most functional structure, as its entries provide a summary of the essential information for Italian learners of German, whether beginners or advanced.

The four collocation dictionaries available in Italian today differ in several respects: the amount of documentation, the layout, the choice of ‘lexicographical technique’, and the quality of the results. The following sections offer a comparative analysis which highlights the main features of each of the dictionaries. Their failings are discussed in detail, both because they help to focus on issues of lexicographical theory and practice, and because they offer useful insights for devising new dictionaries or updated editions of existing ones.

1

Compared to the first edition, Lo Cascio (2013) not only doubled the number of headwords, but also included several modifications to existing entries, which were expanded or reorganized.

2

But it is worth noting that all the authors quoted are likely to be influenced in their choice by German.

1. Titles and target readership

The four books have very different titles. Urzì (2009: Dizionario delle combinazioni lessicali) and Lo Cascio (2013: Dizionario combinatorio italiano) bring to the fore the concept of combination, which is semantically broader than that of collocation. The term collocation appears in a main title only in Tiberii (2012: Dizionario delle collocazioni. Le combinazioni delle parole in italiano), but it also appears in Russo’s subtitle (2010: Modi di dire. Lessico italiano delle collocazioni). In fact, Russo’s title uses the ambiguous term modo di dire, which has no single scientific meaning in Italian; for example, this term is defined in GRADIT as “parola o frase dal significato particolare, tipica di una lingua, di un dialetto o di un linguaggio specifico o anche di un singolo individuo” [word or phrase with special meaning, typical of a language, a dialect or a specific type of speech or even an individual].

The decision taken by three of the four lexicographers to avoid the term collocazioni in the title is explained first of all by the fact that their dictionaries, as will be seen, also cover other types of lexical combinations; Tiberii (2012), on the other hand, is more selective. A further reason was probably the wish not to seem too obscure to the Italian readership, given that the concept of collocation is not taught in language classes in schools in Italy.3

All four authors have naturally targeted their dictionaries primarily at an Italian audience, in the belief that they could help such readers make a more correct and expressively more effective use of their native language. However, three of them (Urzì 2009; Tiberii 2012; Lo Cascio 2013) expressly mention foreigners who study Italian as target users, for whom simply choosing the right collocations can be a major problem.

3

From this point of view, however, the situation in Italian lexicography is similar to that in other languages, where there is a similar alternation between the concepts of collocation and combination in the titles of dictionaries, for example in English (Oxford Collocations Dictionary for Students of English vs. The BBI Dictionary of English Word Combinations), and in French (Dictionnaire collocationnel du français général vs. Dictionnaire explicatif et combinatoire du français).


2. Theoretical Aspects

The concept of collocation has not yet been rigorously defined in a way shared by all linguists, as shown by the quite significant theoretical and taxonomic differences found in the literature. Consequently, it would be helpful if the lexicographers clearly stated which position they adopt. In fact, the introductions to the four dictionaries only partially explain the parameters used in their design. All four authors understandably try to present their work in a way that is easily accessible to non-specialist readers, but this is at the expense, at least in part, of scientific accuracy.

In his introduction, Urzì (2009) defines combinazioni lessicali as combinations of words “caratterizzate da un grado più o meno accentuato di coesione interna che si riflette solitamente nella loro maggiore frequenza d’uso” [characterized by a varying degree of internal cohesion which is usually reflected in their greater frequency of use]. Within this general category, Urzì sees collocazioni (or combinazioni ristrette) as a subset whose main characteristic is the fact that “il legame fra i due costituenti è immotivato o imprevedibile” [the link between the two constituents is unmotivated or unpredictable], since it is not possible to establish intuitively, for example, that a noun like concorso in Italian must be preceded by the verb bandire rather than by other “potenziali concorrenti” [likely contenders] such as lanciare or avviare. This specification probably has a generally didactic purpose, but it is irrelevant for consulting the dictionary, since the treatment of combinations is always the same, regardless of the degree of restriction.
Let us now turn to Russo (2010: I), whose simplistic formulation is far from being suitable for such a rigorous instrument as a dictionary, even if aimed at a wide audience:

Per quanto strano ci possa apparire e per quanto fastidio possa darci, nulla possiamo contro il fatto che una frase non è mai, in nessun caso, una sequenza di ‘parole’ messe una dopo l’altra come se fosse una fila più o meno lunga di sassolini sulla sabbia. Allo stesso modo, per quanto strano e per quanto fastidio possa darci, dobbiamo arrenderci a un altro fatto: imparare a memoria e alla perfezione una grammatica non ci permetterà mai di formare frasi ‘vere’, cioè utili, comprensibili, gradevoli, efficaci. [As strange as it may seem and however annoying, we can do nothing about the fact that a sentence is never a sequence of ‘words’ placed one after the other like a row of pebbles on the sand. Likewise, there is no escaping another fact: learning a grammar perfectly by heart will never allow us to form ‘true’ sentences, that is, sentences which are useful, understandable, pleasing, and effective.]

Russo’s (2010) introduction has a non-scientific ring, which is somewhat inappropriate in terms of the concepts expressed. In fact, collocations are not defined correctly at all: “tutti quegli elementi linguistici formati da due o più parole che quando stanno insieme hanno la fastidiosissima abitudine di avere un significato particolare che non coincide quasi mai con la somma dei significati” (2010: I) [all those linguistic elements made up of two or more words which when placed together have the annoying habit of having a particular meaning that almost never coincides with the sum of their meanings]. In reality, it can be said that in many collocations the meaning, “sia pur secondo meccanismi non completamente trasparenti, deriva da quello dei suoi costituenti” [despite the not completely transparent mechanism, is derived from that of its constituents] (Faloppa 2010: 230).4

The most effective definition is certainly Tiberii’s (2012: 3), which has the merit of explaining a collocation in a simple but scientifically correct way:

Le collocazioni sono espressioni formate da due o più parole che per uso e consuetudine lessicale formano una unità fraseologica non fissa ma riconoscibile. Le collocazioni possibili sono molte, alcune più frequenti e comuni, altre più specifiche e raffinate, e tutte sono contraddistinte dalla riconoscibilità come unità lessicale che le rende elemento distintivo e caratteristico della lingua. Spesso non vi è alcun nesso logico che leghi i termini tra loro, né le corrette combinazioni possono essere desunte da un ragionamento o da una regola. [Collocations are expressions formed by two or more words which, through lexical use and custom, form a phraseological unit that is not fixed but recognizable. There are many possible collocations, some more frequent and common, others more specific and refined, and all are marked by their recognizability as a lexical unit, which makes them a distinctive and characteristic element of the language. Often there is no logical connection that binds the terms together, nor can the correct combinations be derived by reasoning or by a rule.]

4

This issue is in fact controversial, and depends on how the category of collocations is considered; linguists themselves hold quite different positions. In any case, there is no doubt that Russo’s approach (2010) is not particularly helpful for readers and creates confusion with idiomatic expressions.


Unfortunately, this useful definition is not developed by the author, who does not seem willing to spend many words on theoretical problems, probably believing that her readers would not be interested. Lo Cascio (2013: V) avoids any rigorous definition of a lexical combination and focuses his introduction on a certain number of useful examples that help readers get an intuitive idea of the concept, but the rest is explained in a very generic manner:

Language is composed of words which seek each other out and gather together to form a coherent set. Word combination, however, is not random. Each word has preferences and belongs to a specific family. Language is actually made up of many families and of different family relationships.

It is evident here that the author, the only one of the four to have published papers on collocations, does not want to overburden the non-specialist reader, who is deemed not to have a thorough knowledge of the matter. However, the presentation would be much clearer if precise concepts were used in an understandable way. Essentially, in the introductory sections of their dictionaries all four authors seem more interested in providing practical guidance to facilitate consultation than in clarifying the scientific rationale underlying their work.

3. Choice of headwords

As regards the range of headwords, three of the dictionaries are very similar: the number of words that form the basis for the collection of collocations is, as stated by the authors, about 6,700 for Urzì (2009), 6,000 for Tiberii (2012) and 6,500 for Lo Cascio (2013). Russo’s word list (2010), by contrast, is made up of approximately 3,600 words (as can be deduced from looking at the index). Aside from the number of headwords included, one would expect to be enlightened about the rationale behind the choice of words. However, three of the four authors fail to explain their reasoning satisfactorily.

Urzì (2009) claims not to have observed “alcun criterio di frequenza o di ‘uso’. […] Il criterio che ha guidato la scelta dei lemmi è la coesione fra il lemma e l’unità lessicale con esso combinabile” [any criterion of frequency or ‘use’. […] The criterion that guided the choice of lemmas is the cohesion between the lemma and the lexical unit it is combined with]. But it is difficult to understand how this principle is translated into lexicographical practice: to identify the lexical combinations with the highest degree of “cohesion” it would be necessary to analyze all those in the entire Italian language system, which is obviously impossible. Lo Cascio (2013: VI) is also not very transparent. He claims to have selected “high frequency words in the Italian language that appear in a wide network of word combinations”, but he does not say exactly what kind of methodology enabled him to recognize words of this type. Tiberii (2012) simply opts to give no information on the matter.

Russo (2010: II) is much clearer. He states that the words recorded in his dictionary are words “del Vocabolario di Base dell’italiano […], quelle fondamentali e quelle di alto uso” [from the Core Vocabulary of Italian […], those that are fundamental and those that are frequently used]. This terminology seems to refer directly back to GRADIT, the Italian dictionary which has made the most consistent attempt to classify the most frequently used lexical items. In fact, a comparison of the headwords makes it clear that Russo’s selection essentially consists of the words marked as fondamentale [fundamental] and di alto uso [high-use] in GRADIT. This choice certainly responds in a transparent manner to an objective criterion. The downside is that there are fewer headwords than in the other three dictionaries.5

5

The inexplicable absence of a word like cosa (obviously marked as fundamental in GRADIT) should also be noted: in Italian it is certainly one of the most frequent words of all, and gives rise to a huge number of collocations. This is not limited to Russo (2010); interestingly, Tiberii (2012) also omits the word, which instead appears in the other two dictionaries.


4. Choice of collocations

In contrast with what we have seen for the headwords, Russo (2010) is the only one who does not provide any indication of how he selected the collocations recorded for each word. But the other authors are not very clear on this either. Urzì’s (2009) claim is difficult to understand, as the dictionary

prende in esame tutte le Combinazioni Lessicali per le quali, data per nota la base (Nome, Verbo o Aggettivo), la modalità di ricerca orizzontale (sintagmatica) della parola da abbinare risulti più rapida ed efficiente di qualunque altra modalità di ricerca (ricerca per sinonimi, mediante i dizionari tradizionali o altro) [examines all the lexical combinations for which, taking for granted that the grammatical part of speech is known (noun, verb or adjective), the horizontal search mode (syntagmatic) of the word to match is more rapid and efficient than any other search mode (search for synonyms, using a traditional dictionary, and such like)].

Tiberii (2012: 4) warns that the collocations in her dictionary “non costituiscono ovviamente tutte le combinazioni possibili. È stata operata una scelta escludendo sia collocazioni rare o estremamente specifiche, sia quelle troppo comuni e generali” [obviously do not constitute all the possible combinations. A deliberate decision was made to exclude both rare or very specific collocations and those that are too common and general].

This wording perfectly clarifies her intent, but not the methods used: she does not specify how she decided whether to omit a certain collocation as being too rare or too common. It is thus likely that she made subjective choices. The only reference to using specific sources for finding collocations is in Lo Cascio’s introduction (2013: XI), although the way these sources are described is rather vague: “ranging from the knowledge and intuition of many native speakers to monolingual and bilingual dictionaries and electronic corpora such as Sketchengine, and others on the Internet delivering a lot of authentic texts”.

A review of Urzì’s dictionary (2009) laments the fact that “the presence or absence of items seems somewhat random” (Coffey 2010: 363). These comments could also be extended to the other dictionaries: although not exactly random, the choice of collocations is in any case motivated by individual assessments that are not supported by objective criteria.

5. Treatment of collocations

5.1 Typology

In all four dictionaries headwords belong to three grammatical categories: nouns, adjectives and verbs. This is of course in line with the commonly accepted classifications, according to which only these three parts of speech can serve as the base of a collocation. The currently most popular schema, proposed by influential linguists such as Heid (1997: 49) and Hausmann (1999: 125), allows for the following possibilities: noun + adjective, noun (subject) + verb, verb + noun (direct object or indirect complement), verb + adverb, adjective + adverb, and noun + noun (with or without a preposition between the two elements).6 Three of the four dictionaries basically adhere to this classification, with just a few differences.

In Urzì (2009), the noun + noun collocations are very often categorized as noun + adjective. This happens in particular when the second noun, preceded by a preposition, “dal punto di vista semantico, indica un’entità più piccola, o il gruppo cui appartiene un determinato oggetto / essere” [from a semantic point of view, indicates a smaller entity, or a group which belongs to a given object / being] (Faloppa 2010: 231). For example, the headword occhiali is recorded as noun + noun only in un paio di o., while other collocations with occhiali are catalogued as noun + adjective: o. da vista, o. da sole / da neve, o. a stringinaso / a stanghetta, o. d’oro / d’argento / di tartaruga, o. a giorno.

6

It is worth noting that for both linguists the possibility of combining a verb with a noun that has the function of an indirect complement is suggested implicitly, through the schema verb + preposition + noun (Hausmann 1999: 125), or verb + any element (therefore also a preposition) + noun (Heid 1997: 49). Previously, Hausmann (1989: 1010) stated that in a verb + noun collocation the noun could only act as an object; that schema is essentially adopted for Italian by Faloppa (2010: 231).

The types of collocation adopted by Tiberii (2012) include, in addition to the ones mentioned above, a category that is not covered by the literature but is not of secondary importance in Italian, i.e. verb + adjective collocations, where the adjective has the function of a predicative complement. For example, for the adjective pronto, combinations are included with the verbs apparire, dimostrarsi, rivelarsi, sembrare, sentirsi. Less convincing is the introduction of adjective + adjective collocations in such cases as assente giustificato / ingiustificato, since here assente can be considered a noun. Many collocations are recorded with a noun + construction structure. However, in the vast majority of cases this category simply refers to noun + noun combinations (barca a motore / da pesca); only rarely are there more complex phrases (effetto a breve termine, presente carico di ansie). Among the noun + adjective collocations are those that contain nouns functioning as adjectives (under the headword effetto we have e. domino, e. placebo, e. serra, e. sorpresa, e. valanga).

The only particular characteristic of Lo Cascio (2013) is that he includes quantifiers among the elements that may be part of lexical combinations. In reality, the addition is more apparent than real, since ultimately it is a subset of noun + noun collocations, where the first element indicates the quantity of the second, which is introduced by the prepositions di or dello (under pomata we have un barattolo di p., strato di p., tubetto della p.).

Russo’s (2010) method is very different: it completely ignores scientific classifications and is not based on any kind of typology founded on the grammatical relationships between combined words.
In contrast, he proposes a rather irrational taxonomy, on the basis of which all entries are structured. This system was undoubtedly designed to be one of the dictionary’s strengths; however, it is actually one of the most questionable aspects of the work. It involves a division into four categories (not all necessarily present for each lemma) which are supposed to correspond at the same time to the frequency of use (descending from the first to the fourth category) and to the degree of cohesion of the collocations (gradually increasing). The underlying assumption appears to be that the two parameters are inversely proportional. This is of course not the case: these are clearly two completely independent variables. In fact, an analysis of the entries suggests that cohesion is the rationale used to allocate the collocations to the various categories. The third and fourth categories tend to accommodate all restricted combinations, including the most frequent ones (for instance, under forza we find the following in the third category: f. bruta, f. maggiore, f. di volontà etc.; and in the fourth category: f. aerea, f. magnetica, f. di gravità etc.). This is a very arbitrary approach, given that the degree of cohesion is not an objectively measurable concept. It is impossible, for example, to understand what rationale was used to allocate andare indietro to the second category, while andare avanti is in the third category.

Even the order of the collocations within the various categories is not clear: bizarrely, it is based not on the types of collocations but on the position of the headword in relation to the other words that make up the collocation. The categories described in the Guide to Consultation (Russo 2010: IV) are: “collocazione di due elementi con lemma iniziale” [collocation of two elements with the headword in first position] (a. paternal); “collocazione di due elementi con lemma finale” [collocation of two elements with the headword in the last position] (divino a.); “collocazione di tre elementi con lemma centrale” [collocation of three elements with the headword in the middle] (per a. di); “collocazione di quattro elementi con lemma in seconda” [collocation of four elements with the headword in second position] (l’a. è cieco); “collocazione di sette elementi con lemma in seconda” [collocation of seven elements with the headword in second position] (l’a. sacro e l’amor profano); “collocazione di quattro elementi con lemma finale” [collocation of four elements with the headword in the last position] (avere un nuovo a.).

5.2 Base and collocates

According to the most popular theory, expounded by Hausmann (1985: 119) and echoed by many linguists, in collocations that contain a noun, this noun acts as the base; adjectives and verbs can be base words only of collocations with adverbs.7 One would thus expect our four dictionaries to follow this approach. In reality, only Tiberii (2012) lists her collocations under the base word alone: the other three lexicographers record their collocations both under the base word and under the collocate (except in cases where the collocate is an adverb). Tiberii’s system is more rigorous and avoids repetition, since each collocation is only recorded once. On the other hand, the approach of the other dictionaries makes the search process easier for less expert readers.

5.3 Microsyntactic information

Given that in Italian certain options are not determined by general rules, and are therefore not predictable, it is important to provide syntactic specifications for some types of collocations. In particular, in verb + noun (object) collocations the reader needs to know which article to use: definite article, indefinite article, or zero article. All four dictionaries provide this information, using different techniques due to their different ways of presenting collocations. In noun + adjective collocations, however, the reader needs to know whether the position of the adjective is fixed or free. From this point of view, Tiberii’s solution (2012) is the best. She clearly states three possibilities: the adjective must precede the noun, or must follow it, or alternatively both positions are allowed. The other lexicographers always opt for one of the two positions. Lo Cascio (2013: IX) is the only one to spell out the criterion adhered to: “Wherever an adjective can take both positions then the most frequent position is chosen, based on the number of Google occurrences”. But in doing so, the most common situation in Italian is in fact hidden from the reader, i.e. the freedom to position the adjective with respect to the noun to which it refers.

7 In fact, this approach should probably be revised: recently, Orlandi (2013: 195) made a convincing case for considering the adjective as a base in combinations such as champignon atomique or révolution industrielle, where the overall meaning is derived from the second element, and not the first (the reasoning, based on examples in French, can naturally extend to other languages, including Italian).


5.4 Other combinations

With the exception of Tiberii (2012), the dictionaries present not only collocations but also other types of combinations. The solutions adopted by Urzì (2009) and Lo Cascio (2013) are very similar: both also list the most frequently used free combinations, and both cover idiomatic expressions, which they sensibly mark with a symbol. Lo Cascio (2013) also records the most common proverbs (patti chiari, amicizia lunga). Russo (2010), however, makes no distinction between the various types of combinations, which are mixed together as a result of his four-category approach (see 5.1). For example, under gabbia there are several idiomatic expressions such as sentirsi in g., g. dorata, g. di matti, which the reader finds alongside real collocations (chiudere in g.) and free combinations (proteggere con/da una g.).

Grammatical collocations present a special case, namely those “in cui cioè si combinano una parola dominante – verbo, nome, aggettivo – e una grammaticale, typically una preposizione” [in which a dominant word – verb, noun, adjective – is combined with a function word, typically a preposition] (Faloppa 2010: 231), including the way certain verbs take a particular preposition (known in Italian as reggenze). It is a very problematic area of Italian, given the lack of rules that determine how to construct prepositional objects, which therefore cannot be predicted but must be memorized for each single verb. This is what makes it notoriously difficult for foreign learners of Italian, and sometimes even for native speakers. In recent years, an indication of verbs plus their associated prepositions has been included in Italian dictionaries and grammars.8 One would expect to find help in this matter in dictionaries of collocations, although most linguists do not consider grammatical combinations to be collocations proper. However, of our four dictionaries, the only one to deal specifically with this issue is Lo Cascio (2013), who includes the category of verb + preposition (ammalarsi di qualcosa). In Urzì (2009), the issue of which verb requires which preposition can often, but not always, be derived from the examples (see 6): the fact that offrirsi takes the preposition per can be gleaned from the example offrirsi volontariamente per una missione; but no example is given to show that obbedire takes the preposition a (obbedire a qualcuno). Even in Russo (2010) the presentation of collocations may implicitly indicate which preposition should be associated with a particular verb, although when there are several possibilities (which is not uncommon in Italian) the choices appear random: under lamentarsi Russo puts l. della cattiva sorte / per il baccano, but the reader is not informed that l. per la cattiva sorte / del baccano would be equally correct. Clear indications of the preposition required after certain verbs are offered by Tiberii (2012), who, in verb + noun (direct object or indirect complement) collocations, always specifies the possible use of prepositions, including cases where two different prepositions are permissible (transitare per / su una strada).

8 Among the first were Devoto/Oli (2004) and Trifone/Palermo (2000).

6. Definitions and examples

Lo Cascio (2013: IX) is the only one of the four lexicographers to suggest a definition for some collocations; this naturally occurs for those combinations that “may be difficult to understand, especially when the entry partners with a difficult or not very common word or when, by combining with the basic term (the entry), the combination becomes figurative”. This is undoubtedly a strong point of the dictionary, and the few incorrect definitions or bits of missing information in no way undermine this strength. One incorrect definition is for pizza marinara, which is not a “pizza con frutti di mare” [pizza with seafood], but a “pizza condita con olive, capperi, pomodori e acciughe” [pizza topped with olives, capers, tomatoes and anchovies] or “con olio, aglio e origano” [with olive oil, garlic and oregano] (as stated respectively in GRADIT and Zingarelli, which give pizza alla marinara as a form of collocation). Regarding missing information, it would have been helpful to have a definition of pericolo giallo, whose meaning cannot be deduced from the sum of the constituents (as stated in the Zingarelli: “per le popolazioni occidentali, quello potenzialmente rappresentato, all’inizio del XX sec., dalla Cina e dal Giappone” [for Western populations, the danger potentially posed, at the beginning of the twentieth century, by China and Japan]).

In Urzì (2009) and Lo Cascio (2013), some collocations are included within a longer sentence, in order to provide a concrete example of how they are used, which is particularly helpful for less common combinations. Urzì uses such longer sentences very frequently: under some entries the collocations given in a longer context make up more than half of the total. In addition, the contexts may be very long (for example, under onorificenza, the collocation with the verb fregiarsi is illustrated by the following example: I cittadini italiani decorati del Sacro Militare Ordine Costantiniano di San Giorgio possono ottenere l’autorizzazione a fregiarsi dell’onorificenza nel territorio della Repubblica). Occasionally, however, no example is provided for collocations or idiomatic expressions where one would actually be useful, for example omicidio bianco, which many foreigners, and maybe even some Italians, may not know (Zingarelli defines it as “la morte di operai sul lavoro, causata dalla mancanza di adeguate misure di sicurezza” [the death of workers on the job, caused by a lack of adequate safety measures]). Based on the fact that the examples, “opportunamente adattati, sono stati tratti da fonti giornalistiche e da Internet” [appropriately adapted, were taken from journalistic sources and the Internet], Urzì (2009) states that his dictionary “si può pertanto considerare in larga misura corpus-based” [can therefore be seen to a large extent as corpus-based]. But of course this would only be true if there had been a systematic use of real corpora, which is not the case. Moreover, although the use of search engines has become indispensable, it is important not to forget that the Internet “non è e non potrà mai essere un corpus: infatti è un insieme non definibile” [is not and never will be a corpus: in fact it is a non-definable set] from a quantitative point of view, consisting of texts that “non sono facilmente raggruppabili in categorie” [are not easily grouped into categories] (Matt 2010: 800).

Lo Cascio (2013), on the other hand, gives contexts only for a limited number of collocations, though the definitions are in most cases sufficient for the reader to deduce the meaning. These exempla ficta are almost always short and deliberately schematic (for example, under amore, to illustrate the collocation with the noun dichiarazione the following phrase is used: fare una dichiarazione d’amore a qualcuno). Only rarely are more elaborate phrases used; for example under età, to illustrate the collocation in giovane e., Lo Cascio provides the following example: purtroppo ha avuto una brutta malattia ed è morto in giovane età, aveva appena trent’anni.

The lack of examples and definitions is certainly a serious limitation in Russo (2010) and Tiberii (2012). It is especially striking in Tiberii, whose dictionary is also aimed specifically at an audience of non-native speakers: how could the latter, consulting for instance the entry for carne, guess the meaning of collocations such as c. viva or essere in c.?

7. Diaphasic markers

A fundamental characteristic of any type of dictionary is the adoption of diaphasic markers, which take account of the fact that some elements are not used in every context of the common language, but appear only in certain registers or subcodes. Of the collocation dictionaries, only Lo Cascio (2013) provides this kind of information, and this is a major advantage over the others. Lo Cascio states that this is designed specifically for “Foreign speakers who want to understand in which context phrases and expressions are used”. Lo Cascio (2013) uses fifteen diaphasic markers: eufemistico, familiare, figurato, gergale, idiomatico, informale, ironico, letterario, peggiorativo, poetico, popolare, raro, scherzoso, spregiativo, volgare. As happens in all general dictionaries, the distinction between the various markers is not always clear: it is often hard to tell what differentiates an ironico from a scherzoso usage, or a familiare from an informale usage. In some cases, two markers (one relating to use and the other to register) are used simultaneously, as in pel di carota (“scherz.; fam.”); essere d’un pelo e d’una buccia (“spreg.; pop.”).


The label literary (letterario) is not always enlightening: in several cases Lo Cascio has used it for expressions that could be found in any type of formal writing, such as anima bella, or that could be used by the average Italian, even in the spoken language, such as cadere in errore. With regard to the subcodes, in Lo Cascio (2013) about 60 labels have been used for specific fields. These labels can refer to a specific collocation but also generically to a lemma in its entirety. For example, the entry altare is labelled as a religious term; and some of the related collocations carry a second label, which refers to architectural (altare laterale), pictorial (pala d’altare) or military (altare da campo) vocabulary. This kind of label, in itself very useful, is quite often used excessively, as under the entry for amante, in which the collocations amante del cinema / del jazz / della buona cucina / dello sport are classified respectively as belonging to film, music, culinary and sports language, when clearly these terms are not specialized but are part of the common language. If a sentence includes the word doctor or physicist, this clearly does not mean that the specialized language of medicine or physics is being used. Browsing through the list of abbreviations in Lo Cascio (2013), one can see that some diatopic labels have also been adopted, though these are not mentioned in the introduction. This is particularly useful in a dictionary of Italian, a language permeated with local influences. These labels may be general in character (regionale, regionale non standard), or refer to macro-areas (centro-meridionale, centrale, meridionale, settentrionale), regions (emiliano, piemontese, sardo, siciliano, toscano, veneto), or cities (milanese, napoletano, romanesco).

8. Conclusions

Of the four dictionaries examined in this chapter, Russo (2010) is the least convincing. Although in its own way it is rich in data, it is not very user-friendly owing to the limitations of its layout. Non-expert readers will certainly have difficulty in consulting it because of its poor organization.


The three other dictionaries are, on the other hand, useful tools, with their defects being more than counterbalanced by positive aspects. What matters most in any lexicographic work, which by definition is never going to be perfect, is the amount of useful information offered to readers, which in these three cases is considerable. From a scientific point of view, the rigor of Tiberii’s (2012) layout is particularly impressive: her classification of collocations is excellent. Equally impressive is the richness of the materials offered by Lo Cascio (2013), whose dictionary is the only one to include fundamental items such as definitions and diaphasic markers. Taken together, the dictionaries of Urzì (2009), Tiberii (2012) and Lo Cascio (2013) certainly constitute a first step taken by Italian lexicography to close the gap that separates it, in the field of collocations, from that of other major European languages.

It is highly likely that in the coming years new dictionaries of Italian collocations will be published. Dictionaries of collocations, perhaps more than other lexicographical works, can exploit information technology: thus electronic dictionaries are set to become the norm.9 Spina (2010a, 2010b) has proposed the design for an electronic dictionary, specifically aimed at foreign learners of Italian. She describes in detail the statistical methods of “extraction and selection of collocations from a reference corpus” (Spina 2010b: 3205), which she states would constitute the main features of the work. Giacomini (2010) has presented the guidelines for another electronic dictionary, designed for translators and advanced students. Her proposal is supported by an underlying theoretical awareness, and is based on a classification of collocations which takes account of syntactic and semantic parameters. The result, which is facilitated by an electronic format, would be the construction of “a complex cross-reference system”, through which “lexical information which meets search requirements is made available to dictionary users in the form of simple excerpts from the concrete microstructures” (Giacomini 2010: 1191).

9 Of our four dictionaries, only Tiberii (2012) offers a computerized version as well as the paper format. The design and structure of the work, however, are traditional: the DVD-ROM makes consultation easier, but the logic of the dictionary remains unchanged. It cannot therefore be considered an electronic dictionary in the true sense of the term.


The website describes a project for an Italian-German bilingual dictionary of collocations that is now in press. The presentation and materials offered so far give hope that this work will contain a wealth of data organized with a clear approach.

References

Adamo, Giovanni / Della Valle, Valeria 2008. Le parole del lessico italiano. Roma: Carocci.
Aprile, Marcello 2005. Dalle parole ai dizionari. Bologna: Il Mulino.
Benson, Morton / Benson, Evelyn / Ilson, Robert 1997. The BBI Dictionary of English Word Combinations. Amsterdam/Philadelphia: John Benjamins.
Bini, Milena / Pernas, Almudena / Pernas, Paloma 2007. Apprendimento/insegnamento delle collocazioni dell’italiano. Con i NUNC è più facile. In Barbera, Manuel / Corino, Elisa / Onesti, Cristina (eds) Corpora e linguistica in rete. Perugia: Guerra, 323–333.
Cicalese, Anna 1999. Le estensioni di verbo supporto: uno studio introduttivo. Studi italiani di linguistica teorica e applicata. 28, 447–485.
Ciesla, Hanna / Jamrozik, Elzbieta / Klos, Radoslaw 2001–2010. Grande dizionario italiano-polacco. Varsavia: Wiedza Powszechna.
Coffey, Stephen 2010. Review of Urzì 2009. International Journal of Lexicography. 63, 355–364.
Crowther, Jonathan / Dignen, Sheila / Lea, Diana (eds) 2002. Oxford Collocations Dictionary for Students of English. Oxford: Oxford University Press.
Dardano, Maurizio 1996. Manualetto di linguistica italiana. Bologna: Zanichelli.
Serianni, Luca / Trifone, Maurizio (eds) 2004. Il Devoto-Oli 2004–2005. Vocabolario della lingua italiana. Firenze: Le Monnier.
Faloppa, Federico 2010. Collocazioni. In Simone, Raffaele (ed.) Enciclopedia dell’italiano. Roma: Istituto della Enciclopedia italiana. I, 229–232.
Giacoma, Luisa / Kolb, Susanne 2001. Dizionario tedesco-italiano italiano-tedesco. Bologna/Stuttgart: Zanichelli/Klett.
Giacoma, Luisa / Kolb, Susanne 2006. L’utilità dell’introduzione sistematica delle collocazioni nella voce lessicografica bilingue. L’esempio del Dizionario di tedesco (Giacoma-Kolb, Zanichelli-Klett, 2001). In Corino, Elisa / Marello, Carla / Onesti, Cristina (eds) Atti del XII Congresso Internazionale di Lessicografia. Alessandria: Edizioni dell’Orso, 967–978.
Giacomini, Laura 2010. A proposal for an electronic dictionary of Italian collocations highlighting lexical prototypicality and the syntactic-semantic relations between collocation partners. In Dykstra, Anne / Schoonheim, Tanneke (eds) Proceedings of the XIV Euralex International Congress. Leeuwarden: Fryske Akademy / Algemiene Fryske Underjocht Kommisje, 1183–1192.
GRADIT = De Mauro, Tullio (ed.) 1999–2007. Grande dizionario italiano dell’uso. Torino: UTET.
Grobelack, Lucjan 1990. Dictionnaire collocationnel du français général. Varsovie: Państwowe Wydawnictwo Naukowe.
Hausmann, Franz Josef 1985. Kollokationen im deutschen Wörterbuch. Ein Beitrag zur Theorie des lexikographischen Beispiels. In Bergenholtz, Henning / Mugdan, Joachim (eds) Lexikographie und Grammatik. Tübingen: Niemeyer, 118–129.
Hausmann, Franz Josef 1989. Le dictionnaire des collocations. In Hausmann, Franz Josef et al. (eds) Wörterbücher: ein internationales Handbuch zur Lexikographie. Berlin/New York: De Gruyter, 1010–1019.
Hausmann, Franz Josef 1999. Le dictionnaire de collocations – Critères de son organisation. In Greiner, Norbert / Kornelius, Joachim / Rovere, Giovanni (eds) Texte und Kontexte in Sprachen und Kulturen. Trier: WVT, 121–139.
Heid, Ulrich 1997. Proposte per la costruzione di un dizionario elettronico delle collocazioni. In De Mauro, Tullio / Lo Cascio, Vincenzo (eds) Lessico e grammatica. Teorie linguistiche e applicazioni lessicografiche. Roma: Bulzoni, 47–62.
Lo Cascio, Vincenzo 1997. Semantica lessicale e i criteri di collocazione nei dizionari bilingui a stampa ed elettronici. In De Mauro, Tullio / Lo Cascio, Vincenzo (eds) Lessico e grammatica. Teorie linguistiche e applicazioni lessicografiche. Roma: Bulzoni, 63–88.
Lo Cascio, Vincenzo 2001. Il dizionario italiano-neerlandese. Handwoordenboek Italiaans-nederlands. Utrecht/Antwerpen: Van Dale Lexicographie/Zanichelli.
Lo Cascio, Vincenzo 2012. Dizionario combinatorio compatto italiano. Amsterdam/Philadelphia: John Benjamins.
Lo Cascio, Vincenzo 2013. Dizionario combinatorio italiano. Amsterdam/Philadelphia: John Benjamins.
Marello, Carla 1996. Le parole dell’italiano. Lessico e dizionari. Bologna: Zanichelli.
Matt, Luigi 2010. I motori di ricerca in Internet come fonte per la lessicologia e la lessicografia. In Iliescu, Maria / Siller-Runggaldier, Heidi M. / Danler, Paul (eds) Actes du XXVe Congrès International de Linguistique et de Philologie Romanes. Berlin: De Gruyter. II, 799–806.
Mel’čuk, Igor A. / Clas, Nadia / Polguère, Alain 1984–1999. Dictionnaire explicatif et combinatoire du français contemporain: recherches lexico-sémantiques. Montréal: PUM.
Orlandi, Adriana 2013. Pour une typologie raisonnée des expressions figées Nom Adjectif. In Ligas, Pierluigi / Tallarico, Giovanni (eds) Lexique Lexiques. Théories, méthodes et perspectives en lexicologie, lexicographie, terminologie et phraséologie. Verona: QuiEdit, 183–200.
Russo, Domenico 2010. Modi di dire. Lessico italiano delle collocazioni. Roma: Aracne.
Serianni, Luca 1989. Saggi di storia linguistica italiana. Napoli: Morano.
Siller-Runggaldier, Heidi 2008. Le collocazioni lessicali: strutture sintagmatiche idiosincratiche? In Cresti, Emanuela (ed.) Prospettive nello studio del lessico italiano. Firenze: Firenze University Press, 591–598.
Spina, Stefania 2010a. The DICI Project: towards a Dictionary of Italian Collocations integrated with an online language learning platform. In Granger, Sylviane / Paquot, Magali (eds) eLexicography in the 21st century: new challenges, new applications. Louvain-la-Neuve: Cahiers du Cental, 273–282.
Spina, Stefania 2010b. The Dictionary of Italian Collocations: Design and Integration in an Online Learning Environment. In Calzolari, Nicoletta et al. (eds) Proceedings of the Seventh conference on International Language Resources and Evaluation. Malta: European Language Resources Association, 3202–3208.
Tiberii, Paola 2012. Dizionario delle collocazioni. Le combinazioni delle parole in italiano. Bologna: Zanichelli.
Trifone, Pietro / Palermo, Massimo 2000. Grammatica italiana di base. Bologna: Zanichelli.
Urzì, Francesco 2009. Dizionario delle combinazioni lessicali. Luxembourg: Convivium.
Zingarelli = Lo Zingarelli 2009. Bologna: Zanichelli.

Notes on Contributors

Veronica Benigno (Pearson English), Doctor Europaeus (Italy, France, and The Netherlands) in Linguistics and Didactics, is an active researcher in vocabulary studies. Currently working at Pearson English, she conducts corpus-based research contributing to the alignment of products to the CEFR and is responsible for the research and validation programme in language teaching and testing. In the past she worked as a teacher of Italian as a Second Language and as a lexicographer, contributing to the publication of a dictionary of Italian collocations (Lo Cascio, 2012).

Daniela Capra (University of Modena and Reggio Emilia) is a founder of Phrasis, the first Italian association of phraseology and paremiology. She teaches Spanish Linguistics and Translation at the University of Modena and Reggio Emilia. She is the author of several articles centered on different aspects of phraseology, such as its presence, treatment and meta-terminology in dictionaries, its labelling in bilingual Spanish-Italian dictionaries and its translation in contemporary novelistic dialogues.

Gloria Corpas Pastor (University of Málaga, Spain / University of Wolverhampton, UK), BA in German Philology (English) from the University of Málaga. PhD in English Philology from the Universidad Complutense de Madrid (1994). Visiting Professor in Translation Technology at the Research Institute in Information and Language Processing (RIILP) of the University of Wolverhampton, UK (since 2007), and Professor in Translation and Interpreting (2008). Research lines: Specialised translation. Translation Technology. Corpus Linguistics. Phraseology. Lexicography and Terminology. E-resources and Virtual Learning Environments (VLE). Published and cited extensively, member of several international and national editorial and scientific committees. Spanish delegate for AEN/CTN 174 and CEN/BTTF 138, actively involved in the development of the UNE-EN 15038:2006 and currently involved in the future ISO Standard (ISO TC37/SC2-WG6 “Translation and Interpreting”). Regular evaluator of University programmes and curriculum design for the Spanish Agency for Quality Assessment and Accreditation (ANECA) and various research funding bodies. President of AIETI (Iberian Association of Translation and Interpreting Studies) and Vice-President of AMIT-A (Association of Women in Science and Technology of Andalusia).

Laura Giacomini (University of Heidelberg) has a PhD in Applied Linguistics from the Department of Translation and Interpretation of Heidelberg University (Germany), where she is a teacher and researcher. Her research fields include lexicography, phraseology, LSP and translation studies. She is currently involved in different lexicographic projects (e.g. WLWF) and is working on her habilitation thesis on LSP databases of the technical domain, with special focus on the topic of phraseological variation in specialised language and its representation in e-lexicographic resources.

Francis Grossmann (LIDILEM Univ. Grenoble Alpes) is Chair Professor of Linguistics at the Grenoble University and a member of the LIDILEM Research Team. His research has focused in recent years on the analysis of scientific discourse in its phraseological dimensions, as well as on the lexicon of emotions and the discursive markers of reported speech. In cooperation with his colleague Agnès Tutin, he has managed a project for the French National Agency of Research to make available to researchers a large corpus of scientific literature, in which linguistic and stylistic features of academic writing in different disciplines can be studied. His research interests also include lexical learning at different levels of education.

Olivier Kraif is an associate professor in Grenoble at Université de Grenoble Alpes (UGA). He teaches in the field of Computer Science and Natural Language Processing. As a researcher, he has been a member of LIDILEM (Laboratoire de LInguistique et DIdactique des Langues étrangère et maternelle) since 2002. He works in the field of text corpora processing, and deals especially with multilingual corpora (comparable as well as parallel). His research aims at developing techniques and tools to investigate linguistic phenomena from various points of view: lexicon, phraseology, contrastive analysis and translation studies.


Béatrice Lamiroy (University of Leuven) is Professor of French and General Linguistics at the University of Leuven, Belgium. Her research field is Romance languages, comparative linguistics in particular. She has published various books and papers on the relations between French and other Romance languages, between French from France and other French varieties and between Romance and Germanic languages. Her major research domains are (a) grammaticalization theory and (b) lexicalization, in particular French idiomatic expressions.

Vincenzo Lo Cascio (University of Amsterdam/Italned Foundation) worked at the University of Amsterdam from 1963 to 2001 (full professor of Italian Linguistics 1975–2001). He specialized in theoretical and applied linguistics. Among his 28 books the following should be mentioned: Persuadere e Convincere oggi: Nuovo manuale dell’argomentazione, Milano, Academia Universa Press (2009); Parole in rete: apprendimento e teoria nell’era elettronica (ed.), Novara, Utet-Università (2007); Dictionary Italian-Dutch / Dutch-Italian, Utrecht, Van Dale and Bologna, Zanichelli (2 vols., 2001); Great Electronic Dictionary Italian-Dutch / Dutch-Italian, Cd-Rom, Italned Amstelveen (2006); Dizionario Combinatorio Italiano (DICI), Amsterdam, John Benjamins Publisher (2 vols., 2013). Founder and editor of the International Journal of Italian Linguistics (Foris Publications, Dordrecht 1976–1986). All his dictionaries have been on-line since the end of June 2015 (website: www.dizionarilocascio.com).

Luigi Matt (University of Sassari) is Professor of History of the Italian language at the University of Sassari (Italy). He is co-director of the scientific review «Studi linguistici italiani». His research fields include literary language, lexicology and lexicography. His writings have been published in many scientific journals such as «Lingua nostra», «Studi di lessicografia italiana», «Linguistica e letteratura», «Lingua italiana d’oggi». He has also published Teoria e prassi dell’epistolografia italiana tra Cinquecento e primo Seicento. Ricerche linguistiche e retoriche (Roma, Bonacci, 2005); Gadda. Storia linguistica italiana (Roma, Carocci, 2006); La narrativa del Novecento (Bologna, Il Mulino, 2001); ‘Quer pasticciaccio brutto de via Merulana’. Glossario romanesco (Roma, Aracne, 2012); Forme della narrativa italiana di oggi (Roma, Aracne, 2014).


Adriana Orlandi (University of Modena and Reggio Emilia, Italy) has a PhD in French Linguistics, and teaches French Linguistics and Translation at the University of Modena and Reggio Emilia. Her main research interests are Semantics, Terminology and Translation. She has been studying collocations since 2011, with a special interest in the definition of collocations and its possible applications in Lexicography. In 2012 she organized the International Workshop “New perspectives on collocations” (Modena).

Michele Prandi (University of Genoa, Italy) is full professor of General Linguistics and head of the Department of Modern Languages at the University of Genoa. He is Doctor H.C. at Uppsala University. His research fields include grammar and semantics of complex expressions and figures. Among his publications: Sémantique du contresens (Paris, Editions de Minuit, 1987); Grammaire philosophique des tropes (Paris, Editions de Minuit, 1992; Spanish transl.: Gramática filosófica de los tropos, Madrid, Visor, 1995); The Building Blocks of Meaning (Amsterdam / Philadelphia, John Benjamins, 2004); La finalité: fondements conceptuels et genèse linguistique (with Gaston Gross) (Bruxelles, De Boeck – Duculot, 2004); La finalità. Strutture concettuali e forme di espressione in italiano (with Gaston Gross and Cristiana De Santis) (Florence, Olschki, 2005); Le regole e le scelte. Introduzione alla grammatica italiana (Turin, UTET, 2006); L’analisi del periodo (Rome, Carocci, 2013).

Agnès Tutin (LIDILEM Univ. Grenoble Alpes) is a chair professor in linguistics at the University Stendhal Grenoble 3 and attached to the laboratory LIDILEM. Her research focuses on corpus linguistics, phraseology and language prefabrication, mainly in the lexical fields of emotions and scientific writing. Together with Francis Grossmann she has conducted Scientext, a funded research project on scientific writing for the French National Agency, and she is involved in a project on scientific phraseology.

Linguistic Insights Studies in Language and Communication

This series aims to promote specialist language studies in the fields of linguistic theory and applied linguistics, by publishing volumes that focus on specific aspects of language use in one or several languages and provide valuable insights into language and communication research. A cross-disciplinary approach is favoured and most European languages are accepted. The series includes two types of books:

– Monographs – featuring in-depth studies on special aspects of language theory, language analysis or language teaching.
– Collected papers – assembling papers from workshops, conferences or symposia.

Each volume of the series is subjected to a double peer-reviewing process.

Vol. 1

Maurizio Gotti & Marina Dossena (eds) Modality in Specialized Texts. Selected Papers of the 1st CERLIS Conference. 421 pages. 2001. ISBN 3-906767-10-8 · US-ISBN 0-8204-5340-4

Vol. 2

Giuseppina Cortese & Philip Riley (eds) Domain-specific English. Textual Practices across Communities and Classrooms. 420 pages. 2002. ISBN 3-906768-98-8 · US-ISBN 0-8204-5884-8

Vol. 3

Maurizio Gotti, Dorothee Heller & Marina Dossena (eds) Conflict and Negotiation in Specialized Texts. Selected Papers of the 2nd CERLIS Conference. 470 pages. 2002. ISBN 3-906769-12-7 · US-ISBN 0-8204-5887-2

Vol. 4

Maurizio Gotti, Marina Dossena, Richard Dury, Roberta Facchinetti & Maria Lima Variation in Central Modals. A Repertoire of Forms and Types of Usage in Middle English and Early Modern English. 364 pages. 2002. ISBN 3-906769-84-4 · US-ISBN 0-8204-5898-8

Editorial address: Prof. Maurizio Gotti, Università di Bergamo, Dipartimento di Lingue, Letterature Straniere e Comunicazione, Piazza Rosate 2, 24129 Bergamo, Italy. Fax: +39 035 2052789

Vol. 5

Stefania Nuccorini (ed.) Phrases and Phraseology. Data and Descriptions. 187 pages. 2002. ISBN 3-906770-08-7 · US-ISBN 0-8204-5933-X

Vol. 6

Vijay Bhatia, Christopher N. Candlin & Maurizio Gotti (eds) Legal Discourse in Multilingual and Multicultural Contexts. Arbitration Texts in Europe. 385 pages. 2003. ISBN 3-906770-85-0 · US-ISBN 0-8204-6254-3

Vol. 7

Marina Dossena & Charles Jones (eds) Insights into Late Modern English. 2nd edition. 378 pages. 2003, 2007. ISBN 978-3-03911-257-9 · US-ISBN 978-0-8204-8927-8

Vol. 8

Maurizio Gotti Specialized Discourse. Linguistic Features and Changing Conventions. 351 pages. 2003, 2005. ISBN 3-03910-606-6 · US-ISBN 0-8204-7000-7

Vol. 9

Alan Partington, John Morley & Louann Haarman (eds) Corpora and Discourse. 420 pages. 2004. ISBN 3-03910-026-2 · US-ISBN 0-8204-6262-4

Vol. 10

Martina Möllering The Acquisition of German Modal Particles. A Corpus-Based Approach. 290 pages. 2004. ISBN 3-03910-043-2 · US-ISBN 0-8204-6273-X

Vol. 11

David Hart (ed.) English Modality in Context. Diachronic Perspectives. 261 pages. 2003. ISBN 3-03910-046-7 · US-ISBN 0-8204-6852-5

Vol. 12

Wendy Swanson Modes of Co-reference as an Indicator of Genre. 430 pages. 2003. ISBN 3-03910-052-1 · US-ISBN 0-8204-6855-X

Vol. 13

Gina Poncini Discursive Strategies in Multicultural Business Meetings. 2nd edition. 338 pages. 2004, 2007. ISBN 978-3-03911-296-8 · US-ISBN 978-0-8204-8937-7

Vol. 14

Christopher N. Candlin & Maurizio Gotti (eds) Intercultural Aspects of Specialized Communication. 2nd edition. 369 pages. 2004, 2007. ISBN 978-3-03911-258-6 · US-ISBN 978-0-8204-8926-1

Vol. 15

Gabriella Del Lungo Camiciotti & Elena Tognini Bonelli (eds) Academic Discourse. New Insights into Evaluation. 234 pages. 2004. ISBN 3-03910-353-9 · US-ISBN 0-8204-7016-3

Vol. 16

Marina Dossena & Roger Lass (eds) Methods and Data in English Historical Dialectology. 405 pages. 2004. ISBN 3-03910-362-8 · US-ISBN 0-8204-7018-X

Vol. 17

Judy Noguchi The Science Review Article. An Opportune Genre in the Construction of Science. 274 pages. 2006. ISBN 3-03910-426-8 · US-ISBN 0-8204-7034-1

Vol. 18

Giuseppina Cortese & Anna Duszak (eds) Identity, Community, Discourse. English in Intercultural Settings. 495 pages. 2005. ISBN 3-03910-632-5 · US-ISBN 0-8204-7163-1

Vol. 19

Anna Trosborg & Poul Erik Flyvholm Jørgensen (eds) Business Discourse. Texts and Contexts. 250 pages. 2005. ISBN 3-03910-606-6 · US-ISBN 0-8204-7000-7

Vol. 20

Christopher Williams Tradition and Change in Legal English. Verbal Constructions in Prescriptive Texts. 2nd revised edition. 216 pages. 2005, 2007. ISBN 978-3-03911-444-3.

Vol. 21

Katarzyna Dziubalska-Kolaczyk & Joanna Przedlacka (eds) English Pronunciation Models: A Changing Scene. 2nd edition. 476 pages. 2005, 2008. ISBN 978-3-03911-682-9.

Vol. 22

Christián Abello-Contesse, Rubén Chacón-Beltrán, M. Dolores López-Jiménez & M. Mar Torreblanca-López (eds) Age in L2 Acquisition and Teaching. 214 pages. 2006. ISBN 3-03910-668-6 · US-ISBN 0-8204-7174-7

Vol. 23

Vijay K. Bhatia, Maurizio Gotti, Jan Engberg & Dorothee Heller (eds) Vagueness in Normative Texts. 474 pages. 2005. ISBN 3-03910-653-8 · US-ISBN 0-8204-7169-0

Vol. 24

Paul Gillaerts & Maurizio Gotti (eds) Genre Variation in Business Letters. 2nd printing. 407 pages. 2008. ISBN 978-3-03911-681-2.

Vol. 25

Ana María Hornero, María José Luzón & Silvia Murillo (eds) Corpus Linguistics. Applications for the Study of English. 2nd printing. 526 pages. 2006, 2008. ISBN 978-3-03911-726-0

Vol. 26

J. Lachlan Mackenzie & María de los Ángeles Gómez-González (eds) Studies in Functional Discourse Grammar. 259 pages. 2005. ISBN 3-03910-696-1 · US-ISBN 0-8204-7558-0

Vol. 27

Debbie G. E. Ho Classroom Talk. Exploring the Sociocultural Structure of Formal ESL Learning. 2nd edition. 254 pages. 2006, 2007. ISBN 978-3-03911-434-4

Vol. 28

Javier Pérez-Guerra, Dolores González-Álvarez, Jorge L. Bueno-Alonso & Esperanza Rama-Martínez (eds) ‘Of Varying Language and Opposing Creed’. New Insights into Late Modern English. 455 pages. 2007. ISBN 978-3-03910-788-9

Vol. 29

Francesca Bargiela-Chiappini & Maurizio Gotti (eds) Asian Business Discourse(s). 350 pages. 2005. ISBN 3-03910-804-2 · US-ISBN 0-8204-7574-2

Vol. 30

Nicholas Brownlees (ed.) News Discourse in Early Modern Britain. Selected Papers of CHINED 2004. 300 pages. 2006. ISBN 3-03910-805-0 · US-ISBN 0-8204-8025-8

Vol. 31

Roberta Facchinetti & Matti Rissanen (eds) Corpus-based Studies of Diachronic English. 300 pages. 2006. ISBN 3-03910-851-4 · US-ISBN 0-8204-8040-1

Vol. 32

Marina Dossena & Susan M. Fitzmaurice (eds) Business and Official Correspondence. Historical Investigations. 209 pages. 2006. ISBN 3-03910-880-8 · US-ISBN 0-8204-8352-4

Vol. 33

Giuliana Garzone & Srikant Sarangi (eds) Discourse, Ideology and Specialized Communication. 494 pages. 2007. ISBN 978-3-03910-888-6

Vol. 34

Giuliana Garzone & Cornelia Ilie (eds) The Use of English in Institutional and Business Settings. An Intercultural Perspective. 372 pages. 2007. ISBN 978-3-03910-889-3

Vol. 35

Vijay K. Bhatia & Maurizio Gotti (eds) Explorations in Specialized Genres. 316 pages. 2006. ISBN 3-03910-995-2 · US-ISBN 0-8204-8372-9

Vol. 36

Heribert Picht (ed.) Modern Approaches to Terminological Theories and Applications. 432 pages. 2006. ISBN 3-03911-156-6 · US-ISBN 0-8204-8380-X

Vol. 37

Anne Wagner & Sophie Cacciaguidi-Fahy (eds) Legal Language and the Search for Clarity / Le langage juridique et la quête de clarté. Practice and Tools / Pratiques et instruments. 487 pages. 2006. ISBN 3-03911-169-8 · US-ISBN 0-8204-8388-5

Vol. 38

Juan Carlos Palmer-Silveira, Miguel F. Ruiz-Garrido & Inmaculada Fortanet-Gómez (eds) Intercultural and International Business Communication. Theory, Research and Teaching. 2nd edition. 343 pages. 2006, 2008. ISBN 978-3-03911-680-5

Vol. 39

Christiane Dalton-Puffer, Dieter Kastovsky, Nikolaus Ritt & Herbert Schendl (eds) Syntax, Style and Grammatical Norms. English from 1500–2000. 250 pages. 2006. ISBN 3-03911-181-7 · US-ISBN 0-8204-8394-X

Vol. 40

Marina Dossena & Irma Taavitsainen (eds) Diachronic Perspectives on Domain-Specific English. 280 pages. 2006. ISBN 3-03910-176-0 · US-ISBN 0-8204-8391-5

Vol. 41

John Flowerdew & Maurizio Gotti (eds) Studies in Specialized Discourse. 293 pages. 2006. ISBN 3-03911-178-7

Vol. 42

Ken Hyland & Marina Bondi (eds) Academic Discourse Across Disciplines. 320 pages. 2006. ISBN 3-03911-183-3 · US-ISBN 0-8204-8396-6

Vol. 43

Paul Gillaerts & Philip Shaw (eds) The Map and the Landscape. Norms and Practices in Genre. 256 pages. 2006. ISBN 3-03911-182-5 · US-ISBN 0-8204-8395-4

Vol. 44

Maurizio Gotti & Davide Giannoni (eds) New Trends in Specialized Discourse Analysis. 301 pages. 2006. ISBN 3-03911-184-1 · US-ISBN 0-8204-8381-8

Vol. 45

Maurizio Gotti & Françoise Salager-Meyer (eds) Advances in Medical Discourse Analysis. Oral and Written Contexts. 492 pages. 2006. ISBN 3-03911-185-X · US-ISBN 0-8204-8382-6

Vol. 46

Maurizio Gotti & Susan Šarčević (eds) Insights into Specialized Translation. 396 pages. 2006. ISBN 3-03911-186-8 · US-ISBN 0-8204-8383-4

Vol. 47

Khurshid Ahmad & Margaret Rogers (eds) Evidence-based LSP. Translation, Text and Terminology. 584 pages. 2007. ISBN 978-3-03911-187-9

Vol. 48

Hao Sun & Dániel Z. Kádár (eds) It’s the Dragon’s Turn. Chinese Institutional Discourses. 262 pages. 2008. ISBN 978-3-03911-175-6

Vol. 49

Cristina Suárez-Gómez Relativization in Early English (950-1250). The Position of Relative Clauses. 149 pages. 2006. ISBN 3-03911-203-1 · US-ISBN 0-8204-8904-2

Vol. 50

Maria Vittoria Calvi & Luisa Chierichetti (eds) Nuevas tendencias en el discurso de especialidad. 319 pages. 2006. ISBN 978-3-03911-261-6

Vol. 51

Mari Carmen Campoy & María José Luzón (eds) Spoken Corpora in Applied Linguistics. 274 pages. 2008. ISBN 978-3-03911-275-3

Vol. 52

Konrad Ehlich & Dorothee Heller (Hrsg.) Die Wissenschaft und ihre Sprachen. 323 pages. 2006. ISBN 978-3-03911-272-2

Vol. 53

Jingyu Zhang The Semantic Salience Hierarchy Model. The L2 Acquisition of Psych Predicates. 273 pages. 2007. ISBN 978-3-03911-300-2

Vol. 54

Norman Fairclough, Giuseppina Cortese & Patrizia Ardizzone (eds) Discourse and Contemporary Social Change. 555 pages. 2007. ISBN 978-3-03911-276-0

Vol. 55

Jan Engberg, Marianne Grove Ditlevsen, Peter Kastberg & Martin Stegu (eds) New Directions in LSP Teaching. 331 pages. 2007. ISBN 978-3-03911-433-7

Vol. 56

Dorothee Heller & Konrad Ehlich (Hrsg.) Studien zur Rechtskommunikation. 322 pages. 2007. ISBN 978-3-03911-436-8

Vol. 57

Teruhiro Ishiguro & Kang-kwong Luke (eds) Grammar in Cross-Linguistic Perspective. The Syntax, Semantics, and Pragmatics of Japanese and Chinese. 304 pages. 2012. ISBN 978-3-03911-445-0

Vol. 58

Carmen Frehner Email – SMS – MMS. 294 pages. 2008. ISBN 978-3-03911-451-1

Vol. 59

Isabel Balteiro The Directionality of Conversion in English. A Dia-Synchronic Study. 276 pages. 2007. ISBN 978-3-03911-241-8

Vol. 60

Maria Milagros Del Saz Rubio English Discourse Markers of Reformulation. 237 pages. 2007. ISBN 978-3-03911-196-1

Vol. 61

Sally Burgess & Pedro Martín-Martín (eds) English as an Additional Language in Research Publication and Communication. 259 pages. 2008. ISBN 978-3-03911-462-7

Vol. 62

Sandrine Onillon Pratiques et représentations de l’écrit. 458 pages. 2008. ISBN 978-3-03911-464-1

Vol. 63

Hugo Bowles & Paul Seedhouse (eds) Conversation Analysis and Language for Specific Purposes. 2nd edition. 337 pages. 2007, 2009. ISBN 978-3-0343-0045-2

Vol. 64

Vijay K. Bhatia, Christopher N. Candlin & Paola Evangelisti Allori (eds) Language, Culture and the Law. The Formulation of Legal Concepts across Systems and Cultures. 342 pages. 2008. ISBN 978-3-03911-470-2

Vol. 65

Jonathan Culpeper & Dániel Z. Kádár (eds) Historical (Im)politeness. 300 pages. 2010. ISBN 978-3-03911-496-2

Vol. 66

Linda Lombardo (ed.) Using Corpora to Learn about Language and Discourse. 237 pages. 2009. ISBN 978-3-03911-522-8

Vol. 67

Natsumi Wakamoto Extroversion/Introversion in Foreign Language Learning. Interactions with Learner Strategy Use. 159 pages. 2009. ISBN 978-3-03911-596-9

Vol. 68

Eva Alcón-Soler (ed.) Learning How to Request in an Instructed Language Learning Context. 260 pages. 2008. ISBN 978-3-03911-601-0

Vol. 69

Domenico Pezzini The Translation of Religious Texts in the Middle Ages. 428 pages. 2008. ISBN 978-3-03911-600-3

Vol. 70

Tomoko Tode Effects of Frequency in Classroom Second Language Learning. Quasi-experiment and stimulated-recall analysis. 195 pages. 2008. ISBN 978-3-03911-602-7

Vol. 71

Egor Tsedryk Fusion symétrique et alternances ditransitives. 211 pages. 2009. ISBN 978-3-03911-609-6

Vol. 72

Cynthia J. Kellett Bidoli & Elana Ochse (eds) English in International Deaf Communication. 444 pages. 2008. ISBN 978-3-03911-610-2

Vol. 73

Joan C. Beal, Carmela Nocera & Massimo Sturiale (eds) Perspectives on Prescriptivism. 269 pages. 2008. ISBN 978-3-03911-632-4

Vol. 74

Carol Taylor Torsello, Katherine Ackerley & Erik Castello (eds) Corpora for University Language Teachers. 308 pages. 2008. ISBN 978-3-03911-639-3

Vol. 75

María Luisa Pérez Cañado (ed.) English Language Teaching in the European Credit Transfer System. Facing the Challenge. 251 pages. 2009. ISBN 978-3-03911-654-6

Vol. 76

Marina Dossena & Ingrid Tieken-Boon van Ostade (eds) Studies in Late Modern English Correspondence. Methodology and Data. 291 pages. 2008. ISBN 978-3-03911-658-4

Vol. 77

Ingrid Tieken-Boon van Ostade & Wim van der Wurff (eds) Current Issues in Late Modern English. 436 pages. 2009. ISBN 978-3-03911-660-7

Vol. 78

Marta Navarro Coy (ed.) Practical Approaches to Foreign Language Teaching and Learning. 297 pages. 2009. ISBN 978-3-03911-661-4

Vol. 79

Qing Ma Second Language Vocabulary Acquisition. 333 pages. 2009. ISBN 978-3-03911-666-9

Vol. 80

Martin Solly, Michelangelo Conoscenti & Sandra Campagna (eds) Verbal/Visual Narrative Texts in Higher Education. 384 pages. 2008. ISBN 978-3-03911-672-0

Vol. 81

Meiko Matsumoto From Simple Verbs to Periphrastic Expressions: The Historical Development of Composite Predicates, Phrasal Verbs, and Related Constructions in English. 235 pages. 2008. ISBN 978-3-03911-675-1

Vol. 82

Melinda Dooly Doing Diversity. Teachers’ Construction of Their Classroom Reality. 180 pages. 2009. ISBN 978-3-03911-687-4

Vol. 83

Victoria Guillén-Nieto, Carmen Marimón-Llorca & Chelo Vargas-Sierra (eds) Intercultural Business Communication and Simulation and Gaming Methodology. 392 pages. 2009. ISBN 978-3-03911-688-1

Vol. 84

Maria Grazia Guido English as a Lingua Franca in Cross-cultural Immigration Domains. 285 pages. 2008. ISBN 978-3-03911-689-8

Vol. 85

Erik Castello Text Complexity and Reading Comprehension Tests. 352 pages. 2008. ISBN 978-3-03911-717-8

Vol. 86

Maria-Lluisa Gea-Valor, Isabel García-Izquierdo & Maria-José Esteve (eds) Linguistic and Translation Studies in Scientific Communication. 317 pages. 2010. ISBN 978-3-0343-0069-8

Vol. 87

Carmen Navarro, Rosa Mª Rodríguez Abella, Francesca Dalle Pezze & Renzo Miotti (eds) La comunicación especializada. 355 pages. 2008. ISBN 978-3-03911-733-8

Vol. 88

Kiriko Sato The Development from Case-Forms to Prepositional Constructions in Old English Prose. 231 pages. 2009. ISBN 978-3-03911-763-5

Vol. 89

Dorothee Heller (Hrsg.) Formulierungsmuster in deutscher und italienischer Fachkommunikation. Intra- und interlinguale Perspektiven. 315 pages. 2008. ISBN 978-3-03911-778-9

Vol. 90

Henning Bergenholtz, Sandro Nielsen & Sven Tarp (eds) Lexicography at a Crossroads. Dictionaries and Encyclopedias Today, Lexicographical Tools Tomorrow. 372 pages. 2009. ISBN 978-3-03911-799-4

Vol. 91

Manouchehr Moshtagh Khorasani The Development of Controversies. From the Early Modern Period to Online Discussion Forums. 317 pages. 2009. ISBN 978-3-03911-711-6

Vol. 92

María Luisa Carrió-Pastor (ed.) Content and Language Integrated Learning. Cultural Diversity. 178 pages. 2009. ISBN 978-3-03911-818-2

Vol. 93

Roger Berry Terminology in English Language Teaching. Nature and Use. 262 pages. 2010. ISBN 978-3-0343-0013-1

Vol. 94

Roberto Cagliero & Jennifer Jenkins (eds) Discourses, Communities, and Global Englishes. 240 pages. 2010. ISBN 978-3-0343-0012-4

Vol. 95

Roberta Facchinetti, David Crystal & Barbara Seidlhofer (eds) From International to Local English – And Back Again. 268 pages. 2010. ISBN 978-3-0343-0011-7

Vol. 96

Cesare Gagliardi & Alan Maley (eds) EIL, ELF, Global English. Teaching and Learning Issues. 376 pages. 2010. ISBN 978-3-0343-0010-0

Vol. 97

Sylvie Hancil (ed.) The Role of Prosody in Affective Speech. 403 pages. 2009. ISBN 978-3-03911-696-6

Vol. 98

Marina Dossena & Roger Lass (eds) Studies in English and European Historical Dialectology. 257 pages. 2009. ISBN 978-3-0343-0024-7

Vol. 99

Christine Béal Les interactions quotidiennes en français et en anglais. De l’approche comparative à l’analyse des situations interculturelles. 424 pages. 2010. ISBN 978-3-0343-0027-8

Vol. 100

Maurizio Gotti (ed.) Commonality and Individuality in Academic Discourse. 398 pages. 2009. ISBN 978-3-0343-0023-0

Vol. 101

Javier E. Díaz Vera & Rosario Caballero (eds) Textual Healing. Studies in Medieval English Medical, Scientific and Technical Texts. 213 pages. 2009. ISBN 978-3-03911-822-9

Vol. 102

Nuria Edo Marzá The Specialised Lexicographical Approach. A Step further in Dictionary-making. 316 pages. 2009. ISBN 978-3-0343-0043-8

Vol. 103

Carlos Prado-Alonso, Lidia Gómez-García, Iria Pastor-Gómez & David Tizón-Couto (eds) New Trends and Methodologies in Applied English Language Research. Diachronic, Diatopic and Contrastive Studies. 348 pages. 2009. ISBN 978-3-0343-0046-9

Vol. 104

Françoise Salager-Meyer & Beverly A. Lewin Crossed Words. Criticism in Scholarly Writing? 371 pages. 2011. ISBN 978-3-0343-0049-0.

Vol. 105

Javier Ruano-García Early Modern Northern English Lexis. A Literary Corpus-Based Study. 611 pages. 2010. ISBN 978-3-0343-0058-2

Vol. 106

Rafael Monroy-Casas Systems for the Phonetic Transcription of English. Theory and Texts. 280 pages. 2011. ISBN 978-3-0343-0059-9

Vol. 107

Nicola T. Owtram The Pragmatics of Academic Writing. A Relevance Approach to the Analysis of Research Article Introductions. 311 pages. 2009. ISBN 978-3-0343-0060-5

Vol. 108

Yolanda Ruiz de Zarobe, Juan Manuel Sierra & Francisco Gallardo del Puerto (eds) Content and Foreign Language Integrated Learning. Contributions to Multilingualism in European Contexts. 343 pages. 2011. ISBN 978-3-0343-0074-2

Vol. 109

Ángeles Linde López & Rosalía Crespo Jiménez (eds) Professional English in the European context. The EHEA challenge. 374 pages. 2010. ISBN 978-3-0343-0088-9

Vol. 110

Rosalía Rodríguez-Vázquez The Rhythm of Speech, Verse and Vocal Music. A New Theory. 394 pages. 2010. ISBN 978-3-0343-0309-5

Vol. 111

Anastasios Tsangalidis & Roberta Facchinetti (eds) Studies on English Modality. In Honour of Frank Palmer. 392 pages. 2009. ISBN 978-3-0343-0310-1

Vol. 112

Jing Huang Autonomy, Agency and Identity in Foreign Language Learning and Teaching. 400 pages. 2013. ISBN 978-3-0343-0370-5

Vol. 113

Mihhail Lotman & Maria-Kristiina Lotman (eds) Frontiers in Comparative Prosody. In memoriam: Mikhail Gasparov. 426 pages. 2011. ISBN 978-3-0343-0373-6

Vol. 114

Merja Kytö, John Scahill & Harumi Tanabe (eds) Language Change and Variation from Old English to Late Modern English. A Festschrift for Minoji Akimoto. 422 pages. 2010. ISBN 978-3-0343-0372-9

Vol. 115

Giuliana Garzone & Paola Catenaccio (eds) Identities across Media and Modes. Discursive Perspectives. 379 pages. 2009. ISBN 978-3-0343-0386-6

Vol. 116

Elena Landone Los marcadores del discurso y cortesía verbal en español. 390 pages. 2010. ISBN 978-3-0343-0413-9

Vol. 117

Maurizio Gotti & Christopher Williams (eds) Legal Discourse across Languages and Cultures. 339 pages. 2010. ISBN 978-3-0343-0425-2

Vol. 118

David Hirsh Academic Vocabulary in Context. 217 pages. 2010. ISBN 978-3-0343-0426-9

Vol. 119

Yvonne Dröschel Lingua Franca English. The Role of Simplification and Transfer. 358 pages. 2011. ISBN 978-3-0343-0432-0

Vol. 120

Tengku Sepora Tengku Mahadi, Helia Vaezian & Mahmoud Akbari Corpora in Translation. A Practical Guide. 135 pages. 2010. ISBN 978-3-0343-0434-4

Vol. 121

Davide Simone Giannoni & Celina Frade (eds) Researching Language and the Law. Textual Features and Translation Issues. 278 pages. 2010. ISBN 978-3-0343-0443-6

Vol. 122

Daniel Madrid & Stephen Hughes (eds) Studies in Bilingual Education. 472 pages. 2011. ISBN 978-3-0343-0474-0

Vol. 123

Vijay K. Bhatia, Christopher N. Candlin & Maurizio Gotti (eds) The Discourses of Dispute Resolution. 290 pages. 2010. ISBN 978-3-0343-0476-4

Vol. 124

Davide Simone Giannoni Mapping Academic Values in the Disciplines. A Corpus-Based Approach. 288 pages. 2010. ISBN 978-3-0343-0488-7

Vol. 125

Giuliana Garzone & James Archibald (eds) Discourse, Identities and Roles in Specialized Communication. 419 pages. 2010. ISBN 978-3-0343-0494-8

Vol. 126

Iria Pastor-Gómez The Status and Development of N+N Sequences in Contemporary English Noun Phrases. 216 pages. 2011. ISBN 978-3-0343-0534-1

Vol. 127

Carlos Prado-Alonso Full-verb Inversion in Written and Spoken English. 261 pages. 2011. ISBN 978-3-0343-0535-8

Vol. 128

Tony Harris & María Moreno Jaén (eds) Corpus Linguistics in Language Teaching. 214 pages. 2010. ISBN 978-3-0343-0524-2

Vol. 129

Tetsuji Oda & Hiroyuki Eto (eds) Multiple Perspectives on English Philology and History of Linguistics. A Festschrift for Shoichi Watanabe on his 80th Birthday. 378 pages. 2010. ISBN 978-3-0343-0480-1

Vol. 130

Luisa Chierichetti & Giovanni Garofalo (eds) Lengua y Derecho. Líneas de investigación interdisciplinaria. 283 pages. 2010. ISBN 978-3-0343-0463-4

Vol. 131

Paola Evangelisti Allori & Giuliana Garzone (eds) Discourse, Identities and Genres in Corporate Communication. Sponsorship, Advertising and Organizational Communication. 324 pages. 2011. ISBN 978-3-0343-0591-4

Vol. 132

Leyre Ruiz de Zarobe & Yolanda Ruiz de Zarobe (eds) Speech Acts and Politeness across Languages and Cultures. 402 pages. 2012. ISBN 978-3-0343-0611-9

Vol. 133

Thomas Christiansen Cohesion. A Discourse Perspective. 387 pages. 2011. ISBN 978-3-0343-0619-5

Vol. 134

Giuliana Garzone & Maurizio Gotti Discourse, Communication and the Enterprise. Genres and Trends. 451 pages. 2011. ISBN 978-3-0343-0620-1

Vol. 135

Zsuzsa Hoffmann Ways of the World’s Words. Language Contact in the Age of Globalization. 334 pages. 2011. ISBN 978-3-0343-0673-7

Vol. 136

Cecilia Varcasia (ed.) Becoming Multilingual. Language Learning and Language Policy between Attitudes and Identities. 213 pages. 2011. ISBN 978-3-0343-0687-5

Vol. 137

Susy Macqueen The Emergence of Patterns in Second Language Writing. A Sociocognitive Exploration of Lexical Trails. 325 pages. 2012. ISBN 978-3-0343-1010-9

Vol. 138

Maria Vittoria Calvi & Giovanna Mapelli (eds) La lengua del turismo. Géneros discursivos y terminología. 365 pages. 2011. ISBN 978-3-0343-1011-6

Vol. 139

Ken Lau Learning to Become a Professional in a Textually-Mediated World. A Text-Oriented Study of Placement Practices. 261 pages. 2012. ISBN 978-3-0343-1016-1

Vol. 140

Sandra Campagna, Giuliana Garzone, Cornelia Ilie & Elizabeth Rowley-Jolivet (eds) Evolving Genres in Web-mediated Communication. 337 pages. 2012. ISBN 978-3-0343-1013-0

Vol. 141

Edith Esch & Martin Solly (eds) The Sociolinguistics of Language Education in International Contexts. 263 pages. 2012. ISBN 978-3-0343-1009-3

Vol. 142

Forthcoming.

Vol. 143

David Tizón-Couto Left Dislocation in English. A Functional-Discoursal Approach. 416 pages. 2012. ISBN 978-3-0343-1037-6

Vol. 144

Margrethe Petersen & Jan Engberg (eds) Current Trends in LSP Research. Aims and Methods. 323 pages. 2011. ISBN 978-3-0343-1054-3

Vol. 145

David Tizón-Couto, Beatriz Tizón-Couto, Iria Pastor-Gómez & Paula Rodríguez-Puente (eds) New Trends and Methodologies in Applied English Language Research II. Studies in Language Variation, Meaning and Learning. 283 pages. 2012. ISBN 978-3-0343-1061-1

Vol. 146

Rita Salvi & Hiromasa Tanaka (eds) Intercultural Interactions in Business and Management. 306 pages. 2011. ISBN 978-3-0343-1039-0

Vol. 147

Francesco Straniero Sergio & Caterina Falbo (eds) Breaking Ground in Corpus-based Interpreting Studies. 254 pages. 2012. ISBN 978-3-0343-1071-0

Vol. 148

Forthcoming.

Vol. 149

Vijay K. Bhatia & Paola Evangelisti Allori (eds) Discourse and Identity in the Professions. Legal, Corporate and Institutional Citizenship. 352 pages. 2011. ISBN 978-3-0343-1079-6

Vol. 150

Maurizio Gotti (ed.) Academic Identity Traits. A Corpus-Based Investigation. 363 pages. 2012. ISBN 978-3-0343-1141-0

Vol. 151

Priscilla Heynderickx, Sylvain Dieltjens, Geert Jacobs, Paul Gillaerts & Elizabeth de Groot (eds) The Language Factor in International Business. New Perspectives on Research, Teaching and Practice. 320 pages. 2012. ISBN 978-3-0343-1090-1

Vol. 152

Paul Gillaerts, Elizabeth de Groot, Sylvain Dieltjens, Priscilla Heynderickx & Geert Jacobs (eds) Researching Discourse in Business Genres. Cases and Corpora. 215 pages. 2012. ISBN 978-3-0343-1092-5

Vol. 153

Yongyan Zheng Dynamic Vocabulary Development in a Foreign Language. 262 pages. 2012. ISBN 978-3-0343-1106-9

Vol. 154

Carmen Argondizzo (ed.) Creativity and Innovation in Language Education. 357 pages. 2012. ISBN 978-3-0343-1080-2

Vol. 155

David Hirsh (ed.) Current Perspectives in Second Language Vocabulary Research. 180 pages. 2012. ISBN 978-3-0343-1108-3

Vol. 156

Seiji Shinkawa Unhistorical Gender Assignment in Laȝamon’s Brut. A Case Study of a Late Stage in the Development of Grammatical Gender toward its Ultimate Loss. 186 pages. 2012. ISBN 978-3-0343-1124-3

Vol. 157

Yeonkwon Jung Basics of Organizational Writing: A Critical Reading Approach. 151 pages. 2014. ISBN 978-3-0343-1137-3.

Vol. 158

Bárbara Eizaga Rebollar (ed.) Studies in Linguistics and Cognition. 301 pages. 2012. ISBN 978-3-0343-1138-0

Vol. 159

Giuliana Garzone, Paola Catenaccio, Chiara Degano (eds) Genre Change in the Contemporary World. Short-term Diachronic Perspectives. 329 pages. 2012. ISBN 978-3-0343-1214-1

Vol. 160

Carol Berkenkotter, Vijay K. Bhatia & Maurizio Gotti (eds) Insights into Academic Genres. 468 pages. 2012. ISBN 978-3-0343-1211-0

Vol. 161

Beatriz Tizón-Couto Clausal Complements in Native and Learner Spoken English. A corpus-based study with Lindsei and Vicolse. 357 pages. 2013. ISBN 978-3-0343-1184-7

Vol. 162

Patrizia Anesa Jury Trials and the Popularization of Legal Language. A Discourse Analytical Approach. 247 pages. 2012. ISBN 978-3-0343-1231-8

Vol. 163

David Hirsh Endangered Languages, Knowledge Systems and Belief Systems. 153 pages. 2013. ISBN 978-3-0343-1232-5

Vol. 164

Eugenia Sainz (ed.) De la estructura de la frase al tejido del discurso. Estudios contrastivos español/italiano. 305 pages. 2014. ISBN 978-3-0343-1253-0

Vol. 165

Julia Bamford, Franca Poppi & Davide Mazzi (eds) Space, Place and the Discursive Construction of Identity. 367 pages. 2014. ISBN 978-3-0343-1249-3

Vol. 166

Rita Salvi & Janet Bowker (eds) Space, Time and the Construction of Identity. Discursive Indexicality in Cultural, Institutional and Professional Fields. 324 pages. 2013. ISBN 978-3-0343-1254-7

Vol. 167

Shunji Yamazaki & Robert Sigley (eds) Approaching Language Variation through Corpora. A Festschrift in Honour of Toshio Saito. 421 pages. 2013. ISBN 978-3-0343-1264-6

Vol. 168

Franca Poppi Global Interactions in English as a Lingua Franca. How written communication is changing under the influence of electronic media and new contexts of use. 249 pages. 2012. ISBN 978-3-0343-1276-9

Vol. 169

Miguel A. Aijón Oliva & María José Serrano Style in syntax. Investigating variation in Spanish pronoun subjects. 239 pages. 2013. ISBN 978-3-0343-1244-8

Vol. 170

Inés Olza, Óscar Loureda & Manuel Casado-Velarde (eds) Language Use in the Public Sphere. Methodological Perspectives and Empirical Applications. 564 pages. 2014. ISBN 978-3-0343-1286-8

Vol. 171

Aleksandra Matulewska Legilinguistic Translatology. A Parametric Approach to Legal Translation. 279 pages. 2013. ISBN 978-3-0343-1287-5

Vol. 172

Maurizio Gotti & Carmen Sancho Guinda (eds) Narratives in Academic and Professional Genres. 513 pages. 2013. ISBN 978-3-0343-1371-1

Vol. 173

Madalina Chitez Learner corpus profiles. The case of Romanian Learner English. 244 pages. 2014. ISBN 978-3-0343-1410-7

Vol. 174

Chihiro Inoue Task Equivalence in Speaking Tests. 251 pages. 2013. ISBN 978-3-0343-1417-6

Vol. 175

Gabriel Quiroz & Pedro Patiño (eds.) LSP in Colombia: advances and challenges. 339 pages. 2014. ISBN 978-3-0343-1434-3

Vol. 176

Catherine Resche Economic Terms and Beyond: Capitalising on the Wealth of Notions. How Researchers in Specialised Varieties of English Can Benefit from Focusing on Terms. 332 pages. 2013. ISBN 978-3-0343-1435-0

Vol. 177

Forthcoming.

Vol. 178

Cécile Desoutter & Caroline Mellet (dir.) Le discours rapporté: approches linguistiques et perspectives didactiques. 270 pages. 2013. ISBN 978-3-0343-1292-9

Vol. 179

Ana Díaz-Negrillo & Francisco Javier Díaz-Pérez (eds) Specialisation and Variation in Language Corpora. 341 pages. 2014. ISBN 978-3-0343-1316-2

Vol. 180

Pilar Alonso A Multi-dimensional Approach to Discourse Coherence. From Standardness to Creativity. 247 pages. 2014. ISBN 978-3-0343-1325-4

Vol. 181

Alejandro Alcaraz-Sintes & Salvador Valera-Hernández (eds) Diachrony and Synchrony in English Corpus Linguistics. 393 pages. 2014. ISBN 978-3-0343-1326-1

Vol. 182

Runhan Zhang Investigating Linguistic Knowledge of a Second Language. 207 pages. 2015. ISBN 978-3-0343-1330-8

Vol. 183

Hajar Abdul Rahim & Shakila Abdul Manan (eds.) English in Malaysia. Postcolonial and Beyond. 267 pages. 2014. ISBN 978-3-0343-1341-4

Vol. 184

Virginie Fasel Lauzon Comprendre et apprendre dans l’interaction. Les séquences d’explication en classe de français langue seconde. 292 pages. 2014. ISBN 978-3-0343-1451-0

Vol. 185

Forthcoming.

Vol. 186

Wei Ren L2 Pragmatic Development in Study Abroad Contexts. 256 pages. 2015. ISBN 978-3-0343-1358-2

Vol. 187

Marina Bondi & Rosa Lorés Sanz (eds) Abstracts in Academic Discourse. Variation and Change. 361 pages. 2014. ISBN 978-3-0343-1483-1

Vol. 188

Forthcoming.

Vol. 189

Paola Evangelisti Allori (ed.) Identities in and across Cultures. 315 pages. 2014. ISBN 978-3-0343-1458-9

Vol. 190

Erik Castello, Katherine Ackerley & Francesca Coccetta (eds) Studies in Learner Corpus Linguistics. Research and Applications for Foreign Language Teaching and Assessment. 358 pages. 2015. ISBN 978-3-0343-1506-7

Vol. 191

Ruth Breeze, Maurizio Gotti & Carmen Sancho Guinda (eds) Interpersonality in Legal Genres. 389 pages. 2014. ISBN 978-3-0343-1524-1

Vol. 192

Paola Evangelisti Allori, John Bateman & Vijay K. Bhatia (eds) Evolution in Genre. Emergence, Variation, Multimodality. 364 pages. 2014. ISBN 978-3-0343-1533-3

Vol. 193

Jiyeon Kook Agency in Arzt-Patient-Gesprächen. Zur interaktionistischen Konzeptualisierung von Agency. 271 pages. 2015. ISBN 978-3-0343-1666-8

Vol. 194

Susana Nicolás Román & Juan José Torres Núñez (eds) Drama and CLIL. A new challenge for the teaching approaches in bilingual education. 170 pages. 2015. ISBN 978-3-0343-1629-3

Vol. 195

Alessandra Molino & Serenella Zanotti (eds) Observing Norm, Observing Usage. Lexis in Dictionaries and in the Media. 430 pages. 2015. ISBN 978-3-0343-1584-5

Vol. 196

Begoña Soneira A Lexical Description of English for Architecture. A Corpus-based Approach. 267 pages. 2015. ISBN 978-3-0343-1602-6

Vol. 197

M Luisa Roca-Varela False Friends in Learner Corpora. A corpus-based study of English false friends in the written and spoken production of Spanish learners. 348 pages. 2015. ISBN 978-3-0343-1620-0

Vol. 198

Rahma Al-Mahrooqi & Christopher Denman Bridging the Gap between Education and Employment. English Language Instruction in EFL Contexts. 416 pages. 2015. ISBN 978-3-0343-1681-1

Vol. 199

Rita Salvi & Janet Bowker (eds) The Dissemination of Contemporary Knowledge in English. Genres, discourse strategies and professional practices. 171 pages. 2015. ISBN 978-3-0343-1679-8

Vol. 200

Maurizio Gotti & Davide S. Giannoni (eds) Corpus Analysis for Descriptive and Pedagogical Purposes. ESP Perspectives. 432 pages. 2014. ISBN 978-3-0343-1516-6

Vol. 201

Ida Ruffolo The Perception of Nature in Travel Promotion Texts. A Corpus-based Discourse Analysis. 148 pages. 2015. ISBN 978-3-0343-1521-0

Vol. 202

Ives Trevian English suffixes. Stress-assignment properties, productivity, selection and combinatorial processes. 471 pages. 2015. ISBN 978-3-0343-1576-0

Vol. 203

Maurizio Gotti, Stefania Maci & Michele Sala (eds) Insights into Medical Communication. 422 pages. 2015. ISBN 978-3-0343-1694-1

Vol. 204

Carmen Argondizzo (ed.) European Projects in University Language Centres. Creativity, Dynamics, Best Practice. 371 pages. 2015. ISBN 978-3-0343-1696-5

Vol. 205

Aura Luz Duffé Montalván (ed.) Estudios sobre el léxico. Puntos y contrapuntos. 502 pages. 2016. ISBN 978-3-0343-2011-5

Vol. 206

Maria Pavesi, Maicol Formentelli & Elisa Ghia (eds) The Languages of Dubbing. Mainstream Audiovisual Translation in Italy. 275 pages. 2014. ISBN 978-3-0343-1646-0

Vol. 207

Forthcoming.

Vol. 208

Vijay K. Bhatia & Maurizio Gotti (eds) Arbitration Discourse in Asia. 331 pages. 2015. ISBN 978-3-0343-2032-0

Vol. 209

Forthcoming.

Vol. 210

Francisco Alonso Almeida, Laura Cruz García & Víctor González Ruiz (eds) Corpus-based studies on language varieties. 285 pages. 2016. ISBN 978-3-0343-2044-3

Vol. 211

Juan Pedro Rica Peromingo Aspectos lingüísticos y técnicos de la traducción audiovisual (TAV). 177 pages. 2016. ISBN 978-3-0343-2055-9

Vol. 212-213

Forthcoming.

Vol. 214

Larissa D’Angelo Academic posters. A textual and visual metadiscourse analysis. 367 pages. 2016. ISBN 978-3-0343-2083-2

Vol. 215

Evelyne Berger Prendre la parole en L2. Regard sur la compétence d’interaction en classe de langue. 246 pages. 2016. ISBN 978-3-0343-2084-9

Vol. 216

David Lasagabaster and Aintzane Doiz (eds) CLIL experiences in secondary and tertiary education: In search of good practices. 262 pages. 2016. ISBN 978-3-0343-2104-4

Vol. 217

Forthcoming.

Vol. 218

Sandra Campagna, Elana Ochse, Virginia Pulcini & Martin Solly (eds) Languaging in and across Communities: New Voices, New Identities. Studies in Honour of Giuseppina Cortese. 507 pages. 2016. ISBN 978-3-0343-2073-3

Vol. 219

Adriana Orlandi & Laura Giacomini (eds) Defining collocation for lexicographic purposes. From linguistic theory to lexicographic practice. 328 pages. 2016. ISBN 978-3-0343-2054-2