LINGUISTICS & PHILOSOPHY
Edited by Günther Grewendorf • Wolfram Hinzen • Hans Kamp • Helmut Weiss
Band 2 / Volume 2
Edouard Machery • Markus Werning • Gerhard Schurz (Eds.)
The Compositionality of Meaning and Content Volume II Applications to Linguistics, Psychology and Neuroscience
ontos verlag
Frankfurt | Paris | Ebikon | Lancaster | New Brunswick
Bibliographic information published by Die Deutsche Bibliothek Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliographie; detailed bibliographic data is available in the Internet at http://dnb.ddb.de
North and South America by Transaction Books, Rutgers University, Piscataway, NJ 08854-8042, [email protected]
United Kingdom, Ireland, Iceland, Turkey, Malta, Portugal by Gazelle Books Services Limited, White Cross Mills, Hightown, Lancaster LA1 4XS, [email protected]
© 2005 ontos verlag, P.O. Box 15 41, D-63133 Heusenstamm, www.ontosverlag.com
ISBN 3-937202-53-6
No part of this book may be reproduced, stored in retrieval systems or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use of the purchaser of the work.
Printed on acid-free paper, ISO-Norm 9706, FSC-certified (Forest Stewardship Council). This hardcover binding meets the International Library standard. Printed in Germany by buch bücher dd ag.
Contents

Preface 7

I Compositionality and Language 11

Compositionality, Linguistic Evolution, and Induction by Minimum Description Length (Henry Brighton) 13
Compositionality and Molecularism (Denis Bonnay) 41
Compositionality, Aberrant Sentences and Unfamiliar Situations (Filip Buekens) 63
Challenging the Principle of Compositionality in Interpreting Natural Language Texts (Françoise Gayral, Daniel Kayser and François Lévy) 83
Compositionality in Plant Fixed Expressions (Shelley Ching-yu Hsieh, Chinfa Lien and Sebastian Meier) 107
Type- and Token-Level Composition (Olav Mueller-Reichau) 123
Compositionality and Contextuality as Adjoint Principles (Oleg Prosorov) 149

II Compositionality and the Mind 177

Perspectives, Compositionality and Complex Concepts (Nick Braisby) 179
Compositionality and the Pragmatics of Conceptual Combination (Fintan J. Costello and Mark T. Keane) 203
The Goldilocks Scenario: Is Noun-noun Compounding Compositional? (George Dunbar) 217
Structured Thoughts: The Spatial-Motor View (Pierre Poirier and Benoit Hardy-Vallée) 229

III Compositionality and the Brain 251

Learning Representations: How Stochastic Models Compose the World (Ralf Garionis) 253
Neural Architectures of Compositionality (Frank van der Velde) 265
Neuronal Synchronization, Covariation, and Compositional Representation (Markus Werning) 283
Preface

As editors of the two volumes The Compositionality of Meaning and Content: Foundational Issues and The Compositionality of Meaning and Content: Applications to Linguistics, Psychology and Neuroscience, we are aiming at an audience of students, scholars and scientists who are interested in the topic of compositionality and want to learn about it from a great variety of perspectives. Under the umbrella of cognitive science, psychologists, philosophers, linguists, logicians, modelers and neuroscientists have long been concerned with the compositionality of representational structures such as language, mind and brain. However, little exchange has taken place across the disciplines on this topic. For the first time, a two-volume compendium has now been brought together that conjoins those often diverging perspectives.

The first volume explored foundational issues regarding the principle of compositionality. It focused on its justification, scope and formal analysis, its relation to holism and contextual aspects, as well as its implications for the nature of meaning, thought and other forms of representation. The first volume also contains a longer preface to which we would like to refer the reader interested in a survey of the various contemporary debates on compositionality. That preface synoptically relates the papers of both volumes to each other and to the relevant topics debated in the cognitive sciences.

The second volume now shifts perspective towards more applied issues. This is why the contributors predominantly come from linguistics, psychology and computational neuroscience. Still, a couple of papers by logicians and philosophers have been added because of their relevance for the issues discussed in this volume.

This volume is divided into three parts. The first part comprises papers on the role of compositionality in the sciences of language. It begins with Henry Brighton's study on the emergence of compositional structures in the evolution of language. Denis Bonnay then discusses the question of how many sentences of a language one has to take into account to determine the meaning of its basic expressions. He contrasts the holistic view, according to which it is necessary to consider the whole use of sentences, with the molecular view, according to which it suffices to consider the use of only some sentences. To decide the question, he appeals to the principle of compositionality and distinguishes several readings of it.
Filip Buekens asks whether we can understand syntactically well-formed, but semantically aberrant sentences like 'Oscar opened the mountain.' Whereas truth-functional compositionalists say we can, some truth-conditional pragmatists claim that we cannot. Françoise Gayral, Daniel Kayser and François Lévy argue that the hypothesis of compositionality is defective because it fails to account for the interpretation of natural language texts. As an alternative they favor a non-monotonic system of rules, which express constraints on diverse levels of linguistic and extra-linguistic knowledge. Oleg Prosorov, in contrast, takes a positive stance towards the principle of compositionality and explores its role in hermeneutics. He argues that it is adjoint to the context principle.

Two applied studies round off the first part on the role of compositionality in the sciences of language by comparing the compositional structures of different languages. Shelley Ching-yu Hsieh, Chinfa Lien and Sebastian Meier argue for a semantic difference in the composition of idiomatic expressions involving words for plants in German and Mandarin Chinese. Olav Mueller-Reichau, finally, focuses on a semantic type/token difference such as the one responsible for the ambiguity of the English sentence 'The Dutchman is a good sailor.' Unlike English, languages such as Hungarian or Turkish unequivocally signal the type/token status by syntactic means. Pointing to languages like Tongan, Mueller-Reichau suggests that the syntactic specification of the type/token status may outbalance the lexical specification of the noun/verb distinction, so that there may be 'type/token languages' and 'noun/verb languages.'

The second part of the book deals with the role of compositionality in psychology. Here the composition of complex concepts and its relation to categorization is the main focus. Nick Braisby presents an account of concepts and categorization that regards concepts as binary, and categorization as perspectival. In this way, he argues, a number of problematic cases for the compositional interpretation of complex concepts can be accounted for. Fintan Costello and Mark Keane plead for a graded definition of compositionality. The degree of compositionality of a complex expression is identified with the degree to which that expression can be understood correctly by people with a range of differences in background knowledge. Compositionality is thus linked to the pragmatics of communication. George Dunbar supplements the discussion of conceptual complexity with a corpus study and some experimental data. The last paper in the second part, authored by Pierre Poirier and Benoit Hardy-Vallée, asks whether thinking is linguistic. The two authors argue that we think not in words, but in 'structured analogous representations.'

The third part of this volume discusses the issue of compositionality in the light of neuronal models of information processes. Ralf Garionis presents a stochastic model for the generation of compositional representations. Frank van der Velde and Markus Werning present different models for the realization
of compositional semantic structures in neural architectures. Both papers are closely related to the binding problem of neuroscience. While van der Velde rejects neural synchronization as a vital means to solve the binding problem and to achieve compositional structures, Werning's model is based on the hypothesis of neuronal synchronization.

The contributions to these volumes originated in presentations at two conferences organized by the editors. The first conference, CoCoCo (Compositionality, Concepts, & Cognition), was organized in March 2004 at the Heinrich-Heine University Düsseldorf, Germany. The second, NAC 2004 (New Aspects of Compositionality), was organized in June 2004 at the Ecole Normale Supérieure, Paris, France.

The two conferences on compositionality and the two volumes resulting from them would not have been possible without the support of many people and various institutions. We would like to repeat our thanks to all of them here. Compositionality, Concepts, & Cognition in Düsseldorf was generously funded by the Thyssen Foundation and the Heinrich-Heine University Düsseldorf, while New Aspects of Compositionality in Paris was financially supported by the RESCIF (Réseau des Sciences Cognitives en Ile-de-France), the department of cognitive studies at the Ecole Normale Supérieure in Paris, the University of Paris-Sorbonne, and the Institut Jean-Nicod. We, the editors, would like to thank these institutions for their support.

We are also pleased to express our gratitude to those who have trusted us and who have supported our efforts. We would particularly like to thank Daniel Andler, director of the department of cognitive studies at the Ecole Normale Supérieure, who backed this project from the outset, as well as Pascal Engel (University of Paris-Sorbonne, philosophy department) and Pierre Jacob (Institut Jean-Nicod, Paris).

We would like to thank the prestigious scientific board for the two conferences in Paris and Düsseldorf, which consisted of the following distinguished scholars: Daniel Andler (department of cognitive studies, ENS, Paris), Peter Carruthers (department of philosophy, University of Maryland), James Hampton (department of psychology, City University London), Douglas Medin (department of psychology, Northwestern University), Jesse Prinz (department of philosophy, University of North Carolina, Chapel Hill), François Recanati (Institut Jean-Nicod, Paris), Philippe Schlenker (department of linguistics, UCLA), and Dag Westerståhl (department of philosophy, University of Gothenburg).

We would also like to thank the referees for the two volumes: Claire Beyssade, Daniel Cohnitz, Fabrice Correia, David Danks, Jérôme Dokic, Chris Eliasmith, Manuel Garcia-Carpintero, James Hampton, Heidi Harley, Paul Horwich, Theo Janssen, Kent Johnson, Ingrid Kaufman, Marcus Kracht, Hannes Leitgeb, Pascal Ludwig, Alexander Maye, Thomas Müller, Reiner Osswald,
Elisabeth Pacherie, Peter Pagin, Jérôme Pelletier, Josh Schechter, Benjamin Spector, Su Xiaoqin, and Dan Weiskopf.

Finally, a number of helpful hands were involved in the making of the Düsseldorf conference: in particular, those of Myung-Hee Theuer, Marc Breuer, Jens Fleischhauer, Hakan Beseoglu, Markus Stalf, Eckhart Arnold, Nicole Altvater, Marco Lagger, Sven Sommerfeld, and Celia Spoden. The Parisian conference would not have been possible without the help of Evelyne Delmer at the department of cognitive studies of the Ecole Normale Supérieure. To all of them we cordially express our thanks.

Pittsburgh and Düsseldorf, October 2005
M.W., E.M., & G.S.
Part I
Compositionality and Language
Compositionality, Linguistic Evolution, and Induction by Minimum Description Length

Henry Brighton
1 Introduction
The following question is crucial to both linguistics and cognitive science in general: how can we go about explaining why language has certain structural properties and not others? The dominant explanation proposes that constraints on linguistic variation – universal patterns found in language – are a direct reflection of properties of a genetically determined language faculty (e.g., Chomsky, 1965, 2002). Compositionality is one such universal characteristic of language.

The dominant explanation suffices if we regard the process of language acquisition to be in no way a process involving inductive generalizations. This is to say that the essential characteristics of language are not learned in any meaningful sense, as they are not the product of inductive generalizations made from linguistic data. This conjecture depends in large part on what is known as the argument from the poverty of the stimulus (APS), which states that the data required to make the appropriate inductive generalizations is simply not available to a child during language acquisition (e.g., Wexler, 1991). Despite the dominance of this theory, the assumption that linguistic universals are in no sense acquired as a result of inductive generalizations is controversial: the APS is only a conjecture, and it is in opposition to several alternative theoretical standpoints and empirical studies (see Cowie, 1999; Pullum and Scholz, 2002).

In the discussion that follows, I consider how deviation from the extreme position suggested by the APS poses a problem when explaining linguistic universals such as compositionality. To address this deficiency, the following discussion considers an alternative explanation for the occurrence of compositionality in language. The validity of this alternative is tested using a computational model.

Address for correspondence: Center for Adaptive Behavior and Cognition, Max Planck Institute for Human Development, Lentzeallee 94, D-14195 Berlin, Germany. E-mail: [email protected].
In particular, I will argue that compositionality in language is not a direct reflection of our genetic endowment, but is instead fundamentally related to the way language is learned and culturally transmitted (see also Deacon, 1997; Christiansen and Kirby, 2003; Brighton et al., 2005). Central to this explanation is the role of inductive inference. For this reason, the computational model discussed below employs a model of induction based on the minimum description length principle (Rissanen, 1978, 1989).
2 Issues in Explaining Linguistic Universals
Language is a particular system of relating sound and meaning. Individual languages achieve this relationship in different, but tightly constrained, ways. That is to say that variation exists across languages, but the object of study for many linguists is the set of common structural hallmarks we see across the world's languages. Why do all languages share these properties? Among those interested in language, a widespread hypothesis is that these intrinsic properties of language are, like the visual system, an expression of the genes (e.g., Chomsky, 2002). To support this view, we can note that children master complex features of language on the basis of surprisingly little evidence. In fact, as we have seen, the APS is a conjecture stipulating that the knowledge of language children attain is surprising precisely because it cannot be derived solely from information made available by the linguistic environment (e.g., Chomsky, 1965; Wexler, 1991; Cowie, 1999; Pullum and Scholz, 2002). The modern debate on the innateness of language is dominated by the notion that the framework for linguistic development is innate, with the linguistic environment serving to supply information that steers an internally directed course of development. In this sense, languages themselves (e.g., Spanish, Mandarin Chinese) are not encoded entirely in the genes, but the fundamental, abstract properties of language are.

How can we gain an understanding of these innately specified hallmarks of language? Linguistics, by conducting a thorough analysis of the world's languages, proposes a set of descriptive statements which capture these hallmarks. For example, the linguist may identify properties common to all languages they encounter, properties that occur according to a certain statistical distribution, or implicational hierarchies of properties that fit with known languages. Collectively, such descriptive statements constitute a theory of language universals (e.g., Comrie, 1989; Croft, 1990; O'Grady et al., 1997). Linguistic universals define the dimensions of variation in language. Modern linguistic theory rests on the assertion that it is these dimensions of variation that are genetically determined.

As an explanatory framework this approach to explaining why language
exhibits specific structural characteristics is very powerful. One aspect of its strength is that by coupling universal properties of language tightly to a theory of innate constraints, our analysis of the structural hallmarks of language must center on a wholly psychological (i.e., cognitive, mentalistic, or internalist) explanation. As a consequence, by understanding those parts of the human cognitive system relevant to language we can understand why languages have certain structural characteristics and not others. With respect to understanding language, our object of study has been circumscribed to encompass a physical organ: the brain. The relationship between descriptive statements of universal properties of language and explanatory statements of why language is this way is therefore largely transparent.

As we have seen, this position is largely substantiated by the argument from the poverty of the stimulus. One outcome of this hypothesis is that children do not learn language in the usual sense, but rather acquire it as a result of internally directed processes of maturation. For example, Chomsky states that "it must be that the basic structure of language is essentially uniform and is coming from inside, not from outside" (Chomsky, 2002). The claim that language is not learned causes a great deal of controversy and will impact heavily on the discussion to come. Nevertheless, to characterize the traditional position, we should note that language is often considered part of our biological endowment, just like the visual system. The intuition is that one would not want to claim that we learn to see, and in the same way, we should not claim that we learn to speak.

Language learning under innate constraints

Linguistic nativism is far from accepted in the extreme form presented above (for insightful discussion, see Cowie, 1999; Jackendoff, 2002; Culicover, 1999). A less extreme alternative to this hypothesis is that the structure of language is, to some extent, learned by children: humans can arrive at complex knowledge of language without the need to have hard-wired (genetically determined) expectations of all dimensions of linguistic variation. This is the view I will adopt throughout this article. I assume that to some degree language is learned through inductive generalizations from linguistic data, but to what degree it is learned is unclear. My position is therefore at odds with Chomsky's position that knowledge of language goes "far beyond the presented primary linguistic data and is in no sense an 'inductive generalization' from these data" (Chomsky, 1965).

What evidence can we draw on to resolve this debate? Frustratingly, there is little concrete evidence either way; linguistics lacks a rigorous account of which (if any) aspects of language are acquired on the basis of innate constraints (Pullum and Scholz, 2002; Culicover, 1999). This debate is often reduced to statements such as "linguistic structure is much more complex than
the average empiricist supposes" (Wagner, 2001), and claims that "the attained grammar goes orders of magnitude beyond the information provided by the input data" (Wexler, 1991). These claims are backed up with specific examples designed to show how children's knowledge of language extends beyond what the data suggest (e.g., Kimball, 1973; Baker, 1978; Crain, 1991; Lidz et al., 2003; Lidz and Waxman, 2004). Nevertheless, many still argue that the required information is in fact present in the linguistic data (Pullum, 1996; Pullum and Scholz, 2002), and that to claim it is not is "unfounded hyperbole" (Sampson, 1989). These rebuttals of the argument from the poverty of the stimulus are often based on the notion that "[l]earning is much more powerful than previously believed" (Bates and Elman, 1996).

It should be noted that this stance in no way denies the fact that language has an innate biological basis. Only humans can acquire language, so any theory of language must consider an innateness hypothesis of some form. The real issue is the degree to which language acquisition is a process of induction from data within constraints (Elman, 2003). In the light of this debate, I will make an assumption that will be carried through the remainder of the article: if we deviate from the position that language acquisition in no sense involves inductive generalizations (i.e., if we question the APS), then we must acknowledge that the linguistic environment supplies information. This information impacts on how universal properties of languages, like compositionality, are represented and processed within the cognitive system.

Towards an evolutionary explanation

The thrust of this discussion rests on the realization that the degree to which language is learned through a process of inductive generalization has a profound effect on the character of the explanatory framework we use to understand why language has the structure that it does (Brighton et al., 2005). Why is this? If induction plays a role in determining knowledge of language, then environmental considerations must be taken seriously; any linguistic competence acquired through learning will be determined to a significant degree by the structure, or information, present in the environment. The environment must be supplying structural information in order for induction to occur. To achieve explanatory adequacy we must now explain why the environment is the way it is: how did this information come to exist?

To address this issue I will argue for an evolutionary perspective, and seek to explain how, from a non-linguistic environment, compositional structure can develop through linguistic evolution. In short, this view casts doubt on the claim that the hallmarks of language are, as Chomsky states, "coming from inside, not from outside." Necessarily, if inductive generalizations made from data contained in the environment determine the kind of linguistic structure we observe, then a wholly psychological theory of linguistic structure must be inadequate.
3 Linguistic Evolution Through Iterated Learning
Languages are transmitted culturally through, firstly, the production of utterances by one generation and, secondly, the induction of a grammar by the next generation on the basis of these utterances. This cycle of repeated production and induction is crucial to understanding the diachronic process of language change (e.g., Andersen, 1973; Hurford, 1990). Focusing on this process of linguistic transmission, several computational models have demonstrated how phenomena of language change can be understood in these terms (Clark and Roberts, 1993; Hare and Elman, 1995; Niyogi and Berwick, 1997). The study of language change seeks to understand how full-blown human languages undergo structural change over time. For example, these models could inform an enquiry into the morphological change that characterized the transition from, say, Latin to French. Of more importance to this discussion are studies that focus specifically on the linguistic evolution of linguistic complexity from non-linguistic communication systems (Batali, 2002; Kirby, 2001, 2002; Brighton, 2002; Smith et al., 2003b). It should be noted, therefore, that linguistic evolution is a process that drives both evolution and change in language. Studies of language evolution are concerned with the origin of linguistic structure found in human languages, while studies of language change are concerned with how such languages alter over time.

Much of the work focusing on the evolution of language through linguistic evolution has been consolidated under a single computational modeling framework termed the iterated learning model (Kirby, 2001; Brighton, 2002; Smith et al., 2003a,b). In this article I will use an iterated learning model to demonstrate the evolution of compositionality. This model will be based on a contemporary theory of induction known as the minimum description length principle.

An iterated learning model is composed of a series of agents organized into generations. Language is transmitted through these agents: the agents act as a conduit for language. For a language to be transmitted from one agent to another, it must be externalized by one agent (through language production), and then learned by another (through language acquisition). An agent therefore must have the ability to learn from examples of language use. Learning results in the induction of a hypothesis on the basis of the examples. This hypothesis represents the agent's knowledge of language. Using the hypothesis, an agent also has the ability to produce examples of language use itself. Agents, therefore, have the ability to interrogate an induced hypothesis to yield examples of language use. Within this general setting, we can explore how the process of linguistic evolution is related to the mechanisms of hypothesis induction (language acquisition) and hypothesis interrogation (language production).

A language is a particular mapping between a meaning space and a signal
space. Meanings are often modeled as compound structures such as feature vectors or logical expressions. Signals are usually serial structures, such as a string of symbols. Knowledge of language (a hypothesis) is a representation of this mapping. This representation could be modeled by any one of a number of computational models of inductive inference.

The basic iterated learning model considers each individual agent in turn. Throughout this article I will consider the case where each generation contains only one agent. The first agent in the simulation, Agent 1, is initialized with knowledge of language, h1, the precise nature of which will depend on the learning model used. This hypothesis will represent knowledge of some language Lh1. Agent 1 then produces some set of utterances L′h1 by interrogating the hypothesis h1. This newly constructed set of utterances will be a subset of the mapping (language) Lh1. These utterances are then passed to the next agent to learn from. Once the language has been transmitted from the first to the second agent, the first agent plays no further part in the simulation. The simulation proceeds by iteratively introducing a new agent to transmit the language. Each agent represents one generation, and the experiment is run for some number of generations. The important point is that, under certain conditions, the language will change from one generation to another; it will evolve and undergo adaptation. This process is illustrated in Figure 1.

One crucial driving force behind linguistic evolution is the transmission bottleneck, which imposes a constraint on how languages are transmitted. The transmission bottleneck reflects, as a constraint within the model, the fact that natural languages cannot be transmitted in totality from one individual to another. Linguistic data is never exhaustive; it is always sparse. In the case of natural language, for example, it is impossible for the infinitely large mapping between meanings and signals to be externalized. In the iterated learning model this situation occurs too: within each body of linguistic data only a subset of the set of possible meanings will be associated with a signal.

This constraint will hit hard when we consider the process of language production. Production is the process by which agents find signals for meanings they are prompted to produce. The meanings the agent must produce signals for are, in the model that follows, always drawn at random from the meaning space. Production has to occur in the model, as an agent needs to create the set of utterances from which the next agent in the simulation is to learn. If a meaning was not seen by an agent in conjunction with a signal during acquisition, then how is the agent to produce an appropriate signal? There are two courses of action. First, the agent can use the generative capacity of the induced hypothesis to yield an appropriate signal; in this case the agent will have generalized. Second, if generalization is not possible, then the agent will have to invent some signal for the meaning by some other means.
Figure 1: The iterated learning model. Agent 1 has knowledge of language represented by hypothesis h1. This hypothesis represents a language Lh1. Some subset of this mapping, L′h1, is externalized as linguistic performance by Agent 1 for the next agent, Agent 2, to learn from. On the basis of this performance, Agent 2 induces hypothesis h2. The process is then repeated, generation after generation.
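The transmission cycle depicted in Figure 1 is straightforward to state procedurally. The following Python sketch illustrates the general framework only, not Brighton's implementation: the parameter values mirror those used later in the chapter, while `induce` and `express` are placeholders for the MDL learner and the production mechanism developed in the following sections.

```python
import random

ALPHABET = "abcdefghijklmnopqrst"  # |Sigma| = 20, as in Figure 3

def random_signal(max_len=15):
    """Invent an arbitrary signal, as when building a holistic language."""
    return "".join(random.choices(ALPHABET, k=random.randint(1, max_len)))

def iterated_learning(meanings, induce, express, generations=200, bottleneck=32):
    """One agent per generation; only `bottleneck` utterances are transmitted.

    `induce(utterances)` returns a hypothesis h (language acquisition);
    `express(h, meaning)` returns a signal or None (language production).
    """
    # The first agent effectively learns from a holistic language:
    # each observed meaning is paired with a random signal.
    data = [(m, random_signal()) for m in random.sample(meanings, bottleneck)]
    for _ in range(generations):
        h = induce(data)                               # acquisition
        prompts = random.sample(meanings, bottleneck)  # transmission bottleneck
        data = []
        for m in prompts:
            s = express(h, m)           # generalize from the hypothesis...
            if s is None:
                s = random_signal()     # ...or fall back on invention
            data.append((m, s))         # utterances for the next generation
    return h
```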
The impact of the transmission bottleneck has two interpretations within an iterated learning model. On the one hand, it represents a constraint on transmission. On the other, it represents a constraint on how much evidence is available to the learning algorithm used by each agent. By imposing sparsity on the available learning data, a situation analogous to the poverty of the stimulus, discussed above, is introduced. In order for an agent to acquire a generative capacity, the agent must generalize from the data it has been given. In this sense, linguistic competence represents the ability to express meanings. Achieving a generative capacity requires that structure is present in the data. This is a crucial observation I will return to later.

Modeling compositionality

A model of language needs to capture the fact that language is a particular relationship between sound and meaning. The level of abstraction used here will capture the fact that language is a mapping from a "characteristic kind of semantic or pragmatic function onto a characteristic kind of symbol sequence" (Pinker and Bloom, 1990). When I refer to a model of language, I will be referring to a set of possible relationships between, on the one hand, entities representing meanings and, on the other, entities representing signals. Throughout this article I will consider meanings as multi-dimensional feature structures, and signals as sequences of symbols.

Meanings are defined as feature vectors representing points in a meaning space. Meaning spaces will be defined by two parameters, F and V. The parameter F defines the number of features each meaning will have. The parameter V defines how many values each of these features can accommodate. A meaning space M specified by F = 2 and V = 2 would represent the set M = {(1, 1), (1, 2), (2, 1), (2, 2)}. Notice that meanings represent structured objects of a fixed length. Signals, in contrast, are represented as variable length strings of symbols drawn from some alphabet Σ. The length of signals within the model is bounded by the maximum length denoted by lmax. For example, a signal space S defined by Σ = {a, b, c, d} and lmax = 4 might be the set S = {ba, ccad, acda, c, . . .}.

We now have a precise formulation of the meanings and signals, but of greater importance to the following discussion will be the kinds of structural relationships that exist between meanings and signals. It is the kind of relationship between meanings and signals that makes human language so distinctive. As it stands, the model of language presented above can capture a key feature of language I will be focusing on: it can represent both compositional mappings and non-compositional mappings (for more exotic language models see Kirby, 2002; Batali, 2002). Compositionality is a property of the mapping between meanings and signals. It is not a property of a set of meanings, nor a property of a
set of signals. A compositional mapping is one where the meaning of a signal is some function of the meaning of its parts (e.g., Krifka, 2001). Such a mapping is possible given the model of language developed so far. Consider the language Lcomp:

Lcomp = {⟨(1, 2, 2), adf⟩, ⟨(1, 1, 1), ace⟩, ⟨(2, 2, 2), bdf⟩, ⟨(2, 1, 1), bce⟩, ⟨(1, 2, 1), ade⟩, ⟨(1, 1, 2), acf⟩}

This language has compositional structure due to the fact that each meaning is mapped to a signal such that parts of the signal (some sub-string) correspond to parts of the meaning (a feature value). The symbol a, for example, represents value 1 of the first feature. The precise relationship between meanings and signals can vary substantially. For example, one feature value can map to two separate parts of the signal, these parts of the signal can be of variable length, and some parts of the signal can correspond to no part of the meaning. Importantly, though, the property of compositionality is independent of such characteristics of the mapping. Compositionality is an abstract property capturing the fact that some function determines how parts of the signal correspond to parts of the meaning. The exact definition of a compositional relationship is subject to heated debate, one I will sidestep in the interests of brevity (e.g., Zadrozny, 1994). All human languages exhibit compositionality.

Instances of the model of language with no compositional structure whatsoever are also of interest. I will term such relationships holistic languages¹: the whole signal maps to a whole meaning, such that no obvious relationship exists between parts of the signal and parts of the meaning. Here is an example of a holistic language Lholistic:

Lholistic = {⟨(1, 2, 2), sghs⟩, ⟨(1, 1, 1), ppold⟩, ⟨(2, 2, 2), monkey⟩, ⟨(2, 1, 1), q⟩, ⟨(1, 2, 1), rcd⟩, ⟨(1, 1, 2), esox⟩}

A holistic language is usually constructed by associating a random signal with each meaning. For this reason, holistic languages may also be referred to as random languages in the discussion that follows. Given the model of language described above, we can now consider in more depth how iterated learning models can be used to explore the linguistic evolution of compositionality.
¹ Strictly speaking, I should use the term holistic communication system, since one of the defining features of language is compositionality. Nevertheless, we will continue to abuse the term language in this way in the interest of clarity.
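To make the two kinds of mapping concrete, the sketch below constructs a meaning space and the two languages just described. The compositional construction assumes the simplest possible relationship, one dedicated symbol per feature value; as noted above, compositional mappings in general can be far less tidy.

```python
import itertools
import random

def meaning_space(F, V):
    """All feature vectors with F features, each taking values 1..V."""
    return list(itertools.product(range(1, V + 1), repeat=F))

def compositional_language(F, V):
    """Dedicate one symbol to each (feature, value) pair; a signal is the
    concatenation of the symbols for the meaning's feature values."""
    letters = iter("abcdefghijklmnopqrstuvwxyz")
    symbol = {(f, v): next(letters)
              for f in range(F) for v in range(1, V + 1)}
    return {m: "".join(symbol[(f, v)] for f, v in enumerate(m))
            for m in meaning_space(F, V)}

def holistic_language(F, V, alphabet="abcdefghijklmnopqrst", max_len=15):
    """Associate an unanalysable random signal with each whole meaning."""
    return {m: "".join(random.choices(alphabet,
                                      k=random.randint(1, max_len)))
            for m in meaning_space(F, V)}
```

With F = 3 and V = 2, compositional_language assigns a/b to the first feature, c/d to the second, and e/f to the third, so meaning (1, 2, 2) maps to adf, exactly as in Lcomp.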
4 Linguistic Evolution and Induction
A crucial component of any iterated learning model is the process of induction, as agents are required to induce hypotheses explaining the linguistic data they observe. Making an inductive inference involves choosing a hypothesis from a set of candidate hypotheses H = {H1, H2, . . .} in the light of some data D. Such an inference, depending on the chosen hypothesis, can result in a general statement concerning not only the observed data, but also data yet to be observed. The problem of induction is the problem of identifying the most appropriate hypothesis, and hence the most appropriate statement, that explains the given data D.

Contemporary theories of induction regard this problem as one fundamentally resting on the issue of model complexity (Rissanen, 1978; Li and Vitányi, 1997; Pitt et al., 2002). Complexity is the flexibility inherent in a class of hypotheses that allows them to fit diverse patterns of data. For example, a hypothesis that is consistent with the observed data may describe that data but, due to its high degree of complexity, be woefully inadequate as an explanation of the data (overfitting). The hypothesis, by virtue of its inherent complexity, may also describe an extremely diverse range of data. This makes the hypothesis less likely to be an appropriate model of the underlying data generating machinery. Similarly, a hypothesis with insufficient complexity will not possess the complexity required to explain the data (underfitting). Accordingly, the inductive process is fundamentally an issue of identifying a trade-off in the complexity of hypotheses.

One approach to tackling this issue is the minimum description length (MDL) principle (Rissanen, 1978; Li and Vitányi, 1997). The MDL principle provides a means of judging, given a hypothesis space H and some data D, which member of H represents the most likely hypothesis given that D was observed. The key idea behind MDL is that the more we are able to compress the observed data, the more we have learned about it: any kind of pattern of regularity present in the data can potentially allow us to compress the data. Once we have identified such a pattern, we can re-describe the observed data using fewer symbols than a literal description of the data. This is the philosophy behind the principle. MDL can be deployed in a practical sense by drawing on a theoretically solid and formally well understood body of techniques. Basing a model of induction on the MDL principle has the advantage that hypothesis selection is determined by the complexity of the hypotheses under consideration. In recent years, the MDL principle has become increasingly influential in the analysis of learning (Rissanen, 1997), model selection (Grünwald, 2002; Pitt et al., 2002), and many aspects of the cognitive system (Chater, 1999; Chater and Vitányi, 2003), including language acquisition (Wolff, 1982; Chater and Vitányi, 2004).

More formally, the MDL principle states that the most likely hypothesis is the
one which minimizes the sum of two quantities. The first quantity is the length, in bits, of the encoding of the hypothesis. The second quantity is the length, in bits, of the encoding of the data when represented using this hypothesis. To formalize this statement, we require an optimal encoding scheme for the hypotheses, C1, and an encoding scheme for data represented in terms of the hypothesis, C2. Furthermore, the only relevant issue for hypothesis selection is the length of these encodings: L_C1 and L_C2. Given the set of hypotheses H and the observed data D, the MDL principle selects a member of H, H_MDL, as follows:

$$H_{\mathrm{MDL}} = \underset{H \in \mathcal{H}}{\arg\min}\,\bigl\{ L_{C_1}(H) + L_{C_2}(D \mid H) \bigr\} \qquad (1)$$
This expression states that the best hypothesis to explain the data is the one which, when chosen, leads to the shortest coding of the data. The coding is achieved using a combination of the chosen hypothesis and a description of the data using this hypothesis. Here we see how the selected hypothesis represents a point in a trade-off between high and low complexity explanations. The MDL principle tells us how to judge competing hypotheses with respect to this trade-off by exploiting the relationship between coding and probability (Cover and Thomas, 1991).

Learning based on minimum description length

To transfer this discussion into a model and test the impact of learning based on the MDL principle requires us to construct a hypothesis space H, and coding schemes over these hypotheses. Recall that data in this discussion are collections of utterances whose form is determined by the language model introduced earlier. One example is the following set of utterances, Lcomp:

Lcomp = {⟨(1, 2, 2), adf⟩, ⟨(1, 1, 1), ace⟩, ⟨(2, 2, 2), bdf⟩, ⟨(2, 1, 1), bce⟩, ⟨(1, 2, 1), ade⟩, ⟨(1, 1, 2), acf⟩}

In order to apply the MDL principle to the selection of hypotheses given some arbitrary series of utterances, I will consider a hypothesis space composed of finite state unification transducers² (FSUTs) (Brighton, 2002). These transducers relate meanings to signals using a network of states and transitions. A number of paths exist through the transducer. Each path begins at the start state. These paths always end at another privileged state termed the accepting state. A path through the transducer is specified by a series of transitions between states; each of these transitions relates part of a signal to part of a meaning.
² A FSUT is a variation on the basic notion of a finite state transducer (Hopcroft and Ullman, 1979). The use of such transducers was inspired by and extends the work of Teal and Taylor (2000).
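Before turning to the transducers themselves, the two-part score of Equation 1 can be made concrete. The sketch below computes a crude two-part description length for a transducer hypothesis. The encoding schemes here are invented for illustration only (the attributes `states`, `transitions`, `signal_part`, and the helper `count_paths` are hypothetical); the actual schemes C1 and C2 are developed in Brighton (2002, 2003).

```python
import math

def hypothesis_cost(transducer):
    """L_C1(H): bits needed to describe the transducer itself.
    Each transition names two states and spells out its signal fragment."""
    n = max(len(transducer.states), 2)
    state_bits = math.log2(n)
    return n * state_bits + sum(2 * state_bits + 8 * len(t.signal_part)
                                for t in transducer.transitions)

def data_cost(transducer, data):
    """L_C2(D|H): bits needed to single out each observed utterance given
    the hypothesis (here: the choice among accepting paths)."""
    total = 0.0
    for meaning, signal in data:
        n_paths = transducer.count_paths(meaning)   # hypothetical helper
        total += math.log2(n_paths) if n_paths > 1 else 1.0
    return total

def mdl_score(transducer, data):
    """Two-part description length; MDL selects the hypothesis that
    minimizes this sum (Equation 1)."""
    return hypothesis_cost(transducer) + data_cost(transducer, data)
```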
For example, consider the transducer shown in Figure 2(a). It depicts a transducer which represents the language Lcomp. This transducer – termed the prefix tree transducer – corresponds to the maximally specific hypothesis: it describes the data verbatim, and therefore does not capture any structure present in the language. It is the largest consistent hypothesis in H that can be used to describe the data Lcomp, and only Lcomp. Given a transducer and a signal, the associated meaning can be derived by following a path consistent with that signal, and collecting the meanings associated with each transition taken. Similarly, given a meaning, the signal can be derived by following a path consistent with the meaning, and concatenating each symbol encountered along the path.

Given some observed utterances L, the space of candidate hypotheses will consist of all FSUTs consistent with the observed utterances. By consistent, I mean that observed examples of meaning/signal associations are never discarded: the candidate hypotheses are constrained to always be able to generate, at a minimum, all the observed utterances. We are interested in situations in which a transducer is capable of generating utterances for meanings it has never observed; in such a situation, the transducer will have generalized.

If structural regularity exists in the observed language, the prefix tree transducer can be used to derive further, more general, transducers that are also consistent with the observed data. Such derivations are achieved by applying compression operations to the transducer. Compression operators, when applicable, can introduce generalizations by merging states and edges. Given the prefix tree transducer – which is simply a literal representation of the observed data – only two operators, state merge and edge merge, are required to derive all possible consistent transducers. For the details of how states and edges are merged, as well as the details of the encoding schemes C1 and C2, the reader should refer to the work presented in Brighton (2002, 2003).

The important feature of the FSUT model, in combination with the MDL principle, is that compression can lead to generalisation. For example, Figures 2(b) and (c) illustrate some possible state and edge merge operations applied to the prefix tree transducer representing Lcomp. The transducer resulting from these merge operations is shown in Figure 2(d). Figure 2(e) depicts the fully compressed transducer, which is found by performing additional state and edge merge operations. Note that further compression operations are possible, but they would lead the transducer to express meanings which are inconsistent with the observed language. Nevertheless, by applying the compression operators, all consistent transducers can be generated. Some of these transducers will be more compressed than others, and as a result, they are more likely to generalise than others. Note that if Lcomp were an instance of a random (holistic) language, then few, if any, compression operations would be applicable; regularity is required for compression to be possible.
Figure 2: Given the compositional language Lcomp, the prefix tree transducer shown in (a) is constructed. By performing edge and state merge operations, the results of which are shown in (b), (c), and (d), the transducer can be compressed. The transducer shown in (e) represents a fully compressed transducer. It can generalize to the language L+comp:

L+comp = {⟨(1, 2, 2), adf⟩, ⟨(1, 1, 1), ace⟩, ⟨(2, 2, 2), bdf⟩, ⟨(2, 1, 1), bce⟩, ⟨(1, 2, 1), ade⟩, ⟨(1, 1, 2), acf⟩, ⟨(2, 1, 2), bcf⟩, ⟨(2, 2, 1), bde⟩}

Generalisation can lead to the ability to express meanings that were not
mentioned in the linguistic data. To express a novel meaning, a search through the transducer is performed to find an appropriate series of edge transitions. Some of the meanings on these edge transitions, as a result of the application of the compression operators, may contain wildcard feature values that represent unbound feature values. These free variables are introduced when two transitions are merged which contain conflicting values for a particular feature. To express a novel meaning, the unification of the set of meanings occurring on the transitions must yield the meaning to be expressed. The resulting signal is formed by concatenating the symbols mentioned on each edge; the ordering of the symbols in the signal therefore reflects the ordering of the edge traversals when passing through the transducer.

For example, a close inspection of the compressed transducer shown in Figure 2(e) reveals that meanings which are not present in Lcomp can be expressed. The expressivity of a transducer is simply the number of meanings that can be expressed. The language L+comp, shown in Figure 2, contains all the meaning/signal pairs which can be expressed by the fully compressed transducer in the above example. In this case, compression led to generalisation, and the expressivity of the transducer increased from 6 meanings to 8 meanings. By compressing the prefix tree transducer, the structure in the compositional language has been made explicit, and as a result, generalisation occurred. Generalisation will not be possible when structure is lacking in the observed data, and the result will be that some meanings cannot be expressed.

We now have a hypothesis space over which we can apply the MDL principle. The hypothesis chosen in light of data D is the one with the smallest description length, HMDL. The search for this hypothesis is performed using a hill-climbing procedure described in Brighton (2003). With these model components in place, we are now in a position to assess the impact of induction based on the MDL principle within the iterated learning model.
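The hill-climbing search just mentioned can be pictured schematically as follows. This is a sketch of a greedy search only, assuming hypothetical helpers `prefix_tree` (which builds the maximally specific transducer) and `all_merges` (which yields the consistent transducers reachable by one state or edge merge), together with the `mdl_score` function sketched above; Brighton (2003) describes the actual procedure.

```python
def find_mdl_hypothesis(data, prefix_tree, all_merges, mdl_score):
    """Greedy hill-climbing from the prefix tree transducer toward H_MDL."""
    h = prefix_tree(data)            # literal description of the data
    best = mdl_score(h, data)
    while True:
        candidates = [(mdl_score(h2, data), h2) for h2 in all_merges(h)]
        if not candidates:
            return h                 # no consistent merges remain
        score, h2 = min(candidates, key=lambda pair: pair[0])
        if score >= best:
            return h                 # no merge shortens the description
        best, h = score, h2          # accept: compression, and possibly
                                     # generalisation as a side effect
```

Note that on a holistic language such a loop terminates almost immediately: with no regularity to exploit, merges rarely reduce the description length, and the prefix tree itself is already near-optimal.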
5 The Evolutionary Consequences of the Simplicity Principle
The previous section summarised a model of learning based on compression constrained by the MDL principle. In this section I will describe how this model of learning leads to the cultural adaptation of the language as it is transmitted from one generation to the next within the iterated learning model. In order to specify this process in sufficient detail for simulation, several parameters need to be defined. The meaning space is defined by F, the number of features in each meaning, and V, the number of values each feature can accommodate. The signal space is defined by Σ, the alphabet of symbols, and lmax, the maximum string length for randomly generated signals. A transmission bottleneck is imposed by restricting the number of random utterances observed, R, to 32. These parameters are used to define the initial state of the system, including the initial language.
Figure 3 (F = 3, V = 2, |Σ| = 20, lmax = 15, R = 32): Given an initial random language defined by these parameter values, HMDL is an example of an induced FSUT. Negligible compression has occurred, and as a result the transducer does not generalise to novel meanings: 32 utterances were given as input, and each of these utterances is encoded by a unique path through the transducer.
Figure 3 details these parameter values and depicts an example MDL FSUT induced from an initial random language constructed with the given parameter values. Negligible compression occurs. The language represented by the transducer is holistic; the compositional structure we seek to explain is lacking, and this is the situation we are interested in: how can a compositional mapping with maximum expressivity evolve through cultural adaptation?

Next, I will consider a crucial aspect of the model side-stepped so far: the issue of invention. Invention occurs when an agent is presented with a meaning it cannot express. That is, the meaning was not observed in conjunction with a signal during learning, and cannot be expressed as a result of generalisation. For example, the transducer in Figure 3 can only express the meanings which were present in the observed language. Within the iterated learning model, transducers will be required to express meanings which were not in the observed set of utterances. To solve this problem, a policy of random invention can be deployed, where a random signal is generated for novel meanings. This policy will be investigated first.

Initialized with a random language, the simulation is run for 200 iterations. Figure 4(a–b) illustrates how the system develops from one generation to the next. First of all, Figure 4(a) depicts the compression rate, α, as a function of iterations. The compression rate measures the relative size of the prefix tree transducer, Hprefix, and the chosen hypothesis Hmdl: α = 1 − |Hmdl| / |Hprefix|. A high compression rate means that the language is compressible. Figure 4(a) illustrates that the compressibility of the language, from one generation to the next, changes very little. The initial random language undergoes no significant adaptation and remains unstructured and therefore uncompressible (α ≈ 0.06). Figure 4(b) highlights this fact by showing the transitions through a state space depicting the expressivity of the language as a function of the encoding length of the language. Here, we see that from the initial state, labeled A, the system follows an unordered trajectory through the sub-space of small, inexpressive transducers. Because the language remains unstructured, generalisation is not possible and expressivity remains low. Similarly, unstructured languages cannot be compressed, and therefore the encoding length remains relatively high.

The key point here is that a cumulative evolution of structure does not occur. Why is this? The mechanisms supporting linguistic evolution – language learning and language production – are somehow inhibiting the cumulative evolution of structure. The source of this inhibition is the random invention procedure.
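The two quantities plotted in Figure 4 (and in Figure 5 below) can be stated compactly. A minimal sketch, where transducer sizes are assumed to be given as encoding lengths in bits and `can_express` is a hypothetical test of whether the hypothesis yields a signal for a meaning:

```python
def compression_rate(h_mdl_size, h_prefix_size):
    """alpha = 1 - |H_mdl| / |H_prefix|: near 0 for an incompressible
    holistic language, high when the language carries structure."""
    return 1.0 - h_mdl_size / h_prefix_size

def expressivity(h, meanings, can_express):
    """The number of meanings the induced transducer can express."""
    return sum(1 for m in meanings if can_express(h, m))
```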
Figure 4: Linguistic evolution resulting from partially random invention. (a) The compression rate as a function of iterations; (b) the expressivity of the language as a function of its encoding length.

Invention based on a simplicity principle

The MDL principle by itself tells us nothing about the process of production. As a result, the process of interrogating the hypothesis with novel meanings to yield signals is not fully defined, and needs to be developed. To address this problem, a
more principled invention mechanism is investigated, where the invented signal is derived using the hypothesis itself rather than being constructed at random. The invented signal will be constrained by structure present in the hypothesis. The invention method I employ here exploits the structure already present in the hypothesis by using those parts of the transducer consistent with the novel meaning to construct part of the signal. This approach is detailed in Brighton (2003), but the essentials of the process can be summarised as follows: the invented signal, if it were seen in conjunction with the novel meaning during the learning phase, would not lead to an increase in the MDL of the induced hypothesis. This invention procedure therefore proposes a signal which in some sense matches the structure of the hypothesis. If such a signal cannot be found, then no signal is invented. In short, the invention procedure, rather than being random, now takes into account the structure present in the hypothesis.

Running the experiment with this invention procedure, Figure 5 illustrates exactly the same measurements as those shown in Figure 4. Strikingly, Figure 5 reveals that very different evolved states result as a consequence of the alternative invention procedure. Figure 5(a) illustrates an entirely different trajectory, one where a series of transitions leads to small, stable, and expressive hypotheses. Starting at an expected expressivity of approximately 22 meanings (point A), the system follows an L-shaped trajectory. There are two distinct jumps to a stable state where we see small hypotheses capable of expressing all 64 meanings. The compression scheme consistently directs linguistic evolution toward compositional systems.

The first major transition through the state space takes the system from the bottom-right end of the L-shape (point A) to the bend in the L-shape (points B and C), where expressivity increases slightly, but the minimum description length of the language decreases by a factor of 3. From requiring approximately 6000 bits to encode the evolving language, linguistic evolution results in transducers being induced with an MDL of approximately 2000 bits. The lack of increase in expressivity is a reflection of the transducers organizing themselves in such a way that significant compression results, but an increase in expressivity is not achieved. The second transition, leading to the top of the L-shape (through point D to point E), is very different in nature. Here, for a small decrease in the MDL of the developing language, a significant increase in expressivity occurs. This is an important transition, as it results in the system entering a stable region of the state space. Although a few deviations away from this stable region occur early on, the system settles into a steady state characterized by high expressivity. Figure 5(b) reflects these transitions: the compression rate rises in two stages corresponding to the points in the L-shaped trajectory.
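For concreteness, the simplicity-based invention step used in these runs can be sketched as follows, reusing `hypothesis_cost` from the earlier MDL sketch; `candidate_signals`, which enumerates signals assembled from transducer paths compatible with the novel meaning, is a hypothetical helper standing in for the procedure detailed in Brighton (2003).

```python
def invent(h, meaning, data, candidate_signals, induce, hypothesis_cost):
    """Simplicity-based invention: propose a signal for an unexpressible
    meaning only if, had the pair been observed during learning, it would
    not have increased the description length of the induced hypothesis."""
    baseline = hypothesis_cost(induce(data))
    for signal in candidate_signals(h, meaning):
        h2 = induce(data + [(meaning, signal)])
        if hypothesis_cost(h2) <= baseline:
            return signal    # the signal matches the hypothesis's structure
    return None              # otherwise, no signal is invented
```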
Figure 5: Linguistic evolution arising from simplicity-based invention. (a) The trajectory through the state space of expressivity against encoding length; (b) the compression rate as a function of iterations.

Figure 6: The transducers corresponding to positions B, C, D, and E highlighted in Figure 5.

Figure 6(a–d) depicts the transducers corresponding to the points B, C, D, and E in Figure 5(b). Figure 6(a) represents the transducer corresponding to point
B. In this transducer, we see the beginnings of significant structure emerging. The first symbol in each signal appears to discriminate between feature values in the second feature. This structural relationship acts as a seed for further discrimination, which will ultimately result in generalisation. Between point B and point C, the evolution of the language becomes increasingly more evident. Point D, shown in Figure 6(c), corresponds to a transducer where further discrimination occurs, and certain meanings can be expressed even though they were not observed – significant generalisation is occurring. Figure 6(d) illustrates the occurrence of further discrimination and generalisation, as the state of the system climbs up to and moves around a stable region of the state space. Although this transducer exhibits a large amount of redundancy, this redundancy does not affect its ability to be induced generation after generation. Initially, as the state approaches point E, some variation occurs across iterations before the steady state is arrived at. This suggests the stable regions of the state space are Liapounov stable: if the system were to start in this region, it would stay within this region (see, for example, Glendinning, 1994).
6 Linguistic Evolution of Compositionality
The previous section demonstrated how linguistic evolution can lead, from an initially holistic communication system, to the development of a compositional mapping between meanings and signals. It also indicated that certain conditions must be met if compositional structure is to develop at all: the invention mechanism, for example, cannot be random. This is one condition among many required for cumulative evolution to occur. It is not the case that compositional structure is the inevitable outcome of an iterated learning model; in fact, the conditions required for cumulative evolution are strict. Brighton (2002) showed that the evolution of compositional structure requires (1) that a transmission bottleneck imposing a sufficient degree of data sparsity is in place, and (2) that a sufficient degree of complexity is present in the meaning space. The parameters used in the previous section were chosen to maximize the likelihood of compositional systems according to the mathematical model reported by Brighton (2002). Although the example we have just considered represents the result of one simulation, this evolutionary trajectory is typical for the given parameter values. In short, it should be stressed that without a transmission bottleneck present, languages will not change and compositional systems will not be observed. Similarly, without a sufficient degree of complexity in the meaning space, compositionality will not confer a stability advantage, and therefore compositional languages are unlikely to be observed.
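To give the overall shape of the model, the following is a deliberately simplified Python sketch of the iterated-learning loop with a transmission bottleneck (parameter names are ours; the actual simulation induces minimum-description-length transducers rather than memorizing pairs). It also illustrates the point about data sparsity: with a rote learner, expressivity can never exceed the bottleneck, which is exactly why generalization and principled invention are needed.

import itertools, random

F, V = 3, 4                    # features and values: V**F = 64 meanings
meanings = list(itertools.product(range(V), repeat=F))
bottleneck = 32                # data sparsity: only half the meanings seen

def learn(data):
    # Stand-in learner: memorize the observed pairs.
    return dict(data)

def produce(hypothesis, meaning):
    # Express a meaning only if the hypothesis covers it (no invention).
    return hypothesis.get(meaning)

data = [(m, "".join(map(str, m))) for m in meanings]  # initial language
for generation in range(10):
    sample = random.sample(data, min(bottleneck, len(data)))
    hypothesis = learn(sample)
    data = [(m, produce(hypothesis, m)) for m in meanings
            if produce(hypothesis, m) is not None]
    print(generation, "expressivity:", len(data))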
(a) G1:
S/x,y,z → A/x B/y C/z
A/1 → w      A/2 → mmsf
B/1 → tomt   B/2 → btps
C/1 → w      C/2 → q

(b) G2:
S/x,y,z → A/x d B/y C/z
A/1 → r      A/2 → n      A/3 → m
B/y → D/y sb      B/3 → js
D/1 → j      D/2 → b
C/3 → sgc    C/z → b E/z
E/1 → q      E/2 → b
Figure 7: Two evolved languages. (a) shows a transducer, and the corresponding grammar, containing redundant transitions, variable-length signals, and several syntactic categories. (b) shows a language with variable-length substrings.

The sensitivity of the environmental conditions required for the evolution of compositional systems suggests that an explanation for why language exhibits compositionality cannot be framed exclusively in terms of the properties of the cognitive system. To fully appreciate this point, it is worth considering the nature of stable states in the model, as they provide an example of how the linguistic complexity observed is not trivially related to the properties of linguistic agents. Figure 7 shows two stable states resulting from the model. Figure 7(a) depicts a transducer for a meaning space defined by F = 3 and V = 2, along with the grammar, G1, which describes how signals are constructed for each of the 8 meanings. Similarly, Figure 7(b) depicts the transducer and the corresponding grammar, G2, for a meaning space defined by F = 3 and V = 3, which comprises 27 meanings. Optimal transducers, those with the lowest description length given the parameter values, are those where a single symbol is associated with each feature value of the meaning space. Even though the minimum description length principle would prefer these transducers, they do not occur in the model. A close inspection of the transducers shown in Figure 7 demonstrates that features are coded inefficiently: variable-length strings of symbols are used rather than a single symbol,
and some feature values are associated with redundant transitions which carry no meaning. In Figure 7(b), for example, all meanings are expressed with signals containing a redundant symbol (the second symbol, d). These imperfections are frozen accidents: the residue of production decisions made before stability occurred. The imperfections do not have a detrimental impact on the stability of the language, and they therefore survive repeated transmission, being part of the compositional relationship coded in the language. This phenomenon is an example of how the process of linguistic evolution leads to complexity which is not a direct reflection of the learning bias: transducers with lower description length exist. The evolved transducers are functional in the sense that they are stable, despite deviating from the “optimal” transducer, and this is why we observe them. The key point here is that, given an understanding of the learning and production mechanisms of the linguistic agents, it is far from clear that such an understanding would by itself allow us to predict the outcome of the model. The process of linguistic evolution represents a complex adaptive system. This conclusion can be related to the task of explaining why human languages have certain structural relationships and not others. If linguistic evolution plays a role in determining the structure of human language, then we must conclude (1) that linguistic universals are not necessarily direct reflections of properties of the cognitive system, and (2) that an internalist or mentalistic explanation of linguistic universals is likely to be fundamentally lacking.
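As a concrete illustration of how such an evolved language works, the following Python sketch encodes one possible reading of grammar G2 from Figure 7(b), assuming the rules compose by simple concatenation. Every generated signal carries the meaningless second symbol d, and the B and C substrings vary in length – exactly the frozen accidents just described.

def A(x): return {1: "r", 2: "n", 3: "m"}[x]
def D(y): return {1: "j", 2: "b"}[y]
def E(z): return {1: "q", 2: "b"}[z]
def B(y): return "js" if y == 3 else D(y) + "sb"
def C(z): return "sgc" if z == 3 else "b" + E(z)

def signal(x, y, z):
    # S/x,y,z -> A/x d B/y C/z: the 'd' transition carries no meaning.
    return A(x) + "d" + B(y) + C(z)

for meaning in [(1, 1, 1), (1, 2, 3), (3, 3, 2)]:
    print(meaning, signal(*meaning))
# -> (1, 1, 1) rdjsbbq ; (1, 2, 3) rdbsbsgc ; (3, 3, 2) mdjsbb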
7 Conclusion
This discussion began with the following observation: if the universal features of language are in no sense acquired through a process of inductive generalization, then we can rightfully circumscribe the cognitive system alone as the focus of an explanation for why language exhibits certain structural characteristics and not others. This must be the case, as the only remaining source of explanation has to appeal to characteristics of the cognitive system: characteristics of the environment are rendered irrelevant. The assumption that universal features of language are not acquired through inductive generalizations is controversial, and it has many critics. Importantly, deviating from this assumption necessarily creates a problem in explaining why language exhibits these universal characteristics. If the acquisition of linguistic universals relies to some extent on properties of the linguistic data, then retaining explanatory adequacy requires us to explain why the environment contains certain structural characteristics and not others. This discussion has focused on the process of linguistic evolution as a source of
explanation for why the linguistic data exhibits certain characteristics and not others. In particular, I have focused on the property of compositionality and explored the possibility that the process of linguistic evolution can explain how the linguistic environment came to contain compositional structure. To test this theory, I have used a computational model of linguistic evolution. The model predicts the cumulative evolution of compositional structure given certain environmental conditions. Importantly, the model also suggests that an understanding of the properties of linguistic agents cannot by itself satisfactorily explain why the evolved languages exhibit compositionality. This alternative standpoint places an explanation for why language exhibits certain hallmarks and not others fundamentally in terms of an interaction between how language is acquired, how language is transmitted, and how the innate constraints on acquisition came to exist. In short, this work casts doubt on the utility of wholly psychological, cognitivistic, or internalist explanations for linguistic universals such as compositionality.

References

Andersen, H. (1973). Abductive and deductive change. Language, 49, 765–793.

Baker, C. L. (1978). Introduction to generative-transformational syntax. Englewood Cliffs, NJ: Prentice-Hall.

Batali, J. (2002). The negotiation and acquisition of recursive grammars as a result of competition among exemplars. In E. Briscoe (Ed.), Linguistic evolution through language acquisition: Formal and computational models (pp. 111–172). Cambridge: Cambridge University Press.

Bates, E., & Elman, J. (1996). Learning rediscovered. Science, 274, 1849–1850.

Brighton, H. (2002). Compositional syntax from cultural transmission. Artificial Life, 8(1), 25–54.

Brighton, H. (2003). Simplicity as a driving force in linguistic evolution. Unpublished doctoral dissertation, The University of Edinburgh.

Brighton, H., Kirby, S., & Smith, K. (2005). Cultural selection for learnability: Three principles underlying the view that language adapts to be learnable. In M. Tallerman (Ed.), Language origins: Perspectives on evolution. Oxford: Oxford University Press.

Chater, N. (1999). The search for simplicity: A fundamental cognitive principle? Quarterly Journal of Experimental Psychology, 52A, 273–302.
Chater, N., & Vitányi, P. M. B. (2003). Simplicity: A unifying principle in cognitive science? Trends in Cognitive Sciences, 7(1), 19–22.

Chater, N., & Vitányi, P. M. B. (2004). A simplicity principle for language acquisition: Re-evaluating what can be learned from positive evidence. (Manuscript under review)

Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.

Chomsky, N. (2002). On nature and language. Cambridge: Cambridge University Press.

Christiansen, M. H., & Kirby, S. (2003). Language evolution: Consensus and controversies. Trends in Cognitive Sciences, 7(7), 300–307.

Clark, R., & Roberts, I. (1993). A computational model of language learnability and language change. Linguistic Inquiry, 24(2), 299–345.

Comrie, B. (1989). Language universals and linguistic typology (2nd ed.). Oxford: Blackwell.

Cover, T., & Thomas, J. (1991). Elements of information theory. New York: Wiley Interscience.

Cowie, F. (1999). What's within? Nativism reconsidered. Oxford: Oxford University Press.

Crain, S. (1991). Language acquisition in the absence of experience. Behavioral and Brain Sciences, 14, 597–612.

Croft, W. (1990). Typology and universals. Cambridge: Cambridge University Press.

Culicover, P. W. (1999). Syntactic nuts: Hard cases, syntactic theory, and language acquisition. Oxford: Oxford University Press.

Deacon, T. W. (1997). The symbolic species. W. W. Norton and Company.

Elman, J. L. (2003). Generalization from sparse input. In Proceedings of the 38th annual meeting of the Chicago Linguistic Society. Chicago.

Glendinning, P. (1994). Stability, instability, and chaos: An introduction to the theory of nonlinear differential equations. Cambridge: Cambridge University Press.

Grünwald, P. (2000). Model selection based on minimum description length. Journal of Mathematical Psychology, 44, 133–152.
Hare, M., & Elman, J. L. (1995). Learning and morphological change. Cognition, 56, 61–98.

Hopcroft, J. E., & Ullman, J. D. (1979). Introduction to automata theory, languages, and computation. Addison-Wesley.

Hurford, J. R. (1990). Nativist and functional explanations in language acquisition. In I. M. Roca (Ed.), Logical issues in language acquisition (pp. 85–136). Foris Publications.

Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. Oxford: Oxford University Press.

Kimball, J. P. (1973). The formal theory of grammar. Englewood Cliffs, NJ: Prentice-Hall.

Kirby, S. (2001). Spontaneous evolution of linguistic structure: An iterated learning model of the emergence of regularity and irregularity. IEEE Journal of Evolutionary Computation, 5(2), 102–110.

Kirby, S. (2002). Learning, bottlenecks and the evolution of recursive syntax. In E. Briscoe (Ed.), Linguistic evolution through language acquisition: Formal and computational models (pp. 173–203). Cambridge: Cambridge University Press.

Krifka, M. (2001). Compositionality. In R. A. Wilson & F. Keil (Eds.), The MIT encyclopedia of the cognitive sciences. Cambridge, MA: MIT Press.

Li, M., & Vitányi, P. M. B. (1997). An introduction to Kolmogorov complexity and its applications. New York: Springer-Verlag.

Lidz, J., & Waxman, S. (2004). Reaffirming the poverty of the stimulus argument: A reply to the replies. Cognition, 93, 157–165.

Lidz, J., Waxman, S., & Freedman, J. (2003). What infants know about syntax but couldn't have learned: Evidence for syntactic structure at 18 months. Cognition, 89, 65–73.

Niyogi, P., & Berwick, R. (1997). Evolutionary consequences of language learning. Linguistics and Philosophy, 20, 697–719.

O'Grady, W., Dobrovolsky, M., & Katamba, F. (1997). Contemporary linguistics (3rd ed.). Longman.

Pinker, S., & Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences, 13, 707–784.
Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109(3), 472–491.

Pullum, G. K. (1996). Learnability, hyperlearning, and the poverty of the stimulus. In J. Johnson, M. L. Juge, & J. L. Moxley (Eds.), Proceedings of the 22nd annual meeting: General session and parasession on the role of learnability in grammatical theory (pp. 498–513). Berkeley, CA: Berkeley Linguistics Society.

Pullum, G. K., & Scholz, B. C. (2002). Empirical assessment of stimulus poverty arguments. The Linguistic Review, 19(1–2), 9–50.

Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14, 465–471.

Rissanen, J. (1989). Stochastic complexity in statistical inquiry. World Scientific.

Rissanen, J. (1997). Stochastic complexity in learning. Journal of Computer and System Sciences, 55, 89–95.

Sampson, G. (1989). Language acquisition: Growth or learning? Philosophical Papers, 18, 203–240.

Smith, K., Brighton, H., & Kirby, S. (2003). Complex systems in language evolution: The cultural emergence of compositional structure. Advances in Complex Systems, 6(4), 527–558.

Smith, K., Kirby, S., & Brighton, H. (2003). Iterated learning: A framework for the emergence of language. Artificial Life, 9(4), 371–386.

Teal, T. K., & Taylor, C. E. (2000). Effects of compression on language evolution. Artificial Life, 6(2), 129–143.

Wagner, L. (2001). Defending nativism in language acquisition. Trends in Cognitive Sciences, 6(7), 283–284.

Wexler, K. (1991). On the argument from the poverty of the stimulus. In A. Kasher (Ed.), The Chomskyan turn (pp. 253–270). Cambridge: Blackwell.

Wolff, J. G. (1982). Language acquisition, data compression, and generalization. Language and Communication, 2(1), 57–89.

Zadrozny, W. (1994). From compositional to systematic semantics. Linguistics and Philosophy, 17, 329–342.
Compositionality and Molecularism

Denis Bonnay

This paper is devoted to the analysis of the conceptual interplay between the principle of compositionality (PC) and the claim that meaning is molecular rather than holistic. I first argue that the truth of some sort of molecularity thesis is a necessary condition for arguments in favor of the principle of compositionality to make sense. Then I propose a formal characterization of molecularity which aims at taking into account the way it interacts with compositionality, as captured in Hodges' setting, to (help) explain our mastery of language.
1 Introduction
Hodges, in his seminal 2001 paper 'Formal Features of Compositionality', provides a setting for a precise definition of compositionality, in which various questions about the possibility of finding compositional semantics for languages are successfully addressed. This fact suggests that this setting could be used for the formalization of further strengthenings of compositionality, for which, pace Hodges (1998), compositionality is indeed the problem. Various approaches have already been suggested: Pagin (2003) advocates some kind of reverse compositionality; Kracht (1998) puts forward a notion of strict compositionality with built-in computability requirements. Our approach is based on a notion of compositionality which encapsulates anti-holist requirements. The first section is devoted to conceptual arguments in favor of this strengthening. In the second section, Hodges' setting will come into the picture and prepare the ground for the formalization of our strong notion of compositionality in the third section, based on an extension of Hodges' mathematics.
2 Compositionality and Molecularity
A proposal for strengthening (PC)

Among the many conceptual and technical issues surrounding compositionality, two are especially noteworthy and enduring:
• methodological status. What is the status of the principle of compositionality for formal or natural languages: is it substantial or empty? Is it a necessary condition for a language to be understandable? Must any semantics be compositional?

• formalization. One seeks a definition of compositionality in a given mathematical framework, and then tries to investigate its properties, such as conditions for existence, possible strengthenings, and so on (see e.g. Janssen 1997).

It has been tempting to rest on formal results about compositionality in order to sustain various theses concerning the status of the principle. In particular, the temptation has been to rely on results establishing that a compositional semantics is always available – the most famous such result being Zadrozny's (1994) – and to argue for the non-substantiality of the principle. These attempts have been widely criticized on the ground that not just any compositional semantics is interesting.1 So the upshot of formalization has been somewhat disappointing, and even Hodges' conclusion, for example, was that, after all, compositionality was not the problem. The lesson to be drawn is, first, that one should start the other way around and ask why compositionality is interesting and on which theoretical grounds it is needed, before going back to formalization; and, second, that one should look for strengthenings of the notion of compositionality that would protect availability results from triviality. There are two main arguments in favor of (PC):

• the argument from productivity. Competent speakers can understand a complex expression e they have never encountered before. So there must be some knowledge which is responsible for this ability. As all they have to start with to grasp the meaning of e is the syntactic structure of e plus its simple constituents, they must use their knowledge of the meaning of these constituents and of the associated syntactic links of combination.

• the argument from systematicity. Anyone who understands e also understands, without any additional information, e′, if e′ is built out of the same constituents as e and both e and e′ are meaningful. The only plausible way this is possible is if the meaning of e was known thanks to knowledge of the meaning of its constituents and syntactic links, so that the meaning of e′ can be computed in an analogous way.
1 Westerståhl (1998) convincingly shows the limits of Zadrozny's claims, but Zadrozny himself recognizes that his result shows only the need of adding new constraints on what kind of compositionality is interesting.
Both arguments have a common structure: compositionality is taken to be the best explanation of facts about speakers' competence. But I shall argue that both arguments share a common presupposition: compositionality is the best explanation only if some kind of molecularism is true, that is, only if the meanings of basic lexical items and syntactic constructions can be determined on a finite and local basis. This is because compositionality plays the role of best explanation only if it performs some sort of epistemic reduction, from knowledge of sentence meaning to knowledge of word meaning. In order to establish this point precisely, let's consider the following theses:

Principle of Compositionality (PC). The meaning of a complex expression is a function of the meaning of its parts and of the syntactic rules by which they are combined.

Epistemic version of the Principle of Compositionality (EPC). Knowledge of the meaning of a complex expression is attained through knowledge of the meaning of its parts and of the syntactic rules by which they are combined.

Epistemic version of Frege's context principle (EFCP). Knowledge of the meaning of basic expressions is attained through knowledge of the use of sentences containing them.2

Molecularism (M). As long as the meaning of basic expressions gets determined by their use in sentences, it is sufficient to consider the use of certain sentences – the meaning-fixing ones – so that it is possible to learn the language step by step.

Holism (H). As long as the meaning of basic expressions is determined by their use in sentences, it is necessary to consider the whole use of sentences in the language, so that it is not possible to learn only one part of the language.

We defend the following claims:

1. (EPC) implies (PC), but the converse is not true.

2. (EPC) – and not (PC) – explains productivity and systematicity.

3. Modulo (EFCP), (H) is not compatible with (EPC), though it is compatible with (PC).

4. Modulo (EFCP), (PC) + (M) strongly supports (EPC).
2 (EFCP) is an epistemic version of Frege's context principle if the latter is construed as (FCP): the meaning of basic expressions gets determined by their use in sentences.
Point 1. is clear: (EPC) states that some dependence exists which is epistemically exploited, so that it implies (PC). But, in the other direction, the functions involved in (PC) could be non-computable, so that the shift from (PC) to (EPC) is not valid. As to 2., it should be clear from the arguments from productivity and systematicity that what they point at is (EPC) and not (PC). In both cases, one needs to account for the understanding of some complex expressions, but this knowledge can be explained only if the dependence between meanings has some epistemic significance, as in (EPC). (PC) in itself is not interesting when the concern is with actual linguistic competence; it will be interesting only insofar as it is a basis for an epistemic reduction, in the sense that it reduces the problem of knowing the meaning of a new complex expression to a simpler problem about knowledge of the meanings of basic terms. This epistemic reduction is phrased by (EPC). Because of 1. and 2., it is already clear why (PC) could be misconstrued as the best explanation of productivity and systematicity; but 2. and 3. show that it is only part of the story, that is, that (PC) is not sufficient to explain productivity and systematicity. The argument in favor of 3. rests on a charge of circularity. By (EPC), knowledge of the meaning of a certain complex sentence S depends on knowledge of the meaning of its basic parts. But then, by (EFCP), knowledge of the meaning of these parts depends on knowledge of the use of sentences containing them. Finally, by (H), this knowledge involves knowledge of the use of the very sentence S itself, so that grasping the meaning of S is shown to presuppose grasp of the very meaning of S. This is not to be confused with a contradiction between (PC) and (H): one can perfectly well imagine a compositional language such that changes in the meaning of any simple or complex expression induce changes in the meaning of every other expression. We have seen that there is an epistemological gap between (EPC) and (PC); more generally, there is no hope of finding an absolutely conclusive path from formal properties of languages to (EPC), which is an empirical fact concerning the actual working of our mastery of language. But nevertheless, the whole point of discussions about (PC) is to find formal constraints on languages on the basis of certain features of their use, like productivity and systematicity. 3. suggests the direction in which (PC) should be strengthened; 4. states what this direction is. The intuition for 4. is that the charge of circularity which appeared in the discussion of 3. is lifted if (H) is false and (M) is true. In that case, it is possible to have a clear picture of how language learning can proceed: even if mastery of sentence meaning is prior to mastery of word meaning, (PC) might not be deprived of explanatory power, because the basic mastery of sentence meaning with which we cannot dispense is easier to get
than the whole mastery of language to which the mastery of word meaning gives access. The epistemological gap is not bridged, in the sense that (PC) + (M) still does not imply (EPC), but (PC) + (M) are viable candidates for formal constraints on languages, because (PC) + (M) support (EPC) and (EPC) seems the best explanation of essential facts concerning language use. As to the argument from systematicity in particular, it presupposes that there is something special about the expression e′, namely that its meaning is interdependent with the meaning of e. But if meaning holism is true, this interdependence between e and e′ is nothing special: such an interdependence holds between any two sentences of the language, so that there is nothing to explain, and compositionality is not needed either. This justifies Dummett's claim:

The principle of compositionality is not the mere truism [...] that the meaning of a sentence is determined by its composition. Its bite comes from the thesis that the understanding of a word consists in the ability to understand characteristic members of a particular range of sentences containing that word. (Dummett, 1991, p. 225)

The formalization problem

A large part of the interest in formalizing (PC) lay in the fact that it could be used as a criterion of admissibility for possible theories of language. A theory failing the test of (PC) would be a theory of a language which would fail to explain basic facts of our mastery of this language, facts which are closely connected to our ability to learn it. Under the hypothesis that these facts hold, or that the language this theory is about is learnable, such a theory should be rejected. If the preceding argument is correct, these very facts support not only (PC) but a stronger property of languages, namely (PC) + (M). Our aim in the rest of this paper will thus be to get a formal grip on (PC) + (M) and provide a stronger test for theories of language. How should this formalization proceed? First, note that there is some ambiguity in the phrasing of the principle of molecularity. On the one hand, it can be seen as a static property of a semantics for a language: meanings for the whole language can be seen as inherited from meanings of small parts of the language, that is, meanings are preserved from the parts to the whole. Slightly more formally, a language L together with a semantics µ is taken to be molecular if it is possible to analyze it into proper sublanguages Li, such that µ agrees with the semantics µi for the Li on terms belonging to the Li. We will say that a language together with a semantics enjoys static molecularity if (PC) + (M) holds for it, (M) being taken in a static sense. On the other hand, molecularity is also a dynamic thesis pertaining to language learning: it is the claim that language
learning proceeds incrementally, or step by step. The basis on which meanings are determined should be structured in such a way that it is possible to learn the meanings of terms independently of each other, or at least in a cumulative way, such that new semantic knowledge does not imply revision of previous semantic knowledge. Dynamic molecularity is a property not of semantics but of the linguistic materials on which language learning is based. This second sense of molecularity is stronger than the first one: a molecular picture of the materials of language learning for ⟨L, µ⟩ (dynamic molecularity) should yield an analysis of the language ⟨L, µ⟩ (static molecularity). Here comes a first problem. In its strong dynamic sense, (M) lacks content until an answer is provided to the problem of how word meaning gets determined (the determination problem). Paraphrasing Dummett, (PC) loses its bite if no molecularist answer to the problem of the determination of word meaning is provided. But (M) is not by itself such an answer; it is only a constraint on possible answers to the determination problem. We shall choose a particular type of answer to the determination problem and side with proponents of inferential role semantics (IRS), according to whom what matters is the acceptance of certain sentences and inferences. More precisely, following Greenberg and Harman, (IRS) is a special case of conceptual role semantics. A conceptual role semantics is "any theory [of meaning] that holds that the content of [...] symbols is determined by any part of their role or use in thought" (Greenberg & Harman, in press, p. 1). (IRS) is then the special case of conceptual role semantics according to which "the recognition of internal inferential and implicational relations [is taken] to be crucial to the meaning" (Greenberg & Harman, in press, p. 2) of symbols.3 Other, non-inferentialist options would be perfectly acceptable: molecularism and inferentialism are two independent theses. One can stick to inferential role semantics while
3 IRS can then be seen as one possible way to build a theory of meaning determination on (EFCP). Here we shall assume that the crucial part is a sufficient part, so that meaning is fully determined by inferential role. An IRS in this sense is a semantics only in the weak sense that it determines the synonymy relations inside the language; it has nothing to say about the link between linguistic and non-linguistic entities. In particular, we will not be concerned with how the meanings of particular types of expressions, say nouns or predicates, should be represented (more on this point in Kracht, 2003, p. 290). The fact that our formalization is tied to the adoption of a particular kind of theory of meaning, like IRS, restricts its interest and its generality. It remains interesting both as an example of what can be done for any other kind of theory of meaning and as a criterion of admissibility for all kinds of IRS. Finally, Dummett's own meaning theory is a special kind of inferential role semantics, but the fact that we have drawn upon his arguments in favor of molecularism does not commit us to the adoption of his particular version of IRS.
embracing holism. For example, this is the kind of position endorsed by Hartry Field (1977). Conversely, nothing in molecularism per se forces one to be an inferentialist. Starting with a definition of static molecularity, we will look for a characterization of dynamic molecularity. More precisely, the aim will be to capture the features of inferential roles which guarantee static molecularity; therefore the possibility of deriving static molecularity from dynamic molecularity will be the touchstone of our tentative characterization of dynamic molecularity. Here comes a second problem. Inferential roles are given through semantic rules, giving rise to some consequence relation. The basic material of an IRS thus consists in a relation between sentences. As a consequence, IRS directly supports a semantics only for the sentential part of a given language. In contradistinction to that, compositionality and static molecularity are properties of full semantics, because static molecularity is concerned with the semantics for every kind of term of the language. Here come into play the results of Hodges (2001), which aim precisely at bridging the gap between word meaning and sentence meaning. The full picture should thus include a characterization of a property of inferential roles for a language such that they constrain sentence meanings in a way that guarantees the holding of static molecularity. This picture is intended to support (EPC), because it should make the following plausible: understanding of words comes from understanding of sentences, as constrained by the acceptance of certain inferences; and it is on the basis of the understanding of word meaning that the full range of sentences can be understood. In a nutshell, the underlying claim behind the formalization to come is that the core idea of compositionality as the explanation of productivity and systematicity is the interplay between sentence meaning and word meaning. Let's finally discuss a possible objection to our presentation of the formalization problem. If no constraint is placed upon the way the semantics is to be analyzed in the definition of static molecularity, enjoying static molecularity is bound to be trivial. As a consequence, our distinction between two versions of molecularity might seem at best unnecessary and at worst misleading. In contrast, Pagin (1997) states clearly that holism refers to a property of theories of meaning. It involves both semantic facts and facts pertaining to meaning determination: certain non-semantic properties of linguistic expressions are pinpointed, and these properties determine the meaning of the expressions. A theory of meaning will be holistic if, and only if, the whole extension of the non-semantic property is relied upon and the meanings get determined together, in the sense that assignments of meanings to different expressions are interdependent. But in fact this is perfectly consistent with our setting: static molecularity is interesting because it describes a semantic property which makes (EPC)
possible, but the analysis of language underlying molecularity must come from some actual determination of meaning; i.e., static molecularity must hold as the result of the holding of dynamic molecularity, if one accepts inferential roles as the relevant non-semantic facts.
3 Hodges' Setting
The setting of Hodges (2001) is well-suited to formalizing the strong version of compositionality we are interested in, even if it is not primarily designed to do that. The results concerning the extension problem – i.e., how to get from a compositional semantics for a small language, or for the sentential part of a language, to a compositional semantics for a bigger language or for the whole language – pave the way for a general definition of static molecularity and for a result linking dynamic and static molecularity. To begin with, we will recall these results.

Preliminary definitions

Language and term algebra. A language L is a triple ⟨E, A, Σ⟩ where

• E is the set of expressions,
• A ⊆ E is the set of atomic expressions,
• Σ is a set of syntactic rules, that is, partial maps from Eⁿ to E.

To each L, one can associate a grammatical term algebra GT(L), which is a subset of the term algebra over L. From an intuitive point of view, it corresponds to the structural analysis of the expressions in L which disambiguates them.

Semantics and synonymy relations. A semantics for L is a map µ whose domain is a subset of GT(L). A synonymy for L is an equivalence relation ≡ on a subset of GT(L). One associates to every semantics µ the synonymy ≡µ it induces in the usual way; conversely, a synonymy relation ≡ gives rise to a semantics µ≡ which interprets each term by its synonymy class. Two semantics are equivalent if they have the same associated synonymy relation. We say that µ is as fine-grained as µ′, notation µ ≤ µ′, iff ≡′µ ⊆ ≡′µ′, where ≡′ is obtained from ≡ by restricting it to the intersection of the domains of µ and µ′. We say that µ preserves µ′ iff the domain of µ′ is a subset of the domain of µ and µ ≤ µ′ and µ′ ≤ µ; µ preserves µ′ iff µ is equivalent to an extension of µ′ in the usual sense. Two terms p and q are separated by a context t(x) in µ iff t(p/x) ≢µ t(q/x). Two terms are separated in µ iff there is a context which separates them. Two terms p and q have the same µ-category (notation p ∼µ q) iff replacing one by the other preserves meaningfulness with respect to µ. A semantics µ is µ′-husserlian iff for all terms p, q, if p ≡µ q then p ∼µ′ q (that is, µ-synonyms have the same µ′-category). A semantics µ is compositional if there exists a function r such that for every complex µ-meaningful term s = α(t1...tn), µ(s) = r(α, µ(t1), ..., µ(tn)). Intuitively, the functions rα tell how the meaning of a complex expression α(t1...tn) depends on the meanings of its constituents t1...tn plus the syntactic way they are combined, α. It is useful to isolate a weaker property, 1-compositionality, which, for a husserlian semantics, is equivalent to compositionality: µ is 1-compositional iff p ≡µ q implies s(p/x) ≡µ s(q/x).
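A toy rendering of these definitions may help fix ideas (the representation and helper names are ours, not Hodges'): grammatical terms are nested tuples (rule, subterms), a semantics is a partial map from terms to meanings, and 1-compositionality is tested by brute force – synonyms must be interchangeable in every available context without change of meaning.

from itertools import combinations

def subterms(t):
    yield t
    if isinstance(t, tuple):
        for s in t[1:]:
            yield from subterms(s)

def substitute(term, old, new):
    if term == old:
        return new
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(s, old, new) for s in term[1:])
    return term

def is_1_compositional(mu):
    terms = set()
    for t in mu:
        terms.update(subterms(t))
    for p, q in combinations(terms, 2):
        if mu.get(p) is None or mu.get(p) != mu.get(q):
            continue                      # only mu-synonyms matter
        for t in list(mu):                # each meaningful term t = s(p/x)
            for a, b in ((p, q), (q, p)):
                image = substitute(t, a, b)
                if image in mu and mu[image] != mu[t]:
                    return False
    return True

# 'a' and 'b' are synonyms, and swapping them preserves meaning:
mu = {"a": 1, "b": 1, ("F", "a"): 5, ("F", "b"): 5}
print(is_1_compositional(mu))   # True
mu[("F", "b")] = 6              # now a context separates the synonyms
print(is_1_compositional(mu))   # False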
Two terms p and q are separated by a context t(x) in µ iff t(p/x) 6≡µ t(q/x). Two terms are separated in µ iff there is a context which separates them. Two terms p and q have the same µ-category (notation p ∼µ q) iff replacing one by another preserves meaningfulness with respect to µ. A semantics µ is µ 0 husserlian iff for all terms p,q, if p ≡µ then p ∼µ 0 q (that is, µ-synonyms have the same µ 0 -category). A semantics µ is compositional if there exists a function r such that for every complex µ-meaningful term s = α(t1 ...tn ), µ(s) = r(α, µ(t0 )...µ(tn )). Intuitively, the functions rα tell how the meaning of a complex expression α(t1 ...tn ) depends on the meaning of its constituents t1 ...tn plus the syntactic way they are combined, α. It is useful to isolate a weaker property, which for a husserlian semantics, is equivalent to compositionality, 1-compositionality: µ is 1-compositional iff p ≡µ q implies s(p/x) ≡µ s(q/x). The extension theorems The main idea introduced by Hodges (2001) is the notion of Fregean extension, which captures what should be the semantics ν for the expressions of a language, given a semantics µ for the sentences of a language. It is intended to capture the Fregean idea, expressed in Frege’s context principle, that the meaning of a term of the whole language is the contribution that the term makes to the meaning of the sentences containing it. The four conditions are: a) ν is µ-husserlian. b) µ is 1-compositional with respect to ν synonyms. c) If two terms are not ν synonyms, there must be a context which separates them in µ (full abstraction over µ). d) ν is an extension of µ. Hodges’ results are existence results:4 Theorem 1 (First Extension Theorem, Hodges 2001). Let L be a language, GT (L) the set of grammatical expressions of this language, SL the subset 4
4 The theorems do not involve GT(L) and SL but GT(L) and any subset CGT(L) cofinal to GT(L). CGT(L) is cofinal to GT(L) if, though CGT(L) is smaller than GT(L), every expression in GT(L) is a subexpression of an expression in CGT(L). Cofinality to GT(L) is usually taken to be a necessary condition for a subset of GT(L) to be the set of sentences of L, for any L, though a necessary and sufficient syntactic characterization of SL is still lacking. Since the only subset of GT(L) we will be interested in is SL, we have rephrased Hodges' theorems and dropped talk of cofinality.
Theorem 1 (First Extension Theorem, Hodges 2001). Let L be a language, GT(L) the set of grammatical expressions of this language, SL the subset of GT(L) corresponding to the set of L-sentences, and µ a husserlian and 1-compositional semantics for SL. Then there exists a semantics ν for GT(L) which is a total Fregean extension of µ and which is compositional and unique up to equivalence.

Hodges' first extension theorem explains how we can get from sentence meaning to word meaning, insofar as it explains how a satisfactory compositional semantics for the whole class of grammatical terms can be devised on the basis of a well-behaved semantics for sentences.5

Theorem 2 (Second Extension Theorem, Hodges 2001). Let L and L′ be two languages, GT(L) and GT(L′) the sets of grammatical expressions of these languages, SL and SL′ the subsets of GT(L) and GT(L′) corresponding to the sets of sentences of these languages, µ and µ′ two husserlian and 1-compositional semantics for SL and SL′, and ν and ν′ the two semantics for GT(L) and GT(L′) given by Theorem 1. Then:

• If µ ≤ µ′ (and ∼µ, restricted to terms of L′, is included in ∼µ′), then ν ≤ ν′.
• If any two terms p and q of the same µ-category which are separated in µ are already separated in µ′ – we shall say that the separability property holds – (and ∼µ′ ⊆ ∼µ), then ν′ ≤ ν.

The idea behind the second extension theorem is that since the meanings of the whole set of terms are determined by the meanings of sentences – as set up by the definition of Fregean extension – the degree to which sentence meanings are kept fixed as the language gets bigger determines the degree to which term meanings are kept fixed. Therefore, Hodges' theorem can be seen as capturing, at least partially, the idea of a step-by-step growth of the language, because it can represent the semantics for the whole language as (quasi-)extensions obtained from semantics for parts of the language. Since this second theorem is the most interesting to us, we shall spell out the proof in detail, so that the way this determination works is clear. Let p and q be in GT(L′) and assume p ≡ν q; we want to show that p ≡ν′ q. From p ≡ν q, we get (1) p ∼µ q, by condition a) of the definition of Fregean extension, and (2) for all contexts s(x) yielding µ-meaningful sentences for p and q, s(p/x) ≡µ s(q/x), by condition b) of the definition of Fregean extension.
5 Actually, this too involves some drastic idealization: even though word meanings might be acquired only in the context of the use of these words in sentences, other mechanisms – such as an innate sensitivity to various kinds of objects – obviously come into play at this very stage.
From that we get (1′) p ∼µ′ q, by (1) and the inclusion of ∼µ (restricted to terms of L′) in ∼µ′, and (2′) for all contexts s(x) yielding µ′-meaningful sentences for p and q, s(p/x) ≡µ′ s(q/x), by (2) and µ ≤ µ′. By condition c) of the definition of Fregean extension, this implies that p ≡ν′ q. Similarly, let p and q be in GT(L′) and assume p ≡ν′ q; we want to show that p ≡ν q. From p ≡ν′ q, we get (1) p ∼µ′ q, by condition a) of the definition of Fregean extension, and (2) for all contexts s(x) yielding µ′-meaningful sentences for p and q, s(p/x) ≡µ′ s(q/x), by condition b) of the definition of Fregean extension. From that we get (1′) p ∼µ q, by (1) and ∼µ′ ⊆ ∼µ, and (2′) for all contexts s(x) yielding µ-meaningful sentences for p and q, s(p/x) ≡µ s(q/x), by (2) and the separability property. By condition c) of the definition of Fregean extension, this implies that p ≡ν q. We note here that this second proof is not exactly the dual of the first one: to get from (2) to (2′), µ′ ≤ µ is not sufficient, because there are new contexts in SL about which µ′ ≤ µ says nothing, and that is why the separability property, which is stronger than µ′ ≤ µ,6 is needed.
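The construction behind these theorems can be made vivid with a brute-force toy (our simplification, valid only for finite languages, not Hodges' general construction): two terms count as ν-synonymous exactly when they have the same µ-category and no sentence context separates them in µ – conditions a) and c), with b) taken care of by the 1-compositionality of µ.

def substitute(term, old, new):
    if term == old:
        return new
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(s, old, new) for s in term[1:])
    return term

def same_category(p, q, sentences, mu):
    # Interchangeable salva significatione in every sentence.
    return all((substitute(s, p, q) in mu) == (s in mu) and
               (substitute(s, q, p) in mu) == (s in mu) for s in sentences)

def nu_synonymous(p, q, sentences, mu):
    if not same_category(p, q, sentences, mu):
        return False
    return all(mu[substitute(s, p, q)] == mu[s]
               for s in sentences if substitute(s, p, q) in mu)

mu = {("F", "a"): 0, ("F", "b"): 0, ("G", "a"): 1, ("G", "b"): 1}
sentences = list(mu)
print(nu_synonymous("a", "b", sentences, mu))  # True: no separating context
mu[("G", "b")] = 2                             # G(x) now separates a and b
print(nu_synonymous("a", "b", sentences, mu))  # False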
4 Formalizing Molecularity and Compositionality Together
Even if the notion of Fregean extension explains how sentence meaning determines word meaning, this does not imply that it provides an answer to the determination problem, that is, the problem of determining on which learnable basis meanings get known. In order to be able to apply Hodges' extension theorems, one has to start with meanings for an infinite set of sentences; an infinitary access to meaning is therefore still presupposed. To answer the determination problem, one has to explain how we can get to know, on a finite basis, sentence meaning.

6 The separability property implies µ′ ≤ µ. Assume that p ≡µ′ q and p ≢µ q. x is then a separating context in µ for these two terms; by the separability property, there is a context s(x) such that s(p/x) ≢µ′ s(q/x), but, since µ′ is assumed to be 1-compositional, this contradicts p ≡µ′ q. It is easy to see that µ′ ≤ µ is compatible with the separability property not holding.
Here comes into play inferential role semantics, which explains how meaning is already constrained at the level of sentences. So we still have to formalize static molecularity in Hodges' setting – to be done in the first subsection – to integrate inferential roles into Hodges' account of meaning – second subsection – and (this is the crucial point) to put these two things together in order to show what a strong account of compositionality based on molecularism and inferentialism consists in. This will be done in the third subsection, through a definition of dynamic molecularity as the counterpart, in terms of inferential roles, of static molecularity.

Formalizing compositionality plus static molecularity

Molecularity is the thesis that language can be learned step by step. Considering things the other way around and from a static point of view, this implies that a language can be cut into simpler parts such that meaning is roughly preserved from the simpler parts to the whole language. Given a language L and its set of grammatical expressions GT(L), one can imagine two kinds of analysis of GT(L) into simpler languages:

• We can drop a few basic expressions of AL (the set of atomic expressions of L) in order to see it as the addition of a few new expressions, determined in a certain way, to an already determined language. This is a shift from GT(L) to GT(L′) where AL ⊋ AL′ and AL \ AL′ is required to be as small as possible.

• We can see GT(L) as the union of already determined smaller languages. This is a shift from GT(L) to {GT(Li)}i∈I, where AL = ⋃i∈I ALi.

Intuitively, the first case corresponds to an enrichment of the language, new expressions being added together with new semantic rules as implicit definitions. For example, GT(L′) could be a purely logical language and GT(L) the language of arithmetic. The second case amounts to dividing up the language into separate parts. For example, one of the GT(Li) could be the mathematical part of the language and another one could be the purely physical talk consisting in the ascription of everyday-life properties to macroscopic objects. The two cases should be distinguished because, in the first one, some new information on meanings must be available, so that the speaker can understand the meaning of expressions with occurrences of basic expressions among AL \ AL′. In the second case, no new information should be required to grasp the meaning of sentences of the whole language. Intuitively, it should be possible to consider ν as completely determined by the νi. If special conditions hold, namely if ν extends the νi, is compositional, and if synonyms can be found at will across the GT(Li), the meaning of every new term of GT(L) is indeed already
determined.7 From this top-down perspective, molecularity for a language L amounts first to the possibility of performing such analyses on L until one reaches the empty language:

Definition 1 (Molecular Compositional Tree). A molecular tree for a language L is a tree constructed from the root GT(L) by applying one of the two operations of analysis just defined, such that at each node the sublanguage is compositional and the end nodes are the empty set.

But of course, molecularity encapsulates at the same time the thesis that through such an analysis meanings are roughly preserved. Here the point is to agree on which kind of rough preservation of meaning we require. Given a molecular tree for a language L labeled with semantics νs for the sublanguages at each node s, let's say that property (a) holds at s iff for every node sa immediately below s, νs ≤ νsa, and that property (b) holds iff νs ≥ νsa. The following definitions are intended to capture the various kinds of requirements that one may impose for a molecular tree to be a molecular analysis.

Definition 2 (Static Molecular Analysis). A molecular compositional tree for a language L with a semantics ν for GT(L), labeled with semantics νs for the sublanguages at each node s, is

• a strong static molecular analysis of L iff at every node s, properties (a) and (b) hold;
• a cumulative static molecular analysis of L iff at every node s property (a) holds;
• an anti-cumulative static molecular analysis of L iff at every node s property (b) holds;
• a weak static molecular analysis of L iff at every node s either property (a) or property (b) holds.

7 Assume that for every α(b1...bn) ∈ GT(L), there exist i ∈ I and b′1...b′n such that α(b′1...b′n) ∈ GT(Li) and bk ≡ν b′k. Then we do have ν(α(b1...bn)) = νi(α(b′1...b′n)), since by compositionality of ν, ν(α(b1...bn)) = ν(α(b′1...b′n)), and ν is an extension of νi. This means that the functions rα interpreting the syntactic rules α of L are determined by those doing the same job for the Li. More generally, what most probably happens in real-life cases is that the values of the rα for unknown inputs are inferred by some kind of induction. For example, if you learn to interpret conjunction compositionally as intersection on a small language, you go on interpreting it the same way when you have new semantic values as inputs. Unfortunately, the formalization of this point, which would involve an account of rule-following, goes beyond our present aims and possibilities.
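To fix intuitions, here is a minimal sketch (toy data structures of our own) of how properties (a) and (b) could be checked on a labeled tree, with ≤ read as "as fine-grained as" on the common domain of two semantics:

def as_fine_grained(mu, mu_prime):
    # mu <= mu': on the common domain, mu-synonymy implies mu'-synonymy.
    common = set(mu) & set(mu_prime)
    return all(mu_prime[p] == mu_prime[q]
               for p in common for q in common if mu[p] == mu[q])

def check(node, prop):
    # node = (semantics, children); prop 'a' requires nu_s <= nu_child
    # at every node, prop 'b' requires nu_child <= nu_s.
    sem, children = node
    for child in children:
        ok = (as_fine_grained(sem, child[0]) if prop == 'a'
              else as_fine_grained(child[0], sem))
        if not (ok and check(child, prop)):
            return False
    return True

leaf1 = ({"a": 1, "b": 1}, [])
leaf2 = ({"c": 2}, [])
root = ({"a": 1, "b": 1, "c": 2, ("F", "a"): 3}, [leaf1, leaf2])
print(check(root, 'a'), check(root, 'b'))  # True True: a strong analysis
root[0]["b"] = 9                           # the whole language splits a, b
print(check(root, 'b'))                    # False: a sublanguage synonymy is lost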
Of course, we take a language to enjoy static molecularity if and only if it has a molecular analysis. But the last definition shows that static molecularity is not an all-or-nothing concept: one can imagine many ways in which meanings are preserved from parts to the whole. The question is: which notion of molecular analysis do we need in order to support (EPC)? Precisely on the ground of (EPC), Dummett's position is in favor of a strong interpretation of molecularity. But it has been argued that such a strong conclusion is not justified (Pagin 1997; Pagin 200?). Let's restate Pagin's argument in our terms. First, only a very strong version of holism – one implying the thesis Pagin has labeled the Total Change Thesis, according to which any change of acceptance of sentences or inferences will change the meanings of the sentences – threatens (EPC). This means that it is perfectly possible to accept a weaker form of holism without being committed to the charge of circularity developed in the first section. It is clear that if the Total Change Thesis is true, understanding any sentence requires understanding every sentence, so that the detour provided by word meaning is epistemically useless. But, following Pagin, it is perfectly compatible with holism to refuse the Total Change Thesis, so that:

the meanings assigned to my expressions at time ti, based on the set Σi of sentences and inferences that I accept then, are the same as the meanings assigned to my expressions at time tk, based on the new set Σk of sentences and inferences I accept at tk, provided my decisions between ti and tk are normal. (Pagin, 1997, p. 30)

If this is correct, grasp of the meaning of basic terms in the context of a sublanguage would guarantee reliable knowledge of their contribution to the meanings of expressions in wider contexts of use. Another solution is to reject the black-and-white picture Dummett wants us to draw. Though perfect knowledge of meaning may depend on total acquaintance with the whole set of accepted sentences and inferences, partial acquaintance with this set might well yield partial knowledge of meaning, so that understanding of basic expressions in the setting of a fragment of the whole language would not be irrelevant to understanding of sentences containing them in the whole language. This view has been advocated by Bilgrami (1986) and more recently by Dresner (2002);8 these authors interpret it as a rejection of
8 Dresner (2002) thinks that the partiality view supports the shift from a model-theoretic account of meaning to an algebraic one. Unfortunately, Dresner is very sketchy as to how this algebraic setting can be put to good use. The approach we develop here is clearly related to Dresner's, insofar as it leaves room for the partiality view, the main difference being that we are not committed to a determinate way of representing meanings, since we work in Hodges' setting, in which, in a very Quinean spirit, everything is dealt with thanks to the synonymy relation, bypassing the representation problem.
Dummett's requirement of molecularity; we prefer to give it a dual interpretation according to which what is sought is a weaker notion of molecularity. The common strategy behind these counter-arguments against Dummett is to enhance the role played by regularities among meanings even in a holistic setting. The point is that even if (EPC) is not literally true, some weaker version of it might be sufficient to explain the acquisition of our semantic competence. Even though fully grasping the meaning of a complex expression might involve irreducible knowledge about its use, knowledge of the meanings of the basic terms of a complex expression, as determined by their use in other contexts, could do a significant part of the job and yield rough knowledge of the meaning of the complex expression, where 'rough' would mean either reliable though defeasible, or partial and incomplete. How does this fit with the various kinds of molecular analysis we defined? Dummett's position is captured by the notion of strong molecular analysis, because in this kind of analysis meanings are kept fixed along the tree. Without further ado, the partiality view seems to correspond to the notion of weak molecular analysis: the synonymy relation gets refined or new synonymies are discovered, but it is assumed that this process allows for some kind of progressive grasp of meaning; acceptance of the cumulative or anti-cumulative view would stem from the claim that only one of these two processes takes place, because of some special features of language learning. The reliabilist view is also captured by the weak notion, or perhaps more adequately by some mixed version of the strong and the weak notions: the strong case would be the rule, although some local exceptions are allowed to take place, according to the weak notion. In the rest of this paper, our aim will not be to adjudicate between these competing stories about molecularity and the possibility of language learning, but to see how they fit in the compositionality view about meaning and to see which constraints on an inferential role semantics ensue from them.

Adding inferential roles

Hodges (2001) provides an account of how sentence meaning determines word meaning. We have just sketched an account of how molecularity works for word meaning. To push things further and characterize dynamic molecularity, we still need an account of how sentence meaning is constrained by meaning-constitutive inferences. So now we add a representation of inferential roles. These roles consist in the acceptance of certain sentences or inferences. As a consequence, they directly
determine meaning only at the sentence level: we then have to explain how this direct determination works, and Hodges' result will provide for free an understanding of how it bears more generally on term meaning. The intuition is that semantics for a given language are constrained by the acceptance of certain inferential roles for the terms of this language: any semantics for a language must respect the meaning relations underlying the recognition of these roles. Let's then define a constrained language SL = ⟨SL, ⊢L⟩: a set SL of sentences of a language L equipped with a consequence relation ⊢L on ℘(SL) × SL induced by the set of semantic rules RL of L. The notion of semantics is enriched in order to take the consequence relation into account: it is an ordered set ⟨M, ≼⟩ and a function µ : SL → M such that if φ ⊢ ψ, then µ(φ) ≼ µ(ψ). ≼ is intended to mirror at the semantic level the syntactic consequence relation: meanings are equipped with an entailment relation. First, let's make clear some points concerning this enriched notion of semantics and ask the two following questions: given a constrained language SL, 1) what are sufficient conditions for the existence of a 1-compositional semantics for SL? and 2) is there something like the good semantics for SL? We shall then prepare the ground for the definition of dynamic molecularity by asking: given a constrained sublanguage SL′ of SL with semantics µ′, 3) what are the conditions for the existence of a (unique) compositional semantics µ for SL such that µ ≤ µ′? and, conversely, 4) what are the conditions for the existence of a (unique) compositional semantics µ for SL such that the separability property holds between µ and µ′? Let's start with a few more or less standard definitions.9

Definition 3 (Well-Behaved Language). A constrained language SL is well-behaved iff ⊢L is reflexive and transitive.10

Definition 4 (Equality Friendly Language). A constrained language SL is equality friendly iff, whenever φ ⊢L ψ and ψ ⊢L φ, then for all meaningful contexts s(x) for φ and ψ, s(φ/x) ⊢L s(ψ/x).

Definition 5 (Conservativity). Given two constrained languages SL and SL′ such that L′ ⊆ L, SL is conservative over SL′ iff for all φ, ψ in SL′ such that φ ⊢L ψ, φ ⊢L′ ψ.

Definition 6 (Extension). Given two constrained languages SL and SL′ such that L′ ⊆ L, SL extends SL′ iff for all φ, ψ in SL′ such that φ ⊢L′ ψ, φ ⊢L ψ.
9 Definition 3 matches Tarski's well-known definition of a consequence relation. Definition 4 is intended to yield 1-compositionality, and it is generalized in Definition 7. Definitions 5 and 6 are fairly standard.

10 Strictly speaking, we mean that the relation on SL × SL induced in the expected way by ⊢L is reflexive and transitive.
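As an illustration of these definitions (a finite toy of our own), the following sketch closes a finite entailment relation under reflexivity and transitivity, as Definition 3 requires, and extracts the synonymy of mutual consequence on which the semantics below is built:

def close(sentences, entails):
    rel = set(entails) | {(s, s) for s in sentences}   # reflexivity
    changed = True
    while changed:                                     # transitivity
        changed = False
        for a, b in list(rel):
            for c, d in list(rel):
                if b == c and (a, d) not in rel:
                    rel.add((a, d))
                    changed = True
    return rel

def synonymy(sentences, rel):
    # phi and psi are synonymous iff they entail each other.
    return {(p, q) for p in sentences for q in sentences
            if (p, q) in rel and (q, p) in rel}

S = ["A", "B", "C"]
rel = close(S, {("A", "B"), ("B", "A"), ("B", "C")})
print(("A", "B") in synonymy(S, rel))   # True: mutual consequence
print(("A", "C") in synonymy(S, rel))   # False: C does not entail A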
Definition 7 (Fully Equality Friendly Language). A constrained language ⟨SL, ⊢L⟩ is fully equality friendly iff for all semantic categories α and all terms p, q belonging to α, there is a context e(x) such that if e(p/x) ⊢L e(q/x) and e(q/x) ⊢L e(p/x), then for all meaningful contexts s(x) for p and q, s(p/x) ⊢L s(q/x).11

Before tackling questions 1) to 4), let us make the following points clear. First, traditional logical consequence relations are taken to be structural and monotonic. Here the consequence relation is not purely logical: it might involve special rules for special expressions, so structurality does not make sense. As to monotonicity, it might prove hard for a non-monotonic language to be equality friendly in the sense of Definition 4, but since equality friendliness is the important thing here, we do not focus on monotonicity. Second, the semantics we are after is not a model-theoretic semantics adequate for the consequence relation. Because we just want to be able to put Hodges (2001) to good use, we shall be satisfied here with the semantics given by a synonymy relation. To answer question 1), one thus needs assumptions on ⊢L so that it gives a synonymy relation from which a semantics can be constructed in the usual way. We set φ ≡⊢L ψ iff φ ⊢L ψ and ψ ⊢L φ, and [φ] ≼⊢L [ψ] iff φ ⊢L ψ.12 If we assume that SL is well-behaved, that is, ⊢L is reflexive and transitive (these are already the two basic axioms characterizing a consequence relation according to Tarski), ≡⊢L will be an equivalence relation on the set of formulas and ≼⊢L will be well-defined.13 What kind of compositionality do we need for our semantics? As SL is not closed under subterms, full compositionality certainly does not make sense. What we will need is to satisfy the hypothesis of the first extension theorem, that is, µ≡⊢L must be 1-compositional in Hodges' sense; that is, we must have: if s(x) is a meaningful context for two sentences φ and ψ such that φ ≡⊢L ψ, then s(φ/x) ≡⊢L s(ψ/x). It is trivial to check that ≡⊢L is 1-compositional if and only if SL is equality friendly.14
11 Note that equality friendliness does not imply full equality friendliness: it implies it for the category of sentences, of course, but not necessarily for other syntactic categories.
12 Other notions of synonymy might be considered, such as having the same consequences; they will generally be provably equivalent only if the consequence relation satisfies certain logical properties.
13 Technically, this is trivial: ⊢L as a relation on sentences will be a preorder, and the ordered set of meanings is the order associated with it.
14 Another hypothesis is needed for the first extension theorem, husserlianity, which says that synonymous expressions belong to the same syntactic type, that is, synonymous expressions can be plugged into the same contexts. As we are dealing with sentences, we can assume that this property is always satisfied.
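To fix ideas, here is a toy instance of this quotient construction (my illustration, not from the text), again in LaTeX notation. Take

\[
S_L = \{\, p,\; q,\; p \wedge q,\; q \wedge p \,\}
\]

with \(\vdash_L\) the least reflexive and transitive relation such that

\[
p \wedge q \vdash_L p, \qquad p \wedge q \vdash_L q, \qquad p \wedge q \vdash_L q \wedge p, \qquad q \wedge p \vdash_L p \wedge q.
\]

SL is well-behaved by construction and equality friendly (the only interderivable pair, p ∧ q and q ∧ p, occurs in no larger context here), so the quotient semantics exists:

\[
M = \{\, [p],\; [q],\; [p \wedge q] \,\}, \qquad [p \wedge q] \preceq [p], \quad [p \wedge q] \preceq [q], \qquad \mu(\phi) = [\phi].
\]

By construction this semantics is also determinate in the sense introduced below: whenever φ ⊬L ψ, we have µ(φ) ⋠ µ(ψ).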
Question 2) is analogous to Hodges' question as to which extension of a semantics from the class of sentences to the class of terms is the good one, in the sense that we need some guiding principles on how to extrapolate from ⊢L to the semantics for SL. There are two very different kinds of answer to this kind of question:

1. One may wish to restrict the class of available semantics on independent grounds, hoping that this will be enough to ensure uniqueness. For example, one might decide that the meaning of a sentence must be a set of possible worlds.

2. One may wish to add a requirement in the spirit of Hodges' full abstraction principle, that is, a requirement to the effect that everything that matters to meaning must be observable – in this case, that every link among meanings must mirror an explicit inferential link. We will say that such a semantics is determinate. Technically, one adds a converse requirement as to what meaning should be: if φ ⊬ ψ, then µ(φ) ⋠ µ(ψ).

We follow here, by default so to speak, the second strategy: according to it, the algebraic semantics just outlined is the good semantics. First, it is immediate that it satisfies the requirement: if φ ⊬ ψ, then µ(φ) ⋠ µ(ψ). Second, two semantics satisfying it are unique up to equivalence. Let µ and µ′ be two such semantics, and assume φ ≡µ ψ. This means that µ(φ) = µ(ψ), so we have both µ(φ) ⪯µ µ(ψ) and µ(ψ) ⪯µ µ(φ). By determinateness, this implies φ ⊢L ψ and ψ ⊢L φ, and then, by the definition of a semantics, µ′(φ) ⪯µ′ µ′(ψ) and µ′(ψ) ⪯µ′ µ′(φ), whence µ′(φ) = µ′(ψ), that is, φ ≡µ′ ψ.

We now tackle question 3). Let µ and µ′ be the two determinate semantics for SL and SL′. µ ≤ µ′ might well be false: assuming φ ≡µ ψ, it might well be the case that φ ≢µ′ ψ, because, for example, φ ⊢L ψ but φ ⊬L′ ψ. To ensure that this is not the case, it is sufficient that ⊢L be conservative over ⊢L′. If the language is rich enough, this is also a necessary condition: assume that φ ⊢L ψ and φ ⊬L′ ψ; under reasonable assumptions on the meaning of ∧, we then have φ ⊢ ψ ∧ φ and of course ψ ∧ φ ⊢ φ, so that ψ ∧ φ and φ will be µ-synonymous but not µ′-synonymous. Note that if determinateness does not hold, it might well be the case that µ ≤ µ′ though conservativity does not hold, because restrictions on possible meaning assignments were already implementing in µ′ the synonymies revealed by ⊢L.

The answer to question 4) is slightly more difficult. If ⊢L′ is the restriction of ⊢L to SL′, as we shall assume from now on, it is immediate that µ′ ≤ µ. But,
as we have seen in the proof of the second extension theorem, this will not be informative enough when we are interested in shifting to a semantics for GL instead of SL. We need the separability property; how can we guarantee that it will hold through constraints on ⊢? The separability property says that if two terms can be separated in the semantics for the big language, they can already be separated in the semantics for the small language. As we have said, it is clear that ⊢L being an extension of ⊢L′ is not in general sufficient to guarantee that. But it will do if there is something in the language like a universal separator for terms of a same semantic category. This is exactly the role the e(x) contexts play in Definition 7 of fully equality friendly languages.15

Let SL and SL′ be two fully equality friendly constrained languages, with semantics µ and µ′, such that L′ ⊆ L and SL extends SL′. We prove that the separability property holds. Assume for contradiction that there are two terms p and q of GT(L′) such that (*) p and q can be separated by a context s(x) in SL, but there is no separating context for them in SL′. This implies that e(x) is not such a context, so that e(p/x) is synonymous with e(q/x); but then (modulo determinateness) this means that e(p/x) ⊢L′ e(q/x) and e(q/x) ⊢L′ e(p/x) hold. By extension, e(p/x) ⊢L e(q/x) and e(q/x) ⊢L e(p/x) hold as well,16 so that, SL being fully equality friendly, s(p/x) ⊢L s(q/x) and s(q/x) ⊢L s(p/x), which contradicts (*).

Dynamic molecularity yields static molecularity

On the basis of the last subsection, we are now able to define a notion of dynamic molecularity as a property of constrained languages, in a way which follows closely the definition of static molecularity. Let L be a language constrained by semantic rules ⊢L for S(L). As before, one can imagine two kinds of analysis of S(L) into simpler languages:

• We can drop a few basic expressions of AL in order to see it as the addition of a few new expressions, determined in a certain way, to an already determined language. This is a shift from S(L) to S(L′) where AL ⊋ AL′ and AL \ AL′ is required to be as small as possible. ⊢L′ is taken to be the restriction of ⊢L to sentences of SL′.

• We can see SL as the union of already determined smaller languages. This
15 The situation here, where the semantics for the big language is constrained by an extension of the consequence relation of the small one, is very different from – and, in fact, better than – Hodges' purely semantic setting, in which the preservation property has no reason to hold even if the semantics for sentences of the big language is an extension of the semantics for sentences of the small language.
16 We assume here that the e(x) contexts remain the same in SL as in SL′.
is a shift from SL to {SLi}i∈I, where AL = ∪i∈I ALi. The ⊢Li are taken to be the restrictions of ⊢L to sentences of SLi.

Definition 8 (Molecular IRS Tree). A molecular IRS tree for a constrained language SL is a tree constructed from the root SL by applying one of the two operations of analysis just defined, such that at each node the constrained sublanguage is well-behaved and equality friendly, and such that the end nodes are the empty set.

Given a constrained language S(L), it is clear that if the semantics µ for GT(L) is the Fregean extension of the determinate semantics for S(L), a molecular IRS tree for S(L) yields, through Hodges' first extension theorem, a molecular compositional tree for ⟨L, µ⟩. Now we just have to capture the analogue of a static molecular analysis. Given a molecular IRS tree for S(L) labeled with consequence relations ⊢s for the constrained sublanguages at each node s, we will say that property (a) holds at s iff for every node sa immediately below s, ⊢s is conservative over ⊢sa, and that property (b) holds iff ⊢s extends ⊢sa, both of them also being fully equality friendly.17

Definition 9 (Dynamic Molecular Analysis). A molecular IRS tree for a constrained language S(L) is

• a strong dynamic molecular analysis of L iff at every node s, properties (a) and (b) hold;
• a cumulative dynamic molecular analysis of L iff at every node s property (a) holds;
• an anti-cumulative dynamic molecular analysis of L iff at every node s property (b) holds;
• a weak dynamic molecular analysis of L iff at every node s either property (a) or property (b) holds.

The last definition is intended to capture the counterpart, in terms of inferential role, of (static) molecularity as a property of a semantics. The following proposition is an easy consequence of the previous subsection:18

Proposition 1. Given a constrained language S(L) and a determinate semantics µ for GT(L), ⟨L, µ⟩ has an X static molecular analysis if S(L) has an X dynamic molecular analysis, where X ranges over {strong, cumulative, anti-cumulative, weak}.
17 As long as the consequence relations for the parts are restrictions of those for the wholes, the first part of (b) is always satisfied.
18 We skip the 'only if' direction, but the needed provisos have been made clear in the previous subsection as well.
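Continuing the toy example from above (again my illustration, not Bonnay's), a molecular IRS tree for it can be displayed as

\[
S_L \;\xrightarrow{\ \text{drop}\ \wedge\ }\; S_{L'} = \langle \{p, q\}, \vdash_{L'} \rangle \;\xrightarrow{\ \text{split}\ }\; \{p\},\ \{q\} \;\longrightarrow\; \varnothing
\]

where ⊢L′ and the consequence relations at the lower nodes are the restrictions of ⊢L. Every node is well-behaved and equality friendly, and since the restriction of ⊢L to {p, q} contains only the reflexive entailments, ⊢L is conservative over ⊢L′ and extends it (with e(x) = x serving as the separating context for the category of sentences, so full equality friendliness also holds). Properties (a) and (b) thus hold at every node: the tree is a strong dynamic molecular analysis, and by Proposition 1 ⟨L, µ⟩ has a strong static molecular analysis.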
5 Conclusion
Our starting point was that (PC) helps explain our mastery of language only because it hints at some sort of epistemic reduction: sentence meanings can be grasped through word meanings, and word meanings can be grasped on a basis which does not involve understanding each and every sentence of the language. According to this claim, the interesting notion to capture, if one is interested in formalizing some kind of touchstone for theories of language, is not (PC) alone, but (PC) plus some version of molecularism. Molecularity can be seen as a static property of the semantics of a language, but the interesting point is to understand how it ensues from properties of the basis of our grasp of meanings, i.e. to capture the relevant properties from the point of view of the materials at hand for language learning. This is what section III achieves in the setting of inferential role semantics: we characterize a property of the semantic rules of a language which yields the relevant static molecularity for the semantics. A key feature of this analysis is Hodges' work on Fregean extensions, because it gives us the means to shift from semantic constraints on sentences, via semantic rules, to a semantics for the whole class of terms. Finally, our characterization result is not phrased as concerning one specific notion of molecularity, because we take molecularity to be rather an open range of variably strong properties, the significance of which should be evaluated from both conceptual and empirical perspectives.

Acknowledgements

I thank the anonymous referee and the audience at NAC 2004 in Paris for very valuable comments and suggestions on earlier versions of this paper. Special thanks to Benjamin Simmenauer for his discussion of several technical and philosophical points.

References

Bilgrami, A. (1986). Meaning, holism and use. In E. Lepore (Ed.), Truth and interpretation (pp. 197–218). Oxford: Basil Blackwell.

Dresner, E. (2002). Holism, language acquisition, and algebraic logic. Linguistics and Philosophy, 25, 419–452.

Dummett, M. (1991). The logical basis of metaphysics. London: Duckworth.

Field, H. (1977). Logic, meaning, and conceptual role. Journal of Philosophy, 74, 379–409.
Greenberg, M., & Harman, G. (2004). Conceptual role semantics. In E. Lepore & B. Smith (Eds.), The Oxford handbook of philosophy of language. Oxford: Oxford University Press. (In press)

Hodges, W. (1998). Compositionality is not the problem. Logic and Logical Philosophy, 6, 7–33.

Hodges, W. (2001). Formal features of compositionality. Journal of Logic, Language and Information, 10, 7–28.

Janssen, T. (1997). Compositionality. In J. van Benthem & A. ter Meulen (Eds.), Handbook of logic and language. Amsterdam: Elsevier.

Kracht, M. (2001). Strict compositionality and literal movement grammar. In M. Moortgat (Ed.), Logical aspects of computational linguistics. Berlin: Springer LNAI.

Kracht, M. (2003). The mathematics of language. Berlin: Mouton.

Pagin, P. (1997). Is compositionality compatible with holism? Mind and Language, 12, 11–33.
Pagin, P. (2003). Communication and strong compositionality. Journal of Philosophical Logic, 32, 287–322.

Pagin, P. (2004). Meaning holism. In E. Lepore & B. Smith (Eds.), The Oxford handbook of philosophy of language. Oxford: Oxford University Press. (In press)

Westerståhl, D. (1998). On mathematical proofs of the vacuity of compositionality. Linguistics and Philosophy, 21, 635–643.

Zadrozny, W. (1994). From compositional to systematic semantics. Linguistics and Philosophy, 17, 329–342.
Compositionality, Aberrant Sentences and Unfamiliar Situations

Filip Buekens

Truth-conditional compositionalists hold that understanding is boundless, hence that there is a sense in which we understand such syntactically well-formed but semantically aberrant sentences as 'John cuts the sun' or 'Oscar opens the mountain'. John Searle has denied this, and recent defenders of 'Truth-Conditional Pragmatics' like François Recanati agree with him. In this paper I offer a number of arguments against the thesis that we don't understand aberrant sentences. What we lack is the capacity to form a representation of what the assertive use of such sentences is an epistemic substitute for.

Address for correspondence: Tilburg University, PB 90153, 5000 LE Tilburg, The Netherlands. E-mail: [email protected].
1 Truth, Understanding and Compositionality
A standard picture connects the key notions of compositionality, truth-conditional semantics and understanding as follows: unless context contributes, in a systematic way, on every occasion of uttering a term t of a language L, its semantic value (as is the case for indexicals and demonstratives), there cannot be further systematic impact of context on the condition under which an uttered sentence is true. The truth-condition of an uttered sentence is fully determined by the syntactic relations (structure) between constituents and the semantic values contributed by the constituents. Semantic values are identified as the values terms contribute to a sentence's truth-conditions, and every semantically relevant contribution is reflected in the syntactic structure of the sentence used. An utterance's truth-condition, in turn, locates that sentence in a logical space characterized by valid inferential patterns. Semantic values and rules of composition must be chosen such that systematicity and productivity are preserved.

Compositionalists also hold that there is a sense of understanding such that knowledge of a theory of truth that conforms to the compositionality requirements and yields empirically adequate truth-conditions for every sentence of L is sufficient for understanding the semantic properties of an infinite number
of actual and potential utterances of sentences of L. If you understand the utterances 'a is F' and 'b is G', you will also understand the recombinations 'b is F' and 'a is G' (see Evans, 1982, for what he dubbed 'the Generality Constraint'). There could have been more or fewer utterances than there actually are, but not all potential sentences are usable sentences. And whilst there may be situations in which we can truly say that a is F or that b is G, but perhaps no situation in which we can truly say that b is F or that a is G, we are still in a position to understand what someone would have said had she said that a is G or that b is F. Compositionalists hold that, since these sentences have truth-conditions that can be derived from an empirically adequate theory of truth for L, they fall within the domain of sentences we understand. The compositional nature of the semantic properties of utterances is the best explanation for that capacity.

There are, of course, other intentional aspects of utterances that mobilise different aspects or concepts of understanding – for example, understanding what type of speech act a particular utterance falls under, or understanding what the speaker means by uttering a sentence (the utterance's conversational implicatures) – but these aspects of understanding are not addressed by compositionalists. Compositionalists are happy to accept that not all forms of understanding should be modelled after a compositional account of understanding. Other types of understanding might be grounded in, for example, our capacity to infer information from decoded semantic information, as in conversational implicatures. A compositionalist intends to capture what someone who understands a language can decode when she hears a sentence uttered in that language. She doesn't claim to have explained the complex nature of understanding communicative behaviour.

Compositionalists weave the central notions of understanding, truth-conditions and compositionality together as follows: to understand the assertive use of a sentence of a language L is to know the condition under which that utterance is true. Truth-conditions are compositionally determined, and understanding is boundless in the sense that, for any actual or potential syntactically well-formed utterance of a sentence of L, a speaker/hearer of L knows under which condition that actual or potential utterance is true. Compositionalism (a.k.a. semantic minimalism (Borg, 2004), because context plays a minimal role in determining truth-conditions) denies that, after disambiguating a sentence and assigning semantic values to indexical and demonstrative expressions in the sentence as uttered, its truth-conditions may still vary with variations in the background (see Bezuidenhout, 2002).

This very general picture leaves open the exact format of a compositional theory of truth and its ontology, and it is neutral with respect to the way language users implement the kind of knowledge that knowledge of a theory of truth for their language is supposed to be an abstract model of. Knowledge of a theory of truth is
sufficient for understanding an infinite number of actual and potential utterances of sentences (Larson & Segal, 1995). An empirically adequate compositional theory of truth is thus a highly theoretical model of the cognitive equivalent of such a theory – the implicit knowledge we have of a language L, which allows us to grasp the meaning of sentences in L.
2 Aberrant Sentences and Unfamiliar Situations
In 1980 John Searle launched a devastating attack against the tight connection between understanding, compositionality, and the nature of truth-conditions just sketched, and he was followed in this by Charles Travis (1981, 1995), Julius Moravcsik (1998), Anne Bezuidenhout (2002), and François Recanati (2004), among others (I will concentrate on Searle's and Recanati's arguments). John Searle argued as follows. While it is obvious that we understand such sentences as

Tom opened the door
Sally opened her eyes
The surgeon opened the wound

we do not understand, according to him, sentences like

Bill opened the mountain
Sally opened the grass
Sam opened the sun

And we would not have understood 'John opened the door' correctly if we thought he opened the door the way we usually open a shoebox. On the last three sentences Searle makes the following intuitive comments:

[t]he difficulty with these sentences is that though I understand each of the words, I don't understand the sentences. I do not know how the truth-conditions are supposed to be determined ... I do not know what I am supposed to do. (...) I can imagine a context in which I would be able to determine a set of truth-conditions; but by themselves, without the addition of context, the sentences do not do that. (Searle, 1980, p. 222)

In Intentionality he writes:

(t)hough the semantic content contributed by the word 'open' is the same in each sentence of the first set, the way that semantic content is understood is quite different in each case. In each case the truth-conditions marked by the word 'open' are different, though the semantic content
is the same. What constitutes opening a wound is quite different from what constitutes opening a book.

Concerning aberrant sentences, he points out that

There is nothing grammatically wrong with any of these sentences. They are all perfectly good sentences and we easily understand each of the words in the sentences. But we have no clear idea at all of how to interpret these sentences. We know, for example, what 'open' means and what 'open the mountain' means. If somebody orders me to open the mountain, I haven't the faintest idea of what to do. I could of course invent an interpretation for each of these, but to do that would be to contribute more to understanding than is contributed by literal meaning. (Searle, 1983, p. 147)

Searle's argument is (characteristically) crystal-clear. He argues as follows: when you understand the utterance of a sentence, you must be able to determine its truth-conditions (no understanding without knowledge of truth-conditions). Aberrant sentences, however, show that without the addition of context, their truth-conditions cannot be determined. Understanding these sentences requires that I imagine a situation in which they are true. Since imagining – a mental act – must accompany the application of knowledge of the compositional rules to the uttered sentence, understanding involves a lot more than just the exercise of a mental capacity or module that an empirically adequate theory of truth would be a theoretical model of. To assign truth-conditions to sentences (aberrant or not), we must, in addition to understanding the semantic values of terms and the way they are composed, mobilize background knowledge:

As members of our culture we bring to bear on the literal utterance and understanding of a sentence a whole background of information about how nature works and how our culture works. A background of practices, institutions, facts of nature, regularities and ways of doing things are assumed by speakers and hearers when one of these sentences is uttered or understood. (Searle, 1980, p. 227)

Whilst understanding a sentence like 'John opens the book' doesn't require that I actively imagine a context or situation in which it is true, background knowledge is always at work on a tacit level, in the sense that we more or less automatically enrich the sentence's truth-conditions with background information. This shows that what I need to know to understand that sentence is more than what compositionalists claim is sufficient. Lack of suitable background information would turn 'John cuts the lawn' into an aberrant sentence.

Generalizing this argument yields the following picture: to understand actual or potential utterances of sentences – to understand the truth-conditions of
their utterances – speakers must not only have acquired or mastered the cognitive equivalent of a compositional theory of truth, but also possess suitable background information. If what a speaker understands when he understands an uttered sentence is captured by truth-conditions – an assumption about the format of a semantic theory that Searle (and other truth-conditional pragmatists) accepts – then it follows that truth-conditions must be enriched with suitable background information.

Searle's forceful argument breaks the close tie between understanding actual and possible utterances of sentences and the psychological equivalent of knowledge of a compositional theory of truth. To understand a sentence – to get at the truth-conditions that capture our intuitive notion of understanding – one must have tacit knowledge of (the cognitive equivalent of) an empirically adequate compositional theory of truth and have suitable background knowledge. Since knowledge of background conditions cannot be reduced to knowledge of rules with a strict determinate format (background information is uncodifiable), understanding an actual or potential utterance is not determined by a compositional theory. So, contrary to what was assumed, knowledge of a theory of truth for L turns out not to be sufficient for understanding potential or actual sentences of L. (Note that Searle defines aberrant sentences as sentences wholly composed of terms one understands, combined according to the rules that apply to L. Aberrant sentences shouldn't be confused with Jabberwocky sentences, in which new terms not understood are combined. We don't understand ''twas brillig, and the slithy toves / Did gyre and gimble in the wabe' (Lewis Carroll) in the semantic sense we defined earlier, for we do not know the semantic values of the terms used.)
3 Recanati on Truth-conditional Intuitions
In Literal Meaning, François Recanati offers the following argument for a conclusion which reinforces and deepens Searle's earlier claim: truth-conditions in a theory of semantic understanding – a theory of what a speaker says on a given occasion – are pragmatically enriched. Truth-conditions obtained in a purely compositional way are too impoverished to play any significant cognitive role in understanding, i.e. to count as encapsulating a cognitively relevant notion of 'what is said' when we understand an uttered sentence. The concept of understanding captured by compositionalists (see section 1) is not consciously available. It is an artificial concept of understanding, a concept that doesn't correspond with a real, accessible and available concept of understanding. (I'll return to this claim later.) One of his first and crucial moves is to reject the scepticism about intuitions about 'what is said' voiced by Herman Cappelen and Ernest Lepore, who claim that
(w)e ourselves don't see how to elicit intuitions about what-is-said by an utterance of a sentence without appealing to intuitions about the accuracy of indirect reports of the form 'he said that ...' or 'what he said is that ...' or even 'what was said is that ...'. (Cappelen & Lepore, 1997, p. 280)

Against this sceptical move about intuitions about 'what is said', Recanati offers the following counter-proposal:

(t)here obviously is another way of eliciting truth-conditional intuitions. One has simply to provide subjects with scenarios describing situations, or even better, with – possibly animated – pictures of situations, and to ask them to evaluate the target utterance as true or false with respect to the situation in question.

He refers to an experiment by Bart Geurts, where

(t)wenty native speakers of Dutch were asked to judge whether or not donkey sentences correctly described pictured situations. Instructions urged subjects to answer either true or false, but they were also given the option of leaving the matter open in case they couldn't make up their minds. (Recanati, 2004, pp. 14–15)

Recanati draws the following parallel:

In calling understanding an experience, like perception, I want to stress its conscious character. Understanding what is said involves entertaining a mental representation of the subject matter of the utterance that is both determinate enough (truth-evaluable) and consciously available to the subject. (Recanati, 2004, p. 16)

It follows that if no such mental representation is entertained, as is the case when we confront aberrant sentences, we do not really understand the sentence uttered, for we would not know how to evaluate it. We are not able to entertain a mental representation of the subject matter of such aberrant utterances as 'Sally opened the grass', unless we actively imagine one. This shows that understanding such sentences, thought of as a conscious experience, requires more than just the application of (tacit) knowledge of a compositionally determined theory of truth. The explicit appeal to imagination in the case of aberrant sentences reveals the implicit involvement of background knowledge in normal cases.

While Searle expands on the necessity of background information for understanding sentences (and intentional phenomena in general), Recanati focuses on the various enrichment procedures involved in getting at the pragmatically enriched truth-conditions of sentences uttered. These enrichment rules draw extensively on contextual knowledge and background information. Recanati, like
Searle, accepts that the format of semantic knowledge is captured by knowledge of truth-conditions, but the truth-conditions must be enriched with material that is not derivationally, hence not compositionally, obtained. (The components that enrich the truth-conditions are, typically, not represented on the syntactic level.) Understanding a sentence is (a bit like) a perceptual experience in that it consists of entertaining a representation of the utterance's subject matter. The content of that representation is not determined by compositional features alone.

Both Searle and Recanati accept that understanding is boundless: aberrant sentences can be understood, but to make a start the intended hearer must actively imagine a situation (perhaps visually imagine one, as Recanati suggests), or be presented with an imaginary situation (as Geurts is interpreted by Recanati), in which the sentence presented is true, in order to get at its truth-conditions (as they understand them); non-aberrant sentences normally fit contextually determined background knowledge. The rules or guidelines for enriching truth-conditions pick out appropriate material in order to get at pragmatically enriched truth-conditions. Imagining a situation (in the case of aberrant sentences), or being visually presented with a situation, or simply forming a representation of the utterance's subject matter, is accessing background information from which the audience selects the appropriate elements to enrich the truth-condition; the result of that process matches our intuitive notion of 'understanding what a speaker says'. We understand utterances of ordinary sentences because background information is automatically accessed. We can come to understand aberrant sentences by actively imagining an unfamiliar situation in which that sentence would have been true. In both cases, semantic understanding involves more than knowledge of a compositional theory of truth. In the rest of this paper, I argue that orthodox compositionalists should not be impressed by this type of argument.
4 Rivalling Concepts of Understanding?
Compositionalists and contextualists do not really disagree, one might be tempted to think. The idea would be that two distinct notions of understanding are involved: understanding based on tacit knowledge of what a compositional theory of truth is a model of, and a richer concept of understanding as the (psychologically real) capacity to connect actual or potential utterances with the kind of circumstance that would make those sentences true. Compositionalists are minimalists about understanding, hence minimalists about truth-conditions, while contextualists invoke a richer conception of understanding, hence a richer conception of truth-conditions. But there is a genuine, non-trivial disagreement lurking in the background.
There is, first of all, the obvious fact that we can provide truth-conditions for aberrant sentences. Compositionalists like Larson and Segal (1995) hold that an interpretively adequate, compositional theory of truth for English yields the following truth-conditions for aberrant sentences (Larson & Segal, 1995, p. 47):

(1) 'John's toothbrush is trying to kill him' is true iff John's toothbrush is trying to kill John
(2) 'Max went to the pictures tomorrow' is true iff Max went to the pictures tomorrow

For the oddness of aberrant sentences they offer the following diagnosis:

Briefly put, it seems to arise not from an absence of truth conditions for (1) but rather from the presence of truth conditions that we do not know how to evaluate. The truth condition in (1) seems to involve some kind of misapplication of concepts that make it difficult for us to see how or in what circumstances they could actually apply. For example, murderous intentions are only attributable to sentient beings that can plan and act. But in the world as it is constituted, dental instruments do not and, it seems, cannot fall into this category. Similarly, past conditions like those specified by Max went to the pictures seem incompatible with occurrence in the future (tomorrow). Since time is noncircular, what was cannot also be part of what will be, and so such a situation is ruled out. (...) These points suggest that if there is a theory of aberrancy to be developed, this lies outside semantics proper and within a more general theory of human concepts. The T-theorems in (1) and (2) require situations not merely that do not obtain, but that cannot obtain, given our conceptual map of the world. The general theory of this map, which may lie entirely outside of linguistics, depending on the ultimate relation between language and thought, would presumably tell us which concepts are compatible for humans, and so what could constitute truth-conditions framed in compatible concepts. (Larson & Segal, 1995, p. 47)

Although Larson and Segal do not explicitly claim that we understand aberrant utterances, it is clear enough from what they say and from their compositional approach to meaning that if we agree that these potential utterances have truth-conditions, we should also agree that there is a genuine, intuitive sense in which we understand those sentences. The crucial point is that understanding the conditions under which they are true differs crucially from seeing in what kind of circumstances they could apply – circumstances in which they would be evaluated as truths. A theory of what makes a particular sentence aberrant for a
speaker S is ultimately part of a theory of human concepts. From the fact that we do not see in what kind of circumstance a sentence could apply – i.e. could be used as a true sentence – it does not follow that we do not understand the sentence.

Secondly, there is no need to consider unuttered aberrant sentences to see their point. There is an important sense in which we fully understand, on the basis of knowledge of what a theory of truth is a model of, the following actual utterance:

I'm a riddle in nine syllables
An elephant/a ponderous house/a melon strolling on two tendrils.

(From 'Metaphors', in Sylvia Plath, Collected Poems, London, 1981, p. 116.) Every speaker of English (everyone who is in a position empirically equivalent with someone who has mastered (the cognitive equivalent of) an empirically adequate compositional theory of truth for English) understands these lines – what they mean. The author could reasonably count on her readers' knowledge of English when she wrote these lines. Of course, there is another sense in which we may (or may not) understand what Sylvia Plath intended to convey, but that merely shows that different concepts of understanding are involved or can be construed, and that what a sentence means and what the speaker means shouldn't be confused. (Recall the notoriously polysemous character of words like 'understanding' and 'meaning'!) Given that utterances are produced with many intentions (only some of which have the Gricean feature of being genuine communicative intentions), we should expect that there are at least as many corresponding levels of understanding.

The dispute would be settled too easily if both parties agreed that different notions of understanding are in play – that compositionalists and contextualists talk at cross-purposes. On the contrary: both parties intend to elucidate the one notion of understanding that corresponds with, or captures, an utterance's semantic properties. The claim is that we understand the semantic properties of Sylvia Plath's lines, and that such understanding is a necessary condition for getting at what she intended to communicate, or for undergoing the intended poetic effects. The Searle/Recanati argument illustrates, according to its proponents, that we must appeal to our faculty of imagination if we want to understand aberrant sentences (in their sense of 'understand'), while ordinary sentences don't require that we actively imagine unfamiliar situations (background information is mobilised automatically upon hearing a familiar sentence). But this begs the question against the compositionalist, for the argument already assumes that the notion of understanding to be captured must be a richer one than the concept of understanding pure compositionalists have in mind – it could not be the modest one that characterizes our capacity to understand an infinite number of
sentences, including aberrant sentences.

Thirdly, the fact that aberrant sentences can be used as instructions to imagine unfamiliar situations in which they can be judged to be true ('Imagine a situation in which John is cutting the sun!') shows that the compositionalist's modest concept of understanding can be isolated and can be made consciously available. It does not follow from the fact that the modest concept of understanding thus isolated is not always consciously available that it cannot be made available. It is dispositionally available, and actively available when required. (Recanati's Availability Principle plays an important role in his argument: his claim is that our modest notion of understanding is not one that is available – it does not yield input for conscious, inferential mechanisms. I accept M. Garcia-Carpintero's (2001) objection that availability must be given a dispositional reading and that an encounter with aberrant sentences is one among many ways the dispositional state can be turned into an occurrent state.) And an encounter with an aberrant sentence surely is an occasion for semantic understanding to become an active, conscious state of understanding: we struggle to reason from what we understand to what could possibly be a situation in which a use of that sentence would constitute a true utterance.
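To make vivid how mechanically a compositional theory delivers truth-conditions for an aberrant sentence, here is a schematic derivation in the style of Larson and Segal's Val-notation – my simplified reconstruction, not their actual axioms:

\begin{align*}
&\text{(A1)} \quad \mathrm{Val}(x, \text{`Oscar'}) \iff x = \text{Oscar}\\
&\text{(A2)} \quad \mathrm{Val}(x, \text{`the sun'}) \iff x = \text{the sun}\\
&\text{(A3)} \quad \mathrm{Val}(\langle x, y \rangle, \text{`cuts'}) \iff x \text{ cuts } y\\
&\text{(C)} \quad\ \ \text{`}\alpha\ \beta\ \gamma\text{'} \text{ is true} \iff \exists x\, \exists y\, [\mathrm{Val}(x, \alpha) \wedge \mathrm{Val}(\langle x, y \rangle, \beta) \wedge \mathrm{Val}(y, \gamma)]
\end{align*}

Instantiating (C) with (A1)–(A3) yields: 'Oscar cuts the sun' is true iff Oscar cuts the sun. Nothing in the derivation consults whether suns are the kind of thing that can be cut; that question belongs, as Larson and Segal say, to a theory of human concepts.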
5 Rivalling Conceptions of Truth-condition?
Recanati points out that compositionalists (his taxonomy is far richer than I can describe here) criticize contextualists (or truth-conditional pragmatists) insofar as they confuse knowledge of truth-conditions with the availability of a procedure for the verification of the actual truth-value of the utterance in a given context (I owe this formulation to Dascal, 1981, p. 174; the argument recurs in Borg, 2004):

(t)here is a crucial difference between 'knowledge of truth-conditions' and the knowledge that truth-conditions are satisfied (...) the sentence 'Oscar cuts the sun' does possess truth conditions; such truth conditions are determined by a recursive truth theory of the language, which issues theorems such as ''Oscar cuts the sun' is true iff Oscar cuts the sun'. We know those truth conditions provided we know the language. (Recanati, 2004, p. 92)

What to make of the charge, made by compositionalists, that contextualists confuse understanding with evaluating – the core argument behind the 'verificationist' charge? This, it seems to me, is not the most powerful argument against contextualists, for a contextualist can always reply that he is not a verificationist vis-à-vis the expanded (pragmatically enriched) truth-condition of uttered sentences.
He can maintain that he is a realist with respect to these truth-conditions. This reply shifts the burden of argument back to the compositionalist. Recanati rejects the notion of truth-condition pure compositionalists employ as too weak, for different reasons:

This move strikes me as an unacceptable weakening of the notion of truth-condition. The central idea of truth-conditional semantics (as opposed to mere translational semantics) is the idea that, via truth, we connect words and the world. If we know the truth-conditions of a sentence, we know which state of affairs must hold for the sentence to be true and that means that we are able to specify that state of affairs. (Here Recanati refers back to the test, mentioned earlier, developed by Bart Geurts, FB.) T-sentences display knowledge of truth-conditions in that sense only if the right-hand side of the biconditional is used, that is, only if the necessary and sufficient condition which it states is transparent to the utterer of the T-sentence. If I say ''Oscar cuts the sun' is true iff Oscar cuts the sun' without knowing what it is to cut the sun, then the T-sentence I utter no more counts as displaying knowledge of truth-conditions than if I utter it without knowing who Oscar is (i.e., if I use the name 'Oscar' deferentially, in such a way that the right-hand side is not really used, but involves some kind of mention). (Recanati, 2004, pp. 92–93)

First, it does not seem correct to describe the compositionalist as a disquotationalist or a defender of a translational semantics. Disquotationalists claim that knowledge of the truth-conditions of a sentence involves the exercise of a capacity to apply a disquotation rule – a rule to the effect that if one quotes a sentence on the left-hand side of the predicate 'is true iff' and uses the very same sentence on the right-hand side, one knows one has thereby obtained a true T-sentence. (That is suggested by Recanati's remark that we can in this sense display our knowledge of the truth-condition of 'Oscar cuts the sun'.) However, what makes knowledge of the truth-conditions of actual and potential utterances of sentences in a language L genuine knowledge that suffices for understanding is the requirement, built into the original compositionalist proposal, that the theory of truth – knowledge of which is supposed to be sufficient for understanding actual and potentially uttered sentences – is known to be an empirically adequate theory only if it is verified by its users. The equivalent of that theoretical requirement for actual speakers of L is that understanding sentences must be based on having mastered the language, and not upon having mastered a disquotational device plus the knowledge that applying that principle yields correct truth-conditions. Pure compositionalists should not be committed to disquotationalism.
Secondly, a translational semantics doesn't give us knowledge of what an expression means; it connects expressions in different languages via the 'means the same as' relation (or the ''...' is true in L iff '...' is true in L′' relation) without telling us what the expressions mean (the original argument goes back to Davidson, 1984). As with other forms of knowledge, what one knows if one understands a sentence must be acquired in the right way. (Similarly for proper names: to understand a name, it does not suffice to know that by quoting a name, plugging it into the left-hand side of 'refers to', and using the same name on the right-hand side of the scheme, one thereby understands, or manifests one's understanding of, that name. Recanati's deferential use of the name 'Oscar' (as when a taxi driver enters the pub saying 'Taxi for Oscar!', having been ordered to pick up a guy called 'Oscar' in the nearby pub) is better described as the attributive use of a proper name. The taxi driver is using the name 'Oscar' attributively – to refer to whoever is named 'Oscar'. There is no sense in which the taxi driver does not understand what he says, as the deferential use theory would suggest.)

Understanding as (the cognitive equivalent of) knowledge of an empirically adequate compositional theory of truth (and knowledge that it is empirically adequate) applied to utterances is quite a substantial notion of understanding. Speakers confronted with aberrant sentences like 'John opens the mountain' can give that sentence a place in inferential patterns, i.e. they are able to specify which sentences would logically follow from it ('John opens something', 'John did something', 'something happened to the mountain') and which sentences it is entailed by. And note that an aberrant sentence can entail non-aberrant sentences. We understand 'John opens something' and we understand that this logically follows from 'John opens the mountain'.

Thirdly, contrary to what Recanati suggests, speakers never display knowledge of the meaning of an uttered sentence – their understanding of it – by actually giving or stating its truth-conditions. (If that were the case, the compositionalist could claim an easy victory, for we can state the truth-conditions of aberrant sentences.) Understanding is displayed by successful use, in countless circumstances, of sentences of that language, and that use depends on the speaker's being correctly interpretable as using sentences with such-and-such truth-conditions. A compositionalist could characterize the relation between the truth-conditions assigned by a compositional theory and real speakers and audiences as follows: a truth-condition generated by a compositional theory of truth for L for an actual or potential utterance of a sentence of L is a theoretical description of a semantic property that can be recognized as instantiated by hearers, and be produced by speakers, only if they understand L. (Knowledge of a compositional theory of truth for L is a theoretical model of what speakers know when they understand a language.) But speakers do not intentionally produce that instantiated property under the theoretical description given by theorists in
their account of a theory of truth for L, and ordinary speakers do not recognize it under that theoretical description. This reflects a distinction – a crucial distinction – between an utterance's actual semantic properties, recognized by hearers and produced by speakers, and the theoretical description of that property as the condition under which what instantiates it – the speaker's actual utterance – acquires the property of being true. On this modest account, truth-conditions are systematic descriptions of abstract semantic properties that can be instantiated intentionally by speakers who utter sentences for various purposes. The properties have the following canonical form:

(1) x̂(x is true iff p) (read as: the property of being true iff p)

of which (2) is an instance:

(2) x̂(x is true iff Oscar cuts the sun) (read as: the property of being true iff Oscar cuts the sun)

Speakers can successfully instantiate property (2) by, for instance, uttering the English sentence 'Oscar cuts the sun', or, if Oscar happens to be the speaker, by uttering the sentence 'I cut the sun'. (The speaker must have the intention to produce an utterance with that semantic property. This intention has all the Gricean features of a communicative intention, hence must be 'open to view' to, and be recognized by, the intended audience. Compare Recanati, 2004, p. 14, who accepts this characterization.) Compositionalists claim that the semantic properties of complex expressions have an internal structure, and compositionality is supposed to be the best explanation for the systematic connections between semantic properties and for the fact that new properties can be generated at will (productivity). Aberrant sentences like 'Oscar cuts the sun' show that speakers can instantiate the semantic property of being true iff Oscar cuts the sun and that hearers can recognize such instantiations as instantiations, in English, of that semantic property. Theoretical descriptions of semantic properties like (2) are descriptions of semantic properties speakers recognize when they understand an utterance that instantiates (2). Aberrant sentences and, more generally, sentences 'one does not know how to verify' do show something about the nature of truth-conditions and reveal an important distinction, but not that truth-conditions should be enriched by new material obtained in non-compositional ways.

Note that (2) allows us to isolate semantic properties from the effects of wordings, i.e. the effects of instantiating that property using specific wordings. According to the contextualist view, a sentence's truth-condition specifies what makes that sentence, used in an utterance, true, and what makes it true is surely richer than what the semantic property of the sentence (on the occasion of use) itself displays. Recanati seems to commit himself implicitly to this picture when he writes:
(t)here obviously is another way of eliciting truth-conditional intuitions. One has simply to provide subjects with scenarios describing situations, or even better, with – possibly animated – pictures of situations, and to ask them to evaluate the target utterance as true or false with respect to the situation in question.

and

The ability to pair an utterance with a type of situation is more basic than ... the ability to report what is said by using indirect speech. (Recanati, 2004, p. 14)

But now note that Recanati's procedure doesn't work for sentences like 'There are no living dinosaurs in this room'. (This sentence is true for almost any room.) And the sentence 'All goldfish in the room are surrounded by water' is vacuously true if there is no goldfish in the room. How can we exclude such trivial truth-making situations? (See Moravcsik, 1998, p. 65, for similar examples.) And there is more.
6 Truth-Conditions, Epistemic Substitutes and Presentational Force
What explains the powerful intuition behind Searle's aberrant sentence argument and Recanati's plea for contextually or pragmatically enriched truth-conditions? The argument thus far shows that the semantic properties captured by a compositionalist theory of truth-conditions can be made consciously available to the speaker and, on occasion, are consciously available as occurrent states. (Recall that understanding an aberrant sentence is required to form the intention to actively imagine circumstances or situations in which the sentence's utterance would constitute a true utterance.) Aberrant sentences and unfamiliar situations illustrate when semantic properties are characteristically consciously available. But we must also explain why they need not always be available, and show what knowledge of background conditions enriches.

First, why would a speaker intend to instantiate a semantic property one could describe on the appropriate theoretical level as

(1) x̂(x is true iff p) (to be read as: the property of being true iff p)

and assertively use a sentence which instantiates that property, i.e. when property (1) is true of a particular utterance u:

(3) x̂(x is true iff p)(u)

hence when, via lambda-conversion,
(4) u is true iff p

is true – and, in asserting the sentence he uses, assert that p? What is the point of that complex intentional, communicative action? Here's a proposal: speakers who use sentences in assertive utterances produce intentional actions which, if understood, are intended (and recognized by the intended hearer to be so intended – the Gricean reflexive intentions are at work) to get the intended hearer (hence not necessarily someone who merely overhears an utterance) into an epistemic position – the result of acquiring a belief which, in optimal circumstances, constitutes knowledge – similar to the one the intended hearer would be in were she to perceive the situation for which the utterance – the assertive use of a sentence – can be taken as an epistemic substitute. Given the semantic properties it instantiates, and given that they are correctly recognized by the intended audience, the utterance activates in the speaker's intended audience a representation of the situation it was intended, by the speaker, to be an epistemic substitute for.

For example, 'John cuts the cake' functions, when assertively used by a speaker, as an epistemic substitute for a situation such that, if the intended hearer were in it, he could see for himself that John is cutting the cake. Hearing the speaker assert 'John cuts the cake' and seeing for himself that John is cutting the cake would, in normal (and favourable) circumstances, put the hearer in the same epistemic position – a position in which he learns (hence comes to know) that John cuts the cake. Communication is, characteristically, a way (not the only way, and surely a fallible way) of transmitting knowledge (Evans, 1982; Williamson, 2000). An utterance u that amounts to asserting that p functions, for its intended audience, as an epistemic substitute for a situation such that if that audience could have seen, or observed, on the basis of direct perception, that scene or situation, it would judge (and know) that p.

But nothing in this picture prevents the mental representation the intended audience thus forms from being richer than simply believing that p. The point of the speaker's assertive use of a sentence was to create an epistemic substitute for a situation: the ensuing representation, caused by understanding an epistemic substitute for that situation, should not match, or mimic, the semantic content of the epistemic substitute itself but match the features of the situation it was intended to be a substitute for. (After all, an intended audience interested in acquiring knowledge from a speaker is ultimately interested in knowing what is the case, not in knowing how what is the case is communicated to him by the speaker.) The function of background knowledge is thus not to enrich the truth-conditions of the sentence used by the speaker (the semantic property of the utterance that instantiates it) but to enrich the mental representations (contents of beliefs) acquired upon hearing an epistemic substitute for seeing a situation in which it is true that p. The intended audience's background knowledge is not enriching a semantic property created by the speaker and for which the speaker is ultimately
responsible. The hearer is enriching her acquired representation with whatever is necessary to make it a cognitively interesting representation that matches what she would have acquired were she in a position to see (or in some other way perceive) a situation, or a scene perhaps, in which she would judge that p. The mental representation developed upon understanding an assertive use of a sentence (in favourable circumstances) and the representation picked up upon (caused by) seeing a situation the assertive utterance was an epistemic substitute for are, in a very complex sense, similar representations (the former will be something like a 'faint image' of the latter, to use Hume's famous phrase – a use that allows me to circumvent the immensely complex question of how much fainter the representation can be allowed to be, or to what extent a faint image can be enriched).

We can now redescribe the problem with aberrant sentences. A sentence is aberrant for an audience only if that audience is not capable of automatically merging decoded output with suitable background material such that she finds herself in a state which represents a particular situation. The alternative is that she must now actively imagine, given what the speaker asserted, a situation in which it is true that, say, John opened the mountain. The utterance's semantic properties are a bit like a ladder: you don't enrich a property of the medium; rather, you kick away the medium and enrich your representation of what an instantiation of that property (the speaker's utterance) was an epistemic substitute for.

Recanati seems to miss the point that the cognitive responsibilities of speakers and their intended audience must be carefully distinguished. He writes:

Understanding what is said involves entertaining a mental representation of the subject-matter of the utterance that is both determinate enough (truth-evaluable) and consciously available to the subject. This suggests a criterion, distinct from the minimalist criterion, for demarcating what is said. Instead of looking at things from the linguistic side and equating 'what is said' with the minimal proposition one arrives at through saturation, we can take a more psychological stance and equate what is said with (the semantic content of) the conscious output of the complex train of processing which underlies comprehension. (Recanati, 2004, p. 16)

Recanati's more psychological stance overlooks a crucial point: when a hearer evaluates what the speaker said, she is not evaluating the enriched mental representation she construed upon hearing the speaker (i.e. her acquired belief, for which she is ultimately responsible), but, as we should expect anyway, what the speaker is responsible for. On the other hand, an intended audience will blame itself when it experiences an utterance as aberrant, for it is the intended audience
that is responsible for construing a rich representation of what the utterance is an epistemic substitute for. An extreme example: if John cuts the cake the way one cuts the grass (don't ask me how!) and the speaker uttered the sentence 'John cut the cake', then what he said was true. (Of course, it would be very uncooperative to use that sentence without further comment in this scenario.) The possibility that I construe a representation which turns out not to be an epistemic substitute for the scene or situation the speaker had in mind (perhaps he witnessed the strange events himself) entails that what I come to believe is false, but not because the speaker said something that was false. The speaker can always claim that what he said was true. A less spectacular case: if John cuts the cake into 1,000,000 little pieces (suppose this is what the speaker intended me to form a representation of, and the speaker was confident that his utterance would have that 'presentational force' for me), then when I recognize his utterance as an epistemic substitute for seeing a scene or situation in which John cuts the cake into four equally sized pieces, I am to blame. Again, what the speaker said was true.

The intended audience is not construing enriched mental representations for the purpose of understanding the speaker, but for the purpose of putting itself in an optimal epistemic position, i.e. a position that matches as closely as is cognitively feasible the content of a representational state she would have been in had she been in perceptual contact with the situation of which it is true that p. That seems to be the intuitive purpose of enriching. That process will be steered by what the speaker said, but it will not modify the semantic property he instantiated in uttering his assertion. Conversely, the speaker should not be considered responsible for how the representation he caused is enriched by the hearer so as to match, from her perspective, the situation the assertive utterance of a sentence was intended to be an epistemic substitute for. That is something the intended audience is responsible for.

The assertive use of sentences has a distinctive feature it shares with other forms of mediated perception. Ordinary sentences, for an intended audience that is suitably cognitively equipped, have a presentational force. The semantic properties of the sentence usually go unnoticed, because they are intended to engender a representation of what the utterance is an epistemic substitute for. Such representations are always richer than what the semantic properties of the sentence itself contain or reveal, since we use background information to arrive at a relevant, perhaps more vivid, representation of what the assertive use of a sentence is an epistemic substitute for. It does not follow that the semantic properties of the sentence as used by the speaker on the occasion are thereby in some sense enriched. Aberrant sentences are perfectly well understood sentences which lack, for an intended audience, on the occasion of understanding them, presentational force –
for the contingent reason that the audience lacks the appropriate representational resources or background knowledge to see what type of situation the aberrant sentence is an epistemic substitute for. Strange sentences turn out not to trigger the capacity to form a representation of what the sentence used in an assertion is an epistemic substitute for, and the intended audience is left with mere semantic understanding of the sentence used. Their existence shows that semantic understanding can be made consciously available. It does not follow, from the fact that we usually 'see through' the semantic property and the assertive use of a sentence to the situation its use was an epistemic substitute for, that that semantic property was not an object of genuine semantic understanding.

The condition under which a sentence uttered in a context is true – a complex semantic property – should not be confused with what the assertive use of sentences is an epistemic substitute for. What a true utterance of a sentence is an epistemic substitute for in a communicative situation is, in a metaphysical sense, fully fixed by the world, and can be endlessly further described and redescribed. Someone who understands the assertive utterance of a sentence and captures its presentational force (i.e. has suitable background knowledge to get access to a representation of the situation it was an epistemic substitute for) will form a representation enriched with whatever further cognitive material is relevant for him. The truth-condition itself is fixed by the speaker's semantic intentions and recognized by the hearer. Semantics connects language and objects in an intended model, but it is the speaker who uses sentences to create epistemic substitutes for situations or scenes.

This confusion explains an ambiguity. Recanati claims that

If we know the truth-conditions of a sentence, we know which state of affairs must hold for the sentence to be true (Recanati, 2004, p. 93)

which seems right, but he continues in a footnote to refer to a test mentioned earlier in his book:

(t)here obviously is another way of eliciting truth-conditional intuitions. One has simply to provide subjects with scenarios describing situations, or even better, with – possibly animated – pictures of situations, and to ask them to evaluate the target utterance as true or false with respect to the situation in question. (Recanati, 2004, p. 15)

And this gets the picture wrong. The test, as Recanati describes it, should be read as comparing the mental representation caused by understanding an assertive use of a sentence with what the assertive utterance of the sentence is supposed to be an epistemic substitute for. What is tested, however, is the preferential reading of mildly ambiguous sentences, by showing a non-ambiguous situation and asking which reading is preferred in that situation. There is a difference
between evaluating the utterance of a sentence given a situation, and evaluating a representational content engendered in an intended audience by the assertive use of a sentence. The former provides an answer to 'Was what the speaker said true?', while the latter answers the question 'Is what the audience came to represent, having heard what the speaker said, an adequate (if faint) picture of what the original utterance was an epistemic substitute for?' The latter question is a good question about one's capacity to enrich one's representations with background information, but not a question about what the speaker has said. The responsibilities are different. I conclude that compositionalism is not threatened by Searle's and Recanati's arguments.

References

Aizawa, K. (1997). The role of the systematicity argument in classicism and connectionism. In S. O'Nuallain (Ed.), Two sciences of mind: Readings in cognitive science and consciousness (pp. 197–218). Amsterdam: John Benjamins.
Bezuidenhout, A. (2002). Truth-conditional pragmatics. Philosophical Perspectives, 16, 105–134.
Borg, E. (2004). Minimal semantics. Oxford: OUP.
Cappelen, H., & Lepore, E. (1997). On an alleged connection between indirect quotation and semantic theory. Mind and Language, 12, 278–296.
Davidson, D. (1984). Inquiries into truth and interpretation. Oxford: OUP.
Evans, G. (1985). The varieties of reference. Oxford: OUP.
Fodor, J., & Lepore, E. (2002). The compositionality papers. Oxford: OUP.
Garcia-Carpintero, M. (2001). Gricean reconstructions and the semantics-pragmatics distinction. Synthese, 128, 93–156.
Geurts, B. (2002). Donkey business. Linguistics and Philosophy, 25, 93–131.
Lahav, R. (1989). Against compositionality: The case of adjectives. Philosophical Studies, 57, 261–279.
Larson, R., & Segal, G. (1995). Knowledge of meaning. Cambridge (Mass.): MIT Press.
Moravcsik, J. (1998). Meaning, creativity and the partial inscrutability of the human mind. Stanford: CSLI Publications.
Recanati, F. (1993). Direct reference: From language to thought. Oxford: Basil Blackwell.
Recanati, F. (2002). Does linguistic communication rest on inference? Mind and Language, 17, 105–126.
Recanati, F. (2004). Literal meaning. Cambridge: CUP.
Reimer, M. (1998). What is meant by 'what is said'? A reply to Cappelen and Lepore. Mind and Language, 13, 598–604.
Searle, J. (1980). The background of meaning. In J. Searle, F. Kiefer, & M. Bierwisch (Eds.), Speech act theory and pragmatics (pp. 221–231). Dordrecht: Reidel.
Searle, J. (1983). Intentionality. Cambridge: CUP.
Travis, C. (1981). The true and the false: The domain of pragmatics. Amsterdam: John Benjamins.
Travis, C. (2000). Unshadowed thought. Cambridge (Mass.): Harvard University Press.
Williamson, T. (2000). Knowledge and its limits. Oxford: OUP.
Yablo, S. (2002a). In Conceivability and possibility (pp. 237–276). Oxford: OUP.
Yablo, S. (2002b). Coulda, woulda, shoulda. In T. Gendler & J. Hawthorne (Eds.), Conceivability and possibility (pp. 441–495). Oxford: OUP.
Challenging the Principle of Compositionality in Interpreting Natural Language Texts

Françoise Gayral, Daniel Kayser and François Lévy
1 Introduction
The main assumption of many contemporary semantic theories, from Montague grammars to the most recent papers in journals of linguistic semantics, is and remains the principle of compositionality. This principle is most commonly stated as: the meaning of a complex expression is determined by its structure and the meanings of its constituents. It is also adopted by the more computation-oriented traditions (Artificial Intelligence and Natural Language Processing – henceforth, NLP). Although its adequacy to "real" language clearly raises several difficulties, it is almost never challenged, and the presuppositions on which it relies are seldom questioned. Rather than proposing yet more arrangements to overcome some of its drawbacks, the point of this paper is to make a more radical critique, one which addresses first the very notion of meaning. For us, a semantic theory should focus not on finding "the meaning of a complex expression" but on drawing appropriate inferences. This shift of focus makes it possible to adopt an approach which fulfills the tasks that the notion of meaning is supposed to fulfill, without an explicit (and compositional) construction of the meaning of parts and wholes.

The paper is organized as follows. The first part explains the reasons why compositionality has been so widely adopted, and compares and contrasts what compositionality presupposes about natural language semantics with our own conception of language and semantics. The second part describes in more detail, through many examples, the problems raised by the compositional hypothesis. The last part suggests a new framework which could open the way for an alternative approach.

Address for correspondence: François Lévy, University Paris Nord, Av. J. B. Clément, 93430 Villetaneuse, France. E-mail: [email protected].
Why has the compositional hypothesis been so widely adopted?

The principle of compositionality offers many advantages.

– First, it can explain our ability to understand sentences we have never heard before. In the literature (Fodor & Pylyshyn, 1988), this ability is considered from two angles:

◦ productivity: human beings can produce/understand infinitely many new complex sentences from finite syntactic and semantic resources;

◦ systematicity: each word makes a uniform contribution to the meaning of the infinite number of sentences in which it may occur. In other words, once we have understood some complex expressions, we are able to understand others that are obtained by combining the same constituents in a different way: although the sentence is new, we are able to work out its meaning because we already know the meaning of its parts (preserved through recombination), and we also know the rule for putting all those meanings together in order to get the meaning of the whole sentence.

– Second, from a practical point of view, the compositionality principle offers a simple way to compute the meaning of complex linguistic expressions, one of the easiest one could imagine: starting from the meanings of its constituents and following its syntactic structure, the building of the meaning of the complex expression proceeds step by step, incrementally, from the most elementary constituents to the most complex ones (see the sketch after this list).

– Third, the assumption that natural language semantics is entirely determined by syntax and lexical semantics, and that the construction of meaning is an incremental process, entails a modular vision of the interpretation process. Modularity therefore becomes the basis of the architecture of most NLP systems, so that it is possible to clearly separate a (syntactico-)semantic module from a (pragmatic or inferential) module which integrates world knowledge and the elements of the situation of utterance. The first module, which has to translate natural language expressions into some suitable language (a formula of some logical calculus, or feature structures, or whatever, depending on the underlying theory), is directly concerned with compositionality and concentrates all the effort. The second module, often considered less linguistic in nature, is also less connected with the issue of compositionality. As it is not described as formally as the first, it is often considered of less scientific value and is thus postponed to a later phase.
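To make the step-by-step picture concrete, here is a minimal sketch of a compositional interpreter. It is our illustration, not drawn from any of the theories discussed; the two-word lexicon, grammar and semantic rule are invented for the purpose.

```python
# A minimal compositional interpreter: the meaning of a node is computed
# only from the meanings of its children and the rule labelling the node.
# Lexicon, grammar and example are illustrative assumptions.

LEXICON = {
    "John": "john",                                # fixed, self-contained meanings
    "sleeps": lambda subject: f"sleep({subject})",
}

def interpret(tree):
    """Bottom-up, local and incremental, as the three presuppositions demand."""
    if isinstance(tree, str):                      # leaf: look the meaning up
        return LEXICON[tree]
    rule, *children = tree
    meanings = [interpret(child) for child in children]
    if rule == "S":                                # S -> NP VP: apply VP to NP
        np, vp = meanings
        return vp(np)
    raise ValueError(f"no semantic rule paired with {rule}")

print(interpret(("S", "John", "sleeps")))          # -> sleep(john)
```

Every design choice the paper goes on to question is visible here: word meanings are fixed lexical entries, each syntactic rule is paired with exactly one semantic operation, and a value, once computed, is never revised.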
Some presuppositions of compositionality

The basic terms of the compositional hypothesis (meaning, structure), as we expressed it at the beginning of this paper, are supposed to refer to clear-cut and unproblematic categories. They in fact hide at least three presuppositions.

– The first presupposition concerns the issue of meaning. The words which compose an expression, since they are the building blocks of the construction, are supposed to come with self-contained, predetermined meanings; if compositionality is assumed in its strongest sense, this meaning must be independent both of the other words that compose the expression and of the situation in which the sentence is uttered. A weaker form would assert that the meaning of the whole is determined by the meanings of the parts once they are fixed, but such a statement is circular, since it allows the final meaning of the parts also to depend on the meaning of the whole.

– The second presupposition concerns the notions of constituent and structure, and is clearly related to syntax. Syntax embodies both the decomposition into constituents and the structure of complex expressions. These are supposed to be the ingredients of the compositional process. Hence two hypotheses are needed. First, decomposition and structure are independent of the resulting meaning (otherwise we would be trapped in a vicious circle). Second, a given syntactic structure is always involved in the same way in the elaboration of the meaning of the corresponding complex expressions. Thus, a compositional analysis generally assumes the coupling of every single syntactic rule with a semantic rule. The latter assigns a meaning to the structure built by the former, so long as it is provided with the meaning of its input, i.e. of the constituents on which it operates.

– The third presupposition is about the process which constructs the meaning of an expression. Note that at the formal (e.g. mathematical) level, a determined relation need not be algorithmically constructible. Nevertheless, compositional theories consider that determination is constructive, i.e. that there exists an incremental determination process, running stepwise from the most elementary constituents to the most complex ones, composing at each step the meanings of the sub-expressions (obtained at the previous step) according to the syntactic rule underlying the constitution of the current node.

It should be observed that this common ground commits neither to a specific conception of meaning, nor to a specific approach to syntax, as long as the necessity of the notion of meaning is not challenged. It consists just of principles that are applicable to whatever semantic theory one chooses in order to assign meanings to expressions of a language: Montague grammar (Montague, 1974),
DRT (Kamp & Reyle, 1993), etc. So, theories adopting the compositional hypothesis can still diverge on several points:

– the way in which syntactic and semantic analysis is done: either the syntactic analysis is considered as operating first, and as providing its output prior to any semantic analysis; or the syntactic and semantic analyses are combined, as in HPSG (Pollard & Sag, 1994), where the syntactic rules integrate the semantic information obtained from the elements which are composed using the rule;

– the form of the semantic output: a logical formula in first-order logic, a lambda expression, a typed-feature representation, etc.

Beyond these differences, they all agree that the process of interpretation results in attaching to each expression of a text its 'correct' meaning, and builds a (possibly under-determined) single end product. The result of the semantic step would provide an adequate representation of 'the' meaning of the text which, in turn, can either be interpreted in the appropriate models, within theories where the notion of truth plays a central role, and/or can serve as input to a pragmatic module that may inflect the interpretation, depending on extra-linguistic parameters.

Our conception of semantics

For us, the compositional hypothesis and the presuppositions noted above are strongly related to the generative point of view on language, where emphasis is given to the innate human ability to speak and understand, and where this ability is viewed as completely disconnected from the surrounding world. In our opinion, language should primarily be seen as a flexible means of communication. Therefore, we see no point in maintaining a tight separation between linguistic factors (morpho-syntax and semantics) and the environment in which linguistic productions are uttered (pragmatics). On the contrary, we take all dimensions of language to be equal contributors. In more concrete terms, we are trying to emancipate ourselves from the influence of the currently most popular semantic theories, and so we think it necessary to ask, in naive terms, the prior question: what exactly does interpreting a text amount to?

The simplest idea that comes to mind is the observation that when someone has understood a text s/he is able to answer questions about it.1 Drawing the
1 The exercises proposed at school in order to assess whether a student has understood a text, or to check whether someone has fully understood a text written in a foreign language, confirm this opinion: the judgment will be positive if the person correctly answers a list of questions about it.
inferences that are intended from a given text is clearly an ability shared by humans. Many psychological experiments confirm this ability when they show that inference (McKoon & Ratcliff, 1992) is everywhere, even in the interpretation of a simple noun phrase or in the basic comprehension of a text. By contrast, the ability to spell out 'the' meaning of each constituent of a text is rather more difficult to obtain than a judgment of whether an assertion is trivially entailed by a text. Many experiments on corpus sense tagging2 (e.g. Véronis, 2001) show that when humans have to tag a corpus according to a list of senses provided by a common dictionary, they disagree widely in their judgments. The rate of disagreement is so high for some words that it sometimes reaches the level of chance. On the other hand, validating some (theoretical) object as being the proper meaning of a term or of a structure requires proving that this meaning can support interpretation in every situation where this structure is uttered.

It may appear unfair to some readers to compare the task of inference, which takes place at the token level, with the task of attributing meaning, which it would be better to put at the type level. But as the only empirically available objects are utterances, i.e. tokens, it should be no surprise that asking people questions about a text gives much more coherent results than asking them about meanings, i.e. entities which are not directly accessible. So, it is easier to test (and possibly refute) theories that concern the interpretation process than theories of meaning.

This is the reason why we put inferences at the core of semantics; moreover, this allows us to consider language in all of its interactions with the world, and to see the interpretive process as a huge play of interactions between diverse pieces of information coming from various sources, linguistic or related to the world and to the situation of utterance. The interpretive process cannot be seen as decomposable into clear-cut and sequential operations, but as a dynamic process which reaches a sort of equilibrium after all the interactions have been accounted for. The notion of equilibrium comes from an analogy with mechanics: a body submitted to various forces (attraction, repulsion) generally stabilizes at a point where their resultant cancels out. No force can be singled out and considered in isolation in order to determine this point. Similarly, a text is submitted to a number of sometimes contradictory constraints, and its preferred interpretation(s) correspond(s) to satisfying as many of them as possible.

This analogy can be taken in a strong sense, i.e. the various constraints can be given a numerical expression, and the interpretation can be found as a minimum of some potential (Ploux & Victorri, 1998). We believe that the symbolic nature
2 In corpus analysis, associating meanings to words is called word sense disambiguation or sense tagging.
of language makes the recourse to numbers a bit artificial (Kayser, 1994), and that the idea of an equilibrium can be taken more metaphorically: as said above, the constraints that weigh on the interpretation originate from various sources (lexical, syntactic, semantico-pragmatic, extra-linguistic) and can be prioritized: some of them must be given greater importance, without resorting to a total order as would be the case with numbers. Using non-monotonic formalisms, it is possible to find solutions that satisfy as many constraints as possible, their relative importance being taken into account, and these solutions play a role in logic similar to that of a state of equilibrium in mechanics. This provides us with a framework in which these interactions are modeled in a uniform fashion and which puts inference at the centre. This framework will be presented in the last section.
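The numerical variant of this idea can be shown in a few lines. The sketch below is our own toy illustration, not the authors' system: two candidate readings of couper la route (cf. examples (5) and (6) below) compete under weighted, possibly conflicting constraints, and the 'equilibrium' is the reading with the least total violation.

```python
# Interpretation as equilibrium under soft, prioritized constraints.
# Readings, constraints and weights are invented for illustration.

READINGS = ["couper = obstruct", "couper = drive across"]

# (gloss, weight, readings that satisfy the constraint)
CONSTRAINTS = [
    ("the subject is a storm, and storms cannot drive", 3, {"couper = obstruct"}),
    ("'couper la route' typically describes crossing",  1, {"couper = drive across"}),
]

def equilibrium(readings, constraints):
    def violation(reading):
        return sum(w for _, w, ok in constraints if reading not in ok)
    best = min(map(violation, readings))
    return [r for r in readings if violation(r) == best]

print(equilibrium(READINGS, CONSTRAINTS))   # -> ['couper = obstruct']
```

The authors' own proposal keeps the idea of mutually constrained choices but replaces the numbers with prioritized non-monotonic rules, as developed in the last section.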
2 The Problems
This section goes back over some of the difficulties raised when real sentences are confronted with the hypotheses underlying the compositionality principle. We will examine three of them, corresponding to the presuppositions outlined in the previous section:

– the question of meaning, in particular the handling of polysemy as a list of meanings, and the correlative view of interpretation as a disambiguation process;

– the requirement that an interpretive process assign a unique meaning to each word of an expression, and to each expression in a sentence;

– "localism", which forces the meaning of complex expressions to depend only on the meanings of their constituents and on their structure, and thus excludes consideration of other factors.

Although these questions are not really distinct, they are separated here for the sake of presentation.

The question of polysemy

Since its starting point is knowledge of the meanings of words, the compositional hypothesis immediately faces the question of polysemy. Formal semantics (e.g. Montague, 1974) generally ignores the question and seems to consider lexical semantics (the process by which word meaning is obtained) and grammatical semantics (the semantic construction itself) as two separate problems. Lexical semantics assigns a meaning to a word directly; in the case of polysemy, a disambiguation process would be necessary, but it is not
made explicit. Be that as it may, emphasis is put only on grammatical semantics, i.e. on the parallel construction of syntactic and semantic structures. This method is only legitimate if the two problems are actually separable from each other! More precisely, the meaning of a word is taken as a formal argument, merely expressed by means of a symbol, often a typographic variant of the word itself (e.g. adding a prime: man' stands for the semantic value of the word man). Consequently, the lexicon is almost void of content (just a list of empty symbols).3 Polysemy is therefore seen as a kind of exception that must be got rid of, one way or another. The most popular expedient for circumventing the obstacle consists in assigning a priori to a polysemic word a fixed set of symbols, often distinguished by a number (e.g. play-1, play-2, etc.). Although frequently adopted, this solution does not make any distinction between homonymy and polysemy, and it leaves aside two questions on which its validity crucially depends:

– how many symbols should correspond to a given word;

– the set of symbols being given, how to make explicit the procedure enabling the correct symbol to be chosen for a given occurrence of the word.

The compositional process is supposed to operate on the "right" symbol, as if the two questions had satisfactory answers, i.e. as if the disambiguating procedure could be completely independent of the problem that composition is said to resolve. Setting aside this disambiguation phase, and leaving implicit the criteria which must guide it, weakens the grounds on which the theory stands.

Another popular way to escape the problem is to consider, as often as possible, that words (except in the case of homonyms) have a single (literal) meaning and that every shift of meaning is a metonymic use, or some other trope, where tropes are to be handled separately. Uniformly labeling every non-literal meaning as, say, a metonymy does not help at all in finding the actual contribution of a word to the meaning of an expression in which it occurs.

Some recent approaches tackle this question instead of evading it, and try to work with a restricted form of the listing of senses in the lexicon, eliminating from it the meanings which can be generated through composition
3 In some extensions (Partee, 1995), lexical meaning postulates are added in order to describe some semantic properties associated with lexical items. They aim at connecting formal semantics with lexical semantics and at constraining the possible models which satisfy the formula corresponding to the sentence. But the lexical information expressed in these postulates remains too rudimentary to be considered as really solving the problem.
mechanisms (Pustejovsky, 1995; Briscoe & Copestake, 1995; Ostler & Atkins, 1991; Nunberg, 1995). These mechanisms, including coercion and selective binding, account for some cases of systematic polysemy by exploiting richer lexical entries, but not for all, as we will now see. The entries associated with nouns assign, in their weakest form, a type to each meaning. Lexical entries for verbs express their selection restrictions in terms of types imposed on their arguments. A verb is thus considered to act as a filter on the meanings associated with its arguments. As numerous examples have proven these filters to be too rigid, type coercion softens the constraint. But the rules governing coercion are far from obvious. According to the most popular approaches, the meaning of a constituent includes the specification of a qualia structure, which encodes formal, constitutive, agentive and telic roles. When combining lexical items, the composition mechanisms can exploit this information and select, from these complex structured and typed lexical entries, the type which is appropriate to the current composition rule, even in the case of an apparent type mismatch. So, these approaches can handle homogeneously examples such as to begin a book/a pizza, or a fast typist/car, etc.

Now, in order to remain at the linguistic level, these techniques force the composition process to rely on a restricted operation on meaning: type matching. This is not always possible. Let us take the example of the French verb surveiller (to watch, to look after, to invigilate) and some of the very common verb phrases in which it occurs: surveiller la piscine, les enfants, un examen, un gâteau, sa valise (to 'watch' the swimming pool, the children, an exam, a cake, your luggage) (Gayral, Pernelle & Saint-Dizier, 2000). The objects of these verb phrases are very different from each other, and they could hardly be seen as instances of the same type. If we try to characterize their common properties when put in relation with surveiller, we can observe that what is actually watched are processes related to the object rather than the object itself: actions performed in or around a swimming pool, actions done by a child, the cooking of a cake, the fact that your suitcase is still there, etc. The common property of the object arguments is thus the existence of a process involving them, which may lead to a negative state: a child may have an accident, your cake may burn, your luggage may be stolen. The constraints at work are very difficult to characterize in a systematic way, and cannot be summed up just by saying that finding an activity related to the noun would solve the problem. Quite a lot of world knowledge is involved, which can clearly not be found either in the definition of adequate types or in qualia. For example, the telic role of the object swimming pool would contain the predicate 'swim', but no predicate describing accidents. Even if the notion of accident is somehow associated in our minds with a swimming pool, this cannot be considered as purely lexical information. The situation is
even worse in the case of children or luggage. The only solution is the reference to world experience, to domain knowledge, stored, for example, in scripts (Schank & Abelson, 1977).

The problem of co-presence

Under the compositional hypothesis, the interpretation of a portion of text has to be unique, and this unique interpretation must be available to every other part of the sentence. But this postulate can be challenged too, particularly when faced with a phenomenon that we call co-presence (Gayral, Kayser & Pernelle, 2001), which is illustrated in the following examples.

(1) J'ai déposé l'examen de mercredi prochain sur ton bureau (I put next Wednesday's exam on your desk)

(2) Les examens de la semaine dernière ont déjà été corrigés (last week's exams have already been corrected)

(3) La voiture passe au feu rouge (the car is going through the red light)

(4) Étant arrêté au feu tricolore4 (rouge) (While waiting at the three-coloured light (red))

In (1) and (2), examen refers both to an event (because of its association with a date) and to a physical object (sheet(s) of paper), according to the verb: you can neither put nor correct events! In (3), the expression feu rouge refers both to a location (the car moves through a location) and to a time (the event happens when the light is red); so the preposition au introduces both a place and a time complement. In (4), the two adjectives tricolore (three-coloured) and rouge (red) do not qualify the same facet of the meaning of feu: one is static, the other dynamic. In systems relying on the compositional hypothesis, co-presence, although fairly frequent in natural language, either yields unsolvable contradictions or remains unseen.

From the various observations of this section, we conclude that the solutions discussed here (e.g. Pustejovsky's generative lexicon) are half measures. They recognize the need for more knowledge and for more complex composition procedures, and their proponents believe that in this way the compositional hypothesis can be saved. But they want qualia to remain within "linguistic" boundaries, whereas the required knowledge cannot in general be reduced to a small number of role fillers, nor can it be stored in lexical entries, as the sketch below illustrates.
4 In French, traffic lights are officially known as "three-coloured lights".
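The contrast can be made concrete. The following sketch is our illustration, not Pustejovsky's actual formalism; the mini-entries and role fillers are invented.

```python
# A schematic rendering of qualia-based type coercion. The entries are
# invented mini-structures, far poorer than real generative-lexicon entries.

QUALIA = {
    "book":  {"type": "phys_obj", "telic": "read", "agentive": "write"},
    "pizza": {"type": "phys_obj", "telic": "eat",  "agentive": "bake"},
    "pool":  {"type": "location", "telic": "swim"},
}

def coerce(expected_type, noun):
    """Repair an apparent type mismatch by exploiting the qualia structure."""
    entry = QUALIA[noun]
    if entry["type"] == expected_type:
        return noun                          # types already match
    if expected_type == "event":             # e.g. 'begin' selects an event:
        return f"{entry['telic']}({noun})"   # recover one from the telic role
    raise TypeError("no qualia-based repair available")

print(coerce("event", "book"))    # begin a book  -> read(book)
print(coerce("event", "pizza"))   # begin a pizza -> eat(pizza)
# 'surveiller la piscine': the watched process (a possible accident) is
# nowhere in the entry, so the lexicon-internal repair can only return
# something plausible-looking but wrong:
print(coerce("event", "pool"))    # -> swim(pool)
```

The first two calls show what coercion buys; the third shows the paper's point: no role filler mentions accidents, so whatever the lexicon-internal mechanism returns is drawn from the wrong knowledge source.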
Even if lexical knowledge were available that was rich enough for this objection to be overcome, another objection would still remain, and we consider it now: meaning cannot be determined on the basis of the portion of text within the scope of the syntactic construction alone. This corresponds to our next criticism.

Localism and incrementality of the interpretation process in question

Let us briefly explain what is meant exactly by localism and incrementality. According to compositionality, the only information available at a given node comes from the semantic information assigned to the immediately dominated nodes, together with the syntactic information determined by the combination method. That is what we call localism. By "incremental", we mean that the interpretation process follows a strict order, the same order in which the constituents are combined to form complex expressions, step by step, in a deterministic manner. At each step, the process builds the semantic value of the current node and makes it available for further steps. A semantic value for a given expression, once acquired, cannot be changed. The consequences are twofold. First, assigning a semantic value to a node cannot depend on factors which are exterior to the portion of sentence currently being analyzed. Second, once a meaning has been assigned, it will never vary and will remain the same regardless of what the subsequent constituents (phrases, sentences, discourse) may reveal. The following sections show that numerous counter-examples can be produced.

Cases where other parts of the sentence play a role

Consider for example the case of a simple sentence of the form NP1 V NP2 (a transitive verb V with subject NP1 and object NP2). The incremental process is generally supposed to start by combining the verb with its direct object, thus yielding a meaning for the verb phrase, and then to combine the subject with the verb phrase to find the meaning of the whole sentence (our argument would also hold for different orders of composition). This prevents the subject from influencing the meaning of the verb phrase. However, the contrast between (5) and (6) is striking.

(5) L'orage a coupé la route (the storm cut off the road)

(6) Ma voiture a coupé la route (my car cut across the road)

The meaning of the verb couper (to cut), and thus of the verb phrase, depends on the subject. In the first case, it means 'to obstruct' whereas in the second it
means 'to drive from one side of the road to the other'. In English, the problem would not arise, because of the phrasal verbs.

Another case is the influence, crossing the syntactic tree, of the subject over the meaning of the object, the meaning of the verb remaining more or less invariant. Let us consider:

(7) Le docteur impose un régime terrible (The doctor imposes a drastic diet)

(8) Le dictateur impose un régime terrible (The dictator imposes a drastic regime)

The verb phrase (V NP2) is identical in both cases, but NP2 has different meanings depending on the subject.

In other cases, even with an identical subject NP1 and an identical verb phrase, the interpretation of the whole sentence depends on a prepositional phrase, as in (9) and (10):

(9) Suite à des inondations, les gendarmes ont coupé la route (After serious flooding, the police cut the road off) (the police cut = the police stopped the traffic)

(10) Entraînés dans une folle poursuite, les gendarmes ont coupé la route (caught up in a wild car chase, the police cut across the road) (the police cut = the police drove across)

The contrast between (9) and (10) is a good illustration of the fact that sentences sharing the same form should not be interpreted in the same way. The difference cannot be postponed to a later pragmatic step: (9) has a durative aspect and takes a time adverbial with pendant (during); by contrast, (10) is telic and cannot take this adverbial. This is so because in (9), our knowledge about floods enables us to infer that the road is obstructed with water, branches or mud, and that these obstacles are not going to disappear instantly.

In (11) and (12) below, the different complements influence both the interpretation of the verb quitter and that of the object phrase l'école. In (11), quitter has the interpretation of a physical event, and l'école means the building. From (11), a norm-based inference would conclude that the next day, Paul will return to school. This inference seems implausible in (12), where quitter has a more abstract meaning, with a permanent consequence, and l'école corresponds to the institution rather than to a specific determined school: the main inference drawn from (12) concerns a choice in Paul's life.

(11) Paul quitta l'école par la fenêtre de derrière (Paul left the school through the back window)

(12) Paul quitta l'école à 16 ans pour devenir apprenti (Paul left school5 at
5 French makes no grammatical difference between the institution and the place.
16, in order to become an apprentice)

Cases where several polysemic words constrain each other and ultimately yield a non-ambiguous interpretation

The constraints at play are not one-way, and no ordering in the composition of meanings can account for the equilibrium finally reached. In the following examples, which combine the words examen (exam) and laisser (to leave), no single part of the sentence is sufficient to determine the semantic value either of the verb or of its object:

(13) J'ai laissé l'examen de mathématiques sur ton bureau (I left the maths exam on your desk) (to leave = to put, exam = paper on which the wording of the exam is written)

(14) J'ai laissé l'examen de mathématiques à Paul qui est plus compétent (I left the maths exam to Paul, who is more competent) (to leave = to entrust, exam = task of writing the subject)

(15) J'ai laissé l'examen de mathématiques pour l'an prochain (I left the maths exam for next year) (to leave = to postpone, exam = whole process of evaluating students)

This co-influence leads to a sort of circularity that conflicts with the incrementality required by the compositional hypothesis.

Cases where information external to the sentence itself is needed

In the last three examples, as in almost all of the previous ones, the necessary clues come both from factors present in the sentence and from a vast amount of world knowledge. The problem at hand is not related to the complexity of the sentences, since even the interpretation of simple expressions encounters the same difficulty (Fabre, 1996). Let us take noun phrases, as in (16)–(18) below, with just a variation in the noun of the prepositional phrase. The variation concerns words (virage, garage, ravin) which are a priori members of the same semantic category (spatial location); the preposition dans, which can have many values, takes in each of these cases its spatial meaning. Even so, the inferred position of the car relative to the location indicated is very different in each case:

(16) la voiture dans le virage (the car in6 the bend)

(17) la voiture dans le garage (the car in the garage)

(18) la voiture dans le ravin (the car in the gully)
6 Literal translation from French.
In (16), the car is on a road (a 2-D surface); in (17), it is enclosed in a closed volume; in (18), it is located 'inside' an open volume. These varying interpretations, and the different inferences they trigger, come neither from strictly lexical knowledge nor from syntactic knowledge, but rather from our experience of cars.

In the same vein, the adjectives red, green, yellow, three-coloured, etc. qualify the color of the object corresponding to the noun they are associated with (flag, cloth, car, etc.). Nevertheless, as we saw in examples (3) and (4) above, when three-coloured is applied to traffic lights, it refers to the permanent potentiality of the equipment to show three lights, each of a different colour, consecutively. On the contrary, when green is applied to the same traffic light, it takes a temporal interpretation (the time when the light is green). This shows that the way the meaning of a qualifier affects the meaning of the noun it qualifies depends on elements which are external to the noun phrase.

An argument that is not going to be developed here, because it is too remote from the main theme of the paper, although it is relevant to the compositional hypothesis, concerns syntax itself. The hypothesis requires knowledge of the syntactic structure of the sentence, i.e. the syntactic nature of the constituents and their respective roles (subject, object, etc.). In all the examples examined here, this requirement is met, because the examples were deliberately kept simple. In sentences taken from real corpora, the exact determination of the syntactic nature and role of each constituent may turn out to be a matter of much debate, whereas the semantic interpretation of the whole sentence is indisputable.

Cases where a plurality of worlds is needed

Other examples are related neither to the polysemy of words, nor to a variable interpretation of a given syntactic pattern. An implicit hypothesis of compositionality is that reference is obtained by applying the (referential part of) meaning to the world at hand. But sentences often allude, explicitly or not, to a plurality of worlds, particularly when time is in question and reference has to be considered at different moments. This phenomenon appears with the intensional interpretation of an expression denoting a function (e.g. 'president'), which can refer to a specific person (the 'token' reading) or to the person who carries out the function, whoever s/he may be (the 'type' reading). It appears too with the interpretation of some plural nominal expressions, which require several worlds as well as relations between them. Whereas the first phenomenon has been widely studied, less attention has been paid to the case of the plural. Let us examine the question through the following sentences which, though they present an identical plural expression in exactly the same syntactic pattern, lead to different interpretations:
(19) Depuis 3 mois, mes clients viennent de plus en plus souvent (during the last three months, my customers have been coming more and more frequently)

(20) Depuis 3 mois, mes clients viennent de plus en plus loin (during the last three months, my customers have been coming from more and more distant places)

In (19), the expression mes clients (my customers) refers to a fixed set of elements. From (19), one can infer something like: the typical element of this set comes more frequently than he did three months ago. In (20), the same expression refers to a "varying" set. Understanding it requires creating several sets that are not necessarily made up of the same people. What (20) induces is that the typical element of the current set lives in a more distant place than the typical element of an earlier set. These two possible interpretations of plural noun phrases are frequent. So, the classical interpretation of plurals, namely that a plural noun phrase refers to a collection of individuals, is not sufficient to account for collections persisting over time while their members change, as in (20). In (Gayral, Kayser & Lévy, 2001), we draw a distinction between two interpretations:

– a de dicto interpretation allows the collection to vary, but the way the collection is named holds at any time, as in (20);

– a de re interpretation refers to a fixed set of elements, as in (19). The way the collection is named helps define the members of the set without necessarily remaining relevant for the whole sentence, as in the striking example (21):

(21) The fugitives are now in jail. (Enç, 1986)

Contrary to some linguists, we cannot consider that this double interpretation can be resolved by invoking an ambiguity, either a lexical one – the words customer, fugitive, etc.,7 in themselves, would be ambiguous and could give rise to a de re (token) interpretation or a de dicto (type) one – or a grammatical one: the plural definite determiner would be ambiguous too. Indeed, the mechanisms which trigger the de dicto interpretation are far from straightforward and rely on different kinds of knowledge which can be neither strictly enclosed within the lexicon, even if enriched with qualia, nor associated with some syntactic clue. External factors, such as the ability of the given individuals to satisfy a given property or to be involved in a given act, play a prominent role in the choice of a de dicto or a de re interpretation. Even if some markers can be exploited, we are again faced with a case where the 'correct' interpretation of a constituent cannot come
7 And all nouns that can give rise to similar de re/de dicto interpretations.
'from inside', and the de dicto or de re interpretation of a plural is thus clearly not compositional.
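The two readings can be pictured with a toy multi-world model; the data and the averaging below are our invention, meant only to display the contrast in (19)–(20).

```python
# Worlds are months; each carries the then-current customers and, for (20),
# the distance each one comes from. All data are invented for illustration.
worlds = [("jan", {"ann": 5, "bob": 8}),
          ("feb", {"ann": 5, "carl": 20}),
          ("mar", {"dana": 45, "emil": 60})]

def average_distance(customers):
    return sum(customers.values()) / len(customers)

# de dicto, as in (20): 'mes clients' is re-evaluated in every world; the
# collection varies, and the typical distance grows from world to world.
print([average_distance(members) for _, members in worlds])  # [6.5, 12.5, 52.5]

# de re, as in (19): the collection is fixed once (say, in January); later
# claims are about those very individuals, not about whoever happens to
# satisfy the description in other worlds.
fixed_set = sorted(worlds[0][1])
print(fixed_set)                                             # ['ann', 'bob']
```

A single-world denotation for the plural cannot express the de dicto reading: it needs the whole family of worlds and the relations between the successive collections.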
3 A Possible Solution
Many of the drawbacks that have been mentioned so far are well known to the advocates of compositionality. Yet this does not seem to weaken their belief because, as we have seen, the compositionality principle is one of the easiest one could imagine; it provides clear starting points for the interpretation process; it avoids the intervention of non-linguistic knowledge; etc. It is reassuring, since it applies the Cartesian principle of dividing difficult problems into small chunks, and thus avoids having to embrace the lexicon, context, and world knowledge all at the same time. What is more, this option allows NLP systems to be modular. But, as we have seen, it often leaves aside the pragmatic-inferential module, which is not considered to be directly concerned with language.

On the contrary, we think that inference is necessary at every level of the interpretation process and that mastering a linguistic expression amounts to being able to draw a core set of inferences that fix its semantic relations to other expressions and to the situation in which it is uttered. Contesting the existence of clear-cut distinctions between syntax, semantics, pragmatics, discourse, etc., we consider the interpretation process as a huge play of interactions between information coming from very different sources: linguistic (syntactic, semantic) and pragmatic, in both of the usual senses of that term, i.e. related to the world and to the situation of utterance. According to this way of thinking, the interpretation process is no longer decomposable into sequential modules, and so we replace a compositional interpretation process, in which constraints propagate one way, by a system in which constraints can propagate both "bottom-up" (e.g. starting from the lexicon and the syntax) and "top-down" (e.g. starting from world knowledge) until one or more points of equilibrium are reached. We thus need to build a system in which all the constraints can be expressed. This system must embody a reasoning tool able to draw the adequate inferences and to cope with the different phenomena mentioned above.

A reasoning tool

The project of computer scientists, and more specifically of Artificial Intelligence, regarding language understanding is to build a symbolic and logical system which offers an artificial language able to express the varied constraints at play. It is clear that these constraints, whatever level they come from (linguistic, world knowledge, information about the situation of utterance), are 'soft' constraints that can be broken and that can be potentially conflicting.
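What a breakable ('soft') constraint amounts to computationally can be shown in miniature; the sketch and its atoms are ours, invented for the purpose. The point is non-monotonicity: adding a fact withdraws a default conclusion instead of contradicting it.

```python
# Non-monotonicity in miniature: adding a fact can retract a conclusion.
# Facts and rules are invented placeholders, not the authors' rule base.

def conclusions(facts):
    out = set(facts)
    # Soft rule: 'X coupe la route' normally means obstruction ...
    if "coupe_la_route" in out and "subject_is_vehicle" not in out:
        out.add("reading=obstruct")
    # ... unless the subject is known to be a moving vehicle (the exception).
    if "coupe_la_route" in out and "subject_is_vehicle" in out:
        out.add("reading=drive_across")
    return out

print(conclusions({"coupe_la_route"}))
# {'coupe_la_route', 'reading=obstruct'}
print(conclusions({"coupe_la_route", "subject_is_vehicle"}))
# {'coupe_la_route', 'subject_is_vehicle', 'reading=drive_across'}
```

Learning more (that the subject is a vehicle) does not add to the earlier conclusion set; it replaces part of it, which is exactly what classical, monotonic logic forbids.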
Such constraints can be broken even within the 'academic' view of language: most, if not all, traditional grammars contain rules that have exceptions, and despite the efforts of formal linguists, it is hard to give an account of a full language without resorting to the notion of exception. They are broken even more often, with little or no harm to comprehensibility, in everyday (non-academic) speech and writing. This obvious fact has not particularly worried the advocates of strict compositionality either. Therefore non-monotonicity (see the Special Issue of the Artificial Intelligence Journal on Non-Monotonic Logic, 1980), i.e. the logical property making it possible to handle rules with exceptions, is the first requirement for an adequate language-understanding system.

Second, as said earlier, our rules express both hard constraints and soft ones. Among the latter, the ability to order exceptions hierarchically and to define priorities is of great importance for handling the phenomena described in this paper. This ability rules out a property called semi-monotonicity, the technical definition of which cannot be given here; roughly, it means that if the "hard" knowledge remains the same while "soft" constraints are added, the derived consequences can only increase. As has been observed (Brewka, 1991), this property is incompatible with putting priorities on the "soft" constraints. Therefore it is necessary to select a non-monotonic inference system that lacks semi-monotonicity, even though most such systems are semi-monotonic.

Third, we have insisted on the fact that the interpretation process is not one-way and must account for the mutual influence between the interpretation of a word and the interpretation of its context. This co-influence leads to a form of circularity. The only possible way of overcoming this difficulty is to consider the process as the reaching of an equilibrium. Technically, this corresponds to finding a solution of a fix-point equation: provided the sentence is comprehensible, the interaction between its different parts leads to a stable state making it possible to infer the interpretation of the whole sentence, from which one can hopefully derive the correct interpretation of each of its parts.8

We have adopted Reiter's Default Logic (Reiter, 1980), which is one of the best candidates with respect to these requirements.

– It is a default logic in which the equilibrium reached by a system of constraints is expressed as the solution of a fix-point equation. Each solution is an 'extension' (i.e. a set of propositions considered as derivable), and this is fully appropriate as an account of circularity.

– It is a non-monotonic inference system that lacks semi-monotonicity and
8 The system does not necessarily lead to a unique final state. Cases where several "stable states" are finally reached correspond to cases where the initial sentence is truly ambiguous.
can represent precedence among rules by the use of semi-normal defaults.

– The fact that the fix-point equation sometimes accepts several solutions is more an advantage than a problem, as we have just seen.

We will show how most of the problems encountered by the compositional hypothesis can receive a tentative solution in the framework of a non-monotonic, non-semi-monotonic inference system.

Accounting for linguistic and encyclopedic knowledge

Using our approach, we write logical rules in order to express various constraints, possibly combining diverse levels of knowledge. The rules express regularities concerning the behavior of words, both when they interact with other words within some syntactic combination, and also relative to world knowledge and to the situation of utterance.

Concerning lexical information, our rules do not operate on the words themselves. We do suppose that words are associated with categories (semantic classes), but not in the sense of giving them types or qualia structures. As inference is central in our framework, the main purpose of a category is to factor out a common inferential behavior. This is very different from viewing categories – as in extensional semantics – as denoting sets of entities (Gayral & Kayser, 2001). The criterion for categorization is thus inference invariance. For example, in a corpus of reports of road accidents (see below), the word feu (traffic light) shares with stop (both the signpost and the line on the ground), traffic sign, yellow line, etc. the functionality of regulating traffic; all these words are related to a "legal framework"; their meaning is pure convention; their physical materialization should be made very visible; and so on. So, for this corpus, we need a category RoadTrafficObject which bundles this information, encapsulates deontic knowledge about what a sign allows or forbids, and thus triggers inferences concerning whether a user does or does not respect the traffic rules associated with the sign.

Feu possesses another type of property in this corpus: it is both a physical object (generally a pole standing at a crossroads) and an electrical device having the functionality of emitting coloured lights following a given temporal pattern. The property of being both an object and a functional device is shared by other words such as car, warning sign, engine, etc. These common factors lead to a category Device with three facets: the physical object, the mechanism it houses, and the device-related process, which has time-dependent properties.

These categories are crafted just for a given purpose: they only correspond to groupings leading to correct inferences with respect to a specific context. They are not claimed to be "universal", nor are they context-independent.
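How a category can 'factor out a common inferential behavior' can be sketched in a few lines. The rules below are invented stand-ins for the deontic and physical knowledge just described, not the authors' actual rule base.

```python
# A schematic sketch of categories as bundles of inference rules rather
# than as sets of denoted entities. Rule content is invented.

CATEGORY_RULES = {
    "RoadTrafficObject": [
        lambda x: f"{x} regulates traffic by pure convention",
        lambda x: f"crossing when {x} forbids it violates the traffic rules",
    ],
    "Device": [
        lambda x: f"{x} has a physical housing that can fall or be damaged",
        lambda x: f"{x} runs a process with time-dependent properties",
    ],
}

MEMBERSHIP = {"feu": {"Device", "RoadTrafficObject"}}   # both, as in the text

def licensed_inferences(word):
    """A word inherits the inferential behavior of each of its categories."""
    return [rule(word)
            for category in sorted(MEMBERSHIP[word])
            for rule in CATEGORY_RULES[category]]

for inference in licensed_inferences("feu"):
    print(inference)
```

Two words fall under the same category exactly when they license the same inferences in the corpus at hand, which is why such categories are context-bound rather than universal.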
For grammatical information, we suppose a surface syntactic analysis leading to simple "grammatical" predicates, such as subject, object, etc., linking word occurrences together. Rules give a default semantic interpretation to these grammatical relations.

Concerning world knowledge, two important ways of structuring it are what Marvin Minsky introduced under the name of frames (Minsky, 1975) (the expression of how a concept is related to the entities that are directly relevant to it, and to its typical uses) and what Roger Schank named scripts (descriptions of the unfolding of the events a concept evokes). For example, the word car evokes its different parts (wheel, etc.); the person(s) who normally use(s) it (the driver(s)); the persons or goods that can be transported; its typical locations (road, garage, etc.); and many scripts related to driving (the normal course of events as well as problematic ones: speed limits, traffic lights, jams, breakdowns, accidents, etc.). At any given level, pointers to other scripts are available (the script of puncture related to the concept wheel, the script of being repaired related to breakdown, etc.).

Constructing the interpretation

Our system will therefore be efficient if it is able to satisfy the three following requirements:

– to be able to write a set of rules adapted to its contexts of use; this requires determining the categories at play in these contexts, the static or dynamic binding of words to those categories, in the case of dynamic binding the factors that determine membership, and finally the effective writing of non-monotonic rules expressing the linguistic and extra-linguistic constraints concerning a domain;

– to be able to design a reasoning system that finds an equilibrium from the constraints which are active in a given interpretive process; at the beginning of the process, only the inference rules that have word occurrences in their premises are applicable; in their conclusions, they determine specific (and provisional) semantic values (categories) associated with these occurrences, and semantic relations between them; at that point, inference rules using semantic values in their premises become accessible as well, and the process can go on;

– to be able to ensure tractability and convergence: the play of constraints has to reach, in a tractable way, one or more final stable states which correspond to an adequate interpretation of the input.

It is clear that it is hard to satisfy these requirements in a general way. It can be done only in very restricted situations where the vocabulary, as well as the
world knowledge involved, are limited. We have been working for several years on a particular corpus of car-crash reports (Lévy, 1994). Just to give the flavor of what our system will look like, let us describe a mechanism corresponding to how we deal with one of the problems discussed above, namely co-presence. In the following paragraphs, we assume that the reader is fairly familiar with Reiter's logic (Reiter, 1980), and we abridge the original notation to make the rules more readable: A : B [R1] is shorthand for the default (A : B ∧ R1) / B, which means: if A is true, and if the conjunction B ∧ R1 is consistent with what is known, then B is true.

Let us now consider a word A (e.g. feu) which can belong to two possible categories, say B (e.g. RoadTrafficObject) and C (e.g. Device). Each possibility is enforced only when it is considered as making sense in the given context. For the sake of simplicity (a full development would, step by step, carry us too far away), we represent this state of affairs by two propositions BM and CM (read: assigning A to category B, resp. C, would be meaningful in the present context). Two defaults will be a priori applicable:

(d1) A ∧ BM : B [R1] and (d2) A ∧ CM : C [R2],

i.e. as long as B ∧ R1, resp. C ∧ R2, is consistent (see below), the meaningfulness of an assignment is a sufficient argument to consider it a legitimate choice. Now, in most cases, considering A as a B makes the choice of C inappropriate and vice versa. We could express this by a "strong" exclusion statement, e.g. B ⇒ ¬C, but this would forbid the possibility of co-presence, whereas this is precisely what we want to be able to have. We thus adopt a weaker form:

(d3) B : ¬R2 [R3] and (d4) C : ¬R1 [R3]

If we want to accept the co-presence of the two interpretations, it is possible to override the mutual inhibition between B and C, by:

(d5) BM ∧ CM : ¬R3 [R1 ∧ R2].

To sum up, let D be the set {d1,...,d4} and D′ the set {d1,...,d5}, and let A represent the fact that the word feu occurred. Then:

(i) both theories ({A}, D) and ({A}, D′) have a unique extension containing neither B nor C: this models the fact that, in the absence of any statement of meaningfulness, no interpretation for A is proposed;

(ii) both theories ({A, BM}, D) and ({A, BM}, D′) have a unique extension containing B and not containing C; this models the fact that if A belonging to category B makes the sentence meaningful, then the only solution is to consider A as a B; the situation is symmetric with C;
(iiia) the theory ({A, BM, CM}, D) has two extensions (one containing B but not C, the other C but not B), i.e. in the absence of an explicit permission for co-presence, if both readings are meaningful, the sentence is ambiguous;

(iiib) the theory ({A, BM, CM}, D′) adds a third extension containing both B and C: if both assignments make sense, allowing for co-presence gives the choice between the mutually exclusive readings (preserving the ambiguity) and a new reading in which the word enjoys simultaneously the features of the two categories.

Furthermore, non-monotonicity takes into account the fact that the meaning of a sentence may appear later. Suppose that, at the time of reading, no meaningfulness statement BM or CM is derived. According to (i) above, A is left uninterpreted; if the subsequent lines bring evidence that category B would make sense, i.e. allow the derivation of BM, the assignment of A to B becomes effective (case ii); conversely, if we are in case (ii) because of a default proof of BM, and if afterwards some step of the proof is invalidated, we modify the comprehension of a part of the text which, up till then, was considered as understood. For instance, the sentence:

(22) The traffic light is damaged.

is meaningful with two different interpretations (case iiia): the pole supporting the traffic light is damaged, or the pole is OK but the lights are damaged. If we read later:

(23) It is lying across the road.

world knowledge makes it difficult to believe that the electrical device is concerned; rather, the pole has fallen down. Going back to the occurrence of traffic light in (22), we can no longer interpret it as a member of the category Device, and we find ourselves in case (ii), where only its attachment to PhysicalObject makes sense. These various possibilities hint at the adequacy of semi-normal defaults for handling defeasible category assignments, and prove this tool to be flexible enough to give a local permission or interdiction of co-presence, yielding either ambiguous readings (multiple extensions), or multiple meanings of a word as part of the same reading.
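Because the default theories above are finite and purely literal, the case analysis (i)–(iiib) can be checked mechanically. The sketch below is our own toy re-implementation of Reiter's fixed-point definition for this single example, not the authors' system; the default names follow the text.

```python
from itertools import combinations

# Literals are strings; '-X' is the negation of 'X'.
def neg(lit):
    return lit[1:] if lit.startswith('-') else '-' + lit

def consistent(lits):
    return all(neg(l) not in lits for l in lits)

# Defaults (prerequisite, justification, consequent), named as in the text;
# with the abridged notation, 'A : B [R1]' has justification B AND R1.
D4 = {
    'd1': ({'A', 'BM'}, {'B', 'R1'}, 'B'),
    'd2': ({'A', 'CM'}, {'C', 'R2'}, 'C'),
    'd3': ({'B'}, {'-R2', 'R3'}, '-R2'),
    'd4': ({'C'}, {'-R1', 'R3'}, '-R1'),
}
D5 = dict(D4, d5=({'BM', 'CM'}, {'-R3', 'R1', 'R2'}, '-R3'))

def extensions(facts, defaults):
    """Enumerate Reiter extensions: guess the generating defaults, then
    verify the fixed point (justifications are tested against the guess)."""
    found, names = [], list(defaults)
    for k in range(len(names) + 1):
        for guess in combinations(names, k):
            E = set(facts) | {defaults[d][2] for d in guess}
            if not consistent(E):
                continue
            built, changed = set(facts), True
            while changed:
                changed = False
                for pre, just, cons in defaults.values():
                    if pre <= built and cons not in built and consistent(E | just):
                        built.add(cons)
                        changed = True
            if built == E and E not in found:
                found.append(E)
    return found

for facts in ({'A'}, {'A', 'BM'}, {'A', 'BM', 'CM'}):
    print(sorted(facts), '->', [sorted(e) for e in extensions(facts, D5)])
# {'A'}           -> one extension, neither B nor C            (case i)
# {'A','BM'}      -> one extension with B, without C           (case ii)
# {'A','BM','CM'} -> three extensions: B alone, C alone, and
#                    B together with C via d5                  (case iiib)
# Replacing D5 by D4 in the last call yields only the two
# mutually exclusive extensions                                (case iiia)
```

Running it reproduces the case analysis exactly: d3 and d4 block each other's reading in each extension, while d5, when its justification survives, defeats both blockers at once and licenses the co-presence reading.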
4 Conclusion
We have shown that, despite the attractiveness of the layered architecture it would allow, the compositionality hypothesis suffers from a number of defects which prevent it from accounting for the interpretation of natural language.
The main one concerns the treatment of polysemy. Even if one accepts a rough inventory of meanings limited to types or possibly enriched with qualia, nothing is said about how the choice of the adequate meaning can take place before the composition operates. To be acceptable, the 'magic' of this prior disambiguation phase must be disenchanted. In addition, we have shown that a lot of very plain and common expressions/sentences cannot be understood without using information originating from outside of the subtree under consideration.

Now, if compositionality is disqualified, several other views, although commonly shared, must also be questioned. What is the importance of a complete syntactic analysis, if syntax is no longer considered to govern the building of the interpretation on its own? Is the aim of the interpretation process to label words, phrases, sentences and texts with unique meanings? And even to build a single end-product? Trying to elaborate upon these points would take us far beyond the limits of an article. But we have given some indications for a framework which might be able to overcome the main drawbacks of the compositionality hypothesis: a non-monotonic system whose rules express various constraints coming from diverse levels of knowledge, and which reaches an equilibrium when all these constraints have been taken into account. The solution we propose, although rough and incomplete, offers an alternative way of thinking about text interpretation.

References

Brewka, G. (1991). Cumulative default logic: in defense of nonmonotonic inference rules. Artificial Intelligence Journal, 50(2), 183–205.
Briscoe, T., & Copestake, A. (1995). Semi-productive polysemy and sense extension. Journal of Semantics, 12(1), 1–53.
Enç, M. (1986). Towards a referential analysis of temporal expressions. Linguistics & Philosophy, 9, 405–426.
Fabre, C. (1996). Interprétation automatique des séquences binominales en anglais et en français. Application à la recherche d'informations. Unpublished doctoral dissertation, Université de Rennes.
Fodor, J., & Pylyshyn, Z. (1988). Connectionism and cognitive architecture. Cognition, 28, 3–71.
Gayral, F., & Kayser, D. (2001). Categorisation seen as factorisation of the inferential potential. Application to the comprehension of utterances. Cognitive Systems, 5(4), 345–371.
Gayral, F., Kayser, D., & Lévy, F. (2001). Plurals, time, and world knowledge. In Recent advances in NLP (RANLP). Tzigov Chark, Bulgaria. (International Conference)
Gayral, F., Kayser, D., & Pernelle, N. (2001). In search of the semantic value(s) of an occurrence: an example and a framework. In H. Bunt, R. Muskens, & E. Thijsse (Eds.), Computing meaning (Vol. 2, pp. 53–69). Kluwer.
Gayral, F., Pernelle, N., & Dizier, P. S. (2000). On verb selectional restrictions: advantages and limitations. In 2nd international conference on natural language processing (NLP) (pp. 57–68). Grèce.
Kamp, H., & Reyle, U. (1993). From discourse to logic (Vol. II). Dordrecht: Kluwer.
Kayser, D. (1994). What kind of models do we need for the simulation of understanding? In C. Fuchs & B. Victorri (Eds.), Continuity in linguistic semantics (pp. 111–126). John Benjamins.
Lévy, F. (Ed.). (1994). Approches sémantiques. Traitement Automatique des Langues, 35(1).
McKoon, G., & Ratcliff, R. (1992). Inference during reading. Psychological Review, 99(3), 440–466.
Minsky, M. (1975). A framework for representing knowledge. In P. H. Winston (Ed.), The psychology of computer vision (pp. 211–275). McGraw-Hill.
Montague, R. (1974). Formal philosophy: Selected papers of Richard Montague. New Haven: Yale University Press.
Nunberg, G. (1995). Transfers of meaning. Journal of Semantics, 12.
Ostler, N., & Atkins, B. (1991). Predictable meaning shift: some linguistic properties of lexical implication rules. In Lexical semantics and knowledge representation (pp. 88–100). Berkeley, CA: Springer-Verlag.
Partee, B. (1995). Lexical semantics and compositionality. In L. Gleitman & M. Liberman (Eds.), Language part I (pp. 311–360). Cambridge, MA: MIT Press.
Ploux, S., & Victorri, B. (1998). Construction d'espaces sémantiques à l'aide de dictionnaires de synonymes. Traitement Automatique des Langues, 39(1), 161–182.
Pollard, C. J., & Sag, I. A. (1994). Head-driven phrase structure grammar. University of Chicago Press.
Pustejovsky, J. (1995). The generative lexicon. Cambridge, MA: MIT Press.
Reiter, R. (1980). A logic for default reasoning. Artificial Intelligence Journal, 13(1–2), 81–132.
Schank, R., & Abelson, R. P. (1977). Scripts, plans, goals and understanding. Hillsdale, NJ: Lawrence Erlbaum Ass.
Special issue of the Artificial Intelligence Journal on non-monotonic logic. (1980, April). (Vol. 13, Nos. 1–2)
Véronis, J. (2001). Sense tagging: does it make sense? In Corpus Linguistics 2001 Conference. Lancaster, U.K.
Compositionality in Plant Fixed Expressions
Shelley Ching-yu Hsieh, Chinfa Lien and Sebastian Meier
A good number of plant names in our languages offer denotations other than plants. Lévi-Strauss noticed the consistent application of plant and animal species in cultural symbolism, and suggested that because they are easily remembered, they are convenient for symbolic thought (1963, p. 2, in Atran, 1990, p. 217). Plants are "catching" candidates for the representation of our thinking and concepts. Vervaeke and Kennedy (1996, pp. 278–79) say in Metaphors and Thought that a concept basic to people is always at the core of the metaphors we choose and create. Plants are vivid and memorable. They offer concrete image banks for languages to generate fixed expressions that capture and compose fleeting moments into words and express our cognition.

A system of representations is compositional when the semantic values of complex representations are determined by the semantic values of their parts. This study presents a semantic analysis of a range of plant fixed expressions (hereafter PFEs) in Mandarin Chinese and German, such as zhuang1-suan4 (pretend-garlic = to pretend not to know something)1 and Das macht das Kraut nicht fett (that-makes-the-cabbage-not-fat = that doesn't fatten the cabbage; that doesn't help much). We first apply frame theory (Fillmore & Atkins, 1992) to flower fixed expressions to reveal the compositionality of the concepts of botanical names. Then we locate the popular vehicles (plant names) to explore which plant names are rooted in the given languages, and then look into underlying conceits (the relations between the vehicles and the meanings of the PFEs) in order to confirm the findings. Different compositionalities of Mandarin Chinese and German reveal the distinct concepts and cognition in Chinese and German speakers' minds.

Address for correspondence: Shelley Ching-yu Hsieh, Applied English Department, Southern Taiwan University of Technology, No. 1, Nantai Street, Yung Kang City, Tainan Hsien, Taiwan 710, R.O.C. E-mail: Hsieh [email protected].
1 There are basically four tones in Chinese: 1 (level tone), 2 (rising tone), 3 (falling-rising tone), and 4 (falling tone).
1 Research Framework
In recent word-usage research, linguists have turned their attention to multiword units (Carter, 1998; Moon, 1998). A fixed expression (Alexander, 1978, 1979; Carter, 1987, 1998) is a string of words behaving as a unitary lexical item. Various terms are used to describe fixed expressions, such as freezes, binomials, and frozen locutions (Pinker & Birdsong, 1979; McCarthy, 1990; Landsberg, 1995; Moon, 1998; inter alia). According to Moon (1998, p. 2), fixed expressions include metaphors, similes, proverbs, sayings, frozen collocations, grammatically ill-formed collocations, and routine formulae. Our languages contain a great many prefabricated expressions of this kind (Bolinger, 1975, pp. 107–111; Wang, 1991). We use them abundantly and naturally in daily conversation. Such expressions carry historical culture, are stored in long-term memory, and thus shape speakers' cognition. When one masters a language, one uses fixed expressions adeptly (Lien, 2003).

The present study examines fixed expressions that contain at least one plant name in which the plant name has a metaphorical connotation. For example, in zhuang1-suan4 (pretend-garlic = to pretend not to know something), suan4 (garlic) metaphorically means 'doesn't know anything'. In etwas durch die Blume sagen (something-through-the-flower-say = to imply; to describe in an agreeable voice), Blume (flower) denotes 'embellish' or 'appreciative'.

Röhrich (1991) focuses on etymological epics and the development of PFEs. Beuchert (1995) probes into the German symbolism of plant metaphors. Liu and Qin (2001) compare Mandarin Chinese and English PFEs to promote better communication between these two peoples. These research works support Atran's assumption (1990, p. 219) that plant names are convenient choices for describing human beings or human society. There are also noteworthy treatises that delve into plant concepts in human cognition. Li (1959) recounts historical events and folklore that mold the Chinese concept of flowers. Chen and Ku (1999) design experiments to explore children's cognition of prototypical plants. Wen (1986), Meng (2001), and others study plant expressions in Shijing (The Book of Odes) and reveal historical cultural life in the Zhou Dynasty. "Totemism, myth, religion and other speculative activities of the mind" are "well-defined cognitive domains" (Atran, 1990, p. 217). PFEs have accordingly drawn linguists' attention. Nerlich, Clarke, and Dingwall (2000, p. 225) state that there are various reasons for using plants and features of farming to describe humans. By examining the compositionality of PFEs, we will focus on how PFEs display human concepts and cognition.

The following definitions fix some terms that we use in this paper and launch the theoretical framework of the present study. Mandarin Chinese (hereafter Chinese) refers to the official language in Taiwan and in China.2 German refers to High German, the official language in Germany. Most of our raw data were collected from the Academia Sinica Ancient Chinese Corpus, the Academia Sinica Balanced Corpus of Mandarin Chinese, the Duden Großwörterbuch Englisch, and the German Corpus Search, Management and Analysis System (COSMAS). The spoken PFEs were heard and gathered from conversations with native speakers over the past two years. The raw data were then categorized by vehicle and compiled in alphabetical order in Microsoft Excel for analysis.

2 We use "Mandarin Chinese speakers" to stand for the Mandarin speakers in Taiwan and in China in order to avoid the long debated issue of political sovereignty.

Fodor (1998) restated the compositionality principle as "satisfying the possession conditions for a complex concept includes satisfying the possession conditions for its constituents." Do idioms and fixed expressions agree with this principle? Nunberg, Sag, and Wasow (1994) assert the semantic compositionality of idioms or fixed expressions, where compositionality is "the degree to which the phrasal meaning can be analyzed in terms of the contributions of the idiom parts" (Nunberg, Sag, & Wasow, 1994, p. 498). One given example is the English idiomatic expression spill the beans. Language users know or learn that spill the beans means 'divulge the information' through the compositionality of this expression: "We can assume that spill denotes the relation of divulging and beans the information that is divulged, even if we cannot say why beans should have been used in this expression...." (Nunberg, Sag, & Wasow, 1994, p. 497). The availability of these meanings for each component relies on the presence of the other lexical item.

Nunberg, Sag, and Wasow (1994, p. 511) advocate compositionality but also specify noncompositional flexibility in German, highlighting German noncompositional idioms. One given example is ins Gras beissen (in-grass-bite = die), whose meaning cannot be analysed from the parts of the idiom; the entire VP carries the meaning. Most German idiomatic expressions are as compositional as English idiomatic expressions; German differs only in a point of syntactic flexibility: when an idiomatic expression (e.g., ins Gras beissen) is noncompositional, it is marked as such and granted a liberty (Pollard, 1984; Reape, 1996) that allows object fronting and the verb-second process. The present study will also deal with noncompositional expressions; the meaning of the entire expression will then be counted in order to disclose the cognition depicted in PFEs.

Let us first focus on frame semantics (Fillmore & Atkins, 1992). Frame semantics links to people's comprehension process, that is, to how we understand meanings in context: lexical meaning, the grammatical characteristics of information about related words, and our cultural knowledge of the world (Goddard, 1998, p. 69) work together in the comprehension process. Lakoff (1987) has a similar theory called "idealized cognitive models". The next section explicates this issue.

2 Linguistic Frames of Flowers
Fillmore and Atkins (1992) propose that the meaning of a word can be understood only against a background frame of experience, beliefs, or practices that "motivate the concept that the word encodes" (1992). They give this set of verbs as an example: buy, sell, charge, pay, cost, and spend. To understand any of these verbs one needs to understand a complete 'commercial transaction frame':

in which one person acquires control or possession of something from a second person, by agreement, as a result of surrendering to that person a sum of money. The needed background requires an understanding of property ownership, a money economy, implicit contract, and a great deal more. (Fillmore & Atkins, 1992, p. 78)

In other words, this frame is a complex yet compact linguistic base for words such as buy, sell, and charge in the given society. People who do not have this linguistic frame in mind will not understand the meaning of buying and selling. Tarzan, for example, would have such difficulty. Stated otherwise, by means of the compositionality of the concepts in the related words and the background knowledge of the society, we comprehend the words and expressions that we use in our daily life. Likewise, the understanding of the vehicle flower3 in Chinese hua1 or German Blume requires a complete 'linguistic flower frame' in the speakers' minds.

Flower in Mandarin Chinese

In Old Chinese, the written form hua2 of 'flower' was a much more complicated character than the contemporary hua1. Hua2 was adopted as a loan character to render the sense of 'glossy' and 'magnificent' later on. As a consequence, the original sense of the character (viz., flower) became more and more obscure. To remedy the situation, a new character pronounced as hua1 was coined, probably no later than the Qin dynasty (221–206 BC), to represent the original sense. As the new character hua1 (flower) consists of cao3 (plant) and the phonetic hua4 ('hua4' can mean 'change'), Todo (1974, p. 1090) regards hua4 (change) as a graph coined in terms of hui4-yi4, meaning composition, as one of the six principles of Chinese orthography.
3 In italicized letters to show its technical status.
To attain a complete picture of Chinese hua1 (flower), we need to incorporate the following linguistic flower frame: Hua1 is the essence of just anything. The blossomy flower looks expansive. This most showy part of the plant can denote flourishing, dishonesty or blurriness. It also represents girl, woman, and even prostitute to Chinese speakers.

This linguistic frame is rooted in native speakers' minds and is expressed in various hua1 (flower) fixed expressions. We give one example for each concept in this frame as follows: Hua1 is the essence (kai1-hua1-jie2-guo3 open-flower-produce-fruit = to blossom and bear fruit) of just anything (yi2-hua1-jie1-mu4 move-flower-connect-wood = to connect A with B; to palm off the spurious for the genuine). The blossomy flower looks expansive (cong1-hua1 green onion-flower = chopped green onions). This most showy part (hua1-zhi1-zhao1-zhan3 flower-branch-attract-unfold = flower branches move; woman in showy dress) of the plant can denote flourishing (bai3-hua1-qi2-fang4 hundred-flowers-together-release = different styles of art or manners of thinking rising, developing, and flourishing), dishonesty (hua1-yan2-qiao3-yu3 flower-speech-skillful-language = a lot of artful talk, pretty words) or blurred vision (yan3-hua1-liao2-luan4 eyes-flower-entangled-mess = to dazzle the eyes). It also represents girl (jie3-mei4-hua1 elder sister-younger sister-flower = a pair of sisters), woman (hua1-rong2-shi1-se4 flower-face-loose-color = [said of a woman's face] turn pale, as from fear), and even prostitute (xun2-hua1-wen4-liu3 search-flower-ask-willow = to visit brothels) to Chinese speakers.

In the examples above, kai1-hua1-jie2-guo3 (open-flower-produce-fruit = to blossom and bear fruit) denotes yielding positive results out of a task, an effort or a labor. They are what people have been sweating for, thus the essence of the working process. The hua1 (flower) and mu4 (tree, wood) in yi2-hua1-jie1-mu4 (move-flower-connect-wood) can be anything that people put together, such as a wrong plan to a project or a spare part of a bike to that of a car; specifically, the expression means to stealthily substitute one thing for another. Whether the connotation is positive or negative, the linguistic frame of hua1 is organized around the appearance of flowers: spreading (as a bud opens to a bloom), showy, woman-like (e.g., girl, woman and prostitute), and the quality of flowers: essence, dishonesty, and blurriness. The appearance and quality of flowers are also expressed in German PFEs, but in a different tone and with varied cognition.

Flower in German

Both Blume (flower) and Blüte (flower) belong to the word group blühen (to blossom). Blume stands for the general term of flowers and florescent plants, while Blüte is the blossom of a plant. The linguistic flower frame in German
Blume/Blüte (flower) is: Blüten/Blumen are the harvest and rewards. The Blüte is the best time of a development. The pleasant odor of a Blume represents other odors, but its nice outlook can merely be a decoration. A Blume can have exactly opposite denotations; i.e., it can be too much of something or too little of something. A Blüte is useless and untrue, and can even be a disease. Both Blüte and Blume can stand for a woman.

This linguistic frame is rooted in native speakers' minds and is expressed in various Blume/Blüte (flower) fixed expressions. Likewise, we give one example for each concept in the frame: Blumen are the harvest (Blütezeit flowering-time = heyday) and rewards (damit ist kein Blumentopf zu gewinnen there-with-is-no-flower-pot-to-win = that's nothing to write home about). The Blüte is the best time (seine Blüte erreichen its-flower-reach = to reach its peak) of a development (etwas in der Blüte vernichten something-in-the-blossom-destroy = to destroy something which is still developing). The pleasant odor of a Blume represents other odors (die Blume des Weins the-flower-of-the-wine = the bouquet of the wine), but its nice outlook can merely be a decoration (blumenreich schreiben flowers-rich-write = to write in a flowery way). A Blume can have exactly opposite denotations; i.e., it can be too much of something (durch die Blume sagen through-the-flower-say = to put something in a roundabout way) or too little of something (vielen Dank für die Blumen thank-you-for-the-flowers = thank you for nothing). Blüte is useless (Du bist ja eine schöne Blüte you-are-yes-a-beautiful-blossom = you are such a beautiful flower; said ironically to a young, inexperienced person) and untrue (Blüte bloom = fake money), and can even be a disease (Blüte bloom = rash). Both Blüte and Blume can stand for a woman (Blüte bloom = young girl).

The verb of Blüte (flower) is blühen, whose participle blühend functions as an intensifier; it usually gives a positive hint, e.g., blühende Zukunft = glowing future, and blühender Handel = flourishing business. However, when the collocation carries a negative meaning, blühend also intensifies it, as in blühender Unsinn (absolute nonsense). As in Chinese, German Blume/Blüte can also be used to describe a woman. Nevertheless, these uses are not always compliments, e.g., Gänseblümchen (daisy = a less attractive girl) and eine erblühte Schönheit (a-bloomed-beauty = once a real beauty, but no longer). As mentioned earlier, Chinese hua1 can also represent women. There are hua1-yan4-nian2-hua2 (flower-kind-year-flourishing = [of women] being young), jie3-mei4-hua1 (elder sister-younger sister-flower = a pair of [nice] sisters), hua1-dan4 (flower-role = the leading female role in Chinese opera), etc. in Chinese. Herrera-Sobek (1994, p. 103), who observed Mexican ballads and songs, indicated that the most utilized metaphor is that of the
woman being compared or associated with a flower. That flowers represent women may be universal, because both evince the evanescent nature of beauty.

As the above examples illustrate, when the concept derives from the appearance of flowers, hua1 in Chinese shows viewing, appreciation, or criticism of the appearance (e.g., showy). German Blume and Blüte, on the other hand, imply that flowers function as a decoration (e.g., blumenreich schreiben). When the quality of a flower is entailed, Chinese has detached concepts like essence, dishonesty, or blurriness, while German Blume and Blüte contain the concept that flowering is the best time in the life span of a plant (i.e., flowering is the phase just before harvest), which is again a functional point of view. The following section will make this primary finding explicit.

3 Plants in Languages
In this section, we first look into the underlying conceits and then list the favorite vehicles of Chinese and German PFEs in order to confirm the different concepts observed in the linguistic frames of flowers.

Underlying conceits

Looking into the proverb "Ants on a millstone: whichever way they walk, they go around with it", which describes humans and their destinies, Lakoff and Turner (1989, pp. 205–6) assert that "the choice of ants and a millstone is by no means arbitrary." There is a certain correspondence or relation between an ant and a millstone that makes this proverb meaningful: the relative sizes of the huge millstone and the small ant, and the motion of the ants over the shape of the millstone, all call out the meaning of the proverb; i.e., humans are like ants walking on a millstone: they can never escape destiny. People observe and perceive the world. They generate fixed expressions with associations, such as the size and the shape of the vehicles (ants and millstone) in the above proverb. The associations, that is, the underlying conceits, link the world and the expressions; they are intermixtures of human culture and cognition.

A variety of PFEs in both Chinese and German are associated with the underlying conceits of (a) the edibility of the plants and (b) customs or historical events. Plants are important suppliers of nourishment for other living creatures, and people often describe this in their languages, e.g., jiao2-cai4-gen1 (chew-vegetable-root = to eat old and tasteless vegetables) means metaphorically 'to bear hardships', for when someone gets nothing to eat but the coarse roots or old leaves, he is bearing hardships. Dao4-chi1-gan1-zhe4 (invert-eat-sugar-cane) refers to 'gradually enter a state of elevated enjoyment', because when we eat sugar cane, we know it gets sweeter nearer the root. The German PFE mit ihm ist nicht gut Kirschen essen (with-him-is-not-good-cherry-eat = it's not good to eat cherries with him) figuratively means 'it's best not to tangle with him', because this person has bad habits or is arrogant: he might spit cherry pits in your face if you eat cherries with him. In den Pfeffer geraten (in-the-pepper-get) means 'to be in trouble': the red pepper looks nice but may give the eater a hard time, as the eater might not expect that such a good-looking little thing is so hot.

As for those associated with peculiar customs or historical events: in China, the cane was used as an implement of punishment at school. Though it is not used for this purpose any more, teng2-tiao2 (cane) still carries an implication of punishment in Chinese. It is linked with school history. The German PFE Stammtisch (trunk-table) is a table in a pub that is reserved for regulars. Since it was mostly simple folk who originally met at a Stammtisch, the word has become a synonym for 'popular opinion', because people ate at the table and talked about their opinions there. Further, Stammtischpolitik (trunk-table-politics = armchair politics) has developed and is used to describe when politicians seize upon policies in a populist way solely to win votes. The German pub custom originated these PFEs.

In addition, in Chinese, the underlying conceits that associate most vehicles and the meanings of expressions are:

(1) Growth characteristics and cultivation: e.g., the PFE gua1-tian2-li3-xia4 (melon-patch-plum-under) conjures up a tableau like 'to do up the shoes in a melon-patch and to put on a hat under a plum tree'. Since melons grow on the ground, when one bends down in a melon-patch, one may touch the ripened melons. Because plums grow on a tree, if someone lifts up his arms under a plum tree, he can reach a luscious plum. For these reasons and associations, the PFE is used to warn people not to be found in a suspicious position. The underlying conceit uses the growing characteristics of melons and plums.

(2) Odor: the fragrances of flowers and plants have caused many PFEs to come into being, such as niao3-yu3-hua1-xiang1 (bird-language-flower-fragrance = birds' twitter and fragrance of flowers, an idyllic scene), jia1-hua1-na3-you3-ye3-hua1-xiang1 (an indoor flower is not as fragrant as a wild flower = a wife is not as attractive as a girlfriend) and ru4-zhi1-lan2-zhi1-shi4 jiu3-er2-bu4-wen2-qi2-xiang1 (one smells no fragrance when one stays long in the room that has irises and orchids = pervading uplifting character of a moral gentleman).

(3) Outer features of plants: e.g., the human brain has a round shape just like a melon. We therefore have nao3-dai4-gua1 (brain-bag-melon) in Chinese to mean a brain.

In German, most underlying conceits stem from:
(i) Usability of plants: e.g., Holz vor der Tür haben (firewood-in front of-the-door-have = to be well-endowed or well-stacked), and Körnung (corning = decoy place). Corn is used as animal feed, so a decoy place is named Körnung.

(ii) Cultivation: apples, pears, berries, etc. are common in the climatic zone of Germany. The Germans use a lot of them in their plant fixed expressions. For example, harte Birnen fortgeben und faule Äpfel wiederbekommen (hard-pears-give-away-and-rotten-apples-gain-again = to give away hard pears and receive rotten apples; to make a bad exchange) and so billig wie Brombeeren (as-cheap-as-blackberries = as cheap as blackberries; very cheap). Common plants do not necessarily draw a Chinese speaker's attention to form fixed expressions.

(iii) From Scripture or the classics: e.g., aus dem Stamme Davids (from-the-stem-David's = of David's line) is a biblical saying. Von drauß' vom Walde komm' ich her (from-outside-from-the-forest-come-I-here = I came from a forest far from here) was a line in Theodor Storm's poem 'Knecht Ruprecht'. It was later used to indicate that someone comes from far away, or to give a vague reply to an unwelcome question "Where are you from?"

This is not to say that no German PFEs are associated with the growth characteristics or the odor of a plant, and no Chinese PFEs are linked with the usability of a plant; the statement above reflects the higher percentages in the data we have collected so far. From the examination of underlying conceits, we see that Chinese speakers tend to perceive the outer appearance of the plants and compile "sensible or visual" Chinese expressions – those related to plants' outer features and odor. German speakers lay emphasis on function by adopting more usability of plants in their PFEs.

Favorite vehicles

People observe and perceive plants from different standpoints and therefore use different plant vehicles to produce their expressions. While less productive vehicles tend to convey stereotypical and repetitive metaphorical meanings, the productive vehicles have rich connotations. Tables 1 and 2 list the plant metaphorical vehicles that are most productive in Chinese and German, respectively. Each vehicle "lives" in the minds of the speakers differently. Trees like willow, pine, cypress, and elm, and flowers like peach blossom, lotus, orchid, and chrysanthemum are for viewing and admiring. These popular vehicles in Table 1 suggest that Chinese speakers produce more PFEs that are based on their vision. As for Table 2, the Germans use more fruits to coin their PFEs.
Fruits are edible parts of plants. A functional standpoint is adopted. Also, most of the productive vehicles are founded on their functions, such as, under the category 'others', Holz (wood) (for burning), Pfeffer (pepper) (a spice) and Kaffee (coffee) (for drinking).

Categories  Plant vehicles
trees:      mu4 (tree), shu4 (tree), liu3 (willow), song1 (pine), bo2 (cypress), zhu2 (bamboo), yu2 (elm)
flowers:    tao2 (peach blossom), lian2 (lotus), lan2 (orchid), mian2 (cotton), ju2 (chrysanthemum)
fruits:     li3 (plum), gua1 (melon), tao2 (peach), sang1 (mulberry), zao3 (date)
others:     cao3 (grass), mu4 (wood/tree), lin2 (woods)

Table 1: Favorite plant vehicles in Mandarin Chinese.
Categories           Plant vehicles
fruits:              Nuss (nut), Frucht (fruit), Apfel (apple), Birne (pear), Hafer (oats)
division of plants:  Blatt (leaf), Stamm (stem), Kern (seed, kernel), Dorn (thorn), Wurzel (root), Ast (branch), Stachel (thorn, prickle), Schale (fruit skin)
flowers:             Blume (flower), Rose (rose)
trees:               Baum (tree)
others:              Holz (wood), Pfeffer (pepper), Kaffee (coffee), Kraut (herb), Gras (grass)

Table 2: Favorite plant vehicles in German.

On the other hand, a good number of German PFEs, as shown in Table 2, are based on the divisions of plants (36.6% of our German data). However, Table 1 shows that, as the category 'others' hints, there are many Chinese PFEs generated from lin2 (woods; forest). Woods/forest occupies 3.5% of the collected Chinese data, next only to hua1 (flower) at 8.7% and mu4 (wood/tree) at 3.9%. Comparing the category 'trees' in Table 1 and Table 2, it seems that 'tree' is more collective in German than in Chinese, for German has only the general term Baum (tree) (Table 2), while Table 1 contains the names of different types of trees besides the general terms mu4 (tree) and shu4 (tree). When we examine the statistics, the category 'trees' occupies 3.9% of the collected Chinese data, but 'trees' accounts for only 0.02% in German. In terms of productivity, PFEs show an uneven distribution in Mandarin, whereas there is a more balanced distribution in German. German speakers notice the individual divisions of the plants while the Chinese speakers behold the whole. The next section elaborates on this point and concludes this study.
4 Conclusion
The discussion about underlying conceits and popular vehicles provides additional support for our observation of the flower frames. The frame indicates that 'flower' is a polysemous word; it can be applied to animate or inanimate referents and can carry concrete or abstract denotations. Nerlich, Clarke, and Dingwall (2000, p. 234) believe that multiple meanings are created as a by-product of conceptual integration. The linguistic frames of Chinese hua1 and German Blüte/Blume show that Chinese and German speakers observe and perceive flowers from different standpoints and therefore compose different concepts in their minds. German speakers lay more emphasis on the point that flowers are a stage preceding the fruit/the result, a functional point of view, whereas Chinese speakers tend to show a visual perspective.

We have examined the concepts in the German flower frame: the appearance of flowers in German serves the function of decoration, and in the life span of a plant, the flowering phase is the best time because soon the plant's fruit will be harvested. Other concepts in this linguistic frame support the functional standpoint in German: the 'too much' or 'too little' of something, as shown in the example etwas durch die Blume sagen (something-through-the-flower-say = to put something in a roundabout way), is actually also generated from the decorative function of flowers. When someone doesn't know what to say or how to express a certain event, he speaks in a roundabout way and says more than he needs to. This is like putting flowers around the main point in German cognition. The last example, vielen Dank für die Blumen (very-thank-for-the-flowers = thank you very much for the flowers), is obviously generated from the custom that people bring flowers as a simple present when visiting friends. The function of flowers as presents is associated here.

The Chinese frame of hua1 (flower) shows that Chinese speakers' attention is drawn by the look of flowers: not a functional standpoint of decoration, but the description of what a flower looks like (shape, beauty, showiness), and an imagination about flowers (woman-like, blurry, dishonest). Cong1-hua1 (green onion-flower = chopped green onions) and lang4-hua1 (wave-flower = spindrift) describe the chopped green onions and spindrift as having an open shape. Hua1-zhi1-zhao1-zhan3 (flower-branch-attract-unfold = flower branches move; woman in showy dress) depicts the beautiful, showy, and woman-like sides of flowers. Also, the descriptions of girls and women place an emphasis on the looks of flowers, because women wear makeup like colorful flowers, and they dress more loudly or more colorfully than men usually do. The old Chinese character for flower, hua2, is probably closer to the origin of Chinese cognition in this respect: a visual perspective. Hua2 is more concerned with 'showy appearance', as in hua2-er2-wu2-shi2 (bloom-but-not-avail = showy and superficial) and hua2-li4 (luminous-beautiful = resplendent).

Popular vehicles and underlying conceits authenticate German functionalism and Chinese visualization. This section further hints at the individualism of the Germans and contrasts it with the holistic cognition of Chinese speakers by illustrating that, while many PFEs in Chinese are derived from the holistic view, e.g., from lin2 (forest), a good variety of German PFEs are based on the divisions of plants, e.g., Kern (seed) and Dorn (thorn). Semin, Görts, Nandram, and Semin-Goossens (2002) showed that an individualistic community and a collective society have different perspectives on the linguistic representation of emotion and emotional events. The present study further demonstrates that people in a collective society, such as the Chinese, observe and compose their fixed expressions with a holistic perspective, while people in an individualistic society, such as the Germans, focus on individual aspects of the subject when performing the same linguistic task.

The real world provides a starting point for plant fixed expressions by offering metaphorical vehicles to languages, but the choice of salient features, and the meaning and implication attached to that specific feature, vary from language to language (Nesi, 1995, p. 276). Whereas the Germans place more emphasis on the point that a flower is a stage preceding and leading to fruit/result – a functional point of view – there is linguistic visualization in Chinese. Not only is Chinese a logographic writing system; the cognition that underlies it for generating fixed expressions is similarly visual, as is revealed in the compositionalities of German and Chinese plant fixed expressions.
Acknowledgements

The research presented here is supported by the project NSC 93-2411-H-218-003. We are indebted to Chi-Hsuan Lin and Han-Chun Huang for their technical assistance with the compilation of the LaTeX file, as well as to an anonymous reviewer for insightful comments.
References

Alexander, R. J. (1978). Fixed expressions in English: A linguistic, psycholinguistic, sociolinguistic and didactic study (part 1). Anglistik und Englischunterricht, 6, 171–188.
Alexander, R. J. (1979). Fixed expressions in English: A linguistic, psycholinguistic, sociolinguistic and didactic study (part 2). Anglistik und Englischunterricht, 7, 181–202.
Atran, S. (1990). Cognitive foundations of natural history: Towards an anthropology of science. New York: Cambridge University Press.
Beuchert, M. (1995). Symbolik der Pflanzen: von Akelei bis Zypresse. Frankfurt am Main: Insel-Verlag.
Bolinger, D. (1975). Aspects of language (2nd ed.). New York: Harcourt Brace Jovanovich.
Carter, R. (1987). Vocabulary. London: Allen and Unwin.
Carter, R. (1998). Vocabulary: Applied linguistic perspectives. London: Routledge.
Chen, S.-H., & Ku, C.-H. (1999). Aboriginal children's conceptions and alternative conceptions of plants. Proceedings of the National Science Council ROC(D), 9(1), 10–19.
Fillmore, C. J., & Atkins, B. T. (1992). Toward a frame-based lexicon: The semantics of RISK and its neighbors. In A. Lehrer & E. F. Kittay (Eds.), Frames, fields and contrasts (pp. 75–102). Hillsdale, NJ: Erlbaum.
Fodor, J. (1998). There are no recognitional concepts – not even RED. In In critical condition. Cambridge, MA: MIT Press.
Goddard, C. (1998). Semantic analysis: A practical introduction. Oxford: Oxford University Press.
Herrera-Sobek, M. (1994). Gender and rhetorical strategies in Mexican ballads and songs: Plant and mineral metaphors. Lore and Language, 12, 97–113.
Hsieh, S. C. Y. (2004). The emotive and ruminative expressions in Mandarin Chinese and German. Paper presented at the International Conference of Language, Culture and Mind, University of Portsmouth, Portsmouth.
Lakoff, G. (1987). Women, fire and dangerous things: What categories reveal about the mind. Chicago: University of Chicago Press.
Lakoff, G., & Turner, M. (1989). More than cool reason: A field guide to poetic metaphor. Chicago: University of Chicago Press.
Landsberg, M. (1995). Semantic constraints on phonologically independent freezes. In M. Landsberg (Ed.), Syntactic iconicity and linguistic freezes: The human dimension (pp. 65–78). Berlin: Mouton de Gruyter.
Lévi-Strauss. (1963). The bear and the barber. The Journal of the Royal Anthropological Institute, 93, 1–11.
Li, H. L. (1959). The garden flowers of China. New York: The Ronald Press.
Lien, C. F. (2003). Taiwan minnanyu guding yushi shilun (Fixed expressions in Taiwanese Southern Min). In Proceedings of the Second Tamkang Global Sister Universities Conference on Chinese Linguistics and Culture. Taipei: Graduate Institute of Chinese Linguistics and Documentation, Tamkang University.
Liu, M. G., & Qin, Z. Y. (2001). Han yin wenhua zhong zhiwu fuhao biao zheng chayi de fenxi (The comparative study of plant symbols in Chinese and English cultures). Journal of Guilin College of Education, 15(2), 58–60.
McCarthy, M. (1990). Vocabulary. Oxford: Oxford University Press.
Meng, Q. R. (2001). Shan you fusu xi you hehua (Fusu in mountain, lotus in marshy land). Journal of Chang Chun Teachers College, 20(3), 63–65.
Moon, R. (1998). Fixed expressions and idioms in English. Oxford: Clarendon Press.
Nerlich, B., Clarke, D. D., & Dingwall, R. (2000). Clones and crops: The use of stock characters and word play in two debates about bioengineering. Metaphor and Symbol, 15(4), 223–239.
Nesi, H. (1995). A modern bestiary: A contrastive study of the figurative meanings of animal terms. ELT Journal, 49(3), 272–278.
Nunberg, G., Sag, I. A., & Wasow, T. (1994). Idioms. Language, 70, 491–538.
Pinker, S., & Birdsong, D. (1979). Speaker's sensitivity to rules of frozen word order. Journal of Verbal Learning and Verbal Behavior, 18, 497–508.
Pollard, C. J. (1984). Generalized phrase structure grammars, head grammars, and natural languages. Doctoral dissertation, Stanford University, California.
Qiu, X. G. (1995). The introduction to semiotics. Taipei: Wan Juan Lou.
Reape, M. (1996). Getting things in order. In H. C. Bunt & A. van Horck (Eds.), Natural language processing (Vol. 6: Discontinuous Constituency, pp. 209–253). Berlin: Mouton de Gruyter.
Röhrich, L. (1991). Lexikon der sprichwörtlichen Redensarten. Freiburg: Herder.
Semin, G. R., Görts, C. A., Nandram, S., & Semin-Goossens, A. (2002). Cultural perspectives on the linguistic representation of emotion and emotion events. Cognition and Emotion, 16(1), 11–28.
Todo, A. (1974). Gakken kan-wa daijiten (Comprehensive Chinese-Japanese dictionary). Tokyo: Gakushu Kenkyusha.
Vervaeke, J., & Kennedy, J. M. (1996). Metaphors in language and thought: Falsification and multiple meanings. Metaphor and Symbolic Activity, 11(4), 273–284.
Wang, L., Tang, Z. F., Guo, X. L., Cao, X. Z., He, J. Y., Jiang, S. Y., & Zhang, S. D. (2000). Wang Li old Chinese dictionary. Beijing: Zhong Hua Bookstore.
Wang, W. S.-Y. (1991). Language prefabs and habitual thought. In W. S.-Y. Wang (Ed.), Explorations in language (pp. 397–412). Taipei: Pyramid Press.
Wen, L. L. (1986). Shijing zhong caomu niaoshou yixiang biaoxian zhi yanjiu (A study of the image of plants and animals in The Book of Odes). Doctoral dissertation, Department of Chinese Literature, National Chengchi University, Taipei.
Wu, Z. Y., Huang, Q. Y., & Liu, Y. Q. (1989). Ciyuan (Chinese etymological dictionary). Taipei: Shangwu Yinshu Guan.
Zhongguo Shehui Kexue Yuan Yuyanxue Yanjiu Suo (Ed.). (1990). Xiandai Hanyu Cidian (Xiuding Ben). Beijing: Shangwu Yinshu Guan.
Type- and Token-Level Composition
Olav Mueller-Reichau
1 Introduction
When two syntactic units combine to form a complex syntactic unit, their individual meaning structures semantically compose with each other, giving rise to a new meaning structure. Referential theories of meaning assume that meaning structures correspond to entities in the world (or what we conceptualise as the world) talked about. Carlson (1977) demonstrated that we must conceive of the world as containing type entities besides token entities in order to understand many linguistic patterns. The aim of this paper is to reemphasise the relevance of the type/token distinction for the patterning of linguistic structures: semantic composition can proceed at the token-level as well as at the type-level. Specifically, I will make out a case for assuming that the token-level can only be accessed through the type-level. Token-referring expressions result from a grammatical process I will call "spatiotemporal localisation" applying to type-level expressions. To give substance to this notion, a formalisation is offered that builds on the DRT-based theory of semantic incorporation developed in Farkas and de Swart (2003).

The paper is structured as follows: Section 1 discusses the ontological distinction between a token domain and a type domain with reference to the recent elaboration of Dayal (2004). Four basic sentence types are distinguished dependent upon whether the grammatical subject or the grammatical predicate, respectively, is interpreted at the type-level or at the token-level. Section 2 sets the stage for developing an adequate formalisation. The distinction between uninstantiated arguments and instantiated arguments, proposed by Farkas and de Swart (2003) to cope with incorporation structures across languages, is introduced. Dayal (2003) has presented Hindi evidence against giving uninstantiated arguments an existential interpretation by necessity, as Farkas and de Swart do.

Address for correspondence: Department of Slavonic Studies, Beethovenstr. 15, 04107 Leipzig, Germany. E-mail: [email protected].
This is discussed in section 3. Adopting an idea of McNally and Van Geenhoven, it is argued that the source of existentiality of incorporated nominals is located outside of the nominal itself. My way of distinguishing between type-level arguments and token-level arguments, as proposed in section 4, captures this insight. In section 5 I demonstrate how my version of a distinction between type-level and token-level arguments is capable of deriving the sentence typology of section 1 in a systematic way. Section 6 concludes the paper.

2 The Underdeterminacy of English Noun Phrases
It is a well-known fact that English noun phrases have a dual referential potential. A noun phrase may refer not only to object individuals, but also to kind individuals (Krifka et al., 1995):

(1) The Dutchman is a good sailor.

Sentence (1) is ambiguous between at least two readings, dependent upon the interpretation of the subject noun phrase. Under the first reading it expresses the proposition that Dutchmen in general are good sailors. In this case the Dutchman refers to a kind and the grammatical predicate is a good sailor ascribes a property to this kind. Under the second reading the sentence expresses the proposition that a particular person who is a Dutchman qualifies as a good sailor. Here the Dutchman refers to an object and the predicate ascribes a property to this object.

Dayal (2004) considers the locus of the ambiguity to be the common noun, which may be interpreted at either the kind-level or the object-level. This view opens up the possibility of a unified semantic analysis of the English articles. The definite article in (1), for example, can under both readings of the sentence be viewed as an operator picking out the single greatest member of a set of individuals (i.e. as the iota-operator). The two different interpretations of the grammatical subject the Dutchman arise because in one case this operator determines the greatest member of a set of kinds whereas in the other case it determines the greatest member of a set of objects. Hence, there is no need to appeal to a special generic definite article besides the canonical one.

Dayal's analysis presupposes a systematic two-way division of the ontology. Every ontological sort manifests itself once as a type-level domain and once as a token-level domain (compare also Dölling, 1992, 1995; Carlson, 2000). Against this background, prototypical nouns denote either within the domain of object types or within the domain of object tokens. In the former case the noun functions as a kind-level predicate, in the latter case it functions as an object-level predicate. Similarly, the prototypical verb denotes either within the domain of event types or within the domain of event tokens.
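To see the unified analysis at work, here is a minimal toy model in Python. It is our illustration, not Dayal's formalism, and every name in it is invented for the example: the noun 'Dutchman' is given both a kind-level and an object-level extension, and one and the same iota-style definite article operates over either set.

```python
# A toy two-sorted model of the proposal sketched above: the noun 'Dutchman'
# has both a kind-level and an object-level extension; the definite article
# is one and the same operator over either set.

DUTCHMAN_KIND = {"the-Dutchman-kind"}   # the kind the noun denotes
DUTCHMAN_OBJ  = {"jan"}                 # contextually salient Dutch objects

def iota(individuals):
    """The definite article as the iota-operator: return the single
    greatest member of a set (here, uniqueness stands in for greatness)."""
    if len(individuals) != 1:
        raise ValueError("presupposition failure: no unique greatest member")
    return next(iter(individuals))

def good_sailor(x):
    # a predicate true of kinds and objects alike in this toy model
    return x in {"the-Dutchman-kind", "jan"}

print(good_sailor(iota(DUTCHMAN_KIND)))  # reading 1 of (1): about the kind
print(good_sailor(iota(DUTCHMAN_OBJ)))   # reading 2 of (1): about an object
```

Nothing in the definite article itself distinguishes the two readings of (1); the ambiguity lies solely in which of the noun's two extensions the operator applies to.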
Although every English noun phrase introduced by an overt article is in principle free to denote on either of the two ontological levels, particular linguistic environments may block one of the two options. The verb form in (2), for example, requires its subject to denote at the object-level, and so does the verb form in (3) with respect to its direct object:

(2) The telephone is ringing.

(3) Little John has built a snowman.

The subject of (4) and the direct object of (5) must be kind-level expressions, by contrast, because this is what the respective verb forms select for:

(4) The Kalashnikov is the most widespread weapon in the world.

(5) This morning Fred invented a pumpkin-crusher. [Dayal (2004) (B. Geurts)]

Dayal (2004) presents the sentences under (6) to illustrate kind-level uses of definite singulars, indefinite singulars, definite plurals, and indefinite plurals1:

(6a) The lion is likely to become extinct.

(6b) A lion roars when it is hungry.

(6c) The crustaceans evolved simultaneously.

(6d) Crustaceans can evolve simultaneously.

The morphosyntactic underdeterminacy of English noun phrases also holds for forms introduced by a demonstrative determiner. While this book in (7a) refers to a kind, this book in (7b) refers to an object (Krifka et al., 1995):

(7a) This book sells well.

(7b) This book got wet in the rain.

In other languages the situation may be different, of course. Maori, for example, possesses two indefinite articles, he and teetahi (Chung & Ladusaw, 2003). According to Bauer (1993), their distribution is controlled by the following criterion: "he is used when the type of object is crucial, and teetahi is used when the number of individuals present is significant" (Bauer, 1993, p. 357).

The dual referential potential hypothesis suggests a simple typology of natural language sentences of the structure [S [NP . . . ][VP . . . ]]. Let us speak of "kind generics" when, intuitively speaking, both the NP and the VP are type-level2:

1 Here Dayal objects to Krifka et al.'s (1995, p. 18) position according to which the indefinite in (6b) is nonspecific and non-kind-referring (Dayal, 2004, p. 396).

2 Unlike me, Krifka et al. (1995) favor the view that the verbal predicates in (8b) and (9c) are interpreted on the token-level, with a covert generic operator being responsible for their generic interpretation.
(8a) The flamingo is a bird.

(8b) Sailors smoke.

(8c) Snow is white.

Let us furthermore refer to cases of token-level NPs combining with type-level VPs as "object generics":

(9a) Look kids, this is the lion. [Krifka et al., 1995]

(9b) Moby Dick is a whale.

(9c) Mary smokes.

(9d) John is intelligent.

Under (10) we find plausible candidates for combinations of type-level NPs and token-level VPs, which could be called "kind episodics":

(10a) Around 1570 the potato reached Europe.

(10b) Man set foot on the Moon in 1969. [Krifka et al., 1995]

Token-level NPs meeting token-level VPs yield "object episodics", which come in two varieties. If an episodic is based on an adjectival or a nominal predicative or on a stative verb, it will normally be stative. If it is based on an action verb, it will be eventive/dynamic:

(11a) John is tired.

(11b) At the fancy ball, John was a woman. She was very sexy.3

3 Typically, a nominal predicative gives rise to a generic reading (Carlson, 1977). There are, nevertheless, episodics with nominal predicatives; for discussion see Baker (2003, p. 163), from where example (4) is taken:

(1) In the spring, all the little rivers are a big river. It is very wide.

(2) In this film, Harry is a nun. She saves the world.

(3) Yesterday at the circus, Sue was a clown named Peppino. He was very funny.

(4) In the winter, Merlin is a wolf. It has a brown coat and sharp teeth. [Baker, 2003]

(5) After the witch said these words, the prince was a frog. It was very ugly.

(6) Der Prinz_masc ist eine Unke_fem. Sie_fem ist sehr hässlich. ('The prince is a toad. She is very ugly.') [German]

(7) Leipzig is a huge construction site at the moment. It involves a lot of digging.
(11c) Mary is smoking.

(11d) The landscape is white.

In what follows I am going to systematically derive the semantics of kind generics, object generics, kind episodics and object episodics. My formalisation builds on Farkas and de Swart's (2003) DRT-based approach to the semantics underlying the morphosyntactic process of object incorporation in various languages. Let me therefore briefly introduce the general linguistic properties of object incorporation and the way Farkas and de Swart try to account for them.

3 Semantic Incorporation
According to Mithun (1984, 2000), more often than not noun incorporation is a word formation process where, simply speaking, a nominal lexeme attaches to a verbal lexeme to form a new verbal lexeme. (12) shows an example from West Greenlandic (Van Geenhoven, 1998):

(12) Arnajaraq eqalut-tur-p-u-q.
A.ABS salmon-eat-IND-intrans-3.sg
'Arnajaraq ate salmon'

Does the incorporated noun introduce a discourse referent? With respect to the morphologically incorporated structure in (12) one might want to say that eqalut is not a syntactically represented phrase, and since noun incorporation is a lexical process the question simply does not arise. The question does arise, however, in incorporation structures where the incorporated nominal is realised as a syntactically autonomous phrase. West Greenlandic has this option at its disposal as well. Compare (13) with (14), both taken from Van Geenhoven (1998):

(13) Angunguaq aalisakka-mik neri-v-u-q.
A.ABS fish-INST.sg eat-IND-intrans-3sg
'Angunguaq ate fish'

(14) Angunguu-p aalisagaq neri-v-a-a.
A.ERG fish-ABS eat-IND-trans-3.sg.3sg
'Angunguaq ate the/a particular fish'

In both (13) and (14), the object of eating – fish – is syntactically represented as a case-marked constituent. The constituent aalisagaq in (14) has the status of a complement of the verb. This is witnessed by (i) the fact that the verb shows
object agreement inflection (glossed as "trans") and (ii) the fact that the constituent bears absolutive case, signalling participation in the transitive (ergative-absolutive) case frame. What about (13)? Neither the agreement inflection of the verb nor the oblique case marking of the constituent aalisakkamik indicates transitivity. Thus, although (13) contains two syntactically autonomous nominal constituents, only one of them counts as a complement of the verb. As can be read from the respective translations, this difference in syntactic status corresponds to a difference in interpretation. To terminologically distinguish structures like (13) from cases of "prototypical noun incorporation" (Mithun) like (12), where the noun and the verb compound to form one syntactic word, Dayal (2003) uses the term "pseudoincorporation".

The challenge is to identify the semantic contribution of the pseudoincorporated noun phrase and to spell out the way this expression composes with the verb. This inevitably forces one to answer the question of whether the pseudoincorporated noun phrase introduces a discourse referent or not.

Languages making use of pseudoincorporation vary with respect to the precise morphosyntactic properties of the structure they realise. For instance, Turkish (Kornfilt, 2003) requires the incorporated direct object to appear juxtaposed to the verb, while Hindi (Dayal, 1999, 2003) does not. Hungarian overtly case-marks incorporated direct objects, while Turkish does not. Pseudoincorporation structures also differ from language to language with respect to interpretive properties. For example, while incorporated singulars in Hungarian cannot serve as antecedents for anaphors, incorporated plurals can (Farkas & de Swart, 2003). While object incorporation in Hindi is subject to idiosyncrasies (Dayal, 1999, 2003), object incorporation in Maori is a productive process (Bauer, 1993).

Despite these cross-linguistic variations, it is generally presumed that there is a common semantic mechanism underlying the grammatical incorporation of direct objects. An adequate theoretical reconstruction of this mechanism must be able to explain why, unlike non-incorporated noun phrases, pseudoincorporated noun phrases are felt to have a generic meaning. Moreover, it must answer the following questions. Why do pseudoincorporated noun phrases always have narrow scope? Why is the morphological number value of the noun in many cases irrelevant for the interpretation of the noun phrase? Why are pseudoincorporated nouns often unable to serve as antecedents for anaphors? Why is it that verb phrases formed via pseudoincorporation are often only licensed in case they denote well-established activities?

Different answers have been given in the recent past. Van Geenhoven (1998) derives her theory of semantic incorporation on the basis of data from West Greenlandic. Dayal (1999, 2003) develops a semantics for pseudoincorporation in Hindi. Chung and Ladusaw (2003) propose a special mode of composition to
deal with the peculiarities of certain pseudoincorporated noun phrases in Maori. Last but not least, Farkas and de Swart (2003) offer an extension of DRT that has the potential to account for incorporation structures, among others, in Hungarian. All of these authors attempt to formalise more or less the same intuition. As Mithun puts it:

Incorporated nouns [...] qualify the verb, narrowing its scope semantically to pertain to the kind of patient, location, or instrument designated by the noun. (Mithun, 1994, p. 5025)

To pin down this intuition, the pseudoincorporated noun phrase is analysed as a modifier of the verbal predicate, adding its descriptive content as an additional condition to the descriptive content of the verb, thereby creating a more specific event description than the one associated with the verb alone. The authors share the view that the pseudoincorporated noun phrase does not function as a regular argument expression saturating an argument slot of the predicate. Instead, the incorporated nominal is viewed as contributing a property to the sentence meaning.4

Farkas and de Swart (2003) derive their theory of semantic incorporation from Hungarian data. The basic observation is that pseudoincorporated bare singulars cannot serve as antecedents for anaphors. This suggests that either the direct object introduces no discourse referent or it introduces a special kind of non-transparent discourse referent:

(15a) János_i beteget_j vizsgált a rendelőben.
      J. patient.ACC examine.PAST the office.in
      'János patient-examined in the office'

(15b) * ∅_i Túl súlyosnak találta őt_j és beutaltatta ∅_j a kórházba.
      pro too severe.DAT find.PAST he.ACC and intern.CAUS.PAST pro the hospital.in
      'He_i found him_j too sick and sent him_j to hospital'
4 At this point a note on properties is in order. Chierchia (1984) and Chierchia and Turner (1988) point out that properties as ontological entities have a double nature. Each property can be conceived of as either an individual or a function from individuals to truth values. Given this, linguistic signs expressing properties are expected to come in two versions. Those denoting properties-as-individuals are argument expressions, while those denoting properties-as-functions are predicate expressions. Chierchia (1998) reinterprets properties-as-individuals as kinds. It is therefore possible to analyse the property contributed by an incorporated noun phrase either as a kind or as a property-as-function.
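For concreteness, the two guises can be related by Chierchia's (1998) kind-forming operators. The following rendering is a simplified sketch (with s a world/situation index and ι the definite-description operator), not a verbatim quotation of his definitions:

```latex
% 'down': a property-as-function yields the corresponding kind, an individual
\[ {}^{\cap}P \;=\; \lambda s\,.\,\iota x\,[P_s(x)] \]
% 'up': a kind yields the property of being one of its instances in s
\[ {}^{\cup}k \;=\; \lambda x\,.\,x \leq k_s \]
```

On this picture, analysing the incorporated nominal as a kind or as a property-as-function amounts to choosing on which side of these operators composition takes place.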
To capture the lack of discourse transparency of certain nominals, Farkas and de Swart propose a modification of standard DRT by drawing a distinction between discourse referents on the one hand and what they call thematic arguments on the other. The basic idea is that while every noun phrase introduces a thematic argument into the discourse representation structure (DRS) under construction, only a subset of thematic arguments will gain the status of a discourse referent in the final interpretation. To become a discourse referent, a thematic argument has to undergo a grammatical process called Instantiation. Discourse transparency is thus viewed as resulting from the application of Instantiation. The determinerless singular beteget in (15) remains uninstantiated. This comes as no surprise given Farkas and de Swart's hypothesis that triggering Instantiation is the ultimate function of the syntactic category of determiners (Farkas & de Swart, 2003, p. 22). Uninstantiated arguments and instantiated arguments compose with the verb denotation along different lines. Instantiated arguments saturate an argument slot of the verbal predicate in the usual way. Uninstantiated arguments, by contrast, are not capable of doing so. Farkas and de Swart therefore assume an alternative process called Unification to govern the composition of a predicate and an uninstantiated argument. Unification is in effect the identification of the thematic argument of the nominal with a thematic argument of the verb. In the semantic representation, the letters u and v are used by Farkas and de Swart as variables for discourse referents; z is used as a variable for a thematic argument:

(16) Az orvos beteget vizsgált.
     the doctor patient.ACC examine.PAST
     'The doctor patient-examined'

     [u | doctor(u), patient(z), examine(u,z)]

The argument z introduced by the incorporated noun phrase beteget has the status of a thematic argument only and is, by convention, not represented within the universe of the DRS. The argument u introduced by the full-fledged noun phrase az orvos, by contrast, is a discourse referent and as such is listed in the universe of the DRS. As can be seen, the verbal predicate is supposed to relate a discourse referent and a thematic argument. The slot of the discourse referent is saturated by u, and Unification causes the slot of the thematic argument to be restricted to entities having the property of being a patient.
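The division of labour between Instantiation and Unification can be made concrete in a few lines of code. The toy model below is ours, not Farkas and de Swart's own formalism; all names are illustrative:

```python
# A minimal sketch of the contrast: every noun phrase feeds a thematic
# argument plus a condition into the DRS; only Instantiation (triggered by
# a determiner) lists the argument in the universe as a discourse referent.
class DRS:
    def __init__(self):
        self.universe = []    # discourse referents
        self.conditions = []  # conditions on referents and thematic arguments

    def thematic_argument(self, predicate, name):
        self.conditions.append((predicate, name))
        return name

    def instantiate(self, arg):
        self.universe.append(arg)
        return arg

# (16) Az orvos beteget vizsgalt -- determinerless object, Unification only
drs = DRS()
u = drs.instantiate(drs.thematic_argument('doctor', 'u'))  # az orvos
z = drs.thematic_argument('patient', 'z')                  # beteget: never instantiated
drs.conditions.append(('examine', ('u', 'z')))             # Unification: z fills the
                                                           # verb's thematic slot
print(drs.universe)    # ['u'] -- z is unavailable as an antecedent for anaphora
print(drs.conditions)  # [('doctor', 'u'), ('patient', 'z'), ('examine', ('u', 'z'))]
```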
(17) Az orvos egy beteget vizsgált.
     the doctor a patient.ACC examine.PAST
     'The doctor examined a patient'

     [u, v | doctor(u), patient(v), examine(u,v)]

In (17) the direct object egy beteget is realised as a full-fledged noun phrase with an overt determiner. According to Farkas and de Swart, the derivation proceeds as follows. First, by virtue of its lexical meaning, the head noun beteget feeds a condition and a thematic argument into the DRS under construction. Syntactically combining with a determiner, the thematic argument then undergoes Instantiation, thereby gaining the status of a discourse referent. Finally, the discourse referent saturates the argument position of the verb. As can be read from the final DRS under (17), the universe contains two discourse referents, and the predicate condition relates these two discourse referents to each other. The picture is complicated by the fact that Hungarian bare plural objects do provide antecedents for anaphors. This is surprising given that the introduction of discourse referents follows from the presence of a determiner:

(18a) János_i betegeket_j vizsgált a rendelőben.
      J. patients.ACC examine.PAST the office.in
      'János patients-examined in the office'

(18b) ∅_i Túl súlyosnak találta őket_j és beutaltatta ∅_j a kórházba.
      pro too severe.DAT find.PAST they.ACC and intern.CAUS.PAST pro the hospital.in
      'He_i found them_j too sick and sent them_j to hospital'

To account for the discourse transparency of pseudoincorporated bare plurals in Hungarian (as well as in English), Farkas and de Swart propose an additional mechanism called Secondary Instantiation. The idea behind Secondary Instantiation is, very briefly, that the plural morphology of the noun contributes a presupposed non-atomic discourse referent. In the absence of overt determiners, this presupposed referent needs to be accommodated at the end of the derivation of the DRS. This explains why only bare singulars lack discourse transparency in Hungarian pseudoincorporation structures.
Within DRT, whether a sentence is true or false depends on whether there is an embedding function relating every discourse referent (listed in the universe of the DRS induced by the sentence) to some entity in the world/model such that the entity satisfies each condition (listed in the condition set of the DRS) imposed on the respective discourse referent. Farkas and de Swart’s conception of uninstantiated arguments produces conditions imposed on arguments not corresponding to variables in the universe of a DRS. This raises the question of how to interpret uninstantiated arguments. What is it that the embedding function maps thematic arguments in final DRSs onto? Farkas and de Swart postulate that uninstantiated arguments must be related to some entity in the model (Farkas & de Swart, 2003, p. 63). As a consequence, uninstantiated arguments are necessarily interpreted existentially.
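A brute-force sketch of this verification procedure (the toy model and all names are invented for illustration) makes the consequence visible: because uninstantiated thematic arguments are embedded alongside the discourse referents, they come out existential:

```python
# A DRS counts as true in a model iff some embedding of its variables into
# the model's entities satisfies every condition; per Farkas & de Swart's
# postulate, thematic arguments are embedded too.
from itertools import product

def verify(universe, thematic, conditions, model):
    variables = universe + thematic
    for values in product(model['entities'], repeat=len(variables)):
        g = dict(zip(variables, values))
        if all(tuple(g[v] for v in args) in model[pred]
               for pred, args in conditions):
            return True
    return False

drs16 = (['u'], ['z'],
         [('doctor', ['u']), ('patient', ['z']), ('examine', ['u', 'z'])])

model = {'entities': {'d1', 'p1'}, 'doctor': {('d1',)},
         'patient': {('p1',)}, 'examine': {('d1', 'p1')}}
print(verify(*drs16, model))   # True

# A model without any patient falsifies (16): the thematic argument z is
# interpreted existentially, exactly as stated above.
model2 = {'entities': {'d1'}, 'doctor': {('d1',)},
          'patient': set(), 'examine': set()}
print(verify(*drs16, model2))  # False
```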
4 The Existentiality Problem
Even though developed on the basis of Hungarian data, the theory of Farkas and de Swart (2003) aims at cross-linguistic validity. More than once the authors point out that their model is supported by the Hindi data presented in Dayal (1999). Dayal (2003), however, observes that pseudoincorporated nominals in Hindi do not necessarily entail the existence of objects satisfying the nominal description. If the aspectual value of the verbal predicate is imperfective, it may happen that the existence of particular referents of the incorporated nominal is not entailed. Here is the relevant example:5

(19) anu aaj kal santare bectii hai.
     A. these-days oranges sells
     'Anu sells oranges these days'
5 The morphosyntactic rules of Hindi license bare nominals. There is no definite article in Hindi, and whether the formative ek ('one') should be counted as an indefinite article or as a numeral is not entirely clear (Dayal, 2004, p. 418). Incorporated direct objects in Hindi manifest themselves as either bare singulars or bare plurals. Dayal (2003) shows that the incorporated nominal, be it singular or plural, behaves like a syntactic complement of the verb in triggering the agreement pattern of a transitive verb. Moreover, certain syntactic constituents, like, for instance, the negation particle, may intervene between the noun and the verb. Dayal points out that Hindi incorporation is subject to idiosyncrasies – not all noun-verb combinations are possible. Finally, while the incorporation of a bare singular is only possible if it leads to a well-established category like SELL OLD BOOKS, bare plural incorporation may also result in verb phrases describing ad hoc categories like SELL EXPENSIVE BOOKS (Dayal, 1999, p. 12).
According to Dayal, the sentence can be true even in a situation where there are no particular oranges:6

The activity of oranges-selling simply requires certain types of moves. Setting up a shop, determining some prices for different types of oranges, entering into a contract with a supplier might be enough to qualify as engaging in oranges-selling. And if no oranges get delivered, the sentence would still be true. (Dayal, 2003, p. 25)

This poses a problem for any incorporation theory which, like Farkas and de Swart (2003), automatically derives the existence of an object having the property of the incorporated nominal as a truth condition.7 Dayal's observation calls for a theory of semantic incorporation that leaves open the possibility of incorporated nominals not entailing the existence of particulars.8 Such a theory might be Van Geenhoven (1998), because for Van Geenhoven it is the lexical meaning of the verb rather than any general principle that delivers the entailment: existentiality follows from the semantics of the incorporated nominal composing with the semantics of the verb. In the West Greenlandic sentence (20), for example, the incorporated nominal is analysed as supplying a property-as-function and the verb as containing a lexicalised existential quantifier, so that the interpretation of noun and verb together entails the existence of objects having the nominal property (Van Geenhoven, 1998, pp. 131–141).
6 If you are not convinced by example (19), consider the following one: The bakery in my street sells Tut Anch Amun cakes these days. This sentence can certainly be true even if every particular Tut Anch Amun cake has been sold (and consumed) and no other bakery in the world has this type of cake in its assortment.

7 Besides Farkas and de Swart (2003), the theory of Chung and Ladusaw (2004) is of this kind. One sort of Maori indefinite is considered to compose with the verb via a special mode of composition called Restrict. Restrict equals Unification in that a complex predicate is formed by letting two descriptions apply to the same argument. The source of existentiality of the incorporated nominal is automatic existential closure at the event level. Unlike Farkas and de Swart, however, Chung and Ladusaw aim at a language-particular formal description.

8 Dayal herself hyphenates the two properties in the semantic representation, so that no commitment is made to the existence of objects having the nominal property. A high degree of likelihood for such an object to exist follows from a conceptual requirement imposed on the event variable. Here is the semantics of (19) according to Dayal (2003):

∃e.Imp(Oranges-Selling(e) ∧ Ag(e) = anu ∧ Appropriately-Classificatory(e))

I am inclined to doubt the explanatory power of simply hyphenating two properties in the metalanguage.
(20) Arnajaraq iipili-tur-p-u-q.
     A.ABS apple-eat-IND-intrans-3.sg
     'Arnajaraq ate apples'

Van Geenhoven's approach is special in still another respect. Whereas for Farkas and de Swart (2003) it is legitimate to speak of semantic incorporation only if the incorporation process manifests itself in some way or other in morphosyntactic structure (e.g. in terms of the absence of a determiner), semantic incorporation in the sense of Van Geenhoven need not be formally signalled. Although English has no special morphosyntactic incorporation structure, it is analysed by Van Geenhoven as having semantically incorporated noun phrases. English noun phrases in certain argument positions are considered to compose with their predicate via the same semantic mechanism as overtly incorporated nominals in West Greenlandic do. The direct object of a verb form like ate in the English translation of (20) is a case in point.9 Indefinites appearing in this position (like the bare plural apples) are traditionally characterised as "weak" nominals. Van Geenhoven (1998) draws our attention to the fact that weak indefinites and incorporated nominals show a very similar linguistic behaviour. English transitive sentences with weak indefinite direct objects often carry an existential entailment. It is impossible that Arnajaraq ate apples when there have never been any apples, for example. According to Van Geenhoven (1998), it is not the nominal that is the source of existentiality in these cases, but rather the verb form. On the basis of this conviction, McNally and Van Geenhoven (1998) suggest a (re-)definition of the classic notional distinction between weak and strong nominals:

Weak expressions depend on the verb with which they combine for what we might call their referential force. It is the verb, not the nominal, that licenses the introduction of the discourse referent intuitively associated with the weak nominal. Strong nominals [...] all have in common that they do not depend on the verb for their referential force. (McNally & Van Geenhoven, 1998, p. 15)

I propose to identify McNally and Van Geenhoven's distinction between weak nominals and strong nominals with the distinction between kind-level nominals and object-level nominals (cf. section 2). So let us distinguish between two sorts of discourse referents, kind referents and object referents.
9 Van Geenhoven (1998) also considers the pivot of an existential sentence to be a semantically incorporated noun phrase. That the truth of an existential sentence like There are flies in my soup entails the existence of flies is a consequence of the existential predicate's application to the property supplied by the pivot noun phrase; see also McNally (1998).
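In an extensional simplification – ours, not Van Geenhoven's exact intensional formulation – the incorporating verb can be sketched as lexicalising the existential quantifier and taking the nominal property as its argument:

```latex
\[ [\![\mathrm{eat}_{\mathrm{inc}}]\!] \;=\;
   \lambda P_{\langle e,t\rangle}\,\lambda x\,.\,
   \exists y\,[\mathrm{eat}(x,y) \wedge P(y)] \]
\[ [\![\mathrm{eat}_{\mathrm{inc}}]\!]([\![\mathrm{apples}]\!]) \;=\;
   \lambda x\,.\,\exists y\,[\mathrm{eat}(x,y) \wedge \mathrm{apples}(y)] \]
```

On this composition, the existential entailment of a sentence like the English translation of (20) stems from the verb's lexical entry; a verb lacking such an entry can combine with the same kind of property-denoting object without entailing instances.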
What "having referential force" boils down to is "introducing an object discourse referent". Following McNally and Van Geenhoven's suggestion, we can expect that there are two ways for an object discourse referent to be introduced into the semantic representation. Under the first possibility, this is achieved by a (strong/object-level) nominal in itself capable of introducing an object discourse referent. Under the second possibility, this is achieved by a (weak/kind-level) nominal in itself not capable of doing so. To introduce an object referent in the latter case, the nominal is in need of semantic support "from outside", e.g. from the side of the verbal predicate. If there is no support coming from the co-text, the nominal will be understood as introducing no object referent, but "only" a kind referent. Below I shall couch this story in terms of Farkas and de Swart's DRT-version. Jumping ahead, I will say that basic lexical predicates are type-level predicates. As such, they introduce into the condition set of the DRS under construction a thematic kind argument together with a condition imposed on it. Instantiating this thematic kind argument brings about the introduction of a kind discourse referent into the universe of the DRS. If, however, prior to instantiation, the basic type-level predicate is converted into a token-level predicate, then instantiation brings about the introduction of an object discourse referent. Technically, this conversion – call it "spatiotemporal localisation" – is the replacement of a thematic kind argument by a thematic object argument. Therefore, if spatiotemporal localisation applies during the course of the nominal projection, the nominal will denote at the token-level. If, however, spatiotemporal localisation does not apply during the course of the nominal projection, the nominal will denote at the type-level. This does not exclude the possibility of spatiotemporal localisation applying "from outside" to a fully projected nominal. If a type-level noun phrase appears as the direct object of the verb form caught, for example, spatiotemporal localisation of the nominal's thematic argument is required by the verb. In this case, then, independently of whether the nominal itself projects up to the token-level, its interpretation within the context of the verb phrase always yields the introduction of a token discourse referent. Therefore, unlike (21a), (21b) has no interpretation under which the existence of a particular unicorn is not entailed (Van Geenhoven & McNally, 2002):

(21a) John is looking for a unicorn.
(21b) John caught a unicorn.

If there is no existential import coming from the verb, as in (21a), no token-level interpretation of the nominal is forced. Similarly, (22a) can be true in a world/situation not containing any token lion with toothaches. Recall from above that, according to Dayal (2003), the truth of (22b) does not entail the existence of particular oranges.
(22a) The tamer fears a lion with toothaches.
(22b) Anu sells oranges these days.

As pointed out by von Heusinger (2002), definite noun phrases show a similar behaviour. Although it is a little more difficult to construct examples with type-level definites, if only the noun is chosen to describe some well-established category (established, for example, on the menu of a restaurant or by virtue of a partition of the world into postal districts), then the pattern repeats itself also for definites:

(23a) Laura ordered the Asian noodles with shrimps.
(23b) Mary handled the mail from Antarctica. [Krifka et al., 1995]
On the one hand, the definite noun phrase in (23a) can actualise a token-level reading under which it is a particular portion of Asian noodles with shrimps which Laura ordered (e.g. by pointing to a showcase). On the other hand, the definite can actualise a type-level reading where no commitment to the existence of particular Asian noodles with shrimps is made. On this latter interpretation, the sentence would truly describe a situation in a restaurant even if the restaurant had run out of Asian noodles with shrimps at the moment when Laura was doing the ordering. Virtually the same has been said with respect to (23b): "[I]t can either mean that Mary was in charge of the mail of Antarctica in general (even if there never was some real mail from Antarctica), or that she handled some particular batch of mail from Antarctica" (Krifka et al., 1995, p. 9).

(24a) Jones wants to meet the president.
(24b) Mary wants to meet the perfect man.10

Sentence (24a) can either mean that there is a certain person who is the president such that Jones wants to meet this person (Donnellan's (1966) referential use), or that Jones wants to meet the president regardless of the particular person doing this job (Donnellan's attributive use). Similarly, (24b) can mean that there is a certain man who qualifies as the perfect man and Mary wants to meet this person. On the other hand, it can mean that Mary wants to meet the perfect man, not knowing who he is or even whether he exists at all.
10 The definite noun phrase of (24b) is from Ward and Birner (1994), where it is shown to qualify as the pivot in existential sentences (i.e. not triggering the definiteness effect): There was the perfect man for Mary in my history class. This is not by chance, given that the pivot composes with the existential predicate via semantic incorporation, as suggested by Van Geenhoven (1998).
Interestingly, it seems that, in out-of-the-blue contexts, with (24a) the token-level/referential reading is easier to get, while with (24b) the type-level/attributive reading is more salient. Note that, here again, in order to actualise a type-level definite reading, the nominal must describe a well-established category. The definite ad hoc description in (25) has only the existence-entailing token-level reading:

(25) Mary wants to meet the young linguist who investigates Abkhaz and Samoan.

At this point, remember that reference to kinds by means of a definite noun phrase requires the kind referent to be well-established (Krifka et al., 1995):

(26a) The Coke bottle has a narrow neck.
(26b) The green bottle has a narrow neck.

The subject of (26b) must be understood as referring to a token bottle unless one well-establishes the kind GREEN BOTTLE contextually (Dayal, 2004, p. 425):

(27) The factory produces two kinds of bottles, a green one for medical purposes and a clear one for cosmetics. The green bottle has a long neck.

Think of a definite subject term describing a category which is well-established (like COKE BOTTLE) and at the same time not known for having instances in the actual world (unlike COKE BOTTLE). Such subjects may well give rise to sentences which are true relative to the actual world, as witnessed by (28):

(28a) The dinosaurs are either herbivorous or carnivorous.
(28b) The perfect man for Mary speaks Abkhaz, Samoan and Evenki.

The same holds for indefinite subjects, the difference being only that there is no well-establishedness (or: familiarity) requirement. Neither does the truth of (29a) entail the existence of lions with toothaches, nor does the truth of (29b) entail the existence of a unicorn:

(29a) Lions with toothaches are dangerous. [Krifka et al., 1995]
(29b) A unicorn has a horn.

What I conclude from these observations is that every English NP-type – be it a singular or plural indefinite or even a singular or plural definite – can be used without "referential force", i.e. without entailing the existence of particular objects satisfying the nominal description. The morphosyntax of English noun phrases, it turns out, is underspecified with respect to the distinction between interpretations at the token-level, entailing the existence of objects, and interpretations at the type-level, not entailing the existence of objects. English differs in this regard from languages which overtly signal type-level arguments of token-level verbal predicates by a morphosyntactic realisation strategy that can be called (pseudo)incorporation.
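One illustrative way to render this underspecification – the notation is ours, not Dayal's (compare her formula in footnote 8) – is to let the object slot be filled either by an existentially bound token or by a type:

```latex
% token-level reading: orange tokens are entailed
\[ \exists e\,\exists y\,[\mathrm{sell}(e) \wedge \mathrm{oranges}(y)
   \wedge \mathrm{Ag}(e)=\mathrm{anu} \wedge \mathrm{Th}(e)=y] \]
% type-level reading: the event type is merely restricted by the kind
\[ \exists e\,[\mathrm{sell}(e) \wedge \mathrm{Ag}(e)=\mathrm{anu}
   \wedge \mathrm{Th}(e)=\mathrm{ORANGES}] \]
```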
5 Type-level Arguments and Token-level Arguments
In this section I propose to augment Farkas and de Swart's DRT-version with a distinction between type-level arguments and token-level arguments. This move leads to different predictions concerning the existentiality of the incorporated nominal. Specifically, I suggest that thematic arguments are basically type (or: kind) arguments which may or may not be promoted to token (or: object) arguments, depending on whether the grammatical mechanism of spatiotemporal localisation applies or not.11 My proposal has the following features (a toy sketch of this architecture is given below):

• The lexical meaning of every nominal predicate includes a thematic kind argument which will be introduced into the discourse every time the predicate is chosen to participate in the structure of a sentence.

• A thematic kind argument introduced into the discourse can but need not be promoted (spatiotemporally localised) to a thematic object argument.

• Instantiation of a thematic kind argument results in the introduction of a kind discourse referent; instantiation of a thematic object argument leads to the introduction of an object discourse referent.

• Kind discourse referents in final DRSs are not interpreted existentially (only object-level expressions entail the existence of objects).

• That a kind-level expression by itself is not interpreted existentially does not exclude the possibility that the interpretation of the sentence as a whole entails the existence of an object satisfying the description of the kind-level expression. This situation arises whenever the verbal predicate comes with an existence claim concerning its argument.

• Noun phrases accompanied by the definite article the require their discourse referent, be it a kind referent or an object referent, to be familiar (previously established), whereas the presence of the indefinite article a imposes a novelty as well as an atomicity condition on the respective discourse referent. A general blocking mechanism (Chierchia, 1998) bars bare nominals from those contexts for which there are lexicalised determiners in the language system.

So let us list kind referents (symbolised here by capital letters U, V, W, ...) in the universe of the DRS side by side with object arguments (symbolised by small letters u, v, w, ...) and indicate their link by adding an additional relation "loc(u,U)" to the condition set of the DRS.
11 Spatiotemporal localisation plays virtually the same role as Carlson's (1977) realization relation.
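Continuing the toy DRT sketch from section 3 (labels illustrative, not an official formalism), the amended architecture can be rendered as follows:

```python
# Thematic arguments start at the kind level; spatiotemporal localisation
# may promote them to the object level before instantiation.
class DRS:
    def __init__(self):
        self.universe, self.conditions = [], []

def kind_argument(drs, pred, K):
    drs.conditions.append((pred, K))            # e.g. ('doctor', 'U')
    return K

def localise(drs, K, token):
    drs.conditions.append(('loc', (token, K)))  # token instantiates kind K
    return token

def instantiate(drs, arg):
    drs.universe.append(arg)                    # kind OR object referent
    return arg

# (30) Az orvos beteget vizsgalt
drs = DRS()
U = instantiate(drs, kind_argument(drs, 'doctor', 'U'))
u = instantiate(drs, localise(drs, U, 'u'))   # az orvos: localised, then instantiated
V = instantiate(drs, kind_argument(drs, 'patient', 'V'))  # beteget: kind level only
E = instantiate(drs, kind_argument(drs, 'examine', 'E'))
e = instantiate(drs, localise(drs, E, 'e'))   # past tense localises the event
drs.conditions += [('actor', ('e', 'u')), ('undergoer', ('E', 'V'))]
print(drs.universe)  # ['U', 'u', 'V', 'E', 'e'] -- no object referent for the patient
```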
Compare now the DRS induced by the Hungarian sentence (16), given Farkas and de Swart's (2003) DRT-version, with the one induced by the same sentence, (30), given the modified DRT-version proposed here. From now on I assume that verbs, too, introduce thematic (event) arguments, possibly localised by the appropriate tense/aspect information:

(30) Az orvos beteget vizsgált.
     the doctor patient.ACC examine.PAST
     'The doctor patient-examined'

     [u, e, U, V, E | doctor(U), loc(u,U), patient(V), examine(E), loc(e,E), actor(e,u), undergoer(E,V)]

I assume that, simply by virtue of being a noun phrase, any noun phrase introduces a discourse referent into the DRS under construction – either a kind referent or an object referent. That is to say, the semantic composition of any descriptive noun phrase must involve the instantiation of a thematic argument. An overt article is the lexical manifestation of this derivational step. The idea is that with noun phrases lacking an overt article, instantiation always applies at the kind-level.12 If this kind referent fills an object argument slot in a predicate, the result will be an existential interpretation of the bare nominal (Chierchia, 1998, p. 364). Accordingly, sentence (30) introduces an object referent u and two kind referents U and V into the discourse, whereby U corresponds to the kind DOCTOR, V corresponds to the kind PATIENT, and u is an instance (spatiotemporal manifestation) of U. The sentence furthermore introduces an event token referent e and an event type referent E, whereby E corresponds to the event type EXAMINE and e is an instance of E. Moreover, u is related to e via the participant relation "actor", and V is related to E via the participant relation "undergoer". Sentence (17), repeated here as (31), contains the same lexical material as (30), differing only in the morphosyntactic realisation of the second noun phrase, which is now accompanied by the indefinite article egy. This brings it about that, on top of the kind argument V, an object argument v is introduced into the DRS:
12 With the exception of those noun phrases where spatiotemporal localisation, i.e. the step from the kind-level to the object-level, is brought about by nominal modifiers, as in parts of that machine, books she lost yesterday, etc. (Carlson, 1977; Chierchia, 1998).
(31) Az orvos egy beteget vizsgált.
     the doctor a patient.ACC examine.PAST
     'The doctor examined a patient'

     [u, v, e, U, V, E | doctor(U), loc(u,U), patient(V), loc(v,V), examine(E), loc(e,E), actor(e,u), undergoer(e,v)]

The DRS of (30) and the DRS of (31) differ from each other in that they impose different requirements on the embedding functions making them true. The DRS of (31), to begin with, explicitly requires the truthmaking model to contain two object individuals, one satisfying the condition of being a doctor (of being an instance of the kind DOCTOR) and one satisfying the condition of being a patient (of being an instance of the kind PATIENT). The DRS of (30), by contrast, does not explicitly require an object instance of the kind PATIENT. What it does require the model to contain, however, is an event type EXAMINE and an instance thereof. The event type must be restricted to instances of the kind PATIENT as possible undergoer participants,13 and the event token must have an instance of the kind DOCTOR as the actor participant. To the extent that examining is impossible without an examined object, this constrains the set of truthmakers of (30) to those models containing at least one object individual satisfying the condition of being a patient, even though this is not explicitly required.
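Feeding the two DRSs into the brute-force verifier from section 3 (toy model invented) makes this difference explicit: a model containing the relevant kinds, a doctor token and an event token, but no patient token, verifies (30) yet falsifies (31):

```python
from itertools import product

def verify(universe, conditions, model):
    for values in product(model['entities'], repeat=len(universe)):
        g = dict(zip(universe, values))
        if all(tuple(g[v] for v in args) in model[pred]
               for pred, args in conditions):
            return True
    return False

model = {'entities': {'DOCTOR', 'PATIENT', 'EXAMINE', 'd1', 'e1'},
         'doctor': {('DOCTOR',)}, 'patient': {('PATIENT',)},
         'examine': {('EXAMINE',)},
         'loc': {('d1', 'DOCTOR'), ('e1', 'EXAMINE')},  # no patient token!
         'actor': {('e1', 'd1')},
         'undergoer': {('EXAMINE', 'PATIENT')}}

conds30 = [('doctor', ['U']), ('loc', ['u', 'U']), ('patient', ['V']),
           ('examine', ['E']), ('loc', ['e', 'E']),
           ('actor', ['e', 'u']), ('undergoer', ['E', 'V'])]
conds31 = [('doctor', ['U']), ('loc', ['u', 'U']), ('patient', ['V']),
           ('loc', ['v', 'V']), ('examine', ['E']), ('loc', ['e', 'E']),
           ('actor', ['e', 'u']), ('undergoer', ['e', 'v'])]

print(verify(['u', 'e', 'U', 'V', 'E'], conds30, model))       # True
print(verify(['u', 'v', 'e', 'U', 'V', 'E'], conds31, model))  # False
```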
6 Deriving the Sentence Typology
To conclude, let us look at how the present proposal derives the semantics of the sentences presented in section 1.
13 This is the literal way to formalise Mithun's description of the function of incorporated nominals (emphasis added): "Incorporated nouns [...] qualify the verb, narrowing its scope semantically to pertain to the kind of patient, location, or instrument designated by the noun" (Mithun, 1994, p. 5025); "The N does not refer to a specific, countable entity. Instead, it narrows the scope of the V to an activity directed at a certain type of patient" (Mithun, 1984, p. 852).
Kind generics do not refer to tokens at all. The DRS of (32) is true if, within the model, the single greatest member of the set of types of flamingos is identical to one member chosen from the set of types of birds. The DRS of (33) is true if the structure of the model supports that any type of sailor is at the same time a type of smoker, i.e. qualifies for the actor role of the event type SMOKE. The fact that there are certainly particular sailors ("sailor-tokens") who do not smoke does not question the truth of (33), as long as the asserted relation is among types.

(32) The flamingo is a bird.

     [U, V | flamingo(U), bird(V), U = V]

(33) Sailors smoke.

     [U, E | sailors(U), smoke(E), actor(E,U)]

Characterising sentences predicate an essential property of the object referred to by the subject term (Krifka et al., 1995, p. 13). We can model essential property assignments by letting spatiotemporal localisation apply only to the subject and relating the kind argument introduced by the predicate to the kind argument underlying the object argument of the subject term. Accordingly, the DRS of (34) is true if the model contains an object individual which is an instance of that member of the set of types of animals which is identical to the greatest member of the set of types of lions.

(34) Look kids, this animal is the lion.

     [u, U, V | animal(U), loc(u,U), lion(V), U = V]

Proper names are special in that they directly introduce an object referent into the DRS. Spatiotemporal localisation is superfluous with proper names, so to
speak. Because reference to an object presupposes the presence of an underlying kind (La Palme Reyes, Macnamara and Reyes, 1994; Dölling, 1995), the proper name Moby Dick in (35) introduces not only an object argument u (bearing the name Moby Dick) but also a kind argument U, as well as a condition loc(u,U). The sentence furthermore differs from (34) in that the predicative is represented by an indefinite. Nevertheless, (35) is a characterising sentence. Its DRS is true if one member chosen from the set of types of whales is identical to the kind underlying the object individual named Moby Dick:

(35) Moby Dick is a whale.

     [u, U, V | Moby Dick(u), loc(u,U), whale(V), U = V]

Unlike (34) and (35), sentence (36) expresses an accidental property assignment. To model episodicity, I assume that a woman in (36) introduces not only a kind argument but, on top of that, an object argument. This analysis is supported by the observation that the noun phrase a woman provides an antecedent for the pronoun she.

(36) At the fancy ball, Harry was a woman. She was very sexy.

     [u, v, U, V | Harry(u), loc(u,U), woman(V), loc(v,V), u = v]

This DRS is true if the model contains the object named Harry and a kind underlying Harry, if it contains the kind WOMAN as well as an instance of it, and if this instance of the kind WOMAN is identical to the object named Harry.14

14 A reviewer points out that the DRS for (36) entails, contrary to what we understand from the sentence, that Harry is a woman: if u and v were identical, then every condition imposed on u must at the same time be a condition imposed on v. What the DRS is supposed to encode is the information that, within the given context of utterance, the object to which one may refer by means of the proper name Harry is an object to which one may refer by means of the indefinite description a woman. Objects are spatiotemporally bounded entities. Therefore, for two objects to be identical, it suffices that they occupy the same spatial and temporal extension – as do Harry and a specific woman at the fancy ball in (36). For two objects to be identical it is not required that they be instances of identical sets of kinds. If Harry were indeed a woman (not only in the context of a fancy ball), the DRS would contain the condition U = V.

Sentence (37) is again a characterising sentence. The verb introduces an event type argument E and a condition smoke(E). Since Mary appears as the subject of smokes, the actor (smoker) role of the event type is additionally required to be identical to the kind underlying this proper name. According to this analysis, the following would be an adequate paraphrase: 'Mary is an instance of the type of people (individuals) who qualify as smokers, i.e. who typically smoke'. Note that "typically smoke" is reflected by a non-localised event argument in the final DRS, so that no appeal is made to a generic operator (of whatever format).15

(37) Mary smokes.

     [u, U, E | Mary(u), loc(u,U), smoke(E), actor(E,U)]

Example (38) is an eventive episodic sentence. The event type argument of the action verb smoke undergoes spatiotemporal localisation, so that an additional event token argument is introduced into the DRS. The bearer of the name Mary is designated as playing the actor role in the event token referred to.

(38) Mary is smoking.

     [u, e, U, E | Mary(u), loc(u,U), smoke(E), loc(e,E), actor(e,u)]

15 I draw a categorial distinction between "capacity generics", which report on relations among types, as in Mary is a smoker, and "habitual generics" (pluractionals), which report on quantifications over event tokens, as in Mary smokes a cigarette every morning. This terminology is adopted from Green (2000), who shows that these two types of genericity are formally distinguished in African American English.
Last but not least, we saw that type-level subjects can go together with token-level predicates; we called these cases kind episodics:

(39) Man set foot on the moon in 1969.

     [e, U, E | man(U), set-foot-on-moon(E), loc(e,E), actor(E,U), in-1969(e)]
7 Conclusion
English is the kind of language that shows no direct morphosyntactic reflexes of the ontological type/token distinction. However, in view of the bulk of indirect evidence found in connection with the Carlsonian stage-level/individual-level contrast, Bach (1994, p. 277) is led to conclude that the distinction does manifest itself in English grammar as a "covert category". The proposal outlined in the present paper supports this view. Even more, it implies that the type/token distinction is fundamental to every natural language grammar. Formally underdetermined,16 an English noun phrase (or determiner phrase) is interpreted in accordance with whether the predicate selects for a type term or a token term in the particular context at hand. Note that, in order to maintain underdeterminacy, one has to give up the often made assumption (e.g. Vergnaud & Zubizarreta, 1992) that the English articles map type expressions onto token expressions. Languages such as Hungarian or Turkish unequivocally signal the type-level status of certain complements of token-level verbal predicates by dropping the determiner or the case marker, respectively. The fact that many languages display a systematic formal contrast between "non-specific" noun phrases and "specific" noun phrases, cross-cutting the definite/indefinite distinction, may likewise belong here (Givón, 1984; von Heusinger, 2002).17
16 I have to remain agnostic as to whether and how prosodic structure serves to disambiguate noun phrase usages in English.

17 It seems to me that type-referring phrases which do not undergo spatiotemporal localisation, neither by themselves nor "from outside", get a generic interpretation; that indefinite type-level expressions which appear in syntactic contexts calling for a token-level interpretation get a nonspecific existential interpretation; and that indefinites which autonomously project up to the token-level get a specific interpretation. Admittedly, this is a vague statement, needing substantiation via future research.
To conclude this paper, let me mention a suggestion put forward in typological linguistics. Observations on the phrase structure of Tongan lead Broschart (1997) to postulate two general language types. A prototypical "type-token language" like Tongan formally distinguishes type expressions from token expressions, at the same time leaving the distinction between words projecting argument phrases (nouns) and words projecting predicate phrases (verbs) underspecified. A "noun-verb language" like Latin, by contrast, draws a clear-cut lexical distinction between nouns and verbs while not distinguishing between type expressions and token expressions at the level of phrases. English does not represent the prototypical case of a noun-verb language, because so many lexical units are not unequivocally categorised as nouns or verbs. Nevertheless, English is a good member of the noun-verb language category.

References

Bach, E. (1994). The semantics of syntactic categories: A cross-linguistic perspective. In J. Macnamara & G. E. Reyes (Eds.), The logical foundations of cognition (pp. 264–281). New York: OUP.

Baker, M. (2003). Lexical categories: Verbs, nouns and adjectives. Cambridge: CUP.

Bauer, W. (1993). Maori. London and New York: Routledge.

Broschart, J. (1997). Why Tongan does it differently: Categorial distinctions in a language without nouns and verbs. Linguistic Typology, 1, 123–165.

Carlson, G. (1977). Reference to kinds in English. Amherst: University of Massachusetts.

Carlson, G. (2000). Weak indefinites. (unpublished manuscript)

Chierchia, G. (1984). Topics in the syntax and semantics of infinitives and gerunds. Amherst: University of Massachusetts.

Chierchia, G. (1998). Reference to kinds across languages. Natural Language Semantics, 6, 339–405.

Chierchia, G., & Turner, R. (1988). Semantics and property theory. Linguistics & Philosophy, 11, 261–302.

Chung, S., & Ladusaw, W. (2004). Restriction and saturation. Cambridge: MIT Press.

Dayal, V. (1999). Bare singular reference to kinds. In Proceedings of SALT IX. Ithaca, New York: Cornell University.
Dayal, V. (2003). A semantics for pseudo incorporation. (unpublished manuscript)
Dayal, V. (2004). Number marking and (in)definiteness in kind terms. Linguistics & Philosophy, 27, 393–450.

Dölling, J. (1992). Flexible Interpretation durch Sortenverschiebung. In I. Zimmermann & A. Strigin (Eds.), Fügungspotenzen (pp. 1–65). Berlin: Akademieverlag.

Dölling, J. (1995). Ontological domains, semantic sorts and systematic ambiguity. International Journal of Human-Computer Studies, 43, 785–807.

Donnellan, K. (1966). Reference and definite descriptions. Philosophical Review, 74, 281–304.

Farkas, D., & Swart, H. de. (2003a). Hungarian. Stanford: CSLI Publications.

Farkas, D., & Swart, H. de. (2003b). The semantics of incorporation. Stanford: CSLI.

Farkas, D., & Swart, H. de. (to appear). Incorporation, plurality, and the incorporation of plurals: A dynamic approach. Catalan Journal of Linguistics.

Geenhoven, V. van, & McNally, L. (2002). On the property analysis of opaque complements. (unpublished manuscript)

Geenhoven, V. van. (1998). Semantic incorporation and indefinite descriptions. Stanford: CSLI.

Givón, T. (1984). Syntax: A functional-typological introduction. Amsterdam: John Benjamins.

Green, L. (2000). Aspectual be-type constructions and coercion in African American English. (unpublished manuscript)

Heusinger, K. von. (2002). Specificity and definiteness in sentence and discourse structure. Journal of Semantics, 19, 245–274.

Kenesei, I., Vago, R. M., & Fenyvesi, A. (1998). Hungarian. London and New York: Routledge.

Kornfilt, J. (2003). Scrambling, subscrambling, and case in Turkish. In S. Karimi (Ed.), Word order and scrambling. Oxford: Blackwell.

Krifka, M., Pelletier, F. J., Carlson, G., ter Meulen, A., Chierchia, G., & Link, G. (1995). Genericity: An introduction. In G. Carlson & F. Pelletier (Eds.), The generic book (pp. 1–124). Chicago and London: University of Chicago Press.
Ladusaw, W. (1994). Thetic and categorial, stage and individual, weak and strong. (unpublished manuscript)

McNally, L. (1998). Existential sentences without existential quantification. Linguistics & Philosophy, 21, 353–392.

McNally, L., & Geenhoven, V. van. (1998). Redefining the weak/strong distinction. (unpublished manuscript)

Mithun, M. (1984). The evolution of noun incorporation. Language, 60, 847–894.

Mithun, M. (1986). On the nature of noun incorporation. Language, 62, 32–37.

Mithun, M. (1994). Word-formation: Incorporation. In R. Asher (Ed.), The encyclopedia of language and linguistics (pp. 5024–5026). Oxford: Pergamon Press.

Mithun, M. (2000). Incorporation. In G. Booij et al. (Eds.), Morphology: An international handbook on inflection and word-formation (pp. 916–929). Berlin: Walter de Gruyter.

Reyes, M. La Palme, Macnamara, J., & Reyes, G. E. (1994). Reference, kinds, and predicates. In J. Macnamara & G. E. Reyes (Eds.), The logical foundations of cognition (pp. 91–143). New York: OUP.

Vergnaud, J.-R., & Zubizarreta, M. (1992). The definite determiner and the inalienable construction in French and in English. Linguistic Inquiry, 23, 595–652.

Ward, G., & Birner, B. (1994). Definiteness and the English existential. Language, 71, 722–742.
Compositionality and Contextuality as Adjoint Principles

Oleg Prosorov

Address for correspondence: Laboratory of Algebra and Number Theory, Petersburg Department of the Steklov Institute of Mathematics, Russian Academy of Sciences, 27 Fontanka, 191023 St. Petersburg, Russia. E-mail: [email protected].

In this article we present an outline of a mathematical framework for an explicit formulation of Frege's generalized compositionality and contextuality principles, as proposed in our previous works on formal hermeneutics. The main result is an equivalence between two key categories, called Frege Duality, which relates compositionality and contextuality as adjoint principles. The formal hermeneutics we develop is not a hermeneutics of some formal system but a kind of discourse interpretation theory concerned with the application of rigorous mathematical methods to the analysis of natural language understanding. More precisely, by a natural language we further mean some unspecified Indo-European language such as English, French, German or Russian, and we consider a natural language as a means of communication. We consider mainly the written type of linguistic communication, whose basic units are discourses or texts. The only texts we admit are supposed to be written with good grace and intended for human understanding; we call texts of this kind admissible. All sequences of words written in order to confuse a reader or to imitate human writing are cast aside as irrelevant to linguistic communication. In the process of reading, understanding is not postponed until the final word of the final sentence of a given text. The meaning is not at the end of the text but traverses it. So the text should have meaningful parts, and the meanings of these parts determine the meaning of the whole, as is postulated by the principle of the hermeneutic circle. A central task of semantic theory is to explain how these local understandings of the constitutive parts produce the global understanding of the whole. According to Schleiermacher, the part-whole structure is essential in the matter of text interpretation. The theoretical principle of the hermeneutic circle is a precursor to the principles of compositionality and contextuality formulated later, in the 19th century. Grosso modo, hermeneutics as a discourse
interpretation theory is based on the hermeneutic circle principle, according to which the meaning of the whole text is sought in terms of the meanings of its constitutive parts. This is a sort of compositionality that is meant implicitly to hold at the level of the text. At any rate, the usual semantics at the level of the sentence is based on the implicit use of the compositionality principle, according to which the meaning of the whole sentence is a function of the meanings of its constitutive parts. So hermeneutics may be defined as semantics at the level of the text, covering the usual semantics at the level of the sentence. This is why we call the present sheaf-theoretical discourse interpretation theory formal hermeneutics: it provides a mathematical account of the text understanding process while rejecting the attempt to codify interpretative practice as a kind of calculus.
1 Sense, Meaning and Reference
In the formal analysis of text understanding, we distinguish the semantic notions of sense, meaning and reference. This triad of concepts formalizes a certain distinction that seems to appear in various forms throughout the history of language studies. To avoid possible misunderstanding from the very beginning, we would like to make our usage of these key terms precise and to point out that our sense/meaning distinction differs from Frege's classic Sinn/Bedeutung distinction, whereas we accept reference as an English translation of Frege's Bedeutung. Our aim is not to propose some competitive alternative to Frege's Sinn/Bedeutung distinction but to find adequate semantic concepts pertinent as instruments for the rigorous formal analysis of the text interpretation process in which Frege's classic compositionality and contextuality principles are involved. However, one can find our sense/meaning distinction in the different usage of the word 'Sinn' in early writings by Frege, before he formalized the Sinn/Bedeutung distinction in his classic work of 1892. We consider sense and meaning to be the basic notions, and we express them by means of examples and descriptions. Instead of an analysis of these notions in terms of more basic ones, we seek the key mathematical structures that underlie the process of discourse or text understanding. We take the fragmentary meaning of some fragment of a given text to be the content which is grasped when the reader has understood this fragment in some particular situation of reading. But it depends on many factors, such as the personality of the reader, the situation of the reading, and many kinds of presuppositions and prejudices summed up in the reader's attitude, which we call sense or mode of reading; every reading is only an interpretation in which the historicity of the reader and the historicity of the text are involved; thus, in our usage, a fragmentary meaning is immanent not in a given fragment, but
in an interpretative process of its reading. In our approach, the notion of sense (or mode of reading) may be considered a secular remake of the exegetical approach to this notion in medieval theology. The Fathers of the Church distinguished the four senses of Sacred Scripture: "Littera gesta docet, quid credas allegoria, moralis quid agas, quo tendas anagogia" ["The letter teaches what happened, allegory what you should believe, the moral sense what you should do, anagogy where you should be heading"]. In other words, our approach defines the term sense as a kind of semantic orientation in the interpretative process which relates to the totality of the message to be understood, as some mode of reading. At the level of the text, it may be literal, allegoric, moral, eschatological, naïve, psychoanalytical, etc. At the level of the sentence, it may be literal or metaphoric (indirect). At the level of the syntagma or word, it may be literal or figurative. In our approach, the reader grasps a fragmentary meaning that results from an interpretative process guided by some mode of reading or sense, adopted in accordance with his attitude and based on his linguistic competence, which is rooted in the social practice of communication with others using the medium of language. Note that, following this terminology, we can read one and the same text in many different senses (moral, historical, etc.) and realize, as a result, that we have grasped different meanings. Likewise for a sentence. It seems that this usage of the key terms sense and meaning is in accordance with their everyday usage as common English words (likewise for the French terms sens and signification). As for the term sense, it should be mentioned that in French the word 'sens' literally means 'direction', and figuratively it may be littéral, strict, large, naïf, bon, platonicien, leibnitzéen, frégéen, kripkéen, etc. In English, a figurative sense may also be literal, narrow, wide, naïve, common, platonistic, Leibnizian, Fregean, Kripkean, etc. In this usage, the term sense deals with the totality of discourse, text, expression or word and involves our subjective premise that what is to be understood constitutes a meaningful whole. In this usage, the term sense or mode of reading concerns the reader's interest in the subject matter of the text; it is a kind of questioning that allows a reader to enter into a dialogue with the author. So our usage of the term sense as a mode of reading is near to that proposed by the exegetic concept of the four senses of Sacred Scripture, whereas our usage of the term fragmentary meaning as the content grasped in some particular situation of reading corresponds rather to the common usage of ordinary English words. But this fragmentary meaning should not be understood as some mental state of the reader, because the mental states of two readers could neither be identified nor compared in some reasonable way; in contrast, our approach is based on an explicit criterion of equality between fragmentary meanings, which we shall formulate later. In our usage, the term fragmentary meaning should not be understood in the Tarski/Montague style as a relation between words and world;
nor should it be related to any kind of truth-value or truth-conditions, because the understanding of, e.g., novels, fairy tales or science fiction is achieved regardless of any assumption about verifiability. The understanding of meaning and the knowledge of truth both relate to the world, but in different ways. We observe that a meaning s of some fragment U of a given text X is understood by the reader as an objective result of interpretation of this passage U; its 'objectivity' carries no claim of correspondence to reality, but is grounded in the conviction that this meaning s may be discussed with anybody in some kind of dialogue (actual or imaginary), where such a meaning s may finally be shared by the participants or may be compared with any other meaning t of the same fragment U. We shall later formulate the criterion for such a comparison procedure as a definition of equality (S). This kind of objectivity of meaning is based not only on the shared language, but principally on shared experience as a common life-world, and so it deals with reality. According to Gadamer, this being-with-each-other is a general building principle both in life and in language. Understanding results from being together in a common world. This understanding, as a presumed agreement on 'what this fragment U wants to say', becomes for the reader its meaning s. In this usage, the meaning of an expression is the content that the reader grasps when he understands it; and this can be done regardless of the ontological status of its reference. The process of coming to such a fragmentary meaning s may be thought of as an exercise of the human capacity of naming and understanding; it is a fundamental characteristic of human linguistic behavior.
Semantic Topologies
First of all, we need to define rigorously what a text is in our formalism. Clearly any text is not just a set of its sentences as the sentence is not a set of its words. What is important is the order they ought to be read. In addition, the same words may occur in several places of one sentence and the same sentences may occur in several places of one text. So from a mathematical point of view, we ought to consider a given sentence as a sequence of its words and a given text as a sequence of its sentences. Likewise any part of a considered text is simply a subsequence of a given sequence. Any mathematical structure on a given text, such as topology, sheaves etc., is to be defined on the functional graph of the corresponding sequence. Henceforth, we shall simply identify a given text with the graph of its corresponding sequence. The reading of text as well as the utterance of discourse is always a process that develops in time, and so it inherits in some way its order structure. From a linguistic point of view, this order structure is known under the notion of lin-
Compositionality and Contextuality as Adjoint Principles
153
earity or that of word order. At the level of text, it is a natural linear order ≤ of sentences reading the text bears on. It is a well known mathematical result that any order structure carries several standard topological structures as for example the classical interval topology generalizing Euclidean topology on the real line or other topologies like upper topologies, the Scott topology or the Alexandroff topology (Ern´e, 1991). But it’s not a question of grafting some topology onto a given text, but of observing that any admissible text has an underlying topological structure which arises quite naturally. For, the understanding is not postponed until the final word of the final sentence of a given text, the meaning is not at the end of a text but traverses it. So the text should have the meaningful parts and the meanings of these parts determine the meaning of the whole as it is postulated by the principle of the hermeneutic circle. A central problem of semantic theory is to explain how these local understandings of the constitutive meaningful parts produce the global understanding of the whole text. The philological investigations are abound in examples of meaningful fragments quoted from studied texts. So the meaningful parts might be the subject of comment or discussion for being considered as worth of interpretation. Certainly, not each subsequence of a given text is meaningful, as for example it might happen to be for this one made up of the tenth, the twentieth, the thirtieth, and so on, of every 10n-th sentence of a given text. To the contrary, some meaningful fragment becomes understood in the process of reading and rereading. It seems to be in agreement with our linguistic intuition that (i) an arbitrary union of meaningful parts of an admissible text is meaningful; (ii) a non-empty intersection of two meaningful parts of an admissible text is meaningful. For, an admissible text is supposed to be meaningful as a whole by definition, it remains only to define the meaning of the empty part of a given text in order to provide it with some topology in a strict mathematical sense, where the open sets are all the meaningful parts. Given an admissible text X, let the meaning of its empty part ∅ be a one-element set pt (e. g. the meaning of the title of X if there is one). It shows that any admissible text may be endowed with a semantic topology where the open sets are defined to be all its meaningful parts (Prosorov, 2004, sec. 1.2). Now, however, the question arises, what formal criteria would be given for the meaningfulness of a part U ⊂ X? The concepts of fragmentary meaning and meaningful fragment are closely related, for they should come together in the matter of natural language text understanding. As has been noted from the very beginning, our theory concerns only the special kind of texts referred to as admissible, which are supposed to be written with good grace as destined for a human understanding. Thus the meaningful parts are supposed to
Now, however, the question arises: what formal criteria could be given for the meaningfulness of a part U ⊂ X? The concepts of fragmentary meaning and meaningful fragment are closely related, for they should come together in the matter of natural language text understanding. As has been noted from the very beginning, our theory concerns only the special kind of texts referred to as admissible, which are supposed to be written with good grace as destined for a human understanding. Thus the meaningful parts are supposed to be those which are intended to convey the communicative content. Therefore, a natural style of text writing should respect good order and arrangement, each part falling into its right place, because the natural process of reading (from left to right and from top to bottom) supposes that the understanding of any sentence x of the text X is achieved on the basis of the part of the text already read; the interpretation cannot be postponed, although it may be made more precise and corrected in further reading and rereading. This is a fundamental feature of a competent reader's linguistic behavior. Here is how Rastier describes it:

Whereas the hermeneutic regime of formal languages is one of suspension, because their interpretation can unfold after the computation, texts never know any suspension of interpretation. It is compulsive and irrepressible. For example, unknown words, proper names, even non-words are interpreted, validly or not, it matters little. (Rastier, 1995, pp. 165–166)

According to such a paradigm of ordinary reading, a given text is treated as a written speech, and so both share the distinctive feature of temporality, implicit for the former and explicit for the latter. The natural temporality of phonetic phenomena is a reason to call this kind of semantic topology phonocentric. Thus, for every pair of distinct sentences x, y of an admissible text X, there exists an open (i. e. meaningful) part of X that contains one of them (the one to be read first in the natural order ≤ of sentences) and does not contain the other. Hence an admissible text endowed with the phonocentric topology satisfies the separation axiom T0 of Kolmogoroff, and so it is a T0-space. This characteristic might be proposed as a formal property distinguishing the phonocentric topology among the other semantic topologies. According to our conceptual distinction sense/meaning, we consider sense as a kind of semantic orientation in the interpretative process which relates to the totality of the message to be understood. Thus we suppose that any part U ⊂ X which is meaningful in one sense (or mode of reading) should remain meaningful under the passage to some other sense (or mode of reading). It should be noticed that another concept of meaning or criterion of meaningfulness would imply another definition of meaningful fragment and so would define yet another type of semantic topology. Remark. In Prosorov (2002, chap. 3), we have defined a phonocentric topology at the level of text by specifying, in a constructive manner, the basis of the topology at each sentence x ∈ X to be the class of intervals {l | e ≤ l ≤ x}, where e is the first sentence of the paragraph that contains x or the first one in any paragraph which precedes that containing x. In this approach, the opens of a topological basis are defined by means of the explicit semantic markers (indents) the text is endowed with.
This definition allows one to take into consideration the anaphora and the theme/rheme semantic relations. Like any constructive definition, it has the advantage of being concrete, but not all semantic relations can be formally recovered by means of the explicit division of a text into paragraphs, sections, chapters, etc. However, this definition covers the majority of examples of meaningful fragments cited in philological investigations. An uttered discourse has many other expressive means, such as stress patterns, intonation patterns, rhythm and pause, which disappear in a written text but should be recovered in the process of reading. Moreover, that constructive definition disregards the influence of the author's vocabulary choice on the reader's understanding process. In the present paper, we will follow our approach (Prosorov, 2004) and define a phonocentric topology in a general axiomatic setting. As we have mentioned above, not all the parts of an admissible text are meaningful, and hence the phonocentric topology is not discrete. On the other hand, there certainly are proper meaningful parts in an admissible text, and hence the phonocentric topology is not coarse. Recall that a topological space X is an A-space (or Alexandroff space) if the set O(X) of all its open sets is closed under arbitrary intersections. An admissible text, being finite, defines a finite space and is thus an A-space. Let X be an admissible text. For a sentence x ∈ X, we define Ux to be the intersection of all the meaningful parts that contain x. In other words, for a given sentence x, the part Ux is the smallest open neighborhood of x. It is clear that x ∈ Uy if and only if y ∈ cl({x}), where cl({x}) denotes the closure of the one-element set {x}. This relation 'x is contained in all open sets that contain y' is usually called a specialization; some authors denote it as y ≼ x or y ≤ x (e. g. Erné, 1991, p. 59), contrary to others who denote it as x ≼ y or x ≤ y (e. g. May, 2003, p. 2). As for the choice of notation, we rather follow May (2003) in defining a relation ≼ on the text X by setting x ≼ y if and only if x ∈ Uy or, equivalently, Ux ⊂ Uy. Note that in this notation, for all x, y ∈ X, x ≼ y implies that x ≤ y, where ≤ denotes the natural order in which the sentences are read. Indeed, given x ≼ y, we have that x ∈ Ux ⊂ Uy; and, following the fundamental feature of a competent reader's linguistic behavior described in the above quotation from Rastier (1995), there is certainly a meaningful neighborhood V of y such that V ⊂ {z | z ≤ y}; hence Uy ⊂ V ⊂ {z | z ≤ y}, and so x ∈ {z | z ≤ y} and x ≤ y. The following properties of the phonocentric topology and its close relation with the partial order structure are simple translations to the linguistic situation of well-known results on finite topological spaces.

Lemma 1. The set of all open sets of the kind Ux is a basis of the phonocentric topology for X. Moreover, it is the unique minimal basis of the phonocentric topology for X.

Clearly, for each x ∈ U ∈ O(X), indeed x ∈ Ux ⊂ U.
If ℬ is another basis, then there is a B ∈ ℬ such that x ∈ B ⊂ Ux. Hence B = Ux, so that Ux ∈ ℬ for all x ∈ X.

Lemma 2. The relation ≼ is a partial order on X.

The relation ≼ is clearly reflexive and transitive. It is also antisymmetric, because x ≼ y and y ≼ x means Ux = Uy; since X is a T0-space, this implies that x = y.

Lemma 3. Let x, y be any two sentences in X; then Ux = Uy if and only if x = y.

This obvious property of the smallest basic sets will be important further on, in the definition of contextual meanings.

Proposition. The phonocentric topology on an admissible text defines a partial order structure on it by means of specialization; the initial phonocentric topology can be reconstructed from this partial order in a unique way.

Given a partial order ≼ on a finite set X, one defines a T0-topology by means of the basis constituted of all sets {l | l ≼ x}. The given order structure is reconstructed from this topology by means of specialization. This is a linguistic variant of a well-known general result concerning the relationships between topological and order structures on a finite set. That is, the category of finite topological spaces and continuous maps is dually equivalent to the category of finite partially ordered sets and monotone maps (Erné, 1991). All these considerations might be repeated with slight modifications in order to define a phonocentric topology at each semantic level of a given admissible text. At each level (text, sentence, word), we distinguish its primitive elements, which are the points of a corresponding topological space considered to be the whole at this level. The passage from one semantic level to the immediately superior one consists in the gluing of the whole space into a point of the higher-level space. In the following, we consider mainly the phonocentric topology at the level of text, and we often use the term fragment as equivalent to that of open subset in the case of a topological space related to a text. As soon as we have defined a phonocentric topology, we may seek to interpret some linguistic notion in topological terms and then to study it by topological means. Take for example the property of a literary work to be a communicative unity of meaning. For any two novels X and Y, even of the same kind, say historical, detective or biographical, their concatenation Z under one and the same cover does not constitute a new one. What does this mean, topologically speaking? We see that for any x ∈ X there exists an open neighborhood U of x that does not meet Y, and for any y ∈ Y there exists an open neighborhood V of y that does not meet X. Thus Z = X ⊔ Y, that is, a disjoint union of two non-empty open subsets X and Y, and hence Z is not connected.
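These constructions are all effectively computable for a finite text. The following Python sketch (my illustration, with invented data) computes the smallest neighborhoods Ux, the specialization order, and the two properties just discussed, the T0 axiom and connectedness.

def smallest_neighborhoods(text, opens):
    # U_x is the intersection of all meaningful parts containing x.
    return {x: frozenset.intersection(*[u for u in opens if x in u]) for x in text}

def specialization(text, opens):
    U = smallest_neighborhoods(text, opens)
    return {(x, y) for x in text for y in text if U[x] <= U[y]}  # x ≼ y iff Ux ⊆ Uy

def is_T0(text, opens):
    U = smallest_neighborhoods(text, opens)
    return all(U[x] != U[y] for x in text for y in text if x != y)

def is_connected(text, opens):
    # Disconnected iff some proper non-empty open part has an open complement.
    return not any(0 < len(u) < len(text) and frozenset(text) - u in opens
                   for u in opens)

text = {1, 2, 3}
opens = {frozenset(), frozenset({1}), frozenset({1, 2}), frozenset(text)}
print(sorted(specialization(text, opens)))  # [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
print(is_T0(text, opens), is_connected(text, opens))  # True True

On such a chain-like topology, every sentence's smallest neighborhood is the initial segment ending at it, which is exactly the phonocentric intuition that a sentence is understood on the basis of what has already been read.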
Recall that a topological space X is said to be connected if X is not the union of two non-empty disjoint open subsets. It is clear that each smallest basic set Ux is connected. In mathematical order theory, there exists a simple intuitive tool for the graphical representation of a finite partially ordered set (poset), called the Hasse diagram (Stanley, 1986). For a poset (X, ≼), the cover relation ≺ is defined by: 'x ≺ y if and only if x ≼ y, x ≠ y, and there exists no element z ∈ X distinct from x and y such that x ≼ z ≼ y'. In this case, we say that y covers x. For a given poset (X, ≼), its Hasse diagram is defined as the graph whose vertices are the elements of X and whose edges are those pairs {x, y} for which x ≺ y. In the picture, the vertices of the Hasse diagram are labeled by the elements of X, and the edge {x, y} is drawn as an arrow going from x to y (or sometimes as an undirected line, but in this case the vertex y is displayed lower than x). The usage of some kind of Hasse diagram under the name of Leitfaden is widespread in mathematical books to facilitate the understanding of the logical dependence of the chapters. The poset considered in this usage is constituted not of all sentences but of all chapters of the book. So, in the introduction to Serre (1979), we read: "The logical relations among the different chapters are made more precise in the Leitfaden below", and there is the following Hasse diagram:

[Hasse diagram (Leitfaden) showing the logical dependence of chapters 1–15 of Serre (1979); the figure cannot be reproduced here.]
Fortunately, there exists another book (Cassels, 1986) on the same theory and with the same title, where the author writes about its content: “The logical dependence of the chapters is given by the Leitfaden.” In this Hasse diagram,
all edges are supposed to be directed from the upper vertex to the lower one:

[Hasse diagram (Leitfaden) showing the logical dependence of chapters 1–13 of Cassels (1986); the figure cannot be reproduced here.]
These two Leitfäden surely presuppose linear reading within each chapter. It is clearly seen that the two Hasse diagrams have different geometrical invariants, which shows some difference between these two expositions of class field theory. As for the geometrical properties of the corresponding Hasse diagrams, we note that there are several minima and maxima in the specialization order ≼. The minimal points are clearly open one-element sets; their understanding should be considered at the immediately inferior semantic level. We see that if x ≼ y then x ≤ y, where ≤ denotes the linear order of chapter numbering. We remark also that these two Hasse diagrams are connected. It is known that a finite topological space is connected if and only if the Hasse diagram of the corresponding specialization order is connected. Given an admissible text, one can, by means of analytical reading or perhaps with the help of the author, define its phonocentric topology at the level of text (find all the basic sets Ux) and then draw the Hasse diagram of the corresponding poset. Certainly, the author has some clear representation of this kind during the writing process, when he resolves the problem of dispositio, in the terminology of Quintilian. Anyhow, representations of this kind appear implicitly during the reading process. From a theoretical point of view, the usage of Hasse diagrams should be important in discourse analysis, and many philological investigations contain it in some disguised form. On the other hand, we can first construct the Hasse diagram of a given admissible text X and then define an Alexandroff topology on it by choosing the open sets to be all the down-sets; the initial partial order is recovered from this topology by means of specialization.
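The passage from a partial order to its Hasse diagram is mechanical. The following Python sketch (my illustration; the divisibility order stands in for a real chapter-dependence relation) computes the cover relation, i. e. the edges of the Hasse diagram.

def covers(elements, leq):
    # (x, y) is an edge iff x ≼ y, x ≠ y, and nothing lies strictly between them.
    return [(x, y) for x in elements for y in elements
            if x != y and leq(x, y)
            and not any(z != x and z != y and leq(x, z) and leq(z, y)
                        for z in elements)]

# Hypothetical dependence order: chapter m is presupposed by chapter n
# whenever m divides n.
print(covers(range(1, 13), lambda m, n: n % m == 0))

Applied to the specialization order of a phonocentric topology, the same function yields the diagram whose geometrical properties are discussed here.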
Hence the importance of geometrical studies of the corresponding Hasse diagrams, which may be thought of as a constitutive part of some discipline like formal textual syntax. The rest of the paper should be thought of as a passage from formal syntax to formal semantics.

3 Compositionality
Being composed during the interpretative process, a meaning s of a given fragment U is rooted in use and is motivated by the mode of reading (or sense) F. According to Gadamer, the fragmentary meaning s which is grasped in some particular situation of reading may be considered as what the reader or listener understands to be a possible response to an implicit question implied by the text. One finds such a meaning by asking oneself: 'What does it mean in this or that sense?'. So it is an odd misconception to think that an expression, a fragment, a text has only one true meaning; hence the multiplicity of meanings for any meaningful fragment of a text. Let X be an admissible text, and let F be an adopted sense or mode of reading. For a given fragment U ⊂ X, we, in a platonistic manner, collect all the corresponding fragmentary meanings of U in the set F(U). Thus for any sense (mode of reading) F, we are given a map U ↦ F(U) defined on the set of all meaningful parts U ⊂ X. Formally, the meaning s of a fragment U of a text X in some particular situation of reading in a sense F is one of the elements of the set F(U). Formulated not only for the whole text X, but more generally for any meaningful fragment V ⊂ X, the precept of the hermeneutic circle 'to understand any part of a text in accordance with the understanding of the whole text' defines a family of maps resV,U : F(V) → F(U), where U ⊂ V. So a meaning s ∈ F(V) defines the meaning resV,U(s) ∈ F(U), with the obvious property of identity preservation 1° resV,V = idF(V) and that of transitivity 2° resV,U ∘ resW,V = resW,U for all non-void nested subfragments U ⊂ V ⊂ W. Recall that we have defined F(∅) = pt above, so the maps resV,∅ and res∅,∅ are uniquely defined in an obvious manner. Thus the sets F(U) and the maps resV,U satisfying the properties of identity preservation 1° and transitivity 2° as above are defined for all open sets in the phonocentric topology on X. Thus F is a (contravariant) functor in the strict mathematical sense, because any topological space X defines a very simple category O(X): the objects of O(X) are the open subsets of X, the morphisms of O(X) are the canonical inclusions U ⊂ V; all axioms of a category are obviously satisfied. Considered as a functor to the category of sets, a sense (mode of reading) F associates with any object U of Ob(O(X)) the set F(U) of its fragmentary meanings and with any morphism U ⊂ V of Mor(O(X)) the map resV,U : F(V) → F(U).
From a mathematical point of view, these data (F(V), resV,U)V,U∈O(X) define a presheaf (of sets) over X (Lambek & Scott, 1986; Mac Lane & Moerdijk, 1992; Tennison, 1975). For a given presheaf F, the elements of F(U) are called sections (over U); the elements of F(X) are called global sections. It may happen that some fragment of a given text requires many resumptions of the reading process. So we have to consider the reading process for any fragment U as its covering by some family of subfragments (Uj)j∈J already read. Such a covering of U is said to be open if U = ⋃j∈J Uj and each Uj is open in X. According to Quine, there is no entity without identity; so we need some notion of identity between fragmentary meanings, accepted technically as the content grasped during the reading process. Otherwise, it would be impossible to consider fragmentary meanings to be well-defined objects susceptible to set-theoretic operations and quantifications. The notion of equality that seems to be quite adequate to our linguistic intuition is defined by the following:

Claim S (Separability). Let X be an admissible text, and let U be a fragment of X. Suppose that s, t ∈ F(U) are two fragmentary meanings of U and there is an open covering U = ⋃j∈J Uj such that resU,Uj(s) = resU,Uj(t) for all fragments Uj. Then s = t.

In other words, s, t are considered to be identical meanings of the whole fragment U (i. e. globally on U) if and only if they are identical locally. This definition determines an effective procedure to decide whether two given fragmentary meanings s, t of one and the same U ⊂ X are equal. Following standard sheaf-theoretic terminology (Tennison, 1975, p. 14), a presheaf satisfying the claim (S) is called separated. Thus any sense (mode of reading) F defines some separated presheaf of fragmentary meanings over an admissible text X. The precept of the hermeneutic circle 'to understand the whole text by means of understandings of its parts' is really a kind of compositionality principle for the fragmentary meanings at the level of text. Formulated rigorously, it states that the fragmentary meanings together satisfy the following:

Claim C (Compositionality). Let X be an admissible text, and let U be a fragment of X. Suppose that U = ⋃j∈J Uj is an open covering of U, and suppose we are given a family (sj)j∈J of fragmentary meanings, sj ∈ F(Uj) for all fragments Uj, such that resUi,Ui∩Uj(si) = resUj,Ui∩Uj(sj) for all i, j ∈ J. Then there exists some meaning s of the whole fragment U such that resU,Uj(s) = sj for all fragments Uj.

In other words, locally compatible fragmentary meanings are composable into some global meaning. This claim (C) may be considered as a generalization, in the narrow sense, of Frege's classic principle of compositionality of meaning, stated at the level of sentence, to the level of text.
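To see the two claims at work, here is a Python sketch (my illustration; the data are invented, and sections are modeled as functions from sentences to meanings, anticipating the functional representation developed later in the paper). Restriction is then just slicing, claim (S) holds automatically, and claim (C) becomes a gluing procedure.

def restrict(s, W):
    # A section over an open part is modeled as a dict: sentence -> meaning.
    return {x: s[x] for x in W}

def glue(cover, family):
    # Claim (C): pairwise compatible local sections compose into one section.
    for V in cover:
        for W in cover:
            common = V & W
            if restrict(family[V], common) != restrict(family[W], common):
                return None            # the family is not locally compatible
    glued = {}
    for W in cover:
        glued.update(family[W])        # well-defined thanks to compatibility
    return glued

V, W = frozenset({1, 2}), frozenset({2, 3})
family = {V: {1: 'm1', 2: 'm2'}, W: {2: 'm2', 3: 'm3'}}
print(glue([V, W], family))  # {1: 'm1', 2: 'm2', 3: 'm3'}

In this model, the uniqueness of the glued section, which is what claim (S) adds, is immediate: a function on U is determined by its values on the members of a covering of U.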
In mathematics, a separated presheaf satisfying the claim (C) of compositionality is called a sheaf. Note that for any sheaf, the presence of (S) guarantees that the meaning s whose existence is claimed by (C) is unique as such. So we have the following sheaf-theoretic

Definition (Frege's Generalized Compositionality Principle). A separated presheaf of fragmentary meanings naturally attached to any sense (mode of reading) of an admissible text is really a sheaf; its sections over any fragment of the text are the fragmentary meanings; its global sections are the meanings of the whole text.

We have not yet defined morphisms for these sheaves. To illustrate this notion by means of an example, consider e. g. the historical sense F and the moral sense G of some biographical text X. Let U ⊂ V be any two fragments of the text X. It seems very natural that any meaning s of a fragment V understood in the historical sense F gives a certain well-defined meaning φ(V)(s) of the same fragment V understood in the moral sense G. Hence, for each V ⊂ X, we are given a map φ(V) : F(V) → G(V). To transfer the meaning s of V in the historical sense to its meaning in the moral sense and then to restrict the latter to a subfragment U ⊂ V, following the precept of the hermeneutic circle, is the same operation as to make first the restriction from V to U of the meaning s in the historical sense, and then to transfer the understanding in the historical sense to the understanding in the moral one. This property of a family (φ(V))V∈O(X) may be expressed simply by claiming that the diagram

            φ(V)
  F(V) ────────────→ G(V)
    │                  │
  resV,U            res′V,U
    ↓                  ↓
  F(U) ────────────→ G(U)
            φ(U)
commutes for all fragments U ⊂ V of X. This kind of transfer from the understanding in one sense to the understanding in some other sense is a usual matter of linguistic communication. Hence, such a family of maps (φ(V))V∈O(X) defines a natural transformation φ : F → G of senses considered as functors, and hence defines their morphism as sheaves. Thus, given an admissible text X, the data of all sheaves F of fragmentary meanings together with all their morphisms constitute a category in the strict mathematical sense of the term. This category Schl(X), describing the exegesis of a particular text X, will be called the category of Schleiermacher, because he is often considered the author of the hermeneutic circle idea understood as a main principle of interpretation (Skirbekk & Gilje, 1999, chap. 19).
macher’s concept of circularity in text understanding. So in the classic Being and Time, Heidegger analyzed the circular structure of human understanding and its study was continued in Gadamer’s masterpiece Truth and Method. From our viewpoint, the concept of part-whole structure expressed by Schleiermacher in the hermeneutic circle principle, in the linguistic form, reveals the fundamental mathematical concept of a sheaf formulated by Leray more than hundred years later, in 1945. This justifies us to name the particular category of sheaves Schl(X) after Schleiermacher. It summarizes all multiple recurrences to the hermeneutic circle principle occurring in the exegeses of a given text X. The key word in this denomination of the category Schl(X) after Schleiermacher is ‘sheaf’, and neither ‘fragmentary’, nor ‘meaning’. At the level of sentence, the same considerations give a generalization of Frege’s classic compositionality principle (Prosorov, 2002, p. 35), but with words as primitive elements and syntagmas as meaningful fragments. 4
4 Sheaf-theoretic Aspect of Functionality
It is generally accepted that the conception of functionality gives a fairly definitive characterization of compositionality. According to Janssen (1997, p. 419), the best-known formulation of the compositionality principle is: "The meaning of a compound expression is a function of the meanings of its parts." The compositionality principle arises in logic, informatics, linguistics and the philosophy of language in many different formulations, which all, however, presuppose the concepts of function and functionality. The notion of function is fundamental in mathematics and the sciences. It was definitively formalized in the 18th century and is now uniformly accepted by all scientific communities. Any attempt to formalize a notion that bears the germ of the conception of functionality is dominated to a high degree by the formal mathematical definition of a function, and this situation may be considered as a kind of paradigmatic trap. In defining some particular kind of compositionality principle as a function f : X → Y, one needs to explain formally what the domain X, the codomain Y and a subset f of the cartesian product X × Y are that satisfy 1° for each x ∈ X, there is a y ∈ Y such that the ordered pair (x, y) is in f, and 2° for a given x such a y is unique. These two fundamental properties are essential in the definition, and they are formalized by the requirements that 1° f ∘ f⁻¹ ⊃ ΔX and 2° f⁻¹ ∘ f ⊂ ΔY, where ΔX = {(x, x) | x ∈ X} is the diagonal in X (composition being read from left to right). A function f(x1, . . . , xn) of several variables x1, . . . , xn is simply modeled by presenting its domain as a cartesian product X = X1 × · · · × Xn, where again f ⊂ X × Y and should satisfy the properties 1° and 2° characteristic of functionality.
In the present formal hermeneutics, we propose an approach to modeling functionality based on another formalization of the fundamental properties 1° and 2°. Intuitively, we regard the claims (C) and (S), which characterize those presheaves that are sheaves, as generalizations of the properties 1° and 2°, which characterize those binary relations that are functional. Indeed, for a function f of several variables, it is settled that: 1° for any family of variables' values (s1, . . . , sn), there exists a function's value f(s1, . . . , sn) depending on them, and 2° this value is unique. Analogously, for a sheaf it is settled that: (due to C) for any family of sections (si)i∈I which are locally compatible on the open U, there exists a section s as their composition, and (due to S) this composition is unique. The claim (C) guarantees that the glued section s exists, and the claim (S) guarantees that such a section is unique. This is the revised conception of functionality which we use to express how the compositionality of fragmentary meanings should be formally defined. In this generalized conception of functional dependence, the variables and their number are not fixed in advance (we consider an arbitrary family of pairwise compatible sections as variables), but for any such family of variables, there exists the glued section considered as their composition (the function's value at the given family of variables), and such a section is unique. Other possible formalizations of compositionality, based on the analysis of the expressions 'depending on' and 'function of', are considered in Hodges (1999).
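The classical properties 1° and 2° can themselves be checked mechanically for a finite relation. The following Python sketch (my illustration, with invented data) tests them in exactly the diagonal form quoted above, reading composition from left to right.

def compose(r, s):
    # Left-to-right relational composition: (x, z) is in r∘s iff some y links them.
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}

def inverse(r):
    return {(y, x) for (x, y) in r}

def is_function(f, X, Y):
    diag_X = {(x, x) for x in X}
    diag_Y = {(y, y) for y in Y}
    total = diag_X <= compose(f, inverse(f))            # 1°: f ∘ f⁻¹ ⊇ Δ_X
    single_valued = compose(inverse(f), f) <= diag_Y    # 2°: f⁻¹ ∘ f ⊆ Δ_Y
    return total and single_valued

X, Y = {1, 2}, {'a', 'b'}
print(is_function({(1, 'a'), (2, 'b')}, X, Y))             # True
print(is_function({(1, 'a'), (1, 'b'), (2, 'a')}, X, Y))   # False: not single-valued
print(is_function({(1, 'a')}, X, Y))                       # False: not total

The sheaf-theoretic reformulation replaces the fixed tuple of arguments by an arbitrary compatible family of sections, but the existence-and-uniqueness pattern is the same.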
5 Contextuality
So far, we have defined only the notion of fragmentary meaning. To consider (at each semantic level) not only the meanings of fragments but also the meanings of their primitive elements (the points of the corresponding topological space), we define a notion of contextual meaning. Literally generalized to the level of text, Frege's classic contextuality principle would claim that a given sentence has a meaning in relation to the whole text. But this widest contextualization seems excessive at the level of text, for the reason that our understanding progresses gradually with the reading or listening, and the meanings of sentences are caught during this process. In other words, the understanding of any sentence is not postponed until the reading of the final word of the whole text. To understand a given sentence x, we need a context constituted by some meaningful part containing it, i. e. by some open neighborhood of x. The problem is that the same sentence may occur in many quite different texts.
According to Frege, if we ask for the meaning of a word, we ought to consider it in the context of some sentence; likewise, when we ask for the meaning of a sentence, we ought to consider it in the context of some fragment containing it. Suppose we want to assign a contextual meaning to some sentence x of a text X. Given a fragmentary meaning s of some fragment U containing x, we have at our disposal a piece of data which constitutes some context to determine a contextual meaning of x. What we really want is to define a map ρ which transforms each fragmentary meaning s of a neighborhood U of x into some contextual meaning ρ(s) of x. It is clear that another fragmentary meaning t of another neighborhood V of x may supply yet something else concerning the sought contextual meaning of x. So such a map ρ must be defined separately for each neighborhood U of x. Thus we need to define a family of maps ρU,x : F(U) → T, where T ought to be a set that unites all the presumed contextual meanings of a sentence x considered relative to a given text X. Let U, V be two neighborhoods of x such that V ⊂ U. Recall that F(U) and F(V) are related by the restriction map resU,V : F(U) → F(V), which determines how each meaning s of the fragment U gives a meaning resU,V(s) of its subfragment V. It seems very natural that the two fragmentary meanings s and resU,V(s) define the same contextual meaning for x. In other words, for any s, we have an obvious compatibility ρV,x(resU,V(s)) = ρU,x(s), or simply ρV,x ∘ resU,V = ρU,x. These compatibility conditions need to be satisfied by any possible candidate T for the set of all contextual meanings of x. So for all nested neighborhoods V ⊂ U of the sentence x, the diagram

            ρU,x
  F(U) ────────────→ T
    │                ↑
  resU,V             │ ρV,x
    ↓                │
  F(V) ──────────────┘
commutes. In the standard terminology (Tennison, 1975, def. 3.4), a set T with a family (ρU,x)U∈V(x) that makes the aforesaid diagram commutative is called a target of the direct (or inductive) system of sets (F(U), resU,V)U,V∈V(x), where V(x) denotes the ordered set of all open neighborhoods of x. For a given particular text X, it is natural to eliminate unnecessary entities (according to Ockham's principle) and consider the set T as containing only those contextual meanings of the sentence x which are relevant to X. So the set T of all contextual meanings of the sentence x should depend on x and on the adopted sense (mode of reading) F; this will be fixed by the notation T = Fx. Thus the set Fx of all contextual meanings of a sentence x relative to the adopted sense F should satisfy the following:
Claim Ct (Contextuality). Let F be a sense (mode of reading) adopted for a given text X; then for any contextual meaning f ∈ Fx of a sentence x, there exist a neighborhood U of x and a fragmentary meaning s ∈ F(U) such that f = ρU,x(s).

Being formulated at the level of text, the claim (Ct) may be considered as a generalization, in the narrow sense, of Frege's classic contextuality principle posed at the level of sentence; it may be paraphrased as 'ask for the meaning of a sentence only in the context of some fragment of a given text'. Now let U, V be two neighborhoods of x, and let F be some mode of reading. Two fragmentary meanings s ∈ F(U) and t ∈ F(V) are considered to give the same contextual meaning to the sentence x ∈ U if s and t agree on some smaller neighborhood of x. This seems to conform with the common reader's intuition about what it would mean for two given fragmentary meanings s and t to induce the same contextual meaning of x. So these two fragmentary meanings s ∈ F(U) and t ∈ F(V) are said to induce the same contextual meaning at x when there is some open neighborhood W of x such that W ⊂ U ∩ V and resU,W(s) = resV,W(t) ∈ F(W). This property should be required by any reasonable definition of the notion of contextual meaning. Thus the set Fx of all contextual meanings of a sentence x ∈ X should satisfy the following:

Claim E (Equality). Let U, V be two open neighborhoods of a sentence x, and let s ∈ F(U), t ∈ F(V) be two fragmentary meanings for a given sense (mode of reading) F. Then the equality ρU,x(s) = ρV,x(t) in Fx between induced contextual meanings of the sentence x holds if and only if there exists an open neighborhood W of x such that W ⊂ U, W ⊂ V and resU,W(s) = resV,W(t).

The claim (E) postulates an explicit criterion of equality between contextual meanings of a sentence that seems to conform with our linguistic intuition. As far as we know, Frege never considered the notion of equality between the meanings of a given word in the context of a given sentence. According to a well-known theorem characterizing inductive limits (Tennison, 1975, th. 3.8, p. 5), the conjunction (Ct)&(E) implies that for a given x ∈ X, the target Fx of the inductive system of sets (F(U), resU,V)U,V∈V(x) is isomorphic to its inductive limit. In category theory, there are many equivalent definitions of the notion of an inductive limit: it might be defined as a target (of a given inductive system) which is universal in the sense that it is an initial object in some closely related category, or it might be defined as a target satisfying two additional properties analogous to (Ct) and (E), or it might be constructed somehow from a given inductive system. The main result concerning all these definitions is that they all give rise to isomorphic limits. This justifies the usage of functional notations for the inductive limit lim→(F(U), resU,V)U,V∈V(x) and for the other related notions defined by a given inductive system of sets (F(U), resU,V)U,V∈V(x).
Thus, following the standard terminology, the set Fx is called the stalk of F at x, and the canonical image in Fx of a fragmentary meaning s ∈ F(U) is called the germ of s at x and is denoted germx s. This terminology is due to the following construction of the inductive limit of the sets (F(U), resU,V)U,V∈V(x): we note first that the relation 'induce the same contextual meaning at x' is obviously an equivalence relation, and then we note that the set of all equivalence classes is clearly the universal target Fx; intuitively, any equivalence class of fragmentary meanings agreeing in some neighborhood of x defines some contextual meaning of x. In other words, all the contextual meanings of a sentence x ∈ X are united in the set Fx, and for every neighborhood U of x, we are given a canonical map germU,x : F(U) → Fx defined as s ↦ germx s. Thus, for the phonocentric topology at the level of text, the conjunction (Ct)&(E) answers the question of how the set Fx of all contextual meanings of a sentence x ∈ X should be formally defined in order to generalize Frege's classic contextuality principle, proposed at the level of sentence, to the level of text. Thus, at the level of text, we have the following:

Definition (Frege's Generalized Contextuality Principle). A sentence x within a fragment U of an admissible text X has a contextual meaning defined as the germ at x of some fragmentary meaning s ∈ F(U), where the sheaf F is the adopted sense (mode of reading); the set Fx of all contextual meanings of a sentence x ∈ X is defined as the stalk of F at x, i. e. as the inductive limit Fx = lim→(F(U), resU,V)U,V∈V(x).

In other words, if we have grasped some fragmentary meaning s of U ⊂ X, then for any sentence x ∈ U we have a canonical way to find a corresponding contextual meaning germx s. The contextuality principle proposed above is an explicit definition of contextual meaning for a given locus at the semantic level of text. A similar definition may be formulated at each semantic level. The one formulated at the level of sentence renders Frege's classic contextuality principle. As soon as the semantic level is fixed, the corresponding definition of the contextual meaning of a locus x is given as germx s, where s is some fragmentary meaning defined on some neighborhood U of x.

Remark 1. More generally, one can consider the inductive system of all the open neighborhoods of some arbitrary part A ⊂ X to define all its contextual meanings as FA = lim→(F(U))U⊃A; as was to be expected, for any open part A, this definition provides its contextual meanings as the fragmentary ones already given, i. e. in this case FA = F(A).
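Since every sentence of a finite text has a smallest open neighborhood, germs can be computed concretely. The Python sketch below (my illustration, with invented data; sections are again modeled as functions on sentences) identifies the germ of a section at x with its restriction to Ux, so that the stalk Fx is in bijection with F(Ux), as Remark 2 below makes precise.

def germ(s, x, Ux):
    # The germ of a section s at x: its restriction to the smallest
    # open neighborhood U_x of x, frozen so that germs can be compared.
    return tuple(sorted((y, s[y]) for y in Ux))

Ux = frozenset({1, 2})           # hypothetical smallest neighborhood of x = 2
s = {1: 'm1', 2: 'm2', 3: 'm3'}  # a fragmentary meaning over U = {1, 2, 3}
t = {1: 'm1', 2: 'm2'}           # a fragmentary meaning over V = {1, 2}
# s and t agree on U_x, so they induce the same contextual meaning at x = 2.
print(germ(s, 2, Ux) == germ(t, 2, Ux))  # True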
Remark 2. The inductive limit may be calculated by means of any cofinal part of the inductive system under consideration; in particular, lim→(F(U), resU,V)U,V∈V(x) is clearly determined by the basis of neighborhoods in V(x), that is, by the smallest basic set Ux. One sees clearly that the stalk Fx is in a one-to-one correspondence with F(Ux). Although the set Fx of contextual meanings of x can be identified with the set of fragmentary meanings of the smallest open neighborhood Ux of x, we need the conceptual definition of Fx as lim→(F(U), resU,V)U,V∈V(x) to prove the theorems. Moreover, the understanding progresses in time during the consecutive process of reading and rereading, where the different fragmentary meanings of the different neighborhoods of x are identified to finally give one contextual meaning of x, which is the germ germx(s) of some s ∈ F(U) grasped by the reader in some particular situation of reading. This neighborhood U of x need not be the smallest one Ux, because the latter is not explicitly marked for each x as the context needed for its understanding. We stress that the grasped contextual meaning germx(s) ∈ Fx of x is immanent not in the text, but in the particular process of its reading. So the reading of a famous quotation cannot substitute for the reading of the original work, because the sheaf of fragmentary meanings F need not be flasque, as we have argued in Prosorov (2002, p. 27). Recall that a sheaf F on a topological space is said to be flasque if for every inclusion V ⊂ U of open sets, the restriction map resU,V : F(U) → F(V) is surjective. We would like to stress the difference between the two kinds of meaning considered in the interpretative process:

• the notion of fragmentary meaning supplies a relevant frame to characterize the successful understanding of any textual fragment as a whole at a given semantic level in a current interpretative process;

• the notion of contextual meaning supplies a relevant frame to characterize the successful understanding of any primitive textual element (locus) at a given semantic level in a current interpretative process.

Now the time has come to give an example. Let us consider a simple sentence x like "Close the door". In some conversation, it might mean a lot of things, according to the context of its usage. One can say it and, at the same time, think something more or something else, as for example: "Close the door, for I feel it's too cold for me!", or "Close the door, for I don't want to hear the noise from the corridor!", or simply "Close the door, but from the outside." Anybody who participates in these communicative situations surely understands in context what the simple "Close the door." means in each particular utterance. Let us now consider a possible use of this phrase in some (written) text.
If you ask Google to find all documents containing the text "Close the door." on the Internet, you get approximately 479,000 references in a moment. Certainly, there are many different usages of this very short sentence. In each particular case, however, the author ought to write some meaningful fragment containing "Close the door." (its neighborhood) in order to make clear and understandable what it means. The smallest such neighborhood Ux depends on the particular author's communicative intention, and, in general, this Ux cannot be reduced to x. Hence, the grasped contextual meaning of x corresponds to one of the fragmentary meanings of Ux grasped in accordance with the reader's mode of reading (sense) F, and it cannot be otherwise! Paraphrasing Frege, we say: "Never ask for the meaning of a sentence in isolation, but only in the context of some fragment of a text." It may be Ux or some other neighborhood of x, but de facto the contextual meaning of x can be identified with the germ germx(s) of some fragmentary meaning s ∈ F(U). For any other sentence y ∈ Ux such that y ≠ x, we have Uy ≠ Ux due to Lemma 3 of chapter 2, and hence the contextual meaning of y is defined by one of the fragmentary meanings of Uy, not of Ux, despite the fact that y lies in Ux. So the process of the reader's understanding may be thought of as the consecutive choice of only one element from each stalk Fx. Even for a juridical text, the multiplicity of possible contextual meanings in Fx is inevitable (to the great joy of advocates). Our next goal is to define a category which we will consider as a mathematical frame of contextuality. Note that for a given admissible text X and an adopted sense (mode of reading) F, we have already defined the set Fx of all contextual meanings for each sentence x within a fragment U ⊂ X. The process of reading the fragment U may be thought of as a consecutive choice of one appropriate element of Fx for each x ∈ U. This gives rise to a function t on U which takes its values in the total coproduct (disjoint union) F = ⊔x∈X Fx; we speak of the coproduct to formally exclude the possibility for any two sets Fx and Fy to have some elements in common. For F = ⊔x∈X Fx, we consider a map p : F → X, called the projection, which sends each germx s to the point x where it is taken. It is clear that this projection p and a function t : U → F constructed in the process of reading have the obvious property p(t(x)) = x for all x ∈ U. On the other hand, each fragmentary meaning s ∈ F(U) determines a function ṡ : x ↦ germx s well-defined on U; for each x ∈ U, its value ṡ(x) is taken in Fx. This gives rise to the functional representation
η(U) : s ↦ ṡ   (1)
defined for all fragmentary meanings s ∈ F(U). This representation of a fragmentary meaning s as a genuine function ṡ is of great theoretical importance in explaining the nature of fragmentary meanings. Each fragmentary meaning s ∈ F(U), which has been described in chapter 2 as an abstract entity, may now be thought of as a genuine function ṡ defined on the fragment U of a given text.
For each sentence x of a fragment U, this function ṡ representing s takes its value ṡ(x), defined as the contextual meaning germx s of the sentence x:
ṡ(x) = germx s   (2)
We now define a topology on F by taking all the image sets ṡ(U) ⊂ F as a basis of the topology; thus an open set in F is a union of images of functions of the kind ṡ. The projection p : F → X so constructed is a local homeomorphism, in the sense that each point of F has an open neighborhood which is mapped by p homeomorphically onto an open subset of X. That is, each point germx s ∈ F has some open neighborhood ṡ(U), and p restricted to ṡ(U) has ṡ : U → ṡ(U) as a two-sided inverse; hence p restricted to ṡ(U) is a homeomorphism onto U. We call any continuous function t : U → F such that t(x) ∈ p⁻¹(x) for all x ∈ U a cross-section. Any function of the kind ṡ defined on some open U (i. e. determined by some fragmentary meaning s) is obviously a cross-section. For any cross-section t : U → F, the projection p has the obvious property p(t(x)) = x for all x ∈ U, that is, p ∘ t = idU. This situation may be summarized by saying that we are given two topological spaces F, X and a continuous projection p : F → X. In topology, such a pair (F, p) is called a bundle over the base space X. A morphism of bundles from p : F → X to q : G → X is defined as a continuous map h : F → G preserving the fibers p⁻¹(x), i. e. such that q ∘ h = p. So we have defined a category of bundles over X. A bundle (F, p) over X is said to be étale when p : F → X is a local homeomorphism (Mac Lane & Moerdijk, 1992, p. 88). The étale bundles constitute a full subcategory in the category of bundles over X. It is immediately seen that the bundle of contextual meanings (⊔x∈X Fx, p) constructed, as above, from a given sheaf F of fragmentary meanings is étale. For any admissible text X, each étale bundle of contextual meanings (F, p) represents some mode of reading (sense); a morphism h of étale bundles from (F, p) to (G, q) over the same text X represents a certain coherent transfer of contextual meanings, that is, a family of maps h : Fx → Gx. Thus, for any admissible text X, we have defined the category Context(X) of étale bundles of contextual meanings over X as a framework for the generalized contextuality principle at the level of text. An admissible text is a kind of structured whole where we consider the part-whole structure at different semantic levels. Text understanding consists in an incessant passage from one semantic level to another in the interpretative process. At each semantic level, this process may be modeled as a sheaf of fragmentary meanings on the one hand, and as an étale bundle of contextual meanings on the other. The understanding of a text is achieved in some kind of inductive process, where we may distinguish its inductive basis and its inductive step at each semantic level (Prosorov, 2003, chap. 4).
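The bundle construction itself is elementary set manipulation, as the following Python sketch shows (my illustration; the stalks are invented). The total space is the disjoint union of the stalks, the projection forgets the germ, and a cross-section picks one germ per sentence.

# Hypothetical stalks of contextual meanings over a three-sentence text.
stalks = {1: {'g1', 'g1*'}, 2: {'g2'}, 3: {'g3', 'g3*'}}
F = {(x, g) for x, germs in stalks.items() for g in germs}   # F = disjoint union of the F_x

def p(point):
    x, _ = point
    return x                      # the projection sends a germ at x to x

def cross_section(U, choice):
    # A cross-section over U chooses one germ per sentence, so p ∘ t = id_U.
    t = {x: (x, choice[x]) for x in U}
    assert all(p(t[x]) == x for x in U)
    return t

print(cross_section({1, 2}, {1: 'g1', 2: 'g2'}))
# {1: (1, 'g1'), 2: (2, 'g2')}

Tagging each germ with its sentence, as in the pairs (x, g), is precisely what speaking of the coproduct amounts to: it keeps the stalks disjoint even if two of them happen to contain equal elements.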
6 Comparing Algebraic and Sheaf-theoretical Approaches
According to Janssen (2001), the compositionality principle in its standard interpretation is a theoretical basis for Montague Grammar, Generalized Phrase Structure Grammar, Categorial Grammar and Lexicalized Tree Adjoining Grammar; these theories propose different notions of meaning, but a meaning is assigned to words in isolation: "A technical description of the standard interpretation is that syntax and semantics are algebras, and meaning assignment is a homomorphism from syntax to semantics". Let us consider this conception of the standard interpretation as an algebraic homomorphism SYNTAX → SEMANTICS from a general point of view. Linguistically speaking, syntax and semantics should not be one and the same theory. Thus the aforesaid homomorphism should not be an isomorphism. Clearly, syntax should not be a part of semantics. So the aforesaid homomorphism should not be a monomorphism. Thus, by the general theorem about the structure of algebraic homomorphisms, it should be an epimorphism with a non-trivial kernel, which is a congruence relation on its domain. Two different elements of an algebra A representing syntax are congruent if they are mapped to the same element of an algebra B representing semantics; hence some different syntactical objects (words or expressions) must have one and the same meaning as their value under such a homomorphism. This approach is adequate for studying texts of some formal programming language which give rise to the same result after having been executed on the computer. We can also study the different transformations (say, optimizations) of a given program which lead to the same result after execution. As for natural language, this approach is quite adequate when we study the problem of synonymy, or when we study some formalized subset of a natural language, but the problem of polysemy resists this approach. When studying the process of interpretation of a natural language text, we are confronted with quite another situation. Any literary text under our analysis (say Hamlet, Notre-Dame de Paris or Faust) is given once and for all! And it is really a great universe of meanings to be disclosed or reconstructed in the process of reading and interpretation. But all these possible interpretations are presented to us as being identified in one and the same text. Thus, in studying the process of interpretation of a natural language text, we are confronted with a surjection: SEMANTICS → SYNTAX. Note that we turn the arrow round, and this is a paradigmatic turn! The discourse interpretation activity looks as follows from a sheaf-theoretical point of view. The text X under interpretation is a given sequence of its sentences x1, x2, x3, x4, . . . , xn; this is a finite combinatorial object from the universe of syntax.
Over these sentences, there is another sequence of stalks of their contextual meanings Fx1, Fx2, Fx3, Fx4, . . . , Fxn; this is a potentially infinite and, to some degree, virtual object from the universe of semantics. The total disjoint union F of all these stalks is projected by a local homeomorphism p onto the text X. Thus we have the surjective projection p : F → X from semantics to syntax. The challenge of text interpretation is to create a global cross-section s of this projection p, which represents one of all the possible global meanings of the whole text X interpreted in the sense F. This sheaf-theoretical approach gives another response to the crucial questions of what the fragmentary meanings are and how they are formally composed. That is, we consider the reading process of a fragment U in a sense F as its covering by some family of subfragments (Uj)j∈J, each read in a single physical act. Any family (sj)j∈J of pairwise compatible fragmentary meanings sj ∈ F(Uj) gives rise, under the functional representation (1), to a family (ṡj)j∈J of genuine functions (where each ṡj is defined on Uj by (2)), which are pairwise compatible in the sense that ṡi|Ui∩Uj(x) = ṡj|Ui∩Uj(x) for all x ∈ Ui ∩ Uj. Let a cross-section ṡ be defined on U = ⋃j∈J Uj as ṡ(x) = ṡj(x) if x ∈ Uj for some j. Then this cross-section ṡ over U is clearly a composition of the family (ṡj)j∈J, as is claimed by Frege's generalized compositionality principle. This is a new sheaf-theoretical aspect of compositionality proposed in the present formal hermeneutics. This approach has the advantage that 1° it extends the area of semantics from the level of the isolated sentence to that of a whole text or discourse, and it gives a uniform treatment of discourse interpretation at each semantic level (word, sentence, text); 2° it takes into consideration the multiplicity of meanings of words, sentences and texts.
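In the functional representation, the composition just described is literally a pointwise union, as the following Python sketch illustrates (my illustration, with invented germs; cross-sections are modeled as dicts from sentences to germs).

def glue_cross_sections(family):
    # `family` maps each covering fragment U_j to a cross-section over U_j.
    glued = {}
    for Uj, sj in family.items():
        for x, g in sj.items():
            if x in glued and glued[x] != g:
                return None       # incompatible on an overlap: no composition
            glued[x] = g
    return glued                  # the composed cross-section over the union

family = {frozenset({1, 2}): {1: 'g1', 2: 'g2'},
          frozenset({2, 3}): {2: 'g2', 3: 'g3'}}
print(glue_cross_sections(family))  # {1: 'g1', 2: 'g2', 3: 'g3'}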
7 Frege Duality
For a given admissible text X, we have defined two categories formalizing the interpretative process: the Schleiermacher category Schl(X) of sheaves of fragmentary meanings and the category Context(X) of étale bundles of contextual meanings. Our intention now is to relate them to each other. We will first define a so-called germ-functor Λ. Let F be an arbitrary sheaf of fragmentary meanings, i. e. an object of the category Schl(X). We have already shown how to assign an étale bundle of contextual meanings (F, p) to F, where F = ⊔x∈X Fx and the projection p sends each contextual meaning germx s to the sentence x where it is taken. We set Λ(F) = (F, p) to define how the germ-functor Λ operates on sheaves. For a given morphism of sheaves φ : F → F′, the induced maps of stalks φx : Fx → F′x give rise to a continuous map Λ(φ) : ⊔x∈X Fx → ⊔x∈X F′x such that p′ ∘ Λ(φ) = p.
Given another morphism of sheaves ψ, one sees easily that Λ(ψ ∘ φ) = Λ(ψ) ∘ Λ(φ) and Λ(idF) = idF. Thus, we have constructed the desired germ-functor Λ : Schl(X) → Context(X). We will now define a so-called section-functor Γ. We start with the category Context(X). For simplicity, we denote a bundle (F, p) over X by F. For a bundle F, we denote by Γ(U, F) the set of all its cross-sections over U. For all nested opens U ⊂ V, one has a restriction map resV,U : Γ(V, F) → Γ(U, F) which operates as s ↦ s|U, where the cross-section s|U : U → F is defined as s|U(x) = s(x) for all x ∈ U. It is clear that resU,U = idΓ(U,F) for any open U ⊂ X, and that the transitivity resV,U ∘ resW,V = resW,U holds for all nested opens U ⊂ V ⊂ W of X. So we have constructed a presheaf (Γ(V, F), resV,U)U,V∈O(X), or simply Γ(F), which is obviously a sheaf. For any given morphism of bundles h : E → F, we have at once a map Γ(h)(U) : Γ(U, E) → Γ(U, F), defined in the obvious way as Γ(h)(U) : s ↦ h ∘ s. It is clear that the diagram

              Γ(h)(V)
  Γ(V, E) ────────────→ Γ(V, F)
     │                     │
   resV,U               res′V,U
     ↓                     ↓
  Γ(U, E) ────────────→ Γ(U, F)
              Γ(h)(U)
commutes for all opens U ⊂ V of X. Hence we have defined a morphism of sheaves Γ(h) : Γ(E) → Γ(F). Thus, we have constructed the desired section-functor Γ : Context(X) → Schl(X). In many sources, one can find an important general theorem about a dual adjunction between presheaves and bundles established by the section-functor Γ and the germ-functor Λ (Lambek & Scott, 1986, p. 179; Mac Lane & Moerdijk, 1992, p. 89); when both functors are restricted to the corresponding full subcategories of sheaves and étale bundles, it induces a dual equivalence of categories (a duality). In the linguistic situation, this important result yields the following:

Theorem (Frege Duality). The generalized compositionality and contextuality principles are formulated in terms of categories that are in natural duality

              Λ
           ─────→
  Schl(X)           Context(X)
           ←─────
              Γ

established by the section-functor Γ and the germ-functor Λ, which form a pair of adjoint functors.
As with many well-known classic dualities arising from dual adjunctions, Frege duality may be proven as an equivalence between full subcategories of sheaves and étale bundles arising from a dual adjunction between the categories of presheaves and bundles. However, a formal translation of such a proof into the linguistic situation compels us to give a semantic interpretation for the latter, vaster categories, too. In Prosorov (2004), we have outlined a proof of Frege duality which remains within the framework of the category of sheaves of fragmentary meanings and the category of étale bundles of contextual meanings. This proof has the advantage of operating throughout with notions that are all linguistically interpreted; so we can linguistically interpret not only the statement of the fundamental topological theorem, but we also have at our disposal a proof each step of which can be interpreted in the linguistic situation. Frege duality is of great theoretical importance in clarifying the nature of a fragmentary meaning and in answering a principal question: what are the fragmentary meanings? In particular, the Frege duality theorem states that every sheaf of fragmentary meanings is a sheaf of cross-sections of some étale bundle of contextual meanings. This gives rise to the functional representation of fragmentary meanings at each semantic level and permits us to establish an inductive theory of meaning (Prosorov, 2003, 2004) describing how the process of text understanding proceeds. Moreover, Frege duality may be considered in some sense as a reconciliation between the compositionality and contextuality principles. For any contextual bundle, the property of being étale is equivalent to the conjunction (E)&(Ct). For any presheaf, the property of being a sheaf is equivalent to the conjunction (S)&(C). Recall that the claim (Ct) is a generalization in the narrow sense of Frege's classic contextuality principle, and the claim (C) is a generalization in the narrow sense of Frege's classic compositionality principle. Separately, they seem to stand in rather difficult relations, but augmented with the corresponding notions of equality (E) and (S), they give rise to equivalent categories standing in adjunction in the strict mathematical sense. It is exactly in this sense that we consider compositionality and contextuality as adjoint principles.

References

Cassels, J. W. S. (1986). Local fields. Cambridge: Cambridge University Press.

Erné, M. (1991). The ABC of order and topology. In H. Herrlich & H.-E. Porst (Eds.), Research and exposition in mathematics: Vol. 18. Category theory at work (pp. 57–83). Berlin: Heldermann.

Hodges, W. (1999, November). Some mathematical aspects of compositionality. Retrieved February 22, 2005, from the Queen Mary and Westfield College Web site: http://www.maths.qmw.ac.uk/~wilfrid/tuebingen99.pdf.
Janssen, T. M. V. (1997). Compositionality (Chapter 7). In J. van Benthem & A. ter Meulen (Eds.), Handbook of logic and language (pp. 417–473). Amsterdam: Elsevier Science B.V.

Janssen, T. M. V. (2001). Frege, contextuality and compositionality. Journal of Logic, Language, and Information, 10, 115–136.

Lambek, J., & Scott, P. S. (1986). Introduction to higher order categorical logic. Cambridge: Cambridge University Press.

Mac Lane, S., & Moerdijk, I. (1992). Sheaves in geometry and logic: A first introduction to topos theory. New York: Springer-Verlag.

May, J. P. (2003). Finite topological spaces. Retrieved January 31, 2005, from http://www.math.uchicago.edu/~may/MISC/FiniteSpaces.pdf. (Notes for REU)

Prosorov, O. (1997). Critique de la raison herméneutique. Esquisse d'une herméneutique formelle [Critique of hermeneutical reason. Outline of formal hermeneutics]. Unpublished master's thesis, Collège universitaire français, St. Petersburg.

Prosorov, O. (2001). Esquisse d'une herméneutique formelle [Outline of formal hermeneutics]. Échos du Collège : Dialogue franco-russe, 2, 9–29.

Prosorov, O. (2002). Herméneutique formelle et principe de Frege généralisé [Formal hermeneutics and Frege's principle generalized]. Available from the Web site on text semantics Texto!: http://www.revue-texto.net/Inedits/Prosorov_Principe.pdf. Paris. (Published with the participation of the Ferdinand de Saussure Institute)

Prosorov, O. (2003). Formal hermeneutics and Frege duality (PDMI preprint No. 5/2003). St. Petersburg: Steklov Mathematical Institute.

Prosorov, O. (2004). Compositionnalité et contextualité, deux principes de Frege en adjonction [Compositionality and contextuality, two Frege principles in adjunction] (PDMI preprint No. 8/2004). St. Petersburg: Steklov Mathematical Institute.

Rastier, F. (1995). Communication ou transmission. Césure, 8, 151–195.

Serre, J.-P. (1979). Local fields (M. J. Greenberg, Trans.). New York: Springer-Verlag. (Original work published 1962)

Skirbekk, G., & Gilje, N. (1999). History of philosophy: An introduction to the philosophical roots of modernity. Oslo: Scandinavian University Press.

Stanley, R. P. (1986). Enumerative combinatorics, Vol. 1. Monterey, CA: Wadsworth & Brooks/Cole.
Tennison, B. R. (1975). Sheaf theory. Cambridge: Cambridge University Press.
Part II
COMPOSITIONALITY AND THE MIND
Perspectives, Compositionality and Complex Concepts

Nick Braisby
1 Introduction
The past 20 years or so have seen many different theoretical orientations on concepts and categorisation, and shifts from one position to another have been well documented (Medin, 1989; Margolis & Laurence, 1999; Murphy, 2002). Early approaches held to the classical idea that entities fall under a concept because they satisfy a definition (cf. Bruner, Goodnow & Austin, 1956). Definitions have a number of appealing properties. It is of special relevance to discussions of compositionality that definitions compose. If the definition of square gives the criteria for falling under the concept square, and the definition of red gives the criteria for falling under the concept red, then combining these definitions should give the criteria for falling under the complex concept red square. If we knew the defining features of squares and the defining features of red, we could take the union of these sets of features as the definition of the complex concept. In view of the story one could tell about compositionality, it is disappointing that the consensus is that the definitional view of concepts is just plain wrong. Many arguments have been presented and much evidence cited in support of the view that concepts are not definitions. Much of the evidence centres on what have become known as typicality effects. Not all members of a category are equally representative of it: exemplars that are more similar to a prototype elicit greater agreement as to their categorisation (McCloskey & Glucksberg, 1978), are learned earlier (Rosch, Simpson & Miller, 1976), have names that are produced with greater frequency as examples of the category (Mervis, Catlin & Rosch, 1976), are verified as category members more quickly (Rips, Shoben & Smith, 1973), and so on. The evidence seems to suggest that there are privileged (possibly ideal) members of a category called prototypes. Prototypes serve as
cognitive reference points (Rosch, 1975), and it is the process of computing similarity to the prototype that helps explain the various typicality effects that are observed (cf. Hampton, 1993), though accounts have also been offered in which similarity is computed to previously categorized exemplars (Medin & Schaffer, 1978; Kruschke, 1992; Nosofsky & Palmeri, 1997). From today's perspective, much of this theoretical development might be regarded as being of only historical relevance. Indeed, there have been a number of further theoretical shifts: it has been proposed that categorization is effected by theories (Murphy & Medin, 1985), for example, and that it betrays a belief in essentialism (Medin & Ortony, 1989; Gelman & Wellman, 1991). There is relatively little consensus, however, on which of these theoretical accounts fares best in explaining concepts. In this paper I will argue that consigning the classical-prototype debate too readily to the historical archive would be to miss some important lessons for the contemporary literature, lessons that have particular resonance for discussions of compositionality. I will then outline an account of concepts, RVC, which has been in development for some years and which seeks to incorporate positive features of a number of different theoretical frameworks. I will then spend the largest part of this paper showing how this account treats complex concepts and, by implication, how it treats the issue of compositionality.
2 Conceptual Problems
One way of framing the debates between different theoretical orientations on concepts is in terms of their characterization of phenomena as either conceptual or not conceptual. For example, an adherent of the classical view of concepts might contend that certain typicality effects are not really conceptual – rather than reflecting properties of concepts, those effects might arise because of the particular use of concepts under certain circumstances. Rather than being intrinsically attributed to concepts in general, the effects might be more properly attributed to the particular concept in question, its use, and/or to circumstance. Conversely, an adherent of prototype theory might contend that compositionality is not a property of concepts whose typicality effects apparently fail to compose. If the typicality of items as ‘pet fish’ cannot be composed from their typicality as ‘pets’ and their typicality as ‘fish’ then a theorist may judge that compositionality is not a hallmark of (complex) concepts. The question as to whether typicality is compositional, and the deeper related question as to whether concepts are compositional, has been the focus of many papers (e.g., Osherson & Smith, 1981; Kamp & Partee, 1995). One reason for this is surely that the question has implications for which phenomena we are to regard as conceptual and so, ultimately, for our understanding of the functions
and nature of concepts. With regard to typicality and compositionality, there appear to be three plausible stances one can adopt.

1. Both typicality effects and compositionality are intrinsic properties of concepts.

2. Typicality effects are, but compositionality is not, an intrinsic property of concepts.

3. Typicality effects are not, but compositionality is, an intrinsic property of concepts.

Each of these three stances suggests an obvious problem to be explained. The problem facing a theorist adopting the first is how to explain the apparent non-compositionality of typicality effects. A theorist adopting the second stance needs to explain how concepts appear to compose, how meaning can be given to complex expressions, and how concepts appear to be so productive and systematic (Fodor, 1994). Finally, a theorist adopting the third stance needs to explain how it is that the use of concepts can give rise to robust and reliable typicality effects. This article is not intended as an exhaustive analysis of the viability of each of these three stances, but rather as an exploration of the third. In the remainder of this paper, I outline a Relational View of Concepts (or RVC), a view that implicitly adopts the third stance, and the implications this has for our understanding of the compositionality of concepts.1

1 Of course there is also a fourth option, wherein both typicality and compositionality are regarded not to be properties of concepts. However, there appear to be no compelling reasons for pursuing such an account.
3 The Relational View of Concepts
RVC is a framework for describing concepts and their combination and builds on a number of ideas developed within the situation-theory (Barwise & Perry, 1983) and information-theoretic literatures (Barwise & Seligman, 1997; Cavedon, Blackburn, Braisby & Shimojima, 2000). The framework also has much in common with John Dupré's pragmatic realism (Dupré, 2002). There are two core, motivating intuitions behind the framework.

1. Categorization always takes place relative to a perspective, comprising a background of information and the intentions and beliefs of a categorizer (cf. Clark, 1996; Braisby & Franks, 1997).
2. Categorization is flexible – entities can be considered both to fall under a concept and not to fall under that same concept, depending on the perspective the categorizer adopts (Braisby, 1990).

Details of RVC have been published elsewhere (Braisby, 1993; 1998). The following description avoids all details of the computational implementation of the framework. Concepts are described somewhat formally in terms of schema-like structures that specify attributes and values. Nevertheless, the treatment of both single and complex concepts remains necessarily illustrative. The aim is to show that RVC respects a number of psychological and linguistic intuitions concerning concepts and their combination, and sheds some light on the difficult relationship between concepts and compositionality.

An overview

RVC respects the intuition that categorization is flexible by assuming that concepts have different content in different perspectives. In this, the framework denies what has been described as the mapping assumption (Braisby, 1990), the assumption that different (conventional) uses of a word map to the same conceptual content. In RVC, by contrast, different uses of the same word can map to different conceptual contents – sometimes they will map to default content, represented in the framework by objects we call CONCEPTS, and sometimes to non-default content, content that arises from combining those CONCEPTS.2 RVC's central features are as follows.

1. Default Content. CONCEPTS represent default content, content that describes prototypical exemplars or a category's central tendency.

2. Non-Default Content. Content that represents non-prototypical exemplars arises from combining CONCEPTS with RELATIONAL CONCEPTS.

3. Binary Classification. CONCEPTS and RELATIONAL CONCEPTS give rise to strictly binary categorisations – CONCEPTS are not fuzzy in the sense of reflecting graded category membership or fuzzy extensions (cf. Zadeh, 1965); they are more like definitions.

4. Perspectives. The process by which non-default content is generated is deemed to be sensitive to the perspective of the categoriser.3 Thus CONCEPTS can be thought of as definitions-in-a-perspective.
5. Coherence. An exemplar can be categorized as a member of a category if it either a) falls under the corresponding CONCEPT or b) satisfies non-default content formed from combining that CONCEPT with RELATIONAL CONCEPTS. Such RELATIONAL CONCEPTS will include operations and transformations over objects. The representation of a given category is therefore something that is achieved by an array of CONCEPTS and RELATIONAL CONCEPTS, which can be thought of as approximating a naive theory of the relevant domain (cf. Murphy & Medin, 1985).

These features of RVC mean that it can partially respect the core motivating intuitions behind some of the major approaches to concepts. Binary classification, for example, accords with the classical view's commitment to categorization being all-or-none. Default content is consistent with the claim that categories are based around cognitive reference points. Coherence fits with the claim that categorization is effected by some kind of theory. Having considered the central features of RVC, we now consider the nature of CONCEPTS in RVC.

Concepts in RVC

There are two types of CONCEPT within RVC: UNARY CONCEPTS (referred to just as CONCEPTS) and RELATIONAL CONCEPTS. CONCEPTS are modeled as three-place relations holding between (i) a set of individuals or category members, (ii) conditions that must hold for those individuals to fall under the CONCEPT, and (iii) the word that labels the concept. For simplicity, in the following analysis CONCEPTS are discussed solely in terms of the conditions that individuals must satisfy to fall under the CONCEPT. Consider an example. Suppose a categoriser's foreground situation (e.g., the situation they are attending to, be it an aspect of the immediate environment, or a recalled or imagined environment) contains a lemon, an object that for convenience we take to have the following features (or attributes and values).4

[shape: ovoid, taste: acidic, texture: rough, colour: yellow, isa: fruit]

2 This technical use of 'concept' is flagged by the use of small caps, while regular type continues to signal the non-technical use of 'concept'.

3 Braisby & Franks (1997) offer a more general account of the role of perspectives in categorization, in which categorisation is seen as embedded in a purposive, largely communicative context.
4 It is a convenience to think of objects and concepts in terms of features or attributes and values. Whether this is an appropriate representational vehicle for concepts, and what such features would be, remain difficult questions. The intention here is not to prejudge such questions, but to use attribute-value structures as an illustrative device.
The items to the left of the colon are attributes, while the items to the right of the colon are values. Thus, this particular lemon has ovoid shape, yellow colour, and so on. CONCEPTS can also be thought of as belonging to a situation, in this case the categoriser's background situation (cf. Barwise, 1985, 1989), and it is the contents of the background situation that go some way in RVC to model the role of perspectives. Pursuing the example of lemons, the CONCEPT thus expresses the default content associated with the concept lemon, and describes the prototype or cognitive reference point for the category. Suppose a categoriser possesses the following CONCEPT for lemon.

LEMON [shape: ovoid, taste: acidic, colour: yellow, isa: fruit]

Because classification in RVC is binary, and CONCEPTS are definition-like and specify only the properties of prototypical exemplars, it follows that CONCEPTS also only refer to prototypical category members. To show how non-prototypical category members also can be deemed to fall under the concept, we need to consider RELATIONAL CONCEPTS.

RELATIONAL CONCEPTS can also be modeled in terms of attribute-value structures, but with certain key differences. Consider the concept for flatten. It expresses a relationship between two states of affairs: one in which an individual has not been flattened, and one in which it has. In RVC, such a RELATIONAL CONCEPT is modeled as follows (the underscore marks a value left unspecified).

FLATTEN ( [shape: _, ...1] , [shape: flat, ...1] )

The co-indexed ellipses indicate that whatever else may be true of an object prior to flattening is also true of that object after flattening. In effect, the RELATIONAL CONCEPT models the case where flattening an object renders its shape flat, but preserves all other properties. While such a treatment is necessarily simplistic, it nonetheless offers a useful framework for thinking about concepts; even a little knowledge can go a long way.

RELATIONAL CONCEPTS and CONCEPTS can combine using the simple information-combining operation of unification. It is through such CONCEPT combinations that non-default contents become associated with a CONCEPT's label, and the label can apply to non-prototypical individuals. In the above example, the RELATIONAL CONCEPT for flatten can combine with the CONCEPT for lemon to give the combined CONCEPT for flattened lemon. At its most simple, the 'before' conditions of the RELATIONAL CONCEPT flatten can unify with the
conditions of the CONCEPT lemon, to produce a new set of 'after' conditions, identical except for the value of the shape attribute being specified as flat.

CONCEPTS can also combine via unification. A CONCEPT for lemon, for example, could combine with the CONCEPT for yellow – their two sets of conditions would unify to form conditions associated with the concept combination yellow lemon. Indeed, any two sets of conditions that do not contradict one another will unify. The critical question, of course, is how to explain concept combinations where the conditions of the two CONCEPTS do contradict one another, such as blue lemon or plastic lemon. To explain these cases we need to consider how categorization is modeled in RVC.

Categorisation in RVC

In RVC, there are two categorization rules.

1. An individual falls under a concept X if it satisfies the conditions of the associated CONCEPT X in the categoriser's perspective.

2. An individual falls under a concept X if it satisfies the conditions formed from combining the associated CONCEPT X with RELATIONAL CONCEPT(S) in the categoriser's perspective.

Thus, according to rule 1, an object can be said to be a lemon, or fall under the concept lemon, if it satisfies the conditions for the CONCEPT lemon, i.e., it is a prototypical lemon. According to rule 2, an object can also be said to be a lemon, or fall under the concept, if it satisfies conditions formed from combining the CONCEPT lemon with RELATIONAL CONCEPTS. Because of the operation of the two categorization rules, and because the second makes reference to RELATIONAL CONCEPTS, how an individual is categorized will depend on what CONCEPTS are in the categoriser's background situation or perspective. If the individual is prototypical of its category, it will be so categorized just so long as the agent's perspective contains the relevant CONCEPT. How non-prototypical members of a category are categorized will depend on whether the categoriser's perspective contains the relevant CONCEPT and contains appropriate RELATIONAL CONCEPTS. There is therefore an important asymmetry between the categorization of prototypical and non-prototypical individuals – whenever the latter can be categorized so can the former, but the converse does not hold. In RVC, the claim that the prototype forms a cognitive reference point amounts to this: with the appropriate CONCEPT, the categoriser is guaranteed to categorise prototypical members; without this CONCEPT, the categoriser is unable to categorise any category members.

It could be argued that combination with certain relational concepts actually prevents objects from falling under a concept. For example, combining the
concept ‘gun’ with the relational concept ‘fake’ describes something that is not a gun. If so, this would appear to be a prima facie violation of categorisation rule 2. However, the intuition that lies behind RVC is that there are perspectives in which it is true that a ‘fake gun’ is a ‘gun’, and perspectives in which it is not – presumably the reason why one might assert ‘a fake gun is not a real gun’. RVC and complex concepts Reflecting the different categorization rules, there are also two ways in which the content or conditions associated with complex concepts can arise. Suppose (the linguistic expression of) a complex concept contains a modifier (M) and head (H). For present purposes we will not distinguish the case where the modifier is an adjectival construction from that where it is a nominal construction. Then, assuming the head and modifier label unary CONCEPTS, the conditions for the complex concept M H are given by one of two rules. 1. The conditions for a complex concept M H are given by the unification of the conditions for the CONCEPT M and the conditions for CONCEPT H (where these conditions unify) in the categoriser’s perspective. 2. The conditions for a complex concept M H are given by the unification of the conditions for the CONCEPT M with the conditions formed from combining the CONCEPT H with a RELATIONAL CONCEPT ( S ) in the categoriser’s perspective. Though RVC is not intended as a process account of categorization, it would be possible to think of these two rules for complex concepts as suggesting a twostage process. First, the categoriser may try to unify the conditions associated with head and modifier concepts (in effect, trying to construct an interpretation based on their default content). Where this fails, the categoriser attempts to find a non-default interpretation of the head concept in order that this can unify with the conditions associated with the modifier concept. This process interpretation may be a convenient way of thinking about RVC, but it is unlikely to be wholly correct – in some perspectives, the categoriser may show a preference for nondefault interpretations. Concept combination rule 1 accounts for some intersective combinations, in particular those where the modifier either (re-)asserts or fails to contradict information in the prototype corresponding to the head. The example offered earlier of yellow lemon is just such a combination. The modifier re-asserts prototype information of the head CONCEPT, and the two sets of conditions can unify straightforwardly. In effect, the rule means that an individual can fall under a complex modifier-head concept provided it falls under both of its constituents.
Since RVC takes CONCEPTS to represent default content, some complex concepts that would conventionally be considered as intersective require a different analysis. The concept blue lemon would normally be considered intersective – conventionally its extension is the intersection of the sets of blue things and lemons. Within RVC, however, in certain perspectives, this combination is non-predicating or non-intersective – in a perspective containing the CONCEPT for lemon and no RELATIONAL CONCEPTS, a blue lemon would be a contradictio in adjectio (since the two sets of associated conditions, shown below, contradict one another with respect to the colour attribute and so fail to unify).

BLUE [colour: blue]

LEMON [shape: ovoid, taste: acidic, colour: yellow, isa: fruit]

To derive appropriate interpretations for complex concepts such as these, concept combination rule 2 is required. In effect, this rule implies that an individual will fall under a complex concept provided it falls under the modifier CONCEPT, falls under a non-default interpretation of the head, and provided their associated conditions can unify. In the example of blue lemon an interpretation can be derived provided there is a RELATIONAL CONCEPT whose 'before' conditions unify with those for lemon and whose 'after' conditions unify with those for blue. What kind of RELATIONAL CONCEPT would be required? Consider the RELATIONAL CONCEPT paint. The 'after' conditions will be identical to the 'before' conditions, except that the colour attribute will be of indeterminate value: 'painting a wall' results in nothing more than the wall being of a particular (unspecified) colour. Thus the RELATIONAL CONCEPT relates the two following sets of conditions (again, the co-indexed ellipses indicate that whatever else may be true of an object prior to painting is also true of that object afterwards).

PAINT ( [colour: _, ...1] , [colour: _, ...1] )

In a perspective containing the CONCEPTS blue, lemon and paint, the 'before' conditions for paint can unify with the conditions for lemon, to give the following conditions for lemon' – the prime (') indicates that this is derived or non-default content.

LEMON' [shape: ovoid, taste: acidic, colour: _, isa: fruit]
These conditions give an interpretation of lemon that effectively reflects the fact that a lemon qua painted lemon can be any colour. These conditions, 'after' conditions, can unify with the conditions for blue, yielding the following conditions for blue lemon.

BLUE LEMON [shape: ovoid, taste: acidic, colour: blue, isa: fruit]

Intuitively, the complex concept is treated as a complex nominal (cf. Levi, 1978), and in effect, from this perspective, blue lemon can be thought of as a short form, maybe for lemon that has been painted blue or blue painted lemon. The interpretation, however, is only possible because of the RELATIONAL CONCEPT: without it, the other CONCEPTS simply could not combine. Note also that the RELATIONAL CONCEPT paint is not a constituent of the complex concept. Rather, the relational concept furnishes the categoriser with a way of thinking about the objects involved: the categoriser can think about lemons as entities that have been painted, and this way of thinking allows an interpretation of the combination blue lemon. Of course, only the number of different RELATIONAL CONCEPTS that the categoriser can bring to mind limits the number of different interpretations. The categoriser need not attend to such RELATIONAL CONCEPTS, nor be aware of them, to be provided with these ways of thinking. The explicitness of RELATIONAL CONCEPTS is an issue we return to in the discussion.

It is worth noting that blue lemon is treated differently depending on the categoriser's perspective. With respect to certain perspectives blue lemon is non-predicating: in a perspective with no RELATIONAL CONCEPTS the combination is a contradictio in adjectio. However, with respect to other perspectives, such as those containing the RELATIONAL CONCEPT paint, blue lemon is strictly predicating. Whether or not adjectives that have been claimed to be predicating (such as blue) are truly predicating is, in RVC, a matter of perspective.

RVC also offers a similar account of noun-noun combinations such as stone lion. Suppose the conditions associated with the relevant concepts are as follows.

STONE [animacy: -]

LION [animacy: +, shape: lion-shaped]

These are of course minimal conditions – there is much more that one can say about even prototypical lions and prototypical stone things – but these conditions suffice to illustrate how such difficult combinations as these (cf. Franks,
1995; Kamp & Partee, 1995) can be analysed. As before, in a perspective containing no RELATIONAL CONCEPTS, the two sets of conditions fail to unify; they contradict one another on the attribute of animacy. As before, stone lion would express a contradiction from this kind of perspective. However, categorisers may possess RELATIONAL CONCEPTS that relate animate and inanimate versions. A statue, for example, is an inanimate version of an animate being. Thus, we can think of the RELATIONAL CONCEPT statue of as describing a relation between the two following sets of conditions.

STATUE OF ( [animacy: -, ...1] , [animacy: +, ...1] )

This RELATIONAL CONCEPT does not describe a transformation, unless we think of the process of creating a statue, and so the two sets of conditions cannot be thought of as 'before' and 'after' conditions. We should perhaps think of the two sets of conditions as reflecting the 'representation' and the 'represented'. This RELATIONAL CONCEPT can combine with the CONCEPT lion to form a new set of conditions as follows.

LION' [animacy: -, shape: lion-shaped]

These conditions reflect the fact that a lion qua statue of a lion is inanimate. These conditions, 'representation' conditions, can now unify with the conditions for stone, yielding the (same) following conditions for stone lion.

STONE LION [animacy: -, shape: lion-shaped]

As before, the RELATIONAL CONCEPT provides the categoriser with a way of thinking about lions, a way of thinking that allows them to make sense of the combination stone lion. Of course, there are a few other RELATIONAL CONCEPTS that could do the same job: fossilize and petrify are two obvious concepts that come to mind. It is an implication of RVC that a categoriser equipped with no such relational concepts could generate no interpretation for stone lion. Is there a way of thinking about stone lions wherein the notions of representation, fossilization or petrification are not at play? One could try answering this question by posing a riddle: 'I am a lion but I am also made of stone. I am not a representation of a lion, nor am I a real lion that has been fossilized or petrified. What am I?' The riddle shows how interpretations of the combination rely on relational concepts: without them no interpretation is possible.5
5 In fact, there are other interpretations, though none are plausible. It could be that a categoriser would interpret stone lion through modifying their existing CONCEPTS of stone and lion. For example, the concept of lion could be modified to allow inanimacy as a permissible attribute. However, this would make the concept misleading – real lions are not inanimate. Equally misleading would be to suggest modifying the concept of stone to allow animacy as a permissible attribute.
A similar treatment can be offered for apparent nominal compounds such as telephone man.6 This can also be interpreted through the use of relational concepts. Suppose the CONCEPTS of telephone and man are, minimally, indicated by the following conditions.

TELEPHONE [isa: machine]

MAN [isa: human, sex: male]

As before, the question arises as to the nature of RELATIONAL CONCEPTS that might relate these two CONCEPTS. Two obvious candidates are the relations of repairing and selling, and we can schematically indicate their content as follows.

REPAIRS ( [isa: machine, ...1] , [isa: human, ...1] )
SELLS ( [isa: machine, ...1] , [isa: human, ...1] )
Again, the RELATIONAL CONCEPTS provide conditions that can unify separately with the modifier (telephone) and head (man), yielding an interpretation of the combination either as a man that sells telephones or a man that repairs telephones.

6 Although telephone man is unlikely to possess a specialised meaning, as Levi (1978) indicates, this alone is unlikely to be an adequate criterion for judging nominal compounds. Telephone man appears to pass two other criteria that have been thought to individuate nominal compounds. These are the possession of fronted stress (i.e., on telephone) and the presumption of a permanent or habitual relation between the referents of the constituents. That is, telephone man can refer to a man whose job it is to repair or sell telephones, but is unlikely to successfully refer to a man who only occasionally uses a telephone.
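The stone lion and telephone man cases run on the same machinery. The sketch below repeats the two helper functions from the earlier block so that it runs on its own; again, every encoding here is an illustrative assumption rather than the chapter's own formalism.

```python
# Stone lion and telephone man via unification (helpers repeated from the
# sketch above so that this block is self-contained).

def unify(c1, c2):
    result = dict(c1)
    for attr, val in c2.items():
        if attr not in result or result[attr] is None:
            result[attr] = val
        elif val is None or result[attr] == val:
            continue
        else:
            return None
    return result

def apply_relational(rel, concept):
    first, second = rel
    matched = unify(first, concept)
    if matched is None:
        return None
    derived = dict(matched)
    derived.update(second)
    return derived

STONE = {'animacy': '-'}
LION = {'animacy': '+', 'shape': 'lion-shaped'}
STATUE_OF = ({'animacy': '+'}, {'animacy': '-'})  # represented -> representation

# STATUE OF turns the animate LION into an inanimate derived LION, which
# then unifies with STONE:
lion_derived = apply_relational(STATUE_OF, LION)
print(unify(STONE, lion_derived))   # {'animacy': '-', 'shape': 'lion-shaped'}

TELEPHONE = {'isa': 'machine'}
MAN = {'isa': 'human', 'sex': 'male'}
REPAIRS = ({'isa': 'machine'}, {'isa': 'human'})  # repaired item -> repairer

# The machine side unifies with TELEPHONE; the derived human-side conditions
# then unify with MAN, giving the 'man that repairs telephones' reading.
print(unify(MAN, apply_relational(REPAIRS, TELEPHONE)))
# {'isa': 'human', 'sex': 'male'}
```

Note that, as the text stresses, STATUE_OF and REPAIRS appear here only as background machinery for deriving conditions; they are not themselves constituents of the combined concepts.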
4 Discussion
This paper has offered RVC as a useful way forward in thinking about the compositionality of complex concepts. RVC suggests ways in which plausible interpretations of apparently non-compositional complex concepts can be derived.
However, there are two outstanding questions. The first concerns the analysis of typicality effects, something that has not thus far been mentioned in the exposition of RVC. The second concerns the over-riding issue of compositionality. If even apparently intersective combinations such as blue lemon involve relational concepts, then to what extent are complex concepts in general compositional? In this discussion, I want to defend two propositions concerning these issues. First, I shall argue that some, but not all, typicality effects emerge as natural by-products of RVC's categorization rules. Second, I shall argue that the notion of perspective allows us to reconcile the claim that relational concepts are involved in the interpretation of many complex concepts with the claim that those complex concepts are nonetheless compositional.

Typicality

Typicality appears to present a particular difficulty to the development of a compositional theory of concepts. If typicality is compositional, then the typicality effects of complex concepts should be predictable from those of their constituents. However, much evidence suggests that typicality effects are not so predictable. The example of pet fish has been given already. A typical pet is probably a cat or dog; a typical fish, a salmon or trout, perhaps. Yet, a typical pet fish is probably a goldfish, neither a typical pet, nor a typical fish. Thus typicality in the combination appears not to be predictable from typicality in the constituents. Numerous similar cases have been cited, and arguments presented to the effect that typicality effects are not compositional (e.g., Fodor & Lepore, 2002). Earlier, I raised the question as to what properties of concepts we should regard as conceptual properties, properties that are constitutive of concepts, and those that are non-conceptual but are perhaps merely associated with concepts. In the preceding discussion of RVC and its treatment of single and complex concepts, typicality has been taken to be a non-conceptual property, and not something that should necessarily be treated as an aspect of concepts. Nevertheless, it is important that any such approach to concepts should offer some indication as to how typicality effects might arise. RVC regards categorization as something that is effected relative to a perspective – a categoriser brings to a categorization situation a stock of relevant concepts, and can combine these in order to categorise items, infer the properties of category members, and interpret complex concepts. In RVC, there are broadly two means by which categorisers might judge or derive typicality-related information: intra-perspective and inter-perspective typicalities (Braisby, 1993). Intra-perspective typicality reflects differences in the associative strength between
attributes and a concept within a given perspective, or background situation. For example, a categoriser possessing just the three concepts lemon, flatten and paint can entertain six different perspectives, as shown in Figure 1. Each perspective is defined by the availability of a subset of these three concepts. The figure illustrates the different conditions that attach to the concept lemon for each perspective: note that some perspectives do not allow items to be categorized as lemons.
[Figure 1 appears here: six boxes, one per perspective, labelled with the concepts available in each – {Lemon, Flatten, Painted}, {Lemon, Flatten}, {Lemon, Painted}, {Lemon}, {Flatten} and {Painted} – with arrows marking the four perspectives in which items can be categorized as lemons.]
Figure 1: Conditions attaching to the concept lemon in 6 different perspectives. Perspectives are depicted by boxes, each indicating the concepts available within that perspective.

The topmost perspective in the figure is one defined by the availability of all three CONCEPTS. Thus, within that perspective, there are three different conditions attached to the concept lemon, reflecting the CONCEPT lemon and two concept combinations – lemon combined with paint, and lemon combined with flatten. These three sets of conditions are referred to as L, L' and L'' and shown below.7

L = [shape: ovoid, taste: acidic, colour: yellow, isa: fruit]
7 For simplicity, I have omitted a fourth condition, which reflects the combination of lemon with flatten and paint – a lemon could be flattened and painted, therefore having both a flat shape and any colour.
L' = [shape: ovoid, taste: acidic, colour: _, isa: fruit]

L'' = [shape: flat, taste: acidic, colour: yellow, isa: fruit]
Considering the different conditions available within the topmost perspective in Figure 1, an item is a lemon in that perspective provided it satisfies any of the three different conditions L, L' or L'' (i.e., the disjunction L ∨ L' ∨ L''). This means that to be categorized as a lemon an item would have to be a fruit and acidic, but it could be either ovoid or flat (though ovoid is the more strongly associated property, being reflected in two of the three sets of conditions), and it could be any colour (though yellow is also more strongly associated, being reflected in two of the three conditions). The different degrees of association between the attributes and the positive categorization indicate how typicality effects might arise from within a single perspective. A prototypical lemon having ovoid shape and yellow colour is most strongly associated with a positive categorization, while flat lemons and non-yellow lemons are less strongly associated. Inter-perspective typicality, by contrast, refers to different degrees of association between attributes and a categorization when a range of different perspectives or background situations is considered. For example, a categoriser may shift perspective and thereby consider categorization judgments from different perspectives. In Figure 1, there are four perspectives that allow a categoriser to judge items to be lemons (the perspectives with arrows pointing to them). A prototypical lemon would be categorized as a lemon in all four (reflecting the condition L); a painted lemon would be categorized as a lemon in two of the four (L'); a flat lemon would be categorized as a lemon in two of the four also (L''); and a flat painted lemon would be categorized as a lemon in just one, the topmost perspective (see footnote 7). Thus, by considering a range of different perspectives, it would be possible to induce a typicality ordering on the category, where the prototype for the category is at one extreme, and items with fewest properties in common with the prototype are at the other. In general, the two measures, intra- and inter-perspective typicality, do not coincide.
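These two measures can be made concrete with a short, self-contained sketch. The condition sets below transcribe L, L' and L'' from the text (None plays the role of the blank, indeterminate colour value), and the perspective inventory follows Figure 1, omitting footnote 7's fourth condition just as the text does. All names are illustrative assumptions.

```python
# Inter-perspective typicality for the lemon example of Figure 1.

L  = {'shape': 'ovoid', 'taste': 'acidic', 'colour': 'yellow', 'isa': 'fruit'}
L1 = {'shape': 'ovoid', 'taste': 'acidic', 'colour': None, 'isa': 'fruit'}     # L': painted
L2 = {'shape': 'flat', 'taste': 'acidic', 'colour': 'yellow', 'isa': 'fruit'}  # L'': flattened

# The four perspectives that allow lemon-categorization, each listed with
# the conditions attached to 'lemon' within it (cf. footnote 7):
PERSPECTIVES = {
    'lemon': [L],
    'lemon+flatten': [L, L2],
    'lemon+paint': [L, L1],
    'lemon+flatten+paint': [L, L1, L2],
}

def satisfies(item, cond):
    """An item satisfies a condition set if it matches every specified value."""
    return all(v is None or item.get(a) == v for a, v in cond.items())

def inter_typicality(item):
    """Number of perspectives in which the item is categorized as a lemon:
    within each perspective, the condition sets apply disjunctively."""
    return sum(any(satisfies(item, c) for c in conds)
               for conds in PERSPECTIVES.values())

prototypical = dict(L)
painted = dict(L, colour='blue')
flattened = dict(L, shape='flat')
print([inter_typicality(x) for x in (prototypical, painted, flattened)])
# [4, 2, 2] - the ordering described in the text
```

Intra-perspective typicality would instead be read off within a single perspective – for instance, by counting how many of that perspective's condition sets reflect a given attribute value, as with ovoid and yellow in the topmost perspective above.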
How do these two measures of typicality relate to the question of the compositionality of concepts? In RVC, the question of whether a combination such as stone lion is compositional is a question of whether their conditions can unify, and this is a matter of perspective. There are three possibilities.

1. From a perspective containing suitable RELATIONAL CONCEPTS (such as statue), the conditions can be unified in the manner outlined earlier, and the combination can be seen to be compositional.

2. From a perspective that contains no appropriate relational concept, the combination can still be seen as compositional – however, as the constituent conditions cannot combine, the only interpretation is a contradiction.

3. Still other perspectives will fail to contain either constituent of the combination, and here the question of their compositionality does not arise.

But what of typicality effects? Since typicality might arise from RVC reflecting differing degrees of association between attributes and positive categorizations, either intra- or inter-perspective, there is simply no requirement, nor in general any expectation, that such typicality effects will be compositional. Indeed, while typicality effects can be inter-perspectival, concepts combine only within perspectives; the range of perspectives that are reflected in the typicality gradients of constituents will in general not coincide with the perspectives in which the complex concept can be given an interpretation. There are other reasons to believe that typicality effects would not be compositional. Reflecting possibility 2), where perspectives contain the constituent concepts but not an appropriate relational concept, the combination is treated as a contradictio in adjectio. Even though no sensible interpretation is obtained under such perspectives, they may still contribute to the inter-perspective typicality of both constituents. Similarly, reflecting possibility 3), there will be some perspectives that fail to contain both constituents. While, relative to this kind of perspective, the question of compositionality does not arise, again such a perspective may still contribute to the inter-perspective typicality of a constituent concept. RVC also expresses feature co-occurrence restrictions. Where relational concepts are needed to ensure the constituents of a complex concept combine, the same relational concepts in effect restrict the attributes of those constituents. The relational concepts will in general prevent the attributes of complex concepts enjoying the same range of variation as they enjoy in the constituent concepts. Suppose, for example, that a lemon may be black or yellow, or sweet or acidic. It may be that a lemon painted black tastes the same as other lemons, while a lemon that has become black through decay will also change in taste. Perhaps a black, decayed lemon is especially sweet. Whether and what kind of feature co-occurrence restrictions are then expressed is a matter of which RELATIONAL CONCEPT is involved in the categorisation. In general, though, the complex concept black lemon will not enjoy the same variation in the taste attribute as lemons in general will. Such feature co-occurrence restrictions
suggest that typicality for a complex concept is influenced by factors that do not necessarily influence typicality for its constituents. In general, RVC suggests that the factors that influence the typicality for complex concepts are not identical with those that influence the typicality for their constituents. To the extent that RVC offers a plausible account of complex concepts, we should not expect typicality to be compositional. Indeed, typicality effects on this view are not really conceptual properties – they are not constitutive of concepts, but arise from their perspectival nature and the ability of categorisers to consider variation both within and across different perspectives. In support of this position, consider two well-known examples where typicality has appeared to pose difficulties for accounts of concepts: the 'pet fish' case and the 'wooden spoon' case. In the former, even though typicality information might be derived for the complex concept, there is no reason to believe this would be predictable from the typicality information that might be derived for the constituent concepts alone. The fact that a guppy or goldfish is more typical than a salmon or trout is, on this account, not a conceptual phenomenon. And this seems right. There are many reasons why goldfish make better pets than salmon do, such as their relative cost, size and needs. But were these circumstances to change – perhaps a disease would wipe out most goldfish, rendering them prohibitively expensive and difficult to keep – perhaps salmon, after all, would become a more frequently kept pet. But surely this would not imply a conceptual change. Likewise, were fashions to change, and large spoons were more typically made of metal, and small spoons of wood, this would surely not signal a change in our concepts. Although RVC suggests ways in which typicality orderings could be induced, it nonetheless treats typicality as not being an intrinsic property of concepts.

Category membership

The literature on concepts is replete with examples of complex concepts whose category membership is not compositional. Examples that appear to support that conclusion include non-intersective and non-predicating combinations such as 'ocean winds,' 'coffee cup,' 'electric shock,' 'tax lawyer,' 'traffic policeman,' and so on (cf. Kay & Zimmer, 1976; Levi, 1978; Murphy, 1988; Shoben, 1991, among others). It is difficult to dismiss such examples as idiomatic or non-productive, and so not requiring a compositional treatment. In a way, the problem with such concepts is to explain their productivity while acknowledging their apparent non-compositionality. According to the RVC account, it can be seen that compositionality is itself a perspectival notion. Provided a perspective contains appropriate relational concepts, complex concepts may, relative to that perspective, be compositional.
Relative to other perspectives, the same complex concepts may be non-compositional. For example, relative to a perspective containing only the concepts lion and stone, it is not possible to derive a compositional interpretation (other than a contradiction) for stone lion; relative to a perspective containing the relational concept statue, which allows the concept lion to express a non-default content, stone lion is compositional. Moreover, within that perspective stone lion is an intersective combination. Similar remarks can be made concerning all of the previous examples. Relative to some perspectives complex concepts cannot be interpreted (other than as self-contradictions); relative to other perspectives, the same complex concepts can be given an intersective and compositional interpretation. RVC thus suggests a way of reconciling the apparent non-compositionality of complex concepts with their indubitable productivity. If concepts are perspectival, then compositionality is not strictly a property that can be possessed by complex concepts – it is a property that can be possessed only by pairs of complex concepts and perspectives. Ignoring the perspectival nature of concepts would likely lead to inconsistent conclusions concerning compositionality, since a complex concept may be compositional in one perspective and non-compositional in another. Within a perspective, category membership is strictly binary, or all-or-none, and a complex concept will either be compositional, and given an intersective interpretation, or non-compositional (perhaps expressing a contradiction). If category membership is considered across different perspectives, then a different picture would present itself. There would appear to be confusion: there may be some clear category members (prototypical members would be members in all perspectives), there would be some clear non-members (non-members in all perspectives), but mostly there would be intermediate cases (members in some perspectives and non-members in others). The picture would emerge of fuzzy or indeterminate category boundaries, and one might be tempted to conclude that category membership was graded. But such a conclusion would fail to reflect the true source of fuzziness – not indeterminate category boundaries, but binary and perspectival categorisation.

Perspectives and relational concepts

The question arises from the RVC treatment of complex concepts as to whether it is genuinely a compositional account. After all, it could be argued, the account supposes that relational concepts are required in order to derive interpretations for many complex concepts. Yet relational concepts are themselves not constituents of the complex concept. Hence the account must be non-compositional. This argument presupposes that relational concepts in RVC are semantic constituents of complex concepts in exactly the same way as the syntactically realised constituents.
Such an argument violates one of the assumptions that underpin RVC, the assumption that perspectives furnish a way of thinking, or way of categorizing, that does not itself figure as a constituent in complex concepts. To explain this, we need to consider the late Jon Barwise's analysis of inference and information flow relative to background conditions. Barwise gave the example of time zones (Barwise, 1989, p. 241). Within a time zone, one need never be aware of the dependence on time zones of the relation between the reading on a wrist-watch and the current (local) time. Only when one crosses a time zone does this dependence become apparent, and only then does the time zone itself figure as a constituent in the inference from the reading on the watch to the current time. The critical point is that the time zone is part of the background situation. One can happily live one's life never knowing about or understanding time zones; such a lack of knowledge is no impediment to reading one's watch and telling the time. Only when one crosses a time zone does this previously unarticulated aspect of the background become an articulated constituent. In much the same way, RVC assumes that perspectives allow concepts to remain in the background – they furnish the categoriser with ways of thinking about the world. It is only when the categoriser has to consider or entertain different perspectives that the reliance on relational concepts becomes explicit. Until then, within a perspective, the reliance of categorization on relational concepts is just an unarticulated way of thinking. Barwise considered the time zone to be an environmental constant that could be exploited by human reasoning – instead of having to explicitly mentally represent this feature as a constituent of inferences, within the right background it could be taken for granted. Similarly, relational concepts can be thought of as perspectival or cognitive constants: only when one shifts perspective, when a relational concept becomes present or becomes absent, when its effects come and go, does the relational concept figure as a constituent. Hence, RVC points to a different model of compositionality, one inspired by Barwise's work, and one in which perspectives play an ineliminable role. Provided we are in the right background, the right perspective, then the appropriate relational concepts are cognitive constants – they are not constituents of complex concepts even when they are involved in deriving an interpretation. Complex concepts that involve such relational concepts need not be seen as non-compositional.
5 Conclusion
In this paper, I have presented RVC, an account of concepts and categorization which regards concepts as binary and categorization as perspectival. With two categorization rules and two related rules for deriving interpretations for
complex concepts, a number of problematic cases for a compositional account of concepts can be accounted for. In this account, relational concepts play a central role in categorization but, I have argued, should not be confused with explicit constituents and so should not be seen to undermine the claim that concepts are compositional. RVC also offers an explanation of some typicality effects, and a number of reasons why we would not expect typicality effects to be compositional. In effect, typicality is not seen as a conceptual property – not something that is constitutive of concepts, but something that arises from a categoriser considering the different degrees of association between attributes and positive categorizations. Such considerations may apply within (intra) and across (inter) perspectives. One objection to RVC could be that it apparently treats most concepts and complex concepts as highly polysemous. There are reasons to believe that this is less of a problem than it might at first seem. RVC is motivated in part by a wider literature concerning the role of perspectives in communication. Clark (1996), for example, offers a detailed analysis of the 'communal common ground' that he claims is vital for successful communication. Clark & Wilkes-Gibbs (1986) demonstrate how co-ordination between speaker and hearer can allow the same object to be referred to by increasingly abbreviated descriptions. So, a shape that might be referred to as 'the shape that looks like an ice-skater on one leg' might be referred to, after further speaker-hearer exchanges, as 'the ice-skater.' Just as one would not wish to use this observation to mount the claim that ice-skater is multiply polysemous, so too with complex concepts. RVC does not really treat them as polysemous either. Instead, it offers mechanisms for recovering non-default interpretations for concepts. Exactly how speaker and hearer co-ordinate their communications, that is, co-ordinate their perspectives, remains to be worked out. However, progress has been forthcoming in analysing co-ordination at different levels of dialogue (cf. Garrod & Pickering, 2004), and work has begun on specifying the components of what have here been termed perspectives (e.g., Clark & Carlson, 1981; Braisby & Franks, 1997).
References

Barwise, J. (1985). The situation in logic II: Conditionals and conditional information (Tech. Rep. No. CSLI 85 21). Center for the Study of Language and Information, Stanford University.

Barwise, J. (1989). Information and circumstance. In The situation in logic (Vol. 17, pp. 137–154). Chicago, IL: University of Chicago Press.
Barwise, J., & Perry, J. (1983). Situations and attitudes. Cambridge, MA: MIT Press/Bradford Books.

Barwise, J., & Seligman, J. (1997). Information flow: The logic of distributed systems. Cambridge, UK: Cambridge University Press.

Braisby, N. (1990). Situating word meaning. In R. Cooper, K. Mukai, & J. Perry (Eds.), Situation theory and its applications (Vol. I, pp. 315–341). Stanford, CA: Center for the Study of Language and Information.

Braisby, N. (1993). Stable concepts and context-sensitive classification. Irish Journal of Psychology, 14(3), 426–441.

Braisby, N. (1998). Compositionality and the modeling of complex concepts. Minds and Machines, 8, 479–508.

Braisby, N., & Franks, B. (1997). What does word use tell us about conceptual content? Psychology of Language and Communication, 1(2), 5–16.

Bruner, J. S., Goodnow, J. J., & Austin, G. A. (1956). A study of thinking. New York: John Wiley.

Cavedon, L., Blackburn, P., Braisby, N., & Shimojima, A. (Eds.). (2000). Language, logic and computation (Vol. 2). Stanford, CA: CSLI.

Clark, H. H. (1996). Using language. Cambridge, UK: Cambridge University Press.

Clark, H. H., & Carlson, T. B. (1981). Context for comprehension. In J. Long & A. Baddeley (Eds.), Attention and performance (Vol. 9, pp. 313–330). Hillsdale, NJ: Lawrence Erlbaum Associates.

Clark, H. H., & Wilkes-Gibbs, D. (1986). Referring as a collaborative process. Cognition, 22, 1–39.

Dupré, J. (2002). Humans and other animals. Oxford: Clarendon Press.

Fodor, J. A. (1994). Concepts: A pot-boiler. Cognition, 50, 95–113.

Fodor, J. A., & Lepore, E. (2002). The compositionality papers. Oxford: Clarendon Press.

Franks, B. (1995). Sense generation: A 'quasi-classical' approach to concepts and concept combination. Cognitive Science, 19(4), 441–505.

Garrod, S., & Pickering, M. J. (2004). Why is conversation so easy? Trends in Cognitive Sciences, 8, 8–11.
Gelman, S. A., & Wellman, H. M. (1991). Insides and essences: Early understandings of the nonobvious. Cognition, 38, 213–244.

Hampton, J. A. (1993). Prototype models of concept representations. In I. Van Mechelen, J. A. Hampton, R. Michalski, & P. Theuns (Eds.), Categories and concepts: Theoretical views and inductive data analysis (pp. 67–95). London: Academic Press.

Kamp, H., & Partee, B. (1995). Prototype theory and compositionality. Cognition, 57(2), 129–191.

Kay, P., & Zimmer, K. (1976). On the semantics of compounds and genitives in English. In Sixth Annual Meeting of the California Linguistics Association. San Diego, CA.

Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22–44.

Levi, J. N. (1978). The syntax and semantics of complex nominals. London: Academic Press.

Margolis, E., & Laurence, S. (Eds.). (1999). Concepts: Core readings. Cambridge, MA: MIT Press.

McCloskey, M., & Glucksberg, S. (1978). Natural categories: Well defined or fuzzy sets? Memory and Cognition, 6, 462–472.

Medin, D., & Ortony, A. (1989). Psychological essentialism. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning. Cambridge: Cambridge University Press.

Medin, D. L. (1989). Concepts and conceptual structure. American Psychologist, 44(12), 1469–1481.

Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207–238.

Mervis, C. B., Catlin, J., & Rosch, E. (1976). Relationships among goodness-of-example, category norms, and word frequency. Bulletin of the Psychonomic Society, 7, 283–284.

Murphy, G. L. (1988). Comprehending complex concepts. Cognitive Science, 12, 529–562.

Murphy, G. L. (2002). The big book of concepts. Cambridge, MA: MIT Press.

Murphy, G. L., & Medin, D. L. (1985). The role of theories in conceptual coherence. Psychological Review, 92, 289–316.
Nosofsky, R. M., & Palmeri, T. J. (1997). An exemplar-based random walk model of speeded categorization. Psychological Review, 104, 266–300.

Osherson, D. N., & Smith, E. E. (1981). On the adequacy of prototype theory as a theory of concepts. Cognition, 9, 35–58.

Rips, L. J., Shoben, E. J., & Smith, E. E. (1973). Semantic distance and the verification of semantic relations. Journal of Verbal Learning and Verbal Behaviour, 12, 1–20.

Rosch, E. (1975). Cognitive representation of semantic categories. Journal of Experimental Psychology: General, 104, 192–233.

Rosch, E., Simpson, C., & Miller, R. S. (1976). Structural bases of typicality effects. Journal of Experimental Psychology: Human Perception and Performance, 2, 491–502.

Shoben, E. J. (1991). Predicating and non-predicating combinations. In P. J. Schwanenflugel (Ed.), The psychology of word meanings (pp. 117–135). Hillsdale, NJ: Lawrence Erlbaum Associates.

Zadeh, L. (1965). Fuzzy sets. Information and Control, 8, 338–353.
Compositionality and the Pragmatics of Conceptual Combination

Fintan J. Costello and Mark T. Keane

The principle of compositionality, as normally defined, states that the meanings of complex linguistic expressions (such as phrases or sentences) are determined solely and completely by the meanings of their component words and the structural relations between those words. According to this principle, anyone who understands each word in a complex expression should need no further information to correctly understand the expression as a whole. If language is non-compositional, even someone who understands each word in a complex expression may not be able to understand the expression, because they lack some further specific information. Is human language compositional or non-compositional? On one hand, compositionality seems to be essential for linguistic communication, and especially for our ability to convey new information by putting words together to form new expressions: if language were non-compositional, even someone who understood all the constituent words of a new expression would never be sure that they had correctly understood the expression as a whole. On the other hand, people frequently seem to interpret complex expressions non-compositionally, inferring properties for the expression that are not determined solely and completely by its constituent words. Perhaps the best-known example of this non-compositionality is a noun-noun combined phrase, 'pet fish': most people naturally understand the combined phrase 'pet fish' to mean a small, brightly coloured fish kept in a glass bowl, perhaps a guppy or a goldfish. However, when asked to interpret the words 'pet' or 'fish' on their own, people never mention the properties brightly coloured or in a glass bowl (see Hampton, 1988; Fodor & Lepore, 1994; Storms, De Boeck, Van Mechelen, & Ruts, 1998). These properties are thus not determined solely and completely by the words 'pet' and 'fish': someone who understands the words 'pet' and 'fish' is not guaranteed to be able to infer these properties for 'pet fish'. These are non-compositional, emergent properties of the phrase in question.
The principle of compositionality as stated above is defined in terms of the meaning of a complex expression and its relationship to the meaning of that expression’s constituent words. This definition, however, does little to advance our understanding of the role compositionality plays in language, or of why non-compositionality occurs in phrases such as ‘pet fish’. The problem is that this definition is a statement about meaning, and meaning is a subjective quality: we have direct access only to our own meanings for words and expressions, not to other people’s. Any discussion of the compositionality or non-compositionality of a given phrase which takes this definition as its starting point is thus likely to get bogged down in debate over what exactly constitutes the meaning of the words in that phrase. To avoid this problem, in this chapter we propose a more objective definition of compositionality in terms of the extent to which a complex expression can be understood by people with different background knowledge. We then use this definition to explain why non-compositionality arises in noun-noun phrases, and why it is not as serious a problem as it seems. We begin by examining non-compositionality in novel noun-noun combinations.
1 Non-compositionality in Novel Noun-noun Phrases
Why do people infer non-compositional properties for noun-noun compounds such as ‘pet fish’? To answer this question we must consider the mechanisms through which people come up with meanings for phrases such as these. There are two possible ways in which people can produce a meaning for a phrase such as ‘pet fish’: they can either construct a meaning for the phrase by combining the meaning of its constituent words in some way, or, if the phrase is a familiar one that they have encountered before, they can treat it as if it were a single lexicalised item (that is, as if the phrase were a single word like ‘penknife’ or ‘inkwell’ that just happens to be made up of two components), and simply recall the phrase’s meaning from those previous encounters. The question of compositionality only arises in cases where people construct the meaning for a phrase by combining the meanings of its constituent words in some way: for familiar phrases (which can be understood by recall rather than by the construction of a meaning) there is no more reason to expect a compositional relationship between the constituent words of the phrase and the phrase as a whole than there is to expect such a relationship in phrases such as ‘penknife’ or ‘inkwell’. Given that ‘pet fish’ refers to a familiar set of already-known items, it is plausible to suggest that ‘pet fish’ is actually a lexicalised item: that is, a familiar phrase that is understood as if it were a single word. If ‘pet fish’ is a
lexicalised item, then its apparent non-compositionality is not in any way interesting: since the phrase is not being understood by combining the meanings of its component words, the fact that its meaning is not determined solely and completely by the meanings of those component words is not surprising. To examine the extent of non-compositionality in noun-noun phrases, therefore, we must avoid familiar phrases like ‘pet fish’ (which may potentially be lexicalised items) and instead focus on completely novel compound phrases: that is, on phrases which have never been encountered before and of which people have never seen any previous instances. In fact, non-compositional property inference is frequent in such novel phrases. To interpret a completely novel phrase requires the listener to construct a novel complex concept to represent the meaning of that phrase. In constructing this new concept the listener will need to make novel inferences to link together the two constituent words of the phrase in question. Indeed, the listener will often infer further properties to allow the resulting combined concept to ‘make sense’; that is, to make the resulting concept more acceptable and believable. In a series of psychological studies (Costello & Keane, 2000, 2001), we examined how people interpreted novel compound phrases by asking them to come up with meanings for randomly generated noun-noun combinations. To give an example of non-compositionality in such novel combinations, consider three randomly-generated phrases used in one of these experiments: ‘cow spoon’, ‘grass bird’, and ‘rose gun’. When asked to interpret these phrases, the following meanings were produced by participants in our experiment:
• a ‘rose gun’ is a gun that shoots an insect repellent spray at roses;
• a ‘cow spoon’ is a child’s spoon with a picture of a cow’s head on the handle;
• a ‘grass bird’ is a bird that hides in long grass and reeds near rivers and lakes.
Note that each of these interpretations contains non-compositional information; that is, information which someone who understands the constituent words in those phrases (‘rose’ and ‘gun’, ‘cow’ and ‘spoon’, ‘grass’ and ‘bird’) is not guaranteed to know. In the first example, this extra information is that insect repellent is often sprayed onto roses; in the second example, it is that children’s spoons are often decorated with pictures and that these decorations are typically on the spoon’s handle; in the third, it is that birds live in reeds near rivers and lakes. Why did our participants deviate from compositionality in these novel noun-noun combinations? What did they gain by including these non-compositional
properties in their interpretations? In each of the above examples the non-compositional properties inferred for these novel phrases add something to the interpretation of those phrases: they make those interpretations more believable by providing extra evidence that supports the interpretation. We can see this by comparing with ‘stripped-down’, compositional versions of the above interpretations:
• a ‘rose gun’ is a gun that shoots at roses;
• a ‘cow spoon’ is a spoon that resembles a cow in some way;
• a ‘grass bird’ is a bird that is found in grass.
These three interpretations are less convincing and less acceptable as meanings for the phrases in question, because these interpretations do not contain the extra, corroborating information that the more detailed non-compositional interpretations provide. In this paper we examine non-compositionality in novel phrases from the perspective of a theory of noun-noun conceptual combination called Constraint Theory. In this theory, combined concepts (like the ones used to understand the noun-noun compounds ‘cow spoon’, ‘grass bird’ and ‘rose gun’) are constructed so that they satisfy various constraints imposed by the pragmatics of communication and language use. Constraint Theory has been implemented in a computer model which, when given concept descriptions, can produce combined concepts similar to those produced by people for the same concepts (Costello & Keane, 2000). Specific predictions of the theory have also been verified experimentally (Costello & Keane, 2001). In the next section we give a functional definition of compositionality from the perspective of this theory. In this definition, compositionality is characterised in terms of the function it plays in language; in particular in terms of the role it plays in communication. According to this definition, compositionality is not an all-or-nothing proposition, but is graded so that different compound phrases have greater or lesser degrees of compositionality. Following this definition we discuss various different grades of compositionality that can arise for different types of compound. We then briefly describe Constraint Theory and explain the various constraints that control concept combination in this theory. We conclude by discussing how these constraints relate to compositionality, and argue that, while conceptual combination is not perfectly compositional, people aim to maximise compositionality when constructing combined concepts, subject to requirements imposed by the pragmatics of communication.
2 An Objective Definition of Compositionality
Compositionality plays a number of different roles or functions in language and thought. Together these provide a graded definition of the idea of compositionality: the more a given complex expression satisfies these roles or functions, the more compositional that expression is. Here we list three different roles that compositionality plays in linguistic communication. One function of compositionality in language is to allow communication to take place between people who have different background knowledge. Compositionality states that anyone who understands each constituent word in a complex expression should need no further information to understand the expression as a whole. Under compositionality a speaker can produce any expression they want and expect any listener, even a listener who has quite different background knowledge of the world, to correctly understand that expression as long as they understand its constituent words. Other differences in knowledge between speaker and listeners will be irrelevant to their understanding. If language is non-compositional, even a listener who understands the constituent words of an expression will not necessarily understand the expression as a whole: some extra piece of specific knowledge may be necessary to understand the phrase as intended by the speaker. A second function of compositionality is to permit generative language: to allow a speaker to convey a completely new concept to a listener by producing a completely new complex expression to describe that concept. An almost infinite number of new expressions can be produced by combining the words in a language in novel ways; under compositionality, these new expressions can be understood by anybody who knows the meaning of words in the language. If language is non-compositional, even someone who knows the words in the language could nevertheless be unable to understand some new expressions, if they lack the further specific information necessary for those expressions. A third function of compositionality is to facilitate language learning (Butler, 1995). Under compositionality, once a learner has grasped the meaning of some of the words in a language they will be able to understand any complex expression containing those words, without needing to learn any further information. If language is non-compositional, a learner’s task may never be complete: before understanding any complex expression they would have to learn not only its constituent words, but also any further specific information necessary for understanding that expression. These three functions or roles that compositionality plays in language all point to the importance of compositionality in communication between a speaker and a listener who have different knowledge. In the first case the speaker and the listener are able to communicate despite having different background
knowledge; in the second the speaker possesses some new piece of knowledge they are able to convey to the listener; in the third case the language learner is able to communicate and understand language despite not having the full range of knowledge possessed by a fully-proficient speaker. Differences in knowledge between speakers and listeners are graded in nature: some speakers and listeners may have very significant differences in knowledge, while others may have only minor differences. This leads us to the following graded definition of compositionality: The degree to which a given complex expression is compositional is equal to the degree to which that expression can be understood correctly by people with a range of differences in background knowledge. Under this definition we can point to a number of different grades of compositionality for complex expressions. The highest grade of compositionality would be perfectly compositional expressions: those that can be correctly understood by every listener who knows the meaning of the constituent words of those expressions, irrespective of all differences in knowledge between the listener and the speaker who produced the expression. Examples of such perfectly compositional expressions might be ‘three-digit prime number’ or ‘odd square number’: everyone who correctly understands ‘three-digit’ and ‘prime’, or ‘odd’ and ‘square’, will understand these expressions correctly because the meanings of these expressions depend solely on information that everyone who understands those words is guaranteed to have. The next grade of compositionality would be represented by expressions that can be correctly understood by almost all listeners who know the meaning of the constituent words of those expressions, but for which there will be a few listeners who lack a particular piece of background knowledge necessary to understand the expression in the way intended by the speaker. The ‘pet fish’ example can be seen as a highly compositional phrase: most listeners have the background knowledge necessary to infer that pet fish are small, brightly coloured, and kept in bowls, and so can understand this phrase using that information. There will be some listeners, however, who, while they know the meanings of the words ‘pet’ and ‘fish’, do not have any knowledge of guppies, goldfish, aquariums, or fishbowls. For these listeners the phrase ‘pet fish’ is non-compositional: the interpretation that most people give to this phrase is one which these listeners would not understand. A lower grade of compositionality would consist of expressions that can be correctly understood by a reasonable proportion of listeners who know the meaning of the constituent words of those expressions, but which a sizable number of people will not understand. For example, people who are familiar with computers might understand the phrase ‘graphics chip’ as referring to a computer
chip, often used by game players, which can be added to a computer system to allow the accelerated display of detailed graphics. Others, however, might understand both words ‘graphics’ and ‘chip’ correctly (a graphical display on a computer screen; a silicon chip that carries out computations) but lack the more specific knowledge about computer games and graphics acceleration that allows the correct interpretation of that phrase. Such moderately compositional expressions will often be jargon or terms of art: understood by people who share the background knowledge of a particular domain, but appearing non-compositional to those outside that domain. Finally, the lowest grade of compositionality would be represented by expressions or phrases which can only be understood by a small clique of people who have the specific knowledge needed to comprehend that phrase. For example, Downing (1977) describes a group of friends living in an apartment building where one resident habitually left her bicycle on the stairs, where it was in everybody else’s way. Among this group, the phrase ‘bike girl’ came to mean ‘an inconsiderate and selfish person’. To anyone outside this group, the phrase ‘bike girl’, used in this way, would clearly be non-compositional (outsiders would not have the information necessary to understand the phrase). Among the group, however, the phrase would be compositional: anyone living in that building who had been repeatedly inconvenienced by this person would be able to understand the phrase correctly, even if they had never heard it used in that way before. How does this graded version of compositionality compare with the standard definition (where a complex expression is compositional only if its meaning is determined solely and completely by the meanings of its component words)? Should it be described as compositionality at all? In our account a perfectly compositional expression is one that can be correctly understood by every listener who knows the meanings of the constituent words of that expression. This means that a phrase can be classified as perfectly compositional even when its meaning relies on information that is not contained in the meaning of any of its constituent words: the complex expression can contain any information at all, as long as every listener who knows those constituent meanings would use that information when understanding the complex expression. Given this, some readers may conclude that what we describe is not compositionality. In our view, this conclusion rests on a misunderstanding of the standard definition of compositionality. That definition does not require that the meaning of a complex expression must contain only information from that expression’s constituent words. Instead, the requirement is that the meaning of the complex expression be determined by the meaning of its constituent words. In our definition, a perfectly compositional expression need not contain only information from that expression’s constituent words. However, it must only contain
information that every listener who knows those words is guaranteed to have: in other words, information that is determined by those words. Our graded definition of compositionality is thus consistent with the standard approach: what we describe is, indeed, compositionality. Our graded definition of compositionality is useful in a number of ways. First, it provides for an objective measure of the degree of compositionality of complex expressions, which could be used to ground compositionality in everyday language use. Second, this definition does not take any theoretical position on how word-meaning is represented: it does not require us to assume that meaning is represented in terms of prototypes, feature lists, definitions, or other structures. This allows us to discuss compositionality without having to first agree on how word meaning is represented and on what the meaning of a given word is. Third, this definition gives us a position from which to consider how compositionality interacts with other factors in people’s interpretation of complex expressions. We follow up this third point by describing how compositionality interacts with the various pragmatic constraints that, according to our theory, control how people interpret novel compound phrases.
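To make the graded definition concrete, the following sketch is one way it could be operationalised; it is our illustration, not part of the published theory. A listener is modelled as the set of knowledge items they possess, and an expression’s degree of compositionality is the proportion of listeners who hold every item of information its intended interpretation depends on. All knowledge items and the listener population are invented for the example.

```python
# Minimal sketch of the graded definition of compositionality.
# A listener is modelled as a set of knowledge items; an expression is
# "understood" by a listener if the listener possesses every item of
# information its intended interpretation depends on.
# All items and listeners below are illustrative inventions.

def degree_of_compositionality(required_knowledge, listeners):
    """Proportion of listeners who can recover the intended interpretation."""
    understood = sum(1 for k in listeners if required_knowledge <= k)
    return understood / len(listeners)

# Every listener who knows the constituent words has this core knowledge.
core = {"pet", "fish"}

# Listener population: all know the words; background knowledge varies.
listeners = [
    core | {"guppies", "fishbowls"},  # typical listener
    core | {"guppies"},
    core | {"fishbowls"},
    core,                             # knows only the words themselves
]

# 'pet fish' read as "small fish kept in a bowl" needs fishbowl lore.
print(degree_of_compositionality(core | {"fishbowls"}, listeners))  # 0.5
print(degree_of_compositionality(core, listeners))                  # 1.0
```

On this measure, an interpretation that relies only on the constituent words scores 1.0 (perfectly compositional), while one that relies on extra fishbowl lore scores lower, mirroring the grades distinguished above.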
3 Constraint Theory: The Pragmatics of Compound Phrases
A listener trying to understand a novel combined phrase (such as ‘cow spoon’, ‘grass bird’ or ‘rose gun’) must construct a combined concept to represent the meaning of that phrase. For communication to take place correctly, the listener must construct a combined concept that is the same as, or close to, the concept intended by the speaker of that phrase. To guide their construction of this concept, the listener can make a number of pragmatic assumptions about the intentions of the speaker who produced the phrase. At the most basic level, the listener can assume that the speaker, in uttering the phrase, is trying their best to indicate a particular combined concept (Grice’s Principle of Cooperation; see Grice, 1989). This assumption of co-operation leads to three more specific inferences that the listener can validly make about that combined concept:
• that the intended combined concept is one which the listener already more-or-less knows (otherwise the speaker would not have used a brief, compressed compound, but would have described the intended concept in more detail);
• that the intended combined concept is one best identified by the two words in the phrase (otherwise the speaker would have selected different words);
• that the intended combined concept is one for which both words in the phrase are necessary (otherwise the speaker would have used fewer words).
In Constraint Theory these three pragmatic assumptions are instantiated in three constraints of plausibility, diagnosticity and informativeness, which control the construction of combined concepts. To interpret a novel compound correctly the listener constructs an interpretation which best satisfies these three constraints; such an interpretation will meet the pragmatic assumptions and thus be the correct combined concept as intended by the speaker. The first constraint, that of plausibility, requires that a listener interpreting a compound phrase should produce an interpretation describing something that is similar to things they have seen before (in other words, something that the listener already more-or-less knows). This requirement is justified by the pragmatic assumption that the speaker who produced the phrase intends to refer to something that the listener can reconstruct easily (otherwise the speaker would not have used a compound phrase to refer to this thing). We can see this requirement in action in the interpretations that people generated for the phrases ‘rose gun’, ‘cow spoon’, and ‘grass bird’ as described above. The interpretation ‘a rose gun is a gun that shoots an insect repellent spray at roses’ satisfies the plausibility constraint because it describes something similar to other types of gardening tools; the interpretation ‘a cow spoon is a child’s spoon with a picture of a cow’s head on the handle’ satisfies plausibility because it describes something similar to other children’s spoons; and the interpretation ‘a grass bird is a bird that hides in long grass and reeds near rivers and lakes’ satisfies the plausibility constraint because it describes something consistent with our prior knowledge about bird habitats. Our second constraint, the diagnosticity constraint, requires that a listener interpreting a compound phrase should produce an interpretation best identified by the words in the phrase being interpreted. Diagnosticity requires an interpretation to contain some properties which are diagnostic of one constituent concept of the phrase being interpreted, and some properties which are diagnostic of the other constituent concept (a property is diagnostic for a concept if it occurs frequently in examples of the concept and rarely in other concepts; Rosch, 1978). This requirement is justified by the pragmatic assumption that the speaker who produced the phrase selected the particular words they did because those words most accurately identify the intended referent (otherwise the speaker would have selected different words). Again, the interpretations for the novel phrases ‘rose gun’, ‘cow spoon’ and ‘grass bird’ all satisfy this constraint: all interpretations either mention the constituent concepts of the phrase explicitly, or mention a diagnostic property of those concepts. For example, the ‘rose gun’ interpretation mentions shooting (an ability that is diagnostic of guns); the ‘cow spoon’ interpretation mentions a picture of a cow’s head (a distinctive property of cows) and the ‘grass bird’ interpretation mentions rivers and lakes (locations associated with birds).
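The notion of diagnosticity just used admits a simple quantitative gloss. The sketch below is our reading of the Rosch-style definition, with invented frequency tables; it is not the published Costello and Keane implementation. It scores a property highly when it is frequent within a concept’s examples and rare elsewhere.

```python
# Sketch of a diagnosticity score in the spirit of Rosch (1978):
# a property is diagnostic for a concept if it occurs frequently in
# examples of that concept and rarely in examples of other concepts.
# The frequency tables below are invented for illustration.

def diagnosticity(prop, concept, freq):
    """freq[c][p] = proportion of examples of concept c showing property p."""
    within = freq[concept].get(prop, 0.0)
    others = [freq[c].get(prop, 0.0) for c in freq if c != concept]
    outside = sum(others) / len(others) if others else 0.0
    # High when the property is frequent within the concept, rare outside it.
    return within * (1.0 - outside)

freq = {
    "gun":   {"shoots": 0.95, "metal": 0.80},
    "spoon": {"metal": 0.70, "used_for_eating": 0.90},
    "rose":  {"thorny": 0.85},
}

print(diagnosticity("shoots", "gun", freq))  # high: frequent in guns, absent elsewhere
print(diagnosticity("metal", "gun", freq))   # lower: 'metal' is shared with spoons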
Finally, the informativeness constraint requires that a listener interpreting a compound phrase should construct a combined concept which contains new information that could not have been conveyed by either of the constituent words in that phrase on their own. This requirement is justified by the pragmatic assumption that the interpretation intended by the speaker who produced the phrase is one for which both words in the phrase are necessary (otherwise the speaker would have used fewer words). Again, the ‘rose gun’, ‘cow spoon’ and ‘grass bird’ interpretations all satisfy this constraint: each interpretation contains new information that could not be inferred from the constituent words of the phrase on their own (a ‘rose gun’ sprays insect repellent; a ‘cow spoon’ is used by children; a ‘grass bird’ hides in long grass). These three constraints explain why the above interpretations, even though they are not fully compositional, were produced by people when given these phrases, and why people did not produce simpler fully-compositional interpretations for these phrases (such as ‘a rose gun is a gun which shoots at roses’, ‘a cow spoon is a spoon that resembles a cow in some way’, ‘a grass bird is a bird that is found in grass’). These fully-compositional interpretations were not produced because they do not satisfy the plausibility and informativeness requirements imposed by the pragmatics of communication: they do not give enough information to evoke similarity to previously-seen items (and so are not plausible) and do not provide any new information that makes the use of a compound phrase worthwhile (and so are not informative). So where does this leave us? From the perspective of the standard all-or-nothing view of compositionality, it seems that the pragmatics of communication will necessarily lead to non-compositionality in some compound phrases, and we must therefore conclude that human language is non-compositional. This is a serious problem, given the importance of compositionality for language and communication: as stated previously, if language is non-compositional then even someone who understands all the constituent words of a given complex expression can never be sure that they have correctly understood that expression as a whole (since there may be some non-compositional information which they are missing and which is essential to the correct understanding of that phrase). From the perspective of our proposed graded definition of compositionality, however, this problem is less serious. From this graded perspective we can conclude that the pragmatics of communication will indeed lead to some compound phrases being less than perfectly compositional. However, even if these phrases are less than perfectly compositional, they can still have a high degree of compositionality, meaning that most people who understand the constituent words of the phrase will also possess the non-compositional information necessary to understand the phrase correctly. In other words, our graded definition allows for a compromise between the needs of compositionality and the constraints imposed
by the pragmatics of communication. In the next section we discuss how these two factors interact in the Constraint Theory view of conceptual combination.
4 Interaction between Compositionality and Pragmatics
According to Constraint Theory, when constructing an interpretation for a combined phrase such as ‘rose gun’, ‘cow spoon’, or ‘grass bird’, people try to maximise its plausibility (to ensure that the thing described is as likely to exist as possible), to maximise its diagnosticity (to ensure the interpretation contains properties which are as diagnostic as possible for the constituent concepts of the phrase), and to maximise its informativeness (to ensure that the interpretation contains as much new information as possible). These three constraints are often in conflict. For example, the more informative an interpretation is (the more new information contained in the interpretation), the less plausible it is likely to be (the less similar it will be to things we have seen before). These three constraints each interact in a different way with the compositionality of a compound phrase interpretation. The diagnosticity constraint acts to maximise the compositionality of an interpretation. If an interpretation were constructed solely from diagnostic properties, it would contain only those properties which are most strongly associated with the constituent concepts of that phrase. These are properties that the largest proportion of listeners would be able to infer from the constituent words of the phrase, thus maximising the compositionality of that interpretation according to our definition. In the ‘rose gun’ interpretation, for example, the diagnostic property of shooting would be inferred by most listeners for that phrase: everybody who knows the meaning of the word ‘gun’ will associate that property with the word. The plausibility and informativeness constraints, however, may act against compositionality, producing interpretations with properties that could not be inferred from the constituent words alone. The plausibility constraint may require the listener to add properties to an interpretation to make it more plausible (more similar to things they already know). These properties are not necessarily compositional (not derived from the constituent words of the phrase), but are required because of the pragmatic assumption that the phrase being interpreted must describe something which the listener already more-or-less knows (otherwise the speaker would not have communicated using a brief compound phrase, but would have described the intended referent in more detail). The informativeness constraint may also cause the listener to add new properties (new information) to an interpretation. Again, these properties are not necessarily compositional, but are required because of the pragmatic assumption that the speaker decided to use a compound because they intended to convey more
information than could be conveyed by a single word alone. We can see both of these constraints in action in the interpretations for ‘rose gun’, ‘cow spoon’, and ‘grass bird’. The assertions that a ‘rose gun’ shoots insect repellent spray, that a ‘cow spoon’ is used by children, and that a ‘grass bird’ hides in long grass and reeds, are included in the interpretations of these phrases not because they are a necessary inference from the constituent words of those phrases, but because they make the interpretations more plausible and more informative (satisfying the pragmatic requirements of communication). The diagnosticity constraint, then, acts to maximise the compositionality of compound phrase interpretations (requiring the inclusion of information that most people would associate with the constituent words of the phrase in question), while the plausibility and informativeness constraints in some cases will pull interpretations away from compositionality. In interpreting a given compound phrase, a listener aims to maximise the degree to which all three constraints are satisfied: a successful interpretation for a given compound phrase will be one which has the highest possible degree of compositionality consistent with satisfying the other pragmatic constraints. In this view, when people deviate from compositionality in interpreting a compound phrase they do so to the minimum extent necessary to satisfy the requirements of plausibility and informativeness, so that most of the people who understand the constituent words of the phrase will also possess the non-compositional information necessary to understand the phrase. We can see this maximising of compositionality in our example interpretations of the phrases ‘rose gun’, ‘cow spoon’ and ‘grass bird’: the non-compositional information included in those interpretations (that the gun sprays an insect repellent; that the spoon is a child’s spoon with a picture of a cow’s head; that the bird hides in grass near rivers and lakes) is information that, while not part of the meaning of the words in question, most people who know the meanings of those words would also possess. Thus, while the interpretations for these phrases are not perfectly compositional, they all have a relatively high degree of compositionality.
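The interaction just described can be pictured as the listener maximising the joint satisfaction of the three constraints over candidate interpretations. The sketch below is purely illustrative: the candidate interpretations, their scores and the equal weighting are all invented, and the published computer model (Costello & Keane, 2000) is more elaborate.

```python
# Illustrative sketch of constraint interaction for 'rose gun': each
# candidate interpretation is scored on plausibility, diagnosticity and
# informativeness, and the listener selects the candidate that best
# satisfies all three constraints together. All scores are invented.

CANDIDATES = {
    # fully compositional, but implausible and uninformative
    "a gun that shoots at roses":
        {"plausibility": 0.2, "diagnosticity": 0.9, "informativeness": 0.1},
    # adds non-compositional detail; more plausible and more informative
    "a gun that sprays insect repellent at roses":
        {"plausibility": 0.8, "diagnosticity": 0.8, "informativeness": 0.7},
    # informative but drifts too far from the constituent concepts
    "a rose-scented perfume dispenser":
        {"plausibility": 0.6, "diagnosticity": 0.3, "informativeness": 0.8},
}

def overall(scores):
    """Joint constraint satisfaction as an equally weighted sum."""
    return sum(scores.values())

best = max(CANDIDATES, key=lambda interp: overall(CANDIDATES[interp]))
print(best)  # -> 'a gun that sprays insect repellent at roses'
```

The winning candidate is the one that trades a little compositionality for plausibility and informativeness, which is the pattern the theory predicts for people’s interpretations.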
5 Conclusions
Compositionality is normally defined as a requirement that the meanings of complex linguistic expressions are determined solely and completely by the meanings of those expressions’ constituent words. In this chapter we have argued that this definition does not help our understanding of compositionality or of the occurrence of non-compositional linguistic expressions. The problem is that this definition is a statement about meaning, and meaning is subjective: we only have access to our own meanings for words and expressions. To address
this problem, we present a more objective, graded definition of compositionality in which the degree to which a given complex expression is compositional is equal to the degree to which that expression can be understood correctly by people with a range of differences in background knowledge. Applying this graded definition to a theory in which compound phrases are interpreted using constraints imposed by the pragmatics of communication, we conclude that when people deviate from compositionality in such phrases they do so to the minimum degree possible consistent with satisfying those other pragmatic constraints.
References
Butler, K. (1995). Content, context and compositionality. Mind & Language, 10(1/2), 3–24.
Costello, F. J., & Keane, M. T. (2000). Efficient creativity: Constraint guided conceptual combination. Cognitive Science, 24(2), 299–349.
Costello, F. J., & Keane, M. T. (2001). Testing two theories of conceptual combination: Alignment versus diagnosticity in the comprehension and production of combined concepts. Journal of Experimental Psychology: Learning, Memory & Cognition, 27(1), 255–271.
Downing, P. (1977). On the creation and use of English compound nouns. Language, 53, 810–842.
Fodor, J., & Lepore, E. (1996). The red herring and the pet fish: Why concepts still can’t be prototypes. Cognition, 58, 253–270.
Grice, P. (1989). Studies in the way of words. Cambridge, MA: Harvard University Press.
Hampton, J. A. (1988). Overextension of conjunctive concepts: Evidence for a unitary model of concept typicality and class inclusion. Journal of Experimental Psychology: Learning, Memory and Cognition, 15, 55–71.
Kamp, H., & Partee, B. (1995). Prototype theory and compositionality. Cognition, 57, 129–191.
Rosch, E. H. (1978). Principles of categorization. In E. H. Rosch & B. Lloyd (Eds.), Cognition and categorization (pp. 27–48). Hillsdale, NJ: Lawrence Erlbaum Associates.
Storms, G., De Boeck, P., Van Mechelen, I., & Ruts, W. (1998). Not guppies, nor goldfish, but tumble dryers, Noriega, Jesse Jackson, panties, car
crashes, bird books, and Stevie Wonder. Memory & Cognition, 26, 143–145.
The Goldilocks Scenario: Is Noun-noun Compounding Compositional?
George Dunbar

The psychological literature on concept combination has taken as its main focus the study of noun-noun (NN) combinations. Other types of combination, such as adjective-noun, or verb-argument combination, have been less frequently studied, though by no means ignored (e.g. Smith et al., 1988; Levin & Pinker, 1991). Nevertheless, NN combinations have often been taken as the paradigm of combination. Existing models of the cognitive processes involved when a hearer interprets a novel NN combination emphasise the construction of an interpretation from the concepts associated with the constituent nouns. That approach offers the promise of a compositional account. In contrast, this paper shows that novel NN combinations cannot be interpreted reliably from the constituent nouns without contextual support. More than that, the paper argues that it is important to understand NN combinations from the speaker’s perspective. The successful production of a novel, or just unfamiliar, NN combination requires the speaker to allow for the addressee’s current knowledge. Wisniewski (1997) provides one of the most precise cognitive models of the interpretation of NN combinations. Wisniewski distinguishes three types of NN combination in English: relational, property and hybrid. He claims that relational combinations are understood by an augmented version of the schema modification process described in Cohen and Murphy (1984). In this process, the modifier (the noun in the first position in the combination) occupies a slot in a scenario, with the scenario typically being part of the conceptual representation of the other noun. The two other types of compounding involve a two-stage process. First, comparison notes areas of similarity, and consequently difference. The differences provide candidates for the property to be provided by the modifier; the areas of similarity provide a potential aspect of the second noun upon which the property can be applied. The actual property transferred is elaborated or instantiated by context in a second processing stage. Central to
Address for correspondence: University of Warwick, Coventry, CV4 7AL, United Kingdom. E-mail: [email protected].
Wisniewski’s theory seems to be the view that NN combinations are largely self-contained, a function largely of “knowledge in the constituent concepts themselves” (1997, p. 174), and, although he acknowledges that discourse context may influence the process, its role is not elaborated. The self-contained view is held also by Gagne and Shoben (1997), who conclude that relational interpretations of NN combinations, the type they argue is most normal, are constructed by the hearer using knowledge of the kinds of relationship a noun has tended to contract with other nouns in the past; this knowledge biases the interpretation of an innovative combination towards one employing the most frequent of those relationships again. Much of Wisniewski’s evidence comes from the definitions participants give to novel combinations presented in isolation. He shows, for example, that they can provide property mapping as well as thematic interpretations (Wisniewski, 1996, Experiment 1), that this is more likely when the Ns are more similar (Wisniewski, 1996, Experiment 2), and that it can occur even when plausible thematic interpretations are potentially available for a particular NN combination (Wisniewski, 1998). The choice of novel combinations is important, because there are difficulties with using established combinations experimentally (Hampton, 1997; Rips, 1995; Wisniewski, 1996). For example, established combinations may be interpreted simply by retrieval, having been learned. However, there are also difficulties in using null contexts to study combinations. First, the assumption that “listeners have little trouble comprehending them” (Wisniewski, 1998, p. 177) is not always borne out for innovative language in other settings. For example, Gerrig (1989) found high error rates on meaning paraphrase judgements in a study of lexical innovations in which each innovation was preceded by a context-setting paragraph designed by the experimenter to make the interpretation determinate. Second, although the very notion of a null context is, of course, theoretically dubious (Crain & Steedman, 1985), there is evidence that processing innovative language without explicit context is qualitatively different from processing it with context (Gerrig & Healy, 1983). Third, a crucial difference between laboratory studies of novel NN combinations presented in isolation and real-world lexical innovation is that in the latter case there is an intended meaning. The conjecture explored in this paper is that the need to convey an intended meaning, rather than only the ability to construct a plausible interpretation, is key to understanding NN combination in English. NN combination is primarily something the speaker does with the hearer in mind, rather than the converse. Pragmatic approaches to language understanding contrast with the self-containment view in key respects. Relevance theory (Sperber & Wilson, 1986) develops a pragmatic account of communication which emphasises the role of the speaker in selecting the linguistic stimuli that will lead the hearer to the
intended meaning readily. The key to this is what is termed the Principle of Relevance (p. 158): the presumption that acts of ostensive communication are optimally relevant. This principle plays roughly the role that the Cooperative Principle and maxims play in Grice (1975) – indeed, one of Grice’s maxims, the Maxim of Relation, was “be relevant”. Optimal relevance has two components. First, for a stimulus, such as a phrase, to be optimally relevant, the level of contextual effect achievable is never less than enough to make the stimulus worthwhile for the hearer to process. Contextual effects (implications and changes in the strengths of beliefs, termed assumptions) are effects deduced from the union of the new information in the stimulus and the context, but not deducible from either alone. The second component of optimal relevance is that the level of effort required is never more than needed to achieve these effects. Sperber and Wilson (1986) argue that from optimal relevance it follows that the stimulus chosen to communicate will be the one that requires the least processing effort to make the speaker’s particular informative intention mutually manifest. They further argue that as a consequence of this, the first interpretation recovered by the hearer that is consistent with the belief that the speaker intended this interpretation will be the intended interpretation. This follows because speaker and hearer share the presumption that such stimuli will be optimally relevant, the Principle of Relevance. The implication is that a speaker, taking into account his understanding of his interlocutor’s beliefs, should select a stimulus that leads to the correct interpretation first – if the first interpretation reached were not the correct one, then he should have chosen a different stimulus, for example by adding explicit information. Clark and Clark (1979) describe a general theory of innovative language, illustrated with a detailed empirical study of denominal verbs. Their theory is essentially a pragmatic one. They analyse denominal verbs as “contextuals”, expressions whose sense is dependent upon context, in an analogous way to the dependence of indexicals upon context for their reference. Just as the reference of a deictic expression like now varies infinitely, according to the time of utterance, they claim the sense of a denominal verb like Houdini, as in “Tom can houdini his way out of almost any scrape”, will vary infinitely according to the mutual knowledge of the speaker and hearer (the knowledge speaker and hearer know they share), especially their knowledge of the original nominal – in this case the proper noun Houdini, a well-known escapologist. Any mutually known property of Houdini could be the basis for the denominal verb, as long as the speaker “... has good reason to believe... that on this occasion the listener can readily compute... uniquely... on the basis of their mutual knowledge...” the intended denotation (Clark & Clark, 1979, p. 787). They explicitly relate this convention to the Cooperative Principle (Grice, 1975). Elements of category knowledge with high cue validity, termed predominant
features, play a special role, for Clark and Clark, in denominal verbs that move beyond the status of transient innovation to become established in the language. When a predominant feature is its basis, a new term will be interpretable by the whole linguistic community, since this is knowledge they all share. This is, however, just a special case. In different contexts, features which are not predominant, or indeed knowledge which is not generic, may be more salient, and “the listener must decide which of the possible senses is most salient” (ibid., p. 795). Clark and Clark suggest that the main reason for denominal verbs is economy of expression, although they also cite precision, vividness, and surprise as rhetorical advantages derived from economy, surprise being useful, for example, in humour. Clark and Clark’s approach is similar in spirit to Relevance theory, although there are some differences, and both contrast with the self-containment theories in their emphasis on the role of cooperative and coordinated activity by both speaker and hearer. The self-containment approach emphasises NN combination as a problem for the listener. It is clear that, on the pragmatic account, the notion of an interpretation in isolation from any context is defective. We would predict from this that readers presented with novel stimuli in isolation will experience difficulty both because they cannot make the presumption of optimal relevance, since they have no evidence of intentionality, and because they therefore have no basis for differentiating the intended interpretation from any conceivable interpretation. As Sperber and Wilson (1986) point out, the number of possible interpretations is uncountably infinite. The studies which will be described in the paper were designed to investigate these questions. The first was a case study of a particular innovative combination, Goldilocks economy, examining the way it is used, and the ways usage changed longitudinally. The second study was a simple experiment to see whether this innovative combination could be interpreted in isolation by participants who had not yet encountered it, even if they were familiar with its constituent nouns. It was found that (a) when used, the phrase was often explicitly defined by the speaker; and (b) participants in the experiment were typically unable to provide the correct interpretation given the phrase in isolation. The results are discussed in relation to their implications for the view that noun-noun compounding is compositional, and in terms of the communicative function of innovative compounds.
1 The Corpus Study
A corpus was constructed consisting of the text of two daily financial newspapers (the Wall Street Journal (WSJ), Jan 1991 – July 1998, and the Financial
Times (FT), Jan 1992 – July 1998). A search located 192 occurrences of Goldilocks in 137 economics articles in the WSJ and FT. The earliest use observed, Goldilocks syndrome, occurs in the first half of 1992 in the WSJ. There is then a gap of 18 months before the next, Goldilocks economy. Most articles provided an explicit definition of the term, where the meaning of the phrase was amplified paratactically. It is not obvious why that should be necessary if the meaning of a compound can be reconstructed from its component nouns. More specific evidence that the motivation for giving definitions was pragmatic comes from a change in the pattern over time. Definitions were much more common in the early period, and were less frequent later. Prior to July 1997, 86% of uses in the FT carried an explicit definition, but after that date only 63% did. The difference was statistically significant. Again, this would be expected on a pragmatic account. As time passed, more people would have become familiar with the term and could be relied upon to know what it meant on the basis of prior knowledge; the person producing the term could therefore have greater confidence that it would be understood without recourse to a definition. To examine the relationship between individual writers and definitions, FT articles where a single reporter was credited as author were identified (a column called “Lex” was also included). There were 30 such writers. They were divided according to the date of publication of their first Goldilocks article – 15 publishing it before 1.7.97, 15 after. Of those in the earlier group, 87% included a definition in this debut article, compared to 40% in the later group, a statistically significant difference. These results provide evidence consistent with the hypothesis that definitions are provided when it seems more likely to the writer that the term will be new or unfamiliar to the reader. As the term becomes established in a community, definitions are less likely to be provided. Cogent additional evidence comes from six readers’ letters published in the FT which used the term. None provided an explicit definition. In each case, the letter was a response to an article in which the term had been used. Understanding of the meaning of the term can therefore be taken by the letter writer to be part of the common ground or mutual understanding upon which interpretation of the letter is to be based (Clark & Marshall, 1981). This is evidence that the motivation for giving definitions is indeed pragmatic. The definitions can be considered as falling into two groups, which I will term hints and explanations. Hints were variations on the “not too hot and not too cold, but just right” slogan, the phrase Goldilocks is fabled to have uttered upon finding a bowl of porridge that she liked, including substitutions of fast, slow, weak or strong, and order variations. Hints were common, constituting 68% of definitions in the WSJ and 30% in the FT. Hints indicate that the meaning relates to an optimal configuration of some
parameter, but do not provide a mapping from the vehicle (the heat of porridge) to the topic. Implicitly, the mapping is to the economy considered holistically, and this is not difficult for a reader because it is conventional among people who live in market economies to apply metaphors of heat, speed, and strength to the economy (Lakoff & Johnson, 1980). In a sense, the hint is just indicating which of Goldilocks’ stereotypical properties is to be transferred, with mapping and instantiation left as the beholder’s share (cf. Wisniewski, 1997). However, although the interpretation achieved from a hint by simply instantiating the heat, or strength, of (a token of) the economy with an optimal middling value may be sufficient for many purposes, it is conceptually vague. As we shall see shortly, a more specific understanding is possible. Explanations provide a more explicit mapping onto features of an economy. A particularly frequent type is illustrated by the phrase “modest growth but not enough to fuel inflation” (WSJ, 5.8.96). One of the curious things about this dataset, however, was the wide variation in the features identified in explanations. Example explanations of Goldilocks economy found in the corpus:
1. low inflation and steady growth
2. low inflation, low unemployment and a soaring stockmarket
3. one in which the bears are kept at bay
4. not too cold for the unemployed, not too hot to cause inflationary pressures
5. the miracle combination of negligible inflation and very high levels of capacity and manpower utilization
6. a ‘just right’ combination of stable interest rates and healthy economic growth
One more aspect of the corpus data poses a difficulty for accounts that rely on composing the meaning of a compound from component nominal concepts. Although clearly the same concept is being referred to, across usages the actual head noun used varies. Head nouns included: Goldilocks economy, scenario, recovery, vision, growth, era, story, phase, period, expansion, and spin. The first two were by far the most common manifestations, and some of the others appeared only once in the corpus. Nevertheless, if the calculation of the meaning of the compound rests primarily on a process of construction from the conceptual elements of the constituent nouns, how can it work to vary the constituent nouns and still have the same result?
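The definition-rate comparisons reported above (86% of FT uses before July 1997 versus 63% after) amount to tests on a 2×2 contingency table. The sketch below shows the form such a test takes; since the raw article counts are not given in the chapter, the counts used here are hypothetical, chosen only to reproduce the two percentages.

```python
# Sketch of the corpus comparison: definition rates before vs. after 1.7.97.
# The raw article counts are not reported in the chapter, so the counts
# below are hypothetical, chosen only to match the reported 86% and 63%.
from scipy.stats import chi2_contingency

#            with definition, without definition
observed = [[43, 7],    # before July 1997: 43/50 = 86%
            [63, 37]]   # after July 1997:  63/100 = 63%

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # small p -> rates differ reliably
```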
2 The Experiment
One implication of the pragmatic account is that the expression Goldilocks economy will not be interpretable by an intelligent reader in isolation. Or, rather, an intelligent reader could not be expected to alight upon the intended interpretation reliably. The experiment tests this by asking participants to define the term. We predict that although participants may be consistent among themselves, as Wisniewski has reported in a number of studies with other NN combinations, they will not reliably pick the “correct” interpretation. In addition, they will not be confident about their definition because they will realise that they have insufficient information. Seventeen undergraduate students participated for course credit. They were asked to evaluate both familiar and novel combinations, and afterwards the individual constituent words. First, they indicated their familiarity with each term on a scale from zero to five, with endpoints labeled “Completely new to me” and “Very familiar”, respectively. Second, they were asked to write down what they believed the expression meant. Third, they were asked to rate their confidence in their definition on a scale from zero to five, with endpoints labeled “Just guessing” and “Absolutely certain – that’s definitely what it means”. The task typically took around 15 minutes. Confidence and familiarity were both low for novel combinations such as Goldilocks economy (see Table 1). In contrast, high ratings were returned for the constituent words, and for familiar combinations, such as fire drill.
3 Conclusion
The two studies briefly described generated a number of results not predicted by traditional approaches to NN compounding that construct the meaning of the combination from the component nominals. In the experiment, consistent with predictions of the pragmatic approach, participants could not construct accurate interpretations of an unfamiliar NN compound out of context. Moreover, as predicted, participants were not confident about the interpretations they did make. The corpus study found that writers commonly provided definitions when using a novel NN compound, but were less likely to include a definition once it became increasingly likely that the reader would be familiar with the compound. Indeed, when the novel compound was used in a letter responding to an article in which the compound had been used, definitions were never given. Moreover, the constituents of the compound varied through the corpus, though obviously the same concept was being denoted. These findings challenge the notion that the meaning of NN compounds can be reliably composed from the meanings of
Familiarity | Definition given | Confidence
0 | The ideal state for an economy to be in. | 0
0 | Economy where people are all taking advantage of each other. | 0
0 | A type of economy, sounds like a positive term. | 0
0 | Blonde blue collar workers. | 0
0 | An economy where people’s valuables are shared. | 0
0 | Uncertainty or lies in economy. | 0
0 | An economy which is careful in its approach to money matters. | 0
0 | An economy which has become ‘just right’ through trial and error. | 2
0 | An economy heavily dependent upon others. | 0
0 | A not very stable economy. | 0
0 | The economy is impatient and eager / greedy (as in the story). | 0
3 | An economy which is ‘just right’ after experimenting with others. | 2
2 | Take what you want. | 0
0 | An economy in which people can use others’ possessions for free. | 0
0 | An economy which borrows or takes money from other sources. | 0
0 | [no definition offered] | 0
0 | Where people in business prosper due to relying / taking ideas etc. off other people. | 0
Table 1: Participant responses to Goldilocks economy out of context.
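For concreteness, the Table 1 ratings can be summarised numerically; the sketch below is our computation from the values transcribed above, and both means sit near the “Completely new to me” / “Just guessing” ends of the 0–5 scales.

```python
# Summary of the Table 1 ratings (0-5 scales), computed from the
# seventeen familiarity and confidence values transcribed above.
familiarity = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 2, 0, 0, 0, 0]
confidence  = [0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0]

mean = lambda xs: sum(xs) / len(xs)
print(f"mean familiarity = {mean(familiarity):.2f} / 5")  # 0.29
print(f"mean confidence  = {mean(confidence):.2f} / 5")   # 0.24
```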
the constituent concepts. The results are more readily understood in pragmatic terms. Earlier laboratory research has found that people are able to interpret unfamiliar compounds. However, those studies have used completely novel compounds, with target meanings determined without reference to an external standard. There was then no true normative standard to which results could be compared. The experiment presented here has the advantage that the “correct” meaning is known; it is simply not known to participants. With this essential addition, we can now see that whatever constructive cognitive operations participants apply in the laboratory tasks, they are not sufficient to guarantee accurate communication. Participants certainly could construct an interpretation. Indeed, I would go further and say that several constructed quite similar interpretations to one another. For example, several interpretations described the Goldilocks economy as one that appropriated goods, perhaps unfairly. This interpretation is entirely reasonable, and could well have been the intended interpretation. To this extent, the data are consistent with previous laboratory research. However, it just was not the intended interpretation. What has been shown, then, is that, even when participants can construct an interpretation consistently, out of context, it may not be the intended interpretation. The earlier laboratory studies, because they did not emphasise pragmatic aspects of communication as strongly, neglected a phenomenological aspect of NN interpretation to which this experiment has also drawn attention. When someone understands language, there is an associated feeling of confidence. This experiment has shown that that feeling is almost completely absent when participants attempt to understand an unfamiliar NN compound out of context. Not only were participants not in a position to know what the intended interpretation was, they knew that they didn’t know. Two further features of the definitions given in the newspaper corpus are important because they cast new light on the functional properties of NN compounds. The first striking observation is that writers would use both an NN compound and a definition. From a pragmatic perspective this is quite surprising, because it appears to go against principles of economy of expression. Why did the writer not simply give the definition? For example, instead of saying “... you’ve got a Goldilocks economy – not too heated or too weak” (J. Manahan, quoted in WSJ, 11.3.98), why not say “you’ve got an economy that is not too heated or too weak”? It appears that the NN combination serves to label a new phenomenon, a new phenomenon that needed a new name. By providing a label, the speaker identifies to the addressee that there is something to be labeled afresh. The new label makes a particular kind of claim on the attention of the addressee. It invites the addressee to begin to form a new and, crucially, shared concept. This is different from the usual way of analysing the meaning
of novel NN compounds. Novel NN compounds are usually analysed as denoting a meaning, in the same way that a noun does. Here we are arguing that novel NN compounds can, in some cases, be understood as naming a concept.

The second feature is the variety of different explanations of the Goldilocks economy. These explanations draw attention to different features of an economy in this state, including specific relationships between employment, inflation, stock-market investment, growth, productivity and international capital flows. On the one hand, these different explanations illustrate the way that a complex concept such as this can be approached from different perspectives. In the context of personal investment advice, some aspects will be relevant, and from another perspective, such as interest rate policy, a different, perhaps intersecting, subset of aspects will be most relevant to mention. However, it is also clear that there is more to it than that. The phenomenon of the benign equilibrium of the US economy in the 1990s was not conclusively and definitively characterised. Different people mentioned different features sometimes because they understood the economic basis for the phenomenon in different ways. It was the work of economic theory, lay and professional, to try to work out the concept, to explain what was happening. In short, the concept to which the phrase Goldilocks economy pointed was not fully worked out, and the variety of explanation occurred as part of the process of working out the shared concept over time.

In analyses of NN compounds, it is easy to assume that the target concept can be defined fully, and to develop models to construct this target. In contrast, the data from this study suggest that the concept initially generated may be relatively underspecified, and yet be communicatively complete. Conceptual flesh is added from background knowledge of the world, and this knowledge can be a work in progress.

One view of NN combinations is that hearers can interpret them if they know the concepts associated with each of the component nouns. On that view, the associated concepts provide all that is needed to construct the combined meaning, a compositional analysis. Two strands of evidence were advanced against that view in this paper. The experiment showed that participants could not reliably construct the intended interpretation of an NN combination even if they understood the component concepts. Moreover, they realised that they did not have sufficient information, given just the combination out of context, and this was shown by their confidence ratings. No compositional account predicts either of these things. The other strand of evidence came from the corpus. Speakers, or writers, clearly did not expect two nouns to stand on their own. Typically, they provided a more or less explicit definition. Only when the addressee could be expected to already possess the novel concept through previous encounter was this provision relaxed. This shows that the person using an NN combination does not believe that the nouns on their own provide sufficient information for a
hearer to reconstruct the intended meaning. The speaker calibrates the utterance to take account of the addressee's current knowledge.

The studies reported here suggest a new picture of novel NN compounds, and imply that NN compounds are best understood in terms of what the speaker does. Novel NN compounds are not, at least not all, interpretatively compositional in any strict or narrow sense.
Structured Thoughts: The Spatial-Motor View
Pierre Poirier and Benoit Hardy-Vallée
1 Introduction
Is thinking necessarily linguistic? Do we think with words, to use Bermudez's (2003) phrase? Or does thinking occur in some other, yet to be determined, representational format? Or again, do we think in various formats, switching from one to the other as tasks demand? In virtue perhaps of the ambiguous nature of first-person introspective data on the matter, philosophers have traditionally disagreed on this question, some thinking that thought had to be pictorial, others insisting that it could not be but linguistic. When any problem divides a community of otherwise intelligent rational thinkers, one suspects some deep conceptual confusion is at play. Indeed, we believe that the conceptual categories used to frame these and related questions are so hopelessly muddled that one could honestly answer "both simultaneously", or "neither", depending on what is meant by the alternatives.

But let's get our priorities straight. This paper first and foremost aims at defending what we believe to be a step toward the proper view of thinking, a view we call the spatial-motor view. In order to do so, however, we have found it essential to start by addressing the conceptual confusion just alluded to. Accordingly, the paper proceeds in two steps. First a conceptual step, in which we reconsider some of the traditional categories brought into play when thinking about thinking. Then an empirical step, in which we offer empirical evidence for one of the views conceptually isolated during the first part of the work. Future versions of this collaborative work will include a speculative step in which we spin out an evolutionary and developmental scenario whose function is to justify the spatial-motor view by showing how it fits into current evolutionary and developmental theories.

Address for correspondence: Benoit Hardy-Vallée, Institut Jean Nicod, UMR 8129, 1bis Avenue de Lowendal, F-75007 Paris, France. E-mail: [email protected].
2 Conceptual Step: Models of Thoughts
Thinking in cognitive science

The first question to address to start clearing up the conceptual mess is how thinking is related to cognition generally. Conceptually, there seem to be three possible positions (and there are philosophers or cognitive scientists occupying each). First, one can take the two terms to refer to the same phenomenon or capacity, either because the two are confused or because they are deliberately identified. Simply confusing thinking with cognition is a bad thing, we take it, but identifying the two can be a viable position, provided the identification thesis is justified. There are two ways to identify thinking with cognition. The first position is rationalism, wherein one develops a model of rational (and usually conscious) thinking and applies it to the whole of cognition. Jerry Fodor, for instance, has been defending (justification in hand) a rationalist position for 30 years which goes something like this: the Turing Machine, which is a model of thinking, is the only model we currently have of a rational process. The second position identifying thinking with cognition we call nihilism: develop models of cognition that make no space for rational deliberative thinking as a distinct process. Nihilists are usually neuroscientists, roboticists and psychologists working on lower-level cognition or within the so-called micro-cognition perspective. One can be a nihilist either because (1) one eliminates thinking altogether (the term is a throwback to an obsolete conception of the mind) or because (2) one thinks that thinking is essentially no different from other low-level cognitive processes, such as perception and categorization, and thus needs no special explanation over and above that of these processes. We believe that these two avenues are reductive dead ends: the first because it reduces the whole of cognition to one of its aspects or functions, and the second because it neglects an important aspect of cognition.

We believe that thinking is a genuine phenomenon, in need of explanation, and that that explanation cannot be extended to cover the whole of cognition. This is the third possible position on the conceptual landscape: thinking is a special kind of cognitive process, a process that shares important aspects of cognition, because it is a part of cognition, but at the same time a process that is markedly different from the rest of cognition in certain crucial ways. An explanation of thinking must therefore show how thinking emerges (computationally, developmentally, evolutionarily) as a special kind of cognition.
We shall therefore aim to develop a model of cognition that steers clear of the Scylla of rationalism and the Charybdis of nihilism, but this is the job of the next section and the empirical step in our defense. The remainder of the present section attempts to conceptually isolate the kind of model of thinking we believe is apt for the job.

We are family

All the models of thought that we discuss in this paper agree on a number of key issues. Actually, they agree on much more than they disagree on and, in that sense, they form a family of models. For reasons that will become clear, we'll call this family "compositionalist models of thinking", and we'll label "compositionalist" anyone who puts forward or defends such a model. Our point in the paper will be to claim that compositionalists are wrong, not in being compositionalists, since we'll end up defending a compositionalist view ourselves, but in the kind of compositionalist model that adequately accounts for thinking. Hence, the paper is a family dispute of sorts, aired in public! The present section aims (1) at setting out the common conceptual landscape that underlies how compositionalists think about thinking (subsection 2.1) and then (2) at explaining how various compositionalist models differ (subsection 2.2).

We said that compositionalist models of thinking agree on much. First, but this by no means sets compositionalist models apart from other models, recent or historical, they all agree that thinking is a representational process, that is, a process that takes representations as inputs, outputs and intermediaries. Thoughts, accordingly, are the representations over which thinking occurs.

(T1) Thinking is a representational process; thoughts are the representations over which the process is defined (Representationalism).

We shall not discuss representationalism further here. Apart from Ryleans, Heideggerians, extreme system dynamicists and some fans of new-AI or embodied cognition, no one will contest representationalism. We do wish to make one point, however, since we do see ourselves as fans of new-AI and embodied cognition. This paper is about thinking, not about cognition generally. Thinking, as we emphasized, is one cognitive capacity, not the whole of cognition. A common error made by a generation of cognitive scientists was to develop models of thinking, and then mistake those for models of cognition generally. Some cognitive capacities are, we believe, non-representational (Brooks 1991), others are minimally representational (Clark 1997), and others yet are fully representational. Thinking, we believe, is among the latter. Be that as it may, we'll strive to show below that thoughts, even though they are representations, are much more embodied than most representationalists believed.
Hence, the present debate about thinking is among friends of representationalism. However, the process that transfers digital pictures from my camera to my computer is also, in a way, a process defined over representations, and yet no one would claim that that process is actually a kind of thinking. Thinking is a special kind of process defined over representations: a process that takes representations as inputs and transforms them into output representations only (1) if the input and output representations bear some appropriate epistemic relation (such as truth-preservation) and (2) if the process by which the input representations are transformed into the output representations can itself be given an epistemic interpretation (such as inference). Given the epistemic relation between input and output representations and the epistemic interpretation of the process itself, the input representations can be seen as offering a rationale (Cummins 1983) for the output representations, and the input representations transfer their justificational status to the output representations. Note that many cognitive processes defined over representations do not thereby count as thinking. The process that maps retinal representations of the visual scene onto perceived 3D representations does not thereby count as thinking, and it shouldn't: no one would want to count vision as a form of thinking.1

(T2) Thinking is an inferential representational process (Inferentialism).

Up until recently, T2 (inferentialism) would have sufficed to characterize the family of models we are about to discuss. With the advent of connectionism, however, a number of new conceptual positions have opened up. In artificial neural networks (ANNs), number vectors can be interpreted as vehicles for representations, and the vector transformations effected by ANNs have been qualified by some as a kind of statistical inference. Statistical inference is a process that transforms input representations into output representations in virtue of the statistical properties of input vectors. For instance, some ANN learning algorithms may perform Principal Components Analysis (or PCA, a procedure that transforms correlated input variables into a smaller number of uncorrelated variables) such that a trained network maps input representations onto their principal components.

1 Anyone who did would have to invent a new word to label the process someone engages in when she says, for instance, "I'm thinking about it!", say in response to a marriage proposal.

To properly count as a form of inference, the input and output representations must bear some relevant epistemic relation. If the principal components are thought of as a kind of prototypical representation of a category and the input vectors are thought of as instances of that category, then the vector transformation can be seen as a form of categorization, which could be described thus: given what I know about my perceptual world (as encoded in my weight matrix), there is a 0.85 probability that the object causing my current input is
an apple. Hence, the representational process implemented by ANNs may be understood as a kind of inferential process. But is it thinking? The categorization example rehearsed above provides a nice case. Note the difference between two kinds of classification procedures. (1) Someone looks at an apple (retinal representations of the apple are thereby generated) and recognizes it as an apple. (2) Someone looks at an apple (retinal representations of the apple are thereby generated) and does not recognize it as an apple right away (maybe it's a genetically modified apple that's blue). Intrigued, that person looks around to see that it's in the apple section of the supermarket and that, looking closer, it does have the skin texture and consistency of other apples she has previously encountered. On the basis of that evidence, without thinking about it further, she takes the blue thing in the apple section to be an apple.

We claim that these two cases of classification were done in radically different ways. The first case was presumably the kind of (unconscious) Bayesian inference that might have been realized by a neural network vector transformation: given how that thing looks and what I know about my visual world, I'm pretty confident in my belief that it is an apple (e.g., if I were hungry, I would eat it). Again, we believe that no one would want to qualify this process as thinking, although, like thinking, it is a representational inferential process. However, most would qualify the second process as thinking (as we did in describing it): not recognizing the thing, she looked around, thought about it and inferred that it was an apple. Maybe the (unconscious) inference went like this: given that this thing is in the apple section and that I know that the supermarket groups products according to type, and given that the thing does have the skin texture and consistency of apples, I conclude that it is an apple. As a process, what sets this inference apart from the first one? Following Fodor and Smolensky, we say that the first kind of inference is defined over non-representational constituents of representations, whereas, in the second case, the inference is defined over the representational constituents of representations. The first kind of process is sensitive to the statistical structure of its input representations and the second kind is sensitive to the representational structure of its input representations (the representations that compose them). We'll call the first "statistical structure-sensitive inference" and the second, following Fodor, "constituent structure-sensitive inference". We can thus state what holds compositionalists together as a family:

(T3) Thinking is an inferential process sensitive to the constituent structure of representations; thoughts possess the appropriate compositional structure to sustain structure-sensitive inference (Compositionalism).

Note that T3 includes T2, which itself includes T1. Hence, we'll take T3 to succinctly state the position we're after.
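To make the contrast vivid, here is a minimal sketch of statistical structure-sensitive inference of the kind just described. It is our toy illustration, not a model anyone in this debate has proposed: it assumes only numpy, invents the fruit data, and uses a nearest-prototype rule in principal-component space as a stand-in for a trained network's vector transformation.

import numpy as np

# Toy "perceptual history": feature vectors for fruit the system has seen.
rng = np.random.default_rng(0)
apples = rng.normal(loc=[1.0, 0.2], scale=0.1, size=(50, 2))
oranges = rng.normal(loc=[0.2, 1.0], scale=0.1, size=(50, 2))
data = np.vstack([apples, oranges])

# "Learning": extract the principal components of past experience,
# the statistical structure that a weight matrix would encode.
mean = data.mean(axis=0)
_, _, components = np.linalg.svd(data - mean, full_matrices=False)

def project(x):
    return (np.asarray(x) - mean) @ components.T

# Category prototypes: each category's mean position in component space.
proto = {"apple": project(apples).mean(axis=0),
         "orange": project(oranges).mean(axis=0)}

def categorize(x):
    # Statistical structure-sensitive "inference": map the input onto the
    # principal components and pick the nearest prototype. Nothing here
    # consults representational constituents; only statistics matter.
    d = {k: np.linalg.norm(project(x) - p) for k, p in proto.items()}
    best = min(d, key=d.get)
    confidence = np.exp(-d[best]) / sum(np.exp(-v) for v in d.values())
    return best, float(confidence)

print(categorize([0.95, 0.25]))  # e.g. ('apple', 0.7...)

A process of this sort can be read as an unconscious Bayesian categorization, but it never decomposes its input into meaningful parts; that is exactly what separates it from the constituent structure-sensitive inference demanded by (T3).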
Family feud

Compositionalists agree that thinking is an inferential process sensitive to the constituent structure of representations. What tears the family apart is the source of the relevant representations' constituent structure. To understand the conceptual space in which the family dispute is framed, it is important to consider additional properties representations may have as thoughts. According to compositionalist models, these properties come organized in two neat orthogonal dichotomies:

• linguistic vs. non-linguistic, and
• digital vs. analogical.

Linguistic vs. non-linguistic representations. Jerry Fodor published an important book in 1975 in which he claimed that thought is linguistic, a book aptly titled The Language of Thought.2 In that book, he defended the idea that the representations over which thinking was defined had to be language-like in nature; that is, they had to be language-like if the cognitive science of his day was to have any chance of being true. By "language-like", Fodor meant that mental representations have the kind of structure that public language has according to Chomskyan linguistics. Like public language, mental representations would be the result of a Chomskyan generative process, a process which automatically gives mental representations the appropriate constituent structure. Indeed, according to Chomsky, public sentences are constructed from primitives by means of processes like concatenation and permutation that preserve the original integrity of the primitives. If the representations over which thinking is defined are similarly constructed, then they cannot but have the kind of constituent structure that compositionalists think they have. Non-linguistic representations are those representations that cannot be analysed as the result of a Chomskyan generative process.

Digital vs. analogical representations. There is an old opposition, in philosophy and cognitive science, between analogical and digital representations. Analogical representations are said to be modal, continuous, particular, iconic and holistic, while digital representations are thought to be amodal, discrete, general, symbolic and structured. Following Dretske's lead, we say that "a representation that s is F is digital insofar as it carries nothing else than s's being F" (Dretske, 1981). By contrast, a representation that s is F is analogical if it carries other information besides s's being F.

2 In retrospect, however, it seems that the title was not so apt, since the book was about the linguistic nature of cognition generally. If one thinks, as Fodor probably does, that language is a good model of thinking, confusing thought with cognition is somewhat inconsequential (although it may lead to confusion in others).

Dretske's paradigmatic examples
draw on the distinction between a statement and a picture. A representation like "The cup has coffee in it" doesn't say how much coffee there is or what kind of cup it is, but a picture of the cup with the coffee would represent all those details. There is a perceptual similarity between the cup and the picture of the cup, while there is no perceptual similarity between "The cup has coffee in it" and the real cup. The idea here is that mental representations (and not only public representations like pictures and statements) also come in two kinds, mental statements (say, in mentalese) and mental images.

Whereas linguistic representations, conceived on the Chomskyan model, are automatically compositional, advocates of digital representation still need to explain what makes digital representations compositional. This is not a project we address here. The important point for us is that, in the context of the present debate, it is usually understood that only digital representations can be compositional (an important exception is Cummins 1996). Analogical representations are simply not the kind of things that can have compositional structure. This is the idea we attack in the next section.

To end the conceptual step, let's characterize the three compositionalist models of thinking in terms of what distinguishes them. We saw that all compositionalist models agree that thinking is an inferential process sensitive to the constituent structure of representations (T3) and that they disagree about the source of the necessary constituent structure.

(Linguistic-T3) Thinking is an inferential process sensitive to the constituent structure of linguistic representations (all linguistic representations possess constituent structure).

(Digital-T3) Thinking is an inferential process sensitive to the constituent structure of digital representations (digital representations can possess constituent structure, through some process yet to be explained).

(Analogical-T3) Thinking is an inferential process sensitive to the constituent structure of analogical representations (analogical representations can possess constituent structure).

It is the purpose of the next section to explain how analogical representations can possess constituent compositional structure.
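Dretske's cup example can be made concrete with a toy sketch of our own (the 8x8 "picture" array and the helper functions below are invented for illustration): the digital token carries nothing beyond the fact it encodes, while many further facts can be read off the analogical token.

import numpy as np

# Digital: the representation carries nothing else than s's being F.
digital = ("contains", "cup", "coffee")   # "The cup has coffee in it"

# Analogical: a toy 8x8 "picture" of the same scene. Besides the cup's
# having coffee in it, it also carries the cup's size, the fill level,
# and so on, which is why one picture supports many interpretations.
picture = np.zeros((8, 8))
picture[3:8, 2:6] = 1.0   # cup body
picture[4:8, 3:5] = 0.5   # coffee inside the cup

def has_coffee(img):
    # The one fact the digital token encodes...
    return bool((img == 0.5).any())

def fill_fraction(img):
    # ...and one of the extra facts only the analogical token carries.
    cup_rows = np.flatnonzero(img[:, 2:6].any(axis=1))
    coffee_rows = np.flatnonzero((img == 0.5).any(axis=1))
    return len(coffee_rows) / len(cup_rows)

print(has_coffee(picture), fill_fraction(picture))   # True 0.8

Note that nothing in the array is a constituent with a content of its own; on the received view just sketched, that is precisely why such tokens were thought incapable of compositional structure.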
3 Empirical Step: Thinking With Analogical Representations
We saw that partisans of linguistic-T3 automatically get an account of compositional constituent structure. This affords them the option of posing a challenge to their opponents within the compositionalist family: Can there be compositionality, hence thinking (inference sensitive to constituent structure), without
language? Their answer is: No. No language, no compositional thoughts. Partisans of digital-T3 take a more liberal view: babies and animals think when, but only when, they manipulate digital representations. We too wish to adopt a more liberal view, but disagree with partisans of digital-T3 that thinking has to be digital. We argue here that empirical knowledge in cognitive science shows that some kinds of analogical representations can be compositional. We begin by outlining an account of analogical representation (3.1), then argue that analogical representational systems can exhibit constituent structure (3.2), which can support a kind of inference sensitive to constituent structure (3.3). Together, these findings support our spatial-motor conception of thoughts.

Analogical representations

After an initial anti-representationalist stance, new-AI roboticists began to design representational control architectures for autonomous robots. Cognitive robotics (Clark & Grush, 1999) thus replaced (or complemented) reactive robotics. However, cognitive roboticists did not see representational activity as the symbolic re-description of the world, but as the sensorimotor simulation of possible behaviours.

A cognitive robot doesn't have to jump off a cliff before discovering that this is dangerous; it can recognize the affordance and let its hypothesis about moving toward the cliff action die in its place (...). (MacDorman, 1999, p. 21)

Lynn Stein's MetaToto (Stein, 1994) is an example of such a "neo-representationalist" control architecture: her robot uses its knowledge of the external world to build a map of its environment, and then consults its map either on-line, to guide its behaviour, or off-line, to try out behaviours. As Grush puts it:

this model creates a 'virtual reality' with which [its] basic reactive navigational routines can interface in order to imaginatively explore the environment. (Grush, 2003, p. 85)

We agree, but we believe that the expression virtual reality should not be put inside quotation marks: in non-linguistic beings, cognitive representation is virtual reality (VR). Virtuality is not about "not being really real", but about possibility: organisms that represent the world explore the possibilities of action, as players in VR environments do.
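The idea of trying out behaviours off-line can be put in schematic form. The sketch below is our own toy (a one-dimensional "robot", an invented physics, and a cliff at an arbitrary position); it anticipates the forward models discussed just below by giving the agent an internal copy of its input-output dynamics to run in virtual reality.

# A toy sketch of off-line behavioural simulation, in the spirit of
# MetaToto; all numbers and dynamics below are invented for illustration.

def plant(state, push):
    # The real body/world dynamics (position, velocity): a damped
    # response to a constant motor command.
    pos, vel = state
    vel = 0.9 * vel + push
    return pos + vel, vel

def emulator(state, push):
    # Internal model implementing the same input-output mapping as the
    # plant, but runnable off-line, without moving the body.
    pos, vel = state
    vel = 0.9 * vel + push
    return pos + vel, vel

def safe(push, state=(0.0, 0.0), steps=10, cliff=5.0):
    # Let the hypothesis "move with this push" die in the robot's place:
    # simulate the next few steps and check whether the cliff is reached.
    for _ in range(steps):
        state = emulator(state, push)
        if state[0] >= cliff:
            return False
    return True

print(safe(push=1.0))  # False: in simulation, this push reaches the cliff
print(safe(push=0.1))  # True: a gentler push stays clear for ten steps

Nothing hinges on the details; the point is only that prediction here is done by re-running dynamics internally, not by re-describing the world symbolically.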
On this view, (analogical) representation consists in the reactivation of sensorimotor experiences, either in the presence or in the absence of the objects or causes of the original experiences, in order to simulate the temporal evolution of the experience and thus predict its outcome or consequence.3 A representational system is a system that can plan ahead (McFarland & Bösser, 1993; Cruse, 2003), one that is able to pre-select among possible behaviours (Dennett, 1995).

There can also be representations, not of things in the world, but of the body, that is, body models (or schemata). Maravita and Iriki (2004) define a body model as a constantly updated status of the body shape and posture. According to Cruse (1999, 2003) and Damasio (1999), body models are the basis of world models. Neuroscientists have begun to unravel the neural implementation of body models (Berlucchi & Aglioti, 1997; Maravita et al., 2003), their role in body posture (Morasso, Baratto, Capra & Spada, 1999) and in the phantom limb phenomenon (Melzack, 1990). Grush (2003) analyses body models, or motor emulators as he calls them, in a control-theoretical framework: motor emulators are neuronal forward models of the musculoskeletal system (MSS) that implement the same input-output operation as the MSS. Emulators behave as internal predictive models. We will frame the distinction between the body model and the world model in terms of emulation and simulation.4 On this view, the brain emulates the body (motor emulators reproduce in parallel the body's behaviour and generate feedback, like real perception and action) and simulates the external world (it reproduces possible things, agents, and events). There may be no strict distinction between emulation and simulation, since they are, after all, two kinds of sensorimotor representations. All that applies to emulation applies also to simulation, except that what is simulated is outside the body.

Barsalou's (1999) theory of Perceptual Symbol Systems offers a detailed account of simulation. Its foundational thesis is that perception and cognition recruit the same brain resources, a fact already acknowledged by cognitive neuroscience (Grezes & Decety, 2001).

3 We are prepared to agree that human representation is too complex to be restricted to that definition, but our point here is to describe analogical representations. In what follows, we stick with a minimal theory of representation, which takes the epistemic dimension (inference) as its primitive, instead of a view of representation that takes its metaphysical dimension, that is, reference, as its fundamental aspect.

4 These terms are getting to be quite popular, and they are sometimes used to mean various things. Before it gets to the point where we cannot understand each other, we propose to use their computer science acceptation (grabbed from FOLDOC, the Free On-line Dictionary of Computing, foldoc.org): Emulation: "When one system performs in exactly the same way as another [...]" Simulation: "Attempting to predict aspects of the behaviour of some system by creating an approximate [...] model of it."

Semantic memory representations are not
language-like, amodal representations such as frames, schemata, predicate calculus expressions, or feature lists. A perceptual symbol is a "record of the neural activation that arises during perception" (Barsalou, 1999, p. 583). After every perceptual experience, the brain records sensorimotor aspects of the experience and integrates them in multimodal frames. Only some aspects are thus recorded: selective attention focuses on some features (e.g., movements, vertices, edges, colors, spatial relations, heat, etc.) that are encoded in long-term memory and recreated when needed. Only schematic representations, then, are available for processing. These representations are analogical because, at every stage of the processing, they still possess sensorimotor properties: what is selected, memorized, transformed and recalled is a proprioceptive or exteroceptive experience, not an amodal language-like encoding of that experience (as in logic or programming languages). It may sound like digitization, but digitization is a process whose outcome is a language-like, or propositional, representation; here, this process of selective aggregation outputs a schematic or prototypical representation of objects and events that shares similarity with these objects and events. Compare the word "balloon" with the balloon in figure 1: the cognitive system is manipulating schematic analogons of balloons, not linguistic labels.

On Barsalou's view, each frame is not only an aggregate but also a simulator that, after perception or in linguistic processing, may generate a simulation of categories of objects or events stored in the frame by reactivating some perceptual symbols. Perceptual symbols intervene in categorization and prediction: for instance, once the object seen is categorized as an X, the simulation of schematic properties of X allows the system to predict X's behavior, or its own behavior toward X. These simulations are the human equivalent of MetaToto's control architecture: a sensorimotor VR that helps in planning ahead. Many simulations can be generated simultaneously, and they can be combined into complex simulations. We won't defend this account as an exhaustive theory of human cognition; we are sympathetic to Barsalou's model to the extent that it provides a model of analogical representation (at least for animals and babies):

many animals have perceptual symbol systems that allow them to simulate entities and events in their environment. [...] Because many animals have attention, working memory, and long-term memory, they could readily extract elements of perception analytically, integrate them in long-term memory to form simulators, and construct specific simulations in working memory. [...] Similar to nonhumans, infants may develop simulators and map them into their immediate world. During early development, infants focus attention selectively on aspects of experience, integrate them in memory, and construct simulators to represent entities and events [...] (Barsalou, 1999, p. 606)
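To fix ideas, here is a minimal sketch, in the spirit of Barsalou's simulators though entirely our own toy construction (the feature values are invented), of how stored simulators might produce token simulations in working memory and combine them into a complex simulation like the BALLOON ABOVE THE CLOUD example of figure 1 below.

from dataclasses import dataclass

# A token simulation in working memory: a schematic, multimodal record,
# not a linguistic label.
@dataclass
class Simulation:
    kind: str       # the category the token belongs to
    features: dict  # schematic sensorimotor features

def make_simulator(kind, features):
    # A simulator in long-term memory: call it to generate a fresh token.
    def simulator():
        return Simulation(kind, dict(features))
    return simulator

BALLOON = make_simulator("balloon", {"shape": "round", "buoyant": True})
CLOUD = make_simulator("cloud", {"shape": "diffuse", "altitude": "high"})

def ABOVE(upper, lower):
    # A spatial-relation simulator: combines two token simulations into a
    # molecular simulation whose content is a function of its parts and
    # of their arrangement.
    return Simulation("above", {"upper": upper, "lower": lower})

# BALLOON ABOVE THE CLOUD: atomic tokens combined into a molecular whole.
scene = ABOVE(BALLOON(), CLOUD())
print(scene.features["upper"].kind)  # balloon

The molecular token contains its atomic tokens as literal parts, which is the feature the criteria of mereology and combinatoriality below turn on.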
We have described what, on a minimal theory of representation, we take representations to be, how they are produced and what they are for:

1. Representations are dynamical re-productions of something else.
2. Re-production is achieved by reactivating neural marks of sensorimotor experiences.
3. Representations help prediction and control.

However, this is only a general picture of representation, not of thoughts. Thoughts, as we saw, are representations that have constituent structure. Our current account will deserve to be seen as a general theory of thinking provided (1) we can show that analogical representations can possess compositional structure and (2) we can show that inference can use this constituent structure. The next two sections address these points respectively.

Structured analogical representations

Hypothesis formation, exposition and evaluation in science is often based on analogy or metaphor (Holyoak & Thagard, 1995). Cognitive science is no exception, especially when the object is to define abstract and complex concepts such as representation and its digital and analogical subspecies. Dretske's conception of analogical representations, for instance, relies on a photographic metaphor. Analogical representations are compared to photographs, and they inherit most of their properties (but not all, which is why it is a metaphor). A photograph of a coffee cup can be interpreted as "this coffee is espresso" or "the cup is black", and so on. Since photographs are patently non-compositional, it follows that analogical representations are not compositionally structured (and this holds for public as well as for mental analogical representations). But in 1981, when Dretske first published Knowledge and the Flow of Information, 3D software, video games and virtual reality were not as common as they are today and, consequently, his metaphor could not be informed by these technologies. But ours can. VR representations (simulation, emulation) are analogical. If we can show that VR representations can be compositionally structured, then we will have shown that some analogical representations can be structured.

Are VR images compositionally structured? First, look at the mechanisms for creating VR. VR and video game programming owe their development to the parallel development of computational geometry (CG), a field of computer science devoted to the "algorithmic study of geometric problems" (Mitchell, 1997, p. 1), and computer graphics. Computer scientists face several problems: how to (re)create 3D space and objects, and how to (re)create their evolution in time. These are exactly the problems faced by cognitive systems: how to predict, with internal machinery, the evolution of things and events. In CG as in nature, in
order to predict, one needs to reproduce things/events and their evolution, what CGers call modeling and simulation. To model and simulate, computer scientists don't rely on picture databases, or on large amounts of unstructured images. They use combinatorial structures (Chazelle et al., 1999, p. 7). VR-production systems can recursively generate a potentially infinite number of complex combinations. Brains are natural VR-production systems able to generate simulations of different objects, relations, events, etc., and combine them. In fig. 1 we see how object simulators in long-term memory (BALLOON, PLANE, and CLOUD) generate tokens of analogical representations in working memory while combining them with spatial-relation simulators (ABOVE, LEFT).

Figure 1: Complex simulation (from Barsalou, 1999). Simulators are represented on the left, simulations on the right.

Metaphors, however instructive, are by no means conclusive. We need to show that VR analogical representations have compositional structure, or constituency. To do so, we first state the four criteria for compositionality, and then show that VR analogical representations satisfy them. The four criteria for compositionality are mereology, combinatoriality, productivity and systematicity. The remainder of the present section is devoted to showing that VR analogical representations are mereological and combinatorial. Section 3.3 will strive to show that they are productive and systematic when used to draw inferences.

According to compositionalism, real constituency is about parts and wholes (Fodor & Pylyshyn, 1988). Compositionality is then, stricto sensu, a genuine mereological relation (like the relation between a wheel and a car). Compositionality
is not a set-theoretic relation, such as the relation between a car and the set of cars in the parking lot. The basic predicate of mereology is proper parthood. Proper parthood has four individually necessary and jointly sufficient properties: it is an irreflexive, asymmetrical, transitive and supplemented relation (see Casati & Varzi, 1999). "Supplemented" means that if x is a proper part of y, then there is at least one other part z, distinct from x. Mereological relations ("is a part of") are thus distinct from set-theoretical relations ("is a member of"): while there is nothing illogical in the idea of an empty set or a one-member set, a whole with only one part (or no parts) is logically impossible. There must be at least two parts in a genuine mereological whole; if thoughts are mereological wholes, they must be made out of at least two parts, which is precisely what functionalism supposes: that mental states are wholes made up of two parts, functions and arguments.

Simulations are mereological because they satisfy Casati and Varzi's criteria for a genuine mereological whole. A simulator (the CLOUD simulator, for instance):

• Is not a part of itself (irreflexivity).
• Is a part of a thought, but the thought is not a part of the CLOUD simulator (asymmetry).
• If the simulator is a part of a thought which is itself a part of another thought, then the simulator is also a part of the second thought (transitivity).
• If the simulator is a part of a thought, then there must be at least another simulator (supplementation).

Now, what about combinatoriality? Simulations are combinatorial because they satisfy Fodor and Pylyshyn's (1988) criteria for combinatoriality:

• There is a distinction between structurally atomic and structurally molecular representations (BALLOON, ABOVE and CLOUD are atomic, while BALLOON ABOVE THE CLOUD is molecular).
• Molecular representations have constituents that are themselves molecular or atomic (BALLOON ABOVE THE CLOUD is made out of BALLOON, ABOVE and CLOUD).
• The semantic content of a (molecular) representation is a function (i) of the contents of its parts, together with (ii) its constituent structure. (The meaning of BALLOON ABOVE THE CLOUD is a function of the simulators for BALLOON, ABOVE and CLOUD.)

Since they are both mereological and combinatorial, VR analogical representations are structured representations.
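For reference, the four properties of proper parthood invoked above can be stated compactly, writing PP(x, y) for "x is a proper part of y". This is our formal restatement of the gloss given in the text; Casati and Varzi's own supplementation principle is slightly stronger, requiring the further part z to be disjoint from x rather than merely distinct from it.

\begin{align*}
& \neg PP(x, x) && \text{(irreflexivity)} \\
& PP(x, y) \rightarrow \neg PP(y, x) && \text{(asymmetry)} \\
& PP(x, y) \wedge PP(y, z) \rightarrow PP(x, z) && \text{(transitivity)} \\
& PP(x, y) \rightarrow \exists z \, (PP(z, y) \wedge z \neq x) && \text{(supplementation)}
\end{align*}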
Therefore, we should not restrict discreteness and structure to digital representations (linguistic or not): simulators are structured analogical representations. A remark about discreteness is in order. Because of a strong association between "digital" and "discrete" (as in Dretske's account), analogical representations have been conceived as continuous (a notable exception is Sloman, 1971). But "analogicalness" is not about being non-discrete; it is about being non-digital. Analogical representations share similarity with their referent;5 whether they are structured or not is another story.

Analogical inference

A compositional system is structured (mereologically and combinatorially), but mereology and combinatoriality are not sufficient. Compositional systems are also productive and systematic. A clear criterion for the systematicity and productivity of cognitive systems, which is also a strong constraint on a theory of thought, is inference. We argued that analogical representational systems can be both mereological and combinatorial. We now argue that analogical representational systems can produce inferences that are sensitive to the constituent structure of their representations and that such systems are systematic and productive (we don't argue that the converse holds; whether it does or not is not important here). Analogical representational systems will then have been shown to possess the four attributes we established as criteria for compositionality. Such systems, then, would truly think.

Thinking is an inferential activity that is sensitive to the constituent structure of the representations manipulated. Given a (mereologically and combinatorially) structured input representation, a system that thinks will output another structured representation according to rules that apply in virtue of the representation's structure. These rules are transformation rules that specify how the transformation should be done. It is conformity to these rules that makes transformations correct or not. Inference is thus an epistemic action that can be "kosher" if it conforms to some epistemic virtue. On that view, thinking, constituent structure-sensitive inference, is a deductive process. Thinking systems generate new patterns of information that they don't have to gather from nature. They are looking for the consequences of the information they already possess, looking for what that information implies.

To argue that non-linguistic analogical cognitive systems deductively infer, that their analogical simulations follow deductive patterns, a construal of "deduction" is needed, a construal that is slightly different and broader in sense than what philosophers of language usually consider a deduction.

5 Sharing similarity is neither necessary nor sufficient to qualify as a representation (see Dennett 1981, Putnam 1981): A counts as a representation of B insofar as it is used by a system as a representation for B.

Accordingly, deduction will be construed here as the following: a mental transformation from
one representation to another according to transformation rules (1) that apply to the constituent structure of the representation and (2) that respect some epistemic virtue. For some philosophers, such as Bermudez (but also Frege, Davidson, Fodor, etc.), the epistemic virtue to which these transformation rules must conform can only be formal or, more precisely, logico-syntactic. A good inference is an inference whose structure conforms to classical logic. Such a stance makes non-linguistic inference a chimera, because non-linguistic representations cannot be syntactic:

We understand inference in formal terms – in terms of rules that operate upon representations in virtue of their structure [Note: more precisely, logico-syntactic structure]. But we have no theory at all of formal inferential transitions between thoughts which are not linguistically vehicled. Our models of formal inference are based squarely on transitions between natural language sentences (as codified in a suitable formal language). (Bermudez, 2003, p. 111)

The problem is not that we have no theory "of formal inferential transitions between thoughts which are not linguistically vehicled", but that we have no theory of inferential transitions tout court between non-linguistic thoughts. A formalist view of inference equates "good inference" with "formally valid inference". But formalism is not the only option. Non-formalists such as Brandom or Sellars consider the inference from "it's raining" to "the streets will be wet" as materially valid, where the transformation rules are not syntactic but semantic. Semantic inferences are drawn according to other rules, less general than logical rules: the particular rules for applying a certain word, dictated by sociolinguistic conventions.

Other non-formal rules of transformation exist. Suppose you are a bartender and a client, to whom you served a martini, says to you: "this glass is empty, captain!". He doesn't only want to inform you of a physical fact; he is asking for another drink. This is an example of a pragmatic inference (Grice, 1989; Sperber & Wilson, 1986) in which the intentions of speakers are inferred from the illocutionary content (here, a request) of their utterances and the conversational context (the bar). Understanding the client's assertion implies that you understand the structure of the speech act, in this case a request, which is a kind of directive speech act (Bach & Harnish, 1979). Pragmatic inferences are genuine constituent structure-sensitive inferences: insofar as an utterance is categorized as a token of a particular speech act, the inferential understanding of the token is deduced from the conditions of use of this type of structure. Syntactic, semantic and pragmatic inferences share a common feature: their transformation rules are linguistic (logic, linguistic conventions and illocutionary
understanding). But are structure-sensitive transformation rules only linguistic?

Goodness of inference relies partly on the capacity to recognize relevant input. It begins in perception, where selective attention filters out irrelevant properties. To draw a syntactic, semantic or pragmatic inference, the inferential process must be sensitive to some relevant properties of the representation, be it its logical connective, the meaning of words or the type of the speech act involved, and draw a conclusion from these relevant structural properties. According to Relevance Theory (Sperber & Wilson, 1986; Wilson & Sperber, 2004), looking for relevance is a fundamental characteristic of human cognition. An input is relevant insofar as it yields a positive cognitive effect, that is, "a worthwhile difference to the individual's representation of the world – a true conclusion, for example" (Wilson & Sperber, 2004, p. 608). If an input, inferentially processed, yields relevant information, then it is a relevant input. Note that relevant transformation rules are also needed: even if one has relevant inputs, irrelevant processing will not yield positive cognitive effects. Wilson and Sperber hold that the evolution of our cognitive system is geared toward the efficient processing of relevance: our perceptive, mnemonic and inferential processes have evolved to process relevant information.

Relevance Theory may seem a little anthropomorphic. But if we acknowledge, as we already did, that non-linguistic creatures can have representations, then there is no reason to believe that the non-conversational aspects of Relevance Theory don't apply to non-linguistic cognition. Relevance is not only useful in linguistic settings: in perception or memory retrieval, relevance is beneficial for all representational creatures. Predators must rely on relevant cues to hunt their prey, and must process them relevantly. Relevance Theory already has a name for this basic kind of inference: contextual implication. A contextual implication is a conclusion drawn from the input and some knowledge about the context, where neither the input nor the context alone is sufficient for drawing the conclusion. Turning a lamp on and seeing that no light comes from it, you may infer that the light-bulb isn't working properly anymore. The light-bulb's defect is inferred from the input (looking at the lamp) and knowledge about the context (your attempt to turn on the light should illuminate the room). This kind of implication is a "situated" inference, an inference performed in an action context that relies on knowledge and transformation rules more specific than logical connectives, word meanings or speech acts: embodied categorical knowledge. Embodied categorical knowledge is perceptual information about categories of objects, events or agents that is learned from sensorimotor interaction and processed without language, and that can represent general features of categories. This is exactly what Barsalou's simulators can do.
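The lamp example can be given a schematic shape. The sketch below is a deliberately crude toy of ours: the LAMP "simulator" is reduced to a lookup table of predicted outcomes, where a real simulator would run a multimodal simulation. It shows only the shape of a contextual implication: a conclusion follows from the input together with categorical knowledge of the context, with neither sufficing alone.

# Embodied categorical knowledge for the LAMP category, reduced to a toy
# table: for each hypothetical state, the outcome a simulation predicts
# when the switch is pressed.
LAMP_PREDICTIONS = {
    "bulb_ok": "light",
    "bulb_broken": "no_light",
}

def contextual_implication(observed):
    # Situated inference: simulate the perceived object under each
    # hypothesis and deduce the state whose predicted outcome matches
    # what is observed. The input alone (no light) does not license the
    # conclusion, nor does the context alone (pressing the switch should
    # illuminate the room); together they do.
    for state, predicted in LAMP_PREDICTIONS.items():
        if predicted == observed:
            return state
    return "unknown"

print(contextual_implication("no_light"))  # bulb_broken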
Situated inference is thus performed by representing relevant properties of the environment and running a multimodal simulation of the represented properties.6 Contextual inferences are low-level constituent structure-sensitive inferences: when some perceived object, event or agent x triggers the X simulator, the X simulation is a categorical simulation: x is perceived as a token belonging to a certain type. It is its belonging to that type that makes deductive inference possible on the basis of the type's properties when the X simulator, together with other simulators, constitutes a complex representation. Insofar as something perceived is categorized and can be simulated in context, the inferential understanding of the token is deduced from the information embedded in the simulator's categorical knowledge.7 The "if-then" pattern of inference is present, but implemented in sensorimotor simulations and without digital representations. The structure/content distinction is also present: all that is specific to the X category (typical behavior, shape, etc.) becomes the inference rule. Hence representations are structured not only when they are digital, but whenever categorical knowledge allows deduction.

Finally, analogical compositionality differs trivially from digital compositionality in being applied to analogical instead of digital representations. But the two also differ non-trivially. First, analogical compositionality is restricted to categories and relations that can be simulated by multimodal simulators. With their mnemonic and sensorimotor apparatus, we could expect animals and non-linguistic human babies to have only (or mostly) basic-level categories (Rosch, 1978; see also Proust, 1997), and this for at least two reasons: (1) basic-level categories are recognizable on the basis of schematic analogical properties and (2) because of their multimodal content (gestalt perception, imagery, etc.), basic-level categories are more easily simulated by multimodal brain reactivation. Another distinction between analogical and digital compositionality is generality. It is plausible to assert, on empirical grounds, that non-linguistic minds mostly have no intermodular fluidity (Hauser & Spelke, 2004). In such beings, information hardly flows from one module to another, while this is something that easily occurs in us, enculturated apes. Hence, the generality constraint (GC, Evans 1982) may hold only inside modules: a module may satisfy GC (to think that Fa one must be capable of thinking Fb and Ga), but only if a, b, F and G are in the actual domain of the module. With language, material culture and science, humans have access to theories and to expert knowledge.
6 There are thus two elements in the simulationist theory of representation: a claim about representation formation (the compositional combination of simulators) and one about transformation (the temporal evolution of the simulations); the second follows naturally from the first.

7 We presuppose that non-linguistic beings may possess concepts or proto-concepts, although we won't argue for it here. See Proust (1997).
4 Conclusion
We argued for a view of thinking that lies between the classical or received view defended, in one form or another, by most researchers, scientists and philosophers alike, and another view which we find in Bermudez's latest book, Thinking without Words. The classical (Linguistic-T3) view emphasizes the linguistic nature of thought, while the Digital-T3 view held by Bermudez (2003) (but also Dretske and many others) insists on its digital nature. Our view stands with the Digital-T3 view in being, we believe, more empirically founded, less species-specific and less "adult-oriented" (meaning less geared towards human adult cognition) than the classical view. It also stands with the Digital-T3 view in thinking that the digital versus analogical dimension is the most important dimension in which to think about thinking. This last attitude marks a major shift in contemporary thinking about thinking, which, in part because of its association with the school of analytical philosophy, a school that methodologically emphasized a priori linguistic analysis, has mostly seen thinking as an exclusively linguistic phenomenon. The shift in contemporary Anglo-American philosophy, away from linguistic analysis and towards philosophical naturalism, has left the dominant position of language unchanged because, we believe, of the major influence of cognitive science in naturalistic philosophy and the central position of Chomskyan linguistics in the cognitive revolution (at least until recently). If thought is linguistic, then Chomskyan linguistics offers naturalistically inclined philosophers a powerful model, unrivalled in depth and breadth, with which to think about thinking. But, as emphasized by Bermudez, the linguistic model has the major drawback of making thought a uniquely human adult phenomenon.8

We agree with Bermudez that this drawback is serious enough to demand a rethinking of the traditional position on thinking in contemporary Anglo-American philosophy. And with Bermudez, we strive to develop a view of thought that applies to babies and animals, making human adult thought a special case of the model rather than an unexplained cognitive innovation or a suspicious paradigm. On the other hand, although for lack of space we did not argue for this here, our view also stands with the classical view in thinking that thoughts, when digital, are thus because they are linguistic, that it is the linguistic character of some thoughts that makes them digital.

8 If the Language-of-Thought view is adopted, then this problem wanes, but another one appears: thought generally, including in animals and human babies, can only be studied through the prism of human adult thought. This truly seems to be putting the cart before the horse. Human adult thinking, perhaps the crowning achievement of hominid evolution, should not be used as the model of all thinking: it should instead be explained in terms of more basic forms of thinking (evolutionarily, developmentally).

Bermudez's view, we believe, posits another
dubious cognitive first: the appearance of digital thoughts. Unlike many naturalistic philosophers today, such as Bermudez, we do not look to mainstream cognitive science for our main source of empirical constraints on philosophical theory, but to an emerging sub-genre: evolutionary developmental cognitive neuroscience. Over and above the well-heeded constraints of cognitive psychological and neurological plausibility, evolutionary developmental cognitive neuroscience demands that our philosophical understanding of thoughts live up to the current understanding of the ontogeny and phylogeny of cognition (keeping in mind the important interplay between the two). Since Bermudez strives to open up the conceptual space surrounding the notion of thinking to include non-human and non-adult thought, it might seem strange to accuse him of not paying proper attention to ontogeny and phylogeny. Indeed, we sided with him against the linguistic view of thought precisely because of his insistence that a view of thinking should make room for baby thoughts and monkey thoughts. But, by doing so, Bermudez opens Pandora's proverbial box: once the door to such forms of thinking is opened, naturalistic philosophers must pay attention to current evolutionary and developmental theory, and such a stance, we maintain, argues against the digital model as a view of thinking generally.

In short, we have defended a view of thinking that in some sense stands between the classical view and Bermudez's view. We supported this view, which we called the spatial-motor view, in two steps: a conceptual step, in which we presented the three views, insisting on their defining features. This step served to highlight what the views share and what distinguishes them, thereby painting the conceptual landscape in which people have thought about thinking. The second step was empirical: we defended the central claim of our view, that is, the idea that spatial-motor analogical representations can have compositional structure. Note that our philosophical inclination towards naturalism makes this the central step of our defense of the spatial-motor view. We are perfectly prepared to let our model stand or fall with the relevant empirical data.
References

Bach, K., & Harnish, R. M. (1979). Linguistic communication and speech acts. Cambridge, Mass.: MIT Press.

Barsalou, L. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22, 577–609.

Berlucchi, G., & Aglioti, S. (1997). The body in the brain: neural bases of corporeal awareness. Trends in Neurosciences, 20, 560–564.
Bermúdez, J. L. (2003). Thinking without words. Oxford; New York: Oxford University Press.

Brandom, R. (1994). Making it explicit: reasoning, representing, and discursive commitment. Cambridge, Mass.: Harvard University Press.

Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence Journal, 47, 139–159.

Casati, R., & Varzi, A. C. (1999). Parts and places: the structures of spatial representation. Cambridge, Mass.: MIT Press.

Chazelle, B., & the Computational Geometry Impact Task Force. (1999). Advances in discrete and computational geometry. In Contemporary mathematics (pp. 407–463). Providence: AMS.

Clark, A. (1997). Being there: putting brain, body, and world together again. Cambridge, Mass.: MIT Press.

Clark, A., & Grush, R. (1999). Towards a cognitive robotics. Adaptive Behavior, 7, 5–16.

Cruse, H. (1999). Feeling our body – the basis of cognition? Evolution and Cognition, 5, 162–173.

Cruse, H. (2003). The evolution of cognition – a hypothesis. Cognitive Science, 27, 135–155.

Cummins, R. (1996). Systematicity. The Journal of Philosophy, 93, 591–614.

Damasio, A. R. (1999). The feeling of what happens: body and emotion in the making of consciousness. New York; London: Harcourt Brace.

Dennett, D. C. (1981). Brainstorms: philosophical essays on mind and psychology. Cambridge: MIT Press.

Dennett, D. C. (1995). Darwin's dangerous idea: evolution and the meanings of life. New York: Simon & Schuster.

Dretske, F. I. (1981). Knowledge and the flow of information. Cambridge, Mass.: MIT Press.

Evans, G., & McDowell, J. H. (1982). The varieties of reference. Oxford; New York: Clarendon Press; Oxford University Press.

Fodor, J., & Pylyshyn, Z. (1988). Connectionism and cognitive architecture. Cognition, 28, 3–71.

Fodor, J. A. (1975). The language of thought. New York: Crowell.
Grezes, J., & Decety, J. (2001). Functional anatomy of execution, mental simulation, observation and verb generation of actions: A meta-analysis. Human Brain Mapping, 12, 1–19.

Grice, H. P. (1989). Studies in the way of words. Cambridge, Mass.: Harvard University Press.

Grush, R. (2003). In defense of some ‘Cartesian' assumptions concerning the brain and its operation. Biology and Philosophy, 18, 53–93.

Hauser, M., & Spelke, E. (in press). Evolutionary and developmental foundations of human knowledge: a case study of mathematics. In M. Gazzaniga (Ed.), The cognitive neurosciences III. Cambridge: MIT Press.

Holyoak, K. J., & Thagard, P. (1995). Mental leaps: analogy in creative thought. Cambridge, Mass.: MIT Press.

MacDorman, K. F. (1999). Grounding symbols through sensorimotor integration. Journal of the Robotics Society of Japan, 17, 20–24.

Maravita, A., & Iriki, A. (2004). Tools for the body (schema). Trends in Cognitive Sciences, 8, 79–86.

Maravita, A., Spence, C., & Driver, J. (2003). Multisensory integration and the body schema: close to hand and within reach. Current Biology, 13.

McFarland, D., & Basser, T. (1993). Intelligent behavior in animals and robots. Cambridge, Mass.: MIT Press.

Melzack, R. (1990). Phantom limbs and the concept of a neuromatrix. Trends in Neurosciences, 13, 88–92.

Mitchell, J. (1997). Introduction: Special issue of Algorithmica devoted to computational geometry and manufacturing. Algorithmica, 19, 1–3.

Proust, J. (1999). Mind, space and objectivity in non-human animals. Erkenntnis, 51, 41–58.

Putnam, H. (1981). Reason, truth, and history. Cambridge; New York: Cambridge University Press.

Rosch, E., Lloyd, B. B., & Social Science Research Council (U.S.). (1978). Cognition and categorization. Hillsdale, N.J.: L. Erlbaum Associates; New York; Toronto: distributed by Halsted Press.

Sperber, D., & Wilson, D. (1986). Relevance: communication and cognition. Oxford: Blackwell.
Stein, L. A. (1994). Imagination and situated cognition. Journal of Experimental and Theoretical Artificial Intelligence, 6, 393–407.

Wilson, D., & Sperber, D. (2004). Relevance theory. In G. Ward & L. Horn (Eds.), Handbook of pragmatics (pp. 607–632). Oxford: Blackwell.
Part III
COMPOSITIONALITY AND THE BRAIN
Learning Representations: How Stochastic Models Compose the World

Ralf Garionis
1 Introduction
What is representation? There is no way of bypassing the term when describing the information flow in the brain, as its various areas share and propagate information on the basis of different forms of neural representation. An internal representation is not a one-to-one image of the world. Instead, it corresponds to aspects of the world – a correspondence that is not immediately accessible when considering subsymbolic representations. But the major links between representations and the world that we can explore are cause and inference. The problem of representation in the brain suggests that representations should have determining properties. Asking what these properties are matters, since they help us to understand learning processes and the representations they create. Here, we will describe a framework allowing us to consider stochastic representations as the vocabulary the world is composed of. In particular, we propose coupled top-down and bottom-up directed causal models as a point of origin for reasoning about compositionality. In sections 2 to 5 we will elucidate how these models match neurobiological findings and theories of perception as well as formal models of learning and data analysis. An example in section 6 will reveal that a simple causal model describing compositionality can give rise to extended and detailed statistical reasoning.

Address for correspondence: University of Dortmund, Department of Computer Science XI, 44221 Dortmund, Germany. E-mail: [email protected].

2 Representation
Studying representations becomes essential when we want to understand the information flow within the brain: representations form the data structures used by different areas of the brain when passing on information. Considering the visual system, the world is initially encoded by the 10^8 photoreceptors of the retina and is then recoded each time the information travels along the visual pathway from one processing area to the next – the physical format of the information changes along the way. Each recoding step is tailored to the subsequent area, permitting it to accomplish its specific tasks.

Figure 1: Represented world (left) and two representing worlds (middle, right). Each representation picks up a particular aspect of the represented world and represents it by another property. Here, the brightness (middle) and the volume (right) of the respective representing world's objects stand for the width of the represented world's objects.

If we take this idea further towards a unifying theory of vision, representation forms the basis on which the various processes of the visual system operate – such a theory was offered by Marr (1982). The goal of vision then becomes to extract distal properties of the visual world and to integrate them into the representation. It should therefore be possible to carry out all visual tasks on the basis of the information contained in the representation – provided the visual system has extracted appropriate properties of the visual stimuli, which is the crucial point. Thus, asking for the properties composing the world is a step towards understanding representations as the result of an unsupervised learning process.

3 Unsupervised Learning
By recognizing unsupervised learning as a process aiming at forming representations, it is possible to formulate an alternative view of the goals of learning. In the area of visual perception, there are good arguments from psychology (Cavanagh, 1991) and neuroscience (Burkhalter, 1993) for the presence of backward directed connection structures within the visual cortex (i.e., upstream the visual pathway). For example, our ability to see two different interpretations in an ambiguous image (fig. 2) is an indicator of perception being influenced by top-down processing. It is not possible to explain the alternative image interpretations through optical cues present in the image – they must be rooted in our memorized knowledge about the objects themselves. In a similar way, mental imagery links to top-down processing.

Figure 2: We have to take memorized knowledge about two alternative explanations into account for making sense of the image – and we have problems doing so. This ambiguous image can be seen either as a rabbit or a duck (seemingly lying on its back).

All this suggests that perception is based on a generative model. Such a model generates from an abstract description the corresponding sensory data (e.g. of a visual scene) – this is exactly the approach taken by computer graphics. So, generative models complement vision in the same way as computer graphics complements computer vision. As a consequence, we can consider perception an active process: the process of inverting the generative model. If we assume that the sensory data was created by the generative model, we obtain an abstract description of the data through knowledge of the model. In terms of stochastic inference, learning then means adjusting the generative model's parameters such that the model's likelihood of having generated the sensory data is maximized. As a consequence, perception – and therefore unsupervised learning – becomes a maximum-likelihood optimization problem.

4 Top-Down Processing
This perspective on unsupervised learning is consistent with Grenander's (1996) pattern theory, which assumes the world of patterns to be generated. Independent of the origin and the properties of the patterns, pattern theory aims at pattern analysis by reconstructing those processes, objects, and events that gave rise to the patterns. The result is a prediction of structures through analysis by synthesis. Grenander's theory is essentially based on Bayesian statistics and information theory, which underlie a general computing architecture for finding solutions in the form of top-down and bottom-up iterations taking problem-specific constraints into account. Here, a bottom-up directed model refers to the processing direction from sensory data to abstract descriptions (i.e., in the case of visual processing, upstream the visual pathway).

Another interpretation of visual perception embeds into the framework of generative models: according to Helmholtz (1909), perception is our best guess as to what is in the world, based on current sensory data (e.g. the observed retinal image) and prior experience (e.g. scene statistics). A perceptual experience should then choose the simplest or likeliest explanation of a scene. Helmholtz suggested that perception is a process of unconscious inference: our perception is guided by the strategy that we prefer to see images that are most likely to be seen – vision becomes inverse optics.

Figure 3: Different instances of one concept – cup-like objects. Shape variation and context can change the names and associated meanings we use for the objects (Labov, 1973). Therefore, our expectation and our previous experience do play a role when thinking about these objects, which is an indicator of top-down directed processing taking place.

Using a formal notion, we may assume that the state of the world is described by a random variable x and that the perceived signals (or representation) can be described by a random variable y. A bottom-up directed recognition model with parameters R then transforms the sensory data x into a representation y. At the same time, a corresponding top-down generative model (with parameters G) maps the representation y back into approximated sensory data x̂ (see figure 4). In this statistical framework, we can define the best guess in the Bayesian sense. Formally, the task of vision is now to infer the state of the world x given y, for which we may use Bayes' rule:

\[ p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)} \tag{1} \]

resulting in a maximum a posteriori reconstruction of x:

\[ x_{\mathrm{MAP}} = \arg\max_x \{ p(y \mid x) \cdot p(x) \}. \tag{2} \]
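As a concrete illustration of the MAP reconstruction in equation (2), the following minimal sketch enumerates a small discrete set of world states; all names and numbers are illustrative assumptions, not part of the chapter's model.

```python
# Minimal sketch: MAP reconstruction of a discrete world state x from an
# observation y, following equation (2). All values are hypothetical.
import numpy as np

states = ["narrow", "medium", "wide"]      # candidate world states x
prior = np.array([0.5, 0.3, 0.2])          # p(x), assumed known

# Likelihood p(y | x): rows index the state x, columns the observed signal y.
likelihood = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
])

y = 2                                      # index of the observed signal
unnormalized_posterior = likelihood[:, y] * prior    # p(y | x) p(x)
x_map = states[int(np.argmax(unnormalized_posterior))]
print(x_map)                               # the MAP world state: 'wide'
```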
Here, finding the optimal model explaining the world x through its causes y is computationally intractable, since an infinite number of scenes x might have given rise to a particular signal y. Searching for such meaningful stochastic models is a major topic in theoretical neuroscience, e.g. for explaining empirical data (Dayan & Abbott, 2001). We can reduce the model search space by limiting the search to a class of density functions p of which only a parametric set is considered. That is, instead of attempting to find the true probability distribution, we look for suitable parameters governing a given distribution. This approach to probability density estimation demands choosing a parameterized distribution and a strategy capable of finding the parameters of that distribution.

Figure 4: Representation as a result of a mapping: The represented world's external objects (x) are mapped to the representing world's internal representations (y). A bottom-up directed recognition model converts an input vector x into a code vector y, which is a representation of x. In terms of causal models, y describes the causes for x. The top-down directed generative model maps the code vector y back into a reconstructed input vector x̂. x is an object of the represented world, y is an object of the representing world. Note that x and y are stochastic vectors.

Suppose we have a density function p(x | G) governed by a set of parameters G. Additionally, a set of independent and identically distributed training data X = {x^(1), ..., x^(d)} with distribution p is given (in the following, we omit the pattern index superscript for simplicity). The probability density of seeing the samples of this particular set is

\[ p(X \mid G) = \prod_{i=1}^{d} p(x^{(i)} \mid G) = L(G \mid x) \tag{3} \]

called the likelihood function L (of the parameters given the data). The maximum likelihood principle requires choosing the parameters G such that the likelihood function is maximized. Thus, the likelihood is a function of the parameters G with fixed training data x. Finding appropriate parameters G then requires solving

\[ G^* = \arg\max_G L(G \mid x) \tag{4} \]
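As a minimal illustration of equations (3) and (4) – not the chapter's own model – the following sketch fits the mean of a univariate Gaussian with unit variance by maximizing the log likelihood over a grid of candidate parameters; the data are synthetic.

```python
# Minimal sketch of the maximum likelihood principle, equations (3)-(4):
# choose the parameter maximizing the (log) likelihood of i.i.d. data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=1.0, size=200)     # i.i.d. training data

def log_likelihood(mu, x):
    # log L(mu | x) = sum_i log N(x_i; mu, 1), dropping constant terms
    return -0.5 * np.sum((x - mu) ** 2)

candidates = np.linspace(-3.0, 3.0, 601)          # parametric search space
mu_star = candidates[np.argmax([log_likelihood(mu, x) for mu in candidates])]
print(mu_star)    # close to the sample mean, the analytic ML solution here
```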
Solving equation (4) provides parameters G* corresponding to a maximum of the likelihood L(G | x). In order to obtain a more tractable expression, often the log likelihood function ℓ(G | x) = log L(G | x) is maximized instead. Finding a solution for equation (4) depends on the nature of p(x | G). The straightforward approach of setting the derivative of log L(G | x) to zero and solving the resulting equation for the parameters G does not necessarily deliver an expression which can be worked out analytically. We will consider the more general case where such a simple strategy will not be successful, demanding a more advanced method.

So far, we have described the objects (or data) x of the world through an underlying probabilistic structure we wish to discover. The approach of probability density estimation reduces the complexity of this task by assuming a set of fixed distribution functions which are governed by a set of parameters. The maximum likelihood method demands finding the parameters G such that the probabilistic model provides the most probable explanation for the observed data x. This concept is very modular because it assembles the world x from individual probability density functions which together form a causal model of x. The model's adaptation, i.e. learning, is performed through inference, as we will see in detail.

5 Solving the Learning Problem
The above formulation casts perception in terms of probabilistic generative models. Though the formulation is simple and intriguing, it does not provide a way of finding the generative model's parameters. A solution is offered by the iterative Expectation-Maximization (EM) algorithm (Dempster et al., 1977). The EM algorithm is a general approach to maximum likelihood estimation of the observed input data (x) with regard to the generative model's parameters explaining the data. Unlike standard maximum likelihood optimization, the EM algorithm considers the complete data likelihood formed by the joint likelihood of the observed variables (here: x) and the so-called unobserved variables (y). Within our generative model framework, the y are the causes that have given rise to the model's input x. Two iterated steps are required for the estimation of the generative model's parameters: First, the E (expectation) step finds the conditional probability density for the unobserved variables given the observed (input) data and initial values of the parameters to estimate. Then the maximization (M) step maximizes the log-likelihood of the observed data and identifies a new set of parameters for the generative model.

We start with the joint probability density function for x and y, with G being the generative model's parameters,

\[ p(x, y \mid G) = p(x \mid y; G)\, p(y; G) \tag{5} \]

such that we can formulate the log likelihood of the joint distribution

\[ \ell = \Big\langle \sum_x \log p(x, y \mid G) \Big\rangle = \Big\langle \log \prod_x p(x, y \mid G) \Big\rangle = \Big\langle \log \prod_x p(x \mid y; G)\, p(y; G) \Big\rangle \tag{6} \]

as a sum over all inputs x presented to the model, with ⟨·⟩ denoting expectation values. The E step then has to calculate values for ℓ, while the M step typically maximizes ℓ over G by using a gradient w.r.t. G.
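The E/M alternation can be illustrated on a model even simpler than the factor analysis treated in the next section. The sketch below – a two-component univariate Gaussian mixture with unit variances, chosen purely for illustration and not part of the chapter – treats the component label as the unobserved variable y: the E step computes its posterior given the current parameters, the M step re-estimates the parameters.

```python
# Minimal sketch of the EM alternation on a toy two-component Gaussian
# mixture (unit variances). Data and initial values are illustrative.
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 150), rng.normal(2, 1, 150)])

mu = np.array([-0.5, 0.5])      # initial means (symmetry-breaking guesses)
pi = np.array([0.5, 0.5])       # initial mixing proportions
for _ in range(50):
    # E step: responsibilities p(y = k | x) under the current parameters
    dens = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2) * pi[None, :]
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M step: re-estimate the parameters from the expected statistics
    pi = resp.mean(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)

print(mu, pi)   # component means close to -2 and 2
```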
6 A Modeling Example
To show how we can access the compositional nature of the world by means of modeling through top-down and bottom-up models (i.e., generative and recognition models), we will consider a simple example of stochastic modeling. Its major objective is to explain data through factorial probability distributions, and it thus addresses one of a number of properties assumed to be essential for learning neural representations (Barlow, 1989).

Factor analysis (Bartholomew, 1984; Everitt, 1984) attempts to explain correlations among the m components x_i, i = 1, ..., m of the input vectors x (the observed variables, in terms of causal models) by a continuous n-dimensional vector y of causes y_j, j = 1, ..., n (the unobserved hidden variables, called factors). This model is of interest as it explains the world constituted by the vectors x through independent causes y:

\[ x = G \cdot y + \varepsilon \tag{7} \]

where the m × n so-called factor loading matrix G defines the connections of the generative model of figure 4. The Gaussian noise ε is assumed to be distributed with mean zero and a diagonal covariance matrix Σε which may carry distinct elements; for short, we write p(ε) = N(0; Σε). The explicit incorporation of noise into stochastic models is an important technique in statistics and neural computation. It is based on the assumption that the sensory signals we observe (here: the stochastic vectors x) are not error-free expressions of the structure underlying these signals. Instead, these observations contain inaccuracies modeled as noise. A consequence of assuming Σε to have diagonal shape is that the noise components are mutually independent of each other. This moves the responsibility for correlations between the elements of the vectors x to the factor loading matrix G. That is, the correlations must originate from G · y, where G must have captured them during the learning steps. Additionally, since the elements of Σε are not restricted to be identical, the noise levels can be different for each input variable (i.e., component of x). The causes y are assumed to be independently chosen from a standard Gaussian distribution N(0; 1):

\[ p(y; G) = \prod_{j=1}^{n} p(y_j) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}}\, e^{-y_j^2/2}. \tag{8} \]
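A minimal sketch of this generative model: sample causes from the prior (8), add diagonal Gaussian noise, and emit data via equation (7). The loading matrix and noise variances are arbitrary illustrative choices; the final line anticipates the covariance identity derived as equation (10) below.

```python
# Sampling from the factor analysis generative model, equations (7)-(8).
# G and the noise variances are illustrative, not from the chapter.
import numpy as np

rng = np.random.default_rng(2)
m, n, d = 5, 2, 100_000                       # observed dim., causes, samples
G = rng.normal(size=(m, n))                   # factor loading matrix
noise_var = np.array([0.1, 0.2, 0.3, 0.4, 0.5])   # diagonal of Sigma_epsilon

y = rng.normal(size=(d, n))                   # causes, p(y) = N(0; I): eq. (8)
eps = rng.normal(size=(d, m)) * np.sqrt(noise_var)
x = y @ G.T + eps                             # equation (7)

# The sample covariance of x approaches G G^T + Sigma_epsilon, the identity
# derived as equation (10) below.
print(np.allclose(np.cov(x.T), G @ G.T + np.diag(noise_var), atol=0.05))
```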
Under the generative model, the values of the vectors x follow another Gaussian distribution with mean G · y and diagonal covariance matrix Σε. This makes p(x | y; G) a conditional multivariate Gaussian distribution N(G y; Σε) over x:

\[ p(x \mid y; G) = \frac{1}{(2\pi)^{m/2} \sqrt{|\Sigma_\varepsilon|}} \exp\!\Big( -\frac{1}{2} (x - G y)^T \Sigma_\varepsilon^{-1} (x - G y) \Big) \tag{9} \]

while the covariance matrix of the unconditional input distribution p(x; G) can be found directly as

\[ \Sigma_x = \big\langle (G y + \varepsilon)(G y + \varepsilon)^T \big\rangle = G G^T + \Sigma_\varepsilon. \tag{10} \]

This means that factor analysis can be seen as an estimation strategy for fitting the sample estimate of Σx to the true covariance matrix of the distribution of x (Bartholomew, 1984). The generative distribution (9), in conjunction with the prior distribution over causes (8) and equation (7), describes the top-down directed generative model of fig. 4 completely. Thus, factor analysis is a linear model with a Gaussian prior distribution over causes and a non-isotropic Gaussian noise model. The corresponding recognition model (see figure 4) is specified through the recognition distribution

\[ p(y \mid x; G) = \frac{1}{(2\pi)^{n/2} \sqrt{|\Sigma_{yx}|}} \exp\!\Big( -\frac{1}{2} \big(y - \Sigma_{yx} G^T \Sigma_\varepsilon^{-1} x\big)^T \Sigma_{yx}^{-1} \big(y - \Sigma_{yx} G^T \Sigma_\varepsilon^{-1} x\big) \Big) \tag{11} \]

where especially the covariance matrix of y and x,

\[ \Sigma_{yx} = \big( G^T \Sigma_\varepsilon^{-1} G + I \big)^{-1}, \tag{12} \]

will be required later on for adjusting the generative model's parameters.

The remaining task consists in deriving learning rules by using the EM algorithm to train the model's parameters G and Σε, adapting the model to its current environment. To describe this strategy briefly, we can make use of well-known findings treating aspects of the subject from a purely statistical viewpoint (Bartholomew, 1981; Rubin & Thayer, 1982; Bartholomew, 1984; Everitt, 1984). First, we have to differentiate the log-likelihood function ℓ given by equation (6) with respect to its parameters. Here, the generative model's set of parameters consists of the matrices G and Σε. For calculating ℓ we can ignore log p(y; G), since it is – in the case of factor analysis – independent of G and Σε (see (8)). Therefore it is sufficient to write:

\[ \ell = \Big\langle \log \prod_x p(x \mid y; G) \Big\rangle. \tag{13} \]
Setting the derivative of the simplified log-likelihood w.r.t. G to zero gives as update rule for G:

\[ G^* = \Big( \sum_x x\, \langle y \mid x \rangle^T \Big) \Big( \sum_x \langle y\, y^T \mid x \rangle \Big)^{-1}. \tag{14} \]

Maximization of (6) with regard to Σε requires setting the derivative of the likelihood w.r.t. Σε^{-1} to zero and solving for Σε. This leads to:

\[ \Sigma_\varepsilon^* = \frac{1}{d}\, \mathrm{diag}\Big( \sum_x x\, x^T - G^* \sum_x \langle y \mid x \rangle\, x^T \Big) \tag{15} \]

with d referring to the number of observed input data patterns x and the diag operator setting all matrix elements off the main diagonal to zero. Equations (14) and (15) form the M step of the EM algorithm. The E step requires finding values for ⟨y | x⟩ and ⟨y y^T | x⟩ as a prerequisite for computing G* and Σε* during the M step. These values can be obtained as:

\[ \langle y \mid x \rangle = \big( I + G^T \Sigma_\varepsilon^{-1} G \big)^{-1} G^T \Sigma_\varepsilon^{-1}\, x = R \cdot x \quad \text{with} \quad R = \big( I + G^T \Sigma_\varepsilon^{-1} G \big)^{-1} G^T \Sigma_\varepsilon^{-1} \tag{16} \]

and

\[ \langle y\, y^T \mid x \rangle = \Sigma_{yx} + \langle y \mid x \rangle \langle y \mid x \rangle^T \tag{17} \]

for which we use Σyx as given by the recognition model's covariance matrix (12). Equation (16) reveals the recognition model (parameter matrix R, according to fig. 4) to be another linear model. Unlike the generative model, however, it does not specify an additive noise model.

The EM algorithm updates the model's parameters G and Σε by performing the computations of the E step (equations (16) and (17)) and the M step (equations (14) and (15)) in alternating order. That is, the new values G* and Σε* calculated during some M step are used as the values of G and Σε in the next E step. Furthermore, the training of the model is performed solely through a number of simple matrix operations, with the matrix inversion being the most complex calculation required. A more detailed discussion of the factor analysis model, describing the effects of slightly varied restrictions on the parameters (e.g., as performed by Rubin & Thayer, 1982), would not be more illuminating as far as the scope of this text is concerned.

In this section, we have provided a particular formulation of a density estimation model splitting up into two separate models for recognition and generation. The formulation is designed to match our concept of unsupervised learning based on the non-standard construct of top-down processing. Though factor analysis is a simple model learning a composition of Gaussians, the class of causal models for probability density estimation it belongs to is important, e.g., for studying aspects of how the nervous system operates. The example shows that the use of separate recognition and generative models is an elegant way to separate calculations describing compositionality from those that do not. Likewise, the fundamental approach of causal models, i.e. modeling the world through probable underlying causes, reveals the compositional nature of the world as well. Certainly, a linear single-layer model like factor analysis is not capable of learning rich and hierarchical representations, which are more interesting in terms of nervous system function. But it is instructive in terms of causal modeling broken up into recognition and generative models.
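The complete loop can be written down directly from equations (14)-(17). The sketch below is a minimal NumPy rendering under illustrative dimensions, run on data sampled from a known ground-truth model so that the recovered noise variances can be checked; it is not code from the chapter.

```python
# Minimal EM loop for factor analysis, following equations (14)-(17).
# Dimensions, seeds and ground-truth values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
m, n, d = 5, 2, 100_000
G_true = rng.normal(size=(m, n))
noise_var = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
x = rng.normal(size=(d, n)) @ G_true.T + \
    rng.normal(size=(d, m)) * np.sqrt(noise_var)      # equation (7)

G = rng.normal(size=(m, n))                   # initial parameter guesses
Sigma = np.eye(m)                             # Sigma_epsilon
xtx = x.T @ x                                 # fixed data statistics
for _ in range(200):
    # E step: posterior moments of the causes, equations (16) and (17)
    Si = np.linalg.inv(Sigma)
    Syx = np.linalg.inv(G.T @ Si @ G + np.eye(n))     # equation (12)
    Ey = x @ (Syx @ G.T @ Si).T                       # <y | x>, one row per x
    Eyy = d * Syx + Ey.T @ Ey                         # sum of <y y^T | x>
    # M step: new parameters, equations (14) and (15)
    G = (x.T @ Ey) @ np.linalg.inv(Eyy)
    Sigma = np.diag(np.diag(xtx - G @ (Ey.T @ x))) / d

print(np.diag(Sigma))   # approaches noise_var; G matches G_true up to rotation
```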
7 Summary
According to Plato's allegory, we are imprisoned in a cave, knowing the world only from the shadows it casts on the wall, mistaking appearance for reality and knowing nothing about the real causes of the shadows. This is the essential dilemma of representations. But what is representation then? Here, we have contributed a well-founded framework describing a representation as the origin of the process of maintaining the represented world's assumed statistical structure through probabilistic learning schemes. The causes building the vocabulary the world is composed of become explicit through this modeling method. We achieve this in an elegant way by splitting the modeling into separate models for top-down and bottom-up processing.

The view of the world as defined by stochastic, generative models draws a connection between representations and the world through the identification of probable causes underlying sensory data. Top-down models allow us to explore different modeling assumptions regarding how the world is composed and which properties of the world a representation describes. Especially the extent to which a representation accounts for aspects of the world is important (see figure 3). Thus, compositionality can be embedded perfectly in stochastic frameworks, providing new ways of reasoning about compositional principles.

References

Barlow, H. B. (1989). Unsupervised learning. Neural Computation, 1, 295–311.

Bartholomew, D. J. (1981). Posterior analysis of the factor model. British Journal of Mathematical and Statistical Psychology, 34, 93–99.

Bartholomew, D. J. (1984). The foundations of factor analysis. Biometrika, 72(2), 221–232.

Burkhalter, A. (1993). Development of forward and feedback connections between areas V1 and V2 of human visual cortex. Cerebral Cortex, 3(5), 476–487.

Cavanagh, P. (1991). What's up in top-down processing? In A. Gorea (Ed.), Representations of vision: Trends and tacit assumptions in vision research; a collection of essays based on the 13th European Conference on Visual Perception, Paris, Cité des Sciences et de l'Industrie, France, September 1990 (pp. 295–304). Cambridge, UK: Cambridge University Press.

Dayan, P., & Abbott, L. F. (2001). Theoretical neuroscience. Cambridge, MA: MIT Press.

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.

Everitt, B. S. (1984). An introduction to latent variable models. London, England: Chapman and Hall.

Goodman, N. (1972). Problems and projects. Indianapolis, IN: Bobbs-Merrill.

Grenander, U. (1996). Elements of pattern theory. Baltimore, MD: Johns Hopkins University Press.

Helmholtz, H. von. (1909). Handbuch der physiologischen Optik (3rd ed.). Hamburg: Voss.

Labov, W. (1973). The boundaries of words and their meanings. In C.-J. N. Bailey & R. W. Shuy (Eds.), New ways of analyzing variation in English (pp. 340–373). Washington, DC: Georgetown University Press.

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. New York: W. H. Freeman & Co.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607–609.

Palmer, S. E. (1978). Fundamental aspects of cognitive representation. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization (pp. 259–303). Hillsdale, NJ: Erlbaum.

Rubin, D. B., & Thayer, D. T. (1982). EM algorithms for ML factor analysis. Psychometrika, 47(1), 69–76.
Neural Architectures of Compositionality

Frank van der Velde

Address for correspondence: Cognitive Psychology, Leiden University, Wassenaarseweg 52, 2333AK Leiden, The Netherlands. E-mail: [email protected].

Compositionality is a key feature of human cognition. Thus, it has to be a key feature of the human brain as well. However, it has been notoriously hard to show how compositionality can be implemented in neural structures (both natural and artificial). In this article, I will show how compositionality can be implemented in neural structures. In particular, I will discuss a neural ‘blackboard' architecture of compositional sentence structure. The neural blackboard architecture solves the four challenges for cognitive neuroscience described by Jackendoff (2002). Here, I will concentrate on two of these problems: the massiveness of the binding problem as it occurs in language, and the problem of multiple instantiation (or the ‘problem of 2'). These two problems are instances of the well-known binding problem in neuroscience, which deals with the question of how the activity in separated groups of neurons can be related in a manner that is in agreement with the information processed in these groups of neurons. (This ‘binding problem' should not be confused with Binding Theory in linguistics, which deals with issues like anaphoric relations.) A discussion and solution of the remaining problems discussed by Jackendoff, and a relation with compositional structures in vision, can be found in Van der Velde and De Kamps (in press).

A main purpose of language is to provide information about ‘who did what to whom'. Thus, the sentence John gives Mary a book provides information about what is given, who gives, and who is given to. To provide this information, the arguments John, Mary, and book have to be bound to the ‘thematic roles' of the verb give. The success of this binding can be tested by the ability to answer relevant binding questions, such as "What does John give to Mary?" or "Who gives the book to Mary?". Thus, a neural implementation of language structure must be able to represent the binding of arguments to the thematic roles of verbs, and must be able to produce answers to relevant binding questions.

Furthermore, a neural implementation of language should be in line with the (astronomical) productivity of natural language. That is, with even a simple sentence structure like noun verb noun (basically a verb-argument binding), a
huge number of sentences can be produced by using a large lexicon. To provide for this productivity, the representation of verb-argument binding should result from a compositional form of representation, which allows the binding of arbitrary arguments to the thematic roles of arbitrary verbs, as in the sentences cats love dogs or houses eat furniture. Further complexity can be achieved by using clauses as arguments, as in the sentence John knows that Mary loves Bill. Continuing in this way, novel sentences can be produced on the basis of the existing representations of familiar verbs, arguments and clause structures.

1 Binding with Neural Assemblies
Obviously, a neural implementation of language should also be in line with the way in which representation and processing could occur in the brain. Pulvermüller (1999) discussed how words are represented in the brain as neural assemblies. An assembly is an old concept in neuroscience, introduced by Hebb (1949). It consists of a group of neurons that are strongly interconnected. In the case of words, the assemblies are distributed over different areas in the brain, depending on the nature of the words. Thus, an assembly for a word will include a representation of its phonological word form. But if the word refers to visual stimuli, the assembly for that word will also include a representation of its visual form, or if the word refers to an action, its assembly will include neurons in the motor cortices.

Fig. 1a illustrates (in a schematic manner) how the words John, Mary and loves are represented as neural assemblies, distributed in some manner over parts of the brain. The figure also illustrates the difficulty of representing the structure of a sentence like John loves Mary by means of associations between the assemblies for the words John, Mary and loves. The same association between assemblies would be produced with the sentence Mary loves John, because with that sentence the same assemblies for the words John, Mary and loves are activated, as illustrated in Fig. 1b.

Figure 1: Examples of word assemblies and neural sentence structures. (a) The sentence John loves Mary, encoded with structure assemblies N1, V1 and N2 bound via agent (a) and theme (t) subassemblies. (b) The sentence Mary loves John, encoded with N3, V2 and N4. (c) A verb phrase assembly, with gating circuits (selective activation) connecting the main assembly V1 to its agent and theme subassemblies, and memory circuits (binding).

One could propose that the difference between John loves Mary and Mary loves John can be represented in terms of synchrony of activation. In the first sentence, the assembly for John could be in synchrony with an assembly that represents the agent role (or slot) of the verb loves, which binds John as the agent of loves. In the second sentence, the assembly for John could then be in synchrony with the assembly that represents the theme role (or slot) of the verb loves, which binds John as the theme of loves in this sentence. A solution of this kind was presented by Shastri and Ajjanagadde (1993).

However, several problems arise with the use of synchrony as a binding mechanism in language. Firstly, a duplication of words in different roles (i.e., the ‘problem of 2') can easily result in conflicts with synchrony as a binding mechanism. Consider a sentence like John gives Mary a book and Mary gives John a pen. Here, John is the agent of give but also the theme of give. So, the assembly for John has to be in synchrony with both the agent and the theme role of give. But this will easily create a confusion with a sentence like John gives Mary a pen and Mary gives John a book. In this sentence, John has the same binding relations with give as in the first sentence. Similar difficulties, of course, occur with the word Mary in these sentences.

Secondly, synchrony as a binding mechanism will create confusions in hierarchical binding relations that often occur in language (i.e., the massiveness of binding in language). For instance, in a simple sentence like John knows that Mary loves Bill, the entire clause Mary loves Bill should be in synchrony with the theme role of knows. However, within Mary loves Bill, Mary should be in synchrony with the agent role of loves, but not with the theme role of loves, and conversely for Bill. It is not easy to see how these different constraints can be fulfilled simultaneously.

The massiveness of binding (i.e., hierarchical structure) and the problem of 2 (i.e., multiple instantiation) are combined in a sentence like big John knows that Mary knows little John. If we ignore the adjectives big and little for the moment, John is both the agent of knows and the theme of knows, but in different ‘layers' of the sentence (i.e., matrix sentence versus complement clause). Mary is the agent of knows, but only in the complement of the matrix verb (knows).
With synchrony as a binding mechanism, the question arises how confusions with sentences like John knows that John knows John, Mary knows that Mary knows John or Mary knows that John knows John can be avoided.

The third difficulty of synchrony as a binding mechanism is perhaps the most fundamental one. It is related to the ability to answer binding questions on the basis of synchrony. To answer "Who loves Mary?" in the case of the sentence John loves Mary, the synchrony between John and the agent role of loves has to be detected. This would seem to require a coincidence detector that is tuned for this purpose. This is indeed the case in models that use synchrony as a binding mechanism (see Van der Velde and De Kamps, 2002), including the model of Shastri and Ajjanagadde (1993). In that model, each proposition, like John loves Mary, is represented with a specialized ‘fact node' that detects the synchrony relations in that proposition. But coincidence detectors are conjunctive forms of representation. In this way, they are the opposite of a compositional form of representation. First of all, there would have to be coincidence detectors for any possible binding relation that could occur in language, which is practically excluded given the multitude of sentences (and thus binding relations) that can be produced in language. Secondly, the use of coincidence detectors precludes the understanding of novel sentences (binding relations), never heard before. For instance, even if we hear a sentence like The terminator wants to be president for the first time, we understand what it means. In particular, we can answer a question like "What does the Terminator want?". But a coincidence detector for this relation, like a fact node in Shastri and Ajjanagadde's model, would be missing. (Why would we have detectors for binding relations that we have never heard or seen before?) In other words, synchrony as a binding mechanism is not productive, as a compositional form of representation should be.

2 An Architecture for Binding with Neural Assemblies
Here, I will present a neural architecture in which binding relations as discussed above can be represented. In particular, with this architecture arbitrary arguments (including clauses) can be bound to arbitrary verbs, and answers to binding questions can be produced. The model is based on interactions between neural assemblies. In particular, two kinds of assemblies are used in the model: ‘content' assemblies and ‘structure' assemblies.

The content assemblies are the word assemblies discussed above. They represent nouns, proper names and verbs, thus the specific actors and actions that occur in a sentence. In the model presented here, it is assumed that an (unspecified) part of the assembly for a word can be used to represent the whole assembly for that word in the representation of a sentence in which that word occurs.

The structure assemblies are assemblies that represent elements of syntactic structures. In particular, there are structure assemblies for syntactic structures such as Noun Phrases (NPs) and Verb Phrases (VPs). The NP and VP assemblies possess an internal structure, composed of a main assembly and subassemblies for thematic roles (agent, theme, recipient) and syntactic elements (e.g., clause, complement). The subassemblies are connected to the main assembly by gating circuits, which can be activated when certain control conditions are met. An example of a VP assembly is illustrated in Fig. 1c. Here, the main assembly (V1) is connected with gating circuits to two subassemblies, one for agent (a) and one for theme (t).

In the course of syntactic processing, each content assembly is temporarily bound to a structure assembly of the same nature, such as a noun to an NP assembly (I will treat proper names as nouns from here on). The NP assembly used in this binding is arbitrary, but ‘free' (i.e., not already used). Similarly, a verb is bound to a VP assembly. Binding between a content assembly and a structure assembly occurs in a ‘memory circuit' that connects the two assemblies. Activation of the memory circuit during syntactic processing results in delay activity in the circuit, which is a memory for the binding of the two assemblies. The delay activity can be used to reactivate the assemblies when needed (e.g., to produce answers to binding questions).

The binding of arguments to the thematic roles of verbs results from the temporal binding between the respective NP and VP assemblies. To this end, the matching subassemblies (e.g., for agent) of an NP and a VP are also connected by memory circuits, which produce temporal binding between the subassemblies, and thus between the NP and the VP assembly, during syntactic processing. For instance, to represent John loves in Fig. 1a, John is bound to an NP assembly (N1) and loves is bound to a VP assembly (V1), and these NP and VP assemblies are bound to one another by means of their agent subassemblies. Likewise, to represent loves Mary in Fig. 1a, Mary is bound to another NP assembly (N2), loves is bound to a VP assembly, and these NP and VP assemblies are bound to one another by means of their theme subassemblies. If the same VP assembly is used both in the representation of John loves and in the representation of loves Mary, as in Fig. 1a, then the sentence John loves Mary is represented by means of this VP assembly and the two NP assemblies for John and Mary.

The representation of Mary loves John proceeds in the same manner. The only difference with John loves Mary is that the words are bound to different structure assemblies. As illustrated in Fig. 1b, Mary is bound to N3, loves is bound to V2 and John is bound to N4. When N3 and V2 are bound with their agent subassemblies and V2 and N4 are bound with their theme subassemblies,
a representation of the sentence Mary loves John is produced. Even though the same words occur in both sentences, the representation of Mary loves John in Fig. 1b does not interfere with the representation of John loves Mary in Fig. 1a, because of the use of different structure assemblies. For instance, because John is bound to N1, it is the agent of loves in Fig. 1a, and because it is bound to N4, it is the theme of loves in Fig. 1b. Before I discuss the architecture in more detail, I will first describe the gating and memory circuits, because these circuits play a crucial role in the binding process as it occurs in the architecture.

3 A Gating and Memory Circuit for Control and Binding
The gating circuit is presented in Fig. 2a. The gating circuit controls the flow of activation between two assemblies, X and Y in Fig. 2a, by means of an external control signal. The gating circuit is based on the process of disinhibition (e.g., Van der Velde and De Kamps, 2001). Thus, if the X assembly is active, it activates an inhibition neuron (or group of neurons). This neuron inhibits the activation of an intermediary assembly, and thus inhibits the flow of activation from X to Y. In turn, the inhibition neuron can be inhibited by another inhibition neuron that is activated by a control ‘signal', produced by an external control circuit. In that case, the X assembly can activate the intermediary assembly, which then activates the Y assembly. Thus, the Y assembly can be activated if (and only if) the control signal is ‘on' and the X assembly is active. The activation of the X assembly by means of the Y assembly operates with a similar circuit. In the remaining figures, the gating circuit (in both directions) will be represented with the symbol illustrated in Fig. 2a.

The memory circuit is presented in Fig. 2b. It is in effect also a gating circuit. However, in contrast with the gating circuit presented in Fig. 2a, the control of the activation flow in this circuit results from a delay assembly embedded in the circuit. Once activated, the delay assembly will remain activated for a while due to the reverberating activity in this assembly. As a result, the circuit can be in one of two states: an ‘inactive' state and an ‘active' state. When it is in the inactive state, the circuit in effect operates like the gating circuit presented in Fig. 2a, but without the availability of a control signal. The memory circuit will be in the ‘active' state when the delay assembly in it is activated. It is assumed that this will occur when the assemblies X and Y are active concurrently. Once the delay assembly has been activated, it will remain activated (for a while) even when the X and Y assemblies are inactive. When the delay assembly is active, the memory circuit is in its active state.
Figure 2: A gating circuit (left, panel (a)) and a memory circuit (right, panel (b)), as used in the architecture. Both circuits operate in the directions X to Y and Y to X; the memory circuit, with its embedded delay assembly, is drawn both in its inactive and in its active state.

If the X assembly is again activated (from outside the circuit), the active delay assembly will disinhibit the flow of activation from the X assembly to the Y assembly, so that the Y assembly will now be activated. A similar circuit operates in the direction from Y to X. Thus, if the memory circuit is inactive, it blocks the activation flow from X to Y and vice versa. In contrast, if the memory circuit is active, activation can freely flow from X to Y and vice versa (as long as the delay assembly remains active). In this way, the circuit produces a binding between two assemblies. But to activate the circuit (i.e., the delay assembly in the circuit), the X and Y assemblies have to be co-activated first. Thus, at the same time, the memory circuit represents a memory of the fact that two assemblies (X and Y in Fig. 2b) have been co-activated at a certain time, e.g., in the course of syntactic processing. The memory circuit is represented with the symbols illustrated in Fig. 2b, one for the circuit in its inactive state and one for the circuit in its active state.
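The logic of the two circuits can be summarized in a small Boolean sketch, under the strong simplifying assumption that assembly activity is all-or-none; timing, the decay of reverberating activity, and the Y-to-X direction are abstracted away.

```python
# Boolean caricature of the gating and memory circuits (one direction only).
# Activity is treated as all-or-none; this is an illustration, not a model.
class GatingCircuit:
    def propagate(self, x_active: bool, control: bool) -> bool:
        # the control signal disinhibits the intermediary assembly,
        # letting an active X drive Y
        return x_active and control

class MemoryCircuit:
    def __init__(self) -> None:
        self.delay_active = False          # reverberating delay assembly

    def update(self, x_active: bool, y_active: bool) -> None:
        # concurrent activation of X and Y switches the circuit to its
        # 'active' (bound) state
        if x_active and y_active:
            self.delay_active = True

    def propagate(self, x_active: bool) -> bool:
        # in the active state, activation flows from X to Y without any
        # external control signal
        return x_active and self.delay_active

binding = MemoryCircuit()
binding.update(x_active=True, y_active=True)   # co-activation binds X and Y
print(binding.propagate(x_active=True))        # True: Y is reactivated via X
```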
4 Overview of the Architecture for Verb-argument Binding
Fig. 3 presents the global architecture for binding arguments to verbs as proposed here. Each assembly that represents a noun is connected to the main assembly of each NP assembly by means of a memory circuit, which is initially inactive. Likewise, each assembly that represents a verb is connected to the main assembly of each VP assembly by means of an (initially inactive) memory circuit. In turn, each NP and VP assembly is connected to subassemblies (here, agent and theme) by means of gating circuits. The gating circuits can be activated in a specific manner by neural control circuits. For instance, the gating circuits between the main assemblies and the agent subassemblies can be activated without activating any of the theme subassemblies. Finally, all subassemblies of the same kind (e.g., representing the same thematic role) are connected by means of memory circuits. For instance, each agent subassembly of the NP assemblies is connected to each agent subassembly of the VP assemblies by means of an (initially inactive) memory circuit. A memory circuit between two agent subassemblies will be activated when both subassemblies are active at the same time.

Figure 3: Overview of the architecture for verb-argument binding. Word assemblies for nouns connect to NP main assemblies (Nx, Ny) and word assemblies for verbs to VP main assemblies (Vi, Vj); main assemblies of the same kind are linked by lateral inhibition, and NP and VP assemblies are connected through connection matrices of agent and theme subassemblies (a, t).

In the processing of a sentence it is assumed that, whenever the assembly for a noun is activated, one of the NP assemblies is activated as well. It is arbitrary which of the NP assemblies is activated, provided the assembly is ‘free', that is, not already bound to a noun. (Notice that when an NP assembly is bound to a noun, one of the memory circuits connected to the assembly is active. This activation could be used as a signal that the NP assembly is not free.) As illustrated in Fig. 3, only one NP assembly can be active at the same time, due to the competition between the NP assemblies that results from the lateral inhibition between their main assemblies. It is assumed that the active NP assembly will remain active until a new NP assembly is activated by the occurrence of a new noun in the sentence. (This could result from the high transient activity that is frequently found in the cortex whenever a new stimulus appears. Due to this transient activity, the new NP assembly will win the competition before its activity reduces to a steady state. Or, the occurrence of a new noun could result in the inhibition of the active NP assembly before a new NP assembly is generated.) The selection of a VP assembly proceeds in the same manner.

Thus, when the assembly for John is activated, an NP assembly is activated as well, and the assembly for John is bound to this NP assembly by the activated memory circuit that connects the two assemblies. In the same manner, the assembly for loves will be bound to a VP assembly. To achieve the binding of John and loves, a binding has to occur between the NP and VP assemblies to which John and loves are bound. Fig. 3 shows that binding between NP and VP assemblies occurs by means of the subassemblies of the same kind (e.g., representing the same thematic role). In this case, binding will occur between the agent subassembly of the NP assembly for John and the agent subassembly of the VP assembly for loves. This binding occurs because the gating circuits between the structure assemblies and the agent subassemblies are activated by neural circuits that control the process of binding in the architecture. In this case, a control circuit will interpret the sequence noun-verb in terms of the noun as the agent (subject) of the verb. An example of a control circuit is presented in Van der Velde and De Kamps (in press).

Likewise, when the assembly for Mary is activated, it will bind with a (free) NP assembly. The control circuit will interpret that in the sequence noun-verb-noun the second noun is the object of the verb. The control circuit will activate the gating circuits for the theme subassemblies, which results in the binding of the NP for Mary and the VP for loves by means of their theme subassemblies, as in Fig. 1a.

The binding between assemblies in this model lasts only as long as the activity in the delay assemblies of the memory circuits (figure 2). When this activity ends, a structure assembly can be used to encode a different sentence structure. In this way, the number of structure assemblies in the architecture can be limited, which means that the model has no scalability problems. For a discussion of how structures can be transferred to a more long-term memory, see Van der Velde and De Kamps (in press).
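At a purely functional level, the bookkeeping performed by the architecture can be mimicked with ordinary data structures. The toy sketch below (all names are illustrative) records word-to-structure bindings and role-subassembly bindings as dictionary and list entries standing in for active memory circuits; answering a binding question then amounts to following the stored bindings. None of the neural dynamics (competition, gating control) is modeled here.

```python
# Toy functional rendering of the blackboard bindings (names illustrative).
word_of = {}            # structure assembly -> bound word ('memory circuit')
role_bindings = []      # (np_assembly, role, vp_assembly) triples

def encode(subject, verb, obj, np1, vp, np2):
    # bind words to free structure assemblies, then bind NP and VP
    # assemblies through their role subassemblies
    word_of.update({np1: subject, vp: verb, np2: obj})
    role_bindings.append((np1, "agent", vp))
    role_bindings.append((np2, "theme", vp))

encode("John", "loves", "Mary", "N1", "V1", "N2")
encode("Mary", "loves", "John", "N3", "V2", "N4")

def answer(known_role, known_word, verb, asked_role):
    # select the VP assembly whose verb matches and whose known role is
    # bound to the known word, then read out the asked role
    for np1, role1, vp in role_bindings:
        if role1 == known_role and word_of[np1] == known_word \
                and word_of[vp] == verb:
            for np2, role2, vp2 in role_bindings:
                if vp2 == vp and role2 == asked_role:
                    return word_of[np2]

print(answer("theme", "Mary", "loves", "agent"))   # John
print(answer("theme", "John", "loves", "agent"))   # Mary
```

The same words bound to different structure assemblies yield different, non-interfering answers, which is the functional content of the 'problem of 2' solution described above.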
5 Answers to Binding Questions
If the main purpose of language is to provide information about ‘who did what to whom', a neural architecture for language representation should be able to produce answers to ‘who did what to whom' questions. For instance, in the case of the sentence John loves Mary, a neural architecture should be able to produce an answer to the question "Who does John love?". In terms of Fig. 1a, the answer to this question should consist of the activation of the assembly for Mary. A simulation of this process, by means of populations of spiking neurons, can be found in Van der Velde and De Kamps (in press). Here, I will discuss this process in terms of the sentence John gives Mary a book and Mary gives John a pen. As noted above, with a sentence like this, problems arise when synchrony of activation is used as a binding mechanism, because John and Mary are both agent and theme in this sentence. The fact that the same word can have different roles in one or more sentences is an example of the problem of 2 (Jackendoff, 2002).

Fig. 4a illustrates the basic structure of John gives Mary a book and Mary gives John a pen in terms of the architecture in Fig. 3. In addition to subassemblies for agent (a) and theme (t), there are now subassemblies for recipient (r) as well. In comparison with the neural structures illustrated in Fig. 1, I have used a more simplified representation in Fig. 4a, in which two subassemblies and the active memory circuit that connects them are replaced by a single subassembly. However, the full structure is still implied. Thus, for instance, the binding between John and gives in Fig. 4 is similar to the binding of John and loves in Fig. 1a. Likewise, the word assemblies are connected directly to their structure (NP or VP) assemblies, but the (active) memory circuits as in Fig. 1 are still implied.

The structure in Fig. 4a can be used in a reasoning process similar to the one described by Shastri and Ajjanagadde (1993). For instance, the question "What does Mary own?" can be answered even if the representation (neural structure) of Mary owns a book is not available. In that case, the question "What does Mary own?" could be transformed into the question "What did Mary receive?". This new question can be produced in a reflexive manner because of the (learned) associations between the respective arguments of own and give. These associations in fact encode rules, such as the rule that the receiver of an object is also the owner of that object (e.g., Shastri and Ajjanagadde, 1993). Translated in terms of the structure of Fig. 4a, this means that the agent of own is the recipient of give and the theme of own is the theme of give. Thus, the question "What does Mary own?" provides the information that Mary is the agent of own, and it asks for the theme of own. Using the rule described above, this can be transformed into the information that Mary is the recipient of give and that the question asks for the theme of give.

Although two propositions of the form X gives Y to Z are presented in Fig. 4a, confusions due to multiple instantiation do not arise here, as they do in the model of Shastri and Ajjanagadde (1993). In this case, the question "What does Mary own (receive)?" produces the activation of the representation for John gives Mary a book, but it does not produce the activation of the representation for Mary gives John a pen. This results from the fact that the question "What does Mary own (receive)?" provides the information that Mary is the recipient of give, instead of the agent of give.
Figure 4: Production of the answer to "What does Mary own (receive)?". (a) The sentence structures of John gives Mary a book (N1 John as agent, N2 Mary as recipient, N3 book as theme, bound to V1 gives) and Mary gives John a pen (N4 Mary as agent, N5 John as recipient, N6 pen as theme, bound to V2 gives). The VP assemblies V1 and V2 are connected with inhibitory connections; shaded assemblies are active. (b) The same structures with the gating and memory circuits removed and all assemblies connected directly.
Fig. 4a illustrates the process of answering the question "What does Mary own (receive)?" when the assembly for Mary has activated the NP assemblies N2 and N4. Furthermore, the question provides the information that Mary has the role of recipient, which can be used to activate the gating circuits for recipient in Fig. 4a. Through N2, this results in the activation of the subassembly for recipient that connects N2 and V1. In turn, this results in the activation of V1 and the assembly for gives. The assembly for gives is also connected to V2 by an active memory circuit. Thus, V2 is activated by gives as well. However, the activation of V2 is inhibited by V1, which is more strongly activated than V2, because V1 receives activation from the assembly for gives and from N2. In this way, V1 will win the competition with V2. Because V1 remains as the only active VP assembly, the activation of the theme gating circuits (induced by the question "What does Mary own (receive)?") results in the activation of N3, and thus of the assembly for book, as the answer to the question.

The process illustrated in Fig. 4a shows that the problem of 2 is solved in this architecture because a word assembly (e.g., for Mary) can be associated
(temporarily) with different structure assemblies, such as N2 and N4. In combination with the control of activation provided by the gating circuits, this results in the selective activation of Mary as the recipient in John gives Mary a book, instead of the activation of the representation for Mary in Mary gives John a pen.

The importance of the gating circuits in this respect is further illustrated in Fig. 4b. Here, the gating and memory circuits are removed and all assemblies are connected with direct connections. Without the gating circuits, main assemblies and their subassemblies unify into a single assembly. In this case, the question "What does Mary own (receive)?" produces the activation of the assemblies for Mary and for give. Through spreading of activation, the assembly for Mary now produces the activation of both V1 and V2. So, a distinction between V1 and V2 cannot be made in this case (other than by chance), and a reliable answer to the question cannot be produced. In other words, both book and pen have an equal chance of being activated in this case. Thus, the presence of the gating circuits, and the control of activation they provide, is of crucial importance for implementing the structure of a given sentence. Without the gating circuits, distinct word and structure assemblies (also) unify into larger assemblies. In Fig. 4b, two such larger assemblies can be distinguished, one centered around V1 and one centered around V2.

Two additional points should be made in relation to Fig. 4. First, if the question was "What does John give to Mary?", both John and Mary (and give) would be activated, and both the gating circuits for agent (John) and recipient (Mary). In this way, the distinction between John gives Mary a book and Mary gives John a pen would be lost. This collapse of activation can be avoided by a sequential process. In this process, for instance, first John (gives) and the gating circuits for agent would be activated. In this way, all (encoded) sentences of the form John gives X to Y would be selected. Then, in the next step, Mary and the gating circuits for recipient would be activated, which would result in the selection of John gives Mary a book over a sentence like John gives Susan a flower. Sequential processing is thus inevitable in processing sentence structures (as it also is in encoding sentence structures, e.g., see Van der Velde and De Kamps, in press).

Second, the word assemblies for Mary, gives and John are the same in both sentence structures in figure 4a. In other words, the model has a distributed encoding of sentence structure. In contrast, in a ‘distributed' model like LISA (Hummel and Holyoak, 1997), all binding relations are encoded with local nodes, including the representation of entire sentences. That is, each sentence is encoded with its own local node in this model. This precludes the representation of novel sentences in LISA. Here, novel sentences are encoded in the same way as familiar sentences, in the manner illustrated in figure 4. Notice that the word
assemblies themselves are also distributed, and they could be partly overlapping. All that is required here is that a part of each word assembly is embedded in the architecture as illustrated in Fig. 3.
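To make the control structure concrete, the following toy sketch renders the bindings of Fig. 4a as plain data and answers a binding question by the gating-and-competition scheme just described. It is a schematic abstraction for illustration only: the assembly names, the dictionary encoding, and the scoring shortcut standing in for the inhibitory competition are assumptions of the sketch, not part of Van der Velde and De Kamps's model.

```python
# Toy rendering of answering "What does Mary own (receive)?".
# The bindings encode John gives Mary a book (V1) and
# Mary gives John a pen (V2).
bindings = {
    'V1': {'verb': 'gives', 'agent': 'N1', 'recipient': 'N2', 'theme': 'N3'},
    'V2': {'verb': 'gives', 'agent': 'N4', 'recipient': 'N5', 'theme': 'N6'},
}
words = {'N1': 'John', 'N2': 'Mary', 'N3': 'book',
         'N4': 'Mary', 'N5': 'John', 'N6': 'pen'}

def answer(cue_word, cue_role, probe_role):
    # The cue word activates every NP assembly it is bound to.
    active_nps = {n for n, w in words.items() if w == cue_word}
    # The verb assembly weakly activates all VP assemblies via memory
    # circuits (score 1); the opened gating circuit for the cued role adds
    # activation only to a VP whose cue-role subassembly connects to an
    # active NP (score 2), so that VP wins the inhibitory competition.
    scores = {vp: 1 + (b[cue_role] in active_nps)
              for vp, b in bindings.items()}
    winner = max(scores, key=scores.get)
    # Opening the gating circuits for the probed role of the winning VP
    # activates the bound NP assembly and thereby its word assembly.
    return words[bindings[winner][probe_role]]

print(answer('Mary', 'recipient', 'theme'))   # -> 'book'
```

The scoring line compresses the network dynamics into a single comparison; in the architecture itself the same selection is carried out by the relative strength of activation and the inhibitory connections between V1 and V2.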
6 Binding of Clauses as Arguments
A crucial test for a neural architecture of language representation is the ability to bind clauses as arguments to verbs, as in the sentence John knows that Mary knows John. As argued in section 1, a sentence like this one cannot be represented in terms of synchrony of activation, because the whole proposition Mary knows John has to be bound to the thematic role (theme) of know, and the thematic roles of Mary and John within this clause have to be represented as well. The architecture as discussed thus far deals with verb-argument binding with nouns as the arguments of the verb (Fig. 3). To handle clauses as arguments as well, the architecture is extended in the manner illustrated in Fig. 5. An additional simplification of presentation is introduced in the figure: the gating circuits are not represented explicitly, but are still intended. So, the presentation of the neural structure for John loves Mary in Fig. 5a is the same as that in Fig. 1a. In particular, the difference between a structure without gating circuits (Fig. 4b) and a structure with gating circuits (Fig. 5) is represented by the presence of subassemblies. In Fig. 4b, the subassemblies are not represented, because without the gating circuits the subassemblies unify with their main assemblies into a single assembly. The presence of the subassemblies in Fig. 5 represents the fact that the entire structure of gating circuits and memory circuits, as explicitly represented in Fig. 1, is still intended.

Fig. 5a illustrates the transition from a neural structure on the level of verb-argument binding to a more syntactical structure, in which one argument (the subject) is placed outside the direct influence of the verb. To achieve this, new kinds of structure assemblies and subassemblies are introduced (Van der Velde and De Kamps, in press). In short, a structure assembly is introduced for ‘sentence’ (S), together with subassemblies for noun and verb. S assemblies are bound to NP assemblies by means of their mutual noun subassemblies (with memory circuits), and S assemblies are bound to VP assemblies by means of their mutual verb subassemblies. The dotted line between the subassemblies for noun and verb signifies the agreement between subject and verb (e.g., boy loves versus boys love), which can be instantiated on the level of the subassemblies for noun and verb.

Fig. 5b illustrates the neural structure of (big) John knows that Mary knows (little) John in terms of the neural architecture discussed here.
Figure 5: Extension of the architecture to include syntactic structures (a). Binding of a clause as an argument (b). (The panel labels ‘lexical structures’ and ‘syntactical structures’ mark the two sides of the transition in (a).)
The adjectives big and little are not included in the representation (for such a representation, see Van der Velde and De Kamps, in press). This sentence is used here as an illustration of the use of clauses in this architecture, and of how the architecture solves the ‘problem of 2’ and handles the massiveness of binding.

In addition to the S assembly, another structure assembly for clause (C) is introduced. A C assembly can bind with a VP assembly (or an NP assembly) by means of their mutual clause subassembly, again in a connection structure that is similar to the agent and theme connection structures illustrated in Fig. 3 (all subassembly bindings in the architecture are achieved in this way). A C assembly can also bind with an NP assembly and with a VP assembly by means of their mutual noun or verb subassemblies. Again, agreement between noun and verb can be implemented by means of the noun and verb subassemblies. The word assembly of a complementizer like that can be bound to C assemblies.

In line with the previous examples, the first occurrence of John is represented by the binding of the word assembly for John to the NP assembly N1. In turn, this assembly is bound to the VP assembly V1 that represents the first occurrence of the verb know. The assemblies N1 and V1 are bound to S1, which expresses the fact that John is the subject (agent) of knows in John knows (· · ·). Likewise, the word assembly for Mary is bound to the NP assembly N2, and the word assembly for knows is (also) bound to the VP assembly V2. In turn, N2 and V2 are bound to C1. The word assembly for that is also bound to C1,
and C1 is bound as a complement to V1. Finally, the word assembly for John is also bound to the NP assembly N3, which is bound to the VP assembly V2 for knows by means of their theme subassemblies. This represents the proposition Mary knows John in a manner similar to the representation of Mary loves John illustrated in Fig. 1b.

The binding of the clause Mary knows John to the VP assembly V1 for knows cannot be achieved by binding the NP assembly for Mary (N2) to the VP assembly V1 for knows by means of their theme subassemblies. This binding would represent the phrase John knows Mary, instead of John knows that (· · ·). Therefore, the ‘clause’ assembly (C) was introduced, which is activated by the occurrence of the complementizer that in combination with the occurrence of a verb like know. Thus, the neural control circuits for sentence processing would also be determined by information about the nature of verbs, such as the fact that the argument structure of verbs like know or see allows clauses as arguments. An explicit example of such a control circuit (a neural network) can be found in Van der Velde and De Kamps (in press).

To answer the binding question “What does John know?”, a reverse process occurs. This binding question would first result in the activation of the VP assembly V1, in the manner described above. Then, the question asks for the theme of know. However, the theme subassembly of V1 is not bound to an NP assembly. But the question “What does John know?” can also ask for a clause as the argument of know, if the nature of this verb is taken into account (as in the process of sentence encoding). Thus, besides the theme subassembly of V1, its clause subassembly can be activated as well. This will result in the activation of C1 by means of its clause subassembly. The activation of C1 can be interpreted (by the neural control circuits that control the process of answering binding questions) as indicating that a clause is the answer to the question.
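The binding pattern of Fig. 5b can likewise be written out as plain data, which makes it easy to see how the clause assembly C1, rather than an NP assembly, fills the argument slot of the first knows. The dictionary encoding below is again an illustrative abstraction of my own, not the network itself.

```python
# Binding pattern of Fig. 5b: John knows that Mary knows John.
structure = {
    'S1': {'noun': 'N1', 'verb': 'V1'},              # main clause
    'C1': {'noun': 'N2', 'verb': 'V2',               # embedded clause
           'complementizer': 'that'},
    'V1': {'word': 'knows', 'clause': 'C1'},         # C1 bound as complement
    'V2': {'word': 'knows', 'theme': 'N3'},
    'N1': {'word': 'John'}, 'N2': {'word': 'Mary'}, 'N3': {'word': 'John'},
}

def what_does_john_know():
    v1 = structure['V1']
    # The theme subassembly of V1 is unbound, so the clause subassembly
    # is probed instead, activating C1: the answer is a clause.
    if 'theme' not in v1:
        c = structure[v1['clause']]
        return (structure[c['noun']]['word'],
                structure[c['verb']]['word'],
                structure[structure[c['verb']]['theme']]['word'])

print(what_does_john_know())   # -> ('Mary', 'knows', 'John')
```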
7 Further Developments and Conclusion
In a similar way, the architecture can handle more complex sentences with embedded clauses, like The girl that likes John loves Bill or The girl that John likes loves Bill. In the case of sentences with multiple (center) embeddings, interferences between parts of the neural sentence structure can occur. These interferences can account for the complexity effects observed in human comprehension of sentences of this kind. Details of these interference effects can be found in Van der Velde and De Kamps (in press).

It is obvious that the neural architecture for language proposed here is far from complete. Further developments include representations for syntactic elements and structures like adjectives, adverbs and prepositional phrases
(for examples see Van der Velde and De Kamps, in press). These additional features also consist of categorical distinctions, like the distinctions between noun phrases and verb phrases or between the thematic roles of agent, theme and recipient. Therefore, these new features are represented in a manner similar to the way in which noun phrases, verb phrases and the thematic roles are represented. Furthermore, the architecture would have to include an account of the processing and representation of passive sentences and questions, and a more elaborate account of the process of building a sentence structure.

An important aspect of the architecture is its solution to the binding problem. The binding problem is solved in terms of a process that answers specific binding questions related to the binding at hand. This process consists of a selective flow of activation that occurs in the process of answering a binding question, as illustrated in Fig. 4a. In this way, the sentence architecture described here is similar to the architecture for compositional structures in visual cognition described in Van der Velde and De Kamps (2001) and De Kamps and Van der Velde (2001). The binding of features in this visual blackboard architecture (e.g., the binding between the color and shape of an object) also consists of a selective flow of activation initiated with a binding question like “What is the color of this shape?”. The fact that this architecture answers binding questions sets it apart from models that seek only a representation of sentence structure in a neural or neural-like manner. An example of the latter is the encoding of sentence structure in the form of a ‘holographic reduced representation’ (Plate, 2003).

The reason why the binding problem is solved here in terms of the selective process of activation that produces an answer to a binding question is related to the coordinate system or frame of reference in which the binding problem should be solved. As outside observers, we could see some form of related (e.g., concurrent) activity in brain areas that are involved in processing information in a given task, such as binding the color and shape of visual objects, or binding a noun and a verb in a sentence. But it is not clear that the observed relation in activity is used by these brain areas to solve the binding problem at hand. That is, it is not clear that these brain areas ‘know’ that they are (say) concurrently active with each other, so that they can use that information effectively. What is needed is information that is available within the system itself (instead of only from an outside perspective). A binding question like “What is the color of this shape?” or “What does Mary own (receive)?” probes for information that is available within the system itself, because the system generates behavior when it answers such a question, which it can only do by using information that is available within the system. Investigating the process that results in answering binding questions is, in this view, the best way to study (and solve) the issue of binding in combinatorial structures, including the binding of color and shape and the binding of words in a sentence structure.
References

De Kamps, M., & Van der Velde, F. (2001). Using a recurrent network to bind form, color and position into a unified percept. Neurocomputing, 38–40, 523–528.

Hebb, D. (1949). The organization of behavior. New York: Wiley.

Hummel, J. E., & Holyoak, K. J. (1997). Distributed representations of structure: A theory of analogical access and mapping. Psychological Review, 104, 427–466.

Jackendoff, R. (2002). Foundations of language. Oxford: OUP.

Plate, T. A. (2003). Holographic reduced representation. Stanford: CSLI.

Pulvermüller, F. (1999). Words in the brain's language. Behavioral and Brain Sciences, 22, 253–336.

Shastri, L., & Ajjanagadde, V. (1993). From simple associations to systematic reasoning: A connectionist representation of rules, variables and dynamic bindings using temporal synchrony. Behavioral and Brain Sciences, 16, 417–494.

Van der Velde, F., & De Kamps, M. (2001). From knowing what to knowing where: Modeling object-based attention with feedback disinhibition of activation. Journal of Cognitive Neuroscience, 13, 479–491.

Van der Velde, F., & De Kamps, M. (2002). Synchrony in the eye of the beholder: An analysis of the role of neural synchronization in cognitive processes. Brain and Mind, 3, 291–312.

Van der Velde, F., & De Kamps, M. (in press). Neural blackboard architectures of combinatorial structures in cognition. Behavioral and Brain Sciences (target article).
Neuronal Synchronization, Covariation, and Compositional Representation

Markus Werning

This paper tries to unify a widely held semanticist view on the nature of meaning and content with two central neuroscientific hypotheses on the role of cortical neurons. The semanticist view presupposes the covariation of concepts with their contents as well as the compositionality of meaning and content. On the side of neuroscience, the existence of cortical feature maps and the mechanism of neural synchronization are assumed. A neural correlate of a semantics is proposed that covers not only the propositional case and the case of first order predicate languages, but also modal-logical structures.1

1 Covariation and Compositionality
The semanticist view I appeal to characterizes the triangle between language, mind and world roughly as follows: Linguistic utterances are expressions of meaning. Those meanings are mental representations, which are often called concepts. Concepts again have an external content, and this content is responsible for an utterance having reference or denotation. Here is an example: The utterance ‘dog’ expresses a mental concept – let’s call it [dog]. This concept has a certain content and thereby relates the utterance to its denotation in the world: dogs, doghood, sets of dogs or sets of possible dogs, depending on your favorite semantic theory. This story tells us how utterances can be about things in the world or, in other words, how one can speak of dogs by means of the word ‘dog’.

Leaving aside what mechanism underlies the relation between mental representation and the production of phonological sequences, semanticists of the kind described endorse the view that the relation between concepts and their

Address for correspondence: Department of Philosophy, Heinrich-Heine University Düsseldorf, Universitätsstraße 1, 40225 Düsseldorf, Germany. E-mail: [email protected].

1 This paper is designed to supplement and extend earlier publications on this issue (Werning, 2001, 2003, 2005b) and, in parts, builds on them.
content is some relation of covariation. A concept has the content it has because it co-varies with certain things and not with others. Co-variation between concepts and their contents is a causal-informational relation of sorts and has been explored in the literature (Fodor, 1992). The sense in which conceptual contents are responsible for linguistic expressions having denotation is – I assume for reasons of simplicity – identity (more complex relations between content and denotation may be viable, too). The denotation of an utterance is identical to the content of the concept the utterance is an expression of. This view is captured by our first hypothesis:2

Hypothesis 1 (Covariation with Content). An expression has the denotation it has because the concept it expresses reliably co-varies with a content that is identical to the expression’s denotation.

Since natural languages have a rich constituent structure, it is rather plausible to assume that the structure of their meanings is complex, too, and that the structure of meanings in some way or another resembles the structure of their expressions. Now, the simplest way to spell out this relation of resemblance is by means of a structural match, in technical terms: a homomorphism. This homomorphism is spelled out by the principle of the compositionality of meaning:

Hypothesis 2 (Compositionality of Meaning). The meaning of a complex expression is a syntax-dependent function of the meanings of its syntactic constituents.

It would be surprising, furthermore, if the covariation relations between primitive concepts and their contents did not in some way or another contribute to the covariation relations between complex concepts and their contents. The quest for simplicity again leads us to the hypothesis that the contents of the primitive concepts are the sole factors determining the content of a complex concept combined from them. Again, this is just what the principle of compositionality says for contents, our third and last semanticist hypothesis:

Hypothesis 3 (Compositionality of Content). The content of a complex concept is a structure-dependent function of the contents of its constituent concepts.
2 I conceive of denotation in a broad sense as modal denotation and distinguish between reference and denotation. The denotation of an expression is a function from possible worlds to the referents of the expression in those possible worlds. The denotation of a sentence p, e.g., is the set of pairs {⟨w, t⟩ | p has the truth-value t in the world w}, while its referent is the truth-value it has in the actual world. Little depends on the particular view one assumes with regard to the nature of denotation in our context.
The three semanticist hypotheses, though not uncontroversial, are at the core of many contemporary semanticist theories. They may thus serve us as a starting point for our reductive project.3

2 Neuronal Reduction
The aim of this paper is to make out a neuronal structure N = ⟨N, Σ_N⟩ that fulfills the three semanticist hypotheses: covariation with content as well as the compositionality of meaning and content. The neuronal structure shall consist of a set of neuronal states N and a set of operations Σ_N defined on them. Since the three semanticist hypotheses may serve as (minimal) identity criteria for concepts, their fulfillment by a neuronal structure will justify us in identifying the neuronal structure with a structure of concepts. The three hypotheses hence form the adequacy conditions for a neuronal reduction of concepts.

Since a large part of natural language can be paraphrased by means of a formal monadic first order predicate language with identity, the adequacy conditions may be formalized for that case in the following way (in the formalization of compositionality I follow Hodges, 2001; see also Werning, 2004):

Principle 1 (Adequacy Conditions). Let PL= = ⟨L, Σ_L⟩ be a monadic first order predicate language with identity. Let it comprise the set of grammatical terms L and the syntactic operations of identity, predication, negation, conjunction, disjunction, implication and existential as well as universal quantification:4

Σ_L = {σ_=, σ_pred, σ_¬, σ_∧, σ_∨, σ_→, σ_∃, σ_∀}.

Let there furthermore be a denotation function

ν : L → W

that maps the grammatical terms of PL= onto their denotations, and let this function of denotation be compositionally evaluable by a worldly structure of denotations W = ⟨W, Σ_W⟩.
3 My appeal to simplicity arguments in favor of compositionality has to do with my reluctance to accept the three most often cited reasons for compositionality, namely productivity, systematicity, and inferentiality (Werning, 2005a). I am aware that the simplicity arguments lack the force of a strict argument or even a proof. They are therefore marked as hypotheses.

4 For an exemplification of the syntactic operations see the mappings (28) below.
That is: For every syntactic operation σ ∈ Σ_L, there is a function ν_σ ∈ Σ_W such that for every non-atomic grammatical term σ(t_1, …, t_n) ∈ L the following equation holds:

ν(σ(t_1, …, t_n)) = ν_σ(ν(t_1), …, ν(t_n)).

Then any neuronal structure N = ⟨N, Σ_N⟩ is a structure of internal representations expressible by PL= if and only if the following three conditions hold:

(a) N is a compositional semantics of meanings for PL=, that is: There is a function µ : L → N, and for every syntactic operation σ ∈ Σ_L there is a function µ_σ ∈ Σ_N such that for every non-atomic grammatical term σ(t_1, …, t_n) ∈ L the following equation holds:

µ(σ(t_1, …, t_n)) = µ_σ(µ(t_1), …, µ(t_n)).

(b) N is compositionally evaluable with respect to content, that is: There is a function κ : N → W, and for every operation h ∈ Σ_N there is a function κ_h ∈ Σ_W such that for every neuronal element h(m_1, …, m_n) ∈ N the following equation holds:

κ(h(m_1, …, m_n)) = κ_h(κ(m_1), …, κ(m_n)).

(c) The elements of N reliably co-vary with their contents, the elements of W, such that for every grammatical term t ∈ L the following holds:

ν(t) = κ(µ(t)).

3 Neurobiological Evidence
For many feature dimensions (color, orientation, direction, size, etc.) involved in the course of visual processing, one can anatomically identify so-called neuronal feature maps (Hubel & Wiesel, 1968). These are clusters of clusters of neurons that exhibit a certain topological organization (see Fig. 1).
Figure 1: Feature map. Optical image of an orientation map in the primary visual cortex of a macaque. Shadings code the preferred orientations of neurons as indicated by the bars on the right. Three exemplary pin-wheel centers are marked by black crosses. The horizontal extent is 3.3 mm. Adapted from Obermayer and Blasdel (1993).

With regard to one feature dimension, one finds a pinwheel-like structure for each receptive field (i.e., a specific region of the stimulus). This structure is called a hypercolumn. It typically has an extent of about 1 mm². For each receptive field or, correspondingly, each hypercolumn, neurons for the entire spectrum of features in the respective feature dimension (e.g., orientation) fan out around a pin-wheel center. Neurons of a hypercolumn with a tuning for one and the same feature (e.g., verticality) form a so-called column. A feature map thus is an assembly of hypercolumns, one per receptive field.

Neurons of neighboring hypercolumns are selective for properties that are instantiated in neighboring receptive fields on the stimulus. This means that there is some topological correspondence between the neighbor relations of hypercolumns in a feature map and the neighbor relations among receptive fields in the stimulus. Within one hypercolumn we, furthermore, have a topological correspondence between the neighbor relations of columns and the similarity relations of the features for which the neurons of each column select: neurons of neighboring columns select for similar features.

More than 30 cortical areas organized in this way, occupying approximately one-half of the total cortex, are experimentally known to be involved in the visual processing of the monkey (Felleman & van Essen, 1991); fewer are known for humans. These findings justify the following hypothesis:

Hypothesis 4 (Feature maps). There are many cortical areas that function as topologically structured feature maps. They comprise clusters of neurons whose function it is to show activity only when an object in their receptive field instantiates a certain property of the respective feature dimension.

The fact that features which belong to different feature dimensions, but may
be properties of the same stimulus object, are processed in distinct regions of cortex poses the problem of how this information is integrated in an object-specific way. How can it be that the horizontality and the redness of a red horizontal bar are represented in distinct regions of cortex, but still are part of the representation of one and the same object? This is the binding problem in neuroscience (Treisman, 1996).

A prominent and experimentally well supported solution postulates neuronal synchronization as a mechanism for binding (von der Malsburg, 1981; Gray, König, Engel, & Singer, 1989): Neurons that are indicative of different properties sometimes show synchronous activation, but only when the properties indicated are instantiated by the same object in the perceptual field; otherwise they fire asynchronously. Synchrony, thus, might be regarded as fulfilling the task of binding together various property representations in order to form the representation of an object as having these properties. The fact that object-specific synchrony has been measured within columns, within and across hypercolumns, across different feature maps, even across the two hemispheres and on a global scale (for a review see Singer, 1999) supports the following hypothesis:

Hypothesis 5 (Synchrony). Neurons of different feature clusters have the function to show synchronous activation only if the properties indicated by each feature cluster are instantiated by the same object in their receptive field.

4 Oscillatory Networks
The two neurobiological hypotheses on neuronal feature maps and synchrony allow us to regard oscillatory networks (see Fig. 2) as a plausible model of informational processes in the visual cortex. The design of oscillatory networks is also supported by principles of Gestalt psychology that govern the representation of objects. According to some of the Gestalt principles, spatially proximal elements with similar features (similar color/orientation/direction/size, etc.) are likely to be perceived as one object or, in other words, to be represented by one and the same object concept. If, for example, in a field of densely arranged, randomly moving dots a bunch of neighboring dots move in the same direction, you are likely to perceive them as one object. If, in a field of randomly arranged, varicolored bars, a group is parallel and of the same color, we see the bars as belonging together and forming an object of their own.

The Gestalt principles are implemented in oscillatory networks by the following mechanism: Oscillators that select input from proximal stimulus elements with like properties tend to synchronize, whereas oscillators that select input
from proximal stimulus elements with unlike properties tend to de-synchronize. As a consequence, oscillators selective for proximal stimulus elements with like properties tend to form a synchronous oscillation when stimulated simultaneously. This oscillation can be regarded as one object concept. In contrast, inputs that contain proximal elements with unlike properties tend to cause anti-synchronous oscillations, i.e., different object concepts.

A single oscillator consists of a pair of mutually coupled excitatory and inhibitory neurons, each of which represents a population of biological cells. If the number of excitatory and inhibitory biological cells is large enough, the dynamics of each oscillator can be described by two variables x and y. They evolve over time according to the following differential equations:

dx/dt = −τ_x x − g_y(y) + L⁰_xx g_x(x) + I_x + N_x,
dy/dt = −τ_y y + g_x(x) − I_y + N_y.   (1)

Here, the τ_ξ (ξ ∈ {x, y}) are constants that can be chosen to match refractory times of biological cells. The g_ξ are transfer functions that tell how much of the activity of a neuron is transferred to other neurons. The constant L⁰_xx describes the self-excitation of the excitatory cell population. The I_ξ are static external inputs and the N_ξ variable white noise, which models fluctuation within the cell populations. With I_ξ above threshold, the solutions of the system of equations (1) are limit-cycle oscillations. For a more detailed description of the network see Maye (2003).

Stimulated oscillatory networks characteristically show object-specific patterns of synchronized and de-synchronized oscillators within and across feature dimensions. Oscillators that represent properties of the same object synchronize, while oscillators that represent properties of different objects de-synchronize. In the simulation we observe that for each represented object a certain oscillation spreads through the network. The oscillation pertains only to oscillators that represent the properties of the object in question. A considerable number of neurobiological studies have by now corroborated the view that cortical neurons are rather plausibly modelled by oscillatory networks (cf. Singer & Gray, 1995; Schillen & König, 1994; Werning, 2001; Maye, 2003). Together with the simulations described, these studies suggest that the synchrony of oscillations indicates the sameness of objects and that an oscillation pertaining to a neuronal feature cluster indicates that the object indicated by the oscillation has the featured property.
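For readers who want to reproduce the qualitative behavior, equations (1) can be integrated numerically. The sketch below uses a simple Euler scheme; the threshold-linear transfer functions and all parameter values are illustrative assumptions of mine (whether the solution settles into a limit cycle depends on the parameter regime), not the settings used by Maye (2003).

```python
import numpy as np

def simulate_oscillator(T=100.0, dt=0.01, tau_x=1.0, tau_y=0.5,
                        L0_xx=1.2, I_x=1.0, I_y=0.0, noise=0.02, seed=0):
    """Euler integration of the oscillator equations (1) for a single
    oscillator; returns the sampled trajectories of x and y."""
    rng = np.random.default_rng(seed)

    def g(v):                            # assumed transfer function g_x = g_y
        return np.maximum(v, 0.0)        # threshold-linear

    steps = int(T / dt)
    x = np.zeros(steps)
    y = np.zeros(steps)
    for t in range(steps - 1):
        N_x, N_y = noise * rng.standard_normal(2)   # white-noise terms
        x[t + 1] = x[t] + dt * (-tau_x * x[t] - g(y[t])
                                + L0_xx * g(x[t]) + I_x + N_x)
        y[t + 1] = y[t] + dt * (-tau_y * y[t] + g(x[t]) - I_y + N_y)
    return x, y
```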
Figure 2: Oscillatory network. a) A single oscillator consists of an excitatory (x) and an inhibitory (y) neuron. Each model neuron represents the average activity of a cluster of 100 to 200 biological cells. L⁰_xx describes the self-excitation of the excitatory neuron. I_x and I_y amount to external input. b) Synchronizing connections (solid) are realized by mutually excitatory connections between the excitatory neurons and hold between oscillators within one layer. Desynchronizing connections (dotted) are realized by mutually inhibitory connections between the inhibitory neurons and hold between different layers. ‘R’ and ‘G’ denote the red and green channel. The cylinder segments correspond to Hubel and Wiesel’s (1968) columns, whole cylinders to hypercolumns. c) A module for a single feature dimension (e.g., color) consists of a three-dimensional topology of oscillators. There is one layer per feature, and each layer is arranged to reflect the two-dimensional retinotopic structure. The shaded circles visualize the range of synchronizing (light gray) and desynchronizing (dark gray) connections of a neuron in the top layer (black pixel). d) Two coupled feature modules are shown schematically. The single oscillator in module A has connections to all oscillators in the shaded region of module B. This schema is applied to all other oscillators and feature modules. Reprinted from Werning (2005b) and Maye and Werning (2004).
5 Hilbert Space Analysis
Fig. 3a shows a stimulus we presented to an oscillatory network. The network consists of a color module with layers for redness and greenness and an orientation module with layers for verticality and horizontality. In the stimulus, the human observer perceives two objects: a red vertical object and a green horizontal object. Now, how does the network respond to the stimulus? What can we say about the network dynamics?

The oscillations spreading through the network can be characterized mathematically: An oscillation function, or more generally the activity function x(t) of an oscillator, is the activity of its excitatory neuron as a function of time during a time window [−T/2, +T/2]. Mathematically speaking, activity functions are vectors in the Hilbert space L²[−T/2, +T/2] of functions square-integrable in the interval [−T/2, +T/2]. This space has the inner product
⟨x(t) | x′(t)⟩ = ∫_{−T/2}^{+T/2} x(t) x′(t) dt.   (2)
The degree of synchrony between two oscillations lies between −1 and +1 and is defined as their normalized inner product:

∆(x, x′) = ⟨x | x′⟩ / √(⟨x | x⟩ ⟨x′ | x′⟩).   (3)

The degree of synchrony, so defined, corresponds to the cosine of the angle between the Hilbert vectors x and x′. The most important cases are:

∆(x, x′) = +1 ⇔ x and x′ are parallel (totally synchronous);
∆(x, x′) = 0 ⇔ x and x′ are orthogonal (totally uncorrelated);
∆(x, x′) = −1 ⇔ x and x′ are anti-parallel (totally anti-synchronous).
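For sampled activity functions, the inner product (2) becomes a finite sum and the degree of synchrony (3) reduces to the cosine similarity of two vectors. A minimal sketch, assuming uniformly sampled signals of equal length:

```python
import numpy as np

def degree_of_synchrony(x, x_prime):
    """Normalized inner product (3) of two sampled activity functions;
    returns a value in [-1, +1]."""
    x, x_prime = np.asarray(x, float), np.asarray(x_prime, float)
    return float(np.dot(x, x_prime) /
                 np.sqrt(np.dot(x, x) * np.dot(x_prime, x_prime)))

t = np.linspace(-1.0, 1.0, 1000)
print(degree_of_synchrony(np.sin(8 * t), np.sin(8 * t)))    # ~ +1 (parallel)
print(degree_of_synchrony(np.sin(8 * t), np.cos(8 * t)))    # ~  0 (orthogonal)
print(degree_of_synchrony(np.sin(8 * t), -np.sin(8 * t)))   #   -1 (anti-parallel)
```

The three printed values instantiate the three limiting cases listed above.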
6 Eigenmodes
From synergetics it is well known that the dynamics of complex systems is often governed by a few dominating states. These states are the eigen- or principal modes of the system. The corresponding eigenvalues designate how much of the variance is accounted for by a mode.
Figure 3: a) Stimulus: one vertical red bar and one horizontal green bar. b) The eigenvectors v^1, …, v^4 of the four eigenmodes 1, …, 4 with the largest eigenvalues are shown in one line. The values of the vector components are coded by colors. The four columns correspond to the four feature layers. Dark shading signifies negative, mid-gray zero, and light shading positive components. c) The characteristic functions and eigenvalues for the first four eigenmodes. Reprinted from Werning (2005b).
The overall dynamics of the network is given by the Cartesian vector

x(t) = (x_A1(t), …, x_A2(t), …, x_B1(t), …, x_k(t))ᵀ.   (4)

A, B, … signify the feature dimensions, and the subsequent numbers enumerate particular features of the feature dimension in question. The index A1 = 1 marks the beginning of the first feature cluster (e.g., red) in the first feature module (e.g., color), A2 the beginning of the second feature cluster (e.g., green) in the first module, B1 the first cluster (e.g., vertical) in the second module (orientation), and so on. The vector x(t) comprises the activities of the excitatory neurons of all k oscillators of the network after a transient phase and is determined by a solution of the system of differential equations (1).

For each eigenmode, the eigenvalue λ and its corresponding eigenvector v are solutions of the eigen-equation for the auto-covariance matrix C ∈ ℝ^{k×k}:

Cv = λv,   (5)

where the components C_ij of C are determined by the network dynamics x(t) as

C_ij = ⟨x_i | x_j⟩.   (6)
The eigenvector v^1 of the strongest eigenmode is shown in Fig. 3b and exhibits a significant difference between the two objects in the stimulus. To assess the temporal evolution of the eigenmodes, the notion of a characteristic function c_i(t) is introduced. The network state at any instant can be considered as a superposition of the eigenvectors v^i weighted by the corresponding characteristic functions c_i(t) of Fig. 3c:

x(t) = Σ_i c_i(t) v^i.   (7)
The eigenmode analysis separates spatial from temporal variation in the network dynamics. The eigenvectors are constant over time, but account for the varying behavior of the spatially distributed oscillators of the network. In contrast, the characteristic functions are the same for all oscillators, but account for the temporal dynamics of the network as a whole.
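Computationally, the eigenmode analysis of equations (5)–(7) is a principal-component decomposition of the recorded network activity. A sketch, assuming the activities are collected in a k × n array with one row per oscillator:

```python
import numpy as np

def eigenmodes(x):
    """Eigen-decomposition of the auto-covariance matrix, eqs. (5)-(6).
    x: array of shape (k, n), one row per oscillator, sampled over time.
    Returns eigenvalues (descending), eigenvectors v^i (as columns),
    and the characteristic functions c_i(t) (as rows)."""
    C = x @ x.T                       # C_ij = <x_i | x_j>, eq. (6)
    lam, V = np.linalg.eigh(C)        # solves C v = lambda v, eq. (5)
    order = np.argsort(lam)[::-1]     # strongest eigenmode first
    lam, V = lam[order], V[:, order]
    c = V.T @ x                       # c_i(t) = <v^i, x(t)>
    return lam, V, c
```

Since the eigenvectors returned by eigh are orthonormal, the activity is recovered exactly as the superposition (7): x = V @ c.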
In the long run the network exhibits stable oscillatory behavior. As one can guess from Fig. 3c, only the first two eigenmodes are stable, because the amplitudes of their characteristic functions do not decrease. In contrast, the amplitudes of the characteristic functions of the other eigenmodes apparently converge to zero in the long run. The eigenmodes, for any stimulus, can be ordered along their eigenvalues:5

λ_i > λ_{i+1}.   (8)
For this reason, I will introduce the useful convention of signifying each eigenmode by the index i ∈ ℕ. For any stimulus we have the mapping

i ↦ ⟨v^i, c_i(t), λ_i⟩,

which, for each eigenmode i, renders the i-th eigenvector v^i, the corresponding characteristic function c_i(t) and the eigenvalue λ_i.

7 First Steps Into Semantic Interpretation
In this section, I will develop a heuristic that allows us to interpret the dynamics of oscillatory networks in semantic terms. Oscillatory networks that implement the hypotheses of feature maps (Hyp. 4) and synchrony (Hyp. 5), I argue, realize a structure of internal representations expressible by a monadic first order predicate language with identity PL=. Because of Hyp. 5 we are allowed to regard oscillation functions as internal representations of individual objects. They may thus be assigned some of the individual terms of the language PL=. Let

Ind = {a_1, …, a_m, z_1, …, z_n}   (9)

be the set of individual terms of PL=; then let the partial function

α : Ind → L²[−T/2, +T/2]   (10)

be a constant individual assignment of the language. By convention, I will assume for the domain of α, unless indicated otherwise, that

dom(α) = {a_1, …, a_m},   (11)
so that the a_1, …, a_m are individual constants and the z_1, …, z_n are individual variables. Sometimes I will use a, b as placeholders for a_1, …, a_m.

5 I assume that the ordering is strict, i.e., none of the eigenvalues is degenerate.
Following equation (3), the synchrony of oscillation functions is a matter of degree. The sentence a = b expresses a representational state of the system to the degree to which the oscillation functions α(a) and α(b) of the system are synchronous. Provided that Cls is the set of sentences of PL=, the degree to which a sentence expresses a representational state of the system, for any eigenmode i ∈ ℕ, can be measured by the (in ℕ possibly partial) function

d : Cls × ℕ → [−1, +1].

In the case of identity sentences, for every eigenmode i and any individual constants a, b, we have:

d(a = b, i) = ∆(α(a), α(b)).   (12)

When we take a closer look at the first eigenmode of Fig. 3b, we see that most of the vector components are exactly zero (marked by mid-gray). However, a few components v^1_j, v^1_j′, … in the greenness and the horizontality layers are positive (marked by light shading), and a few components v^1_l, v^1_l′, … in the redness and the verticality layers are negative (marked by dark shading). Since the contribution of the eigenmode to the entire network state is weighted by its characteristic function, the positive component v^1_j contributes to the activity of x_j(t) with +|v^1_j| c_1(t), while the negative component v^1_l contributes with −|v^1_l| c_1(t) to x_l(t). Since the ∆-function is normalized, only the signs of the components matter. The weighted positive components of the eigenmode are all exactly parallel with one another, the weighted negative components are all exactly parallel with one another, but any weighted positive component is exactly anti-parallel to any weighted negative component:

∆(v^1_j c_1(t), v^1_j′ c_1(t)) = +1,   (13)
∆(v^1_l c_1(t), v^1_l′ c_1(t)) = +1,   (14)
∆(v^1_j c_1(t), v^1_l c_1(t)) = −1.   (15)
We may interpret this by saying that the first eigenmode represents two objects as distinct from one another. The representation of the first object is the positive characteristic function +c_1(t), and the representation of the second object is the negative characteristic function −c_1(t). Both the positive and the negative function can be assigned to individual constants, say a and b, respectively. In the eigenmode analysis we can thus identify sharp representations of objects in the network: the characteristic functions and their negative mirror images. These considerations, for every eigenmode i, justify the general definition:

d(¬ a = b, i) = +1 if d(a = b, i) = −1; d(¬ a = b, i) = −1 if d(a = b, i) > −1.   (16)
Notice that, unlike identity, its negation is represented by the network as sharp, i.e., non-gradual. Within each eigenmode, at most two objects can be represented as non-identical. As we will see later on, sharpness is a general feature of negation in our semantics as such.6

Following Hyp. 4, clusters of feature-selective neurons function as representations of properties. They can be expressed by monadic predicates. I will assume that our language PL= has a set of monadic predicates

Pred = {F_1, …, F_p}   (17)

such that each predicate denotes a property represented by some feature cluster. To every predicate F ∈ Pred I now assign a diagonal matrix β(F) ∈ {0, 1}^{k×k} that, by multiplication with any eigenmode vector v^i, renders the sub-vector of those components that belong to the feature cluster expressed by F:

β : Pred → {0, 1}^{k×k}.   (18)
With respect to our particular network, the matrix β(red), e.g., is zero everywhere except for the diagonal elements that belong to the redness layer – the positions 1, …, i(green)−1 – which are 1:

β(red) = diag(1, …, 1, 0, …, 0).   (19)
The multiplication of β(red) with the first eigenmode vector v^1 gives us the components of v^1 for the redness layer in the color module of the network:

β(red) v^1 = (v^1_1, …, v^1_{i(green)−1}, 0, …, 0)ᵀ.   (20)

Since β(F) is a hardware feature of the network and varies neither from stimulus to stimulus nor from eigenmode to eigenmode (and is, model-theoretically speaking, hence constant in all models), it is sensible to call it the neuronal intension of F.

The neuronal intension of a predicate, for every eigenmode, determines what I call its neuronal extension, i.e., the set of those oscillations that the neurons on
6 This evaluation of non-identity is chosen for reasons of consistency with the Gödel system introduced below.
the feature layer contribute to the activity the eigenmode adds to the overall network dynamics. Unlike the neuronal intension, the neuronal extension varies from stimulus to stimulus and from eigenmode to eigenmode (just as extensions vary from possible world to possible world). Hence, for every predicate F its neuronal extension in the eigenmode i comes to:

{f_j | f = c_i(t) β(F) v^i}.   (21)

Here, the f_j are the components of the vector f. The neuronal extension of the predicate red in the first eigenmode in our experimental setting thus comes to the following set of functions – it comprises all those temporally evolving activities the redness components contribute to the overall network dynamics in the first eigenmode:

{f_j | f = β(red) v^1 c_1(t)} = {v^1_1 c_1(t), …, v^1_{i(green)−1} c_1(t), 0}.   (22)
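In computational terms, the neuronal intension β(F) is just an indicator matrix on the oscillator indices of F's feature layer, and the neuronal extension (21) results from masking an eigenvector with it and weighting the outcome by the characteristic function. A sketch with an invented toy layout of k = 16 oscillators in four equal layers (the layer boundaries are assumptions of the example, not the dimensions of the simulated network):

```python
import numpy as np

k = 16                                   # toy network: 4 layers x 4 oscillators
layers = {'red': range(0, 4), 'green': range(4, 8),
          'vertical': range(8, 12), 'horizontal': range(12, 16)}

def beta(F):
    """Neuronal intension of predicate F: a diagonal 0/1 matrix,
    cf. eqs. (18)-(19)."""
    B = np.zeros((k, k))
    B[list(layers[F]), list(layers[F])] = 1.0   # ones on F's diagonal block
    return B

def neuronal_extension(F, v_i, c_i):
    """Component functions f_j with f = c_i(t) beta(F) v^i, eq. (21).
    v_i: eigenvector of shape (k,); c_i: sampled characteristic function.
    Returns one row per component j (rows outside F's layer are zero)."""
    f = beta(F) @ v_i                    # mask the eigenvector to F's layer
    return np.outer(f, c_i)             # f_j * c_i(t) for every j
```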
To determine to which degree an oscillation function assigned to an individual constant a is in the neuronal extension of a predicate F, we have to compute how synchronous it maximally is with one of the oscillation functions in the neuronal extension. We are, in other words, justified in evaluating the degree to which a predicative sentence Fa expresses a representational state of our system, with respect to the eigenmode i, in the following way:

d(Fa, i) = max{∆(α(a), f_j) | f = c_i(t) β(F) v^i}.   (23)

Having by now provided a semantic evaluation for every atomic sentence of PL=, how can we evaluate the truth-functional connectives? Since we are here dealing with an infinitely many-valued semantics, we have to look at the broader spectrum of fuzzy logics. In those logics the conjunction is semantically evaluated by a t-norm.7 Having once made a choice of a certain t-norm as the semantic correlate of conjunction, the functions of semantic evaluation for most of the other connectives can be derived by systematic considerations (cf. Gottwald, 2001). As will become obvious in the course of the remaining sections, the system that fits my purposes best is Gödel’s (1932) min-max logic. Here the conjunction is evaluated by the minimum of the values of the conjuncts, which is a t-norm. Let φ, ψ be sentences of PL=; then, for any eigenmode i, we have:

d(φ ∧ ψ, i) = min{d(φ, i), d(ψ, i)}.   (24)
7 A binary operation t in the real interval [−1, +1] is called a t-norm if and only if it is (for d, d′, d″ ∈ [−1, +1]): (i) associative, i.e., t(d, t(d′, d″)) = t(t(d, d′), d″); (ii) commutative, i.e., t(d, d′) = t(d′, d); (iii) non-decreasing in the first element, i.e., d ≤ d′ ⇒ t(d, d″) ≤ t(d′, d″); and (iv) has 1 as neutral element, i.e., t(d, 1) = d.
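The Gödel evaluations of the connectives – the minimum (24) for conjunction, the maximum for disjunction (made explicit as (26) in section 8), the sharp negation generalizing (16), and the implication defined via the adjointness condition in section 9 – can be collected in a few lines. For implication the sketch uses the closed form that the sup-definition works out to (+1 if the antecedent's degree does not exceed the consequent's, and the consequent's degree otherwise); this closed form is my gloss on the definition, not a formula stated in the text.

```python
def d_and(d1, d2):
    """Conjunction, eq. (24): the Goedel t-norm (minimum)."""
    return min(d1, d2)

def d_or(d1, d2):
    """Disjunction: the Goedel t-conorm (maximum), cf. eq. (26)."""
    return max(d1, d2)

def d_not(d1):
    """Negation: sharp, digitalizing the degree (generalizing eq. (16))."""
    return 1.0 if d1 == -1.0 else -1.0

def d_implies(d1, d2):
    """Implication via the adjointness condition: sup{d' | min(d', d1) <= d2}
    comes to +1 if d1 <= d2, and to d2 otherwise."""
    return 1.0 if d1 <= d2 else d2

# Double negation digitalizes intermediate degrees: d_not(d_not(0.3)) == 1.0
```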
The evaluations we have so far introduced allow us to regard the first eigenmode of the network dynamics, which results from stimulation with one red vertical object and one green horizontal object (see Fig. 3), as a representation expressed by the sentence This is a red vertical object and that is a green horizontal object. We only have to assign the individual terms this (= a) and that (= b) to the oscillation functions −c_1(t) and +c_1(t), respectively, and the predicates red (= R), green (= G), vertical (= V) and horizontal (= H) to the redness, greenness, verticality and horizontality layers as their neuronal intensions. Simple computation then reveals:

d(Ra ∧ Va ∧ Gb ∧ Hb ∧ ¬ a = b, 1) = 1.   (25)

8 Eigenmodes as Alternative Epistemic Possibilities
So far I have concentrated on a single eigenmode only. The network, however, generates a multitude of eigenmodes. We tested the representational function of the different eigenmodes by presenting an obviously ambiguous stimulus to the network. The stimulus shown in Fig. 4a can be perceived as two red vertical bars or as one red vertical grating. It turned out that the network was able to disambiguate the stimulus by representing each of the two epistemic possibilities in a stable eigenmode of its own (see Fig. 4b).

Eigenmodes thus play a similar role for neuronal representation as the possible worlds known from Lewis (1986) or Kripke (1980) play for semantics. Like possible worlds, eigenmodes do not interfere with each other because they are mutually orthogonal. Moreover, the identity of oscillation functions (as for rigid designators in Kripke semantics) and of the neuronal intensions of predicates pertains across eigenmodes. We now see that both of the two stable eigenmodes shown in Fig. 4b can be expressed by a disjunctive sentence, if we semantically evaluate disjunction as follows:

d(φ ∨ ψ, i) = max{d(φ, i), d(ψ, i)},   (26)

for any sentences φ and ψ of PL= and any eigenmode i. Either of the two eigenmodes i = 1, 2 makes d(φ, i) assume the value +1 if φ is set to the following disjunctive sentence, which says that there is one red vertical object – denoted by a – or two red vertical objects – denoted by b and c:

(Ra ∧ Va) ∨ (Rb ∧ Rc ∧ Vb ∧ Vc ∧ ¬ b = c).
Figure 4: a) Stimulus: two vertical red bars or one red vertical grating. b) The eigenvectors v^1, …, v^4 of the four eigenmodes 1, …, 4 with the largest eigenvalues are shown in one line. The first mode represents the stimulus as one red vertical object, while the second mode represents it as two red vertical objects. c) The characteristic functions show the temporal evolution of the first four modes. Only the first two are non-decreasing and thus belong to stable eigenmodes. Reprinted from Werning (2005b).
One only needs to make the following assignments of individual constants to oscillation functions: α(a) = +c_1(t), α(b) = +c_2(t), α(c) = −c_2(t).

The choice of the maximum as the semantic evaluation of disjunction is the primary reason for me to prefer the Gödel system over alternative systems of many-valued logic. In general, many-valued logics semantically evaluate disjunction by a t-conorm, of which the maximum function is an instance.8 For our purposes the maximum is the best choice of a t-conorm because it is the only continuous t-conorm that always takes the value of one of the disjuncts as the value of the disjunction (the proof has to do with the particular non-Archimedean character of the Gödel t-norm; see Gottwald, 2001, p. 75). Other continuous t-conorms would hence not allow us to treat eigenmodes as independent alternative possibilities. We would not be able to say that a certain disjunction is true because a possibility (i.e., an eigenmode) expressed by one of its disjuncts exists.

9 Making Syntax and Semantics Explicit
We are leaving the heuristic approach now and turn to a formally explicit description of the neuronal semantics realized by oscillatory networks. Let the oscillatory network under consideration have k oscillators. The network dynamics is studied in the time window [−T/2, +T/2]. For any stable eigenmode i ∈ ℕ, it renders a determinate eigenvector v^i, a characteristic function c_i(t) and an eigenvalue λ_i after stimulation.

The language to be considered is a monadic first order predicate language with identity (PL=). Besides the individual terms of Ind and the monadic predicates of Pred, the alphabet of PL= contains the logical constants ∧, ∨, →, ¬, ∃, ∀ and the binary predicate =. Provided we have the constant individual and predicate assignments α and β of (10) and (18), the union

γ = α ∪ β   (27)

is a comprehensive constant assignment of PL=. The individual terms in the domain of α are individual constants; those not in the domain of α are individual variables. The syntactic operations of the language PL= and the set SF of sentential formulae as their recursive closure can be defined as follows, for
8 A binary operation s in the real interval [−1, +1] is a t-conorm if and only if it is (i) associative, (ii) commutative, (iii) non-decreasing in the first element, and (iv) has −1 as neutral element.
arbitrary a, b, z ∈ Ind, F ∈ Pred, and φ, ψ ∈ SF:

σ_= : (a, b) ↦ a = b;
σ_pred : (a, F) ↦ Fa;
σ_¬ : φ ↦ ¬φ;
σ_∧ : (φ, ψ) ↦ φ ∧ ψ;
σ_∨ : (φ, ψ) ↦ φ ∨ ψ;
σ_→ : (φ, ψ) ↦ φ → ψ;
σ_∃ : (z, φ) ↦ ∃zφ;
σ_∀ : (z, φ) ↦ ∀zφ.   (28)
The set of terms of PL= is the union of the sets of individual terms, predicates and sentential formulae of the language. A sentential formula in SF is called a sentence with respect to some constant assignment γ if and only if, under assignment γ, all and only individual terms bound by a quantifier are variables. Any term of PL= is called γ-grammatical if and only if, under assignment γ, it is a predicate, an individual constant, or a sentence. Taking at face value the idea that eigenmodes can be treated like possible worlds (or, more neutrally speaking, like models), the relation ‘i neurally models φ to degree d by constant assignment γ’, in symbols i |=^d_γ φ, for any sentence φ and any real number d ∈ [−1, +1], is then recursively given as follows:

Identity: Given any individual constants a, b ∈ Ind ∩ dom(γ), then i |=^d_γ a = b iff d = ∆(γ(a), γ(b)).

Predication: Given any individual constant a ∈ Ind ∩ dom(γ) and any predicate F ∈ Pred, then i |=^d_γ Fa iff d = max{∆(γ(a), f_j) | f = γ(F) v^i c_i(t)}.

Conjunction: Provided that φ, ψ are sentences, then i |=^d_γ φ ∧ ψ iff d = min{d′, d″ | i |=^{d′}_γ φ and i |=^{d″}_γ ψ}.

Disjunction: Provided that φ, ψ are sentences, then i |=^d_γ φ ∨ ψ iff d = max{d′, d″ | i |=^{d′}_γ φ and i |=^{d″}_γ ψ}.

Implication: Provided that φ, ψ are sentences, then i |=^d_γ φ → ψ iff d = sup{d′ ∈ [−1, +1] | min{d′, d″} ≤ d‴, where i |=^{d″}_γ φ and i |=^{d‴}_γ ψ}.

Negation: Provided that φ is a sentence, then i |=^d_γ ¬φ iff (i) d = 1 and i |=^{−1}_γ φ, or (ii) d = −1 and i |=^{d′}_γ φ where d′ > −1.

Existential Quantifier: Given any individual variable z ∈ Ind \ dom(γ) and any sentential formula φ ∈ SF, then i |=^d_γ ∃zφ iff d = sup{d′ | i |=^{d′}_{γ′} φ, where γ′ = γ ∪ {⟨z, x⟩} and x ∈ L²[−T/2, +T/2]}.
Universal Quantifier: Given any individual variable z ∈ Ind \ dom(γ) and any sentential formula φ ∈ SF, then i |=^d_γ ∀zφ iff d = inf{d′ | i |=^{d′}_{γ′} φ, where γ′ = γ ∪ {⟨z, x⟩} and x ∈ L²[−T/2, +T/2]}.

Let me briefly comment on these definitions: Most of them should be familiar from previous sections. The degree d, however, is no longer treated as a function, but as a relatum in the relation |=. The semantic evaluation of negation has previously been defined only for negated identity sentences. The generalized definition here is a straightforward application of the Gödel system.9 An interesting feature of negation in the Gödel system is that its duplication digitalizes the values of d into +1 and −1. The evaluation of implication, too, follows the Gödel system. The deeper rationale behind this definition is the adjointness condition, which relates the evaluation of implication to the t-norm (= min, by our choice).10

Calculi for our semantics have been developed in the literature (cf. Gottwald, 2001). As far as propositional logic is concerned, the calculi are in principle those of intuitionist logic.11 Fig. 5 gives a calculus of the Gödel system for the propositional case.

To evaluate existentially quantified formulae, the well-known method of cylindrification (Kreisel & Krivine, 1976, p. 17) is adjusted to the many-valued case. The supremum (sup) takes over the role of existential quantification in the meta-language and can be regarded as the limit case of the maximum function in an infinite domain. This is analogous to the common idea of regarding the existential quantifier as the limit case of disjunction over an infinity of domain elements. It should be noted that the value of an existentially quantified sentence of the form (∃z)(Fz) measures whether the neurons in the feature cluster expressed by F oscillate.

For the evaluation of universally quantified formulae, the method of cylindrification is used and adjusted again. This time the infimum (inf) assumes the role of universal quantification in the meta-language. It can be regarded as the limit case of the minimum for infinite domains, in the same way as one might think of the universal quantifier as the limit case of infinite conjunction. To mention a concrete example, the value of a universally quantified implication of the form (∀z)(Fz → F′z) can be viewed as providing a measure for the overall synchronization between the feature clusters expressed by the predicates F and F′.
9 In t-norm based many-valued logics, a function n : [−1, +1] → [−1, +1] is generally said to be a negation function if and only if n is non-increasing, n(−1) = 1 and n(1) = −1 (cf. Gottwald, 2001, p. 85).

10 The adjointness condition relates the evaluation of implication, the function i : [−1, +1]² → [−1, +1], to the t-norm t by the following bi-conditional (cf. Gottwald, 2001, p. 92): d′ ≤ i(d″, d‴) ⇔ t(d′, d″) ≤ d‴.

11 Gödel (1932) developed his min-max system under the title ‘Zum intuitionistischen Aussagenkalkül’.
The following system of axiom schemata provides a propositional calculus for an infinitely many-valued Gödel system G∞ as chosen in this paper. Its completeness is proven by Gottwald (2001, p. 297).

H1 → (H1 ∧ H1)   (LC1)
(H1 ∧ H2) → (H2 ∧ H1)   (LC2)
(H1 → H2) → (H1 ∧ H3 → H2 ∧ H3)   (LC3)
((H1 → H2) ∧ (H2 → H3)) → (H1 → H3)   (LC4)
H1 → (H2 → H1)   (LC5)
H1 ∧ (H1 → H2) → H2   (LC6)
H1 → H1 ∨ H2   (LC7)
H1 ∨ H2 → H2 ∨ H1   (LC8)
(H1 → H3) ∧ (H2 → H3) → (H1 ∨ H2 → H3)   (LC9)
¬H1 → (H1 → H2)   (LC10)
(H1 → H2) ∧ (H1 → ¬H2) → ¬H1   (LC11)
(H1 → H2) ∨ (H2 → H1)   (LC12)

The only rule of inference for the calculus is modus ponens: H1, H1 → H2 / H2.

Figure 5: Propositional calculus for the Gödel system.

The propositional calculus for the Gödel system can be extended to capture the first order predicate case; see Fig. 6. With respect to the identity relation, one should keep in mind that identity is not absolute but graded. To capture the identity relation, we can nevertheless supplement the first order predicate calculus of Fig. 6 by the axiom schemata of Fig. 7.

10 Compositionality Ratified
In this section I will finally prove that the adequacy conditions for internal representation are fulfilled by oscillatory networks.
For the infinitely valued Gödel system G∞, the propositional calculus of Fig. 5 can be extended to capture the first order predicate case. This is achieved if one adds as a rule of inference the rule of generalization H / ∀xH and if one supplements LC1–LC12 with the following axiom schemata, where the variable x must not occur free in G (cf. Gottwald, 2001, pp. 284–5; unfortunately no completeness proof is provided):

∀x(H1 → H2) → (∀xH1 → H2)   (GPL1)
∀x(G → H) → (G → ∀xH)   (GPL2)
∀x(H → G) → (∃xH → G)   (GPL3)
∀x(H1 → H2) → (H1 → ∀xH2)   (GPL4)
∀xH(x) → H(t|x), for all terms t which are substitutable for x in H   (GPL5)
H(t|x) → ∃xH(x), for all terms t which are substitutable for x in H   (GPL6)

Figure 6: First order predicate calculus for the Gödel system.
For the infinitely valued semantics presented in this paper, the propositional calculus of Fig. 5 plus the first order predicate extension of Fig. 6 can be extended to capture sentences involving the identity relation. Since identity is evaluated by the ∆-function, identity in our case is gradual, but still reflexive (ID1) and symmetric (ID2), however not transitive. Due to our evaluation of predication, one direction of the Leibniz law, i.e., ID3, also holds. One may thus add the following axiom schemata:

∀x(x = x)   (ID1)
∀x∀y(x = y → y = x)   (ID2)
∀x∀y((x = y ∧ F(x)) → F(y)), for every predicate F   (ID3)

Figure 7: Axioms of identity.
The work done so far leads us directly to the following theorem:

Theorem 1 (Compositional Meanings in Oscillatory Networks). Let L be the set of terms of a PL=-language, SF the set of sentential formulae and |= the neuronal model relation. The function µ with domain L is a compositional meaning function of the language if µ, for every t ∈ L, is defined in the following way:

µ(t) = {⟨γ, γ(t)⟩ | γ is a constant assignment}, if t ∉ SF;
µ(t) = {⟨γ, i, d⟩ | i |=^d_γ t}, if t ∈ SF.

To simplify notation, we may stipulate for any γ-grammatical term t:

µ_γ(t) = γ(t), if t is not a sentence;
µ_γ(t) = {⟨i, d⟩ | ⟨γ, i, d⟩ ∈ µ(t)}, if t is a sentence.   (29)
Proof. To prove the theorem, one has to show that, for any of the syntactic operations σ in (28), there is a semantic operation µ_σ that satisfies the equation:

µ(σ(t_1, …, t_n)) = µ_σ(µ(t_1), …, µ(t_n)).   (30)

To do this for the first six operations, one simply reads the bi-conditionals in the definition of |= as the prescriptions of functions:

µ_= : (µ(a), µ(b)) ↦ {⟨γ, i, d⟩ | d = ∆(µ_γ(a), µ_γ(b))};
µ_pred : (µ(a), µ(F)) ↦ {⟨γ, i, d⟩ | d = max{∆(µ_γ(a), f_j) | f = µ_γ(F) v^i c_i(t)}};
µ_∧ : (µ(φ), µ(ψ)) ↦ {⟨γ, i, d⟩ | d = min{d′, d″ | ⟨γ, i, d′⟩ ∈ µ(φ), ⟨γ, i, d″⟩ ∈ µ(ψ)}};
etc.

To attain semantic counterpart operations for σ_∃ and σ_∀, we have to apply the method of cylindrification:

µ_∃ : µ(φ(z)) ↦ {⟨γ, i, d⟩ | ∃γ′: dom(γ′) = dom(γ) ∪ {z} and ⟨γ′, i, d⟩ ∈ µ(φ(z))};
µ_∀ : µ(φ(z)) ↦ {⟨γ, i, d⟩ | ∀γ′: dom(γ′) = dom(γ) ∪ {z} ⇒ ⟨γ′, i, d⟩ ∈ µ(φ(z))}.

One easily verifies that equation (30) is satisfied.
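The shape of the homomorphism equation (30) can also be checked mechanically on toy terms. In the following sketch the atomic meanings and the term encoding are invented for illustration; the point is only that the meaning of a complex term, computed recursively, coincides with the semantic operation applied to the meanings of its parts.

```python
# Toy check of the homomorphism equation (30) for conjunction.
# Meanings of sentences are maps from eigenmode indices to degrees d.
mu_phi = {1: 1.0, 2: -1.0}   # stipulated meaning of a sentence phi
mu_psi = {1: 0.5, 2: 1.0}    # stipulated meaning of a sentence psi

def mu_and(m1, m2):
    # Semantic counterpart of sigma_and: pointwise Goedel t-norm, eq. (24).
    return {i: min(m1[i], m2[i]) for i in m1}

def mu(term):
    # Meaning function: atomic meanings are stipulated; the meaning of a
    # complex term is computed from the meanings of its constituents.
    if term == 'phi':
        return mu_phi
    if term == 'psi':
        return mu_psi
    op, t1, t2 = term             # e.g. ('and', 'phi', 'psi')
    return mu_and(mu(t1), mu(t2))

# mu(sigma_and(phi, psi)) == mu_and(mu(phi), mu(psi)):
assert mu(('and', 'phi', 'psi')) == mu_and(mu('phi'), mu('psi'))
```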
Theorem 1 proves that condition (a) of Principle 1 is satisfied. If one holds the constant assignment γ fixed, and consequently the grammaticality of the terms of the language PL=, and if one regards Lγ as the set of grammatical terms of PL= under the assignment γ, one may say that the neuronal structure

N = ⟨{γ} × µγ[Lγ], {µ=, µpred, µ¬, µ∧, µ∨, µ→, µ∃, µ∀}⟩

compositionally evaluates the language ⟨Lγ, {σ=, σpred, σ¬, σ∧, σ∨, σ→, σ∃, σ∀}⟩ with respect to meaning. The ideal meaning of a term t under the assignment γ, µγ1(t), can be identified with the subset of µγ(t) for which all values d are 1. The formula ⟨i, d⟩ ∈ µγ(φ) can then be read as: the eigenmode i, to degree d, realizes the ideal neuronal meaning of φ under the assignment γ. The ideal meaning µγ1(a) of an individual constant a is henceforth identified with an object concept. Recall that the object concept µγ1(a) just is the oscillation α(a). The ideal meaning µγ1(F) of a predicate F is identified with a predicate concept. Notice that µγ1(F) just is the matrix β(F), which we have called a neuronal intension earlier and which identifies a specific cluster of feature-selective neurons.

To comply with the condition of co-variation, i.e., condition (c) of Principle 1, we can choose the assignment γ in such a way that the oscillation function γ(a) tracks the object designated by any individual term a. We can, furthermore, make sure that γ(F) is just the cluster of neurons representing the property expressed by the predicate F. In this case, the assignment will be called natural. As we have shown in our simulations, the network dynamics warrants that the neuronal meanings of terms with respect to the natural assignment reliably co-vary with the terms' denotations:

Fact 1 (Covariation with Content for Oscillatory Networks). Let Γ be an intended external constant assignment for a language PL= with a set of terms L such that Γ maps individual terms and predicates to their intended denotations. Let νΓ, then, be a function that maps each element of the set of Γ-grammatical terms LΓ to its denotation. The architecture of oscillatory networks now warrants that there is a natural neuronal assignment γ of the individual constants and predicates of LΓ (= Lγ) into the set of neuronal states N of the network and, consequently, a meaning function µγ from Lγ into N, such that meanings co-vary with their contents, or, formally speaking: there is a content function

κ : µγ[Lγ] → νΓ[LΓ]

such that νΓ(t) = κ(µγ(t)) for every t ∈ LΓ.
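Schematically — with invented labels rather than the simulated network states — covariation is just the requirement that ν factors through µ, i.e., that some κ with ν = κ ∘ µ exists:

```python
# Schematic covariation check: nu = kappa . mu. Terms, neuronal
# states, and denotations are opaque invented labels.

terms = ["a", "b", "F"]
mu = {"a": "osc_1", "b": "osc_2", "F": "cluster_red"}  # natural assignment
nu = {"a": "obj_1", "b": "obj_2", "F": "prop_red"}     # intended denotations

# kappa is well defined only if mu never maps two terms with distinct
# denotations onto the same neuronal state:
kappa = {}
for t in terms:
    assert kappa.get(mu[t], nu[t]) == nu[t], "covariation fails"
    kappa[mu[t]] = nu[t]

assert all(kappa[mu[t]] == nu[t] for t in terms)  # nu = kappa . mu holds
```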
Condition (c) of the adequacy conditions in Principle 1 thus turns out to be fulfilled by the network architecture. This is the central result of our simulations and has an explanation in the construction plan of the network. Recall that the construction scheme was chosen not only to match up with neurobiological data, but also to implement the Gestalt principles for object perception. If the co-variation with content is warranted according to Fact 1, the compositionality of content can also be proven:

Theorem 2 (Compositional Contents of Oscillatory Networks). Let γ be the natural neuronal, and Γ the intended external, assignment of the language PL= with the set of terms L. Let Lγ and LΓ be the sets of grammatical terms of PL= with respect to the natural neuronal and the intended external assignment, respectively. Let, furthermore, Lγ = LΓ. Assume that Lγ (= LΓ) has a compositional denotation function ν and that µ is a compositional neuronal meaning function with the same domain. Then, in the case of co-variation, the natural neuronal structure

N = ⟨{γ} × µγ[Lγ], {µ=, µpred, µ¬, µ∧, µ∨, µ→, µ∃, µ∀}⟩

can be compositionally evaluated with respect to content.

Proof. Since co-variation is assumed in the antecedent of the theorem, we have ν = κ ∘ µ. Since Γ is the intended external and γ the natural neuronal assignment, we may set ν := νΓ and µ := µγ. Now, the theorem's antecedent tells us that the language can be compositionally evaluated with respect to denotation, i.e., there is a function f (= νσ) for every n-ary syntactic operation σ of the language such that

ν(σ(t1, ..., tn)) = f(ν(t1), ..., ν(tn)),

which, in the case of co-variation, is equivalent to

(κ ∘ µ)(σ(t1, ..., tn)) = f((κ ∘ µ)(t1), ..., (κ ∘ µ)(tn)).

Since the language is compositional with respect to the meaning function µ, there is a function µσ in N for each and every σ of the language such that

µσ(µ(t1), ..., µ(tn)) = µ(σ(t1, ..., tn)).
From the former two equations we derive:

κ(µσ(µ(t1), ..., µ(tn))) = f(κ(µ(t1)), ..., κ(µ(tn))).

Since the surjectivity of the meaning function µ warrants that, for every element m of the carrier set of N, there is at least one grammatical term t in Lγ such that m = µ(t), we may finally conclude that, for every n-ary operation h (= µσ) of N and every sequence m1, ..., mn in the domain of h, there is a function κh (= f = νσ) such that

κ(h(m1, ..., mn)) = κh(κ(m1), ..., κ(mn)).

The content function is hence proven to be compositional, and condition (b) of the adequacy conditions is fulfilled.
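The last step of the proof can be mirrored in a toy model: given covariation and a surjective µ, the content operation κh is read off from the neuronal operation h. A sketch with invented stand-ins:

```python
# Lifting a neuronal operation h (= mu_sigma) to a content operation
# kappa_h, given covariation (nu = kappa . mu) and surjectivity of mu.
# All states, contents, and the operation are invented stand-ins.

mu = {"a": "osc_1", "b": "osc_2"}
kappa = {"osc_1": "obj_1", "osc_2": "obj_2", "osc_12": "pair_12"}

def h(m1: str, m2: str) -> str:
    # toy binary operation on neuronal states
    return "osc_12" if {m1, m2} == {"osc_1", "osc_2"} else m1

def kappa_h(c1: str, c2: str) -> str:
    # Pick mu-preimages of the contents and push them through h.
    # Surjectivity guarantees preimages exist; covariation makes the
    # result independent of which preimages are picked.
    inv = {"obj_1": "osc_1", "obj_2": "osc_2"}
    return kappa[h(inv[c1], inv[c2])]

# kappa(h(m1, m2)) == kappa_h(kappa(m1), kappa(m2)):
m1, m2 = mu["a"], mu["b"]
assert kappa[h(m1, m2)] == kappa_h(kappa[m1], kappa[m2])  # 'pair_12'
```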
11 Perceptual Necessities
It is sometimes useful to talk not only about what is represented by one eigenmode of the network dynamics, but also about what the network dynamics as a whole represents. This must take into account all stable eigenmodes of the network. Each eigenmode, as I have argued earlier, stands for one perceptual or, more generally speaking, one epistemic possibility. If we take the identification of eigenmodes with possibilities – 'possibility' always to be read in an epistemic sense – at face value, we can apply Leibniz's idea that necessity is truth in all possible worlds. We can then say that what the network dynamics represents as a whole is what is represented as necessarily being true by the network dynamics when the network is stimulated with a certain stimulus. If we want to capture what the network dynamics represents as a whole and identify epistemic possibilities with eigenmodes, we thus have to express what is represented as true in all eigenmodes of the network dynamics. Formally, this can be done by use of the necessity operator □ of modal logic. With φ being a grammatical sentence, we hence write □φ to express that the network dynamics represents φ as necessarily true, given the current epistemic situation, i.e., the current stimulus input. If we fix the assignment γ to be the natural neuronal assignment, the four-place relation |= reduces to a three-place relation. Epistemic necessity with respect to a network dynamics x(t) is now defined as follows:
Definition 1 (Epistemic Necessity in the Network). Given a network dynamics x with the set of stable eigenmodes E ⊆ ℕ, then, for every sentence φ of the respective language, □φ is true in x if and only if, for all stable eigenmodes i ∈ E, the following holds: i |=1 φ.

Likewise, epistemic possibility can be defined by the existence of an eigenmode. We write ♦φ just in case there is an eigenmode of the network dynamics that models φ:

Definition 2 (Epistemic Possibility in the Network). Given a network dynamics x with the set of stable eigenmodes E ⊆ ℕ, then, for every sentence φ of the respective language, ♦φ is true in x if and only if there is a stable eigenmode i ∈ E such that the following holds: i |=1 φ.

We can now apply our newly defined modal notions to describe what the network dynamics represents as a whole when the network is stimulated, e.g., with the stimulus of Fig. 4a. We may assume that i = 1, 2 are the only two stable eigenmodes of the resulting network dynamics x. As one sees in Fig. 4c, the characteristic functions of the third and fourth eigenmodes are decreasing over time and probably converge to zero. The eigenmodes higher than 2 thus are not stable. A little computation now reveals that

□[(∃x∀y)(Rx ∧ Vx ∧ ((Ry ∧ Vy) → y = x)) ∨ (∃x∃y∀z)(¬x = y ∧ Rx ∧ Vx ∧ Ry ∧ Vy ∧ ((Rz ∧ Vz) → (z = x ∨ z = y)))]

is true in x.¹² However, the sentence after the necessity operator just expresses what we are forced to perceive when we look at the ambiguous stimulus of Fig. 4a, namely, that there is exactly one red vertical object or that there are exactly two red vertical objects. The semantics developed here thus accommodates the phenomenological facts rather well.
¹² The first eigenmode makes the first disjunct true, while the second eigenmode makes the second disjunct true. If we look at the first disjunct, the existential quantifier requires us to search for the oscillation function (the value of x) that makes the evaluation of the subsequent formula supremal, namely 1. This must be an oscillation function parallel to +c1(t); only then does Rx ∧ Vx become 1. Looking at the universal quantifier, we have to ask whether any oscillation function (evaluating y) other than one parallel to +c1(t) could make the value of the antecedent of the implication, Ry ∧ Vy, greater than the value of the consequent; only then would the value of the implication be less than 1. The answer is no, because all non-zero components v1j of the eigenvector are positive and pertain to the redness or verticality layer. Their contributions to the network dynamics v1j c1(t) are hence parallel to +c1(t), so that the value of x = y would be 1. Assigning y to the constant zero function would also leave the value of the implication at 1, for in that case the values of the antecedent and the consequent would be equally 0. The evaluation of the second disjunct follows similar considerations.
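Under Definitions 1 and 2, the modalities reduce to quantification over the stable eigenmodes. A small sketch with the two eigenmodes just described — the models-relation and its degrees are invented for illustration:

```python
# Epistemic necessity and possibility over stable eigenmodes:
# box(phi) holds iff every stable eigenmode models phi to degree 1;
# diamond(phi) holds iff some stable eigenmode does.

ONE = "exactly one red vertical"
TWO = "exactly two red verticals"

# Invented degrees to which eigenmodes 1 and 2 model each reading:
models = {(1, ONE): 1.0, (2, ONE): 0.0,
          (1, TWO): 0.0, (2, TWO): 1.0}
stable = [1, 2]

def box(phi: str) -> bool:
    return all(models[(i, phi)] == 1.0 for i in stable)

def diamond(phi: str) -> bool:
    return any(models[(i, phi)] == 1.0 for i in stable)

# Disjunction is evaluated by max, so each eigenmode models the
# disjunction to degree 1 and it comes out epistemically necessary:
for i in stable:
    models[(i, "ONE or TWO")] = max(models[(i, ONE)], models[(i, TWO)])

print(box(ONE), diamond(ONE))  # False True
print(box("ONE or TWO"))       # True
```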
12 Conclusion
Oscillatory networks show how a structure of the cortex can be analyzed so that elements of this structure can be identified with mental concepts. These cortical states can be regarded as the neuronal meanings of predicative expressions. As meanings they form a compositional semantics for a language. As concepts they can themselves be evaluated compositionally with respect to external content.

The approach formulated in this paper is biologically rather well founded. It is supported by a rich body of neurophysiological and psychophysical data and is underpinned by various computer simulations. Compared to connectionist alternatives (Smolensky, 1991/1995; Shastri & Ajjanagadde, 1993; Plate, 1995; van der Velde & de Kamps, 2006), the architecture proposed for large parts of the cortex in this paper is advantageous in that it not only implements a compositional semantics of meanings, but also shows how internal representations can co-vary with external contents. As a consequence, the internal conceptual structure can itself be externally evaluated in a compositional way. It thus becomes transparent how concepts can have content and how they thereby mediate between utterances and their denotations. Oscillatory networks and their biological correlates may be assigned a central role at the interface between language and mind, and between mind and world. This is due to the quasi-perceptual capabilities of oscillatory networks, which the alternative connectionist models for semantic implementation lack completely. Linking oscillatory networks to mechanisms for the production of phonological sequences remains a challenge for future investigations.

The theory developed here amounts to a new mathematical description of the time structure the cortex is believed to exhibit. Neuronal synchronization plays an essential role not only for binding, but, more generally, for the generation of compositional representations in the brain.
References

Felleman, D. J., & van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1, 1–47.

Fodor, J. (1992). A theory of content and other essays. Cambridge, MA: MIT Press.

Gödel, K. (1932). Zum intuitionistischen Aussagenkalkül. Anzeiger Akademie der Wissenschaften Wien, 69 (Math.-nat. Klasse), 65–66.

Gottwald, S. (2001). A treatise on many-valued logics. Baldock: Research Studies Press.

Gray, C., König, P., Engel, A. K., & Singer, W. (1989). Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. Nature, 338, 334–7.

Hodges, W. (2001). Formal features of compositionality. Journal of Logic, Language and Information, 10, 7–28.

Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, 195, 215–243.

Kreisel, G., & Krivine, J. L. (1976). Elements of mathematical logic: Model theory (Vol. 2). Amsterdam: North-Holland.

Kripke, S. (1980). Naming and necessity. Cambridge, MA: Harvard University Press.

Lewis, D. (1986). On the plurality of worlds. Oxford: Blackwell.

Maye, A. (2003). Correlated neuronal activity can represent multiple binding solutions. Neurocomputing, 52–54, 73–77.

Maye, A., & Werning, M. (2004). Temporal binding of non-uniform objects. Neurocomputing, 58–60, 941–8.

Obermayer, K., & Blasdel, G. G. (1993). Geometry of orientation and ocular dominance columns in monkey striate cortex. Journal of Neuroscience, 13, 4114–29.

Plate, T. (1995). Holographic reduced representations. IEEE Transactions on Neural Networks, 6(3), 623–41.

Schillen, T. B., & König, P. (1994). Binding by temporal structure in multiple feature domains of an oscillatory neuronal network. Biological Cybernetics, 70, 397–405.
Shastri, L., & Ajjanagadde, V. (1993). From simple associations to systematic reasoning: A connectionist representation of rules, variables, and dynamic bindings using temporal synchrony. Behavioral and Brain Sciences, 16, 417–94.

Singer, W. (1999). Neuronal synchrony: A versatile code for the definition of relations? Neuron, 24, 49–65.

Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Annual Review of Neuroscience, 18, 555–86.

Smolensky, P. (1995). Connectionism, constituency and the language of thought. In C. Macdonald & G. Macdonald (Eds.), Connectionism (pp. 164–198). Cambridge, MA: Blackwell. (Original work published 1991)

Treisman, A. (1996). The binding problem. Current Opinion in Neurobiology, 6, 171–8.

van der Velde, F., & de Kamps, M. (2006). Neural blackboard architectures of combinatorial structures in cognition. Behavioral and Brain Sciences. (In press)

von der Malsburg, C. (1981). The correlation theory of brain function (Internal Report No. 81–2). Göttingen: MPI for Biophysical Chemistry.

Werning, M. (2001). How to solve the problem of compositionality by oscillatory networks. In J. D. Moore & K. Stenning (Eds.), Proceedings of the twenty-third annual conference of the Cognitive Science Society (pp. 1094–1099). London: Lawrence Erlbaum Associates.

Werning, M. (2003). Synchrony and composition: Toward a cognitive architecture between classicism and connectionism. In B. Löwe, W. Malzkorn, & T. Raesch (Eds.), Applications of mathematical logic in philosophy and linguistics (pp. 261–78). Dordrecht: Kluwer.

Werning, M. (2004). Compositionality, context, categories and the indeterminacy of translation. Erkenntnis, 60, 145–78.

Werning, M. (2005a). Right and wrong reasons for compositionality. In M. Werning, E. Machery, & G. Schurz (Eds.), The compositionality of meaning and content (Vol. I: Foundational issues, pp. 285–309). Frankfurt: Ontos Verlag.

Werning, M. (2005b). The temporal dimension of thought: Cortical foundations of predicative representation. Synthese, 146(1/2), 203–24.