Interfaces: Explorations in Logic, Language and Computation: ESSLLI 2008 and ESSLLI 2009 Student Sessions, Selected Papers (Lecture Notes in Computer Science, 6211) 9783642147289, 3642147283

English Pages 174 [175] Year 2010

Table of contents :
Title
Preface
Organization
Table of Contents
Semantics and Pragmatics
Can DP Be a Scope Island?
Introduction
Sauerland’s Data
Modal Intervention
Antecedent-Contained Deletion
Negation Intervention
Larson’s Generalization and Constraints on QR
Superiority
Problems with the Account
Surface Scope
Reliance on Covert Clausal Syntax
Double-Object Behavior in Intensional Cases
Under-Generation Issues
Re-evaluating Sauerland’s Data
Modal Intervention?
Negation Intervention?
ACD and Scope Shift
Conclusions
References
Semantic Meaning and Pragmatic Inference in Non-cooperative Conversation
Semantic Meaning and Credible Information in Signaling Games
Credibility and Pragmatic Inference
The IBR Model and Its Assumptions
Assumptions: Focal Meaning and Bounded Rationality
Beliefs and Best Responses
Strategic Types and the IBR Sequence
Credibility and Inference
Discussion
References
What Makes a Knight?
Introduction
Preliminaries
The Language \pounds
The Denotation Function \pi
Factual Valuations and 4 Valuations
Kripke Correctness
Worlds
Assertoric Rules
Constructing a Knight Function
Assertoric Semantics and Games
Expansions
Outcomes
Inducing a 4 Valuation via Closure Conditions
The Cyclical Closure Conditions and \nu
The Compositionality Condition
The Predicates N and B
The Knight Function \kappa
Solving Logic Puzzles with \kappa
The Three Roads Riddle
The Four Roads Riddle
Conclusion
References
The Algebraic Structure of Amounts: Evidence from Comparatives
Introduction
Two Puzzles
The Plan
Comparatives and Weak Islands
Preliminaries
Quantificational Interveners and the Heim-Kennedy Constraint
Similarities between Weak Islands and Comparative Scope
Comparative Scope and the Algebraic Structure of Amounts
Szabolcsi & Zwarts’ (1993) Theory of Weak Islands
Extending the Account to Comparatives
Maximum Readings of Existential Quantifiers and Their Kin
Conjunction and Disjunction in the Comparative Complement
Modals and Intensional Verbs
Comparison with Related Proposals
Schwarzschild & Wilkinson
Fox & Hackl
Conclusion
References
Mathematical Linguistics
Extraction in the Lambek-Grishin Calculus
Categorial Analyses
Types for Extraction
The Lambek-Grishin Calculus
Formal Semantics
Derivational Semantics
Lexical Semantics
Discussion
References
Formal Parameters of Phonology: From Government Phonology to SPE
A Weak Theory of Phonology — Government Phonology
Informal Overview
Formalization in Modal Logic
The Parameters of Phonological Theories
Elaborate Spreading — Increasing the Generative Capacity
Feature Systems
Syllable Template
Representations versus Derivations
Conclusion
References
Applied Computational Linguistics
Variable Selection in Logistic Regression: The British English Dative Alternation
Introduction
Related Work
The Dative Alternation
Variable Selection in Logistic Regression
Data
Method
Explanatory Features
Variable Selection
Results
Mixed Models
Models without a Random Effect
Discussion and Conclusion
References
A Salience-Driven Approach to Speech Recognition for Human-Robot Interaction
Introduction
Background
Approach
Salience Modeling
Cross-Modal Salience Model
Lexical Activation
Language Modeling
Corpus Generation
Salience-Driven, Class-Based Language Models
Evaluation
Evaluation Procedure
Results
Analysis
Conclusion
References
Language Technologies for Instructional Resources in Bulgarian
Introduction and Related Work
Workbench Description
Data Processing
The Experiment
Key Terms Suggestion
Question Generation
Distractor Generation
Evaluation
Conclusion and Future Work
References
Logic and Computation
Description Logics for Relative Terminologies
Introduction
Representation Language
Description Logic $ALC$
Description Logic $C_{ALC}$
Representation of Relative Terminologies
Reasoning with Comparison Classes
Tableau Decision Procedure
Computational Complexity
Related Work
Conclusions and Future Work
References
Appendix
Cdiprover3: A Tool for Proving Derivational Complexities of Term Rewriting Systems
Introduction
Term Rewriting
Used Termination Proof Methods
Polynomial Interpretations
Context Dependent Interpretations
Implementation
Using cdiprover3
Discussion
References
POP* and Semantic Labeling Using SAT
Introduction
The Polynomial Path Order
A Propositional Encoding of POP and Finite Semantic Labeling
Experimental Results
Conclusion
References
Author Index

Lecture Notes in Artificial Intelligence

6211

Edited by R. Goebel, J. Siekmann, and W. Wahlster

Subseries of Lecture Notes in Computer Science

FoLLI Publications on Logic, Language and Information

Editors-in-Chief
Luigia Carlucci Aiello, University of Rome "La Sapienza", Italy
Michael Moortgat, University of Utrecht, The Netherlands
Maarten de Rijke, University of Amsterdam, The Netherlands

Editorial Board
Carlos Areces, INRIA Lorraine, France
Nicholas Asher, University of Texas at Austin, TX, USA
Johan van Benthem, University of Amsterdam, The Netherlands
Raffaella Bernardi, Free University of Bozen-Bolzano, Italy
Antal van den Bosch, Tilburg University, The Netherlands
Paul Buitelaar, DFKI, Saarbrücken, Germany
Diego Calvanese, Free University of Bozen-Bolzano, Italy
Ann Copestake, University of Cambridge, United Kingdom
Robert Dale, Macquarie University, Sydney, Australia
Luis Fariñas, IRIT, Toulouse, France
Claire Gardent, INRIA Lorraine, France
Rajeev Goré, Australian National University, Canberra, Australia
Reiner Hähnle, Chalmers University of Technology, Göteborg, Sweden
Wilfrid Hodges, Queen Mary, University of London, United Kingdom
Carsten Lutz, Dresden University of Technology, Germany
Christopher Manning, Stanford University, CA, USA
Valeria de Paiva, Palo Alto Research Center, CA, USA
Martha Palmer, University of Pennsylvania, PA, USA
Alberto Policriti, University of Udine, Italy
James Rogers, Earlham College, Richmond, IN, USA
Francesca Rossi, University of Padua, Italy
Yde Venema, University of Amsterdam, The Netherlands
Bonnie Webber, University of Edinburgh, Scotland, United Kingdom
Ian H. Witten, University of Waikato, New Zealand

Thomas Icard Reinhard Muskens (Eds.)

Interfaces: Explorations in Logic, Language and Computation ESSLLI 2008 and ESSLLI 2009 Student Sessions, Selected Papers

Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany

Volume Editors
Thomas Icard, Stanford University, Stanford, CA, USA
E-mail: [email protected]
Reinhard Muskens, Tilburg University, Tilburg, The Netherlands
E-mail: [email protected]

Library of Congress Control Number: 2010931167

CR Subject Classification (1998): F.4.1, F.3, F.4, I.2.3, I.2, D.3
LNCS Sublibrary: SL 7 – Artificial Intelligence

ISSN 0302-9743
ISBN-10 3-642-14728-3 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-14728-9 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2010 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper 06/3180 543210

Preface

The European Summer School in Logic, Language and Information (ESSLLI) takes place every year, each time at a different location in Europe. With its focus on the large interdisciplinary area where linguistics, logic and computation converge, it has become very popular since it started in 1989, attracting large crowds of students. ESSLLI is where everyone in the field meets, teaches, takes courses, gives talks, dances all night, and generally has a good time. One of the enjoyable features of the School is its recurring Student Session, organized by students along the lines of a conference. The speakers are students too, who are eager to get a chance to present their work. They face stiff competition to get their talks accepted, as the number of papers that is sent in each year is high and acceptance rates low. In my experience many of the selected talks contain fresh and surprising insights and are a pleasure to attend. But the reader may judge the quality of the Student Session for himself, as this volume contains a selection of papers from its 2008 and 2009 installments, the first held in Hamburg, the second in Bordeaux. The book is divided into four parts.

– Semantics and Pragmatics
– Mathematical Linguistics
– Applied Computational Linguistics
– Logic and Computation

The first two of these present work in the intersection of logic (broadly conceived) and different parts of linguistics, the third contains papers on the interface of linguistics and computation, while the fourth, as its name suggests, deals with logic and computation. The reader will see a connection with the Venn diagram that functions as ESSLLI’s logo. Let me finish by thanking everyone who contributed to making the 2008 and 2009 Student Sessions the successes they were: Kata Balogh, who chaired the 2008 Session, and Thomas Icard, who chaired that of 2009; their Co-chairs Manuel Kirschner, Salvador Mascarenhas, Laia Mayol, Bruno Mery, Ji Ruan, and Marija Slavkovik; all referees and area experts; the speakers, of course; and last but certainly not least Springer, for generously making available the Springer Best Paper Awards.

May 2010
Reinhard Muskens

Organization

The ESSLLI Student Session is part of the European Summer School for Logic, Language, and Information, organized by the Association for Logic, Language, and Information. This volume contains papers from the 2008 Student Session in Hamburg and the 2009 Student Session in Bordeaux.

2008 Student Session

Chair: Kata Balogh (Amsterdam)
Co-chair Logic and Language: Laia Mayol (Pennsylvania)
Co-chair Logic and Computation: Ji Ruan (Liverpool)
Co-chair Language and Computation: Manuel Kirschner (Bozen-Bolzano)
Area Experts: Anke Lüdeling (Berlin), Paul Egré (Paris), Guram Bezhanishvili (New Mexico), Alexander Rabinovich (Tel Aviv), Rineke Verbrugge (Groningen)

2009 Student Session

Chair: Thomas Icard (Stanford)
Co-chair Logic and Language: Salvador Mascarenhas (New York)
Co-chair Logic and Computation: Marija Slavkovik (Luxembourg)
Co-chair Language and Computation: Bruno Mery (Bordeaux)
Area Experts: Reinhard Muskens (Tilburg), Nathan Klinedinst (London), Makoto Kanazawa (Tokyo), Jens Michaelis (Bielefeld), Arnon Avron (Tel Aviv), Alexandru Baltag (Oxford)

Table of Contents

Semantics and Pragmatics

Can DP Be a Scope Island?
    Simon Charlow
Semantic Meaning and Pragmatic Inference in Non-cooperative Conversation
    Michael Franke
What Makes a Knight?
    Stefan Wintein
The Algebraic Structure of Amounts: Evidence from Comparatives
    Daniel Lassiter

Mathematical Linguistics

Extraction in the Lambek-Grishin Calculus
    Arno Bastenhof
Formal Parameters of Phonology: From Government Phonology to SPE
    Thomas Graf

Applied Computational Linguistics

Variable Selection in Logistic Regression: The British English Dative Alternation
    Daphne Theijssen
A Salience-Driven Approach to Speech Recognition for Human-Robot Interaction
    Pierre Lison
Language Technologies for Instructional Resources in Bulgarian
    Ivelina Nikolova

Logic and Computation

Description Logics for Relative Terminologies
    Szymon Klarman
Cdiprover3: A Tool for Proving Derivational Complexities of Term Rewriting Systems
    Andreas Schnabl
POP* and Semantic Labeling Using SAT
    Martin Avanzini

Author Index

Can DP Be a Scope Island?

Simon Charlow
New York University

1 Introduction

Sauerland [1] uses data from inverse linking (cf. [2]) to motivate quantifier raising (QR) out of DP, proposing to derive Larson’s generalization (cf. [3]) regarding the scopal integrity of DP via an Economy-based constraint on QR (cf. [4]). This squib is in four parts. I first lay out Sauerland’s three arguments for QR out of DP. I then present (a slightly modified version of) his mechanism for constraining QR. Next, I show that it both over- and under-generates. I conclude by arguing that the readings Sauerland uses to motivate his account don’t result from an island-respecting QR mechanism. In short, each of the cases Sauerland considers involves DPs with “special” scopal properties: plural demonstrative DPs, bare plural DPs, and antecedent-contained deletion (ACD)-hosting DPs. The argument that apparent wide-scope readings of plural demonstratives are only apparent is motivated using (so far as I know) new data from English, while the latter two cases receive independent motivation from the literature. The conclusion is that the question posed in the title of this paper can be answered in the affirmative.

2 Sauerland’s Data

2.1 Modal Intervention

Sauerland points out that (1) can be true if Mary doesn’t have any specific individuals in mind and doesn’t want to get married twice (say she’s placed a classified ad indicating she wants to meet and marry a man from either Finland or Norway):

(1) Mary wants to marry someone from these two countries. ([1]’s ex. 8a)

Sauerland concludes that (a) the non-specificity of Mary’s desire suggests that the indefinite remains within the scope of the bouletic operator O, and (b) the fact that Mary needn’t desire two marriages requires that these two countries be outside the scope of O. In sum, the scope ordering is 2 > O > ∃ and requires QR out of DP:

(2) [these two countries]_x [Mary wants [λw′ . PRO marry [someone from x] in w′]]
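The contrast behind point (a) can be made concrete with a toy possible-worlds model. The following is a minimal sketch, not Sauerland's formalization: it assumes a naive semantics on which want quantifies universally over want-worlds, and the worlds, individuals, and helper names (`WANT_WORLDS`, `nonspecific`, `specific`) are invented for illustration.

```python
# Toy model (all facts are invented for illustration).
# Each of Mary's want-worlds maps to the set of men she marries there.
WANT_WORLDS = {
    "w1": {"Aki"},    # Mary marries Aki in w1
    "w2": {"Bjorn"},  # Mary marries Bjorn in w2
}
ORIGIN = {"Aki": "Finland", "Bjorn": "Norway"}
COUNTRIES = {"Finland", "Norway"}  # "these two countries"


def from_countries(man):
    """True if the man comes from one of the two countries."""
    return ORIGIN[man] in COUNTRIES


def nonspecific():
    """O > ∃: in every want-world Mary marries some man from the countries."""
    return all(
        any(from_countries(m) for m in married)
        for married in WANT_WORLDS.values()
    )


def specific():
    """∃ > O: one particular man from the countries is married in every want-world."""
    return any(
        from_countries(m)
        and all(m in married for married in WANT_WORLDS.values())
        for m in ORIGIN
    )


print(nonspecific(), specific())
```

In this model the man Mary marries varies across her want-worlds, so only the O > ∃ construal comes out true, which is the non-specific desire described for (1).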

Thanks to Chris Barker, Emma Cunningham, Polly Jacobson, Ezra Keshet, Philippe Schlenker, Uli Sauerland, Anna Szabolcsi, and an anonymous reviewer. This work was supported in part by NSF grant BCS-0902671 to Philippe Schlenker and an NSF Graduate Research Fellowship to the author.

T. Icard, R. Muskens (Eds.): ESSLLI 2008/2009 Student Sessions, LNAI 6211, pp. 1–12, 2010. © Springer-Verlag Berlin Heidelberg 2010

The first of these points seems correct. If the semantics of want involves quantification over want-worlds w′, this scope ordering entails that the individual Mary marries can vary with each w′. This derives a non-specific desire. Point (b) is more subtle. Depending on the semantics assigned to want, scoping two over it may be insufficient to derive the disjunctive reading. In brief: the existence of two individuals x such that Mary marries x in each of her want-worlds w′ still requires, on a naïve semantics for want, that Mary marry twice in each w′. More sophisticated semantics for want (e.g. [5]) may obviate this worry. Grant that scoping two over want can derive the disjunctive reading. Sauerland’s point then requires that leaving two within the scope of O be incompatible with a disjunctive desire. We return to this below.

2.2 Antecedent-Contained Deletion

Sauerland corroborates this observation by noting the grammaticality of “wide” non-specific readings of ACD constructions like (3):

(3) Mary wants to marry someone from every country Barry does.

(3) can be true if (a) neither Mary nor Barry has anyone specific in mind and (b) the ACD is resolved “wide” (viz. anaphoric to the larger VP want to...). As before, the first of these points suggests that the indefinite remains within the scope of want. Additionally, standard assumptions require that the DP containing the ellipsis site QR past the verb heading the antecedent VP in order to resolve the antecedent-containment paradox (cf. [6]). The scope ordering ∀ > O > ∃ again requires QR out of DP.

2.3 Negation Intervention

Finally, Sauerland echoes [7]’s observation regarding (4):

(4) John didn’t see pictures of several baseball players.

(4) has a reading judged true if there are several baseball players x such that John didn’t see any pictures of x—several > ¬ > ∃. Sauerland assumes that the scope of the existential quantifier marks the LF position of the bare plural (a proposition I dispute below) and safely establishes that the cardinal indefinite occupies an LF position above negation. The by-now-familiar conclusion is that this reading requires QR out of DP.
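The truth conditions of the several > ¬ > ∃ reading can be checked against a small model. This is a hedged sketch: the players, the pictures, and the threshold of three for several are assumptions made for illustration, and the function names are hypothetical.

```python
# Toy model (invented for illustration): six players, one picture each.
PLAYERS = ["Aaron", "Banks", "Clemente", "Gibson", "Mays", "Ruth"]
DEPICTS = {f"pic_{name.lower()}": name for name in PLAYERS}
SEEN = {"pic_aaron", "pic_banks", "pic_clemente"}  # pictures John saw
SEVERAL = 3  # assumed lower bound for "several"


def saw_picture_of(player):
    """∃: John saw at least one picture depicting this player."""
    return any(DEPICTS[pic] == player for pic in SEEN)


def several_over_negation():
    """several > ¬ > ∃: several players x such that John saw no picture of x."""
    unseen = [x for x in PLAYERS if not saw_picture_of(x)]
    return len(unseen) >= SEVERAL


def negation_over_several():
    """¬ > several > ∃: it is false that John saw pictures of several players."""
    seen = [x for x in PLAYERS if saw_picture_of(x)]
    return not (len(seen) >= SEVERAL)


print(several_over_negation(), negation_over_several())
```

The model separates the two construals: John saw pictures of three players and missed three others, so the wide-scope reading of several is true while the reading with negation taking widest scope is false.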

3 Larson’s Generalization and Constraints on QR

[3] observes that a QP external to a DP X must scope either below or above all scopal elements in X (i.e. no interleaved scope):

(5) Three men danced with a woman from every city. (*∀ > 3 > ∃)
(6) Several students ate a piece of every pie. (*∀ > several > ∃)
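Larson's generalization can be read as a filter on linear scope orderings. The sketch below is only an illustration under that reading: it enumerates the six orderings of the three quantifiers in (5) and discards those in which the external QP lands between the host DP's quantifier and the quantifier embedded in it. The labels `"3"`, `"a"`, `"every"` and the function names are assumptions, not notation from the text.

```python
from itertools import permutations


def interleaved(order, external, host, embedded):
    """True if `external` scopes strictly between `host` and `embedded`."""
    pos = {q: i for i, q in enumerate(order)}
    lo, hi = sorted((pos[host], pos[embedded]))
    return lo < pos[external] < hi


def larson_ok(order, external="3", host="a", embedded="every"):
    """An ordering respects the generalization iff it is not interleaved."""
    return not interleaved(order, external, host, embedded)


# (5): "three men" is external; "a woman" hosts "every city".
QPS = ("3", "a", "every")
allowed = [" > ".join(o) for o in permutations(QPS) if larson_ok(o)]
excluded = [" > ".join(o) for o in permutations(QPS) if not larson_ok(o)]

print("allowed: ", allowed)
print("excluded:", excluded)
```

Four orderings survive the filter. The two that are removed are the starred ∀ > 3 > ∃ of (5) and its mirror image ∃ > 3 > ∀, both of which split the host DP's quantifiers.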

The conclusion usually drawn from this datum is that QR out of DP is illicit. Inverse linking instead results from QR of the embedded QP to a DP-adjunction position:

(7) [DP [every city]_x [DP someone from x]] left

This approach is adopted in e.g. [6,8,9]. If QR into DP is likewise illicit, Larson’s generalization is derived.

3.1 Superiority

Sauerland rejects DP’s scope island-hood, arguing that subjecting QR to Superiority in the sense of [10,4] accounts for generalizations (8) and (9).

(8) QP1 [QP2 [QP3]] ⇒ *QP3 > QP1 > QP2 ([3])
(9) O [DP [QP]] ⇒ QP > O > DP (Sauerland)

We won’t dwell on the syntactic details of Sauerland’s account here. It will be sufficient to note that Sauerland allows the relative scope of two QPs to be reversed iff this reversal is required for interpretation. This is effected by ordering QR of higher QPs before QR of lower QPs and requiring that QR be to the nearest node of type t (thus the lower QP in general lands below the higher one).¹ “Canonical” inverse scope is derived by total reconstruction of the subject QP (which Sauerland conceptualizes in terms of subject movement at PF). Sauerland assumes that absent DP-internal clausal syntax, DP-embedded QPs are uninterpretable in situ. Accordingly, they QR to the nearest node of type t. If the embedding DP is quantificational, this entails a scope inversion (note that surface scope readings of inversely linked constructions are predicted impossible, something we revisit in §4.1). If the QP-containing DP is itself uninterpretable (e.g. in object position), a proper characterization of Superiority (cf. [4,12]) requires that it QR before the embedded QP. This is all that’s needed to derive Larson’s generalization in the extensional case.

(10) [vP [QP1 three men] danced with [QP2 a woman from [QP3 every city]]]

Two scenarios are possible. Either (a) QP1 moves to [Spec,TP] at LF (it QRs), or (b) it moves there at PF (it doesn’t). In the first case each QP QRs. The only inversion required for interpretation is between QP2 and QP3, and so the scope ordering 1 > 3 > 2 is derived. In scenario (b) QR applies twice. One inversion comes for free (QP1 and QP2; since QP1 doesn’t raise, QP2 goes to the nearest node of type t, viz. above QP1), and one is required for interpretation (QP2 and QP3). Superiority also requires that QP2 raise over QP1 before QP3 raises out of QP2. Thus the scope ordering 3 > 2 > 1 is derived. In both scenarios QP2 and QP3 scope together relative to QP1.

¹ This oversimplifies the mechanism [4] and Sauerland propose. I don’t think this affects any of my points. See [11,12] for further discussion.

Non-QP Operators. The following constructions replace the subject QP with an intensional operator/negation:

(11) Mary wants [TP PRO to marry [QP1 someone from [QP2 these two countries]]]
(12) [NegP not [vP John see [QP1 pictures of [QP2 several baseball players]]]]

Both structures require QR of QP1 and QP2 to a TP/vP-adjunction position for interpretation. QP2 may subsequently continue climbing the tree. It’s free to raise over want/not; Superiority doesn’t come into play since these aren’t QPs. Thus the scope ordering 2 > O > 1 is derived (similarly, the ACD example is predicted grammatical).

4 Problems with the Account

4.1 Surface Scope

As noted in §3.1, Sauerland’s account predicts that a DP-embedded QP can never scope inside its embedding DP.

(13) John bought a picture of every player on the team. ([11]’s ex. 40a)
(14) John bought a picture of each player on the team. ([11]’s ex. 40b)
(15) Everyone/no one from a foreign country eats sushi. (after [13] 221, ex. 1)

As [11] notes, example (13) has a reading on which it’s true if John bought a single picture with everyone in it and false if he bought many individual pictures but no single picture with everyone. Though this reading seems to require surface scope (viz. ∃ > ∀), [11] suggests it may stem from a “group interpretation” of wide-scoping every player on the team (i.e. roughly equivalent to all the players on the team). If a group interpretation is unavailable for e.g. each player on the team in (14), [11] argues, we have an explanation for why surface scope (viz. ∃ > ∀) is “unavailable” here. A few comments are in order. First, the ungrammaticality of ∃ > ∀ in (14) is actually not clear. Though the surface-scope reading may be marked, this follows from each’s oft-noted strong preference for wide scope. Second, the grammaticality of (15) on its surface-scoping reading (viz. every/no x such that x is from a foreign country is such that x eats sushi, ∀/¬∃ > ∃) cannot be answered by appeal to group interpretations of the embedded QP. A theory of inverse linking must, it seems, account for “surface” linking. Absent an ad hoc appeal to abstract clausal syntax inside DP, [11,1] cannot.

4.2 Reliance on Covert Clausal Syntax

[8,3] observe that QPs embedded in nominal intensional complements can be read de dicto:

(16) Max needs a lock of mane from every unicorn in an enchanted forest.

(16) ([3]’s ex. 4a) has a reading on which it’s true if Max is trying to perform a spell which requires him to pick an enchanted forest and then procure a lock of mane from every unicorn in it. Max’s need in this scenario is nonspecific with respect to both unicorns and locks of mane, suggesting that each QP remains within the scope of the intensional verb need.

The DP-as-scope-island approach to inverse linking predicts this state of affairs. QR of the embedded QP targets the DP-internal adjunction site rather than the nearest node of type t. The embedded QP can (indeed must) remain within the scope of need. Something more needs to be said on Sauerland’s account. Following [14] he proposes that intensional transitives take abstractly clausal complements. Informally, the syntax of (16) is something like Max needs PRO to have... The infinitive clause offers a type-t landing site for the embedded QP below need. Abstract clausal syntax in complements of intensional transitives is thus an essential feature of Sauerland’s account.

4.3 Double-Object Behavior in Intensional Cases

Surprisingly, Sauerland’s account predicts that though inversely linked DPs in extensional contexts obey Larson’s generalization, those in intensional contexts do not. Compare the following two cases:²

(17) Two students want to read a book by every author. (*∀ > 2 > ∃)
(18) Two boys gave every girl a flower. (∀ > 2 > ∃)

Example (17) lacks the starred reading, unsurprisingly given Larson’s generalization. Example (18) (discussed by Bruening in unpublished work, and given as [11]’s ex. 49) permits an intervening scope reading (i.e. on which boys vary with girls and flowers with boys).

(19) [QP1 two students] want [[QP3 every author]_x [[QP2 a book by x]_y [PRO to read y]]]
(20) [QP1 two boys] gave [QP3 every girl] [QP2 a flower]

(19) and (20) represent intermediate steps in the derivations of (17) and (18), respectively. In (19) QP2 has raised from object position, and QP3 has raised out of QP2. The difficulty for Sauerland here is that (19) and (20) are predicted to license the same subsequent movements (the numbering is intended to highlight this). If in both cases QP1 moves only at PF we may derive the following structures:

(21) [QP3 every author]_x [[QP1 two students] want [x [[QP2 a book by x]_y [PRO to read y]]]]
(22) [QP3 every girl]_x [[QP1 two boys] gave x [QP2 a flower]]

In short, both (17) and (18) are predicted to permit 3 > 1 > 2. While this is a good result for (18), it’s a bad one for (17). Note, moreover, that appealing to the obligatoriness of the QR in (22) as compared to the non-obligatoriness of the QR in (21) won’t help:

(23) A (different) child needed every toy. (∀ > ∃)
(24) Two boys want to give every girl a flower. (∀ > 2 > ∃)

² I thank an anonymous reviewer for comments which helped me sharpen this point.

(23) possesses an inverse-scope reading (on which children vary with toys), and (24) possesses the interleaved scope reading that (17) lacks. As per Sauerland’s assumption, the syntax of (23) is actually as in (25):

(25) [a (different) child] needed [PRO to have [every toy]]
(26) [two boys] want [PRO to give [every girl] [a flower]]

QR of every toy/girl above the subject isn’t obligatory in either case. In both instances obligatory QR targets a position below the intensional verb (and thus below the subject QP). In short, Sauerland needs to allow non-obligatory QR to reorder subject and object QPs. Ruling this mechanism out in order to save (17) dooms (23) and (24).

4.4 Under-Generation Issues

The following constructions are grammatical when the ECM indefinite scopes below the matrix-clause intensional operator O (evidenced by NPI grammaticality in (27) and a nonspecifically construed indefinite in (28)) and the bracketed QP scopes above O:³

(27) Frege refused to let any students search for proofs of [at least 597 provable theorems]
(28) Frege wanted many students to desire clear proofs of [every theorem Russell did]

In (27) the bracketed QP can be (indeed, on its most salient reading is) construed de re. Frege need never have wanted anything pertaining de dicto to ≥597 provable theorems. Say he made a habit of dissuading students from searching for proofs of theorems he considered unprovable, but by our reckoning he engaged in no fewer than 597 erroneous dissuasions. For Sauerland this requires 597 > refused. In (28) wide ACD resolution is permitted. As Sauerland observes, this suggests that the bracketed QP scopes at least as high as wanted. Both of these “wide” readings are compatible with O > ∃, a situation Superiority predicts impossible. Obligatory QR of the bracketed QPs in both cases targets a node below the ECM indefinite (N.B. the verbs in the infinitival complements are intensional transitives; on the [14] analysis of these constructions their complements are clausal; obligatory QR of the bracketed QPs thus targets a position below the infinitival intensional transitive). If the ECM indefinite stays within the scope of O, Superiority predicts (barring total reconstruction of the indefinite⁴) that the bracketed QP will be unable to take scope over O, contrary to fact. I return to both of these constructions in §5.

5 Re-evaluating Sauerland’s Data

5.1 Modal Intervention?

Does the non-specific disjunctive-desire reading of (1), repeated here as (29), require QRing these two countries over the intensional operator? Here’s some evidence it doesn’t:

(29) Mary wants to marry someone from these two countries.
(30) (Pointing to “Toccata and Fugue in D minor” and “O Fortuna”) When these two songs play in a movie, someone’s about to die.

³ For Sauerland, anyway. I discuss below why I don’t think de re diagnoses wide scope.
⁴ The aforementioned anonymous reviewer notes that total reconstruction as generally understood only applies to A-chains, not QR chains. True enough.

(31) The paranoid wizard refuses to show anyone these two amulets.
(32) The paranoid wizard refuses to show more than two people these two amulets.
(33) You may show a reporter (lacking a security clearance) these two memos.
(34) [Ms. Goard] declined to show a reporter those applications.
(35) At least some states consider it to be attempted murder to give someone these drugs.
(36) When you give someone these viruses, you expect to see a spike as gene expression changes.
(37) #Mary wants to marry someone from every Scandinavian country.
(38) #When every Stravinsky song plays in a movie, someone’s about to die.

To the extent that (29) can express something felicitous, so can (30), despite the fact that QR of these two songs over the modal operator when is blocked by a tensed clause boundary. Specifically, (30) needn’t quantify over situations in which two songs play. The availability of a disjunctive reading in this case (viz. ≈ when either of those two songs plays in a movie, someone’s about to die) suggests that QR out of DP may not be required for a felicitous reading of (29).

Example (31), whose infinitival complement hosts a double object configuration, corroborates this assessment. Double object constructions are known to disallow QR of the DO over the IO (cf. [4]). Here the NPI/nonspecific IO remains within the scope of the downward-entailing intensional operator refuse. Accordingly, these two amulets cannot QR above refuse. Nevertheless, the most salient reading of (31) involves a wizard who doesn’t show anyone either of the two amulets.⁵ Similarly (32) permits a reading such that the paranoid wizard won’t show either of the two amulets to any group of three or more people.

Similarly, (33) allows a disjunctive construal of these two memos. On this reading, you are conferred permission to show any reporter lacking a security clearance either of the two memos (and possibly both). So you’re being compliant if you show such a reporter memo #1 but not memo #2. This is again despite a nonspecific IO, which should prohibit QR of these two memos to a position over the deontic modal. Examples (35)–(36) likewise permit nonspecific IOs alongside disjunctively construed DOs, despite double object configurations (and a tensed clause boundary in (36)).⁶

⁵ Superiority theorists may counter that NPIs aren’t subject to QR and thus that the DO is free to QR over anyone in (31). This leaves (32) and (33)–(36) mysterious. Additionally, [15] shows that NPIs can host ACD gaps, suggesting they QR, after all (cf. that boy won’t show anyone he should his report card).
⁶ (34)–(36) were obtained via Google search. They can be accessed at the following links, each of which displays the nonspecific-IO/disjunctive-DO reading:

1. http://www.nytimes.com/2000/11/19/us/counting-vote-seminole-countyjudge-asked-democrats-quash-absentee-ballots.html
2. http://tribes.tribe.net/bdsmtipstechniques/thread/8cd9d057-e54d-4b03-8899-edada3dc33e6
3. http://www.genomics.duke.edu/press/genomelife/current/GL_MarApr09.pdf


Finally, (37) and (38) lack felicitous readings (given certain norms surrounding marriage and film scores). They are incompatible with scenarios in which Mary wants to marry once, and every Stravinsky song playing in a given situation isn’t a clue about anything. This suggests that plural demonstratives may be necessary for disjunctive readings of (29)–(36).⁷

In sum, (30)–(36) show that QR over an intensional operator cannot be necessary for a disjunctive construal of a plural demonstrative. Examples (37) and (38) show that in certain cases the plural demonstrative is a necessary component of the disjunctive reading. These facts follow if we assume that disjunctive readings in these cases aren’t (necessarily) due to QR over an intensional operator but may instead arise when plural demonstratives occur in the scope of modal (or downward-entailing, cf. §5.2) operators.⁸

5.2 Negation Intervention?

Recall [7]’s negation-intervention cases, e.g. (4), repeated here as (39):

(39) John didn’t see pictures of several baseball players (at the auction).

As [7] observes and Sauerland confirms, constructions like (39) allow a reading with several > ¬ > ∃. Several baseball players x, in other words, are such that John didn’t see any pictures of x. Independently motivated semantic apparatus for bare plurals helps explain these data. If DP is a scope island, scoping several over not requires QRing the bare plural over not:

(40) [[several baseball players]x pictures of x]y [John didn’t see y]

We assume, following [16], that bare plurals sometimes denote kinds and that combining a kind-level argument with a predicate of objects creates a type mismatch resolved by an operation called ‘D(erived) K(ind) P(redication)’. Following [17], the semantics of DKP is as follows:

(41) For any P denoting a predicate of objects: DKP(P) = λy.[∃x : x ≤ y][P x], where x ≤ y iff x instantiates the kind y.
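As a rough illustration of (41), DKP can be modeled as a higher-order function over a toy model. The sketch below is not part of the original analysis: the `INSTANCES` table, the kind names and the object names are hypothetical, and a kind is simply identified with the set of objects that instantiate it.

```python
from typing import Callable, Dict, Set

# Hypothetical toy model: each kind is paired with the objects that instantiate it.
INSTANCES: Dict[str, Set[str]] = {
    "pictures-of-player-1": {"pic_a", "pic_b"},
    "pictures-of-player-2": {"pic_c"},
}

def dkp(pred: Callable[[str], bool]) -> Callable[[str], bool]:
    """Shift a predicate of objects into a predicate of kinds, as in (41):
    DKP(P) = λy.[∃x : x ≤ y][P x]."""
    return lambda kind: any(pred(x) for x in INSTANCES[kind])

# Suppose John saw only pic_c (an assumption made up for this example).
seen_by_john = {"pic_c"}
saw = dkp(lambda x: x in seen_by_john)

# Negation applies outside the existential introduced by DKP (¬ > ∃):
not_seen_kinds = [k for k in INSTANCES if not saw(k)]
```

In this toy model `not saw(k)` holds of a kind just in case John saw none of its instances, so the existential introduced by DKP stays inside the scope of negation.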

DKP generalizes to n-ary relations in the usual way (cf. [16] fn. 16), introducing an existential quantifier within the scope of the shifted verb. That DPs of the form pictures of several baseball players denote kinds on their inversely linked readings is confirmed by (a) the felicity of (42) and (b) the absence of a several > ∃ > ¬ reading for (39) (repeated as (43)):

(42) Pictures of several baseball players are rare.
(43) John didn’t see pictures of several baseball players (at the auction).

⁷ Though plural indefinites seem to work similarly in certain cases. See §5.2.

⁸ Disjunctive readings might involve something like a free-choice effect or exceptional scope (i.e. scope out of islands which doesn’t require QR out of islands).

Returning to (40), note that the trace y left by QR of the bare plural will (presumably) be kind-level.⁹ This creates a mismatch between see and the bare plural’s trace y. DKP applies to see y, introducing an ∃ within the scope of a ¬:

(44) λz . see y z →DKP λz . [∃x : x ≤ y][see x z]

This derives several > ¬ > ∃, despite the prohibition on QR out of DP.

Plural Indefinites and Demonstratives under Negation. Other factors may be at work in these cases. Recall (31), repeated here as (45):

(45) The paranoid wizard refuses to show anyone these two amulets.
(46) The paranoid wizard refuses to show anyone several (of his) amulets.

As noted previously, (45) requires the NPI IO to remain under refuse, while permitting a (disjunctive) reading truth-conditionally equivalent to 2 > refuse. Interestingly, the same goes for (46), which replaces the demonstrative with a plural indefinite and admits a (disjunctive) reading equivalent to several > refuse. In both cases scope freezing should prohibit QR of the DO over the IO to a position above refuse. I hypothesize that these readings instead result from disjunctively construed DOs. Might QR of (39)’s inversely linked bare plural over negation thus be unnecessary for a reading which gives several apparent scope over negation? Consider the following cases:

(47) John didn’t read any books by these two authors. (≈ 2 > ¬ > ∃)
(48) John didn’t read any books by several authors. (??? ≈ several > ¬ > ∃)

These examples replace (39)’s bare plural with full determiner phrases. (47) allows a (disjunctive) reading equivalent to 2 > ¬ > ∃, whereas the disjunctive reading of (48) is borderline ungrammatical. Why this might obtain is unfortunately beyond the scope of what I can consider here, but it shows that the QR+DKP story may be necessary for (39) but not for an example with a plural demonstrative in lieu of the plural indefinite.¹⁰,¹¹

5.3 ACD and Scope Shift

Sauerland’s ACD data remain to be discussed. Recall that (49) is grammatical with a nonspecific indefinite and wide ACD resolution, suggesting QR out of DP.

(49) Mary wants to marry someone from every country Barry does.

⁹ Pictures of several baseball players will denote something like the set of predicates κ of kinds such that for several baseball players x, the y = pictures of _x^ (where _x^ = x) is such that κy = 1.

¹⁰ The contrast between (46) (permits a disjunctive reading) and (48) (doesn’t) is also unexplained.

¹¹ Note also that the contrast between (47) and (48) doesn’t follow for Sauerland, who permits the embedded QP to QR over negation in both cases.

I’d like to suggest that QR to resolve ACD is a more powerful scope-shift mechanism than QR which isn’t required for interpretation. A similar claim is made in [18], which distinguishes “ACD-QR” from “Scope-QR”, viz. QR which doesn’t resolve antecedent containment. [18] notes e.g. that ACD licenses QR across a tensed clause boundary and negation, both islands for Scope-QR:

(50) John hopes I marry everyone you do (hope...)
(51) John said that Mary will not pass every student that we predicted he would (say...)

In the following I consider some additional evidence in favor of an ACD-QR mechanism distinct from and more powerful than Scope-QR.

ACD and DE Operators. Examples (52) and (53) differ in that (53) hosts an ACD gap, whereas (52) does not. The reading of (53) we’re interested in involves wide ACD resolution:

(52) Mary denied kissing everyone. (??∀ > deny) cf. Mary imagined kissing everyone. (∀ > imagine)
(53) Mary denied kissing everyone Barry did. (∀ > deny)

QPs headed by every do not readily QR over downward-entailing operators (cf. [19,18]).¹² (52) doesn’t permit ∀ > deny without focal stress on everyone. The wide ACD reading of (53), by contrast, permits (actually, requires) ∀ > deny (note that although Barry is focused, everyone is not).

Double Object Constructions. Imagine a scenario as follows: a bus full of Red Sox players pulls up. Mary and Barry both mistake them for the Yankees. Each of them wants to give the same presents to some player (or other?) on the bus.

(54) Mary wants to give a Yankee everything Barry does.

(54) is grammatical with a Yankee read de dicto (as required by the mistaken beliefs of Mary and Barry in our scenario) and wide ACD resolution, pace [4].¹³ Whether this reading permits gift recipients to vary with gifts is a more difficult matter.¹⁴ Nevertheless, the grammaticality of a wide ACD site hosted in a DO (∀ > O), combined with a de dicto IO (O > ∃), requires subverting double-object scope freezing.

(55) The wizard’s wife refuses to show anyone the same two amulets her husband does.

¹² N.B. these authors only consider QR of every over not.

¹³ [20] discusses an example like (54) in his fn. 10 but doesn’t consider whether it allows the IO to be read de dicto. [4] considers two examples like (54), his (27a) and (27b), but concludes they don’t admit a de dicto IO, contrary to the judgments of my informants and myself.

¹⁴ As [4] correctly notes, Mary gave a child every present Barry did is grammatical but doesn’t allow girls to vary with presents (*∀ > ∃). He proposes that both IO and DO QR, the DO since it contains the ACD gap and the IO to get out of the DO’s way (i.e. to preserve Superiority). Thus IO > DO is preserved. Examples (54) and (55) suggest that this may not be the right approach.

(55) is grammatical with wide ACD. Given the NPI IO, this requires 2 > O > ∃. Again, ACD-QR subverts the prohibition on QR of DO over IO in double object configurations.

Larson’s Generalization. Recall (28), repeated here (slightly modified) as (56):

(56) Frege wanted a student to construct a proof of [every theorem Russell did]

Previously we focused on how (28) represented a problem for Sauerland. Superiority predicts that a nonspecific reading of the ECM indefinite will be incompatible with wide ACD resolution, contrary to fact. Note, however, that this reading represents a problem for just about anybody. Specifically, its grammaticality entails a violation of Larson’s generalization:¹⁵

(57) [every theorem Russell wanted a student to construct a proof of x]x [Frege wanted a student to construct a proof of x]

LF (57) entails that a QP intervenes between a DP-embedded QP and its remnant! In other words: the same strategy that Sauerland uses to argue that DP isn’t a scope island allows us to construct examples wherein Larson’s generalization doesn’t hold. But of course we don’t want to conclude that Larson’s generalization doesn’t ever hold. In sum, since ACD-QR can cross islands, Sauerland’s ACD examples aren’t dispositive for the DP-as-scope-island hypothesis.

6 Conclusions

This squib has offered evidence that the conclusions reached in [11,1] favoring QR out of DP may not be warranted. Most seriously, the mechanism Sauerland proposes to derive Larson’s generalization only really works for extensional cases, over-generating when inversely linked DPs occur in intensional contexts. Sauerland’s account also struggles with “surface-linked” interpretations of inversely linked DPs, with ECM interveners which should block certain readings but appear not to, and with its reliance on covert clausal syntax in intensional transitive constructions.

On closer inspection, readings analogous to those which Sauerland takes to motivate QR out of DP occur in constructions where (we have independent reason to believe that) QR above a certain relevant operator isn’t an option. Importantly, each of Sauerland’s arguments for QR out of DP is given a double-object-construction rejoinder. I have speculated that plural demonstratives/indefinites in the scope of modal/negation operators can be construed disjunctively in the absence of QR. Additionally, if (following [16]) the scope of an ∃ quantifier isn’t diagnostic of the LF position of a kind-denoting bare plural, [7]’s negation-intervention cases don’t require a split DP.

¹⁵ Similar comments don’t apply to (27). As many authors (e.g. [21,22]) have shown, scoping a QP over an intensional operator may be sufficient for a de re reading of that QP, but it cannot be necessary.

Finally, in line with [18], I’ve provided new arguments that ACD-QR can do things Scope-QR can’t: namely, scope an every-phrase over a downward-entailing operator, carry a DO over an IO in double object configurations, and precipitate violations of Larson’s generalization. Some of these criticisms will also militate against [4]’s characterization of QR as Superiority-governed. Additionally, it remains to be determined to what extent plural demonstratives and indefinites behave alike with respect to disjunctive readings, why this might be the case, and what any of this has to do with modals/negation. I must leave consideration of these matters to future work.

References

1. Sauerland, U.: DP Is Not a Scope Island. Linguistic Inquiry 36(2), 303–314 (2005)
2. May, R.: The Grammar of Quantification. Ph.D. thesis. MIT Press, Cambridge (1977)
3. Larson, R.K.: Quantifying into NP. Ms. MIT, Cambridge (1987)
4. Bruening, B.: QR Obeys Superiority: Frozen Scope and ACD. Linguistic Inquiry 32(2), 233–273 (2001)
5. Heim, I.: Presupposition Projection and the Semantics of Attitude Verbs. Journal of Semantics 9, 183–221 (1992)
6. May, R.: Logical Form: Its Structure and Derivation. MIT Press, Cambridge (1985)
7. Huang, C.T.J.: Logical relations in Chinese and the theory of grammar. Garland, New York (1998) (Ph.D. thesis, MIT, 1982)
8. Rooth, M.: Association with Focus. Ph.D. thesis, UMass, Amherst (1985)
9. Büring, D.: Crossover Situations. Natural Language Semantics 12(1), 23–62 (2004)
10. Richards, N.: What moves where when in which language? Ph.D. thesis, MIT (1997)
11. Sauerland, U.: Syntactic Economy and Quantifier Raising. Ms., Universität Tübingen (2000)
12. Charlow, S.: Inverse linking, Superiority, and QR. Ms., New York University (2009)
13. Heim, I., Kratzer, A.: Semantics in Generative Grammar. Blackwell, Oxford (1998)
14. Larson, R.K., den Dikken, M., Ludlow, P.: Intensional Transitive Verbs and Abstract Clausal Complementation. Ms., SUNY at Stony Brook, Vrije Universiteit, Amsterdam (1997)
15. Merchant, J.: Antecedent-contained deletion in negative polarity items. Syntax 3(2), 144–150 (2000)
16. Chierchia, G.: Reference to Kinds across Languages. Natural Language Semantics 6, 339–405 (1998)
17. Magri, G.: Constraints on the readings of bare plural subjects of individual-level predicates: syntax or semantics? In: Bateman, L., Ussery, C. (eds.) Proceedings from NELS35, vol. I, pp. 391–402. GLSA Publications, Amherst (2004)
18. von Fintel, K., Iatridou, S.: Epistemic Containment. Linguistic Inquiry 34(2), 173–198 (2003)
19. Beghelli, F., Stowell, T.: Distributivity and negation: the syntax of each and every. In: Szabolcsi, A. (ed.) Ways of Scope Taking, pp. 71–109. Kluwer, Dordrecht (1997)
20. Larson, R.K.: Double Objects Revisited: Reply to Jackendoff. Linguistic Inquiry 21(4), 589–632 (1990)
21. Farkas, D.: Evaluation indices and scope. In: Szabolcsi, A. (ed.) Ways of Scope Taking, pp. 183–215. Kluwer, Dordrecht (1997)
22. Keshet, E.: Good Intensions: Paving Two Roads to a Theory of the De re/De dicto Distinction. Ph.D. thesis, MIT (2008)

Semantic Meaning and Pragmatic Inference in Non-cooperative Conversation

Michael Franke

Seminar für Sprachwissenschaft, University of Tübingen

Abstract. This paper applies a model of boundedly rational “level-k thinking” [1–3] to a classical concern of game theory: when is information credible and what shall I do with it if it is not? The model presented here extends and generalizes recent work in game-theoretic pragmatics [4–6]. Pragmatic inference is modeled as a sequence of iterated best responses, defined here in terms of the interlocutors’ epistemic states. Credibility considerations are a special case of a more general pragmatic inference procedure at each iteration step. The resulting analysis of message credibility improves on previous game-theoretic analyses, is more general and places credibility in the linguistic context where it, arguably, belongs.

1 Semantic Meaning and Credible Information in Signaling Games

The perhaps simplest game-theoretic model of language use is a signaling game with meaningful signals. A sender S observes the state of the world $t \in T$ in private and chooses a message $m$ from a set of alternatives $M$, all of which are assumed to be meaningful in the (unique and commonly known) language shared by S and a receiver R. In turn, R observes the sent message and chooses an action $a$ from a given set $A$. In general, the payoffs for both S and R depend on the state $t$, the sent message $m$ and the action $a$ chosen by the receiver. Formally, a signaling game with meaningful signals is a tuple $\langle \{S, R\}, T, \Pr, M, [\![\cdot]\!], A, U_S, U_R \rangle$ where $\Pr \in \Delta(T)$ is a probability distribution over $T$; $[\![\cdot]\!] : M \to \mathcal{P}(T)$ is a semantic denotation function and $U_S, U_R : M \times A \times T \to \mathbb{R}$ are utility functions for sender and receiver.¹ We can conceive of such signaling games as abstract mathematical models of a conversational context whose most important features they represent: the interlocutors’ beliefs, behavioral possibilities and preferences.

                             $a_{\exists\neg\forall}$   $a_\forall$   $m_{\text{some}}$   $m_{\text{all}}$
$t_{\exists\neg\forall}$     1,1                        0,0           √                   −
$t_\forall$                  0,0                        1,1           √                   √

Fig. 1. Scalar Implicature

                        $a_{\text{mate}}$   $a_{\text{ignore}}$   $m_{\text{high}}$   $m_{\text{low}}$
$t_{\text{high}}$       1,1                 0,0                   √                   −
$t_{\text{low}}$        1,0                 0,1                   −                   √

Fig. 2. Partial Conflict

If a signaling game is a context model, the game’s solution concept is what yields a prediction of the behavior of agents in the modelled conversational situation. The following easy example of a scalar implicature, e.g., the inference that not all students came associated with the sentence “Some of the students came”, makes this distinction clear. A simple context model for this case is the signaling game called “Scalar Implicature” in Figure 1:² there are two states $t_{\exists\neg\forall}$ and $t_\forall$, two messages $m_{\text{some}}$ and $m_{\text{all}}$ with semantic meaning as indicated and two receiver interpretation actions $a_{\exists\neg\forall}$ and $a_\forall$ which correspond one-to-one with the states; sender and receiver payoffs are aligned: an implementation of the standard assumption that conversation and implicature calculation revolve around the cooperative principle [7]. A solution concept, whatever it may be, should then ideally predict that $S_{t_{\exists\neg\forall}}$ ($S_{t_\forall}$) chooses $m_{\text{some}}$ ($m_{\text{all}}$) and the receiver responds with action $a_{\exists\neg\forall}$ ($a_\forall$).³ It is obvious that in order to arrive at this prediction, a special role has to be assigned to the conventional, semantic meaning of the messages involved. For instance, in the above example anti-semantic play, as we could call it, that simply reverses the use of messages, should be excluded.

Most game-theoretic models of language use hard-wire semantic meaning into the game play, either as a restriction on available moves of sender and receiver, or into the payoffs, but in both cases effectively enforcing truthfulness and trust. This is fine as long as conversation is mainly cooperative and preferences aligned. But let’s face it: the central Gricean assumption of cooperation is an optimistic idealization after all; conflict, lies and deceit are as ubiquitous as air.

I would like to thank Tikitu de Jager, Robert van Rooij, Daniel Rothschild, Marc Staudacher and three anonymous referees for insightful comments, help and discussion. I benefited greatly from discussions with Gerhard Jäger. Also, I am thankful to Sven Lauer for waking my interest by first explaining to me with enormous patience some puzzles about credibility that I did not fully understand at the time. Errors are my own.

¹ I will assume throughout that (i) all sets $T$, $M$ and $A$ are non-empty and finite, that (ii) $\Pr(t) > 0$ for all $t \in T$, that (iii) for each state $t$ there is at least one message $m$ which is true in that state and that (iv) no message is contradictory, i.e., there is no $m$ for which $[\![m]\!] = \emptyset$.

T. Icard, R. Muskens (Eds.): ESSLLI 2008/2009 Student Sessions, LNAI 6211, pp. 13–24, 2010. © Springer-Verlag Berlin Heidelberg 2010
² Unless indicated, I assume that states are equiprobable in example games.

³ For $t \in T$, I write $S_t$ as an abbreviation for “a sender of type $t$”.

But then, hard-wiring of truthfulness and trust limits the applicability of our models as it excludes the possibility that senders may wish to mislead their audience. We should aim for more general models and, ideally, let the agents, not the modeller, decide when to be truthful and what to trust. Opposed to hard-wiring truthfulness and trust, the most liberal case at the other end of the spectrum is to model communication, not considering reputation or further psychological constraints at all, as cheap talk. Here messages do not impose restrictions on the game play and are entirely payoff irrelevant: $U_{S,R}(m, a, t) = U_{S,R}(m', a, t)$ for all $m, m' \in M$, $a \in A$ and $t \in T$. However, if talk is cheap, yet exogenously meaningful, the question arises how to integrate
semantic meaning into the game. Standard solution concepts, such as sequential equilibrium or rationalizability, are too weak to predict anything reasonable in this case: they allow for nearly all anti-semantic play and also for babbling, where signals are sent, as it were, arbitrarily and therefore ignored by the receiver. In response to this problem, game theorists have proposed various refinements of the standard solution concepts based on the notion of credibility.⁴ The idea is that semantic meaning should be respected (in the solution concept) wherever this is reasonable in view of the possibly diverging preferences of interlocutors. As an easy example, look at the game “Partial Conflict” in Figure 2 where S is of either a high quality or a low quality type, and where R would like to pair with $S_{t_{\text{high}}}$ only, while S wants to pair with R irrespective of her type. Interests are in partial conflict here and, intuitively, a costless, non-committing message $m_{\text{high}}$ is not credible, because $S_{t_{\text{low}}}$ would have every reason to send it untruthfully. Therefore, intuitively, R should ignore whatever S says in this game. In general, if nothing prevents S from babbling, lying or deceiving, she might as well do so; whenever she even has an incentive to, she certainly will. For the receiver the central question becomes: when is a signal credible and what should I do if it is not?

This paper offers a fresh look at this classical problem of game theory. The novelty is, so to speak, a “linguistic turn”: I suggest that credibility considerations are pragmatic inferences, in some sense very much alike, and in another sense very much unlike, conversational implicatures. I argue that this linguistic approach to credibility of information improves on the classical game-theoretic analyses by Farrell and Rabin [8, 9].
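The two context models in Figures 1 and 2 can be written down directly as data. The following Python sketch is only an illustration under stated assumptions: the class name `SignalingGame`, its field names and the helper `literal_best_responses` are hypothetical, priors are taken to be uniform, and utilities are stored without a message argument, which builds in the cheap-talk condition that payoffs do not depend on the message sent.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

@dataclass
class SignalingGame:
    states: List[str]
    messages: Dict[str, Set[str]]                    # message -> denotation [[m]]
    actions: List[str]
    payoff: Dict[Tuple[str, str], Tuple[int, int]]   # (state, action) -> (U_S, U_R)

    def literal_best_responses(self, m: str) -> List[str]:
        """Receiver actions maximizing expected U_R given only that m is true
        (uniform prior conditioned on [[m]])."""
        true_states = sorted(self.messages[m])

        def eu(a: str) -> float:
            return sum(self.payoff[(t, a)][1] for t in true_states) / len(true_states)

        best = max(eu(a) for a in self.actions)
        return [a for a in self.actions if eu(a) == best]

# Figure 1: Scalar Implicature
scalar = SignalingGame(
    states=["t_some_not_all", "t_all"],
    messages={"m_some": {"t_some_not_all", "t_all"}, "m_all": {"t_all"}},
    actions=["a_some_not_all", "a_all"],
    payoff={("t_some_not_all", "a_some_not_all"): (1, 1),
            ("t_some_not_all", "a_all"): (0, 0),
            ("t_all", "a_some_not_all"): (0, 0),
            ("t_all", "a_all"): (1, 1)},
)

# Figure 2: Partial Conflict
partial_conflict = SignalingGame(
    states=["t_high", "t_low"],
    messages={"m_high": {"t_high"}, "m_low": {"t_low"}},
    actions=["a_mate", "a_ignore"],
    payoff={("t_high", "a_mate"): (1, 1), ("t_high", "a_ignore"): (0, 0),
            ("t_low", "a_mate"): (1, 0), ("t_low", "a_ignore"): (0, 1)},
)
```

A receiver who takes `m_high` literally in `partial_conflict` responds with `a_mate`, and the low type's sender payoff for `a_mate` is 1, which is the incentive to send `m_high` untruthfully described above.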
In order to implement conventional meaning of signals in a cheap talk model, the present paper takes an epistemic approach to the solution of games: the model presented in this paper spells out the reasoning of interlocutors in terms of their beliefs about the behavior of their opponents as a sequence of iterated best responses (IBR) which takes semantic meaning as a starting point. For clarity: the IBR model places no restriction whatsoever on the use of signals; conventional meaning is implemented merely as a focal element in the deliberation of agents. This way, the IBR model extends recent work in game-theoretic pragmatics [5, 6], to which it adds generality by taking diverging preferences into account and by implementing the basic assumptions of “level-k models” of reasoning in games [1–3]. In particular, agents in the model are assumed to be boundedly rational in the sense that each agent computes only finitely many steps of the best response sequence. Section 2 scrutinizes the notion of credibility, section 3 spells out the formal model and section 4 discusses its properties and predictions.

2 Credibility and Pragmatic Inference

The classical idea of message credibility is due to Farrell [8]. Farrell seeks an equilibrium refinement that pays due respect to the semantic meaning of messages. His notion of credibility is therefore tied to a given reference equilibrium as a status quo. According to Farrell, then, a message $m$ is Farrell-credible with respect to a given equilibrium if all $t \in [\![m]\!]$ prefer the receiver to interpret $m$ literally, i.e., to play a best response to the belief $\Pr(\cdot \mid [\![m]\!])$ that $m$ is true, over the equilibrium play, while no type $t \notin [\![m]\!]$ does.

⁴ The standards in the debate about credibility were set by Farrell [8] for equilibrium and by Rabin [9] for rationalizability. I will mainly focus on these two classical papers here for reasons of space.

          $a_1$   $a_2$   $a_3$   $m_1$   $m_2$
$t_1$     4,3     3,0     1,2     √       −
$t_2$     3,0     4,3     1,2     −       √

Fig. 3. Best Message Counts

          $a_1$   $a_2$   $a_3$   $a_4$   $m_{12}$   $m_{23}$   $m_{13}$
$t_1$     4,5     5,4     0,0     1,4     √          −          √
$t_2$     0,0     4,5     5,4     1,4     √          √          −
$t_3$     5,4     0,0     4,5     1,4     −          √          √

Fig. 4. Further Iteration

A number of objections can be raised against Farrell-credibility. First of all, the definition requires all types in $[\![m]\!]$ to prefer a literal interpretation of $m$ over the reference equilibrium. This makes sense under Farrell’s Rich Language Assumption (RLA) that for every $X \subseteq T$ there is a message $m$ with $[\![m]\!] = X$. This assumption is prevalent in game-theoretic discussions of credibility, but restricts applicability. I will show in section 4 that this assumption seriously restricts Rabin’s account [9]. But for now, suffice it to say that, in particular, the RLA excludes models like G1, used to study pragmatic inference in the light of (partial) inexpressibility. I will drop the RLA here to aim for more generality and compatibility with linguistic pragmatics.⁵ Doing so implies amending Farrell-credibility to require only that some types in $[\![m]\!]$ prefer a literal interpretation of $m$ over the reference equilibrium.

Still, there are further problems. Matthews et al. criticize Farrell-credibility as being too strong [12]. Their argument builds on the example in Figure 3. Compared to the babbling equilibrium, in which R performs $a_3$, messages $m_1$ and $m_2$ are intuitively credible: both $S_{t_1}$ and $S_{t_2}$ have good reason to send $m_1$ and $m_2$ respectively. Communication seems possible and utterly plausible. However, neither message is Farrell-credible, because for $i, j \in \{1, 2\}$ and $i \neq j$ not only $S_{t_j}$, but also $S_{t_i}$ prefers R to play a best response to a literal interpretation of $m_j$, which would trigger action $a_j$, over the no-communication outcome $a_3$.
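The Figure 3 argument can be checked mechanically. The sketch below is a minimal illustration, not Farrell's own formulation: the function names are hypothetical, "prefer" is read as strict preference (an assumption), states in a denotation are weighted equally, and the receiver's literal best response is assumed to be unique, as it is in this game.

```python
from typing import Dict, Set, Tuple

# Figure 3 ("Best Message Counts"): (state, action) -> (sender payoff, receiver payoff)
PAYOFF: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("t1", "a1"): (4, 3), ("t1", "a2"): (3, 0), ("t1", "a3"): (1, 2),
    ("t2", "a1"): (3, 0), ("t2", "a2"): (4, 3), ("t2", "a3"): (1, 2),
}
STATES = ["t1", "t2"]
ACTIONS = ["a1", "a2", "a3"]
MEANING: Dict[str, Set[str]] = {"m1": {"t1"}, "m2": {"t2"}}

def literal_response(m: str) -> str:
    """Receiver's best response to the belief that m is true
    (states in [[m]] weighted equally)."""
    return max(ACTIONS, key=lambda a: sum(PAYOFF[(t, a)][1] for t in MEANING[m]))

def farrell_credible(m: str, equilibrium_action: str) -> bool:
    """True iff all types in [[m]] prefer the literal response to the
    equilibrium outcome and no type outside [[m]] does."""
    a = literal_response(m)
    prefers = {t for t in STATES
               if PAYOFF[(t, a)][0] > PAYOFF[(t, equilibrium_action)][0]}
    return prefers == MEANING[m]

# Reference equilibrium: babbling, where the receiver always plays a3.
results = {m: farrell_credible(m, "a3") for m in MEANING}
```

Both entries of `results` come out false: the literal response to `m1` is `a1`, and type `t2` also gets more from `a1` (payoff 3) than from `a3` (payoff 1).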
The problem with Farrell’s notion is obviously that just doing better than equilibrium is not enough reason to send a message, when sending another message is even better for the sender. When evaluating the credibility of a message $m$, we have to take into account alternative forms that $t \in [\![m]\!]$ might want to send.

⁵ A reviewer points out that the RLA has a correspondent in the linguistic world in the “principle of effability” [10]. The reviewer supports dropping the RLA, because otherwise pragmatic inferences are limited to context and effort considerations. It is also very common (and, to my mind, reasonable) to restrict attention to certain alternative expressions only, namely those that are salient (in context) after observing a message. Of course, game theory is silent as to where the alternatives come from, since this is a question for the linguist, perhaps even the syntactician [11].

Compare this with the scalar implicature game in Figure 1. Intuitively, message $m_{\text{some}}$ is interpreted as communicating that the true state of affairs is $t_{\exists\neg\forall}$, because in $t_\forall$ the sender would have used $m_{\text{all}}$. In other words, the receiver discards a state $t \in [\![m]\!]$ as a possible sender of $m$ because that type has a better message to send. Of course, such pragmatic enrichment does not make a message intuitively incredible, as it is still used in line with its semantic meaning. Intuitively speaking, in this setting S even wants R to draw this pragmatic inference. This is, of course, different in the “Partial Conflict” game in Figure 2. In general, if S wants to mislead, she intuitively wants the receiver to adopt a certain belief, but she does not want the receiver to realize that this belief might be false: we could say, somewhat loosely, that S wants her purported communicative intention to be recognized (and acted upon), but she does not want her deceptive intention to be recognized. Nevertheless, if the receiver does manage to recognize a deceptive intention, this too may lead to some kind of pragmatic inference, albeit one that the sender did not intend the receiver to draw. While the implicature in the scalar implicature game rules out a semantically feasible possibility, credibility considerations, in a sense, do the exact opposite: message $m_{\text{high}}$ is pragmatically weakened in the “Partial Conflict” game by ruling in state $t_{\text{low}}$. Despite the differences, there is a common core to both implicature and credibility inference. Here and there, the receiver seems to reason: which types of senders would send this message given that I believe it literally? Indeed, exactly this kind of reasoning underlies Benz and van Rooij’s optimal assertions model of implicature calculation for the purely cooperative case [6]. The driving observation of the present paper is that the same reasoning might not only rule out states $t \in [\![m]\!]$ to yield implicatures but may also rule in states $t \notin [\![m]\!]$. When the latter is the case, $m$ seems intuitively incredible.
Still, the reasoning pattern by which implicatures and credibility-based inferences are computed is the same. On superficial reading, this view on message credibility can be attributed to Stalnaker [4]:⁶ call a message $m$ BvRS-credible (Benz, van Rooij, Stalnaker) iff for some types $t \in [\![m]\!]$, but for no type $t \notin [\![m]\!]$, $S_t$’s expected utility of sending $m$ given that R interprets literally is at least as great as $S_t$’s expected utility of sending any alternative message $m'$.

⁶ It is unfortunately not entirely clear to me what exactly Stalnaker’s proposal amounts to, as insightful as it might be, because the account is not fully spelled out formally. The basic idea seems to be that (something like) the notion of BvRS-credibility, as it is called here, should be integrated as a constraint on receiver beliefs (believe a message iff it is BvRS-credible) into an epistemic model of the game together with some appropriate assumption of (common) belief in rationality. The class of game models that satisfies rationality and credibility constraints would then ultimately define how signals are used and interpreted.

The notion of BvRS-credibility matches our intuitions in all the cases discussed so far, but it is, in a sense, self-refuting, as the game in Figure 4 from [12] shows. In this game, all the available messages $m_{12}$, $m_{23}$ and $m_{13}$ are BvRS-credible, because if R interprets literally $S_{t_1}$ will use message $m_{12}$, $S_{t_2}$ will use message $m_{23}$ and $S_{t_3}$ will use message $m_{13}$. No message is used untruthfully by any type. However, if R realizes that exactly $S_{t_1}$ uses message $m_{12}$, he would rather not play $a_2$, but $a_1$. But if the sender realizes that message $m_{12}$ triggers the receiver to play $a_1$, suddenly $S_{t_3}$ wants to send $m_{12}$ untruthfully. This example shows that BvRS-credibility is a reliable start, but stops too short. If messages are deemed credible and therefore believed, this may create an incentive to mislead. What seems needed to rectify the formal analysis of message credibility is a fully spelled-out model of iterated best responses that starts in the Benz-van-Rooij-Stalnaker way and then carries on iterating. Here is such a model.
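As a check on the Figure 4 argument in the preceding paragraph, the first two reasoning steps can be replayed in a few lines of Python. This is an illustrative sketch only, not the IBR model defined in section 3: the function names are hypothetical, states are taken to be equiprobable, the message denotations are read off the message names, and ties (which do not arise in this game) are broken by order.

```python
from typing import Dict, List, Set

STATES = ["t1", "t2", "t3"]
ACTIONS = ["a1", "a2", "a3", "a4"]
# Figure 4 ("Further Iteration"): rows are states, columns are a1..a4.
U_S = {"t1": [4, 5, 0, 1], "t2": [0, 4, 5, 1], "t3": [5, 0, 4, 1]}
U_R = {"t1": [5, 4, 0, 4], "t2": [0, 5, 4, 4], "t3": [4, 0, 5, 4]}
MEANING: Dict[str, Set[str]] = {
    "m12": {"t1", "t2"}, "m23": {"t2", "t3"}, "m13": {"t1", "t3"},
}

def receiver_response(believed: Set[str]) -> int:
    """Index of the action maximizing the receiver's expected payoff when all
    states in `believed` are held equally likely."""
    return max(range(len(ACTIONS)), key=lambda i: sum(U_R[t][i] for t in believed))

def sender_choices(response: Dict[str, int]) -> Dict[str, List[str]]:
    """For each type, the messages that maximize the sender's payoff given the
    receiver's response to each message."""
    out = {}
    for t in STATES:
        best = max(U_S[t][response[m]] for m in MEANING)
        out[t] = [m for m in MEANING if U_S[t][response[m]] == best]
    return out

# Step 0: the receiver interprets every message literally.
literal = {m: receiver_response(MEANING[m]) for m in MEANING}
senders0 = sender_choices(literal)

def bvrs_credible(m: str) -> bool:
    """Some type in [[m]], but no type outside [[m]], finds m optimal."""
    users = {t for t in STATES if m in senders0[t]}
    return bool(users & MEANING[m]) and not (users - MEANING[m])

# Step 1: the receiver best-responds to who actually sends each message.
informed = {m: receiver_response({t for t in STATES if m in senders0[t]})
            for m in MEANING}
senders1 = sender_choices(informed)
```

Under these assumptions every message passes `bvrs_credible`, the informed receiver answers `m12` with `a1` instead of `a2`, and `senders1["t3"]` is `["m12"]`, a message that is false in `t3`.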

3 The IBR Model and Its Assumptions

3.1 Assumptions: Focal Meaning and Bounded Rationality

The IBR model presented in this paper rests on three assumptions with which it also sets itself apart from previous best-response models in formal pragmatics [5, 6, 13]. The first assumption is the Focal Meaning Assumption: semantic meaning is focal in the sense that the sequence of best responses starts with a purely semantic truth-only sender strategy. Semantic meaning is also assumed focal in the sense that throughout the IBR sequence R believes messages to be truthful unless S has a positive incentive to be untruthful. This is the second, so-called Truth Ceteris Paribus Assumption (TCP). These two (epistemic) assumptions assign semantic meaning its proper place in this model of cheap-talk communication. The third assumption is the Bounded Rationality Assumption: I assume that players in the game have limited resources which allow them to reason only up to some finite iteration depth $k$. At the same time I take agents to be overconfident: each agent believes that she is smarter than her opponent. Camerer et al. make an empirical case for these assumptions about the psychology of reasoners [3].⁷ However, for simplicity, I do not implement their Cognitive Hierarchy Model in full. Camerer et al. assume that each agent who is able to reason up to strategic depth $k$ has a proper belief about the population distribution of players who reason up to depth $l < k$, but I will assume here, just to keep things simple, that each player believes that she is exactly one step ahead of her opponent [2, 15]. (I will discuss this simplifying assumption critically in section 4.)

3.2 Beliefs and Best Responses

Given a signaling game, a sender signaling-strategy is a function σ ∈ S = (Δ(M ))T and a receiver response-strategy is a function ρ ∈ R = 7

A good intuitively accessible example why this should be is a so-called beauty contest game [14]. Each player from a group of size n > 2 chooses a number from 0 to 100. The player closest to 2/3 the average wins. When this game is played with a group of subjects who have never played the game before, the usual group average lies somewhere between 20 to 30. This is quite far from the group average 0 which we would expect from common (true) belief in rationality. Everybody seems to believe that they are just a bit smarter than everybody else, without noticing their own limitations.

Semantic Meaning and Pragmatic Inference in Non-cooperative Conversation

19

(Δ(A))M . In order to define which strategies are best responses to a given belief, we need to define the game-relevant beliefs of both S and R. Since the only uncertainty of S concerns what R will do, the set of relevant sender beliefs ΠS is just the set of receiver response-strategies: ΠS = R. On the receiver’s side, we may say, with some redundancy, that there are three components in any gamerelevant belief [16]: firstly, R has a prior belief Pr(·) about the true state of the world; secondly, he has a belief about the sender’s signaling strategy; and thirdly, he has a posterior belief about the true state after hearing a message. Posteriors should be derived by Bayesian update from the former two components, but also specify R’s beliefs after unexpected surprise messages. Taken  1 together,  the set 2 3 , πR , πR of relevant receiver beliefs ΠR is the set of all triples πR for which 1 2 3 = Pr, πR ∈ S = (Δ(M ))T and πR ∈ (Δ(T ))M such that for any t ∈ T and πR 2 m ∈ M if πR (t, m) = 0, then: 3 πR (m, t) = 

1 2 (t) × πR (t, m) πR 1 2 (t , m) .  π (t ) × πR  t ∈T R
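The Bayesian update above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `posterior`, the dictionary encoding of Pr and of the signaling strategy, and the two-state example are all hypothetical and not part of the model's definition. Surprise messages, for which Bayes' rule is undefined, are simply reported as `None` here.

```python
from fractions import Fraction

def posterior(prior, sigma, m):
    """Posterior over states after hearing message m.

    prior: dict state -> probability (the first belief component, Pr)
    sigma: dict state -> dict message -> probability (the second component)
    Returns None for a surprise message, where Bayes' rule does not apply.
    """
    weights = {t: prior[t] * sigma[t].get(m, Fraction(0)) for t in prior}
    total = sum(weights.values())
    if total == 0:
        return None
    return {t: w / total for t, w in weights.items()}

# Hypothetical two-state example (not taken from the text).
prior = {"t1": Fraction(1, 2), "t2": Fraction(1, 2)}
sigma = {
    "t1": {"m_a": Fraction(1)},
    "t2": {"m_a": Fraction(1, 2), "m_b": Fraction(1, 2)},
}
post = posterior(prior, sigma, "m_a")
```

Exact fractions are used so that posteriors can be compared without rounding issues.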

Given a sender belief ρ ∈ Π_S, say that σ is a best response signaling strategy to belief ρ iff for all t ∈ T and m ∈ M we have:

\[ \sigma(t, m) \neq 0 \;\rightarrow\; m \in \arg\max_{m' \in M} \sum_{a \in A} \rho(m', a) \times U_S(m', a, t) . \]

The set of all such best responses to belief ρ is denoted by S(ρ). Given a receiver belief π_R ∈ Π_R, say that ρ is a best response strategy to belief π_R iff for all m ∈ M and a ∈ A we have:

\[ \rho(m, a) \neq 0 \;\rightarrow\; a \in \arg\max_{a' \in A} \sum_{t \in T} \pi_R^3(m, t) \times U_R(m, a', t) . \]

The set of all such best responses to belief π_R is denoted by R(π_R). Also, if Π'_R ⊆ Π_R is a set of receiver beliefs, let R(Π'_R) = ⋃_{π_R ∈ Π'_R} R(π_R).
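The best-response condition only constrains the support of a strategy: any action played with positive probability must maximize expected utility. A minimal sketch of the receiver's side, assuming a hypothetical helper `best_actions` and hypothetical coordination payoffs (neither is from the text):

```python
from fractions import Fraction

def best_actions(posterior_m, utility, actions, m):
    """Set of actions with maximal expected receiver utility after message m."""
    expected = {
        a: sum(p * utility(m, a, t) for t, p in posterior_m.items())
        for a in actions
    }
    top = max(expected.values())
    return {a for a, value in expected.items() if value == top}

# Hypothetical coordination payoffs: action "a1" matches state "t1", and so on.
def coordination(m, a, t):
    return 1 if a[1:] == t[1:] else 0

skewed = {"t1": Fraction(2, 3), "t2": Fraction(1, 3)}
tied = {"t1": Fraction(1, 2), "t2": Fraction(1, 2)}
unique_best = best_actions(skewed, coordination, ["a1", "a2"], "m")
tied_best = best_actions(tied, coordination, ["a1", "a2"], "m")
```

A response strategy is then a best response exactly when, for every message, it puts positive probability only on actions in the returned set. With a tie, every mixture over the tied actions qualifies, which is why a tie-break rule is needed later.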

3.3 Strategic Types and the IBR Sequence

In line with the Bounded Rationality Assumption of Section 3.1, I assume that senders and receivers are of different strategic types. Strategic types correspond to the level k of strategic reasoning a player in the game performs (while believing she thereby outperforms her opponent by exactly one step of reasoning). I will give an inductive definition of strategic types in terms of players’ beliefs, starting with a fixed strategy σ*_0 of S_0.8 Then, for any k ≥ 0, R_k is characterized by a belief set π*_{R_k} ⊆ Π_R that S is a level-k sender and S_{k+1} is characterized by a belief π*_{S_{k+1}} ∈ Π_S that R is a level-k receiver.

I assume that S_0 plays according to the signaling strategy σ*_0 which simply sends any true message with equal probability in all states. There need not be any belief to which this is a best response, as level-0 senders are (possibly irrational) dummies to implement the Focal Meaning Assumption. R_0 then believes that he is facing S_0. With unique σ*_0, which sends all messages in M with positive probability (M is finite and contains no contradictions), R_0 is characterized entirely by the unique belief π*_{R_0} that S plays σ*_0.

8 I will write S_k and R_k to refer to a sender or receiver of strategic type k. Likewise, S_k^t refers to a sender of strategic type k and knowledge type t.

In general, R_k believes that he is facing a level-k sender. For k > 0, S_k is characterized by a belief π*_{S_k} ∈ Π_S. R_k consequently believes that S_k plays a best response σ_k ∈ S(π*_{S_k}) to this belief. We can leave this unrestricted and assume that R_k considers any σ_k ∈ S(π*_{S_k}) possible. But it will transpire that for an intuitively appealing analysis of message credibility we need to assume that R_k takes S_k to be truthful all else being equal (see also discussion in section 4). We implement the tcp assumption of Section 3.1 as a restriction S*(π*_{S_k}) ⊆ S(π*_{S_k}) on signaling strategies held possible by R. Of course, even when restricted, there need not be a unique signaling strategy here. As a general tie-break rule, assume the “principle of insufficient reason” that all σ_k ∈ S*(π*_{S_k}) are equiprobable to R_k. That means that R_k effectively believes that his opponent is playing the signaling strategy

\[ \sigma_k^*(t, m) = \frac{\sum_{\sigma \in S^*(\pi^*_{S_k})} \sigma(t, m)}{|S^*(\pi^*_{S_k})|} . \]

This fixes R_k’s beliefs about the behavior of his opponent, but it need not fix R_k’s belief π_R^3 about surprise messages. Since this matter is intricate and moreover R_k’s counterfactual beliefs do not play a crucial role in any examples discussed in this paper, I will not pursue this issue at all in this paper (but see also footnote 9 below). In general, let us say that R_k is characterized by any belief whose second component is σ*_k and whose third component satisfies some (coherent, but possibly vacuous) assumption about the interpretation of surprise messages. Let π*_{R_k} ⊆ Π_R be the set of all such beliefs.
R_k is then fully characterized by π*_{R_k}.

In turn, S_{k+1} believes that her opponent is a level-k receiver who plays a best response ρ_k ∈ R(π*_{R_k}). With the above tie-break rule S_{k+1} is fully characterized by the belief

\[ \rho_k^*(m, a) = \frac{\sum_{\rho \in R(\pi^*_{R_k})} \rho(m, a)}{|R(\pi^*_{R_k})|} . \]

3.4 Credibility and Inference

Say that a signal m is k-optimal in t iff σ*_{k+1}(t, m) ≠ 0. The set of k-optimal messages in t are all messages that R_{k+1} believes S^t_{k+1} might send (thus taking the tcp assumption into account). Similarly, distill from R’s beliefs his interpretation-strategy δ : M → P(T) as given by belief π_R: δ_{π_R}(m) = {t ∈ T | π_R^3(m, t) ≠ 0}. This simply is the support of the posterior beliefs of R after receiving message m. Let’s write δ_k for the interpretation strategy of a level-k receiver.

For any k > 0, since S_k believes she faces R_{k−1} with interpretation strategy δ_{k−1}, wanting to send message m would intuitively count as an attempt to mislead if sent by S_k^t just in case t ∉ δ_{k−1}(m). Such an attempt would moreover be untruthful if t ∉ [[m]]. While R_{k−1} would be deceived, R_k would see through the attempted deception. From the point of view of R_k, who adheres to the tcp Assumption, a message m is incredible if it is (k − 1)-optimal in some t ∉ [[m]]. But then R_k will include t in his interpretation of m: recognizing a deceptive intention leads to pragmatic inference. In general, we should consider a message m credible unless some type t ∉ [[m]] would want to use m somewhere along the ibr sequence; precisely, m is credible iff δ_k(m) ⊆ [[m]] for all k ≥ 0.9

4 Discussion

The ibr model makes intuitively correct predictions about message credibility for the games considered so far. In the scalar implicature game, R_0 responds to m_some with the appropriate action a_∃¬∀, but still interprets δ_0(m_some) = {t_∃¬∀, t_∀}. In turn, R_1 interprets as δ_1(m_some) = {t_∃¬∀}; he has pragmatically enriched the semantic meaning by taking the sender’s payoff structure and available messages into account. After one round a fixed point is reached, with fully revealing credible signaling in accordance with intuition. In the game “Partial Conflict”, ibr predicts that both S_1^{t_high} and S_1^{t_low} will use m_high, which is therefore not credible. In the game from Figure 3, fully revealing communication is also predicted, and for the game in Figure 4 ibr predicts that all messages are credible for R_0 and R_1, but not for R_2, hence incredible as such. In general, the ibr model predicts that communication in games of pure coordination is always credible.
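The first two steps of the ibr sequence for the scalar implicature game can be traced with a short Python sketch. Several things here are assumptions made only for illustration: a uniform prior, payoff 1 for both players when the action matches the state and 0 otherwise, the state and message names, a reading of tcp as "among optimal messages, prefer true ones", and a fallback to literal meaning for surprise messages (which do not arise in this run).

```python
from fractions import Fraction

STATES = ["t_sna", "t_all"]                      # "some but not all", "all"
MESSAGES = {"m_some": {"t_sna", "t_all"},        # semantic meaning [[m]]
            "m_all": {"t_all"}}
ACTIONS = {"a_sna": "t_sna", "a_all": "t_all"}   # action -> state it matches
PRIOR = {t: Fraction(1, 2) for t in STATES}      # assumed uniform prior

def utility(a, t):
    return 1 if ACTIONS[a] == t else 0           # assumed shared payoff

def uniform(options):
    return {o: Fraction(1, len(options)) for o in options}

def sender_level0():
    # Level-0 sender: any true message with equal probability.
    return {t: uniform([m for m, den in MESSAGES.items() if t in den])
            for t in STATES}

def receiver_posterior(sigma):
    post = {}
    for m in MESSAGES:
        w = {t: PRIOR[t] * sigma[t].get(m, Fraction(0)) for t in STATES}
        total = sum(w.values())
        # Surprise messages fall back to literal meaning (an assumption).
        post[m] = ({t: x / total for t, x in w.items()} if total
                   else uniform(sorted(MESSAGES[m])))
    return post

def receiver_best_response(post):
    rho = {}
    for m, p in post.items():
        eu = {a: sum(q * utility(a, t) for t, q in p.items()) for a in ACTIONS}
        best = max(eu.values())
        rho[m] = uniform([a for a in ACTIONS if eu[a] == best])
    return rho

def sender_best_response(rho):
    sigma = {}
    for t in STATES:
        eu = {m: sum(q * utility(a, t) for a, q in rho[m].items())
              for m in MESSAGES}
        best = max(eu.values())
        optimal = [m for m in MESSAGES if eu[m] == best]
        truthful = [m for m in optimal if t in MESSAGES[m]]
        sigma[t] = uniform(truthful or optimal)  # tcp reading (an assumption)
    return sigma

def interpretation(post):
    # Support of the posterior after each message.
    return {m: {t for t, q in p.items() if q > 0} for m, p in post.items()}

post0 = receiver_posterior(sender_level0())
delta0 = interpretation(post0)
sigma1 = sender_best_response(receiver_best_response(post0))
delta1 = interpretation(receiver_posterior(sigma1))
```

Under these assumptions `delta0["m_some"]` contains both states while `delta1["m_some"]` contains only the "some but not all" state, which is the enrichment described above.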

9 It may seem that messages which would not be sent by any type (after the first round or later) come out credible under this definition, which would not be a good prediction. (Thanks to Daniel Rothschild (p.c.) for pointing this out to me.) However, this is not quite right: we get into this predicament only for some versions of the ibr sequence, not for others. It all depends on how the receiver forms his counterfactual beliefs. If, for instance, we assume that R rationalizes observed behavior even if it surprises him, we can keep the definition unchanged: if no type whatsoever has an outstanding reason to send m, the receiver’s posterior beliefs after m will support any type. So, unless m is tautologous, it is incredible. Still, Rothschild’s criticism is appropriate: the definition of message credibility offered here is, in a sense, incomplete as long as we do not properly define the receiver’s counterfactual beliefs; something left for another occasion.

         a1    a2    m12   m3
    t1   1,1   0,0   √     −
    t2   0,0   1,1   √     −
    t3   0,0   1,1   −     √

Fig. 5. White Lie

         Pr(t)   a1    a2    a3    m12   m23
    t1   1/8     1,1   0,0   0,0   √     −
    t2   3/4     0,0   1,1   0,0   √     √
    t3   1/8     0,0   0,0   1,1   −     √

Fig. 6. Game without Name
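For the game in Fig. 6, the level-0 receiver's posterior beliefs can be checked by direct arithmetic. The sketch below uses the priors and truth conditions from the figure; the variable and function names are hypothetical, and the level-0 sender is taken to spread probability evenly over the messages true in her state, as described in Section 3.3.

```python
from fractions import Fraction

prior = {"t1": Fraction(1, 8), "t2": Fraction(3, 4), "t3": Fraction(1, 8)}
true_in = {"m12": ["t1", "t2"], "m23": ["t2", "t3"]}

# Number of true messages per state: the level-0 sender sends each of them
# with probability 1 / n_true[t].
n_true = {t: sum(t in states for states in true_in.values()) for t in prior}

def r0_posterior(m):
    weights = {t: prior[t] / n_true[t] for t in true_in[m]}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

post_m12 = r0_posterior("m12")
post_m23 = r0_posterior("m23")
```

In both posteriors t2 carries probability 3/4, so with the payoffs of Fig. 6 the action a2 has the highest expected utility after either message.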


Proposition 1. Take a signaling game with T = A and U_{S,R}(·, t, t′) = c > 0 if t = t′ and 0 otherwise. Then δ_k(m) ⊆ [[m]] for all k and m.

Proof. Clearly, δ_0(m) ⊆ [[m]] for arbitrary m. So assume that δ_k(m) ⊆ [[m]]. In this case S^t_{k+1} will use m only if t ∈ δ_k(m). But then t ∈ [[m]] and therefore δ_{k+1}(m) ⊆ [[m]].

However, the ibr model does not guarantee generally that communication is credible even when preferences are perfectly aligned, i.e., U_S = U_R. This may seem surprising at first, but is due naturally to the possibility of what we could call white lies: untruthful signaling that is beneficial for the receiver. These may occur if the set of available signals is not expressive enough. As an easy example, consider the game in Figure 5 where S^{t2} will use m3 untruthfully to induce action a2, which, however, is best for both receiver and sender.

To understand the central role of the tcp assumption in the present proposal, consider the game in Figure 6. Here, R_0 has the following posterior beliefs: after hearing message m12 he rules out t3 and believes that t2 is three times as likely as t1; similarly, after hearing message m23 he rules out t1 and believes that t2 is three times as likely as t3. Consequently, R_0 responds to both signals with a2. Now, S_1^{t1}, for instance, does not care which message to choose, as far as her expected utilities are concerned. But R_1 nevertheless assumes that S_1^{t1} speaks truthfully. It is thanks to the tcp assumption that ibr predicts messages to be credible in this game.

This game also shows a difference between the ibr model and Rabin’s model of credible communication [9], which superficially look very similar. Rabin’s model consists of two components: the first component is a definition of message credibility which is almost a two-step iteration of best responses starting from the semantic meaning; the second component is iterated strict dominance around a fixed core set of Rabin-credible messages being sent truthfully and believed. In particular, Rabin requires for m to be credible that m induces, when taken literally, exactly the set of all sender-best actions (from the set of actions that are inducible by some receiver belief) of all t ∈ [[m]].
This is defensible under the Rich Language Assumption, but both messages in the last considered game fail this requirement. Consequently, with no credible message to restrict iterated strict dominance, Rabin’s model predicts a total anything-goes for this game. This shows the limited applicability of approaches to message credibility that are inseparable from the Rich Language Assumption. The present notion of message credibility and the ibr model are not restricted in this sense and fare well with (partial) inexpressibility and the resulting inferences. To wrap up: as a solution concept, the epistemic ibr model offers, basically, a set of beliefs, viz., beliefs obtained under certain assumptions about the psychology of agents from a sequence of iterated best responses. I do not claim that this model is a reasonable model for human reasoning in general. Certainly, the simplifying assumption that players believe that they are facing a level-k opponent, and not possibly a level-l < k opponent, is highly implausible proportional to k, but especially so for agents that have, in a manner of speaking, already reasoned themselves through a circle multiple times. (It is easily verified that


for finite M and T the ibr sequence always enters a circle after some k ∈ ℕ.)10 Still, I wish to defend that the ibr model does capture (our intuitions about) certain aspects of (idealized) linguistic behavior, namely pragmatic inference in cooperative and non-cooperative situations. Whether it is a plausible model of belief formation and reasoning in the envisaged linguistic situations is ultimately an empirical question.

In conclusion, the ibr model offers a novel perspective on message credibility and the pragmatic inferences based on this notion. The model generalizes existing game-theoretical models of pragmatic inference by taking conflicting interests into account. It also generalizes game-theoretic accounts of credibility by giving up the Rich Language Assumption. The explicitly epistemic perspective on agents’ deliberation assigns a natural place to semantic meaning in cheap-talk signaling games as a focal starting point. It also highlights the unity in pragmatic inference: in this model both credibility-based inferences and implicatures are different outcomes of the same reasoning process.

10 It is tempting to assume that “looping reasoners” may have an Aha-Erlebnis and to extend the ibr sequence by transfinite induction assuming, for instance, that level-ω players best respond to the belief that the ibr sequence is circling. I do not know whether this is necessary and/or desirable for linguistic applications. We should keep in mind though that in some cases human reasoners may not get to the ideal level of reasoning in this model and in others they might even go beyond it.

References

1. Stahl, D.O., Wilson, P.W.: On players’ models of other players: Theory and experimental evidence. Games and Economic Behavior 10, 218–254 (1995)
2. Crawford, V.P.: Lying for strategic advantage: Rational and boundedly rational misrepresentation of intentions. American Economic Review 93(1), 133–149 (2003)
3. Camerer, C.F., Ho, T.H., Chong, J.K.: A cognitive hierarchy model of games. The Quarterly Journal of Economics 119(3), 861–898 (2004)
4. Stalnaker, R.: Saying and meaning, cheap talk and credibility. In: Benz, A., Jäger, G., van Rooij, R. (eds.) Game Theory and Pragmatics, pp. 83–100. Palgrave MacMillan, Basingstoke (2006)
5. Jäger, G.: Game dynamics connects semantics and pragmatics. In: Pietarinen, A.-V. (ed.) Game Theory and Linguistic Meaning, pp. 89–102. Elsevier, Amsterdam (2007)
6. Benz, A., van Rooij, R.: Optimal assertions and what they implicate. Topoi 26, 63–78 (2007)
7. Grice, P.H.: Studies in the Ways of Words. Harvard University Press, Cambridge (1989)
8. Farrell, J.: Meaning and credibility in cheap-talk games. Games and Economic Behavior 5, 514–531 (1993)
9. Rabin, M.: Communication between rational agents. Journal of Economic Theory 51, 144–170 (1990)
10. Katz, J.J.: Language and Other Abstract Objects. Basil Blackwell, Malden (1981)
11. Katzir, R.: Structurally-defined alternatives. Linguistics and Philosophy 30(6), 669–690 (2007)
12. Matthews, S.A., Okuno-Fujiwara, M., Postlewaite, A.: Refining cheap talk equilibria. Journal of Economic Theory 55, 247–273 (1991)
13. Jäger, G.: Game theory in semantics and pragmatics. Manuscript, University of Bielefeld (February 2008)
14. Ho, T.-H., Camerer, C., Weigelt, K.: Iterated dominance and iterated best response in experimental “p-beauty contests”. The American Economic Review 88(4), 947–969 (1998)
15. Crawford, V.P.: Let’s talk it over: Coordination via preplay communication with level-k thinking (Unpublished Manuscript) (2007)
16. Battigalli, P.: Rationalization in signaling games: Theory and applications. International Game Theory Review 8(1), 67–93 (2006)

What Makes a Knight? Stefan Wintein

1 Introduction

In Smullyan’s well known logic puzzles (see for instance [3]), the notion of a knight, which is a creature that always speaks the truth, plays an important role. Rabern and Rabern (in [2]) made the following observation with respect to knights. They noted that when a knight is asked (1), he gets into serious trouble.

    Is it the case that: your answer to this question is ‘no’?    (1)

Indeed, upon answering (1) with either ‘yes’ or ‘no’, the knight can be accused of lying. How then, does a knight respond to (1)? Rabern and Rabern (henceforth R&R) assume that the knight reacts to questions like (1) with an answer different from ‘yes’ and ‘no’; let’s say that this reaction consists of answering (1) with neither. R&R use their assumption of a third possible reaction to set up an argument with the following intriguing conclusion: it is possible to determine the value of a three valued variable x by asking a single question to a knight (who knows the value of x). R&R’s argument is given in natural language, and is not backed up by formal machinery which rigorously defines the criteria which determine the answer of a knight to an arbitrary question. In [6] we asked under what conditions the informal argument of R&R can be reconstructed as a (formally) valid piece of reasoning. We showed that, under the assumption that it is not allowed to ask questions to the knight in which the “neither predicate” occurs self-referentially, there is a natural notion of validity according to which the reasoning of R&R can be considered as valid.1 The ban on self-referential questions involving the neither predicate excludes that we ask a knight a question such as the following.

    Is it the case that: you answer this question with ‘neither’ or you answer it with ‘no’?    (2)

Questions like (2) cause problems for a certain conception of a knight. According to this conception, a knight reacts to σ by answering with ‘yes’, ‘no’ or ‘neither’ if and only if σ is, respectively, true, false or ungrounded.2 By exploiting well-known “Strengthened Liar arguments”, a contradiction can be derived from this knight conception and the hypothesis that the knight answers (2) with either ‘yes’, ‘no’ or ‘neither’. In order to avoid the derivation of such contradictions, [6] formulated a notion of validity under which R&R’s argument is valid, but which banned self-referential constructions with the neither predicate. Interestingly, the formal reconstruction given in [6] of R&R’s informal argument differs from the intuitive conception of a knight that underlies their paper. For, as Brian Rabern explained in personal communication, it is allowed to ask questions like (2) to a knight; given such questions, a knight answers with ‘neither’, despite the fact that this reaction turns (2) into a true sentence. How then to characterize the process which determines the answer of a knight to an arbitrary question? In this paper, we will give a precise answer to that question by our definition of the knight function. According to our conception of a knight, a knight may react in four distinct ways3 to a question. Besides ‘yes’, ‘no’ and ‘neither’, we assume that a knight may also react with ‘both’. For instance, he will do so on the following question.

    Is it the case that: your answer to this question is ‘yes’?    (3)

Thanks to Reinhard Muskens for his comments on this work.

1 Actually, we were concerned with an ungroundedness predicate, equating, in alethic terms, Liars and Truthtellers.

2 By which we mean that it does not receive a classical truth value in Kripke’s Strong Kleene minimal fixed point valuation.

T. Icard, R. Muskens (Eds.): ESSLLI 2008/2009 Student Sessions, LNAI 6211, pp. 25–37, 2010. Springer-Verlag Berlin Heidelberg 2010

Upon being asked (3), a knight may react by answering ‘yes’ or ‘no’ without being accused of lying. We will assume that when it is possible to answer a question σ both with ‘yes’ and ‘no’ in the sense alluded to, the knight will react to σ by answering ‘both’. Hence, we will distinguish between the semantic value of Liar like questions such as (1) and Truthteller like questions such as (3). Formally, this means that we will be concerned with the semantic values of Belnap’s famous logic, which are contained in 4 = {t, f, b, n}. For us, if a question σ has value t then the knight will answer it with ‘yes’ while a value of f implies that the knight will answer σ with ‘no’. Likewise, the semantic value n is associated with an answer of ‘neither’ while b is associated with answering ‘both’. In this paper, the questions that can be asked to a knight will be modeled as sentences of a language ℒ which has available four unary “answering” predicates, ‘T’, ‘F’, ‘N’ and ‘B’, corresponding4 to an answer of ‘yes’, ‘no’, ‘neither’ and ‘both’ respectively. Importantly, in our interpretation of ℒ it is possible to create self-reference with respect to all four “answering” predicates. There is no ban on self-reference of any kind, as there was in [6]. The essence of this paper is to give a characterization of the knight function κ, i.e., the function which maps an arbitrary question (sentence) σ of ℒ to the answer given to σ by a knight. Exploiting our function κ, we conclude the paper by showing a further interesting property of a knight: it is possible to determine the value of a four valued variable x by asking a single question to a knight (who knows the value of x).

Structure of this paper. In Section 2 we will introduce the formal machinery of this paper. We will work with quantifier free languages which employ two kinds of constant symbols: quotational constant symbols, used for quoting sentences, and non quotational ones, which are used to generate circular and self-referential sentences. The language ℒ will be equipped with assertoric rules which are, formally, tableau rules for the assertoric sentences of ℒ, which are sentences that are signed with the symbols in {A^+, D^+, A^−, D^−}. Intuitively, A^+_σ indicates that it is possible to assert σ, while D^−_σ indicates that it is not possible to deny σ. In section 3 the assertoric rules of ℒ are used, in combination with techniques of assertoric semantics (see, e.g., [7], [8]), to define the knight function κ. In section 4 we apply the knight function κ to an interesting logic puzzle in the spirit of Smullyan. Section 5 gives conclusions.

3 This is a major difference with [2] and [6], where knights are considered which have an answering repertoire of three distinct reactions.

4 Of course, if the reader likes he may also adopt another (e.g. an alethic one) interpretation of the four predicates.

2 Preliminaries

2.1 The Language ℒ

We will work with a language ℒ that is, in a sense, intermediate between propositional logic and first order logic. Let us explain in what sense. On the one hand, ℒ contains a set of propositional constant symbols P = {p_n | n ∈ ℕ}. On the other hand, our language contains four (unary) predicate symbols, ‘T’, ‘F’, ‘N’ and ‘B’. The set of non quotational constant symbols is given by C = {c_n | n ∈ ℕ}. The quotational constant symbols of ℒ are contained in the set Q. This set is defined together with Sen(ℒ), the set of sentences of ℒ, as the smallest sets which satisfy the following three equations.

    c ∈ C, p ∈ P  ⇒  T(c), F(c), N(c), B(c), p ∈ Sen(ℒ)
    φ, ψ ∈ Sen(ℒ)  ⇒  ¬φ, (φ ∧ ψ), (φ ∨ ψ) ∈ Sen(ℒ), [φ] ∈ Q
    [φ] ∈ Q  ⇒  T([φ]), F([φ]), N([φ]), B([φ]) ∈ Sen(ℒ)

2.2 The Denotation Function π



A denotation function for ℒ is a function π : C ∪ Q → Sen(ℒ) which is such that for any [φ] ∈ Q, π([φ]) = φ. Thus, a denotation function maps the quotational constants to the associated sentences while the non quotational constants in C may refer to any sentence of ℒ. The non quotational constants can be used to create self-reference. In this paper, we assume that constants λ, τ ∈ C are always such that π(λ) = ¬T(λ) and π(τ) = T(τ). Hence ¬T(λ) and T(τ) model5 the (Strengthened) Liar and the Truthteller respectively.

2.3 Factual Valuations and 4 Valuations

The elements of P are thought of as (possible) facts. A factual valuation for ℒ is a function V : P → 2, where 2 = {(1, 0), (0, 1)}. We will interpret ‘V(p) = (1, 0)’ as ‘p can be answered positively and p cannot be answered negatively by a knight’ while ‘V(p) = (0, 1)’ is interpreted as ‘p cannot be answered positively and p can be answered negatively by a knight’. By a 4 valuation for ℒ we mean a function from the sentences of ℒ into 4 = {(1, 0), (0, 1), (1, 1), (0, 0)}. We will use t, f, b and n as abbreviations for (1, 0), (0, 1), (1, 1) and (0, 0) respectively.

5 Strictly speaking, ¬T(λ) models the following question: Is it the case that: your answer to this question is not ‘yes’? However, for sake of notational convenience we will denote ¬T(λ) as the Liar. Similar remarks apply to T(τ).

2.4 Kripke Correctness

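We say that a 4 valuation V is Kripke correct just in case V is compositional with respect to the following truth tables.

    ∧ | t  f  n  b        ∨ | t  f  n  b
    --+-----------        --+-----------
    t | t  f  n  b        t | t  t  t  t
    f | f  f  f  f        f | t  f  n  b
    n | n  f  n  f        n | t  n  n  t
    b | b  f  f  b        b | t  b  t  b

    π(t) | T(t)        φ | ¬φ        π(t) | F(t)
    -----+-----        --+---        -----+-----
      t  |  t          t | f           t  |  f
      f  |  f          f | t           f  |  t
      n  |  n          n | n           n  |  n
      b  |  b          b | b           b  |  b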
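One way to read the tables for ¬, ∧ and ∨ is as componentwise operations on the pairs of Section 2.3, where the first coordinate records whether a sentence can be asserted and the second whether it can be denied. The following Python sketch spells this reading out; the function names are hypothetical, and the pair reading is offered as an observation that reproduces the tables, not as the paper's own definition.

```python
# Truth values as (can assert, can deny) pairs, following Section 2.3.
T, F, B, N = (1, 0), (0, 1), (1, 1), (0, 0)

def neg(x):
    # Negation swaps assertability and deniability.
    return (x[1], x[0])

def conj(x, y):
    # Assertable iff both conjuncts are; deniable iff at least one is.
    return (x[0] & y[0], x[1] | y[1])

def disj(x, y):
    # Assertable iff at least one disjunct is; deniable iff both are.
    return (x[0] | y[0], x[1] & y[1])

# The two entries that distinguish these tables from a linear ordering:
n_and_b = conj(N, B)
n_or_b = disj(N, B)
```

On this reading n ∧ b comes out as f and n ∨ b as t, as in the tables.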

Thus, a 4 valuation V is Kripke correct just in case V is Strong Kleene compositional over 4 (i.e., it satisfies the truth tables displayed above for ¬, ∧ and ∨) and V satisfies the Kripkean fixed point interpretation (see [1]) of a truth and falsity predicate (i.e., it satisfies the truth tables displayed above for T(t) and F(t)). Observe that the definition of Kripke correctness does not impose any explicit constraint on the valuation of sentences of form N(t) and B(t). Sentences of form N(t) and B(t) will be discussed in 3.7.

2.5 Worlds

A world is the equivalent of a factual valuation. A world w is a set of assertoric sentences and the assertoric world w_V that corresponds with factual valuation V is defined as follows:

    w_V = {A^+_p | V(p) = (1, 0)} ∪ {D^+_p | V(p) = (0, 1)}

The set of all assertoric sentences is given by:

    {X^y_σ | X ∈ {A, D}, y ∈ {+, −}, σ ∈ Sen(ℒ)}

Alternatively, a world is defined as any set w of assertoric sentences such that X^y_σ ∈ w implies that y = + and that σ ∈ P and such that, for any p ∈ P, we have that A^+_p ∈ w ⇔ D^+_p ∉ w. The set of all worlds will be denoted by W.

2.6 Assertoric Rules

The assertoric rules of ℒ are basically rules of a tableau system, signed with the elements of {A^+, D^+, A^−, D^−}, for Strong Kleene logic, augmented with rules for T, F, N and B. We follow [4] in distinguishing two types of rules: those of disjunctive type and those of conjunctive type. An assertoric rule associates an assertoric sentence X^y_σ with its set of immediate semantic sub sentences, Π(X^y_σ). Depending on its type, an assertoric rule is depicted in either one of two ways, with X^y_σ above Π(X^y_σ). (4)

Here are the assertoric rules for ℒ. In the rules for the four predicates, t is a quotational or non quotational constant, i.e., an arbitrary element of C ∪ Q.

[Table: assertoric rules for ∧, ∨, ¬, T, F, N and B, each in the variants A^+, D^+, A^− and D^−.]

The assertoric rules for ¬, ∧, ∨, T and F are associated with a Kripkean Strong Kleene fixed point interpretation of ℒ. The (downwards reading of the) assertoric rule for A^+_{N(t)} says that if you assert N(t) you must refuse to assert π(t) and also, you must refuse to deny π(t). The other rules are explained along similar lines. Observe that only the assertoric rules for N and B bring the negative assertoric sentences (those of form X^−_σ) in play and that a negative rule is the dual of the corresponding positive rule in a sense which is clear from the table of rules. When the set of immediate semantic sub sentences associated with a rule is a singleton, it does not matter which type, disjunctive or conjunctive, we attribute to the rule. The allotment of types displayed in the table was chosen for sake of symmetry.

3 Constructing a Knight Function

In this section, we will construct the knight function κ : Sen(ℒ) × W → 4, which is interpreted as follows. For every σ ∈ Sen(ℒ), if κ(σ, w) = t, respectively f, n or b, then the knight will answer σ in world w with, respectively, ‘yes’, ‘no’, ‘neither’ or ‘both’.

3.1 Assertoric Semantics and Games

In order to define κ we will adopt the framework of assertoric semantics as developed in [7]. The function κ will be constructed in terms of the assertoric 4 valuation ν, which is a valuation of an assertoric game between two players, whose strategies consist of associating immediate semantic sub sentences with assertoric sentences of disjunctive and conjunctive type respectively. The disjunctive player wins the assertoric game for X^y_σ just in case he can ensure an open outcome of the game. The closure conditions by which we judge outcomes to be open and closed are discussed below. The function ν will report, for each sentence σ, whether or not the disjunctive player wins the game for A^+_σ and also, whether or not he wins the game for D^+_σ. For instance, ν(σ, w) = (1, 0) tells us that the disjunctive player wins the game for A^+_σ and that he loses the game for D^+_σ. In 3.2 until 3.7 we will be concerned with ν. With ν at hand, the definition of κ is easily obtained, as will be discussed in 3.8.

3.2 Expansions



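Let S be a set of assertoric sentences. A function f defined on S is said to respect the immediate semantic sub sentence structure, denoted res(f), just in case f assigns to each X^y_σ ∈ S an immediate semantic sub sentence of X^y_σ. That is:

    res(f)  ⇔  for all X^y_σ ∈ S: f(X^y_σ) ∈ Π(X^y_σ)    (5)

We define the sets F, G and H as follows. F is the set of all functions f with res(f) that are defined on the assertoric sentences of disjunctive type, G is the set of all functions g with res(g) that are defined on the assertoric sentences of conjunctive type, and H is the set of all functions h with res(h) that are defined on all assertoric sentences. For each h ∈ H and each assertoric sentence X^y_σ, the expansion of X^y_σ according to h, written h_{X^y_σ}, is defined by letting, for each n ∈ ℕ:

    h_{X^y_σ}(0) = X^y_σ,    h_{X^y_σ}(n + 1) = h(h_{X^y_σ}(n))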
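An expansion is just the sequence obtained by applying h over and over. The sketch below illustrates this for the Liar, assuming π(λ) = ¬T(λ) and the usual signed tableau steps for ¬ and T (asserting ¬α leads to denying α, asserting T(t) leads to asserting π(t), and dually). The string encoding of assertoric sentences and the name `expansion` are hypothetical.

```python
from itertools import islice

def expansion(h, start):
    """Yield start, h(start), h(h(start)), ...: the expansion of start under h."""
    x = start
    while True:
        yield x
        x = h[x]

# Strings stand in for assertoric sentences. Every sentence below has a single
# immediate semantic sub sentence, so h is forced on this fragment.
h = {
    "A+ ¬T(λ)": "D+ T(λ)",
    "D+ T(λ)": "D+ ¬T(λ)",
    "D+ ¬T(λ)": "A+ T(λ)",
    "A+ T(λ)": "A+ ¬T(λ)",
}
first_six = list(islice(expansion(h, "A+ ¬T(λ)"), 6))
```

The expansion returns to its starting point after four steps, which is the Liar cycle discussed in Section 3.5.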

3.3 Outcomes

Observe that a pair (f, g) ∈ F × G induces a function h = f ∪ g ∈ H. For any (f, g) ∈ F × G and any assertoric sentence X^y_σ, we say that the sequence out(X^y_σ, f, g) = (h_{X^y_σ}(n))_{n ∈ ℕ}, where h = f ∪ g, is the outcome of the interrogation for X^y_σ in which the disjunctive player plays strategy f and in which the conjunctive player plays strategy g. An outcome out is either open or closed in a world w, which we will denote as O_w(out) and C_w(out) respectively. Before we discuss the closure conditions for outcomes, we first describe how the closure conditions for outcomes give rise to closure conditions for assertoric sentences and how the latter conditions can be used to induce a 4 valuation V.

3.4 Inducing a 4 Valuation via Closure Conditions

An assertoric sentence X^y_σ is said to be open in world w, denoted O_w(X^y_σ), just in case the disjunctive player can ensure an open outcome in the game for X^y_σ. Hence, we have that:

    O_w(X^y_σ)  ⇔  ∃f ∀g O_w(out(X^y_σ, f, g))    (6)

When X^y_σ is not open in a world w, it is closed in that world, denoted C_w(X^y_σ). We are interested, for reasons given below, in closure conditions for assertoric sentences which validate the assertoric rules of ℒ. Closure conditions validate the assertoric rules of ℒ just in case we have, for each assertoric sentence X^y_σ and each w ∈ W, that:

    X^y_σ is of conjunctive type:  O_w(X^y_σ)  ⇔  O_w(Y^z_φ) for all Y^z_φ ∈ Π(X^y_σ).    (7)
    X^y_σ is of disjunctive type:  O_w(X^y_σ)  ⇔  O_w(Y^z_φ) for some Y^z_φ ∈ Π(X^y_σ).    (8)

Closure conditions induce a 4 valuation V as follows, where V_X : Sen(ℒ) × W → {0, 1} is the projector function of V on its X coordinate.

    V_A(σ, w) = 1  ⇔  O_w(A^+_σ),    V_D(σ, w) = 1  ⇔  O_w(D^+_σ),
    V(σ, w) = (x, y)  ⇔  V_A(σ, w) = x and V_D(σ, w) = y    (9)

Proposition 1. Closure conditions which validate the assertoric rules of ℒ induce a 4 valuation V which is Kripke correct.

Proof: By an inspection of the assertoric rules, left to the reader.

3.5 The Cyclical Closure Conditions and ν

The essential element involved in the definition of closure conditions which validate the assertoric rules of $\mathcal{L}$ is the notion of a cycle. The rationale of our definition of closure conditions in terms of cycles will be revealed in 3.6. In order to define the notion of a cycle, we enlarge the set of assertoric rules by adding the (trivial) rules for elements of $P$: for each $p \in P$, we let $\Pi(X_p^y) = \{X_p^y\}$. A cycle is a finite sequence of assertoric sentences such that 1) each term of the sequence, except the first, is an immediate sub sentence of its predecessor and 2) the first term is an immediate sub sentence of the last term. The addition of the assertoric rules for elements of $P$ ensures that, for each $p \in P$, $(X_p^y)$ is a cycle. With $\pi(\lambda) = \neg T(\lambda)$, another example of a cycle is $(A_{\neg T(\lambda)}, D_{T(\lambda)}, D_{\neg T(\lambda)}, A_{T(\lambda)})$; a further example, for a constant $\gamma$ whose denotation is built from $N(\gamma)$, is a two-term cycle both of whose terms assert a sentence built from $N(\gamma)$.

There are three types of cycles. A positive cycle is a cycle of which all the terms are positive, whereas in a negative cycle each term is negative and a mixed cycle has both positive and negative terms. The inverse of a cycle is obtained by performing a “charge swap” (from positive into negative or from negative into positive) on each of its elements. A cycle is either vicious or virtuous relative to a world $w$.







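1. If a cycle is positive, we say that it is vicious in $w$ just in case one of the following two conditions holds:
   (a) The cycle is a one-term cycle $(X_p)$ and $X_p \notin w$.
   (b) For some $\sigma \in \mathrm{Sen}(\mathcal{L})$, both $A_\sigma$ and $D_\sigma$ are terms of the cycle.
2. If a cycle is negative, we say that it is vicious in $w$ just in case its inverse is virtuous in $w$.
3. If a cycle is mixed, we say that it is vicious in $w$ just in case for some $X \in \{A, D\}$ and $\sigma \in \mathrm{Sen}(\mathcal{L})$, both the positive and the negative sentence of form $X_\sigma$ are terms of the cycle.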



An outcome $\mathrm{out}(X_\sigma^y, f, g)$ is a sequence, and the set of terms of an outcome is either of finite or of denumerably infinite cardinality. If the set of terms of an outcome $\mathrm{out}$ is finite, then there is a first term in $\mathrm{out}$, say $X_\sigma^y$ with index $n$, for which there exists a term $Y_\beta^z$ of $\mathrm{out}$ with index $m \leq n$ such that $Y_\beta^z$ is an immediate sub sentence of $X_\sigma^y$. We call such a term $X_\sigma^y$ a stopping term. The sub sequence of $\mathrm{out}$ which consists of the terms up to and including the stopping term is called the initial sequence of $\mathrm{out}$. The initial sequence of an outcome contains a (unique) finite cycle whose last term is the stopping term of the outcome. Call this cycle the measurement cycle of the outcome. An outcome which involves infinitely many terms does not contain a cycle and hence it does not contain a measurement cycle.

We semantically valuate an outcome by judging it to be open or closed, based on the “moral” character of its measurement cycle. According to the cyclical closure conditions, an outcome $\mathrm{out}$ is (cyclically) closed in a world $w$, denoted $C_w(\mathrm{out})$, just in case one of the following two conditions holds:

1. The set of terms of $\mathrm{out}$ is infinite and there is a term with index $n$ such that any term with index $\geq n$ has negative charge, i.e., is of form $X_\sigma^{-}$.
2. The set of terms of $\mathrm{out}$ is finite and the measurement cycle of $\mathrm{out}$ is vicious in $w$.

By instantiating schema (9) with the cyclical closure conditions, we obtain the 4 valuation $\nu$.
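The search for a stopping term and its measurement cycle can be sketched for outcomes with finitely many terms. This is a minimal illustration, not the paper's construction: the dictionaries `PI` (immediate sub sentences) and `H` (the strategy $h$) and the sentence labels are hypothetical, and where several earlier terms could close the cycle the sketch simply takes the latest one.

```python
def measurement_cycle(start, h, pi, max_steps=10_000):
    """Return (initial_sequence, cycle) for the expansion of `start` under `h`.

    The stopping term is the first term x_n such that some term x_m with
    m <= n is an immediate sub sentence of x_n. Returns None when no stopping
    term is found within `max_steps` terms.
    """
    terms = []
    x = start
    for _ in range(max_steps):
        terms.append(x)
        # Indices of terms seen so far that are immediate sub sentences of x.
        hits = [m for m, y in enumerate(terms) if y in pi[x]]
        if hits:
            m = hits[-1]  # assumption: close the cycle at the latest such term
            return terms, terms[m:]
        x = h[x]
    return None

# Hypothetical sentences a..e; "e" plays the role of an atom with a trivial rule.
PI = {"a": {"b"}, "b": {"c"}, "c": {"d", "e"}, "d": {"b"}, "e": {"e"}}
H = {"a": "b", "b": "c", "c": "d", "d": "b", "e": "e"}

initial, cycle = measurement_cycle("a", H, PI)
```

Here the expansion of "a" stops at "d", whose immediate sub sentence "b" already occurred, so the measurement cycle is the segment from "b" to "d"; an atom yields a one-term cycle.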

3.6 The Compositionality Condition

An important property of the cyclical closure conditions is that they satisfy the compositionality condition Comp:

$$\text{Comp}: \quad C_w\bigl((h_{X_\alpha^y}(n))_{n \geq 0}\bigr) \iff C_w\bigl((h_{X_\alpha^y}(n))_{n \geq 1}\bigr)$$

The fact that the cyclical closure conditions satisfy Comp implies that closure, i.e. closedness and openness, is preserved along each expansion $h_{X_\alpha^y}$ of $X_\alpha^y$. That the cyclical closure conditions satisfy Comp is easily seen by an inspection of those conditions. In fact, the cyclical closure conditions have been devised to satisfy Comp, as from Comp we can prove that the cyclical closure conditions validate the assertoric rules. Before we do so, we introduce the set consisting of all semantic sub sentences of $X_\sigma^y$, which is obtained by taking the transitive closure of the immediate semantic sub sentence relation.

Proposition 2. The cyclical closure conditions validate the assertoric rules of $\mathcal{L}$.

Proof: We illustrate for $A_{\alpha \wedge \beta}$. Other cases are similar and left to the reader. Suppose that $A_{\alpha \wedge \beta}$ is open (in $w$). Then there is a strategy $f$ such that for all strategies $g$, the expansion $h_{A_{\alpha \wedge \beta}}$, where $h = f \cup g$, is open. Now the move at $A_{\alpha \wedge \beta}$ belongs to the player who plays $g$, and the strategies of that player can be bi-partitioned into strategies of type $g_\alpha$, which have $g(A_{\alpha \wedge \beta}) = A_\alpha$, and strategies of type $g_\beta$, which have $g(A_{\alpha \wedge \beta}) = A_\beta$. As $f$ results in an open outcome no matter whether the other player plays a strategy of type $g_\alpha$ or $g_\beta$, it follows that $f$ is such that for all $g$ we have that $O_w(\mathrm{out}(A_\alpha, f, g))$ and that $O_w(\mathrm{out}(A_\beta, f, g))$. Hence, $A_\alpha$ is open and $A_\beta$ is open.

Conversely, suppose that $A_\alpha$ is open and that $A_\beta$ is open. This means that there exists a strategy $f_\alpha$ such that for all $g$ we have that $O_w(\mathrm{out}(A_\alpha, f_\alpha, g))$ and that there exists a strategy $f_\beta$ such that for all $g$ we have that $O_w(\mathrm{out}(A_\beta, f_\beta, g))$. Let $f$ be any strategy which satisfies:

- if $X_\sigma^y$ is a semantic sub sentence of $A_\alpha$ on which $f$ is defined, then $f(X_\sigma^y) = f_\alpha(X_\sigma^y)$;
- if $X_\sigma^y$ is a semantic sub sentence of $A_\beta$ but not of $A_\alpha$ on which $f$ is defined, then $f(X_\sigma^y) = f_\beta(X_\sigma^y)$.

From Comp it follows that the constructed $f$ is such that for all $g$ we have that $O_w(\mathrm{out}(A_{\alpha \wedge \beta}, f, g))$.

Proposition 3. $\nu$ is Kripke correct.

Proof: From Proposition 2 and Proposition 1.

3.7 The Predicates N and B

Proposition 3 tells us that $\nu$ is Kripke correct. However, the notion of Kripke correctness does not tell us anything about the relation between a sentence $\sigma$ and a sentence which says of $\sigma$ that it is neither, respectively both. Let us make some remarks with respect to this relation. As a corollary of Proposition 3, we see that $\nu$ is Truth Correct (TC) with respect to the values $\mathbf{t}$ and $\mathbf{f}$. By this, we mean that:

$$\nu(T(t), w) = \mathbf{t} \iff \nu(\pi(t), w) = \mathbf{t}, \qquad \nu(F(t), w) = \mathbf{t} \iff \nu(\pi(t), w) = \mathbf{f}$$

Thus, as $\nu$ is TC with respect to $\mathbf{t}$ and $\mathbf{f}$, a sentence which says of $\sigma$ that $\sigma$ is true, respectively false, is true just in case $\sigma$ is true, respectively false. However, $\nu$ is not TC with respect to the values $\mathbf{n}$ and $\mathbf{b}$, as we can find a denotation function $\pi$ such that (for some $t$ and any $w$):

$$\nu(\pi(t), w) = \mathbf{n} \quad \text{while} \quad \nu(N(t), w) \neq \mathbf{t} \qquad (10)$$

For instance, consider a sentence $\pi(\eta)$ that is built from $N(\eta)$ and $T(\eta)$. By drawing the relevant assertoric expansions of $A_{\pi(\eta)}$ and $D_{\pi(\eta)}$, we see that for any world $w$, $\nu(\pi(\eta), w) = \mathbf{n}$ and $\nu(N(\eta), w) = \mathbf{n}$. Hence, we have a sentence, $\pi(\eta)$, with semantic value $\mathbf{n}$ while $N(\eta)$, a sentence which says that $\pi(\eta)$ has semantic value $\mathbf{n}$, is not true. Thus, $\nu$ is not TC with respect to $\mathbf{n}$. Neither is $\nu$ Falsity Correct (FC) with respect to $\mathbf{t}$ and $\mathbf{f}$, for we can find a denotation function $\pi$ such that (for some $t$ and any $w$):

$$\nu(\pi(t), w) \neq \mathbf{t} \ \text{ while } \ \nu(T(t), w) \neq \mathbf{f}, \qquad \nu(\pi(t), w) \neq \mathbf{f} \ \text{ while } \ \nu(F(t), w) \neq \mathbf{f}$$

For instance, the Liar has value $\mathbf{n}$, but so does a sentence which says of the Liar that it is true. Space and time preclude us from continuing our discussion of the interesting notions of Truth and Falsity Correctness. We now turn to the definition of the knight function $\kappa$, which is based on $\nu$.

3.8 The Knight Function $\kappa$

In our construction of the knight function, we will make an assumption with respect to the denotation function $\pi$. First, we define the language $\mathcal{L}_C$, which contains the same logical symbolism as $\mathcal{L}$, except for the fact that it contains no quotational constant symbols. A denotation function $\pi$ is said to be $C$ closed just in case for any $c \in C$ we have that $\pi(c) \in \mathrm{Sen}(\mathcal{L}_C)$. Thus, if $\pi$ is $C$ closed it is, for instance, forbidden that $\pi(c)$, for $c \in C$, is a sentence in which a semantic predicate is applied to a quoted sentence.

The assumption of $C$ closedness may be justified as follows. A question to a knight can be addressed in two senses: in a self-referential sense, for instance by letting $\pi(c) = \neg T(c) \vee p$ and by asking $\neg T(c) \vee p$, i.e., “you will not answer ‘yes’ to this question or p is the case”, or in a detached sense, by asking a sentence that ascribes truth to the quoted question, i.e., “you will answer ‘yes’ when you are asked the quoted question”. With a $C$ closed denotation function, one can ask a sentence of the first kind by constructing a sentence of $\mathcal{L}_C$, while a sentence of the second kind, and mixed sentences combining both kinds, can be asked by constructing a sentence of $\mathcal{L}$ that is not a sentence of $\mathcal{L}_C$. Thus, in a sense the fact that $\pi(c)$ cannot contain a quoted sentence under a $C$ closed denotation function is no real restriction. If by asking $\neg T(c)$ we mean asking the self-referential question ‘this sentence is not true’, one can take the Liar sentence $\neg T(\lambda)$. If by asking $\neg T(c)$ we mean asking something about the Liar in a detached sense, we can do so by asking a sentence in which the truth predicate is applied to the quoted Liar sentence. Now it may very well be possible to take care of the two senses, detached and self-referential, in the presence of an arbitrary denotation function. In such an approach, the structure of the denotation function itself will determine, for each sentence of $\mathcal{L}$, whether it is of self-referential or of detached type. However, for the sake of simplicity we will work with a $C$ closed denotation function.

That being said, the knight function $\kappa$ is defined as follows. We first define the atomic knight function $\kappa_0$, where $\kappa_0$, which maps each sentence in

$$P \cup \{T(c) \mid c \in C\} \cup \{F(c) \mid c \in C\} \cup \{N(c) \mid c \in C\} \cup \{B(c) \mid c \in C\},$$

relative to a world, to a value in $\mathbf{4}$,

is defined as follows. For each $\sigma$ in the domain of $\kappa_0$, we let $\kappa_0(\sigma, w) = \nu(\sigma, w)$. Next, we define the knight function $\kappa$ as the recursive extension of $\kappa_0$ according to the truth tables for the logical connectives and the following truth tables for sentences of form $T(\sigma)$, $F(\sigma)$, $N(\sigma)$ and $B(\sigma)$.

σ    T(σ)   F(σ)   N(σ)   B(σ)
t    t      f      f      f
f    f      t      f      f
n    f      f      t      f
b    f      f      f      t
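The four tables share one pattern: a sentence that ascribes a semantic value is true when the ascription matches and false otherwise. A minimal sketch of that lookup follows; the function names are hypothetical, and negation is given only on the classical values because the paper's four-valued table for negation is not reproduced in this section.

```python
# The four semantic values: true, false, neither, both.
VALUES = ("t", "f", "n", "b")

def ascribe(predicate, value):
    """Value of a sentence that applies `predicate` ('T', 'F', 'N' or 'B')
    to a sentence whose own value is `value`, following the tables above."""
    target = {"T": "t", "F": "f", "N": "n", "B": "b"}[predicate]
    if value not in VALUES:
        raise ValueError(f"unknown semantic value: {value!r}")
    return "t" if value == target else "f"

def negate_classical(value):
    """Negation restricted to the classical values t and f (an assumption of
    this sketch, sufficient for negating the output of `ascribe`)."""
    return {"t": "f", "f": "t"}[value]

# Ascribing truth to a sentence with value n yields f, and its negation t.
detached = ascribe("T", "n")
detached_negated = negate_classical(detached)
```

Because `ascribe` only ever returns a classical value, every ascription sentence has a classical truth value, which is the sense in which the reflection is classical.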

A quotational ascription of a semantic value to an atomic sentence in the domain of $\kappa_0$ is a reflection on the outcome of the downwards expansion process, i.e. a reflection on the values of $\nu$, and is, as illustrated by the truth tables for the semantic predicates, a classical reflection in the sense that all atomic sentences of form $T(\sigma)$, $F(\sigma)$, $N(\sigma)$ or $B(\sigma)$ have a classical truth value. For instance, we have that $\kappa(T(\lambda), w) = \mathbf{n}$ and that $\kappa(T(T(\lambda)), w) = \mathbf{f}$ and so $\kappa(\neg T(T(\lambda)), w) = \mathbf{t}$. In a sense, we cannot really say that the Liar is not true, while in another sense we can. The two senses alluded to are reflected in $\kappa$’s valuation of $\neg T(\lambda)$ as $\mathbf{n}$ and of $\neg T(T(\lambda))$ as $\mathbf{t}$.

The knight function is, from the perspective of formal theories of truth, an interesting function whose formal structure and (alethic) interpretation deserve attention. However, the alethic interpretation of $\kappa$ is not the topic of this paper and neither is it the topic of this paper to give a thorough formal analysis of $\kappa$. What is the topic of this paper is the construction of a plausible knight function, which is, so we claim, given by $\kappa$, and an application of this knight function to solve an interesting logic puzzle. It is to this application that we now turn.

4 Solving Logic Puzzles with $\kappa$

4.1 The Three Roads Riddle

Here is the three roads riddle, which is a version of the riddle presented in ([2]) that was mentioned in the introduction of this paper. Suppose that you are at a junction point of three roads, the left, right and middle road say. One of the roads is a good road, the other two are bad ones and for each road, you have no clue as to whether it is good or bad. At the junction point, there is a knight who knows which road is good. Can you find out which road is good by asking a single yes-no question to the knight? R&R show that you can, by asking the following question to the knight:

Is it the case that: (you will answer this question with ‘no’ and the left road is good) or the middle road is the good one?

R&R argue, in natural language using reductio ad absurdum, that an answer of ‘yes’ indicates that the middle road is good, an answer of ‘no’ indicates that the right road is good and that an explosion indicates that the left road is the good one. Our formal reconstruction of the question reflects the distribution of answers to the three possible “road situations”. We let $p_L, p_M, p_R \in P$ stand for the propositions that the left, respectively middle and right, road is good. The question of R&R is formally modeled as $\pi(c)$, where $c \in C$ is such that $\pi(c) = (F(c) \wedge p_L) \vee p_M$. That $\pi(c)$ is a solution to the riddle is illustrated by observing that, with ‘$\leftrightarrow$’ the double material implication which is defined as usual, for any world $w$ we have that:



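1. $\kappa(\pi(c), w) = \mathbf{t}$ ⇔ $p_M$ holds in $w$ ⇔ $\kappa(T(\pi(c)) \leftrightarrow p_M, w) = \mathbf{t}$
2. $\kappa(\pi(c), w) = \mathbf{f}$ ⇔ $p_R$ holds in $w$ ⇔ $\kappa(F(\pi(c)) \leftrightarrow p_R, w) = \mathbf{t}$
3. $\kappa(\pi(c), w) = \mathbf{n}$ ⇔ $p_L$ holds in $w$ ⇔ $\kappa(N(\pi(c)) \leftrightarrow p_L, w) = \mathbf{t}$

4.2 The Four Roads Riddle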

With the present formalism at hand, it is not hard to see that we can also solve the four roads riddle, which is just like the three roads riddle except for the fact that at the junction point, there are four roads: an east, west, north and south road. We will use $p_E$, $p_W$, $p_N$, $p_S$ to denote the proposition that, respectively, the east, west, north or south road is good. As demonstrated in ([5]) and ([8]) in detail, we can find out which of the four roads is good by asking the following question to the knight:

$$\theta = (T(\lambda) \wedge p_E) \vee (T(\tau) \wedge p_W) \vee p_N$$

The intuitive counterpart of $\theta$ is that we ask the knight:

Is it the case that:
(your answer to this question is not ‘no’ and east is good) or
(your answer to this question is ‘yes’ and west is good) or
north is good?

where an occurrence of ‘this question’ refers to the question that is on the same horizontal line as that occurrence of ‘this question’. It is left to the reader to verify, by drawing the possible assertoric expansions of $A_\theta$ and $D_\theta$, that question $\theta$ indeed does the job, as we have that:

1. $\kappa(\theta, w) = \mathbf{t}$ ⇔ $p_N$ holds in $w$ ⇔ $\kappa(T(\theta) \leftrightarrow p_N, w) = \mathbf{t}$
2. $\kappa(\theta, w) = \mathbf{f}$ ⇔ $p_S$ holds in $w$ ⇔ $\kappa(F(\theta) \leftrightarrow p_S, w) = \mathbf{t}$
3. $\kappa(\theta, w) = \mathbf{n}$ ⇔ $p_E$ holds in $w$ ⇔ $\kappa(N(\theta) \leftrightarrow p_E, w) = \mathbf{t}$
4. $\kappa(\theta, w) = \mathbf{b}$ ⇔ $p_W$ holds in $w$ ⇔ $\kappa(B(\theta) \leftrightarrow p_W, w) = \mathbf{t}$
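Returning to the three roads riddle of Section 4.1, the informal reductio argument attributed to R&R can be checked by brute force. The sketch below is not the knight function $\kappa$: it assumes a purely classical knight whose ‘yes’ must coincide with the truth of the question, and it uses the hypothetical label "explosion" for the case in which neither answer is consistent.

```python
def classical_answers(good_road):
    """Answers ('yes'/'no') that a truthful knight can give consistently to:
    "(you will answer this question with 'no' and the left road is good)
    or the middle road is good"."""
    consistent = []
    for answer in ("yes", "no"):
        question_is_true = (answer == "no" and good_road == "left") or good_road == "middle"
        # A truthful answer is 'yes' exactly when the question is true.
        if (answer == "yes") == question_is_true:
            consistent.append(answer)
    return consistent

def road_from_response(response):
    """Decode the knight's response; 'explosion' stands for no consistent answer."""
    table = {}
    for road in ("left", "middle", "right"):
        answers = classical_answers(road)
        key = answers[0] if len(answers) == 1 else "explosion"
        table[key] = road
    return table[response]
```

Each road situation yields a different response, which is why a single question suffices; the formal treatment above replaces the informal "explosion" with the value $\mathbf{n}$.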

5 Conclusion

Using the framework of assertoric semantics, we extended Smullyan’s notion of a (classical) knight so that the notion is also defined for non-classical languages in which we have the power to talk about all four possible answers of a knight, via the predicates ‘T’, ‘F’, ‘B’ and ‘N’, and in which we may ask arbitrary self-referential questions to the knight involving these predicates. We studied a notion of a knight with two non-classical answers, ‘neither’ and ‘both’, and we used this conception to formulate a new Smullyan-like riddle, which we called the four roads riddle, and we showed that it can be solved in one question by this paper’s notion of a knight. Many interesting philosophical and technical questions with respect to the function $\kappa$ have been left untouched by this paper. Philosophically, I take it that the most interesting question is what (the assertoric interpretation of) $\kappa$ tells us about our notion of truth. This paper is not the place to discuss such matters; let me just remark that I am convinced that $\kappa$ can be invoked to tell an anti-deflationary story about truth.

Let us conclude this paper with another Smullyan-like riddle, which adds a twist to the four roads puzzle. Suppose you are at the junction point with the four roads and that there are also two creatures present, a knight and a knave (an inverse knight; someone who always lies). Again, you want to take the good road and you have no idea which of the four roads is good. The knight and the knave know which road is good and you can query them. However, you do not know which of the two creatures is the knight and which is the knave and you are only allowed to ask a single question. Can you find a question that allows you to take the good road with certainty?

References

[1] Kripke, S.: Outline of a theory of truth. Journal of Philosophy 72, 690–716 (1975)
[2] Rabern, B., Rabern, L.: A simple solution to the hardest logic puzzle ever. Analysis 68, 105–112 (2008)
[3] Smullyan, R.: The Lady or the Tiger. Pelican Books (1983)
[4] Smullyan, R.: First-order Logic. Dover, New York (1995)
[5] Wintein, S.: Computing with self-reference. In: Proceedings of the A.G.P.C. 09 conference, available online (2009)
[6] Wintein, S.: On languages that contain their own ungroundedness predicate. To appear in: Logique et Analyse, also available in the online proceedings of ESSLLI’09 (2009)
[7] Wintein, S.: Assertoric alethic semantics and the six cornerstones of truth (submitted) (2010)
[8] Wintein, S.: Assertoric semantics and the computational power of self-referential truth (submitted) (2010)

The Algebraic Structure of Amounts: Evidence from Comparatives

Daniel Lassiter
New York University

Abstract. Heim [9] notes certain restrictions on quantifier intervention in comparatives and proposes an LF-constraint to account for this. I show that these restrictions are identical to constraints on intervention in wh-questions widely discussed under the heading of weak islands. I also show that Heim’s proposal is too restrictive: existential quantifiers can intervene. Both of these facts follow from the algebraic semantic theory of weak islands in Szabolcsi & Zwarts [25], which assigns different algebraic structures to amounts and counting expressions. This theory also makes novel predictions about the interaction of degree operators with conjunction and disjunction, which I show to be correct. Issues involving modal interveners [9], interval semantics for degrees [23,1], and density [4] are also considered.

Keywords: Comparatives, weak islands, degrees, algebraic semantics, quantification, disjunction.

T. Icard, R. Muskens (Eds.): ESSLLI 2008/2009 Student Sessions, LNAI 6211, pp. 38–56, 2010. © Springer-Verlag Berlin Heidelberg 2010

1 Introduction

1.1 Two Puzzles

Consider the sentence in (1):

(1) Every girl is less angry than Larry is.

A prominent theory of comparatives, associated with e.g. von Stechow [28] and Heim [9], predicts that (1) should be ambiguous between two readings, the first equivalent to (2a) and the second equivalent to (2b).

(2) a. For every girl, she is less angry than Larry is.
    b. The least angry girl is less angry than Larry is.

But as Kennedy [11] and Heim [9] note, the predicted reading in (2b) is absent: instead, (1) is unambiguously false if there is any girl who is angrier than Larry. The second puzzle involves the relationship between (3) and (4).

(3) John is richer than his father was or his son will be.
    a. OK if John has $1 million, his father has $1,000, and his son has $10,000.
    b. OK if John has $10,000, his father has $1,000, and his son has $1 million (e.g., with continuation “... but I’m not sure which one”).
(4) John is richer than his father was and his son will be.
    a. OK if John has $1 million, his father has $1,000, and his son has $10,000.
    b. * if John has $10,000, his father has $1,000, and his son has $1 million.

(3) is ambiguous between (3a) and (3b), while (4) is unambiguous. In other words, both (3) and (4) can be read as meaning that John is richer than the richer of his father and his son, but only (3) can be read as meaning that he is richer than the poorer of the two. This is also a problem for theories of comparatives that follow von Stechow’s and Heim’s assumptions, because they predict that (3) and (4) should both be ambiguous, and have the same two readings. Sentences like (3) have been taken as evidence that or is lexically ambiguous between the standard logical disjunction and an NPI interpreted as conjunction [23]. I will argue, first, that the ambiguity of (3) is a matter of scope rather than lexical ambiguity; and, second, that the fact that (4) does not display a similar ambiguity is explained by the same principles that rule out the unavailable reading of (1) that is paraphrased in (2b).

1.2 The Plan

Heim [9] states an empirical generalization that quantificational DPs may not intervene scopally between a comparative operator and its trace, essentially to explain the absence of reading (2b) of (1). She suggests that this can be accounted for by an intervention constraint at LF. I demonstrate that this generalization holds with some but not all quantifiers, and that the limitations and their exceptions match quite closely the distribution of interveners in weak islands. Following a brief suggestion in Szabolcsi [24], I show that the algebraic semantic theory of weak islands in Szabolcsi & Zwarts [25] predicts the observed restrictions on comparative scope without further stipulation, as well as the exceptions. This extension of the algebraic account to comparatives also predicts the existence of maximum readings of wh-questions and comparatives with existential quantifiers, and I show that this prediction is correct. In addition, it accounts for the asymmetry between conjunction and disjunction in the comparative complement that we saw in (3) and (4). Certain strong modals appear to provide counter-examples; however, I suggest that the problem lies not in the semantics of comparison but in the analysis of modals.

A great deal of work has been done since 1993 on both comparative scope and weak islands, and we might suspect that the problem discussed here can be avoided by adopting one of these more recent proposals. In the penultimate section I survey two influential accounts of the semantics of degree, one relating to comparatives [23] and the other to negative islands [4]. I suggest that neither of these proposals accounts for the data at issue, but both are compatible with, and in need of, a solution along the lines proposed here.

2 Comparatives and Weak Islands

2.1 Preliminaries

Suppose, following Heim [9], that gradable adjectives like angry denote relations between individuals and degrees.

(5) ⟦angry⟧ = λdλx[angry(d)(x)]

[angry(d)(x)] can be read as “x’s degree of anger is (at least) d”. As the presence of “at least” in the previous sentence suggests, we also assume with Heim that expressions of degree in natural language follow the monotonicity condition in (6).

(6) A function f is monotone iff: ∀x∀d∀d′[(f(d)(x) ∧ d′ < d) → f(d′)(x)]

(5) is not uncontroversial, but it is a reasonably standard analysis, and we will examine its relationship to some alternative accounts in the penultimate section. (6) is required to accommodate, e.g., the acceptability of true but underinformative answers to questions: Is your son three feet tall? Yes, in fact he is four feet tall. I also assume, following von Stechow [28] and Heim [9], that more/-er comparatives are evaluated by examining the maxima of two sets of degrees:

(7) ⟦max⟧ = λD ιd∀d′[D(d′) → d ≥ d′]

More takes two sets of degrees as arguments and returns 1 iff the maximum of the second (the main clause) is greater than the maximum of the first (the than-clause). Less does the same, except that the ordering is reversed.

(8) a. ⟦more/-er⟧ = λD⟨d,t⟩ λD′⟨d,t⟩ [max(D′) > max(D)]
    b. ⟦less⟧ = λD⟨d,t⟩ λD′⟨d,t⟩ [max(D′) < max(D)]
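The definitions in (5) to (8) can be sketched over a finite scale. This is an illustration only: the integer scale `SCALE`, the helper names and the example values are assumptions made here, and degree sets are modeled as Python sets so that the monotonicity condition (6) amounts to downward closure.

```python
SCALE = range(0, 11)  # hypothetical finite scale of degrees 0..10

def degree_set(amount, scale=SCALE):
    """Degrees d on the scale such that the individual has the property to at
    least degree d; the result is downward closed, as (6) requires."""
    return {d for d in scale if d <= amount}

def more(than_clause, main_clause):
    """(8a): true iff the maximum of the main clause exceeds that of the than-clause."""
    return max(main_clause) > max(than_clause)

def less(than_clause, main_clause):
    """(8b): the same comparison with the ordering reversed."""
    return max(main_clause) < max(than_clause)

# "Larry is angrier than Bill is", with hypothetical anger levels 7 and 5.
larry, bill = degree_set(7), degree_set(5)
larry_angrier_than_bill = more(bill, larry)
```

The argument order follows (8): the than-clause set comes first and the main-clause set second.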

Finally, we assume that more/-er forms a constituent with the than-clause to the exclusion of the adjective, and that typical cases of (at least clausal) comparatives involve ellipsis, so that Larry is angrier than Bill is = Larry is angry [-er than Bill is angry].

2.2 Quantificational Interveners and the Heim-Kennedy Constraint

Once we consider quantifiers, the treatment of gradable adjectives and comparatives outlined briefly in the previous subsection immediately generates the puzzle in (1)-(2). To see this, note first that, in order for more/-er to get its second argument, it must have undergone QR above the main clause. The difference between the two predicted readings depends on whether the quantifier every girl raises before or after the comparative clause. So, for example, Heim would assign the first reading of (1) the LF in (9a). The alternative reading is generated when the comparative clause raises to a position higher than the quantifier every girl.

(9) Every girl is less angry than Larry is.
    a. Direct scope: every girl > less > d-angry
       ∀x[girl(x) → [max(λd.angry(d)(x)) < max(λd.angry(d)(Larry))]]
       “For every girl x, Larry is angrier than she is.”
    b. Scope-splitting: less > every girl > d-angry
       max(λd.∀x[girl(x) → angry(d)(x)]) < max(λd.angry(d)(Larry))
       * “Larry’s max degree of anger exceeds the greatest degree to which every girl is angry (i.e., he is angrier than the least angry girl).”

If (9) had the “scope-splitting” reading in (9b), it would be true (on this reading) if the least angry girl is less angry than Larry. However, (9) is clearly false if any girl is angrier than Larry. Heim [9] suggests that the unavailability of (9b) and related data can be treated as an LF-constraint along the lines of (10) (cf. [26]):

(10) Heim-Kennedy Constraint (HK): A quantificational DP may not intervene between a degree operator and its trace.

The proposed constraint (10) attempts to account for the unavailability of (9b) (and similar facts with different quantifiers) by stipulating that the quantificational DP every girl may not intervene between the degree operator less and its trace d-angry. The puzzle is what syntactic or semantic principles explain this constraint given that structures such as (9b) are semantically unexceptionable on our assumptions.

2.3 Similarities between Weak Islands and Comparative Scope

As Rullmann [21] and Hackl [7] note, there are considerable similarities between the limitations on the scope of the comparative operator and the core facts of weak islands discussed by Kroch [14] and Rizzi [20], among many others. Rullmann notes the following patterns:

(11) a. I wonder how tall Marcus is / # isn’t.
     b. I wonder how tall this player is / # no player is.
     c. I wonder how tall every player is / # few players are.
     d. I wonder how tall most players are / # fewer than ten players are.
     e. I wonder how tall many players are / # at most ten players are.

(12) a. Marcus is taller than Lou is / # isn’t.
     b. Marcus is taller than this player is / # no player is.
     c. Marcus is taller than every player is / # few players are.
     d. Marcus is taller than most players are / # fewer than ten players are.
     e. Marcus is taller than many players are / # at most ten players are.

These similarities are impressive enough to suggest that a theory of the weak island facts in (11) should also account for the limitations on comparatives in (12). Rullmann suggests that the unavailability of the relevant examples in (11) and (12) is due to semantic, rather than syntactic, facts. Specifically, both wh-questions and comparatives make use of a maximality operation, roughly as in (13):

(13) a. I wonder how tall Marcus is.
        I wonder: what is the degree d s.t. d = max(λd.Marcus is d-tall)?
     b. Marcus is taller than Lou is.
        (ιd[d = max(λd.Marcus is d-tall)]) > (ιd[d = max(λd.Lou is d-tall)])

With these interpretations of comparatives and questions, we predict that the sentences in (14) should be ill-formed because each contains an undefined description:

(14) a. # I wonder how tall Marcus isn’t.
        I wonder: what is the degree d s.t. d = max(λd.Marcus is not d-tall)?
     b. # Marcus is taller than Lou isn’t.
        max(λd.Marcus is d-tall) > max(λd.Lou is not d-tall)

If degrees of height are arranged on a scale from zero to infinity, there can be no maximal degree d such that Marcus or Lou is not d-tall. However, the similarities between comparatives and weak island-sensitive expressions such as how tall go deeper than Rullmann’s discussion would indicate. S&Z point out that several of the acceptable examples in (11) do not have all the readings predicted by the logically possible orderings of every player and how tall. As it turns out, the same scopal orders are also missing in the corresponding comparatives when we substitute -er for how tall. For example,

(15) I wonder how tall every player is.
     a. every player > how tall > d-tall
        “For every player x, I wonder: what is the max degree d s.t. x is d-tall?”
     b. how tall > every player > d-tall
        “I wonder: what is the degree d s.t. d = max(λd.every player is d-tall)?”

A complete answer to (15a) would involve listing all the players and their heights. In contrast, an appropriate response to (15b) would be to intersect the heights of all the players and give the maximum of this set, i.e. to give the height of the shortest player. This second reading is clearly not available. In fact, although S&Z and Rullmann do not notice, similar facts hold for the corresponding comparative:

(16) Marcus is taller than every player is.
     a. every player > -er > d-tall
        “For every player, Marcus is taller than he is.”
     b. -er > every player > d-tall
        # “Marcus’ max height is greater than the max height s.t. every player is that tall, i.e. he is taller than the shortest player.”

Rullmann’s explanation does not exclude the unacceptable (16b): unlike comparatives with an intervening negation, there is a maximal degree d s.t. every player is d-tall on Rullmann’s assumptions, namely the height of the shortest player. Note in addition that (16) is identical in terms of scope possibilities to our original comparative scope-splitting example in (1)/(9), although its syntax is considerably different. Like (9), (16) falls under Heim’s proposed LF-constraint (10), which correctly predicts the unavailability of (16b).¹
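The difference between the two scope orders in (16) can be made concrete with a small computation. This is a sketch under assumptions made here: heights are hypothetical integers in centimetres, degree sets are downward-closed sets of integers, and the function names are illustrative.

```python
def degrees(height, scale):
    """All degrees d on `scale` such that an individual of this height is d-tall."""
    return {d for d in scale if d <= height}

def direct_scope(marcus, players, scale):
    """(16a): for every player x, Marcus's maximal degree exceeds x's."""
    return all(max(degrees(marcus, scale)) > max(degrees(p, scale)) for p in players)

def scope_splitting(marcus, players, scale):
    """(16b): intersect the degree sets of all players, then compare maxima."""
    shared = set(scale)
    for p in players:
        shared &= degrees(p, scale)
    return max(degrees(marcus, scale)) > max(shared)

SCALE = range(0, 251)            # hypothetical scale in centimetres
MARCUS, PLAYERS = 190, [180, 200, 210]

reading_a = direct_scope(MARCUS, PLAYERS, SCALE)
reading_b = scope_splitting(MARCUS, PLAYERS, SCALE)
```

With these values the direct-scope reading is false, since two players are taller than Marcus, while the scope-splitting computation comes out true because the intersection tops out at the shortest player's height. That mismatch is exactly what the unattested reading (16b) would express.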

3 Comparative Scope and the Algebraic Structure of Amounts

3.1 Szabolcsi & Zwarts’ (1993) Theory of Weak Islands

Like Rullmann [21], S&Z argue that no syntactic generalization can account for the full range of weak islands, and propose to account for them in semantic terms. They formulate their basic claim as follows:

(17) Weak island violations come about when an extracted phrase should take scope over some intervener but is unable to.

S&Z explicate this claim in algebraic terms, arguing that weak islands can be understood if we pay attention to the operations that particular quantificational elements are associated with. For instance,

(18) Universal quantification involves taking intersections (technically, meets).
     Existential quantification involves taking unions (technically, joins).
     Negation involves taking complements.

(18) becomes important once we assign algebraic structures as denotations to types of objects, since not all algebraic operations are defined for all structures. The prediction is that a sentence will be semantically unacceptable, even if it can be derived syntactically, if computing it requires performing an operation on a structure for which the operation is not defined. S&Z illustrate this claim with the verb behave:

(19) a. How did John behave?
     b. *How didn’t John behave?
     c. How did everyone behave?
        i. For each person, tell me: how did he behave?
        ii. *What was the behavior exhibited by everyone?

¹ We might think to appeal to the fact that comparative clauses are islands to extraction in order to account for the missing readings in (16), but this would give the wrong results: it would rule out (16a) instead of (16b).

Behave requires a complement that denotes a manner. S&Z suggest that manners denote in a free join semilattice, as Landman [15] does for masses.

(20) Free join semilattice

                [a⊕b⊕c]

        [a⊕b]    [a⊕c]    [b⊕c]

         [a]      [b]      [c]
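The structure in (20) can be modeled, as a rough sketch, by the non-empty sets of three atoms ordered by inclusion. The encoding as Python frozensets and the convention of returning `None` for an undefined meet are assumptions of this illustration, not part of S&Z's proposal.

```python
from itertools import combinations

ATOMS = ("a", "b", "c")

# Elements of the free join semilattice in (20): non-empty sets of atoms.
ELEMENTS = {
    frozenset(c)
    for r in range(1, len(ATOMS) + 1)
    for c in combinations(ATOMS, r)
}

def join(x, y):
    """Join (union) of two elements; always an element of the structure."""
    return x | y

def meet(x, y):
    """Meet (intersection); returns None when the result falls outside the
    structure, i.e. when the two elements share no atom."""
    z = x & y
    return z if z in ELEMENTS else None

a = frozenset({"a"})
bc = frozenset({"b", "c"})
joined = join(a, bc)   # the top element [a⊕b⊕c]
met = meet(a, bc)      # undefined in this structure
```

Because the empty set is not an element, the structure has no bottom, so intersection is only a partial operation, and complement would likewise lead outside the structure for the top element.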

A noteworthy property of (20) is that it is closed under union, but not under complement or intersection. For instance, the union (technically, join) of [a] with [b⊕c] is [a⊕b⊕c], but the intersection (meet) of [a] with [b⊕c] is not defined. The linguistic relevance of this observation is that it corresponds to our intuitions of appropriate answers to questions about behavior. In S&Z’s example, suppose that three people displayed the following behaviors:

(21) John behaved kindly and stupidly.
     Mary behaved rudely and stupidly.
     Jim behaved loudly and stupidly.

If someone were to ask: “How did everyone behave?”, interpreted with how taking wide scope as in (19c-ii), it would not be sufficient to answer “stupidly”. The explanation for this, according to S&Z, is that computing the answer to this question on the relevant reading would require intersecting the manners in which John, Mary and Jim behaved, but intersection is not defined on (20). This, then, is an example of when “an extracted phrase should take scope over some intervener but is unable to”. Similarly, (19b) is unacceptable because complement is not defined on (20).

Extending this account to amounts is slightly trickier, since amounts seem to come in two forms. In the first, which S&Z label “counting-conscious”, wh-expressions are able to take scope over universal quantifiers. S&Z imagine a situation in which a swimming team is allowed to take a break when everyone has swum 50 laps. In this situation it would be possible to ask:

(22) [At least] How many laps has every swimmer covered by now?

If the number of laps covered by the slowest swimmer is a possible answer, then counting-conscious amount expressions had better denote in a structure in which intersection is defined.² The lattice in (23), essentially the structure normally assumed for all degrees, seems to be an appropriate choice.

(23) Lattice

Intersection and union are defined in this structure, though complement is not. This analysis predicts correctly that how many/much should be able to take scope over existential quantification but not negation.

(24) a. How many laps has at least one swimmer covered by now? [Answer: the number of laps covered by the fastest swimmer.]
     b. *How many laps hasn’t John covered by now?

3.2 Extending the Account to Comparatives

Many authors assume that amounts always denote in (23) or some other structure D, ≤ for D ⊆ R. The problem we began with — why can’t Every girl is less tall than John mean “The shortest girl is shorter than John”? — relied tacitly on this assumption, which entails that it should be possible to intersect sets of degrees. I would like to suggest an alternative: heights and similar amounts do not denote in (23), but in a poorer structure for which intersection is not defined, as S&Z claim for island-sensitive amount wh-expressions. As S&Z note, such a structure is motivated already by the existence of non-counting-conscious amount wh-expressions which are sensitive to a wider variety of interveners than how many was in the examples in (22) and (24). This is clear, for example, with amounts that are not associated with canonical measures: (25) How much pain did every student endure? a. “For every student, how much pain did he/she endure?” b. * “What is the greatest amount of pain s.t. every student endured that much, i.e. how much was endured by the one who endured the least?” The unacceptability of (25b) is surprising given that the degree expression is able to take wide scope in the overtly similar (22). S&Z argue that, unless counting is involved, amount expressions denote in a join semilattice: 2

As Anna Szabolcsi (p.c.) suggests, the use of canonical measures like laps, kilos, meters may help bring out this reading.

46

D. Lassiter

(26) Join semilattice

[a + b + c + d] (= 4) [a + b + c] (= 3)

[a + b] (= 2) [a] (= 1)

[d]

[c]

[b]

(26) should be seen as a structure collecting arbitrary unit-sized bits of stuff, abstracting away from their real-world identity, like adding cups of milk to a recipe (S&Z pp. 247-8). An important formal property of (26) is that “if p is a proper part of q, there is some part of q (the witness) that does not overlap with p” (p. 247). As a result, intersection is not defined unless the objects intersected are identical. S&Z claim that this fact is sufficient to explain the unavailability of (25b), since the amounts of pain endured by the various students, being elements of (26), cannot be intersected.

This explanation for the unavailability of reading (25b) relies on a quite general proposal about the structure of amounts. As a result, it predicts that amount-denoting expressions should show similar behavior wherever they appear in natural language, and not only in wh-expressions. The similarities between amount-denoting wh-expressions and comparatives, then, are explained in a most straightforward way: certain operations are not defined on amount-denoting expressions because of the algebraic structure of their denotations, regardless of the other details of the expressions they are embedded in. So, returning to (9),

(27) Every girl is less angry than Larry is.
     Scope-splitting: less > every girl > d-angry
     max(λd.angry(Larry)(d)) > max(λd.∀x[girl(x) → angry(x)(d)])
     *“Larry’s max degree of anger exceeds the greatest degree to which every girl is angry (i.e., he is angrier than the least angry girl).”

This interpretation is not available because, in the normal case, max(λd.∀x[girl(x) → angry(x)(d)]) is undefined. I conclude that the puzzle described by the Heim-Kennedy constraint was not a problem about the scope of a particular type of operator, but was generated by incorrect assumptions about the nature of amounts. Amounts are not simply points on a scale, but elements of (26).
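The claim that intersection on (26) is defined only for identical objects can be sketched as a partial operation. This is a deliberately minimal illustration under stated assumptions: amounts are treated as opaque labels, `None` stands in for "undefined", and the function name and the sample data are hypothetical rather than anything given by S&Z.

```python
from typing import Hashable, Iterable, Optional


def meet_of_amounts(amounts: Iterable[Hashable]) -> Optional[Hashable]:
    """Return the common amount if all inputs are identical, else None.

    None stands for "undefined": on the assumption adopted from the text,
    distinct elements of (26) cannot be intersected.
    """
    distinct = set(amounts)
    if len(distinct) == 1:
        return next(iter(distinct))
    return None


# Hypothetical anger amounts for three girls; the labels are opaque elements of (26).
different = {"girl_1": "amount_p", "girl_2": "amount_q", "girl_3": "amount_r"}
identical = {"girl_1": "amount_p", "girl_2": "amount_p", "girl_3": "amount_p"}

print(meet_of_amounts(different.values()))  # None
print(meet_of_amounts(identical.values()))  # amount_p
```

In the first case the argument of max in the scope-splitting formula has no value, which is one way to picture why the reading in (27) is unavailable in the normal case.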
This proposal is independently motivated in S&Z, and it explains the restrictions captured in HK as well as other similarities between comparatives and weak islands. A small but important exception to this generalization is that, since A∩A = A for any A, intersections are defined on (26) in the special case in which all individuals in the domain of quantification are mapped to the same point of the join semilattice. The prediction, then, is that the scope-splitting readings of (25) and (27) should emerge just in case it is assumed that all individuals have the property in question to the same degree. And indeed, S&Z note that there is a third reading of sentences like (25) which presupposes that the students all endured the same amount of pain (see also Abrusán [1] for discussion).³ Though S&Z do not make this connection, it seems that their theory makes another correct prediction in this domain: universal quantifiers can intervene when this presupposition is appropriate.⁴

³ We cannot tell whether the comparative in (27) has this reading, since it is truth-conditionally equivalent to the direct scope reading.

⁴ Thanks to an anonymous reviewer for emphasizing the importance of the reading which contains this “very demanding presupposition”, and to Roberto Zamparelli for pointing out that this is an unexpected and welcome prediction of S&Z’s account.

3.3 Maximum Readings of Existential Quantifiers and Their Kin

The crucial difference between join semilattices (26) and chains (23) is that the latter are closed under intersection while the former are not. However, both are closed under union, which corresponds to existential quantification. Here the predictions of the present theory diverge from those of the LF-constraint in (10): on our theory, existential quantifiers and other quantifiers which are computed using only unions should be acceptable as interveners in both comparatives and amount wh-questions. (10), in contrast, forbids all quantificational DPs from intervening scopally. In fact we have already seen an example which shows a quantificational intervener of this type: the only available reading of (24a) is one which requests the number of laps finished by the fastest swimmer. These readings are also available with amount wh-questions, as (28) shows.

(28) How tall is at least one boy in your class? [Answer: 6 feet.]

This reading does not seem to be available with the quantifier some. However, this is probably due to the fact that, for independent reasons, some in (29) can only have a choice reading.

(29) How tall is some professor?
     a. “Pick a professor and tell me: How tall is (s)he?”
     b. # “How tall is the tallest professor?”

The readings in (24) and (28) may be more robust than the corresponding reading in (29b) because at least, in contrast to some, does not support choice readings (cf. Groenendijk & Stokhof [6]). Maximum readings also appear with a NP when the NP is focused:

(30) [The strongest girl lifted 80 kg.] OK, but how much did a BOY lift? [Answer: the amount that the strongest boy lifted.]

Since our account emphasizes the similarities between comparatives and wh-questions, we expect that similar readings should exist also in comparatives. These readings are indeed attested: for instance, Heim [9, p.223] notes this example.

(31) Jaffrey is closer to an airport than it is to a train station.

This is true iff the closest airport to Jaffrey is closer than the closest train station. An additional naturally occurring example seems to display a maximum reading with an existential quantifier in a comparative complement:

(32) “I made the Yankee hat more famous than a Yankee can.” (Jay-Z, “Empire State of Mind”, The Blueprint 3, Roc Nation, 2009)

From the context of this song, the artist is clearly not saying that he made the Yankees hat more famous than some particular Yankee (baseball player) can, or more than a typical Yankee can, but more than any Yankee can. Finally, if any and ever in contexts such as the following have existential semantics, as claimed in Kadmon & Landman [10] and many others, then (33a) and (33b) also involve intervention by an existential quantifier:

(33) a. Larry endured more pain than any professor did.
     b. My brother is angrier than I ever was.

Maximum readings, then, are well-attested in both amount comparatives and amount wh-questions. This supports the present theory in treating quantifier scope in comparatives and wh-questions in the same way, and is incompatible with the LF constraint (10) proposed by Heim.
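The link between existential quantification, union, and maximum readings can be made concrete with a small sketch. It assumes, purely for illustration, that degrees are whole numbers of laps on a chain like (23), that each swimmer's degree set contains every number of laps up to the total covered, and that the swimmer names and lap counts are made up. The universal case is computed only because integers form a chain; on the present proposal the corresponding operation is unavailable for amounts that denote in (26).

```python
from functools import reduce


def degree_set(total):
    """All degrees d (whole laps) such that the swimmer has covered at least d laps."""
    return frozenset(range(1, total + 1))


# Hypothetical lap counts.
laps = {"swimmer_1": 50, "swimmer_2": 42, "swimmer_3": 61}
sets = [degree_set(total) for total in laps.values()]

# Existential quantifier: union of the degree sets (join).
some_swimmer = reduce(frozenset.union, sets)

# Universal quantifier: intersection of the degree sets (meet, chain only).
every_swimmer = reduce(frozenset.intersection, sets)

print(max(some_swimmer))   # 61
print(max(every_swimmer))  # 42
```

The maximum of the union is the figure for the fastest swimmer, as in (24a), and the maximum of the intersection is the figure for the slowest swimmer, as in the counting-conscious question (22).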

4 Conjunction and Disjunction in the Comparative Complement

Noting an ambiguity similar to that in (3) – where a sentence with or in the comparative complement is equivalent on one reading to the same sentence with and replacing or – Schwarzschild & Wilkinson [23] suggest:

    or in these examples may in fact be a negative polarity item ... which has a conjunctive interpretation in this context, in the way that negative polarity any or ever seem to have universal interpretations in the comparative.

The difference between ordinary or meaning ‘∨’ and NPI or meaning ‘∧’, then, is a matter of lexical ambiguity.⁵ This is a possible analysis, but I think that it would be desirable to derive the ambiguity of (3) without positing two meanings of or, in line with Grice’s [5] Modified Occam’s Razor (“Senses are not to be multiplied beyond necessity”). The present theory yields an explanation of this fact, and of why similar constructions with and are unambiguous. The only available reading of the sentence in (4) is the one in (34a), where (34) is treated as an elliptical variant of (35).

(34) John is richer than his father was and his son will be.
     a. max(λd[rich(d)(father)]) < max(λd[rich(d)(John)]) ∧ max(λd[rich(d)(son)]) < max(λd[rich(d)(John)])

(35) John is richer than his father was and he is richer than his son will be.

Why can’t (34) be read as in (36), which ought to mean that John is richer than the poorer of his father and his son?

(36) max(λd[rich(d)(father) ∧ rich(d)(son)]) < max(λd[rich(d)(John)])

The theory we have adopted offers an explanation: because conjunction, like universal quantification, relies on the operation of intersection, (36) is not available for the same reason that Every girl is less angry than Larry doesn’t mean that the least angry girl is less angry than Larry. Computing (36) would require taking the intersection of the degrees of wealth of John’s father and his son; but this is not possible, because this operation is not defined for amounts of wealth. In contrast, the same sentence with or instead of and is ambiguous because the maximum reading (37a) can be computed using only unions, which are defined in a join semilattice. (37b) is the alternative reading on which (37) is elliptical, like the only available reading of (34).

(37) John is richer than his father was or his son will be.
     a. max(λd[rich(d)(father) ∨ rich(d)(son)]) < max(λd[rich(d)(John)])
        [“He is richer than both.”]
     b. max(λd[rich(d)(father)]) < max(λd[rich(d)(John)]) ∨ max(λd[rich(d)(son)]) < max(λd[rich(d)(John)])
        [“He is richer than one or the other, but I don’t remember which.”]

Thus it is not necessary to treat or as lexically ambiguous: the issue is one of scope.

⁵ This claim is important to Schwarzschild & Wilkinson [23] because their semantics for comparatives prevents scope ambiguities of the type considered here from being computed, so that they have no choice but to treat or as ambiguous. However, Schwarzschild [22] notes that the earlier proposal in Schwarzschild & Wilkinson [23] wrongly predicts that there should be no scope ambiguities of the type considered here, and proposes an enrichment that is compatible with the analysis of the ambiguity of sentences with or and the non-ambiguity of the corresponding sentences with and suggested here. See Section 6.1 for details.
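The truth conditions in (34a), (36), (37a) and (37b) can be compared on a toy model. The wealth values below are hypothetical, degrees are whole units, and each person's degree set contains every degree up to their wealth. The line for (36) is computable here only because integers form a chain; the proposal in the text is precisely that this intersection is not defined for amounts of wealth.

```python
def degrees(wealth):
    """All whole-unit degrees d such that the person is at least d-rich."""
    return set(range(1, wealth + 1))


# Hypothetical wealth in arbitrary units.
father, son, john = 5, 8, 7
john_max = max(degrees(john))

# (37a): "or" inside max, computed with a union.
reading_37a = max(degrees(father) | degrees(son)) < john_max

# (37b): elliptical reading, a disjunction of two comparisons.
reading_37b = max(degrees(father)) < john_max or max(degrees(son)) < john_max

# (34a): elliptical reading, a conjunction of two comparisons.
reading_34a = max(degrees(father)) < john_max and max(degrees(son)) < john_max

# (36): "and" inside max, computed with an intersection (the unavailable reading).
reading_36 = max(degrees(father) & degrees(son)) < john_max

print(reading_37a, reading_37b)  # False True
print(reading_34a, reading_36)   # False True
```

With these values the two readings of (37) come apart, and so would (34a) and (36) if the latter were available, since (36) compares John only with the poorer of the two.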

5 Modals and Intensional Verbs

On S&Z’s theory, existential quantifiers are able to intervene with amount expressions because join semilattices are closed under unions. This produces the “maximum” readings that we have seen. The analysis predicts, correctly, that (38) is ambiguous, a type of case discussed at length in Heim [9].

(38) (This draft is 10 pages.) The paper is allowed to be exactly 5 pages longer than that. [9, p.224]
     a. allowed > exactly 5 pages -er > that-long
        ∃w ∈ Acc: max(λd[long_w(p, d)]) = 15pp
        “In some accessible world, the paper is exactly 15 pages long, i.e. it may be that long and possibly longer”
     b. exactly 5 pages -er > allowed > that-long
        max(λd[∃w ∈ Acc: long_w(p, d)]) = 15pp
        “The max length of the paper in any accessible world, i.e. its maximum allowable length, is 15 pages”

I am not entirely certain whether the corresponding wh-question is ambiguous: some speakers think it is, and others do not. The robust reading is (39b), which involves scope-splitting; the questionable reading is the choice reading (39a).

(39) How long is the paper allowed to be?
     a. allowed > how long > that-long
        ? “Pick an accessible world and tell me: what is the length of the paper in that world?” [Answer: “For example, it could be 17 pages long.”]
     b. how long > allowed > that-long
        “What is the max length of the paper in any accessible world, i.e. its maximum permissible length?” [Answer: “20 pages – no more will be accepted.”]

If the answer given to (39a) is not possible, this may again be related to restrictions on the availability of choice readings, or possibly allowed does not have a quantificational semantics at all (as I will suggest for required).

So far, so good. However, since intersection is undefined with amount expressions, S&Z and the current proposal seem to predict that this ambiguity should be absent with universal modals, so that neither (40b) nor (41b) should be possible.

(40) (This draft is 10 pages.) The paper is required to be exactly 5 pages longer than that. [9, p.224]
     a. required > exactly 5 pages -er > that-long
        ∀w ∈ Acc: max(λd[long_w(p, d)]) = 15pp
        “In every world, the paper is exactly 15 pages long”
     b. exactly 5 pages -er > required > that-long
        max(λd[∀w ∈ Acc: long_w(p, d)]) = 15pp
        “The max common length of the paper in all accessible worlds, i.e. its length in the world in which it is shortest, is 15 pages”

(41) How long is the paper required to be?
     a. required > how long > that-long
        “What is the length s.t. in every accessible world, the paper is exactly that long?”
     b. how long > required > that-long
        “What is the max common length of the paper in all accessible worlds, i.e. its length in the world in which it is shortest?”
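The four readings in (38) and (40) can be sketched over a finite set of accessible worlds, each represented simply by the length of the paper in that world. The world sets and function names are hypothetical, and the sketch adopts the quantificational treatment of modals together with a chain of page counts, so that the maximum of a union is the largest length and the maximum of an intersection is the smallest.

```python
def allowed_38a(lengths, target):
    """(38a): in some accessible world the paper is exactly `target` pages long."""
    return any(length == target for length in lengths)


def allowed_38b(lengths, target):
    """(38b): the maximum length across accessible worlds is `target`."""
    return max(lengths) == target


def required_40a(lengths, target):
    """(40a): in every accessible world the paper is exactly `target` pages long."""
    return all(length == target for length in lengths)


def required_40b(lengths, target):
    """(40b): the length in the world where the paper is shortest is `target`."""
    return min(lengths) == target


# Hypothetical paper lengths in the accessible worlds.
permitted = [12, 15, 20]
print(allowed_38a(permitted, 15))  # True
print(allowed_38b(permitted, 15))  # False

demanded = [15, 17, 20]
print(required_40a(demanded, 15))  # False
print(required_40b(demanded, 15))  # True
```

In this sketch any non-empty list that satisfies `required_40a` also satisfies `required_40b`, since a list whose members all equal the target has the target as its minimum.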

But (40b) and (41b) are possible readings, and in fact are probably the most robust interpretations of (40) and (41). S&Z suggest briefly that modals and intensional verbs may be acceptable interveners because they do not involve Boolean operations. I am not sure precisely what they have in mind, but we should consider the possibility that the issue with (40)-(41) is not a problem about the interaction between degree operators and universal quantification, but one about the analysis of modals and intensional verbs.

For instance, suppose that we were to treat modals not as quantifiers but as degree words, essentially as minimum- and maximum-standard adjectives as discussed in Kennedy & McNally [13] and Kennedy [12]. This analysis is motivated on independent grounds in Lassiter [17].⁶ For reasons of space the theory will not be described in detail, but – on one possible implementation of the analysis for allowed and required – its predictions are these: only the (b) readings of (40)-(41) should be present, and it is (40a) and (41a) that need explanation. In the case of (40), at least, the (a) reading entails the (b) reading, and so we may suppose that only (40b) is generated, and its meaning is “Fifteen pages is the minimum requirement”.⁷ On such an analysis, (40a) is not a different reading of (40) but merely a special case of (40b), where we have further (e.g., Gricean) reasons to believe that the minimum is also a maximum.⁸

The advantage of such an analysis, from the present perspective, is that it explains another stipulative aspect of Heim’s proposed LF-constraint: why is the restriction limited to quantificational DPs? If the proposal I have gestured at in this section is correct, we have an answer: “universal” modals are immune to the prohibition against taking intersections with amounts because they are not really quantifiers. That is, computing them does not involve universal quantification over worlds, but simply checking that the degree (of probability, obligation, etc.) of a set of worlds lies above a particular (relatively high) threshold.

Even if this particular suggestion turns out to be incorrect, the difference between universal quantifiers and universal modals noted by S&Z and Heim is an unexplained problem for all available theories of comparatives and weak islands, and not just the proposal made here. The general point of this section is simply that the setup of this problem relies on one particular theory of modals which may well turn out to be incorrect. Fleshing out a complete alternative is unfortunately beyond the scope of this paper, however.

⁶ A proposal due to van Rooij [27] and Levinson [18] seems to make similar predictions for want. This may account for the fact that want, like require, is a scope-splitting verb.

⁷ The suggestion made here also recalls a puzzle noted by Nouwen [19] about minimum requirements and what Nouwen calls “existential needs”. Need is another scope-splitting verb, as it happens. I suspect that Nouwen’s problem, and the solution, is the same as in the modal scope-splitting cases discussed here.

⁸ An anonymous reviewer notes that this account does not extend from required to should, citing the following puzzling data also discussed in Heim [9]:
   (i) Jack drove faster than he was required to.
   (ii) Jack drove faster than he should have.
If the law requires driving at speeds between 45 mph and 70 mph, (i) is naturally interpreted as saying that Jack drove more than 45 mph, but (ii) says that he drove more than 70 mph. There are various possible analyses of these facts from the current perspective, including treating required as a minimum-standard degree expression, but should as a relative-standard expression (like tall or happy).

6 Comparison with Related Proposals

In this section I compare the proposal made here with two influential proposals in the recent literature. The general conclusion is that the problem in (1) is not resolved by these modifications to the semantics of comparatives and/or wh-questions, and that a separate account is needed. However, the line of thought pursued here is essentially compatible with these proposals as well.

6.1 Schwarzschild & Wilkinson

An influential proposal due to Schwarzschild & Wilkinson [23] argues that comparatives have a semantics based on intervals rather than points. They show that, on this assumption, it is possible to derive apparent wide scope of quantifiers in the comparative complement without allowing QR out of the comparative complement. Since the latter would violate the general prohibition against extraction from a comparative complement, this result is welcome. However, it is also empirically problematic, and attempts to cope lead back to a solution of the type given here. Schwarzschild & Wilkinson give the comparative sentence in (42a) a denotation that is roughly paraphrased in (42b):

(42) a. Larry is taller than every girl is.
     b. The smallest interval containing Larry’s height lies above the largest interval containing the heights of all the girls and nothing else.

(42b) will be true just in case, for every girl, Larry is taller than she is. As a result, the undesired “shortest-girl” reading is not generated. Schwarzschild [22] acknowledges, though, that the proposal in Schwarzschild & Wilkinson [23] is too restrictive: it predicts that there should never be scope ambiguities between more/-er/less and quantifiers in the comparative complement. We have already seen a number of examples where such ambiguities are attested, involving existential quantification and its ilk (in (32)) and existential and (perhaps) universal modals in (38) and (40). In order to maintain the interval-based analysis, Schwarzschild [22] introduces a point-to-interval operator π which can appear in various places in the comparative complement (see also Heim [8] for a closely related proposal and much discussion). In this way, Schwarzschild derives the ambiguities discussed here without raising the QP out of the comparative clause.

The important thing to note, for our purposes, is that while Schwarzschild & Wilkinson [23] present a semantics on which the problems discussed here do not arise, Schwarzschild’s [22] modification re-introduces scope ambiguities in comparatives in order to deal with the restricted set of cases where they do arise. This modification is valuable because it explains how these ambiguities arise despite the islandhood of the comparative clause; however, as Heim [8, pp.15-16] points out, in order to “prevent massive overgeneration of unattested readings, we must make sure that π never moves over a DP-quantifier, an adverb of quantification, or for that matter, an epistemic modal or attitude verb”. This is essentially Heim’s [9] proposed LF-constraint (10) re-stated in terms of the scope of π. So we are back to square one: the interval-based account, though it has the important virtue of explaining apparent island-violating QR out of the comparative complement, does not explain the core puzzle that we are interested in, why (42a) lacks a “shortest-girl” reading. So the current proposal, or something else which does this job, is still needed in order to explain why (42a) does not (on the high-π reading) have the “shortest-girl” reading.⁹ ¹⁰

⁹ Although intervals are usually assumed to be subsets of the reals – and so fit naturally with totally ordered structures like (23) – there is no barrier in principle to defining an interval-based degree semantics for partially ordered domains. Of course, some issues of detail may well arise in the implementation.

¹⁰ A similar point holds for Abrusán [1]: her interval-based semantics does well with negative and manner islands, but does not account for quantificational interveners.

6.2 Fox & Hackl

Next we turn to an influential proposal by Fox & Hackl [4]. I show that the theory advocated here is not in direct competition with Fox & Hackl’s theory, but that there are some complications in integrating the two approaches which may cause difficulty for Fox & Hackl’s. Fox and Hackl argue that amount-denoting expressions always denote on a dense scale, effectively the lattice in (23) with the added stipulation that, for any two degrees, there is always a degree that falls between them. The most interesting data from the current perspective are in (43) and (44):

(43) a. How fast are we not allowed to drive?
     b. *How fast are we allowed not to drive?

(44) a. How fast are we required not to drive?
     b. *How fast are we not required to drive?

The contrasts in (43) and (44) are surprising from S&Z’s perspective: on their assumptions, there is no maximal degree d such that you are not allowed to drive d-fast, and yet (43a) is fully acceptable. In addition, (43a) and (44a) do not ask for maxima but for minima (the least degree which is unacceptably fast, i.e. the speed limit). Fox and Hackl show that the minimality readings of (43a) and (44a), and the ungrammaticality of (43b) and (44b), follow if we assume (following [3] and [2]) that wh-questions do not ask for a maximal answer but for a maximally informative answer, defined as follows:

(45) The maximally informative answer to a question is the true answer which entails all other true answers to the question.

Fox & Hackl show that, on this definition, upward monotonic degree questions ask for a maximum, since if John’s maximum height is 6 feet, this entails that he is 5 feet tall, and so on for all other true answers. However, downward entailing degree questions ask for a minimum, since if we are not allowed to drive 70 mph, we are not allowed to drive 71 mph, etc.

This is not as deep a problem for the present theory as it may appear. S&Z assume that wh-questions look for a maximal answer, but it is unproblematic simply to modify their theory so that wh-questions look for a maximally informative answer. Likewise, we can just as easily stipulate that a join semilattice (26) is dense as we can stipulate that a number line (23) is dense; this maneuver would replicate Fox & Hackl’s result about minima in downward entailing contexts. In this way it is possible simply to combine S&Z’s theory with Fox & Hackl’s. In fact, this is probably independently necessary for Fox & Hackl, since their assumption that amounts always denote in (23) fails to predict the core data of the present paper: the fact that How tall is every girl? and Every girl is less tall than John lack a “shortest-girl” reading. I conclude that the two theories are compatible, but basically independent.

Finally, note that the maximal informativity hypothesis in (45), whatever its merit in wh-questions and other environments discussed by Fox & Hackl, is not appropriate for comparatives: here it appears that we need simple maximality.¹¹

(46) a. How fast are you not allowed to drive?
     b. *You’re driving faster than you’re not allowed to.

A simple extension of the maximal informativity hypothesis to comparatives would predict that (46b) should mean “You are exceeding the speed limit”. In contrast, the maximality-based account predicts that (46b) is unacceptable, since there is no maximal speed which is not allowed. This appears to be the correct prediction. However, it is worth noting that, because of the asymmetry between (43) and (46), combining the two theories in the way suggested here effectively means giving up the claim that the comparative is a type of wh-operator. Since this idea has much syntactic and semantic support, it is probably worth looking for an alternative explanation of (43) and (44) that does not involve adopting the proposal in (45).
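The definition in (45) lends itself to a small sketch: given the true answers to a question and an entailment relation between answers, pick the answer that entails all the others. The function name, the finite integer ranges, and the two entailment relations below are illustrative assumptions; in particular, using whole numbers sets aside the density that Fox & Hackl argue for.

```python
def maximally_informative(true_answers, entails):
    """Return the true answer that entails every true answer, or None if there is none."""
    for candidate in true_answers:
        if all(entails(candidate, other) for other in true_answers):
            return candidate
    return None


# Upward monotonic: "John is (at least) d feet tall" is true for d = 1..6.
# Being d feet tall entails being e feet tall whenever d >= e.
heights = range(1, 7)
print(maximally_informative(heights, lambda d, e: d >= e))  # 6

# Downward entailing: "we are not allowed to drive d mph" is true for d = 70..120.
# Not being allowed to drive d mph entails not being allowed to drive e mph whenever d <= e.
banned_speeds = range(70, 121)
print(maximally_informative(banned_speeds, lambda d, e: d <= e))  # 70
```

The same selection procedure returns a maximum in the upward monotonic case and a minimum in the downward entailing case, which is the pattern described for (43a) and (44a).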

7 Conclusion

To sum up, the traditional approach on which amounts are arranged on a scale of degrees fails to explain why the constraint in (10) should hold. However, the numerous similarities between limitations on comparatives and amount-denoting wh-questions with quantifiers suggest that these phenomena should be explained by a single theory. S&Z’s semantic account of weak islands predicts, to a large extent, where quantifier intervention is possible and where it is not. The crucial insight is that intervention effects are due to the kinds of operations that quantifiers need to perform, and not merely the structural configuration of various scope-taking elements. S&Z’s theory also predicts correctly that narrow-scope conjunction is impossible in amount comparatives, but narrow-scope disjunction is possible. To be sure, important puzzles remain; but the algebraic approach to comparative scope offers a promising explanation for a range of phenomena that have not been previously treated in a unified fashion.

Furthermore, if S&Z’s theory turns out to be incomplete, all is not lost. The most important lesson of the present paper, I believe, is not that S&Z’s specific theory of weak islands is correct (as we have seen, there are certainly empirical and technical challenges¹²) but rather that weak island phenomena are not specific to wh-questions. In fact, we should probably think of the phenomena summarized by the Heim-Kennedy constraint as comparative weak islands. However the theory of weak islands progresses, evidence from comparatives will need to play a crucial role in its development, and vice versa.¹³

¹¹ Thanks to a PLC reviewer for bringing the contrast in (46) to my attention.

¹² In particular, S&Z do not give a compositional implementation of their proposal. I do not see any very deep difficulties in doing so, although, as Anna Szabolcsi (p.c.) points out, treating amounts and counters differently in semantic terms despite their similar (or possibly identical) syntax might seem unattractive to some. Thanks to Anna Szabolcsi and an anonymous ESSLLI reviewer for pointing out the need for this note.

¹³ Thanks to Chris Barker, Anna Szabolcsi, Arnim von Stechow, Emmanuel Chemla, Yoad Winter, Lucas Champollion, Rick Nouwen, Roberto Zamparelli, Roger Schwarzschild, several anonymous reviewers, and audiences at the 33rd Penn Linguistics Colloquium and the 2009 ESSLLI Student Session for helpful discussion and advice. An earlier version of this paper appeared as Lassiter [16].

References

1. Abrusán, M.: Contradiction and Grammar. PhD thesis, MIT (2007)
2. Beck, S., Rullmann, H.: A flexible approach to exhaustivity in questions. Natural Language Semantics 7(3), 249–298 (1999)
3. Dayal, V.: Locality in wh-quantification. Kluwer, Dordrecht (1996)
4. Fox, D., Hackl, M.: The universal density of measurement. Linguistics and Philosophy 29(5), 537–586 (2006)
5. Grice, H.P.: Studies in the Way of Words. Harvard University Press, Cambridge (1989)
6. Groenendijk, J., Stokhof, M.: Studies in the Semantics of Questions and the Pragmatics of Answers. PhD thesis, University of Amsterdam (1984)
7. Hackl, M.: Comparative quantifiers. PhD thesis, MIT (2000)
8. Heim, I.: Remarks on comparative clauses as generalized quantifiers. Ms. MIT, Cambridge (2006)
9. Heim, I.: Degree operators and scope. In: Féry, Sternefeld (eds.) Audiatur Vox Sapientiae: A Festschrift for Arnim von Stechow. Akademie Verlag, Berlin (2001)
10. Kadmon, N., Landman, F.: Any. Linguistics and Philosophy 16(4), 353–422 (1993)
11. Kennedy, C.: Projecting the adjective: The syntax and semantics of gradability and comparison. PhD thesis, U.C., Santa Cruz (1997)
12. Kennedy, C.: Vagueness and grammar: The semantics of relative and absolute gradable adjectives. Linguistics and Philosophy 30(1), 1–45 (2007)
13. Kennedy, C., McNally, L.: Scale structure, degree modification, and the semantics of gradable predicates. Language 81(2), 345–381 (2005)
14. Kroch, A.: Amount quantification, referentiality, and long wh-movement. Ms., University of Pennsylvania (1989)
15. Landman, F.: Structures for Semantics. Kluwer, Dordrecht (1991)
16. Lassiter, D.: Explaining a restriction on the scope of the comparative operator. University of Pennsylvania Working Papers in Linguistics 16(1) (2010)
17. Lassiter, D.: Gradable Epistemic Modals, Probability, and Scale Structure. In: Proceedings from Semantics and Linguistic Theory XX (to appear, 2010)
18. Levinson, D.: Probabilistic Model-theoretic Semantics for want. In: Proceedings from Semantics and Linguistic Theory XIII (2003)
19. Nouwen, R.: Two puzzles about requirements. In: Proceedings of the 17th Amsterdam Colloquium, pp. 326–334 (2009)
20. Rizzi, L.: Relativized Minimality. MIT Press, Cambridge (1990)
21. Rullmann, H.: Maximality in the Semantics of wh-constructions. PhD thesis, University of Massachusetts, Amherst (1995)
22. Schwarzschild, R.: Scope-splitting in the comparative. Handout from MIT colloquium (2004), http://www.rci.rutgers.edu/~tapuz/MIT04.pdf
23. Schwarzschild, R., Wilkinson, K.: Quantifiers in comparatives: A semantics of degree based on intervals. Natural Language Semantics 10(1), 1–41 (2002)
24. Szabolcsi, A.: Strong vs. weak islands. Blackwell Companion to Syntax 4, 479–531 (2006)
25. Szabolcsi, A., Zwarts, F.: Weak islands and an algebraic semantics for scope taking. Natural Language Semantics 1(3), 235–284 (1993)
26. Takahashi, S.: More than two quantifiers. Natural Language Semantics 14(1), 57–101 (2006)
27. van Rooij, R.: Some analyses of pro-attitudes. In: de Swart, H. (ed.) Logic, Game Theory, and Social Choice. Tilburg University Press (1999)
28. von Stechow, A.: Comparing semantic theories of comparison. Journal of Semantics 3(1), 1–77 (1984)

Extraction in the Lambek-Grishin Calculus

Arno Bastenhof
Utrecht University

Abstract. We propose an analysis of extraction in the Lambek-Grishin calculus (LG): a categorial type logic featuring subtractions A ⊘ B and B ⦸ A, with proof-theoretic behavior dual to that of the usual implications A/B, B\A. Our analysis rests on three pillars: Moortgat’s discontinuous type constructors ([6]); their decomposition in LG as proposed by Bernardi and Moortgat ([1]); and the polarity-sensitive double negation translations of [3] and [5], inspiring the Montagovian semantics of our analysis. Characteristic of the latter is the use of logical constants for existential quantification and identity to identify the extracted argument with its associated gap.

Being founded upon logics of strings (L) or trees (NL), categorial type logics [7, CTL] do not naturally accommodate a satisfactory treatment of discontinuity. In response, Moortgat ([6]) proposed a type constructor that allows unbounded abstraction over the type of the syntactic context of an expression. Though originally intended for the analysis of scopal ambiguities, we here pursue its application to extraction. We find opportunities for improvement by extending our analysis to the Lambek-Grishin calculus (LG), a conservative extension of NL for which Moortgat’s type constructor was shown derivable in [1]. We conclude by pairing our analysis with a Montagovian semantics, taking inspiration from the double negation translations proposed in [3] and [5].

We proceed as follows. §1 briefly reviews the application of CTL to linguistic analysis. §2 outlines our usage of Moortgat’s type constructor in finding a type schema for extraction. The slight amount of lexical ambiguity thereby posited is shown reducible in §3, where we discuss the decomposition of our schema in LG. §4 couples our analysis with a Montagovian semantics. §5 summarizes our findings and relates our work to the literature.

1 Categorial Analyses

We adopt a logical perspective on natural language syntax: syntactic categories are logical formulas, or (syntactic) types (written A..E), as we shall call them.

A..E ::= n | np | s | (B\A) | (A/B)

Types n, np and s are atomic, categorizing common nouns, noun phrases and sentences respectively. The interpretations of complex types we derive from the proof-theoretic meanings of their constructors: the implications /, \. Proofs we take to establish sequents Γ ⊢ A, understood as grammaticality judgements: Γ is a well-formed constituent of type A. By a constituent we simply understand a binary-branching tree with types at its leaves:

Γ, Δ ::= A | (Γ ∘ Δ)

On this reading, the slashes embody subcategorization: a constituent Γ of type A/B (B\A) combines with a constituent Δ of type B into Γ ∘ Δ (Δ ∘ Γ), assigned the type A. Made explicit in a Natural Deduction presentation:

(Ax)  A ⊢ A
(/E)  from Γ ⊢ A/B and Δ ⊢ B infer Γ ∘ Δ ⊢ A
(\E)  from Δ ⊢ B and Γ ⊢ B\A infer Δ ∘ Γ ⊢ A
(/I)  from Γ ∘ B ⊢ A infer Γ ⊢ A/B
(\I)  from B ∘ Γ ⊢ A infer Γ ⊢ B\A

T. Icard, R. Muskens (Eds.): ESSLLI 2008/2009 Student Sessions, LNAI 6211, pp. 57–71, 2010. © Springer-Verlag Berlin Heidelberg 2010

Introduction rules (/I), (\I) allow the inference of a type, while elimination rules (/E), (\E) allow the use of a type in an inference. Axioms (Ax) ensure that each constituent consisting of a single leaf is a well-formed constituent of the type found at that leaf. Additionally, one may consider a formula counterpart of tree merger Γ ∘ Δ: the multiplicative product A ⊗ B, read as binary merger:¹

(⊗E)  from Δ ⊢ A ⊗ B and Γ[A ∘ B] ⊢ C infer Γ[Δ] ⊢ C
(⊗I)  from Γ ⊢ A and Δ ⊢ B infer Γ ∘ Δ ⊢ A ⊗ B

As an illustration, consider the following type assignments to the lexical items in John offered the lady a drink. Figure 1 derives the corresponding sentence:²

John: np_su
offered: ((np_su\s)/np_do)/np_io
the: np_io/n
lady: n
a: np_do/n
drink: n

The calculus NL thus established is no stronger than context-free formalisms. The next section discusses Moortgat’s proposal on how to go beyond.
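The elimination rules above can be sketched as a small recognizer. This is only an illustration under stated assumptions: the class names `Atom`, `Over`, `Under` and the helpers `combine`, `derivable_types`, `sentence_types` are hypothetical, the role subscripts on np are dropped (they are annotations, not distinct atoms), and only (/E) and (\E) are implemented, so hypothetical reasoning via the introduction rules is not covered.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Over:
    # A/B: produces `res` when an `arg` follows on the right
    res: object
    arg: object


@dataclass(frozen=True)
class Under:
    # B\A: produces `res` when an `arg` precedes on the left
    arg: object
    res: object


def combine(left, right):
    """Types obtained from two adjacent constituents by one elimination step."""
    out = set()
    if isinstance(left, Over) and left.arg == right:      # (/E)
        out.add(left.res)
    if isinstance(right, Under) and right.arg == left:    # (\E)
        out.add(right.res)
    return out


def derivable_types(types):
    """All types assignable to some binary bracketing of the type sequence."""
    types = tuple(types)

    @lru_cache(maxsize=None)
    def span(i, j):
        if j - i == 1:
            return frozenset([types[i]])
        found = set()
        for k in range(i + 1, j):
            for l in span(i, k):
                for r in span(k, j):
                    found |= combine(l, r)
        return frozenset(found)

    return span(0, len(types))


NP, N, S = Atom("np"), Atom("n"), Atom("s")
LEXICON = {
    "John": NP,
    "offered": Over(Over(Under(NP, S), NP), NP),  # ((np\s)/np_do)/np_io
    "the": Over(NP, N),
    "lady": N,
    "a": Over(NP, N),
    "drink": N,
}


def sentence_types(words):
    return derivable_types(LEXICON[w] for w in words)


if __name__ == "__main__":
    print(S in sentence_types("John offered the lady a drink".split()))
```

Because the bracketing is searched exhaustively, the sketch mirrors the tree-shaped constituents of NL rather than fixing one parse in advance.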

2 Types for Extraction

We discuss Moortgat’s discontinuous type constructors ([6]), proposing an application to extraction. Intuitively, said type constructors allow abstraction over the type of the syntactic contexts of expressions. By a context we shall understand trees Γ[] with a hole []. Substituting for [] some Δ yields the tree Γ[Δ], the notation emphasizing the (distinguished) occurrence of Δ in Γ.

Γ[], Δ[] ::= [] | (Γ[] ∘ Δ) | (Δ ∘ Γ[])

¹ The notation Γ[Δ] emphasizes the distinguished occurrence of a subtree Δ in Γ. We discuss this in more detail in §2.
² Subscripts have been added to facilitate matching with grammatical roles: su, do and io abbreviate subject, direct object and indirect object respectively.

Any subtree Δ of some Γ uniquely determines a context Γ[] s.t. Γ = Γ[Δ]. This result extends to the level of type assignment: for derivable Γ ⊢ B, any decomposition of Γ into a subtree Δ and its context Γ[] allows for a type A of the hole to be determined, in the sense that there are derivable Γ[A] ⊢ B and Δ ⊢ A. Cut recovers the original type assignment Γ ⊢ B:

(Cut)  from Δ ⊢ A and Γ[A] ⊢ B infer Γ[Δ] ⊢ B
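The decomposition of a tree into a subtree and its context can be sketched as follows. The representation is an assumption made for illustration only: trees are nested pairs with strings at the leaves, the sentinel `HOLE` stands for [], and the names `plug` and `decompositions` are hypothetical.

```python
HOLE = "[]"  # sentinel for the hole; assumed not to occur as a leaf label


def plug(context, delta):
    """Return the tree obtained by substituting `delta` for the hole."""
    if context == HOLE:
        return delta
    if isinstance(context, tuple):
        left, right = context
        return (plug(left, delta), plug(right, delta))
    return context


def decompositions(tree):
    """Yield every (context, subtree) pair with plug(context, subtree) == tree."""
    yield HOLE, tree
    if isinstance(tree, tuple):
        left, right = tree
        for ctx, sub in decompositions(left):
            yield (ctx, right), sub
        for ctx, sub in decompositions(right):
            yield (left, ctx), sub


if __name__ == "__main__":
    tree = ("John", (("offered", ("the", "lady")), ("a", "drink")))
    for ctx, sub in decompositions(tree):
        print(ctx, "<-", sub)
```

Each occurrence of a subtree yields exactly one context, so a tree with six leaves and five internal nodes has eleven decompositions.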

The act of singling out a subtree of Γ is assigned an operational meaning with Moortgat’s discontinuity types A ↿_B^C: the embedded constituent Δ with syntactic distribution characterized by A may seize scope over its embedding context Γ[A] of type B, with outcome an expression of type C.³

(↿E)  from Δ ⊢ A ↿_B^C and Γ[A] ⊢ B infer Γ[Δ] ⊢ C

Moortgat proposes assigning types np ↿_s^s to quantified noun phrases, abstracting over the sentential domain defining their scope. Here, we consider instead a more surface oriented manifestation of discontinuity: extraction. A predicate an argument of which is extracted we lexically assign the type A ↿_B^{B/◇□C} if the extracted argument occurs in a right branch, and A ↿_B^{◇□C\B} otherwise, parameterizing over the types: A of its syntactic distribution; C of its extracted argument; and B of the extraction domain. Here, ◇ (diamond) and □ (box) are unary type-forming operators, held subject to the derivability of A from ◇□A (but not conversely), thereby preventing overgeneration: constituents of type C do not directly combine with gapped clauses, seeing as they do not derive ◇□C.

As an example, consider the complex noun phrase lady whom John offered _ a drink. The object noun phrase (C = np) of offered is extracted (from a right branch) at the level of the embedded sentence (B = s), read itself locally selecting for a subject and direct object (A = (np\s)/np) (Figure 1 gives the derivation):

whom: (n\n)/(s/◇□np_io)
John: np_su
offered: ((np_su\s)/np_do) ↿_s^{s/◇□np_io}
a: np_do/n
drink: n

Use of (↿E) allows the assignment of s/◇□np to the gapped embedded sentence John offered _ a drink, establishing it as a proper argument for whom. We observe:

1. Our type schema for extraction places no constraints on the location of the gap, seeing as (↿E) operates at an arbitrarily deep level within a context.⁴
2. We rely on limited lexical ambiguity: types A ↿_B^{B/◇□C} (or A ↿_B^{◇□C\B}) are lexically assigned next to the usual types for when no extraction takes place. Although the ambiguity is well under control (it being finite), we establish it reducible in the next section for several specific cases.
3. Our analysis is, of course, not limited to the case where the extraction domain is a sentence. For example, with wh-extraction in English the gap shows up at the level of a yes-no question [12, Chapter 3]:

Whom: wh/(q/◇□np)
did: q/s_inf
John: np
offer: ((np\s_inf)/np) ↿_q^{q/◇□np}
something: np

Here q, wh are the types of yes-no and wh-questions respectively.

³ Moortgat writes instead q(A, B, C), giving inference rules for a sequent calculus. Our notation is borrowed from [11].
⁴ Compare this to the situation in L, where only peripheral extraction is possible, at least as long as one does not wish to resort to the use of types that are highly specialized to their syntactic environments.

3 The Lambek-Grishin Calculus

We discuss a proposal of Bernardi and Moortgat ([1]) to derive A ↿_B^C from the more primitive type constructors of the Lambek-Grishin calculus (LG, [8]). The latter is an extension of NL exhibiting an involutive operation (·)^∞ on types and sequents, manifesting as an arrow-reversing duality:⁵

Γ ⊢ A is derivable ⇔ A^∞ ⊢ Γ^∞ is derivable

More accurately, once we admit some notion of costructure Π naturally occurring on the right hand side of the turnstile:

Γ ⊢ Π is derivable ⇔ Π^∞ ⊢ Γ^∞ is derivable

We realize (·)^∞ at the level of types by adding subtractions ⊘, ⦸ to the repertoire of type-forming operators:

A..E ::= n | np | s            (Atoms)
       | (B\A) | (A ⊘ B)       (Left selection vs. right rejection)
       | (A/B) | (B ⦸ A)       (Right selection vs. left rejection)

Formally, (·)^∞ is then defined by

(A/B)^∞ = B^∞ ⦸ A^∞        (B ⦸ A)^∞ = A^∞/B^∞
(B\A)^∞ = A^∞ ⊘ B^∞        (A ⊘ B)^∞ = B^∞\A^∞

and A^∞ = A for A atomic. Involutivity of (·)^∞ (i.e., (A^∞)^∞ = A) is easily checked. At the level of sequents, we require a notion of costructure:

Π, Σ ::= A | (Π • Σ)

(Γ ∘ Δ)^∞ = Δ^∞ • Γ^∞        (Π • Σ)^∞ = Σ^∞ ∘ Π^∞

Cocontexts Π[] are defined similarly to contexts Γ[]. Inference rules are listed in Figure 2. An easy induction shows that (·)^∞ gives the requested duality.

⁵ In contrast with classical NL [2], this duality is not internalized via linear negation.
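The involutivity claim for the duality on types can be checked mechanically. The following is a minimal sketch, assuming the usual LG clauses (slashes exchanged with the mirrored subtractions, atoms fixed); the class names `Over`, `Under`, `RSub`, `LSub` and the function `dual` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Over:      # A/B
    res: object
    arg: object


@dataclass(frozen=True)
class Under:     # B\A
    arg: object
    res: object


@dataclass(frozen=True)
class RSub:      # A ⊘ B
    left: object
    right: object


@dataclass(frozen=True)
class LSub:      # B ⦸ A
    left: object
    right: object


def dual(t):
    """Arrow-reversing duality on types (assumed clauses, atoms are fixed)."""
    if isinstance(t, Atom):
        return t
    if isinstance(t, Over):    # (A/B)∞ = B∞ ⦸ A∞
        return LSub(dual(t.arg), dual(t.res))
    if isinstance(t, Under):   # (B\A)∞ = A∞ ⊘ B∞
        return RSub(dual(t.res), dual(t.arg))
    if isinstance(t, RSub):    # (A ⊘ B)∞ = B∞\A∞
        return Under(dual(t.right), dual(t.left))
    if isinstance(t, LSub):    # (B ⦸ A)∞ = A∞/B∞
        return Over(dual(t.right), dual(t.left))
    raise TypeError(f"not a type: {t!r}")


if __name__ == "__main__":
    np_, s = Atom("np"), Atom("s")
    tv = Over(Under(np_, s), np_)   # (np\s)/np
    print(dual(tv))
    print(dual(dual(tv)) == tv)
```

Applying `dual` twice returns the original type on every constructor, which is the structural induction behind the involutivity remark.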

  

 

   









 















 









Fig. 1. Derivations for examples discussed in sections 1, 2. Lexical insertion is realized (informally) by writing axioms A ⊢ A as an inference of A from w when w is a word lexically assigned the type A. [Derivation trees not reproduced.]

Fig. 2. Inference rules of LG, with a side condition on Cut. [Rule display and side condition not reproduced.]

Moortgat ([8]) originally formulated LG as a display calculus, although here we opted for a presentation closer to Natural Deduction. Note, however, that (Cut) is not eliminable in this presentation, as with usual Natural Deduction. Compared to NL, LG’s expressivity is enhanced through the introductions: the inferred (co-)slash may occur arbitrarily deep inside the right-hand side (left-hand side) of the sequent. Bernardi and Moortgat exploit this property by decomposing types A ↿_B^C as (B ⊘ C) ⦸ A. Indeed, (↿E) is then derivable from the premises Δ ⊢ (B ⊘ C) ⦸ A and Γ[A] ⊢ B:

1. B ⊘ C ⊢ B ⊘ C and C ⊢ C   (Ax)
2. B ⊢ (B ⊘ C) • C   (⊘E, from 1)
3. Γ[A] ⊢ (B ⊘ C) • C   (Cut, from Γ[A] ⊢ B and 2)
4. Γ[(B ⊘ C) ⦸ A] ⊢ C   (⦸I, from 3)
5. Γ[Δ] ⊢ C   (Cut, from Δ ⊢ (B ⊘ C) ⦸ A and 4)

Our analysis of extraction in §2 thereby immediately extends to LG: translate A ↿_B^{B/◇□C} as (B ⊘ (B/◇□C)) ⦸ A. In doing so, we open the door to further improvement of our analysis. Recall from previous examples our use of two separate types for offered, depending on whether or not extraction took place. In LG, we can find a type D from which both may be derived:

D ⊢ ((np_su\s)/np_do)/np_io   (No extraction)
D ⊢ (s ⊘ (s/◇□np_io)) ⦸ ((np_su\s)/np_do)   (Extraction)

whereas D  npsu snpio npdo and D  npsu snpio npdo , preventing overgeneration. Abbreviating npsu snpdo npio as dtv, we have the following solution for D: s  npio DZ    np ss  dtv io s  s s



Fig. 3. Explaining the intuition behind D. Active formulas are explicitly marked with a suffix in order to facilitate understanding. [Derivation not reproduced.]

thus at least partially eliminating the need for (lexically) ambiguous type assignment. Figure 3 explains the intuition behind D, using derived rules (among them one labelled bind) in order to abstract away from unnecessary details. [Display of the derived rules not reproduced.]

More generally, we can find such a D for types A/B and A ↿_C^{C/◇□B} provided C = head(A) (e.g., if A = (np\s)/np, then C = s).⁶ [Displayed solution for D not reproduced.]

4 Formal Semantics

Montagovian semantics splits the labour between a derivational and a lexical component. The former motivates LG’s inference rules semantically, showing how they combine the denotations associated with their premises into a denotation of their conclusion. The lexical component determines the base of the recursion, specifying the denotations of the individual words. §4.1 gives a derivational semantics of LG, inspired by [5] and [3], while §4.2 illustrates the lexical semantics associated with our extraction type schema.

⁶ The case of extraction from a left branch is similar.

Fig. 4. Recipes for meaning construction, phrased in linear λ-calculus. In the two elimination rules, we require the sets of variables in Γ, Δ to be disjoint. [Rule display not reproduced.]

4.1 Derivational Semantics

We phrase our instructions for meaning composition in a restriction of the linear simply-typed λ-calculus, detailed in Figure 4. The (semantic) type τ of a term M is determined relative to a context Γ, being a multiset of type declarations x^σ for the free variables in M. This relation is expressed by a sequent Γ ⊢ M : τ, held subject to various linearity constraints: Γ contains no variables not in M, and no type declaration occurs more than once. In other words, we dispense with weakening and contraction, and end up with a fragment of intuitionistic multiplicative linear logic. We speak of a fragment, in that we require only limited use of implications. Formally, we define semantic types τ, σ, ρ by

τ, σ, ρ ::= s | np | n | φ | τ^⊥ | (τ ⊗ σ)
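The linearity constraints (no weakening, no contraction) amount to requiring that every variable, bound or free, occurs exactly once. A minimal sketch of such a check on untyped term syntax follows; the constructors `Var`, `Lam`, `App`, `Pair`, `Case` and the functions `occurrences`, `is_linear` are hypothetical names, and type checking is deliberately left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:       # λx M
    var: str
    body: object


@dataclass(frozen=True)
class App:       # (M N)
    fun: object
    arg: object


@dataclass(frozen=True)
class Pair:      # (M ⊗ N)
    left: object
    right: object


@dataclass(frozen=True)
class Case:      # case N of x ⊗ y . M
    scrut: object
    x: str
    y: str
    body: object


def occurrences(term):
    """Counter of free-variable occurrences, or None if a binder is not linear."""
    if isinstance(term, Var):
        return Counter([term.name])
    if isinstance(term, Lam):
        body = occurrences(term.body)
        if body is None or body[term.var] != 1:
            return None
        del body[term.var]
        return body
    if isinstance(term, (App, Pair)):
        a, b = (term.fun, term.arg) if isinstance(term, App) else (term.left, term.right)
        left, right = occurrences(a), occurrences(b)
        if left is None or right is None:
            return None
        return left + right
    if isinstance(term, Case):
        scrut, body = occurrences(term.scrut), occurrences(term.body)
        if scrut is None or body is None or term.x == term.y:
            return None
        if body[term.x] != 1 or body[term.y] != 1:
            return None
        del body[term.x], body[term.y]
        return scrut + body
    raise TypeError(f"not a term: {term!r}")


def is_linear(term):
    """True iff every bound and every free variable occurs exactly once."""
    occ = occurrences(term)
    return occ is not None and all(n == 1 for n in occ.values())


if __name__ == "__main__":
    print(is_linear(Lam("x", App(Var("f"), Var("x")))))   # uses x once
    print(is_linear(Lam("x", App(Var("x"), Var("x")))))   # contraction
```

A term such as λx y (x unused) fails for lack of weakening, while λx (x x) fails for lack of contraction.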



Note that we considered again atomic types s, np and n: the specification of their referential properties we leave to the lexical component. The distinguished type φ acts as the sole answer type of linear implications: τ^⊥ then reads as τ ⊸ φ. The term language thus obtained is essentially the linear counterpart of the restricted λ-calculus explored in [5]. We shall refer to it as LP_{⊗,⊥}, or simply LP when no confusion arises with the full LP_{⊗,⊸}.

Finding a derivational semantics for LG now amounts to translating a derivation for a multiple conclusion sequent Γ ⊢ Π into a single conclusioned Δ ⊢ M : τ. Such problems have been tackled before in the proof theory of classical logic through the familiar double negation translations into minimal logic: roughly, translate the conclusions of a derivation as hypotheses by negating them. This practice typically results in the prefixing of a great deal of subformulas with double negations, which is most impractical as a foundation for a Montagovian semantics. Here, we adapt instead the approach by Girard ([3]), who obtained a significantly more efficient translation by taking into account the polarity of a formula. Within LG, we will define the latter concept by stating atomic types to have positive polarity, while complex types are said to have negative polarity:

P, Q ::= s | np | n                              (Positive types)
K, L ::= (A/B) | (B\A) | (B ⦸ A) | (A ⊘ B)       (Negative types)



We define a translation ⌈·⌉ taking LG’s types to LP. We set ⌈A⌉ = A for atomic A, whereas ⌈A⌉ for complex A explicitly abstracts over the polarities of its direct subtypes B, with separate clauses for positive and for negative B.⁷ [Table of clauses defining ⌈·⌉ on complex types not reproduced.]

Extending the translation to sequents, we translate an occurrence of A as

⌈A⌉ for positive input A or negative output A
⌈A⌉^⊥ for negative input A or positive output A

where A is input if it occurs in the antecedent (left-hand side) of a sequent, and output if it occurs in the consequent (right-hand side). A derivation of Γ ⊢ Π we then interpret by a term-in-context whose declarations are the pointwise translations of Γ and Π and whose term M has type φ, trading in the structure built up by ∘, • for LP’s multisets. Figure 5 interprets LG’s inference rules, restricting the treatment of introductions and eliminations of complex A to the case where each of its direct subtypes is negative, leaving the other cases as an exercise. We note:

1. We mention explicitly only the active and main formulas of each rule.
2. λx^{τ⊗σ}.case x of y^τ ⊗ z^σ.M (x not free in M) is abbreviated λ(y^τ ⊗ z^σ).M.
3. We use α, β, γ, possibly sub- or superscripted, as variable names for types ⌈A⌉^⊥ with A originating in the right-hand side of the translated sequent.

Returning to our analysis of extraction, we illustrate with the slightly simplified

lady: n
whom: (n\n)/(s/np_do)
John: np_su
saw: (np_su\s) ↿_s^{s/np_do}

leaving more room for discussion of the lexical semantics associated with the extraction type schema. The derivational semantics is calculated in Figure 5. For reasons of space, we merged the syntactic and semantic derivations into one, presenting it in list format.

⁷ A less negation-heavy translation is in fact possible if we take subtractions to be positive. However, we would then be unable to give a lexical semantics for our extraction types. See also our discussion in §5.



Fig. 5. Recipes for meaning construction, phrased in linear λ-calculus. Sample derivation included. [Rule display and the 14-step sample derivation not reproduced.]

4.2 Lexical Semantics

Having established a compositional procedure for determining the denotations of complex constituents from those of their components, we close off the recursion by interpreting the words at the yields. In doing so, we need no longer commit to linearity constraints: while recipes for compositional meaning construction inherit their linearity from the syntactic mechanisms they interpret, our means of referring to the world around us is not so restricted. Thus, we fix as our domain of discourse the simply typed λ-calculus with product and implication types. Moreover, we ask at least for atomic types e, t, characterizing the denotational domains of entities and truth values respectively:

τ, σ ::= e | t | (τ × σ) | (τ → σ)

Complex linear types carry over straightforwardly: replace occurrences of τ ⊗ σ and τ^⊥ by τ × σ and τ → t respectively. Moreover, terms (M ⊗ N) become pairings ⟨M, N⟩ and case expressions case N of x ⊗ y.M become simultaneous substitutions M[π1(N)/x, π2(N)/y]. Abbreviating λx^{τ×σ} M[π1(x)/y, π2(x)/z] by λ⟨y, z⟩.M and types τ → t by ¬τ, our terms of §4.1 remain practically unchanged. The only non-triviality lies in how we interpret the atoms φ, s, np, n:

φ ↦ t        s ↦ t        np ↦ e        n ↦ ¬e

In particular, input occurrences of s, np, n interpret as s, np and n, which now read t (sentences denote truth values), e (noun phrases denote entities) and ¬e, i.e. e → t (common nouns denote first-order properties). Henceforth, we write ⟦·⟧ for the composition of ⌈·⌉ (§4.1) with the mapping to the type language of the λ-calculus, as outlined above. Now consider again our example:

lady: n
whom: (n\n)/(s/np_do)
John: np_su
saw: (np_su\s) ↿_s^{s/np_do}

The derivational interpretation of this relative clause we calculated to be

(w ⟨⟨γ, l⟩, λα2 (s λ⟨x4, α1⟩ (x4 ⟨λo (α1 ⟨o, α2⟩), j⟩))⟩)

parameterizing over the following lexical denotations:

Word    Variable    Type
lady    l           ⟦n⟧
whom    w           ¬⟦(n\n)/(s/np)⟧
John    j           ⟦np⟧
saw     s           ¬⟦(s ⊘ (s/np)) ⦸ (np\s)⟧

Our task is to substitute appropriate terms for l, w, j, s such that the result becomes logically equivalent to:

(γ λy^e ((lady y) ∧ ((see y) john)))

with γ the remaining free variable of type ¬⟦n⟧ = ¬¬e. For this task, we assume to have at our disposal the following constants:

Constant(s)    Type       Description (denotation)
∃              ¬¬e        Existential quantification
=              e → ¬e     Equality of entities
∧              t → ¬t     Conjunction
john           e          Entity
lady           ¬e         (First-order) property
see            e → ¬e     Binary relation

In particular, we assume the interpretations of ∃, = and ∧ to uniformly match their semantics in classical first-order predicate logic across (the usual set-theoretic) models for the λ-calculus.⁸ Suitable witnesses to the variables j, l are now easily provided: just take the constants john and lady respectively. For whom, we seek a term of the type

¬⟦(n\n)/(s/np)⟧ = ¬(⟦n\n⟧ × ¬⟦s/np⟧) = ¬((¬⟦n⟧ × ⟦n⟧) × ¬(¬⟦s⟧ × ⟦np⟧)) = ¬((¬¬e × ¬e) × ¬(¬t × e))

We propose

λ⟨⟨γ, P⟩, Q⟩ (γ λy^e ((P y) ∧ (Q ⟨λp^t p, y⟩)))

Our real interest, of course, is in saw. Its type (np\s) ↿_s^{s/np} decomposes as (s ⊘ (s/np)) ⦸ (np\s). Since it is negative and occurs in antecedent position, we seek a term of the type ¬⟦(s ⊘ (s/np)) ⦸ (np\s)⟧. That is, we seek an abstraction

λR N, with R of type ⟦(s ⊘ (s/np)) ⦸ (np\s)⟧

reducing the problem to that of finding a suitable instantiation of N of type t. The type of the bound variable R further dissects as

⟦(s ⊘ (s/np)) ⦸ (np\s)⟧ = ¬(¬⟦np\s⟧ × ¬⟦s ⊘ (s/np)⟧)

Thus, we are to provide R with two arguments, one of type ¬⟦np\s⟧, the other of type ¬⟦s ⊘ (s/np)⟧. We start with the former:

¬⟦np\s⟧ = ¬(¬t × e)
N1 = λ⟨q^{¬t}, x^e⟩ (q ((see u) x))

Notice the use of a free variable u (type e) as a place-holder for the extracted argument of saw (i.e., the direct object). As for the second argument of R,

¬⟦s ⊘ (s/np)⟧ = ¬(t × (¬t × e))
N2 = λ⟨p^t, ⟨q^{¬t}, y^e⟩⟩ (q (p ∧ (y = u)))

⁸ In practice, we write ∧ and = in infix notation, and abbreviate (∃ λx^e M) as ∃x^e M.

In essence, N2 contains the recipe for construing the denotation of the gapped embedded clause John saw _. Crucially, it equates the denotation of the extracted argument (the bound variable y) with u (the free variable standing in for the direct object in N1). An existential binding of u concludes the denotation of saw:

N = ∃u^e (R ⟨N1, N2⟩) = ∃u^e (R ⟨λ⟨q, x⟩ (q ((see u) x)), λ⟨p, ⟨q′, y⟩⟩ (q′ (p ∧ (y = u)))⟩)

With the denotations of each of the lexical items fixed, we apply β-reduction. We sketch the computation. Starting again from the term below, replace j with john and l with lady:

(w ⟨⟨γ, l⟩, λα2 (s λ⟨x4, α1⟩ (x4 ⟨λo (α1 ⟨o, α2⟩), j⟩))⟩)

Substituting for s the term we found for saw in

(s λ⟨x4, α1⟩ (x4 ⟨λo (α1 ⟨o, α2⟩), john⟩))

yields, after β-reduction,

∃u ((π1 α2) (((see u) john) ∧ ((π2 α2) = u)))

Finally, replacing w with the term we found for whom in

(w ⟨⟨γ, lady⟩, λα2 ∃u ((π1 α2) (((see u) john) ∧ ((π2 α2) = u)))⟩)

gives a term β-reducible to

(γ λy^e ((lady y) ∧ ∃u^e (((see u) john) ∧ (y = u))))

The desired result we then obtain by the following equivalence, licensed by the laws of predicate logic:⁹

∃u^e (((see u) john) ∧ (y = u)) ≡ ((see y) john)
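The computation sketched above can be replayed over a small finite model. The following is an illustrative sketch only: the domain, the extensions `LADIES` and `SEEN_BY`, and all function names are invented for the example, pairs ⟨M, N⟩ are encoded as Python tuples, and the bracketing of the arguments of whom as ⟨⟨γ, P⟩, Q⟩ is an assumption about the pairing structure.

```python
DOMAIN = ["john", "mary", "sue", "bill"]          # hypothetical entities
LADIES = {"mary", "sue"}                           # hypothetical extension of lady
SEEN_BY = {("mary", "john"), ("bill", "john"), ("sue", "bill")}  # (object, subject)


def lady(y):
    return y in LADIES


def see(obj):
    # curried as in the text: ((see u) x) reads "x sees u"
    return lambda subj: (obj, subj) in SEEN_BY


def exists(pred):
    return any(pred(u) for u in DOMAIN)


def identity(p):
    return p


def whom(arg):
    # λ<<γ, P>, Q> (γ λy ((P y) ∧ (Q <λp p, y>)))
    (gamma, P), Q = arg
    return gamma(lambda y: P(y) and Q((identity, y)))


def saw(R):
    # λR ∃u (R <N1, N2>)
    def body(u):
        def n1(pair):                 # λ<q, x> (q ((see u) x))
            q, x = pair
            return q(see(u)(x))

        def n2(pair):                 # λ<p, <q', y>> (q' (p ∧ (y = u)))
            p, (q_prime, y) = pair
            return q_prime(p and y == u)

        return R((n1, n2))

    return exists(body)


def relative_clause(gamma):
    # (w <<γ, l>, λα2 (s λ<x4, α1> (x4 <λo (α1 <o, α2>), j>))>)
    j = "john"

    def gapped(a2):
        def R(pair):
            x4, a1 = pair
            return x4((lambda o: a1((o, a2)), j))
        return saw(R)

    return whom(((gamma, lady), gapped))


if __name__ == "__main__":
    for y0 in DOMAIN:
        # the continuation γ tests the resulting property at y0
        print(y0, relative_clause(lambda prop: prop(y0)))
```

Instantiating the continuation γ with a test at a fixed entity lets the result be compared directly with (lady y) ∧ ((see y) john), which is the effect of the final equivalence.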

5 Discussion

We have provided an analysis for extraction in the Lambek-Grishin calculus. Its syntactic component was inspired by Moortgat’s discontinuous type constructor, while its semantic component drew upon a double negation translation into the simply-typed λ-calculus. Characteristic of the lexical semantics we proposed was the use of constants for equality and existential quantification to identify the extracted argument with its gap position. Ours is not the only analysis of extraction in LG. We mention several alternatives:

1. Moot ([10]) provides an embedding of lexicalized tree adjoining grammars in LG, illustrating it with a grammar for a language fragment exhibiting discontinuous dependencies. Like ours, Moot’s approach relies on (finite) lexical ambiguity, although we have not yet seen it coupled with a Montagovian semantics.
2. Moortgat and Pentus ([9]) explore the relation of type similarity in LG, with A, B being type similar (written A ∼ B) in case A, B are derivable from a common D, referred to as their meet (or, equivalently, A ∼ B iff A, B both derive some join C). In particular, they observe ((np\s)/np)/pp ∼ ((np\s)/pp)/np (pp an atomic type for prepositions), suggesting the assignment of their meet to gave so as to make derivable both John gave a book to Mary as well as the gapped book that John gave _ to Mary from a single type assignment. Our proposal constitutes a refinement of this approach: we observed (as an instance of the general schema discussed at the end of §3) ((np\s)/pp)/np ∼ ((np\s)/pp) ↿_s^{s/◇□np}, witnessing it by a meet that we ensured would not lead to overgeneration.
3. A different approach, though not tailored to LG, is that of [12]: extraction derives from ‘movement’, expressed by permutations of antecedent structures. To prevent a collapse into multisets, the latter operation is licensed only when the phrase undergoing the dislocation is prefixed by ◇ (hence applicable to the type ◇□np used for the gap in §3):

⁹ Compare this to Montague’s PTQ treatment of to be.

[Display of Vermaat’s four postulates not reproduced.]

The current approach, like those discussed above, instead derives Vermaat’s postulates as linear distributivity principles between merger ⊗ and the subtractions ⊘, ⦸, albeit with the turnstile turned around. Indeed, the following sequents, studied originally by [4], are derivable in our presentation of LG:

(A ⦸ B) ⊗ C ⊢ A ⦸ (B ⊗ C)
A ⊗ (B ⦸ C) ⊢ B ⦸ (A ⊗ C)
A ⊗ (B ⊘ C) ⊢ (A ⊗ B) ⊘ C
(A ⊘ B) ⊗ C ⊢ (A ⊗ C) ⊘ B

Conversely, one may present LG using these sequents as axioms. We refer to [8] for further details.

We conclude with a discussion of LG’s derivational semantics. Our proposal in §4.1 ‘competes’ with that of [1], who consider two dual translations (named call-by-name and call-by-value, after the evaluation strategies they encode for Cut elimination in sequent calculus). [Table defining the CBN and CBV translations not reproduced.]

A derivation of Γ ⊢ Π we then interpret by a term M of type φ in a context built from the CBN translations of Γ and Π, or from their CBV translations. We make two observations: [1] make no explicit reference to polarities; and our own proposal sides with CBN for complex A (for negative direct subtypes), but with CBV for atomic types. The second observation hints at an explanation of the first: CBN considers all types negative, whereas CBV considers all types positive, thus preventing any mixing of polarities. As an illustration of how the two proposals compare in practice, consider again our term for saw in §4.2:

λR ∃u (R ⟨λ⟨q, x⟩ (q ((see u) x)), λ⟨p, ⟨q′, y⟩⟩ (q′ (p ∧ (y = u)))⟩)

In CBN, we would have had to adopt the more complex

λR ∃u (R ⟨λ⟨k, X⟩ (X λx (k ((see u) x))), λ⟨w, ⟨k′, Y⟩⟩ (w λp (k′ (p ∧ (Y λy (y = u)))))⟩)

with no term being derivable at all in CBV. We leave the further comparisons between these two proposals for future research.

Acknowledgements. I thank the following people for their comments on earlier drafts of this paper: Michael Moortgat, Raffaella Bernardi, Jan van Eijck, Andres Löh, Gianluca Giorgolo, Christina Unger, as well as two anonymous referees.

References

1. Bernardi, R., Moortgat, M.: Continuation semantics for symmetric categorial grammar. In: Leivant, D., de Queiroz, R.J.G.B. (eds.) WoLLIC 2007. LNCS, vol. 4576, pp. 53–71. Springer, Heidelberg (2007)
2. De Groote, P., Lamarche, F.: Classical non-associative Lambek calculus. Studia Logica 71, 355–388 (2002)
3. Girard, J.-Y.: A new constructive logic: Classical logic. Mathematical Structures in Computer Science 1(3), 255–296 (1991)
4. Grishin, V.N.: On a generalization of the Ajdukiewicz-Lambek system. In: Mikhailov, A.I. (ed.) Studies in Nonclassical Logics and Formal Systems, Nauka, Moscow, pp. 315–334 (1983)
5. Lafont, Y., Reus, B., Streicher, T.: Continuation semantics or expressing implication by negation. Technical report, Ludwig-Maximilians-Universität, München (1993)
6. Moortgat, M.: Generalized quantifiers and discontinuous type constructors. In: Horck, A., Sijtsma, W. (eds.) Discontinuous Constituency, pp. 181–208. Mouton de Gruyter, Berlin (1992)
7. Moortgat, M.: Categorial type logics. In: van Benthem, J.F.A.K., ter Meulen, G.B.A. (eds.) Handbook of Logic and Language, pp. 93–177. Elsevier, Amsterdam (1997)
8. Moortgat, M.: Symmetries in natural language syntax and semantics: The Lambek-Grishin calculus. In: Leivant, D., de Queiroz, R.J.G.B. (eds.) WoLLIC 2007. LNCS, vol. 4576, pp. 264–284. Springer, Heidelberg (2007)
9. Moortgat, M., Pentus, M.: Type similarity for the Lambek–Grishin calculus. In: Proceedings 12th conference on formal grammar (2007)
10. Moot, R.: Proof nets for display logic. CoRR, abs/0711.2444 (2007)
11. Shan, C.-C.: A continuation semantics of interrogatives that accounts for Baker’s ambiguity. In: Jackson, B. (ed.) Proceedings of SALT XII: semantics and linguistic theory, pp. 246–265 (2002)
12. Vermaat, W.: The logic of variation. A cross-linguistic account of wh-question formation. PhD thesis, Utrecht Institute of Linguistics OTS, Utrecht University (2006)

Formal Parameters of Phonology
From Government Phonology to SPE

Thomas Graf
Department of Linguistics, University of California, Los Angeles
[email protected]
http://tgraf.bol.ucla.edu

Abstract. Inspired by the model-theoretic approach to phonology deployed by Kracht [25] and Potts and Pullum [32], I develop an extendable modal logic for the investigation of phonological theories operating on (richly annotated) string structures. In contrast to previous research in this vein [17, 31, 37], I ultimately strive to study the entire class of such theories rather than merely one particular incarnation thereof. To this end, I first provide a formalization of classic Government Phonology in a restricted variant of temporal logic, whose generative capacity is then subsequently increased by the addition of further operators, thereby pushing it up the subregular hierarchy until one reaches the level of the regular stringsets. I identify several other axes along which Government Phonology might be generalized, moving us towards a parametric metatheory of phonology.

Like any other subfield of linguistics, phonology is home to a multitude of competing theories that differ vastly in their conceptual and technical assumptions. Contentious issues are, among others, the relation between phonology and phonetics (and if it is an interesting research question to begin with), if features are privative, binary or attribute valued, if phonological structures are strings, trees or complex matrices, if features can move from one position to another (i.e. if they are autosegments), and what role optimality requirements play in determining well-formedness. Meticulous empirical comparisons carried out by linguists have so far failed to yield conclusive results; it seems that for every phenomenon that lends support to a certain set of assumptions, there is another one that refutes it. The lack of a theoretical consensus should not be taken to indicate that the way phonologists go about their research is flawed. Unless one subscribes to the view that scientific theories can faithfully reflect reality rather than merely approximate it, it is to be expected that one theory may fail where another one succeeds, and vice versa. A similar situation arises in physics, where depending 

This paper has benefited tremendously from the comments and suggestions of Bruce Hayes, Ed Keenan, Marcus Kracht, Ed Stabler, Kie Zuraw, the members of the UCLA phonology seminar (winter quarter 2009), and two anonymous reviewers.

T. Icard, R. Muskens (Eds.): ESSLLI 2008/2009 Student Sessions, LNAI 6211, pp. 72–86, 2010. © Springer-Verlag Berlin Heidelberg 2010

Formal Parameters of Phonology


on the circumstances light exhibits particle-like or wave-like properties. But faced with this apparent indeterminacy of theory choice, it is only natural for us to ask if there is a principled way to identify interchangeable theories, i.e. proposals which may seem to have little in common yet are underlyingly the same. This requires developing a metatheory of phonology that uses a finite set of parameters to conclusively determine the equivalence class which a given phonological theory belongs to. This paper is intended to lay the basis for such a metatheory, building on techniques and insights from model-theoretic syntax [24, 35, 36]: I develop a modal logic for the formalization of a particular theory, Government phonology (GP), and then use this modal logic and its connections to neighboring areas, foremost formal language theory, to explore natural extensions and their relation to other approaches in phonology. I feel obliged to point out in advance that I have my doubts concerning the feasibility of a formal theory of phonology that is adequate and insightful on both a linguistic and a mathematical level. But this is a problem all too familiar to mathematical linguists: any mathematically natural class of formal languages allows for constructions that never arise in natural language. For example, assignment of primary word stress is sometimes sensitive to whether a syllable is an odd or an even number of syllables away from the edge of a word (see [10] and my remarks in Sec. 2). Now in order to distinguish between odd and even, phonology has to be capable of counting modulo 2. On the other hand, phenomena that involve counting modulo 3, 4 or 21 — which from a mathematical perspective are just as simple as counting modulo 2 — are unheard of. Thus, the problem of mathematical methods in the realm of language is that their grip tends to be too loose, and the more we try to tighten it, the more difficult it becomes to prove interesting results. 
Undeniably, though, a loose grip is better than no grip at all. I am confident that in attempting to construct the kind of metatheory of phonology I envision, irrespective of any shortcomings it might have, we will gain crucial insights into the core claims about language that are embodied by different phonological assumptions (e.g. computational complexity and memory usage) and how one may translate those claims from one theory into another. Moreover, the explicit logical formalization of linguistic theories makes it possible to investigate various problems in an algorithmic way using techniques from proof theory and model checking. These results are relevant to linguists and computer scientists alike. Linguists get a better understanding of how their claims relate to the psychological reality of language, how the different modules of a given theory interact to yield generalizations, and how they increase the expressivity of a theory (see [32] for such results on optimality theory). To a limited degree, linguists also get the freedom to switch to different theories for specific phenomena without jeopardizing the validity of their framework of choice. Computer scientists, on the other hand, will find that the model-theoretic perspective on phonology eases the computational implementation of linguistic proposals and allows them to gauge their runtime-behavior in advance. Furthermore, they may use the connection between finite model theory and formal language theory to increase the


T. Graf

efficiency of their programs by picking the weakest phonological theory that is expressive enough for the task at hand. This paper is divided into two parts as follows. First, I introduce GP as an example of a weak theory of phonology and show how it can be axiomatized as a theory of richly annotated string structures using modal logic. In the second part, I analyze several parameters that distinguish GP from other proposals and might have an effect on generative capacity. In particular, I discuss how increasing the power of GP’s spreading operation moves us along the subregular hierarchy and why the specifics of the feature system have no effect on expressivity in general. I close with a short discussion of two important areas of future research, the impact of the syllable template on generative capacity and the relation between derivational and representational theories. The reader is expected to have some basic familiarity with phonology, formal language theory, non-classical logics and model-theoretic syntax. There is an abundance of introductory material for the former three, while the latter is cogently summarized in [34] and [35].

1 A Weak Theory of Phonology: Government Phonology

1.1 Informal Overview

Due to space restrictions, I offer but a sketch of the main ideas of Government Phonology (GP). More readily accessible expositions may be found in the User’s Guide to Government Phonology [20] and related work of mine [10, 11]. To compensate for the terseness, the reader may want to check the explanation against the examples in Fig. 1.

Before we go in medias res, though, a note on my sources is in order. Just like Government-and-Binding theory [4], GP has changed a lot since its inception and practitioners hardly ever fully specify the details of the version of GP they use. However, there seems to be a consensus that a GP-variant is considered canonical if it incorporates the following modules: government, the syllable template, coda licensing and the ECP from [21], magic licensing from [19], and licensing constraints and the revised theory of elements from [20]. My strategy will be to follow the definitions in [20] as closely as possible and fill in any gaps using the literature just cited.

In GP, the carrier of all phonological structure is the skeleton, a finite, linearly ordered sequence of nodes (depicted by little crosses in Fig. 1) to which phonological expressions (PEs) can be attached in order to form the melody of the structure. A PE is built from a set E of privative features called elements, yielding a pair ⟨O, H⟩, where O ⊆ E is a set of operators, H ∈ E ∪ {∅} the head, and H ∉ O. It is an open empirical question how many features are needed for an adequate account of phonological behavior [13, 14]; recent incarnations usually set E := {A, I, U, H, L, ʔ}, but for our axiomatization the only requirement is for E to be finite. Some examples of PEs are ⟨{A, H}, ∅⟩, ⟨{L, ʔ}, A⟩, ⟨∅, ∅⟩, ⟨{I}, ∅⟩, and ⟨∅, I⟩, the last of which names both a vowel and a consonant. The set of licit PEs is

Fig. 1. Some phonological structures in GP (with IPA notation)

further restricted by language-specific licensing constraints, i.e. restrictions on the co-occurrence of features and their position in the PE. Common licensing constraints are for A to occupy only head positions, ruling out ⟨{A, H}, ∅⟩ in the list above, and for I and U not to occur in the same PE, ruling out the typologically uncommon ⟨{U}, I⟩ and ⟨{I}, U⟩, among others. As witnessed by ⟨∅, I⟩, which names both a vowel and a consonant, every PE is inherently underspecified; whether it is realized as a consonant or a vowel depends on its position in the structure, which is annotated with constituency information. An expression is realized as a vowel if it is associated to a skeleton node contained by a nucleus (N), but as a consonant if the node is contained by an onset (O) or a coda (C). Every N constitutes a rhyme (R), with C an optional subconstituent of R. All O, N and R may branch, that is, be associated to up to two skeleton nodes, but a branching R must not contain a branching N. Furthermore, word-initial O can be floated, i.e. be associated to no node at all. The number of PEs per node is limited to one, with the exception of unary branching N, where the limit is two (to model light diphthongs). All phonological structures are obtained from concatenating ⟨O, R⟩ pairs according to constraints imposed by two government relations. Constituent government restricts the distribution of elements within a constituent, requiring that the leftmost PE licenses all other constituent-internal PEs. Transconstituent government enforces dependencies between the constituents themselves. In particular, every branching O has to be licensed by the N immediately following it, and every C has to be licensed by the PE contained in the immediately following O.
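The notion of a PE and the two licensing constraints just mentioned can be made concrete in a few lines of Python. This is only a minimal sketch under stated assumptions: the class name `PE`, the helper names and the reduced element inventory are illustrative choices that do not come from the GP literature, and the empty head is encoded as `None`.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Assumed inventory for illustration only; any finite set would do.
ELEMENTS = frozenset({"A", "I", "U", "H", "L"})


@dataclass(frozen=True)
class PE:
    """A phonological expression: a set of operators plus an optional head."""
    operators: FrozenSet[str]
    head: Optional[str] = None  # None encodes the empty head

    def __post_init__(self):
        if not self.operators <= ELEMENTS:
            raise ValueError("operators must be drawn from the element set")
        if self.head is not None and self.head not in ELEMENTS:
            raise ValueError("head must be an element or empty")
        if self.head in self.operators:
            raise ValueError("the head may not also be an operator")


def a_only_as_head(pe: PE) -> bool:
    """Licensing constraint: A may occupy only the head position."""
    return "A" not in pe.operators


def no_i_with_u(pe: PE) -> bool:
    """Licensing constraint: I and U may not occur in the same PE."""
    present = set(pe.operators) | ({pe.head} if pe.head else set())
    return not {"I", "U"} <= present


def is_licit(pe: PE, constraints=(a_only_as_head, no_i_with_u)) -> bool:
    """A PE is licit if it passes every language-specific constraint."""
    return all(check(pe) for check in constraints)
```

Since licensing constraints are language-specific, they are passed in as a tuple of predicates rather than hard-wired into the class.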
Even though the precise licensing conditions are not fully worked out for either government relation, the general hypothesis is that PE_i licenses PE_j iff PE_i is leftmost in its constituent and contained by N, or leftmost in its constituent and composed from at most as many elements as PE_j and licenses no PE_k ≠ PE_j


(hence any C has to be followed by a non-branching O, but a branching O might be followed by a branching N or R).

GP also features empty categories: a segment does not have to be associated to a PE. Inside a unary branching O, an unassociated node will always be mapped to the empty string. Inside N, on the other hand, it is either mapped to the empty string or the language-specific realization of the PE ⟨{∅}, ∅⟩. This is determined by the phonological ECP, which allows only p-licensed N to be mapped to the empty string. N is p-licensed if it is followed by a coda containing a sibilant (magic licensing), or in certain languages if it is the rightmost segment of the string (final empty nucleus, abbreviated FEN), or if it is properly governed [18]. N is properly governed if the first N following it is not p-licensed and no government relations hold between or within any Cs or Os in-between the two Ns. Note that segments inside C or a branching O always have to be associated to a PE.

Finally, GP allows elements to spread, just as in fully autosegmental theories [9]. All elements, though, are assumed to share a single tier, and association lines are allowed to cross. The properties of spreading have not been explicitly spelled out in the literature, but it is safe to assume that it can proceed in either direction and might be optional or obligatory, depending on the element, its position in the string and the language in question. While there seem to be restrictions on the set of viable targets given a specific source, the only canonical one is a ban against spreading within a branching O.

1.2 Formalization in Modal Logic

For my formalization, I use a very weak modal logic that can be thought of as the result of removing the “sometime in the future” and “sometime in the past” modalities from restricted temporal logic [6, 7]. Naturally, the tree model property of modal logic implies that the logic is too weak to define the intended class of models, so we are indeed dealing with a formal description rather than a proper axiomatization. Let E be some non-empty finite set of basic elements different from the neutral element v, which represents the empty set of GP’s feature calculus. We define the set of elements 𝓔 := (E × {1, 2} × {head, operator} × {local, spread}) ∪ ({v} × {1, 2} × {head, operator} × {local}). The intended role of the head/operator and local/spread parameter is to distinguish elements according to their position in the PE and whether they arose from a spreading operation, respectively. The second projection is of very limited use and required only by GP’s rendition of light diphthongs as two PEs associated to one node in the structure. The set of melodic features M := 𝓔 ∪ {μ, fake, ✓} will be our set of propositional variables. The intention is for μ (mnemonic for mute) and ✓ to mark unpronounced and licensed segments, respectively, while fake denotes an unassociated onset. For the sake of increased readability, the set of propositional variables is “sorted” such that x ∈ M is represented by m, m ∈ 𝓔 by e, heads by h, and operators by o. The variable e_n is taken to stand for any element such that π_2(e) = n, where π_i(x) returns the ith projection of x. On rare occasions, I will write e̲ (underlined) and e for a specific element e in head and operator position, respectively.
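The construction of the element set from the basic elements is a plain Cartesian product and can be spelled out directly. The following sketch assumes tuples as the representation of elements; the function names `element_set` and `pi` are hypothetical helpers introduced for illustration.

```python
from itertools import product


def element_set(basic, neutral="v"):
    """Build the element set from basic elements and the neutral element.

    Every element is a tuple (name, index, position, origin).
    """
    if neutral in basic:
        raise ValueError("the neutral element must differ from all basic elements")
    regular = set(product(basic, (1, 2), ("head", "operator"), ("local", "spread")))
    # The neutral element never arises from spreading, so it is always local.
    neutral_part = set(product((neutral,), (1, 2), ("head", "operator"), ("local",)))
    return regular | neutral_part


def pi(x, i):
    """Return the i-th projection of x (counting from 1)."""
    return x[i - 1]
```

For a set of k basic elements this yields 8k + 4 elements, which makes the finiteness of the set of propositional variables easy to see.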


Furthermore, there are three nullary modalities,¹ N, O, C, the set of which is designated by S, read skeleton. In addition, we introduce two unary diamond operators ◁ and ▷, whose duals are denoted by ◀ and ▶. The set of well-formed formulas is built up in the usual way from M, S, ◁, ▷, → and ⊥. Our intended models 𝔐 := ⟨𝔉, V⟩ are built over bidirectional frames 𝔉 := ⟨D, {R_i}_{i∈S}, R_◁⟩, where D is an initial subset of ℕ, R_i ⊆ D for each i ∈ S, and R_◁ is the successor function over ℕ. The valuation function V : M → ℘(D) maps propositional variables to subsets of D. The definition of satisfaction is standard, though it should be noted that our models are “numbered from right to left”. That is to say, 0 ∈ D marks the right edge of a structure and n + 1 is to the left of n. This is due to GP’s transconstituent government being computed from right to left.

\[
\begin{array}{lcl}
\mathfrak{M}, w \models \bot & & \text{never} \\
\mathfrak{M}, w \models p & \text{iff} & w \in V(p) \\
\mathfrak{M}, w \models \neg\varphi & \text{iff} & \mathfrak{M}, w \not\models \varphi \\
\mathfrak{M}, w \models \varphi \wedge \psi & \text{iff} & \mathfrak{M}, w \models \varphi \text{ and } \mathfrak{M}, w \models \psi \\
\mathfrak{M}, w \models N & \text{iff} & w \in R_N \\
\mathfrak{M}, w \models O & \text{iff} & w \in R_O \\
\mathfrak{M}, w \models C & \text{iff} & w \in R_C \\
\mathfrak{M}, w \models \triangleleft\varphi & \text{iff} & \mathfrak{M}, w + 1 \models \varphi \\
\mathfrak{M}, w \models \triangleright\varphi & \text{iff} & \mathfrak{M}, w - 1 \models \varphi
\end{array}
\]
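The satisfaction clauses translate almost verbatim into a recursive evaluator. The sketch below is an illustration under assumptions of my own: a structure is given as a list of label sets written left to right, formulas are nested tuples, and the tags "left" and "right" name the diamond that inspects position w + 1 and w − 1, respectively.

```python
from typing import List, Set, Tuple, Union

Formula = Union[str, Tuple]  # "bot", a proposition name, or a tagged tuple


def holds(nodes: List[Set[str]], w: int, phi: Formula) -> bool:
    """Evaluate phi at position w; positions count from the right edge (w = 0)."""
    n = len(nodes)
    if phi == "bot":
        return False
    if isinstance(phi, str):
        # Propositional variables and the constituent labels N, O, C are
        # both looked up in the label set of the node at position w.
        return phi in nodes[n - 1 - w]
    op = phi[0]
    if op == "not":
        return not holds(nodes, w, phi[1])
    if op == "and":
        return holds(nodes, w, phi[1]) and holds(nodes, w, phi[2])
    if op == "left":   # diamond for w + 1, the left neighbour
        return w + 1 < n and holds(nodes, w + 1, phi[1])
    if op == "right":  # diamond for w - 1, the right neighbour
        return w - 1 >= 0 and holds(nodes, w - 1, phi[1])
    raise ValueError(f"unknown operator: {op!r}")


def box(direction: str, phi: Formula) -> Formula:
    """Dual of a diamond: not <direction> not phi."""
    return ("not", (direction, ("not", phi)))
```

With this encoding, `box("left", "bot")` is true exactly at the left edge of a structure, since no position exists further to the left.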

With the logic fully defined, we can turn to the axioms for GP. The formalization of the skeleton is straightforward if one models binary branching constituents as two adjacent unary branching ones and views rhymes as mere notational devices. Recall that Ns containing light diphthongs are implemented as a single N with both e_1 and e_2 elements associated to it.

\[
\begin{array}{lll}
\text{S1} & \bigwedge_{i \in S} \bigl(i \leftrightarrow \bigwedge_{i \neq j \in S} \neg j\bigr) & \text{Unique constituency} \\
\text{S2} & (\blacktriangleleft\bot \rightarrow O) \wedge (\blacktriangleright\bot \rightarrow N) & \text{Word edges} \\
\text{S3} & R \leftrightarrow (N \vee C) & \text{Definition of rhyme} \\
\text{S4} & N \rightarrow \triangleleft O \vee \triangleleft N & \text{Nucleus placement} \\
\text{S5} & O \rightarrow \neg\triangleleft O \vee \neg\triangleright O & \text{Binary branching onsets} \\
\text{S6} & R \rightarrow \neg\triangleleft R \vee \neg\triangleright R & \text{Binary branching rhymes} \\
\text{S7} & C \rightarrow \triangleleft N \wedge \triangleright O & \text{Coda placement}
\end{array}
\]

GP’s feature calculus is also easy to capture. A propositional formula φ over a set of variables x_1, . . . , x_k is called exhaustive iff φ := ⋀_{1≤i≤k} ψ_i, where for every i, ψ_i is either x_i or ¬x_i. A PE φ is an exhaustive propositional formula over 𝓔 such that φ ∪ {F1, F2, F3, F4, h, o} is consistent.

¹ I follow the terminology of [1] here. Nullary modalities correspond to unary relations and can hence be thought of as propositional constants. As far as I can see, nothing hinges on whether we treat constituent labels as nullary modalities, propositional constants, or propositional variables; my motivation in separating them from phonological features stems solely from the parallel distinction between melody and constituency in GP.


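\[
\begin{array}{lll}
\text{F1} & h_n \rightarrow \bigwedge_{h'_n \neq h_n} \neg h'_n & \text{Exactly one head} \\
\text{F2} & \neg v \rightarrow \bigl(h_n \rightarrow \bigwedge_{\pi_1(h) = \pi_1(o)} \neg o_n\bigr) & \text{No basic element (except } v \text{) twice} \\
\text{F3} & v \rightarrow \bigwedge_{o \neq v} \neg o & v \text{ excludes other operators} \\
\text{F4} & (e_2 \rightarrow h_1 \wedge o_1) & \text{Pseudo branching implies first branch}
\end{array}
\]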

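Let PH be the least set containing all PEs (noting that a PE is now a particular kind of propositional formula), and let lic : PH → ℘(PH) map every PE to its set of melodic licensors. Furthermore, S ⊆ PH designates the set of PEs occurring in the codas of magic licensing configurations (the letter S is mnemonic for “sibilants”; it is this set, not the skeleton, that the index i ranges over in M4 and L4 below). The following five axioms, then, sufficiently restrict the melody.

\[
\begin{array}{lll}
\text{M1} & \bigvee_{i \in S} i \rightarrow \bigvee_{\varphi \in PH} \varphi \vee \mu \vee \mathit{fake} & \text{Universal annotation} \\
\text{M2} & ((O \vee \triangleleft N \vee \triangleright N) \rightarrow \neg e_2) & \text{No pseudo branching for O, C \& branching N} \\
\text{M3} & O \wedge \triangleleft O \rightarrow \bigwedge_{\varphi \in PH} \bigl(\varphi \rightarrow \bigvee_{\psi \in lic(\varphi)} \triangleleft\psi\bigr) & \text{Licensing within branching onsets} \\
\text{M4} & C \wedge \bigwedge_{i \in S} \neg i \rightarrow \neg\mu \wedge \bigwedge_{\varphi \in PH} \bigl(\varphi \rightarrow \bigvee_{\psi \in lic(\varphi)} \triangleright\psi\bigr) & \text{Melodic coda licensing} \\
\text{M5} & \mathit{fake} \rightarrow O \wedge \bigwedge_{m \neq \mathit{fake}} \neg m & \text{Fake onsets}
\end{array}
\]

Remember that GP allows languages to impose further restrictions on the melody by recourse to licensing constraints. It is easy to see that licensing constraints operating on single PEs can be captured by propositional formulas. The licensing constraint “A must be head”, for instance, corresponds to the propositional formula ¬A. Licensing constraints that extend beyond a single segment can be modeled using ◁ and ▷, provided their domain of application is finitely bounded (see the discussion on spreading below for further details). Thus licensing constraints pose no obstacle to formalization in our logic, either.

As mentioned above, I use μ to mark “mute” segments that will be realized as the empty string. The distribution of μ is simple for O and C: the latter never allows it, and the former only if it is unary branching and followed by a pronounced N. For N, on the other hand, we first need to distribute ✓ in a principled manner across the string to mark the licensed nuclei, i.e. those N that may remain unpronounced. Note that unpronounced segments may not contain any other elements (which would affect spreading).

\[
\begin{array}{lll}
\text{L1} & \mu \rightarrow \bigwedge_{m \notin \{\mu, \checkmark\}} \neg m \wedge \neg C \wedge (N \rightarrow \checkmark) & \text{Empty categories} \\
\text{L2} & N \wedge \triangleleft N \rightarrow (\mu \leftrightarrow \triangleleft\mu) & \text{No partially mute branching nuclei} \\
\text{L3} & O \wedge \mu \rightarrow \neg\triangleleft O \wedge \triangleright(N \wedge \neg\mu) & \text{Mute onsets} \\
\text{L4} & N \wedge \checkmark \leftrightarrow \underbrace{\triangleright\bigl(C \wedge \bigvee_{i \in S} i\bigr)}_{\text{Magic Licensing}} \vee \underbrace{(\neg\triangleleft N \wedge \blacktriangleright\bot)}_{\text{FEN}} \vee {} & \text{P-licensing} \\
 & \underbrace{\bigl((\neg\triangleleft N \rightarrow (\triangleleft\triangleleft N \vee \blacktriangleleft\blacktriangleleft\bot)) \wedge (\neg\triangleright N \rightarrow \triangleright\triangleright(N \wedge \neg\mu))\bigr)}_{\text{Proper Government}} &
\end{array}
\]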


Axiom L4 looks daunting at first, but it is easy to unravel. The magic licensing condition tells us that N is licensed if it is followed by a sibilant in coda position.² The FEN condition ensures that word-final N are licensed if they are non-branching. The proper government condition is the most complex one, though it is actually simpler than the original GP definition. Remember that N is properly governed if the first N following it is pronounced and neither a branching onset nor a coda intervenes. Also keep in mind that we treat a binary branching constituent as two adjacent unary branching constituents. The proper government condition then enforces a structural requirement such that N (or the first N if we are talking about two adjacent N) may not be preceded by two constituents that are not N and (the second N) may not be followed by two constituents that are not N or not pronounced. Together with axioms S1–S7, this gives the same results as the original constraint.³

The last module, spreading, is also the most difficult to accommodate. Most properties of spreading are language specific; only the set of spreadable features and the ban against onset-internal spreading are universal. To capture this variability, I define a general spreading scheme σ with six parameters i, j, ω, ω̄, min and max.

\[
\sigma := \bigwedge_{\pi_1(i) = \pi_1(j)} \Bigl( i \wedge \omega \rightarrow \bigvee_{n = \mathit{min}}^{\mathit{max}} \blacklozenge^n (j \wedge \overline{\omega}) \wedge \bigl( O \wedge \blacklozenge O \rightarrow \bigvee_{n = \mathit{min}+1}^{\mathit{max}} \blacklozenge^n (j \wedge \overline{\omega}) \bigr) \Bigr)
\]

The variables i, j ∈ 𝓔, coupled with judicious use of the formulas ω and ω̄, regulate the optionality of spreading. If spreading is optional, i is a spread element and ω, ω̄ are formulas describing, respectively, the structural configuration of the target of spreading and the set of licit sources for spreading operations to said target. If spreading is mandatory, then i is a local element and ω, ω̄ describe the source and the set of targets. If we want spreading to be mandatory only in those cases where a target is actually available, ω has to contain the subformula ⋁_{n=min}^{max} ♦ⁿ ω̄. Observe moreover that we need to make sure that every structural configuration is covered by some ω, so that unwanted spreading can be blocked by making ω̄

² Note that we can easily restrict the context, if this appears to be necessary for empirical reasons. Strengthening the condition to ▷(C ∧ ⋁_{i∈S} i) ∧ ◁◀⊥, for example, restricts magic licensing to the N occupying the second position in the string.

³ In this case, the modal logic is once again flexible enough to accommodate various alternatives. For instance, if proper government should be limited to non-branching Ns, one only has to replace both occurrences of → by ∧. Also, my formalization establishes no requirement for a segment to remain silent, because N often are pronounced in magic licensing configurations or at the end of a word in a FEN language. For proper government, however, it is sometimes assumed that licensed nuclei have to remain silent, giving rise to a strictly alternating pattern of realized and unrealized Ns. If we seek to accommodate such a system, we have to distinguish Ns that are magically licensed or FEN licensed from Ns that are licensed by virtue of being properly governed. The easiest way to do so is to split ✓ into two features ✓_o and ✓_m (optional and mandatory), the latter of which is reserved for properly governed Ns. The simple formula ✓_m → μ will force such Ns to remain unpronounced.


not satisfiable. As further parameters, the finite values min, max > 0 encode the minimum and maximum distance of spreading, respectively. Finally, the operator ♦ ∈ {◁, ▷} fixes the direction of spreading for the entire formula (♦ⁿ is the n-fold iteration of ♦). With optional spreading, the direction of the operator is opposite to the direction of spreading, otherwise they are identical. The different ways in which the parameters interact are summarized in Table 1.

Table 1. Parameterization of spreading patterns with respect to σ

Mode       Direction  i       ω       ω̄       ♦
optional   left       spread  target  source  ▷
optional   right      spread  target  source  ◁
mandatory  left       local   source  target  ◁
mandatory  right      local   source  target  ▷
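The optional rows of Table 1 can be illustrated with a small checker. This is a deliberately simplified sketch rather than an implementation of σ itself: it ignores the structural formulas ω and ω̄, accepts only local elements as sources, and represents a node as a pair of a constituent label and a set of (element, origin) pairs. All names are hypothetical.

```python
from typing import List, Set, Tuple

# (constituent label, {(basic element, "local" | "spread")})
Node = Tuple[str, Set[Tuple[str, str]]]


def optional_spreading_ok(nodes: List[Node], element: str, step: int,
                          min_dist: int, max_dist: int) -> bool:
    """Check that every spread copy of `element` has a local source.

    `nodes` is listed left to right. `step` is the direction in which the
    target looks for its source: +1 searches rightwards (leftward spreading),
    -1 searches leftwards (rightward spreading).
    """
    for pos, (label, melody) in enumerate(nodes):
        if (element, "spread") not in melody:
            continue
        lo = min_dist
        neighbour = pos + step
        # Ban on onset-internal spreading: inside a branching onset the
        # nearest admissible source is one step further away.
        if label == "O" and 0 <= neighbour < len(nodes) and nodes[neighbour][0] == "O":
            lo = min_dist + 1
        found = any(
            0 <= pos + step * n < len(nodes)
            and (element, "local") in nodes[pos + step * n][1]
            for n in range(lo, max_dist + 1)
        )
        if not found:
            return False
    return True
```

As in the table, the search direction is opposite to the direction of spreading, because the condition is stated from the point of view of the target.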

   

As the astute reader (or rather, all readers who took a glimpse at footnotes 2 and 3) will have noticed by now, nothing in our logic prevents us from defining alternative versions of GP. Whether this is a welcome state of affairs is a matter of perspective. On the one hand, the flexibility of our logic ensures its applicability to a wide range of different variants of GP, e.g. to versions where spreading is allowed within onsets or where the details of proper government and the restrictions on branching vary. On the other hand, it raises the question whether there isn’t an even weaker modal logic that is still expressive enough to formalize GP. However, the basic feature calculus of GP already requires the logical symbols ¬ and ∧, which gives us the complete set of logical connectives, and we furthermore need ◁ and ▷ to move us along the phonological string. Hence, imposing any further syntactic restrictions on formulas requires advanced technical concepts such as the number of quantifier alternations. But this brings us back to an issue I discussed in the preface to this section: the loose grip of mathematical methods, and why it isn’t as problematic as it might seem initially. Lest I unnecessarily bore the reader with methodological remarks, I shall merely point out that it is doubtful that a further weakening of the logic would have interesting ramifications given the questions I set out to answer; I am not interested in the logic that provides the best fit for a specific theory but in the investigation of entire classes of string-based phonological theories from a model-theoretic perspective. In the next section, I try to get closer to this goal.

2 The Parameters of Phonological Theories

2.1 Elaborate Spreading: Increasing the Generative Capacity

It is easy to see that the modal logic defined in the previous section is powerful enough to account for all finitely bounded phonological phenomena (I hasten to add that this does not imply that GP itself can account for all of them, since


certain phenomena might be ruled out by, say, the syllable template or the ECP). In fact, it is even possible to accommodate many long-distance phenomena in a straightforward way, provided that they can be reinterpreted as arising from iterated application of finitely bounded processes or conditions. Consider for example a stress rule for language L that assigns primary stress to the last syllable that is preceded by an even number of syllables. Assume furthermore that secondary stress in L is trochaic, that is to say it falls on every odd syllable but the last one. Let 1 and 2 stand for primary and secondary stress, respectively. Unstressed syllables are assigned the feature 0. Then the following formula will ensure the correct assignment of primary stress, even though the notion of being separated from the left word edge by an even number of syllables is unbounded (for the sake of simplicity, I assume that every node in the string represents a syllable; it is an easy but unenlightening exercise to rewrite the formula for a GP syllable template consisting of Os, Ns and Cs).

\[
\bigvee_{i \in \{0,1,2\}} i \;\wedge \bigwedge_{i \neq j \in \{0,1,2\}} (i \rightarrow \neg j) \wedge (\blacktriangleleft\bot \rightarrow 1 \vee 2) \wedge (2 \rightarrow \triangleright 0) \wedge (0 \rightarrow \triangleright(1 \vee 2) \vee \blacktriangleright\bot) \wedge (1 \rightarrow \neg\triangleright 1 \wedge (\blacktriangleright\bot \vee \triangleright\blacktriangleright\bot))
\]

Other seemingly unbounded phenomena arising from iteration of local processes, most importantly vowel harmony (see [3] for a GP analysis), can be captured in a similar way. However, there are several unbounded phonological phenomena that require increased expressivity, as I discuss in detail in [10]. Since we are only concerned with string structures, it is a natural move to try to enhance our language with operators from more powerful string logics, in particular, linear temporal logic.

The first step is the addition of two operators ◁⁺ and ▷⁺ with the corresponding relation R_◁⁺, the transitive closure of R_◁. This new logic is exactly as powerful as restricted temporal logic [6], which in turn has been shown to exactly match the expressivity of the two-variable fragment of first-order logic ([7]; see [44] for further equivalence results). Among other things, unbounded OCP effects [9, 26] can now be captured in an elegant way. The formula O ∧ A ∧ L ∧ ʔ → ▶⁺¬(O ∧ A ∧ ʔ), for example, disallows alveolar nasals to be followed by another alveolar stop, no matter how far the two are apart. But ◁⁺ and ▷⁺ are too coarse for faithful renditions of unbounded spreading. For example, it is not possible to define all intervals of arbitrary size within which a certain condition has to hold (e.g. no b may appear between a and c). As a remedy, we can add to the logic the until and since operators U and S familiar from linear temporal logic, granting us the power of full first-order logic and pushing us to the level of the star-free languages [5, 6, 29, 41]. Star-free languages feature a plethora of properties that make them very attractive for purposes of natural language processing.
Moreover, the only phenomenon known to the author that exceeds their confines is stress assignment in Cairene Arabic and Creek, which basically works like the stress assignment system outlined above — with the one exception that secondary stress is not marked overtly [12, 30]. Under these conditions, assigning primary stress involves counting modulo 2,


which is undefinable in first-order logic, whence a more powerful logic is needed. The next step up from the star-free stringsets are the regular stringsets, which can count modulo n. The regular stringsets are identical to the sets of finite strings definable in monadic second order logic (MSO) [2], linear temporal logic with modal fixed point operators [43] or regular linear temporal logic [27]. In linguistic terms, this corresponds to spreading being capable of picking its target based on more elaborate patterns, counting modulo 2 being one of them. For further discussion of the relation between expressivity and phenomena in natural language phonology, the reader is once again referred to [10]. A caveat is in order, though. Thatcher [40] proved that every recognizable set is a projection of some local set. Thus the hierarchy outlined above collapses if we grant ourselves an arbitrary number of additional features to encode all the structural properties our logic cannot express. In the case of primary stress in Cairene Arabic and Creek, for instance, we could just use the feature for secondary stress assignment even though secondary stress seems to be absent in these languages. Generally speaking, we can reinterpret any unbounded dependency as a result of iterated local processes by using “invisible” features. Therefore, all claims about generative capacity hold only under the proviso that all such coding-features are being eschewed. We have just seen that the power of GP can be extended along the subregular hierarchy, up to the power of regular languages, and that there seems to be empirical motivation to do so. Interestingly, it has been observed that SPE yields regular languages, too [15, 17]. But even the most powerful rendition of GP defines only a proper subset of the stringsets derivable in SPE, apparently due to its restrictions on the feature system, the syllable template and its government requirements. 
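The coding trick behind this caveat can be checked by brute force on short strings. The sketch below is an illustration of my own and not the formula from Sect. 2.1: the window constraints in `locally_ok` are a simplified stand-in over the hidden alphabet {0, 1, 2}, each inspecting at most three adjacent positions (counting the word edge), and `counts_mod_2` is a direct definition of the visible pattern that relies on parity.

```python
from itertools import product


def locally_ok(s: str) -> bool:
    """Bounded-window constraints over the hidden alphabet {0, 1, 2}."""
    n = len(s)
    if n == 0 or s[0] not in "12":
        return False
    for i, c in enumerate(s):
        # Secondary stress is followed by an unstressed, non-final syllable.
        if c == "2" and not (i + 2 < n and s[i + 1] == "0"):
            return False
        # An unstressed syllable is final or followed by a stressed one.
        if c == "0" and not (i == n - 1 or s[i + 1] in "12"):
            return False
        # Primary stress is final, or penultimate before an unstressed syllable.
        if c == "1" and not (i == n - 1 or (i == n - 2 and s[i + 1] == "0")):
            return False
    return True


def project(s: str) -> str:
    """Erase the hidden feature: secondary stress surfaces as unstressed."""
    return s.replace("2", "0")


def counts_mod_2(s: str) -> bool:
    """Exactly one primary stress, located on the last odd-numbered syllable."""
    n = len(s)
    if n == 0:
        return False
    target = n - 1 if n % 2 == 1 else n - 2  # 0-based index of that syllable
    return all((c == "1") == (i == target) for i, c in enumerate(s))


def projected_language(n: int) -> set:
    """Project every locally well-formed hidden string of length n."""
    hidden = ("".join(t) for t in product("012", repeat=n))
    return {project(s) for s in hidden if locally_ok(s)}


def direct_language(n: int) -> set:
    """Collect every visible string of length n that passes the parity test."""
    return {"".join(t) for t in product("01", repeat=n) if counts_mod_2("".join(t))}
```

Comparing `projected_language(n)` with `direct_language(n)` for small n shows the two sets coinciding, which is the sense in which an invisible secondary stress feature lets purely local constraints simulate counting modulo 2.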
The question we face, then, is whether we can generalize GP in these regards, too, to push it to the full power of SPE and obtain a multidimensional vector space of phonological theories.

2.2 Feature Systems

It is easy to see that at the level of classes of theories, the restriction to privative features is immaterial. A set of PEs is denoted by some propositional formula over E, and the boolean closure of E is isomorphic to ℘(E). But as shown in [22], a binary feature system using a set of features F can be modeled by the powerset algebra ℘(F), too. So if |E| = |F|, then ℘(E) and ℘(F) are isomorphic, and so are the two feature systems. The same result holds for systems using more than two feature values, provided their number is finitely bounded, since multivalued features can be replaced by a collection of binary valued features given sufficient co-occurrence restrictions on feature values (which can easily be formalized in propositional logic). One might argue, though, that the core restriction of privative feature systems does not arise from the feature system itself but from the methodological principle that absent features, i.e. negative feature values, behave like constituency information and cannot spread. In general, though, this is not a substantial restriction either, as for every privative feature system E we can easily design a


privative feature system F := {e⁺, e⁻ | e ∈ E} such that 𝔐, w ⊨ e⁺ iff 𝔐, w ⊨ e and 𝔐, w ⊨ e⁻ iff 𝔐, w ⊨ ¬e. Crucially, though, this does not entail that the methodological principle described above has no impact on expressivity when the set of features is fixed across all theories, which is an interesting issue for future research.

2.3 Syllable Template

While GP’s syllable template could in principle be generalized to arbitrary numbers and sizes of constituents, a look at competing theories such as SPE and CVCV [28, 38] shows that the number of different constituents is already more than sufficient. This is hardly surprising, because GP’s syllable template is modeled after the canonical syllable template, which isn’t commonly considered to be in need of further refinement. Consequently, we only need to lift the restriction on the branching factor and allow theories not to use all three constituent types. SPE then operates with a single N constituent of unbounded size (as no segment in SPE requires special licensing, just like Ns in GP), whereas CVCV uses N and O constituents of size 1. Regarding the government relations, the idea is to let every theory fix the branching factor b for each constituent and the maximum number l of licensees per head. Every node within some constituent has to be constituent licensed by the head, i.e. the leftmost node of said constituent. Similarly, all nodes in a coda or non-head position have to be transconstituent licensed by the head of the following constituent. For every head the number of constituent licensees and transconstituent licensees, taken together, may not exceed l. Even from this basic sketch it should already be clear that the syllable template can have a negative impact on expressivity, but only under the right conditions. For instance, if our feature system is set up in a way such that every symbol of our alphabet is to be represented by a PE in N (as happens to be the case for SPE), restrictions on b and l are without effect. Thus one of the next stages in this project will revolve around determining under which conditions the syllable template has a monotonic effect on generative capacity.

2.4 Representations versus Derivations

One of the most striking differences between phonological theories is the distinction between representational and derivational ones, which raises the question of how we can ensure comparability between these two classes. Representational theories are naturally captured by the declarative, model-theoretic approach, whereas derivational theories like SPE are usually formalized as regular relations [17, 31], which resist being recast in logical terms due to their closure properties. This problem is aggravated by the fact that Optimality Theory [33], which provides the predominant framework in contemporary phonology, is also best understood in terms of regular relations [8, 16]. Of course, one can use a coding trick from two-level phonology [23] and use an unpronounced feature like μ to ensure that all derivationally related strings have the same length, so that the regular relations can be interpreted as languages over pairs and hence cast in MSO terms [42]. Unfortunately, it is far from obvious how this method could be extended to subregular grammars, because Thatcher’s theorem tells us that the projection of a subregular language of pairs might be a regular language. But due to the ubiquity of SPE and OT analyses in phonology, no other open issue is of greater importance to the success of this project.
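The coding trick can be illustrated with a minimal Python sketch. It assumes that padding is added on the right and that symbols are single characters; the function names and the example strings are hypothetical and only serve to show how two related strings of different length become one string over pairs.

```python
PAD = "μ"  # unpronounced padding symbol, standing in for the feature μ


def pad_pair(underlying: str, surface: str, pad: str = PAD) -> list[tuple[str, str]]:
    """Right-pad the shorter string with `pad` and return one string of pairs."""
    width = max(len(underlying), len(surface))
    return list(zip(underlying.ljust(width, pad), surface.ljust(width, pad)))


def strip_padding(symbols, pad: str = PAD) -> str:
    """Project one component back to a plain string by dropping the padding."""
    return "".join(s for s in symbols if s != pad)


# Hypothetical pair of derivationally related strings of unequal length.
pairs = pad_pair("kat", "kats")
upper = strip_padding(u for u, _ in pairs)
lower = strip_padding(s for _, s in pairs)
```

Both projections recover the original strings, so nothing is lost by moving to the equal-length encoding; the open problem described above concerns what happens to subregular classes under such projections, which this sketch does not address.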

3 Conclusion

The purpose of this paper was to lay the foundation for a general framework in which string-based phonological theories can be matched against each other. I started out with a modal logic which despite its restrictions was still perfectly capable of defining a rather advanced and intricate phonological theory. I then tried to generalize the theory along several axes, some of which readily lent themselves to conclusive results while others didn’t. We saw that the power of spreading, by virtue of being an indicator of the necessary power of the description language, has an immediate and monotonic effect on generative capacity. Feature systems, on the other hand, were shown to be a negligible factor in theory comparisons; it remains an open question whether the privativity assumption might affect generative capacity when the set of features is fixed. A detailed study of the effects of the syllable template also had to be deferred to later work. Clearly the most pressing issue, though, is the translation from representational to derivational theories. Not only will it enable us to reconcile two supposedly orthogonal perspectives on phonology, but it will also allow us to harvest results on finite-state OT [8] to extend the framework to Optimality Theory. Even though a lot of work remains to be done and not all of my goals may turn out to be achievable, I am confident that a model-theoretic approach provides an interesting new perspective on long-standing issues in phonology.

References

[1] Blackburn, P., de Rijke, M., Venema, Y.: Modal Logic. Cambridge University Press, Cambridge (2002)
[2] Büchi, J.R.: Weak second-order arithmetic and finite automata. Zeitschrift für Mathematische Logik und Grundlagen der Mathematik 6, 66–92 (1960)
[3] Charette, M., Göksel, A.: Licensing constraints and vowel harmony in Turkic languages. SOAS Working Papers In Linguistics and Phonetics 6, 1–25 (1996)
[4] Chomsky, N.: Lectures on Government and Binding: The Pisa Lectures. Foris, Dordrecht (1981)
[5] Cohen, J.: On the expressive power of temporal logic for infinite words. Theoretical Computer Science 83, 301–312 (1991)
[6] Cohen, J., Perrin, D., Pin, J.E.: On the expressive power of temporal logic. Journal of Computer and System Sciences 46, 271–294 (1993)
[7] Etessami, K., Vardi, M.Y., Wilke, T.: First-order logic with two variables and unary temporal logic. In: Proceedings of the 12th Annual IEEE Symposium on Logic in Computer Science, pp. 228–235 (1997)
[8] Frank, R., Satta, G.: Optimality theory and the generative complexity of constraint violability. Computational Linguistics 24, 307–315 (1998)
[9] Goldsmith, J.: Autosegmental Phonology. Ph.D. thesis, MIT (1976)
[10] Graf, T.: Comparing incomparable frameworks: A model theoretic approach to phonology. In: University of Pennsylvania Working Papers in Linguistics, Article 10, vol. 16 (2010), http://repository.upenn.edu/pwpl/vol16/iss1/10
[11] Graf, T.: Logics of Phonological Reasoning. Master’s thesis, University of California, Los Angeles (2010)
[12] Haas, M.R.: Tonal accent in Creek. In: Hyman, L.M. (ed.) Southern California Occasional Papers in Linguistics, vol. 4, pp. 195–208. University of Southern California, Los Angeles (1977), reprinted in [39]
[13] Harris, J., Lindsey, G.: The elements of phonological representation. In: Durand, J., Katamba, F. (eds.) Frontiers of Phonology, pp. 34–79. Longman, Harlow (1995)
[14] Jensen, S.: Is an element? Towards a non-segmental phonology. SOAS Working Papers In Linguistics and Phonetics 4, 71–78 (1994)
[15] Johnson, C.D.: Formal Aspects of Phonological Description. The Hague, Mouton (1972)
[16] Jäger, G.: Gradient constraints in finite state OT: The unidirectional and the bidirectional case. In: Kaufmann, I., Stiebels, B. (eds.) More than Words. A Festschrift for Dieter Wunderlich, pp. 299–325. Akademie Verlag, Berlin (2002)
[17] Kaplan, R.M., Kay, M.: Regular models of phonological rule systems. Computational Linguistics 20(3), 331–378 (1994)
[18] Kaye, J.: Government in phonology: the case of Moroccan Arabic. The Linguistic Review 6, 131–159 (1990)
[19] Kaye, J.: Do you believe in magic? The story of s+C sequences. Working Papers in Linguistics and Phonetics 2, 293–313 (1992)
[20] Kaye, J.: A user’s guide to government phonology (2000) (unpublished manuscript), http://134.59.31.7/~scheer/scan/Kaye00guideGP.pdf
[21] Kaye, J., Lowenstamm, J., Vergnaud, J.R.: Constituent structure and government in phonology. Phonology Yearbook 7, 193–231 (1990)
[22] Keenan, E.: Mathematical structures in language, ms. University of California, Los Angeles (2008)
[23] Koskenniemi, K.: Two-level morphology: A general computational model for wordform recognition and production. Publication 11 (1983)
[24] Kracht, M.: Syntactic codes and grammar refinement. Journal of Logic, Language and Information 4, 41–60 (1995)
[25] Kracht, M.: Features in phonological theory. In: Löwe, B., Malzkorn, W., Räsch, T. (eds.) Foundations of the Formal Sciences II, Applications of Mathematical Logic in Philosophy and Linguistics. Trends in Logic, vol. 17, pp. 123–149. Kluwer, Dordrecht (2003); papers of a conference held in Bonn (November 11–13, 2000)
[26] Leben, W.: Suprasegmental Phonology. Ph.D. thesis, MIT (1973)
[27] Leucker, M., Sánchez, C.: Regular linear temporal logic. In: Jones, C.B., Liu, Z., Woodcock, J. (eds.) ICTAC 2007. LNCS, vol. 4711, pp. 291–305. Springer, Heidelberg (2007)
[28] Lowenstamm, J.: CV as the only syllable type. In: Durand, J., Laks, B. (eds.) Current Trends in Phonology: Models and Methods, pp. 419–421. European Studies Research Institute, University of Salford (1996)
[29] McNaughton, R., Papert, S.: Counter-Free Automata. MIT Press, Cambridge (1971)
[30] Mitchell, T.F.: Prominence and syllabification in Arabic. Bulletin of the School of Oriental and African Studies 23(2), 369–389 (1960)
[31] Mohri, M., Sproat, R.: An efficient compiler for weighted rewrite rules. In: 34th Annual Meeting of the Association for Computational Linguistics, pp. 231–238 (1996)
[32] Potts, C., Pullum, G.K.: Model theory and the content of OT constraints. Phonology 19(4), 361–393 (2002)
[33] Prince, A., Smolensky, P.: Optimality Theory: Constraint Interaction in Generative Grammar. Blackwell, Oxford (2004)
[34] Pullum, G.K.: The evolution of model-theoretic frameworks in linguistics. In: Rogers, J., Kepser, S. (eds.) Model-Theoretic Syntax @ 10, pp. 1–10 (2007)
[35] Rogers, J.: A model-theoretic framework for theories of syntax. In: Proceedings of the 34th Annual Meeting of the ACL, Santa Cruz, USA, pp. 10–16 (1996)
[36] Rogers, J.: Strict LT2: Regular: Local: Recognizable. In: Retoré, C. (ed.) LACL 1996. LNCS (LNAI), vol. 1328, pp. 366–385. Springer, Heidelberg (1997)
[37] Russell, K.: A Constraint-Based Approach to Phonology. Ph.D. thesis, University of Southern California (1993)
[38] Scheer, T.: A Lateral Theory of Phonology: What is CVCV and Why Should it be? Mouton de Gruyter, Berlin (2004)
[39] Sturtevant, W.C. (ed.): A Creek Source Book. Garland, New York (1987)
[40] Thatcher, J.W.: Characterizing derivation trees for context-free grammars through a generalization of finite automata theory. Journal of Computer and System Sciences 1, 317–322 (1967)
[41] Thomas, W.: Star-free regular sets of ω-sequences. Information and Control 42, 148–156 (1979)
[42] Vaillette, N.: Logical specification of regular relations for NLP. Natural Language Engineering 9(1), 65–85 (2003)
[43] Vardi, M.Y.: A temporal fixpoint calculus. In: Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 250–259 (1988)
[44] Weil, P.: Algebraic recognizability of languages. In: Fiala, J., Koubek, V., Kratochvíl, J. (eds.) MFCS 2004. LNCS, vol. 3153, pp. 149–175. Springer, Heidelberg (2004)

Variable Selection in Logistic Regression: The British English Dative Alternation

Daphne Theijssen

Centre for Language Studies, Radboud University Nijmegen, Erasmusplein 1, 6525 HT Nijmegen, The Netherlands
http://lands.let.ru.nl/~daphne

Abstract. This paper addresses the problem of selecting the ‘optimal’ variable subset in a logistic regression model for a medium-sized data set. As a case study, we take the British English dative alternation, where speakers and writers can choose between two – equally grammatical – syntactic constructions to express the same meaning. With 29 explanatory variables taken from the literature, we build two types of models: one with the verb sense included as a random effect, and one without a random effect. For each type, we build three different models by including all variables and keeping the significant ones, by successively adding the most predictive variable (forward selection), and by successively removing the least predictive variable (backward elimination). Seeing that the six approaches lead to six different variable selections (and thus six different models), we conclude that the selection of the ‘best’ model requires a substantial amount of linguistic expertise.

T. Icard, R. Muskens (Eds.): ESSLLI 2008/2009 Student Sessions, LNAI 6211, pp. 87–101, 2010. © Springer-Verlag Berlin Heidelberg 2010

1 Introduction

There are many linguistic phenomena that researchers have tried to explain on the basis of features on several different levels of description (semantic, syntactic, lexical, etc.), and it can be argued that no single level can account for all observations. Probabilistic modelling techniques can help in combining these partially explanatory features and testing the combination on corpus data. A popular – and rather successful – technique for this purpose is logistic regression modelling. However, how exactly the technique is best employed for this type of research remains an open question.

Statistical models built using corpus data do precisely what they are designed to do: find the ‘best possible’ model for a specific data set given a specific set of explanatory features. The issue that probabilistic techniques model data (while one would actually want to model underlying processes) is only aggravated by the fact that the variables are usually not mutually independent. As a consequence, one set of data and explanatory features can result in different models, depending on the details of the model building process.

Building a regression model consists of three main steps: (1) deciding which of the available explanatory features should actually be included as variables in the model, (2) establishing the coefficients (weights) for the variables, and (3) evaluating the model. The first step is generally referred to as variable selection and is the topic of the current paper. Steps (1) and (3) are clearly intimately related.

Researchers have employed at least three different approaches to variable selection: (1) first building a model on all available explanatory features and then keeping/reporting those that have a significant contribution (e.g. [3]), (2) successively adding the most explanatory feature (forward), until no significant gain in model accuracy¹ is obtained anymore (e.g. [9]), and (3) starting with a model containing all available features, and (backward) successively removing those that yield the smallest contribution, as long as the accuracy of the model is not significantly reduced (e.g. [2]). In general, researchers report on only one (optimal) model without giving clear motivations for their choice of the procedure used.

In this paper, we compare the three approaches in a case study: we apply them to a set of 930 instances of the British English dative alternation, taken from the British component of the ICE Corpus. In the dative alternation, speakers choose between the double object (1) and the prepositional dative construction (2).

1. She handed the student the book.
2. She handed the book to the student.

The explanatory features (explanations suggested in the literature) are taken from Bresnan et al.’s work on the dative alternation in American English [3]. Previous research (e.g. [8,3]) has indicated that the verb or verb sense often predicts a preference for one of the two constructions. However, contrary to the fourteen explanatory features suggested by Bresnan et al., which can be treated as fixed variables because of their small number of values (often only two), verb sense has so many different values that it cannot be treated as a fixed variable in a regression model.
Recently developed logistic regression models can handle variables with too many values by treating these as random effects (cf. [18]). In order to examine the effect of building such mixed models, we create models with and without a random effect in each of the three approaches to variable selection described above. This leads to a total of six different models.

Our goal is to investigate whether it is justified to report only one ‘optimal’ regression model, if models can be built in several different ways. We will also pay attention to the role of a random effect in a model of syntactic variation built with a medium-sized set of observations. The case of the British English dative alternation is used to illustrate the issues and results.

The structure of this paper is as follows: A short overview of the related work can be found in Section 2. The data is described in Section 3. In Section 4, we explain the method applied. The results are shown and discussed in Section 5. In the final Section (6), we present our conclusions.

¹ Obviously, the accuracy measure will also have considerable impact on the result.

2 Related Work

2.1 The Dative Alternation

Bresnan et al. [3] built various logistic regression models for the dative alternation based on 2360 instances they extracted from the three-million word Switchboard Corpus of transcribed American English telephone dialogues [5]. With the help of a mixed-effect logistic regression model, or mixed model, with verb sense as a random effect, they were able to explain 95% of the variation. They defined the verb sense as the verb lemma together with its semantic verb class. The semantic verb class is either ‘abstract’ (e.g. give it some thought), ‘communication’ (e.g. tell him a story), ‘transfer of possession’ (e.g. give him the book), ‘prevention of possession’ (e.g. deny him the money) or ‘future transfer of possession’ (e.g. promise him help). To test how well the model generalizes to previously unseen data, they built a model on 2000 instances randomly selected from the total set, and tested on the remaining 360 cases. Repeating this 100 times, 94% of the test cases on average were predicted correctly.

Many of the variables in the model concern the two objects in the construction (the student and the book in examples 1 and 2). In the prepositional dative construction, the object first mentioned is the theme (the book), and the second object the recipient (the student). In the double object construction, the recipient precedes the theme. Bresnan et al. found that the first object is typically (headed by) a pronoun, mentioned previously in the discourse (given), animate, definite and longer (in number of words) than the second object. The characteristics of the second object are generally the opposite: non-pronominal, new, inanimate, indefinite and shorter.

According to Haspelmath [10], there is a slight difference between the dative alternation as it occurs in British English and in American English. When the theme is a pronoun, speakers of American English tend to allow only the prepositional dative construction.
In British English, clauses such as She gave me it and even She gave it me are also acceptable. Haspelmath provides no evidence for these claims (neither from corpora nor from psycholinguistic experiments). He refers to Siewierska and Hollmann [17], who present frequency counts in various corpora of Lancashire (British) English: Of the 415 instances of the dative alternation they found, 8 were of the pattern She gave me it, and 15 of She gave it me. It must be expected that such differences between language variants result in different behaviour of variables in models for these different language variants. Inappropriate approaches to variable selection may obscure this kind of ‘real’ difference. Gries [7] performed analyses with multiple variables that are similar to those in Bresnan et al. [3], but applied a different technique (linear discriminant analysis or LDA) on a notably smaller data set consisting of only 117 instances from the British National Corpus [4]. The LDA model is trained on all instances, and is able to predict 88.9% of these cases correctly (with a majority baseline of 51.3%). There is no information on how the model performs on previously unseen data.


Gries and Stefanowitsch [8] investigated the effect of the verb in 1772 instances from the ICE-GB Corpus [6]. When predicting the preferred dative construction for each verb (not taking into account the separate senses), 82.2% of the constructions could be predicted correctly. Using verb bias as a predictor thus outperforms the majority baseline of 65.0%.

2.2 Variable Selection in Logistic Regression

Variable selection in building logistic regression models is an extremely important issue, for which no hard and fast solution is available. In [11, chapter 5] it is explained that variable selection is often needed to arrive at a model that reaches an acceptable prediction accuracy and is still interpretable in terms of some theory about the role of the independent variables. Keeping too many variables may lead to overfitting, while a simpler model may suffer from underfitting. The risk of applying variable selection is that one optimizes the model for a particular data set. Using a slightly different data set may result in a very different variable subset. Previous studies aimed at creating logistic regression models to explain linguistic phenomena have used various approaches to variable selection. Grondelaers and Speelman [9], for instance, successively added the most predictive variables to an empty model, while Blackwell [2] successively eliminated the least predictive variables from the full model. The main criticisms of these methods are (1) that the results are difficult to interpret when the variables are highly correlated, (2) that deciding which variable to remove or add is not trivial, (3) that all methods may result in different models that may be sub-optimal in some sense, and (4) that each provides a single model, while there may be more than one ‘optimal’ subset [11]. A third approach to variable selection used in linguistic research is keeping only the significant variables in a complete model (cf. Bresnan et al. [3]). This is also what Sheather suggests in [16, chapter 8]. Before building a model, however, he studies plots of the variables to select those that he expects to contribute to the model. Where beneficial, he transforms the variables to give them more predictive power (e.g. by taking their log). 
After these preprocessing steps he builds a model containing all the selected variables, removes the insignificant ones, and then builds a new model. As indicated by Izenman [11], variable selection on the basis of a data set may lead to a model that is specific for that particular set. Since we want to be able to compare our models to those found by Bresnan et al. [3], who did not employ such transformations, we refrain from such preprocessing and we set out using the same set of variables they used in the variable selection process. Yet another approach mentioned in [11] is to build all models with each possible subset and select those with the best trade-off between accuracy, generalisability and interpretability. An important objection to this approach is that it is computationally expensive to carry out, and that decisions about interpretability may suffer from theoretical prejudice. For these reasons, we do not employ this method.

3 Data

Despite the fact that a number of researchers have studied the dative alternation in English (see Section 2.1), none of the larger data sets used is available in such a form that it enables the research in this paper.² We therefore established our own set of instances of the dative alternation in British English. Since we study a syntactic phenomenon, it is convenient to employ a corpus with detailed (manually checked) syntactic annotations. We selected the one-million-word British component of the ICE Corpus, the ICE-GB, containing both written and (transcribed) spoken language [6].

We used a Perl script to automatically extract potentially relevant clauses from the ICE-GB. These were clauses with an indirect and a direct object (double object) and clauses with a direct object and a prepositional phrase with the preposition to (prepositional dative). Next, we manually checked the extracted sets of clauses and removed irrelevant clauses such as those where the preposition to had a locative function (as, for example, in Fold the short edges to the centre.). Following Bresnan et al. [3], we ignored constructions with a preposition other than to, with a clausal object, with passive voice and with reversed constructions (e.g. She gave it me). To further limit the influence of the syntactic environment of the construction, we decided to exclude variants in imperative and interrogative clauses, as well as those with phrasal verbs (e.g. to hand over). Coordinated verbs or verb phrases were also removed. The characteristics of the resulting data sets can be found in Table 1.

Table 1. Characteristics of the 930 instances taken from the ICE-GB Corpus

Medium                    Double object   Prep. dative   Total
Spoken British English    406             152            558
Written British English   266             106            372
Total                     672             258            930

4 Method

4.1 Explanatory Features

We adopt the explanatory features and their definitions from Bresnan et al. [3] (Table 2), and manually annotate our data set following an annotation manual based on these definitions.³ Our set includes one feature that was not used in [3]: medium, which tells us whether the construction was found in written or spoken text. It may well be that certain variables only play a role in one of the two media. In order to test this, we include the 14 (two-way) interactions between the features taken from Bresnan et al. and the medium.⁴ Together with the feature medium itself, this yields a total number of 29 features.

Table 2. Explanatory features (th=theme, rec=recipient). All nominal explanatory features are transformed into binary variables with values 0 and 1.

Feature                      Values        Description
1. rec = animate             1, 0          human or animal, or not
2. th = concrete             1, 0          with fixed form and/or space, or not
3. rec = definite            1, 0          definite pronoun, proper name or noun preceded by definite determiner, or not
4. th = definite             1, 0          Id.
5. rec = given               1, 0          mentioned/evoked ≤20 clauses before, or not
6. th = given                1, 0          Id.
7. length difference         −3.4 to 4.2   ln(#words in th) − ln(#words in rec)
8. rec = plural              1, 0          plural in number, or not (singular)
9. th = plural               1, 0          Id.
10. rec = local              1, 0          first or second person (I, you), or not
11. rec = pronominal         1, 0          headed by a pronoun, or not
12. th = pronominal          1, 0          Id.
13. verb = abstract          1, 0          give it some thought is abstract,
    verb = communication     1, 0          tell him a story is communication,
    verb = transfer          1, 0          give him the book is transfer
14. structural parallelism   1, 0          preceding instance is prep. dative, or not
15. medium = written         1, 0          type of data is written, or not (spoken)

As mentioned in the Introduction, we will build models with and without including verb sense as a random effect. Following [3], we define the verb sense as the lemma of the verb together with its semantic class, e.g. pay a for pay with an abstract meaning (pay attention) and pay t when pay is used to describe a transfer of possession (pay $10). In total, our data set contains 94 different verb senses (derived from 65 different verbs). The distribution of the verb senses with 5 or more occurrences can be found in Table 3. As predicted by Gries and Stefanowitsch [8], many verbs show a bias towards one of the two constructions. The verb give, for instance, shows a bias for the double object construction, and sell for the prepositional dative construction. Only for pay and send, the bias differs for the different senses. For example, pay shows a clear bias towards the prepositional dative construction when it has an abstract meaning, but no bias when transfer of possession is meant. Nevertheless, we follow the approach in [3] by taking the verb sense, not the verb, as the random effect.

² Although most of the data set used in [3] is available through the R package LanguageR, the original sentences and some annotations are not publicly available because they are taken from an unpublished, corrected version of the Switchboard Corpus.
³ The annotation manual is available online: http://lands.let.ru.nl/~daphne/downloads.html
⁴ We are aware of the fact that there are other ways to incorporate the medium in the regression models, for instance by building separate models for the written and the spoken data. Since the focus of this paper is on the three approaches in combination with the presence or absence of a random effect, we will limit ourselves to the method described.


Table 3. Distribution of verb senses with 5 or more occurrences in the data set. The verb senses in the first list have a clear bias towards the double object (d.obj.) construction, those in the last list towards the prepositional dative (p.dat.) construction, and those in the middle list show no clear preference. The a represents abstract, c communication and t transfer of possession.

# d.obj. > # p.dat.
verb sense   d.obj.   p.dat.
give a       255      32
give t       56       21
give c       66       10
tell c       67       1
send t       42       16
show c       37       9
offer a      24       9
show a       6        1
offer t      6        0
tell a       6        0
wish c       6        0
bring a      4        1

# d.obj. ≈ # p.dat.
verb sense   d.obj.   p.dat.
do a         8        10
send c       9        7
lend t       8        7
pay t        6        5
leave a      5        4
write c      4        5
bring t      3        2
hand t       3        2

# d.obj. < # p.dat.
verb sense   d.obj.   p.dat.
pay a        2        12
cause a      5        8
sell t       0        10
owe a        2        6
explain c    0        6
present c    0        6
read c       1        4

4.2 Variable Selection

Using the values of the 29 explanatory features (fixed effect factors), we establish a regression function that predicts the natural logarithm (ln) of the odds that the construction C in clause j is a prepositional dative. The prepositional dative is regarded as a ‘success’ (with value 1), while the double object construction is considered a ‘failure’ (0). The regression function for the models without a random effect is:

\[ \ln \mathrm{odds}(C_j = 1) = \alpha + \sum_{k=1}^{29} (\beta_k V_{jk}) \tag{1} \]

The $\alpha$ is the intercept of the function. $\beta_k V_{jk}$ are the weights $\beta$ and values $V_j$ of the 29 variables $k$. For the model with the random effect (for verb sense $i$), the regression function is:

\[ \ln \mathrm{odds}(C_{ij} = 1) = \alpha + \sum_{k=1}^{29} (\beta_k V_{jk}) + e_{ij} + r_i \tag{2} \]
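The two regression functions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficient values are made up rather than estimated from the data, the function names are hypothetical, and the error term is left out because it is not observed for a new clause. The threshold t = 0 is the decision threshold adopted later in this section.

```python
import math


def log_odds(alpha, betas, values, random_intercept=0.0):
    """ln odds(C = 1) = alpha + sum_k beta_k * V_k, plus r_i for mixed models."""
    if len(betas) != len(values):
        raise ValueError("need one value per coefficient")
    return alpha + sum(b * v for b, v in zip(betas, values)) + random_intercept


def probability(ln_odds):
    """Convert log odds to the probability of a prepositional dative."""
    return 1.0 / (1.0 + math.exp(-ln_odds))


def classify(ln_odds, t=0.0):
    """Prepositional dative if ln odds >= t, double object otherwise."""
    return "prepositional dative" if ln_odds >= t else "double object"


# Made-up intercept and weights for two binary variables (not from the paper).
alpha, betas = -1.0, [0.5, 2.0]
fixed_only = log_odds(alpha, betas, [1, 0])        # model without random effect
with_verb = log_odds(alpha, betas, [1, 0], 1.25)   # hypothetical r_i of 1.25
```

In this toy case the same clause is classified as a double object by the fixed-effects function and as a prepositional dative once the hypothetical verb-sense intercept is added, which is the kind of shift a random effect for verb sense is meant to capture.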

The random effect $r_i$ is normally distributed with mean zero ($r_i \sim N(0, \sigma_r^2)$), independent of the normally distributed error term $e_{ij}$ ($e_{ij} \sim N(0, \sigma_e^2)$). The optimal values for the function parameters $\alpha$, $\beta_k$ and (for models with a random effect) $r_i$ and $e_{ij}$ are found with the help of Maximum Likelihood Estimation.⁵ The outcome of the regression enables us to use the model as a classifier: all cases with $\ln \mathrm{odds}(C_j = 1) \geq t$ (for the models without a random effect) or $\ln \mathrm{odds}(C_{ij} = 1) \geq t$ (for models with a random effect) are classified as prepositional dative, all with $\ln \mathrm{odds}(C_j = 1) < t$ or $\ln \mathrm{odds}(C_{ij} = 1) < t$ as double object, with $t$ the decision threshold, which we set to 0. With this threshold, all instances for which the regression function outputs a negative ln odds are classified as double object constructions, all other instances as prepositional dative.

⁵ We use the functions glm() and lmer() [1] in R [15].

In the first approach, we include all 29 features in the model formula. We then remove all variables $V_k$ that do not have a significant effect in the model output,⁶ and build a model with the remaining (significant) variables.

For the second approach, being forward selection, we start with an empty model and successively add the variable that is most predictive. As Izenman [11] explains, there are several possible criteria for deciding which variable to enter. We decide to enter the variable that yields the highest area under the ROC (Receiver Operating Characteristics) curve of the extended model. The ROC curve shows the proportions of correctly and incorrectly classified instances as a function of the decision threshold. The area under the ROC curve (AUC) gives the probability that the regression function, when randomly selecting a positive (prepositional dative) and a negative (double object) instance, outputs a higher log odds for the positive instance than for the negative instance. The AUC is thus an evaluation measure for the quality of a model. It is calculated with:

\[ \mathrm{AUC} = \frac{\text{average rank}(x_{C=1}) - \frac{p+1}{2}}{n - p} \tag{3} \]

where average rank($x_{C=1}$) is the average rank of the instances $x$ that are prepositional dative (when all instances are ranked numerically according to the log odds), $p$ the number of prepositional dative instances, and $n$ the total number of instances.⁷ We add the next most predictive variable to the model as long as it gives an improvement over the AUC of the model without the variable. An interaction of variable $V_k$ with medium is only included when the resulting AUC is higher than the value reached after adding the main variable $V_k$.⁸ Two AUC values are considered different when the difference is higher than a threshold. We set the threshold to 0.002.⁹

For the third approach (backward elimination), we use the opposite procedure: we start with the full model, containing all 29 variables, and successively leave out the variable $V_k$ that, after removal, yields the model with the highest AUC value that is not lower than the AUC value for the model with $V_k$. When the AUC value of a model without variable $V_k$ does not differ from the AUC value of the model without the interaction of $V_k$ with medium, we remove the interaction. Again, AUC values are only considered different when the difference exceeds a threshold (again set to 0.002).
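The forward procedure can be sketched as a greedy loop in Python. This is a simplified illustration, not the procedure as implemented for the paper: the `score` function is supplied by the caller and is assumed to return the AUC of a model fitted on exactly the given variables, the special rule for interactions with medium is left out, and the variable names and gains in the toy example are invented.

```python
def forward_selection(candidates, score, threshold=0.002):
    """Greedy forward selection driven by a caller-supplied quality score."""
    selected = []
    best = score(selected)  # quality of the empty model
    remaining = list(candidates)
    while remaining:
        # Try every remaining variable and keep the one with the best score.
        trial_scores = {v: score(selected + [v]) for v in remaining}
        winner = max(remaining, key=lambda v: trial_scores[v])
        # Stop when the gain does not exceed the threshold.
        if trial_scores[winner] - best <= threshold:
            break
        selected.append(winner)
        remaining.remove(winner)
        best = trial_scores[winner]
    return selected, best


# Toy stand-in for model fitting: each variable adds a fixed, invented gain.
GAINS = {"th_pronominal": 0.20, "length_difference": 0.10, "rec_plural": 0.001}


def toy_score(variables):
    return 0.5 + sum(GAINS[v] for v in variables)


chosen, auc = forward_selection(GAINS, toy_score)
```

With the toy gains, the third variable improves the score by only 0.001 and is therefore not added. Backward elimination follows the same pattern in reverse, starting from the full set and dropping the variable whose removal costs least.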

⁶ We use the P-values as provided by glm() and lmer().
⁷ We use the function somers2() created in R [15] by Frank Harrell.
⁸ When including an interaction but not the main variables in it, the interaction will also partly explain variation that is caused by the main variables [14].
⁹ The threshold value has been established experimentally.
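The rank-based AUC computation described in this section can be written out directly. The sketch below is in plain Python rather than R and is not the somers2() implementation; it assumes that tied log odds receive their mean rank, and the example values are invented.

```python
def average_ranks(scores):
    """Rank scores from 1 (lowest) to n, giving tied scores their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def auc_from_ranks(log_odds, labels):
    """AUC = (average rank of positives - (p + 1) / 2) / (n - p)."""
    n = len(labels)
    p = sum(labels)
    if p == 0 or p == n:
        raise ValueError("need at least one positive and one negative instance")
    ranks = average_ranks(log_odds)
    avg_pos_rank = sum(r for r, y in zip(ranks, labels) if y == 1) / p
    return (avg_pos_rank - (p + 1) / 2) / (n - p)


# Invented log odds for four clauses; 1 = prepositional dative, 0 = double object.
example_auc = auc_from_ranks([-2.0, -0.5, 0.3, 1.2], [0, 1, 0, 1])
```

In the example, three of the four positive-negative pairs are ordered correctly, so the value is 0.75, in line with the reading of the AUC as the probability that a randomly chosen positive instance receives higher log odds than a randomly chosen negative one.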


We evaluate the models with and without random effects by establishing the model quality (training and testing on all 930 cases) by calculating the percentage of correctly classified instances (accuracy) and the area under the ROC curve (AUC). Also, we determine the prediction accuracy reached in 10-fold cross-validation (10 sessions of training on 90% of the data and testing on the remaining 10%) in order to establish how well a model generalizes to previously unseen data. In the 10-fold cross-validation setting, we provide the algorithms with the variables selected in the models trained on all 930 cases. The regression coefficients for these subsets of variables are then estimated for each separate training set. The coefficients in the regression models help us understand which variables play what role in the dative alternation. We will therefore compare the coefficients of the significant effects in the models built on all 930 instances.

5 Results

5.1 Mixed Models

Table 4 gives the model quality and prediction accuracy for the different regression models we built, including verb sense as a random effect. The prediction accuracy (the percentage of correctly classified cases) is significantly higher than the majority baseline (always selecting the double object construction) in all settings, also when testing on new data (p < 0.001 for the three models, Wilcoxon paired signed rank test).

Table 4. Number of variables selected, model quality and prediction accuracy of the regression models with verb sense as a random effect

                                         model quality (train=test)   10-fold cv
selection        #variables   baseline   AUC      accuracy            aver. accuracy
1. significant   6            0.723      0.979    0.935               0.819
2. forward       4            0.723      0.979    0.932               0.827
3. backward      4            0.723      0.978    0.928               0.833

When training and testing on all 930 instances, the mixed models reach very high AUC and prediction accuracy (model quality). However, seeing the decrease in accuracy in a 10-fold cross-validation setting, it seems that the mixed models do not generalize very well to previously unseen data. The significant effects for the variables selected in the three approaches are presented in Table 5. The directions of the main effects are the same as the results presented in Section 2.1 for American English [3]. The forward selection (2) and backward elimination (3) approaches lead to almost the same regression model. The only difference is that in the backward model, the discourse givenness of the recipient is included as a main effect, while it is included as an interaction with medium in the forward model. Both indicate that the choice for the double object construction is more likely when the


Table 5. Coefficients of significant effects in (mixed) regression models with verb sense as random effect, trained on all 930 instances, *** p

) such that A ⊆ N, > is the standard order on the natural numbers, and $f_A$ is a polynomial for every function symbol f. If a polynomial interpretation is compatible with a TRS R, then we clearly have $\mathrm{dl}_R(t) \leq [\alpha]_A(t)$ for all terms t and assignments α.

Example 1. Consider the TRS R with the following rewrite rules over the signature containing the function symbols 0 (arity 0), s (arity 1), + and - (arity 2). This system is example SK90/2.11.trs in the termination problems database¹ (TPDB), which is the standard benchmark for termination provers:

+(0, y) → y
+(s(x), y) → s(+(x, y))
-(0, y) → 0
-(x, 0) → x
-(s(x), s(y)) → -(x, y)

The following interpretation functions build a compatible polynomial interpretation A over the carrier N:

$+_A(x, y) = 2x + y$
$-_A(x, y) = 3x + 3y$
$s_A(x) = x + 2$
$0_A = 1$
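The compatibility claim of Example 1 can be checked mechanically. The sketch below is an illustration, not part of cdiprover3: it interprets both sides of each rule as linear polynomials and tests that the difference has non-negative variable coefficients and a positive constant, which for linear polynomials over N is equivalent to the left-hand side being strictly larger for all arguments. The term encoding (tuples for function applications, strings for variables) and all names are assumptions of this example; monotonicity of the interpretation functions is not checked here.

```python
def interpret(term, interp):
    """Return the linear polynomial of `term` as {variable: coeff, 1: constant}."""
    if isinstance(term, str):                      # a variable
        return {term: 1, 1: 0}
    coeffs, const = interp[term[0]]
    poly = {1: const}
    for a, sub in zip(coeffs, term[1:]):
        for key, value in interpret(sub, interp).items():
            poly[key] = poly.get(key, 0) + a * value
    return poly


def is_compatible(rules, interp):
    """True if [l] - [r] has a positive constant and no negative coefficient for every rule."""
    for lhs, rhs in rules:
        l, r = interpret(lhs, interp), interpret(rhs, interp)
        diff = {k: l.get(k, 0) - r.get(k, 0) for k in set(l) | set(r)}
        if diff[1] <= 0 or any(v < 0 for k, v in diff.items() if k != 1):
            return False
    return True


ZERO = ('0',)
RULES = [
    (('+', ZERO, 'y'), 'y'),
    (('+', ('s', 'x'), 'y'), ('s', ('+', 'x', 'y'))),
    (('-', ZERO, 'y'), ZERO),
    (('-', 'x', ZERO), 'x'),
    (('-', ('s', 'x'), ('s', 'y')), ('-', 'x', 'y')),
]
# Interpretation of Example 1: symbol -> (argument coefficients, constant).
EXAMPLE_1 = {'+': ([2, 1], 0), '-': ([3, 3], 0), 's': ([1], 2), '0': ([], 1)}

print(is_compatible(RULES, EXAMPLE_1))
```

Replacing the interpretation of s by x + 0 makes the second rule fail, since both sides then evaluate to the same polynomial.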

A strongly linear interpretation is a polynomial interpretation such that every interpretation function $f_A$ has the form $f_A(x_1, \ldots, x_n) = \sum_{i=1}^{n} x_i + c$, c ∈ N. A surprisingly simple property is that compatibility with a strongly linear interpretation induces a linear upper bound on the derivational complexity. A linear polynomial interpretation is a polynomial interpretation where each interpretation function $f_A$ has the shape $f_A(x_1, \ldots, x_n) = \sum_{i=1}^{n} a_i x_i + c$, $a_i$ ∈ N, c ∈ N. For instance, the interpretation given in Example 1 is a linear polynomial interpretation. Because of their simplicity, this class of polynomial interpretations is the one most commonly used in automatic termination provers. As illustrated by Example 2 below, if only a single one of the coefficients $a_i$ in any of the functions $f_A$ is greater than 1, there might already exist derivations whose length is exponential in the size of the starting term.

Example 2. Consider the TRS S with the following single rule over the signature containing the function symbols a, b (arity 1), and c (arity 0). This system is example SK90/2.50.trs in the TPDB:

a(b(x)) → b(b(a(x)))

The following interpretation functions build a compatible linear polynomial interpretation A over N:

$a_A(x) = 2x$
$b_A(x) = x + 1$
$c_A = 0$

If we start a rewrite sequence from the term $a^n(b(c))$, we reach the normal form $b^{2^n}(a^n(c))$ after $2^n - 1$ rewriting steps. Therefore, the derivational complexity of S is at least exponential.

¹ See http://www.lri.fr/~marche/tpdb/ and http://termcomp.uibk.ac.at/

3.2 Context Dependent Interpretations

Even though polynomial interpretations provide an easy way to obtain an upper bound on the derivational complexity of a TRS, they are not very suitable for proving polynomial derivational complexity. Strongly linear interpretations only capture linear derivational complexity, but even a slight generalisation already admits examples of exponential derivational complexity, as illustrated by Example 2. In [12], context dependent interpretations are introduced. They use an additional parameter (usually denoted by Δ) in the interpretation functions, which changes in the course of evaluating the interpretation of a term, thus making the interpretation dependent on the context. This way of computing interpretations also allows us to bridge the gap between linear and polynomial derivational complexity.

Definition 1. A context dependent interpretation C for some signature F consists of functions $\{f_C[\Delta] : (\mathbb{R}^+_0)^n \to \mathbb{R}^+_0 \mid f \in F,\ n = \mathrm{arity}(f),\ \Delta \in \mathbb{R}^+\}$ and $\{f_C^i : \mathbb{R}^+ \to \mathbb{R}^+ \mid f \in F,\ i \in \{1, \ldots, \mathrm{arity}(f)\}\}$. Given a Δ-assignment $\alpha : \mathbb{R}^+ \times V \to \mathbb{R}^+_0$, the evaluation of a term t by C is denoted by $[\alpha, \Delta]_C(t)$. It is defined inductively as follows:

$$[\alpha, \Delta]_C(x) = \alpha(\Delta, x) \quad \text{for } x \in V$$
$$[\alpha, \Delta]_C(f(t_1, \ldots, t_n)) = f_C[\Delta]([\alpha, f_C^1(\Delta)]_C(t_1), \ldots, [\alpha, f_C^n(\Delta)]_C(t_n)) \quad \text{for } f \in F$$

Definition 2. For each $\Delta \in \mathbb{R}^+$, let $>_\Delta$ be the order defined by $a >_\Delta b \iff a - b \geq \Delta$. A context dependent interpretation C is compatible with a TRS R if for all rewrite rules l → r in R, all $\Delta \in \mathbb{R}^+$, and every Δ-assignment α, we have $[\alpha, \Delta]_C(l) >_\Delta [\alpha, \Delta]_C(r)$.


Definition 3. A Δ-linear interpretation is a context dependent interpretation C whose interpretation functions have the form

$$f_C[\Delta](z_1, \ldots, z_n) = \sum_{i=1}^{n} a_{(f,i)} z_i + \sum_{i=1}^{n} b_{(f,i)} z_i \Delta + c_f \Delta + d_f$$
$$f_C^i(\Delta) = \frac{\Delta}{a_{(f,i)} + b_{(f,i)} \Delta}$$

with $a_{(f,i)}, b_{(f,i)}, c_f, d_f \in \mathbb{N}$, $a_{(f,i)} + b_{(f,i)} \neq 0$ for all f ∈ F, 1 ≤ i ≤ n. If we have $a_{(f,i)} \in \{0, 1\}$ for all f, i, we also call it a Δ-restricted interpretation.

We consider Δ-linear interpretations because of the similarity between the functions $f_C[\Delta]$ and the interpretation functions of linear polynomial interpretations. Another point of interest is that the simple syntactical restriction to Δ-restricted interpretations yields a quadratic upper bound on the derivational complexity. Moreover, because of the special shape of Δ-linear interpretations, we need no additional monotonicity criterion for our main theorems:

Theorem 1 ([18]). Let R be a TRS and suppose that there exists a compatible Δ-linear interpretation. Then R is terminating and $\mathrm{dc}_R(n) = 2^{O(n)}$.

Theorem 2 ([21]). Let R be a TRS and suppose that there exists a compatible Δ-restricted interpretation. Then R is terminating and $\mathrm{dc}_R(n) = O(n^2)$.

Example 3. Consider the TRS given in Example 1 again. A compatible Δ-restricted (and Δ-linear) interpretation C is built from the following interpretation functions:

$$+_C[\Delta](x, y) = (1 + \Delta)x + y + \Delta \qquad +_C^1(\Delta) = \frac{\Delta}{1 + \Delta} \qquad +_C^2(\Delta) = \Delta$$
$$-_C[\Delta](x, y) = x + y + \Delta \qquad -_C^1(\Delta) = \Delta \qquad -_C^2(\Delta) = \Delta$$
$$s_C[\Delta](x) = x + \Delta + 1 \qquad s_C^1(\Delta) = \Delta$$
$$0_C[\Delta] = 0$$
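To make Definitions 1 and 2 concrete, the following Python sketch evaluates terms under the interpretation of Example 3 and tests the decrease condition of Definition 2 on a handful of sample values. This is only a sanity check on samples, not a compatibility proof, since compatibility quantifies over all Δ and all Δ-assignments. The term encoding, the sample values and all names are assumptions of this example; exact rational arithmetic is used because several rules decrease by exactly Δ.

```python
from fractions import Fraction

# Example 3: symbol -> (f_C[Δ] as a function of Δ and the argument values,
#                       list of the context functions f_C^1, ..., f_C^n)
CDI = {
    '+': (lambda d, x, y: (1 + d) * x + y + d, [lambda d: d / (1 + d), lambda d: d]),
    '-': (lambda d, x, y: x + y + d, [lambda d: d, lambda d: d]),
    's': (lambda d, x: x + d + 1, [lambda d: d]),
    '0': (lambda d: Fraction(0), []),
}


def evaluate(term, delta, alpha):
    """Compute [alpha, delta]_C(term) as in Definition 1."""
    if isinstance(term, str):                      # a variable
        return alpha(delta, term)
    f, taus = CDI[term[0]]
    # The i-th argument is evaluated with the modified parameter f_C^i(delta).
    args = [evaluate(sub, tau(delta), alpha) for tau, sub in zip(taus, term[1:])]
    return f(delta, *args)


def decreases(lhs, rhs, delta, alpha):
    """Definition 2: [lhs] >_delta [rhs], i.e. the difference is at least delta."""
    return evaluate(lhs, delta, alpha) - evaluate(rhs, delta, alpha) >= delta


ZERO = ('0',)
RULES = [
    (('+', ZERO, 'y'), 'y'),
    (('+', ('s', 'x'), 'y'), ('s', ('+', 'x', 'y'))),
    (('-', ZERO, 'y'), ZERO),
    (('-', 'x', ZERO), 'x'),
    (('-', ('s', 'x'), ('s', 'y')), ('-', 'x', 'y')),
]
DELTAS = [Fraction(1, 10), Fraction(1), Fraction(7, 2)]
ASSIGNMENTS = [
    lambda d, v: Fraction(0),
    lambda d, v: Fraction(5) if v == 'x' else Fraction(2),
    lambda d, v: 3 * d + 1,                        # an assignment that depends on Δ
]
ALL_DECREASE = all(decreases(l, r, d, a)
                   for l, r in RULES for d in DELTAS for a in ASSIGNMENTS)
print(ALL_DECREASE)
```

For instance, with Δ = 1 and the all-zero assignment, the term +(s(0), 0) evaluates to 4: the argument s(0) is evaluated with parameter 1/2, giving 3/2, which is then multiplied by 1 + Δ = 2 and increased by Δ.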

Note that this interpretation gives a quadratic upper bound on the derivational complexity. However, from the polynomial interpretation given in Example 1, we can only infer an exponential upper bound [13]. Consider the term $P_{n,n}$, where we define $P_{0,n} = s^n(0)$ and $P_{m+1,n} = +(P_{m,n}, 0)$. We have $|P_{n,n}| = 3n + 1$. For every m, n ∈ N, $P_{m+1,n}$ rewrites to $P_{m,n}$ in n + 1 steps. Therefore, $P_{n,n}$ reaches its normal form $s^n(0)$ after n(n + 1) rewrite steps. Hence, the derivational complexity is also $\Omega(n^2)$ for this example, so the inferred bound $O(n^2)$ is tight.
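The step counts quoted for the terms $P_{n,n}$ and for Example 2 can be reproduced with a small rewriting engine. The sketch below is illustrative only: it normalizes ground terms with an innermost strategy and counts the rewrite steps. The term encoding (tuples for applications, strings for rule variables) and the helper names are assumptions of this example, and the recursion is only meant for small inputs.

```python
def match(pattern, term, subst):
    """Match a rule pattern against a ground term, extending subst in place."""
    if isinstance(pattern, str):                   # a rule variable
        if pattern in subst:
            return subst[pattern] == term
        subst[pattern] = term
        return True
    if pattern[0] != term[0] or len(pattern) != len(term):
        return False
    return all(match(p, t, subst) for p, t in zip(pattern[1:], term[1:]))


def substitute(term, subst):
    """Instantiate the variables of a rule side."""
    if isinstance(term, str):
        return subst[term]
    return (term[0],) + tuple(substitute(a, subst) for a in term[1:])


def normalize(term, rules):
    """Innermost normalization of a ground term; returns (normal form, steps)."""
    steps, args = 0, []
    for sub in term[1:]:                           # normalize the arguments first
        nf, s = normalize(sub, rules)
        args.append(nf)
        steps += s
    term = (term[0],) + tuple(args)
    for lhs, rhs in rules:
        subst = {}
        if match(lhs, term, subst):
            nf, s = normalize(substitute(rhs, subst), rules)
            return nf, steps + 1 + s
    return term, steps


def iterate(symbol, n, term):
    """Apply the unary symbol n times on top of term."""
    for _ in range(n):
        term = (symbol, term)
    return term


def size(term):
    return 1 + sum(size(a) for a in term[1:])


ZERO = ('0',)
R = [  # the TRS of Example 1
    (('+', ZERO, 'y'), 'y'),
    (('+', ('s', 'x'), 'y'), ('s', ('+', 'x', 'y'))),
    (('-', ZERO, 'y'), ZERO),
    (('-', 'x', ZERO), 'x'),
    (('-', ('s', 'x'), ('s', 'y')), ('-', 'x', 'y')),
]
S = [(('a', ('b', 'x')), ('b', ('b', ('a', 'x'))))]   # the TRS of Example 2


def p(m, n):
    """The term P_{m,n}: m additions of 0 on top of s^n(0)."""
    term = iterate('s', n, ZERO)
    for _ in range(m):
        term = ('+', term, ZERO)
    return term


print(normalize(p(3, 3), R)[1])                         # steps for P_{3,3}
print(normalize(iterate('a', 3, ('b', ('c',))), S)[1])  # steps for a^3(b(c))
```

For n = 3 the counts are 12 = n(n + 1) steps for $P_{3,3}$ and 7 = $2^3 - 1$ steps for $a^3(b(c))$, in line with the formulas in the text.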

4 Implementation

cdiprover3 is written fully in OCaml². It employs the libraries of the termination prover TTT2³. From these libraries, functionality for handling TRSs and SAT encodings, and an interface to the SAT solver MiniSAT⁴ are used. Without counting this, the tool consists of about 1700 lines of OCaml code. About 25% of that code is devoted to the manipulation of polynomials and extensions of polynomials that stem from our use of the parameter Δ. Another 35% is used for constructing parametric interpretations and building suitable Diophantine constraints (see below) which enforce the necessary conditions for termination. Using TTT2's library for propositional logic and its interface to MiniSAT, 15% of the code deals with encoding Diophantine constraints into SAT. The remaining code is used for parsing input options and the given TRS, generating output, and controlling the program flow.

² http://caml.inria.fr
³ http://colo6-c703.uibk.ac.at/ttt2

In order to find polynomial interpretations automatically, Diophantine constraints are generated according to the procedure described in [6]. Putting an upper bound on the coefficients makes the problem finite. Essentially following [8], we then encode the (finite domain) constraints into a propositional satisfiability problem. This problem is given to MiniSAT. From a satisfying assignment for the SAT problem, we construct a polynomial interpretation which is monotone and compatible with the given TRS. This procedure is also the basis of the automatic search for Δ-linear and Δ-restricted interpretations. The starting point of that search is an interpretation with uninstantiated coefficients. If we want to be able to apply Theorem 1 or 2, we need to find coefficients which make the resulting interpretation compatible with the given TRS. Furthermore, we need to make sure that no divisions by zero occur in the interpretation functions. Again, we encode these properties into Diophantine constraints on the coefficients of a Δ-linear or Δ-restricted interpretation.
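The role of the coefficient bound can be illustrated with a deliberately naive search. The sketch below does not reproduce cdiprover3's SAT encoding; it replaces the Diophantine constraint solving by plain enumeration of all bounded coefficient vectors for a parametric linear polynomial interpretation, which is feasible only because the bound makes the search space finite. All names, the term encoding, and the choice of argument coefficients in 1..bound (to keep the interpretation functions strictly monotone) are assumptions of this example.

```python
from itertools import product


def interpret(term, interp):
    """Linear polynomial of `term` as {variable: coeff, 1: constant}."""
    if isinstance(term, str):
        return {term: 1, 1: 0}
    coeffs, const = interp[term[0]]
    poly = {1: const}
    for a, sub in zip(coeffs, term[1:]):
        for key, value in interpret(sub, interp).items():
            poly[key] = poly.get(key, 0) + a * value
    return poly


def compatible(rules, interp):
    """Every rule must decrease: positive constant, no negative coefficient in [l] - [r]."""
    for lhs, rhs in rules:
        l, r = interpret(lhs, interp), interpret(rhs, interp)
        diff = {k: l.get(k, 0) - r.get(k, 0) for k in set(l) | set(r)}
        if diff[1] <= 0 or any(v < 0 for k, v in diff.items() if k != 1):
            return False
    return True


def search(rules, arities, bound):
    """Enumerate all interpretations with coefficients up to `bound`; None if none fits."""
    symbols = sorted(arities)
    domains = []
    for f in symbols:
        # argument coefficients in 1..bound, constant in 0..bound
        domains += [range(1, bound + 1)] * arities[f] + [range(0, bound + 1)]
    for values in product(*domains):
        it = iter(values)
        interp = {f: ([next(it) for _ in range(arities[f])], next(it)) for f in symbols}
        if compatible(rules, interp):
            return interp
    return None


ZERO = ('0',)
RULES = [
    (('+', ZERO, 'y'), 'y'),
    (('+', ('s', 'x'), 'y'), ('s', ('+', 'x', 'y'))),
    (('-', ZERO, 'y'), ZERO),
    (('-', 'x', ZERO), 'x'),
    (('-', ('s', 'x'), ('s', 'y')), ('-', 'x', 'y')),
]
ARITIES = {'0': 0, 's': 1, '+': 2, '-': 2}

print(search(RULES, ARITIES, 1))   # no solution with all coefficients at most 1
print(search(RULES, ARITIES, 2))   # a compatible interpretation exists for bound 2
```

For the TRS of Example 1, bound 1 admits no solution because the second rule forces the first coefficient of + to be at least 2, while bound 2 already suffices. A SAT encoding explores the same finite space far more efficiently than this enumeration.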
The encoding is an adaptation of the procedure in [6] to context dependent interpretations: to encode the condition that no divisions by zero occur, we use the constraint $a_{(f,i)} + b_{(f,i)} > 0$ for each function symbol f ∈ F, and 1 ≤ i ≤ arity(f). Here, the variables $a_{(f,i)}$ and $b_{(f,i)}$ refer to the (uninstantiated) coefficients of $f_C[\Delta]$, as presented in Definition 3. If a Δ-restricted interpretation is searched, we also add the constraint $a_{(f,i)} - 1 \leq 0$ for each f ∈ F, and 1 ≤ i ≤ arity(f), which enforces the Δ-restricted shape. To ensure compatibility with the given TRS, we use the constraints

$$\forall \alpha\, \forall \Delta\, \forall x_1 \ldots \forall x_n \quad [\alpha, \Delta]_C(l) - [\alpha, \Delta]_C(r) - \Delta \geq 0$$

for each rule l → r in the given TRS, where $x_1, \ldots, x_n$ is the set of variables occurring in l → r. We unfold $[\alpha, \Delta]_C$ according to the equalities given in Definitions 1 and 3. We then use some (incomplete) transformations to obtain a set of constraints using only the variables $a_{(f,i)}$, $b_{(f,i)}$, $c_f$, and $d_f$ introduced in Definition 3. Satisfaction of these transformed constraints then implies satisfaction of the original constraints, which in turn implies compatibility of the induced context dependent interpretation with the given TRS. For a detailed description of this procedure, we refer to [21,18]. Once we have built the constraints, we continue using the same techniques as for searching polynomial interpretations: we encode the constraints in a propositional satisfiability problem, apply the SAT solver, and use a satisfying assignment to construct a context dependent interpretation.

⁴ http://minisat.se

Table 1 shows experimental results of applying cdiprover3 on the 957 known terminating examples of version 4.0 of the TPDB. The tests were performed single-threaded on a server equipped with 8 AMD® Opteron™ 2.80 GHz dual core processors with 64 GB of memory. For each system, cdiprover3 was given a timeout of 60 seconds. All times in the table are given in milliseconds. The first line of the table indicates the used proof technique; SL denotes strongly linear interpretations. The second row of the table specifies the upper bound for the coefficient variables; in all tests, we called cdiprover3 with the options -i -b X (see Section 5 below), where X is the value specified in the second row. As we can see, cdiprover3 is able to prove polynomial derivational complexity for 88 of the 368 known terminating non-duplicating rewrite systems of the TPDB (duplicating rewrite systems have at least exponential derivational complexity, so this restriction is harmless here). The results indicate that an upper bound of 7 on the coefficient variables suffices to capture all examples on our test set. Therefore, 3 and 7 seem to be good candidates for default values of the -b flag. However, it should be noted that our handling of the divisions introduced by the functions $f_C^i$ is computationally rather expensive, which is indicated by the number of timeouts and the average time needed for successful proofs.
This also explains the slight decrease in performance when we extend the search space to Δ-linear interpretations. The number and average time of successes for Δ-linear interpretations remain almost constant for the tested upper bounds on the coefficient variables. However, raising this upper bound leads to a significant increase in the number of timeouts. There is exactly one system which can be handled by Δ-linear interpretations (even with upper bound 3), but not by Δ-restricted interpretations: system SK90/2.50 in the TPDB, which we mentioned in Example 2. Note that it is theoretically impossible to find a suitable Δ-restricted interpretation for this TRS, since its derivational complexity is exponential.

Table 1. Performance of cdiprover3

Method                 SL    SL+Δ-rest.   Δ-linear                   Δ-rest.
-i -b X                31    31           3     7     15    31       3     7     15    31
# success              41    88           82    83    82    83       83    86    86    86
average success time   15    3287         5256  5566  4974  5847     3425  3935  3837  3845
# timeout              0     234          525   687   750   797      142   189   222   238

5 Using cdiprover3

cdiprover3 is called from the command line. Its basic usage pattern is

$ ./cdiprover3

followed by a list of options, the input file, and a timeout, as in the call shown in Fig. 1.

– The timeout specifies the maximum number of seconds until cdiprover3 stops looking for a suitable interpretation.
– The file argument specifies the path to the file which contains the considered TRS.
– For the options, the following switches are available:

  -c, followed by a class name, defines the desired subclass of the searched polynomial or context dependent interpretation. The following values are legal:

    linear, simple, simplemixed, quadratic: These values specify the respective subclasses of polynomial interpretations, as defined in [22]. Linear polynomial interpretations imply an exponential upper bound on the derivational complexity. The other classes imply a double exponential upper bound, cf. [13].

    pizerolinear, pizerosimple, pizerosimplemixed, pizeroquadratic: For these values, cdiprover3 tries to find a polynomial interpretation with the following restrictions: defined function symbols are interpreted by linear, simple, simple-mixed, or quadratic polynomials, respectively. Constructors are interpreted by strongly linear polynomials. These interpretations guarantee that the derivation length of all constructor based terms is polynomial [4].

    sli: This option corresponds to strongly linear interpretations. As mentioned in Section 3, they induce a linear upper bound on the derivational complexity of a compatible TRS.

    deltalinear: This value specifies that the tool should search for a Δ-linear interpretation. By Theorem 1, compatibility with such an interpretation implies an exponential upper bound on the derivational complexity.

    deltarestricted: This value corresponds to Δ-restricted interpretations. By Theorem 2, they induce a quadratic upper bound.

  -b, followed by a number, sets the upper bound for the coefficient variables. The default value for this bound is 3.

  -i This switch activates an incremental strategy for handling the upper bound on the coefficient variables.
First, cdiprover3 tries to find a solution using an intermediate upper bound of 1 (which corresponds to encoding each coefficient variable by one bit). Whenever the tool fails to find a proof for some upper bound b, it is checked whether b is equal to the bound specified by the -b option. If that is the case, then the search for a proof is given up. Otherwise, b is set to the minimum of the bound specified by the -b option and 2(b + 1) − 1 (which corresponds to increasing the number of bits used for each coefficient variable by 1).

If the -c switch is not specified, then the standard strategy for proving polynomial derivational complexity is employed. First, cdiprover3 looks for a strongly linear interpretation. If that is not successful, then a suitable Δ-restricted interpretation is searched.

The input TRS files are expected to have the same format as the files in the TPDB. The format specification for this database is available at http://www.lri.fr/~marche/tpdb/format.html. The output given by cdiprover3, as exemplified by Example 4, is structured as follows. The first line contains a short answer to the question whether the given TRS is terminating: YES, MAYBE, or TIMEOUT. The latter means that cdiprover3 was still busy after the specified timeout. MAYBE means that a termination proof could not be found, and cdiprover3 gave up before time ran out. The answer YES indicates that an interpretation of the given class has been found which guarantees termination of the given TRS. It is followed by the inferred bound on the derivational complexity and a listing of the interpretation functions. After the interpretation functions, the elapsed time between the call of cdiprover3 and the output of the proof is given. In all cases, the answer is concluded by statistics stating the total number of monomials in the constructed Diophantine constraints, and the upper bound for the coefficients that was used in the last call to MiniSAT.

Example 4. Given the TRS shown in Example 1, cdiprover3 produces the output shown in Figure 1. The interpretations in Example 3 and in the output are equivalent. Note that the parameter Δ in the interpretation functions $f_C[\Delta]$ is treated like another argument of the function. The interpretation functions $f_C^i$ are represented by f tau i in the output.

    $ cat tpdb-4.0/TRS/SK90/2.11.trs
    (VAR x y)
    (RULES
    +(0,y) -> y
    +(s(x),y) -> s(+(x,y))
    -(0,y) -> 0
    -(x,0) -> x
    -(s(x),s(y)) -> -(x,y)
    )
    (COMMENT Example 2.11 (Addition and Subtraction) in \cite{SK90})
    $ ./cdiprover3 -i tpdb-4.0/TRS/SK90/2.11.trs 60
    YES
    QUADRATIC upper bound on the derivational complexity
    This TRS is terminating using the deltarestricted interpretation
    -(delta, X1, X0) = + 1*X0 + 1*X1 + 0 + 0*X0*delta + 0*X1*delta + 1*delta
    s(delta, X0) = + 1*X0 + 1 + 0*X0*delta + 1*delta
    0(delta) = + 0 + 0*delta
    +(delta, X1, X0) = + 1*X0 + 1*X1 + 0 + 0*X0*delta + 1*X1*delta + 1*delta
    - tau 1(delta) = delta/(1 + 0 * delta)
    - tau 2(delta) = delta/(1 + 0 * delta)
    s tau 1(delta) = delta/(1 + 0 * delta)
    + tau 1(delta) = delta/(1 + 1 * delta)
    + tau 2(delta) = delta/(1 + 0 * delta)
    Time: 0.024418 seconds
    Statistics:
    Number of monomials: 187
    Last formula building started for bound 1
    Last SAT solving started for bound 1

Fig. 1. Output produced by cdiprover3
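The incremental strategy of the -i switch described in this section amounts to a short schedule of coefficient bounds. The following Python sketch generates that schedule; it is an illustration of the rule stated in the text, not code taken from cdiprover3, and the function name is an assumption. It assumes the bound given with -b is at least 1.

```python
def bound_schedule(max_bound):
    """Yield the successive coefficient bounds tried under -i, up to max_bound."""
    b = 1                                   # one bit per coefficient variable
    while True:
        yield b
        if b == max_bound:                  # bound of the -b option reached: give up
            return
        b = min(max_bound, 2 * (b + 1) - 1)  # one more bit per coefficient variable


print(list(bound_schedule(31)))
print(list(bound_schedule(10)))
```

With the bounds used in Table 1, every intermediate bound has the form 2^k − 1; a value such as 10 is only reached as the final, truncated step.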

6 Discussion

In this paper, we have presented the (as far as we know) first tool which is specifically designed for automatically proving polynomial derivational complexity of term rewriting. We have also given a brief introduction to the applied proof methods. During the almost two years which have passed between the 13th ESSLLI Student Session, where this paper was originally published, and the writing of this version, we have done further work concerning context dependent interpretations and automated complexity analysis. In [20], we have extended Δ-linear interpretations to Δ²-interpretations, defined by the following shape:

$$f_C(\Delta, z_1, \ldots, z_n) = \sum_{i=1}^{n} a_{(f,i)} z_i + \sum_{i=1}^{n} b_{(f,i)} z_i \Delta + g_f + h_f \Delta$$
$$f_C^i(\Delta) = \frac{c_{(f,i)} + d_{(f,i)} \Delta}{a_{(f,i)} + b_{(f,i)} \Delta}$$

In the same paper, we have established a correspondence result between Δ²-interpretations and two-dimensional matrix interpretations. Matrix interpretations are interpretations into a well-founded F-monotone algebra using vectors of natural numbers as their carrier. Their interpretation functions are based on vector addition and matrix-vector multiplication. See [7] for a more detailed description of matrix interpretations. We have the following theorem:

Theorem 3 ([20]). Let R be a TRS and let C be a Δ²-interpretation such that R is compatible with C. Then there exists a corresponding matrix interpretation A (of dimension 2) compatible with R.


With some minor restrictions, the theorem also holds in the reverse direction. Also note that one-dimensional matrix interpretations are equivalent to polynomial interpretations as long as all used polynomials are linear. Moreover, in the meantime, Martin Avanzini, Georg Moser, and the author of this paper have also been developing TCT, a more general tool for automated complexity analysis of term rewriting. TCT can be found at http://cl-informatik.uibk.ac.at/software/tct/. While TCT does not apply polynomial and context dependent interpretations anymore, matrix interpretations are one of its most heavily used proof techniques. As suggested by Theorem 3, all examples for which cdiprover3 can show a polynomial upper bound on the derivational complexity of a TRS can also be handled by TCT with a matrix interpretation of dimension at most 2 (of a restricted shape which induces a quadratic upper bound on the derivational complexity). Further techniques implemented by TCT include arctic interpretations [14] (the basic idea of this technique is to extend matrix interpretations to a domain different from natural numbers), root labeling [23], and rewriting of right hand sides [25]. Currently, TCT can show a polynomial upper bound on the derivational complexity of 212 of the 368 known terminating non-duplicating systems mentioned in Section 4. The average time for a successful complexity proof is 4.89 seconds, and TCT produces a timeout for 122 of the remaining systems. However, it should be noted that TCT was designed to run several termination proof attempts in parallel, and TCT ran on 16 cores in this test (we used the same testing machine as for the tests described in Section 4, and we did not restrict TCT to run single-threaded). Hence, the numbers are not directly comparable. Still, it becomes visible that the power of automated derivational complexity analysis has increased greatly during the last two years.

At this point, there exist upper bounds on the derivational complexity induced by most direct termination proof techniques. However, virtually all state-of-the-art termination provers employ the dependency pair framework, cf. [9], in their proofs. As shown in [19], not even the simplest version of the dependency pair method, as presented in [1], is suitable for inferring polynomial upper bounds on derivational complexities. There have been efforts in [10,11] to weaken the basic dependency pair method in order to make it usable for bounding the derivation length of constructor based terms (this is called runtime complexity analysis in these papers). A possible avenue for future work would be to develop a restricted version of the dependency pair method (or even the dependency pair framework) which is able to infer polynomial bounds on derivational complexities.

References

1. Arts, T., Giesl, J.: Termination of term rewriting using dependency pairs. Theor. Comp. Sci. 236(1,2), 133–178 (2000)
2. Avanzini, M., Moser, G.: Complexity analysis by rewriting. In: Garrigue, J., Hermenegildo, M.V. (eds.) FLOPS 2008. LNCS, vol. 4989, pp. 130–146. Springer, Heidelberg (2008)
3. Baader, F., Nipkow, T.: Term Rewriting and All That. Cambridge University Press, Cambridge (1998)
4. Bonfante, G., Cichon, A., Marion, J.Y., Touzet, H.: Algorithms with polynomial interpretation termination proof. J. Funct. Program. 11(1), 33–53 (2001)
5. Bonfante, G., Marion, J.Y., Péchoux, R.: Quasi-interpretation synthesis by decomposition. In: Jones, C.B., Liu, Z., Woodcock, J. (eds.) ICTAC 2007. LNCS, vol. 4711, pp. 410–424. Springer, Heidelberg (2007)
6. Contejean, E., Marché, C., Tomás, A.P., Urbain, X.: Mechanically proving termination using polynomial interpretations. J. Autom. Reason. 34(4), 325–363 (2005)
7. Endrullis, J., Waldmann, J., Zantema, H.: Matrix interpretations for proving termination of term rewriting. J. Autom. Reason. 40(3), 195–220 (2008)
8. Fuhs, C., Giesl, J., Middeldorp, A., Schneider-Kamp, P., Thiemann, R., Zankl, H.: SAT solving for termination analysis with polynomial interpretations. In: Marques-Silva, J., Sakallah, K.A. (eds.) SAT 2007. LNCS, vol. 4501, pp. 340–354. Springer, Heidelberg (2007)
9. Giesl, J., Thiemann, R., Schneider-Kamp, P.: The dependency pair framework: Combining techniques for automated termination proofs. In: Baader, F., Voronkov, A. (eds.) LPAR 2004. LNCS (LNAI), vol. 3452, pp. 301–331. Springer, Heidelberg (2005)
10. Hirokawa, N., Moser, G.: Automated complexity analysis based on the dependency pair method. In: Armando, A., Baumgartner, P., Dowek, G. (eds.) IJCAR 2008. LNCS (LNAI), vol. 5195, pp. 364–379. Springer, Heidelberg (2008)
11. Hirokawa, N., Moser, G.: Complexity, graphs, and the dependency pair method. In: Cervesato, I., Veith, H., Voronkov, A. (eds.) LPAR 2008. LNCS (LNAI), vol. 5330, pp. 652–666. Springer, Heidelberg (2008)
12. Hofbauer, D.: Termination proofs by context-dependent interpretations. In: Middeldorp, A. (ed.) RTA 2001. LNCS, vol. 2051, pp. 108–121. Springer, Heidelberg (2001)
13. Hofbauer, D., Lautemann, C.: Termination proofs and the length of derivations. In: Dershowitz, N. (ed.) RTA 1989. LNCS, vol. 355, pp. 167–177. Springer, Heidelberg (1989)
14. Koprowski, A., Waldmann, J.: Arctic termination . . . below zero. In: Voronkov, A. (ed.) RTA 2008. LNCS, vol. 5117, pp. 202–216. Springer, Heidelberg (2008)
15. Lankford, D.: On proving term-rewriting systems are noetherian. Tech. Rep. MTP2, Math. Dept., Louisiana Tech. University (1979)
16. Lescanne, P.: Termination of rewrite systems by elementary interpretations. Formal Aspects of Computing 7(1), 77–90 (1995)
17. Marion, J.Y.: Analysing the implicit complexity of programs. Inf. Comput. 183(1), 2–18 (2003)
18. Moser, G., Schnabl, A.: Proving quadratic derivational complexities using context dependent interpretations. In: Voronkov, A. (ed.) RTA 2008. LNCS, vol. 5117, pp. 276–290. Springer, Heidelberg (2008)
19. Moser, G., Schnabl, A.: The derivational complexity induced by the dependency pair method. In: Treinen, R. (ed.) RTA 2009. LNCS, vol. 5595, pp. 255–269. Springer, Heidelberg (2009)
20. Moser, G., Schnabl, A., Waldmann, J.: Complexity analysis of term rewriting based on matrix and context dependent interpretations. In: Hariharan, R., Mukund, M., Vinay, V. (eds.) FSTTCS 2008. LIPIcs, vol. 2, pp. 304–315. Schloss Dagstuhl Leibniz-Zentrum fuer Informatik (2008)
21. Schnabl, A.: Context Dependent Interpretations, Master's thesis, Universität Innsbruck (2007), http://cl-informatik.uibk.ac.at/~aschnabl/
22. Steinbach, J.: Proving polynomials positive. In: Shyamasundar, R.K. (ed.) FSTTCS 1992. LNCS, vol. 652, pp. 191–202. Springer, Heidelberg (1992)
23. Sternagel, C., Middeldorp, A.: Root-labeling. In: Voronkov, A. (ed.) RTA 2008. LNCS, vol. 5117, pp. 336–350. Springer, Heidelberg (2008)
24. TeReSe: Term Rewriting Systems. Cambridge Tracts in Theoretical Computer Science, vol. 55. Cambridge University Press, Cambridge (2003)
25. Zantema, H.: Reducing right-hand sides for termination. In: Middeldorp, A., van Oostrom, V., van Raamsdonk, F., de Vrijer, R. (eds.) Processes, Terms and Cycles: Steps on the Road to Infinity. LNCS, vol. 3838, pp. 173–197. Springer, Heidelberg (2005)

POP∗ and Semantic Labeling Using SAT

Martin Avanzini

Institute of Computer Science, University of Innsbruck, Austria

Abstract. The polynomial path order (POP∗ for short) is a termination method that induces polynomial bounds on the innermost runtime complexity of term rewrite systems (TRSs for short). Semantic labeling is a transformation technique used for proving termination. In this paper, we propose an efficient implementation of POP∗ together with finite semantic labeling. This automation works by a reduction to the problem of boolean satisfiability. We have implemented the technique and experimental results confirm the feasibility of our approach. By semantic labeling the analytical power of POP∗ is significantly increased.

1 Introduction

Term rewrite systems (TRSs for short) provide a conceptually simple but powerful abstract model of computation. In rewriting, proving termination is a long standing research field. Consequently, termination techniques applicable in an automated setting have been introduced quite early. Early research concentrated mainly on direct termination techniques [24]. One such technique is the use of recursive path orders, for instance the multiset path order (MPO for short) [11]. Nowadays, the emphasis has shifted toward transformation techniques like the dependency pair method [2] or semantic labeling [26]. These methods significantly increase the possibility to automatically verify termination.

Many termination techniques can be used to analyse the complexity of rewrite systems. For instance, Hofbauer was the first to observe that termination via MPO implies the existence of a primitive recursive bound on the derivational complexity [15]. Here the derivational complexity of a TRS measures the maximal number of rewrite steps as a function in the size of the initial term. For the study of lower complexity bounds we recently introduced in [4] the polynomial path order (POP∗ for short). This order is in essence a miniaturization of MPO, carefully crafted to induce polynomial bounds on the number of rewrite steps (cf. Theorem 1), whenever the initial term is argument-normalised (aka basic).

In this work, we show how to increase the analytical power of POP∗ by semantic labeling [26]. The idea behind semantic labeling is to label the function symbols of the analysed TRS R with semantic information so that proving termination of the labeled TRS R_lab becomes easier. The transformation is termination preserving and reflecting. More precisely, every derivation of R is simulated step-by-step by R_lab. Thus, besides analysing the termination behavior of R, the TRS R_lab can also be employed for investigating the complexity of R. In order to obtain the labeled TRS R_lab from R, one needs to define suitable interpretation and labeling functions for all function symbols appearing in R. Naturally, these functions have to be chosen such that the employed direct technique (in our case POP∗) is applicable to the labeled system. To find a properly labeled TRS R_lab automatically, we extend the propositional encoding of POP∗ presented in [4]. Satisfiability of the constructed formula certifies the existence of a labeled system R_lab that is compatible with POP∗. As we have implemented the technique, the feasibility of our approach is confirmed. Moreover, experimental evidence indicates that the analytical power of polynomial path orders is significantly improved.

The automation of semantic labeling together with some base order is not essentially new. For instance, an automation of semantic labeling together with recursive path orders has already been given in [17]. Unfortunately, this approach is inapplicable in our context as the resulting TRS is usually infinite here. Like many syntactic techniques, soundness of polynomial path orders is restricted to finite TRSs. To achieve that R_lab is finite, we restrict interpretation and labeling functions to finite domains.

We structure the remainder of this paper as follows: In Section 2 we recall basic notions and briefly introduce the reader to polynomial path orders POP∗. In Section 3 we show how polynomial path orders together with semantic labeling can be efficiently automated. In Section 4 we present experimental results, and we conclude in Section 5.

This research is supported by FWF (Austrian Science Fund) projects P20133.

T. Icard, R. Muskens (Eds.): ESSLLI 2008/2009 Student Sessions, LNAI 6211, pp. 155–166, 2010. Springer-Verlag Berlin Heidelberg 2010

2 The Polynomial Path Order

We briefly recall the basic concepts of term rewriting; [8] provides a good resource for details. Let V denote a countably infinite set of variables and F a signature, that is, a set of function symbols with associated arities. The set of terms over F and V is denoted by T(F, V). We write ⊴ for the subterm relation, the converse is denoted by ⊵, and the strict part of ⊵ by ▷. A term rewrite system (TRS for short) R over T(F, V) is a set of rewrite rules l → r such that l, r ∈ T(F, V), l ∉ V and all variables of r also appear in l. In the following, R always denotes a TRS. If not mentioned otherwise, R is finite. A binary relation on T(F, V) is a rewrite relation if it is closed under contexts and substitutions. The smallest extension of R that is a rewrite relation is denoted by →R. The innermost rewrite relation $\xrightarrow{i}_R$ is the restriction of →R where innermost terms have to be reduced first. The transitive and reflexive closure of a rewrite relation → is denoted by →∗ and we write $s \to^{n} t$ for the contraction of s to t in n steps. We say that R is (innermost) terminating if there exists no infinite chain of terms t0, t1, … such that $t_i \to_R t_{i+1}$ ($t_i \xrightarrow{i}_R t_{i+1}$) for all i ∈ N. The root symbols of left-hand sides of rewrite rules in R are called defined symbols and collected in D(R), while all other symbols are called constructor

POP∗ and Semantic Labeling Using SAT


symbols and collected in C(R). A term f(s1, …, sn) is basic if f ∈ D(R) and s1, …, sn ∈ T(C(R), V). We write Tb(R) for the set of all basic terms over R. If every left-hand side of R is basic then R is called a constructor TRS. Constructor TRSs allow us to model the computation of functions in a very natural way.

Example 1. Consider the constructor TRS Rmult defined by

add(0, y) → y                     mult(0, y) → 0
add(s(x), y) → s(add(x, y))       mult(s(x), y) → add(y, mult(x, y)).

Rmult defines the function symbols add and mult, i.e. D(R) = {add, mult}. Natural numbers are represented using the constructor symbols C(R) = {s, 0}. Define the encoding function $\ulcorner \cdot \urcorner : \mathbb{N} \to T(C(R), \varnothing)$ by $\ulcorner 0 \urcorner = 0$ and $\ulcorner n + 1 \urcorner = s(\ulcorner n \urcorner)$. Then for all n, m ∈ N, $\mathsf{mult}(\ulcorner n \urcorner, \ulcorner m \urcorner) \xrightarrow{i}{}^{*}_{R} \ulcorner n * m \urcorner$. We say that Rmult computes multiplication (and addition) on natural numbers. For instance, the system admits the innermost rewrite sequence

$$\mathsf{mult}(s(0), 0) \xrightarrow{i} \mathsf{add}(0, \mathsf{mult}(0, 0)) \xrightarrow{i} \mathsf{add}(0, 0) \xrightarrow{i} 0,$$

computing 1 ∗ 0 = 0. Note that for the second term, the innermost redex mult(0, 0) is reduced first.

In [19] it is proposed to conceive the complexity of a rewrite system R as the complexity of the functions computed by R. Whereas this view falls into the realm of implicit complexity analysis, we conceive rewriting under R as the evaluation mechanism of the encoded function. Thus it is natural to define the runtime complexity based on the number of rewrite steps admitted by R. Let |s| denote the size of a term s. The (innermost) runtime complexity of a terminating rewrite system R is defined by

$$rc_R(m) = \max\{\, n \mid \exists s, t.\ s \xrightarrow{i}{}^{n}\, t,\ s \in T_b(R) \text{ and } |s| \leqslant m \,\}.$$
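The innermost evaluation of Example 1 and the step counting behind the runtime complexity can be replayed mechanically. The following Python sketch is only an illustration: the tuple representation of terms and the helper names `encode`, `innermost_step` and `normalize` are assumptions of this sketch and do not come from the paper's implementation.

```python
# Ground terms are nested tuples: ("0",), ("s", t), ("add", t1, t2), ("mult", t1, t2).
ZERO = ("0",)

def s(t):
    return ("s", t)

def encode(n):
    """Encode the natural number n as the numeral s(s(...(0)...))."""
    t = ZERO
    for _ in range(n):
        t = s(t)
    return t

def step_at_root(t):
    """Apply a rule of R_mult at the root of t, or return None if none matches."""
    if t[0] == "add":
        x, y = t[1], t[2]
        if x == ZERO:
            return y                                 # add(0, y) -> y
        if x[0] == "s":
            return s(("add", x[1], y))               # add(s(x), y) -> s(add(x, y))
    if t[0] == "mult":
        x, y = t[1], t[2]
        if x == ZERO:
            return ZERO                              # mult(0, y) -> 0
        if x[0] == "s":
            return ("add", y, ("mult", x[1], y))     # mult(s(x), y) -> add(y, mult(x, y))
    return None

def innermost_step(t):
    """Contract the leftmost innermost redex of t, or return None if t is normal."""
    for i in range(1, len(t)):
        reduced = innermost_step(t[i])
        if reduced is not None:
            return t[:i] + (reduced,) + t[i + 1:]
    # All arguments are in normal form, so a root step is an innermost step.
    return step_at_root(t)

def normalize(t):
    """Rewrite t to normal form and return (normal form, number of steps)."""
    steps = 0
    while True:
        reduced = innermost_step(t)
        if reduced is None:
            return t, steps
        t, steps = reduced, steps + 1
```

Calling `normalize(("mult", encode(1), encode(0)))` reproduces the three-step sequence of Example 1 and ends in the numeral for 0. Counting steps for basic start terms of bounded size gives lower bounds on the runtime complexity function, not the function itself, since the maximum ranges over all basic terms.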

To verify whether the runtime complexity of a rewrite system R is polynomially bounded, we employ polynomial path orders. Inspired by the recursion-theoretic characterization of the polytime functions given in [9], polynomial path orders rely on the separation of safe and normal inputs. For this, the notion of safe mappings is introduced. A safe mapping safe associates with every n-ary function symbol f the set of safe argument positions. If f ∈ D(R) then safe(f) ⊆ {1, …, n}; for f ∈ C(R) we fix safe(f) = {1, …, n}. The argument positions not included in safe(f) are called normal and denoted by nrm(f). A precedence is an irreflexive and transitive order on F. The polynomial path order >pop∗ is an extension of the auxiliary order >pop, both defined in the following two definitions. Here we write $>^{=}$ for the reflexive closure of an order >; further, $(>)_{\mathsf{mul}}$ denotes its multiset extension (cf. [8]).

Definition 1. Let > be a precedence and let safe be a safe mapping. We define the order >pop inductively as follows: s = f(s1, …, sn) >pop t if one of the following alternatives holds:


1. $s_i >^{=}_{\mathsf{pop}} t$ for some i ∈ {1, …, n}, and if f ∈ D(R) then i ∈ nrm(f), or
2. t = g(t1, …, tm), f ∈ D(R), f > g and s >pop ti for all 1 ≤ i ≤ m.

Definition 2. Let > be a precedence and let safe be a safe mapping. We define the polynomial path order >pop∗ inductively as follows: s = f(s1, …, sn) >pop∗ t if one of the following alternatives holds:

1. s >pop t, or
2. $s_i >^{=}_{\mathsf{pop}*} t$ for some i ∈ {1, …, n}, or
3. t = g(t1, …, tm), f ∈ D(R), f > g and
   – $s >_{\mathsf{pop}*} t_{j_0}$ for some $j_0$ ∈ safe(g), and
   – for all $j \neq j_0$, either s >pop tj, or s ▷ tj and j ∈ safe(g), or
4. t = f(t1, …, tm), f ∈ D(R) and
   – $[s_{i_1}, \dots, s_{i_p}] \,(>_{\mathsf{pop}*})_{\mathsf{mul}}\, [t_{i_1}, \dots, t_{i_p}]$ for nrm(f) = {i1, …, ip}, and
   – $[s_{j_1}, \dots, s_{j_q}] \,(>^{=}_{\mathsf{pop}*})_{\mathsf{mul}}\, [t_{j_1}, \dots, t_{j_q}]$ for safe(f) = {j1, …, jq}.

Here [t1, …, tn] denotes the multiset with elements t1, …, tn. When R ⊆ >pop∗ holds, we say that >pop∗ is compatible with R. The main theorem from [4] states:

Theorem 1. Let R be a finite constructor TRS compatible with >pop∗, i.e., R ⊆ >pop∗. Then the runtime complexity of R is polynomial. The polynomial depends only on the cardinality of F and the sizes of the right-hand sides in R.

We conclude this section by demonstrating the application of POP∗ on the TRS Rmult. Below, rule i refers to the i-th case of Definition 2.

Example 2. Reconsider the rewrite system Rmult from Example 1. Consider the safe mapping safe where the second argument of addition is safe (safe(add) = {2}) and all arguments of multiplication are normal (safe(mult) = ∅). Furthermore, let the precedence > be defined as mult > add > s > 0. In order to verify compatibility for this particular instance >pop∗ we need to show that all the rules in Rmult are strictly decreasing, i.e., l >pop∗ r holds for l → r ∈ Rmult. To exemplify this, consider the rule add(s(x), y) → s(add(x, y)). From s(x) >pop∗ x by rule 2 we infer $[s(x)] \,(>_{\mathsf{pop}*})_{\mathsf{mul}}\, [x]$. Furthermore $[y] \,(>^{=}_{\mathsf{pop}*})_{\mathsf{mul}}\, [y]$ holds and thus by rule 4 we obtain add(s(x), y) >pop∗ add(x, y). From add > s we finally conclude add(s(x), y) >pop∗ s(add(x, y)) by one application of rule 3. As a consequence of Theorem 1, the number of rewrite steps starting from $\mathsf{mult}(\ulcorner n \urcorner, \ulcorner m \urcorner)$ is polynomially bounded in n and m.
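Definitions 1 and 2 can be turned into a direct recursive check for a fixed precedence and safe mapping. The sketch below does this for the instance of Example 2. It is a naive illustration under assumptions: variables are Python strings, applications are tuples, and the names `gt_pop`, `gt_pops` and `gt_mul` are hypothetical. It is not the SAT-based search described in the next section, which has to find the precedence and safe mapping itself.

```python
from collections import Counter

DEFINED = {"add", "mult"}
RANK = {"mult": 3, "add": 2, "s": 1, "0": 0}        # precedence mult > add > s > 0
SAFE = {"add": {2}, "mult": set(), "s": {1}, "0": set()}

def is_var(t):
    return isinstance(t, str)

def proper_subterms(t):
    if not is_var(t):
        for arg in t[1:]:
            yield arg
            yield from proper_subterms(arg)

def gt_pop(s, t):
    """s >pop t according to Definition 1."""
    if is_var(s):
        return False
    f, args = s[0], s[1:]
    for i, si in enumerate(args, start=1):           # case 1
        normal_ok = f not in DEFINED or i not in SAFE[f]
        if normal_ok and (si == t or gt_pop(si, t)):
            return True
    if f in DEFINED and not is_var(t) and RANK[f] > RANK[t[0]]:
        return all(gt_pop(s, tj) for tj in t[1:])    # case 2
    return False

def gt_mul(ms, ns, gt):
    """Strict multiset extension of the strict order gt."""
    m_rest = Counter(ms) - Counter(ns)
    n_rest = Counter(ns) - Counter(ms)
    return bool(m_rest) and all(any(gt(m, n) for m in m_rest) for n in n_rest)

def gt_pops(s, t):
    """s >pop* t according to Definition 2."""
    if is_var(s):
        return False
    f, args = s[0], s[1:]
    if gt_pop(s, t):                                  # case 1
        return True
    if any(si == t or gt_pops(si, t) for si in args): # case 2
        return True
    if is_var(t) or f not in DEFINED:
        return False
    g, targs = t[0], t[1:]
    if RANK[f] > RANK[g]:                             # case 3
        for j0 in SAFE[g]:
            if gt_pops(s, targs[j0 - 1]) and all(
                gt_pop(s, tj) or (j in SAFE[g] and tj in proper_subterms(s))
                for j, tj in enumerate(targs, start=1) if j != j0
            ):
                return True
    if f == g:                                        # case 4
        safe = sorted(SAFE[f])
        normal = [i for i in range(1, len(args) + 1) if i not in SAFE[f]]
        sn, tn = [args[i - 1] for i in normal], [targs[i - 1] for i in normal]
        ss, ts = [args[j - 1] for j in safe], [targs[j - 1] for j in safe]
        safe_ok = Counter(ss) == Counter(ts) or gt_mul(ss, ts, gt_pops)
        return gt_mul(sn, tn, gt_pops) and safe_ok
    return False

RMULT = [
    (("add", ("0",), "y"), "y"),
    (("add", ("s", "x"), "y"), ("s", ("add", "x", "y"))),
    (("mult", ("0",), "y"), ("0",)),
    (("mult", ("s", "x"), "y"), ("add", "y", ("mult", "x", "y"))),
]

compatible = all(gt_pops(l, r) for l, r in RMULT)
```

With the precedence and safe mapping of Example 2, every rule of Rmult is oriented, so `compatible` evaluates to `True`. Orienting a rule in the reverse direction, for example s(add(x, y)) against add(s(x), y), fails as expected.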

3 A Propositional Encoding of POP∗ and Finite Semantic Labeling

Before we investigate the propositional encoding of polynomial path orders and semantic labeling, we briefly explain basic notions of semantic labeling as introduced in [26]. Semantics is given to a TRS R by defining a model. A model is an F-algebra A, i.e. a carrier A equipped with operations $f_A : A^n \to A$ for every n-ary symbol


f ∈ F, such that for every rule l → r ∈ R and any assignment α : V → A, the equality $[\alpha]_A(l) = [\alpha]_A(r)$ holds. Here $[\alpha]_A(t)$ denotes the interpretation of t with assignment α, recursively defined by

$$[\alpha]_A(t) = \begin{cases} \alpha(t) & \text{if } t \in V \\ f_A([\alpha]_A(t_1), \dots, [\alpha]_A(t_n)) & \text{if } t = f(t_1, \dots, t_n). \end{cases}$$

The system R is labeled according to a labeling ℓ for A, i.e. a set of mappings $\ell_f : A^n \to A$ for every n-ary function symbol f ∈ F.¹ For every assignment α, the mapping $\mathsf{lab}_\alpha(t)$ is defined by

$$\mathsf{lab}_\alpha(t) = \begin{cases} t & \text{if } t \in V \\ f_a(\mathsf{lab}_\alpha(t_1), \dots, \mathsf{lab}_\alpha(t_n)) & \text{if } t = f(t_1, \dots, t_n) \end{cases}$$

where $a = \ell_f([\alpha]_A(t_1), \dots, [\alpha]_A(t_n))$. The labeled TRS Rlab is obtained by labeling all rules for all assignments α:

$$R_{\mathsf{lab}} = \{\, \mathsf{lab}_\alpha(l) \to \mathsf{lab}_\alpha(r) \mid l \to r \in R \text{ and } \alpha \text{ an assignment} \,\}.$$

The main theorem from [26] states that Rlab is terminating if and only if R is terminating. In particular, it is shown that

$$s \to_R t \iff \mathsf{lab}_\alpha(s) \to_{R_{\mathsf{lab}}} \mathsf{lab}_\alpha(t)$$

holds for α an arbitrary assignment. To simplify the presentation, we consider only algebras B with carrier B = {⊤, ⊥} here, although in principle the approach works for arbitrary finite carriers. To encode a Boolean function $b : B^n \to B$, we make use of unique propositional atoms $b_{w_1,\dots,w_n}$ for every sequence of arguments $w_1, \dots, w_n \in B^n$. The atom $b_{w_1,\dots,w_n}$ denotes the result of applying b to the arguments $w_1, \dots, w_n$. For each sequence $a_1, \dots, a_n$ of propositional formulas, we denote by $\lceil b \rceil(a_1, \dots, a_n)$ the following formula: when n = 0, we set $\lceil b \rceil = b_\varepsilon$. For n > 0, we set

$$\lceil b \rceil(a_1, \dots, a_n) = \bigwedge_{w_1,\dots,w_n \in B^n} \Bigl( \Bigl( \bigwedge_{i=1}^{n} (w_i \leftrightarrow a_i) \Bigr) \to b_{w_1,\dots,w_n} \Bigr).$$
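The formula above acts as a truth-table lookup: for a given valuation of the argument formulas, exactly one premise is true, and the formula then takes the value of the corresponding atom. The following Python sketch evaluates the formula semantically. The dictionary `table` (the valuation of the atoms $b_{w_1,\dots,w_n}$) and the function name `encoded_value` are assumptions made for illustration.

```python
from itertools import product

def encoded_value(table, arg_values):
    """Evaluate AND over all w of ((AND_i (w_i <-> a_i)) -> b_w).

    table      -- maps each tuple w of Booleans to the valuation of the atom b_w
    arg_values -- the valuations of the argument formulas a_1, ..., a_n
    """
    result = True
    for w in product([True, False], repeat=len(arg_values)):
        premise = all(wi == ai for wi, ai in zip(w, arg_values))
        # An implication is false only if its premise holds and b_w is false.
        result = result and ((not premise) or table[w])
    return result

# Example: the atoms encode exclusive or.
xor_table = {w: (w[0] != w[1]) for w in product([True, False], repeat=2)}
```

For every argument tuple `args`, `encoded_value(xor_table, args)` agrees with `xor_table[args]`. For n = 0 the only tuple is the empty one, so the value is that of the single atom $b_\varepsilon$.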

Consider the constraint $\lceil b \rceil(a_1, \dots, a_n) \leftrightarrow r$, and suppose ν is a satisfying assignment. One easily verifies that the encoded function b satisfies $b(w_1, \dots, w_n) = \nu(b_{w_1,\dots,w_n}) = \nu(r)$ for $w_1 = \nu(a_1), \dots, w_n = \nu(a_n)$. We use this observation below to impose restrictions on interpretation and labeling functions. For every assignment α : V → B and term t appearing in R we introduce the atoms $\mathsf{int}_{\alpha,t}$, and $\mathsf{lab}_{\alpha,t}$ for t ∉ V. The meaning of $\mathsf{int}_{\alpha,t}$ is the result of $[\alpha]_B(t)$ for the encoded model B; $\mathsf{lab}_{\alpha,t}$ denotes the label of the root symbol of the labeled

¹ The definition from [26] allows the labeling of a subset of F and leaves other symbols unchanged. In our context, this has no consequence and simplifies the translation.


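term $\mathsf{lab}_\alpha(t)$. To ensure for terms t = f(t1, …, tn) and assignments α a correct valuation of $\mathsf{int}_{\alpha,t}$ and $\mathsf{lab}_{\alpha,t}$ respectively, we introduce the constraints

$$\mathsf{INT}_\alpha(t) = \mathsf{int}_{\alpha,t} \leftrightarrow \lceil f_B \rceil(\mathsf{int}_{\alpha,t_1}, \dots, \mathsf{int}_{\alpha,t_n}) \quad\text{and}\quad \mathsf{LAB}_\alpha(t) = \mathsf{lab}_{\alpha,t} \leftrightarrow \lceil \ell_f \rceil(\mathsf{int}_{\alpha,t_1}, \dots, \mathsf{int}_{\alpha,t_n}).$$

Furthermore, we set $\mathsf{INT}_\alpha(t) = \mathsf{int}_{\alpha,t} \leftrightarrow \alpha(t)$² for t ∈ V. The above constraints have to be enforced for every term appearing in R. This is covered by

$$\mathsf{LAB}(R) = \bigwedge_{\alpha} \Bigl( \bigwedge_{R \,\trianglerighteq\, t} \bigl( \mathsf{INT}_\alpha(t) \wedge \mathsf{LAB}_\alpha(t) \bigr) \wedge \bigwedge_{l \to r \in R} \bigl( \mathsf{int}_{\alpha,l} \leftrightarrow \mathsf{int}_{\alpha,r} \bigr) \Bigr).$$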
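What a satisfying assignment of LAB(R) describes can be made concrete by fixing one Boolean model and one labeling by hand. The sketch below does this for Rmult. The parity model (0 is ⊥, s is negation, add is exclusive or, mult is conjunction) and the labeling of mult by the value of its first argument are assumptions chosen for illustration. They are not taken from the paper, where these choices are left to the SAT-solver.

```python
from itertools import product

# Assumed Boolean model: the value of a numeral is True iff the number is odd.
INTERP = {
    "0": lambda: False,
    "s": lambda a: not a,
    "add": lambda a, b: a != b,
    "mult": lambda a, b: a and b,
}
# Assumed labeling: mult is labeled by its first argument's value; all other
# symbols receive the constant label False.
LABEL = {
    "0": lambda: False,
    "s": lambda a: False,
    "add": lambda a, b: False,
    "mult": lambda a, b: a,
}

RMULT = [
    (("add", ("0",), "y"), "y"),
    (("add", ("s", "x"), "y"), ("s", ("add", "x", "y"))),
    (("mult", ("0",), "y"), ("0",)),
    (("mult", ("s", "x"), "y"), ("add", "y", ("mult", "x", "y"))),
]

def variables(t):
    if isinstance(t, str):
        return {t}
    found = set()
    for arg in t[1:]:
        found |= variables(arg)
    return found

def interpret(t, alpha):
    """The value [alpha]_B(t), i.e. the intended meaning of the atom int_{alpha,t}."""
    if isinstance(t, str):
        return alpha[t]
    return INTERP[t[0]](*(interpret(arg, alpha) for arg in t[1:]))

def label(t, alpha):
    """The labeled term lab_alpha(t); a labeled symbol is a pair (f, label)."""
    if isinstance(t, str):
        return t
    values = [interpret(arg, alpha) for arg in t[1:]]
    root = (t[0], LABEL[t[0]](*values))
    return (root,) + tuple(label(arg, alpha) for arg in t[1:])

def assignments(rule):
    names = sorted(variables(rule[0]))
    for values in product([True, False], repeat=len(names)):
        yield dict(zip(names, values))

def is_model(rules):
    """Model condition: both sides of every rule agree under every assignment."""
    return all(interpret(l, a) == interpret(r, a)
               for l, r in rules for a in assignments((l, r)))

def labeled_system(rules):
    return {(label(l, a), label(r, a)) for l, r in rules for a in assignments((l, r))}
```

Under these assumptions `is_model(RMULT)` holds, and `labeled_system(RMULT)` contains five rules: the last rule of Rmult splits into two labeled variants, one rewriting the symbol mult labeled ⊥ into a term containing mult labeled ⊤, and one in the opposite direction.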

Above, ⊵ is extended to TRSs in the obvious way: R ⊵ t if l ⊵ t or r ⊵ t for some rule l → r ∈ R. Notice that $\bigwedge_{l \to r \in R} (\mathsf{int}_{\alpha,l} \leftrightarrow \mathsf{int}_{\alpha,r})$ enforces the model condition. Assume ν is a satisfying assignment for LAB(R) and Rlab denotes the system obtained by labeling R according to the encoded labeling and model. In order to show compatibility of Rlab with POP∗, we need to find a precedence > and safe mapping safe such that Rlab ⊆ >pop∗ holds for the induced order >pop∗. To compare the labeled versions $\mathsf{lab}_\alpha(s)$ and $\mathsf{lab}_\alpha(t)$ of two concrete terms s, t ∈ T(F, V) under a particular assignment α, i.e., to check $\mathsf{lab}_\alpha(s) >_{\mathsf{pop}*} \mathsf{lab}_\alpha(t)$, we define

$$\llbracket s >_{\mathsf{pop}*} t \rrbracket_\alpha = \llbracket s >_{\mathsf{pop}*} t \rrbracket^{(1)}_\alpha \vee \llbracket s >_{\mathsf{pop}*} t \rrbracket^{(2)}_\alpha \vee \llbracket s >_{\mathsf{pop}*} t \rrbracket^{(3)}_\alpha \vee \llbracket s >_{\mathsf{pop}*} t \rrbracket^{(4)}_\alpha.$$

Here $\llbracket s >_{\mathsf{pop}*} t \rrbracket^{(i)}_\alpha$ refers to the encoding of case i from Definition 2. We discuss cases 2 to 4; case 1, the comparison using the weaker order >pop, is obtained similarly. The above definition relies on the following auxiliary constraints. For every labeled symbol $f_a \in F_{\mathsf{lab}}$ and argument position i of f, we encode $i \in \mathsf{safe}(f_a)$ by a propositional atom $\mathsf{safe}_{f_a,i}$. For every unlabeled symbol f ∈ F and formula a representing the label, the formula $\mathsf{SF}(f_a, i)$ (respectively $\mathsf{NRM}(f_a, i)$) asserts that, depending on the valuation of a, the i-th position of $f_\top$ or $f_\bot$ is safe (normal). Similarly, for f, g ∈ F and propositional formulas a and b, the formula $\llbracket f_a > g_b \rrbracket$ ensures $f_{\nu(a)} > g_{\nu(b)}$ in the precedence for a satisfying assignment ν. For the latter, we follow the standard approach of encoding precedences on function symbols, compare for instance [23]. Notice that $s_i = t$ if and only if $\mathsf{lab}_\alpha(s_i) = \mathsf{lab}_\alpha(t)$. Thus case 2 is perfectly captured by $\llbracket f(s_1, \dots, s_n) >_{\mathsf{pop}*} t \rrbracket^{(2)}_\alpha = \top$ if $s_i = t$ holds for some $s_i$. Otherwise, we define $\llbracket f(s_1, \dots, s_n) >_{\mathsf{pop}*} t \rrbracket^{(2)}_\alpha = \bigvee_{i=1}^{n} \llbracket s_i >_{\mathsf{pop}*} t \rrbracket_\alpha$. For the encoding of the third clause in Definition 2, we introduce fresh atoms $\delta_j$ for each argument position j of g. The formula $\mathsf{one}(\delta_1, \dots, \delta_m)$ assures that exactly one atom $\delta_j$ is true. This particular atom marks the unique safe argument position j of g(t1, …, tm) with the strong comparison $\mathsf{lab}_\alpha(s) >_{\mathsf{pop}*} \mathsf{lab}_\alpha(t_j)$ allowed. We express clause 3 by the propositional formula

² We also use ⊤ and ⊥ to denote truth and falsity in propositional formulas.


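$$\begin{aligned} \llbracket f(s_1, \dots, s_n) >_{\mathsf{pop}*} g(t_1, \dots, t_m) \rrbracket^{(3)}_\alpha = {} & \llbracket f_{\mathsf{lab}_{\alpha,s}} > g_{\mathsf{lab}_{\alpha,t}} \rrbracket \wedge \mathsf{one}(\delta_1, \dots, \delta_m) \\ & \wedge \bigwedge_{j=1}^{m} \Bigl( \bigl( \delta_j \to \llbracket s >_{\mathsf{pop}*} t_j \rrbracket_\alpha \wedge \mathsf{SF}(g_{\mathsf{lab}_{\alpha,t}}, j) \bigr) \\ & \qquad\quad \wedge \bigl( \neg\delta_j \to \llbracket s >_{\mathsf{pop}} t_j \rrbracket_\alpha \vee ( \llbracket s \rhd t_j \rrbracket \wedge \mathsf{SF}(g_{\mathsf{lab}_{\alpha,t}}, j) ) \bigr) \Bigr) \end{aligned}$$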

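for g ∈ D(R). Here $\llbracket s \rhd t_i \rrbracket = \top$ when s ▷ ti holds, and otherwise $\llbracket s \rhd t_i \rrbracket = \bot$. This is justified as the subterm relation is closed under labeling. Note that in the above encoding of clause 3, we assume that the labeled root symbol $f_{\mathsf{lab}_{\alpha,s}}$ is a defined symbol of Rlab. For the case that $f_{\mathsf{lab}_{\alpha,s}}$ is not defined, we add a rule $f_{\mathsf{lab}_{\alpha,s}}(x_1, \dots, x_n) \to c$ with c a fresh constant to the analysed system Rlab. The latter rule is oriented if we additionally require $f_{\mathsf{lab}_{\alpha,s}} > c$ in the precedence. For instance, the constraint $\llbracket \mathsf{mult}(s(x), y) >_{\mathsf{pop}*} \mathsf{add}(y, \mathsf{mult}(x, y)) \rrbracket^{(3)}_\alpha$ unfolds to

$$\begin{aligned} & \llbracket \mathsf{mult}_{l_1} > \mathsf{add}_{l_2} \rrbracket \wedge \mathsf{one}(\delta_1, \delta_2) \\ & \wedge \bigl( \delta_1 \to \llbracket \mathsf{mult}(s(x), y) >_{\mathsf{pop}*} y \rrbracket_\alpha \wedge \mathsf{SF}(\mathsf{add}_{l_2}, 1) \bigr) \\ & \wedge \bigl( \neg\delta_1 \to \llbracket \mathsf{mult}(s(x), y) >_{\mathsf{pop}} y \rrbracket_\alpha \vee ( \top \wedge \mathsf{SF}(\mathsf{add}_{l_2}, 1) ) \bigr) \\ & \wedge \bigl( \delta_2 \to \llbracket \mathsf{mult}(s(x), y) >_{\mathsf{pop}*} \mathsf{mult}(x, y) \rrbracket_\alpha \wedge \mathsf{SF}(\mathsf{add}_{l_2}, 2) \bigr) \\ & \wedge \bigl( \neg\delta_2 \to \llbracket \mathsf{mult}(s(x), y) >_{\mathsf{pop}} \mathsf{mult}(x, y) \rrbracket_\alpha \vee ( \bot \wedge \mathsf{SF}(\mathsf{add}_{l_2}, 2) ) \bigr) \end{aligned}$$

for corresponding labels $l_1$ and $l_2$ depending on the encoded model. Additionally we require $\mathsf{mult}_\top > c$ and $\mathsf{mult}_\bot > c$ to orient the added rules.

To encode the final clause 4 from Definition 2, we make use of multiset covers [23]. A multiset cover is a pair of total mappings γ : {1, …, n} → {1, …, n} and ε : {1, …, n} → B, encoded using fresh atoms $\gamma_{i,j}$ and $\varepsilon_i$. The underlying idea is that for the comparison $[s_1, \dots, s_n] \,(>^{=}_{\mathsf{pop}*})_{\mathsf{mul}}\, [t_1, \dots, t_n]$ to hold, every term $t_j$ has to be covered by some term $s_i$ (encoded by $\gamma_{i,j}$), either by $s_i = t_j$ or $s_i >_{\mathsf{pop}*} t_j$. The former situation is encoded by $\varepsilon_i$, the latter by $\neg\varepsilon_i$. For the case $s_i = t_j$, $s_i$ must not cover any element besides $t_j$. We set

$$\llbracket (\gamma, \varepsilon) \rrbracket = \bigwedge_{j=1}^{n} \mathsf{one}(\gamma_{1,j}, \dots, \gamma_{n,j}) \wedge \bigwedge_{i=1}^{n} \bigl( \varepsilon_i \to \mathsf{one}(\gamma_{i,1}, \dots, \gamma_{i,n}) \bigr).$$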
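The role of the atoms $\gamma_{i,j}$ and $\varepsilon_i$ can be illustrated by searching for a multiset cover by brute force instead of by SAT solving. In the sketch below, which is an illustration only, `cover[j] = i` plays the role of a true atom $\gamma_{i,j}$, and `eps[i]` set to true means that $s_i$ is used for an equality, as in the constraint for case 4. The function names are hypothetical and the order `gt` is passed in as a parameter.

```python
from itertools import product

def exactly_one(bits):
    """Semantics of the constraint one(...): exactly one argument is true."""
    return sum(1 for b in bits if b) == 1

def find_cover(ss, ts, gt):
    """Search for a multiset cover (cover, eps) witnessing ss >=mul ts.

    cover[j] = i  -- s_i covers t_j, so every t_j is covered exactly once
    eps[i]        -- True if s_i covers by equality, False if by strict decrease
    Returns None if no cover exists.
    """
    n, m = len(ss), len(ts)
    for cover in product(range(n), repeat=m):
        for eps in product([True, False], repeat=n):
            covered_ok = all(
                (ss[i] == ts[j]) if eps[i] else gt(ss[i], ts[j])
                for j, i in enumerate(cover)
            )
            # An element used for equality must cover exactly one element.
            equality_ok = all(
                (not eps[i]) or exactly_one(cover[j] == i for j in range(m))
                for i in range(n)
            )
            if covered_ok and equality_ok:
                return cover, eps
    return None
```

For example, with integers and the usual order, `find_cover([3, 1], [2, 1], lambda a, b: a > b)` finds a cover, whereas `find_cover([1], [2], lambda a, b: a > b)` returns `None`. A strict decrease corresponds to a cover in which some `eps[i]` is false, which is what the disjunct over normal positions in the encoding of case 4 demands.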

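Based on this encoding of multiset covers, case 4 is now expressible as

$$\begin{aligned} \llbracket f(s_1, \dots, s_n) >_{\mathsf{pop}*} f(t_1, \dots, t_n) \rrbracket^{(4)}_\alpha = {} & (\mathsf{lab}_{\alpha,s} \leftrightarrow \mathsf{lab}_{\alpha,t}) \wedge \llbracket (\gamma, \varepsilon) \rrbracket \wedge \bigvee_{i=1}^{n} \bigl( \mathsf{NRM}(f_{\mathsf{lab}_{\alpha,s}}, i) \wedge \neg\varepsilon_i \bigr) \\ & \wedge \bigwedge_{i=1}^{n} \bigwedge_{j=1}^{n} \Bigl( \gamma_{i,j} \to \bigl( \mathsf{SF}(f_{\mathsf{lab}_{\alpha,s}}, i) \leftrightarrow \mathsf{SF}(f_{\mathsf{lab}_{\alpha,t}}, j) \bigr) \\ & \qquad\qquad\quad \wedge (\varepsilon_i \to s_i = t_j) \wedge \bigl( \neg\varepsilon_i \to \llbracket s_i >_{\mathsf{pop}*} t_j \rrbracket_\alpha \bigr) \Bigr). \end{aligned}$$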

The constraint $\bigvee_{i=1}^{n} ( \mathsf{NRM}(f_{\mathsf{lab}_{\alpha,s}}, i) \wedge \neg\varepsilon_i )$ is used so that at least one normal argument decreases. Assuming STRICT(R) and $\mathsf{SM}_{\mathsf{SL}}(R)$ cover the restrictions on the precedence and safe mapping, satisfiability of

$$\mathsf{POP}^{*}_{\mathsf{SL}}(R) = \bigwedge_{\alpha} \bigwedge_{l \to r \in R} \llbracket l >_{\mathsf{pop}*} r \rrbracket_\alpha \wedge \mathsf{SM}_{\mathsf{SL}}(R) \wedge \mathsf{STRICT}(R) \wedge \mathsf{LAB}(R)$$

certifies the existence of a model B and labeling ℓ such that the rewrite system $R'_{\mathsf{lab}} = R_{\mathsf{lab}} \cup \{ f_a(x_1, \dots, x_n) \to c \mid f \in D(R) \text{ and } f_a \in C(R_{\mathsf{lab}}) \}$ is compatible with >pop∗. The encoding is sound in the following sense.

Theorem 2. Suppose the propositional formula POP∗SL(R) is satisfiable. Then Rlab ⊆ >pop∗ for some (finite) labeled rewrite system Rlab and polynomial path order >pop∗.

Since every rewrite sequence of R is simulated step-by-step by Rlab we obtain:

Corollary 1. Let R be a finite constructor TRS. Suppose the propositional formula POP∗SL(R) is satisfiable. Then the induced (innermost) runtime complexity of R is polynomial.

4 Experimental Results

We have implemented the encoding of POP∗ with semantic labeling (denoted by POP∗SL below) in OCaml. We compare this implementation to the implementation without labeling from [4] (denoted by POP∗) and an implementation of a restricted class of polynomial interpretations (denoted by SMC). To check satisfiability of the obtained formulas we employ the MiniSat SAT-solver [12]. SMC refers to a restrictive class of polynomial interpretations: every constructor symbol is interpreted by a strongly linear polynomial, i.e., a polynomial of shape $P(x_1, \dots, x_n) = \sum_{i=1}^{n} x_i + c$ with c ∈ N, c ≥ 1. Furthermore, each defined symbol is interpreted by a simple-mixed polynomial $P(x_1, \dots, x_n) = \sum_{i_j \in \{0,1\}} a_{i_1 \dots i_n} x_1^{i_1} \cdots x_n^{i_n} + \sum_{i=1}^{n} b_i x_i^2$ with coefficients in N. Unlike in the general case, these restricted interpretations induce polynomial bounds on the runtime complexity. To find such interpretation functions automatically, we employ cdiprover3 [20]. Table 1 presents experimental results based on two testbeds. Testbed T consists of the 957 examples from the Termination Problem Database 4.0³ (TPDB) that were automatically verified terminating in the competition of 2007⁴. Testbed C is a restriction of T where only constructor TRSs have been considered (449 in total). All experiments were conducted on a PC with 512 MB of RAM and a 2.4 GHz Intel Pentium IV processor.



³ Available at http://www.lri.fr/~marche/tpdb
⁴ Cf. http://www.lri.fr/~marche/termination-competition/2007/


Table 1. Experimental results on TPDB 4.0

                              POP∗           POP∗SL           SMC
                             T      C       T      C       T      C
Yes                         65     41     128     74     156     83
Maybe                      892    408     800    370     495    271
Timeout (60 sec.)            0      0      29      5     306     95
Average Time Yes (sec.)      0.037          0.130          0.183

Table 1 confirms that semantic labeling significantly increases the power of POP∗, yielding comparable results to SMC. What is noteworthy is that the union of yes-instances of the three methods consists of 218 examples for testbed T and 112 for testbed C. For these 112 out of 449 constructor TRSs we are able to conclude a polynomial runtime complexity. Interestingly, POP∗SL and SMC succeed on a quite different range of systems. There are 29 constructor TRSs that only POP∗SL can deal with, whereas 38 constructor yes-instances of SMC cannot be handled by POP∗SL. Table 1 reflects that for both suites SMC runs into a timeout for approximately every fourth system. This indicates that purely semantic methods similar to SMC tend to get impractical when the size of the input system increases. Compared to this, the number of timeouts of POP∗SL is rather low. We perform various optimizations in our implementation: First of all, the constraints can be reduced during construction. Further, it is beneficial to lazily construct the overall constraint. For example, the formula $\llbracket f(s_1, \dots, s_n) >_{\mathsf{pop}*} s_i \rrbracket^{(2)}_\alpha$ reduces to ⊤. Hence $\llbracket f(s_1, \dots, s_n) >_{\mathsf{pop}*} s_i \rrbracket_\alpha = \top$ can be concluded without constructing encodings for the remaining cases in Definition 2. Furthermore, s >pop∗ t is doomed to failure if t contains variables not appearing in s. For this case, we replace the corresponding constraint by ⊥. SAT-solvers expect their input in CNF (worst case exponential in size). We employ the transformation proposed in [21] to obtain an equisatisfiable CNF linear in size. This approach is analogous to Tseitin's transformation [25] but the resulting CNF is usually shorter as the polarity of atoms is taken into account.
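To illustrate how an equisatisfiable CNF of linear size is obtained, the following Python sketch implements the plain Tseitin transformation [25]. It is not the polarity-aware variant of [21] used in the implementation described above, and the formula representation (strings for atoms, tuples tagged `"not"`, `"and"`, `"or"`) and the function name `tseitin` are assumptions of this sketch.

```python
def tseitin(formula):
    """Return (clauses, ids) such that the clauses are equisatisfiable to formula.

    Atoms are strings; compound formulas are ("not", f), ("and", f, g), ("or", f, g).
    Clauses are lists of non-zero integers in DIMACS style; ids maps atoms and
    named subformulas to their variable numbers.
    """
    ids, clauses = {}, []

    def variable(key):
        if key not in ids:
            ids[key] = len(ids) + 1
        return ids[key]

    def literal(f):
        if isinstance(f, str):
            return variable(f)
        if f[0] == "not":
            return -literal(f[1])
        if f in ids:                  # subformula already named, reuse its variable
            return ids[f]
        a, b = literal(f[1]), literal(f[2])
        x = variable(f)               # fresh variable naming the subformula f
        if f[0] == "and":             # clauses for x <-> (a and b)
            clauses.extend([[-x, a], [-x, b], [x, -a, -b]])
        else:                         # clauses for x <-> (a or b)
            clauses.extend([[x, -a], [x, -b], [-x, a, b]])
        return x

    clauses.append([literal(formula)])    # the formula itself must hold
    return clauses, ids
```

Each binary connective contributes three clauses and one fresh variable, so the output grows linearly with the formula. The variant of [21] emits only one direction of each equivalence, depending on the polarity in which a subformula occurs, which is why it typically produces fewer clauses.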

5 Conclusion

In this paper we have shown how to automatically verify polynomial runtime complexities of rewrite systems. For that we employ semantic labeling and polynomial path orders. Our automation works by a reduction to SAT and employing a state-of-the-art SAT-solver. To the best of our knowledge, this is the first SAT encoding of some recursive path order with finite semantic labeling. The experimental results confirm the feasibility of our approach. Moreover, they demonstrate that by semantic labeling we significantly increase the power of polynomial path orders.


Our research seems comparable to [10], where recursive path orders together with strongly linear polynomial quasi-interpretations are employed in the complexity analysis. In particular, they have a fully automatable (but of course incomplete) procedure to verify whether the functions computed by the TRS under consideration are feasibly, i.e., polytime, computable. In contrast to [10], we study the length of derivations here. In [7] it is shown that polynomially bounded innermost runtime complexity entails polytime computability of the functions defined. As a by-product of Corollary 1, [7] gives us a procedure for the complexity analysis of the functions defined. Finally, we also mention that semantic labeling over a Boolean carrier has been implemented in the termination prover TPA [16], where heuristics are used to find an appropriately labeled TRS Rlab. Unlike their approach, we leave all choices concerning the labeling to a state-of-the-art SAT-solver. In the meantime, polynomial path orders have been extended in various ways. First, inspired by the concept of predicative recursion and parameter substitution (see [9]), [6] extends polynomial path orders, widening their applicability. Our integration of semantic labeling naturally translates to this extension. Second, polynomial path orders can also be defined over quasi-precedences, compare [5]. Further, in [5] polynomial path orders have been combined with weak dependency pairs [14], a version of the dependency pair method suitably adapted for the study of runtime complexities. In principle, this allows the techniques that were developed in the context of dependency pairs for termination analysis to be used for complexity analysis as well. In [5] we exploit two such techniques, namely argument filterings [18] and the usable rules criterion [2]. All above mentioned extensions have been implemented in the Tyrolean Complexity Tool, an open source complexity analyser for TRSs.⁵
Finally, we conclude with an application of our research. There is long-standing interest in the functional programming community in automatically verifying complexity properties of programs. For brevity, we just mention [22,1,10]. Rewriting naturally models the evaluation of functional programs, and the termination behavior of functional programs via transformations to rewrite systems has been extensively studied. For instance, one recent approach is described in [13] where Haskell programs are covered. In joint work with Hirokawa, Middeldorp and Moser [3] we propose a translation from (a pure subset of higher-order) Scheme programs to term rewrite systems. The transformation is designed to be complexity preserving and thus allows the study of the complexity of a Scheme program P by the analysis of the transformed rewrite system R. Hence from compatibility of R with POP∗ we can directly conclude that the number of evaluation steps of the Scheme program P is polynomially bounded with respect to the input sizes. All necessary steps can be performed mechanically and thus we arrive at a completely automatic complexity analysis for (a pure subset of) Scheme, and eagerly evaluated functional programs in general.

⁵ For further information, see http://cl-informatik.uibk.ac.at/software/tct/


References

1. Anderson, H., Khoo, S., Andrei, S., Luca, B.: Calculating polynomial runtime properties. In: Yi, K. (ed.) APLAS 2005. LNCS, vol. 3780, pp. 230–246. Springer, Heidelberg (2005)
2. Arts, T., Giesl, J.: Termination of term rewriting using dependency pairs. TCS 236(1-2), 133–178 (2000)
3. Avanzini, M., Hirokawa, N., Middeldorp, A., Moser, G.: Proving termination of scheme programs by rewriting, http://cl-informatik.uibk.ac.at/~zini/publications/SchemeTR07.pdf
4. Avanzini, M., Moser, G.: Complexity analysis by rewriting. In: Garrigue, J., Hermenegildo, M.V. (eds.) FLOPS 2008. LNCS, vol. 4989, pp. 130–146. Springer, Heidelberg (2008)
5. Avanzini, M., Moser, G.: Dependency pairs and polynomial path orders. In: Treinen, R. (ed.) RTA 2009. LNCS, vol. 5595, pp. 48–62. Springer, Heidelberg (2009)
6. Avanzini, M., Moser, G.: Polynomial path orders and the rules of predicative recursion with parameter substitution. In: Proc. 10th WST (2009)
7. Avanzini, M., Moser, G.: Complexity analysis by graph rewriting. In: Blume, M., Kobayashi, N., Vidal, G. (eds.) FLOPS 2010. LNCS, vol. 6009, pp. 257–271. Springer, Heidelberg (2010)
8. Baader, F., Nipkow, T.: Term Rewriting and All That. Cambridge University Press, Cambridge (1998)
9. Bellantoni, S., Cook, S.A.: A new recursion-theoretic characterization of the polytime functions. CC 2, 97–110 (1992)
10. Bonfante, G., Marion, J., Péchoux, R.: Quasi-interpretation synthesis by decomposition. In: Jones, C.B., Liu, Z., Woodcock, J. (eds.) ICTAC 2007. LNCS, vol. 4711, pp. 410–424. Springer, Heidelberg (2007)
11. Dershowitz, N.: Orderings for term-rewriting systems. In: 20th Annual Symposium on Foundations of Computer Science, pp. 123–131. IEEE, Los Alamitos (1979)
12. Eén, N., Sörensson, N.: An extensible SAT-solver. In: Giunchiglia, E., Tacchella, A. (eds.) SAT 2003. LNCS, vol. 2919, pp. 502–518. Springer, Heidelberg (2004)
13. Giesl, J., Swiderski, S., Schneider-Kamp, P., Thiemann, R.: Automated termination analysis for Haskell: From term rewriting to programming languages. In: Pfenning, F. (ed.) RTA 2006. LNCS, vol. 4098, pp. 297–312. Springer, Heidelberg (2006)
14. Hirokawa, N., Moser, G.: Automated complexity analysis based on the dependency pair method. In: Armando, A., Baumgartner, P., Dowek, G. (eds.) IJCAR 2008. LNCS (LNAI), vol. 5195, pp. 364–380. Springer, Heidelberg (2008)
15. Hofbauer, D.: Termination proofs by multiset path orderings imply primitive recursive derivation lengths. TCS 105(1), 129–140 (1992)
16. Koprowski, A.: Tpa: Termination proved automatically. In: Pfenning, F. (ed.) RTA 2006. LNCS, vol. 4098, pp. 297–312. Springer, Heidelberg (2006)
17. Koprowski, A., Middeldorp, A.: Predictive labeling with dependency pairs using SAT. In: Pfenning, F. (ed.) CADE 2007. LNCS (LNAI), vol. 4603, pp. 410–425. Springer, Heidelberg (2007)
18. Kusakari, K., Nakamura, M., Toyama, Y.: Argument filtering transformation. In: Nadathur, G. (ed.) PPDP 1999. LNCS, vol. 1702, pp. 47–61. Springer, Heidelberg (1999)
19. Lescanne, P.: Termination of rewrite systems by elementary interpretations. Formal Aspects of Computing 7(1), 77–90 (1995)
20. Moser, G., Schnabl, A.: Proving quadratic derivational complexities using context dependent interpretations. In: Voronkov, A. (ed.) RTA 2008. LNCS, vol. 5117, pp. 276–290. Springer, Heidelberg (2008)
21. Plaisted, D.A., Greenbaum, S.: A structure-preserving clause form translation. J. Symb. Comput. 2(3), 293–304 (1986)
22. Rosendahl, M.: Automatic complexity analysis. In: Proc. 4th FPCA, pp. 144–156 (1989)
23. Schneider-Kamp, P., Thiemann, R., Annov, E., Codish, M., Giesl, J.: Proving termination using recursive path orders and SAT solving. In: Konev, B., Wolter, F. (eds.) FroCos 2007. LNCS (LNAI), vol. 4720, pp. 267–282. Springer, Heidelberg (2007)
24. TeReSe: Term Rewriting Systems. CTTCS, vol. 55. Cambridge University Press, Cambridge (2003)
25. Tseitin, G.: On the complexity of derivation in propositional calculus. SCML, Part 2, 115–125 (1968)
26. Zantema, H.: Termination of term rewriting by semantic labelling. FI 24(1/2), 89–105 (1995)

Author Index

Avanzini, Martin
Bastenhof, Arno
Charlow, Simon
Franke, Michael
Graf, Thomas
Klarman, Szymon
Lassiter, Daniel
Lison, Pierre
Nikolova, Ivelina
Schnabl, Andreas
Theijssen, Daphne
Wintein, Stefan