English Pages 464 [453] Year 2021
Outstanding Contributions to Logic 20
Claudia Casadio Philip J. Scott Editors
Joachim Lambek: The Interplay of Mathematics, Logic, and Linguistics
Outstanding Contributions to Logic Volume 20
Editor-in-Chief Sven Ove Hansson, Division of Philosophy, KTH Royal Institute of Technology, Stockholm, Sweden
Outstanding Contributions to Logic puts focus on important advances in modern logical research. Each volume is devoted to a major contribution by an eminent logician. The series will cover contributions to logic broadly conceived, including philosophical and mathematical logic, logic in computer science, and the application of logic in linguistics, economics, psychology, and other specialized areas of study.
A typical volume of Outstanding Contributions to Logic contains:
• A short scientific autobiography by the logician to whom the volume is devoted
• The volume editor’s introduction. This is a survey that puts the logician’s contributions in context, discusses their importance and shows how they connect with related work by other scholars
• The main part of the book will consist of a series of chapters by different scholars that analyze, develop or constructively criticize the logician’s work
• Response to the comments, by the logician to whom the volume is devoted
• A bibliography of the logician’s publications
Outstanding Contributions to Logic is published by Springer as part of the Studia Logica Library. This book series is also a sister series to Trends in Logic and Logic in Asia: Studia Logica Library. All books are published simultaneously in print and online. This book series is indexed in SCOPUS. Proposals for new volumes are welcome. They should be sent to the editor-in-chief [email protected]
More information about this series at http://www.springer.com/series/10033
Editors Claudia Casadio Department of Modern Languages Literatures and Cultures University of Chieti-Pescara Pescara, Italy
Philip J. Scott Department of Mathematics and Statistics University of Ottawa Ottawa, ON, Canada
ISSN 2211-2758 ISSN 2211-2766 (electronic) Outstanding Contributions to Logic ISBN 978-3-030-66544-9 ISBN 978-3-030-66545-6 (eBook) https://doi.org/10.1007/978-3-030-66545-6 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Fig. 1 Joachim Lambek (Photo courtesy of Larry Lambek)
Preface
Joachim Lambek (1922–2014) was a highly original mathematician with a wide range of scientific interests. After early work in combinatorics and elementary number theory, he became a well-known algebraist. In the 1960s, he began to work in category theory, categorical algebra, logic, proof theory and the foundations of computability theory. In a parallel development, beginning in the late 1950s and for the rest of his career, Lambek also worked extensively in mathematical linguistics and computational approaches to natural languages. Lambek grammars (from the early 1960s) form a fundamental structure in mathematical linguistics. In the early 2000s, Lambek introduced a novel algebraic framework, pregroup grammars, for natural language, with algebraic, higher category and proof-theoretic semantics. In terms of physics (in which Lambek wrote one of his two PhD theses), he was a proud follower of a somewhat unconventional group of theorists who advocated using quaternionic methods in mathematical physics, following Hamilton, Maxwell, and Dirac. In this volume, we gather together noted experts to discuss the state of the art of various works by Lambek in logic, category theory, and linguistics.

Acknowledgements
The editors (C.C. and P.S.) would like to thank the overall series editor, Professor Sven Ove Hansson, both for his unflagging support and expert advice as well as for allowing us to continue with this volume after Lambek’s death. We also wish to thank the authors of the articles in this volume for their scholarly contributions. Most of the authors had close connections with Jim Lambek or his work, and the articles are often on themes he specifically discussed with them or on talks he actually attended. We would like to express our appreciation to Christi Lue at Springer for her expert support and advice on this volume as well as to Paloma Hammond and Oda Siqveland at Springer for helping us obtain copyright permissions.
We are grateful to the Centre de recherches mathématiques (and its director, Professor Luc Vinet) for allowing us to reproduce the Lambek obituary. Finally, we thank the referees who reviewed the authors’ papers, which helped immensely in writing the final versions printed here. We also owe an immense debt of gratitude to Jim Lambek’s three sons: Michael, Larry, and Bernie, for their very special and personal help and support in this project.

Pescara, Italy and Ottawa, Canada
October 2020
Claudia Casadio
Philip Scott
Contents
0 Introduction
Claudia Casadio and Philip Scott
1 Lambek’s Syntactic Calculus and Noncommutative Variants of Linear Logic: Laws and Proof-Nets
Michele Abrusci and Claudia Casadio
2 Sheaf Representations and Duality in Logic
Steve Awodey
3 On the naturalness of Mal’tsev categories
D. Bourn, M. Gran, and P.-A. Jacqmin
4 Extensions of Lambek Calculi
Wojciech Buszkowski
5 Categories with Families: Unityped, Simply Typed, and Dependently Typed
Simon Castellan, Pierre Clairambault, and Peter Dybjer
6 The Mathematics of Text Structure
Bob Coecke
7 Aspects of Categorical Recursion Theory
Pieter Hofstra and Philip Scott
8 Morphisms of Rings
Robert Paré
9 Pomset Logic: The other approach to noncommutativity in logic
Christian Retoré
10 Pregroup Grammars, their Syntax and Semantics
Mehrnoosh Sadrzadeh
11 The Sequent Calculus of Skew Monoidal Categories
Tarmo Uustalu, Niccolò Veltri, and Noam Zeilberger
Appendix: An obituary of J. Lambek, reprinted from Bulletin of the CRM, and two of his expository papers reprinted from The Mathematical Intelligencer.
• Joachim Lambek, FRSC, by M. Barr, P. Scott, and R. Seely
• If Hamilton Had Prevailed: Quaternions in Physics, by J. Lambek
• Pregroups and Natural Language Processing, by Joachim Lambek
List of Contributors
V. Michele Abrusci
University “Roma Tre”, Department of Mathematics and Physics, Largo San Leonardo Murialdo 1, Rome. e-mail: [email protected]
Steve Awodey
Carnegie Mellon University, Pittsburgh, PA, USA. e-mail: [email protected]
Dominique Bourn
Université du Littoral, Laboratoire LMPA, BP 699, 62228 Calais Cedex, France. e-mail: [email protected]
Wojciech Buszkowski
Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poznań, Poland. e-mail: [email protected]
Claudia Casadio
University “G. D’Annunzio” of Chieti and Pescara, Department of Language, Literature and Modern Cultures, Viale Pindaro, 42 - 65127 Pescara. e-mail: [email protected]
Simon Castellan
Inria, Univ Rennes, CNRS, IRISA, France. e-mail: [email protected]
Pierre Clairambault
Univ Lyon, EnsL, UCBL, CNRS, LIP, F-69342, LYON Cedex 07, France. e-mail: [email protected]
Bob Coecke
Oxford University, Department of Computer Science; Cambridge Quantum Computing Ltd. e-mail: [email protected], [email protected]
Peter Dybjer
Chalmers University of Technology, SE-412 96 Göteborg, Sweden. e-mail: [email protected]
Marino Gran
Université catholique de Louvain, IRMP, Chemin du Cyclotron 2, 1348 Louvain-la-Neuve, Belgique. e-mail: [email protected]
Pierre-Alain Jacqmin
Université catholique de Louvain, IRMP, Chemin du Cyclotron 2, 1348 Louvain-la-Neuve, Belgique. e-mail: [email protected]
Pieter Hofstra
Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada. e-mail: [email protected]
Robert Paré
Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada. e-mail: [email protected]
Christian Retoré
LIRMM, Univ Montpellier, CNRS, Montpellier, France. e-mail: [email protected]
Mehrnoosh Sadrzadeh
Department of Computer Science, University College London. e-mail: [email protected]
Philip Scott
Department of Mathematics and Statistics, University of Ottawa, Canada. e-mail: [email protected]
Tarmo Uustalu
Reykjavik University, Iceland, and Tallinn University of Technology, Estonia. e-mail: [email protected]
Niccolò Veltri
Tallinn University of Technology, Estonia. e-mail: [email protected]
Noam Zeilberger
École Polytechnique, Palaiseau, France. e-mail: [email protected]
Introduction
Joachim Lambek enjoyed a long and distinguished career in the Mathematics Department of McGill University in Montréal. In Part I of this Introduction we include a brief biography.¹ In Part II we discuss some of Lambek’s academic interests and research programs. We shall not follow a strict chronological order, but rather a thematic one, since many of his widely cited works were pursued concurrently. Although Lambek was well known for his work in pure mathematics, in what follows, given the theme of this volume, we pay particular attention to his work in logic, linguistics, and category-theoretic foundations of mathematics. In Part III we discuss each of the papers in this volume in more detail, especially how they relate to Lambek’s various research programs.

I. A brief biography
Joachim (Jim) Lambek was born in Leipzig, Germany in 1922. According to his writings [47], he attended the prestigious King Albert Gymnasium (high school) just after the Nazis came to power in 1933. This school emphasized humanities and classical languages, to which he attributed his later interest in linguistics. However, by the end of 1937, as a “non-Aryan”, he was forced to leave the school. He briefly attended a Jewish high school, but after the Kristallnacht pogrom, life in Leipzig became increasingly dangerous. So around 1939 his family fled Nazi Germany. He and his sister were put aboard the Kindertransport to England and his parents followed separately, arriving in England just before the outbreak of WW II. Shortly thereafter, the British authorities unaccountably decided to send many young male German refugees (who were unfairly classified “enemy aliens”) to internment camps in the Commonwealth. Jim was interned in New Brunswick, Canada, where he worked as a lumberjack. Fortunately, the younger detainees were allowed to continue their education.
As Lambek noted [47]:
The camp contained a number of distinguished scholars and scientists, who would give lectures on their specialties. The person with the greatest influence on me was the mathematician Fritz Rothberger, who had founded a subject called “Combinatorial Set Theory.” (He was to be a friend to me and my family for years to come.) In camp, he offered a course on advanced mathematics, mostly set theory and analysis, which was attended by Walter and me.
The other student was Walter Kohn, who went on to get a PhD in nuclear physics from Harvard, and who won the Nobel Prize in 1998 in what today is called quantum chemistry. Lambek often remarked, with some pride, that many of his fellow internees went on to successful careers in a wide variety of fields, including science, medicine, the arts, and politics.

¹ In the Appendix we also include an Obituary of Lambek by M. Barr, P. Scott, and R. Seely, which first appeared in the Bulletin of the Centre de Recherches Mathématiques (CRM) of the Université de Montréal in the spring of 2015. This Introduction is an expansion of some of the themes mentioned there.
While in the internment camp, Lambek’s scholarly interests ranged widely. For example, he read Hardy’s A Course of Pure Mathematics and Quine’s Mathematical Logic. The latter, together with some correspondence he had with Quine, seems to have piqued his later interests in logic, philosophy, and the foundations of mathematics. After some time, the British authorities realized their error and released the unfairly held refugees from internment. Lambek, supported by a local Montréal family, stayed on in Montréal, attending McGill University both as an undergraduate and as a graduate student. He obtained an MSc in number theory in 1946 [16], supervised by the number theorist Gordon Pall, and in 1950 submitted two short PhD theses [17, 18] under the supervision of a new McGill professor, Hans Zassenhaus, thus becoming the first person to receive a PhD in Mathematics at McGill.² In his first thesis [17] Lambek studied quaternions in the foundations of mathematical physics, a topic in which he maintained a lifelong interest. He published an expository article on his ideas in [32] (reproduced in the Appendix of this volume). In the second thesis (and resulting publication) [18, 19] Lambek gave elegant necessary and sufficient geometric conditions for embedding a semigroup in a group, a problem posed to him by Zassenhaus (and previously solved by Mal’cev using a complicated argument). After his PhD, Lambek continued on at McGill, rising through the professorial ranks to the distinguished Redpath Chair in Mathematics, as well as being named a Fellow of the Royal Society of Canada.

II. Academic Work
During the 1950s, Lambek primarily pursued research in elementary number theory and algebra, although his wide scholarly readings continued. The classical logicians, including Church, Tarski, Curry, Gödel, and Kleene, would become major influences on him.
For example, according to Lambek [45], in teaching a course in logic from Kleene’s book, he learned Gentzen’s proof theory (in particular, sequent calculi and cut-elimination), which played a fundamental role in so many of his later works. In the period 1955–1965, Lambek was primarily an algebraist; his mathematical work focussed mainly on ring theory, including two books [23, 13]. Lambek, in [30], recalls that he and G. D. Findlay studied homological algebra in the mid-1950s. Their unpublished paper [11] appears to have greatly influenced many of Lambek’s later works, including his book on ring theory [23] as well as his burgeoning work in linguistics. For example, in [11, 23], two novel operations N ⦸ M and N ⊘ M on bimodules M, N were introduced (read “N under M” and “N over M”, respectively),³ based upon the observation that there are bijective correspondences between canonical homomorphisms:

M ⊗ N → P ,    M → P ⊘ N ,    N → M ⦸ P .

² We list the theses separately in the Bibliography, but they are scanned together as one electronic document in the McGill Archives. Lambek considered them as Parts I and II of his thesis.
³ In modern noncommutative linear logic notation, these operations would be written N ⊸ M and N ⟜ M, respectively (see the papers in this volume by Abrusci and Casadio, by Buszkowski, and by Retoré).
The formal properties of these operations played a central role in Lambek’s Syntactic Calculus [25, 26].⁴ However, this was not the only influence on the development of the Syntactic Calculus. His original paper [25] cites works of Ajdukiewicz, Bar-Hillel, and Chomsky for their influence on his thinking. Crucially, already in 1958, Lambek employed a version of Gentzen’s cut-elimination theorem to establish a decision procedure for his calculus, in answer to a question of Bar-Hillel (see [29], p. 297). We shall discuss more of this history below.

Lambek’s interest in category theory dates from his sabbatical year 1965–1966 at the ETH in Zürich, where Bruno Eckmann gathered together an influential group of mathematicians to study algebraic topology, homological algebra, and category theory. This sabbatical year changed Lambek’s research focus. Although he continued his work in algebra for some time afterwards, his scientific interests increasingly turned towards the newly developing subjects of category theory and categorical logic, along with his increasing activity in linguistics and formal grammar. As we shall see, he also contributed some seminal ideas in theoretical computer science.

His published work in category theory began with his monograph Completions of Categories [24]. In the late 1960s, Lambek was attracted by Mac Lane’s “coherence problems”. According to Mac Lane [48], p. 161, “A coherence theorem asserts: ...every diagram of a certain class commutes”. Here, Mac Lane is thinking of monoidal categories, and the diagrams are formally built from instances of certain canonical isomorphisms. Lambek’s key insight was to observe that such questions can often be formulated as the decision (or word) problem for the hom-sets of certain freely generated categories (with additional structure).
Lambek turned this into a proof-theoretical problem (beginning in [21]) by developing a Gentzen-style logical system, whose formulas are the objects of a free category, and in which equivalence classes of formal sequent calculus proofs are the arrows. The decision problem is then resolved by proving an appropriate cut-elimination theorem. This also introduced a new idea into proof theory: the study of equations between proofs (corresponding to the meaning of a commutative diagram). These notions have had great influence in categorical logic and theoretical computer science.⁵

Let us illustrate this a bit more. Over many years, Lambek refined his proof-theoretical view of logics into what he called labelled deductive systems. For an insightful introduction see his paper [31]. For Lambek, a deductive system has three components: (a) formulas, (b) labelled proofs (or deductions), and (c) equations between proofs. He distinguished several styles of deductive systems; we shall mention his Lawvere-style and Gentzen Intuitionistic Sequent-style.

⁴ Similar operations were already studied in the theory of residuated lattices of Ward and Dilworth, Trans. Amer. Math. Soc. 45 (1939) 335-354. See also the paper of Buszkowski in this volume.
⁵ Aspects of Lambek’s general program in categorical proof theory are discussed in several papers in this volume. In particular, the reader is referred to Section 3 of the paper by P. Hofstra and P. Scott (and the references there); for dependently typed lambda calculi, to the paper by S. Castellan, P. Clairambault, and P. Dybjer; and, for a newer application, to the paper of T. Uustalu, N. Veltri, and N. Zeilberger.
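To make the role of cut-elimination in the decision procedure concrete, here is a minimal Python sketch of backward proof search in a cut-free sequent calculus for the product-free fragment of the Syntactic Calculus. It is an illustration under stated assumptions, not Lambek's own presentation: the tuple encoding of types, the ASCII notation C/B ("C over B") and A\C ("A under C"), and the names `over`, `under` and `derivable` are all hypothetical choices made here. The search terminates because every premise of a cut-free rule contains strictly fewer connectives than its conclusion.

```python
from functools import lru_cache

# Assumed encoding: an atomic type is a string; ("/", c, b) stands for
# c/b ("c over b") and ("\\", a, c) stands for a\c ("a under c").
def over(c, b):
    return ("/", c, b)

def under(a, c):
    return ("\\", a, c)

@lru_cache(maxsize=None)
def derivable(gamma, goal):
    """Decide whether the sequent gamma -> goal has a cut-free proof.

    gamma is a non-empty tuple of types; no permutation, contraction
    or weakening is used, so order and multiplicity matter.
    """
    if not gamma:
        return False  # antecedents are required to be non-empty
    if len(gamma) == 1 and gamma[0] == goal:
        return True  # identity axiom A -> A
    # Right rules: peel a connective off the goal.
    if isinstance(goal, tuple):
        op, x, y = goal
        if op == "/" and derivable(gamma + (y,), x):
            return True  # from gamma, b -> c infer gamma -> c/b
        if op == "\\" and derivable((x,) + gamma, y):
            return True  # from a, gamma -> c infer gamma -> a\c
    # Left rules: pick a compound type and a neighbouring argument block.
    for i, t in enumerate(gamma):
        if not isinstance(t, tuple):
            continue
        op, x, y = t
        if op == "/":  # t = x/y takes a non-empty block to its right
            for j in range(i + 2, len(gamma) + 1):
                if derivable(gamma[i + 1:j], y) and derivable(
                    gamma[:i] + (x,) + gamma[j:], goal
                ):
                    return True
        else:  # t = x\y takes a non-empty block to its left
            for j in range(i):
                if derivable(gamma[j:i], x) and derivable(
                    gamma[:j] + (y,) + gamma[i + 1:], goal
                ):
                    return True
    return False

# "John works": n, n\s -> s is derivable; the reversed order is not.
print(derivable(("n", under("n", "s")), "s"))
print(derivable((under("n", "s"), "n"), "s"))
```

With cut present, the same backward search would have to guess an arbitrary cut formula, so the search space would no longer be finite; this is why eliminating cut yields a decision procedure.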
In the Lawvere style, Lambek considers deductive systems as directed multigraphs, whose nodes are the formulas (i.e., (a) above) and whose directed edges are the (labelled) proofs (i.e., (b) above). Any such deductive system has a specified identity proof (or axiom) for each formula and a typed binary operation (called a rule of inference) for composition of proofs, as shown below:

\[
\frac{}{A \xrightarrow{\;\mathrm{id}_A\;} A}
\qquad\qquad
\frac{A \xrightarrow{\;f\;} B \qquad B \xrightarrow{\;g\;} C}{A \xrightarrow{\;g \circ f\;} C}\ \text{comp}
\]
A proof can be written as a tree (of labelled deductions), built from axioms and rules of inference, but (in general) the formulas and proofs need not be freely generated. Finally, we impose equations (c) on proofs, i.e., an equivalence relation on proofs specifying the three equations of a category.

The above is the very minimum system. To get more complicated categories, Lambek imposed additional structure. He often discussed free deductive systems, whose formulas and proofs are indeed freely generated, using specified “logical” operations.⁶ In addition, Lambek imposes equations guaranteeing that the quotient deductive system has some desired categorical structure. For example, in the Lambek-Scott book [43], one sees (among various systems), for any set of atomic formulas, that the axioms of the deductive system of the logic of conjunction {∧, ⊤} give the free cartesian category (generated by the atomic formulas), while the axioms of the deductive system of the logic of positive intuitionistic propositional calculus {∧, ⇒, ⊤} give the free cartesian closed category (generated by the atomic formulas).⁷

In the seminal paper [22], Lambek presents deductive systems for a host of interesting mathematical structures, as well as a novel approach, called multicategories, for developing deductive systems for various kinds of monoidal categories (and associated substructural logics). We sketch below Lambek’s insightful development in [30], which we strongly recommend, along with his paper [36]. Lambek’s favourite deductive systems were Gentzen’s intuitionistic sequent calculi [31]. Here (labelled) deductions are of the form f : Γ → B, where Γ is a finite list (possibly empty) of formulas.⁸ At the very minimum, there is one axiom (or identity deduction) for each formula A, and a generalized composition rule, Gentzen’s cut, as follows:⁹

\[
\frac{}{A \xrightarrow{\;\mathrm{id}_A\;} A}
\qquad\qquad
\frac{\Lambda \xrightarrow{\;f\;} A \qquad \Gamma A \Delta \xrightarrow{\;g\;} B}{\Gamma \Lambda \Delta \xrightarrow{\;gf\;} B}\ \text{cut}
\]

⁶ Here, the formulas are generated from certain atomic formulas and/or distinguished constants using some specified operations, while the deductions are generated from some distinguished arrows (called (logical) axioms) using certain specified operations (called rules of inference).
⁷ More generally, in [43] there are deductive systems freely generated by a directed multigraph G. For example, the axioms of the deductive system for {∧, ⇒, ⊤} mentioned above would give the free cartesian closed category generated by G.
⁸ Also denoted \(\Gamma \xrightarrow{\;f\;} B\). We elide commas in lists, writing A1, A2, . . . , An as A1 A2 . . . An. We write Mult(Γ, B) for the collection of labelled deductions (multiarrows) Γ → B.
⁹ As Lambek remarked, the notation gf is ambiguous out of context, but we shall use it nevertheless.
As before, in general, proofs can be represented as trees built from the axioms and the rules of inference. Lambek said this generates a context-free production grammar, with the arrows reversed, which he called a context-free recognition grammar. It is important to stress that Lambek does not assume the structural rules of Gentzen’s full intuitionistic logic: permutation, contraction, and weakening. Thus this system is a basic substructural logic, a (noncommutative) predecessor of Girard’s linear logic. To obtain a multicategory, Lambek imposed an appropriate equivalence relation between arrows f, g : Γ → B, just as he did for categories. These amount to multi-versions of identity, associativity, and commutativity of cut. It has turned out that in many areas, for example in higher category theory, multicategories continue to play a fundamental role (e.g., see the paper by Tarmo Uustalu, Niccolò Veltri, and Noam Zeilberger in this volume).

A key observation, which goes back to Lambek’s earliest works in the area, is that instead of defining monoidal categories, closed categories, etc. in the traditional way, it is often possible to equationally introduce these structures more directly into a multicategory. Let us see how to add some structure into a multicategory.
• A tensor product is a binary operation ⊗ on objects together with a canonical family of multiarrows (deductions) m_AB : AB → A ⊗ B inducing a bijection
Mult(Γ A ⊗ B Δ, C) ≅ Mult(Γ A B Δ, C).
This says: for every multiarrow (deduction) f : Γ A B Δ → C, there corresponds a unique multiarrow f§ : Γ A ⊗ B Δ → C such that f§ m_AB = f. Moreover, the uniqueness of f§ may itself be expressed equationally, by postulating: for each g : Γ A ⊗ B Δ → C, we have (g m_AB)§ = g.
• A right internal hom is defined by a binary operation ⊘ on objects together with a canonical family of multiarrows (deductions) e_CB : (C ⊘ B)B → C inducing a bijection
Mult(Γ, C ⊘ B) ≅ Mult(Γ B, C).
This says: for each multiarrow f : Γ B → C, there is a unique multiarrow f* : Γ → C ⊘ B satisfying e_CB f* = f. As before, uniqueness is also equationally specifiable. There is also a dual notion of left internal hom, denoted ⦸.
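The bijection for the right internal hom has a familiar shadow in ordinary programming: currying on the last argument. The Python sketch below is only an analogy under a loudly stated assumption, namely that we work in the cartesian setting of sets and functions, where arguments may be permuted, duplicated or discarded; Lambek's multicategories assume none of these structural rules. The names `curry_last`, `ev` and `uncurry_last` are hypothetical and chosen here for illustration.

```python
def curry_last(f):
    """Send f : (gamma..., b) -> c to f_star : gamma... -> (b -> c)."""
    def f_star(*gamma):
        # The returned closure plays the role of an element of the hom object.
        return lambda b: f(*gamma, b)
    return f_star

def ev(h, b):
    """Evaluation, the analogue of the canonical multiarrow e : (C over B) B -> C."""
    return h(b)

def uncurry_last(g):
    """Inverse direction: compose g with evaluation on the last argument."""
    def f(*args):
        *gamma, b = args
        return ev(g(*gamma), b)
    return f

# A sample multiarrow with context (x, y) and last argument b.
f = lambda x, y, b: x + y * b
f_star = curry_last(f)

# The defining equation e f* = f, checked at one point.
print(ev(f_star(1, 2), 3) == f(1, 2, 3))
# Round trip: uncurrying the curried map recovers f at that point.
print(uncurry_last(curry_last(f))(1, 2, 3) == f(1, 2, 3))
```

In the substructural setting the same equations hold, but the terms involved are linear: each variable of the context is used exactly once and in order.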
Lambek studied a host of structured multicategories, often stemming from his work in linguistics. For example, right closed multicategories have ⊘, left closed multicategories have ⦸, residuated multicategories have ⊗, ⊘, ⦸. A fundamental idea introduced in [30] is the idea of generating multicategories via an internal language. At that time, the idea had already been popularized in the Lambek-Scott book [43], but it had a long genesis in categorical logic. Briefly, the idea is that one develops a multi-sorted (or typed) algebraic theory associated to a multicategory M by saying its types are the objects (or the formulas) of M and its terms (or sorted operation symbols) are generated by the arrows (Gentzen sequents) of M, together with variables of each type (we assume there are countably many variables of each type). Thus a sequent f : A1 A2 . . . An → B would yield a term f x1 . . . xn of type B, where the variable xi is of type Ai. Given two such sequents f, g : A1 A2 . . . An → B, we introduce an equality relation between terms of the same type, f x1 x2 . . . xn =_X g x1 x2 . . . xn, where X = {x1, . . . , xn},¹⁰ and we impose the equations of a multicategory. In the case of the additional structure mentioned above, the associated terms become more complicated operators and the equations added implement those defining that structure. For example, the closed structures would give versions of a linear lambda calculus.¹¹

There are several advantages in moving to multicategories and their internal languages. For example, Lambek [30] shows that the usual properties of ⊗, for instance that ⊗ is a bifunctor, are easily deduced. Another is that various coherence conditions on the monoidal category level become immediate using the internal language (e.g., Mac Lane’s pentagonal coherence condition for ⊗ is just a straightforward equation in the internal language, which Lambek easily proves directly!). As Lambek says [30]:
We thus see that the usual properties of the tensor product in a category follow from the canonical bijection between arrows Γ A ⊗ B Δ → C and Γ A B Δ → C in a multicategory. Of course, the traditional procedure is the reverse: one postulates that ⊗ is a bifunctor with a coherent natural associativity . . . One then turns the category into a multicategory by defining f : ABCD → E as f : ((A ⊗ B) ⊗ C) ⊗ D → E . . . The equations of a multicategory are then easily checked. However, in the spirit of Gentzen and Bourbaki, I prefer to go in the opposite direction, starting with a multicategory.
For an abstract (monadic) treatment of multicategories, so-called representable multicategories and coherence theory, the reader should see Hermida’s paper [15].

Lambek had a long-time interest in the calculus of relations and its applications. We shall mention only a few of his many publications in the area. For different surveys, see for example [39, 37]. Lambek, using relational techniques, studied production grammars for kinship terminologies for a wide range of languages in both linguistics and anthropology, e.g. [6, 33, 7]. He used relations to model categorial grammars and bilinear logic [35, 36], as well as in algebra and category theory itself [37, 40, 41]. The paper in this volume, “On the naturalness of Mal’tsev categories”, by Dominique Bourn, Marino Gran and Pierre-Alain Jacqmin, investigates a class of categories developed by Lambek and colleagues for studying homological algebra, in which the calculus of relations provides appropriate methods for diagram-chasing arguments (see also [38]). As we recall, the original motivation for his Syntactic Calculus was his early study of bimodules in homological algebra [11, 36]. The study of the algebra of such bimodules continues in this volume with the paper of Robert Paré, “Morphisms of Rings”.

¹⁰ The subscript X takes into account the possibility of empty types; see [43] and the paper of Hofstra and Scott in this volume.
¹¹ Without structural rules, the function-theoretic interpretation of a Gentzen sequent does not allow closure of the class of function terms under: permuting variables, repeating (or duplicating) variables, or adding dummy variables to terms (cf. also the Hofstra and Scott paper in this volume).
Finally, Lambek was also interested in applications of relational methods to theoretical computer science. For example, in the theory of recursive functions, he preferred defining r.e. sets and partial recursive functions purely relation-theoretically (in the latter case, avoiding the operation of minimization altogether). This theme is explored in the article by Lambek and Scott [42] and in [39]. Another topic of interest in computer science and category theory is the model PER (partial equivalence relations), a model of polymorphic lambda calculus (Girard’s System F). The subject was also taken up in several papers of Lambek (cf. [34]; see also the paper in this volume by P. Hofstra and P. Scott).

Lambek also contributed to other areas of theoretical computer science. One especially interesting contribution was his introduction of a version of an unlimited register machine (which he called an infinite abacus) as a simple model of idealized computation [20]. In another influential paper [27], he introduced an important abstract idea: the category of T-algebras for an endofunctor T : C → C on a category C. A key property enjoyed by this notion is that an initial T-algebra A is a fixed point (up to isomorphism) of the functor T, i.e. it satisfies T(A) ≅ A. This observation, known as Lambek’s lemma, has been important in certain areas of Domain Theory and in denotational semantics. The above topics are also discussed in the article by Hofstra and Scott in this volume.

We shall just mention in passing the book by J. Lambek and P. J. Scott, Introduction to Higher Order Categorical Logic [43]. Both its authors were extremely pleased with how well the book was received by logicians and theoretical computer scientists. Its use of natural deduction methods (also due to Gentzen) was a change from Lambek’s original uses of Gentzen sequent calculi (in both coherence theory and syntactic calculus).
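For readers who have not met register machines, the infinite abacus mentioned above can be sketched in a few lines of Python. This is a hedged illustration, not a transcription of [20]: the tuple encoding of instructions, the names `run_abacus` and `ADD`, and the `max_steps` budget are assumptions made here (the idealized machine has no step bound). The machine has named locations holding non-negative counts and two kinds of instruction: add one counter to a location, or remove one counter if the location is non-empty, branching on which case occurred.

```python
def run_abacus(program, registers, start=0, max_steps=10_000):
    """Run an abacus-style program and return the final register contents.

    Assumed instruction encoding:
      ("inc", r, nxt)                   add one to r, go to nxt
      ("dec", r, if_nonempty, if_empty) if r > 0, subtract one and go to
                                        if_nonempty; otherwise go to if_empty
      ("halt",)                         stop
    """
    regs = dict(registers)  # copy, so the caller's dict is not mutated
    pc = start
    for _ in range(max_steps):
        instr = program[pc]
        if instr[0] == "halt":
            return regs
        if instr[0] == "inc":
            _, r, nxt = instr
            regs[r] = regs.get(r, 0) + 1
            pc = nxt
        else:  # "dec"
            _, r, if_nonempty, if_empty = instr
            if regs.get(r, 0) > 0:
                regs[r] -= 1
                pc = if_nonempty
            else:
                pc = if_empty
    raise RuntimeError("step budget exhausted (the program may not halt)")

# Addition: move the contents of Y into X one counter at a time.
ADD = {
    0: ("dec", "Y", 1, 2),
    1: ("inc", "X", 0),
    2: ("halt",),
}

print(run_abacus(ADD, {"X": 3, "Y": 4}))
```

Running the addition program on X = 3, Y = 4 empties Y and leaves 7 in X. Programs for the other basic arithmetic operations can be assembled from the same two instructions, which is what makes the model attractive as a minimal account of computability.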
Several of the key topics developed in the book are discussed (and extended) in the articles in this volume by Awodey, by Castellan, Clairambault, and Dybjer, and by Hofstra and Scott. It should be noted that the book [43] was originally intended to be a two-part series and both authors regretted that this never came to pass. As mentioned in the Introduction to that book, it was intended that the first volume, [43], would set up the categorical logic background for analyzing various foundational topics (the interested reader is encouraged to look at [43], p. viii). One particular topic, the analysis of Gödel’s Dialectica Interpretation, strongly influenced the book’s development, from studies of free cartesian closed categories to the various “meta-principles” studied in the free topos. But alas, the study of the Dialectica Interpretation itself (in the style of the first volume) was left to the second volume, which, along with the other topics intended to be studied there, was never pursued.
Perhaps the main reason for the lack of a volume II was that in 1986, when the book [43] appeared, a remarkable and deeply influential subject arose: J.-Y. Girard’s Linear Logic. Many categorical logicians immediately started working in the area. Both authors J. Lambek and P. Scott pursued their individual interests in linear logic, with Lambek in particular realizing that his earlier work on the syntactic calculus amounted to a noncommutative version of intuitionistic linear logic (see the paper by Abrusci and Casadio in this volume). Also, because Girard’s presentation
of linear logic was based on Gentzen’s sequent calculus, Lambek was happy to return to that style of calculus. We shall describe more of this history below.
Let us end this part with an amusing story. As should be clear, Lambek was at heart an algebraist. In particular, he did not approve of the use of geometric arguments, preferring instead concrete algebraic derivations. Girard’s linear logic is famous for its novel theory of proof nets: graphical networks representing proofs, with their own geometrical/graphical correctness criterion. At about the same time, similar geometrical ideas (string diagrams) started to appear in the category theory literature associated to monoidal categories. Versions of proof nets for noncommutative linear logic are discussed in some detail in Retoré’s article in this volume. In a lecture on bicategories in February 2003, Lambek wrote:
I don’t like proof nets, commutative diagrams etc. I follow Descartes’ advice: replace geometry by algebra.
Retoré also records a similar quotation to the one above. Ironically, within a few years, Lambek himself was using a kind of primitive version of proof nets in his pregroup grammars (see below) while in various areas of linguistics genuine proof net methods have been employed, for example by Glyn Morrill and Christian Retoré.
We end this Introduction with some of the history of Lambek’s works in linguistics. Categorial grammars were first introduced by Ajdukiewicz [3], who traced the idea back to Leśniewski and Husserl. His “syntactic connexion” influenced Bar-Hillel [5], who formalized the distinction between left-looking and right-looking categories. As we mentioned earlier, Lambek arrived at a similar idea while studying homological algebra, in collaboration with George Findlay, and in preparing his book “Lectures on rings and modules” [23], introduced the module operations and . By considering Church’s type theory, Lambek realized that a similar technique could be applied to natural languages, and introduced his syntactic calculus, a logical system with three binary operations (in Lambek’s linguistic notation) ⊗, / (over) and \ (under), satisfying the condition (known as residuation)
(x ⊗ y) → z if and only if x → (z/y) if and only if y → (x\z),
where the arrow stands for logical deduction, but can also be taken as a partial order over types, when talking about an ordered algebraic system, namely a residuated semigroup. Later, Lambek added an identity element to the syntactic calculus and the ordered algebraic system became a residuated monoid. As Lambek mentions [45], p. 94:
After having realized the possible application to natural languages, I rushed to the library and found the article by Yehoshua Bar-Hillel in the journal Language [5]. He had essentially developed the same arithmetical notation, except that he only adopted my symbol for under in a later publication.
As might have been expected, he turned out to be the referee of my 1958 paper [25], which I had submitted to the American Mathematical Monthly. He had two objections: 1. my typist had written “categorical” instead of categorial, 2. the system I advocated had no obvious decision procedure. I did not confess that it was I, not the typist, who was responsible for (1), but I was able to rectify (2) by turning to Gentzen’s sequent
calculus, which I had learned while teaching a logic course using Kleene’s (1952) book. It turned out that, for my syntactic calculus, Gentzen’s three “structural rules”, interchange, contraction and weakening, were not needed. In fact, the proof of the crucial cut elimination theorem was easier in the absence of these rules.
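To make the residuation condition stated earlier concrete, here is a small worked instance. The basic types n (name) and s (sentence) and the sample words are illustrative assumptions, not taken from the text: assign “John” the type n and “works” the type n\s. Starting from the trivially valid deductions on the left, residuation yields the familiar application laws on the right, so that “John works” receives type s.

```latex
% Illustrative assumption: n = name, s = sentence; "John" : n, "works" : n \backslash s.
\begin{align*}
  n \backslash s \to n \backslash s
    &\iff n \otimes (n \backslash s) \to s
    && \text{(residuation with } x = n,\ y = n \backslash s,\ z = s\text{)} \\
  s / n \to s / n
    &\iff (s / n) \otimes n \to s
    && \text{(residuation with } x = s / n,\ y = n,\ z = s\text{)}
\end{align*}
```

Both lines use only the equivalences (x ⊗ y) → z iff x → (z/y) iff y → (x\z); no structural rules are involved.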
A couple of years later, Lambek was invited to a symposium of the American Mathematical Society on the applications of mathematics to linguistics, which was organized by Roman Jakobson [26]. There he discussed a non-associative version of the syntactic calculus and pointed out that Bourbaki’s introduction of the tensor product was essentially a categorical version of a Gentzen style introduction rule. This idea later gave rise to his notion of a “multicategory” [22], mentioned earlier. At the symposium Lambek learned of a parallel development by H. B. Curry, who had similarly explored a positive intuitionistic propositional logic for types in linguistics, that later gave rise to Montague semantics, in view of the so-called Curry-Howard isomorphism between proofs in intuitionistic logic and the lambda terms of Church.
In the period around 1980 there appeared a revival of interest in categorial grammars in general and in Lambek’s syntactic calculus in particular, stimulated in the first place by the work of W. Buszkowski in Poland (cf. [8], [9]) and J. van Benthem in the Netherlands (cf. [53]). In 1985 a conference was held in Tucson, Arizona, in which a group of linguists and logicians, including E. Bach, J. van Benthem, W. Buszkowski, D. Dowty, E. Keenan, M. Moortgat, and M. Steedman, met to discuss the developments of categorial grammars; Lambek was present and contributed a paper about the connections between his syntactic calculus and category theory [29]. In the following years a number of theoretical questions concerning the syntactic calculus were answered by scholars like W. Buszkowski, M. Pentus and others. For example, grammars based on his syntactic calculus were shown by Pentus (1997) to generate exactly the context-free languages, answering a conjecture of Chomsky. The connection to Montague semantics was pointed out by J. van Benthem and exploited in textbooks by M. Moortgat [49] and G. Morrill [50].
A new stream of ideas flooded the scene with Girard’s linear logic [14].
This differed from the syntactic calculus in generalizing classical rather than intuitionistic logic, and in retaining (in a more subtle form) Gentzen’s structural rules. It led Abrusci [1], [2] and Lambek [35] to study a classical version of the syntactic calculus, a noncommutative version of Girard’s linear logic, which Lambek named “classical bilinear logic”. In 1997, after a lecture given by Claudia Casadio at McGill University in Montréal in which she showed applications of classical bilinear logic to linguistics, Lambek realized that the distinction between the tensor product and its De Morgan dual (par) introduced an unwanted complication. Identifying these two binary operations one obtains a simpler system, which he called “compact bilinear logic”. Lambek ([45], p. 95) noticed that the word compact had been introduced by Max Kelly in a categorical context and was used in this sense by Michael Barr in his “star-autonomous” categories, essentially a categorical version of multiplicative linear logic.
The calculus of pregroups is the algebraic version of compact bilinear logic. For excellent introductions, see Lambek’s article in the Appendix to this volume, as well as [44] and the article by Mehrnoosh Sadrzadeh in this volume. The calculus of pregroups is based on the categorial tradition, as well as being a natural generalization of partially ordered groups.12 In retrospect, this approach was already implicit in the work of Zellig Harris [12], Chomsky’s teacher. In a free pregroup one starts with a partially ordered set of basic types $a, a_1, \ldots, a_n$; from each basic type one can form simple types with left and right adjoints: $a^{\ell}$ and $a^{r}$. Some modern languages require also types with double adjoints: $a^{\ell\ell}$ and $a^{rr}$. In general a type is a finite string of simple types. The only rules needed for linguistic calculations are the contraction rules $a^{\ell} a \to 1$ and $a\, a^{r} \to 1$. Here, the arrow → is used to denote the partial order of basic types, which easily extends inductively to all types. The underlinks in the pregroup analysis of a sentence may be viewed as degenerate instances of what linear logicians call proof nets, and represent the way the type of a phrase or a sentence is obtained starting from the types of its constituents. From a linguist’s point of view, they can be considered as representing the “deep structure” of the context under analysis. The final dash after a string of types represents what Lambek calls “a Chomskyan trace” (see also the Sadrzadeh article in this volume). It turns out that double adjoints occur wherever modern European languages would require traces (see e.g. [46]). Double adjoints are also useful for typing clitic pronouns in Romance languages (see, e.g. [4]).
Lambek’s algebraic approach to grammar via free pregroups was presented at several conferences and in a number of papers written in cooperation with C. Casadio, A. Preller, D. Bargelli and others. Much theoretical work has been done by W.
Buszkowski, who proved cut-elimination for compact bilinear logic and showed that pregroup grammars are context-free. Since 1998, pregroup grammars have been applied to fragments of a number of languages: English, French, German, Italian, Turkish, Arabic, Polish, Latin and Japanese. Finally, we should mention that Jim Lambek was a huge fan of Inspector Morse, a British detective drama television series based on novels by Colin Dexter [10]. Indeed, his passion for Morse’s insistence on using whom for the object pronoun led to him writing a pregroup-based article on the subject [46], where he states: My authority for saying whom rather than who is the late Inspector Morse (see Dexter, 1994), who kept on reminding his sergeant: “whom, Lewis, whom”. However, not only Sergeant Lewis, but even Noam Chomsky and Steven Pinker accept who as the natural usage for the object pronoun. Pinker (1994: p. 116) asserts: “In the U.S. whom is used consistently only by careful writers and pretentious speakers.” I apologize for being a pretentious speaker; but, English being my second language, the object pronoun whom comes to me more naturally than who.
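The contraction rules for pregroups described earlier can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the encoding of a simple type as a pair (basic type, adjoint order), the function name `reduce_types`, and the sample typing of “John likes Mary” (with a transitive verb typed as n^r s n^ℓ) are hypothetical choices made here, not taken from the text. The sketch ignores the partial order on basic types and uses a greedy left-to-right strategy, which is enough for simple examples but is not a complete pregroup parser.

```python
from typing import List, Tuple

# A simple type is (basic type, adjoint order):
#   -1 = left adjoint, 0 = the basic type itself, +1 = right adjoint,
#   -2 / +2 = double adjoints, and so on (an encoding assumed for this sketch).
SimpleType = Tuple[str, int]


def reduce_types(types: List[SimpleType]) -> List[SimpleType]:
    """Greedily apply contractions to adjacent pairs, left to right.

    With the encoding above, both contraction rules (left adjoint followed
    by the type, and the type followed by its right adjoint) are instances
    of one pattern: (a, k) immediately followed by (a, k + 1) cancels.
    """
    stack: List[SimpleType] = []
    for base, order in types:
        if stack and stack[-1] == (base, order - 1):
            stack.pop()  # contraction: the adjacent pair reduces to 1
        else:
            stack.append((base, order))
    return stack


# Hypothetical lexicon: n = noun phrase, s = sentence.
john = [("n", 0)]
likes = [("n", 1), ("s", 0), ("n", -1)]  # n^r s n^l
mary = [("n", 0)]

sentence = john + likes + mary
print(reduce_types(sentence))  # [('s', 0)]
```

The two pops performed on this input correspond to the two underlinks one would draw beneath the sentence: one joining “John” to the n^r of the verb, the other joining the n^ℓ of the verb to “Mary”.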
12 Lambek also pointed out that the natural category-theoretic generalization of these ideas leads to a notion of bicategory (due to Bénabou) which he called a compact bicategory.
III. The papers in this volume
We shall now introduce each of the papers in this volume, including some background to each paper.
• Lambek’s Syntactic Calculus and Noncommutative Variants of Linear Logic: Laws and Proof-Nets by Michele Abrusci and Claudia Casadio
The Calculus of Syntactic Types was introduced by Lambek in 1958 [25]. Also known as the Lambek Calculus (LC), it represents a milestone in the field of mathematical linguistics, with particular reference to categorial grammars. After Girard’s discovery of Linear Logic in 1986, it was shown that LC corresponds to the noncommutative fragment of multiplicative intuitionistic linear logic (MILL). Specifically LC is a noncommutative fragment of Linear Logic without additive operations: the calculus presented in [25] and in general use in linguistic analysis contains neither additive constants nor additive connectives. LC is an intuitionistic calculus without structural rules, in the sense that there is asymmetry in the sequents (only one conclusion, whereas the number of hypotheses may be arbitrary). However, classical systems connected to LC have also been studied: Noncommutative Multiplicative Linear Logic (NMLL) and Cyclic Multiplicative Linear Logic (CyMLL)13 in which the sequents are classical (i.e., the number of conclusions as well as hypotheses may be arbitrary). Moreover, what appears in the left side of sequents (in the space of the hypotheses) is just the (de Morgan) dual of what appears in the right side. In that case, it suffices (à la Girard) to consider just one-sided sequents (usually, the right side). This paper by Michele Abrusci and Claudia Casadio is devoted to the relationship between Lambek’s Syntactic Calculus (LC) and Cyclic Multiplicative Linear Logic (CyMLL).
In particular, the paper proposes a geometrical representation of a set of well-known laws of LC (Residuation, Monotonicity, Application, Expansion, Type-raising, Composition, Geach and Switching laws) by means of cyclic multiplicative proof-nets (CyM-PN). The definitions and detailed representations of the CyM-PNs for each of the mentioned laws offer a better understanding of their internal structure, allowing for their classification into three main families of graphs: cyclic multiplicative proof-nets with one, two, and three axiom links, respectively. Owing to this geometrical representation, some new laws of LC not yet considered in the literature are obtained, on the basis of which some possible linguistic applications and examples are presented.
13 Both systems also have additive counterparts: Noncommutative Multiplicative-Additive Linear Logic (NMALL) and Cyclic Multiplicative-Additive Linear Logic (CyMALL). The various noncommutative systems mentioned have been studied in several articles by Abrusci.
• Sheaf Representations and Duality in Logic by Steve Awodey
In the early 1970s, elementary topos theory was being developed by F. W. Lawvere and M. Tierney. This project was itself influenced by the profound work of A. Grothendieck and the French school of Algebraic Geometry in the 1960s. A theorem of fundamental importance, published by Grothendieck (1960), states:
Every commutative ring is isomorphic to the ring of continuous global sections of a sheaf of local rings.
Lambek was well aware of the area: in the early 1970s he published articles on sheaf representation in module theory. But in 1981 he realized there was a corresponding theorem for elementary toposes, and (with I. Moerdijk) published two sheaf representation theorems [28]. Interestingly, this led to new insights into Henkin’s Completeness Theorem for Higher Order Logic (type theory) [28], which was developed further in Lambek-Scott [43] and in later papers by Lambek. The subject was then greatly developed by Steve Awodey and his students and collaborators. In this article, Steve Awodey gives his perspective as a key contributor to the area. He also elegantly introduces the point of view, often noted by Lambek, of connections of this work with the grand themes of Duality Theory (e.g., Gelfand Duality) in 20th-century mathematics.
• On the Naturalness of Mal’tsev Categories by Dominique Bourn, Marino Gran and Pierre-Alain Jacqmin
As previously noted, Lambek had a long-time interest in applications of the calculus of relations in a wide range of areas from algebra and category theory to anthropology and linguistics. In particular, he applied relation-theoretic methods to group theory and homological algebra (e.g., proofs of the “Butterfly” and “Snake” Lemmas; see his amusingly titled article [40]) as well as to equational (diagrammatic) reasoning in various algebraic categories, e.g., Mal’tsev varieties in universal algebra and variants of Abelian categories [41]. For example, based on some of this earlier work, Mal’tsev categories were introduced by Carboni, Lambek, and Pedicchio as a weakening of the notion of Abelian category into one which is closed under many more categorical constructions, is appropriate for carrying out basic homological algebra, and for which the calculus of relations provides suitable machinery for diagrammatic reasoning.
In this article, Bourn, Gran, and Jacqmin survey the history and current research on Mal’tsev categories and discuss the many reasons why such categories continue to play a fundamental role in contemporary research.
• Extensions of Lambek Calculi by Wojciech Buszkowski
In this article, the author gives a comprehensive survey of Lambek calculi (associative and nonassociative) as formal logics, both substructural and as categorial grammars. As Buszkowski notes, the aim of this paper is to emphasize the role of the Lambek calculus in the world of nonclassical logics. He begins with a survey of the semantics of the kinds of logics dealt with: variations of residuated lattice models, relation algebra models, and phase semantics models (from linear logic). The role of posetal Galois connections is highlighted, along with the distinction between classical and intuitionistic semantics. The author then surveys sequent calculus presentations of Lambek’s calculi L and NL as well as discussing cut-elimination, completeness, interpolation
theorems, decidability and complexity issues. He also provides an insightful presentation of the associated categorial grammars. In the next section, the paper surveys different substructural intuitionistic-style logics. In a wide-ranging discussion, the author gives many substructural and alternative logics that can be presented as extensions of full Lambek logics, e.g., Łukasiewicz logic and Hájek’s basic fuzzy logic. The author discusses fundamental properties such as the finite model property, complexity and decidability issues, cut-elimination, and algebraic semantics. He ends this discussion with a description of some related multi-modal logics. The last sections focus on presenting sequent formulations of various classical linear logics, expanding on the previous sections. The author considers logics without exponentials with emphasis on noncommutative logics, including Lambek’s compact bilinear logic.
• Categories with Families: Unityped, Simply Typed, and Dependently Typed by Simon Castellan, Pierre Clairambault, and Peter Dybjer
The book by Lambek and Scott [43] (Part I) expanded on earlier writings of Lambek by proving a strong categorical equivalence between cartesian closed categories and typed lambda calculi. One omission (discussed in the Introduction of [43]) was, regrettably, a treatment of the then-novel work of Robert Seely on connections of Martin-Löf’s dependent type theories with locally cartesian closed categories. As it turned out, this was rather more subtle than originally thought. Let us quote the above authors (references are from their paper):
Seely’s seminal paper [35] claims to prove that a category of Martin-Löf type theories is equivalent to a category of locally cartesian closed categories (lcccs). However, his result relies on an interpretation of substitution as pullback, and the latter are only defined up to isomorphism.
It is not clear how to choose pullbacks in such a way that the strict laws for substitution are satisfied. This coherence problem is identified and solved by Curien [16] and Hofmann [21], who provide alternative methods for interpreting Martin-Löf type theory in lcccs (see also [15]). By using Hofmann’s interpretation Clairambault and Dybjer [11, 12] show that there is an actual biequivalence of 2-categories. In this paper we ask ourselves what it would take to add the missing chapter on Martin-Löf type theory and its correspondence with lcccs to the book by Lambek and Scott.
The authors give a uniform, modern treatment of the area based upon the increasingly important notion of categories with families. In their wide-ranging discussion, they include various higher biequivalence theorems between type theories and (for example) cartesian operads, Lawvere theories, and cartesian and locally cartesian closed categories. Such notions have had an increasing appearance in many recent studies of dependent type theories, including Voevodsky’s Homotopy Type Theory. Their paper ends with an important discussion, proving the undecidability of the word problem for the bi-initial (in a 2-categorical sense) locally cartesian closed category on one base type.
• The Mathematics of Text Structure by Bob Coecke
The contribution by Bob Coecke presents new results stemming from several areas of research in the field of quantum logic and computation, especially developed at the Department of Computer Science, University of Oxford. In previous work the author developed a formal system, referred to as DisCoCat (Categorical Compositional Distributional – in reverse order), for analyzing how words interact in a sentence in order to produce the meaning of that sentence. To this end, the structural match between grammar and categories of meaning spaces is exploited. In the present work, the system DisCoCat of the basic interactive syntax and semantics at the sentence level is extended to the new system DisCoCirc, in which the question of textual meaning is addressed, specifically the way in which sentences interact in texts in order to produce textual meaning. While in DisCoCat all meanings are fixed as states, in DisCoCirc word meanings correspond to a type, or system, and the states of this system can evolve. Sentences are gates within a circuit which update the variable meanings of those words. As in DisCoCat, word meanings can live in a variety of spaces, for example, propositional, vectorial, or cognitive. The compositional structure is given by string diagrams representing information flows, and a text yields a single string diagram in which word meanings lift to the meaning of the entire text. This project is ambitious since textual linguistics is a rather wide field and several questions are crucial, e.g., coordination, dependencies, inter-sentential and long-distance anaphora. Lambek’s work in category theory and mathematical linguistics is behind the development of this paper, in particular his introduction of the pregroup calculus based on what he called compact bilinear logic (cf. the papers by Wojciech Buszkowski and Mehrnoosh Sadrzadeh in this volume).
An interesting aspect of Coecke’s paper is the aim of extending a mathematical foundation for sentence-meaning composition – in which word meanings interact to generate sentence meanings – to text-meaning composition, in which sentence meanings interact to generate possible text meanings. Of particular interest is the dynamic interactive analysis of text, as nicely expressed by the author (page 199): “Text is a process that alters our understanding of words.” That is, the idea of text interpretation as a process in progress, not determined in advance. As mentioned by the author, while the developments in this paper are independent of a physical embodiment (cf. classical vs. quantum computing), both the compositional formalism and the model of meaning are highly quantum inspired, allowing for implementation on a quantum computer.
• Aspects of Categorical Recursion Theory by Pieter Hofstra and Philip Scott
Jim Lambek – in his writings and public presentations – had a long-time interest in the foundations and history of computability theory. In this paper the authors examine three particular questions that occupied Lambek for many years.
1. Are there natural recursion theories?
2. What are the computable functions and functionals in various concrete categorical structures?
3. Are there intrinsic algebraic/categorical approaches to recursion theory?
Regarding questions 1 and 2, Lambek often asserted that “natural” theories of computable numerical functions (both simple and higher-type) should correspond to numerical functions (or functionals) definable in various familiar free categories. Such categories should include the free monoidal, cartesian and cartesian closed categories, free topos, etc. (all with a natural numbers object N), generated by the empty graph. This is the viewpoint of Part III of the Lambek-Scott book [43], where standard logicians’ techniques (applied to the internal logics of such categories) are employed to classify such functions. In the first part of this paper, the authors survey the background and literature of Lambek’s ideas in computability theory along the lines of [43]. Concerning question 3, the authors discuss several categorical approaches to abstract computability arising in the recent category theory literature. This includes such notions as computation by normalization, realizability toposes, Turing categories, a co-algebraic approach to computable partial functions, and categorical approaches to complexity theory.
• Morphisms of Rings by Robert Paré
In this article, Robert Paré discusses the seemingly naive question: what is a morphism of rings? As Paré notes, this is a question that Lambek himself, of course, had thought carefully about and there are two obvious answers: the ordinary notion of ring homomorphism, and the notion of bimodules (here, an S-R-bimodule ${}_S M_R$ is considered as a kind of morphism $M : R \to S$). Paré discusses the questions: (i) are there other kinds of morphisms and (ii) what kind of structure do these morphisms enjoy? The answers arise from a surprising application of the notion of double categories (Ehresmann 1965), structures which have seen increasing interest in recent years in both mathematics and theoretical computer science.
Paré shows that a natural class of morphisms that arise in this way are the amplimorphisms, which were also discovered independently by mathematical physicists in quantum field theory.
• Pomset Logic: The other approach to noncommutativity in logic by Christian Retoré
In this paper, Christian Retoré gives an up-to-date and detailed analysis of his own approach to noncommutativity in logic called pomset logic, a variant of classical linear logic motivated by Girard’s coherence space semantics for linear logic. Pomset logic includes an operator $A \triangleleft B$, called before, which is associative and self-dual: $(A \triangleleft B)^{\perp} = A^{\perp} \triangleleft B^{\perp}$. The conclusion of a pomset logic proof is a Partially Ordered Multiset of formulas. Pomset logic is defined by a refined notion of proof-nets and has a denotational semantics in the category of coherence spaces. Retoré describes the fine details of his approach versus Lambek’s. Although both calculi are noncommutative versions of multiplicative linear logic, Lambek’s calculus is basically a noncommutative restriction of intuitionistic linear logic
whereas pomset logic is basically a noncommutative extension of classical linear logic. The author presents new results on proof nets for this calculus, based on the technology of directed cographs (dicographs), along with work of Alessio Guglielmi and Lutz Straßburger on connecting the theory with Guglielmi’s deep inference. These ideas provide an important impetus for many of the newest developments recorded here. The author discusses sequentialisation and “the quest” for a complete sequent calculus for his system, and provides a surprising example (in joint work with Lutz Straßburger) of a proof net that does not derive from any simple sequent calculus. As the author mentions, recent work of Sergey Slavnov (2019) for the first time gives a kind of sequent calculus which is complete with respect to pomset proof nets (but at the cost of using rather complicated decorated sequents). In another point of contact with Lambek, Retoré finishes with applications of pomset logic to linguistics. He shows how one can design grammars by associating words with partial proof nets of pomset logic, and their relationship with Lambek grammars in the manner of Abrusci. As Retoré points out, Lambek was the external reader of his habilitation, and took a keen interest in the developments reported here. The author reports interesting observations that Lambek made on these notions.
• Pregroup Grammars, Their Syntax and Semantics by Mehrnoosh Sadrzadeh
The author studies the semantics of pregroup grammars and surveys recent advances in vector space modelling in natural language processing. Sadrzadeh begins with an introduction to pregroups and applications to natural language, where she illustrates Lambek’s modelling of complex sentences, yes-no questions and Wh-questions. The latter notions are developed in some detail, in particular exemplifying iterated adjoint types and Chomsky traces.
Then the author illustrates one of the main issues the paper addresses: what Lambek called the ambiguity (or noncompositionality) of the naive set-theoretic modelling of pregroup expressions. For example, the type $x \cdot y \cdot z^{\ell}$ can have two different set-theoretic meanings $[\![-]\!]$ depending upon how it is bracketed; that is, although $x \cdot y \cdot z^{\ell} = (x \cdot y) \cdot z^{\ell} = x \cdot (y \cdot z^{\ell})$, unfortunately $[\![(x \cdot y) \cdot z^{\ell}]\!] \neq [\![x \cdot (y \cdot z^{\ell})]\!]$. The author follows a suggestion of Lambek and discusses finite dimensional vector space semantics for pregroups, in which the adjoint types are interpreted as dual spaces. The question is how to interpret the meaning of $x \cdot y$. Interpreting this as the Cartesian product of meanings (which in finite dimensional spaces is isomorphic to a direct sum) turns out to yield ambiguity problems similar to those of the Set case. But if dimensionality explosion is tolerated and direct sum is replaced by tensor product, the problem gets resolved. On the practical side (for example, “data-driven vector space semantics”), the author surveys recent results. Sadrzadeh builds semantic vector representations for some exemplary words, phrases, and sentences of language and shows how compositionality of vector semantics disambiguates meaning. Finally, the paper presents a vector semantics and analysis of questions and demonstrates how their representations relate to the sentences they are asked about. Future directions in the field are also discussed.
• The Sequent Calculus of Skew Monoidal Categories by Tarmo Uustalu, Niccolò Veltri, and Noam Zeilberger
The last paper in the volume presents the proof theory of skew monoidal categories as an instance of a Lambek-style analysis of proofs and solves the coherence problem (the decision problem for equality). Skew monoidal categories are a variation of monoidal categories in which the unit and associativity transformations are no longer required to be natural isomorphisms but merely natural transformations in a certain direction. They first arose in classification theory of Hopf algebras (Szlachányi 2012), quantum categories, and related areas. The authors construct free skew monoidal categories using an appropriate Gentzen sequent calculus and analyze the decision (word) problem. The techniques involve use of machinery inspired by linear logic proof search (introduced by Andreoli), based on the technique of focusing. This involves having a distinguished stoup position in the antecedents of sequents, and adaptation of sequent calculus rules of inference to accommodate stoups. They prove the resultant calculus is sound and complete with respect to the existence of maps in the free skew monoidal category. By setting up an appropriate equivalence relation on proofs and associated rewriting machinery, the authors are able to pick canonical representatives of each equivalence class of proofs, thus solving the coherence problem. But the paper points out a subtlety: unlike in ordinary coherence problems, Mac Lane’s slogan “all diagrams commute” is no longer true in the skew monoidal case. It is possible to have more than one map between a pair of objects in the free skew monoidal category on a set of generators (even on one generator). The theory of skew monoidal categories has been studied in an extensive series of papers by Bourke, Lack, Street and collaborators since 2014.
The authors of this paper compare their work with recent papers of Bourke and Lack, as well as discuss the sense in which Lambek’s language of multicategories provides a better understanding of the proof-theoretic analysis given here. Finally, the authors have formalized this development in the dependently typed programming language Agda.
References
1. Abrusci, V. M. Non-Commutative Intuitionistic Linear Logic. Zeitschr. f. Math. Logik und Grundlagen d. Math. 36 (1990), 11–15.
2. Abrusci, V. M. Phase semantics and sequent calculus for pure noncommutative classical linear propositional logic. The Journal of Symbolic Logic 56, 4 (1991), 1403–1451.
3. Ajdukiewicz, K. Die syntaktische Konnexität. Studia Philosophica 1 (1935), 1–27. Eng. trans., Syntactic connexion. In Polish Logic, S. McCall (ed.), Oxford: Clarendon Press (1967).
4. Bargelli, D. and Lambek, J. An algebraic approach to French sentence structure. In P. de Groote et al. (eds.), Logical aspects of computational linguistics. Springer LNAI 2099 (2001), 62–78.
5. Bar-Hillel, Y. A quasi-arithmetical notation for syntactic description. Language 29 (1953), 47–58. In Language and Information: Selected Essays on Their Theory and Application, Bar-Hillel, Y. (ed.), Addison-Wesley (1964), 61–74.
6. Bhargava, M. and Lambek, J. A Production Grammar for Hindi Kinship Terminology. Theoretical Linguistics 10, no. 2/3 (1983), 227–245.
7. Bhargava, M. and Lambek, J. A rewrite system of the Western Pacific: Lounsbury’s Analysis of Trobriand Kinship Terminology. Theoretical Linguistics 21, no. 2/3 (1995), 241–253.
8. Buszkowski, W. Lambek’s categorial grammars. PhD thesis, Mathematical Institute, Adam Mickiewicz University, Poznan (1982).
9. Buszkowski, W. Completeness results for Lambek syntactic calculus. Zeitschr. f. Math. Logik und Grundlagen d. Math. 32 (1986), 13–28.
10. Dexter, C. The Second Inspector Morse Omnibus. London: Pan Books (1994).
11. Findlay, G. D. and Lambek, J. Calculus of Bimodules. McGill University, manuscript (1955).
12. Harris, Z. S. A cyclic cancellation-automaton for sentence wellformedness. International Computation Centre Bulletin 5 (1966), 69–94.
13. Fine, N. J., Gillman, L., and Lambek, J. Rings of Quotients of Rings of Functions. McGill University Press (1965).
14. Girard, J.-Y. Linear logic. Theoretical Computer Science 50 (1987), 1–102.
15. Hermida, C. Representable Multicategories. Advances in Mathematics 151 (2000), 164–225.
16. Lambek, J. A Non-Distributive Calculus of Numerical Functions. MSc thesis, McGill University (1946). Available from McGill Archives. https://escholarship.mcgill.ca/concern/theses/h989r630z?locale=en
17. Lambek, J. Biquaternion Vector Fields over Minkowski Space. PhD thesis, McGill University (1950). Available from McGill Archives. https://escholarship.mcgill.ca/concern/theses/0c483p326?locale=en
18. Lambek, J. The Immersibility of a semigroup into a group. PhD thesis, McGill University (1950). Available from McGill Archives. https://escholarship.mcgill.ca/concern/theses/0c483p326?locale=en
19. Lambek, J. The Immersibility of a semigroup into a group. Can. J. Mathematics 3 (1951), 34–43.
20. Lambek, J. How to program an infinite abacus. Canadian Mathematical Bulletin 4, 3 (1961), 295–302.
21. Lambek, J. Deductive Systems and Categories I. Journal of Mathematical Systems Theory 2 (1968), 287–318.
22. Lambek, J. Deductive Systems and Categories II. Standard Constructions and Closed Categories. Springer LNM 86 (1969), 76–122.
23. Lambek, J. Lectures on Rings and Modules, 3rd edition. New York: Chelsea Publishing Co. (1986).
24. Lambek, J. Completions of Categories. Springer LNM 24 (1966), Springer-Verlag.
25. Lambek, J. The mathematics of sentence structure. Am. Math Monthly 65 (1958), 154–170.
26. Lambek, J. On the calculus of syntactic types. In Jakobson, R. (ed.), Structure of Language and Its Mathematical Aspects. Proc. of Symposium in Applied Mathematics, AMS (1961), 166–178.
27. Lambek, J. A fixpoint theorem for complete categories. Math. Zeitschr. 103 (1968), 151–161.
28. Lambek, J. and Moerdijk, I. Two sheaf representations of elementary toposes. In The L. E. J. Brouwer centenary symposium, A. S. Troelstra and D. van Dalen (eds.), North-Holland (1982), 275–295.
29. Lambek, J. Categorial and Categorical Grammars. In R.T. Oehrle, E. Bach, and D. Wheeler (eds.), Categorial Grammars and Natural Languages Structures, Reidel, Dordrecht (1988), 297–317.
30. Lambek, J. Multicategories Revisited. In Categories in Computer Science and Logic, Contemp. Math. 92 (1989), AMS, 217–239.
31. Lambek, J. What is a Deductive System? In What is a Logical System, D. M. Gabbay (ed.), Oxford Studies in Logic and Computation 4, Oxford (1994), 141–159.
32. Lambek, J. If Hamilton Had Prevailed. The Mathematical Intelligencer 17, no. 4 (1995), 7–15.
33. Lambek, J. A production grammar for English kinship terminology. Theoretical Linguistics, vol. 13 (1986), 19–36.
34. Lambek, J. Fixpoints revisited. In Logic at Botik ’89, Springer LNCS 363 (1989), 200–207.
35. Lambek, J. From Categorial Grammar to Bilinear Logic. In Substructural Logics, P. Schroeder-Heister and K. Došen (eds.), Oxford (1993), 207–237.
36. Lambek, J. Bilinear logic in algebra and linguistics. In Advances in Linear Logic, J-Y Girard, Y. Lafont, and L. Regnier (eds.), LMS Lecture Note series 222 (1995), Cambridge, 43–59.
37. Lambek, J. Relations in operational categories. J. Pure & Applied Algebra 116, 1-3 (1997), 221–248.
38. Lambek, J. Diagram chasing in ordered categories with involution. J. Pure & Applied Algebra 143, 1-3 (1999), 293–307.
39. Lambek, J. Relations old and new. In Relational Methods for Computer Science Applications, E. Orłowska, A. Szałas (eds.), Studies in Fuzziness and Soft Computing, vol. 65, Springer-Verlag (2001), 135–147.
40. Lambek, J. The Butterfly and the Serpent. In Logic and Algebra, A. Ursini (ed.), Chap. 7. Routledge (1996).
41. Lambek, J. On the ubiquity of Mal’cev operations. In Proc. International Conference on Algebra: Dedicated to A. I. Mal’cev, Contemp. Math, AMS, vol. 131 (1992) part 1, 135–146.
42. Lambek, J. and Scott, P. An exactification of the monoid of primitive recursive functions. Studia Logica 81 (2005), 1–18.
43. Lambek, J. and Scott, P. J. Introduction to higher order categorical logic. Cambridge University Press (1986).
44. Lambek, J. What are pregroups? In Language and Grammar, C. Casadio, P.J. Scott, R. A. G. Seely (eds.), CSLI Publications (2005), 129–136.
45. Lambek, J. From Word to Sentence. Polimetrica (2008).
46. Lambek, J. From word to sentence: a pregroup analysis of the object pronoun who(m). Journal of Logic, Language and Information 16 (2007), 303–323.
47. Lambek, J. Memories of a Minor Mathematician. Unpublished autobiographical manuscript, obtained from Bernie Lambek (2018).
48. Mac Lane, S. Categories for the Working Mathematician. Springer (1971).
49. Moortgat, M. Categorial Investigations: Logical and Linguistic Aspects of the Lambek Calculus. Foris (1988).
50. Morrill, G. Type Logical Grammar: Categorial Logic of Signs. Kluwer (1994).
51. Pentus, M. Product-free Lambek calculus and context-free grammars. Journal of Symbolic Logic 62 (1997), 648–660.
52. Pinker, S. The Language Instinct. New York: William Morrow (1994).
53. van Benthem, J. The Lambek Calculus. In R.T. Oehrle, E. Bach, and D. Wheeler (eds.), Categorial Grammars and Natural Languages Structures, Reidel, Dordrecht (1988), 35–68.
Fig. 2 J. Lambek at the University of Warsaw for the Trends in Logic meeting, October, 2003. (Photo courtesy of Aleksandra Kiślak-Malinowska.)
Lambek’s Syntactic Calculus and Noncommutative Variants of Linear Logic: Laws and Proof-Nets
V. Michele Abrusci and Claudia Casadio
Abstract This work is devoted to the relations between Lambek’s Syntactic Calculus (LC) and noncommutative variants of Girard’s Linear Logic; in particular the paper will consider: (i) the geometrical representation of the laws of LC by means of proof-nets; (ii) the discovery - due to such a geometrical representation - of some laws of LC not yet considered; (iii) the discussion of possible linguistic uses of these new laws.
V. Michele Abrusci, University “Roma Tre”, Department of Mathematics and Physics, Largo San Leonardo Murialdo 1, Rome; e-mail: [email protected]
Claudia Casadio, University “G. D’Annunzio” of Chieti and Pescara, Department of Language, Literature and Modern Cultures, Viale Pindaro, 42 - 65127 Pescara; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 C. Casadio and P. J. Scott (eds.), Joachim Lambek: The Interplay of Mathematics, Logic, and Linguistics, Outstanding Contributions to Logic 20, https://doi.org/10.1007/978-3-030-66545-6_1

1 Formulations of Lambek’s Syntactic Calculus in the framework of noncommutative variants of Linear Logic
In this paper, LC will denote Lambek’s original Syntactic Calculus (see [19]), whereas LL will denote Girard’s Linear Logic (see [17]). In this section we consider some formulations of LC in the framework of noncommutative variants of LL; in particular, noncommutative variants of the multiplicative fragments of Classical Linear Logic and Intuitionistic Linear Logic are presented.
The multiplicative fragment of Classical Linear Logic (in short, Multiplicative Linear Logic, MLL) is the fragment of Classical Linear Logic with only the multiplicative connectives (the multiplicative conjunction ⊗ and the multiplicative disjunction ⅋); in MLL linear implication is defined by
A −◦ B = A⊥ ⅋ B
where ( )⊥ is the linear negation. The multiplicative fragment of Intuitionistic Linear Logic (in short, Multiplicative Intuitionistic Linear Logic, MILL) is obtained from MLL by removing the linear negation ( )⊥ (i.e. duality) and the multiplicative disjunction ⅋, and by adding the linear implication as a primitive connective.
Noncommutative variants of Classical Linear Logic and Intuitionistic Linear Logic are obtained by removing the fully general exchange rule, which allows one to disregard the order of the formulas of a sequent (the assumptions and the conclusions of a proof) ([2]):
from ⊢ Γ, A, B, Δ infer ⊢ Γ, B, A, Δ
In this section, we present some results obtained in the research on the relationships between LC and LL: LC may be considered as the multiplicative fragment of the noncommutative variant of Intuitionistic Linear Logic (§ 1.1), and may be formulated in the multiplicative fragment of two noncommutative variants of Classical Linear Logic (§ 1.2).
1.1 Formulation of LC in the multiplicative fragment of Noncommutative Intuitionistic Linear Logic
Noncommutative Intuitionistic Linear Logic, presented and investigated in [1], is the noncommutative variant of Intuitionistic Linear Logic. Let us consider the multiplicative fragment NMILL of Noncommutative Intuitionistic Linear Logic. In NMILL:
– the exchange rule is completely removed;
– the sequents have the form Γ ⊢ A, where Γ is a finite sequence of occurrences of formulas and A is a formula;
– due to the absence of the exchange rule, the multiplicative conjunction ⊗ becomes a noncommutative conjunction, and there are two linear implications:
  – the linear post-implication, or left implication (i.e. implication from the left), denoted by −◦, such that the following sequent is provable: A, A −◦ B ⊢ B
  – the linear retro-implication, or right implication (i.e. implication from the right), denoted by ◦−, such that the following sequent is provable: B ◦− A, A ⊢ B
Lambek’s Syntactic Calculus (LC) is exactly NMILL, as shown in [1], by means of the following translation:
– the product or residuated operation · of LC is translated into the multiplicative conjunction ⊗ of NMILL;
– the left residual operation \ of LC is translated into the linear post-implication −◦ of NMILL;
– the right residual operation / of LC is translated into the linear retro-implication ◦− of NMILL;
– the partial ordering ≤ of LC is translated into the derivability relation ⊢ of NMILL; therefore the sequents of LC are translated into sequents of NMILL.
Under this translation of LC into NMILL,
– the formulas of LC are exactly the formulas of NMILL;
– the sequents of LC are exactly the sequents of NMILL;
– each valid sequent of LC is a provable sequent of NMILL, and vice versa;
– each valid rule between sequents of LC is a derivable rule of NMILL, and vice versa.
Therefore, we may consider LC exactly as NMILL (the multiplicative fragment of Noncommutative Intuitionistic Linear Logic). In particular, let us consider the following basic laws of LC (cf. [10], pp. 17-19), represented in an algebraic style:
1. Residuation laws
   a. (RES) a · b ≤ c iff b ≤ a\c iff a ≤ c/b
2. Monotonicity laws¹
   a. (MON1.1) if a ≤ b then a · c ≤ b · c; (MON1.2) if a ≤ b then c · a ≤ c · b
   b. (MON2.1) if a ≤ b then c\a ≤ c\b; (MON2.2) if a ≤ b then b\c ≤ a\c
   c. (MON3.1) if a ≤ b then a/c ≤ b/c; (MON3.2) if a ≤ b then c/b ≤ c/a
   ¹ Monotonicity can also be introduced by rules with two premises; however, following [10], we prefer the present formulation.
3. Application laws
   a. (APP1) a · (a\b) ≤ b
   b. (APP2) (b/a) · a ≤ b
4. Expansion laws
   a. (EXP1) a ≤ b\(b · a)
   b. (EXP2) a ≤ (a · b)/b
5. Type-raising laws
   a. (TYR1) a ≤ (b/a)\b
   b. (TYR2) a ≤ b/(a\b)
6. Composition laws
   a. (COM1) (a\b) · (b\c) ≤ (a\c)
   b. (COM2) (a/b) · (b/c) ≤ (a/c)
7. Geach laws
   a. (GEA1) b\c ≤ (a\b)\(a\c)
   b. (GEA2) a/b ≤ (a/c)/(b/c)
8. Switching laws
   a. (SWI1) (a\b) · c ≤ a\(b · c)
   b. (SWI2) a · (b/c) ≤ (a · b)/c
Under the translation from LC to NMILL, these laws of LC become the following provable sequents and derivable rules of NMILL:
1. Residuation laws
   a. (RES) A ⊗ B ⊢ C iff B ⊢ A −◦ C iff A ⊢ C ◦− B
2. Monotonicity laws
   a. (MON1.1) if A ⊢ B then A ⊗ C ⊢ B ⊗ C; (MON1.2) if A ⊢ B then C ⊗ A ⊢ C ⊗ B
   b. (MON2.1) if A ⊢ B then C −◦ A ⊢ C −◦ B; (MON2.2) if A ⊢ B then B −◦ C ⊢ A −◦ C
   c. (MON3.1) if A ⊢ B then A ◦− C ⊢ B ◦− C; (MON3.2) if A ⊢ B then C ◦− B ⊢ C ◦− A
3. Application laws
   a. (APP1) A ⊗ (A −◦ B) ⊢ B
   b. (APP2) (B ◦− A) ⊗ A ⊢ B
4. Expansion laws
   a. (EXP1) A ⊢ B −◦ (B ⊗ A)
   b. (EXP2) A ⊢ (A ⊗ B) ◦− B
5. Type-raising laws
   a. (TYR1) A ⊢ (B ◦− A) −◦ B
   b. (TYR2) A ⊢ B ◦− (A −◦ B)
6. Composition laws
   a. (COM1) (A −◦ B) ⊗ (B −◦ C) ⊢ A −◦ C
   b. (COM2) (A ◦− B) ⊗ (B ◦− C) ⊢ A ◦− C
7. Geach laws
   a. (GEA1) B −◦ C ⊢ (A −◦ B) −◦ (A −◦ C)
   b. (GEA2) A ◦− B ⊢ (A ◦− C) ◦− (B ◦− C)
8. Switching laws
   a. (SWI1) (A −◦ B) ⊗ C ⊢ A −◦ (B ⊗ C)
   b. (SWI2) A ⊗ (B ◦− C) ⊢ (A ⊗ B) ◦− C
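The algebraic laws listed above can be sanity-checked in a concrete residuated structure. The following is only a sketch under stated assumptions, not a construction from the chapter: types are interpreted as binary relations on a hypothetical three-element set, the product · is relational composition (which is noncommutative), ≤ is inclusion, and the function names `prod`, `under`, `over` and `failed_laws` are illustrative.

```python
import itertools
import random

X = range(3)                          # hypothetical 3-element carrier set
PAIRS = list(itertools.product(X, X))

def prod(r, s):
    """a · b, read as relational composition."""
    return frozenset((x, z) for (x, y) in r for (y2, z) in s if y == y2)

def under(r, t):
    """a\\c: the largest relation s such that r · s ≤ t."""
    return frozenset((y, z) for (y, z) in PAIRS
                     if all((x, z) in t for (x, y2) in r if y2 == y))

def over(t, s):
    """c/b: the largest relation r such that r · s ≤ t."""
    return frozenset((x, y) for (x, y) in PAIRS
                     if all((x, z) in t for (y2, z) in s if y2 == y))

def failed_laws(a, b, c):
    """Names of the listed laws of LC that fail for the relations a, b, c."""
    laws = {
        "RES": (prod(a, b) <= c) == (b <= under(a, c)) == (a <= over(c, b)),
        "APP1": prod(a, under(a, b)) <= b,
        "APP2": prod(over(b, a), a) <= b,
        "EXP1": a <= under(b, prod(b, a)),
        "EXP2": a <= over(prod(a, b), b),
        "TYR1": a <= under(over(b, a), b),
        "TYR2": a <= over(b, under(a, b)),
        "COM1": prod(under(a, b), under(b, c)) <= under(a, c),
        "COM2": prod(over(a, b), over(b, c)) <= over(a, c),
        "GEA1": under(b, c) <= under(under(a, b), under(a, c)),
        "GEA2": over(a, b) <= over(over(a, c), over(b, c)),
        "SWI1": prod(under(a, b), c) <= under(a, prod(b, c)),
        "SWI2": prod(a, over(b, c)) <= over(prod(a, b), c),
    }
    return [name for name, holds in laws.items() if not holds]

def random_relation(rng):
    """A random subset of X × X (each pair kept with probability 1/2)."""
    return frozenset(p for p in PAIRS if rng.random() < 0.5)

rng = random.Random(0)
samples = [tuple(random_relation(rng) for _ in range(3)) for _ in range(200)]
print(all(failed_laws(a, b, c) == [] for a, b, c in samples))
```

A passing check on random samples is evidence for soundness in this one model only; it is not a proof of the laws in LC, and the monotonicity rules (which have a premise a ≤ b) are left out of the sketch.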
1.2 Formulation of LC in the multiplicative fragments of Noncommutative Classical Linear Logic and Cyclic Linear Logic
Noncommutative Classical Linear Logic (presented and investigated in [2]) and Cyclic Linear Logic (presented and investigated in [29]) are two different noncommutative variants of Classical Linear Logic. The multiplicative fragments of these noncommutative variants of Classical Linear Logic are conservative extensions of NMILL, i.e. conservative extensions of Lambek’s original Syntactic Calculus LC, as proven in [3, 4].
The multiplicative fragment of Noncommutative Classical Linear Logic (denoted by NMLL) has a sequent calculus with the following features:
– the sequents have the form ⊢ Γ, where Γ is a finite sequence of occurrences of formulas;
– there is no exchange rule;
– both the multiplicative conjunction ⊗ and the multiplicative disjunction ⅋ are noncommutative, due to the absence of the exchange rule;
– there are two linear negations:
  – linear post-negation ( )⊥, such that the sequent ⊢ A⊥, A holds;
  – linear retro-negation ⊥( ), such that the sequent ⊢ A, ⊥A holds;
  and for each formula A, A⊥ and ⊥A are defined as follows:
  – if A is an atom, A⊥ and ⊥A are atoms, and A = ⊥(A⊥) = (⊥A)⊥;
  – (B ⊗ C)⊥ = C⊥ ⅋ B⊥ and ⊥(B ⊗ C) = ⊥C ⅋ ⊥B;
  – (B ⅋ C)⊥ = C⊥ ⊗ B⊥ and ⊥(B ⅋ C) = ⊥C ⊗ ⊥B;
– for each formula A, A = ⊥(A⊥) = (⊥A)⊥;
– the linear post-implication and the linear retro-implication may be defined as follows:
  – linear post-implication −◦ is defined by A −◦ B = A⊥ ⅋ B, so that A⊥ is equivalent to A −◦ ⊥;
  – linear retro-implication ◦− is defined by B ◦− A = B ⅋ ⊥A, so that ⊥A is equivalent to ⊥ ◦− A;
– the cut-elimination theorem holds, when the following additional rules are introduced:
  from ⊢ A, Γ infer ⊢ Γ, ⊥⊥A
  from ⊢ Γ, A infer ⊢ A⊥⊥, Γ
  (these rules allow one to move a formula A from the first place to the last place of a sequent by transforming A into ⊥⊥A, and to move a formula A from the last place to the first place of a sequent by transforming A into A⊥⊥).
The multiplicative fragment of Cyclic Linear Logic (denoted by CyMLL) has a sequent calculus with the following features:
– the sequents have the form ⊢ Γ, where Γ is a finite sequence of formulas;
– the exchange rule is limited to cyclic permutations (move the first formula to the end of the sequence, move the last formula to the beginning of the sequence):
  from ⊢ A, Γ infer ⊢ Γ, A
  from ⊢ Γ, A infer ⊢ A, Γ
  (observe that this feature may also be expressed by saying that sequents are of the form ⊢ Γ, where Γ is a finite cycle of formulas);
– both the multiplicative conjunction ⊗ and the multiplicative disjunction ⅋ are noncommutative, due to the absence of a general exchange rule;
– there is only one linear negation ( )⊥, defined as follows in the multiplicative fragment:
  – if A is an atom, A⊥ is an atom, and A = A⊥⊥,
  – (B ⊗ C)⊥ = C⊥ ⅋ B⊥,
  – (B ⅋ C)⊥ = C⊥ ⊗ B⊥,
  and each formula A is equal to A⊥⊥;
– the linear post-implication and the linear retro-implication may be defined as follows:
  – linear post-implication −◦ is defined by A −◦ B = A⊥ ⅋ B, so that A⊥ is equivalent to A −◦ ⊥,
  – linear retro-implication ◦− is defined by B ◦− A = B ⅋ A⊥, so that A⊥ is equivalent to ⊥ ◦− A;
– the cut-elimination theorem holds.
Some remarks are due concerning the sequents of CyMLL, considered as finite cyclic orders of formulas:
– given a sequent ⊢ A1, ..., An, B, C1, ..., Cm, the linear order of the predecessors of B inside the sequent is An, ..., A1, Cm, ..., C1;
– a proof of a sequent is, for each formula A of the sequent, the derivation of A from the linear negations of the predecessors of A inside the sequent; so a proof of a sequent ⊢ A1, ..., An, B, C1, ..., Cm
corresponds to the derivation of B from the linear order of the premisses (An)⊥, ..., (A1)⊥, (Cm)⊥, ..., (C1)⊥, i.e. a proof of (An)⊥, ..., (A1)⊥, (Cm)⊥, ..., (C1)⊥ ⊢ B.
Lambek’s original Syntactic Calculus (LC), i.e. NMILL, is contained in NMLL and in CyMLL, under the following translations of NMILL into NMLL and into CyMLL.
The translation ψ of the language of NMILL into the language of NMLL is defined as follows:
– if A is an atomic formula of NMILL, ψ(A) = A, since A is also an atomic formula of NMLL;
– ψ(A ⊗ B) = ψ(A) ⊗ ψ(B);
– ψ(A −◦ B) = ψ(A)⊥ ⅋ ψ(B);
– ψ(B ◦− A) = ψ(B) ⅋ ⊥ψ(A);
– ψ(⊢ A) = ⊢ ψ(A), ψ(B ⊢ A) = ⊢ ψ(B)⊥, ψ(A), and in general ψ(A1, ..., An ⊢ A) = ⊢ ψ(An)⊥, ..., ψ(A1)⊥, ψ(A).
Formulas (sequents) of NMLL that are translations of formulas (sequents, resp.) of NMILL will be called L-formulas (L-sequents, resp.) of NMLL. Remark: an L-sequent of NMLL is a sequent where there is exactly one L-formula, whereas the other formulas are linear negations of L-formulas.
The translation φ of the language of NMILL into the language of CyMLL is defined as follows:
– if A is an atomic formula of NMILL, φ(A) = A, since A is also an atomic formula of CyMLL;
– φ(A ⊗ B) = φ(A) ⊗ φ(B);
– φ(A −◦ B) = φ(A)⊥ ⅋ φ(B);
– φ(B ◦− A) = φ(B) ⅋ φ(A)⊥;
– φ(⊢ A) = ⊢ φ(A), φ(B ⊢ A) = ⊢ φ(B)⊥, φ(A), and in general φ(A1, ..., An ⊢ A) = ⊢ φ(An)⊥, ..., φ(A1)⊥, φ(A).
Formulas (sequents) of CyMLL that are translations of formulas (sequents, resp.) of NMILL will be called L-formulas (L-sequents, resp.) of CyMLL. Remark: an L-sequent of CyMLL is a sequent where there is exactly one L-formula, whereas the other formulas are linear negations of L-formulas.
Therefore, the class of L-formulas of CyMLL is the subclass of formulas of CyMLL inductively defined as follows:
– if X is an atom, then X is an L-formula;
– if A and B are L-formulas, then A ⊗ B, A⊥ ⅋ B (i.e. A −◦ B) and B ⅋ A⊥ (i.e. B ◦− A) are L-formulas;
– no other formula of CyMLL is an L-formula of CyMLL.
Moreover, the class of formulas of CyMLL which are linear negations of L-formulas of CyMLL is the subclass of formulas of CyMLL inductively defined as follows:
– if X is an atom, then X⊥ is the linear negation of an L-formula;
– if A and B are L-formulas, then A⊥ ⅋ B⊥ (i.e. (B ⊗ A)⊥), B⊥ ⊗ A (i.e. (A −◦ B)⊥) and B ⊗ A⊥ (i.e. (A ◦− B)⊥) are linear negations of L-formulas;
– no other formula of CyMLL is the linear negation of an L-formula of CyMLL.
These translations satisfy the following properties:
– if a sequent of NMILL is provable in NMILL, then its ψ-translation is provable in NMLL and its φ-translation is provable in CyMLL; i.e. under these translations both NMLL and CyMLL are extensions of NMILL (and so extensions of Lambek’s original Syntactic Calculus LC);
– the ψ-translation of a derivable rule of NMILL is a derivable rule of NMLL, and the φ-translation of a derivable rule of NMILL is a derivable rule of CyMLL;
– every L-sequent of NMLL provable in NMLL is the ψ-translation of a provable sequent of NMILL, and every L-sequent of CyMLL provable in CyMLL is the φ-translation of a provable sequent of NMILL; i.e. under these translations both NMLL and CyMLL are conservative extensions of NMILL (and therefore conservative extensions of LC).
We remark that:
– each sequent A1, ..., An ⊢ B of NMILL, i.e. LC, is translated into a sequent ⊢ φ(An)⊥, ..., φ(A1)⊥, φ(B) of CyMLL;
– each sequent of CyMLL where there is exactly one L-formula and all the other formulas are linear negations of L-formulas is an L-sequent; in particular, the sequent ⊢ Cn⊥, ..., C1⊥, D, when C1, ..., Cn, D are L-formulas, is an L-sequent.
We prefer to use the translation of LC into CyMLL (instead of the translation into NMLL), since CyMLL has a good theory of proof-nets (as we will show in the next section) and can be extended to a general Noncommutative Logic (see [7]) where multiplicative noncommutative connectives live together with multiplicative commutative connectives.
Here, we show how the laws of Lambek’s Syntactic Calculus (LC) presented above are translated as provable sequents or derivable rules of CyMLL.
1. Residuation laws
   a. (RES) ⊢ (B⊥ ⅋ A⊥), C iff ⊢ B⊥, A⊥ ⅋ C iff ⊢ A⊥, C ⅋ B⊥
2. Monotonicity laws
   a. (MON1.1) if ⊢ A⊥, B then ⊢ C⊥ ⅋ A⊥, B ⊗ C; (MON1.2) if ⊢ A⊥, B then ⊢ A⊥ ⅋ C⊥, C ⊗ B
   b. (MON2.1) if ⊢ A⊥, B then ⊢ A⊥ ⊗ C, C⊥ ⅋ B; (MON2.2) if ⊢ A⊥, B then ⊢ C⊥ ⊗ B, A⊥ ⅋ C
   c. (MON3.1) if ⊢ A⊥, B then ⊢ C ⊗ A⊥, B ⅋ C⊥; (MON3.2) if ⊢ A⊥, B then ⊢ B ⊗ C⊥, C ⅋ A⊥
3. Application laws
   a. (APP1) ⊢ (B⊥ ⊗ A) ⅋ A⊥, B
   b. (APP2) ⊢ A⊥ ⅋ (A ⊗ B⊥), B
4. Expansion laws
   a. (EXP1) ⊢ A⊥, B⊥ ⅋ (B ⊗ A)
   b. (EXP2) ⊢ A⊥, (A ⊗ B) ⅋ B⊥
5. Type-raising laws
   a. (TYR1) ⊢ A⊥, (A ⊗ B⊥) ⅋ B
   b. (TYR2) ⊢ A⊥, B ⅋ (B⊥ ⊗ A)
6. Composition laws
   a. (COM1) ⊢ (C⊥ ⊗ B) ⅋ (B⊥ ⊗ A), A⊥ ⅋ C
   b. (COM2) ⊢ (C ⊗ B⊥) ⅋ (B ⊗ A⊥), A ⅋ C⊥
7. Geach laws
   a. (GEA1) ⊢ C⊥ ⊗ B, (B⊥ ⊗ A) ⅋ (A⊥ ⅋ C)
   b. (GEA2) ⊢ B ⊗ A⊥, (A ⅋ C⊥) ⅋ (C ⊗ B⊥)
8. Switching laws
   a. (SWI1) ⊢ C⊥ ⅋ (B⊥ ⊗ A), A⊥ ⅋ (B ⊗ C)
   b. (SWI2) ⊢ (C ⊗ B⊥) ⅋ A⊥, (A ⊗ B) ⅋ C⊥
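The translation φ and the order-reversing negation of CyMLL are mechanical enough to be sketched in code. The following is a minimal illustration, assuming a hypothetical encoding that is not part of the chapter: NMILL formulas are nested tuples `(op, X, Y)` with `op` one of `"*"` (⊗), `"-o"` (−◦) and `"o-"` (◦−), atoms are strings, and the names `Atom`, `Tensor`, `Par`, `neg`, `phi`, `phi_sequent` and `show` are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    name: str
    negated: bool = False      # True stands for the atom X⊥

@dataclass(frozen=True)
class Tensor:                  # left ⊗ right
    left: object
    right: object

@dataclass(frozen=True)
class Par:                     # left ⅋ right
    left: object
    right: object

def neg(f):
    """Linear negation of CyMLL: De Morgan duality with the order reversed."""
    if isinstance(f, Atom):
        return Atom(f.name, not f.negated)
    if isinstance(f, Tensor):
        return Par(neg(f.right), neg(f.left))
    if isinstance(f, Par):
        return Tensor(neg(f.right), neg(f.left))
    raise TypeError(f)

def phi(f):
    """Translate an NMILL formula (string or (op, X, Y) tuple) into CyMLL."""
    if isinstance(f, str):
        return Atom(f)
    op, x, y = f
    if op == "*":              # X ⊗ Y
        return Tensor(phi(x), phi(y))
    if op == "-o":             # X −◦ Y = X⊥ ⅋ Y
        return Par(neg(phi(x)), phi(y))
    if op == "o-":             # X ◦− Y = X ⅋ Y⊥
        return Par(phi(x), neg(phi(y)))
    raise ValueError(op)

def phi_sequent(hypotheses, conclusion):
    """A1, ..., An ⊢ A becomes ⊢ φ(An)⊥, ..., φ(A1)⊥, φ(A)."""
    return [neg(phi(h)) for h in reversed(hypotheses)] + [phi(conclusion)]

def show(f):
    if isinstance(f, Atom):
        return f.name + ("⊥" if f.negated else "")
    symbol = "⊗" if isinstance(f, Tensor) else "⅋"
    return f"({show(f.left)} {symbol} {show(f.right)})"

# (APP1): A ⊗ (A −◦ B) ⊢ B
app1 = phi_sequent([("*", "A", ("-o", "A", "B"))], "B")
print(", ".join(show(f) for f in app1))   # prints: ((B⊥ ⊗ A) ⅋ A⊥), B
```

Applying `phi_sequent` to the NMILL form of each law in § 1.1 gives a quick cross-check against the CyMLL list above.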
2 Proof-Nets for Lambek’s Syntactic Calculus
In Linear Logic and in Cyclic Linear Logic, each proof of a sequent can be represented, in a strongly geometrical way, as a graph called a proof-net, and each derivable n-ary rule on sequents may be represented as a rule transforming n proof-nets into another proof-net. Several sequent calculus proofs of the same sequent may be represented by the same proof-net. Thus, since the translations of the laws of LC are provable L-sequents or derivable rules concerning L-sequents in CyMLL (a fragment of CyLL), we can represent the proofs of provable L-sequents as proof-nets, and the derivable rules concerning L-sequents as rules transforming proof-nets into another proof-net.
Proof-nets in CyMLL will be called cyclic multiplicative proof-nets (§ 2.1), and cyclic multiplicative proof-nets corresponding to proofs of L-sequents in CyMLL will be called proof-nets for LC (§ 2.2).
2.1 Cyclic multiplicative proof-nets
A cyclic multiplicative proof-net (in short, CyM-PN) is a graph such that:
– the nodes are decorated by formulas of CyMLL;
– the edges are grouped by links, and the links are:
  – the axiom-link, a binary link (i.e. a link with two nodes and one edge) with no premise, in which both nodes are conclusions of the link and each node is decorated by the linear negation of the formula decorating the other one; i.e. the conclusions of an axiom-link are decorated by two formulas A, A⊥
    [Figure: axiom-link with conclusions A and A⊥]
  – the cut-link, a binary link with no conclusion, in which both nodes are premises of the link and each node is decorated by the linear negation of the formula decorating the other one; i.e. the premises of a cut-link are decorated by two formulas A, A⊥
    [Figure: cut-link with premises A and A⊥]
  – the ⊗-link, a ternary link (i.e. a link with three nodes and two edges), where two nodes are premises (the first premise and the second premise) and the other node is the conclusion; there is one edge between the first premise and the conclusion and another edge between the second premise and the conclusion; the conclusion is decorated by the formula A ⊗ B, where A is the formula decorating the first premise and B is the formula decorating the second premise
    [Figure: ⊗-link with premises A, B and conclusion A ⊗ B]
  – the ⅋-link, a ternary link in which two nodes are premises (the first premise and the second premise) and the other node is the conclusion; there is one edge between the first premise and the conclusion and another edge between the second premise and the conclusion; the conclusion is decorated by the formula A ⅋ B, where A is the formula decorating the first premise and B is the formula decorating the second premise
    [Figure: ⅋-link with premises A, B and conclusion A ⅋ B]
– each node is the premise of at most one link, and is the conclusion of exactly one link; the nodes which are not premises of links are called the conclusions of the proof-net;
– for each “switching”, the graph is acyclic and connected, where a “switching” of the graph is the removal of one edge in each ⅋-link of the graph;
– the graph is planar, i.e. the graph may be drawn on the plane with no crossing of edges;
– the conclusions of the graph are in a cyclic order, induced by the “trips” inside the proof-net (as defined in [7]); this cyclic order corresponds to the order of the conclusions going from left to right, when the graph is drawn on the plane and one considers the “rightmost” conclusion before the “leftmost” one.
We may represent a CyM-PN π as follows:
[Figure: proof-net π with conclusions A1 ··· An]
where A1, ..., An are the conclusions of π in their cyclic order (A2 is the immediate successor of A1, An is the immediate successor of An−1, A1 is the immediate successor of An). For each conclusion A of a CyM-PN, we may draw the graph on the plane, with no crossing of edges, in such a way that A is the first conclusion going from left to right. For example, we may draw the CyM-PN π considered above in such a way that the first conclusion (from left to right) is A2 and the last conclusion is A1, i.e.:
[Figure: proof-net π with conclusions A2 ··· An A1]
A CyM-PN is cut-free iff it contains no cut-link. An important theorem (the cut-elimination theorem or normalization theorem for CyM-PN) states that every CyM-PN can be transformed into a cut-free CyM-PN with the same conclusions and the same cyclic order of the conclusions. We may therefore restrict our attention to cut-free CyM-PNs. Other important theorems state the relationship between CyM-PNs and the sequent calculus for CyMLL. CyM-PNs are geometrical representations of proofs in the sequent calculus for CyMLL, as a consequence of the following results:
– every proof of a sequent ⊢ Γ in the sequent calculus for CyMLL can be transformed into a CyM-PN with conclusions Γ;
– every CyM-PN with conclusions Γ can be considered as the CyM-PN coming from a proof of the sequent ⊢ Γ in the sequent calculus for CyMLL.
Multiplicative disjunction ⅋ is a reversible connective in CyMLL (as well as in LL), i.e.:
– the sequent ⊢ Γ, A ⅋ B, Δ is provable in the sequent calculus for CyMLL iff ⊢ Γ, A, B, Δ is provable in the sequent calculus for CyMLL;
– π is a CyM-PN with conclusions Γ, A ⅋ B, Δ iff the graph obtained from π by deleting the terminal ⅋-link with conclusion A ⅋ B is a CyM-PN with conclusions Γ, A, B, Δ.
Therefore, given a proof-net π with conclusions Γ, A, B, Δ, the sequence Γ, A ⅋ B, Δ may be considered another way to read the conclusions of π.
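The switching condition in the definition of a CyM-PN lends itself to a brute-force check. The sketch below is an illustration under explicit assumptions and is not taken from the chapter: a graph is given by a set of node names, a list of axiom/cut edges and lists of ⊗-links and ⅋-links written as `(first premise, second premise, conclusion)`; node names are assumed to be unique; planarity and the cyclic order of conclusions are deliberately not checked; all function and variable names are hypothetical.

```python
import itertools

def is_tree(nodes, edges):
    """Acyclic and connected: |E| = |V| - 1 and one connected component."""
    if len(edges) != len(nodes) - 1:
        return False
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(nodes)

def satisfies_switching_condition(nodes, binary_edges, tensor_links, par_links):
    """Check that every switching (one edge removed per ⅋-link) is a tree."""
    fixed = list(binary_edges)
    for p1, p2, c in tensor_links:
        fixed += [(p1, c), (p2, c)]
    # For each ⅋-link choose which premise edge is kept (the other is removed).
    for choice in itertools.product((0, 1), repeat=len(par_links)):
        edges = list(fixed)
        for keep, (p1, p2, c) in zip(choice, par_links):
            edges.append(((p1, p2)[keep], c))
        if not is_tree(nodes, edges):
            return False
    return True

# Graph with conclusions (B⊥ ⊗ A) ⅋ A⊥ and B, i.e. the L-sequent of (APP1).
app1_ok = satisfies_switching_condition(
    {"B⊥", "A", "B⊥⊗A", "A⊥", "(B⊥⊗A)⅋A⊥", "B"},
    binary_edges=[("A", "A⊥"), ("B⊥", "B")],
    tensor_links=[("B⊥", "A", "B⊥⊗A")],
    par_links=[("B⊥⊗A", "A⊥", "(B⊥⊗A)⅋A⊥")],
)
# An axiom-link whose two conclusions are joined by a ⊗-link contains a cycle.
bad_ok = satisfies_switching_condition(
    {"A", "A⊥", "A⊗A⊥"},
    binary_edges=[("A", "A⊥")],
    tensor_links=[("A", "A⊥", "A⊗A⊥")],
    par_links=[],
)
print(app1_ok, bad_ok)   # prints: True False
```

The enumeration is exponential in the number of ⅋-links, which is acceptable for the small nets discussed here but would not scale to large graphs.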
2.2 Proof-nets for LC
To every provable sequent of NMILL, i.e. to every provable sequent of LC, we may associate a geometrical representation of its proof, i.e. the CyM-PN of the corresponding L-sequent of CyMLL. The CyM-PNs associated to proofs of L-sequents of CyMLL may be considered as proof-nets for LC, i.e. proof-nets corresponding to proofs in Lambek’s original Syntactic Calculus. Indeed, as proven in [3, 4], when A1, ..., An, B are formulas of NMILL (i.e. of LC):
– each proof in NMILL (in LC) of a sequent A1, ..., An ⊢ B may be transformed into a proof in CyMLL of the corresponding sequent (via φ-translation) ⊢ φ(An)⊥, ..., φ(A1)⊥, φ(B), and therefore there is a CyM-PN which is the geometrical representation of such a proof;
– each proof in CyMLL of the sequent ⊢ φ(An)⊥, ..., φ(A1)⊥, φ(B) may be transformed into a proof in NMILL (LC) of the sequent A1, ..., An ⊢ B, and the CyM-PN which is the geometrical representation of the proof in CyMLL of ⊢ φ(An)⊥, ..., φ(A1)⊥, φ(B) is also the geometrical representation of a proof in NMILL (LC) of A1, ..., An ⊢ B.
From these results, it is easy to see that a CyM-PN π is a proof-net for LC (i.e. a geometrical representation of a proof in Lambek’s original Syntactic Calculus) when:
– there is exactly one conclusion A of π which is an L-formula and all the other conclusions of π are negations of L-formulas;
– no node of the graph π is labelled by formulas C ⅋ D or C⊥ ⊗ D⊥, where C and D are L-formulas.
Indeed, LC is an intuitionistic fragment of CyMLL. The conditions required for being a proof-net for LC, that is a geometrical representation of a proof in NMILL, reflect the features of intuitionistic systems: in each intuitionistic proof
– there is only one conclusion, whereas the number of hypotheses is an arbitrary natural number;
– there is no duality, therefore there is no way to consider a conclusion as a hypothesis and a hypothesis as a conclusion, and no way to change the role of a formula (hypothesis vs. conclusion).
Thus, each CyM-PN representing a proof π in LC must have just one conclusion which is an L-formula (and this L-formula is the translation of the conclusion of the proof π in LC), whereas all the other conclusions are linear negations of L-formulas (and these L-formulas are the hypotheses of the proof π in LC).
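The two inductive classes from § 1.2 (L-formulas and linear negations of L-formulas) give a direct syntactic test for the first condition above. The following sketch assumes a hypothetical encoding of CyMLL formulas as small Python dataclasses; the names `is_l_formula`, `is_neg_l_formula` and `is_l_sequent` are illustrative and do not come from the chapter.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    name: str
    negated: bool = False      # True stands for the atom X⊥

@dataclass(frozen=True)
class Tensor:                  # left ⊗ right
    left: object
    right: object

@dataclass(frozen=True)
class Par:                     # left ⅋ right
    left: object
    right: object

def is_l_formula(f):
    """Inductive class of L-formulas of CyMLL."""
    if isinstance(f, Atom):
        return not f.negated
    if isinstance(f, Tensor):                   # A ⊗ B
        return is_l_formula(f.left) and is_l_formula(f.right)
    # A⊥ ⅋ B (i.e. A −◦ B) or B ⅋ A⊥ (i.e. B ◦− A)
    return ((is_neg_l_formula(f.left) and is_l_formula(f.right)) or
            (is_l_formula(f.left) and is_neg_l_formula(f.right)))

def is_neg_l_formula(f):
    """Inductive class of linear negations of L-formulas of CyMLL."""
    if isinstance(f, Atom):
        return f.negated
    if isinstance(f, Par):                      # A⊥ ⅋ B⊥ = (B ⊗ A)⊥
        return is_neg_l_formula(f.left) and is_neg_l_formula(f.right)
    # B⊥ ⊗ A = (A −◦ B)⊥ or B ⊗ A⊥ = (A ◦− B)⊥
    return ((is_neg_l_formula(f.left) and is_l_formula(f.right)) or
            (is_l_formula(f.left) and is_neg_l_formula(f.right)))

def is_l_sequent(formulas):
    """Exactly one L-formula; every other formula negates an L-formula."""
    positives = [f for f in formulas if is_l_formula(f)]
    return len(positives) == 1 and all(
        is_l_formula(f) or is_neg_l_formula(f) for f in formulas)

A, B = Atom("A"), Atom("B")
A_neg, B_neg = Atom("A", True), Atom("B", True)
app1 = [Par(Tensor(B_neg, A), A_neg), B]        # ⊢ (B⊥ ⊗ A) ⅋ A⊥, B
print(is_l_sequent(app1), is_l_sequent([A, B]), is_l_sequent([Par(A, B)]))
```

Because both tests recurse through subformulas, a formula containing C ⅋ D or C⊥ ⊗ D⊥ (with C, D L-formulas) is rejected by both, which mirrors the second condition in the list above at the level of formulas.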
3 Geometrical formulation of laws of LC through Proof-Nets for LC
In this section we briefly introduce the geometrical representation, by means of cyclic multiplicative proof-nets (more precisely, by means of proof-nets for LC), of the laws of Lambek’s original Syntactic Calculus (LC) given in § 1 (for the full exposition see [3, 6]). We remark that these laws can be translated into provable L-sequents of CyMLL or into derivable rules on L-sequents of CyMLL. Precisely:
– when a law of LC is translated into a provable L-sequent S of CyMLL, the geometrical formulation of the law through proof-nets for LC is given by a CyM-PN of S;
– when a law of LC is translated into a derived rule on L-sequents of CyMLL, the geometrical formulation of the law through proof-nets for LC is given by a rule on CyM-PNs corresponding to the derived rule on L-sequents.
In this way, we discover that
– several laws of LC, usually considered as different laws, receive essentially the same geometrical representation (cf. §3.2, §3.3);
– other laws of LC, not yet considered in the literature, receive the same geometrical representation as other well-known laws of LC (cf. §4).
Therefore, such a geometrical approach appears to be more unitary and useful for the development of our understanding of LC.
3.1 Geometrical representation of Monotonicity laws
Let's now consider the geometrical representation of Monotonicity laws, which are unary derivable rules of LC. Each Monotonicity law is a rule with an L-sequent A ⊢ B as premise and an L-sequent E ⊢ F as conclusion, and states that a proof in LC of the L-sequent E ⊢ F exists whenever a proof in LC of the L-sequent A ⊢ B exists. Thus, to get a representation of a Monotonicity rule with premise A ⊢ B and conclusion E ⊢ F, we have to consider how one gets a CyM-PN representing a proof in LC of the conclusion E ⊢ F from a CyM-PN representing a proof in LC of the premise A ⊢ B. A CyM-PN π representing a proof in LC of the premise A ⊢ B must have two conclusions, A⊥ and B, where A and B are L-formulas
[Figure: two equivalent drawings of the CyM-PN π with conclusions A⊥ and B.]
and each Monotonicity law with conclusion E ⊢ F and premise A ⊢ B states that such a CyM-PN π may be transformed into another CyM-PN with conclusions E⊥, F. The full geometrical representation of Monotonicity laws is given in [6], where it is shown that, given a CyM-PN π corresponding to a proof of A ⊢ B, all the CyM-PN's corresponding to the conclusions of some Monotonicity law with premise A ⊢ B belong to the class MON(π, A⊥, B, C) of all the graphs obtained as follows:
– take one Axiom link (Ax) with conclusions C, C⊥ and the CyM-PN π with conclusions A⊥, B;
– then connect one of the conclusions of Ax with one of the conclusions of π by means of a ⊗-link;
– finally, connect the other conclusion of π and the other conclusion of Ax by means of a ⅋-link.
Syntactic Calculus and Linear Logic
MON-L(π, A⊥, B, C) is the subclass of MON(π, A⊥, B, C) containing the graphs where one of the conclusions is an L-formula and the other conclusion is the linear negation of an L-formula. If π is a proof-net for LC, then the elements of MON-L(π, A⊥, B, C) are exactly the elements of MON(π, A⊥, B, C) which are proof-nets for LC. To explore the elements of MON(π, A⊥, B, C) and MON-L(π, A⊥, B, C), the following configuration is useful:
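[Figure: the CyM-PN π with conclusions α, β, enclosed by an axiom link with conclusions δ, γ; the conclusions appear in the order δ, α, β, γ.]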
where δ, γ are the conclusions of the axiom C, C⊥ (i.e. δ = C and γ = C⊥, or δ = C⊥ and γ = C) and α, β are the conclusions of π (i.e. α = B and β = A⊥, or α = A⊥ and β = B). We remark that in this configuration the axiom C ⊢ C wraps itself around the CyM-PN which is the geometrical representation of the proof in LC of the sequent A ⊢ B. The following theorem, stated in [6], shows that the elements of MON-L(π, A⊥, B, C) are the geometrical representations of the six Monotonicity rules with premise A ⊢ B.
Theorem 1 The eight elements of MON(π, A⊥, B, C) are:
1. six CyM-PN's which belong to MON-L(π, A⊥, B, C) and are geometrical representations of proofs in LC, corresponding to the six Monotonicity laws;
2. two CyM-PN's which do not belong to MON-L(π, A⊥, B, C), i.e. are not representations of proofs in LC.
We present two elements of MON(π, A⊥, B, C): the first one belongs to MON-L(π, A⊥, B, C) and corresponds to the first Monotonicity law (since one of the conclusions is an L-formula and the other conclusion is the linear negation of an L-formula); the second one does not belong to MON-L(π, A⊥, B, C) (since no conclusion is an L-formula and, moreover, no conclusion is the linear negation of an L-formula).
Example (1): MON1.1
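[Figure, Example (1): a ⅋-link joins C⊥ and A⊥ into C⊥ ⅋ A⊥ = (A ⊗ C)⊥, and a ⊗-link joins B and C into B ⊗ C.]
Example (2)*
[Figure, Example (2)*: a ⊗-link joins C⊥ and A⊥ into C⊥ ⊗ A⊥, and a ⅋-link joins B and C into B ⅋ C.]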
It is easy to see that in each Monotonicity law there is a transition from a graph π with two conclusions to another graph with three conclusions (by linking one of the conclusions of π with one of the conclusions of an axiom link by means of a ⊗-link), and each Monotonicity law is a way to see a graph with three conclusions as a graph with two conclusions (by linking two conclusions by means of a ⅋-link, i.e. by using the link of a reversible connective).
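The count stated in Theorem 1 can be checked mechanically. The following is only an illustrative sketch, not the construction used in [6]. It assumes the configuration δ, α, β, γ of this section and classifies every conclusion by a polarity (1 for an L-formula, -1 for the linear negation of an L-formula, None otherwise), following the two conditions on proof-nets for LC: no formula C ⅋ D and no formula C⊥ ⊗ D⊥ with C, D L-formulas. The names tensor, par and mon_elements are hypothetical helpers introduced here.

```python
# Polarity: 1 = L-formula, -1 = linear negation of an L-formula, None = neither.
# Assumption: this mirrors the two conditions on proof-nets for LC.

def tensor(x, y):
    if x is None or y is None:
        return None
    if x == 1 and y == 1:
        return 1            # L ⊗ L is an L-formula
    if x != y:
        return -1           # e.g. B⊥ ⊗ A = (A −◦ B)⊥, A ⊗ B⊥ = (B ◦− A)⊥
    return None             # C⊥ ⊗ D⊥ is not admitted


def par(x, y):
    if x is None or y is None:
        return None
    if x == -1 and y == -1:
        return -1           # C⊥ ⅋ A⊥ = (A ⊗ C)⊥
    if x != y:
        return 1            # A⊥ ⅋ B = A −◦ B, B ⅋ A⊥ = B ◦− A
    return None             # C ⅋ D is not admitted


def mon_elements():
    """Enumerate MON(π, A⊥, B, C) as (conclusions, polarities) pairs."""
    axiom = [("C", 1), ("C⊥", -1)]      # conclusions of the axiom link
    pi = [("A⊥", -1), ("B", 1)]         # conclusions of π
    out = []
    for d in (0, 1):
        for a in (0, 1):
            (dn, dp), (gn, gp) = axiom[d], axiom[1 - d]   # δ and γ
            (an, ap), (bn, bp) = pi[a], pi[1 - a]         # α and β
            # ⊗-link on (δ, α), ⅋-link on (β, γ)
            out.append((f"{dn} ⊗ {an}, {bn} ⅋ {gn}",
                        (tensor(dp, ap), par(bp, gp))))
            # ⅋-link on (δ, α), ⊗-link on (β, γ)
            out.append((f"{dn} ⅋ {an}, {bn} ⊗ {gn}",
                        (par(dp, ap), tensor(bp, gp))))
    return out


elements = mon_elements()
mon_l = [name for name, pols in elements if set(pols) == {1, -1}]
rejected = [name for name, pols in elements if set(pols) != {1, -1}]
print(len(elements), len(mon_l), rejected)
```

Under these assumptions the two rejected graphs are exactly those in which a ⊗-link joins two negated formulas and a ⅋-link joins two L-formulas, as in Example (2)*.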
3.2 Geometrical Representation of Application Laws, Expansion Laws and Type-raising Laws
Let's now consider the geometrical representation of Application laws, Expansion laws, and Type-raising laws by showing the CyM-PN's which correspond to a proof of each law, as presented in [6]. The CyM-PN's which are the geometrical representations of these different rules belong to a well-defined class of graphs ID-L(A, B). Firstly, we define the class of graphs ID(A, B) = (IDA × IDB) ∪ (IDB × IDA), where:
– A is an L-formula and IDA is the axiom link with conclusions A⊥ and A (corresponding to the axiom A ⊢ A of the sequent calculus of LC);
– B is an L-formula and IDB is the axiom link with conclusions B⊥ and B (corresponding to the axiom B ⊢ B of the sequent calculus of LC);
– each graph belonging to IDA × IDB is obtained from IDA and IDB as follows:
  – firstly, connect, by means of a ⊗-link, one conclusion of IDA (first premise) with one conclusion of IDB (second premise), so that one obtains a CyM-PN with 3 conclusions in a cyclic order;
  – secondly, connect, by means of a ⅋-link, the conclusion of the ⊗-link with one of the other conclusions, respecting the cyclic order of the conclusions, so that one obtains a CyM-PN with 2 conclusions;
– each graph belonging to IDB × IDA is obtained from IDA and IDB as follows:
  – firstly, connect, by means of a ⊗-link, one conclusion of IDB (first premise) with one conclusion of IDA (second premise), so that one obtains a CyM-PN with 3 conclusions in a cyclic order;
  – secondly, connect, by means of a ⅋-link, the conclusion of the ⊗-link with one of the other conclusions, respecting the cyclic order of the conclusions, so that one obtains a CyM-PN with 2 conclusions.
The graphs belonging to ID(A, B) have one of the following forms:
[Figure, first form: conclusions α ⅋ (β ⊗ γ) and δ.]
[Figure, second form: conclusions α and (β ⊗ γ) ⅋ δ.]
where one of the following cases occurs:
– α, β are A, A⊥ or A⊥, A and γ, δ are B, B⊥ or B⊥, B;
– α, β are B, B⊥ or B⊥, B and γ, δ are A, A⊥ or A⊥, A.
It is easy to see that ID(A, B) has 16 elements (two forms for each of the eight choices of α, β, γ, δ) and that all the elements of ID(A, B) are CyM-PN's. Now, we consider a subclass ID-L(A, B) of ID(A, B), defined as the class of the CyM-PN's which belong to ID(A, B) and satisfy the following requirement: no conclusions A⊥ ⊗ B⊥ or B⊥ ⊗ A⊥ are admitted. In [6], it is proved that all the elements of ID-L(A, B) are CyM-PN's which are proof-nets for LC, i.e. geometrical representations of proofs in LC. The following theorem, proved in [6], states that the elements of ID-L(A, B) are the geometrical representations of the Application laws, Expansion laws and Type-raising laws of LC.
Theorem 2 The 16 elements of ID(A, B) are:
1. the twelve elements of ID-L(A, B), i.e.
  – four CyM-PN's (two belonging to IDA × IDB, two belonging to IDB × IDA) which correspond to the proofs in LC of the four Application laws (i.e. two Application laws with B at the left side of the sequent, and two Application laws with A at the left side of the sequent),
  – four CyM-PN's (two belonging to IDA × IDB, two belonging to IDB × IDA) which correspond to the proofs in LC of the four Expansion laws,
  – four CyM-PN's (two belonging to IDA × IDB, two belonging to IDB × IDA) which correspond to the four Type-raising laws (i.e. two Type-raising laws with A at the left side of the sequent, and two Type-raising laws with B at the left side of the sequent);
2. four other elements which are CyM-PN's not corresponding to proofs in LC.
We present four examples of elements of ID(A, B): the first two and the last one are elements of ID-L(A, B) and are the geometrical representations of the LC laws (APP2), (EXP1.2) and (TYR2) respectively, whereas the third example is a CyM-PN which is not a proof-net for LC.
Case (1): (APP2)
[Figure: CyM-PN with conclusions A⊥ ⅋ (A ⊗ B⊥) and B.]
Case (2): (EXP1.2)
[Figure: CyM-PN with conclusions A⊥ ⅋ (A ⊗ B) and B⊥.]
*Case (3)
[Figure: CyM-PN with conclusions A ⅋ (A⊥ ⊗ B⊥) and B.]
Case (4): (TYR2)
[Figure: CyM-PN with conclusions B ⅋ (B⊥ ⊗ A) and A⊥.]
Precisely:
– in Case (1), the sequence of conclusions A⊥ ⅋ (A ⊗ B⊥), B represents the sequent (B ◦− A) ⊗ A ⊢ B, corresponding to the law b/a · a ≤ b (APP2) in LC;
– in Case (2), the sequence of conclusions A⊥ ⅋ (A ⊗ B), B⊥ represents the sequent A ⊢ (A ⊗ B) ◦− B, corresponding to the law a ≤ (a · b)/b (EXP2) in LC;
– in Case (4), the sequence of conclusions B ⅋ (B⊥ ⊗ A), A⊥ represents the sequent A ⊢ B ◦− (A −◦ B), corresponding to the law a ≤ b/(a\b) (TYR2) in LC (see §1.1 above).
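The sizes of ID(A, B) and ID-L(A, B) in Theorem 2 can be reproduced by a direct enumeration. The sketch below is an illustration under stated assumptions: literals are plain strings, the axiom links join α with β and γ with δ, and the two forms are those pictured above. The helper names dual and id_graphs, and the constant FORBIDDEN, are hypothetical and introduced only for this example.

```python
from itertools import product


def dual(x):
    """Linear negation of a literal written as 'A' or 'A⊥'."""
    return x[:-1] if x.endswith("⊥") else x + "⊥"


def id_graphs(a="A", b="B"):
    """Enumerate ID(A, B) as (⊗-formula, [two conclusions]) pairs."""
    graphs = []
    for first, second in ((a, b), (b, a)):        # IDA × IDB, then IDB × IDA
        for alpha, delta in product((first, dual(first)),
                                    (second, dual(second))):
            beta, gamma = dual(alpha), dual(delta)   # axiom links α–β and γ–δ
            t = f"{beta} ⊗ {gamma}"                  # conclusion of the ⊗-link
            graphs.append((t, [f"{alpha} ⅋ ({t})", delta]))   # first form
            graphs.append((t, [alpha, f"({t}) ⅋ {delta}"]))   # second form
    return graphs


# Requirement defining ID-L(A, B): these ⊗-formulas are not admitted.
FORBIDDEN = {"A⊥ ⊗ B⊥", "B⊥ ⊗ A⊥"}

all_graphs = id_graphs()
id_l = [concl for t, concl in all_graphs if t not in FORBIDDEN]
print(len(all_graphs), len(id_l))
```

Cases (1), (2) and (4) above occur in id_l, whereas *Case (3) is filtered out because its ⊗-link has conclusion A⊥ ⊗ B⊥.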
3.3 Geometrical representation of Composition Laws, Geach Laws and Switching Laws
The CyM-PN's which are the geometrical representations of the proofs in LC of Composition laws, Geach laws and Switching laws are graphs obtained from graphs belonging to the same class, SYL(C, B, A), as stated in [6]. The class of graphs SYL(C, B, A) has the following features:
– A, B, C are L-formulas of CyMLL;
– each graph of this class is constructed by starting from three Axiom links (IDA, IDB, IDC) as follows: there is a ⊗-link whose premises are one of the conclusions of IDC and one of the conclusions of IDB, and there is a ⊗-link whose premises are the other conclusion of IDB and one of the conclusions of IDA;
– the four conclusions of the graph are the conclusions of the two ⊗-links, one of the conclusions of IDA and one of the conclusions of IDC.
It is easy to see that each element of SYL(C, B, A) is a CyM-PN. As shown in [6], when α, α⊥ are the conclusions of IDA, β, β⊥ are the conclusions of IDB and γ, γ⊥ are the conclusions of IDC, each element of SYL(C, B, A) belongs to one of the following cases:
– Case 1: one of the conclusions of IDC is the first premise of a ⊗-link, and one of the conclusions of IDA is the second premise of a ⊗-link
[Figure, Case 1: one ⊗-link joins a conclusion of IDC (first premise) with a conclusion of IDB (second premise); the other joins the remaining conclusion of IDB (first premise) with a conclusion of IDA (second premise).]
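– Case 2: one of the conclusions of IDC is the first premise of a ⊗-link, and one of the conclusions of IDA is the first premise of a ⊗-link:
[Figure, Case 2: one ⊗-link joins a conclusion of IDC (first premise) with a conclusion of IDB (second premise); the other joins a conclusion of IDA (first premise) with the remaining conclusion of IDB (second premise).]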
– Case 3: one of the conclusions of IDC is the second premise of a ⊗-link, and one of the conclusions of IDA is the second premise of a ⊗-link:
[Figure, Case 3: one ⊗-link joins a conclusion of IDB (first premise) with a conclusion of IDC (second premise); the other joins the remaining conclusion of IDB (first premise) with a conclusion of IDA (second premise).]
These three cases are related as follows to Composition laws, Geach laws and Switching laws:
– the CyM-PN's which are the geometrical representations of the proofs of Composition laws in LC are graphs obtained from graphs of SYL(C, B, A) belonging to Case 1;
– the CyM-PN's which are the geometrical representations of the proofs of Geach laws in LC are graphs obtained from graphs of SYL(C, B, A) belonging to Case 1;
– the CyM-PN's which are the geometrical representations of the proofs of Switching laws in LC are graphs obtained from graphs of SYL(C, B, A) belonging to Cases 2 and 3.
3.3.1 Composition Laws
The class of graphs COM(C, B, A) is the class of graphs obtained from a graph belonging to Case 1 of SYL(C, B, A) by adding:
– a ⅋-link whose first premise is the conclusion of the ⊗-link connecting one of the conclusions of IDC and whose second premise is the conclusion of the ⊗-link connecting one of the conclusions of IDA;
– a ⅋-link whose first premise is the other conclusion of IDA and whose second premise is the other conclusion of IDC.
Each graph belonging to COM(C, B, A) is a CyM-PN and looks as follows:
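[Figure: general form of a graph in COM(C, B, A): one ⅋-link joins the conclusions of the two ⊗-links, and the other ⅋-link joins the remaining conclusions of IDA and IDC.]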
where α, α⊥ are the conclusions of IDA, β, β⊥ are the conclusions of IDB and γ, γ⊥ are the conclusions of IDC. COM-L(C, B, A) is the subclass of COM(C, B, A) whose elements satisfy the following requirement: no ⊗-link has the formula C⊥ ⊗ B⊥ or the formula B⊥ ⊗ A⊥ as conclusion. The following theorem, proved in [6] (pp. 504-506), states that the elements of COM-L(C, B, A) are geometrical representations of the proofs in LC of Composition laws, and so they are proof-nets for LC.
Theorem 3 The eight elements of COM(C, B, A) are:
1. the four elements of COM-L(C, B, A), which are CyM-PN's and correspond to proofs in LC of the two Composition laws ((COM1) and (COM2)) and of two other laws ((COM*1) and (COM*2));
2. four CyM-PN's which do not correspond to proofs in LC.
Therefore, by means of CyM-PN's belonging to COM-L(C, B, A), we get geometrical representations of proofs in LC of the following sequents, which have to be considered as Composition laws:
– the law (COM1) (A −◦ B) ⊗ (B −◦ C) ⊢ A −◦ C,
– the law (COM2) (A ◦− B) ⊗ (B ◦− C) ⊢ A ◦− C,
– two other laws strictly related to the previous ones, at least from a geometrical point of view:
  – (COM*1) C ⊗ A ⊢ (C ⊗ B) ◦− (A −◦ B),
  – (COM*2) C ⊗ A ⊢ (B ◦− C) −◦ (B ⊗ A).
(COM*1)
[Figure: the CyM-PN in COM-L(C, B, A) representing (COM*1).]
(COM*2)
[Figure: the CyM-PN in COM-L(C, B, A) representing (COM*2).]
3.3.2 Geach Laws
We introduce a class of graphs GEA(C, B, A), defined as follows:
– GEA(C, B, A) = GEA1(C, B, A) ∪ GEA2(C, B, A);
– a graph belongs to GEA1(C, B, A) iff it is constructed by starting from a graph belonging to SYL(C, B, A), Case 1, and by adding:
  – a ⅋-link whose first premise is the conclusion of IDA which is not the premise of a ⊗-link and whose second premise is the conclusion of IDC which is not the premise of a ⊗-link;
  – a ⅋-link whose first premise is the conclusion of the ⊗-link connecting one of the conclusions of IDB and whose second premise is the conclusion of the ⅋-link described above;
– a graph belongs to GEA2(C, B, A) iff it is constructed by starting from a graph belonging to SYL(C, B, A), Case 1, and by adding:
  – a ⅋-link whose first premise is the conclusion of IDA which is not the premise of a ⊗-link and whose second premise is the conclusion of IDC which is not the premise of a ⊗-link;
  – a ⅋-link whose first premise is the conclusion of the ⅋-link described above and whose second premise is the conclusion of the ⊗-link connecting one of the conclusions of IDB.
We remark that, when α, α⊥ are the conclusions of IDA, β, β⊥ are the conclusions of IDB and γ, γ⊥ are the conclusions of IDC,
– each graph belonging to GEA1(C, B, A) looks as follows:
[Figure: general form of a graph in GEA1(C, B, A).]
– each graph belonging to GEA2(C, B, A) looks as follows:
[Figure: general form of a graph in GEA2(C, B, A).]
It is easy to see that every graph belonging to GEA(C, B, A) is a CyM-PN. GEA-L(C, B, A) is the subclass of GEA(C, B, A) whose elements satisfy the following requirement: no ⊗-link has as conclusion the formula C⊥ ⊗ B⊥ or the formula B⊥ ⊗ A⊥.
The following theorem, proved in [6], says that the elements of GEA-L(C, B, A) are geometrical representations of Geach laws.
Theorem 4 The elements of GEA(C, B, A) (eight elements of GEA1(C, B, A) and eight elements of GEA2(C, B, A)) are:
1. the eight elements of GEA-L(C, B, A) (four inside GEA1(C, B, A), four inside GEA2(C, B, A)) which correspond to proofs in LC of the two Geach laws ((GEA1) and (GEA2)) and six other laws strictly related to Geach laws;
2. eight CyM-PN's which do not correspond to proofs in LC.
As shown in [6] (pp. 508-509), by means of CyM-PN's belonging to GEA-L(C, B, A), we get geometrical representations of proofs in LC of the following sequents, which have to be considered as Geach laws:
– the law (GEA1) B −◦ C ⊢ (A −◦ B) −◦ (A −◦ C),
– the law (GEA2) A ◦− B ⊢ (A ◦− C) ◦− (B ◦− C),
– six other laws strictly related to the previous ones, at least from a geometrical point of view:
  – (GEA1*1) B ◦− C ⊢ (A ◦− B) −◦ (A ◦− C),
  – (GEA1*2) B ◦− C ⊢ (B ⊗ A) ◦− (C ⊗ A),
  – (GEA1*3) (C ⊗ A) ⊗ (A −◦ B) ⊢ C ⊗ B,
  – (GEA2*1) A −◦ B ⊢ (A −◦ C) ◦− (B −◦ C),
  – (GEA2*2) A −◦ B ⊢ (C ⊗ A) −◦ (C ⊗ B),
  – (GEA2*3) (B ◦− C) ⊗ (C ⊗ A) ⊢ B ⊗ A.
3.3.3 Switching laws
We introduce a class of graphs SWI(C, B, A), defined as follows:
– SWI(C, B, A) = SWI1(C, B, A) ∪ SWI2(C, B, A);
– a graph belongs to SWI1(C, B, A) iff it is constructed by starting from a graph belonging to SYL(C, B, A), Case 3, and by adding:
  – a ⅋-link whose first premise is the conclusion of IDC which is not the premise of a ⊗-link and whose second premise is the conclusion of the ⊗-link connecting one of the conclusions of IDA;
  – a ⅋-link whose first premise is the conclusion of IDA which is not the premise of a ⊗-link and whose second premise is the conclusion of the ⊗-link connecting one of the conclusions of IDC;
– a graph belongs to SWI2(C, B, A) iff it is constructed by starting from a graph belonging to SYL(C, B, A), Case 2, and by adding:
  – a ⅋-link whose first premise is the conclusion of the ⊗-link connecting one of the conclusions of IDC and whose second premise is the conclusion of IDA which is not the premise of a ⊗-link;
  – a ⅋-link whose first premise is the conclusion of the ⊗-link connecting one of the conclusions of IDA and whose second premise is the conclusion of IDC which is not the premise of a ⊗-link.
We remark that, when α, α⊥ are the conclusions of IDA, β, β⊥ are the conclusions of IDB and γ, γ⊥ are the conclusions of IDC,
– each graph belonging to SWI1(C, B, A) looks as follows:
[Figure: general form of a graph in SWI1(C, B, A).]
– each graph belonging to SWI2(C, B, A) looks as follows:
[Figure: general form of a graph in SWI2(C, B, A).]
It is easy to see that every graph belonging to SWI(C, B, A) is a CyM-PN. SWI-L(C, B, A) is the subclass of SWI(C, B, A) whose elements are graphs where no ⊗-link has as conclusion one of these formulas: C⊥ ⊗ B⊥, B⊥ ⊗ A⊥, A⊥ ⊗ B⊥, B⊥ ⊗ C⊥. The following theorem, proved in [6] (pp. 513-517), states that SWI-L(C, B, A) gives the geometrical representations of Switching laws.
Theorem 5 The elements of SWI(C, B, A) (the eight elements of SWI1(C, B, A) and the eight elements of SWI2(C, B, A)) are:
1. the eight elements of SWI-L(C, B, A) (four inside SWI1(C, B, A) and four inside SWI2(C, B, A)) which correspond to proofs in LC of the Switching laws (SWI1) and (SWI2) (given in two different formulations), and to the proofs in LC of four sequents strictly related to Switching laws;
2. eight CyM-PN's which do not correspond to proofs in LC.
By means of the CyM-PN's belonging to SWI-L(C, B, A), we get the geometrical representations of proofs in LC of the following sequents, which are all forms of Switching laws:
– the law (SWI1) (A −◦ B) ⊗ C ⊢ A −◦ (B ⊗ C),
– the law (SWI2) A ⊗ (B ◦− C) ⊢ (A ⊗ B) ◦− C,
– two other laws strictly related to the previous ones, at least from a geometrical point of view:
  – (SWI1*1) (A ◦− B) ⊗ C ⊢ A ◦− (C −◦ B),
  – (SWI1*2) A ⊗ (B −◦ C) ⊢ (B ◦− A) −◦ C.
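The counts in Theorems 3, 4 and 5, and the sequents attached to the graphs, can be reproduced by translating the two conclusions of each graph back into an L-sequent. The sketch below is only an illustration under assumptions that are ours: it uses the translations A −◦ B = A⊥ ⅋ B and B ◦− A = B ⅋ A⊥ together with the order-reversing De Morgan laws, and it encodes the five classes COM, GEA1, GEA2, SWI1, SWI2 by the pair of conclusions we read off from the definitions above. All function names are hypothetical.

```python
from itertools import product


def atom(name, positive):
    return ("atom", name, positive)


def neg(a):
    """Dual conclusion of an axiom link (atoms only)."""
    return ("atom", a[1], not a[2])


def tensor(f, g):
    return ("tensor", f, g)


def par(f, g):
    return ("par", f, g)


def to_l(f):
    """Return (1, X) if f is the L-formula X, (-1, X) if f is X⊥, else None."""
    if f[0] == "atom":
        return (1 if f[2] else -1, f[1])
    x, y = to_l(f[1]), to_l(f[2])
    if x is None or y is None:
        return None
    (sx, tx), (sy, ty) = x, y
    if f[0] == "tensor":
        if sx > 0 and sy > 0:
            return (1, f"({tx} ⊗ {ty})")
        if sx < 0 and sy > 0:
            return (-1, f"({ty} −◦ {tx})")   # X⊥ ⊗ Y = (Y −◦ X)⊥
        if sx > 0 and sy < 0:
            return (-1, f"({ty} ◦− {tx})")   # X ⊗ Y⊥ = (Y ◦− X)⊥
        return None                          # X⊥ ⊗ Y⊥ is not admitted
    if sx < 0 and sy < 0:
        return (-1, f"({ty} ⊗ {tx})")        # X⊥ ⅋ Y⊥ = (Y ⊗ X)⊥
    if sx < 0 and sy > 0:
        return (1, f"({tx} −◦ {ty})")        # X⊥ ⅋ Y = X −◦ Y
    if sx > 0 and sy < 0:
        return (1, f"({tx} ◦− {ty})")        # X ⅋ Y⊥ = X ◦− Y
    return None                              # X ⅋ Y is not admitted


def sequent(c1, c2):
    """L-sequent represented by two conclusions, or None if there is none."""
    x, y = to_l(c1), to_l(c2)
    if x is None or y is None or x[0] == y[0]:
        return None
    hyp, concl = (x, y) if x[0] < 0 else (y, x)
    return f"{hyp[1]} ⊢ {concl[1]}"


def family(build):
    """Apply `build` to the eight choices of γ, β, α; keep the L-sequents."""
    out = []
    for sc, sb, sa in product((True, False), repeat=3):
        g, b, a = atom("C", sc), atom("B", sb), atom("A", sa)
        out.append(sequent(*build(g, b, a)))
    return [s for s in out if s is not None]


# Assumed encodings of the conclusions of each class (g = γ, b = β, a = α).
COM = family(lambda g, b, a: (par(tensor(g, b), tensor(neg(b), a)),
                              par(neg(a), neg(g))))
GEA1 = family(lambda g, b, a: (par(tensor(neg(b), a), par(neg(a), neg(g))),
                               tensor(g, b)))
GEA2 = family(lambda g, b, a: (par(par(neg(a), neg(g)), tensor(g, b)),
                               tensor(neg(b), a)))
SWI1 = family(lambda g, b, a: (par(neg(g), tensor(neg(b), a)),
                               par(neg(a), tensor(b, g))))
SWI2 = family(lambda g, b, a: (par(tensor(g, b), neg(a)),
                               par(tensor(a, neg(b)), neg(g))))

for name, seqs in [("COM-L", COM), ("GEA-L", GEA1 + GEA2), ("SWI-L", SWI1 + SWI2)]:
    print(name, len(seqs))
    for s in seqs:
        print("  ", s)
```

With these encodings the surviving graphs are 4 for COM, 8 for GEA and 8 for SWI, and the printed sequents include the laws listed in this section (in fully parenthesized form).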
4 New laws emerged from proof-nets for LC and their linguistic applications
We have seen that a group of new laws emerges from the geometrical representation of well-known laws of LC; these new laws can be written in the language of LC and are derivable in LC. In this section we analyze their content and consider some linguistic applications. All these new laws belong to the family of Syllogisms and are listed below with their translations in the usual notation of the Syntactic Calculus LC:
– within the class COM-L(C, B, A), i.e. as strictly related to Composition laws, the following new laws emerge:
  (COM*1) C ⊗ A ⊢ (C ⊗ B) ◦− (A −◦ B), i.e. C · A ⊢ (C · B)/(A\B)
  (COM*2) C ⊗ A ⊢ (B ◦− C) −◦ (B ⊗ A), i.e. C · A ⊢ (B/C)\(B · A)
– within the class GEA-L(C, B, A), i.e. as strictly related to Geach laws, the following new laws emerge:
  (GEA1*1) B ◦− C ⊢ (A ◦− B) −◦ (A ◦− C), i.e. B/C ⊢ (A/B)\(A/C)
  (GEA1*2) B ◦− C ⊢ (B ⊗ A) ◦− (C ⊗ A), i.e. B/C ⊢ (B · A)/(C · A)
  (GEA1*3) (C ⊗ A) ⊗ (A −◦ B) ⊢ C ⊗ B, i.e. (C · A) · (A\B) ⊢ C · B
  (GEA2*1) A −◦ B ⊢ (A −◦ C) ◦− (B −◦ C), i.e. A\B ⊢ (A\C)/(B\C)
  (GEA2*2) A −◦ B ⊢ (C ⊗ A) −◦ (C ⊗ B), i.e. A\B ⊢ (C · A)\(C · B)
  (GEA2*3) (B ◦− C) ⊗ (C ⊗ A) ⊢ B ⊗ A, i.e. (B/C) · (C · A) ⊢ B · A
– within the class SWI-L(C, B, A), i.e. as strictly related to Switching laws, the following new laws emerge:
  (SWI1*1) (A ◦− B) ⊗ C ⊢ A ◦− (C −◦ B), i.e. (A/B) · C ⊢ A/(C\B)
  (SWI1*2) A ⊗ (B −◦ C) ⊢ (B ◦− A) −◦ C, i.e. A · (B\C) ⊢ (B/A)\C
From a general point of view, we observe that:
– the new laws listed above emerge from the geometrical formulation, by means of the proof-nets for LC, of the proofs in LC of the well-known laws of Lambek's Syntactic Calculus (Composition laws, Geach laws, Switching laws) related to the Syllogisms;
– a crucial role in this respect is played, in these new laws, by the multiplicative conjunction “⊗”, which in its natural interpretation is the operator of concatenation (cf. [14], [20], [4]); this connective occurs in the premise and/or in the conclusion of these rules (with the only exception of (GEA1*1) and (GEA2*1));
– in the laws (GEA1*1) and (GEA2*1) the premise has one implication as main connective and the conclusion has the other implication.
It's worth recalling that all these new laws belong to LC and are provable in it, but for various reasons they have not been taken into consideration in the literature, probably because the formulation of logical types for linguistic categories has paid fundamental attention to the functor-argument distinction expressed by implicational types such as A/B, B\A (respectively A ◦− B, B −◦ A in CyMLL). We can take advantage of these new laws to look at natural languages from a point of view different from the usual function-argument analysis.
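As a sanity check (not a proof in LC), the ten new laws can be tested in a small residuated model. The sketch below assumes the standard powerset model over a monoid: types are subsets of a three-element non-commutative monoid, X · Y is the pointwise product, A\B (for A −◦ B) and B/A (for B ◦− A) are the two residuals, and ⊢ is read as inclusion. The monoid, the helper names and the dictionary LAWS are our own choices for illustration.

```python
from itertools import combinations, product

# A three-element non-commutative monoid: "e" is the unit, and u·v = u otherwise.
M = ("e", "x", "y")


def mul(u, v):
    return v if u == "e" else u


def subsets(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]


def dot(X, Y):
    """X · Y: pointwise product of two subsets."""
    return frozenset(mul(x, y) for x in X for y in Y)


def under(X, Z):
    """X\\Z: the largest Y with X · Y ⊆ Z."""
    return frozenset(y for y in M if all(mul(x, y) in Z for x in X))


def over(Z, Y):
    """Z/Y: the largest X with X · Y ⊆ Z."""
    return frozenset(x for x in M if all(mul(x, y) in Z for y in Y))


# Each law maps (A, B, C) to a pair (left side, right side).
LAWS = {
    "COM*1": lambda A, B, C: (dot(C, A), over(dot(C, B), under(A, B))),
    "COM*2": lambda A, B, C: (dot(C, A), under(over(B, C), dot(B, A))),
    "GEA1*1": lambda A, B, C: (over(B, C), under(over(A, B), over(A, C))),
    "GEA1*2": lambda A, B, C: (over(B, C), over(dot(B, A), dot(C, A))),
    "GEA1*3": lambda A, B, C: (dot(dot(C, A), under(A, B)), dot(C, B)),
    "GEA2*1": lambda A, B, C: (under(A, B), over(under(A, C), under(B, C))),
    "GEA2*2": lambda A, B, C: (under(A, B), under(dot(C, A), dot(C, B))),
    "GEA2*3": lambda A, B, C: (dot(over(B, C), dot(C, A)), dot(B, A)),
    "SWI1*1": lambda A, B, C: (dot(over(A, B), C), over(A, under(C, B))),
    "SWI1*2": lambda A, B, C: (dot(A, under(B, C)), under(over(B, A), C)),
}


def holds(law):
    """True when the left side is included in the right side for all subsets."""
    P = subsets(M)
    return all(lhs <= rhs for lhs, rhs in
               (law(A, B, C) for A, B, C in product(P, repeat=3)))


results = {name: holds(law) for name, law in LAWS.items()}
commutative = holds(lambda A, B, C: (dot(A, B), dot(B, A)))
print(results, commutative)
```

The last check confirms that the model is genuinely non-commutative, so the laws are not validated merely because the two implications collapse.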
4.1 Laws of Composition
The new laws belonging to COM-L(C, B, A) are the following:
    (COM*1) C ⊗ A ⊢ (C ⊗ B) ◦− (A −◦ B)
    (COM*2) C ⊗ A ⊢ (B ◦− C) −◦ (B ⊗ A)
These laws show the role played by the multiplicative conjunction “⊗”: in both rules, the left side of the sequent contains two formulas (types) C, A connected by the multiplicative conjunction “⊗”; on the right side of the sequent, they are put into relation with a third formula (type) B by means of the two linear implications “◦−”, “−◦”, respectively. We can read these laws as two different kinds of inferences. On the first reading, the law (COM*1) is considered as an inference on objects; the law says that, when we have a concatenation of an object of type C followed by an object of type A (i.e. an object of type C ⊗ A), and we add on the right side an object of type A −◦ B, we produce an object of type C ⊗ B, i.e. the concatenation of an object of type C followed by an object of type B.
    C ⊗ A, A −◦ B
    ───────────── (COM*1)(a)
        C ⊗ B
On this reading, the law (COM*1) can be considered as a sort of inclusion of types of objects: the law says that every object of type C ⊗ A becomes an object of type C ⊗ B, given A −◦ B; that is, if one knows that A −◦ B then, by taking an object of type C ⊗ A, one obtains also an object of type C ⊗ B; e.g. the syllogism "If some apple is red, and red implies coloured, then some apple is coloured":
    apple ⊗ red, red −◦ coloured
    ────────────────────────────
         apple ⊗ coloured
On the second reading, the law (COM*1) is considered as an inference assigning to an object of type C ⊗ A the more complex type (C ⊗ B) ◦− (A −◦ B):
           C ⊗ A
    ──────────────────── (COM*1)(b)
    (C ⊗ B) ◦− (A −◦ B)
The law (COM*2) can be considered in an analogous way, involving the same process in the other direction. We can think of these laws as means for combining independent parts of speech: given an arbitrary phrase or text, another phrase or text occurring behind it (rule (COM*1)) or in front of it (rule (COM*2)) can be combined with it by means of the multiplicative conjunction ⊗.
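The two readings of (COM*1) are linked by residuation. As a sketch, written in the algebraic notation of LC and assuming the usual residuation law of the Syntactic Calculus (x · y ≤ z iff x ≤ z/y iff y ≤ x\z), reading (a) follows from application and monotonicity, and reading (b) follows from (a) by one residuation step:

```latex
\begin{align*}
a \cdot (a \backslash b) &\le b
  && \text{(application)}\\
c \cdot a \cdot (a \backslash b) &\le c \cdot b
  && \text{(monotonicity; reading (a))}\\
c \cdot a &\le (c \cdot b)/(a \backslash b)
  && \text{(residuation; reading (b), i.e. (COM*1))}
\end{align*}
```

The same three steps, with application on the other side, give (COM*2).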
4.1.1 Examples with COM* rules
The laws (COM*1), (COM*2), as composition rules, represent means to analyze sequences of words of a language and tell what they are. They say that composition, interpreted as concatenation, is a kind of inferential process. We can use these laws to analyze fragments of a language by taking any two parts that one can identify within the flow of discourse: they can be incomplete phrases or incomplete parts of sentences. Let's take C and A as being two parts of a context and connect them with the multiplicative conjunction, C ⊗ A.
For example, let C be the string of Italian words Giovanni cammina, made of the subject noun Giovanni, of type N, and the intransitive verb cammina (is walking), of type N −◦ S, and let A be the noun Antonio, of type N; we can combine them with ⊗ and obtain the sequence:
    Giovanni cammina  ⊗  Antonio
           S                N
where Giovanni cammina takes type S by functional application, N, N −◦ S ⊢ S. Then, let's take the string made of the intransitive verb corre (is running), of type N −◦ S, and put it after the string C ⊗ A:
    Giovanni cammina  ⊗  Antonio     corre
           S                N        N −◦ S
We have an instance of the rule (COM*1) in the reading
    C ⊗ A, A −◦ B
    ─────────────
        C ⊗ B
that is
    S ⊗ N, N −◦ S
    ─────────────
        S ⊗ S
The result is a context with two propositions of type S obtained by combining independent parts of type S and N, applied to N −◦ S:
    Giovanni cammina.  ⊗  Antonio corre.
           S                    S
The effect of the law (COM*1) considered as an inference rule can be represented as follows:
    C ⊗ A    A −◦ B    ;    C ⊗ B
The inferential reading of the (COM*2) law follows a similar pattern, but in the other direction. From the formulation of the rule
    (COM*2) C ⊗ A ⊢ (B ◦− C) −◦ (B ⊗ A)
one obtains the two readings (COM*2)(a) and (COM*2)(b):
    B ◦− C, C ⊗ A
    ───────────── (COM*2)(a)
        B ⊗ A
           C ⊗ A
    ──────────────────── (COM*2)(b)
    (B ◦− C) −◦ (B ⊗ A)
where the reading (COM*2)(a) may be exemplified as follows:
    Spesso     Giovanni cammina.  ⊗  Antonio
    B ◦− C            C                 A
which, by assigning linguistic types, becomes:
    Spesso     Giovanni cammina.  ⊗  Antonio
    S ◦− S            S                 N
where, by applying the rule (COM*2)(a) and the functional application S ◦− S, S ⊢ S, one obtains:
    Spesso Giovanni cammina.  ⊗  Antonio
               S                    N
One can put together the two rules (COM*2)(a) and (COM*1)(a) to obtain a longer context with the intransitive verb phrase invece corre (instead is running) of type N −◦ S:
    Spesso Giovanni cammina.  ⊗  Antonio     invece corre
               S                    N           N −◦ S
and, by applying the following instance of rule (COM*1)(a),
    S ⊗ N, N −◦ S
    ─────────────
        S ⊗ S
one obtains the context with two sentences:
    Spesso Giovanni cammina.  ⊗  Antonio invece corre.
               S                         S
A parallel example in English is the following:
    Usually John likes reading.  ⊗  Bill instead likes running.
                S                            S
where the rule (COM*2)(a) has been applied to the following independent parts of speech:
    Usually     John likes reading.  ⊗  Bill
    S ◦− S              S                N
then rule (COM*1)(a) is applied:
    Usually John likes reading.  ⊗  Bill     instead likes running.
                S                    N             N −◦ S
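The readings (COM*2)(a) and (COM*1)(a) used in these examples can be applied mechanically to a sequence of types. The following is a minimal sketch, not a parser for LC: types are encoded as nested tuples (an encoding we assume here, with ("−◦", A, B) for A −◦ B, ("◦−", B, C) for B ◦− C and ("⊗", C, A) for C ⊗ A), adjacent types are combined greedily from left to right, and the names combine and reduce_types are hypothetical.

```python
def combine(left, right):
    """Type of the string `left right`, or None when no rule applies."""
    if isinstance(left, tuple) and isinstance(right, tuple):
        # (COM*1)(a): C ⊗ A, A −◦ B gives C ⊗ B
        if left[0] == "⊗" and right[0] == "−◦" and left[2] == right[1]:
            return ("⊗", left[1], right[2])
        # (COM*2)(a): B ◦− C, C ⊗ A gives B ⊗ A
        if left[0] == "◦−" and right[0] == "⊗" and left[2] == right[1]:
            return ("⊗", left[1], right[2])
    # functional application: A, A −◦ B gives B
    if isinstance(right, tuple) and right[0] == "−◦" and right[1] == left:
        return right[2]
    # functional application: B ◦− A, A gives B
    if isinstance(left, tuple) and left[0] == "◦−" and left[2] == right:
        return left[1]
    return None


def reduce_types(types):
    """Greedily combine adjacent types until no rule applies."""
    types = list(types)
    i = 0
    while i < len(types) - 1:
        out = combine(types[i], types[i + 1])
        if out is None:
            i += 1
        else:
            types[i:i + 2] = [out]
            i = 0
    return types


# Spesso | Giovanni cammina. ⊗ Antonio | invece corre
context = [("◦−", "S", "S"), ("⊗", "S", "N"), ("−◦", "N", "S")]
result = reduce_types(context)
print(result)
```

For the Italian context above, the first step uses (COM*2)(a) and the second uses (COM*1)(a), leaving the single type S ⊗ S; the English example has the same types and reduces in the same way.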
4.2 Laws related to Geach rules
We have found six new laws strictly related, from a geometrical point of view, to the original Geach laws:
    (GEA1*1) B ◦− C ⊢ (A ◦− B) −◦ (A ◦− C)
    (GEA1*2) B ◦− C ⊢ (B ⊗ A) ◦− (C ⊗ A)
    (GEA1*3) (C ⊗ A) ⊗ (A −◦ B) ⊢ C ⊗ B
    (GEA2*1) A −◦ B ⊢ (A −◦ C) ◦− (B −◦ C)
    (GEA2*2) A −◦ B ⊢ (C ⊗ A) −◦ (C ⊗ B)
    (GEA2*3) (B ◦− C) ⊗ (C ⊗ A) ⊢ B ⊗ A
The laws (GEA1*1) and (GEA2*1) involve the two linear implications “◦−”, “−◦”: they introduce an inference expanding a single implicational type (B ◦− C in (GEA1*1), A −◦ B in (GEA2*1)) into a complex type in which opposite implications occur with respect to the original Geach laws (cf. laws (7a), (7b) in §1.1):
           B ◦− C
    ───────────────────── (GEA1*1)
    (A ◦− B) −◦ (A ◦− C)
           A −◦ B
    ───────────────────── (GEA2*1)
    (A −◦ C) ◦− (B −◦ C)
The other four GEA* laws include the multiplicative conjunction “⊗”:
           B ◦− C
    ──────────────────── (GEA1*2)
    (B ⊗ A) ◦− (C ⊗ A)
           A −◦ B
    ──────────────────── (GEA2*2)
    (C ⊗ A) −◦ (C ⊗ B)
    (C ⊗ A) ⊗ (A −◦ B)
    ────────────────── (GEA1*3)
          C ⊗ B
    (B ◦− C) ⊗ (C ⊗ A)
    ────────────────── (GEA2*3)
          B ⊗ A
We have noticed that the multiplicative conjunction “⊗” allows the composition of two independent objects (types). This approach, based on a geometrical view of the laws of Lambek's Syntactic Calculus (LC) [19], brings out clearly the role played by the connective “⊗” in the generation of types. As shown above, in the calculus NMILL (the multiplicative fragment of Noncommutative Intuitionistic Linear Logic) corresponding to LC, the connective “⊗” appears only on the left side of a sequent, while on the right side only the two implications “−◦”, “◦−” occur. We can think of this feature as expressing the basic intuitionistic nature of LC. On the other hand, the multiplicative conjunction “⊗” is independent from the other resources: in this respect, we can say that the operation induced by the multiplicative conjunction is free².
² It is this kind of observation that led Lambek to develop his calculus of Pregroups, in which the compact multiplicative conjunction, corresponding to the connective “⊗”, is self-dual and therefore is the only operation occurring on both sides of a sequent [20, 14].
4.2.1 Examples with GEA* rules
Let's start with the first of the GEA* rules:
    (GEA1*1) B ◦− C ⊢ (A ◦− B) −◦ (A ◦− C)
Again we have two readings of this rule: the first, (GEA1*1)(a), is the inference in which the conclusion of type A ◦− C is derived from two premises of type A ◦− B, B ◦− C; the second, (GEA1*1)(b), is the inference with conclusion (A ◦− B) −◦ (A ◦− C) from the premise B ◦− C:
    A ◦− B, B ◦− C
    ────────────── (GEA1*1)(a)
        A ◦− C
           B ◦− C
    ───────────────────── (GEA1*1)(b)
    (A ◦− B) −◦ (A ◦− C)
As an example of (GEA1*1), let's take two independent parts of speech like il fatto che (the fact that) and Antonio, where the first is an adverbial phrase, of type N ◦− S, taking a proposition to give a nominal expression, and the second is analyzed as a noun phrase taking a verb to give a proposition, of type S ◦− V:
    Il fatto che     Antonio
       N ◦− S        S ◦− V
    ────────────────────────
      Il fatto che Antonio
            N ◦− V
obtaining the incomplete expression Il fatto che Antonio (the fact that Antonio) of type N ◦− V³. Alternatively, one can use rule (GEA1*1)(b) and assign to the noun Antonio the complex type (N ◦− S) −◦ (N ◦− V):
           S ◦− V
    ───────────────────── (GEA1*1)(b)
    (N ◦− S) −◦ (N ◦− V)
One can obtain an expression of type N by adding a word of type V (i.e. N −◦ S), e.g. sorride (is smiling):
    Il fatto che Antonio     sorride
          N ◦− V                V
    ────────────────────────────────
      Il fatto che Antonio sorride
                   N
The laws (GEA1*2) and (GEA1*3) involve three operators, the multiplicative conjunction “⊗” and the two implications “◦−”, “−◦”. The first one may be viewed as an expansion rule,
    (GEA1*2) B ◦− C ⊢ (B ⊗ A) ◦− (C ⊗ A)
with the inferential readings
    B ◦− C, C ⊗ A
    ───────────── (GEA1*2)(a)
        B ⊗ A
           B ◦− C
    ──────────────────── (GEA1*2)(b)
    (B ⊗ A) ◦− (C ⊗ A)
while (GEA1*3) may be viewed as a contraction rule,
    (GEA1*3) (C ⊗ A) ⊗ (A −◦ B) ⊢ C ⊗ B
with the inferential readings:
    (C ⊗ A) ⊗ (A −◦ B)
    ────────────────── (GEA1*3)(a)
          C ⊗ B
           A −◦ B
    ──────────────────── (GEA1*3)(b)
    (C ⊗ A) −◦ (C ⊗ B)
³ Observe that this reading of the (GEA1*1) law corresponds to the Composition law (COM2) in LC.
As an example of (GEA1*2)(a), let's take B ◦− C to be the noun Antonio and C ⊗ A to be the verbal expression cammina spesso (is walking frequently):
    Antonio     cammina spesso
    B ◦− C          C ⊗ A
    ──────────────────────────
      Antonio cammina spesso
              B ⊗ A
which, by assigning type S ◦− V to the noun Antonio, type V to the verb cammina, and type Adv to the adverbial spesso, gives the string of type S ⊗ Adv⁴:
    Antonio     cammina spesso
    S ◦− V         V ⊗ Adv
    ──────────────────────────
      Antonio cammina spesso
             S ⊗ Adv
The rules (GEA1*3)(a), (GEA1*3)(b) follow a pattern involving the multiplicative conjunction “⊗” and the linear implication “−◦”; we give an example of rule (GEA1*3)(a), which allows one to reduce the complex type (C ⊗ A) ⊗ (A −◦ B) to the type C ⊗ B:
    Domani  ⊗  Antonio  ⊗  parte
      C           A        A −◦ B
    ─────────────────────────────
        Domani Antonio parte
               C ⊗ B
that is, by assigning the types Adv to the adverbial Domani (tomorrow), N to the noun Antonio, and N −◦ S to the verb parte (is leaving)⁵:
    Domani  ⊗  Antonio  ⊗  parte
     Adv          N        N −◦ S
    ─────────────────────────────
        Domani Antonio parte
              Adv ⊗ S
Observe that one can obtain the same result by applying the rules of Functional Application (APP1), (APP2): (S ◦− S, (N, N −◦ S ⊢ S) ⊢ S), i.e.
    S ◦− S     N     N −◦ S
               ────────────
                    S
    ───────────────────────
               S
⁴ Trivially, by taking an adverbial like spesso as a word of type S −◦ S, the string reduces to the simple type S.
⁵ As above, notice that by assigning type S ◦− S to the pre-sentential adverb domani one obtains a string of type S.
The laws (GEA2*1), (GEA2*2), (GEA2*3) follow patterns similar to the corresponding (GEA1*1), (GEA1*2), (GEA1*3) laws, the main difference being the occurrence of the linear left implication −◦ in place of the linear right implication ◦−, and vice versa.
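4.3 Switching laws

Within the SWI family, we have found two new laws strictly related to the previous ones, at least from a geometrical point of view:
– (SWI1*1) $(A \multimapinv B) \otimes C \vdash A \multimapinv (C \multimap B)$,
– (SWI1*2) $A \otimes (B \multimap C) \vdash (B \multimapinv A) \multimap C$,

with the following inferential readings:

$$\frac{(A \multimapinv B) \otimes C}{A \multimapinv (C \multimap B)}\ (\mathrm{SWI1^{*}1}) \qquad\qquad \frac{A \otimes (B \multimap C)}{(B \multimapinv A) \multimap C}\ (\mathrm{SWI1^{*}2})$$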
We observe that these laws involve the elimination of the multiplicative conjunction “⊗” in favour of the two implications “−◦”, “◦−”; in doing so, linguistic concatenation works incrementally, reducing the number of objects considered. The law (SWI1*1) combines a phrase or expression of type A◦−B with a fragment of language C occurring after it. The law (SWI1*2) combines a given phrase or expression of type B−◦C with a fragment of language A occurring before it.
4.3.1 Examples with SWI* rules

The following is an example in Italian of the rule (SWI1*1):

$$\frac{\text{Antonio} : A \multimapinv B \ \otimes\ \text{lentamente} : C}{\text{Antonio lentamente} : A \multimapinv (C \multimap B)}$$
If we put an object of type $C \multimap B$ after the string Antonio lentamente we get an object of type $A$:

$$\frac{\text{Antonio lentamente} : A \multimapinv (C \multimap B) \qquad \text{cammina} : C \multimap B}{\text{Antonio lentamente cammina} : A}$$
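Observe that by assigning type $S \multimapinv V$ to the noun Antonio, $\mathit{Adv}$ to the adverbial lentamente (slowly), and $\mathit{Adv} \multimap V$ to the verb cammina (is walking), the rule (SWI1*1) produces a string of type $S$:

$$\frac{\text{Antonio} : S \multimapinv V \ \otimes\ \text{lentamente} : \mathit{Adv}}{\text{Antonio lentamente} : S \multimapinv (\mathit{Adv} \multimap V)}$$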
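since for every object of type $\mathit{Adv} \multimap V$ (e.g. the verb cammina):

$$\frac{\text{Antonio lentamente} : S \multimapinv (\mathit{Adv} \multimap V) \qquad \text{cammina} : \mathit{Adv} \multimap V}{\text{Antonio lentamente cammina} : S}$$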
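The Italian example above can be replayed mechanically. The following is a minimal sketch, not the authors' implementation: the class names `Atom`, `Tensor`, `Over`, `Under` and the functions `swi1_1` and `apply_right` are hypothetical, and only the rule (SWI1*1) and rightward application are modelled.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Tensor:
    """left ⊗ right: concatenation of two fragments."""
    left: "Type"
    right: "Type"


@dataclass(frozen=True)
class Over:
    """result ◦− arg: yields `result` when followed by `arg`."""
    result: "Type"
    arg: "Type"


@dataclass(frozen=True)
class Under:
    """arg −◦ result: yields `result` when preceded by `arg`."""
    arg: "Type"
    result: "Type"


Type = Union[Atom, Tensor, Over, Under]


def swi1_1(t: Type) -> Type:
    """(SWI1*1): rewrite (A◦−B) ⊗ C into A◦−(C−◦B)."""
    if isinstance(t, Tensor) and isinstance(t.left, Over):
        a, b, c = t.left.result, t.left.arg, t.right
        return Over(a, Under(c, b))
    raise ValueError("rule (SWI1*1) does not apply")


def apply_right(f: Type, x: Type) -> Type:
    """Application: A◦−B followed by B yields A."""
    if isinstance(f, Over) and f.arg == x:
        return f.result
    raise ValueError("application does not apply")


# Type assignment taken from the example in the text.
S, V, Adv = Atom("S"), Atom("V"), Atom("Adv")
antonio = Over(S, V)         # S◦−V
lentamente = Adv             # Adv
cammina = Under(Adv, V)      # Adv−◦V

antonio_lentamente = swi1_1(Tensor(antonio, lentamente))
sentence = apply_right(antonio_lentamente, cammina)
```

Frozen dataclasses give structural equality for free, so checking that the argument type matches is a plain `==` comparison.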
Here is an example in English:

$$\frac{\text{John} : A \multimapinv B \ \otimes\ \text{was excessively} : C}{\text{John was excessively} : A \multimapinv (C \multimap B)}$$
since for every object of type $C \multimap B$ (e.g. the verb pleased):

$$\frac{\text{John was excessively} : A \multimapinv (C \multimap B) \qquad \text{pleased} : C \multimap B}{\text{John was excessively pleased} : A}$$
By assigning appropriate syntactic types to words we obtain as a result the type $S$ of propositions (with $V$ the type of simple verb phrases and $\mathit{VP}$ the type of full verb phrases):

$$\text{John} : S \multimapinv \mathit{VP} \ \otimes\ \text{was excessively} : V$$

since for every object of type $V \multimap \mathit{VP}$ (e.g. the verb pleased):

$$\frac{\text{John was excessively} : S \multimapinv (V \multimap \mathit{VP}) \qquad \text{pleased} : V \multimap \mathit{VP}}{\text{John was excessively pleased} : S}$$
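We can also take a different word order in which the adverbial excessively is the last word of the string:

$$\text{John} : S \multimapinv \mathit{VP} \ \otimes\ \text{was pleased} : V$$

since for every object of type $V \multimap \mathit{VP}$ (the adverb excessively):

$$\frac{\text{John was pleased} : S \multimapinv (V \multimap \mathit{VP}) \qquad \text{excessively} : V \multimap \mathit{VP}}{\text{John was pleased excessively} : S}$$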
5 Conclusions

In this paper we have analyzed a group of new rules (§4.1, §4.2, §4.3) that belong to Lambek’s Syntactic Calculus, but have not yet been considered in the literature. These rules follow from a set of laws that have a common geometrical representation: Application laws, Expansion laws and Type-raising laws are represented by cyclic multiplicative proof-nets with two axiom links (§3.2), whereas Composition laws, Geach laws and Switching laws are represented by cyclic multiplicative proof-nets with three axiom links (§3.3).

It is important to point out that the linguistic examples considered above can also be handled by other rules of the Syntactic Calculus: e.g. the inferential reading (GEA1*1)(a) can be treated as an instance of the law of Composition. However, the new laws studied in this paper express a different way to treat the generation of linguistic texts, in particular string concatenation. Words or phrases of a language are considered independently of the goal of obtaining a sentence and can be processed from left to right, from right to left, or from the middle of the text. Text generation proceeds, rather freely, step by step, and different results can be reached. We think that this kind of approach can have interesting applications both in language learning algorithms and, more generally, in natural language computation.

We are really grateful to Jim Lambek for his acute observations and discussions over the years; we are indebted to him for many of the intuitions behind this paper.
Sheaf Representations and Duality in Logic Steve Awodey
Abstract The fundamental duality theories relating algebra and geometry that were discovered in the mid-20th century can also be applied to logic via its algebraization under categorical logic. They thereby result in known and new completeness theorems. This idea can be taken even further via what is sometimes called “categorification” to establish a new connection between logic and geometry, a glimpse of which can also be had in topos theory.
Preface

Shortly after finishing my PhD thesis, I received a friendly letter from Professor Lambek in which he expressed interest in a result of mine that extended his work with Moerdijk [14]. He later cited my result in some papers on the philosophy of mathematics (including [12, 13]), in which he developed a congenial position that attempted to reconcile the various competing ones in foundations on the basis of results concerning the free topos, the sheaf representations considered here, and related considerations from categorical logic. The particular result in question, discussed in section 4 below, extends prior results by Lambek and Moerdijk [14] and Lambek [13], and was later extended further in joint work with my PhD students, first Henrik Forssell [2, 6], and then Spencer Breiner [3, 4]. This line of thought is, however, connected to a deeper one in modern mathematics, as I originally learned from the papers of Lambek. That insight inspired my original contribution and also the later joint work with my students, and it continues to fascinate and inspire me. The purpose of this survey is to sketch that line of thought, which owes more to Lambek than to anyone else.

The main idea, in a nutshell, is that the ground-breaking duality theories developed in the mid-20th century can also be applied to logic, via its algebraization under categorical logic, and they thereby result in known and new completeness theorems. This insight, which is already quite remarkable, can as it turns out be taken even further, via what is sometimes called “categorification”, to establish an even deeper relation between logic and geometry, a glimpse of which can also be had in topos theory, and elsewhere.

Steve Awodey, Carnegie Mellon University, Pittsburgh, PA, USA, e-mail: [email protected]
The author acknowledges the support of the Centre for Advanced Study (CAS) in Oslo, Norway, which funded and hosted the research project “Homotopy Type Theory and Univalent Foundations” during the 2018/19 academic year, as well as the support of the Air Force Office of Scientific Research through MURI grant FA9550-15-1-0053.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. C. Casadio and P. J. Scott (eds.), Joachim Lambek: The Interplay of Mathematics, Logic, and Linguistics, Outstanding Contributions to Logic 20, https://doi.org/10.1007/978-3-030-66545-6_2
1 Gelfand duality

Perhaps the ur-example of the sort of duality theory that we have in mind is the relation between topological spaces and commutative rings given by Gelfand duality (see [9] Ch. 4). To give a brief (and ahistorical) sketch, let $X$ be a space and consider the ring of real-valued continuous functions on $X$, with pointwise algebraic operations,

$$C(X) = \mathbf{Top}(X, \mathbb{R}).$$

This construction is a (contravariant) functor from “geometry” to “algebra”,

$$C : \mathbf{Top}^{\mathrm{op}} \to \mathbf{CRng}.$$

The functor $C$ can be shown to be full and faithful if we restrict to compact Hausdorff spaces $X$ and (necessarily bounded) continuous functions $C^*(X)$,

$$C^* : \mathbf{KHaus}^{\mathrm{op}} \to \mathbf{CRng}.$$

It then requires some further work to determine exactly which commutative rings are of the form $C^*(X)$ for some space $X$. These are called (commutative) $C^*$-algebras, and they can be characterized as commutative rings $A$ satisfying the following further conditions ([9], §4.4):
1. the additive group of $A$ is divisible and torsion free,
2. $A$ has a partial order compatible with the ring structure and such that $a^2 \geq 0$ for all $a \in A$,
3. $A$ is Archimedean, i.e. for every $a \in A$ there is an integer $n$ such that $n \cdot 1_A \geq a$,
4. if $1_A \geq n \cdot a$ for all positive integers $n$, then $a \leq 0$,
5. $A$ is complete in the norm $\|a\| = \inf\{\, q \in \mathbb{Q}^+ \mid q \cdot 1_A \geq a \text{ and } q \cdot 1_A \geq -a \,\}$.
There are many equivalent specifications (most using complex numbers in place of reals). Henceforth all rings are assumed to be commutative with unit.

Theorem 1 (Gelfand duality) The category $\mathbf{KHaus}$ of compact Hausdorff spaces is dual to the category $\mathbf{C^*Alg}$ of $C^*$-algebras and their homomorphisms, via the functor

$$C^* : \mathbf{KHaus}^{\mathrm{op}} \simeq \mathbf{C^*Alg}.$$

How can we recover the space $X$ from its ring of functions $C^*(X)$?
• The points $x \in X$ determine maximal ideals in the ring $C^*(X)$,
$$M_x = \{\, f : X \to \mathbb{R} \mid f(x) = 0 \,\},$$
and every maximal ideal in $C^*(X)$ is of this form for a unique $x \in X$.
• For any ring $A$, the (Zariski) topology on the set $\mathrm{Max}(A)$ of maximal ideals has a basis of open sets of the form:
$$B_a = \{\, M \in \mathrm{Max}(A) \mid a \notin M \,\}, \quad a \in A.$$
• If $A$ is a $C^*$-algebra, then this specification determines a compact Hausdorff space $X = \mathrm{Max}(A)$ such that $A \cong C^*(X)$.

A key step in the proof is the following:

Proposition 1 Let $A$ be a $C^*$-algebra. For any maximal ideal $M$ in $A$, the quotient field $A/M$ is isomorphic to $\mathbb{R}$.

It follows that there is an injection of rings,

$$A \rightarrowtail \prod_{M \in \mathrm{Max}(A)} A/M \cong \mathbb{R}^{\mathrm{Max}(A)}.$$

The image of this map can be shown to consist of the Zariski continuous functions, i.e. it is $C^*(\mathrm{Max}(A))$.
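As a sanity check on this recipe, it may help to work through the simplest case, a finite discrete space. The following instance is an illustration added here and is not taken from the text; for a finite product of fields the maximal ideals are exactly the kernels of the projections, so the $M_i$ below are all of them.

```latex
% Illustrative special case: X = {1, ..., n} with the discrete topology.
\begin{align*}
C^*(X) &= \mathbf{Top}(X, \mathbb{R}) \cong \mathbb{R}^n, \\
M_i &= \{\, f \in \mathbb{R}^n \mid f(i) = 0 \,\}, \qquad i = 1, \dots, n, \\
\mathbb{R}^n / M_i &\cong \mathbb{R} \quad \text{via } f \mapsto f(i), \\
B_a &= \{\, M_i \mid a(i) \neq 0 \,\}, \qquad a \in \mathbb{R}^n, \\
\mathbb{R}^n &\rightarrowtail \prod_{i=1}^{n} \mathbb{R}^n / M_i \cong \mathbb{R}^{\mathrm{Max}(\mathbb{R}^n)},
\qquad f \mapsto (f(1), \dots, f(n)).
\end{align*}
```

Here $\mathrm{Max}(\mathbb{R}^n)$ is an $n$-point discrete space, every function on it is continuous, and the injection is an isomorphism, in agreement with Theorem 1.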
2 Grothendieck duality for commutative rings

Grothendieck extended the Gelfand duality from $C^*$-algebras to all commutative rings by generalizing on the “geometric” side from compact Hausdorff spaces to the new notion of (affine) schemes,

$$\mathbf{Scheme}_{\mathrm{aff}}^{\mathrm{op}} \simeq \mathbf{CRng}.$$

The essential difference is to generalize the “ring of values” from the constant ring $\mathbb{R}$ to a ring $\mathcal{R}$ that “varies continuously over the space $X$”, i.e. a sheaf of rings. The various rings $\mathcal{R}_x$ that are the stalks of $\mathcal{R}$ generalize the local rings of real-valued functions that vanish at the points $x \in X$. This change allows every commutative ring $A$ to be seen as a ring of continuous functions on a suitable space $X_A$ (the prime spectrum of $A$), where the values of the functions are in a suitable sheaf of (local) rings $\mathcal{R}$ on $X_A$ (see [9] Ch. 5).

Definition 1 A ring is called local if it has a unique maximal ideal. Equivalently, if $0 \neq 1$, and

$$x + y \text{ is a unit} \quad\text{implies}\quad x \text{ is a unit or } y \text{ is a unit.} \tag{1}$$
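Condition (1) is easy to test by brute force in a small ring. The sketch below does this for the rings Z/nZ; the function name `is_local_Zn` is hypothetical, and the choice of Z/nZ as a test family is ours, not the text's.

```python
from math import gcd


def is_local_Zn(n: int) -> bool:
    """Check 0 != 1 and condition (1) in the ring Z/nZ by exhaustive search."""
    if n < 2:
        return False  # in Z/1Z we have 0 = 1, so the ring is not local
    # x is a unit in Z/nZ exactly when gcd(x, n) == 1.
    unit = [gcd(x, n) == 1 for x in range(n)]
    return all(
        unit[x] or unit[y]
        for x in range(n)
        for y in range(n)
        if unit[(x + y) % n]
    )
```

For instance, Z/6Z fails the test because 2 + 3 = 5 is a unit while neither 2 nor 3 is, whereas Z/8Z passes because a sum of two even residues is even.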
Theorem 2 (Grothendieck sheaf representation) Let $A$ be a ring. There is a space $X_A$ with a sheaf of rings $\mathcal{R}$ such that:
1. for every $p \in X_A$, the stalk $\mathcal{R}_p$ is a local ring,
2. there is an isomorphism $A \cong \Gamma(\mathcal{R})$, where $\Gamma(\mathcal{R})$ is the ring of global sections.
Thus every ring is isomorphic to the ring of global sections of a sheaf of local rings.

The space $X_A$ in the theorem is the prime spectrum $\mathrm{Spec}(A)$ of the ring $A$:
• points $p \in \mathrm{Spec}(A)$ are prime ideals $p \subseteq A$,
• the (Zariski) topology has basic opens of the form:
$$B_f = \{\, p \in \mathrm{Spec}(A) \mid f \notin p \,\}, \quad f \in A.$$

Note the similarity to the space $\mathrm{Max}(A)$ of maximal ideals from the Gelfand case. Unlike that case, however, the functor

$$\mathrm{Spec} : \mathbf{CRng}^{\mathrm{op}} \to \mathbf{Top}$$

is not full, and so we need to equip the spaces $\mathrm{Spec}(A)$ with an additional structure. The structure sheaf $\mathcal{R}$ is determined at a basic open set $B_f$ by “localizing” $A$ at $f$,

$$\mathcal{R}(B_f) = [f]^{-1}A,$$

where $A \to [f]^{-1}A$ freely inverts all of the elements $f, f^2, f^3, \dots$. The stalk $\mathcal{R}_p$ of this sheaf at a point $p \in \mathrm{Spec}(A)$ is then seen to be the localization of $A$ at $S_p = A \setminus p$,

$$\mathcal{R}_p = S_p^{-1}A.$$

The affine scheme $(\mathrm{Spec}(A), \mathcal{R})$ presents $A$ as a “ring of continuous functions” in the following sense:
• Each element $f \in A$ determines a “continuous function”,
$$\hat{f} : \mathrm{Spec}(A) \to \mathcal{R},$$
except that the ring $\mathcal{R}$ is itself “varying continuously over the space $\mathrm{Spec}(A)$”, i.e. it is a sheaf, and the function $\hat{f}$ is then a global section of the sheaf $\mathcal{R}$.
• Each stalk $\mathcal{R}_p$ is a local ring, with a unique maximal ideal, corresponding to “those functions $\hat{f} : \mathrm{Spec}(A) \to \mathcal{R}$ that vanish at $p$”.
• $(\mathrm{Spec}(A), \mathcal{R})$ is a “representation” of $A$ in the sense that $f \mapsto \hat{f}$ is an isomorphism of rings $A \cong \Gamma(\mathcal{R})$.

There is always an injective homomorphism from the global sections of a sheaf into the product of all the stalks,

$$\Gamma(\mathcal{R}) \rightarrowtail \prod_{p} \mathcal{R}_p.$$

Thus we have the following:

Corollary 1 (“Subdirect-product representation”) Every ring $A$ is isomorphic to a subring of a “direct product” of local rings. I.e. there is an injective ring homomorphism

$$A \rightarrowtail \prod_{p} \mathcal{R}_p,$$

where the $\mathcal{R}_p$ are all local rings.
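For a concrete feel of Corollary 1, one can take A = Z/nZ. The sketch below relies on the standard fact, assumed here rather than stated in the text, that the prime ideals of Z/nZ are generated by the primes p dividing n and that the stalk at (p) is Z/p^k, with p^k the exact power of p dividing n. All function names are hypothetical.

```python
def prime_power_factors(n: int) -> dict:
    """Map each prime p dividing n to the exact power p**k dividing n."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 1) * p
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 1) * n
    return factors


def basic_open(f: int, n: int) -> set:
    """B_f: the primes p of Z/nZ such that f does not lie in the ideal (p)."""
    return {p for p in prime_power_factors(n) if f % p != 0}


def to_stalks(a: int, n: int) -> tuple:
    """Image of a under Z/nZ -> product of the stalks Z/p^k."""
    return tuple(a % q for q in prime_power_factors(n).values())


n = 12
images = {to_stalks(a, n) for a in range(n)}
```

Since the twelve images are pairwise distinct, the map into the product Z/4 × Z/3 of local rings is injective, which in this case is just the Chinese remainder theorem.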
3 Lambek-Moerdijk sheaf representation for toposes

Definition 2 Call a (small, elementary) topos $\mathcal{E}$ sublocal (see footnote 2) if its subterminal lattice $\mathrm{Sub}_{\mathcal{E}}(1)$ has a unique maximal ideal. Equivalently, $0 \neq 1$ and for $x, y \in \mathrm{Sub}_{\mathcal{E}}(1)$:

$$x \vee y = 1 \quad\text{implies}\quad x = 1 \text{ or } y = 1.$$

Footnote 2. In [14], and elsewhere, the term local was used for the concept here called sublocal, and another term was then required in [1] for the stronger condition that we now call local in Definition 3 below.

Note the formal analogy to the concept of local ring. In [14] the following analogue of the Grothendieck sheaf representation for rings is given for toposes (henceforth, topos unqualified will mean small, elementary topos):

Theorem 3 (Lambek-Moerdijk sheaf representation) Let $\mathcal{E}$ be a topos. There is a space $X_{\mathcal{E}}$ with a sheaf of toposes $\widetilde{\mathcal{E}}$ such that:
1. for every $p \in X_{\mathcal{E}}$, the stalk $\widetilde{\mathcal{E}}_p$ is a sublocal topos,
2. for the topos $\Gamma(\widetilde{\mathcal{E}})$ of global sections, there is an isomorphism $\mathcal{E} \cong \Gamma(\widetilde{\mathcal{E}})$.
Thus every topos is isomorphic to the topos of global sections of a sheaf of sublocal toposes.

The space $X_{\mathcal{E}}$ mentioned in the theorem is what may be called the subspectrum of the topos, $X_{\mathcal{E}} = \mathrm{sSpec}(\mathcal{E})$; it is the prime ideal spectrum of the distributive lattice $\mathrm{Sub}_{\mathcal{E}}(1)$:
• the points $P \in \mathrm{sSpec}(\mathcal{E})$ are prime ideals $P \subseteq \mathrm{Sub}_{\mathcal{E}}(1)$,
• the topology has basic opens of the form:
$$B_q = \{\, P \in \mathrm{sSpec}(\mathcal{E}) \mid q \notin P \,\}, \quad q \in \mathrm{Sub}_{\mathcal{E}}(1).$$
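The prime ideal spectrum of a small distributive lattice can be computed by exhaustive search. The sketch below uses the divisors of n under divisibility (meet = gcd, join = lcm, top = n) as a stand-in for a subterminal lattice; this choice of lattice and the function names are assumptions made for illustration only.

```python
from itertools import combinations
from math import gcd


def lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def prime_ideals(n: int) -> list:
    """Prime ideals of the lattice of divisors of n (meet = gcd, join = lcm)."""
    divs = [d for d in range(1, n + 1) if n % d == 0]
    found = []
    for size in range(1, len(divs)):  # nonempty, proper subsets only
        for subset in combinations(divs, size):
            ideal = set(subset)
            down_closed = all(d in ideal for x in ideal for d in divs if x % d == 0)
            join_closed = all(lcm(x, y) in ideal for x in ideal for y in ideal)
            prime = all(
                x in ideal or y in ideal
                for x in divs
                for y in divs
                if gcd(x, y) in ideal
            )
            if down_closed and join_closed and prime:
                found.append(frozenset(ideal))
    return found


def has_unique_maximal_ideal(n: int) -> bool:
    """Lattice analogue of 'sublocal': exactly one maximal ideal."""
    ideals = prime_ideals(n)
    maximal = [i for i in ideals if not any(i < j for j in ideals)]
    return len(maximal) == 1
```

The divisors of 12 have two maximal ideals, matching the failure of the join condition (lcm(3, 4) = 12 although neither 3 nor 4 is the top element), while the chain of divisors of 4 has exactly one.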
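Note the close analogy to the space $\mathrm{Spec}(A)$ for a commutative ring $A$. The lattice of all open sets of $\mathrm{sSpec}(\mathcal{E})$ is then (isomorphic to) the ideal completion of the lattice $\mathrm{Sub}(1)$,

$$\mathcal{O}(\mathrm{sSpec}(\mathcal{E})) = \mathrm{Idl}(\mathrm{Sub}(1)).$$

Next, let us define a structure sheaf $\widetilde{\mathcal{E}}$ on $\mathrm{sSpec}(\mathcal{E})$ by “slicing” $\mathcal{E}$ at $q \in \mathrm{Sub}(1)$,

$$\widetilde{\mathcal{E}}(B_q) = \mathcal{E}/q.$$

This takes the place of the localization of a ring $A$ at a basic open $B_f$:

$$\mathcal{R}_A(B_f) = [f]^{-1}A.$$

Note that $\mathcal{E}/q$ “inverts” all those elements $p \in \mathrm{Sub}(1)$ with $q \leq p$, in the sense that the canonical map $q^* : \mathcal{E} \to \mathcal{E}/q$ takes every $p \rightarrowtail 1$ to $q \wedge p \rightarrowtail q$, and so if $q \leq p$ then $q^*p = 1_q : q \to q$. The fact that $\widetilde{\mathcal{E}}$ is indeed a sheaf on $\mathrm{sSpec}(\mathcal{E})$ comes down to showing that, for any $p, q \in \mathrm{Sub}(1)$, there is a canonical equalizer of toposes (and logical morphisms),

$$\mathcal{E}/(p \vee q) \rightarrowtail \mathcal{E}/p \times \mathcal{E}/q \rightrightarrows \mathcal{E}/(p \wedge q).$$

This in turn says that in a diagram of the form: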
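[Diagram: a cube whose base is the square of monos $p \wedge q \rightarrowtail p$, $p \wedge q \rightarrowtail q$, $p \rightarrowtail p \vee q$, $q \rightarrowtail p \vee q$, with the objects $X$, $P$, $Q$, $Y$ lying over $p \wedge q$, $p$, $q$, $p \vee q$ respectively.]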
with a pushout-pullback of monos in the base, and the two vertical squares involving $X$ given as pullbacks, one can complete the cube as indicated by first forming the pushout $Y$ on the top face, and then obtaining the front vertical map from $Y$, and the resulting new vertical faces will then also be pullbacks. This is a rather special “descent condition” for the presheaf $\mathcal{E}/{-}$.

The stalk $\widetilde{\mathcal{E}}_P$ of this sheaf at a point $P \in \mathrm{sSpec}(\mathcal{E})$ is computed as the filter-quotient of $\mathcal{E}$ over the complement of the prime ideal $P \subseteq \mathrm{Sub}_{\mathcal{E}}(1)$, i.e. the prime filter $P^c = \mathrm{Sub}(1) \setminus P$. Thus for the stalk we have the (filtered) colimit (taken in $\mathbf{Cat}$, but again a topos):

$$\widetilde{\mathcal{E}}_P = \varinjlim_{q \notin P} \mathcal{E}/q.$$

For this stalk topos, one then has the subterminal lattice:

$$\mathrm{Sub}_{\widetilde{\mathcal{E}}_P}(1) \cong \mathrm{Sub}_{\mathcal{E}}(1)/P^c,$$

where $\mathrm{Sub}_{\mathcal{E}}(1)/P^c$ is the quotient Heyting algebra by the prime filter $P^c$. Since for the prime filter $P^c$ we have $p \vee q \in P^c$ implies $p \in P^c$ or $q \in P^c$, it follows that the stalk topos $\widetilde{\mathcal{E}}_P$ is indeed sublocal. Finally, for the global sections of $\widetilde{\mathcal{E}}$ we have simply:

$$\Gamma(\widetilde{\mathcal{E}}) = \widetilde{\mathcal{E}}(B_1) = \mathcal{E}/1 \cong \mathcal{E}.$$

Thus the topos of global sections of $\widetilde{\mathcal{E}}$ is indeed isomorphic to the original topos $\mathcal{E}$. In this way, $\mathcal{E}$ is isomorphic to the global sections of a sheaf of sublocal toposes. Again, there is always an injection from the global sections into the product of the stalks, which in this case gives a conservative logical morphism of the form

$$\mathcal{E} \cong \Gamma(\widetilde{\mathcal{E}}) \rightarrowtail \prod_{P \in \mathrm{sSpec}(\mathcal{E})} \widetilde{\mathcal{E}}_P.$$
Corollary 2 Every topos has a conservative logical morphism into a product of sublocal toposes.
3.1 Lambek’s modified sheaf representation for toposes

Now consider the following logical interpretation of the sheaf representation theorem for toposes and its corollary.
• A topos $\mathcal{E}$ can be regarded as the syntactic category $\mathcal{E}_T$ of a theory $T$ in Intuitionistic Higher-Order Logic (IHOL). Thus for any sentence $\varphi$ in the language of the theory $T$,
$$\mathcal{E}_T \models \varphi \quad\text{iff}\quad T \vdash \varphi.$$
• A logical functor $\mathcal{E}_T \to \mathcal{F}$ between toposes induces an interpretation of the language of $T$ into $\mathcal{F}$; one which, moreover, satisfies all the sentences that hold in $\mathcal{E}_T$, and is thus a $T$-model. Such a functor is called conservative if it reflects isomorphisms; for logical functors, this is the same as being faithful. A conservative logical functor $f : \mathcal{E}_T \rightarrowtail \mathcal{F}$ therefore reflects satisfaction, in the sense that for any sentence $\varphi$ in the language of $T$,
$$\mathcal{F} \models \varphi^f \quad\text{implies}\quad \mathcal{E}_T \models \varphi,$$
where $\varphi^f$ is the interpretation of $\varphi$ under the model induced by $f$.
• A sublocal topos $\mathcal{S}$ is one that is consistent, $\mathcal{S} \nvDash \bot$, and has the disjunction property,
$$\mathcal{S} \models \varphi \vee \psi \quad\text{iff}\quad \mathcal{S} \models \varphi \text{ or } \mathcal{S} \models \psi,$$
for all sentences $\varphi, \psi$. Such sublocal toposes are more Set-like than a general one, and can thus be regarded as suitable semantics for logical theories.
• The “subdirect-product representation” given by Corollary 2 is a logical completeness theorem with respect to interpretations $\mathcal{E}_T \to \mathcal{S}$ of $T$ into sublocal toposes $\mathcal{S}$. It says that, for any theory $T$ in IHOL, a sentence $\varphi$ is provable, $T \vdash \varphi$, iff it holds in every interpretation of $T$ in a sublocal topos $\mathcal{S}$. Thus IHOL is complete with respect to models in sublocal toposes.
• The sheaf representation is a Kripke-style completeness theorem for IHOL, with $\widetilde{\mathcal{E}}$ as a “sheaf of possible worlds” (see [12]).

Under this interpretation, however, the present sheaf representation is not entirely satisfactory, because we would really like the “semantic toposes” $\mathcal{S}$ to be even more Set-like, in addition to being sublocal, by also having the existence property:

$$\mathcal{S} \models (\exists x : A)\,\varphi(x) \quad\text{iff}\quad \mathcal{S} \models \varphi(a) \text{ for some closed } a : A.$$
/ Set
Sheaf Representations and Duality in Logic
47
preserves all finite coproducts and epimorphisms. Note that a local topos is exactly one that is consistent and has both the disjunction and existence properties. In the paper [12], Lambek gave the following improvement over the sublocal sheaf representation: Theorem 4 (Lambek sheaf representation) Let E be a topos. There is a faithful logical functor E F and a space X with a sheaf of toposes F such that: 1. for every p ∈ X, the stalk Fp is a local topos, 2. for the global sections of F there is an isomorphism F ∼ = Γ (F). Thus every topos is a subtopos of one that is isomorphic to the global sections of a sheaf of local toposes.
The proof was inspired by the Henkin completeness theorem for higher-order logic [8], and first performs a sort of “Henkinization” of E to get a bigger topos E F with witnesses for all existential quantifiers, in a suitable sense. This result then suffices for a subdirect-product embedding of any topos E into a product of local toposes, and therefore gives the desired logical completeness of IHOL with respect to such toposes, which are much more Set-like.
4 Local sheaf representation for toposes

The result from [1] mentioned above was this:

Theorem 5 (Local topos sheaf representation) Let $\mathcal{E}$ be a topos. There is a space $X_{\mathcal{E}}$ with a sheaf of toposes $\widetilde{\mathcal{E}}$ such that:
1. for every $p \in X_{\mathcal{E}}$, the stalk $\widetilde{\mathcal{E}}_p$ is a local topos,
2. for the global sections of $\widetilde{\mathcal{E}}$ there is an equivalence $\mathcal{E} \simeq \Gamma(\widetilde{\mathcal{E}})$.
Thus every topos is equivalent to the global sections of a sheaf of local toposes.

As before, this gives a subdirect-product representation of $\mathcal{E}$,

$$\mathcal{E} \rightarrowtail \prod_{p \in X_{\mathcal{E}}} \mathcal{S}_p,$$

into a product of local toposes $\mathcal{S}_p = \widetilde{\mathcal{E}}_p$, and therefore implies the desired logical completeness of IHOL with respect to local toposes. This stronger result also gives better “Kripke semantics” for IHOL, since the “sheaf of possible worlds” (in the sense of [12]) now has local stalks. For classical higher-order logic, something more can be said:
Lemma 1 Every Boolean, local topos $\mathcal{S}$ is well-pointed, i.e. the global sections functor,

$$\Gamma = \mathrm{Hom}_{\mathcal{S}}(1, -) : \mathcal{S} \to \mathbf{Set},$$

is faithful.

Corollary 3 Every Boolean topos is isomorphic to the global sections of a sheaf of well-pointed toposes.

For Boolean toposes $\mathcal{B}$, we therefore have an embedding,

$$\mathcal{B} \rightarrowtail \prod_{p \in X} \mathcal{S}_p,$$

as a subdirect-product of well-pointed toposes $\mathcal{S}_p$ (this is [7], Thm 3.22). The logical counterpart now says:

Corollary 4 Classical HOL is complete with respect to models in well-pointed toposes.

A well-pointed topos is essentially a model of classical Zermelo set theory ([16], §VI.10). Indeed, it is worth emphasizing that the models of HOL here are standard models of classical HOL (i.e. with full function and power sets), taken in varying models $\mathcal{S}$ of set theory. Finally, taking the global sections $\Gamma : \mathcal{S}_p \rightarrowtail \mathbf{Set}$ of each well-pointed topos $\mathcal{S}_p$, we get a faithful functor from any Boolean topos $\mathcal{B}$ into a power of $\mathbf{Set}$:

$$\mathcal{B} \rightarrowtail \prod_{p \in X} \mathcal{S}_p \rightarrowtail \prod_{p \in X} \mathbf{Set} \cong \mathbf{Set}^X.$$

However, the various composites $\mathcal{B} \to \mathcal{S}_p \rightarrowtail \mathbf{Set}$ are now not logical functors, because they need not preserve exponentials; they do, however, preserve the first-order logical structure (they are also exact; thus we have another proof of [7] theorem 3.24). These composites are exactly what the logician calls a “Henkin” or “non-standard” model of HOL in $\mathbf{Set}$. In this way, we recover the familiar “Henkin completeness theorem for HOL” [8]:

Corollary 5 Classical HOL is complete with respect to Henkin models in Set.

For the proof of the local topos sheaf representation theorem, these “Henkin models” will be taken as the points of the space $X_{\mathcal{E}}$, which we call the space of models (following [5]). In the sublocal case, the points were the prime ideals $P \subseteq \mathrm{Sub}(1)$. These correspond exactly to the lattice homomorphisms

$$p : \mathrm{Sub}_{\mathcal{E}}(1) \to 2.$$

For the local case, we instead take coherent functors

$$P : \mathcal{E} \to \mathbf{Set},$$
which correspond to (Henkin) models of the “theory” $\mathcal{E}$ (see footnote 3). The topology on $X_{\mathcal{E}}$ can be described roughly as follows (see [1] for more details, but the idea for this topology originates with [10, 5]; it was also used in [2, 4]). To simplify things, let us regard $\mathcal{E}$ as a classifying topos for a theory $T$, and say that a model $P : \mathcal{E} \to \mathbf{Set}$ “satisfies” a sentence $\varphi$, which we may identify with its interpretation $[\![\varphi]\!] \rightarrowtail 1_{\mathcal{E}}$, if $[\![\varphi]\!]^P = P([\![\varphi]\!]) = 1$. Then we could mimic the subspectrum by taking as a basic open set all those models $P$ that satisfy some fixed $\varphi$:

$$V_{\varphi} = \{\, P \in X_{\mathcal{E}} \mid P \models \varphi \,\}.$$

However, it turns out that there are too few such basic opens; thus we will also use formulas $\varphi(x)$ with free variables $x$. In order to say when $P \models \varphi(x)$ we therefore equip each model $P$ with a “labelling” $\alpha : \kappa \to |P|$ by elements of some fixed, large set $\kappa$, and we then define the notion of satisfaction of a formula by such a labelled model $(P, \alpha) \models \varphi(x)$, which we write suggestively as $P \models \varphi(\alpha)$. Thus the points of $X_{\mathcal{E}}$ are actually pairs $(P, \alpha)$, and the basic open sets then have the form

$$V_{\varphi(x)} = \{\, (P, \alpha) \in X_{\mathcal{E}} \mid P \models \varphi(\alpha) \,\}$$

for all formulas $\varphi(x)$. (This description is not entirely accurate, but it gives the idea for present purposes; see [1, 2, 4] for details.)

The structure sheaf $\widetilde{\mathcal{E}}$ on $X_{\mathcal{E}}$ is again defined by “slicing” $\mathcal{E}$,

$$\widetilde{\mathcal{E}}(A) = \mathcal{E}/A \quad\text{for } A \in \mathcal{E},$$

but now it is first shown to be a stack on $\mathcal{E}$ itself (with respect to the coherent topology). What this means is:
1. for any $A, B \in \mathcal{E}$, the canonical map is an equivalence,
$$\mathcal{E}/(A + B) \simeq \mathcal{E}/A \times \mathcal{E}/B,$$
2. for any epimorphism $e : B \twoheadrightarrow A$, the canonical map is an equivalence,
$$\mathcal{E}/A \simeq \mathrm{des}(\mathcal{E}/B, e),$$
where $\mathrm{des}(\mathcal{E}/B, e)$ is the category of objects of $\mathcal{E}/B$ equipped with descent data with respect to $e : B \twoheadrightarrow A$.

The stack is then strictified to a sheaf of categories (see [1]), and then finally transferred from $\mathcal{E}$ to the space $X_{\mathcal{E}}$ of models using a topos-theoretic covering theorem due to Butz and Moerdijk [5]. Call the resulting sheaf of categories on $X_{\mathcal{E}}$ again $\widetilde{\mathcal{E}}$.

Footnote 3. Of course, the collection of all such functors may be too big to form a set. The remedy, as explained in the paper [1], is to choose a suitable cardinal bound $\kappa$ on the size of the models $P$.
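The stalk $\widetilde{\mathcal{E}}_{(P,\alpha)}$ of the (transferred) sheaf at a point $(P, \alpha)$ can be calculated as the colimit,

$$\widetilde{\mathcal{E}}_{(P,\alpha)} = \varinjlim_{A \in \int_{\mathcal{E}} P} \mathcal{E}/A,$$

where the (filtered!) category of elements $\int_{\mathcal{E}} P$ of the model $P : \mathcal{E} \to \mathbf{Set}$ takes the place of the prime filter. As a key step, one shows that these stalks are indeed local toposes whenever $P : \mathcal{E} \to \mathbf{Set}$ is a coherent functor. Finally, for the global sections functor $\Gamma : \mathrm{Sh}(X_{\mathcal{E}}) \to \mathbf{Set}$, we still have:

$$\Gamma(\widetilde{\mathcal{E}}) \simeq \mathcal{E}/1 \cong \mathcal{E}.$$

In this way, $\mathcal{E}$ is indeed equivalent to the topos of global sections of a sheaf of local toposes on a space.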
5 Stone duality for Boolean algebras

The foregoing sheaf representations for toposes suggest an analogous treatment for pretoposes, which would actually be somewhat better, because the Set-valued models used for the points (and coming from the global sections of the stalks) would then all be standard models, rather than Henkin-style, non-standard models. This suggests the possibility of a duality theory for first-order logic, analogous to that for affine schemes and commutative rings, with pretoposes playing the role of rings, the space of models playing the role of the prime spectrum, and the sheaf representation providing a structure sheaf.

This is more than just an analogy: it is a generalization of the classical Stone duality for Boolean algebras (= Boolean rings). From a logical point of view, the classical duality theory for Boolean algebras is the propositional case of the first-order one that we are proposing for pretoposes. (There is also a generalization from classical to intuitionistic logic, which is less of a stretch.) Thus let us briefly review the “propositional case” of classical Stone duality for Boolean algebras, before proceeding to the “first-order” case of pretoposes.

Recall (e.g. from [9], Ch. 5) that for a Boolean algebra $B$ we have the Stone space $\mathrm{Stone}(B)$, which is defined exactly as was the spectrum of the subterminal lattice $\mathrm{Sub}_{\mathcal{E}}(1)$ of a topos $\mathcal{E}$, i.e. $\mathrm{Stone}(B) = \mathrm{Spec}(B)$ is the prime spectrum of $B$ (prime ideals in a Boolean algebra are always maximal, and thus are exactly the complements of the ultrafilters, which are the usual points of $\mathrm{Stone}(B)$). We can represent the points $p \in \mathrm{Spec}(B)$ as Boolean homomorphisms,

$$p : B \to 2.$$
And we can recover the Boolean algebra $B$ from the space $\mathrm{Spec}(B)$ as the clopen subsets, which are represented by continuous maps,

$$f : \mathrm{Spec}(B) \to 2,$$

where (the underlying set of) 2 is given the discrete topology. Note that this is also a sheaf representation, but a constant one! The stalks are local Boolean algebras, which are always just 2. Stone’s representation theorem for Boolean algebras then says that there is always an injective homomorphism,

$$B \rightarrowtail 2^X \cong \mathcal{P}(X),$$

for a set $X$, which we can take to be the set of points of $\mathrm{Spec}(B)$, i.e. the ultrafilters. This is therefore the usual subdirect-product embedding resulting from the sheaf representation. There is, moreover, a contravariant equivalence of categories,

$$\mathrm{Spec} : \mathbf{Bool} \simeq \mathbf{Stone}^{\mathrm{op}} : \mathrm{Clop}.$$

Both of the functors Spec and Clop are given by homming into 2, albeit in two different categories.

Logically, a Boolean algebra $B$ is always the “Lindenbaum-Tarski algebra” of a theory $T$ in propositional logic, and a Boolean homomorphism $B \to 2$ is then the same thing as a $T$-model, i.e. a “truth-valuation”. Thus the points of $\mathrm{Spec}(B)$ are models of the propositional theory $T$. We are going to generalize this situation by replacing Boolean algebras with (Boolean) pretoposes, representing first-order logical theories, and replacing 2-valued models with Set-valued models.
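In the finite case the whole duality can be checked by brute force. The sketch below takes B to be the powerset of a three-element set, enumerates every Boolean homomorphism B → 2, and recovers B from the resulting set of points; the function names are hypothetical, and only finite algebras are covered, where every ultrafilter is principal and the topology on the spectrum is discrete.

```python
from itertools import combinations, product


def powerset(xs):
    """All subsets of xs as frozensets: the Boolean algebra P(xs)."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def homs_to_2(xs):
    """Enumerate all Boolean homomorphisms P(xs) -> 2 by exhaustive search."""
    B = powerset(xs)
    top, bottom = frozenset(xs), frozenset()
    homs = []
    for values in product([False, True], repeat=len(B)):
        h = dict(zip(B, values))
        if (h[top] and not h[bottom]
                and all(h[a & b] == (h[a] and h[b]) for a in B for b in B)
                and all(h[a | b] == (h[a] or h[b]) for a in B for b in B)):
            homs.append(h)
    return B, homs


X = [0, 1, 2]
B, homs = homs_to_2(X)

# Each homomorphism is evaluation at a point x: h(a) holds exactly when x is in a.
points = sorted(next(x for x in X if h[frozenset([x])]) for h in homs)

# Sending a to the set of homomorphisms with h(a) true recovers B as subsets of Spec(B).
clopens = {frozenset(i for i, h in enumerate(homs) if h[a]) for a in B}
```

Preservation of complements need not be checked separately, since it follows from preservation of top, bottom, meets and joins.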
6 Stone duality for Boolean pretoposes
M. Makkai [15] has discovered a Stone duality for Boolean pretoposes with respect to what he terms ultragroupoids on the geometric/semantic side. These are groupoids (of models and isomorphisms) equipped with a primitive structure of ultraproducts of models, together with groupoid homomorphisms that preserve ultraproducts. The result is an equivalence of categories:
\[ \mathsf{BoolPreTop} \;\simeq\; \mathsf{UltraGpd}^{\mathrm{op}} \]
which, as in the propositional case, is mediated by homming into a special object, now Set in place of 2. This replacement, and the remarkable duality theory that results, is an instance of what is sometimes called “categorification”, an idea that plays a guiding role throughout categorical logic. It follows in particular that every Boolean pretopos B has a pretopos embedding into a power of Set,
\[ B \rightarrowtail \mathsf{Set}^X, \]
where X is a set of “models”, i.e. pretopos functors $M : B \to \mathsf{Set}$. We will show below that this last fact, which is essentially Gödel’s completeness theorem for first-order logic, is also a “subdirect-product representation” resulting from a sheaf representation of B. But first we need to make a suitable “space of models”. In joint work with H. Forssell [2, 6] Makkai’s ultragroupoids of models were replaced by topological groupoids of models, equipped with a Stone-Zariski type logical topology similar to the one used above for the local sheaf representation for toposes. In overview, our (topological) generalization of Stone duality from Boolean algebras to Boolean pretoposes works like this:

Propositional case                         First-order case
Boolean algebra B                          Boolean pretopos B
  (propositional theory)                     (first-order theory)
homomorphism B → 2                         pretopos functor B → Set
  (truth-valuation)                          (elementary model)
topological space Spec(B)                  topological groupoid Spec(B)
  of all valuations                          of all models and isos
continuous function Spec(B) → 2            continuous functor Spec(B) → Set
  (clopen set)                               (coherent sheaf)
To give a bit more detail of a few of the steps:
• The spectrum Spec(B) of a Boolean pretopos B is not just a space, but a topological groupoid, consisting of a space of (labelled) models (M, α) and a space of isos $i : M \cong N$. These are topologized by a logical topology of the same kind already considered, where the basic opens (of the space of models) are determined by satisfaction of formulas,
\[ V_{\varphi(x)} = \{ (M, \alpha) \in \mathrm{Spec}(B) \mid M \models \varphi(\alpha) \}. \]
• Morphisms $f : \mathrm{Spec}(B) \to \mathrm{Spec}(B')$ are just continuous groupoid homomorphisms. Every pretopos functor $F : B' \to B$ gives rise to such a homomorphism, essentially by precomposition, since
\[ \mathrm{Spec} : \mathsf{BoolPreTop} \to \mathsf{StoneTopGpd}^{\mathrm{op}} \]
is representable, $\mathrm{Spec}(B) \cong \mathsf{BoolPreTop}(B, \mathsf{Set})$. Thinking of such a pretopos functor $F : B' \to B$ as a “translation of theories”, the semantic functor Spec(F) acts on models in the corresponding way.
• Recovering B from Spec(B) amounts to recovering an elementary theory (up to pretopos completion) from its models. This is done using hard results from topos theory due mainly to Joyal-Tierney and Joyal-Moerdijk [11, 10, 5]. Specifically, one shows that the category of equivariant sheaves on the topological groupoid Spec(B) is equivalent to the (Grothendieck) topos of sheaves on B for the coherent topology,
\[ \mathrm{Sh}_{eq}(\mathrm{Spec}(B)) \simeq \mathrm{Sh}(B). \]
Logically, this gives two different presentations of the (Grothendieck) classifying topos of a first-order theory T, such that $B = B_T$ is the pretopos completion of (the syntactic category of) T, and Spec(B) is then the groupoid of T-models. It follows that B is equivalent to the subcategory of coherent objects of this topos; thus B is equivalent to the category of coherent, equivariant sheaves on the topological groupoid Spec(B). These can be shown to correspond to certain continuous homomorphisms Spec(B) → Set, where the latter is the topological groupoid of sets, equipped with a suitable topology. In this sense, the coherent, equivariant sheaves generalize the clopen sets in a Stone space.
Unlike in the case of Boolean algebras, however, and unlike in Makkai’s theorem using ultragroupoids, we do not have an equivalence of categories, but only an adjunction [2, 6]:
Theorem 6 (Awodey-Forssell) There is a contravariant adjunction,
\[ \mathrm{Spec} : \mathsf{BoolPreTop} \;\rightleftarrows\; \mathsf{StoneTopGpd}^{\mathrm{op}} : \mathrm{Coh}, \]
in which both functors are given by homming into Set. In particular, the “semantic” functor,
\[ \mathrm{Spec} : \mathsf{BoolPreTop} \longrightarrow \mathsf{StoneTopGpd}^{\mathrm{op}} \]
is not full: there are continuous functors between the groupoids of models that do not come from a “translation of theories”. Compare the case of commutative rings A, B, where an arbitrary continuous function
\[ f : \mathrm{Spec}(B) \to \mathrm{Spec}(A) \]
need not come from a ring homomorphism $h : A \to B$. We can of course characterize the “semantic functors” arising from a pretopos morphism as those that pull coherent sheaves back to coherent sheaves. Such “coherent” maps $f : \mathrm{Spec}(B) \to \mathrm{Spec}(B')$ will then correspond to pretopos maps $F : B' \to B$, simply by $f(M) \cong M \circ F$.
7 Sheaf representation for pretoposes
We now want to cut down the morphisms between the semantic groupoids Spec(B) to just the coherent ones that come from pretopos functors. We will do this by endowing Spec(B) with additional structure that is preserved by all such “syntactic” maps. Specifically, as for rings and affine schemes, we can equip the spectrum Spec(B) of the pretopos B with a “structure sheaf” $\widetilde{B}$, defined just as in the sheaf representation for toposes:
• Start with the pseudofunctor $\widetilde{B} : B^{\mathrm{op}} \to \mathsf{Cat}$ with
\[ \widetilde{B}(X) \cong B/X, \qquad X \in B. \]
The prestack $\widetilde{B}$ is actually a stack for the coherent topology, because B is a pretopos.
• Strictify $\widetilde{B}$ to get a sheaf of categories (also called $\widetilde{B}$) on B. The “stalk” of $\widetilde{B}$ at a “point” $M : B \to \mathsf{Set}$ (a pretopos functor) is then
\[ B_M \;\simeq\; \varinjlim_{A \in \int M} B/A, \]
which is a local Boolean pretopos (1 is indecomposable and projective).
• There is an equivalence of Grothendieck toposes, $\mathrm{Sh}(B) \simeq \mathrm{Sh}_{eq}(\mathrm{Spec}(B))$, between sheaves on the pretopos B, for the coherent Grothendieck topology, and equivariant sheaves on the topological groupoid Spec(B) of (labelled) models.
• Move $\widetilde{B}$ across this equivalence in order to get an equivariant sheaf on Spec(B). The result (also called $\widetilde{B}$) is thus a sheaf of local, Boolean pretoposes on Spec(B).
And from a logical point of view:
• $B = B_T$ is the Boolean pretopos completion of (the syntactic category of) a theory T in (classical) FOL, and Spec(B) is then the groupoid of T-models.
• $\widetilde{B}$ is a sheaf of “local theories”. The stalk $B_M$ at a T-model M is a well-pointed pretopos representing the complete theory of M, with parameters for all the elements of M added; it is what the logician calls the “elementary diagram” of the model M.
• As before, $\widetilde{B}$ has global sections $\Gamma(\widetilde{B}) \simeq B$. So the original pretopos $B_T$ turns out to be the “theory of all the T-models”.
• Since each stalk $B_M$ is local, and well-pointed, the global sections functor $\Gamma_M : B_M \to \mathsf{Set}$ is a faithful pretopos morphism, i.e. a model in Set. In fact, the model $M : B \to \mathsf{Set}$ is naturally isomorphic to the composite:
\[ M : B \simeq \Gamma(\widetilde{B}) \longrightarrow B_M \xrightarrow{\;\Gamma_M\;} \mathsf{Set}. \]
In sum, we have the following (see [3, 4]):
Theorem 7 (Awodey-Breiner) Let B be a Boolean pretopos. There is a topological groupoid G with an equivariant sheaf of pretoposes $\widetilde{B}$ such that:
1. for every g ∈ G, the stalk $B_g$ is a well-pointed pretopos,
2. for the global sections of $\widetilde{B}$ there is an equivalence $B \simeq \Gamma(\widetilde{B})$.
Thus every Boolean pretopos is equivalent to the global sections of a sheaf of well-pointed pretoposes. There is again an analogous result for the general (i.e. non-Boolean) case, with local pretoposes in place of well-pointed ones in the stalks. The associated subdirect-product representation is then the following:
Corollary 6 For any pretopos E, there is a pretopos embedding,
\[ E \rightarrowtail \prod_{g \in X_E} E_g \]
with each $E_g$ a local pretopos and $X_E$ the set of points of the topological groupoid Spec(E). If moreover B is Boolean, then the local pretoposes $B_g$ are all well-pointed, and B therefore embeds (as a pretopos!) into a power of Set:
\[ B \rightarrowtail \prod_{g \in X_B} B_g \rightarrowtail \prod_{g \in X_B} \mathsf{Set} \cong \mathsf{Set}^{X_B}. \]
In logical terms, the last statement is essentially the Gödel completeness theorem for first-order logic, repackaged. Of course, the proof made use of the equivalent fact that B has enough pretopos functors $M : B \to \mathsf{Set}$.
8 Logical schemes
For a Boolean pretopos B, call the pair $(\mathrm{Spec}(B), \widetilde{B})$ just constructed an affine logical scheme. A morphism of affine logical schemes
\[ (f, \tilde{f}) : (\mathrm{Spec}(A), \widetilde{A}) \to (\mathrm{Spec}(B), \widetilde{B}) \]
consists of a continuous groupoid homomorphism
\[ f : \mathrm{Spec}(A) \to \mathrm{Spec}(B), \]
together with a pretopos functor over Spec(B)
\[ \tilde{f} : \widetilde{B} \to f_* \widetilde{A}. \]
Theorem 8 (Awodey-Breiner) Every pretopos functor $B \to A$ induces a morphism of the associated affine logical schemes $\mathrm{Spec}(A) \to \mathrm{Spec}(B)$. Moreover, the functor
\[ \mathrm{Spec} : \mathsf{BoolPreTop} \longrightarrow \mathsf{LogScheme}_{\mathrm{aff}}^{\mathrm{op}} \]
is full and faithful: every map of schemes comes from an essentially unique map of pretoposes.
Corollary 7 (First-order logical duality) There is an equivalence,
\[ \mathsf{BoolPreTop} \simeq \mathsf{LogScheme}_{\mathrm{aff}}^{\mathrm{op}}. \]
The category of Boolean pretoposes is thus dual to the category of affine logical schemes. We can now start to “patch together” affine pieces of the form $(\mathrm{Spec}(B), \widetilde{B})$, in order to make a general notion of a “logical scheme”, consisting of a topological groupoid of structures not tied to any one theory, equipped with a sheaf of local theories, and locally equivalent to an affine scheme. The first few steps in this direction are explored in [4].
References
1. Awodey, S., Sheaf representation for topoi. Journal of Pure and Applied Algebra, 145, pp. 107–121, 2000.
2. Awodey, S. and H. Forssell, First-order logical duality. Annals of Pure and Applied Logic, 164(3), pp. 319–348, 2013.
3. Awodey, S. and S. Breiner, Scheme representation for first-order logic. TACL 2013. Sixth International Conference on Topology, Algebra and Categories in Logic, pp. 10–13, 2014.
4. Breiner, S., Scheme Representation for First-Order Logic. Ph.D. thesis, Carnegie Mellon University, 2013. Available as arXiv:1402.2600.
5. Butz, C. and I. Moerdijk, Representing topoi by topological groupoids. Journal of Pure and Applied Algebra, 130(3), pp. 223–235, 1998.
6. Forssell, H., Topological representation of geometric theories. Mathematical Logic Quarterly, 58, pp. 380–393, 2012.
7. Freyd, P., Aspects of topoi. Bulletin of the Australian Mathematical Society, 7, pp. 1–76, 1972.
8. Henkin, L., Completeness in the theory of types. Journal of Symbolic Logic, 15, pp. 81–91, 1950.
9. Johnstone, P.T., Stone Spaces. Cambridge Studies in Advanced Mathematics 3, Cambridge University Press, 1982.
10. Joyal, A. and I. Moerdijk, Toposes as homotopy groupoids. Advances in Mathematics, 80(1), pp. 22–38, 1990.
11. Joyal, A. and M. Tierney, An extension of the Galois theory of Grothendieck. Memoirs of the AMS, 308, 1984.
12. Lambek, J., On the sheaf of possible worlds. In Adamek, J. and Mac Lane, S. (Eds.), Categorical Topology, World Scientific, Singapore, 1989.
13. Lambek, J., What is the world of mathematics? Annals of Pure and Applied Logic, 126, pp. 149–158, 2004.
14. Lambek, J. and I. Moerdijk, Two sheaf representations of elementary toposes. In A.S. Troelstra and D. van Dalen (Eds.), Brouwer Centenary Symposium, North-Holland, Amsterdam, 1982.
15. Makkai, M., Stone duality for first order logic. Advances in Mathematics, 65(2), pp. 97–170, 1987.
16. Mac Lane, S. and I. Moerdijk, Sheaves in Geometry and Logic. Universitext. Springer, 2nd edition, 1992.
On the naturalness of Mal’tsev categories D. Bourn, M. Gran and P.-A. Jacqmin
‘Let us also recall that fundamental progress in homological algebra was achieved by replacing module categories by arbitrary abelian categories. (. . . ) It is already clear from [52] that, in proving some basic lemmas for Mal’cev varieties, one never uses the fact that the category is varietal, but just that the semantical conditions hold. This suggests that one should investigate a purely categorical notion, generalizing that of an abelian category, to develop non-additive ‘variable’ homological arguments.’ A. Carboni, J. Lambek and M.C. Pedicchio in [22], 1990.
Abstract Mal’tsev categories turned out to be a central concept in categorical algebra. On the one hand, the simplicity and the beauty of the notion is revealed by the wide variety of characterizations of a markedly different flavour. Depending on the context, one can define Mal’tsev categories as those for which ‘any reflexive relation is an equivalence’; ‘any relation is difunctional’; ‘the composition of equivalence relations on the same object is commutative’; ‘each fibre of the fibration of points is unital’ or ‘the forgetful functor from internal groupoids to reflexive graphs is saturated on subobjects’. For a variety of universal algebras, these are also equivalent to the existence in its algebraic theory of a Mal’tsev operation, i.e. a ternary operation p(x, y, z) satisfying the axioms p(x, x, y) = y and p(x, y, y) = x. On the other hand, Mal’tsev categories have been shown to be the right context in which to develop the theory of centrality of equivalence relations, Baer sums of extensions, and some homological lemmas such as the denormalized 3 × 3 Lemma, whose validity in a regular category is equivalent to the weaker ‘Goursat property’, which has also turned out to be of wide interest.

Dominique Bourn, Université du Littoral, Laboratoire LMPA, BP 699, 62228 Calais Cedex, France. e-mail: [email protected]
Marino Gran, Université catholique de Louvain, IRMP, Chemin du Cyclotron 2, 1348 Louvain-la-Neuve, Belgique. e-mail: [email protected]
Pierre-Alain Jacqmin, Université catholique de Louvain, IRMP, Chemin du Cyclotron 2, 1348 Louvain-la-Neuve, Belgique. e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 C. Casadio and P. J. Scott (eds.), Joachim Lambek: The Interplay of Mathematics, Logic, and Linguistics, Outstanding Contributions to Logic 20, https://doi.org/10.1007/978-3-030-66545-6_3
Introduction
The study of Mal’tsev categories originates from a classical theorem of Mal’tsev in 1954 [54]. For a variety of universal algebras V (i.e., a category of models of a finitary one-sorted algebraic theory), he proved that the following conditions are equivalent:
(M1) The composition of congruences is commutative, i.e., for any two congruences R and S on the same algebra in V, the equality RS = SR holds.
(M2) The theory of V contains a ternary term p satisfying the equations p(x, x, y) = y and p(x, y, y) = x.
Varieties satisfying these conditions are now commonly called ‘Mal’tsev varieties’ [60] (or 2-permutable varieties), and such a term p a ‘Mal’tsev operation’. This result was extended a few years later by Lambek in [51], where he proved that these conditions are also equivalent to
(M3) Any homomorphic relation R in V is difunctional;
where difunctionality, a property introduced by Riguet [59], is the condition $RR^{\circ}R \leq R$. Findlay [28] and Werner [61] further characterised Mal’tsev varieties as those satisfying the condition
(M4) Any homomorphic reflexive relation in V is a congruence.
In his papers [51] and [52], Lambek generalized classical group theory results, such as Goursat’s theorem and the Zassenhaus lemma, to Mal’tsev varieties. For instance, Goursat’s theorem states that given a homomorphic relation $R \rightarrowtail A \times B$ in a Mal’tsev variety V, the quotients $AR/R^{\circ}R$ and $RB/RR^{\circ}$ are isomorphic, where AR = {b ∈ B | ∃a ∈ A, aRb} and RB = {a ∈ A | ∃b ∈ B, aRb}. The proofs of these results relied on the so-called calculus of relations and could therefore be transposed to a categorical context, by translating syntactic conditions into semantic ones. Actually, in her thesis [55] supervised by Lambek, Meisen showed that the equivalence between (M1), (M3) and (M4) also holds for any exact category in the sense of Barr [1].
In that context, Mal’tsev categories were introduced by Carboni, Lambek and Pedicchio in [22] where they developed some aspects of homological lemmas in a non-abelian categorical context. Although axiom (M1) must be stated in the context of categories with a good factorisation system (as regular categories [1] for instance), axioms (M3) and (M4) still make sense in any finitely complete category and turn out to
be still equivalent in that context. Mal’tsev categories were thus defined by Carboni, Pedicchio and Pirovano [24] as finitely complete categories satisfying (M4) (and thus (M3)). A good part of the theory of Mal’tsev categories can be developed in the finitely complete setting. Besides the homological aspects developed in [22], the Mal’tsev context revealed itself particularly useful to develop the conceptual categorical notion of centrality of equivalence relations. Smith [60] had already opened the way in the varietal context, and then Janelidze made a further step by establishing a link between commutators and internal categories in Mal’tsev varieties [40]. Then, partly based also on the results in [24], Pedicchio was the first to explicitly consider a categorical approach to commutator theory in exact Mal’tsev categories (with coequalizers) [57, 58] and in more general categories [43]. With the introduction of the notion of connector [17], centrality was expressed in its simplest form, and it could be fully investigated in the (regular) Mal’tsev context. Finally, in the exact context, the construction of the Baer sum of extensions with abelian kernel equivalence relation emerged in a very natural way [8, 13]. Several new results were later discovered in Mal’tsev categories in connection with the theory of central extensions [42, 26, 25], homological lemmas [10, 31] and, recently, with (non-abelian) embedding theorems [20, 36, 37, 38]. The homological ambition of the initial project of Carboni, Lambek and Pedicchio [22] was not totally achieved. As a matter of fact, this early work was missing the notion of pointed protomodular category [6] which was necessary to define and investigate the conceptual categorical notion of short exact sequences. 
When furthermore the context is regular, it is then possible to establish the classical Short Five Lemma, the Noether isomorphism theorems, the Chinese Remainder Theorem, the Snake Lemma and the long exact homology sequence [4]. In this article we review some of the most striking features of Mal’tsev categories. In the first section, we study Mal’tsev operations and their properties. A partial version of them is used to define the notion of a connector between equivalence relations. The second section is devoted to the definition of Mal’tsev categories in the finitely complete context and relevant examples are given. In the third section, we give many characterizations of Mal’tsev categories showing the richness of the notion. One of them says that the fibres of the fibration of points are unital, which implies the uniqueness of a connector between two equivalence relations on the same object in a Mal’tsev category. In Section 4, we study stiffly and naturally Mal’tsev categories. The fifth section is devoted to regular Mal’tsev categories. We give an additional characterization of them in this context via the notion of regular pushouts, which makes the proof of the existence of a Mal’tsev operation in the varietal context easier. In Section 6, we further study regular Mal’tsev categories using the calculus of relations. This naturally leads us to take a look at the weaker Goursat property, equivalent in the regular context to the denormalized 3 × 3 Lemma. We conclude this article with Section 7 where we study
Baer sums of extensions with abelian kernel equivalence relation in efficiently regular Mal’tsev categories.
Acknowledgements This work was partly supported by the collaboration project Fonds d’Appui à l’Internationalisation (2018-2020) to strengthen the research collaborations in mathematics between the Université catholique de Louvain and the Université d’Ottawa. The third author gratefully thanks the National Science and Engineering Research Council for their generous support.
1 Mal’tsev operations
Let us start with the Mal’tsev theorem on varieties of algebras.
Theorem 1 Let V be a variety of universal algebras. The following statements are equivalent:
(M1) For any two congruences R and S on the same algebra in V, the equality RS = SR holds.
(M2) The theory of V contains a ternary term p satisfying the equations p(x, x, y) = y and p(x, y, y) = x.
(M3) Any homomorphic relation in V is difunctional, i.e., $RR^{\circ}R \leq R$.
(M4) Any homomorphic reflexive relation in V is a congruence.
Proof Let us start by proving (M2) ⇒ (M3). We consider a homomorphic relation $R \rightarrowtail A \times B$ in V and elements $a_1, a_2$ in A and $b_1, b_2$ in B such that $a_1 R b_1$, $a_2 R b_1$ and $a_2 R b_2$. Since R is closed under p, we deduce that
\[ a_1 = p(a_1, a_2, a_2) \; R \; p(b_1, b_1, b_2) = b_2, \]
proving $RR^{\circ}R \leq R$. We now prove (M4) ⇒ (M1). Given two congruences R and S on the same algebra A, the composition RS is reflexive and therefore a congruence by assumption. Thus, RS = R ∨ S is the supremum of R and S in the lattice of congruences on A. Indeed, $R = R 1_A \leq RS$ and $S \leq RS$, and given a congruence T on A such that $R \leq T$ and $S \leq T$, we have $RS \leq TT = T$. Similarly, SR = S ∨ R = R ∨ S = RS, proving the desired commutativity. The implication (M3) ⇒ (M4) immediately follows from the next lemma and the proof of (M1) ⇒ (M2) is postponed to Section 5 (Theorem 54).
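The implications proved above can be tested numerically in a small case. The following is only an illustrative sketch, assuming the additive groups Z_n with the Mal’tsev term p(x, y, z) = x − y + z; the helper names (`maltsev`, `generated_relation`, `is_difunctional`, `is_equivalence`) are hypothetical and not taken from the text. A relation closed under p componentwise plays the role of a homomorphic relation.

```python
from itertools import product

def maltsev(n):
    """Mal'tsev term p(x, y, z) = x - y + z of the additive group Z_n."""
    return lambda x, y, z: (x - y + z) % n

def generated_relation(pairs, m, n):
    """Smallest subset of Z_m x Z_n containing `pairs` and closed under p."""
    pm, pn = maltsev(m), maltsev(n)
    rel = set(pairs)
    while True:
        new = {(pm(a, c, e), pn(b, d, f))
               for (a, b), (c, d), (e, f) in product(rel, repeat=3)} - rel
        if not new:
            return rel
        rel |= new

def is_difunctional(rel):
    """(M3): a1 R b1, a2 R b1 and a2 R b2 together imply a1 R b2."""
    return all((a1, b2) in rel
               for (a1, b1) in rel
               for (a2, b1_) in rel if b1_ == b1
               for (a2_, b2) in rel if a2_ == a2)

def is_equivalence(rel, n):
    """Reflexive, symmetric and transitive on {0, ..., n-1}."""
    reflexive = all((x, x) in rel for x in range(n))
    symmetric = all((y, x) in rel for (x, y) in rel)
    transitive = all((x, w) in rel
                     for (x, y) in rel for (z, w) in rel if y == z)
    return reflexive and symmetric and transitive

# A homomorphic relation between Z_4 and Z_6, generated by three pairs.
R = generated_relation({(0, 0), (1, 2), (2, 1)}, 4, 6)
# A reflexive homomorphic relation on Z_6: the diagonal plus one extra pair.
C = generated_relation({(x, x) for x in range(6)} | {(0, 2)}, 6, 6)
print(is_difunctional(R), is_equivalence(C, 6))
```

By contrast, the arbitrary relation {(0, 0), (1, 0), (1, 1)} is not closed under p and fails difunctionality, which shows that the closure hypothesis is doing the work.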
Lemma 2 A homomorphic relation $R \rightarrowtail A \times A$ is an equivalence relation if and only if it is reflexive and difunctional.
Proof The ‘only if’ part being trivial, let us prove the ‘if’ part. Assume R is reflexive and difunctional. Let us first prove it is symmetric. If xRy for some elements x and y in A, we know by reflexivity that yRy and xRx, and we have xRy by assumption. Thus difunctionality implies yRx, which shows the symmetry of R. For transitivity, let now x, y, z ∈ A be such that xRyRz. Since xRy, yRy and yRz, we have xRz also by difunctionality, proving that R is transitive.
Of course, this lemma can be generalized internally to any finitely complete category using the Yoneda embedding.
Remark 3 The equivalence between (M2) and (M3) in Theorem 1 can be displayed in the form of a matrix as
\[ \left( \begin{array}{ccc|c} x & y & y & x \\ u & u & v & v \end{array} \right). \]
Reading it vertically, this matrix represents difunctionality of a relation. Indeed, a relation $R \rightarrowtail A \times B$ is difunctional when xRu, yRu and yRv (the left columns) imply xRv (the right column). On the other hand, reading the matrix horizontally, the identities p(x, y, y) = x and p(u, u, v) = v of (M2) appear. This phenomenon is not at all a coincidence, and the general theory of such matrices has been introduced in [46] to understand many properties of varieties of universal algebras related to Mal’tsev conditions (such as (M2)) from a categorical perspective.
Let us now have a closer look at Mal’tsev operations.
Definition 4 A Mal’tsev operation on a set X is a ternary operation p : X × X × X → X such that the identities p(x, x, y) = y and p(x, y, y) = x hold for any x, y in X. A Mal’tsev algebra is a set X endowed with a Mal’tsev operation. We denote by Mal the variety of Mal’tsev algebras (including the empty set ∅).
Definition 5 Let X be a set and p : X × X × X → X a Mal’tsev operation. We say that
• p is left associative if it satisfies the axiom: p(p(x, y, z), z, w) = p(x, y, w)
• p is right associative if it satisfies the axiom: p(x, y, p(y, z, w)) = p(x, z, w)
• p is associative if it satisfies the axiom: p(p(x, y, z), u, v) = p(x, y, p(z, u, v))
• p is commutative if it satisfies the axiom: p(x, y, z) = p(z, y, x)
• p is autonomous if it is a morphism of Mal’tsev algebras $X^3 \to X$, i.e., if it satisfies the axiom:
p(p(x1, y1, z1), p(x2, y2, z2), p(x3, y3, z3)) = p(p(x1, x2, x3), p(y1, y2, y3), p(z1, z2, z3))
Lemma 6 Let p be a right associative Mal’tsev operation p : X × X × X → X on a set X. If x, y, a, b ∈ X are elements such that p(x, y, a) = p(x, y, b), then a = b.
Proof It follows from
a = p(y, y, a) = p(y, x, p(x, y, a)) = p(y, x, p(x, y, b)) = p(y, y, b) = b.
Proposition 7 Let p be a Mal’tsev operation p : X ×X ×X → X on a set X. Then p is associative if and only if it is left associative and right associative. Proof If p is associative, then we can compute p(p(x, y, z), z, w) = p(x, y, p(z, z, w)) = p(x, y, w) which proves that p is left associative. Right associativity is proved similarly. If now p is both left associative and right associative, we can compute p(p(x, y, z), u, v) = p(p(x, y, z), z, p(z, u, v)) = p(x, y, p(z, u, v)) which proves that p is associative.
Proposition 8 Let p be a Mal’tsev operation p : X ×X ×X → X on a set X. Then p is autonomous if and only if it is associative and commutative. Proof We first assume that p is autonomous. We can then compute p(p(x, y, z), u, v) = p(p(x, y, z), p(x, x, u), p(x, x, v)) = p(p(x, x, x), p(y, x, x), p(z, u, v)) = p(x, y, p(z, u, v)) and p(x, y, z) = p(p(y, y, x), p(y, y, y), p(z, y, y)) = p(p(y, y, z), p(y, y, y), p(x, y, y)) = p(z, y, x) proving associativity and commutativity. Let us now assume that p is associative and commutative. In that case, we have p(p(y, z, u), x, p(p(x, y, z), u, v)) = p(p(p(y, z, u), x, p(x, y, z)), u, v) = p(p(p(y, z, u), y, z), u, v) = p(p(p(u, z, y), y, z), u, v) = p(p(u, z, z), u, v) = p(u, u, v) = v = p(p(y, z, u), p(y, z, u), v) = p(p(y, z, u), x, p(x, p(y, z, u), v))
showing via Lemma 6 that p(p(x, y, z), u, v) = p(x, p(y, z, u), v). Using this, together with left and right associativity and commutativity, we have
p(p(x1, y1, z1), p(x2, y2, z2), p(x3, y3, z3))
  = p(p(x1, p(x2, y2, z2), z1), y1, p(x3, y3, z3))
  = p(p(x1, x2, p(y2, z2, z1)), y1, p(x3, y3, z3))
  = p(x1, x2, p(p(y2, z2, z1), y1, p(x3, y3, z3)))
  = p(x1, x2, p(x3, y1, p(p(y2, z2, z1), y3, z3)))
  = p(x1, x2, p(x3, y1, p(p(y2, y3, z1), z2, z3)))
  = p(x1, x2, p(x3, y1, p(y2, y3, p(z1, z2, z3))))
  = p(p(x1, x2, x3), y1, p(y2, y3, p(z1, z2, z3)))
  = p(p(x1, x2, x3), p(y1, y2, y3), p(z1, z2, z3))
which concludes the proof.
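Propositions 7 and 8 can be illustrated on two small examples. The sketch below is not part of the original argument: it assumes the group term p(x, y, z) = x y⁻¹ z, evaluated once in the abelian group Z_3 (written additively) and once in the symmetric group S_3, and the helper names (`axioms`, `p_z3`, `p_s3`) are hypothetical. It tests every axiom of Definition 5 by exhaustive search.

```python
from itertools import permutations, product

def axioms(p, elems):
    """Report which axioms of Definition 5 hold for p on a finite set."""
    E = list(elems)

    def holds(arity, law):
        return all(law(*xs) for xs in product(E, repeat=arity))

    return {
        "maltsev": holds(2, lambda x, y: p(x, x, y) == y and p(x, y, y) == x),
        "left": holds(4, lambda x, y, z, w: p(p(x, y, z), z, w) == p(x, y, w)),
        "right": holds(4, lambda x, y, z, w: p(x, y, p(y, z, w)) == p(x, z, w)),
        "associative": holds(5, lambda x, y, z, u, v:
                             p(p(x, y, z), u, v) == p(x, y, p(z, u, v))),
        "commutative": holds(3, lambda x, y, z: p(x, y, z) == p(z, y, x)),
        "autonomous": holds(9, lambda x1, y1, z1, x2, y2, z2, x3, y3, z3:
                            p(p(x1, y1, z1), p(x2, y2, z2), p(x3, y3, z3))
                            == p(p(x1, x2, x3), p(y1, y2, y3), p(z1, z2, z3))),
    }

def p_z3(x, y, z):
    """x - y + z in the abelian group Z_3."""
    return (x - y + z) % 3

def compose(f, g):
    """Composite permutation: first g, then f."""
    return tuple(f[g[i]] for i in range(len(g)))

def inverse(f):
    inv = [0] * len(f)
    for i, fi in enumerate(f):
        inv[fi] = i
    return tuple(inv)

def p_s3(x, y, z):
    """x y^{-1} z in the (non-abelian) symmetric group S_3."""
    return compose(compose(x, inverse(y)), z)

Z3_report = axioms(p_z3, range(3))
S3_report = axioms(p_s3, permutations(range(3)))
print(Z3_report)
print(S3_report)
```

In both groups the operation is left, right and fully associative, in line with Proposition 7. Only in the abelian case is it also commutative, and autonomy holds exactly there, as Proposition 8 predicts.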
Now that we have studied some properties of Mal’tsev operations and how they interplay, we can define the notion of a connector of equivalence relations as in [16] (slightly more general than the notion of pregroupoid, due to A. Kock [49], see also [48]):
Definition 9 Let R and S be two equivalence relations on the same object X in a finitely complete category, and consider the following pullback:
\[
\begin{array}{ccc}
R \times_X S & \xrightarrow{\;p_1^S\;} & S \\
{\scriptstyle p_0^R} \downarrow & & \downarrow {\scriptstyle d_0^S} \\
R & \xrightarrow{\;d_1^R\;} & X
\end{array}
\]
A connector between R and S is a morphism $p : R \times_X S \to X$ satisfying the following axioms:
1. xSp(xRySz) and p(xRySz)Rz;
\[
\begin{array}{ccc}
x & \overset{R}{\text{---}} & y \\
{\scriptstyle S}\,| & & |\,{\scriptstyle S} \\
p(xRySz) & \overset{R}{\text{---}} & z
\end{array}
\]
2. p is a partial Mal’tsev operation, i.e., p(xRxSy) = y and p(xRySy) = x;
3. p is left and right associative, i.e., p(p(xRySz)RzSw) = p(xRySw) and p(xRySp(yRzSw)) = p(xRzSw).
In this case we say that the connector p makes R and S centralize each other. By the (partial version of) Proposition 7, such a connector is associative, i.e., for any elements x, y, z, u, v ∈ X such that xRySzRuSv, we have p(p(xRySz)RuSv) = p(xRySp(zRuSv)).
Example 10 If $\nabla_X$ represents the largest equivalence relation on X, a connector between $\nabla_X$ and $\nabla_X$ is simply an associative Mal’tsev operation on X.
Given an arrow f : T → X we write $(\mathrm{Eq}[f], p_1, p_2)$ for its kernel pair, which underlies an equivalence relation defined by the pullback
\[
\begin{array}{ccc}
\mathrm{Eq}[f] & \xrightarrow{\;p_2\;} & T \\
{\scriptstyle p_1} \downarrow & & \downarrow {\scriptstyle f} \\
T & \xrightarrow{\;f\;} & X.
\end{array}
\]
Example 11 Given a relation $(f, g) : T \rightarrowtail X \times Y$ and the associated kernel equivalence relations Eq[f] and Eq[g] of f and g, respectively, this relation T is difunctional if and only if Eq[f] and Eq[g] centralize each other.
Example 12 [24, 40] Given a reflexive graph
\[ X_1 \;\underset{d_1}{\overset{d_0}{\rightrightarrows}}\; X_0, \qquad s_0 : X_0 \to X_1, \]
in a finitely complete category and $\mathrm{Eq}[d_0]$ and $\mathrm{Eq}[d_1]$ the kernel equivalence relations of $d_0$ and $d_1$ respectively, connectors between $\mathrm{Eq}[d_0]$ and $\mathrm{Eq}[d_1]$ are in 1-to-1 correspondence with groupoid structures on the reflexive graph.
Considering again two equivalence relations R and S on X in a finitely complete category, we define $R \square S$ via the following pullback
\[
\begin{array}{ccccc}
R \square S & \rightarrowtail & & & S \times S \\
\downarrow & & & & \downarrow {\scriptstyle s \times s} \\
R \times R & \underset{r \times r}{\rightarrowtail} & X^4 & \xrightarrow{\;tw_{2,3}\;} & X^4
\end{array}
\]
where $tw_{2,3} : X^4 \to X^4$ is the isomorphism defined by $tw_{2,3}(x, y, w, z) = (x, w, y, z)$. In set theoretical terms, $R \square S$ is the set of four-tuples (x, y, w, z) such that xRy, wRz, xSw and ySz, often depicted as:
\[
\begin{array}{ccc}
x & \overset{R}{\text{---}} & y \\
{\scriptstyle S}\,| & & |\,{\scriptstyle S} \\
w & \overset{R}{\text{---}} & z
\end{array}
\]
We also consider the factorization
\[ \alpha : R \square S \to R \times_X S : (x, y, w, z) \mapsto (x, y, z). \qquad (1) \]
If $R \cap S = \Delta_X$ (the discrete relation on X), this factorization is a monomorphism. Indeed, if (x, y, w, z) and (x, y, w′, z) are in $R \square S$, then wRzRw′ and wSxSw′, showing that w(R ∩ S)w′ and thus w = w′. Moreover, given a connector $p : R \times_X S \to X$, we can construct a section for $\alpha : R \square S \to R \times_X S$ via
\[ R \times_X S \to R \square S : (x, y, z) \mapsto (x, y, p(x, y, z), z). \]
These observations lead us to the following proposition.
Proposition 13 Given two equivalence relations R and S on the same object X in a finitely complete category, if $R \cap S = \Delta_X$, then there is at most one connector between R and S.
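In the category of sets, Example 11 and Proposition 13 can be made concrete. The following is a hedged sketch, assuming T is a finite set of pairs with f and g the two projections; the function names `connector` and `is_partial_maltsev` are hypothetical. For t = (x, y), s = (x, y′) and u = (x′, y′), axiom 1 of Definition 9 forces the value p(t, s, u) = (x′, y), so a connector exists exactly when that pair always lies in T, and it is then unique.

```python
def connector(T):
    """Candidate connector between Eq[f] and Eq[g] for T <= X x Y.

    Returns a dict mapping composable triples (t, s, u) to p(t, s, u),
    or None when some forced value falls outside T.
    """
    T = set(T)
    p = {}
    for t in T:
        for s in T:
            if s[0] != t[0]:          # need t Eq[f] s: same first coordinate
                continue
            for u in T:
                if u[1] != s[1]:      # need s Eq[g] u: same second coordinate
                    continue
                value = (u[0], t[1])  # forced by axiom 1 of Definition 9
                if value not in T:
                    return None
                p[(t, s, u)] = value
    return p

def is_partial_maltsev(p):
    """Axiom 2: p(t, t, u) = u and p(t, u, u) = t wherever defined."""
    first = all(v == u for (t, s, u), v in p.items() if t == s)
    second = all(v == t for (t, s, u), v in p.items() if s == u)
    return first and second

# A difunctional relation: a disjoint union of "rectangles".
rectangles = {(0, "a"), (0, "b"), (1, "a"), (1, "b"), (2, "c")}
# A relation that is not difunctional: (1, "a") is missing.
broken = {(0, "a"), (0, "b"), (1, "b")}

p = connector(rectangles)
print(p is not None and is_partial_maltsev(p), connector(broken) is None)
```

Here Eq[f] ∩ Eq[g] is the discrete relation on T because the two projections are jointly injective, which is why the value of the connector is forced at every triple.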
2 Mal’tsev categories
2.1 Definition and examples
As mentioned in the introduction, the definition of a Mal’tsev category is of an indisputable simplicity [22, 24]:
Definition 14 A category E is said to be a Mal’tsev category when it is finitely complete and any reflexive relation in E is an equivalence relation.
A typical example of such a category is the category Gp of groups since it satisfies condition (M2) of Theorem 1 with the term $p(x, y, z) = xy^{-1}z$. The class of examples can be quickly extended thanks to the following straightforward lemma:
Lemma 15 Given a left exact conservative functor U : E → E′ between finitely complete categories, the category E is Mal’tsev if E′ is Mal’tsev.
So, considering the evident forgetful functors, the category Ab of abelian groups, the category Rg of rings and, given a ring A, any category of A-modules or A-algebras are immediately Mal’tsev.
The variety Mal of Mal’tsev algebras produces a Mal’tsev category according to the Mal’tsev theorem. More generally, it is the case for any Mal’tsev variety V, considering the left exact conservative forgetful functor: U : V → Mal. On the other hand, given any category E, the functor category F(E, V) is clearly a Mal’tsev category as well. The variety Heyt of Heyting algebras is a Mal’tsev variety [60]. From that, the dual $\mathsf{Set}^{\mathrm{op}}$ of the category of sets, and more generally the dual $E^{\mathrm{op}}$ of any elementary topos E is a Mal’tsev category [21]. It is also the case for the dual of the category of compact Hausdorff spaces [21]. Another source of examples is given by the following straightforward observation:
Lemma 16 The notion of Mal’tsev category is stable under slicing and coslicing.
This means that, when E is a Mal’tsev category, so are the slice categories E/Y and the coslice categories Y/E, for any object Y ∈ E. Accordingly, any fibre $\mathrm{Pt}_Y E$ of the fibration of points is a Mal’tsev category (see the definition before Theorem 23 below).
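The statement above that Heyting algebras form a Mal’tsev variety can be illustrated with an explicit term. The text does not give one; the sketch below assumes the standard choice p(x, y, z) = ((x → y) → z) ∧ ((z → y) → x) and checks the two Mal’tsev identities on one small Heyting algebra, the divisors of 12 ordered by divisibility (meet = gcd, join = lcm). The names `DIVISORS`, `implies` and `p` are hypothetical.

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

# Divisors of 12 under divisibility: a finite distributive lattice,
# hence a Heyting algebra, with bottom 1 and top 12.
DIVISORS = [d for d in range(1, 13) if 12 % d == 0]

def implies(a, b):
    """Heyting implication a -> b: the largest c with gcd(a, c) dividing b."""
    return reduce(lcm, (c for c in DIVISORS if b % gcd(a, c) == 0), 1)

def p(x, y, z):
    """Assumed Mal'tsev term ((x -> y) -> z) /\\ ((z -> y) -> x)."""
    return gcd(implies(implies(x, y), z), implies(implies(z, y), x))

identities_hold = all(p(x, x, z) == z and p(x, z, z) == x
                      for x in DIVISORS for z in DIVISORS)
print(identities_hold)
```

This lattice is neither a chain nor a Boolean algebra, so the check is not vacuous, although a single finite example is of course no substitute for the equational proof.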
2.2 Yoneda embedding for internal structures
Considering internal algebras of a Mal’tsev variety, one gets a third important source of examples. Any object X in a category E produces a functor:
\[ Y(X) = \mathrm{Hom}_E(-, X) : E^{\mathrm{op}} \to \mathsf{Set} \]
This, in turn, produces a fully faithful functor $E \to F(E^{\mathrm{op}}, \mathsf{Set})$, which is called the Yoneda embedding. It is left exact when, in addition, the category E is finitely complete, the left exactness property being a synthetic translation of the universal properties of the finite limits. Given any algebraic theory T defined by any number of operations with finite arity and any number of axioms, we shall denote by V(T) the associated variety, and by T(E) the category of the internal T-algebras in the finitely complete category E. Then there is a canonical factorization $Y_T$ making the following diagram commute:
\[
\begin{array}{ccc}
T(E) & \xrightarrow{\;Y_T\;} & F(E^{\mathrm{op}}, V(T)) \\
{\scriptstyle U_T^E} \downarrow & & \downarrow {\scriptstyle F(E^{\mathrm{op}}, U_T)} \\
E & \xrightarrow{\;Y\;} & F(E^{\mathrm{op}}, \mathsf{Set})
\end{array}
\]
where $U_T$ and $U_T^E$ are the induced forgetful functors, which are both left exact and conservative. Accordingly they are faithful and reflect finite limits as well.
Proposition 17 The functor Y_T is fully faithful and left exact.

Proof The faithfulness is straightforward. Now, given a pair (M, M′) of T-algebras in E, any natural transformation θ : Hom_E(−, M) → Hom_E(−, M′) in F(E^op, V(T)) has an underlying natural transformation θ : Hom_E(−, M) → Hom_E(−, M′) in F(E^op, Set). From it, the Yoneda embedding Y produces a map f : M → M′ in E; it remains to check that it is a homomorphism of T-algebras, i.e. that some diagrams commute in E. This, again, can be checked via the faithfulness of the Yoneda embedding. The left exactness of the embedding Y_T is a consequence of the fact that the three other functors are left exact and that U_T reflects finite limits.

So the functor Y_T is left exact and conservative, and according to Lemma 15 we get:

Proposition 18 Given any finitely complete category E, if V(T) is a Mal’tsev variety then the category T(E) is Mal’tsev.
3 Characterizations An immediate aspect of the richness of the notion of Mal’tsev category is that there are at least three types of characterization of very distinct nature.
3.1 Unital characterization

For the first one we need the following:

Definition 19 [7] A category E is said to be unital when it is pointed, finitely complete and such that, for any pair (X, Y) of objects in E, the following pair of monomorphisms:

j_0^X = (1_X, 0) : X ↣ X × Y and j_1^Y = (0, 1_Y) : Y ↣ X × Y

is jointly extremally epic. A category E is said to be strongly unital when it is pointed, finitely complete and such that any reflexive relation R on an object X which is right punctual (i.e. containing j_1^X) is the largest equivalence relation ∇_X on X. The previous terminology is justified by the following:

Lemma 20 Any strongly unital category is unital.
Proof Consider the following diagram

[Diagram: two adjacent squares. Top row: U → T → U, the second map being ψ; bottom row: X × Y → U × U → X × Y, given by u_0 × u_1 and f × g; vertical maps (f, g) : U ↣ X × Y, t : T ↣ U × U and (f, g) : U ↣ X × Y.]

where (f, g) : U ↣ X × Y is a relation containing j_0^X and j_1^Y through maps u_0 and u_1 and where the right hand side square is a pullback. It determines a unique factorization U → T making the left hand side square a pullback as well. The relation T on U is defined by (xUy)T(x′Uy′) if and only if xUy′. Accordingly it is a reflexive relation. Now the map u_1 ensures that, for all xUy, we have (0U0)T(xUy); namely the relation T is right punctual. So we have T = ∇_U and t is an isomorphism. According to the left hand side pullback, the map (f, g) is itself an isomorphism, and E is unital.

The categories Mon, CoM and SRg of monoids, commutative monoids and semi-rings are unital categories; they are not Mal’tsev categories since the order ≤ on the natural numbers N gives, in each of them, a reflexive relation which is not an equivalence relation. When E is finitely complete, the categories Mon(E), CoM(E) and SRg(E) of internal monoids, internal commutative monoids and internal semi-rings are unital as well. This is the case in particular of the category Mon(Top) of topological monoids. More generally, a pointed variety V is unital if and only if it is a Jónsson-Tarski variety, see [4].

Lemma 21 Any pointed Mal’tsev category is strongly unital.

Proof Given any right punctual reflexive relation R on X, it is a right punctual equivalence relation. Being symmetric, it is left punctual as well, and by transitivity it follows that R = ∇_X.

Accordingly the categories Gp, Ab and Rg of groups, abelian groups and rings are strongly unital categories. When E is finitely complete, the categories Gp(E), Ab(E) and Rg(E) of internal groups, internal abelian groups and internal rings are strongly unital as well. This is the case in particular of the category Gp(Top) of topological groups. More generally, a pointed variety of algebras V is strongly unital if and only if it has a unique constant 0 and a ternary operation p satisfying p(x, x, y) = y and p(x, 0, 0) = x, see [4].
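The counterexample mentioned above for monoids can be made concrete. The following sketch, offered as an illustration under the assumption that a finite sample of N is enough to convey the idea, checks that the relation R = {(m, n) : m ≤ n} on the additive monoid N is reflexive and compatible with the monoid structure (so that it is a reflexive relation in Mon), while failing to be symmetric. The names are hypothetical.

```python
from itertools import product

SAMPLE = range(10)  # finite sample of the natural numbers (assumption of this sketch)

def related(m, n):
    """The relation R = {(m, n) : m <= n} on the additive monoid N."""
    return m <= n

def is_reflexive():
    """Every sampled n satisfies n R n."""
    return all(related(n, n) for n in SAMPLE)

def is_compatible_with_addition():
    """R contains (0, 0) and is closed under componentwise addition."""
    closed = all(related(a + c, b + d)
                 for a, b, c, d in product(SAMPLE, repeat=4)
                 if related(a, b) and related(c, d))
    return related(0, 0) and closed

def is_symmetric():
    """Whether m R n implies n R m on the sample."""
    return all(related(n, m)
               for m, n in product(SAMPLE, repeat=2) if related(m, n))

if __name__ == "__main__":
    print(is_reflexive(), is_compatible_with_addition(), is_symmetric())
```

The failure of symmetry is already witnessed by the pair (0, 1), so the size of the sample plays no role in the conclusion.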
Again we have:

Lemma 22 Given any left exact conservative functor U : E → E′ between finitely complete categories, the category E is unital (resp. strongly unital) as soon as so is E′.

We denote by Pt(E) the category whose objects are the split epimorphisms equipped with a given section and whose maps are the pairs of morphisms commuting with the split epimorphisms and the given sections:

[Diagram: a square with horizontal maps x : X → X′ and y : Y → Y′ and vertical split epimorphisms (f, s) : X ⇄ Y and (f′, s′) : X′ ⇄ Y′.]
We also denote by π_E : Pt(E) → E the functor associating with any split epimorphism (f, s) its codomain Y. It is a fibration whose cartesian maps are the pullbacks of split epimorphisms. It is called the fibration of points and the fibre above Y is denoted by Pt_Y E, see [6]. We are now ready for the first characterization theorem:

Theorem 23 Given a finitely complete category E, the following conditions are equivalent:
1) any (pointed) fibre Pt_Y E of the fibration of points is unital;
2) any relation (f, g) : R ↣ X × Y in E is difunctional;
3) E is a Mal’tsev category;
4) any fibre Pt_Y E is strongly unital.

Proof 1) ⇒ 2): Suppose that any fibre Pt_Y E is unital. First, let us focus our attention on the following observation: given a pair (R, S) of reflexive relations on an object X such that R ∩ S = Δ_X, the commutative square vertically indexed by 0 and horizontally indexed by 1 in the following diagram is a pullback:

[Diagram: the double relation R □ S, with projections p_0^R, p_1^R : R □ S ⇉ R and p_0^S, p_1^S : R □ S ⇉ S, over the relations d_0^R, d_1^R : R ⇉ X and d_0^S, d_1^S : S ⇉ X.]

Indeed, by R ∩ S = Δ_X, we know that the factorization R □ S → R ×_X S is a monomorphism. Since Pt_X E is a unital category, it is an isomorphism in presence of the left hand side vertical section and of the upper horizontal one. Then the map d_1^S · p_0^S : R □ S → X produces a connector. Now let (f, g) : R ↣ X × Y be any relation in E. By Eq[f] ∩ Eq[g] = Δ_R, we get a connector and the relation R is difunctional, by Example 11.
2) ⇒ 3): Follows from Lemma 2.
3) ⇒ 4): When E is a Mal’tsev category, we noticed that so is any fibre Pt_Y E, which is consequently strongly unital.
4) ⇒ 1): Follows from Lemma 20.

Thanks to Theorem 1.2.12 in [4], the point 2) gives rise to the following:

Corollary 24 A finitely complete category E is Mal’tsev if and only if any commutative square of split epimorphisms (with y f′ = f x, x s′ = s y, s′ s_y = s_x s and f′ s_x = s_y f)

[Diagram: a square of split epimorphisms with horizontal pairs (x, s_x) : X′ ⇄ X and (y, s_y) : Y′ ⇄ Y and vertical pairs (f′, s′) : X′ ⇄ Y′ and (f, s) : X ⇄ Y.]

is a regular pushout, namely such that the factorization from X′ to the pullback of f along y is an extremal epimorphism.

Remark 25 As observed by Z. Janelidze [45], one can add one more equivalent condition to Theorem 23. Recall that a subtractive category is a finitely complete pointed category for which every left punctual reflexive relation is right punctual. A finitely complete category E is a Mal’tsev category if and only if any fibre Pt_Y E is subtractive.
3.2 Centralization of equivalence relations

In this section we are going to show how this first characterization exemplifies that the Mal’tsev context is the right conceptual one to deal with the notion of centrality of equivalence relations. The major interest of unital categories is that they allow one to define an intrinsic notion of commutation of morphisms. When E is a unital category, the pair (j_0^X, j_1^Y) is jointly epic; accordingly, in the following diagram, there is at most one arrow φ making the following triangles commute:

[Diagram: the monomorphisms j_0^X : X ↣ X × Y and j_1^Y : Y ↣ X × Y, the maps f : X → Z and f′ : Y → Z, and the factorization φ : X × Y → Z.]

and the existence of such a factorization becomes a property.

Definition 26 [9, 34] Let E be a unital category. We say that a pair (f, f′) of morphisms with common codomain commutes (or cooperates) when there is such a factorization map φ, which is called the cooperator of the pair. We say that the map f : X → Y is central when the pair (f, 1_Y) cooperates and that the object X is commutative when the pair (1_X, 1_X) cooperates. We shall denote by Com(E) the full subcategory of commutative objects in E.

We immediately get:

Proposition 27 Let E be a unital category. An object X is commutative if and only if it is endowed with a structure of commutative internal monoid, which is necessarily unique. Any morphism between commutative objects is an internal morphism of monoids.
We are going to show that the previous characterization theorem reduces the question of centralization of equivalence relations to a question of commutation in the fibres of the fibration of points. Indeed, in a Mal’tsev category, any equivalence relation R on X is completely determined as the following subobject in the fibre Pt_X E:

[Diagram: the monomorphism (d_0^R, d_1^R) : R ↣ X × X between the split epimorphisms (d_0^R, s_0^R) : R ⇄ X and (p_0^X, s_0^X) : X × X ⇄ X.]

We shall denote it by ρ_R : Υ_R ↣ Υ_{∇_X}. First observe that, given any pair (R, S) of equivalence relations on X, the product of Υ_{R°} and Υ_S in this fibre coincides with the pullback introduced in Definition 9:

[Diagram: the pullback R ×_X S of d_1^R : R → X along d_0^S : S → X, with projections p_0^R : R ×_X S → R and p_1^S : R ×_X S → S, their sections σ_0^R : R → R ×_X S and σ_1^S : S → R ×_X S, and the sections s_0^R and s_0^S of d_1^R and d_0^S.]

Then observe that the pair of subobjects (ρ_{R°}, ρ_S) commutes in the unital fibre Pt_X E if and only if there is a map (d_0^S · p_1^S, p) : R ×_X S → X × X in E such that we get p · σ_0^R = d_0^R and p · σ_1^S = d_1^S, namely the Mal’tsev axioms. Accordingly there is a unique possible connector p : R ×_X S → X making the pair (R, S) centralize each other. So, in a Mal’tsev category, centralization of equivalence relations becomes a property; we shall denote it as usual by [R, S] = 0.

Proposition 28 [16, 14] Let E be a Mal’tsev category, and (R, S) two equivalence relations on X. The pair (R, S) centralizes each other if and only if the pair of subobjects (ρ_{R°}, ρ_S) commutes in the fibre Pt_X E. This implies in particular that a pair (R, S) admits at most one map p : R ×_X S → X satisfying the Mal’tsev axioms and that this map is necessarily a connector.

Proof The previous observation shows that if the pair (R, S) centralizes each other in E, then the pair (ρ_{R°}, ρ_S) commutes in the fibre Pt_X E. Conversely suppose that this pair commutes; we have to show that all the axioms of Definition 9 hold. For that, first introduce on R × X the relation H defined by (xRy)Hz if and only if we have ySz and xSp(xRySz). For any xRySz ∈ R ×_X S we get the following diagram relatively to the relation H:
[Diagram: the relations (xRy) H y, (yRy) H y and (yRy) H z, together with the induced relation (xRy) H z.]

Since E is a Mal’tsev category, the relation H is a difunctional relation, and we get (xRy)Hz, namely xSp(xRySz). We get p(xRySz)Rz in the same way. Now define the relation K on R × (R ×_X S) by (xRy)K(ȳRzSw) if and only if y = ȳ and p(xRySp(yRzSw)) = p(xRzSw). We get this last identity for all xRy and yRzSw by the following diagram:

[Diagram: the relations (xRy) K (yRzSz), (yRy) K (yRzSz) and (yRy) K (yRzSw), together with the induced relation (xRy) K (yRzSw).]

We get p(p(xRySz)RzSw) = p(xRySw) in the same way.
Corollary 29 Let E be a Mal’tsev category, and (R, S) two equivalence relations on X. We have [R, S] = 0 as soon as R ∩ S = ΔX . Proof Straightforward from the first part of the proof of Theorem 23.
The following stability properties easily follow:

Proposition 30 [17] Let E be a Mal’tsev category. Let also R, R′, S be equivalence relations on X and R̄, S̄ on Y. Then we get:
(a) [R, S] = 0 ⇐⇒ [S, R] = 0
(b) R′ ⊂ R and [R, S] = 0 ⇒ [R′, S] = 0
(c) [R, S] = 0 and [R̄, S̄] = 0 ⇒ [R × R̄, S × S̄] = 0
(d) when u : U ↣ X is a monomorphism, we get: [R, S] = 0 ⇒ [u⁻¹(R), u⁻¹(S)] = 0

Proof See Propositions 3.10, 3.12 and 3.13 in [17].
Definition 31 Let E be a Mal’tsev category. An equivalence relation R on an object X is said to be abelian when we have [R, R] = 0 and central when we have [R, ∇X ] = 0. An object X is said to be affine when we have [∇X , ∇X ] = 0. We shall denote by Aff(E) the full subcategory of affine objects in E. Proposition 32 Let E be a Mal’tsev category. When an equivalence relation R on X is abelian, then the connector p realizing [R, R] = 0 is such that p(xRyRz) = p(zRyRx). An object X is affine if and only if it is endowed with a (necessarily unique) internal Mal’tsev operation which is necessarily associative and commutative. Any morphism between affine objects commutes with the internal Mal’tsev operations.
Proof Define on R the relation L by (x, y)L(z, w) if and only if we have y = z and p(xRyRw) = p(wRyRx). The following diagram holds when R is abelian and determines our assertion:

[Diagram: the relations (xRy) L (yRy), (yRy) L (yRy) and (yRy) L (yRz), together with the induced relation (xRy) L (yRz).]

The next point is just the description of the connector p associated with the centralization [∇_X, ∇_X] = 0.

Corollary 33 Let E be a Mal’tsev category. Aff(E) is stable under finite products and subobjects in E. An internal abelian group in E is just a pointed affine object 0_A : 1 ↣ A.

Proof It is a direct consequence of points (c) and (d) in Proposition 30 and of Proposition 32.
3.3 Groupoid characterization

Let us denote respectively by RG(E) and Grpd(E) the categories of internal reflexive graphs and of internal groupoids in any finitely complete category E. The forgetful functor W_E : Grpd(E) → RG(E) is left exact and conservative. As such, it is faithful and any monomorphism of Grpd(E) is hypercartesian with respect to W_E.

Internal groupoids

According to Example 12, following which a groupoid is a reflexive graph endowed with a connector p on the pair (Eq[d_0], Eq[d_1]), we immediately get the first part of the following:

Lemma 34 Given a Mal’tsev category E, there is on any reflexive graph at most one groupoid structure. Moreover, the induced inclusion functor W_E : Grpd(E) ↪ RG(E) is full [24] and such that any sub-reflexive graph of a groupoid is itself a groupoid [7].

Proof Let be given a morphism of reflexive graphs:

[Diagram: a morphism (f_0, f_1) of reflexive graphs, with f_1 : X_1 → Y_1 and f_0 : X_0 → Y_0 commuting with the structural maps d_0, d_1 and s_0 of both graphs.]

When these reflexive graphs are underlying groupoid structures, the commutation of this morphism with the connectors is checked by composition with the extremally epic pair involved in the definition of Eq[d_0] ×_{X_0} Eq[d_1].
Now given any subobject (f_0, f_1) : X ↣ Y in RG(E) with Y a groupoid, the inverse image f_0⁻¹(Y) along f_0 determines a subobject:

[Diagram: the monomorphism φ_1 from X_1 to the object of arrows of f_0⁻¹(Y), over the identity of X_0 and commuting with the structural maps d_0 and d_1.]

where the right hand side part is a groupoid as well. Now Eq[d_i] = φ_1⁻¹(Eq[d_i]), so that X is underlying a groupoid structure by the point (d) in Proposition 30.
Whence another characterization theorem:

Theorem 35 [7] Given any finitely complete category E, the following conditions are equivalent:
1) E is a Mal’tsev category;
2) the forgetful functor W_E : Grpd(E) → RG(E) is saturated on subobjects, namely any subobject n : X ↣ W_E(Y) in RG(E) is the image, up to isomorphism, of a monomorphism m : X ↣ Y in Grpd(E).

Proof [1) ⇒ 2)] is a direct consequence of the previous lemma. Suppose 2) and start with a reflexive relation R on X. Then condition 2) applied to the inclusion R ↣ ∇_X in RG(E) makes R an equivalence relation.

Internal categories

Now, what are the internal categories in a Mal’tsev category? The answer is given by the following result, which provides us with another characterization of internal groupoids.

Proposition 36 [24, 14] Let E be a Mal’tsev category and

[Diagram: X_1 ⇉ X_0 with structural maps d_0, d_1 : X_1 → X_0 and s_0 : X_0 → X_1.]

an internal reflexive graph. The following conditions are equivalent:
1) the following subobjects commute in the fibre Pt_{X_0} E:

[Diagram: the monomorphisms (d_0, 1_{X_1}) : X_1 ↣ X_0 × X_1 and (d_1, 1_{X_1}) : X_1 ↣ X_0 × X_1, between the split epimorphisms (d_0, s_0) : X_1 ⇄ X_0, (p_{X_0}, (1_{X_0}, s_0)) : X_0 × X_1 ⇄ X_0 and (d_1, s_0) : X_1 ⇄ X_0.]

2) this reflexive graph is underlying an internal category;
3) this reflexive graph is underlying an internal groupoid.

Proof The two subobjects commute in Pt_{X_0} E if and only if they have a cooperator φ : X_1 ×_{X_0} X_1 → X_0 × X_1, i.e. a morphism satisfying φ · s_0 = (d_1, 1_{X_1}), φ · s_1 = (d_0, 1_{X_1}), p_{X_0} · φ = d_0 d_2 and φ · s_1 s_0 = (1_{X_0}, s_0):
[Diagram: the pullback X_1 ×_{X_0} X_1 with projections d_2 and d_0 to X_1 and their sections s_0 and s_1, the two subobjects (d_0, 1_{X_1}) and (d_1, 1_{X_1}) of X_0 × X_1, and the cooperator φ : X_1 ×_{X_0} X_1 → X_0 × X_1, all over X_0.]

where the whole quadrangle is the pullback which defines the internal object X_1 ×_{X_0} X_1 of composable pairs of the reflexive graph. So the morphism φ is necessarily a pair of the form (d_0 d_2, d_1), where d_1 : X_1 ×_{X_0} X_1 → X_1 is such that d_1 s_0 = 1_{X_1}, d_1 s_1 = 1_{X_1}. The incidence axioms d_0 d_1 = d_0 d_0, d_1 d_1 = d_1 d_2 come by composition with the upper jointly extremally epic pair (s_0, s_1). Accordingly this map d_1 produces a composition for the composable pairs. Let us set X_2 = X_1 ×_{X_0} X_1. In order to check the associativity we need the following pullback, which defines X_3 as the internal object of ‘triples of composable morphisms’:

[Diagram: the pullback square defining X_3, with split epimorphisms (d_0, s_0) and (d_3, s_2) from X_3 to X_2 over the split epimorphisms (d_0, s_0) and (d_2, s_1) from X_2 to X_1.]

The composition map d_1 induces a unique couple of maps (d_1, d_2) : X_3 ⇉ X_2 such that d_0 d_1 = d_0 d_0, d_2 d_1 = d_1 d_3 and d_0 d_2 = d_1 d_0, d_2 d_2 = d_2 d_3. The associativity axiom is given by the remaining simplicial axiom:

d_1 d_1 = d_1 d_2. (3)

The checking of this axiom comes with composition with the pair (s_0, s_2) of the previous diagram, since it is jointly extremally epic as well. Conversely, the composition morphism d_1 : X_1 ×_{X_0} X_1 → X_1 of an internal category satisfies d_1 s_0 = 1_{X_1}, d_1 s_1 = 1_{X_1} and consequently produces the cooperator φ = (d_0 d_2, d_1). Whence 1) ⇔ 2). It is clear that 3) ⇒ 2). It remains to check 2) ⇒ 3). Starting with an internal category, consider the following diagram in the fibre Pt_{X_0} E:

[Diagram: the map (d_0, d_1) : X_1 ×_{X_0} X_1 → Eq[d_0] together with the sections s_0 and s_1, over the split epimorphism (d_0, s_0) : X_1 ⇄ X_0.]

First let us show that (d_0, d_1) is a monomorphism, namely that the composition is right cancelable. So, consider the relation H on (Eq[d_0] ∩ Eq[d_1]) × X_1 (where Eq[d_0] ∩ Eq[d_1] is the object of ‘parallel maps’ in the internal category) defined by (α, β)Hγ if d_0(α) = d_1(γ) and α · γ = β · γ. The following diagram shows that (α, β) H 1_{d_1(γ)}, namely that α = β, as soon as α · γ = β · γ:

[Diagram: the relations (α, β) H γ, (1_{d_1(γ)}, 1_{d_1(γ)}) H γ and (1_{d_1(γ)}, 1_{d_1(γ)}) H 1_{d_1(γ)}, together with the induced relation (α, β) H 1_{d_1(γ)}.]

Accordingly (d_0, d_1) : X_1 ×_{X_0} X_1 ↣ Eq[d_0] produces a reflexive (due to the s_0) and right punctual (commutation of the s_1) relation on the object (d_0, s_0) : X_1 ⇄ X_0 in the strongly unital fibre Pt_{X_0} E. Accordingly it is an isomorphism and the internal category is an internal groupoid.
3.4 Base-change characterization

The next characterization deals with the base-change functor along split epimorphisms with respect to the fibration of points.

Definition 37 Given a split epimorphic pair of functors (T, G) : E ⇄ E′ (T ∘ G = 1_{E′}), this pair is called correlated (resp. strongly correlated) on monomorphisms when, given any monomorphism m : Z ↣ G(X) in E, the morphism GT(m) : GT(Z) → G(X) factorizes through m (resp. factorizes through m via an isomorphism).

Lemma 38 Given a split epimorphic pair of functors (T, G), if it is correlated on monomorphisms, then any monomorphism m : Z ↣ G(X) such that T(m) is an isomorphism is itself an isomorphism. When E is finitely complete and T is left exact, we have the converse.

Proof Suppose the pair is correlated and T(m) an isomorphism. Then, so is GT(m) and the monomorphism m is a split epimorphism as well. Accordingly it is an isomorphism. When T is left exact and m : Z ↣ G(X) a monomorphism, the map T(m) is a monomorphism as well. Now consider the following pullback in E:

[Diagram: the pullback square with top map P → Z, left vertical map m̄ : P ↣ GT(Z), right vertical map m : Z ↣ G(X) and bottom map GT(m) : GT(Z) → G(X).]

It is preserved by T so that T(m̄) is an isomorphism. Under our assumption, so is m̄; and GT(m) factorizes through m.

If f : Y → X is a morphism in a finitely complete category E, we denote by f* : Pt_X E → Pt_Y E the base-change functor obtained by pullback along f. If E is pointed, we denote by α_X (resp. τ_X) the unique morphism 0 → X (resp. X → 0) for any object X. From the above lemma, it is easy to check that, given a finitely complete pointed category E, it is unital if and only if for any object X the split epimorphic pair of base-change functors (α_X*, τ_X*) : Pt_X E ⇄ Pt_0 E ≅ E is correlated on monomorphisms. A bit more difficult is the same characterization dealing with strongly unital categories and strongly correlated pairs (α_X*, τ_X*); for that see [7] and the second assertion of the following:

Proposition 39 Let (T, G) : E ⇄ E′ be a split epimorphic pair of functors with E finitely complete and T left exact. If it is correlated on monomorphisms, then the faithful functor G is full as well. It is strongly correlated if and only if the functor G is saturated on subobjects.

Proof Suppose we have a map f : G(U) → G(V), then take the equalizer j : W ↣ G(U) of the pair (f, GT(f)). The functor T being left exact, its image T(j) is an isomorphism. Accordingly j is itself an isomorphism, and f = GT(f). So, G is full. When the pair (T, G) is strongly correlated and m : Z ↣ G(X) is a monomorphism, then n = T(m) is the monomorphism whose image by G is isomorphic to m. Conversely, if G is saturated on subobjects, starting with a monomorphism m : Z ↣ G(X), denote by γ : Z → G(W) the isomorphism such that m ≅ G(n) for a monomorphism n : W ↣ X. Then the map γ⁻¹ · GT(γ) : GT(Z) → Z produces the desired isomorphism making the pair (T, G) strongly correlated.

Whence, now, the third characterization:

Theorem 40 [7] Let E be a finitely complete category. It is Mal’tsev if and only if, given any split epimorphism (f, s) in E, the (left exact) base-change functor f* with respect to the fibration of points is saturated on subobjects (and consequently full).
Proof By Theorem 23, the category E is Mal’tsev if and only if any fibre Pt_Y E is strongly unital, which is the case if and only if, for any split epimorphism (f, s), the pair (s*, f*) is strongly correlated. According to the last proposition, this is the case if and only if f* is saturated on subobjects.

This last characterization is also important because, in the regular context, we shall show that it can be extended to any regular epimorphism f in E (see Theorem 52).
4 Stiffly and naturally Mal’tsev categories

There are obviously two extremal situations satisfied by Mal’tsev categories: any equivalence relations R and S on a same object centralize each other, and the only pairs (R, S) of equivalence relations centralizing each other are the ones such that R ∩ S = Δ_X. The following notion was introduced by P. Johnstone in [47] (more precisely, he defined this notion via the equivalent formulation 2 in Theorem 45 below):

Definition 41 [47] A Mal’tsev category is naturally Mal’tsev if [R, S] = 0 for any equivalence relations R and S on a same object.

Any finitely complete additive category is naturally Mal’tsev. So is the subcategory Aff(E) of any Mal’tsev category E.

Definition 42 We say that a Mal’tsev category is stiffly Mal’tsev if for any equivalence relations R and S on a same object, [R, S] = 0 if and only if R ∩ S = Δ_X.

The categories BoRg of boolean rings and CNRg of commutative von Neumann regular rings are examples of stiffly Mal’tsev categories. The variety Heyt of Heyting algebras being a stiffly Mal’tsev category, so are the dual Set^op of the category of sets, and more generally the dual E^op of any elementary topos E in view of the left exact conservative functor E^op → Heyt(E) [4]. Clearly the two notions are stable under slicing and coslicing. For the next respective characterizations, we need the following:

Definition 43 A linear category is a unital category where any object is commutative. A stiffly unital category is a unital category in which the 0 object is the unique commutative object.

When E is unital, the subcategory Com(E) is linear. The category BoSRg of boolean semi-rings is stiffly unital.

Theorem 44 [4] Given any Mal’tsev category E, the following statements are equivalent:
1) E is a stiffly Mal’tsev category;
2) the only internal groupoids are the equivalence relations;
3) any fibre Pt_Y E is stiffly unital;
4) the only abelian equivalence relations are the discrete ones.

Proof 1) ⇒ 2): Consider any internal groupoid X_1 ⇉ X_0, with structural maps d_0, d_1 : X_1 → X_0 and s_0 : X_0 → X_1. We have [Eq[d_0], Eq[d_1]] = 0 and thus Eq[d_0] ∩ Eq[d_1] = Δ_{X_1}. So, (d_0, d_1) : X_1 ↣ X_0 × X_0 is a monomorphism, and the groupoid is an equivalence relation.
2) ⇒ 3): A split epimorphism (f, s) : X ⇄ Y is a commutative object in Pt_Y E when it is endowed with a monoid structure in this fibre, namely when it is an internal category structure with d_0 = f = d_1. According to Proposition 36 it is a groupoid. So (f, f) and thus f are monomorphisms. Being a split epimorphism as well, f is an isomorphism, i.e. a 0 object in Pt_Y E.
3) ⇒ 4): Given any equivalence relation R on X, we have [R, R] = 0 if and only if the split epimorphism (d_0^R, s_0^R) : R ⇄ X is endowed with a commutative monoid structure in the fibre Pt_X E. So, d_0^R is an isomorphism and we get R ≅ Δ_X.
4) ⇒ 1): If we have [R, S] = 0, we get [R ∩ S, R ∩ S] = 0; so R ∩ S = Δ_X and we get 1).

Theorem 45 [47, 7] Given any Mal’tsev category E, the following statements are equivalent:
1) E is a naturally Mal’tsev category;
2) any object X is endowed with a natural Mal’tsev operation p_X;
3) any fibre Pt_Y E is linear;
4) any fibre Pt_Y E is additive;
5) any reflexive graph is endowed with a groupoid structure.

Proof We have [1) ⇔ 2)] with the connector p_X : X × X × X → X associated with the centralization [∇_X, ∇_X] = 0, see Proposition 32. Now, [1) ⇒ 5)] is straightforward. We get [5) ⇒ 4)] since any split epimorphism (f, s) : X ⇄ Y is a particular reflexive graph, which, by 5), gives to (f, s) an internal group structure in the fibre Pt_Y E. Now [4) ⇒ 3)] is a consequence of the fact that any finitely complete additive category is linear. Suppose 3) and consider the split epimorphism (p_0^Y, s_0^Y) : Y × Y ⇄ Y. It has a monoid structure in the fibre Pt_Y E, which is a ternary operation p_Y : Y × Y × Y → Y in E. It satisfies the unit axioms which, for p_Y, turn out to be exactly the two Mal’tsev axioms in E.
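Condition 2) of Theorem 45 asks the Mal’tsev operation p_X to be a morphism of the category, not just a function. The sketch below illustrates the difference on two small examples of our own choosing (they are assumptions, not taken from the text): in the abelian group Z/6 the operation p(x, y, z) = x − y + z is a group homomorphism from the third power, in line with the fact that additive categories are naturally Mal’tsev, whereas in the non-abelian group S3 the term xy⁻¹z fails to be a homomorphism. All names are hypothetical.

```python
from itertools import permutations, product

# --- The abelian group Z/6, written additively ---
Z6 = range(6)

def p_abelian(x, y, z):
    """Mal'tsev operation p(x, y, z) = x - y + z in Z/6."""
    return (x - y + z) % 6

def p_abelian_is_homomorphism():
    """Check p(a + b) = p(a) + p(b) componentwise on (Z/6)^3."""
    return all(
        p_abelian((x1 + x2) % 6, (y1 + y2) % 6, (z1 + z2) % 6)
        == (p_abelian(x1, y1, z1) + p_abelian(x2, y2, z2)) % 6
        for x1, y1, z1, x2, y2, z2 in product(Z6, repeat=6))

# --- The non-abelian group S3, permutations encoded as tuples ---
S3 = list(permutations(range(3)))

def compose(g, h):
    """Return the composite g∘h (apply h first, then g)."""
    return tuple(g[h[i]] for i in range(3))

def inverse(g):
    """Return the inverse permutation of g."""
    inv = [0, 0, 0]
    for i, image in enumerate(g):
        inv[image] = i
    return tuple(inv)

def p_group(x, y, z):
    """Mal'tsev term p(x, y, z) = x y^{-1} z in S3."""
    return compose(compose(x, inverse(y)), z)

def p_group_is_homomorphism():
    """Check whether p preserves componentwise products on S3^3."""
    return all(
        p_group(compose(x1, x2), compose(y1, y2), compose(z1, z2))
        == compose(p_group(x1, y1, z1), p_group(x2, y2, z2))
        for x1, y1, z1, x2, y2, z2 in product(S3, repeat=6))

if __name__ == "__main__":
    print(p_abelian_is_homomorphism(), p_group_is_homomorphism())
```

The failure in S3 can be seen by hand: taking x1, y1, y2 and z2 to be the identity reduces the homomorphism condition to x2 z1 = z1 x2, which requires commutativity.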
5 Regular Mal’tsev categories

An equivalence relation is called effective when it is the kernel pair of some morphism. A map f is said to be a regular epimorphism when f is the coequalizer of two parallel arrows in E. Recall from [1]:

Definition 46 A finitely complete category E is regular when:
(a) regular epimorphisms are stable under pullbacks;
(b) any effective equivalence relation Eq[f] has a coequalizer.
It is exact when, in addition:
(c) any equivalence relation in E is effective.

Any variety V of universal algebras is an exact category. Given any map f in a regular category, the quotient q_f of the kernel equivalence relation Eq[f] produces a canonical decomposition f = m · q_f where m is a monomorphism [1]. Given any regular epimorphism g : X ↠ Y and any equivalence relation R on X, the canonical decomposition of the composite of R ↣ X × X with g × g : X × X ↠ Y × Y produces a reflexive relation S on Y. When E is a Mal’tsev category, it is an equivalence relation we shall denote by g(R). Now we have the following:
Proposition 47 [17] Let E be a regular Mal’tsev category and g : X ↠ Y a regular epimorphism. If (R, S) is a pair of centralizing equivalence relations on X, then the equivalence relations g(R) and g(S) also centralize each other. In particular, when X is an affine object, so is Y.

When, moreover, the category E is finitely cocomplete, we can produce the commutator of any pair of equivalence relations:

Proposition 48 [11, 4] Let E be a finitely cocomplete regular Mal’tsev category. If (R, S) is a pair of equivalence relations on X, there is a universal regular epimorphism ψ : X ↠ Y such that we get [ψ(R), ψ(S)] = 0. In particular the inclusion Aff(E) ↪ E has a left adjoint.

Remark 49 Given two equivalence relations R and S on a same object in a finitely cocomplete regular Mal’tsev category, the commutator [R, S] of R and S can be defined as the kernel equivalence relation Eq[ψ] of the morphism ψ : X ↠ Y from the above proposition.

Remark 50 The Mal’tsev context being the right conceptual one to deal with the notion of centrality of equivalence relations, it is not unexpected to observe that it is also the right context to deal with nilpotency [3].

In the regular context we get the following observations and characterization:

Lemma 51 Let E be a regular Mal’tsev category and the following diagram be any pullback of a split epimorphism f along a regular epimorphism q:

[Diagram: a pullback square with horizontal regular epimorphisms q′ : X′ ↠ X and q : Y′ ↠ Y and vertical split epimorphisms (f′, s′) : X′ ⇄ Y′ and (f, s) : X ⇄ Y.]

Then the upward square is a pushout.

Proof Consider any pair (φ, σ) of morphisms such that φ · s′ = σ · q (∗):

[Diagram: the previous square extended on the left by the kernel pairs (d_0^{q′}, d_1^{q′}) : Eq[q′] ⇉ X′ and (d_0^q, d_1^q) : Eq[q] ⇉ Y′ with the induced split epimorphism (Eq(f′), Eq(s′)) : Eq[q′] ⇄ Eq[q], and on the right by the maps φ : X′ → T and σ : Y → T.]

and complete the diagram by the kernel pairs Eq[q] and Eq[q′], which produce the left hand side pullbacks. The morphism q′ being a regular epimorphism, it is the quotient of its kernel pair Eq[q′]. We shall obtain the desired factorization X → T by showing that φ coequalizes the pair (d_0^{q′}, d_1^{q′}). The left hand side squares being pullbacks, this can be done by composition with the jointly extremally epic pair (Eq(s′), s_0^{q′}), with s_0^{q′} : X′ ↣ Eq[q′] the diagonal giving the reflexivity of Eq[q′]. This is trivial for the composition with s_0^{q′}, and a consequence of the equality (∗) for the composition with Eq(s′).

Theorem 52 [7] Given any regular category E, it is a Mal’tsev category if and only if any base-change functor q* with respect to the fibration of points along a regular epimorphism q is fully faithful and saturated on subobjects.

Proof Any split epimorphism being a regular one, the condition above implies that E is a Mal’tsev category thanks to Theorem 40. Let us show the converse. First notice that in any regular category, and given any regular epimorphism q, the base-change q* is necessarily faithful. Suppose, in addition, that E is a Mal’tsev category.
1) The functor q* is full. Consider the following diagram:
[Diagram: the split epimorphisms (f, s) : X ⇄ Y and (f̄, s̄) : X̄ ⇄ Y, their pullbacks (f′, s′) : X′ ⇄ Y′ and (f̄′, s̄′) : X̄′ ⇄ Y′ along q : Y′ ↠ Y, the induced regular epimorphisms q′ : X′ ↠ X and q̄′ : X̄′ ↠ X̄, and the morphism m′ : X′ → X̄′.]

where the downward squares are pullbacks and m′ a morphism in Pt_{Y′} E. According to the previous lemma the upward vertical square is a pushout; whence a unique map m : X → X̄ such that m · q′ = q̄′ · m′ and m · s = s̄; we get also f̄ · m = f since q′ is a regular epimorphism, and m is a map in the fibre Pt_Y E such that q*(m) = m′.
2) The functor q* is saturated on subobjects. First, any base-change functor g*, being left exact, preserves monomorphisms. Consider now the following diagram where the right hand side quadrangle is a pullback and m′ is a monomorphism in Pt_{Y′} E:

[Diagram: the monomorphism m′ : X′ ↣ X̄′ over Y′, the pullback of (f̄, s̄) : X̄ ⇄ Y along q : Y′ ↠ Y with q̄′ : X̄′ ↠ X̄, the kernel pairs (δ_0, δ_1) : Eq[q̄′ · m′] ⇉ X′, Eq[q̄′] ⇉ X̄′ and Eq[q] ⇉ Y′, and the induced maps Eq(m′), Eq(f′) and Eq(s′).]

Complete the diagram with the kernel pair Eq[q̄′ · m′]. The factorization Eq(m′) is a monomorphism. In the Mal’tsev context, this implies that any of the left hand side commutative squares is a pullback: indeed, since Eq(m′) is a monomorphism, it is also the case for the factorization τ of the left hand side square indexed by 0 to the pullback of (f′, s′) along the split epimorphism (d_0^q, s_0^q); but it is an extremal epimorphism as well, since E is a Mal’tsev category, by Condition 1 in Theorem 23; so it is an isomorphism. So, the following downward left hand side diagram is underlying a discrete fibration between equivalence relations. Now, denote by q′ the quotient of the effective relation Eq[q̄′ · m′], and by (f, s) the induced split epimorphism.
[Diagram: the discrete fibration (Eq(f′), Eq(s′)) : Eq[q̄′ · m′] ⇄ Eq[q] over (f′, s′) : X′ ⇄ Y′, with the quotients q′ : X′ ↠ X and q : Y′ ↠ Y and the induced split epimorphism (f, s) : X ⇄ Y.]

By the so-called Barr-Kock Theorem [1, 18], the right hand side square is a pullback in the regular category E. Since q* is full, m′ determines a factorization n : X → X̄ in the fibre Pt_Y E:

[Diagram: the two previous diagrams combined, together with the factorization n : X → X̄.]

The upper right hand side quadrangle is a pullback since the two other right hand side commutative squares are so. Accordingly we get m′ = q*(n), and n is a monomorphism since pulling back along regular epimorphisms reflects monomorphisms [1].

From Corollary 24, we get another characterization:

Corollary 53 A regular category E is Mal’tsev if and only if any morphism in Pt(E) with horizontal regular epimorphisms
XO f
O s Y
//X f
y
O //Y
O s
is a regular pushout, namely such that the factorization from X to the pullback of f along y is a regular epimorphism. Proof Clearly if this condition holds, it holds in particular for horizontal split epimorphisms. Then the conclusion is given by Corollary 24. Conversely, suppose E is a regular Mal’tsev category, and complete the square by the horizontal kernel equivalence relations:
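[Diagram: the square above completed by the horizontal kernel equivalence relations Eq[x] (projections $d_0^x$, $d_1^x$) and Eq[y] (projections $d_0^y$, $d_1^y$), with the induced maps Eq(f) and Eq(s), the pullbacks P̄ and P with their projections δ0, δ1 and the map q̄, and the factorizations χ and ψ.]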
Then denote by ψ (resp. χ) the factorization from X to the pullback of f along y (resp. from Eq[x] to the pullback of f along $d_0^y$). The map χ is a regular epimorphism according to Corollary 24. Moreover the quadrangle x · δ0 = q̄ · δ1 is a pullback, and, E being regular, the factorization δ1 is a regular epimorphism, since so is x. Then the equality $\psi \cdot d_1^x = \delta_1 \cdot \chi$ shows that ψ is a regular epimorphism, since so is δ1 · χ.

As mentioned in the first section, in the case of a variety V of universal algebras, the Mal’tsev property can be expressed by a ternary term p(x, y, z) satisfying the identities p(x, y, y) = x and p(x, x, y) = y [54]. We shall prove the existence of such a term by adopting a categorical approach, first considered in [23], based on an interpretation of a suitable regular pushout lying in the full subcategory of free algebras.

Theorem 54 A variety V of universal algebras is a Mal’tsev category if and only if its algebraic theory has a ternary term p satisfying the identities p(x, y, y) = x and p(x, x, y) = y.

Proof In Theorem 1 it is shown that the existence of the Mal’tsev term p implies that the variety is a Mal’tsev category. Conversely, assume that V is a Mal’tsev variety, and denote by X the free algebra on one element, by X + X the free algebra on two elements, and by X + X + X the free algebra on three elements. If ∇ : X + X → X is the codiagonal, then the following diagram commutes:

[Diagram: commutative square with ∇ + 1X : X + X + X → X + X (top), 1X + ∇ : X + X + X → X + X (left), and ∇ : X + X → X (right and bottom).]
This diagram is clearly a regular pushout by Corollary 53, so that the canonical factorization α : X + X + X → Eq[∇] to the kernel pair of ∇ in V is a regular epimorphism, i.e. a surjective homomorphism. We can then choose the element (q, r) ∈ Eq[∇], where q(x, y) = x and r(x, y) = y, and we know that there is a ternary term p(x, y, z) ∈ X + X + X with α(p) = (q, r). Consider then the following commutative diagram

[Diagram: the maps 1X + ∇ : X + X + X → X + X and ∇ + 1X : X + X + X → X + X factor through α : X + X + X → Eq[∇] as p0 · α and p1 · α, respectively.]
where p0 and p1 are the projections of the kernel pair Eq[∇]. When applied to the term p, its commutativity exactly expresses the announced identities p(x, y, y) = x and p(x, x, y) = y for the term p. Remark 55 The categorical notion of regular pushout, introduced in full generality in [10] in relationship with the 3 × 3 Lemma, is also related to the notion of double extension [41], that was first considered by G. Janelidze in the category of groups. This latter notion has turned out to play a central role in the theory of (higher) central extensions of an exact Mal’tsev category. Indeed, the possibility of inductively defining higher dimensional categorical Galois structures starting from a Birkhoff reflective subcategory of an exact Mal’tsev category also depends on the existence of double extensions and their higher versions (see [42, 27, 26, 25] and the references therein). For instance, the higher homology of groups, compact groups and crossed modules can be better understood from this categorical perspective, and many new computations can be made thanks to the characterizations of the higher central extensions relative to the higher dimensional Galois structures. Remark 56 The essence of the definition of regular categories is to capture the categorical properties of Set which concern finite limits and regular epimorphisms. This has been formalized by Barr’s embedding theorem [2] which claims that for any small regular category E there exists a fully faithful left exact embedding into a presheaf category E → SetC which preserves the regular epimorphisms. Since in a presheaf category limits and quotients are computed componentwise, with this embedding theorem it is enough to prove some statements about finite limits and regular epimorphisms in Set (i.e. using elements) in order to prove it in full generality for any regular category, see [4] for more details. This embedding theorem has been extended to the regular Mal’tsev case in [36]. 
An essentially algebraic (i.e. locally presentable) regular Mal’tsev category M is constructed such that any small regular Mal’tsev category E admits a conservative left exact embedding E → MC which preserves the regular epimorphisms. This category M is constructed via some partial operations and ‘approximate Mal’tsev operations’ [20]. In the
same way as with Barr’s embedding theorem, one can now reduce the proof of some statements about finite limits and regular epimorphisms in any regular Mal’tsev category to the particular case of M and thus use elements and (approximate) Mal’tsev operations. Similar embedding theorems also hold in the regular unital and strongly unital case, see [37, 35]. Using partial Mal’tsev operations, one also has an embedding theorem for (not necessarily regular) Mal’tsev categories [38, 35].

We now observe that, in any exact Mal’tsev category E, the category Cat(E) = Grpd(E) of internal categories (= internal groupoids) and internal functors inherits the exactness property from the base category E. The category Catⁿ(E) of n-fold internal categories is defined by induction by Cat¹(E) = Cat(E) and Catⁿ⁺¹(E) = Cat(Catⁿ(E)) for n ≥ 1.

Theorem 57 [29] Let E be an exact Mal’tsev category. Then:
1. the category Cat(E) of internal categories in E is exact Mal’tsev;
2. the category Catⁿ(E) of n-fold internal categories in E is exact Mal’tsev, for any n ≥ 1.

Proof 1. As shown in [24] the category Cat(E) is a full subcategory of the category RG(E) of reflexive graphs in E (Lemma 34). Next, given any internal functor (f0, f1) : X → Y in Cat(E)

[Diagram: the internal functor (f0, f1), with f2 : X1 ×X0 X1 → Y1 ×Y0 Y1 on the objects of composable pairs (maps p0, m, p1), f1 : X1 → Y1 on the objects of arrows (maps d0, d1), and f0 : X0 → Y0 on the objects of objects.]
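it has a canonical factorization in the category RG(E) of reflexive graphs as

[Diagram: the reflexive graphs X1 ⇉ X0, I1 ⇉ I0 and Y1 ⇉ Y0 (each with structure maps d0, d1), connected by the regular epimorphisms q1 : X1 ↠ I1, q0 : X0 ↠ I0 and the monomorphisms i1 : I1 ↣ Y1, i0 : I0 ↣ Y0.]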
where f0 = i0 · q0 and f1 = i1 · q1 are the (regular epi)-mono factorizations of f0 and of f1 in E, respectively. The induced reflexive graph I in the middle of the diagram above is underlying a groupoid structure (by Lemma 34, for instance), and the factorization above is then the (regular epi)-mono factorization in Cat(E) of the internal functor (f0 , f1 ). These factorizations are clearly pullback stable in Cat(E), since regular epimorphisms in E are pullback stable by assumption. One then checks that
any internal equivalence relation in Cat(E) is a kernel pair (see Theorem 3.2 in [29]) to conclude that Cat(E) is an exact category. The fact that Cat(E) is a Mal’tsev category immediately follows from Lemma 15 and the fact that the forgetful functor Cat(E) → RG(E) to the Mal’tsev category RG(E) is left exact and conservative. 2. By induction, this follows immediately from the first part of the proof. This result shows a difference with the case of a general exact category E, for which the category Cat(E) is not even regular, in general. For instance, the ordinary category Cat(Set) = Cat of small categories and functors is not regular (the same can be said for the category of small groupoids). Let us conclude this section by mentioning the important result in [21] asserting that a regular category C is a Mal’tsev category if and only if any simplicial object in C is an internal Kan complex.
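Returning to Theorem 54, the two identities defining a Mal’tsev term can be checked mechanically on a small algebra. The sketch below is only an illustration under stated assumptions: it takes the cyclic group Z_6 (the modulus 6 and all function names are choices made here, not taken from the text) with the classical group-theoretic term p(x, y, z) = x − y + z, and contrasts it with a ternary projection, which fails the second identity.

```python
from itertools import product


def satisfies_maltsev_identities(p, carrier):
    """Return True when p(x, y, y) == x and p(x, x, y) == y on the whole carrier."""
    return all(p(x, y, y) == x and p(x, x, y) == y
               for x, y in product(carrier, repeat=2))


N = 6                # illustrative modulus (an assumption of this sketch)
CARRIER = range(N)


def group_term(x, y, z):
    """The term p(x, y, z) = x - y + z computed in the cyclic group Z_N."""
    return (x - y + z) % N


def first_projection(x, y, z):
    """A ternary term ignoring y and z; it violates p(x, x, y) = y."""
    return x


group_ok = satisfies_maltsev_identities(group_term, CARRIER)
projection_ok = satisfies_maltsev_identities(first_projection, CARRIER)
print(group_ok, projection_ok)
```

The brute-force check only covers one finite algebra; it illustrates the identities of Theorem 54 and is not a substitute for the categorical proof.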
6 Regular Mal’tsev categories and the calculus of relations

The aim of this section is to briefly recall the calculus of relations in a regular category and present some instances of its usefulness in the context of Mal’tsev and Goursat categories [22, 21]. We shall also give a categorical result concerning the direct product decomposition of an object coming from universal algebra.

Given a relation $\langle r_0, r_1\rangle : R \rightarrowtail X \times Y$, its opposite relation $R^{\circ}$ is the relation from Y to X given by the subobject $\langle r_1, r_0\rangle : R \rightarrowtail Y \times X$. Given two relations $\langle r_0, r_1\rangle : R \rightarrowtail X \times Y$ and $\langle s_0, s_1\rangle : S \rightarrowtail Y \times Z$ in a regular category, their composite $SR = S \circ R \rightarrowtail X \times Z$ can be defined as follows: take the pullback

[Diagram: pullback square with projections $\pi_0 : R \times_Y S \to R$ and $\pi_1 : R \times_Y S \to S$ over $r_1 : R \to Y$ and $s_0 : S \to Y$.]

of $r_1$ and $s_0$, and the (regular epi)-mono factorization of $\langle r_0 \pi_0, s_1 \pi_1\rangle$:

\[ R \times_Y S \twoheadrightarrow S \circ R \rightarrowtail X \times Z. \]

The composite $S \circ R$ is defined as the relation $S \circ R \rightarrowtail X \times Z$ in the diagram above. Note that the transitivity of a relation R on an object X can be expressed by the inequality $R \circ R \leq R$, and the symmetry by the inequality $R^{\circ} \leq R$ (or, equivalently, by $R^{\circ} = R$).
In the following, given an arrow $f : A \to B$ in E, we shall identify it with the relation $\langle 1_A, f\rangle : A \rightarrowtail A \times B$ representing its graph. For any arrow $f : A \to B$ the corresponding relation is difunctional:

\[ f \circ f^{\circ} \circ f = f. \]

Note also that $f \circ f^{\circ} = 1_B$ if and only if f is a regular epimorphism, while $f^{\circ} \circ f = \mathrm{Eq}[f]$. Finally, with the notations we have introduced, any relation $\langle r_0, r_1\rangle : R \rightarrowtail X \times Y$ can be written as the composite $R = r_1 \circ r_0^{\circ}$.

Theorem 58 [55, 22] For a regular category E, the following conditions are equivalent:
(M1) for any pair of equivalence relations R and S on any object X in E, $S \circ R = R \circ S$;
(M3) any relation U from X to Y is difunctional;
(M4) E is a Mal’tsev category;
(M5) any reflexive relation R on any object X in E is symmetric;
(M6) any reflexive relation R on any object X in E is transitive.

Proof (M1) ⇒ (M3) As observed above, any relation U from X to Y, with projections $u_0 : U \to X$ and $u_1 : U \to Y$, can be written as $U = u_1 \circ u_0^{\circ}$. The assumption implies in particular that the kernel pairs $\mathrm{Eq}[u_0]$ and $\mathrm{Eq}[u_1]$ of the projections commute in the sense of the composition of relations (on the object U):

\[ (u_1^{\circ} \circ u_1) \circ (u_0^{\circ} \circ u_0) = (u_0^{\circ} \circ u_0) \circ (u_1^{\circ} \circ u_1). \]

Accordingly, by keeping in mind that the relations $u_0$ and $u_1$ are difunctional:

\begin{align*}
U = u_1 \circ u_0^{\circ} &= (u_1 \circ u_1^{\circ} \circ u_1) \circ (u_0^{\circ} \circ u_0 \circ u_0^{\circ}) \\
&= u_1 \circ (u_1^{\circ} \circ u_1) \circ (u_0^{\circ} \circ u_0) \circ u_0^{\circ} \\
&= u_1 \circ (u_0^{\circ} \circ u_0) \circ (u_1^{\circ} \circ u_1) \circ u_0^{\circ} \\
&= (u_1 \circ u_0^{\circ}) \circ (u_0 \circ u_1^{\circ}) \circ (u_1 \circ u_0^{\circ}) \\
&= U \circ U^{\circ} \circ U.
\end{align*}

(M3) ⇒ (M4) This appears already as Theorem 23. Using the calculus of relations, we can proceed as follows. Let $(U, u_0, u_1)$ be a reflexive relation on
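an object X, so that $1_X \leq U$. By difunctionality we have:

\[ U^{\circ} = 1_X \circ U^{\circ} \circ 1_X \leq U \circ U^{\circ} \circ U = U, \]

showing that U is symmetric. On the other hand:

\[ U \circ U = U \circ 1_X \circ U \leq U \circ U^{\circ} \circ U = U, \]

and U is transitive.

(M4) ⇒ (M5) Clear.

(M5) ⇒ (M3) Let $(U, u_0, u_1)$ be a relation from X to Y. The relation $u_0^{\circ} \circ u_0 \circ u_1^{\circ} \circ u_1$ is reflexive, since both the kernel pairs $u_0^{\circ} \circ u_0$ and $u_1^{\circ} \circ u_1$ are reflexive. By assumption the relation $u_0^{\circ} \circ u_0 \circ u_1^{\circ} \circ u_1$ is then symmetric:

\[ (u_0^{\circ} \circ u_0 \circ u_1^{\circ} \circ u_1)^{\circ} = u_0^{\circ} \circ u_0 \circ u_1^{\circ} \circ u_1. \]

This implies that

\[ u_1^{\circ} \circ u_1 \circ u_0^{\circ} \circ u_0 = u_0^{\circ} \circ u_0 \circ u_1^{\circ} \circ u_1, \]

and then, by multiplying on the left by $u_1$ and on the right by $u_0^{\circ}$, we get the equality

\[ u_1 \circ u_1^{\circ} \circ u_1 \circ u_0^{\circ} \circ u_0 \circ u_0^{\circ} = u_1 \circ u_0^{\circ} \circ u_0 \circ u_1^{\circ} \circ u_1 \circ u_0^{\circ}. \]

By difunctionality of $u_1$ and $u_0^{\circ}$ it follows that

\[ u_1 \circ u_0^{\circ} = u_1 \circ u_0^{\circ} \circ u_0 \circ u_1^{\circ} \circ u_1 \circ u_0^{\circ}, \]

and then

\[ u_1 \circ u_0^{\circ} = (u_1 \circ u_0^{\circ}) \circ (u_1 \circ u_0^{\circ})^{\circ} \circ (u_1 \circ u_0^{\circ}), \]

showing that $U = u_1 \circ u_0^{\circ}$ is difunctional.

Observe that (M4) ⇒ (M6) is obvious, and let us then prove that (M6) ⇒ (M3). Let $U = u_1 \circ u_0^{\circ}$ be any relation from X to Y. The relation $u_1^{\circ} \circ u_1 \circ u_0^{\circ} \circ u_0$ is reflexive, thus it is transitive by assumption. This gives the equality

\[ (u_1^{\circ} \circ u_1 \circ u_0^{\circ} \circ u_0) \circ (u_1^{\circ} \circ u_1 \circ u_0^{\circ} \circ u_0) = u_1^{\circ} \circ u_1 \circ u_0^{\circ} \circ u_0, \]

yielding

\[ u_1 \circ u_1^{\circ} \circ u_1 \circ u_0^{\circ} \circ u_0 \circ u_1^{\circ} \circ u_1 \circ u_0^{\circ} \circ u_0 \circ u_0^{\circ} = u_1 \circ u_1^{\circ} \circ u_1 \circ u_0^{\circ} \circ u_0 \circ u_0^{\circ}. \]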
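By difunctionality we conclude that

\[ u_1 \circ u_0^{\circ} \circ u_0 \circ u_1^{\circ} \circ u_1 \circ u_0^{\circ} = u_1 \circ u_0^{\circ}, \]

and $U \circ U^{\circ} \circ U = U$.

Finally, to see that (M5) ⇒ (M1), observe that the relation $S \circ R$ is reflexive, and then it is symmetric, so that

\[ S \circ R = (S \circ R)^{\circ} = R^{\circ} \circ S^{\circ} = R \circ S, \]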
concluding the proof.
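The basic rules of the calculus of relations used in this proof can be tested in the category of sets, where relations are sets of pairs and regular epimorphisms are surjections. The following sketch is an illustration only: the helper names (`compose`, `opposite`, `graph`, and so on) and the small sets are assumptions introduced here. It checks that the graph of a map is difunctional, that f ◦ f° is the identity exactly when f is surjective, and that f° ◦ f is the kernel pair; it also exhibits a reflexive relation on a two-element set that is neither symmetric nor difunctional, which is one way to see that the category of sets is not a Mal’tsev category.

```python
from itertools import product


def compose(s, r):
    """Composite S ∘ R (first R, then S) of relations given as sets of pairs."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def opposite(r):
    """Opposite relation: swap the two components of every pair."""
    return {(y, x) for (x, y) in r}


def graph(f, domain):
    """Graph of a function, viewed as a relation from its domain."""
    return {(a, f(a)) for a in domain}


def identity(xs):
    return {(x, x) for x in xs}


def kernel_pair(f, domain):
    return {(a, b) for a, b in product(domain, repeat=2) if f(a) == f(b)}


def is_difunctional(u):
    """Check U ∘ U° ∘ U == U."""
    return compose(u, compose(opposite(u), u)) == u


A, B = range(4), range(3)
onto = graph(lambda a: a % 3, A)        # surjective onto B
not_onto = graph(lambda a: a % 2, A)    # misses the element 2 of B

graphs_difunctional = is_difunctional(onto) and is_difunctional(not_onto)
onto_case = compose(onto, opposite(onto)) == identity(B)
not_onto_case = compose(not_onto, opposite(not_onto)) == identity(B)
kernel_case = compose(opposite(onto), onto) == kernel_pair(lambda a: a % 3, A)

# A reflexive relation on {0, 1} in Set: the usual order 0 <= 1.
order = {(0, 0), (1, 1), (0, 1)}
order_is_symmetric = opposite(order) == order
order_is_difunctional = is_difunctional(order)
```

Here `compose(s, r)` follows the convention of the text, in which S ◦ R means "first R, then S".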
Direct product decompositions

In any regular category E, given two equivalence relations R and S on X such that $R \circ S = S \circ R$, the composite $R \circ S$ is then an equivalence relation: indeed, the relation $R \circ S$ is obviously reflexive, but also symmetric, since

\[ (R \circ S)^{\circ} = S^{\circ} \circ R^{\circ} = S \circ R = R \circ S, \]

and transitive:

\[ (R \circ S) \circ (R \circ S) = R \circ R \circ S \circ S = R \circ S. \]

The equivalence relation $R \circ S$ is then the supremum R ∨ S of R and S as equivalence relations on X. When this is the case, by Proposition 2.3 in [19], the canonical morphism α : R □ S → R ×X S in the following diagram:

[Diagram (1): the canonical morphism α : R □ S → R ×X S into the pullback of $d_1^R : R \to X$ and $d_0^S : S \to X$, whose projections are $p_0^R$ and $p_1^S$, together with the projections p0 and p1 of R □ S.]

from the largest double equivalence relation R □ S on R and S to the pullback R ×X S is a regular epimorphism. We then get the following:

Theorem 59 [19] Let E be an exact category, R and S two equivalence relations on X such that:
• R ∧ S = ΔX;
• R ◦ S = S ◦ R and
• R ∨ S = ∇X.
Then X is isomorphic to X/R × X/S.

Proof The first two assumptions imply that any of the commutative squares on the left hand side in the diagram

[Diagram (2): on the left, the double equivalence relation R □ S with its projections p0, p1, lying over the equivalence relations (r0, r1) : R ⇉ X and (s0, s1) : S ⇉ X; on the right, the quotient qR : X ↠ X/R and the regular epimorphism τ : S ↠ T, with the induced maps t0, t1 : T ⇉ X/R.]

is a pullback (since the canonical morphism R □ S → R ×X S in (1) is both a monomorphism and a regular epimorphism). The right-hand part of the diagram is obtained by taking the quotient X/R of X by the equivalence relation R, and the quotient T of S by the equivalence relation R □ S on S, with t0 and t1 the induced factorizations. The fact that the equivalence relation R □ S on S is the inverse image of the relation R × R on X × X implies that (t0, t1) : T → X/R × X/R is a monomorphism. The relation T actually is an equivalence relation (by Theorem 3 in [5]), and the so-called Barr-Kock theorem [1, 18] implies that the following square is a pullback

[Diagram: commutative square with qR : X ↠ X/R, qS : X ↠ X/S, γ : X/R ↠ Q and β : X/S ↠ Q.]

where γ : X/R → Q is the quotient of X/R by T and β : X/S → Q the unique induced factorization. This square is also a pushout (since τ in the diagram (2) is a regular epimorphism), and in the exact category E this implies that the kernel pair Eq[γ · qR] of γ · qR is the supremum R ∨ S of R and S as (effective) equivalence relation on X. Since R ∨ S = ∇X, we conclude that Q is the quotient of X by ∇X, therefore it is a subobject of the terminal object 1. Accordingly, the following diagram is a pullback

[Diagram: commutative square with qR : X ↠ X/R, qS : X ↠ X/S and the terminal maps X/R → 1 and X/S → 1.]

and X ≅ X/R × X/S, as expected.
Remark 60 In any exact category E, the product of two objects X and Y, with projections p0 : X × Y → X and p1 : X × Y → Y, is such that $\mathrm{Eq}[p_0] \wedge \mathrm{Eq}[p_1] = \Delta_{X \times Y}$, $\mathrm{Eq}[p_0] \circ \mathrm{Eq}[p_1] = \mathrm{Eq}[p_1] \circ \mathrm{Eq}[p_0]$ and $\mathrm{Eq}[p_0] \vee \mathrm{Eq}[p_1] = \nabla_{X \times Y}$. Theorem 59 can then be seen as a kind of converse to this simple observation.

Remark 61 We observe that the assumptions in Theorem 59 are the (categorical formulations of the) properties defining a pair of factor congruences in the sense of universal algebra.

If the base category E is exact Mal’tsev we immediately get the following:

Corollary 62 Let E be an exact Mal’tsev category. Whenever two equivalence relations R and S on the same object X are such that R ∧ S = ΔX and R ∨ S = ∇X, then there is a canonical isomorphism X ≅ X/R × X/S.
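The three hypotheses of Theorem 59 can be checked by hand on a familiar pair of factor congruences. The sketch below is an illustration under assumptions made here: it works with the six-element cyclic group Z_6, represented simply as `range(6)`, and with the congruences modulo 2 and modulo 3 (all names are hypothetical helpers). It verifies that the two relations meet in the diagonal, permute, and compose to the total relation, and that the canonical map to the product of the quotients is a bijection, as the theorem predicts.

```python
from itertools import product


def compose(s, r):
    """Composite S ∘ R (first R, then S) of relations given as sets of pairs."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def kernel_pair(f, domain):
    return {(a, b) for a, b in product(domain, repeat=2) if f(a) == f(b)}


X = range(6)
R = kernel_pair(lambda x: x % 2, X)    # congruence modulo 2
S = kernel_pair(lambda x: x % 3, X)    # congruence modulo 3
DIAGONAL = {(x, x) for x in X}
TOTAL = set(product(X, repeat=2))

meet_is_diagonal = (R & S) == DIAGONAL
relations_permute = compose(R, S) == compose(S, R)
# Since R and S permute, their join is the composite R ∘ S.
join_is_total = compose(R, S) == TOTAL

# Canonical map X -> X/R × X/S, with classes represented by remainders.
canonical_map = {x: (x % 2, x % 3) for x in X}
is_bijection = sorted(canonical_map.values()) == sorted(product(range(2), range(3)))
```

Replacing the second congruence by the congruence modulo 2 again would make the meet strictly larger than the diagonal, and the canonical map would no longer be injective.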
A glance at Goursat categories

We now briefly recall and study some basic properties of Goursat categories. The origin of this important concept definitely goes back to the celebrated article [22] by A. Carboni, J. Lambek and M.C. Pedicchio, although the explicit definition and a first systematic study of Goursat categories were presented later in [21].

Definition 63 A regular category E is a Goursat category if for any two equivalence relations R and S on the same object X in E one has the equality $R \circ S \circ R = S \circ R \circ S$.

Remark that a variety of universal algebras is a Goursat category if and only if it is 3-permutable in the usual sense [33]. Any regular Mal’tsev category is clearly a Goursat category, however the converse is not true: indeed, the variety of implication algebras is an example of a 3-permutable variety, and therefore of an exact Goursat category, that is not 2-permutable [56]. A remarkable categorical property of Goursat categories is that the regular image of any equivalence relation is again an equivalence relation. This is actually characteristic of these categories, as shown in [21]. Here below we give a simple and self-contained proof of this result:

Proposition 64 A regular category E is a Goursat category if and only if for any equivalence relation R on any object X and any regular epimorphism f : X ↠ Y, the regular image f(R) is an equivalence relation.

Proof One implication is direct: in a regular category E the relation f(R) is always reflexive and symmetric, and when E is a Goursat category then it is also transitive:
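\[ f(R) \circ f(R) = (f \circ R \circ f^{\circ}) \circ (f \circ R \circ f^{\circ}) = f \circ f^{\circ} \circ f \circ R \circ f^{\circ} \circ f \circ f^{\circ} = f \circ R \circ f^{\circ} = f(R). \]

Note that the assumption has been used in the second equality, applied to the equivalence relations R and $\mathrm{Eq}[f] = f^{\circ} \circ f$. For the converse, consider two equivalence relations R and S on X. Then the composite $R \circ S \circ R$ can be written as

\[ R \circ S \circ R = r_1 \circ r_0^{\circ} \circ s_1 \circ s_0^{\circ} \circ r_1 \circ r_0^{\circ} = r_1 \circ r_0^{\circ} \circ s_1 \circ s_0^{\circ} \circ r_0 \circ r_1^{\circ} = r_1(r_0^{-1}(S)), \]

as observed in [12]. The assumption then implies that $R \circ S \circ R$ is an equivalence relation, as a direct image of the equivalence relation $r_0^{-1}(S)$ along the split epimorphism $r_1$ (which is then a regular epimorphism). Its transitivity implies that $S \circ R \circ S \leq R \circ S \circ R$, and then $S \circ R \circ S = R \circ S \circ R$.

This characterization and the so-called denormalized 3 × 3-Lemma [10, 50] inspired a new characterization of Goursat categories in terms of a special kind of pushouts:

Definition 65 [30] Consider a commutative square (with $y \cdot f' = f \cdot x$ and $x \cdot s' = s \cdot y$)

[Diagram (3): commutative square with the split epimorphisms (f′, s′) : X′ ⇄ Y′ on the left and (f, s) : X ⇄ Y on the right, and the horizontal arrows x : X′ ↠ X (top) and y : Y′ ↠ Y (bottom).]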
where the vertical morphisms are split epimorphisms and the horizontal ones are regular epimorphisms. This square is a pushout, and it is called a Goursat pushout if the induced morphism Eq[f′] → Eq[f] between the kernel pairs of the vertical split epimorphisms is a regular epimorphism.

Proposition 66 [30] A regular category E is a Goursat category if and only if any commutative diagram (3) is a Goursat pushout.

Proof If E is a Goursat category we know that the regular image x(Eq[f′]) can be computed as follows:

\begin{align*}
x(\mathrm{Eq}[f']) &= x \circ f'^{\circ} \circ f' \circ x^{\circ} \\
&= x \circ x^{\circ} \circ x \circ f'^{\circ} \circ f' \circ x^{\circ} \circ x \circ x^{\circ} \\
&= x \circ f'^{\circ} \circ f' \circ x^{\circ} \circ x \circ f'^{\circ} \circ f' \circ x^{\circ} \\
&= x \circ f'^{\circ} \circ y^{\circ} \circ y \circ f' \circ x^{\circ} \\
&= x \circ x^{\circ} \circ f^{\circ} \circ f \circ x \circ x^{\circ} \\
&= f^{\circ} \circ f = \mathrm{Eq}[f],
\end{align*}

where we have used the Goursat assumption, the commutativity of the diagram (3), and the fact that x is a regular epimorphism, so that $x \circ x^{\circ} = 1_X$. For the converse, consider an equivalence relation S on an object X and a regular epimorphism f : X ↠ Y. The regular image f(S) = T is certainly reflexive and symmetric, and by Proposition 64 it suffices to show that it is also transitive. Since S is symmetric and transitive, we know that there exists a morphism τS such that the diagram
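[Diagram: the morphism τS : Eq[s0] → S, together with the kernel pair projections p0, p1 : Eq[s0] → S and the projections s0, s1 : S → X.]

commutes. Moreover, the diagram

[Diagram: commutative square with the induced regular epimorphism S ↠ f(S) = T on top, f : X ↠ Y at the bottom, and the vertical maps s0 : S → X and t0 : T → Y.]

is of type (3), the upward pointing arrows being the morphisms giving the reflexivity of S and T, respectively. It follows that the factorization f̃ induced by the universal property of the kernel pair Eq[t0] - and making the square

[Diagram: f̃ : Eq[s0] ↠ Eq[t0] on top, the composite of τS with the induced map S ↠ T from Eq[s0] to T on the left, the map (t1 × t1)(p0, p1) : Eq[t0] → Y × Y on the right, the monomorphism (t0, t1) : T ↣ Y × Y at the bottom, and the diagonal τT : Eq[t0] → T.]

commute - is a regular epimorphism by the assumption. Since (t0, t1) is a monomorphism, it follows that there is a unique morphism τT making the diagram above commute, and τT makes the diagram

[Diagram: the morphism τT : Eq[t0] → T, together with the kernel pair projections p0, p1 : Eq[t0] → T and the projections t0, t1 : T → Y.]

commute. It follows that the relation T is transitive, as desired.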
When V is a variety of universal algebras, a direct application of this characterization to a suitable Goursat pushout in the category of free algebras - a similar argument to the one used above to prove Theorem 54 - yields a categorical proof (see [30]) of the following well known Theorem. Theorem 67 For a variety V the following conditions are equivalent: 1. V is a 3-permutable variety; 2. in the algebraic theory of V there are two quaternary terms p and q satisfying the identities p(x, y, y, z) = x, q(x, y, y, z) = z, and p(x, x, y, y) = q(x, x, y, y).
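A quick sanity check of the identities in Theorem 67 is possible in any variety that already has a Mal’tsev term m. One possible choice of quaternary terms (an observation made for this sketch, not taken from the source) is p(x, y, z, w) = m(x, y, z) and q(x, y, z, w) = w, which matches the remark that any regular Mal’tsev category is a Goursat category. The code below verifies the three identities by brute force in Z_5 with m(x, y, z) = x − y + z; the modulus and the function names are assumptions of the illustration.

```python
from itertools import product

N = 5                # illustrative modulus (an assumption of this sketch)
CARRIER = range(N)


def m(x, y, z):
    """Mal'tsev term of the cyclic group Z_N."""
    return (x - y + z) % N


def p(x, y, z, w):
    """First quaternary term: ignores the last variable."""
    return m(x, y, z)


def q(x, y, z, w):
    """Second quaternary term: the last projection."""
    return w


identities_hold = all(
    p(x, y, y, z) == x
    and q(x, y, y, z) == z
    and p(x, x, y, y) == q(x, x, y, y)
    for x, y, z in product(CARRIER, repeat=3)
)
print(identities_hold)
```

The same definitions of p and q work for any Mal’tsev term m, since p(x, y, y, z) = m(x, y, y) = x and p(x, x, y, y) = m(x, x, y) = y = q(x, x, y, y).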
More generally, for any n ≥ 2, other characterizations of n-permutable varieties in terms of ternary operations and identities are considered in [32, 39] using categorical arguments (see also the references therein). We conclude this section with another characterization of Goursat categories, whose proof is based on the calculus of relations and on the notion of Goursat pushout. It concerns commutative diagrams of the form

[Diagram (4): a 3 × 3 diagram whose rows are Eq[φ] ⇉ Eq[f] → Eq[g] (upper), Eq[h] ⇉ A → C with the map h (middle) and K ⇉ B → D with the maps k0, k1 and k (lower), and whose columns are Eq[φ] ⇉ Eq[h] → K with the map φ, Eq[f] ⇉ A → B with the map f, and Eq[g] ⇉ C → D with the map g; the parallel pairs are kernel pair projections p0, p1 or the maps induced between them.]
(i.e. for any i, j ∈ {0, 1}, pi pj = pj hi , hpj = pj h, f pi = ki φ, and kf = gh) where the three columns and the middle row are exact (i.e. regular epimorphisms equipped with their kernel pairs): Theorem 68 [50, 30] For a regular category E the following conditions are equivalent: • E is a Goursat category; • the Upper 3 × 3 Lemma holds in E: given any commutative diagram (4), the upper row is exact whenever the lower row is exact; • the Lower 3 × 3 Lemma holds in E: given any commutative diagram (4), the lower row is exact whenever the upper row is exact; • the 3 × 3 Lemma holds in E: given any commutative diagram (4), the lower row is exact if and only if the upper row is exact. Note that this homological lemma was not foreseen in the original project in [22]. Regular Mal’tsev categories can also be characterized by a stronger version of the denormalized 3 × 3 Lemma, called the Cuboid Lemma [31]. Remark 69 In a similar way as Mal’tsev categories were first defined in the regular context and later studied in the finitely complete context, Goursat categories can be defined without the assumption of regularity, see [15].
7 Baer sums in Mal’tsev categories

In this last section, we shall be interested in those extensions, namely regular epimorphisms f : X ↠ Y, which have abelian kernel equivalence relations,
and we shall show that a very natural notion of Baer sums emerges from the Mal’tsev context. Such an extension is actually nothing but an affine object with global support in the slice category E/Y. So, we shall show that, in any Mal’tsev category that is sufficiently exact, we are able to associate, with any affine object with global support, an abelian object, called its direction, and to show as well that the set (up to isomorphism) of the affine objects with global support and a given direction A is endowed with a canonical abelian group structure, on the (non-Mal’tsev) general model of [8]. By sufficiently exact, we mean the following:

Definition 70 [13] A regular category E is said to be efficiently regular when any equivalence relation R on an object X which is a subobject i : R ↣ Eq[f] of an effective equivalence relation is effective as well as soon as the monomorphism i is regular, i.e. is the equalizer of some pair of morphisms.

The categories Gp(Top) of topological groups and Ab(Top) of abelian topological groups are examples of non-exact efficiently regular categories. The major interest of such a category is that any discrete fibration between equivalence relations R → Eq[g] makes R effective as well [13]. Note that this latter property could also be guaranteed by the assumption that the base category E is regular and that regular epimorphisms in E are effective descent morphisms (see [44], for instance, for more details).

In this section we shall suppose that E is an efficiently regular Mal’tsev category. Take now any affine object X with global support (namely such that the terminal map τX : X → 1 is a regular epimorphism) and consider the following diagram where p : X × X × X → X is the internal Mal’tsev operation on X giving rise to the affine structure:

[Diagram: upper row, the reflexive relation X × X × X ⇉ X × X built from p and the projections, with the quotient $q_p^X$ : X × X ↠ A; lower row, the relation ∇X, i.e. ($p_0^X$, $p_1^X$) : X × X ⇉ X, with the terminal map τX : X ↠ 1; the vertical arrows are projections on the left and in the middle, and the split epimorphism (τA, 0A) : A ⇄ 1 on the right.]
Definition 71 [8] The upper horizontal reflexive relation (which is an equivalence relation) is called the Chasles relation Chp associated with the internal Mal’tsev operation p.

In set-theoretic terms we get (x, p(x, y, z)) Chp (y, z) or, in other words, (x, y) Chp (x′, y′) if and only if y = p(x, x′, y′). The operation p is commutative (which is the case in any Mal’tsev category) if and only if we get the equivalence: (x, y) Chp (x′, y′) ⇐⇒ (y′, y) Chp (x′, x)
Since E is efficiently regular and since the left hand side square indexed by 0 is a discrete fibration between equivalence relations, the equivalence relation Chp is effective, and so has a quotient $q_p^X$ which, since τX is a regular epimorphism, produces the split epimorphism (τA, 0A) and makes the right hand side square a pullback. The vertical right hand side part is necessarily a group in E as a quotient of the vertical groupoid (actually the equivalence relation) ∇X. This group is abelian by Corollary 33.

Definition 72 [8] The abelian group A is called the direction of the affine object X with global support and will be denoted by d(X).

Proposition 73 Given any abelian group A, its direction is A.

Proof The diagram

[Diagram: square with $p_1^A - p_0^A$ : A × A ↠ A on top, τA : A ↠ 1 at the bottom, the projections $p_0^A$, $p_1^A$ : A × A ⇉ A on the left and the split epimorphism (τA, 0A) : A ⇄ 1 on the right.]
shows that the direction is indeed A.
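In the category of sets, the Chasles relation and Proposition 73 can be made explicit on a small abelian group. The following sketch is an illustration under assumptions chosen here: the group is Z_5, the Mal’tsev operation is p(x, y, z) = x − y + z, and the set-theoretic description (x, y) Ch_p (x′, y′) if and only if y = p(x, x′, y′) is used. It checks that the Chasles relation on pairs coincides with the kernel pair of the map (x, y) ↦ y − x, so that the quotient has exactly as many classes as the group has elements.

```python
from itertools import product

N = 5                # illustrative modulus (an assumption of this sketch)
A = range(N)


def p(x, y, z):
    """Mal'tsev operation of the abelian group Z_N."""
    return (x - y + z) % N


PAIRS = list(product(A, repeat=2))

# (x, y) Ch_p (x2, y2) if and only if y == p(x, x2, y2).
chasles = {(u, v) for u in PAIRS for v in PAIRS if u[1] == p(u[0], v[0], v[1])}


def direction(pair):
    """The difference map (x, y) -> y - x, i.e. second projection minus first."""
    x, y = pair
    return (y - x) % N


kernel_pair = {(u, v) for u in PAIRS for v in PAIRS if direction(u) == direction(v)}

chasles_is_kernel_pair = chasles == kernel_pair
number_of_classes = len({direction(u) for u in PAIRS})
```

The equality holds because y = x − x′ + y′ is equivalent to y − x = y′ − x′, which is the classical Chasles rule for vectors.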
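Since the inclusion Aff(E) ↪ E is full, any morphism f : X → X′ between affine objects produces a morphism d(f) making the following diagram commute:

[Diagram: commutative square with $q_p^X$ : X × X ↠ A on top, $q_p^{X'}$ : X′ × X′ ↠ A′ at the bottom, f × f : X × X → X′ × X′ on the left and d(f) : A → A′ on the right.]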
Proposition 74 The group homomorphism d(f) is an isomorphism if and only if f is an isomorphism.

Proof It is clear that if f is an isomorphism, so is d(f). Conversely suppose d(f) is an isomorphism. Consider the following diagram:

[Diagram: two parallel squares, the back one with $q_p^X$ : X × X ↠ A over X ↠ 1 and the front one with $q_p^{X'}$ : X′ × X′ ↠ A′ over X′ ↠ 1, with vertical projections $p_0$, $p_1$, connected by f × f, f and the isomorphism d(f).]
The front and back squares indexed by 0 being pullbacks, and d(f) being an isomorphism, the left hand side quadrangle is a pullback. But X and X′ having global supports, the Barr-Kock Theorem makes the following square a pullback

[Diagram: commutative square with f : X → X′ on top, the terminal maps X → 1 and X′ → 1 as vertical arrows, and the identity of 1 at the bottom.]

and consequently makes f an isomorphism.
Moreover the fact that the right hand side square defining A is a pullback gives X the structure of an A-torsor which is controlled by the choice of the quotient map $q_p^X$. Accordingly we get an internal regular epic discrete fibration $q^X$ : ∇X ↠ A. Let us denote by AbTors(E) the category whose objects are the pairs (X, $q^X$) where $q^X$ : ∇X ↠ A is a regular epic discrete fibration above an abelian group A (which obviously implies that the object X is affine with global support and direction A), and whose morphisms are the pairs (f : X → Y, g : A → B) of a morphism f and a group homomorphism g such that $g \cdot q^X = q^Y \cdot (f \times f)$. Let us denote by Aff∗(E) the full subcategory of Aff(E) whose objects have a global support. The previous construction produces a functor Φ : Aff∗(E) → AbTors(E) which is an equivalence of categories. Furthermore there is an obvious forgetful functor U : AbTors(E) → Ab(E) such that d = U · Φ. We shall now investigate the properties of the functor U. We immediately get:

Proposition 75 Given any efficiently regular Mal’tsev category E, the functor U is conservative, it preserves the finite products and the regular epimorphisms. It preserves the pullbacks when they exist, and consequently reflects them.

The restriction about the existence of pullbacks comes from the fact that the objects with global support are not stable under pullback in general. Here comes the main result of this section:

Theorem 76 [8] Given any efficiently regular Mal’tsev category E, the functor U is a cofibration. Being also conservative, any map in AbTors(E) is cocartesian, and any fibre UA above an abelian group A is a groupoid.

Proof First we shall show that there are cocartesian maps above regular epimorphisms. Let us start with an object (X, $q^X$) in AbTors(E) above the abelian group A and a regular epic group homomorphism g : A ↠ B. Let us denote by Rg ↣ X × X the subobject $(g \cdot q^X)^{-1}(0)$. It produces an equivalence relation on X which is effective since the monomorphism in question is regular. Let us denote by qg : X ↠ Y its quotient. According to Proposition 47, the object Y is an affine object, with global support, since so is X. Let us show now that there is a (necessarily unique) map $q^Y$ : Y × Y → B such that $q^Y \cdot (q_g \times q_g) = g \cdot q^X$. Since qg × qg is a regular epimorphism,
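it is enough to show that $g \cdot q^X$ coequalizes the kernel equivalence relation of qg × qg. For the sake of simplicity we shall set $q^X(x, x') = \overrightarrow{xx'}$. So we have to show that, when we have $x R_g t$ and $x' R_g t'$ (namely $g(\overrightarrow{xt}) = 0$ and $g(\overrightarrow{x't'}) = 0$), we get $g(\overrightarrow{xx'}) = g(\overrightarrow{tt'})$, which is straightforward. It remains to show that the following square is a pullback of split epimorphisms:

[Diagram: square with $q^Y$ : Y × Y ↠ B on top, τY : Y ↠ 1 at the bottom, the split epimorphism ($p_0^Y$, $s_0^Y$) : Y × Y ⇄ Y on the left and the split epimorphism (τB, 0B) : B ⇄ 1 on the right.]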
First, we can check that the upward square commutes by composition with the regular epimorphism qg. Let us denote by ψ : Y × Y → P the factorization through the pullback. Since E is a Mal’tsev category, ψ is necessarily a regular epimorphism by Corollary 53. We can check it is a monomorphism as well in the following way: consider the kernel equivalence relation Eq[ψ · (qg × qg)]; it is easy to check that it is coequalized by qg × qg. Accordingly Eq[ψ] is the discrete equivalence relation and ψ is a monomorphism.

Now, given any pair (X, B) of an affine object X with global support and an abelian group B, the map (1X, 0B) : X → X × B has direction $(1_{d(X)}, 0_B)$ : d(X) → d(X) × B, and it is easy to check that it is cocartesian. Then, starting with any group homomorphism h : d(X) → B, we get the following commutative diagram:

[Diagram: triangle in Ab(E) with $(1_{d(X)}, 0_B)$ : d(X) → d(X) × B, ⟨h, 1B⟩ : d(X) × B → B and h : d(X) → B, together with the inclusion $(0_{d(X)}, 1_B)$ : B → d(X) × B.]
where the map ⟨h, 1B⟩ comes from the fact that the product is the direct sum as well in the additive category Ab(E). Moreover, this map, being split, is a regular epimorphism. Accordingly the map h has a cocartesian map above it as well.

Corollary 77 [8] Given any efficiently regular Mal’tsev category E, the fibre UA above the abelian group A is endowed with a canonical symmetric closed monoidal structure ⊗A whose unit is A.

Proof We recalled that UA is necessarily a groupoid. Given any pair (X, X′) of affine objects with global support and direction A, the tensor product X ⊗A X′ is defined as the codomain of the (regular epic) cocartesian map above + : A × A → A with domain X × X′. The commutative diagram in Ab(E) expressing the associativity of the group law: a + (b + c) = (a + b) + c produces the desired associativity isomorphism $a_{(X,X',X'')}$ : X ⊗A (X′ ⊗A X″) ≅
(X ⊗A X′) ⊗A X″, while the commutative diagram expressing the commutativity of the group law a + b = b + a and the twisting isomorphism $\tau_{(X,X')}$ : X × X′ ≅ X′ × X produce the symmetry isomorphism $\sigma_{(X,X')}$ : X ⊗A X′ ≅ X′ ⊗A X. The unit of this tensor product is determined by the codomain of the cocartesian map with domain 1 above 0A : 1 ↣ A, namely A itself. The left unit isomorphism A ⊗A X ≅ X is produced by the commutative diagram in Ab(E) associated with the left unit axiom 0 + a = a, a similar construction producing the right unit isomorphism. This monoidal structure is closed since in the abelian context the division map d(a, b) = b − a is a group homomorphism. We define [X, Y] as the codomain of the cocartesian map above d with domain X × Y. The commutative diagram determined by a + (b − a) = b induces the isomorphism X ⊗A [X, Y] ≅ Y, while the one determined by (b + a) − b = a induces the isomorphism X ≅ [Y, Y ⊗A X].

Accordingly this produces an abelian group structure on the set Ext(A) of the connected components of the groupoid UA whose operation is called in classical terms the Baer sum.

Starting with the variety Mal of Mal’tsev algebras, the subvariety Aff(Mal) is the variety of associative and commutative Mal’tsev algebras by Proposition 8. An algebra X in Mal has a global support if and only if it is non-empty. Accordingly the choice of a point in any non-empty affine object makes the fibre UA a connected groupoid, reduces the group Ext(A) to a single element and makes it invisible.

Now take the example where the Mal’tsev category E is the slice category Gp/Q of the groups above the group Q, whose affine objects with global support are the exact sequences with A abelian:

1 → A ↣ X ↠ Q → 1

The direction of this affine object is nothing but the semi-direct product exact sequence produced by the Q-module structure on A determined by this exact sequence.
And the Baer sum, described above, coincides with the classical Baer sum associated with a given Q-module structure on A; see for instance Chapter 4, Cohomology of groups, in Mac Lane’s Homology [53].
To finish this section, let us be a bit more explicit about the construction of X ⊗_A X′. In set-theoretic terms, given any pair (X, X′) of affine objects with global support, the equivalence relation R_+ on X × X′ producing the tensor product is given by (x, x′) R_+ (t, t′) if and only if $\overrightarrow{xt} + \overrightarrow{x't'} = 0$, while the direction on X ⊗_A X′ is such that $\overrightarrow{(x,x'),(z,z')} = \overrightarrow{xz} + \overrightarrow{x'z'}$.
The inverse of an affine object X with global support and direction A is the affine object X* = [X, A], namely the quotient of X × A by the equivalence relation R_d defined by (x, a) R_d (x′, a′) if and only if $a' - a - \overrightarrow{xx'} = 0$. The direction of X* is such that $\overrightarrow{(x,a),(z,b)} = b - a - \overrightarrow{xz}$. As expected, we can check that we have an isomorphism γ : X → X* of affine objects defined by γ(x) = (x, 0), whose direction satisfies: $d(\gamma)(\overrightarrow{xx'}) = \overrightarrow{(x,0),(x',0)} = -\overrightarrow{xx'}$.
References
1. Barr, M., Exact categories, Lecture Notes in Mathematics 236 (1971), 1–120.
2. Barr, M., Representation of categories, Journal of Pure and Applied Algebra 41 (1986), 113–137.
3. Berger, C. and Bourn, D., Central reflections and nilpotency in exact Mal’tsev categories, Journal of Homotopy and Related Structures 12 (2017), 765–835.
4. Borceux, F. and Bourn, D., Mal’cev, Protomodular, Homological and Semi-Abelian Categories, Kluwer, Mathematics and Its Applications 566 (2004), 479 pp.
5. Bourn, D., The shift functor and the comprehensive factorization for internal groupoids, Cahiers de Topologie et Géométrie Différentielle Catégoriques 28 (1987), 197–226.
6. Bourn, D., Normalization equivalence, kernel equivalence and affine categories, Lecture Notes in Mathematics 1488 (1991), 43–62.
7. Bourn, D., Mal’cev Categories and fibration of pointed objects, Applied Categorical Structures 4 (1996), 302–327.
8. Bourn, D., Baer sums and fibered aspects of Mal’cev operations, Cahiers de Topologie et Géométrie Différentielle Catégoriques 40 (4) (1999), 297–316.
9. Bourn, D., Intrinsic centrality and associated classifying properties, Journal of Algebra 256 (2002), 126–145.
10. Bourn, D., The denormalized 3 × 3 lemma, Journal of Pure and Applied Algebra 177 (2003), 113–129.
11. Bourn, D., Commutator theory in regular Mal’cev categories, in: Galois theory, Hopf algebras, and Semiabelian categories, G. Janelidze, B. Pareigis and W. Tholen editors, Fields Institute Communications 43, Amer. Math. Soc. (2004), 61–75.
12. Bourn, D., Congruence distributivity in Goursat and Mal’cev categories, Applied Categorical Structures 13 (2) (2005), 101–111.
13. Bourn, D., Baer sums in homological categories, Journal of Algebra 308 (2007), 414–443.
14. Bourn, D., On the monad of internal groupoids, Theory and Applications of Categories 28 (5) (2013), 150–165.
15. Bourn, D., Suprema of equivalence relations and non-regular Goursat categories, Cahiers de Topologie et Géométrie Différentielle Catégoriques 59 (2) (2018), 142–194.
16. Bourn, D. and Gran, M., Centrality and normality in protomodular categories, Theory and Applications of Categories 9 (8) (2002), 151–165.
17. Bourn, D. and Gran, M., Centrality and connectors in Mal’tsev categories, Algebra Universalis 48 (2002), 309–331.
18. Bourn, D. and Gran, M., Regular, Protomodular and Abelian Categories, in Categorical Foundations, edited by M.C. Pedicchio and W. Tholen, Cambridge University Press (2004).
19. Bourn, D. and Gran, M., Normal sections and direct product decompositions, Communications in Algebra 32 (2004), 3825–3842.
20. Bourn, D. and Janelidze, Z., Approximate Mal’tsev operations, Theory and Applications of Categories 21 (8) (2008), 152–171.
21. Carboni, A., Kelly, G.M. and Pedicchio, M.C., Some remarks on Maltsev and Goursat categories, Applied Categorical Structures 1 (1993), 385–421.
22. Carboni, A., Lambek, J. and Pedicchio, M.C., Diagram chasing in Mal’cev categories, Journal of Pure and Applied Algebra 69 (1990), 271–284.
23. Carboni, A. and Pedicchio, M.C., A new proof of the Mal’cev theorem, Categorical studies in Italy (Perugia, 1977), Rend. Circ. Mat. Palermo 2 Suppl. No. 64 (2000), 13–16.
24. Carboni, A., Pedicchio, M.C. and Pirovano, N., Internal graphs and internal groupoids in Mal’tsev categories, Canadian Mathematical Society Conference Proceedings 13 (1992), 97–109.
25. Duvieusart, A., and Gran, M., Higher commutator conditions for extensions in Mal’tsev categories, Journal of Algebra 515 (2018), 298–327.
26. Everaert, T., Higher central extensions in Mal’tsev categories, Applied Categorical Structures 22 (2014), 961–979.
27. Everaert, T., Gran, M. and Van der Linden, T., Higher Hopf formulae for homology and Galois theory, Advances in Mathematics 217 (2008), 2231–2267.
28. Findlay, G. D., Reflexive homomorphic relations, Canadian Mathematical Bulletin 3 (1960), 131–132.
29. Gran, M., Internal categories in Mal’cev categories, Journal of Pure and Applied Algebra 143 (1999), 221–229.
30. Gran, M. and Rodelo, D., A new characterization of Goursat categories, Applied Categorical Structures 20 (2012), 229–238.
31. Gran, M. and Rodelo, D., The Cuboid Lemma and Mal’tsev categories, Applied Categorical Structures 20 (2014), 805–816.
32. Gran, M. and Rodelo, D., Beck-Chevalley conditions and Goursat categories, Journal of Pure and Applied Algebra 221 (2017), 2445–2457.
33. Hagemann, J. and Mitschke, A., On n-permutable congruences, Algebra Universalis 3 (1973), 8–12.
34. Huq, S. A., Commutator, nilpotency and solvability in categories, Quart. J. Math. 19 (1968), 363–389.
35. Jacqmin, P.-A., Embedding theorems in non-abelian categorical algebra, Université catholique de Louvain thesis (2016).
36. Jacqmin, P.-A., An embedding theorem for regular Mal’tsev categories, Journal of Pure and Applied Algebra 222 (5) (2018), 1049–1068.
37. Jacqmin, P.-A., Embedding theorems for Janelidze’s matrix conditions, Journal of Pure and Applied Algebra 224 (2) (2020), 469–506.
38. Jacqmin, P.-A., Partial Mal’tsev algebras and an embedding theorem for (weakly) Mal’tsev categories, Cahiers de Topologie et Géométrie Différentielle Catégoriques 60 (4) (2019), 365–403.
39. Jacqmin, P.-A. and Rodelo, D., Stability properties characterizing n-permutable categories, Theory and Applications of Categories 32 (2017), 1563–1587.
40. Janelidze, G., Internal categories in Mal’cev varieties, preprint, York University in Toronto (1990).
41. Janelidze, G., What is a double central extension? The question was asked by Ronald Brown, Cahiers de Topologie et Géométrie Différentielle Catégoriques 32 (3) (1991), 191–201.
42. Janelidze, G. and Kelly, G. M., Galois theory and a general notion of central extension, Journal of Pure and Applied Algebra 2 (1994), 135–161.
43. Janelidze, G. and Pedicchio, M. C., Pseudogroupoids and commutators, Theory and Applications of Categories 8 (2001), 408–456.
44. Janelidze, G., Sobral, M. and Tholen, W., Beyond Barr exactness: Effective Descent Morphisms, in Categorical Foundations, edited by M.C. Pedicchio and W. Tholen, Encyclopedia of Mathematics and its Applications 97, Cambridge University Press (2004).
45. Janelidze, Z., Subtractive categories, Applied Categorical Structures 13 (2005), 343–350.
46. Janelidze, Z., Closedness properties of internal relations I: a unified approach to Mal’tsev, unital and subtractive categories, Theory and Applications of Categories 16 (2006), 236–261.
47. Johnstone, P. T., Affine categories and naturally Mal’cev categories, Journal of Pure and Applied Algebra 61 (1989), 251–256.
48. Johnstone, P. T., The closed subgroup theorem for localic herds and pregroupoids, Journal of Pure and Applied Algebra 70 (1991), 97–106.
49. Kock, A., Generalized fibre bundles, Lecture Notes in Mathematics 1348 (1988), 194–207.
50. Lack, S., The 3-by-3 Lemma for regular Goursat categories, Homology Homotopy and Applications 6 (2004), 1–3.
51. Lambek, J., Goursat’s theorem and the Zassenhaus lemma, Canadian Journal of Mathematics 10 (1958), 45–56.
52. Lambek, J., On the ubiquity of Mal’cev operations, Contemporary Mathematics 131 (3) (1992), 135–146.
53. Mac Lane, S., Homology, Berlin-Göttingen-Heidelberg: Springer 114 (1963), 422 pp.
54. Mal’tsev, A. I., On the general theory of algebraic systems, Matematicheskii Sbornik, N.S. 35 (77) (1954), 3–20.
55. Meisen, J., Relations in categories, McGill University thesis (1972).
56. Mitschke, A., Implication algebras are 3-permutable and 3-distributive, Algebra Universalis 1 (1971), 182–186.
57. Pedicchio, M. C., A categorical approach to commutator theory, Journal of Algebra 177 (1995), 647–657.
58. Pedicchio, M. C., Arithmetical category and commutator theory, Applied Categorical Structures 4 (1996), 297–305.
59. Riguet, J., Relations binaires, fermetures, correspondances de Galois, Bulletin de la Société Mathématique de France 76 (1948), 114–155.
60. Smith, J. D. H., Mal’cev varieties, Lecture Notes in Mathematics 554 (1976).
61. Werner, H., A Mal’cev condition for admissible relations, Algebra Universalis 3 (1973), 263.
Extensions of Lambek Calculi
Wojciech Buszkowski
Faculty of Mathematics and Computer Science, Adam Mickiewicz University in Poznań, Poland, e-mail: [email protected]
Abstract The Lambek calculus (associative and nonassociative) is a basis of a rich family of formal logics: type logics for categorial grammars, substructural logics, linear logics and multi-modal logics. This paper briefly discusses these developments.
1 Introduction
The Lambek calculus was introduced by Lambek [48] under the name Syntactic Calculus. Lambek’s intention was to extend the type reduction procedure for syntactic description, due to Ajdukiewicz [3] and Bar-Hillel [4]. Nowadays the formal grammars based on such logics of types are called categorial grammars (after [5]) or type grammars. The logic from Lambek [48] is called the associative Lambek calculus and denoted by L. One also considers its nonassociative version, i.e. the nonassociative Lambek calculus, denoted by NL (due to Lambek [49]). Both logics admit three binary connectives: · (product), \ (right residual), / (left residual). The latter are also called left division and right division, respectively. My terminology can be justified by the algebraic interpretation of \, / as the residual operations for product and their logical interpretation as the right implication → and the left implication ←. The formulas are built from atoms (variables) p, q, r, . . . by means of these connectives. If α, β are formulas, then α ⇒ β is an arrow. The axioms of NL are:
(id) α ⇒ α
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 C. Casadio and P. J. Scott (eds.), Joachim Lambek: The Interplay of Mathematics, Logic, and Linguistics, Outstanding Contributions to Logic 20, https://doi.org/10.1007/978-3-030-66545-6_4
and its inference rules are¹:

(r1)
  α · β ⇒ γ
  ═════════
  β ⇒ α\γ

(r2)
  α · β ⇒ γ
  ═════════
  α ⇒ γ/β

(cut-1)
  α ⇒ β    β ⇒ γ
  ──────────────
  α ⇒ γ

In L one adds the associativity axioms:
(as.1) (α · β) · γ ⇒ α · (β · γ)
(as.2) α · (β · γ) ⇒ (α · β) · γ
This axiomatization resembles the algebraic conditions defining residuated groupoids, i.e. ordered algebras (A, ·, \, /, ≤) such that (A, ≤) is a nonempty poset and ·, \, / are binary operations on A, satisfying the residuation laws:
(RES) a · b ≤ c iff b ≤ a\c iff a ≤ c/b, for all a, b, c ∈ A.
If product is associative, these algebras are called residuated semigroups. Residuated groupoids and semigroups are algebraic models of NL and L, respectively. The notion of a residuated semigroup is essentially due to Birkhoff [6], whose calculus of residuals was adapted by Lambek [48] for linguistic purposes. Although there exist many natural models of this kind, these algebras do not belong to the mainstream of purely algebraic research. Some more elaborate studies can be found in works on ordered algebraic systems, e.g. [28, 10]. In the last decades, some algebras based on residuated semigroups (groupoids) were extensively studied by representatives of algebraic logic as models of different nonclassical logics, mainly substructural logics [30]. They were also considered as generalized relation algebras [40, 29].
Closely related category-theoretic notions, e.g. cartesian closed categories, (bi)monoidal closed categories and others, form a foundation of categorical logic in Lambek style [50, 55], this means: an abstract proof theory rather than a categorical framework for foundations of mathematics. They will not be discussed here, since several other chapters of this volume elaborate on them.
The present paper focuses on some formal logics, which are not directly motivated by applications in linguistics but seem quite natural from the point of view of general logic. After Schroeder-Heister and Došen [71], they are called substructural logics. The name points out the lack of some structural rules in sequent systems for these logics; see Section 4.
The aim of this paper is to emphasize the role of the Lambek calculus in the world of nonclassical logics.
¹ The double line means that the rule can be applied top-down and bottom-up; it is reversible ex definitione.
Perhaps, a better name for substructural logics could be residuation logics. Product, also called: multiplicative (or: intensional) conjunction, tensor, fusion, is always related to implication(s) by the equivalences (RES). The residuation connection occupies a central place in the whole subject. This notion will be discussed in more detail in Section 2. Ono and Komori [63] is a seminal study of sequent systems and algebras for different logics not admitting the contraction rule. This paper does not refer to Lambek’s contributions. Later works in substructural logics, however, acknowledge Lambek’s priority and emphasize it in terminology. A basic substructural logic is called Full Lambek Calculus (FL); see e.g. [30]. Section 3 recalls sequent systems for L and NL and briefly discusses some properties of these systems (cut elimination, completeness, decidability, complexity) as well as their role in categorial grammars. As a matter of fact, most results on these two logics were obtained by scholars interested in linguistic applications. Section 4 brings a survey of different substructural logics of intuitionistic character (double negation not maintained). Since this area has been extensively explored since the 1980s, we can only regard some basic systems and their most fundamental properties. At the end, some related multi-modal logics are described. Section 5 focuses on (classical) linear logics, extending some logics from Section 4 by the double negation law(s). Again our presentation of this topic is very concise. We only consider logics without exponentials with emphasis on noncommutative logics. Section 6 contains some final comments. In Sections 4 and 5 we also regard logics whose sequent systems obey Lambek’s restriction: intuitionistic sequents have nonempty antecedents and classical one-sided sequents contain at least two formulas. 
The logical literature mostly ignores these logics, but they are applied in categorial grammars and seem natural for modal interpretations.
This paper is by no means a complete survey of the subject matter. Although linguistic issues are noted several times, many fundamental topics are omitted. The reader is referred to Moortgat [58] for a thorough discussion; also see Chapter 12 of the handbook [76] and the book [60]. The latter book extensively studies proof nets for multi-modal Lambek calculi, an issue omitted here. Similarly, we do not attempt a representative overview of the huge literature on substructural (linear) logics. The reader is requested to consult the other publications cited here. Some earlier papers on the Lambek calculus, its extensions and variants were collected in the special issues of Studia Logica: 71.3 (2002) and 87.2-3 (2007), published on the occasion of Jim Lambek’s 80th and 85th birthdays.
2 Residuation
Let (X, ≤_X), (Y, ≤_Y) be posets. A map f : X → Y is said to be residuated [10] if there exists g : Y → X such that:
(1-RES) for all x ∈ X, y ∈ Y, f(x) ≤_Y y iff x ≤_X g(y).
It follows that f(g(y)) ≤_Y y and x ≤_X g(f(x)), for x ∈ X, y ∈ Y, and that f, g preserve order. In fact, these conditions are equivalent to (1-RES). Furthermore g(y) = max{x ∈ X : f(x) ≤_Y y} and f(x) = min{y ∈ Y : x ≤_X g(y)}. Consequently, g is unique for f (if it exists), and conversely. We call g the right residual of f and denote it by f^r. We also call f the left residual of g and denote it by g^l. Accordingly, ‘residuated’ means in fact ‘right-residuated’, i.e. ‘possessing a right residual’. In the pair (f, g), satisfying (1-RES), f is right-residuated and g is left-residuated. We call (f, g) a residuation pair. This can be treated as a particular example of adjoint functors in category theory; see Lambek and Scott [55], where a residuation pair is called “a covariant Galois correspondence”. Following Blyth [10], a Galois connection is defined like a residuation pair except that ≤_Y in (1-RES) is replaced with ≥_Y (i.e. the converse of ≤_Y). The equations:
f ◦ g ◦ f = f, g ◦ f ◦ g = g (1)
hold for Galois connections as well as residuation pairs (f, g).
Residuated maps are closed under composition: if f : X → Y and h : Y → Z are residuated (for some orderings on X, Y, Z), then h ◦ f : X → Z is residuated, and we have (h ◦ f)^r = f^r ◦ h^r, and similarly for left-residuated maps. Galois connections do not have this property. Every residuated (resp. left-residuated) map preserves the existing infinite joins (resp. meets). On complete lattices, the residuated (resp. left-residuated) maps are precisely the completely additive (resp. multiplicative) maps, i.e. those which distribute over infinite joins (resp. meets). This means:
$$f\Big(\bigvee_{i\in I} x_i\Big) = \bigvee_{i\in I} f(x_i) \qquad \Big(\text{resp. } f\Big(\bigwedge_{i\in I} x_i\Big) = \bigwedge_{i\in I} f(x_i)\Big)$$
If (f, g) is a Galois connection, then f, g turn infinite joins into infinite meets.
A simple example of a residuation pair is image-coimage, i.e. (f^→, f^←), where f : X → Y, f^→(A) = {f(x) : x ∈ A} for A ⊆ X, f^←(B) = {x ∈ X : f(x) ∈ B} for B ⊆ Y; the posets are (P(X), ⊆), (P(Y), ⊆). More generally, if R ⊆ X × Y, R^→(A) = {y ∈ Y : ∃x∈A R(x, y)}, R^←(B) = {x ∈ X : R^→({x}) ⊆ B}, then (R^→, R^←) is a residuation pair.
We show another example of a residuated map. Let (X, ≤) be a poset and C : X → X. The operation C is called a closure operation, if it satisfies: (C1) x ≤ C(x), (C2) if x ≤ y then C(x) ≤ C(y), (C3) C(C(x)) = C(x), for all
x, y ∈ X. The elements of C^→(X) are said to be C-closed. The conjunction of (C1), (C2), (C3) is equivalent to the following condition.
C(x) ≤ y iff x ≤ y, for all x ∈ X, y ∈ C^→(X)
Consequently, C is a closure operation on (X, ≤) if and only if C is the left residual of the embedding ι : C^→(X) → X, defined by: ι(y) = y for y ∈ C^→(X). The ordering on C^→(X) is ≤ restricted to this set. Notice that a closure operation on a complete lattice preserves infinite joins in the following sense: $C(\bigvee_{i\in I} x_i) = C(\bigvee_{i\in I} C(x_i))$.
If (f, g) is a Galois connection, then both g ◦ f and f ◦ g are closure operations on the respective posets. If (f, g) is a residuation pair, then g ◦ f is a closure operation and f ◦ g is an interior operation, i.e. a closure operation on the dual poset.
For a binary map f : X × Y → Z and a ∈ X, b ∈ Y, one defines unary maps: f_{−,b}(x) = f(x, b) for x ∈ X, f_{a,−}(y) = f(a, y) for y ∈ Y. The map f is said to be residuated, if for all a ∈ X, b ∈ Y the maps f_{−,b}, f_{a,−} are residuated in the former sense (some orderings on X, Y, Z are fixed). If one writes a · b for f(a, b), c/b for (f_{−,b})^r(c), and a\c for (f_{a,−})^r(c), there holds the following generalization of (RES).
(RES’) a · b ≤_Z c iff b ≤_Y a\c iff a ≤_X c/b, for all a ∈ X, b ∈ Y, c ∈ Z.
Let us make a terminological remark. The residual maps \, / with the bottom argument fixed are right residuals of the unary maps f_{a,−}, f_{−,b}. We, however, treat them as binary maps and may call them the right and the left residual, respectively, of product. (Perhaps better names could be: \ - the second residual, / - the first residual, according to the place of the variable argument.) Let c ∈ Z. The pair (g^c, h^c), where g^c(x) = x\c for x ∈ X, h^c(y) = c/y for y ∈ Y, is a Galois connection. By the above, · preserves order in both arguments; on complete lattices product is completely additive in both arguments.
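The relational residuation pair (R^→, R^←) described earlier in this section can be checked by brute force on small finite sets. The following is only an illustrative sketch: the sets X, Y and the relation R are hypothetical toy data, and the function names are not taken from the paper.

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of s, as frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def forward(R, A):
    """R^->(A): everything R-related to some element of A."""
    return frozenset(y for (x, y) in R if x in A)

def backward(R, X, B):
    """R^<-(B): the elements of X all of whose R-successors lie in B."""
    return frozenset(x for x in X if forward(R, {x}) <= B)

def is_residuation_pair(R, X, Y):
    """Brute-force check of (1-RES): R^->(A) <= B iff A <= R^<-(B)."""
    return all((forward(R, A) <= B) == (A <= backward(R, X, B))
               for A in powerset(X) for B in powerset(Y))

# Hypothetical toy data, chosen only for illustration.
X = {0, 1, 2}
Y = {"a", "b"}
R = {(0, "a"), (0, "b"), (1, "b")}

print(is_residuation_pair(R, X, Y))
```

The same helpers can be reused to confirm the equations (1), for instance that applying `forward`, then `backward`, then `forward` again gives the same set as a single `forward`.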
The residual maps \, / preserve order in the top argument and reverse order in the bottom argument; on complete lattices they distribute over infinite meets in the top argument and turn infinite joins to infinite meets in the bottom argument.
The algebraic models of logics, considered in this paper, are residuated algebras in which ·, \, / are binary operations on a nonempty set. Residuated groupoids and semigroups have been defined in Section 1. Algebras with the unit element for product (denoted by 1) are said to be unital. Monoids are unital semigroups, hence unital residuated semigroups are also called residuated monoids. If (A, ≤) is a lattice, then the residuated groupoid (A, ·, \, /, ≤) is said to be lattice-ordered (l.o.). The lattice operations will be denoted by ∨ (join) and ∧ (meet). Residuated lattices are defined as l.o. residuated monoids; this is a basic class of algebras for substructural logics [30]. They form an algebraic variety, since they can be defined as algebras (A, ·, \, /, 1, ∨, ∧), satisfying a (finite) set of equations. One defines the lattice order: a ≤ b iff a ∨ b = b. These algebras are said to be commutative, if product is commutative. In commutative residuated groupoids a\b = b/a for all elements a, b, and one writes a → b for both.
If (A, ·, \, /, ∨, ∧, 1) is a complete residuated lattice, then (A, ·, ⋁, 1) is a quantale, i.e. (A, ⋁) is a complete ⋁-semilattice and · is an associative, completely additive operation on A with unit 1. Conversely, every quantale can uniquely be expanded to a complete residuated lattice, by setting: a\b = ⋁{x ∈ A : a · x ≤ b}, a/b = ⋁{x ∈ A : x · b ≤ a}. If (A, ∨, ∧) is a complete lattice, then the set of all residuated maps f : A → A is a quantale with g · f = g ◦ f, 1 equal to the identity map, and joins defined pointwise; we denote this quantale by RES((A, ∨, ∧)). If (A, ·, ⋁, 1) is a quantale, then the map h(a)(x) = a · x, for a, x ∈ A, is a monomorphism of this quantale to RES((A, ∨, ∧)) that preserves infinite joins: $h(\bigvee_{i\in I} a_i) = \bigvee_{i\in I} h(a_i)$. These observations show that, on a complete lattice, every residuated associative binary operation admitting 1 can be represented as the composition of unary residuated maps on the lattice. In a sense this holds for arbitrary residuated associative binary operations, admitting 1, since every residuated monoid can be embedded into a complete residuated lattice, e.g. the lattice of downsets on the residuated monoid (see below).
A simple construction of a residuated groupoid resembles the image-coimage construction. Given a binary operation · on A, one defines the operations ⊙, \, / on P(A) as follows.
X ⊙ Y = {a · b : a ∈ X, b ∈ Y}
X\Y = {b ∈ A : X ⊙ {b} ⊆ Y}
X/Y = {a ∈ A : {a} ⊙ Y ⊆ X}
(P(A), ⊙, \, /, ⊆) is a residuated groupoid (and a complete lattice with the set-theoretic union and intersection); we refer to this algebra as the powerset algebra. If · admits 1, then {1} is the unit for ⊙, and one obtains a unital residuated groupoid.
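As a small sanity check of the powerset construction just described, the sketch below builds the three operations on P(A) for a deliberately nonassociative operation and verifies the residuation laws by exhaustive search. The choice of subtraction modulo 3 and all function names are assumptions made for illustration only.

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of s, as frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def prod(op, X, Y):
    """Lifted product: {a . b : a in X, b in Y}."""
    return frozenset(op(a, b) for a in X for b in Y)

def under(op, A, X, Y):
    """X\\Y = {b in A : X . {b} is a subset of Y}."""
    return frozenset(b for b in A if prod(op, X, {b}) <= Y)

def over(op, A, X, Y):
    """X/Y = {a in A : {a} . Y is a subset of X}."""
    return frozenset(a for a in A if prod(op, {a}, Y) <= X)

def satisfies_res(op, A):
    """Check: X.Y <= Z iff Y <= X\\Z iff X <= Z/Y, for all subsets X, Y, Z."""
    P = powerset(A)
    return all(
        (prod(op, X, Y) <= Z) == (Y <= under(op, A, X, Z)) == (X <= over(op, A, Z, Y))
        for X in P for Y in P for Z in P)

# Hypothetical example: subtraction modulo 3, which is not associative.
A = range(3)
def sub3(a, b):
    return (a - b) % 3

print(satisfies_res(sub3, A))
```

No property of the underlying operation is used in the check, which matches the remark that the construction works for an arbitrary binary operation on A.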
If · is associative, then so is ⊙, and one obtains a residuated semigroup (monoid). A more general construction replaces · with a relation R ⊆ A³. One defines: X ⊙ Y = {c ∈ A : ∃a,b (a ∈ X ∧ b ∈ Y ∧ R(a, b, c))} and X\Y, X/Y as above. For a partially ordered (p.o.) groupoid (A, ·, ≤) (product is monotone), one performs the latter construction with: R(a, b, c) iff c ≤ a · b. The family of all down-sets of (A, ≤) is a subalgebra of the resulting residuated groupoid. Clearly all constructions from this paragraph yield residuated groupoids on complete lattices of sets.
As a special case, one takes A = Σ⁺, i.e. the set of nonempty strings on the alphabet (lexicon) Σ. (In categorial grammars for natural language the elements of Σ are usually interpreted as words; compound expressions,
e.g. sentences, are represented as strings of words.) The concatenation of strings is an associative binary operation on Σ⁺. So (P(Σ⁺), ⊙, \, /, ⊆) is a residuated semigroup. For A = Σ*, i.e. the set of all strings on Σ, one obtains a residuated monoid (P(Σ*), ⊙, \, /, {ε}, ⊆), where ε stands for the empty string. Algebras of this kind are called language models². With A = Σ⁺, they are intended models of L in the theory of categorial grammars. Pentus [65] proved that L is weakly complete with respect to language models (this means: the arrows provable in L are precisely those valid in these models). A formula α is interpreted as a set of strings, called the syntactic category of type α, e.g. the category of sentence (type s), of noun phrase (type np), of verb phrase (type np\s), of transitive verb phrase (type (np\s)/np), of subject (type s/(np\s)), of object (type (s/np)\s), of common noun (type n), and many others. For NL, Σ⁺ is replaced by Σ^(+), i.e. the set of all bracketed strings on Σ. (Bracketed strings are also called phrase structures.) They are generated from the elements of Σ by bracketed concatenation: x · y = (xy) (one often writes (x, y)).
The laws provable in L allow parsing of compound expressions. Here is a list of basic laws.
(L1) application: α · (α\β) ⇒ β and (α/β) · β ⇒ α
(L2) composition: (α\β) · (β\γ) ⇒ α\γ and (α/β) · (β/γ) ⇒ α/γ
(L3) bi-associativity: (α\β)/γ ⇔ α\(β/γ)
(L4) type-raising: α ⇒ (β/α)\β and α ⇒ β/(α\β)
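Since residuated semigroups are models of L, the laws (L1)-(L4) should hold as inclusions (or equalities) in any powerset algebra over an associative operation. The sketch below tests this exhaustively for one small noncommutative monoid; the monoid (all maps on a two-element set) and the helper names are illustrative assumptions, not part of the paper.

```python
from itertools import chain, combinations, product

# Monoid of all maps {0,1} -> {0,1}, each written as the pair (f(0), f(1)).
M = [(0, 1), (1, 0), (0, 0), (1, 1)]

def mul(f, g):
    """Apply f first, then g; associative but not commutative."""
    return (g[f[0]], g[f[1]])

def subsets(s):
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def dot(X, Y):
    return frozenset(mul(a, b) for a in X for b in Y)

def under(X, Y):
    """X\\Y"""
    return frozenset(b for b in M if dot(X, {b}) <= Y)

def over(X, Y):
    """X/Y"""
    return frozenset(a for a in M if dot({a}, Y) <= X)

def laws_hold(X, Y, Z):
    """(L1)-(L4) with alpha, beta, gamma read as X, Y, Z and => as inclusion."""
    l1 = dot(X, under(X, Y)) <= Y and dot(over(X, Y), Y) <= X
    l2 = (dot(under(X, Y), under(Y, Z)) <= under(X, Z)
          and dot(over(X, Y), over(Y, Z)) <= over(X, Z))
    l3 = over(under(X, Y), Z) == under(X, over(Y, Z))
    l4 = X <= under(over(Y, X), Y) and X <= over(Y, under(X, Y))
    return l1 and l2 and l3 and l4

all_ok = all(laws_hold(X, Y, Z) for X, Y, Z in product(subsets(M), repeat=3))
print(all_ok)
```

This is only a finite check in one model; it illustrates soundness of the laws and says nothing about completeness.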
We write α ⇔ β as an abbreviation of: α ⇒ β and β ⇒ α. Clearly these laws express some elementary laws of residuated semigroups (write a, b, c for α, β, γ, ≤ for ⇒, and = for ⇔). (L1) are the only laws used in basic categorial grammars (AB-grammars), due to Ajdukiewicz and Bar-Hillel. One parses ‘John rests’ as np · (np\s) ⇒ s, and ‘John likes Peter’ as:
np · ((np\s)/np) · np ⇒ np · (np\s) ⇒ s
For ‘he likes him’ one also needs (L2).
(s/(np\s)) · ((np\s)/np) · ((s/np)\s) ⇒ (s/np) · ((s/np)\s) ⇒ s
Due to (L4) every noun phrase (np) can be treated as subject (s/(np\s)). From this list only (L1), (L4) are provable in NL, and consequently one cannot parse ‘he likes him’ as above. Categorial grammars based on weak logics, like AB, NL, require more types assigned to words. For instance, ‘he’ is also assigned (s/np)/((np\s)/np); the latter is derivable in L from s/(np\s) due to the law: α/β ⇒ (α/γ)/(β/γ) (apply (r2) to the second arrow in (L2)).
The names of (L1) and (L2) are justified by the interpretation of L in type theories. L with the commutative product was studied by van Benthem [75] as a logic of semantic types (in fact, van Benthem considered its product-free fragment; the sequent system is Le from Section 4, restricted to product-free formulas). The type α → β is interpreted as the type of functions from the domain of type α to the domain of type β. From atomic types, e.g. e (entity), t (truth value), one builds compound types, e.g. e → t (unary predicate), e → (e → t) (binary predicate), (e → t) → t (quantifier), and so on (here sets and relations are identified with their characteristic functions). A syntactic type is translated into a semantic type as follows. One assigns some semantic types to atomic syntactic types, e.g. t to s, e → t to n, (e → t) → t to np, but e to pn (proper noun). Both α\β and β/α are translated to τ(α) → τ(β), where τ(α) denotes the translation of α. So a type pn\s of verb phrase is translated into e → t, a type (s\s)/s of propositional connective into t → (t → t), a type s/(pn\s) of subject into (e → t) → t, and so on. By a version of the Curry-Howard isomorphism, every proof in the natural deduction format corresponds to some typed linear lambda term with no closed subterms³. Therefore syntactic derivations in L, translated into Le, determine some typed lambda terms which define the corresponding semantic transformations.
² In mathematical linguistics, sets of strings are called languages.
Another class of natural models of L are relation algebras. Given a transitive relation U ⊆ W², one considers P(U) with the relative product ◦ and its residuals \, /, defined as follows.
R ◦ S = {(x, y) ∈ U : ∃z((x, z) ∈ R and (z, y) ∈ S)}
R\S = {(x, y) ∈ U : R ◦ {(x, y)} ⊆ S}
S/R = {(x, y) ∈ U : {(x, y)} ◦ R ⊆ S}
Then (P(U), ◦, \, /, ⊆) is a residuated semigroup. U = W² yields a residuated monoid, where the identity relation is the unit. For an arbitrary U ⊆ W², one obtains a residuated groupoid, but \, / must be defined differently. We write S̄ for S ∪ (W² − U).
S/R = {(x, y) ∈ U : {(x, y)} ◦ R ⊆ S̄}, R\S = {(x, y) ∈ U : R ◦ {(x, y)} ⊆ S̄}.
All constructions, described above, yield residuated groupoids (semigroups) on a complete distributive lattice of sets (except for the case of downsets, even a boolean algebra). Nondistributive lattices can be obtained as the lattices of closed sets with respect to a closure operation C on (P(A), ⊆) that is nuclear, i.e. it satisfies: (C4) C(X) ⊙ C(Y) ⊆ C(X ⊙ Y) for all X, Y ⊆ A. Under (C1)-(C3) (with X, Y instead of x, y and ⊆ instead of ≤), (C4) is equivalent to: (C4’) for any X ⊆ A and C-closed Y ⊆ A, the sets X\Y and Y/X are C-closed. Accordingly, C^→(P(A)), i.e. the family of C-closed sets, is closed under \, /. One defines X ⊙_C Y = C(X ⊙ Y), X ∪_C Y = C(X ∪ Y). The family of C-closed sets with ⊙_C, \, /, ∪_C, ∩ is a complete l.o. residuated groupoid (not necessarily distributive as a lattice). In a similar way one constructs nondistributive l.o. residuated semigroups and monoids. If · is associative (resp. commutative), then ⊙ and ⊙_C are associative (resp. commutative). If 1 is the unit for ·, then C({1}) is the unit for ⊙_C.
³ This reflects Lambek’s restriction: the antecedents of sequents are nonempty.
Algebras for linear logics can be constructed from phase spaces. Girard [32] defines a phase space as a structure (A, ·, 1, O) such that (A, ·, 1) is a commutative monoid and O ⊆ A. We consider the powerset algebra P(A). Since ⊙ is commutative, X\Y = Y/X, and we write X → Y. For X ⊆ A, one defines negation X^∼ = X → O (Girard writes ⊥• for O and X^⊥ for X^∼). Then, C(X) = X^{∼∼} defines a nuclear closure operation on P(A) (the C-closed sets are called facts). The family of facts is a complete commutative residuated lattice, closed under negation, and X^{∼∼} = X (double negation) holds for any fact X. Observe that X^∼ = X^{∼∼∼} holds for any X ⊆ A, hence the facts are precisely the sets of the form X^∼. The latter equation is an instance of (1), because (∼, ∼) is a Galois connection.
For noncommutative linear logics with one negation, e.g. Cyclic MALL [77] (see Section 5), one takes a monoid (A, ·, 1), and O is required to satisfy
a · b ∈ O iff b · a ∈ O, for all a, b ∈ A. (2)
This yields X\O = O/X, for any X ⊆ A, so we have two implications \, /, but one negation. Facts are defined as above. Now the family of facts is a complete residuated lattice, closed under ∼; the double negation equation is valid.
For noncommutative linear logics with two negations, e.g. Noncommutative MALL [1], (2) is not assumed. One defines negations X^∼ = X\O, X^− = O/X and closure operations φ(X) = X^{−∼}, ψ(X) = X^{∼−}; they are closure operations on P(A), since (∼, −) is a Galois connection. One requires φ = ψ (the Abrusci condition). Then the resulting closure operation is nuclear. The φ-closed sets (facts) form a complete residuated lattice, closed under both negations. X^{−∼} = X = X^{∼−} holds for any fact X.
For nonassociative linear logics [18, 19], one takes a unital groupoid (A, ·, 1) and O ⊆ A, satisfying:
(a · b) · c ∈ O iff a · (b · c) ∈ O, for all a, b, c ∈ A. (3)
Assuming (2) and (3), we obtain models for nonassociative linear logic with one negation, whereas (3) and φ = ψ yield models with two negations. (3) is needed to show (C4’). The φ-closed sets are of the form Z∼ and the ψ-closed sets of the form U−. Assume that φ = ψ and Y is φ-closed. Then Y = Z∼ for some Z ⊆ A, and Y = U− for some U ⊆ A. Let X ⊆ A. From (3) it follows that X\(Z\O) = (ZX)\O, where ZX is the product of Z and X in P(A); hence X\Y is φ-closed. Similarly, (O/U)/X = O/(XU), hence Y/X is ψ-closed, and consequently φ-closed.

For the phase spaces mentioned above, O is a fact, since O = {1}∼. It is reasonable to consider linear logics based on non-unital algebras, e.g. l.o. residuated semigroups (groupoids); see Section 5. The corresponding phase spaces are defined as above except that the underlying algebra is a semigroup (groupoid) (A, ·). Then the set O ⊆ A need not be a fact. Buszkowski [18, 19] also considers a more general notion of a phase space: the set O is replaced by a relation R ⊆ A², and negations are defined by X∼ = {b ∈ A : ∀a∈X R(a, b)}, X− = {a ∈ A : ∀b∈X R(a, b)} (so (∼, −) is a polarity; see [55]).

For substructural logics of intuitionistic character (see Section 4), one employs intuitionistic phase spaces, introduced in [62], of the form (A, ·, 1, R) or (A, ·, R) with R ⊆ A², but now the role of R is different. For the case of monoids (A, ·, 1), one defines basic closed sets [a, b, c] = {x ∈ A : R(axb, c)} for a, b, c ∈ A. The family C of closed sets consists of all intersections of arbitrary families of basic closed sets. So C is a closure system and determines a closure operation, defined as follows: C(X) is the smallest closed set containing X, i.e. the intersection of all basic closed sets containing X. It is easy to show that C is nuclear. Since \, / in P(A) distribute over intersections in the top argument and turn unions into intersections in the bottom argument, it suffices to prove (C4’) for X = {d}, d ∈ A, and Y being a basic closed set. We have {d}\[a, b, c] = [a · d, b, c] and [a, b, c]/{d} = [a, d · b, c]. As above, one obtains a residuated lattice whose elements are the sets in C, i.e. the C-closed sets. For the case of semigroups (groupoids), one applies similar constructions [16].
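The commutative phase-space construction described earlier in this section can be checked by brute force on a small example. The sketch below is only an illustration under stated assumptions: the monoid (the integers modulo 4 under addition, with unit 0) and the set O = {0, 2} are chosen arbitrarily and do not come from the text. It computes X∼, the closure C(X) = X∼∼ and the family of facts, and tests the claims that X∼ = X∼∼∼, that the facts are exactly the sets of the form X∼, that O = {1}∼ is a fact, and that C({1}) is the unit of the closed product.

```python
from itertools import combinations

# Assumed toy phase space (not from the text): A = Z_4 under addition mod 4,
# a commutative monoid whose unit is written 0, and an arbitrarily chosen O.
N = 4
A = frozenset(range(N))
UNIT = 0
O = frozenset({0, 2})


def product(X, Y):
    """Pointwise product of two subsets of A."""
    return frozenset((a + b) % N for a in X for b in Y)


def neg(X):
    """X~ = X -> O = {b : a.b is in O for every a in X}."""
    return frozenset(b for b in A if all((a + b) % N in O for a in X))


def closure(X):
    """C(X) = X~~."""
    return neg(neg(X))


subsets = [frozenset(c) for r in range(N + 1) for c in combinations(sorted(A), r)]
facts = {X for X in subsets if closure(X) == X}

# X~ = X~~~ for every X, so the facts are exactly the sets of the form X~.
triple_negation_ok = all(neg(X) == neg(neg(neg(X))) for X in subsets)
facts_are_negations = facts == {neg(X) for X in subsets}

# O is a fact because O = {unit}~.
o_is_fact = O == neg(frozenset({UNIT})) and O in facts

# C({unit}) is the unit of the closed product (X, Y) -> C(X.Y) on facts.
unit_fact = closure(frozenset({UNIT}))
unit_ok = all(closure(product(unit_fact, X)) == X for X in facts)

print(sorted(sorted(X) for X in facts))
```

For a noncommutative monoid one would need separate left and right negations, so this sketch covers only the commutative case.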
3 Sequent systems for L and NL

The most interesting logical contributions of Lambek [48, 49] are Gentzen-style sequent systems for L and NL. Sequents of L are of the form Γ ⇒ α, where Γ is a nonempty finite sequence of formulas and α is a formula of L. For NL, the antecedents of sequents are bunches, i.e. the elements of the free groupoid generated by the set of formulas. So neither system admits empty antecedents of sequents (Lambek’s restriction⁴). Γ, Δ, Θ denote finite sequences of formulas for the case of L and bunches for the case of NL. A compound bunch is written as (Γ, Δ). Γ[ ] denotes a context, i.e. a bunch with a unique occurrence of a special variable x, and Γ[Δ] stands for the substitution of Δ for x in Γ[ ]. In both systems the only axioms are (id); one can restrict them to p ⇒ p, for any variable p. The inference rules are shown in Table 1.⁵

(· ⇒)  L: from Γ, α, β, Γ′ ⇒ γ infer Γ, α·β, Γ′ ⇒ γ
       NL: from Γ[(α, β)] ⇒ γ infer Γ[α·β] ⇒ γ
(⇒ ·)  L: from Γ ⇒ α and Δ ⇒ β infer Γ, Δ ⇒ α·β
       NL: from Γ ⇒ α and Δ ⇒ β infer (Γ, Δ) ⇒ α·β
(\ ⇒)  L: from Γ, β, Γ′ ⇒ γ and Δ ⇒ α infer Γ, Δ, α\β, Γ′ ⇒ γ
       NL: from Γ[β] ⇒ γ and Δ ⇒ α infer Γ[(Δ, α\β)] ⇒ γ
(⇒ \)  L: from α, Γ ⇒ β infer Γ ⇒ α\β
       NL: from (α, Γ) ⇒ β infer Γ ⇒ α\β
(/ ⇒)  L: from Γ, α, Γ′ ⇒ γ and Δ ⇒ β infer Γ, α/β, Δ, Γ′ ⇒ γ
       NL: from Γ[α] ⇒ γ and Δ ⇒ β infer Γ[(α/β, Δ)] ⇒ γ
(⇒ /)  L: from Γ, β ⇒ α infer Γ ⇒ α/β
       NL: from (Γ, β) ⇒ α infer Γ ⇒ α/β
(cut)  L: from Γ, α, Γ′ ⇒ β and Δ ⇒ α infer Γ, Δ, Γ′ ⇒ β
       NL: from Γ[α] ⇒ β and Δ ⇒ α infer Γ[Δ] ⇒ β

Table 1 Inference rules of sequent systems for L and NL

⁴ In algebraic models, the unit element for product need not exist.
⁵ To obey Lambek’s restriction, one assumes Γ ≠ ε in (⇒ \), (⇒ /).

Lambek [48, 49] proves the cut-elimination theorems for both systems: every sequent provable in the system can be proved without the cut rule. Nonetheless, this rule is needed in proofs from nonlogical assumptions, e.g. assuming p ⇒ q and q ⇒ r, one cannot prove p ⇒ r without (cut). Clearly the cut-elimination theorem implies the decidability of L and NL. Furthermore, it implies the subformula property: each provable sequent has a proof such that all formulas appearing in this proof are subformulas of those appearing in the sequent. Consequently, L and NL are conservative extensions of their language-restricted fragments, e.g. the product-free fragments.

Either sequent system is equivalent to the corresponding algebraic axiomatization from Section 1 in the following sense: Γ ⇒ α is provable in the sequent system if and only if ·Γ ⇒ α is provable in the algebraic one. By ·Γ we mean the formula arising from Γ after one has replaced every comma with ·.

Let A be a residuated groupoid. A valuation in A is a homomorphism from the free algebra of formulas to A. Every valuation μ is extended to nonempty sequences of formulas and bunches by setting: μ(α₁, ..., αₙ) = μ(α₁) · ... · μ(αₙ) and μ((Γ, Δ)) = μ(Γ) · μ(Δ). A sequent Γ ⇒ α is said to be true in A for μ, if μ(Γ) ≤ μ(α). A sequent is valid in A, if it is true for all valuations in A. A sequent is valid in a class of algebras C, if it is valid in every algebra from C. A sequent is entailed by a set of sequents Φ in C, if it is true in A for μ, for any A ∈ C and μ in A such that all sequents from Φ are true in A for μ. We refer to the sequents in Φ as assumptions (in contrast to axioms, assumptions need not be closed under substitutions).

L (resp. NL), in either form, is strongly complete with respect to residuated semigroups (resp. groupoids): the sequents provable from Φ are precisely the sequents entailed by Φ in this class. As a consequence, the sequents provable in the pure system are precisely the sequents valid in this class (weak completeness). The strong completeness can be proved in a routine way. The soundness is easy: axioms (id) are valid and all rules preserve the validity (even the truth for μ). The proof of completeness employs Lindenbaum-Tarski algebras.
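The decidability argument can be made concrete: since every rule of Table 1 other than (cut) has premises with fewer connectives than its conclusion, a backward search through cut-free proofs always terminates. The following is a minimal sketch of such a search for L with Lambek’s restriction, not an efficient or authoritative implementation. The encoding is an assumption made here for illustration: atoms are strings, and `('*', a, b)`, `('\\', a, b)`, `('/', a, b)` stand for a·b, a\b and a/b.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def provable(gamma, goal):
    """Cut-free backward proof search for the sequent gamma => goal in L.

    gamma is a tuple of formulas; the empty tuple is rejected
    (Lambek's restriction).
    """
    if not gamma:
        return False
    if gamma == (goal,):                      # axiom (id)
        return True
    if isinstance(goal, tuple):               # right rules
        op, a, b = goal
        if op == '\\' and provable((a,) + gamma, b):      # (=> \)
            return True
        if op == '/' and provable(gamma + (b,), a):       # (=> /)
            return True
        if op == '*':                                     # (=> .)
            for k in range(1, len(gamma)):
                if provable(gamma[:k], a) and provable(gamma[k:], b):
                    return True
    for i, f in enumerate(gamma):             # left rules
        if not isinstance(f, tuple):
            continue
        op, a, b = f
        left, right = gamma[:i], gamma[i + 1:]
        if op == '*':                                     # (. =>)
            if provable(left + (a, b) + right, goal):
                return True
        elif op == '\\':                                  # (\ =>), f = a\b
            for k in range(len(left)):        # Delta = left[k:] is nonempty
                if provable(left[k:], a) and provable(left[:k] + (b,) + right, goal):
                    return True
        elif op == '/':                                   # (/ =>), f = a/b
            for k in range(1, len(right) + 1):  # Delta = right[:k] is nonempty
                if provable(right[:k], b) and provable(left + (a,) + right[k:], goal):
                    return True
    return False


application = provable(('p', ('\\', 'p', 'q')), 'q')                 # p, p\q => q
wrong_order = provable((('\\', 'p', 'q'), 'p'), 'q')                 # p\q, p => q
raising = provable(('p',), ('/', 'q', ('\\', 'p', 'q')))             # p => q/(p\q)
composition = provable((('/', 'p', 'q'), ('/', 'q', 'r')), ('/', 'p', 'r'))
reassoc = provable((('*', ('*', 'p', 'q'), 'r'),), ('*', 'p', ('*', 'q', 'r')))
```

The search is exponential in the worst case; memoization via `lru_cache` only avoids repeating identical subgoals.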
It is noteworthy that the proof of the weak completeness of L with respect to language models P(Σ⁺), due to Pentus [65], is highly nontrivial. This seems to be the most involved proof in the theory of Lambek calculi. Let us note that L is not strongly complete with respect to these models. If p ⇒ p · p is true for μ in P(Σ⁺), then μ(p) = ∅, and consequently, p ⇒ q is true for μ. Yet p ⇒ q is not provable from p ⇒ p · p, since there exists a residuated semigroup such that μ(p) ≤ μ(p) · μ(p) but not μ(p) ≤ μ(q) for some valuation μ, e.g. P(Σ*) with μ(p) = {ε}, μ(q) = Σ⁺. NL is not even weakly complete with respect to the corresponding language models P(Σ^(+)) [26]. Indeed, ((p · q)/r) · r ⇒ p · r is valid in this class but not provable.

A categorial grammar based on a logic L (in the form of a sequent system) can be defined as a triple G = (Σ, I, α₀) such that Σ is a finite alphabet (lexicon), I is a map that assigns a finite set of formulas (types) to each element of Σ, and α₀ is a formula (the designated type). Usually, α₀ is an atom, e.g. s. The language of G, denoted L(G), consists of all strings v₁ ... vₙ, where n ≥ 1 and v₁, ..., vₙ ∈ Σ, such that α₁, ..., αₙ ⇒ α₀ is provable in L for some αᵢ ∈ I(vᵢ), i = 1, ..., n. For nonassociative logics, the sequence (α₁, ..., αₙ) is the yield of a bunch Γ such that Γ ⇒ α₀ is provable.

L is NP-complete [66], while even the provability from a finite set Φ in NL is in PTIME [14]. The categorial grammars based on L generate the (ε-free) context-free languages [64], and the same holds for those based on NL, even augmented with finitely many assumptions [14]. The provability from assumptions in L, even restricted to formulas built from / only, is undecidable [11]. If product is admitted, the undecidability easily follows from the undecidability of the word problem for semigroups and the fact that every semigroup can be embedded in a residuated semigroup: the latter is the powerset algebra of the former and h(a) = {a} defines the embedding.

In the proofs of the context-freeness of categorial grammars based on L and NL one uses some interpolation properties of these logics. An interpolation lemma for L is due to Roorda [70].
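The definition of L(G) given above can be sketched as a membership test: try every choice of types for the words and ask whether the resulting sequent is provable. The block below is a hedged illustration only. The lexicon, the atoms `np` and `s`, and the helper names are hypothetical, and the prover is a naive cut-free search restricted to the product-free fragment of the associative calculus with Lambek’s restriction; `('\\', a, b)` and `('/', a, b)` encode a\b and a/b.

```python
from functools import lru_cache
from itertools import product as type_choices


@lru_cache(maxsize=None)
def provable(gamma, goal):
    """Naive cut-free search for gamma => goal (product-free, nonempty gamma)."""
    if not gamma:
        return False
    if gamma == (goal,):
        return True
    if isinstance(goal, tuple):
        op, a, b = goal
        if op == '\\' and provable((a,) + gamma, b):
            return True
        if op == '/' and provable(gamma + (b,), a):
            return True
    for i, f in enumerate(gamma):
        if not isinstance(f, tuple):
            continue
        op, a, b = f
        left, right = gamma[:i], gamma[i + 1:]
        if op == '\\':      # f = a\b consumes a nonempty Delta => a on its left
            for k in range(len(left)):
                if provable(left[k:], a) and provable(left[:k] + (b,) + right, goal):
                    return True
        else:               # f = a/b consumes a nonempty Delta => b on its right
            for k in range(1, len(right) + 1):
                if provable(right[:k], b) and provable(left + (a,) + right[k:], goal):
                    return True
    return False


# Hypothetical lexicon I: each word gets a finite set of types.
NP, S = 'np', 's'
LEXICON = {
    'John': {NP},
    'Mary': {NP},
    'sleeps': {('\\', NP, S)},               # np\s
    'likes': {('/', ('\\', NP, S), NP)},     # (np\s)/np
}


def in_language(words, designated=S):
    """True iff some type assignment makes a1, ..., an => designated provable."""
    if not words or any(w not in LEXICON for w in words):
        return False
    return any(provable(tuple(types), designated)
               for types in type_choices(*(LEXICON[w] for w in words)))


accepted = [in_language(['John', 'sleeps']), in_language(['John', 'likes', 'Mary'])]
rejected = [in_language(['sleeps', 'John']), in_language(['likes', 'John'])]
```

Enumerating all type assignments is exponential in the length of the string when words are ambiguous; the sketch is meant to mirror the definition, not to be a practical parser.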
By |α| we denote the number of occurrences of variables in α and by |α|_p (resp. |Γ|_p) the number of occurrences of p in α (resp. Γ). Let ⊢ denote the provability in L.

Roorda’s Interpolation Lemma: If ⊢ Γ, Δ, Γ′ ⇒ α, Δ ≠ ε, then there exists a formula δ (an interpolant of Δ in this sequent) such that ⊢ Δ ⇒ δ, ⊢ Γ, δ, Γ′ ⇒ α and, for any variable p, |δ|_p exceeds neither |Δ|_p, nor |Γ, Γ′, α|_p.

In a cut-free proof one can take different variables in different occurrences of the same axiom p ⇒ p. Therefore every provable sequent is a substitution instance of a provable special sequent, containing 0 or 2 occurrences of each variable; one substitutes variables for variables. In a provable special sequent α₁, ..., αₙ ⇒ αₙ₊₁ one replaces each formula αᵢ, 1 ≤ i ≤ n, by its interpolant (in this sequent) βᵢ and αₙ₊₁ by βₙ₊₁, i.e. an interpolant of α₁, ..., αₙ (in this sequent). The resulting sequent β₁, ..., βₙ ⇒ βₙ₊₁ is thin: (i) it is provable in L, (ii) it is special, (iii) every formula βᵢ, 1 ≤ i ≤ n + 1, contains at most one occurrence of each variable (such formulas are said to be thin). For a finite set of variables P and an integer m ≥ 1, by T(P, m) we denote the set of all formulas α, containing variables from P only and such that |α| ≤ m. Using some combinatorial tools, Pentus [64] proves that for every thin sequent β₁, ..., βₙ ⇒ βₙ₊₁, where n ≥ 2 and all formulas are in T(P, m), there exists 1 ≤ i < n such that βᵢ, βᵢ₊₁ has a thin interpolant δ ∈ T(P, m) and the sequent β₁, ..., βᵢ₋₁, δ, βᵢ₊₂, ..., βₙ ⇒ βₙ₊₁ is thin. Accordingly, every thin sequent on T(P, m) can be derived by a sequence of reductions γ₁, γ₂ ⇒ γ₃ and γ₁ ⇒ γ₂ such that γᵢ ∈ T(P, m). This yields an analogous derivation of every sequent α₁, ..., αₙ ⇒ αₙ₊₁ on T(P, m), since the provable sequents are closed under substitutions. Up to the direction of arrows, these are derivations in a context-free grammar whose nonterminals are the formulas in T(P, m).

For NL, an analogous interpolation lemma holds [39]. For this case, however, an interpolant is a subformula of some formula occurring in the sequent, and this is preserved for NL augmented with a finite set of nonlogical assumptions Φ (then δ is a subformula of some formula occurring in the sequent or Φ) [14]. Consequently, every categorial grammar based on NL (also augmented with Φ) can be transformed into an equivalent context-free grammar in polynomial time. If P ≠ NP, this is impossible for categorial grammars based on L.
4 Substructural logics

A characteristic feature of sequent systems for L and NL is the lack of structural rules, appearing in sequent systems for intuitionistic logic and classical logic.⁶ For the intuitionistic case, these are: exchange (e), integrality (i) and contraction (c); see Table 2. (Integrality is also called left weakening.)

(e)  L: from Γ, α, β, Δ ⇒ γ infer Γ, β, α, Δ ⇒ γ
     NL: from Γ[(Δ₁, Δ₂)] ⇒ γ infer Γ[(Δ₂, Δ₁)] ⇒ γ
(i)  L: from Γ, Δ ⇒ β infer Γ, α, Δ ⇒ β
     NL: from Γ[Δᵢ] ⇒ β infer Γ[(Δ₁, Δ₂)] ⇒ β
(c)  L: from Γ, Δ, Δ, Θ ⇒ α infer Γ, Δ, Θ ⇒ α
     NL: from Γ[(Δ, Δ)] ⇒ α infer Γ[Δ] ⇒ α

Table 2 Structural rules (e), (i) and (c)

⁶ For sequent systems, by a structural rule one means a rule that does not explicitly refer to logical connectives or constants.

These rules express the following algebraic properties of product: (e) commutative: a · b = b · a, (i) decreasing: a · b ≤ a, a · b ≤ b, (c) square-increasing: a ≤ a · a. Substructural logics are characterized as the logics whose sequent systems omit some structural rules. In that sense this name was introduced in [71]. NL and L are typical substructural logics. Actually, NL is more substructural than L. The latter can also be formalized as the former enriched with the structural rule of associativity: from Γ[((Δ₁, Δ₂), Δ₃)] ⇒ α infer Γ[(Δ₁, (Δ₂, Δ₃))] ⇒ α, and conversely.

In the literature on substructural logics, these logics are usually formulated in richer languages: besides Lambek connectives ·, \, / one admits lattice connectives ∨, ∧ and constants 1, 0, ⊤, ⊥. 1 is interpreted as the unit element for product, 0 as an arbitrary designated element, and ⊤, ⊥ as the greatest and the least element, respectively. In one tradition, stemming from relevance logics, lattice connectives and constants ⊤, ⊥ are referred to as extensional, whereas Lambek connectives and constants 1, 0 as intensional. Another tradition, stemming from linear logics, refers to the former as additive and to the latter as multiplicative. Linear logicians, however, use a different notation, after Girard [32]. For instance, ⊥ stands for our 0, ⊗ for our ·, & for our ∧, and ⊕ for our ∨ (we will use ⊕ for coproduct, i.e. par in [32]). Our notation is commonly used in substructural logics [30].

A basic substructural logic is Full Lambek Calculus FL, formulated in the language ·, \, /, 1, ∨, ∧. We also present its nonassociative version FNL. Sequent systems admit sequents with the empty antecedent. By ε we denote both the empty sequence and the empty bunch. In models μ(ε) = 1. For bunches, one assumes (ε, Γ) = Γ = (Γ, ε). One also writes ⇒ α for ε ⇒ α. The axioms are (id) and (a.1): ⇒ 1. Table 3 shows the inference rules for ∨, ∧, 1. All rules from Table 1 are admitted (Lambek’s restriction is neglected).

(∨ ⇒)  FL: from Γ, α, Γ′ ⇒ γ and Γ, β, Γ′ ⇒ γ infer Γ, α∨β, Γ′ ⇒ γ
       FNL: from Γ[α] ⇒ γ and Γ[β] ⇒ γ infer Γ[α∨β] ⇒ γ
(⇒ ∨)  FL: from Γ ⇒ αᵢ infer Γ ⇒ α₁∨α₂
       FNL: same
(∧ ⇒)  FL: from Γ, αᵢ, Γ′ ⇒ β infer Γ, α₁∧α₂, Γ′ ⇒ β
       FNL: from Γ[αᵢ] ⇒ β infer Γ[α₁∧α₂] ⇒ β
(⇒ ∧)  FL: from Γ ⇒ α and Γ ⇒ β infer Γ ⇒ α∧β
       FNL: same
(1 ⇒)  FL: from Γ, Δ ⇒ α infer Γ, 1, Δ ⇒ α
       FNL: from Γ[ε] ⇒ α infer Γ[1] ⇒ α

Table 3 Rules for ∨, ∧, 1
One admits no axioms (rules) for 0 (if added to the language). With 0 one defines negations: α∼ = α\0, α− = 0/α, and similarly in algebras. By (L4), one obtains a ≤ a−∼ and a ≤ a∼−, but not the converse inequalities. If product is commutative, then a∼ = a−; the resulting single negation will be denoted by ∼. If ⊥ is added to the language, one admits the axioms (a.⊥): Γ, ⊥, Δ ⇒ α (in the nonassociative format: Γ[⊥] ⇒ α). They express the laws ⊥ · a = ⊥, a · ⊥ = ⊥, valid in residuated groupoids with ⊥. One defines ⊤ = ⊥\⊥ or adds the constant ⊤ and the axioms (a.⊤): Γ ⇒ ⊤.

Residuated lattices with a designated element 0 are called FL-algebras [30]. FL (resp. with 0) is strongly complete with respect to residuated lattices (resp. FL-algebras). Similarly, FNL (resp. with 0) is strongly complete with respect to l.o. unital residuated groupoids (resp. with 0; these algebras will be referred to as FNL-algebras).

Substructural logics are often defined as all extensions of FL by a set of new axioms (rules). The class of algebraic models for a substructural logic forms a variety or a quasi-variety of residuated lattices. In the first-order language of residuated lattices (FL-algebras) equations correspond to sequents and quasi-equations to rules. In particular, one can add some structural rules. FL with (e) is denoted by FLe, FL with (e), (c) by FLec, and similarly for other logics. The cut-elimination theorems are true for all systems FNLS, FLS, where S is any package of structural rules. If (c) is not in S, then this implies the decidability of the logic, since the size of the conclusion of each rule is not less than the size of its premise(s) and the subformula property holds. FLc is undecidable, whereas the remaining logics FLS are decidable [23]. It also follows that FLS is a conservative extension of LS with 1 (empty antecedents admitted), and similarly for nonassociative logics.

FNLe, FLe and their extensions employ one implication only. In the sequent systems given above, we replace \ with → and omit the rules for /. The rule (e) can be omitted, if we represent the antecedents of sequents as multisets. In unital residuated groupoids, product is decreasing if and only if a ≤ 1 for any element a (so 1 = ⊤). In FNLi one proves Γ ⇒ 1 by (a.1) and (i). To get 0 = ⊥, one needs the axioms: (a.0) Γ[0] ⇒ α (in the associative format: Γ, 0, Δ ⇒ α). One abbreviates (i)+(a.0) by (w) (from weakening).⁷ A l.o. residuated groupoid whose product is both decreasing and square-increasing satisfies a · b = a ∧ b (we have a · b ≤ a ∧ b, by (i), and a ∧ b ≤ (a∧b)·(a∧b) ≤ a·b, by (c) and monotony), hence bounded algebras from this class are precisely Heyting algebras: bounded lattices admitting a residual for meet. It follows that FNLwc = FNLewc coincides with intuitionistic logic (in a language with redundant symbols), and similarly for FLwc.

Many well-known nonclassical logics can be presented as axiomatic extensions of FLe. With (w) but not (c), one goes in the direction of many-valued and fuzzy logics. For instance, the Łukasiewicz logic Ł∞ can be axiomatized as FLew with the new axiom α ∨ β ⇔ (α → β) → β [30]. As a consequence, one obtains α ⇔ ∼∼α (the double negation law), where ∼α = α → 0. Hájek’s basic fuzzy logic amounts to FLew with the new axioms α ∧ β ⇔ α · (α → β) and ⇒ (α → β) ∨ (β → α) [34]. Constructive logic with strong negation amounts to FLew with double negation and the Nelson axiom:

α → β ⇔ ((α · α) → β) ∧ (((∼β) · (∼β)) → ∼α)

⁷ Here we slightly differ from standard presentations, which admit sequents Γ ⇒, with the empty succedent interpreted as 0. Then (a.0) is replaced by the axiom 0 ⇒ and the rule: (o) from Γ ⇒ infer Γ ⇒ α (right weakening).
One defines the intuitionistic implication α →I β = (α · α) → β and negation ¬α = α →I 0 [73, 74]. Some extensions of FLec are closely related to relevance logics [30]; also see Section 5. The logics discussed in this paragraph do not admit cut elimination (if presented as extensions of FLe).

These stronger nonclassical logics, however, require different methods than the basic logics FLS, FNLS. As a rule, they do not possess cut-free sequent systems of the above form. Nonetheless, the role of residuation is emphasized in some modern treatments, e.g. the theory of fuzzy logics in Hájek [34]. A current trend in substructural logics attempts to classify and to characterize different nonclassical logics in terms of algebraic properties of their models: residuated lattices satisfying additional axioms; see [24].

In logics admitting the empty antecedents, if ⇒ α is provable, then one says that α is provable or is a theorem. More precisely, α is a 1-theorem, since 1 ≤ μ(α) in models. Every sequent is deductively equivalent to a formula, e.g. α, β ⇒ γ to (α · β)\γ. These logics can be formalized as Frege-Hilbert style systems, operating on formulas only; see [30, 31]. This makes them more easily comparable with other nonclassical logics, presented in this form. These logics (with ∧, ∨) are algebraizable: there exist syntactic translations of formulas of the logic into equations in the first-order language of algebras and in the opposite direction such that the formula is provable if and only if the equation is valid in the corresponding class of algebras, and conversely. For example, α is translated into 1 ≤ α, i.e. 1 ∧ α = 1, and α = β to (α\β) ∧ (β\α) (here we identify propositional formulas with first-order terms).

The implicational fragment of FLe, i.e. restricted to formulas with → as the only connective, amounts to the logic BCI. Its Frege-Hilbert system admits the axioms:

(a.B) (β → γ) → ((α → β) → (α → γ))
(a.C) (α → (β → γ)) → (β → (α → γ))
(a.I) α → α

and the only rule modus ponens: from α → β and α infer β. (The axioms are precisely the types of combinators B, C and I in combinatory logic.) The analogous fragment of FLei amounts to BCK; one replaces (a.I) with (a.K): α → (β → α) (the type scheme of the combinator K). Zielonka [78] shows that the \, /-fragment of L1 admits no finite axiomatization of this kind.

Some linguists, however, prefer weaker logics, not admitting empty antecedents, like NL, L; see [59] for a discussion. The latter also seem more natural from the point of view of modal logics. The subsystems of FNL and FL, restricted to sequents Γ ⇒ α with Γ ≠ ε, will be denoted here FNL− and FL−, respectively. Since they are largely omitted in the literature on substructural logics, no standard notation can be found there. A more consistent notation might be FL for FL−, while FL1 for FL, but this would disagree with well-established standards. Notice that NL with 1 is not a conservative extension of NL; e.g. p/(q/q) ⇒ p is provable in the former but not in the latter. This also holds for L with 1 versus L as well as other related logics. FNL− (resp. FL−) is strongly complete with respect to l.o. residuated groupoids (resp. semigroups). The cut-elimination theorem holds for each of these systems. As a consequence, FNL− (resp. FL−) is decidable and conservatively extends NL (resp. L). In what follows, NL1 (resp. L1) denotes NL (resp. L) with 1 (Lambek’s restriction omitted).

Buszkowski [17] provides a reduction of the provability in FL to that in FL− for sequents not containing the constant 1. This reduction also works for these logics enriched with any package of structural rules (e), (i), (c) and their nonassociative versions. One defines two translations N, P of formulas into other formulas. N is applied to negative occurrences and P to positive occurrences of formulas in a sequent. Recall that in α₁, ..., αₙ ⇒ α each αᵢ is negative and α is positive. If α · β is positive (resp. negative), then both α and β are positive (resp. negative), and similarly for α ∨ β, α ∧ β. If α\β is positive (resp. negative), then α is negative (resp. positive) and β is positive (resp. negative), and similarly for β/α. The translations N, P are defined in Table 4, where ⊢ stands for the provability in FLS. By ⊢− we denote the provability in FLS−. Then ⊢ can be reduced to a boolean combination of instances of ⊢−, according to the following conditions:

(P1) ⊢ α₁, ..., αₙ ⇒ β iff ⊢− N(α₁), ..., N(αₙ) ⇒ P(β), for n ≥ 1,
(P2) ⊬ p for any variable p,
(P3) ⊢ α ∘ β iff ⊢ α and ⊢ β, for ∘ ∈ {·, ∧},
(P4) ⊢ α ∨ β iff ⊢ α or ⊢ β,
(P5) ⊢ α\β iff ⊢ α ⇒ β, and similarly for β/α.

γ   | N(γ)                | P(γ)                        | condition
p   | p                   | p                           | variable
α∘β | N(α) ∘ N(β)         | P(α) ∘ P(β)                 | ∘ ∈ {∧, ∨}
α·β | N(α) · N(β)         | P(α) · P(β)                 | ⊬ α, ⊬ β
α·β | as above            | (P(α) · P(β)) ∨ P(β)        | ⊢ α, ⊬ β
α·β | as above            | (P(α) · P(β)) ∨ P(α)        | ⊬ α, ⊢ β
α·β | as above            | (P(α) · P(β)) ∨ P(α) ∨ P(β) | ⊢ α, ⊢ β
α\β | P(α)\N(β)           | N(α)\P(β)                   | ⊬ α
α\β | (P(α)\N(β)) ∧ N(β)  | as above                    | ⊢ α
β/α | N(β)/P(α)           | P(β)/N(α)                   | ⊬ α
β/α | (N(β)/P(α)) ∧ N(β)  | as above                    | ⊢ α

Table 4 Translations N and P
This also works for logics with 0, ⊤, ⊥. One sets P(γ) = N(γ) = γ, for γ ∈ {0, ⊤, ⊥}, and adds (P6) ⊬ 0, ⊬ ⊥, ⊢ ⊤. For associative logics this reduction can be extended to sequents containing 1 by applying the translation of such sequents into 1-free sequents, given in [45].

(P4) is precisely the disjunction property (DP), which holds for intuitionistic logic and all logics FLS, FNLS. (Actually, DP is usually formulated as the left-to-right implication of (P4), since the converse implication holds for all logics.) Horčík and Terui [37] prove that every consistent substructural logic, possessing DP, is PSPACE-hard. The proof reduces the validity problem for quantified boolean formulas to the provability problem in the pure logic. In [17] this theorem was strengthened: every substructural logic between FL− and a consistent substructural logic, possessing DP, is PSPACE-hard. Recall that a logic, presented as a sequent system, is consistent, if not every sequent is provable. DP holds for linear logics, considered in Section 5. The logics FLS without (c), intuitionistic logic and linear logics (without exponentials) extending FL are in PSPACE, hence they are PSPACE-complete.

The logics discussed above can further be extended in various ways. The distributive laws for ∨, ∧ are not derivable in FL. Adding them as new axioms yields DFL (Distributive FL). It suffices to add: (D) α ∧ (β ∨ γ) ⇒ (α ∧ β) ∨ (α ∧ γ). Then one derives the remaining laws, as in the calculus of lattices. The resulting system does not allow cut elimination. A sequent system for DFL, allowing cut elimination, was studied in [43]. This system can be presented as FNL with an additional bunch construction: if Γ and Δ are bunches, then (Γ; Δ) is a bunch. In algebras semicolon is interpreted as ∧. One admits all rules of FNL, the associativity rule for commas (interpreted as product) and the associativity rule plus (e), (i), (c) for semicolons. Kozak [43] proves the finite model property of a cut-free system of DFL, which implies its completeness (with respect to residuated monoids based on distributive lattices), decidability and the cut-elimination theorem.

Another interesting extension is action logic, introduced by Pratt [67] as an algebraic calculus.
Action algebras are algebras (A, ·, \, /, ∨, *, 1, ⊥) such that (A, ∨, ⊥) is a ∨-semilattice with ⊥, (A, ·, \, /, 1, ≤) is a residuated monoid (≤ is the semilattice ordering) and * is Kleene iteration (a* is the least element b such that a ≤ b, 1 ≤ b and b · b ≤ b, i.e. a* is the reflexive and transitive closure of a). In language models P(Σ*) one defines L* = ⋃{Lⁿ : n ∈ N}, and similarly for relational algebras P(W²). The resulting action algebras are *-continuous: a* = ⋁{aⁿ : n ∈ N}. Action algebras can be regarded as Kleene algebras, admitting residuals for product (we skip the definition of a Kleene algebra). In contrast to Kleene algebras, action algebras form a finitely based variety (this means: the class of algebras can be axiomatized by finitely many equations). Action logic can be presented as FL with ⊥ but without ∧, augmented with Kleene iteration. One admits three new axioms and one new rule:

⇒ α*
α*, α* ⇒ α*
α ⇒ α*
from α ⇒ β, ⇒ β and β, β ⇒ β infer α* ⇒ β

(cut elimination fails). [15] shows that this logic is not complete with respect to *-continuous action algebras; the equational theory of the latter class is Π⁰₁-complete. The same holds for action logic with ∧, corresponding to action lattices. Kuznetsov [46] proves the undecidability of the latter. It was argued in [35, 36] that in relational algebras S/R and R\S could be interpreted as the weakest pre-condition and the strongest post-condition of programs. Unfortunately, the undecidability results seem to block direct applications of action logic for automated reasoning about programs.

One can also add unary modalities ◇, □↓ whose interpretation in algebras satisfies (1-RES): ◇(a) ≤ b iff a ≤ □↓(b). In general, □↓ is not a De Morgan dual of ◇, hence we write the superscript ↓. A cut-free sequent system for FNL with ◇, □↓ (FNLm) admits a new bunch construction: if Γ is a bunch, then ⟨Γ⟩ is a bunch. In algebras, μ(⟨Γ⟩) = ◇(μ(Γ)). The inference rules for ◇, □↓ are as follows.

(◇ ⇒)  from Γ[⟨α⟩] ⇒ β infer Γ[◇α] ⇒ β
(⇒ ◇)  from Γ ⇒ α infer ⟨Γ⟩ ⇒ ◇α
(□↓ ⇒) from Γ[α] ⇒ β infer Γ[⟨□↓α⟩] ⇒ β
(⇒ □↓) from ⟨Γ⟩ ⇒ β infer Γ ⇒ □↓β
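The residuation law (1-RES) for the unary modalities can be illustrated on a small powerset model. The sketch below rests on an assumption not made in the text: a frame (W, R) with a binary relation R, where the diamond collects R-successors of a set and the downward box collects the points all of whose R-successors lie in a set. The frame itself is chosen arbitrarily. The code checks (1-RES) for all pairs of subsets and exhibits a set on which the downward box differs from the De Morgan dual of the diamond.

```python
from itertools import combinations

# Assumed toy frame (hypothetical): a chain 0 -> 1 -> 2 -> 3.
W = frozenset(range(4))
R = {(0, 1), (1, 2), (2, 3)}


def dia(X):
    """Diamond: all y reachable by one R-step from some x in X."""
    return frozenset(y for (x, y) in R if x in X)


def box_down(Y):
    """Downward box: all x whose R-successors all lie in Y."""
    return frozenset(x for x in W if all(y in Y for (u, y) in R if u == x))


def dual_of_dia(X):
    """De Morgan dual of the diamond: complement of dia(complement of X)."""
    return W - dia(W - X)


subsets = [frozenset(c) for r in range(len(W) + 1) for c in combinations(sorted(W), r)]

# (1-RES): dia(X) <= Y iff X <= box_down(Y), with <= read as inclusion.
residuation_ok = all((dia(X) <= Y) == (X <= box_down(Y))
                     for X in subsets for Y in subsets)

# Sets on which the downward box is not the De Morgan dual of the diamond.
witnesses = [X for X in subsets if box_down(X) != dual_of_dia(X)]
```

In this model the dual of the diamond looks backwards along R while the downward box looks forwards, which is why the two operations generally differ.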
In a similar way one obtains FLm, FNLm−, FLm−. These logics behave analogously to their versions without ◇, □↓. The reader is referred to [69] for a general discussion and [57, 58, 61] for applications in categorial grammars.

DFL with ⊥, ⊤ and classical negation ¬, satisfying the new axioms:

(¬1) α ∧ ¬α ⇒ ⊥
(¬2) ⊤ ⇒ α ∨ ¬α

can be called boolean FL and denoted BFL. In a similar way one obtains BFNL, BFNL−, BFL− and other logics of this kind. Presented in this form they do not allow cut elimination. No cut-free sequent systems for these logics are known. BFNL− (resp. BFL−) is strongly complete with respect to boolean residuated groupoids (resp. semigroups), i.e. residuated groupoids (resp. semigroups) based on a boolean algebra; for BFNL and BFL the algebras must be unital. It follows from some results of [44] that BFL− and BFL are undecidable. BFNL− and BFNL are decidable, and the same holds for their consequence relations [16]. BFNL− is PSPACE-complete [56].

These logics can be treated as extensions of L, NL or L1, NL1 by the connectives of classical logic. Since product distributes over finite joins, it behaves like a normal binary possibility operator of multi-modal logics. Therefore these logics can also be presented in the form of Frege-Hilbert style systems, typical for modal logics. The provable formulas α are ⊤-theorems, i.e. μ(α) = ⊤ in algebras. Systems of this kind for BFNL− and BFL− are given in [41].⁸ The connectives are Lambek connectives and classical connectives ¬, ∧, ∨, ⇒, ⇔ (now ⇒ and ⇔ stand for the classical conditional and bi-conditional). So Lambek’s arrows α ⇒ β are treated as conditional formulas. In BFNL−, the axioms are all tautologies of classical propositional logic (in the extended language), and the rules are the standard modus ponens: from α ⇒ β and α infer β, and (r1), (r2) from Section 1. In BFL−, one also admits (as.1), (as.2) as axioms. Interestingly, the intuitionistic variant of BFL− admits a cut-free sequent system, closely related to that for DFL, and is decidable [42]. BFNL− can also be regarded as an analogue of the minimal temporal logic K_t [7]. Product is a binary analogue of the unary modal operator F and \, / correspond to H. Actually, BFNL− can be faithfully interpreted in K_t [56]. BFL is closely related to the arrow logic of van Benthem [75].

⁸ In [41] these logics are denoted by PNL and PL, respectively.

Several authors considered the finite model property (FMP) for substructural logics: the provable sequents (formulas) are precisely those valid in finite algebraic models of the logic, and the strong finite model property (SFMP): the sequents (formulas) provable from a finite set of sequents (formulas) Φ are precisely those which follow from Φ in finite models. Blok and van Alten [8, 9] proved SFMP for different associative and nonassociative logics, admitting (i), by means of some methods of universal algebra. SFMP holds for DFNL, DFNL−, also augmented with structural rules; see [16] for a discussion. The provability from assumptions in FNL and FNL− is undecidable [22], hence these logics do not possess SFMP, and similarly for associative logics without (i). Nonetheless FMP holds for FL, FL−, also with (e) [62]. The latter paper constructs a finite counter-model for an unprovable sequent (formula) from an intuitionistic phase space whose elements are finite sequences of formulas. A thorough discussion of such phase spaces and their applications, e.g. for a model-theoretic proof of cut elimination, can be found in [30].
Earlier a similar method was applied in [47] for linear logic MALL and its extensions; this proof can be adapted for noncommutative and nonassociative linear logics (without exponentials). In fact, FMP for linear logics entails FMP for their intuitionistic fragments without 0 and negation(s), since the former are conservative extensions of the latter [2, 18].
5 Linear logics

One also considers other logics, which drop the distributive laws for ∨, ∧ and replace boolean negation by De Morgan negation; in algebras: ¬ is order-reversing and satisfies ¬¬a = a (the double negation law). The most popular logics of that kind are linear logics. Multiplicative-Additive Linear Logic MALL of Girard [32] can be presented as FLe with 0, ⊥, ⊤ and the double negation axiom α∼∼ ⇒ α (one defines α∼ = α → 0, hence the converse sequent is provable in FLe; see (L4)). Noncommutative MALL of Abrusci [1] admits two negations α∼ = α\0, α− = 0/α and amounts to FL with 0, ⊥, ⊤ and the axioms α−∼ ⇒ α and α∼− ⇒ α. Cyclic MALL of Yetter [77] additionally assumes α∼ ⇔ α− (the cyclic law). Consequently, this logic admits one negation ∼, satisfying the double negation law.

Linear logics, presented as axiomatic extensions of FL or FLe, do not allow cut elimination. For instance, in Noncommutative MALL (q · p∼)− ⇒ p/q is provable. It can be rewritten as 0/(q · (p\0)) ⇒ p/q. From the axiom 0/(p\0) ⇒ p one derives (0/(p\0))/q, q ⇒ p by (id), (/ ⇒), which yields (0/(p\0))/q ⇒ p/q by (⇒ /). Also 0/(q · (p\0)) ⇒ (0/(p\0))/q is provable, by (⇒ ·), (/ ⇒) and (⇒ /) (twice). So the required sequent can be derived by (cut). There exists no cut-free proof of this sequent in FL with double negation axioms. Indeed, it is not an axiom. It could be the conclusion of (/ ⇒) or (⇒ /) only. For the first case, one premise would be ⇒ q · (p\0), which could arise by (⇒ ·) only. This is impossible, since ⇒ q is not provable. For the second case, the premise would be 0/(q · (p\0)), q ⇒ p, which could arise by (/ ⇒) only. Then one premise would be 0 ⇒ p. The latter sequent is unprovable. Similar examples work for MALL and Cyclic MALL.

An FL-algebra is said to be involutive, if a−∼ = a∼− = a for any element a. An involutive FL-algebra is said to be cyclic, if a∼ = a− for all a. In the same way one defines (cyclic) involutive FNL-algebras. MALL is strongly complete with respect to involutive commutative FL-algebras, Noncommutative MALL with respect to involutive FL-algebras, and Cyclic MALL with respect to cyclic involutive FL-algebras.

In [30] and other works on substructural logics, Noncommutative MALL is denoted by InFL (read: Involutive FL), its cyclic version by CyInFL, and MALL by InFLe. The second notation seems more convenient, if one considers weaker logics, and we adopt it for them. For example, the nonassociative version of InFL is denoted by InFNL, its subsystem, allowing neither empty antecedents of sequents, nor multiplicative constants, by InFNL−, and its multiplicative fragment by InNL−. Galatos et al. [30] also consider stronger logics. InDFLec is closely related to the relevance logic R. On the other hand, the relevance logic E is not algebraizable and cannot be presented as an extension of FL, but a conservative extension of E can be presented in this way (without (e)). Roughly, relevance logics can be characterized as extensions of linear logics with the distributive laws for ∧, ∨ and the contraction rule.

InNL− is complete with respect to involutive residuated groupoids, i.e. residuated groupoids with unary operations ∼, − which are order-reversing and satisfy the double negation laws (as for FL-algebras) and the contraposition law: a∼/b = a\b−. The latter is equivalent to a\b = a∼/b∼ and to a/b = a−\b−. The contraposition laws express some interplay of negations and Lambek connectives. In FL-algebras, the first one amounts to (a\0)/b = a\(0/b), an instance of (a\c)/b = a\(c/b), which is valid in residuated semigroups. So every FL-algebra is an involutive residuated groupoid (even monoid) as a reduct. The following equation is valid in involutive residuated groupoids.

(a− · b−)∼ = (a∼ · b∼)− (4)
Wojciech Buszkowski
One defines co-product (i.e. par in [32]): a ⊕ b = (b− · a−)∼, which yields the De Morgan laws (a · b)∼ = b∼ ⊕ a∼ and (a ⊕ b)∼ = b∼ · a∼, and similarly for −. The operations \, / can be defined in terms of ⊕, ∼, − as follows.
\[
a \backslash b = a^{\sim} \oplus b, \qquad a / b = a \oplus b^{-} \tag{5}
\]
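As a small sanity check, the following worked step shows how ⊕ relates to the residuals and which product inequalities result. It is a sketch that uses only (5), the double negation laws a∼− = a−∼ = a, and the residuation laws x · (x\y) ≤ y and (y/x) · x ≤ y; nothing else is assumed.

```latex
\begin{align*}
a^{-} \backslash b &= (a^{-})^{\sim} \oplus b = a \oplus b
  && \text{by (5) with } a^{-} \text{ in place of } a, \\
a / b^{\sim} &= a \oplus (b^{\sim})^{-} = a \oplus b
  && \text{by (5) with } b^{\sim} \text{ in place of } b, \\
a^{-} \cdot (a \oplus b) &= a^{-} \cdot (a^{-} \backslash b) \le b,
  && (a \oplus b) \cdot b^{\sim} = (a / b^{\sim}) \cdot b^{\sim} \le a .
\end{align*}
```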
So all multiplicative operations are definable in terms of ·, ∼, − (product can be replaced with co-product). We also obtain: a ⊕ b = a−\b = a/b∼.
Girard [32] provides a two-sided cut-free sequent system for MALL with sequents of the form Γ ⇒ Δ, where Γ and Δ are (possibly empty) finite sequences of formulas. In Γ each comma is interpreted as (multiplicative) product, while in Δ as (multiplicative) co-product. For noncommutative logics, the construction of two-sided cut-free systems, having the subformula property, meets some problems; see [1] and a partial solution in [38].⁹ These problems disappear for one-sided sequent systems, following [32]. The latter can be designed in two forms: right-sided with sequents ⇒ Δ, like in [32, 1], and left-sided with sequents Γ ⇒, like in Lambek [51] (Lambek’s name for this logic: (Classical) Bilinear Logic). In algebras, Γ ⇒ (resp. ⇒ Δ) is true for μ, if μ(Γ) ≤ 0 (resp. 1 ≤ μ(Δ)). The axioms and rules of Noncommutative MALL in both forms are shown in Table 6 (we omit ⇒). The systems are restricted to formulas in negation normal form: (iterated) negations occur at variables only and pairs −∼ and ∼− are eliminated. The other connectives are ·, ⊕, ∧, ∨ and the constants are 0, 1, ⊥, ⊤. Negations of arbitrary formulas are defined in metalanguage as in Table 5. For n ≥ 0, p(n) denotes p∼···∼, where ∼ appears n times, and p(−n) denotes p−···−, where − appears n times.
\[
\begin{array}{ll}
(p^{(n)})^{\sim} = p^{(n+1)} & (p^{(n)})^{-} = p^{(n-1)} \\
0^{\sim} = 0^{-} = 1 & 1^{\sim} = 1^{-} = 0 \\
\bot^{\sim} = \bot^{-} = \top & \top^{\sim} = \top^{-} = \bot \\
(\alpha \cdot \beta)^{\sim} = \beta^{\sim} \oplus \alpha^{\sim} & (\alpha \cdot \beta)^{-} = \beta^{-} \oplus \alpha^{-} \\
(\alpha \oplus \beta)^{\sim} = \beta^{\sim} \cdot \alpha^{\sim} & (\alpha \oplus \beta)^{-} = \beta^{-} \cdot \alpha^{-} \\
(\alpha \vee \beta)^{\sim} = \alpha^{\sim} \wedge \beta^{\sim} & (\alpha \vee \beta)^{-} = \alpha^{-} \wedge \beta^{-} \\
(\alpha \wedge \beta)^{\sim} = \alpha^{\sim} \vee \beta^{\sim} & (\alpha \wedge \beta)^{-} = \alpha^{-} \vee \beta^{-}
\end{array}
\]
Table 5 Metalanguage negations
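The equations of Table 5 can be read as a recursive definition on formulas in negation normal form. The following Python sketch is only an illustration: the tuple encoding of formulas and the names `negate`, `tilde` and `minus` are assumptions made here, not notation from the text. It computes both metalanguage negations and checks on one example that they cancel each other.

```python
# Formulas in negation normal form as nested tuples (an encoding assumed for this sketch):
#   ("atom", p, n)     the atom p^(n), with n any integer
#   ("const", c)       c in {"0", "1", "bot", "top"}
#   (op, left, right)  op in {"prod", "par", "or", "and"}

DUAL_CONST = {"0": "1", "1": "0", "bot": "top", "top": "bot"}
DUAL_OP = {"prod": "par", "par": "prod", "or": "and", "and": "or"}


def negate(formula, step):
    """Metalanguage negation following Table 5: step=+1 gives ~, step=-1 gives -."""
    tag = formula[0]
    if tag == "atom":
        _, p, n = formula
        return ("atom", p, n + step)
    if tag == "const":
        return ("const", DUAL_CONST[formula[1]])
    _, left, right = formula
    if tag in ("prod", "par"):
        # Multiplicative connectives are dualized and their arguments swapped.
        return (DUAL_OP[tag], negate(right, step), negate(left, step))
    # Lattice connectives are dualized with the argument order kept.
    return (DUAL_OP[tag], negate(left, step), negate(right, step))


def tilde(formula):
    return negate(formula, +1)


def minus(formula):
    return negate(formula, -1)


# Example: p . (q^(2) v 0)
example = ("prod", ("atom", "p", 0), ("or", ("atom", "q", 2), ("const", "0")))
print(tilde(example))
print(minus(tilde(example)) == example and tilde(minus(example)) == example)
```

Because both negations differ only in the direction in which the index of an atom moves, one function with a `step` parameter covers both columns of the table.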
9 They are caused by negated compound formulas. In [38] the cut-elimination theorem is proved for a two-sided system, not admitting such formulas.

Clearly α∼− = α−∼ = α can be proved by induction on α, using the equations from Table 5. For a valuation μ one requires μ(p(n+1)) = μ(p(n))∼, for any variable p and n ∈ Z. Then, μ(α∼) = μ(α)∼ and μ(α−) = μ(α)− hold for any formula α (induction on α). One also shows that the sequent α, α∼ (resp. α∼, α) is provable in the left-sided (resp. right-sided) cut-free system, for any formula α. The following rules:
\[
(\text{r-}{\sim\sim})\ \frac{\alpha, \Gamma}{\Gamma, \alpha^{\sim\sim}}
\qquad
(\text{r-}{-}{-})\ \frac{\Gamma, \alpha}{\alpha^{--}, \Gamma}
\]
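are admissible in the left-sided cut-free system and derivable in this system with the cut rules. This also holds for the right-sided system, if one writes (r-∼∼) like (r-−−) above (i.e. −− is replaced by ∼∼) and (r-−−) like (r-∼∼) above. So in the left-sided system, α−, Γ is provable if and only if Γ, α∼ is provable. The latter sequents represent the intuitionistic sequent Γ ⇒ α.
\[
\begin{array}{l|c|c}
\text{axiom/rule} & \text{left-sided} & \text{right-sided} \\ \hline
\text{axioms for } p^{(n)} & p^{(n)},\, p^{(n+1)} & p^{(n+1)},\, p^{(n)} \\[1ex]
\text{axioms for } 0, 1 & 0 & 1 \\[1ex]
\text{axioms for } \bot, \top & \Gamma, \bot, \Delta & \Gamma, \top, \Delta \\[2ex]
\text{rules for } 0, 1 & \dfrac{\Gamma, \Delta}{\Gamma, 1, \Delta} & \dfrac{\Gamma, \Delta}{\Gamma, 0, \Delta} \\[3ex]
\text{rules for } \cdot & \dfrac{\Gamma, \alpha, \beta, \Delta}{\Gamma, \alpha \cdot \beta, \Delta} & \dfrac{\Gamma_1, \beta, \Gamma_2 \quad \Delta, \alpha}{\Gamma_1, \Delta, \alpha \cdot \beta, \Gamma_2} \quad \dfrac{\Gamma_1, \alpha, \Gamma_2 \quad \beta, \Delta}{\Gamma_1, \alpha \cdot \beta, \Delta, \Gamma_2} \\[3ex]
\text{rules for } \oplus & \dfrac{\Gamma_1, \beta, \Gamma_2 \quad \Delta, \alpha}{\Gamma_1, \Delta, \alpha \oplus \beta, \Gamma_2} \quad \dfrac{\Gamma_1, \alpha, \Gamma_2 \quad \beta, \Delta}{\Gamma_1, \alpha \oplus \beta, \Delta, \Gamma_2} & \dfrac{\Gamma, \alpha, \beta, \Delta}{\Gamma, \alpha \oplus \beta, \Delta} \\[3ex]
\text{rules for } \vee & \dfrac{\Gamma, \alpha, \Delta \quad \Gamma, \beta, \Delta}{\Gamma, \alpha \vee \beta, \Delta} & \dfrac{\Gamma, \alpha, \Delta}{\Gamma, \alpha \vee \beta, \Delta} \quad \dfrac{\Gamma, \beta, \Delta}{\Gamma, \alpha \vee \beta, \Delta} \\[3ex]
\text{rules for } \wedge & \dfrac{\Gamma, \alpha, \Delta}{\Gamma, \alpha \wedge \beta, \Delta} \quad \dfrac{\Gamma, \beta, \Delta}{\Gamma, \alpha \wedge \beta, \Delta} & \dfrac{\Gamma, \alpha, \Delta \quad \Gamma, \beta, \Delta}{\Gamma, \alpha \wedge \beta, \Delta} \\[3ex]
(\text{cut}^{\sim}) & \dfrac{\Gamma_1, \alpha, \Gamma_2 \quad \Delta, \alpha^{\sim}}{\Gamma_1, \Delta, \Gamma_2} & \dfrac{\Gamma_1, \alpha, \Gamma_2 \quad \alpha^{\sim}, \Delta}{\Gamma_1, \Delta, \Gamma_2} \\[3ex]
(\text{cut}^{-}) & \dfrac{\Gamma_1, \alpha, \Gamma_2 \quad \alpha^{-}, \Delta}{\Gamma_1, \Delta, \Gamma_2} & \dfrac{\Gamma_1, \alpha, \Gamma_2 \quad \Delta, \alpha^{-}}{\Gamma_1, \Delta, \Gamma_2}
\end{array}
\]
Table 6 One-sided sequent systems for Noncommutative MALL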
The cut rules (cut∼), (cut−) are admissible in the cut-free systems from Table 6. For the right-sided system, this cut-elimination theorem was stated in [1] with a syntactic proof, merely outlined. A model-theoretic proof, using phase spaces, appears in [18, 19] for some nonassociative versions of these systems; it can be adapted for the systems above. Observe that metalanguage negations appear in the cut rules only; the cut-free proofs do not manipulate negations except that they appear in axioms. For m ≠ n, p(m) and p(n) are treated as different atoms, e.g. p is not a subformula of p∼. Consequently, the cut-free systems possess the subformula property in a strong sense: every proof of a sequent Γ consists of sequents built of subformulas of formulas in Γ. In particular, all atoms p(n) occurring in a proof of Γ must occur in Γ. Clearly this yields the decidability of MALL. Furthermore, Noncommutative MALL is a conservative extension of its fragments in restricted languages, e.g. of that without ⊥, ⊤. One easily shows that both systems are sound: the axioms are valid in any involutive FL-algebra and the rules preserve the validity (even the truth for a fixed μ). For instance, the left axiom p(n), p(n+1) is valid, since a · a∼ ≤ 0 in involutive FL-algebras (even: involutive unital residuated groupoids), and the right axiom p(n+1), p(n) is valid, since 1 ≤ a∼ ⊕ a in these algebras. The
left rules for ⊕ are based on the laws a− · (a ⊕ b) ≤ b and (a ⊕ b) · b∼ ≤ a (which follow from a ⊕ b = a−\b = a/b∼), and their right companion is obvious. The left (resp. right) rule for ∨ (resp. ∧) is sound, since product (resp. co-product) distributes over joins (resp. meets). The completeness of the right-sided system was proved in [1]; this yields the completeness of the left-sided system by the symmetry lemma, stated below. The left-sided system is essentially intuitionistic, restricted to sequents of the form Γ ⇒; see footnote 7. Axioms and rules are plausible in intuitionistic logic, if one interprets both negations as intuitionistic negation, product, ∧ and commas as intuitionistic conjunction, ∨ and ⊕ as intuitionistic disjunction, 0 = ⊥ and 1 = ⊤. Both systems are related by the basic symmetry of involutive algebras: a ≤ b iff b∼ ≤ a∼ iff b− ≤ a−. We extend metalanguage negations for sequents: (α1, α2, . . . , αn)∼ = αn∼, . . . , α2∼, α1∼, and similarly for −. By induction on proofs one proves the symmetry lemma: Γ is provable in the cut-free left-sided (resp. right-sided) system if and only if Γ∼ is provable in the cut-free right-sided (resp. left-sided) system (∼ can be replaced by −). Consequently, the cut-elimination theorem for one of these systems yields this theorem for the other one.
For MALL and Cyclic MALL, one restricts the atoms p(n) to p and p∼, for any variable p. The metalanguage negation ∼ is defined as in Table 5 with the first row: (p)∼ = p∼, (p∼)∼ = p. By induction on α one proves: α∼∼ = α for any formula α. Both one-sided systems for Cyclic MALL are those in Table 6 except that one takes p, p∼ and p∼, p as axioms for p and omits (cut−). The cyclic rule of Yetter [77]:
\[
(\text{cy})\ \frac{\Gamma, \alpha}{\alpha, \Gamma}
\]
is admissible in either cut-free system and derivable in the presence of (cut∼). Clearly the bottom-up version of (cy) is derivable from (cy) and conversely. For MALL one adds the exchange rule in the form: from Γ, α, β, Δ infer Γ, β, α, Δ. Due to this rule, other axioms and rules can be simplified. The axioms for p are p, p∼ only, one omits the left context Γ or Γ1 in axioms and rules, (cut−) is absent, and the left (resp. right) rules for ⊕ (resp. ·) are reduced to one rule. In a similar way we obtain sequent systems for weaker logics. For InFL−, we remove 0 and 1 from the language and take the systems from Table 6 (in the restricted language) with sequents restricted to sequences of at least two formulas. The latter constraint is analogous to Lambek’s constraint that Γ be nonempty for intuitionistic sequents Γ ⇒ α. In the way described above one obtains sequent systems for CyInFL− and InFLe−. In a l.o. involutive residuated groupoid, a sequent Γ, Δ ⇒ is true for μ, if μ(Γ) ≤ μ(Δ)− (equivalently: μ(Δ) ≤ μ(Γ)∼), and ⇒ Γ, Δ is true for μ, if μ(Γ)− ≤ μ(Δ) (equivalently: μ(Δ)∼ ≤ μ(Γ)).
One-sided sequents for nonassociative linear logics look similar except that sequences of formulas are replaced by bunches [33, 18, 19]. We omit outer parentheses of sequents appearing in axioms and rules. The truth conditions for sequents are as above. Metalanguage negations are defined as in Table 5. For InFNL, axioms and rules are as in Table 6, adapted for bunches, plus (r-shift). We specify some cases only. The axioms are as in Table 6 except that those for ⊥ and ⊤ are Γ[⊥] and Γ[⊤]. The left rule for 1 is: from Γ[] (the empty bunch in the marked position) infer Γ[1]. The first of the left rules for ⊕ is: from Γ[β] and Δ, α infer Γ[(Δ, α ⊕ β)]. The left rule for ∨ is: from Γ[α] and Γ[β] infer Γ[α ∨ β]. The rule:
\[
(\text{r-shift})\ \frac{(\Gamma, \Delta), \Theta}{\Gamma, (\Delta, \Theta)}
\]
in the left-sided format expresses the algebraic law:
\[
(a \cdot b) \cdot c \le 0 \ \text{ iff } \ a \cdot (b \cdot c) \le 0, \ \text{ for all } a, b, c \tag{6}
\]
(equivalent to the contraposition law: a∼/c = a\c− for all a, c). In the right-sided format, (r-shift) expresses the law:
\[
1 \le (a \oplus b) \oplus c \ \text{ iff } \ 1 \le a \oplus (b \oplus c), \ \text{ for all } a, b, c \tag{7}
\]
(equivalent to (6) by the basic symmetry). For InFNL−, (6) must be replaced by:
\[
a \cdot b \le c^{-} \ \text{ iff } \ b \cdot c \le a^{\sim}, \ \text{ for all } a, b, c. \tag{8}
\]
At the end we mention Compact Bilinear Logic (CBL) due to Lambek [53], which can be described as Noncommutative MLL (i.e. the multiplicative fragment of Noncommutative MALL) with · = ⊕, hence 1 = 0. The corresponding algebras are pregroups. A pregroup is defined as a p.o. monoid (A, ·, 1, ≤) with two unary operations r, l (written as superscripts), satisfying the adjoint laws:
\[
a \cdot a^{r} \le 1 \le a^{r} \cdot a, \qquad a^{l} \cdot a \le 1 \le a \cdot a^{l}
\]
for any a ∈ A. One defines a\b = a^r · b, a/b = a · b^l. With these operations a pregroup becomes an involutive residuated monoid, where r amounts to ∼ and l to −. Let us note that a^r = a^l, for all elements a, implies that the pregroup is a p.o. group with a^r = a^l = a^{−1}. In particular, every commutative pregroup is a commutative p.o. group. Also, every finite pregroup is a group; ≤ is the identity relation [12].
Free pregroups generated by finite posets are offered in [53] as type logics for categorial grammars, alternative to L and its variants. By the pregroup laws
\[
(a \cdot b)^{r} = b^{r} \cdot a^{r}, \quad (a \cdot b)^{l} = b^{l} \cdot a^{l}, \quad a^{lr} = a = a^{rl}, \quad 1^{r} = 1 = 1^{l},
\]
in the free pregroup, generated by a poset (P, ≤), the elements can be represented as strings
\[
p_1^{(n_1)} \ldots p_k^{(n_k)}
\]
such that all p_i belong to P and all n_i to Z. Here k ≥ 0; the empty string represents 1. The atom p(n) is understood as above (now r stands for ∼ and l for −). Finite strings of atoms are called (pregroup)
types and denoted by Γ, Δ. The relation ⇒ between types is defined as the smallest reflexive and transitive relation, satisfying the following conditions.
(RED) Γ, p(n), p(n+1), Δ ⇒ Γ, Δ (reduction)
(EXP) Γ, Δ ⇒ Γ, p(n+1), p(n), Δ (expansion)
(POS) Γ, p(n), Δ ⇒ Γ, q(n), Δ for p ≤ q, if n is even, or q ≤ p, if n is odd (poset)
Formulas of L can be translated into pregroup types. For instance, p/(q\s) into p(q^r s)^l = p s^l q. Here are pregroup types, corresponding to some Lambek types: [np]^r s corresponds to np\s, [np]^r s [np]^l to (np\s)/np, [np]^r s [np]^ll s^l to (np\s)/(s/np) (a type of ‘that’). So ‘the book that John wrote’ is assigned s, since [np] [np]^r s [np]^ll s^l [np] [np]^r s [np]^l ⇒ s is derivable.
Lambek’s normalization theorem (also referred to as Switching Lemma): If Γ ⇒ Δ holds, then there exists a step-by-step derivation Γ0, . . . , Γn from Γ = Γ0 to Δ = Γn such that all reduction steps are applied before all expansion steps. As a consequence, if Δ is an atom or Δ is empty and Γ ⇒ Δ holds, then no expansion step is needed in a derivation of Δ from Γ. Buszkowski [13] shows that this theorem yields (in fact, is equivalent to) the cut-elimination theorem for the corresponding two-sided sequent system with axioms p(n) ⇒ p(n) for p ∈ P, n ∈ Z, and the following rules.
\[
(\text{r-red})\ \frac{\Gamma_1, \Gamma_2 \Rightarrow \Delta}{\Gamma_1, p^{(n)}, p^{(n+1)}, \Gamma_2 \Rightarrow \Delta}
\qquad
(\text{r-exp})\ \frac{\Gamma \Rightarrow \Delta_1, \Delta_2}{\Gamma \Rightarrow \Delta_1, p^{(n+1)}, p^{(n)}, \Delta_2}
\]
\[
(\text{r-pos.1})\ \frac{\Gamma_1, q^{(n)}, \Gamma_2 \Rightarrow \Delta}{\Gamma_1, p^{(n)}, \Gamma_2 \Rightarrow \Delta}
\qquad
(\text{r-pos.2})\ \frac{\Gamma \Rightarrow \Delta_1, p^{(n)}, \Delta_2}{\Gamma \Rightarrow \Delta_1, q^{(n)}, \Delta_2}
\]
In (r-pos.1), (r-pos.2) the condition is as in (POS). The cut rule can be written as follows.
\[
(\text{r-cut})\ \frac{\Gamma_1 \Rightarrow \Gamma_2 \quad \Gamma_2 \Rightarrow \Gamma_3}{\Gamma_1 \Rightarrow \Gamma_3}
\]
CBL is stronger than L1, e.g. (a · b)/c ≤ a · (b/c) is valid in pregroups but not in residuated monoids. Also it is computationally simpler: the parsing procedure is PTIME [12]. Nonetheless some properties of CBL make it hardly comparable with other logical calculi. A bounded pregroup must be a trivial (i.e. one-element) algebra; CBL with ⊥ (equivalently: ⊤) is inconsistent. Indeed, in bounded residuated groupoids ⊥ · ⊤ = ⊤ · ⊥ = ⊥. In bounded pregroups ⊤^l = ⊤^r = ⊥, which yields:
\[
\top = \top \cdot 1 \le \top \cdot \top \cdot \top^{l} = \top \cdot \top \cdot \bot = \bot
\]
Hence ⊤ = ⊥, which means that the algebra is trivial. Since some purely implicational laws are not valid even in classical logic, CBL does not possess any natural type-theoretic semantics. Therefore this framework seems to be algebraic rather than genuinely logical. A category-theoretic semantics was proposed in [68] and its application in distributional theory of meaning in [25].
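The derivation for ‘the book that John wrote’ given above can be checked mechanically. By the normalization theorem, a derivation of a single atom needs no expansion steps, so it suffices to search over (RED) steps. The Python sketch below is a naive illustration under stated assumptions: the generating poset is taken to be discrete (so (POS) never applies), atoms p(n) are encoded as pairs `(p, n)`, and the names `contractions` and `reduces_to` are introduced here only for the example. It is a plain breadth-first search, not the polynomial-time parsing procedure of [12].

```python
from collections import deque

# An atom p^(n) is modelled as a pair (p, n); a type is a tuple of atoms.
# Assumption of this sketch: the generating poset is discrete, so (POS) is never used.


def contractions(t):
    """Yield every type obtained from t by a single (RED) step."""
    for i in range(len(t) - 1):
        (p, m), (q, n) = t[i], t[i + 1]
        if p == q and n == m + 1:
            yield t[:i] + t[i + 2:]


def reduces_to(source, target):
    """Breadth-first search over (RED) steps only; True if target is reachable."""
    source, target = tuple(source), tuple(target)
    seen = {source}
    queue = deque([source])
    while queue:
        t = queue.popleft()
        if t == target:
            return True
        for u in contractions(t):
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return False


NP, S = "np", "s"
the_book = [(NP, 0)]                           # np
that = [(NP, 1), (S, 0), (NP, -2), (S, -1)]    # np^r s np^ll s^l
john = [(NP, 0)]                               # np
wrote = [(NP, 1), (S, 0), (NP, -1)]            # np^r s np^l

phrase = the_book + that + john + wrote
print(reduces_to(phrase, [(S, 0)]))
```

A search is used instead of a greedy left-to-right cancellation because contractions can overlap: in p(0), p(1), p(2) either adjacent pair may be cancelled, leaving different results.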
Lambek’s book [54] describes a large fragment of English in this framework; also see the collection volume [21] for other contributions. Concrete pregroups can be built as sets of unary residuated operations (maps) on a poset (P, ≤), containing the identity map and being closed under composition (product) and unary residuals r , l . Every pregroup (A, ·, 1,r ,l , ≤) can be embedded into such a concrete pregroup of maps on (A, ≤); the embedding is defined by: h(a)(x) = a·x. Lambek [52, 53] shows that the family of all order preserving, downward and upward unbounded maps on Z (with the natural ordering) is a pregroup. This concrete pregroup is the only possible pregroup (up to isomorphism) that consists of all order preserving, downward and upward unbounded maps on a totally ordered set [12]. Other pregroups can be obtained from this one by subalgebras and products. Lambek [52] provides a construction of an involutive FL-algebra as the family of downsets of a pregroup.
6 Final comments
This short survey of logical systems extending Lambek calculi is almost finished. To keep the list of references to a reasonable size, many interesting issues and valuable contributions have been omitted. Semantic types, briefly mentioned in Section 2, play a crucial role in logical semantics of natural language, tightly connected with categorial grammars. Pregroup grammars do not follow this paradigm. Lambek [54] described this framework as Capulet semantics in opposition to the standard Montague semantics. Other essential omissions are e.g. Abstract Categorial Grammars of P. de Groote (a type-theoretic analysis of syntax and semantics, employing linear typed lambda-terms as both syntactic and semantic structures), Combinatory Categorial Grammars of M. Steedman (stimulated by ideas of H. B. Curry, but closely related to Lambek grammars), Categorial Unification Grammars of H. Uszkoreit, and others. Unification was also employed in learning procedures for categorial grammars, proposed by M. Kanazawa and myself in the 1990s and continued by other authors, e.g. D. Béchet, A. Foret, J. Marciniec.
Besides the special issues of Studia Logica, mentioned in Section 1, a Festschrift for Joachim Lambek appeared as a special issue of Linguistic Analysis, 36.1-4, (2010). From among several valuable contributions, let me note [72], examining “the relationship between Chomsky’s Minimalist Program and Lambek’s Categorial Grammar.” Beyond the area of language, substructural logics and linear logics develop rapidly. The former are usually studied in connection with algebraic logic, the latter with computer science; see e.g. the collection [27] and the references there. The present survey focuses on these research lines, precisely on the place of Lambek calculi in them.
References
1. Abrusci, V.M. Phase semantics and sequent calculus for pure noncommutative classical linear logic. Journal of Symbolic Logic 56, (1991), 1403–1451.
2. Abrusci, V.M. Classical conservative extensions of Lambek calculus. Studia Logica 71, (2002), 277–314.
3. Ajdukiewicz, K. Die syntaktische Konnexität. Studia Philosophica 1, (1935), 1–27.
4. Bar-Hillel, Y. A quasi-arithmetical notation for syntactic description. Language 29, (1953), 47–58.
5. Bar-Hillel, Y., Gaifman, C., and Shamir, E. On categorial and phrase structure grammars. Bull. Res. Council Israel F9, (1960), 155–166.
6. Birkhoff, G. Lattice Theory. American Mathematical Society, Providence, R.I., (1948).
7. Blackburn, P., de Rijke, M. and Venema, Y. Modal Logic. Cambridge University Press, (2001).
8. Blok, W.J. and van Alten, C.J. The finite embeddability property for residuated lattices, pocrims and BCK-algebras. Algebra Universalis 48, (2002), 253–271.
9. Blok, W.J. and van Alten, C.J. On the finite embeddability property for residuated ordered groupoids. Transactions of AMS 357, (2005), 4141–4157.
10. Blyth, T.S. Lattices and Ordered Algebraic Structures. Universitext, Springer, (2005).
11. Buszkowski, W. Some decision problems in the theory of syntactic categories. Zeitschrift für mathematische Logik und Grundlagen der Mathematik 28, (1982), 539–548.
12. Buszkowski, W. Lambek grammars based on pregroups. In: de Groote, P., Morrill, G. and Retoré, C. (eds.), Logical Aspects of Computational Linguistics (LACL 2001), Lecture Notes in Computer Science 2099, Springer, (2001), 95–109.
13. Buszkowski, W. Sequent systems for compact bilinear logic. Mathematical Logic Quarterly 49, (2003), 467–474.
14. Buszkowski, W. Lambek calculus with nonlogical axioms. In: [20], 77–93.
15. Buszkowski, W. On action logic: Equational theories of action algebras. Journal of Logic and Computation 17, (2007), 199–217.
16. Buszkowski, W. Interpolation and FEP for logics of residuated algebras. Logic Journal of The IGPL 19, (2011), 437–454.
17. Buszkowski, W. Some syntactic interpretations in different systems of Full Lambek Calculus. In: Ju, S., Liu, H. and Ono, H. (eds.), Modality, Semantics and Interpretations, Logic in Asia: Studia Logica Library, Springer, (2015), 23–48.
18. Buszkowski, W. On Classical Nonassociative Lambek Calculus. In: Amblard, M., de Groote, P., Pogodalla, S. and Retoré, C. (eds.), Logical Aspects of Computational Linguistics (LACL 2016), Lecture Notes in Computer Science 10054, Springer, (2016), 68–84.
19. Buszkowski, W. On Involutive Nonassociative Lambek Calculus. Journal of Logic, Language and Information 28, (2019), 157–181.
20. Casadio, C., Scott, P.J., and Seely, R.A.G. (eds.): Language and Grammar. Studies in Mathematical Linguistics and Natural Language. CSLI Publications, (2005).
21. Casadio, C. and Lambek, J. (eds.): Computational Algebraic Approaches to Natural Language. Polimetrica, (2008).
22. Chvalovsky, K. Undecidability of consequence relation in full nonassociative Lambek calculus. Journal of Symbolic Logic 80, (2015), 567–576.
23. Chvalovsky, K. and Horčik, R. Full Lambek Calculus with contraction is undecidable. Journal of Symbolic Logic 81, (2016), 524–540.
24. Ciabattoni, A., Galatos, N., and Terui, K. Algebraic proof theory for substructural logics: cut-elimination and completions. Annals of Pure and Applied Logic 163, (2012), 266–290.
25. Coecke, B., Sadrzadeh, M. and Clark, S. Mathematical Foundations for a Compositional Distributional Model of Meaning. Linguistic Analysis 36.1-4, (2010), 345–384.
26. Došen, K. A brief survey of frames for the Lambek calculus. Zeitschrift für mathematische Logik und Grundlagen der Mathematik 38, (1992), 179–187.
27. Ehrhard, T., Girard, J-Y., Ruet, P. and Scott, P. (eds.): Linear Logic in Computer Science. Cambridge University Press, (2004).
28. Fuchs, L. Partially Ordered Algebraic Systems. Pergamon Press, Oxford, (1963).
29. Galatos, N. and Jipsen, P. Relation algebras as expanded FL-algebras. Algebra Universalis 69, (2013), 1–21.
30. Galatos, N., Jipsen, P., Kowalski, T. and Ono, H. Residuated Lattices: An Algebraic Glimpse at Substructural Logics. Studies in Logic and The Foundations of Mathematics 151, Elsevier, (2007).
31. Galatos, N. and Ono, H. Cut elimination and strong separation for substructural logics: An algebraic approach. Annals of Pure and Applied Logic 161, (2010), 1097–1133.
32. Girard, J-Y. Linear logic. Theoretical Computer Science 50, (1987), 1–102.
33. de Groote, P. and Lamarche, F. Classical non-associative Lambek calculus. Studia Logica 71, (2002), 355–388.
34. Hájek, P. Metamathematics of Fuzzy Logic. Trends in Logic, Kluwer, (1998).
35. Hoare, C. A. and Jifeng, H. The weakest prespecification I. Fundamenta Informaticae 9, (1986), 51–84.
36. Hoare, C. A. and Jifeng, H. The weakest prespecification II. Fundamenta Informaticae 9, (1986), 217–252.
37. Horčik, R. and Terui, K. Disjunction property and complexity of substructural logics. Theoretical Computer Science 412, (2011), 3992–4006.
38. Hudelmaier, J. and Schroeder-Heister, P. Classical Lambek Logic. In: Baumgartner, P., Hähnle, R., and Posegga, J. (eds.), Theorem Proving with Analytic Tableaux and Related Methods, Springer, (1995), 245–262.
39. Jäger, G. Residuation, structural rules and context-freeness. Journal of Logic, Language and Information 13, (2004), 47–59.
40. Jónsson, B. and Tsinakis, C. Relation algebras as residuated Boolean algebras. Algebra Universalis 30, (1993), 469–478.
41. Kaminski, M. and Francez, N. Relational semantics of the Lambek calculus extended with classical propositional logic. Studia Logica 102, (2004), 479–497.
42. Kaminski, M. and Francez, N. The Lambek calculus extended with intuitionistic propositional logic. Studia Logica 104, (2016), 1051–1082.
43. Kozak, M. Distributive Full Lambek Calculus has the finite model property. Studia Logica 91, (2009), 201–216.
44. Kurucz, A., Németi, I., Sain, I. and Simon, A. Decidable and undecidable logics with a binary modality. Journal of Logic, Language and Information 4, (1995), 191–206.
45. Kuznetsov, S. Lambek grammars with the unit. In: de Groote, P. and Nederhof, M.-J. (eds.), Formal Grammar 2010/2011, Lecture Notes in Computer Science 7395, Springer, (2012), 262–266.
46. Kuznetsov, S. The logic of action lattices is undecidable. Thirty-Fourth Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), (2019).
47. Lafont, Y. The finite model property for various fragments of linear logic. Journal of Symbolic Logic 62, (1997), 1202–1208.
48. Lambek, J. The mathematics of sentence structure. The American Mathematical Monthly 65, (1958), 154–170.
49. Lambek, J. On the calculus of syntactic types. In: Jakobson, R. (ed.), Structure of Language and Its Mathematical Aspects. Proc. of Symposium in Applied Mathematics, American Mathematical Society, Providence, R.I., (1961), 166–178.
50. Lambek, J. Deductive systems and categories I. Journal of Mathematical System Theory 2, (1968), 287–318.
51. Lambek, J. Cut elimination for classical bilinear logic. Fundamenta Informaticae 22, (1995), 53–67.
52. Lambek, J. Some lattice models of bilinear logic. Algebra Universalis 34, (1995), 541–550.
53. Lambek, J. Type grammars revisited. In: Lecomte, A., Lamarche, F., and Perrier, G. (eds.), Logical Aspects of Computational Linguistics (LACL 1997), Lecture Notes in Computer Science 1582, Springer, (1999), 1–27.
54. Lambek, J. From Word to Sentence - a computational algebraic approach to grammar. Polimetrica, (2008).
55. Lambek, J. and Scott, P. J. Introduction to higher order categorical logic. Cambridge University Press, (1986).
56. Lin, Z. and Ma, M. On the complexity of the equational theory of residuated boolean algebras. In: J. Vaananen et al. (eds.), WoLLIC 2016, Lecture Notes in Computer Science 9803, Springer, (2016), 265–278.
57. Moortgat, M. Multimodal linguistic inference. Journal of Logic, Language and Information 5, (1996), 349–385.
58. Moortgat, M. Categorial type logic. In: [76], 93–177.
59. Moortgat, M. and Oehrle, R. T. Pregroups and type-logical grammar: Searching for convergence. In: [20], 141–160.
60. Moot, R. and Retoré, C. The Logic of Categorial Grammars. Lecture Notes in Computer Science 6850, Springer, (2012).
61. Morrill, G. Type-Logical Grammar: Categorial Logic of Signs. Kluwer, (1994).
62. Okada, M. and Terui, K. The finite model property for various fragments of intuitionistic linear logic. Journal of Symbolic Logic 64, (1999), 790–802.
63. Ono, H. and Komori, Y. Logics without the contraction rule. Journal of Symbolic Logic 50, (1985), 169–201.
64. Pentus, M. Lambek grammars are context-free. In: Proc. of the 8th IEEE Symposium on Logic in Computer Science, (1993), 429–433.
65. Pentus, M. Models for the Lambek calculus. Annals of Pure and Applied Logic 75, (1995), 179–213.
66. Pentus, M. Lambek calculus is NP-complete. Theoretical Computer Science 357, (2006), 186–201.
67. Pratt, V. Action logic and pure induction. In: van Eijck, J. (ed.), Logics in AI, Lecture Notes in Computer Science 478, Springer, (1991), 97–120.
68. Preller, A. and Lambek, J. Free compact 2-categories. Mathematical Structures in Computer Science 17, (2007), 309–340.
69. Restall, G. An Introduction to Substructural Logics. Routledge, (2000).
70. Roorda, D. Resource Logics: Proof Theoretical Investigations. PhD Thesis, University of Amsterdam, (1991).
71. Schroeder-Heister, P. and Došen, K. (eds.): Substructural Logics. Oxford Science Publications, Clarendon Press, (1993).
72. Solias, T. Chomsky Meets Lambek. Linguistic Analysis 36.1-4, (2010), 193–223.
73. Spinks, M. and Veroff, R. Constructive logic with strong negation is a substructural logic I. Studia Logica 88, (2008), 325–348.
74. Spinks, M. and Veroff, R. Constructive logic with strong negation is a substructural logic II. Studia Logica 89, (2008), 401–425.
75. van Benthem, J. Language in Action: Categories, Lambdas and Dynamic Logic. Studies in Logic and The Foundations of Mathematics 130, North-Holland, (1991).
76. van Benthem, J. and ter Meulen, A. (eds.): Handbook of Logic and Language. Elsevier, The MIT Press, (1997).
77. Yetter, D. N. Quantales and (non-commutative) linear logic. Journal of Symbolic Logic 55, (1990), 41–64.
78. Zielonka, W. Weak implicational logics related to the Lambek calculus with the empty string - Gentzen versus Hilbert formalisms. In: Makinson, D., Malinowski, J. and Wansing, H. (eds.), Towards Mathematical Philosophy, Trends in Logic 28, Springer, (2009), 201–212.
Categories with Families: Unityped, Simply Typed, and Dependently Typed
Simon Castellan, Pierre Clairambault, and Peter Dybjer

Abstract We show how the categorical logic of the untyped, simply typed and dependently typed lambda calculus can be structured around the notion of category with families (cwf). To this end we introduce subcategories of simply typed cwfs (scwfs), where types do not depend on variables, and unityped cwfs (ucwfs), where there is only one type. We prove several equivalence and biequivalence theorems between cwf-based notions and basic notions of categorical logic, such as cartesian operads, Lawvere theories, categories with finite products and limits, cartesian closed categories, and locally cartesian closed categories. Some of these theorems depend on the restrictions of contextuality (in the sense of Cartmell) or democracy (in the sense of Clairambault and Dybjer). Some theorems are equivalences between notions with strict preservation of chosen structure. Others are biequivalences involving notions without chosen structure, and where properties are (necessarily) only preserved up to isomorphism. The cwf-based notions play the role of an abstract syntax of formal systems, and we discuss various constructions of initial ucwfs, scwfs, and cwfs with extra structure. As a corollary of our results we show that equality in the free locally cartesian closed category is undecidable.

Simon Castellan, Inria, Univ Rennes, CNRS, IRISA, France
Pierre Clairambault, Univ Lyon, EnsL, UCBL, CNRS, LIP, F-69342, LYON Cedex 07, France
Peter Dybjer, Chalmers University of Technology, SE-412 96 Göteborg, Sweden
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 C. Casadio and P. J. Scott (eds.), Joachim Lambek: The Interplay of Mathematics, Logic, and Linguistics, Outstanding Contributions to Logic 20, https://doi.org/10.1007/978-3-030-66545-6_5
1 Introduction
An important part of categorical logic is to establish correspondences between languages (from logic) and categorical models. For example, in their book Introduction to higher order categorical logic [27] Lambek and Scott prove equivalences between typed lambda calculi and cartesian closed categories, between untyped lambda calculi and C-monoids, and between intuitionistic type theories and toposes. Lambek and Scott’s intuitionistic type theories are intuitionistic versions of Church’s simple theory of types, which should not be confused with Martin-Löf’s intuitionistic type theories. Interestingly, in the preface of their book [27, p. viii] Lambek and Scott express a desire to include a result concerning the latter too:

“We also claim that intuitionistic type theories and toposes are closely related, in as much as there is a pair of adjoint functors between their respective categories. This is worked out in Part II. The relationship between Martin-Löf type theories and locally cartesian closed categories was established too recently (by Robert Seely) to be treated here.”
Seely’s seminal paper [35] claims to prove that a category of Martin-Löf type theories is equivalent to a category of locally cartesian closed categories (lcccs). However, his result relies on an interpretation of substitution as pullback, and pullbacks are only defined up to isomorphism. It is not clear how to choose pullbacks in such a way that the strict laws for substitution are satisfied. This coherence problem is identified and solved by Curien [16] and Hofmann [21], who provide alternative methods for interpreting Martin-Löf type theory in lcccs (see also [15]). By using Hofmann’s interpretation Clairambault and Dybjer [11, 12] show that there is an actual biequivalence of 2-categories.
In this paper we ask ourselves what it would take to add the missing chapter on Martin-Löf type theory and its correspondence with lcccs to the book by Lambek and Scott. First we would need to add some material to Part 0 in the book on “Introduction to category theory”, including introductions to lcccs, 2-categories, bicategories, pseudofunctors, and biequivalences. But more profoundly, our biequivalence theorem differs from Lambek and Scott’s (and Seely’s) equivalence theorems in important respects, since we replace Seely’s category of Martin-Löf theories by a 2-category of categories with families (cwfs), with extra structure for type formers I_ext, Σ, Π, and pseudo cwf-morphisms which preserve the structure up to isomorphism. Thus cwfs with extra structure replace Martin-Löf theories on the “syntactic” side of the biequivalence.
This style of presenting the correspondence between “syntax” and “semantics” for Martin-Löf’s dependent type theory applies equally well to the simply typed lambda calculus and the untyped lambda calculus, provided we consider subcategories of simply typed cwfs (scwfs), where types do not depend on variables, and of unityped cwfs (ucwfs), where there is only one
Categories with Families: Unityped, Simply Typed, and Dependently Typed
type. As for full cwfs, we need to provide extra structure for modelling λ-abstraction and application in the untyped λ-calculus and for modelling type formers in the simply typed λ-calculus. This suggests that we ought to rewrite Lambek and Scott’s Part I, “Cartesian closed categories and λ-calculus”, in a way which harmonizes with our presentation of the biequivalence between Martin-Löf type theory and lcccs.
Categories with families (cwfs) model the most basic rules of dependent type theory, those which deal with substitution, context formation, assumption, and general reasoning about equality. A key feature of cwfs is that the definition can be unfolded to yield a generalized algebraic theory in Cartmell’s sense [8]. As such it suggests a language of cwf-combinators which can be used for the construction of initial cwfs (with extra structure for modelling type formers).
We prove several correspondence theorems between “syntax” in the guise of a number of cwf-based notions and “semantics” in the guise of some basic notions from category theory. Some of our theorems require “contextuality”, a notion introduced by Cartmell [8] for his contextual categories. Others require “democracy”, a notion introduced by Clairambault and Dybjer for their biequivalence theorems. Moreover, our equivalence theorems require strict preservation of chosen cwf-structure, while our biequivalence theorems only require preservation of cwf-structure up to isomorphism. In this way we can relate a number of notions from categorical logic, such as cartesian operads, Lawvere’s algebraic theories, Obtułowicz’ algebraic theories of type λ-βη [34], categories with finite products and limits, cccs, and lcccs, to the corresponding cwf-based notions. In addition, we discuss different constructions of initial ucwfs, scwfs and cwfs (with extra structure), with and without explicit substitutions.
The purpose of our work is not so much to prove new results, but to suggest a new way to organize basic correspondence theorems in categorical logic, where the ucwf-scwf-cwf-sequence provides a smooth progression of the categorical model theory of untyped, simply typed, and dependently typed λ-calculi. We will also highlight some of the subtleties which arise when relating syntactic and semantic notions.
Another important feature is that the correspondences between logical theories and categorical notions are now split into two phases: (i) equivalences and biequivalences between cwf-based notions and basic categorical notions, and (ii) the constructions of initial cwf-based notions. This yields an “abstract syntax” perspective of formal systems, where specific formalisms for untyped, simply typed and dependently typed λ-calculi are instances of the respective isomorphism classes of initial cwf-based notions. This is particularly important for dependent types and Martin-Löf type theory, since different authors make different choices in the exact formulation of the syntax and inference rules. Being initial in the appropriate category of cwfs is a suitable correctness criterion for these formulations.
Simon Castellan, Pierre Clairambault, and Peter Dybjer
In the text we have only discussed the relationship to some of the most important related notions in the literature. We would like to emphasise that we have not at all tried to give a comprehensive account of related work. Such an account would be a daunting task, since we would need to cover a multitude of works on models of the untyped and simply typed lambda calculus and of dependent type theory. Nevertheless, we would like to refer to Jacobs’ book Categorical Logic and Type Theory [25] which provides a comprehensive account of categorical type theory as fibred category theory. In particular, Chapter 2 on Simple Type Theory and Chapter 10 on First Order Dependent Type Theory contain much related material. In Jacobs’ work the notion of fibration takes centre stage while its role is only implicit in our cwf-based approach. In any cwf we can define an indexed category (and hence a fibration) of types indexed by contexts. However, an account of the precise correspondence between the two approaches is outside the scope of this paper.
Plan of the paper
In Section 2 we introduce cwfs and explain their connection to Martin-Löf type theory. We also define contextual and democratic cwfs.
In Section 3 we consider unityped cwfs and show that contextual ucwfs are equivalent both to cartesian operads and to Lawvere theories. We then add extra structure to model the untyped λβη-calculus, and show how initial ucwfs can be built both as calculi of explicit and implicit substitutions.
In Section 4 we consider simply typed cwfs. We first show the equivalence between contextual scwfs with finite product types and categories with finite products as structure. Then we show the biequivalence between democratic scwfs and categories with finite products as property. Moreover, we add function types to our contextual scwfs with finite product types and show their equivalence to cartesian closed categories as structure. This is our analogue of the equivalence between simply typed λ-calculi and cartesian closed categories in Lambek and Scott.
In Section 5 we discuss alternative definitions of full dependently typed cwfs. We also show an explicit construction of a free cwf. Moreover, we present two biequivalences. The first is between categories with finite limits and democratic cwfs with extensional identity types and Σ-types. The other is between lcccs and democratic cwfs with extensional identity types, Σ-types and Π-types. Finally, we outline the construction of a free lccc, and show how we can use our biequivalence theorem to prove that equality in this lccc is undecidable. In this section we only give an overview of the theorems and refer the reader to the journal articles [12, 10] for detailed proofs.
2 Dependent type theory and categories with families
We recall the general structure of judgments and inference rules of Martin-Löf type theory and explain its connection to the definition of cwfs.
2.1 Martin-Löf type theory
We here consider Martin-Löf type theory with extensional identity types in the style of [30, 31].
2.1.1 Judgments
In Martin-Löf [30] intuitionistic type theory is presented as a formal system with four forms of judgment:
Γ ⊢ A type    Γ ⊢ A = A′    Γ ⊢ a : A    Γ ⊢ a = a′ : A
These respectively state that A is a well-formed type; that A and A′ are equal types; that a is a valid term of type A; and that a and a′ are equal terms of type A. These four forms of judgment are hypothetical, that is, relative to a context Γ = x_1 : A_1, ..., x_n : A_n which assigns types A_i to free variables x_i.
Martin-Löf [32] presents an alternative version of the theory in the form of a substitution calculus (see also [37]) with four additional forms of judgment:
Γ context    Γ = Γ′    Δ → γ : Γ    Δ → γ = γ′ : Γ
One aim is to make the rules for context formation explicit. Another is to formulate a calculus where substitution is a first-class citizen (a term constructor) and not just an operation defined by induction on the types and terms of the theory. Given Γ = x_1 : A_1, ..., x_n : A_n, the judgment Δ → γ : Γ expresses that γ is a substitution (an assignment) of terms for free variables x_1 = a_1, ..., x_n = a_n, where a_1 : A_1, ..., a_n : A_n are terms in context Δ. Substitutions can be applied to both terms and types, e.g., if Δ → γ : Γ and Γ ⊢ A type and Γ ⊢ a : A, then Δ ⊢ A[γ] type and Δ ⊢ a[γ] : A[γ]. But rather than defining a[γ] and A[γ] (which simultaneously substitute terms
for free variables) by induction, they are instead explicit term constructors, and the effect of replacing a variable xi by a term ai is expressed a posteriori via judgmental equality.
2.1.2 Inference rules
The inference rules of intuitionistic type theory can be separated into two kinds. The first kind are the general rules, the most basic rules for reasoning with dependent types. They deal with substitution, context formation, assumption, and general equality reasoning. They form the backbone of the dependently typed structure, and carry no information yet about specific term and type formers. The second kind consists of the rules for type formers, such as Π, Σ, and identity types. These rules are divided into formation, introduction, elimination, and equality rules (also called computation rules).
Categories with families capture models of the first kind of rules: the backbone of Martin-Löf type theory which is independent of type and term constructors.
2.2 Categories with families
We first give the definition and then explain the connection to type theory.
2.2.1 Definition
The definition uses the category Fam of families of sets. Its objects are families (U_x)_{x∈X}. A morphism with source (U_x)_{x∈X} and target (V_y)_{y∈Y} is a pair consisting of a reindexing function f : X → Y and a family (g_x)_{x∈X} where, for each x ∈ X, g_x : U_x → V_{f(x)} is a function.
Definition 1 A category with families (cwf) consists of the following:
• A category C with a terminal object 1. Notation and terminology. We use Γ, Δ, etc., to range over objects of C, and refer to those as contexts. Likewise, we use δ, γ, etc., to range over morphisms, and refer to those as substitutions. We refer to 1 as the empty context. We write ⟨⟩_Γ ∈ C(Γ, 1) for the terminal map, representing the empty substitution.
• A Fam-valued presheaf, i.e. a functor T : C^op → Fam. Notation and terminology. If T(Γ) = (U_x)_{x∈X}, we write X = Ty(Γ) and refer to its elements as types in context Γ; we use A, B, C to range over such types. For A ∈ X = Ty(Γ), we write U_A = Tm(Γ, A) and refer to its elements as terms of type A in context Γ. Finally, for γ : Δ → Γ, the
functorial action yields T(γ) : (Tm(Γ, A))_{A∈Ty(Γ)} → (Tm(Δ, B))_{B∈Ty(Δ)}, consisting of a reindexing function [γ] : Ty(Γ) → Ty(Δ), referred to as substitution in types, and for each A ∈ Ty(Γ) a function [γ] : Tm(Γ, A) → Tm(Δ, A[γ]), referred to as substitution in terms.
• A context comprehension operation which to a given context Γ ∈ C_0 and type A ∈ Ty(Γ) assigns a context Γ · A and two projections
p_{Γ,A} : Γ · A → Γ    q_{Γ,A} ∈ Tm(Γ · A, A[p_{Γ,A}])
satisfying the following universal property: for all γ : Δ → Γ and all a ∈ Tm(Δ, A[γ]), there is a unique ⟨γ, a⟩ : Δ → Γ · A such that
p_{Γ,A} ∘ ⟨γ, a⟩ = γ    q_{Γ,A}[⟨γ, a⟩] = a.
We say that (Γ · A, p_{Γ,A}, q_{Γ,A}) is a context comprehension of Γ and A. Observe the similarity between the universal properties of context comprehension and cartesian products: the former is a skewed, dependently typed version of the latter. It is also closely related to Lawvere comprehension [28].
This definition is the standard, historical definition of cwfs [17]. As the notation and terminology suggest, it is closely connected to the syntax of type theory, and particularly to Martin-Löf’s substitution calculus.
Remark 1 The structure from Definition 1 exactly matches that of Martin-Löf’s substitution calculus mentioned before. The correspondence is as follows:
• Γ ∈ C_0 models the judgment Γ context, and Γ = Γ′ ∈ C_0 models Γ = Γ′.
• γ ∈ C(Δ, Γ) models the judgment Δ → γ : Γ, and γ = γ′ ∈ C(Δ, Γ) models Δ → γ = γ′ : Γ.
• A ∈ Ty(Γ) models the judgment Γ ⊢ A type, and A = A′ ∈ Ty(Γ) models Γ ⊢ A = A′.
• a ∈ Tm(Γ, A) models the judgment Γ ⊢ a : A, and a = a′ ∈ Tm(Γ, A) models Γ ⊢ a = a′ : A.
The connection with Martin-Löf’s substitution calculus contributes to the appeal of cwfs: they give rise to categorical combinators for dependent types just as cccs give rise to categorical combinators for the simply typed λ-calculus [14]. However, Definition 1 has sometimes been criticized for being too close to the syntax, or for relying on less standard mathematical objects such as Fam-valued presheaves. In Section 5.1 we will discuss alternative formulations of cwfs highlighting other aspects of the structure.
Remark 2 A key feature of the notion of cwf is that it can be presented as a generalized algebraic theory in the sense of Cartmell [8].
• The generalized algebraic theory of categories introduces the sorts C_0 and C(Δ, Γ), the operations γ ∘ δ and id_Γ, and associativity and identity laws.
• The Fam-valued presheaf adds the sorts Ty(Γ) and Tm(Γ, A), the operations A[γ] and a[γ], and associativity and identity laws for both.
• The terminal object adds the operations 1 and ⟨⟩, and the uniqueness law for ⟨⟩.
• Context comprehension adds the operations Γ · A, p_{Γ,A}, q_{Γ,A}, and ⟨γ, a⟩, and the projection and surjective pairing laws.
See [17] for a complete presentation of the generalized algebraic theory of cwfs. We remark that we have suppressed some of the arguments of the operations. For example, composition is officially an operation with five arguments: Ξ, Δ, Γ ∈ C_0, δ ∈ C(Ξ, Δ), and γ ∈ C(Δ, Γ), but we suppress the first three when we write γ ∘ δ. Similar remarks hold for the operations A[γ], a[γ], ⟨⟩, and ⟨γ, a⟩. We sometimes drop even more arguments to simplify notation and, for example, write p_A or p for the official p_{Γ,A}, etc. Moreover, we sometimes write γ : Δ → Γ for γ ∈ C(Δ, Γ). A cwf is thus a structure (C, T, 1, ⟨⟩, ·, p, q, ⟨−, −⟩), subject to some equations. However, we often refer to a cwf by the first two components (C, T) or even the first component C.
As already mentioned, cwfs only organize the core of dependent type theory, the basic structure and operations on contexts, types, terms, and substitutions. We will see later how cwfs naturally generalize well-known notions in categorical logic to dependent types. We will also see how they may be enriched with type and term formers, in order to capture Martin-Löf type theory with Σ-types, Π-types and identity types. Finally, we will see that the syntax of Martin-Löf type theory may be defined as the initial cwf in a precise sense.
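As a concrete illustration of Definition 1 (an illustration of ours, not a construction from the text), the following Python sketch builds a toy cwf of finite families of sets: a context is a finite list, a type over Γ is a family of finite lists indexed by Γ, and a term picks one element from each member of the family. The dict-based encoding and all function names (sub_ty, sub_tm, compose, ext, p, q, pair) are assumptions made for this sketch only.
```python
# Toy "sets and families" cwf (hypothetical encoding, for illustration only).
#   context Gamma          : list of hashable elements
#   type A over Gamma      : dict  g -> list of values (the fibre over g)
#   term a : A over Gamma  : dict  g -> one element of A[g]
#   substitution D -> G    : dict  d -> element of G (keys are the elements of D)

def sub_ty(A, gamma):
    """A[gamma]: reindex the family A along the substitution gamma."""
    return {d: A[gamma[d]] for d in gamma}

def sub_tm(a, gamma):
    """a[gamma]: precompose the term a with gamma."""
    return {d: a[gamma[d]] for d in gamma}

def compose(gamma, delta):
    """gamma ∘ delta, first delta then gamma."""
    return {x: gamma[delta[x]] for x in delta}

def ext(Gamma, A):
    """Context comprehension Gamma · A: pairs (g, x) with x in A[g]."""
    return [(g, x) for g in Gamma for x in A[g]]

def p(Gamma, A):
    """First projection p : Gamma · A -> Gamma."""
    return {gx: gx[0] for gx in ext(Gamma, A)}

def q(Gamma, A):
    """Second projection q in Tm(Gamma · A, A[p])."""
    return {gx: gx[1] for gx in ext(Gamma, A)}

def pair(gamma, a):
    """<gamma, a> : Delta -> Gamma · A, for a in Tm(Delta, A[gamma])."""
    return {d: (gamma[d], a[d]) for d in gamma}

# A small instance: Gamma has two elements, A is a family over Gamma.
Gamma = [0, 1]
A = {0: ["a"], 1: ["b", "c"]}
gamma = {"u": 1, "v": 0, "w": 1}        # a substitution Delta -> Gamma
a = {"u": "c", "v": "a", "w": "b"}      # a term of type A[gamma] over Delta

lhs_p = compose(p(Gamma, A), pair(gamma, a))   # p ∘ <gamma, a>
lhs_q = sub_tm(q(Gamma, A), pair(gamma, a))    # q[<gamma, a>]
```
In this sketch `lhs_p` and `lhs_q` are the left-hand sides of the two equations of the universal property. Uniqueness holds because any θ with p ∘ θ = γ and q[θ] = a must send d to the pair (γ(d), a(d)).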
2.2.2 Structure of contexts
The definition of cwfs contains just two operations on contexts: the terminal object representing the empty context, and an operation mapping a context Γ and a type A ∈ Ty(Γ) to a new context Γ · A. It is however not required that all contexts are generated by repeated application of these two rules. In contrast, Cartmell [8] adds such a constraint on the structure of contexts for his contextual categories. We shall use the following formulation, which is equivalent to Cartmell’s:
Definition 2 (Contextuality) A cwf is contextual iff there is a length function l : C_0 → ℕ such that l(Γ) = 0 iff Γ = 1, and l(Γ) = n + 1 iff there are unique Δ ∈ C_0 and A ∈ Ty(Δ) such that Γ = Δ · A and l(Δ) = n.
Although this requirement will be used in some of our equivalence theorems, it is not part of our definition of cwf. The reason is that, unlike the other parts of the definition of cwfs, it does not correspond to an inference rule of dependent type theory, and it is not expressed in the language of generalized algebraic theories. However, the free cwf is contextual.
Without going as far as requiring all contexts to be defined inductively, we sometimes wish to overcome the intrinsic distinction between contexts and types by asking that, up to isomorphism, every context is represented by a type [11, 12].
Definition 3 (Democracy) A cwf is democratic provided each context Γ is represented by a type Γ̄ ∈ Ty(1), in the sense that there is an isomorphism γ_Γ : Γ ≅ 1 · Γ̄.
Democracy does not imply contextuality. However, in the presence of unit types and Σ-types, the converse holds: any context 1 · A_1 · ... · A_n may be represented by the iterated Σ-type Σ(A_1, Σ(A_2, ..., Σ(A_{n−1}, A_n) ···)) ∈ Ty(1).
Like contextuality, democracy does not correspond to an inference rule of dependent type theory. However, unlike contextuality, democracy can be expressed in the language of generalized algebraic theories.
2.2.3 Strict morphisms of cwfs
We will now introduce a notion of morphism between cwfs.
Definition 4 A (strict) cwf-morphism between cwfs (C, T_C) and (D, T_D) is a pair (F, σ) where F : C → D is a functor preserving 1 on the nose, and σ : T_C ⇒ T_D ∘ F is a natural transformation between Fam-valued presheaves, preserving context comprehension on the nose. Thus we have
σ_Γ : Ty_C(Γ) → Ty_D(FΓ)    σ_Γ^A : Tm_C(Γ, A) → Tm_D(FΓ, σ_Γ(A))
for Γ ∈ C_0 and A ∈ Ty_C(Γ). It is convenient to simplify notation and write all the components of a cwf-morphism as F, so that we write F(A) for σ_Γ(A) and F(a) for σ_Γ^A(a). Naturality of σ amounts to preservation of substitution, i.e., for all γ : Δ → Γ in C, we have
F(A[γ]) = (FA)[Fγ]    F(a[γ]) = (Fa)[Fγ].
Finally, preservation of context comprehension on the nose means that F(Γ · A) = FΓ · FA, with F(p_{Γ,A}) = p_{FΓ,FA} and F(q_{Γ,A}) = q_{FΓ,FA}. Small cwfs and strict cwf-morphisms form a category, written Cwf.
3 Unityped cwfs
As we explained in the introduction, a key feature of the notion of cwf is that it can be presented as a generalized algebraic theory. As a consequence it can be seen both as a notion of model and as an (idealized) language for dependent type theory. It is therefore a suitable intermediary between traditional formal systems for dependent type theory and categorical notions of models. This paper is based on the observation that restricted classes of cwfs can play a similar role for untyped and simply typed systems. In this section we will look at cwfs with only one type and claim that they play a similar role for untyped systems as cwfs do for dependently typed systems.
3.1 Plain ucwfs
The definition of cwfs with only one type can be simplified as follows.
Definition 5 A unityped category with families (ucwf) consists of the following:
• A category C with a terminal object, written 0. Notation and terminology. We use n, m, etc., to range over objects of C, and refer to those as contexts. Likewise, we use δ, γ, etc., to range over morphisms, and refer to those as substitutions. We refer to 0 as the empty context. We write ⟨⟩_n ∈ C(n, 0) for the terminal morphism, representing the empty substitution.
• A presheaf Tm : C^op → Set. Notation and terminology. We refer to the elements of Tm(n) as the terms of arity n; we use a, b, etc., to range over terms. Finally, for γ : n → m, the functorial action of Tm yields a substitution operation Tm(γ) = [γ] : Tm(m) → Tm(n).
• A context comprehension operation which to a given context n ∈ C_0 assigns a context s(n) ∈ C_0 along with two projections
p_n : s(n) → n    q_n ∈ Tm(s(n))
satisfying the following universal property: for all γ : m → n and all a ∈ Tm(m), there is a unique ⟨γ, a⟩ : m → s(n) such that
p_n ∘ ⟨γ, a⟩ = γ    q_n[⟨γ, a⟩] = a.
Remark 3 The context comprehension operation for ucwfs amounts to the assignment of a representation s(n) of the presheaf C(−, n) × Tm(−) : C^op → Set for all n ∈ C_0.
If n ∈ ℕ is a natural number, we write n = s^n(0) for the context obtained by n applications of the context comprehension operation. If the ucwf is contextual, all objects of C have the form n for n ∈ ℕ. We think of the terms in Tm(n) as boxes with n inputs and one output. Substitutions γ : m → n are boxes with m inputs and n outputs. We have γ = ⟨...⟨⟨⟩_m, a_1⟩, ..., a_n⟩ : m → n for a_1, ..., a_n ∈ Tm(m). For convenience we write γ = ⟨a_1, ..., a_n⟩. For a ∈ Tm(n), performing the substitution a[γ] amounts to connecting the box a_i to the i-th input of the box a, see Figure 1.
Fig. 1 Substitution as plugging boxes
Figure 1 reminds us of other categorical notions that aim to capture algebraic theories, such as Lawvere theories. We will come back later to this similarity. However, Figure 1 is misleading in one respect: it suggests that the boxes a_1, ..., a_n have free variables in Var(m) = {1, ..., m}. In reality, in the context of ucwfs, these free variables are not first-class citizens but are obtained indirectly through sequences of projections. More precisely, the term π_i^m = q_{i−1}[p_i] ··· [p_{m−1}] ∈ Tm(m) will serve in place of the free variable i in the context of size m. With the notations introduced, we then have the expected equation
π_i^m[⟨a_1, ..., a_m⟩] = a_i
These notations suggest a correspondence to cartesian operads [38], and we will come back to this connection. Before that, writing Ucwf for the category of small ucwfs and strict cwf-morphisms, we note:
Proposition 1 The category Ucwf has an initial object.
Construction 1. From the definition of ucwf it is immediately clear that an initial ucwf T_ucwf can be generated inductively: we simultaneously define the three families C_0, C(n, m), and Tm(n), where the ucwf-operations become introduction rules. Then we take the quotients with respect to the equivalence relations generated by the ucwf-equations.
Construction 2. Alternatively, we can construct it as the presheaf of variables over the category of renamings.
• The category N of renamings has objects N_0 = ℕ and morphisms N(n, m) = Var(n)^m.
• The presheaf Tm : N^op → Set is defined by Tm(n) = Var(n) and i[(a_1, ..., a_m)] = a_i.
• Context comprehension is defined by s(n) = n + 1, p_n = (1, ..., n), and q_n = n + 1.
This construction and its isomorphism with T_ucwf have been formalized in Agda by Brilakis [7]. Interestingly, this is the same as the free cartesian operad.
This initial ucwf is contextual. As we will see, this requirement is necessary for the connection with cartesian operads.
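Construction 2 is small enough to be written out directly. The following Python sketch is one possible encoding, with renamings n → m represented as m-tuples of variables in Var(n); the function names (subst, compose, pair, proj) are chosen for this sketch and are not from the text or from the Agda formalization [7].
```python
# The ucwf of renamings, following Construction 2 (names are illustrative).
# A variable in Var(n) is an integer in 1..n; a morphism n -> m is an
# m-tuple of variables in Var(n).

def Var(n):
    return range(1, n + 1)

def subst(i, gamma):
    """i[(a_1, ..., a_m)] = a_i."""
    return gamma[i - 1]

def compose(gamma, delta):
    """gamma ∘ delta for delta : k -> m and gamma : m -> n (componentwise)."""
    return tuple(subst(i, delta) for i in gamma)

def s(n):
    """Context comprehension on objects."""
    return n + 1

def p(n):
    """p_n : s(n) -> n, the tuple (1, ..., n)."""
    return tuple(Var(n))

def q(n):
    """q_n in Tm(s(n)) = Var(n + 1), the last variable."""
    return n + 1

def pair(gamma, a):
    """<gamma, a> : m -> s(n) for gamma : m -> n and a in Var(m)."""
    return gamma + (a,)

def proj(i, m):
    """pi_i^m = q_{i-1}[p_i] ... [p_{m-1}], built only from q and the p's."""
    t = q(i - 1)
    for k in range(i, m):
        t = subst(t, p(k))
    return t

# Example: gamma : 3 -> 2 and a in Var(3).
gamma = (3, 1)
a = 2
```
With these definitions, `compose(p(2), pair(gamma, a))` and `subst(q(2), pair(gamma, a))` compute the two sides of the universal property, and `proj(i, m)` evaluates to the variable `i`, which is how projections stand in for free variables.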
3.2 Contextual ucwfs
3.2.1 Cartesian operads
Let us now consider the special case of a contextual ucwf C, where the length function induces a bijection C_0 ≅ ℕ. It follows from the laws of cwfs that
C(m, n) ≅ Tm(m)^n
From right to left we use the n-ary tupling introduced above, while from left to right we apply projections. This is essentially the same data as for cartesian operads.
Definition 6 A cartesian operad consists of the following:
• a family Tm(n), where n ∈ ℕ, of n-ary operations;
• an operation of operad composition which maps a ∈ Tm(m) and γ ∈ Tm(n)^m to a[γ] ∈ Tm(n) and satisfies identity and associativity laws;
• projections π_i^n ∈ Tm(n), such that π_i^n[(a_1, ..., a_n)] = a_i.
Focusing on contextual ucwfs allows us to extract the mechanism for terms and substitutions that is at play in full cwfs, but without considering types. Writing Ucwf_ctx for the full subcategory of Ucwf having as objects contextual ucwfs, and Cop for the category of cartesian operads, we have:
Theorem 1 The categories Ucwf_ctx and Cop are equivalent.
Rather than formally define Cop and prove this equivalence, we shall detail it for Lawvere theories, which form a category equivalent to Cop [38].
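To make Definition 6 concrete, here is a minimal Python sketch of one familiar cartesian operad, chosen by us as an example: Tm(n) is the set of n-ary functions on the integers, operad composition is composition of functions, and the projections select an argument. The names compose and proj are assumptions of the sketch, arities are not tracked by the code, and functions can only be compared pointwise.
```python
# Cartesian operad of n-ary functions on integers (illustrative example).

def compose(a, gamma):
    """a[gamma]: plug the n-ary operations in gamma into the inputs of a."""
    return lambda *xs: a(*(g(*xs) for g in gamma))

def proj(i, n):
    """pi_i^n: the i-th projection among n arguments (1-based)."""
    return lambda *xs: xs[i - 1]

add = lambda x, y: x + y              # an element of Tm(2)
mul = lambda x, y: x * y              # an element of Tm(2)

gamma = (mul, proj(1, 2))             # an element of Tm(2)^2
delta = (lambda x: x + 1, lambda x: 2 * x)   # an element of Tm(1)^2

# Associativity at a sample point: (add[gamma])[delta] vs add[gamma[delta]].
left = compose(compose(add, gamma), delta)
right = compose(add, tuple(compose(g, delta) for g in gamma))
```
Here `compose(add, gamma)` is the binary operation sending (x, y) to x·y + x. The projection law says `compose(proj(2, 2), gamma)` agrees with `gamma[1]`, and the identity law says that composing with the tuple of all projections changes nothing.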
3.2.2 Lawvere theories
Contextual ucwfs are also equivalent to Lawvere theories. The following definition is from [24].
Definition 7 A Lawvere theory consists of a small category C with (necessarily strictly associative) finite products and a strict finite-product preserving identity-on-objects functor L : N → C. Here, N refers to the category of renamings introduced in the proof of Proposition 1, with terminal object 0 and binary products defined by +.
A map of Lawvere theories from (L, C) to (L′, C′) is a (necessarily strict) finite-product preserving functor from C to C′ that commutes with the functors L and L′. Lawvere theories and their maps form a category Law.
Note that L : N → C is usually (equivalently) presented as N_0^op → C, where N_0, a skeleton of the category of finite sets and functions, has an initial object 0 and finite coproducts given by the sum of integers.
Theorem 2 The categories Ucwf_ctx and Law are equivalent.
Proof Let (C, T_C) be a contextual ucwf. We already noted that C_0 ≅ ℕ. Moreover, we observe that there is a unique contextual ucwf (D, T_D), isomorphic to (C, T_C) and such that D_0 = ℕ with 0 terminal and s(n) = n + 1 for all n ∈ D_0. Hence we assume from now on that contextual ucwfs have natural numbers as objects.
By Proposition 1, N is the base category of the initial ucwf. Hence, if (C, T_C) is a contextual ucwf, there is a unique functor L : N → C which is the first component of a cwf-morphism. In particular, L preserves the terminal object and context comprehension on the nose. It follows that it is identity-on-objects and strictly finite-product preserving.
If (F, σ) is a cwf-morphism between contextual ucwfs (assuming w.l.o.g. that these ucwfs have ℕ as objects), it follows that F is a morphism between the corresponding Lawvere theories. Conversely, for any morphism F between
the corresponding Lawvere theories, σ can be uniquely recovered from F and projection. This yields a full and faithful functor from Ucwf_ctx to Law.
Finally, this functor is surjective on objects: given a Lawvere theory (L, C), there is a ucwf with base category C, terms Tm(n) = C(n, 1), and context comprehension s(n) = n + 1. The universal property follows from that of the finite product. It follows that Ucwf_ctx and Law are equivalent.
Remark 4 In a recent paper Fiore and Voevodsky [19] prove a closely related result about C-systems, a variant of Cartmell’s contextual categories. They prove that their category of Lawvere theories is isomorphic to the subcategory of C-systems whose length functions (in the definition of contextuality) are bijections.
3.3 λβη-ucwfs
Ucwfs give rise to a generalized algebraic theory which captures the combinatorics of terms and substitution in a similar way as cartesian operads and Lawvere theories. This primitive structure may then be enriched with operations and equations for capturing specific theories, such as the pure λβη-calculus.
Definition 8 A λβη-ucwf is a ucwf (C, Tm) with two more operations:
λ_n : Tm(s(n)) → Tm(n)    ap_n : Tm(n) × Tm(n) → Tm(n)
for all n ∈ C_0, and four more equations:
λ_n(b)[γ] = λ_m(b[⟨γ ∘ p_m, q_m⟩])
ap_n(c, a)[γ] = ap_m(c[γ], a[γ])
ap_n(λ_n(b), a) = b[⟨id_n, a⟩]    (β)
λ_n(ap_{s(n)}(c[p_n], q_n)) = c    (η)
for γ : m → n, b ∈ Tm(s(n)), and c, a ∈ Tm(n). A λβ-ucwf has the same operations, but is not subject to the (η) equation.
The definition above is natural and close to the syntax. As we will see later on, it is the direct simplification of the notions of arrow and Π-types in the simply typed and the dependently typed cases. However, in the unityped case, this definition can be simplified dramatically.
Proposition 2 Let (C, Tm) be a ucwf. Then, λβη-structures on C are equivalently defined as natural isomorphisms between presheaves
λ : Tm(s(−)) ≅ Tm(−)
where the functorial action of s is defined as s(γ) = ⟨γ ∘ p_m, q_m⟩. More precisely, (1) for any λβη-structure, λ is such a natural isomorphism, and (2) given such a natural isomorphism, there is a unique λβη-structure giving rise to it.
Proof For (1), given a λβη-structure on (C, Tm), we first observe that λ_n is natural by the substitution law. For c ∈ Tm(n) we set λ_n^{-1}(c) = ap_{s(n)}(c[p_n], q_n) ∈ Tm(s(n)). Using β, η and the substitution law for λ, we see that λ_n and λ_n^{-1} are inverse.
For (2), given a natural isomorphism λ_n, we set ap_n(c, a) = λ_n^{-1}(c)[⟨id_n, a⟩]. The β-rule follows from the fact that λ_n^{-1} ∘ λ_n is the identity. The η-rule follows from the naturality of λ^{-1} plus the fact that λ_n ∘ λ_n^{-1} is the identity. The substitution law for λ is by naturality of λ, and the substitution law for ap is by naturality of λ^{-1}. Finally, uniqueness of the λβη-structure (i.e. of ap_n) relies on the substitution rule for ap.
3.3.1 Some related models of the untyped λ-calculus
There are many notions of model of λ-calculus, see for example Barendregt [5]. We will only briefly discuss the ones given by Obtułowicz [34], Aczel [2], and Lambek and Scott [27].
Obtułowicz’s algebraic theories of type λ-βη are Lawvere theories similar to the Lawvere theories corresponding to contextual λβη-ucwfs, but use an evaluation morphism ε as a primitive instead of ap. These operations are interdefinable, via ε = ap(π_1^2, π_2^2) ∈ Tm(2) and ap(c, a) = ε[⟨c, a⟩] ∈ Tm(n) for c, a ∈ Tm(n).
As a basis for his notion of Frege structure, Aczel introduces a notion of lambda structure. This in turn is based on the notion of an explicitly closed family, which is a cartesian operad where Tm(n) ⊆ Tm(0)^n → Tm(0), so that a[γ] is function composition, and projections are the projections in the metalanguage. It is thus a cartesian operad which is well-pointed in the sense that, for a, a′ ∈ Tm(n), if a[γ] = a′[γ] for all γ ∈ C(0, n) then a = a′.
To model the λβ-calculus Aczel adds two operations
λ_0 : Tm(1) → Tm(0)    ap_0 : Tm(0) × Tm(0) → Tm(0)
The resulting notion of lambda structure is equivalent to well-pointed λβ-ucwfs. Since terms are functions, there is a unique way to define the operations λ_n and ap_n for n > 0 so that they satisfy the substitution laws of λβ-ucwfs:
λ_n(b)(γ) = λ_0(b(⟨γ ∘ p_0, q_0⟩))    ap_n(c, a)(γ) = ap_0(c(γ), a(γ))
for γ ∈ C(0, n). The general substitution rules follow from this definition.
Lambek and Scott propose C-monoids as their notion of model of the untyped λ-calculus. These are monoids with extra structure coming from combinators of cartesian closed categories. C-monoids capture the equational behaviour of closed rather than open terms. As in λβη-ucwfs and cartesian closed categories, variables are dealt with indirectly as projections. But in λβη-ucwfs, variable addressing is external, that is, handled by the ucwf structure. There are no term constructors for pairs and projections; in particular closed terms, i.e., terms in Tm(0), do not form a C-monoid, as they support no pairing and projection operations. In contrast, C-monoids handle variable addressing through pairs and projections at the term level. We expect a strong relationship between C-monoids and λβη-ucwfs with term-level pairs and projections. The proof should follow [27], encoding open terms and substitution within C-monoids via functional completeness.
3.3.2 Initial λβη-ucwfs
To conclude the discussion about λβη-ucwfs, we include a construction of the untyped λ-calculus as the initial such structure. For that, let us say that a strict cwf-morphism F between λβη-ucwfs (C, Tm_C) and (D, Tm_D) is a strict λβη-ucwf-morphism iff the action of F on terms preserves all the term constructors on the nose. Let us write Ucwf_λβη for the category of small λβη-ucwfs and strict λβη-ucwf-morphisms. Then, we have:
Proposition 3 The category Ucwf_λβη has an initial object.
Construction 1. The most direct method is similar to Construction 1 of an initial ucwf. We simultaneously define the three families C_0, C(n, m), and Tm(n), where the λβη-ucwf-operations become introduction rules. Then we take the quotients with respect to the equivalence relations generated by the λβη-ucwf-equations. This construction can be viewed as a well-scoped, variable-free version of the λσ-calculus of Abadi, Cardelli, Curien, and Lévy [1].
Construction 2. Another initial λβη-ucwf can be constructed from the (well-scoped) λβη-calculus. We let C_0 = ℕ and generate the family Tm(n) by the following rules
var_n(i) : Tm(n)  (i ∈ Var(n))    λ_n : Tm(s(n)) → Tm(n)    ap_n : Tm(n) × Tm(n) → Tm(n)
quotiented by the equivalence relation ∼ generated by β and η:
ap_n(λ_n(b), a) ∼ b[⟨id_n, a⟩]    λ_n(ap_{s(n)}(c[p_n], q_n)) ∼ c
Note that the variables var_n(i), where i ∈ Var(n), were simply represented by the number i in the corresponding construction in Proposition 1. We let C(n, m) = Tm(n)^m and define substitution by induction on Tm(n):
var_n(i)[(a_1, ..., a_n)] = a_i
λ_n(b)[γ] = λ_m(b[⟨γ ∘ p_m, q_m⟩])
ap_n(c, a)[γ] = ap_m(c[γ], a[γ])
for γ ∈ C(m, n). Note that this construction is an extension of Construction 2 of an initial plain ucwf.
The two constructions, and the fact that both give rise to initial λβη-ucwfs, have been formalised in Agda by Brilakis [7].
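The inductive definition of substitution in Construction 2 can be sketched in a few lines of Python. This is an illustration under our own assumptions: terms are raw tagged tuples without arity annotations and are not quotiented by β and η, a substitution in C(m, n) is an n-tuple of terms, and the names var, lam, ap and subst are hypothetical. Since q_n = n + 1 is the most recently bound variable, the variable numbering behaves like de Bruijn levels.
```python
# Raw well-scoped lambda terms with simultaneous substitution
# (illustrative sketch of Construction 2; no beta/eta quotient is taken).

def var(i):
    return ("var", i)

def lam(b):
    return ("lam", b)

def ap(c, a):
    return ("ap", c, a)

def p(m):
    """p_m : s(m) -> m as the tuple (var 1, ..., var m)."""
    return tuple(var(i) for i in range(1, m + 1))

def q(m):
    """q_m in Tm(s(m)), the newest variable."""
    return var(m + 1)

def subst(t, gamma, m):
    """t[gamma] for t in Tm(n) and gamma in C(m, n) = Tm(m)^n."""
    tag = t[0]
    if tag == "var":
        return gamma[t[1] - 1]
    if tag == "ap":
        return ap(subst(t[1], gamma, m), subst(t[2], gamma, m))
    # tag == "lam": go under the binder with <gamma ∘ p_m, q_m>
    lifted = tuple(subst(a, p(m), m + 1) for a in gamma) + (q(m),)
    return lam(subst(t[1], lifted, m + 1))

I = lam(var(1))            # identity, a closed term in Tm(0)
K_body = lam(var(1))       # body of K = lam(K_body), a term in Tm(1)

# Right-hand side of the beta equation for ap(K, I): K_body[<id_0, I>].
K_I = subst(K_body, (I,), 0)
```
In this encoding `K_I` is `lam(lam(var(2)))`, the constant function returning the identity. The inner bound variable has been renumbered from level 1 to level 2 when I was pushed under the binder, which is exactly the effect of ⟨γ ∘ p_m, q_m⟩.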
4 Simply-typed cwfs

En route to full cwfs, we now add types, yielding simply-typed cwfs (scwfs) (called non-dependent cwfs in Clairambault and Dybjer [13]). We will then study the relationship with cartesian and cartesian closed categories. (By cartesian category we here mean a category with finite products, whereas some of the literature, including Johnstone [26], uses this term for categories with finite limits.)
4.1 Plain scwfs

A simply-typed cwf (scwf) is a cwf where the presheaf of types Ty : C^op → Set is constant, i.e. forms a set Ty not depending on the context and invariant under substitution. We can thus simplify the definition as follows:

Definition 9 An scwf consists of the following:
• A category C with a terminal object 1.
• A set Ty.
• A family of presheaves Tm_A : C^op → Set for A ∈ Ty. (We also write Tm(Γ, A) for Tm_A(Γ).)
• A context comprehension operation which to Γ ∈ C_0 and A ∈ Ty assigns a context Γ · A and two projections

\[ p_{\Gamma,A} : \Gamma \cdot A \to \Gamma \qquad q_{\Gamma,A} \in \mathrm{Tm}(\Gamma \cdot A, A) \]

satisfying the following universal property: for all γ : Δ → Γ and all a ∈ Tm(Δ, A), there is a unique ⟨γ, a⟩ : Δ → Γ · A such that

\[ p_{\Gamma,A} \circ \langle \gamma, a \rangle = \gamma \qquad q_{\Gamma,A}[\langle \gamma, a \rangle] = a \]

We say that (Γ · A, p_{Γ,A}, q_{Γ,A}) is a context comprehension of Γ and A.

Remark 5 Context comprehension for scwfs amounts to a representation Γ · A of the presheaf C(−, Γ) × Tm_A(−) : C^op → Set for all Γ ∈ C_0 and A ∈ Ty.

An scwf is a particular kind of cwf, and a (strict) scwf-morphism is simply a (strict) cwf-morphism between scwfs. Small scwfs and strict cwf-morphisms form a category Scwf. Scwf has an initial object, but it is not very interesting since its set of types is empty and its base category is restricted to the terminal object. Therefore we fix a set B of basic types and consider the category B-Scwf where objects are small scwfs (C, Ty_C, Tm_C) together with an interpretation function ⟦−⟧_C : B → Ty. Morphisms are strict scwf-morphisms that commute with the interpretation.

Proposition 4 For all sets B the category B-Scwf has an initial object.
Proof The initial B-scwf, also called the free scwf over a set of types B, can be constructed in much the same two ways as we constructed initial ucwfs. We can either proceed as in Construction 1, where the operations become introduction rules for inductively generating the objects and terms, and the equations inductively generate an equivalence relation. Alternatively, we can proceed as in Construction 2, and construct a typed version of the category N of Proposition 1. For the second construction we let Ty = B. Then we define

\[
\begin{aligned}
C_0 &= \mathrm{List}(\mathrm{Ty}) \\
C(\Gamma, [A_1, \ldots, A_n]) &= \mathrm{Tm}(\Gamma, A_1) \times \cdots \times \mathrm{Tm}(\Gamma, A_n)
\end{aligned}
\]

where Tm(Γ, A) = {var_n(i) | A = A_i}, with var_n(i) the ith variable and Γ = [A_1, . . . , A_n]. Moreover, we define the projections as

\[
p_{\Gamma,A} = (\mathrm{var}_{n+1}(1), \ldots, \mathrm{var}_{n+1}(n)) \qquad q_{\Gamma,A} = \mathrm{var}_{n+1}(n+1)
\]

and substitution

\[ \mathrm{var}_n(i)[(a_1, \ldots, a_n)] = a_i \]

Just as for ucwfs, the free scwf over a set B is contextual. In analogy with Section 3.2 we could relate scwfs to coloured cartesian operads (multicategories) and multi-sorted Lawvere theories, but we omit the unsurprising details. Instead, we discuss their relationship with cartesian categories.
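The second construction of the free scwf is small enough to be executed directly. The sketch below is a hedged illustration under our own encoding: a context is a tuple of type names, a term is just a variable index, and a context morphism Δ → Γ is a tuple of variable indices of Δ, one per entry of Γ. The function names are hypothetical and chosen for this example only.

```python
# Free scwf over a set of basic types (illustrative encoding).

def variables(ctx, ty):
    """Tm(ctx, ty): the 1-based indices i with ctx[i - 1] == ty."""
    return [i for i, a in enumerate(ctx, start=1) if a == ty]

def is_morphism(delta, gamma_ctx, gamma):
    """Check that gamma is a well-typed context morphism delta -> gamma_ctx."""
    return len(gamma) == len(gamma_ctx) and all(
        i in variables(delta, a) for i, a in zip(gamma, gamma_ctx))

def extend(ctx, ty):
    """Context comprehension: ctx . ty appends the new type."""
    return ctx + (ty,)

def p(ctx, ty):
    """p = (var(1), ..., var(n)) : ctx . ty -> ctx."""
    return tuple(range(1, len(ctx) + 1))

def q(ctx, ty):
    """q = var(n + 1), the last variable of ctx . ty."""
    return len(ctx) + 1

def subst_var(i, gamma):
    """var(i)[(a_1, ..., a_n)] = a_i."""
    return gamma[i - 1]

def compose(gamma, delta):
    """gamma o delta, computed componentwise by substitution."""
    return tuple(subst_var(i, delta) for i in gamma)

def pair(gamma, a):
    """<gamma, a> : Delta -> Gamma . A."""
    return gamma + (a,)
```

With Γ = ("o", "b"), Δ = ("b", "o", "o"), γ = (2, 1) and a = 3, the two equations of the universal property of context comprehension hold by direct computation: compose(p(Γ, "o"), pair(γ, a)) returns γ and subst_var(q(Γ, "o"), pair(γ, a)) returns a.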
4.2 Finite products as structure

Cartesian categories (categories with finite products) are categories with a terminal object and binary cartesian products. Straightforward as it seems, this definition hides some subtleties which come to the forefront when one considers the associated notions of morphism between cartesian categories. Namely, does the mere existence of a product for any two objects suffice to obtain a cartesian category, or are the finite products part of the data of a cartesian category? Texts in category theory adopt one view or the other, not always explicitly. In Lambek and Scott's book, the latter view is explicitly adopted, and morphisms between cartesian categories must preserve this explicit data on the nose.

Definition 10 A cartesian category (with structure) consists of the following:
• A category C with a terminal object 1, where ⟨⟩_X : X → 1 denotes the unique arrow into it.
• A product operation which to any A, B ∈ C_0 assigns an object A × B ∈ C_0 and two projections

\[ \mathrm{fst}_{A,B} : A \times B \to A \qquad \mathrm{snd}_{A,B} : A \times B \to B \]

satisfying the following universal property: for all a : X → A and all b : X → B, there is a unique ⟨a, b⟩ : X → A × B such that

\[ \mathrm{fst}_{A,B} \circ \langle a, b \rangle = a \qquad \mathrm{snd}_{A,B} \circ \langle a, b \rangle = b . \]

A strict cartesian functor from a cartesian category C (leaving implicit the other components) to D is a functor F : C → D preserving the structure on the nose, i.e. for all A, B ∈ C_0, F(A ×^C B) = F(A) ×^D F(B), and

\[ F(\mathrm{fst}^{C}_{A,B}) = \mathrm{fst}^{D}_{F(A),F(B)} \qquad F(\mathrm{snd}^{C}_{A,B}) = \mathrm{snd}^{D}_{F(A),F(B)} . \]

We write CC_s for the category of small cartesian categories (with structure) and strict cartesian functors, preserving the structure on the nose. We will later consider another notion of cartesian category, where we are content with the mere existence of a terminal object and a cartesian product for any two objects, and the only data is the underlying category C; there are no chosen products. This, of course, constrains the maps: the notion of strict cartesian morphism as above would make no sense without chosen structure.

We observe in passing that from the preservation of projections and the universal property, it follows directly that strict cartesian functors also preserve tuples on the nose, in the sense that F(⟨⟩_X) = ⟨⟩_{F(X)} and F(⟨a, b⟩) = ⟨F(a), F(b)⟩.
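The claim about pairing can be spelled out in a few lines. The derivation below is a sketch that uses only functoriality of F, the strict preservation of the projections, and the uniqueness part of the universal property of Definition 10; angle brackets denote the pairing of that definition, and a : X → A, b : X → B are arbitrary arrows of C.

```latex
\begin{align*}
\mathrm{fst}^{D}_{FA,FB} \circ F(\langle a, b \rangle)
  &= F(\mathrm{fst}^{C}_{A,B}) \circ F(\langle a, b \rangle)
   = F(\mathrm{fst}^{C}_{A,B} \circ \langle a, b \rangle)
   = F(a), \\
\mathrm{snd}^{D}_{FA,FB} \circ F(\langle a, b \rangle)
  &= F(\mathrm{snd}^{C}_{A,B}) \circ F(\langle a, b \rangle)
   = F(\mathrm{snd}^{C}_{A,B} \circ \langle a, b \rangle)
   = F(b).
\end{align*}
```

Since ⟨F(a), F(b)⟩ is the unique arrow F(X) → F(A) ×^D F(B) with these two composites, it must coincide with F(⟨a, b⟩). The statement for the terminal arrow is obtained in the same way from uniqueness of arrows into 1_D, assuming (as part of preserving the structure on the nose) that F(1_C) = 1_D.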
4.2.1 Finite product types

We now wish to compare cartesian categories with structure, as above, with scwfs. They are similar, but the main difference is that scwfs distinguish contexts and types, whereas cartesian categories do not have this distinction. In particular, when constructing an scwf from a cartesian category we must recover the types as the objects of C. It follows that the resulting scwfs support a finite product operation on types. Looking for an equivalence, we define what it means for an scwf to support finite product types and introduce a binary product type A × B, and a unit type N1.

Definition 11 An N1-structure on an scwf C consists of a type N1 ∈ Ty, and for each Γ a term 0_1 ∈ Tm_C(Γ, N1) such that for all c ∈ Tm(Γ, N1), 0_1 = c.

Definition 12 A ×-structure on an scwf C consists of, for each A, B ∈ Ty, a type A × B ∈ Ty such that for all Γ ∈ C_0 there are term formers

\[
\begin{aligned}
\mathrm{fst}_{\Gamma,A,B}(-) &: \mathrm{Tm}(\Gamma, A \times B) \to \mathrm{Tm}(\Gamma, A) \\
\mathrm{snd}_{\Gamma,A,B}(-) &: \mathrm{Tm}(\Gamma, A \times B) \to \mathrm{Tm}(\Gamma, B) \\
\langle -, - \rangle &: \mathrm{Tm}(\Gamma, A) \times \mathrm{Tm}(\Gamma, B) \to \mathrm{Tm}(\Gamma, A \times B)
\end{aligned}
\]

such that

\[
\mathrm{fst}(\langle a, b \rangle) = a \qquad
\mathrm{snd}(\langle a, b \rangle) = b \qquad
\langle \mathrm{fst}(c), \mathrm{snd}(c) \rangle = c \qquad
\langle a, b \rangle[\gamma] = \langle a[\gamma], b[\gamma] \rangle .
\]

We thus have a type formation rule for A × B and term formation rules for projections and pairs. The first three equations are the usual rules for product types with surjective pairing, while the last one states stability under substitution. By an scwf with finite product types we mean an scwf with an N1-structure and a ×-structure.

Remark 6 Having a ×-structure on an scwf C amounts to requiring that there is a binary type former × and a natural isomorphism of presheaves

\[ \mathrm{Tm}_C(-, A) \times \mathrm{Tm}_C(-, B) \cong \mathrm{Tm}_C(-, A \times B) . \]

Likewise, an N1-structure corresponds to a type N1 and a natural isomorphism between Tm_C(−, N1) and the constant singleton presheaf.

The product type structure on scwfs should be preserved by morphisms.

Definition 13 If C, D are scwfs with finite product types, then a strict scwf-morphism F : C → D (strictly) preserves N1-structure if F(N1^C) = N1^D and F(0_1^C) = 0_1^D. Similarly, it (strictly) preserves ×-structure if F(A ×^C B) = F(A) ×^D F(B), F(fst^C_{A,B}(c)) = fst^D_{FA,FB}(F(c)) and F(snd^C_{A,B}(c)) = snd^D_{FA,FB}(F(c)). We write Scwf^{N1,×} for the category where the objects are small scwfs with product types, and morphisms are strict structure-preserving cwf-morphisms.
Given an scwf with finite product types we construct a cartesian category as follows. First, we define the category of types and terms in context.

Definition 14 Let C be an scwf with finite product types. We define the category Ty(Γ) of types and terms in context Γ ∈ C_0 as having: (1) as objects, Ty; (2) as morphisms from A ∈ Ty to B ∈ Ty, the terms b ∈ Tm(Γ · A, B). If b ∈ Tm(Γ · A, B) and c ∈ Tm(Γ · B, C), then their composition is c ∘ b = c[⟨p_{Γ,A}, b⟩], with identity id_{Γ,A} = q_{Γ,A} ∈ Tm(Γ · A, A).

It follows that Ty(Γ) is a cartesian category with structure for all Γ ∈ C_0.

Lemma 1 For any Γ, we let N1 be the chosen terminal object in Ty(Γ). For every A, B ∈ Ty, we let their product be A × B ∈ Ty and the projections

\[
\mathrm{fst}(q_{\Gamma,A \times B}) \in \mathrm{Ty}(\Gamma)(A \times B, A) \qquad
\mathrm{snd}(q_{\Gamma,A \times B}) \in \mathrm{Ty}(\Gamma)(A \times B, B)
\]

We omit the (straightforward) proof. This entails that given an scwf with finite product types C, there is a canonical cartesian category with structure, the cartesian category of closed terms Ty_C(1_C).

Proposition 5 There is a functor C : Scwf^{N1,×} → CC_s which to any scwf with products C associates Ty_C(1_C), and to any structure-preserving cwf-morphism F : C → D associates C(F) : Ty_C(1_C) → Ty_D(1_D) given by the action of F on types and terms.
4.2.2 From cartesian categories to scwfs Since C forgets the structure of contexts and only remembers closed types, we expect a functor L in the opposite direction to somehow reconstruct contexts. There are two natural candidates for this. The first is to reverse the effect of C by reconstructing the context formally, in an operation analogous to the construction of the cartesian category of polynomials in Lambek and Scott. We will detail this below. In this way we do not directly get an equivalence, because if C is an arbitrary scwf, the contexts of LC(C) are generated inductively from types. However, we get an equivalence for contextual scwfs.
The other option is to let the category of contexts of L(C) be C, reflecting the dual role of objects in cartesian categories as both contexts and types. Context comprehension is defined via finite products. As simple as it looks, this construction does not yield an equivalence even when restricting scwfs. (It would if one restricted to democratic scwfs such that for each Γ ∈ C_0 we have Γ = 1 · Γ̄, but this is not a natural hypothesis since it is not satisfied by the syntax.) It is, however, behind the biequivalence between scwfs and cartesian categories as property that we shall discuss in the following subsection.

Definition 15 If C is a cartesian category with structure, we define an scwf L(C) analogously to the free scwf in Proposition 4. The set of types is Ty = C_0. We define

\[
\begin{aligned}
L(C)_0 &= \mathrm{List}(C_0) \\
L(C)(\Gamma, [A_1, \ldots, A_n]) &= C(\Pi\Gamma, A_1) \times \cdots \times C(\Pi\Gamma, A_n)
\end{aligned}
\]

where Π[B_1, . . . , B_m] = (. . . (B_1 × B_2) · · · × B_m) and Π[] = 1. The terms are Tm(Γ, A) = C(ΠΓ, A). Substitution is defined as

\[ a[(\gamma_1, \ldots, \gamma_n)] = a \circ \langle \ldots \langle \gamma_1, \gamma_2 \rangle, \ldots, \gamma_n \rangle \in \mathrm{Tm}(\Delta, A) \]

for (γ_1, . . . , γ_n) : Δ → [A_1, . . . , A_n] and a ∈ Tm(Γ, A), and composition in L(C) by (a_1, . . . , a_n) ∘ γ = (a_1[γ], . . . , a_n[γ]). For Γ = [A_1, . . . , A_n] and 1 ≤ i ≤ n, we write var_n(i) ∈ Tm(Γ, A_i) for the corresponding variable, obtained as the n-ary projection. For Γ = [A_1, . . . , A_n] ∈ L(C)_0 and A ∈ Ty we let Γ · A = [A_1, . . . , A_n, A]. The projections are defined by

\[
p_{\Gamma,A} = (\mathrm{var}_{n+1}(1), \ldots, \mathrm{var}_{n+1}(n)) \qquad q_{\Gamma,A} = \mathrm{var}_{n+1}(n+1) .
\]

We thus get an scwf with finite product types: the N1-structure is the terminal object of C and A × B is given by the cartesian product of C. If c ∈ Tm(Γ, A × B), the projections

\[
\mathrm{fst}_{A,B}(c) = \mathrm{fst}_{A,B} \circ c \qquad \mathrm{snd}_{A,B}(c) = \mathrm{snd}_{A,B} \circ c
\]

are immediate. (Note the overloading of fst and snd.) We observe that our construction yields that C(L(C)) = C for each cartesian category with structure C. This construction can be lifted to a functor L : CC_s → Scwf^{N1,×}
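To see Definition 15 at work, one can instantiate C with (a fragment of) the category of sets, using Python functions as arrows and Python pairs as chosen products. The sketch below makes exactly this assumption; it is not a general implementation of L, and the helper names `var`, `tuple_of` and `subst` are ours. An element of Π[B_1, ..., B_m] is a left-nested pair, and an element of Π[] is the empty tuple.

```python
# L(C) for C = sets and functions, with tuples as chosen products (a sketch).

def fst(x):
    return x[0]

def snd(x):
    return x[1]

def pair(f, g):
    """<f, g> : X -> A x B for f : X -> A and g : X -> B."""
    return lambda x: (f(x), g(x))

def compose(g, f):
    return lambda x: g(f(x))

def terminal(x):
    """The unique arrow into the chosen terminal object, here ()."""
    return ()

def var(n, i):
    """The n-ary projection var_n(i) : Pi[A_1..A_n] -> A_i, 1 <= i <= n."""
    if n == 1:
        return lambda x: x            # Pi[A_1] is A_1 itself
    if i == n:
        return snd                    # last component of a left-nested pair
    return compose(var(n - 1, i), fst)

def tuple_of(gammas):
    """<...<g_1, g_2>, ..., g_n> : Pi(Delta) -> Pi[A_1..A_n]."""
    if not gammas:
        return terminal
    arrow = gammas[0]
    for g in gammas[1:]:
        arrow = pair(arrow, g)
    return arrow

def subst(a, gammas):
    """a[(g_1, ..., g_n)] = a o <...<g_1, g_2>, ..., g_n>."""
    return compose(a, tuple_of(gammas))
```

For a context of three types, an environment has the shape ((x1, x2), x3); substituting (var_3(3), var_3(1)) into a term over a two-element context then simply feeds the third and first components to that term.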
where, given F : C → D, L(F) : L(C) → L(D) is obtained by letting F act component-wise. It is thus clear that all structure is preserved. Finally, C and L do not yet form an equivalence. Indeed, L(C(C)) is always contextual since its contexts are inductively generated, whereas C might not be. For instance, any context of C which is not obtained as an iterated context extension of types is not accounted for in L(C(C)). However, we have:

Theorem 3 The functors C and L form an equivalence of categories:

\[ L : \mathbf{CC}_s \;\rightleftarrows\; \mathbf{Scwf}^{N_1,\times}_{\mathrm{ctx}} : C \]

Proof It only remains to observe that for any contextual scwf C with finite product types we have the isomorphism C ≅ L(C(C)), where this isomorphism sends a context 1 · A_1 · . . . · A_n to [A_1, . . . , A_n].
4.3 Finite products as property

4.3.1 Cartesian categories as property

We now define a notion of cartesian category where finite products are defined as a property of a category.

Definition 16 Let C be a category. It is cartesian (as a property) if there exists a terminal object in C, and if for any two objects A, B ∈ C_0, there exists a cartesian product of A and B, i.e. a triple (P, π, π′) with P ∈ C_0, π : P → A, π′ : P → B satisfying the usual universal property.

By "there exists", we mean mere existence. The choice of terminal objects and the assignment of (P, π, π′) from A and B are not part of the structure. A cartesian category is just a particular kind of category with no additional data. Likewise, cartesian functors may be defined as:

Definition 17 A functor F : C → D is cartesian if the image of a terminal object is terminal, and the image of a product (P, π, π′) is a product (F P, F π, F π′). We let CC^2_p be the 2-category of small cartesian categories (with property) as objects, cartesian functors as 1-cells, and natural transformations as 2-cells.

Thus cartesian functors are just certain functors with no additional data. We regard CC^2_p as a 2-category because we will prove a biequivalence rather than an equivalence. Indeed, scwfs have chosen structure while cartesian categories (with property) do not. Going from an scwf to a cartesian category and back, the structure is forgotten and then chosen again, and we cannot recover the original scwf up to isomorphism, only up to equivalence.
Remark 7 We could also introduce a notion of scwf with property, where context comprehension is only a property and not part of the structure. We will discuss this option briefly in the chapter about full cwfs.
4.3.2 Pseudo scwf-morphisms

Previously, we defined a notion of strict scwf-morphism, but this does not match the notion of cartesian functor as property. To address this mismatch we need a notion of pseudo cwf-morphism where structure is only preserved up to isomorphism.

Definition 18 A pseudo scwf-morphism from the scwf C to the scwf D consists of a functor F : C → D, a function F^Ty : Ty_C → Ty_D, and a family

\[ F^{\mathrm{Tm}}_A : \mathrm{Tm}^{C}_{A}(-) \Rightarrow \mathrm{Tm}^{D}_{F^{\mathrm{Ty}} A}(-) \]

of natural transformations. We write F^Tm_{Γ,A} for the component of F^Tm_A on Γ. These data are subject to the conditions that (1) F1 is terminal in D, and (2) for all Γ ∈ C_0, A ∈ Ty_C, the triple (F(Γ · A), F(p_{Γ,A}), F^Tm_{Γ,A}(q_{Γ,A})) is a context comprehension of FΓ and F^Ty(A) in D.

For strict cwf-morphisms, the triple (F(Γ · A), F(p_{Γ,A}), F^Tm_{Γ,A}(q_{Γ,A})) must coincide with the context comprehension of FΓ and F^Ty(A) chosen by the scwf structure of D. Here, we drop that assumption.

For related reasons, the equivalence will work slightly differently than in Section 4.2. There the equivalence followed the slogan "the cartesian category corresponding to an scwf is its category of closed terms". However, in Theorem 3 products in a cartesian category C are used in a central way in the definition of terms in L(C). These can be chosen once and for all, but the preservation of products up to isomorphism leads to an unwieldy definition of the functorial action of L. Instead, we adopt a simpler approach under the slogan "the cartesian category corresponding to an scwf is its base category". With the aim of getting a biequivalence rather than an equivalence, the requirement of contextuality will be replaced by the weaker democracy. This additional structure must then also be preserved.
Definition 19 A democratic pseudo scwf-morphism between democratic scwfs C and D additionally has, for each Γ ∈ C_0, an isomorphism d_Γ : F^Ty(Γ̄) ≅ \overline{FΓ} in the category Ty C (1), subject to a coherence diagram expressing that F(γ_Γ) = γ_{FΓ} modulo some transports (see footnote 1 and [12], p.19).

A democratic scwf always has finite product types defined by N1 = \overline{1} and A × B = \overline{1 · A · B}. So our biequivalence holds without adding these as extra structure.

Footnote 1. Using the existing structure one can define a canonical morphism from \overline{FΓ} to F^Ty(Γ̄) in Ty C (1). The coherence diagram amounts to d_Γ being its inverse; hence F being democratic amounts to requiring that the canonical map from \overline{FΓ} to F^Ty(Γ̄) is an iso.
Definition 20 We write Scwf^2_dem for the 2-category with democratic small scwfs as objects, democratic pseudo scwf-morphisms as 1-cells, and natural transformations between the underlying functors as 2-cells.

One may wonder why the definition of 2-cells does not have a type component. Such a component could be added, but then the biequivalence requires a coherence law which makes it redundant (see [10], p.7 and the expanded discussion in Appendix B).
4.3.3 The biequivalence

We now provide the components of the biequivalence. First we observe that there is a forgetful 2-functor C2 : Scwf^2_dem → CC^2_p which maps a democratic scwf to its base category, a democratic pseudo cwf-morphism to its base functor, and leaves 2-cells unchanged. That this is well-defined relies on the facts that (1) if C is a democratic scwf, then the base category C is cartesian (with property), because products in C may be defined as Γ × Δ = Γ · Δ̄; (2) if F is a democratic pseudo cwf-morphism, then the fact that the base functor preserves products follows from preservation of context comprehension and democracy. We now construct a 2-functor in the other direction.

Proposition 6 There is a 2-functor L2 : CC^2_p → Scwf^2_dem.

Proof In every cartesian category (with property) C we choose a terminal object 1 and a product (A × B, fst_{A,B}, snd_{A,B}) for every A, B ∈ C_0. We now turn C into an scwf in the following way. Its category of contexts is C. Its set of types is Ty = C_0. If Γ ∈ C_0 and A ∈ Ty, the terms are Tm_C(Γ, A) = C(Γ, A). Context comprehension is given by finite products. Democracy is the isomorphism Γ ≅ 1 × Γ. Likewise, a cartesian functor F : C → D is extended to types and terms in the obvious way, and yields a democratic pseudo cwf-morphism.

Theorem 4 The 2-functors C2 and L2 form a biequivalence of 2-categories:

\[ L_2 : \mathbf{CC}^2_p \;\rightleftarrows\; \mathbf{Scwf}^2_{\mathrm{dem}} : C_2 \]
Proof We have C2 L2 = I_{CC^2_p}. To obtain a biequivalence we must construct pseudonatural transformations of 2-functors (or pseudofunctors)

\[ \eta : I_{\mathbf{Scwf}^2_{\mathrm{dem}}} \;\rightleftarrows\; L_2 C_2 : \varepsilon \]

which are inverses up to invertible modifications. Concretely, we construct, for each democratic scwf C, pseudo cwf-morphisms η_C : C → L2 C2 C and ε_C : L2 C2 C → C, both with the identity functor as base component. Besides, we have η^Ty_C(A) = 1.A and ε^Ty_C(Γ) = Γ̄ on types, and

\[
\eta^{\mathrm{Tm}}_{C,\Gamma,A}(a) = \langle \langle\rangle, a \rangle \qquad
\varepsilon^{\mathrm{Tm}}_{C,\Delta,\Gamma}(\gamma) = q_{1,\overline{\Gamma}}[\gamma_\Gamma \circ \gamma]
\]

on terms. These two pseudo cwf-morphisms are pseudonatural in C, and form an equivalence in Scwf^2_dem: η_C ∘ ε_C and id_{L2 C2 C} (resp. ε_C ∘ η_C and id_C) are related by invertible 2-cells with the identity as component. Finally, by unfolding definitions it follows that the invertible 2-cells satisfy the coherence condition for modifications between pseudonatural transformations.

Together, Theorems 3 and 4 give two different points of view on the correspondence between scwfs and cartesian categories. It is interesting that we need to use a biequivalence not just for Martin-Löf's type theory and locally cartesian closed categories as in [12], but already for cartesian categories if we omit chosen structure. For cartesian categories we can build both an equivalence (with structure) and a biequivalence (with property), whereas it seems that only the latter is possible for finitely complete categories and locally cartesian closed categories.
4.4 Adding function types

Before going on to the dependently typed case, we mention how to add function types to the previous equivalences and biequivalences. First, we recall:

Definition 21 A cartesian closed category (with structure) is a cartesian category (with structure) and an operation which to A, B ∈ C_0 assigns an object A ⇒ B ∈ C_0 with an evaluation ε_{A,B} : (A ⇒ B) × A → B such that for all f : C × A → B, there is a unique h : C → A ⇒ B such that ε_{A,B} ∘ (h × A) = f. We write CCC_s for the category having small cccs (with structure) as objects, and as morphisms, functors F : C → D preserving the structure on the nose, i.e. F(A ⇒^C B) = FA ⇒^D FB, and F(ε^C_{A,B}) = ε^D_{FA,FB}.

Likewise, we may add function types to scwfs in the following way.

Definition 22 A ⇒-structure on an scwf C consists of, for each Γ ∈ C_0 and A, B ∈ Ty, a type A ⇒ B together with term formers

\[
\begin{aligned}
\lambda_{\Gamma,A,B} &: \mathrm{Tm}(\Gamma \cdot A, B) \to \mathrm{Tm}(\Gamma, A \Rightarrow B) \\
\mathrm{ap}_{\Gamma,A,B} &: \mathrm{Tm}(\Gamma, A \Rightarrow B) \times \mathrm{Tm}(\Gamma, A) \to \mathrm{Tm}(\Gamma, B)
\end{aligned}
\]

such that, for a ∈ Tm(Γ, A), b ∈ Tm(Γ · A, B), c ∈ Tm(Γ, A ⇒ B), γ ∈ C(Δ, Γ):

\[
\begin{aligned}
\lambda_{\Gamma,A,B}(b)[\gamma] &= \lambda_{\Delta,A,B}(b[\langle \gamma \circ p_{\Delta,A}, q_{\Delta,A} \rangle]) \\
\mathrm{ap}_{\Gamma,A,B}(c, a)[\gamma] &= \mathrm{ap}_{\Delta,A,B}(c[\gamma], a[\gamma]) \\
\mathrm{ap}_{\Gamma,A,B}(\lambda_{\Gamma,A,B}(b), a) &= b[\langle \mathrm{id}_\Gamma, a \rangle] \\
\lambda_{\Gamma,A,B}(\mathrm{ap}_{\Gamma \cdot A,A,B}(c[p_{\Gamma,A}], q_{\Gamma,A})) &= c .
\end{aligned}
\]

If C, D have ⇒-structure, a (strict) cwf-morphism F : C → D preserves it if F(A ⇒^C B) = FA ⇒^D FB, and F(ap^C_{Γ,A,B}(c, a)) = ap^D_{FΓ,FA,FB}(F(c), F(a)).

Remark 8 A ⇒-structure on (C, Ty, Tm) is equivalent to the data of a binary type former ⇒ and a natural isomorphism of presheaves

\[ \mathrm{Tm}(- \cdot A, B) \cong \mathrm{Tm}(-, A \Rightarrow B) , \]

i.e. a representation of Tm(− · A, B). For γ : Δ → Γ, the functorial action Tm(γ · A, B) takes b ∈ Tm(Γ · A, B) to b[⟨γ ∘ p, q⟩] ∈ Tm(Δ · A, B). We recover ap_{Γ,A,B}(f, a) = λ^{-1}(f)[⟨id, a⟩] and derive the β- and η-rules, and vice versa.

Let us write Scwf^{N1,×,⇒}_ctx for the category having as objects small contextual scwfs with an N1-structure, a ×-structure and a ⇒-structure, and as morphisms the strict cwf-morphisms preserving this structure on the nose.

Theorem 5 The categories CCC_s and Scwf^{N1,×,⇒}_ctx are equivalent.
Proof Straightforward extension of Theorem 3, which boils down to the definition of evaluation from application and vice versa.

This is our version of one of the main results of the Lambek and Scott book, namely Theorem 3.11, the equivalence between simply-typed λ-calculi and cartesian closed categories, where Scwf^{N1,×,⇒}_ctx plays the role of the category of typed λ-calculi. And indeed, the simply-typed λ-calculus arises as the free scwf with ⇒-structure over a set of types. If B is a set of basic types, we consider B-Scwf^⇒_ctx to be the category with objects scwfs with ⇒-structure together with an interpretation function ⟦−⟧_C : B → Ty, and as morphisms the strict scwf-morphisms that preserve ⇒-structure and commute with the interpretation. Then we have:

Proposition 7 For all sets B, the category B-Scwf^⇒ has an initial object.

Construction 1. Just as for ucwfs and scwfs, we can immediately turn the definition of scwf with ⇒-structure into an inductive definition of a free such. The resulting theory is a well-scoped variable-free version of the typed λσ-calculus with base types in B.
Construction 2. Alternatively, we can construct an scwf with ⇒-structure free over B from a well-scoped version of the simply-typed λβη-calculus. The construction follows that of Proposition 4, except that the terms are now inductively defined with the following three constructors:

\[
\begin{aligned}
\mathrm{var}_n(i) &: \mathrm{Tm}([A_1, \ldots, A_i, \ldots, A_n], A_i) \qquad (i \in \mathrm{Var}(n)) \\
\lambda_{\Gamma,A,B} &: \mathrm{Tm}(\Gamma \cdot A, B) \to \mathrm{Tm}(\Gamma, A \Rightarrow B) \\
\mathrm{ap}_{\Gamma,A,B} &: \mathrm{Tm}(\Gamma, A \Rightarrow B) \times \mathrm{Tm}(\Gamma, A) \to \mathrm{Tm}(\Gamma, B) .
\end{aligned}
\]

The definition of substitution is then extended with

\[
\begin{aligned}
\lambda_{\Gamma,A,B}(b)[\gamma] &= \lambda_{\Delta,A,B}(b[\langle \gamma \circ p_{\Delta,A}, q_{\Delta,A} \rangle]) \\
\mathrm{ap}_{\Gamma,A,B}(c, a)[\gamma] &= \mathrm{ap}_{\Delta,A,B}(c[\gamma], a[\gamma])
\end{aligned}
\]

for γ ∈ C(Δ, Γ). Finally we quotient with the equivalence relation generated by βη-conversion:

\[
\begin{aligned}
\mathrm{ap}_{\Gamma,A,B}(\lambda_{\Gamma,A,B}(b), a) &\sim b[\langle \mathrm{id}_\Gamma, a \rangle] \\
\lambda_{\Gamma,A,B}(\mathrm{ap}_{\Gamma \cdot A,A,B}(c[p_{\Gamma,A}], q_{\Gamma,A})) &\sim c
\end{aligned}
\]

The two constructions and the proof of their equivalence have been formalised in Agda by Brilakis [7]. We can of course construct free objects in Scwf^{N1,×,⇒} and other categories of scwfs with extra type structure in a similar way.

Likewise, we can show a biequivalence between the 2-category of cccs as property rather than structure, and the extension of the 2-category Scwf^2_dem where scwfs additionally have a ⇒-structure, which is preserved up to isomorphism. We omit the details since they are similar to those in the proof of Theorem 4.
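The three term constructors above are indexed by contexts and types, so that only well-typed terms exist. A hedged way to experiment with them is to work with raw terms instead and recompute the indices. The Python sketch below does this under assumptions of our own: base types are strings, A ⇒ B is the tuple ("=>", A, B), λ carries its domain type as an annotation, and variables are numbered so that the last entry of the context is the most recently bound one (matching q = var_{n+1}(n + 1)). It illustrates the typing discipline only and does not implement the βη-quotient.

```python
# Type inference for raw simply-typed terms (illustrative, annotated lambdas).

def arrow(a, b):
    return ("=>", a, b)

def var(i):
    return ("var", i)

def lam(dom, body):
    return ("lam", dom, body)

def ap(c, a):
    return ("ap", c, a)

def infer(ctx, t):
    """Return the type A such that t would live in Tm(ctx, A), or raise."""
    tag = t[0]
    if tag == "var":
        i = t[1]
        if not 1 <= i <= len(ctx):
            raise TypeError(f"variable {i} is not in scope")
        return ctx[i - 1]
    if tag == "lam":
        _, dom, body = t
        # lam : Tm(ctx . A, B) -> Tm(ctx, A => B)
        return arrow(dom, infer(ctx + (dom,), body))
    if tag == "ap":
        fun_ty = infer(ctx, t[1])
        arg_ty = infer(ctx, t[2])
        is_arrow = isinstance(fun_ty, tuple) and fun_ty[0] == "=>"
        if not (is_arrow and fun_ty[1] == arg_ty):
            raise TypeError("ill-typed application")
        # ap : Tm(ctx, A => B) x Tm(ctx, A) -> Tm(ctx, B)
        return fun_ty[2]
    raise ValueError(f"unknown term {t!r}")
```

For example, in the context ("o", arrow("o", "b")) the application ap(var(2), var(1)) is assigned the type "b", whereas ap(var(1), var(1)) is rejected.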
5 Dependently typed categories with families We now return to full dependently typed cwfs. After discussing alternative definitions, we show an explicit construction of a free cwf. We then add the type formers I, Σ, and Π and give an overview of the biequivalence theorems in Clairambault and Dybjer [11, 12]. Finally we outline the construction of a bifree lccc and the proof of undecidability of equality [9, 10] in this lccc.
5.1 Plain cwfs

5.1.1 Alternative definitions

Context comprehension via representable presheaves. First, observe that the family-valued presheaf T : C^op → Fam may equivalently be given by two Set-valued presheaves

\[ \mathrm{Ty} : C^{op} \to \mathbf{Set} \qquad \mathrm{Tm} : \Bigl(\int_C \mathrm{Ty}\Bigr)^{op} \to \mathbf{Set} \]

where ∫_C Ty is the category of elements of the first presheaf. Context comprehension for full cwfs is equivalent to requiring that for all Γ ∈ C_0 and A ∈ Ty(Γ), there is a representation Γ · A of the presheaf

\[
\Delta \mapsto \sum_{\gamma \in C(\Delta,\Gamma)} \mathrm{Tm}(\Delta, A[\gamma]) \qquad
\delta \mapsto \bigl( (\gamma, a) \mapsto (\gamma \circ \delta, a[\delta]) \bigr)
\]
Remark 9 Alternatively, we could consider a weaker notion of cwf with context comprehension as a property rather than structure. We then only require that the above presheaves are representable, that is, we only require the mere existence of the representing objects.

Natural models. More radically, Awodey [4] and Fiore [18] propose to replace the Fam-valued presheaf T : C^op → Fam by two Set-valued presheaves

\[ \mathrm{Ty} : C^{op} \to \mathbf{Set} \qquad \mathrm{Tm} : C^{op} \to \mathbf{Set} \]

and a natural transformation typeof : Tm ⇒ Ty. One can then define context comprehension in terms of representable natural transformations. Let Y : C → Set^{C^op} be the Yoneda embedding. A natural transformation σ : G ⇒ F between presheaves on C is representable in the sense of Grothendieck if for all C ∈ C_0 and c ∈ F(C), there are D ∈ C_0, p ∈ C(D, C), and d ∈ G(D), such that the following diagram in the category of presheaves is a pullback:

\[
\begin{array}{ccc}
Y(D) & \xrightarrow{\;d\;} & G \\
{\scriptstyle Y(p)}\big\downarrow & & \big\downarrow{\scriptstyle \sigma} \\
Y(C) & \xrightarrow{\;c\;} & F
\end{array}
\]

where c and d in the diagram are shorthand for the corresponding respective natural transformations F(−)(c) and G(−)(d) from the Yoneda lemma.
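Hence, typeof : Tm ⇒ Ty is representable provided for all Γ ∈ C_0 and A ∈ Ty(Γ), there is Γ · A ∈ C_0, p_{Γ,A} ∈ C(Γ · A, Γ), and q_{Γ,A} ∈ Tm(Γ · A), such that the following diagram is a pullback:

\[
\begin{array}{ccc}
C(-, \Gamma \cdot A) & \xrightarrow{\;q_{\Gamma,A}[-]\;} & \mathrm{Tm} \\
{\scriptstyle p_{\Gamma,A} \circ -}\big\downarrow & & \big\downarrow{\scriptstyle \mathrm{typeof}} \\
C(-, \Gamma) & \xrightarrow{\;A[-]\;} & \mathrm{Ty}
\end{array}
\]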
We emphasize that the function that maps Γ and A to the triple Γ · A, p_{Γ,A}, q_{Γ,A} is part of the structure: the natural transformation typeof : Tm ⇒ Ty is represented. Awodey [4] and Newstead [33] study this notion under the name natural models. We refer to their papers for further development of the theory. Note that this approach suggests an essentially algebraic view of cwfs rather than a generalized algebraic one. Here we have only one sort Tm(Γ) containing all terms a of some type A = typeof(a) ∈ Ty(Γ). In other words terms are "fibred" over types.

Remark 10 Alternatively, we get a notion of natural model with context comprehension as property if we only require the mere representability of the natural transformation typeof. This notion is considered by Ahrens, Lumsdaine, and Voevodsky [3].

Categories with attributes. There is a certain redundancy in the definition of categories with families, since we can show (using context comprehension) that terms are in one-to-one correspondence with certain morphisms of the base category:

\[ \mathrm{Tm}(\Gamma, A) \cong \{ \gamma \in C(\Gamma, \Gamma \cdot A) \mid p_{\Gamma,A} \circ \gamma = \mathrm{id}_\Gamma \} \]

In other words, terms in Tm(Γ, A) correspond to sections of p_{Γ,A}, the display map for the type A. We can thus remove the term part of cwfs and get the closely related notion of a category with attributes (cwa) [20]. This consists of a category C with a terminal object, a presheaf Ty : C^op → Set for types and substitutions, and an operation which given Γ ∈ C_0 and A ∈ Ty(Γ) associates a context Γ · A and a "display map" p_{Γ,A} : Γ · A → Γ; and for each γ : Δ → Γ, a chosen pullback square:

\[
\begin{array}{ccc}
\Delta \cdot A[\gamma] & \longrightarrow & \Gamma \cdot A \\
{\scriptstyle p_{\Delta,A[\gamma]}}\big\downarrow & & \big\downarrow{\scriptstyle p_{\Gamma,A}} \\
\Delta & \xrightarrow{\;\gamma\;} & \Gamma
\end{array}
\]
This follows the idea of "substitutions as pullbacks" familiar in categorical logic. This choice of pullbacks is finally required to be split, in the sense that the association of substitutions to pullback squares is functorial. It is fairly easy to prove that categories with attributes are equivalent to categories with families [23]. This proof and several other proofs relating models of dependent type theory are formalized in the UniMath system by Ahrens, Lumsdaine, and Voevodsky [3].

Categories with attributes predate categories with families. In fact categories with families were originally introduced [17] as a modification of cwas. The main point of the change was to obtain a definition that can be expressed as a generalised algebraic theory with a transparent connection with Martin-Löf's explicit substitution calculus formulation of dependent type theory. This was achieved by making the family of terms into an explicit part of the definition and formalizing the sets of types and terms and their substitution operations as a family-valued presheaf. In this way the pullback property of type substitution could be removed from the definition since it can be derived from the other parts of the structure.
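The correspondence between terms and sections of display maps can be checked by brute force in a small set-theoretic instance. The sketch below assumes the familiar model in which a context is a finite set, a type over Γ is a family of finite sets indexed by Γ, Γ · A is the disjoint union of the fibres, and p is the first projection; this model and all function names are our own choices for illustration and are not taken from the chapter.

```python
from itertools import product

# Finite set-theoretic instance: contexts are lists, a type over ctx is a
# dict sending each element of ctx to its fibre (a list).

def comprehension(ctx, ty):
    """ctx . ty: all pairs (g, x) with g in ctx and x in the fibre ty[g]."""
    return [(g, x) for g in ctx for x in ty[g]]

def terms(ctx, ty):
    """Tm(ctx, ty): all dependent functions a with a[g] in ty[g]."""
    return [dict(zip(ctx, choice))
            for choice in product(*(ty[g] for g in ctx))]

def sections(ctx, ty):
    """All maps s : ctx -> ctx . ty with p(s(g)) == g, p the first projection."""
    ext = comprehension(ctx, ty)
    candidates = (dict(zip(ctx, choice))
                  for choice in product(ext, repeat=len(ctx)))
    return [s for s in candidates if all(s[g][0] == g for g in ctx)]

def term_to_section(a):
    return {g: (g, x) for g, x in a.items()}

def section_to_term(s):
    return {g: gx[1] for g, gx in s.items()}
```

Enumerating both sides for a two-element context with fibres of sizes 2 and 3 gives six terms and six sections, and the two translations are mutually inverse on them.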
5.1.2 A free cwf As in the previous sections, we next show how to build a free cwf. Recall that in the simply-typed case, we built a free scwf over a set B of basic types, and we could do the same here. However, for simplicity, and because it suffices to prove the undecidability theorem, we will assume that there is only one basic type o. The rules used for the construction of this plain free cwf (except the rule for the base type) correspond to the general rules for dependent type theory. This construction can then be extended to the case where we add rules for specific type formers. The construction of initial ucwfs and free scwfs are rather immediate from the definitions: simply turn the definitions into simultaneous inductive definitions of the families of terms and substitutions, and then define equality of terms and substitutions by another simultaneous inductive definition. Unfortunately, the construction of free full cwfs is no longer as direct. What complicates the matter is the type-equality rule, which means that typability of terms may depend on proofs of equality of types. Thus we have to define equality of contexts, context morphisms, types, and terms simultaneously with their elements. Apart from this the recipe is similar: take the definition of the generalised algebraic theory of cwfs and turn it into a mutual inductive definition where all equality reasoning is made explicit. To build the free cwf we first define raw contexts, raw substitutions, raw types, and raw terms.
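\[
\begin{array}{rcl}
\Gamma \in \mathrm{Ctx} & ::= & 1 \mid \Gamma \cdot A \\
\gamma \in \mathrm{Sub} & ::= & \gamma \circ \gamma \mid \mathrm{id}_\Gamma \mid \langle\rangle_\Gamma \mid \mathrm{p}_A \mid \langle \gamma, a \rangle_A \\
A \in \mathrm{Ty} & ::= & o \mid A[\gamma] \\
a \in \mathrm{Tm} & ::= & a[\gamma] \mid \mathrm{q}_A
\end{array}
\]

We then need to define the well-formed contexts and types, and the well-typed substitutions and terms. In Martin-Löf's substitution calculus these are defined by a system of inference rules for all the eight forms of judgments. Here we choose a more economical way, by only defining well-formed equal contexts and types, and the well-typed equal substitutions and terms. Thus we define four families of partial equivalence relations (pers), corresponding to the four forms of equality judgments, by a mutual inductive definition:

\[
\Gamma = \Gamma' \vdash
\qquad
\Gamma \vdash A = A'
\qquad
\Delta \vdash \gamma = \gamma' : \Gamma
\qquad
\Gamma \vdash a = a' : A
\]

where Γ, Γ′ ∈ Ctx, γ, γ′ ∈ Sub, A, A′ ∈ Ty, and a, a′ ∈ Tm. The basic judgment forms can then be defined as the reflexive instances of the pers:

• Γ ⊢ abbreviates Γ = Γ ⊢,
• Γ ⊢ A abbreviates Γ ⊢ A = A,
• Δ ⊢ γ : Γ abbreviates Δ ⊢ γ = γ : Γ,
• Γ ⊢ a : A abbreviates Γ ⊢ a = a : A.

The four families of partial equivalence relations (pers) are given by a simultaneous inductive definition with the following introduction rules:

Per-rules

\[
\dfrac{\Gamma = \Gamma' \vdash \quad \Gamma' = \Gamma'' \vdash}{\Gamma = \Gamma'' \vdash}
\qquad
\dfrac{\Delta \vdash \gamma = \gamma' : \Gamma \quad \Delta \vdash \gamma' = \gamma'' : \Gamma}{\Delta \vdash \gamma = \gamma'' : \Gamma}
\]
\[
\dfrac{\Gamma \vdash A = A' \quad \Gamma \vdash A' = A''}{\Gamma \vdash A = A''}
\qquad
\dfrac{\Gamma \vdash a = a' : A \quad \Gamma \vdash a' = a'' : A}{\Gamma \vdash a = a'' : A}
\]
\[
\dfrac{\Gamma = \Gamma' \vdash}{\Gamma' = \Gamma \vdash}
\qquad
\dfrac{\Delta \vdash \gamma = \gamma' : \Gamma}{\Delta \vdash \gamma' = \gamma : \Gamma}
\qquad
\dfrac{\Gamma \vdash A = A'}{\Gamma \vdash A' = A}
\qquad
\dfrac{\Gamma \vdash a = a' : A}{\Gamma \vdash a' = a : A}
\]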
Categories with Families: Unityped, Simply Typed, and Dependently Typed
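Preservation rules for judgments

\[
\dfrac{\Gamma = \Gamma' \vdash \quad \Delta = \Delta' \vdash \quad \Gamma \vdash \gamma = \gamma' : \Delta}{\Gamma' \vdash \gamma = \gamma' : \Delta'}
\qquad
\dfrac{\Gamma = \Gamma' \vdash \quad \Gamma \vdash A = A'}{\Gamma' \vdash A = A'}
\]
\[
\dfrac{\Gamma = \Gamma' \vdash \quad \Gamma \vdash A = A' \quad \Gamma \vdash a = a' : A}{\Gamma' \vdash a = a' : A'}
\]

Congruence rules for operators

\[
\dfrac{}{1 = 1 \vdash}
\qquad
\dfrac{\Gamma = \Gamma' \vdash \quad \Gamma \vdash A = A'}{\Gamma \cdot A = \Gamma' \cdot A' \vdash}
\qquad
\dfrac{\Gamma \vdash A = A' \quad \Delta \vdash \gamma = \gamma' : \Gamma}{\Delta \vdash A[\gamma] = A'[\gamma']}
\]
\[
\dfrac{\Gamma = \Gamma' \vdash}{\Gamma \vdash \mathrm{id}_\Gamma = \mathrm{id}_{\Gamma'} : \Gamma}
\qquad
\dfrac{\Gamma = \Gamma' \vdash}{\Gamma \vdash \langle\rangle_\Gamma = \langle\rangle_{\Gamma'} : 1}
\qquad
\dfrac{\Gamma \vdash \delta = \delta' : \Delta \quad \Delta \vdash \gamma = \gamma' : \Theta}{\Gamma \vdash \gamma \circ \delta = \gamma' \circ \delta' : \Theta}
\]
\[
\dfrac{\Gamma \vdash A = A'}{\Gamma \cdot A \vdash \mathrm{p}_A = \mathrm{p}_{A'} : \Gamma}
\qquad
\dfrac{\Gamma \vdash A = A'}{\Gamma \cdot A \vdash \mathrm{q}_A = \mathrm{q}_{A'} : A[\mathrm{p}_A]}
\]
\[
\dfrac{\Gamma \vdash A = A' \quad \Delta \vdash \gamma = \gamma' : \Gamma \quad \Delta \vdash a = a' : A[\gamma]}{\Delta \vdash \langle \gamma, a \rangle_A = \langle \gamma', a' \rangle_{A'} : \Gamma \cdot A}
\qquad
\dfrac{\Gamma \vdash a = a' : A \quad \Delta \vdash \gamma = \gamma' : \Gamma}{\Delta \vdash a[\gamma] = a'[\gamma'] : A[\gamma]}
\]

Conversion rules

\[
\dfrac{\Delta \vdash \theta : \Theta \quad \Gamma \vdash \delta : \Delta \quad \Xi \vdash \gamma : \Gamma}{\Xi \vdash (\theta \circ \delta) \circ \gamma = \theta \circ (\delta \circ \gamma) : \Theta}
\qquad
\dfrac{\Gamma \vdash \gamma : \Delta}{\Gamma \vdash \gamma = \mathrm{id}_\Delta \circ \gamma : \Delta}
\qquad
\dfrac{\Gamma \vdash \gamma : \Delta}{\Gamma \vdash \gamma = \gamma \circ \mathrm{id}_\Gamma : \Delta}
\]
\[
\dfrac{\Gamma \vdash A \quad \Delta \vdash \gamma : \Gamma \quad \Theta \vdash \delta : \Delta}{\Theta \vdash A[\gamma \circ \delta] = (A[\gamma])[\delta]}
\qquad
\dfrac{\Gamma \vdash A}{\Gamma \vdash A[\mathrm{id}_\Gamma] = A}
\]
\[
\dfrac{\Gamma \vdash a : A \quad \Delta \vdash \gamma : \Gamma \quad \Theta \vdash \delta : \Delta}{\Theta \vdash a[\gamma \circ \delta] = (a[\gamma])[\delta] : (A[\gamma])[\delta]}
\qquad
\dfrac{\Gamma \vdash a : A}{\Gamma \vdash a[\mathrm{id}_\Gamma] = a : A}
\]
\[
\dfrac{\Gamma \vdash \gamma : 1}{\Gamma \vdash \gamma = \langle\rangle_\Gamma : 1}
\qquad
\dfrac{\Gamma \vdash A \quad \Delta \vdash \gamma : \Gamma \quad \Delta \vdash a : A[\gamma]}{\Delta \vdash \mathrm{p}_A \circ \langle \gamma, a \rangle_A = \gamma : \Gamma}
\]
\[
\dfrac{\Gamma \vdash A \quad \Delta \vdash \gamma : \Gamma \quad \Delta \vdash a : A[\gamma]}{\Delta \vdash \mathrm{q}_A[\langle \gamma, a \rangle_A] = a : A[\gamma]}
\qquad
\dfrac{\Delta \vdash \gamma : \Gamma \cdot A}{\Delta \vdash \gamma = \langle \mathrm{p}_A \circ \gamma, \mathrm{q}_A[\gamma] \rangle_A : \Gamma \cdot A}
\]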
These rules correspond to the general rules for intuitionistic type theory, that is, the rules which are given before the rules for the type formers, see the discussion in Section 2.1.2. Finally, we have a rule for the base type:

Base type

\[
\dfrac{}{1 \vdash o = o}
\]
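To make the raw syntax concrete, the following is a minimal sketch of the four raw syntactic categories as Python data types, together with a printer. The class names and the `show` function are illustrative assumptions, not part of the construction in the text, and no well-formedness is checked: raw expressions are mere trees, and it is the pers defined by the rules above that single out the well-formed ones.

```python
from dataclasses import dataclass

# Raw contexts:  1 | Γ · A   (class names are hypothetical, for illustration)
@dataclass(frozen=True)
class Empty:
    pass

@dataclass(frozen=True)
class Ext:
    ctx: object
    ty: object

# Raw types:  o | A[γ]
@dataclass(frozen=True)
class Base:
    pass

@dataclass(frozen=True)
class TySub:
    ty: object
    sub: object

# Raw substitutions:  γ ∘ γ | id_Γ | ⟨⟩_Γ | p_A | ⟨γ, a⟩_A
@dataclass(frozen=True)
class Comp:
    left: object
    right: object

@dataclass(frozen=True)
class Id:
    ctx: object

@dataclass(frozen=True)
class Unit:
    ctx: object

@dataclass(frozen=True)
class P:
    ty: object

@dataclass(frozen=True)
class Pair:
    sub: object
    tm: object
    ty: object

# Raw terms:  a[γ] | q_A
@dataclass(frozen=True)
class TmSub:
    tm: object
    sub: object

@dataclass(frozen=True)
class Q:
    ty: object

def show(x):
    """Print a raw expression; this performs no type checking at all."""
    if isinstance(x, Empty):
        return "1"
    if isinstance(x, Ext):
        return f"{show(x.ctx)}·{show(x.ty)}"
    if isinstance(x, Base):
        return "o"
    if isinstance(x, (TySub, TmSub)):
        inner = x.ty if isinstance(x, TySub) else x.tm
        return f"{show(inner)}[{show(x.sub)}]"
    if isinstance(x, Comp):
        return f"({show(x.left)} ∘ {show(x.right)})"
    if isinstance(x, Id):
        return f"id_{{{show(x.ctx)}}}"
    if isinstance(x, Unit):
        return f"⟨⟩_{{{show(x.ctx)}}}"
    if isinstance(x, P):
        return f"p_{{{show(x.ty)}}}"
    if isinstance(x, Q):
        return f"q_{{{show(x.ty)}}}"
    if isinstance(x, Pair):
        return f"⟨{show(x.sub)}, {show(x.tm)}⟩_{{{show(x.ty)}}}"
    raise TypeError(f"not a raw expression: {x!r}")

# The raw substitution ⟨p_o ∘ id, q_o[id]⟩_o over the context 1·o,
# an instance of the right-hand side of the last conversion rule.
ctx = Ext(Empty(), Base())
eta = Pair(Comp(P(Base()), Id(ctx)), TmSub(Q(Base()), Id(ctx)), Base())
print(show(eta))
```

Frozen dataclasses give structural equality on raw expressions, which is strictly finer than the derivable equality used to form the quotients of Theorem 6.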
Theorem 6 The cwf T, defined with the following data:

• T_0 = {Γ | Γ ⊢}/=_c, where Γ =_c Γ′ if Γ = Γ′ ⊢ is derivable.
• T([Γ], [Δ]) = {γ | Γ ⊢ γ : Δ}/=^Γ_Δ, where γ =^Γ_Δ γ′ iff Γ ⊢ γ = γ′ : Δ is derivable. Note that this makes sense since it only depends on the equivalence classes [Γ] and [Δ] of Γ and Δ (morphisms and morphism equality are preserved by object equality).
• Ty_T([Γ]) = {A | Γ ⊢ A}/=^Γ, where A =^Γ B if Γ ⊢ A = B.
• Tm_T([Γ], [A]) = {a | Γ ⊢ a : A}/=^Γ_A, where a =^Γ_A a′ if Γ ⊢ a = a′ : A.

is the free cwf on one base type.

By free cwf on one base type we mean that it is initial in the category Cwf_o having as objects small cwfs with a chosen type o ∈ Ty(1), and as morphisms the strict cwf-morphisms which preserve the chosen base type. We refer to Castellan, Clairambault, and Dybjer [9, 10] for the proof of freeness of T.

In the papers mentioned above, we show that the free cwf is also bifree in the fully dependent version Cwf^2_dem of the 2-category Scwf^2_dem of Section 4.3.3 (with a base type). Before we consider additional structure, let us define the morphisms of this 2-category, which will play an important role later on.

Definition 23 A pseudo-cwf morphism from a cwf C to a cwf D is a pair (F, σ) where F : C → D is a functor and for each Γ ∈ C0, σ_Γ is a Fam-morphism from TΓ to TFΓ preserving the structure up to isomorphism. In particular there are isomorphisms (again writing F for all components):

\[
\begin{array}{rcll}
\theta_{A,\gamma} : FA[F\gamma] & \cong_{F\Gamma} & F(A[\gamma]) & (\text{for } \gamma : \Gamma \to \Delta) \\
!_F : 1 & \cong & F1 & \\
\rho_{\Gamma,A} : F(\Gamma \cdot A) & \cong & F\Gamma \cdot FA &
\end{array}
\]
where ≅_{FΓ} means that it is an isomorphism in the category Ty_D(FΓ) of types over FΓ in D. These data must satisfy some coherence diagrams (see [12], Definition 3.1 for details). We can now prove the following.

Theorem 7 The cwf T is bifree over one base type, i.e. it is bi-initial in the 2-category Cwf^2_o having as objects small cwfs with one base type, as 1-cells pseudo cwf-morphisms preserving the base type up to iso, and as 2-cells natural transformations between the base functors.
Proof We refer the reader to [10] for details and proofs. We remark that T is democratic and also bi-initial in Cwf^2_{o,dem}.

Previously we defined initial ucwfs and scwfs (with extra structure) in two ways: with and without explicit substitutions. The above construction of the free cwf gives rise to a calculus with explicit substitutions. There is an analogous construction of a free cwf where substitution is instead defined as a meta-operation, which is close to the standard formulation of the general rules for dependent type theory. We do not have space to spell out this construction with implicit substitutions here, but refer to Streicher [36].
5.2 Extensional identity types, Σ-types, and finite limits

5.2.1 Extensional identity types and Σ-types

We now add extensional identity types and Σ-types to cwfs.

Definition 24 An I_ext-structure on a cwf C consists of, for each Γ ∈ C0, A ∈ Ty(Γ) and a, a′ ∈ Tm(Γ, A), a type I_A(a, a′); and a term r_{A,a} ∈ Tm(Γ, I_A(a, a)) such that if c ∈ Tm(Γ, I_A(a, a′)) then a = a′ and c = r_{A,a}, and such that I_A(a, a′)[γ] = I_{A[γ]}(a[γ], a′[γ]) for any γ ∈ C(Δ, Γ).

This captures extensional, rather than intensional, identity types. The difference is that the two equations a = a′ and c = r_{A,a}, whenever Tm(Γ, I_A(a, a′)) is inhabited, are only valid for extensional identity types. It follows that the reflexivity term is preserved by substitution as well: r_{A,a}[γ] is forced to coincide with r_{A[γ],a[γ]} since both inhabit I_{A[γ]}(a[γ], a[γ]).

As in the previous sections, strict cwf-functors between cwfs equipped with an extensional identity type structure are said to preserve it strictly if the action on morphisms maps the components of the source to the components of the target, on the nose. In the remainder of this paper, a more important role will be played by morphisms preserving this structure up to isomorphism. If F : C → D is a pseudo cwf-morphism where C and D are equipped with an I_ext-structure, we say that it preserves it provided there is, for any A ∈ Ty_C(Γ) and a, a′ ∈ Tm_C(Γ, A), an isomorphism F(I_A(a, a′)) ≅ I_{FA}(Fa, Fa′) in Ty_D(FΓ).
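The remark that reflexivity is automatically stable under substitution can be spelled out in two steps. The following is a sketch using only the data of Definition 24, for γ ∈ C(Δ, Γ):

```latex
\begin{align*}
r_{A,a}[\gamma] &\in \mathrm{Tm}\big(\Delta,\ I_A(a,a)[\gamma]\big)
   = \mathrm{Tm}\big(\Delta,\ I_{A[\gamma]}(a[\gamma], a[\gamma])\big)
   && \text{(stability of $I$ under substitution)} \\
r_{A,a}[\gamma] &= r_{A[\gamma],\,a[\gamma]}
   && \text{(any $c \in \mathrm{Tm}(\Delta, I_{A[\gamma]}(a[\gamma], a[\gamma]))$ equals $r_{A[\gamma],a[\gamma]}$)}
\end{align*}
```

No separate stability equation for r therefore needs to be postulated.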
Definition 25 The definition of an N1-structure for a cwf is the same as for an scwf, except that N1 now is also required to be stable under substitution, i.e., a natural transformation of type presheaves 1 ⇒ Ty(−) and, as before, a natural isomorphism between presheaves 1 ≅ Tm_C(−, N1), where again 1 is the constant singleton presheaf.

If a cwf is democratic it has an N1-structure, since N1 may be defined as 1 ∈ Ty(1).

Definition 26 A Σ-structure on a cwf C consists of, for each Γ ∈ C0, A ∈ Ty(Γ), B ∈ Ty(Γ · A), a type Σ(A, B) ∈ Ty(Γ), and term formers

\[
\begin{array}{rcl}
\mathrm{fst}_{\Gamma,A,B}(-) & : & \mathrm{Tm}(\Gamma, \Sigma(A,B)) \to \mathrm{Tm}(\Gamma, A) \\
\mathrm{snd}_{\Gamma,A,B}(-) & : & \Pi_{c \in \mathrm{Tm}(\Gamma, \Sigma(A,B))}\ \mathrm{Tm}(\Gamma, B[\langle \mathrm{id}, \mathrm{fst}(c) \rangle]) \\
\langle -, - \rangle & : & \big(\Sigma_{a \in \mathrm{Tm}(\Gamma, A)}\ \mathrm{Tm}(\Gamma, B[\langle \mathrm{id}, a \rangle])\big) \to \mathrm{Tm}(\Gamma, \Sigma(A,B))
\end{array}
\]

subject to the same equations as in Definition 12, plus the additional

\[
\Sigma(A,B)[\gamma] = \Sigma(A[\gamma], B[\langle \gamma \circ \mathrm{p}, \mathrm{q} \rangle]) .
\]

A Σ-type structure thus gives rise to a natural transformation Σ of type presheaves

\[
\sum_{A \in \mathrm{Ty}_C(\Gamma)} \mathrm{Ty}_C(\Gamma \cdot A) \Rightarrow \mathrm{Ty}_C(\Gamma)
\]

and isomorphisms

\[
\sum_{a \in \mathrm{Tm}_C(\Gamma, A)} \mathrm{Tm}_C(\Gamma, B[\langle \mathrm{id}, a \rangle]) \cong \mathrm{Tm}_C(\Gamma, \Sigma_\Gamma(A,B))
\]

which are stable under substitution (see Definition 12). Note the difference between this and the characterization of a ×-structure as natural isomorphisms between presheaves. Since A ∈ Ty(Γ) and B ∈ Ty(Γ · A) are dependent types, the two sides of the isomorphism are no longer presheaves.

It follows by induction on the length of a context that a contextual cwf with an N1-structure and a Σ-structure is democratic.

As usual, a strict cwf-morphism between cwfs with Σ-structure preserves it if it maps the structure in the source cwf to the structure in the target cwf on the nose. For pseudo-morphisms, it turns out that there is nothing to add. In any cwf C with a Σ-structure, for any Γ ∈ C0, A ∈ Ty(Γ) and B ∈ Ty(Γ · A), we have the isomorphism Γ · A · B ≅ Γ · Σ(A, B). Since pseudo
cwf-morphisms are already known to preserve context extension, it follows that they automatically preserve Σ-structures, in the following sense.

Proposition 8 A pseudo cwf-morphism F from C to D, where both cwfs have a Σ-structure, also preserves it in the sense that there is an isomorphism:

\[
s_{A,B} : F(\Sigma(A,B)) \cong \Sigma(FA, FB[\rho^{-1}_{\Gamma,A}])
\]

such that projections and pairs are preserved, modulo some transports (notably following s_{A,B}, see Proposition 3.5 in [12] for details).

Let us write Cwf^{2,I_ext,Σ}_dem for the 2-category having as objects small democratic cwfs with an I_ext-structure and a Σ-structure; as morphisms the pseudo cwf-morphisms preserving these structures up to isomorphism, and as 2-cells natural transformations between the base functors.
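The isomorphism Γ · A · B ≅ Γ · Σ(A, B) used above can be written down with cwf combinators. The following is one standard choice, given as a sketch: the names φ and ψ are ours, and we assume that the equations referred to in Definition 26 include the usual projection and surjective-pairing laws for Σ.

```latex
\begin{align*}
\varphi &= \big\langle\, \mathrm{p}_A \circ \mathrm{p}_B,\ \langle \mathrm{q}_A[\mathrm{p}_B],\ \mathrm{q}_B \rangle \,\big\rangle
   &&: \Gamma \cdot A \cdot B \to \Gamma \cdot \Sigma(A,B) \\
\psi &= \big\langle\, \langle \mathrm{p}_{\Sigma(A,B)},\ \mathrm{fst}(\mathrm{q}_{\Sigma(A,B)}) \rangle,\ \mathrm{snd}(\mathrm{q}_{\Sigma(A,B)}) \,\big\rangle
   &&: \Gamma \cdot \Sigma(A,B) \to \Gamma \cdot A \cdot B
\end{align*}
```

In φ the inner pair is the Σ-pairing in context Γ · A · B, at type Σ(A, B)[p_A ∘ p_B]; the outer brackets are context-extension pairing. Under the stated assumption, ψ ∘ φ = id follows from fst⟨a, b⟩ = a and snd⟨a, b⟩ = b, and φ ∘ ψ = id from ⟨fst(c), snd(c)⟩ = c, in both cases together with the uniqueness law γ = ⟨p ∘ γ, q[γ]⟩ for context extension.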
5.2.2 The biequivalence with finitely complete categories

In Theorem 4, we proved a biequivalence between democratic scwfs and cartesian categories. We now sketch the dependently typed version: a biequivalence between democratic cwfs with I_ext- and Σ-structures, and finitely complete (also called left exact) categories.

Definition 27 A category C is finitely complete if it has all finite limits. A functor F : C → D between finitely complete categories is left exact if it preserves finite limits: the image of a limiting cone is a limiting cone. We write FL for the 2-category with small finitely complete categories as objects, left exact functors as 1-cells, and natural transformations as 2-cells.

In the light of Section 4, it is natural to insist that we consider finitely complete categories to be categories with property, rather than with additional structure. How could we make a corresponding notion of finitely complete categories with structure? One could ask for a cartesian category with structure additionally equipped with a choice of equalizers; one could ask for a choice of pullbacks (and a terminal object); or one could directly ask for a choice of a limit of any finite diagram. One could then consider a category of these, where structure is preserved on the nose. However, the strict equivalence of Theorem 5 does not extend to this situation: whereas in the simply-typed case we may prove an equivalence with structure or a biequivalence with property, it seems that the only possibility to relate the two in the present case is the biequivalence. We will comment again on that later.

Let us now give some information about the main ingredients of the biequivalence. The first observation is that if C is a cwf with I_ext- and Σ-structures, then for each Γ ∈ C0 there is an equivalence between the category of types over a context Ty(Γ), and the slice category C/Γ. Indeed, each type
A ∈ Ty(Γ) yields a display map p_{Γ,A} : Γ · A → Γ regarded as an object in C/Γ. In the other direction, any γ : Δ → Γ is isomorphic (in C/Γ) to p_{Γ,Inv(γ)} : Γ · Inv(γ) → Γ, a projection corresponding to a type Inv(γ) ∈ Ty(Γ), the "inverse image", defined as (a cwf formalization of) x : Γ ⊢ Σ_{y:Δ} (γ(y) = x) type.

Via this equivalence of categories it follows that for each Γ ∈ C0 the slice category C/Γ has products, that is, C has pullbacks. Since it has a terminal object, it has all finite limits. Likewise, if F : C → D is a 1-cell in Cwf^{2,I_ext,Σ}_dem then it preserves pullbacks in C. In fact, we have an equivalence:

Lemma 2 Let C and D be democratic cwfs with I_ext- and Σ-structures and F : C → D be a pseudo cwf-morphism preserving democracy. Then, F preserves the I_ext-structure if and only if F preserves pullbacks.

Proof The harder direction is only if, which boils down to the preservation of the inverse image. This can be proved from intricate calculations on cwf combinators. Details appear in [12], Lemma 4.3 and Proposition 4.4.

We can then show that there is a forgetful 2-functor:

\[
C : \mathrm{Cwf}^{2,I_{ext},\Sigma}_{dem} \to \mathrm{FL} .
\]
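The other direction is much more complicated. The equivalence of categories Ty(Γ) ≃ C/Γ together with Seely's approach to interpreting type theory in locally cartesian closed categories [35] suggest that, starting from a finitely complete category C, one redefine the types over Γ as the objects of the slice category C/Γ. Now the question is how to define substitution, i.e. −[γ] : (C/Γ)_0 → (C/Δ)_0 for γ : Δ → Γ. In categorical logic, substitution is usually defined by pullback. However, the problem is that for an arbitrary choice of pullbacks, there is no reason why this assignment should be functorial. Consider the following two pullback diagrams:

\[
\begin{array}{ccccc}
\cdot & \longrightarrow & \cdot & \longrightarrow & \Gamma \cdot A \\
\big\downarrow & & \big\downarrow & & \big\downarrow {\scriptstyle \mathrm{p}_{\Gamma,A}} \\
\Omega & \xrightarrow{\ \delta\ } & \Delta & \xrightarrow{\ \gamma\ } & \Gamma
\end{array}
\qquad\qquad
\begin{array}{ccc}
\cdot & \longrightarrow & \Gamma \cdot A \\
\big\downarrow & & \big\downarrow {\scriptstyle \mathrm{p}_{\Gamma,A}} \\
\Omega & \xrightarrow{\ \gamma \circ \delta\ } & \Gamma
\end{array}
\]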
There is no reason why the left hand side diagram, which is a composition of chosen pullbacks, and the right hand side diagram, which is a chosen pullback, should coincide – although they are always isomorphic. In other words the codomain fibration is not split, whereas the fibration implicit in a cwf is always split. This is a fundamental issue: Seely’s proposed interpretation [35] sends types that are provably equal in the syntax to morphisms in C that are only known to be isomorphic. This coherence problem may be solved in
two ways: Curien [16] proposes to change the syntax by weakening equality to isomorphism in the syntax, enriching it with explicit coercions between isomorphic types, and showing the extended syntax equivalent to the original via a difficult coherence theorem. We refer to Curien, Garner, and Hofmann [15] for details. In [21], Hofmann proposes instead to solve the problem by exploiting a construction of Bénabou [6], which associates to each fibration an equivalent split fibration. This construction can be extended to dependent types: given a category C with finite limits, we build a cwf whose types are no longer just objects of C/Γ, but objects of C/Γ with a pre-chosen substitution pullback, for every possible substitution, such that this choice is split. For details, the reader is referred to [12], Section 5. As we show there, Hofmann's construction yields a pseudofunctor:

\[
L : \mathrm{FL} \to \mathrm{Cwf}^{2,I_{ext},\Sigma}_{dem} .
\]

This is not a functor: when sending F : C → D to Cwf^{2,I_ext,Σ}_dem, one must extend F to types, i.e. display maps p together with a fixed choice of substitution pullbacks. But the choice of substitution pullbacks for p in C does not suffice to completely determine a choice of substitution pullbacks for Fp in D, hence those must be chosen, causing L to fail functoriality on the nose. For this reason, it seems unlikely that switching to finite limit categories with structure would yield an equivalence of categories with strict maps, unless one considers categories with finite limits with a split choice of pullbacks².

Theorem 8 There is a biequivalence of 2-categories FL ≃ Cwf^{2,I_ext,Σ}_dem.
Proof Once the mediating pseudofunctors are constructed, the proof is fairly close to that of Theorem 4. The reader is referred to [12], Section 6 for details.
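The coherence issue with chosen pullbacks discussed above can already be observed in finite sets. The sketch below is an illustration under our own assumptions (functions are represented as Python dictionaries, and `chosen_pullback` is a hypothetical helper that picks the usual set of matching pairs); it is not the construction of [12] or [21]. Pulling back along γ and then along δ yields a set that is in bijection with, but not equal to, the chosen pullback along γ ∘ δ.

```python
def chosen_pullback(dom_f, f, dom_p, p):
    """One particular choice of pullback of p along f in finite sets:
    all pairs (d, e) with f[d] == p[e]. Its projection to dom_f is (d, e) -> d."""
    return {(d, e) for d in dom_f for e in dom_p if f[d] == p[e]}

# A "display map" p : E -> Gamma, playing the role of p_{Γ,A}.
Gamma = {0, 1}
E = {"a", "b", "c"}
p = {"a": 0, "b": 0, "c": 1}

# Two composable substitutions delta : Omega -> Delta and gamma : Delta -> Gamma.
Delta = {"x", "y"}
gamma = {"x": 0, "y": 1}
Omega = {"u"}
delta = {"u": "x"}

# Left-hand diagram: pull back along gamma, then along delta.
step1 = chosen_pullback(Delta, gamma, E, p)
proj1 = {pair: pair[0] for pair in step1}          # projection step1 -> Delta
two_steps = chosen_pullback(Omega, delta, step1, proj1)

# Right-hand diagram: pull back once along the composite gamma ∘ delta.
gamma_delta = {w: gamma[delta[w]] for w in Omega}
one_step = chosen_pullback(Omega, gamma_delta, E, p)

# The two results differ as sets: elements of two_steps have the shape
# (w, (d, e)) while elements of one_step have the shape (w, e) ...
print(two_steps == one_step)

# ... but forgetting the middle component is a bijection between them.
iso = {(w, pair): (w, pair[1]) for (w, pair) in two_steps}
print(set(iso.values()) == one_step)
```

So substitution by chosen pullback satisfies A[γ][δ] ≅ A[γ ∘ δ] but not A[γ][δ] = A[γ ∘ δ], which is exactly the equation that a cwf requires.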
² One reviewer pointed out that one should not attribute the weakness of the equivalence solely to the pseudo-functoriality of L. Indeed, the reviewer suggests that a version of L based on Lumsdaine-Warren's "left adjoint splitting" [29] rather than Bénabou-Hofmann's construction might be functorial, yet still only yield a biequivalence.

5.3 Π-types and locally cartesian closed categories

We now add Π-types and extend the results of Section 5.2. We shall sketch that Theorem 8 yields a biequivalence with locally cartesian closed categories.
5.3.1 Π-types

Definition 28 A Π-type structure on a cwf C consists of, for each Γ ∈ C0, A ∈ Ty(Γ), B ∈ Ty(Γ · A), a type Π(A, B) ∈ Ty(Γ), and term formers

\[
\begin{array}{rcl}
\lambda_{\Gamma,A,B} & : & \mathrm{Tm}(\Gamma \cdot A, B) \to \mathrm{Tm}(\Gamma, \Pi(A,B)) \\
\mathrm{ap}_{\Gamma,A,B} & : & \mathrm{Tm}(\Gamma, \Pi(A,B)) \to \Pi_{a \in \mathrm{Tm}(\Gamma, A)}\ \mathrm{Tm}(\Gamma, B[\langle \mathrm{id}, a \rangle])
\end{array}
\]

subject to the equations of Definition 22, plus the additional

\[
\Pi(A,B)[\gamma] = \Pi(A[\gamma], B[\langle \gamma \circ \mathrm{p}, \mathrm{q} \rangle]) .
\]

We note that a Π-type structure gives rise to a natural transformation Π of type presheaves:

\[
\Gamma \mapsto \sum_{A \in \mathrm{Ty}_C(\Gamma)} \mathrm{Ty}_C(\Gamma \cdot A) \Rightarrow \mathrm{Ty}_C(\Gamma)
\]

and isomorphisms

\[
\mathrm{Tm}_C(\Gamma \cdot A, B) \cong \mathrm{Tm}_C(\Gamma, \Pi_\Gamma(A,B))
\]

which are stable under substitution (see Definition 22). Note the difference between this and the characterization of an ⇒-structure as natural isomorphisms between presheaves. Since A ∈ Ty(Γ) and B ∈ Ty(Γ · A) are dependent types, Tm_C(Γ · A, B) and Tm_C(Γ, Π_Γ(A, B)) are no longer families of presheaves.

Strict cwf-morphisms between cwfs with Π-structure preserve it if they map all components of the Π-structure in the source cwf to the same component in the target cwf, on the nose. For pseudo-morphisms, we define:

Definition 29 A pseudo cwf-morphism F from C to D, where both cwfs have a Π-structure, also preserves it provided for all Γ ∈ C0, A ∈ Ty(Γ) and B ∈ Ty(Γ · A) there is an isomorphism in Ty_D(FΓ)

\[
i_{A,B} : F(\Pi^C(A,B)) \cong \Pi^D(FA, FB[\rho^{-1}_{\Gamma,A}])
\]

such that application is preserved, modulo some transports (notably by i_{A,B}, see Definition 9 in [12] for details).

It is sufficient to require preservation of application; preservation of abstraction then follows. This follows the situation in cartesian closed categories with structure, where evaluation is part of the structure whereas abstraction is defined uniquely by the universal property: although functors are only required to preserve evaluation, it follows that they preserve abstraction too.

Let us write Cwf^{2,I_ext,Σ,Π}_dem for the 2-category where the objects are small democratic cwfs with I_ext-structure, Σ-structure and Π-structure; the morphisms are pseudo cwf-morphisms preserving this structure (up to isomorphism), and the 2-cells are natural transformations between the base functors.
5.3.2 The biequivalence with locally cartesian closed categories

Let us first recall locally cartesian closed categories.

Definition 30 A category C is locally cartesian closed (lccc) if it has a terminal object and if for all Γ ∈ C0, the slice category C/Γ is cartesian closed.

Again, this definition is in terms of property rather than structure. This is one of the two usual definitions of locally cartesian closed categories. Equivalently, one could ask C to have finite limits, and require that for all γ : Δ → Γ, the pullback functor γ* : C/Γ → C/Δ, which associates to any f : · → Γ the left hand side morphism of the pullback diagram

\[
\begin{array}{ccc}
\cdot & \longrightarrow & \cdot \\
{\scriptstyle \gamma^*(f)} \big\downarrow & & \big\downarrow {\scriptstyle f} \\
\Delta & \xrightarrow{\ \gamma\ } & \Gamma
\end{array}
\]
obtained via finite limits, has a right adjoint Π_γ : C/Δ → C/Γ. It is this right adjoint that was proposed by Seely for the interpretation of Π-types. As usual, this right adjoint to the pullback functor may be equivalently described as the data of cofree objects. Instantiated here, a right adjoint to γ* : C/Γ → C/Δ consists of, for every object δ : · → Δ of C/Δ, an object Π_γ(δ) : · → Γ in C/Γ together with a co-unit ε_δ : γ*(Π_γ(δ)) → δ (a morphism in C/Δ) satisfying the universal property of co-free objects. These data, along with the universal property, amount to a dependent product diagram³:

\[
\begin{array}{ccccc}
\cdot & \xleftarrow{\ \epsilon_\delta\ } & \cdot & \longrightarrow & \cdot \\
 & {\scriptstyle \delta} \searrow & \big\downarrow & & \big\downarrow {\scriptstyle \Pi_\gamma(\delta)} \\
 & & \Delta & \xrightarrow{\ \gamma\ } & \Gamma
\end{array}
\]

in which the right-hand square is a pullback, and which is universal among any such diagram over δ and γ, as described below.

³ It was pointed out by a reviewer that those were called "distributivity pullbacks" by Weber [39].
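[Diagram: the dependent product diagram for δ and γ (with arrows δ, ε_δ, γ and Π_γ(δ)), drawn together with a second diagram of the same shape over δ and γ and the mediating morphisms into the first that express its universality.]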
In other words, a locally cartesian closed category may be defined as a category C with finite limits such that, additionally, for every composable pair δ : · → Δ and γ : Δ → Γ in C there is a dependent product diagram as above. A locally cartesian closed functor may then be defined as a left exact functor (the image of a terminal object is terminal and the image of a pullback diagram is a pullback diagram) such that the image of a dependent product diagram is a dependent product diagram. There is a 2-category LCC^2 having small lcccs as objects, locally cartesian closed functors as 1-cells, and natural transformations as 2-cells.

Now we can build our biequivalence. First, we define a forgetful 2-functor

\[
C : \mathrm{Cwf}^{2,I_{ext},\Sigma,\Pi}_{dem} \to \mathrm{LCC}^2 ,
\]
which omits all components and only keeps the base category, as in Section 5.2.2. We must prove that if C is the base category of a democratic cwf with I_ext-structure, Σ-structure and Π-structure, then C is locally cartesian closed. This is straightforward: using Π-types it is easy to show that for each Γ, Ty(Γ) is cartesian closed, but Ty(Γ) is equivalent to C/Γ as shown in Section 5.2.2. Alternatively one may construct dependent products: from δ : · → Δ and γ : Δ → Γ one may construct an isomorphic (in the obvious sense) sequence of projections via inverse image:

\[
\Gamma \cdot \mathrm{Inv}(\gamma) \cdot \mathrm{Inv}(\delta) \xrightarrow{\ \mathrm{p}\ } \Gamma \cdot \mathrm{Inv}(\gamma) \xrightarrow{\ \mathrm{p}\ } \Gamma .
\]

For any such sequence of projections Γ · A · B → Γ · A → Γ there is a dependent product diagram, called the chosen dependent product diagram:

\[
\begin{array}{ccccc}
\Gamma \cdot A \cdot B & \xleftarrow{\ \varepsilon_{A,B}\ } & \Gamma \cdot A \cdot \Pi(A,B)[\mathrm{p}_A] & \longrightarrow & \Gamma \cdot \Pi(A,B) \\
 & {\scriptstyle \mathrm{p}_B} \searrow & \big\downarrow {\scriptstyle \mathrm{p}_{\Pi(A,B)[\mathrm{p}_A]}} & & \big\downarrow {\scriptstyle \mathrm{p}_{\Pi(A,B)}} \\
 & & \Gamma \cdot A & \xrightarrow{\ \mathrm{p}_A\ } & \Gamma
\end{array}
\]

where the right-hand square is a pullback and ε_{A,B} = ⟨p, ap(q, q[p])⟩. Combined with inverse image, this shows that any composable pair δ : · → Δ and γ : Δ → Γ has a dependent product diagram. Dependent product diagrams also permit a nice characterisation of the preservation of Π-structures:

Lemma 3 Let F : C → D be a pseudo cwf-morphism between cwfs with Π-structure. Then F preserves the Π-structure if and only if the image of any chosen dependent product diagram is a dependent product diagram.
Proof Proved through intricate calculations; see [12], Proposition 4.8, for details.

This completes the definition of the forgetful 2-functor C : Cwf^{2,I_ext,Σ,Π}_dem → LCC^2. In the other direction, the construction is the same as for Theorem 8; the fact that the pseudofunctor L yields pseudo cwf-morphisms preserving Π-structures follows from Lemma 3. We conclude:

Theorem 9 There is a biequivalence of 2-categories LCC^2 ≃ Cwf^{2,I_ext,Σ,Π}_dem.
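As a concrete instance of the dependent products of Section 5.3.2, the category of finite sets is locally cartesian closed, and there Π_γ(δ) can be computed by hand. The sketch below is only an illustration under our own conventions (functions as dictionaries, hypothetical helper names); it is not part of the cwf construction. The fibre of Π_γ(δ) over x ∈ Γ consists of the sections of δ over the fibre γ⁻¹(x), and the co-unit evaluates a section.

```python
from itertools import product

def dependent_product(Gamma, gamma, delta):
    """Π_γ(δ) in finite sets, for γ : Δ -> Γ and δ : E -> Δ given as dicts.

    Returns a dict sending each element (x, section) of the total space to
    x in Γ, where `section` picks, for every d with γ(d) = x, some e with
    δ(e) = d. Sections are stored as tuples of (d, e) pairs."""
    pi = {}
    for x in sorted(Gamma):
        fibre = sorted(d for d in gamma if gamma[d] == x)
        choices = [[e for e in delta if delta[e] == d] for d in fibre]
        for picks in product(*choices):
            pi[(x, tuple(zip(fibre, picks)))] = x
    return pi

def counit(d, element):
    """Co-unit ε_δ : γ*(Π_γ(δ)) -> E, evaluating a section at d (needs γ(d) = x)."""
    _x, section = element
    return dict(section)[d]

# γ : Δ -> Γ and δ : E -> Δ
Gamma = {0, 1, 2}
gamma = {"x": 0, "y": 0, "z": 1}
delta = {"a": "x", "b": "x", "c": "y", "d": "z"}

pi = dependent_product(Gamma, gamma, delta)
for element, x in sorted(pi.items()):
    print(x, element[1])
```

Over 0 there are two sections (the element over "x" may be "a" or "b"), over 1 there is one, and over 2, whose γ-fibre is empty, there is exactly one, namely the empty section. The co-unit lands in the right fibre: δ(ε(d, s)) = d for every d in the domain of the section s.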
5.3.3 Undecidability in the bifree locally cartesian closed category

The construction of a free cwf in Theorem 6 can easily be extended when we add I_ext, N1, Σ and Π-types. This is done by adding the per-rules corresponding to formation, introduction, elimination, and equality rules for I_ext, N1, Σ and Π. We do not have room for explicitly displaying those rules, but they can be found in Castellan, Clairambault, and Dybjer [10]. We can thus construct a free cwf T^{I_ext,N1,Σ,Π,o} with I_ext, N1, Σ and Π-type structures. It is both a free cwf (with the extra structure) on one base type and with respect to strict morphisms, and a bifree cwf (with the extra structure) on one base type and with respect to pseudo morphisms.

Theorem 10 The cwf T^{I_ext,N1,Σ,Π,o} is initial in the category Cwf^{I_ext,N1,Σ,Π,o}, as well as bi-initial in the 2-category Cwf^{2,I_ext,N1,Σ,Π,o}. Moreover, it is democratic and bi-initial in Cwf^{2,I_ext,Σ,Π,o}_dem.

Proof Details appear in [10].
An object I is bi-initial in a 2-category iff for any A there is an arrow I → A and for any two arrows f, g : I → A there exists a unique 2-cell θ : f ⇒ g. It follows that θ is invertible, and that bi-initial objects are equivalent. In the statement above, o denotes a base type. The objects of both Cwf^{I_ext,N1,Σ,Π,o} and Cwf^{2,I_ext,N1,Σ,Π,o}_dem have a distinguished type o ∈ Ty(1), preserved on the nose for Cwf^{I_ext,N1,Σ,Π,o} and up to isomorphism for Cwf^{2,I_ext,N1,Σ,Π,o}_dem.

The presence of the base type o does not affect the biequivalence in Theorem 9, which extends to a biequivalence Cwf^{2,Σ,Π,o}_dem ≃ LCC^{2,o} where the latter has a distinguished object o ∈ C0 preserved by functors up to isomorphism. As bi-initial objects are transported to bi-initial objects via a biequivalence, it follows from Theorem 10 that the base category of the cwf T^{I_ext,N1,Σ,Π,o} is bi-initial in LCC^{2,o}:

Theorem 11 The base category of T^{I_ext,N1,Σ,Π,o} is the bifree lccc on one object.

Having constructed a bifree lccc, it is natural to consider its word problem: given two (syntactic) substitutions γ, γ′ : Δ → Γ in T^{I_ext,N1,Σ,Π,o}, is it decidable whether they are equal, or equivalently whether they have the same
interpretation in any locally cartesian closed category? As γ, γ′ are syntactic constructs in extensional type theory, one expects undecidability. However, prior undecidability proofs for extensional type theory rely on structure not available in our case. The folklore argument uses a universe and Hofmann's undecidability proof [22] uses a type of natural numbers. In [10], we generalize the folklore result to hold with only one base type, i.e. in T^{I_ext,N1,Σ,Π,o}, but without natural numbers or a universe. This relies on an encoding of combinatory logic in Martin-Löf type theory with I-types, Π-types, and a base type o, as a context Γ_CL containing:

\[
\begin{array}{rcl}
k & : & o, \\
s & : & o, \\
\cdot & : & o \Rightarrow o \Rightarrow o, \\
\mathrm{ax}_k & : & \Pi x y : o.\ I(o,\ k \cdot x \cdot y,\ x), \\
\mathrm{ax}_s & : & \Pi x y z : o.\ I(o,\ s \cdot x \cdot y \cdot z,\ x \cdot z \cdot (y \cdot z))
\end{array}
\]

where the left-associative binary infix symbol "·" stands for application. While the above uses the syntax of type theory, it is easy to set up the same context just using the cwf combinators available in T^{I_ext,N1,Σ,Π,o}, hence reducing the decision of equality between terms M, N in combinatory logic to deciding

\[
\Gamma_{CL} \vdash M \overset{?}{=} N : o ,
\]

i.e. an equality of two terms M, N ∈ Tm(Γ_CL, o). We conclude:

Theorem 12 Equality is undecidable in the bifree locally cartesian closed category on one base type.

Acknowledgements Pierre Clairambault was supported by the LABEX MILYON (ANR-10-LABX-0070) of Université de Lyon, within the program "Investissements d'Avenir" (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR). Peter Dybjer was supported by the Centre for Advanced Study at the Norwegian Academy of Science and Letters in Oslo, where he was a fellow in the Homotopy Type Theory and Univalent Foundations project while working on this paper. Finally, we would like to thank the anonymous reviewers for several useful suggestions.
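Returning to the combinatory-logic encoding behind Theorem 12: the two axioms ax_k and ax_s of Γ_CL are the usual equations K·x·y = x and S·x·y·z = x·z·(y·z). The sketch below is our own illustration, not code from [10]. It represents combinatory terms as nested pairs and rewrites them leftmost-outermost with a step budget. Since equality in combinatory logic is undecidable, no total procedure can exist; the budget turns the function into a partial test that may simply give up.

```python
def step(t):
    """Perform one leftmost-outermost rewrite step, or return None if t is normal.

    Terms are atoms (strings such as "K", "S", "a") or pairs (f, x) for f·x."""
    if not isinstance(t, tuple):
        return None
    f, a = t
    # K·x·y  ->  x
    if isinstance(f, tuple) and f[0] == "K":
        return f[1]
    # S·x·y·z  ->  x·z·(y·z)
    if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == "S":
        x, y, z = f[0][1], f[1], a
        return ((x, z), (y, z))
    reduced = step(f)
    if reduced is not None:
        return (reduced, a)
    reduced = step(a)
    if reduced is not None:
        return (f, reduced)
    return None

def normalize(t, fuel=1000):
    """Return the normal form of t, or None if none is found within `fuel` steps."""
    for _ in range(fuel):
        reduced = step(t)
        if reduced is None:
            return t
        t = reduced
    return None  # budget exhausted: no verdict

def app(*ts):
    """Left-associated application: app(f, x, y) is (f·x)·y."""
    result = ts[0]
    for t in ts[1:]:
        result = (result, t)
    return result

I = app("S", "K", "K")               # identity combinator S·K·K
print(normalize(app(I, "a")))        # rewrites to the atom a

omega = app("S", I, I, app("S", I, I))
print(normalize(omega, fuel=200))    # no normal form: the budget runs out
```

When both terms reach normal forms, comparing those normal forms decides their equality, by confluence of this rewriting system. When the budget runs out nothing can be concluded, and that is the situation any attempted decision procedure for Γ_CL ⊢ M = N : o must face.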
References

1. Abadi, M., Cardelli, L., Curien, P., and Lévy, J. Explicit substitutions. In Conference Record of the Seventeenth Annual ACM Symposium on Principles of Programming Languages, San Francisco, California, USA, January 1990 (1990), pp. 31–46.
2. Aczel, P. Frege Structures and the Notions of Proposition, Truth, and Set. North-Holland, 1980, pp. 31–59.
3. Ahrens, B., Lumsdaine, P. L., and Voevodsky, V. Categorical structures for type theory in univalent foundations. Logical Methods in Computer Science 14, 3 (2018).
4. Awodey, S. Natural models of homotopy type theory. Mathematical Structures in Computer Science 28, 2 (2018), 241–286.
5. Barendregt, H. P. The Lambda Calculus. North-Holland, 1984. Revised edition.
6. Bénabou, J. Fibred categories and the foundation of naive category theory. Journal of Symbolic Logic 50 (1985), 10–37.
7. Brilakis, K. On Initial Categories with Families – Formalization of Unityped and Simply Typed CwFs in Agda. Master's thesis, Chalmers University of Technology, 2018.
8. Cartmell, J. Generalized algebraic theories and contextual categories. Annals of Pure and Applied Logic 32 (1986), 209–243.
9. Castellan, S., Clairambault, P., and Dybjer, P. Undecidability of equality in the free locally cartesian closed category. In 13th International Conference on Typed Lambda Calculi and Applications, TLCA 2015, July 1-3, 2015, Warsaw, Poland (2015), pp. 138–152.
10. Castellan, S., Clairambault, P., and Dybjer, P. Undecidability of equality in the free locally cartesian closed category (extended version). Logical Methods in Computer Science 13, 4 (2017).
11. Clairambault, P., and Dybjer, P. The biequivalence of locally cartesian closed categories and Martin-Löf type theories. In Typed Lambda Calculi and Applications - 10th International Conference, TLCA 2011, Novi Sad, Serbia, June 1-3, 2011. Proceedings (2011), pp. 91–106.
12. Clairambault, P., and Dybjer, P. The biequivalence of locally cartesian closed categories and Martin-Löf type theories. Mathematical Structures in Computer Science 24, 6 (2014).
13. Clairambault, P., and Dybjer, P. Game semantics and normalization by evaluation. In Foundations of Software Science and Computation Structures - 18th International Conference, FoSSaCS 2015, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2015, London, UK, April 11-18, 2015. Proceedings (2015), A. M. Pitts, Ed., vol. 9034 of Lecture Notes in Computer Science, Springer, pp. 56–70.
14. Curien, P. Categorical combinators. Information and Control 69, 1-3 (1986), 188–254.
15. Curien, P., Garner, R., and Hofmann, M. Revisiting the categorical interpretation of dependent type theory. Theoretical Computer Science 546 (2014), 99–119.
16. Curien, P.-L. Substitution up to isomorphism. Fundamenta Informaticae 19, 1,2 (1993), 51–86.
17. Dybjer, P. Internal type theory. In TYPES '95, Types for Proofs and Programs (1996), no. 1158 in Lecture Notes in Computer Science, Springer, pp. 120–134.
18. Fiore, M. Discrete generalised polynomial functors. Slides for a talk given at the 39th International Colloquium on Automata, Languages and Programming (ICALP), 2012.
19. Fiore, M., and Voevodsky, V. Lawvere theories and C-systems. Accepted for publication in Proceedings of the American Mathematical Society.
20. Hofmann, M. Interpretation of type theory in locally cartesian closed categories. In Proceedings of CSL. Springer LNCS, 1994.
21. Hofmann, M. On the interpretation of type theory in locally cartesian closed categories. In CSL (1994), L. Pacholski and J. Tiuryn, Eds., vol. 933 of Lecture Notes in Computer Science, Springer.
22. Hofmann, M. Extensional concepts in intensional type theory. PhD thesis, University of Edinburgh, 1995.
23. Hofmann, M. Syntax and semantics of dependent types. In Semantics and Logics of Computation, A. Pitts and P. Dybjer, Eds. Cambridge University Press, 1996.
24. Hyland, M., and Power, J. The category theoretic understanding of universal algebra: Lawvere theories and monads. Electr. Notes Theor. Comput. Sci. 172 (2007), 437–458.
25. Jacobs, B. Categorical logic and type theory, vol. 141 of Studies in Logic and the Foundations of Mathematics. Elsevier, 1999.
26. Johnstone, P. T. Sketches of an elephant: A topos theory compendium, vol. 2. Oxford University Press, 2002.
27. Lambek, J., and Scott, P. J. Introduction to higher order categorical logic. No. 7 in Cambridge studies in advanced mathematics. Cambridge University Press, 1986.
28. Lawvere, F. W. Equality in hyperdoctrines and comprehension schema as an adjoint functor. In Applications of Categorical Algebra, Proceedings of Symposia in Pure Mathematics, A. Heller, Ed. AMS, 1970.
29. Lumsdaine, P. L., and Warren, M. A. The local universes model: An overlooked coherence construction for dependent type theories. ACM Trans. Comput. Log. 16, 3 (2015), 23:1–23:31.
30. Martin-Löf, P. Constructive mathematics and computer programming. In Logic, Methodology and Philosophy of Science, VI, 1979 (1982), North-Holland, pp. 153–175.
31. Martin-Löf, P. Intuitionistic Type Theory. Bibliopolis, 1984.
32. Martin-Löf, P. Substitution calculus. Notes from a lecture given in Göteborg, November 1992.
33. Newstead, C. Algebraic Models of Dependent Type Theory. PhD thesis, Department of Mathematical Sciences, Carnegie Mellon University, 2018.
34. Obtulowicz, A. Functorial semantics of the type free λ-βη calculus. In Foundations of Computation Theory (1977), pp. 302–307.
35. Seely, R. A. G. Locally cartesian closed categories and type theory. Proceedings of the Cambridge Philosophical Society 95 (1984), 33–48.
36. Streicher, T. Semantics of Type Theory. Birkhäuser, 1991.
37. Tasistro, A. Formulation of Martin-Löf's theory of types with explicit substitutions. Tech. rep., Department of Computer Sciences, Chalmers University of Technology and University of Göteborg, 1993.
38. Trimble, T. Towards a doctrine of operads. Article in nlab https://ncatlab.org/toddtrimble/published/Towards+a+doctrine+of+operads#cartesian_operads_are_equivalent_to_lawvere_theories, 2013 (accessed 8 March 2019).
39. Weber, M. Polynomials in categories with pullbacks. Theory Appl. Categ. 30, 16 (2015), 533–598.
The Mathematics of Text Structure

Bob Coecke
Abstract In previous work we gave a mathematical foundation, referred to as DisCoCat, for how words interact in a sentence in order to produce the meaning of that sentence. To do so, we exploited the perfect structural match of grammar and categories of meaning spaces. Here, we give a mathematical foundation, referred to as DisCoCirc, for how sentences interact in texts in order to produce the meaning of that text. First we revisit DisCoCat. While in DisCoCat all meanings are fixed as states (i.e. have no input), in DisCoCirc word meanings correspond to a type, or system, and the states of this system can evolve. Sentences are gates within a circuit which update the variable meanings of those words. As in DisCoCat, word meanings can live in a variety of spaces, e.g. propositional, vectorial, or cognitive. The compositional structures are string diagrams representing information flows, and an entire text yields a single string diagram in which word meanings lift to the meaning of the entire text. While the developments in this paper are independent of a physical embodiment (cf. classical vs. quantum computing), both the compositional formalism and the suggested meaning model are highly quantum-inspired, and implementation on a quantum computer would come with a range of benefits. We also praise Jim Lambek for his role in mathematical linguistics in general, and the development of the DisCo program more specifically.

Bob Coecke
Oxford University, Department of Computer Science
Cambridge Quantum Computing Ltd.
e-mail: [email protected], [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 C. Casadio and P. J. Scott (eds.), Joachim Lambek: The Interplay of Mathematics, Logic, and Linguistics, Outstanding Contributions to Logic 20, https://doi.org/10.1007/978-3-030-66545-6_6
1 Introduction

DisCoCat (cf. Categorical Compositional Distributional) [28] resulted from addressing the following question:

There are dictionaries for words. Why aren’t there any dictionaries for sentences?

This question is of both academic and practical interest. Firstly, it addresses how we can understand sentences that we never heard before, provided we understand the words. Secondly, it could enable machines to do so too (see e.g. [15] for a preliminary discussion). Now, one obviously can extend the question to:

Why aren’t there any dictionaries for entire texts?

While there is no such thing as a grammar that very rigidly organises sentences in a text, there should be something else structuring sentences in a text, as, to put it naively, we can’t just swap sentences around randomly while retaining the meaning of the text. What exactly is the structure governing sentences? Also, how does it relate to the structure governing words in sentences?

Overall, this paper has two purposes:
• To give a mathematical foundation for sentence meaning composition, and how sentence meaning interacts with word meaning composition, by modifying as well as elaborating upon our theory of grammar-driven word meaning composition known as DisCoCat.
• As this paper was commissioned for a volume dedicated to the great Jim Lambek, to praise Lambek for the development of mathematical linguistics in general, for his role in the development of the DisCoCat program more specifically, and for the role of physics in all of that [14].

Concerning the 1st goal, we believe that the results in this paper may constitute radical progress for the broad DisCoCat research program, as three important longstanding issues are addressed ‘in full generality’:
a. How sentence meanings compose in order to form the meaning of text.
b. How word meanings evolve in text, when learning new things.
c. What the type of the sentence meaning space is.
The key idea to achieve (a) is to not treat (all) word-meanings as static entities, like we did in DisCoCat, but as dynamic ones, so that they can evolve in the light of what is conveyed about that word within the text, i.e. (b). For example, using a movie analogy, we can learn more about the main actors of the text, and/or these actors themselves can learn about the rest of the world. Technically, rather than states, such actors will be represented by types; sentences then act on those types as I/O-processes, transforming some initial state of an actor into a resulting one, which yields (c). In other words, text made up of several sentences is organised as a circuit. Since:

1. Word-meanings: states → types
2. Sentence-meanings: states → I/O-processes
3. Text-meaning: ∅ → circuit

are a radical departure from the state-focussed DisCoCat program, we name it DisCoCirc (cf. Circuit-shaped Compositional Distributional). Other than that, the attractive features of DisCoCat are all retained (see Sec. 4.4), such as sentences with different grammatical structure having the same type, model-flexibility, and the diagrammatic format (cf. Sec. 3). In fact, DisCoCat can be seen as an instance of DisCoCirc, in that in any DisCoCirc(uit) there will be multiple participating DisCoCat(s), and the gloves are off too:

[figure: DisCoCat versus DisCoCirc]
Jim Lambek’s legacy. We already started our praise for Lambek in the title of this paper, by tweaking the title of Lambek’s seminal paper “The Mathematics of Sentence Structure” [47]. Not much has changed in that Lambek’s story is pretty much still what mathematical linguists agree on today. Everyone? Well, Lambek himself certainly did not. Some 20 years ago Lambek decided to ‘do a von Neumann’,¹ and replaced Lambek grammar with pregroup grammar [52, 53]. Around 2004 I was giving a talk at the McGill category theory seminar about our then new diagrammatic description of quantum teleportation [1, 16]. Lambek immediately pointed out: “Those are pre-groups!” The compact closed categories underpinning quantum theory indeed have pregroups as their posetal instance. It was this connection, between grammar and teleportation diagrams, that inspired the DisCoCat model, making it look as if word-meanings are being teleported around in sentences by means of the channels provided by the pregroup grammar [14]. Lambek himself explicitly stressed this connection between language and physics in a paper written in 2006 [54], which due to my all too slow editorial skills only appeared in 2011.

¹ von Neumann denounced his own quantum mechanical formalism merely three years after it was published as a book [81], and devoted a large portion of the remainder of his life to finding an alternative formalism; see [66] for a discussion.
Lambek made many more pioneering contributions, including significant contributions to linear logic, which we briefly get into in Sec. 5.2, and of course there also is the Curry-Howard-Lambek isomorphism [48, 49, 50, 51]. Just as in that context programming becomes an instance of category theory, something very similar is true in the case of DisCoCat for language.

The need for structural understanding. Coming from a pure mathematician, Lambek’s work on the ‘real world’ was heavily structurally driven (and so is ours in this paper). These days prediction-driven work drawing from big data has clearly taken the forefront. Allow us to share some reflections on that, by looking at some key historical scientific developments. In particular, undeniably, Natural Language Processing (NLP) has made great progress thanks to the great progress recently made in Machine Learning (ML). On the other hand it is fair to say that this hasn’t necessarily increased our structural understanding of language. It would be a major mistake to only follow the path where empirical success takes us, and ignore that which increases understanding, as we have learned the hard way from how (the most important ever) progress has been made in physics.

The discovery of the theory of Newtonian mechanics came from the study of movements of planets and stars. This study was data driven, as these movements are vital for ships at sea to find their way home. The longest surviving model was Ptolemy’s epicycle model. While there were Copernicus, Kepler, Galilei and Newton, it took until Einstein for that line of research to match Ptolemy’s correct predictions, as the latter accounted for relativistic effects. The reason was simply that for any anomaly one could always add a sufficient number of epicycles. This shows that structural insights (cf. Earth not being in the middle of the universe) may take time to get the predictions right, but in the end they even make the data-driven models obsolete.²

² While neural networks have an ontological underpinning taking some inspiration from the human brain, the universal approximation theorem for neural networks [30], which states that one can approximate any continuous function on compact subsets of $\mathbb{R}^n$, seems to be somewhat on par with the unrestricted adding of epicycles.

There is a deeper aspect to this story, namely: Why did we have to go to outer space to see mechanics in action in a manner that we could truly understand? The simple reason is that the role of friction on earth is so imposing that one was rather led to take on the Aristotelian point of view that any movement requires force. Only in outer space do we see frictionless movement. We think that the same is true for language: the manner in which it is used is full of deviations from the structural core. We believe that just like in the case of Newtonian mechanics, understanding that structural core will ultimately lead to much better predictive power.

As an analogy, if ML were truly to solve all of our problems, then one would expect it to be used to calculate trajectories of space ships and compute quantum spectra, learning from the available experimental data, but I personally don’t expect that we will dump non-Euclidean geometry and Hilbert space for these purposes, despite the fact that real physics calculations, for many reasons, all deviate from the ideal. That said, ML could probably have saved Kepler a hell of a lot of time when analysing Brahe’s data.

Why pregroups? Pregroups are easily the simplest model of grammar, and have a very simple graphical presentation as wire-structures. We think that such simplicity can be very helpful, just like how Copernicus’ simple circle model allowed Kepler and Newton to finally understand movement on Earth. More specifically, just like friction obstructed us from discovering the laws of mechanics here on Earth, more sophisticated features of language as well as all kinds of cultural aberrations may also obstruct us from seeing the foundational structures of meaning. There may of course also be more fundamental arguments for using pregroups, and Lambek took his conceptual motivation for pregroups from psychology. Some computational linguists have strong feelings about which grammatical algebra should be used, and many think that Lambek got it wrong. But even then, Copernicus was wrong too, since planets don’t move on circles around the sun, but without him we would not be where we are now.

Some related works. The sentence type used in this paper was also used in the recent DisCoCat paper that introduces Cartesian verbs [24], and as discussed in [24] Sec. 2.3, precursors of this idea are in [35, 45, 44]. Also within the context of DisCoCat, the work by Toumi et al. [19, 77] involves multi-sentence interaction by relying on discourse representation structures [40], which comes at the cost of reducing meaning to a number. Textual context is also present in the DisCoCat-related papers [63, 86], although no sentence composition mechanism is proposed. Within more traditional natural language semantics research, dynamic semantics [37, 79] models sentence meanings as I/O-transitions and text as compositions thereof.
However, the approach is still rooted in predicate logic, just as Montague semantics is; hence it does not account for more general meaning spaces, nor does it admit the explicit type structure of diagrams/monoidal categories. Dynamic semantics is a precursor of dynamic epistemic logic (DEL) [8, 7] which we briefly address in Sec. 5.1; we expect that DEL, and generalisations thereof, may in fact emerge from our model of language meaning by considering an epistemics-oriented subset of meanings. In [69], static and dynamic vector meanings are explicitly distinguished, taking inspiration for the latter from dynamic semantics. There are many other logic-oriented approaches to text e.g. [5], of text organisation e.g. [59], and of the use of categorical structure.³

³ Despite our somewhat provocative stance towards ML there is a clear scope for combining DisCoCirc with ML-methods. Some work in this direction, for the case of DisCoCat, is [57].
2 Background: DisCoCat

2.1 Diagrams

Diagrams are made up of boxes:

[diagram: a box with input wires and output wires]

each of which may have a number of inputs and outputs, and in this paper these boxes will typically be labeled by words or sentences. Boxes without either inputs or outputs may also occur, which we call states and effects respectively, and we depict them here as follows:

[diagram: a state and an effect]

The inputs and outputs of boxes can then be connected by wires yielding general diagrams e.g.:

[diagram: a state, box 1 and box 2 connected by wires]
What determines a diagram is (see also [23] Corollary 3.5 & Definition 3.8):
• the connectedness of the wire-structure, and
• the labels on wires and boxes.

Diagrams can either be read upward, like how we build structures from the ground up, or downward, like how gravity causes downfall. I prefer the constructive view in which diagrammatic structures are built rather than where they emerge by letting the forces have a go. Unfortunately, the elders of our portion of spoken language disagreed, and for that reason, in this paper, diagrams will be read from top to bottom.

In category-theoretic terms, diagrams live in monoidal categories where the wires correspond to objects and the boxes correspond to morphisms. Places where one can find easy-going introductions to the categories-diagrams connection include [26, 23].

There are two particular kinds of diagrams that will play a role in this paper, both discussed in great detail in [23]. Circuits are diagrams obtained by composing boxes in parallel and sequentially. In category-theoretic terms they live in a symmetric monoidal category. They admit a clear flow of time, from inputs to outputs, as a circuit carries a causal structure with boxes as nodes and wires as causal relationships (see [23] Theorem 3.22). This in particular implies that an output of a box will always be connected to an input of a box in its ‘future’.
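To make the two ways of composing boxes concrete, here is a minimal sketch in Python. The class `Box` and its methods `then` and `beside` are hypothetical names chosen for illustration only (they do not come from [23]); a box is reduced to its name and the labels of its input and output wires, so the sketch captures only the typing discipline of circuits, not their meaning.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """A box with labelled input and output wires (illustrative helper)."""
    name: str
    dom: Tuple[str, ...]  # labels of the input wires
    cod: Tuple[str, ...]  # labels of the output wires

    def then(self, other: "Box") -> "Box":
        """Sequential composition: the outputs of self feed the inputs of other."""
        if self.cod != other.dom:
            raise TypeError(f"cannot plug {self.cod} into {other.dom}")
        return Box(f"({self.name} ; {other.name})", self.dom, other.cod)

    def beside(self, other: "Box") -> "Box":
        """Parallel composition: the two boxes are placed side by side."""
        return Box(f"({self.name} | {other.name})",
                   self.dom + other.dom, self.cod + other.cod)


# A state is a box without inputs; the wire labels A, B, C, D are made up.
state = Box("state", (), ("A",))
box1 = Box("box 1", ("A",), ("B", "C"))
box2 = Box("box 2", ("B",), ("D",))
identity_c = Box("id", ("C",), ("C",))

# state, then box 1, then box 2 next to a plain wire of type C.
circuit = state.then(box1).then(box2.beside(identity_c))
print(circuit.name, circuit.dom, circuit.cod)
```

Because `then` only accepts a box whose inputs match the current outputs, every output ends up connected to an input of a later box, which is the ‘future’ condition stated above.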
String diagrams on the other hand allow for outputs to be connected to any input, and even to other outputs. Similarly, inputs can be connected to inputs. This craziness is enabled by the fact that string diagrams allow for cap- and cup-shaped wires:

[diagram: a cap-shaped wire and a cup-shaped wire]

One can also think of these cap- and cup-shaped wires as boxes, respectively as a state and an effect, and then string diagrams can be given the shape of a circuit ([23] Theorem 4.19). Conveniently, every one-input-one-output-box can be transformed into a two-output-state, and vice versa:

[diagram: box ↔ state conversions]

and this is a bijective correspondence called box-state duality (see [23] Sec. 4.1.2). More general uses of caps/cups for converting types are referred to as transposition. In category-theoretic terms string diagrams live in a compact closed category (also called autonomous category) [46, 74].

We also usually assume that string diagrams can be flipped vertically:

[diagram: a diagram built from a state, box 1 and box 2, and its vertical flip]

For example, if we flip a state, we get an effect, and vice versa, and if we flip a cap, we get a cup, and vice versa. Above we also flipped the text in the boxes, but this was just to make a point, and we won’t do this anymore as it obstructs readability. In category-theoretic terms this flipping is called a dagger structure, or adjoints [2, 73] (see also [23] Sec. 4.3.1).

Another particular kind of box are spiders [22, 23]. In category-theoretic terms they correspond to so-called dagger special commutative Frobenius algebras. We represent them by a dot with some input and output wires:

[diagram: a spider, i.e. a dot with several input and output wires]

The key property of spiders is that they fuse together:

[diagram: two connected spiders = a single spider]
An alternative way to think of these spiders is as multi-wires, which are generalised wires in that, rather than having two ends, they can have multiple ends. What corresponds to fusion is that if two multi-wires are connected, then all the ends of one are also connected with all the ends of the other. As we will justify in Sec. 2.4, the spider with three legs should be thought of as the logical AND:

[diagram: three-legged spider := AND]

The one with a single leg should be thought of as discarding.
2.2 From grammar to wirings

From the mathematics of sentence structure [3, 10, 47] we know that the ‘fundamental particles’ making up phrases and sentences are not words, but some basic grammatical types. The noun-type n, and the type of whole sentences s are examples of these basic types. On the other hand, the transitive-verb-type is not a basic type, but a composite one made up of two noun-types and one sentence-type. The precise manner in which these basic types interact depends on which categorial grammar one uses. We will adopt Lambek’s pregroups [52, 53], since they can be formulated diagrammatically. For each basic type $x$ there are two corresponding ‘anti-types’, which we denote ${}^{-1}x$ and $x^{-1}$. Think of these as a left and a right inverse. Then, in English, a transitive verb has type:

$${}^{-1}n \cdot s \cdot n^{-1}$$

To understand this type, consider a transitive verb like hate. Simply saying hate doesn’t convey any useful information, until we also specify who hates whom. That’s exactly the role of the anti-types: they specify that in order to form a meaningful sentence the transitive verb needs a noun on the left and a noun on the right, which then cancel out with the anti-types:

$$\underbrace{\text{Alice}}_{n}\ \ \underbrace{\text{hates}}_{{}^{-1}n \cdot s \cdot n^{-1}}\ \ \underbrace{\text{Bob}}_{n}$$

So both $n \cdot {}^{-1}n$ and $n^{-1} \cdot n$ vanish, and what remains is $s$, confirming that Alice hates Bob is a proper sentence (a.k.a. grammatically well-typed). So where are the diagrams? They depict the cancelations:
[diagram: wiring over the types $n$, ${}^{-1}n \cdot s \cdot n^{-1}$, $n$, in which $n$ and ${}^{-1}n$ cancel out, $n^{-1}$ and $n$ cancel out, and $s$ is retained]
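The cancelations can also be checked mechanically. The following Python sketch is an illustration under stated assumptions rather than a full pregroup parser: a simple type is encoded as a hypothetical pair `(base, side)`, with `side` equal to 0 for a basic type $x$, -1 for the left anti-type ${}^{-1}x$ and +1 for the right anti-type $x^{-1}$, and the reduction is a greedy left-to-right pass that suffices for the example above but is not guaranteed to find a reduction for every well-typed string.

```python
from typing import List, Tuple

# (base, side): side 0 is the basic type x, side -1 is the left anti-type
# ^{-1}x, and side +1 is the right anti-type x^{-1} (illustrative encoding).
SimpleType = Tuple[str, int]

N: SimpleType = ("n", 0)
S: SimpleType = ("s", 0)
N_LEFT: SimpleType = ("n", -1)   # ^{-1}n
N_RIGHT: SimpleType = ("n", +1)  # n^{-1}


def cancels(left: SimpleType, right: SimpleType) -> bool:
    """True for the adjacent pairs x . ^{-1}x and x^{-1} . x."""
    same_base = left[0] == right[0]
    return same_base and (left[1], right[1]) in {(0, -1), (1, 0)}


def reduce_types(types: List[SimpleType]) -> List[SimpleType]:
    """Greedy stack-based reduction: cancel adjacent pairs from left to right."""
    stack: List[SimpleType] = []
    for t in types:
        if stack and cancels(stack[-1], t):
            stack.pop()      # the pair vanishes
        else:
            stack.append(t)  # nothing to cancel with yet
    return stack


TRANSITIVE_VERB = [N_LEFT, S, N_RIGHT]       # ^{-1}n . s . n^{-1}
alice_hates_bob = [N] + TRANSITIVE_VERB + [N]

print(reduce_types(alice_hates_bob))         # only the sentence type remains
print(reduce_types(TRANSITIVE_VERB + [N, N]))  # 'hates Alice Bob' does not reduce to s
```

Each `pop` in `reduce_types` corresponds to one cup-shaped wire in the diagram, and whatever is left on the stack corresponds to the wires that remain open.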
For a more complex sentence (but with very similar meaning) like:

Alice does not love Bob

the wiring will be more complex [28]:

[diagram: pregroup wiring for Alice does not love Bob]

but the idea remains the same. In general, one can extract these wirings from the book [53], which assigns types to all grammatical roles.

The main idea of DisCoCat is to think of these wires not just as a representation of an algebraic computation, but as a representation of how the meanings of the words making up the sentence interact. Representing word-meanings as follows, also accounting for the types:

[diagram: the states Alice, hates and Bob]

we can now apply the wiring to these as follows:

[diagram: the states Alice, hates and Bob connected by the wiring]
The wires now ‘feed’ the object and the subject into the verb in order to produce the meaning of the whole sentence. One should contrast this compositional model for word and sentence meanings with the bag-of-words model still employed in distributional NLP and information retrieval [38], where, as the name says, words in a sentence are treated just like a structureless bag.
2.3 Internal wirings of meanings

Not only grammatical structure, but also certain meanings themselves can be represented using wiring. Examples include functional words, which play some kind of logical role and would carry no empirical data, and other words where a wiring structure provides a simplification. The internal structure of words can also help to better understand the overall meaning of a sentence.
2.3.1 Functional words

An example from [64, 28] is the sentence:

[diagram: Alice does not love Bob, with does and not drawn as wirings]

We see that both does and not have been represented using wires:

[diagram: the internal wirings of does and not]

does being entirely made up of wires, while not also involves a ¬-labeled box which represents negation of the meaning. That these wirings are well-chosen becomes clear when we yank them:

[diagram: Alice loves Bob followed by a NOT box]

i.e. we get the negation of the meaning of Alice loves Bob. An example from [67, 68] uses spiders for relative pronouns:

[diagram: She who hates Bob, with who drawn using spiders]

Simplification now yields:

[diagram: She AND [...] hates Bob]

i.e. the conjunction of she (i.e. being female) and the property [...] hates Bob, which is indeed again the intended meaning of the sentence. Building further on the idea that the merge spider represents AND, in [42] an account was given of coordination between identical syntactic types.
2.3.2 Adjectives and to be

Intersective adjectives [39] are adjectives which leave a noun unaltered except for specifying an additional property, e.g. red car, hot chilies or sad Bob, as opposed to crashed car, rotten chilies or dead Bob. While a general adjective has a composite type e.g.:

[diagram: dead applied to Bob]

the type of an intersective adjective can be reduced to a single wire [13]:

[diagram: sad Bob, with sad as a single-wire state]

yielding a conjunction:

[diagram: sad AND Bob]

Closely related to adjectives is the verb to be, since sad Bob and Bob is sad convey the same meaning. Of course, the overall type of these two statements is different, being a noun and a sentence respectively, but we will see later that this difference vanishes when we move from DisCoCat to DisCoCirc. Accepting Bob is sad to be a noun, the following internal wiring of the verb to be is induced by the one of intersective adjectives:

[diagram: internal wiring of is, showing that Bob is sad equals sad Bob]
2.3.3 Compact verbs

Recently, in [24], an internal wiring for a special type of verbs was proposed. The main idea is that the verb’s only role is to impose an adjective $\text{verb}_s$ on the subject and an adjective $\text{verb}_o$ on the object. For example, paints will put a paintbrush in the subject’s hand, and makes the object change colour. For many verbs this description can be used as an adequate first approximation. So a transitive verb is semi-Cartesian if it has the following internal wiring:

[diagram: $\text{verb}_s$ and $\text{verb}_o$ side by side, with sentence type s]    (1)

As indicated in the picture, this wiring implies that its sentence type consists of two noun-types, which is a very natural choice to make. However, a verb like being married to is clearly not of that form, as it expresses an entanglement of the specific subject and the specific object. The natural generalisation of the idea of semi-Cartesian verbs is then:

[diagram: marries as a single box, with sentence type s]    (2)

This representation was first used in [35], and also in [44], and we call it the compact representation of transitive verbs.
2.4 Models of meaning

Given that diagrams live in abstract (a.k.a. axiomatic) categories, they allow for a wide range of concrete models. It suffices to pick a concrete category that has cups and caps, and then wires become the objects (a.k.a. spaces) and boxes become the morphisms (a.k.a. maps between these spaces).

In NLP, the vector space model takes wires to be spaces of distributions and boxes to be linear maps. The distributions are empirically established, by means of counting co-occurrences with a selected set of basis words [71]. Adjusting this model to DisCoCat, the cups and caps are:

$$\text{cap} := \sum_i |ii\rangle \qquad\qquad \text{cup} := \sum_i \langle ii|$$

and spiders are in one-to-one correspondence with orthonormal bases [27]. Explicitly, given an orthonormal basis $\{|i\rangle\}_i$ the spiders arise as follows:

$$\text{spider} := \sum_i |i \ldots i\rangle\langle i \ldots i|$$

Hence, caps and cups are instances of spiders, and so are copy and merge:

$$\text{copy} := \sum_i |ii\rangle\langle i| \qquad\qquad \text{merge} := \sum_i |i\rangle\langle ii|$$
Another model employed in DisCoCat instead considers sets and relations [28]. By thinking of relations as Boolean-valued matrices this model is closely related to the previous one, and can be thought of as ‘possibilistic’ distributions (contra ‘probabilistic’). This particular model also justifies interpreting merge as AND, since in this model states are subsets, and merge then corresponds to intersecting these states. One can also take wires to be spaces of density matrices and boxes to be superoperators. This model was initially introduced to account for ambiguity of meaning [41, 61, 62], and was also used to capture lexical entailment [6, 9].
It will play a central role in this paper, although with a somewhat different interpretation. Again, density matrices can be established empirically [62]. Another recently developed model takes the convex subsets of certain convex spaces [13] to be the wires, following Gärdenfors’ conceptual spaces program [32]. This model represents meanings in a manner that appeals directly to our senses. A plethora of generalisations thereof are in [20]. Again, empirical methods can be used to establish meanings.
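As a small illustration of the vector space model, the following Python sketch computes a sentence meaning by contracting a verb tensor with its subject and object, which is what the cups of Sec. 2.2 amount to in a fixed orthonormal basis, and implements the merge spider as a pointwise product. All numbers are made up for the example, the two-dimensional noun and sentence spaces are an assumption, and the function names are hypothetical.

```python
from typing import List

Vector = List[float]
Tensor3 = List[List[List[float]]]  # verb[i][k][j]: subject index i, sentence index k, object index j


def sentence_meaning(subject: Vector, verb: Tensor3, obj: Vector) -> Vector:
    """Apply the cups: sum over the noun indices shared by verb, subject and object."""
    sentence_dim = len(verb[0])
    return [
        sum(subject[i] * verb[i][k][j] * obj[j]
            for i in range(len(subject))
            for j in range(len(obj)))
        for k in range(sentence_dim)
    ]


def merge(a: Vector, b: Vector) -> Vector:
    """Merge spider sum_i |i><ii| in the chosen basis: a pointwise product."""
    return [x * y for x, y in zip(a, b)]


# Toy data (assumed, not empirical): two basis nouns and a two-dimensional sentence space.
alice: Vector = [1.0, 0.0]
bob: Vector = [0.0, 1.0]
hates: Tensor3 = [
    [[0.0, 1.0], [0.0, 0.0]],  # subject index 0
    [[0.0, 0.0], [1.0, 0.0]],  # subject index 1
]

print(sentence_meaning(alice, hates, bob))  # Alice hates Bob
print(sentence_meaning(bob, hates, alice))  # Bob hates Alice
print(merge([1.0, 0.0], [1.0, 1.0]))        # with 0/1 entries, merge acts as AND
```

On 0/1-valued vectors the pointwise product coincides with intersecting the corresponding subsets, which is the sense in which the sets-and-relations model justifies reading merge as AND.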
2.5 Comparing meanings

Once we have computed the meanings of sentences in a concrete model, we can compare these meanings. Here are two examples of doing so:

Similarity. Establishing similarity is one of the standard tasks in NLP, and one does this in terms of a distance-measure: the less the distance, the more similar meanings are. One can use the inner-product (or some function thereof), which for sentences $\sigma_1$ and $\sigma_2$ is diagrammatically denoted as:

[diagram: the state $\sigma_1$ composed with the flipped state $\sigma_2$]

as one can indeed think of an inner-product as the composition of one state with the adjoint of another state (cf. Sec. 2.1). This manner of comparing meanings generalises to arbitrary dagger compact closed categories, such as the category of density matrices and completely positive maps. In the concrete representation of density matrices we obtain:

$\mathrm{Tr}(\sigma_2 \sigma_1)$ = [diagram: $\sigma_1$ composed with $\sigma_2$] = [diagram: $\bar{\sigma}_1$ composed with $\sigma_2$]

where we used the fact that density matrices are self-adjoint, and that the transpose of the adjoint is the conjugate (which is indicated by the bar).

Graded entailment. One may want to know if one meaning entails another one. Given the noisiness of empirical data, a useful strict entailment relation might be hard to achieve. Instead, a graded entailment relation that tells us the degree to which one meaning entails another one is more useful. Strict entailment relations correspond to partial orderings, and graded ones correspond to a labeled extension thereof. Still, many models of meaning in use, like the vector space model, don’t even admit a natural non-trivial graded entailment structure, and it is here that density matrices have a role to play. As shown in [9], for those such a structure does exist and is well-studied, namely the Löwner ordering for positive matrices [58]:
$$\sigma_1 \leq_k \sigma_2 \ \Leftrightarrow\ \sigma_2 - k\,\sigma_1 \text{ is positive}$$

It is useful to play around a bit with the scaling of the density matrices. If one normalises density matrices by setting the trace to 1, then there are no strict comparisons. On the other hand, when one sets the largest eigenvalue to 1, then we get for the specific case of projectors (i.e. scaled density matrices with all non-zero eigenvalues the same):

$$\sigma_1 \leq \sigma_2 \ \Leftrightarrow\ \sigma_1 \circ \sigma_2 = \sigma_1$$

just like in Birkhoff-von Neumann quantum logic [12], which is then naturally interpreted as propositional inclusion. Some alternative scalings are in [78].
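Both comparisons are easy to try out on toy data. The Python sketch below restricts itself, as a simplifying assumption, to real symmetric 2×2 matrices, for which positivity can be tested via trace and determinant; the matrices `dog` and `animal` are made-up projectors, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def trace(a: Matrix) -> float:
    return sum(a[i][i] for i in range(len(a)))


def similarity(sigma1: Matrix, sigma2: Matrix) -> float:
    """Tr(sigma2 sigma1), the inner product of two density matrices."""
    return trace(matmul(sigma2, sigma1))


def is_positive_2x2(m: Matrix, tol: float = 1e-12) -> bool:
    """A real symmetric 2x2 matrix is positive iff its trace and determinant are non-negative."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return tr >= -tol and det >= -tol


def entails(sigma1: Matrix, sigma2: Matrix, k: float = 1.0) -> bool:
    """Graded Loewner ordering: sigma1 <=_k sigma2 iff sigma2 - k*sigma1 is positive."""
    diff = [[sigma2[i][j] - k * sigma1[i][j] for j in range(2)] for i in range(2)]
    return is_positive_2x2(diff)


# Toy projectors, scaled so that the largest eigenvalue is 1.
dog: Matrix = [[1.0, 0.0], [0.0, 0.0]]
animal: Matrix = [[1.0, 0.0], [0.0, 1.0]]
half_animal: Matrix = [[0.5, 0.0], [0.0, 0.5]]  # animal normalised to trace 1

print(similarity(dog, animal))
print(entails(dog, animal), entails(animal, dog))
print(matmul(dog, animal) == dog)        # projector criterion
print(entails(dog, half_animal))         # strict comparison fails after trace-normalisation
print(entails(dog, half_animal, k=0.5))  # but a graded one with k = 0.5 holds
```

The last two lines illustrate the remark on scaling: with trace 1 the strict ordering no longer relates the two matrices, while a grade of k = 0.5 still does.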
3 Features and flaws of DisCoCat

Here is a summary of the main features of DisCoCat:

Feature 1. The key feature of DisCoCat initially identified was that meanings of sentences with different grammatical structure still live in the same space, something that is crucial for comparing meanings (cf. Sec. 2.5). Earlier approaches that combined grammar and meaning, most notably, Smolensky’s connectionist cognitive architecture [75], did not have this feature. The quest for a model that does so was put forward in [15].

Feature 2. The DisCoCat algorithm that assigns meaning to sentences given the meanings of its words and its grammatical structure can be presented as an intuitive diagram that clearly shows how word-meanings interact to produce the meaning of the sentence.

Feature 3. Wire-structure can be used in DisCoCat to provide meanings of functional words as we did in Sec. 2.3 (while in standard NLP they are usually treated as noise), and to simplify the representation of words with composite types like adjectives and verbs as we did in Secs. 2.3.2 and 2.3.3.

Feature 4. While in this paper we used pregroups, as argued in [21, 34], DisCoCat also supports other categorial grammars such as standard Lambek calculus [47], Lambek-Grishin calculus [36] and CCG [76].

Feature 5. As discussed in Sec. 2.4, in DisCoCat word meanings can live in many kinds of spaces provided these organise themselves in a monoidal category that matches the structure of the grammar.

Feature 6. DisCoCat allows for integrating grammar and meaning in one whole. In the above we indeed had examples of compositional structure entering meaning-boxes, which then interact with the grammatical structure.⁴

⁴ Admittedly, more work needs to be done for further exploiting this feature. Crucially, while initial formulations of DisCoCat either used a categorical product or a functor in order to combine grammar and meaning [28, 64], the way forward is to assume a compositional structure encompassing both grammar and aspects of meaning.

Feature 7. Contra the bag-of-words model in NLP [38], the spaces for different grammatical types vary, which reflects the fact that their functionality within sentences is very different. In other words, meaning spaces are typed, with all the usual advantages. If words can play different grammatical roles, e.g. both as noun and adjective, then there also are canonical ways for interconverting these, e.g. a noun becomes an adjective as follows:

[diagram: red as a noun → red as an adjective]

Feature 8. Proof-of-concept experiments showed that DisCoCat outperformed its competitors for certain academic benchmark tasks [35, 43].

As already indicated in the introduction, DisCoCat has some shortcomings:

Flaw 1. DisCoCat does not answer the question of how the meanings of sentences compose in order to provide the meaning of an entire text.

Flaw 2. DisCoCat assumes words to have a fixed meaning, while in text meanings will typically evolve.

Flaw 3. DisCoCat doesn’t determine the sentence type.

We will now resolve each of these flaws in one go!

4 Composing sentences: meet DisCoCirc

We will represent the $|\sigma_i|$ words in a sentence $\sigma_i$ as a horizontal string, and the $|\tau|$ sentences in a text $\tau$ as a vertical stack:

$$\begin{array}{c} \text{Word}^{1}_{1} \ldots \text{word}^{|\sigma_1|}_{1}. \\ \vdots \\ \text{Word}^{1}_{|\tau|} \ldots \text{word}^{|\sigma_n|}_{|\tau|}. \end{array}$$

4.1 Naive composition of sentences for DisCoCat

In DisCoCat, each of the sentences (cf. those in Sec. 2) is a state, i.e. they have a single output of sentence-type, and no input. This substantially restricts the manner in which we can compose them. The structure available to us in DisCoCat consists of wires and spiders. The desirable thing to do is to also rely on this structure for composing sentences, so that word-meaning composition can interact with sentence-meaning composition. But then, pretty much the only thing one can do is to take the conjunction of all sentences:

[diagram: the sentence states $\sigma_1$, $\sigma_2$, ..., $\sigma_n$ merged by AND spiders]    (3)
We could call this the bag-of-sentences model, since without changing the diagram we can flip the vertical order of sentences, or not even give one:

[diagram: the sentence states stacked in reverse order $\sigma_n$, ..., $\sigma_2$, $\sigma_1$, and placed side by side without any order]

An example where this makes perfect sense is:

It is cloudy. Liverpool has beaten Napoli. Brexit has become a total mess.

as these sentences clearly commute. A non-example is:

Bob is born. Bob drinks beer. Bob dies.

Here one could still argue that the meaning of the sentences now dictates their ordering, so the latter could be extracted even if the sentences arrive in a bag, of course, at the cost of having to know the meanings of all words. The argument completely breaks down here:

Add egg yolk and salt. Whisk mix for 20 seconds. Add mustard and acid. Whisk mix for 30 seconds. Slowly add oil while whisking.

as the order of adding ingredients is key to making good mayonnaise. Changing the order would still result in a meaningful recipe, but not mayonnaise.
More importantly, what is also clear from this example is that a lot more is going on besides the order of things: the ingredients and actions interact with each other similarly to how words in a sentence interact with each other as described in Sec. 2. Our bag-of-sentences model doesn’t reflect any of that. We will now make that interaction structure explicit, while still only making use of the structure available to us in DisCoCat. Of course, something will need to change, and rather than a conservative extension of the DisCoCat framework, we introduce a fundamentally modified framework, DisCoCirc, while retaining the features of the DisCoCat framework listed in Sec. 3.
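The order-insensitivity of the bag-of-sentences model of Sec. 4.1 can be illustrated with a small sketch. It assumes, purely for illustration, that each sentence meaning is a short integer vector and that AND is the entrywise product, which is one common concrete reading of a merging spider in a fixed basis. The vectors and function names are hypothetical.

```python
from functools import reduce
from itertools import permutations

def merge(u, v):
    """Entrywise product of two equal-length vectors (assumed model of AND)."""
    return [a * b for a, b in zip(u, v)]

def bag_of_sentences(sentence_states):
    """Conjoin all sentence states into one text meaning."""
    return reduce(merge, sentence_states)

# Toy sentence meanings (hypothetical integer vectors, so arithmetic is exact).
sigma_1 = [2, 1, 3]
sigma_2 = [1, 4, 0]
sigma_3 = [5, 1, 2]

# Collect the text meaning obtained for every ordering of the three sentences.
meanings = {
    tuple(bag_of_sentences(order))
    for order in permutations([sigma_1, sigma_2, sigma_3])
}
print(meanings)
```

All six orderings give the same vector, which is exactly why this model cannot distinguish a recipe from its shuffled version.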
4.2 Sentences as I/O-processes
Consider the following example:
Alice is a dog. Bob is a person. Alice bites Bob.
Clearly, the meaning of the third sentence crucially depends on what we learn about the meaning of the nouns Alice and Bob in the first two sentences, turning dog bites man into man bites dog if Bob were to be a dog and Alice were to be a person. Also, before the 1st sentence is stated, Alice is just a meaningless name, and the same goes for Bob until the 2nd sentence is stated. So the meaning of Alice and Bob evolves as the text progresses, and it is the sentences that update our knowledge about Alice and Bob. What we propose is that the 3rd sentence, which would look like:
[Diagram: Alice bites Bob, with the nouns Alice and Bob as states and no canonical sentence type.]
in DisCoCat, would instead be drawn like this:
[Diagram: bites acting on an Alice-wire and a Bob-wire; noun meanings are wires, and the sentence type has an Alice-wire and a Bob-wire.]
So in particular, the nouns Alice and Bob are now not states but wires (a.k.a. types) and the sentence is an I/O-box:
[Diagram (4): the I/O-box Alice bites Bob.]
with the nouns Alice and Bob both as inputs and as outputs. In this way, the sentence can act on the nouns and update their meanings. Hence:
A sentence is not a state, but a process, that represents how words in it are updated by that sentence.
For the remainder of this paper we will restrict ourselves to updating nouns, but the same applies to other word-types. Using the wire-representation of the verb to be of Sec. 2.3.2, the wire-representation of Alice is a dog becomes:
[Diagram: the box is, applied to the Alice-wire and the state dog, equals the state dog attached to the Alice-wire.]
and similarly, that of Bob is a person becomes:
[Diagram: the state person attached to the Bob-wire.]
In Sec. 2.3 we mentioned that while our treatment of to be leads to a noun-type, this wouldn’t be a problem anymore in DisCoCirc. And indeed, within our new sentences-as-processes realm we obtain the same type as the sentence (4) simply by adjoining a ‘passive’ wire:
[Diagram: Alice is a dog with a passive Bob-wire adjoined, and Bob is a person with a passive Alice-wire adjoined.]
This passive wire stands for the fact that the ancillary noun is part of the text as a whole, but doesn’t figure in this particular sentence. By adding it, the I/O-types of all the sentences are the same, so they can be composed:
[Diagram (5): on the Alice-wire and the Bob-wire, the 1st sentence attaches dog to Alice, the 2nd sentence attaches person to Bob, and the 3rd sentence applies bites to both wires.]
So in general, given a text, we end up with a wire diagram that looks like this:
[Diagram (6): a circuit of sentence boxes σ1, σ2, σ3, σ4, σ5 acting on the noun wires.]
where the sentences themselves also have a wire diagram. In particular, it’s a process, and this process alters our understanding of words in the text. This yields another slogan:
Text is a process that alters meanings of words
Using the form (2) for biting we obtain for Alice bites Bob:
[Diagram (7): the box bites on the Alice-wire and the Bob-wire, rewritten using the form (2).]
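Diagram (5) now simplifies to:
[Diagram (8): dog attached to the Alice-wire and person attached to the Bob-wire, followed by bites.]
and is clearly distinct from the case of man bites dog, which would be:
[Diagram: person attached to the Alice-wire and dog attached to the Bob-wire, followed by bites.]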
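The idea that sentences are processes acting on noun wires can be mimicked in a deliberately crude toy model. The sketch below is not the diagrammatic calculus of this paper: it assumes that a noun meaning is just a set of facts and that each sentence is a function updating those sets. All function names are hypothetical.

```python
def is_a(noun, prop):
    """Sentence 'noun is a prop': attaches a property to one noun wire."""
    def gate(state):
        updated = dict(state)
        updated[noun] = state[noun] | {prop}
        return updated
    return gate

def bites(subj, obj):
    """Sentence 'subj bites obj': each noun learns about the other's current facts."""
    def gate(state):
        updated = dict(state)
        updated[subj] = state[subj] | {("bites", p) for p in state[obj]}
        updated[obj] = state[obj] | {("bitten by", p) for p in state[subj]}
        return updated
    return gate

def run(text, nouns):
    """Apply the sentences in text order, starting from no prior understanding."""
    state = {noun: frozenset() for noun in nouns}
    for gate in text:
        state = gate(state)
    return state

nouns = ["Alice", "Bob"]
dog_bites_man = run([is_a("Alice", "dog"), is_a("Bob", "person"), bites("Alice", "Bob")], nouns)
man_bites_dog = run([is_a("Alice", "person"), is_a("Bob", "dog"), bites("Alice", "Bob")], nouns)
bite_first = run([bites("Alice", "Bob"), is_a("Alice", "dog"), is_a("Bob", "person")], nouns)

print(dog_bites_man["Alice"])
print(man_bites_dog["Alice"])
print(bite_first["Alice"])
```

In this toy model the third run ends with less information about Alice than the first, since the biting happened before anything was known about Bob. This is the kind of order-dependence that the bag-of-sentences model cannot express.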
One thing we can now also do is show that different texts can have the same meaning. For example, the single sentence:
Alice who is a dog bites Bob who is a person.
which has as its diagram:
[Diagram: Alice who is dog bites Bob who is person, on the Alice-wire and the Bob-wire.]
can, using spider-fusion, be directly ‘morphed’ into the diagram:
[Diagram: Alice is dog and Bob is person, followed by bites.]
One thing we may be aiming for is a network that shows how the different nouns are related, or maybe just whether or not they are related at all. How the network is connected will depend on, for example, the kinds of verbs that appear in the text, and which subject-object pairs they connect. A further simplification can be made if we only require binary knowledge, e.g. knows vs. doesn’t know, which we can respectively represent as:
[Diagrams: knows; doesn’t know.]
In the resulting network we then get clusters of connected nouns.
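To make the notion of clusters of connected nouns concrete, here is a minimal sketch. It assumes the binary relation (e.g. knows) has already been extracted from the text as a list of noun pairs; the extra names Carol, Dave and Eve and the function name are hypothetical.

```python
from collections import defaultdict

def noun_clusters(nouns, related_pairs):
    """Group nouns into clusters connected by a symmetric binary relation."""
    neighbours = defaultdict(set)
    for a, b in related_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, clusters = set(), []
    for start in nouns:
        if start in seen:
            continue
        # Depth-first search collecting everything reachable from `start`.
        cluster, stack = set(), [start]
        while stack:
            noun = stack.pop()
            if noun in cluster:
                continue
            cluster.add(noun)
            stack.extend(neighbours[noun] - cluster)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters

nouns = ["Alice", "Bob", "Carol", "Dave", "Eve"]
knows = [("Alice", "Bob"), ("Bob", "Carol"), ("Dave", "Eve")]
clusters = noun_clusters(nouns, knows)
print(clusters)
```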
4.3 The use of states
In the main example of Sec. 4.2 we made use of states to represent dog and person. The reason we use states for them is that the text doesn’t help us understand these nouns. In contrast, the text is all about helping us understand Alice and Bob and their interactions. So we can clearly distinguish two roles for nouns:
• Static nouns: the text does not alter our understanding of them.
• Dynamic nouns: the text does alter our understanding of them.
This distinction between dynamic and static nouns may seem somewhat artificial, and indeed, it exists mainly for practical purposes. From a foundational perspective the natural default would be to let all nouns be dynamic, and not just nouns but all words, since also adjectives and verbs may be subject to change of meaning. However, taking only some nouns to be dynamic is a very reasonable simplification given that in a typical text the meaning of many other words would not alter in any significant manner. Doing so significantly simplifies diagrams, and in particular their width. This does give rise to the practical question of how to decide on the ‘cut’ between the dynamic and static nouns. We briefly address this in Sec. 6.2.
Also in the main example of Sec. 4.2, we had no prior understanding of either Alice or Bob (see footnote 5). In general we may already have some prior understanding about certain nouns. One way to specify this is by means of initial states, to which we then apply the circuit representing the text:
[Diagram: initial states ω1, ω2, ω3 and plain dots feeding into the circuit of sentence boxes σ1, ..., σ5.]
where the initial states represented by plain dots stand for the case of no prior understanding (cf. Alice and Bob earlier). Without changing the circuit we can also put them where they enter the text:
[Diagram: the same circuit, with the initial states ω1, ω2, ω3 placed where their wires first enter the sentence boxes σ1, ..., σ5.]
Of course, once we insert initial states, we cannot precompose with other text anymore. A straightforward way to avoid this problem is by instead using initial processes:
[Diagram: an initial process ω acting on a noun wire.]
which adjoin understanding to a type just like adjectives do.
5 This is in particular the case given that the names Alice and Bob are gender neutral (cf. Alice Cooper and Bob in Blackadder II & IV).
4.4 DisCoCat from DisCoCirc
We now show that DisCoCat is an instance of DisCoCirc. Assuming that (1) text is restricted to a single sentence, and that (2) nouns are static, we exactly obtain DisCoCat sentences. Alternatively, assuming that (2’) dynamic nouns have an initial state, we also obtain a DisCoCat sentence:
[Diagram: the verb ωv applied to the initial states ωs and ωo of the subject and object wires equals the DisCoCat sentence built from the states ωs, ωv and ωo.]
4.5 Individual and subgroup meanings
In many cases one would just be interested in the meaning of a single dynamic noun, rather than the global meaning of a text. Or, maybe one is interested in the specific relation of two or more dynamic nouns. The way to achieve this is by discarding all others. For example, here we care about the meaning of the 2nd dynamic noun:
[Diagram: the circuit with initial states ω1, ω2, ω3 and sentence boxes σ1, ..., σ5, in which all output wires except that of the 2nd dynamic noun are discarded.]
while here we care about the relationship between the 2nd and the 5th one:
[Diagram: the same circuit, in which all output wires except those of the 2nd and the 5th dynamic noun are discarded.]
The latter can for example teach us whether two agents agree or disagree, or whether they cooperate or anti-cooperate. In the case of three or more agents it can tell us about more refined forms of interaction, e.g. whether they cooperate pairwise or globally (cf. the W-state vs. the GHZ-state in quantum theory [31]). Subgroups may also arise naturally, when agents vanish from the story, e.g. by being murdered. In that case discarding, instead of being of an epistemic nature, is actual ontic vanishing:
[Diagram: the same circuit, in which one noun wire vanishes partway through the text.]
This vanishing can even be a part of the verb structure for those verbs that induce the vanishing of an object, for example:
[Diagram (9): the box kills on the Alice-wire and the Bob-wire, with the vanishing built into the verb.]
Of course, if it remains of importance who the actual killer is, then we shouldn’t use the simple semi-Cartesian verb structure.
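Discarding can be made concrete in the density-matrix model sketched later in Sec. 6.1, where the discarding effect corresponds to the trace. The following sketch assumes two dynamic nouns, each with a two-dimensional meaning space, and discards the second one by a partial trace. The helper name and the toy joint state are hypothetical.

```python
from fractions import Fraction

def partial_trace_second(rho, dim_a, dim_b):
    """Discard the second system of a (dim_a * dim_b)-dimensional density matrix."""
    return [
        [
            sum(rho[i * dim_b + j][k * dim_b + j] for j in range(dim_b))
            for k in range(dim_a)
        ]
        for i in range(dim_a)
    ]

half = Fraction(1, 2)

# Joint meaning of two nouns: an equal mixture of |00><00| and |11><11|,
# i.e. the two nouns are perfectly correlated.
rho_joint = [
    [half, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, half],
]

# Meaning of the first noun alone, after discarding the second.
rho_first = partial_trace_second(rho_joint, 2, 2)
print(rho_first)
```

For this particular joint state the remaining noun ends up in the maximally mixed state: all the information was in the correlation with the discarded noun.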
4.6 Example
Consider the following text (see footnote 6):
Harmonica (is the brother of) Claudio. Frank hangs Claudio. Snaky (is in the gang of) Frank. Harmonica shoots Snaky. Harmonica shoots Frank.
As a diagram this becomes:
[Diagram: on the wires Snaky, Harmonica, Claudio and Frank, the boxes brother, hangs, gang, shoots and shoots, in text order.]
6 Loosely adapted from “C’era una volta il West”, Sergio Leone, 1968.
Using spider-fusion, transposition of states into effects, and identifying 2-legged spiders with either caps, cups or plain wires, simplifies this to:
[Diagram (10): the simplified circuit on the wires Harmonica, Snaky, Frank and Claudio, with brother, hangs, gang, shoots and shoots.]
Notice that the fact that one party induces the effect, while the other one is subjected to termination, matches the grammatical subject-object distinction. There are indeed two dimensions to diagram (10), a static one, representing the connections between the dynamic nouns:
[Diagram: a graph on the nodes Snaky, Harmonica, Claudio and Frank, with edges labelled shoots (Harmonica, Snaky), gang (Snaky, Frank), brother (Harmonica, Claudio), shoots (Harmonica, Frank) and hangs (Frank, Claudio).]
as well as a temporal-causal structure associated to these.
4.7 Other cognitive modes
The mathematical formalism presented here for text structure may be equally useful for modelling other cognitive modes, not just the linguistic one. One obvious example is the visual mode, which we can think of as movies. Here dynamic nouns correspond to the characters of the movie, and sentences to scenes. The grammatical structure then corresponds to interactions of characters. For example, the scene:
[Image: a movie scene.]
corresponds to the sentence:
Harmonica shoots Frank.
The subject corresponds to Harmonica (played by Bronson), the object to Frank (played by Fonda) and the verb is the shooting of Frank by Harmonica:
[Diagram: the characters Harmonica and Frank as wires entering the box shoots (Harmonica shooting, Frank being shot), with outputs alive Harmonica and dead Frank.]
More broadly the verb corresponds to the interaction of the characters. Text corresponds to sequences of scenes, which ‘act’ on the characters that take part in them, hence forming a circuit. Having a matching diagrammatic formalism for text and for movies then allows one to make translations between these, via the corresponding diagrams. For example, we can translate the example of Sec. 4.6 to a movie:
[Diagram: diagram (10), on the wires Harmonica, Snaky, Frank and Claudio, mapped (→) to a sequence of movie snapshots.]
where the snapshots represent the entire scene they are part of.
5 Logic and language
Propositional logic emerged from language, under the impetus of Aristotle and others, translating words like and, or and not into logical connectives AND, OR and NOT. DisCoCirc reinforces that link with several branches of modern non-classical logic. Here are two proof-of-concept examples.
5.1 Dynamic epistemic logic from language
Epistemic logic is concerned with how one represents knowledge in logical terms, and dynamic epistemic logic (DEL) [8, 7] with how this logic gets updated when acquiring new knowledge, e.g. from communication, using language. Since in DisCoCirc we have a built-in update mechanism, one may expect that DEL-update could emerge from DisCoCirc-update, and hence DEL could directly emerge from language structure. This seems indeed to be the case. In order to establish this, the types Alice and Bob will now represent the knowledge of those agents, rather than what we know about them. Sentences describing communication of knowledge typically involve a doubly transitive verb (i.e. one that has both a direct and an indirect object), or alternatively, a preposition like to. For example:
Alice tells Bob (a) secret.
Alice tells (a) secret to Bob.
As we haven’t proposed a wiring yet for a doubly transitive verb nor for to, we will do so now, and we will also give an internal wiring for tells specific to this epistemic context. Grammatical wirings are taken from [53]. Setting:
[Diagrams: internal wirings of the doubly transitive tells (Alice, Bob, secret) and of the transitive tells together with to (Alice, secret, Bob).]
the two sentences indeed result in the same simplified diagram:
[Diagram (11): both sentence diagrams, on the Alice-wire and the Bob-wire with the state secret, reduce to the same diagram.]
From this then follows an obvious wiring of knows:
[Diagram: knows, attaching the state secret to the Bob-wire.]
This is the same wiring as we had for is, which makes sense, since being in an ontic context translates to knowing in an epistemic context. We hope to further develop this link in a dedicated forthcoming paper, being guided by the conviction that a diagrammatic framework for DEL can be established that directly draws from spoken language, and that moreover allows us to accommodate a wide variety of models beyond the propositional and probabilistic ones.
5.2 Linear and non-linear and
Another feature of DisCoCirc is that it dictates different representations of and, namely, when either conjoining properties that subjects possess, or, conjoining the subjects themselves. In linear logic (LL) lingo [33, 72], these respectively correspond to a linear conjunction and a non-linear conjunction. Hence, these different representations correspond to uses of and with different meanings. Consider the sentence:
Alice wears (a) hat and (a) scarf.
and the sentence:
Alice and Bob wear (a) hat.
The fundamental difference between these sentences is that:
• Alice wearing both a hat and a scarf only requires one Alice, while,
• Alice and Bob both wearing a hat requires two hats.
In the case of the former we assign two properties to a single agent, namely wearing a hat and wearing a scarf, while in the case of the latter a single property, wearing a hat, is attributed to two agents. This means that we require (a.k.a. ‘consume’ in LL lingo) the property wearing a hat twice, hence, it needs to be copied. Put differently, non-linear conjunction allows for copying, so it is the AND we have in classical propositional logic. A physics analogy would be that and in the 1st sentence refers to having two physical particles, e.g. a proton and an electron, while and in the 2nd sentence lists two properties of a single particle, e.g. position and velocity.
The non-linear and is what we have been using until now all the time in this paper by means of spiders. So it is the linear and that needs an alternative treatment. We will do what is standard when representing two things in string diagrams (see [23] Section 3.1.1), namely putting two wires side-by-side. So the different representations of AND look as follows:
[Diagrams: hat and scarf; Alice and Bob.]
Then we get for the 1st sentence:
[Diagram: wears acting on the Alice-wire, with hat and scarf.]
and for the 2nd sentence we instead have:
[Diagram: wear acting on the Alice-wire and the Bob-wire, with hat.]
where internally in wear some copying must happen:
[Diagram: wear decomposed into two copies of wears.]
The difference is also apparent in how each of the sentences can be decomposed in two sentences, which in the case of the 1st one yields a sequential composition, and in the case of the 2nd one a parallel composition:
[Diagrams: Alice wears hat followed by Alice wears scarf (sequential); Alice wears hat alongside Bob wears hat (parallel).]
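The contrast between the sequential and the parallel decomposition can be mimicked in a toy model in which a noun meaning is a set of facts and a sentence is a function updating one wire. This is only an illustrative sketch under that assumption, not the diagrammatic semantics itself, and the function names are hypothetical.

```python
def wears(noun, item):
    """Gate recording on one noun wire that the noun wears the item."""
    def gate(state):
        updated = dict(state)
        updated[noun] = state[noun] | {("wears", item)}
        return updated
    return gate

def compose(*gates):
    """Apply gates one after the other, first gate first."""
    def composite(state):
        for gate in gates:
            state = gate(state)
        return state
    return composite

start = {"Alice": frozenset(), "Bob": frozenset()}

# "Alice wears hat and scarf": two updates in sequence on the same wire.
hat_and_scarf = compose(wears("Alice", "hat"), wears("Alice", "scarf"))

# "Alice and Bob wear hat": the property is used twice, once per wire. The two
# gates touch different wires, so they sit side by side and their order is irrelevant.
both_wear_hat = compose(wears("Alice", "hat"), wears("Bob", "hat"))
both_wear_hat_swapped = compose(wears("Bob", "hat"), wears("Alice", "hat"))

print(hat_and_scarf(start))
print(both_wear_hat(start) == both_wear_hat_swapped(start))
```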
6 Concrete models
We gave the beginnings of a compositional structure of word and sentence composition, with (multi-)wires and boxes as primitives, and illustrated how one reasons with these in the absence of concrete models. We now provide some ideas for which kinds of models are particularly suitable for DisCoCirc.
6.1 Sketch of a concrete model
As we are dealing with updating and corresponding information gains, the vector space model of NLP (see Section 2.4) won’t do. Density matrices do have a clear notion of information gain, and for this reason form the basis of quantum information theory (see e.g. [11]). We now describe the ingredients of a DisCoCirc model based on density matrices.
States, e.g. word meanings:
[Diagram: a state with output wires.]
are density matrices. An example of a state is the maximally mixed state of quantum theory, which has a density matrix corresponding to the (scaled) identity, and represents the state of no information whatsoever. The states of perfect information correspond to the pure states of quantum theory, that is, matrices arising as doubled vectors $|\psi\rangle\langle\psi|$. The effect:
[Diagram: the discarding effect.]
corresponds to the trace. General processes, e.g. sentence meanings:
[Diagram: a box with input and output wires.]
correspond to trace-preserving completely positive maps. Pure processes are those that send pure states to pure states, and these arise from linear maps as the Kraus forms $f^\dagger \circ - \circ f$. In this manner, the spiders:
[Diagram: spiders.]
of Sec. 2.4 become part of this model too. However, there are reasons to move away from these particular choices of spiders, and even part of the axioms for spiders. We give some suggestions here of some potential alternatives. The cups and caps of Sec. 2.4 remain perfectly ok, so they respectively become:
$$\mathrm{cup} := \sum_{ij} |ii\rangle\langle jj| \qquad\qquad \mathrm{cap} := \sum_{ij} \langle ii| \circ - \circ |jj\rangle$$
The main role of merge is to assign properties. One could set:
$$\mathrm{merge}_x := P_x \circ - \circ P_x \qquad (12)$$
where $P_x$ is the orthogonal projector on the subspace corresponding to property x. As suggested in [83, 82] it is indeed natural to think of subspaces as properties, just like in Birkhoff-von Neumann quantum logic [12] which we already mentioned earlier. This operation inherits associativity from diagram composition, which is an obvious minimal requirement. It is not commutative, but that also makes perfect sense when thinking of the changing colours of a chameleon, where post-composition should discard previous colours. After re-scaling, projectors become a special case of density matrices, and using the spectral decomposition $\rho_x = \sum_i p_i P_i$ one can associate properties to general density matrices, for example as follows:
$$\mathrm{merge}_x := \sum_i p_i \, P_i \circ - \circ P_i$$
In the follow-up paper [25] we present a class of similar generalisations.
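The projector-based update (12) and its non-commutativity can be checked numerically on a single two-dimensional wire. The sketch below uses exact rational arithmetic and renormalises after updating; the choice of projectors (onto the basis vector |0> and onto the diagonal vector |+>) and all function names are assumptions made for illustration.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def normalise(m):
    """Rescale a non-zero positive matrix to unit trace."""
    t = trace(m)
    return [[x / t for x in row] for row in m]

def assign_property(projector, rho):
    """The update P . rho . P of (12), without renormalising."""
    return matmul(matmul(projector, rho), projector)

def assign_weighted(weighted_projectors, rho):
    """The update sum_i p_i P_i . rho . P_i for a list of (p_i, P_i) pairs."""
    n = len(rho)
    total = [[Fraction(0)] * n for _ in range(n)]
    for p, proj in weighted_projectors:
        term = assign_property(proj, rho)
        total = [[total[r][c] + p * term[r][c] for c in range(n)] for r in range(n)]
    return total

half = Fraction(1, 2)
mixed = [[half, 0], [0, half]]          # maximally mixed state: no information
p_zero = [[Fraction(1), 0], [0, 0]]     # projector onto |0>
p_plus = [[half, half], [half, half]]   # projector onto |+>

# Assign the two properties in both orders, starting from no information.
zero_last = normalise(assign_property(p_zero, assign_property(p_plus, mixed)))
plus_last = normalise(assign_property(p_plus, assign_property(p_zero, mixed)))
print(zero_last)
print(plus_last)
```

In this example the property assigned last wins in each order, in line with the chameleon remark above: the later colour replaces the earlier one.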
6.2 Computing text meaning
The following steps produce the data needed to derive text meaning:
1. Identify sentences using punctuation.
2. Establish grammatical types of all words using standard parsers.
3. Identify the dynamic nouns. As this concerns a new concept, this is also a new task and hence will need additional research. One could rely to some extent on grammar and multiplicity of occurrence throughout the text, but actual meaning will likely also play a role. For certain problems the dynamic nouns may be a given, namely those that are of particular interest as part of the statement of the problem, for example, when analysing the relationship of certain parties of particular interest.
4. Form a diagram (see Sec. 4.2):
• The dynamic nouns are the systems of the circuit.
• The sentences are the gates of the circuit.
• The internal wiring of the gates is given by the grammar.
5. Establish meanings of states, which can be done using standard methods, or those previously developed for DisCoCat.
In order to obtain the actual meaning of the text:
6. Insert all meanings into the diagram.
7. Compute the resulting (possibly simplified) diagram. One way to do so is to decompose the diagram in tensor products and sequential compositions of boxes, caps/cups and dot operations. A more direct manner for computing diagrams is outlined in Theorem 5.61 of [23].
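Steps 1, 3 and 4 above can be sketched in a few lines, under strong simplifying assumptions: the dynamic nouns are given as part of the problem statement, sentences are split on final punctuation only, and a gate is recorded merely as the list of wires it acts on. Parsing (step 2) and the internal wiring of gates are not covered, and the function names are hypothetical.

```python
import re

def split_sentences(text):
    """Step 1: split a text into sentences at sentence-final punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part.strip() for part in parts if part.strip()]

def build_circuit(text, dynamic_nouns):
    """Steps 3-4: one gate per sentence, acting on the dynamic nouns it mentions."""
    circuit = []
    for sentence in split_sentences(text):
        wires = [
            noun for noun in dynamic_nouns
            if re.search(rf"\b{re.escape(noun)}\b", sentence)
        ]
        circuit.append((sentence, wires))
    return circuit

text = (
    "Harmonica is the brother of Claudio. Frank hangs Claudio. "
    "Snaky is in the gang of Frank. Harmonica shoots Snaky. Harmonica shoots Frank."
)
dynamic_nouns = ["Harmonica", "Claudio", "Frank", "Snaky"]

circuit = build_circuit(text, dynamic_nouns)
for sentence, wires in circuit:
    print(wires, "<-", sentence)
```

The output lists, for each sentence of the example of Sec. 4.6, which noun wires its gate touches. Wires not listed for a sentence are the passive wires of Sec. 4.2.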
6.3 Comparing texts
To compare texts we can simply rely on what we did in DisCoCat (see Sec. 2.5), provided we use initial states (see Sec. 4.3) so that the meaning of the text as a whole becomes a state, or, compute similarity as follows:
[Diagram: the text process τ2 composed with the conjugate τ̄1.]
using a generalisation of the Hilbert-Schmidt product, i.e. the inner-product applied to the states arising from box-state duality (see Sec. 2.1):
[Diagram (13): the state obtained from the text process τi by box-state duality.]
Also graded entailment is obtained as in DisCoCat (see Sec. 2.5) when using initial states, or when representing text as a state as in (13).
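For states, the Hilbert-Schmidt product is simply Tr(A†B). The sketch below computes it for real-valued 2×2 density matrices with exact arithmetic; it does not cover the generalisation to processes mentioned above, and the example states (named dog, cat and unknown here) are hypothetical.

```python
from fractions import Fraction

def hilbert_schmidt(a, b):
    """Tr(a^T b) for real square matrices, i.e. the sum of entrywise products."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

half = Fraction(1, 2)
dog = [[1, 0], [0, 0]]              # a pure state
cat = [[0, 0], [0, 1]]              # an orthogonal pure state
unknown = [[half, 0], [0, half]]    # maximally mixed state

print(hilbert_schmidt(dog, dog))
print(hilbert_schmidt(dog, cat))
print(hilbert_schmidt(dog, unknown))
```

Identical pure states score 1, orthogonal ones score 0, and the comparison with the maximally mixed state lies in between.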
7 Physical embodiment
In the abstract we mentioned that while the developments in this paper are independent of a physical embodiment, most notably a classical vs. a quantum embodiment, both the compositional formalism and the suggested concrete model of meaning of Section 6.1 are highly quantum-inspired. The compositional structure is directly imported from quantum theory [14, 17, 18], the suggested concrete model of meaning employs the density matrices which von Neumann designed specifically for quantum theory [80], and also our suggested alternatives for spiders belong to an area of current activity in quantum foundations (see e.g. [55, 56, 29, 4, 70]), which aims for a quantum analog of Bayesian inference theory. Therefore, it should come as no surprise that implementation of DisCoCirc on a quantum computer would come with a wide range of benefits. For example, as pointed out in [87], classically the required space resources grow exponentially in the number of dynamic nouns, and this exponential growth could vanish on a quantum computer. Similarly, density matrices substantially increase the space required to represent meanings, while for a quantum computer they come for free. Regarding time resources, quantum computational speed-ups have already been identified for DisCoCat [87], by exploiting progress in quantum machine learning [85], and these straightforwardly carry over to DisCoCirc. Expect many dedicated publications on further advantages of implementing DisCoCirc on a quantum computer to be forthcoming, and in fact, quantum natural language processing (QNLP) may become one of the leading areas of the so-called NISQ era [65], given its tolerance for imperfection [87]. Currently, efforts are under way to implement the quantum algorithm of [87] on a simulator [60], and very recently, another paper appeared [84] that is entirely dedicated to QNLP.
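The classical space cost mentioned above can be made tangible with a back-of-the-envelope count. The sketch assumes that every dynamic noun carries a wire of the same dimension and that the joint meaning lives in the tensor product of these wires; both the assumption and the function names are illustrative only.

```python
def vector_entries(wire_dim, n_nouns):
    """Entries of a joint state vector on n_nouns wires of dimension wire_dim."""
    return wire_dim ** n_nouns

def density_matrix_entries(wire_dim, n_nouns):
    """Entries of the corresponding joint density matrix (a square matrix)."""
    return vector_entries(wire_dim, n_nouns) ** 2

# Even with the smallest non-trivial wire dimension the growth is exponential.
for n in (1, 2, 5, 10):
    print(n, vector_entries(2, n), density_matrix_entries(2, n))
```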
Thanks
We thank Dan Marsden, Dusko Pavlovic and Alexis Toumi for valuable discussions on the content of the paper, including DM contributing the “The gloves are off!” slogan as his interpretation of the DisCo-ing cats cartoon. Phil Scott also provided corrections and suggestions. The SYCO 3, QPL and ACT referees also provided useful feedback including pointers to related work, which was also provided by Valeria de Paiva, Graham White, Patrik Eklund and Alexandre Rademaker. We are grateful to Ilyas Khan for the additional motivational context within which this paper was produced.
References
1. Abramsky, S., and Coecke, B. A categorical semantics of quantum protocols. In Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science (LICS) (2004), pp. 415–425. arXiv:quant-ph/0402130.
2. Abramsky, S., and Coecke, B. Abstract physical traces. Theory and Applications of Categories 14, 6 (2005), 111–124. arXiv:0910.3144.
3. Ajdukiewicz, K. Die syntaktische Konnexität. Studia Philosophica 1 (1937), 1–27.
4. Allen, J.-M. A., Barrett, J., Horsman, D. C., Lee, C. M., and Spekkens, R. W. Quantum common causes and quantum causal models. Phys. Rev. X 7 (2017), 031021.
5. Asher, N., and Lascarides, A. Logics of conversation. Cambridge University Press, 2003.
6. Balkir, E., Sadrzadeh, M., and Coecke, B. Distributional Sentence Entailment Using Density Matrices. Springer International Publishing, Cham, 2016, pp. 1–22.
7. Baltag, A., Coecke, B., and Sadrzadeh, M. Algebra and sequent calculus for epistemic actions. Electronic Notes in Theoretical Computer Science 126 (2005), 27–52.
8. Baltag, A., Moss, L., and Solecki, S. The logic of public announcements, common knowledge, and private suspicions. In Proceedings of the 7th conference on Theoretical aspects of rationality and knowledge (1998), Morgan Kaufmann Publishers Inc., pp. 43–56.
9. Bankova, D., Coecke, B., Lewis, M., and Marsden, D. Graded hyponymy for compositional distributional semantics. Journal of Language Modelling 6, 2 (2019), 225–260.
10. Bar-Hillel, Y. A quasiarithmetical notation for syntactic description. Language 29 (1953), 47–58.
11. Bennett, C. H., and Shor, P. W. Quantum information theory. IEEE Trans. Inf. Theor. 44, 6 (Sept. 2006), 2724–2742.
12. Birkhoff, G., and von Neumann, J. The logic of quantum mechanics. Annals of Mathematics 37 (1936), 823–843.
13. Bolt, J., Coecke, B., Genovese, F., Lewis, M., Marsden, D., and Piedeleu, R. Interacting conceptual spaces I. In Concepts and their Applications, M. Kaipainen, A. Hautamäki, P. Gärdenfors, and F. Zenker, Eds., Synthese Library, Studies in Epistemology, Logic, Methodology, and Philosophy of Science. Springer, 2018. to appear.
14. Clark, S., Coecke, B., Grefenstette, E., Pulman, S., and Sadrzadeh, M. A quantum teleportation inspired algorithm produces sentence meaning from word meaning and grammatical structure. Malaysian Journal of Mathematical Sciences 8 (2014), 15–25. arXiv:1305.0556.
15. Clark, S., and Pulman, S. Combining symbolic and distributional models of meaning. In Proceedings of AAAI Spring Symposium on Quantum Interaction (2007), AAAI Press.
16. Coecke, B. Kindergarten quantum mechanics. In Quantum Theory: Reconsiderations of the Foundations III (2005), A. Khrennikov, Ed., AIP Press, pp. 81–98. arXiv:quant-ph/0510032.
17. Coecke, B. An alternative Gospel of structure: order, composition, processes. In Quantum Physics and Linguistics. A Compositional, Diagrammatic Discourse, C. Heunen, M. Sadrzadeh, and E. Grefenstette, Eds. Oxford University Press, 2013, pp. 1–22. arXiv:1307.4038.
18. Coecke, B. From quantum foundations via natural language meaning to a theory of everything. In The Incomputable: Journeys Beyond the Turing Barrier, S. B. Cooper and M. I. Soskova, Eds., Theory and Applications of Computability. Springer International Publishing, 2017, pp. 63–80. arXiv:1602.07618.
19. Coecke, B., De Felice, G., Marsden, D., and Toumi, A. Towards compositional distributional discourse analysis. In Proceedings of the 2018 Workshop on Compositional Approaches in Physics, NLP, and Social Sciences, Nice, France, 2-3rd September 2018 (2018), M. Lewis, B. Coecke, J. Hedges, D. Kartsaklis, and D. Marsden, Eds., vol. 283 of Electronic Proceedings in Theoretical Computer Science, Open Publishing Association, pp. 1–12.
20. Coecke, B., Genovese, F., Lewis, M., and Marsden, D. Generalized relations in linguistics and cognition. In Logic, Language, Information, and Computation - 24th International Workshop, WoLLIC 2017, London, UK, July 18-21, 2017, Proceedings (2017), J. Kennedy and R. J. G. B. de Queiroz, Eds., vol. 10388 of Lecture Notes in Computer Science, Springer, pp. 256–270.
21. Coecke, B., Grefenstette, E., and Sadrzadeh, M. Lambek vs. Lambek: Functorial vector space semantics and string diagrams for Lambek calculus. Annals of Pure and Applied Logic 164 (2013), 1079–1100. arXiv:1302.0393.
22. Coecke, B., and Kissinger, A. Categorical quantum mechanics II: Classical-quantum interaction. International Journal of Quantum Information 14, 04 (2016), 1640020.
23. Coecke, B., and Kissinger, A. Picturing Quantum Processes. A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press, 2017.
24. Coecke, B., Lewis, M., and Marsden, D. Internal wiring of cartesian verbs and prepositions. In Proceedings of the 2018 Workshop on Compositional Approaches in Physics, NLP, and Social Sciences, Nice, France, 2-3rd September 2018 (2018), M. Lewis, B. Coecke, J. Hedges, D. Kartsaklis, and D. Marsden, Eds., vol. 283 of Electronic Proceedings in Theoretical Computer Science, Open Publishing Association, pp. 75–88.
25. Coecke, B., and Meichanetzidis, K. Meaning updating of density matrices. arXiv:2001.00862 (2020).
26. Coecke, B., and Paquette, É. O. Categories for the practicing physicist. In New Structures for Physics, B. Coecke, Ed., Lecture Notes in Physics. Springer, 2011, pp. 167–271. arXiv:0905.3010.
27. Coecke, B., Pavlović, D., and Vicary, J. A new description of orthogonal bases. Mathematical Structures in Computer Science 23 (2013), 555–567. arXiv:quant-ph/0810.1037.
28. Coecke, B., Sadrzadeh, M., and Clark, S. Mathematical foundations for a compositional distributional model of meaning. In A Festschrift for Jim Lambek, J. van Benthem, M. Moortgat, and W. Buszkowski, Eds., vol. 36 of Linguistic Analysis. 2010, pp. 345–384. arXiv:1003.4394.
29. Coecke, B., and Spekkens, R. W. Picturing classical and quantum bayesian inference. Synthese 186, 3 (2012), 651–696.
30. Cybenko, G. Approximations by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 2 (1989), 183–192.
31. Dür, W., Vidal, G., and Cirac, J. I. Three qubits can be entangled in two inequivalent ways. Physical Review A 62, 062314 (2000).
32. Gärdenfors, P. The Geometry of Meaning: Semantics Based on Conceptual Spaces. MIT Press, 2014.
33. Girard, J. Y. Linear logic. Theoretical Computer Science 50, 1 (1987), 1–101.
34. Grefenstette, E. Category-Theoretic Quantitative Compositional Distributional Models of Natural Language Semantics. PhD thesis, University of Oxford, 2013.
35. Grefenstette, E., and Sadrzadeh, M. Experimental support for a categorical compositional distributional model of meaning. In The 2014 Conference on Empirical Methods on Natural Language Processing. (2011), pp. 1394–1404. arXiv:1106.4058.
36. Grishin, V. N. On a generalization of the Ajdukiewicz-Lambek system. Studies in nonclassical logics and formal systems (1983), 315–334.
37. Groenendijk, J., and Stokhof, M. Dynamic predicate logic. Linguistics and philosophy 14, 1 (1991), 39–100.
38. Harris, Z. S. Distributional structure. Word 10, 2-3 (1954), 146–162.
39. Kamp, H., and Partee, B. Prototype theory and compositionality. Cognition 57 (1995), 129–191.
40. Kamp, H., and Reyle, U. From discourse to logic: Introduction to model-theoretic semantics of natural language, formal logic and discourse representation theory, vol. 42. Springer Science & Business Media, 2013.
41. Kartsaklis, D. Compositional Distributional Semantics with Compact Closed Categories and Frobenius Algebras. PhD thesis, University of Oxford, 2014.
42. Kartsaklis, D. Coordination in categorical compositional distributional semantics. In Semantic Spaces at the Intersection of NLP, Physics and Cognitive Science (2016). arXiv:1606.01515.
43. Kartsaklis, D., and Sadrzadeh, M. Prior disambiguation of word tensors for constructing sentence vectors. In The 2013 Conference on Empirical Methods on Natural Language Processing. (2013), ACL, pp. 1590–1601.
44. Kartsaklis, D., and Sadrzadeh, M. A study of entanglement in a categorical framework of natural language. In Proceedings of the 11th Workshop on Quantum Physics and Logic (QPL) (2014), Kyoto Japan.
45. Kartsaklis, D., Sadrzadeh, M., Pulman, S., and Coecke, B. Reasoning about meaning in natural language with compact closed categories and Frobenius algebras. In Logic and Algebraic Structures in Quantum Computing and Information. Cambridge University Press, 2015. arXiv:1401.5980.
46. Kelly, G. M., and Laplaza, M. L. Coherence for compact closed categories. Journal of Pure and Applied Algebra 19 (1980), 193–213.
47. Lambek, J. The mathematics of sentence structure. American Mathematics Monthly 65 (1958).
48. Lambek, J. Deductive systems and categories. Mathematical Systems Theory 2, 4 (1968), 287–318.
49. Lambek, J. Deductive systems and categories ii. standard constructions and closed categories. In Category theory, homology theory and their applications I. Springer, 1969, pp. 76–122.
50. Lambek, J. From lambda-calculus to cartesian closed categories. To HB Curry: essays on combinatory logic, lambda calculus and formalism (1980), 375–402.
51. Lambek, J. From categorial grammar to bilinear logic. Substructural logics 2 (1993), 207–237.
52. Lambek, J. Type grammar revisited. Logical Aspects of Computational Linguistics 1582 (1999).
53. Lambek, J. From word to sentence. Polimetrica, Milan (2008).
54. Lambek, J. Compact monoidal categories from linguistics to physics. In New Structures for Physics, B. Coecke, Ed., Lecture Notes in Physics. Springer, 2011, pp. 451–469.
55. Leifer, M. S., and Poulin, D. Quantum graphical models and belief propagation. Annals of Physics 323, 8 (2008), 1899–1946.
56. Leifer, M. S., and Spekkens, R. W. Towards a formulation of quantum theory as a causally neutral theory of Bayesian inference. Physical Review A 88, 5 (2013), 052130.
57. Lewis, M. Compositionality for recursive neural networks. arXiv:1901.10723, 2019.
58. Löwner, K. Über monotone Matrixfunktionen. Mathematische Zeitschrift 38, 1 (1934), 177–216.
59. Mann, W. C., and Thompson, S. A. Rhetorical structure theory: Toward a functional theory of text organization. Text-Interdisciplinary Journal for the Study of Discourse 8, 3 (1988), 243–281.
60. Newsroom, I. Intel to support the Irish centre for high end computing on new collaborative quantum computing project, 2019. https://newsroom.intel.ie/news-releases/intelto-support-the-irish-centre-for-high-end-computing-on-new-collaborative-quantumcomputing-project/.
61. Piedeleu, R. Ambiguity in categorical models of meaning. Master’s thesis, University of Oxford, 2014.
62. Piedeleu, R., Kartsaklis, D., Coecke, B., and Sadrzadeh, M. Open system categorical quantum semantics in natural language processing. In CALCO 2015 (2015). arXiv:1502.00831.
63. Polajnar, T., Rimell, L., and Clark, S. An exploration of discourse-based sentence spaces for compositional distributional semantics. In Proceedings of the First Workshop on Linking Computational Models of Lexical, Sentential and Discourse-level Semantics (2015), pp. 1–11.
64. Preller, A., and Sadrzadeh, M. Bell states and negative sentences in the distributed model of meaning. Electronic Notes in Theoretical Computer Science 270, 2 (2011), 141–153.
65. Preskill, J. Quantum computing in the NISQ era and beyond. Quantum 2 (2018), 79.
66. Redei, M. Why John von Neumann did not like the Hilbert space formalism of quantum mechanics (and what he liked instead). Studies in History and Philosophy of Modern Physics 27, 4 (1996), 493–510.
67. Sadrzadeh, M., Clark, S., and Coecke, B. The Frobenius anatomy of word meanings I: subject and object relative pronouns. Journal of Logic and Computation 23 (2013), 1293–1317. arXiv:1404.5278.
68. Sadrzadeh, M., Clark, S., and Coecke, B. The Frobenius anatomy of word meanings II: possessive relative pronouns. Journal of Logic and Computation 26 (2016), 785–815. arXiv:1406.4690.
69. Sadrzadeh, M., and Muskens, R. Static and dynamic vector semantics for lambda calculus models of natural language. arXiv:1810.11351 (2018).
70. Salek, S., and Barrett, J. Quantum weak causal modelling, 2019. Draft.
71. Schütze, H. Automatic word sense discrimination. Computational linguistics 24, 1 (1998), 97–123.
72. Seely, R. A. G. Linear logic, ∗-autonomous categories and cofree algebras. Contemporary Mathematics 92 (1989), 371–382.
73. Selinger, P. Dagger compact closed categories and completely positive maps. Electronic Notes in Theoretical Computer Science 170 (2007), 139–163.
The Mathematics of Text Structure
217
74. Selinger, P. A survey of graphical languages for monoidal categories. In New Structures for Physics, B. Coecke, Ed., Lecture Notes in Physics. Springer-Verlag, 2011, pp. 275– 337. arXiv:0908.3347. 75. Smolensky, P., and Legendre, G. The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar Vol. I: Cognitive Architecture Vol. II: Linguistic and Philosophical Implications. MIT Press, 2005. 76. Steedman, M. The syntactic process, vol. 24. MIT press Cambridge, MA, 2000. 77. Toumi, A. Categorical compositional distributional questions, answers & discourse analysis. Master’s thesis, University of Oxford, 2018. 78. van de Wetering, J. Ordering information on distributions. arXiv:1701.06924 (2017). 79. Visser, A. Contexts in dynamic predicate logic. Journal of Logic, Language and Information 7, 1 (1998), 21–52. 80. von Neumann, J. Wahrscheinlichkeitstheoretischer aufbau der quantenmechanik. Nachrichten von der Gesellschaft der Wissenschaften zu G¨ ottingen, MathematischPhysikalische Klasse 1 (1927), 245–272. 81. von Neumann, J. Mathematische grundlagen der quantenmechanik. Springer-Verlag, 1932. Translation, Mathematical foundations of quantum mechanics, Princeton University Press, 1955. 82. Widdows, D. Orthogonal negation in vector spaces for modelling word-meanings and document retrieval. In 41st Annual Meeting of the Association for Computational Linguistics (Japan, 2003). 83. Widdows, D., and Peters, S. Word vectors and quantum logic: Experiments with negation and disjunction. Mathematics of language 8, 141-154 (2003). 84. Wiebe, N., Bocharov, A., Smolensky, P., Troyer, M., and Svore, K. M. Quantum language processing. arXiv:1902.05162 (2019). 85. Wiebe, N., Braun, D., and Lloyd, S. Quantum algorithm for data fitting. Physical review letters 109, 5 (2012), 050505. 86. Wijnholds, G., and Sadrzadeh, M. Classical copying versus quantum entanglement in natural language: The case of vp-ellipsis. arXiv:1811.03276 (2018). 87. Zeng, W., and Coecke, B. 
Quantum algorithms for compositional natural language processing. arXiv:1608.01406.
Aspects of Categorical Recursion Theory
Pieter Hofstra and Philip Scott

Pieter Hofstra, Department of Mathematics and Statistics, University of Ottawa, Canada, e-mail: [email protected], partially supported by an NSERC Discovery grant
Philip Scott, Department of Mathematics and Statistics, University of Ottawa, Canada, e-mail: phil@site.uottawa.ca, partially supported by an NSERC Discovery grant

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 C. Casadio and P. J. Scott (eds.), Joachim Lambek: The Interplay of Mathematics, Logic, and Linguistics, Outstanding Contributions to Logic 20, https://doi.org/10.1007/978-3-030-66545-6_7

Abstract We present a survey of some developments in the general area of category-theoretic approaches to the theory of computation, with a focus on topics and ideas particularly close to the interests of Jim Lambek.

1 Introduction
Algorithms have been discussed for thousands of years, starting with the Babylonians and later the Greeks (e.g. Plato’s Academy, Euclid in Alexandria, etc.). These ideas were subsequently passed to (or rediscovered in) many mathematical cultures and civilizations (see [3]). Indeed, the word algorithm itself comes from the Latinized name of the author of a book on Hindu arithmetic, the Persian mathematician Muhammed ibn-Mūsā al-Khwārizmī (c. 825). Yet it was only in the 19th century that serious approaches to understanding the foundations of algorithms and computable functions began. For example, the modern idea of defining functions by iteration and proofs by induction seems to have originated in the writings of Richard Dedekind [28]. David Hilbert’s seminal lectures on the foundations of mathematics led Thoralf Skolem in the early 1920’s to axiomatize the primitive recursive functions, a class of inductively defined numerical functions which were intuitively computable. Was this all of them? Alas, no: a routine application of Cantor’s diagonal argument ([27], p. 91) shows that there are intuitively computable functions which are not primitive recursive. Indeed, in 1928 Hilbert’s student Wilhelm Ackermann constructed an explicit example of a recursively defined, intuitively computable function which grows faster than any primitive recursive function. Throughout the 1920’s Hilbert discussed the Entscheidungsproblem (Decision Problem) for predicate logic, whose surprising final (negative) answer was obtained independently by Alonzo Church and Alan Turing in 1936, influenced by work of Kurt Gödel (1931). Indeed, this was the culmination of seminal research developing the modern theory of computability and computable functions by the logicians Church, Gödel, and Church’s students Stephen Kleene, J. Barkley Rosser, and Alan Turing in the period 1931–1936.

Jim Lambek, in his writings and public presentations, had a long-time interest in the foundations of computability and its history [3]. His published papers include his well-known introduction of abacuses in 1961 [65] as a simple model of computation (an alternative to Turing machines) as well as his work on applying Gerhard Gentzen’s cut-elimination algorithm and normal forms to categorical coherence theory [67, 69, 70]. This led to his interest in typed combinatory algebras, typed lambda calculi and categorical theories of computation (see [73] and Part III of the book with the second author [77]). In linguistics and anthropology¹, as well as in mathematics, he often expressed interest in doing computation via relation algebras, e.g. from kinship terminology [8] to Mal’cev categories [15] to exact completions and partial equivalence relations [78]. We shall further explore some of these ideas in Section 3 below.

¹ In lecture notes from his McGill course, Lambek also explored simple machines for producing and recognizing formal grammars. In these notes he also introduced his future article [72] “A mathematician looks at French conjugation”, which uses matrix algebra to implement an algorithm to compute correct French verb forms. The article begins with typical Lambek humour: “The penalty one has to pay for the enjoyment of French culture is that one must face the problem of conjugating the French verb.”

Here we shall examine three particular questions that occupied Lambek for many years.
1. Are there natural recursion theories?
2. What are the computable functions and functionals in various concrete categorical structures?
3. Are there intrinsic algebraic/categorical approaches to recursion theory?
The detailed discussion of these questions will be pursued in the following sections. As a warm-up, we describe the informal meaning of Lambek’s questions (and, in part, some of the associated answers).
1.1 On Lambek’s Questions
Concerning questions (1) and (2), in many conversations and lectures Lambek emphasized that “natural” recursion theories (and their classes of computable functions) should arise by examining the computable numerical functions in various free categories arising in categorical logic. Here, by a free structured category (where the kinds of structure we may wish to consider include monoidal structure, finite limits, cartesian closed structure, etc.) we mean the structured category with natural numbers object (NNO) freely generated by the empty graph. Such a category is then initial among the categories with this structure. We can define the notion of a representable numerical function analogously to how this is done in mathematical logic (as in Gödel [38], cf. also [77]) and we can ask: which numerical functions (partial functions, functionals, etc.) are representable in the various relevant free categories? These questions were taken up in Lambek and Scott ([77], Part III) for categories associated to various higher-order logics, and will be discussed in more detail in Section 4 below. We summarize some of the early literature in Figure 1.

Free Categories | Definable Functions and Functionals
Cartesian and monoidal with NNO | Primitive recursive functions ([110, 102])
Cartesian closed with NNO | Gödel’s Dialectica Functionals ([77])
The free elementary topos with NNO | Provably total functions of HAH ([77]) and Higher provably recursive functionals ([111])
C-monoids and CCCs with reflexive objects | Church’s untyped lambda calculus (with surjective pairing) and the partial recursive functions ([77])

Fig. 1 Natural Recursion Theories and their computable function(al)s
Concerning Question (3), since the 1960’s there has been increasing interest in developing general categorical frameworks for computability theory. We mention here in particular the early work by Eilenberg and Elgot [32] on recursiveness, and the groundbreaking work by Di Paola and Heller on recursion categories [100, 45, 101], the modern incarnation of which we shall discuss in Section 5. Other aspects of computation and computability were studied from a categorical standpoint by various authors; for example, Lawvere [80] (see also [121]) gives a general version of the diagonal argument from which the well-known first recursion theorem, the fixed point theorem in untyped lambda calculus, and Gödel’s diagonalization lemma can be obtained; while Mulry [98] introduces the recursive topos as a natural setting to consider a generalization of the Banach-Mazur functionals to all higher types. We shall not attempt to give a full historical account of the large recent literature on categorical recursion theory. Instead, we shall focus on a few areas in which the authors have become involved and which represent new directions of independent interest, but which also overlap with Lambek’s interests.
1.2 General Notation and Background
We now introduce some notation and terminology for some of the structure appearing frequently in this paper. We assume that the reader is familiar with basic category theory. Standard references include [89, 5]. Some familiarity with lambda calculus [6] and the basic theory of computation [27, 99] is also an advantage. The (large) category of sets and functions is denoted by Set. The category of sets and partial functions (that is, single-valued relations) is denoted by Par. By Rel, we mean the category of sets and relations. An idempotent is an endomorphism e : A → A for which ee = e. An idempotent e splits when there is an embedding-retraction pair m : B → A, r : A → B with $rm = 1_B$, $mr = e$. The idempotent splitting or Karoubi envelope of C is the category K(C) whose objects are the idempotents of C, and whose morphisms (A, e) → (B, d) are maps f : A → B with df = f = fe. More generally, for E a set of idempotents, $K_E(C)$ is the full subcategory of K(C) on the objects determined by E. Finally, a retract of an object A is an object B together with an embedding-retraction pair m : B → A, r : A → B, with $rm = 1_B$. We write B ◁ A to indicate that B is a retract of A.
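As a small illustration of the definitions above, the following sketch splits an idempotent in the category of finite sets, where every idempotent splits through its image. The dictionary encoding of functions and the names `is_idempotent` and `split_idempotent` are assumptions made only for this example.

```python
def is_idempotent(e, A):
    """Check ee = e for a function e : A -> A given as a dict."""
    return all(e[e[a]] == e[a] for a in A)


def split_idempotent(e, A):
    """Split e through its image B = {e(a) : a in A}.

    Returns (B, m, r) where m : B -> A is the inclusion and
    r : A -> B is e with its codomain cut down to B,
    so that rm = 1_B and mr = e.
    """
    if not is_idempotent(e, A):
        raise ValueError("e is not idempotent")
    B = sorted({e[a] for a in A})
    m = {b: b for b in B}        # embedding B -> A
    r = {a: e[a] for a in A}     # retraction A -> B
    return B, m, r


A = [0, 1, 2, 3]
e = {0: 0, 1: 0, 2: 2, 3: 2}     # sends 1 to 0 and 3 to 2
B, m, r = split_idempotent(e, A)
```

Here B is a retract of A in the sense just defined: composing the embedding with the retraction gives the identity on B, and composing the other way round recovers e.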
2 What is a computable function?
As explained in the introduction, one of the fundamental challenges for mathematical logic in the first quarter of the 20th century was to make precise the notions of computation, computable function, and computable set.² This section reviews some of these developments of the classical theory, setting the stage for the categorical approaches to be introduced later.

² Originally, the term recursive function was prevalent, due to the emphasis on the use of recursive procedures. Indeed, many authors have referred to the subject as recursion theory. However, as argued in [114], the term computability more aptly captures the flavour of the subject, and also emphasizes inclusion of other notions or models of computation.
2.1 The primitive recursive functions
As summarized in the Introduction, historically the attempts to define computability focussed on iterative or recursive procedures. These seem to have been first analyzed in the writings of Dedekind in the 19th century [28]. A formal system of Primitive Recursive Arithmetic, concomitant with Hilbert’s foundational lectures in the 1920’s, was developed by Skolem [113]. Rózsa Péter’s work in the early 1930s (later presented as [103]) is considered to have provided the foundations for the theory of recursive functions; many of the central ideas were further developed in detail by Hilbert-Bernays [47], and especially Goodstein [40]. Moreover, these functions were also used by Gödel in his famous Incompleteness Theorem paper [38]. Consider total numerical functions $\mathbb{N}^k \to \mathbb{N}$, $k \geq 1$. We recall the traditional definition of primitive recursion, then include a somewhat non-standard definition by Lambek.

Definition 1 (Primitive Recursive Functions) The primitive recursive functions are the smallest class Prim of numerical functions generated from Basic Functions by composition (or substitution) and primitive recursion. The Basic Functions are the constant zero function $Z(x) = 0$, the successor function $S(x) = x + 1$, and the projection functions $U_i^n : \mathbb{N}^n \to \mathbb{N}$, $U_i^n(x_1, \dots, x_n) = x_i$. The closure rules are as follows ($\vec{x}$ denotes an element of $\mathbb{N}^n$):
• Composition: if $f_1, \dots, f_k : \mathbb{N}^n \to \mathbb{N}$ are in Prim and $g(u_1, \dots, u_k) : \mathbb{N}^k \to \mathbb{N}$ is in Prim, then $\mathrm{comp}(g, f_1, \dots, f_k) : \mathbb{N}^n \to \mathbb{N}$ is in Prim, where $\mathrm{comp}(g, f_1, \dots, f_k)(\vec{x}) = g(f_1(\vec{x}), \dots, f_k(\vec{x}))$.
• Primitive Recursion: if $g(\vec{x})$ and $h(\vec{x}, y, u)$ are in Prim then so is $\mathrm{rec}(\vec{x}, y)$, where $\mathrm{rec}(\vec{x}, 0) = g(\vec{x})$ and $\mathrm{rec}(\vec{x}, S(y)) = h(\vec{x}, y, \mathrm{rec}(\vec{x}, y))$.
Using this, we may also define a relation $R \subseteq \mathbb{N}^k$ to be primitive recursive when its characteristic function is. Most of the numerical functions and relations used in everyday mathematics are primitive recursive.
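The generating scheme of Definition 1 can be sketched directly as higher-order functions. This is a minimal illustration rather than a canonical implementation: the names Z, S, U, comp and rec follow the definition, while `add` and `mul` are example functions chosen here, and the recursion clause is unfolded as a loop.

```python
def Z(x):
    """Constant zero function."""
    return 0


def S(x):
    """Successor function."""
    return x + 1


def U(n, i):
    """Projection U_i^n : N^n -> N (i is 1-based, as in the text)."""
    def proj(*xs):
        assert len(xs) == n
        return xs[i - 1]
    return proj


def comp(g, *fs):
    """comp(g, f_1, ..., f_k)(x) = g(f_1(x), ..., f_k(x))."""
    return lambda *xs: g(*(f(*xs) for f in fs))


def rec(g, h):
    """rec(x, 0) = g(x);  rec(x, S(y)) = h(x, y, rec(x, y))."""
    def f(*args):
        *xs, y = args
        acc = g(*xs)
        for k in range(y):          # unfold the recursion from 0 up to y
            acc = h(*xs, k, acc)
        return acc
    return f


# add(x, 0) = x;  add(x, S(y)) = S(add(x, y))
add = rec(U(1, 1), comp(S, U(3, 3)))
# mul(x, 0) = 0;  mul(x, S(y)) = add(x, mul(x, y))
mul = rec(Z, comp(add, U(3, 1), U(3, 3)))
```

For instance, `add(2, 3)` evaluates to 5 and `mul(3, 4)` to 12, using only the basic functions and the two closure rules.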
Let us mention a somewhat nonstandard definition³ of primitive recursive functions, introduced by Lambek in [3], p. 246. Following Lambek, we elide function arguments, writing e.g. fxuv for f(x, u, v), etc.

Definition 2 (Lambek’s Primitive Recursive Functions)
(i) Basic functions: Identity Ix = x, Successor Sx = x + 1, and Zero Zx = 0.
(ii) Generating Rules:
(a) Substitution: given functions fxuz and gy we can form hxyz = fx(gy)z.
(b) Interchanging two arguments: given fxuvy, we can form gxuvy = fxvuy.
(c) Contracting two arguments: given fxuvy, we can form gxuy = fxuuy.
(d) Introducing dummy arguments: given fxy, we can form gxuy = fxy.
(e) Primitive Recursion: given gx and hxyz, we can form fxy, where fx0 = gx, fx(Sy) = hxy(fxy).

³ We modify slightly the Basic functions, which were missing one.

Note that Lambek’s rules generating Prim are closely related to the Curry-Howard functional interpretation of intuitionistic sequent calculus proofs (with non-logical axioms). Indeed, consider a proof of an intuitionistic sequent $A_1, \dots, A_n \vdash B$. The functional interpretation interprets the proof by functional “proof terms” (see Girard [37] and Lambek [74]) of the form $x_1 : A_1, \dots, x_n : A_n \vdash f\vec{x} : B$. The identity function interprets the identity sequent $A \vdash A$. The generating rules (a)-(d) above correspond respectively to interpreting the following rules of sequent calculus (by associating to proof terms for each of the premises a proof term of the conclusion): cut (a), interchange (b), contraction (c) and weakening (d).
\[
\frac{\Gamma, A, \Delta \vdash C \qquad \Sigma \vdash A}{\Gamma, \Sigma, \Delta \vdash C}\ \text{cut}
\qquad
\frac{\Gamma, A, B, \Delta \vdash C}{\Gamma, B, A, \Delta \vdash C}\ \text{interchange}
\]
\[
\frac{\Gamma, A, A, \Delta \vdash B}{\Gamma, A, \Delta \vdash B}\ \text{contraction}
\qquad
\frac{\Gamma, \Delta \vdash B}{\Gamma, A, \Delta \vdash B}\ \text{weakening}
\]
Finally, the zero function, successor, and primitive recursion (e) above may be thought of as non-logical axioms or rules specifying a weak natural numbers object, i.e. a particular type N. In this sense, we have distinguished proofs $\vdash 0 : N$ and $x : N \vdash Sx : N$, and primitive recursion is a special case of the iterator (see Subsection 4.1 below).
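The remark that primitive recursion is a special case of iteration can be made concrete as follows. The sketch assumes the usual iterator, it(a, s)(0) = a and it(a, s)(n + 1) = s(it(a, s)(n)); the names `iterate` and `primrec_via_iter` are hypothetical, and the trick is the standard one of iterating on pairs (y, f(x, y)).

```python
def iterate(a, step):
    """Iterator: it(0) = a, it(n + 1) = step(it(n))."""
    def it(n):
        acc = a
        for _ in range(n):
            acc = step(acc)
        return acc
    return it


def primrec_via_iter(g, h):
    """Recover f(x, 0) = g(x), f(x, S(y)) = h(x, y, f(x, y)) from iteration.

    The iteration runs on pairs (k, f(x, k)), so that the step function
    has access to the recursion variable as well as the previous value.
    """
    def f(x, y):
        start = (0, g(x))
        step = lambda p: (p[0] + 1, h(x, p[0], p[1]))
        return iterate(start, step)(y)[1]
    return f


# factorial as a (dummy-parameter) primitive recursion:
# fact(x, 0) = 1, fact(x, S(y)) = (y + 1) * fact(x, y)
fact = primrec_via_iter(lambda x: 1, lambda x, y, z: (y + 1) * z)
```

The pairing step is why products are needed alongside the natural numbers object when recursion is derived from iteration.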
2.2 The computable functions
So, does Computable = Primitive Recursive? Alas, no, by a standard application of Cantor’s diagonal argument. Indeed, the previously mentioned Ackermann function (which is computable but not primitive recursive) can be defined by a so-called double recursion scheme (see e.g. R. Péter’s book [103]). So what is a computable function? This was taken up in a remarkable development in the years 1931–1937 (primarily centered around Princeton University) which, as it turned out, led to the foundations of modern computer science. Let us briefly recall the history.
• A. Church (1932–34) and his students (S. C. Kleene, J. B. Rosser) developed (untyped) lambda calculus as a model of computation (and, as later realized in the 1960’s, a foundation of modern programming language theory). Church formulated Church’s Thesis (1936): the intuitively computable numerical functions are exactly those you can compute in λ-calculus. This thus answered the age-old question we began with. Originally, however, Church’s thesis was not believed by Gödel (there being insufficient evidence at the time). However, in rapid developments, new evidence arose:
• Kleene (1934–35) developed the partial μ-recursive functions: we add to Prim the following generating scheme on partial functions, called minimalisation: given g(x, y), we can form f(x) = μy.g(x, y) = 0, where μy.g(x, y) = 0 means the least y such that g(x, y) = 0.⁴ (If we wish to restrict to total functions, we add the proviso ∀x∃y.g(x, y) = 0.)
• Gödel-Herbrand (1934). Gödel lectured on an equation calculus to define “computable” functions, based in part on a letter from Herbrand. This is described in Kleene’s book [61].
• Turing (1936) independently introduced Turing machines: an abstract mechanical computing device. He gave a convincing analysis of the meaning of being “computable” without restrictions on space or time. This led to Turing’s thesis: the intuitively computable functions were those computable by Turing’s abstract machines. This ground-breaking paper also showed the recursive unsolvability of Hilbert’s Entscheidungsproblem, simultaneously and independently solved by Church in 1936 (who was inspired by his studies in untyped lambda calculus).
• Turing then became a student of Church at Princeton. During the period 1936–37, Church, Kleene and Turing carefully proved the “equivalence” of the above different models of computability, in the sense that all notions gave exactly the same class of computable functions! This work convinced Gödel of the truth of the Church-Turing thesis (CT).⁵
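Two of the schemes mentioned in this subsection that go beyond Prim can be sketched in a few lines. The first function is the two-argument Ackermann-Péter variant of the double recursion (an assumption: the text does not fix a particular variant); the second is Kleene's minimalisation as an unbounded search, which fails to terminate exactly when no suitable y exists. The names `ackermann`, `mu` and `isqrt` are chosen for this example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ackermann(m, n):
    """Ackermann-Peter function, defined by double recursion on (m, n)."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))


def mu(g):
    """Minimalisation: f(x) = least y with g(x, y) = 0.

    The search loops forever if there is no such y, which is how
    partiality enters the class of mu-recursive functions.
    """
    def f(*xs):
        y = 0
        while g(*xs, y) != 0:
            y += 1
        return y
    return f


# Integer square root: least y such that (y + 1)^2 > x.
isqrt = mu(lambda x, y: 0 if (y + 1) ** 2 > x else 1)
```

Only small arguments are practical for `ackermann`, since both its values and its recursion depth grow extremely quickly.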
⁴ Provided for all z < y, g(x, z) is defined and not = 0. If there is no such y, μy.g(x, y) = 0 is undefined.
⁵ CT is not a mathematical statement: it is an experimental statement, identifying an informal class (namely, the “intuitively computable” numerical functions) with a precise mathematical class of functions.

2.3 Some Newer Models of Computability
After the exciting results in the late 1930’s, mathematicians continued the analysis of abstract theories of computing. For example Emil Post (1943) and Andrei Markov (1951) developed theories of computability based on string rewriting grammars (following in the footsteps of the Norwegian mathematician Axel Thue). These notions of computability turned out to be Turing complete, i.e., equivalent to Turing computability. In 1944, Post [107] also initiated the systematic study of the recursively enumerable sets (previously defined by Kleene and Church in terms of images of recursive functions), in particular the study of the r.e. degrees. A particularly interesting period in the more recent modelling arose in 1960–61 (simultaneously and almost independently): the development of Unlimited Register Machines. Within a period of a few months, papers by J. Lambek, Z. Melzak, M. Minsky, and (slightly delayed) J. Shepherdson and J. Sturgis introduced this influential model of computability.⁶ Lambek’s paper [65] was by far the simplest to read of all the papers on Register Machines, and used a highly graphical syntax, akin to flowcharts. Register machines were particularly influential pedagogically, compared to the intricacies of Turing machines. A direct translation between Lambek’s machine models and Turing Machines is given in Boolos and Jeffrey [11]. Let us briefly recall the formalism. A Lambek abacus consists of a series of Locations (or registers) of arbitrary capacity (denoted X, Y, Z, ...), into which we may put (or remove) pebbles, called Counters. We assume an unlimited supply of (indistinguishable) pebbles as counters. There are a small number of Elementary Instructions for building abacuses, as follows:
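[Figure: the four elementary instructions, drawn as flowchart nodes. Start: one outgoing arrow. Stop: one incoming arrow. X+: one incoming and one outgoing arrow. Y−: one incoming arrow and two outgoing arrows (if Y ≠ ∅, take one pebble away and go to the left; else go to the right).]
Fig. 2 Abacus instructions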
Here X+ denotes the operation of adding one pebble to location X. Programs are formed from a finite number of instructions, arranged in a flow chart (directed graph) with root Start, possibly with feedback loops. In Section 7.1 we will discuss the categorical semantics of such a graphical notion of computation.

⁶ Lambek’s and Melzak’s papers appeared back to back in the same issue of the Bulletin of the Canadian Mathematical Society. Lambek’s paper is a considerable simplification of Melzak’s approach.
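The abacus formalism is simple enough to interpret in a few lines. The following is a minimal sketch, not Lambek's own presentation: the flowchart is encoded as a dictionary from node names to instructions, and the node names, the tuple encoding and the `max_steps` safeguard are all assumptions made for this example.

```python
def run_abacus(program, registers, start="start", max_steps=10_000):
    """Run a flowchart of elementary abacus instructions.

    program maps node names to one of:
      ("start", next)            the root of the flowchart
      ("stop",)                  halt and return the registers
      ("inc", X, next)           X+ : add one pebble to X
      ("dec", Y, left, right)    Y- : if Y is nonempty, remove a pebble
                                 and go left; otherwise go right
    """
    regs = dict(registers)
    node = start
    for _ in range(max_steps):
        instr = program[node]
        op = instr[0]
        if op == "stop":
            return regs
        if op == "start":
            node = instr[1]
        elif op == "inc":
            _, x, nxt = instr
            regs[x] = regs.get(x, 0) + 1
            node = nxt
        elif op == "dec":
            _, y, left, right = instr
            if regs.get(y, 0) > 0:
                regs[y] -= 1
                node = left
            else:
                node = right
        else:
            raise ValueError(f"unknown instruction {op!r}")
    raise RuntimeError("step budget exhausted (program may not halt)")


# Addition with a feedback loop: move every pebble from Y into X.
ADD = {
    "start": ("start", "loop"),
    "loop": ("dec", "Y", "bump", "halt"),
    "bump": ("inc", "X", "loop"),
    "halt": ("stop",),
}
result = run_abacus(ADD, {"X": 2, "Y": 3})
```

Starting from two pebbles in X and three in Y, the program halts with five pebbles in X and Y empty. The step budget exists only because abacus programs, like any Turing-complete model, need not terminate.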
3 Lambek’s Categorical Proof Theory
Categorical logic is concerned with the study of classes of categories with additional categorical structure, such as categories with finite limits, regular categories, monoidal (closed) categories, cartesian closed categories, first-order categories, toposes, and so on. Ideally, such a class of categories corresponds to a well-behaved fragment of logic; for example, cartesian closed categories correspond to typed lambda calculus (see below). This correspondence means that there is a sound and complete interpretation of the logic in this class of categories. On the one hand, this allows us to use proof-theoretic techniques (rewriting for example) to reason about categorical structure, while on the other hand we may apply categorical results to obtain information about logical systems. Categorical proof theory is particularly concerned with the study of syntactically generated categories and their properties. This section describes some of the contributions due to Lambek, as well as some related developments.
3.1 A brief history
Lambek’s early works in mathematical linguistics [64, 66] as well as his later work in categorical coherence theory [67, 69, 70] employed proof theory, notably Gentzen’s sequent calculi. Coherence theorems in category theory were aimed at answering the following very general question (see Mac Lane [89]): given a freely generated structured category C, prove that every diagram (built from some canonical morphisms) commutes. Lambek reformulated the question more generally as follows:
(i) Given a freely generated structured category C, how do we effectively generate the hom-sets $\mathrm{Hom}_C(A, B)$?
(ii) Find an effective method to solve the word problem for hom-sets in such C. In particular, any two morphisms with the same domain and codomain generated from the canonical morphisms must be equal.
Lambek’s seminal idea was to reformulate this problem using proof theory, then apply Gentzen’s Cut-Elimination (or Normalization) theorems. Namely, he considered freely generated monoidal or residuated categories as kinds of “logics” or “labelled deductive systems”: the objects of such categories are “formulas” (freely generated from some atomic ones), while arrows would then be equivalence classes of proofs (or proof trees). In particular, an arrow $f \in \mathrm{Hom}_C(A, B)$ would be considered as a proof of the Gentzen sequent $A \vdash B$, while composition of arrows f : A → B and g : B → C to obtain g ◦ f : A → C becomes an instance of the Cut Rule. The equations of a category force one to impose the notion of “equality of proofs”. Algebraically, one generates a congruence relation on proofs (or better, between proof trees). For (i), we generate all proofs of the sequents $A \vdash B$, by Gentzen’s proof search. For the word problem (ii), Gentzen’s cut-elimination methods amount to introducing a compatible rewriting system on proofs. To decide if two proof trees denote the same arrow or not, reduce each to a unique normal (or cut-free) form. The problem of deciding equality of arrows amounts to deciding if their normal forms are identical or not.⁷

Lambek pursued these ideas in the late 1960’s and early 1970’s using cut-elimination to solve the word problem for (among others) residuated and biclosed monoidal categories in [67, 69, 70]. But it was soon realized by proof theorists, beginning with G. Mints [95], that natural deduction calculi (and their associated lambda calculi of proof-terms, under normalization) lead to a smoother technical framework for such word problems. Mints and his students greatly increased the scope of Lambek’s proof-theoretic approaches to coherence, influencing even Kelly and Mac Lane [59]. Normalization approaches to coherence/decision problems for monoidal categories (using reduction of lambda-like proof terms) were first investigated by Mints and his students ([95, 96], reprinted in [97]). In the case of monoidal closed categories, it was shown in Mac Lane [88] that Mints’ proof-theoretic methods agreed almost exactly with the approach to coherence due to Kelly and Mac Lane, all of which in turn were influenced by Lambek’s original use of Cut-Elimination.

Meanwhile, in the 70s and 80s, Lambek’s own algebraic studies on functional completeness and combinatory logics [71, 73] led him to consider connections of lambda calculi to freely generated cartesian and cartesian closed categories. Around the same time, work in computer science applying lambda calculi and natural deduction to functional languages led to the now-common practice of assigning lambda- (or proof-) terms to proof trees [37]. Hence “equality of proofs” becomes provable equality of the associated terms assigned to the proof trees. This is sometimes known as the Curry-Howard-Lambek correspondence, to be discussed in more detail below. After the introduction of Girard’s Linear Logic in 1986 [36] (which uses sequent calculi and gives a particular analysis of the structural rules) Lambek realized his earlier work in linguistics amounted to a kind of substructural (linear) logic without structural rules. He introduced generalizations of deductive systems to more general Gentzen sequents with their associated multicategories and term calculi [74]. On the subject of categorical proof theory, cut-elimination, internal languages and applications to (structured) monoidal categories, linear logics, coherence theorems, et cetera, there has been an explosion of activity. As a small sample of the extensive literature, we mention works of R. Blute, R. Cockett, R. Seely and co-workers [10, 26, 9], K. Došen et al. [29, 30, 31], B. Jay [54, 55], and I. Mackie et al. [90].

⁷ An equivalent formulation [116] of a coherence theorem for a free category of some kind says: given any two objects A, B, there is at most one proof (built from canonical arrows) of the associated sequent $A \vdash B$.
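The strategy of deciding equality of arrows by comparing normal forms can be illustrated on the simplest possible case: the free category on a graph, with no further structure. This toy example is an assumption made for illustration and is far simpler than the monoidal and closed cases treated by Lambek, Mints, Kelly and Mac Lane. Terms are built from generating edges, identities and composition; the normal form of a term is its source, its target and the list of edges it traverses. The sketch assumes its input terms are composable (well-typed) and does not check this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gen:
    """A generating arrow (an edge of the graph)."""
    name: str
    src: str
    tgt: str


@dataclass(frozen=True)
class Id:
    """The identity arrow on an object."""
    obj: str


@dataclass(frozen=True)
class Comp:
    """The composite g o f (first f, then g)."""
    g: object
    f: object


def source(t):
    if isinstance(t, Gen):
        return t.src
    if isinstance(t, Id):
        return t.obj
    return source(t.f)


def target(t):
    if isinstance(t, Gen):
        return t.tgt
    if isinstance(t, Id):
        return t.obj
    return target(t.g)


def path(t):
    """Erase identities and re-bracket: the edges of t in order of traversal."""
    if isinstance(t, Gen):
        return (t.name,)
    if isinstance(t, Id):
        return ()
    return path(t.f) + path(t.g)


def normal_form(t):
    return (source(t), path(t), target(t))


def equal_arrows(s, t):
    """Two terms denote the same arrow iff their normal forms coincide."""
    return normal_form(s) == normal_form(t)


f = Gen("f", "A", "B")
g = Gen("g", "B", "C")
h = Gen("h", "C", "D")
```

With these generators, the two bracketings of h ◦ g ◦ f have the same normal form, and composing f with an identity leaves its normal form unchanged, which are exactly the category axioms.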
3.2 Internal Languages and free categories
As mentioned above, coherence problems are often formulated in terms of free categories. Let us make this more precise. Suppose that S-Cat is a category whose objects are structured categories and whose morphisms are structure-preserving functors. There is a forgetful functor S-Cat → DirGrph to the category of directed graphs. The free structured category generated by a (small) graph G, denoted F(G), can be described in terms of a left adjoint to this forgetful functor. In [77] this left adjoint is constructed using logical syntax along the following lines.
(i) One sets up an equivalence of categories S-Cat → Lang where Lang is some category of formal theories (whose morphisms are “interpretations” which preserve the structure exactly). The equivalence is implemented by a pair of functors: L : S-Cat → Lang, which associates to every category C a so-called internal language, and C : Lang → S-Cat, which associates to a language L a category C(L), called the (syntactic) category generated by L.
(ii) Next, one constructs, given a directed graph G, the theory $L_G$ generated by G. The types of $L_G$ are generated from the nodes of G, while the terms are generated using the term-formation rules of the logic by including the arrows of G as term-forming operations. The free structured category F(G) generated by G may then be taken to be $C(L_G)$, the syntactic category of $L_G$.
We thus have the following picture:
[Diagram: the functors L : S-Cat → Lang and C : Lang → S-Cat between S-Cat and Lang, together with F : DirGrph → S-Cat and L : DirGrph → Lang out of DirGrph.]
Of particular importance is the case where G is the empty graph. The resulting category F(G) is then the initial structured category. In the book [77], such theories include typed (and even untyped) lambda calculi (corresponding to cartesian closed categories with additional structure) and intuitionistic higher order logics (Russellian type theories) with full impredicative comprehension, extensionality, and Peano’s axioms (corresponding to elementary toposes with logical morphisms and natural numbers). We briefly discuss the two cases of Cartesian Closed Categories and Elementary Toposes (both with natural numbers object) below. It is important to note that in order to obtain a 1-categorical equivalence S-Cat ≃ Lang of this kind, we need to consider the objects of S-Cat not just as structured categories, but as categories equipped with specified structure. Similarly, we require the functors to preserve this specified structure on the nose. It is possible to avoid working with chosen structure, but then one should instead consider S-Cat as a 2-category, and set up a 2-categorical equivalence with a suitable 2-category of theories. An example of this finer analysis appears (in this volume) in the paper of Castellan et al. [17], which discusses the Seely correspondence between locally cartesian closed categories and dependent type theories, and, more generally, provides a suitable 2-categorical perspective on categorical logic.
3.3 CCCs and the Curry-Howard-Lambek correspondence
Cartesian closed categories were introduced by Lawvere in the early 1960s as the categorical analog of Church’s typed lambda calculi. In the early 1970s, Lambek explored this correspondence, along with connections to Schönfinkel and Curry’s works on combinatory algebras and functional completeness. The precise tripartite categorical equivalence of cartesian closed categories, typed lambda calculi, and labelled deductive systems for positive intuitionistic propositional calculus (modulo equality of proofs) was developed in detail in [77]. This yields a modern version of the so-called Curry-Howard correspondence [37], with the additional idea (Lambek [67, 69]) of equations between proofs, and is summarized in Theorem 1 below.

Definition 3 A cartesian closed category C (with specified structure) is a cartesian category C (i.e., a category with specified finite products) such that, for each object A ∈ C, the functor (−) × A : C → C has a specified right adjoint, denoted $(-)^A$. Thus, there is a natural isomorphism (natural in B and C):
\[ \mathrm{Hom}_C(C \times A, B) \cong \mathrm{Hom}_C(C, B^A). \]

Examples of CCCs The category of sets is a CCC with $B^A$ the set of all functions A → B. More generally, any functor category $[C^{op}, \mathrm{Set}]$ is a CCC, where $G^F(C)$ is the set of natural transformations from $\mathrm{Hom}_C(-, C) \times F$ to G. The category Cat of small categories is also cartesian closed, as are many categories of “nice” topological spaces, such as compactly generated Hausdorff spaces.
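In the category of sets, the natural isomorphism of Definition 3 is the familiar passage between a two-argument function and a function returning a function. The following sketch uses Python functions in place of set-theoretic ones; the names `curry`, `uncurry` and `ev` are chosen for this illustration.

```python
def curry(f):
    """Send f : C x A -> B to its transpose C -> B^A."""
    return lambda c: lambda a: f(c, a)


def uncurry(g):
    """Send g : C -> B^A back to a map C x A -> B."""
    return lambda c, a: g(c)(a)


def ev(h, a):
    """Evaluation B^A x A -> B, the counit of the adjunction."""
    return h(a)


f = lambda c, a: 10 * c + a          # an example map C x A -> B
g = curry(f)
```

The two passages are mutually inverse: `uncurry(curry(f))` agrees with `f` on all arguments, and `ev(curry(f)(c), a)` equals `f(c, a)`, which is the universal property of the exponential.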
Next, consider simply typed lambda calculi. Definition 4 A simply typed lambda calculus consists of the following data. First, it has a collection of simple types generated from a set of ground types G by the grammar
Types    A, B ::= G | 1 | A × B | A ⇒ B.
At each type, we assume given an infinite set of variables; we write x : A to indicate that x is a variable of type A. Next, we have, for all types $A_1, \dots, A_k, B$, a (possibly empty) set of basic terms $E(A_1, \dots, A_k; B)$. Then the collection of typed terms is generated using the rules displayed in Figure 3. We make the usual assumptions (see e.g. [6, 77]) regarding free and bound variables, and write FV(t) for the set of free variables of t; each x ∈ FV(t) has a unique type, and from the term t we can recover the types of the free variables in t.
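$$
\frac{}{x_i : A_i}
\qquad
\frac{}{\ast : 1}
\qquad
\frac{f \in E(A_1, \dots, A_k; B)}{f(x_1, \dots, x_k) : B}
$$

$$
\frac{a : A \quad b : B}{\langle a, b \rangle : A \times B}
\qquad
\frac{t : A_1 \times A_2}{\pi_i t : A_i}
$$

$$
\frac{x : A \quad \varphi(x) : B}{\lambda x{:}A.\,\varphi(x) : B^A}
\qquad
\frac{f : B^A \quad t : A}{f\,t : B}
$$

Fig. 3 Lambda Calculus Terms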
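The rules of Figure 3 can be read as a type-synthesis procedure. The sketch below is one hedged way to do this in Python, under some assumptions of ours: types and terms are encoded as tagged tuples, every variable carries its own type (matching the convention that each variable has a unique type), and basic terms f ∈ E(...) are omitted for brevity. The encoding and the function name `type_of` are hypothetical, not taken from the source.

```python
# Tagged-tuple encoding (an assumption of this sketch):
#   types: ("base", name) | ("unit",) | ("prod", A, B) | ("arrow", A, B)
#   terms: ("var", name, A) | ("star",) | ("pair", a, b) | ("proj", i, t)
#          | ("lam", var, body) | ("app", f, t)
UNIT = ("unit",)


def prod(a, b):
    return ("prod", a, b)


def arrow(a, b):
    """The type written B^A (or A => B) in the text."""
    return ("arrow", a, b)


def type_of(term):
    """Synthesize the type of a term, following the rules of Figure 3."""
    tag = term[0]
    if tag == "var":                      # x : A
        return term[2]
    if tag == "star":                     # * : 1
        return UNIT
    if tag == "pair":                     # <a, b> : A x B
        return prod(type_of(term[1]), type_of(term[2]))
    if tag == "proj":                     # pi_i t : A_i
        index, ty = term[1], type_of(term[2])
        if ty[0] != "prod" or index not in (1, 2):
            raise TypeError("projection needs a product type")
        return ty[index]
    if tag == "lam":                      # lambda x:A. phi(x) : B^A
        var, body = term[1], term[2]
        if var[0] != "var":
            raise TypeError("lambda must bind a variable")
        return arrow(var[2], type_of(body))
    if tag == "app":                      # f t : B, for f : B^A and t : A
        f_ty, t_ty = type_of(term[1]), type_of(term[2])
        if f_ty[0] != "arrow" or f_ty[1] != t_ty:
            raise TypeError("ill-typed application")
        return f_ty[2]
    raise ValueError("unknown term constructor")


A, B = ("base", "A"), ("base", "B")
p = ("var", "p", prod(A, B))
# swap = lambda p : A x B. <pi_2 p, pi_1 p>
swap = ("lam", p, ("pair", ("proj", 2, p), ("proj", 1, p)))
```

Here `type_of(swap)` yields the encoding of the type A × B ⇒ B × A, while an application of a term of ground type to an argument is rejected with a `TypeError`.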
Finally, we have equations between terms of the same type. We write $t =_X s : A$ to express that the terms t, s are equal, and that the free variables of t and s are contained in the set X. The relations $=_X$ are congruences satisfying the following clauses⁸:
• $t =_X s$ and $X \subseteq Y$ implies $t =_Y s$
• $t =_X s$ implies $f\,t =_X f\,s$ (where $f : B^A$ and $t, s : A$)
• $\varphi(x) =_{X \cup \{x\}} \psi(x)$ implies $\lambda x.\varphi(x) =_X \lambda x.\psi(x)$
• $a =_X \ast$ (where $a : 1$)
• $\pi_i \langle a_1, a_2 \rangle =_X a_i$; $a =_X \langle \pi_1 a, \pi_2 a \rangle$
• (η) $f =_X \lambda x.\,f\,x$ where $x \notin FV(f)$; (β) $(\lambda x.\varphi(x))\,t =_X \varphi[t/x]$
It is possible to augment simply typed lambda calculus with additional types, terms, and equations (cf. [77]). We discuss the case of adding natural numbers and lists in Subsection 4.1 below. An important example of a simply typed lambda calculus arises as follows.

Definition 5 (Simply typed λ-calculus from a graph) Consider a directed graph G = (G, E). The calculus LG has as ground types the vertices of G, and as basic terms the edges of G (i.e., whenever f : A → B is in E, there is a basic term f(x) : B, with x : A). The congruence $t =_X s$ on terms is the smallest congruence satisfying the rules of simply typed lambda calculus.

8 Here we present lambda calculi as ordinary equational theories, as in [77]. One could also write equational logics in an appropriate sequent calculus, writing $t =_X s : A$ as $X \vdash s = t : A$ (cf. Barendregt’s lambda theories [6] and the use of HOL below).
We now define the category CCC whose objects are CCCs (with chosen products and exponentials), and whose morphisms are functors preserving the chosen products and exponentials on the nose. On the other hand, we define the category Typed λ-calc to have typed lambda calculi as objects, and translations as morphisms. Here, a translation between two calculi is a mapping sending types to types and terms to terms, in such a way that all type and term formation operations are preserved and that provable equality between terms is preserved.

Definition 6 (Internal language of a CCC) Let C be a cartesian closed category. The internal language of C is the simply typed lambda calculus L(C) generated by the underlying graph of C, together with all equations holding between arrows of C.

In the other direction, we construct a CCC C(L) from a typed lambda calculus L:

Definition 7 (Syntactic Category) Let L be a simply typed lambda calculus. Define a category C(L) by:
Objects The types of L.
Morphisms For any term t : T with $FV(t) = \{x_1 : T_1, \dots, x_n : T_n\}$, we have a morphism $[t] : T_1 \times \cdots \times T_n \to T$. Here [t] is the equivalence class of t under provable equality of the theory L.
Identities The identity at an object T is represented by the term x : T.
Composition Given terms t(x), s(y) representing morphisms A → B and B → C respectively (where we assume that t is substitutable for y in s), the term s[t/y] (the result of substituting t(x) for all variables y in s) represents the composite A → C.

We now have the promised result⁹:

Theorem 1 (Curry-Howard-Lambek correspondence [77]) The pair of functors L : CCC → Typed λ-calc (internal language) and C : Typed λ-calc → CCC constitute an equivalence of categories.

The above theorem extends to include adding the natural numbers and similar data types (of which the categorical aspects are discussed in the next Section).
9 Lambek reported that when he announced these results in a lecture at a category theory meeting in Murten in 1984, Sammy Eilenberg said: “Good, then we can forget all about the λ-calculus!”
3.4 Elementary toposes and HAH

We now outline another instance of an equivalence between a class of categories and a fragment of logic, namely elementary toposes with NNO and higher-order intuitionistic arithmetic (HAH). Recall that in a category C, a subobject of an object A is an equivalence class of monomorphisms m : X → A, where two monomorphisms are equivalent precisely when they factor through each other. The collection of subobjects of A is denoted Sub(A). The assignment A ↦ Sub(A) is a contravariant functor from C to the category of posets. A category is said to have canonical subobjects when every subobject has a chosen representative. In Set, for example, we may represent a subobject through its set-theoretic image.

Definition 8 (Elementary Topos) A category C is a topos when it has the following structure:
• C has finite limits
• C has power-objects: for each A there exists an object PA and natural bijection $\mathrm{Hom}_C(B, PA) \cong \mathrm{Sub}(B \times A)$.

A power-object PA (when it exists) represents the functor Sub(− × A). In the category of sets, we may take PA to be the powerset of A, and then the defining bijection becomes the familiar correspondence between relations R ⊆ B × A and functions r : B → PA. In a topos, we write Ω for P1. This is the subobject classifier: there is a natural bijection $\mathrm{Hom}_C(B, \Omega) \cong \mathrm{Sub}(B)$. We think of Ω as the object of truth values of C, and of PA as the exponential $\Omega^A$. The category of sets is of course a typical example of a topos, as are functor categories $[C^{op}, \mathrm{Set}]$. Other examples will be discussed below. The qualifier elementary is used to stress the inclusion of toposes other than Grothendieck toposes (which are required to be cocomplete and have a small set of generators).¹⁰ In the context of elementary toposes, one often considers logical morphisms between toposes. These are functors preserving all the topos structure.
Just as for CCCs, we work with toposes with specified structure and morphisms strictly preserving this structure.

10 Grothendieck toposes were introduced in the early 1960s by the Grothendieck school of algebraic geometry [4] as sheaves on a site. In the early 1970s, Lawvere and Tierney [81] introduced elementary toposes. It was realized that such toposes could be considered as a universe of mathematics, where the objects and morphisms can be treated as sets and functions, provided one refrains from using classical reasoning (the law of excluded middle and the Axiom of Choice).
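The defining bijection of a power-object in Definition 8 can be made concrete in Set, where it is the correspondence between relations R ⊆ B × A and functions r : B → PA. The following Python sketch shows both directions on finite sets; the function names and the sample data are illustrative assumptions, and the round trip is only meaningful when the relation really is a subset of B × A.

```python
# Sketch in Set with finite sets: Hom(B, PA) is in bijection with Sub(B x A).
# Function names and sample data are illustrative assumptions.

def relation_to_map(relation, B):
    """Send R, a subset of B x A, to r : B -> P(A) with r(b) = {a | (b, a) in R}."""
    return {b: frozenset(a for (b2, a) in relation if b2 == b) for b in B}


def map_to_relation(r):
    """Send r : B -> P(A) back to the relation {(b, a) | a in r(b)}."""
    return {(b, a) for b, subset in r.items() for a in subset}


B = {0, 1, 2}
R = {(0, "x"), (0, "y"), (2, "x")}   # a relation between B and A = {"x", "y"}
r = relation_to_map(R, B)
```

Elements of B unrelated to anything are sent to the empty subset, and converting back with `map_to_relation` recovers the original relation.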
Definition 9 The category Top has:
Objects Elementary toposes with specified finite limits and power objects, and with canonical subobjects.
Morphisms Logical functors preserving all specified structure on the nose.

Next, let us describe (intuitionistic) higher-order logic (HOL). This formal system can be thought of as an extension of simply typed lambda calculus, with added type and term constructors for the type Ω of propositions and for power objects PA. (However, we do not include exponentials explicitly, as they are definable in terms of the other operations¹¹.) Thus the types are generated from ground types G using the following grammar:

Types    A, B ::= G | 1 | A × B | Ω | PA.
The terms are generated from basic terms and variables using the rules displayed in Figure 4 (where we omit the rules already stated for simply typed lambda calculus in Figure 3):

$$
\frac{a : T \quad a' : T}{a = a' : \Omega}
\qquad
\frac{a : T \quad \alpha : PT}{a \in \alpha : \Omega}
\qquad
\frac{x : A \quad \varphi(x) : \Omega}{\{x : A \mid \varphi(x)\} : PA}
$$

Fig. 4 Terms of higher-order intuitionist logic
In [77] there are two axiomatizations of higher order logic, including the one above based on equality (between terms of the same type), comprehension, extensionality, and (in case we add a type of natural numbers) Peano’s axioms. Following Russell, Henkin, and Prawitz, since we are assuming a primitive equality predicate at each type, we can define the usual logical connectives as in Figure 5 below.

$$
\begin{array}{rcl}
\top & := & \ast = \ast \\
p \land q & := & \langle p, q \rangle = \langle \top, \top \rangle \\
p \Rightarrow q & := & p \land q = p \\
\bot & := & \forall x{:}\Omega\; x \\
\neg p & := & \forall x{:}\Omega\,(p \Rightarrow x) \\
p \lor q & := & \forall x{:}\Omega\,(((p \Rightarrow x) \land (q \Rightarrow x)) \Rightarrow x) \\
\forall x{:}A\;\varphi(x) & := & \{x : A \mid \varphi(x)\} = \{x : A \mid \top\} \\
\exists x{:}A\;\varphi(x) & := & \forall y{:}\Omega\,(\forall x{:}A\,(\varphi(x) \Rightarrow y) \Rightarrow y) \\
\exists! x{:}A\;\varphi(x) & := & \exists x'{:}A\,(\{x : A \mid \varphi(x)\} = \{x : A \mid x = x'\})
\end{array}
$$

Fig. 5 Type-theoretic encoding of logic

11 Moreover, as discussed in [77], strict logical functors will preserve only the powerset structure on the nose. In keeping with the logic literature and because of its historical importance, we denote the type of truth values by Ω, rather than treating it as P1. Logical functors will preserve Ω on the nose.
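The encoding of Figure 5 can be checked mechanically in the simplest model. The Python sketch below interprets it in Set, assuming the classical two-valued Ω = {False, True}; this is only one model, since in a general topos Ω need not be two-valued. All names here are illustrative, and the quantifiers are computed by comparing comprehension sets exactly as in the figure.

```python
# Sketch: Figure 5 interpreted in Set with Omega = {False, True}.
# This two-valued model is an assumption; all names are illustrative.
OMEGA = (False, True)
TOP = (() == ())                      # top := (* = *)


def conj(p, q):
    return (p, q) == (TOP, TOP)       # p and q := <p, q> = <top, top>


def implies(p, q):
    return conj(p, q) == p            # p => q := (p and q) = p


def forall(domain, phi):
    # forall x:A phi(x) := {x : A | phi(x)} = {x : A | top}
    return {x for x in domain if phi(x)} == {x for x in domain if TOP}


BOTTOM = forall(OMEGA, lambda x: x)   # bottom := forall x:Omega. x


def neg(p):
    return forall(OMEGA, lambda x: implies(p, x))


def disj(p, q):
    return forall(OMEGA, lambda x: implies(conj(implies(p, x), implies(q, x)), x))


def exists(domain, phi):
    # exists x:A phi(x) := forall y:Omega (forall x:A (phi(x) => y) => y)
    return forall(OMEGA, lambda y: implies(forall(domain, lambda x: implies(phi(x), y)), y))
```

In this model each derived connective has its usual truth table, so the encoding from equality and comprehension alone recovers classical propositional logic over the two truth values.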
We now define an entailment relation $\Gamma \vdash_X q$. Here, Γ is a finite set of formulas (i.e., terms of type Ω), q is a formula, and X is a typed variable context containing all the free variables of Γ and q; the meaning of $\Gamma \vdash_X q$ is that q can be derived (using the rules for intuitionistic logic) from Γ. When Γ = ∅ we simply write $\vdash_X q$. There are standard structural rules (including Cut), substitution, rules for equality, rules for products, and for comprehension. For example, there is the comprehension rule $\vdash_X (y \in \{x : A \mid \varphi(x)\}) = \varphi(y)$. We refer to [77] for a complete list of rules.

By a type theory we mean an extension of HOL by sequents $\Gamma \vdash_X q$. When L is such a type theory, we write $\vdash^L$ for the entailment relation of L (although we may omit the superscript when L is understood). In L, we say that two terms t, s of the same type are provably equal when $\vdash^L_X t = s$. Just as for simply typed lambda calculi, it is common to include a type of natural numbers; the type theory obtained by adding the natural numbers to HOL (and no further basic types) is called Higher-order intuitionistic Arithmetic, or HAH for short. An interpretation of one type theory in another is a mapping of types to types that preserves all type formation operations, together with a mapping of terms that respects the typing, the term formation operations and the provable equality. Type theories and interpretations form a category denoted Lang. A type theory is classical if in addition it has Aristotle’s axiom of excluded middle: $\forall p{:}\Omega\,(p \lor \neg p)$. Such a system of classical type theory was employed in Gödel’s famous incompleteness paper [38]. Given a type theory L one may now build a syntactic topos as follows:

Definition 10 (Generated Toposes T(L)) The topos T(L) generated by the type theory L has as objects “sets” (i.e., closed terms α of type PA, modulo provable equality). Morphisms α → β, where α : PA and β : PB, are “provably functional relations”, i.e. closed terms ϕ : P(A × B) (modulo provable equality) such that we can prove:

$$\vdash^L \forall x{:}A\,(x \in \alpha \Rightarrow \exists! y{:}B\,(y \in \beta \land (x, y) \in \varphi)).$$

T(L) is the category of “sets” and “functions” formally definable in higher-order logic L. The assignment L ↦ T(L) is a functor T : Lang → Top. For $L_0$ = pure type theory, $T(L_0)$ is called the free topos, denoted $F_{top}$. It enjoys the following universal property: for any elementary topos E there exists a logical functor $F : F_{top} \to E$ which is unique up to (unique) natural isomorphism. In other words, $F_{top}$ is the initial object of Top. In the other direction we may assign to a topos C its internal language L(C), just as for CCCs. This gives a functor L : Top → Lang.
Theorem 2 (Lambek-Scott [77]) The functors L, T described above constitute an equivalence of categories

$$\mathbf{Top} \;\underset{T}{\overset{L}{\rightleftarrows}}\; \mathbf{Lang}.$$
As for simply typed lambda calculus, we may extend this result by adding datatypes. Most importantly, we can consider type theories with natural numbers and toposes with natural number objects (see next Section).
4 What are computable functions in categories?

We turn to the study of computable functions in categories. In this section, we limit ourselves to computable numerical functions; later we shall consider computable maps on other datatypes.
4.1 Natural Numbers Objects and Prim

In order to discuss number-theoretic functions in categories, we briefly recall Lawvere’s notion of Natural Numbers Objects (NNOs) in cartesian closed categories [79, 77] and more generally NNOs in cartesian and monoidal categories [102].

Definition 11 (Lawvere [79]) A Natural Numbers Object (NNO) in a (cartesian closed) category C is a diagram $1 \xrightarrow{0} N \xrightarrow{S} N$ initial among diagrams $1 \xrightarrow{a} A \xrightarrow{h} A$, i.e., there exists a unique $It_{ah} : N \to A$ satisfying:

$$It_{ah} \circ 0 = a, \qquad It_{ah} \circ S = h \circ It_{ah}.$$

Existence, without uniqueness, of such an arrow $It_{ah}$ yields the notion of a weak NNO. Any arrow $It_{ah} : N \to A$ (unique or not) satisfying the equations above is called an iterator at type A. Diagrammatically,

$$
\begin{array}{ccccc}
1 & \xrightarrow{\;0\;} & N & \xrightarrow{\;S\;} & N \\
 & \searrow^{a} & \downarrow It_{ah} & & \downarrow It_{ah} \\
 & & A & \xrightarrow{\;h\;} & A
\end{array}
$$

In Set this says:

$$It_{ah}(0) = a, \qquad It_{ah}(n + 1) = h(It_{ah}(n)).$$
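The two equations for the iterator in Set translate directly into a loop. The sketch below is a minimal Python rendering, assuming natural numbers are modelled by non-negative Python integers; the names `iterate` and `power_of_two` are illustrative.

```python
# Sketch of the iterator It_ah in Set; names are illustrative assumptions.

def iterate(a, h):
    """Return It_ah : N -> A with It_ah(0) = a and It_ah(n + 1) = h(It_ah(n))."""
    def it(n):
        value = a
        for _ in range(n):   # apply h exactly n times
            value = h(value)
        return value
    return it


# Example: a = 1 and h = doubling give n |-> 2^n.
power_of_two = iterate(1, lambda v: 2 * v)
```

The returned function applies h to a exactly n times, so both defining equations hold by construction.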
For any NNO (weak or strong) we may define, for any natural number n ∈ ℕ, the standard numeral $\bar{n} : 1 \to N$ by

$$\bar{0} = 0, \qquad \overline{n + 1} = S \circ \bar{n}.$$
We stress that depending on the nature of the ambient category, there may be non-standard numerals, that is, points 1 → N that are not of the form $\bar{n}$. A natural numbers object in a cartesian closed category is equivalent to the following scheme of Iteration with parameters. This general scheme (and its variants for monoidal categories) is sufficient for representing the primitive recursive functions [79, 77] and is the appropriate definition for NNOs in cartesian (or monoidal) categories, as in [77], p.71.

Definition 12 (Parametrized NNO) A diagram $1 \xrightarrow{0} N \xrightarrow{S} N$ in a cartesian category C is a parametrized NNO if for all arrows $A \xrightarrow{g} B$, $B \xrightarrow{f} B$, there exists a unique $It_{gf} : N \times A \to B$ such that:

$$
\begin{array}{ccccc}
A & \xrightarrow{\langle 0!, 1_A \rangle} & N \times A & \xrightarrow{S \times 1_A} & N \times A \\
 & \searrow^{g} & \downarrow It_{gf} & & \downarrow It_{gf} \\
 & & B & \xrightarrow{\;f\;} & B
\end{array}
$$

In Set this says:

$$It_{gf}(0, a) = g(a), \qquad It_{gf}(n + 1, a) = f(It_{gf}(n, a)).$$
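In Set, the parametrized iterator of Definition 12 is again a loop, now starting from g(a). The Python sketch below shows it together with two uses: addition, and multiplication, where the parameter is threaded through the iteration by taking B = N × N. The names are illustrative assumptions, not notation from the source.

```python
# Sketch of It_gf : N x A -> B in Set; names are illustrative assumptions.

def iterate_param(g, f):
    """Return It_gf with It_gf(0, a) = g(a) and It_gf(n + 1, a) = f(It_gf(n, a))."""
    def it(n, a):
        value = g(a)
        for _ in range(n):
            value = f(value)
        return value
    return it


# Addition: g is the identity and f is the successor, so add(n, a) = n + a.
add = iterate_param(lambda a: a, lambda v: v + 1)

# Multiplication: f cannot see the parameter directly, so we iterate on pairs
# (accumulator, a) and project at the end.
_mul_pairs = iterate_param(lambda a: (0, a), lambda p: (add(p[1], p[0]), p[1]))


def mul(n, a):
    return _mul_pairs(n, a)[0]
```

The multiplication example illustrates why products matter here: carrying the parameter inside the iterated state requires pairing and projections.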
Existence without uniqueness of the arrow $It_{gf}$ above yields a weak parametrized NNO. A typical example is the notion of adding an iterator to a simply typed lambda calculus.

Iterators in typed lambda calculus Following [77], we add to the terms of simply typed lambda calculus in Figure 3 an atomic type N and term formation operations

$$
\frac{}{0 : N}
\qquad
\frac{x : N}{Sx : N}
\qquad
\frac{a : A \quad h : A^A \quad x : N}{It(a, h, x) : A}
$$

(allowing in particular the definition of standard numerals $\bar{n}$). We then add to the equations of the simply typed lambda calculus the following equations:

$$It(a, h, 0) =_X a, \qquad It(a, h, Sx) =_{X \cup \{x\}} h(It(a, h, x)), \text{ provided } x \notin X.$$
Calling this lambda theory L, the associated syntactic category C(L) (Definition 7) is a cartesian closed category with weak NNO. In general, when we consider a category C the difference between a weak and a strong NNO in C can be understood in logical terms by considering
the form of induction allowed in the internal language. For example, when C has a strong NNO we can prove the entailment $\vdash_{x,y} x + y = y + x$ where, crucially, x, y are free variables of type N. When C only has a weak NNO one can prove by (external) induction that for every n ∈ ℕ: $\vdash_x x + \bar{n} = \bar{n} + x$.

Next, consider a (not necessarily symmetric) monoidal category (C, ⊗, I). Following Paré and Román [102], we may define notions of Left and Right NNOs, in analogy with Definition 12.

Definition 13 (Left Parametrized NNO) A diagram $I \xrightarrow{0} N \xrightarrow{S} N$ in a monoidal category C is a left parametrized NNO if for all arrows $A \xrightarrow{g} B$, $B \xrightarrow{f} B$, there exists a unique k : N ⊗ A → B such that:

$$
\begin{array}{ccccc}
I \otimes A & \xrightarrow{0 \otimes 1_A} & N \otimes A & \xrightarrow{S \otimes 1_A} & N \otimes A \\
{\scriptstyle\cong}\downarrow & & \downarrow k & & \downarrow k \\
A & \xrightarrow{\;g\;} & B & \xrightarrow{\;f\;} & B
\end{array}
$$
In the same manner, tensoring by A on the left (rather than the right) results in a Right Parametrized NNO; weak objects are defined similarly by assuming merely existence (but not necessarily uniqueness) of k. For many examples of such monoidal NNOs, see [102]. We remark that there are yet other axiomatizations. A Peano-Lawvere category is a category for which the forgetful functor $C^{\mathbb{N}} \to C$ has a left adjoint (where ℕ is regarded as the free monoid on one generator). A systematic study of the free such category can be found in Burroni’s [12]. Another relevant class of categories is that of list-arithmetic pretoposes. These were developed by Maietti [91] (see also [92]) in order to provide a categorical setting accommodating Joyal’s arithmetic universes ([56]), which in turn serve as a categorical account of the Incompleteness Theorem. A pretopos is a category that has finite limits, pullback-stable disjoint coproducts, and pullback-stable quotients of equivalence relations. Such a category has parametrized list objects when for each object A there is an object LA equipped with maps e : 1 → LA, c : LA × A → LA (thought of as the empty list and concatenation). These are required to satisfy the following universal property: for any a : B → C and h : C × A → C there is a unique $It_{ah} : B \times LA \to C$ making the following commute:

$$
\begin{array}{ccccc}
B & \xrightarrow{\langle 1_B, e \rangle} & B \times LA & \xleftarrow{1_B \times c} & B \times LA \times A \\
 & \searrow^{a} & \downarrow It_{ah} & & \downarrow It_{ah} \times 1_A \\
 & & C & \xleftarrow{\;h\;} & C \times A
\end{array}
$$
As is the case for NNOs, we may also consider a weak version where we only require existence and not uniqueness of the iterator Itah . A list-arithmetic pretopos is now defined as a pretopos admitting parametrized list objects for all A. Note that taking A = 1 gives the notion of a parameterized NNO.
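Read in Set, the universal property of a parametrized list object is a fold over a list that grows on the right: the iterator sends (b, e) to a(b) and (b, c(l, x)) to h applied to the value on (b, l) and x. The Python sketch below models LA by Python lists; this modelling and the names used are assumptions of the sketch.

```python
# Sketch of It_ah : B x LA -> C in Set, with LA modelled by Python lists.
# Names are illustrative assumptions.

def list_iterator(a, h):
    """Return It with It(b, []) = a(b) and It(b, l + [x]) = h(It(b, l), x)."""
    def it(b, xs):
        value = a(b)
        for x in xs:          # xs is read left to right, matching c(l, x)
            value = h(value, x)
        return value
    return it


# Example: add the entries of a list to a starting parameter b.
total = list_iterator(lambda b: b, lambda v, x: v + x)

# With A = 1 a list is determined by its length, and the scheme specializes
# to iteration with parameters on natural numbers.
count_up = list_iterator(lambda b: b, lambda v, _unit: v + 1)
```

The second example reflects the remark that taking A = 1 recovers a parametrized NNO: a list of n copies of the unique element of 1 plays the role of the number n.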
4.2 Representability

We now turn to representability of numerical functions in categories with NNOs.

Definition 14 (Lambek-Scott [77]) Let C be a cartesian category with a weak parametrized NNO $1 \xrightarrow{0} N \xrightarrow{S} N$. A function $f : \mathbb{N}^k \to \mathbb{N}$ is representable in C if there is an arrow $F : N^k \to N$ in C such that $F \langle \bar{n}_1, \dots, \bar{n}_k \rangle = \bar{m}$ whenever $f(n_1, \dots, n_k) = m$.

Of course, the determination of which numerical functions are representable depends on the category: in the category Set, all numerical functions are representable! Following Lambek’s question in the Introduction, we shall look at free categories with NNOs.

Theorem 3 (Román [110]) The class of representable numerical functions in $F_{cart}$, the free cartesian category with parametrized NNO, is Prim. Hence the unique representation functor $F_{cart} \to \mathrm{Set}$ has as image the subcategory of sets whose objects are powers $\mathbb{N}^n$ and whose maps are tuples of primitive recursive functions.

Román’s proof essentially shows that Goodstein’s development [40] of Skolem’s primitive recursive arithmetic can be mimicked in $F_{cart}$. In that sense, the result is not so surprising. However the following striking result considers the extension to $F_{mon}$, the free monoidal category with a LNNO. Recall primitive recursion requires projection functions $U_i^n : \mathbb{N}^n \to \mathbb{N}$, yet in a monoidal category, in general ⊗ does not have explicit projections. Nevertheless:

Theorem 4 (Paré-Román [102]) (i) The primitive recursive functions are representable in any monoidal category with LNNO. (ii) Indeed, $F_{mon}$, the free monoidal category with LNNO, exists and is isomorphic to $F_{cart}$, the free cartesian category with parametrized NNO.
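To see informally why iteration with parameters, pairing and projections suffice for the primitive recursion scheme, one can derive the scheme f(0, a) = g(a), f(n + 1, a) = h(n, f(n, a), a) from plain iteration on triples (counter, value, parameter). The Python sketch below works in Set only and is not Román's construction in the free category; all names are illustrative assumptions.

```python
# Sketch in Set: primitive recursion derived from iteration with parameters.
# Names are illustrative assumptions; this is not the proof in [110].

def iterate_param(g, f):
    """It_gf(0, a) = g(a); It_gf(n + 1, a) = f(It_gf(n, a))."""
    def it(n, a):
        value = g(a)
        for _ in range(n):
            value = f(value)
        return value
    return it


def prim_rec(g, h):
    """Return f with f(0, a) = g(a) and f(n + 1, a) = h(n, f(n, a), a)."""
    start = lambda a: (0, g(a), a)                           # <0, g(a), a>
    step = lambda s: (s[0] + 1, h(s[0], s[1], s[2]), s[2])   # advance the triple
    it = iterate_param(start, step)
    return lambda n, a: it(n, a)[1]                          # project the value


# Two standard examples; the parameter is unused, so None is passed for it.
factorial = prim_rec(lambda a: 1, lambda n, v, a: (n + 1) * v)
predecessor = prim_rec(lambda a: 0, lambda n, v, a: n)
```

The projections used to read off components of the triple are exactly the maps that a monoidal tensor does not provide in general, which is what makes Theorem 4 striking.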
Why is this? The reason is that the objects of $F_{mon}$ are generated from N under tensoring and we can code projections and diagonals between tensor powers $N^{\otimes k}$. This then allows the representability of the primitive recursive functions in a similar manner to $F_{cart}$. The former result (coding projections and diagonals) is proved by an elegant categorical argument in Paré-Román, while Plotkin [106] gives a direct (albeit nontrivial) coding of the primitive recursive functions in $F_{mon}$.
4.3 Going beyond the primitive recursive functions: free CCCs

How do we get more representable functions? We increase the logical strength (the types) from the logic of {∧, ⊤} (or {⊗, I}) to the logic of {∧, ⇒, ⊤}, i.e. to the cartesian closed level. Consider the free CCC with natural numbers generated by the empty graph, denoted $F_{ccc}$ (as defined in Section 3.3). The following is a theorem about simply typed lambda calculus, translated into the language of CCCs:

Theorem 5 (Lambek-Scott [77]) In $F_{ccc}$, the free CCC with weak NNO:
1. All primitive recursive functions and the Ackermann function are representable.
2. The representable functions form a proper subclass of the total recursive functions, namely the provably total functions of Peano Arithmetic, or equivalently the $\varepsilon_0$-recursive functions [116, 37].

More generally, the representable total functions of $F_{ccc}$ are the lowest level of the hierarchy of Gödel’s Dialectica Functionals, i.e., Gödel’s primitive recursive functionals of finite type [39, 115]. There is also a version of Gödel’s Incompleteness for $F_{ccc}$. Let Z represent the zero function.

Theorem 6 (A version of Incompleteness, or 1 is not a generator) In $F_{ccc}$, there is a closed term F : N ⇒ N such that for each numeral $\bar{n}$, $F\bar{n} = 0$, but $F \neq Z$.

For a proof, see Corollaries 2.11, 2.12 in [77], p.263. Finally, a topic of considerable importance in theoretical computer science:

Computation by normalization We should also recall the notion of computation by normalization or, for a logician, by cut-elimination [37]. In the rewriting theory of typed lambda calculus, we can set up strongly normalizing rewrite systems in which terms can be rewritten to (unique) normal forms.
Given a term f : N ⇒ N and a numeral $\bar{n}$ : N, to compute $f\bar{n}$ by normalization, we first normalize this term to its unique normal form of type N. This yields a numeral $\bar{m}$, for which we can prove $f\bar{n} = \bar{m}$; cf. [37]. This gives the value of f on input numerals. By Curry-Howard-Lambek, normalization techniques may also be used to solve coherence problems (decidability of equality) for various free CCCs, via their internal languages [77]: to check if two arrows in a free CCC are equal or not, it suffices to show that their normal forms (qua lambda terms) are identical, up to change of bound variables.
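The step from Prim to the Ackermann function in Theorem 5 comes from iterating at a function type. The Python sketch below mirrors the lambda term It(S, λf.λn. It(f(1), f, n), m), using the standard identity A(m + 1, n) = A(m)^{n+1}(1) for the Ackermann-Péter function. It is an informal illustration: Python evaluates the expression rather than normalizing a lambda term, and the function names are assumptions of the sketch.

```python
# Sketch: the Ackermann-Peter function via iteration at function type.
# Names are illustrative assumptions.

def iterate(a, h, n):
    """It(a, h, n): apply h to a exactly n times."""
    value = a
    for _ in range(n):
        value = h(value)
    return value


def successor(n):
    return n + 1


def next_level(f):
    """Send f = A(m, -) to A(m + 1, -), i.e. n |-> f^(n+1)(1) = It(f(1), f, n)."""
    return lambda n: iterate(f(1), f, n)


def ackermann(m, n):
    # The outer iteration runs at type N => N: its values are functions.
    return iterate(successor, next_level, m)(n)
```

Only the outer call iterates over functions; every inner iteration is at type N, which is why the construction needs exponentials but nothing beyond the iterator.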
4.4 Some properties of the free topos

Pure intuitionistic type theory $L_0$ has many interesting properties, which translate into algebraic properties of the free topos $F_{top}$ (see [77]) and are often key metamathematical principles of intuitionistic systems ([115]). In what follows we write $\vdash$ instead of $\vdash^{L_0}$ for derivability in intuitionist higher order arithmetic HAH.
• Consistency: not ($\vdash \bot$).
• Disjunction Property (DP): If $\vdash p \lor q$, then $\vdash p$ or $\vdash q$.
• Existence Property (EP): If $\vdash \exists x{:}A\;\varphi(x)$ then $\vdash \varphi(a)$ for some closed term a : A. In particular, in $F_{top}$ EP says that numerals are standard, i.e. that numerals $1 \xrightarrow{f} N$ are all of the form $\bar{n}$, for some n ∈ ℕ.
• Troelstra’s Uniformity Principle (UP) for A = PC: If $\vdash \forall x{:}A\;\exists y{:}N\;\varphi(x, y)$ then $\vdash \exists y{:}N\;\forall x{:}A\;\varphi(x, y)$. In $F_{top}$, UP says the arrows PC → N are constant, i.e. factor through a standard numeral.
• Independence of premisses (IP): If $\vdash \neg p \Rightarrow \exists x{:}A\;\varphi(x)$ then $\vdash \exists x{:}A\,(\neg p \Rightarrow \varphi(x))$.
• Markov’s Rule (MR): If $\vdash \forall x{:}A\,(\varphi(x) \lor \neg\varphi(x))$ and $\vdash \neg\forall x{:}A\;\neg\varphi(x)$, then $\vdash \exists x{:}A\;\varphi(x)$.
• The Existence Property with a parameter of type A = PC: If $\vdash \forall x{:}A\;\exists y{:}B\;\varphi(x, y)$ then $\vdash \forall x{:}A\;\varphi(x, \psi(x))$, where ψ(x) : B.

Proofs: The original proofs [76] for EP and DP used Friedman (impredicative) realizability. When the authors lectured on this, Peter Freyd realized these rules had purely algebraic statements, with direct categorical proofs, using Artin gluing categories ([120]). The Freyd gluing techniques were expanded to include the proof rules above in [76] and in a series of later papers by the authors. This is also presented in [77].

The free Boolean topos $F_{bool}$ is defined in the same way as the free topos, but generated from classical type theory. As argued in [77], alas the free Boolean topos is not an ideal universe for classical mathematicians. For example, as a consequence of Gödel’s Incompleteness Theorem, there are nonstandard numerals. To see this, let G be any undecidable Gödel sentence. It may be shown that ϕ(x) := (x = 0 ⇒ G) ∧ (x = 0 ⇒ ¬G) determines a numeral f : 1 → N in $F_{bool}$; however, it cannot be a standard numeral, else we could decide G.

We now turn to the matter of representable numerical functions in the free topos. First we recall the definition of representability of a function in HAH¹²:

Definition 15 (Representability in HAH) A total function $f : \mathbb{N}^k \to \mathbb{N}$ is representable in HAH when there exists a formula $R_f(x_1, \dots, x_k, y)$ such that
(i) $f(n_1, \dots, n_k) = m$ if and only if $\vdash R_f(\bar{n}_1, \dots, \bar{n}_k, \bar{m})$
(ii) $\vdash \forall x_1{:}N \dots x_k{:}N\;\exists! y{:}N.\,R_f(x_1, \dots, x_k, y)$.

In the literature, one often considers a weaker notion of representability, in which clause (i) remains, but (ii) above is replaced by
(ii’) for all $n_1, \dots, n_k \in \mathbb{N}$, $\vdash \exists! y{:}N.\,R_f(\bar{n}_1, \dots, \bar{n}_k, y)$.
We refer to this weaker notion as numeralwise representability. It follows that a total numerical function $\mathbb{N}^k \to \mathbb{N}$ is representable in HAH if and only if it is representable by an arrow $N^k \to N$ in the free topos. (See Prop. 3.1, p. 264 in [77] for details.)

Theorem 7 (Lambek-Scott [77]) (i) In HAH (and hence in the free topos), every representable numerical function is recursive. In particular, the global sections functor $F_{top}(1, -) : F_{top} \to \mathrm{Set}$ sends morphisms $N^k \to N$ to recursive functions $\mathbb{N}^k \to \mathbb{N}$. (ii) Not all total recursive functions so arise. (The second part of the theorem can be established by means of a diagonal argument.)
This of course leads to the question of which total recursive functions are representable in HAH. This is related to the representability of numerical functions in Girard’s system $F_\omega$, but we shall not pursue this here. We note that the situation changes radically if we consider classical type theory (the free Boolean topos).

12 The same definition works in other formal systems such as Peano Arithmetic.
Theorem 8 (Lambek-Scott [77]) (i) The numeralwise representable functions in classical type theory are exactly the total recursive functions (Gödel). (ii) In classical type theory, numeralwise representable functions coincide with representable ones (by a result of V. Huber-Dyson [51], detailed in [77]). Hence the representable numerical functions in classical type theory are exactly the total recursive functions.

Unfortunately, as we have seen, the free Boolean topos has non-standard numerals. Thus, the global sections functor from the free Boolean topos to Set in general sends arrows $N^k \to N$ to partial, rather than total, numerical functions. This suggests that the representability of partial functions may be at least as important as that of total functions. In fact, we shall see that even at the intuitionistic level, the theory becomes much smoother.

Definition 16 A partial function $f : \mathbb{N}^k \rightharpoonup \mathbb{N}$ is representable in HAH if there is a formula $R_f(x_1, \dots, x_k, y)$ such that
(i) for all $n_1, \dots, n_k \in \mathbb{N}$, $f(n_1, \dots, n_k)$ is defined and equal to m if and only if $\vdash R_f(\bar{n}_1, \dots, \bar{n}_k, \bar{m})$
(ii) $\vdash \forall x{:}N^k\;\forall y{:}N\;\forall z{:}N.\,R_f(x, y) \land R_f(x, z) \Rightarrow y = z$.

We now have the following characterization:

Theorem 9 (Lambek-Scott [77]) A partial numerical function is representable in HAH (i.e., in the free topos) if and only if it is partial recursive.
4.5 C-monoids and Untyped Lambda Calculi

As mentioned earlier, Church’s untyped lambda calculus played a key role in the original development of computability theory, as well as modern programming language theory. It was Dana Scott in the late 1960s who pointed out that untyped lambda calculi may be considered as typed lambda calculi with one non-trivial type (up to isomorphism). This arose from his development of domain theory, the mathematical modelling of untyped lambda calculi and the semantics of programming languages [2]. An algebraic framework for this development is given in [77], pp. 93-114, which we now sketch. For some historical references, the reader can see [73, 112]. Recall, monoids are categories with one object. A monoid has a terminal object precisely when it is trivial. However, when we ignore the terminal object, we may formulate a notion of cartesian closure:

Definition 17 (Lambek-Scott [77]) A C-monoid is a monoid M with constants $\pi_1, \pi_2, \varepsilon$, unary operation $(-)^*$, and binary operation $\langle -, - \rangle$ satisfying the equations of a CCC without a terminal object: i.e. products, surjective pairing, β, η. Explicitly:
$$\pi_1 \langle a, b \rangle = a, \qquad \pi_2 \langle a, b \rangle = b, \qquad \langle \pi_1 c, \pi_2 c \rangle = c,$$
$$\varepsilon \langle h^* \pi_1, \pi_2 \rangle = h, \qquad (\varepsilon \langle k \pi_1, \pi_2 \rangle)^* = k.$$

The following results illustrate how C-monoids relate to untyped lambda calculi and CCCs. They are an untyped variation of Theorem 1.

Theorem 10 (Lambek-Scott [77]) (i) There is a bijective correspondence between C-monoids and untyped lambda calculi with products and surjective pairing.¹³ This correspondence extends to an isomorphism between the category of C-monoids and the category of such untyped lambda calculi (cf. [77], p.106). (ii) C-monoids correspond to CCCs generated by a non-trivial reflexive object U, i.e., an object $U \not\cong 1$ satisfying $U^U \cong U \cong U \times U$. In this case, End(U) will be such a C-monoid (cf. [77], p.99). Without the η-rule, we would have $U \times U \lhd U$, $U^U \lhd U$, i.e., retracts of U (cf. also [2], [6]). (iii) With respect to appropriate numeral systems (e.g. Church or Barendregt numerals, see [6], Sections 6.3, 6.4), the computable functions in the free C-monoid are precisely the partial recursive functions (cf. [77], p.276).

We remark that part (ii) of the above theorem uses an observation of D. Scott ([112], [77]), which says: if we form the idempotent splitting completion (Karoubi envelope) of a C-monoid, we obtain a CCC which is generated by a reflexive object U. There are precise senses in which all C-monoids are isomorphic to such CCCs ([77], p.99). Since Church’s untyped lambda calculus was an early foundation of computability theory, it is no surprise that the computable functions in the free C-monoid are precisely the partial recursive ones.
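The C-monoid operations can be imitated informally with untyped Python callables, treating Python values as a loose stand-in for a reflexive object U and functions on them as elements of End(U). The sketch below is only a pointwise sanity check of the β- and η-style equations on sample inputs, under that assumption; it does not construct a genuine C-monoid (for example, surjective pairing holds only for pair-valued maps), and all names are illustrative.

```python
# Informal sketch: C-monoid operations on untyped Python callables.
# This is a pointwise illustration on sample inputs, not a genuine C-monoid.

def compose(f, g):
    """Monoid multiplication: f after g."""
    return lambda x: f(g(x))


def pair(a, b):
    """The binary operation <a, b>."""
    return lambda x: (a(x), b(x))


pi1 = lambda p: p[0]
pi2 = lambda p: p[1]
eps = lambda p: p[0](p[1])            # evaluation: apply first component to second


def star(h):
    """The unary operation h*, i.e. currying."""
    return lambda x: lambda y: h((x, y))


h = lambda p: p[0] + 2 * p[1]
beta_lhs = compose(eps, pair(compose(star(h), pi1), pi2))    # eps<h* pi1, pi2>

k = lambda x: lambda y: x * y
eta_lhs = star(compose(eps, pair(compose(k, pi1), pi2)))     # (eps<k pi1, pi2>)*
```

On sample inputs, `beta_lhs` agrees with `h` and `eta_lhs` agrees with `k` after applying both arguments, which is the pointwise shadow of the last two C-monoid equations.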
4.6 Plotkin’s characterization of Kleene’s μ-recursion

We recall Lambek’s Lemma [68], which is often used in denotational semantics. Given an endofunctor T : C → C we define a T-algebra as a map TA → A. Maps of T-algebras are commutative squares

$$
\begin{array}{ccc}
TA & \xrightarrow{\;Tf\;} & TB \\
\downarrow & & \downarrow \\
A & \xrightarrow{\;f\;} & B.
\end{array}
$$

This gives a category of T-algebras; a T-algebra is called initial when it is an initial object in this category.

13 Such untyped lambda calculi extended with surjective pairing do not enjoy good rewriting properties. By a famous result of Klop [6], Ch.15, §3, the Church-Rosser theorem fails for them. Thus, the consistency of such systems would involve constructing a non-trivial C-monoid (cf. [77], pp.107-114) or more general models [6].
Lemma 1 (Lambek [68]) If $TA \xrightarrow{\alpha} A$ is an initial T-algebra, then α is an isomorphism.

For us, the following is the prime example:

The NNO N as an initial successor algebra in Set. Consider the endofunctor T(−) = 1 + (−) on Set (often called the lifting functor), with the T-algebra structure $(1 + N) \xrightarrow{\alpha} N$, where α = [0, S], for $1 \xrightarrow{0} N$ and $N \xrightarrow{S} N$. The NNO property says that α is an initial T-algebra.

In Set, Lambek’s Lemma then gives the familiar fact that $1 + N \xrightarrow{\alpha} N$ is an isomorphism, for α = [0, S]. As we have seen above, initiality of α gives us primitive recursion. Now what if we turn things around? Plotkin asked for the finality of the coalgebra $\alpha^{-1} : N \to 1 + N$. It is not final in Set but Plotkin shows it is final in Par. Interestingly, this turns out to give exactly Kleene μ-recursion for partial functions.

Let C be a monoidal category with (right distributive) binary sums and a weak left (or right) natural numbers object $I \xrightarrow{0} N \xrightarrow{S} N$. Following Plotkin, we extend Definition 14 of representable function to include partial functions, as follows. We shall say a partial function $f : N^k \rightharpoonup N$ is representable by an arrow $F : N^k \to N$ in C if for all $n_1, \dots, n_k \in N$,

$$
f(n_1, \dots, n_k) \cong m \;\Longrightarrow\; F\langle \overline{n_1}, \dots, \overline{n_k} \rangle = \overline{m} \qquad (\dagger)
$$
where ≅ means Kleene equality.

Theorem 11 (Plotkin [106]) Let C be a monoidal category with (right distributive) binary sums and a weak left (or right) natural numbers object $I \xrightarrow{0} N \xrightarrow{S} N$ such that [0, S] is an isomorphism and $(N, [0, S]^{-1})$ is a weakly final lifting coalgebra. Then all partial recursive functions are representable.

It is natural to ask whether we can replace the “⇒” in equation (†) above by the stronger condition “⇔” (as in Definition 16 (i)). Plotkin calls this latter notion strong representability. The proof of Theorem 9 above (in [77], p. 270) shows that for many arithmetical theories, representable partial functions are partial recursive. Plotkin takes the analog of this result (for strong representability) as an actual assumption to obtain a positive answer:

Theorem 12 (Plotkin [106]) Let C be a monoidal category with (right distributive) binary sums and a weak left (or right) natural numbers object $I \xrightarrow{0} N \xrightarrow{S} N$ such that [0, S] is an isomorphism and $(N, [0, S]^{-1})$ is a weakly final lifting coalgebra. If 0 ≠ S0 and if all strongly representable functions are partial recursive, then all partial recursive functions are strongly representable in C.
As pointed out to us by Plotkin, by the discussion above Theorem 12, we obtain a Corollary in keeping with the theme of this paper: Corollary 1 Consider the free category F N endowed with the structure in Theorem 12. Then all partial recursive functions are strongly representable in F N .
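To make the finality statement at the start of this subsection concrete, here is a minimal Python sketch, under the assumption that a coalgebra for the lifting functor in Par is modelled as a function `step` returning `None` (the summand 1) or a next state. The names `unfold_to_nat`, `pred`, `mu` and the `fuel` cut-off are illustrative and not from the source; `fuel` exists only so that the example can be run safely. The induced map into N counts the steps until `step` stops, and is undefined (loops) when it never stops, which is exactly unbounded search.

```python
from typing import Callable, Optional, TypeVar

X = TypeVar("X")


def unfold_to_nat(step: Callable[[X], Optional[X]], x: X,
                  fuel: Optional[int] = None) -> Optional[int]:
    """Count how often `step` can be applied to x before it returns None.

    Without `fuel` this loops forever when `step` never stops, which models
    the partiality of the induced map into N. With `fuel`, the function gives
    up and returns None after more than `fuel` steps (a testing convenience).
    """
    n = 0
    while True:
        nxt = step(x)
        if nxt is None:          # landed in the summand 1: stop counting
            return n
        x = nxt
        n += 1
        if fuel is not None and n > fuel:
            return None          # gave up; says nothing about divergence


def pred(n: int) -> Optional[int]:
    """The coalgebra alpha^{-1} : N -> 1 + N (0 goes to 1, n+1 goes to n)."""
    return None if n == 0 else n - 1


def mu(g: Callable[[int], int], fuel: Optional[int] = None) -> Optional[int]:
    """Least n with g(n) == 0, obtained by unfolding a search coalgebra on N."""
    return unfold_to_nat(lambda n: None if g(n) == 0 else n + 1, 0, fuel)
```

Unfolding `pred` itself gives the identity on N, as finality requires, while `mu(lambda n: n * n - 49)` searches upward from 0 for the least zero.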
5 Abstract Computability

In this section we address the question: what is a category of computable maps? This should be compared with “synthetic” approaches to other areas of mathematics such as synthetic differential geometry, synthetic domain theory, homotopy type theory, and differential categories. A synthetic approach to computability aims at describing the categorical structure common to all reasonable notions of computation; hence in such categories every morphism is by definition computable. Note the contrast with the work described in the previous section, where one starts with a category that, a priori, has no prescribed computability-theoretic content, and where one identifies some maps as representing computable numerical functions.

Most notions of computation are inherently partial, in the sense that they allow for the computable maps to be partial maps. This fact, together with the importance of partial maps in other areas of mathematics, has resulted in a long history of studying partial maps in categories, going back to the early days of topos theory. This history largely overlaps with attempts to formulate aspects of computability theory in categorical terms, which in turn are closely related to the study of categories of domains, as in [2].
5.1 Categories of Partial Maps

We begin with a recent abstract treatment of categories of partial maps by Cockett and Lack [25]. There are at least two reasons for favouring this axiomatization: first, it is sufficiently general, in that it subsumes all the previous treatments. Second, it is algebraic, in the sense that it identifies categories of partial maps as ordinary categories equipped with additional algebraic structure. This allows for the application of powerful techniques from categorical algebra. For a much more detailed presentation and comparison with other approaches, see loc. cit. and follow-ups.

Definition 18 (Restriction Category) A restriction category is a category C together with an assignment $\overline{(\ )} : \mathrm{Hom}_C(A, B) \to \mathrm{Hom}_C(A, A)$ mapping $f \mapsto \overline{f}$ satisfying:

$$
\begin{array}{lll}
\text{R.1} & f\,\overline{f} = f & \\
\text{R.2} & \overline{f}\,\overline{g} = \overline{g}\,\overline{f} & \text{whenever } \mathrm{dom}(f) = \mathrm{dom}(g) \\
\text{R.3} & \overline{g\,\overline{f}} = \overline{g}\,\overline{f} & \text{whenever } \mathrm{dom}(f) = \mathrm{dom}(g) \\
\text{R.4} & \overline{g}\,f = f\,\overline{g f} & \text{whenever } \mathrm{cod}(f) = \mathrm{dom}(g)
\end{array}
$$
We have $\overline{\overline{f}} = \overline{f}$, as well as $\overline{f}\,\overline{f} = \overline{f}$. Maps satisfying $\overline{f} = f$ are called restriction idempotents. The collection of restriction idempotents on A is denoted O(A); the composition operation makes O(A) into a meet-semilattice; for each f : A → B, there is an induced meet-semilattice homomorphism $f^* : O(B) \to O(A)$ sending e ∈ O(B) to $\overline{e f} \in O(A)$. A map f : A → B is total if $\overline{f} = \mathrm{id}_A$. We obtain a wide subcategory $\mathrm{Tot}(C) \hookrightarrow C$.

Examples of Restriction Categories

1. Par is a restriction category when we define
$$
\overline{f}(x) = \begin{cases} x & \text{if } x \in \mathrm{Dom}(f) \\ \uparrow & \text{else} \end{cases}
$$
2. The restriction structure on Par is inherited by various subcategories, most notably the subcategory on the partial computable functions. This uses the fact that if f is computable, then so is $\overline{f}$.
3. Every category can be viewed as a restriction category by declaring $\overline{f}$ to be the identity for all f.
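The first example can be checked mechanically on small cases. The following is a minimal sketch, assuming partial maps between finite sets are modelled as Python dictionaries (a key is present exactly when the map is defined there); the helper names `compose`, `restriction` and `check_axioms` are illustrative only. It tests the axioms R.1 to R.4 on one sample of maps, which is of course evidence for the example and not a proof.

```python
def compose(g, f):
    """g after f (applicative order); defined where f is and g accepts f's value."""
    return {x: g[f[x]] for x in f if f[x] in g}


def restriction(f):
    """The restriction idempotent of f: the partial identity on Dom(f)."""
    return {x: x for x in f}


def check_axioms(f, h, g):
    """Check R.1-R.4 for f, h with a common domain object and g composable after f."""
    fb, hb = restriction(f), restriction(h)
    r1 = compose(f, fb) == f                                  # f f-bar = f
    r2 = compose(fb, hb) == compose(hb, fb)                   # restrictions commute
    r3 = restriction(compose(h, fb)) == compose(hb, fb)       # bar(h f-bar) = h-bar f-bar
    r4 = compose(restriction(g), f) == compose(f, restriction(compose(g, f)))
    return r1, r2, r3, r4


# Sample partial maps: f, h are defined on parts of A = {0, 1, 2}; g on part of B.
f = {0: "a", 1: "b"}
h = {1: "c", 2: "c"}
g = {"a": 10}
```

Calling `check_axioms(f, h, g)` returns a tuple of four booleans, one per axiom.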
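Definition 19 (Local Partial Order) For f, g : A → B in a restriction category, define $f \le g \Leftrightarrow f = g\,\overline{f}$.

For example, in Par, we have: f ≤ g precisely when Graph(f) ⊆ Graph(g), i.e., when g extends f. Many notions for plain categories can be modified to make sense in the partial world. For example:

Definition 20 (Cartesian Structure) A restriction terminal object is an object 1 together with, for each object A, a unique total map $!_A : A \to 1$ with $!_B f \le\ !_A$ for all f : A → B. A restriction product of A, B is an object A × B with total projections $\pi_A$, $\pi_B$ such that for f : C → A, g : C → B there is a unique $\langle f, g\rangle$ with $\pi_A \langle f, g\rangle \le f$, $\pi_B \langle f, g\rangle \le g$ and $\overline{\langle f, g\rangle} = \overline{f}\,\overline{g}$.

$$
\begin{array}{ccc}
A & \xrightarrow{\ !_A\ } & 1 \\
{\scriptstyle f}\downarrow & \le & \nearrow{\scriptstyle !_B} \\
B & &
\end{array}
\qquad\qquad
\begin{array}{ccccc}
 & & C & & \\
 & {\scriptstyle f}\swarrow\ \ge & \downarrow{\scriptstyle \langle f, g\rangle} & \le\ \searrow{\scriptstyle g} & \\
A & \xleftarrow{\ \pi_A\ } & A \times B & \xrightarrow{\ \pi_B\ } & B
\end{array}
$$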
A Cartesian Restriction Category is a restriction category C which has a restriction terminal object and binary restriction products.
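Definitions 19 and 20 can be illustrated in Par with finite partial maps. The sketch below again assumes partial maps are Python dictionaries; `leq`, `graph_included` and `pair` are hypothetical helper names. It compares the order $f = g\,\overline{f}$ with graph inclusion and checks the three conditions on the pairing $\langle f, g\rangle$ for one sample, where the pairing is defined exactly where both components are.

```python
def compose(g, f):
    """g after f, defined where f is and g accepts f's value."""
    return {x: g[f[x]] for x in f if f[x] in g}


def restriction(f):
    """Partial identity on Dom(f)."""
    return {x: x for x in f}


def leq(f, g):
    """The local partial order of Definition 19: f = g f-bar."""
    return f == compose(g, restriction(f))


def graph_included(f, g):
    """Graph(f) is a subset of Graph(g)."""
    return set(f.items()) <= set(g.items())


def pair(f, g):
    """Pairing <f, g> in Par: defined exactly where both f and g are."""
    return {x: (f[x], g[x]) for x in f if x in g}


f = {0: "a", 1: "b"}
g = {1: "B", 2: "C"}
fg = pair(f, g)

# Total projections on the full product of the two sets of values.
pi_a = {(a, b): a for a in f.values() for b in g.values()}
pi_b = {(a, b): b for a in f.values() for b in g.values()}
```

Here `compose(pi_a, fg)` is the map `{1: "b"}`, which lies below `f` in the order but is not equal to it, so the projection triangles commute only up to ≤.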
5.2 Turing Categories

Turing categories, introduced in [21], are restriction categories that essentially encode simultaneously the ideas underlying Kleene’s $S^m_n$ and Enumeration theorems. They are also closely related to cartesian closed categories generated by models of untyped lambda calculus, in that they weaken the cartesian closure, while generalizing to the partial world.

Definition 21 (Turing Category) A Turing category is a cartesian restriction category C with an object A (called a Turing object), and a family of “universal application morphisms” $\{A \times X \xrightarrow{\tau_{X,Y}} Y \mid X, Y \in C\}$ with weak Currying: for every $Z \times X \xrightarrow{f} Y$ there exists a total map $Z \xrightarrow{h} A$ such that f factors through $\tau_{X,Y}$:

$$
\begin{array}{ccc}
A \times X & \xrightarrow{\ \tau_{X,Y}\ } & Y \\
{\scriptstyle h \times 1}\uparrow & \nearrow{\scriptstyle f} & \\
Z \times X & &
\end{array}
$$

Note that this expresses the idea that A acts as a weak exponential $Y^X$, for any pair of objects X, Y. One particular consequence is that every object is a retract of A. In particular, all finite restriction products $A^n$ are retracts of A. An elementary but useful fact is that the class of Turing categories is closed under idempotent splitting: if C is a Turing category, then so is $K_E(C)$ where E is the class of restriction idempotents. A Turing category can equally well be described by “universal self-application” $\tau_{A,A}$, denoted $A \times A \xrightarrow{\bullet} A$.

Theorem 13 (Cockett-Hofstra [21]) A Turing Category is a cartesian restriction category with an object A such that (i) every object is a retract of A and (ii) there is a universal self-application map $A \times A \xrightarrow{\bullet} A$.

Here are some of the motivating examples of Turing categories:
Examples of Turing Categories
1. Let $\{\varphi_m\}_{m \in N}$ be a standard enumeration of unary partial recursive functions (see [27]). Kleene’s First Model Comp(N) is the category whose objects are powers $N^k$ and whose maps $N^k \to N^m$ are m-tuples of partial computable functions of k variables. N is a Turing object, there are retractions $N^k \triangleleft N$, and $m \bullet n := \varphi_m(n)$ gives a universal application, by Kleene’s theorems. The restriction idempotents in this case are precisely the r.e. sets. Hence the restriction idempotent splitting of this category has the r.e. sets as objects, and partial computable functions as maps. This example can be generalized to give categories $\mathrm{Comp}(N^A)$, where A is an oracle.
2. Consider a C-monoid, or more generally a reflexive object U in a ccc, where $1 \triangleleft U$, $U \times U \triangleleft U$, $U^U \triangleleft U$. If $(m, r) : U^U \triangleleft U$ is a retraction pair, then $\bullet_U := U \times U \xrightarrow{r \times \mathrm{id}} U^U \times U \xrightarrow{\mathrm{ev}} U$ determines a total Turing structure with Turing object U.
3. Term models of Partial Combinatory Logic (PCL) yield Turing categories. PCL is a partial algebraic theory with constant symbols s, k and one binary application symbol • (we write xy instead of x • y, and associate to the left). Terms are formed in the usual way, together with a clause for forming restricted (partial) terms:

$$
\text{Terms} \qquad t, t' ::= \mathrm{VAR} \mid s \mid k \mid t t' \mid t|_{t'}
$$

where $t|_{t'}$ is to be interpreted as “t restricted to dom(t′)”. (The categorical interpretation of such a restricted term is $[\![t]\!] \circ \overline{[\![t']\!]}$.) The following equations are imposed: $kxy = x|_y$, $sxyz = xz(yz)$, and $sxy\downarrow$. See [23] for details. The case of the closed term model is particularly significant because it corresponds to the initial Turing category. Note that a total point t : 1 → A of the Turing object corresponds to a provably total closed term t of PCL. The global sections functor is therefore not faithful, since there exist many closed terms that are not provably total, for example $k|_P$ where P is the paradox combinator.
From the axioms of a Turing category, one may derive some basic results from computability theory such as the recursion theorems. The restriction idempotents (partial identities serving as the domains of maps) in a Turing category play the role of recursively enumerable sets; pullback of restriction idempotents then corresponds to m-reducibility. Note that the standard model Comp(N) also has ranges, in the sense that every morphism not only has a domain but also a range; such categories are studied in detail in [19, 20]; see also [119].
Since the axioms of a Turing category do not preclude total models, one cannot expect results such as the undecidability of the halting problem or Rice’s theorem to follow in general. A detailed discussion of the development of basic computability theory in the setting of Turing categories and how this depends on additional structure can be found in [18].
5.3 Computable maps and PCAs

Turing categories are closely related to a class of structures called partial combinatory algebras (PCAs), as suggested by Example 3. Let C be a cartesian restriction category. An applicative structure in C is a pair A = (A, •), where $A \times A \xrightarrow{\bullet} A$ is a morphism called application. There are no requirements on • (such as associativity). Define $\bullet^n : A \times A^n \to A$ inductively, so $\bullet^{n+1} := A \times A \times A^n \xrightarrow{\bullet \times \mathrm{id}} A \times A^n \xrightarrow{\bullet^n} A$.

Definition 22 (Computable maps) A map $A^n \xrightarrow{f} A$ is A-computable when it is “named” by a total point of A, i.e. there is a total point p : 1 → A such that (identifying $A^n$ with $1 \times A^n$):

$$
\begin{array}{ccc}
A \times A^n & \xrightarrow{\ \bullet^n\ } & A \\
{\scriptstyle p \times \mathrm{id}_{A^n}}\uparrow & \nearrow{\scriptstyle f} & \\
A^n & &
\end{array}
$$

(Intuitively, f(x) = p • x.) Moreover, we require that f is total on its first n−1 arguments. More generally, we say a map $f : A^n \to A^m$ is A-computable if all its components are.

Since there are no axioms on an applicative object, the collection of A-computable maps cannot be expected to have any good closure properties. In particular, it cannot be expected to form a subcategory of C. When it does, the object A is called combinatory complete. This characterization is the categorical formulation of combinatory completeness (see also [87]). Classically, an applicative structure is called combinatory complete when every “polynomial” built from variables, elements of A and application, is represented by an element of A, see [7, 118]. When t is a polynomial and x is a variable, we write $\lambda^* x.t$ for the element representing t. That is: $(\lambda^* x.t)a = t[a/x]$ for all a ∈ A. Equivalently, an applicative structure is a PCA exactly when it is a model of the theory PCL (see Example 3 above).

Definition 23 A combinatory complete applicative structure A is called a partial combinatory algebra (PCA). For A a PCA, denote by Comp(A) the restriction category whose objects are the finite powers of A and whose morphisms are the A-computable maps.
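The classical formulation of combinatory completeness can be illustrated by bracket abstraction over the combinators s and k. The following is a minimal sketch in a total term model, assuming terms are encoded as nested pairs with the reserved atoms `"S"` and `"K"`; the encoding and the names `lam`, `step`, `normalize` and `subst` are assumptions made for illustration, and the partiality of a genuine PCA is ignored. The abstraction uses k only on atoms, the variant usually chosen so that $\lambda^* x.t$ is always defined.

```python
def lam(x, t):
    """Bracket abstraction: a term lam(x, t) with (lam(x, t)) a  ->>  t[a/x]."""
    if t == x:
        return (("S", "K"), "K")                      # identity combinator s k k
    if isinstance(t, tuple):
        return (("S", lam(x, t[0])), lam(x, t[1]))    # distribute over application
    return ("K", t)                                   # any other atom: constant map


def step(t):
    """One leftmost-outermost reduction step, or None if t is in normal form."""
    if not isinstance(t, tuple):
        return None
    head, arg = t
    if isinstance(head, tuple):
        if head[0] == "K":                            # K a b -> a
            return head[1]
        if isinstance(head[0], tuple) and head[0][0] == "S":
            a, b = head[0][1], head[1]                # S a b c -> a c (b c)
            return ((a, arg), (b, arg))
    left = step(head)
    if left is not None:
        return (left, arg)
    right = step(arg)
    return None if right is None else (head, right)


def normalize(t, limit=10_000):
    """Reduce t to normal form, giving up after `limit` steps."""
    for _ in range(limit):
        nxt = step(t)
        if nxt is None:
            return t
        t = nxt
    raise RuntimeError("no normal form found within the step limit")


def subst(t, x, a):
    """Substitute a for the variable x in t."""
    if t == x:
        return a
    if isinstance(t, tuple):
        return (subst(t[0], x, a), subst(t[1], x, a))
    return t
```

For the polynomial `t = (("f", "x"), "x")`, that is f x x with an inert constant f, normalizing `(lam("x", t), "a")` yields the same term as `subst(t, "x", "a")`.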
At first sight, it may not be evident that combinatory completeness has many interesting consequences. It ensures, however, that PCAs are powerful enough to represent various useful programming constructs.

Lemma 2 In any PCA, we can define the following:
(i) Booleans, pairs, numerals (using e.g. $\overline{n} = \lambda^* a \lambda^* f. f^n a$). What is more, any partial computable function f : N → N can then be represented in A in the sense that there is an element $a_f \in A$ for which f(n) = m implies $a_f\,\overline{n} = \overline{f(n)}$.
(ii) Fixed points: there is an element y ∈ A for which ya = a(ya) for all a ∈ A.
(iii) Recursors: there is an element r ∈ A for which $r a b\,\overline{0} = a$ and $r a b\,\overline{n+1} = b(r a b\,\overline{n})\,\overline{n}$ for all a, b ∈ A and n ∈ N.

Standard examples of PCAs include Kleene’s first model (natural numbers with partial recursive application, typically denoted $K_1$), term models of PCL, and models of untyped lambda calculus. The following example is of importance in higher-order computability, and will be used in the next section. We presuppose a surjective coding of finite sequences $\langle - \rangle : N^* \to N$. For g : N → N, let $\hat{g}(k) = \langle g(0), \dots, g(k-1) \rangle$. Finally, let ∗ denote concatenation of sequences; for a sequence L and n ∈ N we write n ∗ L instead of $\langle n \rangle * L$.

Kleene’s second model. Consider $f, g \in N^N$. Define a (possibly partial) function $f \mid g : N \to N$ by

$$
(f \mid g)(n) = \begin{cases} f(n * \hat{g}(k)) - 1 & \text{where } k = \mu r.\, f(n * \hat{g}(r)) > 0 \\ \text{undefined} & \text{if no such } k \text{ exists.} \end{cases}
$$

Then define a partial application $\bullet : N^N \times N^N \rightharpoonup N^N$ by

$$
f \bullet g = \begin{cases} f \mid g & \text{if } f \mid g \text{ is total} \\ \text{undefined} & \text{otherwise.} \end{cases}
$$

This model is typically denoted $K_2$, and captures a notion of “computable operations acting on continuous data”. If we restrict $N^N$ to the set of total computable functions, we get a sub-PCA $K_2^{\mathrm{eff}}$ of “computable operations acting on computable data”. For details see [86].

What is the correct notion of morphism of PCAs? Regarding A, B as computational devices, a morphism ϕ : A → B should at least express that A can be interpreted, or simulated, within B.
The following definition is due to Longley [83]. We state it in set-theoretic terms here, but it can easily be rendered diagrammatically in a cartesian restriction category:
Definition 24 (Simulation) A simulation from A to B is a function ϕ : A → B for which there exists b ∈ B such that

$$
x \bullet y{\downarrow} \;\Longrightarrow\; b \bullet \varphi(x) \bullet \varphi(y) = \varphi(x \bullet y).
$$
Simulations compose, and in fact form a 2-category. We point out that in [83] a relational version of this definition is given; however, as demonstrated in [50], it is possible to view relational simulations as Kleisli morphisms over a base category of functional simulations. Numerals as Simulation Every PCA admits a choice of numerals; such a choice amounts to a simulation K1 → A. All non-constant simulations K1 → A are in fact isomorphic to each other.
Returning to the connections between PCAs and Turing categories, we note that by construction Comp(A) is a cartesian restriction category. The following shows that PCAs are a fundamental notion for generating Turing categories: every PCA gives rise to a Turing category, and every Turing category is generated by the PCA structure on the Turing object.

Theorem 14 (Cockett-Hofstra [21]) (i) If A = (A, •) is a PCA, then Comp(A) is a Turing category, with Turing object A. (ii) If (C, A) is a Turing category with Turing object A, then (A, •) is a PCA and $C \cong K_E(\mathrm{Comp}(A))$, for some class of idempotents E.

Thus “Categories of the form Comp(A) serve as a minimal environment (for) PCA’s and ... computable maps ...; other Turing categories are supposed to be viewed as (non-essential) inflations of such minimal categories” ([21]).

Earlier we contrasted the approach of identifying representable numerical functions in free categories with NNO with the synthetic approach of Turing categories. However, there is a slightly different perspective on Turing categories, that perhaps brings the two approaches closer together. Instead of considering Turing categories in isolation, i.e., synthetically, one can consider Turing categories structured over a base category. For example, the Turing category Comp(N) can be considered as a non-full subcategory of Par. This point of view is particularly relevant when one wishes to consider non-computable functions or study, e.g., non-r.e. degrees. More generally, we think of a Turing category C with a cartesian restriction functor F : C → B into a base category B as specifying an object FA of B together with a notion of computation on FA. The object FA is necessarily a PCA, but C is not always Comp(FA); the reason is that FA may have total elements t : 1 → FA that are not in the image of F. Hence Comp(FA) may contain
morphisms that are not represented in C. This forces the consideration of relative PCAs, and the full characterization of Turing categories over a fixed base in terms of such relative PCAs can be found in [22].14 Note that there is an analogy between the two perspectives on Turing categories and those on toposes: one may consider toposes relative to a fixed base topos S (as is common in the study of Grothendieck toposes, where S plays the role of the universe of sets), or one may study elementary toposes such as the free topos without regarding them as being constructed over a base.
6 Realizability

We now briefly turn our attention to a strand of research that also heavily involves the study of categorical structures associated to models of computation, but that is different from the earlier themes in that it primarily considers such structures as models of various logical systems.
6.1 Kleene Realizability

Realizability, originally devised by Kleene in the seminal paper [60]15, is to be thought of as a semantics for constructive mathematical systems16. In Kleene’s original work, the system at hand was Heyting Arithmetic (HA), and the interpretation was defined in terms of partial computable functions. The central notion is written n r A, where n ∈ N and A is a formula in the language of arithmetic, and is pronounced “n realizes A”, or “n is a realizer for A”. The intuition is that n codes information about why A is true. The definition is by induction on the structure of A (and uses an enumeration $\varphi_0, \varphi_1, \dots$ of unary partial computable functions):

Definition 25 (Kleene Realizability) Define n r A (for sentences A) by

$$
\begin{array}{lll}
n \mathrel{r} t = s & \text{iff} & n = 0 \text{ and } t = s \text{ is true} \\
n \mathrel{r} A \wedge B & \text{iff} & n = \langle a, b \rangle \text{ where } a \mathrel{r} A \text{ and } b \mathrel{r} B \\
n \mathrel{r} A \vee B & \text{iff} & n = \langle a, b \rangle \text{ where either } a = 0 \text{ and } b \mathrel{r} A \text{ or } a = 1 \text{ and } b \mathrel{r} B \\
n \mathrel{r} A \to B & \text{iff} & \text{for all } m \in N, \text{ if } m \mathrel{r} A \text{ then } \varphi_n(m){\downarrow} \text{ and } \varphi_n(m) \mathrel{r} B \\
n \mathrel{r} \exists x.A & \text{iff} & n = \langle a, b \rangle \text{ where } b \mathrel{r} A[a/x] \\
n \mathrel{r} \forall x.A & \text{iff} & \text{for all } m \in N,\ \varphi_n(m){\downarrow} \text{ and } \varphi_n(m) \mathrel{r} A[m/x]
\end{array}
$$

14 This characterization involves a notion of simulation between Turing categories (over a fixed base), generalizing the foundational work by Longley [83] on simulations between PCAs (called applicative morphisms in loc. cit.).
15 We omit a discussion of the history of the subject, of which some of the main threads are detailed in [117].
16 Recent work by Krivine and others has shown that it is also possible to define realizability interpretations of classical systems.
The Soundness theorem now states: HA ⊢ A ⟹ ∃n ∈ N. n r A. The converse, however, is false: there are realizable statements that are underivable. Most notably, Extended Church’s Thesis ($\mathrm{ECT}_0$) is the scheme:

∀x(A(x) → ∃y.B(x, y)) → ∃e∀x(A(x) → B(x, e • x))

Here, A is assumed to be an almost negative formula, and e • x denotes the application of the e-th computable function to x, suitably represented in HA. One can show that all instances of $\mathrm{ECT}_0$ are realizable but not provable in HA. Moreover, $\mathrm{ECT}_0$ axiomatizes Kleene realizability, in the sense that the realizable statements of HA are precisely those that are derivable in HA + $\mathrm{ECT}_0$.

Over the years, many variations on Kleene’s original definition have been studied, with the purpose of establishing, among other things, consistency results and proof-theoretic properties of various formal systems. For example, q-realizability incorporates derivability into the definition of realizability, and can be used to establish the existence and disjunction properties of HA.
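As a small worked instance of Definition 25, consider the sentence A ∧ B → A. The sketch below assumes that the pairing ⟨−, −⟩ used in the definition is computable with a computable first projection, and that e is an index (a name introduced here for illustration) of that projection, so that $\varphi_e(\langle a, b \rangle) = a$. Unwinding the clauses for ∧ and → then shows that e realizes the sentence, whatever A and B are.

```latex
\begin{align*}
m \mathrel{r} A \wedge B
  &\iff m = \langle a, b \rangle \text{ with } a \mathrel{r} A \text{ and } b \mathrel{r} B
  && \text{(clause for } \wedge\text{)} \\
\varphi_e(m) = \varphi_e(\langle a, b \rangle) &= a,
  \text{ so } \varphi_e(m){\downarrow} \text{ and } \varphi_e(m) \mathrel{r} A
  && \text{(choice of } e\text{)} \\
\text{hence}\quad e &\mathrel{r} (A \wedge B \to A)
  && \text{(clause for } \to\text{)}
\end{align*}
```

The realizer depends only on the shape of the formula, not on A or B, which reflects the intuition that a realizer codes the computational reason why the statement holds.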
6.2 Realizability Toposes

How does realizability manifest itself categorically? Historically, the topos-theoretic treatment of Boolean-valued and Heyting-valued models ([34, 46]) inspired the idea of considering sets of realizers as truth values. This idea led Hyland to the discovery of Eff, the Effective Topos [52], an elementary (non-Grothendieck) topos with the property that the first-order arithmetical statements about the NNO are precisely the Kleene-realizable statements. Thus, among other things, the internal language of Eff is a natural extension of Kleene realizability to higher-order logic. Various notions from computability theory find a natural home in Eff. For example, the Turing degrees manifest themselves in the form of subtoposes of Eff:

Theorem 15 ([52, 104]) The lattice of Turing degrees faithfully embeds into the lattice of subtoposes of Eff.

(Here, the notion of subtopos is taken in the geometric sense: it is a full subcategory closed under finite limits, whose inclusion has a finite-limit preserving left adjoint.) Not every subtopos arises from a Turing degree however; see [82] for more information.
There are several ways to present the Effective Topos and its variants. Perhaps the simplest is via exact completions (see [13, 16], as well as [94]). A category C is called exact if it has finite limits, pullback-stable quotients of equivalence relations, and if every coequalizer is the coequalizer of its kernel pair. Every topos is exact. Now to each category with finite limits C one may associate an exact category $C_{ex/lex}$ by freely adding quotients of equivalence relations, and the Effective Topos is of this form. The finite limit category in question is called Pasm, the category of partitioned assemblies.

Definition 26 (Partitioned Assemblies) The category Pasm has objects pairs (X, α) with X a set and α : X → N a function; a morphism (X, α) → (Y, β) is a function f : X → Y which is tracked, in the sense that there exists a code e ∈ N such that ∀x ∈ X. e • α(x)↓ ∧ e • α(x) = β(f(x)).

Theorem 16 (Carboni et al. [13, 14]) The Effective Topos is the exact completion of the category of partitioned assemblies: $\mathrm{Eff} \simeq \mathrm{Pasm}_{ex/lex}$.

The above construction of Eff can be refined by considering an intermediate category:

Definition 27 (Assemblies) The category Asm has objects pairs (X, α) with X a set and $\alpha : X \to P_+ N$ a function (where $P_+$ denotes the non-empty powerset); a morphism (X, α) → (Y, β) is a function f : X → Y which is tracked, in the sense that there exists a code e ∈ N such that ∀x ∈ X ∀a ∈ α(x). e • a↓ ∧ e • a ∈ β(f(x)).

The category Asm is regular: it has finite limits and admits stable quotients of equivalence relations. Any finite limit category C admits a free regular completion $C_{reg}$, and any regular category D admits a free exact completion $D_{ex/reg}$. With this notation, we now have the following relations between Pasm, Asm, and Eff:

Theorem 17 (Carboni et al. [13, 14]) There are equivalences $\mathrm{Asm} \simeq \mathrm{Pasm}_{reg}$ and $\mathrm{Eff} \simeq \mathrm{Asm}_{ex/reg}$.

The category of assemblies happens to be much more than regular: it is a quasitopos and has a NNO, given by (N, {−}). As such, a lot of the computability-theoretic features of Eff already manifest themselves in this subcategory. For example, in Asm we may consider higher-type computability over N.

An alternative construction of Eff, more logical in nature, makes use of the concept of a tripos (see [53]; tripos is an acronym for “topos-representing indexed preordered set”). One considers the Set-indexed preorder Set(−, PN); for a set X, we preorder Set(X, PN) by:

$$
\alpha \le_X \beta \iff \exists e \in N\ \forall x \in X\ \forall a \in \alpha(x).\ e \bullet a{\downarrow} \wedge e \bullet a \in \beta(x).
$$
There is now a general construction turning a tripos into a topos, and Eff arises as the result of applying this construction to Set(−, PN). This construction highlights the original idea of regarding sets of realizers as truth values, in analogy with H-valued sets for H a complete Heyting algebra.
6.3 PCAs and Toposes

The construction of the Effective Topos generalizes in various ways. We focus on the following fact17: for each PCA A = (A, •), there is an associated realizability topos RT(A). In fact, we may associate to A a category of partitioned assemblies Pasm(A) (where the objects are sets X equipped with a function α : X → A), and let $\mathrm{RT}(A) = \mathrm{Pasm}(A)_{ex/lex}$. Alternatively we build the tripos Set(−, PA). The functoriality of A ↦ RT(A), including the correct notion of “Morita equivalence” for PCAs, was worked out in [83]; the complete characterization of (geometric) morphisms between toposes of the form RT(A) in terms of morphisms of (ordered) PCAs appears in [50].

An important construction, both for the analysis of realizability toposes and for applications of realizability, is that of the category of PERs over a PCA. A PER (partial equivalence relation) on a set A is simply a symmetric and transitive relation; equivalently, it is an equivalence relation on a subset of A (then called the domain of the PER). When R is a PER on A, we write A/R = {[a] | (a, a) ∈ R} for the set of equivalence classes. In case of a PCA, this leads to the following:

Definition 28 (Category of PERs) Let A = (A, •) be a PCA. The category PER(A) has as objects PERs (A, R) on A. A morphism (A, R) → (A, S) is a function f : A/R → A/S that is tracked in the sense that there exists a ∈ A such that ∀x ∈ A. (x, x) ∈ R → a • x↓ ∧ f[x] = [a • x].

The category PER(A) can be seen as a full subcategory of Asm(A) on those objects (X, α) for which α(x) ∩ α(y) ≠ ∅ implies x = y. It is (locally) cartesian closed, and has a NNO. We will return to this structure in the section on higher type computability below.

Since PCAs give rise both to Turing categories and to realizability toposes, it is natural to wonder how the latter two are related. We mention here one result that builds on earlier insights into how realizability toposes can be regarded as colimit completions [108, 109].
In [22] a universal property of partitioned assemblies is exhibited: it is the free fibred preorder on a functor, in a suitable restriction-category theoretic sense. In case of a PCA A with associated Turing category Comp(A), applying this construction gives a fibration, and taking total maps recovers Pasm(A). Moreover, this construction has the property that it turns simulations between Turing categories into actual functors on the level of partitioned assemblies.

17 It was already known well before the discovery of the effective topos that combinatory algebras carried sufficient structure to define notions of realizability, see e.g. [33].

To conclude our discussion of realizability toposes we mention the abstract characterization of toposes of the form RT(A) due to Frey [35]. In order to state this result, we need to define a few concepts. First, suppose that Γ ⊣ ∇ is a pair of adjoint functors with Γ ∘ ∇ ≅ 1. Then a map f : A → B is called closed (w.r.t. this adjunction) if the square

$$
\begin{array}{ccc}
A & \xrightarrow{\ f\ } & B \\
\downarrow & & \downarrow \\
\nabla\Gamma A & \xrightarrow{\ \nabla\Gamma f\ } & \nabla\Gamma B
\end{array}
$$

in which the vertical maps are the unit morphisms is a pullback18. Moreover, an object A is called separated when the unit A → ∇ΓA is monic19. Finally, an object is called discrete when it is orthogonal to all closed regular epimorphisms.

Theorem 18 (Frey [35]) A locally small category E is equivalent to RT(A) for a PCA A if and only if the following conditions hold:
• E is exact and locally cartesian closed;
• E has enough projectives and the full subcategory Proj(E) on the projective objects is closed under finite limits;
• The global sections functor Γ : E → Set has a right adjoint ∇ which factors through Proj(E);
• There exists a separated, projective object D such that for any projective object P there exists a closed map P → D.

This theorem should be regarded as the analogue of the well-known Giraud theorem characterizing Grothendieck toposes among exact categories in terms of their relation to Set. Note that the first conditions express that E is of the form $C_{ex/lex}$, and that the last two conditions therefore characterize categories of the form Pasm(A).

We end this section by a brief mention of another approach to partial recursive functions and PER, introduced by Lambek [75] and studied further in [78]. In this view, one considers the category of relations generated by the monoid of primitive recursive functions (qua relations). Taking this viewpoint, a partial recursive function is simply a single-valued recursively enumerable (r.e.) relation, and the category PER is a kind of Karoubi envelope construction: the category whose objects are arbitrary pers on N and whose maps are r.e. functional relations between them. The full subcategory of PER given by r.e. pers and r.e. functional relations is particularly interesting in this regard, since it turns out to be exact. In [78], it is considered as a candidate for a kind of exact completion of the monoid of primitive recursive functions, although the precise nature of this completion is yet to be determined.

18 The terminology closed derives from the fact that for realizability toposes RT(A), closed subobjects for the double negation topology are characterized by this condition.
19 This terminology also derives from the fact that in RT(A) this characterizes the separated objects for the double negation topology.
7 Other Directions

This final section briefly introduces some facets of computation that have a somewhat different character than the work discussed so far. First, we discuss traced monoidal categories and PCAs arising in “Geometry of Interaction” situations. Next, we turn to computability at higher type, giving a very brief introduction to some of the concepts and ideas in that area. Finally, we mention some of the categorical approaches to complexity theory.
7.1 Traced Categories

In an influential paper, Joyal, Street, and Verity [57] introduced the notion of an abstract trace in monoidal categories. Such traces arise in a wide range of areas, including knot theory, fixed point theory and theoretical computer science. We will be especially concerned with applications arising in the algebra of feedback in networks and the associated fixed point theories. Traced monoidal categories also play a prominent role in the categorical analysis of Girard’s Geometry of Interaction (GoI) Program in Linear Logic, in which one analyzes the dynamics and flow of information in cut-elimination in networks of proofs [1, 42].

For simplicity, we consider the case of symmetric monoidal categories. A parametrized trace on a symmetric monoidal category C is an operation $\mathrm{Tr}^U_{X,Y} : C(X \otimes U, Y \otimes U) \to C(X, Y)$, satisfying a number of axioms discussed in detail in [57, 1].

Fig. 6 The trace $\mathrm{Tr}^U_{X,Y}(f)$
The theory has a particularly geometric flavour, and the papers, loc. cit., use a string calculus both for describing the axioms and for diagrammatic reasoning. A particularly evocative picture is to think of the trace as a form of “feedback”, as in Figure 6 and the discussion below. Examples relevant to this paper include Rel and Par, with ⊗ = ⊎, the disjoint union of sets. In the case of Par, the trace of a map $f : X \uplus U \rightharpoonup Y \uplus U$ is given by the following summation formula:

$$
\mathrm{Tr}^U_{X,Y}(f) = f_{XY} + \sum_{n \in N} f_{UY}\, f_{UU}^{\,n}\, f_{XU}
$$

Here $f_{XY}$ denotes the partial map $X \rightharpoonup Y$ obtained from f by naturally restricting the domain and codomain (using injections and partial projections), and similarly for the other components. The sum $\sum_{n \in N} h_n$ of a family of partial maps $h_n : X \rightharpoonup Y$ is defined iff the domains of the $h_n$ are disjoint, in which case $(\sum_{n \in N} h_n)(x) = h_k(x)$ if $h_k(x){\downarrow}$ for some k, and is undefined otherwise. Such traces given by the above formula are called “particle-style” ([1]) based on the following intuition: in Figure 6 imagine particles entering the box at X. Either they exit immediately at Y via $f_{XY}$ or they exit through U and continue to cycle on U some finite number n times via $f_{UU}$ and then eventually exit at Y.

In [1], it is shown how a so-called GoI situation gives rise to a linear combinatory algebra. A GoI situation is a traced symmetric monoidal category equipped with a traced symmetric monoidal endofunctor, and an object U satisfying various domain equations. By applying the GoI construction, one obtains a compact closed category containing an object whose points form a linear combinatory algebra. By the latter, one means an applicative structure (A, •) equipped with an endomap ! : A → A and several combinators, allowing for the application (a, b) ↦ a • !b to form a total combinatory algebra.

Lambek’s register machines were described by a language of flowcharts and feedback. They can be naturally represented in a symmetric traced category with ⊗ = coproduct [58]. The original categorical studies of iterative notions of flowchart computation in a programming language setting were by C. Elgot. In this case iteration is given by a kind of feedback loop in a category whose hom-sets have infinite sums (Elgot’s ideas are detailed in [93], and pursued more abstractly in traced Σ-monoid enriched tensor categories by Haghverdi [41]).
Finally, traced monoidal categories in which the monoidal tensor ⊗ is obtained from a cartesian or genuine tensor product are discussed in [1], as well as a more general notion of partially traced categories, in [43].
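The particle-style intuition for the trace on Par can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: elements of a disjoint union are modelled as tagged pairs, an undefined value is modelled by `None`, and a step budget `max_steps` (not part of the mathematical definition) stands in for possible non-termination. The names `particle_trace` and `summing_box` are hypothetical.

```python
from typing import Callable, Hashable, Optional, Tuple

Tagged = Tuple[str, Hashable]                      # ("X", x), ("Y", y) or ("U", u)
PartialMap = Callable[[Tagged], Optional[Tagged]]  # None models "undefined"


def particle_trace(f: PartialMap, max_steps: int = 10_000):
    """Trace out U from f : X + U -> Y + U, giving a partial map X -> Y."""
    def traced(x):
        token = f(("X", x))              # first step: f_XY or f_XU
        for _ in range(max_steps):
            if token is None:            # f is undefined here, so is the trace
                return None
            tag, value = token
            if tag == "Y":               # the particle has left through Y
                return value
            token = f(("U", value))      # one more pass: f_UU or f_UY
        return None                      # budget exhausted: treated as undefined
    return traced


def summing_box(token: Tagged) -> Optional[Tagged]:
    """Hypothetical f with X = Y = integers and U = pairs (remaining, total)."""
    tag, value = token
    if tag == "X":
        return ("U", (value, 0)) if value >= 0 else None
    remaining, total = value
    if remaining == 0:
        return ("Y", total)
    return ("U", (remaining - 1, total + remaining))


sum_to = particle_trace(summing_box)
print(sum_to(4))   # 4 + 3 + 2 + 1 = 10
```

Because each particle follows exactly one path, at most one summand of the formula is defined at any given input, which is why the sum of partial maps makes sense here.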
7.2 Typed PCAs

The notion of computation considered so far has been untyped, in the sense that it is based on a single base type containing both the input/output values of computable maps and the (codes for) computable maps. In the notion of PCA, this is reflected by the fact that the partial application $a \bullet b$ regards $a$ as a code for a partial map and $b$ as an input. In various situations, however, we do wish to consider computation over different types, for example because we wish to distinguish between the type of computable operations and the type of their inputs and outputs. One of the key concepts in the study of such situations is that of a typed PCA.

Definition 29 (Typed PCA) Let $T$ be the collection of simple types generated by a single base type $N$. A Typed Partial Combinatory Algebra (TPCA) over $T$ is a set-valued assignment $\tau \mapsto A(\tau)$ for $\tau \in T$, together with, for all $\sigma, \tau \in T$, a partial application function

$$\bullet_{\sigma,\tau} : A(\sigma \to \tau) \times A(\sigma) \to A(\tau).$$

As for PCAs, we write application using infix notation, associating to the left; we also suppress the typing information. One requires the existence of combinators

$$k_{\sigma,\tau} \in A(\sigma \to (\tau \to \sigma)); \qquad s_{\sigma,\tau,\rho} \in A((\sigma \to \tau \to \rho) \to ((\sigma \to \tau) \to (\sigma \to \rho)))$$

(for all types $\sigma, \tau, \rho$) satisfying
• $k \bullet x \bullet y = x$
• $s \bullet x \bullet y \downarrow$
• $s \bullet x \bullet y \bullet z = (x \bullet z) \bullet (y \bullet z)$

We remark that some authors also require the existence of fixed point combinators, numerals and recursors (see Lemma 2 for what these are in the untyped setting). This essentially guarantees that a TPCA is a model of Plotkin’s simply typed programming language PCF (see [105]).

Examples of TPCAs
1. Let $A(N) = \mathbb{N}$, and $A(\sigma \to \tau) = A(\tau)^{A(\sigma)}$. Then we can let application be evaluation $\bullet : A(\tau)^{A(\sigma)} \times A(\sigma) \to A(\tau)$. This is called the full (total) TPCA over $\mathbb{N}$.
2. In the previous example we may instead let $A(\sigma \to \tau) = \mathrm{Par}(A(\sigma), A(\tau))$, the set of all partial functions. Then we get a TPCA where application is partial.
3. If C is a CCC with NNO, we consider the subcategory on the simple types over the NNO. Taking global sections gives a TPCA.
4. Any PCA $A = (A, \bullet)$ is a typed PCA where $A(\sigma) = A$, and $\bullet_{\sigma,\tau} = \bullet$ for all types $\sigma, \tau$.
5. Term models of typed lambda calculus form TPCAs in the expected way, as do term models of programming languages based on typed lambda calculus, such as PCF.
Just as for PCAs, there is a notion of simulation between TPCAs. For example, to say that A has numerals (and that computable functions are representable) is to say that there is a simulation of Kleene’s first model into A. See [86] for details.
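As a small illustration of the combinator equations in the full (total) TPCA, where application is ordinary function evaluation, one can write k and s as curried Python functions. This is a sketch only: Python is untyped, so the type indices are left implicit, and the helper elements `add` and `succ` are hypothetical choices made for the check.

```python
def k(x):
    """k x y = x"""
    return lambda y: x


def s(x):
    """s x y z = (x z)(y z)"""
    return lambda y: lambda z: x(z)(y(z))


add = lambda a: lambda b: a + b     # hypothetical element of A(N -> (N -> N))
succ = lambda n: n + 1              # hypothetical element of A(N -> N)

identity = s(k)(k)                  # s k k x = (k x)(k x) = x

print(k(3)(4))                      # k x y = x
print(s(add)(succ)(5))              # (add 5)(succ 5)
print(identity(7))
```

In this total setting the condition that s • x • y be defined holds automatically; it only carries content when application is partial, as in the second example of the list.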
7.3 Computation at higher types

Most of the developments described above concern first-order computability (possibly taking place in a higher-order setting). We now briefly discuss computability at higher types. The relation between higher-order computability and first-order computability is analogous to that between functional analysis and analysis. Thus in higher-order computability one studies functionals $\mathbb{N}^{\mathbb{N}} \to \mathbb{N}$, and so on. Immediately, one recognizes the many possibilities: one could consider functionals acting on all total functions, or on all partial functions, or on all total computable functions, or on all partial computable functions, et cetera. We refer to the detailed survey paper [84] for a comprehensive historical overview.

Kleene’s S1-S9 One of the most fundamental notions of higher type computability was introduced in the landmark paper [62]. The collection of pure types over $\mathbb{N}$ is defined by:

$$\mathbb{N}^{(0)} = 1, \qquad \mathbb{N}^{(k+1)} = \mathbb{N}^{\mathbb{N}^{(k)}}.$$

Kleene’s conditions S1-S9 define a class of partial maps of type $\Phi : \mathbb{N}^{(k_1)} \times \cdots \times \mathbb{N}^{(k_r)} \to \mathbb{N}$. More precisely, the definition specifies a relation $\{e\}(v_1, \ldots, v_r) \simeq x$, where $e \in \mathbb{N}$ is an index, the $v_i$ are elements of the pure types $\mathbb{N}^{(k_i)}$, and $x \in \mathbb{N}$. Thus the resulting definition is an example of partial functionals operating on total functions.
Another classic example of a notion of computation at higher type, first introduced in [63], is the following:
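Hereditarily Effective Operations Define simultaneously, for each simple type over the natural numbers, a set of natural numbers and an equivalence relation on the set as follows:
• $\mathrm{HEO}_0 = \mathbb{N}$, and $n \sim_0 m \Leftrightarrow n = m$.
• $\mathrm{HEO}_{\sigma\to\tau} = \{e \in \mathbb{N} \mid \varphi_e \text{ induces a total function } \mathrm{HEO}_\sigma \to \mathrm{HEO}_\tau\}$, and $e \sim_{\sigma\to\tau} e' \Leftrightarrow \forall n \in \mathrm{HEO}_\sigma.\ \varphi_e(n) \sim_\tau \varphi_{e'}(n)$.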
One of the central contributions in [86] is the development of a general framework (called computability model) for studying the wide variety of possible notions of higher type computation. It also supports a general notion of simulation between models, and of equivalence. Typed PCAs form an important class of examples of computability models. We shall now sketch a result by Longley characterizing the so-called extensional collapse of a large family of TPCAs. From now on, we assume our TPCAs come equipped with a choice of numerals $\mathbb{N} \to A(N)$.

Definition 30 (Extensional Collapse of a TPCA) Let $A$ be a TPCA. Define, at each simple type $\sigma$, a PER $\sim_\sigma$ on $A(\sigma)$ as follows:
• $a \sim_N b$ iff $a = b = n$ for some $n \in \mathbb{N}$
• $a \sim_{\sigma\to\tau} b$ iff for all $x, y \in A(\sigma)$ with $x \sim_\sigma y$, $a \bullet x \sim_\tau b \bullet y$.
The sets $A(\sigma)/{\sim_\sigma}$ form a simple type structure over $\mathbb{N}$, denoted $\mathrm{EC}(A)$.

Definition 31 A typed PCA $A$ is
(i) continuous if there is a numeral-respecting simulation $A \to K_2$;
(ii) full continuous if it is continuous and all functions $\mathbb{N} \to \mathbb{N}$ are represented in $A$;
(iii) effective if there is a numeral-respecting simulation $A \to K_1$.

The following general result (referred to as the Ubiquity Theorem) now describes the extensional collapse of these important classes of typed PCAs:

Theorem 19 (Longley [85]) Let $A$ be a typed PCA.
(i) If $A$ is full continuous, then $\mathrm{EC}(A) = \mathsf{C}$, the total continuous functionals (which may be taken to be $\mathsf{C} = \mathrm{EC}(K_2)$).
(ii) If $A$ is effective (and satisfies a few minor technical conditions), then $\mathrm{EC}(A) = \mathrm{HEO}$, the hereditarily effective operations.

There is a third part to the theorem, which characterizes the collapse of a class of relative TPCAs. By the latter, we mean a TPCA $A$ together with a sub-TPCA $A^{\#}$, that is, a collection of subsets $A^{\#}(\sigma) \subseteq A(\sigma)$ closed under application and containing the combinators $k, s$. There is a corresponding relative version of the extensional collapse. The third part of Longley’s theorem then states that when $(A, A^{\#})$ is a relative TPCA with $A$ full continuous and $A^{\#}$ effective, $\mathrm{EC}(A, A^{\#}) = \mathsf{RC}$, the total recursive continuous functionals. The latter may be taken to be $\mathrm{EC}(K_2, K_2^{\mathrm{eff}})$.
7.4 Higher-order computation in toposes

Since toposes are cartesian closed we can also consider higher type computability in toposes. Let us consider this first in the case of the effective topos. The following result already appears in [52]:

Theorem 20 (Hyland [52]) The total functionals of higher type over the NNO in Eff are precisely the hereditarily effective operations.

Next, consider the Mulry topos; this is the topos of sheaves on the monoid of total computable functions, with the canonical topology. (The latter amounts to taking as basic coverings sets $\{f_1, \ldots, f_k\}$ for which $\bigcup_{i=1}^{k} \mathrm{Im}(f_i) = \mathbb{N}$.) For the following result, a functional $G : \mathbb{N}^{\mathbb{N}} \to \mathbb{N}$ is called Banach-Mazur when for each computable $h : \mathbb{N}^2 \to \mathbb{N}$, the composite $G \circ \tilde{h}$ is computable, where $\tilde{h} : \mathbb{N} \to \mathbb{N}^{\mathbb{N}}$ is the transpose of $h$.

Theorem 21 (Mulry [98]) The functionals $\mathbb{N}^{\mathbb{N}} \to \mathbb{N}$ in the Mulry topos are precisely the Banach-Mazur functionals.

Finally, let us consider the free topos. What are the total functionals of pure type in the free topos, i.e. arrows $\mathbb{N}^{(k)} \to \mathbb{N}^{(\ell)}$, $k, \ell \geq 1$? This question is answered in an interesting paper of A. Scedrov [111].

Theorem 22 (Scedrov [111]) Let $F$ be the free topos, let $C$ be the free CCC with NNO and let $F_C$ be the full subcategory of $F$ generated by $C$. The morphisms of $F_C$ are precisely those Kleene computable functionals that are provably total in the internal logic of $F$.

The proof uses a gluing (or Friedman Realizability) argument (cf. [76, 77, 120]) together with induction. The ambient set theory is the free topos $F$ itself, and, for each type level $j \geq 0$ we construct an Effective topos $\mathrm{Eff}^{(j)}$ internally in the free topos, gluing it along a certain left exact functor $\Delta : \mathrm{Eff}^{(j)} \to F$.
7.5 Complexity Theory

While classical computability theory is often concerned with the degree of unsolvability of various problems, the branch most relevant to computer science is that of complexity theory, where one classifies solvable problems according to the time and/or resources their solutions require. In particular, one is interested in complexity classes and the connections between those. For example, the class PTIME consists of problems whose solution (regarded as a function of the input value $n \in \mathbb{N}$) requires at most $p(n)$ steps (of a deterministic Turing machine, say), where $p$ is a polynomial with positive integer coefficients. We refer to [44] for an introduction.
Early work in Implicit Computational Complexity by Martin Hofmann (e.g. [48]) used complexity-bounded combinatory algebras and realizability to study logics of bounded complexity. A BCK algebra is an applicative structure $A$ having the combinators $b, c, k$, where (still associating to the left)

$$bxyz = x(yz); \qquad cxyz = xzy; \qquad kxy = x.$$
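The three defining equations can be checked on ordinary functions with a short Python sketch. This only illustrates the equations in a total applicative structure of curried functions, not Hofmann's polynomial-time structure on the natural numbers; the helpers `succ`, `double` and `minus` are hypothetical test elements.

```python
def b(x):
    return lambda y: lambda z: x(y(z))    # b x y z = x (y z)


def c(x):
    return lambda y: lambda z: x(z)(y)    # c x y z = x z y


def k(x):
    return lambda y: x                    # k x y = x


succ = lambda n: n + 1
double = lambda n: 2 * n
minus = lambda m: lambda n: m - n

print(b(succ)(double)(3))   # succ(double(3))
print(c(minus)(2)(10))      # minus(10)(2)
print(k(1)(2))
```

Note that on the right-hand side of each equation every argument is used at most once. The combinator s, by contrast, uses its third argument twice, and it is this duplication that a BCK algebra does not provide.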
Any total PCA is a BCK algebra, but not vice versa: the diagonal $x \mapsto xx$ is generally not computable in a BCK algebra. One of the results in [48] shows that there is a BCK algebra structure on the natural numbers capturing PTIME computation:

Theorem 23 (Hofmann [48]) There exists a BCK algebra structure on $\mathbb{N}$ such that the computable maps w.r.t. this structure are precisely the polynomial-time computable functions.

Related applications of such bounded combinatory algebras (to reprove the theorem that the representable functions of Bounded Linear Logic are exactly those in PTIME) appear in [49].

Recent work in Turing categories has focussed on the following general question: which complexity classes (e.g. LINEAR, PTIME, LOGSPACE, etc.) can occur as the total maps of a Turing category? Of course, such a Turing category cannot be a subcategory of Par, since it would then necessarily contain all total computable functions. Hence, it has to be a category whose global sections functor is not faithful. The paper [24] explores the area in more detail. Its main theorem characterizes when a cartesian category $C$ with a universal object $U$, a pair of disjoint elements $\{t, f\}$, and suitable abstract coding (retract) structure can arise as the total maps of a Turing category. The construction makes use of the idea that the given retract structure allows one to simulate a simple programming language. Passing to the presheaf topos of $C$ then provides the required structure of a trace (on the coproduct) for implementing this language to obtain a PCA. As a consequence of this characterization, one obtains the following corollary:

Corollary 2 Any countable cartesian category with a universal object $U$ and a pair of disjoint elements arises as the total maps of a Turing category.
In order to apply this result to show that a particular complexity class arises as the total maps of a Turing category, one is thus required to establish that the class in question admits the required closure conditions and pairing operations. For example, the classes of LINEAR and PTIME maps (between binary numbers) can be shown to meet these requirements [24]. However, it is not fully understood for which complexity classes this is possible.
Conclusion

We hope that we have shown in this (admittedly biased) overview of categorical recursion theory how various of Lambek’s seminal ideas have initiated and inspired numerous strands of research that are still being pursued today. We also hope to have conveyed to the reader that there are still many interesting unanswered questions and relatively unexplored facets of categorical recursion theory that deserve further investigation.
References

1. Abramsky, S., Haghverdi, E., and Scott, P. Geometry of interaction and linear combinatory algebras. Math. Structures in Computer Science 12 (2002), 1–40.
2. Amadio, R. M., and Curien, P.-L. Domains and lambda-calculi. Cambridge University Press, 1998.
3. Anglin, W. S., and Lambek, J. The Heritage of Thales. Undergraduate Texts in Mathematics. Springer, 1995.
4. Artin, M., Grothendieck, A., and Verdier, J.-L., Eds. SGA4: Théorie des topos et cohomologie étale des schémas. No. 269, 270, 305 in Lecture Notes in Mathematics. Springer, 1972/3.
5. Awodey, S. Category Theory, 2 ed. No. 52 in Oxford Logic Guides. Oxford University Press, 2010.
6. Barendregt, H. P. The lambda calculus: its syntax and semantics, vol. 103 of Studies in Logic and the Foundations of Mathematics. Elsevier, Amsterdam, 1984. Revised edition.
7. Bethke, I. Notes on Partial Combinatory Algebras. PhD thesis, Universiteit van Amsterdam, 1988.
8. Bhargava, M., and Lambek, J. A rewrite system of the Western Pacific: Lounsbury’s analysis of Trobriand kinship terminology. Theoretical Linguistics 21, 2-3 (1995), 241–253.
9. Blute, R., Cockett, J. R. B., and Seely, R. A. G. Categories for computation in context and unified logic. J. Pure and Applied Algebra 116 (1997), 49–98.
10. Blute, R., Cockett, J. R. B., Seely, R. A. G., and Trimble, T. Natural deduction and coherence for weakly distributive categories. J. Pure and Applied Algebra 3, 113 (2002), 229–296.
11. Boolos, G., Burgess, J., and Jeffrey, R. Computability and Logic, 4th ed. Cambridge University Press, 2007.
12. Burroni, A. Récursivité graphique (1ère partie): catégorie des fonctions récursives primitives formelles. Cah. Topol. Géom. Différ. Catég 27, 1 (1986), 49–79.
13. Carboni, A. Some free constructions in realizability and proof theory. Journal of Pure and Applied Algebra 103 (1995), 117–148.
14. Carboni, A., Freyd, P. J., and Scedrov, A. A categorical approach to realizability and polymorphic types. Lecture Notes in Computer Science 298 (1988), 23–42.
15. Carboni, A., Lambek, J., and Pedicchio, M. C. Diagram chasing in Mal’cev categories. Journal of Pure and Applied Algebra 69, 3 (1990), 271–284.
16. Carboni, A., and Vitale, E. M. Regular and exact completions. Journal of Pure and Applied Algebra 125, 1–3 (1998), 79–116.
17. Castellan, S., Clairambault, P., and Dybjer, P. Categories with families: unityped, simply typed, and dependently typed. In This Volume. 2020.
18. Cockett, J. R. B. Categories and Computability: Notes for the Estonia Winter School. http://pages.cpsc.ucalgary.ca/~robin/, 2010.
19. Cockett, J. R. B., Guo, X., and Hofstra, P. J. W. Range categories I: General theory. Theory and Applications of Categories 26 (2012), 412–452.
20. Cockett, J. R. B., Guo, X., and Hofstra, P. J. W. Range categories II: Towards regularity. Theory and Applications of Categories 26 (2012), 453–500.
21. Cockett, J. R. B., and Hofstra, P. J. W. Introduction to Turing categories. Annals of Pure and Applied Logic (2007).
22. Cockett, J. R. B., and Hofstra, P. J. W. Categorical simulations. Journal of Pure and Applied Algebra 214, 10 (2010), 1835–1853.
23. Cockett, J. R. B., and Hofstra, P. J. W. Unitary theories, unitary categories. Electronic Notes in Theoretical Computer Science (2010).
24. Cockett, J. R. B., Hofstra, P. J. W., and Hrubes, P. Total maps of Turing categories. ENTCS 308 (2014), 129–146.
25. Cockett, J. R. B., and Lack, S. Restriction categories I. Theoretical Computer Science 270 (2002), 223–259.
26. Cockett, J. R. B., and Seely, R. A. G. Proof theory for full intuitionistic linear logic, bilinear logic and mix categories. Theory and Applications of Categories 3, 5 (1997), 85–131.
27. Cutland, N. J. Computability. Cambridge University Press, 1980.
28. Dedekind, R. Was sind und sollen die Zahlen? Braunschweig: Vieweg, 1888.
29. Došen, K. Cut Elimination in Categories, vol. 6 of Trends in Logic. Kluwer, Dordrecht, 1999.
30. Došen, K., and Petric, Z. Proof-Theoretical Coherence, vol. 1 of Studies in Logic. King’s College Publications, 2004.
31. Došen, K., and Petric, Z. Proof-Net Categories. Polimetrica, Monza, 2007.
32. Eilenberg, S., and Elgot, C. C. Recursiveness. Academic Press, 1970.
33. Feferman, S. A language and axioms for explicit mathematics. In Algebra and Logic, J. N. Crossley, Ed. Springer-Verlag, 1975, pp. 87–139.
34. Fourman, M. P., and Scott, D. S. Sheaves and logic. In Applications of Sheaves, M. P. Fourman, C. J. Mulvey, and D. S. Scott, Eds., vol. 753 of Lecture Notes in Mathematics. Springer-Verlag, 1979, pp. 302–401.
35. Frey, J. Characterizing partitioned assemblies and realizability toposes. Journal of Pure and Applied Algebra 223, 5 (2019), 2000–2014.
36. Girard, J.-Y. Linear logic. Theoretical Computer Science 50 (1987), 1–102.
37. Girard, J.-Y., Lafont, Y., and Taylor, P. Proofs and Types. Cambridge University Press, 1989.
38. Gödel, K. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme, I. Monatshefte für Mathematik und Physik 38, 1 (1931), 173–198.
39. Gödel, K. Über eine bisher noch nicht benützte Erweiterung des finiten Standpunktes. Dialectica 12, 3/4 (1958), 280–287.
40. Goodstein, R. L. Recursive Number Theory. Studies in Logic and the Foundations of Mathematics. Elsevier, Amsterdam, 1957.
41. Haghverdi, E. A Categorical Approach to Linear Logic, Geometry of Proofs and Full Completeness. PhD thesis, University of Ottawa, 2000.
42. Haghverdi, E., and Scott, P. A categorical model for the geometry of interaction. Theoretical Computer Science 350 (2006), 252–274.
43. Haghverdi, E., and Scott, P. J. Towards a typed geometry of interaction. Math. Structures in Comp. Science 20(3) (2010), 1–49.
44. Harel, D., and Feldman, Y. Algorithmics, 3rd ed. Addison-Wesley, 2004.
45. Heller, A. An existence theorem for recursion categories. Journal of Symbolic Logic 55, 3 (1990), 1252–1268.
46. Higgs, D. A category approach to boolean-valued set theory. Tech. rep., University of Waterloo, 1973.
47. Hilbert, D., and Bernays, P. Grundlagen der Mathematik I. No. 40 in Die Grundlehren der mathematischen Wissenschaften. Springer-Verlag, 1934.
48. Hofmann, M. Safe recursion with higher types and BCK-algebra. Annals of Pure and Applied Logic 104, 3 (2000), 113–166.
49. Hofmann, M., and Scott, P. J. Realizability models for BLL-like languages. Theoretical Computer Science 318 (2004), 121–137.
50. Hofstra, P. J. W., and van Oosten, J. Ordered partial combinatory algebras. Mathematical Proceedings of the Cambridge Philosophical Society 134 (2003), 445–463.
51. Huber-Dyson, V. Strong representability of number-theoretic functions. Tech. rep., Hughes Aircraft, 1965.
52. Hyland, J. M. E. The effective topos. In The L.E.J. Brouwer Centenary Symposium (1982), A. Troelstra and D. van Dalen, Eds., North Holland Publishing Company, pp. 165–216.
53. Hyland, J. M. E., Johnstone, P. T., and Pitts, A. M. Tripos theory. Mathematical Proceedings of the Cambridge Philosophical Society 88 (1980), 205–232.
54. Jay, C. B. Languages for monoidal categories. Journal of Pure and Applied Algebra 59 (1989), 61–85.
55. Jay, C. B. The structure of free closed categories. Journal of Pure and Applied Algebra 66 (1990), 271–285.
56. Joyal, A. The Gödel incompleteness theorem, a categorical approach (abstract). Cah. de Top. Geom. Diff. 16, 3 (2005).
57. Joyal, A., Street, R., and Verity, D. Traced monoidal categories. Mathematical Proceedings of the Cambridge Philosophical Society 119 (1996), 447–468.
58. Katis, P., Sabadini, N., and Walters, R. F. C. Feedback, trace, and fixed-point semantics. Theoretical Informatics and Applications 36, 2 (2002), 181–194.
59. Kelly, G. M., and Mac Lane, S. Coherence in closed categories. J. Pure and Applied Algebra 1, 1 (1971), 97–140.
60. Kleene, S. C. On the interpretation of intuitionistic number theory. Journal of Symbolic Logic 53, 1 (1945), 109–124.
61. Kleene, S. C. Introduction to Metamathematics. North Holland, 1952.
62. Kleene, S. C. Recursive functionals and quantifiers of finite types I. Transactions of the American Mathematical Society 91, 1 (1959), 1–52.
63. Kreisel, G. Interpretation of analysis by means of constructive functionals of finite types. In Constructivity in mathematics: proceedings of the colloquium held in Amsterdam (1959), A. Heyting, Ed., North-Holland, Amsterdam, pp. 101–128.
64. Lambek, J. The mathematics of sentence structure. Amer. Math. Monthly 65 (1958), 154–169.
65. Lambek, J. How to program an infinite abacus. Canadian Mathematical Bulletin 4, 3 (1961), 295–302.
66. Lambek, J. On the calculus of syntactic types, vol. 12 of Proc. Symposium Appl. Math. AMS, 1961, pp. 166–178.
67. Lambek, J. Deductive systems and categories I. J. Math. Syst. Theory (1968).
68. Lambek, J. A fixpoint theorem for complete categories. Math. Zeitschrift 103 (1968), 151–161.
69. Lambek, J. Deductive Systems and Categories II, vol. 86 of Lecture Notes in Mathematics. Springer, 1969, pp. 76–122.
70. Lambek, J. Deductive Systems and Categories III, vol. 274 of Lecture Notes in Mathematics. Springer, 1972, pp. 57–82.
71. Lambek, J. Functional completeness of cartesian categories. Annals of Mathematical Logic 6, 3 (1974), 259–292.
72. Lambek, J. A mathematician looks at French conjugation. Theoretical Linguistics 2, 1-3 (1975), 203–214.
73. Lambek, J. From λ-calculus to cartesian closed categories. In To H. B. Curry, Essays on Combinatory Logic, Lambda Calculus and Formalism, J. P. Seldin and J. R. Hindley, Eds. Academic Press, 1980, pp. 375–402.
74. Lambek, J. Multicategories revisited. Contemp. Mathematics 92 (1989), 217–239.
75. Lambek, J. Relations in operational categories. J. Pure and Applied Algebra 116 (1997), 221–248.
76. Lambek, J., and Scott, P. Intuitionist type theory and the free topos. Journal of Pure and Applied Algebra 19 (1980), 215–257.
77. Lambek, J., and Scott, P. Introduction to higher order categorical logic, vol. 7 of Cambridge studies in advanced mathematics. Cambridge University Press, 1986.
78. Lambek, J., and Scott, P. J. An exactification of the monoid of primitive recursive functions. Studia Logica 81, 1 (2005), 1–18.
79. Lawvere, F. W. An elementary theory of the category of sets. (extended version published in TAC reprints: http://www.tac.mta.ca/tac/). Proceedings of the National Academy of Science of the U.S.A 52 (1964), 1506–1511.
80. Lawvere, F. W. Diagonal arguments and cartesian closed categories. In Category Theory, Homology Theory and their Applications, II (Battelle Institute Conference, Seattle, Wash., 1968, Vol. Two). Springer-Verlag, Berlin, 1969, pp. 134–145.
81. Lawvere, F. W. Quantifiers and sheaves. In Actes du ICM, Nice 1970, I (1971), Gauthier-Villars, Paris, pp. 329–334.
82. Lee, S., and van Oosten, J. Basic subtoposes of the effective topos. Annals of Pure and Applied Logic 164, 9 (2013), 866–883.
83. Longley, J. Realizability toposes and language semantics. PhD thesis, University of Edinburgh, 1994.
84. Longley, J. Notions of computability at higher types I. In Logic Colloquium 2000 (2000), R. Cori, A. Razborov, S. Todorčević, and C. Wood, Eds., vol. 19 of Lecture Notes in Logic, Cambridge University Press, pp. 32–142.
85. Longley, J. On the ubiquity of certain total type structures. Mathematical Structures in Computer Science 17, 5 (2007), 841–953.
86. Longley, J., and Normann, D. Higher-Order Computability. Springer, 2015.
87. Longo, G., and Moggi, E. A category theoretic characterization of functional completeness. Theoretical Computer Science 70, 2 (1990), 193–211.
88. Mac Lane, S. Why commutative diagrams coincide with equivalent proofs. Contemp. Mathematics 13 (1982), 387–401.
89. Mac Lane, S. Categories for the Working Mathematician. Graduate Texts in Mathematics. Springer, 1998.
90. Mackie, I., Román, L., and Abramsky, S. An internal language for autonomous categories. Applied Categorical Structures 1 (1993), 311–343.
91. Maietti, M. Joyal’s arithmetic universe as list-arithmetic pretopos. Theory and Applications of Categories 24, 3 (2010), 39–83.
92. Maietti, M., and Vickers, S. An induction principle for consequence in arithmetic universes. Journal of Pure and Applied Algebra 216 (2012), 2049–2067.
93. Manes, E., and Arbib, M. Algebraic Approaches to Program Semantics. Springer-Verlag, 1986.
94. Menni, M. A characterization of the left exact categories whose exact completions are toposes. Journal of Pure and Applied Algebra 177, 3 (2003), 287–301.
95. Mints, G. E. Closed categories and the theory of proofs. Zap. Naučn Seminar Leningrad Otdel Mat. Inst. Steklov (LOMI) 68 (1977), 83–114.
96. Mints, G. E. Proof theory and category theory (in Russian). Aktual’nye voprosy logiki i metodologii nauki, Naukova Dumka, Kiev (1980), 252–278.
97. Mints, G. E. Selected Papers in Proof Theory. No. 3 in Studies in Proof Theory. North-Holland, 1992.
98. Mulry, P. S. Generalized Banach-Mazur functionals in the topos of recursive sets. Journal of Pure and Applied Algebra 26 (1982), 71–83.
99. Odifreddi, P. Classical recursion theory, vol. 125 of Studies in Logic. North-Holland, 1989.
100. Di Paola, R., and Heller, A. Dominical categories: recursion theory without elements. Journal of Symbolic Logic 52 (1987), 595–635.
101. Di Paola, R., and Montagna, F. Some properties of the syntactic p-recursion categories generated by consistent, recursively enumerable extensions of Peano arithmetic. Journal of Symbolic Logic 56, 2 (1991), 643–660.
102. Paré, R., and Román, L. Monoidal categories with natural numbers object. Studia Logica 48, 3 (1989).
103. Péter, R. Recursive Functions. Academic Press, 1967.
104. Phoa, W. Relative computability in the effective topos. Mathematical Proceedings of the Cambridge Philosophical Society 106 (1989), 419–422.
105. Plotkin, G. D. LCF considered as a programming language. Theoretical Computer Science 5 (1977), 223–255.
106. Plotkin, G. D. Partial recursive functions and finality. In Computation, Logic, Games, and Quantum Foundations. The Many Facets of Samson Abramsky, B. Coecke, L. Ong, and P. Panangaden, Eds., vol. 7860 of Lecture Notes in Computer Science. Springer, 2013, pp. 311–326.
107. Post, E. Recursively enumerable sets of positive integers and their decision problems. Bulletin of the American Mathematical Society 50 (1944), 284–316.
108. Robinson, E. P., and Rosolini, G. Colimit completions and the effective topos. Journal of Symbolic Logic 55, 2 (1990), 678–699.
109. Robinson, E. P., and Rosolini, G. An abstract look at realizability. In Computer Science Logic, 15th International Workshop (CSL 2001) (2001), L. Fribourg, Ed., vol. 2142 of Lecture Notes in Computer Science, Springer, pp. 173–187.
110. Román, L. Cartesian categories with natural numbers object. Journal of Pure and Applied Algebra 58 (1989), 267–278.
111. Scedrov, A. Kleene computable functionals and the higher order existence property. Journal of Pure and Applied Algebra 52 (1988), 313–320.
112. Scott, D. Relating theories of the lambda calculus. In To H. B. Curry, Essays on Combinatory Logic, Lambda Calculus and Formalism, J. P. Seldin and J. R. Hindley, Eds. Academic Press, 1980, pp. 403–450.
113. Skolem, T. The foundations of elementary arithmetic. In From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931, J. van Heijenoort, Ed. Harvard University Press, 1967 (1923), pp. 302–333.
114. Soare, R. Computability and recursion. Bulletin of Symbolic Logic 2, 3 (1996), 284–321.
115. Troelstra, A. S., Ed. Metamathematical Investigation of Intuitionistic Arithmetic and Analysis, vol. 344 of Lecture Notes in Mathematics. Springer Verlag, 1973.
116. Troelstra, A. S., and Schwichtenberg, H. Basic Proof Theory, 2 ed. Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, 2000.
117. van Oosten, J. Realizability: a historical essay. Mathematical Structures in Computer Science 12 (2002), 239–263.
118. van Oosten, J. Realizability: an introduction to its categorical side, vol. 152 of Studies in Logic. North-Holland, 2008.
119. Vinogradova, P. Investigating computability in Turing categories. Master’s thesis, University of Ottawa, 2011.
120. Wraith, G. C. Artin gluing. Journal of Pure and Applied Algebra 4 (1974), 345–348.
121. Yanofsky, N. A universal approach to self-referential paradoxes, incompleteness and fixed points. Bulletin of Symbolic Logic 9, 3 (2003), 362–386.
Morphisms of Rings

Robert Paré
Abstract Natural questions related to the double category of rings with homomorphisms and bimodules lead to a reevaluation of what a morphism of rings is. We introduce matrix-valued homomorphisms and then drop preservation of identities, giving what are sometimes called amplification homomorphisms. We show how these give extensions of the double category of rings and give some arguments justifying their study.
Introduction

In 1967, when I started my PhD under Jim Lambek’s direction, he had already shifted his main interest from ring theory to category theory. I never had a course in ring theory from him but I learned in his category theory course and, in more detail, from his book [5], that rings were not necessarily commutative but had an identity element 1, and that homomorphisms, in addition to preserving sum and product, should also preserve 1. In the last chapter of that same book, he introduces bimodules and their tensor product and shows that it is associative and unitary up to isomorphism. The tensor product is of course only defined if the rings of scalars match up properly, just like composition in categories. It would seem then that we have a sort of category whose objects are rings and whose morphisms are bimodules. Although he doesn’t say so there (it wasn’t the place), it’s clear from his later work that he knew then or shortly after that rings, bimodules and linear maps form a bicategory [1].

Robert Paré
Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada, B3H 4R2, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 C. Casadio and P. J. Scott (eds.), Joachim Lambek: The Interplay of Mathematics, Logic, and Linguistics, Outstanding Contributions to Logic 20, https://doi.org/10.1007/978-3-030-66545-6_8
We are used to thinking of morphisms as structure-preserving functions, so what are we to make of bimodules as morphisms? And what’s the relationship with homomorphisms? These are the questions we address here. In sections 1 and 2 we introduce the relevant double category theory motivated by the example which concerns us, viz. the double category of rings, homomorphisms, bimodules and equivariant maps. We are led naturally to matrix-valued homomorphisms, which gives us a graded category of rings whose degree 1 part is the usual one. Accompanying this is a new double category whose basic properties we expose. This is the content of sections 3 and 4. Then double categorical considerations lead to a further extension of the category (and double category) of rings, to what have been called amplification morphisms. Their basic properties are treated in section 5.
1 Double categories

The category of rings, Ring, has rings with 1 as objects and homomorphisms preserving 1 as morphisms. This is a nice category. It is complete, cocomplete, regular, locally finitely presentable, etc.

Given rings $R$ and $S$, an $S$-$R$-bimodule $M$ is a simultaneous left $S$-module and right $R$-module whose left and right actions commute:

$$(sm)r = s(mr).$$

If $T$ is another ring and $N$ a $T$-$S$-bimodule, the tensor product over $S$, $N \otimes_S M$, is naturally a $T$-$R$-bimodule. We have associativity isomorphisms

$$P \otimes_T (N \otimes_S M) \cong (P \otimes_T N) \otimes_S M$$

and unit isomorphisms

$$M \otimes_R R \cong M \cong S \otimes_S M$$

as clearly exposed in Chapter 5 of [5]. To keep track of the various rings involved and what’s acting on what and on which side we can write

$$M \colon R \mathrel{\bullet\!\!\to} S$$

to mean that $M$ is an $S$-$R$-bimodule. Then the tensor product looks like a composition

$$R \overset{M}{\mathrel{\bullet\!\!\to}} S \overset{N}{\mathrel{\bullet\!\!\to}} T, \qquad N \otimes_S M \colon R \mathrel{\bullet\!\!\to} T.$$
So we get a sort of category whose composition is not really associative nor unitary but only so up to isomorphism. In order to formalize this we must incorporate these isomorphisms into the structure. They are morphisms between morphisms, called 2-cells. Given two $S$-$R$-bimodules $M, M' \colon R \mathrel{\bullet\!\to} S$, a 2-cell
\[\begin{tikzcd}
R \arrow[r, bend left=40, "\bullet" marking, "M"] \arrow[r, bend right=40, "\bullet" marking, "M'"'] \arrow[r, phantom, "\Downarrow \phi"] & S
\end{tikzcd}\]
is a linear map of bimodules, i.e. a function $\phi \colon M \to M'$ such that
\[ \phi(m_1 + m_2) = \phi(m_1) + \phi(m_2), \qquad \phi(sm) = s\phi(m), \qquad \phi(mr) = \phi(m)r . \]
This is the data for a bicategory: objects (rings), arrows (bimodules), 2-cells (linear maps). They can be composed in various ways and satisfy a host of equations, all of which would seem obvious to anyone working with bimodules. See the seminal work [1] for details.

Now we have a structure (rings) with two candidates for morphism, homomorphism and bimodule, and we might ask which is the right one. In fact they are both good but for different purposes. So a better question is, how are they related? The answer will come from the theory of double categories.

A double category A has objects ($A$, $B$, $C$, $D$ below) and two kinds of morphism, which we call horizontal ($f$, $g$ below) and vertical ($v$, $w$ below). These are related by a further kind of morphism, double cells as in
\[\begin{tikzcd}
A \arrow[r, "f"] \arrow[d, "\bullet" marking, "v"'] \arrow[dr, phantom, "\overset{\alpha}{\Rightarrow}"] & B \arrow[d, "\bullet" marking, "w"] \\
C \arrow[r, "g"'] & D
\end{tikzcd}\]
The horizontal arrows form a category Hor A with composition denoted by juxtaposition and identities by $1_A$. Cells can also be composed horizontally, again forming a category. The vertical arrows compose to give a bicategory Vert A whose 2-cells are the globular cells of A, i.e. those with identities on the top and bottom
\[\begin{tikzcd}
A \arrow[r, "1_A"] \arrow[d, "\bullet" marking, "v"'] \arrow[dr, phantom, "\Rightarrow"] & A \arrow[d, "\bullet" marking, "w"] \\
C \arrow[r, "1_C"'] & C
\end{tikzcd}\]
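Vertical composition is denoted by $\bullet$ and vertical identities by $\mathrm{id}_A$. Finally double cells can also be composed vertically (also denoted $\bullet$ and $\mathrm{id}_f$). The composition of cells is as associative and unitary as possible, provided the coherence isomorphisms are factored in to make the boundaries match. Rather than give a formal definition, we will be better served by a couple of representative examples. The reader is referred to [4] for a precise definition.

Example 1 The double category which will concern us here is the double category of rings, Ring. Its objects are rings (with 1) and its horizontal arrows are unitary homomorphisms. Its vertical arrows are bimodules. More precisely a vertical arrow from $R$ to $S$ is an $S$-$R$-bimodule. A double cell
\[\begin{tikzcd}
R \arrow[r, "f"] \arrow[d, "\bullet" marking, "M"'] \arrow[dr, phantom, "\overset{\phi}{\Rightarrow}"] & R' \arrow[d, "\bullet" marking, "M'"] \\
S \arrow[r, "g"'] & S'
\end{tikzcd}\]
is a map $\phi \colon M \to M'$ which is linear in the sense that it preserves addition and is compatible with the actions
\[ \phi(sm) = g(s)\phi(m), \qquad \phi(mr) = \phi(m)f(r) . \]
In other words it is an $S$-$R$-linear map from $M$ to $M'$ when the codomain is made into an $S$-$R$-bimodule by “restriction of scalars”. Horizontal composition of morphisms and cells is just function composition. Vertical composition is given by
\[\begin{tikzcd}
R \arrow[r, "f"] \arrow[d, "\bullet" marking, "M"'] \arrow[dr, phantom, "\overset{\phi}{\Rightarrow}"] & R' \arrow[d, "\bullet" marking, "M'"] \\
S \arrow[r, "g"] \arrow[d, "\bullet" marking, "N"'] \arrow[dr, phantom, "\overset{\psi}{\Rightarrow}"] & S' \arrow[d, "\bullet" marking, "N'"] \\
T \arrow[r, "h"'] & T'
\end{tikzcd}
\quad = \quad
\begin{tikzcd}
R \arrow[r, "f"] \arrow[d, "\bullet" marking, "N \otimes_S M"'] \arrow[dr, phantom, "\overset{\psi \otimes_g \phi}{\Rightarrow}"] & R' \arrow[d, "\bullet" marking, "N' \otimes_{S'} M'"] \\
T \arrow[r, "h"'] & T'
\end{tikzcd}\]
where $(\psi \otimes_g \phi)(n \otimes_S m) = \psi(n) \otimes_{S'} \phi(m)$. It is easily checked that this is well defined and is associative in the appropriate sense, giving us the double category Ring.

Example 2 A more basic example, and one to keep in mind, is the following. The double category Rel has sets as objects and functions as horizontal arrows, so Hor Rel = Set. A vertical arrow $R \colon X \mathrel{\bullet\!\to} Y$ is a relation between $X$ and $Y$ and there is a unique cell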
\[\begin{tikzcd}
X \arrow[r, "f"] \arrow[d, "\bullet" marking, "R"'] \arrow[dr, phantom, "\Rightarrow"] & X' \arrow[d, "\bullet" marking, "R'"] \\
Y \arrow[r, "g"'] & Y'
\end{tikzcd}\]
if (and only if) we have
\[ \forall x, y \; \bigl( x \sim_R y \Rightarrow f(x) \sim_{R'} g(y) \bigr) . \]
Rel is a strict double category in the sense that vertical composition is strictly associative and unitary. Strict double categories were introduced by Ehresmann [2] and are (ignoring size considerations) the same as category objects in Cat, the category of categories. Ring on the other hand is a weak double category in that vertical composition is only associative and unitary up to coherent isomorphism, as is clear from the above discussion. We consider the weak double categories to be the more important notion, certainly for this work, and call them simply double categories without modifiers. They were introduced in [4], where more examples can be found.
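The cell condition in Rel is easy to test on finite sets. The following is only an illustrative sketch: relations are modelled as Python sets of pairs and functions as dictionaries, and the names `has_cell`, `R_prime` and `g_bad`, as well as the sample data, are assumptions made for this example.

```python
def has_cell(R, R_prime, f, g):
    """True iff x R y implies f(x) R' g(y), i.e. the (unique) cell exists."""
    return all((f[x], g[y]) in R_prime for (x, y) in R)

# Hypothetical data: the order relation "<=" on two small finite sets.
R = {(x, y) for x in range(4) for y in range(4) if x <= y}
R_prime = {(a, b) for a in range(8) for b in range(8) if a <= b}

f = {x: 2 * x for x in range(4)}        # horizontal arrow X -> X'
g = {y: 2 * y + 1 for y in range(4)}    # horizontal arrow Y -> Y'
g_bad = {y: 0 for y in range(4)}        # a constant function Y -> Y'

print(has_cell(R, R_prime, f, g))       # the implication holds for every pair
print(has_cell(R, R_prime, f, g_bad))   # fails, e.g. 1 <= 1 but not 2 <= 0
```

Since a cell in Rel is unique when it exists, the Boolean returned here carries all the information about the cell.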
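2 Companions and conjoints

Functions are often defined to be relations that are single-valued and everywhere defined. Category theorists would tend to take functions as primitive and define relations in the category of sets as subobjects of a product. Then every function has an associated relation, its graph
\[ \mathrm{Gr}(f) = \{ (x, y) \mid f(x) = y \} . \]
In any case there is a close relationship between functions and certain relations and this can be formulated in purely double category terms.

Definition 3 Let A be a double category, $f \colon A \to B$ a horizontal arrow, and $v \colon A \mathrel{\bullet\!\to} B$ a vertical one in A. We say that $v$ is a companion of $f$ if we are given cells, the binding cells
\[\begin{tikzcd}
A \arrow[r, "1_A"] \arrow[d, "\bullet" marking, "\mathrm{id}_A"'] \arrow[dr, phantom, "\alpha"] & A \arrow[d, "\bullet" marking, "v"] \\
A \arrow[r, "f"'] & B
\end{tikzcd}
\quad \text{and} \quad
\begin{tikzcd}
A \arrow[r, "f"] \arrow[d, "\bullet" marking, "v"'] \arrow[dr, phantom, "\beta"] & B \arrow[d, "\bullet" marking, "\mathrm{id}_B"] \\
B \arrow[r, "1_B"'] & B
\end{tikzcd}\]
such that
\[\begin{tikzcd}
A \arrow[r, "1_A"] \arrow[d, "\bullet" marking, "\mathrm{id}_A"'] \arrow[dr, phantom, "\alpha"] & A \arrow[r, "f"] \arrow[d, "\bullet" marking, "v" description] \arrow[dr, phantom, "\beta"] & B \arrow[d, "\bullet" marking, "\mathrm{id}_B"] \\
A \arrow[r, "f"'] & B \arrow[r, "1_B"'] & B
\end{tikzcd}
\quad = \quad
\begin{tikzcd}
A \arrow[r, "f"] \arrow[d, "\bullet" marking, "\mathrm{id}_A"'] \arrow[dr, phantom, "\mathrm{id}_f"] & B \arrow[d, "\bullet" marking, "\mathrm{id}_B"] \\
A \arrow[r, "f"'] & B
\end{tikzcd}\]
i.e. $\beta\alpha = \mathrm{id}_f$, and
\[\begin{tikzcd}
A \arrow[r, "1_A"] \arrow[d, "\bullet" marking, "\mathrm{id}_A"'] \arrow[dr, phantom, "\alpha"] & A \arrow[d, "\bullet" marking, "v"] \\
A \arrow[r, "f" description] \arrow[d, "\bullet" marking, "v"'] \arrow[dr, phantom, "\beta"] & B \arrow[d, "\bullet" marking, "\mathrm{id}_B"] \\
B \arrow[r, "1_B"'] & B
\end{tikzcd}
\quad \cdot\!=\!\cdot \quad
\begin{tikzcd}
A \arrow[r, "1_A"] \arrow[d, "\bullet" marking, "v"'] \arrow[dr, phantom, "1_v"] & A \arrow[d, "\bullet" marking, "v"] \\
B \arrow[r, "1_B"'] & B
\end{tikzcd}\]
i.e. $\beta \bullet \alpha \;\cdot\!=\!\cdot\; 1_v$. The $\cdot\!=\!\cdot$ sign means equality once the canonical isomorphisms ($v \bullet \mathrm{id}_A \cong v \cong \mathrm{id}_B \bullet v$) are inserted to make the boundaries agree.

Companions, when they exist, are unique up to isomorphism, and we use the notation $f_*$ to denote a choice of companion for $f$. Companions compose:
\[ (gf)_* \cong g_* \bullet f_* \]
and we also have
\[ (1_A)_* \cong \mathrm{id}_A . \]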
It’s an easy exercise to show that in Rel, every function has a companion, its graph (if $v$ is a relation $R$, then $\alpha$ expresses the fact that $\mathrm{Gr}(f) \subseteq R$, and $\beta$ that $R \subseteq \mathrm{Gr}(f)$). We see that companions give a precise meaning to expressions like “a function is a relation such that...” or more generally “a horizontal arrow is isomorphic to a vertical one”.

Proposition 4 (a) In Ring, every homomorphism $f \colon R \to S$ has a companion, namely $S$ considered as an $S$-$R$-bimodule with actions $\bullet$ given by
\[ s' \bullet s = s's, \qquad s \bullet r = sf(r) . \]
(b) A bimodule $M \colon R \mathrel{\bullet\!\to} S$ is a companion if and only if it is free on one generator as a left $S$-module.

Proof (a) Denote by $S \colon R \mathrel{\bullet\!\to} S$ the bimodule in the statement, i.e. the bimodule gotten from the $S$-$S$-bimodule $S$ by “restriction of scalars” along $f$ on the right. Then we have cells
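\[\begin{tikzcd}
R \arrow[r, "1_R"] \arrow[d, "\bullet" marking, "R"'] \arrow[dr, phantom, "\overset{\alpha}{\Rightarrow}"] & R \arrow[d, "\bullet" marking, "S"] \\
R \arrow[r, "f"'] & S
\end{tikzcd}
\qquad
\begin{tikzcd}
R \arrow[r, "f"] \arrow[d, "\bullet" marking, "S"'] \arrow[dr, phantom, "\overset{\beta}{\Rightarrow}"] & S \arrow[d, "\bullet" marking, "S"] \\
S \arrow[r, "1_S"'] & S
\end{tikzcd}\]
\[ \alpha(r) = f(r), \qquad \beta(s) = s . \]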
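It’s an easy calculation to show that $\alpha$ and $\beta$ are indeed cells and that $\beta\alpha = \mathrm{id}_f$ and $\beta \bullet \alpha \;\cdot\!=\!\cdot\; 1_S$.

(b) Suppose the bimodule $M \colon R \mathrel{\bullet\!\to} S$ is free as a left $S$-module with generator $m_0 \in M$. Then for every $r \in R$ there is a unique element $s \in S$ such that $m_0 r = s m_0$. Call this $s$, $f(r)$, so that $f(r)$ is uniquely determined by the equation
\[ f(r) m_0 = m_0 r . \]
It is easy to check that $f$ is a homomorphism $f \colon R \to S$. We check that multiplication is preserved, as an example.
\[ f(r_1 r_2) m_0 = m_0 r_1 r_2 = f(r_1) m_0 r_2 = f(r_1) f(r_2) m_0 . \]
So $f(r_1 r_2) = f(r_1) f(r_2)$. Also, while we are at it, $f(1) m_0 = m_0 1 = 1 m_0$ so $f(1) = 1$. Now define cells
\[\begin{tikzcd}
R \arrow[r, "1_R"] \arrow[d, "\bullet" marking, "R"'] \arrow[dr, phantom, "\overset{\alpha}{\Rightarrow}"] & R \arrow[d, "\bullet" marking, "M"] \\
R \arrow[r, "f"'] & S
\end{tikzcd}
\quad \text{and} \quad
\begin{tikzcd}
R \arrow[r, "f"] \arrow[d, "\bullet" marking, "M"'] \arrow[dr, phantom, "\overset{\beta}{\Rightarrow}"] & S \arrow[d, "\bullet" marking, "S"] \\
S \arrow[r, "1_S"'] & S
\end{tikzcd}\]
by taking $\alpha(r) = m_0 r$, and $\beta(m)$ to be the unique element of $S$ such that $\beta(m) m_0 = m$. The calculations showing that $\alpha$ and $\beta$ are cells (i.e. linear maps) and that the binding equations
\[ \beta\alpha = \mathrm{id}_f, \qquad \beta \bullet \alpha \;\cdot\!=\!\cdot\; 1_M \]
hold are easy exercises. We check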
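\[\begin{tikzcd}
R \arrow[r, "1_R"] \arrow[d, "\bullet" marking, "R"'] \arrow[dr, phantom, "\overset{\alpha}{\Rightarrow}"] & R \arrow[d, "\bullet" marking, "M"] \\
R \arrow[r, "f" description] \arrow[d, "\bullet" marking, "M"'] \arrow[dr, phantom, "\overset{\beta}{\Rightarrow}"] & S \arrow[d, "\bullet" marking, "S"] \\
S \arrow[r, "1_S"'] & S
\end{tikzcd}
\quad \cdot\!=\!\cdot \quad
\begin{tikzcd}
R \arrow[r, "1_R"] \arrow[d, "\bullet" marking, "M"'] \arrow[dr, phantom, "\overset{1_M}{\Rightarrow}"] & R \arrow[d, "\bullet" marking, "M"] \\
S \arrow[r, "1_S"'] & S
\end{tikzcd}\]
in part to illustrate where the $\cdot\!=\!\cdot$ comes in. The left hand diagram has to be modified by placing the canonical isomorphism $\rho^{-1} \colon M \to M \otimes_R R$ on the left and $\lambda \colon S \otimes_S M \to M$ on the right to make the boundaries on both sides of the equation the same. Then for $m \in M$ we have
\[ \lambda(\beta \otimes \alpha)\rho^{-1}(m) = \lambda(\beta \otimes \alpha)(m \otimes 1) = \lambda(\beta m \otimes \alpha 1) = (\beta m)(\alpha 1) = (\beta m)(m_0 1) = m . \]
□

The generator for $M$ is not unique. If $m_1$ is another one, then there exists an invertible element $a \in S$ such that $m_1 = a m_0$. If $f_0, f_1 \colon R \to S$ are the homomorphisms corresponding to $m_0$ and $m_1$ respectively, then we have
\[ f_1(r) m_1 = f_1(r) a m_0 . \]
On the other hand, we also have
\[ f_1(r) m_1 = m_1 r = a m_0 r = a f_0(r) m_0 \]
so $f_1(r) a = a f_0(r)$ or
\[ f_1(r) = a f_0(r) a^{-1} . \]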
We summarize this in the following proposition.

Proposition 5 If a bimodule $M \colon R \mathrel{\bullet\!\to} S$ is a rank one free left $S$-module, then $M \cong f_*$ for some homomorphism $f \colon R \to S$. $f$ is unique up to conjugation by a unit of $S$.

Conjugation by an element of $S$ is actually an isomorphism in a 2-category of rings. Every double category A (strict or not) has a horizontal 2-category, Hor A. The objects are those of A, the 1-cells are the horizontal arrows of A, and the 2-cells are the special cells of A, i.e. cells of the form
\[\begin{tikzcd}
A \arrow[r, "f"] \arrow[d, "\bullet" marking, "\mathrm{id}_A"'] \arrow[dr, phantom, "\alpha"] & B \arrow[d, "\bullet" marking, "\mathrm{id}_B"] \\
A \arrow[r, "g"'] & B
\end{tikzcd}\]
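Vertical composition of 2-cells, $\alpha$ and $\beta$, in Hor A uses the canonical isomorphisms $\lambda = \rho \colon \mathrm{id} \bullet \mathrm{id} \to \mathrm{id}$:
\[\begin{tikzcd}
A \arrow[r, equal] \arrow[dd, "\bullet" marking, "\mathrm{id}_A"'] \arrow[ddr, phantom, "\lambda_A^{-1}"] & A \arrow[r, "f"] \arrow[d, "\bullet" marking, "\mathrm{id}_A"'] \arrow[dr, phantom, "\alpha"] & B \arrow[r, equal] \arrow[d, "\bullet" marking, "\mathrm{id}_B"] \arrow[ddr, phantom, "\lambda_B"] & B \arrow[dd, "\bullet" marking, "\mathrm{id}_B"] \\
& A \arrow[r, "g" description] \arrow[d, "\bullet" marking, "\mathrm{id}_A"'] \arrow[dr, phantom, "\beta"] & B \arrow[d, "\bullet" marking, "\mathrm{id}_B"] & \\
A \arrow[r, equal] & A \arrow[r, "h"'] & B \arrow[r, equal] & B
\end{tikzcd}\]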
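A moderate amount of straightforward calculation shows that Hor A is indeed a 2-category. When applied to the double category Ring we get a 2-category whose objects are rings, whose arrows are homomorphisms and whose 2-cells are linear maps of the form
\[\begin{tikzcd}
R \arrow[r, "f"] \arrow[d, "\bullet" marking, "R"'] \arrow[dr, phantom, "\overset{\alpha}{\Rightarrow}"] & S \arrow[d, "\bullet" marking, "S"] \\
R \arrow[r, "g"'] & S
\end{tikzcd}\]
Such an $\alpha$ is determined by its value at 1. We have
\[ \alpha(r) = \alpha(r \cdot 1) = g(r)\alpha(1) = \alpha(1 \cdot r) = \alpha(1) f(r) . \]
This gives the following.

Definition 6 The 2-category of rings, Ring, has rings as objects, homomorphisms as 1-cells and as 2-cells
\[\begin{tikzcd}
R \arrow[r, bend left=40, "f"] \arrow[r, bend right=40, "g"'] \arrow[r, phantom, "\Downarrow"] & S
\end{tikzcd}\]
elements $s \in S$ such that for all $r$
\[ s f(r) = g(r) s . \]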
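This is not that surprising. If we think of a ring as a one-object additive category, then homomorphisms are additive functors, and a 2-cell as above is just a natural transformation. Nevertheless it can be useful to keep in mind.

There is a dual notion to companion, that of conjoint, which we spell out.

Definition 7 Let $f \colon A \to B$ be a horizontal arrow in a double category A and $v \colon B \mathrel{\bullet\!\to} A$ a vertical one. We say that $v$ is conjoint to $f$ if we are given cells (conjunctions)
\[\begin{tikzcd}
A \arrow[r, "f"] \arrow[d, "\bullet" marking, "\mathrm{id}_A"'] \arrow[dr, phantom, "\overset{\psi}{\Rightarrow}"] & B \arrow[d, "\bullet" marking, "v"] \\
A \arrow[r, "1_A"'] & A
\end{tikzcd}
\quad \text{and} \quad
\begin{tikzcd}
B \arrow[r, "1_B"] \arrow[d, "\bullet" marking, "v"'] \arrow[dr, phantom, "\overset{\chi}{\Rightarrow}"] & B \arrow[d, "\bullet" marking, "\mathrm{id}_B"] \\
A \arrow[r, "f"'] & B
\end{tikzcd}\]
such that
\[\begin{tikzcd}
A \arrow[r, "f"] \arrow[d, "\bullet" marking, "\mathrm{id}_A"'] \arrow[dr, phantom, "\overset{\psi}{\Rightarrow}"] & B \arrow[r, "1_B"] \arrow[d, "\bullet" marking, "v" description] \arrow[dr, phantom, "\overset{\chi}{\Rightarrow}"] & B \arrow[d, "\bullet" marking, "\mathrm{id}_B"] \\
A \arrow[r, "1_A"'] & A \arrow[r, "f"'] & B
\end{tikzcd}
\quad = \quad
\begin{tikzcd}
A \arrow[r, "f"] \arrow[d, "\bullet" marking, "\mathrm{id}_A"'] \arrow[dr, phantom, "\overset{\mathrm{id}_f}{\Rightarrow}"] & B \arrow[d, "\bullet" marking, "\mathrm{id}_B"] \\
A \arrow[r, "f"'] & B
\end{tikzcd}\]
i.e. $\chi\psi = \mathrm{id}_f$, and
\[\begin{tikzcd}
B \arrow[r, "1_B"] \arrow[d, "\bullet" marking, "v"'] \arrow[dr, phantom, "\overset{\chi}{\Rightarrow}"] & B \arrow[d, "\bullet" marking, "\mathrm{id}_B"] \\
A \arrow[r, "f" description] \arrow[d, "\bullet" marking, "\mathrm{id}_A"'] \arrow[dr, phantom, "\overset{\psi}{\Rightarrow}"] & B \arrow[d, "\bullet" marking, "v"] \\
A \arrow[r, "1_A"'] & A
\end{tikzcd}
\quad \cdot\!=\!\cdot \quad
\begin{tikzcd}
B \arrow[r, "1_B"] \arrow[d, "\bullet" marking, "v"'] \arrow[dr, phantom, "\overset{1_v}{\Rightarrow}"] & B \arrow[d, "\bullet" marking, "v"] \\
A \arrow[r, "1_A"'] & A
\end{tikzcd}\]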
i.e. $\psi \bullet \chi \;\cdot\!=\!\cdot\; 1_v$.

This definition looks very much like that of adjoint, and that is how we think of it: $v$ is right adjoint to $f$, even though they are different types of arrows. That the notion is the vertical dual to that of companion is clear.

The double category Ring is isomorphic to its vertical dual
\[ \mathrm{Ring}^{co} \cong \mathrm{Ring} . \]
The isomorphism takes a ring $R$ to its opposite ring $R^{op}$, i.e., with multiplication switched. A homomorphism $f \colon R \to S$ gives a homomorphism $f^{op} \colon R^{op} \to S^{op}$ (same direction) whereas an $S$-$R$-bimodule $M \colon R \mathrel{\bullet\!\to} S$ gives an $R^{op}$-$S^{op}$-bimodule $M^{op} \colon S^{op} \mathrel{\bullet\!\to} R^{op}$ (opposite direction). So the results on companions are readily dualizable. We denote by $f^*$ a conjoint for $f$. In Ring, every homomorphism $f \colon R \to S$ has a conjoint $f^*$, namely $S \colon S \mathrel{\bullet\!\to} R$ with left action by $R$ given by “restriction”
\[ r \bullet s = f(r)s . \]
3 Matrix-valued homomorphisms

We saw in the previous section how homomorphisms $f \colon R \to S$ correspond to bimodules $M \colon R \mathrel{\bullet\!\to} S$ which are free on one generator as left $S$-modules. What happens if $M$ is free on $p$ generators? We might expect these to correspond to some kind of homomorphic relation from $R$ to $S$ associating to each $r \in R$, not a unique element of $S$ but rather $p$ of them. This is not exactly what happens, but we do get something interesting.

As a warm-up, let’s assume $M$ is free on two generators $m_1, m_2$ as a left $S$-module. Nothing is said about the right action (as before). Then for each $r \in R$ we get unique $s_{11}, s_{12}, s_{21}, s_{22} \in S$ such that
\[ m_1 r = s_{11} m_1 + s_{12} m_2, \qquad m_2 r = s_{21} m_1 + s_{22} m_2 . \]
Let’s denote $s_{ij}$ by $f_{ij}(r)$. So to each $r$ we associate not two but four elements of $S$! Of course the “same” is true if $M$ is free on $p$ generators $m_1, \dots, m_p$:
\[ m_i r = \sum_{j=1}^{p} f_{ij}(r) m_j . \]
Consider
\[ m_i (r r') = \sum_{j=1}^{p} f_{ij}(r r') m_j \]
and
\[ (m_i r) r' = \sum_{j=1}^{p} f_{ij}(r) m_j r' = \sum_{j=1}^{p} f_{ij}(r) \Bigl( \sum_{k=1}^{p} f_{jk}(r') m_k \Bigr) = \sum_{k=1}^{p} \Bigl( \sum_{j=1}^{p} f_{ij}(r) f_{jk}(r') \Bigr) m_k . \]
So $f_{ik}(r r') = \sum_{j=1}^{p} f_{ij}(r) f_{jk}(r')$, i.e. we get a homomorphism
\[ f \colon R \to \mathrm{Mat}_p(S) \]
into the ring of $p \times p$ matrices in $S$. This leads to the following:
Theorem 8 (a) Any matrix-valued homomorphism $f \colon R \to \mathrm{Mat}_p(S)$ induces an $S$-$R$-bimodule structure on $S^{(p)}$.
(b) Any $S$-$R$-bimodule $M \colon R \mathrel{\bullet\!\to} S$ which is free on $p$ generators as a left $S$-module is isomorphic (as an $S$-$R$-bimodule) to $S^{(p)}$ with $R$-action induced by a homomorphism $f \colon R \to \mathrm{Mat}_p(S)$ as in (a).
(c) The homomorphism $f$ in (b) is unique up to conjugation by an invertible $p \times p$ matrix $A$ in $\mathrm{Mat}_p(S)$.

Proof (a) Let $S^{(p)}$ denote the set of row vectors in $S$ of length $p$, i.e. $1 \times p$ matrices. Then for any element $s = [s_1, \dots, s_p] \in S^{(p)}$ let
\[ s' \bullet s = [s' s_1, \dots, s' s_p] \]
and
\[ s \bullet r = s f(r) \quad \text{(matrix multiplication)}. \]
The bimodule conditions are easily verified.

(b) We saw just above how to construct a homomorphism $f \colon R \to \mathrm{Mat}_p(S)$ from a bimodule $M \colon R \mathrel{\bullet\!\to} S$ with an $S$-basis $m_1, \dots, m_p$. It is uniquely determined by
\[ m \bullet r = f(r) m \]
with $m$ = column vector of the $m_i$’s. Because $M$ is free on $m_1, \dots, m_p$ as an $S$-module we already have an $S$-isomorphism
\[ \phi \colon S^{(p)} \to M, \qquad \phi(s) = s m . \]
Now make $S^{(p)}$ into an $S$-$R$-bimodule as in (a), i.e. $s \bullet r = s f(r)$. Then
\[ \phi(s \bullet r) = \phi(s f(r)) = s f(r) m = s m \bullet r = \phi(s) \bullet r , \]
so $\phi$ is an $S$-$R$-isomorphism.

(c) This is the usual change of bases calculation. It’s just a question of taking care to get everything on the right side. If $m'$ is another $S$-basis for $M$ we get an invertible $S$-matrix $A$ such that
\[ m' = A m \]
so if $f'$ is the homomorphism we get from $m'$, we have
\[ m' \bullet r = f'(r) m', \qquad A m \bullet r = f'(r) A m, \qquad m \bullet r = A^{-1} f'(r) A m \]
so
\[ f(r) = A^{-1} f'(r) A . \]
□
We’d like to think of a homomorphism $R \to \mathrm{Mat}(S)$ as a kind of homomorphic relation from $R$ to $S$. So let’s look at some special cases.

Example 9 (Pairs of homomorphisms) Let $f, g \colon R \to S$ be homomorphisms. Then we get a homomorphism $h \colon R \to \mathrm{Mat}_2(S)$ given by
\[ h(r) = \begin{bmatrix} f(r) & 0 \\ 0 & g(r) \end{bmatrix} . \]
In general, we have the subring of diagonal matrices
\[ S^{(p)} \subseteq \mathrm{Mat}_p(S) \]
so $p$ homomorphisms $f_i \colon R \to S$ give a matrix-valued homomorphism
\[ f \colon R \to \mathrm{Mat}_p(S) . \]

Example 10 (Derivations) Let $f \colon R \to S$ be a homomorphism and $d$ an $f$-derivation, i.e. an additive function $d \colon R \to S$ such that
\[ d(r r') = d(r) f(r') + f(r) d(r') . \]
Then we get a homomorphism $R \to \mathrm{Mat}_2(S)$
\[ r \longmapsto \begin{bmatrix} f(r) & 0 \\ d(r) & f(r) \end{bmatrix} . \]
In fact the set of matrices
\[ D = \left\{ \begin{bmatrix} s & 0 \\ s' & s \end{bmatrix} \;\middle|\; s, s' \in S \right\} \]
is a subring of $\mathrm{Mat}_2(S)$, and derivations correspond exactly to homomorphisms $R \to \mathrm{Mat}_2(S)$ that factor through $D$. More generally we can consider the subring of lower triangular matrices
\[ L = \left\{ \begin{bmatrix} s & 0 \\ s' & s'' \end{bmatrix} \;\middle|\; s, s', s'' \in S \right\} . \]
Then a homomorphism $R \to \mathrm{Mat}_2(S)$ that factors through $L$ corresponds to a pair of homomorphisms $f, g \colon R \to S$ and a derivation $d$ from $f$ to $g$, i.e. an additive function $d \colon R \to S$ such that
\[ d(r r') = d(r) f(r') + g(r) d(r') . \]
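As a small sanity check of Example 10, one can take $R$ to be integer polynomials and $S$ the integers, with $f$ evaluation at a point and $d$ the derivative evaluated at that point. This particular choice, the coefficient-list encoding and all function names below are assumptions made for illustration only.

```python
POINT = 2  # hypothetical evaluation point

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def f(p):
    """Evaluation at POINT: a ring homomorphism Z[x] -> Z."""
    return sum(c * POINT**k for k, c in enumerate(p))

def d(p):
    """Derivative evaluated at POINT: an f-derivation Z[x] -> Z."""
    return sum(k * c * POINT**(k - 1) for k, c in enumerate(p) if k > 0)

def h(p):
    """The induced map into lower triangular 2 x 2 matrices."""
    return [[f(p), 0], [d(p), f(p)]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

p = [1, 0, 3]   # 1 + 3x^2
q = [-2, 5]     # -2 + 5x
print(h(poly_mul(p, q)) == mat_mul(h(p), h(q)))  # multiplicativity on this pair
print(h([1]) == [[1, 0], [0, 1]])                # the unit is preserved
```

The lower left entry of the product is exactly $d(r) f(r') + f(r) d(r')$, so multiplicativity of $h$ is the Leibniz rule.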
Example 11 We give one more, somewhat mysterious, example to illustrate the variety of morphisms we get just in the $2 \times 2$ case. For any ring $S$ we can construct a ring of “complex numbers” over $S$:
\[ C(S) = \left\{ \begin{bmatrix} s & s' \\ -s' & s \end{bmatrix} \;\middle|\; s, s' \in S \right\} . \]
This is a subring of $\mathrm{Mat}_2(S)$. A homomorphism $R \to \mathrm{Mat}_2(S)$ that factors through $C(S)$ corresponds to two additive functions $\gamma, \sigma \colon R \to S$ with the properties
\[ \gamma(r r') = \gamma(r)\gamma(r') - \sigma(r)\sigma(r'), \qquad \sigma(r r') = \sigma(r)\gamma(r') + \gamma(r)\sigma(r') . \]
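The most familiar instance of Example 11 is perhaps $R$ the Gaussian integers and $S$ the integers, with $\gamma$ and $\sigma$ the real and imaginary parts. The sketch below assumes this instance and encodes $a + bi$ as the pair `(a, b)`; the names `gmul` and `h` are chosen here for illustration.

```python
def gmul(z, w):
    """Product of Gaussian integers encoded as pairs (a, b) = a + bi."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def h(z):
    """gamma(z) = a and sigma(z) = b assembled into a matrix of C(Z)."""
    a, b = z
    return [[a, b], [-b, a]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

z, w = (1, 2), (3, -4)
print(h(gmul(z, w)) == mat_mul(h(z), h(w)))  # the two identities for gamma, sigma
```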
Hopefully these examples will have convinced the reader that considering homomorphisms $R \to \mathrm{Mat}_p(S)$ as a kind of relation from $R$ to $S$ is an interesting idea worth pursuing.

If they really are a kind of morphism from $R$ to $S$ we should be able to compose them. To get an idea of how this might work, the previous theorem says that a homomorphism $f \colon R \to \mathrm{Mat}_p(S)$ corresponds to a bimodule $S^{(p)} \colon R \mathrel{\bullet\!\to} S$ and we know how to compose bimodules. So given another homomorphism $g \colon S \to \mathrm{Mat}_q(T)$ we get $T^{(q)} \colon S \mathrel{\bullet\!\to} T$ and if we compose these we get
\[ T^{(q)} \otimes_S S^{(p)} \cong T^{(q)} \otimes_S (\oplus_p S) \cong \oplus_p (T^{(q)} \otimes_S S) \cong T^{(pq)} . \]
So what we can expect is a graded composition, graded by the multiplicative monoid of positive integers $(\mathbb{N}^+, \cdot)$. If we are a bit more careful with the above isomorphisms we get an explicit description of the graded composition.

For homomorphisms $f \colon R \to \mathrm{Mat}_p(S)$ and $g \colon S \to \mathrm{Mat}_q(T)$, let $f_* = S^{(p)} \colon R \mathrel{\bullet\!\to} S$ and $g_* = T^{(q)} \colon S \mathrel{\bullet\!\to} T$ be the bimodules induced by $f$ and $g$ as in the above theorem. So we get a bimodule
\[ g_* \otimes_S f_* = T^{(q)} \otimes_S S^{(p)} \colon R \mathrel{\bullet\!\to} T . \]
Let $e_1, \dots, e_p$ be the standard basis for $S^{(p)}$ and $e'_1, \dots, e'_q$ the standard basis for $T^{(q)}$. Then the $e'_j \otimes e_i$ for $1 \le i \le p$, $1 \le j \le q$ form a basis for $T^{(q)} \otimes_S S^{(p)}$. We have
\[ (e'_j \otimes e_i) r = e'_j \otimes \Bigl( \sum_k f_{ki}(r) e_k \Bigr) = \sum_k e'_j \otimes f_{ki}(r) e_k = \sum_k e'_j f_{ki}(r) \otimes e_k = \sum_k \sum_l g_{lj}(f_{ki}(r)) \, e'_l \otimes e_k . \]
So we apply $f$ to $r$ to get a $p \times p$ matrix and then apply $g$ to each of the entries to get a block $p \times p$ matrix of $q \times q$ matrices. Ordering the basis $\{e'_j \otimes e_i\}$ will give us a $(pq) \times (pq)$ matrix. The ordering is arbitrary but a judicious choice will make calculations easier and the block matrix picture suggests just such a choice. We order them lexicographically from the right
\[ e'_1 \otimes e_1, \; e'_2 \otimes e_1, \; \dots, \; e'_q \otimes e_1, \; e'_1 \otimes e_2, \; \dots, \; e'_q \otimes e_p \]
i.e. $e'_j \otimes e_i$ is in the $j + q(i - 1)$ position. This gives us an isomorphism $\mathrm{Mat}_p \mathrm{Mat}_q(T) \cong \mathrm{Mat}_{pq}(T)$, and with the aid of this we can now compose $f$ with $g$ to get a homomorphism $R \to \mathrm{Mat}_{pq}(T)$. This composition enlarges the category of rings to an $(\mathbb{N}^+, \cdot)$-graded category. Of course associativity and the unit laws have to be proved, which is a bit messy, and a more general categorical approach will clarify things.

First of all for any $p$, $\mathrm{Mat}_p(R)$ is functorial in $R$, i.e. we have a family of functors $\mathrm{Mat}_p \colon \mathrm{Ring} \to \mathrm{Ring}$. Then the isomorphisms $\mathrm{Mat}_p \mathrm{Mat}_q(R) \cong \mathrm{Mat}_{pq}(R)$ are natural in $R$ and they satisfy an associativity condition giving us a graded monad. Graded monads were explicitly defined as such in [3] but certainly go back to Bénabou [1].

Definition 12 Let $(M, \cdot, 1)$ be a monoid. An $M$-graded monad consists of a category A and for each $m \in M$ an endofunctor $T_m \colon A \to A$, together with natural transformations
\[ \eta \colon 1_A \to T_1 \]
and
\[ \mu_{m,m'} \colon T_m T_{m'} \to T_{mm'} \]
satisfying unit laws
\[\begin{tikzcd}
T_m \arrow[r, "T_m \eta"] \arrow[d, "\eta T_m"'] \arrow[dr, "1_{T_m}" description] & T_m T_1 \arrow[d, "\mu_{m,1}"] \\
T_1 T_m \arrow[r, "\mu_{1,m}"'] & T_m
\end{tikzcd}\]
and associativity
\[\begin{tikzcd}
T_m T_{m'} T_{m''} \arrow[r, "\mu_{m,m'} T_{m''}"] \arrow[d, "T_m \mu_{m',m''}"'] & T_{mm'} T_{m''} \arrow[d, "\mu_{mm',m''}"] \\
T_m T_{m'm''} \arrow[r, "\mu_{m,m'm''}"'] & T_{mm'm''}
\end{tikzcd}\]
This is nothing but a lax functor
\[ T \colon M \to \mathrm{Cat} \]
where $M$ is the locally discrete one-object 2-category with 1-cells given by the elements of $M$.
Proposition 13 (1) For every $p \in \mathbb{N}^+$, $\mathrm{Mat}_p$ is a functor $\mathrm{Ring} \to \mathrm{Ring}$.
(2) There is a natural isomorphism $\mu_{p,q} \colon \mathrm{Mat}_p \mathrm{Mat}_q \to \mathrm{Mat}_{pq}$ for every $p, q \in \mathbb{N}^+$.
(3) The families $\langle \mathrm{Mat}_p \rangle_{p \in \mathbb{N}^+}$, $\langle \mu_{p,q} \rangle_{p,q \in \mathbb{N}^+}$ together with the canonical isomorphism $\eta \colon 1_{\mathrm{Ring}} \to \mathrm{Mat}_1$ form an $(\mathbb{N}^+, \cdot, 1)$-graded monad Mat.

Proof (1) $\mathrm{Mat}_p(f)$ is application of $f$ to a matrix entry-wise. This is obviously functorial.
(2) An element of $\mathrm{Mat}_p \mathrm{Mat}_q(R)$ is a $p \times p$ matrix of $q \times q$ matrices, and $\mu_{p,q}$ of such is just the $pq \times pq$ matrix we get by erasing the inside brackets. This is also obviously natural.
(3) It is also clear that the $\mu_{p,q}$ are associative, the only difference between the two calculations being the order in which we erase the brackets inside the “block block” matrix. The unit $\eta \colon 1_{\mathrm{Ring}} \to \mathrm{Mat}_1$ consists in putting square brackets around an element to make it a $1 \times 1$ matrix, so the unit laws are equally transparent. □

Given a graded monad $T = (\langle T_m \rangle, \eta, \langle \mu_{m,m'} \rangle)$ we can construct a graded Kleisli category $A_T$. The objects are those of A and a morphism of degree $m$, $(m, f) \colon A \to B$ in $A_T$ is a morphism $f \colon A \to T_m B$ in A. Composition
\[ A \xrightarrow{(m,f)} B \xrightarrow{(m',g)} C \]
is given by
\[ A \xrightarrow{f} T_m B \xrightarrow{T_m g} T_m T_{m'} C \xrightarrow{\mu_{m,m'}} T_{mm'} C \]
and units by $(1, \eta A) \colon A \to A$. That $A_T$ is a graded category is an easy calculation, just like for the usual Kleisli category.

We see now that our matrix-valued homomorphisms are exactly the Kleisli morphisms for the graded monad Mat. This gives a new, larger category of rings, $\mathrm{Ring}_{\mathrm{Mat}}$.
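The graded Kleisli composition for Mat can be sketched concretely: apply $f$, apply $g$ entry-wise, then erase the inner brackets. In the sketch below a graded morphism is a pair `(degree, function)`, matrices are nested lists, and the sample rings (Gaussian integers encoded as pairs, mapping to integer matrices) and all names are assumptions made for illustration.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def flatten(block):
    """mu_{p,q}: erase the inner brackets of a p x p matrix of q x q matrices."""
    p, q = len(block), len(block[0][0])
    return [[block[i][k][j][l] for k in range(p) for l in range(q)]
            for i in range(p) for j in range(q)]

def compose(first, second):
    """Kleisli composite of (p, f): R -> S followed by (q, g): S -> T."""
    (p, f), (q, g) = first, second
    def h(r):
        return flatten([[g(entry) for entry in row] for row in f(r)])
    return (p * q, h)

def gmul(z, w):
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def f(z):
    """Z[i] -> Mat_2(Z[i]): the pair (identity, conjugation) as in Example 9."""
    a, b = z
    return [[z, (0, 0)], [(0, 0), (a, -b)]]

def g(z):
    """Z[i] -> Mat_2(Z) as in Example 11."""
    a, b = z
    return [[a, b], [-b, a]]

degree, h = compose((2, f), (2, g))
z, w = (1, 2), (3, -4)
print(degree)                                # degrees multiply: 2 * 2
print(h(z))                                  # a 4 x 4 integer matrix
print(h(gmul(z, w)) == mat_mul(h(z), h(w)))  # the composite is multiplicative
```

The row index produced by `flatten` is $j + q\,i$ counting from zero, which agrees with the position $j + q(i-1)$ chosen above when counting from one.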
Remark 14 Graded monads have recently appeared in the computer science literature (see e.g. [3] and references there). Our Kleisli category is not the same as theirs where their grading is on the objects rather than on the morphisms. The theory of graded monads and its extension to double categories is very interesting but that would take us too far afield so we leave it for future work.
4 The graded double category of rings

We can extend the double category of rings by adding in the new graded morphisms. The double category $\mathrm{Ring}_{\mathrm{Mat}}$ has objects all rings but horizontal arrows are the graded ones, $(p, f) \colon R \to S$, i.e. $f \colon R \to \mathrm{Mat}_p(S)$. The vertical arrows are still bimodules $M \colon R \mathrel{\bullet\!\to} S$. A double cell
\[\begin{tikzcd}
R \arrow[r, "{(p,f)}"] \arrow[d, "\bullet" marking, "M"'] \arrow[dr, phantom, "\overset{\phi}{\Rightarrow}"] & R' \arrow[d, "\bullet" marking, "M'"] \\
S \arrow[r, "{(q,g)}"'] & S'
\end{tikzcd}\]
is a linear map (a cell in Ring)
\[\begin{tikzcd}
R \arrow[r, "f"] \arrow[d, "\bullet" marking, "M"'] \arrow[dr, phantom, "\overset{\phi}{\Rightarrow}"] & \mathrm{Mat}_p(R') \arrow[d, "\bullet" marking, "\mathrm{Mat}_{q,p}(M')"] \\
S \arrow[r, "g"'] & \mathrm{Mat}_q(S')
\end{tikzcd}\]
where $\mathrm{Mat}_{q,p}(M')$ is the bimodule of $q \times p$ matrices with entries in $M'$, with the $\mathrm{Mat}_q(S')$ action given by matrix multiplication on the left, and similarly for $\mathrm{Mat}_p(R')$.

Theorem 15 (1) $\mathrm{Ring}_{\mathrm{Mat}}$ is a double category.
(2) $\mathrm{Ring}_{\mathrm{Mat}}$ is vertically self dual, i.e. $\mathrm{Ring}_{\mathrm{Mat}}^{co} \cong \mathrm{Ring}_{\mathrm{Mat}}$.
(3) Every horizontal arrow has a companion.

Proof (1) This is a straightforward but long and uninformative calculation, best done in the context of graded monads, so is omitted here.

(2) This is not completely obvious because $\mathrm{Mat}_p(S)^{op}$ is not the same as $\mathrm{Mat}_p(S^{op})$. As sets they are the same but the multiplications are different. If we evaluate $A$ times $B$ in each of these, the $b$’s come before the $a$’s but in the first case it’s column times row and the reverse in the second. But they are isomorphic, the isomorphisms given by transpose
\[ t_S \colon \mathrm{Mat}_p(S)^{op} \to \mathrm{Mat}_p(S^{op}), \qquad A \longmapsto A^T . \]
If we denote the opposite product by $*$, then
\[ t_S(A * B) = t_S(BA) = (BA)^T \]
whose $(i, j)$th entry is
\[ \sum_k b_{jk} a_{ki} . \]
On the other hand
\[ t_S(A) t_S(B) = A^T B^T \]
whose $(i, j)$th entry is
\[ \sum_k a_{ki} * b_{jk} = \sum_k b_{jk} a_{ki} . \]
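The vertical involution
\[ \Theta \colon \mathrm{Ring}_{\mathrm{Mat}}^{co} \to \mathrm{Ring}_{\mathrm{Mat}} \]
is defined on objects by taking the opposite ring $\Theta(R) = R^{op}$ and on vertical arrows, $M \colon R \mathrel{\bullet\!\to} S$, it is as before:
\[ \Theta(M) \colon S^{op} \mathrel{\bullet\!\to} R^{op} \]
is $M$ considered as a left $R^{op}$ right $S^{op}$ bimodule. For a horizontal arrow $(p, f) \colon R \to S$, $\Theta(p, f) \colon R^{op} \to S^{op}$ is given by
\[ R^{op} \xrightarrow{f^{op}} \mathrm{Mat}_p(S)^{op} \xrightarrow{t_S} \mathrm{Mat}_p(S^{op}) . \]
For a cell
\[\begin{tikzcd}
R \arrow[r, "{(p,f)}"] \arrow[d, "\bullet" marking, "M"'] \arrow[dr, phantom, "\overset{\phi}{\Rightarrow}"] & R' \arrow[d, "\bullet" marking, "M'"] \\
S \arrow[r, "{(q,g)}"'] & S'
\end{tikzcd}\]
the cell
\[\begin{tikzcd}
S^{op} \arrow[r, "{\Theta(q,g)}"] \arrow[d, "\bullet" marking, "M"'] \arrow[dr, phantom, "\overset{\Theta(\phi)}{\Rightarrow}"] & S'^{op} \arrow[d, "\bullet" marking, "M'"] \\
R^{op} \arrow[r, "{\Theta(p,f)}"'] & R'^{op}
\end{tikzcd}\]
is given by
\[\begin{tikzcd}
S^{op} \arrow[r, "g"] \arrow[d, "\bullet" marking, "M"'] \arrow[dr, phantom, "\overset{\phi}{\Rightarrow}"] & \mathrm{Mat}_q(S')^{op} \arrow[r, "t_{S'}"] \arrow[d, "\bullet" marking, "\mathrm{Mat}_{q,p}(M')" description] \arrow[dr, phantom, "\overset{t_{M'}}{\Rightarrow}"] & \mathrm{Mat}_q(S'^{op}) \arrow[d, "\bullet" marking, "\mathrm{Mat}_{p,q}(M')"] \\
R^{op} \arrow[r, "f"'] & \mathrm{Mat}_p(R')^{op} \arrow[r, "t_{R'}"'] & \mathrm{Mat}_p(R'^{op})
\end{tikzcd}\]
Here $t_{M'}$ is also taking transpose.

(3) A horizontal arrow $(p, f) \colon R \to S$ is a homomorphism $f \colon R \to \mathrm{Mat}_p(S)$ and its companion is the bimodule
\[ f_* = S^{(p)} \colon R \mathrel{\bullet\!\to} S \]
introduced in the previous section. The binding cells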
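\[\begin{tikzcd}
R \arrow[r, "{(1,1_R)}"] \arrow[d, "\bullet" marking, "R"'] \arrow[dr, phantom, "\overset{\alpha}{\Rightarrow}"] & R \arrow[d, "\bullet" marking, "f_*"] \\
R \arrow[r, "{(p,f)}"'] & S
\end{tikzcd}
\quad \text{and} \quad
\begin{tikzcd}
R \arrow[r, "{(p,f)}"] \arrow[d, "\bullet" marking, "f_*"'] \arrow[dr, phantom, "\overset{\beta}{\Rightarrow}"] & S \arrow[d, "\bullet" marking, "S"] \\
S \arrow[r, "{(1,1_S)}"'] & S
\end{tikzcd}\]
are given by
\[\begin{tikzcd}
R \arrow[r, "\eta"] \arrow[d, "\bullet" marking, "R"'] \arrow[dr, phantom, "\overset{\alpha}{\Rightarrow}"] & \mathrm{Mat}_1(R) \arrow[d, "\bullet" marking, "\mathrm{Mat}_{p,1}(S^{(p)}) \cong \mathrm{Mat}_p(S)"] \\
R \arrow[r, "f"'] & \mathrm{Mat}_p(S)
\end{tikzcd}
\qquad
\begin{tikzcd}
R \arrow[r, "f"] \arrow[d, "\bullet" marking, "S^{(p)}"'] \arrow[dr, phantom, "\overset{\beta}{\Rightarrow}"] & \mathrm{Mat}_p(S) \arrow[d, "\bullet" marking, "\mathrm{Mat}_{1,p}(S) = S^{(p)}"] \\
S \arrow[r, "\eta"'] & \mathrm{Mat}_1(S)
\end{tikzcd}\]
\[ \alpha(r) = f(r), \qquad \beta(s) = s . \]
The verification of the linearity of $\alpha$ and $\beta$ and the binding equations are left to the reader. □

Corollary 16 In $\mathrm{Ring}_{\mathrm{Mat}}$, every horizontal arrow has a conjoint.

We can now describe the 2-cells in the 2-category Hor $\mathrm{Ring}_{\mathrm{Mat}}$ explicitly in terms of matrices.

Proposition 17 Given morphisms $(p, f)$ and $(q, g) \colon R \to S$ in Hor $\mathrm{Ring}_{\mathrm{Mat}}$, a 2-cell $\phi \colon (p, f) \to (q, g)$ is a $q \times p$ matrix $A$ with entries in $S$, such that for every $r \in R$,
\[ A f(r) = g(r) A . \]

Proof A 2-cell $\phi \colon (p, f) \to (q, g)$ is a double cell in $\mathrm{Ring}_{\mathrm{Mat}}$
\[\begin{tikzcd}
R \arrow[r, "{(p,f)}"] \arrow[d, "\bullet" marking, "R"'] \arrow[dr, phantom, "\overset{\phi}{\Rightarrow}"] & S \arrow[d, "\bullet" marking, "S"] \\
R \arrow[r, "{(q,g)}"'] & S
\end{tikzcd}\]
which is a double cell
\[\begin{tikzcd}
R \arrow[r, "f"] \arrow[d, "\bullet" marking, "R"'] \arrow[dr, phantom, "\overset{\phi}{\Rightarrow}"] & \mathrm{Mat}_p(S) \arrow[d, "\bullet" marking, "\mathrm{Mat}_{q,p}(S)"] \\
R \arrow[r, "g"'] & \mathrm{Mat}_q(S)
\end{tikzcd}\]
This is entirely determined by its value at 1:
\[ \phi(r) = \phi(1) f(r) = g(r) \phi(1) . \]
Take $A = \phi(1)$. □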
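The intertwining condition of Proposition 17 can be tested numerically on sample elements. In the sketch below $R$ is the Gaussian integers (encoded as pairs), $S$ the integers, $f$ the representation of Example 11 and $g$ its composite with complex conjugation; this choice, the finite sample set and the name `is_two_cell` are assumptions for illustration, and a check on samples is of course not a proof.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def f(z):
    a, b = z
    return [[a, b], [-b, a]]

def g(z):
    a, b = z
    return [[a, -b], [b, a]]   # f applied to the conjugate of z

def is_two_cell(A, f, g, samples):
    """Check A f(r) == g(r) A on the given sample elements r."""
    return all(mat_mul(A, f(r)) == mat_mul(g(r), A) for r in samples)

samples = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]
A = [[1, 0], [0, -1]]
identity = [[1, 0], [0, 1]]

print(is_two_cell(A, f, g, samples))         # A intertwines f and g
print(is_two_cell(identity, f, g, samples))  # the identity matrix does not
```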
Remark 18 We only looked at free modules of finite rank, partly in preparation for the next section, but we can also consider infinite rank ones. Then the matrices are row finite, i.e. for every i, aij = 0 except for finitely many j. Now the distinction between row vectors and column vectors is clear. The first have finite support whereas the second are arbitrary. The double category we would get this way would not be vertically self dual. Every horizontal arrow would still have a companion, the coproduct of copies of the codomain. But conjoints don’t always exist. There is a candidate for the conjoint, the product of copies of the codomain, but only one of the conjunctions generalizes.
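5 Adjoint bimodules

If, in a double category A, a horizontal arrow $f \colon A \to B$ has a companion $f_*$ and a conjoint $f^*$ then $f_*$ is left adjoint to $f^*$ in Vert A. The unit and counit of the adjunction are given by
\[\begin{tikzcd}
A \arrow[r, "1_A"] \arrow[d, "\bullet" marking, "\mathrm{id}_A"'] \arrow[dr, phantom, "\alpha"] & A \arrow[d, "\bullet" marking, "f_*"] \\
A \arrow[r, "f" description] \arrow[d, "\bullet" marking, "\mathrm{id}_A"'] \arrow[dr, phantom, "\psi"] & B \arrow[d, "\bullet" marking, "f^*"] \\
A \arrow[r, "1_A"'] & A
\end{tikzcd}
\quad \text{and} \quad
\begin{tikzcd}
B \arrow[r, "1_B"] \arrow[d, "\bullet" marking, "f^*"'] \arrow[dr, phantom, "\chi"] & B \arrow[d, "\bullet" marking, "\mathrm{id}_B"] \\
A \arrow[r, "f" description] \arrow[d, "\bullet" marking, "f_*"'] \arrow[dr, phantom, "\beta"] & B \arrow[d, "\bullet" marking, "\mathrm{id}_B"] \\
B \arrow[r, "1_B"'] & B
\end{tikzcd}\]

Definition 19 We say that an object $B$ of a double category A is Cauchy complete if every vertical arrow $v \colon A \mathrel{\bullet\!\to} B$ with a right adjoint is the companion of a horizontal arrow. We say that A is Cauchy if every object is Cauchy complete.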
Remark 20 The notion of Cauchy completeness for enriched categories (which we are extending to double categories here) was introduced by Lawvere in [6].

One might ask then, is the double category $\mathrm{Ring}_{\mathrm{Mat}}$ Cauchy? Not quite, but almost. And this leads to a further generalization of morphism of rings.

Recall that two bimodules $M \colon R \mathrel{\bullet\!\to} S$ and $N \colon S \mathrel{\bullet\!\to} R$ are adjoint, or more precisely $M$ is left adjoint to $N$, if there are an $S$-$S$ linear map
\[ \epsilon \colon M \otimes_R N \to S \]
and an $R$-$R$ linear map
\[ \eta \colon R \to N \otimes_S M \]
such that
\[\begin{tikzcd}
& M \otimes_R N \otimes_S M \arrow[dr, "\epsilon \otimes_S M"] & \\
M \otimes_R R \arrow[ur, "M \otimes_R \eta"] \arrow[r, "\cong"'] & M \arrow[r, "\cong"'] & S \otimes_S M
\end{tikzcd}\]
and
\[\begin{tikzcd}
& N \otimes_S M \otimes_R N \arrow[dr, "N \otimes_S \epsilon"] & \\
R \otimes_R N \arrow[ur, "\eta \otimes_R N"] \arrow[r, "\cong"'] & N \arrow[r, "\cong"'] & N \otimes_S S
\end{tikzcd}\]
commute. The following theorem is well-known.

Theorem 21 A bimodule $M \colon R \mathrel{\bullet\!\to} S$ has a right adjoint if and only if it is finitely generated and projective as a left $S$-module.

It is easier to give a proof than to hunt down a reference which gives it in the precise form we want. We do this after some preliminary remarks. $M$ is finitely generated, by $m_1, \dots, m_p$ say, if and only if the $S$-linear map
\[ \tau \colon S^{(p)} \to M, \qquad \tau(s_1, \dots, s_p) = \sum_{i=1}^{p} s_i m_i \]
is surjective. If $M$ is $S$-projective, then $\tau$ splits, i.e. there is an $S$-linear map
\[ \sigma \colon M \to S^{(p)} \]
such that $\tau\sigma = 1_M$. In fact, $M$ is a finitely generated and projective $S$-module if and only if there exist $p, \tau, \sigma$ such that $\tau\sigma = 1_M$.

Let the components of $\sigma$ be $\sigma_1, \dots, \sigma_p \colon M \to S$. Then $\tau\sigma = 1_M$ means that for every $m \in M$ we will have
\[ m = \sum_{i=1}^{p} \sigma_i(m) m_i \]
i.e. the $\sigma_i$ provide an $S$-linear choice of coordinates for $m$ relative to the generators $m_1, \dots, m_p$. All of this is independent of $R$.

Proof (Of theorem) Suppose $M$ is left adjoint to $N$ with notation as above. Then $\eta(1) = \sum_{i=1}^{p} n_i \otimes m_i$. The first triangle equation gives for any $m$
\[ \sum_{i=1}^{p} \epsilon(m \otimes n_i) m_i = m . \]
So we take $\tau \colon S^{(p)} \to M$ to be
\[ \tau(s) = \sum_{i=1}^{p} s_i m_i \]
and
\[ \sigma_i(m) = \epsilon(m \otimes n_i) . \]
Then $\tau\sigma = 1_M$ and $M$ is finitely generated and projective.

Conversely, take $N = M^* = \mathrm{Hom}_S(M, S)$. We immediately get
\[ \epsilon \colon M \otimes_R N \to S, \qquad \epsilon(m \otimes f) = f(m) . \]
If $M$ is finitely generated and projective, we have $\tau \colon S^{(p)} \to M$ and $\sigma \colon M \to S^{(p)}$. Define $\eta \colon R \to N \otimes_S M$ by
\[ \eta(1) = \sum_{i=1}^{p} \sigma_i \otimes \tau(e_i) . \]
The triangle equations are easily checked. □
Given this theorem then, we see that $S$ is Cauchy-complete in $\mathrm{Ring}_{\mathrm{Mat}}$ if and only if every finitely generated projective left $S$-module is free. Commutative rings with this property are of considerable interest in algebraic geometry, having to do with when vector bundles are trivial. If $S$ is a PID or a local ring then it is Cauchy complete. That polynomial rings are so is the content of the Quillen–Suslin theorem, which is highly non-trivial. The fact that Cauchy completeness in $\mathrm{Ring}_{\mathrm{Mat}}$ leads to such questions gives some legitimacy to this double category.

Finitely generated projective is the next best thing to free of finite rank, so how does this relate to the previous sections? For any $r$ we can write
\[ m_i r = \sum_{j=1}^{p} \sigma_j(m_i r) m_j . \]
If we let $f_{ij}(r) = \sigma_j(m_i r)$ we get
\[ m_i r = \sum_{j=1}^{p} f_{ij}(r) m_j \qquad (*) \]
the same formula as in Section 3.

Theorem 22 (1) The functions $f_{ij}$ define a non-unitary homomorphism $f \colon R \to \mathrm{Mat}_p(S)$.
(2) Any such homomorphism comes from a bimodule which is finitely generated and projective as a left $S$-module.

Proof (1) $f$ is clearly additive. Multiply $(*)$ by $r'$ on the right and apply $\sigma_k$ to get
\[ \sigma_k(m_i r r') = \sum_{j=1}^{p} f_{ij}(r) \sigma_k(m_j r') \]
i.e.
\[ f_{ik}(r r') = \sum_{j=1}^{p} f_{ij}(r) f_{jk}(r') . \]
Thus $f$ is a homomorphism $R \to \mathrm{Mat}_p(S)$. We have $f_{ij}(1) = \sigma_j(m_i)$ and so $f(1)$ corresponds to the linear map $\sigma\tau \colon S^{(p)} \to S^{(p)}$, which is not the identity unless the $m_i$ form a basis.

(2) Given a homomorphism $f \colon R \to \mathrm{Mat}_p(S)$, define $M$ by
\[ M = \{ s \in S^{(p)} \mid s f(1) = s \} . \]
$M$ is an $S$-$R$-bimodule. First of all it’s clearly a sub left $S$-module of $S^{(p)}$. Define the right action of $R$ by
\[ s \bullet r = s f(r) . \]
We have $s f(r) f(1) = s f(r1) = s f(r)$ so $s f(r) \in M$. The bimodule equations automatically hold because $f$ is a homomorphism: the only thing to check is $s \bullet 1 = s$, i.e. $s f(1) = s$, which is in the definition of $M$.

Define $\tau \colon S^{(p)} \to M$ by $\tau(s) = s f(1)$ and let $\sigma \colon M \to S^{(p)}$ be the inclusion. Clearly $\tau\sigma = 1_M$, so $M$ is finitely generated projective as a left $S$-module. The generators are $\tau(e_i) = e_i f(1)$. Now let $g \colon R \to \mathrm{Mat}_p(S)$ be the homomorphism defined by
\[ g_{ij}(r) = \sigma_j(e_i f(1) \bullet r) \]
as in the discussion just before the statement of the theorem. Then
\[ g_{ij}(r) = \sigma_j(e_i f(1) f(r)) = \sigma_j(e_i f(r)) \]
which is the $j$th component of the $i$th row of $f(r)$, i.e. $g_{ij}(r) = f_{ij}(r)$. □
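The construction in part (2) can be followed on a toy example. The sketch below assumes the hypothetical non-unitary homomorphism $f(n) = nE$ from the integers to $2 \times 2$ integer matrices, where $E$ is an idempotent matrix chosen for illustration; all names are assumptions of the example.

```python
E = [[1, 1], [0, 0]]   # an idempotent: E * E == E

def f(n):
    """Non-unitary homomorphism Z -> Mat_2(Z), n |-> nE, so f(1) = E."""
    return [[n * x for x in row] for row in E]

def vec_mat(s, A):
    """Row vector times matrix."""
    return [sum(s[i] * A[i][j] for i in range(len(s))) for j in range(len(A[0]))]

def tau(s):
    """tau(s) = s f(1), the projection of S^(p) onto M."""
    return vec_mat(s, f(1))

def in_M(s):
    """Membership in M = { s | s f(1) = s }."""
    return vec_mat(s, f(1)) == s

def act(s, r):
    """Right action s . r = s f(r)."""
    return vec_mat(s, f(r))

basis = [[1, 0], [0, 1]]
r = 4
# g_ij(r) = j-th component of e_i f(1) . r, assembled row by row
g_of_r = [act(tau(e), r) for e in basis]

print(in_M(tau([3, 5])))         # tau lands in M
print(in_M(act(tau([3, 5]), r))) # M is closed under the right action
print(g_of_r == f(r))            # g recovers f, as in the proof
```

Here $M$ consists of the row vectors $[a, a]$, a rank one free module, although $f(1)$ is not the identity matrix.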
Homomorphisms $R \to \mathrm{Mat}_p(S)$ have already appeared in the quantum field theory literature (see e.g. [7]) where they are called amplifying homomorphisms or amplimorphisms for short.

Let’s define the double category Ampli whose objects are rings (with 1), horizontal arrows are amplimorphisms $R \to S$, i.e. non-unitary homomorphisms $R \to \mathrm{Mat}_p(S)$ for some $p$. Composition is like for $\mathrm{Ring}_{\mathrm{Mat}}$: first apply $f$ to an element $r \in R$ to get a $p \times p$ matrix in $S$, and then apply $g$ to each entry separately to get a $p \times p$ block matrix of $q \times q$ matrices, and then consider this as a $(pq) \times (pq)$ matrix. Vertical arrows are bimodules $M \colon R \mathrel{\bullet\!\to} S$ and cells
\[\begin{tikzcd}
R \arrow[r, "{(p,f)}"] \arrow[d, "\bullet" marking, "M"'] \arrow[dr, phantom, "\overset{\phi}{\Rightarrow}"] & R' \arrow[d, "\bullet" marking, "M'"] \\
S \arrow[r, "{(q,g)}"'] & S'
\end{tikzcd}\]
are cells
\[\begin{tikzcd}
R \arrow[r, "f"] \arrow[d, "\bullet" marking, "M"'] \arrow[dr, phantom, "\overset{\phi}{\Rightarrow}"] & \mathrm{Mat}_p(R') \arrow[d, "\bullet" marking, "\mathrm{Mat}_{q,p}(M')"] \\
S \arrow[r, "g"'] & \mathrm{Mat}_q(S')
\end{tikzcd}\]
i.e. additive functions $\phi \colon M \to \mathrm{Mat}_{q,p}(M')$ such that for every $m \in M$, $r \in R$, $s \in S$ we have
\[ \phi(mr) = \phi(m) f(r), \qquad \phi(sm) = g(s) \phi(m) . \]
Here we have taken the definition of cells to be the same as for $\mathrm{Ring}_{\mathrm{Mat}}$, which doesn’t refer to identities at all and doesn’t need modification. One could instead define cells as $S$-$R$-linear maps from $M$ into “$\mathrm{Mat}_{q,p}(M')$ with scalars restricted” along $f$ and $g$. For non-unitary homomorphisms, restriction of scalars doesn’t work exactly as for unitary ones. A modification is required to ensure the unit conditions for the action. One has to look at $S$-$R$-linear maps from $M$ into
\[ \{ A \in \mathrm{Mat}_{q,p}(M') \mid A f(1) = A = g(1) A \} . \]
However this is easily seen to be equivalent to the definition we have given.

Theorem 23 (1) Ampli is a double category.
(2) Ampli is vertically self dual.
(3) Every horizontal arrow has a companion and a conjoint.
(4) Every adjoint pair of vertical arrows is represented by a horizontal one, i.e. Ampli is Cauchy.
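Proof (1) Horizontal composition of arrows and cells is the same as for $\mathrm{Ring}_{\mathrm{Mat}}$, as is vertical composition. We check that the vertical composition of cells, as given in $\mathrm{Ring}_{\mathrm{Mat}}$, is well-defined even if $g$ is not unitary. Consider the composition of
\[\begin{tikzcd}
R \arrow[r, "{(p,f)}"] \arrow[d, "\bullet" marking, "M"'] \arrow[dr, phantom, "\overset{\phi}{\Rightarrow}"] & R' \arrow[d, "\bullet" marking, "M'"] \\
S \arrow[r, "{(q,g)}" description] \arrow[d, "\bullet" marking, "N"'] \arrow[dr, phantom, "\overset{\psi}{\Rightarrow}"] & S' \arrow[d, "\bullet" marking, "N'"] \\
T \arrow[r, "{(l,h)}"'] & T'
\end{tikzcd}\]
It is given by
\[\begin{tikzcd}
R \arrow[r, "f"] \arrow[d, "\bullet" marking, "N \otimes_S M"'] \arrow[dr, phantom, "\overset{\psi \otimes_g \phi}{\Rightarrow}"] & \mathrm{Mat}_p(R') \arrow[d, "\bullet" marking, "\mathrm{Mat}_{l,q}(N') \otimes_{\mathrm{Mat}_q(S')} \mathrm{Mat}_{q,p}(M')"] \\
T \arrow[r, "h"'] & \mathrm{Mat}_l(T')
\end{tikzcd}\]
followed by the canonical
\[ \mathrm{Mat}_{l,q}(N') \otimes_{\mathrm{Mat}_q(S')} \mathrm{Mat}_{q,p}(M') \to \mathrm{Mat}_{l,p}(N' \otimes_{S'} M') . \]
$\psi \otimes_g \phi$ is defined by
\[ (\psi \otimes_g \phi)(n \otimes_S m) = \psi(n) \otimes_{\mathrm{Mat}_q(S')} \phi(m) \]
and the only place that $g$ enters is in the equation
\[ (\psi \otimes_g \phi)(ns \otimes_S m) = (\psi \otimes_g \phi)(n \otimes_S sm) \]
i.e.
\[ \psi(ns) \otimes_{\mathrm{Mat}_q(S')} \phi(m) = \psi(n) \otimes_{\mathrm{Mat}_q(S')} \phi(sm) \]
or
\[ \psi(n) g(s) \otimes_{\mathrm{Mat}_q(S')} \phi(m) = \psi(n) \otimes_{\mathrm{Mat}_q(S')} g(s) \phi(m) \]
which clearly holds. “Unitarity” does not enter into it.

(2) The vertical duality is the same as for $\mathrm{Ring}_{\mathrm{Mat}}$, i.e. taking the opposite ring and adjusting the horizontal arrows by the use of transpose.

(3) Given an amplimorphism $(p, f) \colon R \to S$, we’ve already constructed its companion in Theorem 22:
\[ (p, f)_* = \{ s \in S^{(p)} \mid s f(1) = s \} . \]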
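We have just to show that it's actually a companion. Define the binding cells $\alpha$ and $\beta$ as follows. The cell $\alpha$ has top $(1, 1_R) : R \to R$, bottom $(p,f) : R \to S$, left side the bimodule $R$ and right side $(p,f)_*$. The cell $\beta$ has top $(p,f) : R \to S$, bottom $(1, 1_S) : S \to S$, left side $(p,f)_*$ and right side the bimodule $S$. They are the linear maps

$\alpha : R \to \mathrm{Mat}_{p,1}((p,f)_*)$, $\alpha(r) = f(r)$,

$\beta : (p,f)_* \to \mathrm{Mat}_{1,p}(S)$, $\beta(s) = s$.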
In the definition of $\alpha$, note that $\mathrm{Mat}_{p,1}((p,f)_*)$ is a p × 1 matrix of 1 × p matrices so can be identified with a p × p matrix. We have to check that the rows satisfy the defining property of $(p,f)_*$, namely $s f(1) = s$. This is done simultaneously for all rows by $f(r)f(1) = f(r)$ (the ith row is $e_i f(r) f(1) = e_i f(r)$). The existence of conjoints is dual.

(4) This is just a formal summary of the preceding discussion: M has a right adjoint if and only if it is finitely generated and projective and this holds if and only if it is induced by a non-unitary homomorphism into a matrix ring. □

We can now describe explicitly, in terms of matrices, the 2-category Ampli = Hor Ampli. The objects are rings with 1, the morphisms are pairs $(p, f)$ where p is a positive integer and $f : R \to \mathrm{Mat}_p(S)$ is a (not necessarily unitary) homomorphism. Composition $(p', f')(p, f)$ is $(p'p, h)$ where

$h = \bigl(R \xrightarrow{\,f\,} \mathrm{Mat}_p(S) \xrightarrow{\,\mathrm{Mat}_p(f')\,} \mathrm{Mat}_p(\mathrm{Mat}_{p'}(T)) \cong \mathrm{Mat}_{p'p}(T)\bigr)$.

A 2-cell $\phi : (p, f) \Rightarrow (q, g)$ is an R-R-linear map $\phi : R \to \mathrm{Mat}_{q,p}(S)$, where R acts on $\mathrm{Mat}_{q,p}(S)$ on the right through f and on the left through g, which is uniquely determined by its value on 1, as
$\phi(r) = \phi(1 \cdot r) = \phi(1) f(r) = \phi(r \cdot 1) = g(r)\phi(1)$.

Given any q × p matrix A such that $A f(r) = g(r) A$ for all r, the function $\phi(r) = A f(r)$ gives such a cell. However, different A's may give the same $\phi$. Indeed $(A f(1)) f(r) = A f(r)$. To get uniqueness we have only to impose the extra condition $A f(1) = A$. Note that the vertical identity transformation on f is not $I_p$, the identity p × p matrix, which obviously doesn't satisfy this last condition, but rather $f(1)$ itself. We summarize this discussion in the following.

Proposition 24 The 2-category Ampli of amplifying homomorphisms has unitary rings as objects, amplimorphisms $(p, f) : R \to S$ as morphisms and, as 2-cells $\phi : (p, f) \Rightarrow (q, g)$, q × p matrices A such that
(1) $A f(1) = A$,
(2) for every $r \in R$, $A f(r) = g(r) A$.
The identity 2-cell on $(p, f)$ is the p × p matrix $f(1)$.

Corollary 25 Two representations $(p, f)$ and $(q, g)$ of the same S-R-bimodule are related as follows: there is a q × p matrix A and a p × q matrix B such that
(1) $A f(1) = A$ and $A f(r) = g(r) A$,
(2) $B g(1) = B$ and $B g(r) = f(r) B$,
(3) $AB = g(1)$ and $BA = f(1)$.
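The conditions of Proposition 24 and Corollary 25 can be checked by hand on a very small case. The sketch below is only an illustration under assumptions that are not in the text: it takes R = S = the integers, the hypothetical non-unitary homomorphism f(r) = diag(r, 0) of degree 2, and the identity homomorphism g of degree 1. The helper names `matmul` and `is_two_cell` are made up for the example, and condition (2) is only tested on a finite sample of ring elements.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def f(r):
    """Hypothetical non-unitary homomorphism Z -> Mat_2(Z): f(1) is not I_2."""
    return [[r, 0], [0, 0]]

def g(r):
    """The identity homomorphism, seen as Z -> Mat_1(Z)."""
    return [[r]]

def is_two_cell(A, f, g, samples):
    """Conditions (1) and (2) of Proposition 24, tested on a finite sample."""
    return matmul(A, f(1)) == A and all(
        matmul(A, f(r)) == matmul(g(r), A) for r in samples)

A = [[1, 0]]        # 1 x 2 matrix: a candidate 2-cell (2, f) => (1, g)
B = [[1], [0]]      # 2 x 1 matrix: a candidate 2-cell (1, g) => (2, f)
samples = range(-3, 4)

print(is_two_cell(A, f, g, samples), is_two_cell(B, g, f, samples))
print(matmul(A, B) == g(1), matmul(B, A) == f(1))
```

In this toy case A and B satisfy the three conditions of Corollary 25, so (2, f) and (1, g) would represent the same bimodule, while the identity matrix I_2 fails condition (1) for f, in line with the remark that the identity 2-cell is f(1).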
Postscript Double category considerations have naturally led to generalizing homomorphisms to amplimorphisms which arose independently in connection with quantum field theory. We also discovered a natural notion of 2-cell allowing us to compare parallel amplimorphisms. These are called intertwiners in the physics literature. Even if we restrict to actual homomorphisms the 2-cells are not trivial and provide a good unifying notion. Amplimorphisms of degree 1 are non-unitary homomorphisms. I don’t know what Jim would make of that, but later in life he had turned his attention to quantum mechanics, so I like to believe that he would be pleased with these developments.
References
1. Bénabou, J., Introduction to Bicategories, in Reports of the Midwest Category Seminar, Lecture Notes in Math., no. 47 (1967), Springer Verlag, 1-77.
2. Ehresmann, C., Catégories et structures, Dunod, Paris, 1965.
3. Fujii, S., Katsumata, S., Melliès, P.-A., Towards a Formal Theory of Graded Monads. In: Jacobs, B., Löding, C. (eds) Foundations of Software Science and Computation Structures. FoSSaCS 2016. Lecture Notes in Computer Science, vol 9634. Springer, Berlin, Heidelberg.
4. Grandis, M., Paré, R., Limits in double categories, Cahiers Topologie Géom. Différentielle Catég. 40 (1999), 162-220.
5. Lambek, J., Lectures on Rings and Modules, Blaisdell Publishing, 1966.
6. Lawvere, F. W., Metric spaces, generalized logic, and closed categories, Rendiconti del seminario matématico e fisico di Milano, XLIII (1973), 135-166.
7. Szlachanyi, K., Vecsernyes, K., Quantum symmetry and braid group statistics in G-spin models, Commun. Math. Phys. 156 (1993), 127-168.
Pomset Logic
The other approach to noncommutativity in logic

Christian Retoré
Abstract Thirty years ago, I introduced a noncommutative variant of classical linear logic, called pomset logic, coming from a particular categorical interpretation of linear logic known as coherence spaces. In addition to the usual commutative multiplicative connectives of linear logic, pomset logic includes a noncommutative connective, “$\triangleleft$”, called before, which is associative and self-dual: $(A \triangleleft B)^\perp = A^\perp \triangleleft B^\perp$. The conclusion of a pomset logic proof is a Partially Ordered Multiset of formulas. Pomset logic enjoys a proof net calculus with cut-elimination, denotational semantics, and faithfully embeds sequent calculus. The study of pomset logic has reopened with recent results on handsome proof nets, on its sequent calculus, or on its follow-up calculi like deep inference by Guglielmi and Straßburger. Therefore, it is high time we published a thorough presentation of pomset logic, including published and unpublished material, old and new results. Pomset logic (1993) is a noncommutative variant of linear logic (1987), as is Lambek calculus (1958!), and it can also be used as a grammatical formalism. Those two calculi are quite different, but we hope that the algebraic presentation we give here, with formulas as algebraic terms and with a semantic notion of proof (net) correctness, better matches Lambek's view of what a logic should be.
Christian Retoré, LIRMM, Univ Montpellier, CNRS, Montpellier, France, e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. C. Casadio and P. J. Scott (eds.), Joachim Lambek: The Interplay of Mathematics, Logic, and Linguistics, Outstanding Contributions to Logic 20, https://doi.org/10.1007/978-3-030-66545-6_9

1 Presentation

Lambek used to refer to his logic [19] with the words syntactic calculus, thus expressing his preference for algebra, thereafter confirmed with his move from
categorial grammars to pregroup grammars, which are not a logical system. Up until the invention of linear logic in the late 1980s, the Lambek calculus was a rather isolated logical system, despite some study of frame semantics, which are typical of substructural logics. Linear logic [8] arose from the study of the denotational semantics of system F, itself arising from the study of ordinals [7]. For interpreting system F (second order lambda calculus) with variable types, one needed to refine the categorical interpretation of simply typed lambda calculus via cartesian closed categories. In order to quantify over types, Girard considered the category of coherence spaces, initially called qualitative domains, with stable maps, which preserve directed joins and pullbacks. A finer study of coherence spaces led Girard to decompose the arrow type construction into two steps: one is to contract several objects of type A into one (modality/exponential !) and the other one is linear implication (noted $\multimap$), which corresponds to a change of state rather than to a consequence relation. Linear logic was first viewed as a proof system (sequent calculus or proof nets) that is well interpreted by coherence spaces. The initial article [8] also included the definition of phase semantics, which resembles frame semantics developed for the Lambek calculus. It was not long before the connection between linear logic and Lambek calculus was found: after some early remarks by Girard, Yetter [56] observed the connection at the semantic level, Abrusci [1] explored the syntactic, proof theoretical connection, and [39] explored proof nets and completed the insight of [52]. Basically, Lambek calculus is noncommutative intuitionistic multiplicative logic, the order between the two restrictions, intuitionistic and noncommutative, being independent.
An important remark that I discussed with Lamarche in [18] states that noncommutativity requires linearity in order to get a proper logical calculus. Around 1988, my PhD advisor Jean-Yves Girard drew my attention to a binary noncommutative connective in coherence spaces. In coherence spaces, this connective $\triangleleft$, called “before”, has intriguing properties:
• $\triangleleft$ is self-dual: $(A \triangleleft B)^\perp \equiv (A^\perp \triangleleft B^\perp)$, without swapping the two components (by $X \equiv Y$ we mean that there is a pair of canonical invertible linear maps between X and Y);
• $\triangleleft$ is noncommutative: $(A \triangleleft B) \not\equiv (B \triangleleft A)$;
• $\triangleleft$ is associative: $((A \triangleleft B) \triangleleft C) \equiv (A \triangleleft (B \triangleleft C))$;
• it lies in between the commutative conjunction $\otimes$ and disjunction $\parr$: there is a canonical linear map from $(A \otimes B)$ to $(A \triangleleft B)$ and one from $(A \triangleleft B)$ to $(A \parr B)$ (remember that coherence spaces validate the mix rule $(A \otimes B) \multimap (A \parr B)$).

I designed a proof net calculus with this connective, in which a sequent that is the conclusion of a proof is a partially ordered multiset of formulas. This proof net calculus enjoys cut-elimination and a sound and faithful (coherence) semantics, in the sense that having an interpretation which is preserved under cut-elimination is the same as being syntactically correct, cf. Theorem 6.
I proposed a version of sequent calculus that easily translates into those proof nets and enjoys cut-elimination as well [36]. However, despite many attempts by me and others (Sylvain Pogodalla, Lutz Straßburger) over many years, we did not find a sequent calculus that would be complete w.r.t. the proof nets. Later on, Alessio Guglielmi, soon joined by Lutz Straßburger, designed the calculus of structures, a term calculus more flexible than sequent calculus (deep inference) with the before connective (called “seq”) [10, 11, 14], a system that is quite close to the dicograph rewriting of this paper [45, 46]. They tried to see whether one of their systems, called SBV, was equivalent to pomset logic; here we include in Section 7.2 a new result stating that deep pomset logic, i.e., a rewrite view of pomset logic proof nets, corresponds to SBV, as well as an old result of ours stating that SBV tautologies are provable formulas of pomset proof nets. As a reviewer of my habilitation [47] Lambek wrote:

He constructs a model of linear logic using graphs, which is new to me. His most original contribution is probably the new binary connective which he has added to his noncommutative version of linear logic, although I did not find where it is treated in the sequent calculus. (J. Lambek, Dec. 3, 2001)
I deliberately omitted my work on sequent calculus in my habilitation manuscript, because none of the sequent calculi I experimented with was complete w.r.t. pomset proof nets, which are “perfect”, i.e., enjoy all the expected proof theoretical properties. In addition, at that time I did not yet have a counterexample to my proposal of a sequent calculus; the one in Figure 10 of Section 6 was found ten years later with Lutz Straßburger. However, very recently, Slavnov found a sequent calculus that is complete w.r.t. pomset proof nets [54]. The structure of the decorated sequents that Slavnov uses is rather complex¹ and the connective $\triangleleft$ is viewed as the identification of two dual connectives, one being more like a $\otimes$ and the other more like a $\parr$. As this work is not mine I shall not say much about it, but Slavnov's work really sheds new light on pomset logic. Given the complexity of Slavnov's sequent calculus, it is enjoyable to provide here a simple sequent calculus for pomset logic, even though it does not generate all proof nets of pomset logic.

Pomset logic and the Lambek calculus systems share some properties:
• They both are linear calculi;
• They both handle noncommutative connective(s) and structured sequents;
• They both have a sequent calculus;
• They both enjoy cut-elimination;
• They both have a complete sequent calculus (regarding pomset logic, the complete sequent calculus is quite new);
• They both can be used as a grammatical system.

However, Lambek calculus and pomset logic are quite different in many respects:
• Lambek calculus is naturally an intuitionistic calculus while pomset logic is naturally a classical calculus, although in both cases variants of the other kind can be defined: cyclic linear logic of Yetter is a classical version of Lambek calculus [56], and the LLMS system of Reddy is a term calculus for the semantics of higher-order imperative languages, which can be considered as an intuitionistic version of pomset logic [33].²
• Lambek calculus is a restriction of the usual multiplicative linear logic according to which the connectives are no longer commutative, while pomset logic is an extension of usual commutative multiplicative linear logic with a noncommutative connective.
• Lambek calculus deals with totally ordered multisets of hypotheses while pomset logic deals with partially ordered multisets of formulas. As grammatical systems, pomset logic allows relatively free word order, while Lambek calculus deals only with linear word orders.
• Lambek calculus has an elegant truth-value interpretation within the subsets of a monoid (frame semantics, phase semantics), while there is no such notion for pomset logic.
• Lambek calculus has no simple concrete interpretation of proofs up to cut-elimination (denotational semantics) while coherence semantics faithfully interprets the proofs of pomset logic.

This list shows that those two comparable systems also have many differences. However, the presentation of pomset logic provided by the present article makes Lambek calculus and pomset logic rather close on an abstract level. As he often said, Lambek did not like standard graphical or geometrical presentations of linear logic such as proof nets.

¹ A decorated sequent according to Slavnov is a multiset Γ of pomset formulas $A_1, \ldots, A_n$ with $p \le n/2$ binary relations $(R_k)$ for $1 \le k \le p$ between sequences of length k of formulas from Γ; those relations are such that whenever $(B_1, \ldots, B_k) R_k (C_1, \ldots, C_k)$ the two sequences $(B_1, \ldots, B_k)$ and $(C_1, \ldots, C_k)$ have no common elements, and $(B_1, \ldots, B_k) R_k (C_1, \ldots, C_k)$ entails $(B_{\sigma(1)}, \ldots, B_{\sigma(k)}) R_k (C_{\sigma(1)}, \ldots, C_{\sigma(k)})$ for any permutation σ of {1, ..., k}. Those relations correspond to the existence of disjoint paths in the proof nets from $B_i$ to $C_i$.
He told me several times that moving from geometry to algebra has been a great progress in mathematics and solved many issues, notably in geometry, and that proof net study was going the other way round. I guess this is related to what he said about Theorem 6 in the present paper: It seems that this ingenious argument avoids the complicated long trip condition of Girard. It constitutes a significant original contribution to the subject. (J. Lambek, Dec. 3, 2001)
² This model of higher-order imperative programming with “before” strongly inspired the author's subsequent model [34], although it is not really a logical system anymore.

This paper is a mix (!) of easy-to-access published work [6, 39, 4, 42, 43, 40, 48], research reports and more confidential publications [36, 35, 21, 37, 22, 18, 23, 45, 46, 44, 47, 32], and unpublished material between 1990 and 2020, which are all presented in the same and rather new unified perspective. The presented material can be divided into four topics:
• proof nets: handsome proof nets for MLL, Lambek calculus and pomset logic, and other work on proof nets [6, 18, 39];
• combinatorics: (di)cographs and sp orders [35, 36, 40, 4, 45, 46, 44, 47, 48, 32];
• coherence semantics [36, 37, 43, 47];
• grammatical applications of pomset logic to computational linguistics [21, 22, 23, 45, 46, 47].

The contents of the present article are divided into seven sections numbered as follows:
2. As a starter, we offer a glimpse of pomset logic, an informal tour which summarizes the most important constructions and results of pomset logic.
3. We then present results on series parallel partial orders and cographs, and on dicographs, which subsume those two notions. We present dicographs either as sp pomsets of formulas or as dicographs of atoms, and explain the guidelines for finding a sequent calculus. This combinatorial part is a prerequisite for the subsequent sections.
4. Proof nets without links, the so-called handsome proof nets, are presented as well as the cut-elimination for them.
5. The semantics of proof nets, preserved under cut-elimination and equivalent to their syntactic correctness, is then presented.
6. Then the sequentialisation, “the quest” of a complete sequent calculus, is discussed and we provide an example of a proof net that does not derive from any simple sequent calculus. We do not present Slavnov's sequent calculus, which is quite complicated and thoroughly presented in his recent paper [54].
7. We then present proofs in an algebraic manner, à la deep inference, with deduction rules as term rewriting, and show the correspondence between this view and the calculus of structures known as SBV.
8. Finally, we explain how one can design grammars by associating words with partial proof nets of pomset logic.
2 A glimpse of pomset logic

When presented with a different sort of logic, most readers prefer to be given a sequent calculus. Hence we shall give a simple sequent calculus which is a subcalculus of the sequents that pomset logic is able to derive. The formulas of pomset logic are defined from atoms (propositional variables or their negation) by means of the usual commutative multiplicative connectives $\parr$ and $\otimes$ together with the new noncommutative multiplicative connective $\triangleleft$ (before); the three of them are associative.

$F ::= P \mid P^\perp \mid F \otimes F \mid F \parr F \mid F \triangleleft F$

It is assumed that formulas are always in negative normal form: negation only applies to propositional variables. This is possible and standard when negation is involutive and satisfies the De Morgan laws:

$(A^\perp)^\perp = A$
$(A \parr B)^\perp = (A^\perp \otimes B^\perp)$
$(A \triangleleft B)^\perp = (A^\perp \triangleleft B^\perp)$
$(A \otimes B)^\perp = (A^\perp \parr B^\perp)$
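The grammar and the De Morgan laws above can be turned into a small recursive function that computes the negative normal form of a negated formula. This is a minimal sketch, not part of the original text: the tuple encoding of formulas and the names `atom`, `tensor`, `par`, `before` and `dual` are assumptions made for the illustration.

```python
def atom(name, positive=True):
    """A propositional variable (positive) or its negation (not positive)."""
    return ("atom", name, positive)

def tensor(a, b):
    return ("tensor", a, b)

def par(a, b):
    return ("par", a, b)

def before(a, b):
    return ("before", a, b)

def dual(formula):
    """Negation of a formula, pushed down to the atoms by the De Morgan laws."""
    kind = formula[0]
    if kind == "atom":
        _, name, positive = formula
        return ("atom", name, not positive)
    _, left, right = formula
    if kind == "tensor":
        return par(dual(left), dual(right))
    if kind == "par":
        return tensor(dual(left), dual(right))
    if kind == "before":
        # self-dual: the two components are NOT swapped
        return before(dual(left), dual(right))
    raise ValueError(f"unknown connective: {kind}")

a, b, c = atom("a"), atom("b"), atom("c")
example = tensor(before(a, b), par(c, dual(a)))
print(dual(example))
```

On this encoding, `dual` is an involution, and `before` is the only connective that is mapped to itself.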
Sequents of pomset logic are right handed and they are partially ordered multisets of formulas (pomsets of formulas). We assume those partial orders are described by operations from the one point order. Although we shall be much more precise about partial orders in the next section (Section 3.2), we need to define two operations on partial orders, at least informally. Given two partially ordered multisets of formulas, Γ and Δ, let us define two orders whose domain is the disjoint union of the two domains and which preserve order on each domain:
• {Γ, Δ}, their parallel composition: any two formulas, one of them in Γ and the other one in Δ, cannot be compared. This operation is associative and commutative.
• Γ ; Δ, their series composition: any formula in Γ is smaller than any formula in Δ. This operation is associative, but noncommutative.

The expression Γ[X] denotes any pomset including a propositional variable X, and given a pomset Δ the expression Γ[Δ] denotes the pomset obtained by substituting in Γ[X] the pomset Δ (as a term) for the formula X.

The sequent calculus in Figure 1 extends classical multiplicative linear logic. Orders can be “weakened” until the discrete order is reached. When dimix is not used (hence entropy cannot be used either) this calculus is MLL.

As sequent calculus is best suited for classical logic, and as intuitionistic logic fits in well with natural deduction, multiplicative linear logic is better expressed with proof nets, and this is even more striking in the pomset logic case. Nevertheless, for pedagogical reasons we give a simple sequent calculus for pomset logic, which does not encompass all proof nets to be defined later. There is an elegant proof net calculus into which we map the sequent calculus proofs, defined in Section 8.1, which identifies the sequent calculus proofs that are essentially similar, like the ones obtained one from the other by commuting rules.
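The two operations on partially ordered multisets described above can be sketched on a naive encoding, where a pomset is a pair (set of labelled formulas, set of pairs x < y). The encoding and the names `point`, `parallel`, `series` and `is_suborder` are assumptions made for this illustration only. In particular `is_suborder` only checks inclusion of the order relations, which is necessary for the entropy rule but does not check that the smaller order is itself series-parallel. The example mimics the last two steps of the proof in Figure 2.

```python
def point(x):
    """The one point order carrying the formula x."""
    return (frozenset([x]), frozenset())

def parallel(gamma, delta):
    """{Gamma, Delta}: no formula of Gamma is comparable with one of Delta."""
    (e1, r1), (e2, r2) = gamma, delta
    assert not (e1 & e2), "domains must be disjoint"
    return (e1 | e2, r1 | r2)

def series(gamma, delta):
    """Gamma ; Delta: every formula of Gamma is below every formula of Delta."""
    (e1, r1), (e2, r2) = gamma, delta
    assert not (e1 & e2), "domains must be disjoint"
    return (e1 | e2, r1 | r2 | frozenset((x, y) for x in e1 for y in e2))

def is_suborder(small, big):
    """Same formulas, fewer order constraints (inclusion check only)."""
    return small[0] == big[0] and small[1] <= big[1]

F = "(a par a^) tensor (b par b^)"   # hypothetical label for the tensor formula
after_dimix = series(point(F), parallel(point("c"), point("c^")))
after_entropy = parallel(series(point(F), point("c")), point("c^"))

print(sorted(after_dimix[1]))
print(is_suborder(after_entropy, after_dimix))
```

Series composition is not commutative on this encoding, whereas parallel composition is, which is the distinction the two sequent notations Γ ; Δ and {Γ, Δ} are meant to capture.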
In addition to the par and times links, one needs a link for ‘before’. Although a Danos-Regnier criterion is absolutely possible, it is unnatural for this calculus, for which it is easier to use edge bicoloured graphs (blue and red) with undirected B edges and R edges, some of them being directed. Links are given in Figure 3.
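Rules of the calculus (each rule derives the sequent after “infer” from the sequents before it):
• axiom: infer $\vdash \{a, a^\perp\}$.
• dimix: from $\vdash \Gamma$ and $\vdash \Delta$ infer $\vdash \Gamma ; \Delta$.
• entropy: from $\vdash \Gamma$ infer $\vdash \Gamma'$ (Γ′ sub sp order of Γ).
• $\otimes$ (cut when $A = B^\perp$): from $\vdash \{A, \Gamma\}$ and $\vdash \{B, \Delta\}$ infer $\vdash \{\Gamma, (A \otimes B), \Delta\}$.
• $\parr$: from $\vdash \Gamma[\{A, B\}]$ infer $\vdash \Gamma[A \parr B]$ (side condition: A and B freely equivalent).
• $\triangleleft$: from $\vdash \Gamma[A ; B]$ infer $\vdash \Gamma[A \triangleleft B]$ (side condition: A and B arc equivalent).

Fig. 1 A simple sequent calculus for pomset logic

Derivation, from axioms to conclusion:
1. $\vdash \{a, a^\perp\}$ and $\vdash \{b, b^\perp\}$ (axioms)
2. $\vdash a \parr a^\perp$ and $\vdash b \parr b^\perp$ ($\parr$, twice)
3. $\vdash (a \parr a^\perp) \otimes (b \parr b^\perp)$ ($\otimes$)
4. $\vdash (a \parr a^\perp) \otimes (b \parr b^\perp) ; \{c, c^\perp\}$ (dimix with the axiom $\vdash \{c, c^\perp\}$)
5. $\vdash \{(a \parr a^\perp) \otimes (b \parr b^\perp) ; c,\ c^\perp\}$ (entropy)

Fig. 2 Example of a proof in pomset logic in the simple sequent calculus of Figure 1.

Link | Premises | Conclusion(s)
Axiom | none | $a$ and $a^\perp$
Par $\parr$ | A and B | $A \parr B$
Before $\triangleleft$ | A and B | $A \triangleleft B$
Times $\otimes$ | A and B | $A \otimes B$
Cut | $K$ and $K^\perp$ | none

(The RnB drawing of each link is not reproduced here.)

Fig. 3 The links of pomset logic as edge bicoloured graphs. In a proof structure, the conclusion of a link is the premise of at most one link, and each premise of a link is the conclusion of exactly one link. A formula that is not the premise of any link is said to be a conclusion of the proof structure. Cuts are conclusions $K \otimes K^\perp$; they never can be the premise of any link.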
Proof nets are defined as the simple graphs defined from those links for which blue edges define a perfect matching and without elementary circuits (directed cycles without twice the same vertex) alternating the B (axioms and formulas) and the R edges (connectives).
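Fig. 4 The proof net corresponding to the sequent calculus proof in Figure 2 (drawing not reproduced; its atoms are $a, a^\perp, b, b^\perp, c, c^\perp$, with two $\parr$ links and one $\otimes$ link).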
However, there is a much more interesting view of multiplicative proof nets, the so-called handsome proof nets that I first introduced for usual multiplicative linear logic [40, 44], which do not have links, as we shall see in Section 4.1. A handsome proof net is a graph which does not depend on the associativity and commutativity of the connectives. The logical formula is the R graph, the axioms linking atoms are the B edges and the criterion is: every alternating elementary circuit contains a chord (that is, an edge, directed or not, linking two points on the circuit but not itself in the circuit).³ This proof net calculus enjoys cut-elimination. Furthermore, cuts take part in the order on the conclusions, which might be viewed as the encoding of a strategy to reduce them (see Section 4.2).

Some graph rewriting rules preserve the correctness of handsome proof nets, so we develop in Section 7.2 a notion of derivation of proof nets from axioms $(a_1 \parr a_1^\perp) \otimes \cdots \otimes (a_n \parr a_n^\perp)$, which are themselves proof nets. This kind of derivation is called deep pomset logic; of course, as shown in [45, 46], rewriting preserves correctness. Such a rewrite view of pomset logic was just suggested at the end of [44], but it was later developed by Guglielmi and Straßburger in [14] within the calculus of structures and deep inference. We here prove that their SBV calculus corresponds to deep pomset logic. Of course we would like to know whether the rewriting from axioms produces all correct proof nets, but we do not know.

³ In a proof net with links there cannot be chords on alternate elementary cycles. Hence this criterion, when applied to proof nets with links, is the one we defined above for proof nets with links.

Fig. 5 The handsome proof net corresponding to the proof net in Figure 4 and to the sequent calculus proof in Figure 2 (drawing not reproduced; its vertices are the atoms $a, a^\perp, b, b^\perp, c, c^\perp$).

Pomset logic is easily interpreted in coherence spaces. Proofs (proof nets) are interpreted as elements of the coherence space associated with the conclusion, in such a way that this interpretation of proofs is preserved by cut-elimination (see Section 5.2). Furthermore, the fact that the interpretation of a proof net as a set of “experiments” is a semantic object (a clique of the corresponding coherence space) is equivalent to the correctness of the proof net (see Section 5.2).
3 Structured sequents as dicographs of formulas

In order to draw a distinction between $A \triangleleft B$ and $B \triangleleft A$, we need some structure on the formulas in a sequent, i.e., on multisets of formulas, and operations on partial orders. This is analogous to what happens with cyclic linear logic [56]: formulas are organised in a total cyclic order, and binary rules need operations to combine cyclic orders.

3.1 Looking for structured sequents

The formulas of pomset logic we consider are defined from atoms (propositional variables or their negation) by means of the usual commutative multiplicative connectives $\parr$ and $\otimes$ together with the new noncommutative connective $\triangleleft$ (before); the three of them are associative. As mentioned above, as De Morgan laws allow, it is assumed that formulas are always in negative normal form.

We want to deal with series parallel partial orders of formulas: $O_1 \mathbin{\hat\parr} O_2$ corresponds to parallel composition of partial orders (disjoint union) and $O_1 \mathbin{\hat\triangleleft} O_2$ corresponds to the series composition of partial orders (every formula in the first partial order $O_1$ is less than every formula in the second partial order $O_2$). Thus, a formula written with $\parr$ and $\triangleleft$ corresponds to a partial order between its atoms. Unsurprisingly, we first need to study partial orders defined with series and parallel composition.

However, what about the multiplicative conjunction, namely, the $\otimes$ connective? It is commutative, but it is distinct from $\parr$. In order to include $\otimes$ in this view, where formulas are binary relations on their atoms, we consider the more general class of irreflexive binary relations that are obtained by parallel composition, series composition and symmetric series composition $\hat\otimes$, which basically consists in adding the relations of $R_1 \mathbin{\hat\triangleleft} R_2$ and the ones of $R_2 \mathbin{\hat\triangleleft} R_1$. The relations that are defined using $\hat\parr$, $\hat\otimes$, $\hat\triangleleft$ are called directed cographs, or dicographs for short. If only $\hat\parr$ and $\hat\otimes$ are used, the relations obtained are cographs. They have already been quite useful for studying MLL; see e.g., Theorem 8 below. Before defining pomset logic, we need a presentation of directed cographs.
3.2 Directed cographs or dicographs

An irreflexive relation $R \subset P^2$ may be viewed as a graph with vertices P and with both directed edges and undirected edges but without loops. Given an irreflexive relation R let us call its directed part (its arcs) $\vec{R} = \{(a,b) \in R \mid (b,a) \notin R\}$ and its symmetric part (its edges) $\bar{R} = \{(a,b) \in R \mid (b,a) \in R\}$. It is convenient to write $a - b$ for the edge or pair of arcs $(a,b), (b,a)$ in $\bar{R}$ and $a \to b$ for $(a,b)$ in R when $(b,a)$ is not in R.

Definition 1 (dicograph) We consider the class of directed cographs, called dicographs, which is the smallest class of binary irreflexive relations containing the empty relation on the singleton sets and closed under the following operations, defined on pairs of dicographs with disjoint domains $E_1$ and $E_2$ and yielding a binary relation on $E_1 \uplus E_2$:
• symmetric series composition $R_1 \mathbin{\hat\otimes} R_2 = R_1 \uplus R_2 \uplus (E_1 \times E_2) \uplus (E_2 \times E_1)$
• directed series composition $R_1 \mathbin{\hat\triangleleft} R_2 = R_1 \uplus R_2 \uplus (E_1 \times E_2)$
• parallel composition $R_1 \mathbin{\hat\parr} R_2 = R_1 \uplus R_2$

When directed series composition is not used, the graph is said to be a cograph. When symmetric series composition is not used, the graph is said to be a series-parallel partial order (sp-order).

Whenever there are no directed edges (a.k.a. arcs) the dicograph is a cograph ($\hat\triangleleft$ is not used). Cographs are characterised by the absence of $P_4$, as many people (re)discovered, including us [40]; see e.g. [17]. The graph $P_4$: $a - b - c - d$ is a path on four vertices, and $P_4$-free means that the restriction of the graph to four distinct points never is $P_4$: either it contains another edge, or it does not contain the three consecutive edges.

Whenever there are only directed edges (a.k.a. arcs) the dicograph is an sp order ($\hat\otimes$ is not used) and they are characterised as N-free partial orders, as rediscovered in [36]; see e.g. [27]. The finite order N is $a < b$, $c < b$, $c < d$, and N-free means that the restriction of the order to four distinct points never is N: either it contains another order relation, or it does not contain the three order relations of N.

We characterised the class of dicographs as follows [4, 45, 46]:

Theorem 1 An irreflexive binary relation R is a dicograph if and only if:
• $\vec{R}$ is an N-free order ($\vec{R}$ is an sp order).
• $\bar{R}$ is $P_4$-free ($\bar{R}$ is a cograph).
• Weak transitivity:⁴ for all a, b, c in the domain of R,
  if $(a,b) \in \vec{R}$ and $(b,c) \in R$ then $(a,c) \in R$, and
  if $(a,b) \in R$ and $(b,c) \in \vec{R}$ then $(a,c) \in R$.

A dicograph can be described with a term in which each element of the domain appears exactly once. This term is written with the three binary operators $\hat\otimes$, $\hat\parr$ and $\hat\triangleleft$, and for a given dicograph this term is unique up to the associativity of the three operators, and to the commutativity of the first two, namely $\hat\parr$ and $\hat\otimes$.

Definition 2 The dual (or negation) $R^\perp$ of a dicograph R on P is defined as follows: points are given a ⊥ superscript, arcs are kept, i.e. $\vec{R^\perp} = \vec{R}$, and two distinct points are joined by an edge of $R^\perp$ exactly when they are not related in R, i.e. $\overline{R^\perp} = \{(x,y) \in P^2 \mid x \neq y,\ (x,y) \notin R,\ (y,x) \notin R\}$. Equivalently, on terms: $(a)^\perp = a^\perp$, $(a^\perp)^\perp = a$, $(X \mathbin{\hat\otimes} Y)^\perp = (X^\perp \mathbin{\hat\parr} Y^\perp)$, $(X \mathbin{\hat\parr} Y)^\perp = (X^\perp \mathbin{\hat\otimes} Y^\perp)$, $(X \mathbin{\hat\triangleleft} Y)^\perp = (X^\perp \mathbin{\hat\triangleleft} Y^\perp)$.

Definition 3 (equivalent points in a dicograph) Two points a and b of P are said to be equivalent w.r.t. a relation R (notation $a \sim b$) whenever for all $x \in P$ with $x \neq a, b$ one has $(x,a) \in R \Leftrightarrow (x,b) \in R$ and $(a,x) \in R \Leftrightarrow (b,x) \in R$. There are three kinds of equivalent points:
• Two points a and b in a dicograph are said to be freely equivalent (notation $a \overset{\cdot\cdot}{\sim} b$) whenever the term can be written, using the associativity of $\hat\otimes$, $\hat\parr$ and $\hat\triangleleft$ and the commutativity of $\hat\parr$ and $\hat\otimes$, as $T[a \mathbin{\hat\parr} b]$. In other words, $a \sim b$, $(a,b) \notin R$, $(b,a) \notin R$.
• Two points a and b in a dicograph are said to be arc equivalent (notation $a \overset{\rightarrow}{\sim} b$) whenever the term can be written, using the same associativity and commutativity, as $T[a \mathbin{\hat\triangleleft} b]$. In other words, $a \sim b$, $(a,b) \in R$, $(b,a) \notin R$.
• Two points a and b in a dicograph are said to be edge equivalent (notation $a \overset{-}{\sim} b$) whenever the term can be written, using the same associativity and commutativity, as $T[a \mathbin{\hat\otimes} b]$. In other words, $a \sim b$, $(a,b) \in R$, $(b,a) \in R$.

⁴ The definition we give here is ours [4] from 1997. In 1999, Guglielmi found an alternative but equivalent definition [11].
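The three operations of Definition 1, the weak transitivity condition of Theorem 1 and the three kinds of equivalent points of Definition 3 can be sketched as follows. This is an illustration under assumptions, not the author's implementation: a relation is encoded as a pair (domain, set of ordered pairs), and the names `atom`, `par`, `before`, `tensor`, `weakly_transitive` and `equivalence_kind` are made up for the example. Only the third condition of Theorem 1 is checked here; N-freeness and $P_4$-freeness are not.

```python
def atom(a):
    """The empty relation on a singleton."""
    return (frozenset([a]), frozenset())

def par(r1, r2):
    """Parallel composition: disjoint union of the two relations."""
    (e1, s1), (e2, s2) = r1, r2
    assert not (e1 & e2), "domains must be disjoint"
    return (e1 | e2, s1 | s2)

def before(r1, r2):
    """Directed series composition: adds E1 x E2."""
    e, s = par(r1, r2)
    return (e, s | frozenset((x, y) for x in r1[0] for y in r2[0]))

def tensor(r1, r2):
    """Symmetric series composition: adds E1 x E2 and E2 x E1."""
    e, s = before(r1, r2)
    return (e, s | frozenset((y, x) for x in r1[0] for y in r2[0]))

def weakly_transitive(rel):
    """Third condition of Theorem 1 (the other two are not checked here)."""
    _, r = rel
    arcs = {(a, b) for (a, b) in r if (b, a) not in r}
    return all((a, c) in r
               for (a, b) in r for (b2, c) in r
               if b == b2 and ((a, b) in arcs or (b, c) in arcs))

def equivalence_kind(rel, a, b):
    """'free', 'arc' or 'edge' if a and b are equivalent in that way, else None."""
    e, r = rel
    for x in e - {a, b}:
        if ((x, a) in r) != ((x, b) in r) or ((a, x) in r) != ((b, x) in r):
            return None
    if (a, b) in r and (b, a) in r:
        return "edge"
    if (a, b) in r:
        return "arc"
    if (b, a) in r:
        return None   # b is arc equivalent to a, not a to b
    return "free"

# The dicograph of the term (a par b) tensor (c before d).
t = tensor(par(atom("a"), atom("b")), before(atom("c"), atom("d")))
print(weakly_transitive(t), equivalence_kind(t, "a", "b"), equivalence_kind(t, "c", "d"))
```

A relation such as a → b, b − c without any pair between a and c fails the check, which is consistent with it not being built by the three operations.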
3.3 Dicograph inclusion and (un)folding

The order on a multiset of formulas can be viewed as a set of constraints. Hence, when a sequent is derivable with an sp order I it is also derivable with a sub sp order $J \subset I$; we named this structural rule entropy [36]. All but one ($\otimes\parr 4$) of the transformations in Figure 6 of a dicograph into a subdicograph (w.r.t. inclusion) preserve provability. Hence, we need to characterise the inclusion of a dicograph into another and possibly to view the inclusion as a computational process that can be performed step by step. Fortunately, in [4] we characterised the inclusion of a dicograph in another dicograph by a rewriting relation:

Theorem 2 A dicograph $R'$ is included in a dicograph $R$ if and only if the term $R$ rewrites to the term $R'$ using the rules of Figure 6, up to the associativity of $\hat\otimes$, $\hat\triangleleft$ and $\hat\parr$, and to the commutativity of $\hat\parr$ and $\hat\otimes$.
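One direction of Theorem 2 is easy to test mechanically: each rewriting rule should turn a dicograph into a dicograph included in it. The sketch below checks this on four atoms X, Y, U, V. It rests on assumptions: the eleven rules are written as this editor reads them from Figure 6, the rule names are ASCII stand-ins, and the encoding of relations as (domain, set of pairs) and the helper names are made up for the illustration.

```python
def atom(a):
    return (frozenset([a]), frozenset())

def compose(r1, r2, forward=False, backward=False):
    """Disjoint union, optionally adding E1 x E2 and/or E2 x E1."""
    (e1, s1), (e2, s2) = r1, r2
    assert not (e1 & e2), "domains must be disjoint"
    pairs = set(s1 | s2)
    if forward:
        pairs |= {(x, y) for x in e1 for y in e2}
    if backward:
        pairs |= {(y, x) for x in e1 for y in e2}
    return (e1 | e2, frozenset(pairs))

def par(r1, r2):
    return compose(r1, r2)

def before(r1, r2):
    return compose(r1, r2, forward=True)

def tensor(r1, r2):
    return compose(r1, r2, forward=True, backward=True)

X, Y, U, V = (atom(n) for n in "XYUV")

# name -> (left-hand side, right-hand side); an assumed reading of Figure 6
RULES = {
    "tensor-par 4":     (tensor(par(X, Y), par(U, V)), par(tensor(X, U), tensor(Y, V))),
    "tensor-par 3":     (tensor(par(X, Y), U), par(tensor(X, U), Y)),
    "tensor-par 2":     (tensor(Y, U), par(Y, U)),
    "tensor-before 4":  (tensor(before(X, Y), before(U, V)), before(tensor(X, U), tensor(Y, V))),
    "tensor-before 3l": (tensor(before(X, Y), U), before(tensor(X, U), Y)),
    "tensor-before 3r": (tensor(Y, before(U, V)), before(U, tensor(Y, V))),
    "tensor-before 2":  (tensor(Y, U), before(Y, U)),
    "before-par 4":     (before(par(X, Y), par(U, V)), par(before(X, U), before(Y, V))),
    "before-par 3l":    (before(par(X, Y), U), par(before(X, U), Y)),
    "before-par 3r":    (before(Y, par(U, V)), par(U, before(Y, V))),
    "before-par 2":     (before(Y, U), par(Y, U)),
}

def included(small, big):
    """Same domain and inclusion of the sets of pairs."""
    return small[0] == big[0] and small[1] <= big[1]

violations = [name for name, (lhs, rhs) in RULES.items() if not included(rhs, lhs)]
print(violations)
```

The check only concerns inclusion of relations; it says nothing about which rules are valid as linear implications, which is the point of the caveat attached to the first rule.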
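rule name | dicograph | rewrites to dicograph
$\otimes\parr 4$ (*) | $(X \mathbin{\hat\parr} Y) \mathbin{\hat\otimes} (U \mathbin{\hat\parr} V)$ | $(X \mathbin{\hat\otimes} U) \mathbin{\hat\parr} (Y \mathbin{\hat\otimes} V)$
$\otimes\parr 3$ | $(X \mathbin{\hat\parr} Y) \mathbin{\hat\otimes} U$ | $(X \mathbin{\hat\otimes} U) \mathbin{\hat\parr} Y$
$\otimes\parr 2$ | $Y \mathbin{\hat\otimes} U$ | $Y \mathbin{\hat\parr} U$
$\otimes\triangleleft 4$ | $(X \mathbin{\hat\triangleleft} Y) \mathbin{\hat\otimes} (U \mathbin{\hat\triangleleft} V)$ | $(X \mathbin{\hat\otimes} U) \mathbin{\hat\triangleleft} (Y \mathbin{\hat\otimes} V)$
$\otimes\triangleleft 3l$ | $(X \mathbin{\hat\triangleleft} Y) \mathbin{\hat\otimes} U$ | $(X \mathbin{\hat\otimes} U) \mathbin{\hat\triangleleft} Y$
$\otimes\triangleleft 3r$ | $Y \mathbin{\hat\otimes} (U \mathbin{\hat\triangleleft} V)$ | $U \mathbin{\hat\triangleleft} (Y \mathbin{\hat\otimes} V)$
$\otimes\triangleleft 2$ | $Y \mathbin{\hat\otimes} U$ | $Y \mathbin{\hat\triangleleft} U$
$\triangleleft\parr 4$ | $(X \mathbin{\hat\parr} Y) \mathbin{\hat\triangleleft} (U \mathbin{\hat\parr} V)$ | $(X \mathbin{\hat\triangleleft} U) \mathbin{\hat\parr} (Y \mathbin{\hat\triangleleft} V)$
$\triangleleft\parr 3l$ | $(X \mathbin{\hat\parr} Y) \mathbin{\hat\triangleleft} U$ | $(X \mathbin{\hat\triangleleft} U) \mathbin{\hat\parr} Y$
$\triangleleft\parr 3r$ | $Y \mathbin{\hat\triangleleft} (U \mathbin{\hat\parr} V)$ | $U \mathbin{\hat\parr} (Y \mathbin{\hat\triangleleft} V)$
$\triangleleft\parr 2$ | $Y \mathbin{\hat\triangleleft} U$ | $Y \mathbin{\hat\parr} U$

Fig. 6 A complete rewriting system for dicograph inclusion. Beware that the first rule $\otimes\parr 4$, marked (*), is wrong when the rewriting rule is viewed as a linear implication on formulas, $(X \parr Y) \otimes (U \parr V) \multimap (X \otimes U) \parr (Y \otimes V)$, although all other rewriting rules are correct when viewed as linear implications.

3.4 Folding and unfolding pomset logic sequents

A structured sequent of pomset logic (resp. of MLL) is a multiset of formulas of pomset logic (resp. of MLL), with the connectives $\parr$, $\otimes$, $\triangleleft$, endowed with a dicograph.

Definition 4 (Folding/Unfolding) On such sequents one may define “folding” and “unfolding”, which transform a dicograph of formulas into another dicograph of formulas by combining two equivalent formulas A and B of the dicograph into one formula A ∗ B (folding) or by splitting one compound formula A ∗ B into its two immediate subformulas A and B, with A and B equivalent in the dicograph. More formally:

Folding Given a multiset of formulas $X_1, \ldots, X_n$ endowed with a dicograph T,
• $\parr$: if $X_i \overset{\cdot\cdot}{\sim} X_j$ in T, rewrite $T[X_i \mathbin{\hat\parr} X_j]$ into $T[(X_i \parr X_j)]$; in the multiset, the two formulas $X_i$ and $X_j$ have been replaced with a single formula $X_i \parr X_j$.
• $\triangleleft$: if $X_i \overset{\rightarrow}{\sim} X_j$ in T, rewrite $T[X_i \mathbin{\hat\triangleleft} X_j]$ into $T[(X_i \triangleleft X_j)]$; in the multiset, the two formulas $X_i$ and $X_j$ have been replaced with a single formula $X_i \triangleleft X_j$.
• $\otimes$: if $X_i \overset{-}{\sim} X_j$ in T, rewrite $T[X_i \mathbin{\hat\otimes} X_j]$ into $T[(X_i \otimes X_j)]$; in the multiset, the two formulas $X_i$ and $X_j$ have been replaced with a single formula $X_i \otimes X_j$.

Unfolding is the opposite:
• $\parr$: turn $T[(X_i \parr X_j)]$ into $T[X_i \mathbin{\hat\parr} X_j]$; in the multiset, the formula $X_i \parr X_j$ has been replaced with two formulas $X_i$ and $X_j$ with $X_i \overset{\cdot\cdot}{\sim} X_j$.
• $\triangleleft$: turn $T[(X_i \triangleleft X_j)]$ into $T[X_i \mathbin{\hat\triangleleft} X_j]$; in the multiset, the formula $X_i \triangleleft X_j$ has been replaced with two formulas $X_i$ and $X_j$ with $X_i \overset{\rightarrow}{\sim} X_j$.
• $\otimes$: turn $T[(X_i \otimes X_j)]$ into $T[X_i \mathbin{\hat\otimes} X_j]$; in the multiset, the formula $X_i \otimes X_j$ has been replaced with two formulas $X_i$ and $X_j$ with $X_i \overset{-}{\sim} X_j$.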
3.5 A sequent calculus attempt with sp pomset of formulas
Now let us try to extend multiplicative linear logic with a noncommutative multiplicative self-dual connective ◁ (rather than to restrict existing connectives to be noncommutative), and let us also try to deal with partially ordered multisets of formulas, with A ◁ B corresponding to “the subformula A (a resource) comes before the subformula B (another resource)”.
That way one may think of an order on computations: a cut between (A ◁ B)⊥ and A⊥ ◁ B⊥ reduces to two smaller cuts A−cut−A⊥ and B−cut−B⊥ with the cut on A being prior to the cut on B, while a cut between (A ⅋ B)⊥ and A⊥ ⊗ B⊥ reduces to two smaller cuts A−cut−A⊥ and B−cut−B⊥ with the cut on A being in parallel with the cut on B. This makes sense when linear logic proofs are viewed as programs and cut-elimination as computation.
Doing so one may obtain a sequent calculus using partially ordered multisets of formulas as in [36], but if one wants a sequent with several conclusions that are partially ordered to be equivalent to a sequent with a unique conclusion, one has to consider only sp partial orders of formulas, as defined in Subsection 3.2, with parallel composition noted ⅋ and series composition noted ◁. If we want all formulas in the sequent to be ordered, the calculus should handle right-handed sequents, i.e., be classical.⁵ As seen above, we can represent this multiset of formulas endowed with an sp order by an sp term whose points are the formulas, and such a term is unique up to the commutativity of ⅋ and the associativity of ⅋ and ◁.
This simple sequent calculus is just here to suggest how one may inductively construct proofs in a well-established framework. One should not be too demanding; we mainly ask for this calculus to handle only sp orders on conclusions,⁶ so they can be represented with a formula, and to yield correct proof nets only. This sequent calculus is much more restricted than the sequent calculi of [42] or [36], which deal with general partial orders and whose aim was to provide a complete calculus yielding all the proof nets; more recently, Slavnov proposed such a complete sequent calculus [54].
In this sequent calculus, cuts are conclusions of the form K ⊗ K⊥ and they can be part of the order on conclusions: a cut Γ[(A ⊗ B) ⊗ (A⊥ ⅋ B⊥)] reduces into two cuts that are equivalent and parallel in the order, Γ[(A ⊗ A⊥) ⅋ (B ⊗ B⊥)], while a cut Γ[(A ◁ B) ⊗ (A⊥ ◁ B⊥)] reduces into two cuts that are equivalent but come one before the other, Γ[(A ⊗ A⊥) ◁ (B ⊗ B⊥)]. This will be better explained in the proof net section.

⁵ Lambek calculus is intuitionistic and when it is turned into a classical system, formulas are endowed with a cyclic order [56, 1, 18], i.e., a ternary relation which is not an order and which is quite complicated when partial; see the “seaweeds” that first appeared in [53] and were subsequently used by Abrusci and Ruet [3] and by de Groote and Lamarche [5].
⁶ An alternative rule for ⊗ with sp orders is to apply the ⊗ rule between two minimum formulas in their order component, and to have cut between two formulas one of which is isolated in its ordered sequent. These alternative rules are trickier, and according to our recent investigation, this trickier sequent calculus does not enjoy better properties than the simple sequent calculus given in Figure 7.
Fig. 7 Sequent calculus on sp pomsets of formulas, called sp-pomset sequent calculus. Its rules are: axiom (⊢ a, a⊥); dimix; entropy (Γ′ a sub sp order of Γ); ⊗ / cut (cut when A = B⊥); ⅋ (when A ∼^·· B); ◁ (when A ∼^→ B).
4 Proof nets
This section presents proof structures and nets (the correct proof structures) in an abstract and algebraic manner, without links or trip conditions; such proof structures and nets are called handsome proof structures and nets. Basically, proof nets consist of a dicograph R of atoms representing the conclusion formula, and axioms that are disjoint pairs of dual atoms constituting a partition B of the atoms of R. The proof net can be viewed as an edge bicoloured graph: the dicograph is represented by R arcs and edges (Red and Regular in the pictures), while the axioms are represented by B edges (Blue and Bold in the pictures). In such a setting, the correctness criterion expresses some kind of orthogonality between R and B. A proof net can also be viewed as a term, axioms being denoted by indices used exactly twice on dual atoms.
4.1 Handsome pomset proof nets
In fact, proof nets have (almost) been defined above!
Definition 5 A pomset logic handsome proof structure or dicog-PN is a graph G = (V, B, R) with two kinds of edges:
• V = {a1, ..., an, a1⊥, ..., an⊥}, i.e., vertices consist of a finite number n of pairs of dual atoms; some of them may have the same name and hence should be distinguished with a superscript.
• B edges (B stands for Bold or Blue) are undirected and link a vertex to its dual. B edges define a perfect matching of G, that is to say, no two B edges are adjacent and every vertex is incident to a B edge.
• R edges (R for Regular or Red) are a dicograph over the vertices. There are no loops; some R edges are directed and are called arcs, some are undirected. The dicograph can be described by a term, but when two terms only differ because of the associativity of ⅋, ◁, ⊗ and the commutativity of ⅋, ⊗, the corresponding proof structures π and π′ are equal.
Of course, not all proof structures are correct; for instance ({a, a⊥}, B = {a−a⊥}, R = a ⊗ a⊥) is incorrect (cf. examples in Figure 8).
Correctness criterion A handsome proof structure is said to be a handsome proof net or to be correct whenever every elementary circuit (directed cycle) of alternating edges in R and in B contains a chord, that is, an edge or arc connecting two points of the circuit but not itself nor its reverse in the circuit. In short, every æ-circuit contains a chord. Observe that this chord cannot be in B, hence it is in R, and it can either be an R arc or an R edge.
Theorem 3 (Nguyên) Checking whether a proof structure satisfies the above correctness criterion is coNP-complete; this was established recently in [28].
Fig. 8 Two incorrect handsome proof structures on top (chordless æ-circuit: a, c⊥, c, a⊥, a in both cases), two correct handsome proof structures (i.e. two handsome proof nets) underneath. All four are drawn on the atoms a, a⊥, b, b⊥, c, c⊥.
Theorem 4 Given a proof net (V, B, R), if R rewrites to R′ (so R′ ⊂ R) using the rewriting rules of Figure 6 except ⊗⅋4, then (V, B, R′) is a proof net as well, i.e., all the rewrite rules but ⊗⅋4 preserve the correctness criterion of Section 4.1.
Proof See [45, 46].

Corollary 1 If a proof structure (V, B, R) with
R = t[K(x1, ..., xk) ⊗ K⊥(x1⊥, ..., xk⊥)]
is correct, so is the proof structure (V, B, R′) with
R′ = t[(x1 ⊗ x1⊥) ⅋ ··· ⅋ (xk ⊗ xk⊥)].
In other words, rewriting a cut of a correct handsome proof net into smaller cuts preserves the correctness criterion.
Proof We proceed by induction on the size of the cut-formula K to show that R′ is actually obtained by rewriting without using ⊗⅋4; given Theorem 4, the result holds. If K contains no connective, then k = 1 and there is nothing to prove.
If K = K1 ⊗ K2 then K⊥ = K1⊥ ⅋ K2⊥ and
(K1 ⊗ K2) ⊗ (K1⊥ ⅋ K2⊥) = (K1 ⊗ (K1⊥ ⅋ K2⊥)) ⊗ K2,
which rewrites to ((K1 ⊗ K1⊥) ⅋ K2⊥) ⊗ K2 by ⊗⅋3,
which rewrites to (K1 ⊗ K1⊥) ⅋ (K2⊥ ⊗ K2) by ⊗⅋3.
If K = K1 ◁ K2 then K⊥ = K1⊥ ◁ K2⊥ and
(K1 ◁ K2) ⊗ (K1⊥ ◁ K2⊥)
rewrites to (K1 ⊗ K1⊥) ◁ (K2 ⊗ K2⊥) by ⊗◁4,
which rewrites to (K1 ⊗ K1⊥) ⅋ (K2⊥ ⊗ K2) by ◁⅋2.
In both cases we end up with a similar situation with K1, K2 having one connective less.

Proposition 1 In some cases ⊗⅋4 does not preserve correctness, i.e., turns a correct proof net into an incorrect proof structure.
Proof Consider the following example: B = {a−a⊥, b−b⊥} and R = B̄ = (a ⅋ a⊥) ⊗ (b ⅋ b⊥) = {a−b, a−b⊥, a⊥−b⊥, a⊥−b}. Using ⊗⅋4, R = B̄ rewrites to R′ = (a ⊗ b) ⅋ (a⊥ ⊗ b⊥) = {a−b, a⊥−b⊥}, and the proof structure (B, R′) contains the chordless æ-circuit (a, a⊥) ∈ B, (a⊥, b⊥) ∈ R′, (b⊥, b) ∈ B, (b, a) ∈ R′.

Observe that this does not mean that every correct proof net (V, B, R) with axioms B can be obtained from (V, B, R = B̄) by the allowed rewrite rules (all but ⊗⅋4), where B̄ is ⊗i (ai ⅋ ai⊥). Indeed, since R ⊂ B̄ it is known that B̄ rewrites to R, but one cannot tell whether ⊗⅋4 has been used. As shown above, ⊗⅋4 does not always preserve correctness, although it may do so in particular cases; moreover, none of the rules (including ⊗⅋4) preserves incorrectness, i.e., each of them may turn an incorrect proof structure into a correct proof net.
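The correctness criterion and the counterexample of Proposition 1 can be replayed on a machine. What follows is a brute-force sketch, not the algorithm of [45, 46]: the function names are hypothetical, a⊥ is written "a'", and a chord is taken to be an R pair joining two vertices that are not consecutive on the circuit. That reading excludes the arcs of the circuit and their reverses; it also excludes an R pair parallel to a B edge, which is harmless because such a pair already yields a chordless circuit of length two. The search enumerates all alternating elementary circuits, so its cost is exponential, in line with Theorem 3.

```python
def symmetric(edges):
    """Turn undirected edges into the two ordered pairs (x, y) and (y, x)."""
    return {(x, y) for x, y in edges} | {(y, x) for x, y in edges}

def chordless_circuit(axioms, arcs):
    """Return a chordless alternating elementary circuit as a list of vertices,
    or None when every such circuit has a chord (the structure is correct).
    `axioms` lists the B edges, `arcs` is the set of ordered R pairs."""
    partner = {}
    for x, y in axioms:
        partner[x], partner[y] = y, x

    def chordless(cycle):
        position = {v: i for i, v in enumerate(cycle)}
        size = len(cycle)
        for x, y in arcs:
            if x in position and y in position:
                gap = (position[x] - position[y]) % size
                if gap not in (1, size - 1):   # non-consecutive vertices: a chord
                    return False
        return True

    def extend(path, seen):
        # path = v0, partner(v0), v1, partner(v1), ...: B steps inside pairs,
        # R arcs from one pair to the next.
        last = path[-1]
        for x, y in arcs:
            if x != last:
                continue
            if y == path[0]:                   # the R arc closes the circuit
                if chordless(path):
                    return list(path)
            elif y not in seen:
                found = extend(path + [y, partner[y]], seen | {y, partner[y]})
                if found:
                    return found
        return None

    for v in partner:
        found = extend([v, partner[v]], {v, partner[v]})
        if found:
            return found
    return None

def is_correct(axioms, arcs):
    return chordless_circuit(axioms, arcs) is None

B = [("a", "a'"), ("b", "b'")]
R = symmetric([("a", "b"), ("a", "b'"), ("a'", "b'"), ("a'", "b")])
R_rewritten = symmetric([("a", "b"), ("a'", "b'")])
print(is_correct(B, R), is_correct(B, R_rewritten))  # True False
```

The first structure is the proof net (B, B̄) of Proposition 1, the second one is its image under ⊗⅋4.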
4.2 Cut and cut-elimination
What about the cut rule? The calculus we present here has no rules in the standard sense, in particular no binary rules that would cancel a K and a K⊥. A cut is a tensor K ⊗ K⊥ which may not be a strict subformula of some other formula. So a cut in this setting simply is a symmetric series composition K ⊗ K⊥ in a dicograph whose form is T[(K ⊗ K⊥)]. Assume the atoms of K are {a1, ..., an}, so the atoms of K⊥ are {a1⊥, ..., an⊥}. Cut-elimination consists in suppressing all edges and arcs between two atoms of K, all edges and arcs between two atoms of K⊥, and all edges ai, aj⊥ with i ≠ j, so that the only edges incident to ai are ai, ai⊥ (call those edges atomic cuts) and ai x with x neither in K nor in K⊥. This yields a correct proof net by Corollary 1.
If, in this graph, an atom a is in the B relation with an a⊥ in K ∪ K⊥, then the result of cut elimination is the closest point not in K nor in K⊥ reached by an alternating sequence of B-edges and elementary cuts starting from a; observe that this point, which we call the cut neighbour of a, is necessarily named a⊥. To obtain the proof resulting from cut-elimination, suppress all the atoms of K and K⊥ as well as the incident arcs and edges, and connect every atom to its cut neighbour with a B edge. The B edges from an atom to its cut neighbour can be obtained step by step by replacing a path made of a B edge, an R edge and a B edge with a single B edge, and this preserves correctness (it is related to the rule a↑ of SBV discussed in Section 7). This final step leads us to the cut-elimination theorem for handsome pomset proof nets:
Theorem 5 Cut-elimination preserves the correctness criterion of dicog-PN proof nets and consequently the dicog-PN proof nets enjoy cut-elimination.
Proof The preservation of the absence of a chordless æ-circuit during cut-elimination is proved in [45, 46]. There are two steps in cut-elimination: one consists in turning the cut into atomic cuts, which is proved to preserve correctness in Corollary 1, and the other consists in shortening successions of atomic cuts and axioms according to the following pattern: ai⊥ B aj R_cut ak⊥ B al reduces to ai⊥ B al, provided the R edges belong to the cut (we here distinguish the vertices, but their names as logical atoms are all equal or dual, ai = aj = ak = al, because they are linked by cuts or axioms). Such a suppression of æ-paths without chords cannot create chordless æ-circuits.
4.3 From sequent calculus and rewrite proofs to dicog-PN
Proofs of the sequent calculus given in Figure 7 are easily turned into a dicog-PN proof net inductively. Such a derivation starts with axioms ai, ai⊥ and, as is well known in any kind of multiplicative linear logic, the atoms ai and ai⊥ can be traced from the axiom that introduced them to the conclusion sequent, which, after some unfolding, can be viewed as a dicograph of atoms R. The dicog-PN proof structure corresponding to the sequent calculus proof simply is (B, R), and fortunately it is a correct proof net.
Proposition 2 A proof of sequent calculus corresponds to a dicog-PN, i.e., to a handsome proof structure without chordless alternate elementary circuit, i.e. to a handsome proof net.
Proof In [45, 46] we established by induction on the proof that neither the folding rules nor the unfolding rules can introduce a chordless æ-circuit.
Proposition 3 Any proof obtained by rewriting from AXn yields a handsome proof structure without chordless alternate elementary circuits, i.e. a dicog-PN.
Proof Observe that AXn satisfies the criterion, so, because of Theorem 4, the result is clear.
5 Denotational semantics of pomset logic within coherence spaces
Denotational semantics or categorical interpretation of a logic is the interpretation of a logic in such a way that a proof d of A ⊢ B is interpreted as a morphism ⟦d⟧ from an object ⟦A⟧ to an object ⟦B⟧, with ⟦d⟧ = ⟦d′⟧ whenever d reduces to d′ by (the transitive closure of) β-reduction or cut-elimination. A proof d of ⊢ B (when there is no A) is simply interpreted as a morphism from the terminal object 1 to ⟦B⟧ (in the monoidal case, from I to ⟦B⟧). More details can be found in [20, 9].
Once the interpretation of propositional variables is defined, the interpretation of complex formulas is defined by induction on the complexity of the formula. The set Hom(A, B) of morphisms from A to B is in bijective correspondence with an object written A ⊸ B. Morphisms are defined by induction on the proofs, and one has to check that the interpretation of a proof is unchanged by one step of cut-elimination. Categorical interpretations of intuitionistic logic take place in cartesian closed categories, while categorical interpretations of {⊗, ⊸} in linear logic take place in a monoidal closed category (with monads for multiplicative exponential linear logic).
5.1 Coherence spaces
The category of coherence spaces is a concrete category: objects are (countable) sets endowed with a binary relation, and morphisms are linear maps. It interprets proofs up to cut-elimination or β-reduction, initially those of propositional intuitionistic logic and of propositional linear logic (possibly quantified). Actually, coherence spaces are tightly related to linear logic: indeed, linear logic arose from this particular semantics, invented to model second order lambda calculus, i.e., quantified propositional intuitionistic logic [7]. Coherence spaces are themselves inspired from the categorical work on ordinals by Jean-Yves Girard; they are the binary qualitative domains.
Definition 6 A coherence space A is a set |A| (possibly infinite) called the web of A, whose elements are called tokens, endowed with a binary reflexive and symmetric relation called coherence on |A| × |A|, noted α ⁐ α′ [A] or simply α ⁐ α′ when A is clear. The following notations are common and useful:
α ⌢ α′ [A] (strict coherence) iff α ⁐ α′ [A] and α ≠ α′
α ≍ α′ [A] (incoherence) iff not α ⁐ α′ [A], or α = α′
α ⌣ α′ [A] (strict incoherence) iff not α ⁐ α′ [A], and α ≠ α′
A proof of A is to be interpreted by a clique of the corresponding coherence space A, a clique being a set of pairwise coherent tokens in |A|; we write x ∈ A when x ⊂ |A| and for all α, α′ in x one has α ⁐ α′. Observe that for all x ∈ A, if x′ ⊂ x then x′ ∈ A.
Definition 7 A linear morphism F from A to B is a morphism mapping cliques of A to cliques of B such that:
• For all x ∈ A, if x′ ⊂ x then F(x′) ⊂ F(x);
• For every family (xi)i∈I of pairwise compatible cliques, that is to say such that (xi ∪ xj) ∈ A holds for all i, j ∈ I, one has F(∪i∈I xi) = ∪i∈I F(xi);⁷
• For all x, x′ ∈ A, if (x ∪ x′) ∈ A then F(x ∩ x′) = F(x) ∩ F(x′); this last condition is called stability.
Due to the removal of structural rules, linear logic has two kinds of conjunctions:
⊗: from ⊢ Γ, A and ⊢ Δ, B infer ⊢ Γ, Δ, A ⊗ B
&: from ⊢ Γ, A and ⊢ Γ, B infer ⊢ Γ, A & B
Those two rules are equivalent when contraction and weakening are allowed. Looking at the rules bottom-up, in the multiplicatives contexts are split (⊗ rule above), and in the additives contexts are duplicated (& rule above).

⁷ The morphism is said to be stable when this second condition is replaced with the requirement that F(∪i∈I xi) = ∪i∈I F(xi) holds more generally for the union of a directed family of cliques of A, i.e. ∀i, j ∃k (xi ∪ xj) = xk.

Regarding denotational semantics, the web of the coherence space associated with a formula A ∗ B with ∗ a multiplicative connective is the Cartesian product |A| × |B| of the webs of A and B, while it is the disjoint union of the webs of A and B when the connective ∗ is additive. Negation is a unary connective which is both multiplicative and additive:
|A⊥| = |A| and α ⁐ α′ [A⊥] iff α ≍ α′ [A]
One may wonder how many binary multiplicative connectives there are, i.e., how many different coherence relations one may define on |A| × |B| from the coherence relations on A and on B. We can limit ourselves to the ones that are covariant functors in both A and B; indeed there is a negation, hence a contravariant connective in A is a covariant connective in A⊥. Hence when both components are incoherent (≍) so are the two couples, and when they are both coherent, so are the two couples. As is easily observed, given two tokens α, α′ in a coherence space C, exactly one of the three following properties holds:
α ⌣ α′    α = α′    α ⌢ α′
To define a multiplicative connective, we define when (α, β) ⁐ (α′, β′) [A ∗ B] holds depending on whether α ⁐ α′ [A] and β ⁐ β′ [B] hold. Thus defining a binary multiplicative connective amounts to filling a nine-cell table like the ones below; the first column indicates the relation between α and α′ in A, while the first row indicates the relation between β and β′ in B. However, if ∗ is assumed to be covariant in both its arguments, seven out of the nine cells are filled, so the only free values are the ones in the upper right cell (NE = North-East) and in the bottom left cell (SW = South-West). NE and SW cannot be “=”, so this makes four possibilities.

A ∗ B      β ⌣ β′   β = β′   β ⌢ β′
α ⌣ α′     ⌣        ⌣        NE?
α = α′     ⌣        =        ⌢
α ⌢ α′     SW?      ⌢        ⌢

If one wants ∗ to be commutative, there are only two possibilities, namely NE = SW = ⌢ (⅋) and NE = SW = ⌣ (⊗).

A ⅋ B      β ⌣ β′   β = β′   β ⌢ β′
α ⌣ α′     ⌣        ⌣        ⌢
α = α′     ⌣        =        ⌢
α ⌢ α′     ⌢        ⌢        ⌢

A ⊗ B      β ⌣ β′   β = β′   β ⌢ β′
α ⌣ α′     ⌣        ⌣        ⌣
α = α′     ⌣        =        ⌢
α ⌢ α′     ⌣        ⌢        ⌢

However, if we do not ask for the connective ∗ to be commutative we have a third connective A ◁ B and a fourth connective A ▷ B which is simply B ◁ A.
A ◁ B      β ⌣ β′   β = β′   β ⌢ β′
α ⌣ α′     ⌣        ⌣        ⌣
α = α′     ⌣        =        ⌢
α ⌢ α′     ⌢        ⌢        ⌢

A ▷ B      β ⌣ β′   β = β′   β ⌢ β′
α ⌣ α′     ⌣        ⌣        ⌢
α = α′     ⌣        =        ⌢
α ⌢ α′     ⌣        ⌢        ⌢

The coherence relations on those connectives are defined as follows:
(α, β) ⁐ (α′, β′) [A ⊗ B] iff α ⁐ α′ [A] and β ⁐ β′ [B]
(α, β) ⌢ (α′, β′) [A ⅋ B] iff α ⌢ α′ [A] or β ⌢ β′ [B]
(α, β) ⁐ (α′, β′) [A ◁ B] iff α ⌢ α′ [A] or (α = α′ and β ⁐ β′ [B])
The definition of ⅋ and ◁ on coherence spaces applies to sp partial orders of formulas. Given an sp-order T[A1, ..., An] on the formulas A1, ..., An, i.e. a dicograph term T using only ⅋ and ◁, the above definitions of ⅋ and ◁ give the following: two tokens (α1, ..., αn) and (α′1, ..., α′n) are strictly coherent in T[A1, ..., An] whenever there exists i such that αi ⌢ α′i and for every j < i one has αj = α′j.
The linear morphisms from A to B, that is Hom(A, B), can be represented by the coherence space A ⊸ B = A⊥ ⅋ B.⁸

A ⊸ B      β ⌣ β′   β = β′   β ⌢ β′
α ⌣ α′     ⌢        ⌢        ⌢
α = α′     ⌣        =        ⌢
α ⌢ α′     ⌣        ⌣        ⌢

Let us see how linear morphisms are in a one-to-one correspondence with cliques of A ⊸ B. Given a clique f ∈ (A ⊸ B), the map Ff from cliques of A to cliques of B defined by Ff(x) = {β ∈ |B| | ∃α ∈ x (α, β) ∈ f} is a linear morphism. Conversely, given a linear morphism F, the set {(α, β) ∈ |A| × |B| | β ∈ F({α})} is a clique of A ⊸ B.
One can observe that the subset {((α, (β, γ)), ((α, β), γ)) | α ∈ |A|, β ∈ |B|, γ ∈ |C|} of |A| × |B| × |C| defines a linear isomorphism from A ◁ (B ◁ C) to (A ◁ B) ◁ C, that {((α, β), (α, β)) | α ∈ |A|, β ∈ |B|} defines a linear morphism from A ⊗ B to A ◁ B, and that the same set of pairs of tokens also defines a linear morphism from A ◁ B to A ⅋ B. However, for general coherence spaces A and B there is no canonical linear map from A ◁ B to B ◁ A.
Given two different tokens (α, β) and (α′, β′) in |A| × |B|, observe that:
1. (α, β) ⌢ (α′, β′) [A ◁ B] means (α = α′ and β ⌢ β′ [B]) or α ⌢ α′ [A]
2. (α, β) ⌢ (α′, β′) [A⊥ ◁ B⊥] means (α = α′ and β ⌣ β′ [B]) or α ⌣ α′ [A]
Given that those two tokens are different:
If α ≠ α′ then either α ⌢ α′ [A], 1 holds and 2 does not hold, or α ⌣ α′ [A], 2 holds and 1 does not hold.
If α = α′ then β ≠ β′ and either β ⌢ β′ [B], 1 holds and 2 does not hold, or β ⌣ β′ [B], 2 holds and 1 does not hold.

⁸ This internalisation of Hom(A, B) makes the category closed, but not cartesian closed because the associated conjunction, namely ⊗, is not a product.
Consequently, (A ◁ B)⊥ ≡ A⊥ ◁ B⊥.
Linear logic arose from coherence semantics, and consequently coherence semantics is close to linear logic syntax. Coherence spaces may even be turned into a fully abstract model in the multiplicative case (without “before”); see [25]. The “before” connective arose from coherence semantics; hence, it is a good idea to explore the coherence semantics of the logical calculi we designed for pomset logic in order to see whether they are sound.
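The self-duality of “before” and the De Morgan duality between ⊗ and ⅋ can be checked mechanically on small finite coherence spaces. The sketch below is only an illustration under assumptions: the class Space, the helper names and the two example spaces A and B are hypothetical, and each connective is encoded by the condition under which two distinct tokens of |A| × |B| are strictly coherent, as read from the nine-cell tables ('+' stands for strict coherence, '-' for strict incoherence, '=' for equality).

```python
from itertools import product

class Space:
    """Finite coherence space: a web plus its strictly coherent pairs."""
    def __init__(self, web, strict_pairs):
        self.web = list(web)
        self.strict = {frozenset(pair) for pair in strict_pairs}

    def rel(self, x, y):
        """'=' for equal tokens, '+' strictly coherent, '-' strictly incoherent."""
        if x == y:
            return "="
        return "+" if frozenset((x, y)) in self.strict else "-"

def same(a, b):
    return set(a.web) == set(b.web) and a.strict == b.strict

def dual(a):
    """Negation: same web, strict coherence and strict incoherence exchanged."""
    pairs = [(x, y) for x, y in product(a.web, repeat=2) if a.rel(x, y) == "-"]
    return Space(a.web, pairs)

def multiplicative(strictly_coherent):
    """Build a binary connective on the product web from its strictness rule."""
    def build(a, b):
        web = list(product(a.web, b.web))
        pairs = [(p, q) for p, q in product(web, repeat=2)
                 if p != q and strictly_coherent(a.rel(p[0], q[0]), b.rel(p[1], q[1]))]
        return Space(web, pairs)
    return build

tensor = multiplicative(lambda ra, rb: "-" not in (ra, rb))
par = multiplicative(lambda ra, rb: "+" in (ra, rb))
before = multiplicative(lambda ra, rb: ra == "+" or (ra == "=" and rb == "+"))

A = Space([0, 1, 2], [(0, 1)])   # example spaces chosen for this sketch
B = Space(["x", "y"], [])

print(same(dual(before(A, B)), before(dual(A), dual(B))))  # True
print(same(dual(tensor(A, B)), par(dual(A), dual(B))))     # True
```

On these examples the strict coherence of A ⊗ B is contained in that of A ◁ B, itself contained in that of A ⅋ B, which is what lets the identity on tokens act as a linear morphism from A ⊗ B to A ◁ B and from A ◁ B to A ⅋ B.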
5.2 A sound and faithful interpretation of proof nets in coherence spaces
An important criterion guiding the design of the deductive systems for pomset logic is that those systems are sound w.r.t. coherence semantics, in addition to cut-elimination discussed previously. We shall here interpret a proof net with conclusion T (a formula or a dicograph of atoms) as a clique of the corresponding coherence space T. Computing the semantics of a cut-free proof net is rather easy, using Girard’s experiments, but from axioms to conclusions as done in [37, 43]. However, we define the interpretation of a proof structure (not necessarily a proof net) as a set of tokens of the web of the conclusion formula.
Assume the axioms of the proof structure are B = {ai−ai⊥ | 1 ≤ i ≤ n} and that each of the ai has a corresponding coherence space, also denoted by ai. For each ai choose a token αi ∈ |ai|. If the conclusion is a dicograph T, replacing each occurrence of ai and each occurrence of ai⊥ with αi yields a term which, when every x ∗ y (with ∗ being one of the connectives ⅋, ◁, ⊗) is converted into (x, y), yields a token in the web of the coherence space associated with T; this token in |T| is called the result of the experiment.
Given a normal (cut-free) proof structure π with conclusion T, the interpretation ⟦π⟧ of the normal proof structure π is the set of all the results of the experiments on π. One has the following result that Lambek appreciated, because it replaces graph theoretical considerations with algebraic properties:
Theorem 6 A proof structure π with conclusion T is a proof net (contains no chordless æ-circuit) if and only if its interpretation ⟦π⟧ is a clique of the coherence space T (is a semantic object).
Proof The proof is a consequence of:
• both folding and unfolding (see Subsection 8.1 or [44, 45, 46]) preserve correctness;
• semantic characterisation of proof nets with links; correctness is proved in [37] for MLL and pomset logic (the published version [43] left out pomset logic).
The actual result we proved is a bit more: in order to check correctness a given four-token coherence space is enough, and this provides an algorithm to check correctness, which is of an exponential complexity, in accordance with the recent results by Nguyên [28].
When π is not normal, i.e., includes cuts, not all experiments succeed and provide results: an experiment is said to succeed when in every cut K cut K⊥ the value α on an atom a in K is the same as the value on the corresponding atom a⊥ in K⊥. Otherwise the experiment fails and has no result. The set of the results of all succeeding experiments of a proof net π is a clique of the coherence space T. It is the interpretation ⟦π⟧ of the proof net π. Whenever π reduces to π′ by cut elimination, ⟦π⟧ = ⟦π′⟧. That way one is able to predict whether a proof structure will reduce to a proof net⁹ without actually performing cut-elimination:
Theorem 7 Let π be a proof structure and let π* be its normal form; then π* is a proof net plus zero or more loops (cut between two atoms that are connected with an axiom) whenever two succeeding experiments of π have coherent or equal results.
Proof See [37, 43].
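Experiments on a cut-free proof structure are easy to enumerate when every atom is interpreted by the same small coherence space. The sketch below makes several assumptions that are not taken from the text: the four-token space used here (tokens 0 to 3, strictly coherent pairs 0−1, 1−2, 2−3) is a hypothetical choice and is not claimed to be the space for which the four-token result was proved, the conclusion is encoded as a nested tuple, and an experiment is a tuple giving one token per axiom. The function compares the results of two experiments componentwise, following the coherence tables of Section 5.1.

```python
from itertools import product

WEB = range(4)
# Hypothetical choice of strictly coherent pairs, made for this sketch only.
STRICT = {frozenset(pair) for pair in [(0, 1), (1, 2), (2, 3)]}

def atom(i, positive=True):
    """Occurrence of the atom of axiom i (positive) or of its dual (negative)."""
    return ("atom", i, positive)

def rel(term, e1, e2):
    """Relation between the results of experiments e1 and e2 in the space of
    `term`: '=' equal, '+' strictly coherent, '-' strictly incoherent."""
    if term[0] == "atom":
        _, i, positive = term
        if e1[i] == e2[i]:
            return "="
        coherent = frozenset((e1[i], e2[i])) in STRICT
        return "+" if coherent == positive else "-"
    op, left, right = term
    ra, rb = rel(left, e1, e2), rel(right, e1, e2)
    if ra == "=" and rb == "=":
        return "="
    if op == "tensor":
        strict = "-" not in (ra, rb)
    elif op == "par":
        strict = "+" in (ra, rb)
    else:  # "before"
        strict = ra == "+" or (ra == "=" and rb == "+")
    return "+" if strict else "-"

def is_clique(term, axiom_count):
    """True when the results of all experiments are pairwise coherent."""
    experiments = list(product(WEB, repeat=axiom_count))
    return all(rel(term, e1, e2) != "-" for e1 in experiments for e2 in experiments)

a, a_dual, b, b_dual = atom(0), atom(0, False), atom(1), atom(1, False)
net = ("tensor", ("par", a, a_dual), ("par", b, b_dual))       # correct
broken = ("par", ("tensor", a, b), ("tensor", a_dual, b_dual))  # after ⊗⅋4
print(is_clique(net, 2), is_clique(broken, 2))  # True False
```

On the example of Proposition 1 the semantic test agrees with the graph-theoretical criterion, as Theorem 6 predicts: the proof net is interpreted by a clique, while its image under ⊗⅋4 is not.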
6 Sequentialisation with pomset sequents or dicographs sequents
In 2001, Lambek noticed the absence of sequent calculus in my habilitation [47]. Although there is one in my PhD that was refined later to use only sp orders, I did not put sequent calculus on the forefront, firstly because the proof net calculus enjoys many more mathematical properties and secondly because the sequent calculi I know do not generate all the proof nets. I tried unsuccessfully, as did Sylvain Pogodalla and Lutz Straßburger, to prove that every correct proof net is the image of a proof in the sequent calculus, be it the one given here or some variant.
The sp-pomset sequent calculus presented in Figure 7 is clearly equivalent to the dicograph sequent calculus with dicographs of atoms as sequents; in the dicograph sequent calculus, the symmetric series composition ⊗ may well be used on contexts, as in the ⊗ rule, and all connective introduction rules consist in internalising the ∗ operation inside a formula as a ∗ connective. This calculus is shown in Figure 9. Observe that entropy does not allow inclusion of dicographs in general, but only of an outer sp-order; indeed, in general, dicograph inclusion does not preserve correctness, as explained in Proposition 1.
An induction on either sequent calculus given in this paper shows the following Proposition 4.

⁹ Proof nets reduce to proof nets, correctness is preserved under cut-elimination, but an incorrect proof structure may well reduce to a proof net.
Fig. 9 Dicograph sequent calculus with dicographs of atoms as sequents. Its rules are: axiom (on a and a⊥); dimix; entropy (with the Γi dicographs, O and O′ sp-orders, O′ ⊂ O); ⊗ / cut (cut when A = B⊥); ⅋ (if A ∼^·· B); ◁ (if A ∼^→ B); ⊗ (if A ∼^− B).
Proposition 4 Let δ be a proof of a dicograph sequent R, and let πδ = (B, R) be the corresponding proof net. Then the axioms and atoms of πδ can be partitioned into two classes Π1 = (ai−ai⊥)i∈I1 and Π2 = (ai−ai⊥)i∈I2 in such a way that either:
1. there are only arcs from Π1 to Π2, or
2. the only edges between Π1 and Π2 are a ⊗ connection: calling R1 and R2 the restrictions of R to Π1 and to Π2, one has R1 = A1 ⅋ T1, R2 = A2 ⅋ T2, and R = (A1 ⊗ A2) ⅋ T1 ⅋ T2.

Fig. 10 A proof net with no corresponding sequent calculus proof (found with Lutz Straßburger), drawn on the atoms a, a⊥, b, b⊥, c, c⊥, d, d⊥, e, e⊥, f, f⊥.
Proposition 5 There does exist a proof net without any sequent calculus proof, for example the one in Figure 10.
Proof First one has to observe that the proof structure in Figure 10 is a proof net, i.e., contains no chordless alternate elementary circuit: indeed, it contains no alternate elementary circuit. Because of Proposition 4, there should exist a partition into two parts with
1. either only arcs from one part to the other part,
2. or a tensor connection between the two parts.
If the first case applies, i.e., if there were a partition into two parts with only arcs from one to another, all vertices connected with an undirected edge, be it a B or an R edge, should be in the same component: a, a⊥, b, b⊥, c, c⊥ should be in the same component, say Π1, and f, f⊥, d, d⊥, e, e⊥ should be in the same component, say Π2; but this is impossible because there is both an R arc from Π1 to Π2, e.g., a⊥ → f, and an R arc from Π2 to Π1, e.g., e → b. So the first case does not apply.
Because the first case does not apply, there should exist two parts, with a tensor rule as the only connection between the two parts. The two possible tensors are the one linking a to c and b⊥ and the one linking d to e⊥ and f⊥, but neither is possible:
• the tensor linking a to c and b⊥ cannot be the only connection between the two parts, as there exists an undirected path from c to a not using any of the two tensor R edges:
c ←B→ c⊥ −R→ d⊥ ←B→ d ←R→ f⊥ ←B→ f ←R− a⊥
• the tensor linking d to e⊥ and f⊥ cannot be the only connection between the two parts, as there exists an undirected path from f⊥ to d not using any of the two tensor R edges:
f⊥ ←B→ f ←R− a⊥ ←B→ a ←R→ c ←B→ c⊥ −R→ d⊥ ←B→ d

In the next Section 7 we shall see that the correct proof net in Figure 10 can be derived from an axiom (a ⅋ a⊥) ⊗ (b ⅋ b⊥) ⊗ (c ⅋ c⊥) ⊗ (d ⅋ d⊥) ⊗ (e ⅋ e⊥) ⊗ (f ⅋ f⊥) by means of the rewriting rules of Figure 6 except ⊗⅋4 (see Figure 11), and in SBV as well (see Figure 13).
7 Pomset logic in deep inference style
In [45] (1998), I considered the rewriting rules of Figure 6 which preserve correctness (all but ⊗⅋4), but as Guglielmi noticed in [12] (2007) I did not use the rewriting rules as a proof calculus to derive tautologies. However, in the conclusion of [44] I show that the rewriting rules that are correct and concern the MLL connectives (⊗⅋3 and ⊗⅋2 = MIX) are equivalent to MLL; I explain that one could do the same for pomset logic with the rules of Figure 6 that preserve correctness. This rewriting view was developed from 2001 with terms rather than graphs by Guglielmi and Straßburger as the calculus of structures [11, 14, 55].
Before we define a rewriting deductive system for pomset logic, let us revisit (as we did in [44, 48]) the deductive system of Multiplicative Linear Logic (MLL). Those results are highly inspired from proof nets, but once they are established they can be presented before proof nets are defined. In this section a sequent is simply a dicograph of atoms, which, as explained above, can be viewed using the folding of Section 3.4 as a dicograph of formulas or as an sp order between formulas, depending on how many folding transformations there are and which ones are performed.
Regarding Multiplicative Linear Logic (MLL), observe that AXn = ⊗1≤i≤n (ai ⅋ ai⊥) is the largest cograph w.r.t. inclusion that can be derived in MLL with the (ai, ai⊥) as axioms: any additional R edge or arc would make a direct æ-circuit with an axiom. However, AXn is actually derivable in MLL, hence in any extension of MLL: each axiom ⊢ ai, ai⊥ yields ⊢ ai ⅋ ai⊥ by the ⅋ rule, and each ⊗ rule combines the sequent obtained so far with the next ⊢ ai ⅋ ai⊥, giving successively
AX1: a1 ⅋ a1⊥
AX2: ⊗1≤i≤2 (ai ⅋ ai⊥)
AX3: ⊗1≤i≤3 (ai ⅋ ai⊥)
AX4: ⊗1≤i≤4 (ai ⅋ ai⊥)
AX5: ···
7.1 Standard multiplicative linear logic as cograph rewriting
In [44], we considered an alternative way to derive theorems of usual multiplicative linear logic (MLL) by considering a formula as a binary relation, and more precisely as a cograph over its atoms, by viewing the connective ⊗ as the symmetric series composition ⊗ and the connective ⅋ as the parallel composition ⅋. As there is no ◁ connective in linear logic, the series composition ◁ is not used, and there is no sp order on conclusions.
Following the discussion in the introduction to the present Section 7, any sequent of MLL can be viewed as a cograph C[a1, a1⊥, a2, a2⊥, ..., an, an⊥] on 2n atoms that is included into AXn. Because of Theorem 2, AXn rewrites to C[a1, a1⊥, a2, a2⊥, ..., an, an⊥] using the rules of Figure 6 that concern ⅋ and ⊗, i.e., ⊗⅋4, ⊗⅋3 and ⊗⅋2. Observe that when viewed as a linear implication (considering the rules involving those two connectives), the first line ⊗⅋4 is an incorrect linear implication, while ⊗⅋3 is derivable in MLL and ⊗⅋2 in MLL+MIX, where the rule MIX is the one studied in [6], which also is derivable with ⊗⅋2:
MIX: from ⊢ Γ and ⊢ Δ infer ⊢ Γ, Δ
Actually all tautologies of multiplicative linear logic MLL can be derived using ⊗3 from an axiom AXn = 1≤i≤n (ai ai ⊥ ), and all tautologies of linear logic enriched with the MIX rule, MLL+MIX, can be derived by ⊗3 and ⊗2 (MIX). Thus, we can define a proof system gMLL for MLL working with sequents i (ai a i ⊥ ) (the two as cographs of atoms as follows. Axioms are AXn : ⊗ dual atoms are connected by an edge in a different relation called A for A axioms). There is just one deduction rule presented as a rewrite rule (up to commutativity and associativity): ⊗3. Let us call this deductive system gMLL (g for graph), then [44, 48] established that cograph rewriting is an alternative proof system to MLL and MLL+MIX. Theorem 8 MLL proves a sequent Γ with 2n atoms if and only if gMLL proves the unfolding Γ cog of Γ (the cograph Γ cog of atoms corresponding to Γ , that is the of the unforging of each formula in Γ ), i.e., AXn rewrites to Γ cog using ⊗3. MLL+MIX proves a sequent Γ with 2n atoms if and only if gMLL+MIX proves the unfolding Γ cog of Γ , i.e., AXn rewrites to Γ cog using ⊗3 and ⊗2. Proof Easy induction on sequent calculus proofs, see e.g. [44, 48]. Straßburger made a direct proof in [55]. Thus all MLL proofs can be obtained that way from axioms, but despite Staßburger’s result, unfortunately for pomset logic, it is hard to prove it directly on a non-inductive notion of proof like proof nets. Proposition 6 The calculi gMLL and gMLL+MIX can safely be extended to structured sequents of formulas of MLL (not just atoms), i.e., cographs of MLL formulas with the rules of folding and unfolding with the same results. Proof This is just an easy remark, based on proof nets, which can be viewed as a consequence of Subsection 8.1.
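The gMLL view can be illustrated with a minimal sketch, under stated assumptions: terms are nested tuples, the dual of an atom `a1` is written `~a1`, and the function names (`atoms`, `edges`, `ax`, `switch_at_root`) are hypothetical helpers introduced only for this illustration. The sketch builds $AX_n$, applies the rule $(A \parr B) \otimes C \to A \parr (B \otimes C)$ at the root of a term, and checks that the resulting cograph has the same atoms and fewer edges.

```python
# Terms: an atom is a string; a compound term is ("tensor", left, right)
# or ("par", left, right). The dual of atom "a1" is written "~a1"
# (a naming convention assumed for this sketch only).

def atoms(term):
    """List the atoms of a term, left to right."""
    if isinstance(term, str):
        return [term]
    _, left, right = term
    return atoms(left) + atoms(right)

def edges(term):
    """Cograph of a term: unordered pairs of atoms joined by a tensor."""
    if isinstance(term, str):
        return set()
    op, left, right = term
    result = edges(left) | edges(right)
    if op == "tensor":
        result |= {frozenset((x, y)) for x in atoms(left) for y in atoms(right)}
    return result

def ax(n):
    """Build AX_n as a left-nested tensor of the axioms (a_i par ~a_i)."""
    term = ("par", "a1", "~a1")
    for i in range(2, n + 1):
        term = ("tensor", term, ("par", f"a{i}", f"~a{i}"))
    return term

def switch_at_root(term):
    """Rewrite (A par B) tensor C into A par (B tensor C); None if no match."""
    if isinstance(term, str) or term[0] != "tensor":
        return None
    left, c = term[1], term[2]
    if isinstance(left, str) or left[0] != "par":
        return None
    a, b = left[1], left[2]
    return ("par", a, ("tensor", b, c))

before = ax(2)                   # (a1 par ~a1) tensor (a2 par ~a2)
after = switch_at_root(before)   # a1 par (~a1 tensor (a2 par ~a2))
print(sorted(sorted(e) for e in edges(before)))
print(sorted(sorted(e) for e in edges(after)))
```

A full gMLL prover would apply the rule at any position and modulo associativity and commutativity; the sketch only applies it at the root, which is enough to observe that a rewriting step removes edges and keeps the atoms.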
7.2 Deep pomset is SBV

The above result for MLL suggests presenting pomset logic as a rewriting system from “axioms” that are tensors of $x \parr x^\perp$ axioms, with the rewriting rules of Figure 3.3 except the one that does not preserve correctness, namely $\otimes\parr4$; the sp-pomset sequent calculus of Figure 7 is defined along the same lines. This rewriting calculus is rather natural because the rewriting rules preserve correctness (except $\otimes\parr4$), while the rewriting rules themselves correspond to proofs and to canonical linear maps in coherence spaces. This suggests that a rewriting system defined as gMLL+MIX in the previous section (but with dicographs instead of cographs) might yield all correct proof nets, but this is still an open question.

Let us call deep pomset the rewriting system for pomset logic defined by the following axioms and rewriting rules:

Axioms $AX_n = \bigotimes_{1 \le i \le n} (a_i \parr a_i^\perp)$ is a tautology.

Rules Whenever a dicograph of atoms $D$ is a tautology, so is the dicograph $D'$ obtained by any of the 10 rules $\otimes\parr3$, $\otimes\parr2$, $\otimes\triangleleft4$, $\otimes\triangleleft3l$, $\otimes\triangleleft3r$, $\otimes\triangleleft2$, $\triangleleft\parr4$, $\triangleleft\parr3l$, $\triangleleft\parr3r$, $\triangleleft\parr2$ of Figure 3.3, i.e., all rules of Figure 3.3 but $\otimes\parr4$. Observe that $D'$ has the same atoms as $D$.

For simplicity, we leave out the circumflex accents which draw a distinction between the logical connectives ($\triangleleft$, $\parr$, $\otimes$) and the corresponding operations on dicographs ($\hat{\triangleleft}$, $\hat{\parr}$, $\hat{\otimes}$), because in this section there are only dicographs, denoted by terms.

Axiom:    (ee⊥) ⊗ (bb⊥) ⊗ (cc⊥) ⊗ (ff⊥) ⊗ (aa⊥) ⊗ (dd⊥)
⊗2        [(e⊥ e) (b⊥ b)] ⊗ (cc⊥) ⊗ (ff⊥) ⊗ (aa⊥) ⊗ (dd⊥)
4         [(e⊥ b⊥)(e b)] ⊗ (cc⊥) ⊗ (ff⊥) ⊗ (aa⊥) ⊗ (dd⊥)
⊗3        [{(e⊥ b⊥) ⊗ (cc⊥) ⊗ (ff⊥)}(e b)] ⊗ (aa⊥) ⊗ (dd⊥)
2 × ⊗2    [{(((cc⊥) (e⊥ b⊥)) (ff⊥))}(e b)] ⊗ (aa⊥) ⊗ (dd⊥)
2 × 4     [(c b⊥ f)(c⊥ e⊥ f⊥)(e b)] ⊗ (aa⊥) ⊗ (dd⊥)
⊗3        [{(aa⊥) ⊗ (c b⊥ f)}(c⊥ e⊥ f⊥)(e b)] ⊗ (dd⊥)
⊗3l       [({(aa⊥) ⊗ (c b⊥)} f)(c⊥ e⊥ f⊥)(e b)] ⊗ (dd⊥)
⊗3        [({a⊥(a ⊗ (c b⊥))} f)(c⊥ e⊥ f⊥)(e b)] ⊗ (dd⊥)
3         [(a ⊗ (c b⊥))(a⊥ f)(c⊥ e⊥ f⊥)(e b)] ⊗ (dd⊥)
⊗3        (a ⊗ (c b⊥))(a⊥ f){(c⊥ e⊥ f⊥) ⊗ (dd⊥)}(e b)
⊗3        (a ⊗ (c b⊥))(a⊥ f)(c⊥ {[(e⊥ f⊥) ⊗ d]d⊥})(e b)
3r        (a ⊗ (c b⊥))(a⊥ f)((e⊥ f⊥) ⊗ d)(c⊥ d⊥)(e b)

Fig. 11 The derivation of the proof net of Figure 10 in Deep Pomset.
We shall prove that deep pomset is equivalent to the SBV rewriting system defined in Figure 12; we follow the simple presentation given in [14]. The SBV system is defined as term rewriting, rather than as dicograph rewriting. In the SBV term system there are bidirectional rewriting rules that are simply equalities of dicographs (e.g., associativity or commutativity of the order operations). Another difference is that the axiom is expressed in BV as ◦, which I shall write 1 (footnote 10), a unit for all three connectives, which may appear anywhere, and which may be rewritten as $a \parr a^\perp$.

Footnote 10. The ◦ symbol is cute, but ◦ is unusual for denoting a unit, or a truth value, or the set {∗} with a single element (which actually is the multiplicative unit in the category of coherence spaces), so I prefer the standard notation “1”.

• The rewriting rules $\otimes\parr3$, $\otimes\parr2$, $\otimes\triangleleft4$, $\otimes\triangleleft3l$, $\otimes\triangleleft3r$, $\otimes\triangleleft2$, $\triangleleft\parr4$, $\triangleleft\parr3l$, $\triangleleft\parr3r$, $\triangleleft\parr2$ of Figure 3.3 (all of them but $\otimes\parr4$) are the structural rules of SBV. Because of 1, the interchange law on four terms is enough in SBV, while Pomset Logic (without 1) requires the ternary and binary rules as well.
• The rule $1{\downarrow}$ says that 1 may appear or vanish “anywhere”:
  $\dfrac{S[T]}{S[(T \parr 1)]} \qquad \dfrac{S[T]}{S[(T \triangleleft 1)]} \qquad \dfrac{S[T]}{S[(1 \triangleleft T)]} \qquad \dfrac{S[T]}{S[(T \otimes 1)]}$
• The rule $a{\downarrow}$ says that 1 may be replaced with an “axiom” $a_i \parr a_i^\perp$:
  $\dfrac{S[1]}{S[(a_i \parr a_i^\perp)]}$
• The rule $a{\uparrow}$ says that a pair of edge-equivalent dual atoms (i.e., equivalent and connected with a tensor) can vanish everywhere (in the original formulation of SBV, $S[(a_i \otimes a_i^\perp)]$ rewrites into $S[1]$, but given the rules for 1 above this is unnecessary):
  $\dfrac{S[(a_i \otimes a_i^\perp)]}{S{\restriction}_{x \notin \{a_i, a_i^\perp\}}}$

Fig. 12 The term calculus SBV [14], written with MLL/pomset symbols. Observe that associativity and commutativity are implicit in handsome proof nets (graphs) but are invertible rewriting rules in SBV.

Axiom       1
a↓          (e⊥ e)
1a↓         (e⊥ e) ⊗ (b⊥ b)
⊗2          (e⊥ e) (b⊥ b)
4           (e⊥ b⊥)(e b)
(1a↓) × 2   ((cc⊥) ⊗ (e⊥ b⊥) ⊗ (ff⊥))(e b)
⊗ 2x2       ((cc⊥) (e⊥ b⊥) (ff⊥))(e b)
4x2         (c b⊥ f)(c⊥ e⊥ f⊥)(e b)
1a↓         (((c b⊥) ⊗ (aa⊥)) f)(c⊥ e⊥ f⊥)(e b)
⊗3          ((((c b⊥) ⊗ a)a⊥) f)(c⊥ e⊥ f⊥)(e b)
3           ((c b⊥) ⊗ a)(a⊥ f)(c⊥ e⊥ f⊥)(e b)
1a↓         ((c b⊥) ⊗ a)(a⊥ f)(c⊥ ((e⊥ f⊥) ⊗ (dd⊥)))(e b)
⊗3          ((c b⊥) ⊗ a)(a⊥ f){c⊥ [((e⊥ f⊥) ⊗ d)d⊥]}(e b)
3           ((c b⊥) ⊗ a)(a⊥ f)((e⊥ f⊥) ⊗ d)(c⊥ d⊥)(e b)

Fig. 13 An example of an SBV derivation: the dicog proof net of Figure 10 in SBV (thanks to Lutz Straßburger). We grouped the rules 1↓ and a↓. As we prove in this paper, this derivation can be converted into a Deep Pomset derivation, the one in Figure 11.
7.2.1 Simulating a↑, the atomic-cut reduction in Deep Pomset

In the proof net framework, $a{\uparrow}$ is the elimination of an atomic cut, which can be internal, i.e., a formula $a \overset{-}{\sim} a^\perp$ (the two atoms are equivalent and connected with a tensor) that can be a subformula. Let us see that this can be simulated within pomset logic.

Proposition 7 Let $S[t * u]$ be a dicograph, with $*$ any of the connectives $\triangleleft$, $\otimes$, $\parr$. Then $S[t * u]$ rewrites into $S[t] \parr u$ with the correct rewriting rules only, i.e., $\otimes\parr4$ can be avoided.

Proof Because $S[t] \parr u \subset S[t * u]$, there is no doubt that the larger dicograph $S[t * u]$ rewrites into the smaller one $S[t] \parr u$, but we need to check that $\otimes\parr4$ is unnecessary. This is rather straightforward, by induction on the structure of $S[\,]$.

Definition 8 Given a proof net $\pi = (V, B, R)$ with $R = S[u * (a \otimes a^\perp)]$, an atomic cut reduction is a transformation of $\pi$ into $\pi' = (V', B', R')$, with $V' = V \setminus \{a, a^\perp\}$, $B' = (B \setminus \{B(a) \mathrel{B} a,\ a^\perp \mathrel{B} B(a^\perp)\}) \cup \{B(a) \mathrel{B} B(a^\perp)\}$ and $R' = S[u]$; the atoms $a$ and $a^\perp$ are no longer in the domain of the R dicograph. The expression $B(x)$ denotes the unique vertex B-related to $x$.

Proposition 8 When an atomic cut reduction is performed on a proof net $\pi = (V, B, R)$ with $R = S[u * (a \otimes a^\perp)]$, yielding $\pi' = (V', B', R' = S[u])$ as in the above definition, $\pi'$ is a correct proof net as well.

Proof We view atomic cut reduction as a two-step process. The first step consists in turning $\pi$ into $\pi^- = (V, B, S[u] \parr (a \otimes a^\perp))$; as shown in Proposition 7, this transformation preserves correctness ($\otimes\parr4$ is not used), so $\pi^-$ is correct. The second step consists in replacing in $\pi^-$ the sequence of three edges $B(a) \mathrel{B} a$, $a \mathrel{R} a^\perp$, $a^\perp \mathrel{B} B(a^\perp)$ with a single B edge $B(a) \mathrel{B} B(a^\perp)$. Observe that in $\pi^-$ there are no R edges incident to $a$ or to $a^\perp$ other than the edge between $a$ and $a^\perp$. Hence the æ circuits of $\pi'$ are the same as those of $\pi^-$, up to the replacement of the three-edge sequence with the single B edge $B(a) \mathrel{B} B(a^\perp)$. If an æ circuit of $\pi'$ were chordless, so would be its image in $\pi^-$.
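The rewiring of Definition 8 can be sketched on a concrete representation. This is only an illustration under assumptions of my own: B is stored as a symmetric dictionary (each vertex maps to its B-partner), R as a set of ordered pairs (a symmetric edge is stored in both directions), and the function name `atomic_cut_reduction` is hypothetical. The sketch does not check that R has the shape $S[u * (a \otimes a^\perp)]$ required by the definition.

```python
def atomic_cut_reduction(B, R, a, a_dual):
    """Remove the cut atoms a and a_dual, link their B-partners directly,
    and restrict R to the remaining atoms."""
    partner_a, partner_dual = B[a], B[a_dual]
    if partner_a == a_dual:
        raise ValueError("a and its dual are linked by the same axiom")
    removed = {a, a_dual}
    # Drop the two B edges incident to the cut atoms ...
    new_B = {v: w for v, w in B.items() if v not in removed}
    # ... and add the single B edge between the former partners.
    new_B[partner_a] = partner_dual
    new_B[partner_dual] = partner_a
    # The cut atoms leave the domain of the R dicograph.
    new_R = {(x, y) for (x, y) in R if x not in removed and y not in removed}
    return new_B, new_R

# Axioms a -- ~a', ~a -- a', c -- ~c; a tensor edge between a and ~a;
# one arc from c to a' standing for the rest of the dicograph.
B = {"a": "~a'", "~a'": "a", "~a": "a'", "a'": "~a", "c": "~c", "~c": "c"}
R = {("a", "~a"), ("~a", "a"), ("c", "a'")}
new_B, new_R = atomic_cut_reduction(B, R, "a", "~a")
print(new_B)
print(new_R)
```

After the reduction the two axioms that were incident to the cut are merged into one, so the number of axioms drops by one, as in Proposition 9.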
The next proposition (also proved in [55]) shows that if a proof net is derivable in deep pomset, so is the proof net obtained by performing an atomic cut reduction:

Proposition 9 Let $\pi = (V, B, R)$ with $R = S[t * (a \otimes a^\perp)]$ be a proof net derivable in deep pomset and let $\pi' = (V', B', S[t])$ be the proof net obtained from $\pi$ by an atomic cut reduction as in the above definition. If $AX_{n+1}$ rewrites to $\pi$ with only the correct rewriting rules, then $AX_n$ (one axiom less) rewrites into $\pi'$ with only the correct rewriting rules.

Proof Let $a'^\perp = B(a)$ and $a' = B(a^\perp)$ be the atoms that are linked to $a$ and $a^\perp$ with axioms, that is, the other end-vertices of the corresponding B edges in a proof net (or the atoms that are created jointly with them by an $a{\downarrow}$ in SBV). Let us call $\delta$ a sequence of rewritings leading from $AX_{n+1} = (a'^\perp \parr a) \otimes (a^\perp \parr a') \otimes (\bigotimes_i (a_i \parr a_i^\perp))$ to $R = S[t * (a \otimes a^\perp)]$.

Observe that the restriction of $R$ to $a'^\perp, a, a^\perp, a'$ can only be $(a \otimes a^\perp) \parr a' \parr a'^\perp$, because $a \overset{-}{\sim} a^\perp$ in $R$, and because $R$ is a subdicograph of $AX_{n+1}$. Thus the rewriting $\delta$ includes at some point the rewriting of the subterm $w = (a'^\perp \parr a) \otimes (a^\perp \parr a')$, by an occurrence $i$ of the rewriting rule $\otimes\parr3$, either into $u = a'^\perp \parr (a \otimes (a^\perp \parr a'))$ or into $v = ((a'^\perp \parr a) \otimes a^\perp) \parr a'$. Those are the only possibilities since at the end of the rewriting $\delta$ we have $a \overset{-}{\sim} a^\perp$, i.e., $a$ and $a^\perp$ should be kept linked by a $\otimes$.

Because at the end of the rewriting $\delta$ we have $a \overset{-}{\sim} a^\perp$, if $\delta$ reduced $w$ to $u$ (resp. to $v$), then $\delta$ reduces later on the subterm $u' = a \otimes (a^\perp \parr a')$ of $u$ (resp. the subterm $v' = (a'^\perp \parr a) \otimes a^\perp$ of $v$) into $u'' = (a \otimes a^\perp) \parr a'$ (resp. into $v'' = a'^\perp \parr (a \otimes a^\perp)$) by an occurrence $j$ of the rewriting rule $\otimes\parr3$.

Because the $\otimes\parr3$ rules $i$ and $j$ commute with the rewriting rules that precede them, we may assume that $\delta$ starts with the $\otimes\parr3$ rules $i$ and $j$ turning $(a'^\perp \parr a) \otimes (a^\perp \parr a')$ of $AX_{n+1}$ into $(a \otimes a^\perp) \parr (a' \parr a'^\perp)$, followed by a rewriting $\delta^-$ from $[(a \otimes a^\perp) \parr (a' \parr a'^\perp)] \otimes (\bigotimes_i (a_i \parr a_i^\perp))$ to $R = S[t * (a \otimes a^\perp)]$.

The result is obtained by considering the projection $\delta^-{\restriction}_{x \notin \{a, a^\perp\}}$ of $\delta^-$ on the dicographs from which $a$ and $a^\perp$ are dropped out. The only prohibited rule, $\otimes\parr4$, cannot result from a correct rule by dropping out an atom. The correct rewriting derivation $\delta^-{\restriction}_{x \notin \{a, a^\perp\}}$ reduces $AX_n = (a' \parr a'^\perp) \otimes (\bigotimes_i (a_i \parr a_i^\perp))$ into $R' = S[t]$.
7.2.2 Dealing with the unit in Deep Pomset

To establish the correspondence between deep pomset and SBV, we can either enrich pomset logic with a unit, or consider only the dicographs proved by SBV without any unit remaining. We encode 1 in RnB graphs with an axiom, i.e., a B edge whose two ends, a dedicated atom and its dual, are always par-equivalent in the dicograph t: as the derivation proceeds, the two ends are never driven apart. In other words, we define 1 as the par of these two dual atoms, and we may use either 1 or this par to denote it in dicographs (footnote 11).

Footnote 11. There are other faithful encodings of 1, but this one preserves the structure of the proof net: a simple graph whose B edges define a perfect matching, whose R edges are a dicograph, and in which every alternate elementary cycle has an R chord.

A case study of possible æ circuits and their R chords shows the two following propositions:

Proposition 10 Let $u$ and $t[x]$ be dicographs ($x$ stands for a vertex). Assume that $(B, t[u])$ is a correct proof net (every æ circuit contains an R chord). Then $B$ extended with the B edge encoding 1, together with $t[u \otimes 1]$, is a correct proof net as well.

Proposition 11 Let $u$ and $t[x]$ be dicographs ($x$ stands for a vertex). Assume that $B$ extended with the B edge encoding 1, together with $t[u \otimes 1]$, is a proof net (every æ circuit contains an R chord). Then $(B, t[u])$ is a correct proof net as well.

Proposition 12 Given three dicographs $t, u, v$, the dicograph $t[u] \otimes v$ rewrites to $t[u \otimes v]$, to $t[u \parr v]$, to $t[u \triangleleft v]$ and to $t[v \triangleleft u]$.

Proof Easy induction on the structure of dicographs with a “hole” $t[\,]$.

In order to ease the correspondence with SBV, let us slightly extend deep pomset into unitary deep pomset, which involves 1:

Axioms The tensor of $\bigotimes_{1 \le i \le n} (a_i \parr a_i^\perp)$ with finitely many axioms encoding 1, on pairwise distinct atoms, is a tautology.

Rules Whenever a dicograph of atoms $D$ is a tautology, so is any dicograph $D'$ (hence with the same atoms) to which it rewrites by any of the 10 rules $\otimes\parr3$, $\otimes\parr2$, $\otimes\triangleleft4$, $\otimes\triangleleft3l$, $\otimes\triangleleft3r$, $\otimes\triangleleft2$, $\triangleleft\parr4$, $\triangleleft\parr3l$, $\triangleleft\parr3r$, $\triangleleft\parr2$ of Figure 3.3, i.e., all rules of Figure 3.3 but $\otimes\parr4$. Whenever those rules are applied, they must never tear apart the two dual atoms of an axiom encoding 1.

Add1 / Rm1 / Subst1 The rules Add1 (insert 1), Rm1 (remove 1) and Subst1 (replace 1 with $a_i \parr a_i^\perp$) are respectively defined as $1{\downarrow}$, $a{\uparrow}$ and $a{\downarrow}$ in SBV. Because of Proposition 10 (for $1{\downarrow}$), Proposition 11 (for $a{\uparrow}$) and common sense (for $a{\downarrow}$), those rules preserve the correctness criterion.
• Add1: $t[u] \to t[u \otimes 1]$ (and an axiom on a fresh pair of dual atoms corresponding to 1 is added).
• Rm1: $t[u \otimes 1] \to t[u]$ (and the axiom corresponding to 1 is removed).
• Subst1: consists in replacing the two dual atoms of an axiom encoding 1 by dual atoms $a_i$ and $a_i^\perp$ (no matter which of the two is replaced by $a_i$ and which by $a_i^\perp$).
7.2.3 Result: SBV is Deep Pomset

Theorem 9 Any SBV derivation yielding a dicograph without 1 can be mapped rule by rule into a unitary deep pomset derivation and vice versa.

Proof There is an obvious one-to-one correspondence between the rules of SBV and the ones of unitary deep pomset; this is the reason why we added the rules Add1, Rm1 and Subst1, which respectively correspond to $1{\downarrow}$, $a{\uparrow}$ and $a{\downarrow}$. A little difference is that Subst1 acts upon the whole previous (rewriting) derivation while $a{\downarrow}$ is local.

Theorem 10 Any SBV derivation yielding a term without 1 can be mapped rule by rule into a deep pomset derivation (without the rules for 1 of unitary deep pomset derivations) and vice versa.

Proof Clearly, Deep Pomset is a subcalculus of SBV, rule by rule. So let us focus on the other direction, from SBV to Deep Pomset. The result is a consequence of both Proposition 9, which states that $a{\uparrow}$ can be simulated in Deep Pomset logic (without 1 rules), and Proposition 12, which allows us to move the 1s inside the dicograph, to the place where they are actually used by some rewrite rule.

We turn any SBV derivation without any 1 in its final dicograph into an SBV derivation which starts with a tensor of 1s that are immediately replaced with axioms $a_i \parr a_i^\perp$, and without any 1-rules inside. Observe that such an SBV derivation actually is a deep pomset derivation: simply erase the initial tensor of 1s and resume the derivation when it has been replaced with a tensor of axioms $a_i \parr a_i^\perp$.

Observe that in an SBV proof the 1s are mandatory to start with. Given that they vanish during the derivation, each 1 either is deleted according to the equalities that say it is a unit for the connectives, or is turned into $a_i \parr a_i^\perp$. Because nothing is duplicated or contracted during the derivation, atoms can be tracked, so we can erase the first kind of 1 from the derivation, from where it appeared to where it disappeared, and this is an SBV derivation as well.

From now on we may assume without loss of generality that the SBV derivation contains several 1s that are introduced in the derivation and later on replaced with an $a_i \parr a_i^\perp$ axiom. We proceed by induction on the number of such 1s. Consider the first 1 that appears in the SBV derivation $d_{sbv}$, say at the $n$th step of $d_{sbv}$, and disappears at step $n + k$ of the SBV derivation. Let us call $d_1$ the part of the SBV derivation up to the appearance of this 1, $d_2$ the part of the SBV derivation until the replacement of this 1 by $a_i \parr a_i^\perp$, and $d_3$ the end of this SBV derivation.
The new derivation, without this 1 that appears in the middle of the derivation before being replaced with $(a_i \parr a_i^\perp)$, is defined as follows:

1. Let us add a 1 to the axiom of the SBV derivation and, immediately after, let us replace this 1 with the $a_i \parr a_i^\perp$ which is going to replace 1 at the end of $d_1$.
2. The first part of the derivation consists in $d_1'$, that is, $d_1$ with $(a_i \parr a_i^\perp)$ at the end of every dicograph in the derivation.
3. Thereafter the derivation consists in moving the axiom $(a_i \parr a_i^\perp)$ to its place by deep pomset rules, as in Proposition 12.
4. Then the derivation is $d_2'$, that is, $d_2$ with 1 replaced with $(a_i \parr a_i^\perp)$.
5. The last part of the derivation is unchanged: it is $d_3$.

Hence we can turn the SBV derivation into an SBV derivation that starts with the axiom $\bigotimes_{i \in I} 1$, which is immediately replaced with the axioms $\bigotimes_{i \in I} (a_i \parr a_i^\perp)$, yielding the same dicograph and only using deep pomset.
Question We may wonder whether all proof nets can be obtained from $AX_n = \bigotimes_{i \in I} (a_i \parr a_i^\perp)$ using only the correct rewriting rules (which simply are the correct inclusion patterns) of Figure 3.3 (all of them but $\otimes\parr4$). This would provide an inductive definition of the proofs of pomset logic, different from the inductive definition provided by a sequent calculus with a sequentialisation theorem.
7.3 Cut-elimination in Deep Pomset and in SBV

What about the cut rule? For logical systems based on rewriting, like gMLL(+MIX) or the dicog-RS view of pomset logic, which do not work with “logical rules” in the standard sense, there are no binary rules that would combine a $K$ and a $K^\perp$. Hence, as we said earlier, the natural view of a cut is simply a tensor $K \otimes K^\perp$, which in pomset logic is never inserted inside a ⊗ formula, while SBV allows a generalisation with cuts $K \otimes K^\perp$ occurring “inside” dicographs. The rule $i{\uparrow}$ of SBV generalises $a{\uparrow}$: $i{\uparrow}$ rewrites any subterm $K \otimes K^\perp$ to 1, or suppresses $K \otimes K^\perp$ if one does not want units. A way to express cut-elimination in rewriting systems is to say that the dicographs that can be derived with $i{\uparrow}$ can be derived without $i{\uparrow}$.

Corollary 1 shows that decomposing a cut $(A \otimes B) \otimes (A^\perp \parr B^\perp)$ or $(A \triangleleft B) \otimes (A^\perp \triangleleft B^\perp)$ into two smaller cuts $A \otimes A^\perp$ and $B \otimes B^\perp$ can be done by correct rewritings, no matter whether this cut appears inside a dicograph. In Proposition 9, we proved that whenever there exists a derivation of $S[t * (a \otimes a^\perp)]$ ($a$ atomic) from $AX_n$, there is a derivation of $S[t]$ from $AX_{n-1}$. From those two propositions, we obtain for free an alternative proof of cut-elimination for SBV or for Deep Pomset Logic, a result that first appeared in [55]:

Theorem 11 (Cut-elimination for SBV [55]) Let $t$ be a dicograph derived with $i{\uparrow}$ from an axiom $AX_m$. Then $t$ can be derived from an axiom $AX_n$ with $n \le m$ without the rule $i{\uparrow}$.

As can be seen, Deep Inference set the stage for new conceptions of cut and of cut-elimination, studied by Guglielmi and Straßburger; presenting their work on Deep Inference falls beyond the scope of this chapter on pomset logic.
8 Grammatical use

Relations like dicographs have pleasant algebraic properties, but when it comes to combining trees as in grammatical derivations, it is better to view the trees in order to have some intuition. So we first present proof nets with links before defining a grammatical formalism.

8.1 Proof nets with links

In order to define a grammar of pomset proof nets, it is easier to use proof nets with links (the links have been presented in Figure 3), which look like standard proof nets: the formula trees of the conclusions $T_1, \ldots, T_n$ with binary connectives ($\triangleleft$, $\otimes$, $\parr$) and axioms linking dual atoms, together with an sp partial order on the conclusions $T_1, \ldots, T_n$.

It is quite easy to turn a dicog-PN proof net into a pomset proof net using the folding of Subsection 3.4, and vice versa using unfolding. A dicograph proof net $\pi = (B, R)$ with $R$ being $S[T_1, \ldots, T_n]$, where $S$ contains no $\otimes$ symbol (so $S$ is an sp order), corresponds to a pomset proof net $\pi^{SP}$ with conclusions $T_1^f, \ldots, T_n^f$, where $T_i^f$ is the formula corresponding to $T_i$, obtained by replacing each operation on dicographs $\hat{*}$ with the corresponding multiplicative connective $* \in \{\otimes, \triangleleft, \parr\}$. There usually are many ways to write a dicograph $R$ as a term $S[T_1, \ldots, T_n]$, depending on the associativity of $\otimes$, $\triangleleft$ and $\parr$ and the commutativity of $\otimes$ and $\parr$, and $n$ may vary according to whether the outermost $\parr$ and $\triangleleft$ are turned into connectives or not (as is the case for $\parr$ in usual proof nets for MLL). In case the outermost connective of $R$ is $\otimes$, $\pi$ necessarily has a single conclusion: $R = T_1$, and $S$ is the trivial sp order on one formula.

The transformation from $\pi$ to $\pi^{SP}$ can be done “little by little” by allowing “intermediate” proof structures whose conclusion is a dicograph of formulas. Such a proof structure is said to be correct whenever every æ circuit contains a chord, the formula trees being bicoloured as in Figure 3. In Figure 14, $\pi_1$ is the dicog-PN proof net, while $\pi_4$ is a pomset proof net with links having a single conclusion.

Let $\pi = (B, D[F_1, \ldots, F_p])$, with $D$ a dicograph on the formulas $F_1, \ldots, F_p$, be an intermediate proof structure. A folding of $\pi$ is simply a folding of $D[F_1, \ldots, F_p]$ as defined in Subsection 3.4 (two equivalent formulas $F_i$ and $F_j$, combined in $D$ by the operation $\hat{*}$, are replaced in $D$ by the single formula $F_i * F_j$). An unfolding of $\pi$ is simply an unfolding of $D[F_1, \ldots, F_p]$ as defined in Subsection 3.4 (a formula $F_i * F_j$ is replaced by the two equivalent formulas $F_i$ and $F_j$ combined by $\hat{*}$).

Proposition 13 Let $\pi_f$ and $\pi_u$ be two intermediate proof structures, with $\pi_u$ being an unfolding of $\pi_f$, or equivalently $\pi_f$ being a folding of $\pi_u$. The two following properties are equivalent:
• $\pi_u$ is correct.
• $\pi_f$ is correct.

Proof This proof consists in a thorough examination of the new æ circuits that may appear during the transformation and of the edges that are chords and that may vanish during the transformation [46].
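Complete unfolding, down to atoms, can be sketched as follows. The encoding is an assumption made for this illustration only: formulas are nested tuples tagged `"tensor"`, `"before"` or `"par"`, duals are written with a leading `~`, and the R relation is a set of ordered pairs in which a tensor contributes arcs in both directions, a before contributes arcs from left to right, and a par contributes none. The function name `unfold` is hypothetical, and the example formula is modelled on the single conclusion of a folded net such as π4, with the connectives chosen for the example.

```python
def unfold(formula):
    """Return (atoms, arcs) of the dicograph of atoms described by a formula."""
    if isinstance(formula, str):
        return [formula], set()
    op, left, right = formula
    if op not in ("tensor", "before", "par"):
        raise ValueError(f"unknown connective: {op}")
    left_atoms, left_arcs = unfold(left)
    right_atoms, right_arcs = unfold(right)
    arcs = left_arcs | right_arcs
    if op in ("tensor", "before"):
        # Arcs from every atom on the left to every atom on the right.
        arcs |= {(x, y) for x in left_atoms for y in right_atoms}
    if op == "tensor":
        # A tensor is symmetric: add the reverse arcs as well.
        arcs |= {(y, x) for x in left_atoms for y in right_atoms}
    return left_atoms + right_atoms, arcs

# ((alpha par ~alpha) tensor (gamma par ~gamma)) before (beta par ~beta)
conclusion = (
    "before",
    ("tensor", ("par", "alpha", "~alpha"), ("par", "gamma", "~gamma")),
    ("par", "beta", "~beta"),
)
atom_list, arcs = unfold(conclusion)
print(atom_list)
print(len(arcs))
```

Folding goes the other way: it groups equivalent vertices of the dicograph back into a single formula, one connective at a time.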
Fig. 14 Folding a dicograph proof net into an sp proof net step by step (panels (a) π1, (b) π2, (c) π3, (d) π4); the conclusions are the black vertices. The single conclusion of π4 is $((\alpha \parr \alpha^\perp) \otimes (\gamma \parr \gamma^\perp)) \triangleleft (\beta \parr \beta^\perp)$.
We can now recall the definition of pomset proof nets with links that appears in [36, 42]:

Definition 9 A pomset proof structure with links is defined as a combination of links: every conclusion of a link is the premise of at most one link, and every premise of a link is the conclusion of exactly one link. Formulas that are not the premise of any link are called conclusions of the proof net. Conclusions are connected with an R sp-order.

Because of the shape of the RnB links, Criterion 1 (every æ circuit contains a chord) becomes simpler:

Correctness criterion A pomset proof structure with links is correct whenever there is no æ circuit.

A possible variant for pomset proof structures with links consists in replacing axioms that are a single B edge (cf. the links in Figure 3) with a sequence of a B edge, an R symmetric edge and a B edge between $a$ and $a^\perp$. This additional R edge, which is not incident with any other R edge, does not change anything to the æ paths and circuits, nor to the correctness of the proof structure (the non-trivial direction is proved in Proposition 11), but that way any B edge corresponds to a formula (while in a handsome proof net, every vertex corresponds to an atom).
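For small handsome proof structures, the chord criterion can be checked by brute force. The sketch below rests on assumptions of my own: B is a symmetric dictionary encoding a perfect matching, R is a set of ordered pairs (a symmetric edge appears in both directions), an æ circuit alternates B edges and R arcs without repeating a vertex, and a chord is taken to be an R arc joining two vertices of the circuit such that neither it nor its reverse lies on the circuit. The names `ae_circuits` and `is_correct` are hypothetical.

```python
def ae_circuits(B, R):
    """Yield alternate elementary circuits as lists [v0, B(v0), v1, B(v1), ...],
    closed by an R arc from the last vertex back to v0."""
    successors = {}
    for x, y in R:
        successors.setdefault(x, set()).add(y)
    for start in sorted(B):
        stack = [[start, B[start]]]
        while stack:
            path = stack.pop()
            for w in successors.get(path[-1], ()):
                if w == start:
                    yield path
                elif w not in path:
                    stack.append(path + [w, B[w]])

def is_correct(B, R):
    """True when every alternate elementary circuit contains a chord."""
    for path in ae_circuits(B, R):
        on_circuit = set(path)
        # R arcs used by the circuit: from each odd position to the next vertex.
        used = {(path[i], path[(i + 1) % len(path)]) for i in range(1, len(path), 2)}
        has_chord = any(
            x in on_circuit and y in on_circuit
            and (x, y) not in used and (y, x) not in used
            for x, y in R
        )
        if not has_chord:
            return False
    return True

B = {"a": "~a", "~a": "a", "b": "~b", "~b": "b"}
same_order = {("a", "b"), ("~a", "~b")}      # (a before b) par (~a before ~b)
reversed_order = {("a", "b"), ("~b", "~a")}  # (a before b) par (~b before ~a)
print(is_correct(B, same_order), is_correct(B, reversed_order))
```

The search explores each circuit from several starting points, which is wasteful but harmless for a checker on examples of this size.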
8.2 Grammars with partial proof nets

In the 1990s, Lecomte was aiming at extensions of the Lambek grammars that would handle relatively free word orders, discontinuous constituents and other tricky linguistic phenomena, but still within a logical framework, as opposed to CCG, which extends AB grammars with ad hoc rewriting rules whose logical content is unclear. Grammars defined within a logical framework have at least two advantages: rules remain general, and the connection with semantics, logical formulas and lambda terms is a priori more transparent. Following a suggestion by Jean-Yves Girard, Alain Lecomte contacted me just after I passed my PhD on pomset logic, so we proposed a kind of grammar with pomset logic. We explored such a possibility in [21, 24, 22, 23] and it was later improved by Sylvain Pogodalla in [31] (see also [47]). We followed two guidelines:
• words are associated not with formulas but with partial proof nets with a tree-like structure; in particular they have a single output;
• word order is a partial order, an sp order described by the occurrences of the connective $\triangleleft$ in the proof net.

An analysis or parse structure is a combination of the partial proof nets into a complete proof net with output S. The two ways to combine partial proof nets are by “plugging” a hypothesis to the conclusion of another partial proof net, and by performing cuts between partial proof nets.

Given that words label axioms, instead of having a single B edge between $a$ and $a^\perp$ we write a sequence of three edges, a B edge, an R edge and a B edge, the middle one being labelled with the word. This little variant changes nothing regarding the correctness of the proof net in terms of æ paths.

Rather than lengthy explanations, let us give two examples of a grammatical derivation in this framework. One may notice in the examples that the partial pomset proof nets that we use in the lexicon are of a restricted form:
• there are just two conclusions:
  – the output $b$, which is the syntactic category of the resulting phrase once the required “arguments” have been provided;
  – a conclusion $a^\perp \parr (X_1 \otimes Y_1) \parr \cdots \parr (X_n \otimes Y_n)$ without any $\otimes$ connective in the $X_i$;
• an axiom connects $a^\perp$ in the conclusion with an $a$ in one of the $X_i$, with the corresponding word as the label of $a$.

In a first version we defined from the proof net an order between atoms (hence words) by “there exists a directed path” from $a$ to $b$. However, it is more convenient, in particular from a computational point of view, to label the proof net with sp orders of words. Doing so is a computational improvement, but those labels are fully determined by the proof net; they contain no additional information. Here are the labelling rules:
• Initialisation:
  – $a^\perp$ is labelled with the one-point sp order consisting of the corresponding word;
  – $X_i \otimes Y_i$ is labelled with an empty sp order.
• Propagation:
  – The two conclusions of a given axiom have the same label;
  – One of the two premises of a tensor link is labelled with the empty sp order, the other by R and the conclusion by S;
  – The conclusion of a par link is labelled $R \parr S$ when the two premises are labelled R and S;
  – The conclusion of a $\triangleleft$ link is labelled $R \triangleleft S$ when the two premises are labelled R and S.

The propagation rules always succeed because of the correctness criterion and the tree-like structure of the partial proof nets. The propagation rules yield a complete labelling of the proof net, and the sp order that labels the output S is the partial order over words.

Table 1 A lexicon with partial pomset proof nets (entries for the words Pierre, Marie, chanter and entend).
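The sp orders used as labels can be made concrete with a small sketch. It is an illustration under assumptions of my own: an sp order of words is a nested tuple tagged `"par"` (parallel composition) or `"before"` (series composition), all words are distinct, and the helper names `words`, `precedence` and `linearisations` are hypothetical. The sketch enumerates the total word orders compatible with a given sp order; the example term is meant to mimic an order in which Pierre precedes entend, which precedes Marie and chanter, the last two being left unordered.

```python
from itertools import permutations

def words(order):
    """List the words of an sp order (assumed pairwise distinct)."""
    if isinstance(order, str):
        return [order]
    _, left, right = order
    return words(left) + words(right)

def precedence(order):
    """Set of pairs (x, y) such that x must come before y."""
    if isinstance(order, str):
        return set()
    op, left, right = order
    pairs = precedence(left) | precedence(right)
    if op == "before":
        # Series composition: everything on the left precedes everything on the right.
        pairs |= {(x, y) for x in words(left) for y in words(right)}
    return pairs

def linearisations(order):
    """All total orders of the words that respect the sp order, as sentences."""
    pairs = precedence(order)
    result = []
    for candidate in permutations(words(order)):
        position = {w: i for i, w in enumerate(candidate)}
        if all(position[x] < position[y] for x, y in pairs):
            result.append(" ".join(candidate))
    return sorted(result)

label = ("before", "Pierre", ("before", "entend", ("par", "Marie", "chanter")))
for sentence in linearisations(label):
    print(sentence)
```

Enumerating permutations is exponential and only meant for toy sentences; the point is that a single sp label accounts for several admissible word orders.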
Fig. 15 Analysis of a relatively free word order sentence; the resulting order is $\text{Pierre} \triangleleft \text{entend} \triangleleft (\text{Marie} \parr \text{chanter})$.
We give an example of a lexicon and of an analysis of a relatively free word order phenomenon in French: the lexicon is in Table 1 and the analysis in Figure 15. One can say both “Pierre entend Marie chanter” (Pierre hears Mary singing) and “Pierre entend chanter Marie” (Pierre hears singing Mary). Indeed, when there is no object, French accepts that the subject is after the verb, in the given example as well as in the relatives introduced by the relative pronoun “que/whom”: “Pierre que regarde Marie chante” (Pierre that Mary watches sings) and “Pierre que Marie regarde chante” (Pierre that Marie watches sings). Observe that there is a single analysis for the different possible word orders and not a different analysis for each word order.

Using cuts one is able, in addition to free word order phenomena, to provide an account of discontinuous constituents, e.g., French negation “ne . . . pas”. During cut-elimination, the label splits into two parts, so “ne” and “pas” go to their proper places, as shown in Figure 16. When cut is used, one may allow incorrect lexical partial proof nets to be associated with words, but after cut-elimination the result, i.e., the linguistic analysis, ought to be a fully correct proof net.

It is difficult to say something on the generative capacity of this grammatical formalism because it produces (or recognises) sp orders of words and not chains of words, and there are not so many such grammatical formalisms, an exception being [26].

Theorem 12 (Pogodalla) Pomset grammars with a restricted form for partial pomset proof nets yielding trees and total word orders are equivalent to Lexicalised Tree Adjoining Grammars [31].

This is much more than the languages that can be generated by Lambek grammars, which are context free. In both cases, parsing as proof search is NP complete: trying all the possibilities in pomset grammar is in NP (and likely to be NP complete), and provability for Lambek calculus has been shown to be NP complete [30]. Of course, if the Lambek grammar is converted into an extremely large context free grammar using the result of Matti Pentus [29], parsing of Lambek grammars is polynomial, cubic or better in the number of words in the sentence. Especially when using cuts and tree-like partial proof nets, this calculus is close to several codings of LTAG in noncommutative linear logic à la Lambek-Abrusci [2] (footnote 12).
Footnote 12. The related work [16], which also encodes TAGs in noncommutative linear logic à la Lambek-Abrusci, presented with natural deduction, requires ad hoc extensions of the noncommutative linear logic, like some crossing of the axioms, which are excluded from those Lambek-Abrusci logics [51, 39].

9 Conclusion and perspective

We presented an overview of pomset logic with both published and unpublished results. Pomset logic is a variant of linear logic, as the Lambek calculus is, and it can be used for modelling grammar, in particular for natural language, as the Lambek calculus can. Apart from this, as said in the introduction, Lambek calculus and pomset logic are quite different, although they are both noncommutative variants of (multiplicative) linear logic. Lambek calculus is a noncommutative restriction of intuitionistic linear logic, while pomset logic is a noncommutative extension of classical linear logic.
Fig. 16 Handling discontinuous constituents in pomset proof nets. (a) The proof net made from the partial proof nets NE. . . PAS (discontinuous constituent) and from the partial proof net REGARDE, before cut-elimination. (b) The proof net analysing NE REGARDE PAS after reduction; the three words are in the proper order.
But perhaps the resemblance is more abstract than that. Indeed Lambek was surprised that with proof nets people intend to replace a syntactic calculus, an algebraic structure, with graphical or geometrical objects. However, for pomset logic, the best presentation is certainly the calculus of dicographs, which can be viewed as terms, and therefore belongs to algebra. It is not surprising that Lambek preferred my algebraic correctness criterion that uses coherence spaces [37, 43, 47] (Theorem 6 in this paper) to the double trip condition of Girard [8]. This presentation is by no means the obituary of pomset logic. For instance, Slavnov recently proposed a sequent calculus which is complete w.r.t. pomset proof nets [54]. In his sequent calculus, multisets of formulas are endowed with binary relations on sequences of n conclusions, and “before” is, with respect to Slavnov’s calculus, a collapse of two connectives, namely, a before which behaves like a “times” $\otimes$ and a before which behaves like a “par” $\parr$. Straßburger contributed to pomset logic with the counterexample in the paper, but his major contributions to pomset logic are the development of Deep Inference (including SBV, which we discussed in this paper) and the comparison between pomset logic defined with handsome proof nets, Deep Inference [14] and proofs without syntax [15]. Straßburger is presently exploring new ideas on applications of pomset logic to safety and privacy together with Horne. Guglielmi (in e.g. [13]) is tuning the syntactic rules of a self-dual modality